Class ZipfDistribution

  • All Implemented Interfaces:
    DiscreteDistribution

    public final class ZipfDistribution
    extends AbstractDiscreteDistribution
    Implementation of the Zipf distribution.

    The probability mass function of \( X \) is:

    \[ f(k; N, s) = \frac{1/k^s}{H_{N,s}} \]

    for \( N \in \{1, 2, 3, \dots\} \) the number of elements, \( s \gt 0 \) the exponent characterizing the distribution, \( k \in \{1, 2, \dots, N\} \) the element rank, and \( H_{N,s} \) is the normalizing constant which corresponds to the generalized harmonic number of order N of s.

    See Also:
    Zipf distribution (Wikipedia)
    • Field Detail

      • numberOfElements

        private final int numberOfElements
        Number of elements.
      • exponent

        private final double exponent
        Exponent parameter of the distribution.
      • nthHarmonic

        private final double nthHarmonic
        Cached value of the nth generalized harmonic.
      • logNthHarmonic

        private final double logNthHarmonic
        Cached value of the log of the nth generalized harmonic.
    • Constructor Detail

      • ZipfDistribution

        private ZipfDistribution​(int numberOfElements,
                                 double exponent)
        Parameters:
        numberOfElements - Number of elements.
        exponent - Exponent.
    • Method Detail

      • of

        public static ZipfDistribution of​(int numberOfElements,
                                          double exponent)
        Creates a Zipf distribution.
        Parameters:
        numberOfElements - Number of elements.
        exponent - Exponent.
        Returns:
        the distribution
        Throws:
        java.lang.IllegalArgumentException - if numberOfElements <= 0 or exponent <= 0.
      • getNumberOfElements

        public int getNumberOfElements()
        Gets the number of elements parameter of this distribution.
        Returns:
        the number of elements.
      • getExponent

        public double getExponent()
        Gets the exponent parameter of this distribution.
        Returns:
        the exponent.
      • probability

        public double probability​(int x)
        For a random variable X whose values are distributed according to this distribution, this method returns P(X = x). In other words, this method represents the probability mass function (PMF) for the distribution.
        Parameters:
        x - Point at which the PMF is evaluated.
        Returns:
        the value of the probability mass function at x.
      • logProbability

        public double logProbability​(int x)
        For a random variable X whose values are distributed according to this distribution, this method returns log(P(X = x)), where log is the natural logarithm.
        Parameters:
        x - Point at which the PMF is evaluated.
        Returns:
        the logarithm of the value of the probability mass function at x.
      • cumulativeProbability

        public double cumulativeProbability​(int x)
        For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x). In other, words, this method represents the (cumulative) distribution function (CDF) for this distribution.
        Parameters:
        x - Point at which the CDF is evaluated.
        Returns:
        the probability that a random variable with this distribution takes a value less than or equal to x.
      • survivalProbability

        public double survivalProbability​(int x)
        For a random variable X whose values are distributed according to this distribution, this method returns P(X > x). In other words, this method represents the complementary cumulative distribution function.

        By default, this is defined as 1 - cumulativeProbability(x), but the specific implementation may be more accurate.

        Parameters:
        x - Point at which the survival function is evaluated.
        Returns:
        the probability that a random variable with this distribution takes a value greater than x.
      • getMean

        public double getMean()
        Gets the mean of this distribution.

        For number of elements \( N \) and exponent \( s \), the mean is:

        \[ \frac{H_{N,s-1}}{H_{N,s}} \]

        where \( H_{N,k} \) is the generalized harmonic number of order \( N \) of \( k \).

        Returns:
        the mean.
      • getVariance

        public double getVariance()
        Gets the variance of this distribution.

        For number of elements \( N \) and exponent \( s \), the variance is:

        \[ \frac{H_{N,s-2}}{H_{N,s}} - \frac{H_{N,s-1}^2}{H_{N,s}^2} \]

        where \( H_{N,k} \) is the generalized harmonic number of order \( N \) of \( k \).

        Returns:
        the variance.
      • generalizedHarmonic

        private static double generalizedHarmonic​(int n,
                                                  double m)
        Calculates the Nth generalized harmonic number. See Harmonic Series.

        Assumes exponent > 0 to arrange the terms to sum from small to large.

        Parameters:
        n - Term in the series to calculate (must be larger than 1)
        m - Exponent (special case m = 1 is the harmonic series).
        Returns:
        the nth generalized harmonic number.
      • generalizedHarmonicAscendingSum

        private static double generalizedHarmonicAscendingSum​(int n,
                                                              double m)
        Calculates the Nth generalized harmonic number.

        Checks the value of the exponent to arrange the terms to sum from from small to large.

        Parameters:
        n - Term in the series to calculate (must be larger than 1)
        m - Exponent (special case m = 1 is the harmonic series).
        Returns:
        the nth generalized harmonic number.
      • getSupportLowerBound

        public int getSupportLowerBound()
        Gets the lower bound of the support. This method must return the same value as inverseCumulativeProbability(0), i.e. \( \inf \{ x \in \mathbb Z : P(X \le x) \gt 0 \} \). By convention, Integer.MIN_VALUE should be substituted for negative infinity.

        The lower bound of the support is always 1.

        Returns:
        1.
      • getSupportUpperBound

        public int getSupportUpperBound()
        Gets the upper bound of the support. This method must return the same value as inverseCumulativeProbability(1), i.e. \( \inf \{ x \in \mathbb Z : P(X \le x) = 1 \} \). By convention, Integer.MAX_VALUE should be substituted for positive infinity.

        The upper bound of the support is the number of elements.

        Returns:
        number of elements.