Class ZipfDistribution

java.lang.Object
org.apache.commons.statistics.distribution.AbstractDiscreteDistribution
org.apache.commons.statistics.distribution.ZipfDistribution
All Implemented Interfaces:
DiscreteDistribution

public final class ZipfDistribution extends AbstractDiscreteDistribution
Implementation of the Zipf distribution.

The probability mass function of \( X \) is:

\[ f(k; N, s) = \frac{1/k^s}{H_{N,s}} \]

for \( N \in \{1, 2, 3, \dots\} \) the number of elements, \( s \gt 0 \) the exponent characterizing the distribution, \( k \in \{1, 2, \dots, N\} \) the element rank, and \( H_{N,s} \) is the normalizing constant which corresponds to the generalized harmonic number of order N of s.

See Also:
  • Field Details

    • numberOfElements

      private final int numberOfElements
      Number of elements.
    • exponent

      private final double exponent
      Exponent parameter of the distribution.
    • nthHarmonic

      private final double nthHarmonic
      Cached value of the nth generalized harmonic.
    • logNthHarmonic

      private final double logNthHarmonic
      Cached value of the log of the nth generalized harmonic.
  • Constructor Details

    • ZipfDistribution

      private ZipfDistribution(int numberOfElements, double exponent)
      Parameters:
      numberOfElements - Number of elements.
      exponent - Exponent.
  • Method Details

    • of

      public static ZipfDistribution of(int numberOfElements, double exponent)
      Creates a Zipf distribution.
      Parameters:
      numberOfElements - Number of elements.
      exponent - Exponent.
      Returns:
      the distribution
      Throws:
      IllegalArgumentException - if numberOfElements <= 0 or exponent <= 0.
    • getNumberOfElements

      public int getNumberOfElements()
      Gets the number of elements parameter of this distribution.
      Returns:
      the number of elements.
    • getExponent

      public double getExponent()
      Gets the exponent parameter of this distribution.
      Returns:
      the exponent.
    • probability

      public double probability(int x)
      For a random variable X whose values are distributed according to this distribution, this method returns P(X = x). In other words, this method represents the probability mass function (PMF) for the distribution.
      Parameters:
      x - Point at which the PMF is evaluated.
      Returns:
      the value of the probability mass function at x.
    • logProbability

      public double logProbability(int x)
      For a random variable X whose values are distributed according to this distribution, this method returns log(P(X = x)), where log is the natural logarithm.
      Parameters:
      x - Point at which the PMF is evaluated.
      Returns:
      the logarithm of the value of the probability mass function at x.
    • cumulativeProbability

      public double cumulativeProbability(int x)
      For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x). In other, words, this method represents the (cumulative) distribution function (CDF) for this distribution.
      Parameters:
      x - Point at which the CDF is evaluated.
      Returns:
      the probability that a random variable with this distribution takes a value less than or equal to x.
    • survivalProbability

      public double survivalProbability(int x)
      For a random variable X whose values are distributed according to this distribution, this method returns P(X > x). In other words, this method represents the complementary cumulative distribution function.

      By default, this is defined as 1 - cumulativeProbability(x), but the specific implementation may be more accurate.

      Parameters:
      x - Point at which the survival function is evaluated.
      Returns:
      the probability that a random variable with this distribution takes a value greater than x.
    • getMean

      public double getMean()
      Gets the mean of this distribution.

      For number of elements \( N \) and exponent \( s \), the mean is:

      \[ \frac{H_{N,s-1}}{H_{N,s}} \]

      where \( H_{N,k} \) is the generalized harmonic number of order \( N \) of \( k \).

      Returns:
      the mean.
    • getVariance

      public double getVariance()
      Gets the variance of this distribution.

      For number of elements \( N \) and exponent \( s \), the variance is:

      \[ \frac{H_{N,s-2}}{H_{N,s}} - \frac{H_{N,s-1}^2}{H_{N,s}^2} \]

      where \( H_{N,k} \) is the generalized harmonic number of order \( N \) of \( k \).

      Returns:
      the variance.
    • generalizedHarmonic

      private static double generalizedHarmonic(int n, double m)
      Calculates the Nth generalized harmonic number. See Harmonic Series.

      Assumes exponent > 0 to arrange the terms to sum from small to large.

      Parameters:
      n - Term in the series to calculate (must be larger than 1)
      m - Exponent (special case m = 1 is the harmonic series).
      Returns:
      the nth generalized harmonic number.
    • generalizedHarmonicAscendingSum

      private static double generalizedHarmonicAscendingSum(int n, double m)
      Calculates the Nth generalized harmonic number.

      Checks the value of the exponent to arrange the terms to sum from from small to large.

      Parameters:
      n - Term in the series to calculate (must be larger than 1)
      m - Exponent (special case m = 1 is the harmonic series).
      Returns:
      the nth generalized harmonic number.
    • getSupportLowerBound

      public int getSupportLowerBound()
      Gets the lower bound of the support. This method must return the same value as inverseCumulativeProbability(0), i.e. \( \inf \{ x \in \mathbb Z : P(X \le x) \gt 0 \} \). By convention, Integer.MIN_VALUE should be substituted for negative infinity.

      The lower bound of the support is always 1.

      Returns:
      1.
    • getSupportUpperBound

      public int getSupportUpperBound()
      Gets the upper bound of the support. This method must return the same value as inverseCumulativeProbability(1), i.e. \( \inf \{ x \in \mathbb Z : P(X \le x) = 1 \} \). By convention, Integer.MAX_VALUE should be substituted for positive infinity.

      The upper bound of the support is the number of elements.

      Returns:
      number of elements.
    • createSampler

      public DiscreteDistribution.Sampler createSampler(org.apache.commons.rng.UniformRandomProvider rng)
      Creates a sampler.
      Specified by:
      createSampler in interface DiscreteDistribution
      Overrides:
      createSampler in class AbstractDiscreteDistribution
      Parameters:
      rng - Generator of uniformly distributed numbers.
      Returns:
      a sampler that produces random numbers according this distribution.