Class HypergeometricDistribution

  • All Implemented Interfaces:
    DiscreteDistribution

    public final class HypergeometricDistribution
    extends AbstractDiscreteDistribution
    Implementation of the hypergeometric distribution.

    The probability mass function of \( X \) is:

    \[ f(k; N, K, n) = \frac{\binom{K}{k} \binom{N - K}{n-k}}{\binom{N}{n}} \]

    for \( N \in \{0, 1, 2, \dots\} \) the population size, \( K \in \{0, 1, \dots, N\} \) the number of success states, \( n \in \{0, 1, \dots, N\} \) the number of samples, \( k \in \{\max(0, n+K-N), \dots, \min(n, K)\} \) the number of successes, and

    \[ \binom{a}{b} = \frac{a!}{b! \, (a-b)!} \]

    is the binomial coefficient.
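    The formula above can be evaluated directly for small parameters. The following is a minimal standalone sketch, not this class's implementation; the class name HypergeometricPmfSketch and its methods are hypothetical, and the plain double arithmetic is assumed to be adequate only for parameters small enough that the binomial coefficients do not overflow.

```java
/** Standalone sketch of the hypergeometric PMF; not the library's implementation. */
final class HypergeometricPmfSketch {
    private HypergeometricPmfSketch() {}

    /** Binomial coefficient C(a, b) as a double; 0 when b is outside [0, a]. */
    static double binomial(int a, int b) {
        if (b < 0 || b > a) {
            return 0;
        }
        // Use the symmetry C(a, b) = C(a, a - b) to shorten the loop.
        final int m = Math.min(b, a - b);
        double result = 1;
        for (int i = 1; i <= m; i++) {
            // After step i the running value equals C(a - m + i, i).
            result = result * (a - m + i) / i;
        }
        return result;
    }

    /** f(k; N, K, n) with nn = N and kk = K. */
    static double pmf(int nn, int kk, int n, int k) {
        return binomial(kk, k) * binomial(nn - kk, n - k) / binomial(nn, n);
    }

    public static void main(String[] args) {
        // N = 10, K = 4, n = 3, k = 1: (4 * 15) / 120
        System.out.println(pmf(10, 4, 3, 1)); // prints 0.5
    }
}
```

    Values of k outside the support give a zero binomial coefficient in the numerator, so the sketch returns 0 there without a separate range check.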

    See Also:
    Hypergeometric distribution (Wikipedia), Hypergeometric distribution (MathWorld)
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private double bp
      Binomial probability of success (sampleSize / populationSize).
      private double bq
      Binomial probability of failure ((populationSize - sampleSize) / populationSize).
      private static double HALF
      1/2.
      private int lowerBound
      The lower bound of the support (inclusive).
      private double[] midpoint
      Cached midpoint of the CDF/SF.
      private int numberOfSuccesses
      The number of successes in the population.
      private int populationSize
      The population size.
      private int sampleSize
      The sample size.
      private int upperBound
      The upper bound of the support (inclusive).
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      private HypergeometricDistribution​(int populationSize, int numberOfSuccesses, int sampleSize)  
    • Method Summary

      Modifier and Type Method Description
      private int computeInverseProbability​(double p, double q, boolean complement)
      Implementation for the inverse cumulative or survival probability.
      private double computeLogProbability​(int x)
      Compute the log probability.
      double cumulativeProbability​(int x)
      For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x).
      private static int getLowerDomain​(int nn, int k, int n)
      Return the lowest domain value for the given hypergeometric distribution parameters.
      double getMean()
      Gets the mean of this distribution.
      private double[] getMidPoint()
      Return the mid-point x of the distribution, and the cdf(x).
      int getNumberOfSuccesses()
      Gets the number of successes parameter of this distribution.
      int getPopulationSize()
      Gets the population size parameter of this distribution.
      int getSampleSize()
      Gets the sample size parameter of this distribution.
      int getSupportLowerBound()
      Gets the lower bound of the support.
      int getSupportUpperBound()
      Gets the upper bound of the support.
      private static int getUpperDomain​(int k, int n)
      Return the highest domain value for the given hypergeometric distribution parameters.
      double getVariance()
      Gets the variance of this distribution.
      private double innerCumulativeProbability​(int x0, int x1)
      For this distribution, X, this method returns P(x0 <= X <= x1).
      int inverseCumulativeProbability​(double p)
      Computes the quantile function of this distribution.
      private int inverseLower​(double p, double q, boolean complement)
      Compute the inverse cumulative or survival probability using the lower sum.
      int inverseSurvivalProbability​(double p)
      Computes the inverse survival probability function of this distribution.
      private int inverseUpper​(double p, double q, boolean complement)
      Compute the inverse cumulative or survival probability using the upper sum.
      double logProbability​(int x)
      For a random variable X whose values are distributed according to this distribution, this method returns log(P(X = x)), where log is the natural logarithm.
      static HypergeometricDistribution of​(int populationSize, int numberOfSuccesses, int sampleSize)
      Creates a hypergeometric distribution.
      double probability​(int x)
      For a random variable X whose values are distributed according to this distribution, this method returns P(X = x).
      double probability​(int x0, int x1)
      For a random variable X whose values are distributed according to this distribution, this method returns P(x0 < X <= x1).
      double survivalProbability​(int x)
      For a random variable X whose values are distributed according to this distribution, this method returns P(X > x).
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • numberOfSuccesses

        private final int numberOfSuccesses
        The number of successes in the population.
      • populationSize

        private final int populationSize
        The population size.
      • sampleSize

        private final int sampleSize
        The sample size.
      • lowerBound

        private final int lowerBound
        The lower bound of the support (inclusive).
      • upperBound

        private final int upperBound
        The upper bound of the support (inclusive).
      • bp

        private final double bp
        Binomial probability of success (sampleSize / populationSize).
      • bq

        private final double bq
        Binomial probability of failure ((populationSize - sampleSize) / populationSize).
      • midpoint

        private double[] midpoint
        Cached midpoint of the CDF/SF. The array holds [x, cdf(x)] for the midpoint x. Used for the cumulative probability functions.
    • Constructor Detail

      • HypergeometricDistribution

        private HypergeometricDistribution​(int populationSize,
                                           int numberOfSuccesses,
                                           int sampleSize)
        Parameters:
        populationSize - Population size.
        numberOfSuccesses - Number of successes in the population.
        sampleSize - Sample size.
    • Method Detail

      • of

        public static HypergeometricDistribution of​(int populationSize,
                                                    int numberOfSuccesses,
                                                    int sampleSize)
        Creates a hypergeometric distribution.
        Parameters:
        populationSize - Population size.
        numberOfSuccesses - Number of successes in the population.
        sampleSize - Sample size.
        Returns:
        the distribution
        Throws:
        java.lang.IllegalArgumentException - if numberOfSuccesses < 0, populationSize <= 0, numberOfSuccesses > populationSize, or sampleSize > populationSize.
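        The documented argument checks can be illustrated with a small standalone sketch. The class HypergeometricArgumentsSketch and its methods are hypothetical; only the four conditions listed under Throws are reproduced, and the exception messages are assumptions rather than the library's text.

```java
/** Sketch of the documented parameter checks; names and messages are hypothetical. */
final class HypergeometricArgumentsSketch {
    private HypergeometricArgumentsSketch() {}

    /** Throws IllegalArgumentException for the conditions listed in the documentation. */
    static void check(int populationSize, int numberOfSuccesses, int sampleSize) {
        if (populationSize <= 0) {
            throw new IllegalArgumentException("populationSize <= 0: " + populationSize);
        }
        if (numberOfSuccesses < 0) {
            throw new IllegalArgumentException("numberOfSuccesses < 0: " + numberOfSuccesses);
        }
        if (numberOfSuccesses > populationSize) {
            throw new IllegalArgumentException("numberOfSuccesses > populationSize");
        }
        if (sampleSize > populationSize) {
            throw new IllegalArgumentException("sampleSize > populationSize");
        }
    }

    /** Convenience wrapper that reports validity instead of throwing. */
    static boolean isValid(int populationSize, int numberOfSuccesses, int sampleSize) {
        try {
            check(populationSize, numberOfSuccesses, sampleSize);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(10, 4, 3));  // accepted
        System.out.println(isValid(10, 11, 3)); // rejected: more successes than population
    }
}
```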
      • getLowerDomain

        private static int getLowerDomain​(int nn,
                                          int k,
                                          int n)
        Return the lowest domain value for the given hypergeometric distribution parameters.
        Parameters:
        nn - Population size.
        k - Number of successes in the population.
        n - Sample size.
        Returns:
        the lowest domain value of the hypergeometric distribution.
      • getUpperDomain

        private static int getUpperDomain​(int k,
                                          int n)
        Return the highest domain value for the given hypergeometric distribution parameters.
        Parameters:
        k - Number of successes in the population.
        n - Sample size.
        Returns:
        the highest domain value of the hypergeometric distribution.
      • getPopulationSize

        public int getPopulationSize()
        Gets the population size parameter of this distribution.
        Returns:
        the population size.
      • getNumberOfSuccesses

        public int getNumberOfSuccesses()
        Gets the number of successes parameter of this distribution.
        Returns:
        the number of successes.
      • getSampleSize

        public int getSampleSize()
        Gets the sample size parameter of this distribution.
        Returns:
        the sample size.
      • probability

        public double probability​(int x)
        For a random variable X whose values are distributed according to this distribution, this method returns P(X = x). In other words, this method represents the probability mass function (PMF) for the distribution.
        Parameters:
        x - Point at which the PMF is evaluated.
        Returns:
        the value of the probability mass function at x.
      • probability

        public double probability​(int x0,
                                  int x1)
        For a random variable X whose values are distributed according to this distribution, this method returns P(x0 < X <= x1). The default implementation uses the identity P(x0 < X <= x1) = P(X <= x1) - P(X <= x0).

        Special cases:

        • returns 0.0 if x0 == x1;
        • returns probability(x1) if x0 + 1 == x1.
        Specified by:
        probability in interface DiscreteDistribution
        Overrides:
        probability in class AbstractDiscreteDistribution
        Parameters:
        x0 - Lower bound (exclusive).
        x1 - Upper bound (inclusive).
        Returns:
        the probability that a random variable with this distribution takes a value between x0 and x1, excluding the lower and including the upper endpoint.
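        The identity and the two special cases can be sketched in standalone form as follows. The class IntervalProbabilitySketch is hypothetical and uses a naive PMF and CDF, not this class's implementation; rejecting x0 > x1 with an exception is an assumption of the sketch.

```java
/** Sketch of P(x0 < X <= x1) via the CDF identity; not the library's implementation. */
final class IntervalProbabilitySketch {
    private IntervalProbabilitySketch() {}

    static double binomial(int a, int b) {
        if (b < 0 || b > a) {
            return 0;
        }
        final int m = Math.min(b, a - b);
        double result = 1;
        for (int i = 1; i <= m; i++) {
            result = result * (a - m + i) / i;
        }
        return result;
    }

    static double pmf(int nn, int kk, int n, int k) {
        return binomial(kk, k) * binomial(nn - kk, n - k) / binomial(nn, n);
    }

    /** P(X <= x) by summing the PMF from the lower bound of the support. */
    static double cdf(int nn, int kk, int n, int x) {
        final int lower = Math.max(0, n - (nn - kk));
        final int upper = Math.min(n, kk);
        double sum = 0;
        for (int k = lower; k <= Math.min(x, upper); k++) {
            sum += pmf(nn, kk, n, k);
        }
        return sum;
    }

    /** P(x0 < X <= x1) with the documented special cases. */
    static double probability(int nn, int kk, int n, int x0, int x1) {
        if (x0 > x1) {
            // Assumption of this sketch: an inverted interval is an error.
            throw new IllegalArgumentException("x0 > x1");
        }
        if (x0 == x1) {
            return 0.0;
        }
        if (x0 + 1 == x1) {
            return pmf(nn, kk, n, x1);
        }
        return cdf(nn, kk, n, x1) - cdf(nn, kk, n, x0);
    }

    public static void main(String[] args) {
        // N = 10, K = 4, n = 3: P(0 < X <= 2) = 60/120 + 36/120
        System.out.println(probability(10, 4, 3, 0, 2));
    }
}
```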
      • logProbability

        public double logProbability​(int x)
        For a random variable X whose values are distributed according to this distribution, this method returns log(P(X = x)), where log is the natural logarithm.
        Parameters:
        x - Point at which the PMF is evaluated.
        Returns:
        the logarithm of the value of the probability mass function at x.
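        Working in log space matters when P(X = x) is too small to represent as a double. The following standalone sketch illustrates the idea under stated assumptions: the class LogPmfSketch is hypothetical, and it sums logarithms term by term, which is simple but is not claimed to be how this class computes the value.

```java
/** Sketch of log(P(X = x)) computed in log space; not the library's implementation. */
final class LogPmfSketch {
    private LogPmfSketch() {}

    /** log of the binomial coefficient C(a, b) for 0 <= b <= a. */
    static double logBinomial(int a, int b) {
        final int m = Math.min(b, a - b);
        double sum = 0;
        for (int i = 1; i <= m; i++) {
            // log of the factor (a - m + i) / i used in the multiplicative formula.
            sum += Math.log(a - m + i) - Math.log(i);
        }
        return sum;
    }

    /** log f(k; N, K, n) with nn = N and kk = K; -infinity outside the support. */
    static double logPmf(int nn, int kk, int n, int k) {
        final int lower = Math.max(0, n - (nn - kk));
        final int upper = Math.min(n, kk);
        if (k < lower || k > upper) {
            return Double.NEGATIVE_INFINITY;
        }
        return logBinomial(kk, k) + logBinomial(nn - kk, n - k) - logBinomial(nn, n);
    }

    public static void main(String[] args) {
        // Small case: exp(logPmf) recovers the PMF value of 0.5 up to rounding.
        System.out.println(Math.exp(logPmf(10, 4, 3, 1)));
        // Extreme case: the PMF underflows to 0 but its logarithm is finite.
        System.out.println(logPmf(10000, 5000, 5000, 0));
    }
}
```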
      • computeLogProbability

        private double computeLogProbability​(int x)
        Compute the log probability.
        Parameters:
        x - Value.
        Returns:
        log(P(X = x))
      • cumulativeProbability

        public double cumulativeProbability​(int x)
        For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x). In other words, this method represents the (cumulative) distribution function (CDF) for this distribution.
        Parameters:
        x - Point at which the CDF is evaluated.
        Returns:
        the probability that a random variable with this distribution takes a value less than or equal to x.
      • survivalProbability

        public double survivalProbability​(int x)
        For a random variable X whose values are distributed according to this distribution, this method returns P(X > x). In other words, this method represents the complementary cumulative distribution function.

        By default, this is defined as 1 - cumulativeProbability(x), but the specific implementation may be more accurate.

        Parameters:
        x - Point at which the survival function is evaluated.
        Returns:
        the probability that a random variable with this distribution takes a value greater than x.
      • innerCumulativeProbability

        private double innerCumulativeProbability​(int x0,
                                                  int x1)
        For this distribution, X, this method returns P(x0 <= X <= x1). This probability is computed by summing the point probabilities for the values x0, x0 + dx, x0 + 2 * dx, ..., x1; the direction dx is determined by comparing the input bounds. Call this method with x0 as the domain limit and x1 as the internal value, so that the initial terms of the sum are of increasingly larger magnitude.
        Parameters:
        x0 - Inclusive domain bound.
        x1 - Inclusive internal bound.
        Returns:
        P(x0 <= X <= x1).
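        The directional sum can be sketched as below. This is a standalone illustration with a naive PMF; the class DirectionalSumSketch is hypothetical and the code is not this class's implementation. Starting at the domain limit means the smallest tail terms are added first, which tends to lose less precision than adding them to an already large partial sum.

```java
/** Sketch of summing point probabilities from a domain limit inward. */
final class DirectionalSumSketch {
    private DirectionalSumSketch() {}

    static double binomial(int a, int b) {
        if (b < 0 || b > a) {
            return 0;
        }
        final int m = Math.min(b, a - b);
        double result = 1;
        for (int i = 1; i <= m; i++) {
            result = result * (a - m + i) / i;
        }
        return result;
    }

    static double pmf(int nn, int kk, int n, int k) {
        return binomial(kk, k) * binomial(nn - kk, n - k) / binomial(nn, n);
    }

    /** Sum of P(X = x) for x running from x0 to x1 inclusive, in either direction. */
    static double innerSum(int nn, int kk, int n, int x0, int x1) {
        // Step upward when x0 is the lower limit, downward when it is the upper limit.
        final int dx = x0 < x1 ? 1 : -1;
        int x = x0;
        double sum = pmf(nn, kk, n, x);
        while (x != x1) {
            x += dx;
            sum += pmf(nn, kk, n, x);
        }
        return sum;
    }

    public static void main(String[] args) {
        // N = 10, K = 4, n = 3; the support is [0, 3].
        System.out.println(innerSum(10, 4, 3, 0, 2)); // lower sum: P(X <= 2)
        System.out.println(innerSum(10, 4, 3, 3, 2)); // upper sum: P(X >= 2) = P(X > 1)
    }
}
```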
      • inverseSurvivalProbability

        public int inverseSurvivalProbability​(double p)
        Description copied from class: AbstractDiscreteDistribution
        Computes the inverse survival probability function of this distribution. For a random variable X distributed according to this distribution, the returned value is:

        \[ x = \begin{cases} \inf \{ x \in \mathbb Z : P(X \gt x) \le p\} & \text{for } 0 \le p \lt 1 \\ \inf \{ x \in \mathbb Z : P(X \gt x) \lt 1 \} & \text{for } p = 1 \end{cases} \]

        If the result exceeds the range of the data type int, then Integer.MIN_VALUE or Integer.MAX_VALUE is returned. In this case the result of survivalProbability(x) called using the returned (1-p)-quantile may not compute the original p.

        By default, this is defined as inverseCumulativeProbability(1 - p), but the specific implementation may be more accurate.

        The default implementation returns:

        Specified by:
        inverseSurvivalProbability in interface DiscreteDistribution
        Overrides:
        inverseSurvivalProbability in class AbstractDiscreteDistribution
        Parameters:
        p - Survival probability.
        Returns:
        the smallest (1-p)-quantile of this distribution (largest 0-quantile for p = 1).
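        The definition above can be illustrated with a linear search over the support. This is a minimal standalone sketch, assuming small parameters; the class InverseSurvivalSketch is hypothetical, the search recomputes the tail sum at every step for clarity, and it is not the algorithm used by this class.

```java
/** Sketch of inf { x : P(X > x) <= p } by linear search; not the library's algorithm. */
final class InverseSurvivalSketch {
    private InverseSurvivalSketch() {}

    static double binomial(int a, int b) {
        if (b < 0 || b > a) {
            return 0;
        }
        final int m = Math.min(b, a - b);
        double result = 1;
        for (int i = 1; i <= m; i++) {
            result = result * (a - m + i) / i;
        }
        return result;
    }

    static double pmf(int nn, int kk, int n, int k) {
        return binomial(kk, k) * binomial(nn - kk, n - k) / binomial(nn, n);
    }

    /** P(X > x), summed from the upper bound of the support downward. */
    static double sf(int nn, int kk, int n, int x) {
        final int lower = Math.max(0, n - (nn - kk));
        final int upper = Math.min(n, kk);
        double sum = 0;
        for (int k = upper; k > Math.max(x, lower - 1); k--) {
            sum += pmf(nn, kk, n, k);
        }
        return sum;
    }

    static int inverseSurvival(int nn, int kk, int n, double p) {
        if (!(p >= 0 && p <= 1)) {
            throw new IllegalArgumentException("p out of [0, 1]: " + p);
        }
        final int lower = Math.max(0, n - (nn - kk));
        final int upper = Math.min(n, kk);
        if (p == 1) {
            // inf { x : P(X > x) < 1 } is the lower bound of the support.
            return lower;
        }
        for (int x = lower; x < upper; x++) {
            if (sf(nn, kk, n, x) <= p) {
                return x;
            }
        }
        // P(X > upper) = 0 <= p always holds.
        return upper;
    }

    public static void main(String[] args) {
        // N = 10, K = 4, n = 3: P(X > 0) = 100/120, P(X > 1) = 40/120.
        System.out.println(inverseSurvival(10, 4, 3, 0.5)); // prints 1
    }
}
```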
      • computeInverseProbability

        private int computeInverseProbability​(double p,
                                              double q,
                                              boolean complement)
        Implementation for the inverse cumulative or survival probability.
        Parameters:
        p - Cumulative probability.
        q - Survival probability.
        complement - Set to true to compute the inverse survival probability.
        Returns:
        the value
      • inverseLower

        private int inverseLower​(double p,
                                 double q,
                                 boolean complement)
        Compute the inverse cumulative or survival probability using the lower sum.
        Parameters:
        p - Cumulative probability.
        q - Survival probability.
        complement - Set to true to compute the inverse survival probability.
        Returns:
        the value
      • inverseUpper

        private int inverseUpper​(double p,
                                 double q,
                                 boolean complement)
        Compute the inverse cumulative or survival probability using the upper sum.
        Parameters:
        p - Cumulative probability.
        q - Survival probability.
        complement - Set to true to compute the inverse survival probability.
        Returns:
        the value
      • getMean

        public double getMean()
        Gets the mean of this distribution.

        For population size \( N \), number of successes \( K \), and sample size \( n \), the mean is:

        \[ n \frac{K}{N} \]

        Returns:
        the mean.
      • getVariance

        public double getVariance()
        Gets the variance of this distribution.

        For population size \( N \), number of successes \( K \), and sample size \( n \), the variance is:

        \[ n \frac{K}{N} \frac{N-K}{N} \frac{N-n}{N-1} \]

        Returns:
        the variance.
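        The mean and variance formulas can be evaluated directly from the three parameters. The following standalone sketch uses a hypothetical class MomentsSketch; returning 0 for the variance when N = 1 is an assumption made here to avoid a division by zero and is not taken from this class.

```java
/** Sketch of the closed-form mean and variance; names are hypothetical. */
final class MomentsSketch {
    private MomentsSketch() {}

    /** Mean n * K / N, with nn = N and kk = K. */
    static double mean(int nn, int kk, int n) {
        return n * ((double) kk / nn);
    }

    /** Variance n * (K / N) * ((N - K) / N) * ((N - n) / (N - 1)). */
    static double variance(int nn, int kk, int n) {
        if (nn == 1) {
            // Assumption of this sketch: a population of one has no spread.
            return 0;
        }
        final double p = (double) kk / nn;
        final double q = (double) (nn - kk) / nn;
        return n * p * q * (nn - n) / (nn - 1.0);
    }

    public static void main(String[] args) {
        // N = 10, K = 4, n = 3: mean 1.2, variance 0.56 (up to rounding).
        System.out.println(mean(10, 4, 3));
        System.out.println(variance(10, 4, 3));
    }
}
```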
      • getSupportLowerBound

        public int getSupportLowerBound()
        Gets the lower bound of the support. This method must return the same value as inverseCumulativeProbability(0), i.e. \( \inf \{ x \in \mathbb Z : P(X \le x) \gt 0 \} \). By convention, Integer.MIN_VALUE should be substituted for negative infinity.

        For population size \( N \), number of successes \( K \), and sample size \( n \), the lower bound of the support is \( \max \{ 0, n + K - N \} \).

        Returns:
        lower bound of the support
      • getSupportUpperBound

        public int getSupportUpperBound()
        Gets the upper bound of the support. This method must return the same value as inverseCumulativeProbability(1), i.e. \( \inf \{ x \in \mathbb Z : P(X \le x) = 1 \} \). By convention, Integer.MAX_VALUE should be substituted for positive infinity.

        For number of successes \( K \), and sample size \( n \), the upper bound of the support is \( \min \{ n, K \} \).

        Returns:
        upper bound of the support
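        Both support bounds follow from the parameters alone. In the standalone sketch below (the class SupportBoundsSketch is hypothetical), the lower bound is written as n - (N - K) rather than n + K - N; the two are algebraically equal, and the rearrangement is a choice of this sketch to avoid int overflow when n + K exceeds Integer.MAX_VALUE.

```java
/** Sketch of the support bounds max(0, n + K - N) and min(n, K). */
final class SupportBoundsSketch {
    private SupportBoundsSketch() {}

    /** Lower bound, with nn = N and kk = K; n - (N - K) cannot overflow for valid inputs. */
    static int lower(int nn, int kk, int n) {
        return Math.max(0, n - (nn - kk));
    }

    /** Upper bound min(n, K). */
    static int upper(int kk, int n) {
        return Math.min(n, kk);
    }

    public static void main(String[] args) {
        // N = 10, K = 8, n = 5: at least 3 of the 5 draws must be successes.
        System.out.println(lower(10, 8, 5)); // prints 3
        System.out.println(upper(8, 5));     // prints 5
    }
}
```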
      • getMidPoint

        private double[] getMidPoint()
        Return the mid-point x of the distribution, and the cdf(x).

        This is not the true median. It is the value where the CDF(x) is closest to 0.5; as such the CDF may be below 0.5 if the next value of x is further from 0.5.

        Returns:
        the mid-point ([x, cdf(x)])
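        The mid-point search described above can be sketched as a scan that tracks the x whose CDF is nearest to 0.5. This is a standalone illustration with a naive PMF; the class MidPointSketch is hypothetical and the code is not this class's implementation.

```java
/** Sketch of finding the x whose CDF is closest to 0.5; returns [x, cdf(x)]. */
final class MidPointSketch {
    private MidPointSketch() {}

    static double binomial(int a, int b) {
        if (b < 0 || b > a) {
            return 0;
        }
        final int m = Math.min(b, a - b);
        double result = 1;
        for (int i = 1; i <= m; i++) {
            result = result * (a - m + i) / i;
        }
        return result;
    }

    static double pmf(int nn, int kk, int n, int k) {
        return binomial(kk, k) * binomial(nn - kk, n - k) / binomial(nn, n);
    }

    static double[] midPoint(int nn, int kk, int n) {
        final int lower = Math.max(0, n - (nn - kk));
        final int upper = Math.min(n, kk);
        int bestX = lower;
        double cdf = pmf(nn, kk, n, lower);
        double bestCdf = cdf;
        for (int x = lower + 1; x <= upper; x++) {
            cdf += pmf(nn, kk, n, x);
            if (Math.abs(cdf - 0.5) < Math.abs(bestCdf - 0.5)) {
                bestX = x;
                bestCdf = cdf;
            }
            if (cdf >= 0.5) {
                // The CDF only increases, so later values are further from 0.5.
                break;
            }
        }
        return new double[] {bestX, bestCdf};
    }

    public static void main(String[] args) {
        // N = 10, K = 3, n = 3: cdf(0) = 35/120 is closer to 0.5 than cdf(1) = 98/120,
        // so the mid-point is x = 0 even though its CDF is below 0.5.
        final double[] mid = midPoint(10, 3, 3);
        System.out.println((int) mid[0] + " " + mid[1]);
    }
}
```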