Class HypergeometricDistribution
java.lang.Object
  org.apache.commons.statistics.distribution.AbstractDiscreteDistribution
    org.apache.commons.statistics.distribution.HypergeometricDistribution
All Implemented Interfaces: DiscreteDistribution
public final class HypergeometricDistribution extends AbstractDiscreteDistribution
Implementation of the hypergeometric distribution.
The probability mass function of \( X \) is:
\[ f(k; N, K, n) = \frac{\binom{K}{k} \binom{N - K}{n-k}}{\binom{N}{n}} \]
for \( N \in \{0, 1, 2, \dots\} \) the population size, \( K \in \{0, 1, \dots, N\} \) the number of success states, \( n \in \{0, 1, \dots, N\} \) the number of samples, \( k \in \{\max(0, n+K-N), \dots, \min(n, K)\} \) the number of successes, and
\[ \binom{a}{b} = \frac{a!}{b! \, (a-b)!} \]
is the binomial coefficient.
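As an illustration of the formula above, the following minimal sketch evaluates the probability mass function directly from binomial coefficients. The class and method names (HypergeometricPmfSketch, binomial, pmf) are hypothetical and are not part of this library; the approach is only suitable for small parameters and is not a description of how this class computes the value.

```java
/** Illustrative only: evaluates the PMF formula directly for small parameters. */
final class HypergeometricPmfSketch {
    private HypergeometricPmfSketch() {}

    /** Binomial coefficient C(a, b) as a double; 0 when b is outside [0, a]. */
    static double binomial(int a, int b) {
        if (b < 0 || b > a) {
            return 0.0;
        }
        int m = Math.min(b, a - b);
        double r = 1.0;
        for (int i = 1; i <= m; i++) {
            // After step i, r holds C(a - m + i, i).
            r = r * (a - m + i) / i;
        }
        return r;
    }

    /** f(k; N, K, n) with N = population, K = successes, n = sample. */
    static double pmf(int k, int population, int successes, int sample) {
        // The zero-valued binomial coefficients make the result 0 outside the support.
        return binomial(successes, k) * binomial(population - successes, sample - k)
            / binomial(population, sample);
    }

    public static void main(String[] args) {
        // N = 10, K = 4, n = 3: the support is {0, 1, 2, 3}.
        for (int k = 0; k <= 3; k++) {
            System.out.println("P(X = " + k + ") = " + pmf(k, 10, 4, 3));
        }
    }
}
```

For \( N = 10, K = 4, n = 3 \) the formula gives \( f(1) = \binom{4}{1}\binom{6}{2} / \binom{10}{3} = 60 / 120 = 0.5 \).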
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.apache.commons.statistics.distribution.DiscreteDistribution
DiscreteDistribution.Sampler
-
-
Field Summary
Fields (modifier and type, field, description):
private double bp - Binomial probability of success (sampleSize / populationSize).
private double bq - Binomial probability of failure ((populationSize - sampleSize) / populationSize).
private static double HALF - 1/2.
private int lowerBound - The lower bound of the support (inclusive).
private double[] midpoint - Cached midpoint of the CDF/SF.
private int numberOfSuccesses - The number of successes in the population.
private int populationSize - The population size.
private int sampleSize - The sample size.
private int upperBound - The upper bound of the support (inclusive).
-
Constructor Summary
Constructors (modifier, constructor):
private HypergeometricDistribution(int populationSize, int numberOfSuccesses, int sampleSize)
-
Method Summary
Methods (modifier and type, method, description):
private int computeInverseProbability(double p, double q, boolean complement) - Implementation for the inverse cumulative or survival probability.
private double computeLogProbability(int x) - Compute the log probability.
double cumulativeProbability(int x) - For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x).
private static int getLowerDomain(int nn, int k, int n) - Return the lowest domain value for the given hypergeometric distribution parameters.
double getMean() - Gets the mean of this distribution.
private double[] getMidPoint() - Return the mid-point x of the distribution, and the cdf(x).
int getNumberOfSuccesses() - Gets the number of successes parameter of this distribution.
int getPopulationSize() - Gets the population size parameter of this distribution.
int getSampleSize() - Gets the sample size parameter of this distribution.
int getSupportLowerBound() - Gets the lower bound of the support.
int getSupportUpperBound() - Gets the upper bound of the support.
private static int getUpperDomain(int k, int n) - Return the highest domain value for the given hypergeometric distribution parameters.
double getVariance() - Gets the variance of this distribution.
private double innerCumulativeProbability(int x0, int x1) - For this distribution, X, this method returns P(x0 <= X <= x1).
int inverseCumulativeProbability(double p) - Computes the quantile function of this distribution.
private int inverseLower(double p, double q, boolean complement) - Compute the inverse cumulative or survival probability using the lower sum.
int inverseSurvivalProbability(double p) - Computes the inverse survival probability function of this distribution.
private int inverseUpper(double p, double q, boolean complement) - Compute the inverse cumulative or survival probability using the upper sum.
double logProbability(int x) - For a random variable X whose values are distributed according to this distribution, this method returns log(P(X = x)), where log is the natural logarithm.
static HypergeometricDistribution of(int populationSize, int numberOfSuccesses, int sampleSize) - Creates a hypergeometric distribution.
double probability(int x) - For a random variable X whose values are distributed according to this distribution, this method returns P(X = x).
double probability(int x0, int x1) - For a random variable X whose values are distributed according to this distribution, this method returns P(x0 < X <= x1).
double survivalProbability(int x) - For a random variable X whose values are distributed according to this distribution, this method returns P(X > x).
-
Methods inherited from class org.apache.commons.statistics.distribution.AbstractDiscreteDistribution
createSampler, getMedian
-
Field Detail
-
HALF
private static final double HALF
1/2.
- See Also:
- Constant Field Values
-
numberOfSuccesses
private final int numberOfSuccesses
The number of successes in the population.
-
populationSize
private final int populationSize
The population size.
-
sampleSize
private final int sampleSize
The sample size.
-
lowerBound
private final int lowerBound
The lower bound of the support (inclusive).
-
upperBound
private final int upperBound
The upper bound of the support (inclusive).
-
bp
private final double bp
Binomial probability of success (sampleSize / populationSize).
-
bq
private final double bq
Binomial probability of failure ((populationSize - sampleSize) / populationSize).
-
midpoint
private double[] midpoint
Cached midpoint of the CDF/SF. The array holds [x, cdf(x)] for the midpoint x. Used for the cumulative probability functions.
-
-
Method Detail
-
of
public static HypergeometricDistribution of(int populationSize, int numberOfSuccesses, int sampleSize)
Creates a hypergeometric distribution.
- Parameters:
populationSize - Population size.
numberOfSuccesses - Number of successes in the population.
sampleSize - Sample size.
- Returns:
- the distribution
- Throws:
java.lang.IllegalArgumentException - if numberOfSuccesses < 0, or populationSize <= 0, or numberOfSuccesses > populationSize, or sampleSize > populationSize.
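The parameter constraints listed under Throws can be sketched as a standalone check. This is an assumption-laden illustration: the class HypergeometricParameterCheck and its methods are hypothetical, the exception messages are invented, and only the four conditions named above are tested. It is not the factory method's actual code.

```java
/** Illustrative only: mirrors the four conditions listed for the factory method. */
final class HypergeometricParameterCheck {
    private HypergeometricParameterCheck() {}

    /** Throws IllegalArgumentException when any documented condition is violated. */
    static void check(int populationSize, int numberOfSuccesses, int sampleSize) {
        if (populationSize <= 0) {
            throw new IllegalArgumentException("populationSize must be > 0: " + populationSize);
        }
        if (numberOfSuccesses < 0) {
            throw new IllegalArgumentException("numberOfSuccesses must be >= 0: " + numberOfSuccesses);
        }
        if (numberOfSuccesses > populationSize) {
            throw new IllegalArgumentException("numberOfSuccesses exceeds populationSize");
        }
        if (sampleSize > populationSize) {
            throw new IllegalArgumentException("sampleSize exceeds populationSize");
        }
    }

    /** Convenience wrapper returning true when the parameters pass the check. */
    static boolean accepts(int populationSize, int numberOfSuccesses, int sampleSize) {
        try {
            check(populationSize, numberOfSuccesses, sampleSize);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

With the library on the classpath, the equivalent entry point is the documented call HypergeometricDistribution.of(populationSize, numberOfSuccesses, sampleSize).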
-
getLowerDomain
private static int getLowerDomain(int nn, int k, int n)
Return the lowest domain value for the given hypergeometric distribution parameters.
- Parameters:
nn - Population size.
k - Number of successes in the population.
n - Sample size.
- Returns:
- the lowest domain value of the hypergeometric distribution.
-
getUpperDomain
private static int getUpperDomain(int k, int n)
Return the highest domain value for the given hypergeometric distribution parameters.
- Parameters:
k - Number of successes in the population.
n - Sample size.
- Returns:
- the highest domain value of the hypergeometric distribution.
-
getPopulationSize
public int getPopulationSize()
Gets the population size parameter of this distribution.
- Returns:
- the population size.
-
getNumberOfSuccesses
public int getNumberOfSuccesses()
Gets the number of successes parameter of this distribution.
- Returns:
- the number of successes.
-
getSampleSize
public int getSampleSize()
Gets the sample size parameter of this distribution.
- Returns:
- the sample size.
-
probability
public double probability(int x)
For a random variable X whose values are distributed according to this distribution, this method returns P(X = x). In other words, this method represents the probability mass function (PMF) for the distribution.
- Parameters:
x - Point at which the PMF is evaluated.
- Returns:
- the value of the probability mass function at x.
-
probability
public double probability(int x0, int x1)
For a random variable X whose values are distributed according to this distribution, this method returns P(x0 < X <= x1). The default implementation uses the identity P(x0 < X <= x1) = P(X <= x1) - P(X <= x0).
Special cases:
- returns 0.0 if x0 == x1;
- returns probability(x1) if x0 + 1 == x1.
- Specified by:
probability in interface DiscreteDistribution
- Overrides:
probability in class AbstractDiscreteDistribution
- Parameters:
x0 - Lower bound (exclusive).
x1 - Upper bound (inclusive).
- Returns:
- the probability that a random variable with this distribution takes a value between x0 and x1, excluding the lower and including the upper endpoint.
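The identity and the two special cases described for probability(x0, x1) can be sketched as follows. The class RangeProbabilitySketch is hypothetical, uses a naive summed CDF, and rejects x0 > x1 as an assumption of the sketch; it illustrates the documented default behaviour rather than this class's override.

```java
/** Illustrative only: P(x0 < X <= x1) via the CDF identity with the two special cases. */
final class RangeProbabilitySketch {
    private final int population;
    private final int successes;
    private final int sample;

    RangeProbabilitySketch(int population, int successes, int sample) {
        this.population = population;
        this.successes = successes;
        this.sample = sample;
    }

    private static double binomial(int a, int b) {
        if (b < 0 || b > a) {
            return 0.0;
        }
        int m = Math.min(b, a - b);
        double r = 1.0;
        for (int i = 1; i <= m; i++) {
            r = r * (a - m + i) / i;
        }
        return r;
    }

    double pmf(int k) {
        return binomial(successes, k) * binomial(population - successes, sample - k)
            / binomial(population, sample);
    }

    /** Naive CDF: sums the PMF from the lower support bound up to x. */
    double cdf(int x) {
        int lo = Math.max(0, successes - (population - sample));
        int hi = Math.min(sample, successes);
        if (x < lo) {
            return 0.0;
        }
        if (x >= hi) {
            return 1.0;
        }
        double sum = 0.0;
        for (int k = lo; k <= x; k++) {
            sum += pmf(k);
        }
        return sum;
    }

    double probability(int x0, int x1) {
        if (x0 > x1) {
            // Assumption of this sketch: an inverted range is rejected.
            throw new IllegalArgumentException("x0 > x1: " + x0 + " > " + x1);
        }
        if (x0 == x1) {
            return 0.0;              // empty interval (x0, x0]
        }
        if (x0 + 1 == x1) {
            return pmf(x1);          // single point, avoids a subtraction
        }
        return cdf(x1) - cdf(x0);    // P(X <= x1) - P(X <= x0)
    }
}
```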
-
logProbability
public double logProbability(int x)
For a random variable X whose values are distributed according to this distribution, this method returns log(P(X = x)), where log is the natural logarithm.
- Parameters:
x - Point at which the PMF is evaluated.
- Returns:
- the logarithm of the value of the probability mass function at x.
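One reason to expose a log probability is that it stays finite where the probability itself underflows to zero. The sketch below shows this, assuming a simple log-factorial computed by summing logarithms; the class LogProbabilitySketch and its helpers are hypothetical and make no claim about how this class evaluates the result.

```java
/** Illustrative only: log PMF from log-factorials, usable where the PMF underflows. */
final class LogProbabilitySketch {
    private LogProbabilitySketch() {}

    /** log(m!) by direct summation; adequate for a demonstration. */
    static double logFactorial(int m) {
        double sum = 0.0;
        for (int i = 2; i <= m; i++) {
            sum += Math.log(i);
        }
        return sum;
    }

    /** log C(a, b) for 0 <= b <= a. */
    static double logBinomial(int a, int b) {
        return logFactorial(a) - logFactorial(b) - logFactorial(a - b);
    }

    /** log f(k; N, K, n); negative infinity outside the support. */
    static double logPmf(int k, int population, int successes, int sample) {
        int lo = Math.max(0, successes - (population - sample));
        int hi = Math.min(sample, successes);
        if (k < lo || k > hi) {
            return Double.NEGATIVE_INFINITY;
        }
        return logBinomial(successes, k)
            + logBinomial(population - successes, sample - k)
            - logBinomial(population, sample);
    }

    public static void main(String[] args) {
        System.out.println(logPmf(1, 10, 4, 3));
        // Here the probability is far below the smallest positive double,
        // yet its logarithm is an ordinary finite number.
        System.out.println(logPmf(0, 100000, 50000, 50000));
    }
}
```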
-
computeLogProbability
private double computeLogProbability(int x)
Compute the log probability.
- Parameters:
x - Value.
- Returns:
- log(P(X = x))
-
cumulativeProbability
public double cumulativeProbability(int x)
For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x). In other words, this method represents the (cumulative) distribution function (CDF) for this distribution.
- Parameters:
x - Point at which the CDF is evaluated.
- Returns:
- the probability that a random variable with this distribution takes a value less than or equal to x.
-
survivalProbability
public double survivalProbability(int x)
For a random variable X whose values are distributed according to this distribution, this method returns P(X > x). In other words, this method represents the complementary cumulative distribution function.
By default, this is defined as 1 - cumulativeProbability(x), but the specific implementation may be more accurate.
- Parameters:
x - Point at which the survival function is evaluated.
- Returns:
- the probability that a random variable with this distribution takes a value greater than x.
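To illustrate why a dedicated survival function can be more accurate than 1 - cumulativeProbability(x), the sketch below sums the upper tail directly. Far into the tail the complement form can lose a small probability to rounding, while the direct sum keeps it. The class SurvivalSketch is hypothetical and is only one possible way to compute the tail.

```java
/** Illustrative only: survival function as a direct sum over the upper tail. */
final class SurvivalSketch {
    private SurvivalSketch() {}

    static double binomial(int a, int b) {
        if (b < 0 || b > a) {
            return 0.0;
        }
        int m = Math.min(b, a - b);
        double r = 1.0;
        for (int i = 1; i <= m; i++) {
            r = r * (a - m + i) / i;
        }
        return r;
    }

    static double pmf(int k, int population, int successes, int sample) {
        return binomial(successes, k) * binomial(population - successes, sample - k)
            / binomial(population, sample);
    }

    /** P(X > x): adds the point probabilities from the upper bound down to x + 1. */
    static double survival(int x, int population, int successes, int sample) {
        int lo = Math.max(0, successes - (population - sample));
        int hi = Math.min(sample, successes);
        if (x < lo) {
            return 1.0;
        }
        double sum = 0.0;
        // Start from the smallest terms so they are not swamped by larger ones.
        for (int k = hi; k > x; k--) {
            sum += pmf(k, population, successes, sample);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(survival(1, 10, 4, 3));
        // A tail probability far below double precision relative to 1.
        System.out.println(survival(20, 1000, 50, 50));
    }
}
```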
-
innerCumulativeProbability
private double innerCumulativeProbability(int x0, int x1)
For this distribution, X, this method returns P(x0 <= X <= x1). This probability is computed by summing the point probabilities for the values x0, x0 + dx, x0 + 2 * dx, ..., x1; the direction dx is determined using a comparison of the input bounds. This should be called by using x0 as the domain limit and x1 as the internal value. This will result in an initial sum of increasingly larger magnitudes.
- Parameters:
x0 - Inclusive domain bound.
x1 - Inclusive internal bound.
- Returns:
- P(x0 <= X <= x1).
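The directional summation described above can be sketched as follows: the loop starts at the domain limit x0 and steps towards the internal value x1, so the smallest terms are typically added first. The class InnerCumulativeSketch and its static helpers are hypothetical, and the PMF is evaluated naively.

```java
/** Illustrative only: P(x0 <= X <= x1) summed from the domain limit inwards. */
final class InnerCumulativeSketch {
    private InnerCumulativeSketch() {}

    static double binomial(int a, int b) {
        if (b < 0 || b > a) {
            return 0.0;
        }
        int m = Math.min(b, a - b);
        double r = 1.0;
        for (int i = 1; i <= m; i++) {
            r = r * (a - m + i) / i;
        }
        return r;
    }

    static double pmf(int k, int population, int successes, int sample) {
        return binomial(successes, k) * binomial(population - successes, sample - k)
            / binomial(population, sample);
    }

    /**
     * Sums pmf(x0), pmf(x0 + dx), ..., pmf(x1), where dx is +1 when x0 < x1
     * and -1 otherwise. Works for either ordering of the two bounds.
     */
    static double innerCumulative(int x0, int x1, int population, int successes, int sample) {
        int dx = x0 < x1 ? 1 : -1;
        int x = x0;
        double sum = pmf(x, population, successes, sample);
        while (x != x1) {
            x += dx;
            sum += pmf(x, population, successes, sample);
        }
        return sum;
    }
}
```

Calling innerCumulative(0, 1, 10, 4, 3) sums upwards from the lower bound, while innerCumulative(3, 2, 10, 4, 3) sums downwards from the upper bound.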
-
inverseCumulativeProbability
public int inverseCumulativeProbability(double p)
Description copied from class: AbstractDiscreteDistribution
Computes the quantile function of this distribution. For a random variable X distributed according to this distribution, the returned value is:
\[ x = \begin{cases} \inf \{ x \in \mathbb Z : P(X \le x) \ge p\} & \text{for } 0 \lt p \le 1 \\ \inf \{ x \in \mathbb Z : P(X \le x) \gt 0 \} & \text{for } p = 0 \end{cases} \]
If the result exceeds the range of the data type int, then Integer.MIN_VALUE or Integer.MAX_VALUE is returned. In this case the result of cumulativeProbability(x) called using the returned p-quantile may not compute the original p.
The default implementation returns:
- DiscreteDistribution.getSupportLowerBound() for p = 0,
- DiscreteDistribution.getSupportUpperBound() for p = 1, or
- the result of a binary search between the lower and upper bound using cumulativeProbability(x). The bounds may be bracketed for efficiency.
- Specified by:
inverseCumulativeProbability in interface DiscreteDistribution
- Overrides:
inverseCumulativeProbability in class AbstractDiscreteDistribution
- Parameters:
p - Cumulative probability.
- Returns:
- the smallest p-quantile of this distribution (largest 0-quantile for p = 0).
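The quantile definition above can be made concrete with a short sketch. Note that it uses a linear scan over the support rather than the binary search mentioned for the default implementation; the class QuantileSketch, its helpers, and the rejection of p outside [0, 1] are assumptions of this example.

```java
/** Illustrative only: smallest x with P(X <= x) >= p, found by a linear scan. */
final class QuantileSketch {
    private QuantileSketch() {}

    static double binomial(int a, int b) {
        if (b < 0 || b > a) {
            return 0.0;
        }
        int m = Math.min(b, a - b);
        double r = 1.0;
        for (int i = 1; i <= m; i++) {
            r = r * (a - m + i) / i;
        }
        return r;
    }

    static double pmf(int k, int population, int successes, int sample) {
        return binomial(successes, k) * binomial(population - successes, sample - k)
            / binomial(population, sample);
    }

    static int quantile(double p, int population, int successes, int sample) {
        if (!(p >= 0 && p <= 1)) {
            // Assumption of this sketch: invalid probabilities (including NaN) are rejected.
            throw new IllegalArgumentException("p must be in [0, 1]: " + p);
        }
        int lo = Math.max(0, successes - (population - sample));
        int hi = Math.min(sample, successes);
        if (p == 0) {
            return lo; // first x with P(X <= x) > 0
        }
        double cdf = 0.0;
        for (int x = lo; x < hi; x++) {
            cdf += pmf(x, population, successes, sample);
            if (cdf >= p) {
                return x;
            }
        }
        // Returning the upper bound here also guards against a summed CDF
        // that falls just short of 1 due to rounding.
        return hi;
    }
}
```

For \( N = 10, K = 4, n = 3 \) the CDF values are approximately 0.167, 0.667, 0.967 and 1, so the median (p = 0.5) is 1.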
-
inverseSurvivalProbability
public int inverseSurvivalProbability(double p)
Description copied from class: AbstractDiscreteDistribution
Computes the inverse survival probability function of this distribution. For a random variable X distributed according to this distribution, the returned value is:
\[ x = \begin{cases} \inf \{ x \in \mathbb Z : P(X \gt x) \le p\} & \text{for } 0 \le p \lt 1 \\ \inf \{ x \in \mathbb Z : P(X \gt x) \lt 1 \} & \text{for } p = 1 \end{cases} \]
If the result exceeds the range of the data type int, then Integer.MIN_VALUE or Integer.MAX_VALUE is returned. In this case the result of survivalProbability(x) called using the returned (1-p)-quantile may not compute the original p.
By default, this is defined as inverseCumulativeProbability(1 - p), but the specific implementation may be more accurate.
The default implementation returns:
- DiscreteDistribution.getSupportLowerBound() for p = 1,
- DiscreteDistribution.getSupportUpperBound() for p = 0, or
- the result of a binary search between the lower and upper bound using survivalProbability(x). The bounds may be bracketed for efficiency.
- Specified by:
inverseSurvivalProbability in interface DiscreteDistribution
- Overrides:
inverseSurvivalProbability in class AbstractDiscreteDistribution
- Parameters:
p - Survival probability.
- Returns:
- the smallest (1-p)-quantile of this distribution (largest 0-quantile for p = 1).
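A matching sketch for the inverse survival function walks down from the upper bound, accumulating the tail P(X > x) until adding the next point would exceed p. As with the quantile, this is a linear scan chosen for clarity and not the binary search of the default implementation; the class InverseSurvivalSketch and its helpers are hypothetical.

```java
/** Illustrative only: smallest x with P(X > x) <= p, scanning from the upper bound. */
final class InverseSurvivalSketch {
    private InverseSurvivalSketch() {}

    static double binomial(int a, int b) {
        if (b < 0 || b > a) {
            return 0.0;
        }
        int m = Math.min(b, a - b);
        double r = 1.0;
        for (int i = 1; i <= m; i++) {
            r = r * (a - m + i) / i;
        }
        return r;
    }

    static double pmf(int k, int population, int successes, int sample) {
        return binomial(successes, k) * binomial(population - successes, sample - k)
            / binomial(population, sample);
    }

    static int inverseSurvival(double p, int population, int successes, int sample) {
        if (!(p >= 0 && p <= 1)) {
            // Assumption of this sketch: invalid probabilities (including NaN) are rejected.
            throw new IllegalArgumentException("p must be in [0, 1]: " + p);
        }
        int lo = Math.max(0, successes - (population - sample));
        int hi = Math.min(sample, successes);
        if (p == 1) {
            return lo; // first x with P(X > x) < 1
        }
        int x = hi;
        double tail = 0.0; // invariant: tail == P(X > x)
        while (x > lo) {
            double next = tail + pmf(x, population, successes, sample); // P(X > x - 1)
            if (next > p) {
                break;
            }
            tail = next;
            x--;
        }
        return x;
    }
}
```

For \( N = 10, K = 4, n = 3 \) the survival values are approximately 0.833, 0.333, 0.033 and 0, so p = 0.5 maps to x = 1.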
-
computeInverseProbability
private int computeInverseProbability(double p, double q, boolean complement)
Implementation for the inverse cumulative or survival probability.
- Parameters:
p - Cumulative probability.
q - Survival probability.
complement - Set to true to compute the inverse survival probability.
- Returns:
- the value
-
inverseLower
private int inverseLower(double p, double q, boolean complement)
Compute the inverse cumulative or survival probability using the lower sum.
- Parameters:
p - Cumulative probability.
q - Survival probability.
complement - Set to true to compute the inverse survival probability.
- Returns:
- the value
-
inverseUpper
private int inverseUpper(double p, double q, boolean complement)
Compute the inverse cumulative or survival probability using the upper sum.
- Parameters:
p - Cumulative probability.
q - Survival probability.
complement - Set to true to compute the inverse survival probability.
- Returns:
- the value
-
getMean
public double getMean()
Gets the mean of this distribution.
For population size \( N \), number of successes \( K \), and sample size \( n \), the mean is:
\[ n \frac{K}{N} \]
- Returns:
- the mean.
-
getVariance
public double getVariance()
Gets the variance of this distribution.
For population size \( N \), number of successes \( K \), and sample size \( n \), the variance is:
\[ n \frac{K}{N} \frac{N-K}{N} \frac{N-n}{N-1} \]
- Returns:
- the variance.
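The closed forms for the mean and variance can be evaluated directly. The sketch below is a plain transcription of the two formulas; the class MomentsSketch is hypothetical, and treating a population of size 1 as having zero variance is a choice made here to avoid the 0/0 in the last factor.

```java
/** Illustrative only: closed-form mean and variance of the hypergeometric distribution. */
final class MomentsSketch {
    private MomentsSketch() {}

    /** n * K / N. */
    static double mean(int population, int successes, int sample) {
        return sample * ((double) successes / population);
    }

    /** n * (K / N) * ((N - K) / N) * ((N - n) / (N - 1)). */
    static double variance(int population, int successes, int sample) {
        if (population == 1) {
            // The outcome is deterministic, so the variance is zero;
            // handled separately because N - 1 is zero in the formula.
            return 0.0;
        }
        double p = (double) successes / population;
        double q = (double) (population - successes) / population;
        double correction = (double) (population - sample) / (population - 1);
        return sample * p * q * correction;
    }

    public static void main(String[] args) {
        System.out.println(mean(10, 4, 3));
        System.out.println(variance(10, 4, 3));
    }
}
```

For \( N = 10, K = 4, n = 3 \) the mean is \( 3 \cdot 4 / 10 = 1.2 \) and the variance is \( 3 \cdot 0.4 \cdot 0.6 \cdot 7/9 = 0.56 \).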
-
getSupportLowerBound
public int getSupportLowerBound()
Gets the lower bound of the support. This method must return the same value as inverseCumulativeProbability(0), i.e. \( \inf \{ x \in \mathbb Z : P(X \le x) \gt 0 \} \). By convention, Integer.MIN_VALUE should be substituted for negative infinity.
For population size \( N \), number of successes \( K \), and sample size \( n \), the lower bound of the support is \( \max \{ 0, n + K - N \} \).
- Returns:
- lower bound of the support
-
getSupportUpperBound
public int getSupportUpperBound()
Gets the upper bound of the support. This method must return the same value as inverseCumulativeProbability(1), i.e. \( \inf \{ x \in \mathbb Z : P(X \le x) = 1 \} \). By convention, Integer.MAX_VALUE should be substituted for positive infinity.
For number of successes \( K \), and sample size \( n \), the upper bound of the support is \( \min \{ n, K \} \).
- Returns:
- upper bound of the support
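The two support bounds follow directly from the expressions \( \max \{ 0, n + K - N \} \) and \( \min \{ n, K \} \). The sketch below uses the parameter names nn, k and n from the private helpers documented earlier; the class SupportBoundsSketch itself is hypothetical, and the rearranged subtraction is a choice made here to avoid int overflow for valid parameters.

```java
/** Illustrative only: support bounds of the hypergeometric distribution. */
final class SupportBoundsSketch {
    private SupportBoundsSketch() {}

    /** max(0, n + k - nn), written as k - (nn - n) so the sum n + k cannot overflow. */
    static int lowerBound(int nn, int k, int n) {
        return Math.max(0, k - (nn - n));
    }

    /** min(n, k). */
    static int upperBound(int k, int n) {
        return Math.min(n, k);
    }

    public static void main(String[] args) {
        // Population 10 with 7 successes, sample of 5: only 3 failures exist,
        // so at least 2 of the 5 sampled items must be successes.
        System.out.println(lowerBound(10, 7, 5) + " .. " + upperBound(7, 5));
    }
}
```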
-
getMidPoint
private double[] getMidPoint()
Return the mid-point x of the distribution, and the cdf(x).
This is not the true median. It is the value where the CDF(x) is closest to 0.5; as such the CDF may be below 0.5 if the next value of x is further from 0.5.
- Returns:
- the mid-point ([x, cdf(x)])
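The mid-point described above, the x whose CDF value is closest to 0.5, can be located with a simple scan. This is a sketch under stated assumptions: the class MidPointSketch and its helpers are hypothetical, the PMF is evaluated naively, and nothing here reflects how the private method is actually written.

```java
/** Illustrative only: finds the x whose CDF value is closest to 0.5. */
final class MidPointSketch {
    private MidPointSketch() {}

    static double binomial(int a, int b) {
        if (b < 0 || b > a) {
            return 0.0;
        }
        int m = Math.min(b, a - b);
        double r = 1.0;
        for (int i = 1; i <= m; i++) {
            r = r * (a - m + i) / i;
        }
        return r;
    }

    static double pmf(int k, int population, int successes, int sample) {
        return binomial(successes, k) * binomial(population - successes, sample - k)
            / binomial(population, sample);
    }

    /** Returns {x, cdf(x)} for the x minimising |cdf(x) - 0.5|. */
    static double[] midPoint(int population, int successes, int sample) {
        int lo = Math.max(0, successes - (population - sample));
        int hi = Math.min(sample, successes);
        int bestX = lo;
        double bestCdf = 0.0;
        double bestDistance = Double.POSITIVE_INFINITY;
        double cdf = 0.0;
        for (int x = lo; x <= hi; x++) {
            cdf += pmf(x, population, successes, sample);
            double distance = Math.abs(cdf - 0.5);
            if (distance < bestDistance) {
                bestDistance = distance;
                bestX = x;
                bestCdf = cdf;
            } else {
                // The CDF is non-decreasing, so the distance cannot shrink again.
                break;
            }
        }
        return new double[] {bestX, bestCdf};
    }
}
```

For \( N = 10, K = 4, n = 3 \) the CDF values are approximately 0.167 and 0.667 at x = 0 and x = 1, so the mid-point is x = 1.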