Class UnconditionedExactTest
- java.lang.Object
-
- org.apache.commons.statistics.inference.UnconditionedExactTest
-
public final class UnconditionedExactTest extends java.lang.Object
Implements an unconditioned exact test for a contingency table.Performs an exact test for the statistical significance of the association (contingency) between two kinds of categorical classification. A 2x2 contingency table is:
\[ \left[ {\begin{array}{cc} a & b \\ c & d \\ \end{array} } \right] \]
This test applies to the case of a 2x2 contingency table with one margin fixed. Note that if both margins are fixed (the row sums and column sums are not random) then Fisher's exact test can be applied.
This implementation fixes the column sums \( m = a + c \) and \( n = b + d \). All possible tables can be created using \( 0 \le a \le m \) and \( 0 \le b \le n \). The random values \( a \) and \( b \) follow a binomial distribution with probabilities \( p_0 \) and \( p_1 \) such that \( a \sim B(m, p_0) \) and \( b \sim B(n, p_1) \). The p-value of the 2x2 table is the product of two binomials:
\[ \begin{aligned} p &= Pr(a; m, p_0) \times Pr(b; n, p_1) \\ &= \binom{m}{a} p_0^a (1-p_0)^{m-a} \times \binom{n}{b} p_1^b (1-p_1)^{n-b} \end{aligned} \]
For the binomial model, the null hypothesis is the two nuisance parameters are equal \( p_0 = p_1 = \pi\), with \( \pi \) the probability for equal proportions, and the probability of any single table is:
\[ p = \binom{m}{a} \binom{n}{b} \pi^{a+b} (1-\pi)^{m+n-a-b} \]
The p-value of the observed table is calculated by maximising the sum of the as or more extreme tables over the domain of the nuisance parameter \( 0 \lt \pi \lt 1 \):
\[ p(a, b) = \sum_{i,j} \binom{m}{i} \binom{n}{j} \pi^{i+j} (1-\pi)^{m+n-i-j} \]
where table \( (i,j) \) is as or more extreme than the observed table \( (a, b) \). The test can be configured to select more extreme tables using various methods.
Note that the sum of the joint binomial distribution is a univariate function for the nuisance parameter \( \pi \). This function may have many local maxima and the search enumerates the range with a configured number of points. The best candidates are optionally used as the start point for an optimized search for a local maxima.
References:
- Barnard, G.A. (1947). Significance tests for 2x2 tables. Biometrika, 34, Issue 1-2, 123–138.
- Boschloo, R.D. (1970). Raised conditional level of significance for the 2 × 2-table when testing the equality of two probabilities. Statistica neerlandica, 24(1), 1–9.
- Suisaa, A and Shuster, J.J. (1985). Exact Unconditional Sample Sizes for the 2 × 2 Binomial Trial. Journal of the Royal Statistical Society. Series A (General), 148(4), 317-327.
- Since:
- 1.1
- See Also:
FisherExactTest
, Boschloo's test (Wikipedia), Barnard's test (Wikipedia)
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description private static interface
UnconditionedExactTest.BoschlooStatistic
Compute the statistic for Boschloo's test.(package private) static class
UnconditionedExactTest.Candidates
A container of (key,value) pairs to store candidate minima.static class
UnconditionedExactTest.Method
Define the method to determine the more extreme tables.static class
UnconditionedExactTest.Result
Result for the unconditioned exact test.private static class
UnconditionedExactTest.XYList
An expandable list of (x,y) values.
-
Field Summary
Fields Modifier and Type Field Description private AlternativeHypothesis
alternative
Alternative hypothesis.private static java.lang.String
COLUMN_SUM
Error message text for zero column sums.private static UnconditionedExactTest
DEFAULT
Default instance.private static double
INC_FRACTION
Fraction of the increment (interval between enumerated points) to initialise the bracket for the minima.private static double
LOWER_BOUND
Lower bound for the enumerated interval.private static int
MAX_CANDIDATES
Maximum number of candidate to optimize.private static int
MAX_TABLES
The maximum number of tables.private UnconditionedExactTest.Method
method
Method to identify more extreme tables.private static double
MINIMA_EPS
Relative distance of candidate minima from the lowest candidate.private boolean
optimize
Option to optimize the best initial point(s).private int
points
Number of initial points.private static double
SOLVER_RELATIVE_EPS
Relative epsilon for the Brent solver.
-
Constructor Summary
Constructors Modifier Constructor Description private
UnconditionedExactTest(AlternativeHypothesis alternative, UnconditionedExactTest.Method method, int points, boolean optimize)
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description private void
addCandidate(UnconditionedExactTest.Candidates minima, double v1, double v2, double v3, double x2)
Add point 2 to the list of minima if neither neighbour value is lower.private static void
checkTable(int[][] table)
Check the input is a 2-by-2 contingency table.private double[]
computePValue(UnconditionedExactTest.XYList tableList)
Compute the nuisance parameter and p-value for the binomial model given the list of possible tables.private static java.util.function.DoubleUnaryOperator
createBinomialModel(UnconditionedExactTest.XYList tableList)
Creates the binomial model p-value function for the nuisance parameter.private static java.util.function.IntToDoubleFunction
createLogBinomialCoefficients(int n)
Create the natural logarithm of the binomial coefficient for allk = [0, n]
.private double
findExtremeTables(int a, int b, UnconditionedExactTest.XYList tableList)
Find all tables that are as or more extreme than the observed table.private double
findExtremeTablesBoschloo(int a, int b, int m, int n, UnconditionedExactTest.XYList tableList)
Find all tables that are as or more extreme than the observed table using the Fisher's p-value as the statistic (also known as Boschloo's test).private double
findExtremeTablesZ(int a, int b, int m, int n, boolean pooled, UnconditionedExactTest.XYList tableList)
Find all tables that are as or more extreme than the observed table using the Z statistic.double
statistic(int[][] table)
Compute the statistic for the unconditioned exact test.private double
statisticBoschloo(int a, int b, int m, int n)
Compute the statistic using Fisher's p-value (also known as Boschloo's test).private static double
statisticBoschlooTwoSided(Hypergeom distribution, int k)
Compute the two-sided statistic using Fisher's p-value (also known as Boschloo's test).private static double
statisticZ(int a, int b, int m, int n, boolean pooled)
Compute the statistic from a Z-test.UnconditionedExactTest.Result
test(int[][] table)
Performs an unconditioned exact test on the 2-by-2 contingency table.UnconditionedExactTest
with(AlternativeHypothesis v)
Return an instance with the configured alternative hypothesis.UnconditionedExactTest
with(UnconditionedExactTest.Method v)
Return an instance with the configured method.static UnconditionedExactTest
withDefaults()
Return an instance using the default options.UnconditionedExactTest
withInitialPoints(int v)
Return an instance with the configured number of initial points.UnconditionedExactTest
withOptimize(boolean v)
Return an instance with the configured optimization of initial search points.
-
-
-
Field Detail
-
DEFAULT
private static final UnconditionedExactTest DEFAULT
Default instance.SciPy's boschloo_exact and barnard_exact tests use 32 points in the interval [0, 1) The R Exact package uses 100 in the interval [1e-5, 1-1e-5]. Barnards 1947 paper describes the nuisance parameter in the open interval
0 < pi < 1
. Here we respect the open-interval for the initial candidates and ignore 0 and 1. The initial bounds used are the same as the R Exact package. We closely match the inner 31 points from SciPy by using 33 points by default.
-
LOWER_BOUND
private static final double LOWER_BOUND
Lower bound for the enumerated interval. The upper bound is1 - lower
.- See Also:
- Constant Field Values
-
SOLVER_RELATIVE_EPS
private static final double SOLVER_RELATIVE_EPS
Relative epsilon for the Brent solver. This is limited for a univariate function to approximately sqrt(eps) with eps = 2^-52.- See Also:
- Constant Field Values
-
INC_FRACTION
private static final double INC_FRACTION
Fraction of the increment (interval between enumerated points) to initialise the bracket for the minima. Note the minima should lie between x +/- increment. The bracket should search within this range. Set to 1/8 and so the initial point of the bracket is approximately 1.61 * 1/8 = 0.2 of the increment away from initial points a or b.- See Also:
- Constant Field Values
-
MAX_CANDIDATES
private static final int MAX_CANDIDATES
Maximum number of candidate to optimize. This is a safety limit to avoid excess optimization. Only candidates within a relative tolerance of the best candidate are stored. If the number of candidates exceeds this value then many candidates have a very similar p-value and the top candidates will be optimized. Using a value of 3 allows at least one other candidate to be optimized when there is two-fold symmetry in the energy function.- See Also:
- Constant Field Values
-
MINIMA_EPS
private static final double MINIMA_EPS
Relative distance of candidate minima from the lowest candidate. Used to exclude poor candidates from optimization.- See Also:
- Constant Field Values
-
MAX_TABLES
private static final int MAX_TABLES
The maximum number of tables. This is limited by the maximum number of indices that can be maintained in memory. Potentially up to this number of tables must be tracked during computation of the p-value for as or more extreme tables. The limit is set using the same limit for maximum capacity as java.util.ArrayList. In practice any table anywhere near this limit can be computed using an alternative such as a chi-squared or g test.- See Also:
- Constant Field Values
-
COLUMN_SUM
private static final java.lang.String COLUMN_SUM
Error message text for zero column sums.- See Also:
- Constant Field Values
-
alternative
private final AlternativeHypothesis alternative
Alternative hypothesis.
-
method
private final UnconditionedExactTest.Method method
Method to identify more extreme tables.
-
points
private final int points
Number of initial points.
-
optimize
private final boolean optimize
Option to optimize the best initial point(s).
-
-
Constructor Detail
-
UnconditionedExactTest
private UnconditionedExactTest(AlternativeHypothesis alternative, UnconditionedExactTest.Method method, int points, boolean optimize)
- Parameters:
alternative
- Alternative hypothesis.method
- Method to identify more extreme tables.points
- Number of initial points.optimize
- Option to optimize the best initial point(s).
-
-
Method Detail
-
withDefaults
public static UnconditionedExactTest withDefaults()
Return an instance using the default options.- Returns:
- default instance
-
with
public UnconditionedExactTest with(AlternativeHypothesis v)
Return an instance with the configured alternative hypothesis.- Parameters:
v
- Value.- Returns:
- an instance
-
with
public UnconditionedExactTest with(UnconditionedExactTest.Method v)
Return an instance with the configured method.- Parameters:
v
- Value.- Returns:
- an instance
-
withInitialPoints
public UnconditionedExactTest withInitialPoints(int v)
Return an instance with the configured number of initial points.The search for the nuisance parameter will use \( v \) points in the open interval \( (0, 1) \). The interval is evaluated by including start and end points approximately equal to 0 and 1. Additional internal points are enumerated using increments of approximately \( \frac{1}{v-1} \). The minimum number of points is 2. Increasing the number of points increases the precision of the search at the cost of performance.
To approximately double the number of points so that all existing points are included and additional points half-way between them are sampled requires using
2p - 1
wherep
is the existing number of points.- Parameters:
v
- Value.- Returns:
- an instance
- Throws:
java.lang.IllegalArgumentException
- if the value is< 2
.
-
withOptimize
public UnconditionedExactTest withOptimize(boolean v)
Return an instance with the configured optimization of initial search points.If enabled then the initial point(s) with the highest probability is/are used as the start for an optimization to find a local maxima.
- Parameters:
v
- Value.- Returns:
- an instance
- See Also:
withInitialPoints(int)
-
statistic
public double statistic(int[][] table)
Compute the statistic for the unconditioned exact test. The statistic returned depends on the configured method.- Parameters:
table
- 2-by-2 contingency table.- Returns:
- test statistic
- Throws:
java.lang.IllegalArgumentException
- if thetable
is not a 2-by-2 table; any table entry is negative; any column sum is zero; the table sum is zero or not an integer; or the number of possible tables exceeds the maximum array capacity.- See Also:
with(Method)
,test(int[][])
-
test
public UnconditionedExactTest.Result test(int[][] table)
Performs an unconditioned exact test on the 2-by-2 contingency table. The statistic and p-value returned depends on the configured method and alternative hypothesis.The search for the nuisance parameter that maximises the p-value can be configured to: start with a number of initial points; and optimize the best points.
- Parameters:
table
- 2-by-2 contingency table.- Returns:
- test result
- Throws:
java.lang.IllegalArgumentException
- if thetable
is not a 2-by-2 table; any table entry is negative; any column sum is zero; the table sum is zero or not an integer; or the number of possible tables exceeds the maximum array capacity.- See Also:
with(Method)
,with(AlternativeHypothesis)
,statistic(int[][])
-
findExtremeTables
private double findExtremeTables(int a, int b, UnconditionedExactTest.XYList tableList)
Find all tables that are as or more extreme than the observed table.If the list of tables is full then all tables are more extreme. Some configurations can detect this without performing a search and in this case the list of tables is returned as empty.
- Parameters:
a
- Observed value for a.b
- Observed value for b.tableList
- List to track more extreme tables.- Returns:
- the test statistic
-
statisticZ
private static double statisticZ(int a, int b, int m, int n, boolean pooled)
Compute the statistic from a Z-test.- Parameters:
a
- Observed value for a.b
- Observed value for b.m
- Column sum m.n
- Column sum n.pooled
- true to use a pooled variance.- Returns:
- z
-
findExtremeTablesZ
private double findExtremeTablesZ(int a, int b, int m, int n, boolean pooled, UnconditionedExactTest.XYList tableList)
Find all tables that are as or more extreme than the observed table using the Z statistic.- Parameters:
a
- Observed value for a.b
- Observed value for b.m
- Column sum m.n
- Column sum n.pooled
- true to use a pooled variance.tableList
- List to track more extreme tables.- Returns:
- observed z
-
statisticBoschloo
private double statisticBoschloo(int a, int b, int m, int n)
Compute the statistic using Fisher's p-value (also known as Boschloo's test).- Parameters:
a
- Observed value for a.b
- Observed value for b.m
- Column sum m.n
- Column sum n.- Returns:
- p-value
-
statisticBoschlooTwoSided
private static double statisticBoschlooTwoSided(Hypergeom distribution, int k)
Compute the two-sided statistic using Fisher's p-value (also known as Boschloo's test).- Parameters:
distribution
- Hypergeometric distribution.k
- Observed value.- Returns:
- p-value
-
findExtremeTablesBoschloo
private double findExtremeTablesBoschloo(int a, int b, int m, int n, UnconditionedExactTest.XYList tableList)
Find all tables that are as or more extreme than the observed table using the Fisher's p-value as the statistic (also known as Boschloo's test).- Parameters:
a
- Observed value for a.b
- Observed value for b.m
- Column sum m.n
- Column sum n.tableList
- List to track more extreme tables.- Returns:
- observed p-value
-
computePValue
private double[] computePValue(UnconditionedExactTest.XYList tableList)
Compute the nuisance parameter and p-value for the binomial model given the list of possible tables.The current method enumerates an initial set of points and stores local extrema as candidates. Any candidate within 2% of the best is optionally optimized; this is limited to the top 3 candidates. These settings could be exposed as configurable options. Currently only the choice to optimize or not is exposed.
- Parameters:
tableList
- List of tables.- Returns:
- [nuisance parameter, p-value]
-
createBinomialModel
private static java.util.function.DoubleUnaryOperator createBinomialModel(UnconditionedExactTest.XYList tableList)
Creates the binomial model p-value function for the nuisance parameter. Note: This function computes the negative p-value so is suitable for optimization by a search for a minimum.- Parameters:
tableList
- List of tables.- Returns:
- the function
-
createLogBinomialCoefficients
private static java.util.function.IntToDoubleFunction createLogBinomialCoefficients(int n)
Create the natural logarithm of the binomial coefficient for allk = [0, n]
.- Parameters:
n
- Limit N.- Returns:
- ln binom(n, k)
-
addCandidate
private void addCandidate(UnconditionedExactTest.Candidates minima, double v1, double v2, double v3, double x2)
Add point 2 to the list of minima if neither neighbour value is lower.!(v1 < v2 || v3 < v2)
- Parameters:
minima
- Candidate minima.v1
- First point function value.v2
- Second point function value.v3
- Third point function value.x2
- Second point.
-
checkTable
private static void checkTable(int[][] table)
Check the input is a 2-by-2 contingency table.- Parameters:
table
- Contingency table.- Throws:
java.lang.IllegalArgumentException
- if thetable
is not a 2-by-2 table; any table entry is negative; any column sum is zero; the table sum is zero or not an integer; or the number of possible tables exceeds the maximum array capacity.
-
-