Class UnconditionedExactTest
Performs an exact test for the statistical significance of the association (contingency) between two kinds of categorical classification. A 2x2 contingency table is:
\[ \left[ {\begin{array}{cc} a & b \\ c & d \\ \end{array} } \right] \]
This test applies to the case of a 2x2 contingency table with one margin fixed. Note that if both margins are fixed (the row sums and column sums are not random) then Fisher's exact test can be applied.
This implementation fixes the column sums \( m = a + c \) and \( n = b + d \). All possible tables can be created using \( 0 \le a \le m \) and \( 0 \le b \le n \). The random values \( a \) and \( b \) follow a binomial distribution with probabilities \( p_0 \) and \( p_1 \) such that \( a \sim B(m, p_0) \) and \( b \sim B(n, p_1) \). The p-value of the 2x2 table is the product of two binomials:
\[ \begin{aligned} p &= Pr(a; m, p_0) \times Pr(b; n, p_1) \\ &= \binom{m}{a} p_0^a (1-p_0)^{m-a} \times \binom{n}{b} p_1^b (1-p_1)^{n-b} \end{aligned} \]
For the binomial model, the null hypothesis is the two nuisance parameters are equal \( p_0 = p_1 = \pi\), with \( \pi \) the probability for equal proportions, and the probability of any single table is:
\[ p = \binom{m}{a} \binom{n}{b} \pi^{a+b} (1-\pi)^{m+n-a-b} \]
The p-value of the observed table is calculated by maximising the sum of the as or more extreme tables over the domain of the nuisance parameter \( 0 \lt \pi \lt 1 \):
\[ p(a, b) = \sum_{i,j} \binom{m}{i} \binom{n}{j} \pi^{i+j} (1-\pi)^{m+n-i-j} \]
where table \( (i,j) \) is as or more extreme than the observed table \( (a, b) \). The test can be configured to select more extreme tables using various methods.
Note that the sum of the joint binomial distribution is a univariate function for the nuisance parameter \( \pi \). This function may have many local maxima and the search enumerates the range with a configured number of points. The best candidates are optionally used as the start point for an optimized search for a local maxima.
References:
- Barnard, G.A. (1947). Significance tests for 2x2 tables. Biometrika, 34, Issue 1-2, 123–138.
- Boschloo, R.D. (1970). Raised conditional level of significance for the 2 × 2-table when testing the equality of two probabilities. Statistica neerlandica, 24(1), 1–9.
- Suisaa, A and Shuster, J.J. (1985). Exact Unconditional Sample Sizes for the 2 × 2 Binomial Trial. Journal of the Royal Statistical Society. Series A (General), 148(4), 317-327.
- Since:
- 1.1
- See Also:
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionprivate static interface
Compute the statistic for Boschloo's test.(package private) static class
A container of (key,value) pairs to store candidate minima.static enum
Define the method to determine the more extreme tables.static final class
Result for the unconditioned exact test.private static class
An expandable list of (x,y) values. -
Field Summary
FieldsModifier and TypeFieldDescriptionprivate final AlternativeHypothesis
Alternative hypothesis.private static final String
Error message text for zero column sums.private static final UnconditionedExactTest
Default instance.private static final double
Fraction of the increment (interval between enumerated points) to initialise the bracket for the minima.private static final double
Lower bound for the enumerated interval.private static final int
Maximum number of candidate to optimize.private static final int
The maximum number of tables.private final UnconditionedExactTest.Method
Method to identify more extreme tables.private static final double
Relative distance of candidate minima from the lowest candidate.private final boolean
Option to optimize the best initial point(s).private final int
Number of initial points.private static final double
Relative epsilon for the Brent solver. -
Constructor Summary
ConstructorsModifierConstructorDescriptionprivate
UnconditionedExactTest
(AlternativeHypothesis alternative, UnconditionedExactTest.Method method, int points, boolean optimize) -
Method Summary
Modifier and TypeMethodDescriptionprivate void
addCandidate
(UnconditionedExactTest.Candidates minima, double v1, double v2, double v3, double x2) Add point 2 to the list of minima if neither neighbour value is lower.private static void
checkTable
(int[][] table) Check the input is a 2-by-2 contingency table.private double[]
computePValue
(UnconditionedExactTest.XYList tableList) Compute the nuisance parameter and p-value for the binomial model given the list of possible tables.private static DoubleUnaryOperator
createBinomialModel
(UnconditionedExactTest.XYList tableList) Creates the binomial model p-value function for the nuisance parameter.private static IntToDoubleFunction
Create the natural logarithm of the binomial coefficient for allk = [0, n]
.private double
findExtremeTables
(int a, int b, UnconditionedExactTest.XYList tableList) Find all tables that are as or more extreme than the observed table.private double
findExtremeTablesBoschloo
(int a, int b, int m, int n, UnconditionedExactTest.XYList tableList) Find all tables that are as or more extreme than the observed table using the Fisher's p-value as the statistic (also known as Boschloo's test).private double
findExtremeTablesZ
(int a, int b, int m, int n, boolean pooled, UnconditionedExactTest.XYList tableList) Find all tables that are as or more extreme than the observed table using the Z statistic.double
statistic
(int[][] table) Compute the statistic for the unconditioned exact test.private double
statisticBoschloo
(int a, int b, int m, int n) Compute the statistic using Fisher's p-value (also known as Boschloo's test).private static double
statisticBoschlooTwoSided
(Hypergeom distribution, int k) Compute the two-sided statistic using Fisher's p-value (also known as Boschloo's test).private static double
statisticZ
(int a, int b, int m, int n, boolean pooled) Compute the statistic from a Z-test.test
(int[][] table) Performs an unconditioned exact test on the 2-by-2 contingency table.Return an instance with the configured alternative hypothesis.Return an instance with the configured method.static UnconditionedExactTest
Return an instance using the default options.withInitialPoints
(int v) Return an instance with the configured number of initial points.withOptimize
(boolean v) Return an instance with the configured optimization of initial search points.
-
Field Details
-
DEFAULT
Default instance.SciPy's boschloo_exact and barnard_exact tests use 32 points in the interval [0, 1) The R Exact package uses 100 in the interval [1e-5, 1-1e-5]. Barnards 1947 paper describes the nuisance parameter in the open interval
0 < pi < 1
. Here we respect the open-interval for the initial candidates and ignore 0 and 1. The initial bounds used are the same as the R Exact package. We closely match the inner 31 points from SciPy by using 33 points by default. -
LOWER_BOUND
private static final double LOWER_BOUNDLower bound for the enumerated interval. The upper bound is1 - lower
.- See Also:
-
SOLVER_RELATIVE_EPS
private static final double SOLVER_RELATIVE_EPSRelative epsilon for the Brent solver. This is limited for a univariate function to approximately sqrt(eps) with eps = 2^-52.- See Also:
-
INC_FRACTION
private static final double INC_FRACTIONFraction of the increment (interval between enumerated points) to initialise the bracket for the minima. Note the minima should lie between x +/- increment. The bracket should search within this range. Set to 1/8 and so the initial point of the bracket is approximately 1.61 * 1/8 = 0.2 of the increment away from initial points a or b.- See Also:
-
MAX_CANDIDATES
private static final int MAX_CANDIDATESMaximum number of candidate to optimize. This is a safety limit to avoid excess optimization. Only candidates within a relative tolerance of the best candidate are stored. If the number of candidates exceeds this value then many candidates have a very similar p-value and the top candidates will be optimized. Using a value of 3 allows at least one other candidate to be optimized when there is two-fold symmetry in the energy function.- See Also:
-
MINIMA_EPS
private static final double MINIMA_EPSRelative distance of candidate minima from the lowest candidate. Used to exclude poor candidates from optimization.- See Also:
-
MAX_TABLES
private static final int MAX_TABLESThe maximum number of tables. This is limited by the maximum number of indices that can be maintained in memory. Potentially up to this number of tables must be tracked during computation of the p-value for as or more extreme tables. The limit is set using the same limit for maximum capacity as java.util.ArrayList. In practice any table anywhere near this limit can be computed using an alternative such as a chi-squared or g test.- See Also:
-
COLUMN_SUM
Error message text for zero column sums.- See Also:
-
alternative
Alternative hypothesis. -
method
Method to identify more extreme tables. -
points
private final int pointsNumber of initial points. -
optimize
private final boolean optimizeOption to optimize the best initial point(s).
-
-
Constructor Details
-
UnconditionedExactTest
private UnconditionedExactTest(AlternativeHypothesis alternative, UnconditionedExactTest.Method method, int points, boolean optimize) - Parameters:
alternative
- Alternative hypothesis.method
- Method to identify more extreme tables.points
- Number of initial points.optimize
- Option to optimize the best initial point(s).
-
-
Method Details
-
withDefaults
Return an instance using the default options.- Returns:
- default instance
-
with
Return an instance with the configured alternative hypothesis.- Parameters:
v
- Value.- Returns:
- an instance
-
with
Return an instance with the configured method.- Parameters:
v
- Value.- Returns:
- an instance
-
withInitialPoints
Return an instance with the configured number of initial points.The search for the nuisance parameter will use \( v \) points in the open interval \( (0, 1) \). The interval is evaluated by including start and end points approximately equal to 0 and 1. Additional internal points are enumerated using increments of approximately \( \frac{1}{v-1} \). The minimum number of points is 2. Increasing the number of points increases the precision of the search at the cost of performance.
To approximately double the number of points so that all existing points are included and additional points half-way between them are sampled requires using
2p - 1
wherep
is the existing number of points.- Parameters:
v
- Value.- Returns:
- an instance
- Throws:
IllegalArgumentException
- if the value is< 2
.
-
withOptimize
Return an instance with the configured optimization of initial search points.If enabled then the initial point(s) with the highest probability is/are used as the start for an optimization to find a local maxima.
- Parameters:
v
- Value.- Returns:
- an instance
- See Also:
-
statistic
public double statistic(int[][] table) Compute the statistic for the unconditioned exact test. The statistic returned depends on the configured method.- Parameters:
table
- 2-by-2 contingency table.- Returns:
- test statistic
- Throws:
IllegalArgumentException
- if thetable
is not a 2-by-2 table; any table entry is negative; any column sum is zero; the table sum is zero or not an integer; or the number of possible tables exceeds the maximum array capacity.- See Also:
-
test
Performs an unconditioned exact test on the 2-by-2 contingency table. The statistic and p-value returned depends on the configured method and alternative hypothesis.The search for the nuisance parameter that maximises the p-value can be configured to: start with a number of initial points; and optimize the best points.
- Parameters:
table
- 2-by-2 contingency table.- Returns:
- test result
- Throws:
IllegalArgumentException
- if thetable
is not a 2-by-2 table; any table entry is negative; any column sum is zero; the table sum is zero or not an integer; or the number of possible tables exceeds the maximum array capacity.- See Also:
-
findExtremeTables
Find all tables that are as or more extreme than the observed table.If the list of tables is full then all tables are more extreme. Some configurations can detect this without performing a search and in this case the list of tables is returned as empty.
- Parameters:
a
- Observed value for a.b
- Observed value for b.tableList
- List to track more extreme tables.- Returns:
- the test statistic
-
statisticZ
private static double statisticZ(int a, int b, int m, int n, boolean pooled) Compute the statistic from a Z-test.- Parameters:
a
- Observed value for a.b
- Observed value for b.m
- Column sum m.n
- Column sum n.pooled
- true to use a pooled variance.- Returns:
- z
-
findExtremeTablesZ
private double findExtremeTablesZ(int a, int b, int m, int n, boolean pooled, UnconditionedExactTest.XYList tableList) Find all tables that are as or more extreme than the observed table using the Z statistic.- Parameters:
a
- Observed value for a.b
- Observed value for b.m
- Column sum m.n
- Column sum n.pooled
- true to use a pooled variance.tableList
- List to track more extreme tables.- Returns:
- observed z
-
statisticBoschloo
private double statisticBoschloo(int a, int b, int m, int n) Compute the statistic using Fisher's p-value (also known as Boschloo's test).- Parameters:
a
- Observed value for a.b
- Observed value for b.m
- Column sum m.n
- Column sum n.- Returns:
- p-value
-
statisticBoschlooTwoSided
Compute the two-sided statistic using Fisher's p-value (also known as Boschloo's test).- Parameters:
distribution
- Hypergeometric distribution.k
- Observed value.- Returns:
- p-value
-
findExtremeTablesBoschloo
private double findExtremeTablesBoschloo(int a, int b, int m, int n, UnconditionedExactTest.XYList tableList) Find all tables that are as or more extreme than the observed table using the Fisher's p-value as the statistic (also known as Boschloo's test).- Parameters:
a
- Observed value for a.b
- Observed value for b.m
- Column sum m.n
- Column sum n.tableList
- List to track more extreme tables.- Returns:
- observed p-value
-
computePValue
Compute the nuisance parameter and p-value for the binomial model given the list of possible tables.The current method enumerates an initial set of points and stores local extrema as candidates. Any candidate within 2% of the best is optionally optimized; this is limited to the top 3 candidates. These settings could be exposed as configurable options. Currently only the choice to optimize or not is exposed.
- Parameters:
tableList
- List of tables.- Returns:
- [nuisance parameter, p-value]
-
createBinomialModel
Creates the binomial model p-value function for the nuisance parameter. Note: This function computes the negative p-value so is suitable for optimization by a search for a minimum.- Parameters:
tableList
- List of tables.- Returns:
- the function
-
createLogBinomialCoefficients
Create the natural logarithm of the binomial coefficient for allk = [0, n]
.- Parameters:
n
- Limit N.- Returns:
- ln binom(n, k)
-
addCandidate
private void addCandidate(UnconditionedExactTest.Candidates minima, double v1, double v2, double v3, double x2) Add point 2 to the list of minima if neither neighbour value is lower.!(v1 invalid input: '<' v2 || v3 invalid input: '<' v2)
- Parameters:
minima
- Candidate minima.v1
- First point function value.v2
- Second point function value.v3
- Third point function value.x2
- Second point.
-
checkTable
private static void checkTable(int[][] table) Check the input is a 2-by-2 contingency table.- Parameters:
table
- Contingency table.- Throws:
IllegalArgumentException
- if thetable
is not a 2-by-2 table; any table entry is negative; any column sum is zero; the table sum is zero or not an integer; or the number of possible tables exceeds the maximum array capacity.
-