Class UnconditionedExactTest


  • public final class UnconditionedExactTest
    extends java.lang.Object
    Implements an unconditioned exact test for a contingency table.

    Performs an exact test for the statistical significance of the association (contingency) between two kinds of categorical classification. A 2x2 contingency table is:

    \[ \left[ {\begin{array}{cc} a & b \\ c & d \\ \end{array} } \right] \]

    This test applies to the case of a 2x2 contingency table with one margin fixed. Note that if both margins are fixed (the row sums and column sums are not random) then Fisher's exact test can be applied.

    This implementation fixes the column sums \( m = a + c \) and \( n = b + d \). All possible tables can be created using \( 0 \le a \le m \) and \( 0 \le b \le n \). The random values \( a \) and \( b \) follow a binomial distribution with probabilities \( p_0 \) and \( p_1 \) such that \( a \sim B(m, p_0) \) and \( b \sim B(n, p_1) \). The p-value of the 2x2 table is the product of two binomials:

    \[ \begin{aligned} p &= Pr(a; m, p_0) \times Pr(b; n, p_1) \\ &= \binom{m}{a} p_0^a (1-p_0)^{m-a} \times \binom{n}{b} p_1^b (1-p_1)^{n-b} \end{aligned} \]

    For the binomial model, the null hypothesis is the two nuisance parameters are equal \( p_0 = p_1 = \pi\), with \( \pi \) the probability for equal proportions, and the probability of any single table is:

    \[ p = \binom{m}{a} \binom{n}{b} \pi^{a+b} (1-\pi)^{m+n-a-b} \]

    The p-value of the observed table is calculated by maximising the sum of the as or more extreme tables over the domain of the nuisance parameter \( 0 \lt \pi \lt 1 \):

    \[ p(a, b) = \sum_{i,j} \binom{m}{i} \binom{n}{j} \pi^{i+j} (1-\pi)^{m+n-i-j} \]

    where table \( (i,j) \) is as or more extreme than the observed table \( (a, b) \). The test can be configured to select more extreme tables using various methods.

    Note that the sum of the joint binomial distribution is a univariate function for the nuisance parameter \( \pi \). This function may have many local maxima and the search enumerates the range with a configured number of points. The best candidates are optionally used as the start point for an optimized search for a local maxima.

    References:

    1. Barnard, G.A. (1947). Significance tests for 2x2 tables. Biometrika, 34, Issue 1-2, 123–138.
    2. Boschloo, R.D. (1970). Raised conditional level of significance for the 2 × 2-table when testing the equality of two probabilities. Statistica neerlandica, 24(1), 1–9.
    3. Suisaa, A and Shuster, J.J. (1985). Exact Unconditional Sample Sizes for the 2 × 2 Binomial Trial. Journal of the Royal Statistical Society. Series A (General), 148(4), 317-327.
    Since:
    1.1
    See Also:
    FisherExactTest, Boschloo's test (Wikipedia), Barnard's test (Wikipedia)
    • Field Detail

      • DEFAULT

        private static final UnconditionedExactTest DEFAULT
        Default instance.

        SciPy's boschloo_exact and barnard_exact tests use 32 points in the interval [0, 1) The R Exact package uses 100 in the interval [1e-5, 1-1e-5]. Barnards 1947 paper describes the nuisance parameter in the open interval 0 < pi < 1. Here we respect the open-interval for the initial candidates and ignore 0 and 1. The initial bounds used are the same as the R Exact package. We closely match the inner 31 points from SciPy by using 33 points by default.

      • LOWER_BOUND

        private static final double LOWER_BOUND
        Lower bound for the enumerated interval. The upper bound is 1 - lower.
        See Also:
        Constant Field Values
      • SOLVER_RELATIVE_EPS

        private static final double SOLVER_RELATIVE_EPS
        Relative epsilon for the Brent solver. This is limited for a univariate function to approximately sqrt(eps) with eps = 2^-52.
        See Also:
        Constant Field Values
      • INC_FRACTION

        private static final double INC_FRACTION
        Fraction of the increment (interval between enumerated points) to initialise the bracket for the minima. Note the minima should lie between x +/- increment. The bracket should search within this range. Set to 1/8 and so the initial point of the bracket is approximately 1.61 * 1/8 = 0.2 of the increment away from initial points a or b.
        See Also:
        Constant Field Values
      • MAX_CANDIDATES

        private static final int MAX_CANDIDATES
        Maximum number of candidate to optimize. This is a safety limit to avoid excess optimization. Only candidates within a relative tolerance of the best candidate are stored. If the number of candidates exceeds this value then many candidates have a very similar p-value and the top candidates will be optimized. Using a value of 3 allows at least one other candidate to be optimized when there is two-fold symmetry in the energy function.
        See Also:
        Constant Field Values
      • MINIMA_EPS

        private static final double MINIMA_EPS
        Relative distance of candidate minima from the lowest candidate. Used to exclude poor candidates from optimization.
        See Also:
        Constant Field Values
      • MAX_TABLES

        private static final int MAX_TABLES
        The maximum number of tables. This is limited by the maximum number of indices that can be maintained in memory. Potentially up to this number of tables must be tracked during computation of the p-value for as or more extreme tables. The limit is set using the same limit for maximum capacity as java.util.ArrayList. In practice any table anywhere near this limit can be computed using an alternative such as a chi-squared or g test.
        See Also:
        Constant Field Values
      • COLUMN_SUM

        private static final java.lang.String COLUMN_SUM
        Error message text for zero column sums.
        See Also:
        Constant Field Values
      • points

        private final int points
        Number of initial points.
      • optimize

        private final boolean optimize
        Option to optimize the best initial point(s).
    • Constructor Detail

      • UnconditionedExactTest

        private UnconditionedExactTest​(AlternativeHypothesis alternative,
                                       UnconditionedExactTest.Method method,
                                       int points,
                                       boolean optimize)
        Parameters:
        alternative - Alternative hypothesis.
        method - Method to identify more extreme tables.
        points - Number of initial points.
        optimize - Option to optimize the best initial point(s).
    • Method Detail

      • withInitialPoints

        public UnconditionedExactTest withInitialPoints​(int v)
        Return an instance with the configured number of initial points.

        The search for the nuisance parameter will use \( v \) points in the open interval \( (0, 1) \). The interval is evaluated by including start and end points approximately equal to 0 and 1. Additional internal points are enumerated using increments of approximately \( \frac{1}{v-1} \). The minimum number of points is 2. Increasing the number of points increases the precision of the search at the cost of performance.

        To approximately double the number of points so that all existing points are included and additional points half-way between them are sampled requires using 2p - 1 where p is the existing number of points.

        Parameters:
        v - Value.
        Returns:
        an instance
        Throws:
        java.lang.IllegalArgumentException - if the value is < 2.
      • withOptimize

        public UnconditionedExactTest withOptimize​(boolean v)
        Return an instance with the configured optimization of initial search points.

        If enabled then the initial point(s) with the highest probability is/are used as the start for an optimization to find a local maxima.

        Parameters:
        v - Value.
        Returns:
        an instance
        See Also:
        withInitialPoints(int)
      • statistic

        public double statistic​(int[][] table)
        Compute the statistic for the unconditioned exact test. The statistic returned depends on the configured method.
        Parameters:
        table - 2-by-2 contingency table.
        Returns:
        test statistic
        Throws:
        java.lang.IllegalArgumentException - if the table is not a 2-by-2 table; any table entry is negative; any column sum is zero; the table sum is zero or not an integer; or the number of possible tables exceeds the maximum array capacity.
        See Also:
        with(Method), test(int[][])
      • test

        public UnconditionedExactTest.Result test​(int[][] table)
        Performs an unconditioned exact test on the 2-by-2 contingency table. The statistic and p-value returned depends on the configured method and alternative hypothesis.

        The search for the nuisance parameter that maximises the p-value can be configured to: start with a number of initial points; and optimize the best points.

        Parameters:
        table - 2-by-2 contingency table.
        Returns:
        test result
        Throws:
        java.lang.IllegalArgumentException - if the table is not a 2-by-2 table; any table entry is negative; any column sum is zero; the table sum is zero or not an integer; or the number of possible tables exceeds the maximum array capacity.
        See Also:
        with(Method), with(AlternativeHypothesis), statistic(int[][])
      • findExtremeTables

        private double findExtremeTables​(int a,
                                         int b,
                                         UnconditionedExactTest.XYList tableList)
        Find all tables that are as or more extreme than the observed table.

        If the list of tables is full then all tables are more extreme. Some configurations can detect this without performing a search and in this case the list of tables is returned as empty.

        Parameters:
        a - Observed value for a.
        b - Observed value for b.
        tableList - List to track more extreme tables.
        Returns:
        the test statistic
      • statisticZ

        private static double statisticZ​(int a,
                                         int b,
                                         int m,
                                         int n,
                                         boolean pooled)
        Compute the statistic from a Z-test.
        Parameters:
        a - Observed value for a.
        b - Observed value for b.
        m - Column sum m.
        n - Column sum n.
        pooled - true to use a pooled variance.
        Returns:
        z
      • findExtremeTablesZ

        private double findExtremeTablesZ​(int a,
                                          int b,
                                          int m,
                                          int n,
                                          boolean pooled,
                                          UnconditionedExactTest.XYList tableList)
        Find all tables that are as or more extreme than the observed table using the Z statistic.
        Parameters:
        a - Observed value for a.
        b - Observed value for b.
        m - Column sum m.
        n - Column sum n.
        pooled - true to use a pooled variance.
        tableList - List to track more extreme tables.
        Returns:
        observed z
      • statisticBoschloo

        private double statisticBoschloo​(int a,
                                         int b,
                                         int m,
                                         int n)
        Compute the statistic using Fisher's p-value (also known as Boschloo's test).
        Parameters:
        a - Observed value for a.
        b - Observed value for b.
        m - Column sum m.
        n - Column sum n.
        Returns:
        p-value
      • statisticBoschlooTwoSided

        private static double statisticBoschlooTwoSided​(Hypergeom distribution,
                                                        int k)
        Compute the two-sided statistic using Fisher's p-value (also known as Boschloo's test).
        Parameters:
        distribution - Hypergeometric distribution.
        k - Observed value.
        Returns:
        p-value
      • findExtremeTablesBoschloo

        private double findExtremeTablesBoschloo​(int a,
                                                 int b,
                                                 int m,
                                                 int n,
                                                 UnconditionedExactTest.XYList tableList)
        Find all tables that are as or more extreme than the observed table using the Fisher's p-value as the statistic (also known as Boschloo's test).
        Parameters:
        a - Observed value for a.
        b - Observed value for b.
        m - Column sum m.
        n - Column sum n.
        tableList - List to track more extreme tables.
        Returns:
        observed p-value
      • computePValue

        private double[] computePValue​(UnconditionedExactTest.XYList tableList)
        Compute the nuisance parameter and p-value for the binomial model given the list of possible tables.

        The current method enumerates an initial set of points and stores local extrema as candidates. Any candidate within 2% of the best is optionally optimized; this is limited to the top 3 candidates. These settings could be exposed as configurable options. Currently only the choice to optimize or not is exposed.

        Parameters:
        tableList - List of tables.
        Returns:
        [nuisance parameter, p-value]
      • createBinomialModel

        private static java.util.function.DoubleUnaryOperator createBinomialModel​(UnconditionedExactTest.XYList tableList)
        Creates the binomial model p-value function for the nuisance parameter. Note: This function computes the negative p-value so is suitable for optimization by a search for a minimum.
        Parameters:
        tableList - List of tables.
        Returns:
        the function
      • createLogBinomialCoefficients

        private static java.util.function.IntToDoubleFunction createLogBinomialCoefficients​(int n)
        Create the natural logarithm of the binomial coefficient for all k = [0, n].
        Parameters:
        n - Limit N.
        Returns:
        ln binom(n, k)
      • addCandidate

        private void addCandidate​(UnconditionedExactTest.Candidates minima,
                                  double v1,
                                  double v2,
                                  double v3,
                                  double x2)
        Add point 2 to the list of minima if neither neighbour value is lower.
         !(v1 < v2 || v3 < v2)
         
        Parameters:
        minima - Candidate minima.
        v1 - First point function value.
        v2 - Second point function value.
        v3 - Third point function value.
        x2 - Second point.
      • checkTable

        private static void checkTable​(int[][] table)
        Check the input is a 2-by-2 contingency table.
        Parameters:
        table - Contingency table.
        Throws:
        java.lang.IllegalArgumentException - if the table is not a 2-by-2 table; any table entry is negative; any column sum is zero; the table sum is zero or not an integer; or the number of possible tables exceeds the maximum array capacity.