Class ChiSquareTest


  • public final class ChiSquareTest
    extends java.lang.Object
    Implements chi-square test statistics.

    This implementation handles both known and unknown distributions.

    Two-sample tests can be used when the distribution is not known a priori but is provided by one sample, or when the hypothesis under test is that the two samples come from the same underlying distribution.

    Since:
    1.1
    See Also:
    Chi-square test (Wikipedia)
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static java.lang.String COLUMN
      Name for the column.
      private static ChiSquareTest DEFAULT
      Default instance.
      private int degreesOfFreedomAdjustment
      Degrees of freedom adjustment.
      private static java.lang.String ROW
      Name for the row.
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      private ChiSquareTest​(int degreesOfFreedomAdjustment)  
    • Method Summary

      Modifier and Type Method Description
      private static void checkNonZero​(double value, java.lang.String name, int index)
      Check the array value is non-zero.
      private static double computeP​(double chi2, double degreesOfFreedom)
      Compute the chi-square test p-value.
      double statistic​(double[] expected, long[] observed)
      Computes the chi-square goodness-of-fit statistic comparing observed and expected frequency counts.
      double statistic​(long[] observed)
      Computes the chi-square goodness-of-fit statistic comparing the observed counts to a uniform expected value (each category is equally likely).
      double statistic​(long[][] counts)
      Computes the chi-square statistic associated with a chi-square test of independence based on the input counts array, viewed as a two-way table in row-major format.
      double statistic​(long[] observed1, long[] observed2)
      Computes a chi-square statistic associated with a chi-square test of independence of frequency counts in observed1 and observed2.
      SignificanceResult test​(double[] expected, long[] observed)
      Perform a chi-square goodness-of-fit test evaluating the null hypothesis that the observed counts conform to the expected counts.
      SignificanceResult test​(long[] observed)
      Perform a chi-square goodness-of-fit test evaluating the null hypothesis that the observed counts conform to a uniform distribution (each category is equally likely).
      SignificanceResult test​(long[][] counts)
      Perform a chi-square test of independence based on the input counts array, viewed as a two-way table.
      SignificanceResult test​(long[] observed1, long[] observed2)
      Perform a chi-square test of independence of frequency counts in observed1 and observed2.
      static ChiSquareTest withDefaults()
      Return an instance using the default options.
      ChiSquareTest withDegreesOfFreedomAdjustment​(int v)
      Return an instance with the configured degrees of freedom adjustment.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • COLUMN

        private static final java.lang.String COLUMN
        Name for the column.
        See Also:
        Constant Field Values
      • DEFAULT

        private static final ChiSquareTest DEFAULT
        Default instance.
      • degreesOfFreedomAdjustment

        private final int degreesOfFreedomAdjustment
        Degrees of freedom adjustment.
    • Constructor Detail

      • ChiSquareTest

        private ChiSquareTest​(int degreesOfFreedomAdjustment)
        Parameters:
        degreesOfFreedomAdjustment - Degrees of freedom adjustment.
    • Method Detail

      • withDegreesOfFreedomAdjustment

        public ChiSquareTest withDegreesOfFreedomAdjustment​(int v)
        Return an instance with the configured degrees of freedom adjustment.

        By default, a sample of length n has n - 1 degrees of freedom. An intrinsic null hypothesis is one where one or more parameters are estimated from the data in order to compute the expected values for the null hypothesis. For a distribution with p parameters, where up to p parameters have been estimated from the data, the degrees of freedom are in the range [n - 1 - p, n - 1].
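
        As a worked illustration of the range above, the following minimal sketch computes n - 1 - v for a sample of length n and an adjustment v. The class and method names are hypothetical and are not part of this library. The two checks mirror the documented exceptions (a negative adjustment, and adjusted degrees of freedom that are not strictly positive), but the actual implementation may perform them at a different point.

```java
/** Hypothetical helper for illustration; not part of the ChiSquareTest API. */
final class DegreesOfFreedomSketch {

    /**
     * Degrees of freedom for a goodness-of-fit test on n categories after
     * subtracting an adjustment for parameters estimated from the data.
     */
    static int adjustedDegreesOfFreedom(int n, int adjustment) {
        if (adjustment < 0) {
            throw new IllegalArgumentException("Negative adjustment: " + adjustment);
        }
        final int df = n - 1 - adjustment;
        if (df <= 0) {
            throw new IllegalArgumentException("Degrees of freedom not strictly positive: " + df);
        }
        return df;
    }

    public static void main(String[] args) {
        // 10 categories, nothing estimated from the data: 10 - 1 = 9
        System.out.println(adjustedDegreesOfFreedom(10, 0));
        // 10 categories, one estimated parameter (for example a Poisson mean): 10 - 1 - 1 = 8
        System.out.println(adjustedDegreesOfFreedom(10, 1));
    }
}
```

        With this library, the corresponding configuration would be obtained from withDefaults().withDegreesOfFreedomAdjustment(1) before calling test(double[], long[]).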

        Parameters:
        v - Degrees of freedom adjustment.
        Returns:
        an instance
        Throws:
        java.lang.IllegalArgumentException - if the value is negative
      • statistic

        public double statistic​(long[] observed)
        Computes the chi-square goodness-of-fit statistic comparing the observed counts to a uniform expected value (each category is equally likely).

        Note: This is a specialized version of the comparison of observed counts with an expected array of uniform values. It is faster than calling statistic(double[], long[]), and the statistic is the same, allowing for accumulated floating-point error due to the optimized routine.
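
        The equivalence described in the note can be illustrated with a plain-Java sketch. It is not the library's optimized routine: the class and method names are hypothetical, and input validation is omitted. Each category is given the expected count N / n, where N is the total count and n the number of categories.

```java
/** Hypothetical illustration; not the library's optimized routine. */
final class UniformChiSquareSketch {

    /** Statistic against a uniform expectation: every category expects total / n counts. */
    static double uniformStatistic(long[] observed) {
        long total = 0;
        for (final long o : observed) {
            total += o;
        }
        final double expected = (double) total / observed.length;
        double sum = 0;
        for (final long o : observed) {
            final double d = o - expected;
            sum += d * d / expected;
        }
        return sum;
    }

    /** General form; the expected array is assumed to already sum to the observed total. */
    static double statistic(double[] expected, long[] observed) {
        double sum = 0;
        for (int i = 0; i < observed.length; i++) {
            final double d = observed[i] - expected[i];
            sum += d * d / expected[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        final long[] observed = {10, 20, 30};
        // Total is 60 over 3 categories, so each expects 20: (100 + 0 + 100) / 20
        System.out.println(uniformStatistic(observed));                            // prints 10.0
        System.out.println(statistic(new double[] {20, 20, 20}, observed));        // prints 10.0
    }
}
```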

        Parameters:
        observed - Observed frequency counts.
        Returns:
        Chi-square statistic
        Throws:
        java.lang.IllegalArgumentException - if the sample size is less than 2; observed has negative entries; or all the observations are zero.
        See Also:
        test(long[])
      • statistic

        public double statistic​(double[] expected,
                                long[] observed)
        Computes the chi-square goodness-of-fit statistic comparing observed and expected frequency counts.

        Note: This implementation rescales the expected array if necessary to ensure that the sums of the expected and observed counts are equal.
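
        The effect of rescaling can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the class name is hypothetical, validation is omitted, and the sketch always applies the scale factor, whereas the note above says rescaling is done only if necessary. Because of the rescaling, expected values given as proportions and as relative weights lead to the same statistic.

```java
/** Hypothetical illustration of rescaling expected values to the observed total. */
final class RescaledExpectedSketch {

    static double statistic(double[] expected, long[] observed) {
        double sumExpected = 0;
        long sumObserved = 0;
        for (int i = 0; i < observed.length; i++) {
            sumExpected += expected[i];
            sumObserved += observed[i];
        }
        // Scale factor that makes the expected values sum to the observed total
        final double scale = sumObserved / sumExpected;
        double sum = 0;
        for (int i = 0; i < observed.length; i++) {
            final double e = expected[i] * scale;
            final double d = observed[i] - e;
            sum += d * d / e;
        }
        return sum;
    }

    public static void main(String[] args) {
        final long[] observed = {10, 20, 30};
        // Both expected arrays rescale to {15, 15, 30}: 25/15 + 25/15 + 0 = 10/3
        System.out.println(statistic(new double[] {0.25, 0.25, 0.5}, observed));
        System.out.println(statistic(new double[] {1, 1, 2}, observed));
    }
}
```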

        Parameters:
        expected - Expected frequency counts.
        observed - Observed frequency counts.
        Returns:
        Chi-square statistic
        Throws:
        java.lang.IllegalArgumentException - if the sample size is less than 2; the array sizes do not match; expected has entries that are not strictly positive; observed has negative entries; or all the observations are zero.
        See Also:
        test(double[], long[])
      • statistic

        public double statistic​(long[][] counts)
        Computes the chi-square statistic associated with a chi-square test of independence based on the input counts array, viewed as a two-way table in row-major format.
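
        For orientation, the textbook form of this statistic can be sketched in plain Java. The class name is hypothetical, validation is omitted, and the library's own routine may be organized differently. The expected count of cell (i, j) under independence is taken as rowSum[i] * colSum[j] / total.

```java
/** Hypothetical illustration of the chi-square statistic for a two-way table. */
final class IndependenceSketch {

    static double statistic(long[][] counts) {
        final int rows = counts.length;
        final int cols = counts[0].length;
        final double[] rowSum = new double[rows];
        final double[] colSum = new double[cols];
        double total = 0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowSum[i] += counts[i][j];
                colSum[j] += counts[i][j];
                total += counts[i][j];
            }
        }
        double sum = 0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                // Expected count if row and column classifications are independent
                final double expected = rowSum[i] * colSum[j] / total;
                final double d = counts[i][j] - expected;
                sum += d * d / expected;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Expected counts are {{12, 18}, {28, 42}}, so the statistic is
        // 4/12 + 4/18 + 4/28 + 4/42 = 50/63
        System.out.println(statistic(new long[][] {{10, 20}, {30, 40}}));
    }
}
```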
        Parameters:
        counts - 2-way table.
        Returns:
        Chi-square statistic
        Throws:
        java.lang.IllegalArgumentException - if the number of rows or columns is less than 2; the array is non-rectangular; the array has negative entries; or the sum of a row or column is zero.
        See Also:
        test(long[][])
      • statistic

        public double statistic​(long[] observed1,
                                long[] observed2)
        Computes a chi-square statistic associated with a chi-square test of independence of frequency counts in observed1 and observed2. The sums of frequency counts in the two samples are not required to be the same. The formula used to compute the test statistic is:

        \[ \sum_i{ \frac{(K * a_i - b_i / K)^2}{a_i + b_i} } \]

        where \(a_i\) and \(b_i\) are the counts at index i of observed1 and observed2, and

        \[ K = \sqrt{ \sum_i{b_i} / \sum_i{a_i} } \]

        Note: This is a specialized version of the statistic for a 2-by-n contingency table. It is faster than calling statistic(long[][]) with the table composed as new long[][]{observed1, observed2}. The statistic is the same, allowing for accumulated floating-point error due to the optimized routine.
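
        The agreement with the 2-by-n contingency table can be checked with a plain-Java sketch. It is an illustration only: the class and method names are hypothetical and validation is omitted. It takes a and b as the counts from observed1 and observed2, and K as the square root of the total of b divided by the total of a, which is the orientation under which the weighted formula reproduces the contingency-table statistic.

```java
/** Hypothetical illustration comparing the two-sample formula with a 2-by-n table. */
final class TwoSampleSketch {

    /** Weighted two-sample formula with K = sqrt(sum(b) / sum(a)). */
    static double twoSampleStatistic(long[] a, long[] b) {
        long sumA = 0;
        long sumB = 0;
        for (int i = 0; i < a.length; i++) {
            sumA += a[i];
            sumB += b[i];
        }
        final double k = Math.sqrt((double) sumB / sumA);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            final double d = k * a[i] - b[i] / k;
            sum += d * d / (a[i] + b[i]);
        }
        return sum;
    }

    /** Textbook statistic for the table new long[][] {a, b}. */
    static double tableStatistic(long[] a, long[] b) {
        double sumA = 0;
        double sumB = 0;
        for (int i = 0; i < a.length; i++) {
            sumA += a[i];
            sumB += b[i];
        }
        final double total = sumA + sumB;
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            final double colSum = a[i] + b[i];
            final double ea = sumA * colSum / total;
            final double eb = sumB * colSum / total;
            sum += (a[i] - ea) * (a[i] - ea) / ea + (b[i] - eb) * (b[i] - eb) / eb;
        }
        return sum;
    }

    public static void main(String[] args) {
        final long[] a = {10, 20, 30};
        final long[] b = {5, 5, 5};
        // Both values are 50/21 up to floating-point error
        System.out.println(twoSampleStatistic(a, b));
        System.out.println(tableStatistic(a, b));
    }
}
```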

        Parameters:
        observed1 - Observed frequency counts of the first data set.
        observed2 - Observed frequency counts of the second data set.
        Returns:
        Chi-square statistic
        Throws:
        java.lang.IllegalArgumentException - if the sample size is less than 2; the array sizes do not match; either array has entries that are negative; all counts of either observed1 or observed2 are zero; or the count at some index is zero for both arrays.
        See Also:
        test(long[], long[])
      • test

        public SignificanceResult test​(long[] observed)
        Perform a chi-square goodness-of-fit test evaluating the null hypothesis that the observed counts conform to a uniform distribution (each category is equally likely).
        Parameters:
        observed - Observed frequency counts.
        Returns:
        test result
        Throws:
        java.lang.IllegalArgumentException - if the sample size is less than 2; observed has negative entries; or all the observations are zero
        See Also:
        statistic(long[])
      • test

        public SignificanceResult test​(double[] expected,
                                       long[] observed)
        Perform a chi-square goodness-of-fit test evaluating the null hypothesis that the observed counts conform to the expected counts.

        The test can be configured to apply an adjustment to the degrees of freedom if the observed data has been used to create the expected counts.

        Parameters:
        expected - Expected frequency counts.
        observed - Observed frequency counts.
        Returns:
        test result
        Throws:
        java.lang.IllegalArgumentException - if the sample size is less than 2; the array sizes do not match; expected has entries that are not strictly positive; observed has negative entries; all the observations are zero; or the adjusted degrees of freedom are not strictly positive
        See Also:
        withDegreesOfFreedomAdjustment(int), statistic(double[], long[])
      • test

        public SignificanceResult test​(long[][] counts)
        Perform a chi-square test of independence based on the input counts array, viewed as a two-way table.
        Parameters:
        counts - 2-way table.
        Returns:
        test result
        Throws:
        java.lang.IllegalArgumentException - if the number of rows or columns is less than 2; the array is non-rectangular; the array has negative entries; or the sum of a row or column is zero.
        See Also:
        statistic(long[][])
      • test

        public SignificanceResult test​(long[] observed1,
                                       long[] observed2)
        Perform a chi-square test of independence of frequency counts in observed1 and observed2.

        Note: This is a specialized version of a 2-by-n contingency table.

        Parameters:
        observed1 - Observed frequency counts of the first data set.
        observed2 - Observed frequency counts of the second data set.
        Returns:
        test result
        Throws:
        java.lang.IllegalArgumentException - if the sample size is less than 2; the array sizes do not match; either array has entries that are negative; all counts of either observed1 or observed2 are zero; or the count at some index is zero for both arrays.
        See Also:
        statistic(long[], long[])
      • computeP

        private static double computeP​(double chi2,
                                       double degreesOfFreedom)
        Compute the chi-square test p-value.
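
        This page does not show how the p-value is evaluated. As a rough illustration of what the value represents, the sketch below computes the upper-tail probability of a chi-square distribution for even degrees of freedom only, where the closed form exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k! holds with m = degreesOfFreedom / 2. The class name, the int parameter type and the restriction to even degrees of freedom are assumptions of this sketch, not properties of this method.

```java
/** Hypothetical illustration; handles even degrees of freedom only. */
final class EvenDfPValueSketch {

    /** Upper-tail probability P(X >= chi2) for a chi-square distribution with even df. */
    static double pValue(double chi2, int degreesOfFreedom) {
        if (degreesOfFreedom <= 0 || degreesOfFreedom % 2 != 0) {
            throw new IllegalArgumentException("Sketch supports positive even df only: " + degreesOfFreedom);
        }
        final double half = chi2 / 2;
        double term = 1; // (x/2)^k / k! for k = 0
        double sum = 1;
        for (int k = 1; k < degreesOfFreedom / 2; k++) {
            term *= half / k;
            sum += term;
        }
        return Math.exp(-half) * sum;
    }

    public static void main(String[] args) {
        // Three equally likely categories with counts {10, 20, 30} give chi2 = 10
        // and 2 degrees of freedom, so the p-value is exp(-5), about 0.0067
        System.out.println(pValue(10, 2));
    }
}
```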
        Parameters:
        chi2 - Chi-square statistic.
        degreesOfFreedom - Degrees of freedom.
        Returns:
        p-value
      • checkNonZero

        private static void checkNonZero​(double value,
                                         java.lang.String name,
                                         int index)
        Check the array value is non-zero.
        Parameters:
        value - Value
        name - Name of the array
        index - Index in the array
        Throws:
        java.lang.IllegalArgumentException - if the value is zero