java.lang.Object
org.apache.commons.statistics.inference.GTest

public final class GTest extends Object
Implements G-test (Generalized Log-Likelihood Ratio Test) statistics.

This is known in statistical genetics as the McDonald-Kreitman test. The implementation handles both known and unknown distributions.

Two-sample tests can be used when the distribution is unknown a priori but is provided by one sample, or when the hypothesis under test is that the two samples come from the same underlying distribution.

Since:
1.1
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private static final GTest
    Default instance.
    private final int
    Degrees of freedom adjustment.
  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
    private
    GTest(int degreesOfFreedomAdjustment)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    private static void
    checkNonZero(double value, String name, int index)
    Check the array value is non-zero.
    private static double
    computeP(double g, double degreesOfFreedom)
    Compute the G-test p-value.
    double
    statistic(double[] expected, long[] observed)
    Computes the G-test goodness-of-fit statistic comparing observed and expected frequency counts.
    double
    statistic(long[] observed)
    Computes the G-test goodness-of-fit statistic comparing the observed counts to a uniform expected value (each category is equally likely).
    double
    statistic(long[][] counts)
    Computes a G-test statistic associated with a G-test of independence based on the input counts array, viewed as a two-way table.
    SignificanceResult
    test(double[] expected, long[] observed)
    Perform a G-test for goodness-of-fit evaluating the null hypothesis that the observed counts conform to the expected counts.
    SignificanceResult
    test(long[] observed)
    Perform a G-test for goodness-of-fit evaluating the null hypothesis that the observed counts conform to a uniform distribution (each category is equally likely).
    SignificanceResult
    test(long[][] counts)
    Perform a G-test of independence based on the input counts array, viewed as a two-way table.
    static GTest
    withDefaults()
    Return an instance using the default options.
    GTest
    withDegreesOfFreedomAdjustment(int v)
    Return an instance with the configured degrees of freedom adjustment.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • DEFAULT

      private static final GTest DEFAULT
      Default instance.
    • degreesOfFreedomAdjustment

      private final int degreesOfFreedomAdjustment
      Degrees of freedom adjustment.
  • Constructor Details

    • GTest

      private GTest(int degreesOfFreedomAdjustment)
      Parameters:
      degreesOfFreedomAdjustment - Degrees of freedom adjustment.
  • Method Details

    • withDefaults

      public static GTest withDefaults()
      Return an instance using the default options.
      Returns:
      default instance
    • withDegreesOfFreedomAdjustment

      public GTest withDegreesOfFreedomAdjustment(int v)
      Return an instance with the configured degrees of freedom adjustment.

      The default degrees of freedom for a sample of length n is n - 1. An intrinsic null hypothesis is one where one or more parameters are estimated from the data in order to obtain the expected values for the null hypothesis. For a distribution with p parameters, where up to p parameters have been estimated from the data, the degrees of freedom are in the range [n - 1 - p, n - 1].

      Parameters:
      v - Degrees of freedom adjustment.
      Returns:
      an instance
      Throws:
      IllegalArgumentException - if the value is negative
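
The range [n - 1 - p, n - 1] described above can be illustrated with a small plain-Java sketch. It assumes the adjustment is subtracted from the default n - 1, which is the natural reading of the text but is not spelled out on this page. The class and method names are hypothetical and are not part of the library.

```java
/** Hypothetical helper for illustration only; not part of the library. */
final class DegreesOfFreedomSketch {
    private DegreesOfFreedomSketch() {}

    /**
     * Degrees of freedom for a goodness-of-fit test with the given number of
     * categories, ASSUMING the adjustment is subtracted from the default n - 1.
     */
    static int adjustedDegreesOfFreedom(int categories, int adjustment) {
        if (adjustment < 0) {
            throw new IllegalArgumentException("Negative adjustment: " + adjustment);
        }
        final int df = categories - 1 - adjustment;
        if (df <= 0) {
            throw new IllegalArgumentException("Degrees of freedom not strictly positive: " + df);
        }
        return df;
    }

    public static void main(String[] args) {
        // Six categories, no estimated parameters: 6 - 1 = 5.
        System.out.println(adjustedDegreesOfFreedom(6, 0));
        // Six categories, two parameters estimated from the data: 6 - 1 - 2 = 3.
        System.out.println(adjustedDegreesOfFreedom(6, 2));
    }
}
```

Rejecting a result that is not strictly positive mirrors the condition documented for test(double[], long[]).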
    • statistic

      public double statistic(long[] observed)
      Computes the G-test goodness-of-fit statistic comparing the observed counts to a uniform expected value (each category is equally likely).

      Note: This is a specialized version of the comparison of observed counts with an expected array of uniform values. This method is faster than calling statistic(double[], long[]), and the statistic is the same up to accumulated floating-point error from the optimized routine.

      Parameters:
      observed - Observed frequency counts.
      Returns:
      G-test statistic
      Throws:
      IllegalArgumentException - if the sample size is less than 2; observed has negative entries; or all the observations are zero.
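
As a rough illustration of what the uniform goodness-of-fit statistic measures, the sketch below evaluates the textbook form G = 2 Σ O_i ln(O_i / E), with E equal to the total count divided by the number of categories. This is a plain-Java sketch under that assumption, not the library's optimized routine. The class name is hypothetical.

```java
/** Hypothetical sketch of the uniform G statistic; not the library's routine. */
final class UniformGSketch {
    private UniformGSketch() {}

    /** G = 2 * sum(O_i * ln(O_i / E)) where E = total / number of categories. */
    static double statistic(long[] observed) {
        if (observed.length < 2) {
            throw new IllegalArgumentException("At least 2 categories are required");
        }
        long total = 0;
        for (final long o : observed) {
            if (o < 0) {
                throw new IllegalArgumentException("Negative count: " + o);
            }
            total += o;
        }
        if (total == 0) {
            throw new IllegalArgumentException("All observations are zero");
        }
        final double expected = (double) total / observed.length;
        double sum = 0;
        for (final long o : observed) {
            // A zero count contributes nothing (the limit of x ln x as x -> 0 is 0).
            if (o > 0) {
                sum += o * Math.log(o / expected);
            }
        }
        return 2 * sum;
    }

    public static void main(String[] args) {
        // Counts that match the uniform expectation give a statistic of zero.
        System.out.println(statistic(new long[] {10, 10, 10}));
        // Unequal counts give a positive statistic.
        System.out.println(statistic(new long[] {30, 10}));
    }
}
```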
    • statistic

      public double statistic(double[] expected, long[] observed)
      Computes the G-test goodness-of-fit statistic comparing observed and expected frequency counts.

      Note: This implementation rescales the values if necessary to ensure that the sums of the expected and observed counts are equal.

      Parameters:
      expected - Expected frequency counts.
      observed - Observed frequency counts.
      Returns:
      G-test statistic
      Throws:
      IllegalArgumentException - if the sample size is less than 2; the array sizes do not match; expected has entries that are not strictly positive; observed has negative entries; or all the observations are zero.
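
The rescaling mentioned in the note can be sketched as follows. The expected values are multiplied by the ratio of the observed total to the expected total before the textbook form G = 2 Σ O_i ln(O_i / E_i) is evaluated. This is a minimal plain-Java sketch that assumes this is how the rescaling works, and the library's actual arithmetic may differ. The class name is hypothetical.

```java
/** Hypothetical sketch of the goodness-of-fit G statistic with rescaling. */
final class GoodnessOfFitGSketch {
    private GoodnessOfFitGSketch() {}

    static double statistic(double[] expected, long[] observed) {
        if (observed.length < 2) {
            throw new IllegalArgumentException("At least 2 categories are required");
        }
        if (expected.length != observed.length) {
            throw new IllegalArgumentException("Array sizes do not match");
        }
        double sumExpected = 0;
        double sumObserved = 0;
        for (int i = 0; i < observed.length; i++) {
            if (!(expected[i] > 0)) {
                throw new IllegalArgumentException("Expected value is not strictly positive at index " + i);
            }
            if (observed[i] < 0) {
                throw new IllegalArgumentException("Negative count at index " + i);
            }
            sumExpected += expected[i];
            sumObserved += observed[i];
        }
        if (sumObserved == 0) {
            throw new IllegalArgumentException("All observations are zero");
        }
        // Rescale so the expected values sum to the observed total.
        final double scale = sumObserved / sumExpected;
        double sum = 0;
        for (int i = 0; i < observed.length; i++) {
            // A zero count contributes nothing to the sum.
            if (observed[i] > 0) {
                sum += observed[i] * Math.log(observed[i] / (expected[i] * scale));
            }
        }
        return 2 * sum;
    }
}
```

With this rescaling, the expected array may hold probabilities, counts, or any other positive weights: {1, 1} and {20, 20} give the same statistic for the same observed counts.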
    • statistic

      public double statistic(long[][] counts)
      Computes a G-test statistic associated with a G-test of independence based on the input counts array, viewed as a two-way table. The formula used to compute the test statistic is:

      \[ G = 2 \cdot \sum_{ij}{O_{ij}} \cdot \left[ H(r) + H(c) - H(r,c) \right] \]

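      where \( O_{ij} \) are the counts in the table, \( H(r) \) and \( H(c) \) are the entropies of the row sums and column sums, \( H(r,c) \) is the entropy of the individual cells, and \( H \) is the Shannon entropy of the random variable formed by viewing the elements of the argument array as incidence counts: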

      \[ H(X) = - {\sum_{x \in \text{Supp}(X)} p(x) \ln p(x)} \]

      Parameters:
      counts - 2-way table.
      Returns:
      G-test statistic
      Throws:
      IllegalArgumentException - if the number of rows or columns is less than 2; the array is non-rectangular; the array has negative entries; or the sum of a row or column is zero.
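
The entropy formula above can be evaluated directly. The following plain-Java sketch computes the row, column, and cell entropies with natural logarithms and combines them as in the formula. It is an illustration under the stated formula only, and the class and method names are hypothetical and not part of the library.

```java
/** Hypothetical sketch of the independence G statistic via Shannon entropies. */
final class IndependenceGSketch {
    private IndependenceGSketch() {}

    /** Shannon entropy (natural log) of counts summing to total; zero counts are skipped. */
    static double entropy(double[] counts, double total) {
        double h = 0;
        for (final double c : counts) {
            if (c > 0) {
                final double p = c / total;
                h -= p * Math.log(p);
            }
        }
        return h;
    }

    /** G = 2 * N * [H(r) + H(c) - H(r,c)] for a rectangular table of counts. */
    static double statistic(long[][] counts) {
        final int rows = counts.length;
        if (rows < 2 || counts[0].length < 2) {
            throw new IllegalArgumentException("At least 2 rows and 2 columns are required");
        }
        final int cols = counts[0].length;
        final double[] rowSums = new double[rows];
        final double[] colSums = new double[cols];
        final double[] cells = new double[rows * cols];
        double total = 0;
        for (int i = 0; i < rows; i++) {
            if (counts[i].length != cols) {
                throw new IllegalArgumentException("Non-rectangular table");
            }
            for (int j = 0; j < cols; j++) {
                final long o = counts[i][j];
                if (o < 0) {
                    throw new IllegalArgumentException("Negative count at [" + i + "][" + j + "]");
                }
                rowSums[i] += o;
                colSums[j] += o;
                cells[i * cols + j] = o;
                total += o;
            }
        }
        for (final double s : rowSums) {
            if (s == 0) {
                throw new IllegalArgumentException("Row sum is zero");
            }
        }
        for (final double s : colSums) {
            if (s == 0) {
                throw new IllegalArgumentException("Column sum is zero");
            }
        }
        return 2 * total * (entropy(rowSums, total) + entropy(colSums, total) - entropy(cells, total));
    }
}
```

The bracketed term is the mutual information between the row and column variables, so a table whose cells are exactly the product of its margins gives a statistic of zero up to floating-point error.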
    • test

      public SignificanceResult test(long[] observed)
      Perform a G-test for goodness-of-fit evaluating the null hypothesis that the observed counts conform to a uniform distribution (each category is equally likely).
      Parameters:
      observed - Observed frequency counts.
      Returns:
      test result
      Throws:
      IllegalArgumentException - if the sample size is less than 2; observed has negative entries; or all the observations are zero
    • test

      public SignificanceResult test(double[] expected, long[] observed)
      Perform a G-test for goodness-of-fit evaluating the null hypothesis that the observed counts conform to the expected counts.

      The test can be configured to apply an adjustment to the degrees of freedom if the observed data has been used to create the expected counts.

      Parameters:
      expected - Expected frequency counts.
      observed - Observed frequency counts.
      Returns:
      test result
      Throws:
      IllegalArgumentException - if the sample size is less than 2; the array sizes do not match; expected has entries that are not strictly positive; observed has negative entries; all the observations are zero; or the adjusted degrees of freedom are not strictly positive
    • test

      public SignificanceResult test(long[][] counts)
      Perform a G-test of independence based on the input counts array, viewed as a two-way table.
      Parameters:
      counts - 2-way table.
      Returns:
      test result
      Throws:
      IllegalArgumentException - if the number of rows or columns is less than 2; the array is non-rectangular; the array has negative entries; or the sum of a row or column is zero.
    • computeP

      private static double computeP(double g, double degreesOfFreedom)
      Compute the G-test p-value.
      Parameters:
      g - G-test statistic.
      degreesOfFreedom - Degrees of freedom.
      Returns:
      p-value
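
For orientation, the p-value of a G-test is conventionally taken from the upper tail of a chi-squared distribution. The expression below sketches that convention and is not a statement of how this private method is implemented. The symbols \( k \) (the degrees of freedom), \( F_{\chi^2_k} \) (the chi-squared cumulative distribution function), and \( Q \) (the regularized upper incomplete gamma function) are notation introduced here for illustration.

\[ p = P\left(\chi^2_k \ge G\right) = 1 - F_{\chi^2_k}(G) = Q\left(\frac{k}{2}, \frac{G}{2}\right) \]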
    • checkNonZero

      private static void checkNonZero(double value, String name, int index)
      Check the array value is non-zero.
      Parameters:
      value - Value
      name - Name of the array
      index - Index in the array
      Throws:
      IllegalArgumentException - if the value is zero