Class MannWhitneyUTest


  • public final class MannWhitneyUTest
    extends java.lang.Object
    Implements the Mann-Whitney U test (also called Wilcoxon rank-sum test).
    Since:
    1.1
    See Also:
    Mann-Whitney U test (Wikipedia)
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  MannWhitneyUTest.Result
      Result for the Mann-Whitney U test.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private AlternativeHypothesis alternative
      Alternative hypothesis.
      private static int AUTO_LIMIT
      Limit on sample size for the exact p-value computation for the auto mode.
      private static java.lang.ref.SoftReference<double[][][]> cacheF
      A reference to a previously computed storage for f.
      private boolean continuityCorrection
      Perform continuity correction.
      private static MannWhitneyUTest DEFAULT
      Default instance.
      private static java.lang.Object LOCK
      An object to use for synchronization when accessing the cache of F.
      private double mu
      Expected location shift.
      private PValueMethod pValueMethod
      Method to compute the p-value.
      private static RankingAlgorithm RANKING
      Ranking instance.
      private static double UNSET
      Value for an unset f computation.
    • Method Summary

      Modifier and Type Method Description
      private double calculateAsymptoticPValue​(double u, int n1, int n2, double c)
      Calculate the asymptotic p-value using a Normal approximation.
      (package private) static double calculateExactPValue​(double u, int m, int n, AlternativeHypothesis alternative)
      Calculate the exact p-value.
      private static double cdf​(int u1, int u2, int m, int n, double binom)
      Compute the cumulative distribution function of the Mann-Whitney U1 statistic.
      private static void checkSamples​(double[] x, double[] y)
      Ensures that the provided arrays fulfil the assumptions.
      private static double computeCdf​(int k, int m, int n, double binom)
      Compute the cumulative distribution function of the Mann-Whitney U statistic.
      private static double[] concatenateSamples​(double mu, double[] x, double[] y)
      Concatenate the samples into one array.
      private static double fmnk​(double[][][] f, int m, int n, int k)
      Compute f(m; n; k), the number of subsets of {0; 1; ...; n} with m elements such that the elements of this subset add up to k.
      private static double[][][] getF​(int m, int n, int k)
      Gets the storage for f(m, n, k).
      private static void initialize​(double[] fmn)
      Initialize the array for f(m, n, x).
      private static double sf​(int u1, int u2, int m, int n, double binom)
      Compute the survival function of the Mann-Whitney U1 statistic.
      double statistic​(double[] x, double[] y)
      Computes the Mann-Whitney U statistic comparing two independent samples possibly of different length.
      MannWhitneyUTest.Result test​(double[] x, double[] y)
      Performs a Mann-Whitney U test comparing the location for two independent samples.
      MannWhitneyUTest with​(AlternativeHypothesis v)
      Return an instance with the configured alternative hypothesis.
      MannWhitneyUTest with​(ContinuityCorrection v)
      Return an instance with the configured continuity correction.
      MannWhitneyUTest with​(PValueMethod v)
      Return an instance with the configured p-value method.
      static MannWhitneyUTest withDefaults()
      Return an instance using the default options.
      MannWhitneyUTest withMu​(double v)
      Return an instance with the configured location shift mu.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • AUTO_LIMIT

        private static final int AUTO_LIMIT
        Limit on sample size for the exact p-value computation for the auto mode.
        See Also:
        Constant Field Values
      • UNSET

        private static final double UNSET
        Value for an unset f computation.
        See Also:
        Constant Field Values
      • LOCK

        private static final java.lang.Object LOCK
        An object to use for synchronization when accessing the cache of F.
      • cacheF

        private static java.lang.ref.SoftReference<double[][][]> cacheF
        A reference to a previously computed storage for f. Use of a SoftReference ensures this is garbage collected before an OutOfMemoryError. The value should only be accessed, checked for size and optionally modified when holding the lock. When the storage is determined to be the correct size it can be returned for read/write to the array when not holding the lock.
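
        The locking and soft-reference pattern described above can be sketched as follows. This is a minimal illustration, not the class's actual code: the name SoftTableCache and the size check are assumptions, and the sketch does not initialize the table contents.

```java
import java.lang.ref.SoftReference;

/** Hypothetical sketch of a lock-guarded, softly referenced table cache. */
final class SoftTableCache {
    /** Guards every read and write of the cache reference. */
    private static final Object LOCK = new Object();
    /** The referent may be cleared by the garbage collector under memory pressure. */
    private static SoftReference<double[][][]> cache = new SoftReference<>(null);

    private SoftTableCache() {}

    /**
     * Returns a table with dimensions of at least (m+1) x (n+1) x (k+1).
     * A cached table is reused when it is large enough; otherwise a new
     * table is allocated and stored for later calls.
     */
    static double[][][] get(int m, int n, int k) {
        synchronized (LOCK) {
            double[][][] f = cache.get();
            if (f == null || f.length <= m || f[0].length <= n || f[0][0].length <= k) {
                f = new double[m + 1][n + 1][k + 1];
                cache = new SoftReference<>(f);
            }
            // The caller holds a strong reference from here on, so the
            // array cannot be collected while it is in use.
            return f;
        }
    }
}
```

        Only the reference swap happens under the lock; the returned array is then read and written without it, as the field description allows.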
      • pValueMethod

        private final PValueMethod pValueMethod
        Method to compute the p-value.
      • continuityCorrection

        private final boolean continuityCorrection
        Perform continuity correction.
      • mu

        private final double mu
        Expected location shift.
    • Constructor Detail

      • MannWhitneyUTest

        private MannWhitneyUTest​(AlternativeHypothesis alternative,
                                 PValueMethod method,
                                 boolean continuityCorrection,
                                 double mu)
        Parameters:
        alternative - Alternative hypothesis.
        method - P-value method.
        continuityCorrection - true to perform continuity correction.
        mu - Expected location shift.
    • Method Detail

      • with

        public MannWhitneyUTest with​(PValueMethod v)
        Return an instance with the configured p-value method.
        Parameters:
        v - Value.
        Returns:
        an instance
        Throws:
        java.lang.IllegalArgumentException - if the value is not in the allowed options or is null
      • with

        public MannWhitneyUTest with​(ContinuityCorrection v)
        Return an instance with the configured continuity correction.

        If ENABLED, adjust the U rank statistic by 0.5 towards the mean value when computing the z-statistic if a normal approximation is used to compute the p-value.

        Parameters:
        v - Value.
        Returns:
        an instance
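
        The effect of the 0.5 adjustment can be sketched as follows. This is an illustration under stated assumptions, not the library code: it assumes no ties, a two-sided view of the statistic, and that the adjustment never moves U past its mean. The class and method names are hypothetical.

```java
/** Hypothetical sketch of a continuity-corrected z-statistic for U. */
final class ContinuityCorrectionSketch {
    private ContinuityCorrectionSketch() {}

    /**
     * z-statistic for U under the normal approximation, assuming no ties.
     * With correction, the distance of U from its mean is reduced by 0.5,
     * clamped so that it does not cross the mean (an assumption of this sketch).
     */
    static double z(double u, int n1, int n2, boolean correct) {
        final double mean = 0.5 * n1 * n2;
        final double sd = Math.sqrt((double) n1 * n2 * (n1 + n2 + 1) / 12.0);
        double d = u - mean;
        if (correct) {
            d = Math.signum(d) * Math.max(0, Math.abs(d) - 0.5);
        }
        return d / sd;
    }
}
```

        The correction shrinks |z| and therefore yields a slightly larger (more conservative) p-value from the normal approximation.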
      • withMu

        public MannWhitneyUTest withMu​(double v)
        Return an instance with the configured location shift mu.
        Parameters:
        v - Value.
        Returns:
        an instance
        Throws:
        java.lang.IllegalArgumentException - if the value is not finite
      • statistic

        public double statistic​(double[] x,
                                double[] y)
        Computes the Mann-Whitney U statistic comparing two independent samples possibly of different length.

        This statistic can be used to perform a Mann-Whitney U test evaluating the null hypothesis that the two independent samples differ by a location shift of mu.

        This returns the U1 statistic. Compute the U2 statistic using:

         u2 = (long) x.length * y.length - u1;
         
        Parameters:
        x - First sample values.
        y - Second sample values.
        Returns:
        Mann-Whitney U1 statistic
        Throws:
        java.lang.IllegalArgumentException - if x or y is zero-length or contains NaN values.
        See Also:
        withMu(double)
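
        As an illustration of what U1 and U2 measure, the sketch below counts pairs directly instead of ranking the combined sample. This O(n*m) pairwise count is a substitute technique chosen for clarity, not the method used by this class; the names are hypothetical and ties are assumed to contribute 0.5 each.

```java
/** Hypothetical pairwise-count sketch of the Mann-Whitney U statistics. */
final class UStatisticSketch {
    private UStatisticSketch() {}

    /**
     * Counts pairs (i, j) where (x[i] - mu) exceeds y[j]; a tied pair counts 0.5.
     * Equivalent to the rank-sum form R1 - n1 * (n1 + 1) / 2 with average ranks.
     */
    static double u1(double[] x, double[] y, double mu) {
        double u = 0;
        for (final double xi : x) {
            final double v = xi - mu;
            for (final double yj : y) {
                if (v > yj) {
                    u += 1;
                } else if (v == yj) {
                    u += 0.5;
                }
            }
        }
        return u;
    }

    /** Complementary statistic; the long cast avoids int overflow in the product. */
    static double u2(double u1, double[] x, double[] y) {
        return (long) x.length * y.length - u1;
    }
}
```

        For example, x = {1.5, 3.5, 5.5} and y = {2, 4} with mu = 0 give U1 = 3 and U2 = 3, since three of the six pairs have x greater than y.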
      • test

        public MannWhitneyUTest.Result test​(double[] x,
                                            double[] y)
        Performs a Mann-Whitney U test comparing the location for two independent samples. The location is specified using mu.

        The test is defined by the AlternativeHypothesis.

        • 'two-sided': the distribution underlying (x - mu) is not equal to the distribution underlying y.
        • 'greater': the distribution underlying (x - mu) is stochastically greater than the distribution underlying y.
        • 'less': the distribution underlying (x - mu) is stochastically less than the distribution underlying y.

        If the p-value method is auto, an exact p-value is computed if the samples contain fewer than 50 values; otherwise a normal approximation is used.

        Computation of the exact p-value is only valid if there are no tied ranks in the data; otherwise the computation falls back to the asymptotic approximation using a tie correction and an optional continuity correction.

        Note: Exact computation requires tabulation of values not exceeding size (n+1)*(m+1)*(u+1) where u is the minimum of the U1 and U2 statistics and n and m are the sample sizes. This may use a very large amount of memory and result in an OutOfMemoryError. Exact computation requires a finite binomial coefficient binom(n+m, m) which is limited to n+m <= 1029 for any n and m, or min(n, m) <= 37 for any max(n, m). An OutOfMemoryError is not expected using the limits configured for the auto p-value computation as the maximum required memory is approximately 23 MiB.

        Parameters:
        x - First sample values.
        y - Second sample values.
        Returns:
        test result
        Throws:
        java.lang.IllegalArgumentException - if x or y is zero-length or contains NaN values.
        java.lang.OutOfMemoryError - if the exact computation is user-requested for large samples and there is not enough memory.
        See Also:
        statistic(double[], double[]), withMu(double), with(AlternativeHypothesis), with(ContinuityCorrection)
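
        The limits quoted in the note above can be checked with a little arithmetic. The sketch below is illustrative only: the names are hypothetical, the table is assumed to hold 8-byte doubles, and the auto mode is assumed to cap both sample sizes at 49, so that the smaller of U1 and U2 is at most 1200.

```java
/** Hypothetical arithmetic check of the exact-computation limits. */
final class ExactLimitsSketch {
    private ExactLimitsSketch() {}

    /** binom(n, k) in double arithmetic; returns infinity on overflow. */
    static double binom(int n, int k) {
        final int j = Math.min(k, n - k);
        double r = 1;
        for (int i = 1; i <= j; i++) {
            // Each factor is >= 1, so intermediate values never exceed the result.
            r *= (double) (n - j + i) / i;
        }
        return r;
    }

    /** Bytes needed for a table of (m+1) * (n+1) * (u+1) doubles. */
    static long tableBytes(int m, int n, int u) {
        return 8L * (m + 1) * (n + 1) * (u + 1);
    }
}
```

        With m = n = 49 and u = 1200 the table needs 8 * 50 * 50 * 1201 = 24,020,000 bytes, about 22.9 MiB. The central coefficient binom(1029, 514) is finite as a double, while binom(1030, 515) overflows to infinity.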
      • checkSamples

        private static void checkSamples​(double[] x,
                                         double[] y)
        Ensures that the provided arrays fulfil the assumptions.
        Parameters:
        x - First sample values.
        y - Second sample values.
        Throws:
        java.lang.IllegalArgumentException - if x or y are zero-length.
      • concatenateSamples

        private static double[] concatenateSamples​(double mu,
                                                   double[] x,
                                                   double[] y)
        Concatenate the samples into one array. Subtract mu from the first sample.
        Parameters:
        mu - Expected location shift.
        x - First sample values.
        y - Second sample values.
        Returns:
        concatenated array
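
        A minimal sketch of this step, assuming the shifted first sample is placed before the second sample in the output (the class and method names are hypothetical):

```java
/** Hypothetical sketch of concatenating two samples with a location shift. */
final class ConcatenateSketch {
    private ConcatenateSketch() {}

    /** Returns {x[0] - mu, ..., x[n-1] - mu, y[0], ..., y[m-1]}. */
    static double[] concatenate(double mu, double[] x, double[] y) {
        final double[] z = new double[x.length + y.length];
        for (int i = 0; i < x.length; i++) {
            z[i] = x[i] - mu;
        }
        System.arraycopy(y, 0, z, x.length, y.length);
        return z;
    }
}
```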
      • calculateAsymptoticPValue

        private double calculateAsymptoticPValue​(double u,
                                                 int n1,
                                                 int n2,
                                                 double c)
        Calculate the asymptotic p-value using a Normal approximation.
        Parameters:
        u - Mann-Whitney U value.
        n1 - Number of subjects in first sample.
        n2 - Number of subjects in second sample.
        c - Tie-correction
        Returns:
        two-sided asymptotic p-value
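
        The role of the tie correction in the normal approximation can be sketched as follows. This assumes the textbook form, where c is the sum of (t^3 - t) over each group of t tied values in the combined sample; whether the parameter c of this method uses exactly that form is an assumption, and the names below are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of the tie-corrected variance of U. */
final class TieCorrectionSketch {
    private TieCorrectionSketch() {}

    /** Sum of (t^3 - t) over groups of t equal values in the combined sample. */
    static double tieCorrection(double[] combined) {
        final double[] s = combined.clone();
        Arrays.sort(s);
        double c = 0;
        int i = 0;
        while (i < s.length) {
            int j = i + 1;
            while (j < s.length && s[j] == s[i]) {
                j++;
            }
            final double t = j - i;
            c += t * t * t - t;
            i = j;
        }
        return c;
    }

    /** Variance of U: n1*n2/12 * ((N + 1) - c / (N * (N - 1))), with N = n1 + n2. */
    static double variance(int n1, int n2, double c) {
        final double n = (double) n1 + n2;
        return (double) n1 * n2 / 12.0 * ((n + 1) - c / (n * (n - 1)));
    }
}
```

        With no ties c is 0 and the variance reduces to n1 * n2 * (N + 1) / 12. Ties reduce the variance, which increases the magnitude of the z-statistic.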
      • calculateExactPValue

        static double calculateExactPValue​(double u,
                                           int m,
                                           int n,
                                           AlternativeHypothesis alternative)
        Calculate the exact p-value. If the value cannot be computed this returns -1.

        Note: Computation may run out of memory during array allocation or method recursion.

        Parameters:
        u - Mann-Whitney U value.
        m - Number of subjects in first sample.
        n - Number of subjects in second sample.
        alternative - Alternative hypothesis.
        Returns:
        exact p-value (or -1) (two-sided, greater, or less using the options)
      • cdf

        private static double cdf​(int u1,
                                  int u2,
                                  int m,
                                  int n,
                                  double binom)
        Compute the cumulative distribution function of the Mann-Whitney U1 statistic. The U2 statistic is passed for convenience to exploit symmetry in the distribution.
        Parameters:
        u1 - Mann-Whitney U1 statistic
        u2 - Mann-Whitney U2 statistic
        m - First sample size.
        n - Second sample size.
        binom - binom(n+m, m) (must be finite)
        Returns:
        Pr(X <= u1)
      • sf

        private static double sf​(int u1,
                                 int u2,
                                 int m,
                                 int n,
                                 double binom)
        Compute the survival function of the Mann-Whitney U1 statistic. The U2 statistic is passed for convenience to exploit symmetry in the distribution.
        Parameters:
        u1 - Mann-Whitney U1 statistic
        u2 - Mann-Whitney U2 statistic
        m - First sample size.
        n - Second sample size.
        binom - binom(n+m, m) (must be finite)
        Returns:
        Pr(X > u1)
      • computeCdf

        private static double computeCdf​(int k,
                                         int m,
                                         int n,
                                         double binom)
        Compute the cumulative distribution function of the Mann-Whitney U statistic.

        This should be called with the lower of U1 or U2 for computational efficiency.

        Uses the recursive formula provided in Di Bucchianico, A. (1999) Combinatorics, computer algebra and the Wilcoxon-Mann-Whitney test, Journal of Statistical Planning and Inference, Volume 79, Issue 2, 349-364.

        Parameters:
        k - Mann-Whitney U statistic
        m - First sample size.
        n - Second sample size.
        binom - binom(n+m, m) (must be finite)
        Returns:
        Pr(X <= k)
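
        To make the quantity concrete, the sketch below computes Pr(X <= k) with the classic Mann-Whitney counting recurrence rather than the Di Bucchianico formula cited above. It is a substitute technique for illustration, it is exponential in cost without tabulation, and its names are hypothetical.

```java
/** Hypothetical sketch of the exact CDF of U using the classic recurrence. */
final class ExactCdfSketch {
    private ExactCdfSketch() {}

    /**
     * Number of orderings of m x-values and n y-values (no ties) with U = k.
     * The largest value is either an x, which adds n to U, or a y, which adds 0.
     */
    static double count(int m, int n, int k) {
        if (k < 0 || k > m * n) {
            return 0;
        }
        if (m == 0 || n == 0) {
            return 1; // only k == 0 reaches here
        }
        return count(m - 1, n, k - n) + count(m, n - 1, k);
    }

    /** Pr(X <= k), where binom = binom(n + m, m) is the total number of orderings. */
    static double cdf(int k, int m, int n, double binom) {
        double sum = 0;
        for (int i = 0; i <= k; i++) {
            sum += count(m, n, i);
        }
        return sum / binom;
    }
}
```

        For m = n = 2 the counts for k = 0..4 are 1, 1, 2, 1, 1 out of binom(4, 2) = 6 orderings, so Pr(X <= 1) = 2/6.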
      • getF

        private static double[][][] getF​(int m,
                                         int n,
                                         int k)
        Gets the storage for f(m, n, k).

        This may be cached for performance.

        Parameters:
        m - M.
        n - N.
        k - K.
        Returns:
        the storage for f
      • initialize

        private static void initialize​(double[] fmn)
        Initialize the array for f(m, n, x). Set value to 1 for x=0; otherwise UNSET.
        Parameters:
        fmn - Array.
      • fmnk

        private static double fmnk​(double[][][] f,
                                   int m,
                                   int n,
                                   int k)
        Compute f(m; n; k), the number of subsets of {0; 1; ...; n} with m elements such that the elements of this subset add up to k.

        The function is computed recursively.

        Parameters:
        f - Tabulated values of f[m][n][k].
        m - M
        n - N
        k - K
        Returns:
        f(m; n; k)
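
        A tabulated recursion of this kind can be sketched as follows. The sketch is an assumption-laden illustration: it counts orderings with the classic Mann-Whitney recurrence rather than the subset-sum parametrization described above, it uses -1 as the UNSET sentinel (the actual constant value is not stated here), and its names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of a memoized recursive count using a 3D table. */
final class MemoizedCountSketch {
    /** Sentinel for an entry that has not been computed (assumed value). */
    private static final double UNSET = -1;

    private MemoizedCountSketch() {}

    /** Allocates (m+1) x (n+1) x (k+1) entries: 1 at index 0, UNSET elsewhere. */
    static double[][][] newStorage(int m, int n, int k) {
        final double[][][] f = new double[m + 1][n + 1][k + 1];
        for (final double[][] fm : f) {
            for (final double[] fmn : fm) {
                Arrays.fill(fmn, UNSET);
                fmn[0] = 1;
            }
        }
        return f;
    }

    /** Number of orderings of m x-values and n y-values (no ties) with U = k. */
    static double count(double[][][] f, int m, int n, int k) {
        if (k < 0 || k > m * n) {
            return 0;
        }
        if (m == 0 || n == 0) {
            return 1; // only k == 0 reaches here
        }
        double v = f[m][n][k];
        if (v == UNSET) {
            v = count(f, m - 1, n, k - n) + count(f, m, n - 1, k);
            f[m][n][k] = v; // store so each entry is computed once
        }
        return v;
    }
}
```

        Each table entry is computed at most once, so the cost is bounded by the table size (m+1) * (n+1) * (k+1) instead of growing exponentially with the recursion depth.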