Class MannWhitneyUTest
- Since:
- 1.1
-
Nested Class Summary
Nested Classes
Modifier and Type: static final class
Description: Result for the Mann-Whitney U test.
-
Field Summary
Fields
private final AlternativeHypothesis alternative - Alternative hypothesis.
private static final int AUTO_LIMIT - Limit on sample size for the exact p-value computation for the auto mode.
private static SoftReference<double[][][]> cacheF - A reference to a previously computed storage for f.
private final boolean continuityCorrection - Perform continuity correction.
private static final MannWhitneyUTest DEFAULT - Default instance.
private static final Object LOCK - An object to use for synchronization when accessing the cache of F.
private final double mu - Expected location shift.
private final PValueMethod pValueMethod - Method to compute the p-value.
private static final RankingAlgorithm RANKING - Ranking instance.
private static final double UNSET - Value for an unset f computation.
-
Constructor Summary
Constructors
private MannWhitneyUTest(AlternativeHypothesis alternative, PValueMethod method, boolean continuityCorrection, double mu)
-
Method Summary
Methods
private double calculateAsymptoticPValue(double u, int n1, int n2, double c) - Calculate the asymptotic p-value using a Normal approximation.
(package private) static double calculateExactPValue(double u, int m, int n, AlternativeHypothesis alternative) - Calculate the exact p-value.
private static double cdf(int u1, int u2, int m, int n, double binom) - Compute the cumulative distribution function of the Mann-Whitney U1 statistic.
private static void checkSamples(double[] x, double[] y) - Ensures that the provided arrays fulfil the assumptions.
private static double computeCdf(int k, int m, int n, double binom) - Compute the cumulative distribution function of the Mann-Whitney U statistic.
private static double[] concatenateSamples(double mu, double[] x, double[] y) - Concatenate the samples into one array.
private static double fmnk(double[][][] f, int m, int n, int k) - Compute f(m; n; k), the number of subsets of {0; 1; ...; n} with m elements such that the elements of this subset add up to k.
private static double[][][] getF(int m, int n, int k) - Gets the storage for f(m, n, k).
private static void initialize(double[] fmn) - Initialize the array for f(m, n, x).
private static double sf(int u1, int u2, int m, int n, double binom) - Compute the survival function of the Mann-Whitney U1 statistic.
double statistic(double[] x, double[] y) - Computes the Mann-Whitney U statistic comparing two independent samples possibly of different length.
test(double[] x, double[] y) - Performs a Mann-Whitney U test comparing the location for two independent samples.
with - Return an instance with the configured alternative hypothesis.
with - Return an instance with the configured continuity correction.
with(PValueMethod v) - Return an instance with the configured p-value method.
static MannWhitneyUTest withDefaults - Return an instance using the default options.
withMu(double v) - Return an instance with the configured location shift mu.
-
Field Details
-
AUTO_LIMIT
private static final int AUTO_LIMIT
Limit on sample size for the exact p-value computation for the auto mode.
-
RANKING
Ranking instance.
-
UNSET
private static final double UNSET
Value for an unset f computation.
-
LOCK
An object to use for synchronization when accessing the cache of F.
-
cacheF
A reference to a previously computed storage for f. Use of a SoftReference ensures this is garbage collected before an OutOfMemoryError. The value should only be accessed, checked for size and optionally modified when holding the lock. When the storage is determined to be the correct size it can be returned for read/write to the array when not holding the lock.
-
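The locking rule described for cacheF can be illustrated with a minimal sketch. The class name SoftTableCache, the method get and the resize policy (allocate a fresh table when any dimension is too small) are assumptions made for illustration; the actual private implementation may differ.

```java
import java.lang.ref.SoftReference;

/** Hypothetical sketch of a soft-referenced table guarded by a lock. */
final class SoftTableCache {
    /** Lock held while the cached reference is read, size-checked or replaced. */
    private static final Object LOCK = new Object();
    /** Soft reference so the garbage collector may reclaim the table under memory pressure. */
    private static SoftReference<double[][][]> cache = new SoftReference<>(null);

    private SoftTableCache() {}

    /** Return a table with dimensions of at least [m+1][n+1][k+1]; requires m, n, k >= 0. */
    static double[][][] get(int m, int n, int k) {
        synchronized (LOCK) {
            double[][][] f = cache.get();
            // Assumed policy: replace the table if it was collected or is too small.
            if (f == null || f.length <= m || f[0].length <= n || f[0][0].length <= k) {
                f = new double[m + 1][n + 1][k + 1];
                cache = new SoftReference<>(f);
            }
            // The caller holds a strong reference from here on and uses the array outside the lock.
            return f;
        }
    }
}
```

Only the lookup and the replacement happen under the lock; the returned array is then read and written without it, as the field description states.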
DEFAULT
Default instance.
-
alternative
Alternative hypothesis.
-
pValueMethod
Method to compute the p-value.
-
continuityCorrection
private final boolean continuityCorrection
Perform continuity correction.
-
mu
private final double mu
Expected location shift.
-
-
Constructor Details
-
MannWhitneyUTest
private MannWhitneyUTest(AlternativeHypothesis alternative, PValueMethod method, boolean continuityCorrection, double mu)
- Parameters:
alternative - Alternative hypothesis.
method - P-value method.
continuityCorrection - true to perform continuity correction.
mu - Expected location shift.
-
-
Method Details
-
withDefaults
Return an instance using the default options.
- Returns:
- default instance
-
with
Return an instance with the configured alternative hypothesis.
- Parameters:
v - Value.
- Returns:
- an instance
-
with
Return an instance with the configured p-value method.
- Parameters:
v - Value.
- Returns:
- an instance
- Throws:
IllegalArgumentException - if the value is not in the allowed options or is null
-
with
Return an instance with the configured continuity correction.
If ENABLED, adjust the U rank statistic by 0.5 towards the mean value when computing the z-statistic if a normal approximation is used to compute the p-value.
- Parameters:
v - Value.
- Returns:
- an instance
-
withMu
Return an instance with the configured location shift mu.
- Parameters:
v - Value.
- Returns:
- an instance
- Throws:
IllegalArgumentException - if the value is not finite
-
statistic
public double statistic(double[] x, double[] y)
Computes the Mann-Whitney U statistic comparing two independent samples possibly of different length.
This statistic can be used to perform a Mann-Whitney U test evaluating the null hypothesis that the two independent samples differ by a location shift of mu.
This returns the U1 statistic. Compute the U2 statistic using:
u2 = (long) x.length * y.length - u1;
- Parameters:
x - First sample values.
y - Second sample values.
- Returns:
- Mann-Whitney U1 statistic
- Throws:
IllegalArgumentException - if x or y are zero-length; or contain NaN values.
-
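As an illustration of the relation between U1 and U2, the following sketch computes U1 by direct pair counting. It assumes the common convention that U1 counts the pairs where (x - mu) exceeds y, with ties counted as one half. The class UStatisticSketch and its methods are hypothetical and are not part of this class, which is documented as working from a ranking of the combined sample.

```java
/** Hypothetical sketch: U1 by pair counting, O(n*m), for small samples only. */
final class UStatisticSketch {
    private UStatisticSketch() {}

    /** Count pairs with (x[i] - mu) > y[j]; a tie contributes 0.5 (assumed convention). */
    static double u1(double[] x, double[] y, double mu) {
        double u = 0;
        for (final double xi : x) {
            final double shifted = xi - mu;
            for (final double yj : y) {
                if (shifted > yj) {
                    u += 1;
                } else if (shifted == yj) {
                    u += 0.5;
                }
            }
        }
        return u;
    }

    /** U2 from U1 using the identity U1 + U2 = n * m; the long cast avoids int overflow. */
    static double u2(double[] x, double[] y, double u1) {
        return (long) x.length * y.length - u1;
    }
}
```

For x = {1, 3, 5} and y = {2, 4} with mu = 0, three of the six pairs have x greater than y, so U1 and U2 are both 3.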
test
Performs a Mann-Whitney U test comparing the location for two independent samples. The location is specified using mu.
The test is defined by the AlternativeHypothesis.
- 'two-sided': the distribution underlying (x - mu) is not equal to the distribution underlying y.
- 'greater': the distribution underlying (x - mu) is stochastically greater than the distribution underlying y.
- 'less': the distribution underlying (x - mu) is stochastically less than the distribution underlying y.
If the p-value method is auto an exact p-value is computed if the samples contain fewer than 50 values; otherwise a normal approximation is used.
Computation of the exact p-value is only valid if there are no tied ranks in the data; otherwise the p-value resorts to the asymptotic approximation using a tie correction and an optional continuity correction.
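The selection rules above can be summarised in a short sketch. It assumes that "fewer than 50 values" applies to the larger of the two samples; the class PValueChoiceSketch, the enum Method and the method useExact are hypothetical names used only for illustration.

```java
/** Hypothetical sketch of the choice between exact and asymptotic p-values. */
final class PValueChoiceSketch {
    /** Mirrors the documented limit for the auto mode. */
    static final int AUTO_LIMIT = 50;

    /** Stand-in for the p-value method options. */
    enum Method { AUTO, EXACT, ASYMPTOTIC }

    private PValueChoiceSketch() {}

    /** Decide whether an exact p-value would be computed for sample sizes n and m. */
    static boolean useExact(Method method, int n, int m, boolean hasTies) {
        if (hasTies) {
            // Exact computation is only valid without tied ranks.
            return false;
        }
        if (method == Method.EXACT) {
            return true;
        }
        // Assumption: the limit is applied to the larger sample.
        return method == Method.AUTO && Math.max(n, m) < AUTO_LIMIT;
    }
}
```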
Note: Exact computation requires tabulation of values not exceeding size (n+1)*(m+1)*(u+1) where u is the minimum of the U1 and U2 statistics and n and m are the sample sizes. This may use a very large amount of memory and result in an OutOfMemoryError. Exact computation requires a finite binomial coefficient binom(n+m, m) which is limited to n+m <= 1029 for any n and m, or min(n, m) <= 37 for any max(n, m). An OutOfMemoryError is not expected using the limits configured for the auto p-value computation as the maximum required memory is approximately 23 MiB.
- Parameters:
x - First sample values.
y - Second sample values.
- Returns:
- test result
- Throws:
IllegalArgumentException - if x or y are zero-length; or contain NaN values.
OutOfMemoryError - if the exact computation is user-requested for large samples and there is not enough memory.
-
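The figure of approximately 23 MiB quoted in the note of the test method can be reproduced with a small calculation. The sketch assumes 8 bytes per double and ignores array header overhead, and it assumes the auto mode permits sample sizes up to 49. Since U1 + U2 = n*m, the minimum u cannot exceed n*m/2. The class ExactTableSizeSketch is hypothetical.

```java
/** Hypothetical sketch: upper bound on the exact-computation table size. */
final class ExactTableSizeSketch {
    private ExactTableSizeSketch() {}

    /** Number of table entries, (n+1)*(m+1)*(u+1), computed in long arithmetic. */
    static long entries(int n, int m, long u) {
        return (n + 1L) * (m + 1L) * (u + 1L);
    }

    /** Approximate size in MiB, assuming 8 bytes per entry and no array overhead. */
    static double mebibytes(long entries) {
        return entries * 8.0 / (1024 * 1024);
    }
}
```

With n = m = 49 the bound on u is floor(49 * 49 / 2) = 1200, giving 50 * 50 * 1201 = 3,002,500 entries, or about 22.9 MiB.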
checkSamples
private static void checkSamples(double[] x, double[] y)
Ensures that the provided arrays fulfil the assumptions.
- Parameters:
x - First sample values.
y - Second sample values.
- Throws:
IllegalArgumentException - if x or y are zero-length.
-
concatenateSamples
private static double[] concatenateSamples(double mu, double[] x, double[] y)
Concatenate the samples into one array. Subtract mu from the first sample.
- Parameters:
mu - Expected difference between means.
x - First sample values.
y - Second sample values.
- Returns:
- concatenated array
-
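A minimal sketch of the concatenation step described for concatenateSamples, assuming the shifted first sample is placed before the second sample. The class ConcatSketch is hypothetical; the private method itself is not accessible.

```java
/** Hypothetical sketch: shift the first sample by mu and append the second. */
final class ConcatSketch {
    private ConcatSketch() {}

    /** Return {x[0]-mu, ..., x[n-1]-mu, y[0], ..., y[m-1]}. */
    static double[] concatenate(double mu, double[] x, double[] y) {
        final double[] z = new double[x.length + y.length];
        for (int i = 0; i < x.length; i++) {
            z[i] = x[i] - mu;
        }
        // The second sample is copied unchanged after the shifted first sample.
        System.arraycopy(y, 0, z, x.length, y.length);
        return z;
    }
}
```

Ranking this single array lets the location shift be tested without modifying the caller's input arrays.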
calculateAsymptoticPValue
private double calculateAsymptoticPValue(double u, int n1, int n2, double c)
Calculate the asymptotic p-value using a Normal approximation.
- Parameters:
u - Mann-Whitney U value.
n1 - Number of subjects in first sample.
n2 - Number of subjects in second sample.
c - Tie-correction
- Returns:
- two-sided asymptotic p-value
-
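The z-statistic behind the normal approximation can be sketched using the textbook formula. This sketch assumes the tie correction c is the sum of (t^3 - t) over groups of t tied ranks, and that the continuity correction moves U by 0.5 towards the mean without crossing it. Both are assumptions about details not stated on this page, and the class AsymptoticSketch is hypothetical. Converting z to a p-value requires a normal distribution function, which is omitted here.

```java
/** Hypothetical sketch: z-statistic for the normal approximation of U. */
final class AsymptoticSketch {
    private AsymptoticSketch() {}

    /**
     * @param u Mann-Whitney U value
     * @param n1 first sample size
     * @param n2 second sample size
     * @param c assumed tie correction: sum of (t^3 - t) over tied groups; 0 if no ties
     * @param continuityCorrection true to move u by 0.5 towards the mean
     */
    static double zStatistic(double u, int n1, int n2, double c, boolean continuityCorrection) {
        final double n = (double) n1 + n2;
        final double mean = 0.5 * n1 * n2;
        // Variance of U under the null hypothesis, reduced when ties are present.
        final double variance = (double) n1 * n2 / 12 * ((n + 1) - c / (n * (n - 1)));
        double diff = u - mean;
        if (continuityCorrection) {
            // Shrink the distance from the mean by 0.5 but never past zero.
            diff = Math.signum(diff) * Math.max(0, Math.abs(diff) - 0.5);
        }
        return diff / Math.sqrt(variance);
    }
}
```

For u = 4 with n1 = n2 = 5 and no ties, the mean is 12.5 and the variance is 25 * 11 / 12, so the corrected statistic is -8 / 4.787, approximately -1.671.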
calculateExactPValue
Calculate the exact p-value. If the value cannot be computed this returns -1.
Note: Computation may run out of memory during array allocation, or method recursion.
- Parameters:
u - Mann-Whitney U value.
m - Number of subjects in first sample.
n - Number of subjects in second sample.
alternative - Alternative hypothesis.
- Returns:
- exact p-value (or -1) (two-sided, greater, or less using the options)
-
cdf
private static double cdf(int u1, int u2, int m, int n, double binom)
Compute the cumulative distribution function of the Mann-Whitney U1 statistic. The U2 statistic is passed for convenience to exploit symmetry in the distribution.
- Parameters:
u1 - Mann-Whitney U1 statistic
u2 - Mann-Whitney U2 statistic
m - First sample size.
n - Second sample size.
binom - binom(n+m, m) (must be finite)
- Returns:
Pr(X <= k)
-
sf
private static double sf(int u1, int u2, int m, int n, double binom)
Compute the survival function of the Mann-Whitney U1 statistic. The U2 statistic is passed for convenience to exploit symmetry in the distribution.
- Parameters:
u1 - Mann-Whitney U1 statistic
u2 - Mann-Whitney U2 statistic
m - First sample size.
n - Second sample size.
binom - binom(n+m, m) (must be finite)
- Returns:
Pr(X > k)
-
computeCdf
private static double computeCdf(int k, int m, int n, double binom)
Compute the cumulative distribution function of the Mann-Whitney U statistic.
This should be called with the lower of U1 or U2 for computational efficiency.
Uses the recursive formula provided in Bucchianico, A.D. (1999) Combinatorics, computer algebra and the Wilcoxon-Mann-Whitney test, Journal of Statistical Planning and Inference, Volume 79, Issue 2, 349-364.
- Parameters:
k - Mann-Whitney U statistic
m - First sample size.
n - Second sample size.
binom - binom(n+m, m) (must be finite)
- Returns:
Pr(X <= k)
-
getF
private static double[][][] getF(int m, int n, int k)
Gets the storage for f(m, n, k).
This may be cached for performance.
- Parameters:
m - M.
n - N.
k - K.
- Returns:
- the storage for f
-
initialize
private static void initialize(double[] fmn)
Initialize the array for f(m, n, x). Set value to 1 for x=0; otherwise UNSET.
- Parameters:
fmn - Array.
-
fmnk
private static double fmnk(double[][][] f, int m, int n, int k)
Compute f(m; n; k), the number of subsets of {0; 1; ...; n} with m elements such that the elements of this subset add up to k.
The function is computed recursively.
- Parameters:
f - Tabulated values of f[m][n][k].
m - M
n - N
k - K
- Returns:
- f(m; n; k)
-
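To illustrate how a tabulated recursion with an UNSET marker yields the exact distribution, the sketch below uses the classic Mann-Whitney counting recurrence count(m, n, k) = count(m-1, n, k-n) + count(m, n-1, k). This is not necessarily the formulation from the cited paper that fmnk implements. The class ExactCountSketch, its methods and the use of long counts are assumptions suited only to small samples.

```java
import java.util.Arrays;

/** Hypothetical sketch: exact distribution of U via a memoised counting recurrence. */
final class ExactCountSketch {
    /** Marker for a table entry that has not been computed yet. */
    private static final long UNSET = -1;

    private ExactCountSketch() {}

    /** Number of orderings of m x-values and n y-values giving U = k. */
    static long count(long[][][] f, int m, int n, int k) {
        if (k < 0) {
            return 0;
        }
        if (m == 0 || n == 0) {
            // With an empty sample the only possible statistic is 0.
            return k == 0 ? 1 : 0;
        }
        if (f[m][n][k] == UNSET) {
            // The largest value is either an x (adds n to U) or a y (adds 0).
            f[m][n][k] = count(f, m - 1, n, k - n) + count(f, m, n - 1, k);
        }
        return f[m][n][k];
    }

    /** Pr(U <= k) for sample sizes m and n; requires k >= 0. */
    static double cdf(int k, int m, int n) {
        final long[][][] f = new long[m + 1][n + 1][k + 1];
        for (final long[][] fm : f) {
            for (final long[] fmn : fm) {
                Arrays.fill(fmn, UNSET);
            }
        }
        // binom(n+m, m) as a running product; exact for small inputs.
        double binom = 1;
        for (int i = 1; i <= m; i++) {
            binom = binom * (n + i) / i;
        }
        double sum = 0;
        for (int j = 0; j <= k; j++) {
            sum += count(f, m, n, j);
        }
        return sum / binom;
    }
}
```

For m = n = 2 the counts for U = 0..4 are 1, 1, 2, 1, 1 out of binom(4, 2) = 6 orderings, so Pr(U <= 1) is 2/6.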