Class KolmogorovSmirnovDistribution.One

java.lang.Object
org.apache.commons.statistics.inference.KolmogorovSmirnovDistribution.One
Enclosing class:
KolmogorovSmirnovDistribution

static final class KolmogorovSmirnovDistribution.One extends Object
Computes the complementary probability P[D_n^+ >= x] for the one-sided one-sample Kolmogorov-Smirnov distribution.
 D_n^+ = sup_x {CDF_n(x) - F(x)}
 

where n is the sample size; CDF_n(x) is an empirical cumulative distribution function; and F(x) is the expected distribution. The computation uses Smirnov's stable formula:

                   floor(n(1-x)) (n) ( j     ) (j-1)  (         j ) (n-j)
 P[D_n^+ >= x] = x     Sum       ( ) ( - + x )        ( 1 - x - - )
                       j=0       (j) ( n     )        (         n )
 

Computing using logs is not as accurate as direct multiplication when n is large. However the terms are very large and small. Multiplication uses a scaled representation with a separate exponent term to support the extreme range. Extended precision representation of the numbers reduces the error in the power terms. Details in van Mulbregt (2018).

References:

  1. van Mulbregt, P. (2018). Computing the Cumulative Distribution Function and Quantiles of the One-sided Kolmogorov-Smirnov Statistic arxiv:1802.06966.
  2. Magg & Dicaire (1971). On Kolmogorov-Smirnov Type One-Sample Statistics Biometrika 58.3 pp. 653–656.
Since:
1.1
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    (package private) static interface 
    Defines a scaled power function.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    Proxy for the default choice of the scaled power function.
    private static final int
    Maximum number of term for the Smirnov-Dwass algorithm.
    private static final int
    Minimum sample size for the Smirnov-Dwass algorithm.
    private static final int
    Number of bits of precision in the sum of terms Aj.
    private static final int
    Number of bits of precision in the sum of terms Aj.
    private static final int
    "Very large" n to use a asymptotic limiting form.
  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
    private
    One()
    No instances.
  • Method Summary

    Modifier and Type
    Method
    Description
    private static int
    log2(int n)
    Returns floor(log2(n)).
    (package private) static double
    sf(double x, int n)
    Calculates complementary probability P[D_n^+ >= x], or survival function (SF), for the one-sided one-sample Kolmogorov-Smirnov distribution.
    (package private) static double
    Calculates complementary probability P[D_n^+ >= x], or survival function (SF), for the one-sided one-sample Kolmogorov-Smirnov distribution.
    private static double
    sfAsymptotic(double x, int n)
    Calculates complementary probability P[D_n^+ >= x], or survival function (SF), for the one-sided one-sample Kolmogorov-Smirnov distribution.
    private static double
    sfExact(double x, int n)
    Calculates exact cases for the complementary probability P[D_n^+ >= x] the one-sided one-sample Kolmogorov-Smirnov distribution.
    (package private) static int
    splitX(int n, double x, double[] alpha)
    Compute exactly x = (k + alpha) / n with k an integer and alpha in [0, 1).

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • VERY_LARGE_N

      private static final int VERY_LARGE_N
      "Very large" n to use a asymptotic limiting form. [1] suggests 1e12 but this is reduced to avoid excess computation time.
      See Also:
    • SD_MAX_TERMS

      private static final int SD_MAX_TERMS
      Maximum number of term for the Smirnov-Dwass algorithm.
      See Also:
    • SD_MIN_N

      private static final int SD_MIN_N
      Minimum sample size for the Smirnov-Dwass algorithm.
      See Also:
    • SUM_PRECISION_BITS

      private static final int SUM_PRECISION_BITS
      Number of bits of precision in the sum of terms Aj. This does not have to be the full 106 bits of a double-double as the final result is used as a double. The terms are represented as fractions with an exponent:
        Aj = 2^b * f
        f of sum(A) in [0.5, 1)
        f of Aj in [0.25, 2]
       

      The terms can be added if their exponents overlap. The bits of precision must account for the extra range of the fractional part of Aj by 1 bit. Note that additional bits are added to this dynamically based on the number of terms.

      See Also:
    • SD_SUM_PRECISION_BITS

      private static final int SD_SUM_PRECISION_BITS
      Number of bits of precision in the sum of terms Aj. For Smirnov-Dwass we use the full 106 bits of a double-double due to the summation of terms that cancel. Account for the extra range of the fractional part of Aj by 1 bit.
      See Also:
    • POWER_DEFAULT

      private static final KolmogorovSmirnovDistribution.One.ScaledPower POWER_DEFAULT
      Proxy for the default choice of the scaled power function. The actual choice is based on the chosen algorithm.
  • Constructor Details

    • One

      private One()
      No instances.
  • Method Details

    • sf

      static double sf(double x, int n)
      Calculates complementary probability P[D_n^+ >= x], or survival function (SF), for the one-sided one-sample Kolmogorov-Smirnov distribution.
      Parameters:
      x - Statistic.
      n - Sample size (assumed to be positive).
      Returns:
      \(P(D_n^+ ≥ x)\)
    • sfExact

      private static double sfExact(double x, int n)
      Calculates exact cases for the complementary probability P[D_n^+ >= x] the one-sided one-sample Kolmogorov-Smirnov distribution.

      Exact cases handle x not in [0, 1]. It is assumed n is positive.

      Parameters:
      x - Statistic.
      n - Sample size (assumed to be positive).
      Returns:
      \(P(D_n^+ ≥ x)\)
    • sfAsymptotic

      private static double sfAsymptotic(double x, int n)
      Calculates complementary probability P[D_n^+ >= x], or survival function (SF), for the one-sided one-sample Kolmogorov-Smirnov distribution.

      Computes the result using the asymptotic formula Eq 5 in [1].

      Parameters:
      x - Statistic.
      n - Sample size (assumed to be positive).
      Returns:
      \(P(D_n^+ ≥ x)\)
    • sf

      static double sf(double x, int n, KolmogorovSmirnovDistribution.One.ScaledPower power)
      Calculates complementary probability P[D_n^+ >= x], or survival function (SF), for the one-sided one-sample Kolmogorov-Smirnov distribution.

      Computes the result using double-double arithmetic. The power function can use a fast approximation or a full power computation.

      This function is safe for x > 1/n. When x approaches sub-normal then division or multiplication by x can under/overflow. The case of x < 1/n can be computed in sfExact.

      Parameters:
      x - Statistic (typically in (1/n, 1 - 1/n)).
      n - Sample size (assumed to be positive).
      power - Function to compute the scaled power (can be null).
      Returns:
      \(P(D_n^+ ≥ x)\)
      See Also:
      • DD.pow(int, long[])
      • DDMath.pow(DD, int, long[])
    • splitX

      static int splitX(int n, double x, double[] alpha)
      Compute exactly x = (k + alpha) / n with k an integer and alpha in [0, 1). Note that k ~ floor(nx) but may be rounded up if alpha -> 1 within working precision.

      This computation is a significant source of increased error if performed in 64-bit arithmetic. Although the value alpha is only used for the PDF computation a value of alpha == 0 indicates the final term of the SF summation can be dropped due to the cancellation of a power term (x + j/n) to zero with x = (n-j)/n. That is if alpha == 0 then x is the fraction k/n and one Aj term is zero.

      Parameters:
      n - Sample size.
      x - Statistic.
      alpha - Output alpha.
      Returns:
      k
    • log2

      private static int log2(int n)
      Returns floor(log2(n)).
      Parameters:
      n - Value.
      Returns:
      approximate log2(n)