Class Statistics

java.lang.Object
org.apache.sis.math.Statistics
All Implemented Interfaces:
Serializable, Cloneable, DoubleConsumer, LongConsumer
Direct Known Subclasses:
Statistics.WithDelta

public class Statistics extends Object implements DoubleConsumer, LongConsumer, Cloneable, Serializable
Holds some statistics derived from a series of sample values. Given a series of y₀, y₁, y₂, y₃, etc… samples, this class computes the minimum, maximum, mean, root mean square and standard deviation of the given samples.

In addition to the statistics on the sample values, this class can optionally compute statistics on the differences between consecutive sample values, i.e. the statistics on y₁-y₀, y₂-y₁, y₃-y₂, etc. Those statistics can be fetched by a call to differences(). They are useful for verifying whether the interval between sample values is approximately constant.
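As an illustration of the "approximately constant interval" check, the following JDK-only sketch computes the span (maximum minus minimum) of the consecutive differences and compares it with a tolerance. The class name, method names and the tolerance parameter are hypothetical and are not part of the Statistics API.

```java
/** Hypothetical helper, not part of Apache SIS: checks whether values are regularly spaced. */
final class IntervalCheckSketch {
    /** Returns max - min of the consecutive differences, or NaN if there are fewer than 2 values. */
    static double differenceSpan(double[] x) {
        double min = Double.NaN, max = Double.NaN;
        for (int i = 1; i < x.length; i++) {
            double d = x[i] - x[i - 1];
            // Comparisons with NaN are false, so the negated tests also handle the initial state.
            if (!(min <= d)) min = d;
            if (!(max >= d)) max = d;
        }
        return max - min;
    }

    /** Returns true if all consecutive differences agree within the given tolerance. */
    static boolean isApproximatelyRegular(double[] x, double tolerance) {
        return differenceSpan(x) <= tolerance;   // false when the span is NaN
    }
}
```

With Statistics itself, the same information would presumably be read from differences().span().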

If the samples are (at least conceptually) the result of some y=f(x) function for x values increasing or decreasing at a constant interval Δx, then one can get the statistics on the discrete derivatives by a call to differences().scale(1/Δx).
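The relation between differences and discrete derivatives can be sketched without the library: each difference y[i] - y[i-1] is multiplied by 1/Δx before being accumulated. The sketch below uses the JDK's DoubleSummaryStatistics as a stand-in accumulator; the class and method names are hypothetical, and `dx` stands for Δx.

```java
import java.util.DoubleSummaryStatistics;

/** Hypothetical helper, not part of Apache SIS: statistics on first discrete derivatives. */
final class DerivativeSketch {
    /** Accumulates (y[i] - y[i-1]) / dx for all consecutive samples. */
    static DoubleSummaryStatistics derivativeStatistics(double[] y, double dx) {
        double scale = 1 / dx;                          // same factor as in scale(1/Δx)
        DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
        for (int i = 1; i < y.length; i++) {
            stats.accept((y[i] - y[i - 1]) * scale);    // finite difference scaled to a derivative
        }
        return stats;
    }
}
```

For samples of y = 3x + 1 taken every Δx = 0.25, every scaled difference equals the slope 3.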

Statistics are computed on the fly using the Kahan summation algorithm for reducing the numerical errors; the sample values are never stored in memory.
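For readers unfamiliar with Kahan summation, here is a minimal textbook sketch of the technique. It is not the Apache SIS source code; the field name `lowBits` is borrowed from the field summary of this page only to suggest the role it plays, and the helper methods are hypothetical.

```java
/** Textbook Kahan (compensated) summation; a sketch, not the Apache SIS implementation. */
final class KahanSketch {
    private double sum;       // running total
    private double lowBits;   // rounding error committed by the previous addition

    /** Adds one value, compensating for the error of the previous addition. */
    void add(double value) {
        double y = value - lowBits;   // correct the new value by the previous error
        double t = sum + y;           // low-order bits of y may be lost here
        lowBits = (t - sum) - y;      // (what was actually added) - (what should have been added)
        sum = t;
    }

    double sum() { return sum; }

    /** Compensated sum of one large value followed by n copies of a small value. */
    static double kahanSum(double first, double small, int n) {
        KahanSketch k = new KahanSketch();
        k.add(first);
        for (int i = 0; i < n; i++) k.add(small);
        return k.sum();
    }

    /** Plain summation of the same series, for comparison. */
    static double naiveSum(double first, double small, int n) {
        double s = first;
        for (int i = 0; i < n; i++) s += small;
        return s;
    }
}
```

Adding 1e-16 one million times to 1.0 leaves the naive sum at exactly 1.0, because each term is below half the spacing of doubles near 1, while the compensated sum stays close to 1.0000000001.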

An instance of Statistics is initially empty: the count of values is set to zero, and all above-cited statistical values are set to NaN. The statistics are updated every time an accept(double) method is invoked with a non-NaN value.
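The empty state and the NaN handling described above can be illustrated with a small stand-alone accumulator. This is a simplified sketch (no Kahan compensation, hypothetical class name), not the actual class; it only mirrors the documented behaviour that NaN samples are counted separately and that statistics are NaN until a real value is given.

```java
import java.util.function.DoubleConsumer;

/** Simplified stand-in for a running statistics accumulator; not the Apache SIS class. */
final class RunningStatsSketch implements DoubleConsumer {
    private int count, countNaN;
    private double min = Double.NaN, max = Double.NaN;
    private double sum, squareSum;

    @Override public void accept(double sample) {
        if (Double.isNaN(sample)) {
            countNaN++;               // counted, but otherwise ignored
            return;
        }
        // Comparisons with NaN are false, so the negated tests also replace the initial NaN.
        if (!(min <= sample)) min = sample;
        if (!(max >= sample)) max = sample;
        sum += sample;
        squareSum += sample * sample;
        count++;
    }

    int count()      { return count; }
    int countNaN()   { return countNaN; }
    double minimum() { return min; }
    double maximum() { return max; }
    double mean()    { return sum / count; }                  // 0.0/0 gives NaN when empty
    double rms()     { return Math.sqrt(squareSum / count); } // NaN when empty
}
```

Because the sketch implements DoubleConsumer, it can be passed directly to DoubleStream.forEach, which is presumably one reason the real class implements that interface.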

Examples

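The following examples assume that a y=f(x) function is defined. A simple usage creates a Statistics instance, gives each f(x) value to accept(double), then prints the instance. A second usage creates the instance with forSeries(…) so that the statistics on the first and second derivatives are computed in addition to the statistics on the sample values.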
Since:
0.3
Version:
1.2
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    private static final class
    Statistics.WithDelta
    Holds some statistics about the difference between consecutive sample values.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private int
    count
    Number of non-NaN values given to the accept(double) method.
    private int
    countNaN
    Number of NaN values given to the accept(double) method.
    private double
    lowBits
    The low-order bits in last update of sum.
    private double
    maximum
    The maximal value given to the accept(double) method.
    private double
    minimum
    The minimal value given to the accept(double) method.
    private final org.opengis.util.InternationalString
    name
    The name of the phenomenon for which this object is collecting statistics.
    private static final long
    serialVersionUID
    Serial number for compatibility with different versions.
    private double
    squareLowBits
    The low-order bits in last update of squareSum.
    private double
    squareSum
    The sum of squares of all values given to the accept(double) method.
    private double
    sum
    The sum of all values given to the accept(double) method.
  • Constructor Summary

    Constructors
    Constructor
    Description
    Statistics(CharSequence name)
    Constructs an initially empty set of statistics.
    Statistics(CharSequence name, int countNaN, int count, double minimum, double maximum, double mean, double standardDeviation, boolean allPopulation)
    Constructs a set of statistics initialized to the given values.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    accept(double sample)
    Updates statistics for the specified floating-point sample value.
    void
    accept(long sample)
    Updates statistics for the specified integer sample value.
    Statistics
    clone()
    Returns a clone of this statistics.
    void
    combine(Statistics stats)
    Updates statistics with all samples from the specified stats.
    int
    count()
    Returns the number of samples, excluding NaN values.
    int
    countNaN()
    Returns the number of NaN samples.
    (package private) void
    decrementCountNaN()
    For Statistics.WithDelta usage only.
    Statistics
    differences()
    Returns the statistics on the differences between sample values, or null if none.
    boolean
    equals(Object object)
    Compares this statistics with the specified object for equality.
    static Statistics
    forSeries(CharSequence name, CharSequence... differenceNames)
    Constructs a new Statistics object which will also compute finite differences up to the given order.
    int
    hashCode()
    Returns a hash code value for this statistics.
    double
    maximum()
    Returns the maximum sample value, or NaN if none.
    double
    mean()
    Returns the mean value, or NaN if none.
    double
    minimum()
    Returns the minimum sample value, or NaN if none.
    org.opengis.util.InternationalString
    name()
    Returns the name of the phenomenon for which this object is collecting statistics.
    private void
    real(double sample)
    Implementation of accept(double) for real (non-NaN) numbers.
    void
    reset()
    Resets this object state as if it was just created.
    double
    rms()
    Returns the root mean square, or NaN if none.
    void
    scale(double factor)
    Multiplies the statistics by the given factor.
    double
    span()
    Equivalent to maximum - minimum.
    double
    standardDeviation(boolean allPopulation)
    Returns the standard deviation.
    double
    sum()
    Returns the sum, or 0 if none.
    String
    toString()
    Returns a string representation of this statistics.

    Methods inherited from class java.lang.Object

    finalize, getClass, notify, notifyAll, wait, wait, wait

    Methods inherited from interface java.util.function.DoubleConsumer

    andThen

    Methods inherited from interface java.util.function.LongConsumer

    andThen
  • Field Details

    • serialVersionUID

      private static final long serialVersionUID
      Serial number for compatibility with different versions.
    • name

      private final org.opengis.util.InternationalString name
      The name of the phenomenon for which this object is collecting statistics. If non-null, then this name will be shown as column header in the table formatted by StatisticsFormat.
    • minimum

      private double minimum
      The minimal value given to the accept(double) method.
    • maximum

      private double maximum
      The maximal value given to the accept(double) method.
    • sum

      private double sum
      The sum of all values given to the accept(double) method.
    • squareSum

      private double squareSum
      The sum of squares of all values given to the accept(double) method.
    • lowBits

      private transient double lowBits
      The low-order bits in last update of sum. This is used for the Kahan summation algorithm.
    • squareLowBits

      private transient double squareLowBits
      The low-order bits in last update of squareSum. This is used for the Kahan summation algorithm.
    • count

      private int count
      Number of non-NaN values given to the accept(double) method.
    • countNaN

      private int countNaN
      Number of NaN values given to the accept(double) method. Those values are ignored in the computation of all above values.
  • Constructor Details

    • Statistics

      public Statistics(CharSequence name)
      Constructs an initially empty set of statistics. The count() and the sum() are initialized to zero and all other statistical values are initialized to Double.NaN.

      Instances created by this constructor do not compute differences between sample values. If differences or discrete derivatives are wanted, use the forSeries(…) method instead.

      Parameters:
      name - the phenomenon for which this object is collecting statistics, or null if none. If non-null, it will be shown as column header in the table formatted by StatisticsFormat.
    • Statistics

      public Statistics(CharSequence name, int countNaN, int count, double minimum, double maximum, double mean, double standardDeviation, boolean allPopulation)
      Constructs a set of statistics initialized to the given values. The countNaN and count arguments must be zero or positive. If count is 0, all following double arguments are ignored. Otherwise the following restrictions apply:
      • minimum and maximum arguments are mandatory and cannot be NaN.
      • mean argument is mandatory (cannot be NaN) if standardDeviation is not NaN.
      • mean and standardDeviation arguments can be both NaN if unknown, but statistics initialized that way will always return NaN from sum(), mean(), rms() and standardDeviation(boolean) methods.
      Parameters:
      name - the phenomenon for which this object is collecting statistics, or null if none.
      countNaN - the number of NaN samples.
      count - the number of samples, excluding NaN values.
      minimum - the minimum sample value. Ignored if count is zero.
      maximum - the maximum sample value. Ignored if count is zero.
      mean - the mean value. Ignored if count is zero.
      standardDeviation - the standard deviation. Ignored if count is zero.
      allPopulation - true if sample values were the totality of the population under study, or false if they were only a sampling.
      Since:
      1.2
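Since the class stores a sum and a sum of squares rather than a mean and a standard deviation, a constructor of this kind has to convert between the two representations. The sketch below shows one way to do that from the usual variance formula; it is an assumption about the arithmetic involved, not a copy of the SIS constructor, and the class and method names are hypothetical.

```java
/** Hypothetical conversion from (mean, standard deviation) to (sum, sum of squares). */
final class FromSummarySketch {
    /** sum = mean × count. */
    static double sum(int count, double mean) {
        return mean * count;
    }

    /**
     * Inverts σ² = (Σx² - (Σx)²/n) / d, where d = n for a whole population
     * and d = n-1 for a sampling, to recover Σx².
     */
    static double squareSum(int count, double mean, double standardDeviation, boolean allPopulation) {
        double sum = mean * count;
        int divisor = allPopulation ? count : count - 1;
        return standardDeviation * standardDeviation * divisor + sum * sum / count;
    }
}
```

For the eight values 2, 4, 4, 4, 5, 5, 7, 9 (mean 5, population standard deviation 2), this gives a sum of 40 and a sum of squares of 232, which matches the direct computation.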
  • Method Details

    • forSeries

      public static Statistics forSeries(CharSequence name, CharSequence... differenceNames)
      Constructs a new Statistics object which will also compute finite differences up to the given order. If the values to be given to the accept(…) methods are the y values of some y=f(x) function for x values increasing or decreasing at a constant interval Δx, then the finite differences are proportional to discrete derivatives.

      The Statistics object created by this method knows nothing about the Δx interval. In order to get the discrete derivatives, differences().scale(1/Δx) needs to be invoked after all sample values have been added.

      The maximal "derivative" order is determined by the length of the differenceNames array:
      • 0 if no differences are needed (equivalent to direct instantiation of a new Statistics object).
      • 1 for computing the statistics on the differences between consecutive samples (proportional to the statistics on the first discrete derivatives) in addition to the sample statistics.
      • 2 for computing also the statistics on the differences between consecutive differences (proportional to the statistics on the second discrete derivatives) in addition to the above.
      • etc.
      Parameters:
      name - the phenomenon for which this object is collecting statistics, or null if none. If non-null, then this name will be shown as column header in the table formatted by StatisticsFormat.
      differenceNames - the names of the statistics on differences. The given array cannot be null, but can contain null elements.
      Returns:
      the newly constructed, initially empty, set of statistics.
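The idea of computing finite differences "up to the given order" while samples arrive one at a time can be sketched as a chain of accumulators, each one forwarding the difference from its previous sample to the next accumulator. This is a simplified illustration with hypothetical names; it assumes no NaN samples and tracks only the count and the mean.

```java
import java.util.function.DoubleConsumer;

/** Sketch of a chain of difference accumulators; not the Apache SIS implementation. */
final class DifferenceChainSketch implements DoubleConsumer {
    private final DifferenceChainSketch delta;   // statistics on differences, or null if none
    private double last = Double.NaN;            // previous sample, NaN before the first one
    private int count;
    private double sum;

    /** Creates a chain computing differences up to the given order (0 for none). */
    DifferenceChainSketch(int order) {
        delta = (order > 0) ? new DifferenceChainSketch(order - 1) : null;
    }

    @Override public void accept(double sample) {
        count++;
        sum += sample;
        if (delta != null) {
            if (!Double.isNaN(last)) {
                delta.accept(sample - last);     // difference between consecutive samples
            }
            last = sample;
        }
    }

    DifferenceChainSketch differences() { return delta; }
    int count()   { return count; }
    double mean() { return sum / count; }
}
```

Feeding 0, 1, 4, 9, 16 (x² at integer x) to a chain of order 2 yields first differences 1, 3, 5, 7 and second differences 2, 2, 2; a third call to differences() returns null, as with an exhausted differenceNames array.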
    • name

      public org.opengis.util.InternationalString name()
      Returns the name of the phenomenon for which this object is collecting statistics. If non-null, then this name will be shown as column header in the table formatted by StatisticsFormat.
      Returns:
      the phenomenon for which this object is collecting statistics, or null if none.
    • reset

      public void reset()
      Resets this object state as if it was just created. The count() and the sum() are set to zero and all other statistical values are set to Double.NaN.
    • accept

      public void accept(double sample)
      Updates statistics for the specified floating-point sample value. NaN values increment the NaN count, but are otherwise ignored.
      Specified by:
      accept in interface DoubleConsumer
      Parameters:
      sample - the sample value (may be NaN).
    • real

      private void real(double sample)
      Implementation of accept(double) for real (non-NaN) numbers.
    • accept

      public void accept(long sample)
      Updates statistics for the specified integer sample value. For very large integer values (greater than 2⁵² in magnitude), this method may be more accurate than the accept(double) version.
      Specified by:
      accept in interface LongConsumer
      Parameters:
      sample - the sample value.
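The reason a dedicated long overload can be more accurate is that a double has a 53-bit significand, so large long values are rounded when converted. The following sketch only demonstrates that rounding with plain JDK casts; how accept(long) actually makes use of the lost bits is not described on this page, and the class and method names are hypothetical.

```java
/** Demonstrates the precision lost when a large long is converted to double. */
final class LongPrecisionSketch {
    /** Converts to double and back; differs from the argument when rounding occurred. */
    static long roundTrip(long value) {
        return (long) (double) value;
    }

    /** Returns the part of the value that the conversion to double dropped. */
    static long lostPart(long value) {
        return value - roundTrip(value);
    }
}
```

For example 2⁵³ + 1 is rounded to 2⁵³, so one unit is lost, while 2⁵² converts exactly.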
    • combine

      public void combine(Statistics stats)
      Updates statistics with all samples from the specified stats. Invoking this method is equivalent (except for rounding errors) to invoking accept(…) for all samples that were added to stats.
      Parameters:
      stats - the statistics to be added to this.
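Merging two sets of statistics without access to the original samples is possible because counts, sums and sums of squares are additive, and the extrema combine with min and max. The sketch below illustrates this under simplifying assumptions (no NaN samples, no Kahan compensation, hypothetical class name); it is not the SIS code.

```java
/** Sketch of combining two accumulators; not the Apache SIS implementation. */
final class CombineSketch {
    int count;
    double min = Double.NaN, max = Double.NaN, sum, squareSum;

    void accept(double sample) {
        if (!(min <= sample)) min = sample;   // also replaces the initial NaN
        if (!(max >= sample)) max = sample;
        sum += sample;
        squareSum += sample * sample;
        count++;
    }

    /** Adds the content of the other accumulator to this one. */
    void combine(CombineSketch other) {
        if (other.count == 0) return;         // nothing to add; avoids copying NaN extrema
        if (!(min <= other.min)) min = other.min;
        if (!(max >= other.max)) max = other.max;
        sum += other.sum;
        squareSum += other.squareSum;
        count += other.count;
    }

    double mean() { return sum / count; }
}
```

Combining the statistics of {1, 2} with those of {3, 4} gives the same count, extrema and mean as accepting 1, 2, 3, 4 on a single instance.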
    • scale

      public void scale(double factor)
      Multiplies the statistics by the given factor. The given scale factor is also applied recursively on the differences statistics, if any. Invoking this method transforms the statistics as if every value given to the accept(…) methods had been first multiplied by the given factor.

      This method is useful for computing discrete derivatives from the differences between sample values. See differences() or forSeries(…) for more information.

      Parameters:
      factor - the factor by which to multiply the statistics.
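Scaling "as if every value had been multiplied" can be done on the accumulated quantities alone: the extrema and the sum are multiplied by the factor, the sum of squares by its square, and a negative factor exchanges minimum and maximum. The sketch below illustrates that arithmetic with hypothetical names; it ignores NaN samples and nested differences, and is not taken from the SIS source.

```java
/** Sketch of scaling accumulated statistics; not the Apache SIS implementation. */
final class ScaleSketch {
    int count;
    double min = Double.NaN, max = Double.NaN, sum, squareSum;

    void accept(double sample) {
        if (!(min <= sample)) min = sample;   // also replaces the initial NaN
        if (!(max >= sample)) max = sample;
        sum += sample;
        squareSum += sample * sample;
        count++;
    }

    /** Transforms the statistics as if every accepted value had been multiplied by the factor. */
    void scale(double factor) {
        min *= factor;
        max *= factor;
        sum *= factor;
        squareSum *= factor * factor;         // squares scale by factor²
        if (factor < 0) {                     // a negative factor reverses the ordering
            double tmp = min;
            min = max;
            max = tmp;
        }
    }

    double mean() { return sum / count; }
}
```

After accepting 1, 2, 3 and scaling by -2, the sketch reports a minimum of -6, a maximum of -2 and a mean of -4, the same as for the samples -2, -4, -6.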
    • decrementCountNaN

      void decrementCountNaN()
      For Statistics.WithDelta usage only.
    • countNaN

      public int countNaN()
      Returns the number of NaN samples. NaN samples are ignored in all other statistical computations. This method counts them for information purposes only.
      Returns:
      the number of NaN values.
    • count

      public int count()
      Returns the number of samples, excluding NaN values.
      Returns:
      the number of sample values, excluding NaN.
    • minimum

      public double minimum()
      Returns the minimum sample value, or NaN if none.
      Returns:
      the minimum sample value, or NaN if none.
    • maximum

      public double maximum()
      Returns the maximum sample value, or NaN if none.
      Returns:
      the maximum sample value, or NaN if none.
    • span

      public double span()
      Equivalent to maximum - minimum. If no samples were added, then returns NaN.
      Returns:
      the span of sample values, or NaN if none.
    • sum

      public double sum()
      Returns the sum, or 0 if none. May also be NaN if that value was explicitly specified to the constructor.
      Returns:
      the sum, or 0 if none.
    • mean

      public double mean()
      Returns the mean value, or NaN if none.
      Returns:
      the mean value, or NaN if none.
    • rms

      public double rms()
      Returns the root mean square, or NaN if none.
      Returns:
      the root mean square, or NaN if none.
    • standardDeviation

      public double standardDeviation(boolean allPopulation)
      Returns the standard deviation. If the sample values given to the accept(…) methods have a uniform distribution, then the returned value should be close to sqrt(span² / 12). If they have a Gaussian distribution (which is the most common case), then the returned value is related to the error function.

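      As a reminder, the table below gives probabilities as a function of n, assuming that the distribution is Gaussian (first column) or uniform (second column). The uniform column is the probability for a sample value to be inside the mean ± n × deviation range. The Gaussian column corresponds to the cumulative probability for a sample value to be below mean + n × deviation; the probabilities of being inside the mean ± n × deviation range for a Gaussian distribution are lower (about 38.3%, 68.3%, 86.6%, 95.4% and 99.7% for the n values listed).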

      Probability values for some standard deviations
      n     Gaussian   uniform
      0.5   69.1%      28.9%
      1.0   84.2%      57.7%
      1.5   93.3%      86.6%
      2.0   97.7%      100%
      3.0   99.9%      100%
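The uniform column can be reproduced from the sqrt(span² / 12) relation given above: the mean ± n × deviation range has a width of 2n × span / √12, so the fraction of a uniform distribution that it covers is 2n / √12, capped at 100%. The class and method names in this sketch are hypothetical.

```java
/** Reproduces the "uniform" column of the table; hypothetical helper class. */
final class UniformCoverageSketch {
    /** Probability for a uniformly distributed value to lie within mean ± n × deviation. */
    static double probabilityWithin(double n) {
        return Math.min(1.0, 2 * n / Math.sqrt(12));
    }
}
```

Evaluating the method at n = 0.5, 1.0 and 1.5 gives about 0.289, 0.577 and 0.866; from n = 2.0 onward the range covers the whole span.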
      Parameters:
      allPopulation - true if sample values given to accept(…) methods were the totality of the population under study, or false if they were only a sampling.
      Returns:
      the standard deviation.
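The effect of the allPopulation argument can be shown with the usual formula based on a sum and a sum of squares: the divisor is n for a whole population and n-1 for a sampling. The sketch below works on an array for brevity and uses hypothetical names; it illustrates the formula and is not the SIS implementation.

```java
/** Standard deviation from a sum and a sum of squares; a sketch with hypothetical names. */
final class StdDevSketch {
    static double standardDeviation(double[] values, boolean allPopulation) {
        double sum = 0, squareSum = 0;
        for (double v : values) {
            sum += v;
            squareSum += v * v;
        }
        int n = values.length;
        int divisor = allPopulation ? n : n - 1;   // n-1 corrects the bias of a sampling
        // Σ(x - mean)² = Σx² - (Σx)²/n; the result is NaN for an empty array.
        return Math.sqrt((squareSum - sum * sum / n) / divisor);
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the population standard deviation is 2, while the sampling variant gives sqrt(32/7), slightly more than 2.13.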
    • differences

      public Statistics differences()
      Returns the statistics on the differences between sample values, or null if none. For example if the sample values given to the accept(…) methods were y₀, y₁, y₂ and y₃, then this method returns statistics on y₁-y₀, y₂-y₁ and y₃-y₂.

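      The differences between sample values are related to the discrete derivatives by the factor 1/Δx, where Δx is the constant interval between the x values of the y=f(x) function: invoking scale(1/Δx) on the returned statistics converts the statistics on differences into statistics on discrete derivatives.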

      This method returns a non-null value only if this Statistics instance has been created by a call to the forSeries(…) method with a non-empty differenceNames array. More generally, calls to this method can be chained up to differenceNames.length times for fetching second or higher order derivatives.
      Returns:
      the statistics on the differences between consecutive sample values, or null if not calculated by this object.
    • toString

      public String toString()
      Returns a string representation of this statistics. This string will span multiple lines, one for each statistical value.
      Overrides:
      toString in class Object
      Returns:
      a string representation of this statistics object.
    • clone

      public Statistics clone()
      Returns a clone of this statistics.
      Overrides:
      clone in class Object
      Returns:
      a clone of this statistics.
    • hashCode

      public int hashCode()
      Returns a hash code value for this statistics.
      Overrides:
      hashCode in class Object
    • equals

      public boolean equals(Object object)
      Compares this statistics with the specified object for equality.
      Overrides:
      equals in class Object
      Parameters:
      object - the object to compare with.
      Returns:
      true if both objects are equal.