Package cern.jet.stat.quantile
Scalable algorithms and data structures to compute approximate quantiles over very large data sequences.
The approximation guarantees are explicit, and apply for arbitrary value distributions and arrival distributions of the dataset.
The main memory requirements are smaller than for any other known technique by an order of magnitude.
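To make "explicit approximation guarantee" concrete, the sketch below computes an exact quantile by indexing into sorted data and checks whether a candidate value lies within a rank tolerance of the target. It assumes the usual rank-based reading of the guarantee: an answer to a phi-quantile query is acceptable if its rank is within epsilon * N of phi * N. The class and method names (`RankErrorDemo`, `exactQuantile`, `isEpsilonApproximate`) are hypothetical and are not part of this package.

```java
/** Illustrative only: exact quantile on sorted data plus a rank-error check. */
final class RankErrorDemo {

    /** Returns the element of 1-based rank ceil(phi * n) in an ascending-sorted array. */
    static double exactQuantile(double[] sorted, double phi) {
        int n = sorted.length;
        int rank = (int) Math.ceil(phi * n);     // 1-based target rank
        rank = Math.max(1, Math.min(n, rank));   // clamp so phi = 0 and phi = 1 stay in range
        return sorted[rank - 1];
    }

    /**
     * True if the candidate occurs in the data and at least one of the ranks it
     * occupies lies within epsilon * n of the target rank phi * n.
     */
    static boolean isEpsilonApproximate(double[] sorted, double candidate,
                                        double phi, double epsilon) {
        int n = sorted.length;
        int less = 0;         // number of elements strictly smaller than the candidate
        int lessOrEqual = 0;  // number of elements smaller than or equal to the candidate
        for (double v : sorted) {
            if (v < candidate) less++;
            if (v <= candidate) lessOrEqual++;
        }
        if (lessOrEqual == less) return false;   // candidate is not an element of the data
        double lowestAllowed = (phi - epsilon) * n;
        double highestAllowed = (phi + epsilon) * n;
        // The candidate occupies ranks less + 1 .. lessOrEqual (1-based).
        return (less + 1) <= highestAllowed && lessOrEqual >= lowestAllowed;
    }

    public static void main(String[] args) {
        double[] data = new double[1000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;  // 1..1000, already sorted
        System.out.println(exactQuantile(data, 0.5));
        System.out.println(isEpsilonApproximate(data, 505.0, 0.5, 0.01));
        System.out.println(isEpsilonApproximate(data, 520.0, 0.5, 0.01));
    }
}
```

With 1000 values and epsilon = 0.01, any element whose rank is between 490 and 510 counts as an acceptable median under this definition; 505 qualifies and 520 does not.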
The approx. algorithms are primarily intended to help applications scale. When faced with a large data sequence, traditional methods need either very large memories or time-consuming disk-based sorting. In contrast, the approx. algorithms can deal with > 10^10 values without disk-based sorting.
All classes can be seen from various angles, for example as
1. Algorithm to compute quantiles.
2. 1-dim-equi-depth histogram.
3. 1-dim-histogram arbitrarily rebinnable in real-time.
4. A space efficient MultiSet data structure using lossy compression.
5. A space efficient value preserving bin of a 2-dim or d-dim histogram.
(All subject to an accuracy specified by the user.)
Have a look at the documentation of class QuantileFinderFactory and the interface DoubleQuantileFinder to learn more.
Most users will never need to know more than how to use these.
Actual implementations of the DoubleQuantileFinder interface are hidden.
They are constructed indirectly via the factory.
Also see QuantileBin1D, which demonstrates how this package can be used.
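The "1-dim-equi-depth histogram" view follows directly from quantiles: if the bin boundaries are the 0, 1/k, 2/k, ..., 1 quantiles, every bin holds roughly the same number of values. The sketch below shows this with an exact, sort-based computation. It is not the package's approximate algorithm, and the names `EquiDepthDemo` and `equiDepthBoundaries` are hypothetical.

```java
import java.util.Arrays;

/** Illustrative only: equi-depth bin boundaries derived from exact quantiles. */
final class EquiDepthDemo {

    /** Returns k + 1 boundaries so that each of the k bins holds about n / k values. */
    static double[] equiDepthBoundaries(double[] values, int k) {
        if (values.length == 0 || k < 1) {
            throw new IllegalArgumentException("need at least one value and one bin");
        }
        double[] sorted = values.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int n = sorted.length;
        double[] bounds = new double[k + 1];
        for (int i = 0; i <= k; i++) {
            // Boundary i is the (i / k)-quantile; the last index is clamped to n - 1.
            int idx = (int) Math.min(n - 1, (long) i * n / k);
            bounds[i] = sorted[idx];
        }
        return bounds;
    }

    public static void main(String[] args) {
        double[] skewed = new double[100];
        for (int i = 0; i < skewed.length; i++) {
            skewed[i] = (double) (i + 1) * (i + 1);   // 1, 4, 9, ..., 10000
        }
        System.out.println(Arrays.toString(equiDepthBoundaries(skewed, 4)));
    }
}
```

On skewed data the bins get wider toward the tail while each still covers about a quarter of the values. An approximate quantile finder would replace the sort here and return boundaries that are correct up to the chosen accuracy.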
Class Description
A buffer holding elements; internally used for computing approximate quantiles.
An abstract set of buffers; internally used for computing approximate quantiles.
A buffer holding double elements; internally used for computing approximate quantiles.
A set of buffers holding double elements; internally used for computing approximate quantiles.
The abstract base class for approximate quantile finders computing quantiles over a sequence of double elements.
The interface shared by all quantile finders, no matter if they are exact or approximate.
Read-only equi-depth histogram for selectivity estimation.
Exact quantile finding algorithm for known and unknown N requiring large main memory; computes quantiles over a sequence of double elements.
Approximate quantile finding algorithm for known N requiring only one pass and little main memory; computes quantiles over a sequence of double elements.
A class to test the QuantileBin1D code.
Computes b and k for various parameters.
Factory constructing exact and approximate quantile finders for both known and unknown N.
A class holding test cases for exact and approximate quantile finders.
Approximate quantile finding algorithm for unknown N requiring only one pass and little main memory; computes quantiles over a sequence of double elements.
Holds some utility methods shared by different quantile finding implementations.
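The phrase "only one pass and little main memory" can be illustrated with a much simpler technique than the one this package implements. The sketch below uses plain reservoir sampling (Algorithm R): it keeps a fixed-size uniform sample of the stream and answers quantile queries from that sample. This is a swapped-in stand-in chosen for brevity. It gives only probabilistic accuracy and is not the buffer-based algorithm described in the class summaries. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative only: one-pass quantile estimate from a fixed-size uniform sample. */
final class ReservoirQuantileDemo {
    private final double[] reservoir;
    private final Random random;
    private long seen = 0;   // number of values consumed so far

    ReservoirQuantileDemo(int capacity, long seed) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be positive");
        this.reservoir = new double[capacity];
        this.random = new Random(seed);
    }

    /** Consumes one value; memory use stays fixed at the reservoir capacity. */
    void add(double value) {
        if (seen < reservoir.length) {
            reservoir[(int) seen] = value;          // fill phase
        } else {
            // Algorithm R: draw j uniformly from 0..seen and replace slot j if it exists.
            long j = (long) (random.nextDouble() * (seen + 1));
            if (j < reservoir.length) reservoir[(int) j] = value;
        }
        seen++;
    }

    /** Estimates the phi-quantile from the current sample. */
    double quantile(double phi) {
        int size = (int) Math.min(seen, reservoir.length);
        if (size == 0) throw new IllegalStateException("no values added");
        double[] sample = Arrays.copyOf(reservoir, size);
        Arrays.sort(sample);                        // sorts the small sample, not the stream
        int idx = (int) Math.min(size - 1, Math.floor(phi * size));
        return sample[idx];
    }

    public static void main(String[] args) {
        ReservoirQuantileDemo finder = new ReservoirQuantileDemo(1000, 42L);
        for (int i = 1; i <= 100_000; i++) finder.add(i);
        System.out.println("estimated median: " + finder.quantile(0.5));
    }
}
```

Only the 1000-element sample is ever sorted, so memory stays constant however long the stream is. The price is that the error bound holds only with high probability, whereas the finders in this package let the user specify the accuracy up front.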