Package cern.jet.stat.quantile
Class QuantileCalc
- java.lang.Object
-
- cern.jet.stat.quantile.QuantileCalc
-
class QuantileCalc extends java.lang.Object
Computes b and k vor various parameters.
-
-
Constructor Summary
Constructors Constructor Description QuantileCalc()
-
Method Summary
All Methods Static Methods Concrete Methods Modifier and Type Method Description static double
binomial(long n, long k)
Efficiently computes the binomial coefficient, often also referred to as "n over k" or "n choose k".static long
ceiling(double value)
Returns the smallestlong >= value
.static long[]
known_N_compute_B_and_K(long N, double epsilon, double delta, int quantiles, double[] returnSamplingRate)
Computes the number of buffers and number of values per buffer such that quantiles can be determined with an approximation error no more than epsilon with a certain probability.protected static long[]
known_N_compute_B_and_K_quick(long N, double epsilon)
Computes the number of buffers and number of values per buffer such that quantiles can be determined with a guaranteed approximation error no more than epsilon.protected static long[]
known_N_compute_B_and_K_slow(long N, double epsilon, double delta, int quantiles, double[] returnSamplingRate)
Computes the number of buffers and number of values per buffer such that quantiles can be determined with an approximation error no more than epsilon with a certain probability.static void
main(java.lang.String[] args)
static void
test_B_and_K_Calculation(java.lang.String[] args)
Computes b and k for different parameters.static long[]
unknown_N_compute_B_and_K(double epsilon, double delta, int quantiles)
Computes the number of buffers and number of values per buffer such that quantiles can be determined with an approximation error no more than epsilon with a certain probability.
-
-
-
Method Detail
-
binomial
public static double binomial(long n, long k)
Efficiently computes the binomial coefficient, often also referred to as "n over k" or "n choose k". The binomial coefficient is defined as n!/((n-k)!*k!). Tries to avoid numeric overflows.- Returns:
- the binomial coefficient.
-
ceiling
public static long ceiling(double value)
Returns the smallestlong >= value
.- Examples:
1.0 -> 1, 1.2 -> 2, 1.9 -> 2
. This method is safer than using (long) Math.ceil(value), because of possible rounding error. - Examples:
-
known_N_compute_B_and_K
public static long[] known_N_compute_B_and_K(long N, double epsilon, double delta, int quantiles, double[] returnSamplingRate)
Computes the number of buffers and number of values per buffer such that quantiles can be determined with an approximation error no more than epsilon with a certain probability. Assumes that quantiles are to be computed over N values. The required sampling rate is computed and stored in the first element of the provided returnSamplingRate array, which, therefore must be at least of length 1.- Parameters:
N
- the number of values over which quantiles shall be computed (e.g 10^6).epsilon
- the approximation error which is guaranteed not to be exceeded (e.g. 0.001) (0 <= epsilon <= 1). To get exact result, set epsilon=0.0;delta
- the probability that the approximation error is more than than epsilon (e.g. 0.0001) (0 <= delta <= 1). To avoid probabilistic answers, set delta=0.0.quantiles
- the number of quantiles to be computed (e.g. 100) (quantiles >= 1). If unknown in advance, set this number large, e.g. quantiles >= 10000.samplingRate
- a double[1] where the sampling rate is to be filled in.- Returns:
- long[2] - long[0]=the number of buffers, long[1]=the number of elements per buffer, returnSamplingRate[0]=the required sampling rate.
-
known_N_compute_B_and_K_quick
protected static long[] known_N_compute_B_and_K_quick(long N, double epsilon)
Computes the number of buffers and number of values per buffer such that quantiles can be determined with a guaranteed approximation error no more than epsilon. Assumes that quantiles are to be computed over N values.- Parameters:
N
- the anticipated number of values over which quantiles shall be determined.epsilon
- the approximation error which is guaranteed not to be exceeded (e.g. 0.001) (0 <= epsilon <= 1). To get exact result, set epsilon=0.0;- Returns:
- long[2] - long[0]=the number of buffers, long[1]=the number of elements per buffer.
-
known_N_compute_B_and_K_slow
protected static long[] known_N_compute_B_and_K_slow(long N, double epsilon, double delta, int quantiles, double[] returnSamplingRate)
Computes the number of buffers and number of values per buffer such that quantiles can be determined with an approximation error no more than epsilon with a certain probability. Assumes that quantiles are to be computed over N values. The required sampling rate is computed and stored in the first element of the provided returnSamplingRate array, which, therefore must be at least of length 1.- Parameters:
N
- the anticipated number of values over which quantiles shall be computed (e.g 10^6).epsilon
- the approximation error which is guaranteed not to be exceeded (e.g. 0.001) (0 <= epsilon <= 1). To get exact result, set epsilon=0.0;delta
- the probability that the approximation error is more than than epsilon (e.g. 0.0001) (0 <= delta <= 1). To avoid probabilistic answers, set delta=0.0.quantiles
- the number of quantiles to be computed (e.g. 100) (quantiles >= 1). If unknown in advance, set this number large, e.g. quantiles >= 10000.samplingRate
- a double[1] where the sampling rate is to be filled in.- Returns:
- long[2] - long[0]=the number of buffers, long[1]=the number of elements per buffer, returnSamplingRate[0]=the required sampling rate.
-
main
public static void main(java.lang.String[] args)
-
test_B_and_K_Calculation
public static void test_B_and_K_Calculation(java.lang.String[] args)
Computes b and k for different parameters.
-
unknown_N_compute_B_and_K
public static long[] unknown_N_compute_B_and_K(double epsilon, double delta, int quantiles)
Computes the number of buffers and number of values per buffer such that quantiles can be determined with an approximation error no more than epsilon with a certain probability.- Parameters:
epsilon
- the approximation error which is guaranteed not to be exceeded (e.g. 0.001) (0 <= epsilon <= 1). To get exact results, set epsilon=0.0;delta
- the probability that the approximation error is more than than epsilon (e.g. 0.0001) (0 <= delta <= 1). To get exact results, set delta=0.0.quantiles
- the number of quantiles to be computed (e.g. 100) (quantiles >= 1). If unknown in advance, set this number large, e.g. quantiles >= 10000.- Returns:
- long[3] - long[0]=the number of buffers, long[1]=the number of elements per buffer, long[2]=the tree height where sampling shall start.
-
-