Package edu.uci.ics.jung.algorithms.util

Class DiscreteDistribution

java.lang.Object
    edu.uci.ics.jung.algorithms.util.DiscreteDistribution

A utility class for calculating properties of discrete distributions. Generally, these distributions are represented as arrays of double values, which are assumed to be normalized such that the entries in a single array sum to 1.
Constructor Summary

DiscreteDistribution()

Method Summary

static double cosine(double[] dist, double[] reference)
    Returns the cosine distance between the two specified distributions, which must have the same number of elements.
static double entropy(double[] dist)
    Returns the entropy of this distribution.
static double KullbackLeibler(double[] dist, double[] reference)
    Returns the Kullback-Leibler divergence between the two specified distributions, which must have the same number of elements.
static double[] mean(double[][] distributions)
    Returns the mean of the specified array of distributions, represented as normalized arrays of double values.
static double[] mean(Collection<double[]> distributions)
    Returns the mean of the specified Collection of distributions, which are assumed to be normalized arrays of double values.
static void normalize(double[] counts, double alpha)
    Normalizes, with Lagrangian smoothing, the specified double array, so that the values sum to 1 (i.e., can be treated as probabilities).
static double squaredError(double[] dist, double[] reference)
    Returns the squared difference between the two specified distributions, which must have the same number of elements.
static double symmetricKL(double[] dist, double[] reference)
Constructor Details

DiscreteDistribution

public DiscreteDistribution()

Method Details
KullbackLeibler

public static double KullbackLeibler(double[] dist, double[] reference)

Returns the Kullback-Leibler divergence between the two specified distributions, which must have the same number of elements. This is defined as the sum over all i of dist[i] * Math.log(dist[i] / reference[i]). Note that this value is not symmetric; see symmetricKL for a symmetric variant.

Parameters:
    dist - the distribution whose divergence from reference is being measured
    reference - the reference distribution
Returns:
    sum_i of dist[i] * Math.log(dist[i] / reference[i])
See Also:
    symmetricKL(double[], double[])
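The formula above can be sketched in a few lines. This is an illustrative re-implementation of the documented definition, not the JUNG source; it assumes both arrays are normalized and that reference has no zero entries:

```java
public class KLSketch {
    /** Sum over all i of dist[i] * Math.log(dist[i] / reference[i]). */
    public static double kullbackLeibler(double[] dist, double[] reference) {
        double sum = 0.0;
        for (int i = 0; i < dist.length; i++) {
            sum += dist[i] * Math.log(dist[i] / reference[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] p = {0.5, 0.5};
        double[] q = {0.75, 0.25};
        System.out.println(kullbackLeibler(p, q)); // positive: p diverges from q
        System.out.println(kullbackLeibler(p, p)); // 0.0: zero divergence from itself
    }
}
```

Note that a zero entry in dist yields 0 * -Infinity = NaN; this is one reason to pre-process raw counts with normalize(counts, alpha), which guarantees nonzero entries.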
symmetricKL

public static double symmetricKL(double[] dist, double[] reference)

Returns a symmetrized Kullback-Leibler divergence: the sum of the divergences in each direction between the two specified distributions, which must have the same number of elements.

Parameters:
    dist - the distribution whose divergence from reference is being measured
    reference - the reference distribution
Returns:
    KullbackLeibler(dist, reference) + KullbackLeibler(reference, dist)
See Also:
    KullbackLeibler(double[], double[])
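The symmetric variant is simply the divergence summed in both directions; a minimal sketch of the documented return value (again an illustration, not the JUNG source):

```java
public class SymmetricKLSketch {
    static double kl(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) sum += a[i] * Math.log(a[i] / b[i]);
        return sum;
    }

    /** KullbackLeibler(dist, reference) + KullbackLeibler(reference, dist). */
    public static double symmetricKL(double[] dist, double[] reference) {
        return kl(dist, reference) + kl(reference, dist);
    }
}
```

Unlike the one-directional divergence, swapping the two arguments leaves the result unchanged.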
squaredError

public static double squaredError(double[] dist, double[] reference)

Returns the squared difference between the two specified distributions, which must have the same number of elements. This is defined as the sum over all i of the square of (dist[i] - reference[i]).

Parameters:
    dist - the distribution whose distance from reference is being measured
    reference - the reference distribution
Returns:
    sum_i (dist[i] - reference[i])^2
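The squared difference is an entry-wise sum of squares; a short sketch of the documented formula (illustrative, not the JUNG source):

```java
public class SquaredErrorSketch {
    /** Sum over all i of (dist[i] - reference[i])^2. */
    public static double squaredError(double[] dist, double[] reference) {
        double sum = 0.0;
        for (int i = 0; i < dist.length; i++) {
            double d = dist[i] - reference[i];
            sum += d * d;
        }
        return sum;
    }
}
```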
cosine

public static double cosine(double[] dist, double[] reference)

Returns the cosine distance between the two specified distributions, which must have the same number of elements. The distributions are treated as vectors in dist.length-dimensional space. Given the following definitions

    v = the sum over all i of dist[i] * dist[i]
    w = the sum over all i of reference[i] * reference[i]
    vw = the sum over all i of dist[i] * reference[i]

this method returns vw / (Math.sqrt(v) * Math.sqrt(w)).

Parameters:
    dist - the distribution whose distance from reference is being measured
    reference - the reference distribution
Returns:
    the cosine distance between dist and reference, considered as vectors
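The three definitions above translate directly into a single loop; a sketch of the documented formula (illustrative, not the JUNG source). Note that the returned quantity is the cosine of the angle between the two vectors, so identical directions yield 1 and orthogonal vectors yield 0:

```java
public class CosineSketch {
    /** vw / (Math.sqrt(v) * Math.sqrt(w)), with v, w, vw as defined above. */
    public static double cosine(double[] dist, double[] reference) {
        double v = 0.0, w = 0.0, vw = 0.0;
        for (int i = 0; i < dist.length; i++) {
            v += dist[i] * dist[i];
            w += reference[i] * reference[i];
            vw += dist[i] * reference[i];
        }
        return vw / (Math.sqrt(v) * Math.sqrt(w));
    }
}
```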
entropy

public static double entropy(double[] dist)

Returns the entropy of this distribution. High entropy indicates that the distribution is close to uniform; low entropy indicates that the distribution is close to a Dirac delta (i.e., if the probability mass is concentrated at a single point, this method returns 0). Entropy is defined as the sum over all i of -(dist[i] * Math.log(dist[i])).

Parameters:
    dist - the distribution whose entropy is being measured
Returns:
    sum_i -(dist[i] * Math.log(dist[i]))
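A sketch of the documented formula (illustrative, not the JUNG source). As with the KL divergence, an entry of exactly 0 produces 0 * -Infinity = NaN, so smoothing raw counts with normalize(counts, alpha) first keeps the result well-defined:

```java
public class EntropySketch {
    /** Sum over all i of -(dist[i] * Math.log(dist[i])). */
    public static double entropy(double[] dist) {
        double sum = 0.0;
        for (int i = 0; i < dist.length; i++) {
            sum += -(dist[i] * Math.log(dist[i]));
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(entropy(new double[]{1.0}));      // point mass: 0.0
        System.out.println(entropy(new double[]{0.5, 0.5})); // uniform: ln 2
    }
}
```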
normalize

public static void normalize(double[] counts, double alpha)

Normalizes, with Lagrangian smoothing, the specified double array, so that the values sum to 1 (i.e., can be treated as probabilities). The effect of the smoothing is to ensure that all entries are nonzero; effectively, a value of alpha is added to each entry in the original array prior to normalization.

Parameters:
    counts - the array to be converted into a probability distribution
    alpha - the value to add to each entry prior to normalization
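The described behavior (add alpha to every entry, then divide by the new total) can be sketched as follows; this is an illustration of the documented effect and assumes a positive alpha and a non-empty array, not the JUNG source:

```java
public class NormalizeSketch {
    /** Adds alpha to each entry, then divides every entry by the new total so the entries sum to 1. */
    public static void normalize(double[] counts, double alpha) {
        double total = 0.0;
        for (int i = 0; i < counts.length; i++) {
            counts[i] += alpha;       // smoothing: guarantees nonzero entries when alpha > 0
            total += counts[i];
        }
        for (int i = 0; i < counts.length; i++) {
            counts[i] /= total;       // normalization: entries now sum to 1
        }
    }
}
```

For example, raw counts {3, 1, 0} with alpha = 1 become {4, 2, 1} / 7, so the zero count maps to a small but nonzero probability.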
mean

public static double[] mean(Collection<double[]> distributions)

Returns the mean of the specified Collection of distributions, which are assumed to be normalized arrays of double values.

Parameters:
    distributions - the distributions whose mean is to be calculated
Returns:
    the mean of the distributions
See Also:
    mean(double[][])
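The mean is an entry-wise average over the collection; a sketch of that behavior (illustrative, not the JUNG source), assuming a non-empty collection of equal-length arrays:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class MeanCollectionSketch {
    /** Entry-wise average of the given distributions; assumes a non-empty collection. */
    public static double[] mean(Collection<double[]> distributions) {
        double[] m = null;
        for (double[] d : distributions) {
            if (m == null) m = new double[d.length];
            for (int i = 0; i < d.length; i++) m[i] += d[i];
        }
        for (int i = 0; i < m.length; i++) m[i] /= distributions.size();
        return m;
    }

    public static void main(String[] args) {
        List<double[]> dists = List.of(new double[]{1.0, 0.0}, new double[]{0.0, 1.0});
        System.out.println(Arrays.toString(mean(dists))); // [0.5, 0.5]
    }
}
```

Averaging normalized distributions yields another normalized distribution, so the result can be fed back into the other methods of this class.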
mean

public static double[] mean(double[][] distributions)

Returns the mean of the specified array of distributions, represented as normalized arrays of double values. Will throw an "index out of bounds" exception if the distribution arrays are not all of the same length.

Parameters:
    distributions - the distributions whose mean is to be calculated
Returns:
    the mean of the distributions
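A sketch of the array overload (illustrative, not the JUNG source), which also shows where the documented "index out of bounds" exception would arise when rows differ in length:

```java
public class MeanArraySketch {
    /** Entry-wise average; assumes all rows have the same length. */
    public static double[] mean(double[][] distributions) {
        double[] m = new double[distributions[0].length];
        for (double[] d : distributions) {
            // d[i] throws ArrayIndexOutOfBoundsException here if d is shorter than the first row.
            for (int i = 0; i < m.length; i++) m[i] += d[i];
        }
        for (int i = 0; i < m.length; i++) m[i] /= distributions.length;
        return m;
    }
}
```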