Class ExtendedPrecision
It is based on the 1971 paper Dekker (1971) A floating-point technique for extending the available precision Numer. Math. 18, 224-242.
Adapted from org.apache.commons.numbers.core.ExtendedPrecision
.
-
Field Summary
FieldsModifier and TypeFieldDescriptionprivate static final double
Threshold for a big number that may overflow when squared.private static final int
Approximate x squared value whereexp(-0.5*x*x) == 0
.private static final int
X squared value whereexp(-0.5*x*x)
cannot increase accuracy using the round-off from x squared.private static final double
The multiplier used to split the double value into high and low parts.private static final double
Scale down by 2^600.private static final double
Scale up by 2^600.private static final double
Threshold for a small number that may underflow when squared.(package private) static final double
sqrt(2 pi) as a double.private static final double
Upper bits of sqrt(2 pi).private static final double
Lower bits of sqrt(2 pi).(package private) static final double
Round-off from sqrt(2 pi) as a double. -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionprivate static double
computeSqrt2aa
(double a) Computesqrt(2 * a * a)
.private static double
computeXsqrt2pi
(double a) Computea * sqrt(2 * pi)
.(package private) static double
expmhxx
(double x) Computeexp(-0.5*x*x)
with high accuracy.private static double
expxx
(double a, double b) Computeexp(a+b)
with high accuracy assuminga+b = a
.private static double
highPartUnscaled
(double value) Implement Dekker's method to split a value into two parts.private static double
productLow
(double hx, double lx, double hy, double ly, double xy) Compute the low part of the double length number(z,zz)
for the exact product ofx
andy
using Dekker's mult12 algorithm.(package private) static double
sqrt2xx
(double x) Computesqrt(2 * x * x)
.private static double
squareLow
(double hx, double lx, double xx) Compute the low part of the double length number(z,zz)
for the exact square ofx
using Dekker's mult12 algorithm.(package private) static double
xsqrt2pi
(double x) Multiply the term by sqrt(2 pi).
-
Field Details
-
SQRT2PI
static final double SQRT2PIsqrt(2 pi) as a double. Computed to 64-digits precision and converted to double.- See Also:
-
SQRT2PI_R
static final double SQRT2PI_RRound-off from sqrt(2 pi) as a double. Computed from the value sqrt(2 pi) to 64-digits precision minusSQRT2PI
.- See Also:
-
MULTIPLIER
private static final double MULTIPLIERThe multiplier used to split the double value into high and low parts. From Dekker (1971): "The constant should be chosen equal to 2^(p - p/2) + 1, where p is the number of binary digits in the mantissa". Here p is 53 and the multiplier is2^27 + 1
.- See Also:
-
BIG
private static final double BIGThreshold for a big number that may overflow when squared. 2^500.- See Also:
-
SMALL
private static final double SMALLThreshold for a small number that may underflow when squared. 2^-500.- See Also:
-
SCALE_UP
private static final double SCALE_UPScale up by 2^600.- See Also:
-
SCALE_DOWN
private static final double SCALE_DOWNScale down by 2^600.- See Also:
-
SQRT2PI_H
private static final double SQRT2PI_HUpper bits of sqrt(2 pi). -
SQRT2PI_L
private static final double SQRT2PI_LLower bits of sqrt(2 pi). -
EXP_M_HALF_XX_MIN_VALUE
private static final int EXP_M_HALF_XX_MIN_VALUEX squared value whereexp(-0.5*x*x)
cannot increase accuracy using the round-off from x squared.- See Also:
-
EXP_M_HALF_XX_MAX_VALUE
private static final int EXP_M_HALF_XX_MAX_VALUEApproximate x squared value whereexp(-0.5*x*x) == 0
. This is above-2 * ln(2^-1074)
due to rounding performed within the exp function.- See Also:
-
-
Constructor Details
-
ExtendedPrecision
private ExtendedPrecision()No instances.
-
-
Method Details
-
xsqrt2pi
static double xsqrt2pi(double x) Multiply the term by sqrt(2 pi).- Parameters:
x
- Value (assumed to be positive)- Returns:
- x * sqrt(2 pi)
-
computeXsqrt2pi
private static double computeXsqrt2pi(double a) Computea * sqrt(2 * pi)
.- Parameters:
a
- Value- Returns:
- the result
-
sqrt2xx
static double sqrt2xx(double x) Computesqrt(2 * x * x)
.The result is computed using a high precision computation of
sqrt(2 * x * x)
avoiding underflow or overflow ofx
squared.- Parameters:
x
- Value (assumed to be positive)- Returns:
sqrt(2 * x * x)
-
computeSqrt2aa
private static double computeSqrt2aa(double a) Computesqrt(2 * a * a)
.- Parameters:
a
- Value- Returns:
- the result
-
expmhxx
static double expmhxx(double x) Computeexp(-0.5*x*x)
with high accuracy. This is performed using information in the round-off fromx*x
.This is accurate at large x to 1 ulp until exp(-0.5*x*x) is close to sub-normal. For very small exp(-0.5*x*x) the adjustment is sub-normal and bits can be lost in the adjustment for a max observed error of
< 2
ulp.At small x the accuracy cannot be improved over using exp(-0.5*x*x). This occurs at
x <= sqrt(2)
.- Parameters:
x
- Value- Returns:
- exp(-0.5*x*x)
- See Also:
-
expxx
private static double expxx(double a, double b) Computeexp(a+b)
with high accuracy assuminga+b = a
.This is accurate at large positive a to 1 ulp. If a is negative and exp(a) is close to sub-normal a bit of precision may be lost when adjusting result as the adjustment is sub-normal (max observed error
< 2
ulp). For the use case of multiplication of a number less than 1 by exp(-x*x), a = -x*x, the result will be sub-normal and the rounding error is lost.At small |a| the accuracy cannot be improved over using exp(a) as the round-off is too small to create terms that can adjust the standard result by more than 0.5 ulp. This occurs at
|a| <= 1
.- Parameters:
a
- High bits of a split numberb
- Low bits of a split number- Returns:
- exp(a+b)
- See Also:
-
highPartUnscaled
private static double highPartUnscaled(double value) Implement Dekker's method to split a value into two parts. Multiplying by (2^s + 1) creates a big value from which to derive the two split parts.c = (2^s + 1) * a a_big = c - a a_hi = c - a_big a_lo = a - a_hi a = a_hi + a_lo
The multiplicand allows a p-bit value to be split into (p-s)-bit value
a_hi
and a non-overlapping (s-1)-bit valuea_lo
. Combined they have (p-1) bits of significand but the sign bit ofa_lo
contains a bit of information. The constant is chosen so that s is ceil(p/2) where the precision p for a double is 53-bits (1-bit of the mantissa is assumed to be 1 for a non sub-normal number) and s is 27.This conversion does not use scaling and the result of overflow is NaN. Overflow may occur when the exponent of the input value is above 996.
Splitting a NaN or infinite value will return NaN.
- Parameters:
value
- Value.- Returns:
- the high part of the value.
- See Also:
-
productLow
private static double productLow(double hx, double lx, double hy, double ly, double xy) Compute the low part of the double length number(z,zz)
for the exact product ofx
andy
using Dekker's mult12 algorithm. The standard precision productx*y
must be provided. The numbersx
andy
should already be split into low and high parts.Note: This uses the high part of the result
(z,zz)
asx * y
and nothx * hy + hx * ty + tx * hy
as specified in Dekker's original paper. See Shewchuk (1997) for working examples.- Parameters:
hx
- High part of first factor.lx
- Low part of first factor.hy
- High part of second factor.ly
- Low part of second factor.xy
- Product of the factors.- Returns:
lx * ly - (((xy - hx * hy) - lx * hy) - hx * ly)
- See Also:
-
squareLow
private static double squareLow(double hx, double lx, double xx) Compute the low part of the double length number(z,zz)
for the exact square ofx
using Dekker's mult12 algorithm. The standard precision productx*x
must be provided. The numberx
should already be split into low and high parts.Note: This is a specialisation of
productLow(double, double, double, double, double)
.- Parameters:
hx
- High part of factor.lx
- Low part of factor.xx
- Square of the factor.- Returns:
lx * lx - (((xx - hx * hx) - lx * hx) - hx * lx)
-