Class DD
- All Implemented Interfaces:
  Serializable, Addition<DD>, Multiplication<DD>, NativeOperators<DD>
A double-double is an unevaluated sum of two IEEE double precision numbers capable of representing at least 106 bits of significand. A normalized double-double number (x, xx) satisfies the condition that the parts are non-overlapping in magnitude such that:
  |x| > |xx|
  x == x + xx
This implementation assumes a normalized representation during operations on a DD number and computes results as a normalized representation. Any double-double number can be normalized by summation of the parts (see ofSum).
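A minimal sketch of normalization by summation, assuming the summation is Knuth's two-sum (the algorithm the method summary lists for twoSum). NormalizeSketch and sumParts are hypothetical names, not part of the DD API; a pair is returned as a double[] {high, low}. Because s + e equals a + b exactly, fl(s + e) == s, so the output always satisfies x == x + xx.

```java
// Hypothetical sketch: normalize a pair by summing its parts with Knuth's two-sum.
final class NormalizeSketch {
    /** Returns {s, e} with s = fl(a + b) and e the exact round-off, so s + e == a + b. */
    static double[] sumParts(double a, double b) {
        double s = a + b;
        double bb = s - a;                    // portion of b absorbed into s
        double e = (a - (s - bb)) + (b - bb); // round-off lost by the addition
        return new double[] {s, e};
    }

    public static void main(String[] args) {
        // Not normalized: the second part is larger than the first.
        double[] n = sumParts(0x1.0p-60, 1.0);
        System.out.println(Double.toHexString(n[0]) + " " + Double.toHexString(n[1])); // 0x1.0p0 0x1.0p-60
        // Already normalized: the parts come back unchanged.
        double[] m = sumParts(1.0, 0x1.0p-60);
        System.out.println(m[0] == 1.0 && m[1] == 0x1.0p-60); // true
    }
}
```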
Note that the number (x, xx) may also be referred to using the labels high and low to indicate the magnitude of the parts as (x_hi, x_lo), or using a numerical suffix for the parts as (x_0, x_1). The numerical suffix is typically used when the number has an arbitrary number of parts.
The double-double class is immutable.
Construction
Factory methods to create a DD that are exact use the prefix "of". Methods that create the closest possible representation use the prefix "from"; these methods may lose precision during conversion.
Primitive values of type double, int and long are converted exactly to a DD.
The DD class can also be created as the result of an arithmetic operation on a pair of double operands. The resulting DD has the IEEE754 double result of the operation in the first part, and the second part contains the round-off lost from the operation due to rounding. Construction using the add (+), subtract (-) and multiply (*) operators is exact. Construction using division (/) may be inexact if the quotient is not representable.
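Add and multiply can be captured exactly because the round-off of a single double sum or product is itself a double. The sketch below shows this with Knuth's two-sum and, as a swapped-in technique, Math.fma (Java 9+) for the product; note the class documentation says its own product methods are not a drop-in replacement for fma. All names are hypothetical, and exactness is checked against BigDecimal.

```java
import java.math.BigDecimal;

// Hypothetical sketch: capture the round-off of one double operation as a second double.
final class ExactOpsSketch {
    /** Knuth's two-sum: {fl(a + b), exact round-off}. */
    static double[] sum(double a, double b) {
        double s = a + b;
        double bb = s - a;
        return new double[] {s, (a - (s - bb)) + (b - bb)};
    }

    /** Product via Math.fma: exact unless the product overflows or underflows. */
    static double[] product(double a, double b) {
        double p = a * b;
        return new double[] {p, Math.fma(a, b, -p)};
    }

    /** Exact value of a pair as a BigDecimal. */
    static BigDecimal value(double[] dd) {
        return new BigDecimal(dd[0]).add(new BigDecimal(dd[1]));
    }

    public static void main(String[] args) {
        double a = 1.23, b = 4.56;
        BigDecimal ea = new BigDecimal(a), eb = new BigDecimal(b);
        System.out.println(value(sum(a, b)).compareTo(ea.add(eb)) == 0);          // true
        System.out.println(value(product(a, b)).compareTo(ea.multiply(eb)) == 0); // true
    }
}
```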
Note that it is more efficient to create a DD from a double operation than to create two DD values and combine them with the same operation. The result will be the same for add, subtract and multiply but may lose precision for divide.
// Inefficient
DD a = DD.of(1.23).add(DD.of(4.56));
// Optimal
DD b = DD.ofSum(1.23, 4.56);
// Inefficient and may lose precision
DD c = DD.of(1.23).divide(DD.of(4.56));
// Optimal
DD d = DD.fromQuotient(1.23, 4.56);
It is not possible to directly specify the two parts of the number. The two parts must be added using ofSum. If the two parts already represent a number (x, xx) such that x == x + xx then the magnitudes of the parts will be unchanged; any signed zeros may be subject to a sign change.
Primitive operands
Operations are provided using a DD operand or a double operand.
Implicit type conversion allows methods with a double operand to be used with other primitives such as int or long. Note that casting a long to a double may result in loss of precision. To maintain the full precision of a long, first convert the value to a DD using of(long) and then use the equivalent arithmetic operation with the DD operand.
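A sketch of how a long can be held exactly as two doubles, assuming a split into high and low 32-bit halves (the field summary lists a HIGH32_MASK mask, but this is not presented as the actual of(long) code). Each half converts to a double exactly, and Dekker's fast two-sum is valid because the high half dominates the low half whenever it is non-zero. LongSplitSketch and ofLong are hypothetical names.

```java
import java.math.BigDecimal;

// Hypothetical sketch: hold a long exactly as (hi, lo) by splitting it into 32-bit halves.
final class LongSplitSketch {
    static double[] ofLong(long x) {
        long a = x & 0xffffffff00000000L; // high half: at most 32 significant bits
        long b = x - a;                   // low half: in [0, 2^32)
        double da = a, db = b;            // both conversions are exact
        double hi = da + db;              // rounds exactly like (double) x
        double lo = db - (hi - da);       // Dekker's fast two-sum: |da| >= |db| or da == 0
        return new double[] {hi, lo};
    }

    public static void main(String[] args) {
        long x = Long.MAX_VALUE;
        BigDecimal exact = BigDecimal.valueOf(x);
        double[] dd = ofLong(x);
        // The plain cast rounds 2^63 - 1 up to 2^63.
        System.out.println(new BigDecimal((double) x).compareTo(exact) == 0);                      // false
        System.out.println(new BigDecimal(dd[0]).add(new BigDecimal(dd[1])).compareTo(exact) == 0); // true
    }
}
```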
Accuracy
Add and multiply operations using two double operands are computed to an exact DD result (see ofSum and ofProduct). Operations involving a DD and another operand, either double or DD, are not exact.
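A sketch of the exact product of two doubles without fma, using Dekker's split with the multiplier 2^27 + 1 and the mult12 expression for the round-off. The method names mirror highPart and twoProductLow from the method summary, but the bodies are illustrative assumptions; no scaling is applied, so very large inputs can produce NaN.

```java
import java.math.BigDecimal;

// Hypothetical sketch of Dekker's exact product (mult12) without fma.
final class DekkerProductSketch {
    private static final double MULTIPLIER = 0x1.0p27 + 1; // 2^27 + 1

    /** Dekker's split: returns a high part with at most 26 significant bits. */
    static double highPart(double a) {
        double c = MULTIPLIER * a;
        return c - (c - a);
    }

    /** Round-off of xy = fl(x * y); assumes nothing overflows or underflows. */
    static double twoProductLow(double x, double y, double xy) {
        double hx = highPart(x), lx = x - hx;
        double hy = highPart(y), ly = y - hy;
        // Dekker showed this expression yields the exact round-off of the product.
        return lx * ly - (((xy - hx * hy) - lx * hy) - hx * ly);
    }

    public static void main(String[] args) {
        double x = 1.23, y = 4.56;
        double hi = x * y;
        double lo = twoProductLow(x, y, hi);
        BigDecimal exact = new BigDecimal(x).multiply(new BigDecimal(y));
        System.out.println(new BigDecimal(hi).add(new BigDecimal(lo)).compareTo(exact) == 0); // true
    }
}
```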
This class is not intended to perform exact arithmetic. Arbitrary precision arithmetic is available using BigDecimal. Single operations will compute the DD result within a tolerance of the 106-bit exact result. This far exceeds the accuracy of double arithmetic. The reduced accuracy is a compromise to deliver increased performance.
The class is intended to reduce error in equivalent double arithmetic operations where the double valued result is required to high accuracy. Although it is possible to reduce error to 2^-106 for all operations, the additional computation would impact performance and would require multiple chained operations to potentially observe a different result when the final DD is converted to a double.
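To illustrate why an operation involving a DD operand is accurate but not exact, this sketch adds a double to a double-double using the two-sum and fast two-sum sequence common in the double-double literature (for example the library of Hida, Li and Bailey). It is an assumption that DD.add(double) behaves similarly. The rounded addition of the low parts is the one lossy step; the error is a small multiple of 2^-106 relative to the result, so the check uses 2^-104. Names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical sketch: add a double to a double-double (hi, lo).
final class DoubleDoubleAddSketch {
    static double[] twoSum(double a, double b) {
        double s = a + b;
        double bb = s - a;
        return new double[] {s, (a - (s - bb)) + (b - bb)};
    }

    /** (x, xx) + y: exact two-sum of x and y, fold in xx, then renormalize. */
    static double[] add(double x, double xx, double y) {
        double[] s = twoSum(x, y);
        double v = s[1] + xx;        // the only rounded step whose error is not recovered
        double hi = s[0] + v;
        double lo = v - (hi - s[0]); // Dekker's fast two-sum
        return new double[] {hi, lo};
    }

    static BigDecimal value(double hi, double lo) {
        return new BigDecimal(hi).add(new BigDecimal(lo));
    }

    public static void main(String[] args) {
        // A normalized double-double close to 1/3, built from a 40-digit decimal.
        BigDecimal third = BigDecimal.ONE.divide(BigDecimal.valueOf(3), new MathContext(40));
        double x = third.doubleValue();
        double xx = third.subtract(new BigDecimal(x)).doubleValue();
        double[] r = add(x, xx, 0.1);
        BigDecimal exact = value(x, xx).add(new BigDecimal(0.1));
        BigDecimal err = value(r[0], r[1]).subtract(exact).abs();
        BigDecimal bound = exact.abs().multiply(new BigDecimal(0x1.0p-104));
        System.out.println(err.compareTo(bound) <= 0); // true
    }
}
```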
Canonical representation
The double-double number is the sum of its parts. The canonical representation of the number is the explicit value of the parts. The toString() method is provided to convert to a String representation of the parts formatted as a tuple.
The class implements equals(Object) and hashCode() and allows usage as a key in a Set or Map. Equality requires binary equivalence of the parts. Note that representations of zero using different combinations of +/- 0.0 are not considered equal. Also note that many non-normalized double-double numbers can represent the same number. Double-double numbers can be normalized before operations that involve equals(Object) by adding the parts; this is exact for a finite sum and provides equality support for non-zero numbers. Alternatively, exact numerical equality and comparisons are supported by conversion to a BigDecimal representation. Note that BigDecimal does not support non-finite values.
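A sketch of the equality rules above. The hypothetical sameParts helper stands in for binary equivalence of the parts and is assumed here to compare Double.doubleToLongBits; normalization reuses Knuth's two-sum, and BigDecimal comparison ignores the sign of zero.

```java
import java.math.BigDecimal;

// Hypothetical sketch of parts-based equality versus numerical equality.
final class EqualitySketch {
    /** Binary equivalence of both parts. */
    static boolean sameParts(double x1, double xx1, double x2, double xx2) {
        return Double.doubleToLongBits(x1) == Double.doubleToLongBits(x2)
            && Double.doubleToLongBits(xx1) == Double.doubleToLongBits(xx2);
    }

    static double[] twoSum(double a, double b) {
        double s = a + b;
        double bb = s - a;
        return new double[] {s, (a - (s - bb)) + (b - bb)};
    }

    public static void main(String[] args) {
        System.out.println(sameParts(0.0, 0.0, 0.0, -0.0));  // false: signed zeros differ
        System.out.println(sameParts(1.0, 1.0, 2.0, 0.0));   // false: same number, different parts
        double[] n = twoSum(1.0, 1.0);                       // normalizes to (2.0, 0.0)
        System.out.println(sameParts(n[0], n[1], 2.0, 0.0)); // true
        BigDecimal v = new BigDecimal(0.0).add(new BigDecimal(-0.0));
        System.out.println(v.compareTo(BigDecimal.ZERO) == 0); // true: BigDecimal has no -0
    }
}
```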
Overflow, underflow and non-finite support
A double-double number is limited to the same finite range as a double (4.9E-324 to 1.7976931348623157E308). This class is intended for use when the ultimate result is finite and intermediate values do not approach infinity or zero.
This implementation does not support IEEE standards for handling infinite and NaN values when used in arithmetic operations. Computations may split a 64-bit double into two parts and/or use subtraction of intermediate terms to compute round-off parts. These operations may generate infinite values due to overflow, which then propagate through further operations to NaN, for example computing the round-off using Inf - Inf = NaN.
Operations that involve splitting a double (multiply, divide) are safe when the base 2 exponent is below 996. This puts an upper limit of approximately +/-6.7e299 on any values to be split; in practice the arguments to multiply and divide operations are further constrained by the expected finite value of the product or quotient.
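A sketch of both failure modes, reusing Knuth's two-sum and Dekker's split with the multiplier 2^27 + 1 (class and method names are hypothetical). An overflowing sum yields a NaN round-off, which is detected by checking that the evaluated sum is finite. Splitting is safe for values with a base 2 exponent below 996 and fails near Double.MAX_VALUE.

```java
// Hypothetical sketch: overflow in two-sum and in Dekker's split produces NaN.
final class OverflowSketch {
    static double[] twoSum(double a, double b) {
        double s = a + b;
        double bb = s - a;
        return new double[] {s, (a - (s - bb)) + (b - bb)};
    }

    static double highPart(double a) {
        double c = (0x1.0p27 + 1) * a; // Dekker's multiplier 2^27 + 1
        return c - (c - a);
    }

    public static void main(String[] args) {
        double[] s = twoSum(Double.MAX_VALUE, Double.MAX_VALUE);
        System.out.println(s[0] + " " + s[1]);               // Infinity NaN
        System.out.println(Double.isFinite(s[0] + s[1]));    // false
        double safe = Math.nextDown(0x1.0p996);              // largest value with exponent 995
        System.out.println(Double.isFinite(highPart(safe))); // true
        System.out.println(highPart(Double.MAX_VALUE));      // NaN
    }
}
```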
Likewise the smallest value that can be represented is Double.MIN_VALUE. The full 106-bit accuracy will be lost when intermediates are within a factor of 2^53 of Double.MIN_NORMAL.
The DD result can be verified by checking it is a finite evaluated sum. Computations expecting to approach overflow or underflow must use scaling of intermediate terms (see frexp and scalb) and appropriate management of the current base 2 scale.
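A sketch of the scaling approach on plain doubles, assuming a frexp-like helper built from Math.getExponent and Math.scalb (valid for normal, non-zero inputs). The running product is kept with magnitude in [0.5, 1) and the base 2 scale is tracked in a separate long. For a DD, both parts would presumably be scaled by the same power of two. Names are hypothetical.

```java
// Hypothetical sketch: track a base 2 scale separately to avoid overflow.
final class ScalingSketch {
    /** frexp-like split of a normal, non-zero x into f with |f| in [0.5, 1) and x = f * 2^exp[0]. */
    static double frexp(double x, int[] exp) {
        int e = Math.getExponent(x) + 1;
        exp[0] = e;
        return Math.scalb(x, -e);
    }

    /** Multiplies count copies of v, assuming each step stays normal; product = f * 2^scale[0]. */
    static double scaledPower(double v, int count, long[] scale) {
        double f = 1.0;
        long s = 0;
        int[] e = new int[1];
        for (int i = 0; i < count; i++) {
            f = frexp(f * v, e); // rescale after every step
            s += e[0];
        }
        scale[0] = s;
        return f;
    }

    public static void main(String[] args) {
        System.out.println(1e200 * 1e200 * 1e200); // Infinity
        long[] scale = new long[1];
        double f = scaledPower(1e200, 3, scale);
        // log10 of f * 2^scale recovers the magnitude of 1e600.
        System.out.println(Math.round(Math.log10(f) + scale[0] * Math.log10(2))); // 600
    }
}
```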
References:
- Dekker, T.J. (1971) A floating-point technique for extending the available precision. Numerische Mathematik, 18:224–242.
- Shewchuk, J.R. (1997) Arbitrary Precision Floating-Point Arithmetic.
- Hida, Y., Li, X.S. and Bailey, D.H. (2008) Library for Double-Double and Quad-Double Arithmetic.
- Since:
  1.2
Field Summary
- private static final int CMP_UNSIGNED_1022: The value 1022 converted for use in an unsigned comparison (see Integer.compareUnsigned(int, int)).
- private static final int CMP_UNSIGNED_2046: The value 2046 converted for use in an unsigned comparison (see Integer.compareUnsigned(int, int)).
- private static final int CMP_UNSIGNED_MINUS_1: The value -1 converted for use in an unsigned comparison (see Integer.compareUnsigned(int, int)).
- private static final int EXP_MASK: The mask to extract the raw 11-bit exponent.
- private static final int EXPONENT_OFFSET: Exponent offset in IEEE754 representation.
- private static final char FORMAT_END
- private static final char FORMAT_SEP
- private static final char FORMAT_START
- private static final double HALF: 0.5.
- private static final long HIGH32_MASK: Mask to extract the high 32 bits from a long.
- private static final long MANTISSA_MASK: Mask to extract the 52-bit mantissa from a long representation of a double.
- private static final double MULTIPLIER: The multiplier used to split the double value into high and low parts.
- static final DD ONE: A double-double number representing one.
- private static final double SAFE_MULTIPLY: The limit for safe multiplication of x*y, assuming values above 1.
- private static final long serialVersionUID: Serializable version identifier.
- private static final int TO_STRING_SIZE: The size of the buffer for toString().
- private static final double TWO_POW_512: 2^512.
- private static final double TWO_POW_53: 2^53.
- private static final double TWO_POW_M512: 2^-512.
- private static final long UNSIGN_MASK: Mask to remove the sign bit from a long.
- private final double x: The high part of the double-double number.
- private final double xx: The low part of the double-double number.
- static final DD ZERO: A double-double number representing zero.
Constructor Summary
- private DD(double x, double xx): Create a double-double number (x, xx).
Method Summary
- DD abs(): Returns a DD whose value is the absolute value of the number (x, xx). This method assumes that the low part xx is the smaller magnitude.
- (package private) static DD accurateAdd(double x, double xx, double y): Compute the sum of (x, xx) and y.
- (package private) static DD accurateAdd(double x, double xx, double y, double yy): Compute the sum of (x, xx) and (y, yy).
- DD add(double y): Returns a DD whose value is (this + y).
- (package private) static DD add(double x, double xx, double y, double yy): Compute the sum of (x, xx) and (y, yy).
- DD add(DD y): Returns a DD whose value is (this + y).
- BigDecimal bigDecimalValue(): Get the value as a BigDecimal.
- DD ceil(): Returns the smallest (closest to negative infinity) DD value that is greater than or equal to this number (x, xx) and is equal to a mathematical integer.
- private static DD computePow(double x, double xx, int n): Compute the number x (non-zero finite) raised to the power n.
- private static DD computePowScaled(long b, double x, double xx, int n, long[] exp): Compute the number x (non-zero finite) raised to the power n.
- DD divide(double y): Returns a DD whose value is (this / y).
- private static DD divide(double x, double xx, double y): Compute the division of (x, xx) by y.
- private static DD divide(double x, double xx, double y, double yy): Compute the division of (x, xx) by (y, yy).
- DD divide(DD y): Returns a DD whose value is (this / y).
- double doubleValue(): Get the value as a double.
- private static boolean equals(double x, double y): Returns true if the values are numerically equal.
- boolean equals(Object): Test for equality with another object.
- (package private) static DD fastTwoDiff(double a, double b): Compute the difference of two numbers a and b using Dekker's two-sum algorithm.
- private static double fastTwoDiffLow(double a, double b, double x): Compute the round-off of the difference of two numbers a and b using Dekker's two-sum algorithm.
- (package private) static DD fastTwoSum(double a, double b): Compute the sum of two numbers a and b using Dekker's two-sum algorithm.
- (package private) static double fastTwoSumLow(double a, double b, double x): Compute the round-off of the sum of two numbers a and b using Dekker's two-sum algorithm.
- float floatValue(): Get the value as a float.
- DD floor(): Returns the largest (closest to positive infinity) DD value that is less than or equal to this number (x, xx) and is equal to a mathematical integer.
- private static DD floorOrCeil(double x, double xx, DoubleUnaryOperator op): Implementation of the floor and ceiling functions.
- DD frexp(int[] exp): Convert this number x to fractional f and integral 2^exp components.
- static DD from(BigDecimal x): Creates the double-double number (z, zz) using the double representation of the argument x; the low part is the double representation of the round-off error.
- static DD fromQuotient(double x, double y): Returns a DD whose value is (x / y).
- private static int getScale(double a): Returns a scale suitable for use with Math.scalb(double, int) to normalise the number to the interval [1, 2).
- int hashCode(): Gets a hash code for the double-double number.
- double hi(): Gets the first part x of the double-double number (x, xx).
- (package private) static double highPart(double value): Implement Dekker's method to split a value into two parts.
- int intValue(): Get the value as an int.
- boolean isFinite(): Returns true if the evaluated sum of the parts is finite.
- (package private) static boolean isNotNormal(double a): Checks if the number is not normal.
- boolean isOne(): Check if this is a neutral element of multiplication.
- boolean isZero(): Check if this is a neutral element of addition.
- double lo(): Gets the second part xx of the double-double number (x, xx).
- long longValue(): Get the value as a long.
- DD multiply(double y): Returns a DD whose value is this * y.
- private static DD multiply(double x, double xx, double y): Compute the multiplication product of (x, xx) and y.
- private static DD multiply(double x, double xx, double y, double yy): Compute the multiplication product of (x, xx) and (y, yy).
- DD multiply(int n): Repeated addition.
- DD multiply(DD y): Returns a DD whose value is this * y.
- DD negate(): Returns a DD whose value is the negation of both parts of the double-double number.
- static DD of(double x): Creates the double-double number as the value (x, 0).
- (package private) static DD of(double x, double xx): Creates the double-double number as the value (x, xx).
- static DD of(int x): Creates the double-double number as the value (x, 0).
- static DD of(long x): Creates the double-double number with the high part equal to (double) x and the low part equal to any remaining bits.
- static DD ofDifference(double x, double y): Returns a DD whose value is (x - y).
- static DD ofProduct(double x, double y): Returns a DD whose value is (x * y).
- static DD ofSquare(double x): Returns a DD whose value is (x * x).
- static DD ofSum(double x, double y): Returns a DD whose value is (x + y).
- DD one(): Identity element.
- DD pow(int n): Compute this number (x, xx) raised to the power n.
- DD pow(int n, long[] exp): Compute this number x raised to the power n.
- DD reciprocal(): Compute the reciprocal of this.
- private static DD reciprocal(double y, double yy): Compute the inverse of (y, yy).
- DD scalb(int exp): Multiply this number (x, xx) by an integral power of two.
- DD sqrt(): Compute the square root of this number (x, xx).
- DD square(): Returns a DD whose value is this * this.
- private static DD square(double x, double xx): Compute the square of (x, xx).
- DD subtract(double y): Returns a DD whose value is (this - y).
- DD subtract(DD y): Returns a DD whose value is (this - y).
- String toString(): Returns a string representation of the double-double number.
- (package private) static DD twoDiff(double a, double b): Compute the difference of two numbers a and b using Knuth's two-sum algorithm.
- private static double twoDiffLow(double a, double b, double x): Compute the round-off of the difference of two numbers a and b using Knuth's two-sum algorithm.
- (package private) static double twoPow(int n): Create a normalized double with the value 2^n.
- (package private) static DD twoProd(double x, double y): Compute the double-double number (z, zz) for the exact product of x and y.
- (package private) static double twoProductLow(double x, double y, double xy): Compute the low part of the double length number (z, zz) for the exact product of x and y using Dekker's mult12 algorithm.
- (package private) static double twoProductLow(double hx, double lx, double hy, double ly, double xy): Compute the low part of the double length number (z, zz) for the exact product of x and y using Dekker's mult12 algorithm.
- (package private) static DD twoSquare(double x): Compute the double-double number (z, zz) for the exact square of x.
- (package private) static double twoSquareLow(double x, double x2): Compute the low part of the double length number (z, zz) for the exact square of x using Dekker's mult12 algorithm.
- (package private) static double twoSquareLow(double hx, double lx, double x2): Compute the low part of the double length number (z, zz) for the exact square of x using Dekker's mult12 algorithm.
- (package private) static DD twoSum(double a, double b): Compute the sum of two numbers a and b using Knuth's two-sum algorithm.
- (package private) static double twoSumLow(double a, double b, double x): Compute the round-off of the sum of two numbers a and b using Knuth's two-sum algorithm.
- DD zero(): Identity element.
Methods inherited from class java.lang.Number: byteValue, shortValue
Field Details
- ONE
  A double-double number representing one.
- ZERO
  A double-double number representing zero.
- MULTIPLIER
  private static final double MULTIPLIER
  The multiplier used to split the double value into high and low parts. From Dekker (1971): "The constant should be chosen equal to 2^(p - p/2) + 1, where p is the number of binary digits in the mantissa". Here p is 53 and the multiplier is 2^27 + 1.
- EXP_MASK
  private static final int EXP_MASK
  The mask to extract the raw 11-bit exponent. The value must be shifted 52 bits to remove the mantissa bits.
- CMP_UNSIGNED_2046
  private static final int CMP_UNSIGNED_2046
  The value 2046 converted for use in an unsigned comparison (see Integer.compareUnsigned(int, int)). This requires adding Integer.MIN_VALUE to 2046.
- CMP_UNSIGNED_MINUS_1
  private static final int CMP_UNSIGNED_MINUS_1
  The value -1 converted for use in an unsigned comparison (see Integer.compareUnsigned(int, int)). This requires adding Integer.MIN_VALUE to -1.
- CMP_UNSIGNED_1022
  private static final int CMP_UNSIGNED_1022
  The value 1022 converted for use in an unsigned comparison (see Integer.compareUnsigned(int, int)). This requires adding Integer.MIN_VALUE to 1022.
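A sketch of how such shifted constants can be used, assuming a test for non-normal values (zero, sub-normal, infinite or NaN) based on the raw exponent. Adding Integer.MIN_VALUE to both operands maps unsigned order onto signed order, so one signed comparison replaces Integer.compareUnsigned. The name isNotNormal matches the method summary, but this body is an illustration, not the class's code.

```java
// Hypothetical sketch: detect non-normal doubles with one signed comparison.
final class NotNormalSketch {
    private static final int EXP_MASK = 0x7ff;                            // raw 11-bit exponent
    private static final int CMP_UNSIGNED_2046 = Integer.MIN_VALUE + 2046; // 2046 shifted into signed order

    /** Raw (biased) exponent: 0 for zero and sub-normals, 2047 for infinity and NaN. */
    static int biasedExponent(double a) {
        return (int) (Double.doubleToRawLongBits(a) >>> 52) & EXP_MASK;
    }

    /** True when the biased exponent is 0 or 2047. */
    static boolean isNotNormal(double a) {
        // (e - 1) is -1 when e == 0; after adding Integer.MIN_VALUE it wraps to Integer.MAX_VALUE.
        return biasedExponent(a) - 1 + Integer.MIN_VALUE >= CMP_UNSIGNED_2046;
    }

    public static void main(String[] args) {
        double[] values = {0.0, Double.MIN_VALUE, Double.MIN_NORMAL, 1.0,
                           Double.MAX_VALUE, Double.POSITIVE_INFINITY, Double.NaN};
        for (double v : values) {
            // Zero, MIN_VALUE, infinity and NaN print true.
            System.out.println(v + " -> " + isNotNormal(v));
        }
    }
}
```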
- TWO_POW_512
  private static final double TWO_POW_512
  2^512.
- TWO_POW_M512
  private static final double TWO_POW_M512
  2^-512.
- TWO_POW_53
  private static final double TWO_POW_53
  2^53. Any double with a magnitude at or above this is an even integer.
- HIGH32_MASK
  private static final long HIGH32_MASK
  Mask to extract the high 32 bits from a long.
- UNSIGN_MASK
  private static final long UNSIGN_MASK
  Mask to remove the sign bit from a long.
- MANTISSA_MASK
  private static final long MANTISSA_MASK
  Mask to extract the 52-bit mantissa from a long representation of a double.
- EXPONENT_OFFSET
  private static final int EXPONENT_OFFSET
  Exponent offset in IEEE754 representation.
- HALF
  private static final double HALF
  0.5.
- SAFE_MULTIPLY
  private static final double SAFE_MULTIPLY
  The limit for safe multiplication of x*y, assuming values above 1. Used to maintain positive values during the power computation.
- TO_STRING_SIZE
  private static final int TO_STRING_SIZE
  The size of the buffer for toString().
  The longest double will require a sign, a maximum of 17 digits, the decimal point and the exponent; for example, for the maximum value this is 24 chars: -1.7976931348623157e+308. Set the buffer size to twice this and round up to a power of 2, thus allowing for formatting characters. The size is 64.
- FORMAT_START
  private static final char FORMAT_START
- FORMAT_END
  private static final char FORMAT_END
- FORMAT_SEP
  private static final char FORMAT_SEP
- serialVersionUID
  private static final long serialVersionUID
  Serializable version identifier.
- x
  private final double x
  The high part of the double-double number.
- xx
  private final double xx
  The low part of the double-double number.
Constructor Details
- DD
  private DD(double x, double xx)
  Create a double-double number (x, xx).
  - Parameters:
    x - High part.
    xx - Low part.
Method Details
- of
  Creates the double-double number as the value (x, 0).
  - Parameters:
    x - Value.
  - Returns:
    the double-double
- of
  Creates the double-double number as the value (x, xx).
  Warning: The arguments are used directly. No checks are made that they represent a normalized double-double number: x == x + xx.
  This method is exposed for testing.
  - Parameters:
    x - High part.
    xx - Low part.
  - Returns:
    the double-double
- of
  Creates the double-double number as the value (x, 0).
  Note this method exists to avoid using of(long) for int arguments; the long variation is slower as it preserves all 64 bits of information.
  - Parameters:
    x - Value.
  - Returns:
    the double-double
- of
  Creates the double-double number with the high part equal to (double) x and the low part equal to any remaining bits.
  Note this method preserves all 64 bits of precision. Faster construction can be achieved using up to 53 bits of precision using of((double) x).
  - Parameters:
    x - Value.
  - Returns:
    the double-double
- from
  Creates the double-double number (z, zz) using the double representation of the argument x; the low part is the double representation of the round-off error.
    double z = x.doubleValue();
    double zz = x.subtract(new BigDecimal(z)).doubleValue();
  If the value cannot be represented as a finite value, the result will have an infinite high part and the low part is undefined.
  Note: This conversion can lose information about the precision of the BigDecimal value. The result is the closest double-double representation to the value.
  - Parameters:
    x - Value.
  - Returns:
    the double-double
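A runnable version of the two-line conversion above, showing that the low part recovers most of the error of the high part for a decimal such as 0.1 that has no exact double representation. Taken literally, the two lines would throw NumberFormatException for values outside the double range (new BigDecimal(Infinity) is rejected), so the sketch guards that case; returning NaN there is an assumption, since the documentation leaves the low part undefined. Names are hypothetical.

```java
import java.math.BigDecimal;

// Hypothetical sketch of the BigDecimal conversion described above.
final class FromBigDecimalSketch {
    static double[] from(BigDecimal x) {
        double z = x.doubleValue();                              // closest double
        if (Double.isInfinite(z)) {
            return new double[] {z, Double.NaN};                 // low part undefined; NaN chosen here
        }
        double zz = x.subtract(new BigDecimal(z)).doubleValue(); // closest double to the round-off
        return new double[] {z, zz};
    }

    public static void main(String[] args) {
        BigDecimal tenth = new BigDecimal("0.1");
        double[] dd = from(tenth);
        BigDecimal errHigh = tenth.subtract(new BigDecimal(dd[0])).abs();
        BigDecimal errPair = tenth.subtract(new BigDecimal(dd[0]).add(new BigDecimal(dd[1]))).abs();
        System.out.println(dd[0] == 0.1);                   // true
        System.out.println(errPair.compareTo(errHigh) < 0); // true: the pair is much closer
    }
}
```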
- ofSum
  Returns a DD whose value is (x + y). The values are not required to be ordered by magnitude, i.e. the result is commutative: x + y == y + x.
  This method ignores special handling of non-normal numbers and overflow within the extended precision computation. This creates the following special cases:
  - If x + y is infinite then the low part is NaN.
  - If x or y is infinite or NaN then the low part is NaN.
  - If x + y is sub-normal or zero then the low part is +/-0.0.
  An invalid result can be identified using isFinite().
  The result is the exact double-double representation of the sum.
  - Parameters:
    x - Addend.
    y - Addend.
  - Returns:
    the sum x + y.
- ofDifference
  Returns a DD whose value is (x - y). The values are not required to be ordered by magnitude, i.e. the result matches a negation and addition: x - y == -y + x.
  Computes the same results as ofSum(x, -y). See that method for details of special cases. An invalid result can be identified using isFinite().
  The result is the exact double-double representation of the difference.
  - Parameters:
    x - Minuend.
    y - Subtrahend.
  - Returns:
    the difference x - y.
- ofProduct
  Returns a DD whose value is (x * y).
  This method ignores special handling of non-normal numbers and intermediate overflow within the extended precision computation. This creates the following special cases:
  - If either |x| or |y| multiplied by 1 + 2^27 is infinite (intermediate overflow) then the low part is NaN.
  - If x * y is infinite then the low part is NaN.
  - If x or y is infinite or NaN then the low part is NaN.
  - If x * y is sub-normal or zero then the low part is +/-0.0.
  An invalid result can be identified using isFinite().
  Note: Ignoring special cases is a design choice for performance. The method is therefore not a drop-in replacement for roundOff = Math.fma(x, y, -x * y).
  The result is the exact double-double representation of the product.
  - Parameters:
    x - Factor.
    y - Factor.
  - Returns:
    the product x * y.
- ofSquare
  Returns a DD whose value is (x * x).
  This method is an optimisation of ofProduct(x, x). See that method for details of special cases. An invalid result can be identified using isFinite().
  The result is the exact double-double representation of the square.
  - Parameters:
    x - Factor.
  - Returns:
    the square x * x.
- fromQuotient
  Returns a DD whose value is (x / y). If y = 0 the result is undefined.
  This method ignores special handling of non-normal numbers and intermediate overflow within the extended precision computation. This creates the following special cases:
  - If either |x / y| or |y| multiplied by 1 + 2^27 is infinite (intermediate overflow) then the low part is NaN.
  - If x / y is infinite then the low part is NaN.
  - If x or y is infinite or NaN then the low part is NaN.
  - If x / y is sub-normal or zero, excluding the previous cases, then the low part is +/-0.0.
  An invalid result can be identified using isFinite().
  The result is the closest double-double representation to the quotient.
  - Parameters:
    x - Dividend.
    y - Divisor.
  - Returns:
    the quotient x / y.
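A sketch of a double-double quotient. It swaps in Math.fma to compute the exact remainder x - q*y, whereas the special cases above indicate that DD relies on Dekker's split instead. The low part is the remainder divided by y, which is rounded again, so the result is very close to, but not guaranteed to be, the closest double-double. Names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical sketch: (x / y) as (q, r / y) with an exact fma remainder.
final class QuotientSketch {
    static double[] quotient(double x, double y) {
        double q = x / y;
        double r = Math.fma(-q, y, x);  // x - q * y, exact when nothing overflows or underflows
        return new double[] {q, r / y}; // the division by y is rounded again
    }

    /** Relative error of the pair against a 60-digit reference quotient. */
    static BigDecimal relativeError(double x, double y) {
        double[] dd = quotient(x, y);
        MathContext mc = new MathContext(60);
        BigDecimal exact = new BigDecimal(x).divide(new BigDecimal(y), mc);
        BigDecimal approx = new BigDecimal(dd[0]).add(new BigDecimal(dd[1]));
        return approx.subtract(exact).abs().divide(exact.abs(), mc);
    }

    public static void main(String[] args) {
        // Far below the 2^-53 relative error of a plain double division.
        System.out.println(relativeError(1.23, 4.56).compareTo(new BigDecimal(0x1.0p-100)) < 0); // true
    }
}
```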
- hi
  public double hi()
  Gets the first part x of the double-double number (x, xx). In a normalized double-double number this part will have the greatest magnitude.
  This is equivalent to returning the high part x_hi for the number (x_hi, x_lo).
  - Returns:
    the first part
- lo
  public double lo()
  Gets the second part xx of the double-double number (x, xx). In a normalized double-double number this part will have the smallest magnitude.
  This is equivalent to returning the low part x_lo for the number (x_hi, x_lo).
  - Returns:
    the second part
- isFinite
  public boolean isFinite()
  Returns true if the evaluated sum of the parts is finite.
  This method is provided as a utility to check the result of a DD computation. Note that for performance the DD class does not follow IEEE754 arithmetic for infinite and NaN, and does not protect from overflow of intermediate values in multiply and divide operations. If this method returns false following DD arithmetic then the computation is not supported to extended precision.
  Note: Any number that returns true may be converted to the exact BigDecimal value.
  - Returns:
    true if this instance represents a finite double value.
- doubleValue
  public double doubleValue()
  Get the value as a double. This is the evaluated sum of the parts.
  Note that even when the return value is finite, this conversion can lose information about the precision of the DD value.
  Conversion of a finite DD can also be performed using the BigDecimal representation.
  - Specified by:
    doubleValue in class Number
  - Returns:
    the value converted to a double
- floatValue
  public float floatValue()
  Get the value as a float. This is the narrowing primitive conversion of the doubleValue(). This conversion can lose range, resulting in a float zero from a nonzero double and a float infinity from a finite double. A double NaN is converted to a float NaN and a double infinity is converted to the same-signed float infinity.
  Note that even when the return value is finite, this conversion can lose information about the precision of the DD value.
  Conversion of a finite DD can also be performed using the BigDecimal representation.
  - Specified by:
    floatValue in class Number
  - Returns:
    the value converted to a float
- intValue
  public int intValue()
  Get the value as an int. This conversion discards the fractional part of the number and effectively rounds the value to the closest whole number in the direction of zero. This is the equivalent of a cast of a floating-point number to an integer, for example (int) -2.75 => -2.
  Note that this conversion can lose information about the precision of the DD value.
  Special cases:
  - If the DD value is infinite the result is Integer.MAX_VALUE.
  - If the DD value is -infinite the result is Integer.MIN_VALUE.
  - If the DD value is NaN the result is 0.
  Conversion of a finite DD can also be performed using the BigDecimal representation. Note that BigDecimal conversion rounds to the BigInteger whole number representation and returns the low-order 32 bits. Numbers too large for an int may change sign. This method ensures the sign is correct by directly rounding to an int and returning the respective upper or lower limit for numbers too large for an int.
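A sketch of the documented intValue rules for a pair (hi, lo), using BigDecimal truncation toward zero and explicit clamping; the helper is hypothetical, not the class's implementation. It shows why the low part matters, since (3, -2^-60) truncates to 2, and contrasts the clamped result with BigDecimal.intValue(), which keeps only the low-order 32 bits.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical sketch of the documented intValue() rules for a pair (hi, lo).
final class IntValueSketch {
    static int intValue(double hi, double lo) {
        double sum = hi + lo;
        if (!Double.isFinite(sum)) {
            return (int) sum; // Java's cast: NaN -> 0, +/-infinity -> Integer.MAX_VALUE / MIN_VALUE
        }
        // Truncate the exact value toward zero, then clamp to the int range.
        BigDecimal v = new BigDecimal(hi).add(new BigDecimal(lo)).setScale(0, RoundingMode.DOWN);
        if (v.compareTo(BigDecimal.valueOf(Integer.MAX_VALUE)) > 0) {
            return Integer.MAX_VALUE;
        }
        if (v.compareTo(BigDecimal.valueOf(Integer.MIN_VALUE)) < 0) {
            return Integer.MIN_VALUE;
        }
        return v.intValueExact();
    }

    public static void main(String[] args) {
        // Just below 3, so the truncated value is 2 even though (int) hi would be 3.
        System.out.println(intValue(3.0, -0x1.0p-60));      // 2
        System.out.println(intValue(3e9, 0.0));             // 2147483647
        // BigDecimal.intValue() keeps only the low-order 32 bits and changes sign here.
        System.out.println(new BigDecimal(3e9).intValue()); // -1294967296
    }
}
```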
- longValue
  public long longValue()
  Get the value as a long. This conversion discards the fractional part of the number and effectively rounds the value to the closest whole number in the direction of zero. This is the equivalent of a cast of a floating-point number to an integer, for example (long) -2.75 => -2.
  Note that this conversion can lose information about the precision of the DD value.
  Special cases:
  - If the DD value is infinite the result is Long.MAX_VALUE.
  - If the DD value is -infinite the result is Long.MIN_VALUE.
  - If the DD value is NaN the result is 0.
  Conversion of a finite DD can also be performed using the BigDecimal representation. Note that BigDecimal conversion rounds to the BigInteger whole number representation and returns the low-order 64 bits. Numbers too large for a long may change sign. This method ensures the sign is correct by directly rounding to a long and returning the respective upper or lower limit for numbers too large for a long.
- bigDecimalValue
  Get the value as a BigDecimal. This is the evaluated sum of the parts; the conversion is exact.
  The conversion will raise a NumberFormatException if the number is non-finite.
  - Returns:
    the double-double as a BigDecimal.
  - Throws:
    NumberFormatException - if any part of the number is infinite or NaN
- fastTwoSum
  Compute the sum of two numbers a and b using Dekker's two-sum algorithm. The values are required to be ordered by magnitude: |a| >= |b|.
  If a is zero and b is non-zero the returned value is (b, 0).
  - Parameters:
    a - First part of sum.
    b - Second part of sum.
  - Returns:
    the sum
- fastTwoSumLow
  static double fastTwoSumLow(double a, double b, double x)
  Compute the round-off of the sum of two numbers a and b using Dekker's two-sum algorithm. The values are required to be ordered by magnitude: |a| >= |b|.
  If a is zero and b is non-zero the returned value is zero.
  - Parameters:
    a - First part of sum.
    b - Second part of sum.
    x - Sum.
  - Returns:
    the sum round-off
- fastTwoDiff
  Compute the difference of two numbers a and b using Dekker's two-sum algorithm. The values are required to be ordered by magnitude: |a| >= |b|.
  Computes the same results as fastTwoSum(a, -b).
  - Parameters:
    a - Minuend.
    b - Subtrahend.
  - Returns:
    the difference
- fastTwoDiffLow
  private static double fastTwoDiffLow(double a, double b, double x)
  Compute the round-off of the difference of two numbers a and b using Dekker's two-sum algorithm. The values are required to be ordered by magnitude: |a| >= |b|.
  - Parameters:
    a - Minuend.
    b - Subtrahend.
    x - Difference.
  - Returns:
    the difference round-off
- twoSum
  Compute the sum of two numbers a and b using Knuth's two-sum algorithm. The values are not required to be ordered by magnitude, i.e. the result is commutative: s = a + b == b + a.
  - Parameters:
    a - First part of sum.
    b - Second part of sum.
  - Returns:
    the sum
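A sketch contrasting the two round-off formulas named above, as hypothetical standalone methods. Dekker's fast two-sum needs |a| >= |b| and silently loses the round-off when the order is wrong, while Knuth's two-sum costs three more floating-point operations but accepts either order.

```java
// Hypothetical sketch contrasting the round-off formulas of the two algorithms.
final class TwoSumOrderSketch {
    /** Dekker's fast two-sum round-off; requires |a| >= |b|. x is fl(a + b). */
    static double fastTwoSumLow(double a, double b, double x) {
        return b - (x - a);
    }

    /** Knuth's two-sum round-off; any order. x is fl(a + b). */
    static double twoSumLow(double a, double b, double x) {
        double bb = x - a;
        return (a - (x - bb)) + (b - bb);
    }

    public static void main(String[] args) {
        double a = 1.0, b = 1e100; // |a| < |b| breaks the fast two-sum precondition
        double x = a + b;          // 1e100: the 1.0 is rounded away
        System.out.println(fastTwoSumLow(a, b, x)); // 0.0: round-off not recovered
        System.out.println(twoSumLow(a, b, x));     // 1.0: exact round-off
        System.out.println(fastTwoSumLow(b, a, x)); // 1.0: correct once ordered
    }
}
```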
- twoSumLow
  static double twoSumLow(double a, double b, double x)
  Compute the round-off of the sum of two numbers a and b using Knuth's two-sum algorithm. The values are not required to be ordered by magnitude, i.e. the result is commutative: s = a + b == b + a.
  - Parameters:
    a - First part of sum.
    b - Second part of sum.
    x - Sum.
  - Returns:
    the sum round-off
- twoDiff
  Compute the difference of two numbers a and b using Knuth's two-sum algorithm. The values are not required to be ordered by magnitude.
  Computes the same results as twoSum(a, -b).
  - Parameters:
    a - Minuend.
    b - Subtrahend.
  - Returns:
    the difference
- twoDiffLow
  private static double twoDiffLow(double a, double b, double x)
  Compute the round-off of the difference of two numbers a and b using Knuth's two-sum algorithm. The values are not required to be ordered by magnitude.
  - Parameters:
    a - Minuend.
    b - Subtrahend.
    x - Difference.
  - Returns:
    the difference round-off
- twoProd
  Compute the double-double number (z, zz) for the exact product of x and y.
  The high part of the number is equal to the product z = x * y. The low part is set to the round-off of the double product.
  This method ignores special handling of non-normal numbers and intermediate overflow within the extended precision computation. This creates the following special cases:
  - If x * y is sub-normal or zero then the low part is +/-0.0.
  - If x * y is infinite then the low part is NaN.
  - If x or y is infinite or NaN then the low part is NaN.
  - If either |x| or |y| multiplied by 1 + 2^27 is infinite (intermediate overflow) then the low part is NaN.
  Note: Ignoring special cases is a design choice for performance. The method is therefore not a drop-in replacement for round_off = Math.fma(x, y, -x * y).
  - Parameters:
    x - First factor.
    y - Second factor.
  - Returns:
    the product
- twoProductLow
  static double twoProductLow(double x, double y, double xy)
  Compute the low part of the double length number (z, zz) for the exact product of x and y using Dekker's mult12 algorithm. The standard precision product x*y must be provided. The numbers x and y are split into high and low parts using Dekker's algorithm.
  Warning: This method does not perform scaling in Dekker's split and large finite numbers can create NaN results.
  - Parameters:
    x - First factor.
    y - Second factor.
    xy - Product of the factors (x * y).
  - Returns:
    the low part of the product double length number
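A sketch of Dekker's split as used by the mult12 methods, with s = 27 so the multiplier is 2^27 + 1 (class and helper names are hypothetical). It checks that the two parts reassemble the input exactly and counts the significant bits of each part; for p = 53 both should need at most 26 bits, which is what keeps the partial products hx * hy, lx * hy, hx * ly and lx * ly exact.

```java
// Hypothetical sketch of Dekker's split into two non-overlapping parts.
final class SplitSketch {
    private static final double MULTIPLIER = 0x1.0p27 + 1; // 2^s + 1 with s = 27

    /** Dekker's split: the high part keeps at most 26 significant bits. */
    static double highPart(double a) {
        double c = MULTIPLIER * a;
        double aBig = c - a;
        return c - aBig;
    }

    /** Count of significant bits in the significand of a normal double. */
    static int significantBits(double v) {
        long m = (Double.doubleToRawLongBits(v) & 0xfffffffffffffL) | (1L << 52);
        return 53 - Long.numberOfTrailingZeros(m);
    }

    public static void main(String[] args) {
        double a = Math.PI;
        double hi = highPart(a);
        double lo = a - hi;               // exact: the split loses nothing
        System.out.println(hi + lo == a); // true
        System.out.println(significantBits(hi) + " + " + significantBits(lo) + " bits");
    }
}
```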
-
twoProductLow
static double twoProductLow(double hx, double lx, double hy, double ly, double xy)
Compute the low part of the double length number (z,zz) for the exact product of x and y using Dekker's mult12 algorithm. The standard precision product x*y, and the high and low parts of the factors, must be provided.
- Parameters:
hx - High part of first factor.
lx - Low part of first factor.
hy - High part of second factor.
ly - Low part of second factor.
xy - Product of the factors (x * y).
- Returns:
- the low part of the product double length number
-
twoSquare
Compute the double-double number (z,zz) for the exact square of x.
The high part of the number is equal to the square z = x * x. The low part is set to the round-off of the double square.
This method is an optimisation of twoProd(x, x). See that method for details of special cases.
- Parameters:
x - Factor.
- Returns:
- the square
-
twoSquareLow
static double twoSquareLow(double x, double x2)
Compute the low part of the double length number (z,zz) for the exact square of x using Dekker's mult12 algorithm. The standard precision square x*x must be provided. The number x is split into high and low parts using Dekker's algorithm.
Warning: This method does not perform scaling in Dekker's split and large finite numbers can create NaN results.
- Parameters:
x - Factor.
x2 - Square of the factor (x * x).
- Returns:
- the low part of the square double length number
-
twoSquareLow
static double twoSquareLow(double hx, double lx, double x2)
Compute the low part of the double length number (z,zz) for the exact square of x using Dekker's mult12 algorithm. The standard precision square x*x, and the high and low parts of the factor, must be provided.
- Parameters:
hx - High part of factor.
lx - Low part of factor.
x2 - Square of the factor (x * x).
- Returns:
- the low part of the square double length number
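Here is a sketch of the square round-off computed from pre-split parts. It expands x^2 = hx^2 + 2*hx*lx + lx^2 and removes the terms from x2 in decreasing order of magnitude. The class name is hypothetical. The parts hx and lx are assumed to come from Dekker's split: c = (2^27 + 1) * x, hx = c - (c - x), lx = x - hx.

```java
/** Hypothetical helper illustrating the mult12 round-off for a square. */
final class DekkerSquareSketch {
    private DekkerSquareSketch() {}

    /**
     * Low part of x * x from the split parts x = hx + lx and x2 = fl(x * x).
     * Assumes hx and lx each have at most 26 significant bits and no overflow.
     */
    static double twoSquareLow(double hx, double lx, double x2) {
        // (x2 - hx^2) minus the cross term 2*hx*lx leaves lx^2 minus the round-off
        return lx * lx - ((x2 - hx * hx) - 2 * lx * hx);
    }
}
```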
-
highPart
static double highPart(double value)
Implement Dekker's method to split a value into two parts. Multiplying by (2^s + 1) creates a big value from which to derive the two split parts.
c = (2^s + 1) * a
a_big = c - a
a_hi = c - a_big
a_lo = a - a_hi
a = a_hi + a_lo
The multiplicand allows a p-bit value to be split into a (p-s)-bit value a_hi and a non-overlapping (s-1)-bit value a_lo. Combined they have (p-1) bits of significand but the sign bit of a_lo contains a bit of information. The constant is chosen so that s is ceil(p/2), where the precision p for a double is 53 bits (1 bit of the mantissa is assumed to be 1 for a non sub-normal number) and s is 27.
This conversion does not use scaling and the result of overflow is NaN. Overflow may occur when the exponent of the input value is above 996.
Splitting a NaN or infinite value will return NaN.
- Parameters:
value - Value.
- Returns:
- the high part of the value.
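The split steps above translate directly into Java. This sketch uses s = 27 and assumes round-to-nearest binary64 arithmetic. The class name and the helper significantBits, which counts the significant bits of a normal non-zero double, are hypothetical. The sketch returns both parts so that the (p-s)-bit and (s-1)-bit claims can be checked.

```java
/** Hypothetical helper illustrating Dekker's split with s = 27. */
final class DekkerSplitSketch {
    private DekkerSplitSketch() {}

    /** 2^s + 1 with s = 27 for the 53-bit significand of a double. */
    private static final double MULTIPLIER = 0x1.0p27 + 1.0;

    /** Returns {a_hi, a_lo} following the steps above (no scaling, so large values give NaN). */
    static double[] split(double a) {
        final double c = MULTIPLIER * a;
        final double aBig = c - a;
        final double aHi = c - aBig;
        final double aLo = a - aHi;
        return new double[] {aHi, aLo};
    }

    /** Number of significant bits in a finite, non-zero, normal double. */
    static int significantBits(double v) {
        // Fraction field plus the implicit leading 1 bit
        final long m = (Double.doubleToRawLongBits(v) & 0xfffffffffffffL) | (1L << 52);
        return 53 - Long.numberOfTrailingZeros(m);
    }
}
```

For Math.PI, the two parts sum back to Math.PI exactly, and each part has at most 26 significant bits. Splitting 0x1.0p1000 overflows c and yields NaN.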
-
negate
Returns a DD whose value is the negation of both parts of the double-double number.
-
abs
Returns a DD whose value is the absolute value of the number (x, xx).
This method assumes that the low part xx is the smaller magnitude.
Cases:
- If the x value is negative the result is (-x, -xx).
- If the x value is +/-0.0 the result is (0.0, 0.0); this will remove sign information from the round-off component assumed to be zero.
- Otherwise the result is this.
- Returns:
- the absolute value
-
floor
Returns the largest (closest to positive infinity) DD value that is less than or equal to this number (x, xx) and is equal to a mathematical integer.
This method may change the representation of zero and non-finite values; the result is equivalent to Math.floor(x) and the xx part is ignored.
Cases:
- If x is NaN, then the result is (NaN, 0).
- If x is infinite, then the result is (x, 0).
- If x is +/-0.0, then the result is (x, 0).
- If x != Math.floor(x), then the result is (Math.floor(x), 0).
- Otherwise the result is the DD value equal to the sum Math.floor(x) + Math.floor(xx).
The result may generate a high part smaller (closer to negative infinity) than Math.floor(x) if x is a representable integer and the xx value is negative.
- Returns:
- the largest (closest to positive infinity) value that is less than or equal to this and is equal to a mathematical integer
-
ceil
Returns the smallest (closest to negative infinity) DD value that is greater than or equal to this number (x, xx) and is equal to a mathematical integer.
This method may change the representation of zero and non-finite values; the result is equivalent to Math.ceil(x) and the xx part is ignored.
Cases:
- If x is NaN, then the result is (NaN, 0).
- If x is infinite, then the result is (x, 0).
- If x is +/-0.0, then the result is (x, 0).
- If x != Math.ceil(x), then the result is (Math.ceil(x), 0).
- Otherwise the result is the DD value equal to the sum Math.ceil(x) + Math.ceil(xx).
The result may generate a high part larger (closer to positive infinity) than Math.ceil(x) if x is a representable integer and the xx value is positive.
- Returns:
- the smallest (closest to negative infinity) value that is greater than or equal to this and is equal to a mathematical integer
-
floorOrCeil
Implementation of the floor and ceiling functions.
Cases:
- If x is non-finite or zero, then the result is (x, 0).
- If x is rounded by the operator to a new value y, then the result is (y, 0).
- Otherwise the result is the DD value equal to the sum op(x) + op(xx).
- Parameters:
x - High part of x.
xx - Low part of x.
op - Floor or ceiling operator.
- Returns:
- the result
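The cases above can be sketched in Java as follows. The class name and the {hi, lo} array return type are hypothetical. Renormalising the final sum with Knuth's two-sum is an assumption about how "the DD value equal to the sum" is formed.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical helper illustrating the floor/ceiling cases for (x, xx). */
final class FloorCeilSketch {
    private FloorCeilSketch() {}

    /** Returns {hi, lo}; op is expected to be Math::floor or Math::ceil. */
    static double[] floorOrCeil(double x, double xx, DoubleUnaryOperator op) {
        // Non-finite or zero: keep x (NaN, infinity or signed zero) and drop xx
        if (!Double.isFinite(x) || x == 0) {
            return new double[] {x, 0};
        }
        final double y = op.applyAsDouble(x);
        // x is not an integer: for a normalized number the low part cannot change the result
        if (y != x) {
            return new double[] {y, 0};
        }
        // x is an integer: round the low part too and add the parts exactly (Knuth two-sum)
        final double z = op.applyAsDouble(xx);
        final double hi = y + z;
        final double zVirtual = hi - y;
        final double lo = (y - (hi - zVirtual)) + (z - zVirtual);
        return new double[] {hi, lo};
    }
}
```

For example, floorOrCeil(0x1.0p53, -0.5, Math::floor) gives a high part of 2^53 - 1. This shows the documented case where the result is smaller than Math.floor(x).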
-
add
Returns a DD whose value is (this + y).
This computes the same result as add(DD.of(y)).
The computed result is within 2 eps of the exact result where eps is 2^-106.
- Parameters:
y - Value to be added to this number.
- Returns:
this + y.
-
add
Returns a DD whose value is (this + y).
The computed result is within 4 eps of the exact result where eps is 2^-106.
-
add
Compute the sum of (x, xx) and (y, yy).
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Parameters:
x - High part of x.
xx - Low part of x.
y - High part of y.
yy - Low part of y.
- Returns:
- the sum
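For illustration, here is a Java sketch of double-double addition. It follows the IEEE-style algorithm described by Hida, Li and Bailey for the QD library: two-sum the high parts and the low parts, then fold the low-order terms in and renormalise twice. It is not claimed to match this class's implementation or its stated error bound. The class name and the {hi, lo} return type are hypothetical.

```java
/** Hypothetical helper illustrating IEEE-style double-double addition. */
final class DDAddSketch {
    private DDAddSketch() {}

    /** Knuth two-sum: exact round-off of s = fl(a + b). */
    private static double twoSumLow(double a, double b, double s) {
        final double bVirtual = s - a;
        return (a - (s - bVirtual)) + (b - bVirtual);
    }

    /** Returns the sum (x, xx) + (y, yy) as a normalized {hi, lo}. */
    static double[] add(double x, double xx, double y, double yy) {
        // Exact two-sums of the high parts and of the low parts
        final double s = x + y;
        final double e = twoSumLow(x, y, s);
        final double t = xx + yy;
        final double f = twoSumLow(xx, yy, t);
        // Fold t into the round-off and renormalise (fast two-sum, assumes |s| >= |e + t|)
        final double e1 = e + t;
        final double s1 = s + e1;
        final double e2 = e1 - (s1 - s);
        // Fold f into the round-off and renormalise again
        final double e3 = e2 + f;
        final double hi = s1 + e3;
        return new double[] {hi, e3 - (hi - s1)};
    }
}
```

For example, add(1.0, 1e-20, -1.0, 0.0) returns {1e-20, 0.0}. The high parts cancel exactly, so the whole result comes from the low part.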
-
accurateAdd
Compute the sum of (x, xx) and y.
This computes the same result as accurateAdd(x, xx, y, 0).
Note: This is an internal helper method used when accuracy is required. The computed result is within 1 eps of the exact result where eps is 2^-106. The performance is approximately 1.5-fold slower than add(double).
- Parameters:
x - High part of x.
xx - Low part of x.
y - y.
- Returns:
- the sum
-
accurateAdd
Compute the sum of (x, xx) and (y, yy).
The high part of the result is within 1 ulp of the true sum e. The low part of the result is within 1 ulp of the result of the high part subtracted from the true sum, e - hi.
Note: This is an internal helper method used when accuracy is required. The computed result is within 1 eps of the exact result where eps is 2^-106. The performance is approximately 2-fold slower than add(DD).
- Parameters:
x - High part of x.
xx - Low part of x.
y - High part of y.
yy - Low part of y.
- Returns:
- the sum
-
subtract
Returns a DD whose value is (this - y).
This computes the same result as add(-y).
The computed result is within 2 eps of the exact result where eps is 2^-106.
- Parameters:
y - Value to be subtracted from this number.
- Returns:
this - y.
-
subtract
Returns a DD whose value is (this - y).
This computes the same result as add(y.negate()).
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Specified by:
subtract in interface NativeOperators<DD>
- Parameters:
y - Value to be subtracted from this number.
- Returns:
this - y.
-
multiply
Returns a DD whose value is this * y.
This computes the same result as multiply(DD.of(y)).
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Parameters:
y - Factor.
- Returns:
this * y.
-
multiply
Compute the multiplication product of (x, xx) and y.
This computes the same result as multiply(x, xx, y, 0).
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Parameters:
x - High part of x.
xx - Low part of x.
y - High part of y.
- Returns:
- the product
-
multiply
Returns a DD whose value is this * y.
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Specified by:
multiply in interface Multiplication<DD>
- Parameters:
y - Factor.
- Returns:
this * y.
-
multiply
Compute the multiplication product of (x, xx) and (y, yy).
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Parameters:
x - High part of x.
xx - Low part of x.
y - High part of y.
yy - Low part of y.
- Returns:
- the product
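Here is a sketch of the usual double-double multiplication. It forms the exact product of the high parts, adds the cross terms x*yy and xx*y, and renormalises. For brevity the exact product uses Math.fma rather than a Dekker split. The term xx*yy is dropped as negligible. The class name is hypothetical, and the sketch is not claimed to reproduce this class's code or its 4 eps bound.

```java
/** Hypothetical helper illustrating double-double multiplication. */
final class DDMultiplySketch {
    private DDMultiplySketch() {}

    /** Returns (x, xx) * (y, yy) as a normalized {hi, lo}. */
    static double[] multiply(double x, double xx, double y, double yy) {
        // Exact product of the high parts: p + e == x * y (no overflow or underflow assumed)
        final double p = x * y;
        double e = Math.fma(x, y, -p);
        // Add the cross terms; xx * yy is below the working precision and is dropped
        e += x * yy + xx * y;
        // Renormalise with fast two-sum (|p| >= |e|)
        final double hi = p + e;
        return new double[] {hi, e - (hi - p)};
    }
}
```

For example, squaring (1 + 2^-30, 0) gives (1 + 2^-29, 2^-60). The 2^-60 term is lost by a plain double product but kept in the low part.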
-
square
Returns a DD whose value is this * this.
This method is an optimisation of multiply(this).
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Returns:
this^2
-
square
Compute the square of (x, xx).
- Parameters:
x - High part of x.
xx - Low part of x.
- Returns:
- the square
-
divide
Returns a DD whose value is (this / y). If y = 0 the result is undefined.
The computed result is within 1 eps of the exact result where eps is 2^-106.
- Parameters:
y - Divisor.
- Returns:
this / y.
-
divide
Compute the division of (x, xx) by y. If y = 0 the result is undefined.
The computed result is within 1 eps of the exact result where eps is 2^-106.
- Parameters:
x - High part of x.
xx - Low part of x.
y - High part of y.
- Returns:
- the quotient
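The long-division approach commonly used for a double-double divided by a double works in three steps. First take q1 = x / y. Next form the exact remainder (x, xx) - q1 * y. Finally divide that remainder by y to get a correction q2. In this sketch Math.fma supplies the exact product. The class name is hypothetical. The sketch is not claimed to reproduce this class's 1 eps bound or its behaviour for y = 0.

```java
/** Hypothetical helper illustrating double-double / double long division. */
final class DDDivideSketch {
    private DDDivideSketch() {}

    /** Returns (x, xx) / y as a normalized {hi, lo}; y != 0 and no overflow assumed. */
    static double[] divide(double x, double xx, double y) {
        // First approximation of the quotient
        final double q1 = x / y;
        // Exact product q1 * y = p + pe
        final double p = q1 * y;
        final double pe = Math.fma(q1, y, -p);
        // Remainder (x, xx) - q1 * y; x - p is exact because p is close to x
        final double r = ((x - p) - pe) + xx;
        // Correction term, then renormalise with fast two-sum
        final double q2 = r / y;
        final double hi = q1 + q2;
        return new double[] {hi, q2 - (hi - q1)};
    }
}
```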
-
divide
Returns a DD whose value is (this / y). If y = 0 the result is undefined.
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Specified by:
divide in interface NativeOperators<DD>
- Parameters:
y - Divisor.
- Returns:
this / y.
-
divide
Compute the division of (x, xx) by (y, yy). If y = 0 the result is undefined.
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Parameters:
x - High part of x.
xx - Low part of x.
y - High part of y.
yy - Low part of y.
- Returns:
- the quotient
-
reciprocal
Compute the reciprocal of this. If this value is zero the result is undefined.
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Specified by:
reciprocal in interface Multiplication<DD>
- Returns:
this^-1
-
reciprocal
Compute the inverse of (y, yy). If y = 0 the result is undefined.
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Parameters:
y - High part of y.
yy - Low part of y.
- Returns:
- the inverse
-
sqrt
Compute the square root of this number (x, xx).
Uses the result Math.sqrt(x) if that result is not a finite normalized double.
Special cases:
- If x is NaN or less than zero, then the result is (NaN, 0).
- If x is positive infinity, then the result is (+infinity, 0).
- If x is positive zero or negative zero, then the result is (x, 0).
The computed result is within 4 eps of the exact result where eps is 2^-106.
- Returns:
sqrt(this)
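A common way to refine Math.sqrt(x) to double-double precision is one Newton step. The step uses the exact square of the first approximation: c = ((x, xx) - s^2) / (2s). The sketch below assumes finite x > 0, so the special cases listed above are not handled. It uses Math.fma for the exact square and a hypothetical class name. It is not claimed to be this class's algorithm.

```java
/** Hypothetical helper illustrating a Newton-corrected double-double square root. */
final class DDSqrtSketch {
    private DDSqrtSketch() {}

    /** Square root of (x, xx) as {hi, lo}; assumes finite x > 0 (no special cases). */
    static double[] sqrt(double x, double xx) {
        // Standard precision root
        final double s = Math.sqrt(x);
        // Exact square s * s = p + pe
        final double p = s * s;
        final double pe = Math.fma(s, s, -p);
        // Residual (x, xx) - s^2; x - p is exact because p is close to x
        final double r = ((x - p) - pe) + xx;
        // One Newton step, then renormalise with fast two-sum
        final double c = r / (2 * s);
        final double hi = s + c;
        return new double[] {hi, c - (hi - s)};
    }
}
```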
-
isNotNormal
static boolean isNotNormal(double a)
Checks if the number is not normal. This is functionally equivalent to:
final double abs = Math.abs(a);
return (abs <= Double.MIN_NORMAL || !(abs <= Double.MAX_VALUE));
- Parameters:
a - The value.
- Returns:
- true if the value is not normal
-
scalb
Multiply this number (x, xx) by an integral power of two.
(y, yy) = (x, xx) * 2^exp
The result is rounded as if performed by a single correctly rounded floating-point multiply. This computes the same result as:
y = Math.scalb(x, exp);
yy = Math.scalb(xx, exp);
The implementation computes using a single multiplication if exp is in [-1022, 1023]. Otherwise the parts (x, xx) are scaled by repeated multiplication by power-of-two factors. The result is exact unless the scaling generates sub-normal parts; in this case precision may be lost by a single rounding.
- Parameters:
exp - Power of two scale factor.
- Returns:
- the result
-
twoPow
static double twoPow(int n)
Create a normalized double with the value 2^n.
Warning: Do not call with n = -1023. This will create zero.
- Parameters:
n - Exponent (in the range [-1022, 1023]).
- Returns:
- the double
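Assuming the IEEE 754 binary64 layout, 2^n can be built directly from its bits: a biased exponent of n + 1023 with an all-zero significand field. The class name in this sketch is hypothetical. The sketch also shows why n = -1023 is excluded: the biased exponent is then 0, and the bits encode +0.0.

```java
/** Hypothetical helper illustrating construction of 2^n from exponent bits. */
final class TwoPowSketch {
    private TwoPowSketch() {}

    /** Returns 2^n for n in [-1022, 1023]; n = -1023 produces +0.0. */
    static double twoPow(int n) {
        // Place the biased exponent in bits 52-62; sign and significand bits are zero
        return Double.longBitsToDouble(((long) (n + 1023)) << 52);
    }
}
```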
-
frexp
Convert this number x to fractional f and integral 2^exp components.
x = f * 2^exp
The combined fractional part (f, ff) has a magnitude in the range [0.5, 1).
Special cases:
- If x is zero, then the normalized fraction is zero and the exponent is zero.
- If x is NaN, then the normalized fraction is NaN and the exponent is unspecified.
- If x is infinite, then the normalized fraction is infinite and the exponent is unspecified.
- If the high part x is an exact power of 2 and the low part xx has an opposite signed non-zero magnitude then the fraction high part f will be +/-1 such that the double-double number is in the range [0.5, 1).
This is named using the equivalent function in the standard C math.h library.
- Parameters:
exp - Power of two scale factor (integral exponent).
- Returns:
- Fraction part.
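This is a sketch of the frexp decomposition for a double-double. As an assumption, it is restricted to a finite, non-zero, normal high part so that Math.getExponent can be used directly. Sub-normal inputs and the special cases above are not handled. The class name is hypothetical. The adjustment step implements the power-of-2 case from the list above.

```java
/** Hypothetical helper illustrating frexp for a double-double (x, xx). */
final class FrexpSketch {
    private FrexpSketch() {}

    /**
     * Returns {f, ff} with (x + xx) = (f + ff) * 2^exp[0] and |f + ff| in [0.5, 1).
     * Assumes x is finite, non-zero and normal; the special cases are not handled.
     */
    static double[] frexp(double x, double xx, int[] exp) {
        // For a normal x, |x| = m * 2^getExponent(x) with m in [1, 2)
        int e = Math.getExponent(x) + 1;
        double f = Math.scalb(x, -e);
        double ff = Math.scalb(xx, -e);
        // |f| == 0.5 with an opposite-signed low part puts |f + ff| just below 0.5:
        // move one factor of 2 into the fraction so that f becomes +/-1
        if (Math.abs(f) == 0.5 && ff != 0 && Math.signum(ff) != Math.signum(f)) {
            f *= 2;
            ff *= 2;
            e -= 1;
        }
        exp[0] = e;
        return new double[] {f, ff};
    }
}
```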
-
getScale
private static int getScale(double a)
Returns a scale suitable for use with Math.scalb(double, int) to normalise the number to the interval [1, 2).
In contrast to Math.getExponent(double) this handles sub-normal numbers by computing the number of leading zeros in the mantissa and shifting the unbiased exponent. The result is that for all finite, non-zero numbers, the magnitude of scalb(x, -getScale(x)) is always in the range [1, 2).
This method is a functional equivalent of the C function ilogb(double).
The result is to be used to scale a number using Math.scalb(double, int). Hence the special case of a zero argument is handled using the return value for NaN, as zero cannot be scaled. This is different from Math.getExponent(double).
Special cases:
- If the argument is NaN or infinite, then the result is Double.MAX_EXPONENT + 1.
- If the argument is zero, then the result is Double.MAX_EXPONENT + 1.
- Parameters:
a - Value.
- Returns:
- The unbiased exponent of the value to be used for scaling, or 1024 for 0, NaN or Inf
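Here is a sketch of an ilogb-style scale for all doubles, assuming the binary64 layout of 11 exponent bits and 52 fraction bits. Normal values use the unbiased exponent. Sub-normal values use the position of the highest set fraction bit, found with Long.numberOfLeadingZeros. The class name is hypothetical.

```java
/** Hypothetical helper illustrating an ilogb-style scale that handles sub-normals. */
final class ScaleSketch {
    private ScaleSketch() {}

    static int getScale(double a) {
        final long bits = Double.doubleToRawLongBits(a);
        final int biased = (int) ((bits >>> 52) & 0x7ff);
        if (biased == 0x7ff) {
            // NaN or infinite
            return Double.MAX_EXPONENT + 1;
        }
        if (biased != 0) {
            // Normal number: remove the exponent bias
            return biased - 1023;
        }
        final long mantissa = bits & 0xfffffffffffffL;
        if (mantissa == 0) {
            // +/-0.0 cannot be scaled
            return Double.MAX_EXPONENT + 1;
        }
        // Sub-normal: value = mantissa * 2^-1074, so use the highest set bit of the mantissa
        return -1074 + (63 - Long.numberOfLeadingZeros(mantissa));
    }
}
```

For example, the sub-normal 3 * Double.MIN_VALUE gets a scale of -1073, and Math.scalb then maps it to 1.5.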
-
pow
Compute this number (x, xx) raised to the power n.
Special cases:
- If x is not a finite normalized double, the low part xx is ignored and the result is Math.pow(x, n).
- If n = 0 the result is (1, 0).
- If n = 1 the result is (x, xx).
- If n = -1 the result is the reciprocal.
- If the computation overflows the result is undefined.
Computation uses multiplication by factors generated by repeated squaring of the value. These multiplications have no special case handling for overflow; in the event of overflow the result is undefined. The pow(int, long[]) method can be used to generate a scaled fraction result for any finite DD number and exponent.
The computed result is within approximately 16 * (n - 1) * eps of the exact result where eps is 2^-106.
- Specified by:
pow in interface NativeOperators<DD>
- Parameters:
n - Exponent.
- Returns:
this^n
-
computePow
Compute the number x (non-zero finite) raised to the power n.
The input power is treated as an unsigned integer. Thus the negative value Integer.MIN_VALUE is 2^31.
- Parameters:
x - Fractional high part of x.
xx - Fractional low part of x.
n - Power (in [2, 2^31]).
- Returns:
- x^n.
-
pow
Compute this number x raised to the power n.
The value is returned as fractional f and integral 2^exp components.
(x+xx)^n = (f+ff) * 2^exp
The combined fractional part (f, ff) has a magnitude in the range [0.5, 1).
Special cases:
- If (x, xx) is zero the high part of the fractional part is computed using Math.pow(x, n) and the exponent is 0.
- If n = 0 the fractional part is 0.5 and the exponent is 1.
- If (x, xx) is an exact power of 2 the fractional part is 0.5 and the exponent is one more than the power of 2 of the result; for example, a result of 2^p is returned as 0.5 * 2^(p+1).
- If the result high part is an exact power of 2 and the low part has an opposite signed non-zero magnitude then the fraction high part f will be +/-1 such that the double-double number is in the range [0.5, 1).
- If the argument is not finite then a fractional representation is not possible. In this case the fraction and the scale factor are undefined.
The computed result is within approximately 16 * (n - 1) * eps of the exact result where eps is 2^-106.
- Parameters:
n - Power.
exp - Result power of two scale factor (integral exponent).
- Returns:
- Fraction part.
-
computePowScaled
Compute the number x (non-zero finite) raised to the power n.
The input power is treated as an unsigned integer. Thus the negative value Integer.MIN_VALUE is 2^31.
- Parameters:
b - Integral component 2^exp of x.
x - Fractional high part of x.
xx - Fractional low part of x.
n - Power (in [2, 2^31]).
exp - Result power of two scale factor (integral exponent).
- Returns:
- Fraction part.
-
equals
Test for equality with another object. If the other object is a DD then a comparison is made of the parts; otherwise false is returned.
If both parts of two double-double numbers are numerically equivalent the two DD objects are considered to be equal. For this purpose, two double values are considered to be the same if and only if the method call Double.doubleToLongBits(value + 0.0) returns the identical long when applied to each value. This provides numeric equality of different representations of zero as per -0.0 == 0.0, and equality of NaN values.
Note that in most cases, for two instances of class DD, x and y, the value of x.equals(y) is true if and only if
x.hi() == y.hi() && x.lo() == y.lo()
also has the value true. However, there are exceptions:
- Instances that contain NaN values in the same part are considered to be equal for that part, even though Double.NaN == Double.NaN has the value false.
- Instances that share a NaN value in one part but have different values in the other part are not considered equal.
The behavior is the same as if the components of the two double-double numbers were passed to Arrays.equals(double[], double[]):
Arrays.equals(new double[]{x.hi() + 0.0, x.lo() + 0.0}, new double[]{y.hi() + 0.0, y.lo() + 0.0});
Note: Addition of 0.0 converts signed representations of zero values -0.0 and 0.0 to a canonical 0.0.
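The Arrays.equals formulation above can be written as a small helper that works on raw parts. The class and method names are hypothetical. The helper only restates the documented comparison; it is not the class's equals method.

```java
import java.util.Arrays;

/** Hypothetical helper restating the documented part-wise equality. */
final class PartsEqualitySketch {
    private PartsEqualitySketch() {}

    static boolean partsEqual(double xHi, double xLo, double yHi, double yLo) {
        // Adding 0.0 maps -0.0 to 0.0; Arrays.equals compares doubleToLongBits, so NaN equals NaN
        return Arrays.equals(new double[] {xHi + 0.0, xLo + 0.0},
                             new double[] {yHi + 0.0, yLo + 0.0});
    }
}
```

For example, (1.0, -0.0) and (1.0, 0.0) compare as equal, and so do (NaN, 0.0) and (NaN, 0.0). However, (NaN, 1.0) and (NaN, 2.0) do not.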
-
hashCode
public int hashCode()
Gets a hash code for the double-double number.
The behavior is the same as if the parts of the double-double number were passed to Arrays.hashCode(double[]):
Arrays.hashCode(new double[] {hi() + 0.0, lo() + 0.0})
Note: Addition of 0.0 provides the same hash code for different signed representations of zero values -0.0 and 0.0.
-
equals
private static boolean equals(double x, double y)
Returns true if the values are numerically equal.
Two double values are considered to be the same if and only if the method call Double.doubleToLongBits(value + 0.0) returns the identical long when applied to each value. This provides numeric equality of different representations of zero as per -0.0 == 0.0, and equality of NaN values.
- Parameters:
x - Value
y - Value
- Returns:
true if the values are numerically equal
-
toString
Returns a string representation of the double-double number.
The string will represent the numeric values of the parts. The values are split by a separator and surrounded by parentheses.
The format for a double-double number is "(x,xx)", with x and xx converted as if using Double.toString(double).
Note: A numerical string representation of a finite double-double number can be generated by conversion to a BigDecimal before formatting.
-
zero
Identity element.
Note: Addition of this value with any element a may not create an element equal to a if the element contains signed zeros. In this case the magnitude of the result will be identical.
-
isZero
public boolean isZero()
Check if this is a neutral element of addition, i.e. this.add(a) returns a or an element representing the same value as a.
The default implementation calls equals(zero()). Implementations may want to employ a more efficient method. This may even be required if an implementation has multiple representations of zero and its equals method differentiates between them.
-
one
Identity element.
Note: Multiplication of this value with any element a may not create an element equal to a if the element contains signed zeros. In this case the magnitude of the result will be identical.
- Specified by:
one in interface Multiplication<DD>
- Returns:
- the field element such that for all a, one().multiply(a).equals(a) is true.
-
isOne
public boolean isOne()
Check if this is a neutral element of multiplication, i.e. this.multiply(a) returns a or an element representing the same value as a.
The default implementation calls equals(one()). Implementations may want to employ a more efficient method. This may even be required if an implementation has multiple representations of one and its equals method differentiates between them.
- Specified by:
isOne in interface Multiplication<DD>
- Returns:
true if this is a neutral element of multiplication.
-
multiply
Repeated addition.
This computes the same result as multiply((double) n).
- Specified by:
multiply in interface NativeOperators<DD>
- Parameters:
n - Number of times to add this to itself.
- Returns:
n * this.
-