Class ExtendedPrecision

java.lang.Object
org.apache.commons.numbers.core.ExtendedPrecision

final class ExtendedPrecision extends Object
Computes extended precision floating-point operations.

It is based on the 1971 paper Dekker (1971) A floating-point technique for extending the available precision.

  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private static final int
    The value 2046 converted for use if using Integer.compareUnsigned(int, int).
    private static final int
    The value -1 converted for use if using Integer.compareUnsigned(int, int).
    private static final double
    The scale to use when down-scaling during a split into a high part.
    private static final int
    The mask to extract the raw 11-bit exponent.
    private static final double
    The multiplier used to split the double value into high and low parts.
    private static final double
    The upper limit above which a number may overflow during the split into a high part.
    private static final double
    The scale to use when re-scaling during a split into a high part.
  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
    private
    Private constructor.
  • Method Summary

    Modifier and Type
    Method
    Description
    (package private) static double
    highPartUnscaled(double value)
    Implement Dekker's method to split a value into two parts.
    (package private) static boolean
    isNotNormal(double a)
    Checks if the number is not normal.
    (package private) static double
    productLow(double x, double y, double xy)
    Compute the low part of the double length number (z,zz) for the exact product of x and y.
    private static double
    productLow(double hx, double lx, double hy, double ly, double xy)
    Compute the low part of the double length number (z,zz) for the exact product of x and y using Dekker's mult12 algorithm.
    private static double
    productLowUnscaled(double x, double y, double xy)
    Compute the low part of the double length number (z,zz) for the exact product of x and y using Dekker's mult12 algorithm.
    (package private) static double
    squareLowUnscaled(double x, double xx)
    Compute the low part of the double length number (z,zz) for the exact square of x using Dekker's mult12 algorithm.
    (package private) static double
    twoSumLow(double a, double b, double sum)
    Compute the round-off from the sum of two numbers a and b using Knuth's two-sum algorithm.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • MULTIPLIER

      private static final double MULTIPLIER
      The multiplier used to split the double value into high and low parts. From Dekker (1971): "The constant should be chosen equal to 2^(p - p/2) + 1, where p is the number of binary digits in the mantissa". Here p is 53 and the multiplier is 2^27 + 1.
      See Also:
    • SAFE_UPPER

      private static final double SAFE_UPPER
      The upper limit above which a number may overflow during the split into a high part. Assuming the multiplier is above 2^27 and the maximum exponent is 1023 then a safe limit is a value with an exponent of (1023 - 27) = 2^996. 996 is the value obtained from Math.getExponent(Double.MAX_VALUE / MULTIPLIER).
      See Also:
    • DOWN_SCALE

      private static final double DOWN_SCALE
      The scale to use when down-scaling during a split into a high part. This must be smaller than the inverse of the multiplier and a power of 2 for exact scaling.
      See Also:
    • UP_SCALE

      private static final double UP_SCALE
      The scale to use when re-scaling during a split into a high part. This is the inverse of DOWN_SCALE.
      See Also:
    • EXP_MASK

      private static final int EXP_MASK
      The mask to extract the raw 11-bit exponent. The value must be shifted 52-bits to remove the mantissa bits.
      See Also:
    • CMP_UNSIGNED_2046

      private static final int CMP_UNSIGNED_2046
      The value 2046 converted for use if using Integer.compareUnsigned(int, int). This requires adding Integer.MIN_VALUE to 2046.
      See Also:
    • CMP_UNSIGNED_MINUS_1

      private static final int CMP_UNSIGNED_MINUS_1
      The value -1 converted for use if using Integer.compareUnsigned(int, int). This requires adding Integer.MIN_VALUE to -1.
      See Also:
  • Constructor Details

    • ExtendedPrecision

      private ExtendedPrecision()
      Private constructor.
  • Method Details

    • productLow

      static double productLow(double x, double y, double xy)
      Compute the low part of the double length number (z,zz) for the exact product of x and y. This is equivalent to computing a double containing the magnitude of the rounding error when converting the exact 106-bit significand of the multiplication result to a 53-bit significand.

      The method is written to be functionally similar to using a fused multiply add (FMA) operation to compute the low part, for example JDK 9's Math.fma function (note the sign change in the input argument for the product):

        double x = ...;
        double y = ...;
        double xy = x * y;
        double low1 = Math.fma(x, y, -xy);
        double low2 = productLow(x, y, xy);
       

      Special cases:

      • If x * y is sub-normal or zero then the result is 0.0.
      • If x * y is infinite or NaN then the result is NaN.
      Parameters:
      x - First factor.
      y - Second factor.
      xy - Product of the factors (x * y).
      Returns:
      the low part of the product double length number
      See Also:
    • isNotNormal

      static boolean isNotNormal(double a)
      Checks if the number is not normal. This is functionally equivalent to:
       final double abs = Math.abs(a);
       return (abs invalid input: '<'= Double.MIN_NORMAL || !(abs invalid input: '<'= Double.MAX_VALUE));
       
      Parameters:
      a - The value.
      Returns:
      true if the value is not normal
    • productLowUnscaled

      private static double productLowUnscaled(double x, double y, double xy)
      Compute the low part of the double length number (z,zz) for the exact product of x and y using Dekker's mult12 algorithm. The standard precision product x*y must be provided. The numbers x and y are split into high and low parts using Dekker's algorithm.

      Warning: This method does not perform scaling in Dekker's split and large finite numbers can create NaN results.

      Parameters:
      x - First factor.
      y - Second factor.
      xy - Product of the factors (x * y).
      Returns:
      the low part of the product double length number
      See Also:
    • squareLowUnscaled

      static double squareLowUnscaled(double x, double xx)
      Compute the low part of the double length number (z,zz) for the exact square of x using Dekker's mult12 algorithm. The standard precision product x*x must be provided. The number x is split into high and low parts using Dekker's algorithm.

      Warning: This method does not perform scaling in Dekker's split and large finite numbers can create NaN results.

      Parameters:
      x - Number to square
      xx - Standard precision product x*x
      Returns:
      the low part of the square double length number
    • productLow

      private static double productLow(double hx, double lx, double hy, double ly, double xy)
      Compute the low part of the double length number (z,zz) for the exact product of x and y using Dekker's mult12 algorithm. The standard precision product x*y must be provided. The numbers x and y should already be split into low and high parts.

      Note: This uses the high part of the result (z,zz) as x * y and not hx * hy + hx * ty + tx * hy as specified in Dekker's original paper. See Shewchuk (1997) for working examples.

      Parameters:
      hx - High part of first factor.
      lx - Low part of first factor.
      hy - High part of second factor.
      ly - Low part of second factor.
      xy - Product of the factors.
      Returns:
      lx * ly - (((xy - hx * hy) - lx * hy) - hx * ly)
      See Also:
    • highPartUnscaled

      static double highPartUnscaled(double value)
      Implement Dekker's method to split a value into two parts. Multiplying by (2^s + 1) creates a big value from which to derive the two split parts.
       c = (2^s + 1) * a
       a_big = c - a
       a_hi = c - a_big
       a_lo = a - a_hi
       a = a_hi + a_lo
       

      The multiplicand allows a p-bit value to be split into (p-s)-bit value a_hi and a non-overlapping (s-1)-bit value a_lo. Combined they have (p-1) bits of significand but the sign bit of a_lo contains a bit of information. The constant is chosen so that s is ceil(p/2) where the precision p for a double is 53-bits (1-bit of the mantissa is assumed to be 1 for a non sub-normal number) and s is 27.

      This conversion does not use scaling and the result of overflow is NaN. Overflow may occur when the exponent of the input value is above 996.

      Splitting a NaN or infinite value will return NaN.

      Parameters:
      value - Value.
      Returns:
      the high part of the value.
      See Also:
    • twoSumLow

      static double twoSumLow(double a, double b, double sum)
      Compute the round-off from the sum of two numbers a and b using Knuth's two-sum algorithm. The values are not required to be ordered by magnitude. The standard precision sum must be provided.
      Parameters:
      a - First part of sum.
      b - Second part of sum.
      sum - Sum of the parts (a + b).
      Returns:
      (b - (sum - (sum - b))) + (a - (sum - b))
      See Also: