Class ExtendedPrecision

    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static int CMP_UNSIGNED_2046
      The value 2046 converted for use if using Integer.compareUnsigned(int, int).
      private static int CMP_UNSIGNED_MINUS_1
      The value -1 converted for use if using Integer.compareUnsigned(int, int).
      private static double DOWN_SCALE
      The scale to use when down-scaling during a split into a high part.
      private static int EXP_MASK
      The mask to extract the raw 11-bit exponent.
      private static double MULTIPLIER
      The multiplier used to split the double value into high and low parts.
      private static double SAFE_UPPER
      The upper limit above which a number may overflow during the split into a high part.
      private static double UP_SCALE
      The scale to use when re-scaling during a split into a high part.
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      private ExtendedPrecision()
      Private constructor.
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      (package private) static double highPartUnscaled​(double value)
      Implement Dekker's method to split a value into two parts.
      (package private) static boolean isNotNormal​(double a)
      Checks if the number is not normal.
      (package private) static double productLow​(double x, double y, double xy)
      Compute the low part of the double length number (z,zz) for the exact product of x and y.
      private static double productLow​(double hx, double lx, double hy, double ly, double xy)
      Compute the low part of the double length number (z,zz) for the exact product of x and y using Dekker's mult12 algorithm.
      private static double productLowUnscaled​(double x, double y, double xy)
      Compute the low part of the double length number (z,zz) for the exact product of x and y using Dekker's mult12 algorithm.
      (package private) static double squareLowUnscaled​(double x, double xx)
      Compute the low part of the double length number (z,zz) for the exact square of x using Dekker's mult12 algorithm.
      (package private) static double twoSumLow​(double a, double b, double sum)
      Compute the round-off from the sum of two numbers a and b using Knuth's two-sum algorithm.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • MULTIPLIER

        private static final double MULTIPLIER
        The multiplier used to split the double value into high and low parts. From Dekker (1971): "The constant should be chosen equal to 2^(p - p/2) + 1, where p is the number of binary digits in the mantissa". Here p is 53 and the multiplier is 2^27 + 1.
        See Also:
        Constant Field Values
      • SAFE_UPPER

        private static final double SAFE_UPPER
        The upper limit above which a number may overflow during the split into a high part. Assuming the multiplier is above 2^27 and the maximum exponent is 1023 then a safe limit is a value with an exponent of (1023 - 27) = 2^996. 996 is the value obtained from Math.getExponent(Double.MAX_VALUE / MULTIPLIER).
        See Also:
        Constant Field Values
      • DOWN_SCALE

        private static final double DOWN_SCALE
        The scale to use when down-scaling during a split into a high part. This must be smaller than the inverse of the multiplier and a power of 2 for exact scaling.
        See Also:
        Constant Field Values
      • UP_SCALE

        private static final double UP_SCALE
        The scale to use when re-scaling during a split into a high part. This is the inverse of DOWN_SCALE.
        See Also:
        Constant Field Values
      • EXP_MASK

        private static final int EXP_MASK
        The mask to extract the raw 11-bit exponent. The value must be shifted 52-bits to remove the mantissa bits.
        See Also:
        Constant Field Values
      • CMP_UNSIGNED_2046

        private static final int CMP_UNSIGNED_2046
        The value 2046 converted for use if using Integer.compareUnsigned(int, int). This requires adding Integer.MIN_VALUE to 2046.
        See Also:
        Constant Field Values
      • CMP_UNSIGNED_MINUS_1

        private static final int CMP_UNSIGNED_MINUS_1
        The value -1 converted for use if using Integer.compareUnsigned(int, int). This requires adding Integer.MIN_VALUE to -1.
        See Also:
        Constant Field Values
    • Constructor Detail

      • ExtendedPrecision

        private ExtendedPrecision()
        Private constructor.
    • Method Detail

      • productLow

        static double productLow​(double x,
                                 double y,
                                 double xy)
        Compute the low part of the double length number (z,zz) for the exact product of x and y. This is equivalent to computing a double containing the magnitude of the rounding error when converting the exact 106-bit significand of the multiplication result to a 53-bit significand.

        The method is written to be functionally similar to using a fused multiply add (FMA) operation to compute the low part, for example JDK 9's Math.fma function (note the sign change in the input argument for the product):

          double x = ...;
          double y = ...;
          double xy = x * y;
          double low1 = Math.fma(x, y, -xy);
          double low2 = productLow(x, y, xy);
         

        Special cases:

        • If x * y is sub-normal or zero then the result is 0.0.
        • If x * y is infinite or NaN then the result is NaN.
        Parameters:
        x - First factor.
        y - Second factor.
        xy - Product of the factors (x * y).
        Returns:
        the low part of the product double length number
        See Also:
        highPartUnscaled(double)
      • isNotNormal

        static boolean isNotNormal​(double a)
        Checks if the number is not normal. This is functionally equivalent to:
         final double abs = Math.abs(a);
         return (abs <= Double.MIN_NORMAL || !(abs <= Double.MAX_VALUE));
         
        Parameters:
        a - The value.
        Returns:
        true if the value is not normal
      • productLowUnscaled

        private static double productLowUnscaled​(double x,
                                                 double y,
                                                 double xy)
        Compute the low part of the double length number (z,zz) for the exact product of x and y using Dekker's mult12 algorithm. The standard precision product x*y must be provided. The numbers x and y are split into high and low parts using Dekker's algorithm.

        Warning: This method does not perform scaling in Dekker's split and large finite numbers can create NaN results.

        Parameters:
        x - First factor.
        y - Second factor.
        xy - Product of the factors (x * y).
        Returns:
        the low part of the product double length number
        See Also:
        highPartUnscaled(double), productLow(double, double, double, double, double)
      • squareLowUnscaled

        static double squareLowUnscaled​(double x,
                                        double xx)
        Compute the low part of the double length number (z,zz) for the exact square of x using Dekker's mult12 algorithm. The standard precision product x*x must be provided. The number x is split into high and low parts using Dekker's algorithm.

        Warning: This method does not perform scaling in Dekker's split and large finite numbers can create NaN results.

        Parameters:
        x - Number to square
        xx - Standard precision product x*x
        Returns:
        the low part of the square double length number
      • productLow

        private static double productLow​(double hx,
                                         double lx,
                                         double hy,
                                         double ly,
                                         double xy)
        Compute the low part of the double length number (z,zz) for the exact product of x and y using Dekker's mult12 algorithm. The standard precision product x*y must be provided. The numbers x and y should already be split into low and high parts.

        Note: This uses the high part of the result (z,zz) as x * y and not hx * hy + hx * ty + tx * hy as specified in Dekker's original paper. See Shewchuk (1997) for working examples.

        Parameters:
        hx - High part of first factor.
        lx - Low part of first factor.
        hy - High part of second factor.
        ly - Low part of second factor.
        xy - Product of the factors.
        Returns:
        lx * ly - (((xy - hx * hy) - lx * hy) - hx * ly)
        See Also:
        Shewchuk (1997) Theorum 18
      • highPartUnscaled

        static double highPartUnscaled​(double value)
        Implement Dekker's method to split a value into two parts. Multiplying by (2^s + 1) creates a big value from which to derive the two split parts.
         c = (2^s + 1) * a
         a_big = c - a
         a_hi = c - a_big
         a_lo = a - a_hi
         a = a_hi + a_lo
         

        The multiplicand allows a p-bit value to be split into (p-s)-bit value a_hi and a non-overlapping (s-1)-bit value a_lo. Combined they have (p-1) bits of significand but the sign bit of a_lo contains a bit of information. The constant is chosen so that s is ceil(p/2) where the precision p for a double is 53-bits (1-bit of the mantissa is assumed to be 1 for a non sub-normal number) and s is 27.

        This conversion does not use scaling and the result of overflow is NaN. Overflow may occur when the exponent of the input value is above 996.

        Splitting a NaN or infinite value will return NaN.

        Parameters:
        value - Value.
        Returns:
        the high part of the value.
        See Also:
        Math.getExponent(double)
      • twoSumLow

        static double twoSumLow​(double a,
                                double b,
                                double sum)
        Compute the round-off from the sum of two numbers a and b using Knuth's two-sum algorithm. The values are not required to be ordered by magnitude. The standard precision sum must be provided.
        Parameters:
        a - First part of sum.
        b - Second part of sum.
        sum - Sum of the parts (a + b).
        Returns:
        (b - (sum - (sum - b))) + (a - (sum - b))
        See Also:
        Shewchuk (1997) Theorum 7