Package org.apfloat.internal
The org.apfloat.internal
package contains four different
implementations of the apfloat SPI, each based on a different primitive
element type:
- LongBuilderFactory, based on element type long: This is the default implementation used by apfloat. It uses the 64-bit long integer as the elementary type for all data storage and manipulation. It is usually faster than the int version on 64-bit JVMs, which is mostly the case today. In some places it also uses double arithmetic, so for decent performance the processor should be able to perform double-precision floating-point operations as well as convert between double and long. For example, on x86-64 and SPARC the 64-bit long version is faster than the 32-bit int version. You can use the long implementation on 32-bit platforms too; however, the performance per element is less than half of the int version, even if roughly twice as much data is processed per element. The upside is that this implementation can do much bigger calculations: up to about 3.5 * 10^15 digits in radix 10.
- IntBuilderFactory, based on element type int: It works well for 32-bit platforms that perform integer operations fast (including integer multiplication), and can multiply doubles and convert between double and int with adequate performance. This applies to most workstations today (Intel x86 processors and compatibles, in particular processors with SSE2 support, and most RISC architectures). You can do calculations up to roughly 226 million digits (in radix 10) with this implementation, which should be enough for most purposes.
- DoubleBuilderFactory, based on element type double: This implementation exists mostly as a curiosity. It will typically perform worse than the long version, and it's only able to do calculations with about 1/20 of its maximum digit length. The only situation where using the double version might make sense is on a platform that performs floating-point arithmetic well but performs integer arithmetic extremely badly. Finding such a platform today might be difficult, so generally it's advisable to use the long version instead, if you have a 64-bit platform or need the most extreme precision.
- FloatBuilderFactory, based on element type float: This version is also only a curiosity. The main downside is that it can only perform calculations up to about 1.3 million radix-10 digits. The per-digit performance is also typically less than that of the int version. Unless you have a computer that performs floating-point arithmetic extraordinarily well compared to integer arithmetic, it's always advisable to use the long or int version instead.
Type | Pentium 4 | Athlon XP | Athlon 64 (32-bit) | Athlon 64 (64-bit) | UltraSPARC II |
---|---|---|---|---|---|
Int | 100% | 100% | 100% | 100% | 100% |
Long | 40% | 76% | 59% | 95% | 132% |
Double | 45% | 63% | 59% | 94% | 120% |
Float | 40% | 43% | 46% | 42% | 82% |
(Test was done with apfloat 1.1 using Sun's Java 5.0 server VM calculating π to one million digits with no disk storage.)
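The implementation is selected at run time through the ApfloatContext. The following sketch is illustrative only (in practice the builder factory is normally set via the apfloat configuration properties); it switches from the default LongBuilderFactory to the IntBuilderFactory before any numbers are created:

```java
import org.apfloat.Apfloat;
import org.apfloat.ApfloatContext;
import org.apfloat.ApfloatMath;
import org.apfloat.internal.IntBuilderFactory;

// Sketch: switch the whole library from the default long-based implementation
// to the int-based one. Do this up front, before creating any numbers, so
// that different implementations are not mixed in one calculation.
public class SelectImplementation {
    public static void main(String[] args) {
        ApfloatContext ctx = ApfloatContext.getContext();
        ctx.setBuilderFactory(new IntBuilderFactory());

        Apfloat x = new Apfloat(2, 1000);           // 2 with 1000 digits of precision
        System.out.println(ApfloatMath.sqrt(x));    // computed with the int implementation
    }
}
```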
Compared to the java.math.BigInteger
class with different digit
sizes, the apfloat relative performance with the same CPUs is as follows:
(Test was done with apfloat 1.1 using Sun's Java 5.0 server VM calculating 3^n and converting the result to decimal.)
This benchmark suggests that for small numbers – less than roughly 200 decimal digits in size – the BigInteger / BigDecimal classes are probably faster, even by an order of magnitude. Using apfloats is only beneficial for numbers that have at least a couple hundred digits, or of course if some mathematical functions are needed that are not available for BigIntegers or BigDecimals. The results can be easily explained by the smaller overhead that BigIntegers have due to their simpler implementation. When the size of the mantissa grows, the O(n log n) complexity of apfloat's FFT-based multiplication makes apfloat considerably faster than the steady O(n^2) implementation of the BigInteger class. For numbers with millions of digits, multiplication using BigIntegers would simply be infeasible, whereas for apfloat it is not a problem at all.
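As a rough way to find the crossover point on a particular machine, a micro-benchmark along the following lines can be used. This is an illustrative sketch only (single-shot timings, no JIT warm-up), not part of the apfloat test suite; Apint and BigInteger are the public arbitrary-precision integer types of apfloat and the JDK respectively:

```java
import java.math.BigInteger;
import java.util.Arrays;
import org.apfloat.Apint;

// Rough sketch: compare squaring times of BigInteger and Apint at a few sizes.
public class CrossoverSketch {
    public static void main(String[] args) {
        for (int digits : new int[] { 100, 1_000, 100_000 }) {
            char[] nines = new char[digits];
            Arrays.fill(nines, '9');
            String s = new String(nines);

            BigInteger b = new BigInteger(s);
            Apint a = new Apint(s);                    // arbitrary-precision integer, radix 10

            long t0 = System.nanoTime();
            b.multiply(b);
            long bigIntegerTime = System.nanoTime() - t0;

            t0 = System.nanoTime();
            a.multiply(a);
            long apfloatTime = System.nanoTime() - t0;

            System.out.printf("%7d digits: BigInteger %8d us, Apint %8d us%n",
                              digits, bigIntegerTime / 1_000, apfloatTime / 1_000);
        }
    }
}
```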
All of the above apfloat implementations have the following features (some of the links
point to the int
version, but all four versions have similar classes):
- Depending on the size, numbers can be stored in memory (IntMemoryDataStorage) or on disk (IntDiskDataStorage).
- Multiplication can be done in an optimized way if one multiplicand has size 1 (IntShortConvolutionStrategy); using a simple O(n^2) long multiplication algorithm for small numbers, with low overhead (IntMediumConvolutionStrategy); using the Karatsuba multiplication algorithm for slightly larger numbers, with some more overhead (IntKaratsubaConvolutionStrategy); or using a Number Theoretic Transform (NTT) done with three different moduli, with the final result calculated using the Chinese Remainder Theorem (ThreeNTTConvolutionStrategy), for big numbers.
- Different NTT algorithms are used for different transform lengths: a basic fast NTT (IntTableFNTStrategy) when the entire transform fits in the processor cache, a "six-step" NTT when the transform fits in main memory (SixStepFNTStrategy), and a disk-based "two-pass" NTT strategy when the whole transform doesn't fit in the available memory (TwoPassFNTStrategy).
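To illustrate the idea behind the Karatsuba-based strategy mentioned above, the following self-contained sketch shows the three-multiplication recursion on BigInteger operands. It is not apfloat's IntKaratsubaConvolutionStrategy, which operates on arrays of primitive elements, but it demonstrates why the algorithm beats schoolbook O(n^2) multiplication for moderately large numbers:

```java
import java.math.BigInteger;

// Educational sketch of Karatsuba multiplication for non-negative operands;
// apfloat's real strategy works on raw element arrays, not on BigInteger.
public class KaratsubaSketch {
    private static final int CUTOFF = 64; // below this bit length, use plain multiply

    public static BigInteger karatsuba(BigInteger x, BigInteger y) {
        int n = Math.max(x.bitLength(), y.bitLength());
        if (n <= CUTOFF) {
            return x.multiply(y);
        }
        int half = n / 2;
        BigInteger xHigh = x.shiftRight(half),
                   xLow  = x.subtract(xHigh.shiftLeft(half)),
                   yHigh = y.shiftRight(half),
                   yLow  = y.subtract(yHigh.shiftLeft(half));

        BigInteger p1 = karatsuba(xHigh, yHigh);                     // high parts
        BigInteger p2 = karatsuba(xLow, yLow);                       // low parts
        BigInteger p3 = karatsuba(xHigh.add(xLow), yHigh.add(yLow)); // cross terms

        // x * y = p1 * 2^(2*half) + (p3 - p1 - p2) * 2^half + p2
        return p1.shiftLeft(2 * half)
                 .add(p3.subtract(p1).subtract(p2).shiftLeft(half))
                 .add(p2);
    }
}
```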
Unexpected implementation-specific errors are reported by throwing an ApfloatInternalException. This exception, or one of its subclasses, can be thrown in different situations, for example:
- Backing storage failure. For example, if a number is stored on disk, an IOException can be thrown by any of the disk operations, e.g. if a file can't be created, or can't be written to because the disk is full.
- The operands of an operation have different radixes. This is a limitation allowed by the specification.
- Some other internal limitation is exceeded, e.g. the maximum transform length that is mathematically possible for the implementation.
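For example, mixing radixes in a single operation is expected to fail as described above. A hedged sketch (the exact subclass thrown, such as the RadixMismatchException listed in the exception summary below, is an implementation detail):

```java
import org.apfloat.Apfloat;
import org.apfloat.internal.ApfloatInternalException;

// Sketch: operands with different radixes are expected to cause an
// ApfloatInternalException (or a subclass of it) to be thrown.
public class RadixMismatchSketch {
    public static void main(String[] args) {
        Apfloat decimal = new Apfloat("1.5", 50);    // default radix 10
        Apfloat binary  = new Apfloat("1.1", 50, 2); // radix 2
        try {
            System.out.println(decimal.add(binary));
        } catch (ApfloatInternalException e) {
            System.err.println("Radix mismatch: " + e.getMessage());
        }
    }
}
```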
Temporary files used for disk storage have the suffix *.ap and are by default created in the current working directory. When the corresponding objects are garbage collected, the temporary files are deleted. However, garbage collection may not work perfectly at all times, and in general there are no guarantees that it will happen at all. So, depending on the program being executed, it may be beneficial to explicitly call System.gc() at some point to ensure that unused temporary files are deleted. However, VM vendors generally warn against doing this too often, since it may seriously degrade performance, so figuring out how to call it optimally may be difficult. If the file deletion fails for some reason, some temporary files may be left on disk after the program exits. These files can be safely removed after the program has terminated.
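If the current working directory is not a suitable place for potentially large temporary files, the directory can be changed through the ApfloatContext. The sketch below assumes the FILE_PATH property key documented by ApfloatContext; check the ApfloatContext documentation for the authoritative property names:

```java
import org.apfloat.ApfloatContext;

// Sketch, assuming the FILE_PATH property key documented by ApfloatContext:
// direct temporary *.ap files to a dedicated directory instead of the
// current working directory, and hint the VM to collect unreferenced
// apfloats so that their backing files get deleted.
public class TempFileHousekeeping {
    public static void main(String[] args) {
        ApfloatContext ctx = ApfloatContext.getContext();
        ctx.setProperty(ApfloatContext.FILE_PATH, "/var/tmp/apfloat/");

        // ... run large, disk-backed calculations here ...

        // Optional and to be used sparingly; see the caveats above.
        System.gc();
    }
}
```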
Many parts of the program are parallelized, i.e. are processed with multiple threads in parallel. Parallelization is done where it has been easy to implement and where it is efficient. E.g. the "six-step" NTT is parallelized, because the data is in matrix form in memory and it's easy and highly efficient to process the rows of the matrix in parallel. Other places where parallelization is implemented are the in-place multiplication of transform results and the carry-CRT operation. However, in both of these algorithms the processing is parallelized only if the data is in memory - if the data was stored on disk, the irregular disk seeking could make the parallel algorithm highly inefficient.

Many sections of the code are not parallelized, where it's obvious that parallelization would not bring any benefits. Examples of such cases are addition, subtraction and matrix transposition. While parallel algorithms for these operations could certainly be implemented, they would not bring any performance improvement. The bottleneck in these operations is memory or I/O bandwidth, not CPU processing time. The CPU processing in addition and subtraction is trivial; in matrix transposition it's outright nonexistent - the algorithm only moves data from one place to another. Even if all the data was stored in memory, the memory bandwidth would be the bottleneck. E.g. in addition, the algorithm only needs a few CPU cycles per element processed, but moving the data from main memory to CPU registers and back to main memory likely needs significantly more CPU cycles than the addition operation itself. Parallelization would therefore not improve efficiency at all - the total CPU load might appear to increase, but measured in wall-clock time the execution would not be any faster.
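The number of threads available to the parallelized steps is likewise taken from the ApfloatContext rather than being hard-coded in the strategies. A minimal sketch, assuming the standard ApfloatContext accessors:

```java
import org.apfloat.ApfloatContext;

// Sketch: limit (or raise) the number of threads that parallelized steps,
// such as the "six-step" NTT rows or the carry-CRT, may use.
public class ParallelismConfig {
    public static void main(String[] args) {
        ApfloatContext ctx = ApfloatContext.getContext();
        System.out.println("Default threads: " + ctx.getNumberOfProcessors());
        ctx.setNumberOfProcessors(Runtime.getRuntime().availableProcessors());
    }
}
```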
Since the core functionality of the apfloat implementation is based on the original C++ version of apfloat, no significant new algorithms have been added (although the architecture has otherwise been greatly beautified, e.g. by separating the different implementations behind an SPI, and applying all kinds of patterns everywhere). Thus, there are no alternative implementations that e.g. use a floating-point FFT instead of an NTT, as the SPI (org.apfloat.spi) might suggest. However, the default implementation does implement all the patterns suggested by the SPI – in fact the SPI was designed for the default implementation.
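To make the SPI relationship concrete, the following sketch asks the current ApfloatContext for its BuilderFactory and then for the individual builders defined in org.apfloat.spi; with the default configuration these are the Long* classes listed further below. The method names follow the org.apfloat.spi interfaces, but the snippet is only an illustration:

```java
import org.apfloat.ApfloatContext;
import org.apfloat.spi.BuilderFactory;

// Sketch: inspect which SPI implementation is currently active.
public class SpiInspection {
    public static void main(String[] args) {
        BuilderFactory factory = ApfloatContext.getContext().getBuilderFactory();
        System.out.println("BuilderFactory:     " + factory.getClass().getName());
        System.out.println("ApfloatBuilder:     " + factory.getApfloatBuilder().getClass().getName());
        System.out.println("NTTBuilder:         " + factory.getNTTBuilder().getClass().getName());
        System.out.println("ConvolutionBuilder: " + factory.getConvolutionBuilder().getClass().getName());
    }
}
```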
The class diagram for an example apfloat that is stored on disk is shown below.
Note that all the aggregate classes can be shared by multiple objects that point
to the same instance. For example, multiple Apfloats can point to the same
ApfloatImpl, multiple ApfloatImpls can point to the same DataStorage etc. This
sharing happens in various situations, e.g. by calling floor(), multiplying by one, etc.:
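Externally this sharing is invisible, because Apfloat objects are immutable and operations simply return new instances that may reuse the same underlying ApfloatImpl or DataStorage. A small illustration (the sharing itself cannot be observed through the public API, so this only shows the value-level behavior):

```java
import org.apfloat.Apfloat;

// Sketch: operations like multiplying by one, or taking floor() of a value
// that is already an integer, return new Apfloat objects that may internally
// share the same ApfloatImpl / DataStorage as the original. This is safe
// because the objects are immutable.
public class SharingSketch {
    public static void main(String[] args) {
        Apfloat x = new Apfloat("123456789", 1000);
        Apfloat y = x.multiply(Apfloat.ONE);   // candidate for sharing x's internals
        Apfloat z = x.floor();                 // x is already an integer value

        System.out.println(x.equals(y));       // true
        System.out.println(x.equals(z));       // true
    }
}
```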
The sequence diagram for creating a new apfloat that is stored on disk is as follows. Note that the FileStorage class is a private inner class of the DiskDataStorage class:
The sequence diagram for multiplying two apfloats is as follows. In this case an NTT-based convolution is used, and the resulting apfloat is stored in memory:
Most of the files in the apfloat implementations are generated from templates where a template tag is replaced by int/long/float/double or Int/Long/Float/Double. Also the byte size of the element type is templatized and replaced by 4/8/4/8. The only files that are individually implemented for each element type are:
- BaseMath.java
- CRTMath.java
- ElementaryModMath.java
- ModConstants.java
- See Also:
org.apfloat.spi
Interface Summary

Interface | Description
---|---
DoubleConstants | Constants needed for various algorithms for the double type.
DoubleModConstants | Constants needed for various modular arithmetic operations for the double type.
DoubleRadixConstants | Constants related to different radixes for the double data type.
FloatConstants | Constants needed for various algorithms for the float type.
FloatModConstants | Constants needed for various modular arithmetic operations for the float type.
FloatRadixConstants | Constants related to different radixes for the float data type.
IntConstants | Constants needed for various algorithms for the int type.
IntModConstants | Constants needed for various modular arithmetic operations for the int type.
IntRadixConstants | Constants related to different radixes for the int data type.
LongConstants | Constants needed for various algorithms for the long type.
LongModConstants | Constants needed for various modular arithmetic operations for the long type.
LongRadixConstants | Constants related to different radixes for the long data type.
Parallelizable | Any task that can use a ParallelRunner to execute operations in parallel.
Class Summary

Class | Description
---|---
AbstractConvolutionBuilder | Abstract base class for creating convolutions of suitable type for the specified length.
AbstractDataStorageBuilder | Abstract base class for a data storage creation strategy.
AbstractNTTBuilder | Abstract base class for creating Number Theoretic Transforms suitable for the specified length, based on available memory configured in the ApfloatContext.
AbstractStepFNTStrategy | Abstract superclass for step-based FNT strategies.
ConcurrentSoftHashMap<K,V> | ConcurrentHashMap with softly referenced values.
DiskDataStorage | Abstract base class for disk-based data storage, containing the common functionality independent of the element type.
DiskDataStorage.FileStorage |
DiskDataStorage.FileStorageReference |
DoubleAdditionBuilder | Creates additions for the specified radix and the double element type.
DoubleAdditionStrategy | Basic addition strategy for the double element type.
DoubleApfloatBuilder | Builder class for building ApfloatImpl implementations with the double data element type.
DoubleApfloatImpl | Immutable apfloat implementation class for the double data element type.
DoubleBaseMath | Mathematical operations on numbers in a base.
DoubleBuilderFactory | Factory class for getting instances of the various builder classes needed to build an ApfloatImpl with the double data element type.
DoubleCarryCRTBuilder | Creates carry-CRT related objects, for the double type.
DoubleCarryCRTStepStrategy | Class for performing the final steps of a three-modulus Number Theoretic Transform based convolution.
DoubleConvolutionBuilder | Creates convolutions of suitable type for the double type.
DoubleCRTMath | Basic arithmetic for calculating the Chinese Remainder Theorem.
DoubleDataStorageBuilder | Default data storage creation strategy for the double data type.
DoubleDiskDataStorage | Disk-based data storage for the double element type.
DoubleElementaryModMath | Elementary modulo arithmetic functions for double data.
DoubleFactor3NTTStepStrategy | Steps for the factor-3 NTT.
DoubleKaratsubaConvolutionStrategy | Convolution strategy using the Karatsuba algorithm.
DoubleMatrixBuilder | Creates matrix operations objects, for the double type.
DoubleMatrixStrategy | Optimized matrix transposition methods for the double type.
DoubleMediumConvolutionStrategy | Medium-length convolution strategy.
DoubleMemoryArrayAccess | Array access class based on a double[].
DoubleMemoryDataStorage | Memory based data storage implementation for the double element type.
DoubleModMath | Modulo arithmetic functions for double data.
DoubleNTTBuilder | Creates Number Theoretic Transforms for the double type.
DoubleNTTConvolutionStepStrategy | Steps of a three-NTT convolution for the double type.
DoubleNTTStepStrategy | Common methods to calculate Fast Number Theoretic Transforms in parallel using multiple threads.
DoubleScramble | Functions to perform bit-reverse ordering of double data.
DoubleShortConvolutionStrategy | Short convolution strategy.
DoubleTableFNT | Fast Number Theoretic Transform that uses lookup tables for powers of n:th root of unity and permutation indexes.
DoubleTableFNTStrategy | Fast Number Theoretic Transform strategy that uses lookup tables for powers of n:th root of unity and permutation indexes.
DoubleWTables | Helper class for generating and caching tables of powers of the n:th root of unity.
Factor3NTTStrategy | A transform that implements a 3-point transform on top of another Number Theoretic Transform that does transforms of length 2^n.
FloatAdditionBuilder | Creates additions for the specified radix and the float element type.
FloatAdditionStrategy | Basic addition strategy for the float element type.
FloatApfloatBuilder | Builder class for building ApfloatImpl implementations with the float data element type.
FloatApfloatImpl | Immutable apfloat implementation class for the float data element type.
FloatBaseMath | Mathematical operations on numbers in a base.
FloatBuilderFactory | Factory class for getting instances of the various builder classes needed to build an ApfloatImpl with the float data element type.
FloatCarryCRTBuilder | Creates carry-CRT related objects, for the float type.
FloatCarryCRTStepStrategy | Class for performing the final steps of a three-modulus Number Theoretic Transform based convolution.
FloatConvolutionBuilder | Creates convolutions of suitable type for the float type.
FloatCRTMath | Basic arithmetic for calculating the Chinese Remainder Theorem.
FloatDataStorageBuilder | Default data storage creation strategy for the float data type.
FloatDiskDataStorage | Disk-based data storage for the float element type.
FloatElementaryModMath | Elementary modulo arithmetic functions for float data.
FloatFactor3NTTStepStrategy | Steps for the factor-3 NTT.
FloatKaratsubaConvolutionStrategy | Convolution strategy using the Karatsuba algorithm.
FloatMatrixBuilder | Creates matrix operations objects, for the float type.
FloatMatrixStrategy | Optimized matrix transposition methods for the float type.
FloatMediumConvolutionStrategy | Medium-length convolution strategy.
FloatMemoryArrayAccess | Array access class based on a float[].
FloatMemoryDataStorage | Memory based data storage implementation for the float element type.
FloatModMath | Modulo arithmetic functions for float data.
FloatNTTBuilder | Creates Number Theoretic Transforms for the float type.
FloatNTTConvolutionStepStrategy | Steps of a three-NTT convolution for the float type.
FloatNTTStepStrategy | Common methods to calculate Fast Number Theoretic Transforms in parallel using multiple threads.
FloatScramble | Functions to perform bit-reverse ordering of float data.
FloatShortConvolutionStrategy | Short convolution strategy.
FloatTableFNT | Fast Number Theoretic Transform that uses lookup tables for powers of n:th root of unity and permutation indexes.
FloatTableFNTStrategy | Fast Number Theoretic Transform strategy that uses lookup tables for powers of n:th root of unity and permutation indexes.
FloatWTables | Helper class for generating and caching tables of powers of the n:th root of unity.
IntAdditionBuilder | Creates additions for the specified radix and the int element type.
IntAdditionStrategy | Basic addition strategy for the int element type.
IntApfloatBuilder | Builder class for building ApfloatImpl implementations with the int data element type.
IntApfloatImpl | Immutable apfloat implementation class for the int data element type.
IntBaseMath | Mathematical operations on numbers in a base.
IntBuilderFactory | Factory class for getting instances of the various builder classes needed to build an ApfloatImpl with the int data element type.
IntCarryCRTBuilder | Creates carry-CRT related objects, for the int type.
IntCarryCRTStepStrategy | Class for performing the final steps of a three-modulus Number Theoretic Transform based convolution.
IntConvolutionBuilder | Creates convolutions of suitable type for the int type.
IntCRTMath | Basic arithmetic for calculating the Chinese Remainder Theorem.
IntDataStorageBuilder | Default data storage creation strategy for the int data type.
IntDiskDataStorage | Disk-based data storage for the int element type.
IntElementaryModMath | Elementary modulo arithmetic functions for int data.
IntFactor3NTTStepStrategy | Steps for the factor-3 NTT.
IntKaratsubaConvolutionStrategy | Convolution strategy using the Karatsuba algorithm.
IntMatrixBuilder | Creates matrix operations objects, for the int type.
IntMatrixStrategy | Optimized matrix transposition methods for the int type.
IntMediumConvolutionStrategy | Medium-length convolution strategy.
IntMemoryArrayAccess | Array access class based on an int[].
IntMemoryDataStorage | Memory based data storage implementation for the int element type.
IntModMath | Modulo arithmetic functions for int data.
IntNTTBuilder | Creates Number Theoretic Transforms for the int type.
IntNTTConvolutionStepStrategy | Steps of a three-NTT convolution for the int type.
IntNTTStepStrategy | Common methods to calculate Fast Number Theoretic Transforms in parallel using multiple threads.
IntScramble | Functions to perform bit-reverse ordering of int data.
IntShortConvolutionStrategy | Short convolution strategy.
IntTableFNT | Fast Number Theoretic Transform that uses lookup tables for powers of n:th root of unity and permutation indexes.
IntTableFNTStrategy | Fast Number Theoretic Transform strategy that uses lookup tables for powers of n:th root of unity and permutation indexes.
IntWTables | Helper class for generating and caching tables of powers of the n:th root of unity.
LongAdditionBuilder | Creates additions for the specified radix and the long element type.
LongAdditionStrategy | Basic addition strategy for the long element type.
LongApfloatBuilder | Builder class for building ApfloatImpl implementations with the long data element type.
LongApfloatImpl | Immutable apfloat implementation class for the long data element type.
LongBaseMath | Mathematical operations on numbers in a base.
LongBuilderFactory | Factory class for getting instances of the various builder classes needed to build an ApfloatImpl with the long data element type.
LongCarryCRTBuilder | Creates carry-CRT related objects, for the long type.
LongCarryCRTStepStrategy | Class for performing the final steps of a three-modulus Number Theoretic Transform based convolution.
LongConvolutionBuilder | Creates convolutions of suitable type for the long type.
LongCRTMath | Basic arithmetic for calculating the Chinese Remainder Theorem.
LongDataStorageBuilder | Default data storage creation strategy for the long data type.
LongDiskDataStorage | Disk-based data storage for the long element type.
LongElementaryModMath | Elementary modulo arithmetic functions for long data.
LongFactor3NTTStepStrategy | Steps for the factor-3 NTT.
LongKaratsubaConvolutionStrategy | Convolution strategy using the Karatsuba algorithm.
LongMatrixBuilder | Creates matrix operations objects, for the long type.
LongMatrixStrategy | Optimized matrix transposition methods for the long type.
LongMediumConvolutionStrategy | Medium-length convolution strategy.
LongMemoryArrayAccess | Array access class based on a long[].
LongMemoryDataStorage | Memory based data storage implementation for the long element type.
LongModMath | Modulo arithmetic functions for long data.
LongNTTBuilder | Creates Number Theoretic Transforms for the long type.
LongNTTConvolutionStepStrategy | Steps of a three-NTT convolution for the long type.
LongNTTStepStrategy | Common methods to calculate Fast Number Theoretic Transforms in parallel using multiple threads.
LongScramble | Functions to perform bit-reverse ordering of long data.
LongShortConvolutionStrategy | Short convolution strategy.
LongTableFNT | Fast Number Theoretic Transform that uses lookup tables for powers of n:th root of unity and permutation indexes.
LongTableFNTStrategy | Fast Number Theoretic Transform strategy that uses lookup tables for powers of n:th root of unity and permutation indexes.
LongWTables | Helper class for generating and caching tables of powers of the n:th root of unity.
MessagePasser<K,V> | Message passing helper class for parallel codes.
ParallelExecutionBuilder | Returns an execution strategy using the ParallelRunner.
ParallelExecutionStrategy | Execution strategy using the ParallelRunner.
ParallelRunnable | Abstract class for a Runnable that can be run in parallel by multiple threads.
ParallelRunner | Class for running ParallelRunnable objects in parallel using multiple threads.
ParallelThreeNTTConvolutionStrategy | Convolution using three Number Theoretic Transforms and the CRT to get the final result, using multiple threads in parallel.
ParallelThreeNTTConvolutionStrategy.LockFuture |
Scramble | Functions to perform bit-reverse ordering of data.
SixStepFNTStrategy | Fast Number Theoretic Transform that uses a "six-step" algorithm to calculate a long transform more efficiently on cache-based memory architectures.
StepCarryCRTStrategy | Class for performing the final step of a three-modulus Number Theoretic Transform based convolution.
ThreeNTTConvolutionStrategy | Convolution using three Number Theoretic Transforms and the Chinese Remainder Theorem to get the final result.
TwoPassFNTStrategy | Fast Number Theoretic Transform that uses a "two-pass" algorithm to calculate a very long transform on data that resides on a mass storage device.
Exception Summary

Exception | Description
---|---
ApfloatInternalException | Exception indicating some unexpected apfloat implementation specific error situation.
BackingStorageException | Exception indicating a backing storage failure.
ImplementationMismatchException | Exception indicating a different implementation of the apfloat SPI being used in two operands of a calculation.
RadixMismatchException | Exception indicating a different radix being used in two operands of a calculation.
TransformLengthExceededException | Exception indicating that the "size" of the numbers used in a multiplication is too large.