Class FloatMatrixStrategy
- java.lang.Object
-
- org.apfloat.internal.FloatMatrixStrategy
-
- All Implemented Interfaces:
MatrixStrategy
public class FloatMatrixStrategy extends java.lang.Object implements MatrixStrategy
Optimized matrix transposition methods for thefloat
type. The matrix transposition algorithm isn't parallelized.While the matrix transposition algorithm could easily be parallelized, on an SMP machine it does not make any sense. If the matrix doesn't fit in any processor specific cache then the memory (or higher level shared cache) bandwidth becomes a bottleneck in the algorithm. Matrix transposition is in principle a very simple algorithm - it doesn't do anything else than move data from one place to another. If shared memory is the bottleneck, then the algorithm isn't any faster if the data is being moved around by one thread or by multiple threads in parallel.
If the data fits in a processor specific cache, then the algorithm could theoretically be made faster with parallelization. To make the parallelization effective however, the data would have to be set up in some kind of a NUMA way. For example, each processor core would hold an equal section of the data in the processor cache. Then the algorithm could be made faster as each processor core could quickly transpose blocks of data that are in the processor cache, and then exchange blocks with other processor cores via the slower higher level shared cache or main memory.
This approach doesn't work well in practice however, at least not in a Java program. The reason is that there are no guarantees where the data is when the algorithm starts (in which processor core caches), and further there are no guarantees of any processor affinity for the threads that are executing in parallel. Different processor cores could be executing the transposition of different sections of the data at any moment, depending on how the operating system (and the JVM) schedule thread execution. And more often than not, the operating system isn't smart enough to apply any such processor affinity for the threads.
An additional problem for any NUMA based attempt is that the data array would have to be aligned on a cache line (e.g. 64 or 128 bytes), to prevent cache contention at the edges of each data section. But a JVM makes no such guarantees about memory alignment. And since pointers do not exist in Java, manually aligning memory addresses isn't possible.
Considering all of the above, the parallel algorithm doesn't in practice work any faster than the single-thread algorithm, as the algorithm is bound by the memory bandwidth (or shared cache bandwidth). In some cases parallelization can even make the execution slower due to increased cache contention.
- Since:
- 1.7.0
- Version:
- 1.7.0
-
-
Constructor Summary
Constructors Constructor Description FloatMatrixStrategy()
Default constructor.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description private static void
moveBlock(float[] source, int sourceOffset, int sourceWidth, float[] dest, int destOffset, int destWidth, int b)
private static void
permuteToDoubleWidth(float[] data, int offset, int n1, int n2)
void
permuteToDoubleWidth(ArrayAccess arrayAccess, int n1, int n2)
Permute the rows of the n1 x n2 matrix so that it is shaped like a n1/2 x 2*n2 matrix.private static void
permuteToHalfWidth(float[] data, int offset, int n1, int n2)
void
permuteToHalfWidth(ArrayAccess arrayAccess, int n1, int n2)
Permute the rows of the n1 x n2 matrix so that it is shaped like a 2*n1 x n2/2 matrix.void
transpose(ArrayAccess arrayAccess, int n1, int n2)
Transpose a n1 x n2 matrix.private static void
transpose2blocks(float[] data, int offset1, int offset2, int width, int b)
private static void
transposeBlock(float[] data, int offset, int width, int b)
private static void
transposeSquare(float[] data, int offset, int n1, int n2)
void
transposeSquare(ArrayAccess arrayAccess, int n1, int n2)
Transpose a square n1 x n1 block of n1 x n2 matrix.
-
-
-
Method Detail
-
transpose
public void transpose(ArrayAccess arrayAccess, int n1, int n2) throws ApfloatRuntimeException
Transpose a n1 x n2 matrix.Both n1 and n2 must be powers of two. Additionally, one of these must be true:
n1 = n2
n1 = 2*n2
n2 = 2*n1- Specified by:
transpose
in interfaceMatrixStrategy
- Parameters:
arrayAccess
- Accessor to the matrix data. This data will be transposed.n1
- Number of rows in the matrix.n2
- Number of columns in the matrix.- Throws:
ApfloatRuntimeException
-
transposeSquare
public void transposeSquare(ArrayAccess arrayAccess, int n1, int n2) throws ApfloatRuntimeException
Transpose a square n1 x n1 block of n1 x n2 matrix.Both n1 and n2 must be powers of two, and n1 <= n2.
- Specified by:
transposeSquare
in interfaceMatrixStrategy
- Parameters:
arrayAccess
- Accessor to the matrix data. This data will be transposed.n1
- Number of rows and columns in the block to be transposed.n2
- Number of columns in the matrix.- Throws:
ApfloatRuntimeException
-
permuteToDoubleWidth
public void permuteToDoubleWidth(ArrayAccess arrayAccess, int n1, int n2) throws ApfloatRuntimeException
Permute the rows of the n1 x n2 matrix so that it is shaped like a n1/2 x 2*n2 matrix. Logically, the matrix is split in half, and the lower half is moved to the right side of the upper half.Both n1 and n2 must be powers of two, and n1 >= 2.
E.g. if the matrix layout is originally as follows:
Matrix before 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Then after this method it is as follows:
Matrix after 0 1 2 3 8 9 10 11 4 5 6 7 12 13 14 15 - Specified by:
permuteToDoubleWidth
in interfaceMatrixStrategy
- Parameters:
arrayAccess
- Accessor to the matrix data. This data will be permuted.n1
- Number of rows in the matrix.n2
- Number of columns in the matrix.- Throws:
ApfloatRuntimeException
- Since:
- 1.7.0
-
permuteToHalfWidth
public void permuteToHalfWidth(ArrayAccess arrayAccess, int n1, int n2) throws ApfloatRuntimeException
Permute the rows of the n1 x n2 matrix so that it is shaped like a 2*n1 x n2/2 matrix. Logically, the matrix is split in half, and the right half is moved below the left half.Both n1 and n2 must be powers of two. E.g. if the matrix layout is originally as follows:
Matrix before 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Then after this method it is as follows:
Matrix after 0 1 2 3 8 9 10 11 4 5 6 7 12 13 14 15 - Specified by:
permuteToHalfWidth
in interfaceMatrixStrategy
- Parameters:
arrayAccess
- Accessor to the matrix data. This data will be permuted.n1
- Number of rows in the matrix.n2
- Number of columns in the matrix.- Throws:
ApfloatRuntimeException
- Since:
- 1.7.0
-
moveBlock
private static void moveBlock(float[] source, int sourceOffset, int sourceWidth, float[] dest, int destOffset, int destWidth, int b)
-
transpose2blocks
private static void transpose2blocks(float[] data, int offset1, int offset2, int width, int b)
-
transposeBlock
private static void transposeBlock(float[] data, int offset, int width, int b)
-
transposeSquare
private static void transposeSquare(float[] data, int offset, int n1, int n2)
-
permuteToHalfWidth
private static void permuteToHalfWidth(float[] data, int offset, int n1, int n2)
-
permuteToDoubleWidth
private static void permuteToDoubleWidth(float[] data, int offset, int n1, int n2)
-
-