Package org.apfloat.aparapi
Class LongKernel
- java.lang.Object
-
- com.aparapi.Kernel
-
- org.apfloat.aparapi.LongKernel
-
- All Implemented Interfaces:
java.lang.Cloneable
class LongKernel extends com.aparapi.Kernel
Kernel for thelong
element type. Contains everything needed for the NTT. The data is organized in columns, not rows, for efficient processing on the GPU. Due to the extreme parallelization requirements (global size should be at lest 1024) this algorithm works efficiently only with 8 million decimal digit calculations or bigger. However with 8 million digits, it's only approximately as fast as the pure-Java version (depending on the GPU and CPU hardware). Depending on the total amount of memory available for the GPU this algorithm will fail (or revert to the very slow software emulation) e.g. at one-billion-digit calculations if your GPU has 1 GB of memory. The maximum power-of-two size for a Java array is one billion (230) so if your GPU has more than 8 GB of memory then the algorithm can never fail (as any Java long[] will always fit to the GPU memory).Some notes about the aparapi specific requirements for code that must be converted to OpenCL:
assert()
does not work- Can't check for null
- Can't get array length
- Arrays referenced by the kernel can't be null even if they are not accessed
- Arrays referenced by the kernel can't be zero-length even if they are not accessed
- Can't invoke methods in other classes e.g. enclosing class of an inner class
- Early return statements do not work
- Variables used inside loops must be initialized before the loop
- Must compile the class with full debug information i.e. with
-g
- Since:
- 1.8.3
- Version:
- 1.9.0
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class com.aparapi.Kernel
com.aparapi.Kernel.Constant, com.aparapi.Kernel.Entry, com.aparapi.Kernel.EXECUTION_MODE, com.aparapi.Kernel.KernelState, com.aparapi.Kernel.Local, com.aparapi.Kernel.NoCL, com.aparapi.Kernel.OpenCLDelegate, com.aparapi.Kernel.OpenCLMapping, com.aparapi.Kernel.PrivateMemorySpace
-
-
Field Summary
Fields Modifier and Type Field Description private int
columns
private long[]
data
private int[]
index
private int
indexCount
static int
INVERSE_TRANSFORM_COLUMNS
static int
INVERSE_TRANSFORM_ROWS
private double
inverseModulus
private static java.lang.ThreadLocal<LongKernel>
kernel
private int
length
private long
modulus
static int
MULTIPLY_ELEMENTS
private int
n2
private int
offset
private int
op
private int[]
permutationTable
private int
permutationTableLength
static int
PERMUTE
private int
rows
private long
scaleFactor
private int
startColumn
private int
startRow
private int
stride
static int
TRANSFORM_COLUMNS
static int
TRANSFORM_ROWS
static int
TRANSPOSE
private long
w
private long
w1
private long
w2
private long[]
wTable
private long
ww
-
Constructor Summary
Constructors Modifier Constructor Description private
LongKernel()
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description private void
columnScramble(int offset)
private void
columnTableFNT()
static LongKernel
getInstance()
long
getModulus()
private void
inverseColumnTableFNT()
private long
modAdd(long a, long b)
private long
modMultiply(long a, long b)
private long
modPow(long a, long n)
private long
modSubtract(long a, long b)
private void
multiplyElements()
private void
permute()
void
run()
void
setArrayAccess(ArrayAccess arrayAccess)
void
setColumns(int columns)
void
setIndex(int[] index)
void
setIndexCount(int indexCount)
void
setLength(int length)
void
setModulus(long modulus)
void
setN2(int n2)
void
setOp(int op)
void
setPermutationTable(int[] permutationTable)
void
setRows(int rows)
void
setScaleFactor(long scaleFactor)
void
setStartColumn(int startColumn)
void
setStartRow(int startRow)
void
setW(long w)
void
setW1(long w1)
void
setW2(long w2)
void
setWTable(long[] wTable)
void
setWw(long ww)
private void
transformColumns()
private void
transpose()
-
Methods inherited from class com.aparapi.Kernel
abs, abs, abs, abs, acos, acos, acospi, acospi, addExecutionModes, asin, asin, asinpi, asinpi, atan, atan, atan2, atan2, atan2pi, atan2pi, atanpi, atanpi, atomicAdd, atomicAdd, atomicAnd, atomicCmpXchg, atomicDec, atomicGet, atomicInc, atomicMax, atomicMin, atomicOr, atomicSet, atomicSub, atomicXchg, atomicXor, cancelMultiPass, cbrt, cbrt, ceil, ceil, cleanUpArrays, clone, clz, clz, compile, compile, cos, cos, cosh, cosh, cospi, cospi, createRange, dispose, execute, execute, execute, execute, execute, execute, executeFallbackAlgorithm, exp, exp, exp10, exp10, exp2, exp2, expm1, expm1, floor, floor, fma, fma, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, getAccumulatedExecutionTime, getAccumulatedExecutionTimeAllThreads, getAccumulatedExecutionTimeCurrentThread, getCancelState, getConversionTime, getCurrentPass, getExecutionMode, getExecutionTime, getGlobalId, getGlobalId, getGlobalSize, getGlobalSize, getGroupId, getGroupId, getKernelCompileWorkGroupSize, getKernelLocalMemSizeInUse, getKernelMaxWorkGroupSize, getKernelMinimumPrivateMemSizeInUsePerWorkItem, getKernelPreferredWorkGroupSizeMultiple, getKernelState, getLocalId, getLocalId, getLocalSize, getLocalSize, getMappedMethodName, getNumGroups, getNumGroups, getPassId, getProfileInfo, getProfileReportCurrentThread, getProfileReportLastThread, getTargetDevice, globalBarrier, hasFallbackAlgorithm, hasNextExecutionMode, hypot, hypot, IEEEremainder, IEEEremainder, invalidateCaches, isAllowDevice, isAutoCleanUpArrays, isExecuting, isExplicit, isMappedMethod, isOpenCLDelegateMethod, isRunningCL, localBarrier, localGlobalBarrier, log, log, log10, log10, log1p, log1p, log2, log2, mad, mad, max, max, max, max, min, min, min, min, nextAfter, nextAfter, popcount, popcount, pow, pow, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, registerProfileReportObserver, rint, rint, round, round, rsqrt, rsqrt, setAutoCleanUpArrays, setExecutionMode, setExecutionModeWithoutFallback, setExplicit, setFallbackExecutionMode, sin, sin, sinh, sinh, sinpi, sinpi, sqrt, sqrt, tan, tan, tanh, tanh, tanpi, tanpi, toDegrees, toDegrees, toRadians, toRadians, toString, tryNextExecutionMode, usesAtomic32, usesAtomic64
-
-
-
-
Field Detail
-
kernel
private static java.lang.ThreadLocal<LongKernel> kernel
-
TRANSFORM_ROWS
public static final int TRANSFORM_ROWS
- See Also:
- Constant Field Values
-
INVERSE_TRANSFORM_ROWS
public static final int INVERSE_TRANSFORM_ROWS
- See Also:
- Constant Field Values
-
stride
private int stride
-
length
private int length
-
data
private long[] data
-
offset
private int offset
-
wTable
private long[] wTable
-
permutationTable
private int[] permutationTable
-
permutationTableLength
private int permutationTableLength
-
modulus
private long modulus
-
inverseModulus
private double inverseModulus
-
TRANSPOSE
public static final int TRANSPOSE
- See Also:
- Constant Field Values
-
PERMUTE
public static final int PERMUTE
- See Also:
- Constant Field Values
-
n2
private int n2
-
index
private int[] index
-
indexCount
private int indexCount
-
MULTIPLY_ELEMENTS
public static final int MULTIPLY_ELEMENTS
- See Also:
- Constant Field Values
-
startRow
private int startRow
-
startColumn
private int startColumn
-
rows
private int rows
-
columns
private int columns
-
w
private long w
-
scaleFactor
private long scaleFactor
-
TRANSFORM_COLUMNS
public static final int TRANSFORM_COLUMNS
- See Also:
- Constant Field Values
-
INVERSE_TRANSFORM_COLUMNS
public static final int INVERSE_TRANSFORM_COLUMNS
- See Also:
- Constant Field Values
-
op
private int op
-
ww
private long ww
-
w1
private long w1
-
w2
private long w2
-
-
Method Detail
-
getInstance
public static LongKernel getInstance()
-
setLength
public void setLength(int length)
-
setArrayAccess
public void setArrayAccess(ArrayAccess arrayAccess) throws ApfloatRuntimeException
- Throws:
ApfloatRuntimeException
-
setWTable
public void setWTable(long[] wTable)
-
setPermutationTable
public void setPermutationTable(int[] permutationTable)
-
columnTableFNT
private void columnTableFNT()
-
inverseColumnTableFNT
private void inverseColumnTableFNT()
-
columnScramble
private void columnScramble(int offset)
-
modMultiply
private long modMultiply(long a, long b)
-
modAdd
private long modAdd(long a, long b)
-
modSubtract
private long modSubtract(long a, long b)
-
setModulus
public void setModulus(long modulus)
-
getModulus
public long getModulus()
-
setN2
public void setN2(int n2)
-
setIndex
public void setIndex(int[] index)
-
setIndexCount
public void setIndexCount(int indexCount)
-
transpose
private void transpose()
-
permute
private void permute()
-
setStartRow
public void setStartRow(int startRow)
-
setStartColumn
public void setStartColumn(int startColumn)
-
setRows
public void setRows(int rows)
-
setColumns
public void setColumns(int columns)
-
setW
public void setW(long w)
-
setScaleFactor
public void setScaleFactor(long scaleFactor)
-
multiplyElements
private void multiplyElements()
-
modPow
private long modPow(long a, long n)
-
setOp
public void setOp(int op)
-
setWw
public void setWw(long ww)
-
setW1
public void setW1(long w1)
-
setW2
public void setW2(long w2)
-
run
public void run()
- Specified by:
run
in classcom.aparapi.Kernel
-
transformColumns
private void transformColumns()
-
-