Class LongKernel

  • All Implemented Interfaces:
    java.lang.Cloneable

    class LongKernel
    extends com.aparapi.Kernel
    Kernel for the long element type. Contains everything needed for the NTT. The data is organized in columns, not rows, for efficient processing on the GPU. Because of the extreme parallelization requirements (the global size should be at least 1024), this algorithm works efficiently only for calculations of 8 million decimal digits or more. Even at 8 million digits it is only approximately as fast as the pure-Java version (depending on the GPU and CPU hardware). Depending on the total amount of memory available to the GPU, this algorithm will fail (or revert to the very slow software emulation) at large sizes, e.g. at one-billion-digit calculations if your GPU has 1 GB of memory. The maximum power-of-two size for a Java array is about one billion (2^30) elements, so if your GPU has more than 8 GB of memory then the algorithm can never fail for lack of memory (as any Java long[] will always fit in the GPU memory).
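The 8 GB figure follows from simple arithmetic: a long takes 8 bytes, and the largest power-of-two array length is 2^30. The sketch below only reproduces that calculation. The class and method names are hypothetical and are not part of this API, and the estimate ignores any driver or runtime overhead on the GPU.

```java
// Hypothetical helper, not part of LongKernel: estimates the raw size of a long[].
class GpuMemoryEstimate {
    /** Bytes needed for the elements of a long[] of the given length (8 bytes each). */
    static long bytesForLongArray(long elements) {
        return elements * Long.BYTES;
    }

    /** Largest power of two that is still a valid Java array length, i.e. 2^30. */
    static int maxPowerOfTwoArrayLength() {
        return Integer.highestOneBit(Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        int maxLength = maxPowerOfTwoArrayLength();
        long bytes = bytesForLongArray(maxLength);
        // Prints: 1073741824 elements need 8589934592 bytes
        System.out.println(maxLength + " elements need " + bytes + " bytes");
    }
}
```

8589934592 bytes is exactly 8 GiB, which is why a GPU with more memory than that can hold any power-of-two sized long[].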

    Some notes about the aparapi-specific requirements for code that must be converted to OpenCL:

    • assert() does not work
    • Can't check for null
    • Can't get array length
    • Arrays referenced by the kernel can't be null even if they are not accessed
    • Arrays referenced by the kernel can't be zero-length even if they are not accessed
    • Can't invoke methods in other classes e.g. enclosing class of an inner class
    • Early return statements do not work
    • Variables used inside loops must be initialized before the loop
    • Must compile the class with full debug information i.e. with -g
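To make the restrictions above concrete, here is a minimal sketch of a method written in that constrained style. It is plain Java and does not use aparapi at all. The class, fields and method are hypothetical and only illustrate the coding pattern: the length is passed in explicitly instead of read from the array, the array field is never null or empty, loop variables are initialized before the loop, and there is a single return at the end.

```java
// Hypothetical example class; it only illustrates the coding style, not the real kernel.
class RestrictedStyleExample {
    private long[] data = new long[1]; // never null and never zero-length
    private int length = 0;            // element count is supplied by the caller, data.length is not used
    private int stride = 1;            // distance between consecutive elements of one column

    void setData(long[] data, int length, int stride) {
        this.data = data;
        this.length = length;
        this.stride = stride;
    }

    /** Index of the first element of the given column equal to value, or -1 if none. */
    int indexOf(int column, long value) {
        int result = -1;       // single result variable instead of an early return
        int i = 0;             // initialized before the loop
        int position = column; // initialized before the loop
        for (i = 0; i < length; i++) {
            if (result == -1 && data[position] == value) {
                result = i;
            }
            position += stride;
        }
        return result;         // the only return statement
    }
}
```

Compiling with `javac -g` keeps the full debug information that the last list item asks for.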
    Since:
    1.8.3
    Version:
    1.9.0
    • Nested Class Summary

      • Nested classes/interfaces inherited from class com.aparapi.Kernel

        com.aparapi.Kernel.Constant, com.aparapi.Kernel.Entry, com.aparapi.Kernel.EXECUTION_MODE, com.aparapi.Kernel.KernelState, com.aparapi.Kernel.Local, com.aparapi.Kernel.NoCL, com.aparapi.Kernel.OpenCLDelegate, com.aparapi.Kernel.OpenCLMapping, com.aparapi.Kernel.PrivateMemorySpace
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      private LongKernel()  
    • Method Summary

      Modifier and Type Method Description
      private void columnScramble​(int offset)  
      private void columnTableFNT()  
      static LongKernel getInstance()  
      long getModulus()  
      private void inverseColumnTableFNT()  
      private long modAdd​(long a, long b)  
      private long modMultiply​(long a, long b)  
      private long modPow​(long a, long n)  
      private long modSubtract​(long a, long b)  
      private void multiplyElements()  
      private void permute()  
      void run()  
      void setArrayAccess​(ArrayAccess arrayAccess)  
      void setColumns​(int columns)  
      void setIndex​(int[] index)  
      void setIndexCount​(int indexCount)  
      void setLength​(int length)  
      void setModulus​(long modulus)  
      void setN2​(int n2)  
      void setOp​(int op)  
      void setPermutationTable​(int[] permutationTable)  
      void setRows​(int rows)  
      void setScaleFactor​(long scaleFactor)  
      void setStartColumn​(int startColumn)  
      void setStartRow​(int startRow)  
      void setW​(long w)  
      void setW1​(long w1)  
      void setW2​(long w2)  
      void setWTable​(long[] wTable)  
      void setWw​(long ww)  
      private void transformColumns()  
      private void transpose()  
      • Methods inherited from class com.aparapi.Kernel

        abs, abs, abs, abs, acos, acos, acospi, acospi, addExecutionModes, asin, asin, asinpi, asinpi, atan, atan, atan2, atan2, atan2pi, atan2pi, atanpi, atanpi, atomicAdd, atomicAdd, atomicAnd, atomicCmpXchg, atomicDec, atomicGet, atomicInc, atomicMax, atomicMin, atomicOr, atomicSet, atomicSub, atomicXchg, atomicXor, cancelMultiPass, cbrt, cbrt, ceil, ceil, cleanUpArrays, clone, clz, clz, compile, compile, cos, cos, cosh, cosh, cospi, cospi, createRange, dispose, execute, execute, execute, execute, execute, execute, executeFallbackAlgorithm, exp, exp, exp10, exp10, exp2, exp2, expm1, expm1, floor, floor, fma, fma, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, get, getAccumulatedExecutionTime, getAccumulatedExecutionTimeAllThreads, getAccumulatedExecutionTimeCurrentThread, getCancelState, getConversionTime, getCurrentPass, getExecutionMode, getExecutionTime, getGlobalId, getGlobalId, getGlobalSize, getGlobalSize, getGroupId, getGroupId, getKernelCompileWorkGroupSize, getKernelLocalMemSizeInUse, getKernelMaxWorkGroupSize, getKernelMinimumPrivateMemSizeInUsePerWorkItem, getKernelPreferredWorkGroupSizeMultiple, getKernelState, getLocalId, getLocalId, getLocalSize, getLocalSize, getMappedMethodName, getNumGroups, getNumGroups, getPassId, getProfileInfo, getProfileReportCurrentThread, getProfileReportLastThread, getTargetDevice, globalBarrier, hasFallbackAlgorithm, hasNextExecutionMode, hypot, hypot, IEEEremainder, IEEEremainder, invalidateCaches, isAllowDevice, isAutoCleanUpArrays, isExecuting, isExplicit, isMappedMethod, isOpenCLDelegateMethod, isRunningCL, localBarrier, localGlobalBarrier, log, log, log10, log10, log1p, log1p, log2, log2, mad, mad, max, max, max, max, min, min, min, min, nextAfter, nextAfter, popcount, popcount, pow, pow, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, put, registerProfileReportObserver, rint, rint, round, round, rsqrt, rsqrt, 
setAutoCleanUpArrays, setExecutionMode, setExecutionModeWithoutFallback, setExplicit, setFallbackExecutionMode, sin, sin, sinh, sinh, sinpi, sinpi, sqrt, sqrt, tan, tan, tanh, tanh, tanpi, tanpi, toDegrees, toDegrees, toRadians, toRadians, toString, tryNextExecutionMode, usesAtomic32, usesAtomic64
      • Methods inherited from class java.lang.Object

        equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Field Detail

      • kernel

        private static java.lang.ThreadLocal<LongKernel> kernel
      • stride

        private int stride
      • length

        private int length
      • data

        private long[] data
      • offset

        private int offset
      • wTable

        private long[] wTable
      • permutationTable

        private int[] permutationTable
      • permutationTableLength

        private int permutationTableLength
      • modulus

        private long modulus
      • inverseModulus

        private double inverseModulus
      • n2

        private int n2
      • index

        private int[] index
      • indexCount

        private int indexCount
      • startRow

        private int startRow
      • startColumn

        private int startColumn
      • rows

        private int rows
      • columns

        private int columns
      • w

        private long w
      • scaleFactor

        private long scaleFactor
      • INVERSE_TRANSFORM_COLUMNS

        public static final int INVERSE_TRANSFORM_COLUMNS
        See Also:
        Constant Field Values
      • op

        private int op
      • ww

        private long ww
      • w1

        private long w1
      • w2

        private long w2
    • Constructor Detail

      • LongKernel

        private LongKernel()
    • Method Detail

      • getInstance

        public static LongKernel getInstance()
      • setLength

        public void setLength​(int length)
      • setWTable

        public void setWTable​(long[] wTable)
      • setPermutationTable

        public void setPermutationTable​(int[] permutationTable)
      • columnTableFNT

        private void columnTableFNT()
      • inverseColumnTableFNT

        private void inverseColumnTableFNT()
      • columnScramble

        private void columnScramble​(int offset)
      • modMultiply

        private long modMultiply​(long a,
                                 long b)
      • modAdd

        private long modAdd​(long a,
                            long b)
      • modSubtract

        private long modSubtract​(long a,
                                 long b)
      • setModulus

        public void setModulus​(long modulus)
      • getModulus

        public long getModulus()
      • setN2

        public void setN2​(int n2)
      • setIndex

        public void setIndex​(int[] index)
      • setIndexCount

        public void setIndexCount​(int indexCount)
      • transpose

        private void transpose()
      • permute

        private void permute()
      • setStartRow

        public void setStartRow​(int startRow)
      • setStartColumn

        public void setStartColumn​(int startColumn)
      • setRows

        public void setRows​(int rows)
      • setColumns

        public void setColumns​(int columns)
      • setW

        public void setW​(long w)
      • setScaleFactor

        public void setScaleFactor​(long scaleFactor)
      • multiplyElements

        private void multiplyElements()
      • modPow

        private long modPow​(long a,
                            long n)
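The page gives only the signatures of modAdd, modSubtract, modMultiply and modPow, so the following is a sketch of what such helpers commonly look like, not the actual implementation. It assumes a modulus between 2 and 2^62, operands already reduced to the range 0..modulus-1, and a non-negative exponent. The multiplication here uses plain double-and-add; the inverseModulus field of type double suggests the real class uses a different technique, which this sketch does not attempt to reproduce.

```java
// Hypothetical stand-alone class; names mirror the method summary for readability only.
class ModMathSketch {
    private final long modulus; // assumed: 1 < modulus < 2^62, so a + b cannot overflow

    ModMathSketch(long modulus) {
        this.modulus = modulus;
    }

    long modAdd(long a, long b) {
        long r = a + b;
        return r >= modulus ? r - modulus : r;
    }

    long modSubtract(long a, long b) {
        long r = a - b;
        return r < 0 ? r + modulus : r;
    }

    /** Double-and-add multiplication: correct but slow, one modAdd pair per bit of b. */
    long modMultiply(long a, long b) {
        long result = 0;
        long term = a;
        long bits = b;
        while (bits > 0) {
            if ((bits & 1) != 0) {
                result = modAdd(result, term);
            }
            term = modAdd(term, term);
            bits >>= 1;
        }
        return result;
    }

    /** Square-and-multiply exponentiation for n >= 0. */
    long modPow(long a, long n) {
        long result = 1;
        long base = a;
        long exponent = n;
        while (exponent > 0) {
            if ((exponent & 1) != 0) {
                result = modMultiply(result, base);
            }
            base = modMultiply(base, base);
            exponent >>= 1;
        }
        return result;
    }
}
```

For example, with modulus 17, `modPow(3, 4)` evaluates to 13, since 81 = 4 * 17 + 13.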
      • setOp

        public void setOp​(int op)
      • setWw

        public void setWw​(long ww)
      • setW1

        public void setW1​(long w1)
      • setW2

        public void setW2​(long w2)
      • run

        public void run()
        Specified by:
        run in class com.aparapi.Kernel
      • transformColumns

        private void transformColumns()