Package com.aparapi

Class Kernel

  • All Implemented Interfaces:
    java.lang.Cloneable

    public abstract class Kernel
    extends java.lang.Object
    implements java.lang.Cloneable
    A kernel encapsulates a data parallel algorithm that will execute either on a GPU (through conversion to OpenCL) or on a CPU via a Java Thread Pool.

    To write a new kernel, a developer extends the Kernel class and overrides the Kernel.run() method. To execute this kernel, the developer creates a new instance of it and calls Kernel.execute(int globalSize) with a suitable 'global size'. At runtime Aparapi will attempt to convert the Kernel.run() method (and any method called directly or indirectly by Kernel.run()) into OpenCL for execution on GPU devices made available via the OpenCL platform.

    Note that Kernel.run() is not called directly. Instead, the Kernel.execute(int globalSize) method will cause the overridden Kernel.run() method to be invoked once for each value from 0 to globalSize - 1.
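The dispatch described above can be pictured with a plain-Java sketch. It does not use Aparapi: MiniKernel and its sequential execute loop are hypothetical stand-ins that only illustrate run() being invoked once per global id. Aparapi itself would perform these invocations in parallel on a GPU or a thread pool.

```java
import java.util.Arrays;

// Hypothetical stand-in for com.aparapi.Kernel; sequential, for illustration only.
abstract class MiniKernel {
    private int globalId;

    protected int getGlobalId() {
        return globalId;
    }

    public abstract void run();

    // Invokes run() once for each id in 0 .. globalSize - 1.
    public MiniKernel execute(int globalSize) {
        for (int id = 0; id < globalSize; id++) {
            globalId = id;
            run();
        }
        return this;
    }
}

class MiniSquareDemo {
    // Squares each input value, one run() invocation per element.
    static int[] squares(final int[] values) {
        final int[] squares = new int[values.length];
        new MiniKernel() {
            @Override
            public void run() {
                int gid = getGlobalId();
                squares[gid] = values[gid] * values[gid];
            }
        }.execute(values.length);
        return squares;
    }

    public static void main(String[] args) {
        // prints [1, 4, 9, 16]
        System.out.println(Arrays.toString(squares(new int[] {1, 2, 3, 4})));
    }
}
```

Because each invocation writes only to its own index, the result does not depend on the order in which the ids are processed, which is what makes this style of kernel safe to run in parallel.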

    On the first call to Kernel.execute(int _globalSize), Aparapi will determine the EXECUTION_MODE of the kernel. This decision is made dynamically based on two factors:

    1. Whether OpenCL is available (appropriate drivers are installed and the OpenCL and Aparapi dynamic libraries are included on the system path).
    2. Whether the bytecode of the run() method (and every method that can be called directly or indirectly from the run() method) can be converted into OpenCL.
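The two factors above can be summarised as a small decision function. This is only a sketch under stated assumptions: it assumes both conditions must hold for OpenCL execution and that the Java Thread Pool is used otherwise. ModeChoice, Mode and choose are hypothetical names, not part of Aparapi, whose real type is Kernel.EXECUTION_MODE.

```java
class ModeChoice {
    // Hypothetical labels standing in for Kernel.EXECUTION_MODE values.
    enum Mode { GPU, JAVA_THREAD_POOL }

    // Assumption: OpenCL is chosen only when it is available AND the
    // run() bytecode (with everything it calls) can be converted.
    static Mode choose(boolean openClAvailable, boolean bytecodeConvertible) {
        return (openClAvailable && bytecodeConvertible)
                ? Mode.GPU
                : Mode.JAVA_THREAD_POOL;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, true));   // prints GPU
        System.out.println(choose(true, false));  // prints JAVA_THREAD_POOL
    }
}
```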

    Below is an example Kernel that calculates the square of a set of input values.

         class SquareKernel extends Kernel{
             private int values[];
             private int squares[];
             public SquareKernel(int values[]){
                this.values = values;
                squares = new int[values.length];
             }
             public void run() {
                 int gid = getGlobalId();
                 squares[gid] = values[gid]*values[gid];
             }
             public int[] getSquares(){
                 return(squares);
             }
         }
     

    To execute this kernel, first create a new instance of it and then call execute(Range _range).

         int[] values = new int[1024];
         // fill values array
         Range range = Range.create(values.length); // create a range 0..1023
         SquareKernel kernel = new SquareKernel(values);
         kernel.execute(range);
     

    When execute(Range) returns, all the executions of Kernel.run() have completed and the results are available in the squares array.

         int[] squares = kernel.getSquares();
         for (int i=0; i< values.length; i++){
            System.out.printf("%4d %4d %8d\n", i, values[i], squares[i]);
         }
     

    A different approach to creating kernels that avoids extending Kernel is to write an anonymous inner class:

    
         final int[] values = new int[1024];
         // fill the values array
         final int[] squares = new int[values.length];
         final Range range = Range.create(values.length);
    
         Kernel kernel = new Kernel(){
             public void run() {
                 int gid = getGlobalId();
                 squares[gid] = values[gid]*values[gid];
             }
         };
         kernel.execute(range);
         for (int i=0; i< values.length; i++){
            System.out.printf("%4d %4d %8d\n", i, values[i], squares[i]);
         }
    
     

    Version:
    Alpha, 21/09/2010
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static interface  Kernel.Constant
      We can use this Annotation to 'tag' intended constant buffers.
      class  Kernel.Entry  
      static class  Kernel.EXECUTION_MODE
      Deprecated.
      It is no longer recommended that EXECUTION_MODEs are used, as a more sophisticated Device preference mechanism is in place, see KernelManager.
      class  Kernel.KernelState
      This class is for internal Kernel state management
      static interface  Kernel.Local
      We can use this Annotation to 'tag' intended local buffers.
      static interface  Kernel.NoCL
      Annotation which can be applied to either a getter (with usual java bean naming convention relative to an instance field), or to any method with void return type, which prevents both the method body and any calls to the method being emitted in the generated OpenCL.
      protected static interface  Kernel.OpenCLDelegate
      This annotation is for internal use only
      protected static interface  Kernel.OpenCLMapping
      This annotation is for internal use only
      static interface  Kernel.PrivateMemorySpace
      We can use this Annotation to 'tag' __private (unshared) array fields.
    • Constructor Summary

      Constructors 
      Constructor Description
      Kernel()  
    • Method Summary

      Modifier and Type Method Description
      protected double abs​(double _d)
      Delegates to either Math.abs(double) (Java) or fabs(double) (OpenCL).
      protected float abs​(float _f)
      Delegates to either Math.abs(float) (Java) or fabs(float) (OpenCL).
      protected int abs​(int n)
      Delegates to either Math.abs(int) (Java) or abs(int) (OpenCL).
      protected long abs​(long n)
      Delegates to either Math.abs(long) (Java) or abs(long) (OpenCL).
      protected double acos​(double a)
      Delegates to either Math.acos(double) (Java) or acos(double) (OpenCL).
      protected float acos​(float a)
      Delegates to either Math.acos(double) (Java) or acos(float) (OpenCL).
      protected double acospi​(double a)  
      protected float acospi​(float a)  
      void addExecutionModes​(Kernel.EXECUTION_MODE... platforms)
      Deprecated.
      protected double asin​(double _d)
      Delegates to either Math.asin(double) (Java) or asin(double) (OpenCL).
      protected float asin​(float _f)
      Delegates to either Math.asin(double) (Java) or asin(float) (OpenCL).
      protected double asinpi​(double a)  
      protected float asinpi​(float a)  
      protected double atan​(double _d)
      Delegates to either Math.atan(double) (Java) or atan(double) (OpenCL).
      protected float atan​(float _f)
      Delegates to either Math.atan(double) (Java) or atan(float) (OpenCL).
      protected double atan2​(double _d1, double _d2)
      Delegates to either Math.atan2(double, double) (Java) or atan2(double, double) (OpenCL).
      protected float atan2​(float _f1, float _f2)
      Delegates to either Math.atan2(double, double) (Java) or atan2(float, float) (OpenCL).
      protected double atan2pi​(double y, double x)  
      protected float atan2pi​(float y, double x)  
      protected double atanpi​(double a)  
      protected float atanpi​(float a)  
      protected int atomicAdd​(int[] _arr, int _index, int _delta)
      Atomically adds _delta value to _index element of array _arr (Java) or delegates to atomic_add(volatile int*, int) (OpenCL).
      protected int atomicAdd​(java.util.concurrent.atomic.AtomicInteger p, int val)  
      protected int atomicAnd​(java.util.concurrent.atomic.AtomicInteger p, int val)  
      protected int atomicCmpXchg​(java.util.concurrent.atomic.AtomicInteger p, int expectedVal, int newVal)  
      protected int atomicDec​(java.util.concurrent.atomic.AtomicInteger p)  
      protected int atomicGet​(java.util.concurrent.atomic.AtomicInteger p)  
      protected int atomicInc​(java.util.concurrent.atomic.AtomicInteger p)  
      protected int atomicMax​(java.util.concurrent.atomic.AtomicInteger p, int val)  
      protected int atomicMin​(java.util.concurrent.atomic.AtomicInteger p, int val)  
      protected int atomicOr​(java.util.concurrent.atomic.AtomicInteger p, int val)  
      protected void atomicSet​(java.util.concurrent.atomic.AtomicInteger p, int val)  
      protected int atomicSub​(java.util.concurrent.atomic.AtomicInteger p, int val)  
      protected int atomicXchg​(java.util.concurrent.atomic.AtomicInteger p, int newVal)  
      protected int atomicXor​(java.util.concurrent.atomic.AtomicInteger p, int val)  
      private static <K,​V,​T extends java.lang.Throwable>
      ValueCache<java.lang.Class<?>,​java.util.Map<K,​V>,​T>
      cacheProperty​(ValueCache.ThrowingValueComputer<java.lang.Class<?>,​java.util.Map<K,​V>,​T> throwingValueComputer)  
      void cancelMultiPass()
      Invoking this method flags that once the current pass is complete execution should be abandoned.
      protected double cbrt​(double a)  
      protected float cbrt​(float a)  
      protected double ceil​(double _d)
      Delegates to either Math.ceil(double) (Java) or ceil(double) (OpenCL).
      protected float ceil​(float _f)
      Delegates to either Math.ceil(double) (Java) or ceil(float) (OpenCL).
      void cleanUpArrays()
      Frees the bulk of the resources used by this kernel, by setting array sizes in non-primitive KernelArgs to 1 (0 size is prohibited) and invoking kernel execution on a zero size range.
      Kernel clone()
      When using a Java Thread Pool Aparapi uses clone to copy the initial instance to each thread.
      protected int clz​(int _i)
      Delegates to either Integer.numberOfLeadingZeros(int) (Java) or clz(int) (OpenCL).
      protected long clz​(long _l)
      Delegates to either Long.numberOfLeadingZeros(long) (Java) or clz(long) (OpenCL).
      Kernel compile​(Device _device)
      Force pre-compilation of the kernel for a given device, without executing it.
      Kernel compile​(java.lang.String _entrypoint, Device _device)
      Force pre-compilation of the kernel for a given device, without executing it.
      protected double cos​(double _d)
      Delegates to either Math.cos(double) (Java) or cos(double) (OpenCL).
      protected float cos​(float _f)
      Delegates to either Math.cos(double) (Java) or cos(float) (OpenCL).
      protected double cosh​(double x)  
      protected float cosh​(float x)  
      protected double cospi​(double a)  
      protected float cospi​(float a)  
      protected Range createRange​(int _range)  
      private static java.lang.String descriptorToReturnTypeLetter​(java.lang.String desc)  
      void dispose()
      Release any resources associated with this Kernel.
      Kernel execute​(int _range)
      Start execution of _range kernels.
      Kernel execute​(int _range, int _passes)
      Start execution of _passes iterations over the _range of kernels.
      Kernel execute​(Range _range)
      Start execution of _range kernels.
      Kernel execute​(Range _range, int _passes)
      Start execution of _passes iterations of _range kernels.
      Kernel execute​(java.lang.String _entrypoint, Range _range)
      Start execution of globalSize kernels for the given entrypoint.
      Kernel execute​(java.lang.String _entrypoint, Range _range, int _passes)
      Start execution of globalSize kernels for the given entrypoint.
      void executeFallbackAlgorithm​(Range _range, int _passId)
      If hasFallbackAlgorithm() has been overridden to return true, this method should be overridden so as to apply a single pass of the kernel's logic to the entire _range.
      protected double exp​(double _d)
      Delegates to either Math.exp(double) (Java) or exp(double) (OpenCL).
      protected float exp​(float _f)
      Delegates to either Math.exp(double) (Java) or exp(float) (OpenCL).
      protected double exp10​(double a)  
      protected float exp10​(float a)  
      protected double exp2​(double a)  
      protected float exp2​(float a)  
      protected double expm1​(double x)  
      protected float expm1​(float x)  
      protected double floor​(double _d)
      Delegates to either Math.floor(double) (Java) or floor(double) (OpenCL).
      protected float floor​(float _f)
      Delegates to either Math.floor(double) (Java) or floor(float) (OpenCL).
      protected double fma​(double a, double b, double c)
      Delegates to either a*b+c (Java) or fma(double, double, double) (OpenCL).
      protected float fma​(float a, float b, float c)
      Delegates to either a*b+c (Java) or fma(float, float, float) (OpenCL).
      Kernel get​(boolean[] array)
      Enqueue a request to return this buffer from the GPU.
      Kernel get​(boolean[][] array)
      Enqueue a request to return this buffer from the GPU.
      Kernel get​(boolean[][][] array)
      Enqueue a request to return this buffer from the GPU.
      Kernel get​(byte[] array)
      Enqueue a request to return this buffer from the GPU.
      Kernel get​(byte[][] array)
      Enqueue a request to return this buffer from the GPU.
      Kernel get​(byte[][][] array)
      Enqueue a request to return this buffer from the GPU.
      Kernel get​(char[] array)
      Enqueue a request to return this buffer from the GPU.
      Kernel get​(char[][] array)
      Enqueue a request to return this buffer from the GPU.
      Kernel get​(char[][][] array)
      Enqueue a request to return this buffer from the GPU.
      Kernel get​(double[] array)
      Enqueue a request to return this buffer from the GPU.
      Kernel get​(double[][] array)
      Enqueue a request to return this buffer from the GPU.
      Kernel get​(double[][][] array)
      Enqueue a request to return this buffer from the GPU.
      Kernel get​(float[] array)
      Enqueue a request to return this buffer from the GPU.
      Kernel get​(float[][] array)
      Enqueue a request to return this buffer from the GPU.
      Kernel get​(float[][][] array)
      Enqueue a request to return this buffer from the GPU.
      Kernel get​(int[] array)
      Enqueue a request to return this buffer from the GPU.
      Kernel get​(int[][] array)
      Enqueue a request to return this buffer from the GPU.
      Kernel get​(int[][][] array)
      Enqueue a request to return this buffer from the GPU.
      Kernel get​(long[] array)
      Enqueue a request to return this buffer from the GPU.
      Kernel get​(long[][] array)
      Enqueue a request to return this buffer from the GPU.
      Kernel get​(long[][][] array)
      Enqueue a request to return this buffer from the GPU.
      double getAccumulatedExecutionTime()
      Determine the total execution time of all previous Kernel.execute(range) calls for all threads that ran this kernel for the device used in the last kernel execution.
      double getAccumulatedExecutionTimeAllThreads​(Device device)
      Determine the total execution time of all produced profile reports from all threads that executed the current kernel on the specified device.
      double getAccumulatedExecutionTimeCurrentThread​(Device device)
      Determine the total execution time of all previous executions of the current kernel on the specified device that were started from the thread calling this method.
      private static java.lang.String getArgumentsLetters​(java.lang.reflect.Method method)  
      private static boolean getBoolean​(ValueCache<java.lang.Class<?>,​java.util.Map<java.lang.String,​java.lang.Boolean>,​java.lang.RuntimeException> methodNamesCache, ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)  
      int getCancelState()  
      double getConversionTime()
      Determine the time taken to convert bytecode to OpenCL for first Kernel.execute(range) call.
      int getCurrentPass()  
      Kernel.EXECUTION_MODE getExecutionMode()
      Deprecated.
      double getExecutionTime()
      Determine the execution time of the previous Kernel.execute(range) called from the last thread that ran and executed on the most recently used device.
      protected int getGlobalId()
      Determine the globalId of an executing kernel.
      protected int getGlobalId​(int _dim)  
      protected int getGlobalSize()
      Determine the value that was passed to Kernel.execute(int globalSize) method.
      protected int getGlobalSize​(int _dim)  
      protected int getGroupId()
      Determine the groupId of an executing kernel.
      protected int getGroupId​(int _dim)  
      int[] getKernelCompileWorkGroupSize​(Device device)
      Retrieves the specified work-group size in the compiled kernel for the specified device or intermediate language for the device.
      long getKernelLocalMemSizeInUse​(Device device)
      Retrieves the amount of local memory used in the specified device by this kernel instance.
      int getKernelMaxWorkGroupSize​(Device device)
      Retrieves the maximum work-group size allowed for this kernel when running on the specified device.
      long getKernelMinimumPrivateMemSizeInUsePerWorkItem​(Device device)
      Retrieves the minimum private memory in use per work item for this kernel instance and the specified device.
      int getKernelPreferredWorkGroupSizeMultiple​(Device device)
      Retrieves the preferred work-group multiple in the specified device for this kernel instance.
      Kernel.KernelState getKernelState()  
      protected int getLocalId()
      Determine the local id of an executing kernel.
      protected int getLocalId​(int _dim)  
      protected int getLocalSize()
      Determine the size of the group that an executing kernel is a member of.
      protected int getLocalSize​(int _dim)  
      static java.lang.String getMappedMethodName​(ClassModel.ConstantPool.MethodReferenceEntry _methodReferenceEntry)  
      protected int getNumGroups()
      Determine the number of groups that will be used to execute a kernel.
      protected int getNumGroups​(int _dim)  
      protected int getPassId()
      Determine the passId of an executing kernel.
      java.util.List<ProfileInfo> getProfileInfo()
      Get the profiling information from the last successful call to Kernel.execute().
      java.lang.ref.WeakReference<ProfileReport> getProfileReportCurrentThread​(Device device)
      Retrieves the most recent complete report available for the current thread calling this method for the current kernel instance and executed on the given device.
      java.lang.ref.WeakReference<ProfileReport> getProfileReportLastThread​(Device device)
      Retrieves a profile report for the last thread that executed this kernel on the given device.
      private static <V,​T extends java.lang.Throwable>
      V
      getProperty​(ValueCache<java.lang.Class<?>,​java.util.Map<java.lang.String,​V>,​T> cache, ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry, V defaultValue)  
      private static java.lang.String getReturnTypeLetter​(java.lang.reflect.Method meth)  
      Device getTargetDevice()  
      protected void globalBarrier()
      Wait for all kernels in the current work group to rendezvous at this call before continuing execution.
      It will also enforce memory ordering, such that modifications made by each thread in the work-group, to the memory, before entering into this barrier call will be visible by all threads leaving the barrier.
      boolean hasFallbackAlgorithm()
      False by default.
      boolean hasNextExecutionMode()
      Deprecated.
      protected double hypot​(double a, double b)  
      protected float hypot​(float a, float b)  
      protected double IEEEremainder​(double _d1, double _d2)
      Delegates to either Math.IEEEremainder(double, double) (Java) or remainder(double, double) (OpenCL).
      protected float IEEEremainder​(float _f1, float _f2)
      Delegates to either Math.IEEEremainder(double, double) (Java) or remainder(float, float) (OpenCL).
      static void invalidateCaches()  
      boolean isAllowDevice​(Device _device)  
      boolean isAutoCleanUpArrays()  
      boolean isExecuting()  
      boolean isExplicit()
      For dev purposes (we should remove this for production) determine whether this Kernel uses explicit memory management
      static boolean isMappedMethod​(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)  
      static boolean isOpenCLDelegateMethod​(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)  
      private static boolean isRelevant​(java.lang.reflect.Method method)  
      boolean isRunningCL()  
      protected void localBarrier()
      Wait for all kernels in the current work group to rendezvous at this call before continuing execution.
      It will also enforce memory ordering, such that modifications made by each thread in the work-group, to the memory, before entering into this barrier call will be visible by all threads leaving the barrier.
      protected void localGlobalBarrier()
      Wait for all kernels in the current work group to rendezvous at this call before continuing execution.
      It will also enforce memory ordering, such that modifications made by each thread in the work-group, to the memory, before entering into this barrier call will be visible by all threads leaving the barrier.
      protected double log​(double _d)
      Delegates to either Math.log(double) (Java) or log(double) (OpenCL).
      protected float log​(float _f)
      Delegates to either Math.log(double) (Java) or log(float) (OpenCL).
      protected double log10​(double a)  
      protected float log10​(float a)  
      protected double log1p​(double x)  
      protected float log1p​(float x)  
      protected double log2​(double a)  
      protected float log2​(float a)  
      protected double mad​(double a, double b, double c)  
      protected float mad​(float a, float b, float c)  
      private static <A extends java.lang.annotation.Annotation>
      ValueCache<java.lang.Class<?>,​java.util.Map<java.lang.String,​java.lang.Boolean>,​java.lang.RuntimeException>
      markedWith​(java.lang.Class<A> annotationClass)  
      protected double max​(double _d1, double _d2)
      Delegates to either Math.max(double, double) (Java) or fmax(double, double) (OpenCL).
      protected float max​(float _f1, float _f2)
      Delegates to either Math.max(float, float) (Java) or fmax(float, float) (OpenCL).
      protected int max​(int n1, int n2)
      Delegates to either Math.max(int, int) (Java) or max(int, int) (OpenCL).
      protected long max​(long n1, long n2)
      Delegates to either Math.max(long, long) (Java) or max(long, long) (OpenCL).
      protected double min​(double _d1, double _d2)
      Delegates to either Math.min(double, double) (Java) or fmin(double, double) (OpenCL).
      protected float min​(float _f1, float _f2)
      Delegates to either Math.min(float, float) (Java) or fmin(float, float) (OpenCL).
      protected int min​(int n1, int n2)
      Delegates to either Math.min(int, int) (Java) or min(int, int) (OpenCL).
      protected long min​(long n1, long n2)
      Delegates to either Math.min(long, long) (Java) or min(long, long) (OpenCL).
      private float native_rsqrt​(float _f)  
      private float native_sqrt​(float _f)  
      protected double nextAfter​(double start, double direction)  
      protected float nextAfter​(float start, float direction)  
      protected int popcount​(int _i)
      Delegates to either Integer.bitCount(int) (Java) or popcount(int) (OpenCL).
      protected long popcount​(long _i)
      Delegates to either Long.bitCount(long) (Java) or popcount(long) (OpenCL).
      protected double pow​(double _d1, double _d2)
      Delegates to either Math.pow(double, double) (Java) or pow(double, double) (OpenCL).
      protected float pow​(float _f1, float _f2)
      Delegates to either Math.pow(double, double) (Java) or pow(float, float) (OpenCL).
      private KernelRunner prepareKernelRunner()  
      Kernel put​(boolean[] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      Kernel put​(boolean[][] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      Kernel put​(boolean[][][] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      Kernel put​(byte[] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      Kernel put​(byte[][] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      Kernel put​(byte[][][] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      Kernel put​(char[] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      Kernel put​(char[][] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      Kernel put​(char[][][] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      Kernel put​(double[] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      Kernel put​(double[][] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      Kernel put​(double[][][] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      Kernel put​(float[] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      Kernel put​(float[][] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      Kernel put​(float[][][] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      Kernel put​(int[] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      Kernel put​(int[][] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      Kernel put​(int[][][] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      Kernel put​(long[] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      Kernel put​(long[][] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      Kernel put​(long[][][] array)
      Tag this array so that it is explicitly enqueued before the kernel is executed
      void registerProfileReportObserver​(IProfileReportObserver observer)
      Registers a new profile report observer to receive profile reports as they're produced.
      protected double rint​(double _d)
      Delegates to either Math.rint(double) (Java) or rint(double) (OpenCL).
      protected float rint​(float _f)
      Delegates to either Math.rint(double) (Java) or rint(float) (OpenCL).
      protected long round​(double _d)
      Delegates to either Math.round(double) (Java) or round(double) (OpenCL).
      protected int round​(float _f)
      Delegates to either Math.round(float) (Java) or round(float) (OpenCL).
      protected double rsqrt​(double _d)
      Computes inverse square root using Math.sqrt(double) (Java) or delegates to rsqrt(double) (OpenCL).
      protected float rsqrt​(float _f)
      Computes inverse square root using Math.sqrt(double) (Java) or delegates to rsqrt(float) (OpenCL).
      abstract void run()
      The entry point of a kernel.
      void setAutoCleanUpArrays​(boolean autoCleanUpArrays)
      Property which if true enables automatic calling of cleanUpArrays() following each execution.
      void setExecutionMode​(Kernel.EXECUTION_MODE _executionMode)
      Deprecated.
      void setExecutionModeWithoutFallback​(Kernel.EXECUTION_MODE _executionMode)  
      void setExplicit​(boolean _explicit)
      For dev purposes (we should remove this for production) allow us to define that this Kernel uses explicit memory management
      void setFallbackExecutionMode()
      Deprecated.
      protected double sin​(double _d)
      Delegates to either Math.sin(double) (Java) or sin(double) (OpenCL).
      protected float sin​(float _f)
      Delegates to either Math.sin(double) (Java) or sin(float) (OpenCL).
      protected double sinh​(double x)
      Delegates to either Math.sinh(double) (Java) or sinh(double) (OpenCL).
      protected float sinh​(float x)
      Delegates to either Math.sinh(double) (Java) or sinh(float) (OpenCL).
      protected double sinpi​(double a)
      Backed by either Math.sin(double) (Java) or sinpi(double) (OpenCL).
      protected float sinpi​(float a)
      Backed by either Math.sin(double) (Java) or sinpi(float) (OpenCL).
      protected double sqrt​(double _d)
      Delegates to either Math.sqrt(double) (Java) or sqrt(double) (OpenCL).
      protected float sqrt​(float _f)
      Delegates to either Math.sqrt(double) (Java) or sqrt(float) (OpenCL).
      protected double tan​(double _d)
      Delegates to either Math.tan(double) (Java) or tan(double) (OpenCL).
      protected float tan​(float _f)
      Delegates to either Math.tan(double) (Java) or tan(float) (OpenCL).
      protected double tanh​(double x)
      Delegates to either Math.tanh(double) (Java) or tanh(double) (OpenCL).
      protected float tanh​(float x)
      Delegates to either Math.tanh(double) (Java) or tanh(float) (OpenCL).
      protected double tanpi​(double a)
      Backed by either Math.tan(double) (Java) or tanpi(double) (OpenCL).
      protected float tanpi​(float a)
      Backed by either Math.tan(double) (Java) or tanpi(float) (OpenCL).
      private static java.lang.String toClassShortNameIfAny​(java.lang.Class<?> retClass)  
      protected double toDegrees​(double _d)
      Delegates to either Math.toDegrees(double) (Java) or degrees(double) (OpenCL).
      protected float toDegrees​(float _f)
      Delegates to either Math.toDegrees(double) (Java) or degrees(float) (OpenCL).
      protected double toRadians​(double _d)
      Delegates to either Math.toRadians(double) (Java) or radians(double) (OpenCL).
      protected float toRadians​(float _f)
      Delegates to either Math.toRadians(double) (Java) or radians(float) (OpenCL).
      private static java.lang.String toSignature​(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)  
      (package private) static java.lang.String toSignature​(java.lang.reflect.Method method)  
      java.lang.String toString()  
      void tryNextExecutionMode()
      Deprecated.
      static boolean usesAtomic32​(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)  
      static boolean usesAtomic64​(ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry)  
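Several entries in the table describe a Java-side computation rather than a direct call to a same-named Math method. The following plain-Java sketch shows what those computations might look like. It is not Aparapi's source: JavaSideMath is a hypothetical class, and the sinpi body assumes that sinpi(a) means sin(pi * a), which is how OpenCL defines it.

```java
class JavaSideMath {
    // Inverse square root built from Math.sqrt, as the rsqrt entry describes.
    static double rsqrt(double d) {
        return 1.0 / Math.sqrt(d);
    }

    // Plain multiply then add, as the fma entry describes (a*b+c),
    // so the product is rounded before the addition.
    static double fma(double a, double b, double c) {
        return a * b + c;
    }

    // Assumption: sinpi(a) is sin(pi * a), following OpenCL's definition.
    static double sinpi(double a) {
        return Math.sin(Math.PI * a);
    }

    public static void main(String[] args) {
        System.out.println(rsqrt(4.0));          // prints 0.5
        System.out.println(fma(2.0, 3.0, 4.0));  // prints 10.0
        System.out.println(sinpi(0.5));
    }
}
```

One consequence worth keeping in mind is that results computed in Java and in OpenCL may differ in the last bits, since the OpenCL built-ins are separate implementations with their own precision guarantees.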
      • Methods inherited from class java.lang.Object

        equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
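The multi-pass entries in the method summary (execute with a _passes argument, getPassId() and cancelMultiPass()) interact in a way that a short sketch may make clearer. The code below is a sequential, plain-Java emulation written for illustration. MultiPassSketch and MultiPassDemo are hypothetical names and this is not how Aparapi schedules work; it only mirrors the documented rule that a cancel request takes effect once the current pass is complete.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sequential stand-in for a multi-pass kernel.
abstract class MultiPassSketch {
    private int globalId;
    private int passId;
    private boolean cancelRequested;

    protected int getGlobalId() { return globalId; }

    protected int getPassId() { return passId; }

    // Request that execution stop once the current pass has finished.
    public void cancelMultiPass() { cancelRequested = true; }

    public abstract void run();

    // Runs up to 'passes' passes, each covering ids 0 .. range - 1.
    public void execute(int range, int passes) {
        cancelRequested = false;
        for (passId = 0; passId < passes; passId++) {
            for (globalId = 0; globalId < range; globalId++) {
                run();
            }
            if (cancelRequested) {
                break; // the pass in progress always completes first
            }
        }
    }
}

class MultiPassDemo {
    // Asks for 5 passes over 3 ids but cancels during pass 1;
    // returns one "pass:id" entry per run() invocation.
    static List<String> trace() {
        final List<String> calls = new ArrayList<>();
        MultiPassSketch kernel = new MultiPassSketch() {
            @Override
            public void run() {
                calls.add(getPassId() + ":" + getGlobalId());
                if (getPassId() == 1 && getGlobalId() == 0) {
                    cancelMultiPass();
                }
            }
        };
        kernel.execute(3, 5);
        return calls;
    }

    public static void main(String[] args) {
        // prints [0:0, 0:1, 0:2, 1:0, 1:1, 1:2]
        System.out.println(trace());
    }
}
```

In the sketch, the cancel request arrives at the first id of pass 1, yet ids 1 and 2 of that pass still run; passes 2 to 4 are skipped.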
    • Field Detail

      • logger

        private static java.util.logging.Logger logger
      • LOCAL_SUFFIX

        public static final java.lang.String LOCAL_SUFFIX
        We can use this suffix to 'tag' intended local buffers. So either name the buffer
        
          int[] buffer_$local$ = new int[1024];
          
        Or use the Annotation form
        
          @Local int[] buffer = new int[1024];
          
        See Also:
        Constant Field Values
      • CONSTANT_SUFFIX

        public static final java.lang.String CONSTANT_SUFFIX
        We can use this suffix to 'tag' intended constant buffers. So either name the buffer
        
          int[] buffer_$constant$ = new int[1024];
          
        Or use the Annotation form
        
          @Constant int[] buffer = new int[1024];
          
        See Also:
        Constant Field Values
      • PRIVATE_SUFFIX

        public static final java.lang.String PRIVATE_SUFFIX
        We can use this suffix to 'tag' __private buffers.

        So either name the buffer

        
          int[] buffer_$private$32 = new int[32];
          
        Or use the Annotation form
        
          @PrivateMemorySpace(32) int[] buffer = new int[32];
          
        See Also:
        Kernel.PrivateMemorySpace for a more detailed usage summary, Constant Field Values
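The three naming conventions above can be compared side by side with a small classifier. This is a sketch under assumptions: the suffix strings are inferred from the field-name examples shown (the actual constant values are listed under Constant Field Values), BufferNaming and addressSpace are hypothetical names, and treating an untagged array as "global" is an assumption rather than something stated here.

```java
class BufferNaming {
    // Assumed values, inferred from the field-name examples in this section.
    static final String LOCAL_SUFFIX = "_$local$";
    static final String CONSTANT_SUFFIX = "_$constant$";
    static final String PRIVATE_SUFFIX = "_$private$";

    // Maps a field name to the memory space its suffix appears to request.
    static String addressSpace(String fieldName) {
        if (fieldName.endsWith(LOCAL_SUFFIX)) {
            return "local";
        }
        if (fieldName.endsWith(CONSTANT_SUFFIX)) {
            return "constant";
        }
        // The private form carries a trailing size, e.g. buffer_$private$32.
        int at = fieldName.lastIndexOf(PRIVATE_SUFFIX);
        if (at >= 0) {
            String size = fieldName.substring(at + PRIVATE_SUFFIX.length());
            if (!size.isEmpty() && size.chars().allMatch(Character::isDigit)) {
                return "private[" + size + "]";
            }
        }
        return "global"; // assumption: untagged arrays are ordinary global buffers
    }

    public static void main(String[] args) {
        System.out.println(addressSpace("buffer_$local$"));      // prints local
        System.out.println(addressSpace("buffer_$constant$"));   // prints constant
        System.out.println(addressSpace("buffer_$private$32"));  // prints private[32]
        System.out.println(addressSpace("buffer"));              // prints global
    }
}
```

The private form differs from the other two in that the array length is part of the name, which matches the annotation form taking the size as its argument.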
      • autoCleanUpArrays

        private boolean autoCleanUpArrays
      • LOG_2_RECIPROCAL

        private static final double LOG_2_RECIPROCAL
      • minOperator

        private static final java.util.function.IntBinaryOperator minOperator
      • maxOperator

        private static final java.util.function.IntBinaryOperator maxOperator
      • andOperator

        private static final java.util.function.IntBinaryOperator andOperator
      • orOperator

        private static final java.util.function.IntBinaryOperator orOperator
      • xorOperator

        private static final java.util.function.IntBinaryOperator xorOperator
      • typeToLetterMap

        static final java.util.Map<java.lang.String,​java.lang.String> typeToLetterMap
      • useNullForLocalSize

        boolean useNullForLocalSize
      • mappedMethodFlags

        private static final ValueCache<java.lang.Class<?>,​java.util.Map<java.lang.String,​java.lang.Boolean>,​java.lang.RuntimeException> mappedMethodFlags
      • openCLDelegateMethodFlags

        private static final ValueCache<java.lang.Class<?>,​java.util.Map<java.lang.String,​java.lang.Boolean>,​java.lang.RuntimeException> openCLDelegateMethodFlags
      • atomic32Cache

        private static final ValueCache<java.lang.Class<?>,​java.util.Map<java.lang.String,​java.lang.Boolean>,​java.lang.RuntimeException> atomic32Cache
      • atomic64Cache

        private static final ValueCache<java.lang.Class<?>,​java.util.Map<java.lang.String,​java.lang.Boolean>,​java.lang.RuntimeException> atomic64Cache
      • mappedMethodNamesCache

        private static final ValueCache<java.lang.Class<?>,​java.util.Map<java.lang.String,​java.lang.String>,​java.lang.RuntimeException> mappedMethodNamesCache
    • Constructor Detail

      • Kernel

        public Kernel()
    • Method Detail

      • getGlobalId

        protected final int getGlobalId()
        Determine the globalId of an executing kernel.

        The kernel implementation uses the globalId to determine which of the executing kernels (in the global domain space) this invocation is expected to deal with.

        For example in a SquareKernel implementation:

             class SquareKernel extends Kernel{
                 private int values[];
                 private int squares[];
                 public SquareKernel(int values[]){
                    this.values = values;
                    squares = new int[values.length];
                 }
                 public void run() {
                     int gid = getGlobalId();
                     squares[gid] = values[gid]*values[gid];
                 }
                 public int[] getSquares(){
                     return(squares);
                 }
             }
         

        Each invocation of SquareKernel.run() retrieves its globalId by calling getGlobalId(), and then computes the value of squares[gid] for a given value of values[gid].

        Returns:
        The globalId for the Kernel being executed
        See Also:
        getLocalId(), getGroupId(), getGlobalSize(), getNumGroups(), getLocalSize()
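        The dispatch semantics can be pictured without Aparapi at all. The sketch below is a plain-Java, sequential stand-in (the names GlobalIdSketch and executeSequentially are hypothetical): the body is invoked once per global id, and each invocation touches only the element selected by its own id. Real execution may run these invocations in parallel and in any order.

```java
import java.util.function.IntConsumer;

// Plain-Java stand-in for the dispatch described above; Aparapi is not used.
class GlobalIdSketch {
    /** Invokes body once for every global id in [0, globalSize). */
    static void executeSequentially(int globalSize, IntConsumer body) {
        for (int gid = 0; gid < globalSize; gid++) {
            body.accept(gid);
        }
    }

    /** Mirrors SquareKernel: one output element per global id. */
    static int[] squares(int[] values) {
        int[] squares = new int[values.length];
        executeSequentially(values.length, gid -> squares[gid] = values[gid] * values[gid]);
        return squares;
    }
}
```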
      • getGlobalId

        protected final int getGlobalId​(int _dim)
      • getGroupId

        protected final int getGroupId()
        Determine the groupId of an executing kernel.

        When Kernel.execute(int globalSize) is invoked for a particular kernel, the runtime will break the work into various 'groups'.

        A kernel can use getGroupId() to determine which group it is currently dispatched to.

        The following code would capture the groupId for each kernel and map it against globalId.

             final int[] groupIds = new int[1024];
             Kernel kernel = new Kernel(){
                 public void run() {
                     int gid = getGlobalId();
                     groupIds[gid] = getGroupId();
                 }
             };
             kernel.execute(groupIds.length);
             for (int i=0; i< groupIds.length; i++){
                System.out.printf("%4d %4d\n", i, groupIds[i]);
             }
         
        Returns:
        The groupId for this Kernel being executed
        See Also:
        getLocalId(), getGlobalId(), getGlobalSize(), getNumGroups(), getLocalSize()
      • getGroupId

        protected final int getGroupId​(int _dim)
      • getPassId

        protected final int getPassId()
        Determine the passId of an executing kernel.

        When Kernel.execute(int globalSize, int passes) is invoked for a particular kernel, the runtime will execute the whole global range once per pass.

        A kernel can use getPassId() to determine which pass it is currently in. This is ideal for 'reduce' type phases.

        Returns:
        The passId for this Kernel being executed
        See Also:
        getLocalId(), getGlobalId(), getGlobalSize(), getNumGroups(), getLocalSize()
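        To make the 'reduce' remark concrete, here is a plain-Java emulation of a multi-pass sum. It is a sketch only: it does not use Aparapi, the class name is hypothetical, and it assumes the array length is a power of two. The outer loop variable plays the role of getPassId() and the inner one the role of getGlobalId().

```java
// Plain-Java emulation of a multi-pass reduction; Aparapi is not used.
class PassReductionSketch {
    /** Sums data in place using log2(n) passes; data.length must be a power of two. */
    static int sum(int[] data) {
        int n = data.length;
        if (n == 0 || (n & (n - 1)) != 0) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        int passes = Integer.numberOfTrailingZeros(n); // log2(n)
        for (int passId = 0; passId < passes; passId++) {
            int stride = 1 << passId; // distance between partial sums in this pass
            for (int gid = 0; gid < n; gid++) {
                // Only every (2 * stride)-th id does work; it folds in its neighbour.
                if (gid % (2 * stride) == 0) {
                    data[gid] += data[gid + stride];
                }
            }
        }
        return data[0];
    }
}
```

        Within one pass, the elements written and the elements read do not overlap, which is what makes each pass safe to run in parallel.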
      • getLocalId

        protected final int getLocalId()
        Determine the local id of an executing kernel.

        When Kernel.execute(int globalSize) is invoked for a particular kernel, the runtime will break the work into various 'groups'. getLocalId() can be used to determine the relative id of the current kernel within a specific group.

        The following code would capture the local id for each kernel and map it against globalId.

             final int[] localIds = new int[1024];
             Kernel kernel = new Kernel(){
                 public void run() {
                     int gid = getGlobalId();
                     localIds[gid] = getLocalId();
                 }
             };
             kernel.execute(localIds.length);
             for (int i=0; i< localIds.length; i++){
                System.out.printf("%4d %4d\n", i, localIds[i]);
             }
         
        Returns:
        The local id for this Kernel being executed
        See Also:
        getGroupId(), getGlobalId(), getGlobalSize(), getNumGroups(), getLocalSize()
      • getLocalId

        protected final int getLocalId​(int _dim)
      • getLocalSize

        protected final int getLocalSize()
        Determine the size of the group that an executing kernel is a member of.

        When Kernel.execute(int globalSize) is invoked for a particular kernel, the runtime will break the work into various 'groups'. getLocalSize() allows a kernel to determine the size of the current group.

        Note groups may not all be the same size. In particular, if (global size)%(# of compute devices)!=0, the runtime can choose to dispatch kernels to groups with differing sizes.

        Returns:
        The size of the currently executing group.
        See Also:
        getGroupId(), getGlobalId(), getGlobalSize(), getNumGroups(), getLocalSize()
      • getLocalSize

        protected final int getLocalSize​(int _dim)
      • getGlobalSize

        protected final int getGlobalSize()
        Determine the value that was passed to Kernel.execute(int globalSize) method.
        Returns:
        The value passed to Kernel.execute(int globalSize) causing the current execution.
        See Also:
        getGroupId(), getGlobalId(), getNumGroups(), getLocalSize()
      • getGlobalSize

        protected final int getGlobalSize​(int _dim)
      • getNumGroups

        protected final int getNumGroups()
        Determine the number of groups that will be used to execute a kernel.

        When Kernel.execute(int globalSize) is invoked, the runtime will split the work into multiple 'groups'. getNumGroups() returns the total number of groups that will be used.

        Returns:
        The number of groups that kernels will be dispatched into.
        See Also:
        getGroupId(), getGlobalId(), getGlobalSize(), getNumGroups(), getLocalSize()
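        The id accessors above are related in the usual OpenCL way when every group has the same size: globalId = groupId * localSize + localId. A minimal sketch of that arithmetic, assuming a one-dimensional range whose global size is an exact multiple of a uniform local size (the class and method names are hypothetical, and the uneven-group case noted under getLocalSize() is not covered):

```java
// Arithmetic relating the ids of a 1-D range with a uniform local size.
class IdDecompositionSketch {
    static int groupIdOf(int globalId, int localSize) {
        return globalId / localSize;
    }

    static int localIdOf(int globalId, int localSize) {
        return globalId % localSize;
    }

    static int globalIdOf(int groupId, int localId, int localSize) {
        return groupId * localSize + localId;
    }

    static int numGroups(int globalSize, int localSize) {
        if (globalSize % localSize != 0) {
            throw new IllegalArgumentException("sketch assumes uniform groups");
        }
        return globalSize / localSize;
    }
}
```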
      • getNumGroups

        protected final int getNumGroups​(int _dim)
      • run

        public abstract void run()
        The entry point of a kernel.

        Every kernel must override this method.

      • hasFallbackAlgorithm

        public boolean hasFallbackAlgorithm()
        False by default. In the event that all preferred devices fail to execute a kernel, it is possible to supply an alternate (possibly non-parallel) execution algorithm by overriding this method to return true, and overriding executeFallbackAlgorithm(Range, int) with the alternate algorithm.
      • executeFallbackAlgorithm

        public void executeFallbackAlgorithm​(Range _range,
                                             int _passId)
        If hasFallbackAlgorithm() has been overridden to return true, this method should be overridden so as to apply a single pass of the kernel's logic to the entire _range.

        This is not normally required, as fallback to JavaDevice.THREAD_POOL will implement the algorithm in parallel. However, in the event that thread pool execution may be prohibitively slow, this method might implement a "quick and dirty" approximation to the desired result (for example, a simple box blur as opposed to a Gaussian blur in an image processing application).

      • cancelMultiPass

        public void cancelMultiPass()
        Invoking this method flags that, once the current pass is complete, execution should be abandoned. Due to the complexity of intercommunication between Java (or C) and executing OpenCL, this is the best we can do for general cancellation of execution at present. OpenCL 2.0 should introduce pipe mechanisms which will support mid-pass cancellation easily.

        Note that in the case of thread-pool/pure Java execution we could do better already, using Thread.interrupt() (and/or other means) to abandon execution mid-pass. However, at present this is not attempted.

        See Also:
        execute(int, int), execute(Range, int), execute(String, Range, int)
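        The pass-boundary behaviour described above can be sketched in plain Java. This is an emulation under stated assumptions, not Aparapi's implementation: the class, its cancel() method, and runPasses() are hypothetical names. The point it illustrates is that a cancellation request is only observed after the current pass has finished.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Emulates "cancel after the current pass"; Aparapi is not used.
class CancelBetweenPassesSketch {
    private final AtomicBoolean cancelRequested = new AtomicBoolean(false);

    /** Analogue of cancelMultiPass(): only records the request. */
    void cancel() {
        cancelRequested.set(true);
    }

    /** Runs up to 'passes' passes and returns how many completed. */
    int runPasses(int passes, Runnable onePass) {
        int completed = 0;
        for (int passId = 0; passId < passes; passId++) {
            onePass.run(); // a pass is never interrupted part-way through
            completed++;
            if (cancelRequested.get()) {
                break; // the flag is checked only at the pass boundary
            }
        }
        return completed;
    }
}
```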
      • getCancelState

        public int getCancelState()
      • clone

        public Kernel clone()
        When using a Java Thread Pool, Aparapi uses clone to copy the initial instance to each thread.

        If you choose to override clone() you are responsible for delegating to super.clone().

        Overrides:
        clone in class java.lang.Object
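        A sketch of the delegation pattern follows. CloneSketch is a hypothetical stand-in that implements Cloneable directly so the example runs without Aparapi; in a real Kernel subclass the try/catch should be unnecessary because Kernel.clone() is declared above without a checked exception. Note that super.clone() makes a shallow copy, so array fields remain shared between the copies.

```java
// Hypothetical stand-in for a Kernel subclass; only the clone() pattern matters.
class CloneSketch implements Cloneable {
    int[] shared = new int[4]; // shallow copy: every clone sees the same array
    int scratch;               // per-copy state, reset after cloning

    @Override
    public CloneSketch clone() {
        try {
            // Always delegate to super.clone() first, then adjust the copy.
            CloneSketch copy = (CloneSketch) super.clone();
            copy.scratch = 0;
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError("Cloneable is implemented", e);
        }
    }
}
```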
      • acos

        protected float acos​(float a)
        Delegates to either Math.acos(double) (Java) or acos(float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        a - value to delegate to Math.acos(double)/acos(float)
        Returns:
        Math.acos(double) cast to float/acos(float)
        See Also:
        Math.acos(double), acos(float)
      • acos

        protected double acos​(double a)
        Delegates to either Math.acos(double) (Java) or acos(double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        a - value to delegate to Math.acos(double)/acos(double)
        Returns:
        Math.acos(double)/acos(double)
        See Also:
        Math.acos(double), acos(double)
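        The Returns text for the float overload suggests that, in Java mode, the result is Math.acos(double) narrowed to float. The sketch below (plain Java, hypothetical names) shows that narrowing and measures how far it moves the value from the double result, which is one way to judge whether single precision is acceptable for a given input. It says nothing about the OpenCL side, whose accuracy depends on the device.

```java
// Shows the float narrowing described for the Java path; names are hypothetical.
class FloatDelegateSketch {
    /** Math.acos(double) cast to float, as the Returns text describes. */
    static float acosFloat(float a) {
        return (float) Math.acos(a);
    }

    /** Absolute difference between the double result and its float narrowing. */
    static double roundingError(float a) {
        return Math.abs(Math.acos(a) - acosFloat(a));
    }
}
```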
      • asin

        protected float asin​(float _f)
        Delegates to either Math.asin(double) (Java) or asin(float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f - value to delegate to Math.asin(double)/asin(float)
        Returns:
        Math.asin(double) cast to float/asin(float)
        See Also:
        Math.asin(double), asin(float)
      • asin

        protected double asin​(double _d)
        Delegates to either Math.asin(double) (Java) or asin(double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d - value to delegate to Math.asin(double)/asin(double)
        Returns:
        Math.asin(double)/asin(double)
        See Also:
        Math.asin(double), asin(double)
      • atan

        protected float atan​(float _f)
        Delegates to either Math.atan(double) (Java) or atan(float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f - value to delegate to Math.atan(double)/atan(float)
        Returns:
        Math.atan(double) cast to float/atan(float)
        See Also:
        Math.atan(double), atan(float)
      • atan

        protected double atan​(double _d)
        Delegates to either Math.atan(double) (Java) or atan(double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d - value to delegate to Math.atan(double)/atan(double)
        Returns:
        Math.atan(double)/atan(double)
        See Also:
        Math.atan(double), atan(double)
      • atan2

        protected float atan2​(float _f1,
                              float _f2)
        Delegates to either Math.atan2(double, double) (Java) or atan2(float, float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f1 - value to delegate to first argument of Math.atan2(double, double)/atan2(float, float)
        _f2 - value to delegate to second argument of Math.atan2(double, double)/atan2(float, float)
        Returns:
        Math.atan2(double, double) cast to float/atan2(float, float)
        See Also:
        Math.atan2(double, double), atan2(float, float)
      • atan2

        protected double atan2​(double _d1,
                               double _d2)
        Delegates to either Math.atan2(double, double) (Java) or atan2(double, double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d1 - value to delegate to first argument of Math.atan2(double, double)/atan2(double, double)
        _d2 - value to delegate to second argument of Math.atan2(double, double)/atan2(double, double)
        Returns:
        Math.atan2(double, double)/atan2(double, double)
        See Also:
        Math.atan2(double, double), atan2(double, double)
      • ceil

        protected float ceil​(float _f)
        Delegates to either Math.ceil(double) (Java) or ceil(float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f - value to delegate to Math.ceil(double)/ceil(float)
        Returns:
        Math.ceil(double) cast to float/ceil(float)
        See Also:
        Math.ceil(double), ceil(float)
      • ceil

        protected double ceil​(double _d)
        Delegates to either Math.ceil(double) (Java) or ceil(double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d - value to delegate to Math.ceil(double)/ceil(double)
        Returns:
        Math.ceil(double)/ceil(double)
        See Also:
        Math.ceil(double), ceil(double)
      • cos

        protected float cos​(float _f)
        Delegates to either Math.cos(double) (Java) or cos(float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f - value to delegate to Math.cos(double)/cos(float)
        Returns:
        Math.cos(double) cast to float/cos(float)
        See Also:
        Math.cos(double), cos(float)
      • cos

        protected double cos​(double _d)
        Delegates to either Math.cos(double) (Java) or cos(double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d - value to delegate to Math.cos(double)/cos(double)
        Returns:
        Math.cos(double)/cos(double)
        See Also:
        Math.cos(double), cos(double)
      • exp

        protected float exp​(float _f)
        Delegates to either Math.exp(double) (Java) or exp(float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f - value to delegate to Math.exp(double)/exp(float)
        Returns:
        Math.exp(double) cast to float/exp(float)
        See Also:
        Math.exp(double), exp(float)
      • exp

        protected double exp​(double _d)
        Delegates to either Math.exp(double) (Java) or exp(double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d - value to delegate to Math.exp(double)/exp(double)
        Returns:
        Math.exp(double)/exp(double)
        See Also:
        Math.exp(double), exp(double)
      • abs

        protected float abs​(float _f)
        Delegates to either Math.abs(float) (Java) or fabs(float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f - value to delegate to Math.abs(float)/fabs(float)
        Returns:
        Math.abs(float)/fabs(float)
        See Also:
        Math.abs(float), fabs(float)
      • popcount

        protected int popcount​(int _i)
        Delegates to either Integer.bitCount(int) (Java) or popcount(int) (OpenCL).
        Parameters:
        _i - value to delegate to Integer.bitCount(int)/popcount(int)
        Returns:
        Integer.bitCount(int)/popcount(int)
        See Also:
        Integer.bitCount(int), popcount(int)
      • popcount

        protected long popcount​(long _i)
        Delegates to either Long.bitCount(long) (Java) or popcount(long) (OpenCL).
        Parameters:
        _i - value to delegate to Long.bitCount(long)/popcount(long)
        Returns:
        Long.bitCount(long)/popcount(long)
        See Also:
        Long.bitCount(long), popcount(long)
      • clz

        protected int clz​(int _i)
        Delegates to either Integer.numberOfLeadingZeros(int) (Java) or clz(int) (OpenCL).
        Parameters:
        _i - value to delegate to Integer.numberOfLeadingZeros(int)/clz(int)
        Returns:
        Integer.numberOfLeadingZeros(int)/clz(int)
        See Also:
        Integer.numberOfLeadingZeros(int), clz(int)
      • clz

        protected long clz​(long _l)
        Delegates to either Long.numberOfLeadingZeros(long) (Java) or clz(long) (OpenCL).
        Parameters:
        _l - value to delegate to Long.numberOfLeadingZeros(long)/clz(long)
        Returns:
        Long.numberOfLeadingZeros(long)/clz(long)
        See Also:
        Long.numberOfLeadingZeros(long), clz(long)
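        For reference, these are the values the Java-side delegates named above produce for a few inputs. The example calls the JDK methods directly rather than the protected Kernel methods, so it runs without Aparapi; the class name is hypothetical.

```java
// Values produced by the JDK methods that popcount and clz delegate to in Java.
class BitDelegateSketch {
    public static void main(String[] args) {
        System.out.println(Integer.bitCount(0b1011));        // 3 bits set
        System.out.println(Long.bitCount(-1L));              // 64: all bits set
        System.out.println(Integer.numberOfLeadingZeros(1)); // 31
        System.out.println(Long.numberOfLeadingZeros(1L));   // 63
        System.out.println(Integer.numberOfLeadingZeros(0)); // 32: no bit is set
    }
}
```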
      • abs

        protected double abs​(double _d)
        Delegates to either Math.abs(double) (Java) or fabs(double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d - value to delegate to Math.abs(double)/fabs(double)
        Returns:
        Math.abs(double)/fabs(double)
        See Also:
        Math.abs(double), fabs(double)
      • abs

        protected int abs​(int n)
        Delegates to either Math.abs(int) (Java) or abs(int) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        n - value to delegate to Math.abs(int)/abs(int)
        Returns:
        Math.abs(int)/abs(int)
        See Also:
        Math.abs(int), abs(int)
      • abs

        protected long abs​(long n)
        Delegates to either Math.abs(long) (Java) or abs(long) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        n - value to delegate to Math.abs(long)/abs(long)
        Returns:
        Math.abs(long)/abs(long)
        See Also:
        Math.abs(long), abs(long)
      • floor

        protected float floor​(float _f)
        Delegates to either Math.floor(double) (Java) or floor(float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f - value to delegate to Math.floor(double)/floor(float)
        Returns:
        Math.floor(double) cast to float/floor(float)
        See Also:
        Math.floor(double), floor(float)
      • floor

        protected double floor​(double _d)
        Delegates to either Math.floor(double) (Java) or floor(double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d - value to delegate to Math.floor(double)/floor(double)
        Returns:
        Math.floor(double)/floor(double)
        See Also:
        Math.floor(double), floor(double)
      • max

        protected float max​(float _f1,
                            float _f2)
        Delegates to either Math.max(float, float) (Java) or fmax(float, float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f1 - value to delegate to first argument of Math.max(float, float)/fmax(float, float)
        _f2 - value to delegate to second argument of Math.max(float, float)/fmax(float, float)
        Returns:
        Math.max(float, float)/fmax(float, float)
        See Also:
        Math.max(float, float), fmax(float, float)
      • max

        protected double max​(double _d1,
                             double _d2)
        Delegates to either Math.max(double, double) (Java) or fmax(double, double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d1 - value to delegate to first argument of Math.max(double, double)/fmax(double, double)
        _d2 - value to delegate to second argument of Math.max(double, double)/fmax(double, double)
        Returns:
        Math.max(double, double)/fmax(double, double)
        See Also:
        Math.max(double, double), fmax(double, double)
      • max

        protected int max​(int n1,
                          int n2)
        Delegates to either Math.max(int, int) (Java) or max(int, int) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        n1 - value to delegate to first argument of Math.max(int, int)/max(int, int)
        n2 - value to delegate to second argument of Math.max(int, int)/max(int, int)
        Returns:
        Math.max(int, int)/max(int, int)
        See Also:
        Math.max(int, int), max(int, int)
      • max

        protected long max​(long n1,
                           long n2)
        Delegates to either Math.max(long, long) (Java) or max(long, long) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        n1 - value to delegate to first argument of Math.max(long, long)/max(long, long)
        n2 - value to delegate to second argument of Math.max(long, long)/max(long, long)
        Returns:
        Math.max(long, long)/max(long, long)
        See Also:
        Math.max(long, long), max(long, long)
      • min

        protected float min​(float _f1,
                            float _f2)
        Delegates to either Math.min(float, float) (Java) or fmin(float, float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f1 - value to delegate to first argument of Math.min(float, float)/fmin(float, float)
        _f2 - value to delegate to second argument of Math.min(float, float)/fmin(float, float)
        Returns:
        Math.min(float, float)/fmin(float, float)
        See Also:
        Math.min(float, float), fmin(float, float)
      • min

        protected double min​(double _d1,
                             double _d2)
        Delegates to either Math.min(double, double) (Java) or fmin(double, double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d1 - value to delegate to first argument of Math.min(double, double)/fmin(double, double)
        _d2 - value to delegate to second argument of Math.min(double, double)/fmin(double, double)
        Returns:
        Math.min(double, double)/fmin(double, double)
        See Also:
        Math.min(double, double), fmin(double, double)
      • min

        protected int min​(int n1,
                          int n2)
        Delegates to either Math.min(int, int) (Java) or min(int, int) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        n1 - value to delegate to first argument of Math.min(int, int)/min(int, int)
        n2 - value to delegate to second argument of Math.min(int, int)/min(int, int)
        Returns:
        Math.min(int, int)/min(int, int)
        See Also:
        Math.min(int, int), min(int, int)
      • min

        protected long min​(long n1,
                           long n2)
        Delegates to either Math.min(long, long) (Java) or min(long, long) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        n1 - value to delegate to first argument of Math.min(long, long)/min(long, long)
        n2 - value to delegate to second argument of Math.min(long, long)/min(long, long)
        Returns:
        Math.min(long, long)/min(long, long)
        See Also:
        Math.min(long, long), min(long, long)
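        For the float and double overloads of max and min, the two delegates differ in more than precision: Math.max and Math.min return NaN if either argument is NaN, whereas OpenCL's fmax and fmin return the other argument when exactly one is NaN. The sketch below demonstrates the Java behaviour and emulates the fmax rule in plain Java. The helper fmaxLike is hypothetical, and whether Aparapi compensates for this difference is not stated on this page.

```java
// Contrasts Java's NaN handling with the fmax rule; fmaxLike is a hypothetical helper.
class NanMinMaxSketch {
    /** Emulates fmax: a single NaN argument is ignored in favour of the other one. */
    static float fmaxLike(float a, float b) {
        if (Float.isNaN(a)) return b;
        if (Float.isNaN(b)) return a;
        return Math.max(a, b);
    }

    public static void main(String[] args) {
        System.out.println(Math.max(Float.NaN, 1f)); // NaN
        System.out.println(fmaxLike(Float.NaN, 1f)); // 1.0
    }
}
```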
      • log

        protected float log​(float _f)
        Delegates to either Math.log(double) (Java) or log(float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f - value to delegate to Math.log(double)/log(float)
        Returns:
        Math.log(double) cast to float/log(float)
        See Also:
        Math.log(double), log(float)
      • log

        protected double log​(double _d)
        Delegates to either Math.log(double) (Java) or log(double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d - value to delegate to Math.log(double)/log(double)
        Returns:
        Math.log(double)/log(double)
        See Also:
        Math.log(double), log(double)
      • pow

        protected float pow​(float _f1,
                            float _f2)
        Delegates to either Math.pow(double, double) (Java) or pow(float, float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f1 - value to delegate to first argument of Math.pow(double, double)/pow(float, float)
        _f2 - value to delegate to second argument of Math.pow(double, double)/pow(float, float)
        Returns:
        Math.pow(double, double) cast to float/pow(float, float)
        See Also:
        Math.pow(double, double), pow(float, float)
      • pow

        protected double pow​(double _d1,
                             double _d2)
        Delegates to either Math.pow(double, double) (Java) or pow(double, double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d1 - value to delegate to first argument of Math.pow(double, double)/pow(double, double)
        _d2 - value to delegate to second argument of Math.pow(double, double)/pow(double, double)
        Returns:
        Math.pow(double, double)/pow(double, double)
        See Also:
        Math.pow(double, double), pow(double, double)
      • IEEEremainder

        protected float IEEEremainder​(float _f1,
                                      float _f2)
        Delegates to either Math.IEEEremainder(double, double) (Java) or remainder(float, float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f1 - value to delegate to first argument of Math.IEEEremainder(double, double)/remainder(float, float)
        _f2 - value to delegate to second argument of Math.IEEEremainder(double, double)/remainder(float, float)
        Returns:
        Math.IEEEremainder(double, double) cast to float/remainder(float, float)
        See Also:
        Math.IEEEremainder(double, double), remainder(float, float)
      • IEEEremainder

        protected double IEEEremainder​(double _d1,
                                       double _d2)
        Delegates to either Math.IEEEremainder(double, double) (Java) or remainder(double, double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d1 - value to delegate to first argument of Math.IEEEremainder(double, double)/remainder(double, double)
        _d2 - value to delegate to second argument of Math.IEEEremainder(double, double)/remainder(double, double)
        Returns:
        Math.IEEEremainder(double, double)/remainder(double, double)
        See Also:
        Math.IEEEremainder(double, double), remainder(double, double)
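        IEEEremainder is easy to confuse with the % operator. Math.IEEEremainder rounds the quotient to the nearest integer (ties go to the even integer), so the result can be negative even when both operands are positive. A short plain-Java illustration using the JDK method directly (the class name is hypothetical):

```java
// Compares the % operator with Math.IEEEremainder for positive operands.
class RemainderSketch {
    public static void main(String[] args) {
        System.out.println(5.0 % 3.0);                    // 2.0  (quotient truncated to 1)
        System.out.println(Math.IEEEremainder(5.0, 3.0)); // -1.0 (quotient rounded to 2)
        System.out.println(Math.IEEEremainder(7.0, 2.0)); // -1.0 (3.5 ties to even, 4)
    }
}
```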
      • toRadians

        protected float toRadians​(float _f)
        Delegates to either Math.toRadians(double) (Java) or radians(float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f - value to delegate to Math.toRadians(double)/radians(float)
        Returns:
        Math.toRadians(double) cast to float/radians(float)
        See Also:
        Math.toRadians(double), radians(float)
      • toRadians

        protected double toRadians​(double _d)
        Delegates to either Math.toRadians(double) (Java) or radians(double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d - value to delegate to Math.toRadians(double)/radians(double)
        Returns:
        Math.toRadians(double)/radians(double)
        See Also:
        Math.toRadians(double), radians(double)
      • toDegrees

        protected float toDegrees​(float _f)
        Delegates to either Math.toDegrees(double) (Java) or degrees(float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f - value to delegate to Math.toDegrees(double)/degrees(float)
        Returns:
        Math.toDegrees(double) cast to float/degrees(float)
        See Also:
        Math.toDegrees(double), degrees(float)
      • toDegrees

        protected double toDegrees​(double _d)
        Delegates to either Math.toDegrees(double) (Java) or degrees(double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d - value to delegate to Math.toDegrees(double)/degrees(double)
        Returns:
        Math.toDegrees(double)/degrees(double)
        See Also:
        Math.toDegrees(double), degrees(double)
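        The float overloads compute in double on the Java side and then narrow the result to float. A minimal JDK-only sketch of that narrowing follows; it does not call Aparapi and its class and method names are hypothetical. It shows that the float result carries fewer significant digits than the double result, which is separate from any Java/OpenCL difference.

```java
final class AngleDemo {
    // Mirrors the documented Java path of the float overload: compute in double, cast to float.
    static float toRadiansFloat(float f) {
        return (float) Math.toRadians(f);
    }

    // Mirrors the documented Java path of the double overload.
    static double toRadiansDouble(double d) {
        return Math.toRadians(d);
    }

    public static void main(String[] args) {
        float rf = toRadiansFloat(180f);
        double rd = toRadiansDouble(180.0);
        System.out.println("float  result: " + rf);
        System.out.println("double result: " + rd);
        System.out.println("narrowing error: " + Math.abs(rd - rf));
    }
}
```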
      • rint

        protected float rint​(float _f)
        Delegates to either Math.rint(double) (Java) or rint(float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f - value to delegate to Math.rint(double)/rint(float)
        Returns:
        Math.rint(double) cast to float/rint(float)
        See Also:
        Math.rint(double), rint(float)
      • rint

        protected double rint​(double _d)
        Delegates to either Math.rint(double) (Java) or rint(double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d - value to delegate to Math.rint(double)/rint(double)
        Returns:
        Math.rint(double)/rint(double)
        See Also:
        Math.rint(double), rint(double)
      • round

        protected int round​(float _f)
        Delegates to either Math.round(float) (Java) or round(float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f - value to delegate to Math.round(float)/round(float)
        Returns:
        Math.round(float)/round(float)
        See Also:
        Math.round(float), round(float)
      • round

        protected long round​(double _d)
        Delegates to either Math.round(double) (Java) or round(double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d - value to delegate to Math.round(double)/round(double)
        Returns:
        Math.round(double)/round(double)
        See Also:
        Math.round(double), round(double)
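        Halfway cases are where rint and round are most likely to surprise. The JDK-only sketch below (hypothetical names, no Aparapi calls) prints the Java results for a few ties. It also includes a helper that emulates round-half-away-from-zero, which is how the OpenCL specification describes round(); whether a given device and Aparapi translation reproduce exactly that result is an assumption to verify on the target platform.

```java
final class RoundingDemo {
    // Hypothetical helper: emulates rounding halfway cases away from zero,
    // as the OpenCL specification describes round(). Sketch only: it is not
    // exact for values just below 0.5.
    static long roundHalfAwayFromZero(double d) {
        return (long) (Math.signum(d) * Math.floor(Math.abs(d) + 0.5));
    }

    public static void main(String[] args) {
        System.out.println(Math.rint(2.5));               // prints 2.0 (ties go to the even neighbour)
        System.out.println(Math.rint(3.5));               // prints 4.0
        System.out.println(Math.round(2.5));              // prints 3
        System.out.println(Math.round(-2.5));             // prints -2 (Java rounds ties toward positive infinity)
        System.out.println(roundHalfAwayFromZero(-2.5));  // prints -3
    }
}
```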
      • sin

        protected float sin​(float _f)
        Delegates to either Math.sin(double) (Java) or sin(float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f - value to delegate to Math.sin(double)/sin(float)
        Returns:
        Math.sin(double) cast to float/sin(float)
        See Also:
        Math.sin(double), sin(float)
      • sin

        protected double sin​(double _d)
        Delegates to either Math.sin(double) (Java) or sin(double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d - value to delegate to Math.sin(double)/sin(double)
        Returns:
        Math.sin(double)/sin(double)
        See Also:
        Math.sin(double), sin(double)
      • sqrt

        protected float sqrt​(float _f)
        Delegates to either Math.sqrt(double) (Java) or sqrt(float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f - value to delegate to Math.sqrt(double)/sqrt(float)
        Returns:
        Math.sqrt(double) cast to float/sqrt(float)
        See Also:
        Math.sqrt(double), sqrt(float)
      • sqrt

        protected double sqrt​(double _d)
        Delegates to either Math.sqrt(double) (Java) or sqrt(double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d - value to delegate to Math.sqrt(double)/sqrt(double)
        Returns:
        Math.sqrt(double)/sqrt(double)
        See Also:
        Math.sqrt(double), sqrt(double)
      • tan

        protected float tan​(float _f)
        Delegates to either Math.tan(double) (Java) or tan(float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f - value to delegate to Math.tan(double)/tan(float)
        Returns:
        Math.tan(double) cast to float/tan(float)
        See Also:
        Math.tan(double), tan(float)
      • tan

        protected double tan​(double _d)
        Delegates to either Math.tan(double) (Java) or tan(double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d - value to delegate to Math.tan(double)/tan(double)
        Returns:
        Math.tan(double)/tan(double)
        See Also:
        Math.tan(double), tan(double)
      • acospi

        protected final double acospi​(double a)
      • acospi

        protected final float acospi​(float a)
      • asinpi

        protected final double asinpi​(double a)
      • asinpi

        protected final float asinpi​(float a)
      • atanpi

        protected final double atanpi​(double a)
      • atanpi

        protected final float atanpi​(float a)
      • atan2pi

        protected final double atan2pi​(double y,
                                       double x)
      • atan2pi

        protected final float atan2pi​(float y,
                                      double x)
      • cbrt

        protected final double cbrt​(double a)
      • cbrt

        protected final float cbrt​(float a)
      • cosh

        protected final double cosh​(double x)
      • cosh

        protected final float cosh​(float x)
      • cospi

        protected final double cospi​(double a)
      • cospi

        protected final float cospi​(float a)
      • exp2

        protected final double exp2​(double a)
      • exp2

        protected final float exp2​(float a)
      • exp10

        protected final double exp10​(double a)
      • exp10

        protected final float exp10​(float a)
      • expm1

        protected final double expm1​(double x)
      • expm1

        protected final float expm1​(float x)
      • log2

        protected final double log2​(double a)
      • log2

        protected final float log2​(float a)
      • log10

        protected final double log10​(double a)
      • log10

        protected final float log10​(float a)
      • log1p

        protected final double log1p​(double x)
      • log1p

        protected final float log1p​(float x)
      • mad

        protected final double mad​(double a,
                                   double b,
                                   double c)
      • mad

        protected final float mad​(float a,
                                  float b,
                                  float c)
      • nextAfter

        protected final double nextAfter​(double start,
                                         double direction)
      • nextAfter

        protected final float nextAfter​(float start,
                                        float direction)
      • sinh

        protected final double sinh​(double x)
        Delegates to either Math.sinh(double) (Java) or sinh(double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        x - value to delegate to Math.sinh(double)/sinh(double)
        Returns:
        Math.sinh(double)/sinh(double)
        See Also:
        Math.sinh(double), sinh(double)
      • sinh

        protected final float sinh​(float x)
        Delegates to either Math.sinh(double) (Java) or sinh(float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        x - value to delegate to Math.sinh(double)/sinh(float)
        Returns:
        Math.sinh(double)/sinh(float)
        See Also:
        Math.sinh(double), sinh(float)
      • sinpi

        protected final double sinpi​(double a)
        Backed by either Math.sin(double) (Java) or sinpi(double) (OpenCL). This method is equivalent to Math.sin(a * Math.PI). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        a - value to delegate to sinpi(double) or the Java equivalent
        Returns:
        sinpi(double) or the Java equivalent
        See Also:
        Math.sin(double), sinpi(double)
      • sinpi

        protected final float sinpi​(float a)
        Backed by either Math.sin(double) (Java) or sinpi(float) (OpenCL). This method is equivalent to Math.sin(a * Math.PI). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        a - value to delegate to sinpi(float) or the Java equivalent
        Returns:
        sinpi(float) or the Java equivalent
        See Also:
        Math.sin(double), sinpi(float)
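        The documented Java equivalent, Math.sin(a * Math.PI), can be tried directly. The sketch below uses only the JDK; sinpiJava is a hypothetical name, not the Aparapi method. It shows one practical consequence of the precision note: because Math.PI is only an approximation of π, the Java expression returns a tiny non-zero value at a = 1, where the mathematical result is exactly 0. An OpenCL sinpi implementation may well return a different value there.

```java
final class SinPiDemo {
    // Hypothetical stand-in for the documented Java path: sin(a * PI).
    static double sinpiJava(double a) {
        return Math.sin(a * Math.PI);
    }

    public static void main(String[] args) {
        System.out.println(sinpiJava(0.5)); // close to 1.0
        System.out.println(sinpiJava(1.0)); // tiny, but not exactly 0.0
    }
}
```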
      • tanh

        protected final double tanh​(double x)
        Delegates to either Math.tanh(double) (Java) or tanh(double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        x - value to delegate to Math.tanh(double)/tanh(double)
        Returns:
        Math.tanh(double)/tanh(double)
        See Also:
        Math.tanh(double), tanh(double)
      • tanh

        protected final float tanh​(float x)
        Delegates to either Math.tanh(double) (Java) or tanh(float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        x - value to delegate to Math.tanh(double)/tanh(float)
        Returns:
        Math.tanh(double)/tanh(float)
        See Also:
        Math.tanh(double), tanh(float)
      • tanpi

        protected final double tanpi​(double a)
        Backed by either Math.tan(double) (Java) or tanpi(double) (OpenCL). This method is equivalent to Math.tan(a * Math.PI). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        a - value to delegate to tanpi(double) or the Java equivalent
        Returns:
        tanpi(double) or the Java equivalent
        See Also:
        Math.tan(double), tanpi(double)
      • tanpi

        protected final float tanpi​(float a)
        Backed by either Math.tan(double) (Java) or tanpi(float) (OpenCL). This method is equivalent to Math.tan(a * Math.PI). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        a - value to delegate to tanpi(float) or the Java equivalent
        Returns:
        tanpi(float) or the Java equivalent
        See Also:
        Math.tan(double), tanpi(float)
      • rsqrt

        protected float rsqrt​(float _f)
        Computes inverse square root using Math.sqrt(double) (Java) or delegates to rsqrt(float) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _f - value to delegate to Math.sqrt(double)/rsqrt(float)
        Returns:
        ( 1.0f / Math.sqrt(double) cast to float )/rsqrt(float)
        See Also:
        Math.sqrt(double), rsqrt(float)
      • rsqrt

        protected double rsqrt​(double _d)
        Computes inverse square root using Math.sqrt(double) (Java) or delegates to rsqrt(double) (OpenCL). User should note the differences in precision between Java and OpenCL's implementation of arithmetic functions to determine whether the difference in precision is acceptable.
        Parameters:
        _d - value to delegate to Math.sqrt(double)/rsqrt(double)
        Returns:
        ( 1.0 / Math.sqrt(double) )/rsqrt(double)
        See Also:
        Math.sqrt(double), rsqrt(double)
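        The Java-side computation described for rsqrt is simply the reciprocal of Math.sqrt. A JDK-only sketch follows (hypothetical class name, no Aparapi calls), written from the Returns descriptions above; the exact placement of the float cast is an assumption.

```java
final class RsqrtDemo {
    // Float variant: take the square root in double, narrow to float, then invert.
    static float rsqrt(float f) {
        return 1.0f / (float) Math.sqrt(f);
    }

    // Double variant: reciprocal of the double square root.
    static double rsqrt(double d) {
        return 1.0 / Math.sqrt(d);
    }

    public static void main(String[] args) {
        System.out.println(rsqrt(4.0f)); // prints 0.5
        System.out.println(rsqrt(0.25)); // prints 2.0
    }
}
```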
      • native_sqrt

        private float native_sqrt​(float _f)
      • native_rsqrt

        private float native_rsqrt​(float _f)
      • atomicAdd

        protected int atomicAdd​(int[] _arr,
                                int _index,
                                int _delta)
        Atomically adds _delta value to _index element of array _arr (Java) or delegates to atomic_add(volatile int*, int) (OpenCL).
        Parameters:
        _arr - array for which an element value needs to be atomically incremented by _delta
        _index - index of the _arr array that needs to be atomically incremented by _delta
        _delta - value by which _index element of _arr array needs to be atomically incremented
        Returns:
        previous value of _index element of _arr array
        See Also:
        atomic_add(volatile int*, int)
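        The contract described here (add a delta atomically and return the previous value) matches the JDK's getAndAdd. The sketch below is a plain-Java illustration, not Aparapi's implementation: it uses AtomicIntegerArray as a stand-in for the int[] argument, and all names are hypothetical. It shows why the atomic form matters when many work items update the same element.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.stream.IntStream;

final class AtomicAddDemo {
    // Stand-in for atomicAdd(int[], int, int): adds delta and returns the previous value.
    static int atomicAdd(AtomicIntegerArray arr, int index, int delta) {
        return arr.getAndAdd(index, delta);
    }

    // Every simulated work item increments the same counter; no update is lost.
    static int countInParallel(int workItems) {
        AtomicIntegerArray counters = new AtomicIntegerArray(1);
        IntStream.range(0, workItems).parallel().forEach(gid -> atomicAdd(counters, 0, 1));
        return counters.get(0);
    }

    public static void main(String[] args) {
        System.out.println(countInParallel(1000)); // prints 1000
    }
}
```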
      • atomicGet

        protected final int atomicGet​(java.util.concurrent.atomic.AtomicInteger p)
      • atomicSet

        protected final void atomicSet​(java.util.concurrent.atomic.AtomicInteger p,
                                       int val)
      • atomicAdd

        protected final int atomicAdd​(java.util.concurrent.atomic.AtomicInteger p,
                                      int val)
      • atomicSub

        protected final int atomicSub​(java.util.concurrent.atomic.AtomicInteger p,
                                      int val)
      • atomicXchg

        protected final int atomicXchg​(java.util.concurrent.atomic.AtomicInteger p,
                                       int newVal)
      • atomicInc

        protected final int atomicInc​(java.util.concurrent.atomic.AtomicInteger p)
      • atomicDec

        protected final int atomicDec​(java.util.concurrent.atomic.AtomicInteger p)
      • atomicCmpXchg

        protected final int atomicCmpXchg​(java.util.concurrent.atomic.AtomicInteger p,
                                          int expectedVal,
                                          int newVal)
      • atomicMin

        protected final int atomicMin​(java.util.concurrent.atomic.AtomicInteger p,
                                      int val)
      • atomicMax

        protected final int atomicMax​(java.util.concurrent.atomic.AtomicInteger p,
                                      int val)
      • atomicAnd

        protected final int atomicAnd​(java.util.concurrent.atomic.AtomicInteger p,
                                      int val)
      • atomicOr

        protected final int atomicOr​(java.util.concurrent.atomic.AtomicInteger p,
                                     int val)
      • atomicXor

        protected final int atomicXor​(java.util.concurrent.atomic.AtomicInteger p,
                                      int val)
      • localBarrier

        protected final void localBarrier()
        Wait for all kernels in the current work group to rendezvous at this call before continuing execution.
        It also enforces memory ordering: modifications made to memory by each thread in the work group before entering this barrier call are visible to all threads leaving the barrier.

        Note1: In OpenCL this executes as barrier(CLK_LOCAL_MEM_FENCE), which behaves differently than in Java, because it only guarantees that modifications made to the local memory space are visible to all threads leaving the barrier.

        Note2: In OpenCL, all threads must enter the same if blocks and must iterate the same number of times in all loops (for, while, ...).

        Note3: In Java, localBarrier(), globalBarrier() and localGlobalBarrier() behave identically.
      • globalBarrier

        protected final void globalBarrier()
        Wait for all kernels in the current work group to rendezvous at this call before continuing execution.
        It also enforces memory ordering: modifications made to memory by each thread in the work group before entering this barrier call are visible to all threads leaving the barrier.

        Note1: In OpenCL this executes as barrier(CLK_GLOBAL_MEM_FENCE), which behaves differently than in Java, because it only guarantees that modifications made to the global memory space are visible to all threads in the work group leaving the barrier.

        Note2: In OpenCL, all threads must enter the same if blocks and must iterate the same number of times in all loops (for, while, ...).

        Note3: In Java, localBarrier(), globalBarrier() and localGlobalBarrier() behave identically.
      • localGlobalBarrier

        protected final void localGlobalBarrier()
        Wait for all kernels in the current work group to rendezvous at this call before continuing execution.
        It also enforces memory ordering: modifications made to memory by each thread in the work group before entering this barrier call are visible to all threads leaving the barrier.

        Note1: When in doubt, use this barrier instead of localBarrier() or globalBarrier(), despite the possible performance loss.

        Note2: In OpenCL this executes as barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE), which behaves the same as in Java, because it guarantees that modifications made to any of the memory spaces are visible to all threads in the work group leaving the barrier.

        Note3: In OpenCL, all threads must enter the same if blocks and must iterate the same number of times in all loops (for, while, ...).

        Note4: In Java, localBarrier(), globalBarrier() and localGlobalBarrier() behave identically.
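        The rendezvous-plus-visibility behaviour described for the barrier methods can be illustrated in plain Java with java.util.concurrent.CyclicBarrier. This is a sketch of the concept only, not Aparapi's implementation, and every name in it is hypothetical. Each simulated work item writes its own slot of a shared array, waits at the barrier, and then reads a neighbour's slot. Without the barrier the read could observe a slot that has not yet been written.

```java
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

final class BarrierDemo {
    // One thread per simulated work item; 'shared' plays the role of work-group memory.
    static int[] doubleThenRotate(int[] input) {
        final int n = input.length;
        final int[] shared = new int[n];
        final int[] out = new int[n];
        final CyclicBarrier barrier = new CyclicBarrier(n);
        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int lid = i; // simulated local id
            workers[i] = new Thread(() -> {
                shared[lid] = input[lid] * 2;          // phase 1: write own slot
                try {
                    barrier.await();                   // all writes complete before any read
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
                out[lid] = shared[(lid + 1) % n];      // phase 2: read a neighbour's slot
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(doubleThenRotate(new int[] {1, 2, 3, 4}))); // prints [4, 6, 8, 2]
    }
}
```

        Note that every thread reaches the barrier exactly once, which mirrors the requirement above that all threads follow the same control flow around a barrier call.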
      • hypot

        protected float hypot​(float a,
                              float b)
      • hypot

        protected double hypot​(double a,
                               double b)
      • prepareKernelRunner

        private KernelRunner prepareKernelRunner()
      • registerProfileReportObserver

        public void registerProfileReportObserver​(IProfileReportObserver observer)
        Registers a new profile report observer to receive profile reports as they are produced. This is the recommended method when the client application wants a single observer to receive all execution profiles for the current kernel instance, on all devices and over all client threads running the kernel.
        Note1: A report is generated by the thread that finishes executing a kernel. In multithreaded execution environments it is up to the observer implementation to handle thread safety.
        Note2: To cancel the report subscription, set the observer to null.
        Parameters:
        observer - the observer instance that will receive the profile reports
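        Because reports may arrive from several threads, an observer generally needs thread-safe storage. The sketch below shows one way to do that with a concurrent queue. It is self-contained and does not use Aparapi types: ReportObserver is a hypothetical stand-in for IProfileReportObserver, whose real signature is not shown in this section, and the reports are plain strings.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class ObserverDemo {
    // Hypothetical stand-in for the real observer interface.
    interface ReportObserver {
        void receiveReport(String report);
    }

    // Thread-safe observer: a concurrent queue tolerates concurrent callers.
    static final class CollectingObserver implements ReportObserver {
        private final Queue<String> reports = new ConcurrentLinkedQueue<>();

        @Override
        public void receiveReport(String report) {
            reports.add(report);
        }

        int count() {
            return reports.size();
        }
    }

    // Simulates several kernel-running threads delivering reports to one observer.
    static int deliverConcurrently(int threads, int reportsPerThread) {
        CollectingObserver observer = new CollectingObserver();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < reportsPerThread; i++) {
                    observer.receiveReport("thread " + id + " report " + i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return observer.count();
    }

    public static void main(String[] args) {
        System.out.println(deliverConcurrently(4, 250)); // prints 1000
    }
}
```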
      • getProfileReportCurrentThread

        public java.lang.ref.WeakReference<ProfileReport> getProfileReportCurrentThread​(Device device)
        Retrieves the most recent complete report available to the calling thread for the current kernel instance as executed on the given device.
        Note1: If the profile report is intended to be kept in memory, the object should be cloned with ProfileReport.clone()
        Note2: If the thread didn't execute this kernel on the specified device, it will return null.
        Parameters:
        device - the relevant device where the kernel executed
        Returns:
        • the profiling report for the most recent execution
        • null, if no profiling report is available for such thread
        See Also:
        getProfileReportLastThread(Device), registerProfileReportObserver(IProfileReportObserver), #getExecutionTimeCurrentThread(Device), #getConversionTimeCurrentThread(Device), getAccumulatedExecutionTimeAllThreads(Device)
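        Since the report is handed out through a WeakReference, a caller who wants to keep it should check for null and copy it, as Note1 and Note2 suggest. The following JDK-only sketch shows that pattern. Report is a hypothetical stand-in for ProfileReport, and snapshot is an illustrative helper, not part of Aparapi.

```java
import java.lang.ref.WeakReference;

final class WeakReportDemo {
    // Hypothetical stand-in for ProfileReport; only cloning matters here.
    static final class Report implements Cloneable {
        final long elapsedNanos;

        Report(long elapsedNanos) {
            this.elapsedNanos = elapsedNanos;
        }

        @Override
        public Report clone() {
            try {
                return (Report) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    // Returns a private copy of the report, or null if none is available.
    static Report snapshot(WeakReference<Report> ref) {
        if (ref == null) {
            return null;                 // no report for this thread and device
        }
        Report live = ref.get();         // strong reference for the duration of the copy
        return live == null ? null : live.clone();
    }

    public static void main(String[] args) {
        Report original = new Report(42L);
        Report copy = snapshot(new WeakReference<>(original));
        System.out.println(copy != original);   // prints true
        System.out.println(copy.elapsedNanos);  // prints 42
    }
}
```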
      • getAccumulatedExecutionTimeCurrentThread

        public double getAccumulatedExecutionTimeCurrentThread​(Device device)
        Determines the total execution time of all previous executions of this kernel on the specified device that were started from the calling thread.
        Note1: This is the recommended method to retrieve the accumulated execution time for a single thread, even when multiple threads execute the same kernel on the same device.
        Note that this will include the initial conversion time.
        Parameters:
        device - the device of interest where the kernel executed
        Returns:
        • The total time spent executing the kernel (ms)
        • NaN, if no profiling information is available
        See Also:
        getProfileReportCurrentThread(Device), getProfileReportLastThread(Device), registerProfileReportObserver(IProfileReportObserver), getAccumulatedExecutionTimeAllThreads(Device)
      • getAccumulatedExecutionTime

        public double getAccumulatedExecutionTime()
        Determine the total execution time of all previous Kernel.execute(range) calls for all threads that ran this kernel for the device used in the last kernel execution.
        Note1: This is kept for backwards compatibility only; use getAccumulatedExecutionTimeAllThreads(Device) instead.
        Note2: Calling this method is not recommended when using more than a single thread to execute the same kernel on multiple devices concurrently.

        Note that this will include the initial conversion time.
        Returns:
        • The total time spent executing the kernel (ms)
        • NaN, if no profiling information is available
        See Also:
        #getProfileReport(Device), registerProfileReportObserver(IProfileReportObserver)
      • execute

        public Kernel execute​(Range _range)
        Start execution of _range kernels.

        When kernel.execute(globalSize) is invoked, Aparapi will schedule the execution of globalSize kernels. If the execution mode is GPU then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.

        Parameters:
        _range - The number of Kernels that we would like to initiate.
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • execute

        public Kernel execute​(int _range)
        Start execution of _range kernels.

        When kernel.execute(_range) is invoked, Aparapi will schedule the execution of _range kernels. If the execution mode is GPU then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.

        Since adding the new Range class this method offers backward compatibility and merely defers to return (execute(Range.create(_range), 1));.

        Parameters:
        _range - The number of Kernels that we would like to initiate.
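        To make the "one invocation per value in the range" model concrete, the sketch below emulates it with a JDK parallel stream. It is a conceptual stand-in only: it does not use Aparapi, it does not reproduce how the Java thread pool mode is actually implemented, and the names are hypothetical. The lambda body plays the role of run(), and gid plays the role of the global id.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class ExecuteDemo {
    // Emulates execute(values.length): the body runs once for each gid in 0..length-1.
    static int[] squares(int[] values) {
        int[] squares = new int[values.length];
        IntStream.range(0, values.length)
                 .parallel()
                 .forEach(gid -> squares[gid] = values[gid] * values[gid]);
        return squares;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(squares(new int[] {1, 2, 3, 4}))); // prints [1, 4, 9, 16]
    }
}
```

        Each invocation writes only its own element, so no synchronisation between invocations is needed.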
      • createRange

        protected Range createRange​(int _range)
      • execute

        public Kernel execute​(Range _range,
                              int _passes)
        Start execution of _passes iterations of _range kernels.

        When kernel.execute(_range, _passes) is invoked, Aparapi will schedule the execution of _range kernels. If the execution mode is GPU then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.

        Parameters:
        _passes - The number of passes to make
        Returns:
        The Kernel instance (this) so we can chain calls to put(arr).execute(range).get(arr)
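        A multi-pass execution can be pictured as an outer loop over passes around the per-range execution. The JDK-only sketch below assumes that passes run one after another and that each pass covers the whole range; this reading of "_passes iterations of _range kernels" is an assumption, and the names are hypothetical.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class PassesDemo {
    // Emulates execute(range, passes): each pass runs the body once per gid.
    static int[] run(int range, int passes) {
        int[] data = new int[range];
        for (int passId = 0; passId < passes; passId++) {
            final int pass = passId;
            // A pass finishes before the next one starts, so later passes see earlier results.
            IntStream.range(0, range).parallel().forEach(gid -> data[gid] += gid + pass);
        }
        return data;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(3, 2))); // prints [1, 3, 5]
    }
}
```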
      • execute

        public Kernel execute​(int _range,
                              int _passes)
        Start execution of _passes iterations over the _range of kernels.

        When kernel.execute(_range, _passes) is invoked, Aparapi will schedule the execution of _range kernels. If the execution mode is GPU then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.

        Since adding the new Range class this method offers backward compatibility and merely defers to return (execute(Range.create(_range), _passes));.

        Parameters:
        _range - The number of Kernels that we would like to initiate.
      • execute

        public Kernel execute​(java.lang.String _entrypoint,
                              Range _range)
        Start execution of globalSize kernels for the given entrypoint.

        When kernel.execute("entrypoint", globalSize) is invoked, Aparapi will schedule the execution of globalSize kernels. If the execution mode is GPU then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.

        Parameters:
        _entrypoint - is the name of the method we wish to use as the entrypoint to the kernel
        Returns:
        The Kernel instance (this) so we can chain calls to put(arr).execute(range).get(arr)
      • execute

        public Kernel execute​(java.lang.String _entrypoint,
                              Range _range,
                              int _passes)
        Start execution of globalSize kernels for the given entrypoint.

        When kernel.execute("entrypoint", globalSize) is invoked, Aparapi will schedule the execution of globalSize kernels. If the execution mode is GPU then the kernels will execute as OpenCL code on the GPU device. Otherwise, if the mode is JTP, the kernels will execute as a pool of Java threads on the CPU.

        Parameters:
        _entrypoint - is the name of the method we wish to use as the entrypoint to the kernel
        Returns:
        The Kernel instance (this) so we can chain calls to put(arr).execute(range).get(arr)
      • compile

        public Kernel compile​(Device _device)
                       throws CompileFailedException
        Force pre-compilation of the kernel for a given device, without executing it.
        Parameters:
        _device - the device for which the kernel is to be compiled
        Returns:
        the Kernel instance (this) so we can chain calls
        Throws:
        CompileFailedException - if compilation failed for some reason
      • compile

        public Kernel compile​(java.lang.String _entrypoint,
                              Device _device)
                       throws CompileFailedException
        Force pre-compilation of the kernel for a given device, without executing it.
        Parameters:
        _entrypoint - is the name of the method we wish to use as the entrypoint to the kernel
        _device - the device for which the kernel is to be compiled
        Returns:
        the Kernel instance (this) so we can chain calls
        Throws:
        CompileFailedException - if compilation failed for some reason
      • getKernelMinimumPrivateMemSizeInUsePerWorkItem

        public long getKernelMinimumPrivateMemSizeInUsePerWorkItem​(Device device)
                                                            throws QueryFailedException
        Retrieves the minimum private memory in use per work item for this kernel instance and the specified device.
        Parameters:
        device - the device where the kernel is intended to run
        Returns:
        the number of bytes used per work item
        Throws:
        QueryFailedException - if the query couldn't complete
      • getKernelLocalMemSizeInUse

        public long getKernelLocalMemSizeInUse​(Device device)
                                        throws QueryFailedException
        Retrieves the amount of local memory used in the specified device by this kernel instance.
        Parameters:
        device - the device where the kernel is intended to run
        Returns:
        the number of bytes of local memory in use for the specified device and current kernel
        Throws:
        QueryFailedException - if the query couldn't complete
      • getKernelPreferredWorkGroupSizeMultiple

        public int getKernelPreferredWorkGroupSizeMultiple​(Device device)
                                                    throws QueryFailedException
        Retrieves the preferred work-group multiple in the specified device for this kernel instance.
        Parameters:
        device - the device where the kernel is intended to run
        Returns:
        the preferred work group multiple
        Throws:
        QueryFailedException - if the query couldn't complete
      • getKernelMaxWorkGroupSize

        public int getKernelMaxWorkGroupSize​(Device device)
                                      throws QueryFailedException
        Retrieves the maximum work-group size allowed for this kernel when running on the specified device.
        Parameters:
        device - the device where the kernel is intended to run
        Returns:
        the maximum work-group size
        Throws:
        QueryFailedException - if the query couldn't complete
      • getKernelCompileWorkGroupSize

        public int[] getKernelCompileWorkGroupSize​(Device device)
                                            throws QueryFailedException
        Retrieves the specified work-group size in the compiled kernel for the specified device or intermediate language for the device.
        Parameters:
        device - the device where the kernel is intended to run
        Returns:
        the work-group size specified in the compiled kernel, as an array
        Throws:
        QueryFailedException - if the query couldn't complete
      • isAutoCleanUpArrays

        public boolean isAutoCleanUpArrays()
      • setAutoCleanUpArrays

        public void setAutoCleanUpArrays​(boolean autoCleanUpArrays)
        Property which, if true, enables an automatic call to cleanUpArrays() after each execution.
      • cleanUpArrays

        public void cleanUpArrays()
        Frees the bulk of the resources used by this kernel by setting array sizes in non-primitive KernelArgs to 1 (a size of 0 is prohibited) and invoking kernel execution on a zero-size range. Unlike dispose(), this does not prohibit further invocations of this kernel, because sundry resources such as OpenCL queues are not freed by this method.

        This allows a "dormant" Kernel to remain in existence without undue strain on GPU resources, which may be strongly preferable to disposing a Kernel and recreating another one later, as creation/use of a new Kernel (specifically creation of its associated OpenCL context) is expensive.

        Note that where the underlying array field is declared final, it cannot be reassigned and is therefore not resized.

      • dispose

        public void dispose()
        Release any resources associated with this Kernel.

        When the execution mode is CPU or GPU, Aparapi stores some OpenCL resources in a data structure associated with the kernel instance. The dispose() method must be called to release these resources.

        If execute(int _globalSize) is called after dispose(), the results are undefined.

      • isRunningCL

        public boolean isRunningCL()
      • getTargetDevice

        public final Device getTargetDevice()
      • isAllowDevice

        public boolean isAllowDevice​(Device _device)
        Returns:
        true by default; may be overridden to allow a given Kernel instance to veto a device or devices.
      • getExecutionMode

        @Deprecated
        public Kernel.EXECUTION_MODE getExecutionMode()
        Deprecated.
        See Kernel.EXECUTION_MODE

        Return the current execution mode. Before a Kernel executes, this return value will be the execution mode as determined by the setting of the EXECUTION_MODE enumeration. By default, this setting is either GPU if OpenCL is available on the target system, or JTP otherwise. This default setting can be changed by calling setExecutionMode().

        After a Kernel executes, the return value will be the mode in which the Kernel actually executed.

        Returns:
        The current execution mode.
        See Also:
        setExecutionMode(EXECUTION_MODE)
      • setExecutionMode

        @Deprecated
        public void setExecutionMode​(Kernel.EXECUTION_MODE _executionMode)
        Deprecated.
        See Kernel.EXECUTION_MODE

        Set the execution mode.

        This should be regarded as a request. The real mode will be determined at runtime based on the availability of OpenCL and the characteristics of the workload.

        Parameters:
        _executionMode - the requested execution mode.
        See Also:
        getExecutionMode()
      • setExecutionModeWithoutFallback

        public void setExecutionModeWithoutFallback​(Kernel.EXECUTION_MODE _executionMode)
      • setFallbackExecutionMode

        @Deprecated
        public void setFallbackExecutionMode()
        Deprecated.
      • descriptorToReturnTypeLetter

        private static java.lang.String descriptorToReturnTypeLetter​(java.lang.String desc)
      • getReturnTypeLetter

        private static java.lang.String getReturnTypeLetter​(java.lang.reflect.Method meth)
      • toClassShortNameIfAny

        private static java.lang.String toClassShortNameIfAny​(java.lang.Class<?> retClass)
      • setExplicit

        public void setExplicit​(boolean _explicit)
        For development purposes (this should be removed for production): declares whether this Kernel uses explicit memory management.
        Parameters:
        _explicit - true to request explicit memory management
      • isExplicit

        public boolean isExplicit()
        For development purposes (this should be removed for production): determines whether this Kernel uses explicit memory management.
        Returns:
        true if the kernel is using explicit memory management
      • put

        public Kernel put​(long[] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • put

        public Kernel put​(long[][] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • put

        public Kernel put​(long[][][] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • put

        public Kernel put​(double[] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • put

        public Kernel put​(double[][] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • put

        public Kernel put​(double[][][] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • put

        public Kernel put​(float[] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • put

        public Kernel put​(float[][] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • put

        public Kernel put​(float[][][] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • put

        public Kernel put​(int[] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • put

        public Kernel put​(int[][] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • put

        public Kernel put​(int[][][] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • put

        public Kernel put​(byte[] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • put

        public Kernel put​(byte[][] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • put

        public Kernel put​(byte[][][] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • put

        public Kernel put​(char[] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • put

        public Kernel put​(char[][] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • put

        public Kernel put​(char[][][] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • put

        public Kernel put​(boolean[] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • put

        public Kernel put​(boolean[][] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • put

        public Kernel put​(boolean[][][] array)
        Tag this array so that it is explicitly enqueued before the kernel is executed
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(long[] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(long[][] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(long[][][] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(double[] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(double[][] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(double[][][] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(float[] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(float[][] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(float[][][] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(int[] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(int[][] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(int[][][] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(byte[] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(byte[][] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(byte[][][] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(char[] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(char[][] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(char[][][] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(boolean[] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(boolean[][] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
      • get

        public Kernel get​(boolean[][][] array)
        Enqueue a request to return this buffer from the GPU. This method blocks until the array is available.
        Parameters:
        array -
        Returns:
        This kernel so that we can use the 'fluent' style API
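
        Because every put and get overload returns the kernel itself, transfer requests can be chained in one expression. The sketch below shows only that chaining pattern. FluentTransferSketch is a hypothetical stand-in, not the Aparapi Kernel class: it records the requests in a list and does not touch any device.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not the Aparapi Kernel class: each method records the
// request and returns 'this', which is what makes the fluent style possible.
final class FluentTransferSketch {
    private final List<String> log = new ArrayList<>();

    /** Stand-in for put(int[]): records the request and returns this object. */
    FluentTransferSketch put(int[] array) {
        log.add("put:" + array.length);
        return this;
    }

    /** Stand-in for get(int[]): records the request and returns this object. */
    FluentTransferSketch get(int[] array) {
        log.add("get:" + array.length);
        return this;
    }

    List<String> log() {
        return log;
    }

    /** Chains two uploads and one download in a single expression. */
    static List<String> demo() {
        int[] input = new int[4];
        int[] output = new int[2];
        return new FluentTransferSketch().put(input).put(output).get(output).log();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```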
      • getProfileInfo

        public java.util.List<ProfileInfo> getProfileInfo()
        Get the profiling information from the last successful call to Kernel.execute().
        Returns:
        A list of ProfileInfo records
      • addExecutionModes

        @Deprecated
        public void addExecutionModes​(Kernel.EXECUTION_MODE... platforms)
        Deprecated.
        See Kernel.EXECUTION_MODE.

        Sets the possible fallback path for execution modes. For example, addExecutionModes(GPU, CPU, JTP) will try the GPU first; if that fails it will fall back to OpenCL CPU, and finally it will try JTP.

      • hasNextExecutionMode

        @Deprecated
        public boolean hasNextExecutionMode()
        Deprecated.
        Returns:
        true if there is another execution path to try
      • tryNextExecutionMode

        @Deprecated
        public void tryNextExecutionMode()
        Deprecated.
        See Kernel.EXECUTION_MODE. Tries the next execution path in the list; if there are no more, gives up.
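
        The fallback path described for these deprecated methods is an ordered list that is walked until a mode succeeds. The sketch below illustrates that idea only. FallbackPathSketch, its Mode enum and its method names are hypothetical stand-ins, not Aparapi's implementation, and the set of unavailable modes is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in, not Aparapi's implementation: keeps an ordered
// fallback path and advances through it one mode at a time.
final class FallbackPathSketch {
    enum Mode { GPU, CPU, JTP } // stand-in for the modes named in the text above

    private final LinkedHashSet<Mode> path = new LinkedHashSet<>();
    private Iterator<Mode> cursor;
    private Mode current;

    /** Appends modes to the path and restarts at the first entry. */
    void addModes(Mode... requested) {
        if (requested.length == 0) {
            throw new IllegalArgumentException("at least one mode is required");
        }
        path.addAll(Arrays.asList(requested));
        cursor = path.iterator();
        current = cursor.next();
    }

    boolean hasNext() {
        return cursor != null && cursor.hasNext();
    }

    /** Moves to the next mode if there is one; otherwise stays on the last mode. */
    void tryNext() {
        if (hasNext()) {
            current = cursor.next();
        }
    }

    Mode current() {
        return current;
    }

    /** Walks the path until a mode outside 'unavailable' is found or the path ends. */
    static Mode firstAvailable(FallbackPathSketch sketch, Set<Mode> unavailable) {
        while (unavailable.contains(sketch.current()) && sketch.hasNext()) {
            sketch.tryNext();
        }
        return sketch.current();
    }

    public static void main(String[] args) {
        FallbackPathSketch sketch = new FallbackPathSketch();
        sketch.addModes(Mode.GPU, Mode.CPU, Mode.JTP);
        // Assumption for the example: the GPU mode is not usable on this machine.
        System.out.println(firstAvailable(sketch, EnumSet.of(Mode.GPU)));
    }
}
```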
      • markedWith

        private static <A extends java.lang.annotation.Annotation> ValueCache<java.lang.Class<?>,​java.util.Map<java.lang.String,​java.lang.Boolean>,​java.lang.RuntimeException> markedWith​(java.lang.Class<A> annotationClass)
      • toSignature

        static java.lang.String toSignature​(java.lang.reflect.Method method)
      • getArgumentsLetters

        private static java.lang.String getArgumentsLetters​(java.lang.reflect.Method method)
      • isRelevant

        private static boolean isRelevant​(java.lang.reflect.Method method)
      • getProperty

        private static <V,​T extends java.lang.Throwable> V getProperty​(ValueCache<java.lang.Class<?>,​java.util.Map<java.lang.String,​V>,​T> cache,
                                                                             ClassModel.ConstantPool.MethodReferenceEntry methodReferenceEntry,
                                                                             V defaultValue)
                                                                      throws T extends java.lang.Throwable
        Throws:
        T extends java.lang.Throwable
      • cacheProperty

        private static <K,​V,​T extends java.lang.Throwable> ValueCache<java.lang.Class<?>,​java.util.Map<K,​V>,​T> cacheProperty​(ValueCache.ThrowingValueComputer<java.lang.Class<?>,​java.util.Map<K,​V>,​T> throwingValueComputer)
      • invalidateCaches

        public static void invalidateCaches()