
Extensions

A proposed Aparapi extension mechanism.


This would allow a developer to create a library that could be used by Aparapi Kernel code. The library would include OpenCL and Java implementations.

We will treat this as a live document. Please join the discussion at http://groups.google.com/group/aparapi-discuss/browse_thread/thread/7ec81ecb2169aa4 and I will update this page to reflect what I understand the latest decisions to be.

Currently Aparapi allows Java bytecode to be converted to OpenCL at runtime. Only the OpenCL generated by this conversion process is made available. Sometimes, for performance reasons, we might want to allow hand-coded OpenCL to be called from Aparapi kernel code.

Here we will present a strawman API which would allow extension points to be added by an end user or by a library provider.

We will use an FFT use case to walk through the steps.

The FFT (Fast Fourier Transform) algorithm can be coded in Aparapi, but handcrafted OpenCL is likely to perform better. The goal is to let Aparapi do what it does best, i.e. manage the host buffer allocations, and to provide a mechanism for binding arbitrary OpenCL code at runtime.

So let's assume we want an Aparapi Kernel to be able to call an Aparapi extension for computing FFT (forward and reverse). The Kernel implementation might look like this.


        public static class BandStopFilter extends Kernel{
           FFT fft = new FFT(); // Create an instance of the Extension point.
           float[] real;
           float[] imaginary;
        
          BandStopFilter (float[] _real){
             real = _real;
             imaginary = new float[_real.length];
        
          }
        
          @Override public void run() {
             fft.forward(real, imaginary);
          }
        }
        

The main method would then just execute the Kernel using the familiar kernel.execute() method:


        public static void main(String[] args) {
           float[] data = new float[1024];
           BandStopFilter  kernel = new BandStopFilter (data);
           kernel.execute(data.length);
        }
        

Essentially we want the FFT.forward(float[] _real, float[] _imaginary) and FFT.reverse(float[] _real, float[] _imaginary) methods to be callable from Aparapi Kernel code. We want Aparapi to handle the call forwarding and the argument/buffer mapping and transfers. We want Aparapi to call the Java methods normally if OpenCL is not available, but to use the implementor-provided OpenCL if it is. So the implementor will be required to provide both a Java and an OpenCL version of the callable methods, because Aparapi will decide at runtime which version needs to be called.

Any extension point is required to implement the AparapiExtensionPoint interface.


        public interface AparapiExtensionPoint {
           public String getOpenCL();
        }
        

Here is a possible (although incomplete) FFT implementation.


        public class FFT implements AparapiExtensionPoint{
            @AparapiCallable public void forward(
                @Global @ReadWrite float[] _real,
                @Global @ReadWrite float[] _imaginary) {
                  // Java implementation
               }
        
            @AparapiCallable public void reverse(
                @Global @ReadWrite float[] _real,
                @Global @ReadWrite float[] _imaginary) {
                  // Java implementation
                }
        
            @Override public String getOpenCL() {
                  // Block comments are used inside the OpenCL source because the
                  // concatenated string has no newlines to terminate a // comment.
                  return ""
                  +"void my_package_FFT_forward("
                  +"   __global float* _real,"
                  +"   __global float* _imaginary )"
                  +"   {"
                  +"       /* OpenCL implementation */"
                  +"   }"
                  +"void my_package_FFT_reverse("
                  +"   __global float* _real,"
                  +"   __global float* _imaginary )"
                  +"   {"
                  +"       /* OpenCL implementation */"
                  +"   }";
               }
        }
        

The implementer’s class will be required to define the callable Aparapi methods as well as implement the getOpenCL() method, so that the OpenCL implementation of those methods can be extracted at runtime.
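
The function names in the OpenCL source (such as my_package_FFT_forward) suggest that each callable maps to a name built from the qualified class name and the method name. The proposal does not spell this rule out, so the following is only a sketch under that assumption; OpenCLNames and mangle() are hypothetical names, not part of Aparapi.

```java
// Hypothetical helper (not part of Aparapi): derives the OpenCL function
// name that a callable Java method is ASSUMED to map to, by replacing the
// dots of the qualified class name with underscores.
final class OpenCLNames {
    private OpenCLNames() {}

    static String mangle(String qualifiedClassName, String methodName) {
        return qualifiedClassName.replace('.', '_') + "_" + methodName;
    }

    public static void main(String[] args) {
        // prints "my_package_FFT_forward"
        System.out.println(mangle("my.package.FFT", "forward"));
    }
}
```

With a rule like this, Aparapi could rewrite a call site such as fft.forward(real, imaginary) into a call to the matching function in the text returned by getOpenCL().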

Aparapi will provide annotations to decorate the methods and args/parameters of the exposed callable methods. These annotations provide information so that Aparapi can locate the callable methods, as well as parameter hints to help coordinate buffer types (global, local, constant) and transfer directions (read, write, readWrite) when executing the methods from a Kernel. This information is consulted during the normal bytecode analysis that Aparapi performs when it hits the call site.
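
To make the role of the annotations concrete, here is a rough sketch of how the callable methods and their parameter hints could be discovered. It uses plain Java reflection as a stand-in for Aparapi's bytecode analysis, and the annotation declarations are local copies of the assumed shapes; Fft, CallableScanner and describe() are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;

// Local stand-ins for the proposed annotations (assumed shapes).
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
@interface AparapiCallable {}

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
@interface Global {}

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
@interface ReadWrite {}

class Fft {
    @AparapiCallable public void forward(@Global @ReadWrite float[] real,
                                         @Global @ReadWrite float[] imaginary) {}

    public void helper() {} // not annotated, so not exposed to kernels
}

final class CallableScanner {
    // Returns one description per annotated method, listing the buffer type
    // and transfer direction hinted for each parameter.
    static List<String> describe(Class<?> type) {
        List<String> out = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (!m.isAnnotationPresent(AparapiCallable.class)) continue;
            List<String> args = new ArrayList<>();
            for (Parameter p : m.getParameters()) {
                String space = p.isAnnotationPresent(Global.class) ? "global" : "unspecified";
                String dir = p.isAnnotationPresent(ReadWrite.class) ? "readWrite" : "unspecified";
                args.add(space + " " + dir);
            }
            out.add(m.getName() + "(" + String.join(", ", args) + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        describe(Fft.class).forEach(System.out::println);
    }
}
```

The sketch only distinguishes @Global and @ReadWrite; a fuller version would also check the @Local, @Constant, @ReadOnly and @WriteOnly hints.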

Note that the Java code inside the @AparapiCallable methods (or code executed from them) is not constrained to the normal Aparapi subset. It can be any legitimate Java code, but it should be thread-safe, because it will be called in JTP (Java Thread Pool) mode.

Note also that the OpenCL code yielded from the getOpenCL() method is assumed to be complete; Aparapi does not attempt to parse this code. If the code fails to compile, Aparapi will fall back and execute the whole Kernel in JTP mode.
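
The fallback rule and the thread-safety requirement can be sketched together. Nothing below is Aparapi API: Compiler is a hypothetical stand-in for the OpenCL compile step, and runInThreadPool() only imitates JTP-style dispatch by running the Java body once per work item on a small thread pool.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class FallbackSketch {
    enum Mode { GPU, JTP }

    // Hypothetical stand-in for the OpenCL compile step; throws on bad source.
    interface Compiler {
        void compile(String source) throws Exception;
    }

    // GPU if the supplied source compiles, otherwise the whole kernel runs in JTP.
    static Mode chooseMode(String source, Compiler compiler) {
        try {
            compiler.compile(source);
            return Mode.GPU;
        } catch (Exception e) {
            return Mode.JTP;
        }
    }

    // JTP-style dispatch: the Java body runs once per work item on pool
    // threads, which is why the Java implementation has to be thread-safe.
    static void runInThreadPool(int range, Runnable body) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < range; i++) {
            pool.execute(body);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("work items did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Compiler alwaysFails = source -> { throw new Exception("syntax error"); };
        Mode mode = chooseMode("void broken(", alwaysFails);
        AtomicInteger calls = new AtomicInteger(); // thread-safe shared state
        if (mode == Mode.JTP) {
            runInThreadPool(1024, calls::incrementAndGet);
        }
        System.out.println(mode + " " + calls.get());
    }
}
```

A plain int counter in place of the AtomicInteger would be exactly the kind of shared state that breaks when the Java implementation is called from several threads at once.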

BTW we show getOpenCL() returning a String literal. This is most likely to be how code is returned. However, it could also be extracted from a file or from a resource in the Jar file, or be dynamically generated based on some state. For example, an FFT implementation might choose to use different code for radix2 or radix4 implementations (based on a parameter passed to the FFT() constructor, say FFT(FFT.RADIX2)), in which case the getOpenCL() method might yield different code.
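
A minimal sketch of the radix idea follows. The class name RadixFFT, the constant values and the placeholder kernel bodies are made up for illustration, and the interface is redeclared locally so the sketch compiles on its own.

```java
// Local copy of the proposed interface so the sketch is self-contained.
interface AparapiExtensionPoint {
    String getOpenCL();
}

final class RadixFFT implements AparapiExtensionPoint {
    static final int RADIX2 = 2; // hypothetical constants
    static final int RADIX4 = 4;

    private final int radix;

    RadixFFT(int radix) {
        if (radix != RADIX2 && radix != RADIX4) {
            throw new IllegalArgumentException("unsupported radix: " + radix);
        }
        this.radix = radix;
    }

    // The OpenCL text depends on state chosen at construction time.
    @Override public String getOpenCL() {
        String body = (radix == RADIX2)
            ? "/* radix-2 butterflies */"
            : "/* radix-4 butterflies */";
        return "void my_package_FFT_forward("
             + "__global float* _real, __global float* _imaginary) { " + body + " }\n";
    }

    public static void main(String[] args) {
        System.out.print(new RadixFFT(RADIX4).getOpenCL());
    }
}
```

Because Aparapi only ever sees the returned string, the same approach would work for code loaded from a file or a Jar resource.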

The above proposal covers the case where a third party might want to provide an Aparapi extension point as a library.

We might also consider allowing single methods within the Kernel to be optimized, where the OpenCL is made available via the AparapiCallable annotation. The method would still use the same annotations for the args (to allow buffer transfers to be optimized).


        Kernel k = new Kernel(){
              int[] data = new int[1024]; // buffer summed by the kernel
        
              @AparapiCallable( /* opencl code for sum() goes here */)
               int sum(@Global @ReadWrite int[] data, int length){
                     int  sum = 0;
                     for (int i = 0; i < length; i++){
                            sum+=data[i];
                     }
                     return sum;
              }
             @Override public void run(){
                    sum(data, data.length);
             }
        };
        

Here are the proposed new interfaces/annotations


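        import java.lang.annotation.ElementType;
        import java.lang.annotation.Retention;
        import java.lang.annotation.RetentionPolicy;
        import java.lang.annotation.Target;
        
        public interface AparapiExtensionPoint {
           public String getOpenCL();
        }
        @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
        public @interface AparapiCallable {
             String value() default ""; // annotation defaults cannot be null
        }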
        
        @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
        public @interface Global {}
        
        @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
        public @interface Local {}
        
        @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
        public @interface Constant {}
        
        @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
        public @interface ReadWrite {}
        
        @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
        public @interface ReadOnly {}
        
        @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
        public @interface WriteOnly {}
        

And here is the example code in one chunk


        public class FFT implements AparapiExtensionPoint{
            @AparapiCallable public void forward(
                @Global @ReadWrite float[] _real,
                @Global @ReadWrite float[] _imaginary) {
                  // Java implementation
               }
        
          @AparapiCallable public void reverse(
              @Global @ReadWrite float[] _real,
              @Global @ReadWrite float[] _imaginary) {
                // Java implementation
              }
        
          @Override public String getOpenCL() {
                // Block comments are used inside the OpenCL source because the
                // concatenated string has no newlines to terminate a // comment.
                return ""
                +"void my_package_FFT_forward("
                +"   __global float* _real,"
                +"   __global float* _imaginary )"
                +"   {"
                +"       /* OpenCL implementation */"
                +"   }"
                +"void my_package_FFT_reverse("
                +"   __global float* _real,"
                +"   __global float* _imaginary )"
                +"   {"
                +"       /* OpenCL implementation */"
                +"   }";
             }
        }
        
        public class BandStopFilter extends Kernel{
           FFT fft = new FFT();
           float[] real;
           float[] imaginary;
        
           BandStopFilter (float[] _real){
              real = _real;
              imaginary = new float[_real.length];
        
           }
        
           @Override public void run() {
              fft.forward(real, imaginary);
           }
        }
        
        public static void main(String[] args) {
           float[] data = new float[1024];
           BandStopFilter  kernel = new BandStopFilter (data);
           kernel.execute(data.length);
        }
        

After discussion I think we are converging on a less complex solution. This is based on Witold’s feedback (see below), where we use OpenCL annotations rather than forcing the implementation of the interface and the getOpenCL() method as originally suggested.

So we will create an @OpenCL annotation for classes/methods.

The @OpenCL annotation on the methods will contain the OpenCL source replacement for a specific method. The arg list will be created by Aparapi.

The @OpenCL annotation on a class allows us to optionally introduce common code (helper methods, #pragmas, constants) which will precede the method declarations in the OpenCL code.
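
As a rough illustration of how the pieces could be stitched together, the sketch below reads a class-level common block and per-method bodies via reflection and emits one OpenCL source string. The attribute names common and body, the SimpleName_method naming rule and the generated arg0/arg1 parameter names are all assumptions, and only float[] parameters are handled.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Assumed shape of the proposed annotation.
@Retention(RetentionPolicy.RUNTIME) @Target({ElementType.TYPE, ElementType.METHOD})
@interface OpenCL {
    String common() default "";
    String body() default "";
}

@OpenCL(common = "void foo() {}")
class AnnotatedFft {
    @OpenCL(body = "{ foo(); }")
    public void forward(float[] real, float[] imaginary) { /* Java implementation */ }

    @OpenCL(body = "{ foo(); }")
    public void reverse(float[] real, float[] imaginary) { /* Java implementation */ }
}

final class OpenCLAssembler {
    // Common code first, then one generated function per annotated method.
    static String assemble(Class<?> type) {
        StringBuilder src = new StringBuilder();
        OpenCL onClass = type.getAnnotation(OpenCL.class);
        if (onClass != null) src.append(onClass.common()).append('\n');

        Method[] methods = type.getDeclaredMethods();
        // Reflection order is unspecified, so sort for a stable result.
        Arrays.sort(methods, Comparator.comparing(Method::getName));
        for (Method m : methods) {
            OpenCL onMethod = m.getAnnotation(OpenCL.class);
            if (onMethod == null) continue;
            List<String> args = new ArrayList<>();
            for (int i = 0; i < m.getParameterCount(); i++) {
                args.add("__global float* arg" + i); // the arg list is generated
            }
            src.append("void ").append(type.getSimpleName()).append('_').append(m.getName())
               .append('(').append(String.join(", ", args)).append(") ")
               .append(onMethod.body()).append('\n');
        }
        return src.toString();
    }

    public static void main(String[] args) {
        System.out.print(assemble(AnnotatedFft.class));
    }
}
```

Placing the common block first means helper functions such as foo() are declared before any generated function that uses them.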

So an FFT example in which the forward() and reverse() methods both call a common foo() method might look like this.


        @OpenCL(common="/* common void foo(){} + maybe #pragmas + accessible "
              + "global fields declared here */")
        public class FFT extends AparapiExtensionPoint {
              @OpenCL(signature="//function signature - OPTIONAL", body="{ /* uses foo(); */ }")
              public void forward(
                  @Global @ReadWrite float[] _real,
                  @Global @ReadWrite float[] _imaginary) {
                    // Java implementation
                 }
              @OpenCL(function="{  /* uses foo(); */ }")
              public void reverse(
                  @Global @ReadWrite float[] _real,
                  @Global @ReadWrite float[] _imaginary) {
                    // Java implementation
                  }
        }
        

To invoke this from an Aparapi kernel, we should be able to do something like this:


        public class BandStopFilter extends Kernel{
             FFT fft = new FFT();
             float[] real;
             float[] imaginary;
        
             BandStopFilter (float[] _real){
                real = _real;
                imaginary = new float[_real.length];
        
             }
        
             @Override public void run() {
                fft.forward(this, real, imaginary);
             }
        }
        
        public static void main(String[] args) {
           float[] data = new float[1024];
           BandStopFilter  kernel = new BandStopFilter (data);
           kernel.execute(data.length);
        }
        

Ideally we would also like to invoke FFT directly (instead of via a Kernel). This is tricky because the forward() and reverse() methods will need to be invoked across a range, and of course the dispatch across the range needs to be initiated from Aparapi.

The only way I can see how to do this is to force the creation of an interface so we can use Java’s existing Proxy mechanism to create a wrapper.


        @OpenCL(wraps=FFT.class)
        interface FFTInterface{
             public void forward( Range _range, float[] _real, float[] _imaginary);
             public void reverse( Range _range, float[] _real, float[] _imaginary);
        }

Then provide a mechanism for extracting a proxy and invoking it.

        float[] real = //??
        float[] imag = //??
        Aparapi.<FFT>wrap(FFTInterface.class).forward(range, real, imag);
        

I can’t see a cleaner solution.
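
For reference, here is a minimal sketch of the proxy idea using java.lang.reflect.Proxy. Everything in it is an assumption made for illustration: DemoRange stands in for Aparapi's Range, the implementation method takes a work-item id as its first argument, and the wrapper loops over the range sequentially where a real dispatch would run in parallel or hand off to OpenCL.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical stand-in for Aparapi's Range: only carries a global size.
final class DemoRange {
    final int globalSize;
    DemoRange(int globalSize) { this.globalSize = globalSize; }
}

// Interface seen by the caller: the first argument is the range.
interface DoublerInterface {
    void forward(DemoRange range, float[] data, float[] imaginary);
}

// Implementation invoked once per work item (id passed explicitly here).
class Doubler {
    public void forward(int globalId, float[] data, float[] imaginary) {
        data[globalId] *= 2f;
        imaginary[globalId] = 0f;
    }
}

final class ProxyWrapper {
    // Returns a proxy for iface whose calls are fanned out across the range.
    @SuppressWarnings("unchecked")
    static <I> I wrap(Class<I> iface, Object impl) {
        InvocationHandler handler = (proxy, method, args) -> {
            DemoRange range = (DemoRange) args[0];
            // Look up the per-work-item method: same name, int instead of range.
            Class<?>[] types = method.getParameterTypes().clone();
            types[0] = int.class;
            Method target = impl.getClass().getMethod(method.getName(), types);
            Object[] callArgs = args.clone();
            for (int id = 0; id < range.globalSize; id++) {
                callArgs[0] = id; // the dispatch is initiated by the wrapper
                target.invoke(impl, callArgs);
            }
            return null;
        };
        return (I) Proxy.newProxyInstance(
            iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        float[] real = {1f, 2f, 3f, 4f};
        float[] imag = new float[real.length];
        wrap(DoublerInterface.class, new Doubler()).forward(new DemoRange(real.length), real, imag);
        System.out.println(java.util.Arrays.toString(real));
    }
}
```

The interface is what makes this possible: Proxy can only generate implementations of interfaces, which is why the wrapper cannot be built directly from the FFT class.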