
Profiling the Kernel

Using Aparapi's built-in profiling APIs.

To extract OpenCL performance information from a kernel at runtime, set the following property:


        -Dcom.aparapi.enableProfiling=true
        

Your application can then call kernel.getProfileInfo() after a successful call to kernel.execute(range) to obtain a List<ProfileInfo>.

Each ProfileInfo holds timing information for buffer writes, executions, and buffer reads.

The following code prints a simple table of profile information:


        List<ProfileInfo> profileInfo = k.getProfileInfo();
        for (final ProfileInfo p : profileInfo) {
           System.out.print(" " + p.getType() + " " + p.getLabel() + " " + (p.getStart() / 1000) + " .. "
               + (p.getEnd() / 1000) + " " + ((p.getEnd() - p.getStart()) / 1000) + "us");
           System.out.println();
        }
        

Here is an example implementation:


        final float result[] = new float[2048*2048];
        Kernel k = new Kernel(){
           public void run(){
              final int gid=getGlobalId();
              result[gid] =0f;
           }
        };
        k.execute(result.length);
        List<ProfileInfo> profileInfo = k.getProfileInfo();
        
        for (final ProfileInfo p : profileInfo) {
           System.out.print(" " + p.getType() + " " + p.getLabel() + " " + (p.getStart() / 1000) + " .. "
              + (p.getEnd() / 1000) + " " + ((p.getEnd() - p.getStart()) / 1000) + "us");
           System.out.println();
        }
        k.dispose();
        

And here is the tabular output from running:


        java \
           -Djava.library.path=${APARAPI_HOME} \
           -Dcom.aparapi.enableProfiling=true \
           -cp ${APARAPI_HOME}:. \
           MyClass
        
        W val$result 69500 .. 72694 3194us
        X exec()     72694 .. 72835  141us
        R val$result 75327 .. 78225 2898us
        

The table shows that the transfer of the 'result' buffer to the device ('W') took 3194 us (microseconds), the execution ('X') of the kernel took 141 us, and the read ('R') of the resulting buffer took 2898 us.
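
To make the proportions in the table easier to see, the rows can be totalled by type. The sketch below is only an illustration. It does not call Aparapi. `Span`, `sample()`, `totalUs()` and `elapsedUs()` are hypothetical names introduced here, with `Span` standing in for `ProfileInfo` so the code runs without an OpenCL device. The nanosecond values are the table's microsecond values multiplied by 1000, so any sub-microsecond digits from the original run are lost.

```java
import java.util.List;

public class ProfileSummary {

    // Hypothetical stand-in for Aparapi's ProfileInfo (an assumption for this
    // sketch). Times are in nanoseconds, as implied by the division by 1000
    // in the printing loop.
    static final class Span {
        final String type;   // "W" = buffer write, "X" = execute, "R" = buffer read
        final String label;
        final long startNs;
        final long endNs;

        Span(String type, String label, long startNs, long endNs) {
            this.type = type;
            this.label = label;
            this.startNs = startNs;
            this.endNs = endNs;
        }

        long durationUs() {
            return (endNs - startNs) / 1000;
        }
    }

    // The three rows of the sample table, scaled back up to nanoseconds.
    static List<Span> sample() {
        return List.of(
            new Span("W", "val$result", 69_500_000L, 72_694_000L),
            new Span("X", "exec()", 72_694_000L, 72_835_000L),
            new Span("R", "val$result", 75_327_000L, 78_225_000L));
    }

    // Sum of the durations, in microseconds, of all spans of the given type.
    static long totalUs(List<Span> spans, String type) {
        long total = 0;
        for (Span s : spans) {
            if (s.type.equals(type)) {
                total += s.durationUs();
            }
        }
        return total;
    }

    // Time from the earliest start to the latest end, in microseconds.
    static long elapsedUs(List<Span> spans) {
        long first = Long.MAX_VALUE;
        long last = Long.MIN_VALUE;
        for (Span s : spans) {
            first = Math.min(first, s.startNs);
            last = Math.max(last, s.endNs);
        }
        return (last - first) / 1000;
    }

    public static void main(String[] args) {
        List<Span> spans = sample();
        long transfer = totalUs(spans, "W") + totalUs(spans, "R");
        long execute = totalUs(spans, "X");
        System.out.println("transfer " + transfer + "us, execute " + execute
            + "us, elapsed " + elapsedUs(spans) + "us");
    }
}
```

For the sample rows this prints `transfer 6092us, execute 141us, elapsed 8725us`. In this run the buffer transfers account for nearly all of the measured time, and the elapsed span is longer than the sum of the three rows because of the gap between the end of the execute and the start of the read. With real data, the same totals could be computed by looping over the list returned by kernel.getProfileInfo() and using getType(), getStart() and getEnd() as in the printing loop above.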