OpenCL sum example

tomb · February 18, 2011, 13:31:47

Here is a simple OpenCL example that may help if you're writing your first program.

import org.lwjgl.opencl.Util;
import org.lwjgl.opencl.CLMem;
import org.lwjgl.opencl.CLCommandQueue;
import org.lwjgl.BufferUtils;
import org.lwjgl.PointerBuffer;
import org.lwjgl.opencl.CLProgram;
import org.lwjgl.opencl.CLKernel;
import java.nio.FloatBuffer;
import java.util.List;
import org.lwjgl.opencl.CL;
import org.lwjgl.opencl.CLContext;
import org.lwjgl.opencl.CLDevice;
import org.lwjgl.opencl.CLPlatform;
import static org.lwjgl.opencl.CL10.*;

public class OpenCLSum {

    static final String source =
              "kernel void "
            + "sum(global const float *a, "
            + "    global const float *b, "
            + "    global float *answer) { "
            + "  unsigned int xid = get_global_id(0); "
            + "  answer[xid] = a[xid] + b[xid];" 
            + "}"
            ;

    static final FloatBuffer a = toFloatBuffer(new float[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10});
    static final FloatBuffer b = toFloatBuffer(new float[]{9, 8, 7, 6, 5, 4, 3, 2, 1, 0});
    static final FloatBuffer answer = BufferUtils.createFloatBuffer(a.capacity());

    public static void main(String[] args) throws Exception {
        // initialization
        CL.create();
        CLPlatform platform = CLPlatform.getPlatforms().get(0);
        List<CLDevice> devices = platform.getDevices(CL_DEVICE_TYPE_GPU);
        CLContext context = CLContext.create(platform, devices, null, null, null);
        CLCommandQueue queue = clCreateCommandQueue(context, devices.get(0), CL_QUEUE_PROFILING_ENABLE, null);

        // allocation
        CLMem aMem = clCreateBuffer(context, CL_MEM_READ_ONLY, a, null);
        clEnqueueWriteBuffer(queue, aMem, 1, 0, a, null, null);
        CLMem bMem = clCreateBuffer(context, CL_MEM_READ_ONLY, b, null);
        clEnqueueWriteBuffer(queue, bMem, 1, 0, b, null, null);
        CLMem answerMem = clCreateBuffer(context, CL_MEM_WRITE_ONLY, answer, null);
        clFinish(queue);

        // program/kernel creation
        CLProgram program = clCreateProgramWithSource(context, source, null);
        Util.checkCLError(clBuildProgram(program, devices.get(0), "", null));
        // sum has to match a kernel method name in the OpenCL source
        CLKernel kernel = clCreateKernel(program, "sum", null);

        // execution
        PointerBuffer kernel1DGlobalWorkSize = BufferUtils.createPointerBuffer(1);
        kernel1DGlobalWorkSize.put(0, a.capacity());
        kernel.setArg(0, aMem);
        kernel.setArg(1, bMem);
        kernel.setArg(2, answerMem);
        clEnqueueNDRangeKernel(queue, kernel, 1, null, kernel1DGlobalWorkSize, null, null, null);

        // read the results back
        clEnqueueReadBuffer(queue, answerMem, 1, 0, answer, null, null);
        clFinish(queue);
        
        print(a);
        System.out.println("+");
        print(b);
        System.out.println("=");
        print(answer);

        // teardown
        clReleaseKernel(kernel);
        clReleaseProgram(program);
        clReleaseCommandQueue(queue);
        clReleaseContext(context);
        CL.destroy();
    }

    static FloatBuffer toFloatBuffer(float[] floats) {
        FloatBuffer buf = BufferUtils.createFloatBuffer(floats.length).put(floats);
        buf.rewind();
        return buf;
    }

    static void print(FloatBuffer buffer) {
        for (int i = 0; i < buffer.capacity(); i++) {
            System.out.print(buffer.get(i)+" ");
        }
        System.out.println("");
    }
}
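Because the example has almost no error checking, one low-tech way to tell whether the device really ran the kernel is to compute the same sum on the CPU and compare. The sketch below is plain Java with no LWJGL dependency. The class and helper names (`SumReference`, `toArray`, `firstMismatch`) are hypothetical. In the real program you would pass the `answer` FloatBuffer to `toArray` after the final `clFinish(queue)`.

```java
import java.nio.FloatBuffer;
import java.util.Arrays;

public class SumReference {

    // CPU equivalent of the "sum" kernel: each loop iteration stands in for one
    // work-item, with i playing the role of get_global_id(0).
    static float[] sum(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("a and b must have the same length");
        }
        float[] answer = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            answer[i] = a[i] + b[i];
        }
        return answer;
    }

    // Copies the whole buffer using absolute gets, so the buffer's position is
    // left untouched (the same way print() in the example reads it).
    static float[] toArray(FloatBuffer buf) {
        float[] out = new float[buf.capacity()];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.get(i);
        }
        return out;
    }

    // Returns the index of the first element that differs by more than
    // tolerance, or -1 if the device result matches the CPU reference.
    static int firstMismatch(float[] expected, float[] actual, float tolerance) {
        if (expected.length != actual.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        for (int i = 0; i < expected.length; i++) {
            if (Math.abs(expected[i] - actual[i]) > tolerance) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        float[] b = {9, 8, 7, 6, 5, 4, 3, 2, 1, 0};
        float[] expected = sum(a, b);
        System.out.println(Arrays.toString(expected)); // every element is 10.0

        // Stand-in for a device result that came back as all zeros.
        FloatBuffer deviceResult = FloatBuffer.wrap(new float[a.length]);
        System.out.println(firstMismatch(expected, toArray(deviceResult), 1e-6f)); // 0
    }
}
```

A mismatch at index 0 with every value at 0.0 suggests that the kernel never wrote to the output buffer. That points to a failed allocation or launch rather than to a wrong formula.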

kappa · February 18, 2011, 13:43:41

Oh, very nice and simple.

There is a lack of OpenCL tutorials/documentation on the wiki. This would make an excellent example, and it would also be much more visible there.

Care to add it there? Or, better, roll it up into a short, simple tutorial?

tomb · February 24, 2011, 14:44:16

I could try to dump the source on the wiki. However, an admin would have to give me a wiki account; the wiki says account creation is disabled.

kappa · February 24, 2011, 14:54:57

Quote from: tomb on February 24, 2011, 14:44:16
However, an admin would have to give me a wiki account; the wiki says account creation is disabled.

See PM.

Izibaar · February 24, 2011, 20:13:19

I tried your example because it looks so simple, but it is not working for me. The results I get are all 0.0 instead of 10.0. I changed some of the input numbers so that I should get 9.0 and 8.0 once each, but it is still 0.0 for everything.

I installed the latest ATI driver (11.2) with ATI Stream for my HD 4850 and downloaded the latest LWJGL version just today.

Izibaar · March 01, 2011, 22:29:10

Has anyone else had problems, or managed to run it?

Edit: I tested it on the CPU and on my notebook (CPU only). Neither worked; "answer" is always 0.0 for every element.

tomb · March 03, 2011, 14:45:57

I don't know why it doesn't run. I've only tested the code on an NVIDIA EN9600GT.

The code has almost no error handling, so it is difficult to know what goes wrong. It was written to be easy to understand and learn from. What you could do is download the LWJGL source, try the Mandelbrot example, and see if that works.
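One way to add the missing error handling is to check every status code instead of ignoring it. The example already does this for clBuildProgram via Util.checkCLError. In the other LWJGL calls, the trailing null argument (for example in clCreateBuffer and clCreateKernel) appears to be the errcode_ret out-parameter. If so, you could pass a one-element IntBuffer there instead and check its value after each call.

The plain-Java sketch below does not use LWJGL. It only shows the idea of turning a status code into a readable failure. The class name `CLErrorNames` is hypothetical, and the error values are those from the Khronos cl.h header. In real code, LWJGL's own Util.checkCLError is probably the better choice. Map.of requires Java 9 or later.

```java
import java.util.Map;

public class CLErrorNames {

    // A few OpenCL 1.0 error codes (values from the Khronos cl.h header),
    // limited to ones that seem relevant to the examples in this thread.
    private static final Map<Integer, String> NAMES = Map.of(
            0, "CL_SUCCESS",
            -5, "CL_OUT_OF_RESOURCES",
            -11, "CL_BUILD_PROGRAM_FAILURE",
            -30, "CL_INVALID_VALUE",
            -37, "CL_INVALID_HOST_PTR",
            -38, "CL_INVALID_MEM_OBJECT",
            -46, "CL_INVALID_KERNEL_NAME",
            -52, "CL_INVALID_KERNEL_ARGS",
            -54, "CL_INVALID_WORK_GROUP_SIZE");

    static String name(int code) {
        return NAMES.getOrDefault(code, "unlisted OpenCL error " + code);
    }

    // Fails fast with a readable message when an OpenCL call did not succeed.
    static void check(int code, String call) {
        if (code != 0) {
            throw new IllegalStateException(call + " failed with " + name(code));
        }
    }

    public static void main(String[] args) {
        check(0, "clFinish"); // CL_SUCCESS: passes silently
        try {
            check(-37, "clCreateBuffer");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // clCreateBuffer failed with CL_INVALID_HOST_PTR
        }
    }
}
```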

Kuko · March 19, 2011, 19:18:17

Hi, thanks for the example above; I found it very helpful for getting started with OpenCL. I have tried to modify it to do a very simple texture access.
However, even after numerous attempts it does not work correctly. Can someone take a look at it, please? Here is the code:

import org.lwjgl.opencl.Util;
import org.lwjgl.opencl.CLMem;
import org.lwjgl.opencl.CLCommandQueue;
import org.lwjgl.BufferUtils;
import org.lwjgl.PointerBuffer;
import org.lwjgl.opencl.CLProgram;
import org.lwjgl.opencl.CLKernel;
import java.nio.FloatBuffer;
import java.util.List;
import org.lwjgl.opencl.CL;
import org.lwjgl.opencl.CLContext;
import org.lwjgl.opencl.CLDevice;
import org.lwjgl.opencl.CLPlatform;
import org.lwjgl.opencl.api.CLImageFormat;

import static org.lwjgl.opencl.CL10.*;

public class CL_Texture_Test {

	static final String source = "const sampler_t sampler = CLK_NORMALIZED_COORDS_TRUE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST; "
			+ "  "
			+ "kernel void texTest(__read_only image2d_t tex, global float *answer) { "
			+ "int xid = get_global_id(0); "
			+ "int yid = get_global_id(1); "
			+ "float2 texCrds = (xid / 4.0, yid / 4.0); "
			+ "int pos = get_global_size(0) * yid + xid;  "
			+ "answer[pos] = read_imagef(tex, sampler, texCrds).x;  "
			+ "} "
			+ "; ";

	static final FloatBuffer texels = toFloatBuffer(new float[] { 1, 2, 3, 4,
			5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 });
	static final FloatBuffer answer = BufferUtils.createFloatBuffer(texels
			.capacity());

	public static void main(String[] args) throws Exception {
		// initialization
		CL.create();
		CLPlatform platform = CLPlatform.getPlatforms().get(0); 
		List<CLDevice> devices = platform.getDevices(CL_DEVICE_TYPE_GPU);
		CLContext context = CLContext.create(platform, devices, null, null,
				null);
		CLCommandQueue queue = clCreateCommandQueue(context, devices.get(0),
				CL_QUEUE_PROFILING_ENABLE, null);

		// allocation
		CLMem answerMem = clCreateBuffer(context, CL_MEM_WRITE_ONLY, answer,
				null);
		clFinish(queue);

		// program/kernel creation
		CLProgram program = clCreateProgramWithSource(context, source, null);
		Util.checkCLError(clBuildProgram(program, devices.get(0), "", null));
		// texTest has to match a kernel method name in the OpenCL source
		CLKernel kernel = clCreateKernel(program, "texTest", null);

		// Modified code (except for the source String) starts here:
		int width = 4, height = 4;
		CLImageFormat imageF = new CLImageFormat(CL_R, CL_FLOAT);

		CLMem texture = CLMem.createImage2D(context, CL_MEM_READ_ONLY, imageF,
				width, height, 0, null, null);

		PointerBuffer origin = BufferUtils.createPointerBuffer(3);
		origin.put(0, 0);
		origin.put(1, 0);
		origin.put(2, 0);

		PointerBuffer region = BufferUtils.createPointerBuffer(3);
		region.put(0, width);
		region.put(1, height);
		region.put(2, 1);

		clEnqueueWriteImage(queue, texture, 1, origin, region, 0, 0, texels,
				null, null);

		// execution
		PointerBuffer kernel2DGlobalWorkSize = BufferUtils.createPointerBuffer(2);
	    kernel2DGlobalWorkSize.put(0, width);
	    kernel2DGlobalWorkSize.put(1, height);
		
		kernel.setArg(0, texture);
		kernel.setArg(1, answerMem);
		clEnqueueNDRangeKernel(queue, kernel, 2, null, kernel2DGlobalWorkSize,
				null, null, null);

		// read the results back
		clEnqueueReadBuffer(queue, answerMem, 1, 0, answer, null, null);
		clFinish(queue);

		print(texels);
		System.out.println("!=");
		print(answer);

		// teardown
		clReleaseKernel(kernel);
		clReleaseProgram(program);
		clReleaseCommandQueue(queue);
		clReleaseContext(context);
		CL.destroy();
	}

	static FloatBuffer toFloatBuffer(float[] floats) {
		FloatBuffer buf = BufferUtils.createFloatBuffer(floats.length).put(
				floats);
		buf.rewind();
		return buf;
	}

	static void print(FloatBuffer buffer) {
		for (int i = 0; i < buffer.capacity(); i++) {
			System.out.print(buffer.get(i) + " ");
		}
		System.out.println("");
	}
}

The output on a GTX 260 and a GTX 470 with the latest drivers on Windows 7 is:

1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 13.0 14.0 15.0 16.0 
!=
1.0 1.0 1.0 1.0 6.0 6.0 6.0 6.0 11.0 11.0 11.0 11.0 16.0 16.0 16.0 16.0

The x-component of "texCrds" seems to have no effect on the value returned by read_imagef.
More observations:
- There are no error messages (the error-checking code was removed from the example above for simplicity).
- I tried the kernel with a (correct) OpenGL texture: same result.
- xid and yid are correct.
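Here is a likely explanation that is consistent with the output above. In OpenCL C, `(xid / 4.0, yid / 4.0)` is not a vector literal. It is a parenthesised comma expression, which evaluates to `yid / 4.0`. Assigning that scalar to a `float2` copies it into both components, so the kernel actually samples texel (yid, yid), which gives the diagonal 1, 6, 11, 16.

A vector literal needs an explicit type, as in `float2 texCrds = (float2)(xid / 4.0f, yid / 4.0f);`. The `f` suffixes also keep the literals single-precision.

The plain-Java sketch below uses a hypothetical class and no LWJGL. It is a simplified model of nearest-filtered sampling with normalized coordinates on the 4x4 image, not the device's actual sampler. It reproduces both the reported output and the expected one.

```java
import java.util.Arrays;

public class NearestSampleSketch {

    // Simplified model of read_imagef with CLK_NORMALIZED_COORDS_TRUE and
    // CLK_FILTER_NEAREST on a single-channel (CL_R) image stored row by row.
    // All coordinates used here lie in [0, 1), so the addressing mode never matters.
    static float sampleNearest(float[] texels, int width, int height, float u, float v) {
        int x = (int) Math.floor(u * width);
        int y = (int) Math.floor(v * height);
        return texels[y * width + x];
    }

    // Runs the texTest logic for every (xid, yid) of a width x height global range.
    // commaBug = true models "(xid / 4.0, yid / 4.0)", where both components become yid / 4.0.
    static float[] texTest(float[] texels, int width, int height, boolean commaBug) {
        float[] answer = new float[width * height];
        for (int yid = 0; yid < height; yid++) {
            for (int xid = 0; xid < width; xid++) {
                float v = yid / 4.0f;
                float u = commaBug ? v : xid / 4.0f;
                int pos = width * yid + xid; // get_global_size(0) * yid + xid
                answer[pos] = sampleNearest(texels, width, height, u, v);
            }
        }
        return answer;
    }

    public static void main(String[] args) {
        float[] texels = new float[16];
        for (int i = 0; i < texels.length; i++) {
            texels[i] = i + 1; // 1.0 .. 16.0, as in the example
        }
        // Matches the reported output: 1 1 1 1 6 6 6 6 11 11 11 11 16 16 16 16
        System.out.println(Arrays.toString(texTest(texels, 4, 4, true)));
        // With (float2)(xid / 4.0f, yid / 4.0f): 1 .. 16 in order
        System.out.println(Arrays.toString(texTest(texels, 4, 4, false)));
    }
}
```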

Kuko · April 19, 2011, 08:39:24

Hi, I have solved my previous problem in the meantime. I also have one note on the original example:
clCreateBuffer fails with a CL_INVALID_HOST_PTR error if host_ptr is not NULL but neither CL_MEM_COPY_HOST_PTR nor CL_MEM_USE_HOST_PTR is set in flags.
NVIDIA's driver ignores this for some reason, but the example will not execute correctly with AMD's or Intel's OpenCL drivers.

So I suggest changing, for example,

CLMem aMem = clCreateBuffer(context, CL_MEM_READ_ONLY, a, null);

to

CLMem aMem = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, a, null);

… and so on.
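The rule described above can be modelled in a few lines of plain Java. The class and method names are hypothetical, and the flag values are those from the Khronos cl.h header. The sketch only covers the host_ptr-related flag checks, not the rest of clCreateBuffer's validation (for example, buffer size).

The original answerMem allocation (CL_MEM_WRITE_ONLY with the answer buffer) trips the same rule. For a pure output buffer, copying host data in is unnecessary, so allocating by size alone would arguably be cleaner, if your LWJGL version offers that form. With CL_MEM_COPY_HOST_PTR on aMem and bMem, the following clEnqueueWriteBuffer calls become redundant but harmless.

```java
public class HostPtrFlagsSketch {

    // cl_mem_flags bit values from the Khronos OpenCL 1.0 cl.h header.
    static final long CL_MEM_WRITE_ONLY     = 1L << 1;
    static final long CL_MEM_READ_ONLY      = 1L << 2;
    static final long CL_MEM_USE_HOST_PTR   = 1L << 3;
    static final long CL_MEM_ALLOC_HOST_PTR = 1L << 4;
    static final long CL_MEM_COPY_HOST_PTR  = 1L << 5;

    // Models only the host_ptr-related checks of clCreateBuffer.
    // Returns null for a valid combination, otherwise the error name.
    static String checkHostPtr(long flags, boolean hostPtrGiven) {
        boolean use = (flags & CL_MEM_USE_HOST_PTR) != 0;
        boolean copy = (flags & CL_MEM_COPY_HOST_PTR) != 0;
        boolean alloc = (flags & CL_MEM_ALLOC_HOST_PTR) != 0;
        if (use && (copy || alloc)) {
            return "CL_INVALID_VALUE"; // USE_HOST_PTR excludes COPY_ and ALLOC_HOST_PTR
        }
        // A host pointer must come with USE_ or COPY_HOST_PTR, and vice versa.
        if (hostPtrGiven != (use || copy)) {
            return "CL_INVALID_HOST_PTR";
        }
        return null;
    }

    public static void main(String[] args) {
        // Original example: a host buffer passed with CL_MEM_READ_ONLY alone.
        System.out.println(checkHostPtr(CL_MEM_READ_ONLY, true));                        // CL_INVALID_HOST_PTR
        // Suggested fix: the host data is copied into the new buffer.
        System.out.println(checkHostPtr(CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, true)); // null
    }
}
```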

Izibaar · April 26, 2011, 13:45:03

Thank you, Kuko. Finally I have a simple, working example I can start from.
