LWJGL 3 and OCL

Started by Lobo, January 18, 2016, 18:12:19


Lobo

Hi,

I'm not sure whether this is the correct subforum for OCL questions, so sorry if it isn't.

I currently have a problem with my Intel i5-6600K Skylake CPU, OCL, and LWJGL 3 (latest release from the website).

The CL kernel code is very simple. It's just a test kernel that I stripped down, so it has no real effect.

kernel void test(global float* pos, global float* vel, global float* mass, global int* size) 
{
	const int itemId = get_global_id(0); 
   	if(itemId < size[0])  
   	{
   		pos[itemId * 3 + 0] = pos[itemId * 3 + 0] + 1.0f;
		pos[itemId * 3 + 1] = pos[itemId * 3 + 1] + 1.0f;
		pos[itemId * 3 + 2] = pos[itemId * 3 + 2] + 1.0f;
   	}
}
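
For checking results on the host, the kernel's effect can be mirrored in plain Java. This is only a sketch: the class and method names are made up for illustration, the size is passed as a plain int instead of a one-element buffer, and a sequential loop stands in for the parallel work-items.

```java
final class KernelReference {
    /**
     * Mirrors the "test" kernel on the CPU.
     * pos holds x, y, z per body; size is the number of bodies to update;
     * globalSize is the number of work-items that would be launched.
     * The caller must make sure pos has at least size * 3 elements.
     */
    static void run(float[] pos, int size, int globalSize) {
        // Each iteration plays the role of one work-item, with itemId = get_global_id(0).
        for (int itemId = 0; itemId < globalSize; itemId++) {
            if (itemId < size) {
                pos[itemId * 3 + 0] += 1.0f;
                pos[itemId * 3 + 1] += 1.0f;
                pos[itemId * 3 + 2] += 1.0f;
            }
        }
    }
}
```

Work-items with an id at or beyond size do nothing, which is why the global work size may be larger than the number of bodies.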


The Java code is also relatively simple:

// Create an OpenCL 'program' from a source code file
		IntBuffer errorBuf = BufferUtils.createIntBuffer(1);
		System.err.println("context: " + context);
		
		long nBodyProgram = CL10.clCreateProgramWithSource(context, Utils.loadText("test.cls").trim(), errorBuf);
		CLUtil.checkCLError(errorBuf.get(0));
		// Build the OpenCL program, store it on the specified device
		int returnValue = CL10.clBuildProgram(nBodyProgram, Config.DEVICE.address(), null, null, MemoryUtil.NULL);
		boolean success = CLUtils.checkReturnValueBuildProgram(returnValue, nBodyProgram, Config.DEVICE.address());
		if(success)
		{
			// Create a kernel instance of our OpenCl program
			nBodyKernel = CL10.clCreateKernel(nBodyProgram, "test", null);
		}
		else
		{
			System.exit(1);
		}


I can post the Utils method if wanted, but it just reads the .cls file and builds a string from it.
If I now check the return value in "CLUtils.checkReturnValueBuildProgram()", I get this on my console:

OCL: CL_BUILD_PROGRAM_FAILURE
1:3:21: error: implicit declaration of function 'get_global_id' is invalid in OpenCL
Compilation failed

To me it looks like it can read the .cls file but doesn't recognize that get_global_id() is a built-in OCL function.

The strangest part is that I got a new PC, and on my old PC (Intel i5 2500K) it works without problems. It only fails on the new one.

I installed the driver for my CPU from the Intel site: https://software.intel.com/en-us/articles/opencl-drivers

The "OpenCL™ Runtime 15.1 for Intel® Core™ and Intel® Xeon® Processors for Windows* (64-bit & 32-bit)", I hope they are correct.

Unfortunately, with LWJGL 3 I also get a JVM crash:

#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffaf271d210, pid=6880, tid=220
#
# JRE version: Java(TM) SE Runtime Environment (8.0_66-b18) (build 1.8.0_66-b18)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.66-b18 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# C  [nvopencl.dll+0x26d210]
#

That happens when I try to run it on my GeForce 970. If I switch back to LWJGL 2.9, it works on my graphics card (same OCL kernel code), but on the CPU I get the same error.

I really don't know what is wrong or what I should do.

So I'm very thankful for any information or help I can get.

Some data:
Using Java 8 build 66
NVidia driver 359.06
Windows 10 OS

If someone needs more information, please let me know. I will provide everything I can.

Thanks a lot.

Best regards


spasi

I cannot reproduce either the build failure or the crash. Tried both Intel (on an i5-2400) and Nvidia (GTX 970). Also, I don't think your code above would work without an NPE, the options parameter of clBuildProgram cannot be null.

Two things you could provide that might help:

- A full (but shortest possible) code sample that reproduces the build failure and the crash.
- The full crash log.

Lobo

Hi,

Thank you for your answer. I was also worried about the "null" option, but I took a look at the API and saw that it caught null and doesn't report that this is invalid.

public static int clBuildProgram(long program, long device, CharSequence options, CLProgramCallback pfn_notify, long user_data) {
		APIBuffer __buffer = apiBuffer();
		int optionsEncoded = __buffer.stringParamASCII(options, true);
		int device_list = __buffer.pointerParam(device);
		return nclBuildProgram(program, 1, __buffer.address(device_list), __buffer.address(optionsEncoded), pfn_notify == null ? NULL : pfn_notify.address(), user_data);
	}


public int stringParamASCII(CharSequence value, boolean nullTerminated) {
		if ( value == null )
			return -1;

		int offset = bufferParam(value.length() + (nullTerminated ? 1 : 0));
		memEncodeASCII(value, nullTerminated, buffer, offset);
		return offset;
	}


But it will be the first thing I try this evening at home.

I have posted the full log. If I try to compile it for the CPU, I get this log output:

OCL: CL_BUILD_PROGRAM_FAILURE
1:3:21: error: implicit declaration of function 'get_global_id' is invalid in OpenCL
Compilation failed

But I only see this because I check the return value of clBuildProgram and read out the build log after it fails, with

CL10.clGetProgramBuildInfo(program, device, CL10.CL_PROGRAM_BUILD_LOG, errorBuffer, null);


This is the checkReturnValueBuildProgram() call.
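
A side note on printing the log: if it is read into a fixed-size, zero-filled buffer, everything after the terminating NUL is padding and should not be printed. A minimal sketch, assuming the log is NUL-terminated ASCII. The helper name BuildLogText is made up, and a plain java.nio buffer stands in for the one filled by clGetProgramBuildInfo.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class BuildLogText {
    /** Returns the text up to (not including) the first NUL byte in the buffer. */
    static String decode(ByteBuffer log) {
        // Work on a duplicate so the caller's position and limit stay untouched.
        ByteBuffer view = log.duplicate();
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);

        // Find the end of the C string; use the whole array if no NUL is present.
        int length = 0;
        while (length < bytes.length && bytes[length] != 0) {
            length++;
        }
        return new String(bytes, 0, length, StandardCharsets.US_ASCII).trim();
    }
}
```

Querying the required log size first and allocating exactly that much would avoid the padding altogether; the sketch above only cleans up what a fixed-size buffer returns.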

If I try to compile it with the GPU the return value is ok, but later on the nvidia driver crashes the JVM with this error message:

#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffaf271d210, pid=6880, tid=220
#
# JRE version: Java(TM) SE Runtime Environment (8.0_66-b18) (build 1.8.0_66-b18)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.66-b18 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# C  [nvopencl.dll+0x26d210]
#

I can provide more code this evening when I'm at home. In general, there is nothing before this code except creating the context, where the errorBuffer stays empty and I get a valid context address, so I think that worked,
and creating the command queue, which also leaves no entry in the errorBuffer.

Thanks a lot for the answer. The first thing I will test is setting an option instead of passing null to the API.

Best regards
Lobo

spasi

Quote from: Lobo on January 19, 2016, 08:04:16
I was also worried about the "null" option, but I took a look at the API and saw that it caught null and doesn't report that this is invalid.

public static int clBuildProgram(long program, long device, CharSequence options, CLProgramCallback pfn_notify, long user_data) {
		APIBuffer __buffer = apiBuffer();
		int optionsEncoded = __buffer.stringParamASCII(options, true);
		int device_list = __buffer.pointerParam(device);
		return nclBuildProgram(program, 1, __buffer.address(device_list), __buffer.address(optionsEncoded), pfn_notify == null ? NULL : pfn_notify.address(), user_data);
	}


public int stringParamASCII(CharSequence value, boolean nullTerminated) {
		if ( value == null )
			return -1;

		int offset = bufferParam(value.length() + (nullTerminated ? 1 : 0));
		memEncodeASCII(value, nullTerminated, buffer, offset);
		return offset;
	}

This was a bug that has been fixed recently, please use the latest nightly build before testing again. I wouldn't be surprised if this fixes one or both of the issues you're seeing.

Lobo

Oh sorry, I didn't know that. OK, I will try it out.

Lobo

Hi,

OK, I tried the newest build. Now I get a NullPointerException if the options are null, so I added "-cl-fast-relaxed-math". The call now looks like this:

int returnValue = CL10.clBuildProgram(nBodyProgram, Config.DEVICE.address(), "-cl-fast-relaxed-math", null, MemoryUtil.NULL);


Unfortunately it doesn't work :-(, I get the same errors. I attached my JVM error file, which is written when I try to run it on the GPU.

Next I will try to strip it down to a single Java file with only the necessary calls. If it still doesn't work, I think it will be easier to post the entire code here.

Best regards


spasi

Looks like it's crashing at clSetKernelArg. Could you share the relevant code?

Lobo

Hi,

Thanks for the fast answer.

I created a single Java file; see the code at the bottom of my post. It should be runnable, though you may have to change the package name.
It should not reference any classes other than standard Java or LWJGL classes. I also added the CL code as a string at the bottom of the
class. I fear it's something very, very stupid.

Thanks a lot for your help.

Best regards
Lobo

package general;

import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.FloatBuffer;
import java.nio.IntBuffer;
import java.util.ArrayList;

import org.lwjgl.BufferUtils;
import org.lwjgl.PointerBuffer;
import org.lwjgl.opencl.CL;
import org.lwjgl.opencl.CL10;
import org.lwjgl.opencl.CLContextCallback;
import org.lwjgl.opencl.CLDevice;
import org.lwjgl.opencl.CLPlatform;
import org.lwjgl.opencl.CLUtil;
import org.lwjgl.opencl.Info;
import org.lwjgl.system.MemoryUtil;

public class SimpleCLTest
{
	public static void main(String[] args)
	{
		//Does anyone know why I can't call this? If I call it, I get an error message that OpenCL has already been created.
		//CL.create();
		
		String[] platformNames = getPlatformNames();
		for(int i = 0; i < platformNames.length; i++)
		{
			String[] deviceNames = getDeviceNames(i);
			System.out.println("Platform: " + platformNames[i]);
			for(int j = 0; j < deviceNames.length; j++)
			{
				System.out.println("\tDevices: " + deviceNames[j]);
			}
		}
		
		
		for(int i = 0; i < platformNames.length; i++)
		{
			String[] deviceNames = getDeviceNames(i);
			CLPlatform platform = CLPlatform.getPlatforms().get(i);
			for(int j = 0; j < deviceNames.length; j++)
			{
				// Create an OpenCL context, this is where we could create an
				// OpenCL-OpenGL compatible context
				CLDevice device = platform.getDevices(CL10.CL_DEVICE_TYPE_ALL).get(j);
				
				IntBuffer errorBuf = BufferUtils.createIntBuffer(1);
				PointerBuffer ctxProps = BufferUtils.createPointerBuffer(3);
				long context = CL10.clCreateContext(ctxProps, device.address(), CONTEXT_CALLBACK, MemoryUtil.NULL, errorBuf);
				CLUtil.checkCLError(errorBuf.get(0));
				System.out.println("Context: " + context);
				// Create a command queue
				long queue = CL10.clCreateCommandQueue(context, device.address(), CL10.CL_QUEUE_PROFILING_ENABLE, errorBuf);
				// Check for any errors
				CLUtil.checkCLError(errorBuf.get(0));
				System.out.println("Queue: " + queue);
				
				FloatBuffer posBuffer = BufferUtils.createFloatBuffer(100 * 3);
				errorBuf = BufferUtils.createIntBuffer(1);

				float[] position = new float[100 * 3];
				
				// Create a buffer containing our array of numbers, we can use the
				// buffer to create an OpenCL memory object
				posBuffer.put(position);
				posBuffer.rewind();
				// Create an OpenCL memory object containing a copy of the data buffer
				long positionMemory = CL10.clCreateBuffer(context, CL10.CL_MEM_READ_WRITE | CL10.CL_MEM_COPY_HOST_PTR, posBuffer, errorBuf);
				// Check if the error buffer now contains an error
				CLUtil.checkCLError(errorBuf.get(0));
				
				/**
				 * Until here everything seems to work fine - now comes the problematic code
				 */
				
				// Create an OpenCL 'program' from a source code file
				long nBodyProgram = CL10.clCreateProgramWithSource(context, loadText(), errorBuf);
				CLUtil.checkCLError(errorBuf.get(0));
				// Build the OpenCL program, store it on the specified device
				int returnValue = CL10.clBuildProgram(nBodyProgram, device.address(), "-cl-fast-relaxed-math", null, MemoryUtil.NULL);
				boolean success = checkReturnValueBuildProgram(returnValue, nBodyProgram, device.address());
				if(success)
				{
					// Create a kernel instance of our OpenCl program
					long kernel = CL10.clCreateKernel(nBodyProgram, "test", null);
					CL10.clSetKernelArg(kernel, 0, positionMemory);
					
					// Create a buffer of pointers defining the multi-dimensional size of
					// the number of work units to execute
					final int dimensions = 1;
					PointerBuffer globalWorkSize = BufferUtils.createPointerBuffer(dimensions);
					globalWorkSize.put(0, 100);
					// Run the specified number of work units using our OpenCL program
					// kernel
					CL10.clEnqueueNDRangeKernel(queue, kernel, dimensions, null, globalWorkSize, null, null, null);
					CL10.clFinish(queue);
				}
			}
		}

		CL.destroy();
	}
	
	//Some convenience methods below
	
	public static String[] getPlatformNames()
	{
		ArrayList<String> platformNames = new ArrayList<>(0);
		CLPlatform.getPlatforms().forEach(element -> platformNames.add(getPlatformInfo(element, CL10.CL_PLATFORM_VENDOR).trim()));
		return platformNames.toArray(new String[platformNames.size()]);
	}

	public static String[] getDeviceNames(int index)
	{
		ArrayList<String> deviceNames = new ArrayList<>(0);
		CLPlatform.getPlatforms().get(index).getDevices(CL10.CL_DEVICE_TYPE_ALL).forEach(element -> deviceNames.add(getDeviceInfo(element, CL10.CL_DEVICE_NAME).trim()));
		return deviceNames.toArray(new String[deviceNames.size()]);
	}
	
	private static String getPlatformInfo(CLPlatform platform, int param) {
		return Info.clGetPlatformInfoStringUTF8(platform.address(), param);
	}

	private static String getDeviceInfo(CLDevice device, int param) {
		return Info.clGetDeviceInfoStringUTF8(device.address(), param);
	}
	
	public static final CLContextCallback CONTEXT_CALLBACK = new CLContextCallback() {
		@Override
		public void invoke(long errinfo, long private_info, long cb, long user_data) {
		}
	};
	
	public static String loadText()
	{
		StringBuilder clKernel = new StringBuilder();
		clKernel.append("kernel void test(global float* pos)                  {\n");
		clKernel.append("const int itemId = get_global_id(0); 				   \n");
		clKernel.append("if(itemId < 100){								       \n");
		clKernel.append("pos[itemId * 3 + 0] = pos[itemId * 3 + 0] + 1.0f;     \n");
		clKernel.append("pos[itemId * 3 + 1] = pos[itemId * 3 + 1] + 1.0f;     \n");
		clKernel.append("pos[itemId * 3 + 2] = pos[itemId * 3 + 2] + 1.0f;	   \n");
		clKernel.append("}                                                     \n");
		clKernel.append("}                                                     \n");
		System.out.println(clKernel);
		return clKernel.toString();
	}
	
	public static boolean checkReturnValueBuildProgram(int returnValue, long program, long device)
    {
        if(returnValue != CL10.CL_SUCCESS)
        {
            switch(returnValue)
            {
                case CL10.CL_INVALID_PROGRAM : System.err.println("OCL: CL_INVALID_PROGRAM");break;
                case CL10.CL_INVALID_VALUE : System.err.println("OCL: CL_INVALID_VALUE");break;
                case CL10.CL_INVALID_DEVICE : System.err.println("OCL: CL_INVALID_DEVICE");break;
                case CL10.CL_INVALID_BINARY : System.err.println("OCL: CL_INVALID_BINARY");break;
                case CL10.CL_INVALID_BUILD_OPTIONS : System.err.println("OCL: CL_INVALID_BUILD_OPTIONS");break;
                case CL10.CL_INVALID_OPERATION : System.err.println("OCL: CL_INVALID_OPERATION");break;
                case CL10.CL_COMPILER_NOT_AVAILABLE : System.err.println("OCL: CL_COMPILER_NOT_AVAILABLE");break;
                case CL10.CL_BUILD_PROGRAM_FAILURE : System.err.println("OCL: CL_BUILD_PROGRAM_FAILURE");break;
                case CL10.CL_OUT_OF_HOST_MEMORY : System.err.println("OCL: CL_OUT_OF_HOST_MEMORY");break;
                default:System.err.println("OCL: UNKNOWN CL BUILD ERROR");break;                
            }
            
            ByteBuffer errorBuffer = BufferUtils.createByteBuffer(1024 * 10);
			CL10.clGetProgramBuildInfo(program, device, CL10.CL_PROGRAM_BUILD_LOG, errorBuffer, null);
			errorBuffer.rewind();
			byte[] tmp = new byte[1024 * 10];
			errorBuffer.get(tmp);
			try
			{
				System.err.println("CL Build error: " + new String(tmp, "ASCII"));
			}
			catch (UnsupportedEncodingException e)
			{}
    		return false;
        }
        else
        {
        	return true;
        }
    }
}
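
As an aside, the switch in checkReturnValueBuildProgram could also be written as a lookup table. A sketch only: ClBuildErrors is a hypothetical name, and the numeric values are written out as they are defined in the OpenCL cl.h header so that the snippet runs without LWJGL. In the real code the CL10 constants would be used as keys instead.

```java
import java.util.HashMap;
import java.util.Map;

final class ClBuildErrors {
    private static final Map<Integer, String> NAMES = new HashMap<>();

    static {
        // Error codes that clBuildProgram can return, with their cl.h values.
        NAMES.put(-3, "CL_COMPILER_NOT_AVAILABLE");
        NAMES.put(-6, "CL_OUT_OF_HOST_MEMORY");
        NAMES.put(-11, "CL_BUILD_PROGRAM_FAILURE");
        NAMES.put(-30, "CL_INVALID_VALUE");
        NAMES.put(-33, "CL_INVALID_DEVICE");
        NAMES.put(-42, "CL_INVALID_BINARY");
        NAMES.put(-43, "CL_INVALID_BUILD_OPTIONS");
        NAMES.put(-44, "CL_INVALID_PROGRAM");
        NAMES.put(-59, "CL_INVALID_OPERATION");
    }

    /** Returns the symbolic name for a build error code, or a fallback text. */
    static String name(int code) {
        return NAMES.getOrDefault(code, "UNKNOWN CL BUILD ERROR (" + code + ")");
    }
}
```

With this, the error line becomes a single call such as System.err.println("OCL: " + ClBuildErrors.name(returnValue)).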


spasi

The bug is at line 88 above. The clSetKernelArg function requires a buffer from which to read the argument value. The argument value in this case is the positionMemory object. You're trying to pass it as the buffer instead.

You'll see that LWJGL provides multiple clSetKernelArg overloads and alternatives. This makes the fix very easy, just change your code to:

CL10.clSetKernelArg1p(kernel, 0, positionMemory);

which is a shortcut for:

PointerBuffer arg = BufferUtils.createPointerBuffer(1);
arg.put(0, positionMemory);
CL10.clSetKernelArg(kernel, 0, arg);
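
To picture the difference: the native function receives a size and a pointer to memory that holds the argument value, and for a buffer argument that value is the memory object's handle itself. Below is a plain java.nio sketch of the bytes involved, assuming a 64-bit platform where the handle is 8 bytes wide. KernelArgBytes is a made-up name for illustration, not an LWJGL class.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class KernelArgBytes {
    /**
     * Builds the small block of memory the kernel argument is read from:
     * 8 bytes in native byte order containing the handle value.
     */
    static ByteBuffer wrapHandle(long handle) {
        ByteBuffer arg = ByteBuffer.allocateDirect(Long.BYTES).order(ByteOrder.nativeOrder());
        arg.putLong(0, handle); // absolute put, position stays at 0
        return arg;
    }
}
```

The argument size is the size of this wrapper (8 bytes here), not the size of the 300-float buffer the handle refers to.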

Lobo

Thank you very, very much. I didn't see this; how blind I am.

With my 970 it works, but my CPU still fails with the error:

OCL: CL_BUILD_PROGRAM_FAILURE
CL Build error: Compilation started
1:2:20: error: implicit declaration of function 'get_global_id' is invalid in OpenCL
Compilation failed

But I fear something is wrong with my CPU in combination with OCL. I downloaded LuxMark, and if I
choose OpenCL CPU rendering, the application just crashes. GPU rendering works.

I don't know what's wrong. Maybe it's the wrong driver, but I can't find a newer or different one besides the one I linked in my first post.

https://software.intel.com/en-us/articles/opencl-drivers

But thank you very much; I would never have found this error without your help.

Best regards

spasi

I just found this, sounds exactly like the problem you're seeing.

Lobo

Good Morning,

Thanks a lot. Yes, it seems to be the same problem I have. I will try to fix it as described
in that thread, and I will let you know whether it worked.

Thanks very much and have a nice day,
Lobo

Lobo

Hi,

Now it seems to work. Somehow the onboard graphics was deactivated in the BIOS. After activating it, I saw my Intel HD
in the Windows Device Manager. Then I could install the newest Intel HD Graphics driver, and after that I could run it in OpenCL.

Thanks a lot for your help, I fear I would have needed forever to fix this problem.

Have a nice evening

Best regards
Lobo