Solved: HUGE OpenCL Error

Started by Evan407, May 28, 2017, 06:46:32


Evan407

http://evansgame.com/com/evanstools/opencl/demo/
Platform.java
package com.evanstools.opencl.demo;
import static org.lwjgl.opencl.CL10.*;
import static org.lwjgl.opencl.CL20.*;
import static org.lwjgl.BufferUtils.*;
import java.nio.*;
import org.lwjgl.*;
import org.lwjgl.opencl.*;
class Platform{
  private Platform(){}
  long id;
  long context;
  long[] commandQueues;//one for each device
  long buffer;
  static Platform createPlatform(long id){
    Platform p = new Platform();
    p.id = id;
    PointerBuffer _devices = createPointerBuffer(1);
    IntBuffer num_devices = createIntBuffer(1);
    clGetDeviceIDs(p.id,CL_DEVICE_TYPE_ALL,_devices,num_devices);
    p.commandQueues = new long[num_devices.get()];//number of devices on platform
    num_devices.rewind();
    _devices = createPointerBuffer(p.commandQueues.length);
    clGetDeviceIDs(p.id,CL_DEVICE_TYPE_ALL,_devices,num_devices);//retrieves all devices for platform
    System.out.printf("Platform id: %d%nNumber devices: %d%n",p.id,p.commandQueues.length);
    //create context
    PointerBuffer properties = createPointerBuffer(4);
    properties.put(CL_CONTEXT_PLATFORM);
    properties.put(p.id);
    properties.put(0);
    CLContextCallbackI pfn_notify = new CLContextCallbackI(){
      @Override public void invoke(long errinfo, long private_info, long cb, long user_data){
        System.out.printf("error%n");
      }
    };
    IntBuffer errcode_ret = createIntBuffer(1);
    p.context = clCreateContext(properties,
                                _devices,
                                pfn_notify,
                                0,
                                errcode_ret);
    for(int i = 0;i < p.commandQueues.length;i++){
      int[] errcode_ret2 = {CL_SUCCESS - 2};
      p.commandQueues[i] = clCreateCommandQueue(p.context,
                                                _devices.get(),
                                                CL_QUEUE_PROFILING_ENABLE,
                                                errcode_ret2);
      if(errcode_ret2[0] != CL_SUCCESS)throw new RuntimeException("CL error code");
    }
    allcolateBuffer(p,0);
    return p;
  }
  static void allcolateBuffer(Platform p, int size){
/*
If clCreateBuffer is called with a pointer returned by clSVMAlloc as its host_ptr argument, and
CL_MEM_USE_HOST_PTR is set in its flags argument, clCreateBuffer will succeed and return a valid non-zero buffer
object as long as the size argument to clCreateBuffer is no larger than the size argument passed in the original
clSVMAlloc call. The new buffer object returned has the shared memory as the underlying storage. Locations in the
buffers underlying shared memory can be operated on using atomic operations to the devices level of support as defined in
the memory model
*/
    ByteBuffer host_ptr = clSVMAlloc(p.context,CL_MEM_READ_WRITE,size,0/*use largest*/);//ERROR
    //int[] errcode_ret = {CL_SUCCESS - 2};
    //p.buffer = clCreateBuffer(p.context,new Integer(CL_MEM_USE_HOST_PTR).longValue(),host_ptr,errcode_ret);
    //if(errcode_ret[0] != CL_SUCCESS)System.out.printf("Error creating buffer.%n");
  }
}

Quote
Number of platforms: 1
Platform id: 139734215350992
Number devices: 1
[LWJGL] Loading library: jemalloc
[LWJGL]    Using SharedLibraryLoader...
[LWJGL]    Found at: /tmp/lwjglevan/3.1.1-build-16/libjemalloc.so
[LWJGL]    Loaded from org.lwjgl.librarypath: /tmp/lwjglevan/3.1.1-build-16/libjemalloc.so
[LWJGL] MemoryUtil allocator: JEmallocAllocator
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000000000000, pid=10229, tid=0x00007f166f09a700
#
# JRE version: OpenJDK Runtime Environment (8.0_131-b11) (build 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11)
# Java VM: OpenJDK 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  0x0000000000000000
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/evan/Desktop/Java/hs_err_pid10229.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Aborted (core dumped)
Quote
Java$ cat /home/evan/Desktop/Java/hs_err_pid10229.log
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000000000000, pid=10229, tid=0x00007f166f09a700
#
# JRE version: OpenJDK Runtime Environment (8.0_131-b11) (build 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11)
# Java VM: OpenJDK 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  0x0000000000000000
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
...
The error is in the clSVMAlloc call.

spasi

Shared virtual memory is an OpenCL 2.0 feature. You're trying to use it without checking the capabilities exposed by your platform/device (see the CLCapabilities class). Afaict, you're testing on an Nvidia GPU, which does NOT support OpenCL 2.0 (that's an issue of the Nvidia OpenCL driver, not of the GPU).

Evan407

Quote from: spasi on May 28, 2017, 08:57:44
Shared virtual memory is an OpenCL 2.0 feature. You're trying to use it without checking the capabilities exposed by your platform/device (see the CLCapabilities class). Afaict, you're testing on an Nvidia GPU, which does NOT support OpenCL 2.0 (that's an issue of the Nvidia OpenCL driver, not of the GPU).
I've tried checking the version, but the library returns nothing.
  static void printPlatformVersion(Platform p){
    LongBuffer param_value = createLongBuffer(1);
    PointerBuffer param_value_size_ret = createPointerBuffer(1);
    clGetPlatformInfo(p.id,CL_PLATFORM_VERSION,param_value,param_value_size_ret);
    System.out.printf("param_value_size_ret = %d%n",param_value_size_ret.get());
    //param_value = createLongBuffer(new Long(param_value_size_ret.get()).intValue());
    //param_value_size_ret.rewind();
    //clGetPlatformInfo(p.id,CL_PLATFORM_VERSION,param_value,param_value_size_ret);
    //System.out.printf("Platform version: %d%n",param_value.get());
  }
Quote
param_value_size_ret = 0


Evan407

Quote from: Kai on May 29, 2017, 07:49:31
Please look up how to correctly call clGetPlatformInfo.
See: https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/clGetPlatformInfo.html
and especially: https://stackoverflow.com/questions/17240071/what-is-the-right-way-to-call-clgetplatforminfo
I've already done that. I called the method once to retrieve param_value_size_ret and then a second time to retrieve the version after looking up the size. It says the length of the data I requested is zero.
Quote
param_value_size_ret
Returns the actual size in bytes of data being queried by param_value. If param_value_size_ret is NULL, it is ignored.

In the link you provided it says
Quote
CL_PLATFORM_VERSION   char[]
OpenCL version string. Returns the OpenCL version supported by the implementation. This version string has the following format:

OpenCL<space><major_version.minor_version><space><platform-specific information>

The major_version.minor_version value returned will be 1.0.
(The link you provided is for OpenCL 1.0.)

Kai

Quote
I've already done that. I called the method once to retrieve param_value_size_ret and then a second time to retrieve the version after looking up the size. It says the length of the data I requested is zero.
Then post the actual code snippet you used for that; your previous post was not doing that. In fact, your erroneous call to clGetPlatformInfo() would have reported OpenCL error -30 (CL_INVALID_VALUE) had you checked the return code of the call. That error is why the function aborted and left an untouched 0 in the size buffer.

Please, again, read the documentation https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/clGetPlatformInfo.html:
Quote
Errors:
...
- CL_INVALID_VALUE if param_name is not one of the supported values or if size in bytes specified by param_value_size is less than size of return type and param_value is not a NULL value.

For once, here is one correct way of doing it:
static void assertNoError(int ret) {
  if (ret != CL_SUCCESS) throw new AssertionError("OpenCL error: " + ret);
}
...
IntBuffer ib = BufferUtils.createIntBuffer(1);
assertNoError(clGetPlatformIDs(null, ib));
int numPlatforms = ib.get(0);
PointerBuffer pb = BufferUtils.createPointerBuffer(numPlatforms);
assertNoError(clGetPlatformIDs(pb, ib));
PointerBuffer size = BufferUtils.createPointerBuffer(1);
for (int i = 0; i < numPlatforms; i++) {
  assertNoError(clGetPlatformInfo(pb.get(i), CL_PLATFORM_VERSION, (ByteBuffer) null, size));
  int length = (int) size.get(0);
  ByteBuffer bb = BufferUtils.createByteBuffer(length);
  assertNoError(clGetPlatformInfo(pb.get(i), CL_PLATFORM_VERSION, bb, size));
  String version = MemoryUtil.memASCII(bb, length - 1);
  System.out.println(version);
}


Example output on a Quadro K2000M with driver version 377.35:
Quote
OpenCL 1.2 CUDA 8.0.0
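
To make a code like the -30 mentioned above easier to read in a stack trace, the return value can be run through a small lookup before throwing. The following is only a sketch in plain Java: ClErrors, name and check are hypothetical names (not part of LWJGL), and the table covers just a handful of the codes defined in the OpenCL headers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of LWJGL: maps a few OpenCL return codes to their names.
final class ClErrors {
    private static final Map<Integer, String> NAMES = new HashMap<>();
    static {
        // Values as defined in the OpenCL headers; the list is deliberately incomplete.
        NAMES.put(0, "CL_SUCCESS");
        NAMES.put(-1, "CL_DEVICE_NOT_FOUND");
        NAMES.put(-5, "CL_OUT_OF_RESOURCES");
        NAMES.put(-6, "CL_OUT_OF_HOST_MEMORY");
        NAMES.put(-30, "CL_INVALID_VALUE");
        NAMES.put(-32, "CL_INVALID_PLATFORM");
        NAMES.put(-33, "CL_INVALID_DEVICE");
        NAMES.put(-34, "CL_INVALID_CONTEXT");
    }

    // Returns e.g. "CL_INVALID_VALUE (-30)"; codes missing from the table are reported as unknown.
    static String name(int code) {
        return NAMES.getOrDefault(code, "unknown OpenCL error") + " (" + code + ")";
    }

    // Throws if the code returned by an OpenCL call is anything other than CL_SUCCESS (0).
    static void check(int code, String call) {
        if (code != 0) {
            throw new IllegalStateException(call + " failed: " + name(code));
        }
    }

    public static void main(String[] args) {
        check(0, "clGetPlatformIDs");          // success, nothing happens
        try {
            check(-30, "clGetPlatformInfo");   // the code discussed in this thread
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In LWJGL code this would wrap each call that returns an int, for example ClErrors.check(clGetPlatformInfo(...), "clGetPlatformInfo"), in the same spirit as assertNoError.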

Evan407

Okay, I am starting to see my error. I still have trouble working with buffers from time to time. I didn't find a way to query the size of the returned data beforehand, so I had to make the buffer big enough. I also didn't check what the error was myself 🤦. I should have thought of that: I knew that if the call didn't return success it would tell me what went wrong, but I forgot.

  static void printPlatformVersion(Platform p){
    ByteBuffer param_value = createByteBuffer(100);//assumed to be large enough for the version string
    PointerBuffer param_value_size_ret = createPointerBuffer(1);
    int err = clGetPlatformInfo(p.id,CL_PLATFORM_VERSION,param_value,param_value_size_ret);
    if(err != CL_SUCCESS)throw new RuntimeException("clGetPlatformInfo failed: " + err);
    int length = (int)param_value_size_ret.get(0) - 1;//reported size includes the trailing NUL
    System.out.printf("Platform version: %s%n",org.lwjgl.system.MemoryUtil.memASCII(param_value,length));
  }
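
A note on the size value: for a char[] query the size reported by clGetPlatformInfo counts the terminating NUL byte, which is why Kai's version passes length - 1 to memASCII. The sketch below shows the same idea in plain Java, without LWJGL. CStrings and decode are hypothetical names, and the buffer is filled by hand to stand in for what the driver would write.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of LWJGL: decodes a C string from a buffer.
final class CStrings {
    // Decodes the first reportedSize bytes as ASCII, dropping one trailing NUL if present.
    // Uses absolute gets, so the buffer's position and limit are left untouched.
    static String decode(ByteBuffer buffer, int reportedSize) {
        int length = reportedSize;
        if (length > 0 && buffer.get(length - 1) == 0) {
            length--; // the reported size included the terminator
        }
        byte[] bytes = new byte[length];
        for (int i = 0; i < length; i++) {
            bytes[i] = buffer.get(i);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Simulate a 100-byte buffer into which a NUL-terminated string was written.
        byte[] raw = "OpenCL 1.2 CUDA 8.0.0\0".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buffer = ByteBuffer.allocateDirect(100);
        for (int i = 0; i < raw.length; i++) {
            buffer.put(i, raw[i]);
        }
        // raw.length plays the role of the value returned in param_value_size_ret.
        System.out.println("[" + decode(buffer, raw.length) + "]");
    }
}
```

Passing the full reported size to a decoder instead would leave an invisible NUL character at the end of the string, which usually goes unnoticed when printing but breaks comparisons and parsing.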


You were correct, my OpenCL version is:
Quote
Platform version: OpenCL 1.2 CUDA 8.0.0
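
With the version string available, a guard before calling clSVMAlloc could look roughly like the sketch below. It is not the CLCapabilities approach spasi pointed to; it is a plain-Java alternative that parses the documented "OpenCL <major>.<minor> <platform-specific information>" format. OpenClVersion, parse and atLeast are hypothetical names, and the check assumes that the platform version is representative of the device being used, which is not guaranteed when a platform exposes several devices.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of LWJGL: parses a CL_PLATFORM_VERSION string.
final class OpenClVersion {
    // "OpenCL", a space, major.minor, then optional platform-specific text.
    private static final Pattern FORMAT =
            Pattern.compile("^OpenCL (\\d+)\\.(\\d+)(?: .*)?$");

    final int major;
    final int minor;

    private OpenClVersion(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    static OpenClVersion parse(String versionString) {
        // trim() also strips a stray trailing NUL left over from the C string.
        Matcher m = FORMAT.matcher(versionString.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an OpenCL version string: " + versionString);
        }
        return new OpenClVersion(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)));
    }

    // True if this version is the required version or newer.
    boolean atLeast(int requiredMajor, int requiredMinor) {
        return major > requiredMajor || (major == requiredMajor && minor >= requiredMinor);
    }

    public static void main(String[] args) {
        OpenClVersion v = parse("OpenCL 1.2 CUDA 8.0.0");
        if (v.atLeast(2, 0)) {
            System.out.println("OpenCL 2.0 reported, SVM calls may be attempted");
        } else {
            System.out.println("OpenCL " + v.major + "." + v.minor + " reported, skip clSVMAlloc");
        }
    }
}
```

On the platform from this thread the guard takes the else branch, so the allocation would fall back to an ordinary clCreateBuffer path instead of jumping through a null function pointer.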