LWJGL Forum

Programming => Lightweight Java Gaming Library => Topic started by: petarts on September 30, 2016, 19:37:05

Title: OpenCL problem, out of stack space [solved]
Post by: petarts on September 30, 2016, 19:37:05
Hello, I am trying the program here:
https://github.com/LWJGL/lwjgl3/blob/master/modules/core/src/test/java/org/lwjgl/demo/opencl/CLDemo.java
and on the line
"PointerBuffer platforms = stack.mallocPointer(pi.get(0));"
it gives me
"Exception in thread "main" java.lang.OutOfMemoryError: Out of stack space."
Is there any way I can fix that?
Title: Re: OpenCL problem, out of stack space
Post by: spasi on September 30, 2016, 19:42:00
What happens if you change the first line from:

IntBuffer pi = stack.mallocInt(1);

to

IntBuffer pi = stack.callocInt(1);
Title: Re: OpenCL problem, out of stack space
Post by: petarts on September 30, 2016, 19:45:07
"if ( pi.get(0) == 0 )
throw new RuntimeException("No OpenCL platforms found.");"
This code throws an exception.
Title: Re: OpenCL problem, out of stack space
Post by: spasi on September 30, 2016, 20:09:40
Sounds like a broken OpenCL ICD loader. The error check on clGetPlatformIDs passes (it returns CL_SUCCESS), but it doesn't write anything to the IntBuffer. Very weird.
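A minimal sketch of how an unwritten count can lead to the "Out of stack space" error, assuming the stack behaves like a fixed-size bump allocator. TinyStack is a hypothetical stand-in written for illustration; it is not LWJGL's MemoryStack, and the 64 KB capacity is an assumption. With malloc-style allocation the memory keeps whatever bytes were there before, so a count the driver never wrote is garbage and can be far too large to allocate. With calloc-style allocation the same memory reads as 0, which the "no platforms" check can catch.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class StackSketch {

    /** Hypothetical fixed-size bump allocator, for illustration only. */
    static final class TinyStack {
        private final byte[] memory;
        private int top; // offset of the next free byte

        TinyStack(int capacity) {
            memory = new byte[capacity];
            // Simulate stale data left behind by earlier allocations.
            Arrays.fill(memory, (byte) 0x7F);
        }

        /** Reserves bytes without clearing them; returns the start offset. */
        int malloc(long bytes) {
            if (bytes < 0 || bytes > memory.length - top) {
                throw new OutOfMemoryError("Out of stack space.");
            }
            int offset = top;
            top += (int) bytes;
            return offset;
        }

        /** Reserves bytes and zeroes them; returns the start offset. */
        int calloc(long bytes) {
            int offset = malloc(bytes);
            Arrays.fill(memory, offset, offset + (int) bytes, (byte) 0);
            return offset;
        }

        /** Reads a 4-byte int stored at the given offset. */
        int readInt(int offset) {
            return ByteBuffer.wrap(memory, offset, Integer.BYTES).getInt();
        }
    }

    /** Returns true if allocating count 8-byte pointers does not fit on the stack. */
    static boolean overflows(TinyStack stack, int count) {
        try {
            stack.malloc((long) count * Long.BYTES);
            return false;
        } catch (OutOfMemoryError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        TinyStack stack = new TinyStack(64 * 1024);

        // The "driver" never writes the count, so stale bytes are read back.
        int garbageCount = stack.readInt(stack.malloc(Integer.BYTES));
        System.out.println("malloc count: " + garbageCount);
        System.out.println("overflows:    " + overflows(stack, garbageCount));

        // With calloc the untouched count reads as 0 and can be detected.
        int zeroCount = stack.readInt(stack.calloc(Integer.BYTES));
        System.out.println("calloc count: " + zeroCount);
    }
}
```

This only models the symptom; it says nothing about why the ICD loader fails to write the count in the first place.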
Title: Re: OpenCL problem, out of stack space
Post by: petarts on September 30, 2016, 20:16:18
Actually, I have removed all the checkCLError calls because Eclipse tells me
"The method checkCLError(int) is undefined for the type Main"
Title: Re: OpenCL problem, out of stack space
Post by: spasi on October 01, 2016, 11:28:38
The InfoUtil class, which contains the checkCLError method, is here: https://github.com/LWJGL/lwjgl3/blob/master/modules/core/src/test/java/org/lwjgl/opencl/InfoUtil.java
Title: Re: OpenCL problem, out of stack space
Post by: petarts on October 01, 2016, 12:53:11
It tells me
"The import org.lwjgl.opencl.InfoUtil cannot be resolved"
I am using 3.0.0 build 90. I just redownloaded it to see if there was a problem with the previous download, but the same thing happens.
When I looked in the jar, that class is missing, but I don't know why.
Title: Re: OpenCL problem, out of stack space
Post by: spasi on October 01, 2016, 14:29:56
The InfoUtil class is not part of LWJGL. It's only used in the LWJGL tests.
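Because the class only lives in the test sources, one option is to copy it into your own project; another is a small helper of your own. A minimal sketch, assuming only that success is reported as 0 (CL_SUCCESS in the OpenCL headers). The class name CLErrors and the message format are made up for this example and are not part of LWJGL.

```java
import java.nio.IntBuffer;

public final class CLErrors {

    private CLErrors() {
    }

    /** OpenCL reports success as 0 (CL_SUCCESS); any other value is treated as an error. */
    public static void checkCLError(int errcode) {
        if (errcode != 0) {
            throw new RuntimeException(String.format("OpenCL error [0x%X]", errcode));
        }
    }

    /** Convenience overload for functions that return the status through an errcode_ret buffer. */
    public static void checkCLError(IntBuffer errcode) {
        checkCLError(errcode.get(errcode.position()));
    }

    public static void main(String[] args) {
        checkCLError(0); // passes silently
        try {
            checkCLError(-11);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a static import of the helper, calls such as checkCLError(clGetPlatformIDs(null, pi)) should compile without the InfoUtil class.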
Title: Re: OpenCL problem, out of stack space
Post by: petarts on October 01, 2016, 16:15:31
After adding that class, it gives me this crash report:
"#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffb716d14ea, pid=4264, tid=8068
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# C  [OpenCL.dll+0x14ea]
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# D:\Desktop\Eclipse_school\Island_Domination-TEST\hs_err_pid4264.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#"
I probably should've sent the code I'm testing sooner, but here you go:
public class Main{
    @SuppressWarnings("unused")
    public static void main(String[] args){
        //Game g=new Game("sample");
        Configuration.OPENCL_EXPLICIT_INIT.set(true);
        CL.create();
        try(MemoryStack stack=stackPush()){
            IntBuffer pi= stack.mallocInt(1);
            InfoUtil.checkCLError(clGetPlatformIDs(null, pi));
            if ( pi.get(0) == 0 )
                throw new RuntimeException("No OpenCL platforms found.");
            PointerBuffer platforms=stack.mallocPointer(pi.get(0));
            InfoUtil.checkCLError(clGetPlatformIDs(platforms, (IntBuffer)null));
            long platform=platforms.get(0);
            PointerBuffer devices=stack.mallocPointer(pi.get(0));
            long device=devices.get(0);
            CLContextCallback contextCB;
            PointerBuffer ctxProps = stack.mallocPointer(3);
            ctxProps.put(0, CL_CONTEXT_PLATFORM).put(2, 0);
            ctxProps.put(1,platform);
            IntBuffer errcode_ret = stack.callocInt(1);
            long context= clCreateContext(ctxProps, device, contextCB = CLContextCallback.create((errinfo, private_info, cb, user_data) -> {
                System.err.println("[LWJGL] cl_context_callback");
                System.err.println("\tInfo: " + memUTF8(errinfo));
            }), NULL, errcode_ret);
            long que=clCreateCommandQueue(context, device, NULL, errcode_ret);
            CharSequence add=
                "_kernel void sum(_global const float* a, _global float* result, int const size) {\n"+
                " const int itemId = get_global_id(0); \n"+
                " if(itemId < size) {\n"+
                " result[itemId] = a[itemId*2] + a[itemId*2+1];\n"+
                " }\n"+
                "}";
            long sumProgram=CL10.clCreateProgramWithSource(context, add, null);
            long sumKernel=CL10.clCreateKernel(sumProgram, "sum", (int[])null);
            float[] in=new float[200];
            float[] out=new float[100];
            for(int i=0;i<100;i++){
                in[i]=i;
                in[i+1]=i;
            }
            CL10.clSetKernelArg(sumKernel,0,in);
            CL10.clSetKernelArg(sumKernel,1,out);
            CL10.clSetKernelArg(sumKernel, 2, 100);
            PointerBuffer globalWorkSize = BufferUtils.createPointerBuffer(1);
            globalWorkSize.put(0, 100);
            clEnqueueNDRangeKernel(que, sumKernel, 1, null, globalWorkSize, null, null, null);
            CL10.clFinish(que);
            for(int i=0;i<100;i++){
                System.out.println(out[i]);
            }
        }
    }
}
Title: Re: OpenCL problem, out of stack space
Post by: Kai on October 01, 2016, 17:02:52
You are missing a few essential OpenCL calls in your code, which looks nothing like the referenced
  https://github.com/LWJGL/lwjgl3/blob/master/modules/core/src/test/java/org/lwjgl/demo/opencl/CLDemo.java
which you said you had problems with.
Have a look at
  https://github.com/LWJGL/lwjgl3/blob/master/modules/core/src/test/java/org/lwjgl/demo/opencl/Mandelbrot.java
which also uses kernels.
Title: Re: OpenCL problem, out of stack space
Post by: petarts on October 01, 2016, 18:55:23
OK, I have now used the CLDemo class as the base for my class, removed anything that seemed unnecessary, and put in the things I need, instead of doing it the other way around (starting from the old tutorial and then trying to repair it). But now I have bumped into another problem. Here's the new code:

/*
* Copyright LWJGL. All rights reserved.
* License terms: https://www.lwjgl.org/license
*/
package Main;

import org.lwjgl.BufferUtils;
import org.lwjgl.PointerBuffer;
import org.lwjgl.opencl.*;
import org.lwjgl.system.MemoryStack;

import java.nio.IntBuffer;
import static org.lwjgl.opencl.CL10.*;
import static Main.InfoUtil.*;
import static org.lwjgl.system.MemoryStack.*;
import static org.lwjgl.system.MemoryUtil.*;

public final class CLDemo {

    private CLDemo() {
    }

    public static void main(String[] args) {
        try ( MemoryStack stack = stackPush() ) {
            demo(stack);
        }
    }

    private static void demo(MemoryStack stack) {
        IntBuffer pi = stack.mallocInt(1);
        checkCLError(clGetPlatformIDs(null, pi));
        if ( pi.get(0) == 0 )
            throw new RuntimeException("No OpenCL platforms found.");
        PointerBuffer platforms = stack.mallocPointer(pi.get(0));
        checkCLError(clGetPlatformIDs(platforms, (IntBuffer)null));

        PointerBuffer ctxProps = stack.mallocPointer(3);
        ctxProps
            .put(0, CL_CONTEXT_PLATFORM)
            .put(2, 0);

        IntBuffer errcode_ret = stack.callocInt(1);
        long platform = platforms.get(0);
        ctxProps.put(1, platform);

        CLCapabilities platformCaps = CL.createPlatformCapabilities(platform);

        checkCLError(clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, null, pi));

        PointerBuffer devices = stack.mallocPointer(pi.get(0));
        checkCLError(clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, devices, (IntBuffer)null));
        long device = devices.get(0);
        CLCapabilities caps = CL.createDeviceCapabilities(device, platformCaps);
        CLContextCallback contextCB;
        long context = clCreateContext(ctxProps, device, contextCB = CLContextCallback.create((errinfo, private_info, cb, user_data) -> {
            System.err.println("[LWJGL] cl_context_callback");
            System.err.println("\tInfo: " + memUTF8(errinfo));
        }), NULL, errcode_ret);
        checkCLError(errcode_ret);
        long que=clCreateCommandQueue(context, device, NULL, errcode_ret);
        CharSequence add=
            "_kernel void sum(_global const float* a, _global float* result, int const size) {\n"+
            " const int itemId = get_global_id(0); \n"+
            " if(itemId < size) {\n"+
            " result[itemId] = a[itemId*2] + a[itemId*2+1];\n"+
            " }\n"+
            "}";
        long sumProgram=CL10.clCreateProgramWithSource(context, add, null);
        long sumKernel=CL10.clCreateKernel(sumProgram, "sum", (int[])null);
        float[] in=new float[200];
        float[] out=new float[100];
        for(int i=0;i<100;i++){
            in[i]=i;
            in[i+1]=i;
        }
        CL10.clSetKernelArg(sumKernel,0,in);
        CL10.clSetKernelArg(sumKernel,1,out);
        CL10.clSetKernelArg(sumKernel, 2, 100);
        PointerBuffer globalWorkSize = BufferUtils.createPointerBuffer(1);
        globalWorkSize.put(0, 100);
        clEnqueueNDRangeKernel(que, sumKernel, 1, null, globalWorkSize, null, null, null);
        CL10.clFinish(que);
        for(int i=0;i<100;i++){
            System.out.println(out[i]);
        }

    }


}

Here's the error:

Exception in thread "main" java.lang.NullPointerException
   at org.lwjgl.system.Checks.checkPointer(Checks.java:103)
   at org.lwjgl.opencl.CL10.clSetKernelArg(CL10.java:8055)
   at Main.CLDemo.demo(CLDemo.java:76)
   at Main.CLDemo.main(CLDemo.java:25)
Title: Re: OpenCL problem, out of stack space
Post by: spasi on October 01, 2016, 19:33:11
OpenCL is not a simple API and you won't get far without heavy reading of the specification... You should start by making small changes to existing demos. The code above has considerable changes that don't make sense and is missing critical functionality:

- You are not building the program.
- You are not checking for compilation errors.
- The keywords are kernel and global (or __kernel and __global, with two leading underscores), not _kernel and _global.
- You're trying to pass Java float arrays as arguments to the kernel. This is never going to work. You need to create cl_mem objects and set those as the kernel arguments.
- clSetKernelArg is a hard function to use properly and is kind of a special case in LWJGL (has a ton of overloads for convenience). Make sure you properly understand what it does and use the correct overload (hint for the first two arguments: use clSetKernelArg1p(sumKernel, index, cl_mem_object)).
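To illustrate the last point: in the C API, clSetKernelArg takes an argument index, a byte size, and a pointer to the argument's value, and the size has to match the kernel parameter's type. The sketch below is plain Java with hypothetical helper names (encodePointerArg, encodeIntArg); it does not call LWJGL and only shows how many bytes each kind of argument occupies, assuming 64-bit pointers. A cl_mem handle is pointer-sized, while an int parameter is 4 bytes, so the two need different overloads.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class KernelArgSketch {

    /** Bytes for a pointer-sized argument such as a cl_mem handle (assumes 64-bit pointers). */
    static byte[] encodePointerArg(long handle) {
        return ByteBuffer.allocate(Long.BYTES)
            .order(ByteOrder.nativeOrder())
            .putLong(handle)
            .array();
    }

    /** Bytes for a 32-bit int argument such as the kernel's "size" parameter. */
    static byte[] encodeIntArg(int value) {
        return ByteBuffer.allocate(Integer.BYTES)
            .order(ByteOrder.nativeOrder())
            .putInt(value)
            .array();
    }

    public static void main(String[] args) {
        long fakeMemHandle = 0x12345678L; // placeholder value, not a real cl_mem
        System.out.println("cl_mem argument: " + encodePointerArg(fakeMemHandle).length + " bytes");
        System.out.println("int argument:    " + encodeIntArg(100).length + " bytes");
    }
}
```

Passing the int through a pointer-sized path would hand the runtime 8 bytes where the kernel expects 4, which is the kind of mismatch the hint about picking the correct overload is meant to prevent.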
Title: Re: OpenCL problem, out of stack space
Post by: petarts on October 01, 2016, 19:42:19
I would like to use OpenCL to optimize one library, J3dBool (UnBBoolean). It uses the CPU to calculate 3D objects and becomes pretty slow after some changes. I feel like the code I have now, once I modify it to work, might be good ENOUGH to optimize that library.
Also, I didn't use a float buffer because it looked like a normal float array could work. I guess I was wrong, a lot.
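On the float array question: native code generally needs memory at a stable address, and a Java heap array does not guarantee that, because the garbage collector may move it. Direct NIO buffers live outside the Java heap, which is what helpers like BufferUtils.createFloatBuffer provide. A sketch using only java.nio; toDirect is a hypothetical helper name, not an LWJGL method.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class DirectBufferSketch {

    /** Copies a heap array into a newly allocated direct buffer in native byte order. */
    static FloatBuffer toDirect(float[] data) {
        FloatBuffer buffer = ByteBuffer
            .allocateDirect(data.length * Float.BYTES)
            .order(ByteOrder.nativeOrder())
            .asFloatBuffer();
        buffer.put(data);
        buffer.flip(); // position back to 0, limit at the number of floats written
        return buffer;
    }

    public static void main(String[] args) {
        float[] in = new float[200];
        for (int i = 0; i < in.length; i++) {
            in[i] = i;
        }
        FloatBuffer buffer = toDirect(in);
        System.out.println("direct: " + buffer.isDirect());
        System.out.println("floats: " + buffer.remaining());
        System.out.println("bytes:  " + buffer.remaining() * Float.BYTES);
    }
}
```

The byte count matters when sizing device buffers: 100 floats of output need 100 * 4 = 400 bytes.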
Title: Re: OpenCL problem, out of stack space
Post by: Kai on October 01, 2016, 20:02:22
I don't know how familiar you are with constructive solid geometry, but from my point of view CSG is not a good fit for SIMD (the computation model of OpenCL), because the algorithms are complex and are inherently non-parallelizable at certain points. Therefore you will likely not see any gains in performance there.
Did you think about how you are actually going to submit and read the mesh data with OpenCL (in what form you are representing the geometry), what your units of work are, and how you can make use of data parallelization (the only kind of parallelization that SIMD applies to)?
In my opinion, it would be much more worthwhile improving/optimizing the CPU implementation.
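To make the data-parallelism point concrete, here is a CPU-only sketch with made-up method names. Translating coordinates is data parallel: each element depends only on its own input, so it could map onto one work-item per element, or onto a parallel stream on the CPU. A running total is the opposite case: each step needs the previous result, so it cannot be split up the same way without restructuring the algorithm.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class DataParallelSketch {

    /** Independent per-element work: safe to run in any order or in parallel. */
    static float[] translate(float[] coords, float offset) {
        float[] result = new float[coords.length];
        IntStream.range(0, coords.length)
            .parallel()
            .forEach(i -> result[i] = coords[i] + offset);
        return result;
    }

    /** Loop-carried dependency: element i needs the total up to element i - 1. */
    static float[] runningTotal(float[] values) {
        float[] result = new float[values.length];
        float total = 0f;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
            result[i] = total;
        }
        return result;
    }

    public static void main(String[] args) {
        float[] coords = {0f, 1f, 2f, 3f};
        System.out.println(Arrays.toString(translate(coords, 10f)));
        System.out.println(Arrays.toString(runningTotal(coords)));
    }
}
```

Whether a CSG library has enough work of the first kind to benefit from a GPU is exactly the open question raised above.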
Title: Re: OpenCL problem, out of stack space
Post by: petarts on October 01, 2016, 20:15:17
The thing is, I am trying to use it in a game and it's too slow to use in-game. After I learn how to use OpenCL, I will try to optimize as much as I can in that library with it. I will possibly throw out some things I'm not using (the colors of the objects).
Title: Re: OpenCL problem, out of stack space
Post by: petarts on October 02, 2016, 07:40:15
When I read the javadoc about clBuildProgram, it says
"user_data can be NULL."
but I don't know how to give it null, so I give it zero, and it gives me an error:
/*
* Copyright LWJGL. All rights reserved.
* License terms: https://www.lwjgl.org/license
*/
package Main;

import org.lwjgl.BufferUtils;
import org.lwjgl.PointerBuffer;
import org.lwjgl.opencl.*;
import org.lwjgl.system.MemoryStack;

import java.nio.FloatBuffer;
import java.nio.IntBuffer;
import static org.lwjgl.opencl.CL10.*;
import static Main.InfoUtil.*;
import static org.lwjgl.system.MemoryStack.*;
import static org.lwjgl.system.MemoryUtil.*;

public final class CLDemo {

    private CLDemo() {
    }

    public static void main(String[] args) {
        try ( MemoryStack stack = stackPush() ) {
            demo(stack);
        }
    }

    private static void demo(MemoryStack stack) {
        IntBuffer pi = stack.mallocInt(1);
        checkCLError(clGetPlatformIDs(null, pi));
        if ( pi.get(0) == 0 )
            throw new RuntimeException("No OpenCL platforms found.");
        PointerBuffer platforms = stack.mallocPointer(pi.get(0));
        checkCLError(clGetPlatformIDs(platforms, (IntBuffer)null));

        PointerBuffer ctxProps = stack.mallocPointer(3);
        ctxProps
            .put(0, CL_CONTEXT_PLATFORM)
            .put(2, 0);

        IntBuffer errcode_ret = stack.callocInt(1);
        long platform = platforms.get(0);
        ctxProps.put(1, platform);

        CLCapabilities platformCaps = CL.createPlatformCapabilities(platform);

        checkCLError(clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, null, pi));

        PointerBuffer devices = stack.mallocPointer(pi.get(0));
        checkCLError(clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, devices, (IntBuffer)null));
        long device = devices.get(0);
        CLCapabilities caps = CL.createDeviceCapabilities(device, platformCaps);
        CLContextCallback contextCB;
        long context = clCreateContext(ctxProps, device, contextCB = CLContextCallback.create((errinfo, private_info, cb, user_data) -> {
            System.err.println("[LWJGL] cl_context_callback");
            System.err.println("\tInfo: " + memUTF8(errinfo));
        }), NULL, errcode_ret);
        checkCLError(errcode_ret);
        long que=clCreateCommandQueue(context, device, NULL, errcode_ret);
        CharSequence add=
            "_kernel void sum(_global const float* a, _global float* result, int const size) {\n"+
            " const int itemId = get_global_id(0); \n"+
            " if(itemId < size) {\n"+
            " result[itemId] = a[itemId*2] + a[itemId*2+1];\n"+
            " }\n"+
            "}";
        long sumProgram=CL10.clCreateProgramWithSource(context, add, null);
        int error = CL10.clBuildProgram(sumProgram, devices.get(0), "", null,0);
        checkCLError(error);
        long sumKernel=CL10.clCreateKernel(sumProgram, "sum", (int[])null);
        float[] in=new float[200];
        float[] out=new float[100];
        for(int i=0;i<100;i++){
            in[i]=i;
            in[i+1]=i;
        }
        FloatBuffer aBuff = BufferUtils.createFloatBuffer(200);
        aBuff.put(in);
        aBuff.rewind();
        IntBuffer errorBuff = BufferUtils.createIntBuffer(1); // Error buffer

        long _in = CL10.clCreateBuffer(context, CL10.CL_MEM_WRITE_ONLY | CL10.CL_MEM_COPY_HOST_PTR, aBuff, errorBuff);
        checkCLError(errorBuff.get(0));
        long _out = CL10.clCreateBuffer(context, CL10.CL_MEM_READ_ONLY, 400, errorBuff);
        checkCLError(errorBuff.get(0));
        CL10.clSetKernelArg1p(sumKernel,0,_in);
        CL10.clSetKernelArg1p(sumKernel,1,_out);
        CL10.clSetKernelArg1p(sumKernel, 2, 100);
        PointerBuffer globalWorkSize = BufferUtils.createPointerBuffer(1);
        globalWorkSize.put(0, 100);
        clEnqueueNDRangeKernel(que, sumKernel, 1, null, globalWorkSize, null, null, null);
        CL10.clFinish(que);
        for(int i=0;i<100;i++){
            System.out.println(out[i]);
        }

    }


}


Exception in thread "main" java.lang.RuntimeException: OpenCL error [0xFFFFFFF5]
at Main.InfoUtil.checkCLError(InfoUtil.java:130)
at Main.CLDemo.demo(CLDemo.java:71)
at Main.CLDemo.main(CLDemo.java:26)
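The hex value in that exception is a negative status code printed as an unsigned 32-bit number: 0xFFFFFFF5 is -11, which the OpenCL headers define as CL_BUILD_PROGRAM_FAILURE. That points at the kernel source failing to build rather than at the 0 passed for user_data (NULL in LWJGL's MemoryUtil is the long value 0). A small decoding sketch; the class name is made up and the table covers only a handful of codes from the OpenCL headers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CLErrorNames {

    // A few status codes from the OpenCL headers; not a complete list.
    private static final Map<Integer, String> NAMES = new LinkedHashMap<>();

    static {
        NAMES.put(0, "CL_SUCCESS");
        NAMES.put(-1, "CL_DEVICE_NOT_FOUND");
        NAMES.put(-5, "CL_OUT_OF_RESOURCES");
        NAMES.put(-11, "CL_BUILD_PROGRAM_FAILURE");
        NAMES.put(-30, "CL_INVALID_VALUE");
        NAMES.put(-38, "CL_INVALID_MEM_OBJECT");
        NAMES.put(-46, "CL_INVALID_KERNEL_NAME");
        NAMES.put(-51, "CL_INVALID_ARG_SIZE");
    }

    /** Parses text such as "0xFFFFFFF5" into the signed 32-bit status code. */
    static int parseHex(String hex) {
        return Integer.parseUnsignedInt(hex.replaceFirst("^0[xX]", ""), 16);
    }

    /** Returns the symbolic name if it is in the table, plus the numeric code. */
    static String describe(int code) {
        return NAMES.getOrDefault(code, "unknown") + " (" + code + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(parseHex("0xFFFFFFF5")));
    }
}
```

When a build fails, the compiler's own message is available through clGetProgramBuildInfo with CL_PROGRAM_BUILD_LOG, which is usually more informative than the status code alone.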
Title: Re: OpenCL problem, out of stack space
Post by: Kai on October 02, 2016, 10:14:13
You should re-read Spasi's last post.

Long story short, here is a working version of your reduction program:

private static void demo(MemoryStack stack) {
    IntBuffer counts = stack.mallocInt(1);
    checkCLError(clGetPlatformIDs(null, counts));
    int platformCount = counts.get(0);
    if (platformCount == 0)
        throw new RuntimeException("No OpenCL platforms found.");
    PointerBuffer platforms = stack.mallocPointer(platformCount);
    checkCLError(clGetPlatformIDs(platforms, (IntBuffer) null));
    PointerBuffer ctxProps = stack.mallocPointer(3);
    ctxProps.put(0, CL_CONTEXT_PLATFORM).put(2, 0);
    IntBuffer errcode_ret = stack.callocInt(1);
    long platform = platforms.get(0);
    ctxProps.put(1, platform);
    checkCLError(clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, null, counts));
    int deviceCount = counts.get(0);
    if (deviceCount == 0)
        throw new RuntimeException("No OpenCL devices found.");
    PointerBuffer devices = stack.mallocPointer(deviceCount);
    checkCLError(clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, devices, (IntBuffer) null));
    long device = devices.get(0);
    long context = clCreateContext(ctxProps, device, null, NULL, errcode_ret);
    checkCLError(errcode_ret);
    long que = clCreateCommandQueue(context, device, NULL, errcode_ret);
    checkCLError(errcode_ret);
    CharSequence add =
    "kernel void sum(global const float* a, global float* result, int const size) {\n"+ // <- 'kernel' and 'global' !
    "   const int itemId = get_global_id(0); \n"+
    "   if(itemId < size) {\n"+
    "       result[itemId] = a[itemId*2] + a[itemId*2+1];\n"+
    "   }\n"+
    "}";
    long sumProgram = CL10.clCreateProgramWithSource(context, add, null);
    checkCLError(CL10.clBuildProgram(sumProgram, devices.get(0), "", null,0));
    checkCLError(errcode_ret);
    long sumKernel = CL10.clCreateKernel(sumProgram, "sum", errcode_ret);
    checkCLError(errcode_ret);
    float[] in  = new float[200];
    float[] out = new float[100];
    for (int i = 0; i < 200; i++) {
        in[i] = i;
    }
    FloatBuffer aBuff = stack.mallocFloat(200);
    aBuff.put(in).rewind();
    long _in = CL10.clCreateBuffer(context, CL10.CL_MEM_READ_ONLY | CL10.CL_MEM_COPY_HOST_PTR, aBuff, errcode_ret); // <- READ_ONLY !
    checkCLError(errcode_ret);
    long _out = CL10.clCreateBuffer(context, CL10.CL_MEM_READ_WRITE, 400, errcode_ret); // <- READ_WRITE !
    checkCLError(errcode_ret);
    checkCLError(CL10.clSetKernelArg1p(sumKernel, 0, _in));
    checkCLError(CL10.clSetKernelArg1p(sumKernel, 1, _out));
    checkCLError(CL10.clSetKernelArg1i(sumKernel, 2, 100)); // <- clSetKernelArg1i !
    PointerBuffer globalWorkSize = stack.mallocPointer(1);
    globalWorkSize.put(0, 100);
    PointerBuffer kernelEvent = stack.mallocPointer(1);
    checkCLError(clEnqueueNDRangeKernel(que, sumKernel, 1, null, globalWorkSize, null, null, kernelEvent));
    PointerBuffer readEvent = stack.mallocPointer(1);
    checkCLError(clEnqueueReadBuffer(que, _out, 1, 0, out, kernelEvent, readEvent)); // <- read back results !
    checkCLError(clWaitForEvents(readEvent));
    for (int i = 0; i < 100; i++) {
        System.out.println(out[i]);
    }
}
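One way to check the kernel's output is to run the same computation on the CPU and compare. A sketch with a made-up helper name, sumPairs, that mirrors the kernel body above; with in[i] = i as in the listing, each result should be 2i + (2i + 1) = 4i + 1.

```java
public class SumReference {

    /** CPU version of the kernel: result[i] = a[2i] + a[2i + 1] for i < size. */
    static float[] sumPairs(float[] a, int size) {
        float[] result = new float[size];
        for (int itemId = 0; itemId < size; itemId++) {
            result[itemId] = a[itemId * 2] + a[itemId * 2 + 1];
        }
        return result;
    }

    public static void main(String[] args) {
        float[] in = new float[200];
        for (int i = 0; i < in.length; i++) {
            in[i] = i;
        }
        float[] expected = sumPairs(in, 100);
        System.out.println("first: " + expected[0]);
        System.out.println("last:  " + expected[99]);
    }
}
```

Note that the earlier listings filled the input with in[i]=i and in[i+1]=i for i < 100, which leaves most of the upper half of the 200-element array at zero; the loop over all 200 elements avoids that.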
Title: Re: OpenCL problem, out of stack space
Post by: petarts on October 02, 2016, 12:06:24
Okay, thank you. I will now look at what you have done and see what mistakes I made in my program.