OpenCL problem, out of stack space [solved]

Started by petarts, September 30, 2016, 19:37:05

petarts

Hello, I am trying to run the program here:
https://github.com/LWJGL/lwjgl3/blob/master/modules/core/src/test/java/org/lwjgl/demo/opencl/CLDemo.java
On the line
"PointerBuffer platforms = stack.mallocPointer(pi.get(0));"
it gives me
"Exception in thread "main" java.lang.OutOfMemoryError: Out of stack space."
Is there any way I can fix that?

spasi

What happens if you change the first line from:

IntBuffer pi = stack.mallocInt(1);

to

IntBuffer pi = stack.callocInt(1);

petarts

"if ( pi.get(0) == 0 )
throw new RuntimeException("No OpenCL platforms found.");"
Now this code throws an exception.

spasi

Sounds like a broken OpenCL ICD loader. The error check on clGetPlatformIDs passes (it returns CL_SUCCESS), but it doesn't write anything to the IntBuffer. Very weird.
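
A possible explanation of the original "Out of stack space" error, sketched under assumptions: mallocInt does not zero the memory, so if clGetPlatformIDs never writes the count, pi.get(0) is whatever happened to be on the stack, and mallocPointer(pi.get(0)) may request far more than the stack holds. The helper below is hypothetical (it is not an LWJGL method), and the 64 KB stack size and 8-byte pointer size are assumed values; check the stack size setting of your LWJGL build.

```java
// Hypothetical helper, not part of LWJGL: would `count` pointers fit on the stack?
final class StackBudget {
    static final long ASSUMED_STACK_BYTES = 64L * 1024; // assumption: 64 KB stack
    static final int ASSUMED_POINTER_BYTES = 8;         // assumption: 64-bit JVM

    static boolean fitsOnStack(int count) {
        // A negative count can only come from uninitialised memory.
        return count >= 0
            && (long) count * ASSUMED_POINTER_BYTES <= ASSUMED_STACK_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(fitsOnStack(2));          // a realistic platform count
        System.out.println(fitsOnStack(0x7F3A9C10)); // an arbitrary garbage value
    }
}
```

With callocInt the int starts at zero, so a loader that writes nothing shows up as "No OpenCL platforms found." instead of an oversized allocation.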

petarts

Actually, I have removed all the checkCLError calls because Eclipse tells me
"The method checkCLError(int) is undefined for the type Main"

spasi

This is the InfoUtil class that contains the checkCLError method.

petarts

It tells me
"The import org.lwjgl.opencl.InfoUtil cannot be resolved"
I am using 3.0.0 build 90.
I just redownloaded it to see if there was a problem with the previous download, but the same thing happens.
When I looked in the jar, that class is missing, but I don't know why.

spasi

The InfoUtil class is not part of LWJGL. It's only used in the LWJGL tests.
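
Since the helper has to live in your own project, here is a minimal stand-in, written as a sketch rather than a copy of the test class. The class name ClErrorCheck is made up; the only fact it relies on is that CL_SUCCESS is 0 in the OpenCL specification.

```java
import java.nio.IntBuffer;

// Hypothetical stand-in for the test-only helper; put it in your own package.
final class ClErrorCheck {
    static final int CL_SUCCESS = 0; // value defined by the OpenCL specification

    // For functions that return the error code directly.
    static void checkCLError(int errcode) {
        if (errcode != CL_SUCCESS) {
            throw new RuntimeException(String.format("OpenCL error [%d]", errcode));
        }
    }

    // For functions that report the error through an errcode_ret buffer.
    static void checkCLError(IntBuffer errcode) {
        checkCLError(errcode.get(errcode.position()));
    }
}
```

Wrapping every OpenCL call in such a check makes a failing call stop the program at once, instead of passing an invalid handle to the next call.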

petarts

After adding that, it gives me this information:
"#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffb716d14ea, pid=4264, tid=8068
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# C  [OpenCL.dll+0x14ea]
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# D:\Desktop\Eclipse_school\Island_Domination-TEST\hs_err_pid4264.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#"
I probably should've sent the code I'm testing sooner, but here you go:
public class Main{
	@SuppressWarnings("unused")
	public static void main(String[] args){
		//Game g=new Game("sample");
		Configuration.OPENCL_EXPLICIT_INIT.set(true);
		CL.create();
		try(MemoryStack stack=stackPush()){
			IntBuffer pi= stack.mallocInt(1);
			InfoUtil.checkCLError(clGetPlatformIDs(null, pi));
			if ( pi.get(0) == 0 )
				throw new RuntimeException("No OpenCL platforms found.");
			PointerBuffer platforms=stack.mallocPointer(pi.get(0));
			InfoUtil.checkCLError(clGetPlatformIDs(platforms, (IntBuffer)null));
			long platform=platforms.get(0);
			PointerBuffer devices=stack.mallocPointer(pi.get(0));
			long device=devices.get(0);
			CLContextCallback contextCB;
			PointerBuffer ctxProps = stack.mallocPointer(3);
			ctxProps.put(0, CL_CONTEXT_PLATFORM).put(2, 0);
			ctxProps.put(1,platform);
			IntBuffer errcode_ret = stack.callocInt(1);
			long context= clCreateContext(ctxProps, device, contextCB = CLContextCallback.create((errinfo, private_info, cb, user_data) -> {
				System.err.println("[LWJGL] cl_context_callback");
				System.err.println("\tInfo: " + memUTF8(errinfo));
				}), NULL, errcode_ret);
			long que=clCreateCommandQueue(context, device, NULL, errcode_ret);
			CharSequence add=
			"_kernel void sum(_global const float* a, _global float* result, int const size) {\n"+
			"	const int itemId = get_global_id(0); \n"+
			"	if(itemId < size) {\n"+
			"		result[itemId] = a[itemId*2] + a[itemId*2+1];\n"+
			"	}\n"+
			"}";
			long sumProgram=CL10.clCreateProgramWithSource(context, add, null);
			long sumKernel=CL10.clCreateKernel(sumProgram, "sum", (int[])null);
			float[] in=new float[200];
			float[] out=new float[100];
			for(int i=0;i<100;i++){
				in[i]=i;
				in[i+1]=i;
			}
			CL10.clSetKernelArg(sumKernel,0,in);
			CL10.clSetKernelArg(sumKernel,1,out);
			CL10.clSetKernelArg(sumKernel, 2, 100);
			PointerBuffer globalWorkSize = BufferUtils.createPointerBuffer(1);
			globalWorkSize.put(0, 100);
			clEnqueueNDRangeKernel(que, sumKernel, 1, null, globalWorkSize, null, null, null);
			CL10.clFinish(que);
			for(int i=0;i<100;i++){
				System.out.println(out[i]);
			}
		}
	}
}

Kai

You are missing a few essential OpenCL calls in your code, which looks nothing like the referenced
  https://github.com/LWJGL/lwjgl3/blob/master/modules/core/src/test/java/org/lwjgl/demo/opencl/CLDemo.java
which you said you had problems with.
Have a look at
  https://github.com/LWJGL/lwjgl3/blob/master/modules/core/src/test/java/org/lwjgl/demo/opencl/Mandelbrot.java
which also uses kernels.

petarts

OK, I have now used the CLDemo class as the base for my class, removed anything that seemed unnecessary, and added the things I need, instead of doing it the other way around (starting from the old tutorial and then trying to repair it). But now I have run into another problem. Here's the new code:
/*
 * Copyright LWJGL. All rights reserved.
 * License terms: https://www.lwjgl.org/license
 */
package Main;

import org.lwjgl.BufferUtils;
import org.lwjgl.PointerBuffer;
import org.lwjgl.opencl.*;
import org.lwjgl.system.MemoryStack;

import java.nio.IntBuffer;
import static org.lwjgl.opencl.CL10.*;
import static Main.InfoUtil.*;
import static org.lwjgl.system.MemoryStack.*;
import static org.lwjgl.system.MemoryUtil.*;

public final class CLDemo {

	private CLDemo() {
	}

	public static void main(String[] args) {
		try ( MemoryStack stack = stackPush() ) {
			demo(stack);
		}
	}

	private static void demo(MemoryStack stack) {
		IntBuffer pi = stack.mallocInt(1);
		checkCLError(clGetPlatformIDs(null, pi));
		if ( pi.get(0) == 0 )
			throw new RuntimeException("No OpenCL platforms found.");
		PointerBuffer platforms = stack.mallocPointer(pi.get(0));
		checkCLError(clGetPlatformIDs(platforms, (IntBuffer)null));

		PointerBuffer ctxProps = stack.mallocPointer(3);
		ctxProps
			.put(0, CL_CONTEXT_PLATFORM)
			.put(2, 0);

		IntBuffer errcode_ret = stack.callocInt(1);
		long platform = platforms.get(0);
		ctxProps.put(1, platform);

		CLCapabilities platformCaps = CL.createPlatformCapabilities(platform);

		checkCLError(clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, null, pi));

		PointerBuffer devices = stack.mallocPointer(pi.get(0));
		checkCLError(clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, devices, (IntBuffer)null));
		long device = devices.get(0);
		CLCapabilities caps = CL.createDeviceCapabilities(device, platformCaps);
		CLContextCallback contextCB;
		long context = clCreateContext(ctxProps, device, contextCB = CLContextCallback.create((errinfo, private_info, cb, user_data) -> {
			System.err.println("[LWJGL] cl_context_callback");
			System.err.println("\tInfo: " + memUTF8(errinfo));
		}), NULL, errcode_ret);
		checkCLError(errcode_ret);
		long que=clCreateCommandQueue(context, device, NULL, errcode_ret);
		CharSequence add=
		"_kernel void sum(_global const float* a, _global float* result, int const size) {\n"+
		"	const int itemId = get_global_id(0); \n"+
		"	if(itemId < size) {\n"+
		"		result[itemId] = a[itemId*2] + a[itemId*2+1];\n"+
		"	}\n"+
		"}";
		long sumProgram=CL10.clCreateProgramWithSource(context, add, null);
		long sumKernel=CL10.clCreateKernel(sumProgram, "sum", (int[])null);
		float[] in=new float[200];
		float[] out=new float[100];
		for(int i=0;i<100;i++){
			in[i]=i;
			in[i+1]=i;
		}
		CL10.clSetKernelArg(sumKernel,0,in);
		CL10.clSetKernelArg(sumKernel,1,out);
		CL10.clSetKernelArg(sumKernel, 2, 100);
		PointerBuffer globalWorkSize = BufferUtils.createPointerBuffer(1);
		globalWorkSize.put(0, 100);
		clEnqueueNDRangeKernel(que, sumKernel, 1, null, globalWorkSize, null, null, null);
		CL10.clFinish(que);
		for(int i=0;i<100;i++){
			System.out.println(out[i]);
		}
				
	}


}

here's the error:

Exception in thread "main" java.lang.NullPointerException
   at org.lwjgl.system.Checks.checkPointer(Checks.java:103)
   at org.lwjgl.opencl.CL10.clSetKernelArg(CL10.java:8055)
   at Main.CLDemo.demo(CLDemo.java:76)
   at Main.CLDemo.main(CLDemo.java:25)

spasi

OpenCL is not a simple API, and you won't get far without heavy reading of the specification. You should start by making small changes to existing demos. The code above has considerable changes that don't make sense, and it is missing critical functionality:

- You are not building the program.
- You are not checking for compilation errors.
- The keywords are kernel and global, not _kernel and _global.
- You're trying to pass Java float arrays as arguments to the kernel. This is never going to work. You need to create cl_mem objects and set those as the kernel arguments.
- clSetKernelArg is a hard function to use properly and is kind of a special case in LWJGL (has a ton of overloads for convenience). Make sure you properly understand what it does and use the correct overload (hint for the first two arguments: use clSetKernelArg1p(sumKernel, index, cl_mem_object)).
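
As a starting point for the keyword fix, here is a sketch of the kernel source with valid qualifiers, next to a plain-Java version of the same computation that can be used to check the device result once the cl_mem plumbing works. The class and method names are made up for illustration, and nothing here calls OpenCL. The input fill in main writes both elements of each pair, which is a guess at what the posted loop intended (as posted, it only touches the first 101 elements).

```java
// Illustrative only: corrected kernel text plus a CPU reference for its result.
final class SumKernelReference {
    // OpenCL C accepts kernel/global (and __kernel/__global), not _kernel/_global.
    static final String SUM_KERNEL_SOURCE =
        "kernel void sum(global const float* a, global float* result, int const size) {\n" +
        "    const int itemId = get_global_id(0);\n" +
        "    if (itemId < size) {\n" +
        "        result[itemId] = a[itemId*2] + a[itemId*2+1];\n" +
        "    }\n" +
        "}";

    // Same computation on the CPU: one loop iteration per work-item.
    static float[] sumPairs(float[] a) {
        float[] result = new float[a.length / 2];
        for (int itemId = 0; itemId < result.length; itemId++) {
            result[itemId] = a[itemId * 2] + a[itemId * 2 + 1];
        }
        return result;
    }

    public static void main(String[] args) {
        float[] in = new float[200];
        for (int i = 0; i < 100; i++) {
            in[2 * i] = i;     // first element of pair i
            in[2 * i + 1] = i; // second element of pair i
        }
        float[] out = sumPairs(in);
        System.out.println(out[0] + " " + out[1] + " " + out[99]);
    }
}
```

With that input, each out[i] should equal 2*i, which gives a simple value to compare against what is read back from the device buffer.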

petarts

I would like to use OpenCL to optimize one library, J3dBool (UnBBoolean). It uses the CPU to calculate 3D objects and becomes pretty slow after some changes. I feel that the code I have now, once I modify it to work, might be good ENOUGH to optimize that library.
Also, I didn't use a float buffer because it looked like a normal float array could work. I guess I was wrong, a lot.

Kai

I don't know how familiar you are with constructive solid geometry, but from my point of view CSG is not a good fit for SIMD (the computation model of OpenCL), because the algorithms are complex and inherently non-parallelizable at certain points. You will therefore likely not see any performance gains there.
Did you think about how you are actually going to submit and read the mesh data with OpenCL (in what form you are representing the geometry), what your units of work are, and how you can make use of data parallelization (the only kind of parallelization that SIMD applies to)?
In my opinion, it would be much more worthwhile improving/optimizing the CPU implementation.
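
To make "unit of work" and "data parallelization" concrete, here is a small CPU-side sketch. It is purely illustrative and not taken from J3dBool: the step (signed distance of each vertex to a plane) and all names are assumptions. Each index reads only its own vertex and writes only its own output slot, which is the property a step needs before it can be spread across threads or work-items.

```java
import java.util.stream.IntStream;

// Illustrative only: a per-vertex step where every index is an independent unit of work.
final class DataParallelSketch {
    // xyz holds packed vertices (x0, y0, z0, x1, ...); the plane is n.p + d = 0.
    static float[] signedDistances(float[] xyz, float nx, float ny, float nz, float d) {
        int vertexCount = xyz.length / 3;
        float[] out = new float[vertexCount];
        // No iteration depends on another, so they may run in any order, in parallel.
        IntStream.range(0, vertexCount).parallel().forEach(i ->
            out[i] = nx * xyz[3 * i] + ny * xyz[3 * i + 1] + nz * xyz[3 * i + 2] + d);
        return out;
    }

    public static void main(String[] args) {
        float[] vertices = {0f, 0f, 1f,  0f, 0f, -2f,  5f, 5f, 0f};
        float[] dist = signedDistances(vertices, 0f, 0f, 1f, 0f); // plane z = 0
        for (float v : dist) {
            System.out.println(v);
        }
    }
}
```

Steps that have this shape can be tried with parallel streams on the CPU first; steps where one result depends on another are the non-parallelizable points mentioned above.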

petarts

The thing is, I am trying to use it in a game, and it is too slow for in-game use. Once I have learned how to use OpenCL, I will try to optimize as much of that library as I can with it. I will possibly throw out some things I'm not using, such as the colors of the objects.