Hello, I am trying the program here:
https://github.com/LWJGL/lwjgl3/blob/master/modules/core/src/test/java/org/lwjgl/demo/opencl/CLDemo.java
and on the line
"PointerBuffer platforms = stack.mallocPointer(pi.get(0));"
it gives me
"Exception in thread "main" java.lang.OutOfMemoryError: Out of stack space."
Is there any way I can fix that?
What happens if you change the first line from:
IntBuffer pi = stack.mallocInt(1);
to
IntBuffer pi = stack.callocInt(1);
"if ( pi.get(0) == 0 )
throw new RuntimeException("No OpenCL platforms found.");"
this code throws an exception
Sounds like a broken OpenCL ICD loader. The error check on clGetPlatformIDs passes (it returns CL_SUCCESS), but it doesn't write anything to the IntBuffer. Very weird.
Actually, I have removed all the checkCLError calls because Eclipse tells me
"The method checkCLError(int) is undefined for the type Main"
The InfoUtil class that contains the checkCLError method is here: https://github.com/LWJGL/lwjgl3/blob/master/modules/core/src/test/java/org/lwjgl/opencl/InfoUtil.java
It tells me
"The import org.lwjgl.opencl.InfoUtil cannot be resolved"
I am using 3.0.0 build 90,
and I just redownloaded it to see if there was a problem with the previous download, but the same thing happens.
When I looked in the jar, that class is missing, but I don't know why.
The InfoUtil class is not part of LWJGL. It's only used in the LWJGL tests.
After adding that, it gives me this output:
"#
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffb716d14ea, pid=4264, tid=8068
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# C [OpenCL.dll+0x14ea]
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# D:\Desktop\Eclipse_school\Island_Domination-TEST\hs_err_pid4264.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#"
I probably should've sent the code I'm testing sooner, but here you go:
public class Main{
@SuppressWarnings("unused")
public static void main(String[] args){
//Game g=new Game("sample");
Configuration.OPENCL_EXPLICIT_INIT.set(true);
CL.create();
try(MemoryStack stack=stackPush()){
IntBuffer pi= stack.mallocInt(1);
InfoUtil.checkCLError(clGetPlatformIDs(null, pi));
if ( pi.get(0) == 0 )
throw new RuntimeException("No OpenCL platforms found.");
PointerBuffer platforms=stack.mallocPointer(pi.get(0));
InfoUtil.checkCLError(clGetPlatformIDs(platforms, (IntBuffer)null));
long platform=platforms.get(0);
PointerBuffer devices=stack.mallocPointer(pi.get(0));
long device=devices.get(0);
CLContextCallback contextCB;
PointerBuffer ctxProps = stack.mallocPointer(3);
ctxProps.put(0, CL_CONTEXT_PLATFORM).put(2, 0);
ctxProps.put(1,platform);
IntBuffer errcode_ret = stack.callocInt(1);
long context= clCreateContext(ctxProps, device, contextCB = CLContextCallback.create((errinfo, private_info, cb, user_data) -> {
System.err.println("[LWJGL] cl_context_callback");
System.err.println("\tInfo: " + memUTF8(errinfo));
}), NULL, errcode_ret);
long que=clCreateCommandQueue(context, device, NULL, errcode_ret);
CharSequence add=
"_kernel void sum(_global const float* a, _global float* result, int const size) {\n"+
" const int itemId = get_global_id(0); \n"+
" if(itemId < size) {\n"+
" result[itemId] = a[itemId*2] + a[itemId*2+1];\n"+
" }\n"+
"}";
long sumProgram=CL10.clCreateProgramWithSource(context, add, null);
long sumKernel=CL10.clCreateKernel(sumProgram, "sum", (int[])null);
float[] in=new float[200];
float[] out=new float[100];
for(int i=0;i<100;i++){
in[i]=i;
in[i+1]=i;
}
CL10.clSetKernelArg(sumKernel,0,in);
CL10.clSetKernelArg(sumKernel,1,out);
CL10.clSetKernelArg(sumKernel, 2, 100);
PointerBuffer globalWorkSize = BufferUtils.createPointerBuffer(1);
globalWorkSize.put(0, 100);
clEnqueueNDRangeKernel(que, sumKernel, 1, null, globalWorkSize, null, null, null);
CL10.clFinish(que);
for(int i=0;i<100;i++){
System.out.println(out[i]);
}
}
}
}
You are missing a few essential OpenCL calls in your code, which looks nothing like the referenced
https://github.com/LWJGL/lwjgl3/blob/master/modules/core/src/test/java/org/lwjgl/demo/opencl/CLDemo.java
which you said you had problems with.
Have a look at
https://github.com/LWJGL/lwjgl3/blob/master/modules/core/src/test/java/org/lwjgl/demo/opencl/Mandelbrot.java
which also uses kernels.
OK, I have now used the CLDemo class as the base for my class, removed anything that seemed unnecessary, and put in the things I need, instead of doing it the other way around (starting from the old tutorial and then trying to repair it). But now I have bumped into another problem. Here's the new code:
/*
* Copyright LWJGL. All rights reserved.
* License terms: https://www.lwjgl.org/license
*/
package Main;
import org.lwjgl.BufferUtils;
import org.lwjgl.PointerBuffer;
import org.lwjgl.opencl.*;
import org.lwjgl.system.MemoryStack;
import java.nio.IntBuffer;
import static org.lwjgl.opencl.CL10.*;
import static Main.InfoUtil.*;
import static org.lwjgl.system.MemoryStack.*;
import static org.lwjgl.system.MemoryUtil.*;
public final class CLDemo {
private CLDemo() {
}
public static void main(String[] args) {
try ( MemoryStack stack = stackPush() ) {
demo(stack);
}
}
private static void demo(MemoryStack stack) {
IntBuffer pi = stack.mallocInt(1);
checkCLError(clGetPlatformIDs(null, pi));
if ( pi.get(0) == 0 )
throw new RuntimeException("No OpenCL platforms found.");
PointerBuffer platforms = stack.mallocPointer(pi.get(0));
checkCLError(clGetPlatformIDs(platforms, (IntBuffer)null));
PointerBuffer ctxProps = stack.mallocPointer(3);
ctxProps
.put(0, CL_CONTEXT_PLATFORM)
.put(2, 0);
IntBuffer errcode_ret = stack.callocInt(1);
long platform = platforms.get(0);
ctxProps.put(1, platform);
CLCapabilities platformCaps = CL.createPlatformCapabilities(platform);
checkCLError(clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, null, pi));
PointerBuffer devices = stack.mallocPointer(pi.get(0));
checkCLError(clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, devices, (IntBuffer)null));
long device = devices.get(0);
CLCapabilities caps = CL.createDeviceCapabilities(device, platformCaps);
CLContextCallback contextCB;
long context = clCreateContext(ctxProps, device, contextCB = CLContextCallback.create((errinfo, private_info, cb, user_data) -> {
System.err.println("[LWJGL] cl_context_callback");
System.err.println("\tInfo: " + memUTF8(errinfo));
}), NULL, errcode_ret);
checkCLError(errcode_ret);
long que=clCreateCommandQueue(context, device, NULL, errcode_ret);
CharSequence add=
"_kernel void sum(_global const float* a, _global float* result, int const size) {\n"+
" const int itemId = get_global_id(0); \n"+
" if(itemId < size) {\n"+
" result[itemId] = a[itemId*2] + a[itemId*2+1];\n"+
" }\n"+
"}";
long sumProgram=CL10.clCreateProgramWithSource(context, add, null);
long sumKernel=CL10.clCreateKernel(sumProgram, "sum", (int[])null);
float[] in=new float[200];
float[] out=new float[100];
for(int i=0;i<100;i++){
in[i]=i;
in[i+1]=i;
}
CL10.clSetKernelArg(sumKernel,0,in);
CL10.clSetKernelArg(sumKernel,1,out);
CL10.clSetKernelArg(sumKernel, 2, 100);
PointerBuffer globalWorkSize = BufferUtils.createPointerBuffer(1);
globalWorkSize.put(0, 100);
clEnqueueNDRangeKernel(que, sumKernel, 1, null, globalWorkSize, null, null, null);
CL10.clFinish(que);
for(int i=0;i<100;i++){
System.out.println(out[i]);
}
}
}
Here's the error:
Exception in thread "main" java.lang.NullPointerException
at org.lwjgl.system.Checks.checkPointer(Checks.java:103)
at org.lwjgl.opencl.CL10.clSetKernelArg(CL10.java:8055)
at Main.CLDemo.demo(CLDemo.java:76)
at Main.CLDemo.main(CLDemo.java:25)
OpenCL is not a simple API and you won't go far without heavy reading of the specification. You should start by making small changes to existing demos. The code above has considerable changes that don't make sense and is missing critical functionality:
- You are not building the program.
- You are not checking for compilation errors.
- The keywords are kernel and global (or __kernel and __global, with two leading underscores), not _kernel and _global.
- You're trying to pass Java float arrays as arguments to the kernel. This is never going to work. You need to create cl_mem objects and set those as the kernel arguments.
- clSetKernelArg is a hard function to use properly and is kind of a special case in LWJGL (has a ton of overloads for convenience). Make sure you properly understand what it does and use the correct overload (hint for the first two arguments: use clSetKernelArg1p(sumKernel, index, cl_mem_object)).
I would like to use OpenCL to optimize one library, J3dBool (UnBBoolean), as it uses the CPU to calculate 3D objects and becomes pretty slow after some changes. I feel like the code I have now, once I modify it to work, might be good ENOUGH to optimize that library.
Also, I didn't use a float buffer because it looked like a normal float array could work. I guess I was wrong, a lot.
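For reference, the host data handed to clCreateBuffer in the later code lives in a direct, native-ordered FloatBuffer. The sketch below shows how a float[] can be staged into such a buffer using only java.nio, and how the byte size of an output buffer works out (100 floats is 400 bytes). The class name HostStaging and its helpers are made up for illustration, and the assumption that this is roughly what BufferUtils.createFloatBuffer does internally has not been checked against the LWJGL source.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper, not part of LWJGL.
final class HostStaging {

    /** Copies a Java float[] into a direct, native-ordered FloatBuffer ready for reading. */
    static FloatBuffer toDirectBuffer(float[] data) {
        FloatBuffer buf = ByteBuffer
            .allocateDirect(data.length * Float.BYTES) // off-heap storage
            .order(ByteOrder.nativeOrder())            // match the platform's byte order
            .asFloatBuffer();
        buf.put(data);
        buf.flip(); // position back to 0, limit = number of floats written
        return buf;
    }

    /** Size in bytes of a buffer holding floatCount floats. */
    static long byteSize(int floatCount) {
        return (long) floatCount * Float.BYTES;
    }

    public static void main(String[] args) {
        FloatBuffer in = toDirectBuffer(new float[200]);
        System.out.println(in.isDirect() + " " + in.remaining() + " " + byteSize(100));
    }
}
```

A buffer like this only holds the host copy of the data. The kernel arguments themselves still have to be cl_mem handles created from it.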
I don't know how familiar you are with constructive solid geometry, but from my point of view CSG is not a good fit for SIMD (the computation model of OpenCL), because the algorithms are complex and inherently non-parallelizable at certain points. Therefore you will likely not see any performance gains there.
Did you think about how you are actually going to submit and read the mesh data with OpenCL (in what form you are representing the geometry), what your units of work are, and how you can make use of data parallelization (the only kind of parallelization that SIMD applies to)?
In my opinion, it would be much more worthwhile improving/optimizing the CPU implementation.
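To make "unit of work" and "data parallelization" concrete, here is a small CPU-only sketch. It assumes a hypothetical mesh layout (vertices packed as x, y, z in one flat float array) and treats each vertex as an independent unit of work, which is the shape of problem that maps well to SIMD. It is not taken from J3dBool; the class and method names are invented for illustration.

```java
import java.util.stream.IntStream;

// Hypothetical example: one work item per vertex, no dependencies between items.
final class DataParallelSketch {

    /** Translates every vertex of a packed x,y,z array by (dx, dy, dz). */
    static float[] translate(float[] xyz, float dx, float dy, float dz) {
        int vertexCount = xyz.length / 3;
        float[] out = new float[xyz.length];
        // Each index v touches only its own three floats, so the
        // iterations can run in any order and on any thread.
        IntStream.range(0, vertexCount).parallel().forEach(v -> {
            int base = v * 3;
            out[base]     = xyz[base]     + dx;
            out[base + 1] = xyz[base + 1] + dy;
            out[base + 2] = xyz[base + 2] + dz;
        });
        return out;
    }

    public static void main(String[] args) {
        float[] moved = translate(new float[] {0f, 0f, 0f, 1f, 2f, 3f}, 1f, 1f, 1f);
        System.out.println(moved[0] + " " + moved[3] + " " + moved[5]);
    }
}
```

Steps where one item's result depends on another's (as in many CSG algorithms) do not fit this pattern without restructuring.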
The thing is, I am trying to use it in a game and it's too slow to use in-game. After I learn how to use OpenCL, I will try to optimize as much as I can in that library with it. I will possibly throw some things out, as I'm not using them (the colors of the objects).
When I read the javadoc for clBuildProgram, it says
"user_data can be NULL."
but I don't know how to give it null, so I give it zero, but it gives me an error:
/*
* Copyright LWJGL. All rights reserved.
* License terms: https://www.lwjgl.org/license
*/
package Main;
import org.lwjgl.BufferUtils;
import org.lwjgl.PointerBuffer;
import org.lwjgl.opencl.*;
import org.lwjgl.system.MemoryStack;
import java.nio.FloatBuffer;
import java.nio.IntBuffer;
import static org.lwjgl.opencl.CL10.*;
import static Main.InfoUtil.*;
import static org.lwjgl.system.MemoryStack.*;
import static org.lwjgl.system.MemoryUtil.*;
public final class CLDemo {
private CLDemo() {
}
public static void main(String[] args) {
try ( MemoryStack stack = stackPush() ) {
demo(stack);
}
}
private static void demo(MemoryStack stack) {
IntBuffer pi = stack.mallocInt(1);
checkCLError(clGetPlatformIDs(null, pi));
if ( pi.get(0) == 0 )
throw new RuntimeException("No OpenCL platforms found.");
PointerBuffer platforms = stack.mallocPointer(pi.get(0));
checkCLError(clGetPlatformIDs(platforms, (IntBuffer)null));
PointerBuffer ctxProps = stack.mallocPointer(3);
ctxProps
.put(0, CL_CONTEXT_PLATFORM)
.put(2, 0);
IntBuffer errcode_ret = stack.callocInt(1);
long platform = platforms.get(0);
ctxProps.put(1, platform);
CLCapabilities platformCaps = CL.createPlatformCapabilities(platform);
checkCLError(clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, null, pi));
PointerBuffer devices = stack.mallocPointer(pi.get(0));
checkCLError(clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, devices, (IntBuffer)null));
long device = devices.get(0);
CLCapabilities caps = CL.createDeviceCapabilities(device, platformCaps);
CLContextCallback contextCB;
long context = clCreateContext(ctxProps, device, contextCB = CLContextCallback.create((errinfo, private_info, cb, user_data) -> {
System.err.println("[LWJGL] cl_context_callback");
System.err.println("\tInfo: " + memUTF8(errinfo));
}), NULL, errcode_ret);
checkCLError(errcode_ret);
long que=clCreateCommandQueue(context, device, NULL, errcode_ret);
CharSequence add=
"_kernel void sum(_global const float* a, _global float* result, int const size) {\n"+
" const int itemId = get_global_id(0); \n"+
" if(itemId < size) {\n"+
" result[itemId] = a[itemId*2] + a[itemId*2+1];\n"+
" }\n"+
"}";
long sumProgram=CL10.clCreateProgramWithSource(context, add, null);
int error = CL10.clBuildProgram(sumProgram, devices.get(0), "", null,0);
checkCLError(error);
long sumKernel=CL10.clCreateKernel(sumProgram, "sum", (int[])null);
float[] in=new float[200];
float[] out=new float[100];
for(int i=0;i<100;i++){
in[i]=i;
in[i+1]=i;
}
FloatBuffer aBuff = BufferUtils.createFloatBuffer(200);
aBuff.put(in);
aBuff.rewind();
IntBuffer errorBuff = BufferUtils.createIntBuffer(1); // Error buffer
long _in = CL10.clCreateBuffer(context, CL10.CL_MEM_WRITE_ONLY | CL10.CL_MEM_COPY_HOST_PTR, aBuff, errorBuff);
checkCLError(errorBuff.get(0));
long _out = CL10.clCreateBuffer(context, CL10.CL_MEM_READ_ONLY, 400, errorBuff);
checkCLError(errorBuff.get(0));
CL10.clSetKernelArg1p(sumKernel,0,_in);
CL10.clSetKernelArg1p(sumKernel,1,_out);
CL10.clSetKernelArg1p(sumKernel, 2, 100);
PointerBuffer globalWorkSize = BufferUtils.createPointerBuffer(1);
globalWorkSize.put(0, 100);
clEnqueueNDRangeKernel(que, sumKernel, 1, null, globalWorkSize, null, null, null);
CL10.clFinish(que);
for(int i=0;i<100;i++){
System.out.println(out[i]);
}
}
}
Exception in thread "main" java.lang.RuntimeException: OpenCL error [0xFFFFFFF5]
at Main.InfoUtil.checkCLError(InfoUtil.java:130)
at Main.CLDemo.demo(CLDemo.java:71)
at Main.CLDemo.main(CLDemo.java:26)
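The value in brackets is the raw cl_int printed as unsigned hex. Read as a signed 32-bit integer, 0xFFFFFFF5 is -11, which the Khronos cl.h header defines as CL_BUILD_PROGRAM_FAILURE. A minimal plain-Java sketch of that decoding follows. The class name ClErrorDecode is made up for illustration, and the lookup covers only a handful of codes, not the full list.

```java
// Hypothetical helper for reading OpenCL error codes printed as hex.
final class ClErrorDecode {

    /** Parses an unsigned 32-bit hex string such as "FFFFFFF5" into a signed int. */
    static int toSigned(String hex) {
        return Integer.parseUnsignedInt(hex, 16);
    }

    /** Names a few OpenCL 1.0 error codes (values as defined in cl.h). */
    static String name(int code) {
        switch (code) {
            case 0:   return "CL_SUCCESS";
            case -1:  return "CL_DEVICE_NOT_FOUND";
            case -11: return "CL_BUILD_PROGRAM_FAILURE";
            case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
            case -46: return "CL_INVALID_KERNEL_NAME";
            default:  return "unknown (" + code + ")";
        }
    }

    public static void main(String[] args) {
        int code = toSigned("FFFFFFF5");
        System.out.println(code + " = " + name(code)); // prints "-11 = CL_BUILD_PROGRAM_FAILURE"
    }
}
```

When the build fails like this, the compiler's build log (queried through clGetProgramBuildInfo with CL_PROGRAM_BUILD_LOG) normally says which part of the kernel source was rejected.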
You should re-read Spasi's last post.
Long story short, here is a working version of your reduction program:
private static void demo(MemoryStack stack) {
    IntBuffer counts = stack.mallocInt(1);
    checkCLError(clGetPlatformIDs(null, counts));
    int platformCount = counts.get(0);
    if (platformCount == 0)
        throw new RuntimeException("No OpenCL platforms found.");
    PointerBuffer platforms = stack.mallocPointer(platformCount);
    checkCLError(clGetPlatformIDs(platforms, (IntBuffer) null));
    PointerBuffer ctxProps = stack.mallocPointer(3);
    ctxProps.put(0, CL_CONTEXT_PLATFORM).put(2, 0);
    IntBuffer errcode_ret = stack.callocInt(1);
    long platform = platforms.get(0);
    ctxProps.put(1, platform);
    checkCLError(clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, null, counts));
    int deviceCount = counts.get(0);
    if (deviceCount == 0)
        throw new RuntimeException("No OpenCL devices found.");
    PointerBuffer devices = stack.mallocPointer(deviceCount);
    checkCLError(clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, devices, (IntBuffer) null));
    long device = devices.get(0);
    long context = clCreateContext(ctxProps, device, null, NULL, errcode_ret);
    checkCLError(errcode_ret);
    long que = clCreateCommandQueue(context, device, NULL, errcode_ret);
    checkCLError(errcode_ret);
    CharSequence add =
        "kernel void sum(global const float* a, global float* result, int const size) {\n"+ // <- 'kernel' and 'global' !
        " const int itemId = get_global_id(0); \n"+
        " if(itemId < size) {\n"+
        " result[itemId] = a[itemId*2] + a[itemId*2+1];\n"+
        " }\n"+
        "}";
    long sumProgram = CL10.clCreateProgramWithSource(context, add, null);
    checkCLError(CL10.clBuildProgram(sumProgram, devices.get(0), "", null,0));
    checkCLError(errcode_ret);
    long sumKernel = CL10.clCreateKernel(sumProgram, "sum", errcode_ret);
    checkCLError(errcode_ret);
    float[] in = new float[200];
    float[] out = new float[100];
    for (int i = 0; i < 200; i++) {
        in[i] = i;
    }
    FloatBuffer aBuff = stack.mallocFloat(200);
    aBuff.put(in).rewind();
    long _in = CL10.clCreateBuffer(context, CL10.CL_MEM_READ_ONLY | CL10.CL_MEM_COPY_HOST_PTR, aBuff, errcode_ret); // <- READ_ONLY !
    checkCLError(errcode_ret);
    long _out = CL10.clCreateBuffer(context, CL10.CL_MEM_READ_WRITE, 400, errcode_ret); // <- READ_WRITE !
    checkCLError(errcode_ret);
    checkCLError(CL10.clSetKernelArg1p(sumKernel, 0, _in));
    checkCLError(CL10.clSetKernelArg1p(sumKernel, 1, _out));
    checkCLError(CL10.clSetKernelArg1i(sumKernel, 2, 100)); // <- clSetKernelArg1i !
    PointerBuffer globalWorkSize = stack.mallocPointer(1);
    globalWorkSize.put(0, 100);
    PointerBuffer kernelEvent = stack.mallocPointer(1);
    checkCLError(clEnqueueNDRangeKernel(que, sumKernel, 1, null, globalWorkSize, null, null, kernelEvent));
    PointerBuffer readEvent = stack.mallocPointer(1);
    checkCLError(clEnqueueReadBuffer(que, _out, 1, 0, out, kernelEvent, readEvent)); // <- read back results !
    checkCLError(clWaitForEvents(readEvent));
    for (int i = 0; i < 100; i++) {
        System.out.println(out[i]);
    }
}
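To check the numbers this prints, a CPU reference of the same computation can help. The sketch below mirrors the kernel body in plain Java, with one loop iteration standing in for one work item. With in[i] = i, each result is 2i + (2i + 1) = 4i + 1, so the first value should be 1.0 and the last 397.0. The class name SumReference is made up for illustration.

```java
// Hypothetical CPU reference for the "sum" kernel; no OpenCL involved.
final class SumReference {

    /** result[itemId] = a[itemId*2] + a[itemId*2+1], for itemId in [0, size). */
    static float[] pairwiseSum(float[] a, int size) {
        float[] result = new float[size];
        for (int itemId = 0; itemId < size; itemId++) {
            result[itemId] = a[itemId * 2] + a[itemId * 2 + 1];
        }
        return result;
    }

    public static void main(String[] args) {
        float[] in = new float[200];
        for (int i = 0; i < in.length; i++) {
            in[i] = i;
        }
        float[] out = pairwiseSum(in, 100);
        System.out.println(out[0] + " " + out[1] + " " + out[99]); // prints "1.0 5.0 397.0"
    }
}
```

If the OpenCL output differs from this, the usual suspects are the input fill loop, the buffer flags, or a missing read-back.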
Okay, thank you. I will now look at what you have done and see what mistakes I made in my program.