Atomic counter not updated

Started by Imagyx, September 01, 2021, 05:53:56


Imagyx

Hello everyone.

I have a problem with an atomic counter in my compute shader.
It renders as it should and there are no syntax errors, but the counter doesn't seem to update or even get to the shader in the first place.

This is the code:
// Setup counter
		atomicBuffer = glGenBuffers();
		glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, atomicBuffer);
		glBufferData(GL_ATOMIC_COUNTER_BUFFER, 4, GL_DYNAMIC_COPY);
		List<Long> counters = new ArrayList<>();
		ByteBuffer counter = glMapBuffer(GL_ATOMIC_COUNTER_BUFFER, GL_WRITE_ONLY);
		counter.put((byte) 42);
		counter.rewind();
		glUnmapBuffer(GL_ATOMIC_COUNTER_BUFFER);
		glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 0, atomicBuffer);
	
		// compute (works fine)
		glUseProgram(computeProgram);		
		// Calculate 
		glDispatchCompute(numGroupsX, numGroupsY, 1);
		glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);

		// Get atomic counter value
		glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 0, atomicBuffer);
		ByteBuffer counter2 = ByteBuffer.allocate(4);
		glBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0, counter2);
		while (counter2.hasRemaining()) {
			long val = counter2.get();
			System.out.println(val);
			counters.add(val);
		}
		System.out.println(Arrays.toString(counters.toArray()));


The compute shader looks like this (it sets up the binding and increments the counter somewhere in the code; this compiles without errors, and the program definitely reaches the increment, otherwise I wouldn't get a correct image):
layout(binding = 0, rgba8) uniform image2D framebufferImage;
layout(local_size_x = 8, local_size_y = 8) in;
layout (binding = 1) uniform atomic_uint numIntersections;
...
...
atomicCounterIncrement(numIntersections);
...


The output is this:
0
0
0
0
[0, 0, 0, 0]


I guess this means the initial value of 42 doesn't reach the shader at all, isn't updated there, and isn't read back correctly.

Please help me. I've read through lots of the available material (https://www.khronos.org/opengl/wiki/Atomic_Counter, https://arm-software.github.io/opengl-es-sdk-for-android/compute_intro.html, the OpenGL Programming Guide, 8th Edition, ...),
but I can't spot anything from it that's missing here.

KaiHH

With this code:
ByteBuffer counter2 = ByteBuffer.allocate(4);
glBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0, counter2);

you are effectively uploading zeros into the GL buffer object bound to GL_ATOMIC_COUNTER_BUFFER and then you are iterating over your NIO ByteBuffer (with all zeros in it).
In order to read the contents of a GL buffer object, you should use glGetBufferSubData(...).
glBufferSubData(), on the other hand, _uploads_ host memory to the GL buffer object. It does not _read back_ the contents of the GL buffer object into host memory; that's what glGetBufferSubData() is for.
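
A minimal read-back sketch (assuming a current GL context, the atomicBuffer id from your code, and LWJGL's BufferUtils for a direct buffer; the variable names are just for illustration):
// Read the counter value back into host memory (a download, not an upload)
glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, atomicBuffer);
IntBuffer readback = BufferUtils.createIntBuffer(1); // direct buffer, one GLuint = 4 bytes
glGetBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0L, readback);
System.out.println("counter = " + readback.get(0));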

EDIT: Also, your barrier of `GL_SHADER_IMAGE_ACCESS_BARRIER_BIT` is wrong for the kind of access you are doing in the compute shader. You do not want the effect of accessing images to be visible after the compute shader invocation, but you want the effect of atomic variable access to be visible, so you must use GL_ATOMIC_COUNTER_BARRIER_BIT.
See: https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glMemoryBarrier.xhtml
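
As a sketch (using the names from your code above), the barrier after the dispatch would then look like:
glDispatchCompute(numGroupsX, numGroupsY, 1);
// make the atomic counter updates from the compute shader visible to what follows
glMemoryBarrier(GL_ATOMIC_COUNTER_BARRIER_BIT);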

EDIT2: Plus, you also bind to the wrong index/binding point with glBindBufferBase(). You bind to binding point index 0, whereas your compute shader uses binding point/location 1 for the atomic buffer.
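
In other words, the index argument of glBindBufferBase() must match the binding declared in the shader. A sketch of the matching pair, based on your declarations:
// GLSL side: layout(binding = 1) uniform atomic_uint numIntersections;
// Java side: bind the buffer to the same binding point index (1) before dispatching
glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 1, atomicBuffer);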

Imagyx

Hi KaiHH,

thank you very much for your fast reply.
As you can probably tell, I'm not really used to working with the GPU yet ;)

I have a few questions there after implementing your suggestions.
I understand that the use of glBufferSubData() is clearly wrong for reading values  :-[

What I don't understand is why I should bind to index 1 when the documentation of glBindBufferBase
says this is the "index of the binding point within the array specified by target" and target is GL_ATOMIC_COUNTER_BUFFER.
I set it to 1 for now, as you suggested.

About the barriers: I use GL_SHADER_IMAGE_ACCESS_BARRIER_BIT for the actual image output of the compute shader. Should I add GL_ATOMIC_COUNTER_BARRIER_BIT to it, or use GL_ALL_BARRIER_BITS like I do below?

I'm posting the full code of the compute function here (including the changes from your suggestions).
It all worked fine before the new atomic counter part :(

private void compute() {
		atomicBuffer = glGenBuffers();
		glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, atomicBuffer);
		glBufferData(GL_ATOMIC_COUNTER_BUFFER, 1, GL_DYNAMIC_COPY);
		List<Byte> counters = new ArrayList<>();
		ByteBuffer counter = glMapBuffer(GL_ATOMIC_COUNTER_BUFFER, GL_WRITE_ONLY);
		counter.put((byte) 42);
		counter.rewind();
		glUnmapBuffer(GL_ATOMIC_COUNTER_BUFFER);
		glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 0, atomicBuffer);

		//  Compute
		glUseProgram(computeProgram);		
		glDispatchCompute(numGroupsX, numGroupsY, 1);
		glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
		
		/* Display framebuffer texture (quadProgram uses only a vertex and fragment shader
		 * to display the output of the compute shader, which works well).
		 */
		glUseProgram(quadProgram);
		glDrawArrays(GL_TRIANGLES, 0, 3);
		glMemoryBarrier(GL_ALL_BARRIER_BITS);
		
		/* Get atomic counter values
		 * 
		 */
		// NOTE: with allocate(1) the program crashes without exception or error log; with any
		// bigger value I get "GL_INVALID_VALUE error generated. Offset and/or size is out of range."
		ByteBuffer counter2 = ByteBuffer.allocate(2);
		glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 1, atomicBuffer);
		glGetBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0, counter2);
		while (counter2.hasRemaining()) {
			byte val = counter2.get();
			System.out.println(val);
			counters.add(val);
		}
		System.out.println(Arrays.toString(counters.toArray()));
	}

	private void create_vao_fb_vs_fs_compute(int w, int h) {
		create_vao();
		create_framebuffer(w, h);
		create_vs_fs();
		create_compute();
	}

	private void create_vao() {
		glBindVertexArray(glGenVertexArrays());
	}

	private void create_framebuffer(int w, int h) {
		framebuffer = glGenTextures();
		glBindTexture(GL_TEXTURE_2D, framebuffer);
		glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
		glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
		glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, w, h);
		glBindImageTexture(0, framebuffer, 0, false, 0, GL_WRITE_ONLY, GL_RGBA8);
	}

	private void create_vs_fs() {
		quadProgram = glCreateProgram();
		vsShader = glCreateShader(GL_VERTEX_SHADER);
		glAttachShader(quadProgram, vsShader);
		fsShader = glCreateShader(GL_FRAGMENT_SHADER);
		glAttachShader(quadProgram, fsShader);
	}

	private void create_compute() {
		computeProgram = glCreateProgram();
		computeShader = glCreateShader(GL_COMPUTE_SHADER);
		glAttachShader(computeProgram, computeShader);
	}


Output is:

[LWJGL] OpenGL debug message
   ID: 0x501
   Source: API
   Type: ERROR
   Severity: HIGH
   Message: GL_INVALID_VALUE error generated. Offset and/or size is out of range.
0
0
[0, 0]

Am I at least a bit on the right track?


Edit: Another problem is that I'll probably need long values instead of bytes to count numIntersections in the compute shader, because there are at least a few billion of them....

KaiHH

Quote: I set it to 1 for now, as you suggested.
You changed the argument to 1 only for the second call, which does not influence this. You must also change the index on the _first_ call to glBindBufferBase(), the one that actually affects the glDispatchCompute().
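
A sketch of the ordering that matters (using the names from your code; the binding must be in place when glDispatchCompute() is issued):
glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 1, atomicBuffer); // must happen before the dispatch
glUseProgram(computeProgram);
glDispatchCompute(numGroupsX, numGroupsY, 1);
// a glBindBufferBase() call placed after the dispatch does not affect the dispatch above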

Quote: What I don't understand is why I should bind to index 1 when the documentation of glBindBufferBase says this is the "index of the binding point within the array specified by target" and target is GL_ATOMIC_COUNTER_BUFFER.
Sometimes, the GL documentation is horrible, and this is one example.
That binding point is the binding you specified in the shader.

Quote: About the barriers: I use GL_SHADER_IMAGE_ACCESS_BARRIER_BIT for the actual image output of the compute shader. Should I add GL_ATOMIC_COUNTER_BARRIER_BIT to it, or use GL_ALL_BARRIER_BITS like I do below?
If you want to _read back_ the values of that atomic buffer, then, yes, you should also include GL_ATOMIC_COUNTER_BARRIER_BIT. If you don't want/need to do that (for debugging) you can omit it.
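
If you need both effects, the bits can be OR'ed into a single call; as a sketch:
// make both the image writes and the atomic counter updates visible after the dispatch
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT | GL_ATOMIC_COUNTER_BARRIER_BIT);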

Imagyx

Sadly it still doesn't work and I don't know what else to do.
I tried to see if a value of 1 in
ByteBuffer counter = glMapBuffer(GL_ATOMIC_COUNTER_BUFFER, GL_WRITE_ONLY);
counter.put((byte) 1);

could be seen in the compute shader by writing it out as a color for the image output (1 should show up as white).
Even this step alone doesn't seem to work; I only get black.

KaiHH

Here is a super-minimal working example. You could work your way towards your solution:
import static org.lwjgl.opengl.GL43C.*;
import static org.lwjgl.system.MemoryStack.stackPush;
import java.nio.IntBuffer;
import org.lwjgl.system.MemoryStack;
import static org.lwjgl.glfw.GLFW.*;
import static org.lwjgl.opengl.GL.*;
public class ComputeShaderExample {
    public static void main(String[] args) {
        // create context
        glfwInit();
        glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 4);
        glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
        glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GLFW_TRUE);
        glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
        glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);
        long window = glfwCreateWindow(800, 600, "", 0L, 0L);
        glfwMakeContextCurrent(window);
        createCapabilities();

        // create buffer object for atomic counter
        int atomicBuffer = glGenBuffers();
        glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, atomicBuffer);
        glBufferData(GL_ATOMIC_COUNTER_BUFFER, Integer.BYTES, GL_STATIC_DRAW);
        try (MemoryStack stack = stackPush()) {
            // initialize counter with 100
            glBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0L, stack.ints(100));
        }
        glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, 0);

        // create compute shader
        int program = glCreateProgram();
        int cs = glCreateShader(GL_COMPUTE_SHADER);
        glShaderSource(cs, 
        "#version 430 core\n" +
        "layout(binding=1) uniform atomic_uint counter;\n" +
        "layout(local_size_x = 8, local_size_y = 8) in;\n" +
        "void main(void) {\n" +
        "  atomicCounterIncrement(counter);\n" +
        "}");
        glCompileShader(cs);
        glAttachShader(program, cs);
        glLinkProgram(program);

        // dispatch compute
        glUseProgram(program);
        glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 1, atomicBuffer);
        glDispatchCompute(1, 1, 1);
        glMemoryBarrier(GL_ATOMIC_COUNTER_BARRIER_BIT);

        // read-back current counter value (should be 164 = 100 + 8*8)
        glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, atomicBuffer);
        try (MemoryStack stack = stackPush()) {
            IntBuffer buffer = stack.mallocInt(1);
            glGetBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0L, buffer);
            System.out.println("Current value: " + buffer.get(0));
        }
    }
}


QuoteByteBuffer counter2 = ByteBuffer.allocate(2);
You are allocating an on-heap ByteBuffer backed by a Java primitive byte array here, which will not work when handing it off to a native library like OpenGL. You need to create a _direct_ off-heap ByteBuffer via allocateDirect(2).
And why do you use 2? For an atomic counter buffer you need at least 4 bytes (i.e. one int).
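
A sketch of a read-back buffer that can actually be handed to the driver (assuming atomicBuffer is still bound to GL_ATOMIC_COUNTER_BUFFER and java.nio.ByteOrder / org.lwjgl.BufferUtils are imported; the variable names are just for illustration):
// 4 bytes = one GLuint; the buffer must be direct (off-heap) and in native byte order
ByteBuffer counterBytes = ByteBuffer.allocateDirect(4).order(ByteOrder.nativeOrder());
glGetBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0L, counterBytes);
System.out.println(counterBytes.getInt(0));

// or, more idiomatically with LWJGL's helper:
IntBuffer counterInt = BufferUtils.createIntBuffer(1);
glGetBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0L, counterInt);
System.out.println(counterInt.get(0));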

Imagyx

Thanks a lot for your example :)
This led me to the solution for my rendering process.
And I have multiple counters now, too, by transferring multiple integers and binding them in my compute shader.

In case anyone else needs something like this, here's the essential part of my code:

// Setup the atomic counter
	int atomicBuffer = glGenBuffers();
	glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, atomicBuffer);
	glBufferData(GL_ATOMIC_COUNTER_BUFFER, Integer.BYTES * NUM_COUNTERS, GL_STATIC_DRAW);
	try (MemoryStack stack = stackPush()) {
		// initialize counters with zeros
		glBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0L, stack.ints(0, 0, 0, 0, 0, 0, 0, 0));
	}
	glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, 0);
	glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 1, atomicBuffer);

// Compute
	glUseProgram(computeProgram);
	// Create and bind textures here
	....
	// Transfer data to uniforms here
	...
	// Calculate
	glDispatchCompute(numGroupsX, numGroupsY, 1);
	glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);

// Display framebuffer
	glUseProgram(quadProgram);
	glDrawArrays(GL_TRIANGLES, 0, 3);
	glMemoryBarrier(GL_ALL_BARRIER_BITS);

// Get atomic counter values
	List<Integer> counters = new ArrayList<>();
	glMemoryBarrier(GL_ATOMIC_COUNTER_BARRIER_BIT);
	// read back current counter values
	glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, atomicBuffer);
	try (MemoryStack stack = stackPush()) {
		IntBuffer buffer = stack.mallocInt(NUM_COUNTERS);
		glGetBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0L, buffer);
		while (buffer.hasRemaining()) {
			counters.add(buffer.get());
		}
	}



And the compute shader:

#version 450 core
layout(binding = 0, rgba8) uniform image2D framebufferImage;
layout(local_size_x = 8, local_size_y = 8) in;
// counter buffer with currently max. 8 entries
layout (binding = 1) uniform atomic_uint numIntersections;
layout (binding = 1, offset = 4) uniform atomic_uint primaryRays;
layout (binding = 1, offset = 8) uniform atomic_uint secondaryRays;
layout (binding = 1, offset = 12) uniform atomic_uint numDE_steps;
...
...

atomicCounterIncrement(numDE_steps);
...
atomicCounterIncrement(primaryRays);
...


with vertex shader

#version 450 core
out vec2 texcoord;
void main(void) {
  vec2 vertex = vec2((gl_VertexID & 1) << 2, (gl_VertexID & 2) << 1) - vec2(1.0);
  gl_Position = vec4(vertex, 0.0, 1.0);
  texcoord = vertex * 0.5 + vec2(0.5, 0.5);
}


and fragment shader

#version 450 core
uniform sampler2D tex;
in vec2 texcoord;
layout(location = 0) out vec4 color;
void main(void) {
  color = texture(tex, texcoord);
}