UBOs and Uniform Blocks

Started by david37370, February 14, 2015, 20:06:44


david37370

Does anyone know how to use uniform blocks and uniform buffer objects (UBOs)? If so, can I please get a short working example? All the examples I found use functions that don't exist anymore...
I want to make transferring light sources to the fragment shader smoother and easier, and I also need this for creating shadows. I would appreciate any help!
Thanks :)

Kai

Have a look at http://www.lighthouse3d.com/tutorials/glsl-core-tutorial/3490-2/.
It is an excellent tutorial on how to use UBOs.

Additionally, because I have not used them for a while now and kind of needed an exercise, I just prepared a UBO example in LWJGL3's sources under demo/opengl/raytracing/Demo33Ubo.java for you to check out. :)

Java Sourcefile: https://github.com/LWJGL/lwjgl3/blob/master/src/tests/org/lwjgl/demo/opengl/raytracing/Demo33Ubo.java

The corresponding (fragment) shader: https://github.com/LWJGL/lwjgl3/blob/master/res/demo/raytracing/raytracingUbo.fs

There is some other stuff in there, but the relevant Java methods for UBOs in Demo33Ubo are:
- initRayTracingProgram()
- createCameraSettingsUbo()
- updateCameraSettingsUbo()
- trace()

Cheers,
Kai

david37370

Kai, thanks very much! I copied parts of your class and condensed them into two functions:

private void createUbo() {
    this.uboID = glGenBuffers();
    glBindBuffer(GL_UNIFORM_BUFFER, uboID);
    glBufferData(GL_UNIFORM_BUFFER, 4, GL_DYNAMIC_DRAW);
    glBindBuffer(GL_UNIFORM_BUFFER, 0);

    glUseProgram(getShaderProgram());
    int cameraSettingsIndex = GL31.glGetUniformBlockIndex(this.getShaderProgram(), "CameraSettings");
    GL31.glUniformBlockBinding(this.getShaderProgram(), cameraSettingsIndex, this.uboBinding);
    glUseProgram(0);
}

private void updateUbo() {
    ByteBuffer uboData = BufferUtils.createByteBuffer(4);
    FloatBuffer fv = uboData.asFloatBuffer();
    /* Write the single float value 1.0 */
    fv.put(1);
    fv.flip();

    glBindBuffer(GL_UNIFORM_BUFFER, this.uboID);
    glBufferSubData(GL_UNIFORM_BUFFER, 0, fv);
    glBindBuffer(GL_UNIFORM_BUFFER, 0);
}


and the uniform block in the shader:
uniform CameraSettings {
  uniform float a;
};


The sad part is that CameraSettings.a stays 0, even though I sent it 1 in the glBufferSubData call. Can you spot any mistakes in my code, please? I know that everything else works fine except those two functions. Thanks!!! :)

Kai

Are you using glBindBufferBase to bind the UBO to a binding point before you invoke a draw call?
You must do this to let OpenGL know that you actually want it to use the UBO.
This is also done in the demo. Have a look at it.
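To make the missing step concrete, here is a minimal sketch in LWJGL terms. It assumes your `uboID` and `uboBinding` fields from above; `program` and `indexCount` are placeholder names. This needs a live GL context, so take it as an outline rather than runnable code:

```java
// Once, at init time: associate the named uniform block with a binding point.
int blockIndex = GL31.glGetUniformBlockIndex(program, "CameraSettings");
GL31.glUniformBlockBinding(program, blockIndex, uboBinding);

// Before the draw call: attach the actual buffer object to that SAME
// binding point. Without this, the shader has no buffer to read from.
GL30.glBindBufferBase(GL31.GL_UNIFORM_BUFFER, uboBinding, uboID);

glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0);

// Optionally detach afterwards.
GL30.glBindBufferBase(GL31.GL_UNIFORM_BUFFER, uboBinding, 0);
```

The key point is that glUniformBlockBinding and glBindBufferBase must both refer to the same binding point index; the binding point is what connects the block in the shader to the buffer object.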

david37370

Oh god, thanks, it works now. You're a god.
Joking :P
I saw it, but didn't use it, because all I saw between those binding calls was the "glBindFramebuffer(GL_FRAMEBUFFER, fbo); glBindVertexArray(vao);" part. Now I'm binding it before the drawing code, that is glDrawElements, and unbinding after.
Anyway, I've proven to myself pretty thoroughly that I have no clue about binding, buffers, the drawElements function and everything... I feel as if it's inevitable that I make mistakes in the code that are invisible in my game but hurt its performance. Trying to learn...
Thanks again :)

Edit:
For some reason, only the first vector in my struct array gets modified.

private void updateUbo() {
    Point m = getLocationOnFrame(getMousePosition());

    FloatBuffer light = BufferUtils.createFloatBuffer(7 * 1);
    light.put(new float[]{
            (float) m.x, (float) m.y, 1.0f, 0.0f, 1.0f, 1.0f, 0.8f,
            //player.x, player.y, 0.0f, 1.0f, 0.0f,    0.8f, 1.0f,
            //1f, 0.8f, 1.0f, 0.0f, 0.0f,              0.6f, 0.7f,
    });
    light.flip();

    glBindBuffer(GL_UNIFORM_BUFFER, this.uboID);
    glBufferSubData(GL_UNIFORM_BUFFER, 0, light);
    glBindBuffer(GL_UNIFORM_BUFFER, 0);
}


in the shader:
struct Light{
	vec2 location;
	vec3 color;
	float intensity;
	float radius;
};
uniform Lights {
  Light[1] lights;
};


I tested it, and only the location vector is set to the mouse position; the color stays (0, 0, 0), and so does everything after it...
I called glBufferData before with a size of 4*7 bytes; I don't understand why it doesn't work.

Update (once again):
The data transfer there works in a weird way, I must say... I "fixed it", but I have no clue why:

//   x          y        unknown     r     g     b    intensity  radius
(float)m.x, (float)m.y,   0, 0,    0.0f, 1.0f, 0.0f,    0.5f,     4.5f,

This works, but I don't understand why the shader needs 2 values after the vec2. The weird thing about it is that the vec3 works fine... Sorry for the long reply :/ I hope more people read this when they hit such unexpected bugs. Cheers all :)

Kai

Regarding the "weird data transfer" :) I encourage you to read the relevant part of the OpenGL 3.3 specification. Believe me, everything is fine there; it's not a bug, it's a feature. Although it can be a headache to remember and apply the rules there. ;)

In chapter 2.11.4 "Uniform Variables" under section "Standard Uniform Block Layout" it says:
Quote
By default, uniforms contained within a uniform block are extracted from buffer storage in an implementation-dependent manner. Applications may query the offsets assigned to uniforms inside uniform blocks with query functions provided by the GL.

This means that you cannot simply neatly pack your vec2's, vec3's and floats in a Java ByteBuffer according to the uniform block specification in your GLSL shader and then expect everything to work out of the box.

The members of a uniform block instead have certain alignment requirements that must be met and OpenGL expects some padding bytes here and there so that those alignment requirements are met. By the way, you might have noticed that the Demo33Ubo was in effect writing five vec4's instead of five vec3's (as specified by the uniform block in the shader) into the ByteBuffer, because of the alignment requirements that I want to explain here throughout this long post. :)

Now, because the default behaviour is implementation-dependent, the first thing we should do is add "layout(std140)" to your uniform block in the shader, like so:
layout(std140) uniform Lights {
  Light[1] lights;
};


This enables a standard layout that is specified as a numbered list of rules in the mentioned OpenGL 3.3 spec in chapter 2.11.4 "Uniform Variables" under section "Standard Uniform Block Layout": https://www.opengl.org/registry/doc/glspec33.core.20100311.withchanges.pdf (PDF page 72).

Now, let's exercise the rules for your struct to see which struct members are going to end up at which byte offset in your ByteBuffer:

In your example, you have a vec2, then a vec3 and afterwards two floats. All those are packed in a struct, so rule 9 of the spec fires first, specifying how to go about structs. It says:
Quote
If the member is a structure, the base alignment of the structure is N, where N is the largest base alignment value of any of its members, and rounded up to the base alignment of a vec4.

This sentence is relevant if we have more members than this single struct in our uniform block (either sequentially written as individual members or as array elements).
Since for you this is not the case, let's read on:
Quote
The individual members of this substructure are then assigned offsets by applying this set of rules recursively, where the base offset of the first member of the sub-structure is equal to the aligned offset of the structure. The structure may have padding at the end; the base offset of the member following the sub-structure is rounded up to the next multiple of the base alignment of the structure.

This basically says: Now, we are going to apply the rules for each of the struct members.

So, let's begin with our first member, a vec2. We are now at offset 0, because this is our first member to consider. You always need to have that "offset" value in mind when going through your members and applying alignment and padding. So, our first vec2 starts at offset 0.
Now, we search for an alignment rule and find rule number 2:
Quote
If the member is a two- or four-component vector with components consuming N basic machine units, the base alignment is 2N or 4N, respectively.
Since we have a vec2, our base alignment is 2*float = 8 bytes. Now we need to know what to do with this "base alignment" value. This was mentioned in the introductory paragraph of the section:
Quote
A structure and each structure member have a base offset and a base alignment, from which an aligned offset is computed by rounding the base offset up to a multiple of the base alignment. The base offset of the first member of a structure is taken from the aligned offset of the structure itself.
This lays out a complicated formula for computing the actual offset of a struct/uniform block member. We always need to compute the "aligned offset", which is the effective byte position in our ByteBuffer where we need to store the member. In the case of the first member of our single struct, this is 0.

Next, we go to the second struct member, the vec3. Here, rule number 3 applies:
Quote
If the member is a three-component vector with components consuming N basic machine units, the base alignment is 4N.
Aha! So, a vec3 is basically treated as a vec4 in memory!
To compute the aligned offset of this vec3 we need to apply the complicated formula from above:

Quote
A structure and each structure member have a base offset and a base alignment, from which an aligned offset is computed by rounding the base offset up to a multiple of the base alignment. [...] The base offset of all other structure members is derived by taking the offset of the last basic machine unit consumed by the previous member and adding one.
So, the (actual) offset of "the last basic machine unit" (i.e. "byte") was 7 (i.e. the last byte of the second component of our first vec2).
We need to add one, which gives us 8. Now, we need to round that 8 up to a multiple of the base alignment. The base alignment for a vec3 is the same as for a vec4, namely 16 bytes. So, rounding up 8 to the next multiple of 16 gives us 16. And that is the offset at which we will store our vec3. *Phew....*

I will leave you with the (simple) computation of the remaining two floats, which are effectively just packed behind the vec3, so the first float starts at offset 28 and the second at 32.
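The arithmetic above is mechanical enough to sketch in plain Java. The class and helper names here are made up for illustration; the alignment values come from rules 1 to 3 of the std140 section:

```java
public class Std140Offsets {
    // Round 'offset' up to the next multiple of 'alignment'.
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) / alignment * alignment;
    }

    public static void main(String[] args) {
        int cursor = 0;

        // vec2 location: base alignment 2 * 4 = 8 bytes (rule 2).
        int locationOffset = alignUp(cursor, 8);   // 0
        cursor = locationOffset + 8;               // a vec2 consumes 8 bytes

        // vec3 color: base alignment 4 * 4 = 16 bytes, like a vec4 (rule 3).
        int colorOffset = alignUp(cursor, 16);     // 8 rounded up to 16
        cursor = colorOffset + 12;                 // a vec3 still only consumes 12 bytes

        // float intensity: base alignment 4 bytes (rule 1).
        int intensityOffset = alignUp(cursor, 4);  // 28
        cursor = intensityOffset + 4;

        // float radius: base alignment 4 bytes.
        int radiusOffset = alignUp(cursor, 4);     // 32

        System.out.println(locationOffset + " " + colorOffset + " "
                + intensityOffset + " " + radiusOffset);
    }
}
```

Running it prints the offsets 0, 16, 28 and 32, matching the derivation above.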

Now that we've gone through all that trouble of computing the offsets manually, there is actually a simpler rule of thumb, stated by many others. You simply lay out your members in such a way that they always fit into vec4 "slots". Your first vec2 fits into a vec4 slot, leaving two remaining components unused. When we try to squeeze the next vec3 in there, it won't work, since we are short the space of a single float component.
So, we use "the next" vec4 slot for our vec3. The next member, the float, also fits into the last component of the second vec4 slot.
And the last float gets its own vec4 slot.

I hope that makes it all clearer!

All the best!
Kai

david37370

Thank you for the detailed answer. Everything works in that part now :)
Now I'm encountering a different problem. I noticed that my frame rate dropped very low when I added more lights, so I read about the branching problems with GPUs, and now I can't use if and for statements... What am I supposed to do? It works for now, in a bad and inefficient way. What I did: unroll the for loop and do the operations one by one. That isn't even the bad part - now I can't check if the current fragment is out of light range, so I need to calculate the color with my formula and it will be 0, but time was wasted.
I'm scared to even start shadows if I can't use if and for... Do you have a suggestion? I thought about sending arrays of lines which are supposed to be shadow casters, but now that I can't loop, I will have to define a fixed size, and that would be extra stupid... I'm stuck, and would appreciate help!
Cheers,
david

Kai

Who told you that you cannot use if-statements and for-loops anymore?
Of course you can use them. And what you read about branching inefficiencies is probably largely irrelevant to your lighting problem. Yes, you are probably not occupying 100% of your GPU cores if, for one fragment's shader invocation, some if-condition is true while for the invocation of an adjacent pixel/fragment that same condition evaluates to false, so the two executions must diverge.
But I guess that you still have a lot of "spatial coherence" in your shading. That is to say, your shader invocations do not wildly diverge, and most of them (namely those of the same mesh face) still take the same branch.
Yet, this is all some speculation.

Quote
now i can't check if the current fragment is out of light range, so i need to calculate the color with my formula and it will be 0, but time was wasted.
Yes, to some degree it can even be more efficient to do the calculation without a branch, even if the result will be zero. But that is wild speculation, and you can never say for sure unless you actually measure it. There are tools out there that can help you evaluate the "bottleneck" of your shader:

For Nvidia this is: http://www.nvidia.com/object/nvshaderperf_home.html
And for AMD this is: http://developer.amd.com/tools-and-sdks/graphics-development/gpu-shaderanalyzer/
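To illustrate what "computing anyway instead of branching" can look like, here is a toy example in Java. The linear falloff formula is just an assumption for illustration, not your actual lighting formula; in GLSL the branch-free version would be max(0.0, 1.0 - dist / radius):

```java
public class Falloff {
    // Branched version: early-out when the fragment is outside the radius.
    static float branched(float dist, float radius) {
        if (dist > radius) {
            return 0.0f;
        }
        return 1.0f - dist / radius;
    }

    // Branch-free version: the clamp does the same job without an if;
    // out-of-range fragments simply contribute 0 to the final color.
    static float branchless(float dist, float radius) {
        return Math.max(0.0f, 1.0f - dist / radius);
    }
}
```

Both functions produce identical results; whether the branch-free form is actually faster on your GPU is exactly the kind of thing the profiling tools above can tell you.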

If your frames drop significantly in your shader when you add more lights (how much?) and your scene already consists of many lights, you probably need another rendering approach, called Deferred Shading.

david37370

I added 20 lights, and it went from 28 fps up to 60 (the vsync limit) when I hardcoded the for loop and removed the ifs, so I guess they were right... I read on several forums that there are "warps" of nearby fragments that must execute the same code, and if a branch cannot be unrolled, one part has to wait while the other branch is executed - if I understood correctly. Now I've also been reading about rendering to textures and then drawing them on the screen on a simple quad, all using framebuffers, and this is all getting very confusing... and all the old fixed-pipeline tutorials don't help much either. How can multiple rendering passes be done? All the tutorials for 2D lighting and shadows involve a depth buffer but don't explain what it is. I feel as if I'm missing something here.

Kai

What is your concrete question?
You seem to go a bit ahead of yourself.
Try building a simple game first!
Experiment around.
Another option you should seriously consider is using an existing rendering engine for your game, such as JMonkey Engine, instead of fiddling around with OpenGL.
It will give you good visible results faster.

david37370

I did that; I made Snake with lighting, a cute one :D
But my decision to learn OpenGL is final.
So I decided that I will learn how to use framebuffers, and I used some code from your example to do this. The general idea is simple: draw everything I want to a texture, and then draw the texture on the screen. That will also allow blending different framebuffers in the future. The problem is that right now, the texture turns out black, as if I didn't render anything to it... Why is that?
In the initialization step, I used your createFramebufferTexture function from that raytracing example code.
//Bind things...


//Problematic part
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, vboiID);

//Supposedly drawing to texture
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glClear(GL_COLOR_BUFFER_BIT);
Render();
glBindFramebuffer(GL_FRAMEBUFFER, 0);

//Drawing the texture
presentQuad.render();

// Put everything back to default (deselect)
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);


//Unbind things

By the way, what is glDrawBuffers(fbo)?
I'm currently trying to understand how to use framebuffers; after that, I'll probably render the scene once per light. Thanks! :)
david

Kai

The issue with your black texture is most likely a missing texture parameter min/mag filter setting:
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

Without these settings, OpenGL (i.e. the texture sampler) does not know how to sample the texels of your rendered texture, and hence it reads black everywhere.
The Demo33Ubo demo does not set the min/mag filter parameters on the texture object itself, but uses an OpenGL 3.3 sampler object instead, since from OpenGL 3.3 onwards that is the preferred way to decouple "texture format and data settings" from "sampler settings", so you can mix and match the same or different sampler parameters with the same or different texture objects.
You can, however, also just set those parameters directly on the texture.
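To contrast the two approaches, here is a sketch in LWJGL terms. It needs a live GL context, and `tex` is a placeholder for your framebuffer texture's name:

```java
// Option A: set the filters on the texture object itself.
glBindTexture(GL_TEXTURE_2D, tex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

// Option B (GL 3.3+): keep the filters in a separate sampler object
// and bind it to the texture unit that the shader samples from.
int sampler = GL33.glGenSamplers();
GL33.glSamplerParameteri(sampler, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
GL33.glSamplerParameteri(sampler, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
GL33.glBindSampler(0 /* texture unit */, sampler);
```

While a sampler object is bound to a unit, its parameters override whatever is set on the texture bound to that unit.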