Performance of VBO's versus Immediate-Mode

Cornix · September 16, 2013, 20:43:46

Hi everybody.

I started working with OpenGL about a year ago, using immediate mode.
A few months ago I switched to VBOs, and it took me some time to rewrite my code.

Now I find that performance didn't improve at all. In fact, it seems to be slightly worse!

I've been trying to find out why, but I just don't have a clue. Here is a (rather) simple example program I wrote:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

import org.lwjgl.opengl.GL11;
import org.lwjgl.opengl.GL15;

// XRect and XProgram are helper classes from the rest of the program (not shown).
public class SomeScene {

	private XRect[] rects;
	private int vbo_id;
	private int ibo_id;

	private static final int IMAGE_COUNT = 20000;

	// Called once per frame to draw every rect.
	public void update() {
		// Immediate mode: 4 glVertex3f calls per rect inside one glBegin/glEnd pair.
		GL11.glBegin(GL11.GL_QUADS);
		for (int i = 0; i < IMAGE_COUNT; i++) {
			GL11.glVertex3f(rects[i].get_x(), rects[i].get_y(), 0);
			GL11.glVertex3f(rects[i].get_x(), rects[i].get_final_y(), 0);
			GL11.glVertex3f(rects[i].get_final_x(), rects[i].get_final_y(), 0);
			GL11.glVertex3f(rects[i].get_final_x(), rects[i].get_y(), 0);
		}
		GL11.glEnd();
		/*
		 * VBO version (swap in for the block above): one translate and one
		 * draw call per rect, reusing the single 32x32 quad in the VBO.
		 *
		 * GL11.glMatrixMode(GL11.GL_MODELVIEW);
		 * for (int i = 0; i < IMAGE_COUNT; i++) {
		 * 	GL11.glPushMatrix();
		 * 	GL11.glTranslatef(rects[i].get_x(), rects[i].get_y(), 0);
		 * 	GL11.glDrawElements(GL11.GL_TRIANGLES, 6, GL11.GL_UNSIGNED_BYTE, 0);
		 * 	GL11.glPopMatrix();
		 * }
		 */
	}

	public void initialize() {
		load_rects();
		load_vbo();
	}

	// Places IMAGE_COUNT 32x32 rects at random positions on the display.
	private void load_rects() {
		rects = new XRect[IMAGE_COUNT];
		for (int i = 0; i < IMAGE_COUNT; i++) {
			float x = (float) (Math.random() * XProgram.get_display_width());
			float y = (float) (Math.random() * XProgram.get_display_height());
			rects[i] = new XRect(x, y, 32, 32);
		}
	}

	private void load_vbo() {
		int FLOAT_BYTE_SIZE = 4;
		/*
		 * Define vertices and indices used by the VBO (one 32x32 quad):
		 */
		float[] vertices = new float[] {
				0, 0, 0,
				0, 32, 0,
				32, 32, 0,
				32, 0, 0
		};
		byte[] indices = new byte[] {
				0, 1, 2, 2, 3, 0
		};

		/*
		 * Create the vertex buffer; upload vertices into buffer.
		 */
		vbo_id = GL15.glGenBuffers();
		FloatBuffer vbo_buf = ByteBuffer.allocateDirect(vertices.length * FLOAT_BYTE_SIZE)
				.order(ByteOrder.nativeOrder()).asFloatBuffer();
		vbo_buf.put(vertices);
		vbo_buf.flip();
		GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, vbo_id);
		GL15.glBufferData(GL15.GL_ARRAY_BUFFER, vbo_buf, GL15.GL_STATIC_DRAW);

		/*
		 * Create the index buffer; upload indices into buffer.
		 */
		ibo_id = GL15.glGenBuffers();
		ByteBuffer ibo_buf = ByteBuffer.allocateDirect(indices.length).order(ByteOrder.nativeOrder());
		ibo_buf.put(indices);
		ibo_buf.flip();
		GL15.glBindBuffer(GL15.GL_ELEMENT_ARRAY_BUFFER, ibo_id);
		GL15.glBufferData(GL15.GL_ELEMENT_ARRAY_BUFFER, ibo_buf, GL15.GL_STATIC_DRAW);

		/*
		 * Make the VBO ready for drawing. Both buffers stay bound and the
		 * vertex array stays enabled, so update() can draw without rebinding.
		 */
		int POS_COUNT = 3;
		int TEXCOORD_COUNT = 0;
		int stride = (POS_COUNT + TEXCOORD_COUNT) * FLOAT_BYTE_SIZE;

		GL11.glEnableClientState(GL11.GL_VERTEX_ARRAY);
		//GL11.glEnableClientState(GL11.GL_TEXTURE_COORD_ARRAY);
		GL11.glVertexPointer(POS_COUNT, GL11.GL_FLOAT, stride, 0);
		//GL11.glTexCoordPointer(TEXCOORD_COUNT, GL11.GL_FLOAT, stride, POS_COUNT * FLOAT_BYTE_SIZE);
	}
}

I obviously left out some parts, since they are the same for VBO and immediate mode and thus not relevant to performance.

Can anybody tell me what I am doing wrong?

Mickelukas · September 17, 2013, 07:23:22

Hi,

Did you use pure immediate mode or display lists? If you used display lists, it is indeed possible that rendering performance decreased, but you should have seen a big improvement in the time it takes to create the data (VBOs are created a lot more quickly than display lists).

If you used immediate mode, you should see a performance increase, except perhaps if you recreate the VBOs every single frame instead of reusing them.

Mike

Cornix · September 17, 2013, 11:15:26

Look at the code, it's not that much. I use pure immediate mode, no display lists. And I only create a single VBO at the beginning.

It's basically this:

GL11.glBegin(GL11.GL_QUADS);
for (int i = 0; i < IMAGE_COUNT; i++) {
	GL11.glVertex3f(rects[i].get_x(), rects[i].get_y(), 0);
	GL11.glVertex3f(rects[i].get_x(), rects[i].get_final_y(), 0);
	GL11.glVertex3f(rects[i].get_final_x(), rects[i].get_final_y(), 0);
	GL11.glVertex3f(rects[i].get_final_x(), rects[i].get_y(), 0);
}
GL11.glEnd();

versus this:

GL11.glMatrixMode(GL11.GL_MODELVIEW);
for (int i = 0; i < IMAGE_COUNT; i++) {
	GL11.glPushMatrix();
	GL11.glTranslatef(rects[i].get_x(), rects[i].get_y(), 0);
	GL11.glDrawElements(GL11.GL_TRIANGLES, 6, GL11.GL_UNSIGNED_BYTE, 0);
	GL11.glPopMatrix();
}

Fool Running · September 17, 2013, 17:28:43

It looks like you are only putting a little bit of data in your VBO (2 triangles?). Then you are translating to the correct location 20000 times and drawing that small VBO.

You would get *much* better results by putting all 20000 rects into a single VBO and drawing it once. If these rects can move, then you should use a streaming or dynamic VBO (the GL_STREAM_DRAW or GL_DYNAMIC_DRAW usage hint) and still put all your data into one VBO.
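A minimal sketch of the "one big VBO" idea, written in plain Java so the data-building part runs without a GL context. Each rect becomes two triangles (6 vertices of x, y, z, no index buffer), and the whole batch lands in one direct FloatBuffer. Rect is a hypothetical stand-in for the thread's XRect, and the LWJGL 2 upload/draw calls are shown only as comments:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class QuadBatch {
    // Hypothetical stand-in for XRect: top-left corner plus width/height.
    public static final class Rect {
        final float x, y, w, h;
        public Rect(float x, float y, float w, float h) {
            this.x = x; this.y = y; this.w = w; this.h = h;
        }
    }

    static final int FLOATS_PER_VERTEX = 3;  // x, y, z (matches glVertexPointer(3, ...))
    static final int VERTICES_PER_QUAD = 6;  // two triangles, no shared indices
    static final int FLOATS_PER_QUAD = FLOATS_PER_VERTEX * VERTICES_PER_QUAD;

    /** Writes every rect into one direct buffer, ready for a single upload. */
    public static FloatBuffer buildBatch(Rect[] rects) {
        FloatBuffer buf = ByteBuffer
                .allocateDirect(rects.length * FLOATS_PER_QUAD * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        for (Rect r : rects) {
            float x0 = r.x, y0 = r.y, x1 = r.x + r.w, y1 = r.y + r.h;
            // Triangle 1: corners 0, 1, 2 (same order as the original index list)
            buf.put(x0).put(y0).put(0f);
            buf.put(x0).put(y1).put(0f);
            buf.put(x1).put(y1).put(0f);
            // Triangle 2: corners 2, 3, 0
            buf.put(x1).put(y1).put(0f);
            buf.put(x1).put(y0).put(0f);
            buf.put(x0).put(y0).put(0f);
        }
        buf.flip(); // position 0, limit = number of floats written
        return buf;
    }

    // GL side (LWJGL 2, needs a context, not executed here):
    //   GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, vbo_id);
    //   GL15.glBufferData(GL15.GL_ARRAY_BUFFER, buf, GL15.GL_STATIC_DRAW);
    //   GL11.glVertexPointer(3, GL11.GL_FLOAT, 0, 0);
    //   GL11.glDrawArrays(GL11.GL_TRIANGLES, 0, rects.length * VERTICES_PER_QUAD);
}
```

With this layout the per-frame cost is one draw call for all 20000 rects instead of one per rect.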

EDIT: With your VBO you are currently only saving about 4 immediate-mode calls per rect, and adding glTranslate cuts that to 3. You also have the overhead of creating an indexed VBO, which for 2 triangles is a lot more overhead than the savings you might expect to get. I would suggest not using an indexed VBO for data that isn't a mesh (i.e. a mesh is where more than 2 polygons share most indices).
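Counting the GL calls per frame in the two update() loops posted above, with N = IMAGE_COUNT = 20000, makes the point concrete:

```latex
\begin{aligned}
\text{immediate mode} &: \underbrace{1}_{\texttt{glBegin}} + \underbrace{4N}_{\texttt{glVertex3f}} + \underbrace{1}_{\texttt{glEnd}} = 4(20000) + 2 = 80002 \\
\text{per-rect VBO} &: \underbrace{1}_{\texttt{glMatrixMode}} + \underbrace{4N}_{\text{push, translate, draw, pop}} = 4(20000) + 1 = 80001 \\
\text{one batched VBO} &: \text{a few calls (bind, pointer setup, one draw), independent of } N
\end{aligned}
```

The two posted versions make essentially the same number of calls, but the per-rect version issues N = 20000 separate draw calls, and a draw call typically costs the driver far more than a single glVertex3f. Batching removes both the per-rect calls and the per-rect draws.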

Cornix · September 17, 2013, 17:50:36

But if we assume that the rects move around every other frame, should I still update the VBO data instead of using glTranslate?

Mickelukas · September 17, 2013, 19:45:43

Yes, you should.
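Updating the data instead of translating means keeping one CPU-side copy of all vertices, overwriting the entries of rects that moved, and re-uploading. A minimal sketch of the CPU side, assuming the non-indexed layout of 6 vertices (x, y, z) per rect; the class QuadStream and its methods are hypothetical, and the LWJGL 2 calls appear only as comments because they need a GL context:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class QuadStream {
    static final int FLOATS_PER_QUAD = 18; // 6 vertices x (x, y, z): two triangles per rect

    /** Allocates the CPU-side copy once; it is reused every frame, never reallocated. */
    public static FloatBuffer allocate(int quadCount) {
        return ByteBuffer.allocateDirect(quadCount * FLOATS_PER_QUAD * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /**
     * Overwrites the six vertices of one rect using absolute puts, so the
     * buffer's position and limit are left untouched. Returns the byte offset
     * of that rect inside the buffer (usable with glBufferSubData).
     */
    public static long writeQuad(FloatBuffer buf, int quad, float x, float y, float w, float h) {
        float x1 = x + w, y1 = y + h;
        float[] v = {
            x,  y,  0f,  x,  y1, 0f,  x1, y1, 0f,  // triangle 1: corners 0, 1, 2
            x1, y1, 0f,  x1, y,  0f,  x,  y,  0f   // triangle 2: corners 2, 3, 0
        };
        int base = quad * FLOATS_PER_QUAD;
        for (int i = 0; i < v.length; i++) {
            buf.put(base + i, v[i]);
        }
        return (long) base * Float.BYTES;
    }

    // Per frame (LWJGL 2, not executed here):
    //   for each moved rect i: QuadStream.writeQuad(buf, i, newX, newY, 32, 32);
    //   GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, vbo_id);
    //   GL15.glBufferData(GL15.GL_ARRAY_BUFFER, buf, GL15.GL_STREAM_DRAW); // whole buffer
    //   GL11.glDrawArrays(GL11.GL_TRIANGLES, 0, quadCount * 6);
}
```

Re-uploading the whole buffer is the simplest choice when most rects move every frame; if only a few move, uploading just their ranges with glBufferSubData at the returned offsets may be cheaper.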

Mike

Cornix · September 17, 2013, 20:53:24

Okay, thank you two for your help so far.

So, let's assume I need to order those rectangles by their Z-value. Should I still go with a single big VBO and sort the indices within?
That way I would need to upload the entire VBO every frame (in the worst case scenario).

Fool Running · September 18, 2013, 12:22:33

Quote from: Cornix on September 17, 2013, 20:53:24
Okay, thank you two for your help so far.

So, let's assume I need to order those rectangles by their Z-value. Should I still go with a single big VBO and sort the indices within?
That way I would need to upload the entire VBO every frame (in the worst case scenario).

Why do you need to order them by their Z-value? If you need them Z-ordered, you should turn on depth testing. If you are using transparency, there are tricks you can use to avoid having to sort your geometry.
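If sorting does turn out to be necessary (for example, blended transparency without one of those tricks), Cornix's idea of sorting only the indices can be sketched as below. This assumes an indexed layout where rect q owns vertices 4q..4q+3 (corners 0..3 as in the original vertices array) and that a larger z means farther from the viewer; ZSortedIndices is a hypothetical name. Only this index buffer changes when the order changes, so the vertex data need not be re-uploaded. With 20000 rects there are 80000 vertices, more than an unsigned short index can address, hence int indices:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.util.Arrays;
import java.util.Comparator;

public class ZSortedIndices {
    /** Builds an index buffer that draws rects back to front (farthest first). */
    public static IntBuffer build(float[] z) {
        Integer[] order = new Integer[z.length];
        for (int q = 0; q < order.length; q++) {
            order[q] = q;
        }
        // Largest z first, so nearer transparent rects are blended over farther ones.
        Arrays.sort(order, Comparator.comparingDouble((Integer q) -> z[q]).reversed());

        IntBuffer idx = ByteBuffer.allocateDirect(z.length * 6 * Integer.BYTES)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
        for (int q : order) {
            int b = 4 * q;                      // first vertex of rect q
            idx.put(b).put(b + 1).put(b + 2)    // triangle: corners 0, 1, 2
               .put(b + 2).put(b + 3).put(b);   // triangle: corners 2, 3, 0
        }
        idx.flip();
        return idx;
    }

    // Upload and draw (LWJGL 2, not executed here):
    //   GL15.glBindBuffer(GL15.GL_ELEMENT_ARRAY_BUFFER, ibo_id);
    //   GL15.glBufferData(GL15.GL_ELEMENT_ARRAY_BUFFER, idx, GL15.GL_STREAM_DRAW);
    //   GL11.glDrawElements(GL11.GL_TRIANGLES, z.length * 6, GL11.GL_UNSIGNED_INT, 0);
}
```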

Cornix · September 18, 2013, 13:11:49

What tricks, if I may ask?

Fool Running · September 18, 2013, 16:46:03

There are several tricks. I would suggest Googling "opengl order independent transparency". That should give you some ideas to start with.
