Improving Performance of Rendering 2D Tiled Terrain

Started by Setlock, November 04, 2018, 16:48:19


Setlock

Hello everyone, I've been working on optimizing the rendering technique for my 2D game. Right now I render quads in the most modern way I can find, using a VAO for vertices, indices, etc. My rendering looks like this:
public void render(Map<TexturedModel,List<Entity>> entities)
{
    for(TexturedModel model : entities.keySet())
    {
        if(model!= null)
        {
            prepareTexturedModel(model);
            List<Entity> batch = entities.get(model);
            for(Entity entity : batch)
            {
                prepareInstance(entity);
                GL11.glDrawElements(GL11.GL_TRIANGLES,model.getRawModel().getVertexCount(), GL11.GL_UNSIGNED_INT,0);
            }
            unbindTexturedModel();
        }
    }
}


prepareTexturedModel(model) simply binds a texture to the quad, and prepareInstance(entity) simply sets the transformation matrix and other attributes of the entity to be drawn. The game I'm making currently generates 2D terrain using Simplex noise and creates a new quad of width 16 and height 16 for each tile. ONLY the quads on screen are rendered, so a larger screen means more quads are rendered and performance drops, while a smaller screen means fewer quads and better performance.


The image above should show how tiles are separated. The water doesn't show a border because it's a collection of 32x32 quads instead of 16x16, just to make the texture look bigger.

My main question is how I could go about optimizing the rendering of those tiles. As a side note, they don't have to be individually editable. One idea I had was to group similar tiles together into one larger tile and texture that, which would decrease the number of vertices, but I'm stumped on how to go about doing that as well.

Thanks in advance for your help!
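
The "group similar tiles" idea can be tried by merging horizontal runs of identical tile IDs in each row into one wider rectangle. The following is only a minimal sketch, assuming the visible part of the map is available as an int[][] of tile IDs; TileRunMerger, TileRun and mergeRows are hypothetical names that do not appear in the original code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: merges horizontal runs of equal tile IDs into wider rectangles. */
final class TileRunMerger {

    /** One merged rectangle, measured in tiles (not pixels). Height is always one tile. */
    static final class TileRun {
        final int tileId;
        final int x;
        final int y;
        final int width;

        TileRun(int tileId, int x, int y, int width) {
            this.tileId = tileId;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    /** tiles[y][x] holds the tile ID at column x of row y (an assumed layout). */
    static List<TileRun> mergeRows(int[][] tiles) {
        List<TileRun> runs = new ArrayList<>();
        for (int y = 0; y < tiles.length; y++) {
            int[] row = tiles[y];
            int start = 0;
            for (int x = 1; x <= row.length; x++) {
                // Close the current run at the end of the row or when the tile ID changes.
                if (x == row.length || row[x] != row[start]) {
                    runs.add(new TileRun(row[start], start, y, x - start));
                    start = x;
                }
            }
        }
        return runs;
    }
}
```

Each TileRun would then become a single quad of width * 16 units. For the texture to repeat across such a quad, the texture coordinates have to run from 0 to width with the wrap mode set to GL_REPEAT, which works for standalone textures but not directly for sub-regions of a texture atlas.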

Cornix

What have you measured? What is your bottleneck?

If you want to "optimize" something you better know how good it is right now and how good it needs to be.

mudlee

First step would be to render the tiles instanced. That will help, but of course, cornix asked the right question too.
Instanced rendering: https://ahbejarano.gitbook.io/lwjglgamedev/chapter21

Setlock

Quote from: Cornix on November 04, 2018, 17:10:51
What have you measured? What is your bottleneck?

If you want to "optimize" something you better know how good it is right now and how good it needs to be.

I assume it's the number of vertices that I need to process. I make sure to only render the things that are directly on screen. I also try to do my work on a pretty bad system so that I can optimize heavily and know that it will run well on any average machine. The laptop I use right now has an Intel Celeron N2840 CPU and plain Intel HD Graphics for the GPU, so obviously it isn't great for games in the first place. Still, I don't feel that rendering a bunch of squares should be extremely difficult, yet it runs at 40-50 FPS. So really I think I need to decrease the number of entity processing calls I make.

Setlock

Quote from: mudlee on November 04, 2018, 17:57:26
First step would be to render the tiles instanced. That will help, but of course, cornix asked the right question too.
Instanced rendering: https://ahbejarano.gitbook.io/lwjglgamedev/chapter21

I think instance rendering is what I'll be going for, since I'm only ever drawing squares I think that's the best way to do that, then for other things I'll just stick with my usual batch rendering, thanks

Setlock

Quote from: mudlee on November 04, 2018, 17:57:26
First step would be to render the tiles instanced. That will help, but of course, cornix asked the right question too.
Instanced rendering: https://ahbejarano.gitbook.io/lwjglgamedev/chapter21

So I've applied instanced rendering and it hasn't helped performance at all. I would like to get it running at 60 FPS on the extremely low-end machine (it's currently running at 30-50 FPS depending on how many tiles are on screen), but maybe that just isn't possible. I'm not sure what else I can do to increase FPS. I'm really only drawing 2800 squares at most currently, but maybe this is just too much for such a low-end machine.

mudlee

Have you measured your render time? Measure how much time glfwSwapBuffers takes, and also glDrawElementsInstanced.
Those should be around 0ms-10ms. If so, your performance problem is somewhere in the Java code.
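
A measurement like this can be sketched with System.nanoTime. The SectionTimer class below is a hypothetical helper, not part of LWJGL or of the code in this thread, and it only measures CPU-side wall-clock time. Since OpenGL calls are queued asynchronously, the time spent inside a draw call mostly reflects submission cost, and GPU work often shows up as time spent waiting in the buffer swap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: accumulates CPU-side wall-clock time per named section. */
final class SectionTimer {
    // name -> {total nanoseconds, number of samples}
    private final Map<String, long[]> totals = new LinkedHashMap<>();

    /** Runs the section once and records how long it took. */
    void measure(String name, Runnable section) {
        long start = System.nanoTime();
        section.run();
        long elapsed = System.nanoTime() - start;
        long[] entry = totals.computeIfAbsent(name, k -> new long[2]);
        entry[0] += elapsed;
        entry[1]++;
    }

    /** Number of times the named section has been measured. */
    long samples(String name) {
        long[] entry = totals.get(name);
        return entry == null ? 0 : entry[1];
    }

    /** Average duration of the named section in milliseconds, or 0 if never measured. */
    double averageMillis(String name) {
        long[] entry = totals.get(name);
        if (entry == null || entry[1] == 0) {
            return 0.0;
        }
        return entry[0] / 1_000_000.0 / entry[1];
    }

    /** One line per section, in the order the sections were first measured. */
    String report() {
        StringBuilder sb = new StringBuilder();
        for (String name : totals.keySet()) {
            sb.append(String.format("%s: %.3f ms avg over %d samples%n",
                    name, averageMillis(name), samples(name)));
        }
        return sb.toString();
    }
}
```

In a game loop this might be used as timer.measure("render", () -> renderer.render(entities)) and timer.measure("swap", () -> GLFW.glfwSwapBuffers(window)), where renderer, entities and window stand in for whatever the application already has, with report() printed every few hundred frames.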

KaiHH

Let's first make an obvious observation based on the performance figures:
"You called glDrawElements - and now glDrawElementsInstanced - 2800 times per frame, each call rendering a single quad." Right?
And a single quad is what you called an "Entity" and your for loop ranged over 2800 "entities" (i.e. quads). Right?

Instead of calling glDrawElements 2800 times and each call rendering a single quad (or two triangles), you should call glDrawElements exactly once (per material/texture) per frame.

The problem is that you are making _assumptions_ about where performance is going to hurt. And one of your assumptions was that rendering a visible quad takes more time than rendering an invisible quad. That's why you collected all visible quads in a list and then looped over each quad/entity and render it with a single draw call.
The truth/reality is that issuing a single draw call is veeeeery costly. Doing a single call to glDrawElements which renders a million _invisible_ quads is far faster than doing 3000 calls of glDrawElements each rendering 1 _visible_ quad. So the goal is not to reduce the amount of visible vs. invisible quads but rather reducing the amount of _draw calls_ you are doing.
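
To illustrate the "one draw call per texture" idea, here is a minimal sketch of packing all quads that share a texture into one vertex array and one index array. QuadBatchBuilder and its vertex layout (x, y, u, v) are assumptions made for this example and are not taken from the code posted above.

```java
/** Hypothetical helper: packs many axis-aligned quads into one vertex and one index array. */
final class QuadBatchBuilder {
    static final int FLOATS_PER_VERTEX = 4; // x, y, u, v (assumed layout)
    static final int VERTICES_PER_QUAD = 4;
    static final int INDICES_PER_QUAD = 6;  // two triangles

    /** positions[i] holds {x, y} of the lower-left corner of quad i in world units. */
    static float[] buildVertices(float[][] positions, float size) {
        float[] vertices = new float[positions.length * VERTICES_PER_QUAD * FLOATS_PER_VERTEX];
        int at = 0;
        for (float[] p : positions) {
            float x0 = p[0];
            float y0 = p[1];
            float x1 = x0 + size;
            float y1 = y0 + size;
            // Corner order: bottom-left, bottom-right, top-right, top-left.
            at = put(vertices, at, x0, y0, 0f, 0f);
            at = put(vertices, at, x1, y0, 1f, 0f);
            at = put(vertices, at, x1, y1, 1f, 1f);
            at = put(vertices, at, x0, y1, 0f, 1f);
        }
        return vertices;
    }

    /** Indices for quadCount quads; each quad refers to its own four vertices. */
    static int[] buildIndices(int quadCount) {
        int[] indices = new int[quadCount * INDICES_PER_QUAD];
        for (int q = 0; q < quadCount; q++) {
            int base = q * VERTICES_PER_QUAD;
            int o = q * INDICES_PER_QUAD;
            indices[o] = base;
            indices[o + 1] = base + 1;
            indices[o + 2] = base + 2;
            indices[o + 3] = base + 2;
            indices[o + 4] = base + 3;
            indices[o + 5] = base;
        }
        return indices;
    }

    /** Writes one vertex and returns the next free position in dst. */
    private static int put(float[] dst, int at, float x, float y, float u, float v) {
        dst[at] = x;
        dst[at + 1] = y;
        dst[at + 2] = u;
        dst[at + 3] = v;
        return at + FLOATS_PER_VERTEX;
    }
}
```

After uploading both arrays into buffers once, the whole batch can be drawn with a single GL11.glDrawElements(GL11.GL_TRIANGLES, indices.length, GL11.GL_UNSIGNED_INT, 0) per texture. Depending on how the texture image is loaded, the v coordinate may need to be flipped.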

Cornix

What KaiHH says is generally correct but also keep in mind that multiple draw calls are not that bad as long as there are no state changes in between. Drivers can oftentimes optimize these calls quite well.
As I tried to allude to in my first post: Before you try to "optimize" the performance of your code you need to actually understand where the time is lost. Do some profiling, find out the big time costs and then experiment. Without proper measuring you will run around in circles chasing invisible targets.

Setlock

Quote from: KaiHH on November 06, 2018, 09:21:01
Let's first make an obvious observation based on the performance figures:
"You called glDrawElements - and now glDrawElementsInstanced - 2800 times per frame, each call rendering a single quad." Right?
And a single quad is what you called an "Entity" and your for loop ranged over 2800 "entities" (i.e. quads). Right?

Instead of calling glDrawElements 2800 times and each call rendering a single quad (or two triangles), you should call glDrawElements exactly once (per material/texture) per frame.

The problem is that you are making _assumptions_ about where performance is going to hurt. And one of your assumptions was that rendering a visible quad takes more time than rendering an invisible quad. That's why you collected all visible quads in a list and then looped over each quad/entity and render it with a single draw call.
The truth/reality is that issuing a single draw call is veeeeery costly. Doing a single call to glDrawElements which renders a million _invisible_ quads is far faster than doing 3000 calls of glDrawElements each rendering 1 _visible_ quad. So the goal is not to reduce the amount of visible vs. invisible quads but rather reducing the amount of _draw calls_ you are doing.

So first, I actually lied, sorry: there was an FPS increase of about 10 frames after implementing instanced rendering, but it wasn't the performance change I had hoped for. I only call glDrawElementsInstanced once per texture, and currently there are no more than 6 textures being switched (once I implement texture atlases I will of course switch to those). Secondly, isn't that what glDrawElementsInstanced is supposed to do? I store all the information about a quad (position, color, texture coords, etc.) and update the relevant information once per object. So really I loop through all the objects, updating each one's world position, color, lighting, etc., and store that back in the VBO. After that, it renders all of them with glDrawElementsInstanced. That's how I currently do it, but it has only provided a small FPS increase. It really may come down to the hardware I'm testing this on: I'm running this on the worst laptop possible in order to heavily optimize it so it will run fine on any average computer. So it may just be hardware limitations, but if you have any other thoughts I'm really happy to hear them. Thanks for all the help!

mudlee

Can you link your code?


Setlock

Quote from: mudlee on November 06, 2018, 13:22:12
Can you link your code?

public void render(Map<TexturedModel,List<Entity>> entities)
{
    for(TexturedModel model : entities.keySet())
    {
        if(model!= null)
        {
            prepareTexturedModel(model);
            pointer = 0;
            List<Entity> batch = entities.get(model);
            float[] vboData = new float[batch.size() * INSTANCE_DATA_LENGTH];
            for(Entity entity : batch)
            {
                updateTransform(entity,vboData);
            }
            Artist.loader.updateVbo(vbo, vboData, buffer);

            GL31.glDrawArraysInstanced(GL11.GL_TRIANGLE_STRIP, 0, quad.getVertexCount(), batch.size());

            unbindTexturedModel();
        }
    }
}


updateTransform simply creates a new transformation matrix and stores it in the VBO data. I haven't implemented storing the color and lighting values yet, but that would be added later in the same way as the transformation matrix.

mudlee

Why do you update the transform in every frame? Also, why do you even create a new one in every frame? Once data is in the GPU's memory, it should only be updated when it has really changed. Here is my rendering logic. Note that updateBatchDataInGPU gets called ONLY if anything was changed.

https://gist.github.com/mudlee/94f2bd3bed5e1f1234a8dccf1a962c38
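
The "update only when something changed" pattern can be sketched with a dirty flag. This is not the code from the gist above; InstanceBatch and its Uploader interface are hypothetical names, and Uploader is only a stand-in for whatever actually copies the data to the GPU (for example a wrapper around glBufferSubData). The sketch also allocates the float array once instead of creating a new one every frame.

```java
/** Hypothetical sketch: re-upload instance data only when something changed. */
final class InstanceBatch {

    /** Stand-in for the real GPU upload; receives the array and the number of used floats. */
    interface Uploader {
        void upload(float[] data, int floatCount);
    }

    private final float[] data;          // allocated once and reused every frame
    private final int floatsPerInstance;
    private final int maxInstances;
    private int instanceCount;
    private boolean dirty;

    InstanceBatch(int maxInstances, int floatsPerInstance) {
        this.maxInstances = maxInstances;
        this.floatsPerInstance = floatsPerInstance;
        this.data = new float[maxInstances * floatsPerInstance];
    }

    /** Writes the values of one instance and marks the batch as changed. */
    void set(int index, float... values) {
        if (index < 0 || index >= maxInstances) {
            throw new IndexOutOfBoundsException("instance index " + index);
        }
        if (values.length != floatsPerInstance) {
            throw new IllegalArgumentException("expected " + floatsPerInstance + " floats");
        }
        System.arraycopy(values, 0, data, index * floatsPerInstance, floatsPerInstance);
        instanceCount = Math.max(instanceCount, index + 1);
        dirty = true;
    }

    /** Uploads only if something changed since the last flush; returns true if it uploaded. */
    boolean flush(Uploader uploader) {
        if (!dirty) {
            return false;
        }
        uploader.upload(data, instanceCount * floatsPerInstance);
        dirty = false;
        return true;
    }

    int instanceCount() {
        return instanceCount;
    }
}
```

With this shape, the render loop would call flush once per batch before drawing, and frames in which no tile changed would skip the upload entirely.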

Setlock

Quote from: mudlee on November 06, 2018, 16:56:28
Why do you update the transform in every frame? Also, why do you even create a new one in every frame? Once data is in the GPU's memory, it should only be updated when it has really changed. Here is my rendering logic. Note that updateBatchDataInGPU gets called ONLY if anything was changed.

https://gist.github.com/mudlee/94f2bd3bed5e1f1234a8dccf1a962c38

Well I update the transform for each object every frame since objects are frequently moving on screen.

mudlee

frequently = all thousands of objects, 60 times per second? :) Note that when KaiHH writes that calling glDraw* is expensive, it also means that calling gl* in general is expensive. Updating data in the GPU's memory is expensive too.

Quote from: Setlock on November 06, 2018, 20:11:01
Quote from: mudlee on November 06, 2018, 16:56:28
Why do you update the transform in every frame? Also, why do you even create a new one in every frame? Once data is in the GPU's memory, it should only be updated when it has really changed. Here is my rendering logic. Note that updateBatchDataInGPU gets called ONLY if anything was changed.

https://gist.github.com/mudlee/94f2bd3bed5e1f1234a8dccf1a962c38

Well I update the transform for each object every frame since objects are frequently moving on screen.