Best way of rendering lots of cubes.

Started by Meanz, August 11, 2011, 08:58:05

Previous topic - Next topic

Meanz

Well, as you probably figured by the title, I am doing like everyone else, creating a minecrafty clone.
My reasons are simple, I wan't to learn 3d programming and I don't know squat about modelling etc.
So cubes are the easy way to go, besides finding textures is no problem either.
But I have managed to render about 3k cubes at once without dropping below 50 fps, I am using Display.sync 60.

This is my display list:

  displayList = glGenLists(1);
        glNewList(displayList, GL_COMPILE);
        {
            /* 6 faces */

            ResourceManager.getManager().getTexture(frontTexture).bind();
            glBegin(GL_QUADS);
            {
                //front face
                glNormal3f(0f, 0f, 1f);
                glTexCoord2f(0f, 1f);
                glVertex3f(1f, 0f, 0f); //1
                glTexCoord2f(1f, 1f);
                glVertex3f(0f, 0f, 0f); // 2
                glTexCoord2f(1f, 0f);
                glVertex3f(0f, 1f, 0f); // 3
                glTexCoord2f(0f, 0f);
                glVertex3f(1f, 1f, 0f); // 4
            }
            glEnd();
            ResourceManager.getManager().getTexture(backTexture).bind();
            glBegin(GL_QUADS);
            {
                //right face
                glNormal3f(0f, 0f, -1f);
                glTexCoord2f(0f, 1f);
                glVertex3f(0f, 0f, 1f); //1
                glTexCoord2f(1f, 1f);
                glVertex3f(1f, 0f, 1f); // 2
                glTexCoord2f(1f, 0f);
                glVertex3f(1f, 1f, 1f); // 3
                glTexCoord2f(0f, 0f);
                glVertex3f(0f, 1f, 1f); // 4
            }
            glEnd();
            ResourceManager.getManager().getTexture(topTexture).bind();
            glBegin(GL_QUADS);
            {
                //top face
                glNormal3f(0f, -1f, 0f);
                glTexCoord2f(0f, 1f);
                glVertex3f(1f, 1f, 0f); //1
                glTexCoord2f(1f, 1f);
                glVertex3f(0f, 1f, 0f); // 2
                glTexCoord2f(1f, 0f);
                glVertex3f(0f, 1f, 1f); // 3
                glTexCoord2f(0f, 0f);
                glVertex3f(1f, 1f, 1f); // 4
            }
            glEnd();
            ResourceManager.getManager().getTexture(bottomTexture).bind();
            glBegin(GL_QUADS);
            {
                //bottom face
                glNormal3f(0f, 1f, 0f);
                glTexCoord2f(0f, 1f);
                glVertex3f(1f, 0f, 1f); //1
                glTexCoord2f(1f, 1f);
                glVertex3f(0f, 0f, 1f); // 2
                glTexCoord2f(1f, 0f);
                glVertex3f(0f, 0f, 0f); // 3
                glTexCoord2f(0f, 0f);
                glVertex3f(1f, 0f, 0f); // 4
            }
            glEnd();
            ResourceManager.getManager().getTexture(leftTexture).bind();
            glBegin(GL_QUADS);
            {
                //left face
                glNormal3f(-1f, 0f, 0f);
                glTexCoord2f(0f, 1f);
                glVertex3f(1f, 0f, 1f); //1
                glTexCoord2f(1f, 1f);
                glVertex3f(1f, 0f, 0f); // 2
                glTexCoord2f(1f, 0f);
                glVertex3f(1f, 1f, 0f); // 3
                glTexCoord2f(0f, 0f);
                glVertex3f(1f, 1f, 1f); // 4
            }
            glEnd();
            ResourceManager.getManager().getTexture(rightTexture).bind();
            glBegin(GL_QUADS);
            {
                //right face
                glNormal3f(1f, 0f, 0f);
                glTexCoord2f(0f, 1f);
                glVertex3f(0f, 0f, 0f); //1
                glTexCoord2f(1f, 1f);
                glVertex3f(0f, 0f, 1f); // 2
                glTexCoord2f(1f, 0f);
                glVertex3f(0f, 1f, 1f); // 3
                glTexCoord2f(0f, 0f);
                glVertex3f(0f, 1f, 0f); // 4
            }
            glEnd();
        }
        glEndList();


As you can see I am rendering each face and giving them an unique texture.

The first thing that comes in mind is the fact that for each square rendered there is a display list, and lets say for example I am going to render 3000 blocks. Then there will be atleast 3000 display lists in the world, I don't think that is a performance decrease but rather a memory hogger. (And thus knowing this I am at the moment thinking of a better way to handle this issue).

This is how I create my cubes.

This is how I draw them:

    public void render() {
        if (selected == 1) {
            glColor3f(1f, 0f, 0f);
            draw();
            selected = -1;
            return;
        }
        glColor3f(1f, 1f, 1f);
        draw();
    }

    public void draw() {
        glCallList(displayList);
    }


It is really not that big of a deal, but I am just leaving it here for pointers.

This is how they are parsed in my render loop

        /**
         * Apply Blocks
         */
        glPushMatrix();
        {
            glEnable(GL_TEXTURE_2D);
            glScalef(0.5f, 0.5f, 0.5f);
            renderBlocks();
        }
        glPopMatrix();


public void renderBlocks() {

        for (Block b : WorldManager.getWorld().
                getRegionForCoordinates(
                (int) gameCamera.position.x,
                (int) gameCamera.position.y,
                (int) gameCamera.position.z).getBlocks()) {
            if (b.isTransparent()) {
                transpBlocks.add(b);
                continue;
            }
            glPushMatrix();
            {
                glTranslatef(b.position.x, b.position.y, b.position.z);
                ((Renderable) b).render();
            }
            glPopMatrix();
        }

        //glDisable(GL_DEPTH_TEST);
        glDisable(GL_CULL_FACE);
        for (Block b : transpBlocks) {
            glPushMatrix();
            {
                glTranslatef(b.position.x, b.position.y, b.position.z);
                ((Renderable) b).render();
            }
            glPopMatrix();
        }
        //glEnable(GL_DEPTH_TEST);
        if (cullBack && !glIsEnabled(GL_CULL_FACE)) {
            glEnable(GL_CULL_FACE);
        }

        transpBlocks.clear();
    }


Alright and this is my region class, I know it's only sketchy.

public class Region {

    private HashMap<Vector3f, Block> blocks = new HashMap<Vector3f, Block>();
    private int regionX, regionY, regionZ;

    public Region(int regionX, int regionY, int regionZ) {
        this.regionX = regionX;
        this.regionY = regionY;
        this.regionZ = regionZ;
    }

    public void addBlock(int x, int y, int z, Block b) {
        if (x >= 128 || y >= 128 || z >= 128
                || x < 0 || y < 0 || z < 0) {
            return;
        }
        removeBlock(x, y, z);
        blocks.put(new Vector3f(x, y, z), b);
    }

    public boolean hasBlockAt(int x, int y, int z) {
        for (Object oKey : blocks.keySet().toArray()) {
            if (((Vector3f) oKey).x == x
                    && ((Vector3f) oKey).y == y
                    && ((Vector3f) oKey).z == z) {
                return true;
            }
        }
        return false;
    }

    public void removeBlock(int x, int y, int z) {
        if (x >= 128 || y >= 128 || z >= 128
                || x < 0 || y < 0 || z < 0) {
            return;
        }
        for (Object oKey : blocks.keySet().toArray()) {
            if (((Vector3f) oKey).x == x
                    && ((Vector3f) oKey).y == y
                    && ((Vector3f) oKey).z == z) {
                blocks.remove(oKey);
                return;
            }
        }
    }

    public Block getBlock(int x, int y, int z) {
        if (x >= 128 || y >= 128 || z >= 128
                || x < 0 || y < 0 || z < 0) {
            return null;
        }
        for (Object oKey : blocks.keySet().toArray()) {
            if (((Vector3f) oKey).x == x
                    && ((Vector3f) oKey).y == y
                    && ((Vector3f) oKey).z == z) {
                return blocks.get(oKey);
            }
        }
        return null;
    }

    public Block[] getBlocks() {
        Block[] b = new Block[getTotalBlocks()];
        return blocks.values().toArray(b);
    }

    public int getTotalBlocks() {
        return blocks.size();
    }
}


Now you got all the information.

And here is my computer specs:

AMD Phenom II X6 1100T 3.3GHz
ATI Radeon 5700

As you can see I have quite the powerful equipment, the display card isn't the best in the world, but it works.

In my opinion I should be able to render over 100k blocks without lagg. But again when I come to about 5000 > I start getting low fps.
So my question then becomes, how should I deal with rendering all these block? I am not asking for some specific code, but rather asking for a guideline or some tips.

CodeBunny

One thing I'd suggest is to reuse block rendering display lists.

For example, Minecraft has a lot of dirt blocks. I'm willing to bet real money that they share their rendering data (in your case, display lists). They may be unique "objects," but they share as much as they can.

Also, USE AN ARRAY. Using a HashMap there is painful to look at, and your performance will definitely be degraded in comparison. Just use a 3D array, man. That alone might get you some noticeable gains in performance. Try making it 20x20x20 (8000 blocks), and see what happens. You also won't be able to do collision detection (one of the main points of a tilemap) very well with a HashMap. I mean, you can make it work, but it will suck. Use an array.

Lastly, try placing the code that will render ALL of the blocks into a Display List and see what performance you can get up to. See if you can manage to render a 50x50x50 array.

Meanz

QuoteOne thing I'd suggest is to reuse block rendering display lists.

For example, Minecraft has a lot of dirt blocks. I'm willing to bet real money that they share their rendering data (in your case, display lists). They may be unique "objects," but they share as much as they can.

That was exactly what I was planning on doing, when I initially created the display lists it was for testing the use of texture binding inside a display list.

So my idea as yours obviously is, is to create a static declaration of each block type, for example dirt as you say have a static display list. Thus decreasing my display lists by an enormous amount.

Also, USE AN ARRAY. Using a HashMap there is painful to look at, and your performance will definitely be degraded in comparison. Just use a 3D array, man. That alone might get you some noticeable gains in performance. Try making it 20x20x20 (8000 blocks), and see what happens. You also won't be able to do collision detection (one of the main points of a tilemap) very well with a HashMap. I mean, you can make it work, but it will suck. Use an array.


First I started with a 3d array, but the problem that came was when rendering. I had to check each slot if there actually was a block there, otherwise I would be trying to render a null object, so when I switched over to hashmaps I increased the performance alot.
When I think about it twice, I could just add if(block == null) into my rendering method, so I will definately go back to 3d arrays.

QuoteLastly, try placing the code that will render ALL of the blocks into a Display List and see what performance you can get up to. See if you can manage to render a 50x50x50 array.

I will try that option, but what about removal of blocks, wouldn't that require me to rebuild the list ? As I see it, that is some heavy shit to process.

Meanz

Okay I tried benchmarking. I tried rendering all blocks into a displayList and then displaying the display list (though the display list was a one time compilation) And then I tried the original method rendering them by looping through each block.
Also I tried giving each unique block it's own display list.

I got odd results.

On all the different methods I tried, I couldn't even get a 1ms difference.

I was rendering 3000 blocks in 16 ms.
Which I still think is to slow, but I have to run to work now, so I will try to think of another way to handle it by that time.

CodeBunny

Have you reached your maximum fill rate? What are the results when you render from "further away?" If it's your fill rate that's limiting you, the only thing you can do is to get better hardware.

My guess is that your regular code is fast enough, it's simply your GPU that is limited.

How fast is the program when you don't try to have each in a separate display list (when you send each vertex manually)?

The big point about the 3D array is that you can use it for tilemap collision detection. Try using a linked list-3D array hodgepodge (have your 3D array reference entries in the linked list, which is used for rendering). That should give you the best of both worlds.

princec

Don't use display lists! Use VBOs.

Cas :)

Meanz

QuoteHave you reached your maximum fill rate? What are the results when you render from "further away?" If it's your fill rate that's limiting you, the only thing you can do is to get better hardware.

My guess is that your regular code is fast enough, it's simply your GPU that is limited.

I am sorry but, I don't really understand what you are aiming for ?
And it is definately not my hardware, as I stated I have quite new hardware.

QuoteHow fast is the program when you don't try to have each in a separate display list (when you send each vertex manually)?

When I use the glVertex commands (No display list) each frame takes 50ms to render.
With display lists 16ms.

This is with 3000 blocks.

QuoteDon't use display lists! Use VBOs.

Cas
I've read that display lists is supposed to be faster than vbo's?

CodeBunny

Graphics cards have a limit on their fill rate - the number of pixels that they can render in a second (usually, this is in the order of some billion; I've seen cards ranging from 4-54 billion pixel fill a second).

Pixel fill is why, in some games, particle effects render fine from a distance, but once up close, the frame rate drops, even if the exact same rendering procedure is occurring (i.e, no LoD reductions).

Consider a 500x500 textured quad. When viewed at 1-for-1, it's 500x500 pixels on the screen. So, it requires the graphics card to fill 250,000 pixels. But if it's viewed from 10 times the distance (so it appears as 50x50 onscreen) that's only 2,500 pixels.

My recommendation to "view from farther away" is so that you can see if any performance change occurs when you need to render less pixels.

NOTE: The above is only my informal understanding of the subject, if I'm wrong on any point please correct me.

From what I hear, Display Lists have odd corner cases, and though they should be faster than VBOs, driver developers seem to put more effort into making VBOs work better. It's only on nVidia that they are really faster.

Meanz

vsync was bothering me so I got weird benchmarks earlier.

Okay new benchmarks

Rendering 3000 blocks with glVertex commands:

21 FPS
48-50ms render time

Rendering 3000 blocks in one huge display list

80 FPS
11-12ms render time

Rendering 3000 blocks as 3000 separate display lists

75 FPS
12-13ms render time


Edit:

And I tried going away from fov, no difference in rendering time.

Morin

Quote from: Meanz on August 11, 2011, 08:58:05
Well, as you probably figured by the title, I am doing like everyone else, creating a minecrafty clone.
My reasons are simple, I wan't to learn 3d programming and I don't know squat about modelling etc.
So cubes are the easy way to go, besides finding textures is no problem either.
But I have managed to render about 3k cubes at once without dropping below 50 fps, I am using Display.sync 60.

This is my display list:
(...)

First, for all I know, forget display lists for this purpose. A DL is good if there is a lot of complexity stored in it. This isn't the case for a simple cube. To re-use a DL for each cube, you'll have to modify the transformation for each cube (you are doing this), which kills performance.

Some hints:

1. Absolutely turn on backface culling if you haven't already, since that eliminates half the polygons to render.

2. Only render cube faces that are facing air, not those that are facing another cube.

2a. Come up with a clever data structure that lets you quickly determine the faces to render in step 2, e.g. discarding a large region of air or the interior of a large region of solid cubes with a single test.

3. Sort cubes by render state (texture, material, transformation, effects, ...; changing color/normal/texture coords is allowed), and use that to render multiple cubes within a single glBegin/glEnd block.

3a. Again use a clever data structure to do this, or if you can't, do the following as a quick hack: Render a cube, then look ahead the next 3 cubes to render if any of them uses the same render state, then come back to render the ones you have skipped

thelittlebug

I've buyed minecraft 5 or 6 days ago. Since 3 days i'm trying to develop something similar.
I'm a Java noob, that makes it a little bit harder.

I have a Screenshot for you with 50k+ cubes at a framerate of 600+ on a not so good machine (buyed in summer 2008 with 550 euro)



I'm using some tricks to obtain this Framerate:
1. Dont render Faces not seeable (when 2 cubes connect there are not 2*6 faces)
2. hold everything precompiled (i use vbo, with 16*16*16 cubes) to lower the batchcount
3. those "chunks" gets frustumculled via bounding spheres (dont render chunks with 16*16*16 cubes if they are behind you)
4. minimize texture switches and things like that. all textures im using are in 1 big texture file

there are some other tricks i use to make these meshes editable and other things.

sry for my bad english, im a english noob too :)
Der Fehler ist immer und überall

Meanz

QuoteFirst, for all I know, forget display lists for this purpose. A DL is good if there is a lot of complexity stored in it. This isn't the case for a simple cube. To re-use a DL for each cube, you'll have to modify the transformation for each cube (you are doing this), which kills performance.

Some hints:

1. Absolutely turn on backface culling if you haven't already, since that eliminates half the polygons to render.

2. Only render cube faces that are facing air, not those that are facing another cube.

2a. Come up with a clever data structure that lets you quickly determine the faces to render in step 2, e.g. discarding a large region of air or the interior of a large region of solid cubes with a single test.

3. Sort cubes by render state (texture, material, transformation, effects, ...; changing color/normal/texture coords is allowed), and use that to render multiple cubes within a single glBegin/glEnd block.

3a. Again use a clever data structure to do this, or if you can't, do the following as a quick hack: Render a cube, then look ahead the next 3 cubes to render if any of them uses the same render state, then come back to render the ones you have skipped

1. Backface culling is implemented and it obviously works fine. Though turning it off gives me no difference in the manner of fps.

2. I will try my best

3. The word render state doesn't tell me anything as I am still a newbie, so a quick explanation would be lovely. (But my perception tells me that you are asking me to render one big cube instead of tons of small ones. Am I right?

QuoteI'm using some tricks to obtain this Framerate:
1. Dont render Faces not seeable (when 2 cubes connect there are not 2*6 faces)
2. hold everything precompiled (i use vbo, with 16*16*16 cubes) to lower the batchcount
3. those "chunks" gets frustumculled via bounding spheres (dont render chunks with 16*16*16 cubes if they are behind you)
4. minimize texture switches and things like that. all textures im using are in 1 big texture file

there are some other tricks i use to make these meshes editable and other things.

sry for my bad english, im a english noob too

Thank you for the confirmation ;)



Edit:
Alright, I have added a vbo and started drawing my chunks. I currently am rendering 1 chunk (4k blocks) and I am running on about 1k fps.
Though I am also rendering all the 6 faces, (face culling removes some of them). And I am rendering a big square, meaning the blocks inside the square isn't actually visible to my eyes, but I am still rendering them.

Time for cheap hacks ;p

Morin

Quote from: Meanz on August 13, 2011, 09:00:41
1. Backface culling is implemented and it obviously works fine. Though turning it off gives me no difference in the manner of fps.

First, I'm assuming here that by "Backface culling is implemented" you mean you turned it on in OpenGL and not implemented your own backface culling, because then you'd obviously have to turn off *both* for a meaningful experiment.

That it gives no difference in FPS is a hint that your performance is limited by the rendering code or by geometry computations, not by fill rate. In other words, you are drawing too many cubes, but not too many pixels. But also see my comment about render state below, because that might also cause this effect.

You can confirm this assumption: Increase/decrease screen resolution. It should have little effect on performance.

Quote from: Meanz on August 13, 2011, 09:00:41
3. The word render state doesn't tell me anything as I am still a newbie, so a quick explanation would be lovely. (But my perception tells me that you are asking me to render one big cube instead of tons of small ones. Am I right?

Although that would be another improvement, render state is a different thing. Without going into too much detail it means: Switching to another texture while you are drawing is an expensive thing. Avoid it. Same for switching material properties, lighting, or transformations.

Example: Suppose you have ten cubes, five with texture A and five with texture B. Suppose you render all ten cubes in a row, but first you render AAAAABBBBB, then you render ABABABABAB. Typically rendering the first way is much faster, because you switch to another texture only once. ("Typically" means: unless your graphics hardware is able to handle it).

Now with the simple textures you have in a Minecraft like game, the effect may be weak (no texture cache thrashing), but it's something worth trying.

The same problem occurs when you try to build a display list or similar *per cube*: This forces you to change the transformation before each cube (because each cube is at a different location), which is expensive.

OpenGL has a rule that certain calls must not occur between glBegin/glEnd. Part of the reason is that they change render state. Try to follow this rule and still stuff as many rendering as possible between a single glBegin/glEnd pair, and you avoid the costly state changes. (The other reason why you aren't allowed to, say, change the texture between glBegin/glEnd is that there is no defined behavior for changing the texture in the middle of a QUAD).

Meanz

QuoteFirst, I'm assuming here that by "Backface culling is implemented" you mean you turned it on in OpenGL and not implemented your own backface culling, because then you'd obviously have to turn off *both* for a meaningful experiment.

That it gives no difference in FPS is a hint that your performance is limited by the rendering code or by geometry computations, not by fill rate. In other words, you are drawing too many cubes, but not too many pixels. But also see my comment about render state below, because that might also cause this effect.

You can confirm this assumption: Increase/decrease screen resolution. It should have little effect on performance.

Your assumption is correct, I was wrong by using the word "implemented". But yes I am using opengl's backface culling.
And when I tried to change resolutions the fps was not changed.

QuoteAlthough that would be another improvement, render state is a different thing. Without going into too much detail it means: Switching to another texture while you are drawing is an expensive thing. Avoid it. Same for switching material properties, lighting, or transformations.

Example: Suppose you have ten cubes, five with texture A and five with texture B. Suppose you render all ten cubes in a row, but first you render AAAAABBBBB, then you render ABABABABAB. Typically rendering the first way is much faster, because you switch to another texture only once. ("Typically" means: unless your graphics hardware is able to handle it).

Now with the simple textures you have in a Minecraft like game, the effect may be weak (no texture cache thrashing), but it's something worth trying.

The same problem occurs when you try to build a display list or similar *per cube*: This forces you to change the transformation before each cube (because each cube is at a different location), which is expensive.

OpenGL has a rule that certain calls must not occur between glBegin/glEnd. Part of the reason is that they change render state. Try to follow this rule and still stuff as many rendering as possible between a single glBegin/glEnd pair, and you avoid the costly state changes. (The other reason why you aren't allowed to, say, change the texture between glBegin/glEnd is that there is no defined behavior for changing the texture in the middle of a QUAD).

Alright, I moved onto using a VBO and tried rendering 16x16x16 blocks in a big square (that's roughly 4k squares, 4k*6faces) and I got about 1k fps. And also I was not using transformations, as I was using glVertex(transformationX, transformationY, transformationZ) to determine the position of the block. And also Instead of using separate textures I am now using a image that consists of several textures bound together so now I can manipulate which texture to use by changing the values through my glTexCoord calls.

So I would say mission successfull. Though I have to come up with some clever method to ignore the cubes that are totally surrounded, and to lessen the amounts of faces drawn, even further than backface culling does.

Thank you for you help ;)


Morin

Quote from: Meanz on August 14, 2011, 08:10:37
Alright, I moved onto using a VBO and tried rendering 16x16x16 blocks in a big square (that's roughly 4k squares, 4k*6faces) and I got about 1k fps. And also I was not using transformations, as I was using glVertex(transformationX, transformationY, transformationZ) to determine the position of the block. And also Instead of using separate textures I am now using a image that consists of several textures bound together so now I can manipulate which texture to use by changing the values through my glTexCoord calls.

I didn't think of this when I replied, but I think combining textures into a super-texture is a common trick exactly to avoid texture changes. It is of course especially useful for those super-lo-res textures in a minecraft-like game (assuming you use similar 8x8 or 16x16 textures like minecraft itself).

The only thing you'd have to watch out for is mipmap bleeding, but with 8x8 textures you probably don't use mipmaps (scaled-down versions of the texture to use for polygons in the distance).

Quote from: Meanz on August 14, 2011, 08:10:37
So I would say mission successfull. Though I have to come up with some clever method to ignore the cubes that are totally surrounded, and to lessen the amounts of faces drawn, even further than backface culling does.

Not only for totally surrounded cubes. Remember that each individual cube face can be skipped if there is an adjacent cube for that face, even if the other faces are visible.

As a simple trick, you might store a bit field with 6 bits (using type "byte") for each cube, where each bit tells whether the corresponding face is visible. This bit is pre-computed from whether there is an adjacent cube in the corresponding direction. You'll have to update these bits when placing and removing cubes. When rendering the cube, you just check the bits and skip faces accordingly. This should speed things up (and if it doesn't, you have another indication towards where the performance bottleneck is -- probably loading cube data from RAM).

This is, however, a good starter. Move placing/deleting cubes into a separate method that also updates the bits, so you have the complex work at one place in the code. When testing the bits for each cube becomes too slow, you'll have to move to more complicated data structures, and the updates to this structure required to place or remove cubes become more complicated too, so it's good to have them in one place in the code.

More complex data structures will then allow you to skip multiple cubes in a single test. As an example, you might choose super-cubes of 16^3 or 32^3 (should be at least as large as the VBOs to simplify things). For each super-cube, store the number of filled cubes at startup, and update it when placing or removing cubes. When rendering, if the number is 0, skip the whole super-cube. If the number is equal to the size of the super-cube, at least you'll only have to render the outer cubes (and check face bits as before). You'll be able to skip all the empty air in a few steps.

Quote from: Meanz on August 14, 2011, 08:10:37
Thank you for you help ;)

Glad I could help :)