Speedy VBO

Started by LittleFrog, January 05, 2007, 21:08:32

LittleFrog

How can I use VBOs effectively? I tried to speed up rendering by dividing a large mesh into small parts, each with vertex and triangle counts below 4096 (GL_MAX_ELEMENTS_VERTICES and GL_MAX_ELEMENTS_INDICES, as in http://lwjgl.org/wiki/doku.php/lwjgl/tutorials/opengl/speedyvbo ). However, instead of getting faster, rendering is nearly 1.6 times slower, and there is no difference between glDrawRangeElements and glDrawElements.

So my question is: how do I achieve a speedy VBO?

I use a Dell 360, 3.2 GHz, with an Nvidia 6600GT. I think I have the latest Nvidia driver.
I am rendering the Happy Buddha: 543652 vertices, 1087716 triangles.
Using a normal VBO: render time is 0.029s
Using submeshes (1000 submeshes) and glDrawRangeElements: render time is 0.046s

Any idea what is happening? If possible, could someone send me an example that shows the benefit of glDrawRangeElements over glDrawElements, and the magic of GL_MAX_ELEMENTS_VERTICES and GL_MAX_ELEMENTS_INDICES?

Thanks a lot

darkprophet

That is strange...

The test used to back up the GL_MAX_ELEMENTS_VERTICES/INDICES advice was a highly tessellated terrain. Anything above those magical values resulted in a severe slowdown when rendering a static VBO with an element index buffer. Interleaving the data gave a substantial speed boost, but again, staying at or under those values gave a decent boost.

From the OGL reference pages regarding glDrawRangeElements:

Quote
Implementations denote recommended maximum amounts of vertex and index data, which may be queried by calling glGet with argument GL_MAX_ELEMENTS_VERTICES and GL_MAX_ELEMENTS_INDICES. If end - start + 1 is greater than the value of GL_MAX_ELEMENTS_VERTICES, or if count is greater than the value of GL_MAX_ELEMENTS_INDICES, then the call may operate at reduced performance.
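
The condition in that quote can be checked for each draw call before it is issued. Below is a minimal sketch in plain Java, assuming a triangle list held in an int[]; DrawRangeCheck and its method names are hypothetical helpers, not part of LWJGL, and the two limits are passed in as parameters (they would come from the glGet query the quote mentions).

```java
// Hypothetical helper (not an LWJGL class): checks one draw slice against the
// recommended limits described in the glDrawRangeElements reference page.
final class DrawRangeCheck {

    /** Returns {start, end}: the smallest and largest vertex index used by the slice. */
    static int[] rangeOf(int[] indices, int offset, int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        int start = Integer.MAX_VALUE;
        int end = Integer.MIN_VALUE;
        for (int i = offset; i < offset + count; i++) {
            start = Math.min(start, indices[i]);
            end = Math.max(end, indices[i]);
        }
        return new int[] { start, end };
    }

    /** True when end - start + 1 <= maxVertices and count <= maxIndices. */
    static boolean withinLimits(int[] indices, int offset, int count,
                                int maxVertices, int maxIndices) {
        int[] range = rangeOf(indices, offset, count);
        int span = range[1] - range[0] + 1;
        return span <= maxVertices && count <= maxIndices;
    }
}
```

Note that a slice with very few indices can still fail the check if the vertices it references are far apart in the vertex buffer, so remapping vertices per submesh matters as much as the index count.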

Unfortunately, I cannot explain your findings.

As a side note, if you add EXTCompiledVertexArray.glLockArraysEXT(0, vertexCount); before your glDrawRangeElements call and EXTCompiledVertexArray.glUnlockArraysEXT(); after, you also get a boost on ATI cards (there is some boost on NV cards, but it's insignificant).

DP

LittleFrog


I tried varying the number of sub-meshes: 2, 20, 50 (the mesh indices fit into the short int range), and 1000 (speedy VBO). The render time dropped from 0.053 to 0.051, 0.048, and 0.043, so breaking a large mesh into smaller ones can increase the effectiveness of a VBO. However, the speed increase is not dramatic.

Another thing that concerns me: to break down the mesh I have to reorder the faces, and I use a spatial sort because of its simplicity. But the rendering time (without sub-meshing) changes dramatically, from 0.029s (original mesh) to 0.073s (sorted mesh), although the number of vertices and faces is the same (the only things I do are reordering and remapping). Is there a better strategy for breaking down the mesh?
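
One way to compare two face orderings without timing them on the GPU is to estimate how often a vertex would miss the post-transform cache. The sketch below computes the average number of cache misses per triangle using a simple FIFO model. The FIFO policy and the cache size are assumptions for illustration only; real hardware may behave differently, and VertexCacheSim is a hypothetical name.

```java
import java.util.ArrayDeque;

// Hypothetical helper: estimates vertex-cache misses per triangle for an
// index buffer, assuming a FIFO cache of a caller-chosen size.
final class VertexCacheSim {

    /** Average cache misses per triangle for a triangle-list index buffer. */
    static double missesPerTriangle(int[] indices, int cacheSize) {
        ArrayDeque<Integer> fifo = new ArrayDeque<>();
        int misses = 0;
        for (int index : indices) {
            if (!fifo.contains(index)) {
                // Vertex is not cached: count a miss and insert it.
                misses++;
                fifo.addLast(index);
                if (fifo.size() > cacheSize) {
                    fifo.removeFirst(); // evict the oldest entry
                }
            }
        }
        return misses / (indices.length / 3.0);
    }
}
```

Running this on the original and the spatially sorted index buffers with the same assumed cache size would show whether the sort made vertex reuse worse; a lower value means more reuse.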



darkprophet

What I would do is use a cache-optimising index sorter to re-sort the indices of the Buddha so the GPU cache stays nice and warm (there are a few good articles about this). Then evenly divide the new index buffer into N chunks: after you sort, divide the number of indices by GL_MAX_ELEMENTS_INDICES and use that number to split the mesh up.

What this does is make each submesh cache friendly internally, as well as with the submeshes before and after it...
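
The splitting step could be sketched roughly as below, in plain Java. This is only an illustration under a few assumptions: the mesh is a triangle list, the limit is rounded down to a multiple of 3 so no triangle is cut in half, and IndexChunker is a hypothetical name. The cache-optimising sort itself is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: divides an already sorted triangle-list index buffer
// into evenly sized chunks that each stay within maxIndices.
final class IndexChunker {

    /** Returns {offset, count} pairs that together cover the whole index buffer. */
    static List<int[]> split(int indexCount, int maxIndices) {
        if (indexCount % 3 != 0) {
            throw new IllegalArgumentException("expected a triangle list");
        }
        int cap = maxIndices - (maxIndices % 3); // largest multiple of 3 within the limit
        if (cap < 3) {
            throw new IllegalArgumentException("maxIndices must be at least 3");
        }
        int chunks = (indexCount + cap - 1) / cap; // ceiling division
        int triangles = indexCount / 3;
        List<int[]> result = new ArrayList<>();
        int offset = 0;
        for (int c = 0; c < chunks; c++) {
            // Spread the triangles evenly; the first few chunks take one extra.
            int tris = triangles / chunks + (c < triangles % chunks ? 1 : 0);
            result.add(new int[] { offset, tris * 3 });
            offset += tris * 3;
        }
        return result;
    }
}
```

Each {offset, count} pair would then drive one glDrawRangeElements call, with start and end taken from the smallest and largest vertex index inside that chunk. For the Buddha's 1087716 triangles (3263148 indices) and a limit of 4096, this gives 797 chunks.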

HTH, DP