LWJGL performance profiling

Started by luigi_maroni, June 07, 2009, 09:15:44


luigi_maroni

First let me say I'm very happy with LWJGL, having used it for many projects. For the first time today, I've run a profiler against my latest project. No particular reason except that I'm quite a defensive programmer performance-wise; I'm not experiencing any performance problems.

I've got some pretty challenging things going on in this project, with relatively involved AI, physics, music and some Java2D integration. Interestingly though, it turns out a whole 74% of CPU time is spent inside org.lwjgl.opengl.GL11.nglGetIntegerv, specifically as part of this call-chain:

org.lwjgl.opengl.GL11.glGetInteger(GL11.java:1341)
   org.lwjgl.opengl.GLChecks.checkBufferObject(GLChecks.java:82)
   org.lwjgl.opengl.GLChecks.ensureUnpackPBOdisabled(GLChecks.java:125)
   org.lwjgl.opengl.GL11.glTexSubImage2D(GL11.java:2822)

Having a poke around in the code, I found that GLChecks.ensureUnpackPBOdisabled() is called from many methods inside the GL wrapper; in fact, these ensureXXX methods are a common theme throughout.

I have to wonder whether they're necessary on every call. Wouldn't it be sufficient to check for these each time a GLContext is made current, for example?

Matzon

The call to glGetInteger is not the cause of the CPU time. If you remove that call, the time just goes somewhere else. The reason it looks like it's happening there is that a call to glGetInteger flushes the OpenGL pipeline. There is a similar thread about it, but I can't find it.

I am not sure to what extent we can optimize the checks, but you can build without checks if you recompile LWJGL yourself (see the check generation setting in the build file).
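Matzon's remark implies the checks are emitted when the GL wrapper classes are generated at build time, so a "no checks" build simply never contains them. The toy generator below illustrates that idea only; WrapperGenerator, its generateChecks parameter and the shortened parameter list are hypothetical and are not LWJGL's real generator or template format.

```java
import java.util.List;

// Toy illustration only: NOT LWJGL's generator. Emits the source of one GL
// wrapper method, with or without a state check, depending on generateChecks.
class WrapperGenerator {

    static String generate(String name, List<String> params, String check, boolean generateChecks) {
        StringBuilder src = new StringBuilder();
        src.append("public static void ").append(name)
           .append("(").append(String.join(", ", params)).append(") {\n");
        if (generateChecks) {
            // Only the checked build contains this line; the other build omits it entirely.
            src.append("    ").append(check).append(";\n");
        }
        // Forward to the native entry point, using the "n" prefix seen in the stack trace.
        StringBuilder callArgs = new StringBuilder();
        for (String p : params) {
            if (callArgs.length() > 0) callArgs.append(", ");
            String[] parts = p.trim().split("\\s+");
            callArgs.append(parts[parts.length - 1]); // parameter name is the last token
        }
        src.append("    n").append(name).append("(").append(callArgs).append(");\n}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        // Hypothetical, shortened parameter list; the real glTexSubImage2D takes more arguments.
        List<String> params = List.of("int target", "int level", "java.nio.ByteBuffer pixels");
        String check = "GLChecks.ensureUnpackPBOdisabled()";
        System.out.println(generate("glTexSubImage2D", params, check, true));
        System.out.println(generate("glTexSubImage2D", params, check, false));
    }
}
```

Because the unchecked variant contains no check code at all, it costs nothing at run time; the price is maintaining two builds, which is where a separate debug jar comes in.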

princec

Hmm... even so, that's a tragic performance bottleneck if you're relying on any code that calls GLChecks. Is there not something we can do with a command-line flag to disable GLChecks?

Cas :)
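A runtime switch is the alternative being asked about here. The sketch below assumes a made-up system property, example.glchecks, read once into a static final field; it is not an existing LWJGL option. On HotSpot, static final primitives are generally treated as constants by the JIT, so a disabled branch usually costs very little once compiled, although that is an optimisation rather than a guarantee.

```java
// Hypothetical sketch: a runtime switch for per-call GL checks, read once from a
// system property (e.g. java -Dexample.glchecks=false ...). The property name is
// made up for this example; it is not an LWJGL option.
class GLCheckSwitch {

    // Read once at class initialisation; checks stay on unless explicitly disabled.
    static final boolean CHECKS_ENABLED = readFlag("example.glchecks", true);

    static int checksRun = 0; // counts how often the (expensive) check actually ran

    static boolean readFlag(String property, boolean defaultValue) {
        String value = System.getProperty(property);
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }

    // Stand-in for an expensive check such as a glGetInteger query.
    static void expensiveCheck() {
        checksRun++;
    }

    // Stand-in for a wrapper method such as glTexSubImage2D.
    static void wrappedCall() {
        if (CHECKS_ENABLED) {
            expensiveCheck();
        }
        // ... the real native call would go here ...
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            wrappedCall();
        }
        System.out.println("checks enabled: " + CHECKS_ENABLED + ", checks run: " + checksRun);
    }
}
```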

Matzon

I am fairly certain that the 74% would just be spent somewhere ELSE that flushes the pipeline, which it does *eventually*.
In other words, these checks are not a performance problem.

However, you can just generate without checks, which I believe the LWJGL 2.2.0 release will do: an lwjgl.jar and an lwjgl-debug.jar.

princec

Quote from: Matzon on June 15, 2009, 10:38:09
I am fairly certain that the 74% would just be spent somewhere ELSE that flushes the pipeline, which it does *eventually*.
In other words, these checks are not a performance problem.

However, you can just generate without checks, which I believe the LWJGL 2.2.0 release will do: an lwjgl.jar and an lwjgl-debug.jar.

OOoh nice, so the lwjgl.jar now has the checking disabled?
Will lwjgl-debug.jar now have a check after each GL command as well? That'd be handy.

Cas :)
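The "check after each GL command" idea for a debug jar can be pictured as a decorator that forwards each call and then immediately asks for the error state. FakeGL, CheckedGL and the one-method GL interface are hypothetical stand-ins so the sketch runs without a driver; only the enum values match OpenGL's.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a debug build that checks for errors after every GL call.
// None of these names are LWJGL API; FakeGL replaces the real driver.
class DebugGLDemo {

    static final int GL_NO_ERROR = 0;          // same value as OpenGL's GL_NO_ERROR
    static final int GL_INVALID_ENUM = 0x0500; // same value as OpenGL's GL_INVALID_ENUM
    static final int GL_DEPTH_TEST = 0x0B71;   // same value as OpenGL's GL_DEPTH_TEST

    // Minimal stand-in for a GL binding: one command plus error reporting.
    interface GL {
        void enable(int cap);
        int getError();
    }

    // Fake driver: records an error for any capability other than GL_DEPTH_TEST.
    static class FakeGL implements GL {
        private final Deque<Integer> errors = new ArrayDeque<>();

        public void enable(int cap) {
            if (cap != GL_DEPTH_TEST) errors.add(GL_INVALID_ENUM);
        }

        public int getError() {
            Integer e = errors.poll();
            return e == null ? GL_NO_ERROR : e;
        }
    }

    // Debug decorator: forwards each call, then immediately queries the error state.
    static class CheckedGL implements GL {
        private final GL delegate;

        CheckedGL(GL delegate) {
            this.delegate = delegate;
        }

        public void enable(int cap) {
            delegate.enable(cap);
            int err = delegate.getError(); // on a real driver this query may stall the pipeline
            if (err != GL_NO_ERROR) {
                throw new IllegalStateException("glEnable(0x" + Integer.toHexString(cap)
                        + ") raised GL error 0x" + Integer.toHexString(err));
            }
        }

        public int getError() {
            return delegate.getError();
        }
    }

    public static void main(String[] args) {
        GL gl = new CheckedGL(new FakeGL());
        gl.enable(GL_DEPTH_TEST); // valid, no exception
        try {
            gl.enable(0x1234);    // invalid: reported right at the offending call
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The benefit is that an error is reported at the call that caused it rather than frames later; the cost is exactly the per-call sync discussed in this thread, which is why such checks belong in a debug build.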

Mickelukas

That would be very handy, would love that approach :)

Riven

Quote from: Matzon on June 15, 2009, 10:38:09
I am fairly certain that the 74% would just be spent somewhere ELSE that flushes the pipeline, which it does *eventually*.
In other words, these checks are not a performance problem.

However, you can just generate without checks, which I believe the LWJGL 2.2.0 release will do: an lwjgl.jar and an lwjgl-debug.jar.

Just registered to mention that this is a really big misconception.

The GPU handles a lot of work asynchronously. When you flush, you push out the queued commands and block until the GPU has processed them. Meanwhile the CPU is blocked, so it doesn't send any new commands to the GPU. By the time you start feeding it new commands again, the GPU will have been idle for *many* cycles.

Flushing on every VBO access can (and will) drop your performance considerably.
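This argument can be put into numbers with a toy timing model. Assume each frame needs cpuMs of CPU time to issue its commands and gpuMs of GPU time to execute them. Without a sync the two overlap, so in steady state a frame costs roughly max(cpu, gpu); with a blocking sync every frame they serialise to cpu + gpu. The model below is an illustration with invented numbers, not a measurement, and it ignores any driver limit on how far ahead the CPU may run.

```java
// Toy timing model (illustration, not a measurement): each frame needs cpuMs of
// CPU work to issue and gpuMs of GPU work to execute.
class PipelineModel {

    // Total time until the GPU finishes the last frame. If syncEachFrame is true,
    // the CPU waits for the GPU after every frame (as with a blocking query).
    static double totalTime(int frames, double cpuMs, double gpuMs, boolean syncEachFrame) {
        double cpuFree = 0; // when the CPU can start building the next frame
        double gpuFree = 0; // when the GPU finishes the work it already has
        for (int i = 0; i < frames; i++) {
            double submitted = cpuFree + cpuMs;             // CPU finishes issuing frame i
            double gpuStart = Math.max(submitted, gpuFree); // GPU needs the commands and to be idle
            gpuFree = gpuStart + gpuMs;
            // With a sync the CPU idles until the GPU is done; otherwise it moves straight on.
            cpuFree = syncEachFrame ? gpuFree : submitted;
        }
        return gpuFree;
    }

    public static void main(String[] args) {
        System.out.printf("overlapped: %.1f ms%n", totalTime(100, 8, 10, false)); // overlapped: 1008.0 ms
        System.out.printf("synced:     %.1f ms%n", totalTime(100, 8, 10, true));  // synced:     1800.0 ms
    }
}
```

With 8 ms of CPU and 10 ms of GPU work per frame, syncing every frame stretches 100 frames from about 1 s to 1.8 s, because neither side can work while the other is busy.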

gima

Quote from: Matzon on June 15, 2009, 10:38:09
I am fairly certain that the 74% would just be spent somewhere ELSE that flushes the pipeline, which it does *eventually*.
In other words, these checks are not a performance problem.

However, you can just generate without checks, which I believe the LWJGL 2.2.0 release will do: an lwjgl.jar and an lwjgl-debug.jar.

The most recent SVN build I checked still has a performance-degrading check on every swapBuffers() call.
Link to the file in question


When I tried to dig up information about how to upload large textures to GPU memory without blocking on the CPU side, I came to a rather daunting conclusion. For the moment, the proper way is to use many smaller textures with glTexSubImage(), along with Pixel Buffer Objects that do the transfer in the background, starting from the swapBuffers() call. If the upload has not finished by the time the next frame is presented with swapBuffers(), the CPU side blocks.
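The tiled upload described above mostly comes down to bookkeeping: cut the image into tiles, issue one glTexSubImage2D per tile, and cycle through a small set of PBOs so the CPU can fill one buffer while another is still being transferred. The helper below computes only that plan; the tile size, the number of PBOs and the TileUploadPlan name are assumptions for illustration, and the actual GL calls are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a large image into tiles so that each tile can be
// uploaded with its own glTexSubImage2D call, staged through a round-robin PBO.
class TileUploadPlan {

    // One sub-image upload: offset and size in pixels, plus which PBO stages it.
    static final class Tile {
        final int x, y, width, height, pboIndex;

        Tile(int x, int y, int width, int height, int pboIndex) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
            this.pboIndex = pboIndex;
        }

        @Override
        public String toString() {
            return String.format("tile at (%d,%d) %dx%d via PBO %d", x, y, width, height, pboIndex);
        }
    }

    static List<Tile> plan(int imageWidth, int imageHeight, int tileSize, int pboCount) {
        if (tileSize <= 0 || pboCount <= 0) {
            throw new IllegalArgumentException("tileSize and pboCount must be positive");
        }
        List<Tile> tiles = new ArrayList<>();
        int n = 0;
        for (int y = 0; y < imageHeight; y += tileSize) {
            for (int x = 0; x < imageWidth; x += tileSize) {
                int w = Math.min(tileSize, imageWidth - x);  // right-edge tiles may be narrower
                int h = Math.min(tileSize, imageHeight - y); // bottom-edge tiles may be shorter
                // Round-robin over the PBOs so one can be in transfer while the next is filled.
                tiles.add(new Tile(x, y, w, h, n % pboCount));
                n++;
            }
        }
        return tiles;
    }

    public static void main(String[] args) {
        for (Tile t : plan(1000, 600, 256, 2)) {
            System.out.println(t);
        }
    }
}
```

A 1000x600 image with 256-pixel tiles yields 12 tiles, with the right-hand and bottom edge tiles clipped to 232 and 88 pixels respectively.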

The problem with LWJGL is that the first swapBuffers() call blocks in GL11.glGetError(), because OpenGL cannot know the error status until the whole pipeline has been processed.

This forces the OpenGL pipeline to flush and renders the advantages of asynchronous DMA upload via PBOs useless. And this may not be the only place where the pipeline is forcibly flushed; I hope you catch them all when making modifications.

I really hope that you provide a completely separate .jar package for debug-enabled LWJGL. Of course, I'm not an expert at building those packages, and if it would be too much work to maintain two different packages, I would be glad to have the environment-variable-controlled debug flags that were mentioned in the thread I linked to.


Threads about the same problem:

Fool Running

I'm fairly certain that Matzon is correct. Removing the glGetError() call won't help, because swapBuffers() needs to flush the pipeline anyway before it can swap (i.e. the call to Context.swapBuffers() would then do the blocking). The pipeline has to be flushed every frame; otherwise you could get many frames ahead of the GPU, which would be bad.
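The point about not getting "many frames ahead" can be sketched with another toy model: the CPU may queue at most maxAhead unfinished frames, and submitting one more waits for the oldest to complete. The FramesInFlight name, the limit and the timings are hypothetical; real drivers apply their own limits, and this only shows why some per-frame blocking is unavoidable when the GPU is the bottleneck.

```java
import java.util.ArrayDeque;

// Toy model of a "frames in flight" limit (hypothetical numbers, not driver behaviour):
// the CPU may run at most maxAhead frames ahead of the GPU.
class FramesInFlight {

    // Returns the largest number of submitted-but-unfinished frames ever queued.
    static int maxQueued(int frames, double cpuMs, double gpuMs, int maxAhead) {
        ArrayDeque<Double> inFlight = new ArrayDeque<>(); // GPU finish times of queued frames
        double cpuTime = 0;
        double gpuFree = 0;
        int worst = 0;
        for (int i = 0; i < frames; i++) {
            cpuTime += cpuMs; // CPU builds frame i
            // Retire frames the GPU has already finished by now.
            while (!inFlight.isEmpty() && inFlight.peekFirst() <= cpuTime) {
                inFlight.pollFirst();
            }
            if (inFlight.size() >= maxAhead) {
                // Too far ahead: block until the oldest queued frame completes.
                cpuTime = inFlight.pollFirst();
            }
            gpuFree = Math.max(cpuTime, gpuFree) + gpuMs;
            inFlight.addLast(gpuFree);
            worst = Math.max(worst, inFlight.size());
        }
        return worst;
    }

    public static void main(String[] args) {
        System.out.println("unlimited: " + maxQueued(100, 5, 10, Integer.MAX_VALUE));
        System.out.println("limit 2:   " + maxQueued(100, 5, 10, 2));
    }
}
```

With a CPU that needs 5 ms per frame and a GPU that needs 10 ms, the unlimited queue keeps growing (51 frames in flight after 100 frames), whereas with maxAhead = 2 it never exceeds two, at the cost of the CPU waiting.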

spasi

I've just committed two fixes, one for each of the issues described by luigi_maroni and hanrock.

- PBO-enabled functions will now use LWJGL's state tracking to check the buffer binding state, thus removing the need to call glGetInteger every time.

- The glGetError before buffer swapping will only be called when debug mode is enabled.
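The first fix replaces a driver query with a Java-side copy of the binding, and the second moves the pre-swap error check behind a debug flag. The sketch below is a simplified illustration of both ideas, not LWJGL's actual code: FakeDriver, TrackedGL, its debug flag and the method names are hypothetical, and the fake driver merely counts the calls that could stall a real pipeline.

```java
// Simplified illustration of the two fixes, NOT LWJGL's actual code.
// FakeDriver, TrackedGL and the debug flag are hypothetical.
class StateTrackingDemo {

    static final int GL_PIXEL_UNPACK_BUFFER = 0x88EC; // same value as OpenGL's enum

    // Pretend driver that counts the calls that could stall a real pipeline.
    static class FakeDriver {
        int boundUnpackBuffer = 0;
        int bindingQueries = 0; // stands in for glGetInteger(GL_PIXEL_UNPACK_BUFFER_BINDING)
        int errorChecks = 0;    // stands in for glGetError()

        void bindBuffer(int target, int buffer) {
            if (target == GL_PIXEL_UNPACK_BUFFER) boundUnpackBuffer = buffer;
        }

        int queryUnpackBinding() { // the round-trip the tracker avoids
            bindingQueries++;
            return boundUnpackBuffer;
        }

        int getError() { // this fake never reports errors
            errorChecks++;
            return 0;
        }

        void swap() { }
    }

    // Wrapper layer that keeps a Java-side copy of the unpack buffer binding.
    static class TrackedGL {
        private final FakeDriver driver;
        private final boolean debug;
        private int unpackBinding = 0; // initial GL state: no buffer bound

        TrackedGL(FakeDriver driver, boolean debug) {
            this.driver = driver;
            this.debug = debug;
        }

        void glBindBuffer(int target, int buffer) {
            driver.bindBuffer(target, buffer);
            if (target == GL_PIXEL_UNPACK_BUFFER) unpackBinding = buffer; // keep the copy in sync
        }

        // Same role as GLChecks.ensureUnpackPBOdisabled(), answered from the tracked copy.
        void ensureUnpackPBOdisabled() {
            if (unpackBinding != 0) {
                throw new IllegalStateException("an unpack PBO is bound; client-memory upload not allowed");
            }
        }

        void texSubImageFromClientMemory() {
            ensureUnpackPBOdisabled();
            // ... the actual upload would happen here ...
        }

        void swapBuffers() {
            if (debug) { // the sync-inducing error check only runs in debug mode
                int err = driver.getError();
                if (err != 0) throw new IllegalStateException("GL error 0x" + Integer.toHexString(err));
            }
            driver.swap();
        }
    }

    public static void main(String[] args) {
        FakeDriver driver = new FakeDriver();
        TrackedGL gl = new TrackedGL(driver, false);
        for (int frame = 0; frame < 60; frame++) {
            gl.texSubImageFromClientMemory();
            gl.swapBuffers();
        }
        // prints "binding queries: 0, error checks: 0"
        System.out.println("binding queries: " + driver.bindingQueries + ", error checks: " + driver.errorChecks);
    }
}
```

The tracked copy is only trustworthy if every bind goes through the wrapper; a binding changed behind its back (for example by other native code sharing the context) would leave it stale.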

I would very much appreciate it if you could download a fresh build (#222) to try out these changes and let me know the results.

gima

Conclusion: Your changes seem to have made my PBO-related worry evaporate. Thank you.
As for luigi_maroni's ensureUnpackPBOdisabled() problem, I can't really comment, as I haven't looked at it at all.


This is the code I used for testing: http://paste.servut.us/1itu (I didn't dare paste it in a bbCode code block, as I don't know how awfully long that would have made the post. For future reference, though, it would probably be best if the code were included in this post.)

Silly me, I forgot part of the code. The GLTextureLoader: http://paste.servut.us/yed8 Yay.



LWJGL v2.1.0, actually using the texture (average for five runs of the test):

Loading image started
Loading image finished, took 509ms
Initializing texture started
Initializing texture finished, took 6ms
Map PBO started
Map PBO finished, took 0ms
Image conversion to BGR(A)8 format started
Image conversion to BGR(A)8 format finished, took 131ms
Unmap PBO started
Unmap PBO finished, took 0ms
Initiate image upload started
Initiate image upload finished, took 3ms

Display update started
Display update finished, took 81ms
Display update started
Display update finished, took 9ms
Display update started
Display update finished, took 0ms
Display update started
Display update finished, took 0ms




LWJGL v2.1.0, without actually using the texture (average for five runs of the test; differences only):
Loading image finished, took 643ms
Initializing texture finished, took 3ms
Image conversion to BGR(A)8 format finished, took 137ms

Display update started
Display update finished, took 81ms
Display update started
Display update finished, took 6ms




LWJGL v2.2.0, actually using the texture (average for five runs of the test; differences only):
Loading image finished, took 647ms
Initializing texture finished, took 0ms
Image conversion to BGR(A)8 format finished, took 140ms
Initiate image upload finished, took 0ms

Display update started
Display update finished, took 3ms
Display update started
Display update finished, took 84ms




LWJGL v2.2.0, without actually using the texture (average for five runs of the test; differences only):
Loading image finished, took 653ms
Initializing texture finished, took 0ms
Map PBO finished, took 3ms
Image conversion to BGR(A)8 format finished, took 131ms
Initiate image upload finished, took 0ms

Display update started
Display update finished, took 0ms
Display update started
Display update finished, took 90ms




gima

Any estimate on when the new v2.2.0 will be released as stable, like v2.1.0 is now?
It would probably be unwise to just grab a v2.2.0 build and start programming on top of it, wouldn't it?

Matzon

2.2.0 will probably be out soon, and FWIW, unless you're doing a commercial game with a short timeline, I'd use the latest and greatest.