LWJGL Forum

Programming => Lightweight Java Gaming Library => Topic started by: DeX on July 02, 2005, 20:54:32

Title: Performance Issues with LWJGL
Post by: DeX on July 02, 2005, 20:54:32
I am currently torn between using Java or C++ to program my pool game.

I've made a test program in both C++ and Java (using LWJGL) which just consists of a bunch of spheres bouncing around a box and a light in the centre. The spheres are drawn with 32 slices and 32 stacks. I've found that in Java I'm only able to render 5 of these spheres before the framerate drops below 60 (which is the vsync rate); anything more and it doesn't appear smooth. C++, on the other hand, is able to display more than 60 spheres before showing any signs of slowing.

The code for both programs is pretty much exactly the same; in fact the C++ version does a little more processing per frame than the Java program. I'm not very experienced with OpenGL, so I don't know if the way I'm rendering spheres is the quickest way, but it is still surprising that Java is more than 10 times slower than C++. Does anyone know of any ways I could improve performance? Here is the render routine in Java:


public void render() {

Sphere s = new Sphere();

for (int i = 0; i < numBalls; i++) {
GL11.glPushMatrix();
GL11.glTranslatef((float)balls[i].pos.x,(float)balls[i].pos.y,(float)balls[i].pos.z);
GL11.glColor3f(balls[i].red, balls[i].green, balls[i].blue);
s.draw(balls[i].radius,32,32);
GL11.glPopMatrix();
}
}
Title: Performance Issues with LWJGL
Post by: Orangy Tang on July 02, 2005, 21:18:49
I assume that the Sphere.draw() is a GLU sphere, in which case that's likely the cause of the slowdown. The ported GLU sphere uses immediate mode (glBegin/glEnd etc.), which is just about the slowest method of drawing; even in C it incurs a significant hit because of the sheer number of function calls required.

However, native calls from Java have some additional overhead, so you're going to see a difference from the identical C code. Normally it'd be negligible, but immediate mode really is a worst-case scenario. If you draw with vertex arrays you'd likely see near-identical performance.

However, an easier way would be to still use the GLU sphere and stick it in a display list (and likewise for your C++ version). It's a minimal amount of code change and you don't have the same unnecessary overhead.
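
To put a rough number on the immediate-mode overhead described above, here is a back-of-envelope sketch in plain Java. It assumes (this is not taken from the GLU source) that each stack is drawn as one quad strip of 2 * (slices + 1) vertices and that each vertex costs one normal call plus one vertex call. The class and method names are made up for illustration.

```java
// Rough estimate only: the strip layout and calls-per-vertex are assumptions,
// not a description of the actual ported GLU sphere.
final class ImmediateModeCost {
    // Estimated number of GL calls needed to draw one sphere in immediate mode.
    static long callsPerSphere(int slices, int stacks) {
        long verticesPerStrip = 2L * (slices + 1); // assumed: one quad strip per stack
        long callsPerVertex = 2;                   // assumed: one normal + one vertex call
        long beginEnd = 2;                         // glBegin + glEnd around each strip
        return stacks * (verticesPerStrip * callsPerVertex + beginEnd);
    }

    // Estimated number of GL calls per second for a whole scene.
    static long callsPerSecond(int spheres, int slices, int stacks, int fps) {
        return callsPerSphere(slices, stacks) * spheres * fps;
    }

    public static void main(String[] args) {
        System.out.println("calls per sphere: " + callsPerSphere(32, 32));
        System.out.println("calls per second (60 spheres at 60 fps): "
                + callsPerSecond(60, 32, 32, 60));
    }
}
```

Under these assumptions a 32x32 sphere costs a few thousand native calls per frame, whereas a display list replaces all of them with a single glCallList per sphere.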
Title: Performance Issues with LWJGL
Post by: DeX on July 02, 2005, 23:21:10
Thanks for the display list tip, that's drastically improved the time taken to render the spheres. I can now display more spheres with more detail.

However this has uncovered another bottleneck in the code. It appears that the more spheres I add the longer it takes to update the display. This is the function which causes the delay:

Display.update();

It appears that the equivalent function in C++ is the SwapBuffers function, but I can't imagine why it would take longer to swap the display buffers when I add more spheres. Display.update() also polls for messages and input devices, but again I don't see how this would be slowed down by having lots of spheres.

Is there any way to get around this?
Title: Performance Issues with LWJGL
Post by: Matzon on July 02, 2005, 23:42:56
Could you post the code?
Otherwise, try running with -Xprof to get some profiling going.
Title: Performance Issues with LWJGL
Post by: Orangy Tang on July 02, 2005, 23:56:29
Assuming you're not vsync'd (which you shouldn't be when profiling code), Display.update()/SwapBuffers will block while the rendering pipeline gets flushed and the framebuffer is actually displayed. Drawing more objects will naturally make this take longer (as your calls to draw display lists are only adding them onto a draw queue and returning nigh-on immediately).

Again you've created for yourself another worst-case scenario - drawing huge amounts of polys in just a few quick calls to display lists and no other real work. In an actual game you can be happily doing your actual game logic while the rendering goes on in the background. I'd expect the C++ version to exhibit the same behaviour...?
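
The queue-then-block behaviour described above can be mimicked in plain Java. This is only a toy model under stated assumptions: a single worker thread stands in for the graphics card, Thread.sleep stands in for drawing, and FakeGpu with its methods is a hypothetical name, not part of LWJGL or OpenGL.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: a single worker thread plays the role of the GPU.
final class FakeGpu {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final AtomicInteger drawn = new AtomicInteger();
    private Future<?> last;

    // Plays the role of glCallList: enqueue the work and return at once.
    void callList(long costMillis) {
        last = worker.submit(() -> {
            try {
                Thread.sleep(costMillis); // pretend to rasterise one sphere
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            drawn.incrementAndGet();
        });
    }

    // Plays the role of SwapBuffers: block until everything queued so far is done.
    // Waiting on the last task is enough because the worker runs tasks in order.
    void swapBuffers() {
        if (last == null) {
            return;
        }
        try {
            last.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    int drawnCount() {
        return drawn.get();
    }

    void shutdown() {
        worker.shutdown();
    }

    public static void main(String[] args) {
        FakeGpu gpu = new FakeGpu();
        long t0 = System.nanoTime();
        for (int i = 0; i < 40; i++) {
            gpu.callList(2);
        }
        long t1 = System.nanoTime();
        gpu.swapBuffers();
        long t2 = System.nanoTime();
        System.out.println("submit: " + TimeUnit.NANOSECONDS.toMillis(t1 - t0) + " ms");
        System.out.println("swap:   " + TimeUnit.NANOSECONDS.toMillis(t2 - t1) + " ms");
        gpu.shutdown();
    }
}
```

In this model nearly all of the measured time lands in swapBuffers(), even though the work was requested by the callList() calls, which is the same pattern the timings in this thread show for Display.update().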
Title: Performance Issues with LWJGL
Post by: DeX on July 03, 2005, 00:02:03
I haven't used -Xprof, but I know this is the line causing the delay. I've surrounded the line with calls to System.nanoTime. I build up an average of the number of milliseconds that this line takes and display that every second in the window title. If you want to run the code you'll need Java 1.5.
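
A minimal sketch of that kind of measurement in plain Java, for anyone who wants to reproduce it. FrameTimer is a hypothetical helper, not a class from the posted program. It accumulates raw nanoseconds and converts to milliseconds only when the average is read, so sub-millisecond samples are not lost to integer division.

```java
// Hypothetical helper for averaging the duration of a repeated call.
final class FrameTimer {
    private long totalNanos;
    private long samples;

    // Add one measurement, in nanoseconds as returned by System.nanoTime() differences.
    void record(long elapsedNanos) {
        totalNanos += elapsedNanos;
        samples++;
    }

    // Average duration in milliseconds, including the fractional part.
    double averageMillis() {
        return samples == 0 ? 0.0 : (totalNanos / 1000000.0) / samples;
    }

    public static void main(String[] args) {
        FrameTimer timer = new FrameTimer();
        for (int i = 0; i < 100; i++) {
            long start = System.nanoTime();
            Math.sqrt(i); // stand-in for the call being measured
            timer.record(System.nanoTime() - start);
        }
        System.out.println("average: " + timer.averageMillis() + " ms");
    }
}
```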

The entire code can be found here. It should run with nothing more than the usual links to the LWJGL libraries I think:
http://homepage.ntlworld.com/alex.spurling/BallCollision.zip

Anyway here's some snippets from the code. This is the initGL routine which is called once at the start:

private void initGL() {

GL11.glEnable(GL11.GL_TEXTURE_2D); // Enable Texture Mapping
GL11.glShadeModel(GL11.GL_SMOOTH); // Enable Smooth Shading
GL11.glPolygonMode(GL11.GL_FRONT, GL11.GL_FILL ); // Back Face Is Filled In
GL11.glPolygonMode(GL11.GL_BACK, GL11.GL_POINT );

GL11.glClearColor(0.0f, 0.0f, 0.0f, 0.0f); // Black Background
GL11.glClearDepth(1.0); // Depth Buffer Setup
GL11.glEnable(GL11.GL_DEPTH_TEST); // Enables Depth Testing
GL11.glDepthFunc(GL11.GL_LEQUAL); // The Type Of Depth Testing To Do

GL11.glMatrixMode(GL11.GL_PROJECTION); // Select The Projection Matrix
GL11.glLoadIdentity(); // Reset The Projection Matrix

// Calculate The Aspect Ratio Of The Window
GLU.gluPerspective(
45.0f,
(float)glWindow.getWidth() / (float)glWindow.getHeight(),
0.1f,
100.0f);
       
GL11.glMatrixMode(GL11.GL_MODELVIEW); // Select The Modelview Matrix

// Really Nice Perspective Calculations
GL11.glHint(GL11.GL_PERSPECTIVE_CORRECTION_HINT, GL11.GL_NICEST);

GL11.glEnable(GL11.GL_COLOR_MATERIAL); //Enable colour material
GL11.glEnable(GL11.GL_LIGHTING);
}


Here's the RenderGL routine which renders all the balls and the surrounding box. It also sets up the light in this routine although I guess this should only be done once in the above InitGL routine:

public boolean renderGL() {

GL11.glClear(GL11.GL_COLOR_BUFFER_BIT | GL11.GL_DEPTH_BUFFER_BIT);          // Clear The Screen And The Depth Buffer
GL11.glMatrixMode(GL11.GL_MODELVIEW);
GL11.glLoadIdentity();                          // Reset The Current Modelview Matrix

GL11.glTranslatef(0.0f,0.0f,-12.0f);
GL11.glRotatef(angle,0.0f,1.0f,0.0f);

float lightAmbient[] = { 0.0f, 0.0f, 0.0f, 1.0f };
float lightDiffuse[] = { 0.5f, 0.5f, 0.5f, 1.0f };
float lightPosition[] = { 0.0f, 0.0f, 0.0f, 1.0f };
float lightSpecular[] = { 0.8f, 0.8f, 0.8f, 1.0f };


ByteBuffer temp = ByteBuffer.allocateDirect(16);
temp.order(ByteOrder.nativeOrder());
GL11.glLight(GL11.GL_LIGHT1, GL11.GL_AMBIENT, (FloatBuffer)temp.asFloatBuffer().put(lightAmbient).flip());              // Setup The Ambient Light
GL11.glLight(GL11.GL_LIGHT1, GL11.GL_DIFFUSE, (FloatBuffer)temp.asFloatBuffer().put(lightDiffuse).flip());              // Setup The Diffuse Light
GL11.glLight(GL11.GL_LIGHT1, GL11.GL_SPECULAR, (FloatBuffer)temp.asFloatBuffer().put(lightSpecular).flip());
GL11.glLight(GL11.GL_LIGHT1, GL11.GL_POSITION,(FloatBuffer)temp.asFloatBuffer().put(lightPosition).flip());         // Position The Light
GL11.glEnable(GL11.GL_LIGHT1);                          // Enable Light One

GL11.glMaterial(GL11.GL_FRONT, GL11.GL_SPECULAR,(FloatBuffer)temp.asFloatBuffer().put(lightSpecular).flip());
GL11.glMateriali(GL11.GL_FRONT,GL11.GL_SHININESS,70);

box.render();
sim.render();

long startTime = System.nanoTime();
glWindow.update();
long renderTime = System.nanoTime() - startTime;
totalRenderTime += renderTime / 1000000.0; // nanoseconds to milliseconds, keeping the fractional part

return true;
}


This is how I set up the display list for the spheres:

sphere = GL11.glGenLists(1);
Sphere s = new Sphere();
GL11.glNewList(sphere, GL11.GL_COMPILE);
s.draw(balls[0].radius,32,32);
GL11.glEndList();


And this is how I render all the spheres in the simulation:

public void render() {

for (int i = 0; i < numBalls; i++) {
GL11.glPushMatrix();
GL11.glTranslatef((float)balls[i].pos.x,(float)balls[i].pos.y,(float)balls[i].pos.z);
GL11.glColor3f(balls[i].red, balls[i].green, balls[i].blue);
GL11.glCallList(sphere);
GL11.glPopMatrix();
}
}


Edit: Thought I'd add my game loop:

glWindow = new GLWindow(1024, 768, 32, false, windowTitle);
Keyboard.create();
//Initialise OpenGL stuff
sim = new Simulation(40); //set up number of balls
box = new BoundingVolume(8.0f,6.0f,6.0f,new Vector3D(0.0,0.0,0.0));
initGL();
//Enter game loop
long lastTime = System.currentTimeMillis();
while (!done) {
loopCount++;
if(System.currentTimeMillis() - lastTime > 1000) {
glWindow.setTitle(windowTitle + " RTM: " + totalRenderTime / numFrames);
//glWindow.setTitle(windowTitle + " FPS: " + loopCount);  
lastTime = System.currentTimeMillis();
loopCount = 0;
}
done = glWindow.isWindowClosing();
processKeys();
update();
renderGL();
numFrames++;
}
//Exit game loop
shutdown();
Title: Performance Issues with LWJGL
Post by: DeX on July 03, 2005, 00:50:41
Quote from: "Orangy Tang"
Assuming you're not vsync'd (which you shouldn't be when profiling code) then Display.update()/swapbuffers will block while the rendering pipeline gets flushed and the framebuffer actually displayed. Drawing more objects will naturally make this take longer (as your calls to draw display lists are only adding them onto a draw queue and returning nigh-on immediately).

Again you've created for yourself another worst-case scenario - drawing huge amounts of polys in just a few quick calls to display lists and no other real work. In an actual game you can be happily doing your actual game logic while the rendering goes on in the background. I'd expect the C++ version to exhibit the same behaviour...?

I had the impression that the objects were rendered as the framebuffer was being built up, i.e. that when I call glCallList(sphere) it goes off and adds that sphere to the frame buffer, and then when I call Display.update it swaps the current frame and the back-buffered frame.

I guess that if all the spheres are actually being drawn in the Display.update routine then that would explain why it takes longer to process with more and more spheres. The C++ version behaves the same way. Once I use display lists, the SwapBuffers function takes much longer to process.

So what should I do to improve my code? I don't think I can process the ball motions after rendering because of the way my collision code works. It relies on the rendering being done directly after the collision response.
Title: Performance Issues with LWJGL
Post by: DeX on July 03, 2005, 01:04:28
OK, now that I've changed both programs to use display lists I can increase the number of spheres that run smoothly in Java from 5 to about 35. However, in C++ it increased from 60 to about 140. So I guess the original question still holds: how is it that C++ is able to display so many more polygons than Java when they both use the same graphics API and routines? I don't know if it's something I should be significantly worried about when making my game, but the graphics may well become more advanced than a bunch of spheres moving around with one light, so I want to make sure that I don't run into any performance barriers when I improve the graphics.
Title: Performance Issues with LWJGL
Post by: Orangy Tang on July 03, 2005, 01:04:50
Quote from: "DeX"
I had the impression that the rendering of objects would be when the framebuffer was being built up. IE when I run glCallList(sphere) that it goes off and adds that sphere to the frame buffer. Then when I call Display.update it would swap the current frame and the back buffered frame.

From a strictly API view that's perfectly accurate, but it's somewhat different in practice. The graphics card is effectively a second processor, and most GL commands are queued up and processed as soon as possible. glCallList may start drawing immediately (if there's nothing else currently being drawn) but the method will return as soon as possible, while the actual drawing continues to take place in the background. SwapBuffers, however, explicitly blocks until the display is changed, which requires all pending drawing to be finished (flushing the pipeline). Display.update() doesn't actually do the rendering; it just has to wait if there's any still going on (unless you're on a freaky Kyro card, but those are just plain odd).

Quote:
So what should I do to improve my code? I don't think I can process the ball motions after rendering because of the way my collision code works. It relys on the rendering being done directly after the collision response.

Just swap the order of your update and rendering. You render the state from the previous frame, then go off and calculate the next state. Rendering still happens after logic, just at a different point in the frame.
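
The reordering suggested above might look like the following plain-Java sketch. The Game interface and the Recorder class are hypothetical stand-ins so the example runs without a GL context. In the posted program the swap corresponds to glWindow.update(), which currently sits inside renderGL() and would need to be moved out for this ordering to work.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a frame loop that overlaps game logic with background rendering.
final class FrameLoop {
    // Hypothetical hooks; a real game would issue GL calls in render() and swap().
    interface Game {
        void render(); // queue draw calls for the state computed last frame
        void update(); // physics and collisions for the next frame
        void swap();   // block until drawing has finished, then show the frame
    }

    static void runFrames(Game game, int frames) {
        for (int i = 0; i < frames; i++) {
            game.render(); // returns quickly, drawing carries on in the background
            game.update(); // CPU work happens while the graphics card is busy
            game.swap();   // only now wait for whatever drawing is left
        }
    }

    // Records the call order so the loop can be checked without any graphics.
    static final class Recorder implements Game {
        final List<String> log = new ArrayList<String>();
        public void render() { log.add("render"); }
        public void update() { log.add("update"); }
        public void swap() { log.add("swap"); }
    }

    public static void main(String[] args) {
        Recorder recorder = new Recorder();
        runFrames(recorder, 2);
        System.out.println(recorder.log);
    }
}
```

The benefit only shows up when update() does a meaningful amount of work. With trivial physics the wait in swap() simply stays the same length.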
Title: Performance Issues with LWJGL
Post by: DeX on July 03, 2005, 11:09:30
Thanks, Orangy Tang, I will keep that in mind as I develop the rest of the game. At the moment the physics part takes very little time compared with the rendering, so swapping the order of the two makes little difference.

Did you see my last post? We both posted at the same time so you might not have read it. Still not sure about the answer to that one.
Title: Performance Issues with LWJGL
Post by: Orangy Tang on July 03, 2005, 13:22:02
Odd, I'd have thought you'd be seeing near-identical results now. :? You are using the same display mode / colour depth / anti-aliasing / etc. on both?
Title: Performance Issues with LWJGL
Post by: DeX on July 03, 2005, 13:33:04
Yep, both use 1024x768x32 resolution. Both are windowed. Both use spheres with the same level of detail (32x32). And both spend most of their time in their respective Display.update and SwapBuffers routines. Anti-aliasing is turned off in both programs. There are some other small differences though. The Java version rotates the whole scene gradually and the C++ version displays stats on the screen. Other than that there's no real difference.

I'd understand the Java code being slower in general, but the slowest part is the graphics routines, which I would have thought would be processed in the same way on the GPU.
Title: Performance Issues with LWJGL
Post by: Matzon on July 03, 2005, 14:14:55
I don't see anything inherently wrong with the Java app (apart from some small optimizations), and all the time is spent in org.lwjgl.opengl.Win32ContextImplementation.nSwapBuffers.
Can you post the C code too?
Title: Performance Issues with LWJGL
Post by: DeX on July 03, 2005, 14:28:38
Ok but the C++ code is a lot messier (it's amazing how much you can clean up code by re-writing it):
http://homepage.ntlworld.com/alex.spurling/BallCollisionC++.zip
Title: Performance Issues with LWJGL
Post by: Matzon on July 03, 2005, 16:36:19
The call to:
GL11.glPolygonMode(GL11.GL_FRONT, GL11.GL_FILL );         // Back Face Is Filled In
GL11.glPolygonMode(GL11.GL_BACK, GL11.GL_POINT );

is the source of the slowdown. Comment those out, along with the rotate, and the Java version is faster than the C one (probably because it has no font rendering or unproject stuff).

Can you confirm?
Title: Performance Issues with LWJGL
Post by: DeX on July 03, 2005, 17:01:51
Wow that's great, thanks very much Matzon. And thanks for taking the time to look through the code. The Java version does run just as fast as the C++ version now. :D

I don't understand exactly what that code is doing though. I used it originally to get rid of the back side of the box faces so that you could still see the balls as the box rotated. Now, I guess that removing that line means that the back faces of the spheres are not drawn but the back faces of the box are still drawn. How can I make it so I can see through the box's back faces?
Title: Performance Issues with LWJGL
Post by: Orangy Tang on July 03, 2005, 18:25:59
'Fill' mode is the default, so the first line is probably not having any effect; I'd guess it's the second line that's causing the problems. I have seen some ridiculous slowdown with 'line' fill mode before (for quick and easy wireframe), so I wouldn't be too surprised to see 'point' mode being similarly slow.

For lines I found that actually using line primitives (specified in your draw calls) didn't suffer from this strange slowdown; I'd expect points to be the same.

I'm not quite sure what you're after, but it sounds like you should leave the fill mode alone and instead tinker with the face culling:

glEnable(GL_CULL_FACE);
// Then pick one of the following:
glCullFace(GL_FRONT); // Doesn't draw front faces
glCullFace(GL_BACK); // Doesn't draw back faces

And make sure your geometry has the correct winding order set (via glFrontFace, default is usually correct).
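
To illustrate what winding order means here, a small plain-Java sketch follows. It classifies a triangle by the sign of its area after projection, assuming 2D coordinates with y pointing up. OpenGL's default front face is counter-clockwise (GL_CCW). The Winding class and its method names are made up for this example.

```java
// Classifies a projected triangle as counter-clockwise or clockwise.
final class Winding {
    // Twice the signed area of triangle (a, b, c); positive means counter-clockwise
    // when the y axis points up.
    static double signedArea2(double ax, double ay,
                              double bx, double by,
                              double cx, double cy) {
        return (bx - ax) * (cy - ay) - (cx - ax) * (by - ay);
    }

    static boolean isCounterClockwise(double ax, double ay,
                                      double bx, double by,
                                      double cx, double cy) {
        return signedArea2(ax, ay, bx, by, cx, cy) > 0;
    }

    public static void main(String[] args) {
        // Same three corners, listed in opposite orders.
        System.out.println(isCounterClockwise(0, 0, 1, 0, 0, 1));
        System.out.println(isCounterClockwise(0, 0, 0, 1, 1, 0));
    }
}
```

With culling enabled, glCullFace(GL_BACK) and the default front-face setting, a triangle that ends up clockwise on screen is treated as back-facing and skipped.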
Title: Performance Issues with LWJGL
Post by: DeX on July 04, 2005, 13:19:32
Thanks again, culling the back faces is what I wanted to achieve.
Title: Performance Issues with LWJGL
Post by: funsheep on July 13, 2005, 07:47:44
Quote from: "Orangy Tang"... If you draw with vertex arrays you'd likely see near identical performance.

However an easier way would be to still use the glu sphere and stick it in a display list (and likewise for your C++ version) ...

In this context, I am wondering which is faster: to pack all my code

glBegin(GL_TRIANGLES);
   glVertex(...)*
   glNormal(...)*
   glColor(...)*
   .
   .
   .
glEnd();


into a display list, to render with vertex arrays, or to do both, with vertex arrays packed into display lists?
Title: Performance Issues with LWJGL
Post by: Optus on July 15, 2005, 23:53:42
Basically there's a little rule of thumb to decide what to use. First, never ever use both; putting a vertex array inside a display list could actually slow things down. Basically, you need to decide: does your data change? In the case of a pool game, it seems like it would not. You are rendering some spheres, or the pool table, or the pool cue. None of these things are animated, so the same vertex/texcoord/color/etc. data will always be used. Thus, the easiest and fastest way to render would be using display lists. If the data were to change, you would have to recompile the display list, which can be pretty slow. As an example of when to use vertex arrays, I use them for skeletal animation, because the vertex data is updated every frame based on the time passed and animation data. It's faster to create a buffer for all of the updated vertices and upload them to OpenGL with one call using vertex arrays, rather than the potentially large number of calls it would take to recompile the display list.
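
As a sketch of the "create a buffer for all of the updated vertices" step, here is how changing vertex data might be packed into one direct FloatBuffer in plain Java. VertexPacker is a hypothetical helper. The buffer would then be handed to the vertex-array calls (glVertexPointer followed by glDrawArrays); those calls are left out so the example runs without a GL context.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper that packs xyz positions into a single direct buffer.
final class VertexPacker {
    private static final int FLOATS_PER_VERTEX = 3;
    private static final int BYTES_PER_FLOAT = 4;

    // Each inner array is expected to hold at least x, y and z.
    static FloatBuffer pack(float[][] vertices) {
        FloatBuffer buf = ByteBuffer
                .allocateDirect(vertices.length * FLOATS_PER_VERTEX * BYTES_PER_FLOAT)
                .order(ByteOrder.nativeOrder()) // native code expects native byte order
                .asFloatBuffer();
        for (float[] v : vertices) {
            buf.put(v, 0, FLOATS_PER_VERTEX);
        }
        buf.flip(); // switch from writing to reading: position 0, limit at end of data
        return buf;
    }

    public static void main(String[] args) {
        float[][] triangle = {
            {0f, 0f, 0f},
            {1f, 0f, 0f},
            {0f, 1f, 0f},
        };
        FloatBuffer buf = pack(triangle);
        System.out.println("floats ready to upload: " + buf.remaining());
    }
}
```

For per-frame updates it would be reasonable to allocate the buffer once and refill it with clear() and put() each frame, since allocating direct buffers repeatedly is comparatively expensive.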