Performance Issues with LWJGL

Started by DeX, July 02, 2005, 20:54:32

DeX

I am currently torn between using Java or C++ to program my pool game.

I've made a test program in both C++ and Java (using LWJGL) which just consists of a bunch of spheres bouncing around a box with a light in the centre. The spheres are drawn with 32 slices and 32 stacks. I've found that in Java I'm only able to render 5 of these spheres before the framerate drops below 60 (which is the vsync rate); anything more and it doesn't appear smooth. C++, on the other hand, is able to display more than 60 spheres before showing any signs of slowing.

The code for both programs is pretty much exactly the same; in fact the C++ does a little more processing per frame than the Java program. I'm not very experienced with OpenGL so I don't know if the way I'm rendering spheres is the quickest way, but it is still surprising that Java is more than 10 times slower than C++. Does anyone know of any ways I could improve performance? Here is the render routine in Java:

public void render() {
		
		Sphere s = new Sphere();
		
		for (int i = 0; i < numBalls; i++) {
			GL11.glPushMatrix();
			GL11.glTranslatef((float)balls[i].pos.x,(float)balls[i].pos.y,(float)balls[i].pos.z);
			GL11.glColor3f(balls[i].red, balls[i].green, balls[i].blue);
			s.draw(balls[i].radius,32,32);
			GL11.glPopMatrix();
		}
	}

Orangy Tang

I assume that Sphere.draw() is a GLU sphere, in which case that's likely the cause of the slowdown. The ported GLU sphere uses immediate mode (glBegin/glEnd etc.), which is just about the slowest method of drawing; even in C it incurs a significant hit because of the sheer number of function calls required.

However, native calls from Java have some additional overhead, so you're going to see a difference from the identical C code. Normally it'd be negligible, but immediate mode really is a worst-case scenario. If you draw with vertex arrays you'd likely see near-identical performance.

An easier way, though, would be to keep using the GLU sphere and stick it in a display list (and likewise for your C++ version). It's a minimal amount of code change and you don't have the same unnecessary overhead.
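To put a rough number on the immediate-mode call volume mentioned above, here is a back-of-the-envelope sketch. It assumes (this is not taken from the GLU source) that each stack is drawn as one strip of 2 * (slices + 1) vertices and that every vertex costs one normal call plus one vertex call. The class and method names are made up for illustration.

```java
// Rough estimate of immediate-mode call volume for a tessellated sphere.
// Assumption (not taken from the GLU source): each stack is drawn as one
// strip of 2 * (slices + 1) vertices, and every vertex costs one glNormal3f
// call plus one glVertex3f call. Names here are hypothetical.
final class ImmediateModeCost {

    /** Estimated per-vertex GL calls needed to draw one sphere. */
    static long callsPerSphere(int slices, int stacks) {
        long verticesPerStack = 2L * (slices + 1);
        long callsPerVertex = 2; // one normal + one position
        return stacks * verticesPerStack * callsPerVertex;
    }

    /** Estimated GL calls per second for a whole scene at a given frame rate. */
    static long callsPerSecond(int spheres, int slices, int stacks, int fps) {
        return spheres * callsPerSphere(slices, stacks) * fps;
    }

    public static void main(String[] args) {
        // 32 * 66 * 2 = 4224 calls for one 32x32 sphere under these assumptions
        System.out.println("calls per sphere: " + callsPerSphere(32, 32));
        // 60 spheres at 60 fps: 60 * 4224 * 60 = 15206400 calls per second
        System.out.println("calls per second: " + callsPerSecond(60, 32, 32, 60));
    }
}
```

Under those assumptions every one of these calls crosses the Java/native boundary in immediate mode, whereas a display list replaces them with a single glCallList per sphere.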

DeX

Thanks for the display list tip, that's drastically improved the time taken to render the spheres. I can now display more spheres with more detail.

However this has uncovered another bottleneck in the code. It appears that the more spheres I add the longer it takes to update the display. This is the function which causes the delay:

Display.update();

It appears that the equivalent function in C++ is SwapBuffers, but I can't imagine why it would take longer to swap the display buffers when I add more spheres. Display.update() also polls for messages and input devices, but again I don't see how this would be slowed down by having lots of spheres.

Is there any way to get around this?

Matzon

Could you post the code?
Otherwise, try running with -Xprof to get some profiling going.

Orangy Tang

Assuming you're not vsync'd (which you shouldn't be when profiling code), Display.update()/SwapBuffers will block while the rendering pipeline gets flushed and the framebuffer is actually displayed. Drawing more objects will naturally make this take longer (as your calls to draw display lists only add them to a draw queue and return nigh-on immediately).

Again you've created for yourself another worst-case scenario - drawing huge amounts of polys in just a few quick calls to display lists and no other real work. In an actual game you can be happily doing your actual game logic while the rendering goes on in the background. I'd expect the C++ version to exhibit the same behaviour...?
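The queue-then-block behaviour described here can be imitated with a toy model in plain Java. This sketch does not touch OpenGL or LWJGL: a single worker thread stands in for the GPU, and the names `CommandQueueModel`, `callList` and `swapBuffers` are hypothetical stand-ins chosen to mirror the real calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: one worker thread plays the role of the GPU.
final class CommandQueueModel {
    private final ExecutorService gpu = Executors.newSingleThreadExecutor();
    private final List<Future<?>> pending = new ArrayList<>();
    final AtomicInteger drawn = new AtomicInteger();

    /** Stand-in for glCallList: enqueue the work and return straight away. */
    void callList(long drawMillis) {
        pending.add(gpu.submit(() -> {
            try {
                Thread.sleep(drawMillis); // pretend the sphere takes this long to draw
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            drawn.incrementAndGet();
        }));
    }

    /** Stand-in for Display.update(): wait for all queued draws; returns the wait in ms. */
    double swapBuffers() {
        long start = System.nanoTime();
        for (Future<?> f : pending) {
            try {
                f.get();
            } catch (InterruptedException | ExecutionException e) {
                throw new RuntimeException(e);
            }
        }
        pending.clear();
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    void shutdown() {
        gpu.shutdown();
    }

    public static void main(String[] args) {
        CommandQueueModel model = new CommandQueueModel();
        for (int spheres : new int[] {5, 20, 40}) {
            for (int i = 0; i < spheres; i++) {
                model.callList(1);
            }
            System.out.println(spheres + " spheres: swap waited " + model.swapBuffers() + " ms");
        }
        model.shutdown();
    }
}
```

In this model the time charged to `swapBuffers` grows with the number of queued draws even though the swap itself does no drawing, which is the effect being measured around Display.update().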

DeX

I haven't used -Xprof but I know this is the line causing the delay. I've surrounded the line with calls to System.nanoTime. I build up an average of the number of milliseconds that this line takes and display that every second in the window title. If you want to run the code you'll need Java 1.5.

The entire code can be found here. It should run with nothing more than the usual links to the LWJGL libraries I think:
http://homepage.ntlworld.com/alex.spurling/BallCollision.zip

Anyway here's some snippets from the code. This is the initGL routine which is called once at the start:
private void initGL() {
		
		GL11.glEnable(GL11.GL_TEXTURE_2D); // Enable Texture Mapping
		GL11.glShadeModel(GL11.GL_SMOOTH); // Enable Smooth Shading
		GL11.glPolygonMode(GL11.GL_FRONT, GL11.GL_FILL );			// Front Faces Are Filled In
		GL11.glPolygonMode(GL11.GL_BACK, GL11.GL_POINT );			// Back Faces Are Drawn As Points

		GL11.glClearColor(0.0f, 0.0f, 0.0f, 0.0f); // Black Background
		GL11.glClearDepth(1.0); // Depth Buffer Setup
		GL11.glEnable(GL11.GL_DEPTH_TEST); // Enables Depth Testing
		GL11.glDepthFunc(GL11.GL_LEQUAL); // The Type Of Depth Testing To Do

		GL11.glMatrixMode(GL11.GL_PROJECTION); // Select The Projection Matrix
		GL11.glLoadIdentity(); // Reset The Projection Matrix

		// Calculate The Aspect Ratio Of The Window
		GLU.gluPerspective(
				45.0f,
				(float)glWindow.getWidth() / (float)glWindow.getHeight(),
				0.1f,
				100.0f);
	         
		GL11.glMatrixMode(GL11.GL_MODELVIEW); // Select The Modelview Matrix

		// Really Nice Perspective Calculations
		GL11.glHint(GL11.GL_PERSPECTIVE_CORRECTION_HINT, GL11.GL_NICEST);
		
		GL11.glEnable(GL11.GL_COLOR_MATERIAL);			//Enable colour material
		GL11.glEnable(GL11.GL_LIGHTING);	
	}


Here's the renderGL routine, which renders all the balls and the surrounding box. It also sets up the light, although I guess this should only be done once in the initGL routine above:
public boolean renderGL() {

		GL11.glClear(GL11.GL_COLOR_BUFFER_BIT | GL11.GL_DEPTH_BUFFER_BIT);          // Clear The Screen And The Depth Buffer
		GL11.glMatrixMode(GL11.GL_MODELVIEW);
		GL11.glLoadIdentity();                          // Reset The Current Modelview Matrix

		GL11.glTranslatef(0.0f,0.0f,-12.0f);
		GL11.glRotatef(angle,0.0f,1.0f,0.0f);
		
		float lightAmbient[] = { 0.0f, 0.0f, 0.0f, 1.0f };
		float lightDiffuse[] = { 0.5f, 0.5f, 0.5f, 1.0f };
		float lightPosition[] = { 0.0f, 0.0f, 0.0f, 1.0f };
		float lightSpecular[] = { 0.8f, 0.8f, 0.8f, 1.0f };

		
		ByteBuffer temp = ByteBuffer.allocateDirect(16);
		temp.order(ByteOrder.nativeOrder());
		GL11.glLight(GL11.GL_LIGHT1, GL11.GL_AMBIENT, (FloatBuffer)temp.asFloatBuffer().put(lightAmbient).flip());              // Setup The Ambient Light
		GL11.glLight(GL11.GL_LIGHT1, GL11.GL_DIFFUSE, (FloatBuffer)temp.asFloatBuffer().put(lightDiffuse).flip());              // Setup The Diffuse Light
		GL11.glLight(GL11.GL_LIGHT1, GL11.GL_SPECULAR, (FloatBuffer)temp.asFloatBuffer().put(lightSpecular).flip()); 
		GL11.glLight(GL11.GL_LIGHT1, GL11.GL_POSITION,(FloatBuffer)temp.asFloatBuffer().put(lightPosition).flip());         // Position The Light
		GL11.glEnable(GL11.GL_LIGHT1);                          // Enable Light One 
		
		GL11.glMaterial(GL11.GL_FRONT, GL11.GL_SPECULAR,(FloatBuffer)temp.asFloatBuffer().put(lightSpecular).flip());
		GL11.glMateriali(GL11.GL_FRONT,GL11.GL_SHININESS,70);

		box.render();
		sim.render();

		long startTime = System.nanoTime();
		glWindow.update();
		long renderTime = System.nanoTime() - startTime;
		totalRenderTime += renderTime / 1000000.0; // nanoseconds to milliseconds, keeping the fraction
		
		return true;
	}
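
A side note on the timing itself: converting a nanosecond delta to milliseconds with integer division drops anything below one millisecond, so short calls can average out to zero. A minimal sketch of an accumulator that keeps the fraction follows; `FrameTimer` is a hypothetical helper, not part of LWJGL.

```java
// Hypothetical helper: accumulates nanosecond samples and reports the mean
// in milliseconds without losing sub-millisecond precision.
final class FrameTimer {
    private long totalNanos;
    private long samples;

    /** Record one measured duration, in nanoseconds. */
    void add(long nanos) {
        totalNanos += nanos;
        samples++;
    }

    /** Mean duration in milliseconds, or 0.0 if nothing has been recorded. */
    double averageMillis() {
        return samples == 0 ? 0.0 : (totalNanos / 1_000_000.0) / samples;
    }

    public static void main(String[] args) {
        long sample = 900_000L; // 0.9 ms expressed in nanoseconds
        System.out.println((double) (sample / 1_000_000)); // integer division first: 0.0
        System.out.println(sample / 1_000_000.0);          // floating-point division: 0.9

        FrameTimer timer = new FrameTimer();
        long start = System.nanoTime();
        // ... the call being measured would go here ...
        timer.add(System.nanoTime() - start);
        System.out.println("average ms: " + timer.averageMillis());
    }
}
```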


This is how I set up the display list for the spheres:
sphere = GL11.glGenLists(1);
Sphere s = new Sphere();
GL11.glNewList(sphere, GL11.GL_COMPILE);
s.draw(balls[0].radius,32,32);
GL11.glEndList();


And this is how I render all the spheres in the simulation:
public void render() {
		
		for (int i = 0; i < numBalls; i++) {
			GL11.glPushMatrix();
			GL11.glTranslatef((float)balls[i].pos.x,(float)balls[i].pos.y,(float)balls[i].pos.z);
			GL11.glColor3f(balls[i].red, balls[i].green, balls[i].blue);
			GL11.glCallList(sphere);
			GL11.glPopMatrix();
		}
	}


Edit: Thought I'd add my game loop:
glWindow = new GLWindow(1024, 768, 32, false, windowTitle);
Keyboard.create();
//Initialise OpenGL stuff
sim = new Simulation(40); //set up number of balls
box = new BoundingVolume(8.0f,6.0f,6.0f,new Vector3D(0.0,0.0,0.0));
initGL();
//Enter game loop
long lastTime = System.currentTimeMillis();
while (!done) {
	loopCount++;
	if(System.currentTimeMillis() - lastTime > 1000) {
		glWindow.setTitle(windowTitle + " RTM: " + totalRenderTime / numFrames);
		//glWindow.setTitle(windowTitle + " FPS: " + loopCount);   
		lastTime = System.currentTimeMillis();
		loopCount = 0;
	}
	done = glWindow.isWindowClosing();
	processKeys();
	update();
	renderGL();
	numFrames++;
}
//Exit game loop
shutdown();

DeX

Quote from: "Orangy Tang"
Assuming you're not vsync'd (which you shouldn't be when profiling code), Display.update()/SwapBuffers will block while the rendering pipeline gets flushed and the framebuffer is actually displayed. Drawing more objects will naturally make this take longer (as your calls to draw display lists only add them to a draw queue and return nigh-on immediately).

Again you've created for yourself another worst-case scenario - drawing huge amounts of polys in just a few quick calls to display lists and no other real work. In an actual game you can be happily doing your actual game logic while the rendering goes on in the background. I'd expect the C++ version to exhibit the same behaviour...?

I had the impression that the objects were rendered as the framebuffer was being built up, i.e. that when I run glCallList(sphere) it goes off and adds that sphere to the frame buffer. Then when I call Display.update it would swap the current frame and the back-buffered frame.

I guess that if all the spheres are actually being drawn in the Display.update routine then that would explain why it takes a long time to process with more and more spheres. The C++ version behaves the same way. Once I use display lists, the SwapBuffers function takes much longer to process.

So what should I do to improve my code? I don't think I can process the ball motions after rendering because of the way my collision code works. It relies on the rendering being done directly after the collision response.

DeX

Ok, now that I've changed both programs to use display lists I can increase the number of spheres that run smoothly in Java from 5 to about 35. However, in C++ it increased from 60 to about 140. So I guess the original question still holds. How is it that C++ is able to display so many more polygons than Java when they both use the same graphics API and routines? I don't know if it's something I should be significantly worried about when making my game, but the graphics may well become more advanced than a bunch of spheres moving around with one light, so I want to make sure that I don't run into any performance barriers when I improve the graphics.

Orangy Tang

Quote from: "DeX"
I had the impression that the objects were rendered as the framebuffer was being built up, i.e. that when I run glCallList(sphere) it goes off and adds that sphere to the frame buffer. Then when I call Display.update it would swap the current frame and the back-buffered frame.

From a strictly API view that's perfectly accurate, but it's somewhat different in practice. The graphics card is effectively a second processor, and most GL commands are queued up and processed as soon as possible. glCallList may start drawing immediately (if there's nothing else currently being drawn) but the method will return as soon as possible, while the actual drawing continues to take place in the background. SwapBuffers, however, explicitly blocks until the display is changed, which requires all pending drawing to be finished (flushing the pipeline). Display.update() doesn't actually do the rendering, it just has to wait if there's any still going on (unless you're on a freaky Kyro card, but those are just plain odd).

Quote
So what should I do to improve my code? I don't think I can process the ball motions after rendering because of the way my collision code works. It relies on the rendering being done directly after the collision response.
Just swap the order of your update and rendering. You render the state from the previous frame, then go off and calculate the next state. Rendering still happens after logic, just at a different point in the frame.
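
The reordering suggested above might look roughly like the sketch below. It assumes the draw submission and the buffer swap can be called separately (in the posted code the swap sits inside renderGL, so that would need splitting); `FrameOrder`, `Stage` and `frame` are hypothetical names, and the stages are stubbed so that only the call order is shown.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a frame that overlaps game logic with background rendering.
final class FrameOrder {
    /** One step of the frame; a stand-in for the real draw/update/swap code. */
    interface Stage {
        void run();
    }

    /**
     * Runs one frame: queue the draw calls for the current state, compute the
     * next state while the card is (presumably) still drawing, then swap,
     * which only has to wait for whatever drawing is still outstanding.
     */
    static void frame(Stage submitDraws, Stage updateLogic, Stage swap) {
        submitDraws.run();
        updateLogic.run();
        swap.run();
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        frame(() -> log.add("draw"), () -> log.add("update"), () -> log.add("swap"));
        System.out.println(log);
    }
}
```

Each frame still shows the result of a completed logic step; it is simply the state computed during the previous frame.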

DeX

Thanks, Orangy Tang, I will keep that in mind as I develop the rest of the game. At the moment the physics part takes very little time compared with the rendering, so swapping the order of the two makes little difference.

Did you see my last post? We both posted at the same time so you might not have read it. Still not sure about the answer to that one.

Orangy Tang

Odd, I'd have thought you'd be seeing near-identical results now. :? You are using the same display mode / colour depth / anti-aliasing / etc. on both?

DeX

Yep, both use 1024x768x32 resolution. Both are windowed. Both use spheres with the same level of detail (32x32). And both spend most of their time in their respective Display.update and SwapBuffers routines. Anti-aliasing is turned off in both programs. There are some other small differences though. The Java version rotates the whole scene gradually and the C++ version displays stats on the screen. Other than that there's no real difference.

I'd understand the Java code being slower in general, but the slowest part is the graphics routines, which I would have thought would be processed in the same way on the GPU.

Matzon

I don't see anything inherently wrong with the Java app (apart from some small optimizations), and all the time is spent in org.lwjgl.opengl.Win32ContextImplementation.nSwapBuffers.
Can you post the C code too?

DeX

Ok but the C++ code is a lot messier (it's amazing how much you can clean up code by re-writing it):
http://homepage.ntlworld.com/alex.spurling/BallCollisionC++.zip

Matzon

The call to:
GL11.glPolygonMode(GL11.GL_FRONT, GL11.GL_FILL );         // Front Faces Are Filled In
GL11.glPolygonMode(GL11.GL_BACK, GL11.GL_POINT );         // Back Faces Are Drawn As Points

is the source of the slowdown. Comment those out, along with the rotate, and the Java version is faster than the C one (probably because it has no font rendering or unproject stuff).

Can you confirm?