LWJGL Forum

Programming => OpenGL => Topic started by: Qudus on January 15, 2008, 00:30:38

Title: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 15, 2008, 00:30:38
The title says it. In Xith3D, using the native Display class to handle the OpenGL context is about as fast as JOGL. But using AWTGLCanvas is much faster! Why is this? Am I doing anything wrong?

Here is the context handling class for the Display class: http://xith3d.svn.sourceforge.net/viewvc/xith3d/trunk/src/org/xith3d/render/lwjgl/CanvasPeerImplNative.java?view=markup
And here is the one for the AWTGLCanvas: http://xith3d.svn.sourceforge.net/viewvc/xith3d/trunk/src/org/xith3d/render/lwjgl/CanvasPeerImplAWT.java?view=markup

As you can see, the only code that really differs is the constructor, where the display is created, and, of course, the extended AWTGLCanvas. These two classes are the only ones that differ between the two renderers.

In the Quake3 flight test I get about 200 FPS with native LWJGL and 275 FPS with the AWTGLCanvas, which is a quite amazing number. And I would like to get this number for native LWJGL, too, if possible.

Any thoughts? Thanks.

Marvin

PS: In both cases I use an 800x600x24 sized non-fullscreen window and no vsync.
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: darkprophet on January 15, 2008, 00:41:46
I'm sorry, but if you expect people to help you, provide a self-contained test case. Uploading a very Xith3D-centred class isn't going to help much, since a) I can't run it, and b) I don't know how/when/why you call those methods and under what circumstances.

Self contained test case please :)

DP :)
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 15, 2008, 00:51:13
Well, the constructor is called with the parameters explained above (+ null owner, 24 bit depth buffer).
The makeCurrent() method is called when the Thread to be used for rendering is not the current context-thread (which is never the case for me).
The initRenderingImpl() method is called each frame to invoke the rendering.

That's all.

Of course a self-contained testcase would be best, but I have no experience with using LWJGL directly. So creating a testcase that actually renders something is very hard for me. I thought the class was short enough (considering only the three used methods) that it might help you to tell me if I'm doing something obviously evil.

If the code plus these explanations is not enough, I would suggest, that I simply create a testcase, where the display is created like in my classes and you can insert your rendering code at the right place. I'm sure, you have some amazing render examples in your repertoire ;). Would that be sufficient?

Marvin
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Matthias on January 15, 2008, 01:08:42
Well - switching context is very slow anyway - so stop doing it and you get a much higher framerate. Also don't call glGetXYZ() - it works against multi threaded OpenGL drivers (like the nVidia one).

Ciao Matthias
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 15, 2008, 02:44:33
Thanks for the quick reply.

Quote from: Matthias on January 15, 2008, 01:08:42
Well - switching context is very slow anyway - so stop doing it and you get a much higher framerate. Also don't call glGetXYZ() - it works against multi threaded OpenGL drivers (like the nVidia one).

This might be a dumb question. But where am I switching the context? If you're talking about the makeCurrent() call, then remember, that I said, that this method is never called in my testcase.

I only call glGetXYZ in the initialization phase. This should be OK, shouldn't it?

Marvin
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 15, 2008, 02:46:01
Is there an example for the usage of AWTGLCanvas somewhere?
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: bobjob on January 15, 2008, 04:06:44
Is it possible that vsync isn't working in the AWT version?
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 15, 2008, 04:13:33
Quote from: bobjob on January 15, 2008, 04:06:44
Is it possible that vsync isn't working in the AWT version?

I am not using vsync in either case.
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: wolf_m on January 15, 2008, 05:04:07
Quote from: Qudus on January 15, 2008, 00:51:13

If the code plus these explanations is not enough, I would suggest, that I simply create a testcase, where the display is created like in my classes and you can insert your rendering code at the right place. I'm sure, you have some amazing render examples in your repertoire ;). Would that be sufficient?

Marvin
See http://lwjgl.org/wiki/doku.php/lwjgl/tutorials/opengl/basicopengl#rendering_our_square (method: render()) for very simple, and thus high-framerate, GL code. This should produce an even bigger gap if the native Display is actually the problem here, because a fixed slowdown per frame costs more frames per second the higher the frame rate is. If the difference in FPS remains the same, there's some subtle difference on Xith's side, or the problem is this ominous Q3 flight thing. Or something entirely different.
You need to increase angle every gametick to let the cube rotate, by the way.
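
To put rough numbers on the per-frame slowdown argument, here is a minimal sketch. The class name, the helper and the 0.1 ms overhead are all made up for illustration; nothing here is measured or taken from LWJGL or Xith3D.

```java
// Hypothetical helper, not part of LWJGL or Xith3D.
final class FrameOverhead {
    /** Frame rate that results when every frame takes extraSeconds longer. */
    static double fpsWithOverhead(double baseFps, double extraSeconds) {
        double frameTime = 1.0 / baseFps;         // seconds per frame without the overhead
        return 1.0 / (frameTime + extraSeconds);  // new frame time, inverted back to FPS
    }

    public static void main(String[] args) {
        double extra = 0.0001; // assumed fixed cost of 0.1 ms per frame
        for (double fps : new double[] {200, 3000}) {
            double slowed = fpsWithOverhead(fps, extra);
            System.out.printf("%.0f fps -> %.1f fps (loss %.1f)%n", fps, slowed, fps - slowed);
        }
    }
}
```

The same fixed cost removes only a few frames per second from a 200 FPS scene, but a large share of a 3000 FPS one, which is why a trivial scene should make the gap easier to see.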

Why don't you ask the xith guys as well, by the way? They should know what the problem is. http://www.xith.org/forum/index.php
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 15, 2008, 19:09:31
Quote from: wolf_m on January 15, 2008, 05:04:07
See http://lwjgl.org/wiki/doku.php/lwjgl/tutorials/opengl/basicopengl#rendering_our_square (method: render()) for very simple, and thus high-framerate, GL code. This should produce an even bigger gap if the native Display is actually the problem here, because a fixed slowdown per frame costs more frames per second the higher the frame rate is. If the difference in FPS remains the same, there's some subtle difference on Xith's side, or the problem is this ominous Q3 flight thing. Or something entirely different.
You need to increase angle every gametick to let the cube rotate, by the way.

Cool. Thanks. I will create a testcase now...

Quote from: wolf_m on January 15, 2008, 05:04:07
Why don't you ask the xith guys as well, by the way? They should know what the problem is. http://www.xith.org/forum/index.php

Haha ;D. Yes, maybe I should ask... myself ;).

Marvin
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 15, 2008, 20:24:14
Here is a testcase. Unfortunately, I couldn't get the AWT version to render, and I don't know why. Please have a look at it. Thanks.

Btw., the wiki lists an article about LWJGL and AWT, but it isn't written yet. Is this planned?

Marvin

EDIT: Modified the testcase to avoid AWT threading issues. But that didn't make it render either.
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 17, 2008, 01:31:59
Any news here? Is the testcase good? Or do you need some more explanations?

This is very important. Would be extremely cool, if someone could have a look at it. Thanks.

Marvin
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Matzon on January 17, 2008, 20:40:48
not sure about your test case  - seems needlessly complex  ???

check org.lwjgl.test.opengl.awt.AWTGears and org.lwjgl.test.opengl.Gears

Same application - one using native display - the other awt
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 17, 2008, 21:49:30
Thanks.

Running these two tests I get the following results:
Gears:
13013 frames in 5.0 seconds = 2602.6
14154 frames in 5.0 seconds = 2830.8
14290 frames in 5.0 seconds = 2858.0
17005 frames in 5.0 seconds = 3401.0
14312 frames in 5.0 seconds = 2862.4
14356 frames in 5.0 seconds = 2871.2
14347 frames in 5.0 seconds = 2869.4
14346 frames in 5.0 seconds = 2869.2

AWTGears:
14955 frames in 5.0 seconds = 2991.0
20794 frames in 5.0 seconds = 4158.8
18682 frames in 5.0 seconds = 3736.4
18501 frames in 5.0 seconds = 3700.2
18504 frames in 5.0 seconds = 3700.8
22683 frames in 5.0 seconds = 4536.6
23671 frames in 5.0 seconds = 4734.2
22475 frames in 5.0 seconds = 4495.0
21372 frames in 5.0 seconds = 4274.4

So the AWT version appears to be a lot faster. Do you get similar results? Why is it (that much) faster? I would expect the "direct" LWJGL way to be the most efficient way, since there's no AWT overhead to deal with.
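
For a rough comparison, the two lists can be averaged. A small sketch follows; the class and method names are just for illustration, and the frame counts are copied from the output above.

```java
// Illustrative only: averages the frame counts posted above.
final class GearsAverages {
    // Frame counts per 5.0-second sample, copied from the two result lists.
    static final int[] GEARS = {13013, 14154, 14290, 17005, 14312, 14356, 14347, 14346};
    static final int[] AWT_GEARS = {14955, 20794, 18682, 18501, 18504, 22683, 23671, 22475, 21372};

    /** Average frames per second over a series of fixed-length samples. */
    static double averageFps(int[] framesPerSample, double secondsPerSample) {
        long total = 0;
        for (int frames : framesPerSample) {
            total += frames;
        }
        return total / (framesPerSample.length * secondsPerSample);
    }

    public static void main(String[] args) {
        double nativeFps = averageFps(GEARS, 5.0);
        double awtFps = averageFps(AWT_GEARS, 5.0);
        System.out.printf("Gears %.1f fps, AWTGears %.1f fps, ratio %.2f%n",
                nativeFps, awtFps, awtFps / nativeFps);
    }
}
```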

Marvin
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Matzon on January 17, 2008, 21:51:55
I get twice the performance using the Native display compared to AWT.
What graphics card?
Do you have some weird forced settings in your driver panel ?
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 17, 2008, 22:25:20
My card is an NVIDIA GeForce 7900 GS. I have attached screenshots of the settings. I hope, I don't have any strange settings.

Marvin
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 17, 2008, 22:28:29
Btw. here are slightly modified gears tests, that don't rotate as fast as your computer can, but at a time-based fixed rotation speed. Maybe you find it useful.

Marvin
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 17, 2008, 22:38:31
btw. A friend of mine (with an 8800 GT on Linux) sees the same behavior. The Quake3 flight that I talked about previously runs faster with the AWTGLCanvas.

Marvin
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Matzon on January 17, 2008, 22:50:36
maybe it's a Linux thing?
I can only check on windows
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 17, 2008, 22:53:37
Quote from: Matzon on January 17, 2008, 22:50:36
maybe it's a Linux thing?
I can only check on windows

Maybe. Do you have someone in the lwjgl team, who could test it on linux?

Marvin
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: elias on January 18, 2008, 10:30:38
The AWTGears and Gears tests don't use the same window dimensions. With that fixed (in svn), the awt test is still slightly faster (3800->3900 fps).

- elias
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 19, 2008, 01:24:12
Quote from: elias on January 18, 2008, 10:30:38
The AWTGears and Gears tests don't use the same window dimensions. With that fixed (in svn), the awt test is still slightly faster (3800->3900 fps).

- elias

These are the FPS numbers after the Canvas size fix:
14094 frames in 5.0 seconds = 2818.8
20271 frames in 5.0 seconds = 4054.2
23265 frames in 5.0 seconds = 4653.0
21218 frames in 5.0 seconds = 4243.6
21032 frames in 5.0 seconds = 4206.4
21910 frames in 5.0 seconds = 4382.0
21240 frames in 5.0 seconds = 4248.0

So it didn't change much.

Are you running it on Linux, elias? Are you an LWJGL developer?

Marvin

btw. Thanks to both of you for investigating this issue :).
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 19, 2008, 02:33:09
These are the results of the native "Gears" test after I changed

Display.update();

to

Display.swapBuffers();


43674 frames in 5.0 seconds = 8734.8
44205 frames in 5.0 seconds = 8841.0
44160 frames in 5.0 seconds = 8832.0
52232 frames in 5.0 seconds = 10446.4
52564 frames in 5.0 seconds = 10512.8
52608 frames in 5.0 seconds = 10521.6
49756 frames in 5.0 seconds = 9951.2

Now I can reproduce your "twice the performance of the AWT test", Matzon :). Is it possible, that polling input devices, which is implicitly done by the Display.update() method, is quite expensive on Linux?

And these are the results after I further modified the testcase to check Display.isCloseRequested() only every 100 ms.
76863 frames in 5.0 seconds = 15372.6
80966 frames in 5.0 seconds = 16193.2
80683 frames in 5.0 seconds = 16136.6
80596 frames in 5.0 seconds = 16119.2
80918 frames in 5.0 seconds = 16183.6
79504 frames in 5.0 seconds = 15900.8
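
The 100 ms throttling amounts to something like the following sketch. ThrottledCheck is a made-up helper, not an LWJGL class; in the real test the query would be `Display.isCloseRequested()` and the clock `System.currentTimeMillis()`, both passed in here so the sketch runs without LWJGL.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Hypothetical wrapper: asks an expensive query at most once per interval. */
final class ThrottledCheck {
    private final BooleanSupplier query;     // e.g. () -> Display.isCloseRequested()
    private final LongSupplier clockMillis;  // e.g. System::currentTimeMillis
    private final long intervalMillis;
    private long lastCheck;
    private boolean lastResult;
    private boolean checkedOnce;

    ThrottledCheck(BooleanSupplier query, LongSupplier clockMillis, long intervalMillis) {
        this.query = query;
        this.clockMillis = clockMillis;
        this.intervalMillis = intervalMillis;
    }

    /** Returns the cached answer unless the interval has elapsed since the last real query. */
    boolean get() {
        long now = clockMillis.getAsLong();
        if (!checkedOnce || now - lastCheck >= intervalMillis) {
            lastResult = query.getAsBoolean(); // the expensive call happens only here
            lastCheck = now;
            checkedOnce = true;
        }
        return lastResult;
    }
}
```

In the render loop, the condition would then read `while (!closeCheck.get())` instead of calling the query every frame. The trade-off is that a close request is noticed up to one interval late.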

YourKit told me, that Display.isCloseRequested() is very expensive. Isn't this just one boolean flag, that is set by the display_impl.update() method? If this is true, the display_impl.update() method should only be called conditionally by this method, too, since you will always have the chance to call it right after Display.update() or Display.isVisible() or any other method, that implicitly calls the display_impl.update() method.

And since the isVisible() method also calls the display_impl.update() method, which itself reads input, it is really slow. The display_impl.update() method simply must not read input implicitly.

I guess, it would be best to add a boolean parameter to all methods, that implicitly invoke the display_impl.update() method to give the developer control over this method being called, so that you can manually call processMessages() and don't let all the other methods call it again.

Marvin
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: elias on January 19, 2008, 09:28:21
With the recent changes in SVN, the native Gears test is now faster than AWTGears on my Ubuntu 7.10 with a nvidia geforce 140m graphics card.

- elias
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 19, 2008, 13:49:26
Quote from: elias on January 19, 2008, 09:28:21
With the recent changes in SVN, the native Gears test is now faster than AWTGears on my Ubuntu 7.10 with a nvidia geforce 140m graphics card.

Thank you very much for doing these changes and the quick reply.

How about the pollInputDevices() call in Display.update()? Could you maybe overload the Display.update() method and add a boolean parameter to the new method, which tells whether the input devices are to be polled? That would further speed it up if you don't need the input devices to be polled at this time.

Can I download a nightly build somewhere?

btw. I guess the display_impl.update() call can simply be removed from the isCloseRequested() method, since it will presumably never be queried without another call (like Display.update()) calling display_impl.update() once before. If this cannot be said for sure, just overload it as well with a boolean parameter. Currently the isCloseRequested() method is unnecessarily expensive, which led me to call it only every 10ms, which is quite hackish but gave a noticeable speedup.

Marvin
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 19, 2008, 14:58:07
Another idea for the overloaded Display.update() methods is using a bitmask telling what to update and query.
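
Roughly something like this, as a sketch only. The flag names and the steps() helper are invented for illustration; they are not an existing LWJGL API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical flags for a bitmask-driven update call; not an existing LWJGL API. */
final class UpdateFlags {
    static final int SWAP_BUFFERS     = 1;       // present the rendered frame
    static final int PROCESS_MESSAGES = 1 << 1;  // pump window-system events
    static final int POLL_INPUT       = 1 << 2;  // read keyboard/mouse state
    static final int ALL = SWAP_BUFFERS | PROCESS_MESSAGES | POLL_INPUT;

    /** Returns the names of the steps a given mask would run, in a fixed order. */
    static List<String> steps(int mask) {
        List<String> out = new ArrayList<>();
        if ((mask & SWAP_BUFFERS) != 0) out.add("swapBuffers");
        if ((mask & PROCESS_MESSAGES) != 0) out.add("processMessages");
        if ((mask & POLL_INPUT) != 0) out.add("pollInput");
        return out;
    }

    public static void main(String[] args) {
        // A caller that only wants to present the frame and pump events:
        System.out.println(steps(SWAP_BUFFERS | PROCESS_MESSAGES));
    }
}
```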

Marvin
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: elias on January 19, 2008, 15:51:49
I already removed display_impl.update() from read() and poll() on the input devices, so Display.pollDevices() should be much faster now.

I don't like the overloaded boolean methods approach, though.

- elias
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 19, 2008, 16:03:10
Quote from: elias on January 19, 2008, 15:51:49
I already removed display_impl.update() from read() and poll() on the input devices, so Display.pollDevices() should be much faster now.

I don't like the overloaded boolean methods approach, though.

OK. Then I won't use the Display.update() method, but swapBuffers() instead. So that's OK for me.

What about a nightly build?

Thanks.

Marvin
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: elias on January 19, 2008, 16:18:05
Nightly builds are not really practical, since we need access to all three platforms to build the natives. However, with a little luck, Matzon will release 1.1.4 tonight :)

- elias
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 19, 2008, 16:23:39
Quote from: elias on January 19, 2008, 16:18:05
Nightly builds are not really practical, since we need access to all three platforms to build the natives. However, with a little luck, Matzon will release 1.1.4 tonight :)

That would be awesome :). Thanks.

Marvin
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 19, 2008, 17:16:44
Quote from: Qudus on January 19, 2008, 13:49:26
btw. I guess the display_impl.update() call can simply be removed from the isCloseRequested() method, since it will presumably never be queried without another call (like Display.update()) calling display_impl.update() once before. If this cannot be said for sure, just overload it as well with a boolean parameter. Currently the isCloseRequested() method is unnecessarily expensive, which led me to call it only every 10ms, which is quite hackish but gave a noticeable speedup.

Hmm... I don't want to be a pain in the ass. Sorry if I am. But I didn't get an answer to this question.

Marvin
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: elias on January 19, 2008, 21:57:22
Sorry, but I don't like the overloaded boolean method and I'm not convinced isCloseRequested() won't be called without Display.update() first. Consider


if (Display.isVisible()) {
    Display.update();
}
if (Display.isCloseRequested())
    System.exit(0);


- elias
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 19, 2008, 22:44:17
Well, in this code the display_impl.update() method would be called three times. Once by the isVisible() method, once by the update() method and once by isCloseRequested(). No matter in which order. Or am I wrong?

And whether you like the overloaded approach or not, you must admit that taking 2% of the entire performance of an expensive testcase just for querying the is-close-requested state cannot be accepted. Don't you agree? Just have a second look at my timing results of the Gears test, where isCloseRequested() even took 60%.

Marvin
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: elias on January 19, 2008, 22:47:08
With the newest lwjgl, will it still be 2%?

  - elias
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 19, 2008, 22:49:29
Quote from: elias on January 19, 2008, 22:47:08
With the newest lwjgl, will it still be 2%?

Well, since you didn't remove the display_impl.update() call from the isCloseRequested() method, it should be. I guess it would be best if you tested it with the Gears test and compared it (percentage-wise) with my results above.

Marvin
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Matzon on January 19, 2008, 23:23:49
Quote from: Qudus on January 19, 2008, 16:23:39
Quote from: elias on January 19, 2008, 16:18:05
Nightly builds are not really practical, since we need access to all three platforms to build the natives. However, with a little luck, Matzon will release 1.1.4 tonight :)

That would be awesome :). Thanks.

Marvin
meh - I decided to play some Enemy Territory instead  ::)
Tomorrow! :)
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 19, 2008, 23:36:03
Quote from: Matzon on January 19, 2008, 23:23:49
meh - I decided to play some Enemy Territory instead  ::)
Tomorrow! :)

No problem. Have fun ;).

Marvin
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: elias on January 20, 2008, 11:10:55
We've more or less come to the conclusion that these changes should be done as part of an lwjgl 2.0, so we're going to release 1.1.4 without any (intentional) breakage and see if we can get a 2.0 out with all the controversial changes (most likely your changes and the replacement of devil and fmod).

- elias
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: Qudus on January 20, 2008, 14:40:36
ok

But I have a new idea, how you could simply use caching to improve the performance without changing anything in the API.

The LinuxDisplay and WindowsDisplay would simply need to store a frame-ID (long). The swapBuffers() method would then increase the frame-ID stored in Display and any call to display_impl.update() would pass the current Display's frame-ID to the impl. Then the display_impl.update() method would immediately return, if the incoming frame-ID is not greater than the currently stored one. If it is greater, it is stored.

Wouldn't this be easy to implement and very convenient?
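
A minimal sketch of what I mean. CachedImpl and CachedDisplay are stand-ins I made up for LinuxDisplay/WindowsDisplay and Display; the real classes look different, and the counter only exists to show how often the expensive part would run.

```java
/** Stand-in for LinuxDisplay/WindowsDisplay; not the real LWJGL classes. */
final class CachedImpl {
    private long lastFrameId = -1; // frame for which update() last did real work
    int realUpdates;               // counts the expensive passes, for illustration

    /** Does the expensive work at most once per frame ID. */
    void update(long frameId) {
        if (frameId <= lastFrameId) {
            return; // already updated for this frame
        }
        lastFrameId = frameId;
        realUpdates++; // the native event processing would happen here
    }
}

/** Stand-in for Display, holding the frame counter. */
final class CachedDisplay {
    private final CachedImpl impl = new CachedImpl();
    private long frameId;

    void swapBuffers() { frameId++; } // a new frame invalidates the cached state
    boolean isVisible() { impl.update(frameId); return true; }
    boolean isCloseRequested() { impl.update(frameId); return false; }
    int realUpdates() { return impl.realUpdates; }
}
```

With this, calling isVisible() and isCloseRequested() any number of times within one frame triggers only a single real update, and the public method signatures stay as they are.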

Marvin
Title: Re: Why is AWTGLCanvas so much faster than the native Display?
Post by: elias on January 20, 2008, 17:21:33
I thought about a similar solution, too (which would need to be implemented for all queries in Display too). However, we have a bunch of even more intrusive changes lined up for 2.0, so I won't bother.

- elias