Why is AWTGLCanvas so much faster than the native Display?

Started by Qudus, January 15, 2008, 00:30:38

Qudus

My card is an NVIDIA GeForce 7900 GS. I have attached screenshots of the settings. I hope I don't have any strange settings.

Marvin

Qudus

Btw, here are slightly modified gears tests that rotate at a fixed, time-based speed instead of as fast as your computer can. Maybe you'll find them useful.
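A minimal sketch of what "time-based" means here, not the attached code itself: the angle advances by speed times elapsed time, so the rotation looks the same at any frame rate. The class name and the 90 degrees per second value are assumptions for illustration.

```java
public class TimeBasedRotation {
    // Hypothetical rotation speed; not taken from the attached tests.
    static final double DEGREES_PER_SECOND = 90.0;

    // Advance the angle by the time that has passed since the last frame.
    static double advance(double angleDeg, long elapsedNanos) {
        double seconds = elapsedNanos / 1_000_000_000.0;
        return (angleDeg + DEGREES_PER_SECOND * seconds) % 360.0;
    }

    public static void main(String[] args) {
        double angle = 0.0;
        long last = System.nanoTime();
        for (int frame = 0; frame < 5; frame++) {
            long now = System.nanoTime();
            angle = advance(angle, now - last);
            last = now;
            // Rendering with the current angle would happen here.
        }
        System.out.println("angle after 5 frames: " + angle);
    }
}
```

With this, a faster machine renders more frames per rotation, but the gears do not spin faster.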

Marvin

Qudus

Btw, a friend of mine (with an 8800 GT on Linux) sees the same behavior. The Quake 3 flight that I talked about previously also runs faster with the AWTGLCanvas.

Marvin

Matzon

Maybe it's a Linux thing?
I can only check on Windows.

Qudus

Quote from: Matzon on January 17, 2008, 22:50:36
Maybe it's a Linux thing?
I can only check on Windows.

Maybe. Do you have someone on the LWJGL team who could test it on Linux?

Marvin

elias

The AWTGears and Gears tests don't use the same window dimensions. With that fixed (in svn), the awt test is still slightly faster (3800->3900 fps).

- elias

Qudus

Quote from: elias on January 18, 2008, 10:30:38
The AWTGears and Gears tests don't use the same window dimensions. With that fixed (in svn), the awt test is still slightly faster (3800->3900 fps).

- elias

These are the FPS numbers after the canvas size fix:
14094 frames in 5.0 seconds = 2818.8
20271 frames in 5.0 seconds = 4054.2
23265 frames in 5.0 seconds = 4653.0
21218 frames in 5.0 seconds = 4243.6
21032 frames in 5.0 seconds = 4206.4
21910 frames in 5.0 seconds = 4382.0
21240 frames in 5.0 seconds = 4248.0

So, it didn't do too much.

Are you running it on Linux, elias? Are you an LWJGL developer?

Marvin

btw. Thanks to both of you for investigating this issue :).

Qudus

These are the results of the native "Gears" test after I changed
Display.update();

to
Display.swapBuffers();


43674 frames in 5.0 seconds = 8734.8
44205 frames in 5.0 seconds = 8841.0
44160 frames in 5.0 seconds = 8832.0
52232 frames in 5.0 seconds = 10446.4
52564 frames in 5.0 seconds = 10512.8
52608 frames in 5.0 seconds = 10521.6
49756 frames in 5.0 seconds = 9951.2

Now I can reproduce your "twice the performance of the AWT test", Matzon :). Is it possible that polling the input devices, which Display.update() does implicitly, is quite expensive on Linux?

And these are the results after I further modified the test case to check Display.isCloseRequested() only every 100 ms:
76863 frames in 5.0 seconds = 15372.6
80966 frames in 5.0 seconds = 16193.2
80683 frames in 5.0 seconds = 16136.6
80596 frames in 5.0 seconds = 16119.2
80918 frames in 5.0 seconds = 16183.6
79504 frames in 5.0 seconds = 15900.8
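A rough sketch of that kind of throttling, not the actual modified test: an expensive boolean query is only re-run once the interval has passed, and the cached answer is returned in between. The ThrottledCheck class is hypothetical, and the BooleanSupplier stands in for a call like Display.isCloseRequested() so the sketch runs without LWJGL.

```java
import java.util.function.BooleanSupplier;

public class ThrottledCheck {
    private final BooleanSupplier check; // stand-in for the expensive query
    private final long intervalNanos;
    private long lastCheckNanos;
    private boolean lastResult;
    private boolean checkedOnce;

    public ThrottledCheck(BooleanSupplier check, long intervalMillis) {
        this.check = check;
        this.intervalNanos = intervalMillis * 1_000_000L;
    }

    // Re-runs the query only if the interval has elapsed; otherwise
    // returns the result of the most recent query.
    public boolean poll(long nowNanos) {
        if (!checkedOnce || nowNanos - lastCheckNanos >= intervalNanos) {
            lastResult = check.getAsBoolean();
            lastCheckNanos = nowNanos;
            checkedOnce = true;
        }
        return lastResult;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        ThrottledCheck closeCheck = new ThrottledCheck(() -> { calls[0]++; return false; }, 100);
        long end = System.nanoTime() + 300_000_000L; // spin for about 0.3 s
        while (System.nanoTime() < end && !closeCheck.poll(System.nanoTime())) {
            // The render loop body would go here.
        }
        System.out.println("expensive query ran " + calls[0] + " times");
    }
}
```

The trade-off is that a close request can be noticed up to one interval late.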

YourKit told me that Display.isCloseRequested() is very expensive. Isn't this just a boolean flag that is set by the display_impl.update() method? If so, isCloseRequested() should call display_impl.update() only conditionally, since you always have the chance to call it right after Display.update(), Display.isVisible(), or any other method that implicitly calls display_impl.update().

And since the isVisible() method also calls display_impl.update(), which itself reads input, it is really slow. The display_impl.update() method simply must not read input implicitly.

I guess it would be best to add a boolean parameter to all methods that implicitly invoke display_impl.update(), to give the developer control over whether it is called. That way you can call processMessages() manually and keep all the other methods from calling it again.

Marvin

elias

With the recent changes in SVN, the native Gears test is now faster than AWTGears on my Ubuntu 7.10 with a nvidia geforce 140m graphics card.

- elias

Qudus

Quote from: elias on January 19, 2008, 09:28:21
With the recent changes in SVN, the native Gears test is now faster than AWTGears on my Ubuntu 7.10 with a nvidia geforce 140m graphics card.

Thank you very much for making these changes and for the quick reply.

How about the pollInputDevices() call in Display.update()? Could you maybe overload Display.update() with a boolean parameter that tells whether the input devices are to be polled? That would speed it up further if you don't need the input devices to be polled at that time.

Can I download a nightly build somewhere?

Btw, I guess the display_impl.update() call can simply be removed from the isCloseRequested() method, since it will presumably never be queried without another call (like Display.update()) having called display_impl.update() first. If this cannot be said for sure, just overload it with a boolean parameter as well. Currently the isCloseRequested() method is unnecessarily expensive, which led me to call it only every 10ms, which is quite hackish but gave a noticeable speedup.

Marvin

Qudus

Another idea for the overloaded Display.update() methods is to use a bitmask telling what to update and query.
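Roughly what I have in mind, as a sketch only: every name below (UpdateFlags, the flag constants, describe()) is made up for illustration and none of it exists in LWJGL. The method just reports which steps a given mask would run, in place of actually running them.

```java
public class UpdateFlags {
    // Hypothetical flags; not part of the LWJGL API.
    public static final int SWAP_BUFFERS     = 1 << 0;
    public static final int PROCESS_MESSAGES = 1 << 1;
    public static final int POLL_INPUT       = 1 << 2;
    public static final int ALL = SWAP_BUFFERS | PROCESS_MESSAGES | POLL_INPUT;

    static boolean has(int mask, int flag) {
        return (mask & flag) != 0;
    }

    // Lists the steps that an update call with this mask would perform.
    static String describe(int mask) {
        StringBuilder steps = new StringBuilder();
        if (has(mask, PROCESS_MESSAGES)) steps.append("processMessages ");
        if (has(mask, POLL_INPUT)) steps.append("pollInput ");
        if (has(mask, SWAP_BUFFERS)) steps.append("swapBuffers ");
        return steps.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe(SWAP_BUFFERS | PROCESS_MESSAGES));
        System.out.println(describe(ALL));
    }
}
```

A single int parameter would avoid a growing list of boolean overloads, and a no-argument update() could keep today's behavior by passing ALL.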

Marvin

elias

I already removed display_impl.update() from read() and poll() on the input devices, so Display.pollDevices() should be much faster now.

I don't like the overloaded boolean methods approach, though.

- elias

Qudus

Quote from: elias on January 19, 2008, 15:51:49
I already removed display_impl.update() from read() and poll() on the input devices, so Display.pollDevices() should be much faster now.

I don't like the overloaded boolean methods approach, though.

OK. Then I won't use Display.update(), but swapBuffers() instead. So it's OK for me.

What about a nightly build?

Thanks.

Marvin

elias

Nightly builds are not really practical, since we need access to all three platforms to build the natives. However, with a little luck, Matzon will release 1.1.4 tonight :)

- elias

Qudus

Quote from: elias on January 19, 2008, 16:18:05
Nightly builds are not really practical, since we need access to all three platforms to build the natives. However, with a little luck, Matzon will release 1.1.4 tonight :)

That would be awesome :). Thanks.

Marvin