These are the results of the native "Gears" test after I changed
Display.update();
to
Display.swapBuffers();
43674 frames in 5.0 seconds = 8734.8
44205 frames in 5.0 seconds = 8841.0
44160 frames in 5.0 seconds = 8832.0
52232 frames in 5.0 seconds = 10446.4
52564 frames in 5.0 seconds = 10512.8
52608 frames in 5.0 seconds = 10521.6
49756 frames in 5.0 seconds = 9951.2
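For reference, the last number on each line is the frame count divided by the measurement window, e.g. 43674 / 5.0 = 8734.8 frames per second. Below is a minimal Java sketch of a helper that produces lines in this shape. `FpsReport` is a hypothetical name, and this is not the actual Gears test source.

```java
// Hypothetical helper, not taken from the Gears test: reproduces the shape of the
// "N frames in S seconds = FPS" lines, where FPS is simply frames / seconds.
final class FpsReport {
    private FpsReport() {}

    /** Average frames per second over one measurement window. */
    static float fps(int frames, float seconds) {
        return frames / seconds;
    }

    /** One report line, e.g. for 43674 frames in a 5.0 second window. */
    static String line(int frames, float seconds) {
        return frames + " frames in " + seconds + " seconds = " + fps(frames, seconds);
    }
}
```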
Now I can reproduce your "twice the performance of the AWT test", Matzon.
Is it possible that polling input devices, which Display.update() does implicitly, is quite expensive on Linux?
And these are the results after I further modified the test case to check Display.isCloseRequested() only every 100 ms:
76863 frames in 5.0 seconds = 15372.6
80966 frames in 5.0 seconds = 16193.2
80683 frames in 5.0 seconds = 16136.6
80596 frames in 5.0 seconds = 16119.2
80918 frames in 5.0 seconds = 16183.6
79504 frames in 5.0 seconds = 15900.8
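The 100 ms throttling used in the modified test case can be factored into a small wrapper. This is a minimal sketch, assuming Java 8 or later. `ThrottledCheck` is a hypothetical helper, not part of LWJGL, and the clock is injectable only so the behaviour can be tested deterministically.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical helper (not part of LWJGL): wraps an expensive boolean query, such as
// a close-requested check, and re-evaluates it at most once per interval.
// Between evaluations the last result is returned from a cache.
final class ThrottledCheck {
    private final BooleanSupplier query;   // the expensive check
    private final LongSupplier clockNanos; // time source, injectable for testing
    private final long intervalNanos;
    private long lastCheckNanos;
    private boolean hasChecked;
    private boolean cached;

    ThrottledCheck(BooleanSupplier query, long intervalMillis, LongSupplier clockNanos) {
        this.query = query;
        this.intervalNanos = intervalMillis * 1_000_000L;
        this.clockNanos = clockNanos;
    }

    ThrottledCheck(BooleanSupplier query, long intervalMillis) {
        this(query, intervalMillis, System::nanoTime);
    }

    /** Returns the cached result, refreshing it first if the interval has elapsed. */
    boolean get() {
        long now = clockNanos.getAsLong();
        if (!hasChecked || now - lastCheckNanos >= intervalNanos) {
            cached = query.getAsBoolean();
            lastCheckNanos = now;
            hasChecked = true;
        }
        return cached;
    }
}
```

In a render loop it could be used as `ThrottledCheck close = new ThrottledCheck(Display::isCloseRequested, 100);` followed by `while (!close.get()) { ... }`. The trade-off is that a close request may be noticed up to 100 ms late, which is normally acceptable for a close button.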
YourKit told me that Display.isCloseRequested() is very expensive. Isn't this just a boolean flag that is set by display_impl.update()? If so, isCloseRequested() should call display_impl.update() only conditionally as well, since you can always query the flag right after Display.update(), Display.isVisible() or any other method that implicitly calls display_impl.update().
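One possible reading of "only called conditionally" is sketched below: the close flag is a plain field written when a close message is pumped, and isCloseRequested() runs the pump itself only if nothing has done so since the last buffer swap. `CachedDisplayState`, `pumpMessages` and `onCloseMessage()` are hypothetical stand-ins for LWJGL's internals (display_impl), not its real API.

```java
// Hypothetical model, not LWJGL code: the close flag is a plain boolean written
// when a close message is pumped, and isCloseRequested() only runs the pump itself
// if nothing has processed messages since the last buffer swap.
final class CachedDisplayState {
    private final Runnable pumpMessages; // stands in for display_impl.update()
    private boolean closeRequested;
    private boolean processedThisFrame;

    CachedDisplayState(Runnable pumpMessages) {
        this.pumpMessages = pumpMessages;
    }

    /** Would be invoked from the pump when the window manager sends a close event. */
    void onCloseMessage() {
        closeRequested = true;
    }

    void processMessages() {
        pumpMessages.run();
        processedThisFrame = true;
    }

    void swapBuffers() {
        // A new frame starts; the cached state may now be stale.
        processedThisFrame = false;
    }

    /** Models Display.update() as a buffer swap followed by one message pump. */
    void update() {
        swapBuffers();
        processMessages();
    }

    boolean isCloseRequested() {
        if (!processedThisFrame) {
            processMessages(); // nobody has pumped this frame, so do it now
        }
        return closeRequested;
    }
}
```

With this shape, calling isCloseRequested() right after update() costs a field read, and calling it several times after a bare swapBuffers() pumps messages only once.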
And since isVisible() also calls display_impl.update(), which itself reads input, it is really slow, too. The display_impl.update() method simply must not read input implicitly.
I guess it would be best to add a boolean parameter to every method that implicitly invokes display_impl.update(), giving the developer control over whether it gets called. Then you could call processMessages() manually and keep all the other methods from calling it again.
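The proposed boolean parameter could look roughly like this. It is a hypothetical API sketch, not LWJGL's actual Display class: each method that would otherwise pump messages takes a `processMessages` flag, and passing false only reads the state left by the last explicit processMessages() call. The `onCloseMessage()` and `onVisibilityChanged()` hooks are likewise hypothetical.

```java
// Hypothetical API sketch, not LWJGL code. Every method that would implicitly run
// the message/input pump takes a flag, so the caller decides when the pump runs.
final class ExplicitDisplay {
    private final Runnable pumpMessages; // stands in for display_impl.update()
    private boolean closeRequested;
    private boolean visible = true;

    ExplicitDisplay(Runnable pumpMessages) {
        this.pumpMessages = pumpMessages;
    }

    /** Runs the (expensive) message and input pump exactly once. */
    void processMessages() {
        pumpMessages.run();
    }

    /** Would be invoked from the pump when the window manager sends a close event. */
    void onCloseMessage() {
        closeRequested = true;
    }

    /** Would be invoked from the pump when the window is minimized or restored. */
    void onVisibilityChanged(boolean nowVisible) {
        visible = nowVisible;
    }

    void update(boolean processMessages) {
        // swapping the buffers would happen here
        if (processMessages) {
            processMessages();
        }
    }

    boolean isVisible(boolean processMessages) {
        if (processMessages) {
            processMessages();
        }
        return visible;
    }

    boolean isCloseRequested(boolean processMessages) {
        if (processMessages) {
            processMessages();
        }
        return closeRequested;
    }
}
```

A render loop could then call update(false), run processMessages() once per frame (or less often), and query isVisible(false) and isCloseRequested(false) as often as it likes without reading input again.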
Marvin