LWJGL Forum

Programming => Lightweight Java Gaming Library => Topic started by: elias on July 25, 2004, 06:51:28

Title: "Fast" Pbuffers implemented
Post by: elias on July 25, 2004, 06:51:28
Hi,

I've just completed a new Pbuffer feature in CVS, "fast" pbuffers. They work by using the Display context so that context switch overhead is minimized. The downside is that you can't specify a separate pixel format and that the Display has to be created. Two new static factory methods have replaced the Pbuffer constructor:


   public static Pbuffer createPbufferUsingDisplayContext(int width, int height, RenderTexture renderTexture) throws LWJGLException;
   public static Pbuffer createPbufferUsingUniqueContext(int width, int height, PixelFormat pixel_format, RenderTexture renderTexture) throws LWJGLException;


I'd appreciate it if you guys could test it, even if you don't use Pbuffers, since changes had to be made to the Display context creation (to increase the chance of creating a context capable of being used with Pbuffers):

http://odense.kollegienet.dk/~naur/lwjgl-20040725.zip

Compare it to the old build that didn't include the new feature:

http://odense.kollegienet.dk/~naur/lwjgl-20040723.zip

Thanks,

- elias
Title: "Fast" Pbuffers implemented
Post by: spasi on July 26, 2004, 07:58:53
Marathon hangs (just stops there, no exceptions, no crashes) at createPbufferUsingUniqueContext when using render-to-texture. Without render-to-texture it works, but awfully slow (tops at ~30fps). Both create a new PixelFormat, with just 24bit depth.

It works fine with createPbufferUsingDisplayContext. A few questions though: What exactly is happening here? The OpenGL state is shared between the display and the pbuffer? How is this possible? It's great of course, but can this be done in other OSes too? Cause it kind of messes my state changes right now...:?

And one last thing: Do you know what happens behind the scenes, memory-wise? I need a good estimation of our memory requirements and I'm not sure what to calculate for the pbuffer. A double-buffered pbuffer (as in LWJGL) should be counted as two times the (width x height x bits)? Is it possible to create a non-double-buffered pbuffer?
Title: "Fast" Pbuffers implemented
Post by: elias on July 26, 2004, 08:46:10
Quote from: "spasi"
Marathon hangs (just stops there, no exceptions, no crashes) at createPbufferUsingUniqueContext when using render-to-texture. Without render-to-texture it works, but awfully slow (tops at ~30fps). Both create a new PixelFormat, with just 24bit depth.

The createPbufferUsingUniqueContext() path should be the same as in 0.9. How does 0.9 work with Marathon? Mind you, I never tested RenderTexture, because TT doesn't use it and there's no test for it in LWJGL. Could you create a simple test for it (based on PbufferTest, maybe)?

Quote
It works fine with createPbufferUsingDisplayContext. A few questions though: What exactly is happening here? The OpenGL state is shared between the display and the pbuffer? How is this possible? It's great of course, but can this be done in other OSes too? Cause it kind of messes my state changes right now...:?

It's very simple: An OpenGL context can potentially be used for multiple drawables, so when the Display is created, I try to make sure the context supports both windows and pbuffers if at all possible. The Pbuffer class simply creates a Pbuffer, but no context, and Pbuffer.makeCurrent then binds the context to the pbuffer. That's why the state is shared, and it works on both Linux and Windows. The normal path (Unique context) creates a new context for the pbuffer, but enables texture (and other object type) sharing between the pbuffer context and the display context. This path is slower because it's a separate context and because GLContext needs to reload the OpenGL functions at every context switch.
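
To illustrate the difference between the two paths, here is a toy Java model. It is not LWJGL code: every class and field name is hypothetical, a single boolean stands in for all OpenGL state, and the texture sharing between separate contexts is left out.

```java
/** Toy model of an OpenGL context holding one piece of state (hypothetical, not LWJGL). */
final class ContextModel {
    boolean depthTestEnabled;
}

/** Toy drawable (window or pbuffer) that is rendered through some context. */
final class DrawableModel {
    final String name;
    final ContextModel context;

    DrawableModel(String name, ContextModel context) {
        this.name = name;
        this.context = context;
    }
}

final class SharedContextSketch {
    public static void main(String[] args) {
        ContextModel displayContext = new ContextModel();
        DrawableModel display = new DrawableModel("display", displayContext);

        // "Fast" path: the pbuffer has no context of its own and reuses the display's.
        DrawableModel fastPbuffer = new DrawableModel("fast pbuffer", displayContext);
        // Unique-context path: the pbuffer gets a separate context.
        DrawableModel uniquePbuffer = new DrawableModel("unique pbuffer", new ContextModel());

        // A state change made while rendering to the fast pbuffer...
        fastPbuffer.context.depthTestEnabled = true;

        // ...is still in effect when rendering to the display again.
        System.out.println(display.name + " depth test: " + display.context.depthTestEnabled);
        // The unique-context pbuffer keeps its own, untouched state.
        System.out.println(uniquePbuffer.name + " depth test: " + uniquePbuffer.context.depthTestEnabled);
    }
}
```

With a shared context, any state set for the pbuffer pass has to be restored (or deliberately reused) before drawing to the display.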

Quote
And one last thing: Do you know what happens behind the scenes, memory-wise? I need a good estimation of our memory requirements and I'm not sure what to calculate for the pbuffer. A double-buffered pbuffer (as in LWJGL) should be counted as two times the (width x height x bits)? Is it possible to create a non-double-buffered pbuffer?

No idea, but the DisplayContext path is most definitely cheaper, memory-wise. The biggest memory hogs are the buffers (color, depth, stencil, etc.) though. I didn't code a way to avoid the double-buffered Pbuffer, although it is probably possible (but it involves more code to find a pixel format that exactly matches the display pixel format, apart from the double buffer property).

- elias
Title: "Fast" Pbuffers implemented
Post by: spasi on July 26, 2004, 09:29:40
Quote from: "elias"
The createPbufferUsingUniqueContext() path should be the same as in 0.9. How does 0.9 work with Marathon? Mind you, I never tested RenderTexture, because TT doesn't use it and there's no test for it in LWJGL. Could you create a simple test for it (based on PbufferTest, maybe)?

It's working fine with 0.9, with or without render-to-texture. I'll make a little test when I can (so busy right now...).

Quote from: "elias"
It's very simple: An OpenGL context can potentially be used for multiple drawables, so when the Display is created, I try to make sure the context supports both windows and pbuffers if at all possible. The Pbuffer class simply creates a Pbuffer, but no context, and Pbuffer.makeCurrent then binds the context to the pbuffer. That's why the state is shared, and it works on both Linux and Windows. The normal path (Unique context) creates a new context for the pbuffer, but enables texture (and other object type) sharing between the pbuffer context and the display context. This path is slower because it's a separate context and because GLContext needs to reload the OpenGL functions at every context switch.

But in 0.9, without render-to-texture, it was slower by only 10-20%, and that was due to the additional texture copy. The context switch overhead was present in both paths (with & without rtt) and I could easily get well above 100 fps. Something is definitely wrong with the unique context creation. To sum up, the problems I have are: when creating a new context, I get bad performance without rtt and it hangs with rtt. It may give you a clue of what may be wrong.

Quote from: "elias"
No idea, but the DisplayContext path is most definitely cheaper, memory-wise. The biggest memory hogs are the buffers (color, depth, stencil, etc.) though. I didn't code a way to avoid the double-buffered Pbuffer, although it is probably possible (but it involves more code to find a pixel format that exactly matches the display pixel format, apart from the double buffer property).

I'd love a way to disable double-buffered pbuffers. Our shadow maps are really big (at least 1024x1024) and that's a lot of memory to waste. I hope you're not as busy as I am...:wink:
Title: "Fast" Pbuffers implemented
Post by: elias on July 26, 2004, 09:40:31
Unique context Pbuffers are slower in this version because the OpenGL function pointers are reloaded at each switch to a different context. They are reloaded because, according to the spec, the functions might have different addresses in different contexts. That was part of the reason I implemented the "fast" path.

- elias
Title: "Fast" Pbuffers implemented
Post by: elias on July 26, 2004, 09:41:45
Let's make a deal: You commit a test case using render-to-texture (that hangs) and I'll look into the hang bug and the double buffer issue. OK? Can't be that hard given we already have a Pbuffer test.

- elias
Title: "Fast" Pbuffers implemented
Post by: spasi on July 26, 2004, 10:04:13
Quote from: "elias"
Unique context Pbuffers are slower in this version because the OpenGL function pointers are reloaded at each switch to a different context. They are reloaded because, according to the spec, the functions might have different addresses in different contexts. That was part of the reason I implemented the "fast" path.

But they are cached, aren't they? From useContext javadoc in GLContext.java:

If the context has not been encountered before it will be fully initialized from scratch.
Otherwise a cached set of caps and function pointers will be used.


Why would that be slowing down things?

Quote from: "elias"
Let's make a deal: You commit a test case using render-to-texture (that hangs) and I'll look into the hang bug and the double buffer issue. OK? Can't be that hard given we already have a Pbuffer test.

It's a deal :wink:.
Title: "Fast" Pbuffers implemented
Post by: elias on July 26, 2004, 10:23:05
No, they're not cached. And even if they were "cached", I'm sure the expensive part is actually re-assigning the pointers, not fetching them directly from the library.

- elias
Title: "Fast" Pbuffers implemented
Post by: spasi on July 26, 2004, 10:59:49
Wow. This messes up 0.9 functionality that was OK. I mean, pbuffers in different contexts are useless in games as it is now. And many will need them, as creating a pbuffer with the same pixel format as the display context is probably a serious waste of memory. Isn't there a better way to do it? If I got it right, the only problem is that the functions may have different addresses in different contexts, right? Maybe detect that they are not different, so no reassignment will be necessary?
Title: "Fast" Pbuffers implemented
Post by: elias on July 26, 2004, 11:21:20
Well, if you commit that test and I fix it and the double buffer issue, everybody will want the DisplayContext version of Pbuffers, because they should be even faster than in 0.9, given that they don't have an extra OpenGL context.

- elias
Title: "Fast" Pbuffers implemented
Post by: spasi on July 26, 2004, 12:56:30
But this is not a speed issue (well, it is when your memory is full). Of course, the context change affects speed, but it's not important if you're doing anything non-trivial.

Assume a standard display pixel format: 24bits RGB, 8bits ALPHA and 24bits DEPTH. Say you want to render a 1024x1024 shadow map and a 512x512 reflection map. You'd normally want to use render-to-texture, so:

Pbuffer A: 512x512x3 = 786432 bytes
Pbuffer B: 1024x1024x3 = 3145728 bytes

But, if you share the display context, you'll have:

Pbuffer A: 512x512x7 = 1835008 bytes
Pbuffer B: 1024x1024x7 = 7340032 bytes

That's 5 wasted megs! If you add to that the extra memory required for textures (since rtt support sucks, especially for depth maps) and the probably higher resolution the user will want, it's getting out of hand.
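
The figures above can be reproduced with a short Java sketch. The class and method names are made up for illustration, and it follows the same simplification as the post: one set of buffers per pbuffer, with no double buffering or driver overhead counted.

```java
/** Rough pbuffer memory estimate (hypothetical helper, not part of LWJGL). */
final class PbufferMemoryEstimate {
    /** Bytes needed for width x height pixels at the given total bits per pixel. */
    static long bytes(int width, int height, int bitsPerPixel) {
        return (long) width * height * bitsPerPixel / 8;
    }

    public static void main(String[] args) {
        int depthOnlyBits = 24;          // unique context: 24-bit depth only
        int displayBits = 24 + 8 + 24;   // shared context: RGB + alpha + depth

        long unique = bytes(512, 512, depthOnlyBits) + bytes(1024, 1024, depthOnlyBits);
        long shared = bytes(512, 512, displayBits) + bytes(1024, 1024, displayBits);

        System.out.println("unique context: " + unique + " bytes");
        System.out.println("shared context: " + shared + " bytes");
        System.out.println("difference:     " + (shared - unique) + " bytes");
    }
}
```

The difference comes out to 5242880 bytes, which is exactly 5 MB.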
Title: "Fast" Pbuffers implemented
Post by: elias on July 26, 2004, 13:16:16
Why are you using 2 Pbuffers instead of just using Pbuffer B for both? I'll consider simply re-using the pointers (you can't check if they're the same unless you run through them all).

Post that test! (Just a simple one that demonstrates the hang)

- elias
Title: "Fast" Pbuffers implemented
Post by: spasi on July 26, 2004, 13:25:56
Now, that's a speed issue. I wouldn't want to render a 1024x1024 reflection map, it would slow everything down, for no apparent quality gains. But I want/need a hi-res shadow map. Anyway, that was just an example. In an Unreal3 (http://www.unrealtechnology.com) type of engine(!), these matters would be of really big importance.

Sorry, but the test will take a while (I'm at work now). I'll try to have it ready by tomorrow morning.
Title: "Fast" Pbuffers implemented
Post by: elias on July 26, 2004, 14:30:48
You don't need to use all the 1024*1024 available pixels. I'm saying that if you use one 1024*1024 pbuffer, why not reuse the lower left corner of it to save the additional 512*512 pbuffer?

- elias
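
A rough sketch of what reusing the corner would mean in numbers, assuming the application restricts rendering to a 512x512 viewport in the lower left corner and samples only that part of the resulting texture. The helper names are hypothetical, and the 7 bytes per pixel figure is the one from spasi's estimate above.

```java
/** Numbers for reusing a corner of a larger pbuffer (hypothetical helper, not LWJGL). */
final class SubRegionSketch {
    /** Upper texture coordinate when only `used` of `total` pixels along an axis are rendered. */
    static double maxTexCoord(int used, int total) {
        return (double) used / total;
    }

    /** Bytes saved by not allocating a separate size x size pbuffer. */
    static long bytesSaved(int size, int bytesPerPixel) {
        return (long) size * size * bytesPerPixel;
    }

    public static void main(String[] args) {
        // Normalized texture coordinates span 0..1 over the full 1024 pixels,
        // so the 512x512 corner is addressed with coordinates 0..0.5.
        System.out.println("s/t range: 0.0 to " + maxTexCoord(512, 1024));
        System.out.println("saved: " + bytesSaved(512, 7) + " bytes");
    }
}
```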
Title: "Fast" Pbuffers implemented
Post by: elias on July 26, 2004, 14:40:09
Anyway, I just made GLContext only load the stubs when they're unloaded, here's the new build:

http://odense.kollegienet.dk/~naur/lwjgl-20040726.zip

- elias
Title: "Fast" Pbuffers implemented
Post by: spasi on July 26, 2004, 15:48:25
OK, pbuffers with unique context, without rtt, work great now. But it still hangs with rtt and now I get a "Could not make pbuffer context current" with context sharing (with & without rtt).

I'm starting the pbuffer test now...
Title: "Fast" Pbuffers implemented
Post by: spasi on July 26, 2004, 19:04:06
The tests are ready. They are in org.lwjgl.test.opengl.pbuffers. Usage: PbufferTest <mode>, where mode can be one of 1-4, to try the different techniques.

Of the four tests, only the second doesn't run; it hangs just like in Marathon. The others are fine.
Title: "Fast" Pbuffers implemented
Post by: elias on July 26, 2004, 19:24:10
That was an easy one, the test number 2 works now, except that the inner colored square is never shown like in the rest of the tests. Could you verify that this is intended?

- elias
Title: "Fast" Pbuffers implemented
Post by: spasi on July 26, 2004, 19:54:34
No, it's not. Post a new build and I'll check it out. I couldn't test that one at all.
Title: "Fast" Pbuffers implemented
Post by: elias on July 26, 2004, 20:09:32
Here you go:

http://odense.kollegienet.dk/~naur/lwjgl-20040726-2.zip

I had to disable the single buffering stuff for the shared context path, because I discovered that the pixel format has to match exactly, not just the buffer depths like in GLX.

- elias
Title: "Fast" Pbuffers implemented
Post by: spasi on July 26, 2004, 22:16:47
OK, fixed it. I was using the back buffer instead of the front.

So, to sum up, when sharing the display context the pbuffer is forced to have the exact same pixel format and be double-buffered, right? And when having a new context, you can choose whatever pixel format you want and it will be single buffered (GL_FRONT_LEFT only). Am I correct?
Title: "Fast" Pbuffers implemented
Post by: elias on July 27, 2004, 14:15:11
Mostly correct. I updated the docs, and as they state, single buffered pbuffers are always preferred, but if that fails, you can get a double-buffered one. In practice, it means that in Linux where the GLX spec is less restrictive, you can get a single buffered pbuffer with a shared context (but still same buffer depths as the display), but in win32 you will always get a double-buffered one. For unique context pbuffers, you will most likely get a single-buffered pbuffer, but if that is not possible, a double-buffered format pbuffer could be created.

- elias
Title: "Fast" Pbuffers implemented
Post by: spasi on July 27, 2004, 15:15:47
Nice. What about the different function addresses? I guess that will never happen in a standard game, but will we do something about it? Just to cover an unusual usage of the library. Probably not, right?
Title: "Fast" Pbuffers implemented
Post by: elias on July 27, 2004, 15:51:36
Already converted to the Assumption: "Contexts will never have different function pointer sets". So the pointers will only load if they're not loaded already.

- elias
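
The change from reloading on every switch to loading only when needed can be pictured with a toy Java model. This is not the actual GLContext code: the class, its flag, and the counter are hypothetical and only count how often the (expensive) pointer assignment would run.

```java
/** Toy model of function pointer loading on context switches (hypothetical, not LWJGL's GLContext). */
final class StubLoader {
    private final boolean reloadOnEverySwitch;
    private boolean loaded;
    private int loads;

    StubLoader(boolean reloadOnEverySwitch) {
        this.reloadOnEverySwitch = reloadOnEverySwitch;
    }

    /** Called whenever a context is made current. */
    void makeCurrent() {
        // Old behaviour: always reload. New behaviour: load only if nothing is loaded yet,
        // assuming all contexts share the same function pointer set.
        if (reloadOnEverySwitch || !loaded) {
            loads++;        // stands in for re-assigning every function pointer
            loaded = true;
        }
    }

    int loads() {
        return loads;
    }

    public static void main(String[] args) {
        StubLoader always = new StubLoader(true);
        StubLoader once = new StubLoader(false);
        // One frame with a pbuffer pass: display -> pbuffer -> display.
        for (int i = 0; i < 3; i++) {
            always.makeCurrent();
            once.makeCurrent();
        }
        System.out.println("reload on every switch: " + always.loads() + " loads");
        System.out.println("load only if unloaded:  " + once.loads() + " loads");
    }
}
```

The trade-off is the one discussed above: the second variant is only correct while the assumption holds that contexts never have different function pointer sets.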