"Fast" Pbuffers implemented

*

Offline elias

  • *****
  • 899
    • http://oddlabs.com
« on: July 25, 2004, 06:51:28 »
Hi,

I've just completed a new Pbuffer feature in CVS: "fast" pbuffers. They work by using the Display context, so that context switch overhead is minimized. The downside is that you can't specify a separate pixel format and that the Display has to be created first. Two new static factory calls have replaced the Pbuffer constructor:

Code:

    public static Pbuffer createPbufferUsingDisplayContext(int width, int height, RenderTexture renderTexture) throws LWJGLException;
    public static Pbuffer createPbufferUsingUniqueContext(int width, int height, PixelFormat pixel_format, RenderTexture renderTexture) throws LWJGLException;


I'd appreciate it if you guys could test it, even if you don't use Pbuffers, since changes had to be made to the Display context creation (to increase the chance of creating a context capable of being used with Pbuffers):

http://odense.kollegienet.dk/~naur/lwjgl-20040725.zip

Compare it to the old build that didn't include the new feature:

http://odense.kollegienet.dk/~naur/lwjgl-20040723.zip

Thanks,

 - elias

*

Offline spasi

  • *****
  • 2261
    • WebHotelier
« Reply #1 on: July 26, 2004, 07:58:53 »
Marathon hangs (just stops there, no exceptions, no crashes) at createPbufferUsingUniqueContext when using render-to-texture. Without render-to-texture it works, but awfully slow (tops at ~30fps). Both create a new PixelFormat, with just 24bit depth.

It works fine with createPbufferUsingDisplayContext. A few questions though: What exactly is happening here? The OpenGL state is shared between the display and the pbuffer? How is this possible? It's great of course, but can this be done in other OSes too? Cause it kind of messes my state changes right now...:?

And one last thing: Do you know what happens behind the scenes, memory-wise? I need a good estimation of our memory requirements and I'm not sure what to calculate for the pbuffer. A double-buffered pbuffer (as in LWJGL) should be counted as two times the (width x height x bits)? Is it possible to create a non-double-buffered pbuffer?

*

Offline elias

« Reply #2 on: July 26, 2004, 08:46:10 »
Quote from: "spasi"
Marathon hangs (just stops there, no exceptions, no crashes) at createPbufferUsingUniqueContext when using render-to-texture. Without render-to-texture it works, but awfully slow (tops at ~30fps). Both create a new PixelFormat, with just 24bit depth.


The createPbufferUsingUniqueContext() path should be the same as in 0.9. How does 0.9 work with Marathon? Mind you, I never tested RenderTexture, because TT doesn't use it and there's no test for it in LWJGL. Could you create a simple test for it (based on PbufferTest, maybe)?

Quote

It works fine with createPbufferUsingDisplayContext. A few questions though: What exactly is happening here? The OpenGL state is shared between the display and the pbuffer? How is this possible? It's great of course, but can this be done in other OSes too? Cause it kind of messes my state changes right now...:?


It's very simple: An OpenGL context can potentially be used for multiple drawables, so when the Display is created, I try to make sure the context supports both windows and pbuffers if at all possible. The Pbuffer class simply creates a Pbuffer, but no context, and Pbuffer.makeCurrent then binds the Display context to the pbuffer. That's why the state is shared, and it works on both Linux and Windows. The normal path (Unique context) creates a new context for the pbuffer, but enables texture (and other object type) sharing between the pbuffer context and the display context. This path is slower because it's a separate context and because GLContext needs to reload the OpenGL functions at every context switch.

Quote

And one last thing: Do you know what happens behind the scenes, memory-wise? I need a good estimation of our memory requirements and I'm not sure what to calculate for the pbuffer. A double-buffered pbuffer (as in LWJGL) should be counted as two times the (width x height x bits)? Is it possible to create a non-double-buffered pbuffer?


No idea, but the DisplayContext path is most definitely cheaper, memory-wise. The biggest memory hogs are the buffers (color, depth, stencil, etc.), though. I didn't code a way to avoid the double-buffered Pbuffer, although it is probably possible (but it involves more code to find a pixel format that exactly matches the display pixel format, apart from the double buffer property).

 - elias

*

Offline spasi

« Reply #3 on: July 26, 2004, 09:29:40 »
Quote from: "elias"
The createPbufferUsingUniqueContext() path should be the same as in 0.9. How does 0.9 work with Marathon? Mind you, I never tested RenderTexture, because TT doesn't use it and there's no test for it in LWJGL. Could you create a simple test for it (based on PbufferTest, maybe)?


It's working fine with 0.9, with or without render-to-texture. I'll make a little test when I can (so busy right now...).

Quote from: "elias"

It's very simple: An OpenGL context can potentially be used for multiple drawables, so when the Display is created, I try to make sure the context supports both windows and pbuffers if at all possible. The Pbuffer class simply creates a Pbuffer, but no context, and Pbuffer.makeCurrent then binds the Display context to the pbuffer. That's why the state is shared, and it works on both Linux and Windows. The normal path (Unique context) creates a new context for the pbuffer, but enables texture (and other object type) sharing between the pbuffer context and the display context. This path is slower because it's a separate context and because GLContext needs to reload the OpenGL functions at every context switch.


But in 0.9, without render-to-texture, it was slower by only 10-20%, and that was due to the additional texture copy. The context switch overhead was present in both paths (with and without rtt) and I could easily get well above 100 fps. Something is definitely wrong with the unique context creation. To sum up, the problems I have are: when creating a new context, I get bad performance without rtt and it hangs with rtt. That may give you a clue about what is wrong.

Quote from: "elias"
No idea, but the DisplayContext path is most definitely cheaper, memory-wise. The biggest memory hogs are the buffers (color, depth, stencil, etc.), though. I didn't code a way to avoid the double-buffered Pbuffer, although it is probably possible (but it involves more code to find a pixel format that exactly matches the display pixel format, apart from the double buffer property).


I'd love a way to disable double-buffered pbuffers. Our shadow maps are really big (at least 1024x1024) and that's a lot of memory to waste. I hope you're not as busy as I am...:wink:

*

Offline elias

« Reply #4 on: July 26, 2004, 09:40:31 »
The reason Unique context Pbuffers are slower in this version is that the OpenGL function pointers are reloaded at each switch to a different context. This is done because the functions might have different addresses for different contexts, according to the spec. That was sort of the reason I implemented the "fast" path.

 - elias

*

Offline elias

« Reply #5 on: July 26, 2004, 09:41:45 »
Let's make a deal: You commit a test case using render-to-texture (that hangs) and I'll look into the hang bug and the double buffer issue. OK? Can't be that hard given we already have a Pbuffer test.

 - elias

*

Offline spasi

« Reply #6 on: July 26, 2004, 10:04:13 »
Quote from: "elias"
The reason Unique context Pbuffers are slower in this version is that the OpenGL function pointers are reloaded at each switch to a different context. This is done because the functions might have different addresses for different contexts, according to the spec. That was sort of the reason I implemented the "fast" path.


But they are cached, aren't they? From useContext javadoc in GLContext.java:

If the context has not been encountered before it will be fully initialized from scratch. Otherwise a cached set of caps and function pointers will be used.


Why would that be slowing down things?

Quote from: "elias"
Let's make a deal: You commit a test case using render-to-texture (that hangs) and I'll look into the hang bug and the double buffer issue. OK? Can't be that hard given we already have a Pbuffer test.


It's a deal :wink:.

*

Offline elias

« Reply #7 on: July 26, 2004, 10:23:05 »
No, they're not cached. And even if they were "cached", I'm sure the expensive part is actually re-assigning the pointers, not fetching them from the library.

 - elias

*

Offline spasi

« Reply #8 on: July 26, 2004, 10:59:49 »
Wow. This breaks functionality that worked fine in 0.9. I mean, pbuffers in different contexts are useless in games as it is now. And many will need them, as creating a pbuffer with the same pixel format as the display context is probably a serious waste of memory. Isn't there a better way to do it? If I got it right, the only problem is that the functions may have different addresses in different contexts, right? Maybe detect that they are not different, so no reassignment will be necessary?

*

Offline elias

« Reply #9 on: July 26, 2004, 11:21:20 »
Well, if you commit that test and I fix the hang and the double buffer issue, everybody will want the DisplayContext version of Pbuffers, because they should be even faster than in 0.9, given that they don't have an extra OpenGL context.

 - elias

*

Offline spasi

« Reply #10 on: July 26, 2004, 12:56:30 »
But this is not a speed issue (well, it is when your memory is full). Of course, the context change affects speed, but it's not important if you're doing anything non-trivial.

Assume a standard display pixel format: 24bits RGB, 8bits ALPHA and 24bits DEPTH. Say you want to render a 1024x1024 shadow map and a 512x512 reflection map. You'd normally want to use render-to-texture, so:

Pbuffer A: 512x512x3 = 786432 bytes
Pbuffer B: 1024x1024x3 = 3145728 bytes

But, if you share the display context, you'll have:

Pbuffer A: 512x512x7 = 1835008 bytes
Pbuffer B: 1024x1024x7 = 7340032 bytes

That's 5 wasted megs! If you add to that the extra memory required for textures (since rtt support sucks, especially for depth maps) and the probably higher resolution the user will want, it's getting out of hand.
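
The arithmetic above can be reproduced with a short Java sketch. It is only a back-of-the-envelope estimate under the post's own assumptions (3 bytes per pixel for a dedicated pixel format, 7 bytes per pixel for the 24-bit RGB + 8-bit alpha + 24-bit depth display format). It ignores double buffering and any driver padding, and the class and method names are made up for illustration.

```java
// Rough estimate only: real drivers may pad, align, or allocate extra buffers.
final class PbufferMemoryEstimate {

    /** Bytes needed for a width x height surface at the given bytes per pixel. */
    static long surfaceBytes(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    public static void main(String[] args) {
        int dedicatedFormat = 3; // bytes per pixel used in the post for a separate pixel format
        int displayFormat = 7;   // 24-bit RGB + 8-bit alpha + 24-bit depth = 56 bits

        // Pbuffer A (512x512) plus Pbuffer B (1024x1024) in each scenario.
        long unique = surfaceBytes(512, 512, dedicatedFormat)
                    + surfaceBytes(1024, 1024, dedicatedFormat);
        long shared = surfaceBytes(512, 512, displayFormat)
                    + surfaceBytes(1024, 1024, displayFormat);

        System.out.println("separate pixel formats: " + unique + " bytes");
        System.out.println("display pixel format:   " + shared + " bytes");
        System.out.println("difference:             " + (shared - unique) + " bytes");
    }
}
```

With these inputs the difference comes to 5242880 bytes, i.e. exactly 5 MiB, which matches the "5 wasted megs" figure.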

*

Offline elias

« Reply #11 on: July 26, 2004, 13:16:16 »
Why are you using 2 Pbuffers instead of just using Pbuffer B for both? I'll consider simply re-using the pointers (you can't check if they're the same unless you run through them all).

Post that test! (Just a simple one that demonstrates the hang)

 - elias

*

Offline spasi

« Reply #12 on: July 26, 2004, 13:25:56 »
Now, that's a speed issue. I wouldn't want to render a 1024x1024 reflection map, it would slow everything down, for no apparent quality gains. But I want/need a hi-res shadow map. Anyway, that was just an example. In an Unreal3 type of engine(!), these matters would be of really big importance.

Sorry, but the test will take a while (I'm at work now). I'll try to have it ready by tomorrow morning.

*

Offline elias

« Reply #13 on: July 26, 2004, 14:30:48 »
You don't need to use all the 1024*1024 available pixels. I'm saying that if you use one 1024*1024 pbuffer, why not reuse the lower left corner of it to save the additional 512*512 pbuffer?

 - elias
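
A rough sketch of the "reuse the lower left corner" idea, in plain Java with no LWJGL dependency: it only computes the viewport rectangle and the texture-coordinate range that a 512x512 region of a 1024x1024 pbuffer would occupy. The class and its names are hypothetical. In a real renderer the rectangle would presumably be passed to glViewport (and glScissor when clearing), and texture filtering near the region's edge would need some care.

```java
// Hypothetical helper: describes a square sub-region in the lower left corner
// of a square pbuffer. It performs no OpenGL calls itself.
final class PbufferRegion {
    final int x, y, width, height; // viewport rectangle in pixels, origin at lower left
    final float sMax, tMax;        // upper texture-coordinate bounds of the region

    private PbufferRegion(int x, int y, int width, int height, float sMax, float tMax) {
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
        this.sMax = sMax;
        this.tMax = tMax;
    }

    /** Region of regionSize x regionSize pixels anchored at the lower left corner. */
    static PbufferRegion lowerLeft(int bufferSize, int regionSize) {
        if (bufferSize <= 0 || regionSize <= 0 || regionSize > bufferSize) {
            throw new IllegalArgumentException("region must fit inside the buffer");
        }
        float max = (float) regionSize / bufferSize;
        return new PbufferRegion(0, 0, regionSize, regionSize, max, max);
    }

    public static void main(String[] args) {
        PbufferRegion reflection = lowerLeft(1024, 512);
        System.out.println("viewport: " + reflection.x + ", " + reflection.y + ", "
                + reflection.width + ", " + reflection.height);
        System.out.println("texture coordinates: 0.0 to " + reflection.sMax
                + " (s), 0.0 to " + reflection.tMax + " (t)");
    }
}
```

The trade-off is the one discussed in the thread: the second pbuffer's memory is saved, but both maps then share one pixel format and one surface.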

*

Offline elias

« Reply #14 on: July 26, 2004, 14:40:09 »
Anyway, I just made GLContext load the stubs only when they're unloaded. Here's the new build:

http://odense.kollegienet.dk/~naur/lwjgl-20040726.zip

 - elias
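
To illustrate what "load the stubs only when they're unloaded" could look like, here is a minimal, self-contained Java sketch. It is not the actual GLContext code: the class, its methods, and the counter are all hypothetical, and it relies on the assumption discussed earlier in the thread that the function addresses are in practice the same across contexts, which the spec does not guarantee.

```java
// Hypothetical model of lazy stub loading; not LWJGL's implementation.
final class StubLoaderSketch {
    private boolean loaded = false;
    private int loadCount = 0; // how many times the expensive load actually ran

    /** Called on every context switch; reloads only if nothing is loaded. */
    void onContextSwitch() {
        if (!loaded) {
            loadStubs();
        }
    }

    /** Marks the stubs as unloaded, e.g. when the last context is destroyed. */
    void unload() {
        loaded = false;
    }

    // Stand-in for the expensive step of assigning every function pointer.
    private void loadStubs() {
        loadCount++;
        loaded = true;
    }

    int loadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        StubLoaderSketch loader = new StubLoaderSketch();
        // Simulate switching between the display and a pbuffer for many frames.
        for (int frame = 0; frame < 1000; frame++) {
            loader.onContextSwitch(); // switch to pbuffer context
            loader.onContextSwitch(); // switch back to display context
        }
        System.out.println("expensive loads performed: " + loader.loadCount());
    }
}
```

In this model the expensive load runs once instead of once per switch, which is the saving the new build is aiming for in the Unique context path.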