Shaders, when to compile?

Started by lagz85, April 28, 2010, 13:36:52


lagz85

Hi,

I am new to shaders and have been following this tutorial to get started:
http://lwjgl.org/wiki/doku.php/lwjgl/tutorials/opengl/basicshaders

As I understand it you create a "shader object" corresponding to a fragment shader and a separate one corresponding to a vertex shader. You compile these objects separately. Finally you create a "program", associate the shader objects with it and then link it (I've put a rough sketch of what I'm doing below my questions). I have a few questions about this process:

1. I am creating a project where there will be many shaders and it is not known when each one will be required. Obviously I can't compile them "on the fly" because this affects my rendering loop (I can't afford any pauses in the visual output while a shader compiles, and I believe compilation needs to be done on the rendering thread?). Is it OK just to load and compile potentially hundreds of shaders when my program starts? Can OpenGL deal with paging them in and out of the graphics card as they are needed, or is this just not a problem with the amount of memory on modern GPUs?

2. Is it valid to compile a shader and then associate it to multiple programs? Say if I want to combine different combinations of fragment and vertex shaders together, can I compile each one only once and then use them in multiple programs?

3. Would you recommend creating all of the necessary programs and linking them when the program starts, or can this always be done quickly at runtime (i.e. without causing a significant framerate drop)? What resources are actually consumed by a "program" object?
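
For reference, this is roughly what I'm doing at the moment, based on the tutorial (a minimal sketch against LWJGL's GL11/GL20 bindings; the RuntimeExceptions are just placeholder error handling):

int compileShader(int type, String source) {
    int shader = GL20.glCreateShader(type);      // type is GL20.GL_VERTEX_SHADER or GL20.GL_FRAGMENT_SHADER
    GL20.glShaderSource(shader, source);
    GL20.glCompileShader(shader);
    if (GL20.glGetShaderi(shader, GL20.GL_COMPILE_STATUS) == GL11.GL_FALSE)
        throw new RuntimeException(GL20.glGetShaderInfoLog(shader, 4096));
    return shader;
}

int linkProgram(int vertexShader, int fragmentShader) {
    int program = GL20.glCreateProgram();
    GL20.glAttachShader(program, vertexShader);
    GL20.glAttachShader(program, fragmentShader);
    GL20.glLinkProgram(program);
    if (GL20.glGetProgrami(program, GL20.GL_LINK_STATUS) == GL11.GL_FALSE)
        throw new RuntimeException(GL20.glGetProgramInfoLog(program, 4096));
    return program;
}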

Thanks.

Rene

1 - On most systems, compiling on the fly won't be a big issue. As you know, shaders are very small compared to 'real programs', so compilation is very fast. If you do notice pauses, you can usually have a lot of shaders compiled at the same time, so compile them at startup instead. The exact maximum depends on a lot of factors, most importantly the OpenGL implementation (graphics driver). How many is 'many' shaders, and why do you need many shaders?

2 - Yes, it is valid to attach a single shader to multiple programs. Remember that if you re-compile a shader, you also need to re-link all programs it is attached to.
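
For example (a rough sketch; sharedVertexShader, fragmentShaderA and fragmentShaderB are assumed to be already compiled shader objects):

int programA = GL20.glCreateProgram();
GL20.glAttachShader(programA, sharedVertexShader);   // same compiled vertex shader...
GL20.glAttachShader(programA, fragmentShaderA);
GL20.glLinkProgram(programA);

int programB = GL20.glCreateProgram();
GL20.glAttachShader(programB, sharedVertexShader);   // ...attached to a second program
GL20.glAttachShader(programB, fragmentShaderB);
GL20.glLinkProgram(programB);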

3 - Same as 1. Usually the linking process is quite fast, but you can do it at startup if you wish. I'm not sure what resources are used by a linked program, but I think a shader's source is stored in system memory by the driver, the compiled shader code is stored in GPU memory, and every shader program probably uses a piece of GPU memory for the uniform variables used by the attached shaders.

Also, if you're going to use GLSL a lot, the 'OpenGL Shading Language' book is a must have. A decent book is always better than online tutorials: http://www.amazon.com/OpenGL-Shading-Language-Randi-Rost/dp/0321637631/ref=sr_1_1?ie=UTF8&s=books&qid=1272467746&sr=8-1

spasi

Quote from: lagz85 on April 28, 2010, 13:36:52
1. I am creating a project where there will be many shaders and it is not known when each one will be required. Obviously I can't compile them "on the fly" because this affects my rendering loop (I can't afford any pauses in the visual output while a shader compiles, and I believe compilation needs to be done on the rendering thread?). Is it OK just to load and compile potentially hundreds of shaders when my program starts? Can OpenGL deal with paging them in and out of the graphics card as they are needed, or is this just not a problem with the amount of memory on modern GPUs?

Paging and memory shouldn't be a problem, but compiling/linking does take a lot of time with GLSL. Unlike Direct3D, which allows both offline compilation and linking, OpenGL drivers are forced to compile, optimize and link during runtime.

One option you might want to try is compiling/linking shaders in a separate thread/context. You can check LWJGL's tests for an example that does this for background texture loading (just replace texture loading with shader loading). You'll need a nightly release; the test is test.opengl.multithread.BackgroundLoadTest. Its usefulness depends on what your app is doing of course, but I think there are many opportunities to load stuff in the background in most 3D apps.

Quote from: lagz85 on April 28, 2010, 13:36:52
2. Is it valid to compile a shader and then associate it to multiple programs? Say if I want to combine different combinations of fragment and vertex shaders together, can I compile each one only once and then use them in multiple programs?

Yes, you can. But keep in mind that linking programs isn't free. Modern GL drivers try to do most of the optimizations during linking, as that's when they have all the info about what exactly is going to run on the GPU. Linking shaders A+B might allow optimizations that aren't possible when linking shaders A+C and vice-versa.

Quote from: lagz85 on April 28, 2010, 13:36:52
3. Would you recommend creating all of the necessary programs and linking them when the program starts, or can this always be done quickly at runtime (i.e. without causing a significant framerate drop)? What resources are actually consumed by a "program" object?

See my reply to 1 above. I'm pretty sure that if we're talking about hundreds of shader programs, start-up performance will take a significant hit if you try to link everything then. Lazily linking at runtime might be fine, but again, that depends on your app and the shader complexity. The earlier you can predict when a program will be needed, the easier it will be to avoid performance issues related to compilation/linking (e.g. using a separate thread or spreading compilation over many frames).

lagz85

Thanks for the advice from both of you. Compiling on the fly definitely introduces pauses, even for trivial fragment shaders, so it is a real issue for me.

I'm building a program where a user can essentially choose to apply "effects" to a visual output. Since you cannot predict what the user is about to do, there is no easy way to predict when a shader will be needed (unlike, say, a game where you are moving towards an area where an effect is used). The effects themselves are going to be plugins, so the number will grow over time as new plugins are added.

I am a bit concerned by Rene's comment about a maximum number of shaders. That sounds like a pretty bad design decision to me (although I guess there is a reason!). Is this definitely the case, and if so is there a way to query what this number is (if it is always something like 2^32 then it obviously isn't a problem in practice)? I couldn't find anything from a quick google search.

Using a background thread is probably a good option. I didn't realise you could do this reliably (I thought all OpenGL calls had to be made from a single thread, is this not the case?), so I will have a look at the code. I can allow a small delay between a user selecting an effect and subsequently being able to apply it (although the applying bit must be instant), so I could potentially use a background thread during this period of time.

Only other question: How can you "split compilation over many frames" as mentioned in the last post? As far as I can see compilation is just a single method call, with no easy way to split it up (obviously you can do the linking etc in a separate frame, but the compilation method call by itself takes long enough to cause a pause, so that doesn't help).

Rene

Quote from: spasi on April 28, 2010, 15:21:37
... but compiling/linking does take a lot of time with GLSL.

I really have to disagree with that. I just ran a simple test where I load, compile and link a relatively simple shader program (+- 50 lines of code) while the application is running, and it doesn't even show up on the framerate graph at a framerate of about 3000 fps. That's on my desktop. On the laptop it shows a little hiccup, but it has some crappy Intel card so that doesn't really count  ;)

princec

Quote from: lagz85 on April 28, 2010, 16:36:33
(I thought all OpenGL calls had to be made from a single thread, is this not the case?)
No, it's not the case :) Only one thread can own a context at a time. However, you can do your compilation on a different context using a different thread, and then share the contexts, so that the rendering context can use the fragment and vertex shaders. Well, that's the theory. Similar thing for big texture uploading - do that in a shared context on another thread if it's hurting performance (completely unlikely though as that's just DMA bandwidth).
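
In LWJGL terms it looks roughly like this (just a sketch - checked exceptions and cleanup are glossed over, compileShader/fragmentSource are placeholders for your own loading code, and the exact Drawable methods depend on the LWJGL version you're on):

// On the rendering thread: create a drawable whose context is shared with the Display's
final SharedDrawable shared = new SharedDrawable(Display.getDrawable());

// On a background thread: make the shared context current and do the slow work there
new Thread(new Runnable() {
    public void run() {
        try {
            shared.makeCurrent();
            int shader = compileShader(GL20.GL_FRAGMENT_SHADER, fragmentSource);
            GL11.glFlush(); // make sure the new object is visible to the rendering context
        } catch (LWJGLException e) {
            e.printStackTrace();
        }
    }
}).start();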

Cas :)

spasi

Quote from: lagz85 on April 28, 2010, 16:36:33
Only other question: How can you "split compilation over many frames" as mentioned in the last post? As far as I can see compilation is just a single method call, with no easy way to split it up (obviously you can do the linking etc in a separate frame, but the compilation method call by itself takes long enough to cause a pause, so that doesn't help).

I said spread, not split. For example, if you know you'll need 10 new shaders 10 frames from now, you could try to compile 1 shader every frame for 10 frames, instead of 10 shaders in 1 frame. If the GL driver uses a separate thread for compiling shaders, there's another thing you can try. Instead of requesting the GL_COMPILE_STATUS immediately after compiling the shader, you can try to retrieve it 1-2 frames later, just as is usually done with occlusion queries.
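
Roughly like this (a sketch - the PendingShader bookkeeping is made up for illustration, it's not anything from LWJGL):

// Each frame: compile at most one queued shader instead of all of them in the same frame
if (!pending.isEmpty()) {
    PendingShader p = pending.remove();
    p.id = GL20.glCreateShader(p.type);
    GL20.glShaderSource(p.id, p.source);
    GL20.glCompileShader(p.id);
    p.framesSinceCompile = 0;
    compiling.add(p);                        // don't ask for the result yet
}

// A frame or two later: query the status; with a threaded driver the work may already be done
for (Iterator<PendingShader> it = compiling.iterator(); it.hasNext(); ) {
    PendingShader p = it.next();
    if (++p.framesSinceCompile >= 2) {
        if (GL20.glGetShaderi(p.id, GL20.GL_COMPILE_STATUS) == GL11.GL_FALSE)
            System.err.println(GL20.glGetShaderInfoLog(p.id, 4096));
        it.remove();
    }
}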

Quote from: Rene on April 28, 2010, 16:58:54
I really have to disagree with that. I just ran a simple test where I load, compile and link a relatively simple shader program (+- 50 lines of code) while the application is running, and it doesn't even show up on the framerate graph at a framerate of about 3000 fps. That's on my desktop. On the laptop it shows a little hiccup, but it has some crappy Intel card so that doesn't really count  ;)

Do you bind and use the program for rendering after linking it?

Anyway, I've personally never had any issues with shader compilation (and haven't tried any of my suggestions above), but I've heard lots of complaints about it, especially from people who are forced to generate hundreds of shader combinations (for performance reasons). Support for offline compilation and binary shader uploads is one of the top requests from the OpenGL community.

Rene

Quote from: spasi on April 29, 2010, 07:00:29
Do you bind and use the program for rendering after linking it?

Anyway, I've personally never had any issues with shader compilation (and haven't tried any of my suggestions above), but I've heard lots of complaints about it, especially from people who are forced to generate hundreds of shader combinations (for performance reasons). Support for offline compilation and binary shader uploads is one of the top requests from the OpenGL community.

Yeah, basically what I do is
if(framecount == 10000){
    someObject.addShaderEffect(new SomeEffect());
}

Shader loading is logged, so I'm sure they aren't loaded at startup.

What I should add is that this 'framerate graph' isn't very accurate. It's only 50 pixels high and mainly meant for quick benchmarks.

lagz85

Quote from: spasi on April 28, 2010, 15:21:37
One option you might want to try is compiling/linking shaders in a separate thread/context. You can check LWJGL's tests for an example that does this for background thread texture loading (just replace texture loading with shader loading). You'll need a nightly release, the test is test.opengl.multithread.BackgroundLoadTest. Its usefulness depends on what your app is doing of course, but I think there are many chances to load stuff on the background in most 3D apps.

Just got round to taking a look at this and it looks like a really neat way of loading resources :). I think I'm going to move all my texture and shader loading over to this kind of approach. It should make my life much easier.

My only other question is: what is the 'cost' of creating and using a SharedDrawable? The very easiest approach to my problem would just be to create a new one whenever I want to create a resource. Is this a stupid idea compared to just creating one SharedDrawable and using it to load all of my background resources?

lagz85

[Update - solution below] Hmm. The BackgroundLoadTest throws an exception in both the PB and SD modes on my machine. This is on Windows 7 with nVidia drivers.

Exception in thread "Thread-1" java.lang.RuntimeException: org.lwjgl.LWJGLException: Could not share contexts
   at BackgroundLoader$1.run(BackgroundLoader.java:90)
   at java.lang.Thread.run(Unknown Source)
Caused by: org.lwjgl.LWJGLException: Could not share contexts
   at org.lwjgl.opengl.WindowsContextImplementation.nCreate(Native Method)
   at org.lwjgl.opengl.WindowsContextImplementation.create(WindowsContextImplementation.java:50)
   at org.lwjgl.opengl.Context.<init>(Context.java:127)
   at org.lwjgl.opengl.Pbuffer.<init>(Pbuffer.java:225)
   at org.lwjgl.opengl.Pbuffer.<init>(Pbuffer.java:190)
   at org.lwjgl.opengl.Pbuffer.<init>(Pbuffer.java:166)
   at BackgroundLoadTest$1.getDrawable(BackgroundLoadTest.java:192)
   at BackgroundLoader$1.run(BackgroundLoader.java:87)
   ... 1 more



It looks like you are not allowed to call Display.getDrawable() from the background thread, which you effectively do in your implementation of the abstract getDrawable method of BackgroundLoader. The solution is to pass the drawable in as an argument to a concrete BackgroundLoader class:

backgroundLoader = new BackgroundLoader(new SharedDrawable(Display.getDrawable()));

Another problem is that it is vital to call GL11.glFlush() after you load each texture in the background thread. My machine actually renders the sphere with the texture before last, because the GL commands have not been flushed :-)!
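
In other words, the background thread's loop ends up looking something like this (sketch only; loadTexture and notifyRenderThread are placeholders for whatever the loader actually does):

// background thread, after making the shared context current
int textureId = loadTexture(textureFile);   // glGenTextures + glTexImage2D, etc.
GL11.glFlush();                             // without this, the rendering context may still see the previous texture
notifyRenderThread(textureId);              // hand the finished texture over to the renderer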