Dynamic Rendering Pipeline

Started by manu3d, May 04, 2015, 23:14:16

manu3d

The project I'm contributing to, Terasology, is currently blessed with a number of niceties when it comes to its rendering engine, e.g. ambient occlusion, DoF, bloom filter, HDR-based tone mapping, motion blur, film grain and so on. Its rendering pipeline, however, is largely static. Aspects of it, e.g. filters, can be turned on and off via if-statements, but new features require going down to the source code and modifying it.

In this context I've been thinking about setting up individual passes as "Rendering Tasks", which can be dependent on other Rendering Tasks and together form a directed acyclic graph wrapped by a Rendering Pipeline object. Adding a filter would then be just a matter of writing a new Rendering Task and adding it to the pipeline's DAG. Crucially, tasks could be added to and removed from a pipeline at runtime, e.g. if the bloom filter gets disabled, the whole task is removed from the pipeline. Also, multiple pipelines could coexist, e.g. one for each in-game security camera, rendering to textures and featuring screen-like noise on top of a 3D scene, and one for the actual rendering to the display.
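A minimal sketch of the idea above, in plain Java with no OpenGL involved. The class name `RenderingPipeline`, its methods and the task names are all hypothetical, not Terasology code; each task is just a `Runnable` stand-in, and removing a task hands its dependencies over to the tasks that depended on it so the remaining order stays valid.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical pipeline: named tasks plus "runs after" dependencies forming a DAG. */
class RenderingPipeline {
    // task name -> the work it performs (a stand-in for real rendering code)
    private final Map<String, Runnable> tasks = new LinkedHashMap<>();
    // task name -> names of the tasks that must run before it
    private final Map<String, Set<String>> dependencies = new LinkedHashMap<>();

    void addTask(String name, Runnable work, String... runsAfter) {
        tasks.put(name, work);
        Set<String> deps = new LinkedHashSet<>();
        for (String dep : runsAfter) deps.add(dep);
        dependencies.put(name, deps);
    }

    /** Removes a task at runtime; its dependents inherit its dependencies. */
    void removeTask(String name) {
        tasks.remove(name);
        Set<String> inherited = dependencies.remove(name);
        if (inherited == null) return; // task was never registered
        for (Set<String> deps : dependencies.values()) {
            if (deps.remove(name)) deps.addAll(inherited);
        }
    }

    /** Depth-first topological sort; throws if a dependency is missing or cyclic. */
    List<String> executionOrder() {
        List<String> order = new ArrayList<>();
        Set<String> done = new LinkedHashSet<>();
        Set<String> inProgress = new LinkedHashSet<>();
        for (String name : tasks.keySet()) visit(name, done, inProgress, order);
        return order;
    }

    private void visit(String name, Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(name)) return;
        if (!tasks.containsKey(name)) throw new IllegalStateException("Unknown task: " + name);
        if (!inProgress.add(name)) throw new IllegalStateException("Cycle at task: " + name);
        for (String dep : dependencies.get(name)) visit(dep, done, inProgress, order);
        inProgress.remove(name);
        done.add(name);
        order.add(name);
    }

    /** Runs every task once, dependencies first. */
    void render() {
        for (String name : executionOrder()) tasks.get(name).run();
    }
}
```

With tasks "scene", "bloom" (after "scene") and "toneMapping" (after "bloom"), disabling bloom would be a single `removeTask("bloom")` call, after which "toneMapping" runs directly after "scene". A security-camera pipeline would simply be a second `RenderingPipeline` instance.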

Now, in this big picture I have at least one concern. Global state changes made within a task might be beneficial, inconsequential or unhelpful for the following task. For example, a rendering task might set something one way and the next task might need it set differently. Knowing the inner workings of each task would help, but that is not realistic in a context where the shape of the pipeline can change and new tasks can be added through, for example, external plugins.

To solve this problem, I can imagine a number of strategies:

1. each rendering task is required to set the state back to how it found it (easy to implement, but perhaps wasteful?)
2. each rendering task is responsible for verifying the state OpenGL is in and changing it as necessary (checking the whole state sounds like a lot of checks!)
3. OpenGL state-changing operations between renders are analyzed before the rendering pipeline is used for the first time, and redundant state changes are eliminated from the pipeline.

Thoughts?

 

Cornix

I am not the greatest OpenGL developer there is, but there is one thing I learned: Don't check the OpenGL state. Whenever you try to get data out of OpenGL it is rather slow, because you are not supposed to. In any case, first checking and then setting will be slower than just setting regardless of the previous state. I would recommend that either:
1) Each task will set all the state it needs at the beginning; redundant state changes are not a big problem
2) Each task will use its own custom shader and buffers and so they have very little effect on each other

Kirov Anatoly

I'm not an expert either, but I think you should neither check OpenGL state NOR set it redundantly.
Instead you should wrap all relevant OpenGL state (which shouldn't be too much) in your own code.

This way you can just do nothing when you detect a redundant change, only calling the OpenGL function when the state actually changes.
The overhead will be a single function call, which should be manageable.

Instead of (stupid example but you get the point) calling glEnable(GL_TEXTURE_2D) 5 times you'd call myGLEnable(GL_TEXTURE_2D) 5 times, which would in turn call glEnable(GL_TEXTURE_2D) only once, because you know the other 4 times are superfluous. Of course you'd need to track state, and also have a substitute for glDisable, but you get the point. Extending this reasoning, you could check for enables/disables, texture binds, shader binds, buffer binds, anything you want!
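A rough sketch of such a wrapper follows. To keep it runnable without a GL context, the driver calls sit behind a hypothetical `GlCalls` interface; in an LWJGL program an implementation of it would presumably just forward to glEnable and glDisable. `CachedGlState` is an illustrative name as well.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for the real driver calls (assumption: a real version forwards to glEnable/glDisable). */
interface GlCalls {
    void enable(int capability);
    void disable(int capability);
}

/** Hypothetical wrapper that remembers the last value set for each capability. */
class CachedGlState {
    private final GlCalls gl;
    // capability -> last known flag; a missing entry means "unknown", so the first call always goes through
    private final Map<Integer, Boolean> enabled = new HashMap<>();

    CachedGlState(GlCalls gl) {
        this.gl = gl;
    }

    void enable(int capability) {
        set(capability, true);
    }

    void disable(int capability) {
        set(capability, false);
    }

    private void set(int capability, boolean value) {
        Boolean known = enabled.get(capability);
        if (known != null && known == value) {
            return; // redundant change: skip the driver call entirely
        }
        if (value) {
            gl.enable(capability);
        } else {
            gl.disable(capability);
        }
        enabled.put(capability, value);
    }
}
```

Calling `enable(0x0DE1)` (the value of GL_TEXTURE_2D) five times in a row reaches `GlCalls` only once. The cache is only trustworthy if every state change goes through the wrapper; code that calls OpenGL directly, a plugin for instance, would leave it stale.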

In general, wrapping OpenGL functionality for shaders and buffers into your own classes is a good idea, since OpenGL is not built in the OO paradigm.

The other approaches you mentioned are technically more performant, but also way more difficult. So I'd start using this, and see how it works out for you. It should be fine.

More info on redundancy here:
https://www.opengl.org/discussion_boards/showthread.php/148964-How-Expensive-are-redundant-State-Changes


manu3d

Quote from: Cornix on May 05, 2015, 06:02:08
2) Each task will use its own custom shader and buffers and so they have very little effect on each other

This made me think. A number of rendering tasks are purely shader based, but a number of other tasks are straight opengl renderings. I wonder if I could do those completely in shaders, bypassing the fixed rendering pipeline and only have to do bindings, no other state changes involved. I'll investigate.

manu3d

Quote from: Kirov Anatoly on May 05, 2015, 15:45:40
I'm not an expert either, but I think you should neither check OpenGL state NOR set it redundantly.
Instead you should wrap all relevant OpenGL state (which shouldn't be too much) in your own code.
(...)
More info on redundancy here:
https://www.opengl.org/discussion_boards/showthread.php/148964-How-Expensive-are-redundant-State-Changes

Yes, when I was writing point 3 in my first post I was thinking exactly about wrapping at least some state-changing functions so that redundant calls are eliminated. In this context I thought of compiling a list of state changes, eliminating any redundancy and only then using the list to run through the pipeline. I imagine that conceptually one could even write something that reorders the state changes and rendering tasks so that potentially even more state changes are eliminated. I feel my brain starting to twist at the thought, but it should be possible if I wanted to do something really fancy.
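The "compile a list, then eliminate redundancy" step could look roughly like the sketch below. It assumes each task can declare the state it needs as simple key/value pairs; the class name `StateChangePlanner` and the string keys are made up for illustration and stand in for real OpenGL state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical planner: turns per-task state requirements into minimal per-task state changes. */
class StateChangePlanner {
    /**
     * Walks the tasks in pipeline order and, for each one, keeps only the
     * settings that differ from what the earlier tasks left behind.
     */
    static List<Map<String, String>> plan(List<Map<String, String>> requiredPerTask) {
        // Tracked state; empty at the start, meaning nothing is known yet.
        Map<String, String> current = new HashMap<>();
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> required : requiredPerTask) {
            Map<String, String> changes = new LinkedHashMap<>();
            for (Map.Entry<String, String> entry : required.entrySet()) {
                String key = entry.getKey();
                String value = entry.getValue();
                if (!value.equals(current.get(key))) {
                    changes.put(key, value); // a real change: keep it
                    current.put(key, value);
                }
            }
            result.add(changes);
        }
        return result;
    }
}
```

For three tasks needing {blend=off, shader=scene}, {blend=off, shader=bloom} and {blend=on, shader=bloom}, the plan keeps both settings for the first task, only the shader change for the second and only the blend change for the third. Since tasks can come and go at runtime, the plan would have to be recomputed whenever the pipeline changes.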

Regarding the link you provided, thank you for it, it was informative. Unfortunately the last post in the thread was in May 2001. I therefore wonder how much has changed since then in the implementations from the major manufacturers. Perhaps state changes are not as bad as they used to be?

Kirov Anatoly

Haha, didn't even notice the 2001. But generally this kind of info stays true, it's just that the performance impact reduces a lot over the years. From my own experience, I can say that switching textures, shaders and FBOs is very slow (my video card is 5 years old though).
When rescheduling to reduce redundant changes, you really should focus only on textures and shaders, since they have the biggest impact (perhaps FBOs have a big impact too, I'm not sure, but you can only fix this if the resolutions of multiple tasks are identical). This means you don't have to look at ALL state when you are sorting the tasks, just the most performance-intensive state.
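As an illustration of sorting on just the expensive state, here is a small sketch that orders items by shader first and texture second, then counts the binds each ordering would need. `DrawItem` and `SwitchSorter` are hypothetical names, and the sketch assumes the items have no dependencies on each other; in a DAG, only tasks that are independent of one another could be reordered like this.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical unit of work that needs one shader and one texture bound (ids assumed non-negative). */
class DrawItem {
    final String name;
    final int shaderId;
    final int textureId;

    DrawItem(String name, int shaderId, int textureId) {
        this.name = name;
        this.shaderId = shaderId;
        this.textureId = textureId;
    }
}

class SwitchSorter {
    /** Returns a copy ordered by shader, then texture, so items sharing expensive state end up adjacent. */
    static List<DrawItem> sortByExpensiveState(List<DrawItem> items) {
        List<DrawItem> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparingInt((DrawItem d) -> d.shaderId)
                .thenComparingInt(d -> d.textureId));
        return sorted;
    }

    /** Counts how many shader or texture binds the given order would need. */
    static int countSwitches(List<DrawItem> items) {
        int switches = 0;
        int shader = -1;  // -1 = nothing bound yet
        int texture = -1;
        for (DrawItem item : items) {
            if (item.shaderId != shader) {
                switches++;
                shader = item.shaderId;
            }
            if (item.textureId != texture) {
                switches++;
                texture = item.textureId;
            }
        }
        return switches;
    }
}
```

Comparing `countSwitches` before and after sorting gives a cheap way to see whether reordering would save anything for a given set of tasks before investing in it.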

I must add that you might be prematurely optimising this, so I'd like to ask you to think about why you are doing this, exactly. Do you know it will be "too slow"? Do you know you can significantly improve the speed by optimising? If you just wanna do this out of curiosity, go ahead. But if you want to make progress developing, maybe you should just start creating the tasks and see when and where the problems arise.

manu3d

Good to know that textures, shaders and potentially FBOs are the heavy hitters in terms of state change cost. Reading around, I found that setting a number of blending state variables can also carry a significant performance hit, as a lot changes under the hood.

Regarding premature optimization, I don't have the current performance data, so it might look like premature optimization. However, these are also big architectural issues. Terasology is open source but has also been designed with mods in mind. Mods can already alter and add a lot of things, but they wouldn't be able to touch the renderer and its pipeline. Light shafts and ambient occlusion, for example, are currently implemented in screen space. At some point somebody might want proper 3D implementations. Right now this wouldn't be possible without changing the rendering engine code.

Opening the rendering engine and its pipeline to mods, so that currently unspecified rendering tasks can be added to it, implies working out a generic way to handle state changes, so that a rendering task can be developed in relative isolation and can usually be considered a black box. Perhaps I should have been clearer in my original post that my goal is not optimization but "moddability" without sacrificing too much performance.