Help with optimizing shader uniform calls

Andrew_3ds · November 12, 2015, 02:33:40

I'm using single-pass multiple lighting in my scene. However, with only one light I get ~250 fps, whereas the same scene in jMonkeyEngine runs around 200-300 fps faster. I don't know if it's an issue with my shader or with the uniforms. I chose single pass over multipass so that I don't have to render a mesh multiple times to get the same effect. Also, is there a better way to link uniforms for a dynamic number of lights? For example, in the getUniformLocations method, I manually register the uniforms for 10 point lights (I would have to do this a second time once I add spot lights). That is not ideal, and I was wondering if there is a better way to do this. Here is my PhongShader class.

public class PhongShader extends Shader {
    private int[] matrix;

    public PhongShader() {
        super();
        addVertexSource(ResourceLoader.loadShader("PhongVertex.vs"));
        addFragmentSource(ResourceLoader.loadShader("PhongFragment.fs"));
        compile();
    }

    public int[] getMatrixLocations() {
        return matrix;
    }

    @Override
    public void getUniformLocations() {
        matrix = new int[] {
                getUniformLocation("matrix.model"),
                getUniformLocation("matrix.view"),
                getUniformLocation("matrix.projection")
        };

        addUniform("material.diffuse");
        addUniform("material.color");
        addUniform("material.color");
        addUniform("material.specularIntensity");
        addUniform("material.specularExponent");
        addUniform("ambientLight");
        addUniform("eyePos");
        addUniform("amtPLights");
        addUniform("dirLight.base.color");
        addUniform("dirLight.base.intensity");
        addUniform("dirLight.direction");
        for (int i = 0; i < 10; i++) {
            addUniform("pointLights["+i+"].base.color");
            addUniform("pointLights["+i+"].base.intensity");
            addUniform("pointLights["+i+"].atten.constant");
            addUniform("pointLights["+i+"].atten.linear");
            addUniform("pointLights["+i+"].atten.exponent");
            addUniform("pointLights["+i+"].position");
        }
    }

    @Override
    public void updateUniforms(Object... args) {
        loadMat4(matrix[0],(Matrix4f)args[0]);
        loadMat4(matrix[1],(Matrix4f)args[1]);
        loadMat4(matrix[2],(Matrix4f)args[2]);
        loadInt("material.diffuse",(Integer)args[3]);
        loadVec3("material.color",(Vector3f)args[4]);
        loadFloat("material.specularIntensity",(Float)args[5]);
        loadFloat("material.specularExponent",(Float)args[6]);
        loadVec3("eyePos", CoreStructure.getRenderingEngine().getCamera().getPos());
        loadVec3("ambientLight", LightingEngine.getAmbientLight());

        int amtLights = LightingEngine.getLights().size();
        int amtPLights = 0;
        for(int i = 0; i < amtLights; i++) {
            BaseLight baseLight = LightingEngine.getLights().get(i);
            if(baseLight instanceof DirLight) {
                loadDirLight((DirLight)baseLight,"dirLight");
            }
            if(baseLight instanceof PointLight) {
                loadPointLight((PointLight)baseLight,"pointLights",amtPLights++);
            }

        }

        loadInt("amtPLights",amtPLights);
    }

    private void loadBaseLight(BaseLight base, String uniform) {
        loadVec3(uniform+".base.color",new Vector3f(base.getColor()));
        loadFloat(uniform+".base.intensity",base.getIntensity());
    }

    private void loadDirLight(DirLight light, String uniform) {
        loadBaseLight(light,uniform);
        loadVec3(uniform+".direction",light.getDirection());
    }

    private void loadPointLight(PointLight light, String uniform, int index) {
        loadBaseLight(light,uniform+"["+index+"]");
        loadFloat(uniform+"["+index+"].atten.constant",light.getAtten().getConstant());
        loadFloat(uniform+"["+index+"].atten.linear",light.getAtten().getLinear());
        loadFloat(uniform+"["+index+"].atten.exponent",light.getAtten().getExponent());
        loadVec3(uniform+"["+index+"].position",light.getPosition());
    }
}

I'm not sure whether it's the class or the shader itself that is causing the bad performance. I heard that OpenGL can execute shader code even when it's not supposed to, e.g. it may still run both sides of an if statement and only use the result of the correct case. My lights are working, but performance is bad. If you have an idea why, could you change my code a bit and tell me what's wrong? Here is the fragment shader source, if you'd like to look at it:

#version 330 core

#define MAX_LIGHTS 10

in vec3 out_worldPos;
in vec2 out_texCoord;
in vec3 out_normal;

out vec4 FragColor;

uniform vec3 ambientLight;
uniform vec3 eyePos;

uniform struct Material {
    sampler2D diffuse;
    vec3 color;
    float specularIntensity;
    float specularExponent;
} material;

struct BaseLight {
    vec3 color;
    float intensity;
};

struct DirLight {
    BaseLight base;
    vec3 direction;
};
uniform DirLight dirLight;

struct Attenuation {
    float constant;
    float linear;
    float exponent;
};

struct PointLight {
    BaseLight base;
    Attenuation atten;
    vec3 position;
};
uniform PointLight pointLights[MAX_LIGHTS];
uniform int amtPLights;

vec3 calcLight(BaseLight base, vec3 direction, vec3 normal) {
    float diffuseFactor = dot(normal,-direction);
    
    vec3 diffuseColor = vec3(0);
    vec3 specularColor = vec3(0);
    
    if(diffuseFactor > 0) {
        diffuseColor = base.color * base.intensity * diffuseFactor;
        
        vec3 directionToEye = normalize(eyePos - out_worldPos);
        vec3 reflectDir = normalize(reflect(direction,normal));
        
        float specularFactor = dot(directionToEye, reflectDir);
        specularFactor = pow(specularFactor,material.specularExponent);
        
        if(specularFactor > 0) {
            specularColor = specularFactor * material.specularIntensity * base.color;
        }
    }
    
    return diffuseColor + specularColor;
}

vec3 calcDirLight(DirLight light, vec3 normal) {
    return calcLight(light.base,-light.direction,normal);
}

vec3 calcPointLight(PointLight light, vec3 normal) {
    vec3 lightDir = out_worldPos - light.position;
    float distanceToLight = length(lightDir);
    
    lightDir = normalize(lightDir);
    vec3 lightColor = calcLight(light.base, lightDir, normal);
    
    float attenuation = light.atten.constant + 
                        light.atten.linear * distanceToLight +
                        light.atten.exponent * distanceToLight * distanceToLight +
                        0.0001;
    
    return lightColor / attenuation;
}

void main(void) {
    vec3 surfaceColor = (texture2D(material.diffuse,out_texCoord) * vec4(material.color,1.0)).xyz;
    vec3 normal = normalize(out_normal);
    vec3 linearColor = ambientLight;
    
    linearColor += calcDirLight(dirLight,normal);
    
    for(int i = 0; i < amtPLights; i++) {
        linearColor += calcPointLight(pointLights[i],normal);
    }

    float gamma = 2.2;
    vec3 pixel = pow(surfaceColor * linearColor, vec3(1.0/gamma));
    FragColor = vec4(pixel,1.0);
}

Kai · November 12, 2015, 13:53:47

Generally, you should always first try to detect where your actual performance bottleneck is and what your performance is currently bound by:
- you could be Java/CPU bound by some calculations you are doing in Java
- you could be driver call / JNI bound
- you could be vertex transform bound
- you could be shader instruction overhead bound
- you could be fillrate/ROP bound

I suspect the second.

In your case, try using Uniform Buffer Objects (i.e. uniforms whose memory is backed by buffer objects) to reduce the number of OpenGL/JNI calls needed to update the uniforms.
With a UBO you would just update the light properties in a simple ByteBuffer and then upload that once per frame.

That should give you better performance, as I suspect JNI calls to be your bottleneck.
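
A minimal sketch of the "fill one ByteBuffer, upload once" idea, not a drop-in replacement for the class above. It assumes the shader is changed to declare something like `layout(std140) uniform PointLightBlock { PointLight pointLights[MAX_LIGHTS]; int amtPLights; };`, that std140 gives each PointLight a 48-byte stride with the member offsets noted in the comments, and that lights are available as plain values. `PointLightData` and `PointLightBlock` are hypothetical names, and the GL calls in the comment assume LWJGL's GL15/GL31 bindings.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.List;

// Hypothetical plain-data stand-in for the PointLight class from the question.
final class PointLightData {
    final float[] color;     // r, g, b
    final float intensity;
    final float constant, linear, exponent;
    final float[] position;  // x, y, z

    PointLightData(float[] color, float intensity,
                   float constant, float linear, float exponent, float[] position) {
        this.color = color;
        this.intensity = intensity;
        this.constant = constant;
        this.linear = linear;
        this.exponent = exponent;
        this.position = position;
    }
}

final class PointLightBlock {
    static final int MAX_LIGHTS = 10;
    static final int STRIDE = 48;                         // assumed std140 size of one PointLight
    static final int COUNT_OFFSET = MAX_LIGHTS * STRIDE;  // amtPLights follows the array
    static final int BLOCK_SIZE = COUNT_OFFSET + 16;      // keep the size a multiple of 16

    // Allocate once and reuse every frame; direct + native order so GL can read it.
    static ByteBuffer newBuffer() {
        return ByteBuffer.allocateDirect(BLOCK_SIZE).order(ByteOrder.nativeOrder());
    }

    // Writes all lights with absolute puts, so the buffer position stays at 0.
    static void pack(List<PointLightData> lights, ByteBuffer buf) {
        int count = Math.min(lights.size(), MAX_LIGHTS);
        for (int i = 0; i < count; i++) {
            PointLightData l = lights.get(i);
            int base = i * STRIDE;
            for (int c = 0; c < 3; c++) {
                buf.putFloat(base + 4 * c, l.color[c]);          // base.color     at +0
                buf.putFloat(base + 32 + 4 * c, l.position[c]);  // position       at +32
            }
            buf.putFloat(base + 12, l.intensity);                // base.intensity at +12
            buf.putFloat(base + 16, l.constant);                 // atten.constant at +16
            buf.putFloat(base + 20, l.linear);                   // atten.linear   at +20
            buf.putFloat(base + 24, l.exponent);                 // atten.exponent at +24
        }
        buf.putInt(COUNT_OFFSET, count);
        // Then, once per frame (assuming a UBO id `ubo` created elsewhere):
        //   glBindBuffer(GL_UNIFORM_BUFFER, ubo);
        //   glBufferSubData(GL_UNIFORM_BUFFER, 0, buf);
    }
}
```

With this, the roughly 60 per-light glUniform* calls for ten point lights would collapse into a single buffer upload per frame.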

Every type of bottleneck can be tested, though. For example, you can test whether you are driver/JNI bound by adding more JNI calls, such as redundant glUniform* calls, and seeing whether the framerate drops significantly.
Likewise, being limited by fragment shader instruction overhead or by fillrate can be tested by reducing the resolution of your framebuffer/window.
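
One way to run such an A/B test is to compare average frame times rather than fps, since frame times add up linearly (250 fps is 4 ms per frame, 500 fps is 2 ms). The helper below is only a sketch: `FrameTimer` is a made-up name, and the `Runnable` stands for whatever renders one frame in your engine, with vsync assumed to be off.

```java
// Hypothetical helper for comparing two variants of the render loop.
final class FrameTimer {
    // Converts frames per second into milliseconds per frame.
    static double fpsToMillis(double fps) {
        return 1000.0 / fps;
    }

    // Runs `frame` the given number of times and returns the average
    // wall-clock milliseconds per run.
    static double averageMillis(Runnable frame, int frames) {
        long start = System.nanoTime();
        for (int i = 0; i < frames; i++) {
            frame.run();
        }
        return (System.nanoTime() - start) / 1e6 / frames;
    }
}
```

Measure once with the normal frame and once with, say, a few hundred redundant glUniform* calls added. If the average time grows roughly in proportion to the number of extra calls, the call overhead is a likely suspect.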

I don't think you are shader instruction overhead limited with your shader.

Andrew_3ds · November 12, 2015, 23:19:54

I looked into it, and it's either uniform overhead, fill rate, or vertex shader related. I tried a shader that simply textures and has no lighting, with no fps increase. Changing the anti-aliasing samples from 0 to 8 doesn't have an effect on frame rate. I looked into UBOs, but every website has a different solution. How would I achieve loading a struct (i.e. a point light object struct) into a UBO? I can't use anything higher than OpenGL 3.2, as my current GPU doesn't support OpenGL 4.

Kai · November 13, 2015, 08:33:53

Quote from: Andrew_3ds on November 12, 2015, 23:19:54
I looked into UBOs, but every website has a different solution. How would I achieve loading a struct (i.e. a point light object struct) into a UBO?

Structs are nothing but their members linearly/monotonically written in memory.
Open the ARB_uniform_buffer_object spec and search for the string

Quote: When using the "std140" storage layout, structures will be laid out

The layout rules are described precisely there and will show you where in the ByteBuffer you need to place the individual struct members.
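
As a worked example, here is how those rules might be applied by hand to the PointLight struct from the fragment shader above. This is a sketch based on my reading of the std140 rules (a float aligns to 4 bytes, a vec3 aligns to 16 and occupies 12, a struct aligns to 16 and is padded to a multiple of 16). `Std140Layout` is a hypothetical helper name.

```java
// Hypothetical helper that derives the byte offsets of PointLight's members.
final class Std140Layout {
    static final int FLOAT_ALIGN = 4,  FLOAT_SIZE = 4;
    static final int VEC3_ALIGN  = 16, VEC3_SIZE  = 12; // a vec3 aligns like a vec4
    static final int STRUCT_ALIGN = 16;                 // struct alignment rounds up to a vec4

    // Rounds offset up to the next multiple of alignment.
    static int align(int offset, int alignment) {
        return (offset + alignment - 1) / alignment * alignment;
    }

    // Returns {color, intensity, constant, linear, exponent, position, stride}.
    static int[] pointLightOffsets() {
        int off = align(0, STRUCT_ALIGN);               // BaseLight base
        int color = align(off, VEC3_ALIGN);
        off = color + VEC3_SIZE;
        int intensity = align(off, FLOAT_ALIGN);        // fits right behind the vec3
        off = intensity + FLOAT_SIZE;

        off = align(off, STRUCT_ALIGN);                 // Attenuation atten
        int constant = align(off, FLOAT_ALIGN);
        off = constant + FLOAT_SIZE;
        int linear = align(off, FLOAT_ALIGN);
        off = linear + FLOAT_SIZE;
        int exponent = align(off, FLOAT_ALIGN);
        off = exponent + FLOAT_SIZE;

        off = align(off, STRUCT_ALIGN);                 // padding after the nested struct
        int position = align(off, VEC3_ALIGN);          // vec3 position
        off = position + VEC3_SIZE;

        int stride = align(off, STRUCT_ALIGN);          // array element stride
        return new int[] {color, intensity, constant, linear, exponent, position, stride};
    }
}
```

Under these assumptions the offsets come out as 0, 12, 16, 20, 24 and 32, with a 48-byte stride per array element, so element i of pointLights starts at byte i * 48 of the block.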

Kai · November 13, 2015, 09:11:23

Also, since mapping Java primitives to members in uniform buffer objects or shader storage buffer objects is a completely algorithmic process, one could build Java class counterparts for your GLSL structs and then write a generator that fills a ByteBuffer/UBO from a list of such Java objects, taking the layout rules into account. The generator could either emit that code statically or use a dynamic runtime approach via reflection. The latter could, however, be too slow.
The optimum, in my opinion, would be to dynamically generate a mapping class at runtime via ASM that does the mapping of Java classes to GLSL structs in a buffer object.
This could actually be useful for some people, I guess. Maybe I'll write something like that sometime.
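
To make the reflection variant concrete, here is a rough sketch. It is not the ASM generator and not an existing library: `GlslMember`, `Std140Writer` and the mirror classes are all hypothetical, and only float, vec3 (as a float[3]) and nested structs are handled. An explicit order annotation is used because reflection does not guarantee the order in which fields are returned.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical annotation: marks a GLSL struct member and pins its declaration order.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface GlslMember { int order(); }

// Hypothetical Java mirrors of the GLSL structs in the fragment shader.
final class BaseLightData {
    @GlslMember(order = 0) float[] color = new float[3];    // vec3
    @GlslMember(order = 1) float intensity;
}

final class AttenuationData {
    @GlslMember(order = 0) float constant;
    @GlslMember(order = 1) float linear;
    @GlslMember(order = 2) float exponent;
}

final class PointLightMirror {
    @GlslMember(order = 0) BaseLightData base = new BaseLightData();
    @GlslMember(order = 1) AttenuationData atten = new AttenuationData();
    @GlslMember(order = 2) float[] position = new float[3]; // vec3
}

final class Std140Writer {
    static int align(int offset, int alignment) {
        return (offset + alignment - 1) / alignment * alignment;
    }

    // Writes `struct` starting at `offset` (rounded up to 16) and returns the
    // offset just past its padded end. For GL use, pass a direct buffer in
    // native byte order.
    static int write(Object struct, ByteBuffer buf, int offset) {
        offset = align(offset, 16);
        List<Field> fields = new ArrayList<>();
        for (Field f : struct.getClass().getDeclaredFields()) {
            if (f.isAnnotationPresent(GlslMember.class)) fields.add(f);
        }
        fields.sort(Comparator.comparingInt((Field f) -> f.getAnnotation(GlslMember.class).order()));
        for (Field f : fields) {
            Object value;
            try {
                f.setAccessible(true);
                value = f.get(struct);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (value instanceof Float) {                 // float: align 4, size 4
                offset = align(offset, 4);
                buf.putFloat(offset, (Float) value);
                offset += 4;
            } else if (value instanceof float[]) {        // vec3: align 16, size 12
                float[] v = (float[]) value;
                offset = align(offset, 16);
                for (int i = 0; i < 3; i++) buf.putFloat(offset + 4 * i, v[i]);
                offset += 12;
            } else {                                      // nested struct
                offset = write(value, buf, offset);
            }
        }
        return align(offset, 16);
    }
}
```

Calling `Std140Writer.write(light, buf, i * 48)` for each light would fill an array of PointLight structs. Doing the reflective lookups on every call is exactly the cost that a generated mapping class would avoid.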
