Dropping frames doing nothing but swap buffers on OS X

Started by nvt, July 31, 2016, 21:28:54

Previous topic - Next topic

nvt

In order to try and track down persistent stuttering in a game I'm working on I created an absolutely minimal game loop for a baseline. However, even doing nothing but swap buffers every frame I am getting frequent dropping of frames. Often the interval between iterations of the loop is in the range of 0.035 to 0.08 seconds (with these spikes occurring at least once a second). Has anyone had similar experiences, particularly on OS X? I'm leaning towards some sort of hardware failure or OS bug at this point, but any advice is welcome.

Code below:

public class SlowUpdates {
    long window = 0l;

    public void run() {
        try {
            init();
            loop();

            glfwDestroyWindow(window);
        } finally {
            glfwTerminate();
        }
    }

    public void init() {
        if (!glfwInit()) {
            throw new IllegalStateException("Unable to initialize GLFW");
        }

        // Configure our window
        glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 2); // using a 4.1 profile exhibits the same problems
        glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 0);

        window = glfwCreateWindow(800,600, "Hello World!", 0L, 0L);

        // Get the resolution of the primary monitor
        GLFWVidMode vidmode = glfwGetVideoMode(glfwGetPrimaryMonitor());
        // Make the OpenGL context current
        glfwMakeContextCurrent(window);
        glfwSwapInterval(1); // Setting VSync off makes no difference
        glfwShowWindow(window);
    }

    public void loop() {
        GL.createCapabilities();
        glViewport(0,0,800,600);

        double lastUpdated = glfwGetTime();

        // Run until the user indicates we should stop
        while (!glfwWindowShouldClose(window)) {
            double curTime = glfwGetTime();
            double deltaSeconds = curTime - lastUpdated;
            lastUpdated = curTime;

            // If we have a gap of more than 2 frames, print out the delta
            if (deltaSeconds > (0.016666667 * 2.1)) {
                System.out.println("Long update time: " + deltaSeconds);
            }

            glfwSwapBuffers(window);
            glfwPollEvents();
        }
    }

    public static void main(String[] args) {
        new SlowUpdates().run();
    }
}


Further running information:
Machine: 2.3GHz Late 2013 Macbook Pro (with discrete graphics card)
Resolution: Default
VM Args: -XstartOnFirstThread
Java version : 8
LWJGL version : 3.0.0

Example output from running on my machine for 10 seconds or so:

Long update time: 0.03519966600000002
Long update time: 0.048400801999999965
Long update time: 0.05260543899999992
Long update time: 0.05686535199999998
Long update time: 0.039696041999999654
Long update time: 0.05309769600000003
Long update time: 0.05737294300000029
Long update time: 0.03731823899999975
Long update time: 0.0663081339999998
Long update time: 0.04076894600000003
Long update time: 0.03728830400000005
Long update time: 0.038040972999999756
Long update time: 0.05522503099999998
Long update time: 0.04303661199999986
Long update time: 0.043963965000000105
Long update time: 0.03907344100000021
Long update time: 0.03926434200000006
Long update time: 0.03820931900000035
Long update time: 0.05243613700000038
Long update time: 0.03801172999999958
Long update time: 0.03797626900000006
Long update time: 0.03930413699999935
Long update time: 0.05232179700000028
Long update time: 0.04220759300000054
Long update time: 0.03862296800000031
Long update time: 0.039433597000000375
Long update time: 0.04283244800000041
Long update time: 0.036080586000000636
Long update time: 0.03919839099999933
Long update time: 0.04091377800000018
Long update time: 0.05657044099999986
Long update time: 0.041036066000000204
Long update time: 0.04767241800000033
Long update time: 0.0377868320000001
Long update time: 0.04219191099999975
Long update time: 0.04396938799999983
Long update time: 0.04859977400000037
Long update time: 0.05908144699999962
Long update time: 0.039497417000000645
Long update time: 0.046268094000000204
Long update time: 0.03927137400000014
Long update time: 0.043469968000000136
Long update time: 0.04399093800000031
Long update time: 0.041085579999999844
Long update time: 0.04568488099999968
Long update time: 0.03579920800000025
Long update time: 0.05230304200000013
Long update time: 0.04685661700000132
Long update time: 0.04761215599999957
Long update time: 0.03753488800000149
Long update time: 0.038046283999999986
Long update time: 0.03772946099999963
Long update time: 0.04108765500000011
Long update time: 0.038663435000000135
Long update time: 0.03759004499999996
Long update time: 0.052365797000000214
Long update time: 0.0374093490000007
Long update time: 0.04953469599999849
Long update time: 0.0395372260000002
Long update time: 0.03847309400000043
Long update time: 0.03951624100000117
Long update time: 0.04416338000000053
Long update time: 0.038125530000000296
Long update time: 0.03944585000000167
Long update time: 0.03862907599999943
Long update time: 0.03533686100000111
Long update time: 0.03519534199999974
Long update time: 0.03853694799999907
Long update time: 0.04572978800000094
Long update time: 0.06691946499999979
Long update time: 0.04981436099999925
Long update time: 0.03572906099999962
Long update time: 0.038748087999998404
Long update time: 0.03947808300000055
Long update time: 0.047825573999999094
Long update time: 0.03803626299999863
Long update time: 0.03783003899999926
Long update time: 0.03801885099999858
Long update time: 0.04459856499999937
Long update time: 0.03689624899999977
Long update time: 0.05462112100000027
Long update time: 0.037220873000000765
Long update time: 0.035962877000001114
Long update time: 0.04610506699999917
Long update time: 0.035799774000000895
Long update time: 0.049529357999999135
Long update time: 0.03719905700000048
Long update time: 0.04233656099999905
Long update time: 0.04300985000000068
Long update time: 0.04495792899999884
Long update time: 0.05274710000000127
Long update time: 0.05670581400000074
Long update time: 0.04431241099999994
Long update time: 0.04083954999999939
Long update time: 0.03708499800000098
Long update time: 0.03880964099999851
Long update time: 0.03889574400000129
Long update time: 0.05666360800000092
Long update time: 0.04578334400000017
Long update time: 0.038040420999999824
Long update time: 0.039791002000001185
Long update time: 0.038779772999999906
Long update time: 0.03913228300000071
Long update time: 0.03952030800000017
Long update time: 0.06215457499999921
Long update time: 0.0441046140000001
Long update time: 0.038172996000000126
Long update time: 0.03680606600000047
Long update time: 0.038834962999999334
Long update time: 0.0384102189999993
Long update time: 0.038783572999999905


Edit: spent an hour getting the test program running on the same machine under Windows and the poor performance does not occur. This would appear to be something mac specific.
Edit: wrote an equivalent program in C using glfw directly (version 3.0.2) and the problem does not occur. For sanity checking I built a separate project that runs the code above, but using the JGLFW bindings instead of the LWJGL bindings, and performance is good there as well. Unless I can figure something else out, it looks like there is some sort of odd behavior in the LWJGL layer on a mac, but since all LWJGL's glfwSwapBuffers does is call straight through to GLFW, I'm not sure what could cause that.
Edit: and now the JGLFW version has started behaving poorly as well. Latest working theory is that my computer is haunted.

spasi

Afaict it's an OS X problem with non-fullscreen windows. If you switch to fullscreen there should be no stuttering. If it's related to LWJGL (or how LWJGL-CI builds GLFW for OSX), I haven't been able to identify such an issue. I'll try the native GLFW again, but IIRC the issue is present on there as well.

metric

I just want to let you know, this is also now happening on Window's 10 latest update. At least, this seems to be the problem I am experiencing as well when in window mode and FULL SCREEN mode. Where in it randomly stutters every so often. I am using JDK 8U102 for compiling with and the latest stable build of LWJGL.

All I can say is it definitely has something to do with with the glfw window. The glfw getTime is even affected by it! I know, because I have an fps counter using it, but when I use the built in java Date class for my profiler, it simply returns around 0.001ms to 0.002ms for my whole render loop with 256 separate renderers. Yet, it seems glfw getTime is stuck for a few loops, but my actual game loop is continuing to update. I say this, because my camera matrix and other objects are being updated even when this happens, but the screen is not. The issue is definitely more pronounced in window mode than full screen mode, but it is happening in both for me.

I am running 2 Nvidia 970s with the latest drives, but I only currently have 1 970 set for use with games, aka it isn't in sli mode. The other thing I should mention is that this does indeed only occur when glfwSwapInterval is set to 0. When I set it to 1 for vsync, everything is peachy smooth!

spasi

java.util.Date uses System.currentTimeMillis() internally, which has limited accuracy. Comparing glfwGetTime() to System.nanoTime() would be better.

Could you share the code you use to test on Windows? Do you use any extra threads? Do you do anything odd while rendering and does it happen if you simplify rendering (e.g. a simple glClear)?

metric

I am using one other thread, but it isn't even interacting with the render code whatsoever. The extra thread is simply processing data arrays for computing positions. However, those positions are not being rendered from the thread, and the positions are only accessed from the main thread after the thread is finished. I am however, doing deferred rendering with 2 FBO. One for the deferred part and another for post processing effects.

Give me a bit and I will put together a simple glclear with the nano time and the fps counter to see if it still happens.

Edit -- This is the result of using glfw set to using opengl 4 and with no vsync enabled, and then simply clearing color buffer bit and depth buffer bit and then glclearcolor to blue. The fps is using the glfw getTime. The other Render loop profiler is now using nano. And as you can tell even the profiler is now jumping up to 0.29-0.4ms when it stutters after the swap buffer. The profiler starts at the beginning and then ends at after swapbuffer and poll. You will also notice the fps drops from 650ish to 575ish when the stutter occurs

Render Loop With GLClear: 0.147748ms
FPS: 653.8563
Render Loop With GLClear: 0.153307ms
FPS: 653.8563
Render Loop With GLClear: 0.160036ms
FPS: 653.8563
Render Loop With GLClear: 0.153014ms
FPS: 653.8563
Render Loop With GLClear: 0.174956ms
FPS: 653.8563
Render Loop With GLClear: 0.162376ms
FPS: 653.8563
Render Loop With GLClear: 0.261557ms
FPS: 653.8563
Render Loop With GLClear: 0.251609ms
FPS: 653.8563
Render Loop With GLClear: 0.2402ms
FPS: 653.8563
Render Loop With GLClear: 0.195436ms
FPS: 653.8563
Render Loop With GLClear: 0.13897ms
FPS: 653.8563
Render Loop With GLClear: 0.158865ms
FPS: 653.8563
Render Loop With GLClear: 0.143652ms
FPS: 653.8563
Render Loop With GLClear: 0.185489ms
FPS: 653.8563
Render Loop With GLClear: 0.149796ms
FPS: 653.8563
Render Loop With GLClear: 0.154477ms
FPS: 653.8563
Render Loop With GLClear: 0.153013ms
FPS: 653.8563
Render Loop With GLClear: 0.166472ms
FPS: 653.8563
Render Loop With GLClear: 0.171445ms
FPS: 653.8563
Render Loop With GLClear: 0.1299ms
FPS: 653.8563
Render Loop With GLClear: 0.128438ms
FPS: 653.8563
Render Loop With GLClear: 0.127268ms
FPS: 653.8563
Render Loop With GLClear: 0.115565ms
FPS: 653.8563
Render Loop With GLClear: 0.12756ms
FPS: 653.8563
Render Loop With GLClear: 0.12405ms
FPS: 653.8563
Render Loop With GLClear: 0.127852ms
FPS: 653.8563
Render Loop With GLClear: 0.132241ms
FPS: 575.688
Render Loop With GLClear: 0.184903ms
FPS: 575.688
Render Loop With GLClear: 0.216794ms
FPS: 575.688
Render Loop With GLClear: 0.154769ms
FPS: 575.688
Render Loop With GLClear: 0.21182ms
FPS: 575.688
Render Loop With GLClear: 0.423055ms
FPS: 575.688
Render Loop With GLClear: 0.236981ms
FPS: 575.688
Render Loop With GLClear: 0.122293ms
FPS: 575.688
Render Loop With GLClear: 0.129901ms
FPS: 575.688
Render Loop With GLClear: 0.126683ms
FPS: 575.688
Render Loop With GLClear: 0.126975ms
FPS: 575.688
Render Loop With GLClear: 0.131071ms
FPS: 575.688
Render Loop With GLClear: 0.120539ms
FPS: 575.688
Render Loop With GLClear: 0.13663ms
FPS: 575.688
Render Loop With GLClear: 0.126683ms
FPS: 575.688
Render Loop With GLClear: 0.123171ms
FPS: 575.688
Render Loop With GLClear: 0.125805ms
FPS: 575.688
Render Loop With GLClear: 0.134289ms
FPS: 575.688
Render Loop With GLClear: 0.125219ms
FPS: 575.688
Render Loop With GLClear: 0.12405ms
FPS: 575.688
Render Loop With GLClear: 0.12639ms
FPS: 575.688
Render Loop With GLClear: 0.1378ms
FPS: 575.688
Render Loop With GLClear: 0.129315ms
FPS: 575.688
Render Loop With GLClear: 0.123757ms
FPS: 575.688
Render Loop With GLClear: 0.116735ms
FPS: 575.688
Render Loop With GLClear: 0.12873ms
FPS: 575.688
Render Loop With GLClear: 0.130778ms
FPS: 575.688
Render Loop With GLClear: 0.123465ms
FPS: 575.688
Render Loop With GLClear: 0.120539ms
FPS: 575.688
Render Loop With GLClear: 0.12756ms
FPS: 575.688
Render Loop With GLClear: 0.126975ms
FPS: 575.688
Render Loop With GLClear: 0.126683ms
FPS: 575.688
Render Loop With GLClear: 0.126097ms
FPS: 575.688
Render Loop With GLClear: 0.128438ms
FPS: 575.688
Render Loop With GLClear: 0.183148ms
FPS: 575.688
Render Loop With GLClear: 0.13897ms
FPS: 575.688
Render Loop With GLClear: 0.129023ms
FPS: 575.688
Render Loop With GLClear: 0.129023ms
FPS: 575.688
Render Loop With GLClear: 0.128731ms
FPS: 575.688
Render Loop With GLClear: 0.114394ms
FPS: 575.688
Render Loop With GLClear: 0.125219ms
FPS: 647.70905


Here is when vsync is enabled to 1:
FPS: 58.983757
Render Loop With GLClear: 18.00442ms
FPS: 58.983757
Render Loop With GLClear: 16.012901ms
FPS: 58.983757
Render Loop With GLClear: 16.330631ms
FPS: 58.983757
Render Loop With GLClear: 16.520509ms
FPS: 58.983757
Render Loop With GLClear: 16.551813ms
FPS: 58.983757
Render Loop With GLClear: 16.592481ms
FPS: 58.983757
Render Loop With GLClear: 16.434493ms
FPS: 58.983757
Render Loop With GLClear: 16.485108ms
FPS: 58.983757
Render Loop With GLClear: 16.798742ms
FPS: 58.983757
Render Loop With GLClear: 16.402896ms
FPS: 58.983757
Render Loop With GLClear: 16.644266ms
FPS: 58.983757
Render Loop With GLClear: 16.630222ms
FPS: 58.983757
Render Loop With GLClear: 16.823026ms
FPS: 58.983757
Render Loop With GLClear: 16.299034ms
FPS: 58.983757
Render Loop With GLClear: 16.516998ms
FPS: 58.983757
Render Loop With GLClear: 18.321272ms
FPS: 58.983757
Render Loop With GLClear: 15.485984ms
FPS: 58.983757
Render Loop With GLClear: 16.63227ms
FPS: 58.983757
Render Loop With GLClear: 16.146313ms
FPS: 58.983757
Render Loop With GLClear: 16.738765ms
FPS: 58.983757
Render Loop With GLClear: 17.155092ms
FPS: 58.983757
Render Loop With GLClear: 16.088676ms
FPS: 58.983757
Render Loop With GLClear: 16.627882ms
FPS: 58.983757
Render Loop With GLClear: 17.167087ms
FPS: 58.983757
Render Loop With GLClear: 16.141925ms
FPS: 58.983757
Render Loop With GLClear: 15.969601ms
FPS: 58.983757
Render Loop With GLClear: 17.618521ms
FPS: 58.983757
Render Loop With GLClear: 16.209801ms
FPS: 58.983757
Render Loop With GLClear: 16.849942ms
FPS: 58.983757
Render Loop With GLClear: 15.717698ms
FPS: 58.983757
Render Loop With GLClear: 19.014077ms
FPS: 58.983757
Render Loop With GLClear: 15.035134ms
FPS: 58.983757
Render Loop With GLClear: 16.223258ms
FPS: 58.983757
Render Loop With GLClear: 16.584289ms
FPS: 58.983757
Render Loop With GLClear: 16.715067ms
FPS: 58.983757
Render Loop With GLClear: 16.614424ms
FPS: 58.983757
Render Loop With GLClear: 16.9067ms
FPS: 58.983757
Render Loop With GLClear: 16.3988ms
FPS: 58.983757
Render Loop With GLClear: 16.327705ms
FPS: 58.983757
Render Loop With GLClear: 16.284406ms
FPS: 58.983757
Render Loop With GLClear: 17.229989ms
FPS: 58.983757
Render Loop With GLClear: 16.350819ms
FPS: 58.983757
Render Loop With GLClear: 16.780603ms
FPS: 58.983757
Render Loop With GLClear: 16.376857ms
FPS: 58.983757
Render Loop With GLClear: 16.813955ms
FPS: 58.983757
Render Loop With GLClear: 16.213896ms
FPS: 58.983757
Render Loop With GLClear: 17.096871ms
FPS: 58.983757
Render Loop With GLClear: 16.179665ms
FPS: 58.983757
Render Loop With GLClear: 16.684347ms
FPS: 58.983757
Render Loop With GLClear: 16.871007ms
FPS: 58.983757
Render Loop With GLClear: 16.522849ms
FPS: 58.983757
Render Loop With GLClear: 16.530748ms
FPS: 58.983757
Render Loop With GLClear: 16.079022ms
FPS: 58.983757
Render Loop With GLClear: 17.124079ms
FPS: 59.01507
Render Loop With GLClear: 16.423961ms
FPS: 59.01507
Render Loop With GLClear: 16.25076ms
FPS: 59.01507
Render Loop With GLClear: 16.868081ms
FPS: 59.01507
Render Loop With GLClear: 17.103599ms
FPS: 59.01507
Render Loop With GLClear: 16.537185ms
FPS: 59.01507
Render Loop With GLClear: 16.113545ms
FPS: 59.01507
Render Loop With GLClear: 19.107407ms
FPS: 59.01507
Render Loop With GLClear: 14.848475ms
FPS: 59.01507
Render Loop With GLClear: 17.217409ms
FPS: 59.01507
Render Loop With GLClear: 15.462285ms
FPS: 59.01507
Render Loop With GLClear: 16.990375ms
FPS: 59.01507
Render Loop With GLClear: 16.466676ms
FPS: 59.01507
Render Loop With GLClear: 16.447074ms
FPS: 59.01507
Render Loop With GLClear: 16.739058ms
FPS: 59.01507
Render Loop With GLClear: 16.659771ms
FPS: 59.01507
Render Loop With GLClear: 17.08429ms
FPS: 59.01507
Render Loop With GLClear: 16.413429ms
FPS: 59.01507
Render Loop With GLClear: 15.829752ms
FPS: 59.01507
Render Loop With GLClear: 16.919573ms
FPS: 59.01507
Render Loop With GLClear: 16.379198ms
FPS: 59.01507
Render Loop With GLClear: 16.909041ms
FPS: 59.01507
Render Loop With GLClear: 17.236426ms
FPS: 59.01507
Render Loop With GLClear: 15.477499ms
FPS: 59.01507


As you can see the fps when vsync is far more consistent as well as the milliseconds only varying by 1-3 every so often.

And here is the render loop code:
if(win != null) {
	while(!win.isClosing()) {
		Profiler.start();
		float time = win.getTime() * 1000;
		deltaTime = time - lastTime;
		lastTime = time;
		FPSCounter.update(time);
		glClear(GL11.GL_COLOR_BUFFER_BIT | GL11.GL_DEPTH_BUFFER_BIT);
		glClearColor(0,0,1,1);
		if(FPSCounter.getFPS() > 0) {
			System.out.println("FPS: " + FPSCounter.getFPS());
		}
		win.swapBuffers();
		win.poll();
		Profiler.end("Render Loop With GLClear");
	}
}

spasi

I'm not sure I understand the timings above. There doesn't seem to be a correlation between glClear times and FPS. Also, how can the FPS values be so stable?

Anyway, I cannot reproduce such stuttering on my machine (GTX 970, latest drivers, Windows 10, Java 8u102). Tried both fullscreen and windowed modes. As I said, some example code that reproduces the issue would be nice (I don't know what Profiler or FPSCounter in your code are doing, could the problem be there?), or maybe you could prepare a binary that I can run on my machine.

metric

Indeed, it looks like it is something to do with the main part of my code. After letting my machine set for the whole day and commenting out lots of code I have come to that conclusion. I booted a simple test with just a quad rendered and ui text for fps with the deferred rendering and it works just fine with no stuttering. So, it seems I have some debugging to do, to see where something isn't getting released on closure from the main part of the code. I have a feeling it is probably a vbo or texture that isn't being deleted properly somewhere along the lines. Either that or it doesn't like the threading I am doing at parts.

metric

Spasi, I thought you might find this interesting, but I tracked down the exact point where it is catching in the render loop. This is with 3.0.1 build 17 of lwjgl now. With the 3.0.1 build 17 it isn't as bad as it was, but it still does it every so often.

Every so often one of the renderers when binding causes a huge thread block. I am talking upwards of 600+ms. I have seen it goes as high as 1500ms. Would there be any specific reason why a simple vao bind would cause a thread block? It seems to be somewhat random as to which one it happens on. I have eliminated all other threading at this point just to track this down. Everything is on the main thread.

Here is the bind code for the renderer:

public void bind() {
	if(vao != null) {
		vao.bind();
	}
}


Here is the inner code for the main vao.bind() function:

public void bind() {
      glBindVertexArray(id);
 }


id of course is the int reference that opengl provides.

That is all there is in the code. At this specific point is when it thread blocks every so often on my machine with 100+ vao. I am not even changing the materials because all these vao use the same shaders. The only thing the shader is being updated with is the model matrix4 for each object before the vao is bound. The vbos of the vao are not even being changed at this point as they have already been filled.

spasi

Quote from: metric on August 27, 2016, 17:40:37Would there be any specific reason why a simple vao bind would cause a thread block? It seems to be somewhat random as to which one it happens on.

If it's random, it could be a resource that's being exhausted and the driver stalls waiting for the GPU to catch up. Are you putting too much pressure on GPU memory? Note that the GTX 970 has an unconventional memory architecture.

Also, there have been people in the past saying that VAOs are useless for performance. I'm not sure what the situation with current drivers is, but I remember suggestions like binding a single VAO at startup (required by modern OpenGL), then writing the rest of your renderer as if VAOs don't exist.

Quote from: metric on August 27, 2016, 17:40:37I have eliminated all other threading at this point just to track this down. Everything is on the main thread.

There's one more source of threading in Nvidia's OpenGL implementation. Open the NVIDIA Control Panel -> Manage 3D settings -> Threaded optimisation -> set to Off.