[FIXED] Accurate Display.sync()

Started by kappa, February 20, 2012, 22:52:04


kappa

LWJGL's Display.sync() method has been broken/inaccurate for a while now. It doesn't deliver an accurate 60fps; many have abandoned it for custom solutions, and you might as well use Thread.sleep() directly for similar results.

So I thought I'd attempt a replacement that beats it. Generally the most accurate solutions seem to be the ones that don't sleep and instead burn CPU using something like Thread.yield() (or nothing at all). On the other hand, sleeping is needed to stop overuse of the CPU/GPU, but sleeping on its own is inaccurate.

So the solution I propose is a hybrid of the above: sleep as much as possible, then burn a little CPU to maintain accuracy. Here's what I've come up with:

/** The time of the last call to sync(), in nanoseconds. */
private static long lastTime = getTime();

/**
 * An accurate sync method.
 *
 * Since Thread.sleep() isn't 100% accurate, we assume that it has
 * roughly a margin of error of 1ms. This method will sleep for the
 * sync time but burn a few CPU cycles with Thread.yield() for the
 * last millisecond plus any remaining micro- and nanoseconds to
 * ensure an accurate sync time.
 *
 * @param fps The desired frame rate, in frames per second
 */
public static void sync(int fps) {
	if (fps <= 0) return;

	long errorMargin = 1000 * 1000; // 1 millisecond error margin for Thread.sleep()
	long sleepTime = 1000000000 / fps; // nanoseconds to sleep this frame

	// if smaller than sleepTime, burn for errorMargin + remainder micro & nano seconds
	long burnTime = Math.min(sleepTime, errorMargin + sleepTime % (1000 * 1000));

	long overSleep = 0; // time the sleep or burn goes over by

	try {
		while (true) {
			long t = getTime() - lastTime;

			if (t < sleepTime - burnTime) {
				Thread.sleep(1);
			}
			else if (t < sleepTime) {
				// burn the last few CPU cycles to ensure accuracy
				Thread.yield();
			}
			else {
				overSleep = Math.min(t - sleepTime, errorMargin);
				break; // exit while loop
			}
		}
	} catch (InterruptedException e) {}

	lastTime = getTime() - overSleep;
}

/**
 * Get the system time in nanoseconds.
 * @return the current time in nanoseconds
 */
private static long getTime() {
	return (Sys.getTime() * 1000000000) / Sys.getTimerResolution();
}


Basically the code assumes that Thread.sleep(1) can be about 1ms inaccurate (errorMargin), so it sleeps for as much of the frame as possible and burns CPU only for the small errorMargin at the end. Nano time is used as it allows easy swapping with System.nanoTime(), in case we ever want to do that in the future.
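To make the idea concrete, here's a minimal, self-contained sketch of how this hybrid sync() would be driven from a game loop. It substitutes System.nanoTime() for LWJGL's Sys timer so it runs standalone; the class name and the placeholder update/render comment are illustrative, not LWJGL API.

```java
/**
 * Standalone sketch of the hybrid sleep/yield sync, using
 * System.nanoTime() in place of LWJGL's Sys timer.
 */
public class SyncSketch {
    private static long lastTime = System.nanoTime();

    public static void sync(int fps) {
        if (fps <= 0) return;

        final long errorMargin = 1_000_000L;         // assume sleep() is ~1 ms inaccurate
        final long sleepTime = 1_000_000_000L / fps; // frame budget in nanoseconds
        long burnTime = Math.min(sleepTime, errorMargin + sleepTime % 1_000_000L);
        long overSleep = 0;

        try {
            while (true) {
                long t = System.nanoTime() - lastTime;
                if (t < sleepTime - burnTime) {
                    Thread.sleep(1);        // coarse wait: cheap but inaccurate
                } else if (t < sleepTime) {
                    Thread.yield();         // fine wait: burn CPU for accuracy
                } else {
                    overSleep = Math.min(t - sleepTime, errorMargin);
                    break;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        lastTime = System.nanoTime() - overSleep;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        for (int i = 0; i < 30; i++) {
            // updateGame(); renderFrame(); Display.update(); would go here
            sync(60);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("30 frames at 60fps took ~" + elapsedMs + " ms");
    }
}
```

Thirty frames at 60fps should take roughly 500ms; how close the measured time comes to that is the accuracy being discussed in this thread.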

I've been testing it for a number of days on different systems and have gotten really nice results with it (a solid 60fps), much better than the current Display.sync(). Thought I'd post it here for some peer review and feedback before considering whether it's a feasible replacement for the current LWJGL sync method.

So what do you think?

princec

I get a variance of about 5ms with sleep() on my i7. Maybe you could get it to auto-tune itself?

Cas :)

kappa

Quote from: princec on February 21, 2012, 10:19:55
I get a variance of about 5ms with sleep() on my i7. Maybe you could get it to auto-tune itself?

Cas :)
Ooh, that is actually a pretty good idea, and I think it's very doable. I'll do some experimentation later today with auto-tuning code and see what I can come up with. The above code (if you include the oversleep bit) already accounts for a variance of up to 2ms.

Just curious, the 5ms variance sounds pretty bad; which Windows edition are you using? I had heard the Thread.sleep() situation had gotten better since Windows Vista. I've tested the above code on a Windows XP/Pentium 4 machine and it works pretty well there; maybe it's because yours is a quad core or something?

princec

Core i7 @ 2.6ghz with Vista64. It's pretty steady most of the time but spikes fairly consistently if irregularly by 5ms either way. I guess if you just make a note of the maximum error that sleep() gives (after say about 1000-2000 frames) and keep the threshold set to that value it'll tune itself out pretty fast.
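That "note the maximum error" idea could be sketched roughly like this: sample how far Thread.sleep(1) actually overshoots during a warm-up period and keep the worst observed error as the threshold below which sync() switches from sleeping to yielding. The class and method names here are made up for illustration, not LWJGL API.

```java
/**
 * Sketch of the warm-up auto-tuning idea: probe Thread.sleep(1)'s
 * real-world overshoot and keep the worst case as the yield threshold.
 */
public class SleepErrorProbe {
    /** Measure the worst-case overshoot of Thread.sleep(1), in nanoseconds. */
    public static long measureWorstError(int samples) {
        long worst = 0;
        for (int i = 0; i < samples; i++) {
            long before = System.nanoTime();
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            long actual = System.nanoTime() - before;
            long error = actual - 1_000_000L; // how far past the requested 1 ms
            if (error > worst) worst = error;
        }
        return worst;
    }

    public static void main(String[] args) {
        long worstNs = measureWorstError(200);
        System.out.printf("worst sleep(1) overshoot: %.2f ms%n", worstNs / 1e6);
        // sync() would then yield for the final worstNs of every frame
    }
}
```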

Cas :)

kappa

Quote from: princec on February 21, 2012, 12:37:20
Core i7 @ 2.6ghz with Vista64. It's pretty steady most of the time but spikes fairly consistently if irregularly by 5ms either way. I guess if you just make a note of the maximum error that sleep() gives (after say about 1000-2000 frames) and keep the threshold set to that value it'll tune itself out pretty fast.
Thanks for that, implemented the above and it works rather nicely on my Windows Vista machine.

It kinda breaks on the Windows XP machine: I get 1-2ms sleeps and a random spike of about 3-5ms every 60 frames or so, which works fine with the auto-tuning. However, I randomly get one or two major spikes of 20ms+ during a program run (especially when dragging the LWJGL Display window; I've even seen values of up to 50-65ms), and these break the auto-tuning because they cause the loop to run entirely on Thread.yield(). Since these big spikes happen so rarely, I'd say it should be safe to ignore them by setting the auto-tune code to only consider sleep values between 1ms and 10ms and ignore anything above that. What do you think?
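The proposed filter might look something like this: only overshoots within a plausible scheduler range feed the tuner, so rare huge stalls (window drags and the like) can't lock sync() into pure yielding. The bounds and names here are illustrative.

```java
/**
 * Sketch of the spike filter: out-of-range sleep overshoots are
 * ignored so they never inflate the yield threshold.
 */
public class SpikeFilter {
    static final long MIN_NS = 1_000_000L;  // 1 ms: below this is normal jitter
    static final long MAX_NS = 10_000_000L; // 10 ms: above this is a rare stall

    /** Returns the new yield threshold given an observed sleep overshoot. */
    public static long tune(long currentThresholdNs, long observedErrorNs) {
        if (observedErrorNs < MIN_NS || observedErrorNs > MAX_NS) {
            return currentThresholdNs; // ignore out-of-range spikes
        }
        return Math.max(currentThresholdNs, observedErrorNs);
    }
}
```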

A useful page on Thread.sleep() on XP which shows that it is usually < 10ms even on a busy system.

kappa

Just tested the above: setting the auto-tuning to only consider Thread.sleep() values between 1ms and 10ms seems to work pretty nicely, on XP at least.

princec

Hm that seems a little kludgy. Maybe we could come up with a better heuristic - I think possibly the frequency of spikes should probably be taken into account somehow, over a short period, and picking the highest one over the last few seconds. And then continually doing that; so you need to hold the last few seconds of sleep data in a circular buffer. That way it'll keep autotuning if some other system characteristic caused it to spike a bit, and it'll return to normal after a few seconds, too.
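The circular-buffer heuristic described above could be sketched as follows: keep the last N frames of sleep-overshoot samples and use the window's maximum as the yield threshold, so a transient spike ages out of the buffer after a few seconds. The class name and window size are assumptions for illustration.

```java
import java.util.Arrays;

/**
 * Sketch of a rolling spike window: the yield threshold is the worst
 * sleep overshoot seen over the last N frames, so tuning recovers
 * automatically once a transient spike leaves the window.
 */
public class SpikeWindow {
    private final long[] samples;
    private int index;

    public SpikeWindow(int frames) {
        samples = new long[frames]; // e.g. ~180 frames = 3 s at 60 fps
    }

    /** Record this frame's sleep overshoot (nanoseconds). */
    public void record(long overshootNs) {
        samples[index] = overshootNs;
        index = (index + 1) % samples.length; // overwrite the oldest sample
    }

    /** Current yield threshold: worst overshoot seen in the window. */
    public long threshold() {
        return Arrays.stream(samples).max().orElse(0);
    }
}
```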

Cas :)

kappa

Testing out an auto-tuning method that continually adapts using the following:

- Increase the time that Thread.yield() is used by 1ms every time there is a Thread.sleep() spike greater than it (thereby decreasing the time Thread.sleep() is used); this eventually tunes it to the right value after a few spikes.

- Slowly decrease the Thread.yield() time by 10 microseconds when Thread.sleep() spikes are smaller than the Thread.yield() time; any spike will push it back up again.

This seems to work pretty well (at least on XP) and continuously auto-tunes the time Thread.yield() is used, which is pretty useful if the system becomes more or less busy. I'll run a few more tests on some different systems to see if it holds up.

spasi

Quote from: kappa on February 21, 2012, 10:55:58Just curious, the 5ms variance sounds pretty bad, what windows edition are you using?

I don't think it's the OS, but the CPU going into a deep sleep state. If we assume a Display.sync()ed Puppygames game and nothing else heavy running in the background, it's probably quite likely that an i7 will go to sleep often. So the spikes are due to the CPU waking up cores rather than some OS scheduling weirdness. Cas, try running a heavy process in the background and see if the spikes go away.

kappa, two random ideas you could try:

- Rotate between Thread.sleep(1) and yielding in the sync loop, instead of sleeping all the way to the burn time threshold. This could deter the CPU from going to a sleep state.
- Use LockSupport.parkNanos() instead of Thread.sleep(). It's supposed to have lower latency (probably not on Windows though).
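The LockSupport idea might be sketched like this, with parkNanos() taking over the coarse part of the wait. Note that parkNanos() doesn't throw InterruptedException (it simply returns early if the thread is interrupted), which also simplifies the loop. This is a standalone sketch using System.nanoTime(); the names are illustrative, not LWJGL API.

```java
import java.util.concurrent.locks.LockSupport;

/**
 * Sketch of a sync loop that parks in small nanosecond slices instead
 * of calling Thread.sleep(1), then yields for the final stretch.
 */
public class ParkSync {
    private static long lastTime = System.nanoTime();

    public static void sync(int fps, long yieldTimeNs) {
        if (fps <= 0) return;
        long sleepTime = 1_000_000_000L / fps; // frame budget in nanoseconds

        while (true) {
            long t = System.nanoTime() - lastTime;
            if (t < sleepTime - yieldTimeNs) {
                LockSupport.parkNanos(100_000); // park ~0.1 ms at a time
            } else if (t < sleepTime) {
                Thread.yield();                 // burn the remainder for accuracy
            } else {
                break;
            }
        }
        // simplified: no oversleep compensation in this sketch
        lastTime = System.nanoTime();
    }
}
```

Whether parkNanos() actually beats Thread.sleep() is platform-dependent, as noted above; on Windows both typically bottom out at the same scheduler granularity.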

spasi

Another idea: Spikes could be caused by the thread getting scheduled to a different core after the sleep call. You could try setting the process affinity (through the Windows task manager) or thread affinity (Java Thread Affinity) to a single core and see if it fixes the spikes.

kappa

@spasi, thanks for that explanation; I'll test out the alternating sleep and yield() idea. I haven't tried the LockSupport.park*() methods, but they might be worth investigating.

I've got the adaptive code working really nicely now; it runs smoothly and accurately on all the machines I've tested (including XP) while using minimal yielding, by continually adapting to the system it's running on.

/** The time of the last call to sync(), in nanoseconds. */
private static long lastTime = getTime();

/** Nanoseconds that sync() currently spends yielding; auto-tuned at runtime. */
private static long variableYieldTime;

/**
 * An accurate sync method that adapts automatically
 * to the system it runs on to provide reliable results.
 *
 * @param fps The desired frame rate, in frames per second
 */
public static void sync(int fps) {
	if (fps <= 0) return;

	long sleepTime = 1000000000 / fps; // nanoseconds to sleep this frame
	// yieldTime + remainder micro & nano seconds, if smaller than sleepTime
	long yieldTime = Math.min(sleepTime, variableYieldTime + sleepTime % (1000 * 1000));
	long overSleep = 0; // time the sync goes over by

	try {
		while (true) {
			long t = getTime() - lastTime;

			if (t < sleepTime - yieldTime) {
				Thread.sleep(1);
			}
			else if (t < sleepTime) {
				// burn the last few CPU cycles to ensure accuracy
				Thread.yield();
			}
			else {
				overSleep = t - sleepTime;
				break; // exit while loop
			}
		}
	} catch (InterruptedException e) {}

	lastTime = getTime() - Math.min(overSleep, sleepTime);

	// auto tune the time sync should yield
	if (overSleep > variableYieldTime) {
		// increase by 200 microseconds (1/5 of a millisecond)
		variableYieldTime = Math.min(variableYieldTime + 200 * 1000, sleepTime);
	}
	else if (overSleep < variableYieldTime - 200 * 1000) {
		// decrease by 2 microseconds
		variableYieldTime = Math.max(variableYieldTime - 2 * 1000, 0);
	}
}

/**
 * Get the system time in nanoseconds.
 * @return the current time in nanoseconds
 */
private static long getTime() {
	return (Sys.getTime() * 1000000000) / Sys.getTimerResolution();
}

kappa

The above is now committed and should be part of the next nightly build.

Obsyd

This new sync with Thread.yield() is sometimes giving me 30-70% CPU usage (with a simple empty game loop and with a more "advanced" loop) on a 3GHz Core 2 Duo.
OS: Mac OS X

kappa

Thanks for reporting. I've tweaked the Display.sync() method further: it previously had a minimum Thread.yield() time of 1ms of the total sync() time, which I have now reduced to 0, so if Thread.sleep() is perfectly accurate there will be no Thread.yield()'ing at all. Do test the next nightly build of LWJGL.

Some testing here shows CPU usage to be 1% or below with a basic game loop. The sync() method sleeps as much as possible and only yields for the amount of time by which Thread.sleep() is inaccurate (0.2ms here on my Linux system), so Display.sync() should still spend most of its duration in Thread.sleep().

Just curious, how are you measuring CPU usage?

Obsyd

With atMonitor and the built in activity monitor.