Display.update slow when disk I/O usage high

Started by mbaumbach, November 15, 2010, 02:09:16

Previous topic - Next topic

mbaumbach

This may be hard to explain, but essentially I have a game using LWJGL and it runs on average about 100 FPS at about 50% CPU on a typical machine. The frames stay consistent even as various logic pieces are hit (Collision detection, ray tracing, etc.).

But occasionally and rather randomly the CPU spikes to nearly 100% and then my frames drop to 10-12. It can stay this way for quite some time before it recovers. I tried timing various parts of my code and disabling various parts, but the culprit seems to be Display.update in my game loop, taking anywhere from 50-80ms. Typically it's 0-10.

I've tried all sorts of remedies, including disabling all of my logic code, enabling Vsync, and calling Display.sync before the Display.update. I tend to have less problems when Vsync or sync is used to limit the framerates. Is this normal? Would hitting too high of framerates cause Display.update to fall behind? My friend running a much better computer than I gets about 400-500 FPS, but also will hit these spikes from time to time where it drops to about 20 FPS, however his generally recovers very quickly, or it just seems like a short blip. But it is noticeable and annoying.

I'm just trying to nip this in the bud before it becomes a bigger problem if it's something in my code. Any ideas, thoughts, or things I could try?

EDIT: Updated subject to better reflect the issue.

jediTofu

To me, this sounds like a garbage collection problem rather than a Display.update problem.  Usually in Java games, the gc is to blame for random spikes like this (especially later in the game when you have used a lot of resources)...

Every interval (maybe 5 seconds? experiment), call System.gc(), and see if this fixes it.  Also, make sure to set variables to null when you no longer need them.
cool story, bro

mbaumbach

Thanks for the tip. I'll certainly play around with that. I have had times where it starts up and immediately is at 90% CPU and running low FPS. But the more common case tends to be every couple of minutes.

EDIT: Just a basic test, I had System.gc() get called about every 5 seconds (both on the current thread running the game loop and on it's own thread, but I didn't expect that to make a difference) and I saw an even worse, more noticeable blip every time it was called.

BatKid

If it is indeed a gc problem, then I think calling gc() all the time is not the answer (it cost something everytime gc is called).  

This is a shot in the dark, but in your code, are you doing a lot of "new ..."?  If you are, then you may want to consider re-writing your code to eliminate these new calls.  For example, if you have a method that allocates a new array and return that array, you can eliminate the allocation by making the return array a field.

Projects: 
   Env3D (http://env3d.org): Learn Java in 3D
   WhaleChat (http://whalechat.com): A 3D social programming experiment

jediTofu

Quote from: mbaumbach on November 15, 2010, 02:40:01
EDIT: Just a basic test, I had System.gc() get called about every 5 seconds (both on the current thread running the game loop and on it's own thread, but I didn't expect that to make a difference) and I saw an even worse, more noticeable blip every time it was called.

The question is how often does it blip with the original code.  I was just guessing 5 seconds.  If it blips every 30 seconds, then maybe 15 seconds would call some of it earlier smoothing it out.  I'm not sure what would be the best interval in order to make it smooth.  I think that java.util.Timer would be best for this...but also try in the main loop using System.currentTimeMillis or equivalent, like you said you had.  Of course, the optimum time to call the gc is when very little is happening on screen.

If you search google with "java game garbage collection," you'll get some more detailed hits on smoothing out the gc in your game.

Also, there are a number of VM args (which are easily specified in Web Start Apps and Applets) you could pass for adjusting the GC for your needs:
http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
http://download.oracle.com/javase/6/docs/technotes/tools/windows/java.html


Of course, the gc could not be the problem.  It could also be a number of problems (e.g., need to use both the gc and sync).
cool story, bro

mbaumbach

I'll certainly look into reducing the creation of some of my objects, I guess my question is more general though. Is there any other ideas or does the Display.sync make sense for why it might remove some of these problems? Obviously I would prefer to pull as many frames per second out as I can, but if that causes problems, I'm more than fine telling the Display to sync in the 60-80 FPS range.

I've done a bit of YourKit profiling with the game and the object allocation isn't anything that seems outrageous to me. Forcing a garbage collection when I start to see the problems clears up all the memory with no hints to a memory leak being present and the FPS still doesn't improve until it seemingly feels like it. ;D

I appreciate all the ideas/help guys!

EDIT: Perhaps a better question would be, what's the benefit to using sync vs. just letting the game go full throttle? Obviously you'll use less CPU/GPU resources if the computer can handle more than 60, but is that the only benefit?

princec

I'm not using Display.sync() any more because it uses Thread.sleep(1) which I'm afraid suffers erratic timing (at least on Windows). Instead I use a tight loop on Thread.yield() instead and get rock solid framerates. When I get a pause, it doesn't slow the frame rate down; it literally stops for a fraction of a second. This is what I'm finding odd about your setup - GC normally stops everything altogether rather than just slows it down a bit. Are you sure it's not some other process using the CPU? (Windows Messenger for example is turning out to do VERY EVIL THINGS to my desktop to the point where I'm actually considering uninstalling it)

Cas :)

broumbroum

Quote from: mbaumbach on November 15, 2010, 02:09:16
...
...My friend running a much better computer than I gets about 400-500 FPS, but also will hit these spikes from time to time where it drops to about 20 FPS, however his generally recovers very quickly, or it just seems like a short blip. But it is noticeable and annoying.
...
There's the matter that your friend also experiences the same "slowerness", from 400 to 20 that's about 100% to 5%, where you talk about 100% to 10% while your computer is slowing down. That leads to a hypothetical unlinearity of the stats, which typically means part of your code uses the same hardware on both machines. Can you find a common device in both machines ? I'd bet it is the harddisk speed. Thus a piece of the code would make some r/w to the disk. Either way, System garbage collection may significantly reduce the speed.
-Are you accessing the Java2D graphics in your game loop ? (this is lot of heap memory and therefore may implies file swapping)
-What's the harddisk cache consumption before you run the program and after you've terminated it ?
-Do you load any .jar resource from within the loop ? (your rendering thread will not have the time to free up those resources before transfering them to the front rendering)

mbaumbach

Thanks for all the replies. So to answer some of the questions:


  • The GC stopping things altogether is what I experience when I force a garbage collection while using YourKit. That isn't the symptom I was seeing that made me open this thread.
  • I actually think it has something more to do with a runaway process causing Display.update to get backed up and thus try to play catch up, but I don't know enough about the internals of Display.update and why Display.sync might help keep that from happening.
  • I'm not doing any Java2D graphics at all, everything is being done in OpenGL. I'm actually currently using Display to create my window, so it's not even a child of a parent Canvas yet.
  • Watching the disk activity, it does seem to be happening when disk activity spikes. Unfortunately, my process isn't even showing up as doing any disk activity, which makes sense since I'm not doing any specific reads or writes. But perhaps I'm just shoving too much into memory that it's making paging an issue for the whole OS. I'm using about 500mb of 4GB, so I could try cutting down on that, but my game is dealing with an absolutely massive map, so I'd prefer to keep it as massive as possible. The graphics data is stored in VBOs on the video card and some of the necessary model code is obviously stored in memory (that's the 500mb).
  • No to the .jar resource loading. There's a fair amount of loading that occurs up front to store the map in VBOs and model into memory, but from then on it's mostly rendering. I delete and regenerate a VBO when the map portion that is stored in that VBO is modified in game (Only happens as an action of the user in the game, not every frame or something like that, that would be an easy culprit if that was the case :) ). This could be improved by making the VBOs a bit bigger, subbing the data into and out of it and only recreating it if I reach some size limit. I can't imagine that being the cause as the slow downs aren't directly correlated to those being modified. I've seen the slow down happen without any VBO destruction/creation.

I actually tried reducing my map to a pathetically small amount and still saw the problem. To me this sounds like other processes just getting in the way. It's certainly a lot easier to reproduce with other memory hogging apps running.

jediTofu

Quote from: mbaumbach on November 15, 2010, 15:22:06

  • Watching the disk activity, it does seem to be happening when disk activity spikes. Unfortunately, my process isn't even showing up as doing any disk activity, which makes sense since I'm not doing any specific reads or writes. But perhaps I'm just shoving too much into memory that it's making paging an issue for the whole OS. I'm using about 500mb of 4GB, so I could try cutting down on that, but my game is dealing with an absolutely massive map, so I'd prefer to keep it as massive as possible. The graphics data is stored in VBOs on the video card and some of the necessary model code is obviously stored in memory (that's the 500mb).

I actually tried reducing my map to a pathetically small amount and still saw the problem. To me this sounds like other processes just getting in the way. It's certainly a lot easier to reproduce with other memory hogging apps running.

From reading the first paragraph, it sounds like you're exceeding the max heap size, which is then causing the reading/writing from disk rather than RAM, but then in your 2nd paragraph, it seems to even happen with small map sizes.  But, you could still try setting these to be higher using -Xms (Initial) and -Xmx (max) VM arguments, just to give it a shot.  Once you have loaded everything right before the game (making sure to null variables you don't need), call System.gc() just once there.  Then try Thread.yield() in the loop as princec suggested.

EDIT:  All Display.update is doing is changing boolean values, swapping the buffers, and polling input devices.  This could be the problem, but I think that there would be more complaints, and the code is pretty straightforward.  During these spikes, are there tons of stuff being drawn to the screen?  Perhaps you're drawing too many things that can't be seen anyway?
cool story, bro

mbaumbach

Quote from: jediTofu on November 15, 2010, 19:01:29
Quote from: mbaumbach on November 15, 2010, 15:22:06

  • Watching the disk activity, it does seem to be happening when disk activity spikes. Unfortunately, my process isn't even showing up as doing any disk activity, which makes sense since I'm not doing any specific reads or writes. But perhaps I'm just shoving too much into memory that it's making paging an issue for the whole OS. I'm using about 500mb of 4GB, so I could try cutting down on that, but my game is dealing with an absolutely massive map, so I'd prefer to keep it as massive as possible. The graphics data is stored in VBOs on the video card and some of the necessary model code is obviously stored in memory (that's the 500mb).

I actually tried reducing my map to a pathetically small amount and still saw the problem. To me this sounds like other processes just getting in the way. It's certainly a lot easier to reproduce with other memory hogging apps running.

From reading the first paragraph, it sounds like you're exceeding the max heap size, which is then causing the reading/writing from disk rather than RAM, but then in your 2nd paragraph, it seems to even happen with small map sizes.  But, you could still try setting these to be higher using -Xms (Initial) and -Xmx (max) VM arguments, just to give it a shot.  Once you have loaded everything right before the game (making sure to null variables you don't need), call System.gc() just once there.  Then try Thread.yield() in the loop as princec suggested.

EDIT:  All Display.update is doing is changing boolean values, swapping the buffers, and polling input devices.  This could be the problem, but I think that there would be more complaints, and the code is pretty straightforward.  During these spikes, are there tons of stuff being drawn to the screen?  Perhaps you're drawing too many things that can't be seen anyway?

I actually am using the -Xmx1024m and -Xms512m and I hover around 580MB pretty stable (even when the slow down occurs). I am null-ing my variables, clearing lists, and called System.gc() after all of my initial load takes place, but no luck. :-\

In terms of drawing a lot on the screen, no, nothing outrageous. I can have the problem occur even without changing my frustum/viewpoint at all. I'm doing typical frustum culling as well as some occlusion culling (not realtime occlusion... yet).

It is directly correlated to my disk I/O speed that the OS is telling me. When the disk I/O speed is around 10MB/s, things are just dandy, but it sometimes spikes in I/O operations and that transfer speed will crawl to 100Kb/s. That's when I see it. So I guess I'm just going to look into what could be causing that, as my game isn't even reporting read/write operations (nor is Java) but it's hard to tell how much Windows is really letting me know. :)

I'll look into the Thread.yield too. Thanks guys, let me know if any other ideas come to mind!

AndyGFX

I had same problem. Check your code, how many time you have used local buffer variables which are used as argument to any your or GL function in main loop. This buffer isnt cleared when is used as argument to function. Every time is created new pointer and when is out of limit, then GC clear it, and CPU is on 100%.

mbaumbach

Quote from: AndyGFX on November 16, 2010, 00:25:33
I had same problem. Check your code, how many time you have used local buffer variables which are used as argument to any your or GL function in main loop. This buffer isnt cleared when is used as argument to function. Every time is created new pointer and when is out of limit, then GC clear it, and CPU is on 100%.

I have one FloatBuffer which I use for transferring the model data to the VBOs, but I re-use it for all of them and clear it when I'm done with it. I don't think clearing it makes a difference though as it takes up memory regardless, just moves the "mark" pointer to the beginning so that data can be overridden. In any case, all of my float buffers are re-used/shared where necessary.

The biggest reason I target this to external sources is that sometimes the game starts up with the slowness, so it doesn't seem to relate to any kind of leaking resources.