I'm dynamically rendering a 2D scene in my app, using VBOs that are streamed anew each render pass. Everything is working great, except that my main bottleneck is writing the vertex data into the direct ByteBuffer that is created via LWJGL.
The app peaks at ~30000 drawn sprites. That's 30000 invocations of the following:
ByteBuffer buffer = BufferUtils.createByteBuffer(BUFFER_SIZE);
void render(TextureCoords t, TextureCoords to, int x1, int x2, int y1, int y2, COLOR color, OPACITY opacity) {
buffer.putShort((short) x1).putShort((short) y2);
buffer.putShort(t.x1()).putShort(t.y2());
buffer.putShort(to.x1()).putShort(to.y2());
buffer.put(color.red()).put(color.green()).put(color.blue()).put(opacity.get());
buffer.putShort((short) x2).putShort((short) y2);
buffer.putShort(t.x2()).putShort(t.y2());
buffer.putShort(to.x2()).putShort(to.y2());
buffer.put(color.red()).put(color.green()).put(color.blue()).put(opacity.get());
buffer.putShort((short) x1).putShort((short) y1);
buffer.putShort(t.x1()).putShort(t.y1());
buffer.putShort(to.x1()).putShort(to.y1());
buffer.put(color.red()).put(color.green()).put(color.blue()).put(opacity.get());
buffer.putShort((short) x2).putShort((short) y1);
buffer.putShort(t.x2()).putShort(t.y1());
buffer.putShort(to.x2()).putShort(to.y1());
buffer.put(color.red()).put(color.green()).put(color.blue()).put(opacity.get());
count++;
}
I'm calling this 30000 times per frame, at 60 frames per second, and it eats roughly 30% of my thread's time. This is fast, don't get me wrong, but I'm wondering if it can be made faster.
I've tried batching my vertices in an array in JVM memory and then copying it all into the buffer in one go, but that didn't help.
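For reference, a minimal sketch of what such array batching can look like, using only java.nio. The class name `StagedQuadWriter` and the reduced layout (positions only, no texture coordinates or color bytes) are assumptions made for illustration; this is not the exact code from my app.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;

// Hypothetical helper: stage vertex shorts in a heap array, then copy them
// into the direct buffer with a single bulk put per frame.
final class StagedQuadWriter {
    // Assumed reduced layout: 4 vertices * (x, y) = 8 shorts per quad.
    static final int SHORTS_PER_QUAD = 8;

    private final short[] staging;
    private final ShortBuffer shorts; // view over the direct byte buffer
    private int used;

    public StagedQuadWriter(int maxQuads) {
        staging = new short[maxQuads * SHORTS_PER_QUAD];
        ByteBuffer bytes = ByteBuffer
                .allocateDirect(staging.length * Short.BYTES)
                .order(ByteOrder.nativeOrder()); // the view inherits this order
        shorts = bytes.asShortBuffer();
    }

    // Same vertex order as render(): (x1,y2), (x2,y2), (x1,y1), (x2,y1).
    public void quad(int x1, int x2, int y1, int y2) {
        final short[] s = staging;
        final int i = used;
        s[i]     = (short) x1; s[i + 1] = (short) y2;
        s[i + 2] = (short) x2; s[i + 3] = (short) y2;
        s[i + 4] = (short) x1; s[i + 5] = (short) y1;
        s[i + 6] = (short) x2; s[i + 7] = (short) y1;
        used = i + SHORTS_PER_QUAD;
    }

    // One bulk copy instead of many putShort calls; returns the filled view.
    public ShortBuffer flush() {
        shorts.clear();
        shorts.put(staging, 0, used);
        shorts.flip();
        used = 0;
        return shorts;
    }
}
```

With the real layout, the four color/opacity bytes per vertex would have to be packed into the staging array as well (for example as two shorts), and that packing depends on the platform's byte order.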
I'm curious about bounds checks and endian conversions, as I believe the endian conversion in particular can be quite expensive.
Any tips on how to speed things up?
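On the endian question, one thing that can be checked directly is whether the buffer already uses the platform's native byte order, in which case `putShort` should not need a byte swap. A plain `ByteBuffer` starts out big-endian; as far as I know LWJGL's `BufferUtils` already returns native-order buffers, but that is an assumption worth verifying. The sketch below uses only java.nio, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for inspecting and fixing a buffer's byte order.
final class ByteOrderCheck {
    // True if multi-byte puts on this buffer use the platform's own order.
    public static boolean isNative(ByteBuffer b) {
        return b.order() == ByteOrder.nativeOrder();
    }

    // Direct buffer explicitly switched to native order.
    public static ByteBuffer newNativeDirect(int capacity) {
        return ByteBuffer.allocateDirect(capacity).order(ByteOrder.nativeOrder());
    }

    public static void main(String[] args) {
        ByteBuffer fresh = ByteBuffer.allocateDirect(16); // big-endian by default
        System.out.println("platform order: " + ByteOrder.nativeOrder());
        System.out.println("fresh buffer:   " + fresh.order());
        System.out.println("native buffer:  " + newNativeDirect(16).order());
    }
}
```

Calling `ByteOrderCheck.isNative(buffer)` on the buffer returned by `BufferUtils.createByteBuffer` would show whether any conversion is happening at all.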