LWJGL discussion forum officially migrates here today!

Started by princec, June 10, 2003, 15:01:25


elias

the createDirectBuffer needs to go too. It is only used by calls like wglAllocateMemoryNV which need to return a ByteBuffer directly.

- elias

William

Cas, this is such a great decision that I felt compelled to register just to compliment you on making it. I have not really used or been involved in LWJGL; even when I have coded directly to the OpenGL layer I have just used GL4Java, and the use of int pointers was one major reason for that. The API is definitely moving in the right direction.

princec

Oh goody! It seems to be a pretty popular decision.

I'm not sure where the fear of slice() comes from - slices are tiny little objects really. To create a vertex buffer you'd have, what, maybe 3 slices? Vertices, normals, texcoords? Not a big deal really. You should see all the nasty static final int offsets I use with the current system.

Remember, we're not setting up these buffers every frame or anything - probably just once at init. We've got to let go our hangups about creating objects, because they're important!
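To put it concretely, here's a minimal sketch (buffer sizes and names are made up for the example) of carving three views out of one direct buffer:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class SliceExample {
        public static void main(String[] args) {
                final int vertexCount = 1024;
                final int floatSize = 4;
                // One direct allocation: 3 floats position + 3 floats normal + 2 floats texcoord per vertex.
                ByteBuffer data = ByteBuffer.allocateDirect(vertexCount * (3 + 3 + 2) * floatSize)
                        .order(ByteOrder.nativeOrder());

                // Three slices into the same memory - the only extra objects are the views themselves.
                data.position(0).limit(vertexCount * 3 * floatSize);
                FloatBuffer vertices = data.slice().order(ByteOrder.nativeOrder()).asFloatBuffer();

                data.position(vertexCount * 3 * floatSize).limit(vertexCount * 6 * floatSize);
                FloatBuffer normals = data.slice().order(ByteOrder.nativeOrder()).asFloatBuffer();

                data.position(vertexCount * 6 * floatSize).limit(vertexCount * 8 * floatSize);
                FloatBuffer texCoords = data.slice().order(ByteOrder.nativeOrder()).asFloatBuffer();

                System.out.println(vertices.capacity() + " " + normals.capacity() + " " + texCoords.capacity());
        }
}

Three small wrapper objects over a single allocation, created once at init - that's the whole cost.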

Cas :)

jbanes

Actually, creating objects isn't so bad. It's the deleting part that'll get you. :)

spasi

Quote
the createDirectBuffer needs to go too. It is only used by calls like wglAllocateMemoryNV which need to return a ByteBuffer directly.

From the ARB_vertex_buffer_object extension specification:


The entire data store of a buffer object can be mapped into the client's address space by calling

void *MapBufferARB(enum target, enum access);

with <target> set to ARRAY_BUFFER_ARB.  If the GL is able to map the buffer object's data store into the client's address space, MapBufferARB returns the pointer value to the data store.


And more functions like this one are coming soon (see pixel_buffer_object extension, GL2 will have more...)

We HAVE to be able to create a ByteBuffer from a pointer address.
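For example, the binding might look something like this (hypothetical class and method names, just to show the shape of it; presumably the native side would wrap the pointer returned by MapBufferARB using JNI's NewDirectByteBuffer):

import java.nio.ByteBuffer;

public class MapBufferSketch {
        /**
         * Hypothetical binding: maps the buffer object's data store and returns it
         * as a direct ByteBuffer instead of a raw pointer. The native implementation
         * would call glMapBufferARB(target, access) and wrap the result with
         * env->NewDirectByteBuffer(ptr, size).
         */
        public static native ByteBuffer glMapBufferARB(int target, int access, int size);
}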

Cas, you're right that a few slices won't be a problem (for performance/memory), but having to keep 3 ByteBuffers around instead of a single one... I think it's probably gonna get ugly. We'll support any decision you guys make, but please consider all the possible solutions first.

princec

Worry not - instead of returning pointers we will be returning DirectByteBuffers, set with appropriate limits, too :)

And sure, it'll be a bit ugly, but that's Java for you. Until we get Doug Twilleager to do structs for us.

Cas :)

elias

I think what worries spasi is the VBO case, which could potentially create a new buffer at every mapping, that is, every frame. But that kind of problem could probably be solved by only returning a new buffer if the address changes (which it probably won't in the general case).

- elias

princec

Hm, you've got a point there. And you won't be able to do that clever trick, because it means keeping a hash map of buffer configurations to addresses lying around, doesn't it? And that's asking for trouble.

Having said that - creating a buffer every frame won't tax the GC at all. Even creating 100 buffers every frame won't, I don't think.

Cas :)

elias

Well, you could always pass the old buffer to the map method, and then it would only return a new one if the old is null or has a different address. A little ugly, but better than addresses I think. The check is going to be performed by the app to avoid new buffers anyway.
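Something like this, in other words (hypothetical signature, just to show the shape of it):

import java.nio.ByteBuffer;

public class MapBufferReuseSketch {
        /**
         * Hypothetical binding: if oldBuffer is non-null and still points at the
         * address the driver returns from glMapBufferARB, it is returned as-is;
         * otherwise the new address is wrapped in a fresh direct ByteBuffer.
         */
        public static native ByteBuffer glMapBufferARB(int target, int access, int size, ByteBuffer oldBuffer);
}

A typical per-frame call would then be mapped = glMapBufferARB(target, access, size, mapped); - a new buffer only appears when the mapping actually moves.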

- elias

elias

To move the discussion here where it belongs, from the "nVIDIA's Cg" thread:

We're trying to determine a good solution to the int pointers -> buffers conversion problem. My personal favorite right now is to use ByteBuffers directly and fetch the address from C native code. To get insight into just how costly this is, I created a test, using glVertexPointer as my testing point. It will not compile anywhere other than here, because I added a buffer version of glVertexPointer to my private library.

Here's the prototype for the new glVertexPointer:

public void vertexPointer2(int size, int type, int stride, ByteBuffer buffer);


Here's the code:

package org.lwjgl.test;

import org.lwjgl.*;
import org.lwjgl.opengl.GL;
import java.nio.*;

/**
 * @author Elias Naur
 */
public class NativeCallTimingTest {
        private final static int WARMUP_ITERATIONS = 5;
        private final static int ITERATIONS = 10000000;

        public static void main(String[] args) {
                GL gl = null;

                try {
                        gl = new GL("WindowCreationTest", 50, 50, 320, 240, 16, 0, 0, 0);
                        gl.create();
                } catch (Exception e) {
                        e.printStackTrace();
                }
                System.out.println("Display created");

                gl.tick();

                long time_taken_nounpack = 0;
                long time_taken_unpack = 0;
                for (int j = 0; j < WARMUP_ITERATIONS; j++) {
                        long before;
                        long after;
                        ByteBuffer buffer = ByteBuffer.allocateDirect(4096).order(ByteOrder.nativeOrder());
                        int address = Sys.getDirectBufferAddress(buffer);
                        // "No unpack": the address is fetched once on the Java side and passed as an int.
                        before = System.currentTimeMillis();
                        for (int i = 0; i < ITERATIONS; i++)
                                gl.vertexPointer(4, GL.FLOAT, 0, address);
                        after = System.currentTimeMillis();
                        time_taken_nounpack = (after - before);
                        // "With unpack": the ByteBuffer is passed and the native side fetches the address on every call.
                        before = System.currentTimeMillis();
                        for (int i = 0; i < ITERATIONS; i++)
                                gl.vertexPointer2(4, GL.FLOAT, 0, buffer);
                        after = System.currentTimeMillis();
                        time_taken_unpack = (after - before);
                        System.out.println("No unpack, time taken: " + time_taken_nounpack + " millis");
                        System.out.println("With unpack, time taken: " + time_taken_unpack + " millis");
                }
                double ratio = (double)time_taken_unpack/time_taken_nounpack;
                System.out.println("FINAL: No unpack, time taken: " + time_taken_nounpack + " millis");
                System.out.println("FINAL: With unpack, time taken: " + time_taken_unpack + " millis");
                System.out.println("FINAL: Ratio unpack/nounpack: " + ratio);

                gl.destroy();
        }
}


And here's the result:

[elias@ip172 tmp]$ java -Djava.library.path=. -cp lwjgl.jar:lwjgl_test.jar org.lwjgl.test.NativeCallTimingTest
Display created
No unpack, time taken: 2979 millis
With unpack, time taken: 7154 millis
No unpack, time taken: 2970 millis
With unpack, time taken: 7075 millis
No unpack, time taken: 2927 millis
With unpack, time taken: 7070 millis
No unpack, time taken: 2909 millis
With unpack, time taken: 7152 millis
No unpack, time taken: 2893 millis
With unpack, time taken: 7126 millis
FINAL: No unpack, time taken: 2893 millis
FINAL: With unpack, time taken: 7126 millis
FINAL: Ratio unpack/nounpack: 2.4631870031109573


So, roughly a 2.5x increase in time taken per call when passing a single buffer.

- elias

elias

The overhead is roughly 400 nanos per call if you do the math: (7126 - 2893) ms spread over 10,000,000 calls comes to about 420 ns extra per call.

- elias

princec

That correlates pretty much with what I discovered - about 300ns on my 1.2GHz machine.

Or, if you extrapolate, about 1us on a low-end 350MHz machine. So you could really get away with a thousand of them and still only use about 5% of your frame time (a thousand calls at 1us is about 1ms, roughly 5% of a 50fps frame). And if you need to do a thousand calls you're almost certainly not going to get performance on something that low-end anyway, as you'll be rendering far, far too much for its weedy graphics card.

Cas :)

elias

Yes, I forgot that; my machine is a 700 MHz Athlon.

So does this mean this is a viable solution we're all happy with? I'm okay with that, as long as we change the calls to be static at the same time.
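To be clear about what "static" means here, a minimal self-contained sketch (the static method is just a stub standing in for the real binding; names are only illustrative):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class StaticCallSketch {
        // Hypothetical static binding - same signature as the instance method, minus the GL instance.
        public static void vertexPointer(int size, int type, int stride, ByteBuffer buffer) {
                // native call would go here
        }

        public static void main(String[] args) {
                ByteBuffer buffer = ByteBuffer.allocateDirect(4096).order(ByteOrder.nativeOrder());
                // gl.vertexPointer2(4, GL.FLOAT, 0, buffer);       // current: through a GL instance
                vertexPointer(4, 0x1406 /* GL_FLOAT */, 0, buffer); // proposed: plain static call
        }
}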

Should we poll it, or are there not enough good alternatives to warrant one?

- elias

princec

There are no real alternatives to it that will give us security and credibility.
The next question is: do we implement it natively as a macro or as a function call? A function call will result in a much smaller library, but with another 100ns of overhead... unless it gets inlined, of course...

Cas :)

elias

I'm not sure what you mean by macro or function - do you mean the env->GetDirectBufferAddress call?

- elias