static vector operations and ReadableVector3f

Started by Jens v.P., October 24, 2007, 16:25:57


elias

Making the fields protected breaks existing users of the vector utils, so let's try the abstract getters in ReadableVector first and see whether we can get the required performance from them.

- elias

Jens v.P.

IMHO it should make no difference whether an abstract method is defined in an abstract class or in an interface. Thus, my last patch already implemented these abstract getters -- with less performance. The only way to improve the performance is to move the fields (x, y, z) to the readable (and then abstract) classes and allow direct access within the hierarchy by declaring them protected.

But you're right: direct field access from outside the hierarchy would no longer be possible... and that is indeed a problem.

elias

It does make a difference: a class can inherit from only one class but can "inherit" multiple interfaces, and this can pose problems for a JVM. A simple interface -> abstract class conversion did make a difference in our own vector stuff (which happens to be mostly stolen from lwjgl).

- elias

Matthias

simple get methods like this:
public float getX() { return x; }

are inlined by the VM if they are not overridden in a subclass.

So making a ReadableVectorXf class that has fields and getters, plus a writable VectorXf class that extends the ReadableVectorXf class, should get full performance.

The only question is the desired inheritance model - e.g. should Vector3f inherit Vector2f or not - the above model would prevent this.

But having Vector4f inherit from Vector3f can cause subtle bugs (e.g. Matrix.transform() for vectors and points). Also some methods are defined only for 3 components and not for 4 - like normalize() when w=1.0f is assumed.
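
To make the pitfall concrete, here is a minimal sketch. Vec3, Vec4 and lengthOf are made-up names for illustration, not the LWJGL classes, and the 3-component normalise is only an assumption about how such a method might look:

```java
// Hypothetical classes for illustration only; not the LWJGL vector API.
class Vec3 {
    float x, y, z;

    Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }

    // Length computed from the three components only.
    float length() { return (float) Math.sqrt(x * x + y * y + z * z); }

    // Scales x, y, z to unit length; knows nothing about a fourth component.
    Vec3 normalise() {
        float len = length();
        x /= len; y /= len; z /= len;
        return this;
    }
}

// A 4-component vector that "is a" Vec3 through inheritance.
class Vec4 extends Vec3 {
    float w;

    Vec4(float x, float y, float z, float w) { super(x, y, z); this.w = w; }
}

class InheritancePitfallDemo {
    // Written with 3-component vectors in mind.
    static float lengthOf(Vec3 v) { return v.length(); }

    public static void main(String[] args) {
        Vec4 p = new Vec4(3f, 4f, 0f, 2f);
        // Compiles because Vec4 is a Vec3, but w is silently ignored.
        System.out.println(lengthOf(p));
        // The inherited normalise() rescales x, y, z and leaves w untouched.
        p.normalise();
        System.out.println(p.w);
    }
}
```

Nothing here fails to compile, which is exactly why this kind of bug is easy to miss.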

My proposal:
public class ReadableVector3f {
  float x,y,z;
  public float getX() { return x; }
  ...
}
public class Vector3f extends ReadableVector3f {
  public Vector3f setX(float x) { this.x=x; return this; }
}


Ciao Matthias

princec

Maths doesn't fit very well with the "hierarchy" of vectors. A Vector3f is not actually a Vector2f in reality. In fact you could reasonably argue a Vector2f is actually derived from Vector3f, it's just that z == 0.

Cas :)

Jens v.P.

I've done some benchmarks. The program with source code is attached to this posting. The goal of this benchmark is to measure the performance of different design alternatives for implementing readable and writable vector classes.

The Test

Two classes (or interfaces) implementing a version of Vector3f were tested. The first class (or interface) provides read-only access, while the second provides read-write access. Additionally, a static method "add" is introduced, implemented as in the current Vector3f class.
Three different design alternatives were implemented:

Concrete: The first one (called "Concrete") implements a concrete readable class with final getters and protected fields. The writable class extends this class providing setters.

Abstract: The second version (called "Abstract") implements an abstract readable class with abstract getters and no fields. The writable class extends this class, defining the fields (public) and implementing final getters and setters.

Interface: The third version (called "Interface") defines a read-only interface with getters; the writable class implements this interface and defines all fields (public) and final getters and setters.
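
For readers without the attachment, the "Abstract" and "Interface" alternatives could look roughly like this. All class names are hypothetical and the void setters are an assumption; the classes in the attached benchmark may be named and shaped differently:

```java
// Hypothetical names for illustration; not the attached benchmark source.

// "Abstract": the readable base class has abstract getters and no fields.
abstract class AbstractReadableVec3 {
    public abstract float getX();
    public abstract float getY();
    public abstract float getZ();
}

// The writable class defines the public fields plus final getters and setters.
class AbstractVec3 extends AbstractReadableVec3 {
    public float x, y, z;
    public final float getX() { return x; }
    public final float getY() { return y; }
    public final float getZ() { return z; }
    public final void setX(float x) { this.x = x; }
    public final void setY(float y) { this.y = y; }
    public final void setZ(float z) { this.z = z; }
}

// "Interface": same shape, but the read-only view is an interface.
interface ReadableVec3 {
    float getX();
    float getY();
    float getZ();
}

class InterfaceVec3 implements ReadableVec3 {
    public float x, y, z;
    public final float getX() { return x; }
    public final float getY() { return y; }
    public final float getZ() { return z; }
    public final void setX(float x) { this.x = x; }
    public final void setY(float y) { this.y = y; }
    public final void setZ(float z) { this.z = z; }
}

class DesignSketchDemo {
    // Code that only reads can accept the read-only type.
    static float sum(AbstractReadableVec3 v) { return v.getX() + v.getY() + v.getZ(); }
    static float sum(ReadableVec3 v) { return v.getX() + v.getY() + v.getZ(); }

    public static void main(String[] args) {
        AbstractVec3 a = new AbstractVec3();
        a.setX(1f); a.y = 2f; a.setZ(3f); // setter or direct field access both work
        InterfaceVec3 b = new InterfaceVec3();
        b.x = 4f; b.setY(5f); b.z = 6f;
        System.out.println(sum(a));
        System.out.println(sum(b));
    }
}
```

In both variants the fields stay public on the writable class, so existing code that touches x, y and z directly keeps working.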

The static add method has a similar signature in all three versions: left and right vector are passed as read-only instances, the result is a writable instance. If the passed result is null, a temporary variable is constructed and returned, otherwise the result is filled.
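
As a rough sketch of that add contract on the "Concrete" design (hypothetical class names, and the set(x, y, z) helper is my own assumption, not necessarily what the benchmark uses):

```java
// Hypothetical names for illustration; not the attached benchmark source.

// "Concrete": readable class with protected fields and final getters.
class ConcreteReadableVec3 {
    protected float x, y, z;
    public final float getX() { return x; }
    public final float getY() { return y; }
    public final float getZ() { return z; }
}

// Writable class adds setters and the static add method.
class ConcreteVec3 extends ConcreteReadableVec3 {
    ConcreteVec3() { }

    ConcreteVec3(float x, float y, float z) { set(x, y, z); }

    public final void set(float x, float y, float z) {
        this.x = x; this.y = y; this.z = z;
    }

    // left and right are read-only views; dest may be null.
    static ConcreteVec3 add(ConcreteReadableVec3 left,
                            ConcreteReadableVec3 right,
                            ConcreteVec3 dest) {
        if (dest == null) {
            dest = new ConcreteVec3(); // "create temp" case
        }
        // All sums are evaluated before set() runs, so dest may alias left or right.
        dest.set(left.getX() + right.getX(),
                 left.getY() + right.getY(),
                 left.getZ() + right.getZ());
        return dest; // "pass result" case returns the caller's instance
    }
}

class AddDemo {
    public static void main(String[] args) {
        ConcreteVec3 a = new ConcreteVec3(1f, 2f, 3f);
        ConcreteVec3 b = new ConcreteVec3(10f, 20f, 30f);
        ConcreteVec3 fresh = ConcreteVec3.add(a, b, null);
        ConcreteVec3 reused = ConcreteVec3.add(a, b, a);
        System.out.println(fresh.getX() + " " + (reused == a));
    }
}
```

The two benchmark cases "create temp" and "pass result" correspond to calling add with null and with an existing instance, respectively.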

The benchmark was executed four times: twice with a server VM and twice with a client VM. For each VM, 1000000 calls of the add method and a setter were repeated 100 and 1000 times. The results are shown below and illustrated in the image.
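
The measuring loop is roughly of the following shape. This is a simplified sketch and not the attached source: TimingSketch, Vec, run and sink are hypothetical names, and I am assuming System.nanoTime() as the clock. It also does no warm-up, so treat any numbers from it with care.

```java
// Hypothetical harness for illustration; not the attached benchmark source.
class TimingSketch {
    // Tiny stand-in vector so the sketch is self-contained.
    static final class Vec {
        float x, y, z;
        void setX(float x) { this.x = x; }
    }

    // Fills dest, or allocates a new instance when dest is null.
    static Vec add(Vec left, Vec right, Vec dest) {
        if (dest == null) dest = new Vec();
        dest.x = left.x + right.x;
        dest.y = left.y + right.y;
        dest.z = left.z + right.z;
        return dest;
    }

    // Written after each run to make it harder for the JIT to drop the loop as dead code.
    static float sink;

    // Returns elapsed nanoseconds for times * count calls of add plus a setter.
    static long run(int times, int count, boolean passResult) {
        Vec a = new Vec(); a.x = 1f; a.y = 2f; a.z = 3f;
        Vec b = new Vec(); b.x = 4f; b.y = 5f; b.z = 6f;
        Vec result = new Vec();
        float acc = 0f;
        long start = System.nanoTime();
        for (int t = 0; t < times; t++) {
            for (int i = 0; i < count; i++) {
                // null means "create temp", a reused instance means "pass result".
                Vec r = add(a, b, passResult ? result : null);
                r.setX(i);
                acc += r.y;
            }
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed;
    }

    public static void main(String[] args) {
        int times = args.length > 0 ? Integer.parseInt(args[0]) : 100;
        int count = 1000000;
        System.out.println("create temp : " + run(times, count, false) + " ns");
        System.out.println("pass result : " + run(times, count, true) + " ns");
    }
}
```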

The Result

The last version using interfaces is the slowest one in all tests. Using an abstract class is more or less as fast as using a concrete readable class. Interestingly, it is even sometimes a little bit slower (which I didn't expect)!
Also surprising (at least for me) is the fact that in some tests (with the server VM) creating new temporary instances (i.e. passing null as result parameter) is faster than using an in-out parameter.
The server VM is much faster than the client VM in all tests, in some cases about 5 times. Even the slowest server test is nearly as fast as the fastest client test!

Conclusion

Using abstract readable base classes is the winner. It doesn't change the current access possibilities, i.e. accessing the fields of a vector directly or by using the getter. It is just as fast as using concrete classes. So, I think the optimal design would look like the one I submitted above in this thread, except that the readable classes become abstract and that the fields are defined as public attributes in the vector classes. On the other hand, using interfaces (which is the clear loser of this benchmark) may be the most flexible design. IMHO vector classes are like "native" types for 3D programming and an interface is seldom needed, so I personally prefer using abstract classes.

Remarks

I was really surprised by the results. I didn't expect the server VM to perform so much faster. What surprised me most is that in some cases using abstract classes (and getters) is even faster than using a concrete class.
elias had already suggested using abstract classes, and he is right! Maybe I should have trusted him in the first place, but it was still interesting to do this benchmark. princec pointed out that we should worry about design more than performance, since new (and better) VM versions will be available in the near future. I think he is right, too. And even though I declared "abstract" classes the winner, using interfaces may be a better design decision because it is more flexible. On the other hand, using abstract classes was 1.3 to 2 times faster than interfaces in nearly all cases (except one weird exception), so I think we should use that design.
Since I added the classes (and the sources) you may try the benchmark on your machine. Since the performance of the VM is so important, I'm curious about your result.

Since I'm not an expert on benchmarks, nor on VMs, nor on 3D, I wouldn't be surprised if you find some design errors in the benchmark or in my argumentation  :-)


The Numbers

$ java -server -jar benchmark.jar
Run benchmark, times: 100, count: 1000000
JRE 1.5.0_07 (Apple Computer, Inc.), Java HotSpot(TM) Server VM 1.5.0_07-87 ("Apple Computer, Inc.") on Mac OS X 10.4.10 (i386)
Concrete, create temp : 1,04 sec (1.040.273.000)         
Concrete, pass result : 0,51 sec (511.121.000) 
Abstract, create temp : 1,05 sec (1.052.720.000)
Abstract, pass result : 1,13 sec (1.126.970.000)
Interface, create temp: 1,04 sec (1.039.608.000)
Interface, pass result: 2,25 sec (2.248.686.000)

$ java -server -jar benchmark.jar 1000
Run benchmark, times: 1000, count: 1000000
JRE 1.5.0_07 (Apple Computer, Inc.), Java HotSpot(TM) Server VM 1.5.0_07-87 ("Apple Computer, Inc.") on Mac OS X 10.4.10 (i386)
Concrete, create temp : 10,84 sec (10.835.785.000)
Concrete, pass result : 5,04 sec (5.038.323.000)
Abstract, create temp : 10,51 sec (10.511.653.000)
Abstract, pass result : 9,64 sec (9.636.774.000)
Interface, create temp: 13,67 sec (13.669.681.000)
Interface, pass result: 22,45 sec (22.450.686.000)

$ java -client -jar benchmark.jar
Run benchmark, times: 100, count: 1000000
JRE 1.5.0_07 (Apple Computer, Inc.), Java HotSpot(TM) Client VM 1.5.0_07-87 ("Apple Computer, Inc.") on Mac OS X 10.4.10 (i386)
Concrete, create temp : 5,02 sec (5.024.559.000)
Concrete, pass result : 1,80 sec (1.797.204.000)
Abstract, create temp : 4,44 sec (4.444.387.000)
Abstract, pass result : 1,86 sec (1.864.297.000)
Interface, create temp: 6,87 sec (6.873.674.000)
Interface, pass result: 4,33 sec (4.327.192.000)

$ java -client -jar benchmark.jar 1000
Run benchmark, times: 1000, count: 1000000
JRE 1.5.0_07 (Apple Computer, Inc.), Java HotSpot(TM) Client VM 1.5.0_07-87 ("Apple Computer, Inc.") on Mac OS X 10.4.10 (i386)
Concrete, create temp : 51,42 sec (51.422.792.000)
Concrete, pass result : 18,05 sec (18.049.931.000)
Abstract, create temp : 50,21 sec (50.210.897.000)
Abstract, pass result : 18,46 sec (18.464.446.000)
Interface, create temp: 67,97 sec (67.965.217.000)
Interface, pass result: 43,23 sec (43.227.655.000)

elias

I'm not sure the abstract class implementation is the winner:

$ java -server -jar benchmark.jar 1000
Run benchmark, times: 1000, count: 1000000
JRE 1.5.0_07 (Apple Computer, Inc.), Java HotSpot(TM) Server VM 1.5.0_07-87 ("Apple Computer, Inc.") on Mac OS X 10.4.10 (i386)
Concrete, pass result : 5,04 sec (5.038.323.000)
Abstract, pass result : 9,64 sec (9.636.774.000)


In this case, concrete seems quite a bit faster than abstract.

- elias

princec

As I understand it the Apple VM has no "server" VM - when you pass -server on the command line it merely alters some default memory and garbage collection parameters. There is no optimising compiler.

Can anyone clarify this?

Cas :)

ndhb

Quote from: jpilgrim on October 29, 2007, 13:13:44
Since I added the classes (and the sources) you may try the benchmark on your machine. Since the performance of the VM is so important, I'm curious about your result.

The results I am getting support another conclusion. Using the client VM, abstract seems to be the slowest option.

Run benchmark, times: 100, count: 1000000
JRE 1.6.0_02 (Sun Microsystems Inc.), Java HotSpot(TM) Server VM 1.6.0_02-b06 (Sun Microsystems Inc.) on Windows Vista 6.0 (x86)
Concrete, create temp : 1,12 sec (1.121.485.831)
Concrete, pass result : 0,94 sec (935.372.792)
Abstract, create temp : 1,10 sec (1.103.308.661)
Abstract, pass result : 0,93 sec (933.168.042)
Interface, create temp: 1,06 sec (1.061.411.157)
Interface, pass result: 3,76 sec (3.758.560.097)

Run benchmark, times: 1000, count: 1000000
JRE 1.6.0_02 (Sun Microsystems Inc.), Java HotSpot(TM) Server VM 1.6.0_02-b06 (Sun Microsystems Inc.) on Windows Vista 6.0 (x86)
Concrete, create temp : 10,78 sec (10.775.736.861)
Concrete, pass result : 9,29 sec (9.290.345.002)
Abstract, create temp : 10,66 sec (10.656.240.947)
Abstract, pass result : 9,36 sec (9.359.918.090)
Interface, create temp: 10,50 sec (10.504.635.086)
Interface, pass result: 34,40 sec (34.400.562.362)

Run benchmark, times: 100, count: 1000000
JRE 1.6.0_02 (Sun Microsystems Inc.), Java HotSpot(TM) Client VM 1.6.0_02-b06 (Sun Microsystems Inc.) on Windows Vista 6.0 (x86)
Concrete, create temp : 2,03 sec (2.030.250.493)
Concrete, pass result : 0,77 sec (767.155.044)
Abstract, create temp : 6,72 sec (6.717.633.386)
Abstract, pass result : 4,67 sec (4.666.612.656)
Interface, create temp: 2,71 sec (2.710.840.903)
Interface, pass result: 1,49 sec (1.491.753.003)

Run benchmark, times: 1000, count: 1000000
JRE 1.6.0_02 (Sun Microsystems Inc.), Java HotSpot(TM) Client VM 1.6.0_02-b06 (Sun Microsystems Inc.) on Windows Vista 6.0 (x86)
Concrete, create temp : 20,40 sec (20.400.974.984)
Concrete, pass result : 7,69 sec (7.692.924.101)
Abstract, create temp : 64,07 sec (64.073.280.822)
Abstract, pass result : 39,62 sec (39.621.325.894)
Interface, create temp: 26,89 sec (26.885.917.344)
Interface, pass result: 15,73 sec (15.729.212.791)

kind regards,
Nicolai de Haan Brøgger

princec

Conclusion: get the design correct; then send the benchmarks off to Sun and ask them to make the best design the fastest too.

Cas :)

Matzon


Jens v.P.

So, since it seems that it's the VM that matters, what will you (the LWJGL masters) implement? And when, i.e. in which LWJGL version?

elias

I vote for "Concrete", since it seems faster in all cases if run for long enough.

- elias

Jens v.P.

I would vote for concrete, too. But it may break existing applications (since the fields x, y, and z are moved from Vector to Readable and must become protected).