Do you know why this is? I would have thought the driver would be good at matrix manipulations seeing as it does so many of them!

The biggest reason is the overhead of JNI, which has to translate Java method calls into native ones, plus any argument validation that occurs in the LWJGL classes.

The driver is not that fast at matrix manipulation; the GPU is. But glTranslate, glRotate, etc. never make it as far as the GPU!

I don't understand why this is necessary. I am reading the current state of the matrix, so there shouldn't be a need for push/pop. I assumed the performance dip would be due to the glGetFloat method.

Yes, but you might find yourself building the matrix with the convenient glTranslate, glRotate, etc. methods on the OpenGL matrix stack, and afterwards reading back the result with glGetFloat just to transform some vertices with it. Then, to prevent any pre-existing matrix setting from being destroyed, you would have to push/pop.
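To make the push/pop reasoning concrete, here is a minimal pure-Java sketch of what the fixed-function stack does. `MatrixStack` and its methods are hypothetical names for illustration, not LWJGL classes; they only mirror the semantics of glPushMatrix, glTranslatef, glGetFloat and glPopMatrix, using the same column-major float[16] layout OpenGL uses.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical pure-Java stand-in for the fixed-function matrix stack.
// Matrices are stored column-major in a float[16]: element (row, col)
// lives at index col * 4 + row.
public class MatrixStack {
    private final Deque<float[]> saved = new ArrayDeque<>();
    private float[] current = identity();

    static float[] identity() {
        float[] m = new float[16];
        m[0] = m[5] = m[10] = m[15] = 1f;
        return m;
    }

    // Mirrors glPushMatrix: remember a copy of the current matrix.
    public void push() {
        saved.push(current.clone());
    }

    // Mirrors glPopMatrix: restore the most recently pushed matrix.
    public void pop() {
        current = saved.pop();
    }

    // Mirrors glTranslatef: current = current * T(x, y, z).
    // Only the fourth column changes.
    public void translate(float x, float y, float z) {
        for (int row = 0; row < 4; row++) {
            current[12 + row] += current[row] * x + current[4 + row] * y + current[8 + row] * z;
        }
    }

    // Mirrors reading the matrix back with glGetFloat: returns a copy.
    public float[] get() {
        return current.clone();
    }

    public static void main(String[] args) {
        MatrixStack stack = new MatrixStack();
        stack.translate(1f, 0f, 0f);   // pre-existing state that must survive

        stack.push();
        stack.translate(0f, 2f, 0f);   // build a temporary matrix on the stack
        float[] temp = stack.get();    // read it back for our own vertex math
        stack.pop();                   // pre-existing state is restored

        System.out.println(Arrays.toString(temp));
        System.out.println(Arrays.toString(stack.get()));
    }
}
```

Without the push/pop pair, the temporary translation would remain on the stack and corrupt whatever the rest of the rendering code expects to find there.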

The overhead of the OpenGL commands involved would be far higher than the cost of doing the matrix manipulations in Java linear algebra classes.
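As a rough sketch of the Java-side alternative, assuming nothing beyond the JDK: the hypothetical `Mat4` helper below builds the same matrices that glTranslatef and glRotatef multiply onto the current matrix (the rotation uses the axis-angle formula given on the glRotate reference page) and transforms a vertex directly, with no native calls at all. It is not the API of LWJGL's own matrix classes or of any other library.

```java
// Hypothetical minimal 4x4 matrix helper (not an LWJGL API).
// Column-major float[16]: element (row, col) lives at index col * 4 + row.
public final class Mat4 {

    static float[] identity() {
        float[] m = new float[16];
        m[0] = m[5] = m[10] = m[15] = 1f;
        return m;
    }

    // Returns a * b, both column-major.
    static float[] mul(float[] a, float[] b) {
        float[] r = new float[16];
        for (int col = 0; col < 4; col++) {
            for (int row = 0; row < 4; row++) {
                float sum = 0f;
                for (int k = 0; k < 4; k++) {
                    sum += a[k * 4 + row] * b[col * 4 + k];
                }
                r[col * 4 + row] = sum;
            }
        }
        return r;
    }

    // The matrix glTranslatef(x, y, z) multiplies onto the current matrix.
    static float[] translation(float x, float y, float z) {
        float[] m = identity();
        m[12] = x;
        m[13] = y;
        m[14] = z;
        return m;
    }

    // The matrix glRotatef(angleDeg, x, y, z) multiplies onto the current
    // matrix: rotation by angleDeg degrees about the normalized axis (x, y, z).
    static float[] rotation(float angleDeg, float x, float y, float z) {
        float len = (float) Math.sqrt(x * x + y * y + z * z);
        x /= len;
        y /= len;
        z /= len;
        float rad = (float) Math.toRadians(angleDeg);
        float c = (float) Math.cos(rad);
        float s = (float) Math.sin(rad);
        float t = 1f - c;
        return new float[] {
            x * x * t + c,     y * x * t + z * s, x * z * t - y * s, 0f, // column 0
            x * y * t - z * s, y * y * t + c,     y * z * t + x * s, 0f, // column 1
            x * z * t + y * s, y * z * t - x * s, z * z * t + c,     0f, // column 2
            0f,                0f,                0f,                1f  // column 3
        };
    }

    // Transforms the point (px, py, pz, 1) and returns its x, y, z.
    static float[] transformPoint(float[] m, float px, float py, float pz) {
        float[] out = new float[3];
        for (int row = 0; row < 3; row++) {
            out[row] = m[row] * px + m[4 + row] * py + m[8 + row] * pz + m[12 + row];
        }
        return out;
    }

    public static void main(String[] args) {
        // Same result as glLoadIdentity(); glTranslatef(5, 0, 0); glRotatef(90, 0, 0, 1);
        float[] m = mul(translation(5f, 0f, 0f), rotation(90f, 0f, 0f, 1f));
        float[] p = transformPoint(m, 1f, 0f, 0f);
        System.out.println(p[0] + " " + p[1] + " " + p[2]);
    }
}
```

The multiplication order matches the fixed-function pipeline: each call post-multiplies the current matrix, so the vertex is rotated first and translated second, landing at roughly (5, 1, 0). Because everything stays in plain Java arrays, the result can feed physics or other math code without a glGetFloat round trip.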

In fact, when your program involves physics or other heavily math-dependent code, you need some sort of math library for that anyway, and then it is cumbersome to keep the two parallel worlds, the "OpenGL matrix stack" and your own math objects, in sync.

I believe this might be the reason why the Khronos Group decided to deprecate the matrix stack and the glTranslate, glRotate, etc. operations in OpenGL 3.0.