Installation test fails on Linux

Started by JesseC, August 24, 2008, 01:33:20


JesseC

I downloaded both lwjgl-2.0rc1 and lwjgl-1.1.4 and tried running the recommended command:

java -cp .:res:jar/lwjgl.jar:jar/lwjgl_test.jar:jar/lwjgl_util.jar:jar/lwjgl_fmod3.jar:jar/lwjgl_devil.jar:jar/jinput.jar: -Djava.library.path=native/linux org.lwjgl.test.WindowCreationTest


In both, though, it gives me this:

The following keys are available:
ESCAPE:		Exit test
ARROW Keys:	Move window when in non-fullscreen mode
L:		List selectable display modes
0-8:		Selection of display modes
F:		Toggle fullscreen
SHIFT-F:	Toggle fullscreen with Display.destroy()/create() cycle
Found 14 display modes
Problem retrieving mode with 640x480x16@-1
Problem retrieving mode with 640x480x32@-1
Problem retrieving mode with 800x600x16@-1
Problem retrieving mode with 800x600x32@-1
Problem retrieving mode with 1024x768x16@-1
Problem retrieving mode with 1024x768x32@-1
#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  SIGSEGV (0xb) at pc=0x00002ac824fb6e30, pid=10728, tid=47039105212656
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.5.0_16-b02 mixed mode)
# Problematic frame:
# C  [libc.so.6+0x74e30]  memset+0x60
#
# An error report file with more information is saved as hs_err_pid10728.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#
Aborted


On the one hand, I'm sure OpenGL and Java are generally working, judging from glxinfo / glxgears / other Java apps / etc.

On the other hand, I got the test running earlier, but OpenGL wasn't quite right (DRI was disabled, for one thing), so I ended up reinstalling drivers and changing kernel versions. And now it gives this error! Any ideas?

Matzon

the memset problem is pretty serious!
could you attach hs_err_pid10728.log?

JesseC

OK, here's the log. Thanks for the quick reply!

Matzon

my best guess is that opengl drivers are messing with lwjgl stuff, which causes lwjgl to initialize with a null display.
since you can reproduce the bug, it would be nice if you could debug:
LinuxDisplayPeerInfo.initDefaultPeerInfo
org_lwjgl_opengl_Display.c::Java_org_lwjgl_opengl_LinuxDisplayPeerInfo_initDefaultPeerInfo
context.c::initPeerInfo
and check for null pointers and similar

JesseC

Oh, OK! Well, I'll do my best! I checked the source out with Subversion and tried compiling it (according to this), but I'm not sure where to go from there. I see there's a "runtest" target in the build file; should I use that? (Running it with -Dtest.mainclass=org/lwjgl/test/WindowCreationTest gives me pretty much the same error as in my first post, which I assume is what I should expect.)

And finally, for debugging, should I use jdb, or something different? Sorry for all the questions, but I'm in a bit over my head! (Until now, I've only tried debugging my own bits of code inside Eclipse...)

Matzon

well, the best thing right now is to try and identify the location of the memset call and why it's happening. Given the lack of stack traces in native code, it would probably be easiest just to compile some System.out.println("Display: " + display); into the Java code - for instance in initDefaultPeerInfo - and check the arguments being passed.
The same with the native code - add some printf("display pointer: %x", display); statements in org_lwjgl_opengl_Display.c::Java_org_lwjgl_opengl_LinuxDisplayPeerInfo_initDefaultPeerInfo and context.c::initPeerInfo and check the value of the arguments.
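something along these lines, as a self-contained sketch of the idea (not the actual LWJGL code - the real signatures in context.c are different):

/* build with: gcc trace.c -lX11 */
#include <stdio.h>
#include <X11/Xlib.h>

/* Print pointer-valued arguments on entry and flush right away, so the
 * trace survives even if the process dies in native code just afterwards. */
static void trace_entry(const char *where, Display *display, int screen) {
    printf("=> entered %s\n", where);
    printf("===> display: %p\n", (void *)display);  /* a NULL here would explain a crash further down */
    printf("===> screen:  %d\n", screen);
    fflush(stdout);
}

int main(void) {
    Display *display = XOpenDisplay(NULL);  /* stand-in for the pointer LWJGL passes around */
    trace_entry("initPeerInfo (simulated)", display, 0);
    if (display != NULL)
        XCloseDisplay(display);
    return 0;
}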

no need to start with a debugger just yet.

I've never used the runtest command - I just type the stuff from the command line :)

JesseC

OK, I think I'm on the right track... just one more question: where is the initDefaultPeerInfo function, declared in LinuxDisplayPeerInfo.java, actually defined? And how does it relate to Java_org_lwjgl_opengl_LinuxDisplayPeerInfo_initDefaultPeerInfo from org_lwjgl_opengl_Display.c? (I don't have a very firm understanding of how the java and native code fit together, as you can see.)
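My rough guess, from the shape of the name, is that it's the standard JNI naming convention at work; something like the sketch below, where the parameter types are pure guesses on my part:

/* The Java side declares the method "native" and gives it no body; at runtime
 * the JVM looks up a C symbol in liblwjgl.so whose name is built from the
 * package, class, and method names. So the actual definition would be this
 * function in org_lwjgl_opengl_Display.c (types here are my guesses): */
#include <jni.h>

JNIEXPORT void JNICALL Java_org_lwjgl_opengl_LinuxDisplayPeerInfo_initDefaultPeerInfo
    (JNIEnv *env, jclass clazz, jlong display, jint screen,
     jobject peer_info_handle, jobject pixel_format)
{
    /* ...native implementation goes here... */
}

Is that roughly right?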

On the plus side, I've got context.c::initPeerInfo spitting out its arguments:

     [java] => entered initDefaultPeerInfo
     [java] ===> env:              40112998
     [java] ===> claszz:           4022a860
     [java] ===> display:          0.000000
     [java] ===> screen:           0
     [java] ===> peer_info_handle: 4022a878
     [java] ===> pixel_format:     4022a870
     [java] => entered initPeerInfo
     [java] ===> env:              40112998
     [java] ===> peer_info_handle: 4022a878
     [java] ===> display:          baa0b810
     [java] ===> screen:           0
     [java] ===> pixel_format:     4022a870
     [java] ===> use_display_bpp:  1
     [java] ===> drawable_type:    1
     [java] ===> doubled_buffered: 1
     [java] ===> force_glx13:      0


...though I don't actually know what these arguments should be, in general (or if I'm printf'ing things properly)... does that output explain anything?

JesseC

Here's what's happening to the best of my understanding:

initDefaultPeerInfo calls initPeerInfo,
which calls extgl_InitGLX,
which calls lwjgl_glXQueryVersion,
which crashes.

Also, none of the pointers I've seen are null.

Matzon

I am curious as to how display can be NULL in initDefaultPeerInfo but baa0b810 in initPeerInfo.

What's the spec of your setup (distro, graphics card)? Is it only lwjgl apps that are broken?

JesseC

Hmm, good point! I was confused by display being of type "jlong" in initDefaultPeerInfo but a pointer to a "Display" in initPeerInfo. I hadn't paid much attention to this line before:

Display *disp = (Display *)(intptr_t)display;


It doesn't make much sense to me that it should go from 0.0 to some large value for the pointer. I guess I'll poke at that a little more.

EDIT: Ah, it wasn't zero at all; my printf() statement for display in initDefaultPeerInfo used %f, and with %x it gives the same value as in initPeerInfo. Oops!
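For my own reference, here's the gotcha in isolation (the variable is made up for the example): passing an integer like the jlong display handle to printf with %f is undefined behaviour, which is how the bogus 0.000000 showed up, whereas %p or PRIx64 print it properly:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    int64_t display = 0xbaa0b810;  /* stand-in for the jlong handle seen in the trace */
    /* printf("%f", display) is undefined behaviour for an integer argument
     * and tends to print garbage such as 0.000000 on x86-64. */
    printf("as hex:     %" PRIx64 "\n", (uint64_t)display);   /* full 64-bit value */
    printf("as pointer: %p\n", (void *)(intptr_t)display);    /* same cast style as context.c */
    return 0;
}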

As far as I can tell, other OpenGL / Java apps work fine (some games, screen savers, Eclipse, etc.).
As for my setup, I'm running Gentoo on kernel 2.6.23 with ATI's fglrx driver on a Radeon Xpress 1100. lspci output for the card:

01:05.0 VGA compatible controller: ATI Technologies Inc RS485 [Radeon Xpress 1100 IGP] (prog-if 00 [VGA controller])
	Subsystem: Acer Incorporated [ALI] Device 009f
	Flags: bus master, 66MHz, medium devsel, latency 255, IRQ 17
	Memory at c8000000 (32-bit, prefetchable) [size=128M]
	I/O ports at 9000 [size=256]
	Memory at c0100000 (32-bit, non-prefetchable) [size=64K]
	[virtual] Expansion ROM at c0120000 [disabled] [size=128K]
	Capabilities: [50] Power Management version 2
	Kernel driver in use: radeonfb
	Kernel modules: fglrx


I attached the output from glxinfo, if that helps at all.

JesseC

So, with all the printf()'s checked and using %x, here's the complete output from "ant -Dtest.mainclass=org/lwjgl/test/WindowCreationTest runtest". It just goes into lwjgl_glXQueryVersion (what I labeled "extgl_InitGLX step 1") and never comes out again...
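To see whether the hang is in the driver itself or in LWJGL's use of it, I'm thinking of trying the same call outside the JVM entirely; as far as I can tell, lwjgl_glXQueryVersion is just LWJGL's function pointer to the driver's glXQueryVersion, so a bare-bones GLX program like this (built with gcc glxcheck.c -lX11 -lGL) should exercise the same code path:

#include <stdio.h>
#include <X11/Xlib.h>
#include <GL/glx.h>

int main(void) {
    int major = 0, minor = 0;
    Display *dpy = XOpenDisplay(NULL);  /* connect to the X server named by $DISPLAY */
    if (dpy == NULL) {
        fprintf(stderr, "could not open display\n");
        return 1;
    }
    if (!glXQueryVersion(dpy, &major, &minor)) {
        fprintf(stderr, "glXQueryVersion failed\n");
        XCloseDisplay(dpy);
        return 1;
    }
    printf("GLX version: %d.%d\n", major, minor);
    XCloseDisplay(dpy);
    return 0;
}

If that misbehaves the same way, the problem is presumably below LWJGL.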

Matzon

not sure what's going on :/ Can you check the Screen that is passed - why is it null?

JesseC

Oh! Is "screen" an address, also? Since it's an integer everywhere, I had assumed that it was a value for the screen to use on the X server, and "0" made sense to me (default screen; i.e., echo $DISPLAY --> ":0.0"). But if it's being used as the value of a pointer, then I see what you mean.

To look a bit more at the value for screen, what calls Java_org_lwjgl_opengl_LinuxDisplayPeerInfo_initDefaultPeerInfo? I don't really know where to start tracing backwards from there...

Matzon

sorry - you're right (I'm a Windows user :)) - screen is indeed the X default screen

JesseC

Ah, well, in that case, is there anything left that looks suspicious? I'd be happy to try to find the problem, but I'm not sure where to look at this point.

Also, using Gentoo's eselect tool, I've tried setting java-vm to both "sun-jdk-1.5" and "sun-jdk-1.6", and opengl to "ati" and "xorg-x11". No luck with any combination  :-\

...And it actually runs if I use the generic radeon driver instead of ATI's fglrx driver, but the generic driver corrupts my display in other ways. I guess that means I can blame the fglrx driver for this, so I'm going to focus on figuring out what's different between those two cases...