LWJGL Forum

Programming => Bug Reports / RFE => Topic started by: cpw on January 30, 2019, 02:12:14

Title: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: cpw on January 30, 2019, 02:12:14
Hi
So this is quite the problem I seem to have. It seems 99% likely to be an issue with LWJGL 3.1.6 at this point - I can't try newer because I'm tied to what the game is using.

I am cpw, one of the forge platform developers for Minecraft. For the past few weeks, I've been tackling the upgrade to 1.13 of Minecraft, and I am repeatedly encountering JVM aborts and game crashes, with a wide variety of different behaviours.

I've encountered an actual "stack smash" error on one occasion, numerous SEGV/SIGBUS and SEGV/SIGABRT errors, and captured multiple core dumps (the only commonality is some REALLY weird thread traces in them). I've also seen unmodifiable objects that are suddenly null, and methods that fail with an NPE at the end of the method (no functional code?!).

The computer I'm running on has passed all memtests I've thrown at it, and is quite capable of playing any other graphically intensive game going, including older versions of Minecraft, so although I initially thought "hardware (failing memory)" as the problem, I think I can safely discount that. It seems to me that the 1.13 update, with my rather wider scoped use of threading, seems to have caused LWJGL to overwrite parts of the stack somehow? I've read about the debugging of the new MemoryUtil functions, and turning on debugging seems to reduce the incidence of the issue somewhat - but it does not completely remove it. Furthermore, no errors seem to be reported by any of the debugging I have enabled, and yet the error persists. Perhaps your reporting functions aren't durable to a JVM SEGV?

What I would like is guidance on how I can possibly debug this problem. I'm happy to try building local copies with additional debugging enabled, but I am not familiar with the Memory code.

For a view of what's happening, I would recommend my recent twitch streams - it happens pretty frequently, usually accompanied by a bout of swearing. This evening I dedicated an hour long stream to investigating this, with no real results. https://twitch.tv/cpw. I've created a collection, where I will gather highlights showing the crashes occurring (so they don't get deleted by twitch after 30 days): https://www.twitch.tv/collections/_20aCTl-fhUW6g

Thanks for your time!
cpw
verification:
echo -n "I am cpw on the lwjgl forums, I signed up on 29 Jan 2019" | sha256sum
https://twitter.com/voxcpw/status/1090432302528806912
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: spasi on January 30, 2019, 10:37:34
Hey cpw,

I'm sorry to hear about the troubles you're having. I must say though that it's unlikely this kind of crashing is caused by LWJGL itself. The code in LWJGL is so simple that there's not much room for weird bugs. It doesn't spawn internal threads or anything like that, and even utilities like the memory leak detection are the most stupidly simple and inefficient code you can think of (this is all on purpose btw). The two places with some kind of sophistication are a) the shared library loader, which has matured over time and is considered stable and working as expected, and b) the nasty business with OpenGL current context tracking and ThreadLocalUtil (read the comments in that class for details).

There were a couple of known issues with LWJGL 3.1.6 that have been resolved in subsequent releases but, if Minecraft was affected, there would be many more reports about them. So, the most likely causes would be: 1. a bug in one of the libraries Minecraft uses, 2. a bug in Minecraft itself, or 3. something wrong with the user's machine/environment. We can reasonably assume that it's not 2 (more users would be affected) or 3 (you don't have issues with other applications).

That leaves us with 1. Before going deeper into this, my first guess would be that you're hitting a bug in jemalloc. LWJGL 3.1.6 ships with jemalloc 5.0.1, which has caused troubles in many applications. LWJGL 3.2.0 updated jemalloc to the much more stable 5.1.0. Some quick things you could try:

- Run Minecraft with -Dorg.lwjgl.system.allocator=system
- Delete/move the jemalloc shared library so that LWJGL can't find it (equivalent to the first solution, LWJGL will fall back to the system allocator automatically)
- Build jemalloc 5.1.0 or current head and replace the shared library that comes with Minecraft. (you could also download a recent build from https://www.lwjgl.org/browse/nightly/linux/x64)
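For launch setups where editing the JVM command line is awkward (an IDE run configuration, for example), the first suggestion could also be sketched in code. This is only a sketch: it assumes LWJGL reads the org.lwjgl.system.allocator property when its memory classes first initialize, so it has to run before any LWJGL class is touched. The class and method names are hypothetical; only the property name and the "system" value come from the list above.

```java
// Hypothetical launcher helper; only the property name and value come from the post above.
final class AllocatorSetup {
    static final String ALLOCATOR_PROPERTY = "org.lwjgl.system.allocator";

    /** Sets the property unless the command line already chose an allocator; returns the effective value. */
    static String preferSystemAllocator() {
        if (System.getProperty(ALLOCATOR_PROPERTY) == null) {
            System.setProperty(ALLOCATOR_PROPERTY, "system");
        }
        return System.getProperty(ALLOCATOR_PROPERTY);
    }

    public static void main(String[] args) {
        // Assumption: this runs before any LWJGL class has been loaded.
        System.out.println(ALLOCATOR_PROPERTY + "=" + preferSystemAllocator());
    }
}
```

An explicit -Dorg.lwjgl.system.allocator=... on the command line still wins, because the helper only fills in the property when it is unset.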
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: spasi on January 30, 2019, 10:43:08
If that doesn't help, next step would be visiting https://builds.shipilev.net/ and downloading a fastdebug JDK build. Triggering a crash with it may provide more useful information.
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: cpw on January 31, 2019, 00:24:10
Thanks for those tips. I wholeheartedly agree that this is 100% a corner case - I know I'm in a teeny tiny minority, daring to run Minecraft on Linux, but it's worked for 8 years, why stop now?

I'm not super familiar with the new memory architecture that is being used in LWJGL 3, so I think your analysis of jemalloc seems highly plausible. Certainly, one of the "big enhancements" I've been adding is that all modloading is MUCH more parallel than previously - concurrency bugs were always my top thought, but somehow LWJGL, or its associated libs, seemed the likely culprit.

What I will try is moving to the system allocator first and foremost. I gather the jemalloc allocator is of value, for performance reasons, so I will probably try the 5.1 version of same as well, and let you know outcomes.

Finally, if neither of those works, I'll grab that JDK.

Thanks again for the feedback.
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: illy on January 31, 2019, 01:37:23
Wanted to chime in and say I have been able to recreate cpw's setup and can confirm that this is a bug I am running into. I'll try your suggestions and come back with updates.
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: cpw on January 31, 2019, 03:38:34
Hi, attached to this post are a couple of hs_err files from tonight's stream, where we looked into your suggestions. 4364 was captured using fastdebug, 14558 was captured using Oracle j8u201. None of your suggestions seemed to have any effect - the game persistently crashes whenever it is run from any kind of development environment (Gradle or IntelliJ). I even upgraded to LWJGL 3.2.1 (latest?), which didn't stop the crashes.

As you can also see, this problem isn't exclusive to my computer either - Illy above has the same problem on Linux as well. I am going to try and get a cohort of Linux gamers to give this a go in the desktop client as well, over the next few days, to see if the problem exists there as well.

I am honestly at a loss as to how we can further debug this issue at present. Nothing seems to have had a significant effect.

Also attached is a "stack smash" error I managed to get one time, the other day.

Thanks for listening. I hope we can figure out a way to figure out what's going on here.

cpw
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: spasi on January 31, 2019, 13:48:45
Quick update: I'm able to reproduce this with MinecraftForge (branch 1.13-pre), JDK 8u201, Ubuntu 18.10.
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: jojomodding on January 31, 2019, 15:16:32
I can also reproduce this, using updated Arch Linux, Java 8 and Intel iGPU.
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: spasi on January 31, 2019, 15:52:43
Workaround that seems to eliminate the crash for me: net.minecraft.client.MainWindow.java:297, replace GLFW.glfwWaitEventsTimeout(d0 - d1) with GLFW.glfwPollEvents().

I don't fully understand the reason behind this yet (using GLFW.glfwWaitEvents() without a timeout crashes too), but could you please check if this change makes a difference for you?
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: cpw on January 31, 2019, 15:58:24
I'll do that and report back. That's really fascinating.

I've gotten widespread evidence that the game, as launched from the native launcher, does not generally seem to crash with this problem, but anyone running any variant of the same code in an IDE or launched from gradle, is easily able to reproduce this problem. This again is really curious evidence.
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: cpw on January 31, 2019, 16:16:24
Just got this, with the change (see in the window above). It seems that it didn't help..  :'(
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: ichttt on January 31, 2019, 17:04:19
I found a way to reproduce this every time:
Add -Xcomp to the JVM flags to force every class to be compiled. It is slow AF, but it always crashed on C  [liblwjgl.so+0x21d2a]  Java_org_lwjgl_system_JNI_callPV__JIIIIJZ+0x1a
Full header here: https://pastebin.com/eXiwYJRW
Tested in a dev environment and in a compiled production environment, this flag always causes this.
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: ichttt on January 31, 2019, 18:22:08
Well, never mind that last response. Just found out that this is a bug with the compiler fixed in j10, and using -XX:-CriticalJNINatives prevents the issue.
Using -XX:-CriticalJNINatives together with -Xcomp fixes the issue for me.
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: cpw on January 31, 2019, 22:39:04
Some research updates. The vanilla launcher does not trigger this crash, seemingly ever. It seems to be exclusive to development type environments, where the game is launched from either an IDE or gradle.

It seems to be much harder (impossible) to trigger the crash if G1GC is enabled, instead of the default. But that's not definitive yet.

We've been running with -verbose:jni and -Xcheck:jni enabled. We can see that whenever it crashes, a small cluster of the same JNI methods are being bound.
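Since the crash seems to depend on how the game is launched, one way to compare the environments is to print what the JVM was actually started with. The following is a minimal sketch using only the standard java.lang.management API; the class name is hypothetical and it is not part of Forge or LWJGL. Running it (or calling it early in startup) from the vanilla launcher, the IDE and Gradle would show differences in JVM flags and in the active garbage collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of Forge or LWJGL.
final class JvmEnvironmentReport {
    /** Names of the active collectors, e.g. "PS Scavenge" for the parallel GC or "G1 Young Generation" for G1. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** JVM flags exactly as passed by whatever launched the process. */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("collectors   = " + collectorNames());
        System.out.println("jvm args     = " + jvmArguments());
    }
}
```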
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: spasi on February 01, 2019, 08:59:43
Quote from: ichttt on January 31, 2019, 18:22:08Just found out that this is a bug with the compiler fixed in j10, and using -XX:-CriticalJNINatives prevents the issue.

LWJGL needs to support JDK 8, so it skips critical natives for functions affected by JDK-8167409 (https://bugs.openjdk.java.net/browse/JDK-8167409) on Linux & macOS. Unfortunately it missed certain functions, they will be fixed in 3.2.2. This issue is unrelated to the crash we're investigating though.

Quote from: cpw on January 31, 2019, 22:39:04Some research updates. The vanilla launcher does not trigger this crash, seemingly ever. It seems to be exclusive to development type environments, where the game is launched from either an IDE or gradle.

It seems to be much harder (impossible) to trigger the crash if G1GC is enabled, instead of the default. But that's not definitive yet.

We've been running with -verbose:jni and -Xcheck:jni enabled. We can see that whenever it crashes, a small cluster of the same JNI methods are being bound.

Using G1GC or SerialGC didn't help, I'm still getting crashes.

The only thing that completely eliminates the issue for me is what I said above, replacing glfwWaitEvents with glfwPollEvents. With that change, I cannot reproduce the crash anymore. Neither with the IntelliJ run configuration, nor with Gradle's forge:runclient from the terminal.

Quote from: cpw on January 31, 2019, 16:16:24Just got this, with the change (see in the window above). It seems that it didn't help..  :'(

I have a feeling that the stack smashing crash may also be a separate issue. When it happens, it happens very early in the program execution, whereas the "usual" crashes happen after the engine has finished loading (after the splash screen). It may have something to do with how IntelliJ launches the application, has it ever happened to you when launched from Gradle?
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: cpw on February 01, 2019, 16:59:54
Quote from: spasi on February 01, 2019, 08:59:43

LWJGL needs to support JDK 8, so it skips critical natives for functions affected by JDK-8167409 (https://bugs.openjdk.java.net/browse/JDK-8167409) on Linux & macOS. Unfortunately it missed certain functions, they will be fixed in 3.2.2. This issue is unrelated to the crash we're investigating though.
Are you sure it doesn't affect it? It seems that running with JDK 10 has made the problem completely disappear as well. So maybe there is a relationship here? JDK 10 has that issue fixed, as I understand it.

Quote from: spasi on February 01, 2019, 08:59:43
Using G1GC or SerialGC didn't help, I'm still getting crashes.
Interesting, I could not recreate the issue at all with it. Something about the vanilla launcher seems to prevent it. Perhaps -Xss (seems unlikely, and fiddling didn't change outcomes as far as I can tell).

Quote from: spasi on February 01, 2019, 08:59:43
The only thing that completely eliminates the issue for me is what I said above, replacing glfwWaitEvents with glfwPollEvents. With that change, I cannot reproduce the crash anymore. Neither with the IntelliJ run configuration, nor with Gradle's forge:runclient from the terminal.
That's curious. I tried that, and it didn't eliminate it for me. It crashed less frequently, but it still crashed.

Quote from: spasi on February 01, 2019, 08:59:43
I have a feeling that the stack smashing crash may also be a separate issue. When it happens, it happens very early in the program execution, whereas the "usual" crashes happen after the engine has finished loading (after the splash screen). It may have something to do with how IntelliJ launches the application, has it ever happened to you when launched from Gradle?

I have had it happen twice. It seems to be at the point where LWJGL is trying to load its code for me, which is why I felt it was not uncorrelated. It is a lot rarer as a problem, though.

Anyway, for right now, my workaround is to use J10, for running the game. It fixes the problem for me, as far as I can tell. 20+ runs without this crash seems pretty definitive. We still build against J8, and J8 is the default setup everyone will be using, it's just a dev-time workaround for now.
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: spasi on February 01, 2019, 18:16:09
Quote from: cpw on February 01, 2019, 16:59:54Are you sure it doesn't affect it? It seems that running with JDK 10 has made the problem completely disappear as well. So maybe there is a relationship here? JDK 10 has that issue fixed, as I understand it.

If it was affecting anything, simply running with -XX:-CriticalJNINatives (on JDK 8) would eliminate the crashes. Also, G1GC is the default GC since JDK 9, so maybe JDK 10 isn't crashing for the same reason JDK8+UseG1GC isn't crashing for you.

So, we still haven't identified a universal fix for this. The issue seemingly goes away when the performance characteristics of the execution change (-Xcomp, poll vs wait, G1GC vs parallel, etc), which suggests a nasty race somewhere. But I've no idea what to blame (IntelliJ/Gradle? Minecraft/Forge? LWJGL? The JVM?).
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: princec on February 22, 2019, 14:14:31
Thought I'd better join in on this thread as it now seems I've got the same issue with 3.1.6.

So far, no combination of switches has eliminated the problem - it comes and goes randomly. Today, everything just stopped working, after several days of no problems at all. This is on Windows 10 / OpenJDK 11.

Cas :)
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: CoDi on February 22, 2019, 15:23:39
Are there any debug builds of LWJGL's native libraries or jemalloc to possibly get more verbose crash logs?

Also, I've had some success locating memory corruption / use-after-free crashes with Microsoft Application Verifier. I never tried to use it with a Java application though.
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: princec on February 22, 2019, 16:45:13
Ok, riddle me this - what's wrong with this piece of code:

try (GLFWImage.Buffer imageBuffer = GLFWImage.create(images.length)) {
    for (GLFWImage image : images) {
        imageBuffer.put(image);
    }
    imageBuffer.flip();
    glfwSetWindowIcon(window, imageBuffer);
    for (GLFWImage image : images) {
        image.free();
    }
}


Cas :)
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: princec on February 22, 2019, 16:47:30
Well, I shall let on:

java.lang.IllegalStateException: The memory address specified is not being tracked
at org.lwjgl.system.MemoryManage$DebugAllocator.untrack(MemoryManage.java:192)
at org.lwjgl.system.MemoryManage$DebugAllocator.free(MemoryManage.java:153)
at org.lwjgl.system.MemoryUtil.nmemFree(MemoryUtil.java:254)
at org.lwjgl.system.CustomBuffer.free(CustomBuffer.java:63)
at org.lwjgl.system.NativeResource.close(NativeResource.java:20)

with
-Dorg.lwjgl.util.DebugAllocator=true
-Dorg.lwjgl.util.DebugAllocator.internal=true

Strikes me as being incorrect; a NativeResource such as GLFWImage.Buffer implements AutoCloseable.

Cas :)
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: princec on February 22, 2019, 16:50:28
What's interesting is the game stops crashing when I move that GLFWImage.Buffer out into a static and don't allocate it in a try-with-resources (and thus don't attempt to free it).

Cas :)
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: spasi on February 22, 2019, 19:36:34
Quote from: princec on February 22, 2019, 16:45:13Ok, riddle me this - what's wrong with this piece of code:

try (GLFWImage.Buffer imageBuffer = GLFWImage.create(images.length)) {
    for (GLFWImage image : images) {
        imageBuffer.put(image);
    }
    imageBuffer.flip();
    glfwSetWindowIcon(window, imageBuffer);
    for (GLFWImage image : images) {
        image.free();
    }
}

edit: removed previous reply because I misunderstood the code.

The problem is that GLFWImage.create uses ByteBuffer.allocateDirect to do the allocation. I.e. it's memory tracked by the JVM/GC, and should not be used in a try-with-resources block, you cannot free it explicitly. If you change it to GLFWImage.malloc, it will work.

From previous reply: Also, you don't need a Java array of GLFWImages AND a GLFWImage.Buffer. It's extra allocations and data copies for no reason. For example you could do this:

ByteBuffer icon16;
ByteBuffer icon32;
try {
    icon16 = ioResourceToByteBuffer("lwjgl16.png", 2048);
    icon32 = ioResourceToByteBuffer("lwjgl32.png", 4096);
} catch (Exception e) {
    throw new RuntimeException(e);
}

try (GLFWImage.Buffer icons = GLFWImage.malloc(2)) {
    ByteBuffer pixels16 = Objects.requireNonNull(stbi_load_from_memory(icon16, w, h, comp, 4));
    icons
        .get(0)
        .width(w.get(0))
        .height(h.get(0))
        .pixels(pixels16);

    ByteBuffer pixels32 = Objects.requireNonNull(stbi_load_from_memory(icon32, w, h, comp, 4));
    icons
        .get(1)
        .width(w.get(0))
        .height(h.get(0))
        .pixels(pixels32);

    glfwSetWindowIcon(window, icons);

    stbi_image_free(pixels32);
    stbi_image_free(pixels16);
}
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: spasi on February 24, 2019, 22:31:10
Hey Cas,

Did my reply above help explain what was happening? Have the crashes disappeared completely now? If yes, could you also please try LWJGL 3.2.1 or even the current 3.2.2 snapshot? I'm losing sleep over this thing and want to make sure there's nothing seriously wrong with the library.
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: princec on February 25, 2019, 10:09:57
I will try with 3.2.2 tonight. Although I would be pleasantly surprised if the problems went away I am rather sceptical, as one of the first things I did to try and nail down the root cause of the bug was to remove the icon setting code, and it didn't help.

Regarding that bit of code... it does and doesn't help. To my mind I'm using the API in the correct way: GLFWImage.Buffer implements AutoCloseable, therefore, it must be usable in a try-with-resources block. If this is not the case, it must not implement AutoCloseable. Even if this is explicitly stated in the Javadoc, it still fails because code itself can't read Javadocs and does what it is supposed to do when confronted by interfaces with a specific contract. The API as it stands (and anywhere else this may be occurring) is simply asking for big trouble, and looks like it's found it too.

Even so - there should be actual runtime exception checks that are always on for management of this sort of thing in LWJGL I think. Correctness IMO always trumps performance. If I were worried about pure performance, I'd be using C.

Cas :)
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: spasi on February 25, 2019, 20:07:49
Quote from: princec on February 25, 2019, 10:09:57
Regarding that bit of code... it does and doesn't help. To my mind I'm using the API in the correct way: GLFWImage.Buffer implements AutoCloseable, therefore, it must be usable in a try-with-resources block. If this is not the case, it must not implement AutoCloseable. Even if this is explicitly stated in the Javadoc, it still fails because code itself can't read Javadocs and does what it is supposed to do when confronted by interfaces with a specific contract. The API as it stands (and anywhere else this may be occurring) is simply asking for big trouble, and looks like it's found it too.

Even so - there should be actual runtime exception checks that are always on for management of this sort of thing in LWJGL I think. Correctness IMO always trumps performance. If I were worried about pure performance, I'd be using C.

I totally get your point of view and the concerns you raise have been carefully considered... years ago.

The chosen solution for correctness was the DebugAllocator. The exception you were seeing ("The memory address specified is not being tracked") was saying that someone tried to free a pointer that was never allocated by an explicit allocation API. It also triggers on a double-free. It is not enabled by default because of the performance impact. The current implementation uses a global ConcurrentHashMap from Long to Allocation, where Allocation holds the allocation size, thread ID and stacktrace where the allocation happened. Even with the new stack walking API in Java 9, this is very expensive. Best we could do is make the DebugAllocator implementation configurable and have an alternative implementation (per-thread state, primitive collections, no call stack tracking) that might be more reasonable to have always enabled during development (e.g. enabled-by-default in Debug mode, without a separate switch).
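To make the description above concrete, here is a toy model of the tracking idea in plain Java. It is not LWJGL's actual DebugAllocator code: the class and method names are made up for illustration, and addresses are just long values rather than real native pointers. It only shows the shape described in the paragraph: a global concurrent map from address to an Allocation record, with an exception on freeing anything that is not in the map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Toy model of the tracking idea; NOT LWJGL's actual DebugAllocator code.
final class TrackingAllocatorSketch {
    /** What is remembered per live allocation. */
    static final class Allocation {
        final long size;
        final long threadId;
        final StackTraceElement[] stackTrace;

        Allocation(long size) {
            this.size = size;
            this.threadId = Thread.currentThread().getId();
            // Capturing the call stack on every allocation is the expensive part.
            this.stackTrace = new Throwable().getStackTrace();
        }
    }

    private final ConcurrentMap<Long, Allocation> live = new ConcurrentHashMap<>();

    /** Records an address handed out by an explicit allocation (malloc/calloc). */
    void track(long address, long size) {
        live.put(address, new Allocation(size));
    }

    /** Called on free; fails for double frees and for addresses that were never tracked. */
    void untrack(long address) {
        if (live.remove(address) == null) {
            throw new IllegalStateException("The memory address specified is not being tracked");
        }
    }

    /** Number of allocations that have not been freed yet (potential leaks at shutdown). */
    int liveCount() {
        return live.size();
    }
}
```

In this model, memory obtained through ByteBuffer.allocateDirect never passes through track(), which is why an explicit free on it ends in the "not being tracked" exception.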

One might think that doing some kind of tracking or detection per buffer/struct would be ideal. First of all, buffers come from the JDK and there's no way to add extra state, so immediately we don't have a general solution, we're left with structs only. Most importantly though, this is virtually an impossible problem to solve: The backing buffer might be sliced/duplicated. A Struct might have come from a StructBuffer. The StructBuffer itself might be sliced/duplicated. We'd have to recurse through the chain, doing instanceof checks and then we'd still need to examine private NIO buffer data. And we'd still not be 100% sure about ownership, we'd have to consider the API used: We just got a pointer from an API, are we responsible for freeing it or will the API take care of it? It becomes too much. And it's not like C does anything better about this, tracking pointer ownership is the developer's responsibility. Rust is the only language with robust ownership tracking, but we can't expect LWJGL to be comparable to Rust, can we?

Anyway, I think getting used to the API more will help. Quick reminder of how the "what to use for allocation" algorithm goes:

1. Is it a short-lived, small-sized allocation? Use the MemoryStack.
    * Anything allocated via the MemoryStack must NOT be explicitly freed.
    * You manipulate the stack instead, a pop will free anything allocated after the last push. This can be done either automatically in a try-with-resources block, or by calling push/pop explicitly.
    * Yes, the MemoryStack is AutoCloseable too and close() delegates to pop().

2. Is it a long-lived and/or big allocation and it has a clearly defined life-cycle? Use the explicit allocation/deallocation API.
    * If it's called malloc/calloc/realloc, it's an explicit allocation that must be freed manually.
    * This is where you'd use try-with-resources to do the free safely/automatically.

3. Weird life-cycle or too much hassle to track liveness? Use BufferUtils.
    * If it starts with "create", it uses ByteBuffer.allocateDirect under the hood. It must NOT be explicitly freed.
    * Same semantics as every allocation in an LWJGL 2-based application, the allocation will be freed by the GC.
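The push/pop behaviour in point 1 can be illustrated with a toy bump allocator in plain Java. This is not LWJGL's MemoryStack implementation (among other things it hands out offsets instead of real memory, and the class name is hypothetical); it only shows why stack allocations are never freed individually and how close() delegating to pop() makes try-with-resources end a frame.

```java
// Toy bump allocator illustrating push/pop frames; NOT LWJGL's MemoryStack implementation.
final class ToyStack implements AutoCloseable {
    private final int[] frames = new int[16]; // saved pointers, one per push
    private int frameIndex;
    private int pointer; // bytes currently in use

    /** Remembers the current pointer so a later pop() can roll back to it. */
    ToyStack push() {
        frames[frameIndex++] = pointer;
        return this;
    }

    /** Releases everything allocated since the matching push(). */
    ToyStack pop() {
        pointer = frames[--frameIndex];
        return this;
    }

    /** Hands out an offset; there is deliberately no free(offset). */
    int malloc(int size) {
        int offset = pointer;
        pointer += size;
        return offset;
    }

    int used() {
        return pointer;
    }

    /** try-with-resources ends the current frame. */
    @Override
    public void close() {
        pop();
    }

    public static void main(String[] args) {
        ToyStack stack = new ToyStack();
        try (ToyStack frame = stack.push()) {
            frame.malloc(16);
            frame.malloc(4);
            System.out.println("inside frame: " + stack.used() + " bytes in use");
        }
        System.out.println("after frame: " + stack.used() + " bytes in use");
    }
}
```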
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: princec on February 25, 2019, 21:23:50
Then I think that it had better not implement AutoCloseable if it is not actually closeable (and doubly so if it causes a crash!). The actual returned implementation could implement it, and I could conceivably cast it to AutoCloseable to get try-with-resources to look a bit prettier. I don't think it is at all sensible to break the contract of an API, especially not one that's actually now built-in to the JLS.

So the way around it is, for things that are created with calloc/malloc, have the returned things implement something like

interface Calloced extends AutoCloseable {
   default void free() { close(); }
}


and have things created with BufferUtils simply... not implement that interface. The abstract base class of the allocated object clearly then can't implement a free() or close() method, so you'll have to generate specialised concrete subclasses, eg GLFWImageCalloced or somesuch. A bit of a mouthful maybe. But look what a mess it gets everything into when it gets accidentally used wrong, which it will, I imagine, quite a lot, and the symptoms are every bit as unhelpful as a C program exploding, rendering the use of Java a bit pointless when one of the main reasons to use it is program correctness.

It'll all be moot when Valhalla turns up anyway, so I can't help but wonder if this is perhaps just rearranging deckchairs on the Titanic, saving a few cycles here and there for a very tiny subset of programs that will ever exist before it's all replaced by something more straightforward in due course.

I know I should have stuck my oar in years ago but I was too busy at the time... anyway, back to bug reproduction...

Cas :)
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: princec on February 25, 2019, 21:32:00
With 3.2.1, and tweaked icon code... currently not seeing a crash. So that's hopeful.

Cas :)
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: spasi on February 26, 2019, 10:43:16
Quote from: princec on February 25, 2019, 21:23:50So the way around it is, for things that are created with calloc/malloc, have the returned things implement something like

interface Calloced extends AutoCloseable {
   default void free() { close(); }
}


and have things created with BufferUtils simply... not implement that interface. The abstract base class of the allocated object clearly then can't implement a free() or close() method, so you'll have to generate specialised concrete subclasses, eg GLFWImageCalloced or somesuch. A bit of a mouthful maybe. But look what a mess it gets everything into when it gets accidentally used wrong, which it will, I imagine, quite a lot, and the symptoms are every bit as unhelpful as a C program exploding, rendering the use of Java a bit pointless when one of the main reasons to use it is program correctness.

I'd much rather remove AutoCloseable from structs in LWJGL 3.3. As I tried to explain, there's no scheme we could come up with that will cover all cases. It's going to be ugly and a waste of time for minimal gain.

Quote from: princec on February 25, 2019, 21:23:50It'll all be moot when Valhalla turns up anyway, so I can't help but wonder if this is perhaps just rearranging deckchairs on the Titanic, saving a few cycles here and there for a very tiny subset of programs that will ever exist before it's all replaced by something more straightforward in due course.

Valhalla is going to help with value types, but what we're waiting for is Panama (and of course its integration with Valhalla). You'd still need to pass a pointer to the struct data to native code, how this ends up being represented in Panama is going to matter... a lot. Is it going to be zero-copy? Is it going to be as convenient as in C? The current prototype uses Scope in try-with-resources for this and is still not close to providing satisfying answers.

AutoCloseable is not about performance, it's just a convenience. It did come into the picture because of a performance concern (using malloc/calloc instead of BufferUtils), but that's justified on its own. We're talking much more than a few cycles and having instances that do not automatically escape (which is what happens with ByteBuffer.allocateDirect).

Btw, not sure if you noticed already, not all structs are AutoCloseable. Only those that are "mallocable" and used as input to some API. So, GLFWImage is AutoCloseable, but GLFWVidMode is not (only returned by GLFW).

Finally, one more thing I'd like to ask: how did you end up using try-with-resources when you first wrote this? My guess is, you didn't really inspect LWJGL's source code (AutoCloseable is hidden behind the NativeResource interface), but it was Eclipse that warned you about it? Not solving the real problem, but does it help if you disable these warnings?

- Code style -> Resource not managed via try-with-resources (1.7 or higher)
- Potential programming problems -> Potential resource leak

(they were called like that back in 2016, not sure about now)

Quote from: princec on February 25, 2019, 21:32:00With 3.2.1, and tweaked icon code... currently not seeing a crash. So that's hopeful.

Great, thanks!
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: princec on February 26, 2019, 11:11:46
I agree it's not worth changing course with LWJGL3.x at this point, and that it's better to wait and see what Valhalla/Panama bring to the table before doing another ground-up redesign for LWJGL4.

I tend to keep most warnings enabled in Eclipse - and indeed I'd never have thought to put it in try-with-resources if Eclipse hadn't warned me about it. But consider an API (of admittedly obscure function) that consumes AutoCloseables and closes them without regard to knowledge of what it is actually closing - the contract of AutoCloseable says in no uncertain terms how it should be behaving under any circumstances, and here's a circumstance where it clearly isn't following the party line. Given the amount of hassle it's clearly already caused - and the sheer embuggerance of tracking it down by the nature of the blowup it caused - it would definitely be best if the close() method was safely wrapped up with a check and a proper exception that are explicitly turned off by a flag to gain performance, rather than turned on by a flag to gain debugging.

There could be many more such instances of this going awry in the wild and it just results in more support, more headaches, more hair loss, and buggier software, so if there was one thing I would like to have fixed in LWJGL, it's this, if not for me then for the hundreds of programmers and customers who come after me and stumble across the same behaviour (I really like the new API otherwise!)
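As a rough illustration of the "checked by default, switched off by a flag" idea, here is a sketch in plain Java. It is not an LWJGL API: the class, the system property name and the messages are invented for this example, and it assumes the object is told at construction time whether its memory came from an explicit allocation, which, as discussed earlier in the thread, is exactly the information that is hard to obtain in general.

```java
// Hypothetical wrapper; LWJGL has no such class. It models "checked by default, opt out with a flag".
final class CheckedNativeHandle implements AutoCloseable {
    /** Property name invented for this sketch; setting it to "true" skips the checks. */
    static final String UNCHECKED_PROPERTY = "example.nativeHandle.unchecked";

    private final boolean explicitlyAllocated; // true for malloc/calloc style, false for GC-managed memory
    private boolean freed;

    CheckedNativeHandle(boolean explicitlyAllocated) {
        this.explicitlyAllocated = explicitlyAllocated;
    }

    @Override
    public void close() {
        if (!Boolean.getBoolean(UNCHECKED_PROPERTY)) {
            if (!explicitlyAllocated) {
                throw new IllegalStateException("GC-managed memory must not be freed explicitly");
            }
            if (freed) {
                throw new IllegalStateException("Memory has already been freed");
            }
        }
        freed = true; // a real implementation would release the native memory here
    }

    boolean isFreed() {
        return freed;
    }
}
```

With the checks on, closing a GC-managed handle in a try-with-resources block fails with a Java exception at the point of misuse instead of corrupting memory later.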

Cas :)
Title: Re: Probable stack smash in LWJGL 3.1.6 on Linux with J8
Post by: Lightbuffer on January 03, 2023, 16:14:21
I know the topic is pretty old, but I'm replying since it's the only Google search result that I found when I encountered this issue.

It appears that the "stack smashing" error can also be caused when a GLFW window is requested with a size bigger than the screen. In my case, I was testing my LWJGL (3.3.1) app on an Ubuntu (22.04.1) VM and it was crashing at a screen size of 800x600 (while the default window size in my app is 1280x720). I accidentally fixed this when I had to change the size of the screen (1600x1050) in system settings to fit IntelliJ's window. I launched the app via IntelliJ and it worked. I launched again without IntelliJ and it worked as well. I changed the size back to 800x600, and it crashed again with *** stack smashing ... ***. I hope it helps someone else who is trying out their LWJGL app in an Ubuntu VM.
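A possible defensive measure, assuming the observation above holds for a given setup, is to clamp the requested window size to the screen size before creating the window. The helper below is a hypothetical sketch in plain Java; the screen dimensions would have to come from the windowing API (with GLFW, for instance, from the primary monitor's video mode). It is a workaround idea, not a confirmed fix for the underlying crash.

```java
// Hypothetical helper; the screen size must be supplied by the caller from the windowing API.
final class WindowSizeClamp {
    /** Returns {width, height}, never larger than the screen in either dimension. */
    static int[] clampToScreen(int requestedWidth, int requestedHeight, int screenWidth, int screenHeight) {
        return new int[] {
            Math.min(requestedWidth, screenWidth),
            Math.min(requestedHeight, screenHeight)
        };
    }

    public static void main(String[] args) {
        // Values from the post above: a 1280x720 default window on an 800x600 screen.
        int[] size = clampToScreen(1280, 720, 800, 600);
        System.out.println(size[0] + "x" + size[1]);
    }
}
```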