MemoryStack vs Escape Analysis

Uze · November 21, 2018, 11:48:57

As I understand it, methods like MemoryStack#malloc(int) are supposed to be GC-free only if EA eliminates the actual ByteBuffer creation? And for that to happen, the developer has to be really careful about passing the allocated buffer to other methods. For example, a harmless-looking call to java.lang.System#identityHashCode, or any other native method accepting a Java object reference, will lead to the object escaping and thus some amount of garbage? I suspect that since LWJGL natives accept only long addresses, that is not the case here, but any other native method call can be non-GC-free, right?

If that is the case, how reliable is Java's current EA?
And have there been any ideas about creating an additional org.lwjgl.system.MemoryAccess.MemoryAccessor implementation with some buffer instance caching?

spasi · November 21, 2018, 21:48:07

Quote from: Uze on November 21, 2018, 11:48:57
As I understand it, methods like MemoryStack#malloc(int) are supposed to be GC-free only if EA eliminates the actual ByteBuffer creation? And for that to happen, the developer has to be really careful about passing the allocated buffer to other methods. For example, a harmless-looking call to java.lang.System#identityHashCode, or any other native method accepting a Java object reference, will lead to the object escaping and thus some amount of garbage? I suspect that since LWJGL natives accept only long addresses, that is not the case here, but any other native method call can be non-GC-free, right?

Correct. For escape analysis to work and scalar replacement to happen, the object must be instantiated and last used within the same block of inlined code. Passing the object to any method that is not inlineable (e.g. native methods) or not inlined for whatever reason (method too big, callee too big, maximum inline depth reached, etc.) marks it as escaping, and scalar replacement cannot be performed.
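To make the distinction concrete, here is a minimal sketch in plain Java with no LWJGL dependency. Vec2, EscapeDemo and the method names are made up for illustration, and whether the JIT actually eliminates the allocations still depends on inlining decisions at runtime:

```java
// Hypothetical example types; not part of LWJGL or the JDK.
final class Vec2 {
    final int x, y;

    Vec2(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Tiny method, so it is a good inlining candidate.
    int dot(Vec2 o) {
        return x * o.x + y * o.y;
    }
}

final class EscapeDemo {
    // Keeps the hash alive so the native call below is not dead code.
    static int sink;

    // Candidate for scalar replacement: both Vec2 instances are created and
    // last used in this method, and the only call is to a small method.
    static int dotLocal(int ax, int ay, int bx, int by) {
        Vec2 a = new Vec2(ax, ay);
        Vec2 b = new Vec2(bx, by);
        return a.dot(b);
    }

    // Same arithmetic, but 'a' is passed to a native method. Per the
    // discussion above, that is expected to mark 'a' as escaping, so it
    // would be allocated on the heap.
    static int dotEscaping(int ax, int ay, int bx, int by) {
        Vec2 a = new Vec2(ax, ay);
        Vec2 b = new Vec2(bx, by);
        sink = System.identityHashCode(a);
        return a.dot(b);
    }
}
```

Both methods return the same value; the only difference is whether the temporary objects stay confined to inlineable code.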

Quote from: Uze on November 21, 2018, 11:48:57
If that is the case, how reliable is Java's current EA?

It's reliable, as long as you learn to live within the current set of limitations. HotSpot is at a disadvantage compared to GraalVM, because the latter can perform partial escape analysis, enabling more complicated code to benefit. But, in any case, the important thing is that it's easy to discover where EA fails and there are strategies you can follow to make it work. I personally use JITWatch heavily; it reports both where actual allocations happen and why EA couldn't eliminate them. The most common issue is big methods; I usually move uncommon/slow paths into separate methods and the problem goes away. Writing what is traditionally considered "good" Java code (many small methods, each doing one thing only) is a good recipe to get the most out of EA. In extreme cases, the worst thing you'll have to do is manually inline code (if it's small enough), or something like:

allocate buffer -> use buffer locally -> unwrap to raw address -> pass to some other code -> wrap address to a buffer -> use buffer locally

So, instead of struggling to make EA eliminate a single buffer allocation, you make it easy for EA and the two buffer allocations are trivially eliminated.
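A rough sketch of that unwrap/rewrap pattern, using plain Java stand-ins so it runs without LWJGL. Arena, IntView and UnwrapDemo are hypothetical: the int offset plays the role of the raw address and IntView plays the role of the buffer wrapper.

```java
// Hypothetical stand-ins: 'Arena' plays the role of native memory and
// 'IntView' the role of a buffer wrapper. Neither is an LWJGL class.
final class Arena {
    static final int[] MEMORY = new int[1024];
}

final class IntView {
    final int address; // offset into Arena.MEMORY, standing in for a raw pointer
    final int length;

    IntView(int address, int length) {
        this.address = address;
        this.length = length;
    }

    void put(int i, int v) {
        Arena.MEMORY[address + i] = v;
    }

    int get(int i) {
        return Arena.MEMORY[address + i];
    }
}

final class UnwrapDemo {
    static int fillAndSum(int address, int length) {
        IntView view = new IntView(address, length); // wrapper #1, used locally only
        for (int i = 0; i < length; i++) {
            view.put(i, i + 1);
        }
        // Unwrap: only primitives cross the call boundary, not the wrapper.
        return sum(view.address, view.length);
    }

    // Even if this method is not inlined into its caller, no object is
    // passed in, so wrapper #1 has nothing to escape through.
    static int sum(int address, int length) {
        IntView view = new IntView(address, length); // wrapper #2, also local
        int total = 0;
        for (int i = 0; i < length; i++) {
            total += view.get(i);
        }
        return total;
    }
}
```

With real LWJGL buffers, the unwrap and wrap steps would correspond to MemoryUtil.memAddress(buffer) and MemoryUtil.memByteBuffer(address, capacity).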

Quote from: Uze on November 21, 2018, 11:48:57
And have there been any ideas about creating an additional org.lwjgl.system.MemoryAccess.MemoryAccessor implementation with some buffer instance caching?

No, in my experience with LWJGL 3 after all these years, caching buffer instances is never needed. Hot loops can be tested with JITWatch and refactored if necessary. For code that isn't performance sensitive, even if you miss a few allocations, young GCs are super cheap. Note that I'm talking about buffers allocated via MemoryUtil or MemoryStack. Buffers allocated via BufferUtils (i.e. ByteBuffer.allocateDirect) are automatically marked as escaping because of the Cleaner required to free the backing memory (also, two GC cycles are needed to trigger this). So, yeah, only use BufferUtils if you have to (long-lived buffers with no obvious way to manually free).
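As an illustration of the kind of hot-loop refactoring meant here, a small sketch in plain Java. Range, HotPathDemo and handleOutOfRange are hypothetical names; the slow path is deliberately trivial and stands in for bulkier, rarely executed code.

```java
// Hypothetical example types; not part of LWJGL or the JDK.
final class Range {
    final int lo, hi;

    Range(int lo, int hi) {
        this.lo = lo;
        this.hi = hi;
    }

    boolean contains(int v) {
        return v >= lo && v <= hi;
    }
}

final class HotPathDemo {
    // Hot method: kept small, and the temporary Range never leaves it.
    static int sumClamped(int[] values, int lo, int hi) {
        Range r = new Range(lo, hi); // temporary we would like scalar-replaced
        int total = 0;
        for (int v : values) {
            if (r.contains(v)) {
                total += v;
            } else {
                // Uncommon path lives in its own method and receives
                // primitives only, so 'r' is not passed out of this method.
                total += handleOutOfRange(v, lo, hi);
            }
        }
        return total;
    }

    // Stand-in for a large, rarely executed path (logging, recovery, ...).
    private static int handleOutOfRange(int v, int lo, int hi) {
        return v < lo ? lo : hi;
    }
}
```

The idea is that the hot method stays small enough to inline, while the rare path can grow without affecting it.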

Btw, MemoryAccess has been removed in LWJGL 3.2.1. This was done along with a set of optimizations that make more LWJGL code benefit from escape analysis. See the current release notes for details. A good example is the new memSlice: using the JDK implementation was problematic after Java 9, because of new method overloads introduced in NIO buffer classes. This increased the number of methods involved and made it extremely easy to hit the maximum inlining level, which is 9 (!) by default in HotSpot, so any buffer returned by memSlice would automatically escape. The new implementation has an "inlining budget" of just 2 levels. Such optimizations have been applied to several other places, with struct & struct buffer instances benefiting the most. See the InlineTest benchmark for examples of garbage-free iteration over a struct buffer.

Uze · November 22, 2018, 20:44:16

Thank you for such a thorough answer and also for pointing me at JITWatch! Looks like I need to update my dependencies and do more checks.