Long enqueue times with OpenCL on Linux + NVIDIA

Started by mitchoaa, January 02, 2023, 23:47:23

I'm using LWJGL from Scala and enqueueing OpenCL kernels for fast array operations. This is not for a game; it is for training neural nets. I'm trying to speed up training and noticed that on my MacBook the kernel "enqueue" call accounts for about 0.5% of total time, but on a Linux machine with NVIDIA graphics cards it accounts for about 15%. I've also compared enqueue times in LWJGL with enqueue times in C, and the LWJGL ones are sometimes significantly longer, e.g. 300,000 ns vs 20,000 ns (comparing worst-case times).

I've tried a number of different ways to measure timing and tried tweaking some LWJGL configuration options, but I still can't figure out why the "enqueue" call is sometimes slow.

Does anyone have any tips for tracking down the cause of this slowness?


Hey mitchoaa,

The enqueue methods in LWJGL are direct calls to the OpenCL driver. There's nothing complicated going on, just plain JNI passing the Java arguments directly to the native function. Something else in the Scala program, which does not happen in the C version, must be causing this.

Also, when comparing Java vs C timings, make sure the Java code is sufficiently warmed up.
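
To make the warm-up point concrete, here is a minimal sketch of a timing harness that runs a call untimed for a while before collecting samples. It is only an illustration: the class `EnqueueTiming`, its methods, and the iteration counts are hypothetical names and arbitrary values, and the `Runnable` is a placeholder for wherever your actual kernel enqueue call happens.

```java
import java.util.Arrays;

final class EnqueueTiming {
    // Runs `call` `warmup` times untimed, then times `samples` further calls.
    // Returns the per-call durations in nanoseconds, sorted ascending.
    static long[] measure(Runnable call, int warmup, int samples) {
        for (int i = 0; i < warmup; i++) {
            call.run(); // untimed, gives the JIT a chance to compile the call path
        }
        long[] nanos = new long[samples];
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            call.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos;
    }

    // Middle element of an already sorted array (upper middle for even lengths).
    static long median(long[] sorted) {
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): replace with the real enqueue call.
        Runnable enqueue = () -> Math.sqrt(System.nanoTime());
        // Iteration counts are arbitrary choices for this sketch.
        long[] t = measure(enqueue, 20_000, 10_000);
        System.out.printf("median=%dns max=%dns%n", median(t), t[t.length - 1]);
    }
}
```

Looking at the median alongside the maximum helps separate a consistently slow call from occasional spikes, which matters here since the numbers being compared are worst-case times.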