The Windows DLL loader lock: how a Rust thread can hang your JVM
Introduction
Several weeks ago, we encountered a silent, sporadic hang in our Windows CI pipeline. After a deep investigation, we uncovered a deadlock that left processes completely frozen with no ability to extract a Java stack trace.
This blog post walks through our debugging journey and includes low-level details about the Java Virtual Machine's garbage collection, Rust's thread-local storage, the JNI (Java Native Interface) attachment protocol, and a core Windows kernel primitive known as the Loader Lock.
TL;DR:
- On Windows, the OS holds the process-wide Loader Lock during thread termination - which is exactly when Rust runs its TLS destructors.
- One of those TLS destructors, registered by jni-rs, tries to detach the thread from the JVM. The detach transitions the thread from "Native" to "VM" state, and because the GC is running, the transition blocks at the safepoint barrier. The Rust thread waits for the GC to unpark it.
- Simultaneously, the GC is waiting for a newly spawning Java thread to report in. That new thread cannot reach the safepoint: it is blocked in OS thread initialization, waiting for the Loader Lock held by the Rust thread.
The First Clues: A Local Reproducer and Thread Dumps
Our CI pipeline runs a suite of tests on Linux, macOS and Windows using Azure Pipelines. On Windows, we noticed that some test suites would occasionally hang until the job timed out.
My first reflex was to replicate the issue locally in order to gather more details. After a few attempts, the hang occurred, and I was able to capture a process dump.
With this process dump, I was able to extract native stacks using WinDbg and Java stacks using jhsdb. We found three clues:
- The main thread was stuck in GC:
```
"Time-limited test" #4053 daemon prio=5 tid=0x0000019183c06c30 nid=0x2270 waiting on condition [0x00000019f0afe000]
   java.lang.Thread.State: RUNNABLE
   JavaThread state: _thread_blocked
 - java.lang.Runtime.gc() @bci=0 (Compiled frame; information may be imprecise)
 - java.lang.System.gc() @bci=3, line=1907 (Compiled frame)
 - io.questdb.ServerMain.start(boolean) @bci=66, line=251 (Interpreted frame)
    - locked <0x0000000704bc3138> (a com.questdb.AbstractEntBootstrapTest$EntGriffinServerMain)
 - io.questdb.ServerMain.start() @bci=2, line=239 (Interpreted frame)
```
It was waiting for all threads to reach a "safepoint", a JVM mechanism for safely pausing threads during VM-level operations, including garbage collection.
- Several Rust Tokio worker threads were in their `on_thread_stop` hook, making a JNI call.
```
 53  Id: 4994.575c Suspend: 0 Teb: 00000019`f07a8000 Unfrozen "tokio-runtime-worker"
 #  Call Site
00  ntdll!NtWaitForSingleObject+0x14
01  KERNELBASE!WaitForSingleObjectEx+0x8e
02  jvm!XXX+0x1cb607
03  jvm!XXX+0x5e9b4
04  jvm!XXX+0x624b9
05  jvm!XXX+0x11a06f
06  qdb_ent14818614347342639976!jni::wrapper::jnienv::JNIEnv::call_method_unchecked<ref$<jni::wrapper::objects::global_ref::GlobalRef>,jni::wrapper::objects::jmethodid::JMethodID>+0xa681 [C:\w\.cargo\registry\src\index.crates.io-1949cf8c6b5b557f\jni-0.21.1\src\wrapper\macros.rs @ 86]
07  qdb_ent14818614347342639976!qdb_ent::call_method<tuple$<>,qdb_ent::call_void_method::closure_env$0>+0xf0 [C:\w\questdb-ent\rust\qdb-ent\src\lib.rs @ 56]
08  qdb_ent14818614347342639976!qdb_ent::call_void_method+0x49 [C:\w\questdb-ent\rust\qdb-ent\src\lib.rs @ 86]
09  qdb_ent14818614347342639976!qdb_ent::tokio::ThreadLifetimeListener::on_thread_stop+0x35 [C:\w\questdb-ent\rust\qdb-ent\src\tokio.rs @ 46]
0a  qdb_ent14818614347342639976!qdb_ent::tokio::Java_com_questdb_tokio_TokioRuntime_create::closure$4+0xe [C:\w\questdb-ent\rust\qdb-ent\src\tokio.rs @ 101]
```
Note: we have hidden some addresses with XXX because the debug symbols were
not available.
- There were several unnamed threads lying around.
Aside: What is a Safepoint?
A safepoint is a point in execution where a thread's state is fully describable to the JVM: all object references reside in known locations (registers, stack slots, or heap), and no heap mutation is in flight. The JVM can only perform certain global operations - most notably GC - when all mutator threads are stopped at a safepoint simultaneously. (Since JDK 10, thread-local handshakes allow some operations on individual threads, but GC still requires a global stop.)
The mechanism: the JVM pre-allocates two contiguous memory pages - a "bad" page (no access) and a "good" page (readable). The JIT compiler emits polling instructions at method returns and loop back-edges. To arm a safepoint, the VM switches threads' poll addresses from the good page to the bad page. Reading the bad page triggers a SIGSEGV (or access violation on Windows) that the JVM's signal handler catches and uses to block the thread at the safepoint barrier.
Threads executing native code via JNI are in a special "Native" state and don't poll - they're considered "safe" because they shouldn't hold direct object references. However, when a native thread transitions back to "VM" state (via any JNI call, including `DetachCurrentThread`), it must check the safepoint flag. If a safepoint is in progress, the thread blocks until the VM operation completes.
The Red Herring
The stack trace pointed to our Rust integration as the culprit, but I wasn't convinced. Most of the test suites that failed never touched Rust code.
They didn't spawn Tokio threads and didn't call native functions, yet the system was brought to a halt during their execution.
Furthermore, we had added extensive logging to our Rust teardown logic and to tokio to confirm that no threads from our tokio runtimes were lying around.
We added ProcDump (`procdump -ma`) to our CI runner to capture full memory dumps the moment a hang occurred.
The Breakthrough: Safepoint Timeout and the Loader Lock
Since we knew the main thread was stuck in System.gc(), we suspected a safepoint issue. While I wrestled with ProcDump, my colleague Jaromir Hamala managed to trigger a failure with safepoint timeouts enabled (JVM flags `-XX:+SafepointTimeout -XX:SafepointTimeoutDelay=60000`).
```
2025-12-04T23:08:45.7599743Z [71.871s][warning][safepoint]
2025-12-04T23:08:45.7600792Z [71.871s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected:
2025-12-04T23:08:45.7602055Z [71.871s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
2025-12-04T23:08:45.7603054Z [71.872s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint:
2025-12-04T23:08:45.7604092Z [71.872s][warning][safepoint] # "Thread-84" #105 daemon prio=5 os_prio=0 cpu=0.00ms elapsed=60.01s ... runnable
2025-12-04T23:08:45.7605091Z [71.872s][warning][safepoint]    java.lang.Thread.State: RUNNABLE
2025-12-04T23:08:45.7605880Z [71.872s][warning][safepoint]
2025-12-04T23:08:45.7607402Z [71.872s][warning][safepoint] # SafepointSynchronize::begin: (End of list)
2025-12-04T23:55:36.3930473Z ##[error]The task has timed out.
```
In this case, the GC required a safepoint, but one thread (Thread-84) was not
cooperating.
The 0 CPU usage combined with the RUNNABLE state was suspicious. It meant the
thread wasn't doing work, but it also wasn't waiting on a standard Java lock.
The thread was healthy from the JVM's perspective, but it was stuck somewhere
outside of Java.
Jaromir then stumbled upon JNA Issue #1479: Improper thread detaching causes deadlock in LDR on Windows 10+.
The loader lock is a per-process mutex that Windows holds during DLL loading/unloading and also while starting/terminating threads.
The Rust documentation actually warns about this. When a thread exits, it runs Thread Local Storage (TLS) destructors while holding the Loader Lock.
If those destructors try to interact with synchronization primitives, you may deadlock.
Based on this, Jaromir formed a theory about how the deadlock occurs:
- Rust thread exits: A Rust thread finishes execution. Windows begins thread teardown and acquires the Loader Lock.
- TLS destructors run: While holding the Loader Lock, the thread runs its TLS destructors. One destructor (from jni-rs) tries to detach the thread from the JVM.
- Detach triggers safepoint check: The JNI detach call transitions the thread from "Native" to "VM" state. This transition requires a safepoint poll.
- GC blocks the Rust thread: The GC is already running, so the Rust thread blocks at the safepoint barrier, waiting to be unparked when GC completes.
- GC waits for new Java threads: The GC cannot complete until all threads reach the safepoint. Some newly spawning Java threads haven't reported in yet.
- New threads blocked on Loader Lock: These new Java threads are stuck in OS initialization - Windows blocks them because they need the Loader Lock, which the Rust thread still holds.
The cycle is complete: The Rust thread waits for GC, GC waits for new threads, new threads wait for the Loader Lock, which is held by the Rust thread.
Aside: How Does the JVM Start Threads on Windows?
Understanding Windows thread creation is crucial to this deadlock. The JVM doesn't simply call `CreateThread` and let it run. Instead, it uses a two-phase approach that creates a window for state divergence between the JVM and the OS.

Phase 1: Suspended Creation. The JVM creates threads using the `CREATE_SUSPENDED` flag:

```cpp
const unsigned initflag = CREATE_SUSPENDED | STACK_SIZE_PARAM_IS_A_RESERVATION;
thread_handle = (HANDLE)_beginthreadex(nullptr,
                                       (unsigned)stack_size,
                                       &thread_native_entry,
                                       thread,
                                       initflag,
                                       &thread_id);
```

This allocates all OS resources for the thread, but the thread doesn't execute yet. The JVM marks the internal state as `INITIALIZED`.

Phase 2: The Dangerous Handoff. When it's time to start the thread, `os::start_thread` does two things in sequence:

```cpp
void os::start_thread(Thread* thread) {
  OSThread* osthread = thread->osthread();
  osthread->set_state(RUNNABLE); // 1. Mark as RUNNABLE
  pd_start_thread(thread);       // 2. Then call ResumeThread()
}
```

The state is set to `RUNNABLE` before the thread actually resumes. This happens in the parent thread's context.

The Gap. After `ResumeThread()` is called, the new thread is scheduled by Windows, but it must first acquire the Loader Lock to complete DLL initialization before user code runs. The OSThread state documentation acknowledges this gap: "Has been started and is runnable, but not necessarily running."

This creates a critical window: the JVM considers the thread `RUNNABLE` and expects it to reach a safepoint, but the thread may be blocked at the OS level, waiting for the Loader Lock. If another thread holds that lock indefinitely (say, while blocked on a JVM safepoint), the new thread can never make progress - and the JVM will wait forever for a thread that appears healthy but is actually frozen in kernel space.
The 10-Second Mystery
The theory was solid, but it didn't explain the "Pure Java" crashes. Why was this happening in tests that didn't use Rust?
I finally managed to capture a valid ProcDump (-ma for a full dump) from a
hanging CI runner during one of these "unrelated" tests.
I opened the dump, expecting to see only Java threads. Instead, I found Tokio worker threads being destroyed.
But how? These tests didn't use our Rust module.
Let's look at one of these Tokio threads:
```
 34  Id: 2ac.a68 Suspend: 0 Teb: 0000003f`d963a000 Unfrozen "opendal-tokio-worker-18"
 #   Call Site
00 : ntdll!NtWaitForAlertByThreadId+0x14
...
0a : ntdll!ImageTlsCallbackCaller+0x1a
0b : ntdll!LdrpCallInitRoutine+0x6b
0c : ntdll!LdrpCallTlsInitializers+0xc5   <-- THE SMOKING GUN (TLS DESTRUCTOR)
0d : ntdll!LdrShutdownThread+0x14e
0e : ntdll!RtlExitUserThread+0x3e
0f : kernel32!BaseThreadInitThunk+0x19
10 : ntdll!RtlUserThreadStart+0x2b
```
The culprit was OpenDAL; we use its Java library in other parts of the codebase.
It turns out that if you don't provide an AsyncExecutor (which is nothing more
than a wrapper around a Tokio runtime) to OpenDAL's Java bindings, it will
create a default Tokio runtime in a global static variable. When our
Rust-heavy tests finished, they cleaned up their resources, but the OpenDAL
global runtime persisted even with tests that didn't use it.
```rust
// bindings/java/src/executor.rs
static mut RUNTIME: OnceLock<Executor> = OnceLock::new();
```
Jaromir noticed a peculiar pattern: there was exactly a 10-second delay between the Rust-related tests and the "unrelated" failures.
I dug into the Tokio source code. While the main thread pool lives forever, blocking threads have an idle timeout:
```rust
// DISCLAIMER: this is a simplified version of the actual Tokio code
const KEEP_ALIVE: Duration = Duration::from_secs(10);

impl Inner {
    fn run(&self, worker_thread_id: usize) {
        if let Some(f) = &self.after_start {
            f(); // call the on_thread_start hook
        }
        // initialization code...
        'main: loop {
            // running tasks...
            while !shared.shutdown {
                let (guard, timeout_result) =
                    self.condvar.wait_timeout(shared, self.keep_alive).unwrap();
                shared = guard;
                // check that no work arrived
                if !shared.shutdown && timeout_result.timed_out() {
                    // No work arrived within the keep-alive duration, exit the thread.
                    break 'main;
                }
            }
        }
        // uninitialization code...
        if let Some(f) = &self.before_stop {
            f(); // call the on_thread_stop hook
        }
    }
}
```
As you can see, if a blocking thread is idle for 10 seconds, it exits.
The sequence:
- Tests using OpenDAL run, performing blocking I/O and spawning blocking threads
- Tests finish, OpenDAL resources are cleaned up
- The global runtime stays alive (it's static)
- 10 seconds pass
- OpenDAL's Tokio runtime reaps idle blocking threads
- An unrelated test happens to start Java threads and trigger GC at that moment
- Deadlock
The "unrelated" tests were only unrelated in terms of direct function calls. They were tragically related by timing and global shared state. Knowing this, we were confident in our theory and were able to move on to the fix.
Fixing the issue
Ultimately, we believe this is a design flaw in the jni-rs library. The
library relies on thread_local! destructors to automatically detach the thread
from the JVM. While convenient, this is dangerous on Windows because TLS
destructors run while the OS holds the Loader Lock.
If you try to grab a JVM lock (which happens during detachment) while holding the Loader Lock, and the JVM tries to grab a thread lock (during GC) while holding the JVM lock, you get a classic lock inversion.
The fix is straightforward: instead of relying on automatic detachment during TLS destruction, we explicitly detach threads, which clears the thread_local! variable before TLS destructors run. Fortunately, Tokio provides an on_thread_stop callback that runs while the thread is still in a normal state, before the OS acquires the Loader Lock.
We patched our usage to explicitly handle detachment:
```rust
thread_local! {
    // We store the JNIEnv in a thread-local variable called ENV
    static ENV: RefCell<Option<JNIEnv<'static>>> = RefCell::new(None);
}

let stop_vm = env.get_java_vm().expect("The Java VM is not available");
let runtime = tokio::runtime::Builder::new_multi_thread()
    .on_thread_stop(move || {
        // We take the JNIEnv from the thread local storage manually
        ENV.take();
        unsafe {
            // SAFETY: we've dropped the JNIEnv reference already
            // Explicitly detach while we are still "safe"
            stop_vm.detach_current_thread();
        }
    })
    .build()
    .unwrap();
```
We've opened issues to address this upstream:
We also want to thank the maintainers of both projects for their quick responses, particularly rib who fixed the issue in version 0.22.0.
What we learned
Debugging is often about finding the one assumption you made that was wrong. For me, it was assuming that "Unrelated Tests" were actually unrelated.
But deeper than that, it highlights the dangers of abstraction. jni-rs tried
to be helpful by auto-detaching threads, but that convenience hid a critical
OS-level constraint.
Tools that helped
- ProcDump (`procdump -ma`): Captures full memory dumps
- WinDbg: Native stack analysis
- jhsdb (`jhsdb jstack --exe <java bin path> --core <dump>`): Java stacks from dumps
- Safepoint timeout logging (`-XX:+SafepointTimeout -XX:SafepointTimeoutDelay=60000`): Shows which threads aren't cooperating with GC
The key was combining native and Java stack analysis. The problem lived at the boundary between the two runtimes, and you needed both views to see it.
Wrapping up
Four technologies intersected to create this bug: Java's safepoint mechanism, Rust's TLS destructors, JNI's thread attachment, and Windows' loader lock. Each one is well-documented individually. The interaction between them, less so.
If you're using JNI with Rust on Windows: don't let thread detachment happen during TLS destruction. Detach explicitly, while you still can.