The Windows DLL loader lock: how a Rust thread can hang your JVM
Introduction
Several weeks ago, we encountered a silent, sporadic hang in our Windows CI pipeline. After a deep investigation, we uncovered a deadlock that left processes completely frozen with no ability to extract a Java stack trace.
This blog post walks through our debugging journey and includes low-level details about the Java Virtual Machine's garbage collection, Rust's thread-local storage, the JNI (Java Native Interface) attachment protocol, and a core Windows kernel primitive known as the Loader Lock.
TL;DR:
- On Windows, the OS holds the process-wide Loader Lock during thread termination - which is exactly when Rust runs its TLS destructors.
- One of those TLS destructors, registered by jni-rs, tries to detach the thread from the JVM. The detach transitions the thread from "Native" to "VM" state, and because the GC is running, the transition blocks at the safepoint barrier. The Rust thread waits for the GC to unpark it.
- Simultaneously, the GC is waiting for a newly spawning Java thread to report in. That new thread cannot reach the safepoint: it is blocked in OS thread initialization, waiting for the Loader Lock held by the Rust thread.
The First Clues: A Local Reproducer and Thread Dumps
Our CI pipeline runs a suite of tests on Linux, macOS and Windows using Azure Pipelines. On Windows, we noticed that some test suites would occasionally hang until the job timed out.
My first reflex was to replicate the issue locally in order to gather more details. After a few attempts, the hang occurred, and I was able to capture a process dump.
With this process dump, I was able to extract native stacks using WinDbg and Java stacks using jhsdb. We found three clues:
- The main thread was stuck in GC:
```
"Time-limited test" #4053 daemon prio=5 tid=0x0000019183c06c30 nid=0x2270 waiting on condition [0x00000019f0afe000]
   java.lang.Thread.State: RUNNABLE
   JavaThread state: _thread_blocked
 - java.lang.Runtime.gc() @bci=0 (Compiled frame; information may be imprecise)
 - java.lang.System.gc() @bci=3, line=1907 (Compiled frame)
 - io.questdb.ServerMain.start(boolean) @bci=66, line=251 (Interpreted frame)
    - locked <0x0000000704bc3138> (a com.questdb.AbstractEntBootstrapTest$EntGriffinServerMain)
 - io.questdb.ServerMain.start() @bci=2, line=239 (Interpreted frame)
```
It was waiting for all threads to reach a "safepoint", a JVM mechanism for safely pausing threads during VM-level operations, including garbage collection.
- Several Rust Tokio worker threads were in their `on_thread_stop` hook, making a JNI call.
```
 53  Id: 4994.575c Suspend: 0 Teb: 00000019`f07a8000 Unfrozen "tokio-runtime-worker"
 #  Call Site
00  ntdll!NtWaitForSingleObject+0x14
01  KERNELBASE!WaitForSingleObjectEx+0x8e
02  jvm!XXX+0x1cb607
03  jvm!XXX+0x5e9b4
04  jvm!XXX+0x624b9
05  jvm!XXX+0x11a06f
06  qdb_ent14818614347342639976!jni::wrapper::jnienv::JNIEnv::call_method_unchecked<ref$<jni::wrapper::objects::global_ref::GlobalRef>,jni::wrapper::objects::jmethodid::JMethodID>+0xa681 [C:\w\.cargo\registry\src\index.crates.io-1949cf8c6b5b557f\jni-0.21.1\src\wrapper\macros.rs @ 86]
07  qdb_ent14818614347342639976!qdb_ent::call_method<tuple$<>,qdb_ent::call_void_method::closure_env$0>+0xf0 [C:\w\questdb-ent\rust\qdb-ent\src\lib.rs @ 56]
08  qdb_ent14818614347342639976!qdb_ent::call_void_method+0x49 [C:\w\questdb-ent\rust\qdb-ent\src\lib.rs @ 86]
09  qdb_ent14818614347342639976!qdb_ent::tokio::ThreadLifetimeListener::on_thread_stop+0x35 [C:\w\questdb-ent\rust\qdb-ent\src\tokio.rs @ 46]
0a  qdb_ent14818614347342639976!qdb_ent::tokio::Java_com_questdb_tokio_TokioRuntime_create::closure$4+0xe [C:\w\questdb-ent\rust\qdb-ent\src\tokio.rs @ 101]
```
Note: we have hidden some addresses with XXX because the debug symbols were
not available.
- There were several unnamed threads lying around.
Aside: What is a Safepoint?
A safepoint is a point in execution where a thread's state is fully describable to the JVM: all object references reside in known locations (registers, stack slots, or heap), and no heap mutation is in flight. The JVM can only perform certain global operations - most notably GC - when all mutator threads are stopped at a safepoint simultaneously. (Since JDK 10, thread-local handshakes allow some operations on individual threads, but GC still requires a global stop.)
The mechanism: the JVM pre-allocates two contiguous memory pages - a "bad" page (no access) and a "good" page (readable). The JIT compiler emits polling instructions at method returns and loop back-edges. To arm a safepoint, the VM switches threads' poll addresses from the good page to the bad page. Reading the bad page triggers a SIGSEGV (or access violation on Windows) that the JVM's signal handler catches and uses to block the thread at the safepoint barrier.
Threads executing native code via JNI are in a special "Native" state and don't poll - they're considered "safe" because they shouldn't hold direct object references. However, when a native thread transitions back to "VM" state (via any JNI call, including `DetachCurrentThread`), it must check the safepoint flag. If a safepoint is in progress, the thread blocks until the VM operation completes.
The Red Herring
The stack trace pointed to our Rust integration as the culprit, but I wasn't convinced. Most of the test suites that failed never touched Rust code.
They didn't spawn Tokio threads and didn't call native functions, yet the system was brought to a halt during their execution.
Furthermore, we had added extensive logging to our Rust teardown logic and to tokio to confirm that no threads from our tokio runtimes were lying around.
We added ProcDump (`procdump -ma`) to our CI runner to capture full memory dumps the moment a hang occurred.
The Breakthrough: Safepoint Timeout and the Loader Lock
Since we knew the main thread was stuck in System.gc(), we suspected a safepoint issue. While I wrestled with ProcDump, my colleague Jaromir Hamala managed to trigger a failure with safepoint timeouts enabled (JVM flags `-XX:+SafepointTimeout -XX:SafepointTimeoutDelay=60000`).
```
2025-12-04T23:08:45.7599743Z [71.871s][warning][safepoint]
2025-12-04T23:08:45.7600792Z [71.871s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected:
2025-12-04T23:08:45.7602055Z [71.871s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
2025-12-04T23:08:45.7603054Z [71.872s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint:
2025-12-04T23:08:45.7604092Z [71.872s][warning][safepoint] # "Thread-84" #105 daemon prio=5 os_prio=0 cpu=0.00ms elapsed=60.01s ... runnable
2025-12-04T23:08:45.7605091Z [71.872s][warning][safepoint]    java.lang.Thread.State: RUNNABLE
2025-12-04T23:08:45.7605880Z [71.872s][warning][safepoint]
2025-12-04T23:08:45.7607402Z [71.872s][warning][safepoint] # SafepointSynchronize::begin: (End of list)
2025-12-04T23:55:36.3930473Z ##[error]The task has timed out.
```
In this case, the GC required a safepoint, but one thread (Thread-84) was not
cooperating.
The 0 CPU usage combined with the RUNNABLE state was suspicious. It meant the
thread wasn't doing work, but it also wasn't waiting on a standard Java lock.
The thread was healthy from the JVM's perspective, but it was stuck somewhere
outside of Java.
Jaromir then stumbled upon JNA Issue #1479: Improper thread detaching causes deadlock in LDR on Windows 10+.
The loader lock is a per-process mutex that Windows holds during DLL loading/unloading and also while starting/terminating threads.
The Rust documentation actually warns about this. When a thread exits, it runs Thread Local Storage (TLS) destructors while holding the Loader Lock.
If those destructors try to interact with synchronization primitives, you may deadlock.
Based on this, Jaromir formed a theory about how the deadlock occurs:
- Rust thread exits: A Rust thread finishes execution. Windows begins thread teardown and acquires the Loader Lock.
- TLS destructors run: While holding the Loader Lock, the thread runs its TLS destructors. One destructor (from jni-rs) tries to detach the thread from the JVM.
- Detach triggers safepoint check: The JNI detach call transitions the thread from "Native" to "VM" state. This transition requires a safepoint poll.
- GC blocks the Rust thread: The GC is already running, so the Rust thread blocks at the safepoint barrier, waiting to be unparked when GC completes.
- GC waits for new Java threads: The GC cannot complete until all threads reach the safepoint. Some newly spawning Java threads haven't reported in yet.
- New threads blocked on Loader Lock: These new Java threads are stuck in OS initialization - Windows blocks them because they need the Loader Lock, which the Rust thread still holds.
The cycle is complete: The Rust thread waits for GC, GC waits for new threads, new threads wait for the Loader Lock, which is held by the Rust thread.
Aside: How Does the JVM Start Threads on Windows?
Understanding Windows thread creation is crucial to this deadlock. The JVM doesn't simply call `CreateThread` and let it run. Instead, it uses a two-phase approach that creates a window for state divergence between the JVM and the OS.

Phase 1: Suspended Creation. The JVM creates threads using the `CREATE_SUSPENDED` flag:

```cpp
const unsigned initflag = CREATE_SUSPENDED | STACK_SIZE_PARAM_IS_A_RESERVATION;
thread_handle = (HANDLE)_beginthreadex(nullptr,
                                       (unsigned)stack_size,
                                       &thread_native_entry,
                                       thread,
                                       initflag,
                                       &thread_id);
```

This allocates all OS resources for the thread, but the thread doesn't execute yet. The JVM marks the internal state as `INITIALIZED`.

Phase 2: The Dangerous Handoff. When it's time to start the thread, `os::start_thread` does two things in sequence:

```cpp
void os::start_thread(Thread* thread) {
  OSThread* osthread = thread->osthread();
  osthread->set_state(RUNNABLE); // 1. Mark as RUNNABLE
  pd_start_thread(thread);       // 2. Then call ResumeThread()
}
```

The state is set to `RUNNABLE` before the thread actually resumes. This happens in the parent thread's context.

The Gap. After `ResumeThread()` is called, the new thread is scheduled by Windows, but it must first acquire the Loader Lock to complete DLL initialization before user code runs. The OSThread state documentation acknowledges this gap: "Has been started and is runnable, but not necessarily running."

This creates a critical window: the JVM considers the thread `RUNNABLE` and expects it to reach a safepoint, but the thread may be blocked at the OS level, waiting for the Loader Lock. If another thread holds that lock indefinitely (say, while blocked on a JVM safepoint), the new thread can never make progress - and the JVM will wait forever for a thread that appears healthy but is actually frozen in kernel space.
The 10-Second Mystery
The theory was solid, but it didn't explain the "Pure Java" crashes. Why was this happening in tests that didn't use Rust?
I finally managed to capture a valid ProcDump (-ma for a full dump) from a
hanging CI runner during one of these "unrelated" tests.
I opened the dump, expecting to see only Java threads. Instead, I found Tokio worker threads being destroyed.
But how? These tests didn't use our Rust module.
Let's look at one of these Tokio threads:
```
 34  Id: 2ac.a68 Suspend: 0 Teb: 0000003f`d963a000 Unfrozen "opendal-tokio-worker-18"
 #   Call Site
00 : ntdll!NtWaitForAlertByThreadId+0x14
...
0a : ntdll!ImageTlsCallbackCaller+0x1a
0b : ntdll!LdrpCallInitRoutine+0x6b
0c : ntdll!LdrpCallTlsInitializers+0xc5   <-- THE SMOKING GUN (TLS DESTRUCTOR)
0d : ntdll!LdrShutdownThread+0x14e
0e : ntdll!RtlExitUserThread+0x3e
0f : kernel32!BaseThreadInitThunk+0x19
10 : ntdll!RtlUserThreadStart+0x2b
```
The culprit was OpenDAL; we use its Java library in other parts of the codebase.
It turns out that if you don't provide an AsyncExecutor (which is nothing more
than a wrapper around a Tokio runtime) to OpenDAL's Java bindings, it will
create a default Tokio runtime in a global static variable. When our
Rust-heavy tests finished, they cleaned up their resources, but the OpenDAL
global runtime persisted even with tests that didn't use it.
```rust
// bindings/java/src/executor.rs
static mut RUNTIME: OnceLock<Executor> = OnceLock::new();
```
Jaromir noticed a peculiar pattern: there was exactly a 10-second delay between the Rust-related tests and the "unrelated" failures.
I dug into the Tokio source code. While the main thread pool lives forever, blocking threads have an idle timeout:
```rust
// DISCLAIMER: this is a simplified version of the actual Tokio code
const KEEP_ALIVE: Duration = Duration::from_secs(10);

impl Inner {
    fn run(&self, worker_thread_id: usize) {
        if let Some(f) = &self.after_start {
            f(); // call the on_thread_start hook
        }
        // initialization code...
        'main: loop {
            // running tasks...
            while !shared.shutdown {
                let (guard, timeout_result) =
                    self.condvar.wait_timeout(shared, self.keep_alive).unwrap();
                shared = guard;
                // check that no work arrived
                if !shared.shutdown && timeout_result.timed_out() {
                    // No work arrived within the keep-alive duration, exit the thread.
                    break 'main;
                }
            }
        }
        // uninitialization code...
        if let Some(f) = &self.before_stop {
            f(); // call the on_thread_stop hook
        }
    }
}
```
As you can see, if a blocking thread is idle for 10 seconds, it exits.
The sequence:
- Tests using OpenDAL run, performing blocking I/O and spawning blocking threads
- Tests finish, OpenDAL resources are cleaned up
- The global runtime stays alive (it's static)
- 10 seconds pass
- OpenDAL's Tokio runtime reaps idle blocking threads
- An unrelated test happens to start Java threads and trigger GC at that moment
- Deadlock
The "unrelated" tests were only unrelated in terms of direct function calls. They were tragically related by timing and global shared state. Knowing this, we were confident in our theory and were able to move on to the fix.
Fixing the issue
Ultimately, we believe this is a design flaw in the jni-rs library. The
library relies on thread_local! destructors to automatically detach the thread
from the JVM. While convenient, this is dangerous on Windows because TLS
destructors run while the OS holds the Loader Lock.
If you try to grab a JVM lock (which happens during detachment) while holding the Loader Lock, and the JVM tries to grab a thread lock (during GC) while holding the JVM lock, you get a classic lock inversion.
The fix is straightforward: instead of relying on automatic detachment during TLS destruction, we explicitly detach threads, which clears the thread_local! variable before TLS destructors run. Fortunately, Tokio provides an on_thread_stop callback that runs while the thread is still in a normal state, before the OS acquires the Loader Lock.
We patched our usage to explicitly handle detachment:
```rust
thread_local! {
    // We store the JNIEnv in a thread-local variable called ENV
    static ENV: RefCell<Option<JNIEnv<'static>>> = RefCell::new(None);
}

let stop_vm = env.get_java_vm().expect("The Java VM is not available");
let runtime = tokio::runtime::Builder::new_multi_thread()
    .on_thread_stop(move || {
        // We take the JNIEnv from the thread local storage manually
        ENV.take();
        unsafe {
            // SAFETY: we've dropped the JNIEnv reference already
            // Explicitly detach while we are still "safe"
            stop_vm.detach_current_thread();
        }
    })
    .build()
    .unwrap();
```
We've opened issues to address this upstream:
We also want to thank the maintainers of both projects for their quick responses, particularly rib who fixed the issue in version 0.22.0.
What we learned
Debugging is often about finding the one assumption you made that was wrong. For me, it was assuming that "Unrelated Tests" were actually unrelated.
But deeper than that, it highlights the dangers of abstraction. jni-rs tried
to be helpful by auto-detaching threads, but that convenience hid a critical
OS-level constraint.
Tools that helped
- ProcDump (`procdump -ma`): Captures full memory dumps
- WinDbg: Native stack analysis
- jhsdb (`jhsdb jstack --exe <java bin path> --core <dump>`): Java stacks from dumps
- Safepoint timeout logging (`-XX:+SafepointTimeout -XX:SafepointTimeoutDelay=60000`): Shows which threads aren't cooperating with GC
The key was combining native and Java stack analysis. The problem lived at the boundary between the two runtimes, and you needed both views to see it.
Wrapping up
Four technologies intersected to create this bug: Java's safepoint mechanism, Rust's TLS destructors, JNI's thread attachment, and Windows' loader lock. Each one is well-documented individually. The interaction between them, less so.
If you're using JNI with Rust on Windows: don't let thread detachment happen during TLS destruction. Detach explicitly, while you still can.