The Mystery of the Phantom Quote in My CI Builds
Working on distributed systems at Hazelcast and now at QuestDB, I've debugged my share of race conditions - the kind involving network partitions, timing issues, and subtle concurrency bugs. But the bug I hit last week was different.
A CI build was failing randomly, all because of a single, phantom single-quote ('). The root cause? Not a typo, not a dangling string, but a bizarre race condition... between stdout and stderr.
The Setup: A "Simple" CI Step
My goal was straightforward: speed up our Rust builds in Azure Pipelines using sccache. The plan was simple:
- Create a Bash script step to run on our macOS and Windows builders.
- The script would check if
sccachewas already installed. - If not, it would download the correct release from GitHub, unpack it, and move it to a known path.
- Finally, it would set the
RUSTC_WRAPPERenvironment variable for subsequent steps using the##vso[task.setvariable]logging command.
A quick note on
##vso[task.setvariable]: This is an Azure Pipelines logging command - a special string that the agent watches for in your script's output (on both stdout and stderr). When it sees this pattern, it parses the command and sets an environment variable that subsequent steps in the pipeline can use. It's how you pass values between different tasks in a pipeline. You simply echo the command, and the agent does the rest.
Here's what the pipeline step looked like:
- bash: |set -euxSCCACHE_VERSION="0.10.0"export PATH="$(CARGO_INSTALL_PATH)/bin:$PATH"if [ "$(uname)" == "Darwin" ]; thenSCCACHE_BINARY="sccache"ARCHIVE="sccache-v${SCCACHE_VERSION}-x86_64-apple-darwin.tar.gz"elseSCCACHE_BINARY="sccache.exe"ARCHIVE="sccache-v${SCCACHE_VERSION}-x86_64-pc-windows-msvc.tar.gz"fiif [ -f "$(CARGO_INSTALL_PATH)/bin/${SCCACHE_BINARY}" ]; thenecho "sccache already installed at: $(CARGO_INSTALL_PATH)/bin"elseecho "Installing sccache ${SCCACHE_VERSION} for $(uname)..."# ... logic to curl, tar, and mv the binary ...echo "sccache installed successfully"fisccache --versionecho "##vso[task.setvariable variable=RUSTC_WRAPPER]$(CARGO_INSTALL_PATH)/bin/${SCCACHE_BINARY}"displayName: "Install sccache (Windows and macOS)"condition: or(eq(variables['os'], 'Windows'), eq(variables['os'], 'macOS'))env:SCCACHE_AZURE_BLOB_CONTAINER: $(RUST_CACHE_CONTAINER)SCCACHE_AZURE_CONNECTION_STRING: $(RUST_CACHE_CONNECTION_STRING)
This seemed perfectly fine. It ran. It installed sccache. The logs showed sccache --version succeeding.
And then, about 10% of the time, the build would inexplicably fail.
The Ghost: A Flaky "No such file or directory" Error
The sccache install step would pass, but the very next step, our Maven build, would collapse with this error:
[INFO] Running: cargo build --target-dir ... --release[INFO] error: could not execute process `/Users/runner/cargo-install/bin/sccache' /Users/runner/.rustup/toolchains/1.90-x86_64-apple-darwin/bin/rustc -vV` (never executed)[INFO][INFO] Caused by:[INFO] No such file or directory (os error 2)
I stared at that error for ages. I added ls commands. I double-checked paths. The file was always there.
The install step always reported success. How could the file exist one second and "not exist" the next?
Then I finally saw it. I must have read that line a hundred times, but my brain filtered it out. Look at the path in the error message. Really look at it.
error: [...] `/Users/runner/cargo-install/bin/sccache' [...]`
See it?
/Users/runner/cargo-install/bin/sccache'
There's a phantom single quote at the end of the path.
The system wasn't looking for sccache. It was looking for a file literally named sccache', which, of course, did not exist.
The Hunt: stdout vs. stderr
My script didn't add a quote. The echo command was clean. So where was it coming from? And why only sometimes? The "sometimes" was the key. This wasn't a simple typo; it was a race condition.
After a lot of digging, I found Azure Pipelines Task issue #10331, which confirmed my suspicions.
The culprit is a fascinating, unintentional collision between two completely separate features: Bash's xtrace and the Azure Pipelines agent's log parser.
Here’s the precise root cause:
- Bash Does Its Job: My script uses
set -x. This option tells Bash to print a trace of each command before executing it. This trace is correctly sent to stderr and is designed for human debugging. To make the output unambiguous, Bash'sxtraceintentionally quotes arguments that contain special characters. - The Agent is Overly "Helpful": The Azure Pipelines agent scans both stdout and stderr for special logging commands like
##vso[task.setvariable ...]. - The Race Condition: When my script runs the echo command, it generates output on both streams simultaneously:
stdout(the actual output):##vso[task.setvariable variable=RUSTC_WRAPPER]/Users/runner/cargo-install/bin/sccachestderr(the debug trace):+ echo '##vso[task.setvariable variable=RUSTC_WRAPPER]/Users/runner/cargo-install/bin/sccache'
Notice how Bash wraps the entire argument in single quotes on stderr - this is normal xtrace behavior. The agent's parser finds the ##vso[task.setvariable marker in both streams and processes both lines. Whichever line is processed last wins. When the parser extracts the value from the stderr line, it includes everything after variable=RUSTC_WRAPPER] up to the end of the line - including the trailing ' that Bash added for quoting. So the variable gets set to /Users/runner/cargo-install/bin/sccache' instead of /Users/runner/cargo-install/bin/sccache. Whether the build succeeds or fails depends entirely on which stream the agent happens to process last.
stdout: ##vso[...]/path/sccache <-─┐├-> Agent Parser (RACE!)stderr: + echo '##vso[...]/path/sccache' <-─┘
Why does this race exist? Looking at the Azure Pipelines agent source code,
both stdout and stderr events are wired to the same handler. When both streams emit ##vso markers simultaneously, both events can fire concurrently with no synchronization between them. Whichever finishes last wins.
I consider myself fairly experienced with concurrent programming, but finding a race condition between stderr and stdout in a simple CI script was a new one for me.
The Fix: Silencing stderr
The solution is wonderfully simple now that we know the problem. We just need to make sure set -x is turned off for the echo command, so it never writes its trace to stderr at all.
Since the echo is the last command in the script anyway, the fix is straightforward:
Before:
sccache --versionecho "##vso[task.setvariable variable=RUSTC_WRAPPER]$(CARGO_INSTALL_PATH)/bin/${SCCACHE_BINARY}"
After:
sccache --versionset +xecho "##vso[task.setvariable variable=RUSTC_WRAPPER]$(CARGO_INSTALL_PATH)/bin/${SCCACHE_BINARY}"
By calling set +x right before the echo, we turn off tracing for that final command. The stderr stream remains silent, and the agent only ever sees the one, clean ##vso line on stdout.
The phantom quote was banished, and our builds have been stable ever since.
The Takeaway
This wasn't a bug in Azure Pipelines or Bash - both were working exactly as designed. The race condition emerged from a surprising interaction between two legitimate features: Bash's debug tracing and the agent's log parsing. Someone more experienced with Azure Pipelines might have immediately recognized the danger of combining set -x with ##vso commands, but I didn't.
The broader lesson? Complex systems create unexpected emergent behaviors. When tools compose in non-obvious ways, even simple scripts can exhibit race conditions. If you ever see flaky behavior in your pipelines - especially with variables that seem to have extra characters - consider whether debug output on stderr might be interfering with special commands that your CI system is parsing.