Highly Available Reads with QuestDB
QuestDB is designed for high performance and reliability, and under normal circumstances, a single node can handle substantial load with ease. But in production environments, even the best systems can face disruptions: network issues, storage limits, power outages, or routine maintenance can all cause a node to go offline.
If you're building mission-critical or latency-sensitive applications, you want to make sure that read queries don't fail just because a single node is down. QuestDB Enterprise supports built-in replication, so you can distribute read traffic across replicas and keep queries running even during outages.
PostgreSQL-compatible clients, by default, don't have visibility into replication state. This applies to both PostgreSQL and QuestDB. If a node goes down, the client won't automatically fail over unless you configure it to.
You can handle this at the infrastructure level using DNS failover, floating IPs, or a proxy between your servers and clients. But there's no need to complicate things on the server side. Most Postgres clients already support multiple hosts in the connection string, making it easy to implement client-side failover.
In this post, you'll learn how to implement high-availability reads using standard PostgreSQL clients and basic reconnect logic. If you don’t have a replicated cluster yet, you can simulate one locally and walk through examples in several languages.
Overview of replication using QuestDB Enterprise
In QuestDB Enterprise, replicas stay in sync automatically, using an object store to propagate any DDL, DML, WAL ingestion, or authorization change to all the replicas in the cluster. Adding or removing replicas, and having them catch up to a consistent state, is fully automated. Your entire topology behaves as a single logical database.
✅ You change a setting? It's updated everywhere.
✅ You create a table? It's replicated.
✅ You ingest via ILP? All replicas see it.
No extra plumbing needed.
Client applications can use either the REST API or the PostgreSQL protocol to query data from any node in the cluster. If a node is not responding, the client needs to know to connect to the next available node. When using the PostgreSQL protocol, we can take advantage of the multi-host feature supported by most clients.
Starting the demo nodes
We recommend using your QuestDB Enterprise cluster, configured with at least one replica, to run this demo.
Simulating failover locally using QuestDB OSS (no replication)
If you'd rather not test against your production environment, or if you prefer to run the demo from a local development machine, you can simulate failover by creating a fake cluster with Docker, starting several instances of QuestDB Open Source listening on different ports.
We are not doing any kind of replication in this case, as the instances are completely unaware of each other, but since we only want to see how a client application can seamlessly fail over reads, this simplified setup will do.
We’ll start three independent QuestDB instances on a single machine using Docker. Each listens on its own HTTP and Postgres port. We’ll pass a unique environment variable (QDB_CAIRO_WAL_TEMP_PENDING_RENAME_TABLE_PREFIX) so we can distinguish nodes later:
NOTE
This is not replication. These containers are completely independent. No writes or schema changes are shared between them. This setup is only meant to simulate failover for the client retry demo.
docker run --name primary -d --rm \
  -e QDB_CAIRO_WAL_TEMP_PENDING_RENAME_TABLE_PREFIX=temp_primary \
  -p 9000:9000 -p 8812:8812 questdb/questdb

docker run --name replica1 -d --rm \
  -e QDB_CAIRO_WAL_TEMP_PENDING_RENAME_TABLE_PREFIX=temp_replica1 \
  -p 9001:9000 -p 8813:8812 questdb/questdb

docker run --name replica2 -d --rm \
  -e QDB_CAIRO_WAL_TEMP_PENDING_RENAME_TABLE_PREFIX=temp_replica2 \
  -p 9002:9000 -p 8814:8812 questdb/questdb
To verify which node you’re connected to, run:
select value from (show parameters)
where property_path IN (
  'replication.role',
  'cairo.wal.temp.pending.rename.table.prefix'
)
limit 1;
In QuestDB Enterprise, the output of the command will be primary or replica, as read from the replication.role parameter. If using the fake cluster via containers, the output will be temp_primary, temp_replica1, and temp_replica2.
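If you have the psql client around, a quick way to run this check against each node is to point it at the different Postgres ports (the ports and the default admin/quest credentials below match the demo setup; adjust them for your own cluster):

PGPASSWORD=quest psql -h localhost -p 8812 -U admin -d qdb \
  -c "select value from (show parameters) where property_path IN ('replication.role', 'cairo.wal.temp.pending.rename.table.prefix') limit 1;"

Repeat with -p 8813 and -p 8814 to confirm that each instance reports its own prefix.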
Now that you have several instances to test failover, let’s look at how to build client-side retry logic.
How HA reads work
PostgreSQL clients in most languages support the standard libpq way of passing multiple hosts in the connection string. This lets the driver connect to the first available host but, as in libpq itself, there is no automatic reconnection if that host later dies.
To make reads highly available, you need two things:
- A list of nodes in the connection string (see the example below).
- Retry logic to catch failures and connect to the next available node.
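What this looks like depends on the driver, but with libpq-style clients the list of nodes goes straight into the connection string. For example, a connection URI covering the three local demo instances (assuming the default admin/quest credentials) could look like this:

postgresql://admin:quest@localhost:8812,localhost:8813,localhost:8814/qdb?connect_timeout=3

The driver tries the hosts in order and uses the first one that accepts the connection; the retry logic shown below takes over when that connection later fails.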
This repository shows how to get highly available reads using standard PostgreSQL libraries from Python, Java, C++, C#, Go, Rust, and Node.js.
We’ll show the Python version first, followed by Node.js, which uses a slightly different approach.
Python example
This Python example uses psycopg3. The retry logic is simple: try to connect, catch failure, and try the next host.
TIP
Make sure libpq is installed on your system. This is typically included with PostgreSQL, or available via your package manager.
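If you prefer not to manage a system-wide libpq, psycopg can also be installed with its binary extra, which ships a bundled libpq:

pip install "psycopg[binary]"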
import time
import psycopg

CONN_STR = "host=localhost:8812,localhost:8813,localhost:8814 user=admin password=quest dbname=qdb connect_timeout=3"
QUERY = """select value from (show parameters)
where property_path = 'cairo.wal.temp.pending.rename.table.prefix'"""

def get_conn():
    while True:
        try:
            conn = psycopg.connect(CONN_STR)
            print(f"Connected to {conn.info.host}:{conn.info.port}")
            return conn
        except Exception as e:
            print(f"Connection failed: {e}")
            time.sleep(1)

with get_conn() as conn:
    with conn.cursor() as cur:
        for i in range(250):
            try:
                cur.execute(QUERY)
                row = cur.fetchone()
                print(row[0])
            except Exception as e:
                print(f"Query failed: {e}")
                conn = get_conn()
                cur = conn.cursor()
            time.sleep(0.3)
The script runs the query 250 times, once every 300 ms, and prints the result. If you stop and restart your Docker containers during the demo, you will see the application seamlessly reconnect to the next available host, or simply keep retrying until a node comes back if all hosts are down.
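For example, to simulate an outage while the script is running, stop the primary container. Keep in mind that the containers were started with --rm, so stopping one also removes it; to bring the node back, re-run its original docker run command:

docker stop primary
# the script should reconnect to the next host (localhost:8813); to restore the node:
docker run --name primary -d --rm \
  -e QDB_CAIRO_WAL_TEMP_PENDING_RENAME_TABLE_PREFIX=temp_primary \
  -p 9000:9000 -p 8812:8812 questdb/questdb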

You can find a more complete implementation in the GitHub repo, as well as the Java, Go, Rust, C#, and C++ versions.
What about Node.js?
Node.js supports libpq-style multi-host connection strings only when using pg-native (i.e. the libpq bindings). The default pg client (pure JS) does not support them. If you want cross-platform compatibility or don’t want to rely on native bindings, you need to implement the failover logic manually.
You can build your own rotation logic: keep an array of hosts, and connect to the first one that responds. If the query fails, close the connection and try the next.
Here’s the key part:
const hosts = [
  { host: 'localhost', port: 8812 },
  { host: 'localhost', port: 8813 },
  { host: 'localhost', port: 8814 }
];

let current = 0;

async function connectToAvailableHost() {
  for (let i = 0; i < hosts.length; i++) {
    const { host, port } = hosts[current];
    const connStr = `postgres://admin:quest@${host}:${port}/qdb`;
    try {
      const client = new Client({ connectionString: connStr });
      await client.connect();
      console.log(`Connected to ${host}:${port}`);
      return client;
    } catch (e) {
      console.error(`Failed to connect to ${host}:${port}: ${e.message}`);
      current = (current + 1) % hosts.length;
    }
  }
  throw new Error('Could not connect to any host');
}
Then you use connectToAvailableHost() to build your retry loop just like in the other examples. The example code for Node.js can be found in the demo repository.
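For completeness, here is a minimal sketch of that loop. It assumes Client comes from the pg package, reuses the connectToAvailableHost() helper above, and runs the same show parameters query as the Python example; tune the iteration count and delay as needed:

// assumes: const { Client } = require('pg'); plus connectToAvailableHost() from above
const QUERY = `select value from (show parameters)
where property_path = 'cairo.wal.temp.pending.rename.table.prefix'`;

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function freshClient() {
  const client = await connectToAvailableHost();
  // ignore async socket errors; the next query will fail and trigger a reconnect
  client.on('error', () => {});
  return client;
}

async function main() {
  let client = await freshClient();
  for (let i = 0; i < 250; i++) {
    try {
      const res = await client.query(QUERY);
      console.log(res.rows[0].value);
    } catch (e) {
      console.error(`Query failed: ${e.message}`);
      try { await client.end(); } catch {}   // discard the broken connection
      client = await freshClient();          // rotate to the next live node
    }
    await sleep(300);
  }
  await client.end();
}

main().catch(console.error);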
Summary
High-availability reads are crucial for resilience, whether you're using QuestDB for market data, observability, or analytics.
✅ QuestDB Enterprise provides out-of-the-box read replicas.
🧠 You can implement smart retry logic with minimal effort in most languages.
🧪 We’ve tested this in Python, Java, Node.js, Go, Rust, .NET, and C++. Find them all in the companion repo.
Give it a spin, pull the plug on your primary node, and watch your reads survive.