QuestDB and the Modern Data Stack: Bridging Time Series, OLAP, and the Lakehouse
When was the last time you wished your database was slower? Probably never. And yet, for most of the history of databases, speed at the scale of millions of writes per second and queries over billions of records wasn't even on the table.
At QuestDB, we've had a front-row seat to how the database landscape has evolved. This post tells the story of how we got here, where things stand today, and how QuestDB fits into the modern data ecosystem. Not just as a time series database, but as a high-performance analytical engine for both real-time and historical data, built on the open standards the agent-driven data stack demands.
How we got here
To understand where QuestDB fits, it helps to understand the landscape it was born into.
Two decades ago, databases followed the OLTP pattern. They were heavily biased toward reads rather than writes, designed for a few million rows at best, and relied on indexes to speed up queries. Every developer in the '90s knew the refrain: the database is the bottleneck.
Then came two parallel movements. NoSQL databases optimized for fast inserts and fast non-analytical queries. Great for throughput, but not for the kind of analytical workloads that businesses increasingly needed. OLAP databases went the other direction: optimized for large batch inserts and fast analytical queries via complex indexes, materialization, denormalization, and data duplication.
The separation of storage and compute
Following the success of MapReduce and HDFS for data processing, many OLAP databases separated storage from computation. The key advantage was that multiple independent engines could now query the same data without each needing its own copy. As more teams and systems began writing data into shared object storage, the data lake emerged: a single space where different engines could query data created by others. But data lakes were still static. Writes were mostly batched, the OLAP file formats of the time made updating individual records very costly, and cloud object stores offered only immutable files with no random-access updates. For a long time, OLAP essentially meant immutable data.
Open formats change the game
A major shift came with the development of new open file formats. Uber created Apache Hudi, Netflix created Apache Iceberg, and Databricks developed Delta Lake. All three are open formats that support mutable data, transactions, schema evolution, and streaming. Because they're open, multiple data engines and applications can share the same datasets with no duplication.
Suddenly, the question wasn't just "which database is fastest?" but "how do all these tools work together without duplicating data?"
This is the data lakehouse: a single shared space in object storage where data is no longer a static dump of files. It supports transactions, can be incremental and fresh, and multiple engines can read and write against the same datasets. The data lake, but with the reliability and mutability that applications actually need.
Streaming data complicates everything
Meanwhile, the real world was generating streaming data at an accelerating pace, and streaming data has some uncomfortable properties. It can get very big. It never stops. It's always incomplete. It will burst, lag, and arrive out of order. It will get updated after you've already emitted results. Individual data points lose value over time, but long-term aggregations are priceless. And analysts consistently prefer low latency and data freshness.
None of the existing database paradigms handled all of these realities well. Time series databases emerged to fill this gap, specializing in very fast ingestion, very fast queries over nascent data, and powerful time-based analytical queries. This is where QuestDB enters the picture.
QuestDB's three-tier storage engine
QuestDB implements a row-based write path for maximum ingestion throughput and a column-based read path for maximum query performance. The storage model is organized into three tiers that handle data across its entire lifecycle, from the moment it arrives to long-term archival.
Tier One: Hot ingest (WAL), durable by default
Incoming data is appended to the write-ahead log (WAL) with ultra-low latency. Writes are made durable before any processing, preserving order and surviving failures without data loss. The WAL is asynchronously shipped to object storage, so new replicas can bootstrap quickly and read the same history.
Tier Two: Real-time SQL on live data
Data is time-ordered and de-duplicated into QuestDB's native, time-partitioned columnar format and becomes immediately queryable. This tier powers real-time analysis with vectorized, multi-core execution, streaming materialized views, and time-series SQL (e.g., ASOF JOIN, SAMPLE BY). The query planner spans tiers seamlessly.
Tier Three: Cold storage, open and queryable
Older data is automatically tiered to object storage in Apache Parquet. Query it in-place through QuestDB or use any tool that reads Parquet. This delivers predictable costs, interoperability with AI/ML tooling, and zero lock-in.
Tier 1: Parallel Write-Ahead Log
All incoming data is first appended to a Write-Ahead Log (WAL), made durable before any processing so that ordering is preserved and failures cause no data loss.
What makes this unusual is that the WAL is parallel. Multiple WAL writers can operate concurrently, each writing to their own WAL files, while a component called the Sequencer coordinates transactions across them. The Sequencer allocates unique transaction numbers chronologically and serves as the single source of truth, enabling data deduplication and consolidation. This design allows ingestion to scale linearly without sacrificing consistency.
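To make the coordination pattern concrete, here is a minimal stdlib Python sketch of the idea — multiple writers appending to their own WAL files while a central sequencer hands out monotonically increasing transaction numbers. This is an illustration of the design, not QuestDB's actual implementation; all names are hypothetical.

```python
import threading

class Sequencer:
    """Allocates unique, monotonically increasing txn numbers and keeps
    the ordered commit log that serves as the single source of truth."""
    def __init__(self):
        self._lock = threading.Lock()
        self._next_txn = 0
        self.log = []  # ordered (txn, writer_id, row_count) records

    def commit(self, writer_id, row_count):
        with self._lock:                 # short critical section
            txn = self._next_txn
            self._next_txn += 1
            self.log.append((txn, writer_id, row_count))
            return txn

def wal_writer(writer_id, seq, batches, local_wal):
    # Each writer appends to its own WAL: the write path is uncontended,
    # only txn allocation is coordinated.
    for batch in batches:
        local_wal.append(batch)
        seq.commit(writer_id, len(batch))

seq = Sequencer()
wals = [[] for _ in range(4)]
threads = [threading.Thread(target=wal_writer, args=(i, seq, [[1, 2], [3]], wals[i]))
           for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()

txns = [txn for txn, _, _ in seq.log]
print(txns == sorted(txns) and len(set(txns)) == len(txns))  # True: gap-free, ordered
```

Because transaction numbers are allocated under a single lock, the commit log stays totally ordered even though the four writers run concurrently — which is what makes downstream deduplication and consolidation possible.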
Tier 2: QuestDB binary columnar storage
Changes from the WAL are applied by the TableWriter into QuestDB's native columnar binary format. The TableWriter also handles out-of-order data and enables deduplication. Data is organized into time-based partitions with a file-per-column layout. This means queries only touch the partitions that overlap the requested time range, and within each partition only the files for the columns actually referenced in the query. The result is that the engine skips the vast majority of stored data on every read.
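The pruning logic described above can be sketched in a few lines of stdlib Python. The directory layout and column names here are hypothetical; the point is only to show why a time-range query over two columns never touches the other files.

```python
from datetime import date

def partition(day_str):
    # One file per column inside a time-based partition directory.
    return {col: f"{day_str}/{col}.d" for col in ("ts", "price", "size")}

storage = {
    date(2024, 1, 1): partition("2024-01-01"),
    date(2024, 1, 2): partition("2024-01-02"),
    date(2024, 1, 3): partition("2024-01-03"),
}

def files_to_read(storage, t0, t1, columns):
    """Only partitions overlapping [t0, t1], and within them only the
    files for the columns the query actually references."""
    out = []
    for day, cols in storage.items():
        if t0 <= day <= t1:                       # partition pruning by time range
            out.extend(cols[c] for c in columns)  # column pruning by projection
    return sorted(out)

print(files_to_read(storage, date(2024, 1, 2), date(2024, 1, 3), ["ts", "price"]))
# ['2024-01-02/price.d', '2024-01-02/ts.d', '2024-01-03/price.d', '2024-01-03/ts.d']
```

Of the nine column files on disk, the query reads four: two partitions survive the time filter, and within each only the referenced columns are opened.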
The active (most recent) partition is always stored in this tier, ensuring minimum query latency for nascent data and optimizing writes for out-of-order arrivals or materialized view updates.
Tier 3: Parquet, locally or in object storage
Older partitions (anything other than the most recent) can be converted to Apache Parquet for both interoperability and compression. Users don't need to know which format a partition is in: the query engine spans both tiers seamlessly.
In QuestDB Enterprise, this conversion happens automatically. Parquet files can be sent to object storage (Amazon S3, Azure Blob Storage, Google Cloud Storage, NFS) to reduce the cost of storing historical data while keeping it fully queryable. The WAL is also asynchronously shipped to object storage, enabling new replicas to bootstrap quickly.
This three-tier model means QuestDB handles the full data lifecycle: hot ingest with durability guarantees, real-time SQL on live data with vectorized multi-core execution, and cold storage in an open, queryable format with predictable costs and zero lock-in.
Beyond nascent data: three ways to access historical data
QuestDB is often described as a time series database for nascent data. Fast ingestion, fast queries on recent records. That's true, but it's only part of the story. QuestDB also provides multiple access patterns for working with historical data, each suited to different use cases and resource constraints.
Run queries on raw historical data
QuestDB's query engine can scan across the entire dataset, both the binary columnar partitions and the Parquet partitions in cold storage. You get the full power of SQL with time-series extensions like SAMPLE BY, LATEST ON, ASOF JOIN, WINDOW JOIN, and HORIZON JOIN across your entire history.
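To make one of those extensions concrete: an ASOF join matches each row on the left to the most recent row on the right at or before its timestamp. Here is a stdlib Python sketch of those semantics — purely illustrative, with hypothetical trade/quote data; QuestDB executes this natively over its columnar storage.

```python
import bisect

def asof_join(trades, quotes):
    """For each (ts, price) trade, attach the latest quote with ts <= trade ts.
    Both inputs must be sorted by timestamp."""
    quote_ts = [ts for ts, _ in quotes]
    out = []
    for ts, px in trades:
        i = bisect.bisect_right(quote_ts, ts) - 1  # latest quote at or before ts
        out.append((ts, px, quotes[i][1] if i >= 0 else None))
    return out

trades = [(10, 100.5), (25, 101.0)]
quotes = [(5, 100.4), (20, 100.9), (30, 101.1)]
print(asof_join(trades, quotes))
# [(10, 100.5, 100.4), (25, 101.0, 100.9)]
```

The trade at t=10 picks up the quote from t=5, and the trade at t=25 picks up the quote from t=20 — the quote at t=30 is never matched backwards in time.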
For very large working datasets, a practical approach is to run historical queries on a replica while the primary continues serving real-time ingestion without interference.
Query downsampled data via materialized views
For many analytical use cases, you don't need every raw data point. QuestDB's materialized views let you define downsampled versions of your data (for example, OHLC bars from tick data, or 15-minute averages from per-second sensor readings) that are maintained automatically as new data arrives. These are dramatically smaller than the raw dataset, making them the sweet spot for dashboards, reporting, and monitoring.
Read Parquet files directly, bypassing the query engine
For use cases that need the whole raw dataset, such as training ML models or running batch analytics, you can skip the query engine entirely and read the Parquet files that QuestDB generates. Any tool in the ecosystem (PyArrow, Pandas, DuckDB, Spark, Polars) can read them natively. No serialization overhead, no vendor lock-in.
OLAP performance without the trade-off
QuestDB is first and foremost a time series database. But its columnar architecture, parallel SQL engine, and JIT compiler make it a very capable OLAP engine for non-time-series queries too.
This isn't just a theoretical claim. ClickBench is a well-known independent benchmark for analytical databases, maintained by the ClickHouse team, built on real-world web analytics data and generic OLAP queries with no time-series bias. On it, QuestDB ranks among the fastest open-source databases. On hot runs (c7a.metal-48xl, open source, untuned), QuestDB places in the top three at ×1.51, behind DuckDB (×1.32) and ahead of every ClickHouse variant, the benchmark's own creators. On cold runs, the picture shifts: ClickHouse variants lead, with QuestDB at ×6.47, comfortably ahead of DuckDB at ×19.19.
This matters because real-world workloads aren't purely time series or purely OLAP. If your primary access pattern is time-series (high-frequency ingestion, recent-data queries, SAMPLE BY aggregations), QuestDB handles that at the performance you'd expect from a purpose-built TSDB. But when you also need to run ad-hoc analytical queries that don't rely on time-based filtering, you don't pay a performance penalty. You don't need a second database for the OLAP workload.
Open formats and the broader ecosystem
One of the most important shifts in the data landscape is the move toward open formats and interoperability. QuestDB leans into this in several ways.
Data in Tier 3 is stored as Apache Parquet, which means it's accessible to any tool that reads Parquet. No extraction, no ETL, no duplication needed.
QuestDB speaks the PostgreSQL wire protocol, meaning it works with the vast ecosystem of PostgreSQL clients, BI tools, and integrations out of the box. Client libraries that support Apache Arrow, like Polars via connectorx, can connect through Pgwire and return data as Arrow on the client side.
AI-native by design
As the data stack becomes increasingly agent-driven, the database layer needs to be something agents can work with out of the box. That means standard SQL, open formats, open APIs, and documentation structured around concrete use cases rather than abstract reference pages. Proprietary query languages and closed ecosystems become a liability when your users are agents, not just humans.
QuestDB is built on exactly these primitives. Fast ingestion, expressive time-series SQL, materialized views, open storage formats, and a REST API that any agent can call. If the building blocks are right, agents can pick them up and solve specific workflows end to end. You don't need bespoke products for each asset class or use case. You just need the right instruction set for agents to work with.
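As a concrete illustration of that REST primitive: QuestDB's HTTP API exposes an /exec endpoint (port 9000 by default) that takes a SQL query and returns JSON. The sketch below only builds the request with stdlib Python — the host, table, and column names are placeholders, and actually sending it requires a running QuestDB instance.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_exec_request(host, sql):
    """Build an HTTP GET request for QuestDB's /exec endpoint.
    Passing it to urllib.request.urlopen() would return JSON results."""
    qs = urlencode({"query": sql})
    return Request(f"http://{host}:9000/exec?{qs}")

# Hypothetical table/columns, for illustration only.
req = build_exec_request("localhost",
                         "SELECT symbol, avg(price) FROM trades SAMPLE BY 1h")
print(req.full_url)
```

Because the interface is just SQL over HTTP with a JSON response, any agent that can make a web request can query the database without a bespoke driver.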
A concrete example: a trader could prompt an agent to measure execution toxicity, decompose implementation shortfall, or compare fill quality across lit venues. These are the kinds of analyses firms pay specialist vendors for as packaged SaaS products today. The right primitives plus an agent can get you there from a prompt. Our capital markets SQL cookbook covers everything from VWAP to realized volatility with ready-to-run queries that agents can compose into complete workflows.
To make this practical, we provide dedicated QuestDB skills for AI coding agents like Claude Code and OpenAI Codex. These embed context about QuestDB's query syntax, ingestion patterns, and best practices directly into the agent, so you can go from a single prompt to a working market data pipeline.
The result is a database that fits into the modern data stack without creating data silos. Ingest via QuestDB's high-performance streaming protocol, query via SQL with time-series extensions, consume results via Pgwire (or, in the future, ADBC), and access the underlying data directly as Parquet. Let agents orchestrate the pieces. No lock-in at any layer.
Wrapping up
If your data has a timestamp and you need both speed and analytical depth, QuestDB might be the only database you need.
Want to try QuestDB? The open-source version is available at github.com/questdb/questdb, and you can explore a live demo at demo.questdb.io. For enterprise features like replication, RBAC, TLS, cold storage, and Kubernetes operator support, check out questdb.io/enterprise.