
Merge-on-read

SUMMARY

Merge-on-read is a data storage optimization strategy that defers the merging of base data and change data until read time, prioritizing write performance over read performance. This approach is particularly valuable in time-series databases and data lake architectures where write-heavy workloads are common.

How merge-on-read works

Merge-on-read maintains two data structures:

A base data layer containing the original data
A delta layer containing subsequent modifications

When a query runs, the system merges these layers on the fly, applying the delta records to the base data to produce the current view of the data.
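The two-layer model can be sketched as a toy in-memory table. This is a minimal illustration, not QuestDB's implementation. The class name MergeOnReadTable and the convention that a value of None marks a delete (a tombstone) are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

# A delta record is (key, value); value None is a tombstone (assumed convention).
Delta = Tuple[str, Optional[Any]]


@dataclass
class MergeOnReadTable:
    """Hypothetical toy store: a base layer plus an append-only delta log."""
    base: Dict[str, Any] = field(default_factory=dict)
    deltas: List[Delta] = field(default_factory=list)

    def upsert(self, key: str, value: Any) -> None:
        # Writes never modify the base layer; they only append to the delta log.
        self.deltas.append((key, value))

    def delete(self, key: str) -> None:
        # Deletes are recorded as tombstones instead of removing base rows.
        self.deltas.append((key, None))

    def read(self) -> Dict[str, Any]:
        # Merge at query time: copy the base, then replay deltas in write order
        # so that later changes override earlier ones.
        view = dict(self.base)
        for key, value in self.deltas:
            if value is None:
                view.pop(key, None)
            else:
                view[key] = value
        return view


table = MergeOnReadTable(base={"a": 1, "b": 2})
table.upsert("a", 10)
table.upsert("c", 3)
table.delete("b")
print(table.read())  # {'a': 10, 'c': 3}
print(table.base)    # {'a': 1, 'b': 2}  (base layer unchanged)
```

Writes stay cheap because they are plain appends. The cost moves to read(), which must replay every pending delta each time it is called.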

Comparison with copy-on-write

While copy-on-write performs merging during write operations, merge-on-read shifts this cost to read time:

Write performance: Faster writes as changes are only recorded in the delta layer
Read performance: Higher latency as merging occurs during query execution
Storage efficiency: Updates avoid rewriting entire base files, so less data is copied per change, although delta files accumulate and consume space until they are compacted
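The trade-off can be made concrete by counting rows. The sketch below is a toy model and makes several assumptions: a "file" is a Python list of (key, value) rows, each update changes one row, and cost is measured only as rows written or scanned, ignoring CPU, I/O, and encoding. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, int]  # (key, value)


def cow_update(base: List[Row], key: int, value: int) -> Tuple[List[Row], int]:
    """Copy-on-write: produce a new copy of the file with the change applied."""
    new_base = [(k, value if k == key else v) for k, v in base]
    return new_base, len(new_base)  # every row is rewritten


def mor_update(deltas: List[Row], key: int, value: int) -> int:
    """Merge-on-read: append a single delta row; the base file is untouched."""
    deltas.append((key, value))
    return 1  # only the delta row is written


def mor_read(base: List[Row], deltas: List[Row]) -> Tuple[Dict[int, int], int]:
    """Merge at query time; returns the merged view and the rows scanned."""
    view = dict(base)
    for k, v in deltas:
        view[k] = v
    return view, len(base) + len(deltas)


base = [(i, 0) for i in range(1_000)]
cow_base, cow_written = list(base), 0
mor_deltas: List[Row] = []
mor_written = 0

for key in range(10):  # ten single-row updates
    cow_base, written = cow_update(cow_base, key, 1)
    cow_written += written
    mor_written += mor_update(mor_deltas, key, 1)

mor_view, mor_scanned = mor_read(base, mor_deltas)
assert mor_view == dict(cow_base)  # both strategies expose the same data
print(cow_written, mor_written)    # 10000 10
print(len(cow_base), mor_scanned)  # 1000 1010
```

In this model, merge-on-read writes 10 rows instead of 10,000. Each read, however, scans 1,010 rows instead of 1,000, and that overhead grows with every delta that has not yet been compacted.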

Next generation time-series database

QuestDB is an open-source time-series database optimized for market and heavy industry data. Built from scratch in Java and C++, it offers high-throughput ingestion and fast SQL queries with time-series extensions.

Applications in time-series data

In time-series databases, merge-on-read is particularly useful for:

High-frequency data ingestion where write performance is critical
Late-arriving data that needs to be merged with historical records
Systems with append-heavy workloads

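For example, financial market data combines these patterns: ticks stream in at high frequency, while delayed or corrected records arrive late and must be merged with data that is already stored.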
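Here is a minimal sketch of how late-arriving ticks might be merged with a time-ordered base partition at read time. It assumes integer timestamps, a base list that is already sorted by timestamp, and late records kept in arrival order. It also assumes a rule chosen for illustration: when timestamps collide, the most recently arrived late record wins and is treated as a correction. The prices are illustrative values, not real data.

```python
import heapq
from typing import Iterator, List, Tuple

Tick = Tuple[int, float]  # (timestamp, price)


def merged_ticks(base: List[Tick], late: List[Tick]) -> Iterator[Tick]:
    """Yield ticks in timestamp order, merging the sorted base with late arrivals.

    Late records win over base records with the same timestamp, and among late
    records the latest arrival wins (an assumed correction rule).
    """
    # Tag late records with -arrival_index so, after sorting, the newest arrival
    # for a given timestamp comes first; all such tags are <= 0.
    tagged_late = sorted((ts, -i, price) for i, (ts, price) in enumerate(late))
    # Base records get tag 1, so they sort after any late record with the same ts.
    tagged_base = ((ts, 1, price) for ts, price in base)
    last_ts = None
    for ts, _tag, price in heapq.merge(tagged_late, tagged_base):
        if ts == last_ts:
            continue  # a higher-priority record for this timestamp was emitted
        last_ts = ts
        yield ts, price


base = [(1, 100.0), (2, 101.0), (4, 103.0)]
late = [(3, 102.0), (2, 101.5)]  # out of order; (2, 101.5) corrects ts 2
print(list(merged_ticks(base, late)))
# [(1, 100.0), (2, 101.5), (3, 102.0), (4, 103.0)]
```

heapq.merge consumes already-sorted inputs lazily, so the base partition is streamed and never re-sorted. Only the small set of late records is sorted at query time.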

Optimization techniques

Several strategies can optimize merge-on-read performance:

Compaction thresholds: Automatically merging delta files when they exceed size limits
Caching: Maintaining frequently accessed merged results
Parallel merging: Distributing merge operations across multiple threads
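The first two techniques can be illustrated together. The sketch below is hypothetical. It simplifies the "size limit" to a count of pending delta records (max_deltas), and it caches the merged view until the next write. A real system would typically measure delta size in bytes and compact in the background.

```python
from typing import Any, Dict, List, Optional, Tuple

Delta = Tuple[str, Optional[Any]]  # value None marks a delete (assumed convention)


class CompactingTable:
    """Toy merge-on-read table with a compaction threshold and a read cache."""

    def __init__(self, max_deltas: int = 4) -> None:
        self.base: Dict[str, Any] = {}
        self.deltas: List[Delta] = []
        self.max_deltas = max_deltas  # hypothetical threshold, counted in records
        self.compactions = 0
        self._cache: Optional[Dict[str, Any]] = None

    def write(self, key: str, value: Optional[Any]) -> None:
        self.deltas.append((key, value))
        self._cache = None  # any write invalidates the cached merged view
        if len(self.deltas) >= self.max_deltas:
            self.compact()

    def read(self) -> Dict[str, Any]:
        # Recompute the merge only when the cache has been invalidated.
        if self._cache is None:
            view = dict(self.base)
            for key, value in self.deltas:
                if value is None:
                    view.pop(key, None)
                else:
                    view[key] = value
            self._cache = view
        return dict(self._cache)  # copy so callers cannot corrupt the cache

    def compact(self) -> None:
        # Fold all pending deltas into a new base layer and clear the delta log.
        self.base = self.read()
        self.deltas = []
        self.compactions += 1


t = CompactingTable(max_deltas=3)
for i in range(7):
    t.write(f"k{i}", i)
print(t.compactions, len(t.deltas))  # 2 1
```

The threshold bounds how many deltas a read must replay. The cache lets repeated reads between writes skip the merge entirely.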

Use cases and considerations

Merge-on-read is ideal for:

Real-time analytics platforms
Event sourcing systems
Applications with high write volumes
Scenarios where read latency is less critical than write performance

Consider these factors when implementing merge-on-read:

Query patterns and frequency
Write-to-read ratio
Storage capacity
Acceptable read latency
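The write-to-read ratio can be explored with a back-of-envelope estimate. The model below is a rough sketch built on stated assumptions. Cost is counted only as rows touched. Each update changes one row of a file with `rows` rows. Merge-on-read compacts every `compact_every` updates, and a read sees compact_every // 2 pending deltas on average, which assumes reads are spread evenly across a compaction cycle. The function names and all numbers are hypothetical inputs; substitute your own workload figures.

```python
def cow_cost(rows: int, writes: int, reads: int) -> int:
    # Copy-on-write: every update rewrites the file; every read scans it once.
    return writes * rows + reads * rows


def mor_cost(rows: int, writes: int, reads: int, compact_every: int) -> int:
    # Merge-on-read: each update appends one delta row, compaction rewrites the
    # file every `compact_every` updates, and each read scans the base plus the
    # pending deltas (assumed to average compact_every // 2).
    compactions = writes // compact_every
    pending = compact_every // 2
    return writes + compactions * rows + reads * (rows + pending)


for name, writes, reads in [("write-heavy", 10_000, 10), ("read-heavy", 100, 100_000)]:
    print(name, cow_cost(1_000, writes, reads), mor_cost(1_000, writes, reads, 100))
# write-heavy 10010000 120500
# read-heavy 100100000 105001100
```

Under these assumptions, merge-on-read is far cheaper for the write-heavy workload, while copy-on-write comes out slightly ahead for the read-heavy one. The model ignores merge CPU cost and caching, so treat it as a way to reason about the ratio, not as a benchmark.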

The strategy works well with data lake architectures and modern table formats that support versioning and time travel capabilities.