Merge-on-read
Merge-on-read is a data storage optimization strategy that defers the merging of base data and change data until read time, prioritizing write performance over read performance. This approach is particularly valuable in time-series databases and data lake architectures where write-heavy workloads are common.
How merge-on-read works
Merge-on-read maintains two data structures:
- A base data layer containing the original data
- A delta layer containing subsequent modifications
When a query is executed, the system merges these layers on the fly to provide the current view of the data.
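The two-layer idea can be illustrated with a minimal Python sketch. It is a toy in-memory model, not how any particular database implements it: the `MergeOnReadTable` class, the `TOMBSTONE` sentinel for deletes, and the sample symbols and prices are all assumptions made for illustration.

```python
from typing import Dict

# Sentinel marking a deleted key in the delta layer (hypothetical convention).
TOMBSTONE = object()


class MergeOnReadTable:
    """Toy key-value table with a base layer and a delta layer."""

    def __init__(self, base: Dict[str, float]) -> None:
        self.base: Dict[str, float] = dict(base)  # original data, untouched by writes
        self.delta: Dict[str, object] = {}        # subsequent modifications only

    def upsert(self, key: str, value: float) -> None:
        # A write only records the change; the base layer is not rewritten.
        self.delta[key] = value

    def delete(self, key: str) -> None:
        # A delete is also just a recorded change.
        self.delta[key] = TOMBSTONE

    def read(self) -> Dict[str, float]:
        # The merge happens here, at query time: delta entries override the base.
        merged = dict(self.base)
        for key, value in self.delta.items():
            if value is TOMBSTONE:
                merged.pop(key, None)
            else:
                merged[key] = value
        return merged


table = MergeOnReadTable({"AAPL": 189.5, "MSFT": 410.2})
table.upsert("AAPL", 190.1)   # modify an existing row
table.upsert("GOOG", 141.8)   # add a new row
table.delete("MSFT")          # remove a row
current_view = table.read()   # base and delta are combined only now
```

Note that the base layer is unchanged after the writes; only `read()` pays the cost of combining the two layers.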
Comparison with copy-on-write
While copy-on-write performs merging during write operations, merge-on-read shifts this cost to read time:
- Write performance: Faster writes as changes are only recorded in the delta layer
- Read performance: Higher latency as merging occurs during query execution
- Storage efficiency: Updates avoid rewriting full copies of existing data, although delta files accumulate until they are compacted
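The trade-off can be made concrete by counting rows written under each strategy. The following is a rough sketch under simplifying assumptions: both stores are hypothetical in-memory classes, copy-on-write is modeled as rewriting the whole dataset on every update (real systems typically rewrite only the affected file or partition), and "rows written" stands in for I/O cost.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, float]  # (row id, value)


class CopyOnWriteStore:
    """Merges at write time by producing a new full copy (simplified)."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows: Dict[int, float] = dict(rows)
        self.rows_written = 0

    def update(self, row_id: int, value: float) -> None:
        new_rows = dict(self.rows)           # copy existing data
        new_rows[row_id] = value             # apply the change
        self.rows = new_rows
        self.rows_written += len(new_rows)   # the whole copy is written out

    def read(self) -> Dict[int, float]:
        return dict(self.rows)               # already merged, nothing to do


class MergeOnReadStore:
    """Records changes in a delta log and merges at read time."""

    def __init__(self, rows: List[Row]) -> None:
        self.base: Dict[int, float] = dict(rows)
        self.delta: List[Row] = []           # append-only change log
        self.rows_written = 0

    def update(self, row_id: int, value: float) -> None:
        self.delta.append((row_id, value))   # only the change is written
        self.rows_written += 1

    def read(self) -> Dict[int, float]:
        merged = dict(self.base)
        for row_id, value in self.delta:     # merge cost is paid per read
            merged[row_id] = value
        return merged


rows = [(i, float(i)) for i in range(1000)]
cow, mor = CopyOnWriteStore(rows), MergeOnReadStore(rows)
for i in range(10):
    cow.update(i, -1.0)
    mor.update(i, -1.0)

cow_view, mor_view = cow.read(), mor.read()
print(cow.rows_written, mor.rows_written)
```

Both stores return the same view, but in this model the copy-on-write store writes the full dataset for each of the ten updates, while the merge-on-read store writes ten delta entries and replays them on every read.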
Next generation time-series database
QuestDB is an open-source time-series database optimized for market and heavy industry data. Built from scratch in Java and C++, it offers high-throughput ingestion and fast SQL queries with time-series extensions.
Applications in time-series data
In time-series databases, merge-on-read is particularly useful for:
- High-frequency data ingestion where write performance is critical
- Late-arriving data that needs to be merged with historical records
- Systems with append-heavy workloads
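For example, in financial market data, trades and quotes arrive at high rates, and late or corrected records must still be merged with the history already stored.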
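The late-arriving case can be sketched in Python as follows. This is an illustrative model only, not QuestDB's implementation: the `Trade` record, the `ingest_late` and `read_ordered` helpers, and the sample prices are hypothetical. The base layer is assumed to be sorted by timestamp, late rows are appended to a delta buffer in arrival order, and the two are merged into timestamp order only when read.

```python
import heapq
from typing import List, NamedTuple


class Trade(NamedTuple):
    ts: int        # event timestamp (e.g. microseconds since epoch)
    symbol: str
    price: float


# Base layer: already stored, sorted by timestamp.
base: List[Trade] = [
    Trade(1, "EURUSD", 1.0841),
    Trade(3, "EURUSD", 1.0843),
    Trade(5, "EURUSD", 1.0840),
]

# Delta layer: late arrivals, kept in arrival order.
late: List[Trade] = []


def ingest_late(trade: Trade) -> None:
    # The write path is a plain append; no sorting or rewriting of the base.
    late.append(trade)


def read_ordered(base_rows: List[Trade], late_rows: List[Trade]) -> List[Trade]:
    # Sort the (usually small) delta, then do a linear merge with the sorted base.
    late_sorted = sorted(late_rows, key=lambda t: t.ts)
    return list(heapq.merge(base_rows, late_sorted, key=lambda t: t.ts))


ingest_late(Trade(4, "EURUSD", 1.0842))  # arrives after ts=5 was stored
ingest_late(Trade(2, "EURUSD", 1.0844))  # arrives even later
view = read_ordered(base, late)
```

The reader sees a single timeline ordered by timestamp, while ingestion never had to reorder the stored history.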
Optimization techniques
Several strategies can optimize merge-on-read performance:
- Compaction thresholds: Automatically merging delta files when they exceed size limits
- Caching: Maintaining frequently accessed merged results
- Parallel merging: Distributing merge operations across multiple threads
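The first two techniques can be sketched together in Python. This is a minimal illustration under stated assumptions: the `CompactingTable` class is hypothetical, the threshold counts delta entries rather than file sizes, and the cache holds a single merged view that is invalidated on every write. Parallel merging is not shown.

```python
from typing import Dict, Optional


class CompactingTable:
    """Toy merge-on-read table with a compaction threshold and a read cache."""

    def __init__(self, base: Dict[str, int], max_delta_entries: int = 2) -> None:
        self.base: Dict[str, int] = dict(base)
        self.delta: Dict[str, int] = {}
        self.max_delta_entries = max_delta_entries  # hypothetical size limit
        self.compactions = 0
        self.merges_performed = 0
        self._cached_view: Optional[Dict[str, int]] = None

    def upsert(self, key: str, value: int) -> None:
        self.delta[key] = value
        self._cached_view = None                    # cached merge is now stale
        if len(self.delta) > self.max_delta_entries:
            self.compact()                          # delta exceeded the limit

    def compact(self) -> None:
        # Fold the delta into the base so later reads have less to merge.
        self.base.update(self.delta)
        self.delta.clear()
        self.compactions += 1

    def read(self) -> Dict[str, int]:
        if self._cached_view is None:
            # Cache miss: perform the merge once and remember the result.
            self._cached_view = {**self.base, **self.delta}
            self.merges_performed += 1
        return dict(self._cached_view)              # copy so callers cannot corrupt the cache


table = CompactingTable({"a": 1}, max_delta_entries=2)
table.upsert("b", 2)
table.upsert("c", 3)
first = table.read()      # merges base and delta
second = table.read()     # served from the cache, no second merge
table.upsert("d", 4)      # third delta entry exceeds the limit and triggers compaction
after_compaction = table.read()
```

In practice the threshold, the cache granularity, and when compaction runs (inline as here, or in a background job) are tuning decisions that depend on the workload.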
Use cases and considerations
Merge-on-read is ideal for:
- Real-time analytics platforms
- Event sourcing systems
- Applications with high write volumes
- Scenarios where read latency is less critical than write performance
Consider these factors when implementing merge-on-read:
- Query patterns and frequency
- Write-to-read ratio
- Storage capacity
- Acceptable read latency
The strategy works well with data lake architectures and with modern table formats, such as Apache Hudi and Apache Iceberg, that support versioning and time travel.