Upsert
An upsert (update-insert) is an atomic database operation that either inserts a new record or updates an existing one based on a specified condition. In time-series databases, upserts are particularly important for handling out-of-order data, late-arriving events, and data corrections while maintaining data consistency.
How upserts work
Upserts combine two fundamental database operations into a single atomic transaction:
- Check if a record exists based on a unique identifier
- If it exists, update it; if not, insert a new record
This behavior is especially valuable in time-series systems where data may arrive out of sequence or require revision:
Next generation time-series database
QuestDB is an open-source time-series database optimized for market and heavy industry data. Built from scratch in Java and C++, it offers high-throughput ingestion and fast SQL queries with time-series extensions.
Importance in time-series data
Upserts are crucial for handling several common scenarios in time-series data management:
- Late-arriving data: When events arrive after their timestamp
- Data corrections: When historical values need revision
- Real-time updates: When maintaining latest-value tables
For example, in financial market data, upserts help maintain accurate order book snapshots when updates arrive asynchronously:
SELECT * FROM trades_latest_1dWHERE symbol = 'AAPL'LATEST ON timestamp PARTITION BY symbol;
Next generation time-series database
QuestDB is an open-source time-series database optimized for market and heavy industry data. Built from scratch in Java and C++, it offers high-throughput ingestion and fast SQL queries with time-series extensions.
Implementation considerations
When implementing upserts, several factors need consideration:
Performance implications
- Write amplification: Upserts may require additional I/O compared to pure inserts
- Index maintenance: Indexes must be updated during upsert operations
- Concurrency: Multiple simultaneous upserts need proper transaction isolation
Data integrity
- Atomicity: The operation must be atomic to prevent partial updates
- Idempotency: Multiple identical upserts should produce the same result
- Consistency: Related indexes must remain synchronized
The efficiency of upsert operations significantly impacts overall system performance, especially in high-throughput scenarios involving real-time data ingestion.