
Table Format

SUMMARY

A table format is a specification that defines how data is organized, stored, and managed at the file system level. Modern table formats like Apache Iceberg, Delta Lake, and Apache Hudi provide features such as ACID transactions, schema evolution, and time travel capabilities for large-scale data management.

Understanding table formats

Table formats serve as the foundational layer between raw storage and data processing engines. They define:

File organization and naming conventions
Metadata management and structure
Transaction handling and concurrency control
Data versioning and time travel capabilities
Schema evolution rules

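Unlike file formats such as CSV or Parquet, which describe the layout of a single file, a table format describes how many files together form one logical table. This higher-level abstraction is what keeps data consistent and reliable when several engines read and write the same table across a distributed system.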
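As an illustration of what a schema evolution rule can look like, the sketch below tracks columns by a stable numeric ID rather than by name, an approach used by Apache Iceberg among others. The function names and the row layout are hypothetical and chosen only for this example. Because data files reference column IDs, renaming or adding a column changes the metadata only and existing files stay untouched.

```python
# Hypothetical schema: stable column ID -> current column name.
schema_v1 = {1: "ts", 2: "price"}

# Rows as they might be stored in an existing data file, keyed by column ID.
data_file_rows = [{1: 1000, 2: 9.5}, {1: 2000, 2: 9.7}]


def rename_column(schema, column_id, new_name):
    """Return a new schema with one column renamed; IDs are unchanged."""
    updated = dict(schema)
    updated[column_id] = new_name
    return updated


def add_column(schema, name):
    """Return a new schema with an extra column under a fresh ID."""
    updated = dict(schema)
    updated[max(schema) + 1] = name
    return updated


def read_rows(schema, rows):
    """Project stored rows through a schema; columns missing in old files read as None."""
    return [{name: row.get(col_id) for col_id, name in schema.items()} for row in rows]


schema_v2 = add_column(rename_column(schema_v1, 2, "last_price"), "volume")
rows = read_rows(schema_v2, data_file_rows)
```

Old files written under the first schema remain readable under the second: the renamed column resolves through its ID, and the new column reads as missing.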

Next generation time-series database

QuestDB is an open-source time-series database optimized for market and heavy industry data. Built from scratch in Java and C++, it offers high-throughput ingestion and fast SQL queries with time-series extensions.


Core capabilities

Transaction support

Modern table formats provide ACID guarantees (atomicity, consistency, isolation, durability) through mechanisms such as:

Atomic commits using manifest files
Snapshot isolation for concurrent operations
Optimistic concurrency control
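The interplay of atomic commits and optimistic concurrency control can be sketched with a single version pointer that is swapped only if no other writer has committed in the meantime. This is a minimal in-memory illustration, not the commit protocol of any particular table format; `Catalog`, `CommitConflict`, and `append_file` are hypothetical names.

```python
import threading


class CommitConflict(Exception):
    """Raised when another writer committed first."""


class Catalog:
    """Hypothetical catalog holding one pointer to the current table state."""

    def __init__(self):
        self._lock = threading.Lock()
        self.version = 0
        self.files = frozenset()

    def read(self):
        with self._lock:
            return self.version, self.files

    def compare_and_swap(self, expected_version, new_files):
        # The commit is atomic: it either replaces the pointer or changes nothing.
        with self._lock:
            if self.version != expected_version:
                raise CommitConflict(
                    f"expected v{expected_version}, found v{self.version}"
                )
            self.version += 1
            self.files = frozenset(new_files)
            return self.version


def append_file(catalog, file_name, max_retries=3):
    """Optimistically add a file: no lock is held while preparing the commit."""
    for _ in range(max_retries):
        version, files = catalog.read()
        try:
            return catalog.compare_and_swap(version, files | {file_name})
        except CommitConflict:
            continue  # re-read the latest state and try again
    raise CommitConflict("gave up after retries")


catalog = Catalog()
base_version, base_files = catalog.read()      # writer B reads the table state
append_file(catalog, "data-001.parquet")       # writer A commits first

try:
    # Writer B tries to commit against the stale version it read earlier.
    catalog.compare_and_swap(base_version, base_files | {"data-002.parquet"})
    stale_commit_rejected = False
except CommitConflict:
    stale_commit_rejected = True

append_file(catalog, "data-002.parquet")       # writer B retries on fresh state
```

The stale commit is rejected rather than silently overwriting writer A's file, and the retry succeeds because it starts from the latest version.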

Version control and time travel

Table formats maintain historical versions of data through:

Snapshot-based versioning
Incremental change tracking
Time travel query support
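A minimal sketch of how these three ideas fit together, assuming each commit produces an immutable snapshot that lists the data files visible at that point. The `Table` and `Snapshot` classes and the integer timestamps are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp: int        # commit time in arbitrary units
    files: frozenset      # data files visible in this snapshot


class Table:
    def __init__(self):
        self.snapshots = []

    def commit(self, timestamp, added=(), removed=()):
        """Create a new snapshot from the previous one; old snapshots are kept."""
        current = self.snapshots[-1].files if self.snapshots else frozenset()
        files = (current - frozenset(removed)) | frozenset(added)
        snapshot = Snapshot(len(self.snapshots) + 1, timestamp, files)
        self.snapshots.append(snapshot)
        return snapshot

    def as_of(self, timestamp) -> Optional[Snapshot]:
        """Time travel: the latest snapshot committed at or before `timestamp`."""
        candidates = [s for s in self.snapshots if s.timestamp <= timestamp]
        return candidates[-1] if candidates else None

    def changes_between(self, old_id, new_id):
        """Incremental changes: files added and removed between two snapshots."""
        old = self.snapshots[old_id - 1].files
        new = self.snapshots[new_id - 1].files
        return sorted(new - old), sorted(old - new)


table = Table()
table.commit(100, added=["a.parquet"])
table.commit(200, added=["b.parquet"])
table.commit(300, added=["c.parquet"], removed=["a.parquet"])

files_at_250 = table.as_of(250).files
added, removed = table.changes_between(2, 3)
```

A query "as of" time 250 resolves to the second snapshot, while a consumer that has already processed snapshot 2 only needs the added and removed files to catch up to snapshot 3.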


Implementation patterns

Copy-on-Write vs. Merge-on-Read

Table formats typically handle updates and deletes with one of two patterns, and some support both:

Copy-on-Write:
- Rewrites the affected data files on each modification
- Readers see fully merged data with no extra work at query time
- Optimal for read-heavy workloads

Merge-on-Read:
- Writes changes to delta files instead of rewriting base files
- Defers merging to read time or to a later compaction
- Better for write-heavy scenarios
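The trade-off between the two patterns can be sketched with plain dictionaries standing in for data files. This is a simplified illustration under the assumption that rows are key/value pairs; the function, class, and file names are hypothetical.

```python
def copy_on_write_update(files, file_name, key, value):
    """Return a new file set in which the file holding `key` is fully rewritten."""
    rewritten = dict(files[file_name])   # copy every row of the affected file
    rewritten[key] = value
    new_files = {name: rows for name, rows in files.items() if name != file_name}
    new_files[file_name + ".rewritten"] = rewritten  # hypothetical naming scheme
    return new_files


class MergeOnReadTable:
    def __init__(self, base_rows):
        self.base_rows = dict(base_rows)
        self.delta_files = []            # small files holding only the changes

    def update(self, key, value):
        # Cheap write: append a delta instead of rewriting the base file.
        self.delta_files.append({key: value})

    def read(self):
        # Reads pay the cost: base rows are merged with every pending delta.
        merged = dict(self.base_rows)
        for delta in self.delta_files:
            merged.update(delta)
        return merged

    def compact(self):
        # Compaction folds the deltas into a new base so later reads are cheap.
        self.base_rows = self.read()
        self.delta_files = []


cow_files = {"part-0": {"sensor-1": 20.0, "sensor-2": 21.5}}
cow_files = copy_on_write_update(cow_files, "part-0", "sensor-1", 20.4)

mor_table = MergeOnReadTable({"sensor-1": 20.0, "sensor-2": 21.5})
mor_table.update("sensor-1", 20.4)
mor_table.update("sensor-2", 21.9)
rows_before_compaction = mor_table.read()
pending_deltas = len(mor_table.delta_files)
mor_table.compact()
```

The copy-on-write path copies the unchanged `sensor-2` row just to change one value, while the merge-on-read path writes two tiny deltas and carries the merge cost on every read until compaction runs.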

Metadata management

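Table formats keep a metadata layer alongside the data files. It typically records the table schema, the partition layout, the list of snapshots, and which data files belong to each snapshot.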
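One way to picture such a metadata layer is as a small tree: a table-level document points to snapshots, each snapshot points to manifests, and each manifest lists data files. The sketch below is loosely modeled on that layering and is not the on-disk layout of any specific format; all keys, manifest names, and paths are hypothetical.

```python
import json

# Hypothetical table-level metadata: schema plus a pointer to the current snapshot.
table_metadata = {
    "schema": [
        {"name": "ts", "type": "timestamp"},
        {"name": "price", "type": "double"},
    ],
    "current_snapshot_id": 2,
    "snapshots": {
        "1": {"manifests": ["manifest-a"]},
        "2": {"manifests": ["manifest-a", "manifest-b"]},
    },
}

# Hypothetical manifests: each one lists the data files it tracks.
manifests = {
    "manifest-a": ["data/part-0001.parquet"],
    "manifest-b": ["data/part-0002.parquet", "data/part-0003.parquet"],
}


def resolve_data_files(metadata, manifests, snapshot_id=None):
    """Walk metadata -> snapshot -> manifests -> data files."""
    if snapshot_id is None:
        snapshot_id = metadata["current_snapshot_id"]
    snapshot = metadata["snapshots"][str(snapshot_id)]
    files = []
    for manifest_name in snapshot["manifests"]:
        files.extend(manifests[manifest_name])
    return files


# Round-trip through JSON to show the metadata is plain, serializable data.
restored = json.loads(json.dumps(table_metadata))
current_files = resolve_data_files(restored, manifests)
first_snapshot_files = resolve_data_files(restored, manifests, snapshot_id=1)
```

Note that the second snapshot reuses `manifest-a` unchanged, so a commit only needs to write metadata for the files it adds.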

Integration with data ecosystems

Modern table formats are designed to work with:

Data lakes and lakehouse architectures
Stream processing engines
SQL query engines
Machine learning pipelines

They provide a unified approach to data management across these diverse environments.

Performance considerations

When implementing table formats, organizations should consider:

Read vs. write optimization needs
Metadata overhead and management
Compaction strategies
Partition design
Cache efficiency

The choice of table format can significantly impact query performance and operational efficiency.
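To make the link between partition design, metadata, and query performance concrete, the sketch below plans a scan using only per-file metadata: the partition value and the minimum and maximum timestamp of each file. It assumes the metadata layer stores such statistics; `DataFile`, `plan_scan`, and the file paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    partition: str   # partition value, here a date string
    min_ts: int      # smallest timestamp stored in the file
    max_ts: int      # largest timestamp stored in the file


def plan_scan(files, partition, ts_from, ts_to):
    """Select files that can contain rows in [ts_from, ts_to] for one partition.

    Files are skipped without being opened when the partition differs or when
    their timestamp range does not overlap the requested range.
    """
    return [
        f.path
        for f in files
        if f.partition == partition and f.max_ts >= ts_from and f.min_ts <= ts_to
    ]


files = [
    DataFile("d=2024-01-01/a.parquet", "2024-01-01", 0, 99),
    DataFile("d=2024-01-01/b.parquet", "2024-01-01", 100, 199),
    DataFile("d=2024-01-02/c.parquet", "2024-01-02", 200, 299),
]

to_read = plan_scan(files, "2024-01-01", 150, 250)
```

Only one of the three files has to be read for this query. A partition scheme that matches common filters, and files large enough that statistics stay selective, are what make this kind of pruning effective.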

Future trends

Table formats continue to evolve with:

Enhanced support for streaming data
Improved compression and encoding schemes
Better integration with cloud object storage
Advanced partitioning strategies
Simplified maintenance operations

These developments aim to address the growing demands of modern data architectures while maintaining backward compatibility and ease of use.