
Columnar File Format

SUMMARY

A columnar file format is a data storage format that organizes information by columns rather than rows, enabling efficient querying and compression of similar data types. These formats are particularly valuable for time-series data and analytical workloads where queries typically access specific columns rather than entire rows.

How columnar file formats work

Columnar file formats store data by grouping values from the same column together, rather than storing complete rows sequentially. This organization enables:

Efficient compression of similar data types
Reduced I/O when querying specific columns
Better CPU cache utilization
Improved vectorized processing
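The difference between the two layouts can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a real file format: the sample rows, the field names (`ts`, `temperature`, `humidity`) and the helper `to_columns` are all hypothetical.

```python
# Hypothetical sample data in row-oriented form: one dict per row.
rows = [
    {"ts": 1, "temperature": 21.5, "humidity": 40},
    {"ts": 2, "temperature": 21.7, "humidity": 41},
    {"ts": 3, "temperature": 22.0, "humidity": 43},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# A query that needs a single column touches only that list;
# the "ts" and "humidity" values are never read.
avg_temperature = sum(columns["temperature"]) / len(columns["temperature"])
```

In the columnar form, each list holds values of a single type stored next to each other, which is what makes the compression and vectorization benefits listed above possible.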

Next generation time-series database

QuestDB is an open-source time-series database optimized for market and heavy industry data. Built from scratch in Java and C++, it offers high-throughput ingestion and fast SQL queries with time-series extensions.


Key features and benefits

Column-specific compression

Different columns can use different compression algorithms optimized for their data types. For example:

Timestamps often use delta encoding
Numeric columns benefit from bit-packing
String columns can use dictionary encoding
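The three encodings above can be sketched as follows. These are simplified, assumed implementations meant to show the idea only; real formats combine them with further steps such as run-length encoding and general-purpose compression. The function names and sample values are hypothetical.

```python
def delta_encode(values):
    """Keep the first value, then store each difference to the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values by accumulating the stored differences."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def dictionary_encode(values):
    """Replace each distinct string with a small integer code and a lookup table."""
    dictionary, index, codes = [], {}, []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def bits_needed(values):
    """Bits per value needed to bit-pack a non-empty list of non-negative integers."""
    return max(max(values).bit_length(), 1)


# Hypothetical columns: epoch-second timestamps and ticker symbols.
timestamps = [1672531200, 1672531201, 1672531202, 1672531204]
symbols = ["AAPL", "MSFT", "AAPL", "AAPL"]

deltas = delta_encode(timestamps)
dictionary, codes = dictionary_encode(symbols)
raw_bits = bits_needed(timestamps)
delta_bits = bits_needed(deltas[1:])
```

With this sample data the raw timestamps need 31 bits each, while the deltas after the first value fit in 2 bits, which is why delta encoding and bit-packing are often used together on timestamp columns.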

Predicate pushdown

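Columnar formats enable efficient filtering through predicate pushdown: filter conditions are evaluated at the storage layer, typically against per-block metadata such as minimum and maximum values, so blocks that cannot contain matching rows are skipped without being read. Skipping entire columns that a query does not reference is a related but separate optimization, usually called column pruning or projection pushdown.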
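One common way to push a filter down to storage is to keep minimum and maximum values for each chunk of a column and consult them before reading the chunk. The sketch below assumes that approach; `RowGroup`, `make_row_groups` and `scan_greater_than` are hypothetical names, and the chunk size and data are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """A chunk of one column plus the min/max statistics stored alongside it."""
    values: list
    min_value: float
    max_value: float


def make_row_groups(values, size):
    """Split a column into fixed-size chunks and record statistics for each."""
    groups = []
    for i in range(0, len(values), size):
        chunk = values[i:i + size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_greater_than(groups, threshold):
    """Return matching values and the number of row groups actually read."""
    matches, groups_read = [], 0
    for group in groups:
        if group.max_value <= threshold:
            continue  # no value here can match, so the chunk is skipped unread
        groups_read += 1
        matches.extend(v for v in group.values if v > threshold)
    return matches, groups_read


temperatures = [18.0, 18.5, 19.0, 19.5, 20.0, 20.5, 25.0, 26.0, 27.0]
groups = make_row_groups(temperatures, size=3)
matches, groups_read = scan_greater_than(groups, threshold=24.0)
```

Here only one of the three row groups is read, because the statistics for the first two show that none of their values can exceed the threshold.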

Schema evolution

Many modern columnar formats support schema evolution, allowing columns to be added, and in some cases modified, without requiring a full rewrite of existing data.
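Adding a column without rewriting old data can be sketched as a reader that fills in a default when an older file lacks the column. This is an assumed, simplified model in which each file is a dict of column lists; `read_column` and the sample files are hypothetical and do not reflect any particular format's API.

```python
def read_column(file_columns, name, row_count, default=None):
    """Return a column from a file, or default values if the file predates it."""
    if name in file_columns:
        return file_columns[name]
    return [default] * row_count


# The old file was written before the "humidity" column existed.
old_file = {"ts": [1, 2], "temperature": [21.5, 21.7]}
new_file = {"ts": [3], "temperature": [22.0], "humidity": [43]}

humidity = []
for file_columns in (old_file, new_file):
    row_count = len(file_columns["ts"])
    humidity += read_column(file_columns, "humidity", row_count)
```

The old file is left untouched; its rows simply read as missing values for the new column.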

Common columnar file formats

Apache Parquet

Apache Parquet is a widely-adopted columnar format that offers:

Efficient encoding and compression schemes
Nested data structure support
Rich metadata handling

ORC (Optimized Row Columnar)

The ORC file format provides:

ACID transaction support when used with Apache Hive
Lightweight built-in indexes, such as min/max statistics and Bloom filters
Predicate pushdown that uses those indexes to skip data

Applications in time-series data

Columnar formats are particularly well-suited for time-series databases because:

Time-series queries typically focus on specific metrics over time periods
Similar data types in columns enable better compression ratios
Time-based partitioning aligns well with columnar storage

For example, in QuestDB, columnar storage enables efficient processing of time-based queries:

-- ⚠️ ANSI (requires QuestDB adaptation)
SELECT avg(temperature), max(humidity)
FROM sensor_data
WHERE timestamp BETWEEN '2023-01-01' AND '2023-01-31'

This query benefits from columnar storage by:

Reading only required columns (temperature, humidity, timestamp)
Leveraging column-specific compression
Enabling vectorized processing of homogeneous data
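The access pattern of the query above can be imitated in plain Python. This is only an illustration of the idea, not how QuestDB executes the query: the column contents are made up, timestamps are simplified to sorted ISO date strings, and `aggregate` is a hypothetical helper.

```python
from bisect import bisect_left, bisect_right

# Hypothetical column data. Timestamps are sorted, as in time-ordered storage,
# and ISO date strings compare correctly as plain strings.
timestamp = ["2022-12-31", "2023-01-01", "2023-01-15", "2023-01-31", "2023-02-01"]
temperature = [10.0, 12.0, 14.0, 16.0, 30.0]
humidity = [50, 55, 60, 58, 90]


def aggregate(start, end):
    """Return (avg temperature, max humidity) for rows with start <= ts <= end."""
    # Binary search on the sorted timestamp column finds the row range
    # without scanning any other column.
    lo = bisect_left(timestamp, start)
    hi = bisect_right(timestamp, end)
    if lo == hi:
        return None, None  # no rows fall inside the interval
    temps = temperature[lo:hi]
    hums = humidity[lo:hi]
    return sum(temps) / len(temps), max(hums)


avg_temp, max_hum = aggregate("2023-01-01", "2023-01-31")
```

Only the three referenced columns are touched, and each aggregate runs over a contiguous slice of same-typed values, which is the kind of loop that vectorized execution handles well.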