Open Format Databases

Open format databases are database engines that read and write data directly in open, vendor-neutral storage formats, typically on object storage. Instead of hiding data inside a proprietary layout, they treat data stored in file formats like Apache Parquet and table formats such as Apache Iceberg as the system of record.

What Are Open Format Databases?

An open format database is defined by its contract with storage. Data lives in an open table format on object storage, while the database engine focuses on query execution, indexing, and concurrency.

Because the table format, rather than any single engine, defines how data is laid out and committed, multiple engines can safely share the same tables. A time-series database for real-time analytics, a batch engine for ETL, and a query engine for ad‑hoc exploration can all operate over the same Parquet/Iceberg data without copying it or locking it into one vendor. This pattern underpins modern “Type III” architectures and open data lakes.
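
As a rough illustration of two engines sharing one copy of the data, the sketch below writes a small tick table once and lets two independent functions query it in place. It uses only the Python standard library, so a CSV file in a temporary directory stands in for Parquet on object storage. The sample ticks and the function names are hypothetical and do not come from any real engine's API.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical sample ticks, ordered by time: (timestamp, symbol, price).
TICKS = [
    ("2024-01-02T09:30:00", "AAPL", 185.0),
    ("2024-01-02T09:30:01", "MSFT", 370.0),
    ("2024-01-02T09:30:02", "AAPL", 186.0),
    ("2024-01-02T09:30:03", "MSFT", 372.0),
]


def write_shared_table(root: Path) -> Path:
    """Write the ticks once to a shared location (stand-in for object storage)."""
    path = root / "ticks.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ts", "symbol", "price"])
        writer.writerows(TICKS)
    return path


def realtime_engine_last_price(path: Path) -> dict:
    """Toy 'real-time' engine: latest price per symbol (rows are time-ordered)."""
    last = {}
    with path.open(newline="") as f:
        for row in csv.DictReader(f):
            last[row["symbol"]] = float(row["price"])
    return last


def batch_engine_avg_price(path: Path) -> dict:
    """Toy 'batch' engine: average price per symbol over the whole file."""
    totals = defaultdict(lambda: [0.0, 0])  # symbol -> [sum, count]
    with path.open(newline="") as f:
        for row in csv.DictReader(f):
            acc = totals[row["symbol"]]
            acc[0] += float(row["price"])
            acc[1] += 1
    return {sym: total / count for sym, (total, count) in totals.items()}


with tempfile.TemporaryDirectory() as tmp:
    table = write_shared_table(Path(tmp))
    # Both engines read the same file in place; neither keeps a private copy.
    last_prices = realtime_engine_last_price(table)
    avg_prices = batch_engine_avg_price(table)

print(last_prices)
print(avg_prices)
```

Neither function knows about the other. The only thing they share is the file layout, which is the role an open format plays between real engines.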

Why They Matter for Modern Analytics

Open format databases separate concerns cleanly:

Storage is cheap, durable, and standardized (cloud object storage).

Layout and governance come from the table format (schemas, partitions, snapshots, deletes).

Performance comes from specialized engines that can be swapped or combined over time.

For capital markets, this means a single, regulator‑friendly history of tick data, orders, and risk metrics that different engines can query for backtesting, overnight risk, or real‑time monitoring. In heavy industry, telemetry and operational technology (OT) data can be retained for years and reused across BI, digital twin models, and incident investigations without repeated migrations.

Key Building Blocks and Use Cases

Open format databases typically rely on:

Columnar file formats such as Parquet or ORC.

A table format like Iceberg, which manages partitions, manifests, and versioned tables for time travel and safe schema evolution.

A catalog service that exposes tables to any compatible engine.
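
To make the relationship between these pieces more concrete, here is a toy model in plain Python. It is a sketch of the general idea only, not Iceberg's actual metadata layout or API: real Iceberg tracks files through manifest lists and manifest files, which this example flattens into a tuple of file names per snapshot. The names `Snapshot`, `ToyTable`, and `ToyCatalog`, the table name, and the file names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """One immutable version of the table: the data files visible in it."""
    snapshot_id: int
    data_files: tuple


@dataclass
class ToyTable:
    """Toy table format: an append-only history of snapshots."""
    snapshots: list = field(default_factory=list)

    def current(self):
        """Return the latest snapshot, or None for an empty table."""
        return self.snapshots[-1] if self.snapshots else None

    def append_files(self, new_files):
        """Commit a new snapshot that adds files; older snapshots stay intact."""
        prev = self.current()
        files = (prev.data_files if prev else ()) + tuple(new_files)
        snap = Snapshot(len(self.snapshots) + 1, files)
        self.snapshots.append(snap)
        return snap

    def files_as_of(self, snapshot_id):
        """Time travel: list the data files as they were at a given snapshot."""
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return list(snap.data_files)
        raise KeyError(snapshot_id)


class ToyCatalog:
    """Toy catalog: maps table names to table metadata for any engine."""

    def __init__(self):
        self._tables = {}

    def create_table(self, name):
        self._tables[name] = ToyTable()
        return self._tables[name]

    def load_table(self, name):
        return self._tables[name]


catalog = ToyCatalog()

# A writing engine commits two snapshots, each adding one columnar data file.
writer_view = catalog.create_table("market.ticks")
writer_view.append_files(["ticks-2024-01-02.parquet"])
writer_view.append_files(["ticks-2024-01-03.parquet"])

# A different engine finds the same table through the catalog.
reader_view = catalog.load_table("market.ticks")
latest_files = list(reader_view.current().data_files)
files_v1 = reader_view.files_as_of(1)

print(latest_files)
print(files_v1)
```

Because each snapshot is immutable, a reader pinned to snapshot 1 keeps seeing a consistent table while writers add newer versions. Versioned tables and time travel rest on that property.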

Common use cases include market data lakes, unified observability stores, and regulatory archives where long-term accessibility and interoperability are as important as raw performance.