Ingestion Format

SUMMARY

Ingestion format refers to the structured way data is organized and encoded when being loaded into a time-series database. The choice of format significantly impacts ingestion performance, storage efficiency, and query capabilities. Common formats include CSV, JSON, Protocol Buffers, and custom binary formats optimized for time-series data.

Understanding ingestion formats

Ingestion formats provide a standardized way to structure and encode data for efficient loading into time-series databases. The format defines how timestamps, metrics, and associated metadata are organized, impacting everything from parsing performance to storage requirements.

Key components of ingestion formats

Timestamp encoding: How temporal information is represented
Data type specifications: Definition of metric value formats
Schema information: Structure of the data and relationships
Metadata handling: How tags and labels are encoded
Compression characteristics: Built-in data reduction capabilities

Next generation time-series database

QuestDB is an open-source time-series database optimized for market and heavy industry data. Built from scratch in Java and C++, it offers high-throughput ingestion and fast SQL queries with time-series extensions.

Common ingestion formats

JSON

JSON is a flexible, human-readable format popular for its ease of use and wide support. While not the most efficient for high-volume ingestion, it excels in development and debugging scenarios.

{
  "timestamp": "2024-01-20T10:30:00Z",
  "metric": "cpu_usage",
  "value": 85.5,
  "tags": {
    "host": "server-01",
    "datacenter": "us-east"
  }
}
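As a minimal sketch of what ingesting such a record involves, the Python below decodes the JSON document above and flattens it into a single row. The function name `parse_json_record` and the flattened row layout are assumptions made for illustration, not part of any particular database's API.

```python
import json
from datetime import datetime, timezone

RAW = """{
  "timestamp": "2024-01-20T10:30:00Z",
  "metric": "cpu_usage",
  "value": 85.5,
  "tags": {"host": "server-01", "datacenter": "us-east"}
}"""


def parse_json_record(raw: str) -> dict:
    """Decode one JSON record and flatten it into a single row (hypothetical layout)."""
    doc = json.loads(raw)
    # fromisoformat() accepts a trailing "Z" only on Python 3.11+,
    # so normalize it to an explicit UTC offset first.
    ts = datetime.fromisoformat(doc["timestamp"].replace("Z", "+00:00"))
    row = {
        "timestamp": ts.astimezone(timezone.utc),
        "metric": doc["metric"],
        "value": float(doc["value"]),
    }
    # Tags become ordinary columns alongside the metric value.
    row.update(doc.get("tags", {}))
    return row


row = parse_json_record(RAW)
```

Every record repeats its field names and spells out numbers as text, which is part of why JSON costs more to parse than a binary format.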

Protocol Buffers

Protocol Buffers encode records in a compact binary format described by a typed schema. The combination of small payloads and explicit field types makes the format well suited to high-throughput production environments.

CSV and Line Protocols

CSV and line protocols are text-based formats that offer simplicity and widespread tool support. CSV in particular is often used for bulk loading historical data.

timestamp,metric,value,host,datacenter
2024-01-20T10:30:00Z,cpu_usage,85.5,server-01,us-east
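The section title mentions line protocols, so here is a hedged sketch that reads the CSV sample above and renders each row in the style of InfluxDB Line Protocol (`measurement,tag=value field=value timestamp`). The helper names `escape` and `csv_to_lines`, and the choice of which columns count as tags, are assumptions for illustration. The escaping is simplified and the timestamp is truncated to whole seconds.

```python
import csv
import io
from datetime import datetime

CSV_DATA = """timestamp,metric,value,host,datacenter
2024-01-20T10:30:00Z,cpu_usage,85.5,server-01,us-east
"""


def escape(text: str, chars: str) -> str:
    """Backslash-escape each delimiter character listed in `chars`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def csv_to_lines(data: str, tag_columns=("host", "datacenter")) -> list[str]:
    """Convert CSV rows to line-protocol-style strings (simplified sketch)."""
    lines = []
    for rec in csv.DictReader(io.StringIO(data)):
        ts = datetime.fromisoformat(rec["timestamp"].replace("Z", "+00:00"))
        # Whole seconds converted to nanoseconds; sub-second precision is dropped.
        ts_ns = int(ts.timestamp()) * 1_000_000_000
        # Tag keys come from the fixed column names, so only values are escaped.
        tags = ",".join(f"{k}={escape(rec[k], ',= ')}" for k in tag_columns)
        measurement = escape(rec["metric"], ", ")
        lines.append(f"{measurement},{tags} value={float(rec['value'])} {ts_ns}")
    return lines


lines = csv_to_lines(CSV_DATA)
# lines[0] == "cpu_usage,host=server-01,datacenter=us-east value=85.5 1705746600000000000"
```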

Performance considerations

Memory utilization

Binary formats: Lower memory overhead
Text formats: Higher memory requirements for parsing
Compressed formats: Trade CPU for memory efficiency
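To make the text versus binary trade-off concrete, the sketch below encodes the same record as JSON and as a fixed-width binary layout using Python's `struct` module. The layout (`<qHd`) and the numeric `metric_id` are made-up stand-ins for a real binary format such as Protocol Buffers, and encoded size is used only as a rough proxy for parsing overhead.

```python
import json
import struct

# Hypothetical record: the metric name is replaced by a small numeric id.
record = {"timestamp": 1705746600, "metric_id": 1, "value": 85.5}

# Text encoding: field names and digits are spelled out as characters.
text_bytes = json.dumps(record).encode("utf-8")

# Fixed-width binary encoding: little-endian int64 timestamp,
# uint16 metric id, float64 value (18 bytes in total).
LAYOUT = struct.Struct("<qHd")
binary_bytes = LAYOUT.pack(record["timestamp"], record["metric_id"], record["value"])

# Decoding needs no tokenizing or number parsing, only fixed-offset reads.
ts, metric_id, value = LAYOUT.unpack(binary_bytes)

print(len(text_bytes), len(binary_bytes))
```

The binary form is smaller and cheaper to decode, but it is unreadable without the layout definition, which is the usual price of a binary format.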

Schema flexibility

Different formats offer varying levels of schema evolution support:

JSON: Highly flexible but with overhead
Protocol Buffers: Strong typing with controlled evolution
CSV: Fixed schema with limited flexibility
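The difference between a self-describing format and a positional one can be sketched as follows. The column set `COLUMNS` and both parser functions are hypothetical, and the JSON input here is flat rather than nested to keep the comparison short.

```python
import csv
import io
import json

COLUMNS = ("timestamp", "metric", "value", "host")


def from_json(raw: str) -> dict:
    """Self-describing input: unknown fields are ignored, missing ones become None."""
    doc = json.loads(raw)
    return {col: doc.get(col) for col in COLUMNS}


def from_csv_row(raw: str) -> dict:
    """Positional input: the row must match the expected column count exactly."""
    fields = next(csv.reader(io.StringIO(raw)))
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


# A producer adds a "region" field: the JSON reader keeps working...
row = from_json('{"timestamp": "t0", "metric": "cpu_usage", "value": 1, "region": "eu"}')

# ...while the same change in a headerless CSV row is rejected.
try:
    from_csv_row("t0,cpu_usage,1,server-01,eu")
    csv_error = None
except ValueError as exc:
    csv_error = str(exc)
```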

Monitoring and optimization

Ingestion metrics

Key metrics to monitor:

Parse time per record
Memory usage during ingestion
Error rates by format type
Write throughput by format
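Two of these metrics, parse time per record and error rate by format, can be collected with a small wrapper around the parser. The `IngestStats` class below is a hypothetical sketch, not a monitoring API from any product. It does not cover memory usage or write throughput, which typically need process-level or database-level instrumentation.

```python
import json
import time
from collections import defaultdict


class IngestStats:
    """Accumulates per-format record counts, error counts, and parse time."""

    def __init__(self):
        self.records = defaultdict(int)
        self.errors = defaultdict(int)
        self.parse_seconds = defaultdict(float)

    def timed_parse(self, fmt, parser, raw):
        """Run parser(raw), recording elapsed time and whether it failed."""
        start = time.perf_counter()
        try:
            return parser(raw)
        except ValueError:
            # json.JSONDecodeError is a subclass of ValueError.
            self.errors[fmt] += 1
            return None
        finally:
            self.parse_seconds[fmt] += time.perf_counter() - start
            self.records[fmt] += 1

    def error_rate(self, fmt):
        n = self.records[fmt]
        return self.errors[fmt] / n if n else 0.0

    def mean_parse_seconds(self, fmt):
        n = self.records[fmt]
        return self.parse_seconds[fmt] / n if n else 0.0


stats = IngestStats()
for raw in ('{"value": 1}', '{"value": 2}', "{broken"):
    stats.timed_parse("json", json.loads, raw)
```

After the loop, `stats.error_rate("json")` is one failure out of three records, and `stats.mean_parse_seconds("json")` gives the average parse time.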

Format selection criteria

Consider these factors when choosing an ingestion format:

Data volume and velocity requirements
Schema stability vs flexibility needs
Development and operational complexity
Tool ecosystem compatibility

Best practices

Match format to use case:
- Development: Human-readable formats
- Production: Optimized binary formats
- Bulk loading: Compressed text formats

Consider the full pipeline:
- Source system capabilities
- Network transfer efficiency
- Storage implications
- Query requirements

Plan for evolution:
- Format versioning strategy
- Schema change procedures
- Backward compatibility needs

Implement proper validation:
- Schema validation
- Data quality checks
- Error handling procedures
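
The three validation steps above could look like the following sketch. The required-field table `REQUIRED`, the specific quality checks, and the accept/reject split are assumptions chosen for illustration. Real pipelines will have their own rules.

```python
import math
from datetime import datetime

# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED = {"timestamp": str, "metric": str, "value": (int, float)}


def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    # Schema validation: required fields and their types (bool is not a number here).
    for field, expected in REQUIRED.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif isinstance(record[field], bool) or not isinstance(record[field], expected):
            problems.append(f"wrong type for {field}")
    if problems:
        return problems
    # Data quality checks: parseable timestamp and a finite value.
    try:
        datetime.fromisoformat(record["timestamp"].replace("Z", "+00:00"))
    except ValueError:
        problems.append("unparseable timestamp")
    value = record["value"]
    if isinstance(value, float) and not math.isfinite(value):
        problems.append("non-finite value")
    return problems


def ingest(records):
    """Error handling: split input into accepted rows and (row, reasons) rejects."""
    accepted, rejected = [], []
    for rec in records:
        issues = validate(rec)
        if issues:
            rejected.append((rec, issues))
        else:
            accepted.append(rec)
    return accepted, rejected
```

Keeping rejected rows together with their reasons, rather than dropping them silently, makes error rates measurable and bad input easier to replay.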
