JSON Lines
JSON Lines (JSONL) is a text format in which each line holds one valid JSON value (in practice usually an object), with records separated by newline characters. It combines JSON's flexibility with the simplicity of line-oriented processing, which makes it well suited to streaming data ingestion and time-series applications.
How JSON Lines works
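JSON Lines structures data as a sequence of independent JSON objects, one per line. Because the newline is the record delimiter, each record must be serialized without internal line breaks (no pretty-printing). In return, a reader can parse each record as soon as its line arrives instead of waiting for the whole document, which enables streaming and incremental parsing.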
```json
{"timestamp": "2024-01-01T00:00:00Z", "sensor": "A1", "value": 23.4}
{"timestamp": "2024-01-01T00:00:01Z", "sensor": "A1", "value": 23.5}
{"timestamp": "2024-01-01T00:00:02Z", "sensor": "A1", "value": 23.6}
```
Because readers consume one line at a time, systems can process a JSON Lines file without loading it entirely into memory. This suits the continuous, append-heavy ingestion that time-series databases handle.
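The line-by-line reading described above can be sketched in Python with only the standard library. This is a minimal example rather than a reference implementation: `iter_jsonl` is a hypothetical helper name, and skipping blank lines is a lenient choice, not something the format requires.

```python
import io
import json
from typing import Any, Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[Any]:
    """Yield one parsed JSON value per line of a text stream.

    Iterating over a file object reads one line at a time, so only the
    current record is held in memory regardless of file size.
    """
    for line in stream:
        # Lenient choice: ignore blank lines (e.g. a stray trailing newline).
        if not line.strip():
            continue
        yield json.loads(line)


# Usage: an in-memory stream stands in for a real file such as
# open("data.jsonl", encoding="utf-8") (the filename is illustrative).
sample = io.StringIO(
    '{"timestamp": "2024-01-01T00:00:00Z", "sensor": "A1", "value": 23.4}\n'
    '{"timestamp": "2024-01-01T00:00:01Z", "sensor": "A1", "value": 23.5}\n'
    '{"timestamp": "2024-01-01T00:00:02Z", "sensor": "A1", "value": 23.6}\n'
)
values = [record["value"] for record in iter_jsonl(sample)]
```

Because `iter_jsonl` is a generator, records can be forwarded to a database client or aggregated as they are read, without first materializing a list.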
Next generation time-series database
QuestDB is an open-source time-series database optimized for market and heavy industry data. Built from scratch in Java and C++, it offers high-throughput ingestion and fast SQL queries with time-series extensions.
Advantages for time-series data
Streaming-friendly structure
JSON Lines excels in streaming scenarios because:
- Each line is self-contained and independently parseable
- No need to maintain parsing state between records
- Natural fit for append-only data patterns
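To make the "no parsing state between records" point concrete: when data arrives over a socket or pipe in arbitrarily sized chunks, the only thing a reader must remember is the incomplete tail of the current line. The sketch below simulates such a source with a list of `bytes` chunks; `parse_chunks` is a hypothetical name, and it relies on `json.loads` accepting UTF-8 `bytes` (Python 3.6+).

```python
import json
from typing import Any, Iterable, Iterator


def parse_chunks(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Parse JSON Lines from byte chunks that may split records anywhere.

    The only state kept between chunks is the unfinished tail of the
    current line; completed records are parsed and released immediately.
    Splitting on b"\\n" is safe for UTF-8 because that byte never occurs
    inside a multi-byte character.
    """
    pending = b""
    for chunk in chunks:
        pending += chunk
        # Everything before the last newline is a complete line.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            if line.strip():
                # A trailing b"\r" from CRLF endings is JSON whitespace.
                yield json.loads(line)
    # A final record without a trailing newline is still a valid record.
    if pending.strip():
        yield json.loads(pending)


# Usage: record boundaries deliberately fall in the middle of chunks.
chunks = [
    b'{"sensor": "A1", "va',
    b'lue": 23.4}\n{"sensor": "A1",',
    b' "value": 23.5}\n{"sensor": "A1", "value": 23.6}',
]
readings = [r["value"] for r in parse_chunks(chunks)]
```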
Schema flexibility
The format inherits JSON's schema flexibility while remaining easy to process line by line:
- Fields can be added or removed without breaking existing parsers, provided consumers tolerate missing and unknown keys
- Different record types can coexist in the same stream
- Optional fields don't waste space when absent
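One way a consumer can take advantage of this flexibility is to read only the fields it needs and supply defaults for optional ones. In this sketch, the `unit` and `firmware` fields, the `"unknown"` default, and the `normalize` helper are illustrative assumptions, not part of any standard schema.

```python
import json

lines = [
    '{"ts": "2024-01-01T00:00:00Z", "sensor": "A1", "value": 23.4}',
    # A newer producer adds an optional "unit" and an extra "firmware" field.
    '{"ts": "2024-01-01T00:00:01Z", "sensor": "A1", "value": 23.5, "unit": "C", "firmware": "2.1"}',
]


def normalize(record: dict) -> dict:
    """Project a record onto the fields this consumer uses.

    Unknown keys (such as "firmware") are ignored, and the optional "unit"
    key falls back to a default, so producers can add or drop such fields
    without breaking this consumer.
    """
    return {
        "ts": record["ts"],
        "sensor": record["sensor"],
        "value": float(record["value"]),
        "unit": record.get("unit", "unknown"),  # hypothetical default
    }


rows = [normalize(json.loads(line)) for line in lines]
```

Required fields (`ts`, `sensor`, `value`) still raise `KeyError` when absent, so the consumer decides explicitly which fields are optional.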
Common use cases
Time-series data ingestion
JSON Lines is frequently used to ingest time-series data such as market data feeds, where different event types can share one stream:
```json
{"ts": "2024-01-01T00:00:00Z", "type": "trade", "symbol": "AAPL", "price": 123.45}
{"ts": "2024-01-01T00:00:01Z", "type": "quote", "symbol": "AAPL", "bid": 123.44, "ask": 123.46}
```
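Because each line is self-describing, a consumer can route a mixed trade/quote stream by its `type` field, for example into one table per event type. The `COLUMNS` mapping and the `route` function below are hypothetical; a real pipeline would map record types to its own target schema.

```python
import json
from collections import defaultdict

stream = (
    '{"ts": "2024-01-01T00:00:00Z", "type": "trade", "symbol": "AAPL", "price": 123.45}\n'
    '{"ts": "2024-01-01T00:00:01Z", "type": "quote", "symbol": "AAPL", "bid": 123.44, "ask": 123.46}\n'
)

# Hypothetical column layout per record type.
COLUMNS = {
    "trade": ("ts", "symbol", "price"),
    "quote": ("ts", "symbol", "bid", "ask"),
}


def route(text: str) -> dict:
    """Group records into per-type lists of row tuples, keyed by "type"."""
    tables = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        columns = COLUMNS.get(record.get("type"))
        if columns is None:
            # Unknown or missing type: skipped here; a real pipeline might log it.
            continue
        tables[record["type"]].append(tuple(record[c] for c in columns))
    return dict(tables)


tables = route(stream)
```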
Industrial telemetry
The format is a natural fit for sensor readings and industrial metrics, with one reading per line:
```json
{"timestamp": "2024-01-01T00:00:00Z", "device_id": "pump_01", "pressure": 102.3, "temp": 85.6}
{"timestamp": "2024-01-01T00:00:01Z", "device_id": "pump_01", "pressure": 102.4, "temp": 85.7}
```
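On the producing side, writing JSON Lines amounts to serializing each reading onto a single line and appending a newline. This sketch assumes readings arrive as Python values with timezone-aware `datetime` timestamps; `write_reading` is a hypothetical helper.

```python
import io
import json
from datetime import datetime, timezone


def write_reading(out, device_id: str, pressure: float, temp: float,
                  ts: datetime) -> None:
    """Append one telemetry reading as a single JSON Lines record."""
    record = {
        # ISO 8601 in UTC with a "Z" suffix, at whole-second precision.
        "timestamp": ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "device_id": device_id,
        "pressure": pressure,
        "temp": temp,
    }
    # Compact separators keep lines short. json.dumps (without indent) never
    # emits a raw newline; newlines inside strings are escaped as \n, so
    # each record is always exactly one line.
    out.write(json.dumps(record, separators=(",", ":")) + "\n")


# Usage: an in-memory buffer stands in for a file opened in append mode.
buf = io.StringIO()
write_reading(buf, "pump_01", 102.3, 85.6, datetime(2024, 1, 1, tzinfo=timezone.utc))
```

To persist readings, open the target in append mode, e.g. `open("telemetry.jsonl", "a", encoding="utf-8")` (the filename is illustrative), which matches the append-only pattern described earlier.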
Best practices
- Timestamp consistency
  - Use consistent timestamp formats (ISO 8601 recommended)
  - Consider timezone handling requirements
- Data validation
  - Validate each line independently
  - Include schema version if needed
  - Handle malformed lines gracefully
- Performance optimization
  - Minimize unnecessary fields
  - Consider compression for storage/transmission
  - Use appropriate line buffering for processing
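The validation practices above can be combined into a small line-by-line validator. It is a sketch under stated assumptions: the `v` schema-version field, the required `timestamp` and `device_id` keys, and the rule that timestamps must carry a UTC offset are illustrative choices, not requirements of JSON Lines.

```python
import json
from datetime import datetime

SCHEMA_VERSION = 1                            # hypothetical expected value of "v"
REQUIRED_FIELDS = ("timestamp", "device_id")  # hypothetical required keys


def parse_timestamp(value) -> datetime:
    """Parse an ISO 8601 timestamp and require an explicit UTC offset."""
    if not isinstance(value, str):
        raise ValueError(f"timestamp is not a string: {value!r}")
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11,
    # so rewrite it as "+00:00" to support older interpreters too.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    ts = datetime.fromisoformat(value)
    if ts.tzinfo is None:
        raise ValueError(f"timestamp has no timezone: {value!r}")
    return ts


def validate_lines(lines):
    """Return (valid_records, errors); a bad line is recorded and skipped."""
    valid, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError("line is not a JSON object")
            missing = [f for f in REQUIRED_FIELDS if f not in record]
            if missing:
                raise ValueError(f"missing fields: {missing}")
            version = record.get("v", SCHEMA_VERSION)
            if version != SCHEMA_VERSION:
                raise ValueError(f"unsupported schema version: {version!r}")
            record["timestamp"] = parse_timestamp(record["timestamp"])
        except ValueError as exc:
            # json.JSONDecodeError is a subclass of ValueError.
            errors.append((lineno, str(exc)))
            continue
        valid.append(record)
    return valid, errors
```

Any iterable of lines works as input, including an open text file. For gzip-compressed files, `gzip.open(path, "rt", encoding="utf-8")` from the standard library yields text lines the same way, so compression does not change the validation loop.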
Comparison with other formats
JSON Lines compares with common alternatives as follows:
| Format | Advantages | Disadvantages |
|---|---|---|
| JSON Lines | Line-oriented processing, human-readable | Less compact than binary formats |
| Protocol Buffers | More compact, schema enforcement | Requires schema definition; binary, not human-readable |
| CSV | Simpler structure | Limited data types, escaping issues |
The choice between these formats often depends on specific requirements around schema flexibility, human readability, and processing efficiency.
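The compactness and data-type rows of the table can be checked with Python's standard `json` and `csv` modules on two telemetry records. This only illustrates the trade-off on a tiny sample; actual size differences depend on field names, value types, and compression.

```python
import csv
import io
import json

records = [
    {"timestamp": "2024-01-01T00:00:00Z", "device_id": "pump_01", "pressure": 102.3, "temp": 85.6},
    {"timestamp": "2024-01-01T00:00:01Z", "device_id": "pump_01", "pressure": 102.4, "temp": 85.7},
]

# JSON Lines: every record repeats its field names.
jsonl_text = "".join(json.dumps(r) + "\n" for r in records)

# CSV: field names appear once, in the header row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(records)
csv_text = buf.getvalue()

print("JSONL bytes:", len(jsonl_text.encode("utf-8")))
print("CSV bytes:  ", len(csv_text.encode("utf-8")))

# Reading back: CSV returns every value as a string, JSON keeps numbers typed.
csv_first = next(csv.DictReader(io.StringIO(csv_text)))
json_first = json.loads(jsonl_text.splitlines()[0])
print(type(csv_first["pressure"]).__name__, type(json_first["pressure"]).__name__)  # str float
```

The repeated keys are what make JSON Lines larger here, while the CSV reader hands back `"102.3"` as text that the consumer must convert itself.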