Ingestion Contract
An ingestion contract is a formal specification that defines how data should be formatted, validated, and processed during ingestion into a time-series database. It establishes the rules and expectations between data producers and the database system, ensuring data quality and consistency.
How ingestion contracts work
Ingestion contracts act as a formal agreement between data sources and the database system, specifying:
- Expected data format and structure
- Required fields and their data types
- Timestamp format and precision requirements
- Validation rules and constraints
- Error handling and rejection policies
# Example ingestion contract pseudocode
contract TradeData {
    required timestamp: datetime(precision=microseconds)
    required symbol: string(max_length=10)
    required price: decimal(precision=6)
    required volume: integer
    optional trade_id: string

    validations {
        price > 0
        volume > 0
        timestamp <= current_time
    }
}
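To make the TradeData pseudocode contract concrete, here is a minimal Python sketch of how its rules might be checked on a single record. The function name `validate_trade`, the dictionary record shape, and the error messages are all hypothetical. Reading `decimal(precision=6)` as "at most 6 decimal places" is likewise an assumption, since the pseudocode does not define it.

```python
from datetime import datetime, timezone
from decimal import Decimal, InvalidOperation

# Hypothetical Python rendering of the TradeData pseudocode contract.
REQUIRED_FIELDS = ("timestamp", "symbol", "price", "volume")


def validate_trade(record, now=None):
    """Return a list of violations; an empty list means the record passes."""
    now = now or datetime.now(timezone.utc)
    errors = [f"missing required field: {name}"
              for name in REQUIRED_FIELDS if name not in record]
    if errors:
        return errors

    # Python datetimes carry microsecond precision natively.
    ts = record["timestamp"]
    if not isinstance(ts, datetime) or ts.tzinfo is None:
        errors.append("timestamp must be a timezone-aware datetime")
    elif ts > now:
        errors.append("timestamp must not be later than the current time")

    symbol = record["symbol"]
    if not isinstance(symbol, str) or len(symbol) > 10:
        errors.append("symbol must be a string of at most 10 characters")

    try:
        price = Decimal(str(record["price"]))
    except InvalidOperation:
        errors.append("price must be a decimal number")
    else:
        if not price.is_finite() or price <= 0:
            errors.append("price must be greater than 0")
        # Assumption: precision=6 means at most 6 digits after the decimal point.
        elif price.as_tuple().exponent < -6:
            errors.append("price must have at most 6 decimal places")

    volume = record["volume"]
    if isinstance(volume, bool) or not isinstance(volume, int):
        errors.append("volume must be an integer")
    elif volume <= 0:
        errors.append("volume must be greater than 0")

    # trade_id is optional, so it is only checked when present.
    trade_id = record.get("trade_id")
    if trade_id is not None and not isinstance(trade_id, str):
        errors.append("trade_id must be a string when present")

    return errors
```

Returning every violation at once, rather than stopping at the first, gives data producers more useful feedback per rejected record.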
Next generation time-series database
QuestDB is an open-source time-series database optimized for market and heavy industry data. Built from scratch in Java and C++, it offers high-throughput ingestion and fast SQL queries with time-series extensions.
Benefits of ingestion contracts
Data quality assurance
Ingestion contracts help maintain data quality by enforcing validation rules at ingestion time. This prevents invalid or malformed data from entering the system, reducing the need for downstream cleaning and correction.
Clear expectations
By explicitly defining the expected format and rules, contracts provide clear guidance to data producers about how their data should be structured and what validations will be applied.
Error handling
Contracts specify how the system should handle various types of errors, such as:
- Missing required fields
- Invalid data types
- Constraint violations
- Out-of-order events
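One way to express such a policy is a lookup table from error category to action. The sketch below is illustrative only: the `ErrorKind` and `Action` names, and the specific mapping (for example, accepting out-of-order events while rejecting missing fields), are assumptions rather than behavior prescribed by any particular database.

```python
from enum import Enum


class ErrorKind(Enum):
    MISSING_FIELD = "missing_field"
    INVALID_TYPE = "invalid_type"
    CONSTRAINT_VIOLATION = "constraint_violation"
    OUT_OF_ORDER = "out_of_order"


class Action(Enum):
    ACCEPT = "accept"          # ingest the record anyway
    QUARANTINE = "quarantine"  # park the record somewhere for inspection
    REJECT = "reject"          # drop the record and report back to the producer


# Hypothetical policy; a real contract would choose its own mapping.
POLICY = {
    ErrorKind.MISSING_FIELD: Action.REJECT,
    ErrorKind.INVALID_TYPE: Action.REJECT,
    ErrorKind.CONSTRAINT_VIOLATION: Action.QUARANTINE,
    ErrorKind.OUT_OF_ORDER: Action.ACCEPT,
}

# Actions ordered from most lenient to strictest.
_SEVERITY = [Action.ACCEPT, Action.QUARANTINE, Action.REJECT]


def resolve(error_kinds):
    """Return the strictest action demanded by any of the detected errors."""
    actions = [POLICY[kind] for kind in error_kinds]
    return max(actions, key=_SEVERITY.index, default=Action.ACCEPT)
```

Keeping the policy in data rather than in branching code makes it easier to review and to version alongside the contract itself.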
Implementation considerations
Schema evolution
Ingestion contracts should account for schema evolution to handle changes in data structure over time. This might include:
- Version control for contracts
- Backward compatibility rules
- Migration strategies
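A backward compatibility rule can be checked mechanically when contract versions are stored as data. The sketch below assumes a hypothetical `ContractVersion` structure that maps field names to type names, and it assumes that unknown fields in a record are ignored rather than rejected. Under those assumptions, a new version is backward compatible if it adds no new required fields and changes no existing field types.

```python
from dataclasses import dataclass, field


@dataclass
class ContractVersion:
    """Hypothetical versioned contract: field name -> type name."""
    version: int
    required: dict
    optional: dict = field(default_factory=dict)


def is_backward_compatible(old, new):
    """True if records valid under `old` remain valid under `new` (types only)."""
    old_fields = {**old.optional, **old.required}
    new_fields = {**new.optional, **new.required}

    # Every field required by the new version must already have been
    # required, with the same type, in the old version.
    for name, type_name in new.required.items():
        if old.required.get(name) != type_name:
            return False

    # Fields that survive into the new version must keep their type.
    for name, type_name in old_fields.items():
        if name in new_fields and new_fields[name] != type_name:
            return False

    return True
```

For example, adding an optional field passes this check, while promoting an optional field to required or changing a field's type does not.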
Performance impact
Contract validation adds overhead to the ingestion pipeline, so systems need to balance the thoroughness of validation against throughput and latency requirements.
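Before tuning that balance, it helps to measure how much time validation actually takes. The following is a rough sketch using only the standard library. The function `validation_overhead` and its `validate` and `ingest` callables are hypothetical placeholders for whatever the pipeline really uses.

```python
import time


def validation_overhead(records, validate, ingest):
    """Return (validation_seconds, total_seconds) for one pass over `records`."""
    validation_seconds = 0.0
    start = time.perf_counter()
    for record in records:
        t0 = time.perf_counter()
        ok = validate(record)
        validation_seconds += time.perf_counter() - t0
        if ok:
            ingest(record)  # only valid records reach the sink
    total_seconds = time.perf_counter() - start
    return validation_seconds, total_seconds
```

Dividing the first value by the second gives the share of pipeline time spent on validation, which can be tracked as rules are added or removed.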
Integration with data flows
Contracts need to work seamlessly with various ingestion methods:
- Batch ingestion
- Real-time data ingestion
- Multiple data formats (JSON, CSV, Protocol Buffers)
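One common approach is to normalize every input format into the same record shape and then apply a single set of contract checks. The sketch below illustrates this for JSON lines and CSV using only the Python standard library. The function names, the three-column record layout, and the use of `float` for prices (a simplification) are assumptions made for illustration.

```python
import csv
import io
import json


def records_from_json_lines(text):
    """Yield one dict per non-empty line of newline-delimited JSON."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


def records_from_csv(text):
    """Yield one dict per CSV row, converting numeric columns from strings."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {**row, "price": float(row["price"]), "volume": int(row["volume"])}


def check(record):
    """Format-independent contract rule: price and volume must be positive."""
    return record["price"] > 0 and record["volume"] > 0


def ingest(records):
    """Split records into (accepted, rejected) using the shared check."""
    accepted, rejected = [], []
    for record in records:
        (accepted if check(record) else rejected).append(record)
    return accepted, rejected
```

Because the validation logic sees only normalized records, the same rules apply whether data arrives in a batch file or one message at a time.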
Best practices
- Clear documentation: Document all contract requirements and validation rules thoroughly
- Version control: Maintain versioning for contracts to track changes
- Testing: Provide test data and validation tools for data producers
- Error feedback: Implement clear error reporting mechanisms
- Performance monitoring: Track validation overhead and impact on ingestion latency