Schema on Write

SUMMARY

Schema-on-write is a data validation approach in which the structure and format of data are strictly enforced at ingestion time, before the data is written to storage. This traditional database paradigm ensures data consistency, but it can slow ingestion and requires careful upfront schema design.

How schema-on-write works

Schema-on-write validates incoming data against a predefined schema when data is first written to the database. This approach acts as a gatekeeper, rejecting any data that doesn't conform to the expected structure. The process involves:

  1. Schema definition
  2. Validation at write time
  3. Storage in structured format
  4. Optimized read performance
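The four steps above can be sketched in Python, using SQLite's in-memory engine as a stand-in relational store (the `trades` table and its fields are illustrative, not part of any particular product):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1. Schema definition: structure is fixed before any data arrives
conn.execute("""
    CREATE TABLE trades (
        ts     TEXT    NOT NULL,
        symbol TEXT    NOT NULL,
        price  REAL    NOT NULL,
        volume INTEGER NOT NULL
    )
""")

# 2. Validation at write time: a non-conforming row is rejected outright
try:
    conn.execute("INSERT INTO trades VALUES (?, ?, ?, ?)",
                 ("2024-01-02T09:30:00Z", "ACME", None, 100))
except sqlite3.IntegrityError as e:
    print("rejected:", e)

# 3. Storage in structured format: conforming rows are written
conn.execute("INSERT INTO trades VALUES (?, ?, ?, ?)",
             ("2024-01-02T09:30:00Z", "ACME", 101.5, 100))

# 4. Reads benefit from the known structure: no per-row parsing needed
rows = conn.execute("SELECT symbol, price FROM trades").fetchall()
print(rows)  # [('ACME', 101.5)]
```

Note that the invalid row never reaches storage: the database acts as the gatekeeper described above.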

Next generation time-series database

QuestDB is an open-source time-series database optimized for market and heavy industry data. Built from scratch in Java and C++, it offers high-throughput ingestion and fast SQL queries with time-series extensions.

Benefits and tradeoffs

Advantages

  • Ensures data quality and consistency
  • Faster query performance
  • Reduced storage overhead
  • Simplified analytics queries

Disadvantages

  • Higher write latency
  • Less flexibility for schema changes
  • Potential ingestion failures
  • Higher upfront design complexity

Comparison with schema-on-read

Unlike schema-on-read, schema-on-write frontloads data validation work. This fundamental difference affects several aspects of database operations:

Aspect             Schema-on-write   Schema-on-read
Write performance  Slower            Faster
Read performance   Faster            Slower
Data consistency   Enforced          Optional
Flexibility        Lower             Higher

Applications in time-series data

In time-series databases, schema-on-write is particularly important for:

  1. Industrial sensor data validation
  2. Financial market data normalization
  3. IoT device data conformance
  4. Regulatory compliance requirements

For example, in financial market data:

# Example of schema validation in Python
from decimal import Decimal
from datetime import datetime

# Declared schema: field name -> expected type
schema = {
    "timestamp": datetime,
    "symbol": str,
    "price": Decimal,
    "volume": int,
}

class ValidationError(Exception):
    pass

def matches_schema(data, schema):
    # Reject records with missing, extra, or mistyped fields
    return set(data) == set(schema) and all(
        isinstance(data[field], expected) for field, expected in schema.items()
    )

def validate_trade(data):
    if not matches_schema(data, schema):
        raise ValidationError("Invalid trade data")
    return data  # accepted; hand off to downstream processing

Best practices

  1. Careful Schema Design

    • Plan for future fields
    • Consider data types carefully
    • Document schema decisions
  2. Performance Optimization

    • Balance validation depth with performance
    • Use appropriate data types
    • Consider batch validation for bulk loads
  3. Error Handling

    • Implement clear error messages
    • Create recovery procedures
    • Log validation failures
  4. Schema Evolution

    • Plan for schema updates
    • Maintain backward compatibility
    • Document schema versions
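The schema-evolution practice above can be sketched as follows; the field names and version dictionaries are hypothetical, and the point is only that adding a new field with a default keeps records written under the old schema valid:

```python
# Hypothetical v1 schema: field name -> expected type
SCHEMA_V1 = {"timestamp": str, "symbol": str, "price": float}

# v2 adds a field; a default preserves backward compatibility
SCHEMA_V2 = {**SCHEMA_V1, "venue": str}
DEFAULTS_V2 = {"venue": "UNKNOWN"}

def upgrade(record):
    # Fill any missing v2 fields with defaults so v1 records still conform
    return {**DEFAULTS_V2, **record}

old = {"timestamp": "2024-01-02T09:30:00Z", "symbol": "ACME", "price": 101.5}
print(upgrade(old)["venue"])  # UNKNOWN
```

Explicit version dictionaries also serve as the documentation of schema versions recommended above.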

Schema-on-write remains a cornerstone of data quality management, particularly in systems where data consistency and reliability are paramount.
