Materialized views

info

Materialized View support is in beta. It may not be fit for production use.

Please let us know if you run into issues. Either:

  1. Email us at support@questdb.io
  2. Join our public Slack
  3. Post on our Discourse community

A materialized view is a special QuestDB table that stores the pre-computed results of a query. Unlike regular views, which compute their results at query time, materialized views persist their data to disk, making them particularly efficient for expensive aggregate queries that are run frequently.

What are materialized views for?​

As data grows in size, the performance of certain queries can degrade. Materialized views store the result of a SAMPLE BY or time-based GROUP BY query on disk, and keep it automatically up to date.

The refresh of a materialized view is INCREMENTAL and very efficient, and using materialized views can offer 100x or higher query speedups. If you require the lowest latency queries, for example, for charts and dashboards, use materialized views!

For a better understanding of what materialized views are for, read the introduction to materialized views documentation.

Creating a materialized view​

There is a fundamental limit to how fast certain aggregation and scanning queries can execute, based on the data size, number of rows, disk speed, and number of cores.

Materialized views let you bound the runtime for common aggregation queries, by allowing you to pre-aggregate historical data ahead-of-time. This means that for many queries, you only need to aggregate the latest partition's data, and then you can use already aggregated results for historical data.

Throughout this document, we will use the demo trades table. This is a table containing crypto trading data, with over 1.6 billion rows.

trades ddl
CREATE TABLE 'trades' (
symbol SYMBOL,
side SYMBOL,
price DOUBLE,
amount DOUBLE,
timestamp TIMESTAMP
) TIMESTAMP(timestamp) PARTITION BY DAY;

A full syntax definition can be found in the CREATE MATERIALIZED VIEW documentation.

Here is a materialized view taken from our demo, which calculates OHLC bars for a candlestick chart. The view reads data from the base table, trades. It then calculates aggregate functions such as first, sum etc. over 15 minutes time buckets. The view is incrementally refreshed, meaning it is always up to date with the latest trades data.

note

If you are unfamiliar with the OHLC concept, please see our OHLC guide.

trades_OHLC_15m ddl
CREATE MATERIALIZED VIEW 'trades_OHLC_15m'
WITH BASE 'trades' REFRESH INCREMENTAL
AS (
SELECT
timestamp, symbol,
first(price) AS open,
max(price) as high,
min(price) as low,
last(price) AS close,
sum(amount) AS volume
FROM trades
SAMPLE BY 15m
) PARTITION BY MONTH;

In this example:

  1. The view is called trades_OHLC_15m.
  2. The base table is trades
    • This is the data source, and will trigger incremental refresh when new data is written.
  3. The refresh strategy is INCREMENTAL
    • The data is automatically refreshed and incrementally written; efficient, fast, low maintenance.
  4. The SAMPLE BY query contains two key column (timestamp, symbol) and five aggregates (first, max, min, last, price) calculated in 15m time buckets.
  5. The view is partitioned by DAY.
  6. No TTL is defined
    • Therefore, the materialized view will contain a summary of all the base trades table's data.
tip

This particular example can also be written via the compact syntax.

The view name​

We recommend naming the view with some reference to the base table, its purpose, and its sample size.

In our trades_OHLC_15m example, we combine:

  • trades (the base table name)
  • OHLC (the purpose)
  • 15m (the sample unit)

The base table​

The base table triggers updating the materialized view, and is the main source of raw data.

The SAMPLE BY query can contain a JOIN. However, the secondary JOIN tables will not trigger any sort of refresh.

Refresh strategies​

Currently, only INCREMENTAL refresh is supported. This strategy incrementally updates the view when new data is inserted into the base table. This means that only new data is written to the view, so there is minimal write overhead.

Upon creation, or when the view is invalidated, a full refresh will occur, which rebuilds the view from scratch.

SAMPLE BY​

Materialized views are populated using SAMPLE BY or time-based GROUP BY queries.

When new data is written into the base table, an incremental refresh is triggered, which adds this new data to the view.

Not all SAMPLE BY syntax is supported. In general, you should aim to keep your query as simple as possible, and move complex transformations to an outside query that runs on the down-sampled data.

PARTITION BY​

Optionally, you may specify a partitioning scheme.

You should choose a partition unit which is larger than the sampling interval. Ideally, the partition unit should be divisible by the sampling interval.

For example, an SAMPLE BY 8h clause fits nicely with a DAY partitioning strategy, with 3 timestamp buckets per day.

Default partitioning​

If the PARTITION BY clauses is omitted, the partitioning scheme is automatically inferred from the SAMPLE BY clause.

IntervalDefault partitioning
> 1 hourPARTITION BY YEAR
> 1 minutePARTITION BY MONTH
<= 1 minutePARTITION BY DAY

TTL​

Though TTL was not included, it can be set on a materialized view, and does not need to match the base table.

For example, if we only wanted to pre-aggregate the last 30 days of data, we could add:

PARTITION BY DAY TTL 30 DAYS;

to the end of our materialized view definition.

Compact syntax​

If you're happy with the defaults and don't need to customize materialized view parameters such as PARTITION BY or TTL, then you can use the compact syntax which omits the parentheses.

trades_OHLC_15m compact syntax
CREATE MATERIALIZED VIEW trades_OHLC_15m AS
SELECT
timestamp, symbol,
first(price) AS open,
max(price) as high,
min(price) as low,
last(price) AS close,
sum(amount) AS volume
FROM trades
SAMPLE BY 15m;

Querying materialized views​

note

The example trades_OHLC_15m view is available on our demo, and contains realtime crypto data - try it out!

Materialized Views support all the same queries as regular QuestDB tables.

Here's how you can check today's trading data:

querying trades_OHLC_15mDemo this query
trades_OHLC_15m WHERE timestamp IN today();
timestampsymbolopenhighlowclosevolume
2025-03-31T00:00:00.000000ZETH-USD1807.941813.321804.691808.581784.144071999995
2025-03-31T00:00:00.000000ZBTC-USD82398.482456.582177.682284.534.47331241
2025-03-31T00:00:00.000000ZDOGE-USD0.166540.167480.166290.166773052051.6327359965
2025-03-31T00:00:00.000000ZAVAX-USD18.8718.88518.78118.8266092.852976000005
.....................

How much faster is it?​

Let's run the OHLC query without using the view, against our trades table:

the OHLC queryDemo this query
SELECT
timestamp, symbol,
first(price) AS open,
max(price) as high,
min(price) as low,
last(price) AS close,
sum(amount) AS volume
FROM trades
SAMPLE BY 15m;

This takes several seconds to execute.

Yet if we query the materialized view instead:

OHLC materialized view unboundedDemo this query
trades_OHLC_15m;

This returns in milliseconds, since the database only has to respond with data, and not calculate anything - that has all been done efficiently, ahead of time.

What about for fewer rows?​

Let's try this calculation again, but just for one day instead of the entire 1.6 billion rows.

OHLC query for yesterdayDemo this query
SELECT
timestamp, symbol,
first(price) AS open,
max(price) as high,
min(price) as low,
last(price) AS close,
sum(amount) AS volume
FROM trades
WHERE timestamp IN yesterday()
SAMPLE BY 15m
ORDER BY timestamp, symbol;
timestampsymbolopenhighlowclosevolume
2025-03-30T00:00:00.000000ZADA-USD0.67320.67440.6710.6744132304.36510000005
2025-03-30T00:00:00.000000ZADA-USDC0.67270.6730.6710.672915614.750700000002
2025-03-30T00:00:00.000000ZADA-USDT0.67320.67440.6710.6744132304.36510000005
2025-03-30T00:00:00.000000ZAVAX-USD19.60219.63219.51819.6313741.162465999998
2025-03-30T00:00:00.000000ZAVAX-USDT19.60219.63219.51819.6313741.162465999998
2025-03-30T00:00:00.000000ZBTC-USD826508275082563.68274725.493136499999
.....................

Calculating the OHLC for a single day takes only 15ms.

We can get the same data using the materialized view:

OHLC materialized view for yesterdayDemo this query
trades_OHLC_15m
WHERE timestamp IN yesterday()
ORDER BY timestamp, symbol;
timestampsymbolopenhighlowclosevolume
2025-03-30T00:00:00.000000ZADA-USD0.67320.67440.6710.6744132304.36510000005
2025-03-30T00:00:00.000000ZADA-USDC0.67270.6730.6710.672915614.750700000002
2025-03-30T00:00:00.000000ZADA-USDT0.67320.67440.6710.6744132304.36510000005
2025-03-30T00:00:00.000000ZAVAX-USD19.60219.63219.51819.6313741.162465999998
2025-03-30T00:00:00.000000ZAVAX-USDT19.60219.63219.51819.6313741.162465999998
2025-03-30T00:00:00.000000ZBTC-USD826508275082563.68274725.493136499999
.....................

This returns the data in just 2ms over 7x faster - again, because it doesn't have to calculate anything. The data has already been efficiently pre-aggregated, cached by the materialized view, and persisted to disk. No aggregation is required, and hardly any rows are scanned!

So even for small amounts of data, a materialized view can be extremely useful.

Limitations​

Beta​

  • Not all SAMPLE BY syntax is supported, for example, FILL.
  • The INCREMENTAL refresh strategy relies on deduplicated inserts (O3 writes)
    • We will instead delete a time range and insert the data as an append, which is much faster.
    • This also means that currently, deduplication keys must be aligned across the base table and the view.

Post-release​

  • Only INCREMENTAL refresh is supported
    • We intend to add alternatives, such as:
      • PERIODIC (once per partition),
      • TIMER (once per time interval)
      • MANUAL (only when manually triggered)
  • INCREMENTAL refresh is only triggered by inserts into the base table, not join tables.

LATEST ON materialized views​

LATEST ON queries can have variable performance, based on how frequently the symbols in the PARTITION BY column have new entries written to the table. Infrequently updated symbols require scanning more data to find their last entry.

For example, pretend you have two symbols, A and B, and 100 million rows.

Then, there is one row with B, at the start of the data set, and the rest are A.

Unfortunately, the database will scan backwards and scan all 100 million rows of data, just to find the B entry.

But materialized views offer a solution to this performance issue too!

LATEST ON on demo tradesDemo this query
trades LATEST ON timestamp PARTITION BY symbol;
symbolsidepriceamounttimestamp
XLM-BTCsell0.000001635412024-08-21T16:56:15.038557Z
AVAX-BTCsell0.0003904410.1252024-08-21T18:00:24.549949Z
MATIC-BTCsell0.0000088622.62024-08-21T18:01:21.607212Z
ADA-BTCbuy0.00000621127.322024-08-21T18:05:37.852092Z
...............

This takes around 2s to execute. Now, let's see how much data needed to be scanned:

filtering for the time rangeDemo this query
SELECT min(timestamp), max(timestamp)
FROM trades
LATEST ON timestamp
PARTITION BY symbol;
minmax
2024-08-21T16:56:15.038557Z2025-03-31T12:55:28.193000Z

So the database scanned approximately 7 months of data to serve this query. How many rows was that?

number of rows the LATEST ON scannedDemo this query
SELECT count()
FROM trades
WHERE timestamp BETWEEN '2024-08-21T16:56:15.038557Z' AND '2025-03-31T12:55:28.193000Z';
count
766834703

Yes, ~767 million rows, just to serve the most recent 42 rows, one for each symbol.

Let's fix this using a new materialized view.

Observe that we have 42 unique symbols in the dataset.

If we were to take a LATEST ON query for a single day, we would therefore expect up to 84 rows (42 buys, 42 sells):

yesterday() LATEST ONDemo this query
(trades WHERE timestamp IN yesterday())
LATEST ON timestamp PARTITION BY symbol, side
ORDER BY symbol, side, timestamp;
symbolsidepriceamounttimestamp
ADA-USDbuy0.6611686.35572025-03-30T23:59:59.052000Z
ADA-USDsell0.6609270.89352025-03-30T23:59:46.585999Z
ADA-USDCbuy0.6603109.352025-03-30T23:57:56.194000Z
ADA-USDCsell0.6607755.97392025-03-30T23:59:35.635000Z
ADA-USDTbuy0.6611686.35572025-03-30T23:59:59.052000Z
ADA-USDTsell0.6609270.89352025-03-30T23:59:46.585999Z
AVAX-USDbuy18.8599.1998422025-03-30T23:59:47.788000Z
AVAX-USDsell18.8467.700862025-03-30T23:59:13.130000Z
AVAX-USDTbuy18.8599.1998422025-03-30T23:59:47.788000Z
AVAX-USDTsell18.8467.700862025-03-30T23:59:13.130000Z
BTC-USDbuy82398.20.0000252025-03-30T23:59:59.992000Z
BTC-USDsell82397.90.000018192025-03-30T23:59:59.796999Z
...............

This executes in 40ms.

A similar GROUP BY query looks like this:

LATEST ON as a GROUP BYDemo this query
SELECT
symbol,
side,
last(price) AS price,
last(amount) AS amount,
last(timestamp) AS timestamp
FROM trades
WHERE timestamp IN yesterday()
ORDER BY symbol, side, timestamp;

which executes in 8ms.

Instead of using the LATEST ON syntax, we can use a SAMPLE BY equivalent, which massively reduces the number of rows we need to query.

Then, we run this SAMPLE BY automatically using a materialized view, so we always have the fastest possible LATEST ON query.

Pre-aggregating the data​

We will pre-aggregate the ~767 million rows into just ~15000.

Instead of storing the raw data, we will store one row, per symbol, per side, per day of data.

down-sampling test queryDemo this query

SELECT timestamp, symbol, side, price, amount, "latest" as timestamp FROM (
SELECT timestamp,
symbol,
side,
last(price) AS price,
last(amount) AS amount,
last(timestamp) as latest
FROM trades
WHERE timestamp BETWEEN '2024-08-21T16:56:15.038557Z' AND '2025-03-31T12:55:28.193000Z'
SAMPLE BY 1d
) ORDER BY timestamp;

This result set comprises just 14595 rows, instead of ~767 million. That's 51000x fewer rows the database needs to scan to handle the query.

Here it is as a materialized view:

LATEST ON materialized view
CREATE MATERIALIZED VIEW 'trades_latest_1d' WITH BASE 'trades' REFRESH INCREMENTAL AS (
SELECT
timestamp,
symbol,
side,
last(price) AS price,
last(amount) AS amount,
last(timestamp) as latest
FROM trades
SAMPLE BY 1d
) PARTITION BY DAY;

You can try this view out on our demo:

trades_latest_1dDemo this query
trades_latest_1d;

Then, you can query this 'per-day LATEST ON' view to quickly calculate the 'true' LATEST ON result.

LATEST ON over the trades_latest_1dDemo this query
SELECT symbol, side, price, amount, "latest" as timestamp FROM (
trades_latest_1d
LATEST ON timestamp
PARTITION BY symbol, side
) ORDER BY timestamp;

And in just a few milliseconds, we get the result:

symbolsidepriceamounttimestamp
ETH-BTCsell0.021960.0059982025-03-31T14:24:18.916000Z
DAI-USDTsell1.0006532025-03-31T14:29:19.392999Z
DAI-USDsell1.0006532025-03-31T14:29:19.392999Z
DAI-USDbuy1.000729.7851062025-03-31T14:30:33.394000Z
DAI-USDTbuy1.000729.7851062025-03-31T14:30:33.394000Z
...............

Seconds down to milliseconds - 100x, even 1000x faster!

Architecture and internals​

The rest of this document contains information about how materialized views work internally.

Storage model​

Materialized views in QuestDB are implemented as special tables that maintain their data independently of their base tables. They use the same underlying storage engine as regular tables, benefiting from QuestDB's columnar storage and partitioning capabilities.

Refresh mechanism​

note

Currently, QuestDB only supports incremental refresh for materialized views.

Future releases will include additional refresh types, such as time-interval and manual refreshes.

Unlike regular views, which recompute their results at query time, materialized views in QuestDB are incrementally refreshed as new data is added to the base table. This approach ensures that only the relevant time slices of the view are updated, avoiding the need to recompute the entire dataset. The refresh process works as follows:

  1. New data is inserted into the base table.
  2. The time-range of this data is identified.
  3. This data is extracted and used to recompute the materialized view.

This refresh happens asynchronously, minimizing any impact on write performance. The refresh state of the materialized view is tracked using transaction numbers. The transaction numbers can be compared with the base table, for monitoring the 'refresh lag'.

For example, if a base table receives new rows for 2025-02-18, only that day's relevant time slices are recomputed, rather than reprocessing all historical data.

You can monitor refresh status using the materialized_views() system function:

Listing all materialized views
SELECT
view_name,
last_refresh_timestamp,
view_status,
base_table_txn,
applied_base_table_txn
FROM materialized_views();

Here is an example output:

view_namelast_refresh_timestampview_statusbase_table_txnapplied_base_table_txn
trades_viewnullvalid102102

When base_table_txn matches applied_base_table_txn, the materialized view is fully up-to-date.

Refreshing an invalid view​

If a materialized view becomes invalid, you can check its status:

Checking view status
SELECT
view_name,
base_table_name,
view_status,
invalidation_reason
FROM materialized_views();
view_namebase_table_nameview_statusinvalidation_reason
trades_viewtradesvalidnull
exchange_viewexchangeinvalid[-105] table does not exist [table=exchange]

To restore an invalid view, and refresh its data from scratch, use:

Restoring an invalid view
REFRESH MATERIALIZED VIEW view_name FULL;

This command deletes existing data in the materialized view, and re-runs its query.

Once the view is repopulated, the view is marked as 'valid' so that it can be incrementally refreshed.

For large base tables, a full refresh may take a significant amount of time. You can cancel the refresh using the CANCEL QUERY SQL.

For the conditions which can invalidate a materialized view, see the technical requirements section.

Base table relationship​

Every materialized view is tied to a base table that serves as its primary data source.

  • For single-table queries, the base table is automatically determined.
  • For multi-table queries, one table must be explicitly defined as the base table using WITH BASE.

The view is automatically refreshed when the base table is changed. Therefore, you should make sure the table that you wish to drive the view is defined correctly. If you use the wrong base table, then the view may not be refreshed at the times you expect.

Technical requirements​

Query constraints​

To create a materialized view, your query:

  • Must use either SAMPLE BY or GROUP BY with a designated timestamp column key.
  • Must not contain FROM-TO, FILL, and ALIGN TO FIRST OBSERVATION clauses in SAMPLE BY queries
  • Must use join conditions that are compatible with incremental refreshing.
  • When the base table has deduplication enabled, the non-aggregate columns selected by the materialized view query must be a subset of the DEDUP keys from the base table.

We intend to loosen some of these restrictions in future.

View invalidation​

The view's structure is tightly coupled with its base table.

The main cause of invalidation for a materialized view, is when the table schema or underlying data is modified.

These changes include dropping columns, dropping partitions and renaming the table.

Data deletion or modification, for example, using TRUNCATE or UPDATE, may also cause invalidation.

Replicated views (Enterprise only)​

Replication of the base table is independent of materialized view maintenance.

If you promote a replica to a new primary instance, this may trigger a full materialized view refresh in the case where the replica did not already have a fully up-to-date materialized view.

Resource management​

Materialized Views are compatible with the usual resource management systems:

  • View TTL settings are separate from the base table.
  • TTL deletions in the base table will not be propagated to the view.
  • Partitions are managed separately between the base table and the view.
  • Refresh intervals can be configured independently.

Materialized view with TTL​

Materialized Views take extra storage and resources to maintain. If your SAMPLE BY unit is small (seconds, milliseconds), this could be a significant amount of data.

Therefore, you can decide on a retention policy for the data, and set it using TTL:

Create a materialized view with a TTL policy
CREATE MATERIALIZED VIEW trades_hourly_prices AS (
SELECT
timestamp,
symbol,
avg(price) AS avg_price
FROM trades
SAMPLE BY 1h
) PARTITION BY WEEK TTL 8 WEEKS;

In this example, the view stores hourly summaries of the pricing data, in weekly partitions, keeping the prior 8 partitions.