QuestDB 8.1.0 - Parquet, smarter snapshots, improved SAMPLE BY, and more
QuestDB 8.1.0 has arrived.
Our prior release introduced JSON extraction.
And now, by popular demand, QuestDB meets Apache Parquet.
In this release, QuestDB's foundation takes another step towards our next-generation architecture. This article will break down three key features, then itemize performance improvements and bugs.
For the full notes including PR links, checkout GitHub.
Parquet read support
Headlining this release is read support for Apache Parquet.
With Parquet support, you can have:
- timeseries data within Questdb
- master tables on Parquet files
And then join them with a single query.
To do so, use the read_parquet()
function to read
Apache Parquet files.
WITH parsed_data AS (SELECTtimestamp,json_extract(metadata, '$.location') AS locationFROMread_parquet('financial_demo.parquet')ORDER BYtimestamp) timestamp(timestamp)SELECTtimestamp,location,count()FROMparsed_dataSAMPLE BY1dFILL(null);
The query above:
- Reads a Parquet file named
financial_demo.parquet
- Extracts the timestamp and location from the JSON metadata column
- Orders the data by timestamp
- Counts the occurrences of each location for each day
- Additionally, if there are any days with no data, those days will be included in the result set with null values
For more information, check out the documentation.
Both JSON support and Parquet reading are new as of the past two weeks. We plan on continuing this pace of innovation to improve and polish these in-demand features. We'll need your help! If it fits your use-case, we'd love to hear from you.
Please note that Parquet support is currently in beta.
Improved SAMPLE BY
Adds new FROM-TO
syntax. This syntax is for specifying the output range for a
query. In doing so, it enables "fill" data on either end of a result set. This
also defines a lower bound for the data.
SAMPLE BY '5d' FROM '2018-01-01' TO '2019-01-01'
In simple terms, WHERE defines what data will be sampled, and FROM defines what
shape the output data will have. The changes also adds more parallization to
SAMPLE BY, with both FILL(VALUE)
and FILL(NULL)
.
For more information, check out the thorough write-up in the documentation.
Snapshot recovery
Previously, snapshot recovery was driven by the Cairo Snapshot Instance ID property, which had no default value and was generally confusing to use. Snapshot recovery is now triggered by the existence of a trigger file in the QuestDB root directory.
Now, if a file named root/_restore
exists and a snapshot is available, it will
be restored on startup. The old strategy with Cairo Snapshot Instance IDs
remains for backwards compatibility but is no longer recommended.
Additionally, a failure during snapshot recovery aborts QuestDB start. A failure during recovery means the system state is undefined and should not be relied on.
For more information, check out the documentation.
Performance 🚀
In addition, we've got the usual set of performance improvements.
- Improved ILP ingress latency
- SQL Performance:
- Speedup for small transaction writing by 50-100%.
- Enhanced performance for the
like/ilike
operator on symbol columns. - Speed improvements for
regexp_replace(varchar)
with simple patterns.
- Aggregation queries: Aggregation queries are now faster.
Bug Fixes 🐛
And naturally, bugs too.
-
SQL:
- Fixed spurious "unsupported operation" errors.
- Corrected wrong results from parallel
WHERE
andGROUP BY
for some function keys. - Fixed incorrect results from parallel
GROUP BY
with a single varchar function key. - Addressed issues with some window functions double-counting rows.
-
Core:
- Prevented table suspension under memory pressure.
- Fixed Charts when using together with OAuth.
Summary
Our past two releases came in quick succession. We're eager to get feedback and continue apace. If possible, please upgrade at your earliest convenience and let us know your thoughts. For questions and comments, swing by our Community Forum or our public Slack.
We'd love to hear from you.