Export or Convert Data to Parquet
There are three ways of converting or exporting data to Parquet: exporting query results as Parquet files via the /exp REST API endpoint, exporting queries or whole tables via the COPY command, and converting table partitions to Parquet in place.
Data Compression
By default, Parquet files generated by QuestDB are uncompressed. One of the key advantages of Parquet over QuestDB's native format is its built-in compression.
You can configure compression in server.conf with:
# zstd
cairo.partition.encoder.parquet.compression.codec=6
# level is from 1 to 22, 1 is fastest
cairo.partition.encoder.parquet.compression.level=10
You can override these defaults when exporting via COPY (see the COMPRESSION_CODEC example below).
Export queries as files
Exporting as files is currently available on a development branch: https://github.com/questdb/questdb/pull/6008. If you want to test this feature, you need to clone and compile that branch.
The code is functional, but it still lacks fuzz tests and documentation. We expect to include it in a release soon; in the meantime, for exporting it is safe to check out the development branch, compile it, and start QuestDB pointing to the resulting jar.
To export a query as a file, you can use either the /exp REST API endpoint or the COPY command.
Export query as file via REST
You can use the same parameters as when doing a CSV export, simply passing parquet as the fmt parameter value.
curl -G \
--data-urlencode "query=select * from market_data limit 3;" \
'http://localhost:9000/exp?fmt=parquet' > ~/tmp/exp.parquet
By default the Parquet file will not be compressed, but compression can be controlled with the server.conf variables shown above.
Exporting via the REST API is a synchronous process. For larger queries you might prefer to use the COPY method below.
Once exported, the file can be used from anywhere, including DuckDB, Pandas, or Polars. For example, to point DuckDB at the file exported above, start DuckDB and execute:
select * from read_parquet('~/tmp/exp.parquet');
Export query as files via COPY
If you prefer to export data via SQL, or if you want to export asynchronously, you can use the COPY command from the web console, from any pgwire-compliant client, or via the exec endpoint of the REST API.
You can export a query:
COPY (select * from market_data limit 3) TO 'market_data_parquet_table' WITH FORMAT PARQUET;
Or you can export a whole table:
COPY market_data TO 'market_data_parquet_table' WITH FORMAT PARQUET;
The output files (one per partition) will be under $QUESTDB_ROOT_FOLDER/export/$TO_TABLE_NAME/.
The COPY command returns immediately, and the export happens in the background. The command returns an export id string:

| id               |
| ---------------- |
| 45ba24e5ba338099 |
If you want to monitor the export process, you can issue a query like this:
SELECT * FROM 'sys.copy_export_log' WHERE id = '45ba24e5ba338099';
While it is running, the export can be cancelled with:
COPY '45ba24e5ba338099' CANCEL;
By default, the Parquet files will not be compressed unless the server-side configuration shown above is applied. However, you can override the compression individually for each export. For example:
COPY market_data TO 'market_data_parquet_table' WITH FORMAT PARQUET COMPRESSION_CODEC LZ4_RAW;
In-place conversion
At the moment, converting to Parquet in place is a work in progress and not recommended for production. We recommend caution, and taking a snapshot before starting any in-place data conversion.
When using in-place conversion, the partition(s) remain under QuestDB's control, and data can still be queried as if it were in native format.
It is recommended to use QuestDB 9.0.1 or higher, as some features, such as arrays in Parquet partitions, were not supported in earlier versions.
In its current state, in-place conversion of native partitions into Parquet has the following limitations:
- We have been testing Parquet support for months, and we haven't experienced data corruption or data loss, but this is not guaranteed. It is strongly advised to back up first.
- We have seen cases in which querying Parquet partitions leads to a database crash. This can happen if metadata in the table is different from metadata in the Parquet partitions, but it could also happen in other cases.
- While converting data, writes to the partitions remain blocked.
- After a partition has been converted to Parquet, it will not register any changes you send to it, nor respect any applicable TTL, unless you convert it back to native format.
- Schema changes are not supported.
- Some parallel queries are still not optimized for Parquet.
- There is no compression by default (but it can be enabled via the config values shown above).
For the reasons above, we recommend not using in-place conversion in production yet, unless you test extensively with the shape of the data and queries you will be running, and take frequent snapshots.
All these caveats should disappear in the next few months, when we announce it is ready for production.
Basics of In-place conversion
Converting partitions from native format to Parquet, or from Parquet back into native format, is done via ALTER TABLE. You need to pass a filter specifying the partitions to convert. The filter can be either a WHERE clause or a LIST, in the same way it is used for the DETACH command.
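For illustration, a LIST filter might look like the following sketch, which assumes the trades table used in the examples below and daily partitioning; the partition names depend on your partitioning scheme:
-- illustrative sketch: convert two specific daily partitions by name
ALTER TABLE trades CONVERT PARTITION TO PARQUET LIST '2025-08-01', '2025-08-02';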
The active (most recent) partition will never be converted into Parquet, even if it matches the filter.
Conversion is asynchronous and can take a while to finish, depending on the number of partitions, the partition size, the compression being used, and the disk performance and general load of the server.
To monitor how the conversion is going, you can issue a SHOW PARTITIONS command. Partitions in the Parquet format will have the isParquet column set to true and will show their size in the parquetFileSize column.
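For example, to inspect the partitions of the trades table used in the examples below:
SHOW PARTITIONS FROM trades;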
Converting from Native Format into Parquet
ALTER TABLE trades CONVERT PARTITION TO PARQUET WHERE timestamp < '2025-08-31';
From this moment on, any changes sent to the affected partitions will be discarded.
Converting from Parquet into Native Format
ALTER TABLE trades CONVERT PARTITION TO NATIVE WHERE timestamp < '2025-08-31';