Export or Convert Data to Parquet

There are three ways of converting or exporting data to Parquet: exporting query results via the /exp REST API endpoint, exporting via the COPY SQL command, and converting partitions in place. Each is described below.

Data Compression

By default, Parquet files generated by QuestDB are uncompressed. One of the key advantages of Parquet over QuestDB’s native format is its built-in compression.

You can configure compression in server.conf with:

    # compression codec: 6 = ZSTD
    cairo.partition.encoder.parquet.compression.codec=6
    # compression level: from 1 (fastest) to 22
    cairo.partition.encoder.parquet.compression.level=10

You can override these defaults on a per-export basis when using COPY, as shown later on this page.

Export queries as files

warning

Exporting as files is currently only available on a development branch: https://github.com/questdb/questdb/pull/6008. If you want to test this feature, you need to clone and compile that branch.

The code is functional, but it still lacks fuzz tests and documentation. It should be included in a release soon; in the meantime, for exporting it is safe to check out the development branch, compile it, and start QuestDB pointing to the resulting jar.

To export a query as a file, you can use either the /exp REST API endpoint or the COPY command.

Export query as file via REST

You can use the same parameters as when doing a CSV export, only passing parquet as the fmt parameter value.

    curl -G \
        --data-urlencode "query=select * from market_data limit 3;" \
        'http://localhost:9000/exp?fmt=parquet' > ~/tmp/exp.parquet

By default the Parquet file will not be compressed; this can be controlled with the server.conf settings shown above.

note

Exporting via the REST API is a synchronous process. For larger queries you might prefer to use the COPY method described below.

Once exported, the file can be used anywhere Parquet is supported, including DuckDB, Pandas, or Polars. For example, to point DuckDB at the file exported above, you could start DuckDB and execute:

    select * from read_parquet('~/tmp/exp.parquet');

Export query as files via COPY

If you prefer to export data via SQL, or if you want to export asynchronously, you can use the COPY command from the web console, from any pgwire-compliant client, or via the /exec endpoint of the REST API.

You can export a query:

    COPY (select * from market_data limit 3) TO 'market_data_parquet_table' WITH FORMAT PARQUET;

Or you can export a whole table:

    COPY market_data TO 'market_data_parquet_table' WITH FORMAT PARQUET;
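
As noted above, COPY can also be submitted through the REST API. A minimal sketch, assuming QuestDB's REST interface is listening on the default port 9000, sending the whole-table export above through the /exec endpoint:

    curl -G \
        --data-urlencode "query=COPY market_data TO 'market_data_parquet_table' WITH FORMAT PARQUET;" \
        'http://localhost:9000/exec'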

The output files (one per partition) will be under $QUESTDB_ROOT_FOLDER/export/$TO_TABLE_NAME/, where $QUESTDB_ROOT_FOLDER stands for the QuestDB root directory and $TO_TABLE_NAME for the target name given in the COPY statement.
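
For instance, with the whole-table export above, you could list the generated files, substituting your actual root directory for the $QUESTDB_ROOT_FOLDER placeholder:

    ls "$QUESTDB_ROOT_FOLDER/export/market_data_parquet_table/"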

The COPY command returns immediately with an export id string, while the export itself runs in the background:

    id
    45ba24e5ba338099

If you want to monitor the export process, you can issue a call like this:

    SELECT * FROM 'sys.copy_export_log' WHERE id = '45ba24e5ba338099';

While it is running, the export can be cancelled with:

    COPY '45ba24e5ba338099' CANCEL;

By default, the Parquet files will not be compressed unless the server-side configuration shown above is applied. However, you can override the compression individually for each export. For example:

    COPY market_data TO 'market_data_parquet_table' WITH FORMAT PARQUET COMPRESSION_CODEC LZ4_RAW;

In-place conversion

warning

At the moment, in-place conversion to Parquet is a work in progress and not recommended for production. Proceed with caution and take a snapshot before starting any in-place data conversion.

When using in-place conversion, the partition(s) remain under QuestDB's control, and data can still be queried as if it were in native format.

note

It is recommended to use QuestDB 9.0.1 or higher, as some features, like arrays on Parquet partitions, were not supported in previous versions.

In its current state, in-place conversion of native partitions into Parquet has the following limitations:

  • We have been testing Parquet support for months without experiencing data corruption or data loss, but this is not guaranteed. It is strongly advised to back up first.
  • We have seen cases in which querying Parquet partitions leads to a database crash. This can happen if the table metadata differs from the metadata in the Parquet partitions, but it could also happen in other cases.
  • While data is being converted, writes to the affected partitions remain blocked.
  • After a partition has been converted to Parquet, it will no longer register any changes you send to it, including any applicable TTL, until you convert it back to native format.
  • Schema changes are not supported.
  • Some parallel queries are not yet optimized for Parquet.
  • There is no compression by default (but it can be enabled via the config settings shown above).

For the reasons above, we recommend not using in-place conversion in production yet, unless you test extensively with the shape of the data and queries you will be running, and take frequent snapshots.

All these caveats should disappear in the next few months, at which point we will announce that the feature is ready for production.

Basics of In-place conversion

Converting partitions from native format to Parquet, or from Parquet back to native format, is done via ALTER TABLE CONVERT PARTITION. You need to pass a filter specifying which partitions to convert. The filter can be either a WHERE clause or a LIST of partitions, in the same way as for the DETACH PARTITION command.

tip

The active (most recent) partition will never be converted into Parquet, even if it matches the filter.
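
As an illustrative sketch of the LIST form, assuming a table partitioned by day and that the syntax mirrors DETACH PARTITION LIST, specific partitions can be named explicitly (the WHERE form is shown in the sections below):

    ALTER TABLE trades CONVERT PARTITION TO PARQUET LIST '2025-08-01', '2025-08-02';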

Conversion is asynchronous and can take a while to finish, depending on the number of partitions, the partition size, the compression being used, and the disk performance and overall load of the server.

To monitor the progress of the conversion, you can issue a SHOW PARTITIONS command. Partitions in the Parquet format will have the isParquet column set to true and will show their size in the parquetFileSize column.
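
For example, using the trades table from the statements below:

    SHOW PARTITIONS FROM trades;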

Converting from Native Format into Parquet

    ALTER TABLE trades CONVERT PARTITION TO PARQUET WHERE timestamp < '2025-08-31';

From this moment on, any changes sent to the affected partitions will be discarded.

Converting from Parquet into Native Format

    ALTER TABLE trades CONVERT PARTITION TO NATIVE WHERE timestamp < '2025-08-31';