Streaming market data from Arroyo into QuestDB


Introduction

Arroyo is a new stream processing engine that’s gained a lot of attention since its release — and especially after its recent acquisition by Cloudflare. Designed for low-latency, SQL-first stream processing, Arroyo is written in Rust and makes it easy to build streaming data pipelines without the complexity of alternative systems like Apache Flink or Spark.

It’s fast, lightweight, and expressive. And more importantly for us: it speaks SQL.

QuestDB is a high-performance time-series database built for SQL. If you’re using Arroyo for in-stream processing — such as enrichment, validation, or transformation — and need a sink that can power real-time analytics, QuestDB is a natural fit.

The problem: no native QuestDB connector

When I first looked into connecting Arroyo and QuestDB, I considered a few approaches:

Option 1: Kafka + Kafka Connect

Arroyo already supports Kafka sinks, and we have a Kafka Connect connector for QuestDB. Here’s how it might look:

{
  "name": "questdb-trades",
  "connector.class": "io.questdb.kafka.QuestDBSinkConnector",
  "tasks.max": "5",
  "topics": "trades",
  "table": "trades",
  "symbols": "symbol, side",
  "timestamp.field.name": "ts",
  "client.conf.string": "http::addr=localhost:9000;",
  "include.key": false,
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "value.converter.schemas.enable": true
}

This works, especially if you're already using Kafka. But it adds overhead — an extra system to manage, with added CPU and memory usage, increased latency, and a need to debug Kafka Connect itself if anything goes wrong. On top of that, it uses JSON encoding, which is not ideal for throughput.

Option 2: Postgres sink

Arroyo can also sink to PostgreSQL via Debezium, but that path requires Kafka as well. And once again the format is JSON, which is bulky, so the pipeline is still more complex than it needs to be.

The idea: use Arroyo's webhook + QuestDB’s ILP

Both Arroyo and QuestDB speak SQL. And QuestDB exposes an HTTP endpoint (/write) that accepts InfluxDB Line Protocol (ILP).

So I had this idea: could I use Arroyo’s webhook connector to send ILP-formatted data directly to QuestDB?

Turns out: yes! I later found a video from Micah Wylde (creator of Arroyo) where he used this exact approach to send data into InfluxDB. That confirmed my hunch — this could work with QuestDB too.

How ILP works

Here's an example of a line in ILP:

trades,symbol=BTC-USD,side=buy price=39269.98,amount=0.001 1646762637710419000

It consists of:

  • Table name: trades
  • Tags (symbols/strings): symbol=BTC-USD,side=buy
  • Fields: price=..., amount=...
  • Timestamp (optional, in nanoseconds): 1646762637710419000

You can send many lines in a single HTTP POST to /write, each separated by \n.
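
For instance, a single POST body carrying three trades would look like this (the values are just illustrative):

trades,symbol=BTC-USD,side=buy price=39269.98,amount=0.001 1646762637710419000
trades,symbol=ETH-USD,side=sell price=2615.54,amount=0.02 1646762637710420000
trades,symbol=BTC-USD,side=sell price=39270.10,amount=0.003 1646762637710421000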

Full working example

Let’s walk through a working example using Arroyo’s impulse connector to generate data, and its webhook connector to send data to QuestDB.

1. Create the table in QuestDB

CREATE TABLE trades (
  symbol SYMBOL,
  side SYMBOL,
  price DOUBLE,
  amount DOUBLE,
  ts TIMESTAMP
) TIMESTAMP(ts) PARTITION BY DAY;
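
If you want to sanity-check the table before wiring up Arroyo, you can run a quick query in the QuestDB web console at http://localhost:9000, for example:

-- List the columns QuestDB created, including the designated timestamp
SHOW COLUMNS FROM trades;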

2. Create the Arroyo webhook sink

CREATE TABLE questdb_sink (
  value TEXT
) WITH (
  connector = 'webhook',
  endpoint = 'http://localhost:9000/write',
  format = 'raw_string'
);

3. Generate data with Arroyo’s impulse connector

CREATE TABLE impulse WITH (
  connector = 'impulse',
  event_rate = '1'
);
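
The impulse source emits one synthetic event per tick at the configured event_rate, and each event becomes a row, which is why the 5-second tumbling window in the next step aggregates roughly five rows at a time. If you want to peek at the stream first, a preview query along these lines works; the counter and subtask_index column names are my assumption here, so check the Arroyo docs for your version:

-- Hypothetical preview: one row per generated event
SELECT counter, subtask_index
FROM impulse;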

4. Insert formatted ILP lines into QuestDB

INSERT INTO questdb_sink
SELECT
  ARRAY_TO_STRING(
    ARRAY_AGG(
      CONCAT(
        'trades,symbol=BTC-USD,side=buy ',
        'price=', RANDOM() * 30000 + 20000, ',',
        'amount=', RANDOM() * 0.01, ' ',
        CAST(to_timestamp_nanos(NOW()) AS BIGINT)
      )
    ),
    CHR(10)
  ) AS value
FROM impulse
GROUP BY TUMBLE(INTERVAL '5 SECONDS');

This inserts one batch of 5 records every 5 seconds into QuestDB — all done with SQL.
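
To confirm that rows are landing, you can query QuestDB directly. A couple of sanity checks (not part of the pipeline itself):

-- Most recent trades written by the pipeline
SELECT * FROM trades ORDER BY ts DESC LIMIT 10;

-- Ingestion rate per minute, sampled on the designated timestamp
SELECT ts, count() FROM trades SAMPLE BY 1m;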

Conclusion

Arroyo and QuestDB pair naturally — both are fast, SQL-first, and easy to integrate.

While there’s no native QuestDB sink in Arroyo today, the webhook connector + ILP over HTTP makes for a clean, dependency-free pipeline. No Kafka, no JSON, no UDFs.

If you're experimenting with Arroyo and want fast, scalable storage for your real-time streams, give this a try — and let us know how it goes!

Explore other integrations of QuestDB with third party tools or learn more about QuestDB at our documentation.
