
Ingesting L2 order-book data with multidimensional arrays

QuestDB is the world's fastest growing time-series database. Engineered for demanding workloads—from trading floors to stock exchanges—it delivers ultra-low latency, high ingestion throughput, and a multi-tier storage engine. It's open source and integrates with many tools and languages.

If you’ve ever worked with cryptocurrency or financial market tick data, you know that Level 2 (L2) data is critical for understanding market depth. Unlike Level 1 data, which only shows the best bid and ask prices, L2 data reveals more information about market liquidity and order flow with aggregated bid and ask volumes across multiple price levels. This data is crucial for traders, market makers, and quantitative analysts to make informed decisions based on supply and demand dynamics.
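As a toy illustration (made-up numbers), here are the top three levels of a book; level 1 is the best price on each side:

| level | bid_price | bid_size | ask_price | ask_size |
|-------|-----------|----------|-----------|----------|
| 1     | 118021.0  | 0.25     | 118022.0  | 0.10     |
| 2     | 118020.5  | 0.40     | 118023.0  | 0.35     |
| 3     | 118019.0  | 1.10     | 118024.5  | 0.80     |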

But ingesting, storing, and querying L2 data has historically been no easy feat. Efficiently representing multiple (price, size) pairs for bids and asks within a traditional database schema has been challenging, especially when using a relational database that isn't optimized for array or nested data.

Typically, to store something like an order book with many price-size pairs, you’d have to:

  • Create multiple rows per timestamp, one for each price level, leading to data explosion and slow queries.
  • Use JSON or other nested types (if supported), which often don’t perform well for analytical workloads.
  • Lose the compactness and speed benefits of storing all price and size data in a contiguous structure.

This schema design complexity and performance bottleneck make L2 data management cumbersome.

Recently, QuestDB (9.0.0+) added support for float64 arrays, which makes ingesting and querying L2 data much more efficient. Let's give it a spin. We'll modify the existing Cryptofeed tutorial to instead store L2 data as arrays.
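Concretely, an array column is declared with the DOUBLE[] type, so an entire book snapshot fits in a single row. Here's a sketch of the schema we'll end up with later in this tutorial (the ingestion client can also auto-create an equivalent table on first write):

Sketch: order book table with array columns
CREATE TABLE btc_usd_bitstamp (
  symbol SYMBOL,
  bid_prices DOUBLE[],
  bid_sizes DOUBLE[],
  ask_prices DOUBLE[],
  ask_sizes DOUBLE[],
  timestamp TIMESTAMP
) TIMESTAMP(timestamp) PARTITION BY DAY;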

Prerequisites

QuestDB supports the ARRAY type from version 9.0.0 onwards. You can find the latest release on the download page.

We will use the Docker container for our tests.

First, pull the latest QuestDB image and run it:

Start QuestDB Docker container
docker run -p 9000:9000 -d questdb/questdb:latest
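If you'd like to confirm the server is up before continuing, the REST API's /exec endpoint answers ad-hoc queries (this assumes the default port mapping above):

Check the server responds
curl -G http://localhost:9000/exec --data-urlencode "query=SELECT version();"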

Next, let’s set up our Python environment and install a version of the QuestDB Python client library with ARRAY support (3.0.0+).

We'll use uv here, but feel free to use venv or other package management tools.

Set up Python environment and install dependencies
uv venv --python=python3.12
source .venv/bin/activate
uv pip install cryptofeed numpy pandas
uv pip install -U "questdb[dataframe]"
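To double-check you got an ARRAY-capable client (3.0.0+):

Verify the installed client version
uv pip show questdb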

A quick integration test

Before ingesting real data (which piles up quickly), let's verify that we can get ingestion working with hardcoded data.

Create a test.py file and paste the following code:

Integration test with artificial data
import pandas as pd
import numpy as np
from questdb.ingress import Sender

df = pd.DataFrame({
    'symbol': pd.Categorical(['ETH-USD', 'BTC-USD']),
    'side': pd.Categorical(['sell', 'sell']),
    'price': [2615.54, 39269.98],
    'amount': [0.00044, 0.001],
    # NumPy float64 arrays are supported from v3.0.0rc1 onwards.
    'ord_book_bids': [
        np.array([2615.54, 2618.63]),
        np.array([39269.98, 39270.00])
    ],
    'timestamp': pd.to_datetime(['2021-01-01', '2021-01-02'])
})

conf = 'http::addr=localhost:9000;'
with Sender.from_conf(conf) as sender:
    sender.dataframe(df, table_name='trades', at='timestamp')

This code creates some artificial data and sends it to the database, as a quick integration test.

Now run python test.py and navigate to localhost:9000. Here you will find QuestDB's web console.
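The client auto-creates the trades table on first write. From the console, you can confirm that ord_book_bids landed as an array column:

Inspect the auto-created schema
SHOW COLUMNS FROM trades;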

A quick select should reveal the artificial data that we sent:

Selecting artificial trading data

Look good? Onwards!

Ingesting real data

Next, let’s connect to a real exchange and start ingesting data.

We'll use Bitstamp to grab L2 data, since it has a public, unauthenticated endpoint. You can use Binance or Coinbase instead, if you have an API key.

To start, create ingest.py with the following code:

Complete L2 data ingestion script
#!/usr/bin/env python3
import asyncio

import numpy as np
import pandas as pd
from cryptofeed import FeedHandler
from cryptofeed.defines import L2_BOOK
from cryptofeed.exchanges import Bitstamp
from cryptofeed.types import OrderBook
from questdb.ingress import Sender

CONF = 'http::addr=localhost:9000;'

# Shared in-memory buffer and lock
_buffer: list[dict] = []
_lock = asyncio.Lock()
_flusher_started = False


async def flush_buffer():
    # Periodically drain the buffer and send it to QuestDB as a dataframe
    while True:
        await asyncio.sleep(5.0)
        async with _lock:
            if not _buffer:
                continue
            df = pd.DataFrame(_buffer)
            with Sender.from_conf(CONF) as sender:
                sender.dataframe(
                    df,
                    table_name='btc_usd_bitstamp',
                    at='timestamp'
                )
            _buffer.clear()


async def handle_book(ob: OrderBook, receipt_ts: float):
    # Start the flusher lazily, once we're inside the running event loop
    global _flusher_started
    if not _flusher_started:
        _flusher_started = True
        asyncio.get_running_loop().create_task(flush_buffer())

    # grab the book's price -> size mappings for each side
    bids_dict = getattr(ob, 'bids', getattr(ob.book, 'bids', {}))
    asks_dict = getattr(ob, 'asks', getattr(ob.book, 'asks', {}))

    # build separate price and size arrays
    bid_prices = np.array([float(p) for p in bids_dict], dtype=np.float64)
    bid_sizes = np.array([float(bids_dict[p]) for p in bids_dict], dtype=np.float64)
    ask_prices = np.array([float(p) for p in asks_dict], dtype=np.float64)
    ask_sizes = np.array([float(asks_dict[p]) for p in asks_dict], dtype=np.float64)

    async with _lock:
        _buffer.append({
            'symbol': ob.symbol,
            'bid_prices': bid_prices,
            'bid_sizes': bid_sizes,
            'ask_prices': ask_prices,
            'ask_sizes': ask_sizes,
            'timestamp': pd.to_datetime(int(receipt_ts * 1e9), unit='ns'),
        })


def main():
    fh = FeedHandler()
    fh.add_feed(Bitstamp(
        symbols=['BTC-USD'],
        channels=[L2_BOOK],
        callbacks={L2_BOOK: handle_book}
    ))
    fh.run()


if __name__ == '__main__':
    main()

Let’s break down this code and explore how it works.

main()

Main function to set up Cryptofeed handler
def main():
    fh = FeedHandler()
    fh.add_feed(Bitstamp(
        symbols=['BTC-USD'],
        channels=[L2_BOOK],
        callbacks={L2_BOOK: handle_book}
    ))
    fh.run()

The main() function is straightforward. We're instantiating a Cryptofeed handler, connecting to Bitstamp, and subscribing to the L2_BOOK channel with handle_book as our custom callback.
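If you want more than one instrument, the same callback can serve several symbols, since each buffered row records its own symbol. A sketch, swapping the add_feed call in ingest.py (symbol availability depends on the exchange):

Subscribe to multiple symbols
fh.add_feed(Bitstamp(
    symbols=['BTC-USD', 'ETH-USD'],
    channels=[L2_BOOK],
    callbacks={L2_BOOK: handle_book}
))

Note that the script writes everything to a single btc_usd_bitstamp table; with multiple symbols you may want a more general table name.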

handle_book

Extract and convert order book data to arrays
# grab the book's price -> size mappings for each side
bids_dict = getattr(ob, 'bids', getattr(ob.book, 'bids', {}))
asks_dict = getattr(ob, 'asks', getattr(ob.book, 'asks', {}))
# build separate price and size arrays
bid_prices = np.array([float(p) for p in bids_dict], dtype=np.float64)
bid_sizes = np.array([float(bids_dict[p]) for p in bids_dict], dtype=np.float64)
ask_prices = np.array([float(p) for p in asks_dict], dtype=np.float64)
ask_sizes = np.array([float(asks_dict[p]) for p in asks_dict], dtype=np.float64)

In the handle_book function, we unpack the structure Bitstamp sends L2 data in and convert it into a format the QuestDB client library currently supports.

If we look at Cryptofeed's Bitstamp L2 code, we see that each side of the book (bid and ask) arrives as a mapping of price to size.

We separate those (price, size) pairs into a price array and a size array per side, and coerce the values to np.float64.
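As a standalone illustration of that transformation (made-up numbers; real feeds typically deliver prices and sizes as Decimal values):

Dict-to-arrays transformation, illustrated
import numpy as np

# price -> size mapping, as delivered per side of the book
bids_dict = {'118021.0': '0.25', '118020.5': '0.40'}

bid_prices = np.array([float(p) for p in bids_dict], dtype=np.float64)
bid_sizes = np.array([float(bids_dict[p]) for p in bids_dict], dtype=np.float64)

print(bid_prices)  # [118021.  118020.5]
print(bid_sizes)   # [0.25 0.4 ]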

flush_buffer

Buffer flush function to send data to QuestDB
async def flush_buffer():
    while True:
        await asyncio.sleep(5.0)
        async with _lock:
            if not _buffer:
                continue
            df = pd.DataFrame(_buffer)
            with Sender.from_conf(CONF) as sender:
                sender.dataframe(
                    df,
                    table_name='btc_usd_bitstamp',
                    at='timestamp'
                )
            _buffer.clear()

In the flush_buffer function, we flush the buffered rows to QuestDB every 5 seconds (configurable) as a dataframe, targeting the btc_usd_bitstamp table. Batching like this turns many small book updates into one HTTP request per interval, which is far cheaper than a request per update.

Now let's try it out!

Querying L2 data

Before proceeding, make sure you run the above ingestion script:

Run the ingestion script
python ingest.py

Next, we will query the database and view our data.


We can perform a quick sanity check and see what was written:

View all ingested data
SELECT * FROM btc_usd_bitstamp;
Selecting real bitcoin data

Now let's run through some example queries:

Selecting the best bids and asks

TIP

When accessing values in an array, QuestDB uses 1-based indexing, following standard SQL conventions.

Get best bid and ask prices
SELECT
  timestamp,
  bid_prices[1] AS best_bid,
  ask_prices[1] AS best_ask
FROM btc_usd_bitstamp
ORDER BY timestamp DESC
LIMIT 10;
| timestamp                   | best_bid | best_ask |
|-----------------------------|----------|----------|
| 2025-07-31T15:12:36.504695Z | 118027.0 | 118028.0 |
| 2025-07-31T15:12:36.188858Z | 118026.0 | 118028.0 |
| 2025-07-31T15:12:35.878437Z | 118021.0 | 118024.0 |
| 2025-07-31T15:12:35.555968Z | 118022.0 | 118024.0 |
| 2025-07-31T15:12:35.247727Z | 118020.0 | 118022.0 |
| 2025-07-31T15:12:34.920907Z | 118021.0 | 118022.0 |
| 2025-07-31T15:12:34.606257Z | 118021.0 | 118022.0 |
| 2025-07-31T15:12:34.285651Z | 118021.0 | 118022.0 |
| 2025-07-31T15:12:33.969207Z | 118021.0 | 118022.0 |
| 2025-07-31T15:12:33.665459Z | 118021.0 | 118022.0 |
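Array elements compose with ordinary SQL arithmetic, so deriving the mid price from the best levels is a one-liner (a sketch following the same pattern):

Derive the mid price from the best levels
SELECT
  timestamp,
  (bid_prices[1] + ask_prices[1]) / 2 AS mid_price
FROM btc_usd_bitstamp
ORDER BY timestamp DESC
LIMIT 10;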

Selecting the average of the best bids and asks over 30-second intervals

Calculate average best bid and ask over 30-second intervals
SELECT
  avg(bid_prices[1]) AS avg_best_bid,
  avg(ask_prices[1]) AS avg_best_ask
FROM btc_usd_bitstamp
SAMPLE BY 30s;
| avg_best_bid       | avg_best_ask       |
|--------------------|--------------------|
| 117885.88421052632 | 117887.02105263158 |
| 118001.17021276595 | 118005.25531914894 |
| 118039.58947368422 | 118041.05263157895 |
| 118037.41935483871 | 118038.64516129032 |
| 118077.14814814815 | 118079.46296296296 |

Calculating the latest spread

The spread represents the difference between the price you'd pay to buy a specific volume (using ask prices) and the price you'd receive to sell the same volume (using bid prices). This is a key measure of market liquidity and trading costs.

Here we calculate the spread for a volume of 18 BTC by using QuestDB's l2price() function, which computes the average execution price across multiple order book levels to fill the target volume:

Calculate spread for 18 BTC volume using l2price function
DECLARE
  @volume := 18,
  @asks := l2price(@volume, ask_sizes, ask_prices),
  @bids := l2price(@volume, bid_sizes, bid_prices),
  @spread := spread(@bids, @asks)
SELECT
  timestamp,
  @spread AS spread
FROM btc_usd_bitstamp
LIMIT -1;
| timestamp                   | spread             |
|-----------------------------|--------------------|
| 2025-07-31T15:14:07.092943Z | 189.75827528552327 |

Finding the price level for a given volume

This query uses array_cum_sum to compute a cumulative sum of the ask sizes. It then uses the insertion_point function to find the first level at which the cumulative volume reaches our target. There's a whole set of array-specific functions to experiment with.

Find price level for target volume using cumulative sum
SELECT
  timestamp,
  array_cum_sum(ask_sizes) AS cum_volumes,
  insertion_point(cum_volumes, 18, true) AS target_level
FROM btc_usd_bitstamp
LIMIT -5;
| timestamp                   | cum_volumes               | target_level |
|-----------------------------|---------------------------|--------------|
| 2025-07-31T15:14:30.890342Z | ARRAY[0.12274851,0.18...] | 49           |
| 2025-07-31T15:14:31.193497Z | ARRAY[0.12274851,0.29...] | 49           |
| 2025-07-31T15:14:31.512645Z | ARRAY[0.12274851,0.17...] | 52           |
| 2025-07-31T15:14:31.833741Z | ARRAY[0.12274851,0.17...] | 51           |
| 2025-07-31T15:14:32.145116Z | ARRAY[0.12188851,0.25...] | 51           |
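Reusing the alias one step further, you can also read off the price at that level. This is a sketch building on the query above; it assumes array indexing accepts a computed position:

Read the price at the target level
SELECT
  timestamp,
  array_cum_sum(ask_sizes) AS cum_volumes,
  insertion_point(cum_volumes, 18, true) AS target_level,
  ask_prices[target_level] AS target_price
FROM btc_usd_bitstamp
LIMIT -5;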

Alternate order book layouts

So far, we have used several 1D arrays to store our order book data. However, there are alternate data models available.

For example, the QuestDB demo has a live (simulated) FX dataset storing bids and asks as bi-dimensional arrays, where bids[1] and asks[1] hold the prices at multiple levels, while bids[2] and asks[2] hold the sizes at the same levels.

You can run this query to find the spread in basis points on the demo dataset.

Spread in Basis Points
SELECT
  timestamp,
  symbol,
  spread_bps(bids[1][1], asks[1][1]) AS spread_bps
FROM market_data
WHERE symbol IN ('GBPUSD', 'EURUSD')
LIMIT -10;
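The Python client can ingest that layout too: NumPy float64 arrays with more than one dimension are sent as multidimensional ARRAY columns. A minimal sketch, assuming an illustrative fx_test table and a (2 × levels) layout like the demo's, with made-up numbers:

Ingesting a bi-dimensional order book array (sketch)
import numpy as np
import pandas as pd
from questdb.ingress import Sender

df = pd.DataFrame({
    'symbol': pd.Categorical(['GBPUSD']),
    # row 1: prices per level; row 2: sizes at the same levels
    'bids': [np.array([[1.2701, 1.2700, 1.2699],
                       [1_000_000.0, 2_500_000.0, 4_000_000.0]],
                      dtype=np.float64)],
    'timestamp': pd.to_datetime(['2025-07-31T15:12:36']),
})

with Sender.from_conf('http::addr=localhost:9000;') as sender:
    sender.dataframe(df, table_name='fx_test', at='timestamp')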

Wrapping up

In this tutorial, we modified our Cryptofeed example to send L2 book data using arrays.

Instead of using many rows or many columns to represent each (price, size) pair, we can instead use arrays to ingest and query L2 data compactly. This makes it easier to store and later analyze for trends, with lower overall overhead.

We've also shown alternate models for order books, for example as bi-dimensional arrays.
