Create a sample database

This guide walks you through creating a sample dataset.

It utilizes rnd_ functions and basic SQL grammar to generate 'mock' data of specific types.

For most applications, you will import your data using methods like the InfluxDB Line Protocol, CSV imports, or integration with third-party tools such as Telegraf, Kafka, or Prometheus. If your interest lies in data ingestion rather than generation, refer to our ingestion overview. Alternatively, the QuestDB demo instance offers a practical way to explore data creation and manipulation without setting up your dataset.

All that said, in this tutorial you will learn how to:

  1. Create tables
  2. Populate tables with sample data
  3. Run simple and advanced queries
  4. Delete tables

Before we begin...

All commands are run through the Web Console accessible at http://localhost:9000.

You can also run the same SQL via the Postgres endpoint or the REST API.

If QuestDB is not running locally, check out the quick start.

Creating a table

With QuestDB running, the first step is to create a table.

We'll start with one representing financial market data. Then, in the insert section, we'll create another pair of tables representing price quotes and the instruments they refer to.

Let's start by creating the trades table:

CREATE TABLE trades (
    timestamp TIMESTAMP,
    symbol SYMBOL,
    side SYMBOL,
    price DOUBLE,
    amount DOUBLE
) TIMESTAMP(timestamp) PARTITION BY DAY
DEDUP UPSERT KEYS(timestamp, symbol);

This is a basic yet robust table. It uses SYMBOL columns for the ticker and side, DOUBLE columns for price and amount, and a designated timestamp. It's partitioned by day and deduplicated on the timestamp and symbol columns. There's a lot to unpack in this table, so feel free to learn more about the nuances.

We've done all of this to match the way we'll query the data. We're focused on the flow of the market and its day-to-day pulse, hence the daily partitioning. We're also wary of duplicates, for accuracy of data, so deduplication ensures that a row arriving with the same timestamp and symbol updates the existing row rather than creating a duplicate. Timestamps are essential for time-series analysis.
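
To see what that deduplication means in practice, here is a minimal sketch with made-up values (you don't need to run it). Two inserts sharing the same timestamp and symbol collapse into one row, while a different symbol at the same timestamp remains a separate row:

INSERT INTO trades VALUES ('2024-01-01T00:00:00.000000Z', 'BTC-USD', 'buy', 42000.0, 1.5);
-- Same timestamp and symbol: the earlier row is upserted (replaced), not duplicated
INSERT INTO trades VALUES ('2024-01-01T00:00:00.000000Z', 'BTC-USD', 'sell', 42100.0, 2.0);
-- Same timestamp but a different symbol: this becomes a new row
INSERT INTO trades VALUES ('2024-01-01T00:00:00.000000Z', 'ETH-USD', 'buy', 2250.0, 10.0);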

Now let's proceed to INSERT.

Inserting data

Financial market data

Let's populate our trades table with procedurally-generated data:

Insert as SELECT
INSERT INTO trades
SELECT
    timestamp_sequence('2024-01-01T00:00:00', 60000L * x) timestamp, -- Timestamps starting Jan 1, 2024, with gaps that widen by 60ms per row
    rnd_str('ETH-USD', 'BTC-USD', 'SOL-USD', 'LTC-USD', 'UNI-USD') symbol, -- Random ticker symbols
    rnd_str('buy', 'sell') side, -- Random side (buy or sell)
    rnd_double() * 1000 + 100 price, -- Random price between 100.0 and 1100.0
    rnd_double() * 2000 + 0.1 amount -- Random amount between 0.1 and 2000.1
FROM long_sequence(10000) x;

Our trades table now contains 10,000 randomly-generated trades. The comments indicate how we've structured the random data: a handful of currency pairs, a buy or sell side, and timestamps that start on January 1, 2024 and grow gradually further apart. We've dictated the overall number of rows generated via long_sequence(10000). We can bump that up, if we want.

We've also generated the timestamps on a simple, regular pattern, even though in reality trades in these instruments arrive far more frequently and irregularly. This helps keep our basic examples basic.
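
If you'd prefer evenly spaced rows, you can pass a constant step to timestamp_sequence instead of one that scales with x. A minimal variant, assuming the same trades schema (the step is in microseconds, so 60000000L is one minute; running it would add another 10,000 rows):

INSERT INTO trades
SELECT
    timestamp_sequence('2024-01-01T00:00:00', 60000000L) timestamp, -- fixed one-minute step
    rnd_str('ETH-USD', 'BTC-USD', 'SOL-USD', 'LTC-USD', 'UNI-USD') symbol,
    rnd_str('buy', 'sell') side,
    rnd_double() * 1000 + 100 price,
    rnd_double() * 2000 + 0.1 amount
FROM long_sequence(10000) x;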

Now let's look at the table and its data:

'trades';

It will look similar to this, albeit with different randomized values.

timestamp | symbol | side | price | amount
2024-01-01T00:00:00.000000Z | BTC-USD | sell | 483.904143675277 | 139.449481016294
2024-01-01T00:00:00.060000Z | ETH-USD | sell | 920.296245196274 | 920.296245196274
2024-01-01T00:00:00.180000Z | UNI-USD | sell | 643.277468441839 | 643.277468441839
2024-01-01T00:00:00.360000Z | LTC-USD | buy | 218.092076885972 | 9.81119178972
2024-01-01T00:00:00.600000Z | BTC-USD | sell | 157.596416931116 | 691.081778396176

That's some fake market data. Let's create more tables to demonstrate joins.

Quotes and instruments

This next example will create and populate two more tables. One table will contain price quotes, and the other will contain instrument metadata. In both cases, we will create the table and generate the data at the same time.

This combines the CREATE & SELECT operations to perform a create-and-insert:

Create table as, quotes
CREATE TABLE quotes
AS (
    SELECT
        x ID,
        timestamp_sequence(to_timestamp('2019-10-17T00:00:00', 'yyyy-MM-ddTHH:mm:ss'), rnd_long(1,10,0) * 100000L) ts,
        rnd_double(0)*80 + 100 price,
        rnd_long(0, 10000, 0) instrument_id
    FROM long_sequence(10000000) x)
TIMESTAMP(ts)
PARTITION BY MONTH DEDUP UPSERT KEYS(ts);

For this table, note the following key points:

  • TIMESTAMP(ts) elects the ts column as a designated timestamp for partitioning over time.
  • PARTITION BY MONTH creates a monthly partition, where the stored data is effectively sharded by month.
  • DEDUP UPSERT KEYS(ts) deduplicates rows on the ts timestamp column.

The generated data will look like the following:

ID | ts | price | instrument_id
1 | 2019-10-17T00:00:00.000000Z | 145.37 | 9160
2 | 2019-10-17T00:00:00.600000Z | 162.91 | 9671
3 | 2019-10-17T00:00:01.400000Z | 128.58 | 8731
4 | 2019-10-17T00:00:01.500000Z | 131.69 | 3447
5 | 2019-10-17T00:00:01.600000Z | 155.68 | 7985
... | ... | ... | ...
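
Since quotes is partitioned by month, you can also peek at the partitions QuestDB created for the generated time range. An optional check, assuming a QuestDB version that supports SHOW PARTITIONS:

SHOW PARTITIONS FROM quotes;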

Nice! Next up is the instruments table, which contains the instruments themselves and their details:

Create table as, instruments
CREATE TABLE instruments
AS (
    SELECT
        x ID, -- Increasing integer
        rnd_str('NYSE', 'NASDAQ', 'LSE', 'TSE', 'HKEX') exchange, -- Random exchange
        rnd_str('Tech', 'Finance', 'Energy', 'Healthcare', 'Consumer') sector -- Random sector
    FROM long_sequence(10000) x);

Note that we've not included a timestamp in this instruments table. It's one of the rare cases where we skip the designated timestamp, and therefore forgo time-series optimizations. Since the paired quotes table does have one, the two tables together are a good way to demonstrate joins.
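
To get a quick feel for the generated metadata, you can sample a few rows (LIMIT caps the number of rows returned):

SELECT * FROM instruments LIMIT 5;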

With these two new tables, and our prior financial market data table, we've got a lot of useful queries we can test.

Running queries

Our financial market data table is a great place to test various aggregate functions, to compute price over time intervals, and similar analysis.
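
For example, here's a sketch of an hourly aggregation over the trades table we created earlier, computing the average price and total amount per symbol per hour:

SELECT timestamp, symbol, avg(price), sum(amount)
FROM trades
SAMPLE BY 1h;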

Let's expand on the quotes and instruments tables.

First, let's look at quotes, running our shorthand for SELECT * FROM quotes;:

quotes;

Let's then select the count of records from quotes:

SELECT count() FROM quotes;

count
10,000,000

And then the average price:

SELECT avg(price) FROM quotes;

average
139.99217780895

We can now use the instruments table alongside the quotes table to get more interesting results using a JOIN:

SELECT *
FROM quotes
JOIN (
    SELECT ID inst_id, exchange, sector
    FROM instruments)
ON quotes.instrument_id = inst_id;

The results should look like the table below:

ID | ts | price | instrument_id | inst_id | exchange | sector
1 | 2019-10-17T00:00:00.000000Z | 146.47 | 3211 | 3211 | LSE | Tech
2 | 2019-10-17T00:00:00.100000Z | 136.59 | 2319 | 2319 | NASDAQ | Finance
3 | 2019-10-17T00:00:00.100000Z | 160.29 | 8723 | 8723 | NYSE | Tech
4 | 2019-10-17T00:00:00.100000Z | 170.94 | 885 | 885 | HKEX | Healthcare
5 | 2019-10-17T00:00:00.200000Z | 149.34 | 3200 | 3200 | NASDAQ | Energy
6 | 2019-10-17T00:00:01.100000Z | 160.95 | 4053 | 4053 | TSE | Consumer

Note the timestamps returned as we've JOIN'd the tables together.

Let's try another type of aggregation:

Aggregation keyed by sector
SELECT sector, max(price)
FROM quotes
JOIN (
    SELECT ID inst_id, sector
    FROM instruments) a
ON quotes.instrument_id = a.inst_id;

The results should look like the table below:

sector | max
Tech | 179.99998786398
Finance | 179.99998138348
Energy | 179.9999994818
Healthcare | 179.99991705861
Consumer | 179.99999233377

Back to time: since the quotes table is partitioned by time, let's JOIN the two tables and aggregate prices into hourly buckets:

Aggregation by hourly time buckets
SELECT ts, sector, exchange, avg(price)
FROM quotes timestamp(ts)
JOIN (
    SELECT ID inst_id, sector, exchange
    FROM instruments
    WHERE sector='Tech' AND exchange='NYSE') a
ON quotes.instrument_id = a.inst_id
WHERE ts IN '2019-10-21;1d' -- this is an interval between 2019/10/21 and the next day
SAMPLE BY 1h -- aggregation by hourly time buckets
ALIGN TO CALENDAR; -- align the ts with the start of the hour (hh:00:00)

The results should look like the table below:

ts | sector | exchange | average
2019-10-21T00:00:00.000000Z | Tech | NYSE | 140.004285872
2019-10-21T01:00:00.000000Z | Tech | NYSE | 136.68436714
2019-10-21T02:00:00.000000Z | Tech | NYSE | 135.24368409
2019-10-21T03:00:00.000000Z | Tech | NYSE | 137.19398410
2019-10-21T04:00:00.000000Z | Tech | NYSE | 150.77868682
... | ... | ... | ...

For more information about these statements, please refer to the SELECT, JOIN and SAMPLE BY pages.

Deleting tables

We can now clean up the demo data using the DROP TABLE statement. Be careful with this statement, as QuestDB cannot recover data that is deleted in this way:

DROP TABLE quotes;
DROP TABLE instruments;
DROP TABLE trades;
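
If you'd rather keep the table definitions and only remove the generated rows, TRUNCATE TABLE is an alternative to dropping a table. It deletes all rows but preserves the schema, for example:

TRUNCATE TABLE trades;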