High-Cardinality Time-Series Metrics

SUMMARY

High-cardinality time-series metrics arise when metrics are labeled with many distinct combinations of dimensions, such as host, container, customer, or trading symbol. They are a core scalability challenge for observability platforms, capital markets data stores, and any system tracking fine-grained telemetry at scale.

What Are High-Cardinality Time-Series Metrics?

In metric systems, a "series" is usually identified by a metric name plus a set of labels or tags. Cardinality refers to the number of distinct series. High-cardinality time-series metrics occur when label combinations explode, for example:

  • per-pod, per-HTTP-route, per-customer latency metrics in observability
  • per-symbol, per-venue, per-algorithm order-book or P&L metrics in trading

The result is millions to billions of distinct series, each with relatively few data points over time. This differs from generic high cardinality in relational systems because every distinct label set becomes its own time series.

High-cardinality metrics stress metadata storage, index structures, and query planning far more than raw point volume does.
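
To make this concrete, here is a minimal Python sketch of how series cardinality is counted; the metric name and label values are hypothetical:

```python
# Minimal sketch: a series is the metric name plus its full label set,
# so every new label combination creates a new series.
# All names and counts below are hypothetical.
from itertools import product

hosts = [f"host-{i}" for i in range(100)]
routes = ["/orders", "/users", "/search"]
statuses = ["200", "404", "500"]

# Each distinct (metric, host, route, status) tuple is one series.
series = {
    ("http_request_latency", host, route, status)
    for host, route, status in product(hosts, routes, statuses)
}
print(len(series))  # 100 * 3 * 3 = 900 series for a single metric name
```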


Why They Are Hard for Time-Series Databases

Naive schemas that map each label combination to its own time series or table quickly hit limits:

  • Indexes on high-cardinality tags grow large and randomize I/O
  • Per-series state (caches, in-memory indexes) consumes unbounded memory
  • Group-by on labels becomes expensive as the number of groups explodes

In observability, this shows up as "cardinality bombs" when a new dimension (for example, user_id) is added to all metrics. In capital markets, per-order or per-account labeling in a market-data time-series schema can have the same effect.
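
A back-of-the-envelope sketch of such a cardinality bomb; the label counts and per-series cost are illustrative assumptions, not measurements:

```python
# Hypothetical baseline: latency labeled by pod and HTTP route.
pods, routes = 500, 40
baseline = pods * routes                 # 20,000 series per metric
print(f"baseline: {baseline:,} series")

# Adding an unbounded user_id label multiplies every existing series.
active_users = 50_000
after_bomb = baseline * active_users     # 1,000,000,000 series per metric
print(f"with user_id: {after_bomb:,} series")

# If per-series state (index entries, caches) costs ~1 KiB, memory for a
# single metric jumps from ~20 MiB to roughly a tebibyte.
print(f"approx. per-series state: {after_bomb * 1024 / 2**40:.1f} TiB")
```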

Schema and Labeling Strategies

Mitigating high-cardinality time series starts with schema design:

  • Control metric cardinality: avoid unbounded identifiers in labels and use sampling or bucketing where possible (see the sketch after this list).
  • Normalize volatile attributes into dimension tables instead of tags.
  • Separate high-level KPIs from exploratory metrics to contain tag explosion.
  • Use telemetry rollups so that only a subset of labels is retained at long horizons.
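
Picking up the first item in the list above, here is a minimal sketch of bucketing an unbounded identifier into a fixed set of label values; the bucket count, hash choice, and function name are illustrative assumptions:

```python
# Cap the cardinality contributed by an unbounded identifier by hashing
# it into a fixed number of stable buckets used as the label value.
import hashlib

N_BUCKETS = 64  # assumption: tune per workload

def user_bucket(user_id: str) -> str:
    """Map an unbounded user_id to one of N_BUCKETS stable label values."""
    digest = hashlib.sha1(user_id.encode("utf-8")).digest()
    return f"bucket-{int.from_bytes(digest[:4], 'big') % N_BUCKETS}"

# Millions of user_ids collapse into at most 64 distinct label values, so
# the user dimension contributes a bounded factor to series cardinality.
print(user_bucket("user-123456"))  # deterministic, one of bucket-0..bucket-63
```

The trade-off is losing per-user drill-down in metrics; exact per-user investigation moves to logs or traces instead.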

Indexing and Storage Techniques

Columnar time-series databases typically handle high-cardinality metrics using a combination of:

  • Symbol or dictionary-encoded label columns to compress repeated values (sketched after this list)
  • Time-based partitioning plus a time-series index to localize queries to relevant intervals
  • Carefully chosen indexing strategy that favors common filters (for example, symbol, service, region)
  • Approximate structures such as HyperLogLog and other sketch algorithms for distinct counts and percentiles without scanning all series
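
To illustrate the first technique, here is a minimal sketch of dictionary (symbol) encoding for a label column. The class and method names are hypothetical, not QuestDB's implementation; they only show why repeated label values compress well and filter quickly:

```python
# Minimal sketch of dictionary (symbol) encoding: repeated strings are
# stored once in a dictionary and the column holds small integer codes.
from typing import Dict, List

class SymbolColumn:
    def __init__(self) -> None:
        self.dictionary: Dict[str, int] = {}   # value -> code
        self.values: List[str] = []            # code -> value
        self.codes: List[int] = []             # one integer code per row

    def append(self, value: str) -> None:
        code = self.dictionary.get(value)
        if code is None:
            code = len(self.values)
            self.dictionary[value] = code
            self.values.append(value)
        self.codes.append(code)

    def filter_eq(self, value: str) -> List[int]:
        """Row indexes matching value: one dictionary lookup, then an
        integer scan instead of per-row string comparisons."""
        code = self.dictionary.get(value)
        if code is None:
            return []
        return [i for i, c in enumerate(self.codes) if c == code]

col = SymbolColumn()
for sym in ["AAPL", "MSFT", "AAPL", "TSLA", "AAPL"]:
    col.append(sym)
print(col.codes)               # [0, 1, 0, 2, 0]
print(col.filter_eq("AAPL"))   # [0, 2, 4]
```

Because an equality filter reduces to comparing integers against a single code, scans stay cheap even when the column holds millions of rows of repeated label values.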

For observability and capital markets workloads, evaluating a TSDB's behavior under high-cardinality scenarios is often more important than peak ingest benchmarks.

For a deeper dive on real-world behavior, see How databases handle 10 million devices in high-cardinality benchmarks and Using QuestDB to collect infrastructure metrics.
