Time-Series Analytics at Scale with ClickHouse

Time-series data — metrics, logs, traces, IoT telemetry, financial ticks — is the dominant analytics workload in modern systems. ClickHouse has become the default engine for time-series analytics at scale because it combines columnar storage, vectorized execution, and purpose-built codecs. This ChistaDATA guide details production patterns for time-series on ClickHouse.

Why ClickHouse Excels at Time-Series

Time-series workloads have three properties that OLAP engines often struggle with: high-throughput append-only inserts, long retention with tiered storage, and queries that combine narrow time filters with wide aggregation. ClickHouse was engineered precisely for this shape.

Three ClickHouse features directly target time-series performance:

  • Columnar compression with time-series codecs — DoubleDelta, Gorilla, and T64 codecs exploit the monotonic, slowly-changing nature of metrics to achieve 10–30x compression.
  • Sparse primary indexes over time — ordering by (entity_id, timestamp) lets ClickHouse locate a narrow time range in O(log N) block scans.
  • Tiered storage policies — hot data on NVMe, warm on SSD, cold on S3, all transparent to queries.
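These properties are easy to observe in practice. As a sketch against the metrics schema defined in the next section, EXPLAIN with indexes = 1 reports how many index granules the primary key pruned for a narrow time filter:

EXPLAIN indexes = 1
SELECT host, avg(value)
FROM metrics
WHERE metric_name = 'cpu.usage'
  AND timestamp >= now() - INTERVAL 1 HOUR
GROUP BY host;

On a table ordered by (metric_name, host, timestamp), the output typically shows the vast majority of granules eliminated before any data is read.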

Real-world evidence: Netflix has reportedly ingested around 5 petabytes of time-stamped log events per day into ClickHouse, sustaining roughly 10.6M events per second while keeping interactive queries sub-second.

Schema Design for Time-Series

The canonical time-series schema in ClickHouse uses MergeTree with a time-ordered sort key and time-oriented codecs:

CREATE TABLE metrics
(
    metric_name LowCardinality(String),
    tags        Map(LowCardinality(String), LowCardinality(String)),
    timestamp   DateTime64(3, 'UTC') CODEC(DoubleDelta, LZ4),
    value       Float64              CODEC(Gorilla, LZ4),
    host        LowCardinality(String)
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (metric_name, host, timestamp)
TTL toDate(timestamp) + INTERVAL 30 DAY TO VOLUME 'warm',
    toDate(timestamp) + INTERVAL 90 DAY TO VOLUME 'cold',
    toDate(timestamp) + INTERVAL 365 DAY DELETE;

Key design choices:

  • DoubleDelta on timestamps stores deltas-of-deltas. For monotonically increasing timestamps this typically compresses to 1–2 bytes per row.
  • Gorilla on values is a Facebook-invented codec that XORs consecutive floats and stores only the changed bits — ideal for slowly-varying gauges like CPU or latency.
  • LowCardinality for tag keys and values dramatically reduces memory during GROUP BY. Columns whose distinct-value count grows past roughly 10,000 should drop LowCardinality, as the dictionary overhead then outweighs the savings.
  • Daily partitioning is the sweet spot for time-series — fine enough for TTL and tier transitions, coarse enough to avoid part explosion.
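Codec effectiveness is verifiable directly from system tables. For example, per-column compression ratios for the table above (the byte counters are standard system.columns fields):

SELECT
    name,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    round(data_uncompressed_bytes / data_compressed_bytes, 1) AS ratio
FROM system.columns
WHERE database = currentDatabase() AND table = 'metrics'
ORDER BY data_compressed_bytes DESC;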

High-Throughput Ingestion Patterns

Time-series ingestion is typically measured in rows per second per node. ChistaDATA has repeatedly hit 2M+ rows/sec per node on properly sized clusters. The techniques:

-- Async inserts collapse many small client inserts into bigger parts
SET async_insert = 1;
SET wait_for_async_insert = 1;
SET async_insert_max_data_size = 10485760;   -- 10 MB batches
SET async_insert_busy_timeout_ms = 200;      -- flush at most every 200ms

Alternatively, put a Buffer engine or the Kafka engine in front of the MergeTree table. For ultra-high throughput from object storage, the S3Queue table engine (introduced in 23.8 and matured through the 24.x releases) reads files from S3 as they arrive, which is ideal for batch telemetry drops from edge agents.

CREATE TABLE metrics_s3_queue
(
    metric_name String,
    timestamp   DateTime64(3),
    value       Float64
)
ENGINE = S3Queue('https://bucket.s3.amazonaws.com/telemetry/*.parquet', 'Parquet')  -- S3 engines take an HTTPS endpoint URL
SETTINGS mode = 'unordered',
         after_processing = 'delete',
         s3queue_processing_threads_num = 8;

CREATE MATERIALIZED VIEW metrics_s3_mv TO metrics
AS SELECT metric_name, map() AS tags, timestamp, value, '' AS host
   FROM metrics_s3_queue;
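The Kafka-engine route mentioned above follows the same table-plus-materialized-view pattern. A minimal sketch, with placeholder broker, topic, and consumer-group names:

CREATE TABLE metrics_kafka
(
    metric_name String,
    timestamp   DateTime64(3),
    value       Float64
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list  = 'metrics',
         kafka_group_name  = 'clickhouse-metrics',
         kafka_format      = 'JSONEachRow';

CREATE MATERIALIZED VIEW metrics_kafka_mv TO metrics
AS SELECT metric_name, map() AS tags, timestamp, value, '' AS host
   FROM metrics_kafka;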

Downsampling and Retention

Raw time-series data is rarely queried at full resolution after 24–48 hours. Downsampling via materialized views keeps query performance flat as retention grows.

-- 1-minute rollup
CREATE TABLE metrics_1m
(
    minute      DateTime,
    metric_name LowCardinality(String),
    host        LowCardinality(String),
    count       AggregateFunction(count),
    sum         AggregateFunction(sum, Float64),
    min         AggregateFunction(min, Float64),
    max         AggregateFunction(max, Float64),
    p50         AggregateFunction(quantileTDigest(0.50), Float64),
    p99         AggregateFunction(quantileTDigest(0.99), Float64)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(minute)
ORDER BY (metric_name, host, minute)
TTL minute + INTERVAL 90 DAY;

CREATE MATERIALIZED VIEW metrics_1m_mv TO metrics_1m AS
SELECT
    toStartOfMinute(timestamp)         AS minute,
    metric_name,
    host,
    countState()                       AS count,
    sumState(value)                    AS sum,
    minState(value)                    AS min,
    maxState(value)                    AS max,
    quantileTDigestState(0.50)(value)  AS p50,
    quantileTDigestState(0.99)(value)  AS p99
FROM metrics
GROUP BY minute, metric_name, host;

Chain additional MVs for 5-minute, 1-hour, and 1-day rollups. Dashboards can then route each query to the coarsest rollup that satisfies the selected time range.
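Reading a rollup table requires the matching -Merge combinators to finalize the stored aggregate states. A typical dashboard query against metrics_1m:

SELECT
    minute,
    countMerge(count)                 AS samples,
    sumMerge(sum) / countMerge(count) AS avg_value,
    quantileTDigestMerge(0.99)(p99)   AS p99_value
FROM metrics_1m
WHERE metric_name = 'cpu.usage'
  AND minute >= now() - INTERVAL 1 DAY
GROUP BY minute
ORDER BY minute;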

Gap-Filling, Interpolation, and Step Queries

Time-series queries often need continuous output even when some intervals have no data. ClickHouse provides three primitives for this: ORDER BY ... WITH FILL, INTERPOLATE, and the arrayFill family of functions.

-- Dense per-minute output with forward-filled missing values
SELECT
    minute,
    host,
    avg_value
FROM (
    SELECT
        toStartOfMinute(timestamp) AS minute,
        host,
        avg(value)                 AS avg_value
    FROM metrics
    WHERE metric_name = 'cpu.usage'
      AND timestamp >= now() - INTERVAL 1 HOUR
    GROUP BY minute, host
)
ORDER BY host, minute WITH FILL
    STEP toIntervalMinute(1)
    INTERPOLATE (avg_value AS avg_value);

WITH FILL STEP generates the missing minute buckets; INTERPOLATE carries the last known value forward. For linear interpolation, resample with array functions or on the application side.

Anomaly Detection and Forecasting Inside ClickHouse

Basic statistical anomaly detection runs natively in ClickHouse — no external ML pipeline required. Z-score and moving-window standard deviation are the workhorses:

WITH windowed AS (
    SELECT
        minute,
        value,
        avg(value)       OVER w AS rolling_avg,
        stddevPop(value) OVER w AS rolling_std
    FROM metrics_1m_finalized  -- hypothetical view exposing finalized values from metrics_1m
    WHERE metric_name = 'orders.per_minute'
      AND minute >= now() - INTERVAL 6 HOUR
    WINDOW w AS (ORDER BY minute ROWS BETWEEN 60 PRECEDING AND 1 PRECEDING)
)
SELECT
    minute,
    value,
    rolling_avg,
    rolling_std,
    (value - rolling_avg) / nullIf(rolling_std, 0) AS z_score
FROM windowed
WHERE abs(z_score) > 3
ORDER BY minute DESC;

For forecasting, ClickHouse 24.x introduced built-in time-series functions including seriesDecomposeSTL for trend/seasonality decomposition. Combined with window functions, this gives teams a credible first-mile forecasting capability without leaving SQL.
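A minimal sketch of the decomposition, assuming an hourly series with daily seasonality (period = 24); the metric name is illustrative:

-- Build an ordered array of hourly averages, then decompose it
SELECT seriesDecomposeSTL(series, 24) AS components  -- seasonal, trend, and residual arrays
FROM
(
    SELECT groupArray(hourly_avg) AS series
    FROM
    (
        SELECT
            toStartOfHour(timestamp) AS hour,
            avg(value)               AS hourly_avg
        FROM metrics
        WHERE metric_name = 'orders.per_minute'
          AND timestamp >= now() - INTERVAL 7 DAY
        GROUP BY hour
        ORDER BY hour
    )
);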

Key Takeaways

  • Use DoubleDelta on timestamps and Gorilla on values to achieve 10-30x compression on time-series columns.
  • Order tables by (metric_name, host, timestamp) for narrow-time, filter-heavy queries.
  • Batch inserts via async_insert, Kafka engine, or S3Queue to hit multi-million rows/sec ingestion.
  • Downsample via chained AggregatingMergeTree MVs (raw → 1m → 5m → 1h → 1d) for flat dashboard latency.
  • Use WITH FILL STEP and INTERPOLATE for gap-filling continuous time-series output.
  • Run Z-score anomaly detection natively with window functions; use seriesDecomposeSTL for trend/seasonality.
  • Tier storage hot → warm → cold via TTL MOVE policies to keep retention long and cost low.

How ChistaDATA Can Help

At ChistaDATA Inc., we are 100% focused on ClickHouse. Our engineering team delivers ClickHouse consulting, 24×7 managed services, performance engineering, and mission-critical support for analytics workloads ranging from gigabytes to multi-petabyte clusters. Whether you are building a real-time analytics platform from scratch, migrating from Snowflake, BigQuery, or Redshift, or tuning an existing ClickHouse deployment, we can help accelerate outcomes. Schedule a consultation with our ClickHouse engineering team to discuss architecture, capacity planning, and performance SLAs.

Frequently Asked Questions

How does ClickHouse compare to Prometheus, InfluxDB, or TimescaleDB for time-series?

Prometheus is optimized for short-retention metrics with PromQL — great for alerting but limited for long-term analytics. InfluxDB 2.x handles metrics well at small-to-medium scale but struggles past a few TB. TimescaleDB extends PostgreSQL with good SQL but cannot match ClickHouse’s columnar compression and parallel query execution at multi-petabyte scale. ClickHouse is the preferred choice when you need long retention, full SQL, and sub-second queries over hundreds of billions of points.

What compression ratio should I expect on time-series data?

With DoubleDelta on timestamps and Gorilla on floating-point values, production workloads commonly achieve 10-30x compression compared to raw JSON. Integer counters and slowly-changing gauges compress best (25-40x). Highly volatile values with wide variance compress less (5-10x). Always benchmark with your actual data before committing to a storage budget.

How do we handle late-arriving data in a time-series pipeline?

ClickHouse accepts out-of-order inserts natively — the MergeTree sorts them during background merges. For aggregating materialized views, late data is aggregated correctly as long as it arrives within the partition window. For TTL-based retention, ensure late data arrives before the TTL expiration or use TTL MOVE rather than TTL DELETE to preserve the rows.

Can ClickHouse replace Prometheus as a metrics backend?

ClickHouse is increasingly used as a long-term storage backend for Prometheus via remote-write adapters. Short-term, Prometheus remains strong for alerting and service discovery; long-term, ClickHouse stores the full history for analytics, capacity planning, and forensic investigation. Many teams run both with Prometheus as the alerting tier and ClickHouse as the analytics tier.

How do we forecast time-series directly in ClickHouse?

ClickHouse 24.x added seriesDecomposeSTL for trend and seasonality decomposition, plus seriesOutliersDetectTukey for outlier detection. Combined with window functions for rolling averages, exponential smoothing, and linear regression via polynomial fits, ClickHouse covers most operational forecasting needs without an external ML stack. For deeper forecasting (ARIMA, Prophet, neural models), export features from ClickHouse and train externally.

How does ChistaDATA support large-scale time-series deployments?

ChistaDATA has deployed and managed ClickHouse time-series clusters ingesting millions of rows per second for observability, IoT, and fintech use cases. Our engineers help with schema design, codec selection, downsampling strategy, tiered storage, and 24×7 operations. Engagements range from performance audits to fully managed ClickHouse as a service with SLA-backed ingestion and query latency guarantees.
