Dominate Your Analytics with Materialized Views in ClickHouse

Table of Contents

Materialized Views in ClickHouse for High-Performance Analytics

Materialized views are the secret behind interactive dashboards that feel instant on billion-row tables. In this guide, ChistaDATA engineers detail both incremental and refreshable materialized view patterns in ClickHouse, including AggregatingMergeTree design, chaining, and operational pitfalls.

Why Materialized Views Matter in ClickHouse

A materialized view in ClickHouse is not a cached query result — it is a trigger that transforms incoming data at insert time and writes the transformed rows to a destination table. This architecture is fundamentally different from traditional materialized views in PostgreSQL or Oracle, and it is the reason ClickHouse can sustain millions of inserts per second while serving pre-aggregated dashboards in milliseconds.

Two distinct flavors of materialized views exist in modern ClickHouse:

Incremental materialized views — the classic trigger-style MV, which runs the transformation query on each insert block and writes the result to a destination table.
Refreshable materialized views — introduced as experimental in ClickHouse 24.9 and declared production-ready in 24.10. They execute the full query on a schedule, replacing (or appending to) the destination.

ChistaDATA uses both patterns heavily. Incremental MVs power real-time aggregations; refreshable MVs handle snapshots, cross-source joins, and slow-changing reference data.

Incremental Materialized Views with AggregatingMergeTree

The most powerful incremental MV pattern in ClickHouse combines a MATERIALIZED VIEW with an AggregatingMergeTree destination. Aggregate states are stored partially aggregated on disk and merged lazily, which means dashboards can query pre-aggregated rollups without recomputing anything.

-- Raw events table (source)
CREATE TABLE events
(
    event_time DateTime,
    user_id    UInt64,
    country    LowCardinality(String),
    event_type LowCardinality(String),
    revenue    Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_type, country, event_time);

-- Destination: pre-aggregated by hour and country
CREATE TABLE events_hourly
(
    hour         DateTime,
    country      LowCardinality(String),
    event_type   LowCardinality(String),
    events       AggregateFunction(count),
    users        AggregateFunction(uniqExact, UInt64),
    revenue_sum  AggregateFunction(sum, Float64),
    revenue_p95  AggregateFunction(quantileTDigest(0.95), Float64)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(hour)
ORDER BY (event_type, country, hour);

-- Trigger: transforms source rows into aggregate states
CREATE MATERIALIZED VIEW events_hourly_mv TO events_hourly AS
SELECT
    toStartOfHour(event_time)          AS hour,
    country,
    event_type,
    countState()                       AS events,
    uniqExactState(user_id)            AS users,
    sumState(revenue)                  AS revenue_sum,
    quantileTDigestState(0.95)(revenue) AS revenue_p95
FROM events
GROUP BY hour, country, event_type;

Dashboards then query the rollup with Merge-suffixed functions, which finalize the aggregate state at read time:

SELECT
    hour,
    country,
    countMerge(events)                      AS events,
    uniqExactMerge(users)                   AS users,
    sumMerge(revenue_sum)                   AS revenue,
    quantileTDigestMerge(0.95)(revenue_p95) AS revenue_p95
FROM events_hourly
WHERE hour >= now() - INTERVAL 7 DAY
GROUP BY hour, country
ORDER BY hour;

How Incremental Materialized Views Actually Execute

Understanding the execution model prevents a long list of production incidents. Three rules every ClickHouse engineer must internalize:

The MV query runs on each insert block, not on the full table. The FROM clause only sees the rows in the current insert. Joins against other tables in the MV query read the current state of those tables at insert time — not a snapshot.
The MV is a transformation, not a view. Querying the MV name is equivalent to querying the destination table. You cannot “refresh” a classic MV — the destination is always the current truth.
Failures in the MV query fail the insert. A buggy MV can block ingestion. Always test with INSERT INTO source_table SELECT … before wiring a production MV.

The corollary is that joining a large dimension into an incremental MV is dangerous — the dimension is re-read on every insert block. For dimension-heavy joins, the refreshable MV pattern is safer.

Refreshable Materialized Views for Snapshots and Joins

Refreshable materialized views run the full query on a schedule and replace (or append to) the destination table. They are ideal for use cases where incremental computation is impossible or awkward.

-- Enable the feature (no longer needed in 24.10+ where it is stable)
SET allow_experimental_refreshable_materialized_view = 1;

-- Hourly snapshot of current top-1000 sessions
CREATE MATERIALIZED VIEW top_sessions_mv
REFRESH EVERY 1 HOUR
ENGINE = MergeTree
ORDER BY (session_duration_sec DESC)
AS
SELECT
    session_id,
    user_id,
    max(event_time) - min(event_time) AS session_duration_sec,
    count()                           AS event_count
FROM events
WHERE event_time >= now() - INTERVAL 24 HOUR
GROUP BY session_id, user_id
ORDER BY session_duration_sec DESC
LIMIT 1000;

The APPEND variant (24.9+) adds new rows rather than replacing — useful for capturing point-in-time snapshots:

CREATE MATERIALIZED VIEW events_snapshot_mv
REFRESH EVERY 10 MINUTE APPEND TO events_snapshot
AS SELECT
    now()        AS snapshot_time,
    event_type,
    count()      AS events,
    uniqExact(user_id) AS users
FROM events
WHERE event_time >= now() - INTERVAL 10 MINUTE
GROUP BY event_type;

Refreshable MVs can also pull from external sources via table functions (s3(), postgresql(), mysql(), url()), effectively turning ClickHouse into a scheduled ETL engine for reference data.

Chaining Materialized Views for Multi-Level Rollups

It is common to need analytics at multiple granularities — hourly, daily, monthly. Rather than running three MVs against the raw source, chain them: raw → hourly → daily → monthly. Each level reads from the previous level, not the source, which keeps cost low.

-- Daily rollup from hourly
CREATE TABLE events_daily
(
    day        Date,
    country    LowCardinality(String),
    events     AggregateFunction(count),
    users      AggregateFunction(uniqExact, UInt64),
    revenue    AggregateFunction(sum, Float64)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(day)
ORDER BY (country, day);

CREATE MATERIALIZED VIEW events_daily_mv TO events_daily AS
SELECT
    toDate(hour) AS day,
    country,
    countMergeState(events)    AS events,
    uniqExactMergeState(users) AS users,
    sumMergeState(revenue_sum) AS revenue
FROM events_hourly
GROUP BY day, country;

The magic is in the -MergeState combinator, which re-aggregates aggregate states into aggregate states — preserving the ability to merge further downstream without losing precision on uniq/quantile/TDigest estimators.

Operational Pitfalls and How to Avoid Them

From countless ChistaDATA engagements, these are the five most common MV pitfalls in production:

Non-deterministic functions in MV definitions. now(), rand(), and generateUUIDv4() produce different values on retries. Use source columns or ingest_time defaults instead.
JOIN against mutating dimensions. The dimension state at insert time is what gets baked into the MV row. If the dimension changes later, the MV is stale. Use refreshable MVs or dictionaries with lifetime refresh for this case.
Missing ORDER BY alignment. The destination table’s ORDER BY should match how the MV is queried. Otherwise the rollup does not accelerate the dashboard.
Too many small parts. MVs receive rows per insert block. Small, frequent inserts create small parts that must merge. Batch inserts to 100k–1M rows per block where possible.
No monitoring on MV lag. If an MV query errors repeatedly, inserts may succeed while the MV falls behind. Monitor system.materialized_views and system.query_log for MV failures.

-- Detect MV errors in the last hour
SELECT
    event_time,
    query_id,
    exception
FROM system.query_log
WHERE event_time >= now() - INTERVAL 1 HOUR
  AND type = 'ExceptionWhileProcessing'
  AND query LIKE '%MATERIALIZED VIEW%'
ORDER BY event_time DESC
LIMIT 50;

Key Takeaways

ClickHouse incremental materialized views are insert-time triggers, not cached results.
Pair incremental MVs with AggregatingMergeTree and -State/-Merge combinators for dashboard-grade aggregates.
Refreshable materialized views (24.10+) are production-ready for snapshots, scheduled ETL, and dimension-heavy joins.
Chain MVs raw → hourly → daily → monthly using -MergeState to compose rollups without re-reading raw data.
Avoid now(), rand(), and mutable JOINs inside MV definitions — all break determinism and idempotency.
Monitor system.query_log and system.materialized_views for MV errors and lag.

How ChistaDATA Can Help

At ChistaDATA Inc., we are 100% focused on ClickHouse. Our engineering team delivers ClickHouse consulting, 24×7 managed services, performance engineering, and mission-critical support for analytics workloads ranging from gigabytes to multi-petabyte clusters. Whether you are building a real-time analytics platform from scratch, migrating from Snowflake, BigQuery, or Redshift, or tuning an existing ClickHouse deployment, we can help accelerate outcomes. Schedule a consultation with our ClickHouse engineering team to discuss architecture, capacity planning, and performance SLAs.

Frequently Asked Questions

What is the difference between incremental and refreshable materialized views?

Incremental MVs are triggers that run on each insert block and write transformed rows to a destination. Refreshable MVs run the full query on a schedule and replace (or append to) the destination. Incremental is ideal for streaming aggregations; refreshable is ideal for snapshots, dimension joins, or reference data imports.

Can I create a materialized view on an existing table with historical data?

Yes, but the MV only processes new inserts. To backfill history, use a POPULATE clause on creation (which blocks until backfill finishes) or manually INSERT historical data into the destination table using the same SELECT as the MV definition. We recommend manual backfill for large tables to avoid long-running locks.

Do materialized views impact insert throughput?

Yes — each MV adds transformation cost per insert block. A well-designed MV typically adds 5-20% overhead. Chained MVs multiply that cost per level. At ChistaDATA we benchmark MV overhead during schema design and cap total MV count per source table at 3-5 for high-throughput workloads.

How do I handle schema changes on a table with materialized views?

For additive changes (new columns), ALTER TABLE on the source, then ALTER TABLE on the destination, then recreate the MV with the updated SELECT. For destructive changes, create a new MV alongside the old one, backfill, swap via rename, and drop the old MV. Always test schema migrations in a staging environment first.

Can materialized views write to a different cluster or remote database?

Yes. A materialized view can have a Distributed table as its destination, which forwards rows to shards across a cluster. For writing to external systems, use a refreshable MV combined with an INSERT INTO FUNCTION statement, or use the Kafka engine as a destination to stream rows out to a message bus.

How does ChistaDATA help design materialized view strategies?

ChistaDATA performs ClickHouse schema audits where we inventory existing MVs, identify redundant aggregations, recommend chaining opportunities, and design AggregatingMergeTree destinations matched to actual dashboard queries. Our engineers also train in-house teams on the -State/-Merge combinator pattern that underpins efficient multi-level rollups.