ClickHouse Query Optimization: Indexing Strategies That Actually Work
ClickHouse indexing does not work the way row-store indexing does — and that is exactly why so many clusters underperform. This is a senior engineer’s guide to the indexing strategies that genuinely move query latency, grounded in production practice, with runnable SQL at every step.
If you arrive at ClickHouse from PostgreSQL, MySQL, or SQL Server and try to “add an index to the slow column,” you will be disappointed — and you will misunderstand the most powerful analytical database in open source.
ClickHouse does not maintain a per-row B-tree on every indexed column. Instead, performance comes from a small number of structural decisions: the order in which data is physically stored, a sparse primary index that lets the engine skip whole blocks of rows, lightweight data-skipping indexes that prune granules before they are read, and projection and materialized-view machinery that maintains alternate physical layouts for you. Get these right and a query that scanned billions of rows touches only a few million; get them wrong and no amount of hardware will save you.
This guide walks through the indexing strategies that actually work on ClickHouse in production. Each strategy explains the mechanism, shows the SQL, and notes the judgment that separates a feature from an outcome. The discipline underneath all of them is the same one our engineers apply on every ClickHouse performance audit: measure how many granules a query reads, change one structural decision, and measure again. Indexing in ClickHouse is an exercise in reading less data, and the only proof that you have succeeded is fewer rows scanned.
One framing is worth holding throughout. In a transactional row store, an index is a lookup structure that finds individual rows quickly. In ClickHouse, an analytical column store built for scanning, the goal is almost never to find one row — it is to avoid reading the billions of rows that are irrelevant to the query. Every technique below is therefore a different way of answering one question: how do we let the engine skip data it does not need? Hold that question in mind and the strategies stop feeling like a grab-bag of features and start forming a single, coherent design method.
Treat the sorting key as the index
In a MergeTree table, the ORDER BY clause is the single most consequential performance decision you will make. It defines the order in which rows are physically stored on disk, and that physical order is what the sparse primary index is built on. There is no separate “create index” step that rescues a poor sorting key after the fact; the sorting key is the primary index, so it must reflect how the table is actually queried.
ClickHouse stores data in parts, and within each part rows are sorted by the ORDER BY expression and divided into granules — by default, blocks of 8,192 rows. The sparse primary index keeps one entry per granule rather than one per row, which is why it is small enough to stay in memory even for enormous tables. When a query filters on a prefix of the sorting key, the engine performs binary search over these marks and reads only the granules that can possibly match. The mechanics are documented in detail in the ClickHouse guide to the sparse primary index.
The practical consequence is that you cannot bolt performance on later. Because the sorting key dictates the physical layout of the data on disk, changing it means rewriting the table, so the decision deserves real analysis up front. Treat it the way you would treat the primary-key design of a transactional schema: study the queries that matter, model the access patterns, and choose the ordering that serves the most valuable ones — because in ClickHouse this one clause determines the ceiling on how fast the table can ever be.
-- The ORDER BY clause is your primary index
CREATE TABLE events
(
tenant_id UInt32,
event_date Date,
event_type LowCardinality(String),
user_id UInt64,
payload String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (tenant_id, event_type, event_date);
Order primary-key columns correctly
Once you accept that the sorting key is the index, the order of its columns becomes the next decision — and it is governed by a strict rule. ClickHouse can use the primary index efficiently only for a left-to-right prefix of the sorting key. A filter on the first column is fast; a filter on the second column is fast only when the first is also constrained; a filter on the third column alone cannot use the index at all. Designing the sorting key therefore means ranking columns by how your queries filter, not by intuition.
Two principles guide the ordering. First, place the columns that appear most often as equality filters earliest, because they enable the index for the widest range of queries. Second, among those, lead with lower-cardinality columns, because they group many rows under each index mark and produce long runs that compress well and prune cleanly. A common, effective pattern is tenant or category first, then a moderate-cardinality dimension, then time. Validate the choice against the MergeTree documentation and, more importantly, against your real query log.
The most common ClickHouse anti-pattern we remediate is a high-cardinality column such as user_id placed first in ORDER BY. It cripples the index for every query that does not filter on that exact user. Lead with the low-cardinality predicate instead — explore our ClickHouse performance tuning approach.
Tune index_granularity to the workload
The sparse index keeps one mark per granule, and the granule size — index_granularity, 8,192 rows by default — sets the trade-off between index precision and index size. Smaller granules mean the engine can skip to a more precise location and read fewer surrounding rows, at the cost of a larger index and more marks to scan. Larger granules shrink the index but force the engine to read more rows around each match. For most workloads the default is well chosen, and changing it should be evidence-driven rather than reflexive.
Two scenarios justify revisiting it. Highly selective point-lookup workloads on very large tables can benefit from a smaller granularity or from adaptive granularity, which sizes granules by byte volume rather than a fixed row count. Conversely, wide-scan analytical workloads rarely benefit from shrinking it and may pay for the larger index in memory. As with every decision in this guide, measure the granules read before and after; an index change that does not reduce data read is not an optimization.
-- Set granularity explicitly when the workload justifies it
CREATE TABLE metrics
(
metric_id UInt32,
ts DateTime,
value Float64
)
ENGINE = MergeTree
ORDER BY (metric_id, ts)
SETTINGS index_granularity = 4096;
Add data-skipping indexes deliberately
When a query filters on a column that is not part of the sorting-key prefix, the primary index cannot help — but a data-skipping index often can. These secondary structures store a small summary per block of granules and let the engine skip blocks that cannot contain a match. They are powerful when applied to the right column with the right type, and pure overhead when scattered across columns indiscriminately. The decisive factor is whether the indexed values are correlated with the physical row order; an index on a column whose values are randomly distributed across granules will skip nothing.
Choosing the index type matters as much as choosing the column. The table below summarizes the practical choices; the full set is described in the ClickHouse documentation on data-skipping indexes.
Two cautions govern their use. First, a skip index has its own GRANULARITY — expressed as a number of primary-index granules per index block — and tuning it controls how coarse or fine the skipping is; too coarse and it rarely skips, too fine and the index grows large. Second, more is not better. Each skip index adds write-time cost and storage, and an index that never prunes a block is pure overhead. Add them one at a time, in response to a specific slow query, and confirm with the query plan that the index actually eliminates blocks before keeping it.
| Index type | Best for | How it works |
|---|---|---|
minmax |
Numeric or date columns correlated with row order | Stores the min and max per block; skips blocks outside the filter range. Cheapest and often most effective. |
set(N) |
Low-to-moderate cardinality equality filters | Stores up to N distinct values per block; skips blocks not containing the value. |
bloom_filter |
Higher-cardinality equality and IN filters | Probabilistic membership test with a tunable false-positive rate; compact but approximate. |
ngrambf_v1 |
Substring and LIKE search on text | Bloom filter over character n-grams; supports partial-string matching. |
tokenbf_v1 |
Whole-token (word) search in text and logs | Bloom filter over tokens; ideal for log search on delimited text. |
-- Add a skip index on a non-prefix column
ALTER TABLE events
ADD INDEX idx_user user_id TYPE bloom_filter(0.01) GRANULARITY 4;
-- Materialize it across existing parts
ALTER TABLE events MATERIALIZE INDEX idx_user;
Use projections for alternate orderings
A single sorting key cannot be optimal for every query against a table. When two access patterns demand contradictory orderings — say, by tenant for one report and by time for another — projections resolve the conflict. A projection is an additional, automatically maintained physical copy of the data within the same table, stored in a different order or pre-aggregated. ClickHouse chooses the best projection for each query transparently, so the application does not change.
Projections come in two flavors. A normal projection reorders the data to serve queries that filter or sort differently from the base table, effectively giving you a second primary index. An aggregating projection pre-computes group-by results, so a dashboard query reads a small summary rather than scanning the raw rows. The cost is additional storage and write-time maintenance, so add projections to serve specific, high-value query patterns rather than speculatively. The feature is documented under ClickHouse projections.
-- A projection that serves a different access pattern
ALTER TABLE events ADD PROJECTION p_by_user
(
SELECT * ORDER BY (user_id, event_date)
);
ALTER TABLE events MATERIALIZE PROJECTION p_by_user;
Pre-aggregate with materialized views
The fastest scan is the one that reads a pre-computed answer. ClickHouse materialized views are not stored query results that go stale; they are insert triggers that transform incoming rows and write them into a target table as the data arrives. Paired with an AggregatingMergeTree target, a materialized view maintains rolling aggregates — counts, sums, uniques, quantiles — continuously and incrementally, so a dashboard reads thousands of summary rows instead of scanning billions of raw events.
This is indexing in the broad sense: you are trading a small amount of write-time work and storage for an enormous reduction in read-time data volume. The pattern is especially powerful for the high-frequency aggregate queries that dominate analytical dashboards, where the same group-by runs thousands of times an hour. Design the view around those known queries, and keep the raw table for ad-hoc exploration. The mechanics are covered in the ClickHouse documentation on materialized views.
Two design notes keep this technique reliable. Because the view runs on insert, it sees only the newly inserted block, not the whole table — so it suits incremental aggregates and not calculations that need to look across all historical rows at write time. And because the target uses aggregate states, queries against it must finalize those states with the matching -Merge combinator, reading the rolled-up data through a small wrapper view or query. Done correctly, a dashboard that once scanned a billion-row table answers from a few thousand pre-aggregated rows, and the difference in latency is not incremental but categorical.
-- Incremental rollup into an aggregating target table
CREATE MATERIALIZED VIEW mv_daily_counts
TO daily_counts AS
SELECT tenant_id, event_date,
countState() AS events,
uniqState(user_id) AS users
FROM events
GROUP BY tenant_id, event_date;
Read less with types, codecs & LowCardinality
Indexing decides which granules are read; data types and codecs decide how much each granule costs to read. Because ClickHouse is columnar and I/O-bound on large scans, shrinking the on-disk footprint of a column is functionally an optimization of every query that touches it. Choosing the narrowest correct integer type, storing timestamps as DateTime rather than strings, and wrapping repetitive string columns in LowCardinality can cut both storage and scan time dramatically.
Compression codecs add another lever. The Delta and DoubleDelta codecs excel on monotonic sequences such as timestamps and counters, Gorilla suits slowly changing floating-point metrics, and these can be layered before the general-purpose compressor. The effect compounds with good ordering: a well-chosen sorting key clusters similar values together, which makes the codecs far more effective. Together, type discipline and codecs frequently halve scan time without any change to the queries themselves.
-- Narrow types and codecs reduce I/O on every scan
CREATE TABLE readings
(
sensor_id UInt32 CODEC(Delta, ZSTD),
ts DateTime CODEC(DoubleDelta, ZSTD),
region LowCardinality(String),
value Float32 CODEC(Gorilla, ZSTD)
)
ENGINE = MergeTree
ORDER BY (sensor_id, ts);
Partition for pruning, not for indexing
Partitioning is frequently confused with indexing, and conflating the two is a reliable way to slow a cluster down. The PARTITION BY clause divides a table into independent parts on disk, which enables partition pruning — skipping entire partitions whose values cannot match a filter — and makes data lifecycle operations such as dropping old data instantaneous. It is a coarse, complementary mechanism to the fine-grained sparse index, not a substitute for it.
The cardinal rule is to keep partitions large and few. A monthly partition on a date column is almost always right; a partition per day per tenant, producing tens of thousands of tiny partitions, overwhelms the engine with parts and degrades both inserts and queries. Partition by the dimension you use to expire or archive data, usually a coarse time bucket, and let the sorting key and skip indexes do the within-partition work. The trade-offs are detailed in the documentation on custom partitioning keys.
Over-partitioning is the most expensive ClickHouse mistake we see at scale: too many parts, slow merges, and exhausted file handles. If your system.parts shows tens of thousands of active parts, partitioning — not indexing — is your bottleneck.
How to prove an index change actually worked
Every strategy in this guide ends with the same instruction: measure. ClickHouse makes this straightforward, and refusing to guess is what separates engineering from folklore. The authoritative tool is EXPLAIN indexes = 1, which shows exactly which indexes were used and how many granules survived each pruning step. A successful index change reduces the granules selected; if the number does not fall, the index did nothing useful regardless of how reasonable it looked.
Complement the plan with the system tables. system.query_log records rows read and bytes read per query, which is the ground truth of how much data a query touched, and system.parts reveals part counts and sizes that expose partitioning problems. The EXPLAIN documentation describes the available modes. Capture these numbers before and after every change, and you will build an evidence trail that turns ClickHouse tuning into a repeatable engineering process rather than a sequence of hopeful edits.
Make the measurement a habit, not an afterthought. The single most useful metric is rows read per query relative to table size: a well-indexed query reads a tiny fraction of the table, and a query reading most of the table is telling you the sorting key, skip indexes, or partitioning are not serving it. Track that ratio for your most important queries over time, and regressions become visible the moment a schema change, a data-distribution shift, or a new access pattern erodes the design. This is precisely the telemetry our engineers baseline at the start of an engagement and re-measure at the end, so that every recommendation is backed by a number rather than an opinion.
-- See which indexes ran and how many granules survived
EXPLAIN indexes = 1
SELECT count() FROM events
WHERE tenant_id = 42 AND event_type = 'purchase';
-- Ground truth: rows and bytes actually read
SELECT query_duration_ms, read_rows, formatReadableSize(read_bytes)
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY event_time DESC LIMIT 10;
The indexing strategy in one view
Eight strategies, one principle: indexing in ClickHouse means engineering the cluster to read less data.
- The sorting key is the index. Choose
ORDER BYfrom real query patterns; everything else builds on it. - Order columns by the prefix rule. Low-cardinality, frequently filtered columns lead; high-cardinality columns rarely belong first.
- Add skip indexes only where values correlate with row order. Match the type — minmax, set, bloom, ngram, or token — to the filter.
- Use projections and materialized views to serve conflicting access patterns and to pre-aggregate the queries that run most often.
- Read less per granule with narrow types, LowCardinality, and codecs that exploit your ordering.
- Partition coarsely for pruning and lifecycle — never per day per tenant — and prove every change with
EXPLAIN indexes = 1.
Frequently asked questions
Does ClickHouse support traditional indexes like B-trees?
No. ClickHouse has no per-row B-tree indexes. Performance comes from the table’s sorting key, which defines physical row order and underpins a sparse primary index, supplemented by data-skipping indexes, projections, and materialized views. Engineers arriving from row stores should think in terms of “how do I make this query read fewer granules” rather than “which column do I index.”
How do I choose the ORDER BY key for a ClickHouse table?
Rank columns by how your queries filter, place the most frequently used equality predicates earliest, and among those lead with lower-cardinality columns. The primary index is usable only for a left-to-right prefix of the sorting key, so the order directly determines which queries it accelerates. Validate against your real query log, and engage a ClickHouse consulting review for high-stakes schemas.
When should I use a data-skipping index versus a projection?
Use a data-skipping index when a query filters on a non-prefix column whose values are correlated with row order — it is lightweight and prunes blocks cheaply. Use a projection when an access pattern needs a fundamentally different ordering or a pre-aggregated result, effectively giving the table a second physical layout. Projections cost more storage and write-time work, so reserve them for specific high-value patterns.
Why is my ClickHouse query still slow after adding an index?
The most common reasons are that the indexed column’s values are not correlated with physical row order, that the filter targets a non-prefix column the primary index cannot use, or that the table is over-partitioned into too many small parts. Run EXPLAIN indexes = 1 and inspect system.query_log to see whether granules and rows read actually fell; if they did not, the index is not the lever.
Can over-partitioning hurt ClickHouse performance?
Yes, severely. Partitioning by a high-cardinality combination such as day-and-tenant creates tens of thousands of small parts, which slows merges, inflates metadata, and degrades both inserts and queries. Partition coarsely — typically a monthly time bucket aligned with how you expire data — and let the sorting key and skip indexes do the fine-grained work within each partition.
When should we bring in a specialist for ClickHouse optimization?
Engage a specialist when query latency or cost targets are at risk, when a cluster has grown faster than its schema design, or when an estate is large enough that a structured audit will pay for itself. ChistaDATA provides ClickHouse performance audits, ongoing ClickHouse support, and managed services for teams that want a senior bench without the in-house headcount.
ChistaDATA Engineering Team
ChistaDATA Inc. is a specialist ClickHouse infrastructure operations firm delivering consulting, performance engineering, migration, and 24×7 support on 100% open-source ClickHouse with zero vendor lock-in. Read more on the ChistaDATA blog or explore ClickHouse consulting.
Is your ClickHouse cluster reading far more data than it should?
ChistaDATA engineers run structured ClickHouse performance audits that re-engineer sorting keys, skip indexes, and projections to cut the data your queries read — verified against your production telemetry. A thirty-minute conversation is enough to scope the work.
ClickHouse is a registered trademark of ClickHouse, Inc. ChistaDATA is not affiliated with, endorsed by, or sponsored by ClickHouse, Inc. All other trademarks are the property of their respective owners. Copyright © 2010–2026. All Rights Reserved by ChistaDATA®.