ClickHouse sharding is one of the most consequential architectural decisions you will make when scaling an analytical database from millions to billions of rows per day. In a high-growth environment — where query volumes double every quarter and new data streams come online without warning — a poorly chosen sharding strategy will become a ceiling on your entire analytics platform.
This guide covers every major ClickHouse sharding strategy in depth, with real configuration examples, trade-off analysis, and operational guidance for teams running distributed ClickHouse clusters at scale. Whether you are choosing a ClickHouse sharding key and ClickHouse sharding topology for the first time or evaluating whether to re-shard an existing production cluster, you will find concrete answers here.
What Is Sharding in ClickHouse?
Sharding in ClickHouse means dividing data across multiple independent server nodes so that each node stores and processes only a subset of the total dataset. Unlike replication — which copies the same data to multiple nodes for fault tolerance — sharding distributes distinct data partitions horizontally, allowing the cluster to store and query far more data than any single node could handle alone.
ClickHouse implements sharding through the Distributed table engine. When you write to a Distributed table, ClickHouse routes each inserted row to one of the configured shards according to a sharding key expression. When you query a Distributed table, ClickHouse fans the query out to all relevant shards, collects partial results, and merges them at the initiating node before returning the final result set to the client.
The cluster topology is defined in the config.xml file (or its include files), where you declare shards and replicas inside a <remote_servers> block. Each shard can contain one or more replica nodes for fault tolerance. Understanding this topology is prerequisite knowledge before evaluating any sharding strategy.
Why ClickHouse Sharding Strategy Matters in High-Growth Environments
High-growth environments expose sharding defects faster than steady-state workloads. When data volume doubles every six months, a ClickHouse sharding key that produces acceptably uniform distribution today may create a 10x hotspot in two years.
When new product lines add new tenant identifiers, a hash-based ClickHouse sharding strategy that was balanced yesterday may skew dramatically tomorrow. The cost of re-sharding a production cluster — stopping ingestion, copying petabytes of data, repointing application connections — is high enough that getting the sharding strategy right on the first design pass is worth significant upfront investment.
The four primary dimensions along which sharding strategies differ are: data distribution uniformity, query locality (whether a query can be answered by a single shard or must scatter to all shards), operational simplicity, and elasticity (how easily the cluster expands when you add nodes).
Defining the ClickHouse Cluster Topology
Before choosing a sharding key, you must define your cluster in config.xml. The following example configures a three-shard cluster where each shard has two replicas, giving you both horizontal scale-out and fault tolerance.
<remote_servers>
<analytics_cluster>
<shard>
<weight>1</weight>
<internal_replication>true</internal_replication>
<replica>
<host>ch-node-01</host>
<port>9000</port>
</replica>
<replica>
<host>ch-node-02</host>
<port>9000</port>
</replica>
</shard>
<shard>
<weight>1</weight>
<internal_replication>true</internal_replication>
<replica>
<host>ch-node-03</host>
<port>9000</port>
</replica>
<replica>
<host>ch-node-04</host>
<port>9000</port>
</replica>
</shard>
<shard>
<weight>1</weight>
<internal_replication>true</internal_replication>
<replica>
<host>ch-node-05</host>
<port>9000</port>
</replica>
<replica>
<host>ch-node-06</host>
<port>9000</port>
</replica>
</shard>
</analytics_cluster>
</remote_servers>
The internal_replication flag tells ClickHouse to replicate inserts between shard replicas automatically using ReplicatedMergeTree, rather than writing to all replicas from the Distributed layer. This is the recommended setting for production clusters because it reduces write amplification on the client side and provides stronger consistency guarantees.
ClickHouse Sharding Strategy 1: Random Sharding (rand())
The simplest possible ClickHouse sharding key is rand(). Every inserted row is assigned a random hash value, routing it uniformly across all shards.
-- Local table on each shard (MergeTree family)
CREATE TABLE events_local ON CLUSTER analytics_cluster
(
event_id UUID,
tenant_id UInt32,
event_type LowCardinality(String),
event_ts DateTime,
payload String
)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/{shard}/events_local',
'{replica}'
)
PARTITION BY toYYYYMM(event_ts)
ORDER BY (tenant_id, event_ts);
-- Distributed table with random sharding
CREATE TABLE events ON CLUSTER analytics_cluster
AS events_local
ENGINE = Distributed(
analytics_cluster,
currentDatabase(),
events_local,
rand()
);
Random ClickHouse sharding produces near-perfect data distribution and is operationally trivial. The critical downside is that every query must scatter to all shards, because ClickHouse has no way to know which shard holds any particular row. For workloads where the primary access pattern is full-table aggregations — such as computing global event counts or percentile distributions across an entire dataset — random sharding is perfectly appropriate. For workloads that frequently filter by a high-cardinality column such as tenant_id, random sharding is wasteful: the full scatter cost is paid on every query regardless of how selective the filter is.
ClickHouse Sharding Strategy 2: Hash-Based Sharding on a Business Key
Hash-based sharding uses a deterministic hash of a business key column (or combination of columns) to route rows to shards. All rows sharing the same key value are guaranteed to land on the same shard, enabling shard-local query execution for queries that filter on the sharding key.
-- Distribute by tenant_id so all rows for a tenant are co-located
CREATE TABLE events ON CLUSTER analytics_cluster
AS events_local
ENGINE = Distributed(
analytics_cluster,
currentDatabase(),
events_local,
intHash64(tenant_id)
);
-- Verify distribution across shards after ingestion
SELECT
hostName() AS node,
count() AS row_count,
formatReadableSize(sum(data_compressed_bytes)) AS compressed_size
FROM clusterAllReplicas('analytics_cluster', system.parts)
WHERE table = 'events_local'
AND active = 1
GROUP BY node
ORDER BY node;
When a query includes WHERE tenant_id = 42, the Distributed engine computes intHash64(42), identifies which shard owns that value, and sends the query only to that shard. This dramatically reduces inter-node network traffic and CPU load on shards that hold no relevant data. Hash-based ClickHouse sharding on a tenant identifier is the most common ClickHouse sharding strategy for multi-tenant SaaS analytics platforms.
The risk with hash-based sharding is cardinality skew. If one tenant generates 60% of all events — a common scenario in SaaS platforms that have a few very large enterprise accounts — that tenant’s shard becomes a hotspot. Queries that touch the hot shard will always be slower than queries on cold shards, and ingestion throughput on the hot shard can become a bottleneck. The weight attribute in the cluster topology can partially mitigate this by allocating more disk and CPU to shards expected to receive higher volumes, but it does not eliminate the fundamental skew.
ClickHouse Sharding Strategy 3: Composite Key Sharding
Composite key sharding hashes two or more columns together to route rows. This is useful when a single column would produce skewed distribution but a combination of columns produces acceptable uniformity.
-- Hash on (tenant_id, event_type) to spread large tenants across shards
-- while still enabling partial shard pruning on tenant_id + event_type queries
CREATE TABLE events ON CLUSTER analytics_cluster
AS events_local
ENGINE = Distributed(
analytics_cluster,
currentDatabase(),
events_local,
cityHash64(tenant_id, event_type)
);
-- Example of a query that benefits from composite sharding:
-- ClickHouse can prune to a subset of shards when both columns are in WHERE
SELECT
toDate(event_ts) AS event_date,
count() AS total_events,
uniq(event_id) AS unique_events
FROM events
WHERE tenant_id = 101
AND event_type = 'page_view'
AND event_ts >= now() - INTERVAL 30 DAY
GROUP BY event_date
ORDER BY event_date;
Composite sharding spreads hot tenants more evenly because two tenants with the same tenant_id but different dominant event_type values will hash to different shards. The trade-off is that queries filtering only on tenant_id (without specifying event_type) cannot prune shards and must scatter. You must profile your query workload carefully before choosing a composite sharding key to confirm that the most frequent query patterns include all sharding key columns in their WHERE clause.
ClickHouse Sharding Strategy 4: Time-Based Sharding
Time-based sharding routes rows to shards based on a timestamp column, so that each shard holds a distinct time window of data. This strategy is a natural fit for time-series workloads where recent data is queried most frequently and old data can be archived or tiered to cheaper storage.
-- Time-based sharding: route to shard based on month of event_ts
-- Shard 1 = months where toMonth(event_ts) IN (1,2,3,4)
-- Shard 2 = months where toMonth(event_ts) IN (5,6,7,8)
-- Shard 3 = months where toMonth(event_ts) IN (9,10,11,12)
CREATE TABLE events ON CLUSTER analytics_cluster
AS events_local
ENGINE = Distributed(
analytics_cluster,
currentDatabase(),
events_local,
toMonth(event_ts) % 3
);
-- Validate that data lands on correct shards
SELECT
hostName() AS node,
min(event_ts) AS earliest,
max(event_ts) AS latest,
count() AS row_count
FROM clusterAllReplicas('analytics_cluster', currentDatabase(), events_local)
GROUP BY node
ORDER BY node;
Time-based sharding enables very efficient shard pruning for time-range queries: a query asking for the last 30 days of data may only need to touch one or two shards rather than all three. It also simplifies data lifecycle management — you can drop an entire shard to implement a data retention policy, or migrate the oldest shard to a tier of lower-cost hardware without touching active shards.
The major risk is write hotspotting. In a real-time streaming ingestion workload, all incoming events have timestamps near “now,” which means all writes land on the same shard — the one responsible for the current time window. This creates severe write-side imbalance. Time-based sharding is best suited for batch-loaded historical datasets or combined with a hash-based component to spread concurrent writes.
ClickHouse Sharding Strategy 5: Weighted Sharding for Heterogeneous Hardware
When your cluster nodes have different hardware specifications — perhaps because you added larger nodes during a hardware refresh cycle — you can use shard weights to route a proportionally larger fraction of writes to the more capable shards.
<remote_servers>
<analytics_cluster>
<!-- Legacy shard: 2 vCPU, 16 GB RAM, 1 TB SSD -->
<shard>
<weight>1</weight>
<replica>
<host>ch-node-01</host>
<port>9000</port>
</replica>
</shard>
<!-- New shard: 16 vCPU, 128 GB RAM, 8 TB NVMe -->
<shard>
<weight>8</weight>
<replica>
<host>ch-node-03</host>
<port>9000</port>
</replica>
</shard>
</analytics_cluster>
</remote_servers>
With this configuration, the Distributed engine routes writes to the new shard eight times more often than to the legacy shard, approximately matching the difference in their storage and compute capacity. Weighted sharding is a pragmatic bridge strategy during hardware transitions and cloud instance type migrations. Note that weights only affect write routing for the rand() and similar non-deterministic sharding keys — deterministic hash-based keys always send a given key value to the same shard regardless of weight.
ClickHouse Sharding Strategy 6: Per-Tenant Sharding (Virtual Shards)
In large multi-tenant deployments, some teams implement per-tenant sharding: each major customer gets their own dedicated shard or even their own dedicated cluster. This provides the strongest query isolation guarantees and makes capacity planning straightforward at the per-tenant level.
-- Create a dedicated cluster definition for tenant 42
-- In config.xml: <tenant_42_cluster> pointing to dedicated nodes
-- Local table on the tenant's dedicated shard
CREATE TABLE events_local ON CLUSTER tenant_42_cluster
(
event_id UUID,
tenant_id UInt32,
event_type LowCardinality(String),
event_ts DateTime,
payload String
)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/{shard}/tenant_42/events_local',
'{replica}'
)
PARTITION BY toYYYYMM(event_ts)
ORDER BY (event_type, event_ts);
-- Query the tenant's data on their dedicated cluster
SELECT
event_type,
count() AS event_count,
uniq(event_id) AS unique_events
FROM events_local -- Query directly on local table (no scatter)
WHERE event_ts >= now() - INTERVAL 7 DAY
GROUP BY event_type
ORDER BY event_count DESC
LIMIT 20;
Per-tenant sharding eliminates noisy-neighbor problems entirely and provides clear SLA boundaries. The operational overhead is significant: you maintain N cluster configurations, N sets of tables, and N ingestion pipelines. This strategy is practical for a small number of very large enterprise tenants, typically fewer than twenty, where the isolation guarantee justifies the operational complexity. For the long tail of smaller tenants, a standard hash-based sharding strategy on the shared cluster is more appropriate.
ClickHouse Sharding: Shard Rebalancing and Resharding
ClickHouse does not have a built-in automatic ClickHouse sharding or resharding mechanism. When you add new shards to an existing cluster, existing data does not automatically migrate to the new nodes. You must handle rebalancing manually or through orchestration tooling. The standard approach is to create a new Distributed table pointing to the expanded cluster topology, backfill data from the old shards to the new ones using INSERT INTO ... SELECT FROM, and then swap the application connection to the new table.
-- Step 1: Create new local tables on all nodes (including newly added shard)
CREATE TABLE events_local_v2 ON CLUSTER analytics_cluster_v2
(
event_id UUID,
tenant_id UInt32,
event_type LowCardinality(String),
event_ts DateTime,
payload String
)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/{shard}/events_local_v2',
'{replica}'
)
PARTITION BY toYYYYMM(event_ts)
ORDER BY (tenant_id, event_ts);
-- Step 2: Create new Distributed table using expanded cluster
CREATE TABLE events_v2 ON CLUSTER analytics_cluster_v2
AS events_local_v2
ENGINE = Distributed(
analytics_cluster_v2,
currentDatabase(),
events_local_v2,
intHash64(tenant_id)
);
-- Step 3: Backfill historical data partition by partition
-- (Run for each YYYYMM partition to control memory and IO)
INSERT INTO events_v2
SELECT event_id, tenant_id, event_type, event_ts, payload
FROM events
WHERE toYYYYMM(event_ts) = 202501;
-- Step 4: Verify row counts match before cutover
SELECT
'old' AS source, count() AS rows FROM events
WHERE toYYYYMM(event_ts) = 202501
UNION ALL
SELECT
'new' AS source, count() AS rows FROM events_v2
WHERE toYYYYMM(event_ts) = 202501;
This partition-by-partition backfill approach limits the memory pressure on the coordinator node that merges results across shards. Running the backfill during off-peak hours and monitoring disk IO on both old and new shards is essential to avoid impacting production query performance during the migration window.
Monitoring ClickHouse Sharding Balance and Detecting Hotspots
Maintaining healthy ClickHouse sharding balance in a high-growth environment requires continuous monitoring. The system.parts and system.disks tables on each node provide the raw metrics you need. The following query aggregates per-shard statistics across the entire cluster and is suitable for integration into a monitoring dashboard.
-- Cluster-wide shard balance report
SELECT
hostName() AS node,
table,
count() AS part_count,
sum(rows) AS total_rows,
formatReadableSize(sum(data_compressed_bytes)) AS compressed,
formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
round(sum(data_compressed_bytes) /
sum(sum(data_compressed_bytes)) OVER ()
* 100, 2) AS pct_of_total
FROM clusterAllReplicas('analytics_cluster', system.parts)
WHERE active = 1
AND database = currentDatabase()
GROUP BY node, table
ORDER BY table, total_rows DESC;
-- Detect ingestion hotspots: nodes receiving disproportionate write traffic
SELECT
hostName() AS node,
sum(written_rows) AS rows_written_last_hour
FROM clusterAllReplicas(
'analytics_cluster',
system.query_log
)
WHERE event_time >= now() - INTERVAL 1 HOUR
AND query_kind = 'Insert'
GROUP BY node
ORDER BY rows_written_last_hour DESC;
Alert on shard imbalance when the largest shard holds more than 150% of the data volume of the smallest shard. In high-growth environments where data patterns shift as new product features launch, running this check weekly and reviewing the results quarterly against your sharding key selection is a sound operational practice.
ClickHouse Sharding Coordination: ZooKeeper vs ClickHouse Keeper
ClickHouse sharding with replicated shards depends on ZooKeeper or ClickHouse Keeper for replication coordination. The official ClickHouse Keeper documentation provides complete configuration reference. As shard count grows, the metadata load on the coordination service grows proportionally. Each ReplicatedMergeTree table registers a ZooKeeper path, and every INSERT, MERGE, and ALTER operation generates coordination traffic. In clusters with more than five shards and hundreds of tables, ClickHouse Keeper is strongly preferred over ZooKeeper because it is purpose-built for ClickHouse’s access patterns and scales more gracefully under high metadata throughput.
<!-- config.xml: Use ClickHouse Keeper instead of ZooKeeper -->
<keeper_server>
<tcp_port>9181</tcp_port>
<server_id>1</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
<snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
<coordination_settings>
<operation_timeout_ms>10000</operation_timeout_ms>
<session_timeout_ms>30000</session_timeout_ms>
<raft_logs_level>warning</raft_logs_level>
</coordination_settings>
<raft_configuration>
<server>
<id>1</id>
<hostname>ch-keeper-01</hostname>
<port>9234</port>
</server>
<server>
<id>2</id>
<hostname>ch-keeper-02</hostname>
<port>9234</port>
</server>
<server>
<id>3</id>
<hostname>ch-keeper-03</hostname>
<port>9234</port>
</server>
</raft_configuration>
</keeper_server>
<!-- Point ClickHouse at the Keeper cluster -->
<zookeeper>
<node>
<host>ch-keeper-01</host>
<port>9181</port>
</node>
<node>
<host>ch-keeper-02</host>
<port>9181</port>
</node>
<node>
<host>ch-keeper-03</host>
<port>9181</port>
</node>
</zookeeper>
Choosing the Right ClickHouse Sharding Strategy: A Decision Framework
No single ClickHouse sharding strategy is universally optimal. The right choice depends on the dominant query pattern, the distribution characteristics of your data, and the operational maturity of your team. Use the following framework to guide your decision.
Choose random sharding (rand()) when your primary workload is full-table aggregations or analytical scans that must touch all data regardless of filters. Choose hash-based sharding on tenant_id when you operate a multi-tenant platform and the majority of queries filter by tenant.
Choose composite key sharding when a single-column hash creates unacceptable skew and your query patterns reliably include multiple high-cardinality columns in filters. Choose time-based sharding when your workload is a batch-loaded time series with strong temporal locality in queries and a clear data lifecycle requirement.
Choose weighted sharding when you are operating a heterogeneous cluster during a hardware transition. Choose per-tenant sharding only when you have a small number of very large tenants whose isolation and SLA requirements justify the operational overhead.
In high-growth environments, revisit your sharding strategy as part of your quarterly capacity review. As data volumes grow and query patterns evolve, the optimal strategy may shift. The cost of a proactive ClickHouse sharding operation — done during a planned maintenance window before you hit capacity — is always lower than an emergency resharding under production pressure.
Related ClickHouse Sharding Resources from ChistaDATA
For teams building on ClickHouse, the following resources from ChistaDATA provide additional depth on related architectural topics:
- Scaling ClickHouse from Gigabytes to Petabytes: A Practical Playbook — end-to-end guide to capacity planning and horizontal scaling
- ClickHouse Workload Isolation: 7 Techniques for Noisy-Neighbor Problems — resource governance for multi-tenant clusters
- ClickHouse Performance Pitfalls: 7 Mistakes That Slow Down Your Queries — query-level optimizations that complement sharding decisions
Conclusion
ClickHouse sharding strategies range from the operationally trivial (rand()) to the architecturally sophisticated (per-tenant dedicated clusters). In high-growth environments, the ClickHouse sharding key you choose today directly determines whether your cluster scales gracefully for the next three years or requires disruptive re-architecture after the next growth inflection.
Invest the time to understand your dominant query patterns, measure your data cardinality distribution, and validate your sharding choice against realistic write and read workloads before committing to a production topology. When in doubt, start with hash-based sharding on your primary tenant or entity identifier — it handles the widest range of workloads well and provides a clear migration path to more specialized ClickHouse sharding strategies as your requirements evolve.