Designing Multi-Region ClickHouse Deployments for Global Scale: 3 Best Patterns

ClickHouse multi-region deployment is one of the most architecturally demanding problems in distributed analytics. When your users span three continents, a single-region cluster becomes a latency ceiling, a compliance liability, and a single point of failure.

This guide provides a comprehensive, production-tested blueprint for designing multi-region ClickHouse deployments that deliver global scale, low query latency, and strong fault tolerance — without sacrificing operational simplicity.

Whether you are building a real-time analytics platform for a global SaaS product, a financial reporting system with strict data residency requirements, or a multi-cloud observability pipeline, understanding how to architect ClickHouse across regions is a foundational engineering skill.

multi-region ClickHouse deployment architecture diagram

Why Multi-Region ClickHouse Deployments Are Necessary

A single-region ClickHouse deployment can serve hundreds of billions of rows efficiently, but it introduces three categories of problems as your user base globalizes.

Query Latency Across Geographies is the most immediately visible issue. A ClickHouse cluster in US-East will add 150–300ms of network round-trip time for users in Southeast Asia or Europe. For interactive analytics dashboards where sub-second response times define the product experience, this is unacceptable.

Data Residency and Compliance Requirements are increasingly non-negotiable. GDPR requires that personal data of EU citizens be stored within the EU. PDPA in Thailand, LGPD in Brazil, and equivalent regulations in dozens of other jurisdictions impose similar constraints. A multi-region ClickHouse architecture lets you route and store data in the correct geographic boundary without maintaining entirely separate data stacks.

High Availability Across Availability Zones and Regions protects against both zonal outages and full regional failures. A well-designed multi-region ClickHouse deployment can survive the complete loss of an entire cloud region without data loss or service interruption.

Core Architectural Patterns for Multi-Region ClickHouse

There are three primary architectural patterns for multi-region ClickHouse deployments. Each addresses a different tradeoff between consistency, latency, and operational complexity.

Pattern 1: Active-Passive Replication (Read-Local, Write-Global)

In the active-passive pattern, all writes flow to a single primary region. ClickHouse’s built-in replication mechanism replicates data to one or more secondary regions. Reads can be served from the nearest region, provided your application tolerates slightly stale data.

This pattern is the simplest to operate and is appropriate when write volume is moderate and reads dominate your workload. It is also the most appropriate pattern for compliance-driven architectures where the primary region must be the authoritative source of record.

<remote_servers>
    <global_analytics_cluster>
        <shard>
            <weight>1</weight>
            <internal_replication>true</internal_replication>
            <!-- US-East Primary Region -->
            <replica>
                <host>ch-us-east-01.internal</host>
                <port>9000</port>
                <priority>1</priority>
            </replica>
            <replica>
                <host>ch-us-east-02.internal</host>
                <port>9000</port>
                <priority>1</priority>
            </replica>
            <!-- EU-West Secondary Region -->
            <replica>
                <host>ch-eu-west-01.internal</host>
                <port>9000</port>
                <priority>2</priority>
            </replica>
            <!-- AP-Southeast Secondary Region -->
            <replica>
                <host>ch-ap-southeast-01.internal</host>
                <port>9000</port>
                <priority>3</priority>
            </replica>
        </shard>
    </global_analytics_cluster>
</remote_servers>

In this configuration, replicas with priority=1 are preferred for reads. ClickHouse will fall back to higher-numbered priorities only when lower-priority replicas are unavailable. Cross-region replicas are assigned higher priority numbers to ensure that local reads always use the nearest healthy replica.

Pattern 2: Active-Active Multi-Master with ZooKeeper/ClickHouse Keeper

The active-active pattern allows writes in any region and uses ClickHouse Keeper (or Apache ZooKeeper) to coordinate replication metadata across regions. This delivers both write locality and read locality at the cost of higher operational complexity and stricter requirements on inter-region network latency.

For active-active to function correctly, ClickHouse Keeper must maintain quorum across regions. A three-region deployment requires at least one Keeper node per region, with an odd total count to guarantee majority quorum even when one region is partitioned.

<keeper_server>
    <tcp_port>9181</tcp_port>
    <server_id>1</server_id>
    <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
    <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>

    <coordination_settings>
        <operation_timeout_ms>10000</operation_timeout_ms>
        <session_timeout_ms>30000</session_timeout_ms>
        <raft_logs_level>warning</raft_logs_level>
    </coordination_settings>

    <raft_configuration>
        <!-- US-East Keeper Nodes -->
        <server>
            <id>1</id>
            <hostname>keeper-us-east-01.internal</hostname>
            <port>9444</port>
        </server>
        <server>
            <id>2</id>
            <hostname>keeper-us-east-02.internal</hostname>
            <port>9444</port>
        </server>
        <!-- EU-West Keeper Node -->
        <server>
            <id>3</id>
            <hostname>keeper-eu-west-01.internal</hostname>
            <port>9444</port>
        </server>
        <!-- AP-Southeast Keeper Node -->
        <server>
            <id>4</id>
            <hostname>keeper-ap-southeast-01.internal</hostname>
            <port>9444</port>
        </server>
        <!-- Tie-breaker in US-West -->
        <server>
            <id>5</id>
            <hostname>keeper-us-west-01.internal</hostname>
            <port>9444</port>
        </server>
    </raft_configuration>
</keeper_server>

Pattern 3: Federated Query with Region-Local Storage

In the federated query pattern, each region maintains a fully independent ClickHouse cluster with its own local data. A global query layer — typically using ClickHouse’s Distributed table engine — fans queries out to all regional clusters and merges results at the query coordinator node. This pattern delivers the strongest data locality and compliance guarantees because data never leaves its home region at rest.

-- Local table in each region (identical schema in all regions)
CREATE TABLE events_local ON CLUSTER global_analytics_cluster
(
    event_id     UUID,
    user_id      UInt64,
    region       LowCardinality(String),
    event_type   LowCardinality(String),
    properties   String,
    created_at   DateTime64(3, 'UTC')
)
ENGINE = ReplicatedMergeTree(
    '/clickhouse/tables/{shard}/events_local',
    '{replica}'
)
PARTITION BY toYYYYMM(created_at)
ORDER BY (region, user_id, created_at)
TTL created_at + INTERVAL 365 DAY;

-- Global distributed table for cross-region fan-out queries
CREATE TABLE events_global ON CLUSTER global_analytics_cluster
(
    event_id     UUID,
    user_id      UInt64,
    region       LowCardinality(String),
    event_type   LowCardinality(String),
    properties   String,
    created_at   DateTime64(3, 'UTC')
)
ENGINE = Distributed(
    'global_analytics_cluster',
    'default',
    'events_local',
    cityHash64(user_id)
);

When a query is issued against events_global, ClickHouse fans it out to all regional shards in global_analytics_cluster, executes the query locally on each shard in parallel, and returns merged results to the coordinator. This allows global aggregations while keeping data physically co-located with the users who generated it.

ClickHouse Replication Across Regions: Key Considerations

ClickHouse replication was originally designed for low-latency intra-datacenter use cases. ClickHouse’s official replication documentation covers the underlying mechanics in depth. Extending it across regions introduces several challenges that require careful configuration.

Replication Lag and Network Bandwidth

Cross-region replication in ClickHouse transfers data parts between replicas over TCP. In a typical multi-region deployment, you should expect replication lag of 5–60 seconds depending on part size, network bandwidth, and write throughput. For workloads that insert billions of rows per day, cross-region replication can easily saturate a 1Gbps inter-region link.

Bandwidth planning is therefore a critical step before deploying multi-region ClickHouse at scale.

To manage bandwidth consumption, configure ClickHouse’s replication throttle settings:

<merge_tree>
    <!-- Max bytes per second for replication data transfer -->
    <max_replicated_fetches_network_bandwidth_for_server>104857600</max_replicated_fetches_network_bandwidth_for_server>

    <!-- Max bytes per second for replication sends -->
    <max_replicated_sends_network_bandwidth_for_server>104857600</max_replicated_sends_network_bandwidth_for_server>

    <!-- Time in seconds to wait for a replica to sync before declaring it stale -->
    <replication_alter_partitions_sync>2</replication_alter_partitions_sync>

    <!-- Number of threads for background replication -->
    <background_fetches_pool_size>8</background_fetches_pool_size>
</merge_tree>

ClickHouse Keeper Quorum Latency

Every DDL operation and every INSERT into a ReplicatedMergeTree table requires at least one round-trip to ClickHouse Keeper to log the operation in the replication queue.

In a single-region deployment, this round-trip is sub-millisecond. In a multi-region deployment, if your Keeper leader is in US-East and your inserts are being processed in EU-West, every insert incurs ~100ms of Keeper latency before the data is committed.

The solution is to co-locate the Keeper leader with the region that handles the highest write volume, and to use quorum_inserts selectively rather than globally:

-- Enable quorum inserts only for critical tables that require cross-region durability
SET insert_quorum = 2;
SET insert_quorum_timeout = 60000; -- 60 seconds for cross-region replication

INSERT INTO events_local
SELECT
    generateUUIDv4()      AS event_id,
    user_id               AS user_id,
    'eu-west'             AS region,
    event_type            AS event_type,
    JSONExtractRaw(raw)   AS properties,
    now64(3)              AS created_at
FROM incoming_events_eu
WHERE created_at >= now() - INTERVAL 5 MINUTE;

Data Sharding Strategies for Multi-Region Scale

Multi-region ClickHouse deployments must address data sharding at two independent levels: intra-region sharding (distributing data across nodes within a region) and inter-region partitioning (determining which region owns which data).

Geo-Based Partitioning

The simplest inter-region strategy is geo-based partitioning: data generated in a region is stored in that region. This delivers maximum data locality for both writes and reads, minimizes cross-region network costs, and satisfies most data residency requirements.

CREATE TABLE user_events
(
    event_id     UUID          DEFAULT generateUUIDv4(),
    user_id      UInt64,
    home_region  LowCardinality(String),
    event_type   LowCardinality(String),
    payload      String,
    ts           DateTime64(3, 'UTC')
)
ENGINE = ReplicatedMergeTree(
    '/clickhouse/tables/user_events/{shard}',
    '{replica}'
)
PARTITION BY (home_region, toYYYYMM(ts))
ORDER BY (user_id, ts)
SETTINGS index_granularity = 8192;

-- Materialized view to route events to the correct region's table
CREATE MATERIALIZED VIEW mv_eu_events
TO user_events
AS SELECT *
FROM incoming_raw_events
WHERE home_region IN ('eu-west', 'eu-central', 'eu-north');

Consistent Hash-Based Sharding for Cross-Region Tables

When your analytics workload requires global aggregations across all regions — such as computing a global daily active users count — you need a sharding strategy that distributes data evenly across all regional nodes while still allowing efficient per-region reads.

-- Create the distributed overlay for global aggregations
CREATE TABLE global_metrics ON CLUSTER global_analytics_cluster
(
    metric_name  LowCardinality(String),
    region       LowCardinality(String),
    value        Float64,
    labels       Map(String, String),
    ts           DateTime64(3, 'UTC')
)
ENGINE = Distributed(
    'global_analytics_cluster',
    'default',
    'global_metrics_local',
    -- Route by metric_name to keep time-series data co-located per node
    cityHash64(metric_name)
)
SETTINGS
    -- Allow reads from stale replicas to reduce cross-region read latency
    max_replica_delay_for_distributed_queries = 30,
    skip_unavailable_shards = 1;

Monitoring Multi-Region ClickHouse Deployments

Observability in a distributed ClickHouse cluster across multiple regions requires monitoring both cluster-wide health and per-region replication state.

You may also want to review our post on ClickHouse performance pitfalls to avoid common monitoring gaps. both the cluster-wide health and the per-region replication state. The following queries are essential for a multi-region operations runbook.

Checking Cross-Region Replication Lag

-- Monitor replication queue depth and lag across all replicas
SELECT
    database,
    table,
    replica_name,
    replica_path,
    is_leader,
    is_readonly,
    absolute_delay          AS lag_seconds,
    queue_size              AS pending_ops,
    inserts_in_queue        AS pending_inserts,
    merges_in_queue         AS pending_merges,
    log_max_index           AS log_max,
    log_pointer             AS log_current,
    (log_max_index - log_pointer) AS log_gap
FROM system.replicas
WHERE absolute_delay > 10
   OR queue_size > 100
ORDER BY absolute_delay DESC, queue_size DESC;

Identifying Slow Cross-Region Queries

-- Identify queries with high cross-region network transfer time
SELECT
    query_id,
    user,
    initial_address,
    query_duration_ms,
    read_rows,
    read_bytes,
    result_rows,
    result_bytes,
    -- Network time is the difference between total duration and local compute
    (query_duration_ms - ProfileEvents['NetworkReceiveElapsedMicroseconds'] / 1000)
        AS estimated_local_compute_ms,
    ProfileEvents['NetworkReceiveElapsedMicroseconds'] / 1000
        AS network_receive_ms,
    ProfileEvents['NetworkSendElapsedMicroseconds'] / 1000
        AS network_send_ms,
    query
FROM system.query_log
WHERE type = 'QueryFinish'
  AND is_initial_query = 1
  AND query_duration_ms > 1000
  AND ProfileEvents['NetworkReceiveElapsedMicroseconds'] > 100000
ORDER BY query_duration_ms DESC
LIMIT 20;

ClickHouse High Availability and Disaster Recovery in Multi-Region Deployments

A production-grade multi-region ClickHouse deployment must have a tested failover plan. See also our guide on ClickHouse Disaster Recovery Drills for a comprehensive runbook approach. The goal is to define your Recovery Time Objective (RTO) and Recovery Point Objective (RPO) before a failure occurs, not during one.

Automatic ClickHouse Failover for Replica Management

ClickHouse’s Distributed table engine handles replica failover automatically based on the load_balancing setting and replica health checks. For multi-region deployments, the recommended settings are:

<!-- User-level settings for distributed query failover -->
<profiles>
    <default>
        <!-- Prefer nearest replica, fall back on errors -->
        <load_balancing>nearest_hostname</load_balancing>

        <!-- Seconds to consider a replica stale -->
        <max_replica_delay_for_distributed_queries>300</max_replica_delay_for_distributed_queries>

        <!-- Fall back to stale replica if no fresh ones available -->
        <fallback_to_stale_replicas_for_distributed_queries>1</fallback_to_stale_replicas_for_distributed_queries>

        <!-- Skip unavailable shards rather than returning an error -->
        <skip_unavailable_shards>1</skip_unavailable_shards>

        <!-- Max connection attempts per replica before marking unhealthy -->
        <connections_with_failover_max_tries>3</connections_with_failover_max_tries>
    </default>
</profiles>

Cross-Region Backup Strategy

For disaster recovery, ClickHouse’s native backup feature can write backups directly to object storage (Amazon S3, Google Cloud Storage, Azure Blob Storage) in multiple regions. A typical multi-region backup policy runs incremental backups every hour to local region object storage, and full backups every 24 hours to a secondary region for cross-region durability.

-- Incremental backup to primary region S3 (runs every hour via cron/Airflow)
BACKUP DATABASE default
TO S3('https://s3.us-east-1.amazonaws.com/ch-backups-us-east/incremental/{backup_id}',
       'ACCESS_KEY_ID', 'SECRET_ACCESS_KEY')
SETTINGS
    incremental_base = S3('https://s3.us-east-1.amazonaws.com/ch-backups-us-east/base/',
                          'ACCESS_KEY_ID', 'SECRET_ACCESS_KEY'),
    compression_method = 'lz4',
    compression_level = 1;

-- Full backup to secondary region S3 for cross-region DR (runs every 24 hours)
BACKUP DATABASE default
TO S3('https://s3.eu-west-1.amazonaws.com/ch-backups-eu-west/full/{backup_id}',
       'ACCESS_KEY_ID', 'SECRET_ACCESS_KEY')
SETTINGS
    compression_method = 'zstd',
    compression_level = 3,
    deduplicate_files = 1;

Performance Optimization for Multi-Region Queries

Cross-region queries introduce network latency that cannot be eliminated — only minimized. These optimization strategies reduce the volume of data that must traverse region boundaries.

Precompute Regional Aggregates with Materialized Views

Instead of fanning out raw-data queries to all regions, pre-aggregate at the regional level and replicate only the aggregated results to a global summary table. This can reduce cross-region data transfer by orders of magnitude for common analytical queries.

-- Regional hourly aggregation MV (runs locally in each region)
CREATE MATERIALIZED VIEW mv_hourly_metrics_regional
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (region, metric_name, hour)
POPULATE
AS
SELECT
    region,
    metric_name,
    toStartOfHour(ts)   AS hour,
    sum(value)          AS total_value,
    count()             AS sample_count,
    min(value)          AS min_value,
    max(value)          AS max_value,
    quantileState(0.95)(value) AS p95_state
FROM global_metrics_local
GROUP BY region, metric_name, hour;

-- Global Distributed view over the regional aggregates
-- Cross-region transfer is now tiny: only hourly summaries move, not raw events
CREATE TABLE global_hourly_metrics ON CLUSTER global_analytics_cluster
ENGINE = Distributed(
    'global_analytics_cluster',
    'default',
    'mv_hourly_metrics_regional',
    cityHash64(metric_name)
);

Using ClickHouse Query Settings to Minimize Cross-Region Fan-out

-- Route queries to a specific region's shards to avoid unnecessary cross-region traffic
-- Use this pattern when the query is scoped to a single region's data
SELECT
    toStartOfDay(ts)    AS day,
    event_type,
    count()             AS event_count,
    uniq(user_id)       AS unique_users
FROM events_global
WHERE region = 'eu-west'         -- Partition pruning: only EU shards are queried
  AND ts BETWEEN '2026-01-01' AND '2026-01-31'
GROUP BY day, event_type
ORDER BY day, event_count DESC
SETTINGS
    -- Restrict to EU shards by using a consistent virtual shard key
    prefer_localhost_replica = 1,
    -- Timeout for waiting on cross-region shard responses
    distributed_connections_pool_size = 16,
    receive_timeout = 300;

Security and Network Architecture for Multi-Region ClickHouse

Multi-region ClickHouse deployments introduce a larger network perimeter than single-region deployments. The following security controls are mandatory for production environments.

All inter-region communication between ClickHouse nodes should be encrypted with TLS. ClickHouse’s SSL/TLS documentation provides the full reference for these settings.

ClickHouse supports native TLS for both the HTTP interface and the TCP (native) interface. Configure inter-server TLS with mutual authentication to prevent unauthorized nodes from joining the cluster:

<interserver_http_host>ch-node-01.internal</interserver_http_host>
<interserver_http_port>9009</interserver_http_port>

<!-- Enable TLS for all inter-server (replication) communication -->
<interserver_https_port>9010</interserver_https_port>

<openSSL>
    <server>
        <certificateFile>/etc/clickhouse-server/certs/server.crt</certificateFile>
        <privateKeyFile>/etc/clickhouse-server/certs/server.key</privateKeyFile>
        <caConfig>/etc/clickhouse-server/certs/ca.crt</caConfig>
        <verificationMode>strict</verificationMode>
        <loadDefaultCAFile>false</loadDefaultCAFile>
        <cacheSessions>true</cacheSessions>
        <disableProtocols>sslv2,sslv3,tlsv1,tlsv1_1</disableProtocols>
        <preferServerCiphers>true</preferServerCiphers>
    </server>
    <client>
        <certificateFile>/etc/clickhouse-server/certs/client.crt</certificateFile>
        <privateKeyFile>/etc/clickhouse-server/certs/client.key</privateKeyFile>
        <caConfig>/etc/clickhouse-server/certs/ca.crt</caConfig>
        <verificationMode>strict</verificationMode>
    </client>
</openSSL>

Choosing the Right Multi-Region Architecture for Your Use Case

The right multi-region ClickHouse architecture depends on four factors: your write volume, your read latency requirements, your data residency obligations, and your operational maturity. Before designing your multi-region setup, make sure you’ve reviewed our guide to scaling ClickHouse from gigabytes to petabytes to understand the single-region scaling fundamentals first.

For teams building their first multi-region ClickHouse deployment, the active-passive replication pattern is the recommended starting point. It is the simplest to operate, requires no changes to application write logic, and provides strong read-local performance with acceptable replication lag for most analytical use cases.

Teams with strict write latency requirements across multiple geographies should evaluate the active-active multi-master pattern, accepting the operational overhead of managing a distributed Keeper ensemble and resolving the rare cases of replication divergence.

Organizations with the strongest data residency requirements — particularly those operating under GDPR, HIPAA, or financial regulations that prohibit cross-border data movement — should implement the federated query pattern, accepting that some global aggregations will require cross-region network transfers at query time rather than at ingest time.

Conclusion: Building ClickHouse Global Scale Infrastructure

The ClickHouse Distributed table engine is central to all three patterns discussed in this guide. Achieving ClickHouse global scale requires designing your deployment across regions from day one. Designing multi-region ClickHouse deployments for global scale requires making deliberate tradeoffs between write consistency, read latency, data residency, and operational complexity. There is no single architecture that is optimal for all workloads — but with the three patterns covered in this guide (active-passive replication, active-active multi-master, and federated query), you have the full toolkit to design a ClickHouse global deployment that meets your specific requirements.

The configuration examples, SQL definitions, and monitoring queries in this guide are production-ready starting points. As your deployment scales, revisit your inter-region network topology, your ClickHouse Keeper placement, and your materialized view pre-aggregation strategy — these three areas consistently determine whether a multi-region ClickHouse deployment delivers the global performance and reliability your users expect.

For expert guidance on designing and operating multi-region ClickHouse deployments, contact the ChistaDATA team. Our engineers have designed ClickHouse clusters serving trillions of rows across multiple continents and can help you build the right architecture for your scale.

About ChistaDATA Inc. 227 Articles
We are an full-stack ClickHouse infrastructure operations Consulting, Support and Managed Services provider with core expertise in performance, scalability and data SRE. Based out of California, Our consulting and support engineering team operates out of San Francisco, Vancouver, London, Germany, Russia, Ukraine, Australia, Singapore and India to deliver 24*7 enterprise-class consultative support and managed services. We operate very closely with some of the largest and planet-scale internet properties like PayPal, Garmin, Honda cars IoT project, Viacom, National Geographic, Nike, Morgan Stanley, American Express Travel, VISA, Netflix, PRADA, Blue Dart, Carlsberg, Sony, Unilever etc