Building Multi-Tenant ClickHouse Clusters: Advanced Resource Isolation and Performance Optimization Strategies



Introduction

Multi-tenancy in analytical databases presents unique challenges that go beyond traditional OLTP systems. ClickHouse, with its columnar architecture and distributed processing capabilities, offers powerful mechanisms for building robust multi-tenant clusters. This comprehensive guide explores advanced techniques for resource isolation, query prioritization, and tenant-specific optimizations in production ClickHouse environments.

Understanding Multi-Tenancy Challenges in ClickHouse

Resource Contention Patterns

Multi-tenant ClickHouse deployments face several critical resource contention issues:

  • CPU Competition: Vectorized query processing can monopolize CPU cores
  • Memory Pressure: Large analytical queries consuming available RAM
  • I/O Bottlenecks: Concurrent disk reads affecting query performance
  • Network Saturation: Distributed queries overwhelming cluster communication

Tenant Isolation Requirements

Effective multi-tenancy requires isolation across multiple dimensions:

  • Performance Isolation: Preventing tenant queries from affecting others
  • Data Isolation: Ensuring tenant data remains segregated
  • Resource Isolation: Guaranteeing minimum resource allocations
  • Security Isolation: Maintaining access control boundaries

Architecture Patterns for Multi-Tenant ClickHouse

1. Database-Level Isolation

The simplest approach uses separate databases per tenant:

-- Tenant database creation
CREATE DATABASE tenant_001 ON CLUSTER production_cluster;
CREATE DATABASE tenant_002 ON CLUSTER production_cluster;

-- Tenant-specific table creation
CREATE TABLE tenant_001.events ON CLUSTER production_cluster
(
    timestamp DateTime64(3),
    user_id UInt64,
    event_type String,
    properties Map(String, String)
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/tenant_001/events', '{replica}')
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id);
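
Database-per-tenant isolation pairs naturally with database-scoped grants. The sketch below is illustrative only; the user name and password are placeholders, not part of any standard setup:

-- Hypothetical application user for tenant_001, confined to its own database
CREATE USER tenant_001_app IDENTIFIED WITH sha256_password BY 'change_me';
GRANT SELECT, INSERT ON tenant_001.* TO tenant_001_app;
-- No grants exist on other tenant databases, so cross-tenant access is rejected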

2. Schema-Based Isolation with Row-Level Security

Advanced isolation using tenant_id columns with row-level security:

-- Multi-tenant table with tenant isolation
CREATE TABLE events_multi_tenant ON CLUSTER production_cluster
(
    tenant_id UInt32,
    timestamp DateTime64(3),
    user_id UInt64,
    event_type String,
    properties Map(String, String)
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_multi_tenant', '{replica}')
PARTITION BY (tenant_id, toYYYYMM(timestamp))
ORDER BY (tenant_id, timestamp, user_id);

-- Row-level security policy
-- (tenant_id is read from a per-session custom setting; a prefix such as
--  SQL_ must be allowed via custom_settings_prefixes in config.xml)
CREATE ROW POLICY tenant_filter ON events_multi_tenant 
FOR SELECT USING tenant_id = toUInt32(getSetting('SQL_tenant_id'))
TO tenant_role;
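
For the policy to filter anything, each tenant session must carry its tenant identifier as a custom setting. A minimal sketch, assuming the SQL_ prefix is enabled via custom_settings_prefixes:

-- Hypothetical session for tenant 1001
SET SQL_tenant_id = 1001;

-- The row policy transparently appends tenant_id = 1001 to this query
SELECT count()
FROM events_multi_tenant
WHERE timestamp >= today();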

3. Cluster-Level Isolation

Dedicated clusters for high-value tenants:

<!-- config.xml cluster configuration -->
<remote_servers>
    <tenant_premium_cluster>
        <shard>
            <replica>
                <host>premium-ch-01</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>premium-ch-02</host>
                <port>9000</port>
            </replica>
        </shard>
    </tenant_premium_cluster>

    <tenant_standard_cluster>
        <shard>
            <replica>
                <host>standard-ch-01</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>standard-ch-02</host>
                <port>9000</port>
            </replica>
        </shard>
    </tenant_standard_cluster>
</remote_servers>
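
Routing a tenant's queries to its dedicated cluster is then a matter of a Distributed table (or the cluster() table function for ad hoc access). A minimal sketch, assuming tenant_001 lives on the premium cluster and already has a local events table on each node:

-- Hypothetical Distributed wrapper that fans out to the premium cluster
CREATE TABLE tenant_001.events_dist AS tenant_001.events
ENGINE = Distributed(tenant_premium_cluster, tenant_001, events);

-- Ad hoc equivalent without creating a Distributed table
SELECT count()
FROM cluster(tenant_premium_cluster, tenant_001, events)
WHERE timestamp >= today();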

Advanced Resource Isolation Techniques

CPU Resource Management

Thread Pool Configuration

<!-- users.xml - Tenant-specific thread limits -->
<!-- priority: among non-zero values, lower runs first; 0 disables prioritization -->
<profiles>
    <tenant_premium>
        <max_threads>16</max_threads>
        <max_memory_usage>8000000000</max_memory_usage>
        <priority>1</priority>
    </tenant_premium>

    <tenant_standard>
        <max_threads>8</max_threads>
        <max_memory_usage>4000000000</max_memory_usage>
        <priority>2</priority>
    </tenant_standard>

    <tenant_basic>
        <max_threads>4</max_threads>
        <max_memory_usage>2000000000</max_memory_usage>
        <priority>3</priority>
    </tenant_basic>
</profiles>
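
The same tiers can also be managed through SQL-driven access control, which additionally lets you pin a limit with a MAX constraint so a tenant cannot raise it at query time. A minimal sketch mirroring the standard tier (the _sql suffix is only there to avoid clashing with the XML-defined profile):

-- SQL-managed counterpart of the standard-tier profile
CREATE SETTINGS PROFILE IF NOT EXISTS tenant_standard_sql
SETTINGS
    max_threads = 8 MAX 8,
    max_memory_usage = 4000000000 MAX 4000000000,
    priority = 2;

Either form can then be attached to tenant users with SETTINGS PROFILE, as shown later in the query prioritization section.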

CPU Affinity and NUMA Optimization

# CPU affinity for tenant isolation
numactl --cpunodebind=0 --membind=0 clickhouse-server --config-file=/etc/clickhouse-server/tenant-premium.xml
numactl --cpunodebind=1 --membind=1 clickhouse-server --config-file=/etc/clickhouse-server/tenant-standard.xml

Memory Management Strategies

Per-Query Memory Limits

-- Dynamic memory allocation based on tenant tier
-- (SET accepts only literal values, so the tier-to-limit mapping is applied
--  by the application or a settings profile before the session runs queries:
--  premium = 16000000000, standard = 8000000000, basic = 4000000000)
SET max_memory_usage = 8000000000;  -- e.g. a standard-tier session

-- Memory-aware query execution
SELECT 
    tenant_id,
    count(*) as event_count,
    uniqExact(user_id) as unique_users
FROM events_multi_tenant 
WHERE timestamp >= now() - INTERVAL 1 DAY
GROUP BY tenant_id
SETTINGS max_memory_usage = 2000000000;
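
Per-query limits alone do not stop a tenant from saturating memory with many mid-sized queries at once; max_memory_usage_for_user caps the tenant's aggregate usage across all of its concurrent queries. A sketch, assuming the per-tenant users created later in this guide:

-- Aggregate memory cap across all concurrent queries of one tenant user
ALTER USER tenant_standard_user SETTINGS max_memory_usage_for_user = 12000000000;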

MergeTree Insert and Part Limits

<!-- config.xml - MergeTree part limits and insert throttling for a tenant's instance -->
<merge_tree>
    <max_suspicious_broken_parts>5</max_suspicious_broken_parts>
    <parts_to_delay_insert>150</parts_to_delay_insert>
    <parts_to_throw_insert>300</parts_to_throw_insert>
    <max_delay_to_insert>1</max_delay_to_insert>
    <max_parts_in_total>100000</max_parts_in_total>
    <replicated_deduplication_window>100</replicated_deduplication_window>
    <replicated_deduplication_window_seconds>604800</replicated_deduplication_window_seconds>
</merge_tree>

I/O Resource Isolation

Storage Tiering by Tenant

<!-- config.xml - storage policies are defined in the server configuration,
     not via SQL; disk paths and move_factor values are illustrative -->
<storage_configuration>
    <disks>
        <premium_ssd><path>/mnt/premium_ssd/clickhouse/</path></premium_ssd>
        <premium_hdd><path>/mnt/premium_hdd/clickhouse/</path></premium_hdd>
        <standard_ssd><path>/mnt/standard_ssd/clickhouse/</path></standard_ssd>
        <standard_hdd><path>/mnt/standard_hdd/clickhouse/</path></standard_hdd>
    </disks>
    <policies>
        <tenant_premium_policy>
            <volumes>
                <hot><disk>premium_ssd</disk></hot>
                <cold><disk>premium_hdd</disk></cold>
            </volumes>
            <move_factor>0.2</move_factor>
        </tenant_premium_policy>
        <tenant_standard_policy>
            <volumes>
                <hot><disk>standard_ssd</disk></hot>
                <cold><disk>standard_hdd</disk></cold>
            </volumes>
            <move_factor>0.4</move_factor>
        </tenant_standard_policy>
    </policies>
</storage_configuration>

-- Apply storage policy to tenant tables
ALTER TABLE tenant_001.events 
MODIFY SETTING storage_policy = 'tenant_premium_policy';
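
With the tiered policy attached, a move TTL can demote older tenant data from the hot volume automatically; the 30-day threshold below is illustrative:

-- Keep roughly the last 30 days on the 'hot' volume, move older parts to 'cold'
ALTER TABLE tenant_001.events
MODIFY TTL timestamp + INTERVAL 30 DAY TO VOLUME 'cold';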

I/O Throttling Configuration

<!-- users.xml - network limits per tenant profile -->
<profiles>
    <tenant_premium>
        <max_network_bandwidth>1000000000</max_network_bandwidth>
        <max_network_bytes>10000000000</max_network_bytes>
        <priority>1</priority>
    </tenant_premium>

    <tenant_standard>
        <max_network_bandwidth>500000000</max_network_bandwidth>
        <max_network_bytes>5000000000</max_network_bytes>
        <priority>2</priority>
    </tenant_standard>
</profiles>
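
Bandwidth settings bound instantaneous usage; quotas bound cumulative consumption per time window, which is often how tenant plans are actually sold. A minimal sketch with illustrative limits (tenant_standard_user is created in the next section):

-- Hourly consumption cap for the standard-tier user
CREATE QUOTA IF NOT EXISTS tenant_standard_quota
FOR INTERVAL 1 hour MAX queries = 10000, read_bytes = 1000000000000
TO tenant_standard_user;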

Query Prioritization and Scheduling

Priority-Based Query Execution

-- Priority-aware query execution
CREATE USER tenant_premium_user 
IDENTIFIED BY 'secure_password'
SETTINGS PROFILE 'tenant_premium';

CREATE USER tenant_standard_user 
IDENTIFIED BY 'secure_password'
SETTINGS PROFILE 'tenant_standard';

-- Query with explicit priority (1 = highest)
SELECT count(*) FROM events_multi_tenant 
WHERE tenant_id = 1001
SETTINGS priority = 1, max_execution_time = 300;
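
Priorities only reorder queries that are already admitted; ClickHouse can also cap how many queries a tenant user runs concurrently, which complements the external scheduler shown next. A one-line sketch using the user defined above:

-- Server-side concurrency cap for the standard-tier user
ALTER USER tenant_standard_user SETTINGS max_concurrent_queries_for_user = 10;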

Queue Management Implementation

# Python implementation for query queue management
import asyncio
import heapq
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class QueryRequest:
    tenant_id: str
    query: str
    priority: int
    max_execution_time: int
    submitted_at: float

    def __lt__(self, other):
        return self.priority < other.priority

class ClickHouseQueryScheduler:
    def __init__(self, max_concurrent_queries: Dict[str, int]):
        self.max_concurrent = max_concurrent_queries
        self.running_queries: Dict[str, int] = {}
        self.query_queue: List[QueryRequest] = []
        self.lock = asyncio.Lock()

    async def submit_query(self, request: QueryRequest) -> Optional[str]:
        async with self.lock:
            tenant_running = self.running_queries.get(request.tenant_id, 0)
            tenant_limit = self.max_concurrent.get(request.tenant_id, 1)

            if tenant_running < tenant_limit:
                return await self._execute_query(request)
            else:
                heapq.heappush(self.query_queue, request)
                return None

    async def _execute_query(self, request: QueryRequest) -> str:
        self.running_queries[request.tenant_id] = \
            self.running_queries.get(request.tenant_id, 0) + 1

        try:
            # Execute query with ClickHouse client
            result = await self._run_clickhouse_query(request)
            return result
        finally:
            self.running_queries[request.tenant_id] -= 1
            await self._process_queue()

    async def _run_clickhouse_query(self, request: QueryRequest) -> str:
        # Placeholder: plug in a real ClickHouse client here (for example
        # clickhouse-connect called via asyncio.to_thread), honouring
        # request.query and request.max_execution_time.
        raise NotImplementedError("attach a ClickHouse client implementation")

    async def _process_queue(self):
        while self.query_queue:
            next_request = heapq.heappop(self.query_queue)
            tenant_running = self.running_queries.get(next_request.tenant_id, 0)
            tenant_limit = self.max_concurrent.get(next_request.tenant_id, 1)

            if tenant_running < tenant_limit:
                await self._execute_query(next_request)
                break
            else:
                heapq.heappush(self.query_queue, next_request)
                break

Adaptive Resource Allocation

-- Dynamic resource allocation based on tenant usage patterns
-- (system.query_log has no tenant_id column, so the tenant is derived from
--  the user naming convention tenant_<id>_user; note that SummingMergeTree
--  sums all numeric columns on merge, so peak_memory_usage is approximate)
CREATE MATERIALIZED VIEW tenant_resource_usage
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (tenant_id, timestamp)
AS SELECT
    extract(user, 'tenant_(\\d+)_user') as tenant_id,
    toStartOfMinute(event_time) as timestamp,
    count() as query_count,
    sum(query_duration_ms) as total_duration_ms,
    max(memory_usage) as peak_memory_usage,
    sum(read_bytes) as total_read_bytes
FROM system.query_log
WHERE type = 'QueryFinish'
GROUP BY tenant_id, timestamp;

-- Resource allocation adjustment query
SELECT 
    tenant_id,
    avg(query_count) as avg_queries_per_minute,
    avg(total_duration_ms) as avg_duration_ms,
    max(peak_memory_usage) as max_memory_needed,
    multiIf(
        avg_queries_per_minute > 100, 'scale_up',
        avg_queries_per_minute < 10, 'scale_down',
        'maintain'
    ) as scaling_recommendation
FROM tenant_resource_usage
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY tenant_id;
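
Acting on these recommendations is typically left to an orchestration job. Assuming the profiles are SQL-managed (as in the CREATE SETTINGS PROFILE sketch earlier) rather than defined in users.xml, such a job might issue something like:

-- Example action for a 'scale_up' recommendation on the standard tier (values illustrative)
ALTER SETTINGS PROFILE tenant_standard_sql
SETTINGS max_threads = 12 MAX 12, max_memory_usage = 6000000000 MAX 6000000000;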

Tenant-Specific Optimizations

Partitioning Strategies

-- Tenant-aware partitioning for optimal query performance
CREATE TABLE events_optimized ON CLUSTER production_cluster
(
    tenant_id UInt32,
    timestamp DateTime64(3),
    user_id UInt64,
    event_type LowCardinality(String),
    properties Map(String, String),
    -- Tenant-specific columns
    tenant_schema_version UInt16,
    tenant_custom_fields Map(String, String)
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_optimized', '{replica}')
PARTITION BY (tenant_id, toYYYYMM(timestamp))
ORDER BY (tenant_id, timestamp, user_id)
SETTINGS 
    index_granularity = 8192,
    merge_with_ttl_timeout = 86400,
    -- Tenant-specific TTL
    ttl_only_drop_parts = 1;

-- Tenant-specific TTL policies
-- (TTL expressions cannot contain subqueries, so the tier is materialized
--  as a column on the table; 730/365/90 days correspond to 2y/1y/90d)
ALTER TABLE events_optimized 
    ADD COLUMN IF NOT EXISTS tenant_tier LowCardinality(String) DEFAULT 'basic';

ALTER TABLE events_optimized 
MODIFY TTL 
    timestamp + toIntervalDay(
        multiIf(
            tenant_tier = 'premium', 730,
            tenant_tier = 'standard', 365,
            90
        )
    ) DELETE;

Materialized Views for Tenant Analytics

-- Tenant-specific aggregation materialized views
-- (uniq and avg are not additive, so aggregate states are stored with the
--  -State combinators in an AggregatingMergeTree and finalized at read time)
CREATE MATERIALIZED VIEW tenant_daily_metrics
ENGINE = AggregatingMergeTree()
PARTITION BY (tenant_id, toYYYYMM(date))
ORDER BY (tenant_id, date, event_type)
AS SELECT
    tenant_id,
    toDate(timestamp) as date,
    event_type,
    countState() as event_count,
    uniqState(user_id) as unique_users,
    avgState(toFloat64OrZero(properties['duration'])) as avg_duration
FROM events_optimized
GROUP BY tenant_id, date, event_type;

-- Real-time tenant dashboard query (finalize the stored states with -Merge)
SELECT 
    tenant_id,
    countMerge(event_count) as total_events,
    uniqMerge(unique_users) as total_unique_users,
    avgMerge(avg_duration) as overall_avg_duration
FROM tenant_daily_metrics
WHERE date >= today() - 7
GROUP BY tenant_id
ORDER BY total_events DESC;

Custom Aggregation Functions

-- Tenant-specific custom aggregation over arrays
-- (aggregate combinators cannot be used inside a scalar UDF, so the
--  element-wise maths is done with arrayMap/arraySum)
CREATE FUNCTION tenantWeightedAvg AS (values, weights, tenant_weight_factor) -> 
    arraySum(arrayMap((x, y, z) -> x * y * z, values, weights, tenant_weight_factor)) / 
    arraySum(arrayMap((y, z) -> y * z, weights, tenant_weight_factor));

-- Usage in tenant analytics
SELECT 
    tenant_id,
    tenantWeightedAvg(
        groupArray(metric_value),
        groupArray(sample_weight),
        groupArray(tenant_priority_multiplier)
    ) as weighted_metric
FROM tenant_metrics
GROUP BY tenant_id;

Monitoring and Observability

Tenant-Aware Monitoring Queries

-- Real-time tenant resource monitoring
CREATE VIEW tenant_resource_monitor AS
SELECT 
    user as tenant_user,
    count() as active_queries,
    sum(memory_usage) as total_memory_usage,
    avg(elapsed) as avg_query_duration,
    sum(read_bytes) as total_read_bytes,
    sum(written_bytes) as total_written_bytes
FROM system.processes
GROUP BY user;

-- Tenant SLA monitoring
SELECT 
    extract(user, 'tenant_(\\d+)_user') as tenant_id,
    countIf(type = 'QueryFinish' AND query_duration_ms <= 5000) / count() as sla_compliance_rate,
    avg(query_duration_ms) as avg_response_time,
    max(memory_usage) as peak_memory_usage
FROM system.query_log
WHERE event_date >= today()
GROUP BY tenant_id
HAVING count() > 100;

Performance Metrics Collection

# Prometheus metrics for tenant monitoring
from prometheus_client import Counter, Histogram, Gauge
import clickhouse_connect

# Metrics definitions
tenant_queries_total = Counter('clickhouse_tenant_queries_total', 
                              'Total queries per tenant', ['tenant_id'])
tenant_query_duration = Histogram('clickhouse_tenant_query_duration_seconds',
                                 'Query duration per tenant', ['tenant_id'])
tenant_memory_usage = Gauge('clickhouse_tenant_memory_usage_bytes',
                           'Current memory usage per tenant', ['tenant_id'])

class TenantMetricsCollector:
    def __init__(self, clickhouse_client):
        self.client = clickhouse_client

    def collect_tenant_metrics(self):
        query = """
        SELECT 
            extractAllGroups(user, 'tenant_(\d+)_user')[1] as tenant_id,
            count() as query_count,
            avg(query_duration_ms) as avg_duration_ms,
            sum(memory_usage) as total_memory_usage
        FROM system.query_log
        WHERE event_time >= now() - INTERVAL 5 MINUTE
        GROUP BY tenant_id
        """

        results = self.client.query(query)

        for row in results.result_rows:
            tenant_id, query_count, avg_duration, memory_usage = row

            tenant_queries_total.labels(tenant_id=tenant_id).inc(query_count)
            tenant_query_duration.labels(tenant_id=tenant_id).observe(avg_duration / 1000)
            tenant_memory_usage.labels(tenant_id=tenant_id).set(memory_usage)

Security and Access Control

Role-Based Access Control

-- Tenant-specific roles and permissions
CREATE ROLE tenant_admin;
CREATE ROLE tenant_analyst;
CREATE ROLE tenant_viewer;

-- Grant permissions by tenant tier
GRANT SELECT, INSERT, ALTER ON tenant_001.* TO tenant_admin;
GRANT SELECT ON tenant_001.* TO tenant_analyst;
GRANT SELECT ON tenant_001.events TO tenant_viewer;

-- Row-level security for shared tables
CREATE ROW POLICY tenant_001_policy ON events_multi_tenant
FOR SELECT USING tenant_id = 1001
TO tenant_001_users;
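
The roles above only take effect once the tenant-scoped role referenced by the row policy exists and is granted to concrete users. A short sketch with illustrative user names and a placeholder password:

-- Role referenced by the row policy above, attached to an illustrative user
CREATE ROLE IF NOT EXISTS tenant_001_users;
CREATE USER IF NOT EXISTS tenant_001_analyst_user IDENTIFIED WITH sha256_password BY 'change_me';
GRANT tenant_analyst TO tenant_001_analyst_user;
GRANT tenant_001_users TO tenant_001_analyst_user;
ALTER USER tenant_001_analyst_user DEFAULT ROLE ALL;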

Data Encryption and Isolation

<!-- config.xml - Tenant-specific encrypted disk
     (the 'encrypted' disk type wraps an underlying disk; keys are configured
      per disk, and the hex key below is a placeholder) -->
<storage_configuration>
    <disks>
        <tenant_001_encrypted>
            <type>encrypted</type>
            <disk>default</disk>
            <path>tenant_001_encrypted/</path>
            <algorithm>AES_128_CTR</algorithm>
            <key_hex id="0">00112233445566778899aabbccddeeff</key_hex>
            <current_key_id>0</current_key_id>
        </tenant_001_encrypted>
    </disks>
</storage_configuration>

Performance Optimization Best Practices

Query Optimization Techniques

  • Tenant-Aware Indexing:
-- Skip indexes for tenant-specific queries
ALTER TABLE events_multi_tenant 
ADD INDEX tenant_bloom_idx tenant_id TYPE bloom_filter GRANULARITY 1;

ALTER TABLE events_multi_tenant 
ADD INDEX event_type_set_idx event_type TYPE set(100) GRANULARITY 1;
  • Projection Optimization:
-- Tenant-specific projections
ALTER TABLE events_multi_tenant 
ADD PROJECTION tenant_hourly_projection (
    SELECT tenant_id, toStartOfHour(timestamp) as hour, 
           event_type, count(), uniq(user_id)
    GROUP BY tenant_id, hour, event_type
);
  • Compression Strategies:
-- Tenant-specific compression
ALTER TABLE events_multi_tenant 
MODIFY COLUMN properties 
CODEC(ZSTD(1));  -- Light compression for frequently accessed data

-- event_metadata is an illustrative archival column
ALTER TABLE events_multi_tenant 
MODIFY COLUMN event_metadata 
CODEC(ZSTD(22)); -- Heavy compression for archival data

Capacity Planning

-- Tenant growth analysis: month-over-month event volume
SELECT 
    tenant_id,
    toStartOfMonth(timestamp) as month,
    count() as events_count,
    -- Growth rate calculation
    (count() - lagInFrame(count()) OVER (
        PARTITION BY tenant_id 
        ORDER BY month
    )) / lagInFrame(count()) OVER (
        PARTITION BY tenant_id 
        ORDER BY month
    ) as growth_rate
FROM events_multi_tenant
WHERE timestamp >= now() - INTERVAL 12 MONTH
GROUP BY tenant_id, month
ORDER BY tenant_id, month;

-- On-disk footprint and compression ratio per tenant database
-- (sizes come from system.parts; assumes the tenant_<id> database naming used earlier)
SELECT 
    database,
    sum(data_uncompressed_bytes) as uncompressed_size,
    sum(data_compressed_bytes) as compressed_size,
    compressed_size / uncompressed_size as compression_ratio
FROM system.parts
WHERE active AND database LIKE 'tenant_%'
GROUP BY database
ORDER BY compressed_size DESC;

Conclusion

Building effective multi-tenant ClickHouse clusters requires careful consideration of resource isolation, query prioritization, and tenant-specific optimizations. The strategies outlined in this guide provide a comprehensive framework for implementing robust multi-tenancy that scales with your organization’s needs.

Key takeaways include:

  • Layered Isolation: Combine database-level, schema-level, and cluster-level isolation strategies
  • Dynamic Resource Management: Implement adaptive resource allocation based on tenant usage patterns
  • Performance Monitoring: Establish comprehensive monitoring for tenant-specific SLAs
  • Security First: Implement proper access controls and data encryption
  • Optimization Strategies: Use tenant-aware indexing, partitioning, and compression techniques

By implementing these advanced techniques, you can build ClickHouse clusters that provide excellent performance isolation while maintaining operational efficiency and cost-effectiveness across multiple tenants.

The multi-tenant architecture patterns and optimization strategies presented here have been successfully deployed in production environments handling petabytes of data across hundreds of tenants, demonstrating their effectiveness at scale.


