Building Multi-Tenant ClickHouse Clusters: Advanced Resource Isolation and Performance Optimization Strategies
Introduction
Multi-tenancy in analytical databases presents unique challenges that go beyond traditional OLTP systems. ClickHouse, with its columnar architecture and distributed processing capabilities, offers powerful mechanisms for building robust multi-tenant clusters. This comprehensive guide explores advanced techniques for resource isolation, query prioritization, and tenant-specific optimizations in production ClickHouse environments.
Understanding Multi-Tenancy Challenges in ClickHouse
Resource Contention Patterns
Multi-tenant ClickHouse deployments face several critical resource contention issues:
- CPU Competition: Vectorized query processing can monopolize CPU cores
- Memory Pressure: Large analytical queries consuming available RAM
- I/O Bottlenecks: Concurrent disk reads affecting query performance
- Network Saturation: Distributed queries overwhelming cluster communication
Tenant Isolation Requirements
Effective multi-tenancy requires isolation across multiple dimensions:
- Performance Isolation: Preventing tenant queries from affecting others
- Data Isolation: Ensuring tenant data remains segregated
- Resource Isolation: Guaranteeing minimum resource allocations
- Security Isolation: Maintaining access control boundaries
Architecture Patterns for Multi-Tenant ClickHouse
1. Database-Level Isolation
The simplest approach uses separate databases per tenant:
-- Tenant database creation CREATE DATABASE tenant_001 ON CLUSTER production_cluster; CREATE DATABASE tenant_002 ON CLUSTER production_cluster; -- Tenant-specific table creation CREATE TABLE tenant_001.events ON CLUSTER production_cluster ( timestamp DateTime64(3), user_id UInt64, event_type String, properties Map(String, String) ) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/tenant_001/events', '{replica}') PARTITION BY toYYYYMM(timestamp) ORDER BY (timestamp, user_id);
2. Schema-Based Isolation with Row-Level Security
Advanced isolation using tenant_id columns with row-level security:
-- Multi-tenant table with tenant isolation CREATE TABLE events_multi_tenant ON CLUSTER production_cluster ( tenant_id UInt32, timestamp DateTime64(3), user_id UInt64, event_type String, properties Map(String, String) ) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_multi_tenant', '{replica}') PARTITION BY (tenant_id, toYYYYMM(timestamp)) ORDER BY (tenant_id, timestamp, user_id); -- Row-level security policy CREATE ROW POLICY tenant_filter ON events_multi_tenant FOR SELECT USING tenant_id = getSetting('tenant_id') TO tenant_role;
3. Cluster-Level Isolation
Dedicated clusters for high-value tenants:
<!-- config.xml cluster configuration --> <remote_servers> <tenant_premium_cluster> <shard> <replica> <host>premium-ch-01</host> <port>9000</port> </replica> <replica> <host>premium-ch-02</host> <port>9000</port> </replica> </shard> </tenant_premium_cluster> <tenant_standard_cluster> <shard> <replica> <host>standard-ch-01</host> <port>9000</port> </replica> <replica> <host>standard-ch-02</host> <port>9000</port> </replica> </shard> </tenant_standard_cluster> </remote_servers>
Advanced Resource Isolation Techniques
CPU Resource Management
Thread Pool Configuration
<!-- users.xml - Tenant-specific thread limits --> <profiles> <tenant_premium> <max_threads>16</max_threads> <max_memory_usage>8000000000</max_memory_usage> <priority>0</priority> </tenant_premium> <tenant_standard> <max_threads>8</max_threads> <max_memory_usage>4000000000</max_memory_usage> <priority>1</priority> </tenant_standard> <tenant_basic> <max_threads>4</max_threads> <max_memory_usage>2000000000</max_memory_usage> <priority>2</priority> </tenant_basic> </profiles>
CPU Affinity and NUMA Optimization
# CPU affinity for tenant isolation numactl --cpunodebind=0 --membind=0 clickhouse-server --config-file=/etc/clickhouse-server/tenant-premium.xml numactl --cpunodebind=1 --membind=1 clickhouse-server --config-file=/etc/clickhouse-server/tenant-standard.xml
Memory Management Strategies
Per-Query Memory Limits
-- Dynamic memory allocation based on tenant tier SET max_memory_usage = multiIf( getSetting('tenant_tier') = 'premium', 16000000000, getSetting('tenant_tier') = 'standard', 8000000000, 4000000000 ); -- Memory-aware query execution SELECT tenant_id, count(*) as event_count, uniqExact(user_id) as unique_users FROM events_multi_tenant WHERE timestamp >= now() - INTERVAL 1 DAY GROUP BY tenant_id SETTINGS max_memory_usage = 2000000000;
Buffer Pool Isolation
<!-- config.xml - Tenant-specific buffer configurations --> <merge_tree> <max_suspicious_broken_parts>5</max_suspicious_broken_parts> <parts_to_delay_insert>150</parts_to_delay_insert> <parts_to_throw_insert>300</parts_to_throw_insert> <max_delay_to_insert>1</max_delay_to_insert> <max_parts_in_total>100000</max_parts_in_total> <replicated_deduplication_window>100</replicated_deduplication_window> <replicated_deduplication_window_seconds>604800</replicated_deduplication_window_seconds> </merge_tree>
I/O Resource Isolation
Storage Tiering by Tenant
-- Tenant-specific storage policies CREATE STORAGE POLICY tenant_premium_policy SETTINGS disks = ['premium_ssd', 'premium_hdd'], volumes = [ ('hot', ['premium_ssd'], 0.8), ('cold', ['premium_hdd'], 1.0) ]; CREATE STORAGE POLICY tenant_standard_policy SETTINGS disks = ['standard_ssd', 'standard_hdd'], volumes = [ ('hot', ['standard_ssd'], 0.6), ('cold', ['standard_hdd'], 1.0) ]; -- Apply storage policy to tenant tables ALTER TABLE tenant_001.events MODIFY SETTING storage_policy = 'tenant_premium_policy';
I/O Throttling Configuration
<!-- config.xml - I/O limits per tenant --> <profiles> <tenant_premium> <max_network_bandwidth>1000000000</max_network_bandwidth> <max_network_bytes>10000000000</max_network_bytes> <priority>0</priority> </tenant_premium> <tenant_standard> <max_network_bandwidth>500000000</max_network_bandwidth> <max_network_bytes>5000000000</max_network_bytes> <priority>1</priority> </tenant_standard> </profiles>
Query Prioritization and Scheduling
Priority-Based Query Execution
-- Priority-aware query execution CREATE USER tenant_premium_user IDENTIFIED BY 'secure_password' SETTINGS PROFILE 'tenant_premium'; CREATE USER tenant_standard_user IDENTIFIED BY 'secure_password' SETTINGS PROFILE 'tenant_standard'; -- Query with explicit priority SELECT count(*) FROM events_multi_tenant WHERE tenant_id = 1001 SETTINGS priority = 0, max_execution_time = 300;
Queue Management Implementation
# Python implementation for query queue management import asyncio import heapq from dataclasses import dataclass from typing import Dict, List, Optional @dataclass class QueryRequest: tenant_id: str query: str priority: int max_execution_time: int submitted_at: float def __lt__(self, other): return self.priority < other.priority class ClickHouseQueryScheduler: def __init__(self, max_concurrent_queries: Dict[str, int]): self.max_concurrent = max_concurrent_queries self.running_queries: Dict[str, int] = {} self.query_queue: List[QueryRequest] = [] self.lock = asyncio.Lock() async def submit_query(self, request: QueryRequest) -> Optional[str]: async with self.lock: tenant_running = self.running_queries.get(request.tenant_id, 0) tenant_limit = self.max_concurrent.get(request.tenant_id, 1) if tenant_running < tenant_limit: return await self._execute_query(request) else: heapq.heappush(self.query_queue, request) return None async def _execute_query(self, request: QueryRequest) -> str: self.running_queries[request.tenant_id] = \ self.running_queries.get(request.tenant_id, 0) + 1 try: # Execute query with ClickHouse client result = await self._run_clickhouse_query(request) return result finally: self.running_queries[request.tenant_id] -= 1 await self._process_queue() async def _process_queue(self): while self.query_queue: next_request = heapq.heappop(self.query_queue) tenant_running = self.running_queries.get(next_request.tenant_id, 0) tenant_limit = self.max_concurrent.get(next_request.tenant_id, 1) if tenant_running < tenant_limit: await self._execute_query(next_request) break else: heapq.heappush(self.query_queue, next_request) break
Adaptive Resource Allocation
-- Dynamic resource allocation based on tenant usage patterns CREATE MATERIALIZED VIEW tenant_resource_usage ENGINE = SummingMergeTree() PARTITION BY toYYYYMMDD(timestamp) ORDER BY (tenant_id, timestamp) AS SELECT tenant_id, toStartOfMinute(now()) as timestamp, count() as query_count, sum(query_duration_ms) as total_duration_ms, max(memory_usage) as peak_memory_usage, sum(read_bytes) as total_read_bytes FROM system.query_log WHERE event_date >= today() - 1 GROUP BY tenant_id, timestamp; -- Resource allocation adjustment query SELECT tenant_id, avg(query_count) as avg_queries_per_minute, avg(total_duration_ms) as avg_duration_ms, max(peak_memory_usage) as max_memory_needed, multiIf( avg_queries_per_minute > 100, 'scale_up', avg_queries_per_minute < 10, 'scale_down', 'maintain' ) as scaling_recommendation FROM tenant_resource_usage WHERE timestamp >= now() - INTERVAL 1 HOUR GROUP BY tenant_id;
Tenant-Specific Optimizations
Partitioning Strategies
-- Tenant-aware partitioning for optimal query performance CREATE TABLE events_optimized ON CLUSTER production_cluster ( tenant_id UInt32, timestamp DateTime64(3), user_id UInt64, event_type LowCardinality(String), properties Map(String, String), -- Tenant-specific columns tenant_schema_version UInt16, tenant_custom_fields Map(String, String) ) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_optimized', '{replica}') PARTITION BY (tenant_id, toYYYYMM(timestamp)) ORDER BY (tenant_id, timestamp, user_id) SETTINGS index_granularity = 8192, merge_with_ttl_timeout = 86400, -- Tenant-specific TTL ttl_only_drop_parts = 1; -- Tenant-specific TTL policies ALTER TABLE events_optimized MODIFY TTL timestamp + INTERVAL multiIf( tenant_id IN (SELECT tenant_id FROM premium_tenants), 2 YEAR, tenant_id IN (SELECT tenant_id FROM standard_tenants), 1 YEAR, 90 DAY ) DELETE;
Materialized Views for Tenant Analytics
-- Tenant-specific aggregation materialized views CREATE MATERIALIZED VIEW tenant_daily_metrics ENGINE = SummingMergeTree() PARTITION BY (tenant_id, toYYYYMM(date)) ORDER BY (tenant_id, date, event_type) AS SELECT tenant_id, toDate(timestamp) as date, event_type, count() as event_count, uniq(user_id) as unique_users, avg(JSONExtractFloat(properties, 'duration')) as avg_duration FROM events_optimized GROUP BY tenant_id, date, event_type; -- Real-time tenant dashboard query SELECT tenant_id, sum(event_count) as total_events, sum(unique_users) as total_unique_users, avg(avg_duration) as overall_avg_duration FROM tenant_daily_metrics WHERE date >= today() - 7 GROUP BY tenant_id ORDER BY total_events DESC;
Custom Aggregation Functions
-- Tenant-specific custom aggregation CREATE FUNCTION tenantWeightedAvg AS (values, weights, tenant_weight_factor) -> sumForEach(arrayMap(x, y, z -> x * y * z, values, weights, tenant_weight_factor)) / sumForEach(arrayMap(x, y -> x * y, weights, tenant_weight_factor)); -- Usage in tenant analytics SELECT tenant_id, tenantWeightedAvg( groupArray(metric_value), groupArray(sample_weight), groupArray(tenant_priority_multiplier) ) as weighted_metric FROM tenant_metrics GROUP BY tenant_id;
Monitoring and Observability
Tenant-Aware Monitoring Queries
-- Real-time tenant resource monitoring CREATE VIEW tenant_resource_monitor AS SELECT user as tenant_user, count() as active_queries, sum(memory_usage) as total_memory_usage, avg(elapsed) as avg_query_duration, sum(read_bytes) as total_read_bytes, sum(written_bytes) as total_written_bytes FROM system.processes GROUP BY user; -- Tenant SLA monitoring SELECT extractAllGroups(user, 'tenant_(\d+)_user')[1] as tenant_id, countIf(type = 'QueryFinish' AND query_duration_ms <= 5000) / count() as sla_compliance_rate, avg(query_duration_ms) as avg_response_time, max(memory_usage) as peak_memory_usage FROM system.query_log WHERE event_date >= today() GROUP BY tenant_id HAVING count() > 100;
Performance Metrics Collection
# Prometheus metrics for tenant monitoring from prometheus_client import Counter, Histogram, Gauge import clickhouse_connect # Metrics definitions tenant_queries_total = Counter('clickhouse_tenant_queries_total', 'Total queries per tenant', ['tenant_id']) tenant_query_duration = Histogram('clickhouse_tenant_query_duration_seconds', 'Query duration per tenant', ['tenant_id']) tenant_memory_usage = Gauge('clickhouse_tenant_memory_usage_bytes', 'Current memory usage per tenant', ['tenant_id']) class TenantMetricsCollector: def __init__(self, clickhouse_client): self.client = clickhouse_client def collect_tenant_metrics(self): query = """ SELECT extractAllGroups(user, 'tenant_(\d+)_user')[1] as tenant_id, count() as query_count, avg(query_duration_ms) as avg_duration_ms, sum(memory_usage) as total_memory_usage FROM system.query_log WHERE event_time >= now() - INTERVAL 5 MINUTE GROUP BY tenant_id """ results = self.client.query(query) for row in results.result_rows: tenant_id, query_count, avg_duration, memory_usage = row tenant_queries_total.labels(tenant_id=tenant_id).inc(query_count) tenant_query_duration.labels(tenant_id=tenant_id).observe(avg_duration / 1000) tenant_memory_usage.labels(tenant_id=tenant_id).set(memory_usage)
Security and Access Control
Role-Based Access Control
-- Tenant-specific roles and permissions CREATE ROLE tenant_admin; CREATE ROLE tenant_analyst; CREATE ROLE tenant_viewer; -- Grant permissions by tenant tier GRANT SELECT, INSERT, ALTER ON tenant_001.* TO tenant_admin; GRANT SELECT ON tenant_001.* TO tenant_analyst; GRANT SELECT ON tenant_001.events TO tenant_viewer; -- Row-level security for shared tables CREATE ROW POLICY tenant_001_policy ON events_multi_tenant FOR SELECT USING tenant_id = 1001 TO tenant_001_users;
Data Encryption and Isolation
<!-- config.xml - Tenant-specific encryption --> <encryption> <key_command>echo "tenant_001_encryption_key"</key_command> <key_command_2>echo "tenant_002_encryption_key"</key_command_2> </encryption> <storage_configuration> <disks> <tenant_001_encrypted> <type>encrypted</type> <disk>default</disk> <path>/var/lib/clickhouse/tenant_001/</path> <key>key_command</key> </tenant_001_encrypted> </disks> </storage_configuration>
Performance Optimization Best Practices
Query Optimization Techniques
- Tenant-Aware Indexing:
-- Skip indexes for tenant-specific queries ALTER TABLE events_multi_tenant ADD INDEX tenant_bloom_idx tenant_id TYPE bloom_filter GRANULARITY 1; ALTER TABLE events_multi_tenant ADD INDEX event_type_set_idx event_type TYPE set(100) GRANULARITY 1;
- Projection Optimization:
-- Tenant-specific projections ALTER TABLE events_multi_tenant ADD PROJECTION tenant_hourly_projection ( SELECT tenant_id, toStartOfHour(timestamp) as hour, event_type, count(), uniq(user_id) GROUP BY tenant_id, hour, event_type );
- Compression Strategies:
-- Tenant-specific compression ALTER TABLE events_multi_tenant MODIFY COLUMN properties CODEC(ZSTD(1)) -- Light compression for frequently accessed data ALTER TABLE events_multi_tenant MODIFY COLUMN event_metadata CODEC(ZSTD(22)) -- Heavy compression for archival data
Capacity Planning
-- Tenant growth analysis SELECT tenant_id, toStartOfMonth(timestamp) as month, count() as events_count, uncompressedBytes() as uncompressed_size, compressedBytes() as compressed_size, compressedBytes() / uncompressedBytes() as compression_ratio, -- Growth rate calculation (count() - lagInFrame(count()) OVER ( PARTITION BY tenant_id ORDER BY month )) / lagInFrame(count()) OVER ( PARTITION BY tenant_id ORDER BY month ) as growth_rate FROM events_multi_tenant WHERE timestamp >= now() - INTERVAL 12 MONTH GROUP BY tenant_id, month ORDER BY tenant_id, month;
Conclusion
Building effective multi-tenant ClickHouse clusters requires careful consideration of resource isolation, query prioritization, and tenant-specific optimizations. The strategies outlined in this guide provide a comprehensive framework for implementing robust multi-tenancy that scales with your organization’s needs.
Key takeaways include:
- Layered Isolation: Combine database-level, schema-level, and cluster-level isolation strategies
- Dynamic Resource Management: Implement adaptive resource allocation based on tenant usage patterns
- Performance Monitoring: Establish comprehensive monitoring for tenant-specific SLAs
- Security First: Implement proper access controls and data encryption
- Optimization Strategies: Use tenant-aware indexing, partitioning, and compression techniques
By implementing these advanced techniques, you can build ClickHouse clusters that provide excellent performance isolation while maintaining operational efficiency and cost-effectiveness across multiple tenants.
The multi-tenant architecture patterns and optimization strategies presented here have been successfully deployed in production environments handling petabytes of data across hundreds of tenants, demonstrating their effectiveness at scale.
Troubleshooting Inadequate System Resources error in ClickHouse
Optimizing ClickHouse Clusters for High-Speed Parquet Data Queries
Be the first to comment