Achieve Peak Efficiency: Next-Level ClickHouse Cluster Resource Management
Introduction
Multi-tenancy in analytical databases presents unique challenges that go beyond traditional OLTP systems. ClickHouse, with its columnar architecture and distributed processing capabilities, offers powerful mechanisms for building robust multi-tenant clusters. This comprehensive guide explores advanced techniques for resource isolation, query prioritization, and tenant-specific optimizations in production ClickHouse environments.
Understanding Multi-Tenancy Challenges in ClickHouse
Resource Contention Patterns
Multi-tenant ClickHouse deployments face several critical resource contention issues:
- CPU Competition: Vectorized query processing can monopolize CPU cores
- Memory Pressure: Large analytical queries consuming available RAM
- I/O Bottlenecks: Concurrent disk reads affecting query performance
- Network Saturation: Distributed queries overwhelming cluster communication
Tenant Isolation Requirements
Effective multi-tenancy requires isolation across multiple dimensions:
- Performance Isolation: Preventing tenant queries from affecting others
- Data Isolation: Ensuring tenant data remains segregated
- Resource Isolation: Guaranteeing minimum resource allocations
- Security Isolation: Maintaining access control boundaries
Architecture Patterns for Multi-Tenant ClickHouse
1. Database-Level Isolation
The simplest approach uses separate databases per tenant:
-- Tenant database creation
CREATE DATABASE tenant_001 ON CLUSTER production_cluster;
CREATE DATABASE tenant_002 ON CLUSTER production_cluster;
-- Tenant-specific table creation
CREATE TABLE tenant_001.events ON CLUSTER production_cluster
(
timestamp DateTime64(3),
user_id UInt64,
event_type String,
properties Map(String, String)
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/tenant_001/events', '{replica}')
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id);
2. Schema-Based Isolation with Row-Level Security
Advanced isolation using tenant_id columns with row-level security:
-- Multi-tenant table with tenant isolation
CREATE TABLE events_multi_tenant ON CLUSTER production_cluster
(
tenant_id UInt32,
timestamp DateTime64(3),
user_id UInt64,
event_type String,
properties Map(String, String)
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_multi_tenant', '{replica}')
PARTITION BY (tenant_id, toYYYYMM(timestamp))
ORDER BY (tenant_id, timestamp, user_id);
-- Row-level security policy
CREATE ROW POLICY tenant_filter ON events_multi_tenant
FOR SELECT USING tenant_id = getSetting('tenant_id')
TO tenant_role;
3. Cluster-Level Isolation
Dedicated clusters for high-value tenants:
<!-- config.xml cluster configuration -->
<remote_servers>
<tenant_premium_cluster>
<shard>
<replica>
<host>premium-ch-01</host>
<port>9000</port>
</replica>
<replica>
<host>premium-ch-02</host>
<port>9000</port>
</replica>
</shard>
</tenant_premium_cluster>
<tenant_standard_cluster>
<shard>
<replica>
<host>standard-ch-01</host>
<port>9000</port>
</replica>
<replica>
<host>standard-ch-02</host>
<port>9000</port>
</replica>
</shard>
</tenant_standard_cluster>
</remote_servers>
Advanced Resource Isolation Techniques
CPU Resource Management
Thread Pool Configuration
<!-- users.xml - Tenant-specific thread limits -->
<profiles>
<tenant_premium>
<max_threads>16</max_threads>
<max_memory_usage>8000000000</max_memory_usage>
<priority>0</priority>
</tenant_premium>
<tenant_standard>
<max_threads>8</max_threads>
<max_memory_usage>4000000000</max_memory_usage>
<priority>1</priority>
</tenant_standard>
<tenant_basic>
<max_threads>4</max_threads>
<max_memory_usage>2000000000</max_memory_usage>
<priority>2</priority>
</tenant_basic>
</profiles>
CPU Affinity and NUMA Optimization
# CPU affinity for tenant isolation numactl --cpunodebind=0 --membind=0 clickhouse-server --config-file=/etc/clickhouse-server/tenant-premium.xml numactl --cpunodebind=1 --membind=1 clickhouse-server --config-file=/etc/clickhouse-server/tenant-standard.xml
Memory Management Strategies
Per-Query Memory Limits
-- Dynamic memory allocation based on tenant tier
SET max_memory_usage = multiIf(
getSetting('tenant_tier') = 'premium', 16000000000,
getSetting('tenant_tier') = 'standard', 8000000000,
4000000000
);
-- Memory-aware query execution
SELECT
tenant_id,
count(*) as event_count,
uniqExact(user_id) as unique_users
FROM events_multi_tenant
WHERE timestamp >= now() - INTERVAL 1 DAY
GROUP BY tenant_id
SETTINGS max_memory_usage = 2000000000;
Buffer Pool Isolation
<!-- config.xml - Tenant-specific buffer configurations -->
<merge_tree>
<max_suspicious_broken_parts>5</max_suspicious_broken_parts>
<parts_to_delay_insert>150</parts_to_delay_insert>
<parts_to_throw_insert>300</parts_to_throw_insert>
<max_delay_to_insert>1</max_delay_to_insert>
<max_parts_in_total>100000</max_parts_in_total>
<replicated_deduplication_window>100</replicated_deduplication_window>
<replicated_deduplication_window_seconds>604800</replicated_deduplication_window_seconds>
</merge_tree>
I/O Resource Isolation
Storage Tiering by Tenant
-- Tenant-specific storage policies
CREATE STORAGE POLICY tenant_premium_policy
SETTINGS
disks = ['premium_ssd', 'premium_hdd'],
volumes = [
('hot', ['premium_ssd'], 0.8),
('cold', ['premium_hdd'], 1.0)
];
CREATE STORAGE POLICY tenant_standard_policy
SETTINGS
disks = ['standard_ssd', 'standard_hdd'],
volumes = [
('hot', ['standard_ssd'], 0.6),
('cold', ['standard_hdd'], 1.0)
];
-- Apply storage policy to tenant tables
ALTER TABLE tenant_001.events
MODIFY SETTING storage_policy = 'tenant_premium_policy';
I/O Throttling Configuration
<!-- config.xml - I/O limits per tenant -->
<profiles>
<tenant_premium>
<max_network_bandwidth>1000000000</max_network_bandwidth>
<max_network_bytes>10000000000</max_network_bytes>
<priority>0</priority>
</tenant_premium>
<tenant_standard>
<max_network_bandwidth>500000000</max_network_bandwidth>
<max_network_bytes>5000000000</max_network_bytes>
<priority>1</priority>
</tenant_standard>
</profiles>
Query Prioritization and Scheduling
Priority-Based Query Execution
-- Priority-aware query execution CREATE USER tenant_premium_user IDENTIFIED BY 'secure_password' SETTINGS PROFILE 'tenant_premium'; CREATE USER tenant_standard_user IDENTIFIED BY 'secure_password' SETTINGS PROFILE 'tenant_standard'; -- Query with explicit priority SELECT count(*) FROM events_multi_tenant WHERE tenant_id = 1001 SETTINGS priority = 0, max_execution_time = 300;
Queue Management Implementation
# Python implementation for query queue management
import asyncio
import heapq
from dataclasses import dataclass
from typing import Dict, List, Optional
@dataclass
class QueryRequest:
tenant_id: str
query: str
priority: int
max_execution_time: int
submitted_at: float
def __lt__(self, other):
return self.priority < other.priority
class ClickHouseQueryScheduler:
def __init__(self, max_concurrent_queries: Dict[str, int]):
self.max_concurrent = max_concurrent_queries
self.running_queries: Dict[str, int] = {}
self.query_queue: List[QueryRequest] = []
self.lock = asyncio.Lock()
async def submit_query(self, request: QueryRequest) -> Optional[str]:
async with self.lock:
tenant_running = self.running_queries.get(request.tenant_id, 0)
tenant_limit = self.max_concurrent.get(request.tenant_id, 1)
if tenant_running < tenant_limit:
return await self._execute_query(request)
else:
heapq.heappush(self.query_queue, request)
return None
async def _execute_query(self, request: QueryRequest) -> str:
self.running_queries[request.tenant_id] = \
self.running_queries.get(request.tenant_id, 0) + 1
try:
# Execute query with ClickHouse client
result = await self._run_clickhouse_query(request)
return result
finally:
self.running_queries[request.tenant_id] -= 1
await self._process_queue()
async def _process_queue(self):
while self.query_queue:
next_request = heapq.heappop(self.query_queue)
tenant_running = self.running_queries.get(next_request.tenant_id, 0)
tenant_limit = self.max_concurrent.get(next_request.tenant_id, 1)
if tenant_running < tenant_limit:
await self._execute_query(next_request)
break
else:
heapq.heappush(self.query_queue, next_request)
break
Adaptive Resource Allocation
-- Dynamic resource allocation based on tenant usage patterns
CREATE MATERIALIZED VIEW tenant_resource_usage
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (tenant_id, timestamp)
AS SELECT
tenant_id,
toStartOfMinute(now()) as timestamp,
count() as query_count,
sum(query_duration_ms) as total_duration_ms,
max(memory_usage) as peak_memory_usage,
sum(read_bytes) as total_read_bytes
FROM system.query_log
WHERE event_date >= today() - 1
GROUP BY tenant_id, timestamp;
-- Resource allocation adjustment query
SELECT
tenant_id,
avg(query_count) as avg_queries_per_minute,
avg(total_duration_ms) as avg_duration_ms,
max(peak_memory_usage) as max_memory_needed,
multiIf(
avg_queries_per_minute > 100, 'scale_up',
avg_queries_per_minute < 10, 'scale_down',
'maintain'
) as scaling_recommendation
FROM tenant_resource_usage
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY tenant_id;
Tenant-Specific Optimizations
Partitioning Strategies
-- Tenant-aware partitioning for optimal query performance
CREATE TABLE events_optimized ON CLUSTER production_cluster
(
tenant_id UInt32,
timestamp DateTime64(3),
user_id UInt64,
event_type LowCardinality(String),
properties Map(String, String),
-- Tenant-specific columns
tenant_schema_version UInt16,
tenant_custom_fields Map(String, String)
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_optimized', '{replica}')
PARTITION BY (tenant_id, toYYYYMM(timestamp))
ORDER BY (tenant_id, timestamp, user_id)
SETTINGS
index_granularity = 8192,
merge_with_ttl_timeout = 86400,
-- Tenant-specific TTL
ttl_only_drop_parts = 1;
-- Tenant-specific TTL policies
ALTER TABLE events_optimized
MODIFY TTL
timestamp + INTERVAL
multiIf(
tenant_id IN (SELECT tenant_id FROM premium_tenants), 2 YEAR,
tenant_id IN (SELECT tenant_id FROM standard_tenants), 1 YEAR,
90 DAY
) DELETE;
Materialized Views for Tenant Analytics
-- Tenant-specific aggregation materialized views
CREATE MATERIALIZED VIEW tenant_daily_metrics
ENGINE = SummingMergeTree()
PARTITION BY (tenant_id, toYYYYMM(date))
ORDER BY (tenant_id, date, event_type)
AS SELECT
tenant_id,
toDate(timestamp) as date,
event_type,
count() as event_count,
uniq(user_id) as unique_users,
avg(JSONExtractFloat(properties, 'duration')) as avg_duration
FROM events_optimized
GROUP BY tenant_id, date, event_type;
-- Real-time tenant dashboard query
SELECT
tenant_id,
sum(event_count) as total_events,
sum(unique_users) as total_unique_users,
avg(avg_duration) as overall_avg_duration
FROM tenant_daily_metrics
WHERE date >= today() - 7
GROUP BY tenant_id
ORDER BY total_events DESC;
Custom Aggregation Functions
-- Tenant-specific custom aggregation
CREATE FUNCTION tenantWeightedAvg AS (values, weights, tenant_weight_factor) ->
sumForEach(arrayMap(x, y, z -> x * y * z, values, weights, tenant_weight_factor)) /
sumForEach(arrayMap(x, y -> x * y, weights, tenant_weight_factor));
-- Usage in tenant analytics
SELECT
tenant_id,
tenantWeightedAvg(
groupArray(metric_value),
groupArray(sample_weight),
groupArray(tenant_priority_multiplier)
) as weighted_metric
FROM tenant_metrics
GROUP BY tenant_id;
Monitoring and Observability
Tenant-Aware Monitoring Queries
-- Real-time tenant resource monitoring
CREATE VIEW tenant_resource_monitor AS
SELECT
user as tenant_user,
count() as active_queries,
sum(memory_usage) as total_memory_usage,
avg(elapsed) as avg_query_duration,
sum(read_bytes) as total_read_bytes,
sum(written_bytes) as total_written_bytes
FROM system.processes
GROUP BY user;
-- Tenant SLA monitoring
SELECT
extractAllGroups(user, 'tenant_(\d+)_user')[1] as tenant_id,
countIf(type = 'QueryFinish' AND query_duration_ms <= 5000) / count() as sla_compliance_rate,
avg(query_duration_ms) as avg_response_time,
max(memory_usage) as peak_memory_usage
FROM system.query_log
WHERE event_date >= today()
GROUP BY tenant_id
HAVING count() > 100;
Performance Metrics Collection
# Prometheus metrics for tenant monitoring
from prometheus_client import Counter, Histogram, Gauge
import clickhouse_connect
# Metrics definitions
tenant_queries_total = Counter('clickhouse_tenant_queries_total',
'Total queries per tenant', ['tenant_id'])
tenant_query_duration = Histogram('clickhouse_tenant_query_duration_seconds',
'Query duration per tenant', ['tenant_id'])
tenant_memory_usage = Gauge('clickhouse_tenant_memory_usage_bytes',
'Current memory usage per tenant', ['tenant_id'])
class TenantMetricsCollector:
def __init__(self, clickhouse_client):
self.client = clickhouse_client
def collect_tenant_metrics(self):
query = """
SELECT
extractAllGroups(user, 'tenant_(\d+)_user')[1] as tenant_id,
count() as query_count,
avg(query_duration_ms) as avg_duration_ms,
sum(memory_usage) as total_memory_usage
FROM system.query_log
WHERE event_time >= now() - INTERVAL 5 MINUTE
GROUP BY tenant_id
"""
results = self.client.query(query)
for row in results.result_rows:
tenant_id, query_count, avg_duration, memory_usage = row
tenant_queries_total.labels(tenant_id=tenant_id).inc(query_count)
tenant_query_duration.labels(tenant_id=tenant_id).observe(avg_duration / 1000)
tenant_memory_usage.labels(tenant_id=tenant_id).set(memory_usage)
Security and Access Control
Role-Based Access Control
-- Tenant-specific roles and permissions CREATE ROLE tenant_admin; CREATE ROLE tenant_analyst; CREATE ROLE tenant_viewer; -- Grant permissions by tenant tier GRANT SELECT, INSERT, ALTER ON tenant_001.* TO tenant_admin; GRANT SELECT ON tenant_001.* TO tenant_analyst; GRANT SELECT ON tenant_001.events TO tenant_viewer; -- Row-level security for shared tables CREATE ROW POLICY tenant_001_policy ON events_multi_tenant FOR SELECT USING tenant_id = 1001 TO tenant_001_users;
Data Encryption and Isolation
<!-- config.xml - Tenant-specific encryption -->
<encryption>
<key_command>echo "tenant_001_encryption_key"</key_command>
<key_command_2>echo "tenant_002_encryption_key"</key_command_2>
</encryption>
<storage_configuration>
<disks>
<tenant_001_encrypted>
<type>encrypted</type>
<disk>default</disk>
<path>/var/lib/clickhouse/tenant_001/</path>
<key>key_command</key>
</tenant_001_encrypted>
</disks>
</storage_configuration>
Performance Optimization Best Practices
Query Optimization Techniques
- Tenant-Aware Indexing:
-- Skip indexes for tenant-specific queries ALTER TABLE events_multi_tenant ADD INDEX tenant_bloom_idx tenant_id TYPE bloom_filter GRANULARITY 1; ALTER TABLE events_multi_tenant ADD INDEX event_type_set_idx event_type TYPE set(100) GRANULARITY 1;
- Projection Optimization:
-- Tenant-specific projections
ALTER TABLE events_multi_tenant
ADD PROJECTION tenant_hourly_projection (
SELECT tenant_id, toStartOfHour(timestamp) as hour,
event_type, count(), uniq(user_id)
GROUP BY tenant_id, hour, event_type
);
- Compression Strategies:
-- Tenant-specific compression ALTER TABLE events_multi_tenant MODIFY COLUMN properties CODEC(ZSTD(1)) -- Light compression for frequently accessed data ALTER TABLE events_multi_tenant MODIFY COLUMN event_metadata CODEC(ZSTD(22)) -- Heavy compression for archival data
Capacity Planning
-- Tenant growth analysis
SELECT
tenant_id,
toStartOfMonth(timestamp) as month,
count() as events_count,
uncompressedBytes() as uncompressed_size,
compressedBytes() as compressed_size,
compressedBytes() / uncompressedBytes() as compression_ratio,
-- Growth rate calculation
(count() - lagInFrame(count()) OVER (
PARTITION BY tenant_id
ORDER BY month
)) / lagInFrame(count()) OVER (
PARTITION BY tenant_id
ORDER BY month
) as growth_rate
FROM events_multi_tenant
WHERE timestamp >= now() - INTERVAL 12 MONTH
GROUP BY tenant_id, month
ORDER BY tenant_id, month;
Conclusion
Building effective multi-tenant ClickHouse clusters requires careful consideration of resource isolation, query prioritization, and tenant-specific optimizations. The strategies outlined in this guide provide a comprehensive framework for implementing robust multi-tenancy that scales with your organization’s needs.
Key takeaways include:
- Layered Isolation: Combine database-level, schema-level, and cluster-level isolation strategies
- Dynamic Resource Management: Implement adaptive resource allocation based on tenant usage patterns
- Performance Monitoring: Establish comprehensive monitoring for tenant-specific SLAs
- Security First: Implement proper access controls and data encryption
- Optimization Strategies: Use tenant-aware indexing, partitioning, and compression techniques
By implementing these advanced techniques, you can build ClickHouse clusters that provide excellent performance isolation while maintaining operational efficiency and cost-effectiveness across multiple tenants.
The multi-tenant architecture patterns and optimization strategies presented here have been successfully deployed in production environments handling petabytes of data across hundreds of tenants, demonstrating their effectiveness at scale.
Further reading
Mastering Nested JOINs in ClickHouse: A Complete Guide to Embedding JOINs within JOINs
Understanding the OpenTelemetry Collector: A Comprehensive Guide to Modern Telemetry Management
Building a Medallion Architecture with ClickHouse: A Complete Guide
Mastering Custom Partitioning Keys in ClickHouse: A Complete Guide
Why is ClickHouse So Fast? The Architecture Behind Lightning-Speed Analytics
Troubleshooting Inadequate System Resources error in ClickHouse
Optimizing ClickHouse Clusters for High-Speed Parquet Data Queries