ClickHouse Cluster Setup and Configuration
As a full-stack ClickHouse optimization, support, and managed services provider, we often get queries about ClickHouse installation and configuration for both standalone and clustered infrastructure. So we decided to write this blog to help anyone interested in setting up and configuring ClickHouse. This document is intended purely for learning ClickHouse installation and configuration; please never use it as a checklist or guide for installing and configuring ClickHouse on your production infrastructure. MinervaDB and its group companies/subsidiaries are not responsible for any kind of damage caused to your business by following this document for your production setup. Technically, ClickHouse installation is quite straightforward: you can run ClickHouse on Linux, FreeBSD, or Mac OS X with x86_64, AArch64, or PowerPC64LE CPU architecture.
ClickHouse installation on Debian systems:
sudo apt-get install apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4
echo "deb https://repo.clickhouse.com/deb/stable/ main/" | sudo tee \
    /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
sudo apt-get install -y clickhouse-server clickhouse-client
sudo service clickhouse-server start
clickhouse-client
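As an optional sanity check (not part of the original steps), you can confirm the server is up and responding with a one-off query:

clickhouse-client --query "SELECT version()"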
ClickHouse installation from RPM packages:
Add the official repository to install from pre-compiled rpm packages for CentOS, RedHat, and all other rpm-based Linux distributions:
sudo yum install yum-utils
sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/stable/x86_64
ClickHouse installation from repository configured above:
sudo yum install clickhouse-server clickhouse-client
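The RPM packages do not start the server for you; assuming the standard systemd unit shipped with the package, you can start and enable it like this:

sudo systemctl enable clickhouse-server
sudo systemctl start clickhouse-server
clickhouse-client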
Installation and configuration of ClickHouse from source:
You can also build and install ClickHouse from source; the source code is available here: https://github.com/ChistaDATA/ClickHouse
Single server with docker:
Run server
docker run -d --name clickhouse-server -p 9000:9000 --ulimit nofile=262144:262144 yandex/clickhouse-server
Run client
docker run -it --rm --link clickhouse-server:clickhouse-server yandex/clickhouse-client --host clickhouse-server
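For a quick non-interactive smoke test, you can also run a one-off query inside the server container (the container name matches the run command above):

docker exec -it clickhouse-server clickhouse-client --query "SELECT 1"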
Step-by-step ClickHouse Cluster setup
- We will have 1 cluster with 3 shards in this setup
- Each shard will have 2 replica servers
- We are using ReplicatedMergeTree and Distributed tables for this setup
Cluster setup
The cluster is defined in the ClickHouse remote servers configuration (metrika.xml); we have copied it below for your reference:
<yandex>
    <clickhouse_remote_servers>
        <cluster_1>
            <shard>
                <weight>1</weight>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>clickhouse-01</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>clickhouse-06</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <weight>1</weight>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>clickhouse-02</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>clickhouse-03</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <weight>1</weight>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>clickhouse-04</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>clickhouse-05</host>
                    <port>9000</port>
                </replica>
            </shard>
        </cluster_1>
    </clickhouse_remote_servers>
    <zookeeper-servers>
        <node index="1">
            <host>clickhouse-zookeeper</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <networks>
        <ip>::/0</ip>
    </networks>
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method>
        </case>
    </clickhouse_compression>
</yandex>
Each server has its own macros.xml; the one for clickhouse-01 is copied below:
<yandex>
    <macros>
        <replica>clickhouse-01</replica>
        <shard>01</shard>
        <layer>01</layer>
    </macros>
</yandex>
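For comparison, clickhouse-02 is the first replica of the second shard in the cluster definition above, so its macros.xml would look like the following (values inferred from that layout; adjust to your own naming):

<yandex>
    <macros>
        <replica>clickhouse-02</replica>
        <shard>02</shard>
        <layer>01</layer>
    </macros>
</yandex>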
Note: Please confirm your macros settings are in sync with the remote server settings in metrika.xml.
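The containers themselves are started with docker-compose (see the commands below). The compose file is not reproduced in this post, so here is a minimal sketch of what it might look like for ZooKeeper and the first node; the image tags, volume paths, and config layout are assumptions to adapt to your environment:

version: "3"
services:
  clickhouse-zookeeper:
    image: zookeeper
    hostname: clickhouse-zookeeper
  clickhouse-01:
    image: yandex/clickhouse-server
    hostname: clickhouse-01
    depends_on:
      - clickhouse-zookeeper
    volumes:
      # mount the node-specific config (metrika.xml, macros.xml) into the container
      - ./config/clickhouse-01:/etc/clickhouse-server
  # clickhouse-02 through clickhouse-06 follow the same pattern
networks:
  default:
    external:
      name: clickhouse-net   # created manually with `docker network create` below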
Start the servers:
docker network create clickhouse-net
docker-compose up -d
Connect to the ClickHouse server to confirm the cluster settings are operational:
clickhouse-01 :) select * from system.clusters;

SELECT * FROM system.clusters

┌─cluster─────────────────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─────┬─host_address──┬─port─┬─is_local─┬─user────┬─default_database─┐
│ cluster_1                   │         1 │            1 │           1 │ clickhouse-01 │ 192.168.1.105 │ 9000 │        1 │ default │                  │
│ cluster_1                   │         1 │            1 │           2 │ clickhouse-06 │ 192.168.1.106 │ 9000 │        1 │ default │                  │
│ cluster_1                   │         2 │            1 │           1 │ clickhouse-02 │ 192.168.1.107 │ 9000 │        0 │ default │                  │
│ cluster_1                   │         2 │            1 │           2 │ clickhouse-03 │ 192.168.1.108 │ 9000 │        0 │ default │                  │
│ cluster_1                   │         3 │            1 │           1 │ clickhouse-04 │ 192.168.1.109 │ 9000 │        0 │ default │                  │
│ cluster_1                   │         3 │            1 │           2 │ clickhouse-05 │ 192.168.1.110 │ 9000 │        0 │ default │                  │
│ test_shard_localhost        │         1 │            1 │           1 │ localhost     │ 127.0.0.1     │ 9000 │        1 │ default │                  │
│ test_shard_localhost_secure │         1 │            1 │           1 │ localhost     │ 127.0.0.1     │ 9440 │        0 │ default │                  │
└─────────────────────────────┴───────────┴──────────────┴─────────────┴───────────────┴───────────────┴──────┴──────────┴─────────┴──────────────────┘
When you see output like this, the cluster setup is successful.
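The listing also shows ClickHouse's built-in sample clusters (test_shard_localhost and test_shard_localhost_secure). To inspect only the cluster we defined, filter on the cluster name:

SELECT cluster, shard_num, replica_num, host_name, is_local
FROM system.clusters
WHERE cluster = 'cluster_1';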
Replicated Table
We have now successfully configured the ClickHouse cluster and replica settings. Next, we have to create a ReplicatedMergeTree table as a local table on each server:
CREATE TABLE test_house (id Int32)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/test_house', '{replica}')
PARTITION BY id
ORDER BY id
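The {layer}, {shard}, and {replica} placeholders are expanded from each server's macros.xml, so on clickhouse-01 the ZooKeeper path becomes /clickhouse/tables/01-01/test_house. As a quick sanity check (not part of the original walkthrough), you can verify that replication is wired up by querying system.replicas from any node:

SELECT database, table, replica_name, zookeeper_path, is_leader
FROM system.replicas
WHERE table = 'test_house';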
Create a Distributed table connected to the local tables:
CREATE TABLE test_house_all as test_house ENGINE = Distributed(cluster_1, default, test_house, rand());
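Creating both tables on all six servers by hand is repetitive. ClickHouse's distributed DDL (ON CLUSTER) can issue the DDL to every node of the cluster in one statement; the following is a sketch of that alternative, not what the walkthrough above uses, and it assumes distributed DDL is enabled on the cluster:

CREATE TABLE test_house ON CLUSTER cluster_1 (id Int32)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/test_house', '{replica}')
PARTITION BY id
ORDER BY id;

CREATE TABLE test_house_all ON CLUSTER cluster_1 AS test_house
ENGINE = Distributed(cluster_1, default, test_house, rand());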
Test the setup with an INSERT script
Data generation and load:
# docker exec into the client container for server 1, then run:
for ((idx=1; idx<=100; ++idx)); do
    clickhouse-client --host clickhouse-server --query "INSERT INTO default.test_house_all VALUES ($idx)"
done
Count records on the Distributed table:
select count(*) from test_house_all;
Count records on the local table:
select count(*) from test_house;
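To see how those 100 rows were spread across the shards, you can group a query on the Distributed table by the executing host (hostName() is a standard ClickHouse function; each shard reports its own local count):

SELECT hostName() AS host, count(*) AS rows
FROM test_house_all
GROUP BY host;

Since rand() is the sharding key, each of the three shards should hold roughly a third of the rows.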