Data replication is the process of storing multiple copies of the same data to improve system reliability and data availability. ClickHouse supports multi-primary replication, and it is asynchronous (eventually consistent). Every MergeTree-family table engine has a replicated counterpart; other engines do not support data replication.
Replicated tables require Apache Zookeeper or ClickHouse Keeper. Creation and deletion of replicated tables should happen on the individual nodes of the cluster. Data can be inserted into and read from any node in the cluster. Let us look at the steps to create a replicated table in a ClickHouse cluster. I am using Docker and Apache Zookeeper to demonstrate the steps involved. Ensure that Docker Engine and Docker Compose are installed and working.
Docker-based ClickHouse cluster
Let us set up a 3-node ClickHouse cluster and a single-node Zookeeper. I have used the docker-compose file below for this setup.
version: '3'
services:
  zookeeper:
    image: 'bitnami/zookeeper:latest'
    networks:
      - ch_replicated
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
      - ZOOKEEPER_CLIENT_PORT=2181
    ports:
      - "2182:2181"
      - "2888:2888"
      - "3888:3888"
  clickhouse1:
    image: clickhouse/clickhouse-server
    ports:
      - "8002:9000"
      - "9123:8123"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    networks:
      - ch_replicated
    depends_on:
      - zookeeper
  clickhouse2:
    image: clickhouse/clickhouse-server
    ports:
      - "8003:9000"
      - "9124:8123"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    networks:
      - ch_replicated
    depends_on:
      - zookeeper
  clickhouse3:
    image: clickhouse/clickhouse-server
    ports:
      - "8004:9000"
      - "9125:8123"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    networks:
      - ch_replicated
    depends_on:
      - zookeeper
networks:
  ch_replicated:
    driver: bridge
We are creating a bridge network (ch_replicated) for a seamless networking experience across the containers. Fire up the containers using the docker-compose up command and verify that all the containers are up and running.
vijayanand@chistadata:/media/vijayanand/Data/ClickHouse/ClickHouse_Replication$ sudo docker container ls
[sudo] password for vijayanand:
CONTAINER ID   IMAGE                          COMMAND                  CREATED      STATUS          PORTS                                                                                                                                       NAMES
f60b856f1fb5   clickhouse/clickhouse-server   "/entrypoint.sh"         5 days ago   Up 58 seconds   9009/tcp, 0.0.0.0:9124->8123/tcp, :::9124->8123/tcp, 0.0.0.0:8003->9000/tcp, :::8003->9000/tcp                                              clickhouse_replication_clickhouse2_1
a04d155475e4   clickhouse/clickhouse-server   "/entrypoint.sh"         5 days ago   Up 58 seconds   9009/tcp, 0.0.0.0:9123->8123/tcp, :::9123->8123/tcp, 0.0.0.0:8002->9000/tcp, :::8002->9000/tcp                                              clickhouse_replication_clickhouse1_1
b222f8b44db8   clickhouse/clickhouse-server   "/entrypoint.sh"         5 days ago   Up 58 seconds   9009/tcp, 0.0.0.0:9125->8123/tcp, :::9125->8123/tcp, 0.0.0.0:8004->9000/tcp, :::8004->9000/tcp                                              clickhouse_replication_clickhouse3_1
69527fd60266   bitnami/zookeeper:latest       "/opt/bitnami/script…"   5 days ago   Up 58 seconds   0.0.0.0:2888->2888/tcp, :::2888->2888/tcp, 0.0.0.0:3888->3888/tcp, :::3888->3888/tcp, 8080/tcp, 0.0.0.0:2182->2181/tcp, :::2182->2181/tcp   clickhouse_replication_zookeeper_1
Configuring the ClickHouse cluster for replicated tables
Configure the following in the config.xml file on every node:
- Cluster config
- Zookeeper config
- Macros (optional)
Cluster Config
Add the following to the config.xml file on every node to configure the multi-node ClickHouse cluster.
<remote_servers>
    <replicated_cluster>
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>clickhouse1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>clickhouse2</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>clickhouse3</host>
                <port>9000</port>
            </replica>
        </shard>
    </replicated_cluster>
</remote_servers>
This config tells each ClickHouse server about the existence of the other ClickHouse nodes. The hostname and port details are based on the service definitions in the docker-compose file.
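Once the servers have picked up this config, the cluster definition can be verified from any node. As a quick sanity check (the cluster name replicated_cluster matches the config above):

SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'replicated_cluster';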
Zookeeper Config
Add the details of the Zookeeper cluster in every node’s config file.
<zookeeper>
    <node index="1">
        <host>zookeeper</host>
        <port>2181</port>
    </node>
</zookeeper>
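To confirm that ClickHouse can reach Zookeeper, the system.zookeeper table can be queried from any node (a quick check; the table requires a path condition in the WHERE clause):

SELECT name, path
FROM system.zookeeper
WHERE path = '/';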
Macros
Macro substitutions are commonly used while configuring the Zookeeper path of replicated tables.
<macros>
    <shard>01</shard>
    <replica>ch1</replica>
</macros>
The macro values must vary per node: the replica value has to be unique on every node (for example, ch1, ch2, and ch3), while the shard value stays 01 here because all three replicas belong to the same shard.
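The values picked up by each node can be checked from the system.macros table:

SELECT macro, substitution FROM system.macros;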
Creating and testing a replicated table
Let us create a replicated table with just two columns. Repeat the CREATE statement on every configured node.
CREATE TABLE replicated_example
(
    ID UInt32,
    Name String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/replicated_example_1', '{replica}')
ORDER BY (ID)
The table creation syntax is similar to that of the non-replicated MergeTree counterpart. Along with the regular parameters, we have to provide the Zookeeper path (the znode path where the replication metadata will be stored) and the replica name. The {replica} placeholder is substituted from the macro we configured earlier.
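The shard macro can be used in the Zookeeper path as well, so that the same CREATE statement works unchanged across nodes and shards. A sketch of such a variant (the table name replicated_example_macros is only illustrative):

-- Illustrative variant; the path uses the {shard} macro configured above
CREATE TABLE replicated_example_macros
(
    ID UInt32,
    Name String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/replicated_example_macros', '{replica}')
ORDER BY (ID)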
Once the table creation is successful, try inserting some data into the table from one node; the data will automatically be available on the rest of the configured nodes, as in the sketch below.
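A minimal check, assuming the table created above (the sample rows are illustrative; run the INSERT on one node and the SELECT on another):

-- On clickhouse1
INSERT INTO replicated_example VALUES (1, 'Alice'), (2, 'Bob');

-- On clickhouse2 or clickhouse3
SELECT * FROM replicated_example ORDER BY ID;

-- Replication status of the table on the current node
SELECT database, table, is_leader, total_replicas, active_replicas
FROM system.replicas
WHERE table = 'replicated_example';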
Conclusion
- Replicated versions of all the MergeTree family engines are available in ClickHouse.
- Replication requires Zookeeper or ClickHouse Keeper.
- DDL commands (CREATE, DROP) need to be run on every node.
- DML and DQL commands (ALTER, INSERT, and SELECT) can be run on any one of the nodes; the changes are replicated to the other nodes.