Building a Fault Tolerant ClickHouse Cluster with ZooKeeper

Previous post: Setup ClickHouse Cluster Replication with Zookeeper

A single-node ZooKeeper instance is suitable during development and testing. In production, however, it is recommended to run an odd number of nodes greater than 1 so that the ensemble can always maintain a quorum.
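
For example, ZooKeeper needs a strict majority of the ensemble to agree: with N nodes the quorum is floor(N/2) + 1. A 3-node ensemble needs 2 nodes and therefore tolerates 1 failure; a 5-node ensemble needs 3 and tolerates 2. A 4-node ensemble still needs 3, so it tolerates only 1 failure, which is why even-sized ensembles add cost without adding fault tolerance.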

ZooKeeper nodes can also be added or removed later based on the cluster size and load. To apply configuration changes without downtime, change and restart one node at a time while the remaining nodes continue serving traffic; the ensemble stays available throughout, as in the sketch below.
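
As a rough illustration, a rolling restart can be scripted along these lines (a minimal sketch assuming passwordless SSH to the hostnames configured below; adjust paths to your installation):

# Restart one ZooKeeper node at a time; wait until the restarted node
# answers the stat command with a Mode (leader/follower) line before
# moving on, so quorum is never lost.
for host in zookeeper1 zookeeper2 zookeeper3; do
    ssh "$host" 'sudo -u zookeeper /usr/share/zookeeper/bin/zkServer.sh restart'
    until ssh "$host" 'echo stat | nc localhost 2181' | grep -q '^Mode:'; do
        sleep 2
    done
done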

It is also recommended to run the ZooKeeper cluster on dedicated nodes and to use fast storage (SSD) for dataDir and dataLogDir to avoid latency.
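
A minimal zoo.cfg sketch for that layout, assuming hypothetical SSD mount points (the actual configuration used in this post keeps everything under /var/lib/zookeeper, as shown further down):

# Illustrative only: place snapshots (dataDir) and transaction logs
# (dataLogDir) on separate fast SSD volumes.
dataDir=/ssd/zookeeper/data
dataLogDir=/ssd/zookeeper/datalog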

Configure and Start Zookeeper

Install the dependencies and ZooKeeper packages using apt-get, and use the hostnamectl utility to set a hostname on each node for easy identification.

root@ip-10-0-11-49:~#  sudo apt-get update
root@ip-10-0-11-49:~#  sudo apt-get install zookeeper netcat

root@ip-10-0-11-49:~# cat /etc/hosts
10.0.11.49 zookeeper1
10.0.8.194 zookeeper2
10.0.4.205 zookeeper3

root@ip-10-0-11-49:~# hostnamectl set-hostname zookeeper1
root@ip-10-0-8-194:~# hostnamectl set-hostname zookeeper2
root@ip-10-0-4-205:~# hostnamectl set-hostname zookeeper3

root@ip-10-0-11-49:~# reboot
Configure /etc/hosts on all nodes in the cluster.

Configure the three-node ZooKeeper cluster:

  1. The myid must be unique on each node.
  2. Keep the zoo.cfg file the same across all nodes in the cluster.
  3. Apply the optimizations listed below.
  4. Start the service.
# Configure this id; it must be unique across the cluster.
root@zookeeper1:~# cat /etc/zookeeper/conf/myid
1
root@zookeeper2:~# cat /etc/zookeeper/conf/myid
2
root@zookeeper3:~# cat /etc/zookeeper/conf/myid
3

# Keep the configuration below the same on all nodes in the cluster: zookeeper1, zookeeper2, zookeeper3
root@zookeeper1:~# sudo cat /etc/zookeeper/conf/zoo.cfg 
tickTime=2000
initLimit=20
syncLimit=10
dataDir=/var/lib/zookeeper
clientPort=2181
maxSessionTimeout=60000000
maxClientCnxns=2000
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
autopurge.purgeInterval=1
autopurge.snapRetainCount=10
4lw.commands.whitelist=*
preAllocSize=131072
snapCount=3000000

## start service
root@ip-10-0-11-49:~# sudo -u zookeeper /usr/share/zookeeper/bin/zkServer.sh restart 
ZooKeeper JMX enabled by default 
Using config: /etc/zookeeper/conf/zoo.cfg 
Starting zookeeper ... STARTED

Verify that ZooKeeper is running by executing the commands below.

  1. Use the ruok, mntr, and stat commands.
  2. Try making a connection to a ZooKeeper node.
  3. Check the node status using `/usr/share/zookeeper/bin/zkServer.sh status`.

echo ruok | nc localhost 2181
echo mntr | nc localhost 2181
echo stat | nc localhost 2181

# Connect to ZooKeeper
/usr/share/zookeeper/bin/zkCli.sh -server 127.0.0.1:2181

# Expected output:
root@zookeeper1:~# echo ruok | nc localhost 2181
imokroot@zookeeper1:~#

root@zookeeper1:~# echo mntr | nc localhost 2181
zk_version 3.4.13-6ubuntu4--1, built on Wed, 23 Mar 2022 13:40:46 +0000
zk_avg_latency 0
zk_max_latency 11
zk_min_latency 0
zk_packets_received 5108
zk_packets_sent 5115
zk_num_alive_connections 2
zk_outstanding_requests 0
zk_server_state leader
zk_znode_count 191
zk_watch_count 7
zk_ephemerals_count 6
zk_approximate_data_size 20568
zk_open_file_descriptor_count 37
zk_max_file_descriptor_count 1048576
zk_fsync_threshold_exceed_count 0
zk_followers 2
zk_synced_followers 2
zk_pending_syncs 0
zk_last_proposal_size 36
zk_max_proposal_size 409
zk_min_proposal_size 32

root@zookeeper1:~# echo stat | nc localhost 2181
Zookeeper version: 3.4.13-6ubuntu4--1, built on Wed, 23 Mar 2022 13:40:46 +0000
Clients:
/10.0.12.141:58302[1](queued=0,recved=2498,sent=2498)
/0:0:0:0:0:0:0:1:39064[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/11
Received: 5198
Sent: 5205
Connections: 2
Outstanding: 0
Zxid: 0x1000000d6
Mode: leader
Node count: 191
Proposal sizes last/min/max: 36/32/409
root@zookeeper1:~#

## You can also test by making a client connection
root@zookeeper1:~# /usr/share/zookeeper/bin/zkCli.sh -server 127.0.0.1:2181
Connecting to 127.0.0.1:2181
Welcome to ZooKeeper!
JLine support is enabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: 127.0.0.1:2181(CONNECTED) 0]

## After successful installation, one node will report Mode: leader and the other two will report Mode: follower

root@zookeeper1:~# /usr/share/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg

Mode: follower
root@zookeeper1:~#

root@zookeeper2:~# /usr/share/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg

Mode: leader

root@zookeeper3:~# /usr/share/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg

Mode: follower
root@zookeeper3:~#
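
To see the role of every node from a single host, you can loop over the ensemble with the same stat command used above (a small convenience sketch; hostnames as configured earlier):

# Print the mode (leader/follower) reported by each ZooKeeper node.
for host in zookeeper1 zookeeper2 zookeeper3; do
    echo -n "$host: "
    echo stat | nc "$host" 2181 | grep '^Mode:'
done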

Configuring ClickHouse to use ZooKeeper

Let the ClickHouse server know about each ZooKeeper node in the cluster by making the changes below in an XML configuration file.

root@clickhouse1:/etc/clickhouse-server/config.d# cat zookeeper.xml
<yandex>
    <zookeeper>
        <node>
            <host>zookeeper1</host>
            <port>2181</port>
        </node>
        <node>
            <host>zookeeper2</host>
            <port>2181</port>
        </node>
        <node>
            <host>zookeeper3</host>
            <port>2181</port>
        </node>
        <session_timeout_ms>30000</session_timeout_ms>
        <operation_timeout_ms>10000</operation_timeout_ms>
        <!-- Optional. Chroot suffix. Should exist. -->
        <!-- <root>/path/to/zookeeper/node</root> -->
        <!-- Optional. ZooKeeper digest ACL string. -->
        <!-- <identity>user:password</identity> -->
    </zookeeper>
    <!-- Allow to execute distributed DDL queries (CREATE, DROP, ALTER, RENAME) on cluster. -->
    <!-- Works only if ZooKeeper is enabled. Comment it out if such functionality isn't required. -->
    <distributed_ddl>
        <!-- Path in ZooKeeper to queue with DDL queries -->
        <path>/clickhouse/task_queue/ddl</path>

        <!-- Settings from this profile will be used to execute DDL queries -->
        <!-- <profile>default</profile> -->
    </distributed_ddl>
</yandex>
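
The ON CLUSTER `{cluster}` and ReplicatedMergeTree statements later in this post also rely on the remote_servers definition and the substitution macros that were set up in the previous post. As a reminder, a macros.xml along those lines might look like this on the first replica (values are illustrative and must differ per node):

root@clickhouse1:/etc/clickhouse-server/config.d# cat macros.xml
<yandex>
    <macros>
        <cluster>cluster_demo_ash</cluster>
        <shard>1</shard>
        <replica>clickhouse1</replica>
    </macros>
</yandex>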


Restart ClickHouse and check the status of the cluster.
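
One quick sanity check is to confirm that ClickHouse can reach the ZooKeeper ensemble by querying the system.zookeeper table (assuming ClickHouse runs as a systemd service; any error here points to a connectivity or configuration problem):

# Restart ClickHouse so it picks up zookeeper.xml, then list the ZooKeeper root.
sudo systemctl restart clickhouse-server
clickhouse-client --query "SELECT name FROM system.zookeeper WHERE path = '/'"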

Verify Setup

clickhouse2 :) SELECT
                   hostName(),
                   *
               FROM clusterAllReplicas('cluster_demo_ash', system, one)

SELECT
    hostName(),
    *
FROM clusterAllReplicas('cluster_demo_ash', system, one)

Query id: 212b066b-91c3-4ac9-84e3-3253fd4b50dc

┌─hostName()──┬─dummy─┐
│ clickhouse2 │     0 │
└─────────────┴───────┘
┌─hostName()──┬─dummy─┐
│ clickhouse1 │     0 │
└─────────────┴───────┘

2 rows in set. Elapsed: 0.005 sec.

clickhouse1 :) CREATE DATABASE IF NOT EXISTS Example_DB_3 ON CLUSTER `{cluster}`

CREATE DATABASE IF NOT EXISTS Example_DB_3 ON CLUSTER `{cluster}`

Query id: 65eee69c-e7ef-441a-bd97-6ab72d8b0e0d

┌─host────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse2 │ 9000 │      0 │       │                   1 │                0 │
│ clickhouse1 │ 9000 │      0 │       │                   0 │                0 │
└─────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

2 rows in set. Elapsed: 0.122 sec.

clickhouse1 :) CREATE TABLE Example_DB_3.product ON CLUSTER '{cluster}'
                              (
                                   created_at DateTime,
                                   product_id UInt32,
                                   category UInt32
                              ) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/Example_DB_3/product', '{replica}')
                              PARTITION BY toYYYYMM(created_at)
                              ORDER BY (product_id, toDate(created_at), category)
                              SAMPLE BY category;

CREATE TABLE Example_DB_3.product ON CLUSTER `{cluster}`
(
    `created_at` DateTime,
    `product_id` UInt32,
    `category` UInt32
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/Example_DB_3/product', '{replica}')
PARTITION BY toYYYYMM(created_at)
ORDER BY (product_id, toDate(created_at), category)
SAMPLE BY category

Query id: 2e57061c-9b89-474a-b987-a654986f1e5d

┌─host────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse2 │ 9000 │      0 │       │                   1 │                0 │
│ clickhouse1 │ 9000 │      0 │       │                   0 │                0 │
└─────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

2 rows in set. Elapsed: 0.285 sec.

clickhouse2 :) INSERT INTO Example_DB_3.product VALUES (now(),9832,11);

INSERT INTO Example_DB_3.product FORMAT Values

Query id: d5f7870f-76ce-4da4-acb8-22e67a599dab

Ok.

1 row in set. Elapsed: 0.010 sec.

clickhouse2 :) INSERT INTO Example_DB_3.product VALUES (now(),123,12);

INSERT INTO Example_DB_3.product FORMAT Values

Query id: 1bcea5df-9c62-4d2d-a6a5-6791204fa7e8

Ok.

1 row in set. Elapsed: 0.010 sec.

clickhouse2 :) INSERT INTO Example_DB_3.product VALUES (now(),0987,13);

INSERT INTO Example_DB_3.product FORMAT Values

Query id: 7d84d232-1ac2-45c7-909e-8aed69339e88

Ok.

1 row in set. Elapsed: 0.010 sec.


clickhouse1 :) select * from Example_DB_3.product;

SELECT *
FROM Example_DB_3.product

Query id: 475fd914-2bbd-470a-956e-f90c836dbb24

┌──────────created_at─┬─product_id─┬─category─┐
│ 2023-04-24 08:02:17 │        123 │       12 │
└─────────────────────┴────────────┴──────────┘
┌──────────created_at─┬─product_id─┬─category─┐
│ 2023-04-24 08:02:12 │       9832 │       11 │
└─────────────────────┴────────────┴──────────┘
┌──────────created_at─┬─product_id─┬─category─┐
│ 2023-04-24 08:02:21 │        987 │       13 │
└─────────────────────┴────────────┴──────────┘

3 rows in set. Elapsed: 0.003 sec.
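
Replication health can also be checked through the system.replicas table, for example (column list trimmed for readability):

clickhouse-client --query "
    SELECT database, table, is_leader, is_readonly, total_replicas, active_replicas
    FROM system.replicas
    WHERE database = 'Example_DB_3'
    FORMAT Vertical"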


clickhouse2 :) select * from system.zookeeper WHERE path = '/clickhouse/tables/cluster_demo_ash/1/default/test' FORMAT Vertical

SELECT *
FROM system.zookeeper
WHERE path = '/clickhouse/tables/cluster_demo_ash/1/default/test'
FORMAT Vertical

Query id: 83ad70ff-1417-4a44-a16a-1a58d9fdd5d0

Row 1:
──────
name:  alter_partition_version
value:
path:  /clickhouse/tables/cluster_demo_ash/1/default/test

Row 2:
──────
name:  metadata
value: metadata format version: 1
date column:
sampling expression: userid
index granularity: 8192
mode: 0
sign column:
primary key: contractid, toDate(timestamp), userid
data format version: 1
partition key: toYYYYMM(timestamp)
granularity bytes: 10485760

path:  /clickhouse/tables/cluster_demo_ash/1/default/test

Row 3:
──────
name:  temp
value:
path:  /clickhouse/tables/cluster_demo_ash/1/default/test

Row 4:
──────
name:  table_shared_id
value: d4497dd6-48f3-4a78-ac0e-615cf9ff1450
path:  /clickhouse/tables/cluster_demo_ash/1/default/test

Row 5:
──────
name:  log
value:
path:  /clickhouse/tables/cluster_demo_ash/1/default/test

Row 6:
──────
name:  leader_election
value:
path:  /clickhouse/tables/cluster_demo_ash/1/default/test

Row 7:
──────
name:  columns
value: columns format version: 1
3 columns:
`timestamp` DateTime
`contractid` UInt32
`userid` UInt32

path:  /clickhouse/tables/cluster_demo_ash/1/default/test

Row 8:
──────
name:  blocks
value:
path:  /clickhouse/tables/cluster_demo_ash/1/default/test

Row 9:
───────
name:  async_blocks
value:
path:  /clickhouse/tables/cluster_demo_ash/1/default/test

Row 10:
───────
name:  nonincrement_block_numbers
value:
path:  /clickhouse/tables/cluster_demo_ash/1/default/test

Row 11:
───────
name:  replicas
value: last added replica: clickhouse1
path:  /clickhouse/tables/cluster_demo_ash/1/default/test

Row 12:
───────
name:  quorum
value:
path:  /clickhouse/tables/cluster_demo_ash/1/default/test

Row 13:
───────
name:  pinned_part_uuids
value: {"part_uuids":"[]"}
path:  /clickhouse/tables/cluster_demo_ash/1/default/test

Row 14:
───────
name:  block_numbers
value:
path:  /clickhouse/tables/cluster_demo_ash/1/default/test

Row 15:
───────
name:  mutations
value:
path:  /clickhouse/tables/cluster_demo_ash/1/default/test

Row 16:
───────
name:  part_moves_shard
value:
path:  /clickhouse/tables/cluster_demo_ash/1/default/test

16 rows in set. Elapsed: 0.004 sec.

clickhouse2 :)


To dockerize your application with the ClickHouse cluster and ZooKeeper nodes, please refer to https://github.com/ashwini-ahire7/clickhouse-replication-zookeeper_HA
