Configuring ClickHouse Keeper
ClickHouse Keeper is a coordination service built into ClickHouse Server that supports ClickHouse replication and horizontal scalability across nodes and clusters, so you don’t have to install and configure ZooKeeper outside the ClickHouse infrastructure. In this blog post, we explain how to build and configure ClickHouse Keeper across 3 Linux nodes to evaluate distributed operations.
Configuring Nodes with Keeper settings
Step 1: Install ClickHouse on 3 Linux nodes – we will call them clickhousen1, clickhousen2 and clickhousen3.
Step 2: Add the following setting on each node to allow external communication through the network interface:
<listen_host>0.0.0.0</listen_host>
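On recent ClickHouse releases this setting lives under the <clickhouse> root element of config.xml, or in a drop-in file under /etc/clickhouse-server/config.d/. A minimal sketch of such a drop-in; the file name network.xml is just our choice:

<!-- /etc/clickhouse-server/config.d/network.xml (hypothetical file name) -->
<clickhouse>
    <!-- Listen on all interfaces so the other nodes can reach this server -->
    <listen_host>0.0.0.0</listen_host>
</clickhouse>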
Step 3: Configure ClickHouse Keeper on all three servers, setting <server_id> to “1” for clickhousen1, “2” for clickhousen2 and “3” for clickhousen3:
<keeper_server>
    <tcp_port>9181</tcp_port>
    <server_id>1</server_id>
    <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
    <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
    <coordination_settings>
        <operation_timeout_ms>10000</operation_timeout_ms>
        <session_timeout_ms>30000</session_timeout_ms>
        <raft_logs_level>warning</raft_logs_level>
    </coordination_settings>
    <raft_configuration>
        <server>
            <id>1</id>
            <hostname>clickhousen1.domain.com</hostname>
            <port>9444</port>
        </server>
        <server>
            <id>2</id>
            <hostname>clickhousen2.domain.com</hostname>
            <port>9444</port>
        </server>
        <server>
            <id>3</id>
            <hostname>clickhousen3.domain.com</hostname>
            <port>9444</port>
        </server>
    </raft_configuration>
</keeper_server>
A detailed description of the configuration parameters is given below (source: https://clickhouse.com/docs/en/guides/sre/clickhouse-keeper/):
Parameter | Description | Example |
---|---|---|
tcp_port | Port used by clients of Keeper | 9181 (default; equivalent to 2181 in ZooKeeper) |
server_id | Unique identifier for each ClickHouse Keeper server, used in the Raft configuration | 1 |
coordination_settings | Section for parameters such as timeouts | timeouts: 10000, log level: trace |
server | Definition of a server participating in the cluster | list of each server definition |
raft_configuration | Settings for each server in the Keeper cluster | server and settings for each |
id | Numeric ID of the server for Keeper services | 1 |
hostname | Hostname, IP or FQDN of each server in the Keeper cluster | clickhousen1.domain.com |
port | Port to listen on for interserver Keeper connections | 9444 |
Step 4: Enable the ZooKeeper component, which will use the ClickHouse Keeper engine (a sample <zookeeper> section is sketched after the table below):
Parameter | Description | Example |
---|---|---|
node | List of nodes for ClickHouse Keeper connections | settings entry for each server |
host | Hostname, IP or FQDN of each ClickHouse Keeper node | clickhousen1.domain.com |
port | ClickHouse Keeper client port | 9181 |
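Based on the parameters above, a minimal <zookeeper> section pointing ClickHouse at the three Keeper instances would look like the sketch below (the hostnames are the three nodes from this setup; adjust them to your environment):

<zookeeper>
    <!-- 9181 is the Keeper client port (tcp_port) configured in Step 3 -->
    <node>
        <host>clickhousen1.domain.com</host>
        <port>9181</port>
    </node>
    <node>
        <host>clickhousen2.domain.com</host>
        <port>9181</port>
    </node>
    <node>
        <host>clickhousen3.domain.com</host>
        <port>9181</port>
    </node>
</zookeeper>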
Step 5: Restart ClickHouse Server and verify that each Keeper instance is running. Execute the following command on each server; the ruok command returns imok if Keeper is running and healthy:
# echo ruok | nc localhost 9181; echo
imok
The system database has a table named zookeeper that exposes the metadata stored in your ClickHouse Keeper instances. Let’s query the table:
SELECT * FROM system.zookeeper WHERE path IN ('/', '/clickhouse')
The table will look like this:
┌─name───────┬─value─┬─czxid─┬─mzxid─┬───────────────ctime─┬───────────────mtime─┬─version─┬─cversion─┬─aversion─┬─ephemeralOwner─┬─dataLength─┬─numChildren─┬─pzxid─┬─path────────┐
│ clickhouse │       │   618 │   579 │ 2022-05-22 13:17:11 │ 2022-05-22 13:17:11 │       0 │        2 │        0 │              0 │          0 │           2 │  5693 │ /           │
│ task_queue │       │   791 │   681 │ 2022-05-22 13:17:11 │ 2022-05-22 13:17:11 │       0 │        1 │        0 │              0 │          0 │           1 │   126 │ /clickhouse │
│ tables     │       │  2139 │  1259 │ 2022-05-22 13:17:11 │ 2022-05-22 13:17:11 │       0 │        3 │        0 │              0 │          0 │           3 │  6461 │ /clickhouse │
└────────────┴───────┴───────┴───────┴─────────────────────┴─────────────────────┴─────────┴──────────┴──────────┴────────────────┴────────────┴─────────────┴───────┴─────────────┘
How to configure a cluster in ClickHouse?
In this example, we configure a very simple cluster with just 2 shards and a single replica each, placed on 2 of the nodes (please update the configuration on clickhousen1 and clickhousen2); the third node is used only to provide a quorum for ClickHouse Keeper. The following cluster definition builds 1 shard on each node, for a total of 2 shards with no replication, so some data will land on clickhousen1 and the rest on clickhousen2:
<cluster_2shards_1Repl>
    <shard>
        <replica>
            <host>clickhousen1.domain.com</host>
            <port>9000</port>
            <user>default</user>
            <password>ChistaDATA@12345</password>
        </replica>
    </shard>
    <shard>
        <replica>
            <host>clickhousen2.domain.com</host>
            <port>9000</port>
            <user>default</user>
            <password>ChistaDATA@12345</password>
        </replica>
    </shard>
</cluster_2shards_1Repl>
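For ClickHouse to recognise this cluster, the definition normally sits inside the <remote_servers> section of config.xml (or a config.d drop-in). A minimal sketch of the surrounding structure, with the cluster body exactly as shown above:

<clickhouse>
    <remote_servers>
        <cluster_2shards_1Repl>
            <!-- shard and replica definitions as listed above -->
        </cluster_2shards_1Repl>
    </remote_servers>
</clickhouse>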
Parameter | Description |
---|---|
shard | Larger datasets are split into smaller chunks and stored across multiple data nodes for scalability and reliability |
replica | Additional copies of the data kept across nodes for performance (mostly reads), scalability and reliability |
host | Hostname, IP or FQDN of the server |
port | Port used to connect to the server (9000 is the native protocol port) |
user | User used to log in to the cluster instances |
password | Password for the connection user |
Step 6: Restart ClickHouse Server and validate that the cluster was created successfully:
SHOW clusters;
Note: if the above steps completed successfully, you should see the cluster listed as shown below:
cluster_2shards_1Repl
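If you want more detail than SHOW CLUSTERS provides, the system.clusters table exposes the shard and replica layout; a quick sketch of such a query (the selected columns are just an illustrative subset):

SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'cluster_2shards_1Repl';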
Creating a distributed table to validate clustering
Step 1: Create a new database using ClickHouse Client on clickhousen1. Because the statement uses ON CLUSTER, the database will be created on both nodes:
CREATE DATABASE cdat1 ON CLUSTER 'cluster_2shards_1Repl';
Step 2: Create a new table in the cdat1 database:
CREATE TABLE cdat1.tab1 ON CLUSTER 'cluster_2shards_1Repl'
(
    `id` UInt64,
    `column1` String
)
ENGINE = MergeTree
ORDER BY column1
Step 3: On clickhousen1, insert the following rows:
INSERT INTO cdat1.tab1 (id, column1) VALUES (50, 'dec'), (51, 'jan')
Step 4: On clickhousen2, insert the following rows:
INSERT INTO cdat1.tab1 (id, column1) VALUES (53, 'mar'), (54, 'feb')
Please note: a SELECT on each node shows only the data stored on that node:
clickhousen1
┌─id─┬─column1─┐
│ 50 │ dec     │
│ 51 │ jan     │
└────┴─────────┘
clickhousen2
┌─id─┬─column1─┐
│ 53 │ mar     │
│ 54 │ feb     │
└────┴─────────┘
Step 5: To query the data on both shards together, you can create a Distributed table:
CREATE TABLE cdat1.dist_table
(
    id UInt64,
    column1 String
)
ENGINE = Distributed(cluster_2shards_1Repl, cdat1, tab1)
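The Distributed table itself stores no data and only needs to exist on the node where you run the federated queries. If you also want to INSERT through it and have ClickHouse spread the rows across both shards automatically, the engine accepts a sharding key as an optional fourth argument; a sketch under that assumption (the table name dist_table_rand is only for illustration):

CREATE TABLE cdat1.dist_table_rand ON CLUSTER 'cluster_2shards_1Repl'
(
    id UInt64,
    column1 String
)
ENGINE = Distributed(cluster_2shards_1Repl, cdat1, tab1, rand())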
Note: you can query the distributed table (cdat1.dist_table) to return all four rows of data from the two shards:
SELECT * FROM cdat1.dist_table

┌─id─┬─column1─┐
│ 50 │ dec     │
│ 51 │ jan     │
└────┴─────────┘
┌─id─┬─column1─┐
│ 53 │ mar     │
│ 54 │ feb     │
└────┴─────────┘
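As a quick sanity check, an aggregate over the distributed table should cover both shards; assuming the inserts above succeeded, the query below should report 4 rows:

SELECT count() FROM cdat1.dist_table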
Summary
The objective of this blog post was to walk you through the step-by-step installation and configuration of ClickHouse Keeper. Thanks for reading; your comments are welcome.
References: https://clickhouse.com/docs/en/guides/sre/