ClickHouse MergeTree: Introduction to ReplicatedMergeTree

Introduction

ClickHouse has MergeTree family of engines and data replication can be achieved through the replicated version of the MergeTree family engines. This replication works on an individual table level. ClickHouse has recently added support for database level replication via the Replicated database engine.

The Replicated database engine is only responsible for replicating the database and table metadata on the cluster. The data replication across the cluster happens only at the table level. That is, ReplicatedMergeTree* table engines are required to replicate the data. Let us look at an example of this Database engine.

Prerequisites:

  • 2 node ClickHouse cluster
  • Single node zookeeper

The below docker-compose file is used for this article.

version: '3'

services:
  zookeeper:
    image: 'bitnami/zookeeper:latest'
    networks:
      - ch_replicated
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
      - ZOOKEEPER_CLIENT_PORT=2181
    ports:
      - "2182:2181"
      - "2888:2888"
      - "3888:3888"
    
  clickhouse1:
    image: clickhouse/clickhouse-server
    ports:
      - "8002:9000"
      - "9123:8123"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    networks:
      - ch_replicated
    depends_on:
      - zookeeper
      
  clickhouse2:
    image: clickhouse/clickhouse-server
    ports:
      - "8003:9000"
      - "9124:8123"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    networks:
      - ch_replicated
    depends_on:
      - zookeeper
      
networks:
  ch_replicated:
    driver: bridge

Once the ClickHouse docker containers are up and running make the following changes in the config.xml file.

<remote_servers>
<replicated_cluster>
    <shard>
    <internal_replication>true</internal_replication>
    <replica>
        <host>clickhouse1</host>
        <port>9000</port>
    </replica>
    <replica>
        <host>clickhouse2</host>
        <port>9000</port>
    </replica>
    </shard>
</replicated_cluster>
</remote_servers>


<zookeeper>
  <node index="1">
    <host>zookeeper</host>
    <port>2181</port>
  </node>  
</zookeeper>

<macros>
<shard>01</shard>
<replica>ch1</replica>
</macros>

Steps:

  • Enable replicated database engine in users.xml file
<allow_experimental_database_replicated>1</allow_experimental_database_replicated>
  •  Create the database in both the nodes
CREATE DATABASE testdb ENGINE = Replicated('replicated_db', '{shard}', '{replica}');
  •  3. Create a replicated MergeTree table in one of the node
CREATE TABLE testdb.test1

(

col1 Int32,

col2 Int32)

Engine = ReplicatedMergeTree()

ORDER BY col1;
  • Insert the data in one node and verify in both the nodes
INSERT INTO testdb.test1

VALUES (1,1);

SELECT * FROM testdb.test1;
  • Create a MergeTree table in one of the node
CREATE TABLE testdb.test2

(

col1 Int32,

col2 Int32)

Engine = MergeTree()

ORDER BY col1;
  • Insert a row and verify
INSERT INTO testdb.test2

VALUES (1,1);

SELECT * FROM testdb.test2;

The MergeTree table will be available on the second server but the data will not be replicated automatically.

Conclusion

The new replicated database engine helps us to perform database level replication and will be quite useful in multi-node ClickHouse setup.

To read more about MergeTree in ClickHouse, do consider reading the following articles

References:

https://clickhouse.com/docs/en/engines/database-engines/replicated/