ClickHouse MergeTree: Introduction to ReplicatedMergeTree

Table of Contents

Introduction

ClickHouse has MergeTree family of engines and data replication can be achieved through the replicated version of the MergeTree family engines. This replication works on an individual table level. ClickHouse has recently added support for database level replication via the Replicated database engine.

The Replicated database engine is only responsible for replicating the database and table metadata on the cluster. The data replication across the cluster happens only at the table level. That is, ReplicatedMergeTree* table engines are required to replicate the data. Let us look at an example of this Database engine.

Prerequisites:

2 node ClickHouse cluster
Single node zookeeper

The below docker-compose file is used for this article.

version: '3'

services:
  zookeeper:
    image: 'bitnami/zookeeper:latest'
    networks:
      - ch_replicated
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
      - ZOOKEEPER_CLIENT_PORT=2181
    ports:
      - "2182:2181"
      - "2888:2888"
      - "3888:3888"
    
  clickhouse1:
    image: clickhouse/clickhouse-server
    ports:
      - "8002:9000"
      - "9123:8123"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    networks:
      - ch_replicated
    depends_on:
      - zookeeper
      
  clickhouse2:
    image: clickhouse/clickhouse-server
    ports:
      - "8003:9000"
      - "9124:8123"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    networks:
      - ch_replicated
    depends_on:
      - zookeeper
      
networks:
  ch_replicated:
    driver: bridge

Once the ClickHouse docker containers are up and running make the following changes in the config.xml file.

<remote_servers>
<replicated_cluster>
    <shard>
    <internal_replication>true</internal_replication>
    <replica>
        <host>clickhouse1</host>
        <port>9000</port>
    </replica>
    <replica>
        <host>clickhouse2</host>
        <port>9000</port>
    </replica>
    </shard>
</replicated_cluster>
</remote_servers>


<zookeeper>
  <node index="1">
    <host>zookeeper</host>
    <port>2181</port>
  </node>  
</zookeeper>

<macros>
<shard>01</shard>
<replica>ch1</replica>
</macros>

Steps:

Enable replicated database engine in users.xml file

<allow_experimental_database_replicated>1</allow_experimental_database_replicated>

Create the database in both the nodes

CREATE DATABASE testdb ENGINE = Replicated('replicated_db', '{shard}', '{replica}');

3. Create a replicated MergeTree table in one of the node

CREATE TABLE testdb.test1

(

col1 Int32,

col2 Int32)

Engine = ReplicatedMergeTree()

ORDER BY col1;

Insert the data in one node and verify in both the nodes

INSERT INTO testdb.test1

VALUES (1,1);

SELECT * FROM testdb.test1;

Create a MergeTree table in one of the node

CREATE TABLE testdb.test2

(

col1 Int32,

col2 Int32)

Engine = MergeTree()

ORDER BY col1;

Insert a row and verify

INSERT INTO testdb.test2

VALUES (1,1);

SELECT * FROM testdb.test2;

The MergeTree table will be available on the second server but the data will not be replicated automatically.

Conclusion

The new replicated database engine helps us to perform database level replication and will be quite useful in multi-node ClickHouse setup.

To read more about MergeTree in ClickHouse, do consider reading the following articles

References:

https://clickhouse.com/docs/en/engines/database-engines/replicated/

ChistaDATA Inc.

Enterprise-class 24*7 ClickHouse Consultative Support and Managed Services

ClickHouse MergeTree: Introduction to ReplicatedMergeTree

Introduction

Prerequisites:

Steps:

Conclusion