Introduction
This knowledge base article will guide you through converting an existing MergeTree table into a replicated table in ClickHouse.
This conversion helps achieve high availability and load sharing by letting you add more replicas to your database cluster.
Note: If you want to create a replication cluster from a single node, or add a new node to an existing replication setup, you can follow this knowledge base article: https://chistadata.com/?post_type=ht_kb&p=7152&preview=true
Table Overview
Before diving into the conversion process, let’s take a quick look at the structure of the original MergeTree table, named “orders,” which we will convert:
CREATE TABLE orders.orders
(
    `topics` LowCardinality(String),
    `IdentifierC` LowCardinality(String),
    `TimestampNic` DateTime,
    `resetTimestamp` Nullable(Int64),
    `timestamp_event` Int64,
    `imessage` Int64,
    `sourceevent` Int32,
    `eventID` Int64,
    `StatusComplete` UInt8,
    `Event_type` LowCardinality(String),
    `Event_fields_keys` Array(LowCardinality(String)),
    `EFields_values` Array(String)
)
ENGINE = ReplacingMergeTree
PARTITION BY (IdentifierC, Event_type, toDate(TimestampNic))
ORDER BY (IdentifierC, Event_type, TimestampNic, topics)
TTL TimestampNic + toIntervalDay(15);

┌─table──┬─compressed_size─┬─uncompressed_size─┬─ratio─┐
│ orders │ 10.65 GiB       │ 49.38 GiB         │  4.64 │
└────────┴─────────────────┴───────────────────┴───────┘
- The table uses the ReplacingMergeTree engine, which deduplicates rows sharing the same sorting key during background merges — useful for time-series data that receives updates.
- It is partitioned by “IdentifierC,” “Event_type,” and the date part of “TimestampNic.”
- The table is sorted by “IdentifierC,” “Event_type,” “TimestampNic,” and “topics” to improve query performance, particularly for range queries.
- A Time-to-Live (TTL) setting is defined to remove data older than 15 days.
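The compression figures shown above can be reproduced from system.parts. A minimal sketch (assuming the table lives in the “orders” database):

```sql
-- On-disk footprint of the active parts of the table
SELECT
    table,
    formatReadableSize(sum(data_compressed_bytes))   AS compressed_size,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.parts
WHERE database = 'orders' AND table = 'orders' AND active
GROUP BY table;
```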
Conversion Steps
Create a New Replicated Table:
First, create a new replicated table named “orders_replicated” using the ReplicatedReplacingMergeTree engine:
use orders;

CREATE TABLE orders.orders_replicated ON CLUSTER exp_prod_cluster
(
    `topics` LowCardinality(String),
    `IdentifierC` LowCardinality(String),
    `TimestampNic` DateTime,
    `resetTimestamp` Nullable(Int64),
    `timestamp_event` Int64,
    `imessage` Int64,
    `sourceevent` Int32,
    `eventID` Int64,
    `StatusComplete` UInt8,
    `Event_type` LowCardinality(String),
    `Event_fields_keys` Array(LowCardinality(String)),
    `EFields_values` Array(String)
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{cluster}/orders/orders', '{replica}')
PARTITION BY (IdentifierC, Event_type, toDate(TimestampNic))
ORDER BY (IdentifierC, Event_type, TimestampNic, topics)
TTL TimestampNic + toIntervalDay(15)
SETTINGS index_granularity = 8192;
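Before migrating data, it is worth confirming that the new table is registered with the coordination service on every node. A quick check via system.replicas (a sketch; column selection is illustrative):

```sql
-- Should return one row per replica, with active_replicas = total_replicas
SELECT database, table, replica_name, is_leader, total_replicas, active_replicas
FROM system.replicas
WHERE table = 'orders_replicated';
```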
Prepare the Attach Partition Query:
Generate the SQL statements that attach each partition from the original table to the newly created “orders_replicated” table. Because each partition is attached individually, the data is migrated part by part.
use orders;

clickhouse01 :) SELECT DISTINCT 'ALTER TABLE orders_replicated ATTACH PARTITION ID \'' || partition_id || '\' FROM orders;'
                FROM system.parts
                WHERE table = 'orders' AND active

SELECT DISTINCT concat('ALTER TABLE orders_replicated ATTACH PARTITION ID \'', partition_id, '\' FROM orders;')
FROM system.parts
WHERE (table = 'orders') AND active

Query id: e5fcedsc-435g-56tf-7ujg-nnde237e2632

┌─concat('ALTER TABLE orders_replicated ATTACH PARTITION ID \'', partition_id, '\' FROM orders;')───┐
│ ALTER TABLE orders_replicated ATTACH PARTITION ID '3117c8f06hsgahgfds4567827t583940' FROM orders; │
│ ALTER TABLE orders_replicated ATTACH PARTITION ID 'fscgff5432gsbtvzda4132u308262aq1' FROM orders; │
│ ALTER TABLE orders_replicated ATTACH PARTITION ID '543sabfdacd07cxvsaxfd24567sfrsgb' FROM orders; │
│ ALTER TABLE orders_replicated ATTACH PARTITION ID '7231gsycdtwbhyyev8652ajudxgxfsbs' FROM orders; │
│ ALTER TABLE orders_replicated ATTACH PARTITION ID '28765rfsghn023wdcvghgfvssssxcfds' FROM orders; │
│ ALTER TABLE orders_replicated ATTACH PARTITION ID '1qsyxtgbsvtyxhju7yt562ygbshfsfsg' FROM orders; │
│ ALTER TABLE orders_replicated ATTACH PARTITION ID '654edfghfedcv1bqgsatrfgbssw23q11' FROM orders; │
│ ALTER TABLE orders_replicated ATTACH PARTITION ID '876543wwtyjhfdsxcvbnhtr231234444' FROM orders; │
│ ALTER TABLE orders_replicated ATTACH PARTITION ID '1qazsw23edcvfr45tgbhy7uj8ikm9yde' FROM orders; │
│ ALTER TABLE orders_replicated ATTACH PARTITION ID '32wsedcyegbvcdr567uyghwystrfgxhg' FROM orders; │
│ ALTER TABLE orders_replicated ATTACH PARTITION ID '2238765edftyhgfcvb1qaz2wdc5tghgf' FROM orders; │
│ ALTER TABLE orders_replicated ATTACH PARTITION ID '6543ewdfghfedcv2wsdf4ed24rfvcxzss' FROM orders; │
│ ALTER TABLE orders_replicated ATTACH PARTITION ID '87654etyuhfrxfbhwysgdhdgddtdhbgg' FROM orders; │
│ ALTER TABLE orders_replicated ATTACH PARTITION ID '1qazxd34567trtyhgfdfghfrtyhbsvvv' FROM orders; │
---- output truncated -----
Execute the Partition Migration:
Execute the generated SQL statements to attach each partition to the replicated table:
clickhouse-client -u default --password='********' -d orders \
  -q "SELECT DISTINCT 'ALTER TABLE orders_replicated ATTACH PARTITION ID \'' || partition_id || '\' FROM orders;' FROM system.parts WHERE table = 'orders' AND active FORMAT TabSeparatedRaw" \
  | clickhouse-client -mn -u default --password='*********' -d orders
Verify Data Migration:
Verify the migration by comparing the row counts of the original and replicated tables. Also run the count on the other replicas and verify the schema there; the data should replicate within a few minutes, depending on table size.
SELECT count(1) FROM orders_replicated;
SELECT count(1) FROM orders;
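Beyond total row counts, you can compare the two tables partition by partition; an empty result means every partition matches. A sketch using system.parts (assumes both tables are in the “orders” database):

```sql
-- List partitions whose active row counts differ between the two tables
SELECT
    partition_id,
    sumIf(rows, table = 'orders')            AS rows_original,
    sumIf(rows, table = 'orders_replicated') AS rows_replicated
FROM system.parts
WHERE database = 'orders' AND table IN ('orders', 'orders_replicated') AND active
GROUP BY partition_id
HAVING rows_original != rows_replicated;
```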
Rename Tables:
After verifying successful data migration, rename the tables to make the replicated table the primary one:
RENAME TABLE orders TO orders_old, orders_replicated TO orders ON CLUSTER exp_prod_cluster;
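To confirm the rename took effect and that queries against “orders” now hit the replicated engine, a quick check against system.tables (a sketch):

```sql
-- "orders" should now report ReplicatedReplacingMergeTree;
-- "orders_old" should report the original ReplacingMergeTree
SELECT name, engine
FROM system.tables
WHERE database = 'orders' AND name IN ('orders', 'orders_old');
```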
Cleanup (Optional):
If everything works well with the replicated table, keep the “orders_old” copy as a backup for 2-4 days. After confirming the stability of the new setup, perform any remaining cleanup activities and drop the “orders_old” backup table.
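When you are ready to remove the backup, a single DDL statement is enough; the optional SYNC modifier waits until the data is actually removed from disk (shown here ON CLUSTER on the assumption that “orders_old” exists on every node after the cluster-wide rename):

```sql
-- Irreversible: run only after the replicated setup has proven stable
DROP TABLE orders.orders_old ON CLUSTER exp_prod_cluster SYNC;
```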
Conclusion:
Following these steps, you can seamlessly convert an existing MergeTree table into a replicated table in ClickHouse. This conversion improves availability and enables efficient load sharing across your database cluster.