How to Restore/Recover ClickHouse after ZooKeeper Metadata is corrupted/lost?

How to Restore/Recover ClickHouse after ZooKeeper Metadata is corrupted/lost?


When you are building a highly available ClickHouse Infrastrstucture for scale and performance we use ZooKeeper for ClickHouse Replication Management and Operations with ReplicatedMergeTree table engine. You can expect ZooKeeper to face data loss (or a very complex corruption scenario) due to either human error or system crashes. This scenario is scary for any ClickHouse data infrastructure owner. This will stop/prevent writing to replicated tables though reading from replicated tables will not be interrupted. How can we troubleshoot now this ClickHouse outage? From ClickHouse 21.7, Troubleshooting such a situation has become direct with a new SQL command – SYSTEM RESTORE REPLICA

How to implement this magic?

The entire operation is handled or purely relies on ALTER TABLE ATTACH activity. So the first step is action begins with ALTER ATTACH which triggers GET_PART command on a replicated log. So while applying this command to itself, the replica will fetch the part from other replicated instances. The next action begins with ALTER ATTACH which starts ATTACH_PART command. In this process, it begins with a replica checking the detached folder for the missing part. the comparison of parts happens with checksums. If ever a part is found then the replica attaches the same from the detached folder by fetching it from another available replica.

If this process is invoked on a read-only ClickHouse instance or replica, the RESTORE interpreter will get all the data parts in all states. So it moves all parts to a detached folder clearing the internal state so that table with become empty.

In the next step, The interpreter will push the metadata of the table into ZooKeeper. Internally what will happen is only the first query succeed and the rest will be executed in multiple replicas parallel.

Note: If you try to attempt multiple RESTORE queries on a single replica it will trigger new error. So only one thread will be engaged in the restoration process.

Step-by-Step Restore of Replica

In this example, We have three replicas in a cluster:

CREATE TABLE table_for_restore(in UInt32)
    ENGINE = ReplicatedMergeTree('/clickhouse/tables/cdblab/', '{replica}')
    ORDER BY n PARTITION BY n % 10;

INSERT INTO table_for_restore SELECT * FROM numbers(1000);

You can manually delete or crash to lose all the metadata information across the replicas:

zk.delete("/clickhouse/tables/cdblab", recursive=True)

Restore and Restart 

The RESTORE query works only for read-only tables for several security and reliability reasons so when you restart a ClickHouse replica with ZooKeeper metadata missing, It can attach just read-only tables:

SYSTEM RESTART REPLICA cdblab;

This restores your missing ZooKeeper metadata so eithey you can run RESTORE query on each replicas manually or across the cluster:

SYSTEM RESTORE REPLICA cdblab; -- Single replica restoration
SYSTEM RESTORE REPLICA cdblab ON CLUSTER cdbcluster;

So restoring ZooKeeper metadata with ClickHouse 21.7 is direct and simple.

About ChistaDATA Inc. 10 Articles
We are an full-stack ClickHouse infrastructure operations Consulting, Support and Managed Services provider with core expertise in performance, scalability and data SRE. Based out of California, Our consulting and support engineering team operates out of San Francisco, Vancouver, London, Germany, Russia, Ukraine, Australia, Singapore and India to deliver 24*7 enterprise-class consultative support and managed services. We operate very closely with some of the largest and planet-scale internet properties like PayPal, Garmin, Honda cars IoT project, Viacom, National Geographic, Nike, Morgan Stanley, American Express Travel, VISA, Netflix, PRADA, Blue Dart, Carlsberg, Sony, Unilever etc