In ClickHouse, replication does not protect the system from accidental deletes on tables or some records on the tables. For that reason, backup and recovery strategies are crucial for companies.
In this article, I will discuss the ClickHouse native“Backup/Restore” method for protecting database(s) and table(s) from accidental deletions.
You can find another method for a backup/restore strategy here.
Environment
OS : Ubuntu Linux 22.04
DB: ClickHouse v22.9.2.7
Preparations
First of all, we need to specify the backup/restore location where the ClickHouse server runs. To do that, we need to create a “backup_disk.xml” file under ClickHouse’s “config.d” folder. For the default installation, the full path is : “/etc/clickhouse-server/config.d/“.
As shown below, in our example, “/clickHouse/backups” is the backup location and named as “backups“. You can use or create any folder for backup and restore operations.
<clickhouse> <storage_configuration> <disks> <backups> <type>local</type> <path>/clickHouse/backups/</path> </backups> </disks> </storage_configuration> <backups> <allowed_disk>backups</allowed_disk> <allowed_path>/clickHouse/backups/</allowed_path> </backups> </clickhouse>
We need to create a user and give permission for backup and restore operations. Therefore, we connect to the ClickHouse as the “default” user and create a backup user. But first, we have to check the “access_management” parameter in “/etc/clickhouse-server/users.xml“.
If this parameter is commented or the value for it is not “1”, we have to change the configuration as shown below and restart the ClickHouse DB.
<access_management>1</access_management>
Then, it is time to create the “backupAdmin” user and give it the related permissions as well.
clickhouse01 :) create user backupAdmin identified with SHA256_PASSWORD by '<your_password>'; clickhouse01 :) grant CREATE TABLE,CREATE DATABASE on *.* to backupAdmin; clickhouse01 :) grant INSERT ON *.* to backupAdmin; clickhouse01 :) grant backup on *.* to backupAdmin;
For using backup and restore tests, we create a database called “sampleDatasets” . After that, we create the “opensky” table under “sampleDatasets” and insert data as explained here. Use “default” or “admin” users for these operations.
Note: In the scope of these tests, we installed the 4 month data from the “opensky” dataset.
clickhouse01 :) create database sampleDatasets clickhouse01 :) CREATE TABLE sampleDatasets.opensky ( callsign String, number String, icao24 String, registration String, typecode String, origin String, destination String, firstseen DateTime, lastseen DateTime, day DateTime, latitude_1 Float64, longitude_1 Float64, altitude_1 Float64, latitude_2 Float64, longitude_2 Float64, altitude_2 Float64 ) ENGINE = MergeTree ORDER BY (origin, destination, callsign);
Steps
Now, it is time to backup table with “backupAdmin” user.
clickhouse01 :) BACKUP TABLE sampleDatasets.opensky TO Disk('backups','opensky_20220929.bak'); BACKUP TABLE sampleDatasets.opensky TO Disk('backups', 'opensky_20220929.bak') Query id: f3e01544-b48f-43c2-ba17-8077d187a986 ┌─id───────────────────────────────────┬─status─────────┐ │ 9f9053bf-1a38-4f76-b260-12bacfcd5aca │ BACKUP_CREATED │ └──────────────────────────────────────┴────────────────┘ 1 row in set. Elapsed: 4.323 sec.
At the same time, while the backup is running, I deleted a record from the “opensky” table from another terminal with the “default” user and the backup did not affect or the table was not locked at that time.
clickhouse01 :) select count(*) from sampleDatasets.opensky where registration='N661JM'; ┌─count()─┐ │ 1 │ └─────────┘ clickhouse01 :) set allow_experimental_lightweight_delete=1; clickhouse01 :) delete from sampleDatasets.opensky where registration='N661JM'; Ok. clickhouse01 :) select count(*) from sampleDatasets.opensky where registration='N661JM'; ┌─count()─┐ │ 0 │ └─────────┘
You can check backup operations from “system.backups” system table with backup id.
clickhouse01 :) select id,status,start_time,end_time from system.backups where id='9f9053bf-1a38-4f76-b260-12bacfcd5aca'; SELECT id, status, start_time, end_time FROM system.backups WHERE id = '9f9053bf-1a38-4f76-b260-12bacfcd5aca' Query id: d552f726-0ebc-445e-8d95-0ae0869d3940 ┌─id───────────────────────────────────┬─status─────────┬──────────start_time─┬────────────end_time─┐ │ 9f9053bf-1a38-4f76-b260-12bacfcd5aca │ BACKUP_CREATED │ 2022-09-29 10:13:17 │ 2022-09-29 10:13:21 │ └──────────────────────────────────────┴────────────────┴─────────────────────┴─────────────────────┘
It is time to restore the table, so I expect to see the record with the ‘N661JM‘ registration ID,which was deleted while the backup was running, after the table is restored.
For that reason, I truncate or drop table(or both at time same time) with “default” user at first.
clickhouse01 :) truncate table sampleDatasets.opensky; Ok. clickhouse01 :) drop table sampleDatasets.opensky; Ok.
Then, restore the table with “backupAdmin” user.
RESTORE TABLE sampleDatasets.opensky FROM Disk('backups', 'opensky_20220929.bak') Query id: eb4e5033-a0ac-4931-aa68-083c096d22f4 ┌─id───────────────────────────────────┬─status───┐ │ 56299e68-8fa0-4129-8dfe-a5511f2bdf47 │ RESTORED │ └──────────────────────────────────────┴──────────┘
Finally, I queried for the “opensky” table if it exists and has data.
clickhouse01 :) select count(*) from sampleDatasets.opensky; ┌──count()─┐ │ 11348850 │ └──────────┘
Afterward, I checked the record that I deleted during the backup process.
clickhouse01 :) select count(*) from sampleDatasets.opensky where registration='N661JM'; ┌─count()─┐ │ 1 │ └─────────┘
I have also tested “Truncate” or “Drop Table” while the backup is running. In all of the scenarios, I was able to restore the tables.
Backup and restore utilities are not only used for tables but also for databases and even materialized views and dictionaries as well. You can backup tables or databases full or incrementally, compress backups, and secure them with a password.