Introduction
Backup is a way to protect and restore databases. It is a safeguard against unexpected data loss and application errors. Replication protects databases from hardware or connection failures, but it does not help with user errors, such as accidentally dropping a table or a database, or with software bugs, such as incorrect data processing and data corruption, because these mistakes propagate to every node in the cluster, including replicas.
Every company should prepare a backup and restore strategy, but there is no magic solution that fits every situation. In ClickHouse, we can use a variety of backup and restore methods. Some of them are:
- Backup / Restore
- Export / Import (see the related KB article)
- Duplicating Source Data
- Filesystem Snapshots
- Using a tool (clickhouse-copier)
- Manipulations with Parts
Every method has its own pros and cons. It is important to choose one or more suitable methods for your company.
This article aims to explain the ClickHouse Backup/Restore method.
Environments
For the demos in this article, ClickHouse v22.9.3 on Ubuntu Linux 22.04 was used.
Backup & Restore
The first step in configuring the backup solution is to specify the backup location. As a backup destination, you can use a local directory, an NFS-mounted filesystem, or even cloud storage such as Amazon S3. To define it, we need to create an XML file under “/etc/clickhouse-server/config.d“.
The local directory configuration file is shown below. In this example, “/clickHouse/backups/” is the folder where backups are stored; you can use any other path. We also named the backup disk “backups”, and this name is what we reference in the backup command.
<clickhouse>
    <storage_configuration>
        <disks>
            <backups>
                <type>local</type>
                <path>/clickHouse/backups/</path>
            </backups>
        </disks>
    </storage_configuration>
    <backups>
        <allowed_disk>backups</allowed_disk>
        <allowed_path>/clickHouse/backups/</allowed_path>
    </backups>
</clickhouse>
Once this XML file is added, you need to restart the ClickHouse server. After that, log in to the database and check whether the backup disk is configured. As you can see, the “backups” disk with the given path is now available.
clickhouse01 :) SELECT
                    name,
                    path,
                    formatReadableSize(free_space) AS free,
                    formatReadableSize(total_space) AS total,
                    formatReadableSize(keep_free_space) AS reserved
                FROM system.disks

Query id: c45d1937-d10a-494f-b532-08d4ed9f06d7

┌─name────┬─path─────────────────┬─free──────┬─total─────┬─reserved─┐
│ backups │ /clickHouse/backups/ │ 17.84 GiB │ 28.89 GiB │ 0.00 B   │
│ default │ /var/lib/clickhouse/ │ 17.84 GiB │ 28.89 GiB │ 0.00 B   │
└─────────┴──────────────────────┴───────────┴───────────┴──────────┘
Also, it is possible to configure S3 as a backup disk. Of course, we need an active AWS S3 bucket and keys to access it.
<clickhouse>
    <storage_configuration>
        <disks>
            <s3>
                <type>s3</type>
                <endpoint>https://your_buckets_URL_address</endpoint>
                <access_key_id>A.........5</access_key_id>
                <secret_access_key>N...........D</secret_access_key>
            </s3>
        </disks>
    </storage_configuration>
    <backups>
        <allowed_disk>s3</allowed_disk>
        <allowed_path>/backups/</allowed_path>
    </backups>
</clickhouse>
After the backup location is set, it is time to take a backup. The basic forms of the backup and restore commands are shown below.
BACKUP TABLE|DATABASE table_name|database_name TO Disk('<backup_location>', 'backupName.zip');
RESTORE TABLE|DATABASE table_name|database_name FROM Disk('<backup_location>', 'backupName.zip');
Backup Parameters
There are some useful parameters that you can use. These are:
- ASYNC: backup or restore asynchronously (default is SYNC)
- PARTITIONS: a list of partitions to back up or restore, which allows working with specific partitions only
- SETTINGS:
- compression_method and compression_level.
- password for the file on disk.
- base_backup: points to a previous backup so that an incremental backup can be taken, reducing storage usage, time, and network bandwidth (see the example after this list)
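As a rough illustration of how these options fit together, here is a hedged sketch; the backup names and password are made up for the example, and the statements follow the generic form shown earlier.

-- Compressed, password-protected full backup (names and password are illustrative)
BACKUP TABLE opensky TO Disk('backups', 'opensky_full.zip')
    SETTINGS compression_method = 'lzma', compression_level = 3, password = 'Secret123';

-- Incremental backup that only stores what changed since the full backup above
BACKUP TABLE opensky TO Disk('backups', 'opensky_increment_1.zip')
    SETTINGS base_backup = Disk('backups', 'opensky_full.zip');

Note that restoring an incremental backup also requires access to the base backup it references.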
In the scope of this blog, we did some tests on both single instance and sharded cluster databases. Let’s examine the backup/restore scenarios.
Scenario I – Single Instance Backup&Restore
First of all, we need a sample dataset to back up, so we can use the air traffic data from the OpenSky Network. You can download and install it as shown here. The full dataset is huge, so I used a subset with about 11 million rows.
clickhouse01 :) SELECT count() FROM opensky

┌──count()─┐
│ 11348850 │
└──────────┘
Let’s start with a single-table backup. While the backup was running, I truncated the table from another console.
#Console 1
clickhouse01 :) BACKUP TABLE opensky TO Disk('backups', 'opensky_20221007');

BACKUP TABLE opensky TO Disk('backups', 'opensky_20221007')

Query id: e490789b-59ae-4d08-861e-3514a74988f1

┌─id───────────────────────────────────┬─status─────────┐
│ 1bb1733d-5b70-4c94-a113-a2b96d6e54ab │ BACKUP_CREATED │
└──────────────────────────────────────┴────────────────┘

#Console 2
clickhouse01 :) truncate table opensky;

TRUNCATE TABLE opensky
Now, the table exists but has no records in it.
clickhouse01 :) select count() from opensky;

SELECT count() FROM opensky

Query id: 2316aa82-2f2a-4174-85b3-c5038c7f6f90

┌─count()─┐
│       0 │
└─────────┘
You can access backup and restore information from the system table called backups. Unfortunately, it is an in-memory table and is flushed after a restart, so it is a good idea to copy its records into a table that we create ourselves. As you can see below, because we did not use the “.zip” suffix in the backup statement, the backup was not compressed. To compress the backup and reduce disk usage, it is important to use the “backupName.zip” format.
SELECT *
FROM system.backups
WHERE id = '1bb1733d-5b70-4c94-a113-a2b96d6e54ab'
FORMAT Vertical

Query id: c6997a5e-7d7d-4672-a4f8-c9885a5e1008

Row 1:
──────
id:                1bb1733d-5b70-4c94-a113-a2b96d6e54ab
name:              Disk('backups', 'opensky_20221007')
status:            BACKUP_CREATED
num_files:         117
uncompressed_size: 508789970
compressed_size:   508789970
error:
start_time:        2022-10-07 09:32:20
end_time:          2022-10-07 09:32:23
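Since system.backups is cleared on restart, one hedged way to keep a history is to copy its rows into a regular MergeTree table; the database and table names below are only illustrative.

-- Illustrative sketch: persist backup metadata before a restart clears system.backups
CREATE DATABASE IF NOT EXISTS backupInfo;

CREATE TABLE IF NOT EXISTS backupInfo.backup_history
ENGINE = MergeTree
ORDER BY start_time AS
SELECT * FROM system.backups;

-- On later runs, append only the rows that have not been stored yet
INSERT INTO backupInfo.backup_history
SELECT * FROM system.backups
WHERE id NOT IN (SELECT id FROM backupInfo.backup_history);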
Let’s have a look at the backup disk and examine the backup. As you can see, it contains the data, metadata, and other necessary information.
root@clickhouse01:/clickHouse/backups/opensky_20221007# ls -al
total 40
drwxr-x--- 4 clickhouse clickhouse  4096 Oct  7 09:32 .
drwxr-x--- 3 clickhouse clickhouse  4096 Oct  7 09:32 ..
-rw-r----- 1 clickhouse clickhouse 20731 Oct  7 09:32 .backup
drwxr-x--- 3 clickhouse clickhouse  4096 Oct  7 09:32 data
drwxr-x--- 3 clickhouse clickhouse  4096 Oct  7 09:32 metadata
Now, it is time to restore the table. As you may remember, we truncated the table in the middle of the backup process; ClickHouse did not lock the table, and the backup operation did not fail. Restore is quite similar to backup. After the restore is completed, we can access the table and all of its rows.
clickhouse01 :) RESTORE TABLE opensky FROM Disk('backups', 'opensky_20221007');

┌─id───────────────────────────────────┬─status───┐
│ ea94e314-4d8d-4adf-ab78-07ff498977be │ RESTORED │
└──────────────────────────────────────┴──────────┘

clickhouse01 :) select count() from opensky;

┌──count()─┐
│ 11348850 │
└──────────┘
It is possible to back up not only tables but also databases. Let’s back up the database and, in the middle of the process, drop the whole database along with its tables.
#console 1
clickhouse01 :) show tables;

┌─name─────────────────┐
│ opensky              │
│ openskyDeletedRecord │
│ opensky_1000000      │
│ opensky_temp         │
└──────────────────────┘

clickhouse01 :) BACKUP database sampleDatasets TO Disk('backups', 'sampleDatasets_2022107.zip');

┌─id───────────────────────────────────┬─status─────────┐
│ 4cbd6759-2e98-40fa-bebd-dcc9c16fc4ac │ BACKUP_CREATED │
└──────────────────────────────────────┴────────────────┘

#console2
clickhouse01 :) drop database sampleDatasets;

clickhouse01 :) show databases;

┌─name───────────────┐
│ INFORMATION_SCHEMA │
│ backupInfo         │
│ default            │
│ information_schema │
│ system             │
└────────────────────┘
It is time to restore the database. Like restoring a table, restoring a database is quite simple, and dropping the database in the middle of the backup process did not cause any problems.
clickhouse01 :) RESTORE database sampleDatasets FROM Disk('backups', 'sampleDatasets_2022107.zip');

┌─id───────────────────────────────────┬─status───┐
│ ab332153-6346-441e-a593-a489160099ac │ RESTORED │
└──────────────────────────────────────┴──────────┘

clickhouse01 :) show tables;

┌─name─────────────────┐
│ opensky              │
│ openskyDeletedRecord │
│ opensky_1000000      │
│ opensky_temp         │
└──────────────────────┘
You can use AWS S3 storage as a backup location, too. To do that, you need to create the S3 XML file (as shown in this document) under the config.d folder and restart the ClickHouse server. After that, the S3 bucket is ready for backup and restore operations.
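As a minimal sketch, assuming the S3 disk from the earlier configuration is named “s3”, the commands look the same as with the local disk; the backup name is illustrative.

-- Illustrative: back up to and restore from the S3-backed disk
BACKUP TABLE opensky TO Disk('s3', 'opensky_20221007_s3.zip');

RESTORE TABLE opensky FROM Disk('s3', 'opensky_20221007_s3.zip');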
Scenario II – Sharded Cluster Backup&Restore
For this test, you will need to create a cluster with 2 shards. You can use this document to create it with ClickHouse Keeper. Using a shared disk for backup and restore operations is a better idea, so we created an NFS filesystem and mounted it on all servers. After that, we placed the backup_disk.xml file (as described in this document) on all nodes and restarted the servers.
Because we are using a cluster infrastructure, we need to create the database and local tables on the cluster. In this example, our cluster name is “cls_chista_2S_1R“. In the scope of this blog, I will not explain how to create clusters and shards.
It is enough to run this command on any of the cluster nodes.
CREATE DATABASE flightlist ON CLUSTER cls_chista_2S_1R;

CREATE TABLE flightlist.opensky_local ON CLUSTER cls_chista_2S_1R
(
    `callsign` String,
    `number` String,
    `icao24` String,
    `registration` String,
    `typecode` String,
    `origin` String,
    `destination` String,
    `firstseen` DateTime,
    `lastseen` DateTime,
    `day` DateTime,
    `latitude_1` Float64,
    `longitude_1` Float64,
    `altitude_1` Float64,
    `latitude_2` Float64,
    `longitude_2` Float64,
    `altitude_2` Float64
)
ENGINE = MergeTree
ORDER BY (origin, destination, callsign);
It is time to create the main table. From now on, we have to use the distributed table for all queries and DML operations.
CREATE TABLE flightlist.opensky ON CLUSTER cls_chista_2S_1R
(
    `callsign` String,
    `number` String,
    `icao24` String,
    `registration` String,
    `typecode` String,
    `origin` String,
    `destination` String,
    `firstseen` DateTime,
    `lastseen` DateTime,
    `day` DateTime,
    `latitude_1` Float64,
    `longitude_1` Float64,
    `altitude_1` Float64,
    `latitude_2` Float64,
    `longitude_2` Float64,
    `altitude_2` Float64
)
ENGINE = Distributed(cls_chista_2S_1R, flightlist, opensky_local, cityHash64(origin, destination, callsign));
After inserting the data into the distributed “opensky” table, ClickHouse distributed the rows over the nodes according to the given sharding key.
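For reference, a minimal sketch of such an insert; the source table here is illustrative, and any table or file with the same columns works.

-- Illustrative: inserting through the distributed table routes the rows
-- to the shards by cityHash64(origin, destination, callsign)
INSERT INTO flightlist.opensky
SELECT * FROM sampleDatasets.opensky;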
clickhouse01 :) select count() from opensky_local;

┌─count()─┐
│ 4387412 │
└─────────┘

clickhouse02 :) select count() from opensky_local;

┌─count()─┐
│ 4422271 │
└─────────┘

clickhouse01 :) SELECT count() FROM opensky;

┌─count()─┐
│ 8809683 │
└─────────┘
You can back up and restore the table on every single node with its local data. In the following example, we backed up only the data in the local table on the first shard.
clickhouse01 :) BACKUP TABLE flightlist.opensky_local TO Disk('backups', 'opensky_20221007_S1.zip');

BACKUP TABLE flightlist.opensky_local TO Disk('backups', 'opensky_20221007_S1.zip')
Backup and restore operations are similar to single-node operations, but there is one important point to take care of. Since the data of the distributed table lives in the local (“opensky_local“) tables on each shard and the distributed table (“opensky“) itself holds no data, it is important to back up the database on all nodes concurrently.
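In practice, that means running the same backup statement on every shard at roughly the same time; a minimal sketch, with illustrative backup names:

-- Run on clickhouse01 (shard 1)
BACKUP DATABASE flightlist TO Disk('backups', 'flightlist_20221007_S1.zip');

-- Run on clickhouse02 (shard 2), ideally at the same time
BACKUP DATABASE flightlist TO Disk('backups', 'flightlist_20221007_S2.zip');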
Conclusion
Backup and restore are crucial database operations. Because there is no magical method that covers every problem, it is important to choose the right strategies to protect against data loss. By their nature, backup and restore processes consume system resources such as network bandwidth, CPU, and memory, so it is best to schedule backup operations for the least busy time. It is also important to test the backups you take by verifying that they are restorable on another server. Another significant point is backup automation and monitoring: take daily backups automatically, for example via a portal, and monitor them so that failures are noticed.
In ClickHouse, you can back up a database, a table, or even a materialized view, and restore it at the database, table, or partition level. Since ClickHouse is a column-oriented analytical database, point-in-time recovery is not applicable to restore operations, but it is possible to take incremental backups instead.