ClickHouse Horizontal Scaling: Sharding and Resharding Strategies

 

Introduction

Sharding is a process in which a large database table is divided horizontally into smaller ones (with same schema/columns) and stored across different nodes. ClickHouse supports sharding via distributed table engine. You can learn more about sharding and distributed engines in this blog post. While sharding is a powerful technique which can be used to split the data that would be otherwise impossible to fit in to a single server, if not planned properly, we may end up with following scenarios.

  1. Shard Imbalance
  2. Reshard the existing cluster due to unexpected/planned growth in data volume

Along with these, we may have to convert an unsharded table to a sharded one due to business reasons. Let us look at handling such scenarios in a ChistaDATA DBAAS using the clickhouse-copier tool. You can read more about this tool in this blog article. The proposed solution is based on the following premise.

  • ClickHouse doesn’t support automatic shard rebalancing.
  • For an existing sharded table, the simplest way to rebalance the shard is by adjusting the shard weights until the data is equally distributed and resetting the weights later on.  The existing data will not be rebalanced and may require time ranging from days to weeks to achieve balance which is dependent on the velocity of the incoming new data.
  • To rebalance existing data that has been partitioned, we can manually move the partitions from one shard to another. This works only for the tables that have been partitioned at the time of creation.

Sharding an unsharded table in ClickHouse

Problem Statement: 

Let us say that we have a 2 node CH server with replicated tables. We need to shard the table (named ‘unsharded’) to 2 shards and 3 replicas. 

Suggested Steps:

  1. We will need a new 6 node cluster
  2. In the new cluster, make the necessary changes in config.xml in every node for the shards and replicas
  3. Create the distributed table and the underlying replicated tables for storage in the new cluster
  4. In the old cluster, rename the ‘unsharded’ table (From ‘unsharded’ to ”unsharded_old’)
  5. Create a new storage table (named ‘unsharded’) with the same schema for the replicated table. The new data will be temporarily written to the new table.
  6. Update the config for clickhouse-copier with source and destination table and necessary config. Refer Here. The destination table will be the distributed table in the new cluster and the source table will be the renamed storage table (‘unsharded_old’).
  7. After the copier has copied the data, point all the applications reading/writing data to the new cluster.
  8. Then, update the config so that copier will read the data from the new storage tables (named ‘unsharded’) in the old cluster and write to the new cluster.
  9. This will ensure the data that arrived at the time of migration is also captured and the servers would have very minimal downtime.

Sharding rebalancing/resharding

Problem Statement: 

Let us say that we have a 4 node CH server with 2 shards and 2 replicas. We need to re-shard the distributed table (named ‘unsharded’) to 2 shards and 3 replicas. Let us assume name of the storage table as ‘storage’.

Suggested Steps:

  1. We will need a new 6 node cluster
  2. In the new cluster, make the necessary changes in config.xml in every node for the shards and replicas
  3. Create the distributed table and the underlying replicated tables for storage in the new cluster
  4. In the old cluster, rename the storage table for the underlying ‘unsharded’ table (From ‘storage’ to ”storage_old’)
  5. Create a new storage table (named ‘storage’) with the same schema for the distributed table. The new data will be temporarily written to the new table.
  6. Update the config for clickhouse-copier with source and destination table and necessary config. Refer Here. The destination table will be the distributed table in the new cluster and the source table will be the renamed storage table (‘storage_old’).
  7. After the copier has copied the data, point all the applications reading/writing data to the new cluster.
  8. Then, update the config so that copier will read the data from the new storage tables (named ‘storage’) in the old cluster and write to the new cluster.
  9. Repeat step 4,5,6,7 and 8 for all the shards
  10. This will ensure the data that arrived at the time of migration is also captured and the servers would have very minimal downtime.

Conclusion

  • Recommended to keep the old cluster until the customer confirms that the data migration is successful.
  • It’s important to thoroughly test this procedure in a test environment before attempting it on a production system to avoid data loss or corruption.

To learn more about Sharding & Horizontal Scaling in ClickHouse, do read the following articles:

References:

https://clickhouse.com/docs/en/operations/utilities/clickhouse-copier