ClickHouse Updates: Deep Dive into ALTER Statement

Introduction

Business requirements change over time, and the database design may alter as well. You might not be able to afford to remove a table and rebuild it from scratch. The ALTER statement allows you to perform all of this very easily. For example, you may need to add a column to a certain table or need to change the data type of a certain column in a certain table. This blog post will attempt to delve deeper into what it is, what it does, and when it should be utilized.

As the saying goes, “the only constant in life is change.”

Alter Statement

The ALTER statement is often used in a database to add, change, or drop columns in a table. In addition, the table can be renamed, and data types can be changed, and all these changes are based on DDL. In ClickHouse, however, unlike other databases, the ALTER statement allows us to do UPDATE and DELETE column operations that are based on DML. The ALTER statement is more versatile than you would imagine. Briefly, summarize the usage scenarios as a diagram, and let’s take a look at some of them:

Column manipulations

This article will contain trick information about utilizing the ALTER statement. Because the statement of syntax is already simple to utilize. Without further ado, let’s get to the tips that will make your job easier. Here is our table metadata:

ClickHouse01 :) show create table test;

SHOW CREATE TABLE test

Query id: a5eda870-ff84-4ccc-ad0f-0679ae9d2eb3

┌─statement───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE default.test
(
    `EventDate` DateTime,
    `id` UInt64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(EventDate)
ORDER BY id
SETTINGS index_granularity = 8192 │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

ADD|DROP|MODIFY|CLEAR|COMMENT commands can be used to alter a column in a ClickHouse table.

ALTER TABLE default.test ADD COLUMN `Quantity` String AFTER `EventDate`;

Giving ‘AFTER’ with the name of another column, the column is added to the list of table columns after the one specified. Otherwise, the column is appended to the table’s end. It should be noted that it is not possible to add a column at the beginning of a table. After using these commands, the ‘Quantity’ column comes after the ‘EventDate’ in the table structure.

Adding a column modifies the table structure but has no effect on the data. The data does not show on the disk instantly after the ALTER statement. When reading from the table, if a column’s data is missing, it is filled in with default values. After merging data parts, the column appears on the disk. This method allows us to finish the ALTER query rapidly without increasing the volume of old data.

ALTER TABLE default.test DROP COLUMN `Quantities`;

Deletes all information about the `Quantities` column from the file system. Because this deletes whole files, the query is almost instantaneous. Also, there is no support for deleting columns in the primary key.

ALTER TABLE default.test MODIFY COLUMN `Quantities` UInt64;

Changing the column type is a little bit of a difficult task. Because it modifies the contents of data files. This may take a long time for huge tables. If one of the subsequent processes fails during the operation, the data can be manually recovered. Also, changing the column type in arrays and nested data structures is not supported. Modifying the type of columns in the primary key is permitted only if it does not involve changing the actual data.

Also, it should not be forgotten that the ALTER MODIFY query blocks all tables for reads and writes. As an example, consider data type changes. As previously stated, changing the data type of a column changes the data in it. Therefore, if a long SELECT is running at the time of the ALTER query, the ALTER query must wait for the SELECT to conclude before continuing.

Manipulations with partitions

•DETACH PARTITION

ALTER TABLE default.test DETACH PARTITION '202209';

The ‘detached’ directory contains partitions that have been removed from the table using the DETACH query. Partitions that are damaged are also relocated to this location rather than being deleted. You can add, delete, or edit data in the ‘detached’ directory at any time; the server will not be aware of this until you run the ATTACH statement. This query is replicated, so the relevant partitions will go to the ‘detached’ folder on all replicas of the cluster.

•ATTACH PARTITION

ALTER TABLE default.test ATTACH PARTITION '202209';

Adding data to the table from the ‘detached’ directory is possible. This query is replicated, so the relevant partition will be added to the table in all replicas of the cluster.

•DROP PARTITION

ALTER TABLE default.test DROP PARTITION '202209';

Removes all data from the table. Data partitions will be marked as inactive and destroyed completely in 8 minutes. The query has been replicated, data on all replicas will be destroyed.

•FREEZE PARTITION

ALTER TABLE default.test FREEZE PARTITION '202209';

Creates a local backup of one or more partitions but it is worth specifying that, this command just gives data and does not include metadata of table. The name can be the partition’s complete name (for example, 202209), or its prefix (for example, 2022), in which case the backup will be performed for all relevant partitions. Consider the /var/lib/clickhouse/metadata/default/test.sql(as default location) file to reach table metadata. To restore from this backup, using CREATE statement instead of ATTACH is possible. This query hasn’t been replicated. A local backup is created exclusively on the local server.

•FETCH PARTITION

ALTER TABLE default.test FETCH PARTITION '202209' FROM 'path-in-zookeeper';

Has data been accidentally deleted or corrupted? No worries. This functionality can be considered a kind of backup. The FETCH command downloads the specified partition from the shard with the supplied ZooKeeper path in the FROM clause, then stores it in the ‘detached’ directory for the selected table. This query is only applicable to replica tables. The system verifies that the partition exists and that the table structure matches before downloading. The most suitable replica is chosen automatically from among the healthy replicas. The query FETCH is not duplicated. Only the local server’s ‘detached’ directory will get the partition. It should be noted that if you use the ATTACH query to add data to the table after this, the data will be added to all replicas.

Using ZooKeeper features in replicated environments

ALTER TABLE default.test MODIFY COLUMN `Quantities` UInt64 settings replication_alter_partitions_sync= '2';
or
SET replication_alter_partitions_sync = 2;
ALTER TABLE default.test MODIFY COLUMN `Quantities` UInt64;

All ALTER queries for non-replica tables are executed synchronously. However, for replica tables, the instructions are kept in ZooKeeper and subsequently applied to each copy. ALTER queries are executed in the same order. The query, on the other hand, can wait for these activities to be completed on all of the replicas. In such cases, using the ZooKeeper parameter dynamically, which is replication_alter_partitions_sync is possible. Queueing actions will be executed asynchronously or synchronously on replicas according to their values. You may also specify this command by embedding the ALTER command at the end or specifying the command at the beginning is possible like above example. If you specify this option at the beginning, all ALTER, TRUNCATE, and OPTIMIZE commands that execute in the database will operate in accordance with the parameter value. Possible parameter values are:

0 – do not wait

1 – only wait for own execution (default)

2 – wait for all replicas.

Privileges

ALTER statements are classified into numerous categories, some of which are hierarchical and must be explicitly stated. Privileges are granted or revoked. To utilize the ALTER statement, the user must be granted the appropriate permissions, and because these privileges are hierarchical, be aware of what you are approving!

Conclusion

ClickHouse uses the ALTER statement to add, delete or modify columns in a table and allows deleting and modifying data from the columns. It also supports partitioning operations and provides numerous customization options not covered here. In order for the statement to be successfully executed, you must have the necessary privileges and the ALTER query is only supported for *MergeTree, Merge, and Distributed type tables.

To learn more about Clickhouse updates & troubleshooting, do consider reading the below articles

About Ilkay 25 Articles
Ilkay has been administering databases including NoSQL and RDBMS for over 7 years. He is experienced working high-scale banking databases. He is currently working at ChistaDATA Inc. as Database Administrator.
Contact: Website