Introduction
In ClickHouse, data is organized into tables, and each table is divided into one or more part/partitions. For detailed information about parts and partitions, please visit Part 1. In this part of the series, we will discuss manipulation operations on parts and partitions. Partitions are a way of dividing a table into smaller pieces to make it more manageable. Part/partition manipulation refers to the act of creating, altering, or deleting parts/partitions in a ClickHouse table.
Let’s talk about the primary manipulation operations on parts/partitions in depth.
Environments
OS :Ubuntu 22.04
DB : ClickHouse v22.12.1
Dataset: Opensky ( You can download the dataset as explained here.)
Note: In the scope of this blog, we created the “opensky_partitioned” table, which is partitioned with the “PARTITION BY toYYYYMMDD(lastseen)” clause, under the “sampleDatasets” database.
Detach or Attach Part / Partition
With the help of “detach” operation, specified parts or partitions can move to the “detached” folder. The user cannot access related parts or partitions until they are attached. By default, the “detached” folder is located under “/var/lib/clickhouse/data/<DATABASE_NAME>/<TABLE_NAME>” directory. On the other hand, “attach” operation is attached parts or partitions from the “detached” folder. The “detach” and “attach” operation syntax in ClickHouse is shown below:
#DETACH PART/PARTITION ALTER TABLE <DATABASE_NAME>.<TABLE_NAME> [ON CLUSTER <CLUSTER_NAME>] DETACH PARTITION|PART <PARTITION_EXPRESSION> #ATTACH PART/PARTITION ALTER TABLE <DATABASE_NAME>.<TABLE_NAME> [ON CLUSTER <CLUSTER_NAME>] ATTACH PARTITION|PART <PARTITION_EXPRESSION>
Here, the partition expression should be the partition name, partition id or expression itself.
Let’s assume we want to detach any specified days (in our example, we’ll use 04/15/2019 data) partition from the “opensky_partitioned” table.
First, we discovered the partition and part name for the given date. As you can see, this date is in partition “20190415,” which has two parts.
SELECT partition, name, partition_id FROM system.parts WHERE (database = 'sampleDatasets') AND (table = 'opensky_partitioned') AND (partition_id IN ( SELECT DISTINCT _partition_id FROM sampleDatasets.opensky_partitioned WHERE toDate(lastseen) = '2019-04-15' )) ┌─partition─┬─name─────────────┬─partition_id─┐ │ 20190415 │ 20190415_62_62_0 │ 20190415 │ │ 20190415 │ 20190415_78_78_0 │ 20190415 │ └───────────┴──────────────────┴──────────────┘
Then, it is time to detach partition “20190415”, attach it again, and do the same test with one of the “part”.
-- Count related date before detached SELECT count() FROM sampleDatasets.opensky_partitioned WHERE toDate(lastseen) = '2019-04-15' ┌─count()─┐ │ 78647 │ └─────────┘ -- detach partition ALTER TABLE sampleDatasets.opensky_partitioned DETACH PARTITION 20190415 Ok. -- Count related date after detached SELECT count() FROM sampleDatasets.opensky_partitioned WHERE toDate(lastseen) = '2019-04-15' ┌─count()─┐ │ 0 │ └─────────┘
As we talked about, partitions are logical division of tables and include at least one part. In our example, the “20190415” partition has 2 parts. While we are detached the partitions all this parts are move to “/var/lib/clickhouse/data/<DATABASE_NAME>/<TABLE_NAME>/detached” folder.
root@server:~# ls -1 /var/lib/clickhouse/data/sampleDatasets/opensky_partitioned/detached 20190415_62_62_0 20190415_78_78_0
Now, we’ll attach the “20190415” partition as follow:
--attach partition ALTER TABLE sampleDatasets.opensky_partitioned ATTACH PARTITION 20190415 Ok. --Count related date after attach partition SELECT count() FROM sampleDatasets.opensky_partitioned WHERE toDate(lastseen) = '2019-04-15' ┌─count()─┐ │ 78647 │ └─────────┘
Let’s do same test for “part”.
--find parts for given date SELECT partition, name, partition_id FROM system.parts WHERE (database = 'sampleDatasets') AND (table = 'opensky_partitioned') AND (partition_id IN ( SELECT DISTINCT _partition_id FROM sampleDatasets.opensky_partitioned WHERE toDate(lastseen) = '2019-04-15' )) ┌─partition─┬─name───────────────┬─partition_id─┐ │ 20190415 │ 20190415_142_142_0 │ 20190415 │ │ 20190415 │ 20190415_143_143_0 │ 20190415 │ └───────────┴────────────────────┴──────────────┘ -- Count before detach part operation SELECT count() FROM sampleDatasets.opensky_partitioned WHERE toDate(lastseen) = '2019-04-15' ┌─count()─┐ │ 78647 │ └─────────┘ -- detach any of the parts ALTER TABLE sampleDatasets.opensky_partitioned DETACH PART '20190415_142_142_0' Ok. -- Count after detach part operation SELECT count() FROM sampleDatasets.opensky_partitioned WHERE toDate(lastseen) = '2019-04-15' ┌─count()─┐ │ 78517 │ └─────────┘ -- Attach related part ALTER TABLE sampleDatasets.opensky_partitioned ATTACH PART '20190415_142_142_0' -- Count after attach the part SELECT count() FROM sampleDatasets.opensky_partitioned WHERE toDate(lastseen) = '2019-04-15' ┌─count()─┐ │ 78647 │ └─────────┘
We can also attach a partition from the source table to the destination table with the following command. In this method, data won’t be deleted from either source or destination.
ALTER TABLE <DESTINATION_TABLE> [ON CLUSTER <CLUSTER_NAME>] ATTACH PARTITION <PARTITION EXPRESSION> FROM <SOURCE_TABLE>
For running this command, source and destination tables have same:
- Structure.
- Partition key, the same order by key and the same primary key.
- Storage policy.
Drop Part / Partition
It drops the specified part or partition from the table. Dropped part or partition is marked as inactive in “system.parts” table.
That part/partition stays in “/var/lib/clickhouse/data/<database_name>/<table_name>/” folder about 10 more minutes after drop operation.
You can find “drop” command syntax and drop operation example as shown below:
-- Drop from table itself ALTER TABLE <DATABASE_NAME>.<TABLE_NAME> [ON CLUSTER <CLUSTER_NAME>] DROP PARTITION|PART 'PART/PARTITON EXPRESSION' -- Remove specified part/partition from detached folder ALTER TABLE <DATABASE_NAME>.<TABLE_NAME> [ON CLUSTER <CLUSTER_NAME>] DROP DETACHED PARTITION|PART 'PART/PARTITON EXPRESSION'
-- Specified part/partition is active before drop operation ┌─partition─┬─name───────────────┬─partition_id─┬─active─┐ │ 20190415 │ 20190415_147_147_0 │ 20190415 │ 1 │ └───────────┴────────────────────┴──────────────┴────────┘ -- Drop partition ALTER TABLE sampleDatasets.opensky_partitioned DROP PARTITION 20190415 -- Specified part/partition is not active(active=0) after drop partition ┌─partition─┬─name───────────────┬─partition_id─┬─active─┐ │ 20190415 │ 20190415_147_147_1 │ 20190415 │ 0 │ └───────────┴────────────────────┴──────────────┴────────┘ -- Specified part/partition is completely deleted after 10 minutes
Move Part / Partition
You can move the partition(s) to another table in Clickhouse. In that case, both source and destination tables have the same structure, partition key, order by key, primary key, storage policy and engine family. The other option for “move” is moving a part or partition to another disk or volume for merge tree engine tables.
Move the partition to another table:
-- Usage: ALTER TABLE <DATABASE_NAME.SOURCE_TABLE> [ON CLUSTER <CLUSTER_NAME>] MOVE PARTITION <PARTITION EXPRESSION> TO TABLE <DATABASE_NAME.DESTINATION_TABLE> -- 1. Create empty opensky_partitioned_new table same as the opensky_partitioned -- 2. Move 20190415 partition from opensky_partitioned to opensky_partitioned_new ALTER TABLE sampleDatasets.opensky_partitioned MOVE PARTITION '20190415' TO TABLE sampleDatasets.opensky_partitioned_new Ok. --3. Take partition count for 20190415 in source table SELECT count() FROM sampleDatasets.opensky_partitioned WHERE _partition_id = '20190415' ┌─count()─┐ │ 0 │ └─────────┘ --4 Take partition count for 20190415 in destination table SELECT count() FROM sampleDatasets.opensky_partitioned_new WHERE _partition_id = '20190415' Query id: 49319c73-c84f-4fbf-838d-ab787971eaad ┌─count()─┐ │ 78647 │ └─────────┘
For moving a part or partition to another disk or volume, we need to set a storage policy and create a new table with this policy. On the other hand, storage policy and data tiering are outside the scope of this post. We will not cover storage policy in this post. We only show the movement concept.
-- Our table's(opensky_partitioned_with_storage_policy) policy is "ssd_to_hdd" -- and this volume contain 2 disks(ssd and hdd) SELECT * FROM system.storage_policies WHERE policy_name = ( SELECT storage_policy FROM system.tables WHERE (database = 'sampleDatasets') AND (name = 'opensky_partitioned_with_storage_policy') ┌─policy_name─┬─volume_name─┬─volume_priority─┬─disks───┬─volume_type─┬─max_data_part_size─┬─move_factor─┬─prefer_not_to_merge─┐ │ ssd_to_hdd │ ssd_volume │ 1 │ ['ssd'] │ JBOD │ 0 │ 0.1 │ 0 │ │ ssd_to_hdd │ hdd_volume │ 2 │ ['hdd'] │ JBOD │ 0 │ 0.1 │ 0 │ └─────────────┴─────────────┴─────────────────┴─────────┴─────────────┴────────────────────┴─────────────┴─────────────────────┘ -- The 20190415 partition is stored under ssd disk. SELECT partition, name, path FROM system.parts WHERE (database = 'sampleDatasets') AND (table = 'opensky_partitioned_with_storage_policy') AND (partition = '20190415') ┌─partition─┬─name───────────────┬─path────────────────────────────────────────────────────────────────────┐ │ 20190415 │ 20190415_107_107_0 │ /ssd/store/300/3005efb6-244d-473f-93ac-944233d79a55/20190415_107_107_0/ │ └───────────┴────────────────────┴─────────────────────────────────────────────────────────────────────────┘ -- Let's move it to the hdd disk ALTER TABLE sampleDatasets.opensky_partitioned_with_storage_policy MOVE PARTITION '20190415' TO DISK 'hdd' -Check the new volume SELECT partition, name, path FROM system.parts WHERE (database = 'sampleDatasets') AND (table = 'opensky_partitioned_with_storage_policy') AND (partition = '20190415') ┌─partition─┬─name───────────────┬─path────────────────────────────────────────────────────────────────────┐ │ 20190415 │ 20190415_107_107_0 │ /hdd/store/300/3005efb6-244d-473f-93ac-944233d79a55/20190415_107_107_0/ │ └───────────┴────────────────────┴─────────────────────────────────────────────────────────────────────────┘ -- You can move parts and also you can move volume instead of disk
Freeze or Unfreeze Part / Partition
We are using the “freeze” command to backup part, partitions and even tables. You can find detailed blog about freeze here.
Conclusion
ClickHouse has several manipulation operations for partitions and parts. In the scope of this post, we talked about attach, detach, move, drop and freeze operations in partitions. There are also replace, update, delete and fetch operations in partition operations. In the next and last part of this series, we’ll talk about partition expressions and mutations.
To know more about ClickHouse Partitions, do consider giving the following articles a read