Parts and Partitions in ClickHouse – Part II: Manipulation Operations

In ClickHouse, data is organized into tables, and each table is divided into one or more part/partitions. For detailed information about parts and partitions, please visit Part I. In this part of the series, we will discuss manipulation operations on parts and partitions. Partitions are a way of dividing a table into smaller pieces to make it more manageable. Part/partition manipulation refers to the act of creating, altering, or deleting parts/partitions in a ClickHouse table.

partition-img

Pexels – Photo By ROMBO

Let’s talk about the primary manipulation operations on parts/partitions in depth.

Environments:

OS :Ubuntu 22.04

DB : ClickHouse v22.12.1

Dataset: Opensky ( You can download the dataset as explained here.)

Note: In the scope of this blog, we created the “opensky_partitioned” table, which is partitioned with the “PARTITION BY toYYYYMMDD(lastseen)” clause, under the “sampleDatasets” database.

DETACH/ATTACH PARTITION|PART

With the help of “detach” operation, specified parts or partitions can move to the “detached” folder. The user cannot access related parts or partitions until they are attached. By default, the “detached” folder is located under  “/var/lib/clickhouse/data/<DATABASE_NAME>/<TABLE_NAME>” directory.  On the other hand, “attach” operation is attached parts or partitions from the “detached” folder. The “detach” and “attach” operation syntax in ClickHouse is shown below:

#DETACH PART/PARTITION
ALTER TABLE <DATABASE_NAME>.<TABLE_NAME> [ON CLUSTER <CLUSTER_NAME>] DETACH PARTITION|PART <PARTITION_EXPRESSION>

#ATTACH PART/PARTITION
ALTER TABLE <DATABASE_NAME>.<TABLE_NAME> [ON CLUSTER <CLUSTER_NAME>] ATTACH PARTITION|PART <PARTITION_EXPRESSION>

Here, the partition expression should be the partition name, partition id or expression itself.

Let’s assume we want to detach any specified days (in our example, we’ll use 04/15/2019 data) partition from the “opensky_partitioned” table.

First, we discovered the partition and part name for the given date. As you can see, this date is in partition “20190415,” which has two parts.

SELECT
    partition,
    name,
    partition_id
FROM system.parts
WHERE (database = 'sampleDatasets') AND (table = 'opensky_partitioned') AND (partition_id IN (
    SELECT DISTINCT _partition_id
    FROM sampleDatasets.opensky_partitioned
    WHERE toDate(lastseen) = '2019-04-15'
))


┌─partition─┬─name─────────────┬─partition_id─┐
│ 20190415  │ 20190415_62_62_0 │ 20190415     │
│ 20190415  │ 20190415_78_78_0 │ 20190415     │
└───────────┴──────────────────┴──────────────┘

Then, it is time to detach partition “20190415”, attach it again, and do the same test with one of the “part”.

-- Count related date before detached
SELECT count()
FROM sampleDatasets.opensky_partitioned
WHERE toDate(lastseen) = '2019-04-15'
┌─count()─┐
│   78647 │
└─────────┘

-- detach partition
ALTER TABLE sampleDatasets.opensky_partitioned
    DETACH PARTITION 20190415
Ok.

-- Count related date after detached
SELECT count()
FROM sampleDatasets.opensky_partitioned
WHERE toDate(lastseen) = '2019-04-15'
┌─count()─┐
│       0 │
└─────────┘

As we talked about, partitions are logical division of tables and include at least one part. In our example, the “20190415” partition has 2 parts. While we are detached the partitions all this parts are move to “/var/lib/clickhouse/data/<DATABASE_NAME>/<TABLE_NAME>/detached” folder.

root@server:~# ls -1 /var/lib/clickhouse/data/sampleDatasets/opensky_partitioned/detached
20190415_62_62_0
20190415_78_78_0

Now, we’ll attach the “20190415” partition as follow:

--attach partition
ALTER TABLE sampleDatasets.opensky_partitioned
    ATTACH PARTITION 20190415

Ok.
--Count related date after attach partition
SELECT count()
FROM sampleDatasets.opensky_partitioned
WHERE toDate(lastseen) = '2019-04-15'
┌─count()─┐
│   78647 │
└─────────┘

Let’s do same test for “part”.

--find parts for given date
SELECT
    partition,
    name,
    partition_id
FROM system.parts
WHERE (database = 'sampleDatasets') AND (table = 'opensky_partitioned') AND (partition_id IN (
    SELECT DISTINCT _partition_id
    FROM sampleDatasets.opensky_partitioned
    WHERE toDate(lastseen) = '2019-04-15'
))
┌─partition─┬─name───────────────┬─partition_id─┐
│ 20190415  │ 20190415_142_142_0 │ 20190415     │
│ 20190415  │ 20190415_143_143_0 │ 20190415     │
└───────────┴────────────────────┴──────────────┘

-- Count before detach part operation
SELECT count()
FROM sampleDatasets.opensky_partitioned
WHERE toDate(lastseen) = '2019-04-15'
┌─count()─┐
│   78647 │
└─────────┘

-- detach any of the parts
ALTER TABLE sampleDatasets.opensky_partitioned
    DETACH PART '20190415_142_142_0'
Ok.

-- Count after detach part operation
SELECT count()
FROM sampleDatasets.opensky_partitioned
WHERE toDate(lastseen) = '2019-04-15'
┌─count()─┐
│   78517 │
└─────────┘

-- Attach related part
ALTER TABLE sampleDatasets.opensky_partitioned
    ATTACH PART '20190415_142_142_0'

-- Count after attach the part
SELECT count()
FROM sampleDatasets.opensky_partitioned
WHERE toDate(lastseen) = '2019-04-15'
┌─count()─┐
│   78647 │
└─────────┘

We can also attach a partition from the source table to the destination table with the following command. In this method, data won’t be deleted from either source or destination.

ALTER TABLE <DESTINATION_TABLE> [ON CLUSTER <CLUSTER_NAME>] ATTACH PARTITION <PARTITION EXPRESSION> FROM <SOURCE_TABLE>

For running this command, source and destination tables have same:

  • Structure.
  • Partition key, the same order by key and the same primary key.
  • Storage policy.

DROP PARTITION/PART

It drops the specified part or partition from the table. Dropped part or partition is marked as inactive in “system.parts” table.

That part/partition stays in “/var/lib/clickhouse/data/<database_name>/<table_name>/” folder about 10 more minutes after drop operation.

You can find “drop” command syntax and drop operation example as shown below:

-- Drop from table itself
ALTER TABLE <DATABASE_NAME>.<TABLE_NAME> [ON CLUSTER <CLUSTER_NAME>] DROP PARTITION|PART 'PART/PARTITON EXPRESSION'

-- Remove specified part/partition from detached folder
ALTER TABLE <DATABASE_NAME>.<TABLE_NAME> [ON CLUSTER <CLUSTER_NAME>] DROP DETACHED PARTITION|PART 'PART/PARTITON EXPRESSION'
-- Specified part/partition is active before drop operation
┌─partition─┬─name───────────────┬─partition_id─┬─active─┐
│ 20190415  │ 20190415_147_147_0 │ 20190415     │      1 │
└───────────┴────────────────────┴──────────────┴────────┘

-- Drop partition
ALTER TABLE sampleDatasets.opensky_partitioned
    DROP PARTITION 20190415

-- Specified part/partition is not active(active=0) after drop partition
┌─partition─┬─name───────────────┬─partition_id─┬─active─┐
│ 20190415  │ 20190415_147_147_1 │ 20190415     │      0 │
└───────────┴────────────────────┴──────────────┴────────┘

-- Specified part/partition is completely deleted after 10 minutes

MOVE PARTITION/PART

You can move the partition(s) to another table in Clickhouse. In that case, both source and destination tables have the same structure, partition key, order by key, primary key, storage policy and engine family. The other option for “move” is moving a part or partition to another disk or volume for merge tree engine tables.

Move the partition to another table:

-- Usage: ALTER TABLE <DATABASE_NAME.SOURCE_TABLE> [ON CLUSTER <CLUSTER_NAME>] MOVE PARTITION <PARTITION EXPRESSION> TO TABLE <DATABASE_NAME.DESTINATION_TABLE>

-- 1. Create empty opensky_partitioned_new table same as the opensky_partitioned

-- 2. Move 20190415 partition from opensky_partitioned to opensky_partitioned_new
ALTER TABLE sampleDatasets.opensky_partitioned
    MOVE PARTITION '20190415' TO TABLE sampleDatasets.opensky_partitioned_new
Ok.
--3. Take partition count for 20190415 in source table
SELECT count()
FROM sampleDatasets.opensky_partitioned
WHERE _partition_id = '20190415'
┌─count()─┐
│       0 │
└─────────┘

--4 Take partition count for 20190415 in destination table
SELECT count()
FROM sampleDatasets.opensky_partitioned_new
WHERE _partition_id = '20190415'

Query id: 49319c73-c84f-4fbf-838d-ab787971eaad

┌─count()─┐
│   78647 │
└─────────┘

For moving a part or partition to another disk or volume, we need to set a storage policy and create a new table with this policy. On the other hand, storage policy and data tiering are outside the scope of this post. We will not cover storage policy in this post. We only show the movement concept.

-- Our table's(opensky_partitioned_with_storage_policy) policy is "ssd_to_hdd" 
-- and this volume contain 2 disks(ssd and hdd)
SELECT *
FROM system.storage_policies
WHERE policy_name = (
    SELECT storage_policy
    FROM system.tables
    WHERE (database = 'sampleDatasets') AND (name = 'opensky_partitioned_with_storage_policy')

┌─policy_name─┬─volume_name─┬─volume_priority─┬─disks───┬─volume_type─┬─max_data_part_size─┬─move_factor─┬─prefer_not_to_merge─┐
│ ssd_to_hdd  │ ssd_volume  │               1 │ ['ssd'] │ JBOD        │                  0 │         0.1 │                   0 │
│ ssd_to_hdd  │ hdd_volume  │               2 │ ['hdd'] │ JBOD        │                  0 │         0.1 │                   0 │
└─────────────┴─────────────┴─────────────────┴─────────┴─────────────┴────────────────────┴─────────────┴─────────────────────┘

-- The 20190415 partition is stored under ssd disk.
SELECT
    partition,
    name,
    path
FROM system.parts
WHERE (database = 'sampleDatasets') AND (table = 'opensky_partitioned_with_storage_policy') AND (partition = '20190415')


┌─partition─┬─name───────────────┬─path────────────────────────────────────────────────────────────────────┐
│ 20190415  │ 20190415_107_107_0 │ /ssd/store/300/3005efb6-244d-473f-93ac-944233d79a55/20190415_107_107_0/ │
└───────────┴────────────────────┴─────────────────────────────────────────────────────────────────────────┘

-- Let's move it to the hdd disk
ALTER TABLE sampleDatasets.opensky_partitioned_with_storage_policy
    MOVE PARTITION '20190415' TO DISK 'hdd'

-Check the new volume
SELECT
    partition,
    name,
    path
FROM system.parts
WHERE (database = 'sampleDatasets') AND (table = 'opensky_partitioned_with_storage_policy') AND (partition = '20190415')

┌─partition─┬─name───────────────┬─path────────────────────────────────────────────────────────────────────┐
│ 20190415  │ 20190415_107_107_0 │ /hdd/store/300/3005efb6-244d-473f-93ac-944233d79a55/20190415_107_107_0/ │
└───────────┴────────────────────┴─────────────────────────────────────────────────────────────────────────┘

-- You can move parts and also you can move volume instead of disk 

FREEZE/UNFREEZE PARTITION/PART

We are using the “freeze” command to backup part, partitions and even tables. You can find detailed blog about freeze here.

CONCLUSION

ClickHouse has several manipulation operations for partitions and parts. In the scope of this post, we talked about attach, detach, move, drop and freeze operations in partitions. There are also replace, update, delete and fetch operations in partition operations. In the next and last part of this series, we’ll talk about partition expressions and mutations.

About Emrah Idman 9 Articles
Emrah Idman has considerable experience in relational and NoSQL databases. He has worked in a large-scale financial company for over 15 years. He has significant experience in team management, procurement and capacity planning, database administration and product testing for high-volume systems. He is working at ChistaDATA Inc. as senior database administrator.
Contact: Website