ClickHouse MergeTree: Deletes and Updates with CollapsingMergeTree

Introduction

Deletion and updating the records are common on transactional databases. ClickHouse supports deletes and updates in MergeTree table engine via the ALTER TABLE … statement. They are also known as Mutations which happen asynchronously in the background. Mutations involve rewriting the data parts and this may result in increased hardware usage, and it is not recommended to have frequent mutations happening in a table. An alternate way to perform updates and deletes is to use a CollapsingMergeTree table engine.

CollapsingMergeTree

This is a variant of the MergeTree table engine. A sign column (Int8 data type) is mandatory at the time of table creation and holds either 1 or -1 (rows with 1 in the sign column are called state rows and -1 are called cancel rows ). This engine deletes the rows if they have different sign values and the same sorting key. The collapsing logic is as follows. After sorting the rows based on ORDER BY columns, for every group of rows with same sorting keys

  1. If there is an equal number of state and cancel rows
    1. All the rows are collapsed if the last row is a cancel row and nothing remains in the group
    2. The first cancel row and the last state row is retained in the group if the last row is a state row
  2. If there is an unequal number of state and cancel rows
    1. The last state row is retained if there are more state rows
    2. The first cancel row is retained if there are more cancel rows

Getting Started

Let us create a table using CollapsingMergeTree engine. The table has three columns named ID, Name and Sign (the mandatory sign column). Sorting happens based on the ID column.

CREATE TABLE collapsing_mergetree
(
    ID UInt64,
    name String, 
    Sign Int8
)
ENGINE = CollapsingMergeTree(Sign)
ORDER BY ID;

Insert two rows of data. To demonstrate the update and delete, we shall use one row each.

INSERT INTO collapsing_mergetree VALUES
(22,'a', 1), (33, 'b', 1);

Delete operation

Performing a delete operation is fairly straightforward. To delete a row, simply insert a copy of the row with -1 in the sign column. The CollapsingMergeTree engine will take care of the rest.

INSERT INTO collapsing_mergetree (ID, Sign) VALUES (22,-1);

It is sufficient to insert the sorting key values and the sign column.

Update Operation

This is a two-step process that has a delete followed by an insert operation. Deletion of the old record is again based on the sorting key. The next step is to insert the new record which is again based on the sorting key.

INSERT INTO collapsing_mergetree (ID, Sign) VALUES (33,-1);

INSERT INTO collapsing_mergetree VALUES (33, 'c', -1);

We need two rows to be inserted for an update operation.

Note

If the newly inserted rows hasn’t triggered any merges, we can use the OPTIMIZE table statement.

Conclusion

  • CollapsingMergeTree engine collapses the rows with the same sorting key and different sign
  • Deletes and updates can be performed via Insert operation in this engine
  • The same can be done via ALTER TABLE statement
  • The collapsing of data rows with the same sorting key and different signs happens in the background during the merge operations.

To learn more about MergeTree in ClickHouse, do read the following articles:

References

https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/collapsingmergetree/