ClickHouse MergeTree: Introduction to VersionedCollapsingMergeTree

Introduction

The VersionedCollapsingMergeTree table engine is again based on MergeTree engine, and it adds more functionality on top of CollapsingMergeTree engine. You can read about CollapsingMergeTree engine and the collapsing rules here.

CollapsingMergeTree engine requires the rows in specific order in order to collapse the rows.  The ordering may be lost when rows are inserted from multiple different threads.  VersionedCollapsingMergeTree is specifically designed to overcome this by adding a new ‘version’ column.

A sign column (Int8 data type) is mandatory at the time of table creation and holds either 1 or -1 (rows with 1 in the sign column are called state rows, and -1 are called cancel rows ). The collapsing logic uses ‘version’ column along with the sign column. This engine deletes the rows if they have different sign values and the same sorting key and version.

This engine is quite useful where constant Updates and Deletes are required. Collapsing the rows is far more efficient than performing Updates and Deletes via ALTER statement. Mutations could potentially cause performance degradation in ClickHouse if done quite frequently and in large numbers. This engine carries forward the same speed and performance from MergeTree engine.

Creating a table with VersionedCollapsingMergeTree

Let us create a table based on the VersionedCollapsingMergeTree engine to illustrate its working.

CREATE TABLE ver_col_mt
(
    ID UInt64,
    Name String,
    Sign Int8,
    Version UInt8
)
ENGINE = VersionedCollapsingMergeTree(Sign, Version)
ORDER BY ID;

Let us insert two rows of data (state rows with the same sorting key but a different version).

INSERT INTO ver_col_mt VALUES (1, 'a', 1, 1);

INSERT INTO ver_col_mt VALUES (1, 'b', 1, 2);

Verify using the below query.

chistadata :) SELECT * FROM ver_col_mt;

SELECT *
FROM ver_col_mt

Query id: 6bf374dd-b456-48ff-af2f-2f87d319998f

┌─ID─┬─Name─┬─Sign─┬─Version─┐
│  1 │ b    │    1 │       2 │
└────┴──────┴──────┴─────────┘
┌─ID─┬─Name─┬─Sign─┬─Version─┐
│  1 │ a    │    1 │       1 │
└────┴──────┴──────┴─────────┘

2 rows in set. Elapsed: 0.003 sec. 

These two rows can co-exist since they are of different versions despite having a same sorting key.

Deleting a row with VersionedCollapsingMergeTree

Let us remove the first version by inserting a cancel row.

chistadata :) INSERT INTO ver_col_mt VALUES (1, 'a', -1, 1);
              

INSERT INTO ver_col_mt FORMAT Values

Query id: 980cd62e-b797-420c-b86a-c5373486884d

Ok.

1 row in set. Elapsed: 0.004 sec. 

Verify if the row corresponding to the first version has collapsed.

chistadata :) SELECT * FROM ver_col_mt;

SELECT *
FROM ver_col_mt

Query id: 28174613-c08e-478d-b22f-470626a79ff3

┌─ID─┬─Name─┬─Sign─┬─Version─┐
│  1 │ b    │    1 │       2 │
└────┴──────┴──────┴─────────┘

1 row in set. Elapsed: 0.002 sec. 

Updating a row with VersionedCollapsingMergeTree

Cancel the old row and insert a state row with higher version and updated values.

chistadata :) INSERT INTO ver_col_mt VALUES (1, 'b', -1, 2), (1, 'c', 1, 3);

INSERT INTO ver_col_mt FORMAT Values

Query id: 581672d2-b1a1-4968-8776-0a7cab1a2e36

Ok.

2 rows in set. Elapsed: 0.004 sec. 

Verify using the below SQL.

chistadata :) SELECT * FROM ver_col_mt;

SELECT *
FROM ver_col_mt

Query id: 02f6fc56-cee3-478c-ade0-313629e7f56e

┌─ID─┬─Name─┬─Sign─┬─Version─┐
│  1 │ c    │    1 │       3 │
└────┴──────┴──────┴─────────┘

1 row in set. Elapsed: 0.002 sec. 

Note

If the newly inserted rows haven’t triggered any merges, we can use the OPTIMIZE table statement.

Conclusion

  • VersionedCollapsingMergeTree engine collapses the rows with the same sorting key , version and different sign value
  • Collapsing the rows are far more efficient than mutations (ALTER statement)
  • The collapsing of data rows happens in the background during the merge operations.

 

To learn more about ClickHouse MergeTree storage engines, read the following articles:

References: VersionedCollapsingMergeTree in ClickHouse Docs