Image by walpepers.com
Introduction
Every new release includes new features, enhancements, and numerous bug fixes, and the ChistaDATA team always stays on top of the latest releases. On April, 2024, ClickHouse version 24.4 was released, and this version contains the following;
- 13 new features,
- 16 performance optimizations,
- 64 bug fixes.
For further details, please see the official ClickHouse docs here.
- v24.4 Source Code : GitHub Link
- v24.4 Release Webinar : Slides
- Installation: ClickHouse Docs
This article will look at the critical features of the ClickHouse 24.4 release.
DROP Multiple Tables
Supports dropping multiple tables at the same time like DROP TABLE a, b, c
;
DROP TABLE chista, data, consultant;
Unloading Primary Key
Utilize SYSTEM UNLOAD PRIMARY KEY to optimize memory allocation for the primary key of a table.
SYSTEM UNLOAD PRIMARY KEY db.table
Enabling SYSTEM UNLOAD PRIMARY KEY allows for freeing up memory allocated to the primary key of a table, especially beneficial for tables with infrequent usage. The primary key data will be loaded on demand during subsequent queries to the table.
New Virtual Columns in MergeTree
Introduced a new persistent virtual column called ‘_block_offset,’ which retains the original row number assigned during insertion. Enabling persistence for ‘_block_offset’ can be done through the MergeTree setting ‘enable_block_offset_column.’ Additionally, a new virtual column named ‘_part_data_version’ has been added, containing either the minimum block number or the mutation version of the partition. It’s worth noting that the persistent virtual column ‘_block_number’ is no longer in its experimental phase.
CREATE TABLE ... ENGINE = MergeTree SETTINGS enable_block_number_column = 1, enable_block_offset_column = 1;
Transform OUTER to INNER JOIN
When the filter applied after a JOIN consistently filters out default values, the OUTER JOIN is automatically optimized to an INNER JOIN. This optimization is governed by the setting ‘query_plan_convert_outer_join_to_inner_join,’ which is enabled by default.
SELECT * FROM l LEFT JOIN r ON l.id = r.id WHERE r.value != '' -- this allows the optimization
SELECT * FROM l INNER JOIN r ON l.id = r.id WHERE r.value != ''
Conclusion
In summary, these updates, implemented by the ClickHouse database, represent a substantial stride forward in optimizing performance, scalability, and resource efficiency. By focusing on improving parallelism, cache locality, and reducing memory usage, ClickHouse has demonstrated a commitment to enhancing the user experience and meeting the evolving demands of modern data management. The introduction of new modes for distributed DDL output handling further underscores ClickHouse’s dedication to providing flexibility and control to its users. These updates collectively reinforce ClickHouse’s position as a leading solution for high-performance analytical workloads.
These are the ClickHouse 24.4 features. To find out more details, please visit the official ClickHouse Docs.