Introduction
Every new release includes new features, enhancements, and numerous bug fixes, and the ChistaDATA team always stays on top of the latest releases. On October 25, 2022, ClickHouse version 22.10 (October 2022) was released, and this version contains 12 new features, 11 performance optimizations, and 40 bug fixes.
For further details, please see the official ClickHouse change log list here.
- Source Code – GitHub Link
- Installation Instructions – Official Documentation
- v22.10 Release Webinar – Recording
- v22.10 Release Webinar – Slides
Let’s look at some important new features.
Key features & improvements
1. Control Over Merging
If your tables have more than one part/partition, they will be merged during the SELECT query, and the result will be fetched. As a result of the merging process, time will be lost. SELECT queries will be faster if only one part/partition exists, and the min_age_to_force_merge_seconds parameter can help with that. For the duration specified, this table-based parameter will run in the background and always merge your part/partition in your table. You can modify your table or also consider this while creating a new one.
ALTER TABLE test_tbl MODIFY SETTING min_age_to_force_merge_seconds = 7776000 -- 3 months specified
┌─statement──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ CREATE TABLE test_tbl ( `id` UInt16, `create_time` Date, `comment` Nullable(String) ) ENGINE = MergeTree PARTITION BY create_time PRIMARY KEY (id, create_time) ORDER BY (id, create_time) TTL create_time + toIntervalMonth(1) SETTINGS min_age_to_force_merge_seconds = 8640000 │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
2. Resetting The Settings
You changed the parameters on a session basis, but now you want them to reset. Use the reset option to the global default value.
ch-server :) SET max_block_size = 4096; SET max_block_size = 4096 ch-server :) SELECT * FROM system.settings WHERE changed; ┌─name───────────┬─value─┬─changed─┬─description────────────────────┬─min──┬─max──┬─readonly─┬─type───┐ │ max_block_size │ 4096 │ 1 │ Maximum block size for reading │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0 │ UInt64 │ └────────────────┴───────┴─────────┴────────────────────────────────┴──────┴──────┴──────────┴────────┘
ch-server :) SET max_block_size = DEFAULT; SET max_block_size = DEFAULT ch-server :) SELECT * FROM system.settings WHERE changed; 0 rows in set.
Additionally, when the max_block_size parameter is set to the default value, we can notice their default value as below.
ch-server :) SELECT getSetting('max_block_size') ┌─getSetting('max_block_size')─┐ │ 65409 │ └──────────────────────────────┘
3. Keeper Snapshots in S3
Want to take additional backups for your metadata? Now snapshots from ClickHouse Keeper can be uploaded to S3. Each snapshot is zstd compressed, and these snapshots are not used for restoration automatically. If everything on your local drive was lost, ClickHouse metadata could be restored via Keeper backups. It is simple to download and restore.
s3_snapshot: endpoint: http://minio1:9001/snapshots/ access_key_id: minio secret_access_key: minio123
4. Backup Table/Database to S3
The process of backing up your table or database to S3 is simple. just type a ‘BACKUP TABLE table_name TO S3’ or ‘BACKUP DATABASE database_name TO S3’. Isn’t it quite simple?
ch-server :) BACKUP TABLE test_tbl TO S3( 'https://mars-doc-test.s3.amazonaws.com/backup-S3/my_backup', 'ABC123', 'Abc+123')
We can also restore from there.
ch-server :) RESTORE TABLE test_tbl FROM S3( 'https://mars-doc-test.s3.amazonaws.com/backup-S3/my_backup', 'ABC123', 'Abc+123')
5. Composable Protocols
Will you use a proxy for DDOS protection and HAProxy for load balancing? From there, ClickHouse will receive connections, requiring the source IP address for quotas, ACL, and logging. It is possible to enable the PROXYv1 protocol in the proxy and configure it in ClickHouse as a protocol wrapper. The header is then read and the network packets are unwrapped by ClickHouse.
<tcp_proxy> <type>proxy1</type> <impl>tcp</impl> <port>9100</port> <description>native protocol with PROXYv1</description> </tcp_proxy>
6. Morton Curve
Encode/decode functions for Morton Coding (ZCurve) are now available in ClickHouse.
SELECT mortonDecode(2, mortonEncode(CAST('255', 'UInt8'), CAST('65536', 'UInt32'))) ┌─mortonDecode(2, mortonEncode(CAST('255', 'UInt8'), CAST('65536', 'UInt32')))─┐ │ (255,65536) │ └──────────────────────────────────────────────────────────────────────────────┘
The Z-order curve inspired to innovate in compression and parallel data construction. Morton order with z-curve as a tool for linearization of multidimensional data grids.
Two-dimensional data is linearized. a is the first level, and b is the second level in this example.
7. ClickHouse Sink for Kafka Connect
ClickHouse already has a Kafka table engine for publishing or subscribing to data flows, but it now also has a Sink for Kafka Connect.
So, what exactly is it and why? You can use Kafka Connect to push data to ClickHouse without installing the Kafka Engine. It has benefits over the Kafka engine. It makes sense to use this tool with exactly-once semantics and push model instead of pull features.
It is in the experimental stage, and you are welcome to try it via https://github.com/ClickHouse/clickhouse-kafka-connect
Other Notable Enhancements/Features
- Fuzzy History Search.
- Faster Reconnection to Keeper.
- The system table
asynchronous_insert_log
was added. It includes information on asynchronous inserts. - Add support for the
lz4
,bz2
,snappy
in HTTP’sAccept-Encoding,
a non-standard extension to the HTTP protocol. - Functions (
randUniform
,randNormal
,randLogNormal
,randExponential
,randChiSquared
,randStudentT
,randFisherF
,randBernoulli
,randBinomial
,randNegativeBinomial
,randPoisson
) to generate random values based on the specified distributions have been added.
Conclusion
The ClickHouse October 2022 Release (Version 22.10) brings a host of new features, performance enhancements, and bug fixes to improve data management and analysis. With capabilities like control over merging, Keeper snapshots in S3, composable protocols, Morton curve, ClickHouse Sink for Kafka Connect, and more, users can expect enhanced efficiency and flexibility in handling their data.
To know more about ClickHouse releases, do visit the following articles:
- ClickHouse Release Notes – v22.11 (November 2022)
- ClickHouse Release Notes – v22.12 (December 2022)
- ClickHouse Release Notes – v23.1 (January 2023)
- ClickHouse Release Notes – v23.3 (March 2023)
- ClickHouse Release Notes – v23.4 (April 2023)
- ClickHouse Release Notes – v23.5 (May 2023)
References