ClickHouse October 2022 Release – v22.10

Introduction

Every new release includes new features, enhancements, and numerous bug fixes, and the ChistaDATA team always stays on top of the latest releases. On October 25, 2022, ClickHouse version 22.10 (October 2022) was released, and this version contains 12 new features, 11 performance optimizations, and 40 bug fixes.

For further details, please see the official ClickHouse change log list here.

Let’s look at some important new features.

Key features & improvements

1. Control Over Merging

If your tables have more than one part/partition, they will be merged during the SELECT query, and the result will be fetched. As a result of the merging process, time will be lost. SELECT queries will be faster if only one part/partition exists, and the min_age_to_force_merge_seconds parameter can help with that. For the duration specified, this table-based parameter will run in the background and always merge your part/partition in your table. You can modify your table or also consider this while creating a new one.

ALTER TABLE test_tbl MODIFY SETTING min_age_to_force_merge_seconds = 7776000 -- 3 months specified
┌─statement──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE test_tbl
(
    `id` UInt16,
    `create_time` Date,
    `comment` Nullable(String)
)
ENGINE = MergeTree
PARTITION BY create_time
PRIMARY KEY (id, create_time)
ORDER BY (id, create_time)
TTL create_time + toIntervalMonth(1)
SETTINGS min_age_to_force_merge_seconds = 8640000 │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

2. Resetting The Settings

You changed the parameters on a session basis, but now you want them to reset. Use the reset option to the global default value.

ch-server :) SET max_block_size = 4096;

SET max_block_size = 4096

ch-server :) SELECT * FROM system.settings WHERE changed;

┌─name───────────┬─value─┬─changed─┬─description────────────────────┬─min──┬─max──┬─readonly─┬─type───┐
│ max_block_size │ 4096   │       1 │ Maximum block size for reading │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │        0 │ UInt64 │
└────────────────┴───────┴─────────┴────────────────────────────────┴──────┴──────┴──────────┴────────┘
ch-server :) SET max_block_size = DEFAULT;

SET max_block_size = DEFAULT

ch-server :) SELECT * FROM system.settings WHERE changed;

0 rows in set.

Additionally, when the max_block_size parameter is set to the default value, we can notice their default value as below.

ch-server :) SELECT getSetting('max_block_size')

┌─getSetting('max_block_size')─┐
│                        65409 │
└──────────────────────────────┘

3. Keeper Snapshots in S3

Want to take additional backups for your metadata? Now snapshots from ClickHouse Keeper can be uploaded to S3. Each snapshot is zstd compressed, and these snapshots are not used for restoration automatically. If everything on your local drive was lost, ClickHouse metadata could be restored via Keeper backups. It is simple to download and restore.

s3_snapshot:
  endpoint: http://minio1:9001/snapshots/
  access_key_id: minio
  secret_access_key: minio123

4. Backup Table/Database to S3

The process of backing up your table or database to S3 is simple. just type a ‘BACKUP TABLE table_name TO S3’ or ‘BACKUP DATABASE database_name TO S3’. Isn’t it quite simple?

ch-server :) BACKUP TABLE test_tbl TO S3(
  'https://mars-doc-test.s3.amazonaws.com/backup-S3/my_backup',
  'ABC123', 'Abc+123')

We can also restore from there.

ch-server :) RESTORE TABLE test_tbl FROM S3(
  'https://mars-doc-test.s3.amazonaws.com/backup-S3/my_backup',
  'ABC123', 'Abc+123')

5. Composable Protocols

Will you use a proxy for DDOS protection and HAProxy for load balancing? From there, ClickHouse will receive connections, requiring the source IP address for quotas, ACL, and logging. It is possible to enable the PROXYv1 protocol in the proxy and configure it in ClickHouse as a protocol wrapper. The header is then read and the network packets are unwrapped by ClickHouse.

<tcp_proxy>
  <type>proxy1</type>
  <impl>tcp</impl>
  <port>9100</port>
  <description>native protocol with PROXYv1</description>
</tcp_proxy>

6. Morton Curve

Encode/decode functions for Morton Coding (ZCurve) are now available in ClickHouse.

SELECT mortonDecode(2, mortonEncode(CAST('255', 'UInt8'), CAST('65536', 'UInt32')))

┌─mortonDecode(2, mortonEncode(CAST('255', 'UInt8'), CAST('65536', 'UInt32')))─┐
│ (255,65536)                                                                  │
└──────────────────────────────────────────────────────────────────────────────┘

The Z-order curve inspired to innovate in compression and parallel data construction. Morton order with z-curve as a tool for linearization of multidimensional data grids.

Two-dimensional data is linearized. a is the first level, and b is the second level in this example.

7. ClickHouse Sink for Kafka Connect

ClickHouse already has a Kafka table engine for publishing or subscribing to data flows, but it now also has a Sink for Kafka Connect.

So, what exactly is it and why? You can use Kafka Connect to push data to ClickHouse without installing the Kafka Engine. It has benefits over the Kafka engine. It makes sense to use this tool with exactly-once semantics and push model instead of pull features.

It is in the experimental stage, and you are welcome to try it via https://github.com/ClickHouse/clickhouse-kafka-connect

Other Notable Enhancements/Features

  • Fuzzy History Search.
  • Faster Reconnection to Keeper.
  • The system table asynchronous_insert_log was added. It includes information on asynchronous inserts.
  • Add support for the lz4bz2snappy in HTTP’s Accept-Encoding, a non-standard extension to the HTTP protocol.
  • Functions (randUniformrandNormalrandLogNormalrandExponentialrandChiSquaredrandStudentTrandFisherFrandBernoullirandBinomialrandNegativeBinomialrandPoisson) to generate random values based on the specified distributions have been added.

Conclusion

The ClickHouse October 2022 Release (Version 22.10) brings a host of new features, performance enhancements, and bug fixes to improve data management and analysis. With capabilities like control over merging, Keeper snapshots in S3, composable protocols, Morton curve, ClickHouse Sink for Kafka Connect, and more, users can expect enhanced efficiency and flexibility in handling their data.

To know more about ClickHouse releases, do visit the following articles:

References

About Ilkay 25 Articles
Ilkay has been administering databases including NoSQL and RDBMS for over 7 years. He is experienced working high-scale banking databases. He is currently working at ChistaDATA Inc. as Database Administrator.
Contact: Website