ClickHouse 23.8 LTS – Release Blog

Introduction

The latest Long Term Support version of ClickHouse was released on 31 August (23.8 LTS). ClickHouse has two LTS releases every year (March and August) and this latest release has a lot of bug fixes (60+) which improve the stability of the ClickHouse database, lots of performance improvements, and brand new features that increase the usability of the ClickHouse. The complete list of features, improvements, and bug fixes is available here. I will cover the important and exciting ones in this blog post.

Features and improvements

Parallel processing of data in Azure Blob Storage

It is now possible to process files in Azure Blob Storage from multiple nodes in a ClickHouse cluster using azureBlobStorageCluster table function. The syntax is below.

azureBlobStorageCluster(cluster_name, connection_string|storage_account_url, container_name, blobpath, [account_name, account_key, format, compression, structure])

Limit simultaneous sessions per authenticated user

There is a new setting to limit the maximum number of simultaneous sessions per authenticated user via max_sessions_for_user  setting.

Truncate the entire database (Use with utmost caution)

It is now possible to truncate all the tables in the database using the TRUNCATE DATABASE statement.

TRUNCATE DATABASE IF EXISTS <database name>

This operation will truncate all the tables within the specified database but the database will remain in the server.

Limit the number of threads used for the Primary key Index

We can now limit the number of CPU threads used to process the primary key  / Order By key index in a MergeTree family of table engines using  max_threads_for_indexes  setting.

Read data from compressed files

Using the File function, we can now read the data from compressed archives (zip, 7-zip, tar, etc.). The syntax is below.

SELECT <columns> FROM file('path_to_archive :: filename')

# Example

SELECT * FROM file('compressed_file.zip :: data.csv')

ALTER TABLE FREEZE – Not replicated in Replicated database engine

ClickHouse has an ALTER TABLE…FREEZE… command to backup partitions or tables as a whole. In Replicated database engines, the actions performed on the tables within the database are automatically carried out in the replica nodes as well. Starting from this version, the ALTER TABLE…FREEZE… will not be replicated across the nodes of a cluster for tables under replicated databases.

Changes in System Tables

  • A new column named ptr in system.trace_log – Contains the address of the allocation
  • A new column named query_cache_usage in system.query_log – Indicates whether query cache was used for the query
  • A new column named name to system.clusters table – To store alias for a cluster
  • A new column named partition to the system.part_log
  • A new column called name as an alias to event and metric in the system.events and system.metrics tables

Arrays

  • It is now possible to perform pointwise addition and subtraction on two arrays – Example [1, 2] + [3, 4]
  • New functions named arrayRotateLeft, arrayRotateRight, arrayShiftLeft, arrayShiftRight

Other improvements

  • Implement a new ORC input format parser without Apache Arrow’s dependency – Increased parsing speed for ORC files
  • Improved thread management with S3 table function – Increased speed for S3 tables
  • Maximum concurrent queries increased from 100 to 1000 (default Value)
  • Improved thread pool utilization for BACKUP/RESTORE

Notable bug fixes

  • Support IPv4 and IPv6 data types as dictionary attributes
  • ClickHouse mistakenly understand comma as part of date(time) field in CSV and it is fixed now
  • Fix reading of unnecessary columns in case of multistage PREWHERE clauses in SQL
  • Fix password leak in SHOW CREATE statement for MySQL table
  • Added a fix for incorrect checksums of compressed marks and primary key
  • A new fix for the “Context has expired” error in dictionaries when using subqueries.

Breaking Changes

Dynamic Disks

The dynamic disks should be specified as disk = disk(name = ‘disk_name’, …) instead of disk = disk_<disk_name>(…)

# Old Syntax

CREATE TABLE <table_name>
(
Columns ...,
)
ENGINE = MergeTree
ORDER BY tuple()
SETTINGS disk = disk_s3_disk(type = cache, max_size = '100Ki', path =<path>, disk = s3_disk);

# New Syntax

CREATE TABLE <table_name>
(
Columns ...,
)
ENGINE = MergeTree
ORDER BY tuple()
SETTINGS disk = disk(name = 's3_disk', type = cache, max_size = '100Ki', path = <path>, disk = s3_disk);

Metadata Cache

The metadata cache has been deprecated and will be removed in the future. This is because this feature could cause potential data loss. Along with this, system.merge_tree_metadata_cache table will also be removed.

3DES disabled for TLS connections

Triple data encryption standard (3DES) is recommended to be disabled and this is implemented now in ClickHouse for TLS connections.

System Tables

  • microseconds column is removed from the system.text_log table
  • milliseconds column is removed from the system.metric_log table

Conclusion

ClickHouse development team has rolled out yet another great Long-term support version of ClickHouse with the right balance between the features, improvements, and the number of bug fixes. For the complete list of changes, please refer to the changelog.

To know more about ClickHouse releases, do visit the following articles:

References

  • https://clickhouse.com/docs/en/whats-new/changelog#-clickhouse-release-238-lts-2023-08-31
  • https://www.youtube.com/watch?v=d1_pyoWcydk