ClickHouse November 2023 Release – Version 23.11

Every new release includes new features, enhancements, and numerous bug fixes, and the ChistaDATA team always stays on top of the latest releases. On December 06, 2023, ClickHouse version 23.11 was released, and this version contains the following;

  • 25 new features,
  • 24 performance optimizations,
  • 70 bug fixes.

For further details, please see the official ClickHouse docs here.

This article will look at the critical features of the ClickHouse 23.11 release.

fileCluster

Presenting the fileCluster table function, which proves beneficial when linking a shared filesystem (e.g., NFS) to the directory named user_files.

Also, unveil a fresh table function and table engine, akin to s3Cluster, urlCluster, hdfsCluster, and azureBlobStorageCluster.

SELECT uniq(UserID) FROM fileCluster('partitioned/hits*.parquet')

Processes the set of files, spreading the work across the servers in a cluster.

Requires the usage of network filesystem,
mounted inside the user_files path,
so all the files are visible on every server of a cluster.

preferred_optimize_projection_name

A recently integrated feature known as “preferred_optimize_projection_name” has been implemented. When configured with a non-empty string, this setting enables the selection of a specific projection, if available, instead of considering all potential candidates.

SET preferred_optimize_projection_name = 'summaries';

prefers using the specified projection if it exists instead of automatic selection.

system.blob_storage_log

Incorporated a new system table named “blob_storage_log.” This table facilitates the auditing of all data written to object storages such as S3 and others.

CREATE TABLE system.blob_storage_log
(
    event_date Date,
    event_time DateTime,
    event_time_microseconds DateTime64(6),
    event_type Enum8(
        'Upload' = 1, 'Delete' = 2,
        'MultiPartUploadCreate' = 3, 'MultiPartUploadWrite' = 4,
        'MultiPartUploadComplete' = 5, 'MultiPartUploadAbort' = 6),
    query_id String,
    thread_id UInt64,
    thread_name String,
    disk_name LowCardinality(String),
    bucket String,
    remote_path String,
    local_path String,
    data_size UInt64,
    error String
)

Keeper Improvements

Improve memory-usage during startup by delaying log preprocessing.

A four-letter command to yield leadership:

echo 'rqld' | nc localhost 2181

Allows to initiate a leader election and wait for a new leader
before removing the node.

Compression in the network protocol.

Controlled on the client side:

zookeeper:
  node:
    host: example1
    port: 2181
    use_compression: true

Information about Availability Zone.

keeper_server:
  availability_zone:
    # option 1, explicit value:
    value: eu-central-1b
    # option 2, from the metadata service:
    enable_auto_detection_on_cloud: true

The info is published in /keeper/availability-zone

_size virtual column

Include a new virtual column named “_size” to represent file size in bytes for the following engines: s3, file, hdfs, url, and azureBlobStorage. This enhancement applies specifically to external tables in these engines, supplementing the existing virtual columns “_file” and “_path.”

SELECT _file, _size, count(), uniq(UserID) FROM s3(
  's3://clickhouse-public-datasets/hits_compatible/athena_partitioned/hits*.parquet')
GROUP BY _file, _size ORDER BY _file

Concurrency Control

Raise the default value of “max_concurrent_queries” from 100 to 1000. This adjustment is particularly advantageous in situations with numerous connecting clients engaged in slow data transmission or reception, ensuring that server performance isn’t hindered by CPU limitations. This change is especially relevant when the number of CPU cores exceeds 100.

Additionally, enable concurrency control by default, setting the preferred number of query processing threads to twice the number of CPU cores. This enhancement aims to enhance performance, especially in scenarios involving an extensive volume of concurrent queries.

As of version 23.11, it’s noteworthy that concurrency control is now enabled by default.

It automatically adjusts the number of threads per query.

concurrent_threads_soft_limit_ratio_to_cores: 2

Result: high QPS without performance degradation.

 

 

This version has made valuable updates for improve the overall efficiency and responsiveness of the system, especially in environments with a substantial volume of concurrent activities. Future releases will allow us to use ClickHouse more efficiently.

These are the ClickHouse 23.11 features. To find out more details, please visit the official ClickHouse Docs.

About Can Sayn 36 Articles
Can Sayın is experienced Database Administrator in open source relational and NoSql databases, working in complicated infrastructures. Over 5 years industry experience, he gain managing database systems. He is working at ChistaDATA Inc. His areas of interest are generally on open source systems.
Contact: Website