ClickHouse November 2023 Release – v23.11

Introduction

Every new release includes new features, enhancements, and numerous bug fixes, and the ChistaDATA team always stays on top of the latest releases. On December 06, 2023, ClickHouse version 23.11 was released, and this version contains the following;

  • 25 new features,
  • 24 performance optimizations,
  • 70 bug fixes.

For further details, please see the official ClickHouse docs here.

This article will look at the critical features of the ClickHouse 23.11 release.

Key features & improvements

1. fileCluster

Presenting the fileCluster table function, which proves beneficial when linking a shared filesystem (e.g., NFS) to the directory named user_files.

Also, unveil a fresh table function and table engine, akin to s3Cluster, urlCluster, hdfsCluster, and azureBlobStorageCluster.

SELECT uniq(UserID) FROM fileCluster('partitioned/hits*.parquet')

Processes the set of files, spreading the work across the servers in a cluster.

Requires the usage of network filesystem,
mounted inside the user_files path,
so all the files are visible on every server of a cluster.

2. preferred_optimize_projection_name

A recently integrated feature known as “preferred_optimize_projection_name” has been implemented. When configured with a non-empty string, this setting enables the selection of a specific projection, if available, instead of considering all potential candidates.

SET preferred_optimize_projection_name = 'summaries';

prefers using the specified projection if it exists instead of automatic selection.

3. system.blob_storage_log

Incorporated a new system table named “blob_storage_log.” This table facilitates the auditing of all data written to object storages such as S3 and others.

CREATE TABLE system.blob_storage_log
(
    event_date Date,
    event_time DateTime,
    event_time_microseconds DateTime64(6),
    event_type Enum8(
        'Upload' = 1, 'Delete' = 2,
        'MultiPartUploadCreate' = 3, 'MultiPartUploadWrite' = 4,
        'MultiPartUploadComplete' = 5, 'MultiPartUploadAbort' = 6),
    query_id String,
    thread_id UInt64,
    thread_name String,
    disk_name LowCardinality(String),
    bucket String,
    remote_path String,
    local_path String,
    data_size UInt64,
    error String
)

4. Keeper Improvements

Improve memory-usage during startup by delaying log preprocessing.

A four-letter command to yield leadership:

echo 'rqld' | nc localhost 2181

Allows to initiate a leader election and wait for a new leader
before removing the node.

Compression in the network protocol.

Controlled on the client side:

zookeeper:
  node:
    host: example1
    port: 2181
    use_compression: true

Information about Availability Zone.

keeper_server:
  availability_zone:
    # option 1, explicit value:
    value: eu-central-1b
    # option 2, from the metadata service:
    enable_auto_detection_on_cloud: true

The info is published in /keeper/availability-zone

5. _size virtual column

Include a new virtual column named “_size” to represent file size in bytes for the following engines: s3, file, hdfs, url, and azureBlobStorage. This enhancement applies specifically to external tables in these engines, supplementing the existing virtual columns “_file” and “_path.”

SELECT _file, _size, count(), uniq(UserID) FROM s3(
  's3://clickhouse-public-datasets/hits_compatible/athena_partitioned/hits*.parquet')
GROUP BY _file, _size ORDER BY _file

6. Concurrency Control

Raise the default value of “max_concurrent_queries” from 100 to 1000. This adjustment is particularly advantageous in situations with numerous connecting clients engaged in slow data transmission or reception, ensuring that server performance isn’t hindered by CPU limitations. This change is especially relevant when the number of CPU cores exceeds 100.

Additionally, enable concurrency control by default, setting the preferred number of query processing threads to twice the number of CPU cores. This enhancement aims to enhance performance, especially in scenarios involving an extensive volume of concurrent queries.

As of version 23.11, it’s noteworthy that concurrency control is now enabled by default.

It automatically adjusts the number of threads per query.

concurrent_threads_soft_limit_ratio_to_cores: 2

Result: high QPS without performance degradation.

Conclusion

ClickHouse version 23.11 introduces significant enhancements and features focused on improving performance, efficiency, and usability. From new table functions like fileCluster to concurrency control improvements and auditing capabilities with system.blob_storage_log, this release strengthens ClickHouse as a powerful analytical database system. The default settings adjustments ensure smoother operation in high-concurrency environments. Overall, ClickHouse 23.11 empowers users to work with data more effectively and efficiently, setting the stage for future advancements in data analytics.

These are the ClickHouse 23.11 features. To find out more details, please visit the official ClickHouse Docs.

To know more about ClickHouse releases, do visit the following articles:

You might also like:

About ChistaDATA Inc. 174 Articles
We are an full-stack ClickHouse infrastructure operations Consulting, Support and Managed Services provider with core expertise in performance, scalability and data SRE. Based out of California, Our consulting and support engineering team operates out of San Francisco, Vancouver, London, Germany, Russia, Ukraine, Australia, Singapore and India to deliver 24*7 enterprise-class consultative support and managed services. We operate very closely with some of the largest and planet-scale internet properties like PayPal, Garmin, Honda cars IoT project, Viacom, National Geographic, Nike, Morgan Stanley, American Express Travel, VISA, Netflix, PRADA, Blue Dart, Carlsberg, Sony, Unilever etc