ClickHouse January 2024 Release – v24.1

Introduction

Every new release includes new features, enhancements, and numerous bug fixes, and the ChistaDATA team always stays on top of the latest releases. ClickHouse version 24.1 was released on January 30, 2024, and it contains the following:

  • 26 new features,
  • 22 performance optimizations,
  • 47 bug fixes.

For further details, please see the official ClickHouse documentation.

This article will look at the critical features of the ClickHouse 24.1 release.

Key features & improvements

(1) Improvements For Replicated Databases

Version 24.1 adds two new modes for the distributed_ddl_output_mode setting: null_status_on_timeout_only_active and throw_only_active. With either mode, a distributed DDL query no longer waits for inactive replicas.

SET distributed_ddl_output_mode = 'throw_only_active';
SET distributed_ddl_output_mode = 'null_status_on_timeout_only_active';

(2) arrayShingles

The new arrayShingles function generates the contiguous subarrays ("shingles") of a given length from an array. For example, calling arrayShingles([1, 2, 3, 4, 5], 3) will yield [[1,2,3],[2,3,4],[3,4,5]].

SELECT
    'ClickHouse is a good database' AS phrase,
    tokens(phrase) AS tok,
    arrayShingles(tok, 3) AS shingles

Row 1:
──────
phrase:   ClickHouse is a good database
tok:      ['ClickHouse','is','a','good','database']
shingles: [['ClickHouse','is','a'],['is','a','good'],['a','good','database']]
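
To make the windowing explicit, here is a minimal Python sketch of the same idea. The helper name array_shingles is hypothetical, and raising an error for an out-of-range length is a choice made for this sketch, not a statement about how ClickHouse behaves.

```python
def array_shingles(items, length):
    """Return every contiguous window of `length` elements, in order.

    Hypothetical helper that mirrors the documented example of arrayShingles.
    """
    if length < 1 or length > len(items):
        # Assumption of this sketch: reject lengths that produce no window.
        raise ValueError("length must be between 1 and len(items)")
    return [items[i:i + length] for i in range(len(items) - length + 1)]


print(array_shingles([1, 2, 3, 4, 5], 3))
# [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

An array of n elements produces n - length + 1 shingles, which is why the five-token phrase above yields three shingles of length 3.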

(3) quantileDD

quantileDD, quantilesDD, medianDD

The new quantileDD aggregate function, along with its counterparts quantilesDD and medianDD, is based on the DDSketch algorithm described in https://www.vldb.org/pvldb/vol12/p2195-masson.pdf. The first parameter is the relative accuracy of the sketch (0.0001 in the example below).

SELECT quantileExact(c), quantileDD(0.0001)(c), quantile(c),
   quantileBFloat16(c), quantileTiming(c), quantileTDigest(c)
FROM (
   SELECT created_at::Date, count() AS c FROM github_events
   WHERE repo_name = 'ClickHouse/ClickHouse'
      AND event_type = 'PullRequestEvent' AND action = 'opened'
   GROUP BY ALL)

──────
quantileExact(c):            19
quantileDD(0.0001)(c):       19.001159522718307
quantile(c):                 19
quantileBFloat16(c):         19
quantileTiming(c):           19
quantileTDigest(c):          18.804445
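
The core idea of DDSketch can be shown in a few lines of Python. This is a simplified sketch, not ClickHouse's implementation: it only handles positive values, never collapses buckets, and does not support merging. The class and method names are hypothetical.

```python
import math
from collections import Counter


class DDSketch:
    """Minimal DDSketch for positive values (illustrative only)."""

    def __init__(self, relative_accuracy):
        # Bucket boundaries grow geometrically by a factor of gamma.
        self.gamma = (1 + relative_accuracy) / (1 - relative_accuracy)
        self.buckets = Counter()
        self.count = 0

    def add(self, value):
        if value <= 0:
            raise ValueError("this sketch only handles positive values")
        # Bucket i covers the interval (gamma**(i - 1), gamma**i].
        index = math.ceil(math.log(value) / math.log(self.gamma))
        self.buckets[index] += 1
        self.count += 1

    def quantile(self, q):
        if self.count == 0 or not 0 <= q <= 1:
            raise ValueError("need data and a level between 0 and 1")
        rank = q * (self.count - 1)
        seen = 0
        for index in sorted(self.buckets):
            seen += self.buckets[index]
            if seen > rank:
                # The bucket midpoint keeps the relative error within bounds.
                return 2 * self.gamma ** index / (self.gamma + 1)


sketch = DDSketch(relative_accuracy=0.01)
for value in range(1, 1001):
    sketch.add(value)
print(sketch.quantile(0.5))  # close to 500, within 1 percent
```

Because every bucket spans a fixed ratio rather than a fixed width, the estimate stays within the chosen relative accuracy regardless of the magnitude of the values.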

(4) Functions For Punycode

punycodeEncode, punycodeDecode, idnaEncode, idnaDecode

Four new functions have been added: punycodeEncode, punycodeDecode, idnaEncode, and idnaDecode. punycodeEncode and punycodeDecode convert between Unicode strings and their ASCII Punycode representation, while idnaEncode and idnaDecode convert internationalized domain names to and from ASCII in line with the IDNA standard.

:) SELECT punycodeEncode('ClickHouse是一个很好的数据库')

ClickHouse-zf2pypw92j24o7ldjpvw6hdrd236i

:) SELECT idnaEncode('ClickHouse.是一个不错的.数据库')

clickhouse.xn--4gq0a0fy48indsd45b.xn--dxty1ibyb

:) SELECT idnaDecode('clickhouse.xn--4gq0a0fy48indsd45b.xn--dxty1ibyb')

clickhouse.是一个不错的.数据库
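
For comparison outside the database, Python's standard library ships "punycode" and "idna" codecs that perform the same kind of conversion. This is only a sketch: Python's idna codec follows the older IDNA 2003 rules, so its output is not guaranteed to match ClickHouse's functions for every input.

```python
# Punycode works on a whole string: ASCII characters are copied first,
# then the non-ASCII characters are encoded after a '-' delimiter.
phrase = "ClickHouse是一个很好的数据库"
encoded = phrase.encode("punycode")
decoded = encoded.decode("punycode")
print(encoded)
print(decoded == phrase)

# IDNA works per domain label: each non-ASCII label is Punycode-encoded
# and prefixed with "xn--"; pure ASCII labels are left as they are.
domain = "clickhouse.是一个不错的.数据库"
ascii_domain = domain.encode("idna")
restored = ascii_domain.decode("idna")
print(ascii_domain)
print(restored == domain)
```

The difference between the two pairs is visible here: Punycode is a generic string encoding, while IDNA applies it label by label and adds the "xn--" prefix that DNS software recognizes.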

(5) New String Similarity Functions

levenshteinDistance, damerauLevenshteinDistance, jaroSimilarity, jaroWinklerSimilarity

New string similarity functions have been added: damerauLevenshteinDistance, jaroSimilarity, and jaroWinklerSimilarity. The query below compares them with levenshteinDistance on words from Hacker News titles.

SELECT word,
          levenshteinDistance(word, 'clickhouse') AS d1,
   damerauLevenshteinDistance(word, 'clickhouse') AS d2,
               jaroSimilarity(word, 'clickhouse') AS d3,
        jaroWinklerSimilarity(word, 'clickhouse') AS d4
FROM (
    SELECT DISTINCT arrayJoin(tokens(lower(title))) AS word
    FROM hackernews)
ORDER BY d1 ASC LIMIT 50
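
The four measures differ in what they count. The Python sketch below implements textbook versions to make those differences concrete. It is an illustration under assumptions, not ClickHouse's code: the function names are hypothetical, and the Damerau variant shown is the restricted "optimal string alignment" form, which may differ from ClickHouse's result on some inputs.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def osa_distance(a, b):
    """Like Levenshtein, but swapping two adjacent characters costs 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = a[i - 1] != b[j - 1]
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def jaro(a, b):
    """Similarity in [0, 1] based on matching characters and transpositions."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a, b, scaling=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    score = jaro(a, b)
    prefix = 0
    for ca, cb in zip(a[:4], b[:4]):
        if ca != cb:
            break
        prefix += 1
    return score + prefix * scaling * (1 - score)


print(levenshtein("clickhouse", "clickhuose"))   # 2
print(osa_distance("clickhouse", "clickhuose"))  # 1
```

The two distances are edit counts (lower means more similar), whereas the Jaro measures are similarities between 0 and 1 (higher means more similar), which matters when choosing the ORDER BY direction in a query like the one above.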

(6) Control For Compression Level

Two new settings control output compression. output_format_compression_level adjusts the compression level of the output. output_format_compression_zstd_window_log sets the compression window size explicitly and enables long-range mode for zstd; it takes effect only when the output compression method is zstd. Both settings apply to INTO OUTFILE and to writes through the file, url, hdfs, s3, and azureBlobStorage table functions.

:) SELECT text FROM hackernews INTO OUTFILE 'text.tsv.zst'
   SETTINGS output_format_compression_level = 6;

:) SELECT text FROM hackernews INTO OUTFILE 'text.tsv.zst'
   SETTINGS output_format_compression_level = 6,
            output_format_compression_zstd_window_log = 26;
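
The effect of these two knobs can be approximated in plain Python. The sketch below is only an analogy: it uses zlib from the standard library as a stand-in for zstd, so the absolute sizes are not what ClickHouse would produce. The helper window_bytes is hypothetical and relies on zstd expressing its window as a power of two.

```python
import zlib


def window_bytes(window_log):
    """Window size in bytes for a given window log (2 ** window_log)."""
    return 1 << window_log


# Highly repetitive sample data, similar in spirit to a text column.
payload = b"ClickHouse is a good database\n" * 10_000

# Higher levels spend more CPU time looking for a smaller output.
for level in (1, 6, 9):
    size = len(zlib.compress(payload, level))
    print(f"zlib level {level}: {len(payload)} -> {size} bytes")

# A window log of 26 corresponds to a 64 MiB window.
print(window_bytes(26) // (1024 * 1024), "MiB")
```

A larger window lets the compressor find repeated data that lies far apart in the stream, at the cost of more memory for both compression and decompression.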

(7) Speed Up For Parallel Replicas

The coordination mechanism for parallel replicas has been revamped to enhance parallelism and optimize cache locality. Extensive testing has confirmed its linear scalability across hundreds of replicas. Additionally, it now supports reading in sequential order.

SET allow_experimental_parallel_reading_from_replicas = 1,
    max_parallel_replicas = 123;

Better cache locality means that, whenever possible, the same data ranges are read by the same replicas from one query to the next.

Tail latency improves because faster replicas steal tasks from slower ones.
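
These two ideas, a stable range-to-replica assignment plus task stealing, can be sketched as a small simulation. Everything here is hypothetical and greatly simplified: the function name, the modulo assignment (a stand-in for a stable hash), and the speed model are assumptions of this sketch and do not describe ClickHouse's actual coordinator.

```python
from collections import deque


def schedule(range_ids, speeds, steal=True):
    """Simulate replicas reading ranges; return (finish_time, range -> replica)."""
    n = len(speeds)
    queues = [deque() for _ in range(n)]
    for range_id in range_ids:
        # Stable assignment: the same range goes to the same replica on
        # every query, so that replica's cache is likely to be warm.
        queues[range_id % n].append(range_id)

    clock = [0.0] * n  # time at which each replica becomes free
    done = {}
    while any(queues):
        # Without stealing, only replicas that still own work can proceed.
        candidates = [i for i in range(n) if queues[i] or steal]
        replica = min(candidates, key=lambda i: clock[i])
        if queues[replica]:
            task = queues[replica].popleft()
        else:
            # Steal from the tail of the longest remaining queue.
            victim = max(range(n), key=lambda i: len(queues[i]))
            task = queues[victim].pop()
        clock[replica] += 1.0 / speeds[replica]
        done[task] = replica
    return max(clock), done


speeds = [1.0, 1.0, 0.25]  # the third replica is four times slower
t_plain, plain = schedule(range(12), speeds, steal=False)
t_steal, stolen = schedule(range(12), speeds, steal=True)
print(t_plain, t_steal)  # 16.0 8.0
```

In this toy scenario the slow replica determines the finish time when no stealing is allowed, while stealing halves it. The trade-off is that a stolen range is read by a replica that may not have it cached.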

Conclusion

In summary, ClickHouse 24.1 improves both performance and usability. The reworked coordination for parallel replicas improves parallelism, cache locality, and tail latency. The new modes for distributed DDL output handling give users more control over how replicated databases treat inactive replicas. New functions such as arrayShingles, quantileDD, the Punycode and IDNA functions, and the string similarity functions extend what can be done directly in SQL, and the new compression settings add control over exported data. Together, these updates reinforce ClickHouse's position as a leading solution for high-performance analytical workloads.

These are the key features of ClickHouse 24.1. For more details, please visit the official ClickHouse documentation.

Learn about the latest release, v24.2, in our release notes.

About Can Sayın
Can Sayın is an experienced Database Administrator for open source relational and NoSQL databases who works with complex infrastructures. He has more than 5 years of industry experience in managing database systems and works at ChistaDATA Inc. His main areas of interest are open source systems.