ClickHouse January 2023 Release – Version 23.1

Introduction

ClickHouse team is back with a new release for the month of January. Being the first release of the year, it has lot of exciting features, enhancements, and speed improvements. This release has

  • 17 New features
  • 17 performance optimisations
  • 78 bug fixes

The release notes and Change log can be found here in the official documentation.

SQL Features

Array Join for Map data type

Array join clause is now supported for Map data type (key-value pair).

chistadata :) SELECT * FROM VALUES 
              ('map_data Map(UInt16, UInt8)', ('{1:1,2:2}'))
              ARRAY JOIN map_data;

SELECT *
FROM VALUES('map_data Map(UInt16, UInt8)', '{1:1,2:2}')
ARRAY JOIN map_data

Query id: 226d4fa9-0d1e-4e1b-b3bd-49a7e2389223

┌─map_data─┐
│ (1,1)    │
│ (2,2)    │
└──────────┘

2 rows in set. Elapsed: 0.002 sec. 

Age Function

The age function is similar to the dateDiff function, but it calculates the fully completed unit.

chistadata :) SELECT dateDiff('year',  toDate('2020-01-25'), toDate('2023-01-20'));
              

SELECT dateDiff('year', toDate('2020-01-25'), toDate('2023-01-20'))

Query id: d15cabc8-08fd-4acc-8e61-703c557dafbc

┌─dateDiff('year', toDate('2020-01-25'), toDate('2023-01-20'))─┐
│                                                            3 │
└──────────────────────────────────────────────────────────────┘

1 row in set. Elapsed: 0.001 sec. 
chistadata :) SELECT age('year',  toDate('2020-01-25'), toDate('2023-01-20'));

SELECT age('year', toDate('2020-01-25'), toDate('2023-01-20'))

Query id: cb2b1494-60e3-4828-bea5-874289ce6f18

┌─age('year', toDate('2020-01-25'), toDate('2023-01-20'))─┐
│                                                       2 │
└─────────────────────────────────────────────────────────┘

1 row in set. Elapsed: 0.001 sec. 

Standard Hexadecimal and Binary Literals

In order to have better SQL compliance, ClickHouse has added standard hex and binary literals.

chistadata :) SELECT x'43686973746144415441', b'01000011011010000110100101110011011101000110000101000100010000010101010001000001';

SELECT
    'ChistaDATA',
    'ChistaDATA'

Query id: d34db4b1-c6e6-4c92-855a-04be74b77370

┌─'ChistaDATA'─┬─'ChistaDATA'─┐
│ ChistaDATA   │ ChistaDATA   │
└──────────────┴──────────────┘

1 row in set. Elapsed: 0.002 sec. 

Parameterized Views

We can now create views with parameterized SELECT statements.

chistadata :) CREATE VIEW aml_view AS SELECT
              event_time, metric, value
              FROM system.asynchronous_metric_log
              WHERE metric = {metric_name:String};

CREATE VIEW aml_view AS
SELECT
    event_time,
    metric,
    value
FROM system.asynchronous_metric_log
WHERE metric = {metric_name:String}

Query id: d889e665-d5cd-4f9e-a3b6-0b7a04225931

Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 23.1.3 revision 54461.

Ok.

0 rows in set. Elapsed: 0.002 sec.
chistadata :) SELECT * FROM aml_view(metric_name='HTTPThreads') LIMIT 5;

SELECT *
FROM aml_view(metric_name = 'HTTPThreads')
LIMIT 5

Query id: 72f22dda-d1df-45d3-a67b-9886bb23a9c0

┌──────────event_time─┬─metric──────┬─value─┐
│ 2023-02-07 13:39:24 │ HTTPThreads │     1 │
│ 2023-02-07 13:39:25 │ HTTPThreads │     1 │
│ 2023-02-07 13:39:26 │ HTTPThreads │     1 │
│ 2023-02-07 13:39:27 │ HTTPThreads │     1 │
│ 2023-02-07 13:39:28 │ HTTPThreads │     1 │
└─────────────────────┴─────────────┴───────┘

5 rows in set. Elapsed: 0.012 sec. Processed 558.84 thousand rows, 7.43 MB (45.91 million rows/s., 610.31 MB/s.)

Other notable features

  • Streaming engines Kafka, RabbitMQ, NATS, Redpanda, etc, supports all the data formats supported in ClickHouse for INSERT operation. Added option to control block sizes while inserting into these engines.
  • Auto-detect headers in CSV/TSV files. Not required to specify the exact format anymore.

Performance Improvements

  • Reading Structs from Parquet or ORC is now even faster.
  • Reads from the S3 file engine are optimized further using asynchronous pre-fetching
  • Faster server startup times by not loading inactive data parts that will anyway be deleted after loading
  • MMap for File table engine, which makes it even faster for the data to be read from the disk
  • Query result cache (Experimental). We can configure the cache size, minimum query run time, minimum query runs, max staleness, passive usage, sharing cache between users etc. A cache is local on the nodes, not replicated or distributed.
  • Inverted full-text index – more on this will be covered in a separate follow-up article.

Integrations

  • Metabase
  • Java – Driver supports R2DBC and file upload/download
  • Python – Performance improvement with streaming

Backward incompatible changes

  • SYSTEM RESTART DISK query will not be supported
  • PREALLOCATE option for HASHED/SPARSE_HASHED dictionaries is no more valid
  • Gorilla codec on columns of non-Float32 or non-Float64 is disabled
  • Parallel quorum inserts support is completely disabled for MergeTree tables created using old syntax
  • Use the GetObjectAttributes request instead of the HeadObject request to get the size of an object in AWS S3.

Conclusion

This ClickHouse release has some exciting features and performance enhancements that are quite useful in various use cases. Being the year’s first release, the ClickHouse team has managed to deliver another version packed with useful features and fixes. Experimental features like inverted indices could well be a game changer for ClickHouse and make it a multipurpose DBMS in the future.

To know more about ClickHouse releases, do visit the following articles: