ClickHouse November 2022 Release – Version 22.11

Introduction

Every new release includes new features, enhancements, and numerous bug fixes, and the ChistaDATA team always stays on top of the latest releases. On November 17, 2022, ClickHouse version 22.11 was released, and this version contains the following;

  • 15 new features,
  • 5 performance optimizations,
  • 32 bug fixes.

For further details, please see the official ClickHouse docs here.

This article will look at the critical features of the ClickHouse 22.11 release.

Key features & improvements

1. Composite Time Intervals

Support for composite time intervals. 1. Add, subtract and negate operations are now available on Intervals. In the case where the types of Intervals are different, they will be transformed into the Tuple of those types. 2. A tuple of intervals can be added to or subtracted from a Date/DateTime field. 3. Added parsing of Intervals with different types, for example: INTERVAL '1 HOUR 1 MINUTE 1 SECOND'

The example below shows your current time and the time 1 month 2 days ago.

SELECT now(), now() + (toIntervalMonth(1), toIntervalDay(-2))

SELECT
    now(),
    now() + (toIntervalMonth(1), toIntervalDay(-2))

Query id: 10793c38-c423-4eb0-b59f-13581e4bd6eb

┌───────────────now()─┬─plus(now(), tuple(toIntervalMonth(1), toIntervalDay(-2)))─┐
│ 2022-11-20 13:51:35 │                                       2022-12-18 13:51:35 │
└─────────────────────┴───────────────────────────────────────────────────────────┘

The following example shows a tuple.

SELECT INTERVAL '1 MONTH -2 DAY';

SELECT (toIntervalMonth(1), toIntervalDay(-2))

Query id: ff916581-fb93-4dcc-9ecb-50c46e17d2b4

┌─tuple(toIntervalMonth(1), toIntervalDay(-2))─┐
│ (1,-2)                                       │
└──────────────────────────────────────────────┘

To examine the tuple type, you can run the following command.

You will see that it is a tuple of two interval types.

SELECT toTypeName(INTERVAL '1 MONTH -2 DAY');

SELECT toTypeName((toIntervalMonth(1), toIntervalDay(-2)))

Query id: 51165ade-6f69-42fb-b9fe-28a2b4221b0d

┌─toTypeName(tuple(toIntervalMonth(1), toIntervalDay(-2)))─┐
│ Tuple(IntervalMonth, IntervalDay)                        │
└──────────────────────────────────────────────────────────┘

2. Support For ** Glob

ClickHouse now added ** glob support for recursive directory traversal of the filesystem and S3.

Now ClickHouse support: ***?{abc,def}{1..10}{01..10}.

You can run the following command if you like to see how many special characters you have in your code blog. This command tries to find cpp (c++) source code and cpp header files, then reads every single file.

SELECT line, count() FROM file ('**/*.{h.cpp}', LineAsString) GROUP BY line ORDER BY count() DESC LIMIT 20;

As you can see below, the most popular line in code blogs is an empty line.

─Line───────────────────────────────┬───────Count─┐
│                                   │     1518855 │
│                     ""            │       90748 │
│ {                                 │       31615 │
│ }                                 │       26350 │
│        {                          │       26306 │
│        }                          │       22941 │
│                {                  │       13439 │
│                }                  │       12465 │
│                          {        │        5798 │
│                          }        │        5473 │
│ };                                │        5270 │
│ namespace DB                      │        4535 │
│ public:                           │        3002 │
│ #pragma once                      │        2322 │
│                        else       │        2296 │
│                            {      │        2218 │
│                            }      │        2143 │
│ };                                │        2087 │
│ private:                          │        1996 │
│ else                              │        1692 │
└───────────────────────────────────┴─────────────┘

3. Functions For Spark Compatibility

  • Add function ascii like in Apache Spark: https://spark.apache.org/docs/latest/api/sql/#ascii.
  • Add function pmod which returns non-negative result based on modulo.
  • Add function formatReadableDecimalSize.
  • Add function randCanonical, which is similar to the rand function in Apache Spark or Impala. The function generates pseudo random results with independent and identically distributed uniformly distributed values in [0, 1).

Example;

SELECT modulo(-3, 10), -3 % 10;
                

SELECT
    -3 % 10,
    -3 % 10

Query id: 2f07420d-837f-45fc-a424-746faf14ffbc

┌─modulo(-3, 10)─┬─modulo(-3, 10)─┐
│             -3 │             -3 │
└────────────────┴────────────────┘
SELECT pmod(-3, 10), positiveModulo(-3, 10);
                

SELECT
    pmod(-3, 10),
    positiveModulo(-3, 10)

Query id: bcec8a40-32ac-405c-98d6-24d51469ff8c

┌─positiveModulo(-3, 10)─┬─positiveModulo(-3, 10)─┐
│                      7 │                      7 │
└────────────────────────┴────────────────────────┘

4. Functions For Convenience

Some functions have been developed for easier access to data about your database.

You can see how much disk space you have. For example, I have a 20 GB disk size on my local machine.

SELECT
    filesystemCapacity() AS size,
    formatReadableSize(size) AS bin,
    formatReadableDecimalSize(size) AS dec

Query id: b52f9f23-5362-4282-adea-d98b0f2504a9

┌────────size─┬─bin───────┬─dec──────┐
│ 20625031168 │ 19.21 GiB │ 20.63 GB │
└─────────────┴───────────┴──────────┘

And you can see your display name.

SELECT displayName()

Query id: cb049230-ee10-435f-9381-7047ee6163df

┌─displayName()─┐
│ clickhouse01  │
└───────────────┘

5. Retries On INSERT

Adds support for retries during INSERTs into ReplicatedMergeTree when a session with ClickHouse Keeper is lost. Apart from fault tolerance, it aims to provide a better user experience – avoid returning a user an error during insert if the keeper is restarted (for example, due to an upgrade)

When the session is expired, tables become readonly mode. If you run the following command, it will never happen again.

INSERT will survive restarts of ClickHouse Keeper or ZooKeeper
and reconnections.

SET insert_keeper_max_retries = 10;

6. Data Lakes

Added Hudi and DeltaLake table engines, read-only, only for tables on S3.

Added table function hudi and deltaLake.

ClickHouse can work as a server (clickhouse-server)
or as a tool without installation (clickhouse-local).ClickHouse can store the data
or process externally stored data on the fly.External data:
— remote databases: MySQL, PostgreSQL, MongoDB, ODBC, JDBC…
— object storages: S3, HDFS, Azure, COSN, OSS…
— from URL and local files;All possible data formats:
— text: CSV, TSV, JSON, Values, MySQLDump, Regexp…
— binary: Parquet, Arrow, ORC, Avro, Protobuf, MsgPack…
— schemaful and schemaless;
SELECT count() AS c FROM s3(
'https://clickhouse-public-datasets.s3.amazonaws.com/hits/native/*.native.zst')
  WHERE URL LIKE '%google%';
-- 5.924 sec.

SELECT count() AS c FROM s3Cluster('default',
'https://clickhouse-public-datasets.s3.amazonaws.com/hits/native/*.native.zst')
  WHERE URL LIKE '%google%';
-- 2.560 sec.

SELECT count() AS c FROM s3(
'https://clickhouse-public-datasets.s3.amazonaws.com/hits_compatible/athena_partitioned/hits_*.parquet')
  WHERE URL LIKE '%google%';
-- 5.887 sec.

SELECT count() AS c FROM s3Cluster('default',
'https://clickhouse-public-datasets.s3.amazonaws.com/hits_compatible/athena_partitioned/hits_*.parquet')
  WHERE URL LIKE '%google%'
-- 2.935 sec.

SELECT count() AS c FROM hits WHERE URL LIKE '%google%';
-- 0.913 sec.

ClickHouse also supports Apache Hudi and Delta Lake
for SELECT queries.

Advantages:
— these formats are somewhat resembling MergeTree
allowing incremental data insertion,
approaching to ClickHouse data formats;

Disadvantages:
— alien data formats from Apache/Hadoop/Java world;
— nothing works out of the box
unless you really know how to deal with it;

7. S3 Readonly Disk

Introduce s3_plain disk type for write-once-read-many operations. Implement ATTACH of MergeTree table for s3_plain disk

Attach tables directly from s3. No local state.

A new disk type — s3_plain.

$ cat /etc/clickhouse-server/config.d/s3_plain.yaml

storage_configuration:
  disks:
    s3_plain:
      type: s3_plain
        endpoint: 'http://clickhouse-public-datasets.s3.amazonaws.com/s3_plain/'
  policies:
    s3_plain:
      volumes:
        main:
          disk: s3_plain
ATTACH TABLE hits_worm UUID '5eb0149e-d250-4ec4-bfc7-a90ce167690f'
(
    `WatchID` UInt64,
    `JavaEnable` UInt8,
    `Title` String,
    `GoodEvent` Int16,
    `EventTime` DateTime,
    `EventDate` Date,
    `CounterID` UInt32,
    `ClientIP` UInt32,
    `RegionID` UInt32,
    `UserID` UInt64,
    `CounterClass` Int8,
    `OS` UInt8,
    `UserAgent` UInt8,
    `URL` String,
    `Referer` String,
    `Refresh` UInt8,
    `RefererCategoryID` UInt16,
    `RefererRegionID` UInt32,
    `URLCategoryID` UInt16,
    `URLRegionID` UInt32,
    `ResolutionWidth` UInt16,
    `ResolutionHeight` UInt16,
    `ResolutionDepth` UInt8,
    `FlashMajor` UInt8,
    `FlashMinor` UInt8,
    `FlashMinor2` String,
    `NetMajor` UInt8,
    `NetMinor` UInt8,
    `UserAgentMajor` UInt16,
    `UserAgentMinor` FixedString(2),
    `CookieEnable` UInt8,
    `JavascriptEnable` UInt8,
    `IsMobile` UInt8,
    `MobilePhone` UInt8,
    `MobilePhoneModel` String,
    `Params` String,
    `IPNetworkID` UInt32,
    `TraficSourceID` Int8,
    `SearchEngineID` UInt16,
    `SearchPhrase` String,
    `AdvEngineID` UInt8,
    `IsArtifical` UInt8,
    `WindowClientWidth` UInt16,
    `WindowClientHeight` UInt16,
    `ClientTimeZone` Int16,
    `ClientEventTime` DateTime,
    `SilverlightVersion1` UInt8,
    `SilverlightVersion2` UInt8,
    `SilverlightVersion3` UInt32,
    `SilverlightVersion4` UInt16,
    `PageCharset` String,
    `CodeVersion` UInt32,
    `IsLink` UInt8,
    `IsDownload` UInt8,
    `IsNotBounce` UInt8,
    `FUniqID` UInt64,
    `OriginalURL` String,
    `HID` UInt32,
    `IsOldCounter` UInt8,
    `IsEvent` UInt8,
    `IsParameter` UInt8,
    `DontCountHits` UInt8,
    `WithHash` UInt8,
    `HitColor` FixedString(1),
    `LocalEventTime` DateTime,
    `Age` UInt8,
    `Sex` UInt8,
    `Income` UInt8,
    `Interests` UInt16,
    `Robotness` UInt8,
    `RemoteIP` UInt32,
    `WindowName` Int32,
    `OpenerName` Int32,
    `HistoryLength` Int16,
    `BrowserLanguage` FixedString(2),
    `BrowserCountry` FixedString(2),
    `SocialNetwork` String,
    `SocialAction` String,
    `HTTPError` UInt16,
    `SendTiming` UInt32,
    `DNSTiming` UInt32,
    `ConnectTiming` UInt32,
    `ResponseStartTiming` UInt32,
    `ResponseEndTiming` UInt32,
    `FetchTiming` UInt32,
    `SocialSourceNetworkID` UInt8,
    `SocialSourcePage` String,
    `ParamPrice` Int64,
    `ParamOrderID` String,
    `ParamCurrency` FixedString(3),
    `ParamCurrencyID` UInt16,
    `OpenstatServiceName` String,
    `OpenstatCampaignID` String,
    `OpenstatAdID` String,
    `OpenstatSourceID` String,
    `UTMSource` String,
    `UTMMedium` String,
    `UTMCampaign` String,
    `UTMContent` String,
    `UTMTerm` String,
    `FromTag` String,
    `HasGCLID` UInt8,
    `RefererHash` UInt64,
    `URLHash` UInt64,
    `CLID` UInt32
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID), EventTime)
SETTINGS storage_policy = 's3_plain'

 

SELECT count() FROM hits_worm WHERE URL LIKE '%google%'

-- 2.533 sec.

Now change the disk type to cache and see the executing time result. Under than 1 second.

storage_configuration:
  disks:
    s3_plain_cache:
      type: cache
      disk: s3_plain
      path: /var/lib/clickhouse/cache/
      max_size: 1000000000000
  policies:
    s3_plain_cache:
      volumes:
        main:
          disk: s3_plain_cache
SELECT count() FROM hits_worm WHERE URL LIKE '%google%'

-- 0.719 sec.

These are the ClickHouse 22.11 features. To find out more details, please visit the official ClickHouse Docs.

Conclusion

The ClickHouse November 2022 Release (Version 22.11) introduces key features like composite time intervals, ** glob support, Spark-compatible functions, retries on INSERT, and support for Hudi and DeltaLake. These enhancements enhance data management and analysis capabilities.

To know more about ClickHouse releases, do visit the following articles:

About Can Sayn 41 Articles
Can Sayın is experienced Database Administrator in open source relational and NoSql databases, working in complicated infrastructures. Over 5 years industry experience, he gain managing database systems. He is working at ChistaDATA Inc. His areas of interest are generally on open source systems.
Contact: Website