Introduction
Every new release includes new features, enhancements, and numerous bug fixes, and the ChistaDATA team always stays on top of the latest releases. On November 17, 2022, ClickHouse version 22.11 was released, and this version contains the following;
- 15 new features,
- 5 performance optimizations,
- 32 bug fixes.
For further details, please see the official ClickHouse docs here.
- v22.11 Source Code : GitHub Link
- v22.11 Release Webinar: Recording
- v22.11 Release Webinar : Slides
- Installation: ClickHouse Docs
This article will look at the critical features of the ClickHouse 22.11 release.
Key features & improvements
1. Composite Time Intervals
Support for composite time intervals. 1. Add, subtract and negate operations are now available on Intervals. In the case where the types of Intervals are different, they will be transformed into the Tuple of those types. 2. A tuple of intervals can be added to or subtracted from a Date/DateTime field. 3. Added parsing of Intervals with different types, for example: INTERVAL '1 HOUR 1 MINUTE 1 SECOND'
The example below shows your current time and the time 1 month 2 days ago.
SELECT now(), now() + (toIntervalMonth(1), toIntervalDay(-2)) SELECT now(), now() + (toIntervalMonth(1), toIntervalDay(-2)) Query id: 10793c38-c423-4eb0-b59f-13581e4bd6eb ┌───────────────now()─┬─plus(now(), tuple(toIntervalMonth(1), toIntervalDay(-2)))─┐ │ 2022-11-20 13:51:35 │ 2022-12-18 13:51:35 │ └─────────────────────┴───────────────────────────────────────────────────────────┘
The following example shows a tuple.
SELECT INTERVAL '1 MONTH -2 DAY'; SELECT (toIntervalMonth(1), toIntervalDay(-2)) Query id: ff916581-fb93-4dcc-9ecb-50c46e17d2b4 ┌─tuple(toIntervalMonth(1), toIntervalDay(-2))─┐ │ (1,-2) │ └──────────────────────────────────────────────┘
To examine the tuple type, you can run the following command.
You will see that it is a tuple of two interval types.
SELECT toTypeName(INTERVAL '1 MONTH -2 DAY'); SELECT toTypeName((toIntervalMonth(1), toIntervalDay(-2))) Query id: 51165ade-6f69-42fb-b9fe-28a2b4221b0d ┌─toTypeName(tuple(toIntervalMonth(1), toIntervalDay(-2)))─┐ │ Tuple(IntervalMonth, IntervalDay) │ └──────────────────────────────────────────────────────────┘
2. Support For ** Glob
ClickHouse now added **
glob support for recursive directory traversal of the filesystem and S3.
Now ClickHouse support: *, **, ?, {abc,def}, {1..10}, {01..10}.
You can run the following command if you like to see how many special characters you have in your code blog. This command tries to find cpp (c++) source code and cpp header files, then reads every single file.
SELECT line, count() FROM file ('**/*.{h.cpp}', LineAsString) GROUP BY line ORDER BY count() DESC LIMIT 20;
As you can see below, the most popular line in code blogs is an empty line.
─Line───────────────────────────────┬───────Count─┐ │ │ 1518855 │ │ "" │ 90748 │ │ { │ 31615 │ │ } │ 26350 │ │ { │ 26306 │ │ } │ 22941 │ │ { │ 13439 │ │ } │ 12465 │ │ { │ 5798 │ │ } │ 5473 │ │ }; │ 5270 │ │ namespace DB │ 4535 │ │ public: │ 3002 │ │ #pragma once │ 2322 │ │ else │ 2296 │ │ { │ 2218 │ │ } │ 2143 │ │ }; │ 2087 │ │ private: │ 1996 │ │ else │ 1692 │ └───────────────────────────────────┴─────────────┘
3. Functions For Spark Compatibility
- Add function
ascii
like in Apache Spark: https://spark.apache.org/docs/latest/api/sql/#ascii. - Add function
pmod
which returns non-negative result based on modulo. - Add function
formatReadableDecimalSize
. - Add function
randCanonical
, which is similar to therand
function in Apache Spark or Impala. The function generates pseudo random results with independent and identically distributed uniformly distributed values in [0, 1).
Example;
SELECT modulo(-3, 10), -3 % 10; SELECT -3 % 10, -3 % 10 Query id: 2f07420d-837f-45fc-a424-746faf14ffbc ┌─modulo(-3, 10)─┬─modulo(-3, 10)─┐ │ -3 │ -3 │ └────────────────┴────────────────┘
SELECT pmod(-3, 10), positiveModulo(-3, 10); SELECT pmod(-3, 10), positiveModulo(-3, 10) Query id: bcec8a40-32ac-405c-98d6-24d51469ff8c ┌─positiveModulo(-3, 10)─┬─positiveModulo(-3, 10)─┐ │ 7 │ 7 │ └────────────────────────┴────────────────────────┘
4. Functions For Convenience
Some functions have been developed for easier access to data about your database.
You can see how much disk space you have. For example, I have a 20 GB disk size on my local machine.
SELECT filesystemCapacity() AS size, formatReadableSize(size) AS bin, formatReadableDecimalSize(size) AS dec Query id: b52f9f23-5362-4282-adea-d98b0f2504a9 ┌────────size─┬─bin───────┬─dec──────┐ │ 20625031168 │ 19.21 GiB │ 20.63 GB │ └─────────────┴───────────┴──────────┘
And you can see your display name.
SELECT displayName() Query id: cb049230-ee10-435f-9381-7047ee6163df ┌─displayName()─┐ │ clickhouse01 │ └───────────────┘
5. Retries On INSERT
Adds support for retries during INSERTs into ReplicatedMergeTree when a session with ClickHouse Keeper is lost. Apart from fault tolerance, it aims to provide a better user experience – avoid returning a user an error during insert if the keeper is restarted (for example, due to an upgrade)
When the session is expired, tables become readonly mode. If you run the following command, it will never happen again.
INSERT will survive restarts of ClickHouse Keeper or ZooKeeper
and reconnections.
SET insert_keeper_max_retries = 10;
6. Data Lakes
Added Hudi
and DeltaLake
table engines, read-only, only for tables on S3.
Added table function hudi
and deltaLake
.
SELECT count() AS c FROM s3( 'https://clickhouse-public-datasets.s3.amazonaws.com/hits/native/*.native.zst') WHERE URL LIKE '%google%'; -- 5.924 sec. SELECT count() AS c FROM s3Cluster('default', 'https://clickhouse-public-datasets.s3.amazonaws.com/hits/native/*.native.zst') WHERE URL LIKE '%google%'; -- 2.560 sec. SELECT count() AS c FROM s3( 'https://clickhouse-public-datasets.s3.amazonaws.com/hits_compatible/athena_partitioned/hits_*.parquet') WHERE URL LIKE '%google%'; -- 5.887 sec. SELECT count() AS c FROM s3Cluster('default', 'https://clickhouse-public-datasets.s3.amazonaws.com/hits_compatible/athena_partitioned/hits_*.parquet') WHERE URL LIKE '%google%' -- 2.935 sec. SELECT count() AS c FROM hits WHERE URL LIKE '%google%'; -- 0.913 sec.
ClickHouse also supports Apache Hudi and Delta Lake
for SELECT queries.
Advantages:
— these formats are somewhat resembling MergeTree
allowing incremental data insertion,
approaching to ClickHouse data formats;
Disadvantages:
— alien data formats from Apache/Hadoop/Java world;
— nothing works out of the box
unless you really know how to deal with it;
7. S3 Readonly Disk
Introduce s3_plain
disk type for write-once-read-many operations. Implement ATTACH
of MergeTree
table for s3_plain
disk
Attach tables directly from s3. No local state.
A new disk type — s3_plain.
$ cat /etc/clickhouse-server/config.d/s3_plain.yaml storage_configuration: disks: s3_plain: type: s3_plain endpoint: 'http://clickhouse-public-datasets.s3.amazonaws.com/s3_plain/' policies: s3_plain: volumes: main: disk: s3_plain
ATTACH TABLE hits_worm UUID '5eb0149e-d250-4ec4-bfc7-a90ce167690f' ( `WatchID` UInt64, `JavaEnable` UInt8, `Title` String, `GoodEvent` Int16, `EventTime` DateTime, `EventDate` Date, `CounterID` UInt32, `ClientIP` UInt32, `RegionID` UInt32, `UserID` UInt64, `CounterClass` Int8, `OS` UInt8, `UserAgent` UInt8, `URL` String, `Referer` String, `Refresh` UInt8, `RefererCategoryID` UInt16, `RefererRegionID` UInt32, `URLCategoryID` UInt16, `URLRegionID` UInt32, `ResolutionWidth` UInt16, `ResolutionHeight` UInt16, `ResolutionDepth` UInt8, `FlashMajor` UInt8, `FlashMinor` UInt8, `FlashMinor2` String, `NetMajor` UInt8, `NetMinor` UInt8, `UserAgentMajor` UInt16, `UserAgentMinor` FixedString(2), `CookieEnable` UInt8, `JavascriptEnable` UInt8, `IsMobile` UInt8, `MobilePhone` UInt8, `MobilePhoneModel` String, `Params` String, `IPNetworkID` UInt32, `TraficSourceID` Int8, `SearchEngineID` UInt16, `SearchPhrase` String, `AdvEngineID` UInt8, `IsArtifical` UInt8, `WindowClientWidth` UInt16, `WindowClientHeight` UInt16, `ClientTimeZone` Int16, `ClientEventTime` DateTime, `SilverlightVersion1` UInt8, `SilverlightVersion2` UInt8, `SilverlightVersion3` UInt32, `SilverlightVersion4` UInt16, `PageCharset` String, `CodeVersion` UInt32, `IsLink` UInt8, `IsDownload` UInt8, `IsNotBounce` UInt8, `FUniqID` UInt64, `OriginalURL` String, `HID` UInt32, `IsOldCounter` UInt8, `IsEvent` UInt8, `IsParameter` UInt8, `DontCountHits` UInt8, `WithHash` UInt8, `HitColor` FixedString(1), `LocalEventTime` DateTime, `Age` UInt8, `Sex` UInt8, `Income` UInt8, `Interests` UInt16, `Robotness` UInt8, `RemoteIP` UInt32, `WindowName` Int32, `OpenerName` Int32, `HistoryLength` Int16, `BrowserLanguage` FixedString(2), `BrowserCountry` FixedString(2), `SocialNetwork` String, `SocialAction` String, `HTTPError` UInt16, `SendTiming` UInt32, `DNSTiming` UInt32, `ConnectTiming` UInt32, `ResponseStartTiming` UInt32, `ResponseEndTiming` UInt32, `FetchTiming` UInt32, `SocialSourceNetworkID` UInt8, `SocialSourcePage` String, `ParamPrice` Int64, `ParamOrderID` String, `ParamCurrency` FixedString(3), `ParamCurrencyID` UInt16, `OpenstatServiceName` String, `OpenstatCampaignID` String, `OpenstatAdID` String, `OpenstatSourceID` String, `UTMSource` String, `UTMMedium` String, `UTMCampaign` String, `UTMContent` String, `UTMTerm` String, `FromTag` String, `HasGCLID` UInt8, `RefererHash` UInt64, `URLHash` UInt64, `CLID` UInt32 ) ENGINE = MergeTree PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID), EventTime) SETTINGS storage_policy = 's3_plain'
Now change the disk type to cache and see the executing time result. Under than 1 second.
storage_configuration: disks: s3_plain_cache: type: cache disk: s3_plain path: /var/lib/clickhouse/cache/ max_size: 1000000000000 policies: s3_plain_cache: volumes: main: disk: s3_plain_cache
SELECT count() FROM hits_worm WHERE URL LIKE '%google%' -- 0.719 sec.
These are the ClickHouse 22.11 features. To find out more details, please visit the official ClickHouse Docs.
Conclusion
The ClickHouse November 2022 Release (Version 22.11) introduces key features like composite time intervals, ** glob support, Spark-compatible functions, retries on INSERT, and support for Hudi and DeltaLake. These enhancements enhance data management and analysis capabilities.
To know more about ClickHouse releases, do visit the following articles: