ClickHouse April 2023 Release – Version 23.4

Introduction

Each new release includes new features, improvements, and numerous bug fixes, and the ChistaDATA team is always on top of the latest releases. On 26th April 2023, ClickHouse version 22.4 (April 2023) was released, and this version contains 15 new features, 11 performance optimizations, and 36 bug fixes.

 

For further details, please see the official ClickHouse change log list here.

Let’s look at some important new features:

SHOW COLUMNS

In the 23.4 release, SHOW COLUMNS is a command used to display the columns of a table or view in the database. It provides information about the columns’ names, types, and properties in a tabular format. The SHOW COLUMNS function is useful for quickly examining the structure of a table or view in ClickHouse and can be used as a reference when working with the database.

When you run the SHOW COLUMNS command in ClickHouse, you will get a result set that contains the following information for each column in the table:

field: The column name (String)
type: The data type of the column (String)
null: if the column datatype is nullable (UInt8)
key: PRI if the column is part of the primary key, SOR if the column is part of the sort key, otherwise empty (String)
default: The default expression of the column if it is of type ALIAS, DEFAULT, or MATERIALIZED; otherwise, NULL. (Nullable(String))
extra: additional information, currently unused (String)
collation: (only if a FULL keyword is specified) Collation of the column, always NULL as ClickHouse does not have per-column collations (Nullable(String))
comment: (only if a FULL keyword is specified) comment on the column (String)
privilege: (only if a FULL keyword was specified) The privilege you have for this column, currently unavailable (String)

Let’s have a look:

SHOW COLUMNS FROM kafka_table;

┌─field─────────────┬─type────┬─null─┬─key─────┬─default─┬─extra─┐
│ city              │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ county            │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ district          │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ duration          │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ locality          │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ newly_built       │ Bool    │    0 │         │ ᴺᵁᴸᴸ    │       │
│ paon              │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ postcode          │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ ppd_category_type │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ price             │ Float32 │    0 │         │ ᴺᵁᴸᴸ    │       │
│ property_type     │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ record_status     │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ saon              │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ street            │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ transaction       │ UUID    │    0 │ PRI SOR │ ᴺᵁᴸᴸ    │       │
│ transfer_date     │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
└───────────────────┴─────────┴──────┴─────────┴─────────┴───────┘

QuantileGK

ClickHouse’s QuantileGK is a probabilistic data structure that provides an efficient way to estimate quantiles of large data sets in real-time. It is based on the Greenwald-Khanna algorithm and is used to approximate quantiles with a configurable accuracy. QuantileGK works by dividing the data set into multiple levels, each level having a different granularity. At each level, it tracks the bounds of the data set within that granularity. This approach allows for a trade-off between accuracy and memory usage, where more accurate estimates require more memory.

In ClickHouse, QuantileGK is used as a built-in aggregate function that can be applied to columns in a SELECT statement. For example, to calculate the median of a column, you can use the quantile function like this.

SELECT quantilesGK(1, 0.25, 0.5, 0.75)(number + 1)
FROM numbers(1000)

┌─quantilesGK(1, 0.25, 0.5, 0.75)(plus(number, 1))─┐
│ [1,1,1]                                          │
└──────────────────────────────────────────────────┘

SELECT quantilesGK(10, 0.25, 0.5, 0.75)(number + 1)
FROM numbers(1000)

┌─quantilesGK(10, 0.25, 0.5, 0.75)(plus(number, 1))─┐
│ [156,413,659]                                     │
└───────────────────────────────────────────────────┘


SELECT quantilesGK(100, 0.25, 0.5, 0.75)(number + 1)
FROM numbers(1000)

┌─quantilesGK(100, 0.25, 0.5, 0.75)(plus(number, 1))─┐
│ [251,498,741]                                      │
└────────────────────────────────────────────────────┘

SELECT quantilesGK(1000, 0.25, 0.5, 0.75)(number + 1)
FROM numbers(1000)

┌─quantilesGK(1000, 0.25, 0.5, 0.75)(plus(number, 1))─┐
│ [249,499,749]                                       │
└─────────────────────────────────────────────────────┘

The QuantileGK feature is particularly useful for performing real-time analysis on large data sets, such as log files or user behavior data, where precise calculations of percentiles or other statistical measures are required, but the data set is too large to process in its entirety.

GRANT CURRENT GRANTS

In ClickHouse, The CURRENT GRANTS statement allows you to grant all of the specified privileges to the specified user or role. If none of the privileges are specified, then the given user or role is granted all the privileges available for CURRENT_USER.

CREATE ROLE accountant;
GRANT  CURRENT GRANTS TO accountant;
SHOW GRANTS FOR accountant FORMAT PrettyJSONEachRow;
{
    "GRANTS FOR accountant FORMAT PrettyJSONEachRow": "GRANT SHOW, SELECT, INSERT, ALTER, CREATE, DROP, UNDROP TABLE, TRUNCATE, OPTIMIZE, BACKUP, KILL QUERY, KILL TRANSACTION, MOVE PARTITION BETWEEN SHARDS, ACCESS MANAGEMENT, SYSTEM, dictGet, INTROSPECTION, SOURCES, CLUSTER ON *.* TO accountant"
}

extractKeyValuePairs

In ClickHouse, extractKeyValuePairs is a built-in function used to parse a string containing key-value pairs and return them as a set of rows. This function is useful for processing and extracting structured data stored in a single string column. The extractKeyValuePairs function takes two arguments: the string to be parsed and a delimiter to separate the key-value pairs. The delimiter is usually a comma (,) or a semicolon (;), but it can be any character that separates the pairs.

select extractKeyValuePairs('name:arda, surname:güler, age:18 team:fenerbahçe, nationality:turkish') as kv

OR

INSERT INTO test_kv VALUES
(1, 'color:red;size:large;material:cotton'),
(2, 'color:blue;size:medium;material:silk'),
(3, 'color:green;size:small;material:wool');

SELECT * FROM test_kv;
┌─id─┬─properties───────────────────────────┐
│  1 │ color:red;size:large;material:cotton │
│  2 │ color:blue;size:medium;material:silk │
│  3 │ color:green;size:small;material:wool │
└────┴──────────────────────────────────────┘

Let’s look at some important improvements:

  • The increased default value for connect_timeout_with_failover_ms to 1000ms, but hedged_connection_timeout_ms = 50. It starts connecting to other replicas after 50 ms but doesn’t drop the connection to the first replica and continues to connect to multiple replicas in parallel.
  • Added support for Iceberg v2. and partitioned and non-partitioned Iceberg, Hudi, and DeltaLake.
  • If we run a mutation with IN (subquery) like this: ALTER TABLE kafka UPDATE col='new value' WHERE id IN (SELECT id FROM huge_table) and the table kafka has multiple parts, then for each part a set for subquery SELECT id FROM huge_table is built-in memory. And if there are many parts, it can use a lot of memory and CPU. The solution is to introduce a short-lived cache of sets currently being built by mutation tasks. If another task with the same mutation is running at the same time, it can look up the set in the cache, wait for it to be built, and reuse it.

Note: If you want to test your scenarios, use the following Docker containers and container packages we have prepared. Here is the link to this installation.

Conclusion

The April 2023 release of ClickHouse, version 23.4, brings several new features and improvements to enhance data processing and analysis. With additions like the SHOW COLUMNS command, QuantileGK for real-time analysis, GRANT CURRENT GRANTS statement, and extractKeyValuePairs function, ClickHouse continues to evolve as a powerful tool for efficient data management. Stay ahead with the latest updates from ChistaDATA.

To know more about ClickHouse releases, do visit the following articles:

 

About Ilkay 24 Articles
Ilkay has been administering databases including NoSQL and RDBMS for over 7 years. He is experienced working high-scale banking databases. He is currently working at ChistaDATA Inc. as Database Administrator.
Contact: Website