ClickHouse August 2022 Release – v22.8

Introduction

ClickHouse version 22.08 (August 2022) was released on August 18, 2022. This version includes 12 new features, 12 performance improvements, +40 other improvements, and over 45 bug fixes.

For more information, please visit the official ClickHouse list of change logs here.

You can access the related documents and source code as follow:

Let’s take a look at the main features and improvements in this version.

Key features & improvements

1. Query Parameters in interactive Mode

Query parameters can be set in interactive mode with “SET param_” command. With the help of this feature, as shown below, values can be assigned to the parameters, and these parameters can be queried.

ubuntu@clickhouse01:~$ clickhouse-client --password
clickhouse01 :) SET param_x='chistaDATA';

SET 

Query id: 16fb3569-401a-4e8f-b208-6dab92afc12a

Ok.

0 rows in set. Elapsed: 0.004 sec. 

clickhouse01 :) select {x:String}

SELECT {x:String}

Query id: fe1ce2e3-b9ed-4093-8847-b5551f0bca25

┌─_CAST('chistaDATA', 'String')─┐
│ chistaDATA                    │
└───────────────────────────────┘

1 row in set. Elapsed: 0.003 sec.

2. DELETE FROM Syntax Support

In this version, the SQL standard of “DELETE FROM” syntax is supported on merge tree tables and lightweight deletes implementation for merge tree families. This feature improves delete performance dramatically.

For testing “ALTER TABLE DELETE” and “DELETE FROM” syntax, 2 demo tables were created from “cell_towers” table.

clickhouse01 :) create table cell_towers_demo ENGINE = MergeTree ORDER BY (radio, mcc, net, created) as select * from cell_towers;

clickhouse01 :) create table cell_towers_demo2 ENGINE = MergeTree ORDER BY (radio, mcc, net, created) as select * from cell_towers;

This table has more than 43 million records, and we will delete the records for area code 25464 by using both methods.

clickhouse01 :) select count() from cell_towers;

SELECT count()
FROM cell_towers

Query id: e01de37f-81d0-4740-b25b-c98693e9d3a5

┌──count()─┐
│ 43276158 │
└──────────┘

1 row in set. Elapsed: 0.004 sec. 


clickhouse01 :) select count() from cell_towers where area=25464;

SELECT count()
FROM cell_towers
WHERE area = 25464

Query id: 546a45f9-0b39-4b36-89a5-6cab932f21af

┌─count()─┐
│      51 │
└─────────┘

With the “ALTER TABLE DELETE” command, it took 25.7 seconds to delete 51 records from “cell_towers_demo” table.

clickhouse01 :) SET mutations_sync = 2;
clickhouse01 :) alter table cell_towers_demo delete where area= 25464;

ALTER TABLE cell_towers_demo
    DELETE WHERE area = 25464

Query id: a41fb540-9a18-435b-a0be-d6b48439d1f6

Ok.

0 rows in set. Elapsed: 25.709 sec.

On the other hand, “DELETE FROM” command deleted the same 51 records in 0.7 seconds.

clickhouse01 :) SET allow_experimental_lightweight_delete = true;

SET allow_experimental_lightweight_delete = 1

Query id: ae8a0fd4-948b-4860-aa10-ed00dd6e561d

Ok.

0 rows in set. Elapsed: 0.003 sec. 


clickhouse01 :) SET mutations_sync = 2;	

clickhouse01 :) delete from cell_towers_demo2 where area= 25464;

DELETE FROM cell_towers_demo2 WHERE area = 25464

Query id: c0ff8d9c-19c3-44ac-b9a3-f86d6451469d

Ok.

0 rows in set. Elapsed: 0.783 sec.

3. Exact Rows Before Limit Setting

If exact_rows_before_limit is set to 1(default 0), ClickHouse provide exact value for rows_before_limit_at_least statistic. Because of the read whole table, the cost of the read data will increase.

In the example below, before setting “exact_rows_before_limit” to 1 “rows_before_limit_at_least” is 391. After setting the “exact_rows_before_limit” value to 1(or true), “rows_before_limit_at_least” have changed to 100000, which is the exact number of read records.

clickhouse01 :) select number % 100000 , count() from numbers_mt(10000000) group by 1 limit 1 format JSONCompact;

SELECT
    number % 100000,
    count()
FROM numbers_mt(10000000)
GROUP BY 1
LIMIT 1
FORMAT JSONCompact

Query id: f6f44389-3857-49b0-a709-21df9474e01f

{
    "meta":
    [
        {
            "name": "modulo(number, 100000)",
            "type": "UInt32"
        },
        {
            "name": "count()",
            "type": "UInt64"
        }
    ],

    "data":
    [
        [73035, "100"]                                                                                          
    ],

    "rows": 1,

    "rows_before_limit_at_least": 391,

    "statistics":
    {
        "elapsed": 0.135662472,
        "rows_read": 10000000,
        "bytes_read": 80000000
    }
}

1 row in set. Elapsed: 0.136 sec. Processed 10.00 million rows, 80.00 MB (73.41 million rows/s., 587.25 MB/s.)
clickhouse01 :) SET exact_rows_before_limit=true;

SET exact_rows_before_limit = 1

Query id: a7f58e95-5527-4a9d-ba74-c1d00a36ebda

Ok.

0 rows in set. Elapsed: 0.003 sec. 


clickhouse01 :) select number % 100000 , count() from numbers_mt(10000000) group by 1 limit 1 format JSONCompact;

SELECT
    number % 100000,
    count()
FROM numbers_mt(10000000)
GROUP BY 1
LIMIT 1
FORMAT JSONCompact

Query id: 03e3e497-f5d5-4cea-848b-613dbc61f557

{
    "meta":
    [
        {
            "name": "modulo(number, 100000)",
            "type": "UInt32"
        },
        {
            "name": "count()",
            "type": "UInt64"
        }
    ],

    "data":
    [
        [73035, "100"]                                                                                          
    ],

    "rows": 1,

    "rows_before_limit_at_least": 100000,

    "statistics":
    {
        "elapsed": 0.140187318,
        "rows_read": 10000000,
        "bytes_read": 80000000
    }
}

1 row in set. Elapsed: 0.175 sec. Processed 10.00 million rows, 80.00 MB (57.30 million rows/s., 458.38 MB/s.)

4. Distributed INSERT INTO SELECT from s3Cluster

Support for parallel distributed INSERT INTO SELECT with s3Cluster table function into tables with Distributed and Replicated engine.

The query can be parallelized with the s3Cluster table function if you have a cluster with multiple nodes. Now, if the target table is Replicated or Distributed, the INSERT will be distributed throughout the cluster (will be parallelized).

INSERT INTO targetTable
select * FROM s3Cluster('cluster','url');

5. JSON Formatted Log Output

The outputs can be collected in JSON format. The purpose is to allow easier ingestion and query in log analysis tools. To enable JSON logging support, it is enough for the uncomment <formatting> tag in config.xml file as shown below.

 <!-- <formatting>json</formatting> --> ## uncomment this line

<formatting>json</formatting>           ## the uncommented "formatting" tag

6. SELECT INTO OUTFILE AND STDOUT

Support for saving query results as output files and showing these results at the screen same time.

clickhouse01 :) select radio,lon,lat,range from cell_towers limit 10 INTO OUTFILE 'results.txt' AND STDOUT FORMAT Pretty


SELECT
    radio,
    lon,
    lat,
    range
FROM cell_towers
LIMIT 10
INTO OUTFILE 'results.txt'
FORMAT Pretty

Query id: f30e18d1-c31c-4c1c-b08f-c68ed8ac8bfe

┏━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ radio ┃         lon ┃       lat ┃ range ┃
┡━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ CDMA  │ -112.069237 │ 48.978268 │  1000 │
├───────┼─────────────┼───────────┼───────┤
│ CDMA  │ -112.069237 │ 48.978268 │  1000 │
├───────┼─────────────┼───────────┼───────┤
│ CDMA  │ -112.069237 │ 48.978268 │  1000 │
├───────┼─────────────┼───────────┼───────┤
│ CDMA  │  -79.462952 │ 44.009564 │  1000 │
├───────┼─────────────┼───────────┼───────┤
│ CDMA  │  -79.522812 │  43.79319 │  1000 │
├───────┼─────────────┼───────────┼───────┤
│ CDMA  │  -79.315284 │ 43.838686 │  1000 │
├───────┼─────────────┼───────────┼───────┤
│ CDMA  │  -79.459198 │ 43.797741 │  1000 │
├───────┼─────────────┼───────────┼───────┤
│ CDMA  │  -79.462547 │  44.01469 │  1000 │
├───────┼─────────────┼───────────┼───────┤
│ CDMA  │  -80.480919 │ 43.435841 │  1000 │
├───────┼─────────────┼───────────┼───────┤
│ CDMA  │   -79.37619 │  43.84483 │  1000 │
└───────┴─────────────┴───────────┴───────┘

10 rows in set. Elapsed: 0.008 sec. 

clickhouse01 :) 

ubuntu@clickhouse01:~$ cat results.txt
┏━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ radio ┃         lon ┃       lat ┃ range ┃
┡━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ CDMA  │ -112.069237 │ 48.978268 │  1000 │
├───────┼─────────────┼───────────┼───────┤
│ CDMA  │ -112.069237 │ 48.978268 │  1000 │
├───────┼─────────────┼───────────┼───────┤
│ CDMA  │ -112.069237 │ 48.978268 │  1000 │
├───────┼─────────────┼───────────┼───────┤
│ CDMA  │  -79.462952 │ 44.009564 │  1000 │
├───────┼─────────────┼───────────┼───────┤
│ CDMA  │  -79.522812 │  43.79319 │  1000 │
├───────┼─────────────┼───────────┼───────┤
│ CDMA  │  -79.315284 │ 43.838686 │  1000 │
├───────┼─────────────┼───────────┼───────┤
│ CDMA  │  -79.459198 │ 43.797741 │  1000 │
├───────┼─────────────┼───────────┼───────┤
│ CDMA  │  -79.462547 │  44.01469 │  1000 │
├───────┼─────────────┼───────────┼───────┤
│ CDMA  │  -80.480919 │ 43.435841 │  1000 │
├───────┼─────────────┼───────────┼───────┤
│ CDMA  │   -79.37619 │  43.84483 │  1000 │
└───────┴─────────────┴───────────┴───────┘
ubuntu@clickhouse01:~$ 

Other Notable Features/Improvements

  • Extended the range of Date32 and DateTime64 to support dates from 1900 to 2299.
  • ClickFiddle, a tool to check query results in almost all ClickHouse versions.
  • ClickHouse Hardware Benchmark site.
  • Quota key set in the native protocol.
  • New settings to control schema inference from text formats.
  • Ability to specify settings for an executable() table function.
  • Improved memory usage during memory efficient merging of aggregation results.
  • Normalize AggregateFunction types and state representations.

Conclusion

The August 2022 release (version 22.8) of ClickHouse brings significant enhancements, including query parameter support, improved delete performance, and expanded data output options. Features like distributed INSERT INTO SELECT, JSON formatted log output, and SELECT INTO OUTFILE AND STDOUT streamline data management and analysis, reinforcing ClickHouse’s status as a leading analytical database system.

To know more about ClickHouse releases, do visit the following articles: 

References

About Emrah Idman 11 Articles
Emrah Idman has considerable experience in relational and NoSQL databases. He has worked in a large-scale financial company for over 15 years. He has significant experience in team management, procurement and capacity planning, database administration and product testing for high-volume systems. He is working at ChistaDATA Inc. as senior database administrator.
Contact: Website