ClickHouse May 2022 Release – v22.5

Introduction

This article will introduce the new features defined in the new version of ClickHouse, version 22.5.

You can reach to the new ClickHouse release here.

Key features & improvements

ClickHouse has added new features to group by modifiers such as WITH TOTALS, WITH ROLLUP, WITH CUBE and GROUPING SETS.

GROUP BY modifiers;

  • WITH TOTALS;
  • WITH ROLLUP;
  • WITH CUBE;
  • GROUPING SETS;

1. WITH TOTAL

There is a simple Select query run  WITHOUT TOTAL which is shows id, name, menus_appreared, first_appreared, lowest_price and highest_price informations Group By year county town.

select
    id,
    name,
    menus_appeared,
    first_appeared,
    avg(lowest_price),
    avg(highest_price)
FROM dish
GROUP BY id,
    name,
    menus_appeared,
    first_appeared,
    lowest_price,
    highest_price
ORDER BY highest_price DESC 
LIMIT 10
┌─────id─┬─name───────────────────────────────┬─menus_appeared─┬─first_appeared─┬─avg(lowest_price)─┬─avg(highest_price)─┐
│  40288 │ Cream cheese with bar-le-duc jelly │             16 │           1900 │              0.45 │               3050 │
│     66 │ Grape fruit                        │            712 │           1895 │                 0 │               2540 │
│ 125853 │ Oysters Baked in Shell             │              3 │           1857 │              0.85 │               2065 │
│  27559 │ Pommery & Greno. Ex. Dry           │             10 │           1900 │               1.4 │               2050 │
│ 253803 │ luso                               │              1 │           1987 │               600 │               1100 │
│ 175911 │ Omelette aux Foies de Poulet       │              1 │              0 │              1035 │               1035 │
│   1212 │ Coffee, Pot                        │            165 │           1898 │              0.05 │               1000 │
│ 187125 │ Beef Stew Bourgoise                │              2 │              0 │               8.5 │                850 │
│ 129839 │ Assorted Ice Creams                │             24 │           1889 │               0.2 │                800 │
│ 241478 │ Helados diversos                   │              1 │           1958 │               800 │                800 │
└────────┴────────────────────────────────────┴────────────────┴────────────────┴───────────────────┴────────────────────┘

Same query’s result after run WITH TOTAL options. Query add one more additional column that shows the values of every aggregate function.

select
    id,
    name,
    menus_appeared,
    first_appeared,
    avg(lowest_price),
    avg(highest_price)
FROM dish
GROUP BY id,
    name,
    menus_appeared,
    first_appeared,
    lowest_price,
    highest_price
    WITH TOTALS
ORDER BY highest_price DESC 
LIMIT 10
┌─────id─┬─name───────────────────────────────┬─menus_appeared─┬─first_appeared─┬─avg(lowest_price)─┬─avg(highest_price)─┐
│  40288 │ Cream cheese with bar-le-duc jelly │             16 │           1900 │              0.45 │               3050 │
│     66 │ Grape fruit                        │            712 │           1895 │                 0 │               2540 │
│ 125853 │ Oysters Baked in Shell             │              3 │           1857 │              0.85 │               2065 │
│  27559 │ Pommery & Greno. Ex. Dry           │             10 │           1900 │               1.4 │               2050 │
│ 253803 │ luso                               │              1 │           1987 │               600 │               1100 │
│ 175911 │ Omelette aux Foies de Poulet       │              1 │              0 │              1035 │               1035 │
│   1212 │ Coffee, Pot                        │            165 │           1898 │              0.05 │               1000 │
│ 187125 │ Beef Stew Bourgoise                │              2 │              0 │               8.5 │                850 │
│ 129839 │ Assorted Ice Creams                │             24 │           1889 │               0.2 │                800 │
│ 241478 │ Helados diversos                   │              1 │           1958 │               800 │                800 │
└────────┴────────────────────────────────────┴────────────────┴────────────────┴───────────────────┴────────────────────┘

Totals:
┌─id─┬─name─┬─menus_appeared─┬─first_appeared─┬──avg(lowest_price)─┬─avg(highest_price)─┐
│  0 │      │              0 │              0 │ 0.8889513390292098 │   1.47707363843175 │
└────┴──────┴────────────────┴────────────────┴────────────────────┴────────────────────┘

2. WITH ROLLUP

WITH ROLLUP FUNCTION returns the result of every aggregate operation by default, as in the WITH TOTAL function. In addition, firstly, returns column that aggregate by id. Secondly, returns column that aggregates by name, then name, menus_appreared, first_appreared, lowest_price and highest_price. As a result WITH ROLLUP function aggregates by all the keys which are given in the GROUP BY function.

select
    id,
    name,
    menus_appeared,
    first_appeared,
    avg(lowest_price),
    avg(highest_price)
FROM dish
GROUP BY id,
    name,
    menus_appeared,
    first_appeared,
    lowest_price,
    highest_price
    WITH ROLLUP
ORDER BY highest_price DESC 
LIMIT 10
┌─────id─┬─name───────────────────────────────┬─menus_appeared─┬─first_appeared─┬─avg(lowest_price)─┬─avg(highest_price)─┐
│  40288 │ Cream cheese with bar-le-duc jelly │             16 │           1900 │              0.45 │               3050 │
│     66 │ Grape fruit                        │            712 │           1895 │                 0 │               2540 │
│ 125853 │ Oysters Baked in Shell             │              3 │           1857 │              0.85 │               2065 │
│  27559 │ Pommery & Greno. Ex. Dry           │             10 │           1900 │               1.4 │               2050 │
│ 253803 │ luso                               │              1 │           1987 │               600 │               1100 │
│ 175911 │ Omelette aux Foies de Poulet       │              1 │              0 │              1035 │               1035 │
│   1212 │ Coffee, Pot                        │            165 │           1898 │              0.05 │               1000 │
│ 187125 │ Beef Stew Bourgoise                │              2 │              0 │               8.5 │                850 │
│ 129839 │ Assorted Ice Creams                │             24 │           1889 │               0.2 │                800 │
│ 241478 │ Helados diversos                   │              1 │           1958 │               800 │                800 │
└────────┴────────────────────────────────────┴────────────────┴────────────────┴───────────────────┴────────────────────┘

3. WITH CUBE

It write every subset keys in the GROUP BY. Instead of aggregate id, name, menus_appreared, first_appreared, lowest_price and highest_price columns, WITH CUBE function make combination of this keys.  If you have multiple keys the result can be huge. 

select
    id,
    name,
    menus_appeared,
    first_appeared,
    avg(lowest_price),
    avg(highest_price)
FROM dish
GROUP BY id,
    name,
    menus_appeared,
    first_appeared,
    lowest_price,
    highest_price
    WITH CUBE
ORDER BY highest_price DESC 
LIMIT 10
┌────id─┬─name───────────────────────────────┬─menus_appeared─┬─first_appeared─┬─avg(lowest_price)─┬─avg(highest_price)─┐
│     0 │ Cream cheese with bar-le-duc jelly │             16 │           1900 │              0.45 │               3050 │
│     0 │ Cream cheese with bar-le-duc jelly │             16 │           1900 │              0.45 │               3050 │
│ 40288 │                                    │             16 │           1900 │              0.45 │               3050 │
│ 40288 │ Cream cheese with bar-le-duc jelly │             16 │           1900 │              0.45 │               3050 │
│     0 │ Cream cheese with bar-le-duc jelly │             16 │              0 │              0.45 │               3050 │
│ 40288 │ Cream cheese with bar-le-duc jelly │              0 │           1900 │              0.45 │               3050 │
│     0 │ Cream cheese with bar-le-duc jelly │             16 │              0 │              0.45 │               3050 │
│ 40288 │                                    │             16 │           1900 │              0.45 │               3050 │
│     0 │ Cream cheese with bar-le-duc jelly │              0 │           1900 │              0.45 │               3050 │
│     0 │ Cream cheese with bar-le-duc jelly │              0 │           1900 │              0.45 │               3050 │
└───────┴────────────────────────────────────┴────────────────┴────────────────┴───────────────────┴────────────────────┘

4. GROUPING SETS

If you want to GROUP BY year and also want the same result with aggregation of lowest_price and highest_price you can use GROUP BY GROUPING SETS function. Before this function you have to do multiple select and union operations. However GROUPING SETS function allows you do it in just one process.

select
    id,
    avg(lowest_price),
    avg(highest_price)
FROM dish
GROUP BY GROUPING SETS 
    ((), (id,lowest_price), (id,highest_price)
    )
ORDER BY highest_price DESC 
LIMIT 10
┌─────id─┬─avg(lowest_price)─┬─avg(highest_price)─┐
│  40288 │              0.45 │               3050 │
│     66 │                 0 │               2540 │
│ 125853 │              0.85 │               2065 │
│  27559 │               1.4 │               2050 │
│ 253803 │               600 │               1100 │
│ 175911 │              1035 │               1035 │
│   1212 │              0.05 │               1000 │
│ 187125 │               8.5 │                850 │
│ 129839 │               0.2 │                800 │
│ 241478 │               800 │                800 │
└────────┴───────────────────┴────────────────────┘

5. MYSQLDUMP as an input format

If you have mysqldump, you can easily import into ClickHouse

6. PROMETHEUS as output format

If you want to monitor ClickHouse with Prometheus or you have some monitoring data in ClickHouse, and you want to export it into Prometheus you can use Prometheus output format with some query that will output some metrics. You can define custom http handler so this data will data will available at some path by url.

7. PERFORMANCE OPTIMIZATIONS

AVG and SUM functions without group by are up to 1.5 times faster.

Unary arithmetic functions are up to 7 times faster.

Conclusion

In this version, emphasis was placed on aggregate functions and performance improvement. Improvements were made to the functions in order to give faster results for the queries. For the GROUP BY function, new functions have been defined in order to retrieve data in different combinations.

To know more about the ClickHouse releases, do visit the following articles:

About Can Sayn 36 Articles
Can Sayın is experienced Database Administrator in open source relational and NoSql databases, working in complicated infrastructures. Over 5 years industry experience, he gain managing database systems. He is working at ChistaDATA Inc. His areas of interest are generally on open source systems.
Contact: Website