Introduction
This article will introduce the new features defined in the new version of ClickHouse, version 22.5.
You can reach to the new ClickHouse release here.
Key features & improvements
ClickHouse has added new features to group by modifiers such as WITH TOTALS, WITH ROLLUP, WITH CUBE and GROUPING SETS.
GROUP BY modifiers;
- WITH TOTALS;
- WITH ROLLUP;
- WITH CUBE;
- GROUPING SETS;
1. WITH TOTAL
There is a simple Select query run WITHOUT TOTAL which is shows id, name, menus_appreared, first_appreared, lowest_price and highest_price informations Group By year county town.
select id, name, menus_appeared, first_appeared, avg(lowest_price), avg(highest_price) FROM dish GROUP BY id, name, menus_appeared, first_appeared, lowest_price, highest_price ORDER BY highest_price DESC LIMIT 10
┌─────id─┬─name───────────────────────────────┬─menus_appeared─┬─first_appeared─┬─avg(lowest_price)─┬─avg(highest_price)─┐ │ 40288 │ Cream cheese with bar-le-duc jelly │ 16 │ 1900 │ 0.45 │ 3050 │ │ 66 │ Grape fruit │ 712 │ 1895 │ 0 │ 2540 │ │ 125853 │ Oysters Baked in Shell │ 3 │ 1857 │ 0.85 │ 2065 │ │ 27559 │ Pommery & Greno. Ex. Dry │ 10 │ 1900 │ 1.4 │ 2050 │ │ 253803 │ luso │ 1 │ 1987 │ 600 │ 1100 │ │ 175911 │ Omelette aux Foies de Poulet │ 1 │ 0 │ 1035 │ 1035 │ │ 1212 │ Coffee, Pot │ 165 │ 1898 │ 0.05 │ 1000 │ │ 187125 │ Beef Stew Bourgoise │ 2 │ 0 │ 8.5 │ 850 │ │ 129839 │ Assorted Ice Creams │ 24 │ 1889 │ 0.2 │ 800 │ │ 241478 │ Helados diversos │ 1 │ 1958 │ 800 │ 800 │ └────────┴────────────────────────────────────┴────────────────┴────────────────┴───────────────────┴────────────────────┘
Same query’s result after run WITH TOTAL options. Query add one more additional column that shows the values of every aggregate function.
select id, name, menus_appeared, first_appeared, avg(lowest_price), avg(highest_price) FROM dish GROUP BY id, name, menus_appeared, first_appeared, lowest_price, highest_price WITH TOTALS ORDER BY highest_price DESC LIMIT 10
┌─────id─┬─name───────────────────────────────┬─menus_appeared─┬─first_appeared─┬─avg(lowest_price)─┬─avg(highest_price)─┐ │ 40288 │ Cream cheese with bar-le-duc jelly │ 16 │ 1900 │ 0.45 │ 3050 │ │ 66 │ Grape fruit │ 712 │ 1895 │ 0 │ 2540 │ │ 125853 │ Oysters Baked in Shell │ 3 │ 1857 │ 0.85 │ 2065 │ │ 27559 │ Pommery & Greno. Ex. Dry │ 10 │ 1900 │ 1.4 │ 2050 │ │ 253803 │ luso │ 1 │ 1987 │ 600 │ 1100 │ │ 175911 │ Omelette aux Foies de Poulet │ 1 │ 0 │ 1035 │ 1035 │ │ 1212 │ Coffee, Pot │ 165 │ 1898 │ 0.05 │ 1000 │ │ 187125 │ Beef Stew Bourgoise │ 2 │ 0 │ 8.5 │ 850 │ │ 129839 │ Assorted Ice Creams │ 24 │ 1889 │ 0.2 │ 800 │ │ 241478 │ Helados diversos │ 1 │ 1958 │ 800 │ 800 │ └────────┴────────────────────────────────────┴────────────────┴────────────────┴───────────────────┴────────────────────┘ Totals: ┌─id─┬─name─┬─menus_appeared─┬─first_appeared─┬──avg(lowest_price)─┬─avg(highest_price)─┐ │ 0 │ │ 0 │ 0 │ 0.8889513390292098 │ 1.47707363843175 │ └────┴──────┴────────────────┴────────────────┴────────────────────┴────────────────────┘
2. WITH ROLLUP
WITH ROLLUP FUNCTION returns the result of every aggregate operation by default, as in the WITH TOTAL function. In addition, firstly, returns column that aggregate by id. Secondly, returns column that aggregates by name, then name, menus_appreared, first_appreared, lowest_price and highest_price. As a result WITH ROLLUP function aggregates by all the keys which are given in the GROUP BY function.
select id, name, menus_appeared, first_appeared, avg(lowest_price), avg(highest_price) FROM dish GROUP BY id, name, menus_appeared, first_appeared, lowest_price, highest_price WITH ROLLUP ORDER BY highest_price DESC LIMIT 10
┌─────id─┬─name───────────────────────────────┬─menus_appeared─┬─first_appeared─┬─avg(lowest_price)─┬─avg(highest_price)─┐ │ 40288 │ Cream cheese with bar-le-duc jelly │ 16 │ 1900 │ 0.45 │ 3050 │ │ 66 │ Grape fruit │ 712 │ 1895 │ 0 │ 2540 │ │ 125853 │ Oysters Baked in Shell │ 3 │ 1857 │ 0.85 │ 2065 │ │ 27559 │ Pommery & Greno. Ex. Dry │ 10 │ 1900 │ 1.4 │ 2050 │ │ 253803 │ luso │ 1 │ 1987 │ 600 │ 1100 │ │ 175911 │ Omelette aux Foies de Poulet │ 1 │ 0 │ 1035 │ 1035 │ │ 1212 │ Coffee, Pot │ 165 │ 1898 │ 0.05 │ 1000 │ │ 187125 │ Beef Stew Bourgoise │ 2 │ 0 │ 8.5 │ 850 │ │ 129839 │ Assorted Ice Creams │ 24 │ 1889 │ 0.2 │ 800 │ │ 241478 │ Helados diversos │ 1 │ 1958 │ 800 │ 800 │ └────────┴────────────────────────────────────┴────────────────┴────────────────┴───────────────────┴────────────────────┘
3. WITH CUBE
It write every subset keys in the GROUP BY. Instead of aggregate id, name, menus_appreared, first_appreared, lowest_price and highest_price columns, WITH CUBE function make combination of this keys. If you have multiple keys the result can be huge.
select id, name, menus_appeared, first_appeared, avg(lowest_price), avg(highest_price) FROM dish GROUP BY id, name, menus_appeared, first_appeared, lowest_price, highest_price WITH CUBE ORDER BY highest_price DESC LIMIT 10
┌────id─┬─name───────────────────────────────┬─menus_appeared─┬─first_appeared─┬─avg(lowest_price)─┬─avg(highest_price)─┐ │ 0 │ Cream cheese with bar-le-duc jelly │ 16 │ 1900 │ 0.45 │ 3050 │ │ 0 │ Cream cheese with bar-le-duc jelly │ 16 │ 1900 │ 0.45 │ 3050 │ │ 40288 │ │ 16 │ 1900 │ 0.45 │ 3050 │ │ 40288 │ Cream cheese with bar-le-duc jelly │ 16 │ 1900 │ 0.45 │ 3050 │ │ 0 │ Cream cheese with bar-le-duc jelly │ 16 │ 0 │ 0.45 │ 3050 │ │ 40288 │ Cream cheese with bar-le-duc jelly │ 0 │ 1900 │ 0.45 │ 3050 │ │ 0 │ Cream cheese with bar-le-duc jelly │ 16 │ 0 │ 0.45 │ 3050 │ │ 40288 │ │ 16 │ 1900 │ 0.45 │ 3050 │ │ 0 │ Cream cheese with bar-le-duc jelly │ 0 │ 1900 │ 0.45 │ 3050 │ │ 0 │ Cream cheese with bar-le-duc jelly │ 0 │ 1900 │ 0.45 │ 3050 │ └───────┴────────────────────────────────────┴────────────────┴────────────────┴───────────────────┴────────────────────┘
4. GROUPING SETS
If you want to GROUP BY year and also want the same result with aggregation of lowest_price and highest_price you can use GROUP BY GROUPING SETS function. Before this function you have to do multiple select and union operations. However GROUPING SETS function allows you do it in just one process.
select id, avg(lowest_price), avg(highest_price) FROM dish GROUP BY GROUPING SETS ((), (id,lowest_price), (id,highest_price) ) ORDER BY highest_price DESC LIMIT 10
┌─────id─┬─avg(lowest_price)─┬─avg(highest_price)─┐ │ 40288 │ 0.45 │ 3050 │ │ 66 │ 0 │ 2540 │ │ 125853 │ 0.85 │ 2065 │ │ 27559 │ 1.4 │ 2050 │ │ 253803 │ 600 │ 1100 │ │ 175911 │ 1035 │ 1035 │ │ 1212 │ 0.05 │ 1000 │ │ 187125 │ 8.5 │ 850 │ │ 129839 │ 0.2 │ 800 │ │ 241478 │ 800 │ 800 │ └────────┴───────────────────┴────────────────────┘
5. MYSQLDUMP as an input format
If you have mysqldump, you can easily import into ClickHouse
6. PROMETHEUS as output format
If you want to monitor ClickHouse with Prometheus or you have some monitoring data in ClickHouse, and you want to export it into Prometheus you can use Prometheus output format with some query that will output some metrics. You can define custom http handler so this data will data will available at some path by url.
7. PERFORMANCE OPTIMIZATIONS
AVG and SUM functions without group by are up to 1.5 times faster.
Unary arithmetic functions are up to 7 times faster.
Conclusion
In this version, emphasis was placed on aggregate functions and performance improvement. Improvements were made to the functions in order to give faster results for the queries. For the GROUP BY function, new functions have been defined in order to retrieve data in different combinations.
To know more about the ClickHouse releases, do visit the following articles: