Introduction
Functions that arrive at the resultant value by calculating on a set of values are called aggregation functions. In ClickHouse, aggregate functions work in the normal way as expected by database expert.
In this article, we will give examples with Aggregate Functions, Parametric Aggregate Functions and Aggregate Function Combinators on ClickHouse.
ClickHouse Aggregate Functions
( To more information please visit ClickHouse docs here.)
Lets explain some of the aggregate functions then give examples with ClickHouse.
count
Count function, Counts the number of rows or not-NULL values.
ClickHouse supports the following syntaxes for count
:
count(expr)
orCOUNT(DISTINCT expr)
.count()
orCOUNT(*)
. Thecount()
syntax is ClickHouse-specific.
Example 1
SELECT count() FROM table
Result
┌─count()─┐ │ 47 │ └─────────┘
Example 2
SELECT count(DISTINCT number) FROM table
Result
┌─uniqExact(num)─┐ │ 8 │ └────────────────┘
This example shows that count(DISTINCT num)
is performed by the uniqExact
function according to the count_distinct_implementation
setting value.
min
Aggregate function that calculates the minimum across a group of values.
Example
SELECT min(salary) FROM employees;
SELECT department, min(salary) FROM employees GROUP BY department;
max
Aggregate function that calculates the maximum across a group of values.
SELECT max(salary) FROM employees;
SELECT department, max(salary) FROM employees GROUP BY department;
sum
Calculates the sum. Only works for numbers.
SELECT sum(salary) FROM employees;
avg
Calculates the arithmetic mean.
SELECT avg(salary) FROM employees;
Parametric Aggregate Functions
Some aggregate functions can accept not only argument columns (used for compression), but a set of parameters – constants for initialization. The syntax is two pairs of brackets instead of one. The first is for parameters, and the second is for arguments.
Parametric Aggregate Functions listed below:
histogram | retention |
sequenceMatch | uniqUpTo |
sequenceCount | sumMapFiltered |
windowFunnel |
Calculates an adaptive histogram. It does not guarantee precise results.
histogram
Example;
SELECT histogram(5)(number + 1) FROM ( SELECT * FROM system.numbers LIMIT 20 )
Result;
┌─histogram(5)(plus(number, 1))───────────────────────────────────────────┐ │ [(1,4.5,4),(4.5,8.5,4),(8.5,12.75,4.125),(12.75,17,4.625),(17,20,3.25)] │ └─────────────────────────────────────────────────────────────────────────┘
You can visualize a histogram with the bar function, for example:
WITH histogram(5)(rand() % 100) AS hist SELECT arrayJoin(hist).3 AS height, bar(height, 0, 6, 5) AS bar FROM ( SELECT * FROM system.numbers LIMIT 20 )
Result;
┌─height─┬─bar───┐ │ 2.125 │ █▋ │ │ 3.25 │ ██▌ │ │ 5.625 │ ████▏ │ │ 5.625 │ ████▏ │ │ 3.375 │ ██▌ │ └────────┴───────┘
sequenceMatch
Checks whether the sequence contains an event chain that matches the pattern.
sequenceMatch(pattern)(timestamp, cond1, cond2, ...)
Example;
SELECT sequenceMatch('(?1)(?2)')(time, number = 1, number = 2) FROM test
Result;
┌─sequenceMatch('(?1)(?2)')(time, equals(number, 1), equals(number, 2), equals(number, 3))─┐ │ 0 │ └──────────────────────────────────────────────────────────────────────────────────────────┘
sequenceCount
Counts the number of event chains that matched the pattern. The function searches event chains that do not overlap. It starts to search for the next chain after the current chain is matched.
Example;
Consider data in the test
table:
┌─time─┬─number─┐ │ 1 │ 1 │ │ 2 │ 3 │ │ 3 │ 2 │ │ 4 │ 1 │ │ 5 │ 3 │ │ 6 │ 2 │ └──────┴────────┘
Count how many times the number 2 occurs after the number 1 with any amount of other numbers between them:
SELECT sequenceCount('(?1).*(?2)')(time, number = 1, number = 2) FROM t
Result;
┌─sequenceCount('(?1).*(?2)')(time, equals(number, 1), equals(number, 2))─┐ │ 2 │ └─────────────────────────────────────────────────────────────────────────┘
Aggregate Function Combinators
The name of an aggregate function can have a suffix appended to it. This changes the way the aggregate function works.
Aggregate function combinators listed below;
If | State |
Array | Merge |
Map | MergeState |
SimpleState | ForEach |
Distinct | OrDefault |
OrNull | Resample |
If
The suffix -If can be appended to the name of any aggregate function. In this case, the aggregate function accepts an extra argument – a condition (Uint8 type). The aggregate function processes only the rows that trigger the condition. If the condition was not triggered even once, it returns a default value (usually zeros or empty strings).
Examples: sumIf(column, cond)
, countIf(cond)
, avgIf(x, cond)
, quantilesTimingIf(level1, level2)(x, cond)
, argMinIf(arg, val, cond)
and so on.
Example;
SELECT sumIf(value, value % 2 == 1) AS odd_sum, FROM table;
Avg
Sum or arithmetic average, but only for the rows that satisfy the condition passed by the second argument.
Therefore, SUM_IF(value, condition)
is a slightly shorter notation for SUM(IF(condition, value))
, same for AVG
. The argument’s data type expansion is similar to the same-name functions without a suffix.
SELECT avgIf(value, value % 2 == 1) AS odd_avg, FROM table;
Conclusion
In this article, we gave examples with Aggregate Functions, Parametric Aggregate Functions and Aggregate Function Combinators on ClickHouse. ClickHouse, is a database that has analyzed the deficiencies in the queries written to reach the data in the industry and found solutions to them. We have covered some of the examples of these solutions in this article. We can access the data in the format we want by adding some functions to the same query that we use to access the data and access this data.
To read more functions in ClickHouse, do consider reading the following articles