Aggregate Functions via ClickHouse

Introduction

Functions that arrive at the resultant value by calculating on a set of values ​​are called aggregation functions. In ClickHouse, aggregate functions work in the normal way as expected by database expert.

In this article, we will give examples with Aggregate Functions, Parametric Aggregate Functions and Aggregate Function Combinators on ClickHouse.

ClickHouse Aggregate Functions

( To more information please visit ClickHouse docs here.)

Standard aggregate functions: ClickHouse-specific aggregate functions:
count anyHeavy groupBitAnd kurtSamp quantileTiming
min anyLast groupBitOr kurtPop quantileTimingWeighted
max argMin groupBitXor uniq quantileDeterministic
sum argMax groupBitmap uniqExact quantileTDigest
avg avgWeighted groupBitmapAnd uniqCombined quantileTDigestWeighted
any topK groupBitmapOr uniqCombined64 quantileBFloat16
stddevPop topKWeighted groupBitmapXor uniqHLL12 quantileBFloat16Weighted
stddevSamp groupArray sumWithOverflow quantile simpleLinearRegression
varPop groupUniqArray sumMap quantiles stochasticLinearRegression
varSamp groupArrayInsertAt minMap quantileExact stochasticLogisticRegression
covarPop groupArrayMovingAvg maxMap quantileExactLow categoricalInformationValue
covarSamp groupArrayMovingSum skewSamp quantileExactHigh  
    skewPop quantileExactWeighted  

Lets explain some of the aggregate functions then give examples with ClickHouse.

count

Count function, Counts the number of rows or not-NULL values.

ClickHouse supports the following syntaxes for count:

  • count(expr) or COUNT(DISTINCT expr).
  • count() or COUNT(*). The count() syntax is ClickHouse-specific.

Example 1

SELECT count() FROM table

Result

┌─count()─┐
│      47 │
└─────────┘

Example 2

SELECT count(DISTINCT number) FROM table

Result

┌─uniqExact(num)─┐
│              8 │
└────────────────┘

This example shows that count(DISTINCT num) is performed by the uniqExact function according to the count_distinct_implementation setting value.

min

Aggregate function that calculates the minimum across a group of values.

Example

SELECT min(salary) FROM employees;
SELECT department, min(salary) FROM employees GROUP BY department;

max

Aggregate function that calculates the maximum across a group of values.

SELECT max(salary) FROM employees;
SELECT department, max(salary) FROM employees GROUP BY department;

sum

Calculates the sum. Only works for numbers.

SELECT sum(salary) FROM employees;

avg

Calculates the arithmetic mean.

SELECT avg(salary) FROM employees;

Parametric Aggregate Functions

Some aggregate functions can accept not only argument columns (used for compression), but a set of parameters – constants for initialization. The syntax is two pairs of brackets instead of one. The first is for parameters, and the second is for arguments.

Parametric Aggregate Functions listed below:

histogram retention
sequenceMatch uniqUpTo
sequenceCount sumMapFiltered
windowFunnel  

Calculates an adaptive histogram. It does not guarantee precise results.

histogram

Example;
SELECT histogram(5)(number + 1)
FROM (
    SELECT *
    FROM system.numbers
    LIMIT 20
)

Result;

┌─histogram(5)(plus(number, 1))───────────────────────────────────────────┐
│ [(1,4.5,4),(4.5,8.5,4),(8.5,12.75,4.125),(12.75,17,4.625),(17,20,3.25)] │
└─────────────────────────────────────────────────────────────────────────┘

You can visualize a histogram with the bar function, for example:

WITH histogram(5)(rand() % 100) AS hist
SELECT
    arrayJoin(hist).3 AS height,
    bar(height, 0, 6, 5) AS bar
FROM
(
    SELECT *
    FROM system.numbers
    LIMIT 20
)

Result;

┌─height─┬─bar───┐
│  2.125 │ █▋    │
│   3.25 │ ██▌   │
│  5.625 │ ████▏ │
│  5.625 │ ████▏ │
│  3.375 │ ██▌   │
└────────┴───────┘

sequenceMatch

Checks whether the sequence contains an event chain that matches the pattern.

sequenceMatch(pattern)(timestamp, cond1, cond2, ...)

Example;

SELECT sequenceMatch('(?1)(?2)')(time, number = 1, number = 2) FROM test

Result;

┌─sequenceMatch('(?1)(?2)')(time, equals(number, 1), equals(number, 2), equals(number, 3))─┐
│                                                                                        0 │
└──────────────────────────────────────────────────────────────────────────────────────────┘

sequenceCount

Counts the number of event chains that matched the pattern. The function searches event chains that do not overlap. It starts to search for the next chain after the current chain is matched.

Example;

Consider data in the test table:

┌─time─┬─number─┐
│    1 │      1 │
│    2 │      3 │
│    3 │      2 │
│    4 │      1 │
│    5 │      3 │
│    6 │      2 │
└──────┴────────┘

Count how many times the number 2 occurs after the number 1 with any amount of other numbers between them:

SELECT sequenceCount('(?1).*(?2)')(time, number = 1, number = 2) FROM t

Result;

┌─sequenceCount('(?1).*(?2)')(time, equals(number, 1), equals(number, 2))─┐
│                                                                       2 │
└─────────────────────────────────────────────────────────────────────────┘

Aggregate Function Combinators

The name of an aggregate function can have a suffix appended to it. This changes the way the aggregate function works.

Aggregate function combinators listed below;

If State
Array Merge
Map MergeState
SimpleState ForEach
Distinct OrDefault
OrNull Resample

If

The suffix -If can be appended to the name of any aggregate function. In this case, the aggregate function accepts an extra argument – a condition (Uint8 type). The aggregate function processes only the rows that trigger the condition. If the condition was not triggered even once, it returns a default value (usually zeros or empty strings).

Examples: sumIf(column, cond)countIf(cond)avgIf(x, cond)quantilesTimingIf(level1, level2)(x, cond)argMinIf(arg, val, cond) and so on.

Example;

SELECT
    sumIf(value, value % 2 == 1) AS odd_sum,
FROM table;

Avg

Sum or arithmetic average, but only for the rows that satisfy the condition passed by the second argument.

Therefore, SUM_IF(value, condition) is a slightly shorter notation for SUM(IF(condition, value)), same for AVG. The argument’s data type expansion is similar to the same-name functions without a suffix.

SELECT
    avgIf(value, value % 2 == 1) AS odd_avg,
FROM table;

Conclusion

In this article, we gave examples with Aggregate Functions, Parametric Aggregate Functions and Aggregate Function Combinators on ClickHouse. ClickHouse, is a database that has analyzed the deficiencies in the queries written to reach the data in the industry and found solutions to them. We have covered some of the examples of these solutions in this article. We can access the data in the format we want by adding some functions to the same query that we use to access the data and access this data.

To read more functions in ClickHouse, do consider reading the following articles

About Can Sayn 41 Articles
Can Sayın is experienced Database Administrator in open source relational and NoSql databases, working in complicated infrastructures. Over 5 years industry experience, he gain managing database systems. He is working at ChistaDATA Inc. His areas of interest are generally on open source systems.
Contact: Website