Aggregate Functions via ClickHouse

Table of Contents

Introduction

Functions that arrive at the resultant value by calculating on a set of values are called aggregation functions. In ClickHouse, aggregate functions work in the normal way as expected by database expert.

In this article, we will give examples with Aggregate Functions, Parametric Aggregate Functions and Aggregate Function Combinators on ClickHouse.

ClickHouse Aggregate Functions

( To more information please visit ClickHouse docs here.)

Standard aggregate functions:	ClickHouse-specific aggregate functions:
count	anyHeavy	groupBitAnd	kurtSamp	quantileTiming
min	anyLast	groupBitOr	kurtPop	quantileTimingWeighted
max	argMin	groupBitXor	uniq	quantileDeterministic
sum	argMax	groupBitmap	uniqExact	quantileTDigest
avg	avgWeighted	groupBitmapAnd	uniqCombined	quantileTDigestWeighted
any	topK	groupBitmapOr	uniqCombined64	quantileBFloat16
stddevPop	topKWeighted	groupBitmapXor	uniqHLL12	quantileBFloat16Weighted
stddevSamp	groupArray	sumWithOverflow	quantile	simpleLinearRegression
varPop	groupUniqArray	sumMap	quantiles	stochasticLinearRegression
varSamp	groupArrayInsertAt	minMap	quantileExact	stochasticLogisticRegression
covarPop	groupArrayMovingAvg	maxMap	quantileExactLow	categoricalInformationValue
covarSamp	groupArrayMovingSum	skewSamp	quantileExactHigh
		skewPop	quantileExactWeighted

Lets explain some of the aggregate functions then give examples with ClickHouse.

count

Count function, Counts the number of rows or not-NULL values.

ClickHouse supports the following syntaxes for count:

count(expr) or COUNT(DISTINCT expr).
count() or COUNT(*). The count() syntax is ClickHouse-specific.

Example 1

SELECT count() FROM table

Result

┌─count()─┐
│      47 │
└─────────┘

Example 2

SELECT count(DISTINCT number) FROM table

Result

┌─uniqExact(num)─┐
│              8 │
└────────────────┘

This example shows that count(DISTINCT num) is performed by the uniqExact function according to the count_distinct_implementation setting value.

min

Aggregate function that calculates the minimum across a group of values.

Example

SELECT min(salary) FROM employees;

SELECT department, min(salary) FROM employees GROUP BY department;

max

Aggregate function that calculates the maximum across a group of values.

SELECT max(salary) FROM employees;

SELECT department, max(salary) FROM employees GROUP BY department;

sum

Calculates the sum. Only works for numbers.

SELECT sum(salary) FROM employees;

avg

Calculates the arithmetic mean.

SELECT avg(salary) FROM employees;

Parametric Aggregate Functions

Some aggregate functions can accept not only argument columns (used for compression), but a set of parameters – constants for initialization. The syntax is two pairs of brackets instead of one. The first is for parameters, and the second is for arguments.

Parametric Aggregate Functions listed below:

histogram	retention
sequenceMatch	uniqUpTo
sequenceCount	sumMapFiltered
windowFunnel

Calculates an adaptive histogram. It does not guarantee precise results.

histogram

Example;

SELECT histogram(5)(number + 1)
FROM (
    SELECT *
    FROM system.numbers
    LIMIT 20
)

Result;

┌─histogram(5)(plus(number, 1))───────────────────────────────────────────┐
│ [(1,4.5,4),(4.5,8.5,4),(8.5,12.75,4.125),(12.75,17,4.625),(17,20,3.25)] │
└─────────────────────────────────────────────────────────────────────────┘

You can visualize a histogram with the bar function, for example:

WITH histogram(5)(rand() % 100) AS hist
SELECT
    arrayJoin(hist).3 AS height,
    bar(height, 0, 6, 5) AS bar
FROM
(
    SELECT *
    FROM system.numbers
    LIMIT 20
)

Result;

┌─height─┬─bar───┐
│  2.125 │ █▋    │
│   3.25 │ ██▌   │
│  5.625 │ ████▏ │
│  5.625 │ ████▏ │
│  3.375 │ ██▌   │
└────────┴───────┘

sequenceMatch

Checks whether the sequence contains an event chain that matches the pattern.

sequenceMatch(pattern)(timestamp, cond1, cond2, ...)

Example;

SELECT sequenceMatch('(?1)(?2)')(time, number = 1, number = 2) FROM test

Result;

┌─sequenceMatch('(?1)(?2)')(time, equals(number, 1), equals(number, 2), equals(number, 3))─┐
│                                                                                        0 │
└──────────────────────────────────────────────────────────────────────────────────────────┘

sequenceCount

Counts the number of event chains that matched the pattern. The function searches event chains that do not overlap. It starts to search for the next chain after the current chain is matched.

Example;

Consider data in the test table:

┌─time─┬─number─┐
│    1 │      1 │
│    2 │      3 │
│    3 │      2 │
│    4 │      1 │
│    5 │      3 │
│    6 │      2 │
└──────┴────────┘

Count how many times the number 2 occurs after the number 1 with any amount of other numbers between them:

SELECT sequenceCount('(?1).*(?2)')(time, number = 1, number = 2) FROM t

Result;

┌─sequenceCount('(?1).*(?2)')(time, equals(number, 1), equals(number, 2))─┐
│                                                                       2 │
└─────────────────────────────────────────────────────────────────────────┘

Aggregate Function Combinators

The name of an aggregate function can have a suffix appended to it. This changes the way the aggregate function works.

Aggregate function combinators listed below;

If	State
Array	Merge
Map	MergeState
SimpleState	ForEach
Distinct	OrDefault
OrNull	Resample

If

The suffix -If can be appended to the name of any aggregate function. In this case, the aggregate function accepts an extra argument – a condition (Uint8 type). The aggregate function processes only the rows that trigger the condition. If the condition was not triggered even once, it returns a default value (usually zeros or empty strings).

Examples: sumIf(column, cond), countIf(cond), avgIf(x, cond), quantilesTimingIf(level1, level2)(x, cond), argMinIf(arg, val, cond) and so on.

Example;

SELECT
    sumIf(value, value % 2 == 1) AS odd_sum,
FROM table;

Avg

Sum or arithmetic average, but only for the rows that satisfy the condition passed by the second argument.

Therefore, SUM_IF(value, condition) is a slightly shorter notation for SUM(IF(condition, value)), same for AVG. The argument’s data type expansion is similar to the same-name functions without a suffix.

SELECT
    avgIf(value, value % 2 == 1) AS odd_avg,
FROM table;

Conclusion

In this article, we gave examples with Aggregate Functions, Parametric Aggregate Functions and Aggregate Function Combinators on ClickHouse. ClickHouse, is a database that has analyzed the deficiencies in the queries written to reach the data in the industry and found solutions to them. We have covered some of the examples of these solutions in this article. We can access the data in the format we want by adding some functions to the same query that we use to access the data and access this data.

To read more functions in ClickHouse, do consider reading the following articles

ChistaDATA Inc.

Enterprise-class 24*7 ClickHouse Consultative Support and Managed Services

Aggregate Functions via ClickHouse

Introduction

ClickHouse Aggregate Functions

count

Example 1

Result

Example 2

Result

min

Example

max

sum

avg

Parametric Aggregate Functions

histogram

sequenceMatch

sequenceCount

Aggregate Function Combinators

If

Avg

Conclusion