Functions that arrive at the resultant value by calculating on a set of values are called aggregation functions. In ClickHouse, aggregate functions work in the normal way as expected by database expert.

In this article, we will give examples with **Aggregate Functions**, **Parametric Aggregate Functions** and **Aggregate Function Combinators** on **ClickHouse**.

ClickHouse Aggregate Functions are listed below;

( To more information please visit official ClickHouse docs here. )

Lets explain some of the aggregate functions then give examples with ClickHouse.

# count

Count function, Counts the number of rows or not-NULL values.

ClickHouse supports the following syntaxes for `count`

:

`count(expr)`

or`COUNT(DISTINCT expr)`

.`count()`

or`COUNT(*)`

. The`count()`

syntax is ClickHouse-specific.

Example 1;

SELECT count() FROM table

Result;

┌─count()─┐ │ 47 │ └─────────┘

Example 2;

SELECT count(DISTINCT number) FROM table

Result;

┌─uniqExact(num)─┐ │ 8 │ └────────────────┘

This example shows that `count(DISTINCT num)`

is performed by the `uniqExact`

function according to the `count_distinct_implementation`

setting value.

# min

Aggregate function that calculates the minimum across a group of values.

Example;

SELECT min(salary) FROM employees;

SELECT department, min(salary) FROM employees GROUP BY department;

# max

Aggregate function that calculates the maximum across a group of values.

SELECT max(salary) FROM employees;

SELECT department, max(salary) FROM employees GROUP BY department;

# sum

Calculates the sum. Only works for numbers.

SELECT sum(salary) FROM employees;

# avg

Calculates the arithmetic mean.

SELECT avg(salary) FROM employees;

# Parametric Aggregate Functions

Some aggregate functions can accept not only argument columns (used for compression), but a set of parameters – constants for initialization. The syntax is two pairs of brackets instead of one. The first is for parameters, and the second is for arguments.

Parametric Aggregate Functions listed below;

histogram | retention |

sequenceMatch | uniqUpTo |

sequenceCount | sumMapFiltered |

windowFunnel |

Calculates an adaptive histogram. It does not guarantee precise results.

## histogram

Example;

SELECT histogram(5)(number + 1) FROM ( SELECT * FROM system.numbers LIMIT 20 )

Result;

┌─histogram(5)(plus(number, 1))───────────────────────────────────────────┐ │ [(1,4.5,4),(4.5,8.5,4),(8.5,12.75,4.125),(12.75,17,4.625),(17,20,3.25)] │ └─────────────────────────────────────────────────────────────────────────┘

You can visualize a histogram with the bar function, for example:

WITH histogram(5)(rand() % 100) AS hist SELECT arrayJoin(hist).3 AS height, bar(height, 0, 6, 5) AS bar FROM ( SELECT * FROM system.numbers LIMIT 20 )

Result;

┌─height─┬─bar───┐ │ 2.125 │ █▋ │ │ 3.25 │ ██▌ │ │ 5.625 │ ████▏ │ │ 5.625 │ ████▏ │ │ 3.375 │ ██▌ │ └────────┴───────┘

## sequenceMatch

Checks whether the sequence contains an event chain that matches the pattern.

sequenceMatch(pattern)(timestamp, cond1, cond2, ...)

Example;

SELECT sequenceMatch('(?1)(?2)')(time, number = 1, number = 2) FROM test

Result;

┌─sequenceMatch('(?1)(?2)')(time, equals(number, 1), equals(number, 2), equals(number, 3))─┐ │ 0 │ └──────────────────────────────────────────────────────────────────────────────────────────┘

## sequenceCount

Counts the number of event chains that matched the pattern. The function searches event chains that do not overlap. It starts to search for the next chain after the current chain is matched.

Example;

Consider data in the `test`

table:

┌─time─┬─number─┐ │ 1 │ 1 │ │ 2 │ 3 │ │ 3 │ 2 │ │ 4 │ 1 │ │ 5 │ 3 │ │ 6 │ 2 │ └──────┴────────┘

Count how many times the number 2 occurs after the number 1 with any amount of other numbers between them:

SELECT sequenceCount('(?1).*(?2)')(time, number = 1, number = 2) FROM t

Result;

┌─sequenceCount('(?1).*(?2)')(time, equals(number, 1), equals(number, 2))─┐ │ 2 │ └─────────────────────────────────────────────────────────────────────────┘

# Aggregate Function Combinators

The name of an aggregate function can have a suffix appended to it. This changes the way the aggregate function works.

Aggregate function combinators listed below;

If | State |

Array | Merge |

Map | MergeState |

SimpleState | ForEach |

Distinct | OrDefault |

OrNull | Resample |

## If

The suffix -If can be appended to the name of any aggregate function. In this case, the aggregate function accepts an extra argument – a condition (Uint8 type). The aggregate function processes only the rows that trigger the condition. If the condition was not triggered even once, it returns a default value (usually zeros or empty strings).

Examples: `sumIf(column, cond)`

, `countIf(cond)`

, `avgIf(x, cond)`

, `quantilesTimingIf(level1, level2)(x, cond)`

, `argMinIf(arg, val, cond)`

and so on.

Example;

SELECT sumIf(value, value % 2 == 1) AS odd_sum, FROM table;

## Avg

Sum or arithmetic average, but only for the rows that satisfy the condition passed by the second argument.

Therefore, `SUM_IF(value, condition)`

is a slightly shorter notation for `SUM(IF(condition, value))`

, same for `AVG`

. The argument’s data type expansion is similar to the same-name functions without a suffix.

SELECT avgIf(value, value % 2 == 1) AS odd_avg, FROM table;

## Summary

In this article, we gave examples with **Aggregate Functions**, **Parametric Aggregate Functions** and **Aggregate Function Combinators** on **ClickHouse**. ClickHouse, is a database that has analyzed the deficiencies in the queries written to reach the data in the industry and found solutions to them. We have covered some of the examples of these solutions in this article. We can access the data in the format we want by adding some functions to the same query that we use to access the data and access this data.