Window Functions in ClickHouse with ‘OVER’ Clause

Introduction

The “OVER” clause in ClickHouse is used to perform window functions, which allow you to calculate values across a set of table rows related to the current row. Window functions provide a way to analyze data in a flexible and powerful manner. Let’s explore how to use the “OVER” clause with real-life data set examples and a use case.

Syntax of the OVER Clause

SELECT
    column1,
    column2,
    ...
    window_function() OVER (PARTITION BY partition_column ORDER BY order_column)
FROM
    table_name;

Example Use Case: Sales Performance Analysis

Consider a scenario where you have a sales dataset with the following columns: product_id, sale_date, revenue. You want to calculate the cumulative revenue for each product over time. This is where the “OVER” clause comes into play.

Sample Data

product_idsale_daterevenue
12023-01-01100
12023-01-02150
22023-01-01200
12023-01-03120
22023-01-02180

Query:

SELECT
    product_id,
    sale_date,
    revenue,
    SUM(revenue) OVER (PARTITION BY product_id ORDER BY sale_date) AS cumulative_revenue
FROM
    sales
ORDER BY
    product_id, sale_date;

Explanation

In this query, we’re selecting the product_id, sale_date, and revenue columns from the sales table. The key part is the SUM(revenue) OVER (PARTITION BY product_id ORDER BY sale_date) portion. Here’s what each part does:

  • SUM(revenue): This is the window function we’re applying. It calculates the cumulative sum of the revenue column.
  • OVER (PARTITION BY product_id ORDER BY sale_date): This defines the window over which the function operates. We’re partitioning the data by product_id, which means the cumulative sum will reset for each product. The ORDER BY sale_date specifies the order in which the sum is calculated.

Result

product_idsale_daterevenuecumulative_revenue
12023-01-01100100
12023-01-02150250
12023-01-03120370
22023-01-01200200
22023-01-02180380

Use Case Explanation

In this example, the “OVER” clause helped us calculate the cumulative revenue for each product over time. This kind of analysis is valuable for assessing the growth trajectory of product sales.

The “OVER” clause provides powerful insights into data trends within specific contexts, enhancing the depth of analysis and enabling more informed decision-making.

Conclusion

The “OVER” clause in ClickHouse’s window functions is a versatile tool for conducting advanced analyses that involve comparing data across rows. Whether you’re dealing with financial data, sales metrics, or any other dataset, the “OVER” clause adds a layer of flexibility to your queries, enabling deeper insights into your data.

To know more about ClickHouse Functions, do consider reading the following articles:

About Shiv Iyer 227 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.