Advanced ClickHouse SQL: Window Functions, Array, and JSON Processing

Advanced ClickHouse SQL

ClickHouse SQL offers capabilities that go far beyond basic querying, enabling developers and data engineers to tackle complex analytics with ease. This article dives deep into Advanced ClickHouse SQL features such as window functions, array operations, and JSON processing, demonstrating how they can be applied to solve real-world problems. By mastering these techniques, you can significantly enhance query performance, streamline data transformations, and unlock the full analytical power of ClickHouse in high-performance environments.

ClickHouse Window Functions for Advanced Analytics

ClickHouse introduced window functions in version 20.5, providing powerful analytical capabilities directly in SQL queries.

Basic Window Function Syntax

SELECT
    event_time,
    user_id,
    value,
    avg(value) OVER (PARTITION BY user_id ORDER BY event_time
                     ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS moving_avg
FROM user_events

Practical Window Function Examples

Calculating Moving Averages

SELECT
    timestamp,
    metric_value,
    avg(metric_value) OVER (
        ORDER BY timestamp
        ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
    ) AS moving_avg_6points
FROM sensor_data
WHERE sensor_id = 'temp001'
ORDER BY timestamp

Ranking Results

SELECT
    user_id,
    purchase_amount,
    rank() OVER (ORDER BY purchase_amount DESC) AS purchase_rank,
    dense_rank() OVER (ORDER BY purchase_amount DESC) AS dense_purchase_rank
FROM purchases
WHERE purchase_date >= now() - INTERVAL 30 DAY

Calculating Differences Between Rows

SELECT
    timestamp,
    metric_value,
    metric_value - lag(metric_value) OVER (ORDER BY timestamp) AS value_change
FROM sensor_data
ORDER BY timestamp

ClickHouse Array Functions and Operations

ClickHouse has first-class support for array data types, enabling powerful operations directly in SQL.

Creating and Manipulating Arrays

-- Creating arrays
SELECT [1, 2, 3] AS simple_array;

-- Array functions
SELECT
    arrayJoin([1, 2, 3]) AS joined_array,
    arrayMap(x -> x * x, [1, 2, 3]) AS squared_array,
    arrayFilter(x -> x > 1, [1, 2, 3]) AS filtered_array

Practical Array Examples

Sessionization with Arrays

SELECT
    user_id,
    groupArray(event_time) AS event_times,
    groupArray(event_type) AS event_sequence,
    length(groupArray(event_type)) AS session_events_count
FROM user_events
GROUP BY user_id, session_id
HAVING session_events_count > 1

Array Aggregations

SELECT
    date,
    arraySort(groupUniqArray(user_id)) AS unique_users,
    length(arraySort(groupUniqArray(user_id))) AS unique_users_count
FROM page_views
GROUP BY date
ORDER BY date

Working with Nested Data

SELECT
    product_id,
    arrayJoin(categories) AS category,
    count() AS category_count
FROM products
GROUP BY product_id, category
ORDER BY category_count DESC

ClickHouse JSON Processing for Complex Data

ClickHouse provides extensive functions for working with JSON data, whether stored as strings or using specialized JSON types.

Basic JSON Extraction

SELECT
    request_id,
    JSONExtractString(payload, 'user', 'name') AS user_name,
    JSONExtractInt(payload, 'metrics', 'duration') AS duration
FROM api_logs
WHERE JSONHas(payload, 'error') = 1

Advanced JSON Techniques

Working with JSON Arrays

SELECT
    request_id,
    JSONLength(payload, 'items') AS items_count,
    arrayMap(x -> JSONExtractString(x, 'name'),
             JSONExtractArrayRaw(payload, 'items')) AS item_names
FROM orders
WHERE items_count > 0

JSON Path Expressions

SELECT
    event_id,
    JSONExtractString(data, 'user.profile.preferences.theme') AS theme,
    JSONExtractRaw(data, 'user.devices[0]') AS primary_device
FROM events
WHERE JSONType(data, 'user.profile') = 'object'

Converting Between JSON and Arrays

SELECT
    product_id,
    JSONExtractArrayRaw(attributes, 'options') AS options_json,
    arrayMap(x -> JSONExtractString(x, 'name'), options_json) AS option_names
FROM products
WHERE JSONHas(attributes, 'options')

Combining Techniques

The real power comes from combining these features:

SELECT
    user_id,
    -- Window function to get session rank by duration
    rank() OVER (PARTITION BY user_id ORDER BY session_duration DESC) AS session_rank,
    -- Array of events in the session
    groupArray(event_type) AS events,
    -- JSON extraction from each event
    arrayMap(x -> JSONExtractString(x, 'details.page'),
             groupArray(event_payload)) AS pages_visited
FROM user_sessions
WHERE session_start >= now() - INTERVAL 7 DAY
GROUP BY user_id, session_id, session_duration
HAVING session_rank <= 3
ORDER BY user_id, session_rank

Performance Considerations

  • Window functions can be memory-intensive; use appropriate LIMIT clauses
  • Large arrays may impact performance; consider using arrayJoin earlier in the query pipeline
  • JSON extraction is slower than native columns; extract frequently used fields to dedicated columns

Conclusion: Mastering Advanced ClickHouse SQL for Real-World Analytics

ClickHouse’s advanced SQL capabilities provide powerful tools for complex analytical workloads. Window functions enable sophisticated time-series analysis, array operations simplify working with collections of data, and JSON processing functions make it easy to work with semi-structured data.

By mastering these techniques, you can solve complex analytical problems directly in SQL without needing to export data to external processing systems, taking full advantage of ClickHouse’s exceptional performance.

Further reading

Mastering Nested JOINs in ClickHouse: A Complete Guide to Embedding JOINs within JOINs

Understanding the OpenTelemetry Collector: A Comprehensive Guide to Modern Telemetry Management

Building a Medallion Architecture with ClickHouse: A Complete Guide

Mastering Custom Partitioning Keys in ClickHouse: A Complete Guide

Why is ClickHouse So Fast? The Architecture Behind Lightning-Speed Analytics

A fastest Database for Analytics

Untangling the Spaghetti: Writing Efficient ClickHouse SQL

 

Most Common ClickHouse Analytical Models

 

Real-Time Bid Tracking and Optimization with ClickHouse in High-Performance Data Pipelines

You might also like:

About Shiv Iyer 268 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.