Derived Tables for Query Performance in ClickHouse

Table of Contents

Introduction

Derived tables are tables that are created on-the-fly as a result of a query. They are temporary tables that exist only for the duration of the query, and are not stored in the database. ClickHouse supports derived tables as a way to simplify complex queries and reduce the amount of data that needs to be processed. Here’s how to implement derived tables in ClickHouse:

  1. Creating a derived table: To create a derived table, you can include a subquery in your main query, and use the result of the subquery as the derived table. For example:
SELECT *
FROM (
SELECT customer_id, COUNT(*) AS num_orders
FROM orders
GROUP BY customer_id
) AS customer_orders
WHERE num_orders > 10;

In this example, the subquery creates a derived table that counts the number of orders for each customer. The main query then filters the results to only include customers with more than 10 orders.

  1. Joining a derived table: You can also join a derived table with another table in your query. For example:
SELECT *
FROM products
JOIN (
SELECT product_id, AVG(price) AS avg_price
FROM prices
GROUP BY product_id
) AS product_prices
ON products.product_id = product_prices.product_id
WHERE product_prices.avg_price > 50;

In this example, the subquery creates a derived table that calculates the average price for each product. The main query then joins this derived table with the products table and filters the results to only include products with an average price greater than 50.

  1. Using a derived table multiple times: You can also use a derived table multiple times within a query. For example:
SELECT *
FROM (
SELECT customer_id, COUNT(*) AS num_orders
FROM orders
GROUP BY customer_id
) AS customer_orders
JOIN (
SELECT customer_id, SUM(total_price) AS total_spent
FROM orders
GROUP BY customer_id
) AS customer_totals
ON customer_orders.customer_id = customer_totals.customer_id
WHERE customer_orders.num_orders > 10
AND customer_totals.total_spent > 1000;

Conclusion

In this example, two derived tables are created: one that counts the number of orders for each customer, and one that calculates the total amount spent by each customer. The main query then joins these derived tables and filters the results to only include customers with more than 10 orders and a total spend of more than 1000.

To read more about ClickHouse, do consider reading the below articles:

About Shiv Iyer 216 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.