A self-join in ClickHouse joins a table with itself using aliases. This technique helps compare rows within the same table, find relationships between records, and analyze hierarchical data.
Let’s explore how to implement self-joins in ClickHouse:
1. Use Case Example: Find Rows That Share a Common Key
Consider a table named employees with this schema:
CREATE TABLE employees
(
    employee_id UInt32,
    manager_id UInt32,
    employee_name String,
    salary UInt32
) ENGINE = MergeTree()
ORDER BY employee_id;
Here:
employee_idis the ID of an employee.manager_idis the ID of the manager for that employee.- Our goal is to find all employees along with their managers’ details.
 
2. Basic Self-Join Query
Here’s how to join the employees table with itself using aliases:
SELECT
    e1.employee_id AS employee_id,
    e1.employee_name AS employee_name,
    e1.salary AS employee_salary,
    e2.employee_id AS manager_id,
    e2.employee_name AS manager_name,
    e2.salary AS manager_salary
FROM employees AS e1
LEFT JOIN employees AS e2
    ON e1.manager_id = e2.employee_id;
Explanation:
AS e1andAS e2: Aliases distinguish between the two instances of theemployeestable.ON e1.manager_id = e2.employee_id: The join condition matchese1(employee) withe2(manager).LEFT JOIN: Preserves employee records even when they have no manager (NULL).
3. Use Case: Finding Hierarchical Relationships
For hierarchical data (like employees reporting to managers), you can use multiple self-joins.
Here’s how to find employees, their managers, and their managers’ managers:
SELECT
    e1.employee_name AS employee_name,
    e2.employee_name AS manager_name,
    e3.employee_name AS grand_manager_name
FROM employees AS e1
LEFT JOIN employees AS e2
    ON e1.manager_id = e2.employee_id
LEFT JOIN employees AS e3
    ON e2.manager_id = e3.employee_id;
4. Self-Join for Conditional Comparison
Self-joins with additional conditions help compare rows within the same table.
Example: Find employees with matching salaries:
SELECT
    e1.employee_id AS employee_1_id,
    e1.employee_name AS employee_1_name,
    e2.employee_id AS employee_2_id,
    e2.employee_name AS employee_2_name,
    e1.salary AS shared_salary
FROM employees AS e1
JOIN employees AS e2
    ON e1.salary = e2.salary
    AND e1.employee_id != e2.employee_id;
Explanation:
e1.salary = e2.salary: Matches rows with the same salary.e1.employee_id != e2.employee_id: Prevents matching employees with themselves.
5. Performance Considerations for Self-Joins
- Table Design:
- Use an efficient 
ORDER BYclause for proper sorting and fast lookups. - Use 
PRIMARY KEYand indexes where relevant for filtering. 
 - Use an efficient 
 - Partitioning:
- For large tables, consider partitioning by 
manager_idorsalaryto reduce scan ranges. 
 - For large tables, consider partitioning by 
 - Filtering:
- Apply 
WHEREconditions to pre-filter rows and minimize comparisons. 
 - Apply 
 - Limit Rows:
- Use 
LIMITor specific filters during development to improve query performance. 
 - Use 
 
6. Example with Filtering
To query only employees with salaries above 50,000:
SELECT
    e1.employee_id,
    e1.employee_name,
    e2.employee_id AS manager_id,
    e2.employee_name AS manager_name
FROM employees AS e1
LEFT JOIN employees AS e2
    ON e1.manager_id = e2.employee_id
WHERE e1.salary > 50000;
Summary
Self-joins in ClickHouse combine a table with itself using aliases and JOIN conditions. They excel at:
- Comparing rows within the same table.
 - Building hierarchical relationships.
 - Finding records with common attributes.
 
When properly optimized with filtering and indexing, self-joins perform well even on large ClickHouse tables.
Further Reading
Connect Prometheus to Your ClickHouse® Cluster
ClickHouse Projections: A Complete Guide to Query Optimization
Guide for Query optimization | ClickHouse Docs
Efficient Strategies for Purging Data in ClickHouse: Real-Life Use Cases and Detailed Implementation
Enhancing ClickHouse Performance: Strategic Insights on Partitioning, Indexing, and Monitoring
Mastering Performance Tuning in ClickHouse: Tips for Inspecting Statistics Objects
You might also like:
- ClickHouse ReplacingMergeTree Explained
 - ClickHouse Security: How to set up TLS-SSL for ClickHouse Server
 - ClickHouse Proxy Server for Read-Write Splitting with Connection Pooling & Query Cache
 - Boosting ClickHouse Ingestion Performance by Disabling Foreign Key Checks
 - Overview of Key Features of ChistaDATA’s ClickHouse DBaaS