ClickHouse Permutation by Recursion and Cross Join

Permutation is the process of arranging a set of elements in all possible orders. ClickHouse supports two methods for computing permutations: recursion and cross join. Here’s how each method works, along with real-life data examples and use cases.

1. Permutation via Recursion:

The recursion method involves recursively selecting each element in the set and computing the permutations of the remaining elements. The base case occurs when there is only one element remaining, at which point the permutation is complete.

Suppose we have a table of orders, and we want to compute all possible permutations of the order items. We can use recursion to generate all possible order item combinations. For example:

WITH RECURSIVE permute(items) AS (
SELECT ARRAY['item1', 'item2', 'item3', 'item4'] AS items
UNION ALL
SELECT arrayConcat(items[2:], [items[1]]) FROM permute WHERE length(items) > 1
)
SELECT items FROM permute;

This will generate all possible order item combinations.

Use cases:

  • Computing all possible combinations of items in a shopping cart.
  • Generating all possible permutations of a sequence of DNA bases in a bioinformatics application.

2. Permutation via Cross Join:

The cross join method involves joining the set of elements with itself repeatedly, creating a Cartesian product. The resulting rows represent all possible combinations of the elements.

Suppose we have a table of products, and we want to generate all possible combinations of the products for a promotion. We can use cross join to generate all possible product combinations. For example:

SELECT p1.product_id, p2.product_id
FROM products AS p1
CROSS JOIN products AS p2
WHERE p1.product_id < p2.product_id;

This will generate all possible product combinations.

Use cases:

  • Generating all possible combinations of product bundles for a promotion.
  • Computing all possible pairs of customers who purchased a particular product.

Conclusion

Overall, both permutation via recursion and permutation via cross join can be useful in a variety of applications. Recursion is well-suited for situations where the number of elements in the set is relatively small, while cross join is more efficient for larger sets.

To read more about JOINs in ClickHouse, do consider reading the below articles:

About Shiv Iyer 229 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.