How are Regular Expressions (RegEx) implemented in ClickHouse?

Introduction

In ClickHouse, regular expressions are implemented using the match function and the like operator.

  1. The match function is used to check if a string matches a specified regular expression. The syntax for the function is as follows: match(string, pattern). The string argument is the string that you want to check, and the pattern argument is the regular expression that you want to match against. The function returns 1 if the string matches the pattern, and 0 if it does not.
  2. The like operator is used to check if a string matches a specified wildcard pattern. The syntax for the operator is as follows: string like pattern. The string argument is the string that you want to check, and the pattern argument is the wildcard pattern that you want to match against. The operator returns 1 if the string matches the pattern, and 0 if it does not.

Example of RegEx in ClickHouse

ClickHouse supports a broad set of regular expressions features, including:

  • Character classes
  • Alternation
  • Anchors
  • Quantifiers
  • Backreferences
  • Lookarounds
  • Named groups

You can use these features to match and extract specific information from a dataset.

Here’s an example of how to use the match function and the like operator in ClickHouse:

-- Check if a string matches a regular expression
SELECT match('abcde', 'abc(de|d)'); -- returns 1

-- Check if a string matches a wildcard pattern
SELECT 'abcde' like 'a%e'; -- returns 1

Conclusion

It’s important to note that the performance of the match function and the like operator can be impacted by the complexity of the regular expression and the size of the dataset. It’s recommended that you test the performance of your regular expressions on a representative sample of your data before using them in a production environment.

If you liked reading the article, please do consider reading the below articles on ClickHouse:

About Shiv Iyer 216 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.