Introduction
ClickHouse has three types of functions for string operations. These are:
- General Functions
- Searching Functions
- Replacing Functions
I aimed to explain the most commonly used general string functions with examples in this article.
General String Functions
I used the “Air Traffic Dataset” for testing the queries. You can download and install it by just clicking here.
Here are some recently used general functions with simple explanations.
- length – lengthUTF8 : Returns the length of a string in bytes (for length) and in Unicode code points.
- empty – notEmpty : Checks whether the input string is empty or not.
In this example, you can check if a column is empty or not using both the “length” and “notEmpty” functions. Similarly, you can find empty columns in tables.
SELECT callsign, icao24, registration, destination FROM opensky WHERE (length(destination) > 0) AND notEmpty(registration) LIMIT 10 ┌─callsign─┬─icao24─┬─registration─┬─destination─┐ │ N309MH │ a341dc │ N309MH │ 00AK │ │ N51446 │ a6734e │ N51446 │ 00AK │ │ N907XL │ ac8b45 │ N907XL │ 00AK │ │ CGUNH │ c07ad2 │ C-GUNH │ 00AL │ │ N171GS │ a11d8b │ N171GS │ 00AL │ │ N222KD │ a1e954 │ N222KD │ 00AL │ │ N321JB │ a372d1 │ N321JB │ 00AL │ │ N425PJ │ a50f80 │ N425PJ │ 00AL │ │ N477PS │ a5dc71 │ N477PS │ 00AL │ │ N4799X │ a5e632 │ N4799X │ 00AL │ └──────────┴────────┴──────────────┴─────────────┘
- leftPad – leftPadUTF8 / rightPad – rightPadUTF8 : Pads the given string for the right(rightPad – rightPadUTF8) or left (leftPad – leftPadUTF8) with a blank or specified string or character.
SELECT rightPad('Hello!', 10, '*') AS RightPad, leftPad('Hello!', 10, '*') AS LeftPad ┌─RightPad───┬─LeftPad────┐ │ Hello!**** │ ****Hello! │ └────────────┴────────────┘
- upper – ucase / lower – lcase : Converts ASCII Latin symbols in a string to uppercase ( upper – ucase) or lowercase ( lower – lcase )
SELECT upper('Hello!') AS UpperCase, lower('Hello!') AS LowerCase ┌─UpperCase─┬─LowerCase─┐ │ HELLO! │ hello! │ └───────────┴───────────┘
- repeat : Repeats a string whatever you want.
- reverse: reverse a string.
- format : Formatting constant pattern with the string listed in the arguments.
- concat : Concatenates the strings listed in the arguments, without a separator.
You can combine two or more functions to work together. In the example below, I used the repeat and reverse functions individually and together.
The “format” function used indexes to sort arguments. On the other hand, the “concat” function basically concatenates the parameters.
SELECT repeat('hello', 5) AS repeatFiveTimes, reverse('hello') AS reverseFunction, repeat(reverse('hello'), 5) ┌─repeatFiveTimes───────────┬─reverseFunction─┬─repeat(reverse('hello'), 5)─┐ │ hellohellohellohellohello │ olleh │ olleholleholleholleholleh │ └───────────────────────────┴─────────────────┴─────────────────────────────┘
SELECT format('{1} {0} {3} {2}', 'World!', 'Hello', 'ClickHouse', 'Welcome to') ┌─format('{1} {0} {3} {2}', 'World!', 'Hello', 'ClickHouse', 'Welcome to')─┐ │ Hello World! Welcome to ClickHouse │ └──────────────────────────────────────────────────────────────────────────┘ SELECT concat('Hello ', 'World! ', 'Welcome to ', 'ClickHouse') ┌─concat('Hello ', 'World! ', 'Welcome to ', 'ClickHouse')─┐ │ Hello World! Welcome to ClickHouse │ └──────────────────────────────────────────────────────────┘
- substring – substr : return a substring of a given string from the offset with length.
- startsWith / endsWith : Returns 1 whether string starts(startswith) or ends(endswith) with the specified prefix, otherwise it returns 0.
In the “substr” example, the first offset index is 1, so it starts from the first character and returns the following five characters (which is Hello).
Another example is the “startsWith” and “endsWith” functions. Functions return true if the given string starts with “https” and ends with “.com“.
SELECT substr('Hello World!', 1, 5) ┌─substring('Hello World!', 1, 5)─┐ │ Hello │ └─────────────────────────────────┘
SET param_address='https://www.google.com' SELECT {address:String} WHERE startsWith({address:String}, 'https') AND endsWith({address:String}, '.com') ┌─_CAST('https://www.google.com', 'String')─┐ │ https://www.google.com │ └───────────────────────────────────────────┘
- trim : Removes all specified characters from the start, end or both of a string
- trimLeft. : Removes all consecutive occurrences of common whitespace (ASCII character 32) from the beginning of a string
- trimRight : Removes all consecutive occurrences of common whitespace (ASCII character 32) from the end of a string
- trimBoth : Removes all consecutive occurrences of common whitespace (ASCII character 32) from both ends of a string
In the following example, it trims the “https:” pattern from the start of the string. On the other hand, “trimLeft“, “trimRight” and “trimBoth” functions remove only whitespace.
SET param_address='https://www.google.com'; select trim(LEADING 'https://' FROM {address:String}) as trimExample; ┌─trimExample────┐ │ www.google.com │ └────────────────┘
SELECT trimLeft(' Hello, world! ') AS LeftTrim, trimRight(' Hello, world! ') AS RightTrim, trimBoth(' Hello, world! ') AS BothTrim ┌─LeftTrim───────────┬─RightTrim──────────┬─BothTrim──────┐ │ Hello, world! │ Hello, world! │ Hello, world! │ └────────────────────┴────────────────────┴───────────────┘
- normalizeQuery : Provides to replace literals and lists of literals in SQL query to question marks.
- normalizedQueryHash : Provides a function to calculate a hash of query structure without the values of literals.
SELECT normalizeQuery('select registration,firstseen from opensky where typecode=\'B734\' order by 1 limit 5') FORMAT Vertical Row 1: ────── normalizeQuery('select registration,firstseen from opensky where typecode=\'B734\' order by 1 limit 5'): select registration,firstseen from opensky where typecode=? order by ? limit ?
In the following queries, “typecode” values are different but query hash values are exactly the same.
SELECT normalizedQueryHash('select registration,typecode,firstseen,lastseen from opensky where typecode=\'PC12\' limit 5') ┌─normalizedQueryHash('select registration,typecode,firstseen,lastseen from opensky where typecode=\'PC12\' limit 5')─┐ │ 16603946898512661700 │ └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ SELECT normalizedQueryHash('select registration,typecode,firstseen,lastseen from opensky where typecode=\'B734\' limit 5') ┌─normalizedQueryHash('select registration,typecode,firstseen,lastseen from opensky where typecode=\'B734\' limit 5')─┐ │ 16603946898512661700 │ └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
I would like to explain general string functions in ClickHouse. In the following article, I’ll explain searching and replacing string functions as well.
For further details about string functions in ClickHouse, please visit here.