1. Home
  2. Knowledge Base
  3. ClickHouse
  4. ClickHouse String Functions – Part I
  1. Home
  2. Knowledge Base
  3. ClickHouse DBA
  4. ClickHouse String Functions – Part I
  1. Home
  2. Knowledge Base
  3. ClickHouse Performance
  4. ClickHouse String Functions – Part I

ClickHouse String Functions – Part I

Introduction

ClickHouse has three types of functions for string operations. These are:

  • General Functions
  • Searching Functions
  • Replacing Functions

I aimed to explain the most commonly used general string functions with examples in this article.

General String Functions

I used the “Air Traffic Dataset” for testing the queries. You can download and install it by just clicking here.

Here are some recently used general functions with simple explanations.

  • length – lengthUTF8 : Returns the length of a string in bytes (for length) and in Unicode code points.
  • empty – notEmpty : Checks whether the input string is empty or not.

In this example, you can check if a column is empty or not using both the “length” and “notEmpty” functions. Similarly, you can find empty columns in tables.

SELECT
    callsign,
    icao24,
    registration,
    destination
FROM opensky
WHERE (length(destination) > 0) AND notEmpty(registration)
LIMIT 10

┌─callsign─┬─icao24─┬─registration─┬─destination─┐
│ N309MH   │ a341dc │ N309MH       │ 00AK        │
│ N51446   │ a6734e │ N51446       │ 00AK        │
│ N907XL   │ ac8b45 │ N907XL       │ 00AK        │
│ CGUNH    │ c07ad2 │ C-GUNH       │ 00AL        │
│ N171GS   │ a11d8b │ N171GS       │ 00AL        │
│ N222KD   │ a1e954 │ N222KD       │ 00AL        │
│ N321JB   │ a372d1 │ N321JB       │ 00AL        │
│ N425PJ   │ a50f80 │ N425PJ       │ 00AL        │
│ N477PS   │ a5dc71 │ N477PS       │ 00AL        │
│ N4799X   │ a5e632 │ N4799X       │ 00AL        │
└──────────┴────────┴──────────────┴─────────────┘

 

  • leftPad – leftPadUTF8 / rightPad – rightPadUTF8 : Pads the given string for the right(rightPad – rightPadUTF8) or left (leftPad – leftPadUTF8) with a blank or specified string or character.
SELECT
    rightPad('Hello!', 10, '*') AS RightPad,
    leftPad('Hello!', 10, '*') AS LeftPad

┌─RightPad───┬─LeftPad────┐
│ Hello!**** │ ****Hello! │
└────────────┴────────────┘

 

  • upper – ucase / lower – lcase : Converts ASCII Latin symbols in a string to uppercase ( upper – ucase) or lowercase ( lower – lcase )
SELECT
    upper('Hello!') AS UpperCase,
    lower('Hello!') AS LowerCase

┌─UpperCase─┬─LowerCase─┐
│ HELLO!    │ hello!    │
└───────────┴───────────┘

 

  • repeat : Repeats a string whatever you want.
  • reverse: reverse a string.
  • format : Formatting constant pattern with the string listed in the arguments.
  • concat : Concatenates the strings listed in the arguments, without a separator.

You can combine two or more functions to work together. In the example below, I used the repeat and reverse functions individually and together.

The “format” function used indexes to sort arguments. On the other hand, the “concat” function basically concatenates the parameters.

SELECT
    repeat('hello', 5) AS repeatFiveTimes,
    reverse('hello') AS reverseFunction,
    repeat(reverse('hello'), 5)


┌─repeatFiveTimes───────────┬─reverseFunction─┬─repeat(reverse('hello'), 5)─┐
│ hellohellohellohellohello │ olleh           │ olleholleholleholleholleh   │
└───────────────────────────┴─────────────────┴─────────────────────────────┘
SELECT format('{1} {0} {3} {2}', 'World!', 'Hello', 'ClickHouse', 'Welcome to')


┌─format('{1} {0} {3} {2}', 'World!', 'Hello', 'ClickHouse', 'Welcome to')─┐
│ Hello World! Welcome to ClickHouse                                       │
└──────────────────────────────────────────────────────────────────────────┘


SELECT concat('Hello ', 'World! ', 'Welcome to ', 'ClickHouse')


┌─concat('Hello ', 'World! ', 'Welcome to ', 'ClickHouse')─┐
│ Hello World! Welcome to ClickHouse                       │
└──────────────────────────────────────────────────────────┘

 

  • substring – substr : return a substring of a given string from the offset with length.
  • startsWith / endsWith : Returns 1 whether string starts(startswith) or ends(endswith) with the specified prefix, otherwise it returns 0.

In the “substr” example, the first offset index is 1, so it starts from the first character and returns the following five characters (which is Hello).

Another example is the “startsWith” and “endsWith” functions. Functions return true if the given string starts with “https” and ends with “.com“.

SELECT substr('Hello World!', 1, 5)

┌─substring('Hello World!', 1, 5)─┐
│ Hello                           │
└─────────────────────────────────┘

 

SET param_address='https://www.google.com'
SELECT {address:String}
WHERE startsWith({address:String}, 'https') AND endsWith({address:String}, '.com')


┌─_CAST('https://www.google.com', 'String')─┐
│ https://www.google.com                    │
└───────────────────────────────────────────┘

 

  • trim : Removes all specified characters from the start, end or both of a string
  • trimLeft. : Removes all consecutive occurrences of common whitespace (ASCII character 32) from the beginning of a string
  • trimRight : Removes all consecutive occurrences of common whitespace (ASCII character 32) from the end of a string
  • trimBoth : Removes all consecutive occurrences of common whitespace (ASCII character 32) from both ends of a string

In the following example, it trims the “https:” pattern from the start of the string. On the other hand, “trimLeft“, “trimRight” and “trimBoth” functions remove only whitespace.

SET param_address='https://www.google.com';

select trim(LEADING 'https://' FROM {address:String}) as trimExample;

┌─trimExample────┐
│ www.google.com │
└────────────────┘
SELECT
    trimLeft('     Hello, world!     ') AS LeftTrim,
    trimRight('     Hello, world!     ') AS RightTrim,
    trimBoth('     Hello, world!     ') AS BothTrim


┌─LeftTrim───────────┬─RightTrim──────────┬─BothTrim──────┐
│ Hello, world!      │      Hello, world! │ Hello, world! │
└────────────────────┴────────────────────┴───────────────┘

 

  • normalizeQuery : Provides to replace literals and lists of literals in SQL query to question marks.
  • normalizedQueryHash : Provides a function to calculate a hash of query structure without the values of literals.
SELECT normalizeQuery('select registration,firstseen from opensky where typecode=\'B734\' order by 1 limit 5')
FORMAT Vertical

Row 1:
──────
normalizeQuery('select registration,firstseen from opensky where typecode=\'B734\' order by 1 limit 5'): select registration,firstseen from opensky where typecode=? order by ? limit ?

In the following queries, “typecode” values are different but query hash values are exactly the same.

SELECT normalizedQueryHash('select registration,typecode,firstseen,lastseen from opensky where typecode=\'PC12\' limit 5')


┌─normalizedQueryHash('select registration,typecode,firstseen,lastseen from opensky where typecode=\'PC12\' limit 5')─┐
│                                                                                                16603946898512661700 │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘


SELECT normalizedQueryHash('select registration,typecode,firstseen,lastseen from opensky where typecode=\'B734\' limit 5')


┌─normalizedQueryHash('select registration,typecode,firstseen,lastseen from opensky where typecode=\'B734\' limit 5')─┐
│                                                                                                16603946898512661700 │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

 

I would like to explain general string functions in ClickHouse. In the following article, I’ll explain searching and replacing string functions as well.

For further details about string functions in ClickHouse, please visit here.

 

 

 

 

 

 

Was this article helpful?

Related Articles

CHISTADATA IS COMMITTED TO OPEN SOURCE SOFTWARE AND BUILDING HIGH PERFORMANCE COLUMNSTORES

In the spirit of freedom, independence and innovation. ChistaDATA Corporation is not affiliated with ClickHouse Corporation 

Need Support?

Can't find the answer you're looking for?
Contact Support

ChistaDATA Inc. Knowledge base is licensed under the Apache License, Version 2.0 (the “License”)

Copyright 2022 ChistaDATA Inc

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.