ClickHouse supports user-defined functions (UDF) that can either be created based on the built-in functions in ClickHouse or from an external script or program. ClickHouse uses STDIN and STDOUT data streams to communicate with the program/script. Let us focus on ClickHouse UDFs from an external Python program in this example. For the sake of simplicity, let us create a Python function that can calculate the volume of a cylinder.
Step 1 – Create the UDF Python script
The following Python script will be used to create the Clickhouse UDF which will calculate the volume of the cylinder.
#!/usr/bin/python3 import math import sys if __name__ == '__main__': for line in sys.stdin: radius, height = line.split('\t') radius = float(radius) height = float(height) res = str(radius * radius * height * math.pi) print(res, end='') sys.stdout.flush()
Save the script as volume_cylinder.py in /var/lib/clickhouse/user_scripts directory.
Both the height and the radius of the cylinder are received from STDIN. Once the file is ready, convert it as an executable.
chmod +x volume_cylinder.py
Step 2 – Add Function Configuration
Note down the setting value for user_defined_executable_functions_config in the config.xml file. You may have something similar to below.
This is the path where the UDF config XML is stored. For the above example setting, the config file will be in /etc/clickhouse-server/*_function.*ml.
Let us create an XML file called /etc/clickhouse-server/udf_cylinder_volume_function.xml with the below contents.
<functions> <function> <type>executable</type> <name>cylinder_volume</name> <return_type>String</return_type> <argument><type>String</type></argument> <argument><type>String</type></argument> <format>TabSeparated</format> <command>volume_cylinder.py</command> <execute_direct>1</execute_direct> </function> </functions>
We are creating a ClickHouse UDF named cylinder_volume based on the volume_cylinder.py file. The descriptions for the configurations used are below.
- type – executable or executable_pool
- name – the name of the UDF in ClickHouse
- return_type – The data type of the returned Value
- argument – Arguments and data type for the argument that will be passed via STDIN to the function script
- format – One of the supported data formats in ClickHouse
- command – Script name or command to execute
- execute_direct – If true, the script is executed from the user_script folder, else the command is passed as an argument for bin/sh -c
Step 3 – Test the UDF
Once the configs and the scripts are in place, try calling the UDF from ClickHouse SQL.