ClickHouse supports user-defined functions (UDF) that can either be created based on the built-in functions in ClickHouse or from an external script or program. ClickHouse uses STDIN and STDOUT data streams to communicate with the program/script. Let us focus on ClickHouse UDFs from an external Python program in this example. For the sake of simplicity, let us create a Python function that can calculate the volume of a cylinder.
Step 1 – Create the UDF Python script
The following Python script will be used to create the Clickhouse UDF which will calculate the volume of the cylinder.
#!/usr/bin/python3 import math import sys if __name__ == '__main__': for line in sys.stdin: radius, height = line.split('\t') radius = float(radius) height = float(height) res = str(radius * radius * height * math.pi) print(res, end='') sys.stdout.flush()
Save the script as volume_cylinder.py in /var/lib/clickhouse/user_scripts directory.
Both the height and the radius of the cylinder are received from STDIN. Once the file is ready, convert it as an executable.
chmod +x volume_cylinder.py
Step 2 – Add Function Configuration
Note down the setting value for user_defined_executable_functions_config in the config.xml file. You may have something similar to below.
<user_defined_executable_functions_config>*_function.*ml</user_defined_executable_functions_config>
This is the path where the UDF config XML is stored. For the above example setting, the config file will be in /etc/clickhouse-server/*_function.*ml.
Let us create an XML file called /etc/clickhouse-server/udf_cylinder_volume_function.xml with the below contents.
<functions> <function> <type>executable</type> <name>cylinder_volume</name> <return_type>String</return_type> <argument><type>String</type></argument> <argument><type>String</type></argument> <format>TabSeparated</format> <command>volume_cylinder.py</command> <execute_direct>1</execute_direct> </function> </functions>
We are creating a ClickHouse UDF named cylinder_volume based on the volume_cylinder.py file. The descriptions for the configurations used are below.
- type – executable or executable_pool
- name – the name of the UDF in ClickHouse
- return_type – The data type of the returned Value
- argument – Arguments and data type for the argument that will be passed via STDIN to the function script
- format – One of the supported data formats in ClickHouse
- command – Script name or command to execute
- execute_direct – If true, the script is executed from the user_script folder, else the command is passed as an argument for bin/sh -c
Step 3 – Test the UDF
Once the configs and the scripts are in place, try calling the UDF from ClickHouse SQL.
SELECT cylinder_volume(2,3);
References
https://clickhouse.com/docs/en/sql-reference/functions/udf
https://stackoverflow.com/questions/71236415/how-to-send-multiple-arguments-to-executable-udf-in-clickhouse