Image Courtesy – Pexels
ClickHouse is an open-source columnar database meant for online analytical processing workloads. We have covered how to set up ClickHouse using Docker in this post. In this article, we will cover the following.
- Using Docker Compose to run ClickHouse
- Mounting data directory
- Mounting config file path
- Creating a bridge network for multi-node ClickHouse.
“Docker Compose is a tool for running multi-container applications on Docker defined using the Compose file format. A Compose file defines how one or more containers that make up your application are configured. Once you have a Compose file, you can create and start your application with a single command: docker compose up. ” – Official Github page of Docker Compose
Running ClickHouse using docker-compose has many benefits, especially for multi-node cluster setups and Kafka based setups. We can run all the necessary containers in one go, and networking between the nodes will be easier using a bridge network. Let us look at a simple docker-compose file, and the explanations for the contents declared in the file.
version: '3' services: clickhouse: image: clickhouse/clickhouse-server volumes: - /mnt/Data/ClickHouse/Blog/ch_docker_part2/data_dir:/var/lib/clickhouse - /mnt/Data/ClickHouse/Blog/ch_docker_part2/clickhouse_config.xml:/etc/clickhouse-server/config.xml ports: - "8002:9000" - "9123:8123" ulimits: nproc: 65535 nofile: soft: 262144 hard: 262144 networks: - ch_network networks: ch_network: driver: bridge
The containers are defined inside the services section. We have defined a container named ‘clickhouse.’ It will be based on the ‘clickhouse/clickhouse-server’ image available in the docker hub.
A file or a directory in the host machine can be mounted to a container. This can be useful to persist the changes in the container. The volumes section in the compose file defines the container’s host path and destination path.
volumes: - /mnt/Data/ClickHouse/Blog/ch_docker_part2/data_dir:/var/lib/clickhouse - /mnt/Data/ClickHouse/Blog/ch_docker_part2/clickhouse_config.xml:/etc/clickhouse-server/config.xml
In this example, we are mounting the data directory and the config file from the host machine to the container.
Docker supports software bridge networks which help containers inside the same bridge to communicate. The containers outside the bridge cannot communicate with the ones inside the network directly without extra configurations (overlay network). The Bridge network is defined here in the file.
networks: ch_network: driver: bridge
Port mapping is used to access the containers via a port in the host machine. Requests made to the host port is diverted to the mapped port in the container.
ports: - "8002:9000" - "9123:8123"
Running the ClickHouse
We can start the container using docker-compose up command.
vijay@chistadata:/mnt/Data/ClickHouse/Blog/ch_docker_part2$ sudo docker-compose up Creating network "ch_docker_part2_ch_network" with driver "bridge" Pulling clickhouse (clickhouse/clickhouse-server:)... latest: Pulling from clickhouse/clickhouse-server eaead16dc43b: Pull complete 95c72a7c9a77: Pull complete d8c6c7be5640: Pull complete 92d52b704d74: Pull complete 8bb19296fdc9: Pull complete 95f923355d9a: Pull complete 15ecb3e85677: Pull complete f892ab7ded43: Pull complete 4f4fb700ef54: Pull complete Digest: sha256:3737db889711e7b19b199428a07931ed795f9a8e3561a4b323e5d395633ee638 Status: Downloaded newer image for clickhouse/clickhouse-server:latest Creating ch_docker_part2_clickhouse_1 ... done Attaching to ch_docker_part2_clickhouse_1 clickhouse_1 | Processing configuration file '/etc/clickhouse-server/config.xml'. clickhouse_1 | Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'. clickhouse_1 | Logging trace to /var/log/clickhouse-server/clickhouse-server.log clickhouse_1 | Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log clickhouse_1 | Processing configuration file '/etc/clickhouse-server/config.xml'. clickhouse_1 | Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'. clickhouse_1 | Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/config.xml'. clickhouse_1 | Processing configuration file '/etc/clickhouse-server/users.xml'. clickhouse_1 | Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/users.xml'.
We can verify using the docker container ls command
vijay@chistadata:/mnt/Data/ClickHouse/Blog/ch_docker_part2$ sudo docker container ls [sudo] password for vijay: CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 66eb58a15b53 clickhouse/clickhouse-server "/entrypoint.sh" 2 minutes ago Up 2 minutes 9009/tcp, 0.0.0.0:9123->8123/tcp, :::9123->8123/tcp, 0.0.0.0:8002->9000/tcp, :::8002->9000/tcp ch_docker_part2_clickhouse_1
Drop into the root shell of the container using the below command.
vijay@chistadata:/mnt/Data/ClickHouse/Blog/ch_docker_part2$ sudo docker exec -it 66e bash [sudo] password for vijay: root@66eb58a15b53:/#
Connect to the server via clickhouse-client tool.
root@66eb58a15b53:/# clickhouse-client -m ClickHouse client version 22.214.171.124 (official build). Connecting to localhost:9000 as user default. Connected to ClickHouse server version 22.11.2 revision 54460. Warnings: * Effective user of the process (clickhouse) does not match the owner of the data (root). 66eb58a15b53 :)
Loading a dataset
Let us load a dataset into the newly spun-up ClickHouse container. We will make use of the UK property Price Paid dataset. The details are available in the official ClickHouse documentation. Let us create a table to load the data.
CREATE TABLE uk_price_paid ( price UInt32, date Date, postcode1 LowCardinality(String), postcode2 LowCardinality(String), type Enum8('terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4, 'other' = 0), is_new UInt8, duration Enum8('freehold' = 1, 'leasehold' = 2, 'unknown' = 0), addr1 String, addr2 String, street LowCardinality(String), locality LowCardinality(String), town LowCardinality(String), district LowCardinality(String), county LowCardinality(String) ) ENGINE = MergeTree ORDER BY (postcode1, postcode2, addr1, addr2);
Insert the data into the table once it is created.
INSERT INTO uk_price_paid WITH splitByChar(' ', postcode) AS p SELECT toUInt32(price_string) AS price, parseDateTimeBestEffortUS(time) AS date, p AS postcode1, p AS postcode2, transform(a, ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type, b = 'Y' AS is_new, transform(c, ['F', 'L', 'U'], ['freehold', 'leasehold', 'unknown']) AS duration, addr1, addr2, street, locality, town, district, county FROM url( 'http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv', 'CSV', 'uuid_string String, price_string String, time String, postcode String, a String, b String, c String, addr1 String, addr2 String, street String, locality String, town String, district String, county String, d String, e String' ) SETTINGS max_http_get_redirects=10;
The insert operation could take a while since the uncompressed data is around 4 GB. Run the following queries to test the inserted data.
SELECT count() FROM uk_price_paid Query id: acfc4af9-4688-4457-8648-bb90e83c7943 ┌──count()─┐ │ 27734966 │ └──────────┘ 1 row in set. Elapsed: 0.001 sec.
SELECT formatReadableSize(total_bytes) FROM system.tables WHERE name = 'uk_price_paid' Query id: af6d6f6b-9dba-44af-beb0-0dbd89437361 ┌─formatReadableSize(total_bytes)─┐ │ 298.19 MiB │ └─────────────────────────────────┘ 1 row in set. Elapsed: 0.001 sec. 66eb58a15b53 :)
We have seen how to run ClickHouse using a docker-compose. We created a container, mounted data directories, and created a bridge network. As a continuation of this series, you can refer to the below articles for more practical use cases.