Running ClickHouse with Docker: Part 2

Introduction

ClickHouse is an open-source columnar database designed for online analytical processing (OLAP) workloads. In the first part of this series, we covered how to set up ClickHouse using Docker. In this article, we will cover the following:

  • Using Docker Compose to run ClickHouse
  • Mounting data directory
  • Mounting config file path
  • Creating a bridge network for multi-node ClickHouse

Docker Compose

“Docker Compose is a tool for running multi-container applications on Docker defined using the Compose file format. A Compose file defines how one or more containers that make up your application are configured. Once you have a Compose file, you can create and start your application with a single command: docker compose up.” — Official GitHub page of Docker Compose

Running ClickHouse using docker-compose has many benefits, especially for multi-node cluster setups and Kafka-based setups. We can start all the necessary containers in one go, and networking between the nodes is easier using a bridge network. Let us look at a simple docker-compose file and the explanation of its contents.

version: '3'

services:
  clickhouse:
    image: clickhouse/clickhouse-server
    volumes:
      - /mnt/Data/ClickHouse/Blog/ch_docker_part2/data_dir:/var/lib/clickhouse
      - /mnt/Data/ClickHouse/Blog/ch_docker_part2/clickhouse_config.xml:/etc/clickhouse-server/config.xml
    ports:
      - "8002:9000"
      - "9123:8123"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    networks:
      - ch_network
     

networks:
  ch_network:
    driver: bridge

The containers are defined inside the services section. We have defined a single service named ‘clickhouse’, based on the ‘clickhouse/clickhouse-server’ image available on Docker Hub.
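Since no tag is specified, Docker pulls the latest tag by default. For reproducible setups, it can be a good idea to pin a specific version instead; a minimal sketch (the tag below is just an example, matching the server version used later in this post):

```yaml
services:
  clickhouse:
    # pinning an explicit tag avoids surprise upgrades on the next pull
    image: clickhouse/clickhouse-server:22.11
```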

Bind Mounts

A file or a directory on the host machine can be mounted into a container. This is useful for persisting data beyond the container's lifecycle. Each entry in the volumes section of the Compose file maps a host path to a destination path inside the container.

volumes:
  - /mnt/Data/ClickHouse/Blog/ch_docker_part2/data_dir:/var/lib/clickhouse 
  - /mnt/Data/ClickHouse/Blog/ch_docker_part2/clickhouse_config.xml:/etc/clickhouse-server/config.xml

In this example, we are mounting the data directory and the config file from the host machine to the container.
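If persisting to a specific host path is not required, Docker named volumes are an alternative to bind mounts; a minimal sketch, with an illustrative volume name:

```yaml
services:
  clickhouse:
    image: clickhouse/clickhouse-server
    volumes:
      # 'ch_data' is a named volume managed by Docker,
      # rather than a path mapped from the host machine
      - ch_data:/var/lib/clickhouse

volumes:
  ch_data:
```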

Bridge Network

Docker supports software bridge networks, which allow containers attached to the same bridge to communicate with one another. Containers outside the bridge cannot reach those inside it directly without additional configuration (such as an overlay network). The bridge network is defined in the networks section of the file.

networks: 
  ch_network:
     driver: bridge
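For a multi-node setup, additional ClickHouse services can simply join the same network; containers on the same bridge can reach each other by service name. A sketch with hypothetical node names:

```yaml
services:
  clickhouse1:
    image: clickhouse/clickhouse-server
    networks:
      - ch_network
  clickhouse2:
    image: clickhouse/clickhouse-server
    networks:
      - ch_network
    # clickhouse2 can reach clickhouse1 at hostname 'clickhouse1' on this network

networks:
  ch_network:
    driver: bridge
```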

Port mapping

Port mapping is used to access the containers via ports on the host machine. Each mapping is in host:container format; requests made to the host port are forwarded to the mapped port in the container. Here, host port 8002 maps to the native TCP port 9000, and host port 9123 maps to the HTTP interface port 8123.

ports: 
  - "8002:9000"
  - "9123:8123"
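If the default ports are free on the host, the mapping can also simply mirror the container ports, so that clients connect without specifying a custom port:

```yaml
ports:
  - "9000:9000"   # native TCP protocol (used by clickhouse-client)
  - "8123:8123"   # HTTP interface
```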

Running ClickHouse

We can start the container using the docker-compose up command.

vijay@chistadata:/mnt/Data/ClickHouse/Blog/ch_docker_part2$ sudo docker-compose up
Creating network "ch_docker_part2_ch_network" with driver "bridge"
Pulling clickhouse (clickhouse/clickhouse-server:)...
latest: Pulling from clickhouse/clickhouse-server
eaead16dc43b: Pull complete
95c72a7c9a77: Pull complete
d8c6c7be5640: Pull complete
92d52b704d74: Pull complete
8bb19296fdc9: Pull complete
95f923355d9a: Pull complete
15ecb3e85677: Pull complete
f892ab7ded43: Pull complete
4f4fb700ef54: Pull complete
Digest: sha256:3737db889711e7b19b199428a07931ed795f9a8e3561a4b323e5d395633ee638
Status: Downloaded newer image for clickhouse/clickhouse-server:latest
Creating ch_docker_part2_clickhouse_1 ... done
Attaching to ch_docker_part2_clickhouse_1
clickhouse_1  | Processing configuration file '/etc/clickhouse-server/config.xml'.
clickhouse_1  | Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'.
clickhouse_1  | Logging trace to /var/log/clickhouse-server/clickhouse-server.log
clickhouse_1  | Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
clickhouse_1  | Processing configuration file '/etc/clickhouse-server/config.xml'.
clickhouse_1  | Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'.
clickhouse_1  | Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/config.xml'.
clickhouse_1  | Processing configuration file '/etc/clickhouse-server/users.xml'.
clickhouse_1  | Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/users.xml'.

We can verify that the container is running using the docker container ls command.

vijay@chistadata:/mnt/Data/ClickHouse/Blog/ch_docker_part2$ sudo docker container ls
[sudo] password for vijay: 
CONTAINER ID   IMAGE                          COMMAND            CREATED         STATUS         PORTS                                                                                            NAMES
66eb58a15b53   clickhouse/clickhouse-server   "/entrypoint.sh"   2 minutes ago   Up 2 minutes   9009/tcp, 0.0.0.0:9123->8123/tcp, :::9123->8123/tcp, 0.0.0.0:8002->9000/tcp, :::8002->9000/tcp   ch_docker_part2_clickhouse_1

Drop into a root shell inside the container using the following command.

vijay@chistadata:/mnt/Data/ClickHouse/Blog/ch_docker_part2$ sudo docker exec -it 66e bash
[sudo] password for vijay: 
root@66eb58a15b53:/# 

Connect to the server using the clickhouse-client tool.

root@66eb58a15b53:/# clickhouse-client -m
ClickHouse client version 22.11.2.30 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 22.11.2 revision 54460.

Warnings:
 * Effective user of the process (clickhouse) does not match the owner of the data (root).

66eb58a15b53 :) 

Loading a dataset into ClickHouse

Let us load a dataset into the newly spun-up ClickHouse container. We will use the UK Property Price Paid dataset; the details are available in the official ClickHouse documentation. Let us create a table to hold the data.

CREATE TABLE uk_price_paid
(
    price UInt32,
    date Date,
    postcode1 LowCardinality(String),
    postcode2 LowCardinality(String),
    type Enum8('terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4, 'other' = 0),
    is_new UInt8,
    duration Enum8('freehold' = 1, 'leasehold' = 2, 'unknown' = 0),
    addr1 String,
    addr2 String,
    street LowCardinality(String),
    locality LowCardinality(String),
    town LowCardinality(String),
    district LowCardinality(String),
    county LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (postcode1, postcode2, addr1, addr2);

Insert the data into the table once it is created.

INSERT INTO uk_price_paid
WITH
   splitByChar(' ', postcode) AS p
SELECT
    toUInt32(price_string) AS price,
    parseDateTimeBestEffortUS(time) AS date,
    p[1] AS postcode1,
    p[2] AS postcode2,
    transform(a, ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type,
    b = 'Y' AS is_new,
    transform(c, ['F', 'L', 'U'], ['freehold', 'leasehold', 'unknown']) AS duration,
    addr1,
    addr2,
    street,
    locality,
    town,
    district,
    county
FROM url(
    'http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv',
    'CSV',
    'uuid_string String,
    price_string String,
    time String,
    postcode String,
    a String,
    b String,
    c String,
    addr1 String,
    addr2 String,
    street String,
    locality String,
    town String,
    district String,
    county String,
    d String,
    e String'
) SETTINGS max_http_get_redirects=10;

The insert operation could take a while, since the uncompressed data is around 4 GB. Run the following queries to verify the inserted data.

SELECT count()
FROM uk_price_paid

Query id: acfc4af9-4688-4457-8648-bb90e83c7943

┌──count()─┐
│ 27734966 │
└──────────┘

1 row in set. Elapsed: 0.001 sec.

SELECT formatReadableSize(total_bytes)
FROM system.tables
WHERE name = 'uk_price_paid'

Query id: af6d6f6b-9dba-44af-beb0-0dbd89437361

┌─formatReadableSize(total_bytes)─┐
│ 298.19 MiB                      │
└─────────────────────────────────┘

1 row in set. Elapsed: 0.001 sec. 

66eb58a15b53 :) 
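With the data in place, the dataset can now be explored; for example, the official documentation's query for the average price per year (a sketch; output omitted here):

```sql
SELECT
    toYear(date) AS year,
    round(avg(price)) AS price
FROM uk_price_paid
GROUP BY year
ORDER BY year ASC;
```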

Conclusion

We have seen how to run ClickHouse using Docker Compose. We created a container, mounted the data directory and config file, and created a bridge network. As a continuation of this series, you can refer to the follow-up articles for more practical use cases.

References

https://clickhouse.com/docs/en/getting-started/example-datasets/uk-price-paid
https://docs.docker.com/network/bridge/