ClickHouse on Kubernetes: Running ClickHouse Cluster on Amazon EKS

Introduction

Kubernetes orchestration simplifies many common operational concerns such as scheduling, auto-scaling, and failover. Databases that support replication, sharding, and horizontal scaling are generally well suited to Kubernetes, and ClickHouse is no exception: the two work well together.

At ChistaDATA, we are writing a series of blog posts on running ClickHouse on Kubernetes.

This is the second part of the series.

In this blog post, we will walk through the complete installation and configuration of a ClickHouse cluster on Amazon EKS.

Overview of ClickHouse on EKS

To complete the setup, we need to work through the following steps.

  • Creating and configuring the IAM user
  • Installing and configuring the AWS CLI
  • Installing eksctl
  • Installing kubectl
  • Creating the Amazon EKS cluster
  • Creating the add-on for the EBS CSI driver
  • Cloning the cluster configurations
  • Testing the connections and cluster status

Creating and configuring the IAM user

The first step is to create an IAM user with the proper configuration. The user should have AWS Management Console access as well as an access key/secret key pair. Make sure you already have access to the AWS Management Console; if not, use the link to create an account.

Once you are logged in to the AWS console, open the IAM panel and go to the create users section. Enter the user name and select both AWS Management Console access and Access key/Secret key, as shown below.

Then choose AdministratorAccess in the permissions section, as shown below, and click Next.

Then complete the setup. Once you click the user name, you can see the access key and secret key, as shown below.

Now the IAM user is ready. The next step is to configure these credentials in the local terminal using the AWS CLI.

Installing and configuring the AWS CLI

The AWS Command Line Interface (AWS CLI) is a unified tool to manage your AWS services. Follow the steps below to install the AWS CLI client tool. ( link )

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
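
To confirm the installation succeeded, you can check the installed version (the exact version string will differ depending on the release you downloaded):

aws --version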

Make sure you have the latest version of the AWS CLI to avoid compatibility issues. After the installation, configure the account using the command “aws configure”, as shown below.

ubuntu@eksClick:~$ aws configure
AWS Access Key ID [None]: AKIAQY6KMRS4SHJUNVLM
AWS Secret Access Key [None]: xNvmA***********************************p3
Default region name [None]: us-east-2
Default output format [None]: json


ubuntu@eksClick:~$ aws configure list
      Name                    Value             Type    Location
      ----                    -----             ----    --------
   profile                <not set>             None    None
access_key     ****************NVLM shared-credentials-file    
secret_key     ****************XGp3 shared-credentials-file    
    region                us-east-2      config-file    ~/.aws/config
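
As an additional sanity check, “aws sts get-caller-identity” confirms that the CLI is authenticating as the intended IAM user. The account ID and ARN below are placeholders; the output will show your own account ID and user name.

ubuntu@eksClick:~$ aws sts get-caller-identity
{
    "UserId": "AIDAXXXXXXXXXXXXXXXXX",
    "Account": "011111111111",
    "Arn": "arn:aws:iam::011111111111:user/your-iam-user"
}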

Installing eksctl

eksctl is a simple CLI tool for creating and managing clusters on EKS – Amazon’s managed Kubernetes service for EC2. The following steps can be used to install the eksctl client tool. ( link )

curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
eksctl version

Installing kubectl

The Kubernetes command-line tool, kubectl, allows you to run commands against Kubernetes clusters. The following steps can be used to install the “kubectl” client tool.

sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubectl
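
Once the package is installed, a quick client-only version check confirms kubectl is available on the PATH (the reported version depends on the package release):

kubectl version --client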

Creating the Amazon EKS cluster

So far, we have installed and configured the necessary tools. The next step is to create the Amazon EKS cluster using the “eksctl” client tool. The following configuration can be used to create the cluster.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: eks-ClickHouseCluster
  region: us-east-2

managedNodeGroups:
  - name: eks-ClickHouse
    instanceType: t3.xlarge
    desiredCapacity: 1
    volumeSize: 60

privateCluster:
  enabled: false
  skipEndpointCreation: false

Copy this content and store it in a .yaml file. In our case, we created the file “AmazonEKS.yaml” and stored the config there. Once the file is created, we can run the following command to create the cluster.

eksctl create cluster -f AmazonEKS.yaml

This will create the EKS cluster for you. Below, we share the logs for reference.

2022-10-29 18:45:16 [ℹ]  eksctl version 0.115.0-dev+2e9feac31.2022-10-14T12:52:53Z
2022-10-29 18:45:16 [ℹ]  using region us-east-2
2022-10-29 18:45:17 [ℹ]  setting availability zones to [us-east-2c us-east-2b us-east-2a]
2022-10-29 18:45:17 [ℹ]  subnets for us-east-2c - public:192.168.0.0/19 private:192.168.96.0/19
2022-10-29 18:45:17 [ℹ]  subnets for us-east-2b - public:192.168.32.0/19 private:192.168.128.0/19
2022-10-29 18:45:17 [ℹ]  subnets for us-east-2a - public:192.168.64.0/19 private:192.168.160.0/19
2022-10-29 18:45:17 [ℹ]  nodegroup "eks-ClickHouse" will use "" [AmazonLinux2/1.23]
2022-10-29 18:45:17 [ℹ]  using Kubernetes version 1.23
2022-10-29 18:45:17 [ℹ]  creating EKS cluster "eks-ClickHouseCluster" in "us-east-2" region with managed nodes
2022-10-29 18:45:17 [ℹ]  1 nodegroup (eks-ClickHouse) was included (based on the include/exclude rules)
2022-10-29 18:45:17 [ℹ]  will create a CloudFormation stack for cluster itself and 0 nodegroup stack(s)
2022-10-29 18:45:17 [ℹ]  will create a CloudFormation stack for cluster itself and 1 managed nodegroup stack(s)
2022-10-29 18:45:17 [ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-east-2 --cluster=eks-ClickHouseCluster'
2022-10-29 18:45:17 [ℹ]  Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "eks-ClickHouseCluster" in "us-east-2"
2022-10-29 18:45:17 [ℹ]  CloudWatch logging will not be enabled for cluster "eks-ClickHouseCluster" in "us-east-2"
2022-10-29 18:45:17 [ℹ]  you can enable it with 'eksctl utils update-cluster-logging --enable-types={SPECIFY-YOUR-LOG-TYPES-HERE (e.g. all)} --region=us-east-2 --cluster=eks-ClickHouseCluster'
2022-10-29 18:45:17 [ℹ]  
2 sequential tasks: { create cluster control plane "eks-ClickHouseCluster", 
    2 sequential sub-tasks: { 
        wait for control plane to become ready,
        create managed nodegroup "eks-ClickHouse",
    } 
}
2022-10-29 18:45:17 [ℹ]  building cluster stack "eksctl-eks-ClickHouseCluster-cluster"
2022-10-29 18:45:19 [ℹ]  deploying stack "eksctl-eks-ClickHouseCluster-cluster"
2022-10-29 18:45:49 [ℹ]  waiting for CloudFormation stack "eksctl-eks-ClickHouseCluster-cluster"
2022-10-29 18:46:21 [ℹ]  waiting for CloudFormation stack "eksctl-eks-ClickHouseCluster-cluster"
2022-10-29 18:47:22 [ℹ]  waiting for CloudFormation stack "eksctl-eks-ClickHouseCluster-cluster"
2022-10-29 18:48:23 [ℹ]  waiting for CloudFormation stack "eksctl-eks-ClickHouseCluster-cluster"
2022-10-29 18:49:24 [ℹ]  waiting for CloudFormation stack "eksctl-eks-ClickHouseCluster-cluster"
2022-10-29 18:50:25 [ℹ]  waiting for CloudFormation stack "eksctl-eks-ClickHouseCluster-cluster"
2022-10-29 18:51:31 [ℹ]  waiting for CloudFormation stack "eksctl-eks-ClickHouseCluster-cluster"
2022-10-29 18:52:32 [ℹ]  waiting for CloudFormation stack "eksctl-eks-ClickHouseCluster-cluster"
2022-10-29 18:53:33 [ℹ]  waiting for CloudFormation stack "eksctl-eks-ClickHouseCluster-cluster"
2022-10-29 18:54:35 [ℹ]  waiting for CloudFormation stack "eksctl-eks-ClickHouseCluster-cluster"
2022-10-29 18:55:36 [ℹ]  waiting for CloudFormation stack "eksctl-eks-ClickHouseCluster-cluster"
2022-10-29 18:56:38 [ℹ]  waiting for CloudFormation stack "eksctl-eks-ClickHouseCluster-cluster"
2022-10-29 18:58:49 [ℹ]  building managed nodegroup stack "eksctl-eks-ClickHouseCluster-nodegroup-eks-ClickHouse"
2022-10-29 18:58:50 [ℹ]  deploying stack "eksctl-eks-ClickHouseCluster-nodegroup-eks-ClickHouse"
2022-10-29 18:58:50 [ℹ]  waiting for CloudFormation stack "eksctl-eks-ClickHouseCluster-nodegroup-eks-ClickHouse"
2022-10-29 18:59:21 [ℹ]  waiting for CloudFormation stack "eksctl-eks-ClickHouseCluster-nodegroup-eks-ClickHouse"
2022-10-29 19:00:17 [ℹ]  waiting for CloudFormation stack "eksctl-eks-ClickHouseCluster-nodegroup-eks-ClickHouse"
2022-10-29 19:01:26 [ℹ]  waiting for CloudFormation stack "eksctl-eks-ClickHouseCluster-nodegroup-eks-ClickHouse"
2022-10-29 19:01:27 [ℹ]  waiting for the control plane to become ready
2022-10-29 19:01:27 [✔]  saved kubeconfig as "/Users/sakthivel/.kube/config"
2022-10-29 19:01:27 [ℹ]  no tasks
2022-10-29 19:01:27 [✔]  all EKS cluster resources for "eks-ClickHouseCluster" have been created
2022-10-29 19:01:28 [ℹ]  nodegroup "eks-ClickHouse" has 1 node(s)
2022-10-29 19:01:28 [ℹ]  node "ip-192-168-1-0.us-east-2.compute.internal" is ready
2022-10-29 19:01:28 [ℹ]  waiting for at least 1 node(s) to become ready in "eks-ClickHouse"
2022-10-29 19:01:28 [ℹ]  nodegroup "eks-ClickHouse" has 1 node(s)
2022-10-29 19:01:28 [ℹ]  node "ip-192-168-1-0.us-east-2.compute.internal" is ready
2022-10-29 19:01:30 [ℹ]  kubectl command should work with "/Users/sakthivel/.kube/config", try 'kubectl get nodes'
2022-10-29 19:01:30 [✔]  EKS cluster "eks-ClickHouseCluster" in "us-east-2" region is ready

At the end, you should see the “EKS cluster <ClusterName> in <region> is ready” log line.

2022-10-29 19:01:30 [✔]  EKS cluster "eks-ClickHouseCluster" in "us-east-2" region is ready
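
As the log suggests, kubectl should now be able to talk to the new cluster. A quick check confirms the single worker node from the managed node group is registered and Ready (the node name and version below are illustrative and will differ in your environment):

ubuntu@eksClick:~$ kubectl get nodes
NAME                                        STATUS   ROLES    AGE   VERSION
ip-192-168-1-0.us-east-2.compute.internal   Ready    <none>   5m    v1.23.x-eks-xxxxxxx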

The newly created cluster can also be verified from the AWS console.

Creating the add-on for the EBS CSI driver

Now we have the Amazon EKS cluster running. Before configuring the ClickHouse cluster, we need to make sure the EBS CSI driver is configured for it.

The Amazon Elastic Block Store (Amazon EBS) Container Storage Interface (CSI) driver allows Amazon Elastic Kubernetes Service (Amazon EKS) clusters to manage the lifecycle of Amazon EBS volumes for persistent volumes.

The following steps can be used to add the EBS CSI driver to the cluster.

To create the IAM OpenID Connect (OIDC) provider,

eksctl utils associate-iam-oidc-provider --region=us-east-2 --cluster=eks-ClickHouseCluster --approve

To create the IAM role for the service account,

eksctl create iamserviceaccount --name ebs-csi-controller-sa --namespace kube-system --cluster eks-ClickHouseCluster --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy --approve --role-only --role-name AmazonEKS_EBS_CSI_DriverRoleForClickHouse --region us-east-2

To create the add-on,

eksctl create addon --name aws-ebs-csi-driver --cluster eks-ClickHouseCluster --service-account-role-arn arn:aws:iam::01111111111111:role/AmazonEKS_EBS_CSI_DriverRoleForClickHouse --force --region us-east-2

The final output will look like the following:

2022-10-29 23:48:48 [ℹ]  Kubernetes version "1.23" in use by cluster "eks-ClickHouseCluster"
2022-10-29 23:48:48 [ℹ]  using provided ServiceAccountRoleARN "arn:aws:iam::0111111111111:role/AmazonEKS_EBS_CSI_DriverRoleForClickHouse"
2022-10-29 23:48:48 [ℹ]  creating addon
2022-10-29 23:50:25 [ℹ]  addon "aws-ebs-csi-driver" active
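
You can also list the cluster add-ons and their status with eksctl:

eksctl get addon --cluster eks-ClickHouseCluster --region us-east-2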

Note: Make sure to replace the cluster name, region, and AWS account ID in the above commands. You can get the account ID from the AWS console.

You can use the following command to verify the EBS CSI add-on status.

ubuntu@eksClick:~$ kubectl get pods -n kube-system | grep -i ebs
ebs-csi-controller-65d9ff4584-47nhn   6/6     Running   0          36h
ebs-csi-controller-65d9ff4584-ph4gc   6/6     Running   0          36h
ebs-csi-node-kwzlm                    3/3     Running   0          36h

The add-on pods can also be verified from the AWS console.
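
Since the ClickHouse pods will request persistent volumes, it is also worth confirming that the cluster has a default StorageClass. The output below is typical for a new EKS cluster, which ships with a gp2 StorageClass; the EBS CSI driver can additionally serve gp3 classes if you define them.

ubuntu@eksClick:~$ kubectl get storageclass
NAME            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2 (default)   kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   false                  36h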

Cloning the cluster configurations

The next step is to configure the ClickHouse cluster. The configs are publicly available in our repository and can be cloned directly as follows.

ubuntu@eksClick:~$ git clone https://github.com/ChistaDATA/clickhouse_lab.git
Cloning into 'clickhouse_lab'...
remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 9 (delta 1), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (9/9), 29.61 KiB | 286.00 KiB/s, done.
Resolving deltas: 100% (1/1), done.

Once you have cloned the repository, you will see the following files under the folder “clickhouse_lab/ClickHouseCluster”.

ubuntu@eksClick:~$ cd clickhouse_lab/ClickHouseCluster/
ubuntu@eksClick:~/clickhouse_lab/ClickHouseCluster$ 
ubuntu@eksClick:~/clickhouse_lab/ClickHouseCluster$ ls -lrth
total 252K
-rw-rw-r-- 1 ubuntu ubuntu  254 Oct 28 15:13 create-zookeeper
-rw-rw-r-- 1 ubuntu ubuntu  241 Oct 28 15:13 create-operator
-rw-rw-r-- 1 ubuntu ubuntu  199 Oct 28 15:13 create-cluster
-rw-rw-r-- 1 ubuntu ubuntu 1.6K Oct 28 15:13 cluster.yaml
-rw-rw-r-- 1 ubuntu ubuntu 6.1K Oct 28 15:13 zookeeper.yaml
-rw-rw-r-- 1 ubuntu ubuntu 227K Oct 28 15:13 operator.yaml

Here you can see three .yaml files and their corresponding bash scripts. We can execute the bash scripts one by one; each one applies its respective config and builds up the cluster.
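
The helper scripts are thin wrappers around kubectl. The exact contents are in the repository; a minimal sketch of what “create-operator” is expected to do, assuming it simply creates the namespace and applies the operator manifest, looks like this:

#!/bin/bash
# Hypothetical sketch: create the operator namespace and apply the operator manifest into it.
kubectl create namespace chista-operator
kubectl apply -f operator.yaml -n chista-operator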

First, we need to call the operator script, as shown below,

ubuntu@eksClick:~$ ./create-operator 
namespace/chista-operator created
customresourcedefinition.apiextensions.k8s.io/clickhouseinstallations.clickhouse.altinity.com created
customresourcedefinition.apiextensions.k8s.io/clickhouseinstallationtemplates.clickhouse.altinity.com created
customresourcedefinition.apiextensions.k8s.io/clickhouseoperatorconfigurations.clickhouse.altinity.com created
serviceaccount/clickhouse-operator created
clusterrole.rbac.authorization.k8s.io/clickhouse-operator-chista-operator created
clusterrolebinding.rbac.authorization.k8s.io/clickhouse-operator-chista-operator created
configmap/etc-clickhouse-operator-files created
configmap/etc-clickhouse-operator-confd-files created
configmap/etc-clickhouse-operator-configd-files created
configmap/etc-clickhouse-operator-templatesd-files created
configmap/etc-clickhouse-operator-usersd-files created
deployment.apps/clickhouse-operator created
service/clickhouse-operator-metrics created

Secondly, we need to call the zookeeper script,

ubuntu@eksClick:~$ ./create-zookeeper 
namespace/chista-zookeeper created
service/zookeeper created
service/zookeepers created
poddisruptionbudget.policy/zookeeper-pod-disruption-budget created
statefulset.apps/zookeeper created
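
Optionally, before creating the ClickHouse installation, you can confirm the ZooKeeper pod(s) are up in their own namespace (the pod count depends on the replicas defined in zookeeper.yaml):

kubectl get pods -n chista-zookeeper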

Finally, we need to call the cluster script as shown below,

ubuntu@eksClick:~$ ./create-cluster 
clickhouseinstallation.clickhouse.altinity.com/herc created

The ClickHouse cluster is now created. We can verify this using the following command.

ubuntu@eksClick:~$ kubectl get pods -n chista-operator
NAME                                   READY   STATUS    RESTARTS   AGE
chi-herc-herc-cluster-0-0-0            2/2     Running   0          36h
chi-herc-herc-cluster-0-1-0            2/2     Running   0          36h
chi-herc-herc-cluster-1-0-0            2/2     Running   0          36h
chi-herc-herc-cluster-1-1-0            2/2     Running   0          36h
clickhouse-operator-58768cf654-h9nrj   2/2     Running   0          36h

The running ClickHouse pods can also be viewed from the AWS console.
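
If cluster.yaml defines volumeClaimTemplates for the ClickHouse pods (the usual setup for persistent storage), you can also confirm that the EBS CSI driver provisioned and bound the backing volumes:

kubectl get pvc -n chista-operator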

As per the config ( cluster.yaml ), we have defined 2 shards and 2 replicas.

ubuntu@eksClick:~$ less cluster.yaml| grep -i 'repl\|shard'
          shardsCount: 2
          replicasCount: 2

Here,

  • “chi-herc-herc-cluster-0-0-0” and “chi-herc-herc-cluster-1-0-0” are the first replicas of shard 1 and shard 2 respectively.
  • “chi-herc-herc-cluster-0-1-0” and “chi-herc-herc-cluster-1-1-0” are their respective second replicas.

Shard 1:

chi-herc-herc-cluster-0-0-0
  \_chi-herc-herc-cluster-0-1-0

Shard 2:

chi-herc-herc-cluster-1-0-0
  \_chi-herc-herc-cluster-1-1-0

The cluster setup is now complete!
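
Before connecting, it can also be helpful to list the services the operator created for the installation. The operator typically exposes per-pod services plus a load-balanced entry point named after the installation (here “herc”), though the exact names depend on the operator version and templates:

kubectl get svc -n chista-operator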

Testing the connections and cluster status

You can use the following command to log in directly to the ClickHouse shell.

ubuntu@ip-172-31-2-151:~$ kubectl exec -it chi-herc-herc-cluster-0-0-0 -n chista-operator -- clickhouse-client
Defaulted container "clickhouse" out of: clickhouse, clickhouse-log
ClickHouse client version 21.10.5.3 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 21.10.5 revision 54449.

chi-herc-herc-cluster-0-0-0.chi-herc-herc-cluster-0-0.chista-operator.svc.cluster.local :) show databases;

SHOW DATABASES

Query id: f0f16155-9967-479e-b439-e9babdae5799

┌─name────┐
│ default │
│ system  │
└─────────┘

2 rows in set. Elapsed: 0.013 sec.

You can also use the following method to log in to the OS shell first and then to the ClickHouse shell.

ubuntu@ip-172-31-2-151:~$ kubectl exec -it chi-herc-herc-cluster-0-0-0 /bin/bash -n chista-operator           
clickhouse@chi-herc-herc-cluster-0-0-0:/$ 
clickhouse@chi-herc-herc-cluster-0-0-0:/$ clickhouse-client 
ClickHouse client version 21.10.5.3 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 21.10.5 revision 54449.

chi-herc-herc-cluster-0-0-0.chi-herc-herc-cluster-0-0.chista-operator.svc.cluster.local :) show databases;

SHOW DATABASES

Query id: ee1e1378-4881-4c23-b76c-67ee3cc417c8

┌─name────┐
│ default │
│ system  │
└─────────┘

2 rows in set. Elapsed: 0.003 sec.

ClickHouse cluster status:

chi-herc-herc-cluster-0-0-0.chi-herc-herc-cluster-0-0.chista-operator.svc.cluster.local :) SELECT
:-]     cluster,
:-]     shard_num,
:-]     replica_num,
:-]     host_name
:-] FROM system.clusters
:-] WHERE cluster = 'herc-cluster'

SELECT
    cluster,
    shard_num,
    replica_num,
    host_name
FROM system.clusters
WHERE cluster = 'herc-cluster'

Query id: fc87564e-3602-4a35-a7ae-c39a2cd5109d

┌─cluster──────┬─shard_num─┬─replica_num─┬─host_name─────────────────┐
│ herc-cluster │         1 │           1 │ chi-herc-herc-cluster-0-0 │
│ herc-cluster │         1 │           2 │ chi-herc-herc-cluster-0-1 │
│ herc-cluster │         2 │           1 │ chi-herc-herc-cluster-1-0 │
│ herc-cluster │         2 │           2 │ chi-herc-herc-cluster-1-1 │
└──────────────┴───────────┴─────────────┴───────────────────────────┘

4 rows in set. Elapsed: 0.007 sec. 

From the above output, we have 4 nodes overall: two nodes for shard_num “1” and two nodes for shard_num “2”, and we can see the respective replicas as well. So, the configuration is perfect!
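
As an optional functional check (not part of the original setup), you could create a small replicated table across the cluster and confirm it appears on every node. The example below is a sketch: the table name “default.events” is hypothetical, and it assumes the operator's default {shard} and {replica} macros are in place, which is the usual behaviour for clusters created by the clickhouse-operator.

# Create a replicated table on every shard/replica of the cluster.
kubectl exec chi-herc-herc-cluster-0-0-0 -n chista-operator -- clickhouse-client --query \
  "CREATE TABLE default.events ON CLUSTER 'herc-cluster' (id UInt64, ts DateTime) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}') ORDER BY id"

# The table should now exist on all four pods, e.g. on the second replica of the first shard:
kubectl exec chi-herc-herc-cluster-0-1-0 -n chista-operator -- clickhouse-client --query "SHOW TABLES FROM default"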

Conclusion

Hopefully, this blog helps you understand the configuration involved in running a ClickHouse cluster on Amazon EKS. We will continue this series and come up with the final part ( GKE ) soon. Thank you!
