In the previous part of this series, we covered the AWS S3 configuration needed by ClickHouse. As you may remember from the first part, we created a folder under the S3 bucket and noted its URL, as well as the access key and secret access key of the IAM user.
Configuration Steps
Now it is time to configure ClickHouse to use S3 as a disk.
- To do that, we first need to create a storage.xml file under the ClickHouse configuration directory, which is “/etc/clickhouse-server/config.d/” by default. The XML file should look like this:
<clickhouse>
    <storage_configuration>
        <disks>
            <s3_disk>
                <type>s3</type>
                <endpoint>https://YOUR_S3_URL/</endpoint>
                <access_key_id>YOUR_ACCESS_KEY</access_key_id>
                <secret_access_key>YOUR_SECRET_KEY</secret_access_key>
                <metadata_path>/var/lib/clickhouse/disks/s3_disk/</metadata_path>
                <cache_enabled>true</cache_enabled>
                <data_cache_enabled>true</data_cache_enabled>
                <cache_path>/var/lib/clickhouse/disks/s3_disk/cache/</cache_path>
            </s3_disk>
        </disks>
        <policies>
            <s3_policy>
                <volumes>
                    <main>
                        <disk>s3_disk</disk>
                    </main>
                </volumes>
            </s3_policy>
        </policies>
    </storage_configuration>
</clickhouse>
Here, the endpoint, access_key_id and secret_access_key values come from the S3 configuration step in the previous part. The XML file must be readable by the “clickhouse” user.
- After creating this XML file, we need to restart the ClickHouse instance for the changes to take effect (see the note after the command for a restart-free alternative).
service clickhouse-server restart
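As a side note, recent ClickHouse versions can often pick up newly added disks and policies without a full restart by reloading the configuration from a client session. This is version-dependent behaviour, so treat it as an optional shortcut and fall back to a restart if the disk does not appear:

SYSTEM RELOAD CONFIG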
- To check if “s3_disk” was created successfully, we need to connect to ClickHouse and run the following query:
SELECT name, path
FROM system.disks
WHERE name = 's3_disk'

┌─name────┬─path───────────────────────────────┐
│ s3_disk │ /var/lib/clickhouse/disks/s3_disk/ │
└─────────┴────────────────────────────────────┘
- We also need to check the storage policy with the command below:
SELECT policy_name, volume_name, disks
FROM system.storage_policies

┌─policy_name─┬─volume_name─┬─disks───────┐
│ default     │ default     │ ['default'] │
│ s3_policy   │ main        │ ['s3_disk'] │
└─────────────┴─────────────┴─────────────┘
- Here you can see that “s3_disk” is attached to the “s3_policy” policy, which means we can create a table with this policy. The table creation script using “s3_policy” is as follows (a verification query is shown right after it):
CREATE TABLE myS3Table
(
    `id` UInt64,
    `name` String
)
ENGINE = MergeTree
ORDER BY tuple()
SETTINGS storage_policy = 's3_policy'
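Optionally, you can confirm that the table picked up the policy by querying system.tables (a minimal check, assuming your ClickHouse version exposes the storage_policy column there):

SELECT name, storage_policy
FROM system.tables
WHERE database = currentDatabase() AND name = 'myS3Table'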
- This table now stores its data in the S3 object store. Let’s insert a row and query it; a check of where the resulting data part landed follows below.
INSERT INTO myS3Table VALUES (1, 'ChistaDATA');

SELECT * FROM myS3Table

┌─id─┬─name───────┐
│  1 │ ChistaDATA │
└────┴────────────┘
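To verify that the inserted data actually landed on the S3-backed disk, you can inspect the table’s active parts in system.parts (a quick check; the disk_name column should report s3_disk for this table):

SELECT name, disk_name, rows
FROM system.parts
WHERE database = currentDatabase() AND table = 'myS3Table' AND active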
Now you can write data to and read it from S3 directly, with S3 acting as the table’s data storage.
With the help of this series of articles, you should be able to configure the S3 object store as a ClickHouse disk and use it for your tables.