Comprehensive Guide to ClickHouse Data Files

Introduction

The ClickHouse data directory, typically located at /var/lib/clickhouse, holds the table data, metadata, and supporting files that the ClickHouse server needs to run. In this blog post, we will walk through each entry in that directory and explain its purpose.

root@de6077477132:/var/lib/clickhouse# ls -l
total 76
drwxr-x--- 2 clickhouse clickhouse 4096 Jun  1 11:09 access
drwxr-x--- 2 clickhouse clickhouse 4096 Jun  1 11:09 cores
drwxr-x--- 1 clickhouse clickhouse 4096 Jun  1 11:09 data
drwxr-x--- 2 clickhouse clickhouse 4096 Jun  1 11:09 dictionaries_lib
drwxr-x--- 2 clickhouse clickhouse 4096 Jun  1 11:09 flags
drwxr-x--- 2 clickhouse clickhouse 4096 Jun  1 11:09 format_schemas
drwxr-x--- 4 clickhouse clickhouse 4096 Jun  1 11:09 metadata
drwxr-x--- 2 clickhouse clickhouse 4096 Jun  1 11:09 metadata_dropped
drwxr-x--- 1 clickhouse clickhouse 4096 Jun  1 11:09 preprocessed_configs
-rw-r----- 1 clickhouse clickhouse   56 Jun  5 16:36 status
drwxr-x--- 1 clickhouse clickhouse 4096 Jun  5 16:38 store
drwxr-x--- 2 clickhouse clickhouse 4096 Jun  1 11:09 tmp
drwxr-x--- 2 clickhouse clickhouse 4096 Jun  1 11:09 user_defined
drwxr-x--- 2 clickhouse clickhouse 4096 Jun  1 11:09 user_files
drwxr-x--- 2 clickhouse clickhouse 4096 Jun  1 11:09 user_scripts
-rw-r----- 1 clickhouse clickhouse   36 Jun  1 11:09 uuid
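
The same overview can be produced programmatically. The following is a minimal sketch, not an official tool: describe_entries is a hypothetical helper name, and it assumes the caller has read access to the directory (the listing above shows it is restricted to the clickhouse user and group).

```python
from pathlib import Path


def describe_entries(root):
    """Return {name: kind} for each entry directly under root.

    kind is "symlink", "dir", or "file". Symlinks are reported as
    such instead of being followed.
    """
    kinds = {}
    for entry in sorted(Path(root).iterdir()):
        if entry.is_symlink():
            kinds[entry.name] = "symlink"
        elif entry.is_dir():
            kinds[entry.name] = "dir"
        else:
            kinds[entry.name] = "file"
    return kinds
```

Calling describe_entries("/var/lib/clickhouse") on the server shown above should report status and uuid as files and the remaining entries as directories.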

access:

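The access directory stores access control entities that are managed through SQL, such as users, roles, row policies, quotas, and settings profiles. The .list files shown below are the per-type indexes of those entities; on this fresh installation each of them is only 1 byte, which suggests that no SQL-managed entities have been created yet.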

root@de6077477132:/var/lib/clickhouse# cd access/
root@de6077477132:/var/lib/clickhouse/access# ls -l
total 20
-rw-r----- 1 clickhouse clickhouse 1 Jun  1 11:09 quotas.list
-rw-r----- 1 clickhouse clickhouse 1 Jun  1 11:09 roles.list
-rw-r----- 1 clickhouse clickhouse 1 Jun  1 11:09 row_policies.list
-rw-r----- 1 clickhouse clickhouse 1 Jun  1 11:09 settings_profiles.list
-rw-r----- 1 clickhouse clickhouse 1 Jun  1 11:09 users.list

cores:

ClickHouse stores core dump files in the cores directory when the system encounters a crash or abnormal termination. Core dump files contain the program’s state at the time of the crash and are crucial for debugging purposes.

data:

The data directory is the primary location through which ClickHouse exposes the actual table data. It contains a subdirectory for each database and, within it, an entry for each table that holds the table's data parts in columnar format. For databases that use the Atomic engine, the table entries are symbolic links into the store directory.

root@de6077477132:/var/lib/clickhouse# cd data/
root@de6077477132:/var/lib/clickhouse/data# ls -l
total 8
drwxr-x--- 2 clickhouse clickhouse 4096 Jun  1 11:09 default
drwxr-x--- 1 clickhouse clickhouse 4096 Jun  5 16:38 system

dictionaries_lib:

The dictionaries_lib directory is the location for shared libraries used by dictionaries with a library source. Dictionaries themselves provide key-value mappings for efficient lookups during query processing.

flags:

The flags directory holds flag files that the server checks in order to trigger special one-off behavior. For example, creating a file named force_restore_data in this directory tells ClickHouse to start recovery of replicated tables on the next server start.

format_schemas:

The format_schemas directory holds schema files for the data formats that require them, such as Protobuf and Cap'n Proto.

metadata:

The metadata directory contains .sql files with the definitions of databases and tables. ClickHouse reads these files at startup to load its objects. In the listing below, the default and system entries are symbolic links into the store directory.

root@de6077477132:/var/lib/clickhouse# cd metadata
root@de6077477132:/var/lib/clickhouse/metadata# ls -l
total 32
drwxr-x--- 2 clickhouse clickhouse 4096 Jun  1 11:09 INFORMATION_SCHEMA
-rw-r----- 1 clickhouse clickhouse   51 Jun  1 11:09 INFORMATION_SCHEMA.sql
lrwxrwxrwx 1 clickhouse clickhouse   67 Jun  1 11:09 default -> /var/lib/clickhouse/store/c97/c975ff80-de9a-4944-8036-c931ed3048d2/
-rw-r----- 1 clickhouse clickhouse   78 Jun  1 11:09 default.sql
drwxr-x--- 2 clickhouse clickhouse 4096 Jun  1 11:09 information_schema
-rw-r----- 1 clickhouse clickhouse   51 Jun  1 11:09 information_schema.sql
lrwxrwxrwx 1 clickhouse clickhouse   67 Jun  1 11:09 system -> /var/lib/clickhouse/store/608/60837445-e8be-4ed5-b547-eca2855a065e/
-rw-r----- 1 clickhouse clickhouse   78 Jun  1 11:09 system.sql
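
To see where each database's table metadata actually lives, the symlinks in this directory can be resolved with a few lines of Python. This is a rough sketch under the assumption that every non-.sql entry is either a symlink or a plain directory, as in the listing above; database_locations is a hypothetical helper name.

```python
import os
from pathlib import Path


def database_locations(metadata_dir):
    """Map each database name to the location of its table metadata.

    For a symlinked entry the value is the link target; for a plain
    directory it is the directory path itself.
    """
    locations = {}
    for entry in sorted(Path(metadata_dir).iterdir()):
        if entry.suffix == ".sql":
            continue  # <db>.sql holds the database definition itself
        if entry.is_symlink():
            locations[entry.name] = os.readlink(entry)
        elif entry.is_dir():
            locations[entry.name] = str(entry)
    return locations
```

On the server shown above, the default and system databases would map to their directories under store, while INFORMATION_SCHEMA and information_schema would map to plain directories inside metadata.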

metadata_dropped:

The metadata_dropped directory holds the metadata files of tables that have been dropped but whose data has not yet been removed from disk. With the Atomic database engine, DROP TABLE moves the table's metadata here, and the data is cleaned up in the background after a delay.

preprocessed_configs:

The preprocessed_configs directory contains the effective configuration files that ClickHouse generates after merging the main configuration files with any overrides and applying substitutions. Comparing them with your source files is a quick way to check which settings the server is actually using.

root@de6077477132:/var/lib/clickhouse# cd preprocessed_configs/
root@de6077477132:/var/lib/clickhouse/preprocessed_configs# ls -l
total 84
-rw-r----- 1 clickhouse clickhouse 74101 Jun  5 16:36 config.xml
-rw-r----- 1 clickhouse clickhouse  5591 Jun  5 16:36 users.xml
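
Because the preprocessed files are ordinary XML, a single effective setting can be looked up with the Python standard library. The sketch below is illustrative only: read_setting is a hypothetical helper, and SAMPLE is a heavily shortened stand-in for a real config.xml, assuming that settings such as path appear as direct children of the root element.

```python
import xml.etree.ElementTree as ET


def read_setting(config_xml_text, name):
    """Return the text of the top-level setting `name`, or None if absent."""
    root = ET.fromstring(config_xml_text)
    node = root.find(name)
    if node is None or node.text is None:
        return None
    return node.text.strip()


# Hypothetical, shortened stand-in for preprocessed_configs/config.xml.
SAMPLE = """<clickhouse>
    <path>/var/lib/clickhouse/</path>
    <tmp_path>/var/lib/clickhouse/tmp/</tmp_path>
</clickhouse>"""

print(read_setting(SAMPLE, "path"))
```

For a real server, the file contents would be read from preprocessed_configs/config.xml and passed to read_setting in place of SAMPLE.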

status:

The status file is a small file written by the running ClickHouse server. As the output below shows, it records the server's process ID, start time, and revision.

root@de6077477132:/var/lib/clickhouse# cat status 
PID: 35
Started at: 2023-06-05 16:36:37
Revision: 54473
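
The key and value layout of this file makes it easy to read from a monitoring script. The following is a minimal sketch that assumes the format is exactly what the output above shows (one "Key: value" pair per line, timestamp as YYYY-MM-DD HH:MM:SS); parse_status is a hypothetical helper name.

```python
from datetime import datetime


def parse_status(text):
    """Parse 'Key: value' lines from the status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields


# Contents copied from the output above.
SAMPLE = """PID: 35
Started at: 2023-06-05 16:36:37
Revision: 54473
"""

status = parse_status(SAMPLE)
# Assumed timestamp format, based on the single example above.
started = datetime.strptime(status["Started at"], "%Y-%m-%d %H:%M:%S")
print(status["PID"], started.isoformat())
```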

store:

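The store directory holds the data and table metadata of databases that use the Atomic engine. Each object lives in a directory named after its UUID, and these directories are grouped under subdirectories named after the first three characters of the UUID, as the symlink targets in the metadata listing show.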

root@de6077477132:/var/lib/clickhouse# cd store/
root@de6077477132:/var/lib/clickhouse/store# ls -l
total 52
drwxr-x--- 3 clickhouse clickhouse 4096 Jun  1 11:10 061
drwxr-x--- 3 clickhouse clickhouse 4096 Jun  5 16:38 0d9
drwxr-x--- 3 clickhouse clickhouse 4096 Jun  5 16:38 3c4
drwxr-x--- 3 clickhouse clickhouse 4096 Jun  1 11:10 3eb
drwxr-x--- 3 clickhouse clickhouse 4096 Jun  5 16:38 453
drwxr-x--- 1 clickhouse clickhouse 4096 Jun  1 11:09 608
drwxr-x--- 3 clickhouse clickhouse 4096 Jun  5 16:38 692
drwxr-x--- 3 clickhouse clickhouse 4096 Jun  1 11:10 8ad
drwxr-x--- 3 clickhouse clickhouse 4096 Jun  1 11:10 910
drwxr-x--- 3 clickhouse clickhouse 4096 Jun  1 11:09 c97
drwxr-x--- 3 clickhouse clickhouse 4096 Jun  5 16:38 e4a
drwxr-x--- 3 clickhouse clickhouse 4096 Jun  5 16:38 eec
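
Judging by the symlink targets in the metadata listing (for example, default points to store/c97/c975ff80-de9a-4944-8036-c931ed3048d2/), each object appears to live under store/<first three characters of the UUID>/<UUID>/. Assuming that layout, the path for a given UUID can be derived as follows; store_path is a hypothetical helper, and the default root is the location used in this post.

```python
import uuid
from pathlib import Path


def store_path(object_uuid, root="/var/lib/clickhouse"):
    """Build <root>/store/<first three chars>/<uuid> for a given UUID."""
    # uuid.UUID validates the input and normalizes it to lowercase.
    canonical = str(uuid.UUID(str(object_uuid)))
    return Path(root) / "store" / canonical[:3] / canonical
```

For instance, store_path("c975ff80-de9a-4944-8036-c931ed3048d2") yields the same directory that the default symlink points to in the metadata listing.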

tmp:

The tmp directory is a temporary storage location where ClickHouse creates and manages temporary files during various operations like query processing, data ingestion, or intermediate result storage.

user_defined:

The user_defined directory stores SQL user-defined functions, that is, functions created with CREATE FUNCTION, so that they persist across server restarts.

user_files:

The user_files directory holds files that queries are allowed to read, for example through the file() table function. It is the place to put data files that you want to query or import from within ClickHouse.

user_scripts:

The user_scripts directory holds the scripts and executables used by executable user-defined functions, giving ClickHouse a single, known location from which to run them.

uuid:

The uuid file contains a universally unique identifier (UUID) for this ClickHouse server instance.

root@de6077477132:/var/lib/clickhouse# cat uuid 
05bb76fb-3a81-4a60-a88d-e9bb0c4a7756root@de6077477132:/var/lib/clickhouse#
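
The file is 36 bytes long in the directory listing, which matches a UUID string with no trailing newline; that is also why the shell prompt appears on the same line in the output above. A small sketch for reading and validating the value, with parse_server_uuid as a hypothetical helper name:

```python
import uuid


def parse_server_uuid(raw):
    """Validate the contents of the uuid file and return a uuid.UUID."""
    # strip() is harmless whether or not the file ends with a newline.
    return uuid.UUID(raw.strip())


server_id = parse_server_uuid("05bb76fb-3a81-4a60-a88d-e9bb0c4a7756")
print(server_id)
```

uuid.UUID raises ValueError for malformed input, so a damaged or truncated file would be detected rather than silently accepted.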

Conclusion

Knowing the ClickHouse data directory and its physical files is essential for managing, optimizing, and troubleshooting a ClickHouse database. Once you understand the structure and purpose of these files, you can make informed decisions about data organization, storage optimization, and performance tuning.

About the author: Can Sayın is an experienced database administrator working with open source relational and NoSQL databases in complex infrastructures. He has more than five years of industry experience managing database systems and works at ChistaDATA Inc. His areas of interest are mainly open source systems.