Introduction
The ClickHouse data directory, typically located at /var/lib/clickhouse, is where the server keeps table data, metadata, and the auxiliary files it needs to run. In this blog post, we will walk through each entry in the data directory and explain what it is for.
root@de6077477132:/var/lib/clickhouse# ls -l
total 76
drwxr-x--- 2 clickhouse clickhouse 4096 Jun 1 11:09 access
drwxr-x--- 2 clickhouse clickhouse 4096 Jun 1 11:09 cores
drwxr-x--- 1 clickhouse clickhouse 4096 Jun 1 11:09 data
drwxr-x--- 2 clickhouse clickhouse 4096 Jun 1 11:09 dictionaries_lib
drwxr-x--- 2 clickhouse clickhouse 4096 Jun 1 11:09 flags
drwxr-x--- 2 clickhouse clickhouse 4096 Jun 1 11:09 format_schemas
drwxr-x--- 4 clickhouse clickhouse 4096 Jun 1 11:09 metadata
drwxr-x--- 2 clickhouse clickhouse 4096 Jun 1 11:09 metadata_dropped
drwxr-x--- 1 clickhouse clickhouse 4096 Jun 1 11:09 preprocessed_configs
-rw-r----- 1 clickhouse clickhouse   56 Jun 5 16:36 status
drwxr-x--- 1 clickhouse clickhouse 4096 Jun 5 16:38 store
drwxr-x--- 2 clickhouse clickhouse 4096 Jun 1 11:09 tmp
drwxr-x--- 2 clickhouse clickhouse 4096 Jun 1 11:09 user_defined
drwxr-x--- 2 clickhouse clickhouse 4096 Jun 1 11:09 user_files
drwxr-x--- 2 clickhouse clickhouse 4096 Jun 1 11:09 user_scripts
-rw-r----- 1 clickhouse clickhouse   36 Jun 1 11:09 uuid
access:
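The access directory stores the access-control entities that are created with SQL statements such as CREATE USER or CREATE ROLE: users, roles, row policies, quotas, and settings profiles. Each .list file is an index for one entity type. On a fresh installation the lists are essentially empty (1 byte each), as shown below. Users defined in users.xml are not stored here.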
root@de6077477132:/var/lib/clickhouse# cd access/
root@de6077477132:/var/lib/clickhouse/access# ls -l
total 20
-rw-r----- 1 clickhouse clickhouse 1 Jun 1 11:09 quotas.list
-rw-r----- 1 clickhouse clickhouse 1 Jun 1 11:09 roles.list
-rw-r----- 1 clickhouse clickhouse 1 Jun 1 11:09 row_policies.list
-rw-r----- 1 clickhouse clickhouse 1 Jun 1 11:09 settings_profiles.list
-rw-r----- 1 clickhouse clickhouse 1 Jun 1 11:09 users.list
cores:
ClickHouse stores core dump files in the cores directory when the system encounters a crash or abnormal termination. Core dump files contain the program’s state at the time of the crash and are crucial for debugging purposes.
data:
The data directory exposes table data by name. It contains one subdirectory per database and, inside it, one entry per table. Each table directory holds the table's data parts, and each part stores its columns in a columnar format. For databases that use the Atomic engine (the default), the table entries are symbolic links into the store directory, which is where the files physically reside.
root@de6077477132:/var/lib/clickhouse# cd data/
root@de6077477132:/var/lib/clickhouse/data# ls -l
total 8
drwxr-x--- 2 clickhouse clickhouse 4096 Jun 1 11:09 default
drwxr-x--- 1 clickhouse clickhouse 4096 Jun 5 16:38 system
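To get a rough idea of how much disk space each table occupies, the data/<database>/<table> layout can be walked directly. The following is a minimal sketch, not a ClickHouse tool: the function names directory_size and table_sizes are made up for this example, and it assumes table entries may be symbolic links into store (as with Atomic databases), so it follows symlinks. Names on disk are the escaped form of the database and table names, and hard-linked files would be counted once per link.

```python
import os
from pathlib import Path


def directory_size(path):
    """Sum the sizes of all regular files below path, following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path, followlinks=True):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def table_sizes(data_dir):
    """Return {"database.table": size_in_bytes} for a data/ directory."""
    sizes = {}
    for database in sorted(Path(data_dir).iterdir()):
        if not database.is_dir():
            continue
        for table in sorted(database.iterdir()):
            # is_dir() follows symlinks, so links into store/ are included.
            if table.is_dir():
                sizes[f"{database.name}.{table.name}"] = directory_size(table)
    return sizes
```

Calling table_sizes("/var/lib/clickhouse/data") as a user that can read the directory returns a dictionary keyed by database.table. For authoritative numbers, the system.parts table inside ClickHouse is the better source.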
dictionaries_lib:
The dictionaries_lib directory holds the shared libraries used by dictionaries that load their data through a library source. It does not hold the dictionary definitions or their data. Its location is controlled by the dictionaries_lib_path server setting.
flags:
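The flags directory is where an administrator places marker files that tell the server to perform a one-off action. For example, creating a file named force_restore_data asks the server to restore replicated table metadata on the next start, and force_drop_table lets a single DROP bypass the table size limit. The server removes such a flag once it has acted on it.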
format_schemas:
The format_schemas directory holds the schema files for input and output formats that need an external schema, such as Protobuf and Cap'n Proto. Queries refer to these files by name through the format_schema setting, and the directory location is set by format_schema_path.
metadata:
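The metadata directory contains the definitions of databases and tables as .sql files holding ATTACH statements. Each database has a <database>.sql file describing the database itself and a directory (or, for Atomic databases, a symbolic link into store) holding one .sql file per table. The server reads these files at startup to load its objects.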
root@de6077477132:/var/lib/clickhouse# cd metadata
root@de6077477132:/var/lib/clickhouse/metadata# ls -l
total 32
drwxr-x--- 2 clickhouse clickhouse 4096 Jun 1 11:09 INFORMATION_SCHEMA
-rw-r----- 1 clickhouse clickhouse   51 Jun 1 11:09 INFORMATION_SCHEMA.sql
lrwxrwxrwx 1 clickhouse clickhouse   67 Jun 1 11:09 default -> /var/lib/clickhouse/store/c97/c975ff80-de9a-4944-8036-c931ed3048d2/
-rw-r----- 1 clickhouse clickhouse   78 Jun 1 11:09 default.sql
drwxr-x--- 2 clickhouse clickhouse 4096 Jun 1 11:09 information_schema
-rw-r----- 1 clickhouse clickhouse   51 Jun 1 11:09 information_schema.sql
lrwxrwxrwx 1 clickhouse clickhouse   67 Jun 1 11:09 system -> /var/lib/clickhouse/store/608/60837445-e8be-4ed5-b547-eca2855a065e/
-rw-r----- 1 clickhouse clickhouse   78 Jun 1 11:09 system.sql
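The listing shows that some database entries (default, system) are symbolic links into store, while others (information_schema) are plain directories. As a small illustration, the sketch below maps every database entry to the directory it resolves to. The helper name database_locations is hypothetical, and the code relies only on the layout visible in the listing.

```python
import os
from pathlib import Path


def database_locations(metadata_dir):
    """Map each database name to the real directory behind its metadata entry."""
    locations = {}
    for entry in sorted(Path(metadata_dir).iterdir()):
        # The <database>.sql files are regular files and are skipped here.
        # is_dir() follows symlinks, so linked and plain directories both match.
        if entry.is_dir():
            locations[entry.name] = os.path.realpath(entry)
    return locations
```

On the server above, database_locations("/var/lib/clickhouse/metadata") would report default and system under store, and the two information schema databases under metadata itself.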
metadata_dropped:
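The metadata_dropped directory temporarily holds the metadata files of tables that have been dropped from Atomic databases. A DROP TABLE moves the table's .sql file here right away, and a background task deletes the data and the metadata file after a delay (set by database_atomic_delay_before_drop_table_sec). It is a staging area for deferred cleanup, not a permanent history of dropped objects.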
preprocessed_configs:
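The preprocessed_configs directory contains the effective configuration the server is actually running with. ClickHouse generates these files when it loads its configuration by merging the main files (config.xml, users.xml) with any overrides from config.d and users.d and applying substitutions. Comparing them with your source files is a quick way to check that an override was picked up as intended.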
root@de6077477132:/var/lib/clickhouse# cd preprocessed_configs/
root@de6077477132:/var/lib/clickhouse/preprocessed_configs# ls -l
total 84
-rw-r----- 1 clickhouse clickhouse 74101 Jun 5 16:36 config.xml
-rw-r----- 1 clickhouse clickhouse  5591 Jun 5 16:36 users.xml
status:
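The status file is a small text file written when the server starts. It records the process ID, the start time, and the revision of the running server. The server also uses it as a lock, so that a second instance cannot be started on the same data directory. It does not report health metrics.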
root@de6077477132:/var/lib/clickhouse# cat status
PID: 35
Started at: 2023-06-05 16:36:37
Revision: 54473
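Because the status file is plain "Key: value" text, it is easy to read from a monitoring or housekeeping script. The sketch below parses the content shown above into a dictionary. The function name parse_status is made up for this example, and the format is assumed from this one sample rather than from a documented specification.

```python
def parse_status(text):
    """Split the 'Key: value' lines of a status file into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so the time of day stays intact.
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


sample = "PID: 35\nStarted at: 2023-06-05 16:36:37\nRevision: 54473\n"
print(parse_status(sample))
# {'PID': '35', 'Started at': '2023-06-05 16:36:37', 'Revision': '54473'}
```

In practice the text would come from reading /var/lib/clickhouse/status, which requires read permission on the data directory.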
store:
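The store directory is where Atomic databases physically keep their files. Every database and table is identified by a UUID and gets its own directory at store/<first three characters of the UUID>/<UUID>/. The entries under data and metadata are symbolic links that point here, which is what allows a table to be renamed without moving any files. The three-character subdirectories below are those UUID prefixes.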
root@de6077477132:/var/lib/clickhouse# cd store/
root@de6077477132:/var/lib/clickhouse/store# ls -l
total 52
drwxr-x--- 3 clickhouse clickhouse 4096 Jun 1 11:10 061
drwxr-x--- 3 clickhouse clickhouse 4096 Jun 5 16:38 0d9
drwxr-x--- 3 clickhouse clickhouse 4096 Jun 5 16:38 3c4
drwxr-x--- 3 clickhouse clickhouse 4096 Jun 1 11:10 3eb
drwxr-x--- 3 clickhouse clickhouse 4096 Jun 5 16:38 453
drwxr-x--- 1 clickhouse clickhouse 4096 Jun 1 11:09 608
drwxr-x--- 3 clickhouse clickhouse 4096 Jun 5 16:38 692
drwxr-x--- 3 clickhouse clickhouse 4096 Jun 1 11:10 8ad
drwxr-x--- 3 clickhouse clickhouse 4096 Jun 1 11:10 910
drwxr-x--- 3 clickhouse clickhouse 4096 Jun 1 11:09 c97
drwxr-x--- 3 clickhouse clickhouse 4096 Jun 5 16:38 e4a
drwxr-x--- 3 clickhouse clickhouse 4096 Jun 5 16:38 eec
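The directory names in store line up with the symlink targets seen in the metadata listing, for example store/c97/c975ff80-de9a-4944-8036-c931ed3048d2/ for the default database: the first level is the first three characters of the UUID and the second level is the full UUID. Given a UUID (for instance from the uuid column of system.tables), the expected location can be computed as in the sketch below. The function name store_path and the default data_root value are assumptions for this example.

```python
import uuid
from pathlib import PurePosixPath


def store_path(object_uuid, data_root="/var/lib/clickhouse"):
    """Build the store/ path for a database or table UUID.

    Raises ValueError if object_uuid is not a well-formed UUID.
    """
    canonical = str(uuid.UUID(object_uuid))  # validates and lowercases
    return PurePosixPath(data_root) / "store" / canonical[:3] / canonical


print(store_path("c975ff80-de9a-4944-8036-c931ed3048d2"))
# /var/lib/clickhouse/store/c97/c975ff80-de9a-4944-8036-c931ed3048d2
```

The function only builds the path. Whether a directory exists there depends on the object still being present on this server.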
tmp:
The tmp directory is a temporary storage location where ClickHouse creates and manages temporary files during various operations like query processing, data ingestion, or intermediate result storage.
user_defined:
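The user_defined directory is where ClickHouse persists objects that users create with SQL, most notably SQL user-defined functions created with CREATE FUNCTION. Each object is saved as a .sql file so that it survives a server restart. The directory is managed by the server rather than edited by hand, and its location is set by user_defined_path.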
user_files:
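The user_files directory is the location from which queries are allowed to read local files. The file() table function and the File table engine resolve relative paths against this directory and reject paths outside it, so data files you want to load this way (CSV, Parquet, and so on) should be placed here. Its location is set by user_files_path.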
user_scripts:
The user_scripts directory holds the scripts and binaries that ClickHouse is allowed to run as external processes, namely executable user-defined functions and the executable table function. It is not a place for saved SQL queries. Its location is set by user_scripts_path.
uuid:
The uuid file contains a universally unique identifier (UUID) that identifies this ClickHouse server instance. The file is exactly 36 bytes, the UUID in text form with no trailing newline, which is why the shell prompt follows the value directly in the output below.
root@de6077477132:/var/lib/clickhouse# cat uuid
05bb76fb-3a81-4a60-a88d-e9bb0c4a7756root@de6077477132:/var/lib/clickhouse#
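When a script reads this file, for example to tag backups with the server they came from, it helps to validate the content instead of trusting it blindly. A minimal sketch using Python's standard uuid module follows. The function name parse_server_uuid is illustrative, and the strip() call is only a precaution in case a copy of the file has gained a trailing newline.

```python
import uuid


def parse_server_uuid(raw):
    """Validate the content of the uuid file and return it as a uuid.UUID.

    Raises ValueError if the content is not a well-formed UUID.
    """
    return uuid.UUID(raw.strip())


server_id = parse_server_uuid("05bb76fb-3a81-4a60-a88d-e9bb0c4a7756")
print(server_id)
# 05bb76fb-3a81-4a60-a88d-e9bb0c4a7756
```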
Conclusion
Knowing what each entry in the ClickHouse data directory is for makes the server much easier to manage and troubleshoot. Once you know that table data lives under store, that data and metadata expose it by name, and that files such as status, uuid, and preprocessed_configs describe the running server, you can plan disk layout and backups and diagnose startup or configuration problems with more confidence.
To learn more about ClickHouse internals, see the following articles: