Introduction
ClickHouse is an open-source columnar database management system that is designed for real-time data processing and analytics. One of the unique features of ClickHouse is its use of the information_schema
database, which provides a set of virtual system tables that contain metadata about the database and its objects. This article will provide an overview of the information_schema
tables in ClickHouse and their use cases.
What is information_schema
in ClickHouse?
SHOW TABLES FROM INFORMATION_SCHEMA;
┌─name─────┐ │ columns │ │ schemata │ │ tables │ │ views │ └──────────┘
The information_schema
database is a virtual system database in ClickHouse that provides a set of system tables that contain metadata about the database and its objects. These tables are virtual, meaning they do not physically exist in the database but are generated dynamically by the database management system.
The information_schema
database is modeled after the information_schema
database in MySQL and provides a similar set of tables that contain metadata about the database schema, tables, columns, indexes, and other database objects.
What is the role of information_schema ?
The information_schema
database in ClickHouse serves as a virtual database that provides a set of system tables containing metadata about the ClickHouse database schema, tables, columns, functions, processes, views, clusters, and system events. This database is similar to the information_schema
database in other relational database management systems, such as MySQL and PostgreSQL.
The information_schema
database plays a critical role in the ClickHouse ecosystem as it enables users to access and query the system metadata in a standardized way, without requiring access to low-level system information. The virtual tables in the information_schema
database can be queried like any other tables in ClickHouse, allowing users to gain insights into the database schema, optimize queries, and diagnose performance issues.
For example, users can query the information_schema.columns
table to retrieve metadata about the columns in a table, such as the column name, data type, default value, and whether it is nullable or not. Similarly, the information_schema.tables
table can be used to retrieve metadata about the tables in the database schema, including the table name, type, engine, and the number of rows in the table.
What are the information_schema
tables in ClickHouse?
The information_schema
database in ClickHouse provides the following set of tables:
1- schemata: Contains metadata about the database schema, including the schema name and the default character set.
SELECT * FROM information_schema.schemata WHERE schema_name ILIKE 'information_schema' LIMIT 1 FORMAT Vertical;
Row 1: ────── catalog_name: INFORMATION_SCHEMA schema_name: INFORMATION_SCHEMA schema_owner: default default_character_set_catalog: ᴺᵁᴸᴸ default_character_set_schema: ᴺᵁᴸᴸ default_character_set_name: ᴺᵁᴸᴸ sql_path: ᴺᵁᴸᴸ
2- tables: Contains metadata about the tables in the database schema, including the table name, type, engine, and the number of rows in the table.
SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE (table_schema = currentDatabase() OR table_schema = '') AND table_name NOT LIKE '%inner%' LIMIT 1 FORMAT Vertical;
Row 1: ────── table_catalog: default table_schema: default table_name: describe_example table_type: BASE TABLE
3- columns: Contains metadata about the columns in the tables in the database schema, including the column name, data type, and whether the column is nullable or not.
SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE (table_schema=currentDatabase() OR table_schema='') AND table_name NOT LIKE '%inner%' LIMIT 1 FORMAT Vertical;
Row 1: ────── table_catalog: default table_schema: default table_name: describe_example column_name: id ordinal_position: 1 column_default: is_nullable: 0 data_type: UInt64 character_maximum_length: ᴺᵁᴸᴸ character_octet_length: ᴺᵁᴸᴸ numeric_precision: 64 numeric_precision_radix: 2 numeric_scale: 0 datetime_precision: ᴺᵁᴸᴸ character_set_catalog: ᴺᵁᴸᴸ character_set_schema: ᴺᵁᴸᴸ character_set_name: ᴺᵁᴸᴸ collation_catalog: ᴺᵁᴸᴸ collation_schema: ᴺᵁᴸᴸ collation_name: ᴺᵁᴸᴸ domain_catalog: ᴺᵁᴸᴸ domain_schema: ᴺᵁᴸᴸ domain_name: ᴺᵁᴸᴸ
4- views: Contains metadata about the views in the database schema, including the view name, definition, and the tables used in the view.
CREATE VIEW v (n Nullable(Int32), f Float64) AS SELECT n, f FROM t; CREATE MATERIALIZED VIEW mv ENGINE = Null AS SELECT * FROM system.one; SELECT * FROM information_schema.views WHERE table_schema = currentDatabase() LIMIT 1 FORMAT Vertical;
Row 1: ────── table_catalog: default table_schema: default table_name: mv view_definition: SELECT * FROM system.one check_option: NONE is_updatable: NO is_insertable_into: YES is_trigger_updatable: NO is_trigger_deletable: NO is_trigger_insertable_into: NO
What are the use cases of information_schema
tables in ClickHouse?
The information_schema
tables in ClickHouse are useful for a variety of use cases, including:
- Data dictionary: The
information_schema
tables can be used to create a data dictionary that provides information about the database schema, tables, and columns. This information can be used by data analysts and developers to understand the data model and schema of the database. - Database administration: The
information_schema
tables can be used by database administrators to monitor the database and diagnose issues. For example, theprocesses
table can be used to monitor the running queries on the server and diagnose performance issues. - Query optimization: The
information_schema
tables can be used to optimize queries by providing metadata about the database schema, tables, and columns. For example, thecolumns
table can be used to determine the data type and nullability of columns, which can be used to optimize query execution. - Cluster management: The
information_schema
tables can be used to manage ClickHouse clusters, including monitoring the cluster status, adding or removing nodes from the cluster, and changing the replication factor.
Conclusion
In conclusion, the information_schema
database in ClickHouse is a powerful tool that provides a set of virtual system tables containing metadata about the database schema, tables, columns, functions, processes, views, clusters, and system events. These tables can be used for a wide range of use cases, such as data dictionary creation, database administration, query optimization, and cluster management. By leveraging the information_schema
tables, users can better understand their data model, diagnose performance issues, optimize queries, and manage their ClickHouse clusters more effectively. The information_schema
tables are an essential part of ClickHouse’s system design, making it a versatile and powerful database management system for real-time data processing and analytics.
To read more about ClickHouse internals, do consider reading the following articles