Overview of information_schema Tables In Clickhouse

Introduction

ClickHouse is an open-source columnar database management system that is designed for real-time data processing and analytics. One of the unique features of ClickHouse is its use of the information_schema database, which provides a set of virtual system tables that contain metadata about the database and its objects. This article will provide an overview of the information_schema tables in ClickHouse and their use cases.

What is information_schema in ClickHouse?

SHOW TABLES FROM INFORMATION_SCHEMA;
┌─name─────┐
│ columns  │
│ schemata │
│ tables   │
│ views    │
└──────────┘

The information_schema database is a virtual system database in ClickHouse that provides a set of system tables that contain metadata about the database and its objects. These tables are virtual, meaning they do not physically exist in the database but are generated dynamically by the database management system.

The information_schema database is modeled after the information_schema database in MySQL and provides a similar set of tables that contain metadata about the database schema, tables, columns, indexes, and other database objects.

What is the role of information_schema ?

The information_schema database in ClickHouse serves as a virtual database that provides a set of system tables containing metadata about the ClickHouse database schema, tables, columns, functions, processes, views, clusters, and system events. This database is similar to the information_schema database in other relational database management systems, such as MySQL and PostgreSQL.

The information_schema database plays a critical role in the ClickHouse ecosystem as it enables users to access and query the system metadata in a standardized way, without requiring access to low-level system information. The virtual tables in the information_schema database can be queried like any other tables in ClickHouse, allowing users to gain insights into the database schema, optimize queries, and diagnose performance issues.

For example, users can query the information_schema.columns table to retrieve metadata about the columns in a table, such as the column name, data type, default value, and whether it is nullable or not. Similarly, the information_schema.tables table can be used to retrieve metadata about the tables in the database schema, including the table name, type, engine, and the number of rows in the table.

What are the information_schema tables in ClickHouse?

The information_schema database in ClickHouse provides the following set of tables:

1- schemata: Contains metadata about the database schema, including the schema name and the default character set.

SELECT * FROM information_schema.schemata WHERE schema_name ILIKE 'information_schema' LIMIT 1 FORMAT Vertical;
Row 1:
──────
catalog_name:                  INFORMATION_SCHEMA
schema_name:                   INFORMATION_SCHEMA
schema_owner:                  default
default_character_set_catalog: ᴺᵁᴸᴸ
default_character_set_schema:  ᴺᵁᴸᴸ
default_character_set_name:    ᴺᵁᴸᴸ
sql_path:                      ᴺᵁᴸᴸ

2- tables: Contains metadata about the tables in the database schema, including the table name, type, engine, and the number of rows in the table.

SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE (table_schema = currentDatabase() OR table_schema = '') AND table_name NOT LIKE '%inner%' LIMIT 1 FORMAT Vertical;
Row 1:
──────
table_catalog: default
table_schema:  default
table_name:    describe_example
table_type:    BASE TABLE

3- columns: Contains metadata about the columns in the tables in the database schema, including the column name, data type, and whether the column is nullable or not.

SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE (table_schema=currentDatabase() OR table_schema='') AND table_name NOT LIKE '%inner%' LIMIT 1 FORMAT Vertical;
Row 1:
──────
table_catalog:            default
table_schema:             default
table_name:               describe_example
column_name:              id
ordinal_position:         1
column_default:
is_nullable:              0
data_type:                UInt64
character_maximum_length: ᴺᵁᴸᴸ
character_octet_length:   ᴺᵁᴸᴸ
numeric_precision:        64
numeric_precision_radix:  2
numeric_scale:            0
datetime_precision:       ᴺᵁᴸᴸ
character_set_catalog:    ᴺᵁᴸᴸ
character_set_schema:     ᴺᵁᴸᴸ
character_set_name:       ᴺᵁᴸᴸ
collation_catalog:        ᴺᵁᴸᴸ
collation_schema:         ᴺᵁᴸᴸ
collation_name:           ᴺᵁᴸᴸ
domain_catalog:           ᴺᵁᴸᴸ
domain_schema:            ᴺᵁᴸᴸ
domain_name:              ᴺᵁᴸᴸ

4- views: Contains metadata about the views in the database schema, including the view name, definition, and the tables used in the view.

CREATE VIEW v (n Nullable(Int32), f Float64) AS SELECT n, f FROM t;
CREATE MATERIALIZED VIEW mv ENGINE = Null AS SELECT * FROM system.one;
SELECT * FROM information_schema.views WHERE table_schema = currentDatabase() LIMIT 1 FORMAT Vertical;
Row 1:
──────
table_catalog:              default
table_schema:               default
table_name:                 mv
view_definition:            SELECT * FROM system.one
check_option:               NONE
is_updatable:               NO
is_insertable_into:         YES
is_trigger_updatable:       NO
is_trigger_deletable:       NO
is_trigger_insertable_into: NO

 

What are the use cases of information_schema tables in ClickHouse?

The information_schema tables in ClickHouse are useful for a variety of use cases, including:

  1. Data dictionary: The information_schema tables can be used to create a data dictionary that provides information about the database schema, tables, and columns. This information can be used by data analysts and developers to understand the data model and schema of the database.
  2. Database administration: The information_schema tables can be used by database administrators to monitor the database and diagnose issues. For example, the processes table can be used to monitor the running queries on the server and diagnose performance issues.
  3. Query optimization: The information_schema tables can be used to optimize queries by providing metadata about the database schema, tables, and columns. For example, the columns table can be used to determine the data type and nullability of columns, which can be used to optimize query execution.
  4. Cluster management: The information_schema tables can be used to manage ClickHouse clusters, including monitoring the cluster status, adding or removing nodes from the cluster, and changing the replication factor.

Conclusion

In conclusion, the information_schema database in ClickHouse is a powerful tool that provides a set of virtual system tables containing metadata about the database schema, tables, columns, functions, processes, views, clusters, and system events. These tables can be used for a wide range of use cases, such as data dictionary creation, database administration, query optimization, and cluster management. By leveraging the information_schema tables, users can better understand their data model, diagnose performance issues, optimize queries, and manage their ClickHouse clusters more effectively. The information_schema tables are an essential part of ClickHouse’s system design, making it a versatile and powerful database management system for real-time data processing and analytics.

To read more about ClickHouse internals, do consider reading the following articles

About Can Sayn 41 Articles
Can Sayın is experienced Database Administrator in open source relational and NoSql databases, working in complicated infrastructures. Over 5 years industry experience, he gain managing database systems. He is working at ChistaDATA Inc. His areas of interest are generally on open source systems.
Contact: Website