Introduction
Among DBAs, there is much debate regarding the significance of database performance tuning. As we now understand, various types of data are what really power the business world. It should be operating very effectively and always be accessible to ensure that the data is quickly and easily accessible for all beneficiaries.
Getting a regular database performance audit is one of the finest strategies to deal with previous performance problems however, some performance tips are unique to each database. These unique techniques present excellent database performance, and in this article, we’ll mention one for ClickHouse.
Utilizing the EXPLAIN Output in ClickHouse
To make queries as quick as possible, ClickHouse offers an EXPLAIN statement that can explain how queries are run, and let’s make a table and fill it with random stuff to examine clearly.
CREATE TABLE exp_test ( `id` UInt64, `address` String, `value` Varchar ) ENGINE = MergeTree ORDER BY id
Our test table will contain one billion rows.
INSERT INTO exp_test SELECT * FROM generateRandom('id UInt64, address String, value Varchar') LIMIT 10000000
The EXPLAIN command will be run; however, as you can see, that default output is so unhelpful, but since the statement may be customized, let’s look at how to learn more about how queries work.
EXPLAIN SELECT * FROM exp_test WHERE id = yesterday() ┌─explain─────────────────────────────────────┐ │ Expression ((Projection + Before ORDER BY)) │ │ Filter (WHERE) │ │ ReadFromMergeTree (sourcedb.exp_test) │ └─────────────────────────────────────────────┘
We must comprehend how ClickHouse will use indexes for our query if we are to understand the performance of our searches. Additionally, JSON formatted output is easier to read. To accomplish this, we combined the EXPLAIN statement with the json, indexes, and description properties.
EXPLAIN json = 1, indexes = 1, description = 1 SELECT * FROM exp_test WHERE id = yesterday() ┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ [ { "Plan": { "Node Type": "Expression", "Description": "(Projection + Before ORDER BY)", "Plans": [ { "Node Type": "Filter", "Description": "WHERE", "Plans": [ { "Node Type": "ReadFromMergeTree", "Description": "sourcedb.exp_test", "Indexes": [ { "Type": "PrimaryKey", "Keys": ["id"], "Condition": "(id in [1668891600, 1668891600])", "Initial Parts": 5, "Selected Parts": 5, "Initial Granules": 1221, "Selected Granules": 5 } ] } ] } ] } } ] │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
ClickHouse was able to run the query in the example above using the primary key, and to get results, it needed five parts (out of a total of 5) and five granules (out of a total of 1221). For the fastest query execution times, it is generally ideal if there are fewer values for Selected Parts and Selected Granules.
EXPLAIN json = 1, indexes = 1, description = 1 SELECT * FROM exp_test WHERE address = 'custom' ┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ [ { "Plan": { "Node Type": "Expression", "Description": "(Projection + Before ORDER BY)", "Plans": [ { "Node Type": "ReadFromMergeTree", "Description": "sourcedb.exp_test", "Indexes": [ { "Type": "PrimaryKey", "Condition": "true", "Initial Parts": 5, "Selected Parts": 5, "Initial Granules": 1221, "Selected Granules": 1221 } ] } ] } } ] │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Bad news 🙁 All of the granules and parts have been read 100%. Lowering these chosen ratios is most useful method for achieving good query performance. To accomplish this, create a new table with a new primary key, which produces a more positive EXPLAIN output.
Resources Evaluation using EXPLAIN
Using EXPLAIN ESTIMATE, we can also rapidly determine how many resources are required for query execution:
EXPLAIN ESTIMATE SELECT * FROM exp_test WHERE id = yesterday() ┌─database─┬─table────┬─parts─┬──rows─┬─marks─┐ │ sourcedb │ exp_test │ 5 │ 40960 │ 5 │ └──────────┴──────────┴───────┴───────┴───────┘
The rows column contains a total row count.
Conclusion
A query’s execution method is explained via the EXPLAIN keyword; in ClickHouse, you can use EXPLAIN before a query that starts with SELECT only. Also, the standard EXPLAIN statement is not particularly informative. When optimizing queries, using them with additional options can be more beneficial.
To learn more about using the EXPLAIN statement in ClickHouse, read the following articles: