Performance Tricks in ClickHouse – Part I

Among DBAs, there is much debate regarding the significance of database performance tuning. As we now understand, various types of data are what really power the business world. It should be operating very effectively and always be accessible to ensure that the data is quickly and easily accessible for all beneficiaries.

Pexels

Getting a regular database performance audit is one of the finest strategies to deal with previous performance problems however, some performance tips are unique to each database. These unique techniques present excellent database performance, and in this article, we’ll mention one for ClickHouse.

Utilize EXPLAIN Output

To make queries as quick as possible, ClickHouse offers an EXPLAIN statement that can explain how queries are run, and let’s make a table and fill it with random stuff to examine clearly.

CREATE TABLE exp_test
(
    `id` UInt64,
    `address` String,
    `value` Varchar
)
ENGINE = MergeTree
ORDER BY id

Our test table will contain one billion rows.

INSERT INTO exp_test SELECT *
FROM generateRandom('id UInt64, address String, value Varchar')
LIMIT 10000000

The EXPLAIN command will be run; however, as you can see, that default output is so unhelpful, but since the statement may be customized, let’s look at how to learn more about how queries work.

EXPLAIN
SELECT *
FROM exp_test
WHERE id = yesterday()

┌─explain─────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY)) │
│   Filter (WHERE)                            │
│     ReadFromMergeTree (sourcedb.exp_test)   │
└─────────────────────────────────────────────┘

We must comprehend how ClickHouse will use indexes for our query if we are to understand the performance of our searches. Additionally, JSON formatted output is easier to read. To accomplish this, we combined the EXPLAIN statement with the json, indexes, and description properties.

EXPLAIN json = 1, indexes = 1, description = 1
SELECT *
FROM exp_test
WHERE id = yesterday()

┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ [
  {
    "Plan": {
      "Node Type": "Expression",
      "Description": "(Projection + Before ORDER BY)",
      "Plans": [
        {
          "Node Type": "Filter",
          "Description": "WHERE",
          "Plans": [
            {
              "Node Type": "ReadFromMergeTree",
              "Description": "sourcedb.exp_test",
              "Indexes": [
                {
                  "Type": "PrimaryKey",
                  "Keys": ["id"],
                  "Condition": "(id in [1668891600, 1668891600])",
                  "Initial Parts": 5,
                  "Selected Parts": 5,
                  "Initial Granules": 1221,
                  "Selected Granules": 5
                }
              ]
            }
          ]
        }
      ]
    }
  }
] │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

ClickHouse was able to run the query in the example above using the primary key, and to get results, it needed five parts (out of a total of 5) and five granules (out of a total of 1221). For the fastest query execution times, it is generally ideal if there are fewer values for Selected Parts and Selected Granules.

EXPLAIN json = 1, indexes = 1, description = 1
SELECT *
FROM exp_test
WHERE address = 'custom'

┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ [
  {
    "Plan": {
      "Node Type": "Expression",
      "Description": "(Projection + Before ORDER BY)",
      "Plans": [
        {
          "Node Type": "ReadFromMergeTree",
          "Description": "sourcedb.exp_test",
          "Indexes": [
            {
              "Type": "PrimaryKey",
              "Condition": "true",
              "Initial Parts": 5,
              "Selected Parts": 5,
              "Initial Granules": 1221,
              "Selected Granules": 1221
            }
          ]
        }
      ]
    }
  }
] │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Bad news 🙁 All of the granules and parts have been read 100 %. Lowering these chosen ratios is most useful method for achieving good query performance. To accomplish this, create a new table with a new primary key, which produces a more positive EXPLAIN output.

Resources Evaluation

Using EXPLAIN ESTIMATE, we can also rapidly determine how many resources are required for query execution:

EXPLAIN ESTIMATE
SELECT *
FROM exp_test
WHERE id = yesterday()

┌─database─┬─table────┬─parts─┬──rows─┬─marks─┐
│ sourcedb │ exp_test │     5 │ 40960 │     5 │
└──────────┴──────────┴───────┴───────┴───────┘

The rows column contains a total row count.

Conclusion

A query’s execution method is explained via the EXPLAIN keyword; in ClickHouse, you can use EXPLAIN before a query that starts with SELECT only. Also, the standard EXPLAIN statement is not particularly informative. When optimizing queries, using them with additional options can be more beneficial.

About Ilkay Cetindag 8 Articles
Ilkay has been administering databases including NoSQL and RDBMS for over 7 years. He is experienced working high-scale banking databases. He is currently working at ChistaDATA Inc. as Database Administrator.
Contact: Website

Be the first to comment

Leave a Reply