ClickHouse Troubleshooting: How to identify Top 50 Expensive Query Operations by Disk I/O

Introduction

Disk I/O performance can greatly impact ClickHouse performance because ClickHouse stores data on disk and relies on disk I/O to read and write data. If disk I/O performance is slow, it can cause queries to take longer to execute, resulting in poor query performance and a decrease in overall system throughput. Additionally, slow disk I/O can also cause data replication to lag, leading to data inconsistencies. Therefore it is important to monitor disk I/O performance and take steps to improve it if necessary, such as by using faster storage devices or optimizing disk I/O settings.

Top 50 Expensive Query Operations by disk I/O operations

I have copied below the query which I use for troubleshooting Disk I/O performance in ClickHouse:

WITH query_stats AS (
    SELECT query, thread_id, source, SUM(bytes_read) as bytes_read, SUM(duration) as duration
    FROM system.query_thread_log
    GROUP BY query, thread_id, source
    ORDER BY bytes_read DESC
    LIMIT 50
)
SELECT query, thread_id, source, bytes_read, duration
FROM query_stats
ORDER BY duration DESC

Conclusion

This query will give you the top 50 most expensive queries by disk I/O operations, along with their thread ID, source, and duration. The results are ordered by duration, so the most expensive queries by duration will be at the top. Note that the system table query_thread_log keeps query statistics on the running queries, so you can use it to monitor the performance of your queries in real-time.

To read more about troubleshooting Disk I/O, do consider reading the following articles

About Shiv Iyer 219 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.