Introduction
Disk I/O performance can greatly impact ClickHouse performance because ClickHouse stores data on disk and relies on disk I/O to read and write data. If disk I/O performance is slow, it can cause queries to take longer to execute, resulting in poor query performance and a decrease in overall system throughput. Additionally, slow disk I/O can also cause data replication to lag, leading to data inconsistencies. Therefore it is important to monitor disk I/O performance and take steps to improve it if necessary, such as by using faster storage devices or optimizing disk I/O settings.
Top 50 Expensive Query Operations by disk I/O operations
I have copied below the query which I use for troubleshooting Disk I/O performance in ClickHouse:
WITH query_stats AS (
SELECT query, thread_id, source, SUM(bytes_read) as bytes_read, SUM(duration) as duration
FROM system.query_thread_log
GROUP BY query, thread_id, source
ORDER BY bytes_read DESC
LIMIT 50
)
SELECT query, thread_id, source, bytes_read, duration
FROM query_stats
ORDER BY duration DESC
Conclusion
This query will give you the top 50 most expensive queries by disk I/O operations, along with their thread ID, source, and duration. The results are ordered by duration, so the most expensive queries by duration will be at the top. Note that the system table query_thread_log keeps query statistics on the running queries, so you can use it to monitor the performance of your queries in real-time.
To read more about troubleshooting Disk I/O, do consider reading the following articles
- ClickHouse Troubleshooting: Runbook for Resolving Excessive Logical IOs
- ClickHouse Troubleshooting: Runbook for Resolving Disk I/O Performance
- ClickHouse Troubleshooting with I/O Performance Counters
You might also like:
- Comprehensive Guide to ChistaDATA’s ClickHouse Performance Audit
- From Snowflake to ClickHouse: How ChistaDATA Enabled the World’s Largest Ad Tech Platform’s Migration and Built an Optimal Real-Time Analytics Infrastructure
- Why Delta Updates Are Not Recommended in OLAP Databases: A Performance and Efficiency Perspective
- Troubleshooting ClickHouse in Kubernetes
- Migrating MSSQL Database to ClickHouse