Introduction
In ClickHouse, you can monitor query latency and waits/latches using the system.query_log and system.metrics tables. Here is an example query that can be used to monitor query latency and wait/latch events:
SELECT query_id, toDateTime(query_start_time) AS query_start_time, round((query_duration_ms / 1000), 2) AS query_duration_sec, round((read_duration_ms / 1000), 2) AS read_duration_sec, round((execution_time_ms / 1000), 2) AS execution_time_sec, round((result_rows / execution_time_ms) * 1000, 2) AS rows_per_sec, round((result_bytes / execution_time_ms) * 1000, 2) AS bytes_per_sec, arrayStringConcat(waits, ', ') AS waits, arrayStringConcat(latches, ', ') AS latches FROM system.query_log WHERE type = 'QueryFinish' ORDER BY query_start_time DESC LIMIT 100;
This query selects query-related columns from the system.query_log table, including the query ID, start time, duration, read duration, execution time, rows per second, bytes per second, and waits/latches. The waits and latches are concatenated into comma-separated lists using the arrayStringConcat function.
To get more detailed information about the waits and latches, you can use the system.metrics table. Here is an example query that joins the system.metrics table with the system.query_log table to show detailed information about the waits and latches for a particular query:
SELECT query_id, event_name, value AS count, round((duration_ms / 1000), 2) AS duration_sec, round((value / duration_ms) * 1000, 2) AS rate_per_sec FROM system.metrics ANY LEFT JOIN system.query_log ON query_id = metric_query_id WHERE query_id = 'YOUR_QUERY_ID_HERE' AND event_name LIKE 'Wait%' OR event_name LIKE 'Latch%' ORDER BY duration_sec DESC;
This query selects wait and latch-related columns from the system.metrics table and joins it with the system.query_log table using the query ID. The query filters by a particular query ID and event names starting with ‘Wait’ or ‘Latch’. The results include the event name, count, duration, and rate per second for each wait/latch event.
Conclusion
These queries can help identify slow queries and the waits/latches causing wait errors in ClickHouse. Note that monitoring query-related information can be resource-intensive, so it is important to use these queries judiciously and with care on busy systems.
To read more locks and waits in ClickHouse, please do consider reading the below articles