Troubleshooting ClickHouse Performance – Monitoring ClickHouse for Performance
Monitoring ClickHouse Event Performance
In this article, you will learn how to monitor the following operations in ClickHouse:
- Monitoring Expensive Query Operations In ClickHouse
- Expensive Queries In ClickHouse By Memory / Available RAM Usage
- Monitoring The Most Expensive READ Operations Queries Happening On ClickHouse
- Top 5 Most Expensive Threads In ClickHouse By Memory Usage
- Top 5 Queries By Latency
- Average Load Created By SELECT/READ Queries On ClickHouse For Last 7 Days
- Identifying Expensive Query Operations In ClickHouse
- The Most CPU / Write / Read-Intensive Queries From query_log
- Queries By Latency And Memory Consumption In ClickHouse
- Most Expensive ClickHouse Processes By Latency
- ClickHouse Query Performance (for last 24 hours) From Most Expensive Threads Perspective
- Monitoring ClickHouse Error Log
- Auditing ClickHouse Threads
- Measure ClickHouse Infrastructure Operation Throughput Every 5 Minutes
- Monitoring Query Latency For Last 24 Hours (per-minute time-series query performance data)
- Most Expensive ClickHouse Processes (Top ClickHouse)
- Most Expensive ClickHouse Thread Performance Operations
- Monitoring Slow Queries By Response Time/Latency And IOPS
- Top Most READ Intensive Queries In ClickHouse
- Top Most WRITE Intensive Queries In ClickHouse
- Find Top Largest ClickHouse Tables
- Find Largest Databases On ClickHouse
Configuring ClickHouse config.xml for logging
An example config.xml file is shown below. By default, the “trace” level is enabled.
<yandex>
    <logger>
        <!-- Possible levels [1]:
             - none (turns off logging)
             - fatal
             - critical
             - error
             - warning
             - notice
             - information
             - debug
             - trace
             [1]: https://github.com/pocoproject/poco/blob/poco-1.9.4-release/Foundation/include/Poco/Logger.h#L105-L114 -->
        <level>trace</level>
        <log>/var/log/clickhouse-server/clickhouse-server.log</log>
        <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
        <!-- Rotation policy:
             see https://github.com/pocoproject/poco/blob/poco-1.9.4-release/Foundation/include/Poco/FileChannel.h#L54-L85 -->
        <size>1000M</size>
        <count>10</count>
        <!-- <console>1</console> -->
        <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
        <!-- Per-level overrides (legacy):
             For example, to suppress logging of the ConfigReloader you can use:
             NOTE: levels.logger is reserved, see below. -->
        <!--
        <levels>
            <ConfigReloader>none</ConfigReloader>
        </levels>
        -->
        <!-- Per-level overrides:
             For example, to suppress RBAC logging for the default user you can use:
             (but please note that the logger name may change from version to version, even after a minor upgrade) -->
        <!--
        <levels>
            <logger>
                <name>ContextAccess (default)</name>
                <level>none</level>
            </logger>
            <logger>
                <name>DatabaseOrdinary (test)</name>
                <level>none</level>
            </logger>
        </levels>
        -->
    </logger>
</yandex>
Quite often you will want on-disk GROUP BY / ORDER BY enabled (both are disabled by default):
cat /etc/clickhouse-server/users.d/enable_on_disk_operations.xml
<yandex>
    <profiles>
        <default>
            <max_bytes_before_external_group_by>2000000000</max_bytes_before_external_group_by>
            <max_bytes_before_external_sort>2000000000</max_bytes_before_external_sort>
        </default>
    </profiles>
</yandex>
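To confirm that the profile change took effect, you can read the values back from system.settings; a quick check (not part of the original setup):

select name, value
from system.settings
where name in ('max_bytes_before_external_group_by', 'max_bytes_before_external_sort');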
On older versions, log_queries is disabled by default. It is worth keeping it enabled at all times.
$ cat /etc/clickhouse-server/users.d/log_queries.xml
<yandex>
    <profiles>
        <default>
            <log_queries>1</log_queries>
            <log_query_threads>1</log_query_threads>
        </default>
    </profiles>
</yandex>
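Keep in mind that the *_log tables are flushed asynchronously (per flush_interval_milliseconds), so a query you just ran may not show up immediately. A sketch of how to verify that logging is working, forcing a flush first with SYSTEM FLUSH LOGS:

-- force buffered log entries to be written to the system.*_log tables
SYSTEM FLUSH LOGS;

-- confirm that queries are now being recorded
select event_time, query
from system.query_log
order by event_time desc
limit 5;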
part_log can be useful, especially at the beginning, or during system tuning and analysis.
cat /etc/clickhouse-server/config.d/part_log.xml
<yandex>
    <part_log replace="1">
        <database>system</database>
        <table>part_log</table>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        <engine>
            ENGINE = MergeTree
            PARTITION BY toYYYYMM(event_date)
            ORDER BY (event_time)
            TTL toStartOfMonth(event_date) + INTERVAL 3 MONTH
            SETTINGS ttl_only_drop_parts = 1
        </engine>
    </part_log>
</yandex>
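Once part_log is enabled, ClickHouse records one row per part event (new parts, merges, downloads, removals). As an illustrative sketch of the kind of question it answers, counting today's part events per table:

select database, table, event_type, count() as events
from system.part_log
where event_date = today()
group by database, table, event_type
order by events desc;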
For query_log and the other *_log tables, set up a TTL or some other cleanup procedure.
cat /etc/clickhouse-server/config.d/query_log.xml
<yandex>
    <query_log replace="1">
        <database>system</database>
        <table>query_log</table>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        <engine>
            ENGINE = MergeTree
            PARTITION BY event_date
            ORDER BY (event_time)
            TTL event_date + INTERVAL 90 DAY
            SETTINGS ttl_only_drop_parts = 1
        </engine>
    </query_log>
</yandex>
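If you prefer an explicit cleanup procedure over TTL, old partitions can also be dropped by hand. A sketch, assuming the PARTITION BY event_date layout from the config above; the partition value '2022-10-01' is hypothetical:

-- drop one day's worth of query_log data (partition value is illustrative)
ALTER TABLE system.query_log DROP PARTITION '2022-10-01';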
Queries
1. Monitoring Expensive Query Operations In ClickHouse
select
    toDateTime(event_time) as "Event Date Time",
    sum(ProfileEvent_InsertedRows) as "Number of rows INSERTed to all tables",
    sum(ProfileEvent_DiskWriteElapsedMicroseconds) as "Total time spent waiting for write syscall (includes page cache writes)",
    sum(ProfileEvent_InsertQuery) as "Total number of INSERT queries",
    sum(ProfileEvent_InsertedBytes) as "Total data INSERTed in Bytes",
    sum(ProfileEvent_QueryTimeMicroseconds) as "Total time of all queries",
    sum(ProfileEvent_SelectQuery) as "Total number of SELECT queries",
    sum(ProfileEvent_SelectQueryTimeMicroseconds) as "Total time of SELECT queries",
    sum(ProfileEvent_SelectedRows) as "Total number of rows SELECTed",
    sum(ProfileEvent_DiskReadElapsedMicroseconds) as "Total time spent waiting for read syscall (includes page cache reads)"
from system.metric_log
group by
    event_time, ProfileEvent_InsertedRows, ProfileEvent_DiskWriteElapsedMicroseconds,
    ProfileEvent_InsertQuery, ProfileEvent_InsertedBytes, ProfileEvent_QueryTimeMicroseconds,
    ProfileEvent_SelectQuery, ProfileEvent_SelectQueryTimeMicroseconds,
    ProfileEvent_SelectedRows, ProfileEvent_DiskReadElapsedMicroseconds
order by ProfileEvent_QueryTimeMicroseconds desc;
Output:

Event Date Time | Number of rows INSERTed to all tables | Total time spent waiting for write syscall (includes page cache writes) | Total number of INSERT queries | Total data INSERTed in Bytes | Total time of all queries | Total number of SELECT queries | Total time of SELECT queries | Total number of rows SELECTed | Total time spent waiting for read syscall (includes page cache reads) |
2022-12-11 12:01:33.000 | 186323 | 32206 | 0 | 467000485 | 168278623 | 0 | 0 | 186323 | 996845 |
2022-12-11 00:01:22.000 | 412253 | 21412 | 0 | 966018280 | 128038826 | 0 | 0 | 412253 | 834287 |
2022-12-12 16:01:09.000 | 716421 | 40039 | 0 | 955276843 | 125437290 | 1 | 1097 | 716425 | 557773 |
2022-12-12 20:01:04.000 | 662376 | 36003 | 0 | 900578104 | 116752507 | 0 | 0 | 662376 | 818950 |
2022-12-12 00:01:04.000 | 383957 | 25265 | 0 | 711821462 | 107371270 | 17 | 83512 | 2817741653 | 2320227 |
2. Expensive Queries In ClickHouse By Memory / Available RAM Usage
select
    client_hostname,
    client_name,
    query,
    memory_usage,
    read_rows
from system.processes
where peak_memory_usage > (select avg(peak_memory_usage) from system.processes);
Output:

Row 1:
──────
client_hostname:
client_name:
query:        select client_hostname, client_name, query, memory_usage, read_rows from system.processes where peak_memory_usage > (select avg(peak_memory_usage) from system.processes) format Vertical
memory_usage: 29024
read_rows:    1
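For a quick interactive view of the same data, SHOW PROCESSLIST is shorthand for selecting from system.processes:

SHOW PROCESSLIST;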
3. Monitoring The Most Expensive READ Operations Queries Happening On ClickHouse
select
    client_hostname,
    client_name,
    query,
    read_rows
from system.processes
where read_rows > (select avg(read_rows) from system.processes);
Output:

Row 1:
──────
client_hostname:
client_name:
query:     select client_hostname, client_name, query, read_rows from system.processes where read_rows > (select avg(read_rows) from system.processes) format Vertical
read_rows: 1
4. Top 5 Most Expensive Threads In ClickHouse By Memory Usage
select
    query,
    query_start_time_microseconds,
    query_duration_ms,
    read_rows,
    written_rows,
    memory_usage
from system.query_thread_log
where memory_usage > (select avg(memory_usage) from system.query_thread_log)
order by memory_usage desc
limit 5;
5. Top 5 Queries By Latency
select
    client_hostname,
    client_name,
    query,
    read_rows,
    elapsed as "Time in seconds since request execution started"
from system.processes
order by elapsed desc
limit 5;
Output:

Row 1:
──────
client_hostname:
client_name:
query:     select client_hostname, client_name, query, read_rows, elapsed as "Time in seconds since request execution started" from system.processes order by elapsed desc limit 5 format Vertical
read_rows: 1
Time in seconds since request execution started: 0.002441698
6. Average Load Created By SELECT/READ Queries On ClickHouse For Last 7 Days
select
    ml.event_date as "Date",
    sum(ml.CurrentMetric_QueryThread) as "Total number of query processing threads",
    avg(ml.CurrentMetric_QueryThread) as "Average number of query processing threads",
    sum(ml.ProfileEvent_Query) as "Total number of queries executed",
    avg(ml.ProfileEvent_Query) as "Average number of queries executed",
    sum(ml.ProfileEvent_QueryTimeMicroseconds) as "Total time (in Microseconds) spent executing queries",
    avg(ml.ProfileEvent_QueryTimeMicroseconds) as "Average time (in Microseconds) spent executing queries",
    sum(ml.ProfileEvent_NetworkSendElapsedMicroseconds) as "Total Network Latency for sending Data to Network from ClickHouse",
    avg(ml.ProfileEvent_NetworkSendElapsedMicroseconds) as "Average Network Latency for sending Data to Network from ClickHouse",
    sum(ml.ProfileEvent_FileOpen) as "Total number of Files Opened",
    avg(ml.ProfileEvent_FileOpen) as "Average number of Files Opened",
    sum(ml.ProfileEvent_SelectQuery) as "Total number of SELECT queries",
    avg(ml.ProfileEvent_SelectQuery) as "Average number of SELECT queries",
    sum(ml.ProfileEvent_FailedSelectQuery) as "Total failed SELECT queries",
    sum(ml.CurrentMetric_Read) as "Total Reads",
    avg(ml.CurrentMetric_Read) as "Average Reads"
from system.metric_log ml
where ml.event_date >= now() - interval 7 day
  and ml.event_date < now() + interval 1 day
group by ml.event_date
limit 25;
Output:

Row 1:
──────
Date: 2022-12-27
Total number of query processing threads: 34
Average number of query processing threads: 0.0003935185185185185
Total number of queries executed: 30960
Average number of queries executed: 0.35833333333333334
Total time (in Microseconds) spent executing queries: 774826485
Average time (in Microseconds) spent executing queries: 8967.899131944445
Total Network Latency for sending Data to Network from ClickHouse: 17221961
Average Network Latency for sending Data to Network from ClickHouse: 199.32825231481482
Total number of Files Opened: 4184232
Average number of Files Opened: 48.42861111111111
Total number of SELECT queries: 26306
Average number of SELECT queries: 0.3044675925925926
Total failed SELECT queries: 45
Total Reads: 173136
Average Reads: 2.003888888888889

Row 2:
──────
Date: 2022-12-28
Total number of query processing threads: 28
Average number of query processing threads: 0.00032407407407407406
Total number of queries executed: 30699
Average number of queries executed: 0.3553125
Total time (in Microseconds) spent executing queries: 681405011
Average time (in Microseconds) spent executing queries: 7886.632071759259
Total Network Latency for sending Data to Network from ClickHouse: 6080138
Average Network Latency for sending Data to Network from ClickHouse: 70.3719675925926
Total number of Files Opened: 4112347
Average number of Files Opened: 47.596608796296294
Total number of SELECT queries: 29023
Average number of SELECT queries: 0.33591435185185187
Total failed SELECT queries: 29
Total Reads: 173251
Average Reads: 2.0052199074074073
7. Identifying Expensive Query Operations In ClickHouse
SELECT
    ql.user AS user,
    qtl.query AS Query,
    qtl.query_start_time AS "Query Start Time",
    qtl.query_duration_ms AS "Query Duration",
    qtl.memory_usage AS "Memory Usage",
    qtl.peak_memory_usage AS "Peak Memory Usage",
    qtl.written_rows AS "Rows Written",
    qtl.read_rows AS "Rows Read",
    ql.databases AS "Databases Present in Query",
    ql.tables AS "Tables present in the query"
FROM system.query_thread_log AS qtl
INNER JOIN system.query_log AS ql ON qtl.query_id = ql.query_id -- join on query_id; joining only on user would multiply rows
ORDER BY qtl.query_duration_ms DESC
LIMIT 10;
Output:

Row 1:
──────
user: default
Query: SELECT DISTINCT arrayJoin(extractAll(name, '[\\w_]{2,}')) AS res FROM (SELECT * FROM viewIfPermitted(SELECT name FROM system.functions ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.table_engines ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.formats ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.table_functions ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.data_type_families ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.merge_tree_settings ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.settings ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT cluster FROM system.clusters ELSE null('cluster String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT macro FROM system.macros ELSE null('macro String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT policy_name FROM system.storage_policies ELSE null('policy_name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT concat(func.name, comb.name) AS x FROM system.functions AS func CROSS JOIN system.aggregate_function_combinators AS comb WHERE is_aggregate ELSE null('x String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.databases LIMIT 10000 ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT DISTINCT name FROM system.tables LIMIT 10000 ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT DISTINCT name FROM system.dictionaries LIMIT 10000 ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT DISTINCT name FROM system.columns LIMIT 10000 ELSE null('name String'))) WHERE notEmpty(res)
Query Start Time: 2023-01-02 21:22:23
Query Duration: 67
Memory Usage: 9015182
Peak Memory Usage: 9015182
Rows Written: 0
Rows Read: 0
Databases Present in Query: []
Tables present in the query: []
8. The Most CPU / Write / Read-Intensive Queries From query_log
SELECT
    normalized_query_hash,
    any(query),
    count(),
    sum(query_duration_ms) / 1000 AS QueriesDuration,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names, 'RealTimeMicroseconds')]) / 1000000 AS RealTime,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names, 'UserTimeMicroseconds')]) / 1000000 AS UserTime,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names, 'SystemTimeMicroseconds')]) / 1000000 AS SystemTime,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names, 'DiskReadElapsedMicroseconds')]) / 1000000 AS DiskReadTime,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names, 'DiskWriteElapsedMicroseconds')]) / 1000000 AS DiskWriteTime,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names, 'NetworkSendElapsedMicroseconds')]) / 1000000 AS NetworkSendTime,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names, 'NetworkReceiveElapsedMicroseconds')]) / 1000000 AS NetworkReceiveTime,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names, 'ZooKeeperWaitMicroseconds')]) / 1000000 AS ZooKeeperWaitTime,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names, 'OSIOWaitMicroseconds')]) / 1000000 AS OSIOWaitTime,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names, 'OSCPUWaitMicroseconds')]) / 1000000 AS OSCPUWaitTime,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names, 'OSCPUVirtualTimeMicroseconds')]) / 1000000 AS OSCPUVirtualTime,
    sum(read_rows) AS ReadRows,
    formatReadableSize(sum(read_bytes)) AS ReadBytes,
    sum(written_rows) AS WrittenRows,
    formatReadableSize(sum(written_bytes)) AS WrittenBytes,
    sum(result_rows) AS ResultRows,
    formatReadableSize(sum(result_bytes)) AS ResultBytes
FROM system.query_log
WHERE (event_time > (now() - 3600))
  AND type IN (2, 4) -- QueryFinish, ExceptionWhileProcessing
GROUP BY normalized_query_hash
    WITH TOTALS
ORDER BY UserTime DESC
LIMIT 1
FORMAT Vertical
Output:

Row 1:
──────
normalized_query_hash: 13350786718219699381
any(query): SELECT MAX("file_time") as timestamp FROM "default"."github_events"
count(): 1
QueriesDuration: 1.095
RealTime: 40.060057
UserTime: 25.884537
SystemTime: 0.959022
DiskReadTime: 2.536537
DiskWriteTime: 0
NetworkSendTime: 0
NetworkReceiveTime: 0
ZooKeeperWaitTime: 0
OSIOWaitTime: 0
OSCPUWaitTime: 3.689139
OSCPUVirtualTime: 26.844435
ReadRows: 5499832413
ReadBytes: 20.49 GiB
WrittenRows: 0
WrittenBytes: 0.00 B
ResultRows: 1
ResultBytes: 256.00 B

Totals:
───────
normalized_query_hash: 0
any(query): SELECT COUNT(DISTINCT "author_association") / COUNT(1) AS uniqueness FROM "default"."github_events"
count(): 333
QueriesDuration: 5.382
RealTime: 160.792345
UserTime: 60.971457
SystemTime: 4.375224
DiskReadTime: 2.926343
DiskWriteTime: 0
NetworkSendTime: 0.000694
NetworkReceiveTime: 0.00047
ZooKeeperWaitTime: 0
OSIOWaitTime: 0.3
OSCPUWaitTime: 13.742999
OSCPUVirtualTime: 65.346143
ReadRows: 16504589389
ReadBytes: 31.13 GiB
WrittenRows: 0
WrittenBytes: 0.00 B
ResultRows: 5396
ResultBytes: 8.73 MiB
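On newer ClickHouse releases, ProfileEvents in system.query_log is a Map column rather than the parallel Names/Values arrays used above (check which form your version exposes). If yours is a Map, the lookups can be written more compactly; a sketch of the same idea:

SELECT
    normalized_query_hash,
    any(query),
    sum(ProfileEvents['UserTimeMicroseconds']) / 1000000 AS UserTime,
    sum(ProfileEvents['SystemTimeMicroseconds']) / 1000000 AS SystemTime
FROM system.query_log
WHERE event_time > (now() - 3600)
  AND type IN (2, 4)
GROUP BY normalized_query_hash
ORDER BY UserTime DESC
LIMIT 10;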
9. Queries By Latency And Memory Consumption In ClickHouse
select
    ql.event_time as "Query Date and Time",
    ql.query_kind as "Type of the Query",
    ql.query as "Query",
    ql.query_duration_ms / 1000 as "Query Duration in Seconds",
    ql.memory_usage as "Memory Usage",
    ql.written_rows as "Total Number of Rows Written",
    ql.read_rows as "Total Number of Rows Read"
from system.query_log as ql
where ql.event_time > (now() - 3600)
order by "Query Duration in Seconds" desc
limit 1
format Vertical;
Output:

Row 1:
──────
Query Date and Time: 2023-01-02 21:01:15
Type of the Query: Select
Query: SELECT COUNT(DISTINCT "author_association") AS cardinality FROM "default"."github_events"
Query Duration in Seconds: 1.618
Memory Usage: 23352651
Total Number of Rows Written: 0
Total Number of Rows Read: 5499832413
10. Most Expensive ClickHouse Processes By Latency
select
    sp.client_hostname as "Client Hostname",
    sp.query as "Query",
    avg(sp.elapsed) as "Average Query Latency",
    avg(sp.read_rows) as "Average Rows Read",
    avg(sp.written_rows) as "Average Rows Written",
    avg(sp.memory_usage) as "Average Memory Usage",
    avg(sp.peak_memory_usage) as "Peak Memory Usage"
from system.processes as sp
group by sp.query, sp.client_hostname
order by "Average Query Latency" desc
format Vertical;
Output:

Row 1:
──────
Client Hostname:
Query: select sp.client_hostname as "Client Hostname", sp.query as "Query", avg(sp.elapsed) as "Average Query Latency", avg(sp.read_rows) as "Average Rows Read", avg(sp.written_rows) as "Average Rows Written", avg(sp.memory_usage) as "Average Memory Usage", avg(sp.peak_memory_usage) as "Peak Memory Usage" from system.processes as sp group by sp.query, sp.client_hostname order by "Average Query Latency" desc format Vertical
Average Query Latency: 0.000313866
Average Rows Read: 0
Average Rows Written: 0
Average Memory Usage: 0
Peak Memory Usage: 0
11. ClickHouse Query Performance (for last 24 hours) From Most Expensive Threads Perspective
select
    qtl.query as "Query",
    qtl.thread_name as "Thread Name",
    avg(qtl.query_duration_ms) as "Average Query Duration in Milliseconds",
    avg(qtl.memory_usage) as "Average Memory Usage",
    avg(qtl.peak_memory_usage) as "Average Peak Memory Usage",
    avg(read_rows) as "Average Rows Read",
    avg(written_rows) as "Average Number of Rows Written",
    avg(read_bytes) as "Average Reads in Bytes",
    avg(written_bytes) as "Average Writes in Bytes"
from system.query_thread_log qtl
where event_time > (now() - 86400)
group by qtl.query, qtl.thread_name;
12. Monitoring ClickHouse Error Log
select
    e.name as "Error Name",
    sum(e.value) as "Value",
    toDateTime(e.last_error_time) as "Error Reported Recently"
from system.errors e
where e.last_error_time > (now() - 86400)
group by `Error Name`, `Error Reported Recently`;
Output:
Error Name | Value | Error Reported Recently |
TABLE_ALREADY_EXISTS | 268 | 2023-01-02 12:29:58.000 |
UNKNOWN_SETTING | 190 | 2023-01-02 12:28:39.000 |
UNKNOWN_TABLE | 9 | 2023-01-02 21:52:00.000 |
EXTERNAL_SERVER_IS_NOT_RESPONDING | 37 | 2023-01-02 20:43:23.000 |
SYNTAX_ERROR | 32 | 2023-01-02 21:44:14.000 |
CLIENT_HAS_CONNECTED_TO_WRONG_PORT | 18934 | 2023-01-02 21:52:07.000 |
UNEXPECTED_DATA_AFTER_PARSED_VALUE | 315 | 2023-01-02 21:01:09.000 |
TYPE_MISMATCH | 315 | 2023-01-02 21:01:09.000 |
CANNOT_PARSE_QUOTED_STRING | 72 | 2023-01-02 12:29:45.000 |
UNKNOWN_FORMAT | 1 | 2023-01-02 21:48:57.000 |
ACCESS_DENIED | 390 | 2023-01-02 12:29:12.000 |
NETWORK_ERROR | 82 | 2023-01-02 12:25:07.000 |
AUTHENTICATION_FAILED | 146 | 2023-01-02 12:29:10.000 |
CANNOT_READ_ALL_DATA | 190 | 2023-01-02 12:28:39.000 |
IP_ADDRESS_NOT_ALLOWED | 32 | 2023-01-01 21:54:04.000 |
UNKNOWN_USER | 96 | 2023-01-02 12:29:10.000 |
13. Auditing ClickHouse Threads
select
    tl.thread_name,
    count(tl.thread_name),
    tl.logger_name,
    count(tl.logger_name),
    tl.message
from system.text_log as tl
where tl.event_time > (now() - 86400)
group by tl.logger_name, tl.thread_name, tl.message;
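If the full listing is too noisy, a narrower variant (a sketch; it assumes your build's system.text_log exposes the usual level column) restricts the audit to warnings and above:

select
    tl.level,
    tl.logger_name,
    count() as occurrences,
    any(tl.message) as sample_message -- one representative message per logger
from system.text_log as tl
where tl.event_time > (now() - 86400)
  and tl.level in ('Fatal', 'Critical', 'Error', 'Warning')
group by tl.level, tl.logger_name
order by occurrences desc;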
14. Measure ClickHouse Infrastructure Operation Throughput Every 5 Minutes
select
    toStartOfFiveMinutes(toDateTime(ml.event_time)) as Period,
    avg(ml.CurrentMetric_Query) as "Average Number of Queries Executed",
    avg(ml.CurrentMetric_QueryThread) as "Average Number of query processing threads",
    avg(ml.CurrentMetric_Read) as "Average Number of Reads",
    avg(ml.CurrentMetric_Write) as "Average Number of Writes",
    avg(ml.CurrentMetric_MemoryTracking) as "Average amount of memory (bytes) allocated by the server",
    avg(ml.CurrentMetric_DictCacheRequests) as "Average Number of Dictionary Cache Requests"
from system.metric_log ml
where event_time > (now() - 86400)
group by Period
order by Period desc
limit 1
format Vertical;
Output:

Row 1:
──────
Period: 2023-01-02 21:55:00
Average Number of Queries Executed: 0
Average Number of query processing threads: 0
Average Number of Reads: 2
Average Number of Writes: 0
Average amount of memory (bytes) allocated by the server: 5371874273.895833
Average Number of Dictionary Cache Requests: 0
15. Monitoring Query Latency For Last 24 Hours (per-minute time-series query performance data)
select
    toStartOfMinute(event_time) as Period,
    substring(ql.query, 1, 50) as Query,
    formatReadableSize(ql.read_bytes) as Read_Size,
    formatReadableSize(ql.written_bytes) as Write_Size,
    formatReadableSize(ql.result_bytes) as Result_Size,
    formatReadableSize(ql.memory_usage) as Memory_Usage,
    ql.query_duration_ms / 1000 as Query_Latency_in_Seconds
from system.query_log ql
where ql.event_time > (now() - 86400)
order by Period desc
limit 1
format Vertical;
Output:

Row 1:
──────
Period: 2023-01-02 21:58:00
Query: select toStartOfMinute(event_time) as Period,
Read_Size: 1.92 MiB
Write_Size: 0.00 B
Result_Size: 3.00 KiB
Memory_Usage: 5.66 MiB
Query_Latency_in_Seconds: 0.01
16. Most Expensive ClickHouse Processes (Top ClickHouse)
select
    sp.user as User,
    substring(sp.query, 1, 40) as Query,
    sp.elapsed as Query_Latency_in_Seconds,
    formatReadableSize(sp.memory_usage) as Memory_Consumed_by_ClickHouse_Server,
    formatReadableSize(sp.peak_memory_usage) as Peak_Memory_Consumed_by_ClickHouse_Server,
    sp.total_rows_approx as Total_Rows_Processed_Approximately,
    formatReadableSize(sp.read_bytes) as Total_Reads_in_Bytes,
    sp.read_rows as Total_Rows_Read,
    formatReadableSize(sp.written_bytes) as Total_Writes_in_Bytes,
    sp.written_rows as Total_Rows_Written
from system.processes sp
order by sp.elapsed desc
format Vertical;
Output:

Row 1:
──────
User: demo
Query: select sp.user as User, substrin
Query_Latency_in_Seconds: 0.000410529
Memory_Consumed_by_ClickHouse_Server: 0.00 B
Peak_Memory_Consumed_by_ClickHouse_Server: 0.00 B
Total_Rows_Processed_Approximately: 0
Total_Reads_in_Bytes: 0.00 B
Total_Rows_Read: 0
Total_Writes_in_Bytes: 0.00 B
Total_Rows_Written: 0
17. Most Expensive ClickHouse Thread Performance Operations
select
    client_name as Client,
    client_hostname as Client_Hostname,
    user as User,
    query_start_time as Query_Start_Time,
    query as Query,
    query_duration_ms / 1000 AS Duration_in_Seconds,
    round(memory_usage / 1048576) AS Memory_usage_MB,
    formatReadableSize(written_bytes) as WRITES_in_Bytes,
    formatReadableSize(read_bytes) as READs_in_Bytes
from system.query_thread_log
where query_start_time >= date_sub(DAY, 7, today())
order by Duration_in_Seconds desc
limit 1
format Vertical;
Output:

Row 1:
──────
Client: ClickHouse client
Client_Hostname:
User: default
Query_Start_Time: 2023-01-02 21:22:23
Query: SELECT DISTINCT arrayJoin(extractAll(name, '[\\w_]{2,}')) AS res FROM (SELECT * FROM viewIfPermitted(SELECT name FROM system.functions ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.table_engines ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.formats ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.table_functions ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.data_type_families ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.merge_tree_settings ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.settings ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT cluster FROM system.clusters ELSE null('cluster String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT macro FROM system.macros ELSE null('macro String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT policy_name FROM system.storage_policies ELSE null('policy_name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT concat(func.name, comb.name) AS x FROM system.functions AS func CROSS JOIN system.aggregate_function_combinators AS comb WHERE is_aggregate ELSE null('x String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.databases LIMIT 10000 ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT DISTINCT name FROM system.tables LIMIT 10000 ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT DISTINCT name FROM system.dictionaries LIMIT 10000 ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT DISTINCT name FROM system.columns LIMIT 10000 ELSE null('name String'))) WHERE notEmpty(res)
Duration_in_Seconds: 0.067
Memory_usage_MB: 9
WRITES_in_Bytes: 0.00 B
READs_in_Bytes: 0.00 B
18. Monitoring Slow Queries By Response Time/Latency And IOPS
select
    formatDateTime(ql.query_start_time, '%Y/%m/%d %H:%M:%S') as Query_Started,
    ql.user,
    substring(ql.query, 1, 50) as Query,
    (ql.query_duration_ms / 1000) as Query_Duration_in_Seconds,
    round(ql.memory_usage / 1048576) AS Memory_Usage_MB,
    formatReadableSize(ql.result_bytes) AS result_size_in_bytes,
    ql.result_rows as Total_Rows_Processed,
    ql.read_rows as Rows_Read,
    formatReadableSize(ql.read_bytes) AS READs_IN_Bytes,
    ql.written_rows as Rows_written,
    formatReadableSize(ql.written_bytes) as Writes_IN_Bytes
from system.query_log ql
where ql.type = 'QueryFinish'
order by Query_Duration_in_Seconds DESC;
19. Top Most READ Intensive Queries In ClickHouse
select
    formatDateTime(ql.query_start_time, '%Y/%m/%d %H:%M:%S') as Query_Started,
    ql.user,
    substring(ql.query, 1, 50) as Query,
    (ql.query_duration_ms / 1000) as Query_Duration_in_Seconds,
    round(ql.memory_usage / 1048576) AS Memory_Usage_MB,
    formatReadableSize(ql.result_bytes) AS result_size_in_bytes,
    ql.result_rows as Total_Rows_Processed,
    ql.read_rows as Rows_Read,
    formatReadableSize(ql.read_bytes) AS READs_IN_Bytes,
    ql.written_rows as Rows_written,
    formatReadableSize(ql.written_bytes) as Writes_IN_Bytes
from system.query_log ql
where ql.type = 'QueryFinish'
order by ql.read_bytes DESC -- sort on the raw byte count, not the formatted string
limit 1
format Vertical;
Output:

Row 1:
──────
Query_Started: 2022/12/27 05:51:38
user: demo
Query: select null as TABLE_CAT, database as TABLE_SCHEM,
Query_Duration_in_Seconds: 0.004
Memory_Usage_MB: 0
result_size_in_bytes: 33.32 KiB
Total_Rows_Processed: 11
Rows_Read: 11
READs_IN_Bytes: 999.00 B
Rows_written: 0
Writes_IN_Bytes: 0.00 B
20. Top Most WRITE Intensive Queries In ClickHouse
select
    formatDateTime(ql.query_start_time, '%Y/%m/%d %H:%M:%S') as Query_Started,
    ql.user,
    substring(ql.query, 1, 50) as Query,
    (ql.query_duration_ms / 1000) as Query_Duration_in_Seconds,
    round(ql.memory_usage / 1048576) AS Memory_Usage_MB,
    formatReadableSize(ql.result_bytes) AS result_size_in_bytes,
    ql.result_rows as Total_Rows_Processed,
    ql.read_rows as Rows_Read,
    formatReadableSize(ql.read_bytes) AS READs_IN_Bytes,
    ql.written_rows as Rows_written,
    formatReadableSize(ql.written_bytes) as Writes_IN_Bytes
from system.query_log ql
where ql.type = 'QueryFinish'
order by ql.written_bytes DESC -- sort on the raw byte count, not the formatted string
limit 1
format Vertical;
Output:

Row 1:
──────
Query_Started: 2022/12/29 16:00:20
user: updater
Query: INSERT INTO github_events_1672329620 FORMAT JSONCo
Query_Duration_in_Seconds: 22.812
Memory_Usage_MB: 370
result_size_in_bytes: 99.75 MiB
Total_Rows_Processed: 174987
Rows_Read: 174987
READs_IN_Bytes: 99.75 MiB
Rows_written: 174987
Writes_IN_Bytes: 99.75 MiB
21. Find Top Largest ClickHouse Tables
SELECT
    database as Database_Name,
    name as Table_Name,
    engine as Storage_Engine,
    storage_policy as Storage_Policy,
    total_rows as Total_Rows,
    formatReadableSize(sum(total_bytes)) as Size,
    is_temporary,
    metadata_modification_time as Schema_Modification_DateTime
FROM system.tables
WHERE database != 'system' and total_rows > 0
GROUP BY database, name, engine, is_temporary, metadata_modification_time, storage_policy, total_rows, total_bytes
ORDER BY sum(total_bytes) DESC
LIMIT 5
format Vertical;
Output:

Row 1:
──────
Database_Name: default
Table_Name: github_events
Storage_Engine: MergeTree
Storage_Policy: default
Total_Rows: 5499832413
Size: 333.44 GiB
is_temporary: 0
Schema_Modification_DateTime: 2021-04-29 09:21:08

Row 2:
──────
Database_Name: default
Table_Name: tripdata
Storage_Engine: MergeTree
Storage_Policy: default
Total_Rows: 1310903963
Size: 54.55 GiB
is_temporary: 0
Schema_Modification_DateTime: 2021-04-29 09:21:08

Row 3:
──────
Database_Name: default
Table_Name: ontime
Storage_Engine: MergeTree
Storage_Policy: default
Total_Rows: 201575308
Size: 13.30 GiB
is_temporary: 0
Schema_Modification_DateTime: 2021-04-29 09:21:08

Row 4:
──────
Database_Name: default
Table_Name: dict_source_2
Storage_Engine: MergeTree
Storage_Policy: default
Total_Rows: 10000000
Size: 466.36 MiB
is_temporary: 0
Schema_Modification_DateTime: 2022-08-25 23:37:52

Row 5:
──────
Database_Name: default
Table_Name: airports
Storage_Engine: MergeTree
Storage_Policy: default
Total_Rows: 7543
Size: 377.71 KiB
is_temporary: 0
Schema_Modification_DateTime: 2021-04-29 09:27:11
22. Find Largest Databases On ClickHouse
SELECT
    database as Database_Name,
    sum(total_rows) as Total_Rows,
    formatReadableSize(sum(total_bytes)) as Size
FROM system.tables
WHERE database != 'system' and total_rows > 0
GROUP BY database
ORDER BY sum(total_bytes) DESC
LIMIT 25
format Vertical;
Output:

Row 1:
──────
Database_Name: default
Total_Rows: 7022339227
Size: 401.75 GiB
Note: You can disable any (or a few) of these log tables with a config override like the one below:
cat /etc/clickhouse-server/config.d/z_log_disable.xml
<?xml version="1.0"?>
<yandex>
    <asynchronous_metric_log remove="1"/>
    <metric_log remove="1"/>
    <query_thread_log remove="1"/>
    <query_log remove="1"/>
    <query_views_log remove="1"/>
    <part_log remove="1"/>
    <session_log remove="1"/>
    <text_log remove="1"/>
    <trace_log remove="1"/>
    <crash_log remove="1"/>
    <opentelemetry_span_log remove="1"/>
    <zookeeper_log remove="1"/>
</yandex>
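These overrides take effect after a server restart. To see which *_log tables are currently active on an instance, you can simply list them:

select name
from system.tables
where database = 'system' and endsWith(name, '_log');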