
Troubleshooting ClickHouse Performance

Monitoring ClickHouse Events and Performance

In this article, you will learn how to monitor the following operations in ClickHouse:

  • Monitoring Expensive Query Operations In ClickHouse
  • Expensive Queries In ClickHouse By Memory/Available RAM Usage
  • Monitoring The Most Expensive READ Queries Happening On ClickHouse
  • Top 5 Most Expensive Threads In ClickHouse By Memory Usage
  • Top 5 Queries By Latency
  • Average Load Created By SELECT/READ Queries On ClickHouse For Last 7 Days
  • Identifying Expensive Query Operations In ClickHouse
  • The Most CPU / Write / Read-Intensive Queries From query_log
  • Queries By Latency And Memory Consumption In ClickHouse
  • Most Expensive ClickHouse Processes By Latency
  • ClickHouse Query Performance (for last 24 hours) From Most Expensive Threads Perspective
  • Monitoring ClickHouse Error Log
  • Auditing ClickHouse Threads
  • Measure ClickHouse Infrastructure Operation Throughput Every 5 Minutes
  • Monitoring Query Latency For Last 24 Hours (time-series query performance data for every minute)
  • Most Expensive ClickHouse Processes (Top ClickHouse)
  • Most Expensive ClickHouse Thread Performance Operations
  • Monitoring Slow Queries By Response Time/Latency And IOPs
  • Top Most READ Intensive Queries In ClickHouse
  • Top Most WRITE Intensive Queries In ClickHouse
  • Find Top Largest ClickHouse Tables
  • Find Largest Databases On ClickHouse

Configuring ClickHouse config.xml for logging

An example config.xml file. By default, the “trace” level is enabled.

<yandex>
    <logger>
        <!-- Possible levels [1]:
             - none (turns off logging)
             - fatal
             - critical
             - error
             - warning
             - notice
             - information
             - debug
             - trace
             [1]: https://github.com/pocoproject/poco/blob/poco-1.9.4-release/Foundation/include/Poco/Logger.h#L105-L114
        -->
        <level>trace</level>
        <log>/var/log/clickhouse-server/clickhouse-server.log</log>
        <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
        <!-- Rotation policy
             See https://github.com/pocoproject/poco/blob/poco-1.9.4-release/Foundation/include/Poco/FileChannel.h#L54-L85
          -->
        <size>1000M</size>
        <count>10</count>
        <!-- <console>1</console> --> <!-- Default behavior is autodetection (log to console if not in daemon mode and is a tty) -->
        <!-- Per level overrides (legacy):
        For example to suppress logging of the ConfigReloader you can use:
        NOTE: levels.logger is reserved, see below.
        -->
        <!--
        <levels>
          <ConfigReloader>none</ConfigReloader>
        </levels>
        -->
        <!-- Per level overrides:
        For example to suppress logging of the RBAC for default user you can use:
        (But please note that the logger name may be changed from version to
        version, even after a minor upgrade)
        -->
        <!--
        <levels>
          <logger>
            <name>ContextAccess (default)</name>
            <level>none</level>
          </logger>
          <logger>
            <name>DatabaseOrdinary (test)</name>
            <level>none</level>
          </logger>
        </levels>
        -->
    </logger>
</yandex>

Quite often you will want on-disk GROUP BY / ORDER BY enabled (both are disabled by default):

cat /etc/clickhouse-server/users.d/enable_on_disk_operations.xml
<yandex>
    <profiles>
        <default>
            <max_bytes_before_external_group_by>2000000000</max_bytes_before_external_group_by>
            <max_bytes_before_external_sort>2000000000</max_bytes_before_external_sort>
        </default>
    </profiles>
</yandex>
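
After reloading the configuration or reconnecting, you can confirm that the overrides are active for your session. A quick sanity check against system.settings:

select name, value
from system.settings
where name in ('max_bytes_before_external_group_by', 'max_bytes_before_external_sort');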

On older versions, log_queries is disabled by default. It is worth always keeping it enabled:

$ cat /etc/clickhouse-server/users.d/log_queries.xml
<yandex>
    <profiles>
        <default>
            <log_queries>1</log_queries>
            <log_query_threads>1</log_query_threads>
        </default>
    </profiles>
</yandex>

part_log is also worth enabling, especially at the beginning or during system tuning and analysis:

cat /etc/clickhouse-server/config.d/part_log.xml
<yandex>
    <part_log replace="1">
        <database>system</database>
        <table>part_log</table>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        <engine>
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_time)
TTL toStartOfMonth(event_date) + INTERVAL 3 MONTH
SETTINGS ttl_only_drop_parts=1
        </engine>
    </part_log>
</yandex>
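
Once part_log is enabled and populated, merge and insert activity can be summarized directly from it. A simple sketch using the standard part_log columns:

select event_type, count() as events
from system.part_log
where event_date = today()
group by event_type
order by events desc;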

For query_log and the other _log tables, set up a TTL or some other cleanup procedure:

cat /etc/clickhouse-server/config.d/query_log.xml
<yandex>
    <query_log replace="1">
        <database>system</database>
        <table>query_log</table>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        <engine>
ENGINE = MergeTree
PARTITION BY event_date
ORDER BY (event_time)
TTL event_date + interval 90 day
SETTINGS ttl_only_drop_parts=1
        </engine>
    </query_log>
</yandex>
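
Note that the <engine> override only takes effect when the table is created, so it will not touch an already-existing query_log. For an existing table, a TTL can be added in place; a hedged sketch (adjust the retention to taste):

ALTER TABLE system.query_log
    MODIFY TTL event_date + INTERVAL 90 DAY;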

Queries

1. Monitoring Expensive Query Operations In ClickHouse

select toDateTime(event_time) as "Event Date Time",
       sum(ProfileEvent_InsertedRows) as "Number of rows INSERTed to all tables",
       sum(ProfileEvent_DiskWriteElapsedMicroseconds) as "Total time spent waiting for write syscall (includes writes to page cache)",
       sum(ProfileEvent_InsertQuery) as "Total number of INSERT queries",
       sum(ProfileEvent_InsertedBytes) as "Total data INSERTed in Bytes",
       sum(ProfileEvent_SelectQuery) as "Total number of SELECT queries",
       sum(ProfileEvent_SelectQueryTimeMicroseconds) as "Total time of SELECT queries",
       sum(ProfileEvent_SelectedRows) as "Total number of rows SELECTed",
       sum(ProfileEvent_DiskReadElapsedMicroseconds) as "Total time spent waiting for read syscall (includes reads from page cache)",
       sum(ProfileEvent_QueryTimeMicroseconds) as "Total time of all queries"
from system.metric_log
group by event_time
order by "Total time of all queries" desc;

Output:

Event Date Time         | Rows INSERTed | Write wait (µs) | INSERT queries | Bytes INSERTed | SELECT queries | SELECT time (µs) | Rows SELECTed | Read wait (µs) | Total query time (µs)
2022-12-11 12:01:33.000 | 186323        | 32206           | 0              | 467000485      | 0              | 0                | 186323        | 996845         | 168278623
2022-12-11 00:01:22.000 | 412253        | 21412           | 0              | 966018280      | 0              | 0                | 412253        | 834287         | 128038826
2022-12-12 16:01:09.000 | 716421        | 40039           | 0              | 955276843      | 1              | 1097             | 716425        | 557773         | 125437290
2022-12-12 20:01:04.000 | 662376        | 36003           | 0              | 900578104      | 0              | 0                | 662376        | 818950         | 116752507
2022-12-12 00:01:04.000 | 383957        | 25265           | 0              | 711821462      | 17             | 83512            | 2817741653    | 2320227        | 107371270

 

2. Expensive Queries In ClickHouse By Memory/Available RAM Usage

select system.processes.client_hostname,
       system.processes.client_name,
       system.processes.query,
       system.processes.memory_usage,
       system.processes.read_rows
from system.processes
where system.processes.peak_memory_usage > (
       select avg(system.processes.peak_memory_usage)
       from system.processes
);

Output:

Row 1:
──────
client_hostname: 
client_name:     
query:           select system.processes.client_hostname, system.processes.client_name, system.processes.query, system.processes.memory_usage, system.processes.read_rows from system.processes where system.processes.peak_memory_usage > ( select avg(system.processes.peak_memory_usage) from system.processes ) format Vertical
memory_usage:    29024
read_rows:       1
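
If a query surfaced by system.processes needs to be stopped, it can be terminated by its query_id. A short companion example (the id below is a placeholder; take it from the query_id column of system.processes):

SELECT query_id, query FROM system.processes;
KILL QUERY WHERE query_id = '<query_id>';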

3. Monitoring The Most Expensive READ Queries Happening On ClickHouse

select system.processes.client_hostname,
       system.processes.client_name,
       system.processes.query,
       system.processes.read_rows
from system.processes
where system.processes.read_rows > (
    select avg(system.processes.read_rows)
    from system.processes
);

Output:

Row 1:
──────
client_hostname: 
client_name:     
query:           select system.processes.client_hostname, system.processes.client_name, system.processes.query, system.processes.read_rows from system.processes where system.processes.read_rows > ( select avg(system.processes.read_rows) from system.processes ) format Vertical
read_rows:       1

4. Top 5 Most Expensive Threads In ClickHouse By Memory Usage

select
    system.query_thread_log.query,
    system.query_thread_log.query_start_time_microseconds,
    system.query_thread_log.query_duration_ms,
    system.query_thread_log.read_rows,
    system.query_thread_log.written_rows, 
    system.query_thread_log.memory_usage
from
    system.query_thread_log
where system.query_thread_log.memory_usage >
(
    select avg(system.query_thread_log.memory_usage)
    from system.query_thread_log
)
order by system.query_thread_log.memory_usage desc
limit 5;

5. Top 5 Queries By Latency

select system.processes.client_hostname,
       system.processes.client_name,
       system.processes.query,
       system.processes.read_rows,
       system.processes.elapsed as "Time in seconds since request execution started"
from system.processes
order by system.processes.elapsed desc
limit 5;

Output:

Row 1:
──────
client_hostname:                                  
client_name:                                      
query: select system.processes.client_hostname, system.processes.client_name, system.processes.query, system.processes.read_rows, system.processes.elapsed as "Time in seconds since request execution started" from system.processes order by system.processes.elapsed desc limit 5 format Vertical
read_rows:                                        1
Time in seconds since request execution started: 0.002441698

6. Average Load Created By SELECT/READ Queries On ClickHouse For Last 7 Days

select
system.metric_log.event_date as "Date",
sum(system.metric_log.CurrentMetric_QueryThread) as "Total number of query processing threads",
avg(system.metric_log.CurrentMetric_QueryThread) as "Average number of query processing threads",
sum(system.metric_log.ProfileEvent_Query) as "Total number of queries executed",
avg(system.metric_log.ProfileEvent_Query) as "Average number of queries executed",
sum(system.metric_log.ProfileEvent_QueryTimeMicroseconds) as "Total time (in Microseconds) spent executing queries",
avg(system.metric_log.ProfileEvent_QueryTimeMicroseconds) as "Average time (in Microseconds) spent executing queries",
sum(system.metric_log.ProfileEvent_NetworkSendElapsedMicroseconds) as "Total Network Latency for sending Data to Network from ClickHouse",
avg(system.metric_log.ProfileEvent_NetworkSendElapsedMicroseconds) as "Average Network Latency for sending Data to Network from ClickHouse",
sum(system.metric_log.ProfileEvent_FileOpen) as "Total number of Files Opened",
avg(system.metric_log.ProfileEvent_FileOpen) as "Average number of Files Opened",
sum(system.metric_log.ProfileEvent_SelectQuery) as "Total number of SELECT queries",
avg(system.metric_log.ProfileEvent_SelectQuery) as "Average number of SELECT queries",
sum(system.metric_log.ProfileEvent_FailedSelectQuery) as "Total failed SELECT queries",
sum(system.metric_log.CurrentMetric_Read) as "Total Reads",
avg(system.metric_log.CurrentMetric_Read) as "Average Reads"
from
    system.metric_log
where
system.metric_log.event_date >= now()-interval 7 DAY
and
system.metric_log.event_date < now() + interval 1 DAY
group by
system.metric_log.event_date
limit 25;

Output:

Row 1:
──────
Date:                                                               2022-12-27
Total number of query processing threads:                           34
Average number of query processing threads:                         0.0003935185185185185
Total number of queries executed:                                   30960
Average number of queries executed:                                  0.35833333333333334
Total time (in Microseconds) spent executing queries:                774826485
Average time (in Microseconds) spent executing queries:              8967.899131944445
Total Network Latency for sending Data to Network from ClickHouse:   17221961
Average Network Latency for sending Data to Network from ClickHouse: 199.32825231481482
Total number of Files Opened:                                       4184232
Average number of Files Opened:                                     48.42861111111111
Total number of SELECT queries:                                     26306
Average number of SELECT queries:                                   0.3044675925925926
Total failed SELECT queries:                                        45
Total Reads:                                                        173136
Average Reads:                                                      2.003888888888889

Row 2:
──────
Date:                                                               2022-12-28
Total number of query processing threads:                           28
Average number of query processing threads:                         0.00032407407407407406
Total number of queries executed:                                   30699
Average number of queries executed:                                  0.3553125
Total time (in Microseconds) spent executing queries:                681405011
Average time (in Microseconds) spent executing queries:              7886.632071759259
Total Network Latency for sending Data to Network from ClickHouse:   6080138
Average Network Latency for sending Data to Network from ClickHouse: 70.3719675925926
Total number of Files Opened:                                       4112347
Average number of Files Opened:                                     47.596608796296294
Total number of SELECT queries:                                     29023
Average number of SELECT queries:                                   0.33591435185185187
Total failed SELECT queries:                                        29
Total Reads:                                                        173251
Average Reads:                                                      2.0052199074074073

7. Identifying Expensive Query Operations In ClickHouse

SELECT
    ql.user AS user,
    qtl.query AS Query,
    qtl.query_start_time AS "Query Start Time",
    qtl.query_duration_ms AS "Query Duration",
    qtl.memory_usage AS "Memory Usage",
    qtl.peak_memory_usage AS "Peak Memory Usage",
    qtl.written_rows AS "Rows Written",
    qtl.read_rows AS "Rows Read",
    ql.databases AS "Databases Present in Query",
    ql.tables AS "Tables present in the query"
FROM system.query_thread_log AS qtl, system.query_log AS ql
WHERE qtl.user = ql.user
ORDER BY qtl.query_duration_ms DESC
LIMIT 10;

Output:

Row 1:
──────
user:                        default
Query:                       SELECT DISTINCT arrayJoin(extractAll(name, '[\\w_]{2,}')) AS res FROM (SELECT * FROM viewIfPermitted(SELECT name FROM system.functions ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.table_engines ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.formats ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.table_functions ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.data_type_families ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.merge_tree_settings ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.settings ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT cluster FROM system.clusters ELSE null('cluster String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT macro FROM system.macros ELSE null('macro String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT policy_name FROM system.storage_policies ELSE null('policy_name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT concat(func.name, comb.name) AS x FROM system.functions AS func CROSS JOIN system.aggregate_function_combinators AS comb WHERE is_aggregate ELSE null('x String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.databases LIMIT 10000 ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT DISTINCT name FROM system.tables LIMIT 10000 ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT DISTINCT name FROM system.dictionaries LIMIT 10000 ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT DISTINCT name FROM system.columns LIMIT 10000 ELSE null('name String'))) WHERE notEmpty(res)
Query Start Time:            2023-01-02 21:22:23
Query Duration:              67
Memory Usage:                9015182
Peak Memory Usage:           9015182
Rows Written:                0
Rows Read:                   0
Databases Present in Query:  []
Tables present in the query: []

 

8. The Most CPU / Write / Read-Intensive Queries From query_log

SELECT
    normalized_query_hash,
    any(query),
    count(),
    sum(query_duration_ms) / 1000 AS QueriesDuration,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names,'RealTimeMicroseconds')]) / 1000000 AS RealTime,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names,'UserTimeMicroseconds')]) / 1000000 AS UserTime,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names,'SystemTimeMicroseconds')]) / 1000000 AS SystemTime,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names,'DiskReadElapsedMicroseconds')]) / 1000000 AS DiskReadTime,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names,'DiskWriteElapsedMicroseconds')]) / 1000000 AS DiskWriteTime,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names,'NetworkSendElapsedMicroseconds')]) / 1000000 AS NetworkSendTime,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names,'NetworkReceiveElapsedMicroseconds')]) / 1000000 AS NetworkReceiveTime,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names,'ZooKeeperWaitMicroseconds')]) / 1000000 AS ZooKeeperWaitTime,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names,'OSIOWaitMicroseconds')]) / 1000000 AS OSIOWaitTime,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names,'OSCPUWaitMicroseconds')]) / 1000000 AS OSCPUWaitTime,
    sum(ProfileEvents.Values[indexOf(ProfileEvents.Names,'OSCPUVirtualTimeMicroseconds')]) / 1000000 AS OSCPUVirtualTime,
    sum(read_rows) AS ReadRows,
    formatReadableSize(sum(read_bytes)) AS ReadBytes,
    sum(written_rows) AS WrittenRows,
    formatReadableSize(sum(written_bytes)) AS WrittenBytes,
    sum(result_rows) AS ResultRows,
    formatReadableSize(sum(result_bytes)) AS ResultBytes
FROM system.query_log
WHERE (event_time > (now() - 3600)) AND type in (2,4) -- QueryFinish,ExceptionWhileProcessing
GROUP BY normalized_query_hash
    WITH TOTALS
ORDER BY UserTime DESC
LIMIT 1
FORMAT Vertical

Output:

Row 1:
──────
normalized_query_hash: 13350786718219699381
any(query):            SELECT MAX("file_time") as timestamp
    FROM "default"."github_events"
count():               1
QueriesDuration:       1.095
RealTime:              40.060057
UserTime:              25.884537
SystemTime:            0.959022
DiskReadTime:          2.536537
DiskWriteTime:         0
NetworkSendTime:       0
NetworkReceiveTime:    0
ZooKeeperWaitTime:     0
OSIOWaitTime:          0
OSCPUWaitTime:         3.689139
OSCPUVirtualTime:      26.844435
ReadRows:              5499832413
ReadBytes:             20.49 GiB
WrittenRows:           0
WrittenBytes:          0.00 B
ResultRows:            1
ResultBytes:           256.00 B


Totals:
───────
normalized_query_hash: 0
any(query):            SELECT COUNT(DISTINCT "author_association") / COUNT(1) AS uniqueness
    FROM "default"."github_events"
count():               333
QueriesDuration:       5.382
RealTime:              160.792345
UserTime:              60.971457
SystemTime:            4.375224
DiskReadTime:          2.926343
DiskWriteTime:         0
NetworkSendTime:       0.000694
NetworkReceiveTime:    0.00047
ZooKeeperWaitTime:     0
OSIOWaitTime:          0.3
OSCPUWaitTime:         13.742999
OSCPUVirtualTime:      65.346143
ReadRows:              16504589389
ReadBytes:             31.13 GiB
WrittenRows:           0
WrittenBytes:          0.00 B
ResultRows:            5396
ResultBytes:           8.73 MiB
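
The ProfileEvents.Values[indexOf(...)] lookups above match older releases, where ProfileEvents was exposed as a pair of parallel arrays. On recent ClickHouse versions ProfileEvents is a Map, so each lookup can be written more directly; a sketch of the same idea in the newer syntax:

SELECT
    normalized_query_hash,
    any(query),
    sum(ProfileEvents['UserTimeMicroseconds']) / 1000000 AS UserTime
FROM system.query_log
WHERE (event_time > (now() - 3600)) AND type IN (2, 4)
GROUP BY normalized_query_hash
ORDER BY UserTime DESC
LIMIT 1;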

9. Queries By Latency And Memory Consumption In ClickHouse

select
    ql.event_time as "Query Date and Time",
    ql.query_kind as "Type of the Query",
    ql.query as "Query",
    ql.query_duration_ms/1000 as "Query Duration in Seconds",
    ql.memory_usage as "Memory Usage",
    ql.written_rows as "Total Number of Rows Written",
    ql.read_rows as "Total Number of Rows Read"
from
    system.query_log as ql
where (ql.event_time > (now() - 3600))
order by "Query Duration in Seconds" desc 
limit 1
format Vertical;

Output:

Row 1:
──────
Query Date and Time:           2023-01-02 21:01:15
Type of the Query:             Select
Query:                         SELECT COUNT(DISTINCT "author_association") AS cardinality
    FROM "default"."github_events"
Query Duration in Seconds:     1.618
Memory Usage:                  23352651
Total Number of Rows Written:  0
Total Number of Rows Read:     5499832413

10. Most Expensive ClickHouse Processes By Latency

select
    sp.client_hostname as "Client Hostname",
    sp.query as "Query",
    avg(sp.elapsed) as "Average Query Latency",
    avg(sp.read_rows) as "Average Rows Read",
    avg(sp.written_rows) as "Average Rows Written",
    avg(sp.memory_usage) as "Average Memory Usage",
    avg(sp.peak_memory_usage) as "Peak Memory Usage"
from system.processes as sp
group by sp.query, sp.client_hostname
order by "Average Query Latency" desc format Vertical;

Output:

Row 1:
──────
Client Hostname:       
Query:                 select
    sp.client_hostname as "Client Hostname",
    sp.query as "Query",
    avg(sp.elapsed) as "Average Query Latency",
    avg(sp.read_rows) as "Average Rows Read",
    avg(sp.written_rows) as "Average Rows Written",
    avg(sp.memory_usage) as "Average Memory Usage",
    avg(sp.peak_memory_usage) as "Peak Memory Usage"
from system.processes as sp
group by sp.query, sp.client_hostname
order by "Average Query Latency" desc format Vertical
Average Query Latency: 0.000313866
Average Rows Read:     0
Average Rows Written:  0
Average Memory Usage:  0
Peak Memory Usage:     0

11. ClickHouse Query Performance (for last 24 hours) From Most Expensive Threads Perspective

select
    qtl.query as "Query",
    qtl.thread_name as "Thread Name",
    avg(qtl.query_duration_ms) as "Average Query Duration in Milliseconds",
    avg(qtl.memory_usage) as "Average Memory Usage",
    avg(qtl.peak_memory_usage) as "Average Peak Memory Usage",
    avg(read_rows) as "Average Rows Read",
    avg(written_rows) as "Average Number of Rows Written",
    avg(read_bytes) as "Average Reads in Bytes",
    avg(written_bytes) as "Average Writes in Bytes"
from
    system.query_thread_log qtl
WHERE (event_time > (now() - 86400))
group by qtl.query, qtl.thread_name;

12. Monitoring ClickHouse Error Log

select
    e.name as "Error Name",
    sum(e.value) as "Value",
    toDateTime(last_error_time) as "Error Reported Recently"
from
    system.errors e
where (e.last_error_time > (now() - 86400))
group by `Error Name`,`Error Reported Recently`;

Output:

Error Name                         | Value | Error Reported Recently
TABLE_ALREADY_EXISTS               | 268   | 2023-01-02 12:29:58.000
UNKNOWN_SETTING                    | 190   | 2023-01-02 12:28:39.000
UNKNOWN_TABLE                      | 9     | 2023-01-02 21:52:00.000
EXTERNAL_SERVER_IS_NOT_RESPONDING  | 37    | 2023-01-02 20:43:23.000
SYNTAX_ERROR                       | 32    | 2023-01-02 21:44:14.000
CLIENT_HAS_CONNECTED_TO_WRONG_PORT | 18934 | 2023-01-02 21:52:07.000
UNEXPECTED_DATA_AFTER_PARSED_VALUE | 315   | 2023-01-02 21:01:09.000
TYPE_MISMATCH                      | 315   | 2023-01-02 21:01:09.000
CANNOT_PARSE_QUOTED_STRING         | 72    | 2023-01-02 12:29:45.000
UNKNOWN_FORMAT                     | 1     | 2023-01-02 21:48:57.000
ACCESS_DENIED                      | 390   | 2023-01-02 12:29:12.000
NETWORK_ERROR                      | 82    | 2023-01-02 12:25:07.000
AUTHENTICATION_FAILED              | 146   | 2023-01-02 12:29:10.000
CANNOT_READ_ALL_DATA               | 190   | 2023-01-02 12:28:39.000
IP_ADDRESS_NOT_ALLOWED             | 32    | 2023-01-01 21:54:04.000
UNKNOWN_USER                       | 96    | 2023-01-02 12:29:10.000
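
On versions that expose them, system.errors also carries last_error_message (and a stack trace), which often identifies the offending client or query directly; a short companion query:

select name, value, last_error_time, last_error_message
from system.errors
where last_error_time > (now() - 86400)
order by value desc
limit 10;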

 

13. Auditing ClickHouse Threads

select
    tl.thread_name, count(tl.thread_name), tl.logger_name,
    count(tl.logger_name), tl.message
from
    system.text_log as tl
where (tl.event_time > (now() - 86400))
group by tl.logger_name, tl.thread_name, tl.message;

14. Measure ClickHouse Infrastructure Operation Throughput Every 5 Minutes

select
    toStartOfFiveMinutes(toDateTime(ml.event_time)) as Period,
    avg(ml.CurrentMetric_Query) as "Average Number of Queries Executed",
    avg(ml.CurrentMetric_QueryThread) as "Average Number of query processing threads",
    avg(ml.CurrentMetric_Read) as "Average Number of Reads",
    avg(ml.CurrentMetric_Write) as "Average Number of Writes",
    avg(ml.CurrentMetric_MemoryTracking) as "Average amount of memory (bytes) allocated by the server",
    avg(ml.CurrentMetric_DictCacheRequests) as "Average Number of Dictionary Cache Requests"
        from system.metric_log ml
        WHERE (event_time > (now() - 86400))
            group by Period
order by Period desc
limit 1
format Vertical;

Output:

Row 1:
──────
Period:                                                   2023-01-02 21:55:00
Average Number of Queries Executed:                       0
Average Number of query processing threads:               0
Average Number of Reads:                                  2
Average Number of Writes:                                 0
Average amount of memory (bytes) allocated by the server: 5371874273.895833
Average Number of Dictionary Cache Requests:              0
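
The five-minute bucketing comes entirely from toStartOfFiveMinutes; for any other granularity, toStartOfInterval produces the same shape. For example, 15-minute buckets:

select
    toStartOfInterval(ml.event_time, INTERVAL 15 MINUTE) as Period,
    avg(ml.CurrentMetric_Query) as "Average Number of Queries Executed"
from system.metric_log ml
WHERE (event_time > (now() - 86400))
group by Period
order by Period desc;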

 

15. Monitoring Query Latency For Last 24 Hours (time-series query performance data for every minute)

select
    toStartOfMinute(event_time) as Period,
    substring(ql.query,1,50) as Query,
    formatReadableSize(ql.read_bytes) as Read_Size,
    formatReadableSize(ql.written_bytes) as Write_size,
    formatReadableSize(ql.result_bytes) as Result_size,
    formatReadableSize(ql.memory_usage) as Memory_Usage,
    ql.query_duration_ms/1000 as Query_Latency_in_Seconds
        from
            system.query_log ql
        WHERE (ql.event_time > (now() - 86400))
        order by Period desc
        limit 1
        format Vertical;

Output:

Row 1:
──────
Period:                   2023-01-02 21:58:00
Query:                    select
    toStartOfMinute(event_time) as Period,

Read_Size:                1.92 MiB
Write_size:               0.00 B
Result_size:              3.00 KiB
Memory_Usage:             5.66 MiB
Query_Latency_in_Seconds: 0.01
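
The query above stamps individual queries to their minute; for a true per-minute time series, aggregate over the same bucket. A hedged variant:

select
    toStartOfMinute(event_time) as Period,
    count() as Queries,
    avg(query_duration_ms)/1000 as Avg_Latency_in_Seconds,
    max(query_duration_ms)/1000 as Max_Latency_in_Seconds
from system.query_log
WHERE (event_time > (now() - 86400))
  and type = 'QueryFinish'
group by Period
order by Period desc;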

 

16. Most Expensive ClickHouse Processes (Top ClickHouse)

select
    sp.user as User,
    substring(sp.query,1,40) as Query,
    sp.elapsed as Query_Latency_in_Seconds,
    formatReadableSize(sp.memory_usage) as Memory_Consumed_by_ClickHouse_Server,
    formatReadableSize(sp.peak_memory_usage) as Peak_Memory_Consumed_by_ClickHouse_Server,
    sp.total_rows_approx as Total_Rows_Processed_Approximately,
    formatReadableSize(sp.read_bytes) as Total_Reads_in_Bytes,
    sp.read_rows as Total_Rows_Read,
    formatReadableSize(sp.written_bytes) as Total_Writes_in_Bytes,
    sp.written_rows as Total_Rows_Written
from
    system.processes sp
order by sp.elapsed desc
format Vertical;

Output:

Row 1:
──────
User:                                      demo
Query:                                     select
    sp.user as User,
    substrin
Query_Latency_in_Seconds:                  0.000410529
Memory_Consumed_by_ClickHouse_Server:      0.00 B
Peak_Memory_Consumed_by_ClickHouse_Server: 0.00 B
Total_Rows_Processed_Approximately:        0
Total_Reads_in_Bytes:                      0.00 B
Total_Rows_Read:                           0
Total_Writes_in_Bytes:                     0.00 B
Total_Rows_Written:                        0

17. Most Expensive ClickHouse Thread Performance Operations

select
    client_name as Client,
    client_hostname as Client_Hostname,
    user as User,
    query_start_time as Query_Start_Time,
    query as Query,
    query_duration_ms / 1000 AS Duration_in_Seconds,
    round(memory_usage / 1048576) AS Memory_usage_MB,
    formatReadableSize(written_bytes) as WRITES_in_Bytes,
    formatReadableSize(read_bytes) as READs_in_Bytes
from system.query_thread_log
where query_start_time >= date_sub(DAY, 7, today())
order by Duration_in_Seconds desc
limit 1
format Vertical;

Output:

Row 1:
──────
Client:              ClickHouse client
Client_Hostname:     
User:                default
Query_Start_Time:    2023-01-02 21:22:23
Query:               SELECT DISTINCT arrayJoin(extractAll(name, '[\\w_]{2,}')) AS res FROM (SELECT * FROM viewIfPermitted(SELECT name FROM system.functions ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.table_engines ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.formats ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.table_functions ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.data_type_families ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.merge_tree_settings ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.settings ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT cluster FROM system.clusters ELSE null('cluster String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT macro FROM system.macros ELSE null('macro String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT policy_name FROM system.storage_policies ELSE null('policy_name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT concat(func.name, comb.name) AS x FROM system.functions AS func CROSS JOIN system.aggregate_function_combinators AS comb WHERE is_aggregate ELSE null('x String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT name FROM system.databases LIMIT 10000 ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT DISTINCT name FROM system.tables LIMIT 10000 ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT DISTINCT name FROM system.dictionaries LIMIT 10000 ELSE null('name String')) UNION ALL SELECT * FROM viewIfPermitted(SELECT DISTINCT name FROM system.columns LIMIT 10000 ELSE null('name String'))) WHERE notEmpty(res)
Duration_in_Seconds: 0.067
Memory_usage_MB:     9
WRITES_in_Bytes:     0.00 B
READs_in_Bytes:      0.00 B

 

18. Monitoring Slow Queries By Response Time/Latency And IOPs

select
    formatDateTime(ql.query_start_time, '%Y/%m/%d %H:%M:%S') as Query_Started,ql.user,
    substring(ql.query,1,50) as Query,
    (ql.query_duration_ms/1000) as Query_Duration_in_Seconds,
    round(ql.memory_usage/1048576) AS Memory_Usage_MB,
    formatReadableSize(ql.result_bytes) AS result_size_in_bytes,
    ql.result_rows as Total_Rows_Processed,
    ql.read_rows as Rows_Read,
    formatReadableSize(ql.read_bytes) AS READs_IN_Bytes,
    ql.written_rows as Rows_written,
    formatReadableSize(ql.written_bytes) as Writes_IN_Bytes
        from
            system.query_log ql
where ql.type='QueryFinish'
order by Query_Duration_in_Seconds DESC;

19. Top Most READ Intensive Queries In ClickHouse

select
    formatDateTime(ql.query_start_time, '%Y/%m/%d %H:%M:%S') as Query_Started,
    ql.user,
    substring(ql.query,1,50) as Query,
    (ql.query_duration_ms/1000) as Query_Duration_in_Seconds,
    round(ql.memory_usage/1048576) AS Memory_Usage_MB,
    formatReadableSize(ql.result_bytes) AS result_size_in_bytes,
    ql.result_rows as Total_Rows_Processed,
    ql.read_rows as Rows_Read,
    formatReadableSize(ql.read_bytes) AS READs_IN_Bytes,
    ql.written_rows as Rows_written,
    formatReadableSize(ql.written_bytes) as Writes_IN_Bytes
        from
            system.query_log ql
where ql.type='QueryFinish'
order by ql.read_bytes DESC
limit 1
format Vertical;

Output:

Row 1:
──────
Query_Started:             2022/12/27 05:51:38
user:                      demo
Query:                     select null as TABLE_CAT, database as TABLE_SCHEM,
Query_Duration_in_Seconds: 0.004
Memory_Usage_MB:           0
result_size_in_bytes:      33.32 KiB
Total_Rows_Processed:      11
Rows_Read:                 11
READs_IN_Bytes:            999.00 B
Rows_written:              0
Writes_IN_Bytes:           0.00 B

20. Top Most WRITE Intensive Queries In ClickHouse

select
    formatDateTime(ql.query_start_time, '%Y/%m/%d %H:%M:%S') as Query_Started,
    ql.user,
    substring(ql.query,1,50) as Query,
    (ql.query_duration_ms/1000) as Query_Duration_in_Seconds,
    round(ql.memory_usage/1048576) AS Memory_Usage_MB,
    formatReadableSize(ql.result_bytes) AS result_size_in_bytes,
    ql.result_rows as Total_Rows_Processed,
    ql.read_rows as Rows_Read,
    formatReadableSize(ql.read_bytes) AS READs_IN_Bytes,
    ql.written_rows as Rows_written,
    formatReadableSize(ql.written_bytes) as Writes_IN_Bytes
        from
            system.query_log ql
where ql.type='QueryFinish'
order by ql.written_bytes DESC
limit 1
format Vertical;

Output:

Row 1:
──────
Query_Started:             2022/12/29 16:00:20
user:                      updater
Query:                     INSERT INTO github_events_1672329620 FORMAT JSONCo
Query_Duration_in_Seconds: 22.812
Memory_Usage_MB:           370
result_size_in_bytes:      99.75 MiB
Total_Rows_Processed:      174987
Rows_Read:                 174987
READs_IN_Bytes:            99.75 MiB
Rows_written:              174987
Writes_IN_Bytes:           99.75 MiB

21. Find Top Largest ClickHouse Tables

SELECT database as Database_Name,
       name as Table_Name,
       engine as Storage_Engine,
       storage_policy as Storage_Policy,
       total_rows as Total_Rows,
       formatReadableSize(sum(total_bytes)) as Size,
       is_temporary,
       metadata_modification_time as Schema_Modification_DateTime
FROM system.tables
WHERE database != 'system'
  and total_rows > 0
GROUP BY database, name, engine, is_temporary, metadata_modification_time,
         storage_policy, total_rows, total_bytes
ORDER BY sum(total_bytes) DESC
LIMIT 5
format Vertical;

Output:

Row 1:
──────
Database_Name:                default
Table_Name:                   github_events
Storage_Engine:               MergeTree
Storage_Policy:               default
Total_Rows:                   5499832413
Size:                         333.44 GiB
is_temporary:                 0
Schema_Modification_DateTime: 2021-04-29 09:21:08

Row 2:
──────
Database_Name:                default
Table_Name:                   tripdata
Storage_Engine:               MergeTree
Storage_Policy:               default
Total_Rows:                   1310903963
Size:                         54.55 GiB
is_temporary:                 0
Schema_Modification_DateTime: 2021-04-29 09:21:08

Row 3:
──────
Database_Name:                default
Table_Name:                   ontime
Storage_Engine:               MergeTree
Storage_Policy:               default
Total_Rows:                   201575308
Size:                         13.30 GiB
is_temporary:                 0
Schema_Modification_DateTime: 2021-04-29 09:21:08

Row 4:
──────
Database_Name:                default
Table_Name:                   dict_source_2
Storage_Engine:               MergeTree
Storage_Policy:               default
Total_Rows:                   10000000
Size:                         466.36 MiB
is_temporary:                 0
Schema_Modification_DateTime: 2022-08-25 23:37:52

Row 5:
──────
Database_Name:                default
Table_Name:                   airports
Storage_Engine:               MergeTree
Storage_Policy:               default
Total_Rows:                   7543
Size:                         377.71 KiB
is_temporary:                 0
Schema_Modification_DateTime: 2021-04-29 09:27:11

22. Find Largest Databases On ClickHouse

SELECT database as Database_Name,
       sum(total_rows) as Total_Rows,
       formatReadableSize(sum(total_bytes)) as Size
FROM system.tables
WHERE database != 'system'
  and total_rows > 0
GROUP BY database
ORDER BY sum(total_bytes) DESC
LIMIT 25
format Vertical;

Output:

Row 1:
──────
Database_Name: default
Total_Rows:    7022339227
Size:          401.75 GiB
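
total_rows and total_bytes in system.tables are reported only by some engines, so as a cross-check the on-disk size can also be derived from active parts; a sketch against system.parts:

select database,
       table,
       formatReadableSize(sum(bytes_on_disk)) as Size_On_Disk
from system.parts
where active and database != 'system'
group by database, table
order by sum(bytes_on_disk) desc
limit 5;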

Note: You can disable any of these log tables with a configuration override, as shown below:

cat /etc/clickhouse-server/config.d/z_log_disable.xml
<?xml version="1.0"?>
<yandex>
    <asynchronous_metric_log remove="1"/>
    <metric_log remove="1"/>
    <query_thread_log remove="1" />
    <query_log remove="1" />
    <query_views_log remove="1" />
    <part_log remove="1"/>
    <session_log remove="1"/>
    <text_log remove="1" />
    <trace_log remove="1"/>
    <crash_log remove="1"/>
    <opentelemetry_span_log remove="1"/>
    <zookeeper_log remove="1"/>
</yandex>
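
Removing these sections stops new rows from being written, but any _log tables that already exist stay on disk. If they should go as well, they can be dropped explicitly (they are recreated automatically if logging is re-enabled); for example:

DROP TABLE IF EXISTS system.query_thread_log;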

 

 
