Monitoring ClickHouse Server Load

Table of Contents

Introduction

What is the Data Load distribution happening to ClickHouse Server? I ask this question often to ClickHouse infrastructure owners. This matrix is used for trending the load and throughput. This will help me identify the load on ClickHouse Server from peak to off-peak hours.

As a Performance Engineer, I spent long hours trending the performance of ClickHouse Infrastructure Operations across latency and throughput for troubleshooting performance more evidence-based than speculative. A detailed understanding of READ – WRITE distribution on ClickHouse Server also helps efficiently compute the usage of available system resources optimally. Both oversized and undersized Infrastructure is a different issue to troubleshoot altogether.

Observing the Load on ClickHouse Server

Total Rows Inserted to all the Tables
Total number of INSERT queries
Total Data Inserted in Bytes
Time spent(WRITE Latency) waiting for WRITE SYSCALL
Total number of SELECT queries
Total Rows Selected
Time spent(READ Latency) waiting for READ SYSCALL
Total time spent on query operations – Query Operations Latency Matrix
Query operations throughput

Script to measure the load on ClickHouse Server

select *
from (select *
from (select toDateTime(system.metric_log.event_time) as "Event Date Time",
sum(system.metric_log.ProfileEvent_InsertedRows) as "Number of rows INSERTed to all tables.",
sum(system.metric_log.ProfileEvent_DiskWriteElapsedMicroseconds) as "Total time spent waiting for write syscall. This include writes to page cache.",
sum(system.metric_log.ProfileEvent_InsertQuery) as "Total number of INSERT queries.",
sum(system.metric_log.ProfileEvent_InsertedBytes) as "Total data INSERTed in Bytes",
sum(system.metric_log.ProfileEvent_QueryTimeMicroseconds) as "Total time of all queries",
sum(system.metric_log.ProfileEvent_SelectQuery) as "Total number of SELECT queries",
sum(system.metric_log.ProfileEvent_SelectQueryTimeMicroseconds) as "Total time of SELECT queries.",
sum(system.metric_log.ProfileEvent_SelectedRows) as "Total number of rows SELECTed",
sum(system.metric_log.ProfileEvent_DiskReadElapsedMicroseconds) as "Total time spent waiting for read syscall. This include reads from page cache.",
sum(system.metric_log.ProfileEvent_QueryTimeMicroseconds) as "Total time of all queries."
from system.metric_log
group by
system.metric_log.event_time,system.metric_log.ProfileEvent_InsertedRows,
system.metric_log.ProfileEvent_DiskWriteElapsedMicroseconds,
system.metric_log.ProfileEvent_InsertQuery,system.metric_log.ProfileEvent_InsertedBytes,
system.metric_log.ProfileEvent_QueryTimeMicroseconds,
system.metric_log.ProfileEvent_SelectQuery,
system.metric_log.ProfileEvent_SelectQueryTimeMicroseconds,
system.metric_log.ProfileEvent_SelectedRows,
system.metric_log.ProfileEvent_DiskReadElapsedMicroseconds,
system.metric_log.ProfileEvent_QueryTimeMicroseconds
order by
system.metric_log.ProfileEvent_QueryTimeMicroseconds desc
));

Load on ClickHouse Server Metrics Explained

ID.	Metric Name	Description
1	metric_log.event_time	Event Date Time
2	metric_log.ProfileEvent_InsertedRows	Number of rows INSERTed to all tables
3	metric_log.ProfileEvent_DiskWriteElapsedMicroseconds	Total time spent waiting for write syscall. This include writes to page cache
4	metric_log.ProfileEvent_InsertQuery	Total number of INSERT queries
5	metric_log.ProfileEvent_InsertedBytes	Total data INSERTed in Bytes
6	metric_log.ProfileEvent_QueryTimeMicroseconds	Total time of all queries
7	metric_log.ProfileEvent_SelectQuery	Total number of SELECT queries
8	metric_log.ProfileEvent_SelectQueryTimeMicroseconds	Total time of SELECT queries
9	metric_log.ProfileEvent_SelectedRows	Total number of rows SELECTed
10	metric_log.ProfileEvent_DiskReadElapsedMicroseconds	Total time spent waiting for read syscall. This include reads from page cache
11	metric_log.ProfileEvent_QueryTimeMicroseconds	Total time of all queries

Conclusion

It is essential to closely monitor and troubleshoot ClickHouse server load to be able to optimize the cluster for maximum performance. This guide will give a high-level view on the critical metrics to keep an eye on as you engage in this critical process to achieve high performance ClickHouse operations.

To know more about monitoring in ClickHouse, do read the following articles:

ChistaDATA Inc.

Enterprise-class 24*7 ClickHouse Consultative Support and Managed Services

Monitoring ClickHouse Server Load

Introduction

Observing the Load on ClickHouse Server

Script to measure the load on ClickHouse Server

Load on ClickHouse Server Metrics Explained

Conclusion

Introduction

Observing the Load on ClickHouse Server

Script to measure the load on ClickHouse Server

Load on ClickHouse Server Metrics Explained

Conclusion

Related Articles

Optimizing Long Integer Queries in ClickHouse: Strategies for High-Speed Data Analysis

Comprehensive Guide for ClickHouse Horizontal Scaling and Capacity Planning

Optimising ClickHouse Queries – Turbocharge your queries