ClickHouse Monitoring: Python Script to Monitor Linux Disk I/O Metrics

Introduction

Optimizing and monitoring the performance of the Linux kernel is key to getting the best performance from your ClickHouse cluster. In this article, we explore how to use a simple Python script to monitor the disk I/O metrics reported by the Linux kernel.

Python Script to Monitor Linux Disk I/O Metrics

Here is a sample Python script that monitors the disk I/O usage of Linux and saves the information in a ClickHouse database:
import psutil
import mysql.connector

# Connect to the ClickHouse database
cnx = mysql.connector.connect(
    user='<username>',
    password='<password>',
    host='<hostname>',
    port='<port>',
    database='<database>',
)
cursor = cnx.cursor()


def monitor_disk_io_usage():
    # System-wide disk I/O counters, cumulative since boot
    disk_io = psutil.disk_io_counters()

    # Insert disk I/O usage into ClickHouse database
    query = (
        "INSERT INTO disk_io_usage "
        "(read_count, write_count, read_bytes, write_bytes) "
        "VALUES (%s, %s, %s, %s)"
    )
    values = (disk_io.read_count, disk_io.write_count,
              disk_io.read_bytes, disk_io.write_bytes)
    cursor.execute(query, values)
    cnx.commit()


try:
    # Runs without pausing; add a sleep call here to sample periodically
    while True:
        monitor_disk_io_usage()
finally:
    # Close the cursor and connection when the loop is interrupted
    cursor.close()
    cnx.close()

Script Explanation

This script uses the psutil library to read the disk I/O counters and mysql.connector to connect to ClickHouse and insert them, which works because ClickHouse exposes a MySQL-compatible interface. The script runs in an infinite loop, so it continuously reads the disk I/O usage and saves it in the database. You can add a sleep function to the loop to make it run periodically.
You will need to replace <username>, <password>, <hostname>, <port>, and <database> with the appropriate values for your ClickHouse setup.
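
The periodic variant mentioned above could look roughly like the following sketch. The counters that psutil reports are cumulative since boot, so the sketch also turns consecutive samples into per-interval deltas, which are usually easier to chart. The names counter_deltas and sample_periodically and the 5-second default interval are hypothetical, chosen for illustration; they are not part of the original script.

```python
import time

# Counter fields stored by the script above
FIELDS = ("read_count", "write_count", "read_bytes", "write_bytes")


def counter_deltas(prev, curr):
    """Return the change in each counter between two snapshots.

    Both arguments only need to expose FIELDS as attributes, as the
    namedtuple returned by psutil.disk_io_counters() does.
    """
    return {name: getattr(curr, name) - getattr(prev, name) for name in FIELDS}


def sample_periodically(read_counters, interval_s=5.0, iterations=None,
                        sleep=time.sleep):
    """Yield per-interval deltas, pausing interval_s seconds between reads.

    read_counters is any zero-argument callable returning a snapshot
    (hypothetical parameter; pass psutil.disk_io_counters in practice).
    iterations=None means run until interrupted.
    """
    prev = read_counters()
    done = 0
    while iterations is None or done < iterations:
        sleep(interval_s)
        curr = read_counters()
        yield counter_deltas(prev, curr)
        prev = curr
        done += 1
```

To use it with the script above, iterate over sample_periodically(psutil.disk_io_counters) and insert each resulting dictionary instead of the raw counters.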

Conclusion

Make sure that you have created the table disk_io_usage in the ClickHouse database before running the script. The table should have four columns: read_count, write_count, read_bytes, and write_bytes.
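
As a rough sketch, the table definition could be generated as shown below. The article only names the table and its four columns; the UInt64 column type, the MergeTree engine, the ORDER BY tuple() clause, and the helper name build_create_table_sql are assumptions made for illustration, so adjust them to your own schema.

```python
# Column names taken from the article; types and engine are assumptions
COLUMNS = ("read_count", "write_count", "read_bytes", "write_bytes")


def build_create_table_sql(table="disk_io_usage", columns=COLUMNS):
    """Build a CREATE TABLE statement with one UInt64 column per counter."""
    column_defs = ",\n".join(f"    {name} UInt64" for name in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table}\n"
        "(\n"
        f"{column_defs}\n"
        ")\n"
        "ENGINE = MergeTree\n"
        "ORDER BY tuple()"
    )


# Print the statement so it can be reviewed before running it
print(build_create_table_sql())
```

The printed statement can be run once from clickhouse-client, or passed to cursor.execute() before the monitoring loop starts.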
You can also use Grafana to visualize the data stored in ClickHouse for a better analysis of disk I/O usage.
About Shiv Iyer
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.