ClickHouse Monitoring: Python Script to Monitor Linux Process Memory Matrix

Introduction

Monitoring and tuning the Linux host is key to getting the best performance from your ClickHouse cluster. In this article we explore how to use a simple Python script to monitor the memory usage of Linux processes and store the results in ClickHouse.

Python Script to Monitor Linux Process Memory Matrix

Here is a sample Python script that monitors the memory usage of Linux processes and saves the information in a ClickHouse database:

import time

import psutil
import mysql.connector

# Connect to the ClickHouse database
cnx = mysql.connector.connect(user='<username>', password='<password>', host='<hostname>', port='<port>', database='<database>')
cursor = cnx.cursor()

# Seconds to wait between sampling passes (example value, adjust as needed)
SAMPLE_INTERVAL_SECONDS = 10

def monitor_memory_usage():
    for proc in psutil.process_iter():
        try:
            # Get process details as a dictionary
            process = proc.as_dict(attrs=['pid', 'name', 'memory_info'])
            # Get memory usage; psutil sets this to None when access is denied
            memory_info = process['memory_info']
            if memory_info is None:
                continue
            # Get process name
            process_name = process['name']
            # Get process id
            process_id = process['pid']
            # Insert process details into ClickHouse database, passing the values as
            # parameters so that quotes in a process name cannot break the statement
            query = "INSERT INTO process_memory_usage (process_name, process_id, memory_usage) VALUES (%s, %s, %s)"
            cursor.execute(query, (process_name, process_id, memory_info.rss))
            cnx.commit()
        except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):
            pass

try:
    while True:
        monitor_memory_usage()
        # Pause between passes
        time.sleep(SAMPLE_INTERVAL_SECONDS)
except KeyboardInterrupt:
    # Stop cleanly on Ctrl+C
    pass
finally:
    # Close the cursor and connection
    cursor.close()
    cnx.close()

This script uses the psutil library to get information about every running process on the system, and mysql.connector to connect to ClickHouse and insert each process's memory usage (its resident set size, memory_info.rss, in bytes). The script runs in a loop, so it keeps sampling the memory usage of processes and saving it to the database. A pause between passes, for example with time.sleep, sets the sampling interval and keeps the script from consuming CPU and flooding the table with rows.
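The script above sends one INSERT per process. As a minimal sketch of an alternative, the rows of one pass can be collected first and sent with a single executemany() call. The helper names snapshot_rows and insert_rows are hypothetical, and how well the connector's multi-row insert behaves against your ClickHouse version is an assumption worth verifying.

```python
INSERT_SQL = (
    "INSERT INTO process_memory_usage "
    "(process_name, process_id, memory_usage) VALUES (%s, %s, %s)"
)

def snapshot_rows(process_infos):
    """Convert dicts shaped like psutil's as_dict() output into insert tuples."""
    rows = []
    for info in process_infos:
        memory_info = info.get("memory_info")
        if memory_info is None:
            # psutil reports None for attributes it was not allowed to read
            continue
        rows.append((info["name"], info["pid"], memory_info.rss))
    return rows

def insert_rows(cursor, rows):
    """Send all rows of one pass with a single executemany() call."""
    if rows:
        cursor.executemany(INSERT_SQL, rows)
    return len(rows)
```

Here process_infos would be the dictionaries returned by proc.as_dict(attrs=['pid', 'name', 'memory_info']) for each process, and the cursor is the one opened with mysql.connector, followed by one cnx.commit() per pass.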

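Replace the <username>, <password>, <hostname>, <port>, and <database> placeholders with the values for your ClickHouse setup. Because the script connects with mysql.connector, <port> must be the port of ClickHouse's MySQL-compatible interface (the mysql_port server setting), not the native TCP or HTTP port.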
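One way to avoid hard-coding these values is to read them from the environment. The sketch below is only an illustration: the function name connection_settings and the CLICKHOUSE_* variable names are hypothetical, and the fallback values (user and database "default", port 9004) are assumptions based on a stock ClickHouse configuration that you should check against your own server.

```python
import os

def connection_settings(env=os.environ):
    """Build keyword arguments for mysql.connector.connect() from environment variables."""
    return {
        "user": env.get("CLICKHOUSE_USER", "default"),
        "password": env.get("CLICKHOUSE_PASSWORD", ""),
        "host": env.get("CLICKHOUSE_HOST", "localhost"),
        # Port of the MySQL-compatible interface, converted to an integer
        "port": int(env.get("CLICKHOUSE_MYSQL_PORT", "9004")),
        "database": env.get("CLICKHOUSE_DATABASE", "default"),
    }
```

The result can then be passed to the connector as mysql.connector.connect(**connection_settings()).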

Make sure that you have created the process_memory_usage table in the ClickHouse database before running the script. The table should have three columns: process_name, process_id, and memory_usage (the resident set size in bytes).
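The article does not give the table definition, so the following is a sketch under assumptions: the column types (String, UInt32, UInt64), the MergeTree engine, and the sorting key are illustrative choices, and the create_table helper is hypothetical.

```python
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS process_memory_usage
(
    process_name String,
    process_id   UInt32,
    memory_usage UInt64
)
ENGINE = MergeTree
ORDER BY (process_name, process_id)
"""

def create_table(cursor):
    """Create the table once, using an already opened database cursor."""
    cursor.execute(CREATE_TABLE_SQL)
```

With only these three columns the rows carry no sampling time. If you want to chart usage over time, a further column such as sampled_at DateTime DEFAULT now() would be needed, which is again an assumption beyond what the script inserts.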

Conclusion

Note that the script only records processes that are running at the moment of each pass. It captures nothing about processes that have already terminated, and a short-lived process that starts and exits between two passes is missed entirely. To analyze the collected memory usage, you can visualize the data in the ClickHouse database with Grafana.

About Shiv Iyer
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.