ClickHouse Query Profiler for Performance Monitoring

Introduction

Creating a Python CLI (Command-Line Interface) application for custom profiling of ClickHouse Server queries involves several components. The application will allow users to input queries via the command line, execute them on a ClickHouse server, and then provide performance metrics. Below is a Python script that achieves this:

Python Script for ClickHouse Query Performance Profiling

import argparse
import time
import psutil
from clickhouse_driver import Client

def profile_query(query, host, port, user, password):
    """
    Execute a query on ClickHouse server and profile its performance.

    Args:
        query (str): The ClickHouse SQL query to be executed.
        host (str): Hostname of the ClickHouse server.
        port (int): Port of the ClickHouse server.
        user (str): Username for the ClickHouse server.
        password (str): Password for the ClickHouse server.

    Returns:
        dict: Execution time, CPU usage, and I/O stats.
    """
    client = Client(host=host, port=port, user=user, password=password)

    start_time = time.time()
    start_cpu = psutil.cpu_times()
    start_io = psutil.disk_io_counters()

    # Execute the query
    client.execute(query)

    end_time = time.time()
    end_cpu = psutil.cpu_times()
    end_io = psutil.disk_io_counters()

    # Calculate performance metrics
    duration = end_time - start_time
    cpu_usage = {'user': end_cpu.user - start_cpu.user, 'system': end_cpu.system - start_cpu.system}
    io_usage = {'read_bytes': end_io.read_bytes - start_io.read_bytes,
                'write_bytes': end_io.write_bytes - start_io.write_bytes}

    return {'duration': duration, 'cpu_usage': cpu_usage, 'io_usage': io_usage}

def main():
    """
    Main function to handle command-line arguments and execute query profiling.
    """
    parser = argparse.ArgumentParser(description='Profile a ClickHouse query for performance.')
    parser.add_argument('query', help='The ClickHouse SQL query to profile.')
    parser.add_argument('--host', default='localhost', help='Hostname of the ClickHouse server.')
    parser.add_argument('--port', type=int, default=9000, help='Port of the ClickHouse server.')
    parser.add_argument('--user', default='default', help='Username for the ClickHouse server.')
    parser.add_argument('--password', default='', help='Password for the ClickHouse server.')

    args = parser.parse_args()

    # Execute and profile the query
    result = profile_query(args.query, args.host, args.port, args.user, args.password)

    print(f"Query Execution Time: {result['duration']} seconds")
    print(f"CPU Usage: {result['cpu_usage']}")
    print(f"I/O Usage: {result['io_usage']}")

if __name__ == "__main__":
    main()

How to Use the Script

  1. Install Dependencies: This script requires clickhouse-driver and psutil. You can install them using pip:
pip install clickhouse-driver psutil

     2. Running the Script: Use the command line to run the script. For example:

python profiler.py "SELECT * FROM my_table" --host myclickhousehost --user myuser --password mypassword
  1. Output: The script will output the query execution time, CPU usage, and I/O statistics.

Script Features

  • Command-Line Arguments: The script uses argparse to handle command-line arguments for flexibility.
  • Query Execution: Connects to ClickHouse using provided credentials and executes the specified query.
  • Performance Profiling: Gathers CPU and I/O metrics during query execution.
  • Output: Displays the profiling results in a readable format.

Note

  • Security: Ensure that sensitive information like passwords is handled securely, especially in a production environment.

Conclusion

In this article, we have learned to use a simple python script for ClickHouse query profiling, for performance monitoring & optimization.

To learn more about ClickHouse Query Profiling, consider reading the following articles: 

About Shiv Iyer 206 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.