How to Troubleshoot Performance of Fragmented ClickHouse Databases?

 

Introduction

A fragmented ClickHouse database can impact performance in several ways:
  • Increased disk I/O: When a database is fragmented, the data is stored in multiple parts across the disk, so it takes more time and I/O operations to read the necessary data.
  • Increased memory usage: ClickHouse uses a caching mechanism that stores recently read data in memory. When the data is fragmented, it increases the number of cache misses, which leads to increased memory usage.
  • Increased CPU usage: ClickHouse needs to perform more computation to read the necessary data when the database is fragmented, which leads to increased CPU usage.
  • Slow query performance: When the data is fragmented, it takes more time to read the necessary data, which leads to slow query performance.
  • Reduced scalability: When the data is fragmented, it increases the number of I/O operations, which leads to reduced scalability when dealing with high loads.
To troubleshoot the performance of a fragmented ClickHouse database, you can use system tables to check the fragmentation level of tables and partitions and use the ALTER TABLE … OPTIMIZE command to defragment them. Also, you can monitor the performance metrics like cache misses, query time, and disk I/O to determine the impact of fragmentation.
 

(1) How to troubleshoot performance of a fragmented ClickHouse database?

To troubleshoot the performance of a fragmented ClickHouse database, you can follow these steps:
  1. Monitor the size of your tables and partitions: Fragmentation can cause the size of your tables and partitions to grow, which can lead to poor performance. You can use the system.parts table to check the size of your partitions and identify any that are larger than expected.
  2. Monitor the merge process: ClickHouse uses a merge process to defragment tables and partitions. You can use the system.merges table to monitor the status of the merge process, including the number of parts that are being merged, the number of parts that are waiting to be merged, and the number of parts that have been merged.
  3. Monitor the query performance: Use the system.query_log table to monitor the performance of your queries. This table provides information on query execution time, the number of rows returned, and the number of bytes read.
  4. Analyze the query execution plan: You can use the EXPLAIN command to get the query execution plan, which shows how the query is executed and which indices are used. This can help you identify if an index is not being used or if a different index would be more efficient.
  5. Monitor the system load: Keep an eye on the system load, such as CPU, memory, and disk usage. A high system load can cause poor performance.
  6. Optimize your schema: Make sure your schema is optimized for the queries you are running. This can include using the right data types, partitioning your tables, and adding appropriate indices.
  7. Use the performance monitoring tools: ClickHouse provides several performance monitoring tools such as the system.metrics table, system.graphite_rollup table, and system.prometheus_rollup table that can be used to monitor and troubleshoot the performance of your ClickHouse cluster.
  8. Consider the use of Distributed tables: ClickHouse allows you to distribute tables across multiple servers for better scalability, which can help alleviate performance issues.

(2) How to monitor fragmented databases in ClickHouse?

There are a few different SQL queries you can use to monitor a fragmented ClickHouse database. One way to check for fragmentation is to query the system.parts table, which contains information about the partitions within a table. You can use the following query to see the number of parts and the amount of disk space used by each partition:
 
SELECT table, partition, count(), sum(bytes) 
FROM system.parts
GROUP BY table, partition '
ORDER BY sum(bytes) DESC;

Another way to check for fragmentation is to query the system.merges table, which contains information about completed and in-progress merge operations. You can use the following query to see the number of merge operations, the number of parts merged, and the amount of disk space freed by each operation:

SELECT database, table, count(), sum(parts_to_merge), sum(bytes_to_free) 
FROM system.merges
GROUP BY database, table
ORDER BY sum(bytes_to_free) DESC;

You can also use the following query to check if there are any tables that are in need of optimization.

SELECT database, table, count() 
FROM system.parts WHERE active = 0
GROUP BY database, table
ORDER BY count() DESC;

Conclusion

It’s important to note that fragmented table can occur due to multiple reasons like data size, data distribution, data type, etc. It’s always good to have a deeper understanding of the data and the underlying queries to optimize the performance of the ClickHouse database.

To read more about Troubleshooting in ClickHouse, please read the following articles:

About Shiv Iyer 246 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.