Tuning Linux for ClickHouse Performance

Introduction

Tuning the Linux kernel can significantly improve the performance of ClickHouse, a popular open-source columnar database management system. Here are some of the Linux kernel parameters that can be tuned to optimize ClickHouse performance:

  1. Transparent Huge Pages (THP): THP is a memory management feature in Linux that can potentially improve performance by reducing the number of page faults. However, THP can cause significant performance issues in databases like ClickHouse that perform a lot of memory mapping. Therefore, it is recommended to disable THP for ClickHouse.

To disable THP, add the following lines to the /etc/rc.local file or its equivalent on your distribution, so that they run at every boot:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

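
To confirm that the setting took effect, the active mode can be read back from the same sysfs files, where the kernel marks the current value with square brackets. The following is a minimal sketch; the helper name `thp_active_mode` is hypothetical and not part of any ClickHouse or kernel tooling.

```shell
#!/bin/sh
# Print the active THP mode from a sysfs-style file such as
# "always madvise [never]"; the active value is the bracketed one.
# thp_active_mode is a hypothetical helper name used for illustration.
thp_active_mode() {
    # $1: path to the file (defaults to the real sysfs location)
    file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$file"
}

# Report both knobs if they exist on this machine.
for knob in enabled defrag; do
    path="/sys/kernel/mm/transparent_hugepage/$knob"
    if [ -r "$path" ]; then
        echo "THP $knob: $(thp_active_mode "$path")"
    fi
done
```

After the two `echo never` commands have run, both lines should report `never`.
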
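  2. Dirty Ratio and Dirty Background Ratio: These parameters control how much dirty page-cache data (modified but not yet written to disk) may accumulate, as a percentage of memory. vm.dirty_background_ratio is the level at which the kernel starts background writeback, and vm.dirty_ratio is the level at which writing processes are forced to flush data themselves. The default values may not be suitable for ClickHouse’s write-heavy workload, so it is recommended to tune them. The values below are lower than the upstream kernel defaults of 20 and 10, so writeback starts earlier and in smaller bursts.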

Add the following lines to /etc/sysctl.conf file:

vm.dirty_ratio=10
vm.dirty_background_ratio=5
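
To get a feel for what these percentages mean in absolute terms, the arithmetic can be sketched as below. This is a rough estimate only: the kernel applies the ratios to available (free plus reclaimable) memory, and `MemAvailable` from /proc/meminfo is used here as an approximation. The helper name `dirty_threshold_kb` is hypothetical.

```shell
#!/bin/sh
# Approximate dirty-page threshold in kB for a given memory size and ratio.
# Assumption: the kernel applies the ratio to available memory, so the
# result is only an estimate of the real threshold.
dirty_threshold_kb() {
    # $1: memory in kB, $2: ratio in percent
    echo $(( $1 * $2 / 100 ))
}

# Example with the values 5 and 10, if /proc/meminfo is readable.
if [ -r /proc/meminfo ]; then
    avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    if [ -n "$avail_kb" ]; then
        echo "background writeback starts near $(dirty_threshold_kb "$avail_kb" 5) kB"
        echo "writers start flushing near $(dirty_threshold_kb "$avail_kb" 10) kB"
    fi
fi
```

Settings placed in /etc/sysctl.conf can be loaded without a reboot by running `sysctl -p` as root.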

3. File descriptors: The default maximum number of open file descriptors in Linux may not be sufficient for ClickHouse, which performs a large number of disk I/O operations. To increase the number of file descriptors, add the following line to /etc/security/limits.conf:

* hard nofile 1000000
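
The limits a process actually runs with can be read from its /proc/<pid>/limits file. Note that the `hard` entry sets the ceiling, while the soft limit is what the process gets by default. The sketch below extracts both values; the helper name `nofile_limits` is hypothetical.

```shell
#!/bin/sh
# Print "soft hard" open-file limits from a /proc/<pid>/limits style file.
# nofile_limits is a hypothetical helper name used for illustration.
nofile_limits() {
    # $1: path to a limits file (defaults to the process reading it)
    awk '/^Max open files/ {print $4, $5}' "${1:-/proc/self/limits}"
}

if [ -r /proc/self/limits ]; then
    echo "open files (soft hard): $(nofile_limits)"
fi
```

To inspect a running server, pass its limits file explicitly, for example `nofile_limits /proc/<pid>/limits`, replacing <pid> with the process ID of the ClickHouse server.
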
  4. TCP Settings: ClickHouse is a network-intensive application, and tuning TCP parameters can improve its performance.

Add the following lines to /etc/sysctl.conf:

net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_tw_reuse = 1
# net.ipv4.tcp_tw_recycle is not set: it was removed in Linux 4.12 and breaks clients behind NAT.

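
After loading the settings, the live values can be compared with the intended ones by reading /proc/sys directly. The following is a minimal sketch using keys from the settings above; the helper names `sysctl_path` and `check_sysctl` are hypothetical, and the simple dot-to-slash conversion assumes keys without dots inside a component (which holds for the keys used here).

```shell
#!/bin/sh
# Convert a dotted sysctl key to its path under a procfs-style root.
sysctl_path() {
    # $1: key such as net.ipv4.tcp_sack, $2: root (defaults to /proc/sys)
    echo "${2:-/proc/sys}/$(echo "$1" | tr . /)"
}

# Compare one key against an expected value; prints OK, DIFF, or MISSING.
check_sysctl() {
    # $1: key, $2: expected value, $3: optional root
    path=$(sysctl_path "$1" "${3:-}")
    if [ ! -r "$path" ]; then
        echo "MISSING $1"
    elif [ "$(cat "$path")" = "$2" ]; then
        echo "OK $1"
    else
        echo "DIFF $1 (current: $(cat "$path"), expected: $2)"
    fi
}

check_sysctl net.ipv4.tcp_window_scaling 1
check_sysctl net.ipv4.tcp_sack 1
check_sysctl net.ipv4.tcp_timestamps 1
check_sysctl net.ipv4.tcp_fin_timeout 10
check_sysctl net.ipv4.tcp_tw_reuse 1
```

A `MISSING` line means the running kernel does not expose that key, which is worth checking before adding it to /etc/sysctl.conf.
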
  5. IO Scheduler: ClickHouse performs a lot of disk I/O operations, and the choice of IO scheduler can have a significant impact on performance.

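It is recommended to use the “noop” IO scheduler for ClickHouse. On kernels that use the multi-queue block layer (the only option since Linux 5.0), “noop” no longer exists and the equivalent scheduler is named “none”; substitute that name in the command. To apply the setting, add the following line to /etc/rc.local: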

echo noop > /sys/block/<device>/queue/scheduler

Replace <device> with the device name of your storage device.
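
To see which schedulers are available and which one is active, the scheduler file of each device can be read; the active scheduler is shown in square brackets. The sketch below lists the active scheduler per device. The helper name `list_schedulers` is hypothetical, and a device whose file contains no bracketed entry is printed with an empty scheduler field.

```shell
#!/bin/sh
# Print "<device> <active scheduler>" for each block device under a
# sysfs-style root. list_schedulers is a hypothetical helper name.
list_schedulers() {
    # $1: root directory (defaults to /sys/block)
    root="${1:-/sys/block}"
    for f in "$root"/*/queue/scheduler; do
        [ -r "$f" ] || continue
        dev=${f#"$root"/}      # strip the root prefix
        dev=${dev%%/*}         # keep only the device name
        echo "$dev $(sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$f")"
    done
    return 0
}

list_schedulers
```

Running `cat /sys/block/<device>/queue/scheduler` on a single device shows the full list of schedulers that the kernel offers for it.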

Conclusion

In conclusion, tuning these Linux kernel parameters can help optimize ClickHouse performance for your workload. However, the optimal values may vary depending on your specific setup and workload, so it’s essential to benchmark and monitor performance to ensure that changes are beneficial.

About Shiv Iyer
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.