What are the multiple data caches in ClickHouse?

ClickHouse Internals – Understanding ClickHouse Data Caching Mechanism 


ClickHouse utilizes multiple data caches to improve query performance. These caches include:

  1. Query cache: This cache stores the results of SELECT queries so that a repeated query can be answered without recomputing it. It is available in recent ClickHouse versions, is used only when enabled for a query (use_query_cache), and by default does not share entries between users.
  2. Mark cache: This cache stores marks, the offsets that locate each granule inside the column files of MergeTree tables. It reduces disk I/O when a query looks up the data it needs. ClickHouse has no general write cache: INSERTs are written to disk as new data parts, although asynchronous inserts and the Buffer table engine can batch writes in memory.
  3. Uncompressed cache: This cache stores decompressed data blocks. It speeds up queries by avoiding repeated decompression of the same blocks. It is disabled by default (use_uncompressed_cache) and mainly helps short, frequently repeated queries.
  4. Dictionaries: A dictionary is a key-to-attribute mapping loaded from an external source and held in memory. With the cache layout, only recently used keys are kept in a fixed number of cells. Dictionaries speed up queries by replacing JOINs against reference tables with in-memory lookups.
  5. Join results: ClickHouse has no dedicated cache for join results. The result of a whole query that contains a JOIN can be served from the query cache, and the Join table engine keeps a prepared right-hand table in memory for reuse.
  6. MergeTree data: The column data of MergeTree tables is not held in a ClickHouse-managed cache. Repeated reads rely on the operating system's page cache, and for tables on remote object storage a filesystem cache on local disk can be configured.

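The mark cache and the uncompressed cache each have their own size limit, expressed in bytes. When adding a new entry would exceed the limit, older entries are evicted according to the cache's policy; under LRU, the least recently used entry is evicted first. The cache sizes are set in the server configuration. In recent versions, the eviction policy can also be selected with the mark_cache_policy and uncompressed_cache_policy server settings, which accept LRU and SLRU (segmented LRU). LFU is not offered.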
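The eviction behaviour described above can be sketched in a few lines of Python. This is a toy model for illustration only, not ClickHouse's implementation (which is written in C++); the class name ByteBoundedLRU and its methods are hypothetical.

```python
from collections import OrderedDict


class ByteBoundedLRU:
    """Toy LRU cache bounded by total payload size in bytes (illustrative only)."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # key -> bytes, oldest first

    def get(self, key: str):
        value = self._entries.get(key)
        if value is not None:
            # A hit makes the entry the most recently used one.
            self._entries.move_to_end(key)
        return value

    def put(self, key: str, value: bytes) -> None:
        if key in self._entries:
            # Replacing an entry: release its old size first.
            self.used_bytes -= len(self._entries.pop(key))
        self._entries[key] = value
        self.used_bytes += len(value)
        # Evict from the least recently used end until the budget is met.
        while self.used_bytes > self.max_bytes and self._entries:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)

    def keys(self):
        return list(self._entries)


cache = ByteBoundedLRU(max_bytes=8)
cache.put("a", b"1234")
cache.put("b", b"5678")
cache.get("a")           # "a" becomes the most recently used entry
cache.put("c", b"abcd")  # over budget, so "b" (least recently used) is evicted
print(cache.keys())
```

The limit is counted in bytes rather than in entries, which matches how the cache sizes are expressed in the server configuration.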

Configuring data caching in ClickHouse

There are several ways to configure data caching in ClickHouse:

  1. Using configuration files: The main configuration file for ClickHouse is called config.xml and is located in the /etc/clickhouse-server/ directory. In this file, you can set the size of the server-wide caches. Sizes are given in bytes. For example, to set the size of the uncompressed cache to 16 GiB, you would add the following line to the config.xml file:

<uncompressed_cache_size>17179869184</uncompressed_cache_size>

  2. Using the SET command: Cache sizes are server-level settings and cannot be changed with SET. What SET controls is whether queries in the current session use a cache. For example, to make queries in the session use the uncompressed cache, you would use the following command:

SET use_uncompressed_cache = 1;

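  3. Using the clickhouse-client tool: Session settings can also be passed to clickhouse-client as command-line options, which apply to the queries run by that invocation. A SET issued through --query has no lasting effect, because the session ends when the command returns. For example, to run one query with the uncompressed cache enabled (my_table is a placeholder table name), you would use the following command:

clickhouse-client --use_uncompressed_cache=1 --query="SELECT count() FROM my_table"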

Note that ClickHouse keeps the mark, uncompressed, and query caches in RAM, so take the available memory into account when sizing them: memory given to caches is no longer available to running queries or to the operating system's page cache. It is also important to test the performance of your system with different cache sizes to find the best configuration for your use case.
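As a rough aid for that sizing exercise, the sketch below splits a share of RAM into per-cache byte values and prints them as configuration lines. The helper names cache_budget and to_config_xml are hypothetical, and the fractions are arbitrary placeholders, not recommendations; only mark_cache_size and uncompressed_cache_size are real ClickHouse server settings.

```python
GIB = 1024 ** 3


def cache_budget(total_ram_bytes: int, fractions: dict) -> dict:
    """Split a share of RAM into per-cache byte sizes (hypothetical helper)."""
    if sum(fractions.values()) >= 1.0:
        raise ValueError("cache fractions must leave memory for queries and the OS")
    return {name: int(total_ram_bytes * share) for name, share in fractions.items()}


def to_config_xml(sizes: dict) -> str:
    """Render each size as a one-line config.xml element, with the value in bytes."""
    return "\n".join(f"<{name}>{size}</{name}>" for name, size in sizes.items())


# Assumed example: a 64 GiB host, with placeholder fractions chosen for illustration.
sizes = cache_budget(
    64 * GIB,
    {"mark_cache_size": 0.0625, "uncompressed_cache_size": 0.125},
)
print(to_config_xml(sizes))
```

Whatever split you start from, measure hit rates and query latency under a realistic workload before settling on the values.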

Additionally, you can choose the eviction policy for the mark and uncompressed caches (LRU or SLRU in recent versions). This is a server-level setting, so it is configured in the configuration file rather than with the SET command.

In summary, ClickHouse uses multiple data caches to improve query performance, including the query cache, the mark cache, the uncompressed cache, and in-memory dictionaries, alongside the operating system's page cache. They speed up queries by reducing disk I/O and by avoiding redundant decompression and recomputation of the same results. Cache sizes and eviction policies are server-level settings configured in config.xml, while the SET command and clickhouse-client options control whether individual sessions and queries use a cache. You should take the available memory into account when configuring the cache sizes, and test the performance of your system with different sizes to find the best configuration for your use case.

About Shiv Iyer
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.