How memory management happens in ClickHouse?

How ClickHouse consume the available RAM?


In ClickHouse, memory management is handled by the MemoryTracker class, which is responsible for allocating and deallocating memory to different parts of the system. The MemoryTracker class uses a hierarchical approach to manage memory, with different levels of memory allocation, such as global, per thread, and per block.

  1. Global memory allocation: This is the highest level of memory allocation in ClickHouse. It is used for the memory-mapped files, the read and write buffers, and the compression and encoding buffers. The global memory allocation is done at the start of the process and is not released until the end of the process.
  2. Per-thread memory allocation: This level of memory allocation is used for the memory allocated by each thread. The per-thread memory allocation is done on demand, when a thread needs to perform a specific task, such as sorting or merging data.
  3. Per-block memory allocation: This is the lowest level of memory allocation in ClickHouse. It is used for the memory allocated by each block of data. The per-block memory allocation is done on demand, when a block of data is loaded into memory.

In addition to managing the memory allocation, MemoryTracker class also monitors the memory usage of the system, and takes action if the memory usage exceeds a certain threshold. If the memory usage exceeds the threshold, MemoryTracker will try to release memory by flushing the read and write buffers to disk, releasing the memory allocated by the threads, and releasing the memory allocated by the blocks of data.

Python script to monitor top ten ClickHouse queries by memory consumption

#import the necessary libraries
import clickhouse_driver
import pymysql

#establish a connection to the clickhouse server 
conn = clickhouse_driver.connect(host='localhost', port=9000, user='user', password='password')

#fetch the top 10 ClickHouse queries by memory consumption
query = '''
select query, memory_usage
from system.query_log
order by memory_usage desc
limit 10
'''
cur = conn.cursor()
cur.execute(query)

#fetch and print the top 10 results
for row in cur:
    print(row)

#close the connection
cur.close()
conn.close() 

#end of code

In summary, ClickHouse manages memory using the MemoryTracker class, which uses a hierarchical approach to manage memory allocation. Memory is allocated globally, per-thread, and per-block. MemoryTracker also monitors the memory usage of the system, and releases memory if the usage exceeds a certain threshold. This approach allows ClickHouse to efficiently manage memory and ensure that the system stays within the boundaries of the available physical memory.

About Shiv Iyer 56 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.