ClickHouse Storage Engine Explained – How to use them for Performance?

ClickHouse Storage Engine Explained – How to use them for Performance?


ClickHouse supports several storage engines, each with its own characteristics and performance characteristics. The storage engine you choose will depend on the specific requirements of your use case and the data you are working with. Here are some of the main storage engines available in ClickHouse:

  1. MergeTree: This is the default storage engine in ClickHouse and is used for most use cases. It is a columnar storage engine that supports data compression, data partitioning, and indexing. It supports both read and write operations and is suitable for use cases with high-concurrency read and write workloads.
  2. ReplacingMergeTree: This is similar to the MergeTree storage engine, but it is optimized for use cases with high write loads and low read loads. It is also a columnar storage engine that supports data compression, data partitioning, and indexing.
  3. CollapsingMergeTree: This storage engine is used for use cases with high write loads, high data cardinality and low number of rows. It is similar to the ReplacingMergeTree storage engine, but it is optimized for use cases with high data cardinality.
  4. SummingMergeTree: This storage engine is used for use cases with high write loads, high data cardinality and low number of rows. It is similar to the CollapsingMergeTree storage engine, but it is optimized for use cases with high data cardinality and it is used for aggregate data
  5. Buffer: This storage engine is used for temporary tables and it stores data in memory. It is suitable for use cases with high-concurrency read and write workloads, but it is only useful for small datasets.
  6. Memory: This storage engine is similar to the Buffer storage engine, but it is used for storing data in memory permanently. It is suitable for use cases with high-concurrency read and write workloads, but it is only useful for small datasets.
  7. Log: This storage engine is used for storing data in a log-based format. It is suitable for use cases with high write loads and low read loads, but it is not suitable for analytical queries.

When choosing a storage engine, it’s important to consider the specific requirements of your use case and the characteristics of the data you are working with. Factors such as data size, data cardinality, the number of rows, the read-write ratio, and the required level of concurrency should all be taken into account.

It’s also important to note that storage engines can be used in combination and in different tables, so you can use one storage engine for one table and another storage engine for another table, depending on the characteristics of the data and the use case.

About Shiv Iyer 56 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.

Be the first to comment

Leave a Reply