Tuning Index Granularity for ClickHouse Performance in E-commerce

Introduction

In data analytics and storage, databases are the backbone of business decision-making. ClickHouse, an open-source columnar database management system, has gained popularity for its speed and efficiency in handling large volumes of data. A key contributor to that performance is the way the MergeTree table engine stores data on disk and indexes it. One of the most important settings affecting query speed in the MergeTree table engine is index_granularity. In this blog post, we’ll explore what index_granularity is, how it works, and walk through a real-life example to illustrate its impact on database performance.

Understanding ClickHouse Index Granularity

At its core, index_granularity in ClickHouse determines how many rows lie between two consecutive entries (marks) of a table’s primary index. In a columnar database like ClickHouse, data is stored in columns rather than rows, which allows for better compression and more efficient query processing. Indexing is essential to query performance, as it enables the database to locate the relevant data without scanning the entire dataset.

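Tables in the MergeTree engine family store data on disk in parts, each sorted by the table’s primary key; background merges combine smaller parts into larger ones. Within a part, rows are grouped into granules of index_granularity rows (8192 by default), and the primary index stores one entry per granule rather than one per row. This sparse index is what lets ClickHouse skip granules that cannot match a query’s filter. A smaller index_granularity value produces more, finer-grained granules, so selective queries read fewer unneeded rows, but the index and mark files grow and consume more memory. A larger value does the opposite.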
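As a rough illustration of where the setting lives, the sketch below declares index_granularity in the SETTINGS clause of a hypothetical orders table. The table name, columns, and sorting key are assumptions made for this example, not the schema of any real system.

```sql
-- Hypothetical orders table; names and types are assumptions for illustration.
CREATE TABLE orders
(
    order_id    UInt64,
    customer_id UInt64,
    product_id  UInt32,
    order_date  DateTime,
    amount      Decimal(12, 2)
)
ENGINE = MergeTree
ORDER BY (order_date, customer_id)
SETTINGS index_granularity = 8192;  -- rows per granule (the default value)
```

With 8192 rows per granule, a single part holding 100 million rows would need roughly 12,200 index entries, which is why the primary index usually fits comfortably in memory.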

Real-Life Example: E-commerce Analytics

Let’s consider an e-commerce company that utilizes ClickHouse to analyze its vast amount of transactional data. They have a table storing order information, including customer details, products purchased, and order dates. The marketing team frequently runs complex queries to analyze customer behavior, product trends, and sales patterns.

Initially, the company used the default index_granularity setting, which resulted in relatively coarse granules. While query performance was acceptable, some analytical queries took longer than desired, especially those involving time-based aggregations. The marketing team needed quicker insights to optimize their campaigns and make data-driven decisions faster.

To address this, the company’s database administrators decided to adjust the index_granularity setting. They decreased it to a smaller value, so that each index mark covers fewer rows and ClickHouse can skip irrelevant data at a finer grain. Because the setting is fixed when a table is created, applying the change means rebuilding the table with the new value. This change aimed to improve query response times for the marketing team’s analytical queries.

The results were impressive. The smaller index_granularity setting significantly reduced the time required for complex queries involving time-based aggregations. For instance, a query that used to take several minutes now executed within seconds. This improvement empowered the marketing team to promptly identify trends, assess the performance of ongoing campaigns, and adjust strategies in near real-time.
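One way to check whether a change like this actually reduces the amount of data read is EXPLAIN with indexes = 1, which reports how many parts and granules the primary key selects for a query. The sketch below assumes a hypothetical orders table with order_date and amount columns, sorted by order_date; the date range is arbitrary.

```sql
-- Show how many parts and granules the primary key selects for this query.
-- Table and column names are assumptions for illustration.
EXPLAIN indexes = 1
SELECT
    toStartOfDay(order_date) AS day,
    sum(amount)              AS revenue
FROM orders
WHERE order_date >= '2023-07-01 00:00:00'
  AND order_date <  '2023-07-08 00:00:00'
GROUP BY day
ORDER BY day;
```

Running the same statement against tables built with different index_granularity values shows how many granules each one has to read for the same filter.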

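SQL Command to Set index_granularity

index_granularity is a table-level MergeTree setting, declared in the SETTINGS clause when the table is created:

CREATE TABLE table_name (column_definitions)
ENGINE = MergeTree
ORDER BY sorting_key
SETTINGS index_granularity = value;

Replace:

  • table_name with the name of your table.
  • column_definitions and sorting_key with the table’s columns and sorting key.
  • value with the desired index_granularity, an integer number of rows per granule (the default is 8192). A separate setting, index_granularity_bytes, limits granule size in bytes.

As far as we are aware, ClickHouse treats index_granularity as read-only once the table exists, so it cannot be changed through ALTER TABLE. To use a different value, create a new table and copy the data into it.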

Example: Setting index_granularity

Let’s consider a scenario where you have a ClickHouse table named sales that stores information about sales transactions. You want a custom index_granularity for the table’s primary index to improve query performance for certain types of analytical queries.

Because the setting is fixed at table creation, one way to apply an index_granularity of 100,000 rows is to create a copy of the table with the new setting and reload the data. The sorting key (sale_date, customer_id) below is a placeholder; use the one your table already has:

CREATE TABLE sales_new AS sales
ENGINE = MergeTree
ORDER BY (sale_date, customer_id)
SETTINGS index_granularity = 100000;

INSERT INTO sales_new SELECT * FROM sales;

In this example, sales_new has the same columns as sales, but each primary index mark covers up to 100,000 rows instead of the default 8192. A coarser granularity like this keeps the index small and can suit queries that scan large ranges of sales data, while queries that select narrow ranges may read more rows than they need. Once the new table is verified, it can take the place of the original with RENAME TABLE.

Remember that the optimal value for index_granularity depends on your specific workload and query patterns. You might need to experiment with different values and monitor query performance to determine the most suitable setting for your use case.
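When experimenting, it helps to compare measurements rather than impressions. As a sketch, assuming query logging is enabled so that system.query_log is populated, the query below lists recent finished queries that mention the sales table, along with their duration and the amount of data they read. The one-hour window and the text filter are arbitrary choices for this example.

```sql
-- Recent finished queries touching the sales table, newest first.
-- Run SYSTEM FLUSH LOGS first if very recent queries are missing.
SELECT
    query,
    query_duration_ms,
    read_rows,
    formatReadableSize(read_bytes) AS read_size
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 HOUR
  AND query ILIKE '%FROM sales%'
ORDER BY event_time DESC
LIMIT 20;
```

Comparing read_rows for the same query against tables with different granularities gives a direct view of how much data each setting lets ClickHouse skip.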

Conclusion

ClickHouse’s index_granularity setting is a powerful tool that can noticeably affect database performance. By adjusting how many rows each index mark covers, you can tailor a table to your specific workload requirements. The real-life example of the e-commerce company showcases the tangible benefits of optimizing index_granularity for analytical workloads, where quick insights are crucial for informed decision-making.

When considering adjustments to index_granularity, it’s essential to strike a balance between how precisely queries can skip data and the overhead of the index itself. Smaller granules can lead to faster selective queries but increase the number of marks, and with it the memory and disk space the index uses. Therefore, careful monitoring and testing are essential to fine-tune this setting for your unique use case.
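To keep an eye on the index side of that trade-off, the system.parts table exposes the number of marks and the size of the primary index per part. The query below is a minimal sketch that sums them per table in the current database.

```sql
-- Index footprint per table: mark count, mark file size, primary key memory.
SELECT
    table,
    sum(rows)                                            AS total_rows,
    sum(marks)                                           AS total_marks,
    formatReadableSize(sum(marks_bytes))                 AS marks_on_disk,
    formatReadableSize(sum(primary_key_bytes_in_memory)) AS primary_key_in_memory
FROM system.parts
WHERE database = currentDatabase()
  AND active
GROUP BY table
ORDER BY total_marks DESC;
```

A table rebuilt with a smaller index_granularity should show more marks and a larger primary key for the same number of rows.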

In the fast-paced world of data-driven decision-making, ClickHouse’s index_granularity offers a practical way to supercharge your database performance and provide timely insights that can drive your business forward.

For more information, see the official ClickHouse documentation listed in the references below.

References:

https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#settings

About the author
Can Sayın is an experienced database administrator working with open-source relational and NoSQL databases in complex infrastructures. He has more than five years of industry experience managing database systems and works at ChistaDATA Inc. His interests center on open-source systems.