Optimizing ClickHouse: Fine-Tuning Storage and MergeTree Settings alongside Cache Settings

Introduction

In the realm of data management and analytics, performance is paramount. When dealing with large datasets, every bit of optimization counts. ClickHouse, the open-source column-oriented database management system, offers a plethora of settings to fine-tune storage variables and enhance MergeTree data handling. In this article, we delve into some key settings that can significantly impact performance and storage efficiency.

You can check the settings for MergeTree tables in ClickHouse by examining the system.merge_tree_settings table. These settings can be adjusted in a couple of ways:

  1. Table-Specific Configuration: You have the option to override the default settings for individual MergeTree tables. This can be done by specifying custom values within the SETTINGS section when creating or altering each table.
  2. Global Configuration File (config.xml): Alternatively, you can establish default values for MergeTree settings at a global level. This involves editing the config.xml file and modifying the settings within the <merge_tree> section. These global settings will be applied across all MergeTree tables unless overridden at the table level.
Note: Changes made in the <merge_tree> section require a clickhouse-server restart to take effect.
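
As a quick sketch of the first option, a table-level override can be applied when the table is created, or changed later with ALTER TABLE (the table name and the chosen value here are illustrative):

```sql
-- Override a MergeTree setting for a single table at creation time
CREATE TABLE example_table
(
    id UInt64,
    value String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS old_parts_lifetime = 600;

-- Adjust the same setting on an existing table
ALTER TABLE example_table MODIFY SETTING old_parts_lifetime = 600;
```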

To see the current MergeTree settings for all tables, you can execute a query like:

SELECT *
FROM system.merge_tree_settings;

For further details and comprehensive documentation on ClickHouse MergeTree settings, please refer to the official ClickHouse documentation page: ClickHouse MergeTree Settings.
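
To narrow that output to settings that have been changed from their defaults, you can filter on the changed column:

```sql
SELECT name, value, description
FROM system.merge_tree_settings
WHERE changed = 1;
```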

Storage Settings

(1) remote_fs_execute_merges_on_single_replica_time_threshold

One notable setting in ClickHouse is remote_fs_execute_merges_on_single_replica_time_threshold. When set to a value greater than zero, only a single replica starts the merge immediately, provided the merged parts are on shared storage and allow_remote_fs_zero_copy_replication is enabled.

However, it’s crucial to note that zero-copy replication, while potentially beneficial for performance, is not yet deemed production-ready. As of ClickHouse version 22.8 and higher, zero-copy replication remains disabled by default, with caution against its usage in production environments.

Usage:

<merge_tree>
    <remote_fs_execute_merges_on_single_replica_time_threshold>5000</remote_fs_execute_merges_on_single_replica_time_threshold>
</merge_tree>

(2) ratio_of_defaults_for_sparse_serialization

Another powerful feature in ClickHouse is sparse serialization, governed by the ratio_of_defaults_for_sparse_serialization setting. It sets the minimum ratio of default values (for example, empty strings or zeros) to all values in a column at which the column is stored using sparse serialization.

Sparse serialization is particularly advantageous when dealing with columns that predominantly contain default or zero values. By enabling sparse serialization for such columns, ClickHouse can encode them in a compressed format, leading to reduced storage requirements and improved query performance.

For instance, consider a scenario where a column ‘s’ in a table is empty for 95% of the rows. By setting ratio_of_defaults_for_sparse_serialization to 0.95 or lower for that table, ClickHouse automatically employs sparse serialization for the column, resulting in significant storage savings.
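
Before enabling the setting, it can be useful to measure how sparse a column actually is. A simple sketch (the table and column names are illustrative) is:

```sql
-- What fraction of rows hold the default (empty) value in column s?
SELECT countIf(s = '') / count() AS default_ratio
FROM my_table;
```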

Usage:

<merge_tree>
    <ratio_of_defaults_for_sparse_serialization>0.9</ratio_of_defaults_for_sparse_serialization>
</merge_tree>

Example:

In the provided example, it’s evident that the ‘s’ column in the given table predominantly contains empty strings, with approximately 95% of the rows having this characteristic. In the first table, referred to as my_regular_table, we haven’t implemented sparse serialization. However, in the second table, denoted as my_sparse_table, we’ve specifically configured the ratio_of_defaults_for_sparse_serialization setting to 0.95.

CREATE TABLE my_regular_table
(
    `id` UInt64,
    `s` String
)
ENGINE = MergeTree
ORDER BY id;
INSERT INTO my_regular_table
SELECT
    number AS id,
    number % 10 = 0 ? toString(number) : '' AS s
FROM
    numbers(10000000);
CREATE TABLE my_sparse_table
(
    `id` UInt64,
    `s` String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS ratio_of_defaults_for_sparse_serialization = 0.95;
INSERT INTO my_sparse_table
SELECT
    number AS id,
    number % 10 = 0 ? toString(number) : '' AS s
FROM
    numbers(10000000);

It’s worth noting that the ‘s’ column in the my_sparse_table consumes significantly less storage space on disk compared to its counterpart in the my_regular_table.

SELECT
    table,
    name,
    data_compressed_bytes,
    data_uncompressed_bytes
FROM system.columns
WHERE table LIKE 'my_%_table';

┌─table────────────┬─name─┬─data_compressed_bytes─┬─data_uncompressed_bytes─┐
│ my_regular_table │ id   │              77840698 │               155488328 │
│ my_regular_table │ s    │               8125129 │                32819054 │
│ my_sparse_table  │ id   │                     0 │                       0 │
│ my_sparse_table  │ s    │                     0 │                       0 │
└──────────────────┴──────┴───────────────────────┴─────────────────────────┘

4 rows in set. Elapsed: 0.036 sec.

You can confirm whether a column is utilizing sparse encoding by inspecting the serialization_kind column in the system.parts_columns table.
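
For example, a query along the following lines against the tables above yields the listing shown below:

```sql
SELECT column, serialization_kind
FROM system.parts_columns
WHERE table LIKE 'my_%_table';
```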

┌─column─┬─serialization_kind─┐
│ id     │ Default            │
│ s      │ Default            │
│ id     │ Default            │
│ s      │ Default            │
│ id     │ Default            │
│ s      │ Sparse             │
│ id     │ Default            │
│ s      │ Sparse             │
│ id     │ Default            │
│ s      │ Sparse             │
│ id     │ Default            │
│ s      │ Sparse             │
│ id     │ Default            │
│ s      │ Sparse             │
│ id     │ Default            │
│ s      │ Sparse             │
│ id     │ Default            │
│ s      │ Sparse             │
│ id     │ Default            │
│ s      │ Sparse             │
│ id     │ Default            │
│ s      │ Sparse             │
└────────┴────────────────────┘

Cache Settings

(1) replicated_deduplication_window_for_async_inserts

The ‘replicated_deduplication_window_for_async_inserts’ parameter in ClickHouse determines the number of most recently asynchronously inserted blocks for which ClickHouse Keeper stores hash sums to check for duplicates.

When using asynchronous inserts in ClickHouse, data is buffered and written in one or more blocks (parts). To prevent duplicate entries, especially when writing into replicated tables, ClickHouse stores the hash sum of each insert in ClickHouse Keeper. However, it only retains hash sums for the most recent ‘replicated_deduplication_window_for_async_inserts’ blocks; older hash sums are automatically removed from ClickHouse Keeper.

It’s important to consider the impact of this setting on performance. A larger value for ‘replicated_deduplication_window_for_async_inserts’ can slow down Async Inserts because it requires comparing more entries. The hash sum, derived from a combination of field names, types, and the insert data itself (a stream of bytes), serves as a unique identifier to detect duplicates.

Possible Values:

  • Any positive integer.
  • 0 (disables deduplication for async_inserts)
  • Default value: 10000.

Usage:

<merge_tree>
    <replicated_deduplication_window_for_async_inserts>600</replicated_deduplication_window_for_async_inserts>
</merge_tree>

(2) use_async_block_ids_cache

The ‘use_async_block_ids_cache’ parameter in ClickHouse determines whether the hash sums of async inserts are cached.

When set to true, ClickHouse caches the hash sums of async inserts. This caching mechanism is particularly useful for blocks bearing multiple async inserts. By caching hash sums, ClickHouse can efficiently filter duplicated inserts in memory.

Without caching, when some inserts are duplicated, ClickHouse Keeper returns only one duplicated hash sum in one Remote Procedure Call (RPC). This can lead to unnecessary RPC retries, impacting performance. However, with caching enabled, the cache continuously monitors the hash sums path in Keeper. Any updates trigger immediate cache updates, ensuring efficient filtering of duplicated inserts in memory.

Possible Values:

  • true
  • false
  • Default value: false.

Usage:

<merge_tree>
    <use_async_block_ids_cache>true</use_async_block_ids_cache>
</merge_tree>

(3) async_block_ids_cache_min_update_interval_ms

The ‘async_block_ids_cache_min_update_interval_ms’ parameter in ClickHouse specifies the minimum interval, in milliseconds, between updates of the cache enabled by ‘use_async_block_ids_cache’.

By default, the async block IDs cache updates immediately upon detecting updates in the watched Keeper path. However, frequent updates may impose a heavy burden on the system. This parameter acts as a safeguard, ensuring that the cache is not updated too frequently.

It’s essential to strike a balance with this setting. Setting the interval too short may lead to excessive updates, while setting it too long can prolong the retry time for blocks with duplicated inserts. Therefore, careful consideration is needed to optimize performance while avoiding unnecessary delays in processing.

Possible Values:

  • Any positive integer.
  • Default value: 100.

Usage:

<merge_tree>
    <async_block_ids_cache_min_update_interval_ms>10000</async_block_ids_cache_min_update_interval_ms>
</merge_tree>

(4) old_parts_lifetime

The parameter ‘old_parts_lifetime’ in ClickHouse specifies the duration, in seconds, for storing inactive parts to safeguard against data loss during spontaneous server reboots.

When ClickHouse merges several parts into a new one, it marks the original parts as inactive. These inactive parts are not immediately deleted. Instead, ClickHouse waits for a specified period, determined by ‘old_parts_lifetime’, before removing them. This precautionary measure helps protect against data loss, especially in scenarios involving spontaneous server reboots.
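
You can observe inactive parts that are waiting to be removed by querying system.parts:

```sql
SELECT table, name, active, remove_time
FROM system.parts
WHERE active = 0;
```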

During server startup, ClickHouse performs integrity checks on the parts. If the merged part is found to be damaged, ClickHouse reinstates the inactive parts and later attempts to merge them again. In the case of undamaged merged parts, ClickHouse renames the original inactive parts and moves them to a detached folder.

It’s important to note that new parts are initially stored only in the server’s RAM (OS cache) without invoking ‘fsync’. Therefore, under heavy disk load, there’s a risk of data loss if a server reboot occurs before new parts are written to disk. To mitigate this risk, ClickHouse retains inactive parts for the duration specified by ‘old_parts_lifetime’, ensuring that new parts are safely written to disk before removal of inactive parts.

The default value of 480 seconds for ‘old_parts_lifetime’ was chosen empirically to guarantee that new parts are reliably written to disk even under heavy disk system loads. This value provides a sufficient buffer to protect against data loss during spontaneous server reboots.

Possible Values:

  • Any positive integer.
  • Default value: 480.

Usage:

<merge_tree>
    <old_parts_lifetime>7200</old_parts_lifetime>
</merge_tree>

Conclusion

In conclusion, fine-tuning storage settings and optimizing MergeTree data handling in ClickHouse are crucial for maximizing performance and resource utilization. By adjusting parameters like remote_fs_execute_merges_on_single_replica_time_threshold, ratio_of_defaults_for_sparse_serialization, and replicated_deduplication_window_for_async_inserts, users can tailor ClickHouse to meet specific workload requirements.

These settings offer flexibility in managing data storage, optimizing query performance, and ensuring data integrity. Whether adjusting settings globally in the config.xml file or at the table level with the SETTINGS section, ClickHouse provides granular control for effective data management.

By implementing these best practices, organizations can enhance their analytical capabilities, reduce storage overhead, and derive actionable insights from their data with ClickHouse. As data volumes grow, fine-tuning storage settings will remain essential for maintaining a high-performing ClickHouse environment.

About Can Sayın
Can Sayın is an experienced Database Administrator for open-source relational and NoSQL databases, working in complex infrastructures. With over five years of industry experience managing database systems, he works at ChistaDATA Inc. His areas of interest are generally open-source systems.
Contact: Website