Understanding Storage Infrastructure and Indexing in MergeTree
Data stored by primary key. This makes SORT/SEARCH queries faster with help of a small sparse index.
ClickHouse supports partitioning – ClickHouse can automatically cut off the partition data matching to partition pruning logic implemented
You can implement redundant(support replication) and distributed Database Infrastructure Operations using ReplicatedMergeTree tables.
CREATE TABLE [IF NOT EXISTS] [chdb.]tab_nam [ON CLUSTER chcluster] ( colnam1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [TTL expr1], colnam2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2], ... INDEX index_colnam1 expr1 TYPE type1(...) GRANULARITY value1, INDEX index_colnam2 expr2 TYPE type2(...) GRANULARITY value2, ... PROJECTION projection_name_1 (SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY]), PROJECTION projection_name_2 (SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY]) ) ENGINE = MergeTree() ORDER BY expr [PARTITION BY expr] [PRIMARY KEY expr] [SAMPLE BY expr] [TTL expr [DELETE|TO DISK 'xxx'|TO VOLUME 'xxx' [, ...] ] [WHERE conditions] [GROUP BY key_expr [SET v1 = aggr_func(v1) [, v2 = aggr_func(v2) ...]] ] ] [SETTINGS name=value, ...]
In “Wide” format, each column is stored in a separate file of a filesystem.
In “Compact” format entire columns will be stored in a single file. We recommend a “Compact” format for optimal performance of small and very frequent INSERT operations
Note: when none of these settings is configured then data parts are stored in Wide format
- index_granularity — Maximum number of data rows between the marks of an index. Default value: 8192.
- index_granularity_bytes — Maximum size of data granules in bytes. Default value: 10Mb. To restrict the granule size only by a number of rows, set to 0 (not recommended).
- min_index_granularity_bytes — Min allowed size of data granules in bytes. Default value: 1024b. To provide a safeguard against accidentally creating tables with very low index_granularity_bytes.
- enable_mixed_granularity_parts — Enables or disables transitioning to control the granule size with the index_granularity_bytes setting. Before version 19.11, there was only the index_granularity setting for restricting granule size. The index_granularity_bytes setting improves ClickHouse performance when selecting data from tables with big rows (tens and hundreds of megabytes). If you have tables with big rows, you can enable this setting for the tables to improve the efficiency of SELECT queries.
- use_minimalistic_part_header_in_zookeeper — Storage method of the data parts headers in ZooKeeper. If use_minimalistic_part_header_in_zookeeper=1, then ZooKeeper stores less data. For more information, see the setting description in “Server configuration parameters”.
- min_merge_bytes_to_use_direct_io — The minimum data volume for merge operation that is required for using direct I/O access to the storage disk. When merging data parts, ClickHouse calculates the total storage volume of all the data to be merged. If the volume exceeds min_merge_bytes_to_use_direct_io bytes, ClickHouse reads and writes the data to the storage disk using the direct I/O interface (O_DIRECT option). If min_merge_bytes_to_use_direct_io = 0, then direct I/O is disabled. Default value: 10 * 1024 * 1024 * 1024 bytes.
- merge_with_ttl_timeout — Minimum delay in seconds before repeating a merge with delete TTL. Default value: 14400 seconds (4 hours).
- merge_with_recompression_ttl_timeout — Minimum delay in seconds before repeating a merge with recompression TTL. Default value: 14400 seconds (4 hours).
- try_fetch_recompressed_part_timeout — Timeout (in seconds) before starting merge with recompression. During this time ClickHouse tries to fetch recompressed parts from the replica which assigned this merge with recompression. Default value: 7200seconds (2 hours).
- write_final_mark — Enables or disables writing the final index mark at the end of the data part (after the last byte). Default value: 1. Don’t turn it off.
- merge_max_block_size — Maximum number of rows in the block for merge operations. Default value: 8192.
- storage_policy — Storage policy. See Using Multiple Block Devices for Data Storage.
- min_bytes_for_wide_part, min_rows_for_wide_part — Minimum number of bytes/rows in a data part that can be stored in Wide format. You can set one, both or none of these settings.
- max_parts_in_total — Maximum number of parts in all partitions.
- max_compress_block_size — Maximum size of blocks of uncompressed data before compressing for writing to a table. You can also specify this setting in the global settings (see max_compress_block_size setting). The value specified when table is created overrides the global value for this setting.
- min_compress_block_size — Minimum size of blocks of uncompressed data required for compression when writing the next mark. You can also specify this setting in the global settings. The value specified when the table is created overrides the global value for this setting.
- max_partitions_to_read — Limits the maximum number of partitions that can be accessed in one query. You can also specify setting max_partitions_to_read in the global setting.
How ClickHouse use indexes and partitions in queries?