The following query identifies the largest tables in the system schema. The results display the table name, compressed size, uncompressed size, and ratio for the top 10 tables in terms of compressed size. Each row represents a table, and the columns provide the corresponding information.
SELECT table, formatReadableSize(sum(data_compressed_bytes)) AS compressed_size, formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size, round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio FROM system.columns WHERE database = 'system' GROUP BY table ORDER BY compressed_size DESC LIMIT 10 Query id: 1d9f6512-4f1d-4ff1-9041-b9fc1f8887db ┌─table─────────────────────┬─compressed_size─┬─uncompressed_size─┬─ratio─┐ │ metric_log_1 │ 866.50 MiB │ 7.37 GiB │ 8.71 │ │ asynchronous_metric_log │ 817.12 MiB │ 18.92 GiB │ 23.71 │ │ trace_log │ 8.35 GiB │ 81.70 GiB │ 9.79 │ │ session_log │ 752.90 MiB │ 3.77 GiB │ 5.13 │ │ query_views_log │ 659.98 MiB │ 29.76 GiB │ 46.18 │ │ metric_log │ 547.56 MiB │ 7.61 GiB │ 14.22 │ │ asynchronous_metric_log_1 │ 4.93 GiB │ 53.72 GiB │ 10.9 │ │ query_log │ 3.65 GiB │ 23.16 GiB │ 6.34 │ │ part_log_1 │ 1.76 GiB │ 8.24 GiB │ 4.69 │ │ part_log │ 1.49 GiB │ 10.14 GiB │ 6.81 │ └───────────────────────────┴─────────────────┴───────────────────┴───────┘ 10 rows in set. Elapsed: 0.018 sec. Processed 2.71 thousand rows, 140.57 KB (150.82 thousand rows/s., 7.82 MB/s.)
In ClickHouse, the TTL feature allows you to automatically remove or expire data from a table after a certain period of time. It is commonly used for managing data retention and purging old or irrelevant data.
ALTER TABLE TABLE_NAME MODIFY TTL TTL_EXPRESSION ; ALTER TABLE TABLE_NAME MODIFY TTL DATEADD(day, 30, now());
Let’s break down the above statement:
- “ALTER TABLE query_log”: This part indicates that you want to modify the table named “query_log”.
- “MODIFY TTL”: This specifies that you are changing the TTL setting of the table.
- “DATEADD(day, 30, now())”: This is the new TTL value being set. It uses the
DATEADD
function to calculate a date by adding 30 days to the current date and time (now()
function). So, it means that data in the “query_log” table will be retained for 30 days from the current date and time.
By executing this ALTER TABLE statement, you are adjusting the TTL setting of the “query_log” table to retain data for 30 days. After the specified TTL period has passed, ClickHouse will automatically remove the expired data from the “query_log” table, freeing up storage space and improving query performance by reducing the amount of data to process.
ALTER TABLE query_thread_log MODIFY TTL DATEADD(day, 30, now()); ALTER TABLE part_log MODIFY TTL DATEADD(day, 30, now()); ALTER TABLE metric_log MODIFY TTL DATEADD(day, 30, now()); ALTER TABLE query_log MODIFY TTL DATEADD(day, 30, now()); ALTER TABLE asynchronous_metric_log MODIFY TTL DATEADD(day, 30, now()); ALTER TABLE query_views_log MODIFY TTL DATEADD(day, 30, now()); ALTER TABLE session_log MODIFY TTL DATEADD(day, 30, now()); ALTER TABLE crash_log MODIFY TTL DATEADD(day, 30, now()); ALTER TABLE trace_log MODIFY TTL DATEADD(day, 30, now());
If you use ALTER statements to modify the TTL (Time to Live) settings in ClickHouse, it’s important to note that these changes will not persist after a reboot. After a reboot, the TTL settings will revert back to their default values.
To ensure that the modified TTL settings are persistent across server reboots, you can update the ClickHouse configuration file located at “/etc/clickhouse-server/config.xml”.
Below are the sample changes you can make in the configuration file for the system tables. Please adjust them according to your specific requirements:
cat /etc/clickhouse-server/config.d/part_log.xml <yandex> <part_log replace="1"> <database>system</database> <table>part_log</table> <flush_interval_milliseconds>7500</flush_interval_milliseconds> <engine> ENGINE = MergeTree PARTITION BY toYYYYMM(event_date) ORDER BY (event_time) TTL toStartOfMonth(event_date) + INTERVAL 3 MONTH SETTINGS ttl_only_drop_parts=1 </engine> </part_log> </yandex> cat /etc/clickhouse-server/config.d/query_log.xml <yandex> <query_log replace="1"> <database>system</database> <table>query_log</table> <flush_interval_milliseconds>7500</flush_interval_milliseconds> <engine> ENGINE = MergeTree PARTITION BY event_date ORDER BY (event_time) TTL event_date + interval 90 day SETTINGS ttl_only_drop_parts=1 </engine> </query_log> </yandex>
Please note that these changes will help manage the table size effectively and optimize storage usage in the system.