How to configure compression parameters in ClickHouse for Performance?
ClickHouse supports advanced compression to improve performance while reducing storage costs. Because compressed data is smaller, tuning how a ClickHouse schema is stored also reduces the load on memory and network bandwidth. ClickHouse supports both LZ4 and ZSTD. LZ4 is faster than ZSTD, but it achieves a lower compression ratio.
You can enable ZSTD by adding the following lines to the server configuration file:
<compression incl="clickhouse_compression">
    <case>
        <method>zstd</method>
    </case>
</compression>
LZ4 and ZSTD Compression Ratio
Compression method | Ratio (uncompressed : compressed) |
---|---|
LZ4 | 3.5 |
ZSTD | 5.0 |
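To make these ratios concrete, the following minimal sketch estimates the on-disk size of a data set under each codec. It assumes the ratios in the table hold uniformly across the whole data set, which real tables rarely do exactly. The function names are illustrative, and the 81.37 GB input is the uncompressed volume scanned by the benchmark query in this article.

```python
# Compression ratios from the table above (uncompressed size / compressed size).
RATIOS = {"LZ4": 3.5, "ZSTD": 5.0}


def compressed_size_gb(uncompressed_gb: float, ratio: float) -> float:
    """Estimate on-disk size, assuming the ratio holds for the whole data set."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return uncompressed_gb / ratio


def estimate_all(uncompressed_gb: float) -> dict:
    """Return the estimated compressed size in GB for every codec in RATIOS."""
    return {
        codec: round(compressed_size_gb(uncompressed_gb, ratio), 2)
        for codec, ratio in RATIOS.items()
    }


if __name__ == "__main__":
    # 81.37 GB is the uncompressed volume reported by the benchmark query.
    for codec, size_gb in estimate_all(81.37).items():
        print(f"{codec}: ~{size_gb} GB on disk")
```

With these ratios, ZSTD needs about 30% less disk space than LZ4 for the same data (1 - 3.5 / 5.0 = 0.30).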
Benchmarking Performance
SELECT toYear(CAMP_PERF_DATE) AS CAMPCONV, sum(CAMP_CLICKS) FROM CAMPAIGN_TAB GROUP BY CAMPCONV;
Query performance results for LZ4 compression:
Cold run:
24 rows in set. Elapsed: 37.613 sec. Processed 11.59 billion rows, 81.37 GB (971.49 million rows/s., 7.15 GB/s.)
Hot run:
24 rows in set. Elapsed: 6.158 sec. Processed 11.59 billion rows, 81.37 GB (2.74 billion rows/s., 28.51 GB/s.)
Query performance results for ZSTD compression:
Cold run:
24 rows in set. Elapsed: 39.527 sec. Processed 11.59 billion rows, 81.37 GB (856.72 million rows/s., 6.29 GB/s.)
Hot run:
24 rows in set. Elapsed: 8.173 sec. Processed 11.59 billion rows, 81.37 GB (2.38 billion rows/s., 26.53 GB/s.)
In the cold runs the difference is small (37.6 s for LZ4 versus 39.5 s for ZSTD). In the hot runs, where far fewer I/O operations are needed, LZ4 was clearly faster (6.2 s versus 8.2 s) because the higher decompression cost of ZSTD becomes the dominant factor.
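The size of the gap can be worked out directly from the elapsed times. The sketch below is only arithmetic on the figures reported above; the dictionary layout and function name are illustrative and not part of any ClickHouse tooling.

```python
# Elapsed times in seconds, copied from the benchmark output above.
ELAPSED = {
    "cold": {"LZ4": 37.613, "ZSTD": 39.527},
    "hot": {"LZ4": 6.158, "ZSTD": 8.173},
}


def zstd_slowdown_pct(run: str) -> float:
    """Percentage by which ZSTD's elapsed time exceeds LZ4's for the given run."""
    lz4 = ELAPSED[run]["LZ4"]
    zstd = ELAPSED[run]["ZSTD"]
    return (zstd - lz4) / lz4 * 100.0


if __name__ == "__main__":
    for run in ELAPSED:
        print(f"{run} run: ZSTD is {zstd_slowdown_pct(run):.1f}% slower than LZ4")
```

On these numbers ZSTD is about 5.1% slower than LZ4 in the cold run and about 32.7% slower in the hot run.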
Conclusion
- We strongly recommend ZSTD when queries with large range scans are bottlenecked on I/O.
- When decompression latency is a concern, we recommend LZ4.
- ClickHouse also lets you specify “none” to disable compression. This is recommended for extremely high-performance NVMe SSD arrays.