ClickHouse ships with a number of utilities that make day-to-day operations easier. “clickhouse-compressor” is one of them: it compresses and decompresses files. This article gives an overview of “clickhouse-compressor”.
Supported compression algorithms:
The following compression methods are supported.
- codecs combination
“LZ4” is the default compression method.
How to perform the compression?
We have the file “cell_towers.csv”, which is about “3.5G” in size.
root@tshirt:/tmp# ls -lrth cell_towers.csv
-rw-r--r-- 1 root root 3.5G Feb 7 2022 cell_towers.csv
We want to compress the file with the “zstd” method, using a block size of “2097152” bytes (2 MiB). The following command does this.
root@tshirt:/tmp# clickhouse-compressor --zstd --block-size 2097152 cell_towers.csv > cell_towers_zstd
root@tshirt:/tmp# ls -lrth cell_towers_zstd
-rw-r--r-- 1 root root 944M Oct 7 13:15 cell_towers_zstd
After compression, the file size has dropped from “3.5G” to “944M”.
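The size reduction can also be expressed as a ratio rather than read off the “ls” output. The following is a minimal sketch, not part of “clickhouse-compressor”: the function name “compression_summary” is hypothetical, and it only relies on “wc” and “awk” to compare the byte sizes of two files.

```shell
# Hypothetical helper: print the compression ratio and the space saved,
# given the original file and the compressed file.
compression_summary() {
    # wc -c reports the size in bytes; tr strips padding some platforms add.
    orig_bytes=$(wc -c < "$1" | tr -d ' ')
    comp_bytes=$(wc -c < "$2" | tr -d ' ')
    awk -v o="$orig_bytes" -v c="$comp_bytes" 'BEGIN {
        if (o > 0 && c > 0)
            printf "ratio=%.2f saved=%.1f%%\n", o / c, (1 - c / o) * 100
    }'
}

# Usage with the files from this article:
#   compression_summary cell_towers.csv cell_towers_zstd
```

Reading the sizes with “wc -c” keeps the sketch portable across systems where the “stat” command takes different options.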
To decompress the file, run the following command with the “--decompress” option.
root@tshirt:/tmp# clickhouse-compressor --decompress --zstd cell_towers_zstd > de_cell_towers.csv
root@tshirt:/tmp# ls -lrth de_cell_towers.csv
-rw-r--r-- 1 root root 3.5G Oct 7 13:23 de_cell_towers.csv
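A matching size in “ls” suggests the round trip worked, but it does not prove that the contents are identical. As a minimal sketch, assuming the standard “cmp” utility is available, a byte-for-byte comparison could look like this. The function name “verify_roundtrip” is hypothetical and is not part of “clickhouse-compressor”.

```shell
# Hypothetical helper: compare the original file with the decompressed copy.
# cmp -s is silent and only sets the exit status (0 means identical).
verify_roundtrip() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}

# Usage with the files from this article:
#   verify_roundtrip cell_towers.csv de_cell_towers.csv
```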
To inspect the block statistics of a compressed file, use the “--stat” option with the compressed file name. For example,
root@tshirt:/tmp# clickhouse-compressor --stat cell_towers_zstd | head -n5
2097152 574323
2097152 626838
2097152 637364
2097152 638221
2097152 654227
To understand the above output,
- The first column is the uncompressed size of the block, in bytes.
- The second column is the compressed size of that block, in bytes.
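The two columns can be summed to get an overall compression ratio for the blocks. The sketch below is one way to do that with “awk”. The function name “summarize_stat” is hypothetical, and the sample rows are the five blocks shown above, not the whole file.

```shell
# Hypothetical helper: read "uncompressed compressed" pairs on stdin
# and print the block count, both totals, and the overall ratio.
summarize_stat() {
    awk 'NF >= 2 { n++; raw += $1; packed += $2 }
         END {
             if (packed > 0)
                 printf "blocks=%d raw=%.0f compressed=%.0f ratio=%.2f\n",
                        n, raw, packed, raw / packed
         }'
}

# The five sample blocks from the --stat output above.
printf '%s\n' \
    '2097152 574323' \
    '2097152 626838' \
    '2097152 637364' \
    '2097152 638221' \
    '2097152 654227' | summarize_stat
# prints: blocks=5 raw=10485760 compressed=3130973 ratio=3.35
```

For the whole file, the same function can read the tool's output directly: “clickhouse-compressor --stat cell_towers_zstd | summarize_stat”.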
We hope this article helps you understand the “clickhouse-compressor” tool. Let us know if you have any feedback.