
File compression/decompression with ClickHouse Compressor utility

ClickHouse ships with a number of utilities that simplify day-to-day operations. “clickhouse-compressor” is one of them: it compresses and decompresses files using ClickHouse's own compression codecs. This article gives an overview of “clickhouse-compressor”.

Supported compression algorithms:

The following compression methods are supported.

  • LZ4
  • LZ4HC
  • ZSTD
  • deflate_qpl
  • a combination of codecs

“LZ4” is the default compression method.
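
The helper below is a minimal sketch that maps each method in the list to the clickhouse-compressor flag that selects it. The function name compressor_flags is hypothetical. The flag names (--hc, --zstd, --deflate_qpl) are assumed from the tool's --help output in recent releases, so confirm them with “clickhouse-compressor --help” on your version. A combination of codecs is not a single flag: it is passed as repeated --codec options, for example --codec 'Delta(4)' --codec 'ZSTD(10)'.

```shell
# Hypothetical helper: print the clickhouse-compressor flag for a method name.
# Flag names are assumed from `clickhouse-compressor --help`; verify on your version.
compressor_flags() {
    case "$1" in
        lz4)         ;;                             # LZ4 is the default, so no flag is printed
        lz4hc)       printf '%s\n' "--hc" ;;
        zstd)        printf '%s\n' "--zstd" ;;
        deflate_qpl) printf '%s\n' "--deflate_qpl" ;;  # may be unavailable in some builds
        *)
            # Codec combinations use repeated --codec options instead of one flag.
            printf 'unknown method: %s\n' "$1" >&2
            return 1
            ;;
    esac
}
```

For example, “clickhouse-compressor $(compressor_flags zstd) --block-size 2097152 cell_towers.csv > cell_towers_zstd” reproduces the compression command used later in this article. The substitution is left unquoted on purpose, so the LZ4 case adds no argument at all.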

How to compress a file

We have the file “cell_towers.csv”, which is about 3.5G in size:

root@tshirt:/tmp# ls -lrth cell_towers.csv 
-rw-r--r-- 1 root root 3.5G Feb  7  2022 cell_towers.csv

Suppose we want to compress the file with the “zstd” method, using a block size of “2097152” bytes (2 MiB). The following command does this:

root@tshirt:/tmp# clickhouse-compressor --zstd --block-size 2097152 cell_towers.csv > cell_towers_zstd
root@tshirt:/tmp# 
root@tshirt:/tmp# ls -lrth cell_towers_zstd 
-rw-r--r-- 1 root root 944M Oct  7 13:15 cell_towers_zstd

After compression, the file size has dropped from 3.5G to 944M, a reduction of roughly 3.8:1 (ls -h rounds both figures, so the ratio is approximate).
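
To put an exact number on the reduction, the sketch below computes the ratio and the percentage of space saved from the byte counts rather than the rounded ls -h figures. The function name compression_ratio is hypothetical; it relies only on wc and awk.

```shell
# Hypothetical helper: report compression ratio and space saved for two files.
compression_ratio() {
    local original="$1" compressed="$2" orig_bytes comp_bytes
    # "wc -c < file" prints the size in bytes on both GNU and BSD systems.
    orig_bytes=$(wc -c < "$original") || return 1
    comp_bytes=$(wc -c < "$compressed") || return 1
    # awk performs the floating-point division that shell arithmetic lacks.
    awk -v o="$orig_bytes" -v c="$comp_bytes" 'BEGIN {
        if (c == 0) { print "compressed file is empty"; exit 1 }
        printf "ratio=%.2f saved=%.1f%%\n", o / c, (1 - c / o) * 100
    }'
}
```

Run it as “compression_ratio cell_towers.csv cell_towers_zstd” from the directory that holds both files.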

To decompress the file, run the tool with the “--decompress” option:

root@tshirt:/tmp# clickhouse-compressor --decompress --zstd  cell_towers_zstd > de_cell_towers.csv 
root@tshirt:/tmp# 
root@tshirt:/tmp# ls -lrth de_cell_towers.csv 
-rw-r--r-- 1 root root 3.5G Oct  7 13:23 de_cell_towers.csv
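
The decompressed file should be identical to the original, so a byte-for-byte comparison confirms the round trip. The sketch below wraps cmp in a hypothetical verify_roundtrip function. If the two files live on different machines, comparing sha256sum output on each side serves the same purpose.

```shell
# Hypothetical helper: check that a decompressed file matches the original exactly.
verify_roundtrip() {
    local original="$1" restored="$2"
    # cmp -s prints nothing and exits 0 only when both files have identical contents.
    if cmp -s "$original" "$restored"; then
        printf 'OK: %s matches %s\n' "$restored" "$original"
    else
        printf 'MISMATCH: %s differs from %s\n' "$restored" "$original" >&2
        return 1
    fi
}
```

For the files in this article: “verify_roundtrip cell_towers.csv de_cell_towers.csv”.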

To inspect the blocks of a compressed file, use the “--stat” option with the compressed file name. For example:

root@tshirt:/tmp# clickhouse-compressor --stat cell_towers_zstd  | head -n5
2097152	574323
2097152	626838
2097152	637364
2097152	638221
2097152	654227

In the output above:

  • The first column is the uncompressed (original) size of the block, in bytes.
  • The second column is the compressed size of that block, in bytes.
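
Summing the two columns gives the overall compression ratio of the file. The awk-based summarize_stat function below is a hypothetical helper that reads the --stat output from standard input; it assumes exactly the two-column layout shown above.

```shell
# Hypothetical helper: summarize `clickhouse-compressor --stat` output from stdin.
# Assumes each line is "<uncompressed bytes> <compressed bytes>".
summarize_stat() {
    awk '
        # Default field splitting handles the tab between the two columns.
        NF >= 2 { blocks++; raw += $1; packed += $2 }
        END {
            if (packed == 0) { print "no blocks read"; exit 1 }
            # %.0f instead of %d avoids 32-bit overflow in some awk implementations.
            printf "blocks=%d uncompressed=%.0f compressed=%.0f ratio=%.2f\n",
                   blocks, raw, packed, raw / packed
        }'
}
```

Pipe the statistics into it: “clickhouse-compressor --stat cell_towers_zstd | summarize_stat”. Applied only to the five blocks printed above, it reports 5 blocks, 10485760 uncompressed bytes, 3130973 compressed bytes and a ratio of about 3.35.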

We hope this article helps you understand “clickhouse-compressor”. Let us know if you have any feedback.

ChistaDATA Corporation is not affiliated with ClickHouse Corporation.

ChistaDATA Inc. Knowledge base is licensed under the Apache License, Version 2.0 (the “License”)

Copyright 2022 ChistaDATA Inc

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.