Among DBAs, there is much debate regarding the significance of database performance tuning. As we now understand, various data types really power the business world. It should be operating very effectively and always be accessible to ensure that the data is quickly and easily accessible for all beneficiaries.
ClickHouse supports advanced compression to improve speed and lower storage costs. Technically speaking, optimizing the ClickHouse schema’s storage architecture will boost both memory and network bandwidth performance. But what must be done to efficient compression?
Generating the JSON file with ordered column:
CREATE TABLE default.table_one ( `id` UInt64 ) ENGINE = MergeTree ORDER BY `id` insert into table_one select number/10 from numbers(5000000)
clickhouse-client -q "select * from table_one" > table_one.json gzip table_one.json
Generating the JSON file with an unordered column:
CREATE TABLE default.table_two ( `id` UInt64 ) ENGINE = MergeTree ORDER BY tuple() insert into table_two select number/10 from numbers(5000000) order by rand()
clickhouse-client -q "select * from table_two" > table_two.json gzip table_two.json
Comparing the size result:
The tables only contain 5 million rows, but you can see how much the compressed files change in size.
-rw-r--r-- 1 root root 837985 Jan 8 01:47 table_one.json.gz -rw-r--r-- 1 root root 15615551 Jan 8 01:48 table_two.json.gz
“Why does ClickHouse take so much RAM for a straightforward query, and how can we lower this requirement in production environments?”
It is actually a FAQ. The most important point is with this operation, the amount of memory used decreased significantly after the re-sorting procedure with the SELECT operations. You can quickly observe a change in the effectiveness of your SELECT queries in sorted columns. A table with order expression correctly will perform better in SELECT queries, without a doubt.
Well, how do existing tables’ columns get sorted?
You can add expressions that use only the newly added columns. So, let’s do that:
CREATE TABLE default.test_exp ( `id` UInt64, `duration` UInt64 ) ENGINE = MergeTree ORDER BY id
ALTER TABLE default.test_exp ADD COLUMN version Int32, MODIFY ORDER BY (id, version)
We now know how crucial the ORDER BY phrase is. Making the appropriate choice is crucial while creating a table. It is vital for efficient SELECT operations. Also, before compressing a table, it is advised that the current columns be re-sorted to reduce the size; however, with ClickHouse, you may include expressions that use the newly inserted columns as the sorting key. We just mentioned the
version column in the instructions because of this. The sorting procedure was unable to include the
The primary key in ClickHouse cannot be changed as online. But ChistaDATA Inc. has OSC tool for these kinds of operations. Adding or modifying the primary key is feasible if you wish to update the ORDER BY expression in your existing table. The OSC tool from ChistaDATA Inc. might be used to do this task online without cost. Just stay tuned for this tool.
As demonstrated, using a proper ordering key can boost table compression by a factor of 20 or more. Repetitive values are better compressed in such files than random values. Also, for performance-enhancing SELECT queries, remember to re-order your columns in addition to this. Therefore, less RAM can be used for your cloud service, and you can pay less.