Introduction
We are all aware that the ClickHouse team is doing the new releases on a monthly basis. Every new release comes up with new features, improvements, and many bug fixes. At ChistaData, we do the Quick peek analysis on every release and publish the dedicated blog for them. This blog post will cover the ClickHouse September release of 2022. Below, I am sharing the important links for release 22.9.
Key features & improvements
The release happened on September 22nd, 2022. It has the following aspects.
- 12 new features ( 3 experimental )
- 18 performance optimization
- 82 bug fixes
Let’s look at the main features and improvements one by one.
1. insert_quorum can be set to “auto” now
release 22.9 :) select name,description from system.settings where name like '%insert_quorum'\G Row 1: ────── name: insert_quorum description: For INSERT queries in the replicated table, wait writing for the specified number of replicas and linearize the addition of the data. 0 - disabled, 'auto' - use majority
From ClickHouse 22.9, we have a new option “auto” for the parameter “insert_quorum” ( insert_quorum=auto ). This means the INSERTs for the replicated tables will succeed only when the majority of replicas correctly write data. In this case, we can ensure the majority of the replicas have the updated data. ( Note: insert_quorum is disabled by default. )
2. INTERSECT DISTINCT and EXCEPT DISTINCT
- INTERSECT DISTINCT operator is used to combine two select statements and returns only the distinct dataset, which is common in both statements.
- EXCEPT DISTINCT operator is used to return distinct datasets from the left input query that isn’t output by the right input query.
Below, I am sharing a practical example to understand this better.
release :) select * from 229_test; ┌─id─┐ │ 1 │ │ 2 │ └────┘ 2 rows in set. Elapsed: 0.016 sec. release :) select * from 229_test_2; ┌─id─┐ │ 1 │ └────┘ 1 row in set. Elapsed: 0.007 sec. release :) (select id from 229_test) INTERSECT DISTINCT (select id from 229_test_2); ┌─id─┐ │ 1 │ └────┘ 1 row in set. Elapsed: 0.015 sec. release :) (select id from 229_test) EXCEPT DISTINCT (select id from 229_test_2); ┌─id─┐ │ 2 │ └────┘ 1 row in set. Elapsed: 0.007 sec.
3. Settings can be changed for readonly users even with readonly mode ON
The new option “changeable_in_readonly” is introduced in ClickHouse 22.9. By enabling this option, we can change the setting within the min/max range even if readonly setting is set to ON. Previously, settings were not allowed to be changed with readonly mode ON.
For example, created the READONLY user “chista_229” with max_memory_usage ( min=100000 & max=200000 ).
release :) create user chista_229 host any identified with plaintext_password by 'chista' settings max_memory_usage min 100000 max 200000 readonly; CREATE USER chista_229 IDENTIFIED WITH plaintext_password BY 'chista' HOST ANY SETTINGS max_memory_usage MIN 100000 MAX 200000 CONST Query id: 2e57f821-6657-4575-af8a-36bc0d72a964 Ok. 0 rows in set. Elapsed: 0.004 sec.
After that, I logged in with user “chista_229” and tried to update the max_memory_usage setting at the session level. But, it not works. Giving the following error.
root@release:~# clickhouse-client --user chista_229 --password chista ClickHouse client version 22.9.2.7 (official build). Connecting to localhost:9000 as user chista_229. Connected to ClickHouse server version 22.9.2 revision 54460. release :) set max_memory_usage = 100001; SET max_memory_usage = 100001 Query id: 0805cf11-4253-4e19-844a-9d370e8b96bd 0 rows in set. Elapsed: 0.036 sec. Received exception from server (version 22.9.2): Code: 452. DB::Exception: Received from localhost:9000. DB::Exception: Setting max_memory_usage should not be changed. (SETTING_CONSTRAINT_VIOLATION)
After that, At the admin console, I altered the user to apply “changeable_in_readonly”.
release :) alter user chista_229 settings max_memory_usage min 100000 max 200000 changeable_in_readonly; ALTER USER chista_229 SETTINGS max_memory_usage MIN 100000 MAX 200000 CHANGEABLE_IN_READONLY Query id: 4de148a9-5476-443b-ac25-7b59ab5ec993 Ok. 0 rows in set. Elapsed: 0.003 sec.
Then trying to change the parameter value with user “chista_229”, It works fine now.
root@release:~# clickhouse-client --user chista_229 --password chista ClickHouse client version 22.9.2.7 (official build). Connecting to localhost:9000 as user chista_229. Connected to ClickHouse server version 22.9.2 revision 54460. release :) set max_memory_usage = 100001; SET max_memory_usage = 100001 Query id: 00acbb15-0227-4763-a04f-849bb9049f6e Ok. 0 rows in set. Elapsed: 0.003 sec.
4. JSON supports
JSONObjectEachRow
- This is the new format introduced in 22.9. All data is represented as a single JSON Object, each row is represented as a separate field of this object, similar to JSONEachRow format.
- Supported for input and output.
- Support import for formats JSON/JSONCompact/JSONColumnsWithMetadata.
Example for output,
release :) select * from ch.json229; SELECT * FROM ch.json229 Query id: c7e89c89-88d7-42d3-9fb9-5a5a82f3892b ┌─id─┬─name─┐ │ 1 │ jc │ │ 2 │ herc │ │ 3 │ sri │ └────┴──────┘ 3 rows in set. Elapsed: 0.001 sec. release :) select * from ch.json229 format JSONObjectEachRow; SELECT * FROM ch.json229 FORMAT JSONObjectEachRow Query id: 09d4d4eb-7caa-4001-838c-ef98d867693c { "row_1": {"id":1,"name":"jc"}, "row_2": {"id":2,"name":"herc"}, "row_3": {"id":3,"name":"sri"} } 3 rows in set. Elapsed: 0.008 sec.
Example for input,
release :) INSERT INTO json229_import FORMAT JSONEachRow {"id":1, "name":"jc"}, {"id":2, "name":"herc"}; INSERT INTO json229_import FORMAT JSONEachRow Query id: e4aa11b2-3d58-4222-838a-b58c21c21fd8 Ok. 2 rows in set. Elapsed: 0.010 sec. release :) select * from json229_import; SELECT * FROM json229_import Query id: f297b13a-bfa3-45ae-b516-e75afdb60924 ┌─id─┬─name─┐ │ 1 │ jc │ │ 2 │ herc │ └────┴──────┘ 2 rows in set. Elapsed: 0.012 sec.
Below are the four new parameters introduced with this release. Those parameters used to define the input/output format types.
- input_format_json_validate_types_from_metadata
- input_format_json_validate_utf8
- input_format_json_read_numbers_as_strings
- output_format_json_quote_decimals
5. Parameterised Query now supports DESCRIBE keyword
Query parameters are now supported in DESCRIBE TABLE queries. This is not supported before. For example,
previous behaviour ( < 22.9 ):
oot@python:~# clickhouse client --param_par1="hi" -q "SELECT {par1:String}" hi root@python:~# clickhouse client --param_par="hi" -q "DESCRIBE TABLE (SELECT {par:String})" Received exception from server (version 22.8.5): Code: 456. DB::Exception: Received from localhost:9000. DB::Exception: Query parameter `par` was not set. (UNKNOWN_QUERY_PARAMETER) (query: DESCRIBE TABLE (SELECT {par:String}))
Current behaviour ( from 22.9 ):
root@release:~# clickhouse client --param_p0="hi" -q "SELECT {p0:String}" hi root@release:~# clickhouse client --param_par1="hi" -q "SELECT {par1:String}" hi root@release:~# clickhouse client --param_par="hi" -q "DESCRIBE TABLE (SELECT {par:String})" _CAST(\'hi\', \'String\') String
6. Important Backward Incompatible Change
ClickHouse Upgrade:
- ClickHouse upgrade from 20.3 and older ( <= 20.3 ) to 22.9 and newer ( >=22.9 ) should happen through an imtermediate version if there are any ReplicatedMergeTree tables.
- Otherwise, the server with the newer version will not start.
7. Notable Performance Improvements
- Added new storage engine KeeperMap, which uses ClickHouse Keeper or ZooKeeper as a key-value store. This storage engine is intended to store a small amount of metadata.
- Virtual filesystem over s3 will use random object names split into multiple path prefixes for better performance on AWS.
- Speed up the backup process using native copy when possible instead of copying through clickhouse-server memory.
- During startup and ATTACH call, ReplicatedMergeTree tables will be readonly until the ZooKeeper connection is made and the setup is finished. This is applicable for ClickHouse Keeper as well.
- Two new metrics in system.asynchronous_metrics. NumberOfDetachedParts and NumberOfDetachedByUserParts.
- Better support for nested data structures in Parquet format
8. Notable Bug Fixes
- Fix potential dataloss due to a bug in AWS SDK. The bug can be triggered only when clickhouse is used over S3. #40506. This bug has been open for 5 years in AWS SDK and is closed.
- Malicious data in Native format might cause a crash. #41441.
- Fix unused, unknown columns introduced by WITH statement. This fixes #37812 . #39131.
- Fix memory safety issues with functions
encrypt
andcontingency
if Array of Nullable is used as an argument. This fixes #41004. #40195. - Fix crash while parsing values of type
Object
(experimental feature) that contains arrays of variadic dimensions. #40483 - Fix incremental backups for Log family. #40827
Conclusion
ClickHouse’s September 2022 release (version 22.9) introduces key features like “auto” setting for insert_quorum, support for INTERSECT DISTINCT and EXCEPT DISTINCT set operations, and the ability to modify settings for read only users. With enhancements in JSON support and parameterized query support for DESCRIBE keyword, ClickHouse continues to evolve as a powerful analytical database.
To know more about ClickHouse releases, do visit the following articles:
- ClickHouse Release Notes – v22.10 (October 2022)
- ClickHouse Release Notes – v22.11 (November 2022)
- ClickHouse Release Notes – v22.12 (December 2022)
- ClickHouse Release Notes – v23.1 (January 2023)
- ClickHouse Release Notes – v23.3 (March 2023)
- ClickHouse Release Notes – v23.4 (April 2023)
References
https://clickhouse.com/docs/en/whats-new/changelog/