ClickHouse, the open-source, column-oriented, distributed SQL database management system, is known for its high-performance analytics and data-processing capabilities. With each release, ClickHouse continues to evolve, introducing new features that enhance its functionality and performance.
In this blog post, we will explore some of the exciting new features introduced in ClickHouse 23.3, the latest release, as described in the release notes on the official ClickHouse website. This version includes 22 new features, 14 performance optimizations, and 61 bug fixes. Let’s dive in and see what’s in store!
For further details, please see the official ClickHouse change log list here.
- Source Code – GitHub Link
- Installation Instructions – Official Documentation
- v23.3 Release Webinar – Recording
- v23.3 Release Webinar – Slides
Let’s look at some important New Features:
UNDROP TABLE
Before discussing ClickHouse’s “UNDROP” feature, we need to understand the drop-table mechanism in ClickHouse. When we execute a “DROP TABLE” command, the table’s data is not physically removed from storage for a certain period, because ClickHouse uses MVCC for atomic DDL operations. MVCC stands for Multi-Version Concurrency Control, a technique used in database management systems, including ClickHouse, that allows multiple transactions to access and modify data concurrently without conflicts or inconsistencies. In ClickHouse, MVCC is implemented at the storage engine level, which is responsible for managing the actual data storage and retrieval. MVCC creates a separate version of each row for every transaction that modifies it, so each transaction sees its own “snapshot” of the database, including its own versions of modified rows, without affecting other concurrent transactions.
Within this period (8 minutes by default, controlled by the database_atomic_delay_before_drop_table_sec server setting), you can recover a dropped table with the “UNDROP TABLE” command.
```sql
-- create a test table
CREATE TABLE test_table
(
    `id` Int32,
    `name` String
)
ENGINE = MergeTree
ORDER BY id;

-- insert a record into the table
INSERT INTO test_table VALUES (1, 'ChistaDATA');

-- drop the table
DROP TABLE test_table;

-- check dropped tables
SELECT database, table, uuid
FROM system.dropped_tables;

┌─database─┬─table──────┬─uuid─────────────────────────────────┐
│ default  │ test_table │ 17f5fcdf-b750-4194-91dc-b8a67bd6aa6f │
└──────────┴────────────┴──────────────────────────────────────┘

-- UNDROP is still an experimental feature, so we need to enable it first
SET allow_experimental_undrop_table_query = 1;

-- the UUID clause is optional; it is needed only in some cases
UNDROP TABLE default.test_table UUID '17f5fcdf-b750-4194-91dc-b8a67bd6aa6f';

-- test_table is accessible again
SELECT * FROM test_table;

┌─id─┬─name───────┐
│  1 │ ChistaDATA │
└────┴────────────┘
```
Server Settings Introspection
You can now inspect the server settings defined in the config.xml file through the “system.server_settings” table. With the following query, you can find each parameter’s name, value, and description.
```sql
SELECT *
FROM system.server_settings
FORMAT Vertical

Row 1:
──────
name:        show_addresses_in_stack_traces
value:       1
default:     1
changed:     0
description: If it is set true will show addresses in stack traces
type:        Bool

Row 2:
──────
name:        shutdown_wait_unfinished_queries
value:       0
default:     0
changed:     0
description: If set true ClickHouse will wait for running queries finish before shutdown.
type:        Bool
```
Nested Dynamic Disks
ClickHouse 23.3 introduces an exciting new feature called nested dynamic disks, which enhances the storage capabilities of ClickHouse tables. With nested dynamic disks, a disk definition can be declared inline in the table DDL and wrap another disk, allowing for more flexible and efficient storage layouts. For example, you can create a table that uses a web endpoint as its data source and a local disk as a cache on top of it. With the help of this feature, after the first run, the data a query reads is cached locally, and the performance of subsequent queries improves significantly.
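A minimal sketch of such a table definition is shown below; the table name, endpoint URL, cache path, and cache size are illustrative placeholders, so adjust them to your environment:

```sql
CREATE TABLE web_cached_table
(
    `id` Int32,
    `name` String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS disk = disk(
    type = cache,
    max_size = '10Gi',
    path = '/var/lib/clickhouse/caches/web_cache/',
    -- the nested (wrapped) disk: a read-only web endpoint holding the data
    disk = disk(
        type = web,
        endpoint = 'https://example.com/static-table-data/'
    )
);
```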
MySQL Wire Protocol Support
This ClickHouse version supports the MySQL wire protocol, which allows MySQL clients to communicate directly with ClickHouse, opening up new possibilities for interoperability and integration between ClickHouse and MySQL-based applications. This feature lets MySQL clients connect to ClickHouse as if it were a MySQL server, making it easier to migrate to or interact with ClickHouse using existing MySQL client libraries, tools, and connectors.
First, you must install the MySQL client on the ClickHouse DB server.
```shell
apt-get update
apt-get install mysql-client
```
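The server must also listen for MySQL connections. This is done by enabling the mysql_port setting in the server’s config.xml (9004 below matches the port used in the session that follows):

```xml
<clickhouse>
    <!-- Enable the MySQL wire-protocol endpoint -->
    <mysql_port>9004</mysql_port>
</clickhouse>
```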
Now you can connect to the ClickHouse database both with a MySQL client and with the ClickHouse client itself.
Connect to the ClickHouse Database with MySQL client and run the following commands:
```
# mysql --port 9004 --host 127.0.0.1 --user default
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 0
Server version: 23.3.1-ClickHouse

Copyright (c) 2000, 2023, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show tables;
+-----------+
| name      |
+-----------+
| test_hits |
+-----------+
1 row in set (0.01 sec)
Read 1 rows, 34.00 B in 0.008918 sec., 112 rows/sec., 3.72 KiB/sec.

mysql> select count() from test_hits;
+---------+
| count() |
+---------+
| 7157487 |
+---------+
1 row in set (0.00 sec)
Read 1 rows, 4.01 KiB in 0.003168 sec., 315 rows/sec., 1.24 MiB/sec.
```
Connect to the ClickHouse Database with ClickHouse client and run the same commands:
```
# clickhouse-client
ClickHouse client version 23.3.1 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 23.3.1 revision 54462.

clickhouse03 :) show tables;

SHOW TABLES

┌─name──────┐
│ test_hits │
└───────────┘

1 row in set. Elapsed: 0.003 sec.

clickhouse03 :) select count() from test_hits;

SELECT count()
FROM test_hits

┌─count()─┐
│ 7157487 │
└─────────┘

1 row in set. Elapsed: 0.004 sec.
```
parseDateTime Function
This powerful new parseDateTime function provides enhanced functionality for parsing date and time values. It converts string representations of dates and times into ClickHouse’s internal DateTime type according to a format string, making it convenient for data processing and analysis.
Here is an example of this function:
```sql
SELECT parseDateTime('2023/01/21 22:22:15', '%Y/%m/%d %H:%M:%S')

┌─parseDateTime('2023/01/21 22:22:15', '%Y/%m/%d %H:%M:%S')─┐
│                                       2023-01-21 22:22:15 │
└───────────────────────────────────────────────────────────┘
```
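For the particular specifiers used above, the format tokens happen to line up with Python’s strptime, so the same parse can be sanity-checked outside ClickHouse (a comparison sketch, not ClickHouse’s implementation):

```python
from datetime import datetime

# Parse the same string with Python's strptime; for the tokens used here
# (%Y, %m, %d, %H, %M, %S) the meaning matches the ClickHouse example above.
dt = datetime.strptime('2023/01/21 22:22:15', '%Y/%m/%d %H:%M:%S')
print(dt)  # 2023-01-21 22:22:15
```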
Improved Parquet, ORC, and Arrow Support
With this release, ClickHouse significantly improves its support for the Parquet, ORC, and Arrow formats.
Some of these improvements are:
- Control over the compression method:

```sql
SET output_format_parquet_compression_method = 'snappy|lz4|brotli|zstd|gzip|none';
```

- Control over the format version:

```sql
SET output_format_parquet_version = '2.6'; -- supported versions: 1.0, 2.4, 2.6, 2.latest
```
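Putting the two settings together, a Parquet export might look like the following sketch (the table and file names are illustrative):

```sql
SET output_format_parquet_compression_method = 'zstd';
SET output_format_parquet_version = '2.6';

-- write the query result to a Parquet file using the chosen compression and version
SELECT *
FROM test_table
INTO OUTFILE 'test_table.parquet'
FORMAT Parquet;
```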
SSL Certificate Authentication In The Native Protocol
SSL certificate authentication in the native protocol in ClickHouse 23.3 offers improved security options for users who require secure communication between ClickHouse clients and servers, helping to safeguard their data and ensure secure operations in ClickHouse deployments. To use it, add the “certificateFile” and “privateKeyFile” parameters to clickhouse-client’s config.yaml or config.xml file. A config.yaml example of these parameters is shown below:
```yaml
openSSL:
    client:
        certificateFile: 'path.cert'
        privateKeyFile: 'path.key'
        # ...
```
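On the server side, certificate authentication maps a user to the certificate’s Common Name. A sketch of the corresponding users-configuration entry might look like the following (the user name and CN are placeholders):

```xml
<clickhouse>
    <users>
        <cert_user>
            <!-- authenticate this user by the client certificate's Common Name -->
            <ssl_certificates>
                <common_name>cert_user.example.com</common_name>
            </ssl_certificates>
        </cert_user>
    </users>
</clickhouse>
```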
Lightweight DELETEs
Lightweight DELETEs are production-ready and enabled by default in 23.3 LTS. This means you can use the “DELETE FROM …” command directly instead of “ALTER TABLE <TABLE_NAME> DELETE”.
Here is an example of lightweight delete:
```sql
DELETE FROM uk_price_paid WHERE type = 'terraced';
```
It is time to talk about some important Performance Optimizations:
Compressed Marks In Memory
Marks now take 3 to 6 times less space in memory. You can find the cached mark size with the following query:
```sql
SELECT *
FROM system.asynchronous_metrics
WHERE metric = 'MarkCacheBytes';
```
As a test scenario, we ran the same query in both the 23.2 and 23.3 releases and compared the cached mark sizes. According to our findings, marks in 23.3 take about five times less space than in 23.2.
```sql
-- "Total size of mark cache in bytes" is 0 after the DB is restarted (both 23.3 and 23.2)
SELECT *
FROM system.asynchronous_metrics
WHERE metric = 'MarkCacheBytes'

┌─metric─────────┬───value─┬─description───────────────────────┐
│ MarkCacheBytes │       0 │ Total size of mark cache in bytes │
└────────────────┴─────────┴───────────────────────────────────┘

-- run the same sample query on both 23.3 and 23.2; this table includes 28113076 rows
SELECT * FROM uk_price_paid

-- 23.2: after the query, the total size of the mark cache is 1174656 bytes
SELECT *
FROM system.asynchronous_metrics
WHERE metric = 'MarkCacheBytes'

┌─metric─────────┬───value─┬─description───────────────────────┐
│ MarkCacheBytes │ 1174656 │ Total size of mark cache in bytes │
└────────────────┴─────────┴───────────────────────────────────┘

-- 23.3: after the query, the total size of the mark cache is 236992 bytes
SELECT *
FROM system.asynchronous_metrics
WHERE metric = 'MarkCacheBytes'

┌─metric─────────┬──value─┬─description───────────────────────┐
│ MarkCacheBytes │ 236992 │ Total size of mark cache in bytes │
└────────────────┴────────┴───────────────────────────────────┘
```
Optimizations For Backups
This version introduces a separate thread pool for backup IO operations, which allows backups to scale independently of other pools and increases performance. MultiRead requests and retries are now used when collecting metadata at the final stage of backup processing. And if both the backed-up data and the restore destination are in S3, a server-side copy is used from now on.
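As a sketch, backing up a table to S3 and restoring it uses the BACKUP/RESTORE syntax below; the bucket URL and credentials are placeholders:

```sql
-- back up a table directly to S3 (a server-side copy applies when the data already lives in S3)
BACKUP TABLE uk_price_paid
TO S3('https://my-bucket.s3.amazonaws.com/backups/uk_price_paid/', 'access_key_id', 'secret_access_key');

-- restore it from the same location
RESTORE TABLE uk_price_paid
FROM S3('https://my-bucket.s3.amazonaws.com/backups/uk_price_paid/', 'access_key_id', 'secret_access_key');
```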
Parallel Replicas
With the new ClickHouse parallel replicas feature, you can use replicas to parallelize data processing. It is still an experimental feature, and you need to enable it and set the maximum number of parallel replicas:
```sql
SET allow_experimental_parallel_reading_from_replicas = 1, max_parallel_replicas = 3;
```
You can also use parallel replicas with the dynamic shards feature, which:
- Represents every replica as a shard for distributed queries by pre-filtering data by the remainder of a division or by a range.
- Is useful for large JOIN or GROUP BY queries together with the distributed_group_by_no_merge setting.
```sql
SET max_parallel_replicas = 3,
    parallel_replicas_custom_key = '<DynamicShardColumn>',
    parallel_replicas_custom_key_filter_type = 'default';
```
Since this feature is experimental and needs some benchmarking tests, we plan to discuss it in detail in a new blog post.
Temporary Tables With an Engine
In ClickHouse, a temporary table exists only for the duration of the current session and is automatically dropped when the session completes, so its data is not persisted. Temporary tables are useful for intermediate data storage, temporary data processing, or breaking complex queries down into smaller parts. It’s important to note that temporary tables are session-specific, meaning they are visible only within the session that created them. In this release, support for arbitrary table engines for temporary tables is added.
The following example shows a temporary table with the “MergeTree” engine.
```sql
CREATE TEMPORARY TABLE tmp_uk_price_paid
(
    `price` UInt32,
    `date` Date,
    `postcode1` LowCardinality(String),
    `postcode2` LowCardinality(String),
    `type` Enum8('other' = 0, 'terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4),
    `is_new` UInt8,
    `duration` Enum8('unknown' = 0, 'freehold' = 1, 'leasehold' = 2),
    `addr1` String,
    `addr2` String,
    `street` LowCardinality(String),
    `locality` LowCardinality(String),
    `town` LowCardinality(String),
    `district` LowCardinality(String),
    `county` LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (postcode1, postcode2, addr1, addr2)
```
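Once created, the temporary table can be used like any other table for intermediate processing; for instance (assuming the permanent uk_price_paid table from the earlier examples exists):

```sql
-- stage a subset of rows for follow-up queries within this session
INSERT INTO tmp_uk_price_paid
SELECT *
FROM uk_price_paid
WHERE town = 'LONDON';

-- the staged data disappears automatically when the session ends
SELECT count() FROM tmp_uk_price_paid;
```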
And last but not least, we’ll talk about some important Integrations:
Superset and Grafana
As of the 23.3 release, ClickHouse has added native support for Superset, an open-source business-intelligence web application that now ships with ClickHouse support out of the box. For detailed information about Superset, you can visit our Superset blogs. ClickHouse also supports trace visualization in Grafana.
On the other hand, another alternative among polyglot monitoring and observability tools is QRYN. You can check out this tool there.
Note: If you would like to test these scenarios yourself, you can use the Docker containers and packages we prepared. Here is the link for the installation.
ClickHouse 23.3 brings many exciting new features that further enhance its capabilities as a high-performance analytics database management system, providing users with powerful tools for processing, analyzing, and visualizing data. The performance optimizations and expanded integrations further solidify ClickHouse’s position as a top choice for high-velocity data processing, real-time analytics, and advanced analytics use cases. With its open-source nature and continuous innovation, ClickHouse remains a leading solution for organizations seeking fast, scalable, and feature-rich data analytics. Upgrade to ClickHouse 23.3 to take advantage of these features and unlock the full potential of your analytics workflows.