ClickHouse March 2023 Version 23.3

Table of Contents

Introduction

ClickHouse, the open-source, columnar-oriented, distributed SQL database management system, is known for its high-performance analytics and data processing capabilities. With each release, ClickHouse continues to evolve and introduce new features to enhance its functionality and performance.

In this blog post, we will explore some of the exciting new features introduced in ClickHouse 23.3, the latest release of ClickHouse, as per the release notes on the official ClickHouse website. This version includes 22 new features, 14 performance optimizations and 61 bug fixes. Let’s dive in and see what’s in store!

For further details, please see the official ClickHouse change log list here.

Source Code – GitHub Link
Installation Instructions – Official Documentation
v23.3 Release Webinar – Recording
v23.3 Release Webinar – Slides

Let’s look at some important New Features:

Undrop Table

Before discussing the Clickhouse “Undrop” feature, we must know the drop table mechanism in Clickhouse. When we execute a “drop table” command to drop a table, the table’s data stored in the storage is not physically removed for a certain period. Because ClickHouse uses MVCC for atomic DDL operations. MVCC stands for Multi-Version Concurrency Control, and It’s a technique used in database management systems, including ClickHouse, which concurrently allows multiple transactions to access and modify data without conflicts or data inconsistencies. In ClickHouse, MVCC is implemented at the storage engine level, which is responsible for managing the actual data storage and retrieval. MVCC creates separate versions of each row in the database for each modified transaction. Each transaction sees its own “snapshot” of the database, including its own version of modified rows, without affecting other concurrent transactions.

In this certain period of time (8 minutes) after a table is dropped, you can recover it with the “undrop table” command.

 --create a test table 

CREATE TABLE test_table
(
    `id` int,
    name String
)
ENGINE = MergeTree
ORDER BY id

 --insert a record to the table 

insert into test_table values (1,'chistaDATA');


-- drop table
DROP TABLE test_table

--check dropped tables
SELECT
    database,
    table,
    uuid
FROM system.dropped_tables

Query id: bffa7bd2-c654-4a94-9821-fee9d548ca8e

┌─database─┬─table──────┬─uuid─────────────────────────────────┐
│ default  │ test_table │ 17f5fcdf-b750-4194-91dc-b8a67bd6aa6f │
└──────────┴────────────┴──────────────────────────────────────┘

--undrop table from the "system.dropped_tables" 
-- just because it is experimental feature, we need to enable it first
SET allow_experimental_undrop_table_query = 1
--UUID is optional, we need to use for some cases
UNDROP TABLE default.test_table UUID '17f5fcdf-b750-4194-91dc-b8a67bd6aa6f'

-- test_table is accessible right now
SELECT *
FROM test_table

Query id: 412a3c6c-88bc-49f0-9adb-bdb539f80961

┌─id─┬─name───────┐
│  1 │ ChistaDATA │
└────┴────────────┘

Server Settings Introspection

You can access server settings in the config.xml file from the “system.server_settings” table. With the help of the following query, you can find the name, description and value of the setting parameters.

SELECT *
FROM system.server_settings
FORMAT Vertical


Row 1:
──────
name:        show_addresses_in_stack_traces
value:       1
default:     1
changed:     0
description: If it is set true will show addresses in stack traces
type:        Bool

Row 2:
──────
name:        shutdown_wait_unfinished_queries
value:       0
default:     0
changed:     0
description: If set true ClickHouse will wait for running queries finish before shutdown.
type:        Bool

Nested Dynamic Disks

ClickHouse 23.3 introduces an exciting new feature called nested dynamic disk, which enhances the storage capabilities of ClickHouse tables. With nested dynamic disk, users can create tables with nested data structures, such as nested columns, allowing for more flexible and efficient data organization. In the following example, you can create a table with the web endpoint as data and the local disk as a cache. With the help of this feature, after the first run, query results will be cached locally, and query performance will improve significantly.

ATTACH TABLE test_hits UUID '1ae36516-d62d-4218-9ae3-6516d62da218'
(
    WatchID UInt64,
    JavaEnable UInt8,
    Title String,
    GoodEvent Int16,
    EventTime DateTime,
    EventDate Date,
    CounterID UInt32,
    ClientIP UInt32,
    ClientIP6 FixedString(16),
    RegionID UInt32,
    UserID UInt64,
    CounterClass Int8,
    OS UInt8,
    UserAgent UInt8,
    URL String,
    Referer String,
    URLDomain String,
    RefererDomain String,
    Refresh UInt8,
    IsRobot UInt8,
    RefererCategories Array(UInt16),
    URLCategories Array(UInt16),
    URLRegions Array(UInt32),
    RefererRegions Array(UInt32),
    ResolutionWidth UInt16,
    ResolutionHeight UInt16,
    ResolutionDepth UInt8,
    FlashMajor UInt8,
    FlashMinor UInt8,
    FlashMinor2 String,
    NetMajor UInt8,
    NetMinor UInt8,
    UserAgentMajor UInt16,
    UserAgentMinor FixedString(2),
    CookieEnable UInt8,
    JavascriptEnable UInt8,
    IsMobile UInt8,
    MobilePhone UInt8,
    MobilePhoneModel String,
    Params String,
    IPNetworkID UInt32,
    TraficSourceID Int8,
    SearchEngineID UInt16,
    SearchPhrase String,
    AdvEngineID UInt8,
    IsArtifical UInt8,
    WindowClientWidth UInt16,
    WindowClientHeight UInt16,
    ClientTimeZone Int16,
    ClientEventTime DateTime,
    SilverlightVersion1 UInt8,
    SilverlightVersion2 UInt8,
    SilverlightVersion3 UInt32,
    SilverlightVersion4 UInt16,
    PageCharset String,
    CodeVersion UInt32,
    IsLink UInt8,
    IsDownload UInt8,
    IsNotBounce UInt8,
    FUniqID UInt64,
    HID UInt32,
    IsOldCounter UInt8,
    IsEvent UInt8,
    IsParameter UInt8,
    DontCountHits UInt8,
    WithHash UInt8,
    HitColor FixedString(1),
    UTCEventTime DateTime,
    Age UInt8,
    Sex UInt8,
    Income UInt8,
    Interests UInt16,
    Robotness UInt8,
    GeneralInterests Array(UInt16),
    RemoteIP UInt32,
    RemoteIP6 FixedString(16),
    WindowName Int32,
    OpenerName Int32,
    HistoryLength Int16,
    BrowserLanguage FixedString(2),
    BrowserCountry FixedString(2),
    SocialNetwork String,
    SocialAction String,
    HTTPError UInt16,
    SendTiming Int32,
    DNSTiming Int32,
    ConnectTiming Int32,
    ResponseStartTiming Int32,
    ResponseEndTiming Int32,
    FetchTiming Int32,
    RedirectTiming Int32,
    DOMInteractiveTiming Int32,
    DOMContentLoadedTiming Int32,
    DOMCompleteTiming Int32,
    LoadEventStartTiming Int32,
    LoadEventEndTiming Int32,
    NSToDOMContentLoadedTiming Int32,
    FirstPaintTiming Int32,
    RedirectCount Int8,
    SocialSourceNetworkID UInt8,
    SocialSourcePage String,
    ParamPrice Int64,
    ParamOrderID String,
    ParamCurrency FixedString(3),
    ParamCurrencyID UInt16,
    GoalsReached Array(UInt32),
    OpenstatServiceName String,
    OpenstatCampaignID String,
    OpenstatAdID String,
    OpenstatSourceID String,
    UTMSource String,
    UTMMedium String,
    UTMCampaign String,
    UTMContent String,
    UTMTerm String,
    FromTag String,
    HasGCLID UInt8,
    RefererHash UInt64,
    URLHash UInt64,
    CLID UInt32,
    YCLID UInt64,
    ShareService String,
    ShareURL String,
    ShareTitle String,
    ParsedParams Nested(
        Key1 String,
        Key2 String,
        Key3 String,
        Key4 String,
        Key5 String,
        ValueDouble Float64),
    IslandID FixedString(16),
    RequestNum UInt32,
    RequestTry UInt8
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID)
SETTINGS disk = disk(
    type = cache,
    path = '/var/lib/clickhouse/.cache/clickhouse-local/',
    max_size = '100G',
    disk = disk(
        type = web,
        endpoint = 'https://clickhouse-datasets.s3.yandex.net/disk-with-static-files-tests/test-hits/'));

MySQL Compatibility

This ClickHouse version supports the MySQL wire protocol, which allows MySQL clients to communicate directly with ClickHouse, opening up new possibilities for interoperability and integration between ClickHouse and MySQL-based applications. This feature enables MySQL clients to connect to ClickHouse like a MySQL server, making it easier to migrate or interact with ClickHouse using existing MySQL client libraries, tools, and connectors.

First, you must install the MySQL client on the ClickHouse DB server.

--> apt-get update              
--> apt-get install mysql-client

Now you can connect to ClickHouse DB both as MYSQL and ClickHouse itself.

Connect to the ClickHouse Database with MySQL client and run the following commands:

# mysql --port 9004 --host 127.0.0.1 --user default
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 0
Server version: 23.3.1.2823-ClickHouse 

Copyright (c) 2000, 2023, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show tables;
+-----------+
| name      |
+-----------+
| test_hits |
+-----------+
1 row in set (0.01 sec)
Read 1 rows, 34.00 B in 0.008918 sec., 112 rows/sec., 3.72 KiB/sec.

mysql> select count() from test_hits;
+---------+
| count() |
+---------+
| 7157487 |
+---------+
1 row in set (0.00 sec)
Read 1 rows, 4.01 KiB in 0.003168 sec., 315 rows/sec., 1.24 MiB/sec.

Connect to the ClickHouse Database with ClickHouse client and run the same commands:

# clickhouse-client
ClickHouse client version 23.3.1.2823 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 23.3.1 revision 54462.

Warnings:
 * Linux is not using a fast clock source. Performance can be degraded. Check /sys/devices/system/clocksource/clocksource0/current_clocksource

clickhouse03 :) show tables;

SHOW TABLES

Query id: ddf45cf8-eee0-45a1-b3b8-5c50c65bcbb5

┌─name──────┐
│ test_hits │
└───────────┘

1 row in set. Elapsed: 0.003 sec. 

clickhouse03 :) select count() from test_hits;

SELECT count()
FROM test_hits

Query id: 82174d13-22dc-40dd-ae40-0374a783bccb

┌─count()─┐
│ 7157487 │
└─────────┘

1 row in set. Elapsed: 0.004 sec.

parseDateTime

This powerful new function provides enhanced functionality for parsing date and time values. This function allows users to easily convert string representations of date and time values into ClickHouse’s internal DateTime type, making it convenient for data processing and analysis.

Here is an example of this function:

SELECT parseDateTime('2023/01/21 22:22:15', '%Y/%m/%d %H:%M:%S')

┌─parseDateTime('2023/01/21 22:22:15', '%Y/%m/%d %H:%M:%S')─┐
│                                       2023-01-21 22:22:15 │
└───────────────────────────────────────────────────────────┘

Parquet/ORC/Arrow Improvements

With this release, ClickHouse significantly improves its support for Parquet, ORC, and Arrow formats.

Some of these improvements are:

- Control over compression method:
SET output_format_parquet_compression_method
    = 'snappy|lz4|brotli|zstd|gzip|none';

- Control over format version:
SET output_format_parquet_version = '2.6'

//Supported versions are: 1.0, 2.4, 2.6, 2.latest.

SSL Certificate Authentication In The Native Protocol

SSL certificate authentication in the native protocol in ClickHouse 23.3 offers improved security options for users who require secure communication between ClickHouse clients and servers, helping to safeguard their data and ensuring secure operations in ClickHouse deployments. To do so, add “certificateFile” and “privateKeyFile” parameters to clickhouse-client’s config.yaml or config.xml file. The config.yaml example of this parameter is shown below:

openSSL:
    client:
        certificateFile: 'path.cert'
        privateKeyFile: 'path.key'
        .
        .
        .

Lightweight Deletes

Lightweight DELETEs are production-ready and enabled by default in 23.3 LTS. It means you can directly use the “DELETE FROM…” command instead of “ALTER TABLE <TABLE_NAME> DELETE”.

Here is an example of lightweight delete:

DELETE FROM uk_price_paid WHERE type = 'terraced'

It is time to talk about some important Performance Optimizations:

Compressed Marks In Memory

Marks take 3 to 6 times less space in memory. You can find the cached mark size with the following query:

select * from system.asynchronous_metrics where metric='MarkCacheBytes'

As a test scenario, we ran the same query both in 23.2 and 23.3 releases and compared cached mark sizes. According to findings, marks in the 23.3 version take five times less space than in the 23.2 version.

-- "Total size of mark cache in bytes" is "0" after DB restarted(both 23.3 and 23.2)
SELECT *
FROM system.asynchronous_metrics
WHERE metric = 'MarkCacheBytes'

Query id: 4cc029d6-097d-459b-a14d-11547b2032ed

┌─metric─────────┬───value─┬─description───────────────────────┐
│ MarkCacheBytes │ 0       │ Total size of mark cache in bytes │
└────────────────┴─────────┴───────────────────────────────────┘

--Run same sample query both 23.3 and 23.2. This table includes 28113076 rows in set
select * from uk_price_paid

--23.2 After query, total size of mark cache is 1174656 bytes
SELECT *
FROM system.asynchronous_metrics
WHERE metric = 'MarkCacheBytes'

┌─metric─────────┬───value─┬─description───────────────────────┐
│ MarkCacheBytes │ 1174656 │ Total size of mark cache in bytes │
└────────────────┴─────────┴───────────────────────────────────┘

--23.3 - After query, total size of mark cache is 236992 bytes
SELECT *
FROM system.asynchronous_metrics
WHERE metric = 'MarkCacheBytes'

┌─metric─────────┬──value─┬─description───────────────────────┐
│ MarkCacheBytes │ 236992 │ Total size of mark cache in bytes │
└────────────────┴────────┴───────────────────────────────────┘

Optimizations For Backups

This version introduces a separate thread pool for the backup’s IO operations. This will allow it to scale independently of other pools and increase performance. Also, MultiRead requests and retries for collecting metadata at the final stage of backup processing are used. If a backup and restoring data are both in S3, then a server-side copy should be used from now on.

Parallel Replicas

With the new ClickHouse parallel replicas feature, you can utilize such replicas for parallelizing data processing. It is still an experimental feature and you need to allow this feature with max parallel replica.

SET allow_experimental_parallel_reading_from_replicas = 1,
    max_parallel_replicas = 3;

You can also set parallel replicas with dynamic shard features. It helps us to:

Represents every replica as a shard for distributed queries by pre-filtering data by the remainder of division or a range.
Useful for large JOIN or GROUP BY with the distributed_group_by_no_merge setting.

SET max_parallel_replicas = 3,
    parallel_replicas_custom_key = '<DynamicShardColumn>',
    parallel_replicas_custom_key_filter_type = 'default';

Since this feature is experimental and needs some benchmarking tests, we plan to discuss it in detail in a new blog post.

Temporary Tables With an Engine

In ClickHouse, a temporary table is a table that exists only for the duration of the current session or query execution and is automatically dropped and deleted when the session or query completes. Temporary tables are useful for intermediate data storage, temporary data processing, or breaking down complex queries into smaller parts. It’s important to note that temporary tables in ClickHouse are session-specific, which means they are visible only within the same session or query execution that created them. Once the session or query completes, the temporary table is automatically dropped and deleted, and its data is not persisted. In this release, support of arbitrary tables engines for temporary tables is added.

The following example shows a temp table with “MergeTree” engine.

CREATE TEMPORARY TABLE tmp_uk_price_paid
                (
                    `price` UInt32,
                    `date` Date,
                    `postcode1` LowCardinality(String),
                    `postcode2` LowCardinality(String),
                    `type` Enum8('other' = 0, 'terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4),
                    `is_new` UInt8,
                    `duration` Enum8('unknown' = 0, 'freehold' = 1, 'leasehold' = 2),
                    `addr1` String,
                    `addr2` String,
                    `street` LowCardinality(String),
                    `locality` LowCardinality(String),
                    `town` LowCardinality(String),
                    `district` LowCardinality(String),
                    `county` LowCardinality(String)
                )
                ENGINE = MergeTree
                ORDER BY (postcode1, postcode2, addr1, addr2)

And last but not least, we’ll talk about some important Integrations:

Superset and Grafana

As of the 23.3 release, ClickHouse has added native support for Superset, an open-source business intelligence web application that ships with ClickHouse support out-of-the-box. For detailed information about Superset, you can visit our Superset blogs. ClickHouse also supports trace visualization for Grafana.

On the other hand, another alternative for polyglot monitoring and observability tools is QRYN. You can check out this tool from there.

Note: If you would like to test your scenarios, you can use the following Docker containers and container packages we prepared. Here is the link for this installation.

Conclusion

ClickHouse 23.3 brings many exciting new features that further enhance its capabilities as a high-performance data analytics database management system. ClickHouse continues evolving and provides users with powerful tools for processing, analyzing, and visualizing data. The performance optimizations and expanded integrations further solidify ClickHouse’s position as a top choice for high-velocity data processing, real-time analytics, and advanced data analytics use cases. With its open-source nature and continuous innovation, ClickHouse remains a leading solution for organizations seeking fast, scalable, and feature-rich data analytics capabilities. Upgrade to ClickHouse 23.3 to take advantage of these exciting new features and unlock the full potential of your data analytics workflows.

To know more about ClickHouse releases, do visit the following articles:

References:

https://clickhouse.com/blog/clickhouse-release-23-03

ChistaDATA Inc.

Enterprise-class 24*7 ClickHouse Consultative Support and Managed Services

ClickHouse March 2023 Release – Version 23.3