1. Home
  2. Knowledge Base
  3. ChistaDATA
  4. Enhancing Performance and Scalability using Query Cache in ClickHouse
  1. Home
  2. Knowledge Base
  3. ClickHouse
  4. Enhancing Performance and Scalability using Query Cache in ClickHouse
  1. Home
  2. Knowledge Base
  3. ClickHouse DBA
  4. Enhancing Performance and Scalability using Query Cache in ClickHouse

Enhancing Performance and Scalability using Query Cache in ClickHouse

The query cache in any database significantly improves performance by reducing query execution time, decreasing server load, and optimizing resource utilization. It is a valuable feature for enhancing scalability, particularly in read-intensive environments. However, proper configuration and thoughtful cache invalidation strategies are necessary to maintain data consistency. Let’s explore how the query cache works in ClickHouse step by step.

Note: The query cache feature was introduced as an experimental preview in ClickHouse version 23.1, and it became production-ready starting from version 23.5. If you are using an older version, make sure to enable it with the command SET allow_experimental_query_cache = true.

 

Example: The following query took 8.934 seconds to execute, and the empty state of the system.query_cache table indicates that no results have been cached yet.

 

clickhouse-dbnode :) SELECT
    day_of_week_added,
    count() AS num,
    avg(days_present) AS avg_days_present
FROM
(
    SELECT
        added_code.line,
        added_code.time AS added_day,
        dateDiff('day', added_code.time, removed_code.time) AS days_present
    FROM
    (
        SELECT
            path,
            line,
            max(time) AS time
        FROM git.line_changes
        WHERE (sign = 1) AND (line_type NOT IN ('Punct', 'Empty'))
        GROUP BY
            path,
            line
    ) AS added_code
    INNER JOIN
    (
        SELECT
            path,
            line,
            max(time) AS time
        FROM git.line_changes
        WHERE (sign = -1) AND (line_type NOT IN ('Punct', 'Empty'))
        GROUP BY
            path,
            line
    ) AS removed_code USING (path, line)
    WHERE removed_code.time > added_code.time
)
GROUP BY dayOfWeek(added_day) AS day_of_week_added

SELECT
    day_of_week_added,
    count() AS num,
    avg(days_present) AS avg_days_present
FROM
(
    SELECT
        added_code.line,
        added_code.time AS added_day,
        dateDiff('day', added_code.time, removed_code.time) AS days_present
    FROM
    (
        SELECT
            path,
            line,
            max(time) AS time
        FROM git.line_changes
        WHERE (sign = 1) AND (line_type NOT IN ('Punct', 'Empty'))
        GROUP BY
            path,
            line
    ) AS added_code
    INNER JOIN
    (
        SELECT
            path,
            line,
            max(time) AS time
        FROM git.line_changes
        WHERE (sign = -1) AND (line_type NOT IN ('Punct', 'Empty'))
        GROUP BY
            path,
            line
    ) AS removed_code USING (path, line)
    WHERE removed_code.time > added_code.time
)
GROUP BY dayOfWeek(added_day) AS day_of_week_added

Query id: cabb79da-766c-4167-a693-079b47ac5cdd

┌─day_of_week_added─┬────num─┬───avg_days_present─┐
│                 1 │ 173241 │ 193.07947887624755 │
│                 2 │ 142198 │   152.979387895751 │
│                 3 │ 163436 │ 136.45409212168678 │
│                 4 │ 257044 │ 121.19575247817494 │
│                 5 │ 204567 │ 141.79689294949821 │
│                 6 │  62476 │ 202.50060823356168 │
│                 7 │  71084 │  219.8482499577964 │
└───────────────────┴────────┴────────────────────┘

7 rows in set. Elapsed: 8.934 sec. Processed 15.48 million rows, 147.11 MB (1.73 million rows/s., 16.47 MB/s.)

clickhouse-dbnode :) SELECT *
FROM system.query_cache

SELECT *
FROM system.query_cache

Query id: 25987b26-d62e-4aff-a0bb-717f601a2b95

Ok.

0 rows in set. Elapsed: 0.002 sec.

 

Now, let’s enhance the query by enabling the query cache with the SETTINGS use_query_cache = 1 directive. This will allow ClickHouse to utilize the query cache for optimizing subsequent executions of the same query.

Example: With the same query, we have appended the true cache settings by adding SETTINGS use_query_cache = 1 in the query.

clickhouse-dbnode :) SELECT
    day_of_week_added,
    count() AS num,
    avg(days_present) AS avg_days_present
FROM
(
    SELECT
        added_code.line,
        added_code.time AS added_day,
        dateDiff('day', added_code.time, removed_code.time) AS days_present
    FROM
    (
        SELECT
            path,
            line,
            max(time) AS time
        FROM git.line_changes
        WHERE (sign = 1) AND (line_type NOT IN ('Punct', 'Empty'))
        GROUP BY
            path,
            line
    ) AS added_code
    INNER JOIN
    (
        SELECT
            path,
            line,
            max(time) AS time
        FROM git.line_changes
        WHERE (sign = -1) AND (line_type NOT IN ('Punct', 'Empty'))
        GROUP BY
            path,
            line
    ) AS removed_code USING (path, line)
    WHERE removed_code.time > added_code.time
)
GROUP BY dayOfWeek(added_day) AS day_of_week_added SETTINGS use_query_cache = 1

SELECT
    day_of_week_added,
    count() AS num,
    avg(days_present) AS avg_days_present
FROM
(
    SELECT
        added_code.line,
        added_code.time AS added_day,
        dateDiff('day', added_code.time, removed_code.time) AS days_present
    FROM
    (
        SELECT
            path,
            line,
            max(time) AS time
        FROM git.line_changes
        WHERE (sign = 1) AND (line_type NOT IN ('Punct', 'Empty'))
        GROUP BY
            path,
            line
    ) AS added_code
    INNER JOIN
    (
        SELECT
            path,
            line,
            max(time) AS time
        FROM git.line_changes
        WHERE (sign = -1) AND (line_type NOT IN ('Punct', 'Empty'))
        GROUP BY
            path,
            line
    ) AS removed_code USING (path, line)
    WHERE removed_code.time > added_code.time
)
GROUP BY dayOfWeek(added_day) AS day_of_week_added
SETTINGS use_query_cache = 1

Query id: 48b3e250-5800-49c1-ac01-e22b24761ad0

┌─day_of_week_added─┬────num─┬───avg_days_present─┐
│                 1 │ 173241 │ 193.07947887624755 │
│                 2 │ 142198 │   152.979387895751 │
│                 3 │ 163436 │ 136.45409212168678 │
│                 4 │ 257044 │ 121.19575247817494 │
│                 5 │ 204567 │ 141.79689294949821 │
│                 6 │  62476 │ 202.50060823356168 │
│                 7 │  71084 │  219.8482499577964 │
└───────────────────┴────────┴────────────────────┘

7 rows in set. Elapsed: 8.738 sec. Processed 15.48 million rows, 147.11 MB (1.77 million rows/s., 16.84 MB/s.)

clickhouse-dbnode :)

After executing the modified query with the SETTINGS use_query_cache = 1 directive, we can observe that the system.query_cache table displays the queries that are stored in the cache. This indicates that ClickHouse is successfully utilizing the query cache to store and retrieve query results.

clickhouse-dbnode :) SELECT *
FROM system.query_cache\G

SELECT *
FROM system.query_cache

Query id: 55f7c6f5-f9f0-4eff-872e-c955b7380e4a

Row 1:
──────
query:       SELECT day_of_week_added, count() AS num, avg(days_present) AS avg_days_present FROM (SELECT added_code.line, added_code.time AS added_day, dateDiff('day', added_code.time, removed_code.time) AS days_present FROM (SELECT path, line, max(time) AS time FROM git.line_changes WHERE (sign = 1) AND (line_type NOT IN ('Punct', 'Empty')) GROUP BY path, line) AS added_code INNER JOIN (SELECT path, line, max(time) AS time FROM git.line_changes WHERE (sign = -1) AND (line_type NOT IN ('Punct', 'Empty')) GROUP BY path, line) AS removed_code USING (path, line) WHERE removed_code.time > added_code.time) GROUP BY dayOfWeek(added_day) AS day_of_week_added SETTINGS
result_size: 768
stale:       0
shared:      0
compressed:  1
expires_at:  2023-06-30 05:59:05
key_hash:    1259426933936142896

1 row in set. Elapsed: 0.013 sec.

 

By default, the time-to-live (TTL) for cached queries in ClickHouse is set to 60 seconds. This timeout can be adjusted using the query_cache_ttl variable either at the query level or at the session level, providing flexibility in managing the cache duration.

Additionally, ClickHouse sets a default maximum cache entry size of 1 MiB. However, you can control this size by utilizing the max_entry_size variable available at the server-level setting in ClickHouse’s server configuration file. This allows you to customize the cache capacity based on your specific requirements.

Now, if you execute the same query again, you will observe a remarkable improvement in performance. The query is expected to finish with an Elapsed time of just 0.002 seconds, demonstrating the effectiveness of the query cache in significantly reducing query execution time.

clickhouse-dbnode :) SELECT
    day_of_week_added,
    count() AS num,
    avg(days_present) AS avg_days_present
FROM
(
    SELECT
        added_code.line,
        added_code.time AS added_day,
        dateDiff('day', added_code.time, removed_code.time) AS days_present
    FROM
    (
        SELECT
            path,
            line,
            max(time) AS time
        FROM git.line_changes
        WHERE (sign = 1) AND (line_type NOT IN ('Punct', 'Empty'))
        GROUP BY
            path,
            line
    ) AS added_code
    INNER JOIN
    (
        SELECT
            path,
            line,
            max(time) AS time
        FROM git.line_changes
        WHERE (sign = -1) AND (line_type NOT IN ('Punct', 'Empty'))
        GROUP BY
            path,
            line
    ) AS removed_code USING (path, line)
    WHERE removed_code.time > added_code.time
)
GROUP BY dayOfWeek(added_day) AS day_of_week_added SETTINGS use_query_cache = 1

SELECT
    day_of_week_added,
    count() AS num,
    avg(days_present) AS avg_days_present
FROM
(
    SELECT
        added_code.line,
        added_code.time AS added_day,
        dateDiff('day', added_code.time, removed_code.time) AS days_present
    FROM
    (
        SELECT
            path,
            line,
            max(time) AS time
        FROM git.line_changes
        WHERE (sign = 1) AND (line_type NOT IN ('Punct', 'Empty'))
        GROUP BY
            path,
            line
    ) AS added_code
    INNER JOIN
    (
        SELECT
            path,
            line,
            max(time) AS time
        FROM git.line_changes
        WHERE (sign = -1) AND (line_type NOT IN ('Punct', 'Empty'))
        GROUP BY
            path,
            line
    ) AS removed_code USING (path, line)
    WHERE removed_code.time > added_code.time
)
GROUP BY dayOfWeek(added_day) AS day_of_week_added
SETTINGS use_query_cache = 1

Query id: d4bae98b-a2ad-4a6e-b8fa-2205e1072a3c

┌─day_of_week_added─┬────num─┬───avg_days_present─┐
│                 1 │ 173241 │ 193.07947887624755 │
│                 2 │ 142198 │   152.979387895751 │
│                 3 │ 163436 │ 136.45409212168678 │
│                 4 │ 257044 │ 121.19575247817494 │
│                 5 │ 204567 │ 141.79689294949821 │
│                 6 │  62476 │ 202.50060823356168 │
│                 7 │  71084 │  219.8482499577964 │
└───────────────────┴────────┴────────────────────┘

7 rows in set. Elapsed: 0.002 sec.

To clear the cache in ClickHouse, you can utilize the following query:

clickhouse-dbnode :) SYSTEM DROP QUERY CACHE

SYSTEM DROP QUERY CACHE

Query id: 4fdedf4d-d5fc-4b17-94bd-1d7b0dc4cd4e

Ok.

0 rows in set. Elapsed: 0.001 sec.

Executing this query will remove all cached entries from the query cache, allowing you to start with a fresh state.

Was this article helpful?

Related Articles

CHISTADATA IS COMMITTED TO OPEN SOURCE SOFTWARE AND BUILDING HIGH PERFORMANCE COLUMNSTORES

In the spirit of freedom, independence and innovation. ChistaDATA Corporation is not affiliated with ClickHouse Corporation 

Need Support?

Can't find the answer you're looking for?
Contact Support

ChistaDATA Inc. Knowledge base is licensed under the Apache License, Version 2.0 (the “License”)

Copyright 2022 ChistaDATA Inc

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.