ClickHouse Query Profiler: Introduction to ChistaDATA Anansi

Introduction

ChistaDATA Anansi is a log analyzing tool built for ClickHouse. It is a CLI tool written in Go. The reports generated by the profiler shed light on the different aspects of queries like execution time, memory usage, bytes read, and other essential but crucial information.

ChistaDATA Anansi Architecture Overview

The profiler has two stages. Query parsing stage and report generation stage. The reading and information extraction from files is done in the query parsing stage. Report generation using the extracted information is done in the next.

Query parsing stage

Every file is read line by line. Information in each line is extracted using regular expressions.

Which builds a list of query information. Information that can be extracted as follows: 

  • Timestamp
  • Query Id
  • Client Host
  • Client Port
  • User
  • Query
  • Completed
  • Peak Memory Usage
  • No Rows Read
  • No Bytes Read
  • Duration
  • Initial Query Id
  • Databases
  • Tables
  • ThreadIds
  • Error Stacktrace 
  • Complete Error Text
  • Error Code
  • Error Message

Report generation stage

In this stage, the list of queries is processed to get information. All the similar queries are grouped and create, insert statements are discarded. After the grouping and filtering, the data is more refined, and we use this to generate reports.

Features

The profiles come with the following features

  • Multiple log file support
  • Report customization using CLI arguments
  • Multiple report formats are available (text, markdown)
  • Can be compiled to binaries (multiple OS and Architecture compatible)
  • Support for Huge files

Upcoming Features:

  • Compressed file support
  • S3 Support

Installation

You can get an already built binary from here.

If you want to build it yourself, Follow these steps

Prerequisites:

  • GO 1.19 (Might work on other versions not tested)
  • Git
  • make

Build Steps:

  • Clone the repo
    git clone https://github.com/ChistaDATA/ChistaDATA-Anansi-Profiler.git
  • Make ChistaDATA-Anansi-Profiler your working directory.
    cd ChistaDATA-Anansi-Profiler
  • Build
    make build

Your executable anansi-profiler will be generated in
the working directory.

Supported log files

The profiler expects the main log of clickhouse-server as input. The file is usually located at /var/log/clickhouse-server/clickhouse-server.log . If you can’t find the log file there, please check your
server configuration to determine its location. If you haven’t enabled logging, please refer to this
blog for instructions on how to set it up. In order for the query profiler to work, the query log must be enabled. You can find more on the query log here.

Usage

anansi-profiler <file-paths> ...




Arguments:

<file-paths> ...    Paths of log files




Flags:

 -h, --help                          Show context-sensitive help.

 -n, --top-query-count=10            Count of queries for top x table

 -r, --report-type="text"            Report type to be generated, types: md, text

 -c, --minimum-query-call-count=1    Minimum no of query calls needed

     --database-name="clickhouse"    database type

     --database-version="0"          database version

     --sort-field="ExecTime"         Sort queries by the given field, possible values: ExecTime, RowsRead, BytesRead, PeakMemory, QPS, QueryCount

     --sort-field-operation="max"    Sort queries by the given operation on field, possible values: sum, min, max, avg, per95, stdDev, median

     --sort-order="desc"             Sort order, possible values: asc, desc

     --log-level="error"             log level, possible values: panic, fatal, error, warn, info, debug, trace

Examples

Let’s use the profiler to generate a sample report. I am using a sample log file with 1GB data
./anansi-profiler -n 2 sample.log
Now the profiler will generate a report with three parts

  • Overall statistics
  • Top n queries ranked
  • Details of the ranked queries

Output:

# Current date: 2022-12-22 17:38:03.212385 +0530 IST m=+5.157956709

# Hostname: MacBook-Pro.local

# Files:

  * sample.log

# Overall: 148398, Unique: 17 , QPS: 9.07

# Time range: 2022-10-03 18:42:08.324308 +0000 UTC to 2022-10-03 23:14:49.970276 +0000 UTC

# Attribute          total     min     max     avg     95%  stddev  median

# ============     ======= ======= ======= ======= ======= ======= =======

# Exec time         8.35ks   0.00s   0.93s   0.06s   0.72s   0.18s   0.00s

# Rows read         32.20G    0.00 520.69k 216.96k 513.73k 161.34k 254.06k

# Bytes read        0.86TB   0.00B 25.41MB  6.09MB 24.73MB  7.65MB  2.22MB

# Peak Memory                0.00B  1.67GB  0.12GB  1.54GB  0.39GB  4.00MB




# Profile

# Rank Response time   Calls R/Call Query

# ==== =============== ===== ====== =====

#    1   0.02s   0.06% 10314  0.00s SELECT COUNT(*) FROM salary

#    2   0.04s   0.21% 10366  0.00s SELECT AVG(amount) FROM salary





# Query 1 : 1.236 QPS

# Time range: From 2022-10-03 18:42:08.801737 +0000 UTC to 2022-10-03 23:14:47.425628 +0000 UTC

# ====================================================================

# Attribute      total     min     max     avg     95%  stddev  median

# ============ ======= ======= ======= ======= ======= ======= =======

# Count         10.31k

# Exec time      4.81s   0.00s   0.02s   0.00s   0.00s   0.00s   0.00s

# Rows read     73.99k    1.00    8.00    7.17    8.00    0.48    7.00

# Bytes read   40.81MB  4.01KB  4.06KB  4.05KB  4.06KB   4.44B  4.05KB

# Peak Memory            0.00B   0.00B   0.00B   0.00B   0.00B   0.00B

# ====================================================================

# Databases    default (10314/10314) 

# Hosts        [::1] (10314/10314) 

# Users        default (10314/10314) 

# Completion   10314/10314

# Query_time distribution

# ====================================================================

#   1us 

#  10us 

# 100us  ###########################################################

#   1ms 

#  10ms 

# 100ms 

#    1s 

#  10s+ 

# ====================================================================

# Query

SELECT COUNT(*) FROM salary




# Query 2 : 1.242 QPS

# Time range: From 2022-10-03 18:42:09.511146 +0000 UTC to 2022-10-03 23:14:49.293234 +0000 UTC

# ====================================================================

# Attribute      total     min     max     avg     95%  stddev  median

# ============ ======= ======= ======= ======= ======= ======= =======

# Count         10.37k

# Exec time     17.62s   0.00s   0.04s   0.00s   0.00s   0.00s   0.00s

# Rows read      2.64G   1.00k 260.35k 254.91k 259.86k  10.36k 255.41k

# Bytes read   19.69GB  7.81KB  1.99MB  1.94MB  1.98MB 81.00KB  1.95MB

# Peak Memory          29.85KB  1.54MB  1.34MB  1.35MB 53.38KB  1.34MB

# ====================================================================

# Databases    default (10366/10366) 

# Hosts        [::1] (10366/10366) 

# Users        default (10366/10366) 

# Completion   10366/10366

# Query_time distribution

# ====================================================================

#   1us 

#  10us 

# 100us 

#   1ms  ###########################################################

#  10ms 

# 100ms 

#    1s 

#  10s+ 

# ====================================================================

# Query

SELECT AVG(amount) FROM salary

For the sake of keeping it precise, the following outputs will discard the overall statistics and detailed information of queries.

The no of queries listed can be controlled using -n argument.

If you need the top 5 queries, then run following:
./anansi-profiler -n 5 sample.log
Output:

# Profile

# Rank Response time   Calls R/Call Query

# ==== =============== ===== ====== =====

#    1   0.93s  90.37% 10419  0.72s select distinct(city) from salary s1 LEFT JOIN salary s2 ON s1.event_id = s2.eve

#    2   0.13s   0.31% 10340  0.00s select min(amount) from salary where rand > 10000

#    3   0.11s   2.49% 10357  0.02s select name, min(amount), max(amount), count(*) from salary GROUP BY name

#    4   0.10s   0.33% 10596  0.00s SELECT COUNT(*) FROM salary where timestamp < now64() - 60

#    5   0.08s   2.96% 10398  0.02s select AVG(amount) FROM salary s1 INNER JOIN salary2 s2 on s1.name = s2.city

The previous one had two queries, and this one has five queries.

The sorting of queries is based on the following parameters:

  • sort-field
  • sort-field-operation
  • sort-order

By default sort will be based on maximum execution time in descending order.

We can configure these values to generate the report we need.

Queries with minimum response time:
./anansi-profiler -n 5 --sort-order=asc sample.log
Output:

# Profile

# Rank Response time   Calls R/Call Query

# ==== =============== ===== ====== =====

#    1   0.02s   0.06% 10314  0.00s SELECT COUNT(*) FROM salary

#    2   0.04s   0.21% 10366  0.00s SELECT AVG(amount) FROM salary

#    3   0.04s   0.28% 10397  0.00s select AVG(amount) FROM salary WHERE good = 1

#    4   0.04s   0.30% 10400  0.00s select * from salary where amount > rand order by timestamp limit 10

#    5   0.06s   1.30% 10470  0.01s select * from salary where amount > rand order by event_id DESC limit 10

Queries with maximum cumulative time:
./anansi-profiler -n 5 --sort-field-operation=sum sample.log
Output:

# Profile

# Rank Response time   Calls R/Call Query

# ==== =============== ===== ====== =====

#    1   0.93s  90.37% 10419  0.72s select distinct(city) from salary s1 LEFT JOIN salary s2 ON s1.event_id = s2.eve

#    2   0.08s   2.96% 10398  0.02s select AVG(amount) FROM salary s1 INNER JOIN salary2 s2 on s1.name = s2.city

#    3   0.11s   2.49% 10357  0.02s select name, min(amount), max(amount), count(*) from salary GROUP BY name

#    4   0.06s   1.30% 10470  0.01s select * from salary where amount > rand order by event_id DESC limit 10

#    5   0.08s   0.68% 10510  0.01s select distinct(name) from salary where ilike(name, '%cross%') or ilike(name, '%

Queries ordered by the number of calls:
./anansi-profiler -n 5 --sort-field=QueryCount sample.log

Output:

# Profile

# Rank Response time   Calls R/Call Query

# ==== =============== ===== ====== =====

#    1   0.10s   0.33% 10596  0.00s SELECT COUNT(*) FROM salary where timestamp < now64() - 60

#    2   0.08s   0.68% 10510  0.01s select distinct(name) from salary where ilike(name, '%cross%') or ilike(name, '%

#    3   0.06s   1.30% 10470  0.01s select * from salary where amount > rand order by event_id DESC limit 10

#    4   0.93s  90.37% 10419  0.72s select distinct(city) from salary s1 LEFT JOIN salary s2 ON s1.event_id = s2.eve

#    5   0.04s   0.30% 10400  0.00s select * from salary where amount > rand order by timestamp limit 10

To get information on all possible configs, refer to usage.

Performance

A minimal performance analysis was done using files with different sizes.

Size (MB)Time (s)
10.02
100.08
500.37
1000.79
2561.91
5123.93
10247.87
512040.5

The test was done in a system with specs:

    Chip: Apple M1 Pro

    Memory: 16GB

Conclusion

The profiler can be used to extract crucial information from the ClickHouse log file. It can be used to process huge log files with millions of lines. It is blazing fast. Overall a simple yet powerful profiler.

To know more about Clickhouse Query Profiler, please do consider reading the following articles:

Contribution

At ChistaDATA, we value your input and always look for ways to improve our products and services. If you have any feedback or ideas for new features, we encourage you to create a GitHub issue with an RFC (Request for Comment). This will allow us to review and consider your request as we continue to develop and enhance our offerings. Thank you for your support!