Introduction
At ChistaDATA, we work with some of the largest ClickHouse installations worldwide, where performance is critical, and we are committed to delivering optimal ClickHouse operations for our clients globally. We publish ClickHouse application performance results together with the details of the hardware and software infrastructure used for benchmarking, with the data on GitHub so the results can be reproduced if ever needed. The performance metrics published cover both average response time / latency and throughput. We strongly believe it is the responsibility of ClickHouse infrastructure stakeholders, DBAs, Data SREs, and Performance Engineers to understand the operational thresholds of their ClickHouse deployments, which ultimately depend on RAM, CPU, disk performance, and network latency. The key to ClickHouse performance benchmarking is delivering consistently reproducible results: reproducibility allows you to rerun the tests and gain confidence in the overall benchmarking exercise.
Hardware Infrastructure and Software Platforms Information
CPU
Vendor and model details
# View vendor name
root@Performance-LAB:~# cat /proc/cpuinfo | grep 'vendor' | uniq
vendor_id : GenuineIntel

# Display model name
root@Performance-LAB:~# cat /proc/cpuinfo | grep 'model name' | uniq
model name : Intel Xeon Processor (Icelake)
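A consolidated view of the same details (vendor, model, sockets, cores, threads) is also available from lscpu, if you prefer a single command:

# Consolidated CPU summary via lscpu
root@Performance-LAB:~# lscpu | grep -E 'Vendor ID|Model name|Socket|Core|Thread'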
CPU count
# Count the number of processing units
root@Performance-LAB:~# cat /proc/cpuinfo | grep processor | wc -l
8

# List the per-CPU core IDs
root@Performance-LAB:~# cat /proc/cpuinfo | grep 'core id'
core id : 0
core id : 1
core id : 2
core id : 3
core id : 4
core id : 5
core id : 6
core id : 7
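The same processing-unit count can be taken with nproc, which on this lab machine should also report 8:

# Equivalent one-liner for the processing-unit count
root@Performance-LAB:~# nproc
8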
Memory Available / RAM Info.
## RAM info.
root@Performance-LAB:~# free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       286Mi        14Gi       0.0Ki       1.3Gi        14Gi
Swap:             0B          0B          0B
Disk Infrastructure Operations – Capacity and Throughput
### Disk Information
root@Performance-LAB:~# fdisk -l
Disk /dev/loop0: 61.83 MiB, 64835584 bytes, 126632 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/loop1: 72.56 MiB, 76083200 bytes, 148600 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/loop2: 32.44 MiB, 34017280 bytes, 66440 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/vda: 200 GiB, 214748364800 bytes, 419430400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: ED13961A-1618-437B-A9EB-A140AAA7F944

Device         Start       End   Sectors   Size Type
/dev/vda1     227328 419430366 419203039 199.9G Linux filesystem
/dev/vda14      2048     10239      8192     4M BIOS boot
/dev/vda15     10240    227327    217088   106M Microsoft basic data
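fdisk -l covers the capacity side of this heading only; for a rough feel of throughput, a sequential write can be timed with dd using direct I/O (bypassing the page cache) and a buffered read with hdparm. This is a minimal sketch, assuming /var/lib/clickhouse sits on /dev/vda as above; a dedicated tool such as fio gives far more rigorous numbers:

# Rough sequential-write throughput test (1 GiB, direct I/O)
root@Performance-LAB:~# dd if=/dev/zero of=/var/lib/clickhouse/dd_testfile bs=1M count=1024 oflag=direct

# Rough buffered-read throughput for the data disk
root@Performance-LAB:~# hdparm -t /dev/vda

# Remove the scratch file afterwards
root@Performance-LAB:~# rm /var/lib/clickhouse/dd_testfile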
Software Infrastructure
## Linux distribution / Ubuntu release
root@PERFORMANCE-LAB:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 21.10
Release:        21.10
Codename:       impish

## ClickHouse version
root@PERFORMANCE-LAB:~# clickhouse-client --password
ClickHouse client version 21.11.6.7 (official build).
Password for user (default):
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 21.11.6 revision 54450.

PERFORMANCE-LAB :)
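Once connected, the server version and uptime can also be captured from SQL, which is convenient when recording benchmark metadata:

SELECT version(), uptime();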
Building ClickHouse Infrastructure for Performance Benchmarking
Source: Brown University Benchmark (MgBench), a new analytical benchmark for machine-generated log data, contributed by Andrew Crotty.
Step 1: Download the data for benchmarking ClickHouse
wget https://datasets.clickhouse.com/mgbench{1..3}.csv.xz
Step 2: Unpack the data downloaded
xz -v -d mgbench{1..3}.csv.xz
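Before loading, it is worth a quick sanity check that all three files unpacked correctly; sizes and line counts will depend on the MgBench release you downloaded:

# Confirm the decompressed CSVs exist and inspect their line counts
ls -lh mgbench{1..3}.csv
wc -l mgbench{1..3}.csv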
Step 3: Create Schema Objects / Tables
CREATE DATABASE mgbench;

CREATE TABLE mgbench.logs1 (
    log_time      DateTime,
    machine_name  LowCardinality(String),
    machine_group LowCardinality(String),
    cpu_idle      Nullable(Float32),
    cpu_nice      Nullable(Float32),
    cpu_system    Nullable(Float32),
    cpu_user      Nullable(Float32),
    cpu_wio       Nullable(Float32),
    disk_free     Nullable(Float32),
    disk_total    Nullable(Float32),
    part_max_used Nullable(Float32),
    load_fifteen  Nullable(Float32),
    load_five     Nullable(Float32),
    load_one      Nullable(Float32),
    mem_buffers   Nullable(Float32),
    mem_cached    Nullable(Float32),
    mem_free      Nullable(Float32),
    mem_shared    Nullable(Float32),
    swap_free     Nullable(Float32),
    bytes_in      Nullable(Float32),
    bytes_out     Nullable(Float32)
)
ENGINE = MergeTree()
ORDER BY (machine_group, machine_name, log_time);

CREATE TABLE mgbench.logs2 (
    log_time    DateTime,
    client_ip   IPv4,
    request     String,
    status_code UInt16,
    object_size UInt64
)
ENGINE = MergeTree()
ORDER BY log_time;

CREATE TABLE mgbench.logs3 (
    log_time     DateTime64,
    device_id    FixedString(15),
    device_name  LowCardinality(String),
    device_type  LowCardinality(String),
    device_floor UInt8,
    event_type   LowCardinality(String),
    event_unit   FixedString(1),
    event_value  Nullable(Float32)
)
ENGINE = MergeTree()
ORDER BY (event_type, log_time);
Step 4: Data loading
clickhouse-client --query "INSERT INTO mgbench.logs1 FORMAT CSVWithNames" < mgbench1.csv
clickhouse-client --query "INSERT INTO mgbench.logs2 FORMAT CSVWithNames" < mgbench2.csv
clickhouse-client --query "INSERT INTO mgbench.logs3 FORMAT CSVWithNames" < mgbench3.csv
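After the inserts complete, confirm that every table received data. A minimal verification query against system.parts (row counts should equal the line counts of the CSVs minus one header line each):

SELECT table,
       sum(rows) AS row_count,
       formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE database = 'mgbench' AND active
GROUP BY table
ORDER BY table;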
Benchmarking ClickHouse Performance
SQL 1: What is the CPU/network utilization for each web server since midnight?
SELECT machine_name,
MIN(cpu) AS cpu_min,
MAX(cpu) AS cpu_max,
AVG(cpu) AS cpu_avg,
MIN(net_in) AS net_in_min,
MAX(net_in) AS net_in_max,
AVG(net_in) AS net_in_avg,
MIN(net_out) AS net_out_min,
MAX(net_out) AS net_out_max,
AVG(net_out) AS net_out_avg
FROM (
SELECT machine_name,
COALESCE(cpu_user, 0.0) AS cpu,
COALESCE(bytes_in, 0.0) AS net_in,
COALESCE(bytes_out, 0.0) AS net_out
FROM logs1
WHERE machine_name IN ('anansi','aragog','urd')
AND log_time >= TIMESTAMP '2017-01-11 00:00:00'
) AS r
GROUP BY machine_name\G
Query id: 6772d840-bc4f-4d90-8d78-d47a8ecf332e
Row 1:
──────
machine_name: urd
cpu_min: 0
cpu_max: 23.493331909179688
cpu_avg: 1.499065357465967
net_in_min: 1168.3499755859375
net_in_max: 368025.625
net_in_avg: 22100.285078089873
net_out_min: 220.95001220703125
net_out_max: 108069.65625
net_out_avg: 8129.441456009021
Row 2:
──────
machine_name: anansi
cpu_min: 0
cpu_max: 22.5
cpu_avg: 0.1426860871961902
net_in_min: 938.949951171875
net_in_max: 2355335.75
net_in_avg: 9177.160347696004
net_out_min: 216.8800048828125
net_out_max: 2354536.25
net_out_avg: 13467.316385337206
Row 3:
──────
machine_name: aragog
cpu_min: 0
cpu_max: 31.14000129699707
cpu_avg: 0.46608482347828106
net_in_min: 1318.68994140625
net_in_max: 27836488
net_in_avg: 328922.02496959124
net_out_min: 308.9499816894531
net_out_max: 40642404
net_out_avg: 825490.5620742807
3 rows in set. Elapsed: 0.009 sec. Processed 217.89 thousand rows, 607.23 KB (25.39 million rows/s., 70.75 MB/s.)
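Timings from a single execution like the one above are sensitive to cache state. For reproducible latency and throughput figures, each query can be replayed with clickhouse-benchmark, which reports QPS and latency percentiles over many iterations; a sketch using an illustrative query:

# Replay a query 20 times and report throughput plus latency percentiles
echo "SELECT machine_name, AVG(cpu_user) FROM mgbench.logs1 GROUP BY machine_name" | clickhouse-benchmark --iterations 20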
SQL 2: Which computer lab machines have been offline in the past day?
SELECT machine_name,
log_time
FROM logs1
WHERE (machine_name LIKE 'cslab%' OR
machine_name LIKE 'mslab%')
AND load_one IS NULL
AND log_time >= TIMESTAMP '2017-01-10 00:00:00'
ORDER BY machine_name,
log_time\G
Query id: fb5f4823-8221-404c-aec5-4e4ef59a95db
Row 1:
──────
machine_name: cslab9e
log_time: 2017-01-10 14:14:15
Row 2:
──────
machine_name: cslab9e
log_time: 2017-01-10 14:14:30
Row 3:
──────
machine_name: cslab9e
log_time: 2017-01-10 14:14:45
Row 4:
──────
machine_name: cslab9e
log_time: 2017-01-10 14:15:15
Row 5:
──────
machine_name: cslab9e
log_time: 2017-01-10 14:15:30
Row 6:
──────
machine_name: cslab9e
log_time: 2017-01-10 14:15:45
Row 7:
──────
machine_name: cslab9e
log_time: 2017-01-10 14:16:15
Row 8:
──────
machine_name: cslab9e
log_time: 2017-01-10 14:16:30
Row 9:
───────
machine_name: cslab9e
log_time: 2017-01-10 14:16:45
Row 10:
───────
machine_name: cslab9e
log_time: 2017-01-10 14:17:15
Row 11:
───────
machine_name: cslab9e
log_time: 2017-01-10 14:17:30
Row 12:
───────
machine_name: cslab9e
log_time: 2017-01-10 14:17:45
Row 13:
───────
machine_name: cslab9e
log_time: 2017-01-10 14:18:15
Row 14:
───────
machine_name: cslab9e
log_time: 2017-01-10 14:18:30
Row 15:
───────
machine_name: cslab9e
log_time: 2017-01-10 14:18:45
Row 16:
───────
machine_name: cslab9e
log_time: 2017-01-10 14:19:15
Row 17:
───────
machine_name: cslab9e
log_time: 2017-01-10 14:19:30
Row 18:
───────
machine_name: cslab9e
log_time: 2017-01-10 14:19:45
18 rows in set. Elapsed: 0.018 sec. Processed 1.63 million rows, 14.84 MB (91.67 million rows/s., 832.47 MB/s.)
SQL 3: What are the hourly average metrics during the past 10 days for a specific workstation?
SELECT dt,
hr,
AVG(load_fifteen) AS load_fifteen_avg,
AVG(load_five) AS load_five_avg,
AVG(load_one) AS load_one_avg,
AVG(mem_free) AS mem_free_avg,
AVG(swap_free) AS swap_free_avg
FROM (
SELECT CAST(log_time AS DATE) AS dt,
EXTRACT(HOUR FROM log_time) AS hr,
load_fifteen,
load_five,
load_one,
mem_free,
swap_free
FROM logs1
WHERE machine_name = 'babbage'
AND load_fifteen IS NOT NULL
AND load_five IS NOT NULL
AND load_one IS NOT NULL
AND mem_free IS NOT NULL
AND swap_free IS NOT NULL
AND log_time >= TIMESTAMP '2017-01-01 00:00:00'
) AS r
GROUP BY dt,
hr
ORDER BY dt,
hr\G
Row 1:
──────
dt: 2017-01-01
hr: 0
load_fifteen_avg: 0.12023027762770652
load_five_avg: 0.09362472295761108
load_one_avg: 0.08744388920943796
mem_free_avg: 2170359.9625
swap_free_avg: 8388604
Row 2:
──────
dt: 2017-01-01
hr: 1
load_fifteen_avg: 0.06573361121118068
load_five_avg: 0.06255777780897916
load_one_avg: 0.05611999999382533
mem_free_avg: 2170050.05
swap_free_avg: 8388604
Row 3:
──────
dt: 2017-01-01
hr: 2
load_fifteen_avg: 0.06406694483011961
load_five_avg: 0.08363861131171385
load_one_avg: 0.07870277822948993
mem_free_avg: 2170090.8666666667
swap_free_avg: 8388604
Row 4:
──────
dt: 2017-01-01
hr: 3
load_fifteen_avg: 0.0986541671678424
load_five_avg: 0.1006138886945943
load_one_avg: 0.10358000136911868
mem_free_avg: 2169553.683333333
swap_free_avg: 8388604
Row 5:
──────
dt: 2017-01-01
hr: 4
load_fifteen_avg: 0.09523666650056839
load_five_avg: 0.09710055496543646
load_one_avg: 0.09572555573152688
mem_free_avg: 2167171.0083333333
swap_free_avg: 8388604
255 rows in set. Elapsed: 0.013 sec. Processed 193.32 thousand rows, 2.34 MB (15.21 million rows/s., 184.41 MB/s.)
SQL 4: Over 1 month, how often was each server blocked on disk I/O?
SELECT machine_name,
COUNT(*) AS spikes
FROM logs1
WHERE machine_group = 'Servers'
AND cpu_wio > 0.99
AND log_time >= TIMESTAMP '2016-12-01 00:00:00'
AND log_time < TIMESTAMP '2017-01-01 00:00:00'
GROUP BY machine_name
ORDER BY spikes DESC
LIMIT 10\G
Query id: 3d9c7259-85f6-44cf-a538-acad00c162df
Row 1:
──────
machine_name: sourpatch
spikes: 8861
Row 2:
──────
machine_name: louie
spikes: 581
Row 3:
──────
machine_name: zotz
spikes: 550
Row 4:
──────
machine_name: pieces
spikes: 495
Row 5:
──────
machine_name: reeces
spikes: 453
Row 6:
──────
machine_name: thing
spikes: 438
Row 7:
──────
machine_name: adminhost
spikes: 366
Row 8:
──────
machine_name: cadbury
spikes: 357
Row 9:
───────
machine_name: york
spikes: 193
Row 10:
───────
machine_name: reflect
spikes: 156
10 rows in set. Elapsed: 0.016 sec. Processed 1.65 million rows, 16.66 MB (102.57 million rows/s., 1.04 GB/s.)
SQL 5: Which externally reachable VMs have run low on memory?
SELECT machine_name,
dt,
MIN(mem_free) AS mem_free_min
FROM (
SELECT machine_name,
CAST(log_time AS DATE) AS dt,
mem_free
FROM logs1
WHERE machine_group = 'DMZ'
AND mem_free IS NOT NULL
) AS r
GROUP BY machine_name,
dt
HAVING MIN(mem_free) < 10000
ORDER BY machine_name,
dt\G
Query id: f4af0029-2707-4c09-b775-160dfcebc957
Row 1:
──────
machine_name: chimera
dt: 2016-11-04
mem_free_min: 6135.48
Row 2:
──────
machine_name: chimera
dt: 2016-11-26
mem_free_min: 6045.8667
Row 3:
──────
machine_name: chimera
dt: 2016-11-27
mem_free_min: 9374.527
Row 4:
──────
machine_name: chimera
dt: 2016-11-30
mem_free_min: 6933.56
Row 5:
──────
machine_name: chimera
dt: 2016-12-01
mem_free_min: 6535.8135
Row 6:
──────
machine_name: chimera
dt: 2016-12-02
mem_free_min: 6393.74
Row 7:
──────
machine_name: chimera
dt: 2016-12-11
mem_free_min: 8010.5
Row 8:
──────
machine_name: chimera
dt: 2016-12-30
mem_free_min: 8167.6665
Row 9:
───────
machine_name: chimera
dt: 2016-12-31
mem_free_min: 6324.6665
Row 10:
───────
machine_name: chimera
dt: 2017-01-01
mem_free_min: 5972
Row 11:
───────
machine_name: chimera
dt: 2017-01-03
mem_free_min: 5948
Row 12:
───────
machine_name: chimera
dt: 2017-01-04
mem_free_min: 7320
Row 13:
───────
machine_name: chimera
dt: 2017-01-05
mem_free_min: 8144
Row 14:
───────
machine_name: chimera
dt: 2017-01-07
mem_free_min: 6308.4663
Row 15:
───────
machine_name: chimera
dt: 2017-01-09
mem_free_min: 9370
Row 16:
───────
machine_name: chimera
dt: 2017-01-10
mem_free_min: 7147.467
16 rows in set. Elapsed: 0.033 sec. Processed 3.54 million rows, 39.12 MB (106.94 million rows/s., 1.18 GB/s.)
SQL 7: What is the total hourly network traffic across all file servers?
SELECT dt,
hr,
SUM(net_in) AS net_in_sum,
SUM(net_out) AS net_out_sum,
SUM(net_in) + SUM(net_out) AS both_sum
FROM (
SELECT CAST(log_time AS DATE) AS dt,
EXTRACT(HOUR FROM log_time) AS hr,
COALESCE(bytes_in, 0.0) / 1000000000.0 AS net_in,
COALESCE(bytes_out, 0.0) / 1000000000.0 AS net_out
FROM logs1
WHERE machine_name IN ('allsorts','andes','bigred','blackjack','bonbon',
'cadbury','chiclets','cotton','crows','dove','fireball','hearts','huey',
'lindt','milkduds','milkyway','mnm','necco','nerds','orbit','peeps',
'poprocks','razzles','runts','smarties','smuggler','spree','stride',
'tootsie','trident','wrigley','york')
) AS r
GROUP BY dt,
hr
ORDER BY both_sum DESC
LIMIT 10\G
Query id: 78f32d40-af55-49e1-8338-768144f54c2b
Row 1:
──────
dt: 2017-01-10
hr: 17
net_in_sum: 87.94858539170787
net_out_sum: 71.38525362068188
both_sum: 159.33383901238975
Row 2:
──────
dt: 2017-01-10
hr: 18
net_in_sum: 89.64930670570355
net_out_sum: 69.15606039811588
both_sum: 158.80536710381944
Row 3:
──────
dt: 2017-01-10
hr: 19
net_in_sum: 73.60724927620798
net_out_sum: 81.40722209611724
both_sum: 155.0144713723252
Row 4:
──────
dt: 2017-01-10
hr: 20
net_in_sum: 68.54794632249272
net_out_sum: 85.50667667204637
both_sum: 154.0546229945391
Row 5:
──────
dt: 2017-01-11
hr: 13
net_in_sum: 42.0006900292216
net_out_sum: 69.61384714089114
both_sum: 111.61453717011274
Row 6:
──────
dt: 2017-01-10
hr: 16
net_in_sum: 52.18435269713231
net_out_sum: 56.94834975170359
both_sum: 109.1327024488359
Row 7:
──────
dt: 2017-01-10
hr: 15
net_in_sum: 58.92187069434693
net_out_sum: 47.70097635240866
both_sum: 106.6228470467556
Row 8:
──────
dt: 2017-01-10
hr: 14
net_in_sum: 60.98027388591966
net_out_sum: 44.4488195209332
both_sum: 105.42909340685286
Row 9:
───────
dt: 2017-01-10
hr: 21
net_in_sum: 45.79773242298867
net_out_sum: 54.048210619660104
both_sum: 99.84594304264877
Row 10:
───────
dt: 2017-01-11
hr: 4
net_in_sum: 48.96217601967555
net_out_sum: 48.37625873783819
both_sum: 97.33843475751374
10 rows in set. Elapsed: 0.023 sec. Processed 1.79 million rows, 20.13 MB (77.66 million rows/s., 874.69 MB/s.)
SQL 8: Which requests have caused server errors within the past 2 weeks?
SELECT *
FROM logs2
WHERE status_code >= 500
  AND log_time >= TIMESTAMP '2012-12-18 00:00:00'
ORDER BY log_time\G

Query id: 41fbe700-34ff-4fd4-ab90-4fcbfeae9715

Row 1:
──────
log_time:    2012-12-21 16:58:00
client_ip:   229.50.247.232
request:     //cgi-bin/feedback/
status_code: 500
object_size: 410

Row 2:
──────
log_time:    2012-12-23 23:28:02
client_ip:   229.50.247.232
request:     //cgi-bin/feedback/
status_code: 500
object_size: 410

2 rows in set. Elapsed: 0.006 sec. Processed 2.00 million rows, 4.72 MB (324.94 million rows/s., 765.89 MB/s.)
SQL 9: During a specific 2-week period, was the user password file leaked?
SELECT *
FROM logs2
WHERE status_code >= 200
  AND status_code < 300
  AND request LIKE '%/etc/passwd%'
  AND log_time >= TIMESTAMP '2012-05-06 00:00:00'
  AND log_time < TIMESTAMP '2012-05-20 00:00:00'\G

Query id: 5782f776-56c9-4701-a1eb-0213f2ee96a9

Row 1:
──────
log_time:    2012-05-09 14:46:58
client_ip:   201.183.185.11
request:     /?-nd+auto_prepend_file%3D/etc/passwd
status_code: 200
object_size: 21173

Row 2:
──────
log_time:    2012-05-13 20:17:05
client_ip:   201.183.185.11
request:     /?-nd+auto_prepend_file%3D/etc/passwd
status_code: 200
object_size: 21809

2 rows in set. Elapsed: 0.040 sec. Processed 1.83 million rows, 120.13 MB (45.12 million rows/s., 2.97 GB/s.)
SQL 10: What was the average path depth for top-level requests in the past month?
SELECT top_level,
AVG(LENGTH(request) - LENGTH(REPLACE(request, '/', ''))) AS depth_avg
FROM (
SELECT SUBSTRING(request FROM 1 FOR len) AS top_level,
request
FROM (
SELECT POSITION(SUBSTRING(request FROM 2), '/') AS len,
request
FROM logs2
WHERE status_code >= 200
AND status_code < 300
AND log_time >= TIMESTAMP '2012-12-01 00:00:00'
) AS r
WHERE len > 0
) AS s
WHERE top_level IN ('/about','/courses','/degrees','/events',
'/grad','/industry','/news','/people',
'/publications','/research','/teaching','/ugrad')
GROUP BY top_level
ORDER BY top_level\G
Query id: dfeb4126-4564-443c-b571-c70130ca1768
Row 1:
──────
top_level: /about
depth_avg: 4.180225643663013
Row 2:
──────
top_level: /courses
depth_avg: 5.138040610251345
Row 3:
──────
top_level: /degrees
depth_avg: 3.385165907612232
Row 4:
──────
top_level: /events
depth_avg: 3.105441792057925
Row 5:
──────
top_level: /grad
depth_avg: 3.6942030710629687
Row 6:
──────
top_level: /industry
depth_avg: 4.094312105007292
Row 7:
──────
top_level: /news
depth_avg: 2.963451710977424
Row 8:
──────
top_level: /people
depth_avg: 4.485873513820637
Row 9:
───────
top_level: /publications
depth_avg: 2.245890410958904
Row 10:
───────
top_level: /research
depth_avg: 5.040612211618354
Row 11:
───────
top_level: /teaching
depth_avg: 2
Row 12:
───────
top_level: /ugrad
depth_avg: 2.4527363184079602
12 rows in set. Elapsed: 0.158 sec. Processed 4.80 million rows, 255.41 MB (30.35 million rows/s., 1.62 GB/s.)
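The inner POSITION/SUBSTRING pair is what isolates the top-level path component: POSITION searches the request with its leading '/' stripped, so the offset it returns equals the length of the first component (including the leading slash) in the original string. A worked example on a hypothetical path literal:

SELECT POSITION(SUBSTRING('/about/people/index.html' FROM 2), '/') AS len,
       SUBSTRING('/about/people/index.html' FROM 1 FOR len) AS top_level;
-- len = 6, top_level = '/about'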
SQL 11: During the last 3 months, which clients have made an excessive number of requests?
SELECT client_ip,
COUNT(*) AS num_requests
FROM logs2
WHERE log_time >= TIMESTAMP '2012-10-01 00:00:00'
GROUP BY client_ip
HAVING COUNT(*) >= 100000
ORDER BY num_requests DESC\G
Query id: 605feb05-9719-46fa-ba69-324b53ec1277
Row 1:
──────
client_ip: 219.63.173.93
num_requests: 1540391
Row 2:
──────
client_ip: 229.50.247.232
num_requests: 743801
Row 3:
──────
client_ip: 97.211.80.244
num_requests: 733261
Row 4:
──────
client_ip: 152.149.228.251
num_requests: 492221
Row 5:
──────
client_ip: 198.156.249.133
num_requests: 370834
Row 6:
──────
client_ip: 70.86.124.37
num_requests: 273057
Row 7:
──────
client_ip: 67.153.111.239
num_requests: 167287
Row 8:
──────
client_ip: 249.92.17.134
num_requests: 112909
8 rows in set. Elapsed: 0.082 sec. Processed 17.98 million rows, 143.83 MB (220.44 million rows/s., 1.76 GB/s.)
SQL 12: What are the daily unique visitors?
SELECT dt,
COUNT(DISTINCT client_ip)
FROM (
SELECT CAST(log_time AS DATE) AS dt,
client_ip
FROM logs2
) AS r
GROUP BY dt
ORDER BY dt\G
Row 361:
────────
dt: 2012-12-26
uniqExact(client_ip): 8799
Row 362:
────────
dt: 2012-12-27
uniqExact(client_ip): 9435
Row 363:
────────
dt: 2012-12-28
uniqExact(client_ip): 9185
Row 364:
────────
dt: 2012-12-29
uniqExact(client_ip): 8247
Row 365:
────────
dt: 2012-12-30
uniqExact(client_ip): 8328
Row 366:
────────
dt: 2012-12-31
uniqExact(client_ip): 7824
366 rows in set. Elapsed: 0.278 sec. Processed 75.75 million rows, 605.98 MB (272.73 million rows/s., 2.18 GB/s.)
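The output column is named uniqExact(client_ip) because ClickHouse rewrites COUNT(DISTINCT ...) to its exact uniq aggregate. Where a small relative error is acceptable, the approximate uniq() function typically answers the same question with noticeably less CPU and memory; a sketch:

SELECT CAST(log_time AS DATE) AS dt,
       uniq(client_ip) AS visitors_approx
FROM logs2
GROUP BY dt
ORDER BY dt;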
SQL 13: What are the average and maximum data transfer rates (Gbps)?
SELECT AVG(transfer) / 125000000.0 AS transfer_avg,
MAX(transfer) / 125000000.0 AS transfer_max
FROM (
SELECT log_time,
SUM(object_size) AS transfer
FROM logs2
GROUP BY log_time
) AS r\G
Query id: 06ad0671-d9ea-4881-a68f-7865b193d219
Row 1:
──────
transfer_avg: 0.0046296999419207785
transfer_max: 295.028835936
1 rows in set. Elapsed: 1.166 sec. Processed 75.75 million rows, 908.98 MB (64.94 million rows/s., 779.28 MB/s.)
SQL 14: Did the indoor temperature reach freezing over the weekend?
SELECT *
FROM logs3
WHERE event_type = 'temperature'
  AND event_value <= 32.0
  AND log_time >= '2019-11-29 17:00:00.000'\G

Row 1:
──────
log_time:     2019-12-01 09:30:13.417
device_id:    16C448031108ABA
device_name:  balcony_5
device_type:  door
device_floor: 5
event_type:   temperature
event_unit:   F
event_value:  32

Row 2:
──────
log_time:     2019-12-01 09:59:58.874
device_id:    16C448031108ABA
device_name:  balcony_5
device_type:  door
device_floor: 5
event_type:   temperature
event_unit:   F
event_value:  32

2 rows in set. Elapsed: 0.005 sec. Processed 7.35 thousand rows, 228.52 KB (1.51 million rows/s., 46.96 MB/s.)
SQL 15: Over the past 6 months, how frequently were each door opened?
SELECT device_name,
device_floor,
COUNT(*) AS ct
FROM logs3
WHERE event_type = 'door_open'
AND log_time >= '2019-06-01 00:00:00.000'
GROUP BY device_name,
device_floor
ORDER BY ct DESC\G
Row 15:
───────
device_name: lobby_left_1
device_floor: 1
ct: 2944
Row 16:
───────
device_name: stairs_north_5
device_floor: 5
ct: 1731
Row 17:
───────
device_name: lobby_right_1
device_floor: 1
ct: 1641
Row 18:
───────
device_name: stairs_east_5
device_floor: 5
ct: 1637
Row 19:
───────
device_name: balcony_3
device_floor: 3
ct: 1619
Row 20:
───────
device_name: stairs_east_4
device_floor: 4
ct: 1093
20 rows in set. Elapsed: 0.017 sec. Processed 245.76 thousand rows, 2.71 MB (14.67 million rows/s., 161.52 MB/s.)
SQL 16: For each device category, what are the monthly power consumption metrics?
SELECT yr,
mo,
SUM(coffee_hourly_avg) AS coffee_monthly_sum,
AVG(coffee_hourly_avg) AS coffee_monthly_avg,
SUM(printer_hourly_avg) AS printer_monthly_sum,
AVG(printer_hourly_avg) AS printer_monthly_avg,
SUM(projector_hourly_avg) AS projector_monthly_sum,
AVG(projector_hourly_avg) AS projector_monthly_avg,
SUM(vending_hourly_avg) AS vending_monthly_sum,
AVG(vending_hourly_avg) AS vending_monthly_avg
FROM (
SELECT dt,
yr,
mo,
hr,
AVG(coffee) AS coffee_hourly_avg,
AVG(printer) AS printer_hourly_avg,
AVG(projector) AS projector_hourly_avg,
AVG(vending) AS vending_hourly_avg
FROM (
SELECT CAST(log_time AS DATE) AS dt,
EXTRACT(YEAR FROM log_time) AS yr,
EXTRACT(MONTH FROM log_time) AS mo,
EXTRACT(HOUR FROM log_time) AS hr,
CASE WHEN device_name LIKE 'coffee%' THEN event_value END AS coffee,
CASE WHEN device_name LIKE 'printer%' THEN event_value END AS printer,
CASE WHEN device_name LIKE 'projector%' THEN event_value END AS projector,
CASE WHEN device_name LIKE 'vending%' THEN event_value END AS vending
FROM logs3
WHERE device_type = 'meter'
) AS r
GROUP BY dt,
yr,
mo,
hr
) AS s
GROUP BY yr,
mo
ORDER BY yr,
mo\G
Row 25:
───────
yr: 2019
mo: 9
coffee_monthly_sum: 862784.4318728528
coffee_monthly_avg: 1198.3117109345178
printer_monthly_sum: 63348.6489027506
printer_monthly_avg: 87.98423458715362
projector_monthly_sum: 56973.39566326607
projector_monthly_avg: 126.32681965247465
vending_monthly_sum: 322162.84531000565
vending_monthly_avg: 447.4483962638967
Row 26:
───────
yr: 2019
mo: 10
coffee_monthly_sum: 900170.2429565514
coffee_monthly_avg: 1209.9062405329992
printer_monthly_sum: 64953.121362668586
printer_monthly_avg: 87.30258247670508
projector_monthly_sum: 98168.02805722524
projector_monthly_avg: 218.6370335350228
vending_monthly_sum: 310577.09491537174
vending_monthly_avg: 417.44233187549963
Row 27:
───────
yr: 2019
mo: 11
coffee_monthly_sum: 881836.9061517064
coffee_monthly_avg: 1224.773480766259
printer_monthly_sum: 61941.82869595774
printer_monthly_avg: 86.03031763327463
projector_monthly_sum: 181716.37704881537
projector_monthly_avg: 294.9941185857392
vending_monthly_sum: 258322.18736711267
vending_monthly_avg: 358.7808157876565
Row 28:
───────
yr: 2019
mo: 12
coffee_monthly_sum: 45118.35452979575
coffee_monthly_avg: 1253.28762582766
printer_monthly_sum: 2884.8668441554055
printer_monthly_avg: 80.13519011542793
projector_monthly_sum: 13576.481448950077
projector_monthly_avg: 377.1244846930577
vending_monthly_sum: 12249.124855753844
vending_monthly_avg: 340.25346821538454
28 rows in set. Elapsed: 1.137 sec. Processed 108.96 million rows, 1.52 GB (95.80 million rows/s., 1.34 GB/s.)
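The CASE WHEN pivot in the innermost query is portable SQL; in ClickHouse the same per-category hourly averages can be written more idiomatically with the -If aggregate combinators, which avoid materializing the intermediate NULL columns. A sketch of the inner two levels collapsed into one:

SELECT CAST(log_time AS DATE) AS dt,
       EXTRACT(YEAR FROM log_time) AS yr,
       EXTRACT(MONTH FROM log_time) AS mo,
       EXTRACT(HOUR FROM log_time) AS hr,
       avgIf(event_value, device_name LIKE 'coffee%') AS coffee_hourly_avg,
       avgIf(event_value, device_name LIKE 'printer%') AS printer_hourly_avg,
       avgIf(event_value, device_name LIKE 'projector%') AS projector_hourly_avg,
       avgIf(event_value, device_name LIKE 'vending%') AS vending_hourly_avg
FROM logs3
WHERE device_type = 'meter'
GROUP BY dt, yr, mo, hr;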
To learn more about benchmarking in ClickHouse, consider reading the following articles:
- Benchmarking ClickHouse Using the clickhouse-benchmark Tool
- ChistaDATA Cloud for ClickHouse v/s Google BigQuery: Comparative Benchmarking
- ChistaDATA Cloud for ClickHouse: Benchmarking on OpenTelemetry
You might also like:
- Runbook for Migration from Amazon Redshift to ChistaDATA Cloud for ClickHouse: Part 1
- Mastering Index Selection and Troubleshooting in ClickHouse
- Building Multi-Tenant ClickHouse Clusters
- Using ClickHouse-Backup for Comprehensive ClickHouse® Backup and Restore Operations
- Real-Time Performance Monitoring for ClickHouse Cluster Using Python Script