ClickHouse Performance Benchmarking – Brown University

Brown University Benchmark on ClickHouse


At ChistaDATA, We work for some of the largest ClickHouse installation worldwide and performance is very critical for our customers and ChistaDATA is committed to delivering optimal ClickHouse operations to the clients globally. ChistaDATA provides/publishes ClickHouse Application performance. We publish data of  both hardware and software infrastructure used for benchmarking ClickHouse with GitHub / data if ever needed to reproduce the same. The performance metrics published covers both Average Response Time / Latency and Throughput. We strongly believe that it’s the responsibility of ClickHouse infrastructure stakeholders / DBAs / Data SREs / Performance Engineers to understand the thresholds of their ClickHouse operations which eventually depend on RAM, CPU, Disk performance and Network latency. The key to ClickHouse performance benchmarking is to deliver consistently reproducible results. This is really important because the reproducible results allows you to rerun the tests and also gain confidence in the overall ClickHouse performance benchmarking exercise.

Hardware Infrastructure and Software Platforms Information

CPU

vendor and model details

# view vendor name

root@Performance-LAB:~# cat /proc/cpuinfo | grep 'vendor' | uniq
vendor_id       : GenuineIntel
# Display model name

root@Performance-LAB:~# cat /proc/cpuinfo | grep 'model name' | uniq    
model name      : Intel Xeon Processor (Icelake)

CPU count

# Count the number of processing units

root@Performance-LAB:~# cat /proc/cpuinfo | grep processor | wc -l
8
root@Performance-LAB:~# cat /proc/cpuinfo | grep 'core id'
core id         : 0
core id         : 1
core id         : 2
core id         : 3
core id         : 4
core id         : 5
core id         : 6
core id         : 7

Memory Available / RAM Info.

## RAM info. 


root@Performance-LAB:~# 
root@Performance-LAB:~# free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       286Mi        14Gi       0.0Ki       1.3Gi        14Gi
Swap:             0B          0B          0B
root@Performance-LAB:~#

Disk Infrastructure Operations – Capacity and Throughput

### Disk Information 

root@Performance-LAB:~# fdisk -l
Disk /dev/loop0: 61.83 MiB, 64835584 bytes, 126632 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/loop1: 72.56 MiB, 76083200 bytes, 148600 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/loop2: 32.44 MiB, 34017280 bytes, 66440 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/vda: 200 GiB, 214748364800 bytes, 419430400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: ED13961A-1618-437B-A9EB-A140AAA7F944

Device      Start       End   Sectors   Size Type
/dev/vda1  227328 419430366 419203039 199.9G Linux filesystem
/dev/vda14   2048     10239      8192     4M BIOS boot
/dev/vda15  10240    227327    217088   106M Microsoft basic data

Software Infrastructure

## Linux distribution / Ubuntu release

root@PERFORMANCE-LAB:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 21.10
Release:        21.10
Codename:       impish

## ClickHouse version 

root@PERFORMANCE-LAB:~# clickhouse-client --password
ClickHouse client version 21.11.6.7 (official build).
Password for user (default): 
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 21.11.6 revision 54450.

PERFORMANCE-LAB :) 

Building ClickHouse Infrastructure for Performance Benchmarking

Source: Brown University Benchmark (MgBench is a new analytical benchmark for machine-generated log data) provided by Andrew Crotty

Step 1: Download the data (schema and data) for benchmarking ClickHouse:

wget https://datasets.clickhouse.com/mgbench{1..3}.csv.xz

Step 2: Unpack the data downloaded:

xz -v -d mgbench{1..3}.csv.xz

Step 3: Create Schema Objects / Tables:

CREATE DATABASE mgbench;


CREATE TABLE mgbench.logs1 (
  log_time      DateTime,
  machine_name  LowCardinality(String),
  machine_group LowCardinality(String),
  cpu_idle      Nullable(Float32),
  cpu_nice      Nullable(Float32),
  cpu_system    Nullable(Float32),
  cpu_user      Nullable(Float32),
  cpu_wio       Nullable(Float32),
  disk_free     Nullable(Float32),
  disk_total    Nullable(Float32),
  part_max_used Nullable(Float32),
  load_fifteen  Nullable(Float32),
  load_five     Nullable(Float32),
  load_one      Nullable(Float32),
  mem_buffers   Nullable(Float32),
  mem_cached    Nullable(Float32),
  mem_free      Nullable(Float32),
  mem_shared    Nullable(Float32),
  swap_free     Nullable(Float32),
  bytes_in      Nullable(Float32),
  bytes_out     Nullable(Float32)
)
ENGINE = MergeTree()
ORDER BY (machine_group, machine_name, log_time);


CREATE TABLE mgbench.logs2 (
  log_time    DateTime,
  client_ip   IPv4,
  request     String,
  status_code UInt16,
  object_size UInt64
)
ENGINE = MergeTree()
ORDER BY log_time;


CREATE TABLE mgbench.logs3 (
  log_time     DateTime64,
  device_id    FixedString(15),
  device_name  LowCardinality(String),
  device_type  LowCardinality(String),
  device_floor UInt8,
  event_type   LowCardinality(String),
  event_unit   FixedString(1),
  event_value  Nullable(Float32)
)
ENGINE = MergeTree()
ORDER BY (event_type, log_time);

Step 4: Data loading

clickhouse-client --query "INSERT INTO mgbench.logs1 FORMAT CSVWithNames" < mgbench1.csv
clickhouse-client --query "INSERT INTO mgbench.logs2 FORMAT CSVWithNames" < mgbench2.csv
clickhouse-client --query "INSERT INTO mgbench.logs3 FORMAT CSVWithNames" < mgbench3.csv

Benchmarking ClickHouse Performance

— SQL 1: What is the CPU/network utilization for each web server since midnight?

SELECT machine_name,
       MIN(cpu) AS cpu_min,
       MAX(cpu) AS cpu_max,
       AVG(cpu) AS cpu_avg,
       MIN(net_in) AS net_in_min,
       MAX(net_in) AS net_in_max,
       AVG(net_in) AS net_in_avg,
       MIN(net_out) AS net_out_min,
       MAX(net_out) AS net_out_max,
       AVG(net_out) AS net_out_avg
FROM (
  SELECT machine_name,
         COALESCE(cpu_user, 0.0) AS cpu,
         COALESCE(bytes_in, 0.0) AS net_in,
         COALESCE(bytes_out, 0.0) AS net_out
  FROM logs1
  WHERE machine_name IN ('anansi','aragog','urd')
    AND log_time >= TIMESTAMP '2017-01-11 00:00:00'
) AS r
GROUP BY machine_name\G

Query id: 6772d840-bc4f-4d90-8d78-d47a8ecf332e

Row 1:
──────
machine_name: urd
cpu_min:      0
cpu_max:      23.493331909179688
cpu_avg:      1.499065357465967
net_in_min:   1168.3499755859375
net_in_max:   368025.625
net_in_avg:   22100.285078089873
net_out_min:  220.95001220703125
net_out_max:  108069.65625
net_out_avg:  8129.441456009021

Row 2:
──────
machine_name: anansi
cpu_min:      0
cpu_max:      22.5
cpu_avg:      0.1426860871961902
net_in_min:   938.949951171875
net_in_max:   2355335.75
net_in_avg:   9177.160347696004
net_out_min:  216.8800048828125
net_out_max:  2354536.25
net_out_avg:  13467.316385337206

Row 3:
──────
machine_name: aragog
cpu_min:      0
cpu_max:      31.14000129699707
cpu_avg:      0.46608482347828106
net_in_min:   1318.68994140625
net_in_max:   27836488
net_in_avg:   328922.02496959124
net_out_min:  308.9499816894531
net_out_max:  40642404
net_out_avg:  825490.5620742807
↖ Progress: 217.89 thousand rows, 607.23 KB (25.75 million rows/s., 71.75 M
3 rows in set. Elapsed: 0.009 sec. Processed 217.89 thousand rows, 607.23 KB (25.39 million rows/s., 70.75 MB/s.)

–SQL 2: Which computer lab machines have been offline in the past day?

SELECT machine_name,
       log_time
FROM logs1
WHERE (machine_name LIKE 'cslab%' OR
       machine_name LIKE 'mslab%')
  AND load_one IS NULL
  AND log_time >= TIMESTAMP '2017-01-10 00:00:00'
ORDER BY machine_name,
         log_time;
         
Query id: fb5f4823-8221-404c-aec5-4e4ef59a95db

Row 1:
──────
machine_name: cslab9e
log_time:     2017-01-10 14:14:15

Row 2:
──────
machine_name: cslab9e
log_time:     2017-01-10 14:14:30

Row 3:
──────
machine_name: cslab9e
log_time:     2017-01-10 14:14:45

Row 4:
──────
machine_name: cslab9e
log_time:     2017-01-10 14:15:15

Row 5:
──────
machine_name: cslab9e
log_time:     2017-01-10 14:15:30

Row 6:
──────
machine_name: cslab9e
log_time:     2017-01-10 14:15:45

Row 7:
──────
machine_name: cslab9e
log_time:     2017-01-10 14:16:15

Row 8:
──────
machine_name: cslab9e
log_time:     2017-01-10 14:16:30

Row 9:
───────
machine_name: cslab9e
log_time:     2017-01-10 14:16:45

Row 10:
───────
machine_name: cslab9e
log_time:     2017-01-10 14:17:15

Row 11:
───────
machine_name: cslab9e
log_time:     2017-01-10 14:17:30

Row 12:
───────
machine_name: cslab9e
log_time:     2017-01-10 14:17:45

Row 13:
───────
machine_name: cslab9e
log_time:     2017-01-10 14:18:15

Row 14:
───────
machine_name: cslab9e
log_time:     2017-01-10 14:18:30

Row 15:
───────
machine_name: cslab9e
log_time:     2017-01-10 14:18:45

Row 16:
───────
machine_name: cslab9e
log_time:     2017-01-10 14:19:15

Row 17:
───────
machine_name: cslab9e
log_time:     2017-01-10 14:19:30

Row 18:
───────
machine_name: cslab9e
log_time:     2017-01-10 14:19:45
↗ Progress: 1.63 million rows, 14.84 MB (92.92 million rows/s., 843.85 MB/s
18 rows in set. Elapsed: 0.018 sec. Processed 1.63 million rows, 14.84 MB (91.67 million rows/s., 832.47 MB/s.)

–SQL 3: What are the hourly average metrics during the past 10 days for a specific workstation?

SELECT dt,
       hr,
       AVG(load_fifteen) AS load_fifteen_avg,
       AVG(load_five) AS load_five_avg,
       AVG(load_one) AS load_one_avg,
       AVG(mem_free) AS mem_free_avg,
       AVG(swap_free) AS swap_free_avg
FROM (
  SELECT CAST(log_time AS DATE) AS dt,
         EXTRACT(HOUR FROM log_time) AS hr,
         load_fifteen,
         load_five,
         load_one,
         mem_free,
         swap_free
  FROM logs1
  WHERE machine_name = 'babbage'
    AND load_fifteen IS NOT NULL
    AND load_five IS NOT NULL
    AND load_one IS NOT NULL
    AND mem_free IS NOT NULL
    AND swap_free IS NOT NULL
    AND log_time >= TIMESTAMP '2017-01-01 00:00:00'
) AS r
GROUP BY dt,
         hr
ORDER BY dt,
         hr\G

Row 1:
──────
dt:               2017-01-01
hr:               0
load_fifteen_avg: 0.12023027762770652
load_five_avg:    0.09362472295761108
load_one_avg:     0.08744388920943796
mem_free_avg:     2170359.9625
swap_free_avg:    8388604

Row 2:
──────
dt:               2017-01-01
hr:               1
load_fifteen_avg: 0.06573361121118068
load_five_avg:    0.06255777780897916
load_one_avg:     0.05611999999382533
mem_free_avg:     2170050.05
swap_free_avg:    8388604

Row 3:
──────
dt:               2017-01-01
hr:               2
load_fifteen_avg: 0.06406694483011961
load_five_avg:    0.08363861131171385
load_one_avg:     0.07870277822948993
mem_free_avg:     2170090.8666666667
swap_free_avg:    8388604

Row 4:
──────
dt:               2017-01-01
hr:               3
load_fifteen_avg: 0.0986541671678424
load_five_avg:    0.1006138886945943
load_one_avg:     0.10358000136911868
mem_free_avg:     2169553.683333333
swap_free_avg:    8388604

Row 5:
──────
dt:               2017-01-01
hr:               4
load_fifteen_avg: 0.09523666650056839
load_five_avg:    0.09710055496543646
load_one_avg:     0.09572555573152688
mem_free_avg:     2167171.0083333333
swap_free_avg:    8388604

↗ Progress: 193.32 thousand rows, 2.34 MB (15.36 million rows/s., 186.18 
255 rows in set. Elapsed: 0.013 sec. Processed 193.32 thousand rows, 2.34 MB (15.21 million rows/s., 184.41 MB/s.)

–SQL 4 – Over 1 month, how often was each server blocked on disk I/O?

SELECT machine_name,
       COUNT(*) AS spikes
FROM logs1
WHERE machine_group = 'Servers'
  AND cpu_wio > 0.99
  AND log_time >= TIMESTAMP '2016-12-01 00:00:00'
  AND log_time < TIMESTAMP '2017-01-01 00:00:00'
GROUP BY machine_name
ORDER BY spikes DESC
LIMIT 10\G

Query id: 3d9c7259-85f6-44cf-a538-acad00c162df

Row 1:
──────
machine_name: sourpatch
spikes:       8861

Row 2:
──────
machine_name: louie
spikes:       581

Row 3:
──────
machine_name: zotz
spikes:       550

Row 4:
──────
machine_name: pieces
spikes:       495

Row 5:
──────
machine_name: reeces
spikes:       453

Row 6:
──────
machine_name: thing
spikes:       438

Row 7:
──────
machine_name: adminhost
spikes:       366

Row 8:
──────
machine_name: cadbury
spikes:       357

Row 9:
───────
machine_name: york
spikes:       193

Row 10:
───────
machine_name: reflect
spikes:       156
↙ Progress: 1.65 million rows, 16.66 MB (104.63 million rows/s., 1.06 GB/s.
10 rows in set. Elapsed: 0.016 sec. Processed 1.65 million rows, 16.66 MB (102.57 million rows/s., 1.04 GB/s.)

–SQL 5: Which externally reachable VMs have run low on memory?

SELECT machine_name,
       dt,
       MIN(mem_free) AS mem_free_min
FROM (
  SELECT machine_name,
         CAST(log_time AS DATE) AS dt,
         mem_free
  FROM logs1
  WHERE machine_group = 'DMZ'
    AND mem_free IS NOT NULL
) AS r
GROUP BY machine_name,
         dt
HAVING MIN(mem_free) < 10000
ORDER BY machine_name,
         dt;
         
         
Query id: f4af0029-2707-4c09-b775-160dfcebc957

Row 1:
──────
machine_name: chimera
dt:           2016-11-04
mem_free_min: 6135.48

Row 2:
──────
machine_name: chimera
dt:           2016-11-26
mem_free_min: 6045.8667

Row 3:
──────
machine_name: chimera
dt:           2016-11-27
mem_free_min: 9374.527

Row 4:
──────
machine_name: chimera
dt:           2016-11-30
mem_free_min: 6933.56

Row 5:
──────
machine_name: chimera
dt:           2016-12-01
mem_free_min: 6535.8135

Row 6:
──────
machine_name: chimera
dt:           2016-12-02
mem_free_min: 6393.74

Row 7:
──────
machine_name: chimera
dt:           2016-12-11
mem_free_min: 8010.5

Row 8:
──────
machine_name: chimera
dt:           2016-12-30
mem_free_min: 8167.6665

Row 9:
───────
machine_name: chimera
dt:           2016-12-31
mem_free_min: 6324.6665

Row 10:
───────
machine_name: chimera
dt:           2017-01-01
mem_free_min: 5972

Row 11:
───────
machine_name: chimera
dt:           2017-01-03
mem_free_min: 5948

Row 12:
───────
machine_name: chimera
dt:           2017-01-04
mem_free_min: 7320

Row 13:
───────
machine_name: chimera
dt:           2017-01-05
mem_free_min: 8144

Row 14:
───────
machine_name: chimera
dt:           2017-01-07
mem_free_min: 6308.4663

Row 15:
───────
machine_name: chimera
dt:           2017-01-09
mem_free_min: 9370

Row 16:
───────
machine_name: chimera
dt:           2017-01-10
mem_free_min: 7147.467
↖ Progress: 3.54 million rows, 39.12 MB (108.02 million rows/s., 1.19 GB/s.
16 rows in set. Elapsed: 0.033 sec. Processed 3.54 million rows, 39.12 MB (106.94 million rows/s., 1.18 GB/s.)

— SQL 7: What is the total hourly network traffic across all file servers?

SELECT dt,
       hr,
       SUM(net_in) AS net_in_sum,
       SUM(net_out) AS net_out_sum,
       SUM(net_in) + SUM(net_out) AS both_sum
FROM (
  SELECT CAST(log_time AS DATE) AS dt,
         EXTRACT(HOUR FROM log_time) AS hr,
         COALESCE(bytes_in, 0.0) / 1000000000.0 AS net_in,
         COALESCE(bytes_out, 0.0) / 1000000000.0 AS net_out
  FROM logs1
  WHERE machine_name IN ('allsorts','andes','bigred','blackjack','bonbon',
      'cadbury','chiclets','cotton','crows','dove','fireball','hearts','huey',
      'lindt','milkduds','milkyway','mnm','necco','nerds','orbit','peeps',
      'poprocks','razzles','runts','smarties','smuggler','spree','stride',
      'tootsie','trident','wrigley','york')
) AS r
GROUP BY dt,
         hr
ORDER BY both_sum DESC
LIMIT 10\G

Query id: 78f32d40-af55-49e1-8338-768144f54c2b

Row 1:
──────
dt:          2017-01-10
hr:          17
net_in_sum:  87.94858539170787
net_out_sum: 71.38525362068188
both_sum:    159.33383901238975

Row 2:
──────
dt:          2017-01-10
hr:          18
net_in_sum:  89.64930670570355
net_out_sum: 69.15606039811588
both_sum:    158.80536710381944

Row 3:
──────
dt:          2017-01-10
hr:          19
net_in_sum:  73.60724927620798
net_out_sum: 81.40722209611724
both_sum:    155.0144713723252

Row 4:
──────
dt:          2017-01-10
hr:          20
net_in_sum:  68.54794632249272
net_out_sum: 85.50667667204637
both_sum:    154.0546229945391

Row 5:
──────
dt:          2017-01-11
hr:          13
net_in_sum:  42.0006900292216
net_out_sum: 69.61384714089114
both_sum:    111.61453717011274

Row 6:
──────
dt:          2017-01-10
hr:          16
net_in_sum:  52.18435269713231
net_out_sum: 56.94834975170359
both_sum:    109.1327024488359

Row 7:
──────
dt:          2017-01-10
hr:          15
net_in_sum:  58.92187069434693
net_out_sum: 47.70097635240866
both_sum:    106.6228470467556

Row 8:
──────
dt:          2017-01-10
hr:          14
net_in_sum:  60.98027388591966
net_out_sum: 44.4488195209332
both_sum:    105.42909340685286

Row 9:
───────
dt:          2017-01-10
hr:          21
net_in_sum:  45.79773242298867
net_out_sum: 54.048210619660104
both_sum:    99.84594304264877

Row 10:
───────
dt:          2017-01-11
hr:          4
net_in_sum:  48.96217601967555
net_out_sum: 48.37625873783819
both_sum:    97.33843475751374
↗ Progress: 1.79 million rows, 20.13 MB (78.76 million rows/s., 887.07 MB/s
10 rows in set. Elapsed: 0.023 sec. Processed 1.79 million rows, 20.13 MB (77.66 million rows/s., 874.69 MB/s.)

–SQL 8 – Which requests have caused server errors within the past 2 weeks?

SELECT *
FROM logs2
WHERE status_code >= 500
  AND log_time >= TIMESTAMP '2012-12-18 00:00:00'
ORDER BY log_time\G

Query id: 41fbe700-34ff-4fd4-ab90-4fcbfeae9715

Row 1:
──────
log_time:    2012-12-21 16:58:00
client_ip:   229.50.247.232
request:     //cgi-bin/feedback/
status_code: 500
object_size: 410

Row 2:
──────
log_time:    2012-12-23 23:28:02
client_ip:   229.50.247.232
request:     //cgi-bin/feedback/
status_code: 500
object_size: 410
↗ Progress: 2.00 million rows, 4.72 MB (338.82 million rows/s., 798.60 MB/s
2 rows in set. Elapsed: 0.006 sec. Processed 2.00 million rows, 4.72 MB (324.94 million rows/s., 765.89 MB/s.)

–SQL 9 – During a specific 2-week period, was the user password file leaked?

SELECT *
FROM logs2
WHERE status_code >= 200
  AND status_code < 300
  AND request LIKE '%/etc/passwd%'
  AND log_time >= TIMESTAMP '2012-05-06 00:00:00'
  AND log_time < TIMESTAMP '2012-05-20 00:00:00'\G


Query id: 5782f776-56c9-4701-a1eb-0213f2ee96a9

Row 1:
──────
log_time:    2012-05-09 14:46:58
client_ip:   201.183.185.11
request:     /?-nd+auto_prepend_file%3D/etc/passwd
status_code: 200
object_size: 21173

Row 2:
──────
log_time:    2012-05-13 20:17:05
client_ip:   201.183.185.11
request:     /?-nd+auto_prepend_file%3D/etc/passwd
status_code: 200
object_size: 21809
↖ Progress: 1.83 million rows, 120.13 MB (45.43 million rows/s., 2.99 GB/s.
2 rows in set. Elapsed: 0.040 sec. Processed 1.83 million rows, 120.13 MB (45.12 million rows/s., 2.97 GB/s.)

–SQL 10: What was the average path depth for top-level requests in the past month?

SELECT top_level,
       AVG(LENGTH(request) - LENGTH(REPLACE(request, '/', ''))) AS depth_avg
FROM (
  SELECT SUBSTRING(request FROM 1 FOR len) AS top_level,
         request
  FROM (
    SELECT POSITION(SUBSTRING(request FROM 2), '/') AS len,
           request
    FROM logs2
    WHERE status_code >= 200
      AND status_code < 300
      AND log_time >= TIMESTAMP '2012-12-01 00:00:00'
  ) AS r
  WHERE len > 0
) AS s
WHERE top_level IN ('/about','/courses','/degrees','/events',
                    '/grad','/industry','/news','/people',
                    '/publications','/research','/teaching','/ugrad')
GROUP BY top_level
ORDER BY top_level\G

Query id: dfeb4126-4564-443c-b571-c70130ca1768

↑ Progress: 4.33 million rows, 230.42 MB (39.11 million rows/s., 2.08 GB/s.Row 1:
──────
top_level: /about
depth_avg: 4.180225643663013

Row 2:
──────
top_level: /courses
depth_avg: 5.138040610251345

Row 3:
──────
top_level: /degrees
depth_avg: 3.385165907612232

Row 4:
──────
top_level: /events
depth_avg: 3.105441792057925

Row 5:
──────
top_level: /grad
depth_avg: 3.6942030710629687

Row 6:
──────
top_level: /industry
depth_avg: 4.094312105007292

Row 7:
──────
top_level: /news
depth_avg: 2.963451710977424

Row 8:
──────
top_level: /people
depth_avg: 4.485873513820637

Row 9:
───────
top_level: /publications
depth_avg: 2.245890410958904

Row 10:
───────
top_level: /research
depth_avg: 5.040612211618354

Row 11:
───────
top_level: /teaching
depth_avg: 2

Row 12:
───────
top_level: /ugrad
depth_avg: 2.4527363184079602

↗ Progress: 4.33 million rows, 230.42 MB (27.43 million rows/s., 1.46 GB/s.→ Progress: 4.80 million rows, 255.41 MB (30.39 million rows/s., 1.62 GB/s.
12 rows in set. Elapsed: 0.158 sec. Processed 4.80 million rows, 255.41 MB (30.35 million rows/s., 1.62 GB/s.)

–SQL 11: During the last 3 months, which clients have made an excessive number of requests?

SELECT client_ip,
       COUNT(*) AS num_requests
FROM logs2
WHERE log_time >= TIMESTAMP '2012-10-01 00:00:00'
GROUP BY client_ip
HAVING COUNT(*) >= 100000
ORDER BY num_requests DESC\G

Query id: 605feb05-9719-46fa-ba69-324b53ec1277

Row 1:
──────
client_ip:    219.63.173.93
num_requests: 1540391

Row 2:
──────
client_ip:    229.50.247.232
num_requests: 743801

Row 3:
──────
client_ip:    97.211.80.244
num_requests: 733261

Row 4:
──────
client_ip:    152.149.228.251
num_requests: 492221

Row 5:
──────
client_ip:    198.156.249.133
num_requests: 370834

Row 6:
──────
client_ip:    70.86.124.37
num_requests: 273057

Row 7:
──────
client_ip:    67.153.111.239
num_requests: 167287

Row 8:
──────
client_ip:    249.92.17.134
num_requests: 112909
↓ Progress: 17.98 million rows, 143.83 MB (221.43 million rows/s., 1.77 GB/
8 rows in set. Elapsed: 0.082 sec. Processed 17.98 million rows, 143.83 MB (220.44 million rows/s., 1.76 GB/s.)

–SQL 11: What are the daily unique visitors ?

SELECT dt,
       COUNT(DISTINCT client_ip)
FROM (
  SELECT CAST(log_time AS DATE) AS dt,
         client_ip
  FROM logs2
) AS r
GROUP BY dt
ORDER BY dt\G



Row 361:
────────
dt:                   2012-12-26
uniqExact(client_ip): 8799

Row 362:
────────
dt:                   2012-12-27
uniqExact(client_ip): 9435

Row 363:
────────
dt:                   2012-12-28
uniqExact(client_ip): 9185

Row 364:
────────
dt:                   2012-12-29
uniqExact(client_ip): 8247

Row 365:
────────
dt:                   2012-12-30
uniqExact(client_ip): 8328

Row 366:
────────
dt:                   2012-12-31

uniqExact(client_ip): 7824
← Progress: 75.75 million rows, 605.98 MB (273.04 million rows/s., 2.18 G
366 rows in set. Elapsed: 0.278 sec. Processed 75.75 million rows, 605.98 MB (272.73 million rows/s., 2.18 GB/s.)

–SQL 12: What are the average and maximum data transfer rates (Gbps)?

SELECT AVG(transfer) / 125000000.0 AS transfer_avg,
       MAX(transfer) / 125000000.0 AS transfer_max
FROM (
  SELECT log_time,
         SUM(object_size) AS transfer
  FROM logs2
  GROUP BY log_time
) AS r\G


 Query id: 06ad0671-d9ea-4881-a68f-7865b193d219

 ↖ Progress: 17.24 million rows, 206.83 MB (168.95 million rows/s., 2.03 G↑ Progress: 35.32 million rows, 423.79 MB (174.44 million rows/s., 2.09 G↗ Progress: 62.59 million rows, 751.14 MB (206.70 million rows/s., 2.48 G→ Progress: 70.30 million rows, 843.54 MB (174.35 million rows/s., 2.09 G↘ Progress: 70.69 million rows, 848.26 MB (140.37 million rows/s., 1.68 G↓ Progress: 72.26 million rows, 867.13 MB (119.64 million rows/s., 1.44 G↙ Progress: 75.75 million rows, 908.98 MB (107.54 million rows/s., 1.29 GRow 1:
 ──────
 transfer_avg: 0.0046296999419207785
 transfer_max: 295.028835936
 ← Progress: 75.75 million rows, 908.98 MB (64.98 million rows/s., 779.80 
 1 rows in set. Elapsed: 1.166 sec. Processed 75.75 million rows, 908.98 MB (64.94 million rows/s., 779.28 MB/s.)

— SQL 17: Did the indoor temperature reach freezing over the weekend?

SELECT *
 FROM logs3
 WHERE event_type = 'temperature'
   AND event_value <= 32.0
   AND log_time >= '2019-11-29 17:00:00.000'\G
   
   Row 1:
   ──────
   log_time:     2019-12-01 09:30:13.417
   device_id:    16C448031108ABA
   device_name:  balcony_5
   device_type:  door
   device_floor: 5
   event_type:   temperature
   event_unit:   F
   event_value:  32

   Row 2:
   ──────
   log_time:     2019-12-01 09:59:58.874
   device_id:    16C448031108ABA
   device_name:  balcony_5
   device_type:  door
   device_floor: 5
   event_type:   temperature
   event_unit:   F
   event_value:  32
   ↑ Progress: 7.35 thousand rows, 228.52 KB (1.56 million rows/s., 48.57 MB
   2 rows in set. Elapsed: 0.005 sec. Processed 7.35 thousand rows, 228.52 KB (1.51 million rows/s., 46.96 MB/s.)

–SQL 18: Over the past 6 months, how frequently were each door opened?

 SELECT device_name,
         device_floor,
         COUNT(*) AS ct
  FROM logs3
  WHERE event_type = 'door_open'
    AND log_time >= '2019-06-01 00:00:00.000'
  GROUP BY device_name,
           device_floor
  ORDER BY ct DESC\G

Row 15:
───────
device_name:  lobby_left_1
device_floor: 1
ct:           2944

Row 16:
───────
device_name:  stairs_north_5
device_floor: 5
ct:           1731

Row 17:
───────
device_name:  lobby_right_1
device_floor: 1
ct:           1641

Row 18:
───────
device_name:  stairs_east_5
device_floor: 5
ct:           1637

Row 19:
───────
device_name:  balcony_3
device_floor: 3
ct:           1619

Row 20:
───────
device_name:  stairs_east_4
device_floor: 4
ct:           1093
→ Progress: 245.76 thousand rows, 2.71 MB (14.90 million rows/s., 164.09 
20 rows in set. Elapsed: 0.017 sec. Processed 245.76 thousand rows, 2.71 MB (14.67 million rows/s., 161.52 MB/s.)

–SQL 19: For each device category, what are the monthly power consumption metrics?

SELECT yr,
       mo,
       SUM(coffee_hourly_avg) AS coffee_monthly_sum,
       AVG(coffee_hourly_avg) AS coffee_monthly_avg,
       SUM(printer_hourly_avg) AS printer_monthly_sum,
       AVG(printer_hourly_avg) AS printer_monthly_avg,
       SUM(projector_hourly_avg) AS projector_monthly_sum,
       AVG(projector_hourly_avg) AS projector_monthly_avg,
       SUM(vending_hourly_avg) AS vending_monthly_sum,
       AVG(vending_hourly_avg) AS vending_monthly_avg
FROM (
  SELECT dt,
         yr,
         mo,
         hr,
         AVG(coffee) AS coffee_hourly_avg,
         AVG(printer) AS printer_hourly_avg,
         AVG(projector) AS projector_hourly_avg,
         AVG(vending) AS vending_hourly_avg
  FROM (
    SELECT CAST(log_time AS DATE) AS dt,
           EXTRACT(YEAR FROM log_time) AS yr,
           EXTRACT(MONTH FROM log_time) AS mo,
           EXTRACT(HOUR FROM log_time) AS hr,
           CASE WHEN device_name LIKE 'coffee%' THEN event_value END AS coffee,
           CASE WHEN device_name LIKE 'printer%' THEN event_value END AS printer,
           CASE WHEN device_name LIKE 'projector%' THEN event_value END AS projector,
           CASE WHEN device_name LIKE 'vending%' THEN event_value END AS vending
    FROM logs3
    WHERE device_type = 'meter'
  ) AS r
  GROUP BY dt,
           yr,
           mo,
           hr
) AS s
GROUP BY yr,
         mo
ORDER BY yr,
         mo\G

Row 25:
───────
yr:                    2019
mo:                    9
coffee_monthly_sum:    862784.4318728528
coffee_monthly_avg:    1198.3117109345178
printer_monthly_sum:   63348.6489027506
printer_monthly_avg:   87.98423458715362
projector_monthly_sum: 56973.39566326607
projector_monthly_avg: 126.32681965247465
vending_monthly_sum:   322162.84531000565
vending_monthly_avg:   447.4483962638967

Row 26:
───────
yr:                    2019
mo:                    10
coffee_monthly_sum:    900170.2429565514
coffee_monthly_avg:    1209.9062405329992
printer_monthly_sum:   64953.121362668586
printer_monthly_avg:   87.30258247670508
projector_monthly_sum: 98168.02805722524
projector_monthly_avg: 218.6370335350228
vending_monthly_sum:   310577.09491537174
vending_monthly_avg:   417.44233187549963

Row 27:
───────
yr:                    2019
mo:                    11
coffee_monthly_sum:    881836.9061517064
coffee_monthly_avg:    1224.773480766259
printer_monthly_sum:   61941.82869595774
printer_monthly_avg:   86.03031763327463
projector_monthly_sum: 181716.37704881537
projector_monthly_avg: 294.9941185857392
vending_monthly_sum:   258322.18736711267
vending_monthly_avg:   358.7808157876565

Row 28:
───────
yr:                    2019
mo:                    12
coffee_monthly_sum:    45118.35452979575
coffee_monthly_avg:    1253.28762582766
printer_monthly_sum:   2884.8668441554055
printer_monthly_avg:   80.13519011542793
projector_monthly_sum: 13576.481448950077
projector_monthly_avg: 377.1244846930577
vending_monthly_sum:   12249.124855753844
vending_monthly_avg:   340.25346821538454
← Progress: 107.25 million rows, 1.50 GB (94.32 million rows/s., 1.32 GB/↖ Progress: 108.96 million rows, 1.52 GB (95.82 million rows/s., 1.34 GB/
28 rows in set. Elapsed: 1.137 sec. Processed 108.96 million rows, 1.52 GB (95.80 million rows/s., 1.34 GB/s.)

 

 

About Shiv Iyer 3 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.