Getting Started with ClickHouse using clickhouse-local

Introduction

Would you like to run ClickHouse queries without a server or attached storage? If your answer is yes, you can prefer clickhouse-local.

We want to introduce you to clickhouse-local. It is all about SQL queries and the rich ClickHouse function library and speed. ‘clickhouse-local’ turns the ClickHouse SQL query processor into a command line utility.

In this article, we will address the questions of what clickhouse-local is, what it is good for, and which questions can be answered with clickhouse-local, and we will enrich our article with examples.

According to the ClickHouse site, the meaning of clickhouse-local is:

The clickhouse-local program enables you to perform fast processing on local files without deploying and configuring the ClickHouse server.

clickhouse-local uses the same core as the ClickHouse server, supporting most of the features and the same set of formats and table engines.

By default clickhouse-local does not have access to data on the same host, but it supports loading server configuration using --config-file an argument.

clickhouse-client and clickhouse-local

clickhouse-client is a client application that is used to connect to ClickHouse from the command line. clickhouse-local is a client application that is used to query files on disk and across the network. Many of the guides in the ClickHouse documentation will have you examine the schema of a file (CSV, TSV, Parquet, etc.) with clickhouse-local, query the file, and even manipulate the data from the file in order to prepare it for insertion into ClickHouse. We will often have you query a file with clickhouse-local and pipe the output to clickhouse-client to stream the data into ClickHouse.

Installation of ClickHouse with Homebrew

You can run the following command to install ClickHouse with Homebrew.

brew install clickhouse

If ClickHouse does not start automatically, you can start the ClickHouse service with the command below.

brew services start clickhouse

Now let’s test our command and connect to the clickhouse-local service.

root@a1b9dbb4b370:/# clickhouse-local
ClickHouse local version 22.9.3.18 (official build).

a1b9dbb4b370 :) 

Now, we can give information with different examples with clickhouse-local.

A simple example of clickhouse-local;

output memory user for each Unix user:

$ ps aux | tail -n +2 | awk '{ printf("%s\t%s\n", $1, $4) }' \
    | clickhouse-local --structure "user String, mem Float64" \
        --query "SELECT user, round(sum(mem), 2) as memTotal
            FROM table GROUP BY user ORDER BY memTotal DESC FORMAT Pretty"
┏━━━━━━━━━━┳━━━━━━━━━━┓
┃ user     ┃ memTotal ┃
┡━━━━━━━━━━╇━━━━━━━━━━┩
│ clickho+ │     22.4 │
├──────────┼──────────┤
│ root     │     15.6 │
├──────────┼──────────┤
│ ubuntu   │      0.8 │
├──────────┼──────────┤
│ systemd+ │      0.3 │
├──────────┼──────────┤
│ syslog   │      0.1 │
├──────────┼──────────┤
│ message+ │      0.1 │
├──────────┼──────────┤
│ daemon   │        0 │
└──────────┴──────────┘

Query a CSV file using clickhouse-local

Imagine you have a CSV file and want to extract something from that using the full power of SQL queries. Of course, you can insert that CSV file into ClickHouse (or some other DBMS), but for a one-time query, it sounds like overkill.

So how can ‘clickhouse-local’ help in that case? Let’s check.

First of all, you need to have ‘clickhouse-local’ installed. It is distributed as a part of ‘clickhouse-client’ package, so if you have ‘clickhouse-client’ in your system, it means you already have ‘clickhouse-local’.

clickhouse-local \
    --file=drivers.csv \
    --input-format=CSVWithNames \
    --table=drivers\
    --structure='driverId String,driverRef String,number String,forename String,surname String,dob String,nationality String,url String,' \
    --query='SELECT forename, surname, dob from drivers GROUP BY forename ORDER BY dob FORMAT PrettyCompact'
┌─forename───┬─surname───┬───────────dob─┐
│ Lewis      │   Hamilton│    1985-01-07 │
│ Fernando   │     Alonso│    1981-07-29 │
│ Nico       │    Rosberg│    1985-06-27 │
...
│ Kimi       │  Räikkönen│    1979-10-17 │
│ Felipe     │      Massa│    1981-04-25 │
│ Jenson     │     Button│.   1980-01-19 │
...
# time 
# real    0m2.348s
# user    0m2.146s
# sys     0m0.081s

In this example, we selected driver names, surnames, and date of birth, in order by date of birth. As you can see, clickhouse-local bring us the result without importing this CSV into the ClickHouse server.

Source of drivers.csv

Let’s compare clickhouse with the two most popular ones: ‘q’ and ‘textql’.

‘textql’ is written in golang on top of SQLite. There is no binary distribution, it needs to be built from the source. Somewhat non-typically for the Unix world, it uses long parameter names with a single dash.

$ textql -header -sql='SELECT forename, surname, dob from drivers GROUP BY forename ORDER BY dob'
/drivers.csv

Lewis   Hamilton    1985-01-07
Nick    Heidfeld    1977-05-10
Nico    Rosberg 1985-06-27
Fernando    Alonso  1981-07-29
Heikki  Kovalainen  1981-10-19
Kazuki  Nakajima    1985-01-11
Sébastien   Bourdais    1979-02-28
Kimi    Räikkönen   1979-10-17
Robert  Kubica  1984-12-07
Timo    Glock   1982-03-18
Takuma  Sato    1977-01-28
Nelson  Piquet Jr.  1985-07-25
Felipe  Massa   1981-04-25
David   Coulthard   1971-03-27
Jarno   Trulli  1974-07-13
Adrian  Sutil   1983-01-11
Mark    Webber  1976-08-27
...

# time 
# real 0m4.314s 
# user 0m4.471s 
# sys 0m0.111s

Let’s try the same with ‘q’. It is also built on top of SQLite, this time using Python.

$ cat /drivers.csv | q -H 'SELECT forename, surname, dob from drivers GROUP BY forename ORDER BY dob' -d,

Lewis   Hamilton    1985-01-07
Nick    Heidfeld    1977-05-10
Nico    Rosberg 1985-06-27
Fernando    Alonso  1981-07-29
Heikki  Kovalainen  1981-10-19
Kazuki  Nakajima    1985-01-11
Sébastien   Bourdais    1979-02-28
Kimi    Räikkönen   1979-10-17
Robert  Kubica  1984-12-07
Timo    Glock   1982-03-18
Takuma  Sato    1977-01-28
Nelson  Piquet Jr.  1985-07-25
Felipe  Massa   1981-04-25
David   Coulthard   1971-03-27
Jarno   Trulli  1974-07-13
Adrian  Sutil   1983-01-11
Mark    Webber  1976-08-27
...

# time 
# real 0m7.863s 
# user 0m6.783s 
# sys 0m0.355s

If we compare query times:

  • clickhouse-local: 0m2.348s
  • textsql: 0m4.314s
  • q: 0m7.863s

‘clickhouse-local’ is certainly much faster than tools built on top of SQLite. ClickHouse’s outstanding performance, rich SQL dialect, and plenty of input/output formats shine here.

Some other examples;

clickhouse-local --query='select *, rand() from numbers(20)'
0	2127369387
1	3261287878
2	2367728063
3	581934576
4	747639222
5	3155791826
6	993342505
7	328854968
8	3202341608
9	1789685972
10	832630616
11	348253160
12	3842952540
13	3441355260
14	749620731
15	2642293777
16	2127129424
17	79035446
18	1452862564
19	1060916590
clickhouse-local --query="select * from url('http://www.geoplugin.net/csv.gp?ip=3.3.3.3','CSV','property String,value String') limit 10" --format=PrettyCompact
┌─property─────────────┬─value──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ geoplugin_request    │ 3.3.3.3                                                                                                                                        │
│ geoplugin_status     │ 206                                                                                                                                            │
│ geoplugin_delay      │ 0ms                                                                                                                                            │
│ geoplugin_credit     │ Some of the returned data includes GeoLite data created by MaxMind available from <a href='http://www.maxmind.com'>http://www.maxmind.com</a>. │
│ geoplugin_city       │                                                                                                                                                │
│ geoplugin_region     │                                                                                                                                                │
│ geoplugin_regionCode │                                                                                                                                                │
│ geoplugin_regionName │                                                                                                                                                │
│ geoplugin_areaCode   │                                                                                                                                                │
│ geoplugin_dmaCode    │                                                                                                                                                │
└──────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Conclusion

This article talked about which scenarios you can set up clickhouse-local and perform operations with. It is for executing analytical SQL queries on CSV files and as a preprocessor in your shell scripts.

‘clickhouse-local’ handles tasks with outstanding performance leaving other tools created for that purpose far behind. Being a part of the clickhouse toolset ‘clickhouse-local’ provides seamless and efficient integration with ‘clickhouse-client’ using Native format.

You can start learning clickhouse with clickhouse-local and clickhouse-client.

To learn more about getting started with ClickHouse, do consider reading the following articles:

Sources and References:

About Can Sayn 36 Articles
Can Sayın is experienced Database Administrator in open source relational and NoSql databases, working in complicated infrastructures. Over 5 years industry experience, he gain managing database systems. He is working at ChistaDATA Inc. His areas of interest are generally on open source systems.
Contact: Website