Introduction
Would you like to run ClickHouse queries without a server or attached storage? If so, clickhouse-local may be exactly what you need.
In this article we introduce clickhouse-local, which turns the ClickHouse SQL query processor, together with its rich function library and speed, into a command-line utility. We will look at what clickhouse-local is, what it is good for, and which kinds of questions it can answer, with examples throughout.
According to the ClickHouse documentation, clickhouse-local is described as follows:
The clickhouse-local program enables you to perform fast processing on local files without deploying and configuring the ClickHouse server. clickhouse-local uses the same core as the ClickHouse server, supporting most of the features and the same set of formats and table engines. By default, clickhouse-local does not have access to data on the same host, but it supports loading server configuration using the --config-file argument.
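For example, to reuse an existing server's configuration you can point clickhouse-local at the server config file. This is only a minimal sketch: the path below is the typical Linux default and may differ on your installation.
# /etc/clickhouse-server/config.xml is the usual default path; adjust it for your system
clickhouse-local --config-file=/etc/clickhouse-server/config.xml --query="SELECT name FROM system.databases"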
clickhouse-client and clickhouse-local
clickhouse-client is a client application that is used to connect to ClickHouse from the command line. clickhouse-local is a client application that is used to query files on disk and across the network. Many of the guides in the ClickHouse documentation will have you examine the schema of a file (CSV, TSV, Parquet, etc.) with clickhouse-local, query the file, and even manipulate the data from the file in order to prepare it for insertion into ClickHouse. We will often have you query a file with clickhouse-local and pipe the output to clickhouse-client to stream the data into ClickHouse.
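A minimal sketch of that pattern, assuming a local file named data.csv and an existing destination table called my_table (both placeholders), could look like this:
# data.csv and my_table are hypothetical names; the Native format keeps the transfer compact
clickhouse-local --query="SELECT * FROM file('data.csv', CSVWithNames)" --format=Native \
  | clickhouse-client --query="INSERT INTO my_table FORMAT Native"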
Installation of ClickHouse with Homebrew
You can run the following command to install ClickHouse with Homebrew.
brew install clickhouse
If ClickHouse does not start automatically, you can start the ClickHouse service with the command below.
brew services start clickhouse
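You can confirm that the service is running with Homebrew's service manager:
brew services list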
Now let’s launch clickhouse-local and make sure it drops us into an interactive session.
root@a1b9dbb4b370:/# clickhouse-local
ClickHouse local version 22.9.3.18 (official build).

a1b9dbb4b370 :)
Now we can walk through a few examples of what clickhouse-local can do.
A simple example of clickhouse-local: output the total memory usage for each Unix user:
$ ps aux | tail -n +2 | awk '{ printf("%s\t%s\n", $1, $4) }' \
  | clickhouse-local --structure "user String, mem Float64" \
    --query "SELECT user, round(sum(mem), 2) as memTotal FROM table GROUP BY user ORDER BY memTotal DESC FORMAT Pretty"
┏━━━━━━━━━━┳━━━━━━━━━━┓
┃ user     ┃ memTotal ┃
┡━━━━━━━━━━╇━━━━━━━━━━┩
│ clickho+ │     22.4 │
├──────────┼──────────┤
│ root     │     15.6 │
├──────────┼──────────┤
│ ubuntu   │      0.8 │
├──────────┼──────────┤
│ systemd+ │      0.3 │
├──────────┼──────────┤
│ syslog   │      0.1 │
├──────────┼──────────┤
│ message+ │      0.1 │
├──────────┼──────────┤
│ daemon   │        0 │
└──────────┴──────────┘
Query a CSV file using clickhouse-local
Imagine you have a CSV file and want to extract something from that using the full power of SQL queries. Of course, you can insert that CSV file into ClickHouse (or some other DBMS), but for a one-time query, it sounds like overkill.
So how can ‘clickhouse-local’ help in that case? Let’s check.
First of all, you need to have ‘clickhouse-local’ installed. It is distributed as part of the ‘clickhouse-client’ package, so if ‘clickhouse-client’ is on your system, you already have ‘clickhouse-local’.
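You can quickly verify that the binary is available:
clickhouse-local --version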
clickhouse-local \
  --file=drivers.csv \
  --input-format=CSVWithNames \
  --table=drivers \
  --structure='driverId String, driverRef String, number String, forename String, surname String, dob String, nationality String, url String' \
  --query='SELECT forename, surname, dob FROM drivers GROUP BY forename, surname, dob ORDER BY dob FORMAT PrettyCompact'

┌─forename─┬─surname───┬─dob────────┐
│ Lewis    │ Hamilton  │ 1985-01-07 │
│ Fernando │ Alonso    │ 1981-07-29 │
│ Nico     │ Rosberg   │ 1985-06-27 │
...
│ Kimi     │ Räikkönen │ 1979-10-17 │
│ Felipe   │ Massa     │ 1981-04-25 │
│ Jenson   │ Button    │ 1980-01-19 │
...

# time
# real    0m2.348s
# user    0m2.146s
# sys     0m0.081s
In this example, we selected driver forenames, surnames, and dates of birth, ordered by date of birth. As you can see, clickhouse-local brings us the result without importing the CSV into a ClickHouse server.
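As a side note, recent ClickHouse releases can infer the schema of a CSV file that has a header row, so a sketch like the one below (reading the same drivers.csv) skips --structure entirely:
# schema inference reads the header row and samples the data to guess column types
clickhouse-local --query="SELECT forename, surname, dob FROM file('drivers.csv', CSVWithNames) ORDER BY dob LIMIT 5 FORMAT PrettyCompact"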
Let’s compare clickhouse-local with two popular SQLite-based alternatives: ‘q’ and ‘textql’.
‘textql’ is written in Go on top of SQLite. There is no binary distribution; it needs to be built from source. Somewhat atypically for the Unix world, it uses long parameter names with a single dash.
$ textql -header -sql='SELECT forename, surname, dob from drivers GROUP BY forename ORDER BY dob' /drivers.csv
Lewis Hamilton 1985-01-07
Nick Heidfeld 1977-05-10
Nico Rosberg 1985-06-27
Fernando Alonso 1981-07-29
Heikki Kovalainen 1981-10-19
Kazuki Nakajima 1985-01-11
Sébastien Bourdais 1979-02-28
Kimi Räikkönen 1979-10-17
Robert Kubica 1984-12-07
Timo Glock 1982-03-18
Takuma Sato 1977-01-28
Nelson Piquet Jr. 1985-07-25
Felipe Massa 1981-04-25
David Coulthard 1971-03-27
Jarno Trulli 1974-07-13
Adrian Sutil 1983-01-11
Mark Webber 1976-08-27
...

# time
# real    0m4.314s
# user    0m4.471s
# sys     0m0.111s
Let’s try the same with ‘q’. It is also built on top of SQLite, this time using Python.
$ cat /drivers.csv | q -H 'SELECT forename, surname, dob from drivers GROUP BY forename ORDER BY dob' -d,
Lewis Hamilton 1985-01-07
Nick Heidfeld 1977-05-10
Nico Rosberg 1985-06-27
Fernando Alonso 1981-07-29
Heikki Kovalainen 1981-10-19
Kazuki Nakajima 1985-01-11
Sébastien Bourdais 1979-02-28
Kimi Räikkönen 1979-10-17
Robert Kubica 1984-12-07
Timo Glock 1982-03-18
Takuma Sato 1977-01-28
Nelson Piquet Jr. 1985-07-25
Felipe Massa 1981-04-25
David Coulthard 1971-03-27
Jarno Trulli 1974-07-13
Adrian Sutil 1983-01-11
Mark Webber 1976-08-27
...

# time
# real    0m7.863s
# user    0m6.783s
# sys     0m0.355s
If we compare query times:
- clickhouse-local: 0m2.348s
- textql: 0m4.314s
- q: 0m7.863s
‘clickhouse-local’ is clearly much faster than tools built on top of SQLite. ClickHouse’s outstanding performance, rich SQL dialect, and wide range of input/output formats shine here.
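Those formats also make clickhouse-local handy as a lightweight file converter. A small sketch, reusing the drivers.csv file from above (the output file name is arbitrary), rewrites the CSV as Parquet:
# read the CSV with inferred schema and write Parquet to stdout, redirected to a file
clickhouse-local --query="SELECT * FROM file('drivers.csv', CSVWithNames) FORMAT Parquet" > drivers.parquet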
Some other examples:
clickhouse-local --query='select *, rand() from numbers(20)'
0   2127369387
1   3261287878
2   2367728063
3   581934576
4   747639222
5   3155791826
6   993342505
7   328854968
8   3202341608
9   1789685972
10  832630616
11  348253160
12  3842952540
13  3441355260
14  749620731
15  2642293777
16  2127129424
17  79035446
18  1452862564
19  1060916590
clickhouse-local --query="select * from url('http://www.geoplugin.net/csv.gp?ip=3.3.3.3','CSV','property String,value String') limit 10" --format=PrettyCompact

┌─property─────────────┬─value─────┐
│ geoplugin_request    │ 3.3.3.3   │
│ geoplugin_status     │ 206       │
│ geoplugin_delay      │ 0ms       │
│ geoplugin_credit     │ Some of the returned data includes GeoLite data created by MaxMind available from <a href='http://www.maxmind.com'>http://www.maxmind.com</a>. │
│ geoplugin_city       │           │
│ geoplugin_region     │           │
│ geoplugin_regionCode │           │
│ geoplugin_regionName │           │
│ geoplugin_areaCode   │           │
│ geoplugin_dmaCode    │           │
└──────────────────────┴───────────┘
Conclusion
This article covered the scenarios in which clickhouse-local is useful: executing analytical SQL queries directly on files such as CSVs, and acting as a preprocessor in your shell scripts.
‘clickhouse-local’ handles these tasks with outstanding performance, leaving other tools created for the same purpose far behind. As part of the ClickHouse toolset, ‘clickhouse-local’ also integrates seamlessly and efficiently with ‘clickhouse-client’ via the Native format.
clickhouse-local and clickhouse-client are a great place to start learning ClickHouse.