Introduction
JSON refers to Javascript Object Notation. It is a popular text data format used to exchange data on modern web and mobile applications. It is based on a subset of the Javascript programming language. It is used to store random data in log files or NoSQL.
It also enables us to integrate NoSQL and related concepts into the same database.
JSON Data Types
JSON supports six data types, which are divided into two classes. These classes are:
- Simple/Primitive Data Types
- Complex Data Types
Primitive data types;
- String
- Number
- Boolean
- Null
Complex data types;
- Object
- Array
String
The string data type stores text-based data. A string is enclosed within double quotation marks " "
.
For example, the following snippet of code contains a JSON object with a property called company_name
, and its value is a “ChistaData”, a string.
{"company_name" : "ChistaData"}
Number
The number data type is used to store any numeric data in the form of an integer of float. It supports base-10 numbers only.
For example, the following code shows a JSON object with a property named year
which value is 2022.
{"year" : 2022}
Boolean
The Boolean data type stores data that can be either true or false.
For example, the following code shows a JSON object with a property named account
which value is true
.
{"account": true}
Null
The null
data type is used whenever we have a property, but the value against it is undefined. The null
data type is used in place of any undefined data.
For example, the following code blog shows a JSON object that holds the user’s primary and phone number.
{ "primary_email" : "can.sayn@chistadata.com", "phone_number" : null }
Arrays
Arrays are a collection of similar data. An array is used to store multiple values of similar data against a single property.
An array is enclosed within square brackets and must contain the same type of data.
{"employee_names" : ["Shiv", "Alkin", "Can"]}
Object
An object is a collection of name or value pairs that are in between curly brackets.
For example, the following snippet shows an object that contains a student’s name and his scores in different subjects. The scores are stored in a nested object.
{ "name" : "Can", "skill" : {"administration" : 1, "support" : 2, "consulting" : 3} }
How to Import JSON Into ClickHouse?
To more information please visit ClickHouse docs from here.
ClickHouse supports a wide range of data formats for input and output. There are multiple JSON variations among them, but the most commonly used for data ingestion is JSONEachRow. It expects one JSON object per row, each object separated by a newline.
Examples
Using HTTP interface:
echo '{"foo":"bar"}' | curl 'http://localhost:8123/?query=INSERT%20INTO%20test%20FORMAT%20JSONEachRow' --data-binary @-
Using CLI interface:
echo '{"foo":"bar"}' | clickhouse-client --query="INSERT INTO test FORMAT JSONEachRow"
Useful Settings
input_format_skip_unknown_fields
allows inserting JSON even if there were additional fields not present in table schema (by discarding them).
Example ;
clickhouse-client -n --query="SET input_format_skip_unknown_fields=1; INSERT INTO event FORMAT JSONEachRow;"
input_format_import_nested_json
allows inserting nested JSON objects into columns of Nested type.
{"@timestamp": 86295847, "hostip":"111.22.22.22", "clientip":"192.168.2.8", "request": "GET /image/interstaller.jpg", "number": 47 } {"@timestamp": 86295847, "hostip":"111.22.22.22", "clientip":"192.168.2.47", "request": "GET /image/fastandfurious.png", "number": 1327 }
Now, create a table in ClickHouse that fields are related to JSON data.
CREATE TABLE table_log_file ( '@timestamp' DateTime, 'hostip' IPv4, 'clientip' IPv4, 'request' String, 'number' UInt32 ) ENGINE = MergeTree PARTITION BY toStartOfDay('@timestamp') ORDER BY '@timestamp' SETTINGS index_granularity = 8192
Then, upload the log file into ClickHouse;
clickhouse-client --query 'INSERT INTO table_log_file FORMAT JSONEachRow' < log_file
In ClickHouse, you can get a JSON string and integer by running the following commands
String;
SELECT JSONExtractString (message, 'request') AS request FROM table_log_file limit 5
Int;
SELECT JSONExtractINT (message, 'number') AS number FROM table_log_file limit 5
Now let’s give an example with the same table for reporting via aggregate function. When we say aggregation operations, we understand that documents are grouped according to certain fields and some filtering or calculation operations are performed with the help of these groups. To calculate the sum of units in number by request, you use the SUM function with the GROUP BY clause as the following query:
SELECT request, sum(number) FROM table_log_file GROUP BY request;
Conclusion
JSON is a frequently preferred file format to be used in different database systems for different situations. In this blog, I talked about the JSON file format and data types. At the same time, we gave some examples of operations that can be done with JSON over the ClickHouse database. ClickHouse contains commands that make it easy for the user to operate with JSON format. In this way, you can process your information in JSON file format with ClickHouse, analyze and report.
To read more about JSON in ClickHouse, do consider reading the following articles: