Demystifying JSON Data With ClickHouse

Introduction

JSON refers to Javascript Object Notation. It is a popular text data format used to exchange data on modern web and mobile applications. It is based on a subset of the Javascript programming language. It is used to store random data in log files or NoSQL.

It also enables us to integrate NoSQL and related concepts into the same database.

JSON Data Types

JSON supports six data types, which are divided into two classes. These classes are:

  • Simple/Primitive Data Types
  • Complex Data Types

Primitive data types;

  • String
  • Number
  • Boolean
  • Null

Complex data types;

  • Object
  • Array
JSON

String

The string data type stores text-based data. A string is enclosed within double quotation marks " ".

For example, the following snippet of code contains a JSON object with a property called company_name, and its value is a “ChistaData”, a string.

{"company_name" : "ChistaData"}

Number

The number data type is used to store any numeric data in the form of an integer of float. It supports base-10 numbers only.

For example, the following code shows a JSON object with a property named year which value is 2022.

{"year" : 2022}

Boolean

The Boolean data type stores data that can be either true or false.

For example, the following code shows a JSON object with a property named account which value is true.

{"account": true}

Null

The null data type is used whenever we have a property, but the value against it is undefined. The null data type is used in place of any undefined data.

For example, the following code blog shows a JSON object that holds the user’s primary and phone number.

{
"primary_email" : "can.sayn@chistadata.com",
"phone_number" : null
}

Arrays

Arrays are a collection of similar data. An array is used to store multiple values of similar data against a single property.

An array is enclosed within square brackets and must contain the same type of data.

{"employee_names" : ["Shiv", "Alkin", "Can"]}

Object

An object is a collection of name or value pairs that are in between curly brackets.

For example, the following snippet shows an object that contains a student’s name and his scores in different subjects. The scores are stored in a nested object.

{
"name" : "Can",
"skill" : {"administration" : 1, "support" : 2, "consulting" : 3}
}

How to Import JSON Into ClickHouse?

To more information please visit ClickHouse docs from here.

ClickHouse supports a wide range of data formats for input and output. There are multiple JSON variations among them, but the most commonly used for data ingestion is JSONEachRow. It expects one JSON object per row, each object separated by a newline.

Examples

Using HTTP interface:

echo '{"foo":"bar"}' | curl 'http://localhost:8123/?query=INSERT%20INTO%20test%20FORMAT%20JSONEachRow' --data-binary @-


Using CLI interface:

echo '{"foo":"bar"}'  | clickhouse-client --query="INSERT INTO test FORMAT JSONEachRow"

Useful Settings

  • input_format_skip_unknown_fields allows inserting JSON even if there were additional fields not present in table schema (by discarding them).

Example ;

clickhouse-client -n --query="SET input_format_skip_unknown_fields=1; INSERT INTO event FORMAT JSONEachRow;"
  • input_format_import_nested_json allows inserting nested JSON objects into columns of Nested type.
 
Let’s give an example of how we can upload data into ClickHouse with a real JSON file.
We have a file which is called log_file.json and it includes example data which is given below;
 
 
{"@timestamp": 86295847, "hostip":"111.22.22.22", "clientip":"192.168.2.8", "request": "GET /image/interstaller.jpg",
"number": 47 }
{"@timestamp": 86295847, "hostip":"111.22.22.22", "clientip":"192.168.2.47", "request": "GET /image/fastandfurious.png",
"number": 1327 }

Now, create a table in ClickHouse that fields are related to JSON data.

CREATE TABLE table_log_file
(
    '@timestamp' DateTime,
    'hostip' IPv4,
    'clientip' IPv4,
    'request' String,
    'number' UInt32
)
ENGINE = MergeTree
PARTITION BY toStartOfDay('@timestamp')
ORDER BY '@timestamp'
SETTINGS index_granularity = 8192

Then, upload the log file into ClickHouse;

clickhouse-client --query 'INSERT INTO table_log_file FORMAT JSONEachRow' < log_file

In ClickHouse, you can get a JSON string and integer by running the following commands

String;

SELECT JSONExtractString (message, 'request') AS request
FROM table_log_file limit 5

Int;

SELECT JSONExtractINT (message, 'number') AS number
FROM table_log_file limit 5

Now let’s give an example with the same table for reporting via aggregate function. When we say aggregation operations, we understand that documents are grouped according to certain fields and some filtering or calculation operations are performed with the help of these groups. To calculate the sum of units in number by request, you use the SUM function with the GROUP BY clause as the following query:

SELECT 
request, sum(number)
FROM table_log_file
GROUP BY request;

Conclusion

JSON is a frequently preferred file format to be used in different database systems for different situations. In this blog, I talked about the JSON file format and data types. At the same time, we gave some examples of operations that can be done with JSON over the ClickHouse database. ClickHouse contains commands that make it easy for the user to operate with JSON format. In this way, you can process your information in JSON file format with ClickHouse, analyze and report.

To read more about JSON in ClickHouse, do consider reading the following articles: 

About Can Sayn 36 Articles
Can Sayın is experienced Database Administrator in open source relational and NoSql databases, working in complicated infrastructures. Over 5 years industry experience, he gain managing database systems. He is working at ChistaDATA Inc. His areas of interest are generally on open source systems.
Contact: Website