ClickHouse developers recommend batch data ingestion for ClickHouse. While transactional databases can handle many small insert requests efficiently, ClickHouse works a little differently. For MergeTree-family engines, each insert is first written as a new, unmerged data part, which is later sorted and merged in the background. Inserting records row by row is costly because every insert creates its own part, each requiring separate data writes, sorts, and updates to the primary key index and mark files. Let us verify this using Python code.
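As a rough analogy in plain Python (this is not ClickHouse internals, just an illustration of the cost difference), re-establishing sorted order after every single insert does far more work than collecting everything first and sorting once, which is essentially what a batch insert lets MergeTree do:

```python
import bisect
import random

rows = [random.randint(0, 10_000_000) for _ in range(10_000)]

# Row-by-row: keep the collection sorted after every insert
one_by_one = []
for r in rows:
    bisect.insort(one_by_one, r)  # O(n) element shifting per insert

# Batch: collect everything, then sort once
batched = sorted(rows)  # O(n log n) overall

assert one_by_one == batched  # same result, very different amount of work
```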
Pre-Requisites
- Python 3.x
- clickhouse-driver (the driver used in the examples below)
- ClickHouse Server – You can sign up for a fully managed ClickHouse server here
Example table
```sql
CREATE TABLE insert_benchmark
(
    ID Int64,
    Value Float64,
    Count Int64
)
ENGINE = MergeTree
ORDER BY ID;
```
Row by Row Insert
The Python snippet below generates random data and inserts it row by row into the ClickHouse table.
```python
from clickhouse_driver import Client
import random

client = Client(
    host='localhost',
    user='default',
    password='123456',
    port='9000',
    send_receive_timeout=600,
    secure=False,
)

insert_query = "INSERT INTO insert_benchmark VALUES"

# Generate 100,000 random rows matching the table schema
data = [
    [random.randint(0, 10000000), random.uniform(0.0, 100000.0), random.randint(0, 10000000)]
    for _ in range(100000)
]

# Insert one row per call -- a separate round trip and data part per row
for row in data:
    client.execute(insert_query, [row])

client.disconnect()
```
Batch Insert
The Python snippet below generates the same random data and inserts it into the ClickHouse table in a single batch.
```python
from clickhouse_driver import Client
import random

client = Client(
    host='localhost',
    user='default',
    password='123456',
    port='9000',
    send_receive_timeout=600,
    secure=False,
)

insert_query = "INSERT INTO insert_benchmark VALUES"

# Generate 100,000 random rows matching the table schema
data = [
    [random.randint(0, 10000000), random.uniform(0.0, 100000.0), random.randint(0, 10000000)]
    for _ in range(100000)
]

# Insert all rows in a single round trip
client.execute(insert_query, data)

client.disconnect()
Time both scripts on your own ClickHouse setup to verify the speed difference; the batch insert typically finishes far faster than the row-by-row version.
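When the dataset is too large to buffer in a single list, a common middle ground is to insert in fixed-size chunks. A minimal sketch, where the `chunked` helper is our own (it is not part of clickhouse-driver):

```python
def chunked(rows, size):
    """Yield successive fixed-size batches from a list of rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

# Example: 10 rows split into batches of 3 -> batch sizes 3, 3, 3, 1
batches = list(chunked(list(range(10)), 3))

# Each batch would then go to one client.execute(insert_query, batch) call,
# giving one data part per chunk instead of one per row.
```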
References
https://clickhouse-driver.readthedocs.io/en/latest/quickstart.html#inserting-data