1. Home
  2. Knowledge Base
  3. ClickHouse
  4. Batch inserts vs individual row inserts in ClickHouse
  1. Home
  2. Knowledge Base
  3. ClickHouse Cloud
  4. Batch inserts vs individual row inserts in ClickHouse
  1. Home
  2. Knowledge Base
  3. ClickHouse Performance
  4. Batch inserts vs individual row inserts in ClickHouse

Batch inserts vs individual row inserts in ClickHouse

ClickHouse developers recommend batch data ingestion for ClickHouse. While transactional databases can handle multiple insert requests efficiently and fast, ClickHouse is little different. For MergeTree family engines,  the data inserted is stored first in unmerged data parts and then sorted and merged. Inserting the records row by row is a costly operation because it requires multiple data writes, sorts and updates in the primary key and the mark files. Let us verify this using a Python code.

Pre-Requisites

Example table

CREATE TABLE insert_benchmark
(
ID Int64,
Value Float64,
Count Int64
)
Engine = MergeTree
ORDER BY ID;

Row by Row Insert

The below Python code snippet can generate random data and insert it row by row in to the ClickHouse table.

from clickhouse_driver import Client
import random

client = Client(host='localhost', user='default', password='123456', port='9000', 
                            send_receive_timeout=600, secure=False )

insert_query = "INSERT INTO insert_benchmark VALUES "

# Random data for insertion

data = [[random.randint(0, 10000000), random.uniform(0.0, 100000.0), random.randint(0, 10000000)] for x in range(100000)]

# row insert
for row in data:
    client.execute(insert_query, [row])

client.disconnect()

Batch Insert

The below Python code snippet can generate random data and insert it in a single batch to the ClickHouse table.

from clickhouse_driver import Client
import random

client = Client(host='localhost', user='default', password='123456', port='9000', 
                            send_receive_timeout=600, secure=False )

insert_query = "INSERT INTO insert_benchmark VALUES "

data = [[random.randint(0, 10000000), random.uniform(0.0, 100000.0), random.randint(0, 10000000)] for x in range(100000)]

client.execute(insert_query, data)
client.disconnect()

You can verify the speed difference when you run these on your ClickHouse setup.

 

References

https://clickhouse-driver.readthedocs.io/en/latest/quickstart.html#inserting-data

Was this article helpful?

Related Articles

CHISTADATA IS COMMITTED TO OPEN SOURCE SOFTWARE AND BUILDING HIGH PERFORMANCE COLUMNSTORES

In the spirit of freedom, independence and innovation. ChistaDATA Corporation is not affiliated with ClickHouse Corporation 

Need Support?

Can't find the answer you're looking for?
Contact Support

ChistaDATA Inc. Knowledge base is licensed under the Apache License, Version 2.0 (the “License”)

Copyright 2022 ChistaDATA Inc

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.