Sharding in ClickHouse: Part 1

Sharding splits a large table horizontally (row-wise) and stores it across multiple servers. ClickHouse uses the Distributed table engine to process sharded tables. Shards can be internally replicated or non-replicated in ClickHouse. Sharding […]

Data Replication in ClickHouse (Docker Based Setup)

Data replication is the process of storing multiple copies of data to ensure system reliability and improve data availability. ClickHouse supports multi-primary replication, which is asynchronous (eventually consistent). Every MergeTree table […]

Tuples in ClickHouse

Tuples are collections of items with heterogeneous data types. ClickHouse tuples are used with the IN operator and in lambda functions. A tuple cannot be empty; it needs at least one element. Tuples […]

How to Ingest Data from a Kafka Topic in ClickHouse

Apache Kafka is a distributed event streaming platform developed by the Apache Software Foundation. For a detailed introduction to the basics of Kafka, visit the official page before proceeding. The installation instructions are available here. ClickHouse […]

Arrays in ClickHouse

Arrays are collections of items of similar data types. ClickHouse supports arrays as a column data type. The maximum allowed size of a ClickHouse array is 1 million elements. An array can hold elements entirely of […]

OPTIMIZE statement

In the MergeTree family of engines, data is stored in multiple parts, and those parts are merged asynchronously in the background. Refer to our article on MergeTree storage and […]

ClickHouse July 2022 Release – v22.7

ClickHouse version 22.7 (July 2022) was released on 21 July 2022. This release has around 25 new features, 19 performance improvements, 40+ other improvements, and 50+ bug fixes. Here is the official list […]
