Sharding in ClickHouse – Part 1

Sharding is splitting a large table horizontally (row-wise) and storing it on multiple servers. ClickHouse uses the Distributed table engine to process sharded tables. Shards can be internally replicated or non-replicated in […]

Data Replication in ClickHouse (Docker Based Setup)

Data replication is the process of storing multiple copies of data to ensure system reliability and improve data availability. ClickHouse supports multi-primary replication, and it is asynchronous (eventually consistent). Every MergeTree table […]

Tuples in ClickHouse

Tuples are collections of items with heterogeneous data types. ClickHouse tuples are used with IN operators and in lambda functions. A tuple cannot be empty; it needs at least one element. Tuples […]

Ingesting Data from a Kafka topic

Apache Kafka is a distributed event streaming platform developed by the Apache Software Foundation. Visit the official page before proceeding for a detailed introduction to the basics of Kafka. The installation instructions are […]

Arrays in ClickHouse

Arrays are collections of items of the same data type. ClickHouse supports arrays as a column data type. The maximum allowed size of a ClickHouse array is 1 million elements. An array can hold elements entirely of […]

OPTIMIZE statement

In the MergeTree family of engines, data is stored in multiple parts, and the parts are merged asynchronously in the background. Refer to our article on MergeTree storage and […]

ReplacingMergeTree Engine in ClickHouse

Multiple heavyweight table engines and functionalities of ClickHouse are built on top of the MergeTree engine. The MergeTree engine supports a PRIMARY KEY expression, but it is not the same as the primary […]
