☛ What is ClickHouse?

ClickHouse is an open-source columnar database management system, developed by ClickHouse, Inc., built for web-scale, real-time analytics on streaming data using SQL queries. It can deliver horizontally scalable, fault-tolerant and highly available analytics for large internet and mobile properties and for the Internet of Things (IoT). Its columnar storage format is efficient on modern hardware and allows more hot data to fit in RAM, which leads to shorter response times. ChistaDATA provides 24/7 ClickHouse Consulting and Support to deliver scalable and highly available web-scale data analytics platforms.

☛ Quick facts on ClickHouse

  • Open-source column-store project from ClickHouse, Inc.
  • Built for massively parallel processing – Large, complex queries run in parallel with little or no extra effort, making full use of modern hardware.
  • Data compression – ClickHouse compresses stored data, which reduces disk I/O and improves query performance.
  • Horizontally scalable columnar database system – ClickHouse is built for web-scale data analytics. Data can be sharded across several ClickHouse servers and each shard can be replicated, so analytical queries are distributed across the cluster.
  • In ClickHouse, data is not just stored by columns but is also processed by vectors to achieve high CPU performance.
  • Web-scale data analytics-ready – Primary keys are supported. Because data is physically sorted by primary key, the data for a specific client (for example, a Metrica counter) over a specific time range can be extracted with low latency.
  • Flexible aggregation – Aggregate functions support approximate calculation, and queries can run on a part (sample) of the data to minimize data retrieval. Aggregation can also be limited to a random subset of keys instead of all keys, trading some accuracy for much lower resource use.
  • Maximum availability and self-healing – Asynchronous multi-master replication with auto-failover capabilities.
  • SQL-based – ClickHouse supports SQL, including JOINs and subqueries in FROM, IN and JOIN clauses, as well as scalar subqueries. Correlated subqueries are not supported.
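
The approximate aggregation idea from the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name approximate_sum, the 10% sample rate and the generated data are hypothetical, and the sketch uses simple random sampling rather than whatever mechanism ClickHouse applies internally.

```python
import random

def approximate_sum(values, sample_rate, seed=0):
    """Estimate sum(values) by reading only about sample_rate of the rows."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sampled = [v for v in values if rng.random() < sample_rate]
    # Scale the sampled total back up by the sampling rate.
    return sum(sampled) / sample_rate, len(sampled)

# Hypothetical column of 100,000 values.
values = [i % 100 for i in range(100_000)]

exact = sum(values)
estimate, rows_read = approximate_sum(values, sample_rate=0.1)
relative_error = abs(estimate - exact) / exact

print(f"exact={exact} estimate={estimate:.0f} "
      f"rows_read={rows_read} error={relative_error:.2%}")
```

The estimate is computed from roughly one tenth of the rows, which is the trade-off the bullet describes: less data read in exchange for a small, bounded loss of accuracy.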

☛ ColumnStore and Row-Based Database Management Systems – Why it’s better to use ColumnStores for SORT/SEARCH intensive Analytics Operations
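
A minimal Python sketch can make the difference between the two layouts concrete. It is not ClickHouse code: the field names (event_id, user_id, amount) and the generated data are hypothetical, and Python's array module merely stands in for a contiguous, typed column.

```python
from array import array

# Row-oriented layout: one tuple per record (event_id, user_id, amount).
rows = [(i, i % 7, float(i % 100)) for i in range(10_000)]

# Column-oriented layout: one contiguous typed array per field.
columns = {
    "event_id": array("q", (r[0] for r in rows)),
    "user_id": array("q", (r[1] for r in rows)),
    "amount": array("d", (r[2] for r in rows)),
}

def total_amount_rows(rows):
    # Every record is visited even though only one field is needed.
    return sum(r[2] for r in rows)

def total_amount_columns(columns):
    # Only the "amount" column is read; the other columns are never touched.
    return sum(columns["amount"])

# Bytes an analytical query over "amount" has to scan in each layout.
bytes_scanned_columnar = columns["amount"].itemsize * len(columns["amount"])
bytes_all_columns = sum(c.itemsize * len(c) for c in columns.values())

print(total_amount_rows(rows), total_amount_columns(columns))
print(bytes_scanned_columnar, bytes_all_columns)
```

Both functions return the same total, but the columnar version scans one third of the bytes in this three-field example. The gap widens as tables gain more columns, which is why scan-heavy analytics favour column stores.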

☛ Why is ClickHouse recommended for a time-series Database?

ClickHouse is a column-oriented, distributed relational database management system designed for OLAP (Online Analytical Processing) workloads rather than OLTP (Online Transaction Processing). It is particularly well-suited to time-series data analysis because it handles large volumes of data, sustains high write and read throughput, and supports advanced analytical functions. Here are some of the reasons why ClickHouse is recommended for time-series data:

  • Column-oriented storage: ClickHouse uses a column-oriented storage model, which means that data is stored by columns rather than by rows. This allows for efficient compression and faster data retrieval, especially for time-series data, where the data is often read in time-based chunks.
  • Advanced analytical functions: ClickHouse supports advanced analytical functions such as window functions, aggregate functions, and SQL-based data filtering, which are useful for time-series data analysis. This allows users to perform complex queries on large data sets quickly and efficiently.
  • Real-time query performance: ClickHouse is designed for high write and read throughput, making it suitable for real-time data analysis. It can ingest millions of rows per second and, for many queries, return results in milliseconds even on large datasets.
  • Scalability: ClickHouse is a distributed system, which means that it can scale horizontally by adding more servers. This allows it to handle very large data sets and high write and read loads.
  • Compression: ClickHouse supports advanced compression techniques, which can significantly reduce the size of the data stored on disk, making it more cost-efficient for storing large data sets.
  • High Availability: ClickHouse supports high availability through replication. It allows data to be replicated across multiple servers, which can help to ensure that data is always available even in the event of a server failure.

In summary, ClickHouse’s column-oriented storage, advanced analytical functions, real-time query performance, scalability, compression and high availability features make it a suitable choice for time-series data analysis and data warehousing.
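
To illustrate why a column of time-series data compresses so well, here is a small Python sketch. The assumptions are deliberate: zlib from the standard library stands in for a general-purpose codec (it is not one of the codecs ClickHouse uses), the helper names are hypothetical, and the timestamps are generated rather than real measurements.

```python
import struct
import zlib

def pack_int64(values):
    """Serialize integers as little-endian 64-bit values, like a fixed-width column."""
    return struct.pack(f"<{len(values)}q", *values)

def delta_encode(values):
    """Keep the first value, then store the difference between neighbours."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    out, running = [], 0
    for d in deltas:
        running += d
        out.append(running)
    return out

# Hypothetical series: one reading every 10 seconds.
timestamps = [1_700_000_000 + 10 * i for i in range(50_000)]

raw = pack_int64(timestamps)
plain = zlib.compress(raw)                                     # codec only
delta = zlib.compress(pack_int64(delta_encode(timestamps)))    # delta, then codec

print(len(raw), len(plain), len(delta))
```

Because all values of one column sit next to each other, regular timestamps turn into a long run of identical deltas, which a general-purpose codec shrinks dramatically. A row-oriented layout would interleave those timestamps with unrelated fields and lose most of that benefit.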

☛ Why do we recommend ClickHouse over many other columnar database systems?

  • Compact data storage – Ten billion UInt8-type values should consume around 10 GB uncompressed; anything more wastes CPU and memory bandwidth. Storing values compactly, with no extra "garbage" alongside them, benefits performance and resource management even when the data is uncompressed. ClickHouse is built to store data this efficiently.
  • CPU efficient – Whenever possible, ClickHouse operations are dispatched on arrays, rather than on individual values. This is called “vectorized query execution,” and it helps lower the cost of actual data processing.
  • Data compression – ClickHouse supports two main general-purpose compression codecs, LZ4 and ZSTD. LZ4 is faster than ZSTD but achieves a lower compression ratio. ZSTD is faster and compresses better than traditional zlib, but is slower than LZ4. We recommend LZ4 when I/O is fast enough that decompression speed becomes the bottleneck. With extremely fast disk subsystems you also have the option to specify “none” and disable compression. ZSTD is recommended when I/O is the bottleneck, as in queries with large range scans.
  • Can store data on disk – Some columnar database systems, such as SAP HANA and Google PowerDrill, can only work in RAM.
  • Massively Parallel Processing – ClickHouse can process very large, complex SQL queries in parallel, efficiently and cost-effectively.
  • Built for web-scale data analytics – ClickHouse supports sharding and distributed query processing, which makes it a strong choice among columnar database systems for web-scale workloads. Each shard in ClickHouse can be a group of replicas, which improves reliability and fault tolerance.
  • ClickHouse supports primary keys – ClickHouse permits real-time data updates with a primary key (there is no locking when adding data). Data is sorted incrementally using the merge tree, so queries on a range of primary key values are efficient.
  • Built for statistical analysis and supporting partial aggregation – ClickHouse provides aggregate functions for the approximate calculation of the number of distinct values, medians, and quantiles. It supports aggregation over a limited number of random keys instead of all keys. You can also query a part (sample) of the data and generate approximate results, which reduces disk I/O considerably.
  • Supports SQL – ClickHouse supports SQL. Subqueries are supported in FROM, IN, and JOIN clauses, as are scalar subqueries. Dependent subqueries are not supported.
  • Supports data replication – ClickHouse supports asynchronous multi-master and master-slave replication.
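
The primary-key point in the list above can be sketched as a sparse index over sorted data. This is a simplified illustration, not ClickHouse's implementation: the granule size of 4, the function names and the sample keys are all hypothetical.

```python
from bisect import bisect_left, bisect_right

GRANULE_SIZE = 4  # hypothetical; kept tiny so the example is easy to follow

def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Record the first key of every granule (block of consecutive rows)."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]

def rows_to_scan(sparse_index, total_rows, low, high, granule_size=GRANULE_SIZE):
    """Return the [start, stop) row range that may hold keys in [low, high]."""
    # The granule just before the first mark >= low may still contain matches.
    first = max(bisect_left(sparse_index, low) - 1, 0)
    # Granules whose first key is greater than high cannot contain matches.
    last = bisect_right(sparse_index, high)
    return first * granule_size, min(last * granule_size, total_rows)

def range_query(sorted_keys, sparse_index, low, high):
    """Scan only the candidate granules and filter rows inside them."""
    start, stop = rows_to_scan(sparse_index, len(sorted_keys), low, high)
    matches = [k for k in sorted_keys[start:stop] if low <= k <= high]
    return matches, (start, stop)

# Hypothetical table: 50 rows sorted by primary key.
keys = list(range(0, 100, 2))
index = build_sparse_index(keys)

result, scanned = range_query(keys, index, low=20, high=30)
print(result, scanned)
```

Because the rows are kept sorted, the index needs only one entry per granule, and a range query touches just the granules that can contain matching keys instead of the whole table.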

☛ ClickHouse comparison with Teradata




| Feature | ClickHouse | Teradata |
| --- | --- | --- |
| Database type | Columnar database | Relational database |
| Data compression | Built-in data compression for efficient storage | Limited data compression options |
| Query performance | Extremely fast, designed for high-speed analytics | Fast, but may struggle with very large datasets |
| Query language | SQL-like language called ClickHouse SQL | |
| Data ingestion | Can handle high-volume, real-time data ingestion | Can handle high-volume data ingestion |
| Cost | Open-source and free, with commercial support available | Proprietary software with licensing fees and additional costs |
| Scalability | Designed to scale horizontally across commodity hardware | Designed to scale vertically across specialized hardware |
| Ease of use | User-friendly interface and easy to set up | Requires specialized knowledge and training to set up and use effectively |
| Use cases | Best for real-time analytics and data warehousing | Best for large-scale data warehousing and business intelligence |

☛ ClickHouse comparison with Hadoop




| Feature | ClickHouse | Hadoop |
| --- | --- | --- |
| Data storage | Columnar storage for efficient compression and query performance | Hadoop Distributed File System (HDFS) |
| Query performance | Extremely fast, designed for high-speed analytics | Slower than ClickHouse, especially with complex queries |
| Query language | SQL-like language called ClickHouse SQL | Hive Query Language (HiveQL), through Apache Hive |
| Data processing | Designed for OLAP (online analytical processing) workloads | Designed for batch-oriented processing of large datasets |
| Data ingestion | Can handle high-volume, real-time data ingestion | Designed for batch processing of historical data; real-time ingestion needs additional tooling |
| Cost | Open-source and free, with commercial support available | Open-source and free, but may require additional hardware and infrastructure costs |
| Scalability | Designed to scale horizontally across commodity hardware | Designed to scale horizontally across commodity hardware |
| Ease of use | User-friendly interface and easy to set up | Requires specialized knowledge and training to set up and use effectively |
| Use cases | Best for real-time analytics and data warehousing | Best for batch processing, ETL (extract, transform, load), and data warehousing |

