Benchmarking ClickHouse using the clickhouse-benchmark Tool

Introduction

ClickHouse is a popular open-source analytical database management system that is designed to handle massive amounts of data. It has been adopted by many companies for its excellent performance, low latency, and scalability. However, to ensure optimal performance, benchmarking ClickHouse is essential.

Benchmarking is the process of testing the performance of a system under different workloads and conditions to identify potential issues, bottlenecks, and areas for improvement. In the case of ClickHouse, benchmarking can help users determine the hardware and software configurations needed to achieve optimal performance for specific use cases.

How to benchmark ClickHouse

Here are some critical steps to benchmark ClickHouse:

  1. Identify the benchmarking goals: Before starting the benchmarking process, it is essential to identify the goals of the benchmark. What are the questions that need to be answered? What are the performance metrics that need to be measured? The goals could be anything from measuring query performance to testing the system’s ability to handle high volumes of data.
  2. Set up the test environment: The test environment should mimic the production environment as closely as possible. This includes the hardware configuration, software versions, and network setup. Using a dedicated test environment is recommended to avoid interference with the production system.
  3. Generate test data: Generating test data that closely resembles the production data is crucial for accurate benchmarking. This can be done with ClickHouse’s generateRandom table function or with third-party tools such as dbgen from the TPC-H toolkit. The size and complexity of the test data should match the production data to get accurate results.
  4. Run the benchmark tests: There are several types of benchmark tests that can be run on ClickHouse. Some of the most common ones are query performance tests, ingestion tests, and system scalability tests. These tests should be run multiple times to ensure consistent results.
  5. Analyze the results: After running the tests, the results should be analyzed to identify any issues, bottlenecks, or areas for improvement. The performance metrics to be analyzed include query execution time, ingestion rate, disk utilization, and memory usage.
  6. Optimize the system: Based on the analysis of the results, the system should be optimized to improve performance. This can involve tweaking the hardware configuration, adjusting the software settings, or fine-tuning the queries.

Advantages of ClickHouse Benchmarking

There are several advantages to benchmarking ClickHouse, including the following:

  1. Improved Performance: Benchmarking allows users to identify bottlenecks and areas for improvement, which can lead to better performance. Optimizing the system based on benchmarking results allows users to achieve faster query execution times, higher ingestion rates, and improved overall system scalability.
  2. Better Resource Utilization: Benchmarking can help users identify how resources such as CPU, memory, and disk are being utilized by the system. This can help users allocate resources more efficiently and optimize the system for better performance.
  3. Reduced Downtime: By identifying potential issues and bottlenecks through benchmarking, users can address them proactively, before they cause failures in production, and keep the system running smoothly.
  4. Cost Savings: Benchmarking can help users identify the hardware and software configurations needed to achieve optimal performance for specific use cases. This can help users avoid overspending on unnecessary resources and save on hardware and software costs.
  5. Better Decision Making: Benchmarking provides users with data-driven insights that can inform better decision-making. By having a clear understanding of the system’s capabilities and limitations, users can make informed decisions about how to optimize the system and achieve the best possible performance for their specific use case.

In summary, benchmarking ClickHouse can help users achieve better performance, reduce downtime, save costs, and make better-informed decisions. It is an essential process for anyone using ClickHouse for data analytics and processing.

clickhouse-benchmark Tool

ClickHouse provides a built-in tool for benchmarking called clickhouse-benchmark. This tool allows users to run benchmark tests and measure the performance of the ClickHouse system. The sections below show how to set up the tool, run it, and read its report.

Install clickhouse-benchmark

To install the ClickHouse benchmark tool, you can follow these steps:

  1. Open your terminal or command prompt.
  2. Install ClickHouse on your system by following the instructions on the official ClickHouse website.
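  3. Once ClickHouse is installed, navigate to the folder that contains the ClickHouse binaries. With a package installation this folder is usually already on your PATH, so the tool can be run from any directory.
  4. In that folder, locate the “clickhouse-benchmark” file. It is usually a symbolic link to the main clickhouse binary, which also exposes the tool as the subcommand “clickhouse benchmark”.
  5. If the file is not executable (for example, after a manual download), make it executable by running the command: chmod +x clickhouse-benchmark
  6. Test that the clickhouse-benchmark tool is working properly by running the command: ./clickhouse-benchmark --help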

The location of the ClickHouse binary folder can depend on how you installed ClickHouse on your system. However, there are a few common default locations where the ClickHouse binary folder can be found:

  1. For a package installation on Ubuntu or Debian, the ClickHouse binaries are typically located in /usr/bin (for example, /usr/bin/clickhouse and /usr/bin/clickhouse-benchmark).
  2. For a package installation on CentOS or Fedora, the ClickHouse binaries are typically located in /usr/bin as well (for example, /usr/bin/clickhouse-server).
  3. If you installed ClickHouse using a binary package, the ClickHouse binary folder is typically located in the directory where you extracted the package.
  4. If you installed ClickHouse from the source, the ClickHouse binary folder is typically located in the build directory in the ClickHouse source tree.
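
Rather than checking each location by hand, the shell can look the tool up on the PATH. The following is a minimal sketch, not an official procedure: the function name find_benchmark is made up for this example, and it assumes the tool is reachable either under its own name or through the multi-call clickhouse binary.

```shell
#!/bin/sh
# find_benchmark is a hypothetical helper name used only in this sketch.
# It prints the command that starts the benchmark tool, or returns 1.
find_benchmark() {
    if command -v clickhouse-benchmark >/dev/null 2>&1; then
        # Standalone name, usually a symlink to the main binary.
        command -v clickhouse-benchmark
    elif command -v clickhouse >/dev/null 2>&1; then
        # Multi-call binary: the tool is available as a subcommand.
        echo "$(command -v clickhouse) benchmark"
    else
        return 1
    fi
}

if bench=$(find_benchmark); then
    echo "benchmark command: $bench"
else
    echo "clickhouse-benchmark not found on PATH" >&2
fi
```

If nothing is found, the binaries are probably in an extracted package or build directory that is not on the PATH, and the tool has to be started with an explicit path.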

Once you have completed these steps, the ClickHouse benchmark tool should be installed and ready to use. You can use the tool to run performance tests and benchmarks on your ClickHouse installation. For more information on how to use the ClickHouse benchmark tool, refer to the official ClickHouse documentation.

Run the following command to start benchmarking

echo "SELECT * FROM system.numbers LIMIT 10000000 OFFSET 10000000" | clickhouse-benchmark --host=localhost --port=9000 -i 10

Output

Loaded 1 queries.

Queries executed: 6.

localhost:9000, queries 6, QPS: 6.665, RPS: 133398978.589, MiB/s: 1017.753, result RPS: 66648989.355, result MiB/s: 508.491.

0.000%		0.123 sec.	
10.000%		0.126 sec.	
20.000%		0.126 sec.	
30.000%		0.135 sec.	
40.000%		0.135 sec.	
50.000%		0.149 sec.	
60.000%		0.149 sec.	
70.000%		0.159 sec.	
80.000%		0.159 sec.	
90.000%		0.208 sec.	
95.000%		0.208 sec.	
99.000%		0.208 sec.	
99.900%		0.208 sec.	
99.990%		0.208 sec.	



Queries executed: 10.

localhost:9000, queries 10, QPS: 7.225, RPS: 144601413.126, MiB/s: 1103.221, result RPS: 72245965.795, result MiB/s: 551.193.

0.000%		0.117 sec.	
10.000%		0.118 sec.	
20.000%		0.119 sec.	
30.000%		0.123 sec.	
40.000%		0.126 sec.	
50.000%		0.130 sec.	
60.000%		0.130 sec.	
70.000%		0.135 sec.	
80.000%		0.149 sec.	
90.000%		0.159 sec.	
95.000%		0.208 sec.	
99.000%		0.208 sec.	
99.900%		0.208 sec.	
99.990%		0.208 sec.

This output shows the results of the benchmark test.

In the report, you can find the following:

  • Number of queries in the Queries executed: field.
  • Status string containing (in order):
    • Endpoint of ClickHouse server.
    • Number of processed queries.
    • QPS: How many queries the server performed per second during a period specified in the --delay argument.
    • RPS: How many rows the server reads per second during a period specified in the --delay argument.
    • MiB/s: How many mebibytes the server reads per second during a period specified in the --delay argument.
    • result RPS: How many rows are placed by the server to the result of a query per second during a period specified in the --delay argument.
    • result MiB/s: How many mebibytes are placed by the server to the result of a query per second during a period specified in the --delay argument.
  • Percentiles of query execution time.
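
When results from several runs need to be collected, the status string can be split into its labelled fields. The sketch below is one possible approach using awk, with the status line copied from the sample output above; the function name status_field is an assumption made for this example and is not part of clickhouse-benchmark.

```shell
#!/bin/sh
# Status line copied from the sample output above.
status='localhost:9000, queries 10, QPS: 7.225, RPS: 144601413.126, MiB/s: 1103.221, result RPS: 72245965.795, result MiB/s: 551.193.'

# status_field LABEL LINE (hypothetical helper)
# Prints the value that follows LABEL, e.g. "QPS" or "result RPS".
status_field() {
    printf '%s\n' "$2" | awk -v label="$1" -F', ' '{
        sub(/\.$/, "")                 # drop the period that ends the line
        for (i = 1; i <= NF; i++) {
            split($i, kv, ": ")        # "QPS: 7.225" -> kv[1]="QPS", kv[2]="7.225"
            if (kv[1] == label) { print kv[2]; exit }
        }
    }'
}

qps=$(status_field "QPS" "$status")
rps=$(status_field "RPS" "$status")
result_rps=$(status_field "result RPS" "$status")
echo "QPS=$qps RPS=$rps result_RPS=$result_rps"
```

Matching on the full label keeps "RPS" and "result RPS" apart, which a plain substring search would not do.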

Comparison Mode

clickhouse-benchmark can compare the performance of two running ClickHouse servers.

To use the comparison mode, specify the endpoints of both servers with two pairs of --host and --port keys. Keys are matched by their position in the argument list: the first --host is matched with the first --port, and so on. clickhouse-benchmark establishes connections to both servers and then sends queries. Each query is addressed to a randomly selected server. The results are shown in a table.

echo "SELECT * FROM system.numbers LIMIT 10000000 OFFSET 10000000" | clickhouse-benchmark --host=localhost --port=9001 --host=localhost --port=9000 -i 10

Output

Loaded 1 queries.

Queries executed: 5.

localhost:9001, queries 2, QPS: 3.764, RPS: 75446929.370, MiB/s: 575.614, result RPS: 37639659.982, result MiB/s: 287.168.
localhost:9000, queries 3, QPS: 3.815, RPS: 76466659.385, MiB/s: 583.394, result RPS: 38148392.297, result MiB/s: 291.049.

0.000%          0.258 sec.      0.250 sec.
10.000%         0.258 sec.      0.250 sec.
20.000%         0.258 sec.      0.250 sec.
30.000%         0.258 sec.      0.267 sec.
40.000%         0.258 sec.      0.267 sec.
50.000%         0.273 sec.      0.267 sec.
60.000%         0.273 sec.      0.267 sec.
70.000%         0.273 sec.      0.267 sec.
80.000%         0.273 sec.      0.269 sec.
90.000%         0.273 sec.      0.269 sec.
95.000%         0.273 sec.      0.269 sec.
99.000%         0.273 sec.      0.269 sec.
99.900%         0.273 sec.      0.269 sec.
99.990%         0.273 sec.      0.269 sec.

No difference proven at 99.5% confidence

Note that the actual results may vary depending on the configuration of your ClickHouse server and the hardware on which the benchmark is run.
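
To put a number on the gap between the two columns, the percentile rows can be post-processed. The following is a rough sketch using awk on two rows copied from the comparison output above; the output format and the choice of relative difference as the metric are assumptions made for this example.

```shell
#!/bin/sh
# Percentile rows copied from the comparison output above.
# First time column: localhost:9001, second time column: localhost:9000.
report='50.000%         0.273 sec.      0.267 sec.
95.000%         0.273 sec.      0.269 sec.'

# For each row, print both latencies and the relative difference of the
# second server against the first, in percent.
compare=$(printf '%s\n' "$report" | awk '$1 ~ /%$/ {
    printf "%s first=%s second=%s diff=%+.1f%%\n", $1, $2, $4, ($4 - $2) / $2 * 100
}')
echo "$compare"
```

A difference of a few percent over five queries is not meaningful on its own, which is what the "No difference proven at 99.5% confidence" line in the output indicates. More iterations are needed before drawing conclusions.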

Analyze the results

After the benchmark test is complete, the clickhouse-benchmark tool displays the results in the console. The results show the number of queries executed, the throughput (QPS, RPS, and MiB/s), and the percentiles of query execution time.
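
One way to turn the percentile rows of the report into a pass or fail signal is to compare a chosen percentile with a target latency. The sketch below uses rows from the final single-server report shown earlier; the helper name percentile and the 0.250 second budget are hypothetical values chosen for illustration.

```shell
#!/bin/sh
# Selected percentile rows from the final single-server report shown earlier.
report='50.000%    0.130 sec.
90.000%    0.159 sec.
95.000%    0.208 sec.'

# percentile LABEL REPORT (hypothetical helper)
# Prints the latency in seconds for one percentile label, e.g. "95.000%".
percentile() {
    printf '%s\n' "$2" | awk -v p="$1" '$1 == p { print $2; exit }'
}

p95=$(percentile "95.000%" "$report")
budget=0.250   # hypothetical latency budget in seconds

# awk performs the floating-point comparison; exit status 0 means within budget.
if awk -v v="$p95" -v b="$budget" 'BEGIN { exit !(v <= b) }'; then
    echo "p95 ${p95}s is within the ${budget}s budget"
else
    echo "p95 ${p95}s exceeds the ${budget}s budget"
fi
```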

By analyzing these results, we can identify potential bottlenecks or issues with the system and optimize it accordingly.
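
After a change to the system, re-running the same workload with the same options keeps the comparison fair. The following is a minimal sketch, assuming a server on localhost:9000 as in the earlier example. The file location and the second query are made up for illustration, and the option values are arbitrary. The tool reads one query per line from standard input, --concurrency sets how many queries are sent simultaneously, and --iterations sets the total number of queries.

```shell
#!/bin/sh
# Hypothetical location for the queries file; one query per line.
queries_file="${TMPDIR:-/tmp}/benchmark_queries.sql"
cat > "$queries_file" <<'EOF'
SELECT * FROM system.numbers LIMIT 10000000 OFFSET 10000000
SELECT sum(number) FROM numbers(10000000)
EOF

# Run only if the tool is installed; host and port match the earlier example.
if command -v clickhouse-benchmark >/dev/null 2>&1; then
    clickhouse-benchmark --host=localhost --port=9000 \
        --concurrency=2 --iterations=20 --delay=1 < "$queries_file" \
        || echo "benchmark run failed (is the server running?)" >&2
else
    echo "clickhouse-benchmark not found; wrote $queries_file only" >&2
fi
```

Keeping the queries in a file makes it easy to store them in version control and to reuse exactly the same workload before and after each optimization.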

Conclusion

In conclusion, clickhouse-benchmark is a powerful tool that allows users to run benchmark tests and measure the performance of the ClickHouse system. Using this tool, users can identify issues or bottlenecks in the system and optimize it for better performance.


About Can Sayın
Can Sayın is an experienced database administrator for open-source relational and NoSQL databases who works with complex infrastructures. He has over five years of industry experience managing database systems and works at ChistaDATA Inc. His main areas of interest are open-source systems.