Implementing Online Schema Change in ClickHouse

Online Schema Change - ClickHouse

Introduction

In the dynamic world of data management, databases are constantly evolving to meet the changing needs of organisations. However, changing the structure of a database, also known as a schema change, has traditionally been a daunting and time-consuming task. Fortunately, the advent of Online Schema Change techniques has revolutionised the way databases are modified, making the process seamless and efficient. In this article, we will explore the importance of Online Schema Change tool and discuss its benefits and challenges.

What is Online Schema Change?

Online Schema Change refers to the ability to change the schema of a database without disrupting its normal operation and while it is actively serving customers. It allows changes to be made in real time, enabling organisations to adapt to changing requirements without downtime or compromising data integrity.

Benefits of Online Schema Change

  1. Increased availability: With Online Schema Change tool, ClickHouse databases can be modified while they remain online. This eliminates the need for scheduled maintenance windows or user disruption, resulting in significantly increased availability.
  2. Continuous service: By allowing changes to be made while the ClickHouse database is running, Online Schema Change ensures uninterrupted service for users. This is particularly important for organisations operating in 24/7 environments where downtime can result in significant financial loss.
  3. Improved scalability: Online Schema Change tool allows ClickHouse databases to adapt to changing workloads by adding or modifying indexes, partitions or tables on the fly. This aspect of scalability is critical for accommodating data growth, increasing performance and maximising resource utilisation.
  4. Increased flexibility: With Online Schema Change techniques, your organisations can iteratively refine and optimise ClickHouse database designs to meet evolving requirements without requiring a complete system maintenance. It provides the flexibility to experiment and iterate without disrupting the entire system.

Challenges and Considerations

While there are notable benefits to changing your schema online, there are also challenges. Here are some considerations to keep in mind:

  1. Data integrity: Changing a live database carries the risk of compromising data integrity. Robust testing, validation and rollback mechanisms should be in place to ensure consistency during the schema change process.
  2. Performance impact: The complexity of schema changes can impact database performance. Queries may take longer to execute during the change process, and it is important to consider the potential impact on application performance and user experience.
  3. Compatibility: Online Schema Change techniques vary between different database management systems (DBMS) and versions. It is important to choose a method that is compatible with your specific DBMS and has been thoroughly tested to ensure a seamless transition.

Procedure for Online Schema Change

The tool performs the schema changes using the following procedure:

  1. It creates an empty temporary table from the original table. We call this table the ghost table.
  2. ALTER the ghost table. This procedure has no load because the table is empty.
  3. Check that all structural changes have been applied.
  4. Start loading data from the original table into the ghost table.
  5. Check that there are no data differences between the two tables using ClickHouse’s SipHash64 function.
  6. Once the ghost table is synchronised and ready for use, the old table can be discarded by renaming it.
  7. The schema change is complete.

The Online Schema Change tool never changes or updates the original table. We only ever work on the ghost table. We create a ghost table with an exact copy of the original table’s DDL, then we run the user’s ALTER TABLE command on the ghost table. Then the tool compares the two tables to determine what makes them similar and what makes them different using the ClickHouse’s SipHash64 function. This procedure continues until two tables are synchronised and when they are, the renaming operation can be started with the last validation processes.

Conclusion

The Online Schema Change tool is a game-changer for modern database management, allowing organisations to adapt and evolve their data structures in real time without compromising availability or data integrity. The ability to change schemas on the fly significantly improves the scalability, flexibility and overall performance of databases. Despite the challenges, the benefits of on-the-fly schema modification make it an invaluable tool for your organisations looking to remain agile and meet the ever-changing demands of the digital landscape. Stay tuned for the Online Schema Change tool for the ClickHouse!

To read more about ClickHouse internals, do consider reading the below articles: 

About Ilkay 24 Articles
Ilkay has been administering databases including NoSQL and RDBMS for over 7 years. He is experienced working high-scale banking databases. He is currently working at ChistaDATA Inc. as Database Administrator.
Contact: Website