Implementing Branching on ChistaDATA Cloud for ClickHouse

 

Introduction

In the world of modern software development, managing data efficiently and effectively is crucial for maintaining a successful and scalable application. One of the techniques that has gained popularity in recent years is the Database Branching Method. This method allows developers to create multiple branches of their database, enabling them to test new features, experiment with different data structures, and maintain data integrity throughout the development process. In this blog, we will explore what the Database Branching Method is, its benefits, and how to implement it in your project on ChistaDATA DBaaS platform.

What is the Database Branching Method?

The Database Branching Method is an approach that involves creating multiple branches of a database to isolate changes made during development. Similar to how developers use version control systems like Git to manage code changes, the Database Branching Method extends this concept to data management. Each branch represents a separate environment for data manipulation, allowing developers to work on different features or experiments without affecting the main production database.

Benefits of Database Branching Method

  1. Isolation and Experimentation: With separate database branches, developers can freely experiment with new features, data models, or algorithms without worrying about disrupting the main production database. This isolation enables risk-free testing and fosters innovation.
  2. Parallel Development: Database branching enables multiple developers to work on different tasks simultaneously. Each developer can have their own branch, which they can merge or discard after completing their task, reducing the risk of conflicts and promoting collaboration.
  3. Code-Data Synchronization: By linking code changes with specific database branches, the Database Branching Method ensures that changes to the data model align with the corresponding code changes. This synchronization helps prevent data inconsistencies and reduces debugging efforts.
  4. Easy Rollback: If a new feature or experiment doesn’t yield the expected results, developers can easily roll back to a previous branch state. This ability to revert changes is particularly useful for maintaining data integrity and recovering from potential errors.
  5. Performance Optimization: Database branching allows developers to test performance optimizations on a smaller scale before applying them to the production database. This practice ensures that only thoroughly tested and optimized changes make it to the main environment.

Example with ChistaDATA Branching Tool on DBaaS

First, create an example table and insert an example data from ClickHouse official dataset.

ClickHouse Example dataset official document

CREATE TABLE uk_price_paid
(
    price UInt32,
    date Date,
    postcode1 LowCardinality(String),
    postcode2 LowCardinality(String),
    type Enum8('terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4, 'other' = 0),
    is_new UInt8,
    duration Enum8('freehold' = 1, 'leasehold' = 2, 'unknown' = 0),
    addr1 String,
    addr2 String,
    street LowCardinality(String),
    locality LowCardinality(String),
    town LowCardinality(String),
    district LowCardinality(String),
    county LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (postcode1, postcode2, addr1, addr2);

Insert data into table

INSERT INTO uk_price_paid
WITH
   splitByChar(' ', postcode) AS p
SELECT
    toUInt32(price_string) AS price,
    parseDateTimeBestEffortUS(time) AS date,
    p[1] AS postcode1,
    p[2] AS postcode2,
    transform(a, ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type,
    b = 'Y' AS is_new,
    transform(c, ['F', 'L', 'U'], ['freehold', 'leasehold', 'unknown']) AS duration,
    addr1,
    addr2,
    street,
    locality,
    town,
    district,
    county
FROM url(
    'http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv',
    'CSV',
    'uuid_string String,
    price_string String,
    time String,
    postcode String,
    a String,
    b String,
    c String,
    addr1 String,
    addr2 String,
    street String,
    locality String,
    town String,
    district String,
    county String,
    d String,
    e String'
) SETTINGS max_http_get_redirects=10;

Insert Statement Result:

Elapsed: 168.927 sec. Processed 28.35 million rows, 7.68 GB (167.82 thousand rows/s., 45.45 MB/s.)

Now check item count on ChistaDATA DBaaS.

28349587 rows return.

Now lets run branching tool and check what is happening.

Note: Connection Info can be found on Connection Section on DBaaS.

In this example, we will create a new branch for developer John.

Our DBaaS instance sources:

Memory Total:       64794092 kB

CPU: 8 core

First of all, you have to provide the following informations to connect our ChistaDATA DBaaS and create a new branch.

Database Name, Table Name, Branch Name, User Name, Password, Host Name, Port

Branching Tool will freeze your main table and then will create a new table with same structure of the main table.

When the freeze process is done, branching tool will create a hardlink for your main table and will give access to you to reach your branch data.


Note: This process does not copy your data. So, your storage size stay as same. The branching tool establishes a hardlink communication between the main table and the branching table for you to use.

(For more information about Freeze command you can check our article about Freeze command
https://chistadata.com/how-to-use-freeze-command-in-clickhouse/)

Branching tool will examine your folders located under the following freeze path and will create a hardlink for your new branching table.

chi-ws91-ch559-chi-clickhouse-chi-0-0-0:~# time python3 branching.py 
Enter the database name: uk
Enter the table name: uk_price_paid
Enter the branch name: devJohn
Enter the ClickHouse username: can.sayn@chistadata.com
Enter the ClickHouse password: 
Enter the ClickHouse hostname: ws91-ch559.dev.db.chistadata.io
Enter the ClickHouse port: 9440
The table uk.uk_price_paid has been frozen with name 'uk.uk_price_paid_devJohn'
Table information is saved into /tmp/uk.uk_price_paid.sql
New Table information is saved into /tmp/uk.uk_price_paid_devJohn.sql
Table uk.uk_price_paid_devJohn created successfully.
/var/lib/clickhouse/shadow/uk%2Euk_price_paid_devJohn/store/013/01350aa7-e8ff-48bd-aee1-8892336da565
/var/lib/clickhouse/data/uk/uk_price_paid_devJohn/detached/
sending incremental file list
./
all_13_18_1/
all_13_18_1/addr1.bin
all_13_18_1/addr1.mrk2
all_13_18_1/addr2.bin
all_13_18_1/addr2.mrk2
all_13_18_1/checksums.txt
all_13_18_1/columns.txt
all_13_18_1/count.txt
all_13_18_1/county.bin
all_13_18_1/county.dict.bin
all_13_18_1/county.dict.mrk2
all_13_18_1/county.mrk2
all_13_18_1/date.bin
all_13_18_1/date.mrk2
all_13_18_1/default_compression_codec.txt
all_13_18_1/district.bin
all_13_18_1/district.dict.bin
all_13_18_1/district.dict.mrk2
all_13_18_1/district.mrk2
all_13_18_1/duration.bin
all_13_18_1/duration.mrk2
all_13_18_1/is_new.bin
all_13_18_1/is_new.mrk2
all_13_18_1/locality.bin
all_13_18_1/locality.dict.bin
all_13_18_1/locality.dict.mrk2
all_13_18_1/locality.mrk2
all_13_18_1/postcode1.bin
all_13_18_1/postcode1.dict.bin
all_13_18_1/postcode1.dict.mrk2
all_13_18_1/postcode1.mrk2
all_13_18_1/postcode2.bin
all_13_18_1/postcode2.dict.bin
all_13_18_1/postcode2.dict.mrk2
all_13_18_1/postcode2.mrk2
all_13_18_1/price.bin
all_13_18_1/price.mrk2
all_13_18_1/primary.idx
all_13_18_1/street.bin
all_13_18_1/street.dict.bin
all_13_18_1/street.dict.mrk2
all_13_18_1/street.mrk2
all_13_18_1/town.bin
all_13_18_1/town.dict.bin
all_13_18_1/town.dict.mrk2
all_13_18_1/town.mrk2
all_13_18_1/type.bin
all_13_18_1/type.mrk2
all_19_24_1/
all_19_24_1/addr1.bin
all_19_24_1/addr1.mrk2
all_19_24_1/addr2.bin
all_19_24_1/addr2.mrk2
all_19_24_1/checksums.txt
all_19_24_1/columns.txt
all_19_24_1/count.txt
all_19_24_1/county.bin
all_19_24_1/county.dict.bin
all_19_24_1/county.dict.mrk2
all_19_24_1/county.mrk2
all_19_24_1/date.bin
all_19_24_1/date.mrk2
all_19_24_1/default_compression_codec.txt
all_19_24_1/district.bin
all_19_24_1/district.dict.bin
all_19_24_1/district.dict.mrk2
all_19_24_1/district.mrk2
all_19_24_1/duration.bin
all_19_24_1/duration.mrk2
all_19_24_1/is_new.bin
all_19_24_1/is_new.mrk2
all_19_24_1/locality.bin
all_19_24_1/locality.dict.bin
all_19_24_1/locality.dict.mrk2
all_19_24_1/locality.mrk2
all_19_24_1/postcode1.bin
all_19_24_1/postcode1.dict.bin
all_19_24_1/postcode1.dict.mrk2
all_19_24_1/postcode1.mrk2
all_19_24_1/postcode2.bin
all_19_24_1/postcode2.dict.bin
all_19_24_1/postcode2.dict.mrk2
all_19_24_1/postcode2.mrk2
all_19_24_1/price.bin
all_19_24_1/price.mrk2
all_19_24_1/primary.idx
all_19_24_1/street.bin
all_19_24_1/street.dict.bin
all_19_24_1/street.dict.mrk2
all_19_24_1/street.mrk2
all_19_24_1/town.bin
all_19_24_1/town.dict.bin
all_19_24_1/town.dict.mrk2
all_19_24_1/town.mrk2
all_19_24_1/type.bin
all_19_24_1/type.mrk2
all_1_6_1/
all_1_6_1/addr1.bin
all_1_6_1/addr1.mrk2
all_1_6_1/addr2.bin
all_1_6_1/addr2.mrk2
all_1_6_1/checksums.txt
all_1_6_1/columns.txt
all_1_6_1/count.txt
all_1_6_1/county.bin
all_1_6_1/county.dict.bin
all_1_6_1/county.dict.mrk2
all_1_6_1/county.mrk2
all_1_6_1/date.bin
all_1_6_1/date.mrk2
all_1_6_1/default_compression_codec.txt
all_1_6_1/district.bin
all_1_6_1/district.dict.bin
all_1_6_1/district.dict.mrk2
all_1_6_1/district.mrk2
all_1_6_1/duration.bin
all_1_6_1/duration.mrk2
all_1_6_1/is_new.bin
all_1_6_1/is_new.mrk2
all_1_6_1/locality.bin
all_1_6_1/locality.dict.bin
all_1_6_1/locality.dict.mrk2
all_1_6_1/locality.mrk2
all_1_6_1/postcode1.bin
all_1_6_1/postcode1.dict.bin
all_1_6_1/postcode1.dict.mrk2
all_1_6_1/postcode1.mrk2
all_1_6_1/postcode2.bin
all_1_6_1/postcode2.dict.bin
all_1_6_1/postcode2.dict.mrk2
all_1_6_1/postcode2.mrk2
all_1_6_1/price.bin
all_1_6_1/price.mrk2
all_1_6_1/primary.idx
all_1_6_1/street.bin
all_1_6_1/street.dict.bin
all_1_6_1/street.dict.mrk2
all_1_6_1/street.mrk2
all_1_6_1/town.bin
all_1_6_1/town.dict.bin
all_1_6_1/town.dict.mrk2
all_1_6_1/town.mrk2
all_1_6_1/type.bin
all_1_6_1/type.mrk2
all_25_25_0/
all_25_25_0/addr1.bin
all_25_25_0/addr1.mrk2
all_25_25_0/addr2.bin
all_25_25_0/addr2.mrk2
all_25_25_0/checksums.txt
all_25_25_0/columns.txt
all_25_25_0/count.txt
all_25_25_0/county.bin
all_25_25_0/county.dict.bin
all_25_25_0/county.dict.mrk2
all_25_25_0/county.mrk2
all_25_25_0/date.bin
all_25_25_0/date.mrk2
all_25_25_0/default_compression_codec.txt
all_25_25_0/district.bin
all_25_25_0/district.dict.bin
all_25_25_0/district.dict.mrk2
all_25_25_0/district.mrk2
all_25_25_0/duration.bin
all_25_25_0/duration.mrk2
all_25_25_0/is_new.bin
all_25_25_0/is_new.mrk2
all_25_25_0/locality.bin
all_25_25_0/locality.dict.bin
all_25_25_0/locality.dict.mrk2
all_25_25_0/locality.mrk2
all_25_25_0/postcode1.bin
all_25_25_0/postcode1.dict.bin
all_25_25_0/postcode1.dict.mrk2
all_25_25_0/postcode1.mrk2
all_25_25_0/postcode2.bin
all_25_25_0/postcode2.dict.bin
all_25_25_0/postcode2.dict.mrk2
all_25_25_0/postcode2.mrk2
all_25_25_0/price.bin
all_25_25_0/price.mrk2
all_25_25_0/primary.idx
all_25_25_0/street.bin
all_25_25_0/street.dict.bin
all_25_25_0/street.dict.mrk2
all_25_25_0/street.mrk2
all_25_25_0/town.bin
all_25_25_0/town.dict.bin
all_25_25_0/town.dict.mrk2
all_25_25_0/town.mrk2
all_25_25_0/type.bin
all_25_25_0/type.mrk2
all_26_26_0/
all_26_26_0/addr1.bin
all_26_26_0/addr1.mrk2
all_26_26_0/addr2.bin
all_26_26_0/addr2.mrk2
all_26_26_0/checksums.txt
all_26_26_0/columns.txt
all_26_26_0/count.txt
all_26_26_0/county.bin
all_26_26_0/county.dict.bin
all_26_26_0/county.dict.mrk2
all_26_26_0/county.mrk2
all_26_26_0/date.bin
all_26_26_0/date.mrk2
all_26_26_0/default_compression_codec.txt
all_26_26_0/district.bin
all_26_26_0/district.dict.bin
all_26_26_0/district.dict.mrk2
all_26_26_0/district.mrk2
all_26_26_0/duration.bin
all_26_26_0/duration.mrk2
all_26_26_0/is_new.bin
all_26_26_0/is_new.mrk2
all_26_26_0/locality.bin
all_26_26_0/locality.dict.bin
all_26_26_0/locality.dict.mrk2
all_26_26_0/locality.mrk2
all_26_26_0/postcode1.bin
all_26_26_0/postcode1.dict.bin
all_26_26_0/postcode1.dict.mrk2
all_26_26_0/postcode1.mrk2
all_26_26_0/postcode2.bin
all_26_26_0/postcode2.dict.bin
all_26_26_0/postcode2.dict.mrk2
all_26_26_0/postcode2.mrk2
all_26_26_0/price.bin
all_26_26_0/price.mrk2
all_26_26_0/primary.idx
all_26_26_0/street.bin
all_26_26_0/street.dict.bin
all_26_26_0/street.dict.mrk2
all_26_26_0/street.mrk2
all_26_26_0/town.bin
all_26_26_0/town.dict.bin
all_26_26_0/town.dict.mrk2
all_26_26_0/town.mrk2
all_26_26_0/type.bin
all_26_26_0/type.mrk2
all_27_27_0/
all_27_27_0/addr1.bin
all_27_27_0/addr1.mrk2
all_27_27_0/addr2.bin
all_27_27_0/addr2.mrk2
all_27_27_0/checksums.txt
all_27_27_0/columns.txt
all_27_27_0/count.txt
all_27_27_0/county.bin
all_27_27_0/county.dict.bin
all_27_27_0/county.dict.mrk2
all_27_27_0/county.mrk2
all_27_27_0/date.bin
all_27_27_0/date.mrk2
all_27_27_0/default_compression_codec.txt
all_27_27_0/district.bin
all_27_27_0/district.dict.bin
all_27_27_0/district.dict.mrk2
all_27_27_0/district.mrk2
all_27_27_0/duration.bin
all_27_27_0/duration.mrk2
all_27_27_0/is_new.bin
all_27_27_0/is_new.mrk2
all_27_27_0/locality.bin
all_27_27_0/locality.dict.bin
all_27_27_0/locality.dict.mrk2
all_27_27_0/locality.mrk2
all_27_27_0/postcode1.bin
all_27_27_0/postcode1.dict.bin
all_27_27_0/postcode1.dict.mrk2
all_27_27_0/postcode1.mrk2
all_27_27_0/postcode2.bin
all_27_27_0/postcode2.dict.bin
all_27_27_0/postcode2.dict.mrk2
all_27_27_0/postcode2.mrk2
all_27_27_0/price.bin
all_27_27_0/price.mrk2
all_27_27_0/primary.idx
all_27_27_0/street.bin
all_27_27_0/street.dict.bin
all_27_27_0/street.dict.mrk2
all_27_27_0/street.mrk2
all_27_27_0/town.bin
all_27_27_0/town.dict.bin
all_27_27_0/town.dict.mrk2
all_27_27_0/town.mrk2
all_27_27_0/type.bin
all_27_27_0/type.mrk2
all_28_28_0/
all_28_28_0/checksums.txt
all_28_28_0/columns.txt
all_28_28_0/count.txt
all_28_28_0/data.bin
all_28_28_0/data.mrk3
all_28_28_0/default_compression_codec.txt
all_28_28_0/primary.idx
all_7_12_1/
all_7_12_1/addr1.bin
all_7_12_1/addr1.mrk2
all_7_12_1/addr2.bin
all_7_12_1/addr2.mrk2
all_7_12_1/checksums.txt
all_7_12_1/columns.txt
all_7_12_1/count.txt
all_7_12_1/county.bin
all_7_12_1/county.dict.bin
all_7_12_1/county.dict.mrk2
all_7_12_1/county.mrk2
all_7_12_1/date.bin
all_7_12_1/date.mrk2
all_7_12_1/default_compression_codec.txt
all_7_12_1/district.bin
all_7_12_1/district.dict.bin
all_7_12_1/district.dict.mrk2
all_7_12_1/district.mrk2
all_7_12_1/duration.bin
all_7_12_1/duration.mrk2
all_7_12_1/is_new.bin
all_7_12_1/is_new.mrk2
all_7_12_1/locality.bin
all_7_12_1/locality.dict.bin
all_7_12_1/locality.dict.mrk2
all_7_12_1/locality.mrk2
all_7_12_1/postcode1.bin
all_7_12_1/postcode1.dict.bin
all_7_12_1/postcode1.dict.mrk2
all_7_12_1/postcode1.mrk2
all_7_12_1/postcode2.bin
all_7_12_1/postcode2.dict.bin
all_7_12_1/postcode2.dict.mrk2
all_7_12_1/postcode2.mrk2
all_7_12_1/price.bin
all_7_12_1/price.mrk2
all_7_12_1/primary.idx
all_7_12_1/street.bin
all_7_12_1/street.dict.bin
all_7_12_1/street.dict.mrk2
all_7_12_1/street.mrk2
all_7_12_1/town.bin
all_7_12_1/town.dict.bin
all_7_12_1/town.dict.mrk2
all_7_12_1/town.mrk2
all_7_12_1/type.bin
all_7_12_1/type.mrk2

sent 321,849,195 bytes  received 6,467 bytes  214,570,441.33 bytes/sec
total size is 321,748,656  speedup is 1.00
Attach copy successfully done.
[]
[]
[]
[]
[]
[]
[]
[]

real	0m31.461s
user	0m1.272s
sys	0m0.339s

Elapsed Time is 1.272 seconds.

A new branch from the table which includes 28.35 million rows – 7.68 GB data is created in 1.272 seconds with name “uk_price_paid_devJohn”

Now, it is time to check DBaaS.

Lets push the refresh button near “Databases” and then check the tables.

As you can see table uk_price_paid_devJohn is created and table includes same data with uk_price_paid table.

28349587 rows return.

Now, developer John can connect to uk_price_paid_devJohn table and make his own changes.

Conclusion

The Database Branching Method is a powerful technique that empowers developers to work efficiently, experiment with new features, and maintain data integrity throughout the development lifecycle. By creating isolated branches, developers can make changes without fear of breaking the main production database and test various optimizations before deployment. As you adopt this method, remember to set up a robust version control system, define a suitable branching strategy, and promote rigorous testing to reap the full benefits of this approach. With the Database Branching Method in your toolkit, you can master data management and take your software development to new heights.

To learn more about ChistaDATA Cloud, do read the following articles:

References

 

About Can Sayn 41 Articles
Can Sayın is experienced Database Administrator in open source relational and NoSql databases, working in complicated infrastructures. Over 5 years industry experience, he gain managing database systems. He is working at ChistaDATA Inc. His areas of interest are generally on open source systems.
Contact: Website