Introduction
In the world of modern software development, managing data efficiently and effectively is crucial for maintaining a successful and scalable application. One of the techniques that has gained popularity in recent years is the Database Branching Method. This method allows developers to create multiple branches of their database, enabling them to test new features, experiment with different data structures, and maintain data integrity throughout the development process. In this blog, we will explore what the Database Branching Method is, its benefits, and how to implement it in your project on ChistaDATA DBaaS platform.
What is the Database Branching Method?
The Database Branching Method is an approach that involves creating multiple branches of a database to isolate changes made during development. Similar to how developers use version control systems like Git to manage code changes, the Database Branching Method extends this concept to data management. Each branch represents a separate environment for data manipulation, allowing developers to work on different features or experiments without affecting the main production database.
Benefits of Database Branching Method
- Isolation and Experimentation: With separate database branches, developers can freely experiment with new features, data models, or algorithms without worrying about disrupting the main production database. This isolation enables risk-free testing and fosters innovation.
- Parallel Development: Database branching enables multiple developers to work on different tasks simultaneously. Each developer can have their own branch, which they can merge or discard after completing their task, reducing the risk of conflicts and promoting collaboration.
- Code-Data Synchronization: By linking code changes with specific database branches, the Database Branching Method ensures that changes to the data model align with the corresponding code changes. This synchronization helps prevent data inconsistencies and reduces debugging efforts.
- Easy Rollback: If a new feature or experiment doesn’t yield the expected results, developers can easily roll back to a previous branch state. This ability to revert changes is particularly useful for maintaining data integrity and recovering from potential errors.
- Performance Optimization: Database branching allows developers to test performance optimizations on a smaller scale before applying them to the production database. This practice ensures that only thoroughly tested and optimized changes make it to the main environment.
Example with ChistaDATA Branching Tool on DBaaS
First, create an example table and insert an example data from ClickHouse official dataset.
ClickHouse Example dataset official document
CREATE TABLE uk_price_paid ( price UInt32, date Date, postcode1 LowCardinality(String), postcode2 LowCardinality(String), type Enum8('terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4, 'other' = 0), is_new UInt8, duration Enum8('freehold' = 1, 'leasehold' = 2, 'unknown' = 0), addr1 String, addr2 String, street LowCardinality(String), locality LowCardinality(String), town LowCardinality(String), district LowCardinality(String), county LowCardinality(String) ) ENGINE = MergeTree ORDER BY (postcode1, postcode2, addr1, addr2);
Insert data into table
INSERT INTO uk_price_paid WITH splitByChar(' ', postcode) AS p SELECT toUInt32(price_string) AS price, parseDateTimeBestEffortUS(time) AS date, p[1] AS postcode1, p[2] AS postcode2, transform(a, ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type, b = 'Y' AS is_new, transform(c, ['F', 'L', 'U'], ['freehold', 'leasehold', 'unknown']) AS duration, addr1, addr2, street, locality, town, district, county FROM url( 'http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv', 'CSV', 'uuid_string String, price_string String, time String, postcode String, a String, b String, c String, addr1 String, addr2 String, street String, locality String, town String, district String, county String, d String, e String' ) SETTINGS max_http_get_redirects=10;
Insert Statement Result:
Elapsed: 168.927 sec. Processed 28.35 million rows, 7.68 GB (167.82 thousand rows/s., 45.45 MB/s.)
Now check item count on ChistaDATA DBaaS.
28349587 rows return.
Now lets run branching tool and check what is happening.
Note: Connection Info can be found on Connection Section on DBaaS.
In this example, we will create a new branch for developer John.
Our DBaaS instance sources:
Memory Total: 64794092 kB
CPU: 8 core
First of all, you have to provide the following informations to connect our ChistaDATA DBaaS and create a new branch.
Database Name, Table Name, Branch Name, User Name, Password, Host Name, Port
Branching Tool will freeze your main table and then will create a new table with same structure of the main table.
When the freeze process is done, branching tool will create a hardlink for your main table and will give access to you to reach your branch data.
Note: This process does not copy your data. So, your storage size stay as same. The branching tool establishes a hardlink communication between the main table and the branching table for you to use.
(For more information about Freeze command you can check our article about Freeze command
https://chistadata.com/how-to-use-freeze-command-in-clickhouse/)
Branching tool will examine your folders located under the following freeze path and will create a hardlink for your new branching table.
chi-ws91-ch559-chi-clickhouse-chi-0-0-0:~# time python3 branching.py Enter the database name: uk Enter the table name: uk_price_paid Enter the branch name: devJohn Enter the ClickHouse username: can.sayn@chistadata.com Enter the ClickHouse password: Enter the ClickHouse hostname: ws91-ch559.dev.db.chistadata.io Enter the ClickHouse port: 9440 The table uk.uk_price_paid has been frozen with name 'uk.uk_price_paid_devJohn' Table information is saved into /tmp/uk.uk_price_paid.sql New Table information is saved into /tmp/uk.uk_price_paid_devJohn.sql Table uk.uk_price_paid_devJohn created successfully. /var/lib/clickhouse/shadow/uk%2Euk_price_paid_devJohn/store/013/01350aa7-e8ff-48bd-aee1-8892336da565 /var/lib/clickhouse/data/uk/uk_price_paid_devJohn/detached/ sending incremental file list ./ all_13_18_1/ all_13_18_1/addr1.bin all_13_18_1/addr1.mrk2 all_13_18_1/addr2.bin all_13_18_1/addr2.mrk2 all_13_18_1/checksums.txt all_13_18_1/columns.txt all_13_18_1/count.txt all_13_18_1/county.bin all_13_18_1/county.dict.bin all_13_18_1/county.dict.mrk2 all_13_18_1/county.mrk2 all_13_18_1/date.bin all_13_18_1/date.mrk2 all_13_18_1/default_compression_codec.txt all_13_18_1/district.bin all_13_18_1/district.dict.bin all_13_18_1/district.dict.mrk2 all_13_18_1/district.mrk2 all_13_18_1/duration.bin all_13_18_1/duration.mrk2 all_13_18_1/is_new.bin all_13_18_1/is_new.mrk2 all_13_18_1/locality.bin all_13_18_1/locality.dict.bin all_13_18_1/locality.dict.mrk2 all_13_18_1/locality.mrk2 all_13_18_1/postcode1.bin all_13_18_1/postcode1.dict.bin all_13_18_1/postcode1.dict.mrk2 all_13_18_1/postcode1.mrk2 all_13_18_1/postcode2.bin all_13_18_1/postcode2.dict.bin all_13_18_1/postcode2.dict.mrk2 all_13_18_1/postcode2.mrk2 all_13_18_1/price.bin all_13_18_1/price.mrk2 all_13_18_1/primary.idx all_13_18_1/street.bin all_13_18_1/street.dict.bin all_13_18_1/street.dict.mrk2 all_13_18_1/street.mrk2 all_13_18_1/town.bin all_13_18_1/town.dict.bin all_13_18_1/town.dict.mrk2 all_13_18_1/town.mrk2 all_13_18_1/type.bin all_13_18_1/type.mrk2 all_19_24_1/ all_19_24_1/addr1.bin all_19_24_1/addr1.mrk2 all_19_24_1/addr2.bin all_19_24_1/addr2.mrk2 all_19_24_1/checksums.txt all_19_24_1/columns.txt all_19_24_1/count.txt all_19_24_1/county.bin all_19_24_1/county.dict.bin all_19_24_1/county.dict.mrk2 all_19_24_1/county.mrk2 all_19_24_1/date.bin all_19_24_1/date.mrk2 all_19_24_1/default_compression_codec.txt all_19_24_1/district.bin all_19_24_1/district.dict.bin all_19_24_1/district.dict.mrk2 all_19_24_1/district.mrk2 all_19_24_1/duration.bin all_19_24_1/duration.mrk2 all_19_24_1/is_new.bin all_19_24_1/is_new.mrk2 all_19_24_1/locality.bin all_19_24_1/locality.dict.bin all_19_24_1/locality.dict.mrk2 all_19_24_1/locality.mrk2 all_19_24_1/postcode1.bin all_19_24_1/postcode1.dict.bin all_19_24_1/postcode1.dict.mrk2 all_19_24_1/postcode1.mrk2 all_19_24_1/postcode2.bin all_19_24_1/postcode2.dict.bin all_19_24_1/postcode2.dict.mrk2 all_19_24_1/postcode2.mrk2 all_19_24_1/price.bin all_19_24_1/price.mrk2 all_19_24_1/primary.idx all_19_24_1/street.bin all_19_24_1/street.dict.bin all_19_24_1/street.dict.mrk2 all_19_24_1/street.mrk2 all_19_24_1/town.bin all_19_24_1/town.dict.bin all_19_24_1/town.dict.mrk2 all_19_24_1/town.mrk2 all_19_24_1/type.bin all_19_24_1/type.mrk2 all_1_6_1/ all_1_6_1/addr1.bin all_1_6_1/addr1.mrk2 all_1_6_1/addr2.bin all_1_6_1/addr2.mrk2 all_1_6_1/checksums.txt all_1_6_1/columns.txt all_1_6_1/count.txt all_1_6_1/county.bin all_1_6_1/county.dict.bin all_1_6_1/county.dict.mrk2 all_1_6_1/county.mrk2 all_1_6_1/date.bin all_1_6_1/date.mrk2 all_1_6_1/default_compression_codec.txt all_1_6_1/district.bin all_1_6_1/district.dict.bin all_1_6_1/district.dict.mrk2 all_1_6_1/district.mrk2 all_1_6_1/duration.bin all_1_6_1/duration.mrk2 all_1_6_1/is_new.bin all_1_6_1/is_new.mrk2 all_1_6_1/locality.bin all_1_6_1/locality.dict.bin all_1_6_1/locality.dict.mrk2 all_1_6_1/locality.mrk2 all_1_6_1/postcode1.bin all_1_6_1/postcode1.dict.bin all_1_6_1/postcode1.dict.mrk2 all_1_6_1/postcode1.mrk2 all_1_6_1/postcode2.bin all_1_6_1/postcode2.dict.bin all_1_6_1/postcode2.dict.mrk2 all_1_6_1/postcode2.mrk2 all_1_6_1/price.bin all_1_6_1/price.mrk2 all_1_6_1/primary.idx all_1_6_1/street.bin all_1_6_1/street.dict.bin all_1_6_1/street.dict.mrk2 all_1_6_1/street.mrk2 all_1_6_1/town.bin all_1_6_1/town.dict.bin all_1_6_1/town.dict.mrk2 all_1_6_1/town.mrk2 all_1_6_1/type.bin all_1_6_1/type.mrk2 all_25_25_0/ all_25_25_0/addr1.bin all_25_25_0/addr1.mrk2 all_25_25_0/addr2.bin all_25_25_0/addr2.mrk2 all_25_25_0/checksums.txt all_25_25_0/columns.txt all_25_25_0/count.txt all_25_25_0/county.bin all_25_25_0/county.dict.bin all_25_25_0/county.dict.mrk2 all_25_25_0/county.mrk2 all_25_25_0/date.bin all_25_25_0/date.mrk2 all_25_25_0/default_compression_codec.txt all_25_25_0/district.bin all_25_25_0/district.dict.bin all_25_25_0/district.dict.mrk2 all_25_25_0/district.mrk2 all_25_25_0/duration.bin all_25_25_0/duration.mrk2 all_25_25_0/is_new.bin all_25_25_0/is_new.mrk2 all_25_25_0/locality.bin all_25_25_0/locality.dict.bin all_25_25_0/locality.dict.mrk2 all_25_25_0/locality.mrk2 all_25_25_0/postcode1.bin all_25_25_0/postcode1.dict.bin all_25_25_0/postcode1.dict.mrk2 all_25_25_0/postcode1.mrk2 all_25_25_0/postcode2.bin all_25_25_0/postcode2.dict.bin all_25_25_0/postcode2.dict.mrk2 all_25_25_0/postcode2.mrk2 all_25_25_0/price.bin all_25_25_0/price.mrk2 all_25_25_0/primary.idx all_25_25_0/street.bin all_25_25_0/street.dict.bin all_25_25_0/street.dict.mrk2 all_25_25_0/street.mrk2 all_25_25_0/town.bin all_25_25_0/town.dict.bin all_25_25_0/town.dict.mrk2 all_25_25_0/town.mrk2 all_25_25_0/type.bin all_25_25_0/type.mrk2 all_26_26_0/ all_26_26_0/addr1.bin all_26_26_0/addr1.mrk2 all_26_26_0/addr2.bin all_26_26_0/addr2.mrk2 all_26_26_0/checksums.txt all_26_26_0/columns.txt all_26_26_0/count.txt all_26_26_0/county.bin all_26_26_0/county.dict.bin all_26_26_0/county.dict.mrk2 all_26_26_0/county.mrk2 all_26_26_0/date.bin all_26_26_0/date.mrk2 all_26_26_0/default_compression_codec.txt all_26_26_0/district.bin all_26_26_0/district.dict.bin all_26_26_0/district.dict.mrk2 all_26_26_0/district.mrk2 all_26_26_0/duration.bin all_26_26_0/duration.mrk2 all_26_26_0/is_new.bin all_26_26_0/is_new.mrk2 all_26_26_0/locality.bin all_26_26_0/locality.dict.bin all_26_26_0/locality.dict.mrk2 all_26_26_0/locality.mrk2 all_26_26_0/postcode1.bin all_26_26_0/postcode1.dict.bin all_26_26_0/postcode1.dict.mrk2 all_26_26_0/postcode1.mrk2 all_26_26_0/postcode2.bin all_26_26_0/postcode2.dict.bin all_26_26_0/postcode2.dict.mrk2 all_26_26_0/postcode2.mrk2 all_26_26_0/price.bin all_26_26_0/price.mrk2 all_26_26_0/primary.idx all_26_26_0/street.bin all_26_26_0/street.dict.bin all_26_26_0/street.dict.mrk2 all_26_26_0/street.mrk2 all_26_26_0/town.bin all_26_26_0/town.dict.bin all_26_26_0/town.dict.mrk2 all_26_26_0/town.mrk2 all_26_26_0/type.bin all_26_26_0/type.mrk2 all_27_27_0/ all_27_27_0/addr1.bin all_27_27_0/addr1.mrk2 all_27_27_0/addr2.bin all_27_27_0/addr2.mrk2 all_27_27_0/checksums.txt all_27_27_0/columns.txt all_27_27_0/count.txt all_27_27_0/county.bin all_27_27_0/county.dict.bin all_27_27_0/county.dict.mrk2 all_27_27_0/county.mrk2 all_27_27_0/date.bin all_27_27_0/date.mrk2 all_27_27_0/default_compression_codec.txt all_27_27_0/district.bin all_27_27_0/district.dict.bin all_27_27_0/district.dict.mrk2 all_27_27_0/district.mrk2 all_27_27_0/duration.bin all_27_27_0/duration.mrk2 all_27_27_0/is_new.bin all_27_27_0/is_new.mrk2 all_27_27_0/locality.bin all_27_27_0/locality.dict.bin all_27_27_0/locality.dict.mrk2 all_27_27_0/locality.mrk2 all_27_27_0/postcode1.bin all_27_27_0/postcode1.dict.bin all_27_27_0/postcode1.dict.mrk2 all_27_27_0/postcode1.mrk2 all_27_27_0/postcode2.bin all_27_27_0/postcode2.dict.bin all_27_27_0/postcode2.dict.mrk2 all_27_27_0/postcode2.mrk2 all_27_27_0/price.bin all_27_27_0/price.mrk2 all_27_27_0/primary.idx all_27_27_0/street.bin all_27_27_0/street.dict.bin all_27_27_0/street.dict.mrk2 all_27_27_0/street.mrk2 all_27_27_0/town.bin all_27_27_0/town.dict.bin all_27_27_0/town.dict.mrk2 all_27_27_0/town.mrk2 all_27_27_0/type.bin all_27_27_0/type.mrk2 all_28_28_0/ all_28_28_0/checksums.txt all_28_28_0/columns.txt all_28_28_0/count.txt all_28_28_0/data.bin all_28_28_0/data.mrk3 all_28_28_0/default_compression_codec.txt all_28_28_0/primary.idx all_7_12_1/ all_7_12_1/addr1.bin all_7_12_1/addr1.mrk2 all_7_12_1/addr2.bin all_7_12_1/addr2.mrk2 all_7_12_1/checksums.txt all_7_12_1/columns.txt all_7_12_1/count.txt all_7_12_1/county.bin all_7_12_1/county.dict.bin all_7_12_1/county.dict.mrk2 all_7_12_1/county.mrk2 all_7_12_1/date.bin all_7_12_1/date.mrk2 all_7_12_1/default_compression_codec.txt all_7_12_1/district.bin all_7_12_1/district.dict.bin all_7_12_1/district.dict.mrk2 all_7_12_1/district.mrk2 all_7_12_1/duration.bin all_7_12_1/duration.mrk2 all_7_12_1/is_new.bin all_7_12_1/is_new.mrk2 all_7_12_1/locality.bin all_7_12_1/locality.dict.bin all_7_12_1/locality.dict.mrk2 all_7_12_1/locality.mrk2 all_7_12_1/postcode1.bin all_7_12_1/postcode1.dict.bin all_7_12_1/postcode1.dict.mrk2 all_7_12_1/postcode1.mrk2 all_7_12_1/postcode2.bin all_7_12_1/postcode2.dict.bin all_7_12_1/postcode2.dict.mrk2 all_7_12_1/postcode2.mrk2 all_7_12_1/price.bin all_7_12_1/price.mrk2 all_7_12_1/primary.idx all_7_12_1/street.bin all_7_12_1/street.dict.bin all_7_12_1/street.dict.mrk2 all_7_12_1/street.mrk2 all_7_12_1/town.bin all_7_12_1/town.dict.bin all_7_12_1/town.dict.mrk2 all_7_12_1/town.mrk2 all_7_12_1/type.bin all_7_12_1/type.mrk2 sent 321,849,195 bytes received 6,467 bytes 214,570,441.33 bytes/sec total size is 321,748,656 speedup is 1.00 Attach copy successfully done. [] [] [] [] [] [] [] [] real 0m31.461s user 0m1.272s sys 0m0.339s
Elapsed Time is 1.272 seconds.
A new branch from the table which includes 28.35 million rows – 7.68 GB data is created in 1.272 seconds with name “uk_price_paid_devJohn”
Now, it is time to check DBaaS.
Lets push the refresh button near “Databases” and then check the tables.
As you can see table uk_price_paid_devJohn is created and table includes same data with uk_price_paid table.
28349587 rows return.
Now, developer John can connect to uk_price_paid_devJohn table and make his own changes.
Conclusion
The Database Branching Method is a powerful technique that empowers developers to work efficiently, experiment with new features, and maintain data integrity throughout the development lifecycle. By creating isolated branches, developers can make changes without fear of breaking the main production database and test various optimizations before deployment. As you adopt this method, remember to set up a robust version control system, define a suitable branching strategy, and promote rigorous testing to reap the full benefits of this approach. With the Database Branching Method in your toolkit, you can master data management and take your software development to new heights.
To learn more about ChistaDATA Cloud, do read the following articles:
- Architecture of ChistaDATA Cloud for ClickHouse: Part 1
- Architecture of ChistaDATA Cloud for ClickHouse: Part 2
- Connecting ChistaDATA Cloud for ClickHouse with Python
- ChistaDATA Cloud for ClickHouse: Benchmarking on OpenTelemetry Dataset
References
- https://clickhouse.com/docs/en/getting-started/example-datasets/uk-price-paid
- https://chistadata.com/how-to-use-freeze-command-in-clickhouse/