Transaction Isolation Levels with ClickHouse

In this article, we will talk about the isolation levels used in database systems.

As you know, four different features maintain consistency in a database. These are; Atomicity, Consistency, Isolation, and Durability (ACID for short). We will deal with the isolation part here. Isolation refers to how the integrity of a transaction appears to other users and systems. The isolation level defines how a transaction in the database should be isolated from other transactions.

 

Isolation Level Dirty Read Nonrepeatable read Phantom
Read uncommitted Yes Yes Yes
Read committed No Yes Yes
Read committed using row versioning No Yes Yes
Repeatable read No No Yes
Snapshot No No No
Serializable No No No

To summarize the table briefly, For example, taking the repeatable read level will only allow phantom read status in concurrent queries. Likewise, it does not allow problems that may occur between serializable and snapshot transactions.

Now, let us explain Isolation Levels.

Dirty Read: As the name suggests, it is a dirty read. A situation where a transaction reads data that has not yet been processed. For example, suppose we have two transactions, T1 and T2. T1 updates any row but quits without confirming the change. Meanwhile, T2 can read this updated data. If T1 undoes this action, T2 will continue to read data that did not exist.

Non-Repeatable Read: Again, let’s assume we have two transactions, T1 and T2. For example, suppose T1 is reading data. Simultaneously, T2 also updates this data. Now, if T1 gets the same data again, it reads a different value.

Now let’s talk about isolation levels. As it is known, there are basically four isolation levels.

Read Uncommitted: Read Uncommitted is the lowest isolation level. As the name suggests, a transaction at this level can read changes that another transaction has not yet approved. In this way, the Dirty Read event we mentioned above is allowed.

Read Committed: This isolation level ensures that any read data is already processed when it is read. Thus, it does not allow Dirty Read.

Repeatable Read: It is the most restrictive isolation level. Any transaction at this level holds read locks on all rows it references. This prevents Non-Repeatable Read as other processes cannot read, update, or delete these lines.

Serializable: This is the highest isolation level. A serializable execution is guaranteed to be serialized.

 

What is Snapshot Isolation?

As the name suggests, Snapshot Isolation provides each transaction with a copy of the relevant records to work on; that is, it runs on a snapshot of the data so that each transaction has worked on consistent data. The transaction that calculates the total number of instant users consistently and the transaction that adds a new user work in a healthy way without affecting each other. If two transactions want to update the same record, the first finalized transaction is allowed to be committed, and the second transaction is aborted because of the “Update Conflict.”

What is MVCC? (Multi-Version Concurrency Control)

MVCC aims to solve the problem by keeping multiple copies of each data item. This way, every user connected to the database sees a snapshot of the database at a given time. Any changes made by an insert will not be visible to other users of the database until the changes are committed (commit). When a piece of data needs to be updated in the database using MVCC, it does not overwrite the original data item with new data but instead creates a newer version of the data item. That’s why there are multiple versions stored. The version that each transaction sees depends on the isolation level applied. With isolation, the transaction observes the state of the data as when the transaction started. MVCC provides consistent views over time. Under MVCC, transactions use a timestamp or transaction id to determine which state of the DB to read and read those versions of data. Read and write operations are thus isolated from each other without the need for locking. MVCC introduces a challenge on how to remove versions that have become obsolete and will never be read. Sometimes, a process is implemented to periodically scan and delete old versions. PostgreSQL takes this approach with its VACUUM process.

Conclusion

In ClickHouse, Mergetree family engines provides isolation between two transactions (INSERT operations within a partition of a table up to rows specified in max_insert_size setting). MVCC is already used in ClickHouse which provides Snapshot Isolation for reading queries. Snapshot Isolation requires detecting conflicts and rollback of a transaction if it tries to modify an object modified by a concurrent transaction. It’s not a problem for concurrent INSERT queries because they just append new data to the table by creating new data parts in MergeTree engines. Two INSERTs cannot modify a single object and cannot cause write-write conflict. To support transactional ALTERs, it should be enough to forbid concurrent merging/mutating operations with overlapping source parts sets and not assign such operations on future parts. A new experimental feature is available in ClickHouse, which has better support for transactions.

About Can Sayn 10 Articles
Can Sayın is experienced Database Administrator in open source relational and NoSql databases, working in complicated infrastructures. Over 5 years industry experience, he gain managing database systems. He is working at ChistaDATA Inc. His areas of interest are generally on open source systems.
Contact: Website