Designing Data-Intensive Applications (7) - Transactions

These are my personal learning notes from reading Designing Data-Intensive Applications, Chapter 7.

What is a transaction?

A transaction groups multiple operations into a single unit: the transaction succeeds only when all of its operations succeed. This is the familiar notion from learning databases.

But in a distributed environment, it means more. For example, when a client sends a request to a distributed system, the backend may need to make multiple calls to other services before it can respond; if any of those calls fails along the way, the client may get no response at all, so we need to handle transactional semantics at this level as well.

A transaction has the following properties (ACID):

  • Atomicity: The transaction is an all-or-nothing unit that can’t be split into smaller parts. In case of errors, the database must abort the transaction and discard all changes already made (see the rollback sketch after this list).

  • Isolation: Concurrently executing transactions are isolated from each other; they can’t step on each other’s toes.

  • Consistency: A transaction can only bring the database from one valid state to another. This actually relies on the application to achieve: the application must define the rules that make a state valid (is an amount allowed to be below 0? etc.).

  • Durability: Once a transaction commits, its data must not be lost; in practice this means writing to non-volatile storage, with backup and disaster recovery in place.
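
To make atomicity concrete, here is a minimal sketch using Python’s sqlite3 (the table, accounts, and invariant are made up for the demo): a transfer either applies both the debit and the credit, or rolls back and leaves no partial effects.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; we manage transactions
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")

def transfer(amount):
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'alice'", (amount,))
        # Enforce the application-level invariant: no negative balances.
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'bob'", (amount,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # abort: discard all changes made so far
        raise

transfer(60)              # succeeds
try:
    transfer(60)          # violates the invariant; the debit is rolled back too
except ValueError:
    pass
print(conn.execute("SELECT * FROM accounts").fetchall())
# [('alice', 40), ('bob', 60)] -- the failed transfer left no partial effects
```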

What makes transactions difficult?

Most databases have good support for atomicity and isolation, but we should think beyond the database level. Things become more difficult once we take application and business requirements into consideration. For example:

  • If a transaction fails, what should the application do? (Usually retry, but the retry must be idempotent; see the sketch after this list.)
  • How do we handle race conditions when transactions run under weak isolation? There are many different kinds of race condition.
  • Write skew and phantoms: two transactions update two different objects, yet together they violate a business rule.
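
As a sketch of the first point: one common pattern is to tag every request with a unique ID so the server can deduplicate replays, which makes blind retries safe. The `apply_payment` handler and in-memory dedup table below are hypothetical stand-ins:

```python
import uuid

processed = {}   # request_id -> result; in a real system, a durable table

def apply_payment(request_id, amount):
    """Hypothetical handler made idempotent by deduplicating on request_id."""
    if request_id in processed:
        return processed[request_id]      # a replayed retry returns the old result
    result = f"charged {amount}"          # stand-in for the real side effect
    processed[request_id] = result
    return result

def call_with_retry(amount, attempts=3):
    request_id = str(uuid.uuid4())        # one ID shared by all retries of this request
    for _ in range(attempts):
        try:
            return apply_payment(request_id, amount)
        except ConnectionError:
            continue                      # safe to retry: the handler deduplicates
    raise RuntimeError("gave up after retries")

print(call_with_retry(42))                # 'charged 42', even if it had been retried
```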

Solutions

Concurrency control (avoiding race conditions) may be the most difficult problem in transactions. Different isolation levels are used to attack it:

Read Committed

The most basic level of transaction isolation, and also the default setting for many databases. It is used to prevent dirty reads and dirty writes.

When reading from the database, only committed data can be seen; when writing, only committed data can be overwritten.

  • Dirty reads: Transaction 1 modifies a row, and transaction 2 reads that row before transaction 1 commits (see the demo after this list).
  • Dirty writes: Transaction 1 modifies row A, then transaction 2 overwrites row A before transaction 1 commits. When transaction 1 goes on to modify row B and checks row A again, row A now holds transaction 2’s data, so transaction 1’s view is flawed.
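
A small demonstration of “only committed data can be seen”, using two SQLite connections to the same file. (SQLite’s isolation is actually stronger than read committed, but it illustrates that uncommitted writes stay invisible to other readers.)

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path, isolation_level=None)   # autocommit; explicit BEGIN below
reader = sqlite3.connect(path, isolation_level=None)
writer.execute("CREATE TABLE t (k TEXT PRIMARY KEY, v INTEGER)")
writer.execute("INSERT INTO t VALUES ('x', 1)")

writer.execute("BEGIN")
writer.execute("UPDATE t SET v = 2 WHERE k = 'x'")     # uncommitted so far
print(reader.execute("SELECT v FROM t").fetchone())    # (1,) -- no dirty read
writer.execute("COMMIT")
print(reader.execute("SELECT v FROM t").fetchone())    # (2,) -- committed data is visible
```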

Snapshot isolation and repeatable read

Read committed can’t solve read skew: a transaction reads some values in the middle of another transaction, and after that other transaction finishes, re-reading returns different values (a non-repeatable read).

This suggests that a transaction’s reads should all come from one consistent point in time, rather than picking up other transactions’ changes midway through.

Snapshot isolation ensures that the database returns old values (from a snapshot taken before the transaction started), even if other transactions commit new values in the meantime. This means the database needs to maintain multiple versions of each object (multi-version concurrency control, MVCC).
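
A toy sketch of the MVCC idea, not modeled on any particular database: each key keeps a list of versions tagged with the transaction ID that wrote them, and a reader only sees versions that committed before its snapshot.

```python
import itertools

class MVCCStore:
    """Toy multi-version store: each key keeps (commit_txid, value) versions."""
    def __init__(self):
        self.versions = {}                 # key -> list of (commit_txid, value)
        self.txid = itertools.count(1)

    def write(self, key, value):
        # Simplification: each write commits immediately with a fresh txid.
        self.versions.setdefault(key, []).append((next(self.txid), value))

    def snapshot(self):
        return next(self.txid)             # everything committed before this is visible

    def read(self, key, snapshot_txid):
        visible = [v for txid, v in self.versions.get(key, [])
                   if txid < snapshot_txid]
        return visible[-1] if visible else None

store = MVCCStore()
store.write("x", "v1")
snap = store.snapshot()                    # reader takes its snapshot here
store.write("x", "v2")                     # a later writer commits a new version
print(store.read("x", snap))               # 'v1' -- old value, repeatable read
print(store.read("x", store.snapshot()))   # 'v2' -- a fresh snapshot sees it
```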

The read committed and snapshot isolation levels are primarily about the guarantees of what a read-only transaction can see in the presence of concurrent writes.

But what about writes? If two transactions do a read-modify-write cycle on the same object at the same time, one of the updates could be lost. Ways to prevent lost updates:

  • Atomic write operations: let the database perform the modification in place (e.g. an atomic increment), instead of a read-modify-write cycle in application code (see the sketch after this list).
  • Explicit locking: Explicitly lock objects that are going to be updated, if any other transaction tries to concurrently read the same object, it is forced to wait until the first read-modify-write cycle has completed.
  • Detect/resolve lost updates: this is useful in distributed systems where data has multiple replicas; the conflict-resolution techniques mentioned in previous posts can be used.
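
A minimal sketch of the first option in SQLite, contrasting a lost-update-prone read-modify-write with an atomic in-place update (the counter table is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, n INTEGER)")
conn.execute("INSERT INTO counters VALUES ('hits', 0)")

# Lost-update-prone pattern: read in the application, then write back.
# If two clients interleave here, both read the same n and one increment is lost.
(n,) = conn.execute("SELECT n FROM counters WHERE name = 'hits'").fetchone()
conn.execute("UPDATE counters SET n = ? WHERE name = 'hits'", (n + 1,))

# Atomic write: the database performs the read-modify-write itself, so
# concurrent increments serialize on the row and none are lost.
conn.execute("UPDATE counters SET n = n + 1 WHERE name = 'hits'")

print(conn.execute("SELECT n FROM counters").fetchone())  # (2,)
```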

Serializability

Write skew/phantoms: two transactions read the same objects but update different objects. In the real world, a ticketing system needs to handle this issue carefully (how do you avoid issuing the last remaining ticket to two people?).
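
A toy simulation of the anomaly, with plain Python threads standing in for transactions under snapshot isolation (all names are invented): both “transactions” read from a state where one ticket is available, and each writes a different object (its own order), so neither write conflicts with the other, yet together they oversell.

```python
import threading
import time

TOTAL_TICKETS = 1
orders = []                                   # each order is a separate object/row

def book(customer):
    available = TOTAL_TICKETS - len(orders)   # read from this txn's "snapshot"
    time.sleep(0.01)                          # both transactions read before either writes
    if available > 0:                         # the check passes in both
        orders.append(customer)               # each txn writes a different order

threads = [threading.Thread(target=book, args=(c,)) for c in ("alice", "bob")]
for t in threads: t.start()
for t in threads: t.join()
print(orders)  # ['alice', 'bob'] -- two tickets issued, only one existed
```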

The previous solutions won’t handle this problem, because the two transactions update different objects. A very naive approach is to avoid concurrency entirely: transform the transactions so they execute serially, one at a time (see the sketch below). Obviously this raises performance concerns.
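
A minimal sketch of actual serial execution: every transaction is funneled through a single worker thread, so no two transactions can interleave and the write-skew race above becomes impossible. This is only a toy, loosely inspired by single-threaded execution in systems like Redis or VoltDB.

```python
import queue
import threading

jobs = queue.Queue()

def worker():
    while True:
        txn, done = jobs.get()
        txn()                                 # runs to completion, no interleaving
        done.set()

threading.Thread(target=worker, daemon=True).start()

def run_serially(txn):
    done = threading.Event()
    jobs.put((txn, done))
    done.wait()                               # caller blocks until the txn finishes

state = {"tickets_left": 1, "orders": []}

def book(customer):
    def txn():
        if state["tickets_left"] > 0:         # check-and-update can no longer be
            state["tickets_left"] -= 1        # interleaved with another transaction
            state["orders"].append(customer)
    run_serially(txn)

callers = [threading.Thread(target=book, args=(c,)) for c in ("alice", "bob")]
for t in callers: t.start()
for t in callers: t.join()
print(state)  # exactly one ticket sold; the other request is declined
```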

Serializability is also the strongest isolation level. One way to implement it is two-phase locking, where writers don’t just block other writers; they also block readers, and vice versa.
