Designing Data-Intensive Applications (9) - Consistency and Consensus

This is my personal learning notes from reading <> Chapter 8 and 9.

This chapter is probably the hardest one in this book to understand.

The best way of building fault-tolerant systems is to find some general-purpose abstractions with useful guarantees, implement them once, and then let applications rely on those guarantees.

one of the most important abstractions for distributed systems is consensus, getting all of the nodes to agree on something and reliably reaching consensus in spite of network faults and process failures.

Consistency Guarantees

Linearizability

A strong consistency model is to make replicated data appear as if there were only a single copy, and to make all operations act on it atomically.

In a linearizable system, as soon as one client successfully completes a write, all clients reading from the database must be able to see the value just written.

But this model may not be very efficient, while weaker consistency models can be much faster, so this trade-off is important for latency-sensitive systems. Actually, most of the current systems don’t implement linearizability.

Causality Consistency and Ordering Guarantees

Causal consistency instead is the strongest possible consistency model that does not slow down due to network delays, and remains available in the face of network failures.

Unlike linearizability, which puts all operations in a single, totally ordered timeline, causality provides a weaker consistency model: some things can be concurrent, so the version history is like a timeline with branching and merging. This also implies that ordering is very important.

How to capture casual ordering

  • Sequence number ordering
  • Non casual sequence number generators
  • Lamport timestamp

In order to implement uniqueness constraint, it’s not sufficient to have a total ordering of operations, you also need to know when that order is finalized, this is what total order broadcast can do.

Total order broadcast

Total order broadcast is usually described as a protocol for exchanging messages between nodes, it is what needed for database replication.

Properties of total order broadcast:

  • Reliable delivery: No messages are lost, if a message is delivered to one node, it is delivered to all nodes.

  • Total ordered delivery: Messages are delivered to every node in the same order.

These properties actually can be implemented based on consensus algorithms.

Eventual Consistency

This is widely adopted in industry. for example Amazon DynamoDB.

Distributed Transactions and Consensus

Consensus provides total order broadcast, and therefore can be used to implement linearizable atomic operations in a fault-tolerant way.

Two-phase commit

2PC is a simple consensus algorithm, first we need to differentiate it with two phase locking.

Two-phase locking is used in databases to ensure serializability, meaning that outcome of concurrent operations can be seen as the outcome of the same operations performed in some sequential order. The algorithm describes how a transaction in a database should deal with locking a part of the database in order to read or write.

Two-phase commit is used in distributed systems in a process of distributed transaction with a single coordinator and several followers.

  • The first phase: The coordinator first asks all of the followers to commit a transactions, then every follower executes the command and if it succeeds then it replies with an agreement to commit, otherwise it replies with an abort. The voting phase.

  • The second phase: When all agreement messages are received, coordinator commits the transaction. If there is an abort message from any follower, the coordinator aborts the transaction.

What if coordinator crashed?

Fault tolerant consensus

Fault tolerant consensus algorithms compared to 2PC should have the following properties:

  • Uniform agreement: No two nodes decide differently.

  • Integrity: No node decides twice.

  • Validity: If a node decides value v, then v was proposed by some node.

  • Termination: Every node that does not crash eventually decides some value.

Zookeeper ZAB, Paxos and Raft are some popular consensus algorithms.