Paper Review - Paxos vs Raft: Consensus on distributed consensus

Reaching consensus in a distributed system among nodes or processes is a complex problem because the network is not reliable, and the participants may experience failures. Two popular algorithms are Paxos (or its variant Multi-Paxos) and Raft. Paxos was proposed by the computer scientist Leslie Lamport (a Turing Award winner), it was a seminal paper in computer science which laid the theoretical and practical groundwork for all subsequent distributed consensus algorithms.

Consensus protocols have wide applications including payment systems, storage systems (data replication e.g. Google Spanner or CockroachDB), distributed locking (shared resources in distributed systems), blockchain technology.

Heidi Howard and Richard Mortimer from University of Cambridge in the UK wrote a paper comparing two algorithms “Paxos vs Raft: Have we reached consensus on distributed consensus?” ( https://lnkd.in/g7yVWjAq ). It is an interesting read, I tried to summarize the paper in the following article.

“What's in a name? That which we call a rose by any other name would smell just as sweet.”

Juliet to Romeo in “Romeo and Juliet” by William Shakespeare

The Problem of Distributed Consensus

Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communications may experience failures. The reality of distributed systems is that failures are pervasive, consensus protocols try to achieve agreement among distributed nodes, even in the presence of failures. A few examples where consistent protocols play a crucial role are payment systems, data replication across the majority of replicas of storage systems (Google Spanner or CockroachDB), distributed locking (manage access to shared resources in distributed systems), blockchain technology. There are many consensus protocols, two dominant and popular algorithms are Paxos and Raft. Raft is much closer to Multi-Paxos (variant of Paxos) than it is to the basic, single-instance Paxos algorithm.

History

Paxos

Paxos was proposed by the computer scientist Leslie Lamport (a Turing Award winner). Paxos algorithm is a seminal paper in computer science, notable for the algorithm's correctness. Instead of presenting the algorithm using traditional computer science, Lamport used an elaborate analogy of a parliament on the fictional ancient Greek island of Paxos. The initial paper, titled "The Part-Time Parliament" described how this parliament had to function and maintain a consistent record of decrees (the state machine) even though its legislators (processes) continually "wandered in and out" (failed or became unavailable) and their messengers (network messages) were unreliable. Reviewers found the fictional Greek presentation distracting, confusing, and inappropriate for serious computer science. Recognizing the problem, Lamport published a follow-up paper, "Paxos Made Simple" in 2001. Paxos, and its variants like Multi-Paxos, eventually became the foundational protocol for strong consistency in large-scale systems, most notably Google's Chubby (a distributed lock service), Google Spanner (a global-scale distributed database).

Despite its difficult beginnings, Paxos laid the indispensable theoretical and practical groundwork for all subsequent distributed consensus algorithms. The core mechanisms developed in Paxos are still fundamental to modern consensus protocols.

Raft

The Raft consensus protocol is a direct response to the perceived complexity and implementation difficulties associated with Paxos. Raft was explicitly designed with a single goal of understandability. The creators of Raft, Diego Ongaro and John Ousterhout at Stanford University, noted that it was hard to understand Paxos protocol until after reading several simplified explanations. Raft was introduced in 2013, the paper received Best Paper Award at the 2014 USENIX Annual Technical Conference. Today, Raft is the foundation for etcd (the key-value store used by Kubernetes), Apache Kafka (KRaft), which replaced its reliance on ZooKeeper with an embedded Raft protocol.

Introduction

The paper compares the two protocols based on simplicity, understandability, leader election and popularity based on number of citations and open source available implementations.

Both algorithms operate on the same principle:

One of the participants is elected as the leader.
The leader accepts all writes and sends log entries to the followers.
Once a majority of servers replicates an entry, it is applied to the state machine by all members (commit).
In the event of a failure of leader, a new leader is elected by a majority.

Paxos: Brief Introduction

Reputation: Paxos is the "classic, textbook gold standard", but is difficult to understand and often described as a family of algorithms rather than a single one.
Leader Election: Its operation is split between Acceptors, Proposers, and Learners. For leader election (or choosing a Proposer), the protocol uses a ballot number (or term) mechanism where a Proposer sends a Prepare message with a unique, increasing ballot number, if a majority of Acceptors agree, that Proposer can then attempt to propose a value. This decentralized mechanism, known as Phase 1, ensures that only the Proposer with the highest agreed-upon ballot number is allowed to commit a new value.

Raft: Brief Introduction

Goal: Raft was designed explicitly for understandability, prioritizing simplicity over potential performance gains. Uses pragmatic, engineering-focused terms like Request Votes and Append Entries rather than the abstract 1A, 1B nomenclature of Paxos.
Leader Election: Any node is in one of three roles: Follower, Candidate, Leader. Leader election occurs when a Follower does not receive heartbeat from Leader and times out at which point it transitions to a Candidate state (uses terms by increasing the term counter), sending Request Vote messages to peers, and wins if it receives votes from a majority of the servers. Raft uses a randomized election timeout to ensure that all nodes eventually converge to a solution in the event of split vote.

Which is Better?

The authors of the paper ( https://arxiv.org/abs/2004.05074 ) mention that both consensus algorithms are excellent, and for the most part, it doesn't matter which one is chosen. The key is that the two communities can benefit from each other's research, as many of the vast optimizations developed for Paxos can be applied to Raft. The paper offers a practical assessment of their understandability and efficiency:

Understandability: The authors argue that there is no significant difference in understandability of the underlying algorithms. Raft's success is attributed to its "brilliantly written" paper and pragmatic presentation, not a fundamental algorithmic change.

Efficiency: While Raft can suffer from slower leader election due to split votes, the authors ultimately favors Raft’s leadership election mechanism because it is surprisingly efficient and avoids the need for followers to send potentially huge log entries back to the candidate during the election phase, as is necessary in a naive Paxos implementation.

The final conclusion in the paper is that while Raft's paper is better written, the underlying algorithms are not significantly different in understandability. However, Raft's leadership election mechanism is considered surprisingly lightweight and efficient in practice. Ultimately, the choice between the two often "doesn't matter" as they are incredibly similar, and many optimizations developed for one can be applied to the other. I personally prefer Raft.

References

Paxos vs Raft: Have we reached consensus on distributed consensus? By Heidi Howard, Richard Mortier https://arxiv.org/abs/2004.05074
Youtube presentation by Heidi Howard on “Have we reached consensus on distributed consensus?” https://www.youtube.com/watch?v=0K6kt39wyH0

Search This Blog

Gaurav Blog