CS-240 Assignment 3: Raft

Introduction

This is the first of two assignments in which you'll build a fault-tolerant key/value storage system. In this assignment you'll implement Raft, a replicated state machine protocol: first its leader election features, and then its log consensus features. You will implement Raft as a Go object with associated methods, available to be used as a module in a larger service. Once you have completed Raft, the final course assignment will build such a service: a key/value store on top of Raft.

Raft Overview

The Raft protocol is used to manage replica servers for services that must continue operation in the face of failure (e.g., server crashes, broken or flaky networks). The challenge is that, in the face of these failures, the replicas won't always hold identical data. The Raft protocol helps sort out what the correct data is.

Raft's basic approach for this is to implement a replicated state machine. Raft organizes client requests into a sequence, called the log, and ensures that all the replicas agree on the contents of the log. Each replica executes the client requests in the log in the order they appear in the log, applying those requests to the service's state. Since all the live replicas see the same log contents, they all execute the same requests in the same order, and thus continue to have identical service state. If a server fails but later recovers, Raft takes care of bringing its log up to date. Raft will continue to operate as long as at least a majority of the servers are alive and can talk to each other. If there is no such majority, Raft will make no progress, but will pick up where it left off as soon as a majority is alive again.

You should consult the extended Raft paper and the Raft lecture notes. You may also find this illustrated Raft guide useful to get a sense of the high-level workings of Raft. For a wider perspective, have a look at Paxos, Chubby, Paxos Made Live, Spanner, Zookeeper, Harp, Viewstamped Replication, and Bolosky et al.

Software

You can download a tarball containing the provided code for assignment 3 here. For this assignment, we will focus primarily on the code and tests for the Raft implementation in src/raft and the simple RPC-like system in src/labrpc. It is worth your while to read and digest the code in these packages.

Before you have implemented anything, your Raft tests will fail, but this behavior is a sign that you have everything properly configured and are ready to begin:

$ cd cs240 # or wherever you unpacked your tarball
$ export GOPATH="$PWD"
$ cd "$GOPATH/src/raft"
$ go test -run Election
Test: initial election ...
--- FAIL: TestInitialElection (5.00s)
config.go:286: expected one leader, got none
Test: election after network failure ...
--- FAIL: TestReElection (5.00s)
config.go:286: expected one leader, got none
FAIL
exit status 1
FAIL  raft  10.053s

You should implement Raft by adding code to raft/raft.go (only). In that file you'll find a bit of skeleton code, plus some examples of how to send and receive RPCs, and examples of how to save and restore persistent state.

Part I: Leader Election

If you haven't already, start by reading the code to determine which functions are responsible for conducting Raft leader election.

The natural first task is to fill in the RequestVoteArgs and RequestVoteReply structs, and modify Make() to create a background goroutine that starts an election (by sending out RequestVote RPCs) when it hasn't heard from another peer for a while. For election to work, you will also need to implement the RequestVote() RPC handler so that servers will vote for one another.
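
For reference, Figure 2 of the extended paper lists the fields these RPCs need. A minimal sketch is shown below; the exact layout is up to you, but note that labrpc uses Go's encoding internally, so RPC struct field names must start with capital letters to be transmitted:

// Sketch of the RequestVote structs, following Figure 2 of the extended
// Raft paper; the field names are illustrative, not required by the tester.
type RequestVoteArgs struct {
    Term         int // candidate's term
    CandidateId  int // index of the candidate requesting the vote
    LastLogIndex int // index of the candidate's last log entry
    LastLogTerm  int // term of the candidate's last log entry
}

type RequestVoteReply struct {
    Term        int  // current term, for the candidate to update itself
    VoteGranted bool // true means the candidate received this vote
}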

To implement heartbeats, you will need to define an AppendEntries RPC struct (though you will not need any real payload yet), and have the leader send them out periodically. You will also have to write an AppendEntries RPC handler method that resets the election timeout so that other servers don't step forward as leaders when one has already been elected.
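
At this stage an AppendEntries RPC carries no log entries, so a bare-bones sketch like the following is enough; you will extend it in Part II, and the names are again illustrative:

// Bare-bones AppendEntries sketch for heartbeats only.
type AppendEntriesArgs struct {
    Term     int // leader's term
    LeaderId int // so followers can learn who the current leader is
}

type AppendEntriesReply struct {
    Term    int  // current term, for the leader to update itself
    Success bool // false if the follower rejected the RPC (e.g., stale term)
}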

Make sure the timers in different Raft peers are not synchronized. In particular, make sure the election timeouts don't always fire at the same time, or else all peers will vote for themselves and no one will become leader.
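
A simple way to de-synchronize the peers is to draw each election timeout at random from a range, picking a fresh value every time the timer is reset. The range below is only an example, not a requirement:

import (
    "math/rand"
    "time"
)

// Example only: a randomized election timeout, re-drawn on every reset so
// that peers rarely time out at the same moment.
func randomElectionTimeout() time.Duration {
    return time.Duration(300+rand.Intn(200)) * time.Millisecond
}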

Your Raft implementation must support the following interface, which the tester and (eventually) your key/value server will use. You'll find more details in comments in raft.go.

// create a new Raft server instance:
rf := Make(peers, me, persister, applyCh)

// start agreement on a new log entry:
rf.Start(command interface{}) (index, term, isleader)

// ask a Raft for its current term, and whether it thinks it is leader
rf.GetState() (term, isLeader)

// each time a new entry is committed to the log, each Raft peer
// should send an ApplyMsg to the service (or tester).
type ApplyMsg

A service calls Make(peers,me,…) to create a Raft peer. The peers argument is an array of established RPC connections, one to each Raft peer (including this one). The me argument is the index of this peer in the peers array. Start(command) asks Raft to start the processing to append the command to the replicated log. Start() should return immediately, without waiting for this process to complete. The service expects your implementation to send an ApplyMsg for each new committed log entry to the applyCh argument to Make().
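
To make the applyCh side of this contract concrete, the service (or tester) typically runs a loop like the sketch below; apply is a placeholder for whatever the service does with a committed command:

// Sketch of how a service consumes committed entries. Each ApplyMsg carries
// a committed command and its log index; see the ApplyMsg definition in
// raft.go for the exact fields.
go func() {
    for msg := range applyCh {
        apply(msg) // placeholder: execute the command against the service's state
    }
}()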

Your Raft peers should exchange RPCs using the labrpc Go package that we provide to you. It is modeled after Go's rpc library, but internally uses Go channels rather than sockets. raft.go contains some example code that sends an RPC (sendRequestVote()) and that handles an incoming RPC (RequestVote()).

Implementing leader election and heartbeats (empty AppendEntries calls) should be sufficient for a single leader to be elected, for it to remain the leader in the absence of failures, and for a new leader to take over after failures. Once you have this working, you should be able to pass the two Election "go test" tests. Note that the tests should pass every time, not just occasionally; use the --count n flag to run the tests multiple times:

$ go test -run Election --count 3
Test: initial election ...
  ... Passed
Test: election after network failure ...
  ... Passed
PASS
ok  raft  7.008s

Part II: Log Consensus

While being able to elect a leader is useful, we want to use Raft to keep a consistent, replicated log of operations. To do so, we need to have the servers accept client operations through Start() and insert them into the log. In Raft, only the leader is allowed to append to the log; it disseminates new entries to the other servers by including them in its outgoing AppendEntries RPCs.

In this assignment you'll implement most of the Raft design described in the extended paper, including saving persistent state and reading it after a node fails and then restarts. You will not implement cluster membership changes (Section 6) or log compaction / snapshotting (Section 7).

A set of Raft instances talk to each other over RPC to maintain replicated logs. Your Raft interface will support an indefinite sequence of numbered commands, also called log entries; each entry is identified by its index in the log. The log entry at a given index will eventually be committed, and at that point your Raft should send it to the larger service for it to execute.
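
Since each entry must record both the client's command and the term in which the leader received it, a log entry type along the following lines is a reasonable starting point; the type and field names are your own choice, not part of the provided code. The AppendEntries structs from Part I will also need to grow the Figure 2 fields (prevLogIndex, prevLogTerm, the entries themselves, and leaderCommit):

// Sketch of a log entry; Raft needs at least the client command and the
// term in which the entry was created.
type LogEntry struct {
    Command interface{} // the client operation passed to Start()
    Term    int         // term when the entry was received by the leader
}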

Your first major task is to implement the leader and follower code to append new log entries. This will involve implementing Start(), completing the AppendEntries RPC structs, sending them, and fleshing out the AppendEntries RPC handler. Your goal should first be to pass the TestBasicAgree() test (in test_test.go). Once that works, try to get all of the tests before the "basic persistence" test to pass by running the following command:

  $ go test -run "TestBasicAgree|TestFailAgree|TestFailNoAgree|TestConcurrentStarts|TestRejoin|TestBackup|TestCount" --count 3
  Test: basic agreement ...
    ... Passed
  Test: agreement despite follower failure ...
    ... Passed
  Test: no agreement if too many followers fail ...
    ... Passed
  Test: concurrent Start()s ...
    ... Passed
  Test: rejoin of partitioned leader ...
    ... Passed
  Test: leader backs up quickly over incorrect follower logs ...
    ... Passed
  Test: RPC counts aren't too high ...
    ... Passed
  PASS
  ok    raft  43.365s

Only RPCs may be used for interaction between different Raft instances. For example, different instances of your Raft implementation are not allowed to share Go variables. Your implementation should not use files at all.

Part III: Fault Tolerance

The next major task is to handle the fault-tolerance aspects of the Raft protocol, making your implementation robust against various kinds of failures, including servers that miss some RPCs and servers that crash and restart.

A Raft-based server must be able to pick up where it left off, and continue if the computer it is running on reboots. This requires that Raft keep persistent state that survives a reboot (the paper's Figure 2 mentions which state should be persistent).

A “real” implementation would do this by writing Raft's persistent state to disk each time it changes, and reading the latest saved state from disk when restarting after a reboot. Your implementation won't use the disk; instead, it will save and restore persistent state from a Persister object (see persister.go). Whoever calls Make() supplies a Persister that initially holds Raft's most recently persisted state (if any). Raft should initialize its state from that Persister, and should use it to save its persistent state each time the state changes. Use the Persister's ReadRaftState() and SaveRaftState() methods for reading and saving, respectively.

Implement persistence by first adding code to serialize any state that needs persisting in persist(), and to deserialize that same state in readPersist(). You then need to determine at what points in the Raft protocol your servers are required to persist their state, and insert calls to persist() in those places. Once this code is complete, you should pass the remaining tests. You may want to first try to pass the "basic persistence" test (go test -run 'TestPersist1$'), and then tackle the remaining ones by running:

  $ go test -run "TestPersist1|TestPersist2|TestPersist3|TestFigure8|TestUnreliableAgree|TestFigure8Unreliable|TestReliableChurn|TestUnreliableChurn" --count 3
  Test: basic persistence ...
    ... Passed
  Test: more persistence ...
    ... Passed
  Test: partitioned leader and one follower crash, leader restarts ...
    ... Passed
  Test: Figure 8 ...
    ... Passed
  Test: unreliable agreement ...
    ... Passed
  Test: Figure 8 (unreliable) ...
    ... Passed
  Test: churn ...
    ... Passed
  Test: unreliable churn ...
    ... Passed
  PASS
  ok    raft  146.815s

You will need to encode the state as an array of bytes in order to pass it to the Persister; raft.go contains some example code for this in persist() and readPersist().
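
As a rough sketch, and assuming the standard encoding/gob package and field names like currentTerm, votedFor, and log (follow whatever encoder and fields the example code in raft.go actually uses), persistence might look like this:

import (
    "bytes"
    "encoding/gob"
)

// Sketch of persist()/readPersist(); exactly which state must be persisted
// is for you to determine from Figure 2 of the paper.
func (rf *Raft) persist() {
    w := new(bytes.Buffer)
    e := gob.NewEncoder(w)
    e.Encode(rf.currentTerm)
    e.Encode(rf.votedFor)
    e.Encode(rf.log)
    rf.persister.SaveRaftState(w.Bytes())
}

func (rf *Raft) readPersist(data []byte) {
    if len(data) == 0 {
        return // no saved state to restore
    }
    r := bytes.NewBuffer(data)
    d := gob.NewDecoder(r)
    d.Decode(&rf.currentTerm)
    d.Decode(&rf.votedFor)
    d.Decode(&rf.log)
}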

In order to pass some of the challenging tests towards the end, such as those marked "unreliable", you will need to implement the optimization to allow a follower to back up the leader's nextIndex by more than one entry at a time. See the description in the extended Raft paper starting at the bottom of page 7 and top of page 8 (marked by a gray line).
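
The paper only sketches this optimization, so the exact scheme is your own design. One common approach is to have a rejecting follower return enough information for the leader to skip an entire term's worth of entries at once; the field names below are illustrative and are not part of the provided code:

// Illustrative conflict hints added to the AppendEntries reply. On a
// rejection, the leader can set nextIndex for that follower from these
// fields instead of decrementing it one entry at a time.
type AppendEntriesReply struct {
    Term          int  // follower's current term
    Success       bool // false if the prevLogIndex/prevLogTerm consistency check failed
    ConflictTerm  int  // term of the follower's conflicting entry, if any
    ConflictIndex int  // first index the follower stores for ConflictTerm
}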

Resources and Advice

Submission

All submissions will be through the course's Inginious site, which will run the tests and return your score in real time. You will need to sign in with your KAUST username and password (the same ones used for Portal and Blackboard). Once logged in, click assignment 3, which will ask you to upload a file. This file should be raft.go. You may submit multiple times; you will receive the highest score recorded prior to the assignment deadline. If you face any issues (this applies especially to students using Windows), please post a public question on Piazza with "Assignment submission issue" in the title.

Before submitting, please run the full tests given above for each part one final time. You are responsible for making sure your code works.

You will receive full credit for Part I, the leader election component, if your software passes the Election tests (as run by the go test commands above) on our servers. You will receive full credit for Part II if your software passes the tests mentioned for that section on our servers. You will receive full credit for Part III if your software passes the tests mentioned for that section on our servers.

The final portion of your credit is determined by code quality checks, using the standard tools gofmt and go vet. You will receive full credit for this portion if all submitted files conform to the style standards set by gofmt and the report from go vet is clean for your raft package (that is, it produces no errors). If your code does not pass the gofmt check, you should reformat your code using the tool. You can also use the Go Checkstyle tool for advice on improving your code's style, if applicable. Additionally, though not part of the graded checks, it is advisable to produce code that complies with Golint where possible.

Acknowledgments

This assignment is adapted from Princeton's COS-418, and previously from MIT's 6.824 course. Thanks to Mike Freedman, Frans Kaashoek, Robert Morris, and Nickolai Zeldovich for their support.


Last updated: 2019-10-08 13:30:14