raft: doc, debugging instruction on MessageType
This adds documentation on MessageType. Having clear explanation about MessageType helps understand raft logic and debug etcd when there is a message dropping. This is partially for coreos#3806.release-2.3
parent
4e5f20de3b
commit
d817f885db
99
raft/doc.go
99
raft/doc.go
|
@ -13,7 +13,8 @@
|
|||
// limitations under the License.
|
||||
|
||||
/*
|
||||
Package raft provides an implementation of the raft consensus algorithm.
|
||||
Package raft sends and receives messages in the Protocol Buffer format
|
||||
defined in the raftpb package.
|
||||
|
||||
Raft is a protocol by which a cluster of nodes can maintain a replicated state machine.
|
||||
The state machine is kept in sync through the use of a replicated log.
|
||||
|
@ -180,5 +181,101 @@ cannot be removed any more since the cluster cannot make progress.
|
|||
For this reason it is highly recommended to use three or more nodes in
|
||||
every cluster.
|
||||
|
||||
MessageType
|
||||
|
||||
Package raft sends and receives message in Protocol Buffer format (defined
|
||||
in raftpb package). Each state (follower, candidate, leader) implements its
|
||||
own 'step' method ('stepFollower', 'stepCandidate', 'stepLeader') when
|
||||
advancing with the given raftpb.Message. Each step is determined by its
|
||||
raftpb.MessageType. Note that every step is checked by one common method
|
||||
'Step' that safety-checks the terms of node and incoming message to prevent
|
||||
stale log entries:
|
||||
|
||||
'MsgHup' is used for election. If a node is a follower or candidate, the
|
||||
'tick' function in 'raft' struct is set as 'tickElection'. If a follower or
|
||||
candidate has not received any heartbeat before the election timeout, it
|
||||
passes 'MsgHup' to its Step method and becomes (or remains) a candidate to
|
||||
start a new election.
|
||||
|
||||
'MsgBeat' is an internal type that signals leaders to send a heartbeat of
|
||||
the 'MsgHeartbeat' type. If a node is a leader, the 'tick' function in
|
||||
the 'raft' struct is set as 'tickHeartbeat', and sends periodic heartbeat
|
||||
messages of the 'MsgBeat' type to its followers.
|
||||
|
||||
'MsgProp' proposes to append data to its log entries. This is a special
|
||||
type to redirect proposals to leader. Therefore, send method overwrites
|
||||
raftpb.Message's term with its HardState's term to avoid attaching its
|
||||
local term to 'MsgProp'. When 'MsgProp' is passed to the leader's 'Step'
|
||||
method, the leader first calls the 'appendEntry' method to append entries
|
||||
to its log, and then calls 'bcastAppend' method to send those entries to
|
||||
its peers. When passed to candidate, 'MsgProp' is dropped. When passed to
|
||||
follower, 'MsgProp' is stored in follower's mailbox(msgs) by the send
|
||||
method. It is stored with sender's ID and later forwarded to leader by
|
||||
rafthttp package.
|
||||
|
||||
'MsgApp' contains log entries to replicate. A leader calls bcastAppend,
|
||||
which calls sendAppend, which sends soon-to-be-replicated logs in 'MsgApp'
|
||||
type. When 'MsgApp' is passed to candidate's Step method, candidate reverts
|
||||
back to follower, because it indicates that there is a valid leader sending
|
||||
'MsgApp' messages. Candidate and follower respond to this message in
|
||||
'MsgAppResp' type.
|
||||
|
||||
'MsgAppResp' is response to log replication request('MsgApp'). When
|
||||
'MsgApp' is passed to candidate or follower's Step method, it responds by
|
||||
calling 'handleAppendEntries' method, which sends 'MsgAppResp' to raft
|
||||
mailbox.
|
||||
|
||||
'MsgVote' requests votes for election. When a node is a follower or
|
||||
candidate and 'MsgHup' is passed to its Step method, then the node calls
|
||||
'campaign' method to campaign itself to become a leader. Once 'campaign'
|
||||
method is called, the node becomes candidate and sends 'MsgVote' to peers
|
||||
in cluster to request votes. When passed to leader or candidate's Step
|
||||
method and the message's Term is lower than leader's or candidate's,
|
||||
'MsgVote' will be rejected ('MsgVoteResp' is returned with Reject true).
|
||||
If leader or candidate receives 'MsgVote' with higher term, it will revert
|
||||
back to follower. When 'MsgVote' is passed to follower, it votes for the
|
||||
sender only when sender's last term is greater than MsgVote's term or
|
||||
sender's last term is equal to MsgVote's term but sender's last committed
|
||||
index is greater than or equal to follower's.
|
||||
|
||||
'MsgVoteResp' contains responses from voting request. When 'MsgVoteResp' is
|
||||
passed to candidate, the candidate calculates how many votes it has won. If
|
||||
it's more than majority (quorum), it becomes leader and calls 'bcastAppend'.
|
||||
If candidate receives majority of votes of denials, it reverts back to
|
||||
follower.
|
||||
|
||||
'MsgSnap' requests to install a snapshot message. When a node has just
|
||||
become a leader or the leader receives 'MsgProp' message, it calls
|
||||
'bcastAppend' method, which then calls 'sendAppend' method to each
|
||||
follower. In 'sendAppend', if a leader fails to get term or entries,
|
||||
the leader requests snapshot by sending 'MsgSnap' type message.
|
||||
|
||||
'MsgSnapStatus' tells the result of snapshot install message. When a
|
||||
follower rejected 'MsgSnap', it indicates the snapshot request with
|
||||
'MsgSnap' had failed from network issues which causes the network layer
|
||||
to fail to send out snapshots to its followers. Then leader considers
|
||||
follower's progress as probe. When 'MsgSnap' were not rejected, it
|
||||
indicates that the snapshot succeeded and the leader sets follower's
|
||||
progress to probe and resumes its log replication.
|
||||
|
||||
'MsgHeartbeat' sends heartbeat from leader. When 'MsgHeartbeat' is passed
|
||||
to candidate and message's term is higher than candidate's, the candidate
|
||||
reverts back to follower and updates its committed index from the one in
|
||||
this heartbeat. And it sends the message to its mailbox. When
|
||||
'MsgHeartbeat' is passed to follower's Step method and message's term is
|
||||
higher than follower's, the follower updates its leaderID with the ID
|
||||
from the message.
|
||||
|
||||
'MsgHeartbeatResp' is a response to 'MsgHeartbeat'. When 'MsgHeartbeatResp'
|
||||
is passed to leader's Step method, the leader knows which follower
|
||||
responded. And only when the leader's last committed index is greater than
|
||||
follower's Match index, the leader runs 'sendAppend` method.
|
||||
|
||||
'MsgUnreachable' tells that request(message) wasn't delivered. When
|
||||
'MsgUnreachable' is passed to leader's Step method, the leader discovers
|
||||
that the follower that sent this 'MsgUnreachable' is not reachable, often
|
||||
indicating 'MsgApp' is lost. When follower's progress state is replicate,
|
||||
the leader sets it back to probe.
|
||||
|
||||
*/
|
||||
package raft
|
||||
|
|
Loading…
Reference in New Issue