Commit Graph

246 Commits (9df46f9d6f06c1206d5a6ce0e081fc4ef59983aa)

Author SHA1 Message Date
Xiang Li 9df46f9d6f raft: expose RecentActive in Progress 2015-12-10 12:17:18 -08:00
Bram Gruneir 1901a4c718 raft: Ensure that Progress is not nil when a MsgSnapStatus comes in.
This was causing some issues in cockroach cockroachdb/cockroach#2950
2015-12-07 16:01:18 -05:00
Xiang Li a8cc1570d0 raft: support quorum check when raft is leader
If quorum check fails, the leader will step down to follower.
2015-11-24 09:36:37 -08:00
Xiang Li 5d0268aa2e Merge pull request #3877 from bdarnell/campaign-while-leader
raft: no-op instead of panic for Campaigning while leader
2015-11-16 19:59:34 -08:00
Ben Darnell fbeb58d265 raft: no-op instead of panic for Campaigning while leader
We need to be able to force an election (on one node) after creating a
new group (cockroachdb/cockroach#1384), but it is difficult to ensure
that our call to Campaign does not race with an election that may be
started by raft itself. A redundant call to Campaign should be a no-op
instead of a panic. (But the panic in becomeCandidate remains, because
we don't want to update the term or change the committed index in this
case)
2015-11-16 21:44:14 -05:00
Yicheng Qin 3a65442d7d raft: fix print format for term in one log line
`term` should be printed in decimal representation instead of
hexadecimal one.
2015-11-15 20:26:16 -08:00
Yicheng Qin 533e728b64 Merge pull request #3609 from yichengq/raft-snapshot
raft: kill TODO about behavior when snapshot fails
2015-09-29 19:32:31 -07:00
Yicheng Qin 4c82b481a5 raft: improve behavior when snapshot fails
etcd is going to support incremental snapshot, and we design to let it
send at most one snapshot out at first stage. So when one snapshot is in
flight, snapshot request will return error.

When failing to get snapshot when sending MsgSnap, raft prints out
related log and abort sending this message.
2015-09-29 19:15:15 -07:00
Kenji Kaneda f602767e50 raft: remove an obsolete TODO comment on 4MB maxMsgSize hard coding
The TODO comment was added by 7571b2cd, and it was addressed by d9b5b56c.
2015-09-28 21:31:12 -07:00
Dmitry Smirnov b2f4a5f587 *: fix spelling issues (codespell).
Signed-off-by: Dmitry Smirnov <onlyjob@member.fsf.org>
2015-09-11 10:22:29 +10:00
Ben Darnell 4f20e01f60 raft: Ignore proposals if not a current member.
Fixes another panic in MultiNode.Propose.
2015-08-31 20:31:14 -04:00
Xiang Li 50c1db3fbf raft: downgrade the logging around snapshot to debugf
Snapshot related logging is spamming when leader trying to
sync a failed peer.
2015-08-18 15:43:53 -07:00
es-chow cc362ccdad raft: set logger to raft so log context such as multinode groupID can be logged 2015-08-12 22:56:00 +08:00
Xiang Li b4022899eb raft: fix panic in send app
sendApp accesses the storage several times. Perviously, we
assume that the storage will not be modified during the read
opeartions. The assumption is not true since the storage can
be compacted between the read operations. If a compaction
causes a read entries error, we should not painc. Instead, we
can simply retry the sendApp logic until succeed.
2015-06-15 14:23:33 -07:00
Xiang Li 1279e495f0 raft: make the repeated log message under bad path debug level 2015-06-05 17:29:24 -07:00
Xiang Li 1561b85bf3 raft: drop the raft prefix in logging 2015-06-02 12:50:42 -07:00
Xiang Li b3fb052ad4 raft: make peers a prviate field in raft.Config 2015-03-24 11:10:07 -07:00
Xiang Li d9b5b56c82 raft: make raft configurable 2015-03-23 09:55:19 -07:00
Xiang Li 4a64373225 raft: add flow control for progress
Each progress has a inflighs sliding window. When the progress
is in replicate state, inflights will control the sending speed
of the leader.

The leader can have at most maxInflight number of inflight
messages for each replicate progress. Receving a appResp moves
forward the sliding window. Heartbeat response free one
slot if the window is full.
2015-03-20 20:04:33 -07:00
Xiang Li 2adb58f9de raft: move progress to progress.go 2015-03-19 10:05:04 -07:00
Xiang Li 7571b2cde2 raft: limit the size of msgApp
limit the max size of entries sent per message.
Lower the cost at probing state as we limit the size per message;
lower the penalty when aggressively decrease to a too low next.
2015-03-18 15:59:30 -07:00
Yicheng Qin 67194c0b22 raft: introduce progress states 2015-03-18 08:16:32 -07:00
Xiang Li 39731724ff Merge pull request #2485 from yichengq/337
raft: fall back to bad path when unreachable
2015-03-11 14:16:39 -07:00
Yicheng Qin be0bf2a2bd raft: fall back to bad path when unreachable 2015-03-11 13:21:23 -07:00
Xiang Li c643967a41 raft: reply with the commit index when receives a smaller append message
Follower should not reject the append message with a smaller index than its commit
index. Or it will trigger the leader's resending logic, which might have a high cost.
2015-03-10 22:32:36 -07:00
Xiang Li a2be25cba4 Merge pull request #2460 from xiang90/raft-logger
raft: introduce logger interface
2015-03-09 08:00:21 -07:00
Xiang Li 97579e2e1d raft: introduce logger interface 2015-03-08 21:36:32 -07:00
Xiang Li 7fe608532a raft: do not reset vote if term is not changed
raft MUST keep the voting information for the same term. reset
should not reset vote if term is not changed.
2015-03-07 22:31:20 -08:00
Yicheng Qin b4b9b9118a rafthttp: report MsgSnap status 2015-03-02 09:38:11 -08:00
Yicheng Qin 09f181f585 raft: log unreachable remote node 2015-03-01 16:47:49 -08:00
Xiang Li 9b4d52ee73 raft: do not resend snapshot if not necessary
raft relies on the link layer to report the status of the sent snapshot.
If the snapshot is still sending, the replication to that remote peer will
be paused. If the snapshot finish sending, the replication will begin
optimistically after electionTimeout. If the snapshot fails, raft will
try to resend it.
2015-02-28 11:41:58 -08:00
Xiang Li 2185ac5ac8 raft: cleanup unreachable 2015-02-28 11:35:16 -08:00
Xiang Li 2af33fd494 raft: add reportUnreachable 2015-02-28 10:45:22 -08:00
Xiang Li bff2ccaa22 Merge pull request #2170 from xiang90/remove_log
raft: remove default verbose logging
2015-01-27 15:58:53 -08:00
Xiang Li 553379e82b raft: remove default verbose logging 2015-01-27 15:57:44 -08:00
Ben Darnell 33d2400063 raft: Send any waiting appends after receiving MsgAppResp.
This addresses a problem that comes up in the cockroach tests,
in which the order of messages may lead to deadlocks (due to
the fact that we don't have regular heartbeat timers in most
of our tests).
2015-01-27 17:43:29 -05:00
Jonathan Boulle f1ed69e883 *: switch to line comments for copyright
Build tags are not compatible with block comments.
Also adds copyright header to a few places it was missing.
2015-01-26 09:53:30 -08:00
Ben Darnell 8c3a6508e9 raft: Add applied to the newRaft log message. 2015-01-22 12:04:40 -05:00
Ben Darnell 59214978a2 raft: Add applied index as an argument to newRaft and RestartNode. 2015-01-22 11:38:05 -05:00
Xiang Li 003b97a60f raft: public progress struct in raft 2015-01-20 10:26:22 -08:00
Ben Darnell 2e1c36cdd9 raft: introduce MsgHeartbeatResp.
Now that heartbeats are distinct from MsgApp{,Resp}, the retries
currently performed in stepLeader's MsgAppResp section are only
performed on an actual MsgAppResp (or a new MsgProp). This means
that it may take a long time to recover from a dropped MsgAppResp
in a quiet cluster.

This commit adds a dedicated heartbeat response message. This message
does not convey the follower's current log position because the
MsgHeartbeat does not include the leaders term and index. Upon receipt
of a heartbeat response, the leader may retry the latest MsgApp if it
believes the follower to be behind.
2015-01-14 17:34:10 -05:00
Ben Darnell 9972e62d94 raft: Use <= instead of < for heartbeat ticks.
In code outside the raft package, we cannot call raft.bcastHeartbeat
directly. Instead, to control heartbeats we set heartbeatInterval to 1
and call Tick().
2015-01-14 15:27:32 -05:00
Xiang Li 35b907ac58 raft: add lastIndex as rejectHint
Add the lastindex of the raft log as reject hint, so the leader can
bypass the greater index probing and decrease the next index directly
to last + 1.
2015-01-01 19:04:07 -08:00
Xiang Li fc96a9e4a7 raft: remove unnecessary funcs in raft.go 2014-12-25 17:04:33 -08:00
Xiang Li 88767d913d raft: leader waits for the reply of previous message when follower is not in good path.
It is reasonable for the leader to wait for the reply before sending out the next
msgApp or msgSnap for the follower in bad path. Or the leader will send out useless
messages if the previous message is rejected or the previous message is a snapshot.
Especially for the snapshot case, the leader will be 100% to send out duplicate message
including the snapshot, which is a huge waste.

This commit implement a timeout based wait mechanism. The timeout for normal msgApp is a
heartbeatTimeout and the timeout for snapshot is electionTimeout(snapshot is larger). We
can implement a piggyback mechanism(application notifies the msg lost) in the future
if necessary.
2014-12-18 15:01:50 -08:00
Xiang Li 044e35b814 raft: use newRaft 2014-12-15 11:25:35 -08:00
Xiang Li c586d5012c raft: log term as %d 2014-12-14 10:06:45 -08:00
Xiang Li 2c2e032155 Merge pull request #1908 from bdarnell/error-fixes
raft: remove panic when we see a proposal with no leader.
2014-12-11 13:58:51 -08:00
Ben Darnell b26856b603 raft: add detail to "no leader" log message 2014-12-11 15:07:32 -05:00
Xiang Li 89cba625d6 Merge pull request #1897 from xiang90/raft
raft: get rid of the using of defer in critical path
2014-12-10 21:24:38 -08:00