Commit Graph

961 Commits (bcc147127d1a33d827f78a0d1bf8fe2363a344f2)

Author SHA1 Message Date
Xiang Li d506962fec
Merge pull request #10848 from spzala/raftthesis10831
raftdoc: fix raft thesis link
2019-06-28 12:43:32 -07:00
Sahdev P. Zala 655ab0ac6a raftdoc: fix raft thesis link
The current link does not work and not valid anymore per stanford support.
Replace all current refs with a link that is used by the
https://raft.github.io/

Fixes # https://github.com/etcd-io/etcd/issues/10831
2019-06-24 19:01:00 -04:00
Tobias Schottdorf f9c2d00fb3 raft: extract 'tracker' package
Mechanically extract `progressTracker`, `Progress`, and `inflights`
to their own package named `tracker`. Add lots of comments in the
progress, and take the opportunity to rename and clarify various
fields.
2019-06-21 22:15:00 +02:00
Tobias Schottdorf e262542d6d quorum: fix vet failure
This slipped in during a rename and I didn't see it in CI because of
CI flakiness and a general intransparency about which failures are
important.
2019-06-20 23:40:08 +02:00
Tobias Schottdorf e039629907 raft: use half-populated joint quorum
To ease a future transition into joint quorums, this commit removes the
previous "ad-hoc" majority-based quorum and vote computations with that
introduced in the `raft/quorum` package.

More specifically, the progressTracker now uses a quorum.JointConfig for
which the "second" majority quorum is always empty; in this case the
quorum behaves like the one quorum.MajorityConfig that is actually
present. Or, more briefly, this change is a no-op, but it will take the
busywork out of actually starting to make use of joint quorums in the
future.

On a side node, I suspect that this might've fixed a bug regarding the
read index though I haven't been able to explicitly come up with a
counter-example. The problem was that the acks collected for the read
index weren't taking into account membership changes, so they'd run the
danger of using acks from nodes since removed to claim that a quorum of
acks had been received. There's a chance that there isn't a
counter-example (the only guarantee extracted from the "quorum" is that
there isn't another leader, but even if there's another leader all that
matters is that that leader doesn't have a divergent history from the
stale leader in the hypothetical counter-example), but either way there
is morally a bug here that is now fixed because VoteCommitted doesn't
care about votes from members that are not voters known to the currently
active configuration.
2019-06-19 14:19:35 +02:00
Tobias Schottdorf 0384c587eb raft: rename makeP{RS,rogressTracker} 2019-06-19 14:19:35 +02:00
Tobias Schottdorf 3def2364e4 raft: use membership sets in progress tracking
Instead of having disjoint mappings of ID to *Progress for voters and
learners, use a map[id]struct{} for each and share a map of *Progress
among them.

This is easier to handle when joint quorums are introduced, at which
point a node may be a voting member of two quorums.
2019-06-19 14:19:35 +02:00
Tobias Schottdorf 76c8ca5a55 quorum: introduce library for majority and joint quorums
The quorum package contains logic to reason about committed indexes as
well as vote outcomes for both majority and joint quorums. The package
is oblivious to the existence of learner replicas.

The plan is to hook this up to etcd/raft in subsequent commits.
2019-06-19 14:19:35 +02:00
Tobias Schottdorf c844526002 raft: prevent learners from becoming leader
We were already taking some precautions against learners campaigning,
but there was no safeguard against an explicit call to `Campaign()`.
The newly added test also verifies that leadership transfers to
learners are ignored.
2019-06-17 09:20:45 +02:00
Tobias Schottdorf cbb7730c26 raft: make relationship between node and RawNode explicit
This will keep them from diverging to much. In fact we should remove
some of the obvious differences that have crept in over time so that
what remains is structural. This isn't done in this commit since
it amounts to a change in the public API; we should lump this in
when we break the public API the next time.
2019-06-07 23:07:42 +02:00
Gyuho Lee 34bd797e67 *: revert module import paths
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-05-28 15:39:35 -07:00
Tobias Schottdorf 5dd45011d6 raft: rename prs to progressTracker 2019-05-21 16:03:36 +02:00
Tobias Schottdorf 02b0d80234 raft: remove quorum() dependency from readOnly
This now delegates the quorum computation to r.prs, which will allow
it to generalize in a straightforward way when etcd-io/etcd#7625 is
addressed.
2019-05-21 16:03:36 +02:00
Tobias Schottdorf 57a1b39fcd raft: avoid another call to quorum()
This particular caller just wanted to know whether it was in a single-voter
cluster configuration, which is now a question prs can answer.
2019-05-21 16:02:52 +02:00
Tobias Schottdorf bc828e939a raft: pull checkQuorumActive into prs
It's looking at each voter's Progress and needs to know how quorums
work, so this is the ideal new home for it.
2019-05-21 16:02:52 +02:00
Tobias Schottdorf a6f222e62d raft: establish an interface around vote counting
This cleans up the mechanical refactor in the last commit and will
help with etcd-io/etcd#7625 as well.
2019-05-21 16:02:52 +02:00
Tobias Schottdorf 26eaadb1d1 raft: move votes into prs
This is purely mechanical. Cleanup deferred to the next commit.
2019-05-21 16:02:52 +02:00
Tobias Schottdorf a11563737c raft: use progress tracker APIs in more places
This doesn't completely eliminate access to prs.nodes, but that's not
really necessary. This commit uses the existing APIs in a few more
places where it's convenient, and also sprinkles some assertions.
2019-05-21 16:02:52 +02:00
Tobias Schottdorf ea82b2b758 raft: move more methods onto the progress tracker
Continues what was initiated in the last commit.
2019-05-21 16:02:52 +02:00
Tobias Schottdorf dbac67e7a8 raft: extract progress tracking into own component
The Progress maps contain both the active configuration and information
about the replication status. By pulling it into its own component, this
becomes easier to unit test and also clarifies the code, which will see
changes as etcd-io/etcd#7625 is addressed.

More functionality will move into `prs` in self-contained follow-up commits.
2019-05-21 16:02:52 +02:00
Hongyi Shen d68f60e9a0 raft: update raft paper link (previous link deprecated) 2019-05-03 08:50:16 -07:00
Gyuho Lee e899023f3f
Merge pull request #10640 from shrajfr12/gomodulecompat
Fix module path to have the major version to comply with go modules specification.
2019-05-01 22:46:03 -07:00
Xiang Li e3f37534e1
Merge pull request #10684 from nvanbenschoten/nvanbenschoten/appendAndCopy
raft: Avoid multiple allocs when merging stable and unstable log
2019-04-30 11:51:32 -07:00
Xiang Li 0bc219a91e
Merge pull request #10679 from nvanbenschoten/nvanbenschoten/commitAlloc
raft: Avoid allocation when boxing slice in maybeCommit
2019-04-30 10:55:16 -07:00
shivaramr 9150bf52d6 go modules: Fix module path version to include version number 2019-04-26 15:29:50 -07:00
Nathan VanBenschoten b5593de806 raft: Avoid multiple allocs when merging stable and unstable log
Appending to an empty slice twice could (and often did) result in
multiple allocations. This was wasteful. We can avoid this by performing
a single allocation with the correct size and copying into it.
2019-04-26 14:57:51 -04:00
Nathan VanBenschoten 24f35a9861 raft: avoid allocation of Raft entry due to logging
`raftpb.Entry.String` takes a pointer receiver, so calling it
on a loop variable was causing the variable to escape. Removing
the `.String()` call was enough to avoid the allocation, but
this also avoids a memory copy and prevents similar bugs.

This was responsible for 11.63% of total allocations in an
experiment with https://github.com/nvanbenschoten/raft-toy.
2019-04-26 14:56:31 -04:00
Nathan VanBenschoten 208b8a349c raft: Avoid allocation when boxing slice in maybeCommit
By boxing a heap-allocated slice header instead of the slice
header on the stack, we can avoid an allocation when passing
through the sort.Interface interface.

This was responsible for 26.61% of total allocations in an
experiment with https://github.com/nvanbenschoten/raft-toy.
2019-04-26 00:10:45 -04:00
Jingyi Hu 30034e5ff5 raft: add learner field to progress stringer 2019-04-11 18:15:03 -07:00
Jingyi Hu 5088d70d69 raft: leader response to learner MsgReadIndex
Leader should check message sender after receiving MsgReadIndex, even
when raft quorum is 1. The message could be sent from learner node, and
leader should respond.
2019-03-28 16:14:32 -07:00
Sahdev P. Zala 56f1bce161 raft/doc: clarify the case of out of date term
Clarify the doc.

Fixes #10491
2019-03-26 14:00:24 -04:00
caoming 92d5d19ce9 raft: more precise that rename res to err. 2019-03-22 10:18:00 +08:00
shawnli 23731bf9ba raft: set lead to none when becomePreCandidate 2018-12-28 19:57:26 +08:00
Xiang Li 1900a8e26f
Merge pull request #10308 from tbg/fix/progress-after-snap
raft: enter ProgressStateReplica immediately after snapshot
2018-12-06 10:58:22 -08:00
Tobias Schottdorf bd332b318e raft: add (*RawNode).WithProgress
Calls to Status can be frequent and currently incur three heap
allocations, but often the caller has no intention to hold on to the
returned status.

Add StatusWithoutProgress and WithProgress to allow avoiding heap
allocations altogether. StatusWithoutProgress does what's on the
tin and additionally returns a value (instead of a pointer) to
avoid the associated heap allocation. By not returning a Progress
map, it avoids all other allocations that Status incurs.

To still introspect the Progress map, add WithProgress, which
uses a simple visitor pattern.

Add benchmarks to verify that this is indeed allocation free.

```
BenchmarkStatusProgress/members=1/Status-8                  5000000    353 ns/op        784 B/op    3 allocs/op
BenchmarkStatusProgress/members=1/Status-example-8          5000000    372 ns/op        784 B/op    3 allocs/op
BenchmarkStatusProgress/members=1/StatusWithoutProgress-8   100000000  17.6 ns/op         0 B/op    0 allocs/op
BenchmarkStatusProgress/members=1/WithProgress-8            30000000   48.6 ns/op         0 B/op    0 allocs/op
BenchmarkStatusProgress/members=1/WithProgress-example-8    30000000   42.9 ns/op         0 B/op    0 allocs/op
BenchmarkStatusProgress/members=3/Status-8                  5000000    395 ns/op        784 B/op    3 allocs/op
BenchmarkStatusProgress/members=3/Status-example-8          3000000    449 ns/op        784 B/op    3 allocs/op
BenchmarkStatusProgress/members=3/StatusWithoutProgress-8   100000000  18.7 ns/op         0 B/op    0 allocs/op
BenchmarkStatusProgress/members=3/WithProgress-8            20000000   78.1 ns/op         0 B/op    0 allocs/op
BenchmarkStatusProgress/members=3/WithProgress-example-8    20000000   70.7 ns/op         0 B/op    0 allocs/op
BenchmarkStatusProgress/members=5/Status-8                  3000000    470 ns/op        784 B/op    3 allocs/op
BenchmarkStatusProgress/members=5/Status-example-8          3000000    544 ns/op        784 B/op    3 allocs/op
BenchmarkStatusProgress/members=5/StatusWithoutProgress-8   100000000  19.7 ns/op         0 B/op    0 allocs/op
BenchmarkStatusProgress/members=5/WithProgress-8            20000000   105 ns/op          0 B/op    0 allocs/op
BenchmarkStatusProgress/members=5/WithProgress-example-8    20000000   94.0 ns/op         0 B/op    0 allocs/op
BenchmarkStatusProgress/members=100/Status-8                100000     11903 ns/op    22663 B/op   12 allocs/op
BenchmarkStatusProgress/members=100/Status-example-8        100000     13330 ns/op    22669 B/op   12 allocs/op
BenchmarkStatusProgress/members=100/StatusWithoutProgress-8 50000000   20.9 ns/op         0 B/op    0 allocs/op
BenchmarkStatusProgress/members=100/WithProgress-8          1000000    1731 ns/op         0 B/op    0 allocs/op
BenchmarkStatusProgress/members=100/WithProgress-example-8  1000000    1571 ns/op         0 B/op    0 allocs/op
```
2018-12-06 19:02:48 +01:00
Tobias Schottdorf bfaae1ba46 raft: enter ProgressStateReplica immediately after snapshot
When a follower requires a snapshot and the snapshot is sent at the
committed (and last) index in an otherwise idle Raft group, the follower
would previously remain in ProgressStateProbe even though it had been
caught up completely.

In a busy Raft group this wouldn't be an issue since the next round of
MsgApp would update the state, but in an idle group there's nothing
that rectifies the status (since there's nothing to append or update).

The reason this matters is that the state is exposed through
`RaftStatus()`. Concretely, in CockroachDB, we use the Raft status to
make sharding decisions (since it's dangerous to make rapid changes
to a fragile Raft group), and had to work around this problem[1].

[1]: 91b11dae41/pkg/storage/split_delay_helper.go (L138-L141)
2018-12-06 11:09:59 +01:00
Xiang Li 6c649de36e
Merge pull request #10281 from tbg/print-hint-reject-app-resp
raft: print RejectHint of zero on MsgAppResp
2018-11-24 11:48:16 +08:00
Tobias Schottdorf 5c209d66d2 raft: ensure leader is in ProgressStateReplicate
The leader perpetually kept itself in ProgressStateProbe even though of
course it has perfect knowledge of its log. This wasn't usually an issue
because it also doesn't care about its own Progress, but it's better to
keep this data correctly maintained, especially since this is part of
raft.Status and thus becomes visible to applications using the Raft
library.

(Concretely, in CockroachDB we use the Progress to inform log
truncations).
2018-11-23 17:57:36 +01:00
Tobias Schottdorf 1569f4829d raft: print RejectHint of zero on MsgAppResp
A zero RejectHint on MsgAppResp is still used, and so should be
reflected in the message description.
2018-11-23 11:06:38 +01:00
Xiang Li c2d023ce74
Merge pull request #10263 from johncming/raftstorage
raft:  add a test case in TestStorageAppend
2018-11-15 22:01:06 +08:00
caoming 9668536124 raft: add a test case in TestStorageAppend 2018-11-15 16:41:36 +08:00
Andrew Werner e4af2be5bb raft: separate MaxCommittedSizePerReady config from MaxSizePerMsg
Prior to this change, MaxSizePerMsg was used both to cap the total byte size of
entries in messages as well as the total byte size of entries passed through
CommittedEntries in the Ready struct. This change adds a new Config parameter
MaxCommittedSizePerReady which defaults to MaxSizePerMsg and contols the second
of above descibed settings.
2018-11-14 09:59:09 -05:00
Shin'ya Ueoka aa4313a55a *: fix github links 2018-11-10 11:14:18 +09:00
Xiang Li c0e04700cf
Merge pull request #10230 from manishrjain/master
raft: Explain ReportSnapshot and Propose behavior
2018-11-01 06:48:45 +08:00
Manish R Jain 4aa72ca1d3
raft: Explain ReportSnapshot and Propose behavior
Update godocs for node interface, explaining the behavior of ReportSnapshot and Propose.
2018-10-31 15:37:55 -07:00
Xiang Li 798955d4d6
Merge pull request #10209 from ping40/d1024
raft: Fix comment on TestLeaderBcastBeat
2018-10-25 14:42:16 -07:00
Gyuho Lee b7ed4165ea raft: fix godoc in tests
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-24 23:23:32 -07:00
Gyuho Lee 965ba5ca8b
Merge pull request #10203 from ping40/doc1022_2
raft: fix description in UT
2018-10-24 23:21:02 -07:00
ping40 10255cf196 raft: Fix comment on TestLeaderBcastBeat 2018-10-24 16:56:10 +08:00
Gyuho Lee 86b933311d
Merge pull request #10205 from gyuho/testing-prow
OWNERS: experiment
2018-10-22 16:07:27 -07:00
Gyuho Lee c561f8310e OWNERS: experiment
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-22 12:49:08 -07:00
Tobias Schottdorf ad49c8fd98 raft: fix bug in unbounded log growth prevention mechanism
The previous code was using the proto-generated `Size()` method to
track the size of an incoming proposal at the leader. This includes
the Index and Term, which were mutated after the call to `Size()`
when appending to the log. Additionally, it was not taking into
account that an ignored configuration change would ignore the
original proposal and append an empty entry instead.

As a result, a fully committed Raft group could end up with a non-
zero tracked uncommitted Raft log counter that would eventually hit
the ceiling and drop all future proposals indiscriminately. It would
also immediately imply that proposals exceeding the threshold alone
would get refused (as the "first uncommitted proposal" gets special
treatment and is always allowed in).

Track only the size of the payload actually appended to the Raft log
instead.

For context, see:
https://github.com/cockroachdb/cockroach/issues/31618#issuecomment-431374938
2018-10-22 11:28:39 +02:00
ping40 de470991e1 raft: fix description in UT 2018-10-22 13:59:50 +08:00
Nathan VanBenschoten 73c20cc1b7 raft: Fix comment on sendHeartbeat 2018-10-14 00:03:43 -04:00
Nathan VanBenschoten 7be7ac5a5d raft: Fix spelling in doc.go 2018-10-13 23:25:05 -04:00
Nathan VanBenschoten f89b06dc6d raft: provide protection against unbounded Raft log growth
The suggested pattern for Raft proposals is that they be retried
periodically until they succeed. This turns out to be an issue
when a leader cannot commit entries because the leader will continue
to append re-proposed entries to its log without committing anything.
This can result in the uncommitted tail of a leader's log growing
without bound until it is able to commit entries.

This change add a safeguard to protect against this case where a
leader's log can grow without bound during loss of quorum scenarios.
It does so by introducing a new, optional ``MaxUncommittedEntriesSize
configuration. This config limits the max aggregate size of uncommitted
entries that may be appended to a leader's log. Once this limit
is exceeded, proposals will begin to return ErrProposalDropped
errors.

See cockroachdb/cockroach#27772
2018-10-13 23:25:05 -04:00
Ben Darnell 08e88c6693
Merge pull request #10063 from tschottdorf/fix-commit-pagination
raft: fix correctness bug in CommittedEntries pagination
2018-10-02 12:39:29 -04:00
Peter Mattis 66ee394527 raft: fix Ready.MustSync logic
The previous logic was erroneously setting Ready.MustSync to true when
the hard state had not changed because we were comparing an empty hard
state to the previous hard state. In combination with another misfeature
in CockroachDB (unnecessary writing of empty batches), this was causing
a steady stream of synchronous writes to disk.
2018-09-19 16:33:16 -04:00
Gyuho Lee c2b3c54370 raft: fix link typo
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-09-06 09:20:22 -07:00
Tobias Schottdorf 7a8ab37bfd raft: fix correctness bug in CommittedEntries pagination
In #9982, a mechanism to limit the size of `CommittedEntries` was
introduced. The way this mechanism worked was that it would load
applicable entries (passing the max size hint) and would emit a
`HardState` whose commit index was truncated to match the limitation
applied to the entries. Unfortunately, this was subtly incorrect
when the user-provided `Entries` implementation didn't exactly
match what Raft uses internally. Depending on whether a `Node` or
a `RawNode` was used, this would either lead to regressing the
HardState's commit index or outright forgetting to apply entries,
respectively.

Asking implementers to precisely match the Raft size limitation
semantics was considered but looks like a bad idea as it puts
correctness squarely in the hands of downstream users. Instead, this
PR removes the truncation of `HardState` when limiting is active
and tracks the applied index separately. This removes the old
paradigm (that the previous code tried to work around) that the
client will always apply all the way to the commit index, which
isn't true when commit entries are paginated.

See [1] for more on the discovery of this bug (CockroachDB's
implementation of `Entries` returns one more entry than Raft's when the
size limit hits).

[1]: https://github.com/cockroachdb/cockroach/issues/28918#issuecomment-418174448
2018-09-04 14:52:23 +02:00
Gyuho Lee bb60f8ab1d raft: change import paths to "go.etcd.io/etcd"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-28 17:47:52 -07:00
Zhao Haiyuan 6ee880eb5b raft: fix typo in test 2018-08-22 23:48:47 +08:00
Xiang Li 11dd0b583b
Merge pull request #9982 from bdarnell/pagination
raft: Introduce CommittedEntries pagination
2018-08-11 09:12:46 +08:00
Ben Darnell a9e7c1e11f raft: Make flow control more aggressive
We allow multiple in-flight append messages, but prior to this change
the only way we'd ever send them is if there is a steady stream of new
proposals. Catching up a follower that is far behind would be
unnecessarily slow (this is exacerbated by a quirk of CockroachDB's
use of raft which limits our ability to catch up via snapshot in some
cases).

See cockroachdb/cockroach#27983
2018-08-08 11:10:54 -04:00
Ben Darnell 0a670b7c9b raft: Introduce CommittedEntries pagination
The MaxSizePerMsg setting is now used to limit the size of
Ready.CommittedEntries. This prevents out-of-memory errors if the raft
log has become very large and commits all at once.
2018-08-07 12:54:34 -04:00
Ben Darnell bc14deecca raft: Add a test for MaxSizePerMsg feature
Ensure that this limit is respected when generating MsgApp messages.
2018-08-06 16:52:16 -04:00
Nathan VanBenschoten 0a415cf0d6 raft: dont allocate slice and sort on every commit 2018-07-25 23:42:16 -04:00
Gyuho Lee 7aaaa0d82f raft: do not use underscore in var name
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-05 10:25:47 -07:00
Gyuho Lee 0249c39cb3 raft: remove unnecessary type conversion
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-05 10:12:45 -07:00
Ben Darnell 20422c5b4d raft: Really avoid scanning raft log in becomeLeader
I meant to do this in #9073, but sent the PR before it was finished.
The last log index is known directly; there is no need to fetch any
entries here.
2018-06-26 15:29:51 -04:00
Gyuho Lee 1136ba0e0d raft: fix logger variadic parameter
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-06-15 13:10:58 -07:00
Gyuho Lee 9054786553 Revert "raft: fix logger Panic variadic parameter"
This reverts commit 5a94aba33eeb504e7036a27268c67f6a1796445e.
2018-06-15 13:10:58 -07:00
sudeesh john e07d19e549 raft: fix logger Panic variadic parameter
"# github.com/coreos/etcd/raft
raft/logger.go:117: missing ... in args forwarded to print-like function"

New parameter check got added the golang to check the function parameter
c006036075 (diff-8fa5b0d6191706747ef5773f895781c9)
2018-06-15 13:10:58 -07:00
Xiang Li 357308bfcd
Merge pull request #9679 from lorneli/lorneli-raft-dev
raft: describe the purpose of lockedRand
2018-05-26 22:03:18 -07:00
lorneli a083282482 raft: describe the purpose of lockedRand
Struct lockedRand wraps rand.Rand with mutex lock because it's
accessed by multiple raft groups.
2018-05-26 21:59:24 +08:00
Xiang Li 20cf7f4d5b
Merge pull request #9671 from lorneli/raft-test
raft: merge test cases of pre-candidate with the normal one
2018-05-24 08:27:07 -07:00
Gyuho Lee e7adfb0ebf raft: use different parameters for tests
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-09 15:42:45 -07:00
lorneli 3d12e36c7e raft: merge test cases of pre-candidate with the normal one
So result checking just compares the expected with output and
becomes more readable.
2018-05-01 17:08:37 +08:00
Jia Zhan d14b705355 raft: fix a few comments 2018-04-27 11:25:06 -07:00
Vincent Lee f0dffb4163 raft: Propose in raft node wait the proposal result so we can fail fast while dropping proposal. 2018-04-03 11:04:09 +08:00
Kostas Christidis 438163feb4 raft: fix failing tests in rafttest
Tests in `rafttest` would fail because they referred to field `Id` instead of
`ID`. This PR fixes that.

Closes #9504.

Signed-off-by: Kostas Christidis <kostas@christidis.io>
2018-03-28 15:12:29 -04:00
Gyuho Lee 8aae8c1c9c raft: document disruptive rejoining server, add tests
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-03-06 09:54:29 -08:00
Gyuho Lee d808b4686c raft: fix typo in raft_test.go
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-02-26 10:03:25 -08:00
Gyuho Lee 01db389ea8 raft: document why reuse candidate's term for vote response in stepCandidate
"stepCandidate" should reuse candidate's own term, not term in Message,
because pre-vote is requested with future term.

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-02-21 16:11:01 -08:00
Gyuho Lee 38846c220a raft: use leader's term when candidate becomes follower
`raft.Step` already ensures that when `m.Term > r.Term`,
candidate reverts back to follower with its term being
reset with `m.Term`, thus it's always true that
`m.Term == r.Term` in `stepCandidate`.

This just makes `r.becomeFollower` calls consistent.

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-02-21 16:10:52 -08:00
Gyuho Lee 2b7c12fb12 raft: reuse "last index" in "appendEntry"
No need to call "lastIndex" again.
"append" call already returns "lastIndex".

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-02-05 21:26:45 -08:00
Xiang Li d54f281b26
Merge pull request #8525 from shuaili87/pre-vote-compatible
raft: fix deadlock during PreVote migration process
2018-01-26 16:34:59 -08:00
Manjunath A Kumatagi c27998db97 raft: fix govet errors 2018-01-25 04:51:38 -05:00
Ben Darnell 4e0291ff91 raft: Clarify conditions for granting votes and prevotes.
This includes one theoretical logic change: A node that knows the
leader of the current term will no longer grant votes, even if it has
not yet voted in this term. It also adds a `m.Type == MsgPreVote`
guard on the `m.Term > r.Term` check, which was previously thought to
be incorrect (see #8517) but was actually just unclear.

Closes #8517
Closes #8571
2018-01-23 15:05:11 -05:00
Kostas Christidis 97fad42d81 docs: fix invalid reference in Raft README
Code snippet in Raft README refers to non-existent field `State`. Fixed
the reference by setting it to `HardState`.
2018-01-17 16:03:32 -05:00
Xiang Li c5532ebbf6
Merge pull request #9067 from absolute8511/optimize-raft-drop
raft: let raft step return error when proposal is dropped to allow fail-fast
2018-01-11 19:54:52 -08:00
Vincent Lee 30ced5b2be raft: let raft step return error when proposal is dropped to allow fail-fast. 2018-01-12 10:16:47 +08:00
Vincent Lee 11fa4f0275 raft: raft learners should be returned after applyConfChange 2018-01-11 17:30:17 +08:00
Gyuho Lee 9bd9d2041f
Merge pull request #9122 from gyuho/temp
raft: fix wrong comments in "mustCheckOutOfBounds"
2018-01-08 18:34:29 -08:00
GhostComputing b3916a393f raft: fix wrong comments in "mustCheckOutOfBounds" 2018-01-08 18:31:22 -08:00
Xiang Li ed1ff9e952
Merge pull request #9073 from bdarnell/pending-conf-index
raft: Avoid scanning raft log in becomeLeader
2018-01-08 16:37:36 -08:00
Nathan VanBenschoten e6dc57f708 raft: s/leaner/learner/g 2018-01-03 08:16:50 -05:00
Ben Darnell 8d8f3195e4 raft: Avoid scanning raft log in becomeLeader
Scanning the uncommitted portion of the raft log to determine whether
there are any pending config changes can be expensive. In
cockroachdb/cockroach#18601, we've seen that a new leader can spend so
much time scanning its log post-election that it fails to send
its first heartbeats in time to prevent a second election from
starting immediately.

Instead of tracking whether a pending config change exists with a
boolean, this commit tracks the latest log index at which a pending
config change *could* exist. This is a less expensive solution to
the problem, and the impact of false positives should be minimal since
a newly-elected leader should be able to quickly commit the tail of
its log.
2017-12-30 10:13:36 -05:00
Gyuho Lee bcd5390b35 *: regenerate protobuf, grpc-gateway
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2017-12-07 21:31:13 -08:00
siddontang c6f2db2e92 raft: support learner 2017-11-11 10:38:21 +08:00