Commit Graph

244 Commits (96cce208c2cb9e70ac3573b3f76eed7c84d262d1)

Author SHA1 Message Date
Brandon Philips 96cce208c2 go.mod: use go.etcd.io/etcd/v3 versioning
This change makes the etcd package compatible with the existing Go
ecosystem for module versioning.

Used this tool to update package imports:
  https://github.com/KSubedi/gomove
2020-04-28 00:57:35 +00:00
Tobias Schottdorf 0544f33248 raft: clarify ApplyConfChange contract for rejected conf changes
Apps typically maintain the raft configuration as part of the state
machine. As a result, they want to be able to reject configuration change
entries at apply time based on the state on which the entry is supposed
to be applied. When this happens, the app should not call
ApplyConfChange, but the comments did not make this clear.

As a result, it was tempting to pass an empty pb.ConfChange or it's V2
version instead of not calling ApplyConfChange.

However, an empty V1 or V2 proto aren't noops when the configuration is
joint: an empty V1 change is treated internally as a single
configuration change for NodeID zero and will cause a panic when applied
in a joint state. An empty V2 proto is treated as a signal to leave a
joint state, which means that the app's config and raft's would diverge.

The comments updated in this commit now ask users to not call
ApplyConfState when they reject a conf change. Apps that never use joint
consensus can keep their old behavior since the distinction only matters
when in a joint state, but we don't want to encourage that.
2020-02-25 12:45:45 +01:00
Gyuho Lee c7c9428f6b raft: move "RawNode", clarify tick miss
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-07-24 23:35:36 -07:00
Tobias Schottdorf 721127da12 raft: require app to consume result from Ready()
I changed `(*RawNode).Ready`'s behavior in #10892 in a problematic way.
Previously, `Ready()` would create and immediately "accept" a Ready
(i.e. commit the app to actually handling it). In #10892, Ready() became
a pure read-only operation and the "accepting" was moved to
`Advance(rd)`.  As a result it was illegal to use the RawNode in certain
ways while the Ready was being handled. Failure to do so would result in
dropped messages (and perhaps worse). For example, with the following
operations

1. `rd := rawNode.Ready()`
2. `rawNode.Step(someMsg)`
3. `rawNode.Advance(rd)`

`someMsg` would be dropped, because `Advance()` would clear out the
outgoing messages thinking that they had all been handled by the client.
I mistakenly assumed that this restriction had existed prior, but this
is incorrect.

I noticed this while trying to pick up the above PR in CockroachDB,
where it caused unit test failures, precisely due to the above example.

This PR reestablishes the previous behavior (result of `Ready()` must
be handled by the app) and adds a regression test.

While I was there, I carried out a few small clarifying refactors.
2019-07-23 22:45:01 +02:00
Tobias Schottdorf b9c051e7a7 raftpb: clean up naming in ConfChange 2019-07-23 10:40:03 +02:00
Tobias Schottdorf b67303c6a2 raft: allow use of joint quorums
This change introduces joint quorums by changing the Node and RawNode
API to accept pb.ConfChangeV2 (on top of pb.ConfChange).

pb.ConfChange continues to work as today: it allows carrying out a
single configuration change. A pb.ConfChange proposal gets added to
the Raft log as such and is thus also observed by the app during Ready
handling, and fed back to ApplyConfChange.

ConfChangeV2 allows joint configuration changes but will continue to
carry out configuration changes in "one phase" (i.e. without ever
entering a joint config) when this is possible.
2019-07-23 10:40:03 +02:00
Tobias Schottdorf 500af91653 raft: restore ability to bootstrap RawNode
We are worried about breaking backwards compatibility for any
application out there that may have relied on the old behavior. Their
RawNode invocation would have been broken by the removal of the peers
argument so it would not have changed silently; an associated comment
tells callers how to fix it.
2019-07-19 10:02:02 +02:00
Tobias Schottdorf c9491d7861 raft: clean up bootstrap
This is the first (maybe not last) step in cleaning up the bootstrap
code around StartNode.

Initializing a Raft group for the first time is awkward, since a
configuration has to be pulled from thin air. The way this is solved
today is unclean: The app is supposed to pass peers to StartNode(),
we add configuration changes for them to the log, immediately pretend
that they are applied, but actually leave them unapplied (to give the
app a chance to observe them, though if the app did decide to not apply
them things would really go off the rails), and then return control to
the app. The app will then process the initial Readys and as a result
the configuration will be persisted to disk; restarts of the node then
use RestartNode which doesn't take any peers.

The code that did this lived awkwardly in two places fairly deep down
the callstack, though it was really only necessary in StartNode(). This
commit refactors things to make this more obvious: only StartNode does
this dance now. In particular, RawNode does not support this at all any
more; it expects the app to set up its Storage correctly.

Future work may provide helpers to make this "preseeding" of the Storage
more user-friendly. It isn't entirely straightforward to do so since
the Storage interface doesn't provide the right accessors for this
purpose. Briefly speaking, we want to make sure that a non-bootstrapped
node can never catch up via the log so that we can implicitly use one
of the "skipped" log entries to represent the configuration change into
the bootstrap configuration. This is an invasive change that affects
all consumers of raft, and it is of lower urgency since the code (post
this commit) already encapsulates the complexity sufficiently.
2019-07-19 10:02:02 +02:00
Tobias Schottdorf c62b7048b5 raft: use RawNode for node's event loop
It has always bugged me that any new feature essentially needed to be
tested twice due to the two ways in which apps can use raft (`*node` and
`*RawNode`). Due to upcoming testing work for joint consensus, now is a
good time to rectify this somewhat.

This commit removes most logic from `(*node).run` and uses `*RawNode`
internally. This simplifies the logic and also lead (via debugging) to
some insight on how the semantics of the approaches differ, which is now
documented in the comments.
2019-07-19 09:59:59 +02:00
Tobias Schottdorf b171e1c78b raft: centralize configuration change application
Put all the logic related to applying a configuration change in one
place in preparation for adding joint consensus.

This inspired various TODOs.

I had to rewrite TestSnapshotSucceedViaAppResp since it was relying
on a snapshot applied to the leader, which is now prevented.
2019-07-03 21:26:42 +02:00
Tobias Schottdorf f9c2d00fb3 raft: extract 'tracker' package
Mechanically extract `progressTracker`, `Progress`, and `inflights`
to their own package named `tracker`. Add lots of comments in the
progress, and take the opportunity to rename and clarify various
fields.
2019-06-21 22:15:00 +02:00
Gyuho Lee 34bd797e67 *: revert module import paths
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-05-28 15:39:35 -07:00
Tobias Schottdorf ea82b2b758 raft: move more methods onto the progress tracker
Continues what was initiated in the last commit.
2019-05-21 16:02:52 +02:00
shivaramr 9150bf52d6 go modules: Fix module path version to include version number 2019-04-26 15:29:50 -07:00
caoming 92d5d19ce9 raft: more precise that rename res to err. 2019-03-22 10:18:00 +08:00
Xiang Li c0e04700cf
Merge pull request #10230 from manishrjain/master
raft: Explain ReportSnapshot and Propose behavior
2018-11-01 06:48:45 +08:00
Manish R Jain 4aa72ca1d3
raft: Explain ReportSnapshot and Propose behavior
Update godocs for node interface, explaining the behavior of ReportSnapshot and Propose.
2018-10-31 15:37:55 -07:00
Nathan VanBenschoten f89b06dc6d raft: provide protection against unbounded Raft log growth
The suggested pattern for Raft proposals is that they be retried
periodically until they succeed. This turns out to be an issue
when a leader cannot commit entries because the leader will continue
to append re-proposed entries to its log without committing anything.
This can result in the uncommitted tail of a leader's log growing
without bound until it is able to commit entries.

This change add a safeguard to protect against this case where a
leader's log can grow without bound during loss of quorum scenarios.
It does so by introducing a new, optional ``MaxUncommittedEntriesSize
configuration. This config limits the max aggregate size of uncommitted
entries that may be appended to a leader's log. Once this limit
is exceeded, proposals will begin to return ErrProposalDropped
errors.

See cockroachdb/cockroach#27772
2018-10-13 23:25:05 -04:00
Ben Darnell 08e88c6693
Merge pull request #10063 from tschottdorf/fix-commit-pagination
raft: fix correctness bug in CommittedEntries pagination
2018-10-02 12:39:29 -04:00
Peter Mattis 66ee394527 raft: fix Ready.MustSync logic
The previous logic was erroneously setting Ready.MustSync to true when
the hard state had not changed because we were comparing an empty hard
state to the previous hard state. In combination with another misfeature
in CockroachDB (unnecessary writing of empty batches), this was causing
a steady stream of synchronous writes to disk.
2018-09-19 16:33:16 -04:00
Tobias Schottdorf 7a8ab37bfd raft: fix correctness bug in CommittedEntries pagination
In #9982, a mechanism to limit the size of `CommittedEntries` was
introduced. The way this mechanism worked was that it would load
applicable entries (passing the max size hint) and would emit a
`HardState` whose commit index was truncated to match the limitation
applied to the entries. Unfortunately, this was subtly incorrect
when the user-provided `Entries` implementation didn't exactly
match what Raft uses internally. Depending on whether a `Node` or
a `RawNode` was used, this would either lead to regressing the
HardState's commit index or outright forgetting to apply entries,
respectively.

Asking implementers to precisely match the Raft size limitation
semantics was considered but looks like a bad idea as it puts
correctness squarely in the hands of downstream users. Instead, this
PR removes the truncation of `HardState` when limiting is active
and tracks the applied index separately. This removes the old
paradigm (that the previous code tried to work around) that the
client will always apply all the way to the commit index, which
isn't true when commit entries are paginated.

See [1] for more on the discovery of this bug (CockroachDB's
implementation of `Entries` returns one more entry than Raft's when the
size limit hits).

[1]: https://github.com/cockroachdb/cockroach/issues/28918#issuecomment-418174448
2018-09-04 14:52:23 +02:00
Gyuho Lee bb60f8ab1d raft: change import paths to "go.etcd.io/etcd"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-28 17:47:52 -07:00
Ben Darnell 0a670b7c9b raft: Introduce CommittedEntries pagination
The MaxSizePerMsg setting is now used to limit the size of
Ready.CommittedEntries. This prevents out-of-memory errors if the raft
log has become very large and commits all at once.
2018-08-07 12:54:34 -04:00
Vincent Lee f0dffb4163 raft: Propose in raft node wait the proposal result so we can fail fast while dropping proposal. 2018-04-03 11:04:09 +08:00
Vincent Lee 11fa4f0275 raft: raft learners should be returned after applyConfChange 2018-01-11 17:30:17 +08:00
Ben Darnell 8d8f3195e4 raft: Avoid scanning raft log in becomeLeader
Scanning the uncommitted portion of the raft log to determine whether
there are any pending config changes can be expensive. In
cockroachdb/cockroach#18601, we've seen that a new leader can spend so
much time scanning its log post-election that it fails to send
its first heartbeats in time to prevent a second election from
starting immediately.

Instead of tracking whether a pending config change exists with a
boolean, this commit tracks the latest log index at which a pending
config change *could* exist. This is a less expensive solution to
the problem, and the impact of false positives should be minimal since
a newly-elected leader should be able to quickly commit the tail of
its log.
2017-12-30 10:13:36 -05:00
siddontang c6f2db2e92 raft: support learner 2017-11-11 10:38:21 +08:00
Gyu-Ho Lee f65aee0759 *: replace 'golang.org/x/net/context' with 'context'
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-09-07 13:39:42 -07:00
Peter Mattis ab03a42f06 raft: add Ready.MustSync
Add Ready.MustSync which indicates that the hard state and raft log
entries in a Ready message must be synchronously written to persistent
storage.
2017-02-13 15:13:21 -05:00
Alexander Morozov 7afc490c95 raft: return empty status if node is stopped
If the node is stopped, then Status can hang forever because there is no
event loop to answer. So, just return empty status to avoid deadlocks.

Fix #6855

Signed-off-by: Alexander Morozov <lk4d4math@gmail.com>
2016-11-15 15:45:23 -08:00
Xiang Li 710b14ce56 raft: support safe readonly request
Implement raft readonly request described in raft thesis 6.4
along with the existing clock/lease based approach.
2016-09-12 15:13:52 +08:00
Gyu-Ho Lee e64ef3f261 raft: add 'TransferLeadership' to Node interface 2016-08-10 16:25:22 -07:00
Xiang Li 484f579905 raft: hide Campaign rules on applying all entries 2016-07-25 15:53:39 -07:00
Xiang Li 1c5754f02d raft: fix readindex 2016-07-19 15:00:58 -07:00
Jared Hulbert df94f58462 raft: atomic access alignment
The relevant structures are properly aligned, however, there is no comment
highlighting the need to keep it aligned as is present elsewhere in the
codebase.

Adding note to keep alignment, in line with similar comments in the codebase.
2016-07-08 11:05:41 -07:00
Gyu-Ho Lee 9e0de02fde raft: fix minor grammar, remove TODO
- test 'Term' panic cases (remove TODO)
- fix minor grammar in 'Node' godoc
2016-07-05 07:21:52 -07:00
Xiang Li 5f1c763993 Merge pull request #5553 from swingbach/master
raft: implemented read-only query when quorum check is on
2016-06-28 12:47:43 -07:00
swingbach@gmail.com 0faae33ace raft: implemented read-only query when quorum check is on 2016-06-28 10:52:53 +08:00
Xiang Li 848f539536 raft: make tick unblock and fix potential live lock 2016-06-16 08:01:06 -07:00
Gyu-Ho Lee fe884f8209 raft: update LICENSE header 2016-05-12 20:49:15 -07:00
Xiang Li 59c5110b73 raft: fix detected race in node.go 2016-04-22 15:45:33 -07:00
es-chow ac059eb8cb raft: transfer leader feature 2016-04-08 16:56:32 +08:00
Anthony Romano bd832e5b0a *: migrate Godeps to vendor/ 2016-03-22 17:10:28 -07:00
Anthony Romano afa0368dcc *: fix godoc bugs in interfaces and slice fields
detected with goword
2016-02-24 00:45:40 -08:00
Xiang Li 390a4518c0 raft: rework comment for advance interface 2016-02-12 13:43:51 -08:00
Anthony Romano 20461ab11a *: fix many typos 2016-01-31 21:42:39 -08:00
Ben Darnell 22925a1d2f raft: Remove redundant `raft.Commit` field.
Keeping this field in sync with `raft.raftLog.committed` was
error-prone, so instead we synthesize the `HardState` on demand.

Fixes #4278.
2016-01-26 15:18:55 -05:00
Sam Rijs 896719c877 raft: use configured logger in raft/node.go
Those three log statements in node.go have not been using the logger that was passed via `raft.Config`, but instead the default raft logger. This changes it to use the proper logger.
2016-01-25 00:15:44 +11:00
Emil Hessman b9f22cb69b raft: fix Node doc typo 2015-09-21 06:13:33 +02:00
Xiang Li 085447ed85 raft: fix raft node start bug
raft node should set initial prev hard state to empty.
Or it will not send the first hard coded state to application
until the state changes again.

This commit fixs the issue. It introduce a small overhead, that
the same tate might send to application twice when restarting.
But this is fine.
2015-05-27 13:32:04 -07:00