Commit Graph

971 Commits (0461b3fa51ebcd3f72ef54296638ac2f19ea9071)

Author SHA1 Message Date
Andrei Matei 0db04a38a5 raft: minor comment fix in confchange 2020-04-12 22:44:37 -04:00
Fullstop000 7eae024ead
raft: only redirect msg produced by own node (#11466)
Signed-off-by: Fullstop000 <fullstop1005@gmail.com>
2020-04-06 20:27:46 -07:00
Lars Lehtonen dbcf540c88
raft/rafttest: Prune Unused Functions (#11625)
* raft/rafttest: prune unused network interface

* raft/rafttest: prune unused raftNetwork.heal()

* raft/rafttest: prune unused logLevels.strToLev()

* raft/rafttest: prune unused InteractionEnv.writeErr()
2020-04-04 14:38:14 -07:00
qupeng 6f850a65a1
raft: cleanup read index code (#11528)
Signed-off-by: qupeng <qupeng@pingcap.com>
2020-03-03 09:20:25 -08:00
Tobias Schottdorf 0544f33248 raft: clarify ApplyConfChange contract for rejected conf changes
Apps typically maintain the raft configuration as part of the state
machine. As a result, they want to be able to reject configuration change
entries at apply time based on the state on which the entry is supposed
to be applied. When this happens, the app should not call
ApplyConfChange, but the comments did not make this clear.

As a result, it was tempting to pass an empty pb.ConfChange or it's V2
version instead of not calling ApplyConfChange.

However, an empty V1 or V2 proto aren't noops when the configuration is
joint: an empty V1 change is treated internally as a single
configuration change for NodeID zero and will cause a panic when applied
in a joint state. An empty V2 proto is treated as a signal to leave a
joint state, which means that the app's config and raft's would diverge.

The comments updated in this commit now ask users to not call
ApplyConfState when they reject a conf change. Apps that never use joint
consensus can keep their old behavior since the distinction only matters
when in a joint state, but we don't want to encourage that.
2020-02-25 12:45:45 +01:00
Tobias Schottdorf 37c7e4d1d8 raft: fix auto-transitioning out of joint config
The code doing so was undertested and buggy: it would launch multiple
attempts to transition out when the conf change was not the last element
in the log.

This commit fixes the problem and adds a regression test. It also
reworks the code to handle a former untested edge case, in which the
auto-transition append is refused. This can't happen any more with the
current version of the code because this proposal has size zero and is
special cased in increaseUncommittedSize. Last but not least, the
auto-leave proposal now also bumps pendingConfIndex, which was not done
previously due to an oversight.
2020-02-25 12:35:51 +01:00
Tobias Schottdorf 2332705f10 raft: remove bogus tail end of membership change interaction test
The test was supposed to end earlier, but some old copy pasta
survived.
2020-02-25 12:35:51 +01:00
po3rin 53f15caf73
raft: Fixed missing package name in README.md (#11566) 2020-01-29 10:15:00 -08:00
qupeng eaa0612e02 raft: abort leader transferring if the target is demoted (#11417)
Signed-off-by: qupeng <qupeng@pingcap.com>
2019-12-20 12:07:52 +08:00
Baohua Yang 4b755e8935 docs: Update the raft usage by adding Hyperledger project 2019-11-19 09:54:16 -08:00
Ilya Sevostyanov d487b16de1
confchange: removed duplicate check in confchange.Simple. 2019-10-01 11:32:12 +03:00
Wine93 b0534c1b44 raft/log_test: fixed wrong index 2019-08-25 04:47:11 +00:00
Wine93 5f42161750 raft: fixed some typos and simplify minor logic 2019-08-25 04:46:29 +00:00
Xiang Li f4dfd1976d
Merge pull request #10971 from nilsocket/codeReformat
raft : write compact if statements
2019-08-23 09:10:26 +08:00
nilsocket 702c69c906
raft : Write compact if statements 2019-08-23 04:31:40 +05:30
Tobias Schottdorf 47ae53d25d rafttest: print Ready before processing it
It was confusing to see the effects of the Ready (i.e. log messages)
printed before the Ready itself.
2019-08-16 09:41:35 +02:00
Tobias Schottdorf 99f8046fd1 raft: fix a test file name 2019-08-16 09:38:44 +02:00
Tobias Schottdorf 8d1946d16a raft: document problem with leader self-removal
When a leader removes itself, it will retain its leadership but not
accept new proposals, making the range effectively stuck until manual
intervention triggers a campaign event.

This commit documents the behavior. It does not correct it yet.
2019-08-16 09:38:44 +02:00
Tobias Schottdorf 306e75a96f raft: add a batch of interaction-driven conf change tests
Verifiy the behavior in various v1 and v2 conf change operations.
This also includes various fixups, notably it adds protection
against transitioning in and out of new configs when this is not
permissible.

There are more threads to pull, but those are left for future commits.
2019-08-16 09:38:44 +02:00
Tobias Schottdorf 4e19150676 raft: proactively probe newly added followers
When the leader applied a new configuration that added voters, it would
not immediately probe these voters, delaying when they would be caught
up.

I noticed this while writing an interaction-driven test, which has now
been cleaned up and completed.
2019-08-14 20:53:34 +02:00
Tobias Schottdorf 3d6721f751 rafttest: add _breakpoint directive
It is a helper case to attach a debugger to when a problem needs
to be investigated in a longer test file. In such a case, add the
following stanza immediately before the interesting behavior starts:

_breakpoint:
----
ok

and set a breakpoint on the _breakpoint case.
2019-08-14 20:53:34 +02:00
Tobias Schottdorf fdaed88f14 raft: initialize new Progress at LastIndex, not LastIndex+1
Initializing at LastIndex+1 meant that new peers would not be probed
immediately when they appeared in the leader's config, which delays
their getting caught up.
2019-08-14 20:53:34 +02:00
Tobias Schottdorf c2d9514370 raft/rafttest: fix stabilize handler
It was bailing out too early.
2019-08-14 17:24:14 +02:00
Tobias Grieger 029401ab81
Merge pull request #11005 from tbg/interactiontest
raft/rafttest: introduce datadriven testing
2019-08-12 11:52:52 +02:00
Tobias Schottdorf e8090e57a2 raft/rafttest: introduce datadriven testing
It has often been tedious to test the interactions between multi-member
Raft groups, especially when many steps were required to reach a certain
scenario. Often, this boilerplate was as boring as it is hard to write
and hard to maintain, making it attractive to resort to shortcuts
whenever possible, which in turn tended to undercut how meaningful and
maintainable the tests ended up being - that is, if the tests were even
written, which sometimes they weren't.

This change introduces a datadriven framework specifically for testing
deterministically the interaction between multiple members of a raft group
with the goal of reducing the friction for writing these tests to near
zero.

In the near term, this will be used to add thorough testing for joint
consensus (which is already available today, but wildly undertested),
but just converting an existing test into this framework has shown that
the concise representation and built-in inspection of log messages
highlights unexpected behavior much more readily than the previous unit
tests did (the test in question is `snapshot_succeed_via_app_resp`; the
reader is invited to compare the old and new version of it).

The main building block is `InteractionEnv`, which holds on to the state
of the whole system and exposes various relevant methods for
manipulating it, including but not limited to adding nodes, delivering
and dropping messages, and proposing configuration changes. All of this
is extensible so that in the future I hope to use it to explore the
phenomena discussed in

https://github.com/etcd-io/etcd/issues/7625#issuecomment-488798263

which requires injecting appropriate "crash points" in the Ready
handling loop. Discussions of the "what if X happened in state Y"
can quickly be made concrete by "scripting up an interaction test".

Additionally, this framework is intentionally not kept internal to the
raft package.. Though this is in its infancy, a goal is that it should
be possible for a suite of interaction tests to allow applications to
validate that their Storage implementation behaves accordingly, simply
by running a raft-provided interaction suite against their Storage.
2019-08-12 11:13:51 +02:00
Gyuho Lee 6c87b21821 raft: fix typo
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-09 21:26:48 -07:00
Tobias Grieger 4cec8dddc6
Merge pull request #11003 from tbg/interaction/restore
raft: fix restoring joint configurations
2019-08-09 20:13:04 +02:00
Xiang Li 84c69cca76
Merge pull request #10970 from nilsocket/minorFix1
raft : remove unnecessary, if check
2019-08-09 11:03:01 -07:00
Tobias Schottdorf 37ab5bdd21 raft: fix restoring joint configurations
While writing interaction tests for joint configuration changes, I
realized that this wasn't working yet - restoring had no notion of
the joint configuration and was simply dropping it on the floor.

This commit introduces a helper `confchange.Restore` which takes
a `ConfState` and initializes a `Tracker` from it.

This is then used both in `(*raft).restore` as well as in `newRaft`.
2019-08-09 19:28:43 +02:00
Tobias Schottdorf a5f785a232 confchange: clean up unnecessary block 2019-08-09 19:28:43 +02:00
Tobias Grieger 7948f39790
Merge pull request #11004 from tbg/interaction/unused-type
raft/tracker: visit Progress in stable order
2019-08-09 12:32:04 +02:00
Tobias Schottdorf 1b3e0821a7 raft/tracker: visit Progress in stable order
This is helpful for upcoming testing work which allows datadriven
testing of the interaction of multiple nodes. This testing requires
determinism to work correctly.
2019-08-08 09:37:33 +02:00
Tobias Schottdorf 9553994cd7 raft/auorum: remove unused type 2019-08-07 18:53:01 +02:00
Tobias Schottdorf c30c2e345b raft: let learners vote
It turns out that that learners must be allowed to cast votes.

This seems counter- intuitive but is necessary in the situation in which
a learner has been promoted (i.e. is now a voter) but has not learned
about this yet.

For example, consider a group in which id=1 is a learner and id=2 and
id=3 are voters. A configuration change promoting 1 can be committed on
the quorum `{2,3}` without the config change being appended to the
learner's log. If the leader (say 2) fails, there are de facto two
voters remaining. Only 3 can win an election (due to its log containing
all committed entries), but to do so it will need 1 to vote. But 1
considers itself a learner and will continue to do so until 3 has
stepped up as leader, replicates the conf change to 1, and 1 applies it.

Ultimately, by receiving a request to vote, the learner realizes that
the candidate believes it to be a voter, and that it should act
accordingly. The candidate's config may be stale, too; but in that case
it won't win the election, at least in the absence of the bug discussed
in:
https://github.com/etcd-io/etcd/issues/7625#issuecomment-488798263.
2019-08-07 12:03:18 +02:00
nilsocket 0d99469cdb
raft : `newRaft()` does check for validity of `Config` 2019-08-02 03:09:51 +05:30
Gyuho Lee 4e43a082b2 raft: use mutex in "SetLogger" to avoid race conditions in tests
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-07-29 15:43:19 -07:00
Gyuho Lee 936c506e8d
Merge pull request #10945 from tbg/add-todo
raft: leave TODO about leaving StateSnapshot
2019-07-29 13:51:38 -07:00
Tobias Schottdorf 3b02d4c5ff raft: leave TODO about leaving StateSnapshot
The condition is overly strict, which has popped up in CockroachDB
recently.
2019-07-26 23:19:34 +02:00
Gyuho Lee c7c9428f6b raft: move "RawNode", clarify tick miss
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-07-24 23:35:36 -07:00
Tobias Schottdorf 721127da12 raft: require app to consume result from Ready()
I changed `(*RawNode).Ready`'s behavior in #10892 in a problematic way.
Previously, `Ready()` would create and immediately "accept" a Ready
(i.e. commit the app to actually handling it). In #10892, Ready() became
a pure read-only operation and the "accepting" was moved to
`Advance(rd)`.  As a result it was illegal to use the RawNode in certain
ways while the Ready was being handled. Failure to do so would result in
dropped messages (and perhaps worse). For example, with the following
operations

1. `rd := rawNode.Ready()`
2. `rawNode.Step(someMsg)`
3. `rawNode.Advance(rd)`

`someMsg` would be dropped, because `Advance()` would clear out the
outgoing messages thinking that they had all been handled by the client.
I mistakenly assumed that this restriction had existed prior, but this
is incorrect.

I noticed this while trying to pick up the above PR in CockroachDB,
where it caused unit test failures, precisely due to the above example.

This PR reestablishes the previous behavior (result of `Ready()` must
be handled by the app) and adds a regression test.

While I was there, I carried out a few small clarifying refactors.
2019-07-23 22:45:01 +02:00
Tobias Schottdorf b9c051e7a7 raftpb: clean up naming in ConfChange 2019-07-23 10:40:03 +02:00
Tobias Schottdorf b67303c6a2 raft: allow use of joint quorums
This change introduces joint quorums by changing the Node and RawNode
API to accept pb.ConfChangeV2 (on top of pb.ConfChange).

pb.ConfChange continues to work as today: it allows carrying out a
single configuration change. A pb.ConfChange proposal gets added to
the Raft log as such and is thus also observed by the app during Ready
handling, and fed back to ApplyConfChange.

ConfChangeV2 allows joint configuration changes but will continue to
carry out configuration changes in "one phase" (i.e. without ever
entering a joint config) when this is possible.
2019-07-23 10:40:03 +02:00
Tobias Schottdorf 88f5561733 raft: use ConfChangeSingle internally 2019-07-23 10:39:48 +02:00
Tobias Schottdorf 10680744b9 raft: introduce protos for joint quorums 2019-07-23 10:39:48 +02:00
Tobias Schottdorf caa48bcc3d raft: remove TestNodeBoundedLogGrowthWithPartition
It has a data race between the test's call to `reduceUncommittedSize`
and a corresponding call during Ready handling in `(*node).run()`.
The corresponding RawNode test still verifies the functionality, so
instead of fixing the test we can remove it.
2019-07-19 12:35:14 +02:00
Tobias Schottdorf 500af91653 raft: restore ability to bootstrap RawNode
We are worried about breaking backwards compatibility for any
application out there that may have relied on the old behavior. Their
RawNode invocation would have been broken by the removal of the peers
argument so it would not have changed silently; an associated comment
tells callers how to fix it.
2019-07-19 10:02:02 +02:00
Tobias Schottdorf c9491d7861 raft: clean up bootstrap
This is the first (maybe not last) step in cleaning up the bootstrap
code around StartNode.

Initializing a Raft group for the first time is awkward, since a
configuration has to be pulled from thin air. The way this is solved
today is unclean: The app is supposed to pass peers to StartNode(),
we add configuration changes for them to the log, immediately pretend
that they are applied, but actually leave them unapplied (to give the
app a chance to observe them, though if the app did decide to not apply
them things would really go off the rails), and then return control to
the app. The app will then process the initial Readys and as a result
the configuration will be persisted to disk; restarts of the node then
use RestartNode which doesn't take any peers.

The code that did this lived awkwardly in two places fairly deep down
the callstack, though it was really only necessary in StartNode(). This
commit refactors things to make this more obvious: only StartNode does
this dance now. In particular, RawNode does not support this at all any
more; it expects the app to set up its Storage correctly.

Future work may provide helpers to make this "preseeding" of the Storage
more user-friendly. It isn't entirely straightforward to do so since
the Storage interface doesn't provide the right accessors for this
purpose. Briefly speaking, we want to make sure that a non-bootstrapped
node can never catch up via the log so that we can implicitly use one
of the "skipped" log entries to represent the configuration change into
the bootstrap configuration. This is an invasive change that affects
all consumers of raft, and it is of lower urgency since the code (post
this commit) already encapsulates the complexity sufficiently.
2019-07-19 10:02:02 +02:00
Tobias Schottdorf c62b7048b5 raft: use RawNode for node's event loop
It has always bugged me that any new feature essentially needed to be
tested twice due to the two ways in which apps can use raft (`*node` and
`*RawNode`). Due to upcoming testing work for joint consensus, now is a
good time to rectify this somewhat.

This commit removes most logic from `(*node).run` and uses `*RawNode`
internally. This simplifies the logic and also lead (via debugging) to
some insight on how the semantics of the approaches differ, which is now
documented in the comments.
2019-07-19 09:59:59 +02:00
Jingyi Hu 233be58056
Merge pull request #10839 from needkane/pr
raft: update log info and annotation
2019-07-18 23:26:44 -07:00
Tobias Schottdorf 6b0322549f raft: replace StatusWithoutProgress with BasicStatus
Now that a Config is also added to the full status, the old name
did not convey the intention, which was to get a Status without
an associated allocation.
2019-07-18 16:28:37 +02:00