Commit Graph

30 Commits (0461b3fa51ebcd3f72ef54296638ac2f19ea9071)

Author SHA1 Message Date
Tobias Schottdorf 306e75a96f raft: add a batch of interaction-driven conf change tests
Verifiy the behavior in various v1 and v2 conf change operations.
This also includes various fixups, notably it adds protection
against transitioning in and out of new configs when this is not
permissible.

There are more threads to pull, but those are left for future commits.
2019-08-16 09:38:44 +02:00
Tobias Schottdorf e8090e57a2 raft/rafttest: introduce datadriven testing
It has often been tedious to test the interactions between multi-member
Raft groups, especially when many steps were required to reach a certain
scenario. Often, this boilerplate was as boring as it is hard to write
and hard to maintain, making it attractive to resort to shortcuts
whenever possible, which in turn tended to undercut how meaningful and
maintainable the tests ended up being - that is, if the tests were even
written, which sometimes they weren't.

This change introduces a datadriven framework specifically for testing
deterministically the interaction between multiple members of a raft group
with the goal of reducing the friction for writing these tests to near
zero.

In the near term, this will be used to add thorough testing for joint
consensus (which is already available today, but wildly undertested),
but just converting an existing test into this framework has shown that
the concise representation and built-in inspection of log messages
highlights unexpected behavior much more readily than the previous unit
tests did (the test in question is `snapshot_succeed_via_app_resp`; the
reader is invited to compare the old and new version of it).

The main building block is `InteractionEnv`, which holds on to the state
of the whole system and exposes various relevant methods for
manipulating it, including but not limited to adding nodes, delivering
and dropping messages, and proposing configuration changes. All of this
is extensible so that in the future I hope to use it to explore the
phenomena discussed in

https://github.com/etcd-io/etcd/issues/7625#issuecomment-488798263

which requires injecting appropriate "crash points" in the Ready
handling loop. Discussions of the "what if X happened in state Y"
can quickly be made concrete by "scripting up an interaction test".

Additionally, this framework is intentionally not kept internal to the
raft package.. Though this is in its infancy, a goal is that it should
be possible for a suite of interaction tests to allow applications to
validate that their Storage implementation behaves accordingly, simply
by running a raft-provided interaction suite against their Storage.
2019-08-12 11:13:51 +02:00
Tobias Schottdorf 37ab5bdd21 raft: fix restoring joint configurations
While writing interaction tests for joint configuration changes, I
realized that this wasn't working yet - restoring had no notion of
the joint configuration and was simply dropping it on the floor.

This commit introduces a helper `confchange.Restore` which takes
a `ConfState` and initializes a `Tracker` from it.

This is then used both in `(*raft).restore` as well as in `newRaft`.
2019-08-09 19:28:43 +02:00
Tobias Schottdorf 9553994cd7 raft/auorum: remove unused type 2019-08-07 18:53:01 +02:00
Gyuho Lee 34bd797e67 *: revert module import paths
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-05-28 15:39:35 -07:00
shivaramr 9150bf52d6 go modules: Fix module path version to include version number 2019-04-26 15:29:50 -07:00
Tobias Schottdorf 1569f4829d raft: print RejectHint of zero on MsgAppResp
A zero RejectHint on MsgAppResp is still used, and so should be
reflected in the message description.
2018-11-23 11:06:38 +01:00
Tobias Schottdorf ad49c8fd98 raft: fix bug in unbounded log growth prevention mechanism
The previous code was using the proto-generated `Size()` method to
track the size of an incoming proposal at the leader. This includes
the Index and Term, which were mutated after the call to `Size()`
when appending to the log. Additionally, it was not taking into
account that an ignored configuration change would ignore the
original proposal and append an empty entry instead.

As a result, a fully committed Raft group could end up with a non-
zero tracked uncommitted Raft log counter that would eventually hit
the ceiling and drop all future proposals indiscriminately. It would
also immediately imply that proposals exceeding the threshold alone
would get refused (as the "first uncommitted proposal" gets special
treatment and is always allowed in).

Track only the size of the payload actually appended to the Raft log
instead.

For context, see:
https://github.com/cockroachdb/cockroach/issues/31618#issuecomment-431374938
2018-10-22 11:28:39 +02:00
Tobias Schottdorf 7a8ab37bfd raft: fix correctness bug in CommittedEntries pagination
In #9982, a mechanism to limit the size of `CommittedEntries` was
introduced. The way this mechanism worked was that it would load
applicable entries (passing the max size hint) and would emit a
`HardState` whose commit index was truncated to match the limitation
applied to the entries. Unfortunately, this was subtly incorrect
when the user-provided `Entries` implementation didn't exactly
match what Raft uses internally. Depending on whether a `Node` or
a `RawNode` was used, this would either lead to regressing the
HardState's commit index or outright forgetting to apply entries,
respectively.

Asking implementers to precisely match the Raft size limitation
semantics was considered but looks like a bad idea as it puts
correctness squarely in the hands of downstream users. Instead, this
PR removes the truncation of `HardState` when limiting is active
and tracks the applied index separately. This removes the old
paradigm (that the previous code tried to work around) that the
client will always apply all the way to the commit index, which
isn't true when commit entries are paginated.

See [1] for more on the discovery of this bug (CockroachDB's
implementation of `Entries` returns one more entry than Raft's when the
size limit hits).

[1]: https://github.com/cockroachdb/cockroach/issues/28918#issuecomment-418174448
2018-09-04 14:52:23 +02:00
Gyuho Lee bb60f8ab1d raft: change import paths to "go.etcd.io/etcd"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-28 17:47:52 -07:00
Ben Darnell 73cae7abd0 raft: Implement the PreVote RPC described in thesis section 9.6
This prevents disruption when a node that has been partitioned
away rejoins the cluster.

Fixes #6522
2016-10-19 19:35:20 +08:00
Gyu-Ho Lee f4141f0f51 raft: handle 'MsgTransferLeader' in follower 2016-08-10 16:24:29 -07:00
Gyu-Ho Lee fe884f8209 raft: update LICENSE header 2016-05-12 20:49:15 -07:00
es-chow ac059eb8cb raft: transfer leader feature 2016-04-08 16:56:32 +08:00
Ben Darnell c185bdaf95 raft: Improve formatting of DescribeMessage 2016-01-20 11:03:07 -04:00
Hitoshi Mitake 9b2da76796 raft: remove go vet compliants 2015-12-16 13:29:23 +09:00
Xiang Li a8cc1570d0 raft: support quorum check when raft is leader
If quorum check fails, the leader will step down to follower.
2015-11-24 09:36:37 -08:00
Ben Darnell ef721db247 raft: Format node IDs as hex in DescribeMessage.
This is how they are printed in all other log messages.
2015-05-20 15:32:56 -04:00
Xiang Li 7571b2cde2 raft: limit the size of msgApp
limit the max size of entries sent per message.
Lower the cost at probing state as we limit the size per message;
lower the penalty when aggressively decrease to a too low next.
2015-03-18 15:59:30 -07:00
Xiang Li 9b4d52ee73 raft: do not resend snapshot if not necessary
raft relies on the link layer to report the status of the sent snapshot.
If the snapshot is still sending, the replication to that remote peer will
be paused. If the snapshot finish sending, the replication will begin
optimistically after electionTimeout. If the snapshot fails, raft will
try to resend it.
2015-02-28 11:41:58 -08:00
Xiang Li 2af33fd494 raft: add reportUnreachable 2015-02-28 10:45:22 -08:00
Ben Darnell b53dc0826e Only use the EntryFormatter for normal entries.
ConfChange entries also have a Data field but the application-supplied
formatter won't know what to do with them.
2015-02-20 13:51:14 -05:00
Jonathan Boulle f1ed69e883 *: switch to line comments for copyright
Build tags are not compatible with block comments.
Also adds copyright header to a few places it was missing.
2015-01-26 09:53:30 -08:00
Ben Darnell cd9d5573d4 raft: make EntryFormatter less clever. 2015-01-21 19:27:26 -05:00
Ben Darnell e73d442e32 raft: Add support for custom formatters in DescribeMessage/DescribeEntry 2015-01-21 14:12:58 -05:00
Ben Darnell 2e1c36cdd9 raft: introduce MsgHeartbeatResp.
Now that heartbeats are distinct from MsgApp{,Resp}, the retries
currently performed in stepLeader's MsgAppResp section are only
performed on an actual MsgAppResp (or a new MsgProp). This means
that it may take a long time to recover from a dropped MsgAppResp
in a quiet cluster.

This commit adds a dedicated heartbeat response message. This message
does not convey the follower's current log position because the
MsgHeartbeat does not include the leaders term and index. Upon receipt
of a heartbeat response, the leader may retry the latest MsgApp if it
believes the follower to be behind.
2015-01-14 17:34:10 -05:00
Xiang Li fc96a9e4a7 raft: remove unnecessary funcs in raft.go 2014-12-25 17:04:33 -08:00
Xiang Li 6409a8bf0d raft: filter out messages from unknow sender.
If we cannot find the `m.from` from current peers in the raft and it is a response
message, we should filter it out or raft panics. We are not targetting to avoid
malicious peers.

It has to be done in the raft node layer syncchronously. Although we can check
it at the application layer asynchronously, but after the checking and before
the message going into raft, the raft state machine might make progress and
unfortunately remove the `m.from` peer.
2014-12-05 11:34:56 -08:00
Xiang Li 8de98d4903 raft: clean up 2014-11-25 16:21:50 -08:00
Ben Darnell 25b6590547 raft: introduce log storage interface.
This change splits the raftLog.entries array into an in-memory
"unstable" list and a pluggable interface for retrieving entries that
have been persisted to disk. An in-memory implementation of this
interface is provided which behaves the same as the old version;
in a future commit etcdserver could replace the MemoryStorage with
one backed by the WAL.
2014-11-10 17:40:39 -05:00