vitalif/etcd - etcd

Author	SHA1	Message	Date
Nathan VanBenschoten	f89b06dc6d	raft: provide protection against unbounded Raft log growth The suggested pattern for Raft proposals is that they be retried periodically until they succeed. This turns out to be an issue when a leader cannot commit entries because the leader will continue to append re-proposed entries to its log without committing anything. This can result in the uncommitted tail of a leader's log growing without bound until it is able to commit entries. This change adds a safeguard to protect against this case where a leader's log can grow without bound during loss of quorum scenarios. It does so by introducing a new, optional `MaxUncommittedEntriesSize` configuration. This config limits the max aggregate size of uncommitted entries that may be appended to a leader's log. Once this limit is exceeded, proposals will begin to return ErrProposalDropped errors. See cockroachdb/cockroach#27772	2018-10-13 23:25:05 -04:00
Tobias Schottdorf	7a8ab37bfd	raft: fix correctness bug in CommittedEntries pagination In #9982, a mechanism to limit the size of `CommittedEntries` was introduced. The way this mechanism worked was that it would load applicable entries (passing the max size hint) and would emit a `HardState` whose commit index was truncated to match the limitation applied to the entries. Unfortunately, this was subtly incorrect when the user-provided `Entries` implementation didn't exactly match what Raft uses internally. Depending on whether a `Node` or a `RawNode` was used, this would either lead to regressing the HardState's commit index or outright forgetting to apply entries, respectively. Asking implementers to precisely match the Raft size limitation semantics was considered but looks like a bad idea as it puts correctness squarely in the hands of downstream users. Instead, this PR removes the truncation of `HardState` when limiting is active and tracks the applied index separately. This removes the old paradigm (that the previous code tried to work around) that the client will always apply all the way to the commit index, which isn't true when commit entries are paginated. See [1] for more on the discovery of this bug (CockroachDB's implementation of `Entries` returns one more entry than Raft's when the size limit hits). [1]: https://github.com/cockroachdb/cockroach/issues/28918#issuecomment-418174448	2018-09-04 14:52:23 +02:00
Gyuho Lee	bb60f8ab1d	raft: change import paths to "go.etcd.io/etcd" Signed-off-by: Gyuho Lee <leegyuho@amazon.com>	2018-08-28 17:47:52 -07:00
Ben Darnell	0a670b7c9b	raft: Introduce CommittedEntries pagination The MaxSizePerMsg setting is now used to limit the size of Ready.CommittedEntries. This prevents out-of-memory errors if the raft log has become very large and commits all at once.	2018-08-07 12:54:34 -04:00
Ben Darnell	bc14deecca	raft: Add a test for MaxSizePerMsg feature Ensure that this limit is respected when generating MsgApp messages.	2018-08-06 16:52:16 -04:00
Vincent Lee	f0dffb4163	raft: Propose in raft node waits for the proposal result so that we can fail fast when a proposal is dropped.	2018-04-03 11:04:09 +08:00
Xiang Li	c5532ebbf6	Merge pull request #9067 from absolute8511/optimize-raft-drop raft: let raft step return error when proposal is dropped to allow fail-fast	2018-01-11 19:54:52 -08:00
Vincent Lee	30ced5b2be	raft: let raft step return error when proposal is dropped to allow fail-fast.	2018-01-12 10:16:47 +08:00
Vincent Lee	11fa4f0275	raft: raft learners should be returned after applyConfChange	2018-01-11 17:30:17 +08:00
Ben Darnell	8d8f3195e4	raft: Avoid scanning raft log in becomeLeader Scanning the uncommitted portion of the raft log to determine whether there are any pending config changes can be expensive. In cockroachdb/cockroach#18601, we've seen that a new leader can spend so much time scanning its log post-election that it fails to send its first heartbeats in time to prevent a second election from starting immediately. Instead of tracking whether a pending config change exists with a boolean, this commit tracks the latest log index at which a pending config change could exist. This is a less expensive solution to the problem, and the impact of false positives should be minimal since a newly-elected leader should be able to quickly commit the tail of its log.	2017-12-30 10:13:36 -05:00
Gyu-Ho Lee	f65aee0759	*: replace 'golang.org/x/net/context' with 'context' Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-09-07 13:39:42 -07:00
smetro	e461017ac5	raft: add DisableProposalForwarding option This allows users to prevent followers from forwarding proposals to the leader.	2017-06-21 14:58:28 -07:00
Gyu-Ho Lee	3d75395875	*: remove never-used vars, minor lint fix Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-03-06 14:59:12 -08:00
Xiang	931cf3454a	raft: make TestNodeTick reliable TestNodeTick relies on an unreliable func `waitForSchedule` when running with GOMAXPROCS > 1. This commit changes the test to make sure we stop the node after it drains the tick chan. The test should be reliable now.	2017-03-01 17:35:58 -08:00
Peter Mattis	ab03a42f06	raft: add Ready.MustSync Add Ready.MustSync which indicates that the hard state and raft log entries in a Ready message must be synchronously written to persistent storage.	2017-02-13 15:13:21 -05:00
Manjunath A Kumatagi	0914b8b707	test: Fix gosimple errors The test script reports gosimple suggestions, so this PR fixes the gosimple S1019 check. raft/node_test.go:456:40: should use make([]raftpb.Entry, 1) instead (S1019) raft/node_test.go:457:49: should use make([]raftpb.Entry, 1) instead (S1019) raft/node_test.go:458:43: should use make([]raftpb.Message, 1) instead (S1019) Refer to https://github.com/dominikh/go-tools/blob/master/cmd/gosimple/README.md#checks for more information.	2017-02-09 08:01:28 -05:00
Dylan.Wen	16135165c2	raft: add RawNode test case for #6866	2017-01-10 10:55:57 +08:00
Xiang Li	fc8cd44c72	raft: use status to test node stop n.Tick() is async. It can be racy when running with n.Stop(). n.Status() is sync and has a feedback mechanism internally. So there won't be any race between the n.Status() and n.Stop() calls.	2017-01-03 15:18:48 -08:00
Xiang Li	f2eb8560ed	raft: fix TestNodeProposeAddDuplicateNode Only send signal after applying conf change. Otherwise a deadlock might happen if the raft node receives a Ready without a conf change when the test server is slow.	2016-11-20 21:59:31 -08:00
Vincent Lee	e6d1ebcc1d	raft: use the channel instead of sleep to make test case reliable	2016-11-21 13:30:15 +08:00
Vincent Lee	bc6f5ad53e	raft: fix test case for data race	2016-11-21 10:30:36 +08:00
Vincent Lee	62bd5477b9	raft: fix test case, should wait config propose applied	2016-11-21 10:10:34 +08:00
Vincent Lee	16e3ab0f11	raft: test case to check the duplicate add node propose	2016-11-20 16:58:11 +08:00
Gyu-Ho Lee	cb5c92f69b	raft: do not attach term to MsgReadIndex Fix https://github.com/coreos/etcd/issues/6744. MsgReadIndex, like MsgProp, is forwarded to the leader, so we should treat it as a local message.	2016-10-28 22:12:25 -07:00
Xiang Li	710b14ce56	raft: support safe readonly request Implement raft readonly request described in raft thesis 6.4 along with the existing clock/lease based approach.	2016-09-12 15:13:52 +08:00
Xiang Li	a75688bd17	Merge pull request #6039 from xiang90/fix_r raft: hide Campaign rules on applying all entries	2016-07-26 20:52:09 -07:00
Xiang Li	484f579905	raft: hide Campaign rules on applying all entries	2016-07-25 15:53:39 -07:00
Gyu-Ho Lee	4ff6c72257	raft: replace 'reflect.DeepEqual' with bytes.Equal	2016-07-22 16:34:13 -07:00
Xiang Li	1c5754f02d	raft: fix readindex	2016-07-19 15:00:58 -07:00
Xiang Li	5f1c763993	Merge pull request #5553 from swingbach/master raft: implemented read-only query when quorum check is on	2016-06-28 12:47:43 -07:00
swingbach@gmail.com	0faae33ace	raft: implemented read-only query when quorum check is on	2016-06-28 10:52:53 +08:00
Xiang Li	848f539536	raft: make tick unblock and fix potential live lock	2016-06-16 08:01:06 -07:00
Xiang Li	500296d0fb	raft: fix TestNodeStepUnblock The test cases have side effects. We need to stop testing if one of the tests fails. Also, the timeout should be much longer to avoid false positives.	2016-06-03 10:22:11 -07:00
Gyu-Ho Lee	fe884f8209	raft: update LICENSE header	2016-05-12 20:49:15 -07:00
es-chow	ac059eb8cb	raft: transfer leader feature	2016-04-08 16:56:32 +08:00
Anthony Romano	bd832e5b0a	*: migrate Godeps to vendor/	2016-03-22 17:10:28 -07:00
Xiang Li	aa59e7518e	raft: remove unnecessary waitSchedule in test	2016-03-09 09:18:49 -08:00
Gyu-Ho Lee	c827c7432c	raft: fix leaky goroutines in raft test	2016-01-31 12:41:33 -08:00
Xiang Li	a8cc1570d0	raft: support quorum check when raft is leader If quorum check fails, the leader will step down to follower.	2015-11-24 09:36:37 -08:00
Yicheng Qin	0de52414cd	raft: extend wait timeout in TestNodeAdvance This fixes the failure met in semaphore CI.	2015-11-03 16:57:18 -08:00
Yicheng Qin	018fb8e6d9	pkg/testutil: ForceGosched -> WaitSchedule ForceGosched() performs badly when GOMAXPROCS>1. When GOMAXPROCS=1, it can ensure that other goroutines run long enough, because it always yields the processor to other goroutines. But it cannot yield the processor to a goroutine running on another processor. So when GOMAXPROCS>1, the yield may finish when a goroutine on another processor has run for only a short time. Here is a test to confirm the case:
```
package main

import (
	"fmt"
	"runtime"
	"testing"
)

func ForceGosched() {
	// possibility enough to sched up to 10 go routines.
	for i := 0; i < 10000; i++ {
		runtime.Gosched()
	}
}

var d int

func loop(c chan struct{}) {
	for {
		select {
		case <-c:
			for i := 0; i < 1000; i++ {
				fmt.Sprintf("come to time %d", i)
			}
			d++
		}
	}
}

func TestLoop(t *testing.T) {
	c := make(chan struct{}, 1)
	go loop(c)
	c <- struct{}{}
	ForceGosched()
	if d != 1 {
		t.Fatal("d is not incremented")
	}
}
```
`go test -v -race` runs well, but `GOMAXPROCS=2 go test -v -race` fails. Change the functionality to wait for scheduling to happen.	2015-06-10 14:37:41 -07:00
Xiang Li	085447ed85	raft: fix raft node start bug raft node should set initial prev hard state to empty. Otherwise it will not send the first hard coded state to the application until the state changes again. This commit fixes the issue. It introduces a small overhead: the same state might be sent to the application twice when restarting. But this is fine.	2015-05-27 13:32:04 -07:00
Xiang Li	abddef0f28	raft: make node configurable	2015-03-23 21:20:49 -07:00
Xiang Li	d9b5b56c82	raft: make raft configurable	2015-03-23 09:55:19 -07:00
Xiang Li	7fe608532a	raft: do not reset vote if term is not changed raft MUST keep the voting information for the same term. reset should not reset vote if term is not changed.	2015-03-07 22:31:20 -08:00
Xiang Li	9b4d52ee73	raft: do not resend snapshot if not necessary raft relies on the link layer to report the status of the sent snapshot. If the snapshot is still sending, the replication to that remote peer will be paused. If the snapshot finish sending, the replication will begin optimistically after electionTimeout. If the snapshot fails, raft will try to resend it.	2015-02-28 11:41:58 -08:00
Barak Michener	92dca0af0f	*: remove shadowing of variables from etcd and add travis test We've been bitten by this enough times that I wrote a tool so that it never happens again.	2015-02-17 16:31:42 -05:00
Jonathan Boulle	f1ed69e883	*: switch to line comments for copyright Build tags are not compatible with block comments. Also adds copyright header to a few places it was missing.	2015-01-26 09:53:30 -08:00
Ben Darnell	59214978a2	raft: Add applied index as an argument to newRaft and RestartNode.	2015-01-22 11:38:05 -05:00
Xiang Li	a5efbf826d	raft: drop nodes in softState	2014-12-09 11:43:52 -08:00

160 Commits (c75ba98f8167a131818a9306459cacff1d059f69)
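The bound introduced in 8d8f3195e4 (raft: Avoid scanning raft log in becomeLeader) replaces a log scan with a single index: the latest position at which a pending config change could exist. A minimal sketch, assuming a leader reduced to three indices; `leaderState`, `becomeLeader`, and `proposeConfChange` here are hypothetical simplifications, and this sketch just refuses the second conf change rather than modelling how the library handles it.

```go
package main

import "fmt"

// leaderState is a hypothetical, heavily reduced view of a raft leader; only
// the fields needed to illustrate the pending-config-change bound are kept.
type leaderState struct {
	lastIndex        uint64 // index of the last entry in the leader's log
	applied          uint64 // highest index applied to the state machine
	pendingConfIndex uint64 // latest index at which a pending conf change could exist
}

// becomeLeader assumes the worst case for the uncommitted tail: any entry up
// to lastIndex might be a conf change, so no log scan is needed.
func (s *leaderState) becomeLeader() {
	s.pendingConfIndex = s.lastIndex
}

// proposeConfChange appends a conf change only when no earlier one can still
// be unapplied. A false positive merely delays the change until the tail applies.
func (s *leaderState) proposeConfChange() bool {
	if s.pendingConfIndex > s.applied {
		return false
	}
	s.lastIndex++
	s.pendingConfIndex = s.lastIndex
	return true
}

func main() {
	s := &leaderState{lastIndex: 10, applied: 7}
	s.becomeLeader()
	fmt.Println(s.proposeConfChange()) // false: entries 8..10 are not applied yet
	s.applied = 10
	fmt.Println(s.proposeConfChange()) // true: appended at index 11
	fmt.Println(s.proposeConfChange()) // false: index 11 is still pending
}
```

The cost of becoming leader drops to a constant, and, as the commit notes, the only downside is that a new leader may briefly reject a conf change until it commits and applies the tail of its log.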