Commit Graph

972 Commits (96cce208c2cb9e70ac3573b3f76eed7c84d262d1)

Author SHA1 Message Date
Jonathan Boulle af9f352fe3 raft: update RecentActive name in comments
Noticed when retrospectively reviewing #3976 that a couple of places
were missed when the variable was renamed.
2015-12-11 15:06:11 -08:00
Xiang Li cc6d98bf89 etcdserver: only send snapshot when the member is active 2015-12-10 16:15:26 -08:00
Xiang Li 9df46f9d6f raft: expose RecentActive in Progress 2015-12-10 12:17:18 -08:00
Bram Gruneir 1901a4c718 raft: Ensure that Progress is not nil when a MsgSnapStatus comes in.
This was causing some issues in cockroach cockroachdb/cockroach#2950
2015-12-07 16:01:18 -05:00
Gyu-Ho Lee d817f885db raft: doc, debugging instruction on MessageType
This adds documentation on MessageType. Having clear explanation about
MessageType helps understand raft logic and debug etcd when there is a
message dropping. This is partially for coreos#3806.
2015-12-03 00:45:11 -08:00
es-chow 5bc56786dc raft: add RawNode which is a thread-unsafe node without goroutine and remove MultiNode 2015-11-26 17:14:14 +08:00
Xiang Li a8cc1570d0 raft: support quorum check when raft is leader
If quorum check fails, the leader will step down to follower.
2015-11-24 09:36:37 -08:00
Gyu-Ho Lee 81229dbea9 *: add missing package descriptions
This adds and updates package descriptions in etcd projects.
And also deletes some duplicate LICENSE statements.
2015-11-17 20:54:10 -08:00
Gyu-Ho Lee e1c108e604 raft: minor typo in progress.go
Fixes a minor typo.
2015-11-17 14:21:35 -08:00
Xiang Li 5d0268aa2e Merge pull request #3877 from bdarnell/campaign-while-leader
raft: no-op instead of panic for Campaigning while leader
2015-11-16 19:59:34 -08:00
Ben Darnell fbeb58d265 raft: no-op instead of panic for Campaigning while leader
We need to be able to force an election (on one node) after creating a
new group (cockroachdb/cockroach#1384), but it is difficult to ensure
that our call to Campaign does not race with an election that may be
started by raft itself. A redundant call to Campaign should be a no-op
instead of a panic. (But the panic in becomeCandidate remains, because
we don't want to update the term or change the committed index in this
case)
2015-11-16 21:44:14 -05:00
Yicheng Qin 3a65442d7d raft: fix print format for term in one log line
`term` should be printed in decimal representation instead of
hexadecimal one.
2015-11-15 20:26:16 -08:00
Xiang Li 2990249c1d Merge pull request #3856 from xiang90/raft_doc_restart
raft: add doc to make restart clear
2015-11-11 11:15:49 -08:00
Xiang Li f7f28b9984 raft: add doc to make restart clear, especially for configuration changed case 2015-11-11 11:11:58 -08:00
Xiang Li 6df52614fc raft: add more words about raft protocol 2015-11-11 09:20:25 -08:00
Yicheng Qin 0de52414cd raft: extend wait timeout in TestNodeAdvance
This fixes the failure met in semaphore CI.
2015-11-03 16:57:18 -08:00
Yicheng Qin bf3057e5bd raft: extend wait timeout in TestMultiNodeAdvance
This fixes the failure met in semaphore CI:

```
--- FAIL: TestMultiNodeAdvance-2 (0.01s)
		multinode_test.go:458: expect Ready after Advance, but there is
		no Ready available
```
2015-10-23 12:08:24 -07:00
Yicheng Qin 01806c3e80 raft: fix malformed example name
It is reported by latest govet:
```
gopath/src/github.com/coreos/etcd/raft/example_test.go:26: Example_Node
has malformed example suffix: Node
```
2015-10-20 16:40:01 -07:00
Gyu-Ho Lee 1716d5858f raft/documentation: clarify progress's subjects.
If I understand correctly, `progress` represents the states of follower. For
me, some comments weren't clear because it was missing the subjects of
`progress`. This adds more clarification on who is doing what. Please let me
know if I misunderstood anything. Thanks,
2015-10-15 19:15:08 -07:00
Cong Ding 362df8e470 raft/doc: fix misuse of `for' loop in docs 2015-10-15 11:13:30 -05:00
Cong Ding f1f92f0fa3 raft/doc: fix typos 2015-10-15 02:17:34 -05:00
Kenji Kaneda ebd8cb04c1 raft: fix a description of MemoryStorage.Compact
The parameter name is compactIndex, not i.
2015-10-06 21:49:33 -07:00
Cong Ding b2edf1d24a raft: fix typo in doc 2015-10-01 11:21:23 -05:00
Yicheng Qin 533e728b64 Merge pull request #3609 from yichengq/raft-snapshot
raft: kill TODO about behavior when snapshot fails
2015-09-29 19:32:31 -07:00
Yicheng Qin 4c82b481a5 raft: improve behavior when snapshot fails
etcd is going to support incremental snapshot, and we design to let it
send at most one snapshot out at first stage. So when one snapshot is in
flight, snapshot request will return error.

When failing to get snapshot when sending MsgSnap, raft prints out
related log and abort sending this message.
2015-09-29 19:15:15 -07:00
Kenji Kaneda f602767e50 raft: remove an obsolete TODO comment on 4MB maxMsgSize hard coding
The TODO comment was added by 7571b2cd, and it was addressed by d9b5b56c.
2015-09-28 21:31:12 -07:00
Emil Hessman b9f22cb69b raft: fix Node doc typo 2015-09-21 06:13:33 +02:00
Ben Darnell b7baaa6bc8 raft: Allow per-group nodeIDs in MultiNode.
This feature is motivated by
https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/replica_tombstone.md
which requires a change to the way CockroachDB constructs its node IDs.
2015-09-18 15:36:36 -04:00
Jonathan Boulle 7848ac3979 *: add missing license headers 2015-09-15 14:09:01 -07:00
Brandon Philips 68d4ec3e13 raft: improve panic error message
Give a human being some insight into how we might have gotten to this
state based on feedback from #3504.
2015-09-12 12:17:02 -07:00
Dmitry Smirnov b2f4a5f587 *: fix spelling issues (codespell).
Signed-off-by: Dmitry Smirnov <onlyjob@member.fsf.org>
2015-09-11 10:22:29 +10:00
Xiang Li ef7cf058a2 *: update gogoproto 2015-09-03 15:32:25 -07:00
Tamir Duberstein 45390b9fb8 *: regenerate proto to use local import path
Using Go-style import paths in protos is not idiomatic. Normally, this
detail would be internal to etcd, but the path from which gogoproto
is imported affects downstream consumers (e.g. cockroachdb).

In cockroach, we want to avoid including `$GOPATH/src` in our protoc
include path for various reasons. This patch puts etcd on the same
convention, which allows this for cockroach.

More information: https://github.com/cockroachdb/cockroach/pull/2339#discussion_r38663417

This commit also regenerates all the protos, which seem to have
drifted a tiny bit.
2015-09-03 13:38:28 -04:00
Ben Darnell 4f20e01f60 raft: Ignore proposals if not a current member.
Fixes another panic in MultiNode.Propose.
2015-08-31 20:31:14 -04:00
Xiang Li 6cbaaa715c Merge pull request #3396 from bdarnell/multinode-propose-panic
raft: Fix a nil-pointer panic in MultiNode.Propose.
2015-08-28 12:34:49 -07:00
Ben Darnell 05924b330a raft: Fix a nil-pointer panic in MultiNode.Propose. 2015-08-28 11:17:59 +02:00
Yicheng Qin df83af944b Merge pull request #3384 from yichengq/fix-shadow
test: use go vet shadow feature instead of go-nyet
2015-08-27 14:27:57 -07:00
Yicheng Qin 92cd24d5bd *: fix govet shadow check failure 2015-08-27 14:15:30 -07:00
Matt Keller 32372e1d70 raft: Fixed a test misassumption
network_test.go:56: total = 59.22354ms, want > 50ms
59 is > 50, but the equation added 10 to the right side
2015-08-27 15:15:34 -04:00
Cong Ding c09b667d57 *: fix go vet reported issues 2015-08-22 12:19:02 -05:00
Xiang Li 6b23a8131f *: test gofmt with -s and fix reported issues 2015-08-21 18:52:16 -07:00
Xiang Li 50c1db3fbf raft: downgrade the logging around snapshot to debugf
Snapshot related logging is spamming when leader trying to
sync a failed peer.
2015-08-18 15:43:53 -07:00
es-chow cc362ccdad raft: set logger to raft so log context such as multinode groupID can be logged 2015-08-12 22:56:00 +08:00
Xiang Li 845c51fedd *: fix typos vaild->valid 2015-08-07 10:57:11 -07:00
Xiang Li 581ef05bab *: resolve proto warnings 2015-06-29 18:39:46 -07:00
Xiang Li 13f44e4b79 *: update generated proto code 2015-06-29 16:45:25 -07:00
Xiang Li e01d53b853 Merge pull request #2979 from xiang90/fix_sendapp
raft: fix panic in send app
2015-06-29 10:49:04 -07:00
Xiang Li b4022899eb raft: fix panic in send app
sendApp accesses the storage several times. Perviously, we
assume that the storage will not be modified during the read
opeartions. The assumption is not true since the storage can
be compacted between the read operations. If a compaction
causes a read entries error, we should not painc. Instead, we
can simply retry the sendApp logic until succeed.
2015-06-15 14:23:33 -07:00
Xiang Li 2f0169c3ab raft: fix usage section of doc
We recently added a config struct to start raft. Update
our doc accordingly.
2015-06-15 10:26:10 -07:00
Yicheng Qin 4e79abcfeb Merge pull request #2944 from yichengq/fix-2procs
pkg/testutil: ForceGosched -> WaitSchedule
2015-06-10 14:44:32 -07:00
Yicheng Qin 018fb8e6d9 pkg/testutil: ForceGosched -> WaitSchedule
ForceGosched() performs bad when GOMAXPROCS>1. When GOMAXPROCS=1, it
could promise that other goroutines run long enough
because it always yield the processor to other goroutines. But it cannot
yield processor to goroutine running on other processors. So when
GOMAXPROCS>1, the yield may finish when goroutine on the other
processor just runs for little time.

Here is a test to confirm the case:

```
package main

import (
	"fmt"
	"runtime"
	"testing"
)

func ForceGosched() {
	// possibility enough to sched up to 10 go routines.
	for i := 0; i < 10000; i++ {
		runtime.Gosched()
	}
}

var d int

func loop(c chan struct{}) {
	for {
		select {
		case <-c:
			for i := 0; i < 1000; i++ {
				fmt.Sprintf("come to time %d", i)
			}
			d++
		}
	}
}

func TestLoop(t *testing.T) {
	c := make(chan struct{}, 1)
	go loop(c)
	c <- struct{}{}
	ForceGosched()
	if d != 1 {
		t.Fatal("d is not incremented")
	}
}
```

`go test -v -race` runs well, but `GOMAXPROCS=2 go test -v -race` fails.

Change the functionality to waiting for schedule to happen.
2015-06-10 14:37:41 -07:00
Xiang Li 1279e495f0 raft: make the repeated log message under bad path debug level 2015-06-05 17:29:24 -07:00
Xiang Li 1561b85bf3 raft: drop the raft prefix in logging 2015-06-02 12:50:42 -07:00
Xiang Li 0ca6be31f8 raft: remove wrong invariant
The commit > unstable might not true for follower. The leader only need
to ensure the entry is stored on the majority of nodes to commit an
entry. So the minority of the cluster might receive commit > unstable
append request. This is normal.
2015-05-29 18:48:59 -07:00
Xiang Li 085447ed85 raft: fix raft node start bug
raft node should set initial prev hard state to empty.
Or it will not send the first hard coded state to application
until the state changes again.

This commit fixs the issue. It introduce a small overhead, that
the same tate might send to application twice when restarting.
But this is fine.
2015-05-27 13:32:04 -07:00
Xiang Li 0ad6d7e3ba Merge pull request #2853 from bdarnell/status
raft: MultiNode.Status returns nil for non-existent groups.
2015-05-20 13:07:23 -07:00
Ben Darnell d58fac453d raft: MultiNode.Status returns nil for non-existent groups.
Previously it would panic if the group did not exist.
2015-05-20 15:45:38 -04:00
Ben Darnell ef721db247 raft: Format node IDs as hex in DescribeMessage.
This is how they are printed in all other log messages.
2015-05-20 15:32:56 -04:00
xujun 6b7891c643 raft: fix typo in raftlog
fix typo in String() method of raftlog which will misorder
the "committed" and "unstable.offset" output.
2015-04-24 03:28:57 -04:00
Yicheng Qin 89495f9194 Merge pull request #2626 from yichengq/fix-raft-status
raft: generate correct json-format status
2015-04-03 13:54:46 -07:00
Yicheng Qin fa96e64b43 Merge pull request #2624 from yichengq/fix-raft-storage
raft: lock storage when compact it
2015-04-03 13:51:06 -07:00
Yicheng Qin 3d32c059dd raft: generate correct json-format status
Current json-format string misses the double quote around status field.

Use %q for better clearance.
2015-04-03 13:49:46 -07:00
Yicheng Qin d91ea7f199 raft: fix freeTo fails to free
If freeTo is called when to is set to the lastest inflight, freeTo
fails to free the slots.
2015-04-03 13:21:26 -07:00
Yicheng Qin c6de464587 raft: lock storage when compact it
etcd now compact raft storage asynchronously, and append entry to raft
storage may happen at the same time. Add the lock to fix the bug that
the entries saved in storage may be organized in a wrong way.
2015-04-03 11:38:01 -07:00
Xiang Li 3f867bc6ed raft: node bench matches reality 2015-03-28 14:53:42 -07:00
Xiang Li 05e240b892 *: update protobuf 2015-03-25 10:14:35 -07:00
Ben Darnell c9d507df11 raft: Use raft.Config in MultiNode. 2015-03-24 15:37:13 -04:00
Xiang Li b3fb052ad4 raft: make peers a prviate field in raft.Config 2015-03-24 11:10:07 -07:00
Xiang Li abddef0f28 raft: make node configurable 2015-03-23 21:20:49 -07:00
Brandon Philips 057978bbc6 raft: design: fixup markdown
Need a space between `1.` for markdown to render as a list.
2015-03-23 14:01:17 -07:00
Xiang Li d9b5b56c82 raft: make raft configurable 2015-03-23 09:55:19 -07:00
Xiang Li a552722f03 Merge pull request #2544 from xiang90/raft-inflight
raft: add flow control for progress
2015-03-20 20:12:31 -07:00
Xiang Li 4a64373225 raft: add flow control for progress
Each progress has a inflighs sliding window. When the progress
is in replicate state, inflights will control the sending speed
of the leader.

The leader can have at most maxInflight number of inflight
messages for each replicate progress. Receving a appResp moves
forward the sliding window. Heartbeat response free one
slot if the window is full.
2015-03-20 20:04:33 -07:00
Xiang Li 09a86cb9b9 Merge pull request #2553 from xiang90/raft-design
raft: add progress state machine graph
2015-03-20 19:57:51 -07:00
Xiang Li 86622537a1 raft: add progress state machine graph 2015-03-20 15:28:50 -07:00
Xiang Li 44d9209990 Merge pull request #2548 from xiang90/raft-design
raft: add our very first design.md
2015-03-20 09:07:44 -07:00
Yicheng Qin 6e557c58c7 Merge pull request #2532 from yichengq/342
raft: print out data and time in log
2015-03-20 08:03:23 -07:00
Xiang Li 59d8089295 raft: add our very first design.md 2015-03-19 21:00:47 -07:00
Xiang Li 2adb58f9de raft: move progress to progress.go 2015-03-19 10:05:04 -07:00
Xiang Li 7571b2cde2 raft: limit the size of msgApp
limit the max size of entries sent per message.
Lower the cost at probing state as we limit the size per message;
lower the penalty when aggressively decrease to a too low next.
2015-03-18 15:59:30 -07:00
Yicheng Qin 0634cf2cfe raft: print out data and time in log
Keep the default log setting consistent with other packages.
2015-03-18 15:49:06 -07:00
Yicheng Qin 7e7bc76038 Merge pull request #2514 from yichengq/340
raft: introduce progress states
2015-03-18 09:40:30 -07:00
Yicheng Qin 67194c0b22 raft: introduce progress states 2015-03-18 08:16:32 -07:00
Xiang Li d17f3a4452 Merge pull request #2519 from bdarnell/multinode-commit
raft: Use the correct commit index when advancing in MultiNode.
2015-03-17 10:31:53 -07:00
Ben Darnell cd1ff78ff3 raft: Elaborate a little more about committed entries in commitReady. 2015-03-17 13:22:36 -04:00
funkygao 0b912c0faf raft: fix godoc about starting a node 2015-03-17 17:35:18 +08:00
Ben Darnell 271d911c32 raft: Use the correct commit index when advancing in MultiNode.
This fixes an issue when restoring from a snapshot and brings
MultiNode closer to Node.
2015-03-16 18:40:51 -04:00
Ben Darnell 5e19adcf70 raft: correctly pass arguments to Logger.Panicf() 2015-03-12 16:15:43 -04:00
Iago López Galeiras e698192e4a rafttest: fix build error
raftLogger is not exported so we can't access it from here. Go back to
using log.
2015-03-12 11:47:13 +01:00
Xiang Li 39731724ff Merge pull request #2485 from yichengq/337
raft: fall back to bad path when unreachable
2015-03-11 14:16:39 -07:00
Yicheng Qin be0bf2a2bd raft: fall back to bad path when unreachable 2015-03-11 13:21:23 -07:00
Xiang Li c643967a41 raft: reply with the commit index when receives a smaller append message
Follower should not reject the append message with a smaller index than its commit
index. Or it will trigger the leader's resending logic, which might have a high cost.
2015-03-10 22:32:36 -07:00
Xiang Li a2be25cba4 Merge pull request #2460 from xiang90/raft-logger
raft: introduce logger interface
2015-03-09 08:00:21 -07:00
Xiang Li 97579e2e1d raft: introduce logger interface 2015-03-08 21:36:32 -07:00
Xiang Li 7fe608532a raft: do not reset vote if term is not changed
raft MUST keep the voting information for the same term. reset
should not reset vote if term is not changed.
2015-03-07 22:31:20 -08:00
Ben Darnell 725c411346 Add ReportUnreachable and ReportSnapshot to MultiNode.
Add ReportSnapshot requirement to doc.go.
2015-03-05 12:39:52 -05:00
Xiang Li 6b9b695167 Merge pull request #2435 from bdarnell/multinode
raft: Introduce MultiNode.
2015-03-04 21:27:20 -08:00
Ben Darnell c824c867ec raft: more doc updates.
Including parallelism of persist and send, cancellation of
ConfChanges, and the risks of two-node clusters.
2015-03-04 15:48:35 -05:00
Ben Darnell 4e74d81bbb raft: Introduce MultiNode.
MultiNode is an alternative to raft.Node that is more efficient
when a node may participate in many consensus groups. It is currently
used in the CockroachDB project; this commit merges the
github.com/cockroachdb/etcd fork back into the mainline.
2015-03-04 15:30:21 -05:00
Ben Darnell 250970cc23 raft: Expand doc.go
Includes more details on the required caller behavior and the safety of
membership changes.

Closes #2397
2015-03-04 13:18:02 -05:00
Yicheng Qin b4b9b9118a rafthttp: report MsgSnap status 2015-03-02 09:38:11 -08:00
Yicheng Qin 09f181f585 raft: log unreachable remote node 2015-03-01 16:47:49 -08:00
Yicheng Qin fbd5c81139 raft: remove shadowing of variables from test 2015-02-28 12:09:33 -08:00
Xiang Li 9b4d52ee73 raft: do not resend snapshot if not necessary
raft relies on the link layer to report the status of the sent snapshot.
If the snapshot is still sending, the replication to that remote peer will
be paused. If the snapshot finish sending, the replication will begin
optimistically after electionTimeout. If the snapshot fails, raft will
try to resend it.
2015-02-28 11:41:58 -08:00
Xiang Li 2185ac5ac8 raft: cleanup unreachable 2015-02-28 11:35:16 -08:00
Xiang Li 2af33fd494 raft: add reportUnreachable 2015-02-28 10:45:22 -08:00
Xiang Li cbef6ab152 raft: clean up storage 2015-02-28 10:09:07 -08:00
Xiang Li 5ede18be74 raft: separate compact and createsnap in memory storage 2015-02-28 10:08:30 -08:00
Ben Darnell b53dc0826e Only use the EntryFormatter for normal entries.
ConfChange entries also have a Data field but the application-supplied
formatter won't know what to do with them.
2015-02-20 13:51:14 -05:00
Barak Michener 92dca0af0f *: remove shadowing of variables from etcd and add travis test
We've been bitten by this enough times that I wrote a tool so that
it never happens again.
2015-02-17 16:31:42 -05:00
Xiang Li fa66055f66 rafttest: drop isPaused 2015-02-09 18:52:34 -08:00
Xiang Li 085b608de9 rafttest: support node pause 2015-02-09 16:26:43 -08:00
Xiang Li 279b216f9a raftest: wait for network sending 2015-02-09 15:52:16 -08:00
Xiang Li 65cd0051fe rafttest: add network delay 2015-02-06 15:01:07 -08:00
Xiang Li d423946fa4 rafttest: add network drop 2015-02-06 10:50:55 -08:00
Xiang Li 83edf0d862 rafttest: separate network interface and network 2015-02-03 22:50:27 -08:00
Xiang Li b147a6328d raftest: add restart and related simple test 2015-02-03 10:08:52 -08:00
Xiang Li d65af21b73 raft: add raft test suite 2015-02-01 14:53:22 -08:00
Xiang Li bff2ccaa22 Merge pull request #2170 from xiang90/remove_log
raft: remove default verbose logging
2015-01-27 15:58:53 -08:00
Xiang Li 553379e82b raft: remove default verbose logging 2015-01-27 15:57:44 -08:00
Ben Darnell 33d2400063 raft: Send any waiting appends after receiving MsgAppResp.
This addresses a problem that comes up in the cockroach tests,
in which the order of messages may lead to deadlocks (due to
the fact that we don't have regular heartbeat timers in most
of our tests).
2015-01-27 17:43:29 -05:00
Xiang Li 276c9540b4 etcdserver: support raft.status 2015-01-26 16:39:33 -08:00
Jonathan Boulle f1ed69e883 *: switch to line comments for copyright
Build tags are not compatible with block comments.
Also adds copyright header to a few places it was missing.
2015-01-26 09:53:30 -08:00
Ben Darnell 8c3a6508e9 raft: Add applied to the newRaft log message. 2015-01-22 12:04:40 -05:00
Ben Darnell 59214978a2 raft: Add applied index as an argument to newRaft and RestartNode. 2015-01-22 11:38:05 -05:00
Ben Darnell cd9d5573d4 raft: make EntryFormatter less clever. 2015-01-21 19:27:26 -05:00
Ben Darnell e73d442e32 raft: Add support for custom formatters in DescribeMessage/DescribeEntry 2015-01-21 14:12:58 -05:00
Xiang Li 003b97a60f raft: public progress struct in raft 2015-01-20 10:26:22 -08:00
Xiang Li b34936b097 raft: add progress into status 2015-01-18 15:23:50 -08:00
Xiang Li 0eaaad0e48 raft: add Status interface
Status returns the current status of raft state machine.
2015-01-16 14:02:04 -08:00
Ben Darnell 2e1c36cdd9 raft: introduce MsgHeartbeatResp.
Now that heartbeats are distinct from MsgApp{,Resp}, the retries
currently performed in stepLeader's MsgAppResp section are only
performed on an actual MsgAppResp (or a new MsgProp). This means
that it may take a long time to recover from a dropped MsgAppResp
in a quiet cluster.

This commit adds a dedicated heartbeat response message. This message
does not convey the follower's current log position because the
MsgHeartbeat does not include the leaders term and index. Upon receipt
of a heartbeat response, the leader may retry the latest MsgApp if it
believes the follower to be behind.
2015-01-14 17:34:10 -05:00
Ben Darnell 9972e62d94 raft: Use <= instead of < for heartbeat ticks.
In code outside the raft package, we cannot call raft.bcastHeartbeat
directly. Instead, to control heartbeats we set heartbeatInterval to 1
and call Tick().
2015-01-14 15:27:32 -05:00
Yicheng Qin 7a2fa39e52 Merge pull request #2012 from andybons/master
raft: add link to the paper raft_paper_test.go refers to
2015-01-06 00:27:47 -08:00
Xiang Li 2a83e350b1 Merge pull request #1992 from xiang90/rm_leader
*: support removing the leader from a 2 members cluster
2015-01-02 14:15:12 -08:00
Xiang Li 35b907ac58 raft: add lastIndex as rejectHint
Add the lastindex of the raft log as reject hint, so the leader can
bypass the greater index probing and decrease the next index directly
to last + 1.
2015-01-01 19:04:07 -08:00
Xiang Li 152676f43a *: support removing the leader from a 2 members cluster 2014-12-29 11:34:33 -08:00
Andrew Bonventre 4463f5c4b3 raft: add link to the paper raft_paper_tests.go refers to 2014-12-29 14:17:48 -05:00
Xiang Li fc96a9e4a7 raft: remove unnecessary funcs in raft.go 2014-12-25 17:04:33 -08:00
Xiang Li 2dbdf87f86 raft: add doc for storage 2014-12-22 12:33:14 -08:00
Xiang Li 896bac1f76 raft: flush the commit to fix a race in test 2014-12-18 17:10:37 -08:00
Xiang Li 88767d913d raft: leader waits for the reply of previous message when follower is not in good path.
It is reasonable for the leader to wait for the reply before sending out the next
msgApp or msgSnap for the follower in bad path. Or the leader will send out useless
messages if the previous message is rejected or the previous message is a snapshot.
Especially for the snapshot case, the leader will be 100% to send out duplicate message
including the snapshot, which is a huge waste.

This commit implement a timeout based wait mechanism. The timeout for normal msgApp is a
heartbeatTimeout and the timeout for snapshot is electionTimeout(snapshot is larger). We
can implement a piggyback mechanism(application notifies the msg lost) in the future
if necessary.
2014-12-18 15:01:50 -08:00
Xiang Li 044e35b814 raft: use newRaft 2014-12-15 11:25:35 -08:00
Xiang Li c586d5012c raft: log term as %d 2014-12-14 10:06:45 -08:00
Xiang Li 2c2e032155 Merge pull request #1908 from bdarnell/error-fixes
raft: remove panic when we see a proposal with no leader.
2014-12-11 13:58:51 -08:00
Ben Darnell b26856b603 raft: add detail to "no leader" log message 2014-12-11 15:07:32 -05:00
Xiang Li 89cba625d6 Merge pull request #1897 from xiang90/raft
raft: get rid of the using of defer in critical path
2014-12-10 21:24:38 -08:00
Yicheng Qin e89cc25c50 Merge pull request #1901 from yichengq/260
rafthttp: batch MsgProp
2014-12-10 21:16:07 -08:00
Yicheng Qin 3867c72c8a raft: support to do multiple proposals in one message 2014-12-10 20:00:59 -08:00
Ben Darnell fa247d09cc raft: remove panic when we see a proposal with no leader.
This panic can never be reached when using raft.Node, because we only
read from propc when there is a leader. However, it is possible to see
this error when using raft the raft object directly (as in MultiNode),
and in this case it is better to simply drop the proposal (as if we had
sent it to a leader that immediately vanished).

Add an error return to MemoryStorage.Append for consistency.
2014-12-10 17:34:40 -05:00
Xiang Li 96de9776b7 raft: get rid of allocation 2014-12-10 13:41:04 -08:00
Xiang Li e4c0f5c1a8 Merge pull request #1895 from xiang90/snap_nodes
etcd: update conf when apply the confChange entry
2014-12-09 11:45:01 -08:00
Xiang Li a5efbf826d raft: drop nodes in softState 2014-12-09 11:43:52 -08:00
Yicheng Qin 0472ddf05f Merge pull request #1890 from yichengq/259
raft: set raft.Commit too when setting raftLog.committed
2014-12-09 11:28:05 -08:00
Yicheng Qin 4804c45e14 raft: set raft.Commit too when setting raftLog.committed 2014-12-08 22:35:55 -08:00
Yicheng Qin 22dd3b039c Merge pull request #1888 from yichengq/258
raft: increase term to 1 before append initial entries
2014-12-08 22:27:23 -08:00
Yicheng Qin 7317834417 raft: increase term to 1 before append initial entries
Because the term of new raft is 0, it is weird to have term-1 committed
entries in the log.
2014-12-08 22:21:39 -08:00
Xiang Li ba45637ba3 raft: group step funcs 2014-12-08 15:29:54 -08:00
Xiang Li 099f4f10ea raft: one line 2014-12-08 15:28:48 -08:00
Xiang Li 8ead428e76 raft: group getter funcs 2014-12-08 15:24:34 -08:00
Xiang Li f73d059d80 raft: group configuration related funcs 2014-12-08 15:23:21 -08:00
Xiang Li 25313b1210 raft: move poll close to campaign 2014-12-08 15:21:57 -08:00
Xiang Li d52c66ad42 raft: removed unused func 2014-12-08 15:20:43 -08:00
Xiang Li 62ed1de10d raft: refactoring logging 2014-12-08 15:16:02 -08:00
Xiang Li 6cb7f2d9e9 raft: print out log when creating a newraft 2014-12-08 14:37:39 -08:00
Ben Darnell ea4d645a83 raft: Ignore redundant addNode calls.
This avoids clobbering any state when bootstrapping entries are
applied twice.
2014-12-05 17:15:50 -05:00
Ben Darnell 3d91faf85a Pre-apply the bootstrapping ConfChange entries.
This eliminates the need to fake an ApplyConfChange call before Campaign
in tests.

Fixes #1856.
2014-12-05 15:35:39 -05:00
Xiang Li 6409a8bf0d raft: filter out messages from unknow sender.
If we cannot find the `m.from` from current peers in the raft and it is a response
message, we should filter it out or raft panics. We are not targetting to avoid
malicious peers.

It has to be done in the raft node layer syncchronously. Although we can check
it at the application layer asynchronously, but after the checking and before
the message going into raft, the raft state machine might make progress and
unfortunately remove the `m.from` peer.
2014-12-05 11:34:56 -08:00
Xiang Li 182c30a41a raft: refactor logging at node level 2014-12-04 21:03:06 -08:00
Xiang Li 197e6b1b20 Merge pull request #1858 from vlajos/typofixes-vlajos-20141204
typofixes - https://github.com/vlajos/misspell_fixer
2014-12-04 14:52:27 -08:00
Veres Lajos 3de2ab2c04 *: typofixes
https://github.com/vlajos/misspell_fixer
2014-12-04 22:51:19 +00:00
Xiang Li a47690dd30 Merge pull request #1845 from xiang90/testunstable
raft: add TestUnstableTruncateAndAppend
2014-12-04 11:03:37 -08:00
Xiang Li 4ebd3a0b10 Merge pull request #1852 from xiang90/heartbeat
raft: add msgHeartbeat type
2014-12-04 10:25:46 -08:00
Xiang Li 149389cbfa raft: add msgHeartbeat type 2014-12-04 08:29:31 -08:00
Yicheng Qin e344774c10 Merge pull request #1850 from yichengq/247
raft: return 0 for term of compacted index
2014-12-03 17:23:32 -08:00
Yicheng Qin 34a468de36 raft: return 0 for term of compacted index
It is necessary to make this check because of the following case:

1. memory storage contains ents from index 0 to 50, and unstable has
ents from index 50 to 60.
2. raft receives an incoming snapshot with index 100.
3. raft restores its unstable to 100, but has not applied snapshot on memory storage.
4. raft receives an out-dated MsgApp from index 60.
5. raft finds the term of index 60 to check the match.
6. raft asks memory storage about the term of index 60 after it failed to get
it from unstable.
7. memory storage panics because it knows nothing about index 60.
2014-12-03 17:22:36 -08:00
Xiang Li ddd9cb7345 raft: add TestUnstableTruncateAndAppend 2014-12-03 16:37:19 -08:00
Xiang Li 2caf4f5f22 raft: fix log format in sendAppend 2014-12-03 16:11:44 -08:00
Xiang Li 06a5892a18 raft: more logging 2014-12-03 14:46:24 -08:00
Xiang Li 8074a5b5a4 raft: fix error message format in test 2014-12-03 13:36:47 -08:00
Xiang Li 37ab463e86 raft: add TestUnstableStableTo 2014-12-03 13:26:35 -08:00
Xiang Li 7703d4942c raft: add TestUnstableRestore 2014-12-03 13:03:56 -08:00
Xiang Li be60c88603 Merge pull request #1842 from xiang90/unstable_test
raft: add TestUnstableFirstIndex
2014-12-03 11:50:39 -08:00
Yicheng Qin 63ed202db6 raft: print out term in decimal format 2014-12-03 11:33:51 -08:00
Xiang Li 48f75ca645 raft: add TestUnstableMaybeTerm 2014-12-03 11:30:59 -08:00
Xiang Li 058356d9bd raft: add TestUnstableLastIndex 2014-12-03 11:11:31 -08:00
Xiang Li 98ebfa3468 raft: add TestUnstableFirstIndex 2014-12-03 11:11:11 -08:00
Yicheng Qin 23b32a6cbe Merge pull request #1716 from yichengq/225
raft: panic if loaded commit is out of range
2014-12-02 22:14:12 -08:00
Yicheng Qin 38768e5396 raft: panic if loaded commit is out of range 2014-12-02 22:09:34 -08:00
Xiang Li b3841afcc3 raft: do not restore snapshot if local raft has longer matching history
Raft should not restore the snapshot if it has longer matching history.
Or restoring snapshot might remove the matched entries.
2014-12-02 21:34:14 -08:00
Xiang Li 3209fd544b raft: panic on bad slice 2014-12-02 17:48:03 -08:00
Xiang Li 79014556e9 Merge pull request #1831 from xiang90/fix_unstable
raft: fix unstable
2014-12-02 14:43:11 -08:00
Xiang Li 2f5b748a90 raft: clearify that the firstIndex might not be available. 2014-12-02 14:27:52 -08:00
Yicheng Qin 1c7b9317a9 Merge pull request #1833 from yichengq/244
raft: not call stableTo for restored snapshot
2014-12-02 13:20:39 -08:00
Yicheng Qin 551a56fb98 raft: not call stableTo for restored snapshot
Stable has been set when restoring the snapshot in raftlog, so we don't need
to set it after advance.
2014-12-02 13:10:35 -08:00
Xiang Li b7ca56e3c8 raft: move good case of truncateAndAppend to the first place 2014-12-02 13:05:55 -08:00
Xiang Li 3cadaca1a3 Merge pull request #1830 from xiang90/raft_snap_log
raft: log snapshot events
2014-12-02 12:06:15 -08:00
Xiang Li 411063e14f raft: log snapshot events 2014-12-02 11:57:10 -08:00
Xiang Li 788d1e59a2 raft: use index in entry 2014-12-02 10:25:27 -08:00
Xiang Li 51de095d2c raft: logging state change events and events on bad path 2014-12-02 10:08:19 -08:00
Xiang Li 312db7f0f3 raft: fix memory storage
Memory storage should append all entries that have greater index
than the snap.Matedata.Index. We first truncate the old parts of
incoming entries. Then truncate the existing entries in the storage.
At last, we append the incoming entries to the existing entries.
2014-12-01 16:37:16 -08:00
Xiang Li 19ccdbee18 Merge pull request #1806 from xiang90/no_copy
No copy
2014-12-01 13:15:13 -08:00
Xiang Li 92d4112feb Merge pull request #1809 from xiang90/unstable
raft: stableTo checks term matching
2014-12-01 11:09:40 -08:00
Xiang Li 649176934a raft: add tests for stableTo 2014-12-01 10:54:34 -08:00
Xiang Li 3c0fbe285c raft: stableTo checks term matching
stableTo should only mark the index stable if the term is matched. After raft sends out unstable
entries to application, raft makes progress without waiting for reply. When the appliaction
calls the stableTo to notify the entries up to "index" are stable, raft might have truncated
some entries before "index" due to leader lost. raft must verify the (index,term) of stableTo,
before marking the entries as stable.
2014-11-28 14:13:07 -08:00
Xiang Li d214e87aee raft: make unstable.entries immutable; copy the entries at bad path 2014-11-27 19:35:03 -08:00
Xiang Li d244e3bf6e raft: fix node bench 2014-11-26 23:07:35 -08:00
Xiang Li fe0bc4ff36 Merge pull request #1805 from xiang90/fix_raft_b
raft: fix start term
2014-11-26 21:41:38 -08:00
Xiang Li 746c66b466 raft: fix start term 2014-11-26 21:21:13 -08:00
Xiang Li 7929e46dd8 raft: clean up 2014-11-26 15:31:07 -08:00
Xiang Li 8a626257c7 raft: move unstable related function to log_unstable.go 2014-11-26 15:25:24 -08:00
Yicheng Qin 0f070f3d2d raft: no need to save dummy entry into stable storage 2014-11-26 14:04:56 -08:00
Xiang Li 66252c7d62 raft: move all unstable stuff into one struct for future cleanup 2014-11-26 13:36:17 -08:00
Yicheng Qin ab2a40ea37 Merge branch 'log_interface'
Conflicts:
	raft/log.go
2014-11-26 12:16:02 -08:00
Xiang Li 732cfa1ad6 raft: remove the applysnap from Storage interface 2014-11-26 11:28:51 -08:00
Xiang Li e23f9e76d1 raft: do not applysnapshot in raft 2014-11-26 10:59:13 -08:00
Xiang Li 39e6631447 raft: always write dummy entry to storage 2014-11-25 23:27:40 -08:00
Xiang Li 8de98d4903 raft: clean up 2014-11-25 16:21:50 -08:00
Xiang Li 9bd1786fe4 raft: memory storage does not append out of date entries 2014-11-25 15:18:40 -08:00
Xiang Li 9df0e7715d raft: do not panic on out of date compaction 2014-11-25 15:14:39 -08:00
Xiang Li 01cbcce8ba etcdserver: do not applySnapshot twice 2014-11-25 14:53:49 -08:00
Yicheng Qin 7e6e305c4f Merge branch 'log_interface'
Conflicts:
	raft/raft.go
2014-11-25 14:22:11 -08:00
Yicheng Qin 4b43824be9 raft: not compact log if the compact index < first index of the log
It should ignore the compact operation instead of panic because the case that
the log is restored from snapshot before executing compact is reasonable.
2014-11-25 11:51:20 -08:00
Yicheng Qin 8aa89dba3d raft: make if checking match the error in storage.Term 2014-11-25 00:52:13 -08:00
Yicheng Qin 8ee1bf31d6 raft: use IsEmptySnap to check the empty snapshot 2014-11-25 00:37:21 -08:00
Yicheng Qin e466126510 raft: set snapshot to nil when it is saved 2014-11-25 00:30:22 -08:00
Yicheng Qin e17bcd8932 raft: remove wont-fix TODO in ApplyConfChange 2014-11-25 00:10:44 -08:00
Yicheng Qin 85d0e2f130 raft: remove unused raftLog.isOutOfAppliedBounds 2014-11-25 00:07:55 -08:00
Yicheng Qin 1e0f87df8c raft: stricter checking in raftLog.slice 2014-11-25 00:05:00 -08:00
Yicheng Qin 1d01c8aa2d raft: remove unused raftLog.at function 2014-11-24 23:52:28 -08:00
Yicheng Qin 2c06a1d815 raft: not set applied when restore log from snapshot
applied is only updated by application level through Advance.
2014-11-24 23:37:47 -08:00
Yicheng Qin 0d200baf72 raft: refine raftLog.term 2014-11-24 23:27:57 -08:00
Yicheng Qin 7fcaca6d18 raft: simplify raftLog.lastIndex 2014-11-24 23:08:51 -08:00
Yicheng Qin 8670f4012b raft: remove useless line in raftLog.append 2014-11-24 22:42:55 -08:00
Yicheng Qin 239c8dd479 raft: add comment to newLog 2014-11-24 21:47:12 -08:00
Xiang Li 9455119968 raft: always check leader changes in node run loop 2014-11-24 19:07:10 -08:00
Xiang Li 65ad1f6ffd raft: attach Index to Entry in all tests 2014-11-24 17:13:47 -08:00
Xiang Li 10ebf1a335 raft: fix memoryStorage append 2014-11-24 16:36:59 -08:00
Xiang Li 2876c652ab raft: fix for go vet 2014-11-24 15:00:38 -08:00
Xiang Li 62a8df304a raft: fix error message in TestLogRestore 2014-11-24 11:10:02 -08:00
Xiang Li e8afdcfe0a raft: refactor testUnstableEnts 2014-11-24 10:40:38 -08:00
Xiang Li 3dd4c458ca raft: refactor term in log.go 2014-11-24 10:13:56 -08:00
Xiang Li 94190286ff raft: add comment for append in unstableEntries in log.go 2014-11-24 09:05:40 -08:00
Xiang Li 0a46c70f5d raft: use empty slice in unstableEntries in log.go 2014-11-24 09:04:45 -08:00
Xiang Li bc0e72acb9 raft: clean up panic in log.go 2014-11-24 09:01:25 -08:00
Xiang Li f3cef87c69 raft: remove extra empty line in log.go 2014-11-24 08:43:34 -08:00
Xiang Li bdbafe2cf3 raft: use max in log.slice 2014-11-24 08:36:15 -08:00
Ben Darnell 9ddd8ee539 Rename Storage.HardState back to InitialState and include ConfState.
This fixes integration/migration_test.go (and highlights the fact that
we need some more raft-level testing of restoring from snapshots).
2014-11-21 17:22:20 -05:00
Ben Darnell 03c8881e35 Fix TestSlowNodeRestore 2014-11-21 16:40:41 -05:00
Ben Darnell 0d680d0e6b Merge remote-tracking branch 'coreos/master' into merge
* coreos/master:
  rafthttp: fix import
  raft: should not decrease match and next when handling out of order msgAppResp
  Fix migration to allow snapshots to have the right IDs
  add snapshotted integration test
  fix test import loop
  fix import loop, add set to types, and fix comments
  etcdserver: autodetect v0.4 WALs and upgrade them to v0.5 automatically
  wal: add a bench for write entry
  rafthttp: add streaming server and client
  dep: use vendored imports in codegangsta/cli
  dep: bump golang.org/x/net/context

Conflicts:
	etcdserver/server.go
	etcdserver/server_test.go
	migrate/snapshot.go
2014-11-21 15:40:11 -05:00
Xiang Li 063c5c77a0 raft: should not decrease match and next when handling out of order msgAppResp 2014-11-20 17:58:23 -08:00
Brian Waldon 9a728a127a dep: bump golang.org/x/net/context
Move from code.google.com/p/go.net/context to
golang.org/x/net/context before bumping to latest.
2014-11-20 10:19:12 -08:00
Ben Darnell b29240baf0 Merge remote-tracking branch 'coreos/master' into merge
* coreos/master:
  scripts: build-docker tag and use ENTRYPOINT
  scripts: build-release add etcd-migrate
  create .godir
  raft: optimistically increase the next if the follower is already matched
  raft: add handleHeartbeat handleHeartbeat commits to the commit index in the message. It never decreases the commit index of the raft state machine.
  rafthttp: send takes raft message instead of bytes
  *: add rafthttp pkg into test list
  raft: include commitIndex in heartbeat
  rafthttp: move server stats in raftHandler to etcdserver
  *: etcdhttp.raftHandler -> rafthttp.RaftHandler
  etcdserver: rename sender.go -> sendhub.go
  *: etcdserver.sender -> rafthttp.Sender

Conflicts:
	raft/log.go
	raft/raft_paper_test.go
2014-11-19 17:05:16 -05:00
Ben Darnell 355ee4f393 raft: Integrate snapshots into the raft.Storage interface.
Compaction is now treated as an implementation detail of Storage
implementations; Node.Compact() and related functionality have been
removed. Ready.Snapshot is now used only for incoming snapshots.

A return value has been added to ApplyConfChange to allow applications
to track the node information that must be stored in the snapshot.

raftpb.Snapshot has been split into Snapshot and SnapshotMetadata, to
allow the full snapshot data to be read from disk only when needed.

raft.Storage has new methods Snapshot, ApplySnapshot, HardState, and
SetHardState. The Snapshot and HardState parameters have been removed
from RestartNode() and will now be loaded from Storage instead.
The only remaining difference between StartNode and RestartNode is that
the former bootstraps an initial list of Peers.
2014-11-19 16:40:26 -05:00
Xiang Li b50f331558 Merge pull request #1744 from xiang90/next
raft: optimistically increase the next if the follower is already matched
2014-11-19 13:21:11 -08:00
Xiang Li 4c1fd07311 raft: optimistically increase the next if the follower is already matched
This is useful since we want to pipeline the appendEntry requests. Without
enabling optimistic increasing, the second pipelining appendEntry request
will include the entries the first one has already sent out. We decrease
the next directly to match if the leader receives a rejection for a matched
follower. This happens if one pipelining request get lost and following ones
arrives at the follower.
2014-11-18 13:41:38 -08:00
Ben Darnell 46ee58c6f0 raft: Rename ErrSnapshotRequired To ErrCompacted. 2014-11-18 13:15:10 -05:00
Xiang Li bd4cfa2a07 raft: add handleHeartbeat
handleHeartbeat commits to the commit index in the message. It never decreases the
commit index of the raft state machine.
2014-11-18 08:34:06 -08:00
Xiang Li b93d87f17f raft: include commitIndex in heartbeat 2014-11-17 16:19:28 -08:00
Ben Darnell 300c5a2001 Merge remote-tracking branch 'coreos/master' into log-storage-interface
* coreos/master: (21 commits)
  etcdserver: refactor ValidateClusterAndAssignIDs
  integration: add integration test for remove member
  integration: add test for member restart
  version: bump to alpha.3
  etcdserver: add buffer to the sender queue
  *: gracefully stop etcdserver
  Fix up migration tool, add snapshot migration
  etcd4: migration from v0.4 -> v0.5
  etcdserver: export Member.StoreKey
  etcdserver: recover cluster when receiving newer snapshot
  etcdserver: check and select committed entries to apply
  etcdserver: recover from snapshot before applying requests
  raft: not set applied when restored from snapshot
  sender: support elegant stop
  etcdserver: add StopNotify
  etcdserver: fix TestDoProposalStopped test
  etcdserver: minor cleanup
  etcdserver: validate new node is not registered before in best effort
  etcdserver: fix server.Stop()
  *: print out configuration when necessary
  ...

Conflicts:
	etcdserver/server.go
	etcdserver/server_test.go
	raft/log.go
2014-11-17 18:28:24 -05:00
Ben Darnell 64d9bcabf1 Add Storage.Term() method and hide the first entry from other methods.
The first entry in the log is a dummy which is used for matchTerm
but may not have an actual payload. This change permits Storage
implementations to treat this term value specially instead of
storing it as a dummy Entry.

Storage.FirstIndex() no longer includes the term-only entry.

This reverses a recent decision to create entry zero as initially
unstable; Storage implementations are now required to make
Term(0) == 0 and the first unstable entry is now index 1.
stableTo(0) is no longer allowed.
2014-11-17 16:54:12 -05:00
Yicheng Qin 7d0ffb3f12 raft: not set applied when restored from snapshot
applied is only updated by application level through Advance.
2014-11-14 12:08:39 -08:00
Ben Darnell 45e96be605 raft: PR feedback.
Removed Get prefix in method names, added assertions and fixed comments.
2014-11-14 13:53:42 -05:00
Ben Darnell 0e8ffe9128 raft: remove a guard that is no longer necessary 2014-11-13 15:51:36 -05:00
Ben Darnell 39eddd8565 Merge remote-tracking branch 'coreos/master' into log-storage-interface
* coreos/master:
  etcdserver: add sender tests
  raft: Only call stableTo when we have ready entries or a snapshot.
  etcdserver: add ID() function to the Server interface.
  sender: use RoundTripper instead of Client in sender
2014-11-13 15:50:08 -05:00
Ben Darnell 32824e053c raft: Only call stableTo when we have ready entries or a snapshot.
The first Ready after RestartNode (with no snapshot) will have no
unstable entries, so we don't have the correct prevLastUnstablei
when Advance is called. This would cause raftLog.unstable to move
backwards and previously-stable entries would be returned to
the application again.

This should have been caught by the "unexpected Ready" portion of
TestNodeRestart, but it went unnoticed because the Node's goroutine
takes some time to read from advancec and prepare the write to read to
readyc. Added a small (1ms) delay to all such tests to ensure that the
goroutine has time to enter its select wait.
2014-11-13 14:57:01 -05:00
Ben Darnell b29c512f50 Merge remote-tracking branch 'coreos/master' into log-storage-interface
* coreos/master: (27 commits)
  pkg/wait: move wait to pkg/wait
  etcdserver: do not add/remove/update local member to/from sender hub
  etcdserver: not record attributes when add member
  raft: add a test for proposeConfChange
  raft: block Stop() on n.done, support idempotency
  raft: add a test for node proposal
  integration: add increase cluster size test
  integration: remove unnecessary t.Testing argument
  raft: stop the node synchronously
  integration: fix test to propagate NewServer errors
  etcdserver: move peer URLs check to config
  etcdserver: ensure initial-advertise-peer-urls match initial-cluster
  raft: add a test for node.Tick
  raft: add comment string for TestNodeStart
  etcdserver: use member instead of node at etcd level
  raft: nodes return sorted ids
  raft: update unstable when calling stableTo with 0
  *: support updating advertise-peer-url Users might want to update the peerurl of the etcd member in several cases. For example, if the IP address of the physical machine etcd running on is changed, user need to update the adversite-pee-rurl accordingly. This commit makes etcd support updating the advertise-peer-url of its members.
  transport: create a tls listener only if the tlsInfo is not empty and the scheme is HTTPS
  etcdserver: use member pointer for all tests
  ...

Conflicts:
	etcdserver/server.go
	raft/log.go
	raft/log_test.go
	raft/node.go
2014-11-13 14:21:09 -05:00
Xiang Li 04994048bb Merge pull request #1702 from xiang90/node_config_propose
raft: add a test for proposeConfChange
2014-11-12 21:16:54 -08:00
Jonathan Boulle eb66d2b0eb Merge pull request #1699 from jonboulle/node_stop
raft: block Stop() on n.done, support idempotency
2014-11-12 16:26:54 -08:00
Xiang Li 2a407dadc0 raft: add a test for proposeConfChange 2014-11-12 16:16:26 -08:00
Jonathan Boulle 2cedf127d4 raft: block Stop() on n.done, support idempotency 2014-11-12 15:54:45 -08:00
Xiang Li 68ab7e69e1 raft: add a test for node proposal 2014-11-12 15:44:24 -08:00
Ben Darnell 54b07d7974 Remove raft.loadEnts and the ents parameter to raft.RestartNode.
The initial entries are now provided via the Storage interface.
2014-11-12 18:31:19 -05:00
Ben Darnell 147fd614ce The initial term=0 log entry is now initially unstable.
This entry is now persisted through the normal flow instead of appearing
in the stored log at creation time.  This is how things worked before
the Storage interface was introduced. (see coreos/etcd#1689)
2014-11-12 18:24:16 -05:00
Xiang Li b271e88c20 Merge pull request #1696 from xiang90/testnodetick
raft: add a test for node.Tick
2014-11-12 14:38:07 -08:00
Xiang Li d834324e97 raft: stop the node synchronously 2014-11-12 14:06:52 -08:00
Ben Darnell 76a3de9a33 Require a non-nil Storage parameter in newLog.
Callers must in general have a reference to their Storage objects to
transfer entries from Ready to Storage, so it doesn't make sense to
create a hidden Storage for them.

By explicitly creating Storage objects in tests we can remove a
few casts of raftLog's storage field.
2014-11-12 16:38:50 -05:00
Xiang Li 45c36a0808 raft: add a test for node.Tick 2014-11-12 11:51:51 -08:00
Xiang Li fe0325fce7 raft: add comment string for TestNodeStart 2014-11-12 11:40:40 -08:00
Yicheng Qin fb93e3fa00 Merge pull request #1689 from yichengq/219
raft: update unstable when calling stableTo with 0
2014-11-12 10:41:40 -08:00
Xiang Li d494014782 Merge pull request #1679 from xiang90/peerurl
update peer url
2014-11-12 10:21:13 -08:00
Yicheng Qin 78cbb1512c raft: nodes return sorted ids
This makes raft.softState return the same result when its soft state is
not changed.
2014-11-11 22:58:15 -08:00
Yicheng Qin 7dba92dd53 raft: update unstable when calling stableTo with 0
It should update unstable in this case because it may happen that raft
only writes entry 0 into stable storage.
2014-11-11 17:20:31 -08:00
Xiang Li 5967794009 *: support updating advertise-peer-url
Users might want to update the peerurl of the etcd member in several cases.
For example, if the IP address of the physical machine etcd running on is
changed, user need to update the adversite-pee-rurl accordingly.
This commit makes etcd support updating the advertise-peer-url of its members.
2014-11-11 12:07:03 -08:00
Xiang Li f64963de88 raftpb: fix proto 2014-11-10 17:05:30 -08:00
Ben Darnell 25b6590547 raft: introduce log storage interface.
This change splits the raftLog.entries array into an in-memory
"unstable" list and a pluggable interface for retrieving entries that
have been persisted to disk. An in-memory implementation of this
interface is provided which behaves the same as the old version;
in a future commit etcdserver could replace the MemoryStorage with
one backed by the WAL.
2014-11-10 17:40:39 -05:00
Ben Darnell 21987c8701 raft: remove raftLog.resetUnstable and resetNextEnts
These methods are no longer used outside of tests and are redundant with
the new stableTo and appliedTo methods.
2014-11-06 17:18:00 -05:00
Xiang Li 3fc6f9c24f Merge pull request #1586 from xiangli-cmu/fix_node
*: add Advance interface to raft.Node
2014-11-05 15:09:51 -08:00
Xiang Li 0d7c43d885 *: add a Advance interface to raft.Node
Node set the applied to committed right after it sends out Ready to application. This is not
correct since the application has not actually applied the entries at that point. We add a
Advance interface to Node. Application needs to call Advance to tell raft Node its progress.
Also this change can avoid unnecessary copying when application is still applying entires but
there are more entries to be applied.
2014-11-05 15:04:14 -08:00
Jonathan Boulle aa5711bd0f Merge pull request #1595 from jonboulle/header
*: add copyright header to remaining files
2014-11-03 23:42:14 -08:00
Jonathan Boulle f7434b55e5 *: add copyright header to remaining files 2014-11-03 23:29:15 -08:00
Xiang Li 5ead800ff5 Merge pull request #1572 from xiangli-cmu/raft_test
raft: add paper tests for section 5.4.1
2014-11-03 22:37:26 -08:00
Xiang Li 165ac654e8 raft: add paper tests for section 5.4.1 2014-11-03 15:50:56 -08:00
Xiang Li dbdeceda7b raft: do not load empty state and ents 2014-11-03 15:16:41 -08:00
Yicheng Qin 5bdf6a4110 Merge pull request #1528 from unihorn/191
raft: add tests based on section 5.3 in raft paper
2014-10-31 16:35:36 -07:00
Yicheng Qin 421d5fbe72 raft: add tests based on section 5.3 in raft paper 2014-10-31 16:32:34 -07:00
Xiang Li 816c173edf Merge pull request #1526 from xiangli-cmu/leader_log
raft: better logging for leader transition
2014-10-30 10:13:58 -07:00
Jonathan Boulle b99633207c raft: minor cleanup in comments 2014-10-30 10:01:44 -07:00
Xiang Li 46ebf69c02 raft: better logging for leader transition 2014-10-30 09:33:38 -07:00
Jonathan Boulle 0cf0cb3d02 raft: add tests for progress.maybeDecr 2014-10-29 22:57:25 -07:00
Yicheng Qin ccca32b138 Merge pull request #1497 from unihorn/189
raft: add tests based on section 5.1 in raft paper
2014-10-29 16:35:25 -07:00
Yicheng Qin dabb5c150d raft: add tests based on section 5.1 in raft paper 2014-10-29 16:22:17 -07:00
Xiang Li 738da2b3fa raft: fix a incorrect in testMaybeAppend 2014-10-29 14:57:39 -07:00
Xiang Li 6375bd7960 Merge pull request #1469 from xiangli-cmu/raft_log_test
raft: add tests for maybeappend
2014-10-29 14:36:07 -07:00
Xiang Li 14f4163e41 raft: add several test cases for testMaybeAppend 2014-10-29 14:35:13 -07:00
Jonathan Boulle 81304b2b7e raft: move test helper function into tests 2014-10-29 14:04:45 -07:00
Xiang Li ad1718a3e5 raft: add a test case for testLogMaybeAppend 2014-10-29 11:44:18 -07:00
Yicheng Qin 35bba87d2a Merge pull request #1471 from unihorn/189
raft: add tests based on section 5.2 in raft paper
2014-10-29 10:49:23 -07:00
Yicheng Qin bffe611fe6 raft: add tests based on section 5.2 in raft paper 2014-10-29 10:17:09 -07:00
Xiang Li c6873c1eab raft: add tests for maybeappend 2014-10-28 15:07:49 -07:00
Brian Waldon 6796669484 raft: stop logging IDs with 0x prefix 2014-10-27 18:56:13 -07:00
Xiang Li 74c257f63d Merge pull request #1419 from xiangli-cmu/raft_log_test
raft: add test for findConflict
2014-10-27 14:30:36 -07:00
Xiang Li 460d6490ba raft: address issues in comments 2014-10-27 14:20:42 -07:00
Yicheng Qin b986a52579 raft: use raft-specific rand.Rand instead of global one 2014-10-27 12:32:11 -07:00
Jonathan Boulle 6e6d1897d8 pkg: move everything into subpackages 2014-10-27 09:57:28 -07:00
Xiang Li 94f701cf95 raft: refactor isUpToDate and add a test 2014-10-25 20:34:14 -07:00
Xiang Li 8cd95e916d raft: comments for isUpToDate 2014-10-25 20:12:54 -07:00
Xiang Li 86c66cd802 raft: remove unused code 2014-10-25 19:56:13 -07:00
Xiang Li 90f26e4a56 raft: add test for findConflict 2014-10-25 18:58:11 -07:00
Xiang Li 507300130b raft: add tests for ignoring heartbeat reply 2014-10-24 11:50:21 -07:00
Soheil Hassas Yeganeh 09e9618b02 raft: change raftLog.maybeAppend to return the last new index
As per @unihorn's comment on #1366, we change raftLog.maybeAppend to
return the last new index of entries in maybeAppend.
2014-10-23 15:42:47 -04:00
Soheil Hassas Yeganeh 233617bea2 raft: Make MsgAppRes ack only the last index in MsgApp
As explained in #1366, the leader will fail to transmit the missed
logs if the leader receives a hearbeat response from a follower
that is not yet matched in the leader. In other words, there are
append responses that do not explicitly reject an append but
implied a gap.

This commit is based on @xiangli-cmu's idea. We should only acknowledge
upto the index of logs in the append message. This way responses to
heartbeats would never interfer with the log synchronization because
their log index is always 0.

Fixes #1366
2014-10-23 14:56:17 -04:00
Xiang Li 48c4145f1b raft: fix node bench 2014-10-21 12:46:39 -07:00
Xiang Li a9984fda4f Merge pull request #1102 from coreos/node_bench
raft: add a one node bench
2014-10-21 11:44:46 -07:00
Xiang Li 50d4abc676 raft: add a one node bench 2014-10-21 11:43:55 -07:00
Xiang Li a44849deec Merge pull request #1286 from coreos/clusterid
*: generate clusterid
2014-10-20 19:07:03 -07:00
Yicheng Qin e200d2a8e2 etcdserver/raft: remove msgDenied, removedNodes, shouldStop
The future plan is to do all these in etcdserver level.
2014-10-20 15:13:18 -07:00
Xiang Li ea6bcacfe4 *: generate clusterid 2014-10-20 15:00:54 -07:00
Jonathan Boulle 7a4d42166b *: add license header to all source files 2014-10-17 15:41:22 -07:00
Jonathan Boulle fc42bdb904 raft: remove unused compactThreshold 2014-10-16 17:11:10 -07:00
Jonathan Boulle 8168fed825 etcdserver: add ServerStats and LeaderStats
This adds the remaining two stats endpoints: `/v2/stats/self`, for
various statistics on the EtcdServer, and `/v2/stats/leader`, for
statistics on a leader's followers.

By and large most of the stats code is copied across from 0.4.x, updated
where necessary to integrate with the new decoupling of raft from
transport.

This does not satisfactorily resolve the question of name vs ID. In the
old world, names were unique in the cluster and transmitted over the
wire, so they could be used safely in all statistics. In the new world,
a given EtcdServer only knows its own name, and it is instead IDs that
are communicated among the cluster members. Hence in most places here we
simply substitute a string-encoded ID in place of name, and only where
possible do we retain the actual given name of the EtcdServer.
2014-10-16 10:43:44 -07:00
Yicheng Qin 8cd6030a1d etcdserver: add checking when apply conf change 2014-10-16 09:49:26 -07:00
Yicheng Qin eb2dd1892f raft: add RemovedNodes to SoftState 2014-10-15 10:53:07 -07:00
Jonathan Boulle 4183b69e12 *: move from third_party to Godep 2014-10-14 00:37:52 -07:00
Xiang Li d7dfe07e5d Merge pull request #1293 from unihorn/160
raft: protobuf messageType
2014-10-14 09:16:38 +08:00
Yicheng Qin f693c6ddf2 etcdserver: apply bootstrap conf change 2014-10-13 11:22:23 -07:00
Yicheng Qin 0319b033ea etcdserver/raft: set context for bootstrap addnode entries 2014-10-13 11:22:23 -07:00
Yicheng Qin 32c38820c1 raft: protobuf messageType 2014-10-13 11:13:43 -07:00
Yicheng Qin 447caf1afc etcdserver/wal: record info at the head of WAL file 2014-10-10 11:57:09 -07:00
Xiang Li 1eb1020717 raft: fix raft test 2014-10-09 14:42:29 +08:00
Xiang Li 8bbbaa88b2 *: raft related int64 -> uint64 2014-10-09 14:29:21 +08:00
Xiang Li af5b8c6c44 raft: int64 -> uint64 2014-10-09 14:26:43 +08:00
Xiang Li 38af14b0f4 Merge pull request #1260 from coreos/snap_rm
raft: save removed nodes in snapshot
2014-10-09 08:00:33 +08:00
Xiang Li abe97e49d5 raft: more comment 2014-10-09 07:02:05 +08:00
Xiang Li 73f2aaf98f raft: removedSlice -> removedNodes 2014-10-09 06:55:25 +08:00
Xiang Li c67fd14fe8 Merge pull request #1257 from bdarnell/cleanups
Raft: assorted cleanups (golint and go vet)
2014-10-09 05:55:21 +08:00
Ben Darnell d2e858587f Raft: a few more improvements to test messages. 2014-10-08 15:07:11 -04:00
Xiang Li 7b61565c0a raft: save removed nodes in snapshot 2014-10-08 15:33:55 +08:00
Xiang Li 1cd3345e00 raft: address issues with election timeout 2014-10-08 07:41:17 +08:00
Ben Darnell 1083ce8f73 raft: remove misleading labels in array definition
Since these are arrays instead of maps, the "keys" here are actually
(useless) goto labels. What really matters is that the ordering is
the same between the constant declarations and the array.
2014-10-07 18:44:06 -04:00
Ben Darnell 36558b1924 Raft: fix printf strings found by go vet. 2014-10-07 18:44:06 -04:00
Ben Darnell 3ad0df3722 Raft: fix problems reported by golint. 2014-10-07 18:44:06 -04:00
Xiang Li f65d117462 raft: add a test for randElectionTimeout 2014-10-07 20:34:15 +08:00
Xiang Li d7d6f84f64 raft: rand election timeout 2014-10-07 20:12:49 +08:00
Xiang Li 45e4a8643a raft: add tests for raft.compact 2014-10-07 16:03:11 +08:00
Xiang Li 7fe4385ef9 raft: add comment for Compact interface of Node 2014-10-07 16:03:11 +08:00
Xiang Li 5587e0d73f raft: compact takes index and nodes parameters
Before this commit, compact always compact log at current appliedindex of raft.
This prevents us from doing non-blocking snapshot since we have to make snapshot
and compact atomically. To prepare for non-blocking snapshot, this commit make
compact supports index and nodes parameters. After completing snapshot, the applier
should call compact with the snapshot index and the nodes at snapshot index to do
a compaction at snapsohot index.
2014-10-07 16:03:11 +08:00
Yicheng Qin 45ebfb4217 raft: refine initial entries logic in StartNode 2014-10-06 16:06:01 -07:00
Yicheng Qin 314d425718 main/raft: write addNode ConfChange entries in log when start raft 2014-10-06 14:33:12 -07:00
Xiang Li dc9cb4b4ba raft: fix send
send should not attach current term to msgProp. Send should simply do proxy for msgProp without
changing its term. msgProp has a special term 0, which indicates that it is a local message.
2014-10-06 04:48:35 +08:00
Xiang Li 01ecc60a88 Merge pull request #1203 from coreos/fix_raft
raft: commitIndex=min(leaderCommit, index of last new entry)
2014-10-03 22:24:07 +08:00
Xiang Li 172bd7d096 raft: add test for maybeappend change 2014-10-03 22:21:35 +08:00
Xiang Li 70bf464cd6 raft: add comment to decrTo 2014-10-03 13:42:34 +08:00
Xiang Li 16ba77767e raft: do not decrease nextIndex and send entries for stale reply 2014-10-03 13:41:27 +08:00
Yicheng Qin 8490904f20 Merge pull request #1224 from unihorn/149
raft: msg.Denied -> msg.Reject
2014-10-02 12:24:09 -07:00
Yicheng Qin fff918c672 raft: msg.Denied -> msg.Reject
Change the field name because it has msgDenied already.
2014-10-02 12:22:12 -07:00
Yicheng Qin 182c8316e1 raft: refine comment for doc and removed list tests 2014-10-01 14:57:39 -07:00
Yicheng Qin e4a6c9651a raft: add removed
The usage of removed:
1. tell removed node about its removal explicitly using msgDenied
2. prevent removed node disrupt cluster progress by launching leader election

It is set when apply node removal, or receive msgDenied.
2014-10-01 14:57:38 -07:00
Xiang Li ce70e63cc6 Merge pull request #1200 from coreos/raft_heartbeat
raft: heartbeat is only response for maintaining leader dominance
2014-09-29 17:00:26 -07:00
Xiang Li d7b4e44a66 raft: heartbeat is only response for maintaining leader dominance 2014-09-29 16:57:43 -07:00
Xiang Li b3c1bd5616 raft: commitIndex=min(leaderCommit, index of last new entry) 2014-09-29 14:38:17 -07:00
Yicheng Qin 0e8345aa73 Merge pull request #1143 from unihorn/136
*: Id -> ID for protobuf types
2014-09-29 13:58:02 -07:00
Xiang Li e26ff32fd8 raft: fix error msg 2014-09-28 21:17:51 -07:00
Xiang Li 51529cc3f2 raft: remove index field in msg AppResp 2014-09-28 21:13:53 -07:00
Xiang Li adefd83855 raft: remove index field in msg voteResp 2014-09-28 21:13:43 -07:00
Xiang Li 86473d8a27 raft: add msg denied field 2014-09-28 21:13:33 -07:00
Yicheng Qin 1d5d2e3726 *: Id -> ID for protobuf types
We use ID instead of Id in this project based on golang conventions.
2014-09-26 11:49:30 -07:00
Xiang Li 45f71af33e pkg: move testutil to pkg 2014-09-25 10:40:40 -07:00
Jonathan Boulle a45d490598 Merge pull request #1146 from jonboulle/1146_protobuf
script protobuf generation
2014-09-24 14:34:49 -07:00
Jonathan Boulle c8c55aa378 scripts: consolidate and standardize protobuf generation 2014-09-24 13:45:00 -07:00
Yicheng Qin 1ca03d8e9d raft: move logic to separate func 2014-09-24 10:23:44 -07:00
Yicheng Qin b07be74a82 raft: stop tickElection when the node is not in peer list
This prevents the bug like this:
1. a node sends join to a cluster and succeeds
2. it starts with empty peers and waits for sync, but it have not
received anything
3. election timeout passes, and it promotes itself to leader
4. it commits some log entry
5. its log conflicts with the cluster's
2014-09-23 23:15:02 -07:00
Xiang Li 03152004d7 Merge pull request #1145 from coreos/fix_panic
raft: node ignores unexpected local messages receiving from network
2014-09-23 14:11:56 -07:00
Xiang Li 25c2768b8f raft: node ignores unexpected local messages receiving from network 2014-09-23 13:50:43 -07:00
Yicheng Qin bc7b0108dc raft: ConfigChange -> ConfChange 2014-09-23 12:02:44 -07:00
Yicheng Qin d92931853e raft: Config -> ConfigChange
Configure -> ProposeConfigChange
AddNode, RemoveNode -> ApplyConfigChange
2014-09-22 23:39:53 -07:00
Yicheng Qin ec8f493fde raft: refine comments for Configure 2014-09-22 15:44:47 -07:00
Yicheng Qin dc36ae7058 raft: use pb.Config instead of []byte for Configure 2014-09-22 15:44:47 -07:00
Yicheng Qin b82d70871f raft: use EntryType in protobuf 2014-09-22 15:44:46 -07:00
Yicheng Qin b801f1affe raft: refine comment for raft.pendingConf 2014-09-22 15:44:46 -07:00
Yicheng Qin ff6705b94b raft: add Configure, AddNode, RemoveNode
Configure is used to propose config change. AddNode and RemoveNode is
used to apply cluster change to raft state machine. They are the
basics for dynamic configuration.
2014-09-22 15:43:13 -07:00
Yicheng Qin f2ebd64a1b *: add testutil pkg 2014-09-19 14:32:38 -07:00
Jonathan Boulle b66a40495d raft: introduce Node interface 2014-09-17 14:18:56 -07:00
Xiang Li ab61a8aa9a *: init for on disk snap support 2014-09-17 13:56:12 -07:00
Yicheng Qin de21c39ca5 raft: isStateEqual -> isHardStateEqual, IsEmptyState -> IsEmptyHardState 2014-09-16 13:55:00 -07:00
Yicheng Qin 023dc7cba2 etcdserver: add SYNC request 2014-09-16 13:42:03 -07:00
Yicheng Qin cc8d8f2102 raft: remove unused raftpb.LastIndex 2014-09-15 14:34:23 -07:00
Yicheng Qin 9607665323 raft: remove unused return 2014-09-15 13:22:21 -07:00
Yicheng Qin 9bf2c2ed9d Merge pull request #1052 from unihorn/121
server: add unit tests
2014-09-15 13:20:50 -07:00
Yicheng Qin 6cd4434ff3 server: add unit tests
Make test coverage >= 90%
2014-09-15 13:16:48 -07:00
Xiang Li f9c12e2053 Merge pull request #1075 from coreos/fix_heartbeat
raft: fix heartbeat
2014-09-15 10:04:12 -07:00