Yicheng Qin
fa96e64b43
Merge pull request #2624 from yichengq/fix-raft-storage
...
raft: lock storage when compact it
2015-04-03 13:51:06 -07:00
Yicheng Qin
3d32c059dd
raft: generate correct json-format status
...
The current JSON-format string is missing the double quotes around the status field.
Use %q so the value is quoted correctly.
2015-04-03 13:49:46 -07:00
Yicheng Qin
d91ea7f199
raft: fix freeTo fails to free
...
If freeTo is called when to is set to the latest inflight, freeTo
fails to free the slots.
2015-04-03 13:21:26 -07:00
Yicheng Qin
c6de464587
raft: lock storage when compact it
...
etcd now compacts raft storage asynchronously, so appending entries to raft
storage may happen at the same time. Add a lock to fix the bug where
the entries saved in storage could be organized incorrectly.
2015-04-03 11:38:01 -07:00
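The race described in c6de464587 above can be illustrated with a simplified in-memory store. This is a sketch under stated assumptions: the `storage` type, its single-entry `Append`, and the `contiguous` check are hypothetical stand-ins, not the real MemoryStorage API. The point is only that append and compact take the same mutex, so neither sees the entry slice half-modified.

```go
package main

import (
	"fmt"
	"sync"
)

type entry struct{ Index, Term uint64 }

// storage is a hypothetical, simplified log store.
// ents[0] is a dummy entry marking the compaction point.
type storage struct {
	mu   sync.Mutex
	ents []entry
}

func newStorage() *storage { return &storage{ents: []entry{{}}} }

// Append adds one entry; callers are assumed to append consecutive indexes.
func (s *storage) Append(e entry) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.ents = append(s.ents, e)
}

// Compact discards entries before index i, keeping the entry at i as the
// new dummy. Out-of-range requests are ignored.
func (s *storage) Compact(i uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	first, last := s.ents[0].Index, s.ents[len(s.ents)-1].Index
	if i <= first || i > last {
		return
	}
	// Copy so the compacted prefix can be garbage collected.
	s.ents = append([]entry(nil), s.ents[i-first:]...)
}

// contiguous reports whether the stored indexes have no gaps.
func (s *storage) contiguous() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	for k := 1; k < len(s.ents); k++ {
		if s.ents[k].Index != s.ents[k-1].Index+1 {
			return false
		}
	}
	return true
}

func main() {
	s := newStorage()
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // appender
		defer wg.Done()
		for i := uint64(1); i <= 1000; i++ {
			s.Append(entry{Index: i, Term: 1})
		}
	}()
	go func() { // asynchronous compactor
		defer wg.Done()
		for i := uint64(100); i <= 1000; i += 100 {
			s.Compact(i)
		}
	}()
	wg.Wait()
	fmt.Println(s.contiguous())
}
```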
Xiang Li
3f867bc6ed
raft: node bench matches reality
2015-03-28 14:53:42 -07:00
Xiang Li
05e240b892
*: update protobuf
2015-03-25 10:14:35 -07:00
Ben Darnell
c9d507df11
raft: Use raft.Config in MultiNode.
2015-03-24 15:37:13 -04:00
Xiang Li
b3fb052ad4
raft: make peers a private field in raft.Config
2015-03-24 11:10:07 -07:00
Xiang Li
abddef0f28
raft: make node configurable
2015-03-23 21:20:49 -07:00
Brandon Philips
057978bbc6
raft: design: fixup markdown
...
Need a space between `1.` for markdown to render as a list.
2015-03-23 14:01:17 -07:00
Xiang Li
d9b5b56c82
raft: make raft configurable
2015-03-23 09:55:19 -07:00
Xiang Li
a552722f03
Merge pull request #2544 from xiang90/raft-inflight
...
raft: add flow control for progress
2015-03-20 20:12:31 -07:00
Xiang Li
4a64373225
raft: add flow control for progress
...
Each progress has an inflights sliding window. When the progress
is in replicate state, inflights will control the sending speed
of the leader.
The leader can have at most maxInflight number of inflight
messages for each replicate progress. Receiving an appResp moves
the sliding window forward. A heartbeat response frees one
slot if the window is full.
2015-03-20 20:04:33 -07:00
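The sliding window described in 4a64373225 above can be sketched as a small ring buffer. This is an illustrative reconstruction, not the committed code: the type and method names are assumptions, and the window size is assumed to be greater than zero. It also covers the edge case fixed later in d91ea7f199, where freeing up to the latest inflight index must release every slot.

```go
package main

import "fmt"

// inflights is a fixed-size sliding window of in-flight message indexes.
type inflights struct {
	start  int      // position of the oldest in-flight index in buffer
	count  int      // number of occupied slots
	buffer []uint64 // ring buffer; len(buffer) plays the role of maxInflight
}

func newInflights(size int) *inflights {
	return &inflights{buffer: make([]uint64, size)}
}

func (in *inflights) full() bool { return in.count == len(in.buffer) }

// add records a newly sent index; it returns false when the window is full,
// which is the signal for the leader to stop sending.
func (in *inflights) add(index uint64) bool {
	if in.full() {
		return false
	}
	in.buffer[(in.start+in.count)%len(in.buffer)] = index
	in.count++
	return true
}

// freeTo releases every slot whose index is <= to (an appResp arrived).
func (in *inflights) freeTo(to uint64) {
	if in.count == 0 {
		return
	}
	freed := 0
	for freed < in.count && in.buffer[(in.start+freed)%len(in.buffer)] <= to {
		freed++
	}
	in.start = (in.start + freed) % len(in.buffer)
	in.count -= freed
}

// freeFirstOne releases the oldest slot (a heartbeat response arrived
// while the window was full).
func (in *inflights) freeFirstOne() {
	if in.count > 0 {
		in.freeTo(in.buffer[in.start])
	}
}

func main() {
	w := newInflights(3)
	for i := uint64(1); i <= 4; i++ {
		fmt.Println("send", i, "accepted:", w.add(i))
	}
	w.freeTo(3) // acknowledging the latest inflight frees the whole window
	fmt.Println("full after freeTo(3):", w.full())
}
```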
Xiang Li
09a86cb9b9
Merge pull request #2553 from xiang90/raft-design
...
raft: add progress state machine graph
2015-03-20 19:57:51 -07:00
Xiang Li
86622537a1
raft: add progress state machine graph
2015-03-20 15:28:50 -07:00
Xiang Li
44d9209990
Merge pull request #2548 from xiang90/raft-design
...
raft: add our very first design.md
2015-03-20 09:07:44 -07:00
Yicheng Qin
6e557c58c7
Merge pull request #2532 from yichengq/342
...
raft: print out data and time in log
2015-03-20 08:03:23 -07:00
Xiang Li
59d8089295
raft: add our very first design.md
2015-03-19 21:00:47 -07:00
Xiang Li
2adb58f9de
raft: move progress to progress.go
2015-03-19 10:05:04 -07:00
Xiang Li
7571b2cde2
raft: limit the size of msgApp
...
Limit the maximum size of entries sent per message.
This lowers the cost in the probing state, since the size per message is limited,
and lowers the penalty when next is aggressively decreased to too low a value.
2015-03-18 15:59:30 -07:00
Yicheng Qin
0634cf2cfe
raft: print out data and time in log
...
Keep the default log setting consistent with other packages.
2015-03-18 15:49:06 -07:00
Yicheng Qin
7e7bc76038
Merge pull request #2514 from yichengq/340
...
raft: introduce progress states
2015-03-18 09:40:30 -07:00
Yicheng Qin
67194c0b22
raft: introduce progress states
2015-03-18 08:16:32 -07:00
Xiang Li
d17f3a4452
Merge pull request #2519 from bdarnell/multinode-commit
...
raft: Use the correct commit index when advancing in MultiNode.
2015-03-17 10:31:53 -07:00
Ben Darnell
cd1ff78ff3
raft: Elaborate a little more about committed entries in commitReady.
2015-03-17 13:22:36 -04:00
funkygao
0b912c0faf
raft: fix godoc about starting a node
2015-03-17 17:35:18 +08:00
Ben Darnell
271d911c32
raft: Use the correct commit index when advancing in MultiNode.
...
This fixes an issue when restoring from a snapshot and brings
MultiNode closer to Node.
2015-03-16 18:40:51 -04:00
Ben Darnell
5e19adcf70
raft: correctly pass arguments to Logger.Panicf()
2015-03-12 16:15:43 -04:00
Iago López Galeiras
e698192e4a
rafttest: fix build error
...
raftLogger is not exported so we can't access it from here. Go back to
using log.
2015-03-12 11:47:13 +01:00
Xiang Li
39731724ff
Merge pull request #2485 from yichengq/337
...
raft: fall back to bad path when unreachable
2015-03-11 14:16:39 -07:00
Yicheng Qin
be0bf2a2bd
raft: fall back to bad path when unreachable
2015-03-11 13:21:23 -07:00
Xiang Li
c643967a41
raft: reply with the commit index when receives a smaller append message
...
A follower should not reject an append message with a smaller index than its commit
index. Otherwise it will trigger the leader's resending logic, which might have a high cost.
2015-03-10 22:32:36 -07:00
Xiang Li
a2be25cba4
Merge pull request #2460 from xiang90/raft-logger
...
raft: introduce logger interface
2015-03-09 08:00:21 -07:00
Xiang Li
97579e2e1d
raft: introduce logger interface
2015-03-08 21:36:32 -07:00
Xiang Li
7fe608532a
raft: do not reset vote if term is not changed
...
raft MUST keep the voting information for the same term. reset
should not reset vote if term is not changed.
2015-03-07 22:31:20 -08:00
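The rule stated in 7fe608532a above can be sketched in a few lines. The `raft` struct here is a hypothetical two-field stand-in (the real reset touches more state); it only shows that the vote is cleared when, and only when, the term actually changes.

```go
package main

import "fmt"

// none marks "no vote cast"; the name is an assumption for this sketch.
const none uint64 = 0

// raft is a hypothetical stand-in holding only the fields relevant here.
type raft struct {
	term uint64
	vote uint64
}

// reset moves to the given term. The vote is cleared only if the term
// changes, so a node can never vote twice in the same term.
func (r *raft) reset(term uint64) {
	if r.term != term {
		r.term = term
		r.vote = none
	}
}

func main() {
	r := &raft{term: 2, vote: 3}
	r.reset(2)
	fmt.Println("same term, vote kept:", r.vote)
	r.reset(3)
	fmt.Println("new term, vote cleared:", r.vote)
}
```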
Ben Darnell
725c411346
Add ReportUnreachable and ReportSnapshot to MultiNode.
...
Add ReportSnapshot requirement to doc.go.
2015-03-05 12:39:52 -05:00
Xiang Li
6b9b695167
Merge pull request #2435 from bdarnell/multinode
...
raft: Introduce MultiNode.
2015-03-04 21:27:20 -08:00
Ben Darnell
c824c867ec
raft: more doc updates.
...
Including parallelism of persist and send, cancellation of
ConfChanges, and the risks of two-node clusters.
2015-03-04 15:48:35 -05:00
Ben Darnell
4e74d81bbb
raft: Introduce MultiNode.
...
MultiNode is an alternative to raft.Node that is more efficient
when a node may participate in many consensus groups. It is currently
used in the CockroachDB project; this commit merges the
github.com/cockroachdb/etcd fork back into the mainline.
2015-03-04 15:30:21 -05:00
Ben Darnell
250970cc23
raft: Expand doc.go
...
Includes more details on the required caller behavior and the safety of
membership changes.
Closes #2397
2015-03-04 13:18:02 -05:00
Yicheng Qin
b4b9b9118a
rafthttp: report MsgSnap status
2015-03-02 09:38:11 -08:00
Yicheng Qin
09f181f585
raft: log unreachable remote node
2015-03-01 16:47:49 -08:00
Yicheng Qin
fbd5c81139
raft: remove shadowing of variables from test
2015-02-28 12:09:33 -08:00
Xiang Li
9b4d52ee73
raft: do not resend snapshot if not necessary
...
raft relies on the link layer to report the status of the sent snapshot.
If the snapshot is still sending, the replication to that remote peer will
be paused. If the snapshot finishes sending, the replication will begin
optimistically after electionTimeout. If the snapshot fails, raft will
try to resend it.
2015-02-28 11:41:58 -08:00
Xiang Li
2185ac5ac8
raft: cleanup unreachable
2015-02-28 11:35:16 -08:00
Xiang Li
2af33fd494
raft: add reportUnreachable
2015-02-28 10:45:22 -08:00
Xiang Li
cbef6ab152
raft: clean up storage
2015-02-28 10:09:07 -08:00
Xiang Li
5ede18be74
raft: separate compact and createsnap in memory storage
2015-02-28 10:08:30 -08:00
Ben Darnell
b53dc0826e
Only use the EntryFormatter for normal entries.
...
ConfChange entries also have a Data field but the application-supplied
formatter won't know what to do with them.
2015-02-20 13:51:14 -05:00
Barak Michener
92dca0af0f
*: remove shadowing of variables from etcd and add travis test
...
We've been bitten by this enough times that I wrote a tool so that
it never happens again.
2015-02-17 16:31:42 -05:00
Xiang Li
fa66055f66
rafttest: drop isPaused
2015-02-09 18:52:34 -08:00
Xiang Li
085b608de9
rafttest: support node pause
2015-02-09 16:26:43 -08:00
Xiang Li
279b216f9a
rafttest: wait for network sending
2015-02-09 15:52:16 -08:00
Xiang Li
65cd0051fe
rafttest: add network delay
2015-02-06 15:01:07 -08:00
Xiang Li
d423946fa4
rafttest: add network drop
2015-02-06 10:50:55 -08:00
Xiang Li
83edf0d862
rafttest: separate network interface and network
2015-02-03 22:50:27 -08:00
Xiang Li
b147a6328d
rafttest: add restart and related simple test
2015-02-03 10:08:52 -08:00
Xiang Li
d65af21b73
raft: add raft test suite
2015-02-01 14:53:22 -08:00
Xiang Li
bff2ccaa22
Merge pull request #2170 from xiang90/remove_log
...
raft: remove default verbose logging
2015-01-27 15:58:53 -08:00
Xiang Li
553379e82b
raft: remove default verbose logging
2015-01-27 15:57:44 -08:00
Ben Darnell
33d2400063
raft: Send any waiting appends after receiving MsgAppResp.
...
This addresses a problem that comes up in the cockroach tests,
in which the order of messages may lead to deadlocks (due to
the fact that we don't have regular heartbeat timers in most
of our tests).
2015-01-27 17:43:29 -05:00
Xiang Li
276c9540b4
etcdserver: support raft.status
2015-01-26 16:39:33 -08:00
Jonathan Boulle
f1ed69e883
*: switch to line comments for copyright
...
Build tags are not compatible with block comments.
Also adds copyright header to a few places it was missing.
2015-01-26 09:53:30 -08:00
Ben Darnell
8c3a6508e9
raft: Add applied to the newRaft log message.
2015-01-22 12:04:40 -05:00
Ben Darnell
59214978a2
raft: Add applied index as an argument to newRaft and RestartNode.
2015-01-22 11:38:05 -05:00
Ben Darnell
cd9d5573d4
raft: make EntryFormatter less clever.
2015-01-21 19:27:26 -05:00
Ben Darnell
e73d442e32
raft: Add support for custom formatters in DescribeMessage/DescribeEntry
2015-01-21 14:12:58 -05:00
Xiang Li
003b97a60f
raft: public progress struct in raft
2015-01-20 10:26:22 -08:00
Xiang Li
b34936b097
raft: add progress into status
2015-01-18 15:23:50 -08:00
Xiang Li
0eaaad0e48
raft: add Status interface
...
Status returns the current status of raft state machine.
2015-01-16 14:02:04 -08:00
Ben Darnell
2e1c36cdd9
raft: introduce MsgHeartbeatResp.
...
Now that heartbeats are distinct from MsgApp{,Resp}, the retries
currently performed in stepLeader's MsgAppResp section are only
performed on an actual MsgAppResp (or a new MsgProp). This means
that it may take a long time to recover from a dropped MsgAppResp
in a quiet cluster.
This commit adds a dedicated heartbeat response message. This message
does not convey the follower's current log position because the
MsgHeartbeat does not include the leader's term and index. Upon receipt
of a heartbeat response, the leader may retry the latest MsgApp if it
believes the follower to be behind.
2015-01-14 17:34:10 -05:00
Ben Darnell
9972e62d94
raft: Use <= instead of < for heartbeat ticks.
...
In code outside the raft package, we cannot call raft.bcastHeartbeat
directly. Instead, to control heartbeats we set heartbeatInterval to 1
and call Tick().
2015-01-14 15:27:32 -05:00
Yicheng Qin
7a2fa39e52
Merge pull request #2012 from andybons/master
...
raft: add link to the paper raft_paper_test.go refers to
2015-01-06 00:27:47 -08:00
Xiang Li
2a83e350b1
Merge pull request #1992 from xiang90/rm_leader
...
*: support removing the leader from a 2 members cluster
2015-01-02 14:15:12 -08:00
Xiang Li
35b907ac58
raft: add lastIndex as rejectHint
...
Add the lastindex of the raft log as reject hint, so the leader can
bypass the greater index probing and decrease the next index directly
to last + 1.
2015-01-01 19:04:07 -08:00
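To illustrate the reject hint from 35b907ac58 above: when a follower rejects an append at index `rejected` and reports its last log index as the hint, the leader can jump its next index straight to `hint + 1` instead of probing downward one index at a time. The function name and signature below are assumptions for this sketch.

```go
package main

import "fmt"

// nextAfterReject returns the leader's new next index for a follower that
// rejected an append at index rejected and reported hint as its last index.
// The result is min(rejected, hint+1), but never below 1.
func nextAfterReject(rejected, hint uint64) uint64 {
	next := rejected
	if hint+1 < next {
		next = hint + 1
	}
	if next < 1 {
		next = 1
	}
	return next
}

func main() {
	// The follower's log ends at 10 while the leader probed at 100:
	// skip directly to 11 rather than trying 99, 98, ...
	fmt.Println(nextAfterReject(100, 10))
}
```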
Xiang Li
152676f43a
*: support removing the leader from a 2 members cluster
2014-12-29 11:34:33 -08:00
Andrew Bonventre
4463f5c4b3
raft: add link to the paper raft_paper_tests.go refers to
2014-12-29 14:17:48 -05:00
Xiang Li
fc96a9e4a7
raft: remove unnecessary funcs in raft.go
2014-12-25 17:04:33 -08:00
Xiang Li
2dbdf87f86
raft: add doc for storage
2014-12-22 12:33:14 -08:00
Xiang Li
896bac1f76
raft: flush the commit to fix a race in test
2014-12-18 17:10:37 -08:00
Xiang Li
88767d913d
raft: leader waits for the reply of previous message when follower is not in good path.
...
It is reasonable for the leader to wait for the reply before sending out the next
msgApp or msgSnap to a follower in the bad path. Otherwise the leader will send out useless
messages if the previous message is rejected or the previous message is a snapshot.
Especially in the snapshot case, the leader is certain to send out a duplicate message
including the snapshot, which is a huge waste.
This commit implements a timeout-based wait mechanism. The timeout for a normal msgApp is
heartbeatTimeout and the timeout for a snapshot is electionTimeout (a snapshot is larger). We
can implement a piggyback mechanism (the application reports the lost message) in the future
if necessary.
2014-12-18 15:01:50 -08:00
Xiang Li
044e35b814
raft: use newRaft
2014-12-15 11:25:35 -08:00
Xiang Li
c586d5012c
raft: log term as %d
2014-12-14 10:06:45 -08:00
Xiang Li
2c2e032155
Merge pull request #1908 from bdarnell/error-fixes
...
raft: remove panic when we see a proposal with no leader.
2014-12-11 13:58:51 -08:00
Ben Darnell
b26856b603
raft: add detail to "no leader" log message
2014-12-11 15:07:32 -05:00
Xiang Li
89cba625d6
Merge pull request #1897 from xiang90/raft
...
raft: get rid of the using of defer in critical path
2014-12-10 21:24:38 -08:00
Yicheng Qin
e89cc25c50
Merge pull request #1901 from yichengq/260
...
rafthttp: batch MsgProp
2014-12-10 21:16:07 -08:00
Yicheng Qin
3867c72c8a
raft: support to do multiple proposals in one message
2014-12-10 20:00:59 -08:00
Ben Darnell
fa247d09cc
raft: remove panic when we see a proposal with no leader.
...
This panic can never be reached when using raft.Node, because we only
read from propc when there is a leader. However, it is possible to see
this error when using the raft object directly (as in MultiNode),
and in this case it is better to simply drop the proposal (as if we had
sent it to a leader that immediately vanished).
Add an error return to MemoryStorage.Append for consistency.
2014-12-10 17:34:40 -05:00
Xiang Li
96de9776b7
raft: get rid of allocation
2014-12-10 13:41:04 -08:00
Xiang Li
e4c0f5c1a8
Merge pull request #1895 from xiang90/snap_nodes
...
etcd: update conf when apply the confChange entry
2014-12-09 11:45:01 -08:00
Xiang Li
a5efbf826d
raft: drop nodes in softState
2014-12-09 11:43:52 -08:00
Yicheng Qin
0472ddf05f
Merge pull request #1890 from yichengq/259
...
raft: set raft.Commit too when setting raftLog.committed
2014-12-09 11:28:05 -08:00
Yicheng Qin
4804c45e14
raft: set raft.Commit too when setting raftLog.committed
2014-12-08 22:35:55 -08:00
Yicheng Qin
22dd3b039c
Merge pull request #1888 from yichengq/258
...
raft: increase term to 1 before append initial entries
2014-12-08 22:27:23 -08:00
Yicheng Qin
7317834417
raft: increase term to 1 before append initial entries
...
Because the term of new raft is 0, it is weird to have term-1 committed
entries in the log.
2014-12-08 22:21:39 -08:00
Xiang Li
ba45637ba3
raft: group step funcs
2014-12-08 15:29:54 -08:00
Xiang Li
099f4f10ea
raft: one line
2014-12-08 15:28:48 -08:00
Xiang Li
8ead428e76
raft: group getter funcs
2014-12-08 15:24:34 -08:00
Xiang Li
f73d059d80
raft: group configuration related funcs
2014-12-08 15:23:21 -08:00
Xiang Li
25313b1210
raft: move poll close to campaign
2014-12-08 15:21:57 -08:00
Xiang Li
d52c66ad42
raft: removed unused func
2014-12-08 15:20:43 -08:00
Xiang Li
62ed1de10d
raft: refactoring logging
2014-12-08 15:16:02 -08:00
Xiang Li
6cb7f2d9e9
raft: print out log when creating a newraft
2014-12-08 14:37:39 -08:00
Ben Darnell
ea4d645a83
raft: Ignore redundant addNode calls.
...
This avoids clobbering any state when bootstrapping entries are
applied twice.
2014-12-05 17:15:50 -05:00
Ben Darnell
3d91faf85a
Pre-apply the bootstrapping ConfChange entries.
...
This eliminates the need to fake an ApplyConfChange call before Campaign
in tests.
Fixes #1856.
2014-12-05 15:35:39 -05:00
Xiang Li
6409a8bf0d
raft: filter out messages from unknown sender.
...
If we cannot find `m.from` among the current peers in raft and it is a response
message, we should filter it out, or raft panics. We are not trying to guard against
malicious peers.
It has to be done synchronously in the raft node layer. Although we could check
it asynchronously at the application layer, after the check and before
the message goes into raft, the raft state machine might make progress and
remove the `m.from` peer.
2014-12-05 11:34:56 -08:00
Xiang Li
182c30a41a
raft: refactor logging at node level
2014-12-04 21:03:06 -08:00
Xiang Li
197e6b1b20
Merge pull request #1858 from vlajos/typofixes-vlajos-20141204
...
typofixes - https://github.com/vlajos/misspell_fixer
2014-12-04 14:52:27 -08:00
Veres Lajos
3de2ab2c04
*: typofixes
...
https://github.com/vlajos/misspell_fixer
2014-12-04 22:51:19 +00:00
Xiang Li
a47690dd30
Merge pull request #1845 from xiang90/testunstable
...
raft: add TestUnstableTruncateAndAppend
2014-12-04 11:03:37 -08:00
Xiang Li
4ebd3a0b10
Merge pull request #1852 from xiang90/heartbeat
...
raft: add msgHeartbeat type
2014-12-04 10:25:46 -08:00
Xiang Li
149389cbfa
raft: add msgHeartbeat type
2014-12-04 08:29:31 -08:00
Yicheng Qin
e344774c10
Merge pull request #1850 from yichengq/247
...
raft: return 0 for term of compacted index
2014-12-03 17:23:32 -08:00
Yicheng Qin
34a468de36
raft: return 0 for term of compacted index
...
It is necessary to make this check because of the following case:
1. memory storage contains ents from index 0 to 50, and unstable has
ents from index 50 to 60.
2. raft receives an incoming snapshot with index 100.
3. raft restores its unstable to 100, but has not applied snapshot on memory storage.
4. raft receives an out-dated MsgApp from index 60.
5. raft finds the term of index 60 to check the match.
6. raft asks memory storage about the term of index 60 after it failed to get
it from unstable.
7. memory storage panics because it knows nothing about index 60.
2014-12-03 17:22:36 -08:00
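The seven-step scenario in 34a468de36 above boils down to a range check before the term lookup. The sketch below is heavily simplified and makes assumptions: a single `raftLog` struct with a `first` index and a slice of terms stands in for the real combination of unstable entries and storage. An index outside the known range yields term 0 instead of a panic, and since real entries carry a term of at least 1, the outdated MsgApp then fails the match check.

```go
package main

import "fmt"

// raftLog is a hypothetical, flattened view of the log: terms[k] is the
// term of the entry at index first+k.
type raftLog struct {
	first uint64
	terms []uint64
}

// term returns the term of entry i, or 0 if i has been compacted away
// or lies beyond the end of the log.
func (l *raftLog) term(i uint64) uint64 {
	if i < l.first || i >= l.first+uint64(len(l.terms)) {
		return 0
	}
	return l.terms[i-l.first]
}

func main() {
	// After restoring a snapshot at index 100, the log starts at 101.
	l := &raftLog{first: 101, terms: []uint64{3, 3}}
	fmt.Println(l.term(60))  // outdated MsgApp index: no panic, term 0
	fmt.Println(l.term(101)) // known entry
}
```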
Xiang Li
ddd9cb7345
raft: add TestUnstableTruncateAndAppend
2014-12-03 16:37:19 -08:00
Xiang Li
2caf4f5f22
raft: fix log format in sendAppend
2014-12-03 16:11:44 -08:00
Xiang Li
06a5892a18
raft: more logging
2014-12-03 14:46:24 -08:00
Xiang Li
8074a5b5a4
raft: fix error message format in test
2014-12-03 13:36:47 -08:00
Xiang Li
37ab463e86
raft: add TestUnstableStableTo
2014-12-03 13:26:35 -08:00
Xiang Li
7703d4942c
raft: add TestUnstableRestore
2014-12-03 13:03:56 -08:00
Xiang Li
be60c88603
Merge pull request #1842 from xiang90/unstable_test
...
raft: add TestUnstableFirstIndex
2014-12-03 11:50:39 -08:00
Yicheng Qin
63ed202db6
raft: print out term in decimal format
2014-12-03 11:33:51 -08:00
Xiang Li
48f75ca645
raft: add TestUnstableMaybeTerm
2014-12-03 11:30:59 -08:00
Xiang Li
058356d9bd
raft: add TestUnstableLastIndex
2014-12-03 11:11:31 -08:00
Xiang Li
98ebfa3468
raft: add TestUnstableFirstIndex
2014-12-03 11:11:11 -08:00
Yicheng Qin
23b32a6cbe
Merge pull request #1716 from yichengq/225
...
raft: panic if loaded commit is out of range
2014-12-02 22:14:12 -08:00
Yicheng Qin
38768e5396
raft: panic if loaded commit is out of range
2014-12-02 22:09:34 -08:00
Xiang Li
b3841afcc3
raft: do not restore snapshot if local raft has longer matching history
...
Raft should not restore the snapshot if it has a longer matching history.
Otherwise restoring the snapshot might remove the matched entries.
2014-12-02 21:34:14 -08:00
Xiang Li
3209fd544b
raft: panic on bad slice
2014-12-02 17:48:03 -08:00
Xiang Li
79014556e9
Merge pull request #1831 from xiang90/fix_unstable
...
raft: fix unstable
2014-12-02 14:43:11 -08:00
Xiang Li
2f5b748a90
raft: clarify that the firstIndex might not be available.
2014-12-02 14:27:52 -08:00
Yicheng Qin
1c7b9317a9
Merge pull request #1833 from yichengq/244
...
raft: not call stableTo for restored snapshot
2014-12-02 13:20:39 -08:00
Yicheng Qin
551a56fb98
raft: not call stableTo for restored snapshot
...
Stable has been set when restoring the snapshot in raftlog, so we don't need
to set it after advance.
2014-12-02 13:10:35 -08:00
Xiang Li
b7ca56e3c8
raft: move good case of truncateAndAppend to the first place
2014-12-02 13:05:55 -08:00
Xiang Li
3cadaca1a3
Merge pull request #1830 from xiang90/raft_snap_log
...
raft: log snapshot events
2014-12-02 12:06:15 -08:00
Xiang Li
411063e14f
raft: log snapshot events
2014-12-02 11:57:10 -08:00
Xiang Li
788d1e59a2
raft: use index in entry
2014-12-02 10:25:27 -08:00
Xiang Li
51de095d2c
raft: logging state change events and events on bad path
2014-12-02 10:08:19 -08:00
Xiang Li
312db7f0f3
raft: fix memory storage
...
Memory storage should append all entries that have a greater index
than snap.Metadata.Index. We first truncate the old parts of the
incoming entries, then truncate the existing entries in the storage.
Finally, we append the incoming entries to the existing entries.
2014-12-01 16:37:16 -08:00
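The three steps described in 312db7f0f3 above can be sketched as a pure function. This is an illustration under assumptions, not the committed code: `appendEntries` is a hypothetical name, and `existing[0]` is assumed to be a dummy entry sitting at the snapshot index.

```go
package main

import "fmt"

type entry struct{ Index, Term uint64 }

// appendEntries merges incoming entries into existing ones.
// existing[0] is assumed to be a dummy entry at the snapshot index.
func appendEntries(existing, incoming []entry) []entry {
	if len(incoming) == 0 {
		return existing
	}
	first := existing[0].Index + 1 // lowest index the storage can hold
	if incoming[len(incoming)-1].Index < first {
		return existing // everything incoming is covered by the snapshot
	}
	// Step 1: drop the part of incoming already covered by the snapshot.
	if incoming[0].Index < first {
		incoming = incoming[first-incoming[0].Index:]
	}
	// Step 2: truncate existing entries from the first incoming index on.
	offset := incoming[0].Index - existing[0].Index
	if offset > uint64(len(existing)) {
		panic("missing log entries between existing and incoming")
	}
	// Step 3: append the incoming entries to what remains.
	out := append([]entry(nil), existing[:offset]...)
	return append(out, incoming...)
}

func main() {
	existing := []entry{{5, 1}, {6, 1}, {7, 1}} // snapshot at index 5
	incoming := []entry{{4, 2}, {5, 2}, {6, 2}, {7, 2}, {8, 2}}
	fmt.Println(appendEntries(existing, incoming))
}
```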
Xiang Li
19ccdbee18
Merge pull request #1806 from xiang90/no_copy
...
No copy
2014-12-01 13:15:13 -08:00
Xiang Li
92d4112feb
Merge pull request #1809 from xiang90/unstable
...
raft: stableTo checks term matching
2014-12-01 11:09:40 -08:00
Xiang Li
649176934a
raft: add tests for stableTo
2014-12-01 10:54:34 -08:00
Xiang Li
3c0fbe285c
raft: stableTo checks term matching
...
stableTo should only mark the index stable if the term matches. After raft sends out unstable
entries to the application, raft makes progress without waiting for a reply. When the application
calls stableTo to report that the entries up to "index" are stable, raft might have truncated
some entries before "index" because the leader was lost. raft must verify the (index, term) of stableTo
before marking the entries as stable.
2014-11-28 14:13:07 -08:00
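A minimal sketch of the check described in 3c0fbe285c above, assuming a simplified `unstable` struct whose field names and boolean return value are chosen for illustration. An acknowledgement is honored only when the entry currently at that index still carries the acknowledged term.

```go
package main

import "fmt"

type entry struct{ Index, Term uint64 }

// unstable holds entries not yet persisted; entries[0] has index offset.
type unstable struct {
	offset  uint64
	entries []entry
}

// stableTo marks entries up to index i as stable, but only if the entry
// at i still has term t. It reports whether anything was marked stable.
func (u *unstable) stableTo(i, t uint64) bool {
	if i < u.offset || i >= u.offset+uint64(len(u.entries)) {
		return false // index unknown to unstable
	}
	if u.entries[i-u.offset].Term != t {
		return false // entry was truncated and replaced since it was handed out
	}
	u.entries = u.entries[i+1-u.offset:]
	u.offset = i + 1
	return true
}

func main() {
	// Index 6 was handed out at term 1, then overwritten at term 2.
	u := &unstable{offset: 5, entries: []entry{{5, 1}, {6, 2}}}
	fmt.Println(u.stableTo(6, 1)) // false: stale acknowledgement is ignored
	fmt.Println(u.stableTo(6, 2)) // true: index and term both match
}
```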
Xiang Li
d214e87aee
raft: make unstable.entries immutable; copy the entries at bad path
2014-11-27 19:35:03 -08:00
Xiang Li
d244e3bf6e
raft: fix node bench
2014-11-26 23:07:35 -08:00
Xiang Li
fe0bc4ff36
Merge pull request #1805 from xiang90/fix_raft_b
...
raft: fix start term
2014-11-26 21:41:38 -08:00
Xiang Li
746c66b466
raft: fix start term
2014-11-26 21:21:13 -08:00
Xiang Li
7929e46dd8
raft: clean up
2014-11-26 15:31:07 -08:00
Xiang Li
8a626257c7
raft: move unstable related function to log_unstable.go
2014-11-26 15:25:24 -08:00