Commit Graph

531 Commits (1f206c027a37d2ed80b59465d37e8c22477209c1)

Author SHA1 Message Date
Xiang Li 4c81615cef etcdserver: initial support for cluster-wide v3 request 2015-09-13 08:32:01 -07:00
Hitoshi Mitake 6974fc63ed etcdserver: avoid deadlock caused by adding members with wrong peer URLs
Current membership changing functionality of etcd seems to have a
problem which can cause deadlock.

How to produce:
 1. construct N node cluster
 2. add N new nodes with etcdctl member add, without starting the new members

What happens:
After finishing add N nodes, a total number of the cluster becomes 2 *
N and a quorum number of the cluster becomes N + 1. It means
membership change requires at least N + 1 nodes because Raft treats
membership information in its log like other ordinal log append
requests.

Assume the peer URLs of the added nodes are wrong because of miss
operation or bugs in wrapping program which launch etcd. In such a
case, both of adding and removing members are impossible because the
quorum isn't preserved. Of course ordinal requests cannot be
served. The cluster would seem to be deadlock.

Of course, the best practice of adding new nodes is adding one node
and let the node start one by one. However, the effect of this problem
is so serious. I think preventing the problem forcibly would be
valuable.

Solution:
This patch lets etcd forbid adding a new node if the operation changes
quorum and the number of changed quorum is larger than a number of
running nodes. If etcd is launched with a newly added option
-strict-reconfig-check, the checking logic is activated. If the option
isn't passed, default behavior of reconfig is kept.

Fixes https://github.com/coreos/etcd/issues/3477
2015-09-13 09:31:53 +09:00
Xiang Li d94e712d91 *: support wal dir 2015-09-01 09:54:27 -07:00
Yicheng Qin 8f6bf029f8 etcdserver: specify request timeout error due to connection lost
It specifies request timeout error possibly caused by connection lost,
and print out better log for user to understand.

It handles two cases:
1. the leader cannot connect to majority of cluster.
2. the connection between follower and leader is down for a while,
and it losts proposals.

log format:
```
20:04:19 etcd3 | 2015-08-25 20:04:19.368126 E | etcdhttp: etcdserver:
request timed out, possibly due to connection lost
20:04:19 etcd3 | 2015-08-25 20:04:19.368227 E | etcdhttp: etcdserver:
request timed out, possibly due to connection lost
```
2015-08-26 12:38:37 -07:00
Xiang Li 6b23a8131f *: test gofmt with -s and fix reported issues 2015-08-21 18:52:16 -07:00
Yicheng Qin 2d5b95c49f etcdserver: use ReqTimeout only
We cannot refer RTT value from heartbeat interval, so CommitTimeout
is invalid. Remove it and use ReqTimeout instead.
2015-08-17 14:54:25 -07:00
Xiang Li f199a484af *: only print out major.minor version for cluster version 2015-08-15 08:30:06 -07:00
Yicheng Qin c229e6e655 etcdserver: improve error message when timeout due to leader fail 2015-08-13 15:46:21 -07:00
Yicheng Qin 0fdb77aea2 etcdserver: go back to marshal request in 2.1 way
It fixes the problem that 2.1 cannot roll upgrade to 2.2 smoothly
because 2.1 cannot understand the bytes marshalled at 2.2.
2015-08-13 13:41:52 -07:00
Yicheng Qin 27170e67b9 etcdserver: specify timeout caused by leader election
Before this PR, the timeout caused by leader election returns:

```
14:45:37 etcd2 | 2015-08-12 14:45:37.786349 E | etcdhttp: got unexpected
response error (etcdserver: request timed out)
```

After this PR:

```
15:52:54 etcd1 | 2015-08-12 15:52:54.389523 E | etcdhttp: etcdserver:
request timed out, possibly due to leader down
```
2015-08-12 16:53:18 -07:00
Yicheng Qin 5a91937367 etcdserver: adjust commit timeout based on config
It uses heartbeat interval and election timeout to estimate the
commit timeout for internal requests.

This PR helps etcd survive under high roundtrip-time environment,
e.g., globally-deployed cluster.
2015-08-11 21:09:03 -07:00
Xiang Li a718329ad3 Merge pull request #3248 from xiang90/v3
initial v3 demo
2015-08-10 13:59:03 -07:00
Brandon Philips fb1951204c etcdserver: move atomics to make etcd work on arm64
Follow the simple rule in the atomic package:

"On both ARM and x86-32, it is the caller's responsibility to arrange
for 64-bit alignment of 64-bit words accessed atomically. The first word
in a global variable or in an allocated struct or slice can be relied
upon to be 64-bit aligned."

Tested on a system with /proc/cpuinfo reporting:

processor       : 0
model name      : ARMv7 Processor rev 1 (v7l)
Features        : swp half thumb fastmult vfp edsp thumbee neon vfpv3
tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc0d
CPU revision    : 1
2015-08-08 18:11:41 -07:00
Xiang Li 9ff7075ce8 etcdserver: use v3server interface 2015-08-08 10:39:04 -07:00
Xiang Li f004b4dac7 *: etcdserver supports v3 demo 2015-08-08 05:58:29 -07:00
Xiang Li 58503817ec etcdserver: internal request union 2015-08-05 07:47:10 -07:00
Yicheng Qin 6fc9dbfe56 Merge pull request #3114 from yichengq/clean-raft-init
etcdserver: clean up start and stop logic of raft
2015-07-27 14:19:25 -07:00
Yicheng Qin 7696dd3280 etcdserver: clean up start and stop logic of raft
kill TODO and make it more readable.
2015-07-27 13:24:26 -07:00
Yicheng Qin b7892b20c1 etcdserver: rename defaultPublishRetryInterval -> defaultPublishTimeout
This makes code more readable and reasonable.
2015-07-23 10:09:28 -07:00
Yicheng Qin 7f95780bfb etcdserver: init raft internal var early
Its `stopped`/`done` should be created always before being used
in defer in server loop.

It fixes the race detected when running TestSyncTrigger.
2015-06-29 15:34:15 -07:00
Antoine Grondin 270487d340 etcdserver: use Infof to print formatted argument 2015-06-14 20:22:21 +07:00
Xiang Li e0f9796653 etcdserver: use leveled logging
Leveled logging for etcdserver pkg.
2015-06-09 13:53:07 -07:00
Xiang Li 4a72d3a8bb etcdserver: refactore member.go 2015-05-21 09:19:29 -07:00
Xiang Li db7db689a6 etcdserver: check cluster version compability when joining 2015-05-19 10:19:41 -07:00
Xiang Li 9f8342dba4 etcdserver: do not get local version via HTTP 2015-05-13 17:19:32 -07:00
Yicheng Qin a6a649f1c3 etcdserver: stop exposing Cluster struct
After this PR, only cluster's interface Cluster is exposed, which makes
code much cleaner. And it avoids external packages to rely on cluster
struct in the future.
2015-05-13 10:01:25 -07:00
Yicheng Qin 032db5e396 *: extract types.Cluster from etcdserver.Cluster
The PR extracts types.Cluster from etcdserver.Cluster. types.Cluster
is used for flag parsing and etcdserver config.

There is no need to expose etcdserver.Cluster public, which contains
lots of etcdserver internal details and methods. This is the first step
for it.
2015-05-12 14:53:11 -07:00
Xiang Li e866314b94 etcdserver: support update cluster version through raft
1. Persist the cluster version change through raft. When the member is restarted, it can recover
the previous known decided cluster version.

2. When there is a new leader, it is forced to do a version checking immediately. This helps to
update the first cluster version fast.
2015-05-12 11:44:34 -07:00
Xiang Li 94ffd72c7e etcdserver: rename StoreAdminPrefix to StoreClusterPrefix
We store cluster related key in StoreAdminPrefix for some
historical reason. The previous API is called admin. But now,
the admin name is gone and `cluster` is a more clear and correct
name.
2015-04-29 12:05:51 -07:00
Xiang Li 6699107f61 *: add cluster version and cluster version detection.
Cluster version is the min major.minor of all members in
the etcd cluster. Cluster version is set to the min version
that a etcd member is compatible with when first bootstrapp.

During a rolling upgrades, the cluster version will be updated
automatically.

For example:

```
Cluster [a:1, b:1 ,c:1] -> clusterVersion 1

update a -> 2, b -> 2

after a detection

Cluster [a:2, b:2 ,c:1] -> clusterVersion 1, since c is still 1

update c -> 2

after a detection

Cluster [a:2, b:2 ,c:2] -> clusterVersion 2
```

The API/raft component can utilize clusterVersion to determine if
it can accept a client request or a raft RPC.

We choose polling rather than pushing since we want to use the same
logic for cluster version detection and (TODO) cluster version checking.

Before a member actually joins a etcd cluster, it should check the version
of the cluster. Push does not work since the other members cannot push
version info to it before it actually joins. Moreover, we do not want our
raft RPC system (which is doing the heartbeat pushing) to coordinate cluster version.
2015-04-29 11:31:59 -07:00
Yicheng Qin 1c1cccd236 rafthttp: stop etcd if it is found removed when stream dial
The original process is stopping etcd only when pipeline message finds itself
has been removed. After this PR, stream dial has this functionality too.
It helps fast etcd stop, which doesn't need to wait for stream break to
fall back to pipeline, and wait for election timeout to send out message
to detect self removal.
2015-04-27 15:10:00 -07:00
Yicheng Qin ebecee34e0 Merge pull request #2701 from yichengq/rafthttp-anon
rafthttp: add remotes
2015-04-24 13:04:37 -07:00
Yicheng Qin 9f19b5660f rafthttp: add AddRemote
Add remotes to rafthttp, who help newly joined members catch up the
progress of the cluster. It supports basic message sending to remote, and
has no stream connection for simplicity. remotes will not be used
after the latest peers have been added into rafthttp.
2015-04-24 11:49:23 -07:00
xiaost cab1e9a723 etcdserver: skip noop entry in apply 2015-04-24 12:15:51 +08:00
Yicheng Qin 1d96de459a etcdserver: init server stats before passing it as argument
It is more reasonable to init the variable before passing it as an
argument.

It fixes a bug that etcdserver may panic on server stats when processing
a message from rafthttp streamReader before server stats is initialized
in server.Start().
2015-04-22 08:28:08 -07:00
Yicheng Qin 1811701427 Revert "etcdserver: fix cluster fallback recovery"
This reverts commit cff005777a.

Conflicts:
	etcdserver/server.go
2015-04-19 11:34:33 -07:00
Yicheng Qin 88224f6f4e Revert "etcdserver: not apply stale conf change in cluster and transport"
This reverts commit 40197f0698.
2015-04-19 11:08:03 -07:00
Xiang Li 98f8dfbc9d etcdserver: prevExist=true + condition is compareAndSwap
PrevExist indicates the key should exist. Condition compares with
an existing key. So PrevExist+condition = CompareAndSwap not Update.
2015-04-14 23:44:06 -07:00
xiaost eab2c2224a etcdserver: fix minor bug in EtcdServer.send
it seems to nothing serious.
after deleted peers, the log may output:
"etcdserver: send message to unknown receiver %s"
2015-04-13 20:35:58 +08:00
Yicheng Qin 7a7e1f7a7c etcdserver: metrics and monitor number of file descriptor
It exposes the metrics of file descriptor limit and file descriptor used.
Moreover, it prints out warning when more than 80% of fd limit has been used.

```
2015/04/08 01:26:19 etcdserver: 80% of the file descriptor limit is open
[open = 969, limit = 1024]
```
2015-04-08 11:17:48 -07:00
Yicheng Qin 9e5743c816 etcdserver: stop raft node goroutine before stop server
Stop raftNode goroutine before stopping server goroutine, so
server.Stop does stop all underlying stuffs elegantly now. This fixes
the problem that previous-round lock on WAL may not be released when
etcd is restarted.
2015-04-01 11:20:51 -07:00
Yicheng Qin dd92a2b484 Merge pull request #2556 from yichengq/fix-apply-conf
etcdserver: not apply stale conf change
2015-03-27 14:00:30 -07:00
Yicheng Qin 40197f0698 etcdserver: not apply stale conf change in cluster and transport 2015-03-27 12:53:34 -07:00
Yicheng Qin 5e0077cc0c etcdserver: print out extra files in data dir instead of erroring 2015-03-24 18:56:22 -07:00
Yicheng Qin abcd828114 etcdserver: add join-existing check 2015-03-23 22:31:20 -07:00
Xiang Li d015610da5 etcdserver: separate apply and raft routine 2015-03-10 13:34:24 -07:00
Yicheng Qin b4b9b9118a rafthttp: report MsgSnap status 2015-03-02 09:38:11 -08:00
Yicheng Qin 9989bf1d36 Merge pull request #2407 from yichengq/334
rafthttp: report unreachable status of the peer
2015-03-02 09:35:35 -08:00
Yicheng Qin 9b986fb4c1 rafthttp: report unreachable status of the peer
When it failed to send message to the remote peer, it reports unreachable
to raft.
2015-03-01 16:48:26 -08:00
Xiang Li 428b77afc3 etcdserver: keep a min number of entries in memory
Do not aggressively compact raft log entries. After a snapshot,
etcd server can compact the raft log upto snapshot index. etcd server
compacts to an index smaller than snapshot to keep some entries in memory.
The leader can still read out the in memory entries to send to a slightly
slow follower. If all the entries are compacted, the leader will send the
whole snapshot or read entries from disk if possible.
2015-03-01 10:12:13 -08:00
Xiang Li a4dab7ad75 *: do not block etcdserver when encoding store into json
Encoding store into json snapshot has quite high CPU cost. And it
will block for a while. This commit makes the encoding process non-
blocking by running it in another go-routine.
2015-02-28 11:41:58 -08:00
Xiang Li 86429264fb wal: support auto-cut in wal
WAL should control the cut logic itself. We want to do falloc to
per allocate the space for a segmented wal file at the beginning
and cut it when it size reaches the limit.
2015-02-28 11:18:59 -08:00
Xiang Li 95bba154d6 etcdserver: add propose summary 2015-02-28 11:16:42 -08:00
Xiang Li 2e078582f9 etcdmain: expose runtime metrics 2015-02-28 10:11:53 -08:00
Xiang Li 33afbfead6 etcdserver: remove the dep on metrics. first step towards removing metrics pkg from etcd. 2015-02-28 10:09:55 -08:00
Xiang Li 5ede18be74 raft: separate compact and createsnap in memory storage 2015-02-28 10:08:30 -08:00
Yicheng Qin cff005777a etcdserver: fix cluster fallback recovery
Cluster and transport may recover to old states when new node joins
the cluster. Record cluster last modified index to avoid this.
2015-02-20 14:30:00 -08:00
Barak Michener 92dca0af0f *: remove shadowing of variables from etcd and add travis test
We've been bitten by this enough times that I wrote a tool so that
it never happens again.
2015-02-17 16:31:42 -05:00
Xiang Li beb44ef6ba etcdserver: fix error message when valide the discovery cluster 2015-02-16 09:53:01 -08:00
Xiang Li c5ca1218f3 etcdserver: GetClusterFromPeers -> GetClusterFromRemotePeers 2015-02-13 19:05:29 -08:00
Xiang Li f7540912d6 etcdserver: getOtherPeerURLs -> getRemotePeerURLs 2015-02-13 18:56:45 -08:00
Xiang Li cfa7ab6074 etcdserver: validate discovery cluster 2015-02-13 14:32:24 -08:00
Xiang Li c16cc3a6a3 etcdserver: recover transport when recovering from a snapshot 2015-02-13 10:16:28 -08:00
Xiang Li fbc4c8efb5 etcdserver: fix snapshot 2015-02-13 09:54:25 -08:00
Barak Michener a0e3bc9cbd etcdserver: Unmask the snapshotter. Fixes #2295 2015-02-13 11:56:00 -05:00
Barak Michener cd50f0e058 etcdserver: Create MemberDir() and base {Snap,WAL}Dir() thereon. Audit DataDir. 2015-02-12 12:45:19 -05:00
Barak Michener fade9b6065 etcdserver: Refactor 2.0.1 directory rename into a proper migration
fix all instances

fix detection test
2015-02-12 11:53:19 -05:00
Xiang Li 163f0f09f6 etcdserver: cleanup cluster_util 2015-02-11 16:20:38 -08:00
Xiang Li 20497f1f85 etcdserver: move remote cluster retrive to cluster_util.go 2015-02-11 14:03:14 -08:00
Xiang Li 6e1aecfc6f etcdserver: save confstate when apply new snapshot 2015-02-10 07:31:25 -08:00
Yicheng Qin f13c7872d5 etcdserver: register pre-defined namespaces in store 2015-02-04 16:33:40 -08:00
Yicheng Qin 7840d49ae0 etcdserver: not add self to transporter based on local ID
If this is decided by local name, it comes to trouble if the name is
duplicate in the cluster.
2015-01-29 12:35:47 -08:00
Xiang Li 276c9540b4 etcdserver: support raft.status 2015-01-26 16:39:33 -08:00
Xiang Li 9c7f66c5d9 Merge pull request #2119 from sorah/peer-ca-on-fetching-members
etcdserver: User peerTLSInfo to get cluster member
2015-01-26 10:50:44 -08:00
Shota Fukumori (sora_h) 033e7d1db9 etcdserver: User peerTLSInfo to get cluster member 2015-01-27 03:43:21 +09:00
Jonathan Boulle f1ed69e883 *: switch to line comments for copyright
Build tags are not compatible with block comments.
Also adds copyright header to a few places it was missing.
2015-01-26 09:53:30 -08:00
Xiang Li 973f79e1c9 etcdserver: separate out raft related stuff 2015-01-15 15:15:13 -08:00
Xiang Li 276a4abac0 etcdserver: make heartbeat/election configurable 2015-01-15 11:11:33 -08:00
Yicheng Qin 07a69430c1 *: move etcdserver/idutil -> pkg/idutil 2015-01-13 11:54:51 -08:00
Yicheng Qin bca1e5aea6 Merge pull request #2057 from yichengq/282
fix context time-out failure on travis
2015-01-07 13:41:26 -08:00
Yicheng Qin 930156c18a integration: adjust election ticks using env var 2015-01-07 11:18:29 -08:00
Yicheng Qin 6b237416e1 Merge pull request #2044 from yichengq/278
wal: record mark when snapshotting
2015-01-07 08:26:33 -08:00
Yicheng Qin 6460e49a33 wal: save empty snapshot when create
So caller can open at empty snapshot to read all entries.
2015-01-06 19:48:21 -08:00
Yicheng Qin 84f62f21ee wal: record and check snapshot 2015-01-06 16:27:40 -08:00
Xiang Li 1ebad5e42c etcdhttp: support member/leader endpoint 2015-01-06 08:52:33 -08:00
Xiang Li 15be030aaa etcdserver: collect error from errorc 2015-01-02 20:13:46 -08:00
Xiang Li 2a83e350b1 Merge pull request #1992 from xiang90/rm_leader
*: support removing the leader from a 2 members cluster
2015-01-02 14:15:12 -08:00
Xiang Li 6e727625b9 etcdserver: continue to apply after self-removed 2015-01-02 14:10:07 -08:00
Xiang Li 04003a01ba Merge pull request #2013 from xiang90/tr
rafthttp cleanup
2014-12-31 08:35:20 -08:00
Xiang Li 803c38f448 etcdserver: move error to errors.go
Both server.go and cluster.go are using defined ErrX. Move error
to errors.go
2014-12-30 15:02:07 -08:00
Xiang Li c3d2f5eea0 pbutil: add getbool to pbutil 2014-12-30 14:51:26 -08:00
Yicheng Qin 6ccaadc95d Merge pull request #1952 from yichengq/262
etcdserver: add id generator
2014-12-29 13:59:06 -08:00
Yicheng Qin 05c921229e etcdserver: add id generator 2014-12-29 13:03:04 -08:00
Xiang Li c712dd682a rafthttp: make Transport private 2014-12-29 12:20:52 -08:00
Xiang Li a14d13f724 rafthttp: make fields in Transport private 2014-12-29 12:08:13 -08:00
Xiang Li 7c8b9c0203 Merge pull request #2011 from xiang90/timeutil
etcdserver: move getExpr to timeutil
2014-12-29 12:03:25 -08:00
Xiang Li 152676f43a *: support removing the leader from a 2 members cluster 2014-12-29 11:34:33 -08:00
Yicheng Qin 5bb8eeb5cf rafthttp: transport cleanup 2014-12-29 11:21:40 -08:00
Xiang Li cea29fe158 etcdserver: move getExpr to timeutil 2014-12-29 11:15:02 -08:00
Yicheng Qin 08f839e32c rafthttp: set the API boundary of the package 2014-12-28 15:50:27 -08:00
Xiang Li 1535596252 etcdserver: remove unnecessary indirection 2014-12-26 11:03:13 -08:00
Xiang Li 3dcd66459d etcdserver: remove unused containsUint64() 2014-12-26 10:56:56 -08:00
Xiang Li 69444b6bba etcdserver: cleanup server.go 2014-12-25 21:37:20 -08:00
Xiang Li 78b51d3f2f etcdserver: cleanup cluster.go 2014-12-25 20:56:30 -08:00
Xiang Li f43bc809b9 etcdserver: cleanup wal upgrade 2014-12-24 22:02:46 -08:00
Jonathan Boulle 4e6cbc937e *: change remaining 0.5 references -> 2.0 2014-12-18 16:14:42 -08:00
Xiang Li c27c288bef etcdserver: update stats when become leader 2014-12-15 17:02:48 -08:00
Xiang Li 04522baeee etcdserver: fix leader stats 2014-12-15 16:50:03 -08:00
Xiang Li 53bf7e4b5e wal: rename openAtIndex -> open; OpenAtIndexUntilUsing -> openNotInUse 2014-12-14 19:33:06 -08:00
Xiang Li ea94d19147 *: lock the in using files; do not purge locked the wal files 2014-12-14 19:27:22 -08:00
Xiang Li 935f7128a9 etcdserver: move stats inferface to stats pkg 2014-12-11 22:14:05 -08:00
Barak Michener 5f16fab541 Merge pull request #1915 from barakmich/1834
Return Unknown instead of NotExist
2014-12-11 13:49:26 -05:00
Barak Michener cf7690cb51 detect more cases of empty directories and actual errors 2014-12-11 13:37:32 -05:00
Xiang Li c26542b7f2 Merge pull request #1913 from xiang90/lazy_snap_dir
etcdserver: create snap dir until start the node
2014-12-11 09:39:51 -08:00
Xiang Li 836ccabad2 etcdserver: create snap dir until start the node 2014-12-11 09:25:18 -08:00
Rob Szumski 13f3158728 etcdserver: improve discovery ignore warning 2014-12-09 15:57:25 -08:00
Xiang Li a5efbf826d raft: drop nodes in softState 2014-12-09 11:43:52 -08:00
Xiang Li 29d7a2a558 etcd: update conf when apply the confChange entry 2014-12-08 23:37:07 -08:00
Yicheng Qin 9c8f5c9535 Merge pull request #1891 from yichengq/257
etcdserver: init state before run loop correctly
2014-12-08 16:38:33 -08:00
Yicheng Qin 13814c9d7d etcdserver: init state before run loop correctly 2014-12-08 16:13:16 -08:00
Yicheng Qin 7e06d85651 etcdserver: apply entries when it is not empty
Or it updates appliedi wrongly.
2014-12-08 15:56:38 -08:00
Yicheng Qin 71f3b80fbe etcdserver: check recovery error when new server 2014-12-08 14:55:23 -08:00
Yicheng Qin 8c338ffcc7 etcdserver: correct the log about recovering from snapshot 2014-12-08 14:51:42 -08:00
Yicheng Qin 771ff4589d etcdserver: not add self into sendhub when new server 2014-12-05 00:18:40 -08:00
Yicheng Qin 1d1c2ff834 Merge pull request #1841 from yichengq/246
etcdserver: close storage when stop
2014-12-04 15:36:24 -08:00
Yicheng Qin a7bc03b42b etcdserver: close storage when stop 2014-12-04 15:16:22 -08:00
Xiang Li 88e2fab572 Merge pull request #1859 from xiang90/pause_test
*: add pauseMember test
2014-12-04 15:11:59 -08:00
Veres Lajos 3de2ab2c04 *: typofixes
https://github.com/vlajos/misspell_fixer
2014-12-04 22:51:19 +00:00
Xiang Li 151f043414 *: add pauseMember test 2014-12-04 14:22:43 -08:00
Xiang Li 7beac083ff Merge pull request #1810 from xiang90/purge
*: support purging old wal/snap files
2014-12-01 12:05:05 -08:00
Xiang Li d3db010190 *: support purging old wal/snap files 2014-12-01 11:50:17 -08:00
Xiang Li bc5acd3c42 etcdserver: log snapshot event 2014-11-30 12:10:20 -08:00
Xiang Li e23f9e76d1 raft: do not applysnapshot in raft 2014-11-26 10:59:13 -08:00
Xiang Li 9df0e7715d raft: do not panic on out of date compaction 2014-11-25 15:14:39 -08:00
Xiang Li 01cbcce8ba etcdserver: do not applySnapshot twice 2014-11-25 14:53:49 -08:00
Xiang Li 74d8c7f457 etcdserver: cleanup main loop 2014-11-25 14:38:18 -08:00
Yicheng Qin a13d5a70ff etcdserver: save snapshot before entries 2014-11-25 12:39:15 -08:00
Yicheng Qin 54e1237271 etcdserver: panic when snapshot on raft storage
Snapshot on raft storage should always succeed. If there is an error, it must
be internal fault and needs stack info to debug.
2014-11-24 21:22:49 -08:00
Yicheng Qin 1b038da18a etcdserver: init snapi when init appliedi 2014-11-24 21:19:30 -08:00
Yicheng Qin bd9e93eeea etcdserver: remove finished TODO for raftStorage.Compact 2014-11-24 21:10:53 -08:00
Yicheng Qin 185d37c333 etcdserver: not load dummy entry from the wal 2014-11-24 20:51:04 -08:00
Xiang Li d69e4dbe6d etcdserver: initial index to 1 2014-11-24 14:57:08 -08:00
Xiang Li 453133977d etcdserver: save snapshot only if the index is greater than previous snap index 2014-11-24 14:47:59 -08:00
Xiang Li 08f156a1de etcdserver: remove extra empty line in snapshot func 2014-11-24 10:27:18 -08:00
Ben Darnell 0d680d0e6b Merge remote-tracking branch 'coreos/master' into merge
* coreos/master:
  rafthttp: fix import
  raft: should not decrease match and next when handling out of order msgAppResp
  Fix migration to allow snapshots to have the right IDs
  add snapshotted integration test
  fix test import loop
  fix import loop, add set to types, and fix comments
  etcdserver: autodetect v0.4 WALs and upgrade them to v0.5 automatically
  wal: add a bench for write entry
  rafthttp: add streaming server and client
  dep: use vendored imports in codegangsta/cli
  dep: bump golang.org/x/net/context

Conflicts:
	etcdserver/server.go
	etcdserver/server_test.go
	migrate/snapshot.go
2014-11-21 15:40:11 -05:00
Ben Darnell 30690d15d9 Re-enable a few tests I had missed.
Fix integration test for the change to log entry zero.

Increase test timeouts since integration tests often take
longer than 10s for me.
2014-11-21 15:27:17 -05:00
Brian Waldon c0fb1c8a00 Merge pull request #1755 from bcwaldon/golang.org-deps
Switch to golang.org/x/net/context
2014-11-20 16:26:14 -08:00
Barak Michener 59a0c64e9f fix import loop, add set to types, and fix comments 2014-11-20 15:38:08 -05:00
Barak Michener 78ea3335bf etcdserver: autodetect v0.4 WALs and upgrade them to v0.5 automatically 2014-11-20 15:38:08 -05:00
Yicheng Qin 9d53b94546 rafthttp: add streaming server and client 2014-11-20 11:34:50 -08:00
Brian Waldon 9a728a127a dep: bump golang.org/x/net/context
Move from code.google.com/p/go.net/context to
golang.org/x/net/context before bumping to latest.
2014-11-20 10:19:12 -08:00
Ben Darnell b29240baf0 Merge remote-tracking branch 'coreos/master' into merge
* coreos/master:
  scripts: build-docker tag and use ENTRYPOINT
  scripts: build-release add etcd-migrate
  create .godir
  raft: optimistically increase the next if the follower is already matched
  raft: add handleHeartbeat handleHeartbeat commits to the commit index in the message. It never decreases the commit index of the raft state machine.
  rafthttp: send takes raft message instead of bytes
  *: add rafthttp pkg into test list
  raft: include commitIndex in heartbeat
  rafthttp: move server stats in raftHandler to etcdserver
  *: etcdhttp.raftHandler -> rafthttp.RaftHandler
  etcdserver: rename sender.go -> sendhub.go
  *: etcdserver.sender -> rafthttp.Sender

Conflicts:
	raft/log.go
	raft/raft_paper_test.go
2014-11-19 17:05:16 -05:00
Ben Darnell 355ee4f393 raft: Integrate snapshots into the raft.Storage interface.
Compaction is now treated as an implementation detail of Storage
implementations; Node.Compact() and related functionality have been
removed. Ready.Snapshot is now used only for incoming snapshots.

A return value has been added to ApplyConfChange to allow applications
to track the node information that must be stored in the snapshot.

raftpb.Snapshot has been split into Snapshot and SnapshotMetadata, to
allow the full snapshot data to be read from disk only when needed.

raft.Storage has new methods Snapshot, ApplySnapshot, HardState, and
SetHardState. The Snapshot and HardState parameters have been removed
from RestartNode() and will now be loaded from Storage instead.
The only remaining difference between StartNode and RestartNode is that
the former bootstraps an initial list of Peers.
2014-11-19 16:40:26 -05:00
Yicheng Qin a2c568a144 Merge pull request #1669 from yichengq/215
*: add rafthttp as a separate package
2014-11-17 16:14:59 -08:00
Yicheng Qin f24e214ee5 rafthttp: move server stats in raftHandler to etcdserver 2014-11-17 16:02:20 -08:00
Yicheng Qin 5dc5f8145c *: etcdhttp.raftHandler -> rafthttp.RaftHandler 2014-11-17 15:52:24 -08:00
Ben Darnell 300c5a2001 Merge remote-tracking branch 'coreos/master' into log-storage-interface
* coreos/master: (21 commits)
  etcdserver: refactor ValidateClusterAndAssignIDs
  integration: add integration test for remove member
  integration: add test for member restart
  version: bump to alpha.3
  etcdserver: add buffer to the sender queue
  *: gracefully stop etcdserver
  Fix up migration tool, add snapshot migration
  etcd4: migration from v0.4 -> v0.5
  etcdserver: export Member.StoreKey
  etcdserver: recover cluster when receiving newer snapshot
  etcdserver: check and select committed entries to apply
  etcdserver: recover from snapshot before applying requests
  raft: not set applied when restored from snapshot
  sender: support elegant stop
  etcdserver: add StopNotify
  etcdserver: fix TestDoProposalStopped test
  etcdserver: minor cleanup
  etcdserver: validate new node is not registered before in best effort
  etcdserver: fix server.Stop()
  *: print out configuration when necessary
  ...

Conflicts:
	etcdserver/server.go
	etcdserver/server_test.go
	raft/log.go
2014-11-17 18:28:24 -05:00
Xiang Li e04e4632b3 Merge pull request #1736 from xiang90/verify
etcdserver: refactor ValidateClusterAndAssignIDs
2014-11-17 14:34:59 -08:00
Xiang Li 0541f0afa0 etcdserver: refactor ValidateClusterAndAssignIDs 2014-11-17 14:23:37 -08:00
Xiang Li c26de66262 integration: add integration test for remove member 2014-11-17 13:28:09 -08:00
Xiang Li 8bf71d796e *: gracefully stop etcdserver 2014-11-14 14:12:24 -08:00
Barak Michener 192f200d9e Fix up migration tool, add snapshot migration
Fixes all updates since bcwaldon sketched the original, with cleanup and
into an acutal working state. The commit log follows:

fix pb reference and remove unused file post rebase

unbreak the migrate folder

correctly detect node IDs

fix snapshotting

Fix previous broken snapshot

Add raft log entries to the translation; fix test for all timezones. (Still in progress, but passing)

Fix etcd:join and etcd:remove

print more data when dumping the log

Cleanup based on yichengq's comments

more comments

Fix the commited index based on the snapshot, if one exists

detect nodeIDs from snapshot

add initial tool documentation and match the semantics in the build script and main

formalize migration doc

rename function and clarify docs

fix nil pointer

fix the record conversion test

add migration to test suite and fix govet
2014-11-14 16:46:08 -05:00
Yicheng Qin 77433ff6da etcdserver: recover cluster when receiving newer snapshot 2014-11-14 12:11:21 -08:00
Yicheng Qin dfaa7290c4 etcdserver: check and select committed entries to apply 2014-11-14 12:11:16 -08:00
Yicheng Qin f6a7f96967 etcdserver: recover from snapshot before applying requests 2014-11-14 12:08:39 -08:00
Xiang Li e66bda957b Merge pull request #1714 from xiang90/stop
StopNotify
2014-11-13 15:16:52 -08:00
Xiang Li 6a1fe00615 Merge pull request #1704 from xiang90/print_config
*: print out configuration when necessary
2014-11-13 14:35:50 -08:00
Yicheng Qin 11f392bdc8 Merge pull request #1708 from yichengq/223
etcdserver: validate new node is not registered before in best effort
2014-11-13 14:30:40 -08:00
Xiang Li b5d480f17a etcdserver: add StopNotify 2014-11-13 14:16:48 -08:00
Xiang Li fb344bc33f etcdserver: minor cleanup 2014-11-13 14:01:56 -08:00
Yicheng Qin ac907d746b etcdserver: validate new node is not registered before in best effort 2014-11-13 13:56:11 -08:00
Xiang Li 30dfdb0ea9 etcdserver: fix server.Stop()
Stop should be idempotent. It should simply send a stop signal to the server.
It is the server's responsibility to stop the go-routines and related components.
2014-11-13 13:47:12 -08:00
Ben Darnell 39eddd8565 Merge remote-tracking branch 'coreos/master' into log-storage-interface
* coreos/master:
  etcdserver: add sender tests
  raft: Only call stableTo when we have ready entries or a snapshot.
  etcdserver: add ID() function to the Server interface.
  sender: use RoundTripper instead of Client in sender
2014-11-13 15:50:08 -05:00
Xiang Li d6f40acc86 etcdserver: add ID() function to the Server interface. 2014-11-13 11:37:06 -08:00
Ben Darnell b29c512f50 Merge remote-tracking branch 'coreos/master' into log-storage-interface
* coreos/master: (27 commits)
  pkg/wait: move wait to pkg/wait
  etcdserver: do not add/remove/update local member to/from sender hub
  etcdserver: not record attributes when add member
  raft: add a test for proposeConfChange
  raft: block Stop() on n.done, support idempotency
  raft: add a test for node proposal
  integration: add increase cluster size test
  integration: remove unnecessary t.Testing argument
  raft: stop the node synchronously
  integration: fix test to propagate NewServer errors
  etcdserver: move peer URLs check to config
  etcdserver: ensure initial-advertise-peer-urls match initial-cluster
  raft: add a test for node.Tick
  raft: add comment string for TestNodeStart
  etcdserver: use member instead of node at etcd level
  raft: nodes return sorted ids
  raft: update unstable when calling stableTo with 0
  *: support updating advertise-peer-url Users might want to update the peerurl of the etcd member in several cases. For example, if the IP address of the physical machine etcd running on is changed, user need to update the adversite-pee-rurl accordingly. This commit makes etcd support updating the advertise-peer-url of its members.
  transport: create a tls listener only if the tlsInfo is not empty and the scheme is HTTPS
  etcdserver: use member pointer for all tests
  ...

Conflicts:
	etcdserver/server.go
	raft/log.go
	raft/log_test.go
	raft/node.go
2014-11-13 14:21:09 -05:00
Xiang Li 92096dfdc3 *: print out configuration when necessary 2014-11-13 10:46:42 -08:00
Xiang Li 0d18a0f381 pkg/wait: move wait to pkg/wait 2014-11-13 09:11:53 -08:00
Xiang Li ba915ad5a8 etcdserver: do not add/remove/update local member to/from sender hub 2014-11-12 20:45:21 -08:00
Ben Darnell 54b07d7974 Remove raft.loadEnts and the ents parameter to raft.RestartNode.
The initial entries are now provided via the Storage interface.
2014-11-12 18:31:19 -05:00
Xiang Li 0aa8258d29 etcdserver: use member instead of node at etcd level 2014-11-12 10:45:35 -08:00
Xiang Li 5967794009 *: support updating advertise-peer-url
Users might want to update the peerurl of the etcd member in several cases.
For example, if the IP address of the physical machine etcd running on is
changed, user need to update the adversite-pee-rurl accordingly.
This commit makes etcd support updating the advertise-peer-url of its members.
2014-11-11 12:07:03 -08:00
Ben Darnell 25b6590547 raft: introduce log storage interface.
This change splits the raftLog.entries array into an in-memory
"unstable" list and a pluggable interface for retrieving entries that
have been persisted to disk. An in-memory implementation of this
interface is provided which behaves the same as the old version;
in a future commit etcdserver could replace the MemoryStorage with
one backed by the WAL.
2014-11-10 17:40:39 -05:00
Jonathan Boulle 41757e7f78 etcdserver: collapse shared readWAL logic 2014-11-08 17:07:05 -08:00
Yicheng Qin 4b9c3a9102 etcdserver: not get cluster info from self peer urls
Self peer urls have not started to serve at the time that it tries to
get cluster info, so it is useless to get cluster info from self peer
urls.
2014-11-08 13:52:48 -08:00
Yicheng Qin 014ef0f52d etcdserver: fix data race in cluster
The data race happens when etcd updates member attributes and fetches
member info in http handler at the same time.
2014-11-07 16:13:07 -08:00
Jonathan Boulle ca06fd0060 etcdserver: log cluster when adding/removing node 2014-11-07 13:36:41 -08:00
Jonathan Boulle 958ade86a5 etcdserver: log message after loading peers from snapshot 2014-11-07 13:34:43 -08:00
Jonathan Boulle 285cd404e3 etcdserver: print peerURLs when adding member 2014-11-07 12:00:41 -08:00
Jonathan Boulle 5055863e09 etcdserver: add docstrings for confchanges 2014-11-07 10:19:55 -08:00
Xiang Li bf47fe7cac Merge pull request #1647 from xiangli-cmu/force_cluster
etcdserver: force new cluster
2014-11-07 10:15:53 -08:00
Yicheng Qin 9d19429993 Merge pull request #1609 from yichengq/202
etcdserver: refactor sender
2014-11-07 10:12:02 -08:00
Xiang Li 0a9c6164af etcdserver: add support for force cluster 2014-11-07 08:49:01 -08:00
Jonathan Boulle 8f1885a398 discovery: add command line flag for discovery-proxy 2014-11-06 16:35:24 -08:00
Xiang Li c5e6053fcd Merge pull request #1638 from xiangli-cmu/better_logging
etcdserver: better logging for clusterFromPeerURLs
2014-11-06 14:33:53 -08:00
Xiang Li eb0d80767e etcdserver: better logging for clusterFromPeerURLs 2014-11-06 14:28:07 -08:00
Yicheng Qin 457b30e585 etcdserver: add/remove sender in sendhub explicitly 2014-11-06 14:04:14 -08:00
Yicheng Qin 1e05cd75c7 etcdserver: refactor sender
1. restrict the number of inflight connections to remote member
2. support stop
2014-11-06 14:04:14 -08:00
Jonathan Boulle 04f6208ace etcdmain: use StringsFlag for initialclusterstate 2014-11-06 11:13:24 -08:00
Xiang Li 4ed60471fe Merge pull request #1627 from xiangli-cmu/validate_peer_url
etcdserver: validate peerurl when adding members
2014-11-06 10:43:22 -08:00
Xiang Li bd2b18b6de etcdserver: validate peerurl when adding members 2014-11-05 23:12:48 -08:00
Jonathan Boulle 68bca981de discovery: simplify interface
There's no real need to expose a Discoverer interface/struct when the
only use of the interface (and indeed the module) is to invoke a single
function. This isn't Java, after all. So instead, simplify to Discovery
exposing just two functions: JoinCluster (i.e. what was formerly called
"discovery"), and GetCluster (hitherto "ProxyDiscovery")
2014-11-05 22:45:01 -08:00
Xiang Li 99b1af40c6 etcdserver: move config validation to cluster 2014-11-05 17:55:07 -08:00
Xiang Li 3fc6f9c24f Merge pull request #1586 from xiangli-cmu/fix_node
*: add Advance interface to raft.Node
2014-11-05 15:09:51 -08:00
Xiang Li 0d7c43d885 *: add a Advance interface to raft.Node
Node set the applied to committed right after it sends out Ready to application. This is not
correct since the application has not actually applied the entries at that point. We add a
Advance interface to Node. Application needs to call Advance to tell raft Node its progress.
Also this change can avoid unnecessary copying when application is still applying entires but
there are more entries to be applied.
2014-11-05 15:04:14 -08:00
Yicheng Qin c5140d5c18 Merge pull request #1614 from yichengq/194
*: handle panic and fatal organizedly
2014-11-05 14:08:35 -08:00
Yicheng Qin 791b2fd503 *: handle panic and fatal more consistently
1. etcd fatals if there is critical error in the system and operator should
do something for it
2. etcd panics if there happens something unexpected, and it should be
reported to us to debug.
2014-11-05 13:53:24 -08:00
Jonathan Boulle 89eac70d09 proxy: add docstrings 2014-11-05 10:30:05 -08:00
Xiang Li f71c247d87 Merge pull request #1604 from xiangli-cmu/fallback_proxy
*: support discovery fallback
2014-11-04 16:41:28 -08:00
Jonathan Boulle e4d0c25365 etcdserver: log adding and removing nodes 2014-11-04 15:05:15 -08:00
Xiang Li 5cb13fd071 *: support discovery fallback 2014-11-04 14:30:22 -08:00
Yicheng Qin 866ec5948c etcdhttp/etcdserver: support HEAD on /v2/keys/ namespace 2014-11-04 00:06:49 -08:00
Yicheng Qin 5ed5d44652 etcdserver: print out initial cluster members
It is moved from etcdmain pkg because the line should only be printed out
when etcd bootstraps at the first time.
2014-11-03 19:34:24 -08:00
Yicheng Qin 5da481213e Merge pull request #1478 from unihorn/190
etcdserver: panic on storage error
2014-11-03 11:07:55 -08:00
Yicheng Qin 433b4138c5 etcdserver: panic on storage error
It is a critical error to etcd, and etcd is not able to recover it now.
2014-11-03 10:46:04 -08:00
Brian Waldon 6e038e02a6 etcdserver: fix logging of IDs 2014-10-31 12:26:53 -07:00
Yicheng Qin e02ef6b141 Merge pull request #1546 from unihorn/198
etcdserver: better logging for assign ids from upstream
2014-10-31 11:13:43 -07:00
Yicheng Qin 2c5f062b7f etcdserver: better logging for assign ids from upstream 2014-10-31 11:06:31 -07:00
Jonathan Boulle 55c92ad456 *: create ID type
This creates a simple ID type (wrapped around uint64) to provide for
standard serialization/deserialization to a string (i.e. base 16
encoded). This replaces strutil so now that package is removed.
2014-10-31 10:34:07 -07:00
Yicheng Qin aa50af1c69 *: clean log.Print
1. only log things by default that the operator of etcd may need to react to
2. put package name at the head of log lines
2014-10-30 18:15:53 -07:00
Xiang Li 3dfb6723b2 *: rename initial-cluster-name to initial-cluster-token 2014-10-30 13:43:38 -07:00
Brian Waldon 480e92d340 strutil: move IDAsHex/IDFromHex to new pkg 2014-10-27 18:39:09 -07:00
Brian Waldon 2472953939 etcdhttp: hex-encode member ID 2014-10-27 17:25:22 -07:00
Brian Waldon 80172c3d4a etcdserver: s/parseMemberID/mustParseMemberIDFromKey/ 2014-10-27 17:25:00 -07:00
Yicheng Qin ee27846d5b Merge pull request #1422 from unihorn/187
etcdserver: parse context error for better message
2014-10-27 11:00:01 -07:00
Yicheng Qin e77f8e311c etcdserver: parse context error for better message 2014-10-27 10:59:15 -07:00
Xiang Li 009b737cef *: better logging 2014-10-26 08:13:03 -07:00
Jonathan Boulle 719c57a29d proxy: retrieve ClientURLs from cluster
This is a simple solution to having the proxy keep up to date with the
state of the cluster. Basically, it uses the cluster configuration
provided at start up (i.e. with `-initial-cluster-state`) to determine
where to reach peer(s) in the cluster, and then it will periodically hit
the `/members` endpoint of those peer(s) (using the same mechanism that
`-cluster-state=existing` does to initialise) to update the set of valid
client URLs to proxy to.

This does not address discovery (#1376), and it would probably be better
to update the set of proxyURLs dynamically whenever we fetch the new
state of the cluster; but it needs a bit more thinking to have this done
in a clean way with the proxy interface.

Example in Procfile works again.
2014-10-24 15:54:12 -07:00
Yicheng Qin 49a68adcf1 Merge pull request #1386 from unihorn/184
etcdserver: update member attribute when apply request
2014-10-24 12:46:11 -07:00
Yicheng Qin ea0bff80c0 etcdserver: update member attribute when apply request 2014-10-24 12:43:53 -07:00
Yicheng Qin 08593bcdf6 etcdserver: support newly-join member bootstrap 2014-10-24 12:38:44 -07:00
Yicheng Qin 4d80f01201 etcdserver: Cluster.IsIDremoved -> Cluster.IsIDRemoved 2014-10-23 14:29:58 -07:00
Yicheng Qin 8eee8c260e etcdserver: rebase on master and code clean 2014-10-23 13:58:55 -07:00
Yicheng Qin f8b8bdeb17 etcdserver: use path.Join for member key in cluster 2014-10-23 13:27:54 -07:00
Yicheng Qin 3d243baacd etcdserver: generate id when new cluster 2014-10-23 13:27:54 -07:00
Yicheng Qin 67412e07f8 etcdserver: MemberFromName -> MemberByName 2014-10-23 13:27:54 -07:00
Yicheng Qin 89572b5fd7 etcdserver: refactor cluster and clusterStore
Integrate clusterStore into cluster, and let cluster become the source of
cluster info.
2014-10-23 13:27:54 -07:00
Brandon Philips ab90369f9e etcdserver: use hex for cluster and machine id
Continue using hex everywhere. Including here.

TODO: cleanup the printing of the structs which currently have decimal
to/from:

`{Type:MsgAppResp To:9973738105406047488 From:17050684879817348455 T...`
2014-10-22 16:24:50 -07:00
Barak Michener e42d65da12 etcdserver: Check the initial cluster settings after checking if the WAL exists 2014-10-22 18:16:43 -04:00
Barak Michener 13656eb4e7 Merge pull request #1340 from barakmich/better_ids2
etcdserver: Calculate IDs based on PeerURLs and --initial-cluster-name
2014-10-22 14:49:49 -04:00
Yicheng Qin 89b032cd69 etcdserver: Member.storeKey -> memberStoreKey 2014-10-22 11:09:36 -07:00
Yicheng Qin 7498234e40 etcdserver: record removed member to check incoming message 2014-10-22 11:09:35 -07:00
Barak Michener 502a3c2460 Refactor Cluster to hold and add members. 2014-10-22 13:52:42 -04:00
Barak Michener 1ca7c031ff first round of comments
Conflicts:
	etcdserver/config.go
	etcdserver/config_test.go
	etcdserver/server.go
	main.go
2014-10-22 13:49:54 -04:00
Barak Michener 456d1ebcae etcdserver: Calculate IDs for nodes solely on PeerURLs
Removes the notion of name being anything more than advisory or
command-line grouping, and adds checks for bootstrapping the command
line. IDs are consistent if the URLs are consistent.
2014-10-22 13:49:54 -04:00
Yicheng Qin e2b6a4fc4c etcdserver: const XXXDir -> StoreXXXPrefix
and code clean
2014-10-21 16:10:49 -07:00
Yicheng Qin 2ff3cac653 etcdserver/etcdhttp: store location adjustment
Detailed adjustment:
/_etcd/machines/* -> /0/members/*
/* -> /1/*

And it keeps key path returned to user the same as before.
2014-10-21 16:10:19 -07:00
Xiang Li 894e678ad6 etcdserver: checking clusterID 2014-10-21 11:05:24 -07:00
Xiang Li a44849deec Merge pull request #1286 from coreos/clusterid
*: generate clusterid
2014-10-20 19:07:03 -07:00
Xiang Li 0fd28169c8 etcdserver: use id,cid 2014-10-20 16:35:41 -07:00
Xiang Li dc68dc9ebd etcdserver: add a todo for clusterid generation 2014-10-20 16:26:31 -07:00