Commit Graph

209 Commits (c8ffa36d9e473a9cb2d570f9176c9109b066df68)

Author SHA1 Message Date
Hitoshi Mitake 6974fc63ed etcdserver: avoid deadlock caused by adding members with wrong peer URLs
Current membership changing functionality of etcd seems to have a
problem which can cause deadlock.

How to produce:
 1. construct N node cluster
 2. add N new nodes with etcdctl member add, without starting the new members

What happens:
After finishing add N nodes, a total number of the cluster becomes 2 *
N and a quorum number of the cluster becomes N + 1. It means
membership change requires at least N + 1 nodes because Raft treats
membership information in its log like other ordinal log append
requests.

Assume the peer URLs of the added nodes are wrong because of miss
operation or bugs in wrapping program which launch etcd. In such a
case, both of adding and removing members are impossible because the
quorum isn't preserved. Of course ordinal requests cannot be
served. The cluster would seem to be deadlock.

Of course, the best practice of adding new nodes is adding one node
and let the node start one by one. However, the effect of this problem
is so serious. I think preventing the problem forcibly would be
valuable.

Solution:
This patch lets etcd forbid adding a new node if the operation changes
quorum and the number of changed quorum is larger than a number of
running nodes. If etcd is launched with a newly added option
-strict-reconfig-check, the checking logic is activated. If the option
isn't passed, default behavior of reconfig is kept.

Fixes https://github.com/coreos/etcd/issues/3477
2015-09-13 09:31:53 +09:00
Dmitry Smirnov b2f4a5f587 *: fix spelling issues (codespell).
Signed-off-by: Dmitry Smirnov <onlyjob@member.fsf.org>
2015-09-11 10:22:29 +10:00
Raoof Mohammed 2de1c36061 etcdmain: Proxy doesnt specify - listening on http or https
etcdmain: Proxy doesnt specify - listening on http or https

Fixes #3464
2015-09-08 17:19:23 -04:00
Xiang Li 7957677cf2 etcdmain: proxy does not need to belong to the discovered cluster 2015-09-01 11:24:02 -07:00
Xiang Li d94e712d91 *: support wal dir 2015-09-01 09:54:27 -07:00
Yicheng Qin 58455a2ae4 etcdmain: check error before assigning peer transport
Or it may panic when new transport fails, e.g., TLS info is invalid.
2015-08-25 22:04:26 -07:00
Yicheng Qin 2ac9a329ab etcdmain: stop setting GOMAXPROCS explicitly
We always want to use GOMAXPROCS() as the way go parses it. When in go1.4, we
want to expose GOMAXPROCS value, so we set GOMAXPROCS explicitly as the
way go 1.4 does and print it out.

But it becomes a problem when go 1.5 changes the way to set GOMAXPROCS.

Fix the problem by stop setting GOMAXPROCS and get its value directly.

Due to this change, it sets default GOMAXPROCS to the
number of CPUs available when compiling in go 1.5, which matches how go 1.5 works:
https://docs.google.com/document/d/1At2Ls5_fhJQ59kDK2DFVhFu3g5mATSXqqV5QrxinasI/edit

This is a behavior change in etcd 2.2.
2015-08-25 13:38:16 -07:00
Xiang Li 92634356c1 *: use limitedListener from golang 2015-08-20 20:02:35 -07:00
Xiang Li 6b77c146ec etcdmain: print out version information on startup 2015-08-20 14:50:16 -07:00
Yicheng Qin ffae601af5 etcdmain: calculate dial timeout for peer transport
This helps peer communication in globally-deployed cluster.
2015-08-17 16:52:53 -07:00
Yicheng Qin c3d4d11402 etcdhttp: adjust request timeout based on config
It uses heartbeat interval and election timeout to estimate the
expected request timeout.

This PR helps etcd survive under high roundtrip-time environment,
e.g., globally-deployed cluster.
2015-08-12 09:22:59 -07:00
Xiang Li a718329ad3 Merge pull request #3248 from xiang90/v3
initial v3 demo
2015-08-10 13:59:03 -07:00
Xiang Li 6c58333969 etcdmain: use default formatter
The default formatter would use syslog style when running
under init system, and would use pretty format otherwise.
2015-08-10 13:38:22 -07:00
Xiang Li f004b4dac7 *: etcdserver supports v3 demo 2015-08-08 05:58:29 -07:00
Xiang Li 1b572ae2dd etcdmain: fix path printing 2015-08-06 15:53:24 -07:00
Xiang Li 7314310aed Merge pull request #3233 from xiang90/srv_discovery
better dns discovery error and doc
2015-08-06 14:35:22 -07:00
Yicheng Qin 2c2249dadc Merge pull request #3219 from yichengq/limit-listener
etcdmain: stop accepting client conns when it reachs limit
2015-08-06 12:17:49 -07:00
Yicheng Qin 97923ca3fc etcdmain: close client conns when it exceeds limit
This solves the problem that etcd may fatal because its critical path
cannot get file descriptor resource when the number of clients is too
big. The PR lets the client listener close client connections
immediately after they are accepted when
the file descriptor usage in the process reaches some pre-set limit, so
it ensures that the internal critical path could always get file
descriptor when it needs.

When there are tons to clients connecting to the server, the original
behavior is like this:

```
2015/08/4 16:42:08 etcdserver: cannot monitor file descriptor usage
(open /proc/self/fd: too many open files)
2015/08/4 16:42:33 etcdserver: failed to purge snap file open
default2.etcd/member/snap: too many open files
[halted]
```

Current behavior is like this:

```
2015/08/6 19:05:25 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:25 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:26 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:27 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:28 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:28 etcdserver: 80% of the file descriptor limit is
used [used = 873, limit = 1024]
```

It is available at linux system today because pkg/runtime only has linux
support.
2015-08-06 12:03:20 -07:00
Xiang Li 203e0f178b etcdmian: better error for srv discovery failure 2015-08-06 11:38:53 -07:00
Xiang Li 0cbac56fa2 etcdmain: support sdnotify for readiness 2015-07-31 13:33:18 +08:00
Xiang Li 6be02ff5ec etcdmian: fix initialization confilct
Fix #3142

Ignore flags if etcd is already initialized.
2015-07-21 12:53:21 -07:00
Yicheng Qin 24db661401 etcdmain: warn when listening on HTTP if TLS is set
If the user sets TLS info, this implies that he wants to listen on TLS.
If etcd finds that urls to listen is still HTTP schema, it prints out
warning to notify user about possible wrong setting.
2015-07-21 12:53:21 -07:00
Xiang Li dc3f7f5d90 *: detect duplicate name for discovery bootstrap 2015-07-21 12:53:20 -07:00
Xiang Li dedabddcb3 etcdmain: proxy ignores discovery if it is initialized 2015-07-10 12:52:24 -07:00
Michal Witkowski 7bca757d09 *: add metrics to `store` and `proxy`. 2015-07-07 16:01:51 +01:00
Wolfgang Ebner 1264dbe24d proxy: added endpoint refresh and timeout configuration values
the default dial timeout was set to 30 seconds this made the proxy a pain to use
in failure scenarios.

fixes 2862
2015-06-13 09:42:18 +02:00
Xiang Li 6c8b32d316 etcdmain: exit if discovery fails
Fix #2919

If discovery fails, etcd will hang there and does nothing. This
commit fixes the problem.
2015-06-11 15:45:00 -07:00
Yicheng Qin 1764837783 etcdmain: clean up plog.Printf
Put it into different log levels.
2015-06-11 10:24:02 -07:00
Yicheng Qin 5a9c2851a7 etcdmain: var log -> plog
So the variable name doesn't mess up with standard package name.
2015-06-10 16:19:06 -07:00
Yicheng Qin 0589afe605 etcdmain: increase maxIdleConnsPerHost in proxy transport
This PR set maxIdleConnsPerHost to 128 to let proxy handle 128 concurrent
requests in long term smoothly.
If the number of concurrent requests is bigger than this value,
proxy needs to create one new connection when handling each request in
the delta, which is bad because the creation consumes resource and may
eat up your ephemeral port.
2015-06-01 16:19:36 -07:00
Alex Altair 6f8c36c2ab etcdmain: use double-dash in message flag 2015-05-28 13:09:44 -07:00
Xiang Li 7875de7d2f etcdmian: remove main prefix in logging
We are using new log pkg, which adds the prefix for us.
2015-05-27 10:01:22 -07:00
Prashanth Balasubramanian 1e15b05e4c etcdmain: explicitly set gomaxprocs and log its value 2015-05-27 09:53:05 -07:00
Yicheng Qin a6a649f1c3 etcdserver: stop exposing Cluster struct
After this PR, only cluster's interface Cluster is exposed, which makes
code much cleaner. And it avoids external packages to rely on cluster
struct in the future.
2015-05-13 10:01:25 -07:00
Yicheng Qin 032db5e396 *: extract types.Cluster from etcdserver.Cluster
The PR extracts types.Cluster from etcdserver.Cluster. types.Cluster
is used for flag parsing and etcdserver config.

There is no need to expose etcdserver.Cluster public, which contains
lots of etcdserver internal details and methods. This is the first step
for it.
2015-05-12 14:53:11 -07:00
Xiang Li 91cbf47a2a etcdmain: better error msg when detected duplicate id in discovery 2015-05-11 17:34:44 -07:00
Yicheng Qin 3f90394fbb etcdmain: advertise-client-urls must be set if listen-client-urls is set
Before this PR, people can set listen-client-urls without setting
advertise-client-urls, and leaves advertise-client-urls as default
localhost value. The client libraries which sync the cluster info
fetch wrong advertise-client-urls and cannot connect to the cluster.
This PR avoids this case and provides better UX.

On the other hand, this change is safe because people always want to set
advertise-client-urls if listen-client-urls is set. The default localhost
advertise url cannot be accessed from the outside, and should always be
set except that etcd is bootstrapped with no flag.
2015-04-29 09:52:15 -07:00
Barak Michener ad8e3ea5dc etcdmain: fix logging flag documentation 2015-04-28 16:31:19 -04:00
Barak Michener b369cf037a etcdmain: New Logging Package
use capnslog

Vendor capnslog and set the flags in etcd main

remove package prefix from etcdmain
2015-04-28 15:42:32 -04:00
Yicheng Qin 1811701427 Revert "etcdserver: fix cluster fallback recovery"
This reverts commit cff005777a.

Conflicts:
	etcdserver/server.go
2015-04-19 11:34:33 -07:00
Yicheng Qin 0ac05e310e etcdmain: print error when non-flag args remain 2015-03-23 11:23:47 -07:00
Brandon Philips ea72f2637c etcdmain: let user provide a name w/o initial-cluster update
Currently this doesn't work if a user wants to try out a single machine
cluster but change the name for whatever reason. This is because the
name is always "default" and the

```
./bin/etcd -name 'baz'
```

This solves our problem on CoreOS where the default is `ETCD_NAME=%m`.
2015-03-18 17:24:52 -07:00
Xiang Li 1ab68902a9 etcdmain: identify data dir type 2015-03-17 16:10:58 -07:00
Yicheng Qin 2c94e2d771 *: make dial timeout configurable
Dial timeout is set shorter because
1. etcd is supposed to work in good environment, and the new value is long
enough
2. shorter dial timeout makes dial fail faster, which is good for
performance
2015-02-28 11:18:59 -08:00
Xiang Li 7bf615aee0 *: drop old metrics pkg 2015-02-28 11:16:41 -08:00
Xiang Li 2e078582f9 etcdmain: expose runtime metrics 2015-02-28 10:11:53 -08:00
Yicheng Qin cff005777a etcdserver: fix cluster fallback recovery
Cluster and transport may recover to old states when new node joins
the cluster. Record cluster last modified index to avoid this.
2015-02-20 14:30:00 -08:00
Xiang Li c5ca1218f3 etcdserver: GetClusterFromPeers -> GetClusterFromRemotePeers 2015-02-13 19:05:29 -08:00
Fabian Reinartz 8bf795dc3c etcdmain/osutil: shutdown gracefully, interrupt handling
The functionality in pkg/osutil ensures that all interrupt handlers finish
and the process kills itself with the proper signal.
Test for interrupt handling added.
The server shutsdown gracefully by stopping on interrupt (Issue #2277.)
2015-02-13 10:28:53 +01:00
Barak Michener fade9b6065 etcdserver: Refactor 2.0.1 directory rename into a proper migration
fix all instances

fix detection test
2015-02-12 11:53:19 -05:00
Yicheng Qin 92b329fdb9 etcdmain: use symlink instead of link for v0.4 files
link doesn't support directory.
2015-02-03 10:59:43 -08:00
Yicheng Qin afb14a3e7a Merge pull request #2210 from yichengq/316
etcdmain: use /member subdir to save member data
2015-02-02 17:06:30 -08:00
Yicheng Qin ce1d7a9fa9 etcdmain: use /member subdir to save member data 2015-02-02 17:01:19 -08:00
Xiang Li fbabcedcc9 etcd: fix proxy
1. move proxy datadir to /proxy subdir.
2. delay update proxy's cluster after validation.
2015-02-02 14:58:45 -08:00
Xiang Li ae9f54c132 etcd: fix proxy updating 2015-01-30 16:56:41 -08:00
Xiang Li dc7374c488 etcd: persist proxy cluster to disk 2015-01-30 15:18:26 -08:00
Xiang Li 9c7f66c5d9 Merge pull request #2119 from sorah/peer-ca-on-fetching-members
etcdserver: User peerTLSInfo to get cluster member
2015-01-26 10:50:44 -08:00
Shota Fukumori (sora_h) 033e7d1db9 etcdserver: User peerTLSInfo to get cluster member 2015-01-27 03:43:21 +09:00
Jonathan Boulle f1ed69e883 *: switch to line comments for copyright
Build tags are not compatible with block comments.
Also adds copyright header to a few places it was missing.
2015-01-26 09:53:30 -08:00
Xiang Li 276a4abac0 etcdserver: make heartbeat/election configurable 2015-01-15 11:11:33 -08:00
Yicheng Qin dc6aef0d02 etcdhttp: add NewPeerHandler test 2015-01-12 15:56:29 -08:00
Xiang Li a15f39e6a2 etcdmain: do not set timeout for client api 2015-01-06 16:17:56 -08:00
Xiang Li 7f1c630a0b *: use keepalive listener to detect dead clients 2015-01-06 12:09:34 -08:00
Xiang Li 0afbca4090 etcdmain: add readtimeout for http server 2015-01-06 11:04:38 -08:00
Xiang Li 08e9c25ea5 *: move srv into pkg discovery 2014-12-24 21:37:20 -08:00
Xiang Li 0fa754d90e etcdmain: add config.go 2014-12-19 18:33:19 -08:00
Barak Michener 4f2d35679e Merge pull request #1947 from barakmich/dns_bootstrap
add capability to bootstrap from DNS SRV
2014-12-19 13:45:03 -08:00
Barak Michener 8fc17147ef change logging 2014-12-19 16:40:29 -05:00
Barak Michener 6295dfba5a resolve all hostnames in DNS discovery 2014-12-18 19:19:21 -05:00
Barak Michener 977c74069c move constants out for windows 2014-12-18 18:57:11 -05:00
Barak Michener 2dfcf053d4 rename flag to discovery-srv 2014-12-18 18:13:40 -05:00
Barak Michener 7f733ad68b Fully resolve DNS entries to IPs and ignore single errors (such as no etcd-server-ssl) 2014-12-18 18:08:56 -05:00
Barak Michener fc70aa27d2 add apurl checking and logging 2014-12-17 20:53:12 -05:00
Barak Michener 04d9f848a7 fix from comments 2014-12-17 20:28:48 -05:00
Barak Michener fdad6630ea Add a simple test and mock for genDNS 2014-12-17 20:18:41 -05:00
Barak Michener af4272848d add capability to bootstrap from DNS 2014-12-15 19:26:42 -05:00
Xiang Li ec777ebd28 Merge pull request #1918 from xiang90/http_no_logging
etcdmain: discard the http server logging
2014-12-11 16:06:58 -08:00
Xiang Li 3a83ab1b71 etcdmain: discard the http server logging 2014-12-11 16:06:28 -08:00
Xiang Li d9b21c79d4 etcdmain: better logging for discovery error 2014-12-11 16:03:27 -08:00
Xiang Li 0416503124 Merge pull request #1803 from junxu/master
etcdmain: Fix misuse "-addr" flag
2014-12-11 09:45:17 -08:00
Xiang Li a1f648e5db etcdmain: format usage 2014-12-04 17:21:23 -08:00
Xiang Li d3db010190 *: support purging old wal/snap files 2014-12-01 11:50:17 -08:00
junxu 43d6f9f964 Update etcd.go
etcdmain: Fix misuse "-addr" flag

In code, it uses "-advertise-client-urls" or "-addr" flags to get the list of this member's peer URLs, 
It should be using "-peer-addr" flag instead of "-addr" flag.
2014-11-27 10:38:47 +08:00
Yicheng Qin 3e55834c38 *: set read/write timeout for raft transport and listener 2014-11-24 13:46:44 -08:00
Xiang Li 8bf71d796e *: gracefully stop etcdserver 2014-11-14 14:12:24 -08:00
Xiang Li 92096dfdc3 *: print out configuration when necessary 2014-11-13 10:46:42 -08:00
Jonathan Boulle 1197c1f965 etcdserver: move peer URLs check to config 2014-11-12 13:12:49 -08:00
Jonathan Boulle 3f358b6d5d etcdserver: ensure initial-advertise-peer-urls match initial-cluster
This adds a check to setupCluster to ensure that the list of URLs
specified in `initial-advertise-peer-urls` matches those configured in
`initial-cluster` for this node. Also updates the documentation to
clarify this and address some changes in wording.
2014-11-12 12:54:35 -08:00
Xiang Li b6f0c789b8 transport: create a tls listener only if the tlsInfo is not empty and the scheme is HTTPS 2014-11-11 11:51:57 -08:00
Jonathan Boulle e1e454f138 etcdmain: do not exit inappropriately 2014-11-10 12:34:14 -08:00
Jonathan Boulle 8799679083 etcdmain: actually return errors 2014-11-10 11:59:59 -08:00
Jonathan Boulle a607e097c6 etcdserver: re-order ServerConfig fields 2014-11-07 11:45:59 -08:00
Xiang Li 0a9c6164af etcdserver: add support for force cluster 2014-11-07 08:49:01 -08:00
Jonathan Boulle 376268391b Merge pull request #1646 from jonboulle/1536_disco_proxy
discovery: add command line flag for discovery-proxy
2014-11-07 08:32:23 -08:00
Jonathan Boulle 8f1885a398 discovery: add command line flag for discovery-proxy 2014-11-06 16:35:24 -08:00
Jonathan Boulle 321d65c4ac pkg: fix SetFlagsFromEnv behaviour
This function was fundamentally buggy, as a panic could be trivially
triggered by setting the wrong environment variable (e.g.
ETCD_BIND_ADDR=foo). Instead, let's propagate the error and present it
to the user in a cleaner way.
2014-11-06 14:39:30 -08:00
Jonathan Boulle 04f6208ace etcdmain: use StringsFlag for initialclusterstate 2014-11-06 11:13:24 -08:00
Jonathan Boulle 68bca981de discovery: simplify interface
There's no real need to expose a Discoverer interface/struct when the
only use of the interface (and indeed the module) is to invoke a single
function. This isn't Java, after all. So instead, simplify to Discovery
exposing just two functions: JoinCluster (i.e. what was formerly called
"discovery"), and GetCluster (hitherto "ProxyDiscovery")
2014-11-05 22:45:01 -08:00
Jonathan Boulle b85496922f etcdmain: simplify proxy start logic 2014-11-05 11:41:03 -08:00
Jonathan Boulle 5de9d38cc6 pkg: move to more generic StringsFlag 2014-11-04 16:52:56 -08:00
Xiang Li 71acd0c3d0 discovery: consolidate proxyDiscover and Discover interface 2014-11-04 16:38:05 -08:00
Xiang Li 5cb13fd071 *: support discovery fallback 2014-11-04 14:30:22 -08:00
Yicheng Qin e4b12a8e28 Merge pull request #1593 from unihorn/200
etcdserver: print out initial cluster members
2014-11-03 22:23:40 -08:00
Yicheng Qin 5ed5d44652 etcdserver: print out initial cluster members
It is moved from etcdmain pkg because the line should only be printed out
when etcd bootstraps at the first time.
2014-11-03 19:34:24 -08:00
Xiang Li 075ab6415f Merge pull request #1587 from xiangli-cmu/fix_wal
wal: sync before returning from create
2014-11-03 15:58:47 -08:00
Xiang Li dd09042632 etcdserver: try to listen on ports before initializing etcd server 2014-11-03 15:55:58 -08:00
Kelsey Hightower 3ec4da6ac6 etcd: print initial cluster members during startup
etcd now prints the initial clusters members during startup.

```
2014/11/03 10:32:46 etcd: initial cluster members: etcd0=http://127.0.0.1:2380,etcd1=http://127.0.0.1:2390,etcd2=http://127.0.0.1:2400
```
2014-11-03 10:38:18 -08:00
Xiang Li 3dfb6723b2 *: rename initial-cluster-name to initial-cluster-token 2014-10-30 13:43:38 -07:00
Jonathan Boulle cf9dd31daa etcd: move main logic to etcdmain subpackage 2014-10-29 18:43:22 -07:00