Based on the configuration doc, seems these two flags are missing
in the help. So add them and the descriptions are from config.go in
the same directory.
Signed-off-by: Yiqiao Pu <ypu@redhat.com>
We should wrap the blocking function with a closure. And first
creates a go routine to execute the function. Or the inner function
blocks before creating the go routine.
This moves the code to create listener and roundTripper for raft communication
to the same place, and use explicit functions to build them. This prevents
possible development errors in the future.
rafthttp has different requirements for connections created by the
transport for different usage, and this is hard to achieve when giving
one http.RoundTripper. Pass into pkg the data needed to build transport
now, and let rafthttp build its own transports.
Before this PR, the log is
```
2015/09/1 13:18:31 etcdmain: client: etcd cluster is unavailable or
misconfigured
```
It is quite hard for people to understand what happens.
Now we print out the exact reason for the failure, and explains the way
to handle it.
Current membership changing functionality of etcd seems to have a
problem which can cause deadlock.
How to produce:
1. construct N node cluster
2. add N new nodes with etcdctl member add, without starting the new members
What happens:
After finishing add N nodes, a total number of the cluster becomes 2 *
N and a quorum number of the cluster becomes N + 1. It means
membership change requires at least N + 1 nodes because Raft treats
membership information in its log like other ordinal log append
requests.
Assume the peer URLs of the added nodes are wrong because of miss
operation or bugs in wrapping program which launch etcd. In such a
case, both of adding and removing members are impossible because the
quorum isn't preserved. Of course ordinal requests cannot be
served. The cluster would seem to be deadlock.
Of course, the best practice of adding new nodes is adding one node
and let the node start one by one. However, the effect of this problem
is so serious. I think preventing the problem forcibly would be
valuable.
Solution:
This patch lets etcd forbid adding a new node if the operation changes
quorum and the number of changed quorum is larger than a number of
running nodes. If etcd is launched with a newly added option
-strict-reconfig-check, the checking logic is activated. If the option
isn't passed, default behavior of reconfig is kept.
Fixes https://github.com/coreos/etcd/issues/3477
We always want to use GOMAXPROCS() as the way go parses it. When in go1.4, we
want to expose GOMAXPROCS value, so we set GOMAXPROCS explicitly as the
way go 1.4 does and print it out.
But it becomes a problem when go 1.5 changes the way to set GOMAXPROCS.
Fix the problem by stop setting GOMAXPROCS and get its value directly.
Due to this change, it sets default GOMAXPROCS to the
number of CPUs available when compiling in go 1.5, which matches how go 1.5 works:
https://docs.google.com/document/d/1At2Ls5_fhJQ59kDK2DFVhFu3g5mATSXqqV5QrxinasI/edit
This is a behavior change in etcd 2.2.
It uses heartbeat interval and election timeout to estimate the
expected request timeout.
This PR helps etcd survive under high roundtrip-time environment,
e.g., globally-deployed cluster.
This solves the problem that etcd may fatal because its critical path
cannot get file descriptor resource when the number of clients is too
big. The PR lets the client listener close client connections
immediately after they are accepted when
the file descriptor usage in the process reaches some pre-set limit, so
it ensures that the internal critical path could always get file
descriptor when it needs.
When there are tons to clients connecting to the server, the original
behavior is like this:
```
2015/08/4 16:42:08 etcdserver: cannot monitor file descriptor usage
(open /proc/self/fd: too many open files)
2015/08/4 16:42:33 etcdserver: failed to purge snap file open
default2.etcd/member/snap: too many open files
[halted]
```
Current behavior is like this:
```
2015/08/6 19:05:25 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:25 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:26 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:27 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:28 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:28 etcdserver: 80% of the file descriptor limit is
used [used = 873, limit = 1024]
```
It is available at linux system today because pkg/runtime only has linux
support.