vitalif/etcd - etcd

Commit Graph

Author	SHA1	Message	Date
Yicheng Qin	49d262185d	Merge pull request #3590 from yichengq/discovery-log etcdmain: improve log when join discovery fails	2015-09-29 08:02:18 -07:00
Yicheng Qin	939aa96a34	etcdmain: improve log when join discovery fails Before this PR, the log is ``` 2015/09/1 13:18:31 etcdmain: client: etcd cluster is unavailable or misconfigured ``` It is quite hard for people to understand what happens. Now we print out the exact reason for the failure, and explains the way to handle it.	2015-09-28 23:23:50 -07:00
Yicheng Qin	dc9a75df1c	etcdmain: exit after print out ErrDuplicateID etcd should exit after printing log for unhandlable error.	2015-09-25 14:10:50 -07:00
Xiang Li	3b70bf87c3	etcdmain: better logging when user forget to set initial flags	2015-09-21 10:43:26 -07:00
Xiang Li	662b4966d0	Merge pull request #3510 from xiang90/v3_raft etcdmain: support gRPC addr flag	2015-09-12 22:58:08 -07:00
Xiang Li	a0cfcf2dd7	etcdmain: support gRPC addr flag	2015-09-12 22:52:51 -07:00
Hitoshi Mitake	6974fc63ed	etcdserver: avoid deadlock caused by adding members with wrong peer URLs Current membership changing functionality of etcd seems to have a problem which can cause deadlock. How to produce: 1. construct N node cluster 2. add N new nodes with etcdctl member add, without starting the new members What happens: After finishing add N nodes, a total number of the cluster becomes 2 * N and a quorum number of the cluster becomes N + 1. It means membership change requires at least N + 1 nodes because Raft treats membership information in its log like other ordinal log append requests. Assume the peer URLs of the added nodes are wrong because of miss operation or bugs in wrapping program which launch etcd. In such a case, both of adding and removing members are impossible because the quorum isn't preserved. Of course ordinal requests cannot be served. The cluster would seem to be deadlock. Of course, the best practice of adding new nodes is adding one node and let the node start one by one. However, the effect of this problem is so serious. I think preventing the problem forcibly would be valuable. Solution: This patch lets etcd forbid adding a new node if the operation changes quorum and the number of changed quorum is larger than a number of running nodes. If etcd is launched with a newly added option -strict-reconfig-check, the checking logic is activated. If the option isn't passed, default behavior of reconfig is kept. Fixes https://github.com/coreos/etcd/issues/3477	2015-09-13 09:31:53 +09:00
Dmitry Smirnov	b2f4a5f587	*: fix spelling issues (codespell). Signed-off-by: Dmitry Smirnov <onlyjob@member.fsf.org>	2015-09-11 10:22:29 +10:00
Raoof Mohammed	2de1c36061	etcdmain: Proxy doesnt specify - listening on http or https etcdmain: Proxy doesnt specify - listening on http or https Fixes #3464	2015-09-08 17:19:23 -04:00
Xiang Li	7957677cf2	etcdmain: proxy does not need to belong to the discovered cluster	2015-09-01 11:24:02 -07:00
Xiang Li	d94e712d91	*: support wal dir	2015-09-01 09:54:27 -07:00
Yicheng Qin	58455a2ae4	etcdmain: check error before assigning peer transport Or it may panic when new transport fails, e.g., TLS info is invalid.	2015-08-25 22:04:26 -07:00
Yicheng Qin	2ac9a329ab	etcdmain: stop setting GOMAXPROCS explicitly We always want to use GOMAXPROCS() as the way go parses it. When in go1.4, we want to expose GOMAXPROCS value, so we set GOMAXPROCS explicitly as the way go 1.4 does and print it out. But it becomes a problem when go 1.5 changes the way to set GOMAXPROCS. Fix the problem by stop setting GOMAXPROCS and get its value directly. Due to this change, it sets default GOMAXPROCS to the number of CPUs available when compiling in go 1.5, which matches how go 1.5 works: https://docs.google.com/document/d/1At2Ls5_fhJQ59kDK2DFVhFu3g5mATSXqqV5QrxinasI/edit This is a behavior change in etcd 2.2.	2015-08-25 13:38:16 -07:00
Xiang Li	92634356c1	*: use limitedListener from golang	2015-08-20 20:02:35 -07:00
Xiang Li	6b77c146ec	etcdmain: print out version information on startup	2015-08-20 14:50:16 -07:00
Yicheng Qin	ffae601af5	etcdmain: calculate dial timeout for peer transport This helps peer communication in globally-deployed cluster.	2015-08-17 16:52:53 -07:00
Yicheng Qin	c3d4d11402	etcdhttp: adjust request timeout based on config It uses heartbeat interval and election timeout to estimate the expected request timeout. This PR helps etcd survive under high roundtrip-time environment, e.g., globally-deployed cluster.	2015-08-12 09:22:59 -07:00
Xiang Li	a718329ad3	Merge pull request #3248 from xiang90/v3 initial v3 demo	2015-08-10 13:59:03 -07:00
Xiang Li	6c58333969	etcdmain: use default formatter The default formatter would use syslog style when running under init system, and would use pretty format otherwise.	2015-08-10 13:38:22 -07:00
Xiang Li	f004b4dac7	*: etcdserver supports v3 demo	2015-08-08 05:58:29 -07:00
Xiang Li	1b572ae2dd	etcdmain: fix path printing	2015-08-06 15:53:24 -07:00
Xiang Li	7314310aed	Merge pull request #3233 from xiang90/srv_discovery better dns discovery error and doc	2015-08-06 14:35:22 -07:00
Yicheng Qin	2c2249dadc	Merge pull request #3219 from yichengq/limit-listener etcdmain: stop accepting client conns when it reachs limit	2015-08-06 12:17:49 -07:00
Yicheng Qin	97923ca3fc	etcdmain: close client conns when it exceeds limit This solves the problem that etcd may fatal because its critical path cannot get file descriptor resource when the number of clients is too big. The PR lets the client listener close client connections immediately after they are accepted when the file descriptor usage in the process reaches some pre-set limit, so it ensures that the internal critical path could always get file descriptor when it needs. When there are tons to clients connecting to the server, the original behavior is like this: ``` 2015/08/4 16:42:08 etcdserver: cannot monitor file descriptor usage (open /proc/self/fd: too many open files) 2015/08/4 16:42:33 etcdserver: failed to purge snap file open default2.etcd/member/snap: too many open files [halted] ``` Current behavior is like this: ``` 2015/08/6 19:05:25 transport: accept error: closing connection, exceed file descriptor usage limitation (fd limit=874) 2015/08/6 19:05:25 transport: accept error: closing connection, exceed file descriptor usage limitation (fd limit=874) 2015/08/6 19:05:26 transport: accept error: closing connection, exceed file descriptor usage limitation (fd limit=874) 2015/08/6 19:05:27 transport: accept error: closing connection, exceed file descriptor usage limitation (fd limit=874) 2015/08/6 19:05:28 transport: accept error: closing connection, exceed file descriptor usage limitation (fd limit=874) 2015/08/6 19:05:28 etcdserver: 80% of the file descriptor limit is used [used = 873, limit = 1024] ``` It is available at linux system today because pkg/runtime only has linux support.	2015-08-06 12:03:20 -07:00
Xiang Li	203e0f178b	etcdmian: better error for srv discovery failure	2015-08-06 11:38:53 -07:00
Xiang Li	0cbac56fa2	etcdmain: support sdnotify for readiness	2015-07-31 13:33:18 +08:00
Xiang Li	6be02ff5ec	etcdmian: fix initialization confilct Fix #3142 Ignore flags if etcd is already initialized.	2015-07-21 12:53:21 -07:00
Yicheng Qin	24db661401	etcdmain: warn when listening on HTTP if TLS is set If the user sets TLS info, this implies that he wants to listen on TLS. If etcd finds that urls to listen is still HTTP schema, it prints out warning to notify user about possible wrong setting.	2015-07-21 12:53:21 -07:00
Xiang Li	dc3f7f5d90	*: detect duplicate name for discovery bootstrap	2015-07-21 12:53:20 -07:00
Xiang Li	dedabddcb3	etcdmain: proxy ignores discovery if it is initialized	2015-07-10 12:52:24 -07:00
Michal Witkowski	7bca757d09	*: add metrics to `store` and `proxy`.	2015-07-07 16:01:51 +01:00
Wolfgang Ebner	1264dbe24d	proxy: added endpoint refresh and timeout configuration values the default dial timeout was set to 30 seconds this made the proxy a pain to use in failure scenarios. fixes 2862	2015-06-13 09:42:18 +02:00
Xiang Li	6c8b32d316	etcdmain: exit if discovery fails Fix #2919 If discovery fails, etcd will hang there and does nothing. This commit fixes the problem.	2015-06-11 15:45:00 -07:00
Yicheng Qin	1764837783	etcdmain: clean up plog.Printf Put it into different log levels.	2015-06-11 10:24:02 -07:00
Yicheng Qin	5a9c2851a7	etcdmain: var log -> plog So the variable name doesn't mess up with standard package name.	2015-06-10 16:19:06 -07:00
Yicheng Qin	0589afe605	etcdmain: increase maxIdleConnsPerHost in proxy transport This PR set maxIdleConnsPerHost to 128 to let proxy handle 128 concurrent requests in long term smoothly. If the number of concurrent requests is bigger than this value, proxy needs to create one new connection when handling each request in the delta, which is bad because the creation consumes resource and may eat up your ephemeral port.	2015-06-01 16:19:36 -07:00
Alex Altair	6f8c36c2ab	etcdmain: use double-dash in message flag	2015-05-28 13:09:44 -07:00
Xiang Li	7875de7d2f	etcdmian: remove main prefix in logging We are using new log pkg, which adds the prefix for us.	2015-05-27 10:01:22 -07:00
Prashanth Balasubramanian	1e15b05e4c	etcdmain: explicitly set gomaxprocs and log its value	2015-05-27 09:53:05 -07:00
Yicheng Qin	a6a649f1c3	etcdserver: stop exposing Cluster struct After this PR, only cluster's interface Cluster is exposed, which makes code much cleaner. And it avoids external packages to rely on cluster struct in the future.	2015-05-13 10:01:25 -07:00
Yicheng Qin	032db5e396	*: extract types.Cluster from etcdserver.Cluster The PR extracts types.Cluster from etcdserver.Cluster. types.Cluster is used for flag parsing and etcdserver config. There is no need to expose etcdserver.Cluster public, which contains lots of etcdserver internal details and methods. This is the first step for it.	2015-05-12 14:53:11 -07:00
Xiang Li	91cbf47a2a	etcdmain: better error msg when detected duplicate id in discovery	2015-05-11 17:34:44 -07:00
Yicheng Qin	3f90394fbb	etcdmain: advertise-client-urls must be set if listen-client-urls is set Before this PR, people can set listen-client-urls without setting advertise-client-urls, and leaves advertise-client-urls as default localhost value. The client libraries which sync the cluster info fetch wrong advertise-client-urls and cannot connect to the cluster. This PR avoids this case and provides better UX. On the other hand, this change is safe because people always want to set advertise-client-urls if listen-client-urls is set. The default localhost advertise url cannot be accessed from the outside, and should always be set except that etcd is bootstrapped with no flag.	2015-04-29 09:52:15 -07:00
Barak Michener	ad8e3ea5dc	etcdmain: fix logging flag documentation	2015-04-28 16:31:19 -04:00
Barak Michener	b369cf037a	etcdmain: New Logging Package use capnslog Vendor capnslog and set the flags in etcd main remove package prefix from etcdmain	2015-04-28 15:42:32 -04:00
Yicheng Qin	1811701427	Revert "etcdserver: fix cluster fallback recovery" This reverts commit `cff005777a`. Conflicts: etcdserver/server.go	2015-04-19 11:34:33 -07:00
Yicheng Qin	0ac05e310e	etcdmain: print error when non-flag args remain	2015-03-23 11:23:47 -07:00
Brandon Philips	ea72f2637c	etcdmain: let user provide a name w/o initial-cluster update Currently this doesn't work if a user wants to try out a single machine cluster but change the name for whatever reason. This is because the name is always "default" and the ``` ./bin/etcd -name 'baz' ``` This solves our problem on CoreOS where the default is `ETCD_NAME=%m`.	2015-03-18 17:24:52 -07:00
Xiang Li	1ab68902a9	etcdmain: identify data dir type	2015-03-17 16:10:58 -07:00
Yicheng Qin	2c94e2d771	*: make dial timeout configurable Dial timeout is set shorter because 1. etcd is supposed to work in good environment, and the new value is long enough 2. shorter dial timeout makes dial fail faster, which is good for performance	2015-02-28 11:18:59 -08:00
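
Note on commit 6974fc63ed: the quorum arithmetic in that message can be checked with a short Go sketch. It assumes a simple majority quorum of n/2 + 1 and refuses an add when the quorum of the enlarged cluster would exceed the number of started members. The names `quorum` and `canAddMember` are hypothetical and chosen for illustration; this is not presented as the code the patch added to etcd.

```go
package main

import "fmt"

// quorum returns the majority size for a cluster of n voting members.
func quorum(n int) int {
	return n/2 + 1
}

// canAddMember is a hypothetical helper: it reports whether adding one more
// member keeps the new quorum reachable by the members already started.
// members is the current cluster size, started is how many are running.
func canAddMember(members, started int) bool {
	return started >= quorum(members+1)
}

func main() {
	// Scenario from the commit message with N = 3: three running members,
	// then repeated "member add" calls without starting the new members.
	members, started := 3, 3
	for i := 0; i < 3; i++ {
		ok := canAddMember(members, started)
		fmt.Printf("members=%d started=%d add allowed=%v\n", members, started, ok)
		if !ok {
			break
		}
		members++ // the new member is registered but never started
	}
}
```

Under these assumptions the first two adds pass and the third is refused. Without such a check the cluster would reach 6 members with a quorum of 4 while only 3 are running, which is the 2 * N and N + 1 situation the message describes.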


115 Commits (f8a4d1f01ba987a7541ea40dc9df16641e04020f)