vitalif/etcd - etcd

Commit Graph

Author	SHA1	Message	Date
Xiang Li	0708a5e50d	etcdserver: refactor a for loop in recvSnap test	2015-12-02 15:41:03 -08:00
Xiang Li	67ffeee521	Merge pull request #3946 from xiang90/fix_snap_test etcdserver: get rid of unreliable WaitSchedule	2015-12-02 15:01:20 -08:00
Xiang Li	3ec3ffbef0	etcdserver: get rid of unreliable WaitSchedule In this case, we know we are waiting for an action happened on storage. We can do a busy wait instead of calling waitSchedule. The test previously failed on CI with no observed actions.	2015-12-02 13:18:11 -08:00
ngaut	e142e073e8	v3rpc: Tiny clean up unreachable code	2015-12-02 12:30:21 +08:00
Barak Michener	452e5bffc0	etcdserver: Fix panic for v3 transaction compares on non-existent keys Fixes #3920	2015-11-24 16:49:45 -05:00
Xiang Li	b868f4b1b1	v3rpc: report eventReceived correctly	2015-11-19 22:44:46 -08:00
Xiang Li	3cf90a4dff	v3rpc: do not send closing event When a watch stream closes, both watcher.Chan and closec will be closed. If watcher.Chan is closed, we should not send out the empty event. Sending the empty event is wrong and wastes a lot of CPU resources. Instead we should just return.	2015-11-19 17:56:15 -08:00
Xiang Li	c400d05d0a	Merge pull request #3892 from xiang90/fix_snapshot_handling etcdserver: handle incoming v3 snapshot correctly	2015-11-19 12:18:18 -08:00
Xiang Li	a07e4bb6e2	etcdserver: handle incoming v3 snapshot correctly 1. we should update all kv references (including the one in snapStore). 2. we should first restore a new KV and then close the old one asynchronously.	2015-11-18 16:07:41 -08:00
Gyu-Ho Lee	81229dbea9	*: add missing package descriptions This adds and updates package descriptions in etcd projects. And also deletes some duplicate LICENSE statements.	2015-11-17 20:54:10 -08:00
Xiang Li	b4abe5b584	etcdserver: start real txn for txn request We should open a real txn for applying txn requests. Otherwise the intermediate state might be observed by readers. This also fixes #3803. Using the same consistent (raft) index for multiple independent operations confuses consistentStore.	2015-11-16 21:11:27 -08:00
Xiang Li	a1616afc5d	auth: use canonical path for pre-defined guest role	2015-11-15 17:58:09 -08:00
Xiang Li	ff36b9d9bc	Merge pull request #3700 from xiang90/metrics_hi Replace Summary with Histogram for all metrics	2015-11-10 10:06:45 -08:00
Xiang Li	ae2f69b41e	etcdserver: rename processInternalRaftReq to processInternalRaftRequest We have a structure called InternalRaftRequest. Making the function shorter by calling it processInternalRaftReq seems to be random and reduce the readability. So we just use the full name.	2015-11-09 13:37:36 -08:00
Hitoshi Mitake	2c8ffa6bcb	etcdserver: correct error log for strict reconfig checking This commit fixes an error log caused by the strict reconfig checking option. Before: 14:21:38 etcd2 | 2015-11-05 14:21:38.870356 E | etcdhttp: got unexpected response error (etcdserver: re-configuration failed due to not enough started members) After: log 13:27:33 etcd2 | 2015-11-05 13:27:33.089364 E | etcdhttp: etcdserver: re-configuration failed due to not enough started members The error is not an unexpected thing therefore the old message is incorrect.	2015-11-06 11:03:42 +09:00
Yicheng Qin	0874c44cdc	etcdserver: fix snapshot index in creation log line The snapshot is created at appliedi instead of snapi.	2015-11-05 14:02:09 -08:00
Xiang Li	08f0d94019	Merge pull request #3809 from xiang90/rpc_kv *: refactor kv rpc implementation	2015-11-04 19:05:48 -08:00
Yicheng Qin	ec3c2d23a3	*: update feature maps to adopt v2.3.0	2015-11-04 14:30:35 -08:00
Yicheng Qin	3d15526c35	Merge pull request #3796 from yichengq/fix-get-version etcdserver: not reuse connections for peer transport	2015-11-04 11:39:14 -08:00
Xiang Li	c37bd2385a	*: refactor kv rpc implementation	2015-11-04 11:36:17 -08:00
Yicheng Qin	4ccbcb91c8	rafthttp: add functions to create listener and roundTripper This moves the code to create listener and roundTripper for raft communication to the same place, and use explicit functions to build them. This prevents possible development errors in the future.	2015-11-04 11:12:46 -08:00
Yicheng Qin	32819f6b3f	etcdserver: use roundTripper to request peerURL It uses roundTripper instead of Transport because roundTripper is sufficient for its requirements.	2015-11-04 10:49:42 -08:00
Xiang Li	1a3f7f7fa4	*: rename etcd service to kv service in gRPC	2015-11-04 10:05:49 -08:00
Xiang Li	10de2e6dbe	*: serve watch service Implement watch service and hook it up with grpc server in etcdmain.	2015-11-03 15:58:34 -08:00
Xiang Li	c160085f44	*: add v3 watch service	2015-11-03 14:21:24 -08:00
Yicheng Qin	0eee88a3d9	etcdserver: use timeout transport as peer transport This pairs with remote timeout listeners. etcd uses timeout listener, and times out the accepted connections if there is no activity. So the idle connections may time out easily. Because timeout transport doesn't reuse connections, it prevents using timed-out connections. This fixes the problem that etcd fails to get the version of peers.	2015-11-03 07:58:03 -08:00
Xiang Li	fe165de1d1	Merge pull request #3794 from yichengq/fix-proxy-term etcdmain: fix parsing discovery error	2015-11-02 17:33:47 -08:00
Yicheng Qin	9757dcd3a2	etcdmain: fix parsing discovery error The discovery error is wrapped into a struct now, and cannot be compared to predefined errors. Correct the comparison behavior to fix the problem.	2015-11-02 17:23:06 -08:00
Jonathan Boulle	ee522025b3	etcdserver: restructure auth.Store and auth.User This attempts to decouple password-related functions, which previously existed both in the Store and User structs, by splitting them out into a separate interface, PasswordStore. This means that they can be more easily swapped out during testing. This also changes the relevant tests to use mock password functions instead of the bcrypt-backed implementations; as a result, the tests are much faster. Before: ``` github.com/coreos/etcd/etcdserver/auth 31.495s github.com/coreos/etcd/etcdserver/etcdhttp 91.205s ``` After: ``` github.com/coreos/etcd/etcdserver/auth 1.207s github.com/coreos/etcd/etcdserver/etcdhttp 1.207s ```	2015-10-30 16:33:40 -07:00
Yicheng Qin	7d757bbc8a	etcdserver: extend wait timeout in TestPublishRetry It fixes the failure in semaphore CI: ``` --- FAIL: TestPublishRetry (0.00s) server_test.go:1108: len(action) = 1, want >= 2 ```	2015-10-28 12:07:00 -07:00
Yicheng Qin	099d8674c4	Merge pull request #3746 from yichengq/load-storage etcdserver: fix recovering snapshot from disk	2015-10-27 14:42:41 -07:00
Yicheng Qin	263b270708	etcdserver: commit v3 storage before releasing WAL This ensures that v3 storage could always find the following log entries when restart.	2015-10-26 21:06:08 -07:00
Xiang Li	a8e6e71bf9	*: fix various data races detected by race detector	2015-10-26 20:49:37 -07:00
Yicheng Qin	15ed6d8268	etcdserver: save consistent index into v3 storage This helps to recover consistent index when restart in the future.	2015-10-24 09:27:24 -07:00
Yicheng Qin	cacc0d6432	etcdserver: restore KV snapshot when receiving snapshot When a slow follower receives the snapshot sent from the leader, it should rename the snapshot file to the default KV file path, and restore KV snapshot. Have tested it manually and it works pretty well.	2015-10-23 08:43:26 -07:00
Yicheng Qin	de669be6d6	Merge pull request #3683 from yichengq/raft-block etcdserver: fix raft state machine may block	2015-10-20 09:44:34 -07:00
Yicheng Qin	ab5df57ecf	etcdserver: fix raft state machine may block When snapshot store requests raft snapshot from etcdserver apply loop, it may block on the channel for some time, or wait some time for KV to snapshot. This is unexpected because raft state machine should not be blocked. Even worse, this block may lead to deadlock: 1. raft state machine waits on getting snapshot from raft memory storage 2. raft memory storage waits for snapshot store to get snapshot 3. snapshot store requests raft snapshot from apply loop 4. apply loop is applying entries, and waits for raftNode loop to finish sending messages 5. raftNode loop waits for peer loop in Transport to send out messages 6. peer loop in Transport waits for raft state machine to process message Fix it by changing the logic of getSnap to create the snapshot asynchronously.	2015-10-20 09:19:34 -07:00
Hitoshi Mitake	1b0c65c299	etcdserver: don't allow methods other than GET in /debug/vars Currently, /debug/vars seems to allow all types of methods e.g. PUT, POST, etc. However, this path is a readonly stuff so it should allow GET only.	2015-10-20 17:19:42 +09:00
Xiang Li	32dd4d5de3	Merge pull request #3657 from xiang90/fix_remove etcdserver: skip updating attr if the member does not exist	2015-10-19 13:35:57 -07:00
Xiang Li	d90a47656e	etcdserver: use Histogram for proposal_durations	2015-10-17 12:48:25 -07:00
Yicheng Qin	1f21ccf166	rafthttp: support sending v3 snapshot message Use snapshotSender to send v3 snapshot message. It puts raft snapshot message and v3 snapshot into request body, then sends it to the target peer. When it receives http.StatusNoContent, it knows the message has been received and processed successfully. As receiver, snapHandler saves v3 snapshot and then processes the raft snapshot message, then respond with http.StatusNoContent.	2015-10-13 23:11:28 -07:00
Yicheng Qin	207c92b627	rafthttp: build transport inside pkg instead of passed-in rafthttp has different requirements for connections created by the transport for different usage, and this is hard to achieve when giving one http.RoundTripper. Pass into pkg the data needed to build transport now, and let rafthttp build its own transports.	2015-10-11 21:42:37 -07:00
Yicheng Qin	233e717e2f	rafthttp: expose struct to set configuration transport takes too many arguments and the new function is unable to read. Change the way to set fields in transport struct directly.	2015-10-11 09:02:16 -07:00
Xiang Li	98e30ca7c2	etcdserver: skip updating attr if the member does not exist	2015-10-08 14:07:16 -07:00
Yicheng Qin	f74ff9b867	Merge pull request #3644 from mitake/test-race etcdserver, test: don't access testing.T in time.AfterFunc()'s own go…	2015-10-07 08:34:58 -07:00
Hitoshi Mitake	68dd3ee621	etcdserver, test: don't access testing.T in time.AfterFunc()'s own goroutine time.AfterFunc() creates its own goroutine and calls the callback function in the goroutine. It can cause datarace like the problem fixed in the commit `de1a16e0f1` . This commit also fixes the potential dataraces of tests in etcdserver/server_test.go .	2015-10-06 11:37:08 +09:00
Yicheng Qin	8c94ae0ee3	etcdserver: get existing snapshot instead of requesting one This fixes the problem that proposal cannot be applied. When start the etcdserver.run loop, it expects to get the latest existing snapshot. It should not attempt to request one because the loop is the entity to create the snapshot.	2015-10-05 14:32:16 -07:00
Yicheng Qin	36f4303fc3	storage/etcdserver: update KV.Snapshot function When using Snapshot function, it is expected: 1. know the size of snapshot before writing data 2. split snapshot-ready phase and write-data phase. so we could cut snapshot first and write data later. Update its interface to fit the requirement of etcdserver.	2015-10-03 10:15:23 -07:00
Yicheng Qin	8c0db94fef	Merge pull request #3631 from yichengq/create-snapshot etcdserver: support to create raft snapshot at apply loop	2015-10-03 10:03:27 -07:00
Yicheng Qin	18c568bc82	etcdserver: print out correct restored cluster info Before this PR, it always prints nil because cluster info has not been covered when print: ``` 2015-10-02 14:00:24.353631 I | etcdserver: loaded cluster information from store: <nil> ```	2015-10-02 16:11:32 -07:00
Yicheng Qin	bfe9502f4f	etcdserver: support to create raft snapshot at apply loop and snapStore could trigger it to create the latest raft snapshot.	2015-10-02 13:17:56 -07:00
Yicheng Qin	ccce61bda9	Merge pull request #3614 from yichengq/snapshot-store etcdserver: add snapshotStore and raftStorage	2015-10-01 19:35:34 -07:00
Yicheng Qin	2276328720	etcdserver: add snapshotStore and raftStorage snapshotStore is the store of snapshot, and it supports to get latest snapshot and save incoming snapshot. raftStorage supports to get latest snapshot when v3demo is open.	2015-10-01 19:00:59 -07:00
Xiang Li	715fdfb669	Merge pull request #3093 from mwitkow-io/feature/httpd_metrics add `events` metrics in etcdhttp.	2015-10-01 12:10:58 -07:00
Michal Witkowski	1b2dc1c796	metrics: add `events` metrics in etcdhttp.	2015-10-01 08:11:42 +01:00
Yicheng Qin	a535cf2cad	Merge pull request #3610 from yichengq/load-storage etcdserver: restore v3 storage when restart	2015-09-29 11:58:38 -07:00
Yicheng Qin	49d262185d	Merge pull request #3590 from yichengq/discovery-log etcdmain: improve log when join discovery fails	2015-09-29 08:02:18 -07:00
Yicheng Qin	5d906a0acc	etcdserver: restore v3 storage when restart To load the previous data.	2015-09-29 00:14:27 -07:00
Yicheng Qin	939aa96a34	etcdmain: improve log when join discovery fails Before this PR, the log is ``` 2015/09/1 13:18:31 etcdmain: client: etcd cluster is unavailable or misconfigured ``` It is quite hard for people to understand what happens. Now we print out the exact reason for the failure, and explains the way to handle it.	2015-09-28 23:23:50 -07:00
Xiang Li	6c05a01ec6	Merge pull request #3604 from gyuho/replace_netutil_BasicAuth etcdhttp/auth: BasicAuth method in standard pkg	2015-09-28 15:55:46 -07:00
Gyu-Ho Lee	e16f81838b	etcdhttp/auth: BasicAuth method in standard pkg I created a new PR from https://github.com/coreos/etcd/pull/3598. This is for `TODO: use the standard lib BasicAuth method when we move to Go 1.4.` [1]. `BasicAuth` method got into Go standard package a year ago. [2] --- 1. https://github.com/coreos/etcd/blob/master/pkg/netutil/netutil.go#L126-L138 2. https://codereview.appspot.com/76540043/	2015-09-28 14:02:55 -07:00
Xiang Li	1226838381	etcdhttp: add Content-Type: application/json header to version handler	2015-09-25 15:14:13 -07:00
Gyu-Ho Lee	85f4475f62	httptypes/errors: HTTPError.WriteTo returns error Squashing all commits into this one (from https://github.com/coreos/etcd/pull/357). Thanks,	2015-09-25 08:06:26 -07:00
Xiang Li	2540a3fb7e	etcdserver: mismatch error uses the same format as the corresponding flags	2015-09-21 19:32:10 -07:00
Xiang Li	ea3dbfed60	Merge pull request #3408 from MSamman/extend-auth-api etcdserver: extend auth api	2015-09-21 11:51:19 -07:00
Xiang Li	3b70bf87c3	etcdmain: better logging when user forget to set initial flags	2015-09-21 10:43:26 -07:00
Mohammad Samman	6ae1f6c6e4	etcdserver: extend auth api allow recursive query on users and roles to get more detail Fixes #3278	2015-09-21 00:51:18 -07:00
Yicheng Qin	cedad49dcf	Merge pull request #3543 from mitake/reconfig-remove etcdserver: forbid removing started member if quorum cannot be preserved in strict reconfig mode	2015-09-17 18:22:53 -07:00
Hitoshi Mitake	f8859a980d	etcdserver: forbid removing started member if quorum cannot be preserved in strict reconfig mode Like the commit `6974fc63ed`, this commit lets etcdserver forbid removing started member if quorum cannot be preserved after reconfiguration if the option -strict-reconfig-check is passed to etcd. The removal can cause deadlock if unstarted members have wrong peer URLs.	2015-09-18 10:09:57 +09:00
Xiang Li	ec4142576e	Merge pull request #3534 from xiang90/grpc_err etcdserver: better v3 api error handling	2015-09-16 12:32:28 -07:00
Jonathan Boulle	7848ac3979	*: add missing license headers	2015-09-15 14:09:01 -07:00
Xiang Li	94f4069a25	etcdserver: better v3 api error handling	2015-09-15 11:20:06 -07:00
Yicheng Qin	352cd768c6	etcdserver: fix shadow declaration	2015-09-14 23:25:16 -07:00
Yicheng Qin	05c74bd890	etcdserver: rename db file into a formal directory and rename it to a formal name	2015-09-14 22:41:40 -07:00
Yicheng Qin	51f1ee055e	Merge pull request #3526 from yichengq/snapshot etcdserver: forbid to unset v3 demo once used	2015-09-14 21:36:39 -07:00
Yicheng Qin	1f0fb3d9aa	etcdserver: forbid to unset v3 demo once used After enabling v3 demo, it may change the underlying data organization for v3 store. So we forbid to unset --experimental-v3demo once it has been used.	2015-09-14 21:27:11 -07:00
Xiang Li	94f784826a	*: support v3 compaction	2015-09-14 19:59:36 -07:00
Xiang Li	e0d8923f7b	Merge pull request #3524 from xiang90/grpc_error etcdserver: use gRPC error instead of error message in header	2015-09-14 16:38:44 -07:00
Xiang Li	7183387110	etcdserver: use gRPC error instead of error message in header	2015-09-14 16:11:13 -07:00
Gyu-Ho Lee	c2dcf7431e	etcdserver, store: fix grammars in comments (a->an existing) I found some grammatical errors in comments. This pull request was submitted https://github.com/coreos/etcd/pull/3513. I am resubmitting following the correct guidelines.	2015-09-14 13:41:13 -07:00
Xiang Li	c7b4c67436	Merge pull request #3514 from xiang90/v3_raft support clustered v3 api	2015-09-14 09:35:02 -07:00
Xiang Li	4c81615cef	etcdserver: initial support for cluster-wide v3 request	2015-09-13 08:32:01 -07:00
Xiang Li	600456f4ba	etcdserverpb: update proto file for raftInternalRequest We need to assign each raftInternalRequest an ID for getting the response after it goes through raft. We also need an empty response for the error case.	2015-09-13 08:28:10 -07:00
Hitoshi Mitake	dad32646eb	etcdserver: enhance test cases for isReadyToAddNewMember - a case of a cluster with even number members - a case of an empty cluster	2015-09-13 12:30:10 +09:00
Jonathan Boulle	d9cf752060	etcdserver: add test for isReadyToAddNewMember Also fixed check for special case of one-member cluster	2015-09-13 11:16:08 +09:00
Hitoshi Mitake	6974fc63ed	etcdserver: avoid deadlock caused by adding members with wrong peer URLs Current membership changing functionality of etcd seems to have a problem which can cause deadlock. How to produce: 1. construct N node cluster 2. add N new nodes with etcdctl member add, without starting the new members What happens: After finishing adding N nodes, the total number of members in the cluster becomes 2 * N and the quorum of the cluster becomes N + 1. It means membership change requires at least N + 1 nodes because Raft treats membership information in its log like other ordinary log append requests. Assume the peer URLs of the added nodes are wrong because of misoperation or bugs in a wrapper program which launches etcd. In such a case, both adding and removing members are impossible because the quorum isn't preserved. Of course ordinary requests cannot be served. The cluster would seem to be deadlocked. Of course, the best practice of adding new nodes is adding one node and letting the node start, one by one. However, the effect of this problem is so serious. I think preventing the problem forcibly would be valuable. Solution: This patch lets etcd forbid adding a new node if the operation changes quorum and the number of changed quorum is larger than a number of running nodes. If etcd is launched with a newly added option -strict-reconfig-check, the checking logic is activated. If the option isn't passed, default behavior of reconfig is kept. Fixes https://github.com/coreos/etcd/issues/3477	2015-09-13 09:31:53 +09:00
Xiang Li	95d5556445	etcdserver: refactor v3demo do	2015-09-05 15:31:28 -07:00
Xiang Li	3f18ded10a	*: v3api index->revision	2015-09-04 10:41:20 -07:00
Xiang Li	2ac9af4924	*: replace consistent token with revision in v3 api	2015-09-03 15:41:33 -07:00
Xiang Li	ef7cf058a2	*: update gogoproto	2015-09-03 15:32:25 -07:00
Tamir Duberstein	45390b9fb8	*: regenerate proto to use local import path Using Go-style import paths in protos is not idiomatic. Normally, this detail would be internal to etcd, but the path from which gogoproto is imported affects downstream consumers (e.g. cockroachdb). In cockroach, we want to avoid including `$GOPATH/src` in our protoc include path for various reasons. This patch puts etcd on the same convention, which allows this for cockroach. More information: https://github.com/cockroachdb/cockroach/pull/2339#discussion_r38663417 This commit also regenerates all the protos, which seem to have drifted a tiny bit.	2015-09-03 13:38:28 -04:00
Xiang Li	a94118893c	Merge pull request #3413 from xiang90/snapshot_dir *: support wal dir	2015-09-01 10:03:50 -07:00
Xiang Li	d94e712d91	*: support wal dir	2015-09-01 09:54:27 -07:00
Yicheng Qin	f3bfcb9dee	etcdserver: add timeout param on getClusterFromRemotePeers It sets 10s timeout for public GetClusterFromRemotePeers. This helps the following cases to work well in high latency scenario: 1. proxy sync members from the cluster 2. newly-joined member sync members from the cluster Besides 10s request timeout, the request is also controlled by dial timeout and read connection timeout.	2015-09-01 08:49:01 -07:00
Xiang Li	1bcaa9f4a1	etcdserver: ignore confChangeUpdateNode in getIDs	2015-08-31 09:36:39 -07:00
Yicheng Qin	92cd24d5bd	*: fix govet shadow check failure	2015-08-27 14:15:30 -07:00
Yicheng Qin	8f6bf029f8	etcdserver: specify request timeout error due to connection lost It specifies request timeout error possibly caused by connection lost, and prints out a better log for the user to understand. It handles two cases: 1. the leader cannot connect to majority of cluster. 2. the connection between follower and leader is down for a while, and it loses proposals. log format: ``` 20:04:19 etcd3 | 2015-08-25 20:04:19.368126 E | etcdhttp: etcdserver: request timed out, possibly due to connection lost 20:04:19 etcd3 | 2015-08-25 20:04:19.368227 E | etcdhttp: etcdserver: request timed out, possibly due to connection lost ```	2015-08-26 12:38:37 -07:00
Mohammad Samman	e2e002f94e	etcdserver: handle malformed basic auth return insufficient credentials if basic auth header is malformed Fixes #3280	2015-08-25 12:37:24 -07:00
Xiang Li	e3ef1d363a	Merge pull request #3366 from xiang90/v3_proto update v3 proto and doc	2015-08-24 11:22:29 -07:00
Xiang Li	1cccbb5ebd	etcdserverpb: add comments for compaction	2015-08-24 10:52:54 -07:00

907 Commits (f5fa9b538450e21cc6d409e1f92bef47adc76210)