vitalif/etcd - etcd

Commit Graph

Author	SHA1	Message	Date
Anthony Romano	b77de97136	test: bill of materials check pass	2017-04-26 16:29:47 -07:00
Gyu-Ho Lee	633a0a847b	Merge pull request #7824 from gyuho/certs *: test expired certs in client	2017-04-26 13:31:17 -07:00
Gyu-Ho Lee	f674a1b583	clientv3/integration: test client dial with expired certs Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-26 12:32:46 -07:00
Gyu-Ho Lee	7cb860a31b	integration/fixtures: add expired certs Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-26 12:22:54 -07:00
Anthony Romano	d2e69b339f	Merge pull request #7816 from heyitsanthony/v3client-blankctx v3client: wrap watch ctxs with blank ctx	2017-04-25 21:53:14 -07:00
Gyu-Ho Lee	41e77c9db6	Merge pull request #7818 from gyuho/doc Documentation: require Go 1.8+ for build	2017-04-25 21:46:07 -07:00
Gyu-Ho Lee	4959663f90	Documentation: require Go 1.8+ for build Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-25 17:04:54 -07:00
fanmin shi	c49a87bd04	Merge pull request #7672 from fanminshi/integrate_runner_to_tester etcd-tester: integrate etcd runner into etcd tester	2017-04-25 15:22:29 -07:00
fanmin shi	60b9adc267	Merge pull request #7812 from fanminshi/refactor_runner etcd-runner: fix runner and minor refactoring.	2017-04-25 15:21:57 -07:00
Anthony Romano	3ce31acda4	v3client: wrap watch ctxs with blank ctx Printing the values in ctx.String() will data race if the value is mutable and doesn't implement String(), which seems to be common. Instead, just return a fixed string instead of computing it; v3client watches don't need as much flexibility for creating separate strings, so separate ctx strings probably aren't necessary at this point. Fixes #7811	2017-04-25 15:03:06 -07:00
Gyu-Ho Lee	96aaeee4f5	Merge pull request #7814 from gyuho/aaa etcdserver: do not block on raft stopping	2017-04-25 15:00:06 -07:00
fanmin shi	a9e04061b1	etcd-runner: integrate etcd runner into etcd tester etcd tester runs etcd runner as a separate binary. It signals SIGSTOP to the runner when the tester wants to stop stressing, and SIGCONT when the tester wants to start stressing. When the tester needs to clean up, it signals SIGINT to the runner. Fixes #7026	2017-04-25 14:53:23 -07:00
fanmin shi	77fbe10dfc	etcd-runner: add --prefix flag, allows inf round, and minor vars refactoring in watch runner.	2017-04-25 14:18:42 -07:00
fanmin shi	debc69e1f2	etcd-runner: pass in lock name as a command arg for lock_racer.	2017-04-25 14:18:42 -07:00
fanmin shi	72fb756af3	etcd-runner: add lease ttl as a flag and fatal when err in lease-runner.	2017-04-25 14:18:42 -07:00
fanmin shi	d57ad8ec8d	etcd-runner: add barrier, observe !ok handling, and election name arg to election-runner.	2017-04-25 14:17:59 -07:00
fanmin shi	fa85445ef8	etcd-runner: add rate limiting in doRounds()	2017-04-25 14:00:52 -07:00
Gyu-Ho Lee	327f09fcb4	etcdserver: do not block on raft stopping Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-25 13:35:43 -07:00
Gyu-Ho Lee	2af1605db3	Merge pull request #7810 from gyuho/sync-with-apply etcdserver: ensure waitForApply sync with applyAll	2017-04-25 13:21:30 -07:00
Gyu-Ho Lee	91f6aee4f2	etcdserver: ensure waitForApply sync with applyAll Problem is: `Step1`: `etcdserver/raft.go`'s `Ready` process routine sends config-change entries via `r.applyc <- ap` (https://github.com/coreos/etcd/blob/master/etcdserver/raft.go#L193-L203) `Step2`: `etcdserver/server.go`'s `*EtcdServer.run` routine receives this via `ap := <-s.r.apply()` (https://github.com/coreos/etcd/blob/master/etcdserver/server.go#L735-L738) `StepA`: `Step1` proceeds without sync, right after sending `r.applyc <- ap`. `StepB`: `Step2` proceeds without sync, right after `sched.Schedule(s.applyAll(&ep,&ap))`. `StepC`: `etcdserver` tries to sync with `s.applyAll(&ep,&ap)` by calling `rh.waitForApply()`. `rh.waitForApply()` waits for all pending jobs to finish in `pkg/schedule` side. However, the order of `StepA`,`StepB`,`StepC` is not guaranteed. It is possible that `StepC` happens first, and proceeds without waiting on apply. And the restarting member comes back as a leader in single-node cluster, when there is no synchronization between apply-layer and config-change Raft entry apply. Confirmed with more debugging lines below, only reproducible with slow CPU VM (~2 vCPU). ``` ~:24.005397 I \| etcdserver: starting server... 
[version: 3.2.0+git, cluster version: to_be_decided] ~:24.011136 I \| etcdserver: [DEBUG] 29b2d24047a277df waitForApply before ~:24.011194 I \| etcdserver: [DEBUG] 29b2d24047a277df starts wait for 0 pending jobs ~:24.011234 I \| etcdserver: [DEBUG] 29b2d24047a277df finished wait for 0 pending jobs (current pending 0) ~:24.011268 I \| etcdserver: [DEBUG] 29b2d24047a277df waitForApply after ~:24.011348 I \| etcdserver: [DEBUG] [0] 29b2d24047a277df is scheduling conf change on 29b2d24047a277df ~:24.011396 I \| etcdserver: [DEBUG] [1] 29b2d24047a277df is scheduling conf change on 5edf80e32a334cf0 ~:24.011437 I \| etcdserver: [DEBUG] [2] 29b2d24047a277df is scheduling conf change on e32e31e76c8d2678 ~:24.011477 I \| etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on 29b2d24047a277df ~:24.011509 I \| etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on 5edf80e32a334cf0 ~:24.011545 I \| etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on e32e31e76c8d2678 ~:24.012500 I \| etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df before ~:24.013014 I \| etcdserver/membership: added member 29b2d24047a277df [unix://127.0.0.1:2100515039] to cluster 9250d4ae34216949 ~:24.013066 I \| etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df after ~:24.013113 I \| etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df after trigger ~:24.013158 I \| etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 5edf80e32a334cf0 before ~:24.013666 W \| etcdserver: failed to send out heartbeat on time (exceeded the 10ms timeout for 11.964739ms) ~:24.013709 W \| etcdserver: server is likely overloaded ~:24.013750 W \| etcdserver: failed to send out heartbeat on time (exceeded the 10ms timeout for 12.057265ms) ~:24.013775 W \| etcdserver: server is likely overloaded ~:24.013950 I \| raft: 29b2d24047a277df is starting a new election at term 4 ~:24.014012 I \| raft: 29b2d24047a277df became candidate at term 5 
~:24.014051 I \| raft: 29b2d24047a277df received MsgVoteResp from 29b2d24047a277df at term 5 ~:24.014107 I \| raft: 29b2d24047a277df became leader at term 5 ~:24.014146 I \| raft: raft.node: 29b2d24047a277df elected leader 29b2d24047a277df at term 5 ``` I am printing out the number of pending jobs before we call `sched.WaitFinish(0)`, and there were no pending jobs, so it returned immediately (before we schedule `applyAll`). This is the root cause of: - https://github.com/coreos/etcd/issues/7595 - https://github.com/coreos/etcd/issues/7739 - https://github.com/coreos/etcd/issues/7802 `sched.WaitFinish(0)` doesn't work when `len(f.pendings)==0` and `f.finished==0`. Config-change is the first job to apply, so `f.finished` is 0 in this case. `f.finished` monotonically increases, so we need `WaitFinish(finished+1)`. And `finished` must be the value read before calling `Schedule`. This is safe because `Schedule(applyAll)` is the only place adding jobs to `sched`. Then the scheduler waits on the single job of `applyAll`, by getting the current number of finished jobs before sending `Schedule`. Or just make it block until the `applyAll` routine triggers on the config-change job. This patch just removes `waitForApply` and signals `raftDone` to wait until `applyAll` finishes applying entries. 
Confirmed that it fixes the issue, as below: ``` ~:43.198354 I \| rafthttp: started streaming with peer 36cda5222aba364b (stream MsgApp v2 reader) ~:43.198740 I \| etcdserver: [DEBUG] 3988bc20c2b2e40c waitForApply before ~:43.198836 I \| etcdserver: [DEBUG] 3988bc20c2b2e40c starts wait for 0 pending jobs, 1 finished jobs ~:43.200696 I \| integration: launched 3169361310155633349 () ~:43.201784 I \| etcdserver: [DEBUG] [0] 3988bc20c2b2e40c is scheduling conf change on 36cda5222aba364b ~:43.201884 I \| etcdserver: [DEBUG] [1] 3988bc20c2b2e40c is scheduling conf change on 3988bc20c2b2e40c ~:43.201965 I \| etcdserver: [DEBUG] [2] 3988bc20c2b2e40c is scheduling conf change on cf5d6cbc2a121727 ~:43.202070 I \| etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on 36cda5222aba364b ~:43.202139 I \| etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on 3988bc20c2b2e40c ~:43.202204 I \| etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on cf5d6cbc2a121727 ~:43.202444 I \| etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) before ~:43.204486 I \| etcdserver/membership: added member 36cda5222aba364b [unix://127.0.0.1:2100913646] to cluster 425d73f1b7b01674 ~:43.204588 I \| etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) after ~:43.204703 I \| etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) after trigger ~:43.204791 I \| etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) before ~:43.205689 I \| etcdserver/membership: added member 3988bc20c2b2e40c [unix://127.0.0.1:2101113646] to cluster 425d73f1b7b01674 ~:43.205783 I \| etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) after ~:43.205929 I \| etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) after trigger ~:43.206056 I \| etcdserver: [DEBUG] 3988bc20c2b2e40c 
applyConfChange on cf5d6cbc2a121727 (request ID: 0) before ~:43.207353 I \| etcdserver/membership: added member cf5d6cbc2a121727 [unix://127.0.0.1:2100713646] to cluster 425d73f1b7b01674 ~:43.207516 I \| etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on cf5d6cbc2a121727 (request ID: 0) after ~:43.207619 I \| etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on cf5d6cbc2a121727 (request ID: 0) after trigger ~:43.207710 I \| etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on 36cda5222aba364b ~:43.207781 I \| etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on 3988bc20c2b2e40c ~:43.207843 I \| etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on cf5d6cbc2a121727 ~:43.207951 I \| etcdserver: [DEBUG] 3988bc20c2b2e40c finished wait for 0 pending jobs (current pending 0, finished 1) ~:43.208029 I \| rafthttp: started HTTP pipelining with peer cf5d6cbc2a121727 ~:43.210339 I \| rafthttp: peer 3988bc20c2b2e40c became active ~:43.210435 I \| rafthttp: established a TCP streaming connection with peer 3988bc20c2b2e40c (stream MsgApp v2 reader) ~:43.210861 I \| rafthttp: started streaming with peer 3988bc20c2b2e40c (writer) ~:43.211732 I \| etcdserver: [DEBUG] 3988bc20c2b2e40c waitForApply after ``` Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-25 10:22:27 -07:00
fanmin shi	b94b8b5707	etcd-runner: move root cmd into command package this allows easier sharing of global variable for sub commands.	2017-04-25 10:19:20 -07:00
Anthony Romano	fbbc4a4979	Merge pull request #7732 from heyitsanthony/lease-err-ka clientv3: don't halt lease client if there is a lease error	2017-04-25 07:06:31 -07:00
Anthony Romano	2fd6df922a	integration: close proxy's lease client	2017-04-24 23:49:45 -07:00
Anthony Romano	cb8524fbec	benchmark: use new lease interface	2017-04-24 23:49:45 -07:00
Anthony Romano	78afc853f4	etcd-runner: update to use new lease interface	2017-04-24 23:49:45 -07:00
Anthony Romano	b5384ac1c0	grpcproxy: use new lease interface	2017-04-24 23:49:44 -07:00
Anthony Romano	70f0bbe38c	etcdctl: use new lease interface	2017-04-24 23:49:44 -07:00
Anthony Romano	f3053265ae	clientv3/integration: use new interfaces in lease tests	2017-04-24 23:49:44 -07:00
Anthony Romano	f224d74ed7	concurrency: use new lease interface in session	2017-04-24 23:49:44 -07:00
Anthony Romano	d5f414f69b	clientv3: don't halt lease client if there is a lease error Fixes #7488	2017-04-24 23:49:44 -07:00
Anthony Romano	f254e38385	Merge pull request #7806 from heyitsanthony/testutil-assert testutil: assert functions	2017-04-23 01:30:39 -07:00
Anthony Romano	2ef3eac5ca	vendor: remove testify Fixes #7805	2017-04-22 20:29:58 -07:00
Anthony Romano	76fb6ebcbb	scripts: remove testify hack in updatedep	2017-04-22 20:29:58 -07:00
Anthony Romano	978cf804ca	store: replace testify asserts with testutil asserts	2017-04-22 20:29:58 -07:00
Anthony Romano	6f06e1cb47	testutil: add assert functions	2017-04-22 20:29:58 -07:00
Anthony Romano	c5d4f3e7db	Merge pull request #7804 from heyitsanthony/current-watch-fix clientv3: set current revision to create rev regardless of CreateNotify	2017-04-22 14:09:17 -07:00
Anthony Romano	7f159b6a8d	Merge pull request #7803 from heyitsanthony/snip-deprecated-machines v2http: remove deprecated /v2/machines path	2017-04-22 14:08:55 -07:00
Anthony Romano	ca4acceb1e	clientv3: set current revision to create rev regardless of CreateNotify Turns out the optimization to ignore setting the init rev for current revision watches breaks some ordering assumptions. Since Watch only returns a channel once it gets a response, it should bind the revision at the time of the first create response. Was causing TestWatchReconnInit to fail.	2017-04-22 13:04:38 -07:00
Anthony Romano	94f6a11bbf	Merge pull request #7756 from heyitsanthony/weaken-v3elect-test integration: permit dropping intermediate leader values on observe	2017-04-22 12:13:51 -07:00
Anthony Romano	c1300c81b3	concurrency: clarify Observe semantics; only fetches subsequence	2017-04-22 11:26:11 -07:00
Anthony Romano	e6a789d541	integration: permit dropping intermediate leader values on observe Weaken TestV3ElectionObserve so it only checks that it observes a strictly monotonically ascending leader transition sequence following the first observed leader. First, the Observe will issue the leader channel before getting a response for its first get; the election revision is only bound after returning the channel. So, Observe can't be expected to always return the leader at the time it was started. Second, Observe fetches the current leader based on its create revision, but begins watching on its ModRevision; this is important so that elections still work in case the leader issues proclamations following a compaction that exceeds its creation revision. So, Observe can't be expected to return the entire proclamation sequence for a single leader. Fixes #7749	2017-04-22 11:26:11 -07:00
Anthony Romano	2bb33181b6	v2http: remove deprecated /v2/machines path	2017-04-22 03:11:21 -07:00
Anthony Romano	7da451640f	Merge pull request #7795 from heyitsanthony/dont-force-initrev clientv3: only update initReq.rev == 0 with watch revision	2017-04-22 02:50:55 -07:00
Anthony Romano	4ab818a856	clientv3: only update initReq.rev == 0 with creation watch revision Always updating the initReq.rev on watch create will resume from the wrong revision if initReq is ever nonzero.	2017-04-21 20:22:51 -07:00
Anthony Romano	ec470944f8	clientv3/integration: test watch resume with disconnect before first event	2017-04-21 20:22:51 -07:00
Anthony Romano	fe1ce3a2f0	integration: add pause/unpause to client bridge Resetting connections sometimes isn't enough; need to stop/resume accepting connections for some tests while keeping the member up.	2017-04-21 20:22:51 -07:00
Anthony Romano	91039bef7c	Merge pull request #7799 from heyitsanthony/ctxize-resolve netutil: use "context" and ctx-ize TCP addr resolution	2017-04-21 16:30:32 -07:00
Anthony Romano	a73950545a	Merge pull request #7801 from heyitsanthony/s1027 *: clear redundant return statement warnings (S1027)	2017-04-21 15:18:40 -07:00
Anthony Romano	14d6ed9e5f	*: clear redundant return statement warnings (S1027)	2017-04-21 14:01:00 -07:00
Xiang Li	a9087ee659	Merge pull request #7714 from glevand/for-merge-cross Add multi arch release support	2017-04-21 10:56:01 -07:00
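
The `WaitFinish` problem described in commit 91f6aee4f2 can be illustrated with a toy scheduler. This is only a sketch under stated assumptions: `toySched`, `newToySched`, `Finished`, and `demo` are hypothetical names invented here, and the type is not etcd's `pkg/schedule`. The only behaviour borrowed from the commit message is that a wait for 0 finished jobs returns at once when nothing is pending, whereas waiting for `finished+1` (with `finished` read before `Schedule`) blocks until the job has run.

```go
package main

import (
	"fmt"
	"sync"
)

// toySched is a hypothetical stand-in for a FIFO job scheduler.
// It is NOT etcd's pkg/schedule; it only counts pending and finished jobs.
type toySched struct {
	mu       sync.Mutex
	cond     *sync.Cond
	pending  int
	finished int
}

func newToySched() *toySched {
	s := &toySched{}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// Schedule marks the job as pending and runs it in its own goroutine.
func (s *toySched) Schedule(job func()) {
	s.mu.Lock()
	s.pending++
	s.mu.Unlock()
	go func() {
		job()
		s.mu.Lock()
		s.pending--
		s.finished++
		s.mu.Unlock()
		s.cond.Broadcast()
	}()
}

// Finished returns the number of jobs completed so far.
func (s *toySched) Finished() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.finished
}

// WaitFinish blocks until at least n jobs have finished and none are pending.
func (s *toySched) WaitFinish(n int) {
	s.mu.Lock()
	for s.finished < n || s.pending != 0 {
		s.cond.Wait()
	}
	s.mu.Unlock()
}

// demo reports whether the job had been applied when each kind of wait returned.
func demo() (appliedEarly, appliedLate bool) {
	var mu sync.Mutex
	applied := false
	isApplied := func() bool {
		mu.Lock()
		defer mu.Unlock()
		return applied
	}

	s := newToySched()

	// The waiter runs before anything is scheduled: with 0 pending and
	// 0 finished jobs, WaitFinish(0) has nothing to wait for.
	s.WaitFinish(0)
	appliedEarly = isApplied()

	// Read the finished count before scheduling, then wait for one more job.
	target := s.Finished() + 1
	done := make(chan bool)
	go func() {
		s.WaitFinish(target) // blocks even if it starts before Schedule
		done <- isApplied()
	}()
	s.Schedule(func() {
		mu.Lock()
		applied = true
		mu.Unlock()
	})
	appliedLate = <-done
	return appliedEarly, appliedLate
}

func main() {
	early, late := demo()
	fmt.Println(early, late) // prints "false true"
}
```

Per its message, the commit itself takes the other route it mentions: it removes `waitForApply` and signals `raftDone` so that the wait lasts until `applyAll` finishes applying entries.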

11266 Commits (b77de97136137e2a2227557c28dfd99ed54ebc62)