Commit Graph

16286 Commits (4b6a0eea4916a9f7a7183779f13970e1be16c9cd)

Author SHA1 Message Date
Gyuho Lee 4b6a0eea49 CHANGELOG: update with server panic fix
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-19 20:02:52 -07:00
Gyuho Lee 44dea5df03
Merge pull request #12238 from liggitt/slow-v2-panic
etcdserver: Avoid panics logging slow v2 requests in integration tests
2020-08-19 09:20:47 -07:00
Jordan Liggitt ad57fea4c5 etcdserver: Avoid panics logging slow v2 requests in integration tests 2020-08-19 11:30:15 -04:00
Gyuho Lee cfdc296a3c CHANGELOG: update release-3.2 dates
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-18 09:43:45 -07:00
Gyuho Lee edaac6e2a9 CHANGELOG: add v3.3.24 release dates
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-18 09:30:27 -07:00
Gyuho Lee e37b28bd28 CHANGELOG: add v3.4.11
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-18 09:23:24 -07:00
OG 0526f461e1
Doc: Update curl command to fix 400 Bad Request (#11911) 2020-08-16 16:12:39 -07:00
Gyuho Lee 0e4ba37b6c
Merge pull request #12193 from mitake/integration_test
test: avoid non existing package for integration test
2020-08-15 23:26:34 -07:00
Gyuho Lee d35933c351
Merge pull request #12221 from wenjiaswe/changelog_12215
CHANGELOG: update from 12215
2020-08-14 13:29:40 -07:00
Wenjia Zhang 32982ef469 CHANGELOG: update from 12215
Change-Id: I17e076554a56c95fabd95af111eccd8d7409966b
2020-08-14 13:18:02 -07:00
Gyuho Lee 92f9e6eba2
Merge pull request #12216 from jingyih/experimental_flag_for_watch_notify_interval
*: add experimental flag for watch notify interval
2020-08-14 12:47:43 -07:00
jingyih 799b16c2d1 CHANGELOG: update for PR12216 2020-08-14 12:06:38 -07:00
jingyih 9a698476bf *: add experimental flag for watch notify interval 2020-08-14 12:01:00 -07:00
Gyuho Lee 06f89cc4f8
Merge pull request #12212 from gyuho/logger
*: upgrade zap logger to 1.15, replace global logger
2020-08-13 09:46:44 -07:00
Gyuho Lee 93cf449205
Merge pull request #12214 from gyuho/fd
*: optimize runtime.FDUsage + add OS level FD metrics
2020-08-12 18:37:05 -07:00
Gyuho Lee 5678779665 CHANGELOG: update
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-12 10:32:27 -07:00
Gyuho Lee 421df2ecbb etcdserver: add OS level FD metrics
Similar counts are exposed via Prometheus.
This adds the one that are perceived by etcd server.

e.g.

os_fd_limit 120000
os_fd_used 14
process_cpu_seconds_total 0.31
process_max_fds 120000
process_open_fds 17

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-12 10:32:27 -07:00
Gyuho Lee 53fdcdc5a2 pkg/runtime: optimize FDUsage by removing sort
No need sort when we just want the counts.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-12 10:32:24 -07:00
Gyuho Lee d8ed233791 CHANGELOG: update
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-12 09:50:00 -07:00
Gyuho Lee 7eac6bd497 *: upgrade zap logger to 1.15, replace global logger
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-12 09:50:00 -07:00
Gyuho Lee ed27d9d2de
Merge pull request #12198 from ptabor/20200803-int-to-string-test-fix
etcdserver, wal: Fix tests unintended CASTing of int->String.
2020-08-11 21:35:20 -07:00
Gyuho Lee 8c44d25f2a
Merge pull request #12211 from tangcong/ignore-errcompacted
etcdserver: ignore ErrCompacted error
2020-08-11 21:34:38 -07:00
Gyuho Lee fe36be2251
Merge pull request #12195 from tangcong/optimize-healthcheck
*: check health by using v3 range request and its corresponding timeout
2020-08-11 21:32:44 -07:00
tangcong 8a4c7751d8 CHANGELOG: update for 12195 2020-08-12 08:10:13 +08:00
Gyuho Lee 18adf55c92
Merge pull request #12199 from ptabor/20200803-expect-replace-fix
tests/e2e: Update github.com/creack/pty v1.1.7 -> v1.1.11
2020-08-11 11:59:12 -07:00
Gyuho Lee 844091dda3
Merge pull request #12206 from ptabor/20200807-setLoggingDataRace
integration: Fix flakes due to .setupLogging race.
2020-08-11 11:58:47 -07:00
tangcong afa0e8196c etcdserver: ignore mvcc.ErrCompacted error 2020-08-11 23:45:20 +08:00
Jingyi Hu cd25d6c06e
Merge pull request #12130 from ptabor/master
functional/tester: Update cluster_test.go to reflect functional.yaml
2020-08-09 02:14:31 +08:00
Piotr Tabor 9d182c2a70
clientv3/integration: Fix flaky TestGetTokenWithoutAuth (#12200)
The test is vary flaky on Travis.

Seems that since (https://github.com/etcd-io/etcd/issues/7724) the
client is expected to simply ignore whether server is in AuthDisabled
mode even if the user supplies credentials.

The tests used to:
  * use very large cluster (10 nodes)
  * set very low timeout (1 sec)

Such setup led to frequent deadlineExceed errors or following failures:

    === RUN   TestGetTokenWithoutAuth
    {"level":"warn","ts":"2020-08-04T16:50:48.686+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-35573307-1ee5-441b-acc7-d073f0bd7de5/localhost:69820396562031027440","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: leader changed"}
        user_test.go:151: other errors:etcdserver: leader changed
    --- FAIL: TestGetTokenWithoutAuth (10.91s)
2020-08-07 13:32:32 -07:00
Piotr Tabor 830618e44d ./integration: Fix flakes due to .setupLogging race.
The source of problem was the fact that multiple tests were creating
their clusters (and some of them were setting global grpclog).
If the test was running after some other test that created HttpServer
(so accessed grpclog), this was reported as race.

Tested with:
  go test ./clientv3/. -v "--run=(Example).*" --count=2
  go test ./clientv3/. -v "--run=(Test).*" --count=2
  go test ./integration/embed/. -v "--run=(Test).*" --count=2
2020-08-07 13:54:41 +02:00
Gyuho Lee f395f82a75
Merge pull request #12202 from spzala/auditchangelog
CHANGELOG: update with added audit report
2020-08-05 08:56:16 -07:00
Sahdev P. Zala eafd374309 CHANGELOG: update with added audit report
Update the changelog for recently added audit report. Also mention the
report in the security readme.
2020-08-04 22:44:48 -04:00
Sahdev Zala d29af0f22b
Merge pull request #12201 from spzala/audit
Add audit report
2020-08-04 20:25:24 -04:00
Sahdev P. Zala 4edd2679f0 doc: add audit report
Adding audit report.
2020-08-04 19:07:18 -04:00
Piotr Tabor 00de56a4a4 etcdserver, wal: Fix tests that were performing unintended casting of int to String.
Fixes following problems during "./etcd# go test ./..."

> go.etcd.io/etcd/v3/etcdserver/api/v2store_test
etcdserver/api/v2store/store_test.go:847:24: conversion from int to string yields a string of one rune, not a string of digits (did you mean fmt.Sprint(x)?)

> go.etcd.io/etcd/v3/wal
wal/wal_test.go:242:68: conversion from int to string yields a string of one rune, not a string of digits (did you mean fmt.Sprint(x)?)
2020-08-04 20:28:41 +02:00
Piotr Tabor b151a47d1b tests/e2e: Update github.com/creack/pty v1.1.7 -> v1.1.11
The fix is needed to mitigate consequences of
https://github.com/golang/go/issues/29458 "golang breaking change" that
causes following test failures on etcd end:

--- FAIL: TestCtlV2Set (0.00s)
    ctl_v2_test.go:552: could not start etcd process cluster (fork/exec ../../bin/etcd: Setctty set but Ctty not valid in child)
--- FAIL: TestCtlV2SetQuorum (0.00s)
    ctl_v2_test.go:552: could not start etcd process cluster (fork/exec ../../bin/etcd: Setctty set but Ctty not valid in child)
--- FAIL: TestCtlV2SetClientTLS (0.00s)
    ctl_v2_test.go:552: could not start etcd process cluster (fork/exec ../../bin/etcd: Setctty set but Ctty not valid in child)
2020-08-04 16:12:12 +02:00
Piotr Tabor 36f07cb591 functional/tester: Update cluster_test.go to reflect current content of functional.yaml.
Before the change:

.../etcd/functional% go test -v ./...
...
?   	go.etcd.io/etcd/v3/functional/runner	[no test files]
=== RUN   Test_read
{"level":"info","ts":1596018672.2852418,"caller":"tester/cluster_read_config.go:36","msg":"opened configuration file","path":"../../functional.yaml"}
    Test_read: cluster_test.go:269: expected &{lg:<nil> agentConns:[] agentClients:[] agentStreams:[] agentRequests:[] testerHTTPServer:<nil> Members:[EtcdExec:"./bin/etcd" AgentAddr:"127.0.0.1:19027" FailpointHTTPAddr:"http://127.0.0.1:7381" BaseDir:"/tmp/etcd-functional-1" EtcdPeerProxy:true EtcdClientEndpoint:"127.0.0.1:1379" Etcd:<Name:"s1" DataDir:"/tmp/etcd-functional-1/etcd.data" WALDir:"/tmp/etcd-functional-1/etcd.data/member/wal" HeartbeatIntervalMs:100 ElectionTimeoutMs:1000 ListenClientURLs:"https://127.0.0.1:1379" AdvertiseClientURLs:"https://127.0.0.1:1379" ClientAutoTLS:true ListenPeerURLs:"https://127.0.0.1:1380" AdvertisePeerURLs:"https://127.0.0.1:1381" PeerAutoTLS:true InitialCluster:"s1=https://127.0.0.1:1381,s2=https://127.0.0.1:2381,s3=https://127.0.0.1:3381" InitialClusterState:"new" InitialClusterToken:"tkn" SnapshotCount:10000 QuotaBackendBytes:10740000000 PreVote:true InitialCorruptCheck:true Logger:"zap" LogOutputs:"/tmp/etcd-functional-1/etcd.log" > SnapshotPath:"/tmp/etcd-functional-1.snapshot.db"  EtcdExec:"./bin/etcd" AgentAddr:"127.0.0.1:29027" FailpointHTTPAddr:"http://127.0.0.1:7382" BaseDir:"/tmp/etcd-functional-2" EtcdPeerProxy:true EtcdClientEndpoint:"127.0.0.1:2379" Etcd:<Name:"s2" DataDir:"/tmp/etcd-functional-2/etcd.data" WALDir:"/tmp/etcd-functional-2/etcd.data/member/wal" HeartbeatIntervalMs:100 ElectionTimeoutMs:1000 ListenClientURLs:"https://127.0.0.1:2379" AdvertiseClientURLs:"https://127.0.0.1:2379" ClientAutoTLS:true ListenPeerURLs:"https://127.0.0.1:2380" AdvertisePeerURLs:"https://127.0.0.1:2381" PeerAutoTLS:true InitialCluster:"s1=https://127.0.0.1:1381,s2=https://127.0.0.1:2381,s3=https://127.0.0.1:3381" InitialClusterState:"new" InitialClusterToken:"tkn" SnapshotCount:10000 QuotaBackendBytes:10740000000 PreVote:true InitialCorruptCheck:true Logger:"zap" LogOutputs:"/tmp/etcd-functional-2/etcd.log" > SnapshotPath:"/tmp/etcd-functional-2.snapshot.db"  EtcdExec:"./bin/etcd" AgentAddr:"127.0.0.1:39027" FailpointHTTPAddr:"http://127.0.0.1:7383" BaseDir:"/tmp/etcd-functional-3" EtcdPeerProxy:true EtcdClientEndpoint:"127.0.0.1:3379" Etcd:<Name:"s3" DataDir:"/tmp/etcd-functional-3/etcd.data" WALDir:"/tmp/etcd-functional-3/etcd.data/member/wal" HeartbeatIntervalMs:100 ElectionTimeoutMs:1000 ListenClientURLs:"https://127.0.0.1:3379" AdvertiseClientURLs:"https://127.0.0.1:3379" ClientAutoTLS:true ListenPeerURLs:"https://127.0.0.1:3380" AdvertisePeerURLs:"https://127.0.0.1:3381" PeerAutoTLS:true InitialCluster:"s1=https://127.0.0.1:1381,s2=https://127.0.0.1:2381,s3=https://127.0.0.1:3381" InitialClusterState:"new" InitialClusterToken:"tkn" SnapshotCount:10000 QuotaBackendBytes:10740000000 PreVote:true InitialCorruptCheck:true Logger:"zap" LogOutputs:"/tmp/etcd-functional-3/etcd.log" > SnapshotPath:"/tmp/etcd-functional-3.snapshot.db" ] Tester:DataDir:"/tmp/etcd-tester-data" Network:"tcp" Addr:"127.0.0.1:9028" DelayLatencyMs:5000 DelayLatencyMsRv:500 UpdatedDelayLatencyMs:5000 RoundLimit:1 ExitOnCaseFail:true EnablePprof:true CaseDelayMs:7000 CaseShuffle:true Cases:"SIGTERM_ONE_FOLLOWER" Cases:"SIGTERM_ONE_FOLLOWER_UNTIL_TRIGGER_SNAPSHOT" Cases:"SIGTERM_LEADER" Cases:"SIGTERM_LEADER_UNTIL_TRIGGER_SNAPSHOT" Cases:"SIGTERM_QUORUM" Cases:"SIGTERM_ALL" Cases:"SIGQUIT_AND_REMOVE_ONE_FOLLOWER" Cases:"SIGQUIT_AND_REMOVE_ONE_FOLLOWER_UNTIL_TRIGGER_SNAPSHOT" Cases:"BLACKHOLE_PEER_PORT_TX_RX_LEADER" Cases:"BLACKHOLE_PEER_PORT_TX_RX_LEADER_UNTIL_TRIGGER_SNAPSHOT" Cases:"BLACKHOLE_PEER_PORT_TX_RX_QUORUM" Cases:"BLACKHOLE_PEER_PORT_TX_RX_ALL" Cases:"DELAY_PEER_PORT_TX_RX_LEADER" Cases:"RANDOM_DELAY_PEER_PORT_TX_RX_LEADER" Cases:"DELAY_PEER_PORT_TX_RX_LEADER_UNTIL_TRIGGER_SNAPSHOT" Cases:"RANDOM_DELAY_PEER_PORT_TX_RX_LEADER_UNTIL_TRIGGER_SNAPSHOT" Cases:"DELAY_PEER_PORT_TX_RX_QUORUM" Cases:"RANDOM_DELAY_PEER_PORT_TX_RX_QUORUM" Cases:"DELAY_PEER_PORT_TX_RX_ALL" Cases:"RANDOM_DELAY_PEER_PORT_TX_RX_ALL" Cases:"NO_FAIL_WITH_STRESS" Cases:"NO_FAIL_WITH_NO_STRESS_FOR_LIVENESS" FailpointCommands:"panic(\"etcd-tester\")" RunnerExecPath:"./bin/etcd-runner" Stressers:<Type:"KV_WRITE_SMALL" Weight:0.35 > Stressers:<Type:"KV_WRITE_LARGE" Weight:0.002 > Stressers:<Type:"KV_READ_ONE_KEY" Weight:0.07 > Stressers:<Type:"KV_READ_RANGE" Weight:0.07 > Stressers:<Type:"KV_DELETE_ONE_KEY" Weight:0.07 > Stressers:<Type:"KV_DELETE_RANGE" Weight:0.07 > Stressers:<Type:"KV_TXN_WRITE_DELETE" Weight:0.35 > Stressers:<Type:"LEASE" > Checkers:"KV_HASH" Checkers:"LEASE_EXPIRE" StressKeySize:100 StressKeySizeLarge:32769 StressKeySuffixRange:250000 StressKeySuffixRangeTxn:100 StressKeyTxnOps:10 StressClients:100 StressQPS:2000  cases:[] rateLimiter:<nil> stresser:<nil> checkers:[] currentRevision:0 rd:0 cs:0}, got &{lg:<nil> agentConns:[] agentClients:[] agentStreams:[] agentRequests:[] testerHTTPServer:<nil> Members:[EtcdExec:"./bin/etcd" AgentAddr:"127.0.0.1:19027" FailpointHTTPAddr:"http://127.0.0.1:7381" BaseDir:"/tmp/etcd-functional-1" EtcdPeerProxy:true EtcdClientEndpoint:"127.0.0.1:1379" Etcd:<Name:"s1" DataDir:"/tmp/etcd-functional-1/etcd.data" WALDir:"/tmp/etcd-functional-1/etcd.data/member/wal" HeartbeatIntervalMs:100 ElectionTimeoutMs:1000 ListenClientURLs:"https://127.0.0.1:1379" AdvertiseClientURLs:"https://127.0.0.1:1379" ClientAutoTLS:true ListenPeerURLs:"https://127.0.0.1:1380" AdvertisePeerURLs:"https://127.0.0.1:1381" PeerAutoTLS:true InitialCluster:"s1=https://127.0.0.1:1381,s2=https://127.0.0.1:2381,s3=https://127.0.0.1:3381" InitialClusterState:"new" InitialClusterToken:"tkn" SnapshotCount:2000 QuotaBackendBytes:10740000000 PreVote:true InitialCorruptCheck:true Logger:"zap" LogOutputs:"/tmp/etcd-functional-1/etcd.log" LogLevel:"info" > SnapshotPath:"/tmp/etcd-functional-1.snapshot.db"  EtcdExec:"./bin/etcd" AgentAddr:"127.0.0.1:29027" FailpointHTTPAddr:"http://127.0.0.1:7382" BaseDir:"/tmp/etcd-functional-2" EtcdPeerProxy:true EtcdClientEndpoint:"127.0.0.1:2379" Etcd:<Name:"s2" DataDir:"/tmp/etcd-functional-2/etcd.data" WALDir:"/tmp/etcd-functional-2/etcd.data/member/wal" HeartbeatIntervalMs:100 ElectionTimeoutMs:1000 ListenClientURLs:"https://127.0.0.1:2379" AdvertiseClientURLs:"https://127.0.0.1:2379" ClientAutoTLS:true ListenPeerURLs:"https://127.0.0.1:2380" AdvertisePeerURLs:"https://127.0.0.1:2381" PeerAutoTLS:true InitialCluster:"s1=https://127.0.0.1:1381,s2=https://127.0.0.1:2381,s3=https://127.0.0.1:3381" InitialClusterState:"new" InitialClusterToken:"tkn" SnapshotCount:2000 QuotaBackendBytes:10740000000 PreVote:true InitialCorruptCheck:true Logger:"zap" LogOutputs:"/tmp/etcd-functional-2/etcd.log" LogLevel:"info" > SnapshotPath:"/tmp/etcd-functional-2.snapshot.db"  EtcdExec:"./bin/etcd" AgentAddr:"127.0.0.1:39027" FailpointHTTPAddr:"http://127.0.0.1:7383" BaseDir:"/tmp/etcd-functional-3" EtcdPeerProxy:true EtcdClientEndpoint:"127.0.0.1:3379" Etcd:<Name:"s3" DataDir:"/tmp/etcd-functional-3/etcd.data" WALDir:"/tmp/etcd-functional-3/etcd.data/member/wal" HeartbeatIntervalMs:100 ElectionTimeoutMs:1000 ListenClientURLs:"https://127.0.0.1:3379" AdvertiseClientURLs:"https://127.0.0.1:3379" ClientAutoTLS:true ListenPeerURLs:"https://127.0.0.1:3380" AdvertisePeerURLs:"https://127.0.0.1:3381" PeerAutoTLS:true InitialCluster:"s1=https://127.0.0.1:1381,s2=https://127.0.0.1:2381,s3=https://127.0.0.1:3381" InitialClusterState:"new" InitialClusterToken:"tkn" SnapshotCount:2000 QuotaBackendBytes:10740000000 PreVote:true InitialCorruptCheck:true Logger:"zap" LogOutputs:"/tmp/etcd-functional-3/etcd.log" LogLevel:"info" > SnapshotPath:"/tmp/etcd-functional-3.snapshot.db" ] Tester:DataDir:"/tmp/etcd-tester-data" Network:"tcp" Addr:"127.0.0.1:9028" DelayLatencyMs:5000 DelayLatencyMsRv:500 UpdatedDelayLatencyMs:5000 RoundLimit:1 ExitOnCaseFail:true EnablePprof:true CaseDelayMs:7000 CaseShuffle:true Cases:"SIGTERM_ONE_FOLLOWER" Cases:"SIGTERM_ONE_FOLLOWER_UNTIL_TRIGGER_SNAPSHOT" Cases:"SIGTERM_LEADER" Cases:"SIGTERM_LEADER_UNTIL_TRIGGER_SNAPSHOT" Cases:"SIGTERM_QUORUM" Cases:"SIGTERM_ALL" Cases:"SIGQUIT_AND_REMOVE_ONE_FOLLOWER" Cases:"SIGQUIT_AND_REMOVE_ONE_FOLLOWER_UNTIL_TRIGGER_SNAPSHOT" Cases:"BLACKHOLE_PEER_PORT_TX_RX_LEADER" Cases:"BLACKHOLE_PEER_PORT_TX_RX_LEADER_UNTIL_TRIGGER_SNAPSHOT" Cases:"BLACKHOLE_PEER_PORT_TX_RX_QUORUM" Cases:"BLACKHOLE_PEER_PORT_TX_RX_ALL" Cases:"DELAY_PEER_PORT_TX_RX_LEADER" Cases:"RANDOM_DELAY_PEER_PORT_TX_RX_LEADER" Cases:"DELAY_PEER_PORT_TX_RX_LEADER_UNTIL_TRIGGER_SNAPSHOT" Cases:"RANDOM_DELAY_PEER_PORT_TX_RX_LEADER_UNTIL_TRIGGER_SNAPSHOT" Cases:"DELAY_PEER_PORT_TX_RX_QUORUM" Cases:"RANDOM_DELAY_PEER_PORT_TX_RX_QUORUM" Cases:"DELAY_PEER_PORT_TX_RX_ALL" Cases:"RANDOM_DELAY_PEER_PORT_TX_RX_ALL" Cases:"NO_FAIL_WITH_STRESS" Cases:"NO_FAIL_WITH_NO_STRESS_FOR_LIVENESS" FailpointCommands:"panic(\"etcd-tester\")" RunnerExecPath:"./bin/etcd-runner" Stressers:<Type:"KV_WRITE_SMALL" Weight:0.35 > Stressers:<Type:"KV_WRITE_LARGE" Weight:0.002 > Stressers:<Type:"KV_READ_ONE_KEY" Weight:0.07 > Stressers:<Type:"KV_READ_RANGE" Weight:0.07 > Stressers:<Type:"KV_DELETE_ONE_KEY" Weight:0.07 > Stressers:<Type:"KV_DELETE_RANGE" Weight:0.07 > Stressers:<Type:"KV_TXN_WRITE_DELETE" Weight:0.35 > Stressers:<Type:"LEASE" > Checkers:"KV_HASH" Checkers:"LEASE_EXPIRE" StressKeySize:100 StressKeySizeLarge:32769 StressKeySuffixRange:250000 StressKeySuffixRangeTxn:100 StressKeyTxnOps:10 StressClients:100 StressQPS:2000  cases:[] rateLimiter:<nil> stresser:<nil> checkers:[] currentRevision:0 rd:0 cs:0}
--- FAIL: Test_read (0.00s)
FAIL
FAIL	go.etcd.io/etcd/v3/functional/tester	0.050s
...

After:
go test -v ../...

...
=== RUN   Test_read
{"level":"info","ts":1596018561.408186,"caller":"tester/cluster_read_config.go:36","msg":"opened configuration file","path":"../../functional.yaml"}
{"level":"info","ts":1596018561.408739,"caller":"tester/cluster_shuffle.go:35","msg":"shuffled test failure cases","total":22}
{"level":"info","ts":1596018561.4087567,"caller":"tester/cluster_shuffle.go:35","msg":"shuffled test failure cases","total":22}
--- PASS: Test_read (0.00s)
PASS

...
2020-08-04 15:47:41 +02:00
tangcong 25220a0287 *: check health by using v3 range request and its corresponding timeout 2020-08-04 00:34:18 +08:00
Hitoshi Mitake 76539cee57 test: avoid non existing package for integration test 2020-08-03 00:13:47 +09:00
Sam Batschelet 1af6d61a1c
Merge pull request #12177 from ironcladlou/etcdmembersdown-tweak
Documentation: Further improve etcdMembersDown alert
2020-07-31 15:06:52 -04:00
Dan Mace cd3df73944 Documentation: Further improve etcdMembersDown alert
Before this change, the default window for the etcdMembersDown network failure
rate function was recently changed to 1 minute. While this helps detect a etcd
recovery more quickly, it depends on scrape intervals of <= 15s to collect
sufficient data points for the rate function. In practice, an interval of >= 30s
is more typical, which causes the rate function to be less accurate.

This patch increases the window to 2m, which is a compromise between the
original value of 3m and the 1m change introuced with 2aa5684, and should
accomodate more typical scrape intervals.

To offset the window change and to further improve the chance that the alert
will only fire when etcd is truly dead, this patch changes the `for` clause from
3m to 10m. The rationale is as follows:

1. There can be significant variance in durations following a reboot before etcd
is scraped and detected as available.

2. A conservative trigger like 10m seems less likely to produce a false alarm in
the face of such variance.

3. In this alerting situation, if the outage is real, it seems unlikely that an
additional 7 minutes of delay before (for example) paging somebody will make a
significant impact on the overall response.
2020-07-31 09:26:46 -04:00
Jingyi Hu cc564110bd
clientv3: remove excessive watch cancel logging (#12187) 2020-07-29 14:58:53 -07:00
nicktming 6c81b20ec8
rafthttp: fix streamHandle outgoingConn peerID (#12179) 2020-07-28 14:41:10 -07:00
Manohar Reddy bc67babee8
package adt: rename the filename to be consistent with the package name (#12170) 2020-07-28 14:40:34 -07:00
Boqin Qin 9006d8d4f9
Documentation/learning/lock/client: Add defer Unlock (#11802) 2020-07-26 11:22:19 -07:00
Denis Issoupov 51de68ddac
12126: snapshot: corrupted in Embedded server (#12129) 2020-07-26 11:14:46 -07:00
Jay 26b89fd418
raft: don't campaign with pending snapshot (#12163)
Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
2020-07-26 00:04:46 -07:00
Björn Rabenstein c9a5889915
Documentation/etcd-mixin: Reformulate alerting rules to use `without` rather than `by` (#12122)
* etcd-mixin: Reformulate alerting rules to use `without` rather than `by`

With aggregations using `by`, all additional target labels that a user
might have configured, are aggregated away. However, those target
labels are useful for e.g. alert routing. With this commit, nothing
should change for vanilla job/instance target labels, but whoever has
more target labels can now still make use of them.

Signed-off-by: beorn7 <beorn@grafana.com>

* etcd-mixin: Parametrize instance labels to aggregate away

Signed-off-by: beorn7 <beorn@grafana.com>
2020-07-23 16:02:26 -07:00
Jay d0e4fe56a5
raft: check pending conf change before campaign (#12134)
* raft: check conf change before campaign

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>

* raft: extract hup function

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>

* raft: check pending conf change for transferleader

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
2020-07-22 17:04:48 -07:00
kaiiak 772dfbfe35
wal: Fix format error avoid using reflect.DeepEqual with errors (#12118) 2020-07-20 19:49:57 -07:00