Commit Graph

16429 Commits (5c726cfe255a2ea50c2a1c8e5d0f893e60eb451a)

Author SHA1 Message Date
Jingyi Hu 0aab02e7b5
Merge pull request #12367 from ptabor/20201005-api2client
Modularization: Move dependencies of client (protos, version) to api/ module
2020-10-07 05:56:11 -07:00
Piotr Tabor 997961ebfd bom: Update bill of materials to reflect the new module. 2020-10-06 12:28:40 +02:00
Piotr Tabor ec3026fdc9 *: Run ./scripts/genproto.sh (protoc 3.12.3) after proto file moves.
The changed blobs are consequences of proto-descriptors changing as a
result of file moves.
2020-10-06 11:57:19 +02:00
Piotr Tabor 28f2b07623 *: Update references to code moved to the api/ dir.
Follow up to file-moves done in the previous commit.

The commit contains purely mechanical consequences of execution (apart
from scripts/genproto.sh):

  % find ./ -name '*.go'  | xargs sed --follow-symlinks -i 's|v3/etcdserver/api/v3rpc/rpctypes|v3/api/v3rpc/rpctypes|g'
  % find ./ -name '*.go'  | xargs sed --follow-symlinks -i 's|v3/version|v3/api/version|g'
  % find ./ -name '*.go'  | xargs sed --follow-symlinks -i 's|v3/mvcc/mvccpb|v3/api/mvccpb|g'
  % find ./ -name '*.go'  | xargs sed --follow-symlinks -i 's|v3/etcdserver/etcdserverpb|v3/api/etcdserverpb|g'
  % find ./ -name '*.go'  | xargs sed --follow-symlinks -i 's|v3/etcdserver/api/membership/membershippb|v3/api/membershippb|g'
  % find ./ -name '*.go'  | xargs sed --follow-symlinks -i 's|v3/auth/authpb|v3/api/authpb|g'

  % find ./ -name '*.proto' -o -name '*.md'  | xargs -L 1 sed --follow-symlinks -i 's|/mvcc/mvccpb/kv.proto|/api/mvccpb/kv.proto|g'
  % find ./ -name '*.proto' -o -name '*.md'  | xargs -L 1 sed --follow-symlinks -i 's|/auth/authpb/auth.proto|/api/authpb/auth.proto|g'
  % find ./ -name '*.proto' -o -name '*.md'  | xargs -L 1 sed --follow-symlinks -i 's|/etcdserver/api/membership/membershippb/membership.proto|/api/membershippb/membership.proto|g'

  I also manually modified paths in scripts/genproto.sh.

  % go fmt ./...
2020-10-06 11:56:16 +02:00
Piotr Tabor 2edb08642c api: Make api/ a module that will contain proto-definitions.
The module is supposed to contain the minimal set of files that establish
the public etcd server API. In particular, client libraries for etcd built in
different languages might want to depend on these files.
2020-10-06 11:54:50 +02:00
Piotr Tabor 389642dd16 client: Move client specific code (protos, version) to api/
client: Move client specific code (protos, version) to the api/
directory. Thanks to this change, the /client directory will no longer need to depend on
the server code. In the next commits we make "/api" a module of its own.

Mechanical consequences of execution:

% git mv version/version.go api/version
% git mv etcdserver/api/v3rpc/rpctypes api/v3rpc
% git mv mvcc/mvccpb api/
% git mv etcdserver/etcdserverpb api/
% git mv auth/authpb api/
% git mv etcdserver/api/membership/membershippb api/
2020-10-06 11:53:36 +02:00
Sahdev Zala 7bd956fa2b
Merge pull request #12366 from guusvw/fix-yaml-indention-doc
the example alert file had a wrong indentation
2020-10-05 14:08:12 -04:00
Guus van Weelden 985d4cffc4
Documentation: the example alert file had a wrong indentation
Signed-off-by: Guus van Weelden <guus.vanweelden@moia.io>
2020-10-05 18:11:21 +02:00
Sahdev Zala 0693e2b4df
Merge pull request #12355 from cfc4n/changelog_gettoken
CHANGELOG: update for #12165 , #12264 .
2020-10-05 11:14:09 -04:00
Gyuho Lee fdb3f89730
Merge pull request #12362 from ptabor/20201001-deflake-unit-race
Fix "race" - auth unit tests leaking goroutines
2020-10-04 20:47:52 -07:00
Piotr Tabor 97820f1c6e integration: Fix flakes of TestV3WatchRestoreSnapshotUnsync

The flakes manifested as:
```
--- FAIL: TestV3WatchRestoreSnapshotUnsync (3.59s)
    v3_watch_restore_test.go:82: inflight snapshot sends expected 0 or 1, got ""
FAIL
coverage: 55.2% of statements
FAIL	go.etcd.io/etcd/v3/integration	3.646s
FAIL
```

The root cause is that all the SnapMsg processing happens on both ends
(leader, follower) asynchronously in goroutines, e.g. on the FIFO scheduler
within EtcdServer.run, so when we observe it through metrics, we don't
know whether it has finished (or even started).

Ideally we should have an EtcdServer.Drain() call that returns once the server
has processed all internal 'queues' and is idle.
2020-10-03 19:39:08 +02:00
Piotr Tabor 98b123f034 mvcc: Fix races between metrics gathering and mvcc.Restore
The race manifested as flakes like the one in the log at the end of this message.

See:
  https://github.com/etcd-io/etcd/issues/12336

I'm taking the locks for a short duration (instead of the whole
duration of Restore) to allow metrics to be gathered while the server
restoration is in progress.

```
{"level":"warn","ts":"2020-09-26T13:33:13.010Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-c9c21e47-2013-4776-8e83-e331b2caa9ae/localhost:14422410081761184170","attempt":0,"error":"rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix localhost:14422410081761184170: connect: no such file or directory\""}
{"level":"warn","ts":"2020-09-26T13:33:13.011Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-c9c21e47-2013-4776-8e83-e331b2caa9ae/localhost:14422410081761184170","attempt":0,"error":"rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix localhost:14422410081761184170: connect: no such file or directory\""}
{"level":"warn","ts":"2020-09-26T13:33:16.285Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-b504e954-e000-42a4-aa4f-70ded8dbef39/localhost:55672762955698614610","attempt":0,"error":"rpc error: code = NotFound desc = etcdserver: requested lease not found"}
{"level":"warn","ts":"2020-09-26T13:33:21.434Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-7945004b-f67e-42aa-af11-a7b40fbbe6fc/localhost:49623072144007561240","attempt":0,"error":"rpc error: code = Canceled desc = context canceled"}
==================
WARNING: DATA RACE
Write at 0x00c000905f78 by goroutine 764:
  go.etcd.io/etcd/v3/mvcc.(*store).restore()
      /go/src/go.etcd.io/etcd/mvcc/kvstore.go:397 +0x773
  go.etcd.io/etcd/v3/mvcc.(*store).Restore()
      /go/src/go.etcd.io/etcd/mvcc/kvstore.go:343 +0x5f1
  go.etcd.io/etcd/v3/mvcc.(*watchableStore).Restore()
      /go/src/go.etcd.io/etcd/mvcc/watchable_store.go:199 +0xe2
  go.etcd.io/etcd/v3/etcdserver.(*EtcdServer).applySnapshot()
      /go/src/go.etcd.io/etcd/etcdserver/server.go:1107 +0xa49
  go.etcd.io/etcd/v3/etcdserver.(*EtcdServer).applyAll()
      /go/src/go.etcd.io/etcd/etcdserver/server.go:1031 +0x6d
  go.etcd.io/etcd/v3/etcdserver.(*EtcdServer).run.func8()
      /go/src/go.etcd.io/etcd/etcdserver/server.go:986 +0x53
  go.etcd.io/etcd/v3/pkg/schedule.(*fifo).run()
      /go/src/go.etcd.io/etcd/pkg/schedule/schedule.go:157 +0x11e
Previous read at 0x00c000905f78 by goroutine 180:
  [failed to restore the stack]
Goroutine 764 (running) created at:
  go.etcd.io/etcd/v3/pkg/schedule.NewFIFOScheduler()
      /go/src/go.etcd.io/etcd/pkg/schedule/schedule.go:70 +0x2b1
  go.etcd.io/etcd/v3/etcdserver.(*EtcdServer).run()
      /go/src/go.etcd.io/etcd/etcdserver/server.go:871 +0x32c
Goroutine 180 (running) created at:
  net/http.(*Server).Serve()
      /usr/local/go/src/net/http/server.go:2933 +0x5b6
  net/http/httptest.(*Server).goServe.func1()
      /usr/local/go/src/net/http/httptest/server.go:308 +0xd3
==================
--- FAIL: TestV3WatchRestoreSnapshotUnsync (6.74s)
    testing.go:906: race detected during execution of test
FAIL
coverage: 83.5% of statements
FAIL	go.etcd.io/etcd/v3/integration	231.272s
FAIL
Command 'go test -timeout=30m -cpu=1 --race --cover=true go.etcd.io/etcd/v3/integration' failed.
```
2020-10-03 19:39:08 +02:00
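The locking approach described in the mvcc commit above (98b123f034) can be illustrated with a minimal sketch. It assumes a hypothetical `store` type with a single guarded field; the names `store`, `Rev`, and `Restore` here are illustrative and are not etcd's actual mvcc code. The slow part of the restore runs without the lock, and the lock is held only while shared state is swapped, so a concurrent metrics reader is never blocked for long.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// store is a hypothetical stand-in for a restorable key-value store.
type store struct {
	mu  sync.RWMutex
	rev int64 // guarded by mu
}

// Rev is what a metrics collector would call; it only needs a read lock.
func (s *store) Rev() int64 {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.rev
}

// Restore does its slow work without holding the lock and takes the
// lock only for the brief moment when the shared state is swapped.
func (s *store) Restore(newRev int64) {
	time.Sleep(10 * time.Millisecond) // simulated slow rebuild, lock not held

	s.mu.Lock()
	s.rev = newRev
	s.mu.Unlock()
}

func main() {
	s := &store{}

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		s.Restore(42)
	}()

	_ = s.Rev() // may run while Restore is in progress; no data race
	wg.Wait()
	fmt.Println(s.Rev()) // prints 42
}
```

Running this with `go run -race` should report no data race, because every access to `rev` goes through `mu`.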
Piotr Tabor 220f711a2a clientv3/integration: Fix leaked goroutine in case of skipped test. 2020-10-03 19:38:54 +02:00
Piotr Tabor 528f5315d6 auth: Fix "race" - auth unit tests leaking goroutines
- We were leaking goroutines in the auth tests.
  - The goroutines were depending on / modifying global test environment
    variables (simpleTokenTTLDefault), leading to races.

Removed the leaked goroutines, and extended leaked-goroutine detection to
cover the 'auth' package.
2020-10-03 19:38:30 +02:00
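As a rough illustration of the kind of fix described in the auth commit above (528f5315d6), the sketch below shows one common way to avoid leaking a background goroutine: give it a stop channel and have `Close` wait until it has exited. The `ticker` type and its functions are hypothetical and are not taken from the auth package.

```go
package main

import (
	"fmt"
	"time"
)

// ticker is a hypothetical background worker that must not outlive its owner.
type ticker struct {
	stop chan struct{} // closed to request shutdown
	done chan struct{} // closed by the goroutine when it exits
}

// startTicker launches the background goroutine.
func startTicker(interval time.Duration) *ticker {
	t := &ticker{stop: make(chan struct{}), done: make(chan struct{})}
	go func() {
		defer close(t.done)
		tk := time.NewTicker(interval)
		defer tk.Stop()
		for {
			select {
			case <-tk.C:
				// periodic work would go here
			case <-t.stop:
				return
			}
		}
	}()
	return t
}

// Close signals the goroutine to stop and blocks until it has exited,
// so no goroutine is left running after the test finishes.
func (t *ticker) Close() {
	close(t.stop)
	<-t.done
}

// startAndClose reports whether the goroutine has exited once Close returns.
func startAndClose() bool {
	t := startTicker(time.Millisecond)
	t.Close()
	select {
	case <-t.done:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(startAndClose()) // prints true
}
```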
Piotr Tabor 0e5d81704f .travis.yml, scripts: Fix minor bugs in the test script.
1. Setting an environment variable cannot be inside quotes.
2. "--race" testing for unit tests is supposed to be part of the linux-amd64-unit-4-cpu-race config.
3. The 'run' function in the test script should call log_error when a command
fails (wrong operator for integer comparison in bash).
2020-10-01 14:34:58 +02:00
CFC4N b0e8ec951c
CHANGELOG: update for #12165 , #12264 . 2020-09-30 16:57:15 +08:00
vivian ab4cc3caef
raft/log: remove redundant code logic (#12346)
Remove redundant code logic

Co-authored-by: yangweiwei <yangweiwei@cmss.chinamobile.com>
2020-09-29 19:48:32 -07:00
Piotr Tabor a6b0375b7b
Travis: Reduce footprint of unit tests - so hopefully flakiness (#12350)
* ./tests: Remove legacy coverage collection code

The legacy tests/cover.test.bash script has not been compatible with the
./test script for a long time.

The following method of coverage collection works (also across
packages) and does not slow down all test execution.

```
COVERDIR=coverage PASSES="build build_cov cov" ./test
go tool cover -html ./coverage/cover.out
```

* CI: Reduce duplicated coverage between different variants on Travis

We used to execute unit tests in 3 different jobs,
every time with --race detection and every time in 3 variants: 1, 2, and 4
CPUs.

The proposed change makes each of the jobs use a different CPU
variant, and only the 4-CPU variant runs with --race detection
(as the more parallel variant is more likely to experience races).
2020-09-29 19:47:05 -07:00
Joe Betz b47cd2f470
Merge pull request #12322 from ptabor/20200920-test-script
./test: Refactoring of test script for modularization
2020-09-29 11:09:24 -04:00
Naga Ravi Chaitanya Elluri 3022bd73ce Documentation/etcd-mixin/mixin.libsonnet: Add alerts for etcd fsync duration
This commit adds support to check the 99th percentile of the etcd
members fsync duration and fires a critical alert when it is greater
than 1 sec. The recommended fsync duration for etcd is 20 ms, but there might
be scenarios where a user is running on bad disks for some reason. This
alert lets the user/admin know that the situation is critical for
etcd performance.
2020-09-28 08:08:20 -04:00
Piotr Tabor b3bbe10465 go.sum: Update & make sure PASSES="mod_tidy" ./test detects such problems.
Commit inspired by this failure:
  https://travis-ci.com/github/etcd-io/etcd/jobs/391164537

This is not happening locally, but it can be forced by removing the go.sum
file. Let's watch how frequently we will need to refresh go.sum.
2020-09-28 12:00:25 +02:00
Piotr Tabor f1d4593241 ./test: Refactoring of test script for modularization
This refactoring offers the following benefits:

  - A unified way of calling go test commands (in terms of flag interpretation).
  - Uses standard Go mechanisms (like go list) to find files/packages that are subject to testing. The mechanisms are module-aware.
  - Added instructions on how to install the tools needed for the tests/checkers.
  - Added colors to the output to make it easier to spot any failure.

Confirmed to work using:
- COVERDIR="./coverage" CPU="4" RACE=false COVER=false PASSES="build build_cov cov" ./test
- CPU="4" RACE=false COVER=false PASSES="e2e functional integration" ./test
- COVERDIR="./coverage" COVER="false" CPU="4" RACE="false" PASSES="fmt build unit build_cov integration e2e integration_e2e grpcproxy cov" ./test
- PASSES=unit PKG=./wal TIMEOUT=1m ./test
- PASSES=integration PKG=./clientv3 TIMEOUT=1m ./test
- PASSES=unit PKG=./wal TESTCASE=TestNew TIMEOUT=1m ./test
- PASSES=unit PKG=./wal TESTCASE="\bTestNew\b" TIMEOUT=1m ./test
- PASSES=integration PKG=./client/integration TESTCASE="\bTestV2NoRetryEOF\b" TIMEOUT=1m ./test
- COVERDIR=coverage PASSES="build_cov cov" ./test
2020-09-28 11:07:50 +02:00
Pierre Zemb cc2b4cd05e
etcdserver: add more detailed traces on linearized reading (#12335)
To improve debuggability of `agreement among raft nodes before
linearized reading`, we added some tracing inside
`linearizableReadLoop`.

This will allow us to know the timing of `s.r.ReadIndex` vs
`s.applyWait.Wait(rs.Index)`.
2020-09-26 19:08:36 -07:00
Piotr Tabor 31426b0041 clientv3/ordering: Split mocked part of the test from integration-level test. 2020-09-26 08:44:58 +02:00
Piotr Tabor d65b5d6791 Makefile: Improve the 'make clean' to remove all the tmp artifacts 2020-09-25 22:20:52 +02:00
Piotr Tabor 16eeedffaa pkg/testutil: Fixing flakes due to "leak" text/template/parse goroutines.
Example flake: https://travis-ci.com/github/etcd-io/etcd/jobs/388806782
```
go test -timeout=5m -cpu=1 --run=Example ./client/...

ok  	go.etcd.io/etcd/v3/client	0.085s
testing: warning: no tests to run
PASS
Unexpected goroutines running after all test(s).
1 instances of:
text/template/parse.(*lexer).emit(...)
	/usr/local/go/src/text/template/parse/lex.go:157
text/template/parse.lexText(...)
	/usr/local/go/src/text/template/parse/lex.go:269 +0x4f0
text/template/parse.(*lexer).run(...)
	/usr/local/go/src/text/template/parse/lex.go:230 +0x37
created by text/template/parse.lex
	/usr/local/go/src/text/template/parse/lex.go:223 +0x190
FAIL	go.etcd.io/etcd/v3/client/integration	0.013s
```
2020-09-25 22:10:43 +02:00
Piotr Tabor 73e5714bc5
integration: 'go test -tags cluster_proxy -v ./integration/... ./clientv3/...' passes now. (#12319)
The grpc-proxy test logic was assuming that the context associated with the client is closed,
while in practice all tests called client.Close() without an explicit context close.

The current testing strategy is complicated in two ways:
  - The grpc proxy works like a man-in-the-middle of each connection issued
from integration tests, and its lifetime is bound to the connection.
  - Both connections (client -> proxy, and proxy -> etcd-server) are
represented by the same ClientV3 object instance (with substituted
implementations of KV or watcher).

The fix splits the context representing the proxy from the context representing the proxy -> etcd-server connection,
thus allowing cancellation of the proxy context.
2020-09-25 12:18:58 -07:00
mlmhl 3f36143790
pkg/traceutil: skip subTraceStart/subTraceEnd steps when logging steps (#12262)
SubTraceStart and SubTraceEnd steps are only placeholders, not real
steps. We should skip them when logging the long-duration steps;
otherwise these steps will lead to an incorrect start time and duration
for subsequent steps.
2020-09-25 11:46:06 -07:00
Piotr Tabor 7bf75824bf
CHANGELOG: Update changelog about modules instead of ./vendor (#12313)
Follow up to PR: https://github.com/etcd-io/etcd/pull/12279
2020-09-25 11:29:51 -07:00
Naga Ravi Chaitanya Elluri ed82418799
Documentation: Add etcd database quota alerts (#12249)
This commit:
- Fires a critical alert when the etcd database quota is 95% full
  at any given point in time, to alert the user to defrag or increase
  the quota in order to avoid triggering the alarm that blocks
  all writes to etcd, meaning no new objects can be created.
  This is needed to make sure the cluster supports running a large number
  of nodes and objects.
- Fires a warning when a sudden surge in etcd writes leads to
  an increase in the etcd database quota size at an alarming rate, as it
  is disruptive. It might be because of a rogue process, and it's
  important to alert the admin.
2020-09-25 11:03:04 -07:00
CFC4N 8050881aaf
clientv3:get AuthToken gracefully without extra connection. (#12165)
* etcdserver: check authinfo if it is not InternalAuthenticateRequest.

* credentials: let GetRequestMetadata() return nil when authToken isn't initialized.

* clientv3: get AuthToken gracefully without extra connection.
2020-09-25 11:01:54 -07:00
Paweł Krupa 74fea11ddc
Documentation/etcd-mixin: Adhere to monitoring mixins annotation guidelines (#12224)
* replaced `message` annotation field with `description`
* added simple `summary` field

Signed-off-by: paulfantom <pawel@krupa.net.pl>
2020-09-25 10:56:52 -07:00
Tobias Klauser add86bbd1a
pkg/fileutil: use fcntl syscall wrappers from golang.org/x/sys/unix (#12316)
Direct syscalls using syscall.Syscall(SYS_*, ...) should no longer be
used on darwin, see [1]. Instead, use the fcntl libSystem wrappers
provided by the golang.org/x/sys/unix package which implement the same
functionality.

[1] https://golang.org/doc/go1.12#darwin
2020-09-24 23:02:32 -07:00
yofan 4136df7933
lease: fix lease expiry bases on wall clock (#12292)
fix https://github.com/etcd-io/etcd/issues/12291
2020-09-24 22:59:25 -07:00
CFC4N 8c192d99df
clientv3: get AuthToken automatically when clientConn is ready. (#12264)
fixes: #11954
2020-09-24 22:43:21 -07:00
Jingyi Hu 205a656cc5
Merge pull request #11853 from viviyww/dev1
tools: fix test case errors in etcd-dump-logs
2020-09-21 09:13:31 -07:00
Jingyi Hu 353fcf0924
Merge pull request #12314 from viviyww/incorrect-log
etcdserver: fix log info error
2020-09-21 03:17:59 -07:00
Jingyi Hu de1550d7c8
Merge pull request #12318 from ptabor/20200920-leak-detection-flake-fix
pkg/testutil: Ignore flakes due to "leaked" testing.runTests goroutine
2020-09-21 01:22:06 -07:00
yangweiwei ff516e3a36 etcdserver: fix log info error
etcdserver#util.go#warnOfExpensiveReadOnlyTxnRequest logs with wrong prefix "read-only range ".
It has to be "read-only txn ".

fixes #12295
2020-09-21 16:00:01 +08:00
yangweiwei c88d1497ee tools: fix test case errors in etcd-dump-logs
Fix test case errors in etcd-dump-logs caused by the time zone.
When the host time zone is CST or PST, the test case fails.
So we should set UTC as the standard time zone.
2020-09-21 15:54:47 +08:00
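The time zone problem described in the etcd-dump-logs commit above (c88d1497ee) can be shown with a minimal sketch: formatting an instant after converting it to UTC gives the same text regardless of the zone the value was created in. The helper names `formatUTC` and `sample` are hypothetical, and this is not the code from the actual fix.

```go
package main

import (
	"fmt"
	"time"
)

// formatUTC renders an instant in UTC, so the result does not depend on
// the time zone the value was created in or on the host's local zone.
func formatUTC(t time.Time) string {
	return t.UTC().Format(time.RFC3339)
}

// sample builds a fixed instant in a +08:00 zone and formats it in UTC.
func sample() string {
	cst := time.FixedZone("CST", 8*60*60)
	t := time.Date(2020, 9, 21, 15, 54, 47, 0, cst)
	return formatUTC(t)
}

func main() {
	fmt.Println(sample()) // prints 2020-09-21T07:54:47Z
}
```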
Jingyi Hu 14566556f3
Merge pull request #12283 from teddylear/feature/FixPortOnTest
embed: TestStartEtcdWrongToken now uses dynamic ports instead of default
2020-09-20 08:30:48 -07:00
Jingyi Hu 132098b028
Merge pull request #12311 from ptabor/20200917-proxy-watcher-progress-panic
integration,proxy: Skip WatchRequestProgress test in grpc-proxy mode.
2020-09-20 08:07:20 -07:00
teddylear 3b92b9f884 embed: TestStartEtcdWrongToken now uses dynamic ports instead of default
To avoid issues with a test failing due to port conflict when etcd is
already running, this test now uses dynamic ports instead.

Fixes #11956
2020-09-19 11:29:22 -04:00
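One common way to obtain a dynamic port in Go, in the spirit of the embed commit above (3b92b9f884), is to listen on port 0 and let the kernel pick an unused port. The sketch below assumes a hypothetical helper `freeLocalAddr`; whether the actual test uses this exact mechanism is not stated in the commit message.

```go
package main

import (
	"fmt"
	"net"
)

// freeLocalAddr asks the kernel for an unused TCP port on the loopback
// interface by listening on port 0, then releases it and returns the address.
func freeLocalAddr() (string, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer l.Close()
	return l.Addr().String(), nil
}

func main() {
	addr, err := freeLocalAddr()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	fmt.Println("free address:", addr)
}
```

Note that the port is released before the caller uses it, so another process could claim it in between. Keeping the listener open and handing it to the server avoids that window when the API allows it.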
Piotr Tabor 5f9a1394db integration,proxy: Skip WatchRequestProgress test in grpc-proxy mode.
Fixes:
  go test -tags cluster_proxy ./clientv3/integration -v -run TestWatchRequestProgress

Does not fail the grpc-server (completely) on a not-implemented RPC.
Failing the whole server because of a remote request is an anti-pattern and a security
risk.

Prior to the fix, the command line above was failing with:

```
=== RUN   TestWatchRequestProgress/0-watcher
panic: not implemented

goroutine 602 [running]:
go.etcd.io/etcd/v3/proxy/grpcproxy.(*watchProxyStream).recvLoop(0xc0004779d0, 0x0, 0x0)
	/home/ptab/corp/etcd/proxy/grpcproxy/watch.go:275 +0xac5
go.etcd.io/etcd/v3/proxy/grpcproxy.(*watchProxy).Watch.func1(0xc0034f94a0, 0xc0004779d0)
	/home/ptab/corp/etcd/proxy/grpcproxy/watch.go:129 +0x53
created by go.etcd.io/etcd/v3/proxy/grpcproxy.(*watchProxy).Watch
	/home/ptab/corp/etcd/proxy/grpcproxy/watch.go:127 +0x3c8
FAIL	go.etcd.io/etcd/v3/clientv3/integration	0.215s
FAIL
```
2020-09-19 17:27:07 +02:00
Piotr Tabor 04b91945f4 pkg/testutil: Ignore flakes due to "leaked" testing.runTests goroutine.
The flake happened e.g. in:
https://travis-ci.com/github/etcd-io/etcd/jobs/386607570

```
--- PASS: TestWatchClose (0.37s)
PASS
Unexpected goroutines running after all test(s).
1 instances of:
testing.runTests.func1.1(...)
	/usr/local/go/src/testing/testing.go:1289 +0x60
created by testing.runTests.func1
	/usr/local/go/src/testing/testing.go:1289 +0xdb
FAIL	go.etcd.io/etcd/v3/clientv3/integration	344.389s
FAIL
```

This is an implementation detail of Go's testing library, so we should not worry about it.
2020-09-19 17:16:21 +02:00
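The idea of ignoring a known-benign goroutine in a leak check, as in the pkg/testutil commit above (04b91945f4), can be sketched as a simple filter over goroutine stack dumps. The `benign` list and the `suspicious` function are hypothetical; the real check in pkg/testutil may be structured differently.

```go
package main

import (
	"fmt"
	"strings"
)

// benign lists substrings of goroutine stacks that a leak check may ignore.
// The single entry mirrors the frame named in the flake above; the list
// itself is a hypothetical illustration.
var benign = []string{"testing.runTests"}

// suspicious returns the stacks that match none of the benign patterns.
func suspicious(stacks []string) []string {
	var out []string
	for _, s := range stacks {
		ignored := false
		for _, b := range benign {
			if strings.Contains(s, b) {
				ignored = true
				break
			}
		}
		if !ignored {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	stacks := []string{
		"testing.runTests.func1.1(...)\ncreated by testing.runTests.func1",
		"example.worker(...)\ncreated by example.start",
	}
	fmt.Println(len(suspicious(stacks))) // prints 1
}
```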
Jingyi Hu 6d5b77b91b
Merge pull request #12315 from BinacsLee/binacs-mvcc-fix-typo
mvcc: fix typo
2020-09-18 09:17:59 -07:00
Binacs Lee 6968c45f58 mvcc: fix typo 2020-09-18 06:50:56 +00:00
Sahdev Zala 588c021ddb
Merge pull request #12308 from zcchew1202/readme-fix
Doc: Add that grpc-proxy is optional in readme
2020-09-17 16:29:18 -04:00
Zi Chien Chew be348f0ea6 Doc: Add that grpc-proxy is optional in readme
The script allows optionally enabling `grpc-proxy` so reflecting it in the doc…
2020-09-17 12:13:18 -04:00
Jingyi Hu 528b01c327
Merge pull request #12303 from ptabor/20200915-v3compactor-metric-reporting
etcdserver: v3compactor should use proper clock for latency (took) reporting
2020-09-17 02:26:56 -07:00