They used to take >10min with coverage, so were causing interrupted
Travis runs.
Know thay fit in 100-150s (together), thanks also to parallel
execution.
The root reason of flakes, was that server was considered as ready to
early.
In particular:
```
../../bin/etcd-2456648: {"level":"info","ts":"2021-01-11T09:56:44.474+0100","caller":"rafthttp/stream.go:274","msg":"established TCP streaming connection with remote peer","stream-writer-type":"stream Message","local-member-id":"ed5f620d34a8e61b","remote-peer-id":"ca50e9357181d758"}
../../bin/etcd-2456648: {"level":"warn","ts":"2021-01-11T09:56:49.040+0100","caller":"etcdserver/server.go:1942","msg":"failed to publish local member to cluster through raft","local-member-id":"ed5f620d34a8e61b","local-member-attributes":"{Name:infra2 ClientURLs:[http://localhost:20030]}","request-path":"/0/members/ed5f620d34a8e61b/attributes","publish-timeout":"7s","error":"etcdserver: request timed out, possibly due to connection lost"}
../../bin/etcd-2456648: {"level":"info","ts":"2021-01-11T09:56:49.049+0100","caller":"etcdserver/server.go:1921","msg":"published local member to cluster through raft","local-member-id":"ed5f620d34a8e61b","local-member-attributes":"{Name:infra2 ClientURLs:[http://localhost:20030]}","request-path":"/0/members/ed5f620d34a8e61b/attributes","cluster-id":"34f27e83b3bc2ff","publish-timeout":"7s"}
```
was taking 5s. If this was happening concurrently with etcdctl, the
etcdctl could timeout.
The fix, requires servers to report 'ready to serve client requests' to consider them up.
Fixed also some whitelisted 'goroutines'.
The commit ensures that spawned etcdctl processes are "closed",
so they perform proper os wait processing.
This might have contributed to file-descriptor/open-files limit being
exceeded.
In these unit tests, goroutines may leak if certain branches are chosen. This commit edits channel operations and buffer sizes, so no matter what branch is chosen, the test will end correctly. This commit doesn't change the semantics of unit tests.
Use golang.org/x/sys/unix for F_OFD_* constants.
This fixes the issue that F_OFD_GETLK was defined incorrectly,
resulting in bugs such as https://github.com/moby/moby/issues/31182
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This brings consistency between proto-generation code and actual versions of libraries being used in runtime:
github.com/gogo/protobuf v1.2.1,v1.0.0 -> v1.3.1
github.com/golang/protobuf v1.3.2 -> v1.3.5
github.com/grpc-ecosystem/grpc-gateway v1.9.5,v1.4.1,v1.15.2 -> v1.14.6
google.golang.org/grpc v1.26.0 -> v1.29.1
Moved as far as possible, without bumping on grpc 1.30.0 "naming" decomissioning.
Please also notice that gogo/protobuf is likely to reach EOL: https://github.com/gogo/protobuf/issues/691
This package depends on raft and is solelly used by etcdserver/raft.go.
So it does not fullfills conditions of:
```
pkg/ is a collection of utility packages used by etcd without being specific to etcd itself. A package belongs here
only if it could possibly be moved out into its own repository in the future.
```
Mechanical execution of:
```
git mv ./pkg/mock/mockserver ./client/pkg/mock
git mv ./pkg/mock/{mockstorage,mockstore,mockwait} ./server/pkg/mock
```
The packages depend on etcd API / protos - so they are NOT etcd-dependencies.
In such situation thay should be placed in 'pkg' folder.
We introduce a LazyCluster abstraction (instead of copy-pasted logic)
that makes clusters to be created only if there are runnable tests
in need for the infrastructure.
This CL tries to connect 2 objectives:
- Examples should be close (the same package) to the original code,
such that they can participate in documentation.
- Examples should be runnable - such that they are not getting out of
sync with underlying API/implementation.
In case of etcd-client, the examples are assuming running 'integration'
style, i.e. thay do connect to fully functional etcd-server.
That would lead to a cyclic dependencies between modules:
- server depends on client (as client need to be lightweight)
- client (for test purposes) depend on server.
Go modules does not allow to distingush testing dependency from
prod-code dependency.
Thus to meet the objective:
- The examples are getting executed within testing/integration packages against real etcd
- The examples are symlinked to 'unit' tests, such that they included in documentation.
- Long-term the unit examples should get rewritten to use 'mocks' instead of real integration tests.
- We were leaking goroutines in auth-test
- The go-routines were depending / modifying global test environment
variables (simpleTokenTTLDefault) leading to races
Removed the leaked go-routines, and expanded 'auth' package to
be covered we leaked go-routines detection.
To improve debuggability of `agreement among raft nodes before
linearized reading`, we added some tracing inside
`linearizableReadLoop`.
This will allow us to know the timing of `s.r.ReadIndex` vs
`s.applyWait.Wait(rs.Index)`.
Examplar flake: https://travis-ci.com/github/etcd-io/etcd/jobs/388806782
```
go test -timeout=5m -cpu=1 --run=Example ./client/...
ok go.etcd.io/etcd/v3/client 0.085s
testing: warning: no tests to run
PASS
Unexpected goroutines running after all test(s).
1 instances of:
text/template/parse.(*lexer).emit(...)
/usr/local/go/src/text/template/parse/lex.go:157
text/template/parse.lexText(...)
/usr/local/go/src/text/template/parse/lex.go:269 +0x4f0
text/template/parse.(*lexer).run(...)
/usr/local/go/src/text/template/parse/lex.go:230 +0x37
created by text/template/parse.lex
/usr/local/go/src/text/template/parse/lex.go:223 +0x190
FAIL go.etcd.io/etcd/v3/client/integration 0.013s
```
SubTraceStart and SubTraceEnd steps are only placeholders, not really
steps, we should skip them when logging the long duration steps,
otherwise these steps will lead to incorrect start time and duration
of subsequent steps.
Direct syscalls using syscall.Syscall(SYS_*, ...) should no longer be
used on darwin, see [1]. Instead, use the fcntl libSystem wrappers
provided by the golang.org/x/sys/unix package which implement the same
functionality.
[1] https://golang.org/doc/go1.12#darwin
The flake happened e.g. in:
https://travis-ci.com/github/etcd-io/etcd/jobs/386607570
```
--- PASS: TestWatchClose (0.37s)
PASS
Unexpected goroutines running after all test(s).
1 instances of:
testing.runTests.func1.1(...)
/usr/local/go/src/testing/testing.go:1289 +0x60
created by testing.runTests.func1
/usr/local/go/src/testing/testing.go:1289 +0xdb
FAIL go.etcd.io/etcd/v3/clientv3/integration 344.389s
FAIL
```
This is implementation detail of Go testing.lib and we should not worry.
Marked all 'integrational, e2e' as skipped in the --short mode.
Thanks to this we will be able to significantly simplify ./test script.
The run currently takes ~23s.
With (follow up) move of ~clientv3/snapshot to integration tests (as
part of modularization), we can expect this to fall to 5-10s.
```
% time go test --short ./... --count=1
ok go.etcd.io/etcd/v3 0.098s
? go.etcd.io/etcd/v3/Documentation/learning/lock/client [no test files]
? go.etcd.io/etcd/v3/Documentation/learning/lock/storage [no test files]
ok go.etcd.io/etcd/v3/auth 0.724s
? go.etcd.io/etcd/v3/auth/authpb [no test files]
ok go.etcd.io/etcd/v3/client 0.166s
ok go.etcd.io/etcd/v3/client/integration 0.166s
ok go.etcd.io/etcd/v3/clientv3 3.219s
ok go.etcd.io/etcd/v3/clientv3/balancer 1.102s
? go.etcd.io/etcd/v3/clientv3/balancer/connectivity [no test files]
? go.etcd.io/etcd/v3/clientv3/balancer/picker [no test files]
? go.etcd.io/etcd/v3/clientv3/balancer/resolver/endpoint [no test files]
ok go.etcd.io/etcd/v3/clientv3/clientv3util 0.096s [no tests to run]
ok go.etcd.io/etcd/v3/clientv3/concurrency 3.323s
? go.etcd.io/etcd/v3/clientv3/credentials [no test files]
ok go.etcd.io/etcd/v3/clientv3/integration 0.131s
? go.etcd.io/etcd/v3/clientv3/leasing [no test files]
? go.etcd.io/etcd/v3/clientv3/mirror [no test files]
ok go.etcd.io/etcd/v3/clientv3/namespace 0.041s
ok go.etcd.io/etcd/v3/clientv3/naming 0.115s
ok go.etcd.io/etcd/v3/clientv3/ordering 0.121s
ok go.etcd.io/etcd/v3/clientv3/snapshot 19.325s
ok go.etcd.io/etcd/v3/clientv3/yaml 0.090s
ok go.etcd.io/etcd/v3/contrib/raftexample 7.572s
? go.etcd.io/etcd/v3/contrib/recipes [no test files]
ok go.etcd.io/etcd/v3/embed 0.282s
ok go.etcd.io/etcd/v3/etcdctl 0.054s
? go.etcd.io/etcd/v3/etcdctl/ctlv2 [no test files]
ok go.etcd.io/etcd/v3/etcdctl/ctlv2/command 0.117s
? go.etcd.io/etcd/v3/etcdctl/ctlv3 [no test files]
ok go.etcd.io/etcd/v3/etcdctl/ctlv3/command 0.070s
ok go.etcd.io/etcd/v3/etcdmain 0.172s
ok go.etcd.io/etcd/v3/etcdserver 1.698s
? go.etcd.io/etcd/v3/etcdserver/api [no test files]
ok go.etcd.io/etcd/v3/etcdserver/api/etcdhttp 0.075s
ok go.etcd.io/etcd/v3/etcdserver/api/membership 0.104s
? go.etcd.io/etcd/v3/etcdserver/api/membership/membershippb [no test files]
ok go.etcd.io/etcd/v3/etcdserver/api/rafthttp 0.181s
ok go.etcd.io/etcd/v3/etcdserver/api/snap 0.078s
? go.etcd.io/etcd/v3/etcdserver/api/snap/snappb [no test files]
ok go.etcd.io/etcd/v3/etcdserver/api/v2auth 0.142s
ok go.etcd.io/etcd/v3/etcdserver/api/v2discovery 0.035s
ok go.etcd.io/etcd/v3/etcdserver/api/v2error 0.043s
ok go.etcd.io/etcd/v3/etcdserver/api/v2http 0.070s
ok go.etcd.io/etcd/v3/etcdserver/api/v2http/httptypes 0.031s
? go.etcd.io/etcd/v3/etcdserver/api/v2stats [no test files]
ok go.etcd.io/etcd/v3/etcdserver/api/v2store 0.645s
ok go.etcd.io/etcd/v3/etcdserver/api/v2v3 0.218s
? go.etcd.io/etcd/v3/etcdserver/api/v3alarm [no test files]
? go.etcd.io/etcd/v3/etcdserver/api/v3client [no test files]
ok go.etcd.io/etcd/v3/etcdserver/api/v3compactor 1.765s
? go.etcd.io/etcd/v3/etcdserver/api/v3election [no test files]
? go.etcd.io/etcd/v3/etcdserver/api/v3election/v3electionpb [no test files]
? go.etcd.io/etcd/v3/etcdserver/api/v3election/v3electionpb/gw [no test files]
? go.etcd.io/etcd/v3/etcdserver/api/v3lock [no test files]
? go.etcd.io/etcd/v3/etcdserver/api/v3lock/v3lockpb [no test files]
? go.etcd.io/etcd/v3/etcdserver/api/v3lock/v3lockpb/gw [no test files]
ok go.etcd.io/etcd/v3/etcdserver/api/v3rpc 0.091s
ok go.etcd.io/etcd/v3/etcdserver/api/v3rpc/rpctypes 0.012s
ok go.etcd.io/etcd/v3/etcdserver/cindex 0.054s
ok go.etcd.io/etcd/v3/etcdserver/etcdserverpb 0.039s
? go.etcd.io/etcd/v3/etcdserver/etcdserverpb/gw [no test files]
ok go.etcd.io/etcd/v3/functional/agent 0.094s
? go.etcd.io/etcd/v3/functional/cmd/etcd-agent [no test files]
? go.etcd.io/etcd/v3/functional/cmd/etcd-proxy [no test files]
? go.etcd.io/etcd/v3/functional/cmd/etcd-runner [no test files]
? go.etcd.io/etcd/v3/functional/cmd/etcd-tester [no test files]
ok go.etcd.io/etcd/v3/functional/rpcpb 0.060s
? go.etcd.io/etcd/v3/functional/runner [no test files]
ok go.etcd.io/etcd/v3/functional/tester 0.079s
ok go.etcd.io/etcd/v3/integration 0.684s
ok go.etcd.io/etcd/v3/integration/embed 0.101s
ok go.etcd.io/etcd/v3/lease 3.455s
ok go.etcd.io/etcd/v3/lease/leasehttp 2.185s
? go.etcd.io/etcd/v3/lease/leasepb [no test files]
ok go.etcd.io/etcd/v3/mvcc 7.246s
ok go.etcd.io/etcd/v3/mvcc/backend 0.354s
? go.etcd.io/etcd/v3/mvcc/mvccpb [no test files]
ok go.etcd.io/etcd/v3/pkg/adt 0.025s
? go.etcd.io/etcd/v3/pkg/contention [no test files]
? go.etcd.io/etcd/v3/pkg/cpuutil [no test files]
ok go.etcd.io/etcd/v3/pkg/crc 0.008s
? go.etcd.io/etcd/v3/pkg/debugutil [no test files]
ok go.etcd.io/etcd/v3/pkg/expect 0.015s
ok go.etcd.io/etcd/v3/pkg/fileutil 0.268s
ok go.etcd.io/etcd/v3/pkg/flags 0.021s
ok go.etcd.io/etcd/v3/pkg/httputil 0.020s
ok go.etcd.io/etcd/v3/pkg/idutil 0.008s
ok go.etcd.io/etcd/v3/pkg/ioutil 0.025s
ok go.etcd.io/etcd/v3/pkg/logutil 0.047s
? go.etcd.io/etcd/v3/pkg/mock/mockserver [no test files]
? go.etcd.io/etcd/v3/pkg/mock/mockstorage [no test files]
? go.etcd.io/etcd/v3/pkg/mock/mockstore [no test files]
? go.etcd.io/etcd/v3/pkg/mock/mockwait [no test files]
ok go.etcd.io/etcd/v3/pkg/netutil 1.024s
ok go.etcd.io/etcd/v3/pkg/osutil 0.021s
ok go.etcd.io/etcd/v3/pkg/pathutil 0.008s
ok go.etcd.io/etcd/v3/pkg/pbutil 0.008s
ok go.etcd.io/etcd/v3/pkg/proxy 4.081s
ok go.etcd.io/etcd/v3/pkg/report 0.008s
? go.etcd.io/etcd/v3/pkg/runtime [no test files]
ok go.etcd.io/etcd/v3/pkg/schedule 0.009s
ok go.etcd.io/etcd/v3/pkg/srv 0.019s
ok go.etcd.io/etcd/v3/pkg/stringutil 0.008s
? go.etcd.io/etcd/v3/pkg/systemd [no test files]
ok go.etcd.io/etcd/v3/pkg/testutil 0.023s
ok go.etcd.io/etcd/v3/pkg/tlsutil 3.965s
ok go.etcd.io/etcd/v3/pkg/traceutil 0.034s
ok go.etcd.io/etcd/v3/pkg/transport 0.532s
ok go.etcd.io/etcd/v3/pkg/types 0.028s
ok go.etcd.io/etcd/v3/pkg/wait 0.023s
ok go.etcd.io/etcd/v3/proxy/grpcproxy 0.101s
? go.etcd.io/etcd/v3/proxy/grpcproxy/adapter [no test files]
? go.etcd.io/etcd/v3/proxy/grpcproxy/cache [no test files]
ok go.etcd.io/etcd/v3/proxy/httpproxy 0.044s
ok go.etcd.io/etcd/v3/proxy/tcpproxy 0.047s
ok go.etcd.io/etcd/v3/raft 0.312s
ok go.etcd.io/etcd/v3/raft/confchange 0.183s
ok go.etcd.io/etcd/v3/raft/quorum 0.316s
ok go.etcd.io/etcd/v3/raft/raftpb 0.024s
ok go.etcd.io/etcd/v3/raft/rafttest 0.640s
ok go.etcd.io/etcd/v3/raft/tracker 0.026s
ok go.etcd.io/etcd/v3/tests/e2e 0.077s
? go.etcd.io/etcd/v3/tools/benchmark [no test files]
? go.etcd.io/etcd/v3/tools/benchmark/cmd [no test files]
? go.etcd.io/etcd/v3/tools/etcd-dump-db [no test files]
ok go.etcd.io/etcd/v3/tools/etcd-dump-logs 0.088s
? go.etcd.io/etcd/v3/tools/etcd-dump-metrics [no test files]
? go.etcd.io/etcd/v3/tools/local-tester/bridge [no test files]
? go.etcd.io/etcd/v3/version [no test files]
ok go.etcd.io/etcd/v3/wal 1.517s
? go.etcd.io/etcd/v3/wal/walpb [no test files]
go test --short ./... --count=1 76.12s user 12.57s system 375% cpu 23.635 total
```
We have following communication schema:
client --- 1 ---> grpc-proxy --- 2 --- > etcd-server
There are 2 sets of flags/certs in grpc proxy [ https://github.com/etcd-io/etcd/blob/master/etcdmain/grpc_proxy.go#L140 ]:
A. (cert-file, key-file, trusted-ca-file, auto-tls) this are controlling [1] so client to proxy connection and in particular they are describing proxy public identity.
B. (cert,key, cacert ) - these are controlling [2] so what's the identity that proxy uses to make connections to the etcd-server.
If 2 (B.) contains certificate with CN and etcd-server is running with --client-cert-auth=true, the CN can be used as identity of 'client' from service perspective. This is permission escalation, that we should forbid.
If 1 (A.) contains certificate with CN - it should be considered perfectly valid. The server can (should) have full identity.
So only --cert flag (and not --cert-file flag) should be validated for empty CN.
The source of problem was the fact that multiple tests were creating
their clusters (and some of them were setting global grpclog).
If the test was running after some other test that created HttpServer
(so accessed grpclog), this was reported as race.
Tested with:
go test ./clientv3/. -v "--run=(Example).*" --count=2
go test ./clientv3/. -v "--run=(Test).*" --count=2
go test ./integration/embed/. -v "--run=(Test).*" --count=2
The fix is needed to mitigate consequences of
https://github.com/golang/go/issues/29458 "golang breaking change" that
causes following test failures on etcd end:
--- FAIL: TestCtlV2Set (0.00s)
ctl_v2_test.go:552: could not start etcd process cluster (fork/exec ../../bin/etcd: Setctty set but Ctty not valid in child)
--- FAIL: TestCtlV2SetQuorum (0.00s)
ctl_v2_test.go:552: could not start etcd process cluster (fork/exec ../../bin/etcd: Setctty set but Ctty not valid in child)
--- FAIL: TestCtlV2SetClientTLS (0.00s)
ctl_v2_test.go:552: could not start etcd process cluster (fork/exec ../../bin/etcd: Setctty set but Ctty not valid in child)
os.MkdirAll creates directory before umask so make sure that a desired
permission is set after creating a directory with MkdirAll. Use the
existing TouchDirAll function which checks for permission if dir is already
exist and when create a new dir.
This change makes the etcd package compatible with the existing Go
ecosystem for module versioning.
Used this tool to update package imports:
https://github.com/KSubedi/gomove
* etcdserver: add trace for txn request
* pkg/traceutil: added StopSubTrace as a sign of the end of subtrace. Added test case for logging out subtrace.
steps:range from the in-memory index tree; range from boltdb.
etcdserver: add tracing steps: agreement among raft nodes before
linerized reading; authentication; filter and sort kv pairs; assemble
the response.
Etcd currently supports validating peers based on their TLS certificate's
CN field. The current best practice for creation and validation of TLS
certs is to use the Subject Alternative Name (SAN) fields instead, so that
a certificate might be issued with a unique CN and its logical
identities in the SANs.
This commit extends the peer validation logic to use Go's
`(*"crypto/x509".Certificate).ValidateHostname` function for name
validation, which allows SANs to be used for peer access control.
In addition, it allows name validation to be enabled on clients as well.
This is used when running Etcd behind an authenticating proxy, or as
an internal component in a larger system (like a Kubernetes master).
Update to Go 1.12.5 testing. Remove deprecated unused and gosimple
pacakges, and mask staticcheck 1006. Also, fix unconvert errors related
to unnecessary type conversions and following staticcheck errors:
- remove redundant return statements
- use for range instead of for select
- use time.Since instead of time.Now().Sub
- omit comparison to bool constant
- replace T.Fatal and T.Fatalf in tests with T.Error and T.Fatalf respectively because the goroutine calls T.Fatal must be called in the same goroutine as the test
- fix error strings that should not be capitalized
- use sort.Strings(...) instead of sort.Sort(sort.StringSlice(...))
- use he status code of Canceled instead of grpc.ErrClientConnClosing which is deprecated
- use use status.Errorf instead of grpc.Errorf which is deprecated
Related #10528#10438
Membership reconfiguration may not be applied
when the new member joins. Also pass all endpoints
to check restore data in case of leader election or
network faults.
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
bufio.NewReader.ReadString blocks even
when the process received syscall.SIGKILL.
Remove ptyMu mutex and make ReadString return
when *os.File is closed.
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
ftruncate changes st_blocks, and following fallocate
syscalls would return EINVAL when allocated block size
is already greater than requested block size
(e.g. st_blocks==8, requested blocks are 2).
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
This commit adds an authentication mechanism to inter peer connection
(rafthttp). If the cert based peer auth is enabled and a new option
`--peer-cert-allowed-cn` is passed, an etcd process denies a peer
connection whose CN doesn't match.
The golang/time package tracks monotonic time in each time.Time
returned by time.Now function for Go 1.9.
Use time.Time to measure whether a lease is expired and remove
previous pkg/monotime. Use zero time.Time to mean forever. No
expiration when expiry.IsZero() is true.
* flag: improve StringFlags by support set default value when init
when init flagSet, set default value should be moved to StringFlags init
func, which is more friendly
personal proposal
* flag: code improved for StringFlags
Leak detector is catching goroutines trying to close files which appear
runtime related:
1 instances of:
syscall.Syscall(...)
/usr/local/golang/1.8.3/go/src/syscall/asm_linux_386.s:20 +0x5
syscall.Close(...)
/usr/local/golang/1.8.3/go/src/syscall/zsyscall_linux_386.go:296 +0x3d
os.(*file).close(...)
/usr/local/golang/1.8.3/go/src/os/file_unix.go:140 +0x62
It's unlikely a user goroutine will leak on file close; whitelist it.