19830 Commits (8c483f31add76f7933696aee3750ef1f309c6e34)

Author SHA1 Message Date
Wei Fu 09d053e035 tests/robustness: tune timeout policy
In a [scheduled test][1], the error shows

```
2023-04-19T11:16:15.8166316Z     traffic.go:96: rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout
```

According to [grpc-keepalive@v1.51.0][2], each frame received from the
server refreshes `lastRead`, and in that case the client won't send a
`Ping` frame to the server. But the client used by the
[`tombstone` request][3] might hit a race. Since we use 5ms as the
timeout, the client might not receive the `Ping` ACK from the server in
time. The keepalive then marks the connection as timed out and closes it.

I couldn't reproduce it locally. If we add a sleep before updating
`lastRead`, it reproduces sometimes. I'm still investigating this
part.

```diff
diff --git a/internal/transport/http2_client.go b/internal/transport/http2_client.go
index d518b07e..bee9c00a 100644
--- a/internal/transport/http2_client.go
+++ b/internal/transport/http2_client.go
@@ -1560,6 +1560,7 @@ func (t *http2Client) reader(errCh chan<- error) {
                t.controlBuf.throttle()
                frame, err := t.framer.fr.ReadFrame()
                if t.keepaliveEnabled {
+                       time.Sleep(2 * time.Millisecond)
                        atomic.StoreInt64(&t.lastRead, time.Now().UnixNano())
                }
                if err != nil {
```

`DialKeepAliveTime` is always >= [10s][4]. I think we should increase
the timeout to avoid flakiness caused by an unstable environment.

And in a [scheduled test][5], the error shows

```
logger.go:130: 2023-04-22T10:45:52.646Z	INFO	Failed to trigger failpoint	{"failpoint": "blackhole", "error": "context deadline exceeded"}
```

Before sending `Status` to the member, the client doesn't [pick][6] the
connection in time (100ms) and returns the error.

`waitTillSnapshot` is used to ensure that the cluster is ready to
trigger a snapshot transfer. Since we already have a 1min timeout for
`injectFailpoints`, I think we can remove the 100ms timeout to avoid
stopping unnecessarily.

```
injectFailpoints(1min timeout)
  failpoint.Inject
    triggerBlockhole.Trigger
      blackhole
        waitTillSnapshot
```

> NOTE: I didn't reproduce it either. :(

Reference:

[1]: <https://github.com/etcd-io/etcd/actions/runs/4741737098/jobs/8419176899>
[2]: <eeb9afa1f6/internal/transport/http2_client.go (L1647)>
[3]: <7450cd886d/tests/robustness/traffic.go (L94)>
[4]: <eeb9afa1f6/dialoptions.go (L445)>
[5]: <https://github.com/etcd-io/etcd/actions/runs/4772033408/jobs/8484334015>
[6]: <eeb9afa1f6/clientconn.go (L932)>

REF: #15763

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-29 07:03:47 +08:00
Marek Siarkowicz 46ab121cb7
Merge pull request #15786 from etcd-io/serathius-patch-1
Provide release date for v3.5.8
2023-04-28 15:21:53 +02:00
Marek Siarkowicz 7e2e5c68de Provide release date for v3.5.8
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-28 15:21:06 +02:00
aimuz b052092297
refactor(util): remove duplicate lg check
lg always has a value

Signed-off-by: aimuz <mr.imuz@gmail.com>
2023-04-28 10:43:30 +08:00
Marek Siarkowicz 7450cd886d
Merge pull request #15790 from Rajalakshmi-Girish/add-failfast-flag
Add -failfast flag when the mode is fail_fast
2023-04-27 21:16:38 +02:00
Marek Siarkowicz cd24847086
Merge pull request #15789 from ahrtr/save_data_20230427
test: forcibly save data on panicking
2023-04-27 16:23:49 +02:00
Rajalakshmi Girish 81fccc13da Add -failfast flag when the mode is fail_fast
Signed-off-by: Rajalakshmi Girish <rajalakshmi.girish1@ibm.com>
2023-04-27 05:26:38 -07:00
Benjamin Wang c7d81acaf0 test: forcibly save data on panicking
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-04-27 14:54:35 +08:00
Benjamin Wang 6d11f8ceb5
Merge pull request #15785 from Mskxn/fix_session
close the session to avoid leak goroutine
2023-04-27 04:24:17 +08:00
Msk233 26fdf46001 close the session to avoid leak goroutine
Signed-off-by: Mskxn <118117161+Mskxn@users.noreply.github.com>
2023-04-26 20:45:13 +08:00
Hitoshi Mitake c9b368119e tests: e2e and integration test for timetolive
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Co-authored-by: Benjamin Wang <wachao@vmware.com>
2023-04-26 20:35:20 +09:00
Hitoshi Mitake 975854f07f etcdserver: protect lease timetolive with auth
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Co-authored-by: Benjamin Wang <wachao@vmware.com>
2023-04-26 20:35:20 +09:00
Marek Siarkowicz e04120042e
Merge pull request #15779 from fuweid/deprecate-schwag
chore: deprecate github.com/hexfusion/schwag
2023-04-26 11:36:31 +02:00
Marek Siarkowicz f13b7502ef
Merge pull request #15781 from serathius/meme-readme
Incorporate xkcd dependency meme into README
2023-04-26 11:05:35 +02:00
Marek Siarkowicz f6822b4225
Merge pull request #15783 from jmhbnz/consolidate-dockerfiles
Consolidate etcd dockerfiles
2023-04-26 11:01:44 +02:00
James Blair ab65ee3d01
Consolidate etcd dockerfiles.
We can consolidate by using docker build args to create the individual platform Dockerfile.

Signed-off-by: James Blair <mail@jamesblair.net>
2023-04-26 17:09:25 +12:00
Wei Fu b4f49a55a5 chore: deprecate github.com/hexfusion/schwag
schwag was introduced in 2017 to generate swagger with authorization
support [1][1]. In 2018, grpc-gateway added support for rendering
security fields via protoc-gen-swagger [2][2]. After several years, I
think it's good to use the upstream protoc support.

NOTE:

The keys in `rpc.swagger.json` have been reordered, so the diff looks
much larger than it really is. To verify the actual changes:

```bash
$ # use jq -S to sort the key
$ latest_commit="https://raw.githubusercontent.com/etcd-io/etcd/228f493c7697ce3e9d3a1d831bcffad175846c75/Documentation/dev-guide/apispec/swagger/rpc.swagger.json"
$ curl -s "${latest_commit}"  | jq -S . > /tmp/old.json
$ cat Documentation/dev-guide/apispec/swagger/rpc.swagger.json | jq -S . > /tmp/new.json
$ diff --color -u /tmp/old.json /tmp/new.json
```

```diff
--- /tmp/old.json       2023-04-26 10:58:07.142311861 +0800
+++ /tmp/new.json       2023-04-26 10:58:12.170299194 +0800
@@ -1523,11 +1523,14 @@
       "type": "object"
     },
     "protobufAny": {
+      "description": "`Any` contains an arbitrary serialized protocol buffer message along with a\nURL that describes the type of the serialized message.\n\nProtobuf library provides support to pack/unpack Any values in the form\nof utility functions or additional generated methods of the Any type.\n\nExample 1: Pack and unpack a message in C++.\n\n    Foo foo = ...;\n    Any any;\n    any.PackFrom(foo);\n    ...\n    if (any.UnpackTo(&foo)) {\n      ...\n    }\n\nExample 2: Pack and unpack a message in Java.\n\n    Foo foo = ...;\n    Any any = Any.pack(foo);\n    ...\n    if (any.is(Foo.class)) {\n      foo = any.unpack(Foo.class);\n    }\n\n Example 3: Pack and unpack a message in Python.\n\n    foo = Foo(...)\n    any = Any()\n    any.Pack(foo)\n    ...\n    if any.Is(Foo.DESCRIPTOR):\n      any.Unpack(foo)\n      ...\n\n Example 4: Pack and unpack a message in Go\n\n     foo := &pb.Foo{...}\n     any, err := ptypes.MarshalAny(foo)\n     ...\n     foo := &pb.Foo{}\n     if err := ptypes.UnmarshalAny(any, foo); err != nil {\n       ...\n     }\n\nThe pack methods provided by protobuf library will by default use\n'type.googleapis.com/full.type.name' as the type URL and the unpack\nmethods only use the fully qualified type name after the last '/'\nin the type URL, for example \"foo.bar.com/x/y.z\" will yield type\nname \"y.z\".\n\n\nJSON\n====\nThe JSON representation of an `Any` value uses the regular\nrepresentation of the deserialized, embedded message, with an\nadditional field `@type` which contains the type URL. Example:\n\n    package google.profile;\n    message Person {\n      string first_name = 1;\n      string last_name = 2;\n    }\n\n    {\n      \"@type\": \"type.googleapis.com/google.profile.Person\",\n      \"firstName\": <string>,\n      \"lastName\": <string>\n    }\n\nIf the embedded message type is well-known and has a custom JSON\nrepresentation, that representation will be embedded adding a field\n`value` which holds the custom JSON in addition to the `@type`\nfield. Example (for message [google.protobuf.Duration][]):\n\n    {\n      \"@type\": \"type.googleapis.com/google.protobuf.Duration\",\n      \"value\": \"1.212s\"\n    }",
       "properties": {
         "type_url": {
+          "description": "A URL/resource name that uniquely identifies the type of the serialized\nprotocol buffer message. This string must contain at least\none \"/\" character. The last segment of the URL's path must represent\nthe fully qualified name of the type (as in\n`path/google.protobuf.Duration`). The name should be in a canonical form\n(e.g., leading \".\" is not accepted).\n\nIn practice, teams usually precompile into the binary all types that they\nexpect it to use in the context of Any. However, for URLs which use the\nscheme `http`, `https`, or no scheme, one can optionally set up a type\nserver that maps type URLs to message definitions as follows:\n\n* If no scheme is provided, `https` is assumed.\n* An HTTP GET on the URL must yield a [google.protobuf.Type][]\n  value in binary format, or produce an error.\n* Applications are allowed to cache lookup results based on the\n  URL, or have them precompiled into a binary to avoid any\n  lookup. Therefore, binary compatibility needs to be preserved\n  on changes to types. (Use versioned type names to manage\n  breaking changes.)\n\nNote: this functionality is not currently available in the official\nprotobuf release, and it is not used for type URLs beginning with\ntype.googleapis.com.\n\nSchemes other than `http`, `https` (or the empty scheme) might be\nused with implementation specific semantics.",
           "type": "string"
         },
         "value": {
+          "description": "Must be a valid serialized protocol buffer of the above specified type.",
           "format": "byte",
           "type": "string"
         }
```
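
The same key-order-insensitive comparison can be sketched in Go. This is an illustrative alternative to the `jq -S` approach, not part of the change itself: `canonicalJSON` and `sameJSON` are hypothetical helpers, and the two sample documents are made up. It relies on `encoding/json` writing map keys in sorted order when marshaling.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// canonicalJSON decodes raw and re-encodes it. encoding/json emits map keys
// in sorted order, so documents that differ only in key order produce
// identical output.
func canonicalJSON(raw []byte) ([]byte, error) {
	var v interface{}
	if err := json.Unmarshal(raw, &v); err != nil {
		return nil, err
	}
	return json.MarshalIndent(v, "", "  ")
}

// sameJSON reports whether a and b are equal once key order is ignored.
func sameJSON(a, b []byte) (bool, error) {
	ca, err := canonicalJSON(a)
	if err != nil {
		return false, err
	}
	cb, err := canonicalJSON(b)
	if err != nil {
		return false, err
	}
	return bytes.Equal(ca, cb), nil
}

func main() {
	// Made-up fragments that differ only in key order.
	oldDoc := []byte(`{"type":"object","properties":{"value":{"type":"string","format":"byte"}}}`)
	newDoc := []byte(`{"properties":{"value":{"format":"byte","type":"string"}},"type":"object"}`)

	same, err := sameJSON(oldDoc, newDoc)
	if err != nil {
		panic(err)
	}
	fmt.Println("equal after sorting keys:", same)
}
```

Decoding into `interface{}` turns every number into a `float64`, which is fine for a swagger file but may lose precision for very large integers.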

REF:

[1]: <https://github.com/etcd-io/etcd/pull/7999#issuecomment-307512043>
[2]: <https://github.com/grpc-ecosystem/grpc-gateway/pull/547>

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-26 11:14:50 +08:00
Marek Siarkowicz 045192683c Move credits to subscript
Co-authored-by: James Blair <mail@jamesblair.net>
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-25 15:33:03 +02:00
Marek Siarkowicz 2fd9a1914e Incorporate xkcd dependency meme into README
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-25 14:49:50 +02:00
Benjamin Wang 4485db379e
Merge pull request #15577 from jmhbnz/add-round-robin-test
tests: Add new test for round robin resolver
2023-04-25 18:37:54 +08:00
Benjamin Wang 9b310ea316
Merge pull request #15776 from fuweid/update-deps
[2023-04-25] Bump dependencies identified by dependabot
2023-04-25 16:37:34 +08:00
Wei Fu aa787d9f51 dependency: bump github.com/alexkohler/nakedret from 1.0.1 to 1.0.2
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-25 14:44:34 +08:00
James Blair 18e3acae0e
Add new test for round robin resolver.
Signed-off-by: James Blair <mail@jamesblair.net>
2023-04-25 18:44:24 +12:00
Benjamin Wang 8c5e9ad455
Merge pull request #15759 from fuweid/deflake-TestAuthMemberRemove
server/etcdserver: togRPCError for maintenance API
2023-04-25 09:26:28 +08:00
Benjamin Wang 0f3fb04f1f
Merge pull request #15744 from ahrtr/dependency_management_20230419
Document: add guidance on dependency management
2023-04-25 06:14:16 +08:00
Benjamin Wang 1dbc9db621
Merge pull request #15772 from etcd-io/dependabot/github_actions/github/codeql-action-2.3.0
build(deps): bump github/codeql-action from 2.2.12 to 2.3.0
2023-04-25 06:12:33 +08:00
dependabot[bot] a2426712cc
build(deps): bump github/codeql-action from 2.2.12 to 2.3.0
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.2.12 to 2.3.0.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](7df0ce3489...b2c19fb9a2)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-04-24 18:01:36 +00:00
Benjamin Wang d589a0b5f6 Document: add guidance on dependency management
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-04-24 18:29:09 +08:00
Marek Siarkowicz c2d78a316a
Merge pull request #15761 from ahrtr/min_version_20230424
Change the minimum recommended etcd versions to run in production to 3.4.22+ and 3.5.6+
2023-04-24 10:26:35 +02:00
Benjamin Wang 146f44d35e change the minimum recommended etcd versions to run in production to 3.4.22+ and 3.5.6+
Please read https://groups.google.com/g/etcd-dev/c/8S7u6NqW6C4

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-04-24 07:08:28 +08:00
Benjamin Wang 211b4320c3
Merge pull request #15752 from sharathsivakumar/main
fixes for "improve description of --initial-cluster-state flag" #15743
2023-04-23 07:17:37 +08:00
sharathsivakumar 32c83becf5
fix review: Updated description of --initial-cluster-state flag
Signed-off-by: sharathsivakumar <mailssr9@gmail.com>
2023-04-22 23:16:33 +02:00
Wei Fu 1ba577e499 server/etcdserver: togRPCError for maintenance API
This deflakes TestAuthMemberRemove.

When the client has multiple endpoints, it might send a request with a
valid token to a follower member that hasn't yet received the replicated
log entry carrying the token. That member will reject the request.

For instance, the maintenance.Status API returns "auth: invalid auth
token", but the client doesn't recognize that error, so it won't refresh
the auth token and retry. maintenance.Status should call togRPCError
before returning so that the client can refresh the token. This aligns
with the existing APIs.

Since the maintenance client always creates one connection to the target
member, the member will have the token after the auth refresh.

Maybe we can introduce a sync that waits until the member is ready with
the token, instead of refreshing.
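
The idea can be sketched with plain Go errors. This is a stdlib-only model under stated assumptions, not the etcd or gRPC code: `toWireError`, `callWithRefresh`, and both error values are hypothetical stand-ins for togRPCError, the client retry logic, and the real error types.

```go
package main

import (
	"errors"
	"fmt"
)

// errInternalInvalidToken stands in for the server-internal auth error.
var errInternalInvalidToken = errors.New("auth: invalid auth token")

// errWireInvalidToken stands in for the typed error that clients recognize.
var errWireInvalidToken = errors.New("etcdserver: invalid auth token")

// toWireError is a hypothetical analogue of togRPCError: it maps a known
// internal error to the error the client knows how to handle.
func toWireError(err error) error {
	if errors.Is(err, errInternalInvalidToken) {
		return errWireInvalidToken
	}
	return err
}

// callWithRefresh retries once after refreshing the token, but only when it
// recognizes the error.
func callWithRefresh(call func() error, refresh func()) error {
	err := call()
	if errors.Is(err, errWireInvalidToken) {
		refresh()
		return call()
	}
	return err
}

func main() {
	tokenKnown := false // whether the member already has the token
	status := func(convert bool) func() error {
		return func() error {
			if tokenKnown {
				return nil
			}
			if convert {
				return toWireError(errInternalInvalidToken)
			}
			return errInternalInvalidToken
		}
	}
	refresh := func() { tokenKnown = true }

	fmt.Println("without conversion:", callWithRefresh(status(false), refresh))
	tokenKnown = false
	fmt.Println("with conversion:", callWithRefresh(status(true), refresh))
}
```

Without the conversion the client sees an error it cannot classify and gives up; with it, the refresh path runs and the retried call succeeds.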

Fixes: #15758

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-22 18:35:53 +08:00
Benjamin Wang 63c9fe1d00
Merge pull request #15751 from owayss/auth_store_unit_test_coverage
tests: increases unit test coverage for etcd/server/auth isRangeOpPermitted
2023-04-21 07:29:35 +08:00
Benjamin Wang 4a8817bfb0
Merge pull request #15737 from jmhbnz/update-dependencies
Bump dependencies identified by dependabot
2023-04-21 06:35:08 +08:00
James Blair aad63a1efe
dependency: bump github.com/mikefarah/yq/v4 from 4.33.1 to 4.33.3
Signed-off-by: James Blair <mail@jamesblair.net>
2023-04-21 05:40:03 +12:00
James Blair 04f3e9cb9a
dependency: bump golang.org/x/crypto from 0.7.0 to 0.8.0
Signed-off-by: James Blair <mail@jamesblair.net>
2023-04-21 05:34:21 +12:00
James Blair 042e2e9a57
dependency: bump github.com/prometheus/client_golang from 1.14.0 to 1.15.0
Signed-off-by: James Blair <mail@jamesblair.net>
2023-04-21 05:14:40 +12:00
Marek Siarkowicz ee6fde70dc
Merge pull request #15748 from judavi/15674
Adds a reusable workflow to setup the GoVersion
2023-04-20 15:34:20 +02:00
Juan 0df7c48ddd Centralizing workflow go-version variable
Signed-off-by: Juan <1766933+judavi@users.noreply.github.com>
2023-04-20 11:42:28 +00:00
Owayss Kabtoul 1c18c86e18 tests: increases unit test coverage for etcd/server/auth isRangeOpPermitted
Signed-off-by: Owayss Kabtoul <owayssk@gmail.com>
2023-04-20 13:39:08 +02:00
Benjamin Wang 0ac617059f
Merge pull request #15745 from catandcoder/main
fix some comments
2023-04-20 15:07:07 +08:00
cui fliter 57908723f4 fix some comments
Signed-off-by: cui fliter <imcusg@gmail.com>
2023-04-20 14:26:17 +08:00
Benjamin Wang b27dec8b94
Merge pull request #15721 from vianamjr/txn-auth-unit-test
tests: cover txn.CheckTxnAuth logic with unit tests
2023-04-19 05:54:55 +08:00
Marcondes Viana 9d14ae43c2 fix review: remove if on error check
Signed-off-by: Marcondes Viana <marju10@gmail.com>
2023-04-18 10:43:13 -03:00
Marcondes Viana ecc7441ba1 fix review: use assert lib
Signed-off-by: Marcondes Viana <marju10@gmail.com>
2023-04-18 10:02:03 -03:00
Marek Siarkowicz b526cdcbe8
Merge pull request #15718 from fuweid/followup-15667
tests: make log monitor as common helper (followup #15667)
2023-04-18 07:59:19 +02:00
Wei Fu 50aa00b203 tests: make log monitor as common helper
It's a follow-up to #15667.

This patch uses zaptest/observer as the base to provide a function
similar to pkg/expect.Expect.

The test environment:

```bash
# CPU: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
mkdir /sys/fs/cgroup/etcd-followup-15667
echo 0-2 | tee /sys/fs/cgroup/etcd-followup-15667/cpuset.cpus # three cores
```

Before change:

* memory.peak: ~ 681 MiB
* Elapsed (wall clock) time (h:mm:ss or m:ss): 6:14.04

After change:

* memory.peak: ~ 671 MiB
* Elapsed (wall clock) time (h:mm:ss or m:ss): 6:13.07

Based on the test results, I think it's safe to enable it by default.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-18 09:00:24 +08:00
Benjamin Wang a9c1c217a2
Merge pull request #15736 from etcd-io/dependabot/github_actions/github/codeql-action-2.2.12
build(deps): bump github/codeql-action from 2.2.11 to 2.2.12
2023-04-18 05:11:44 +08:00
Benjamin Wang 4c79aecd0c
Merge pull request #15735 from etcd-io/dependabot/github_actions/actions/checkout-3.5.2
build(deps): bump actions/checkout from 3.5.0 to 3.5.2
2023-04-18 05:04:46 +08:00