11 Commits (c13a1d47f5885302ae66e28a2d70212a6d39796b)

Author SHA1 Message Date
xakdwch 8298ed8204 rpctypes: use status.Error() instead of status.New().Err()
status.Error() returns an error representing the error code and msg directly,
so it is preferable to status.New().Err().

Signed-off-by: xakdwch <xakdwch5@gmail.com>
2023-03-02 10:47:09 +08:00
ahrtr 1b3d6cb0c8 set a separate applyTimeout for the waitAppliedIndex 2022-04-10 14:44:55 +08:00
ahrtr 8681888012 fix typo, renamed ErrGPRCNotSupportedForLearner to ErrGRPCNotSupportedForLearner 2022-02-21 14:46:58 +08:00
ahrtr 15568f4c00 add protection code for Range when the sortTarget is an invalid value 2022-01-18 07:46:37 +08:00
ahrtr f8aafea504 add protection code to prevent etcd from panicking when the client api version is not valid UTF-8 2022-01-17 06:21:22 +08:00
Hitoshi Mitake 2a750a8dba *: implement a retry logic for auth old revision in the client 2021-09-05 01:13:52 +09:00
J. David Lowe 115c694af6 etcdserver: don't attempt to grant nil permission to a role
Prevent etcd from crashing when given a bad grant payload, e.g.:

$ curl -d '{"name": "foo"}' http://localhost:2379/v3/auth/role/add
{"header":{"cluster_id":"14841639068965178418", ...
$ curl -d '{"name": "foo"}' http://localhost:2379/v3/auth/role/grant
curl: (52) Empty reply from server
2021-06-04 14:20:02 -07:00
Piotr Tabor 16d51d8c26 Fix non-retryable error codes: Unavailable -> FailedPrecondition
- ErrGRPCNotCapable("etcdserver: not capable") -> codes.FailedPrecondition (it will not fix itself; it requires a new version of the server)
- ErrGPRCNotSupportedForLearner("etcdserver: rpc not supported for learner") -> codes.FailedPrecondition (as long as the member is a learner, the call will not work)
- ErrGRPCClusterVersionUnavailable("etcdserver: cluster version not found during downgrade") -> codes.FailedPrecondition (the backend does not contain the version (old etcd?), so a retry will not help)

https://github.com/etcd-io/etcd/runs/2599598633?check_suite_focus=true

```
{"level":"warn","ts":"2021-05-17T09:55:30.246Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000539880/#initially=[unix://localhost:m30]","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: rpc not supported for learner"}
{"level":"warn","ts":"2021-05-17T09:55:30.270Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000539880/#initially=[unix://localhost:m30]","attempt":1,"error":"rpc error: code = Unavailable desc = etcdserver: rpc not supported for learner"}
```
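
The effect of the reclassification on a retrying client can be sketched as follows. This is a minimal illustration, not the etcd client's retry interceptor: the `code` type, `retryable`, and `callWithRetry` are hypothetical stand-ins for gRPC status codes and the real retry policy, and the sketch assumes that only `Unavailable` is treated as transient.

```go
package main

import "fmt"

// code is a hypothetical stand-in for a gRPC status code.
type code string

const (
	ok                 code = "OK"
	unavailable        code = "Unavailable"
	failedPrecondition code = "FailedPrecondition"
)

// retryable reports whether a failure is assumed to be transient.
// Assumption for this sketch: only Unavailable is worth retrying.
func retryable(c code) bool { return c == unavailable }

// callWithRetry invokes call until it returns a non-retryable code or
// maxAttempts is reached. It returns the number of attempts made and
// the last code observed.
func callWithRetry(maxAttempts int, call func(attempt int) code) (int, code) {
	attempts, last := 0, ok
	for attempts < maxAttempts {
		last = call(attempts)
		attempts++
		if !retryable(last) {
			break
		}
	}
	return attempts, last
}

func main() {
	// A learner that reports Unavailable is retried until the budget runs out.
	n, c := callWithRetry(3, func(int) code { return unavailable })
	fmt.Println(n, c)

	// The same failure reported as FailedPrecondition stops after one attempt.
	n, c = callWithRetry(3, func(int) code { return failedPrecondition })
	fmt.Println(n, c)
}
```

Under these assumptions the first call uses all three attempts, matching the repeated "retrying of unary invoker failed" warnings in the log, while the second gives up immediately.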
2021-05-19 02:08:53 +02:00
Piotr Tabor 9a4b2bdccc Errors: `context cancelled` or `context deadline exceeded` are exposed as codes.Canceled, codes.DeadlineExceeded instead of 'codes.Unknown' 2021-04-22 14:35:24 +02:00
Dan Mace 9571325fe8 etcdserver: fix incorrect metrics generated when clients cancel watches
Before this patch, a client which cancels the context for a watch results in the
server generating a `rpctypes.ErrGRPCNoLeader` error that leads to the recording of
a gRPC `Unavailable` metric in association with the client watch cancellation.
The metric looks like this:

    grpc_server_handled_total{grpc_code="Unavailable",grpc_method="Watch",grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"}

So, the watch server has misidentified the error as a server error and then
propagates the mistake to metrics, leading to a false indicator that the leader
has been lost. This false signal then leads to false alerting.

The commit 9c103dd0de introduced an interceptor which wraps
watch streams requiring a leader, causing those streams to be actively canceled
when leader loss is detected.

However, the error handling code assumes all stream context cancellations are
from the interceptor. This assumption is broken when the context was canceled
because of a client stream cancellation.

The core challenge is lack of information conveyed via `context.Context` which
is shared by both the send and receive sides of the stream handling and is
subject to cancellation by all paths (including the gRPC library itself). If any
piece of the system cancels the shared context, there's no way for a context
consumer to understand who cancelled the context or why.

To solve the ambiguity of the stream interceptor code specifically, this patch
introduces a custom context struct which the interceptor uses to expose a custom
error through the context when the interceptor decides to actively cancel a
stream. Now the consuming side can more safely assume a generic context
cancellation can be propagated as a cancellation, and the server generated
leader error is preserved and propagated normally without any special inference.

When a client cancels the stream, there remains a race in the error handling
code between the send and receive goroutines whereby the underlying gRPC error
is lost in the case where the send path returns and is handled first, but this
issue can be addressed separately because, no matter which path wins, we can
detect a generic cancellation.

This is a replacement of https://github.com/etcd-io/etcd/pull/11375.

Fixes #10289, #9725, #9576, #9166
2020-11-18 17:02:09 -05:00
Piotr Tabor 389642dd16 client: Move client specific code (protos, version) to api/
client: Move client specific code (protos, version) to the api/
directory. Thanks to this change, the /client directory will not need to depend
on the server code. In subsequent commits we will make "/api" a module on its own.

Mechanical consequences of execution:

% git mv version/version.go api/version
% git mv etcdserver/api/v3rpc/rpctypes api/v3rpc
% git mv mvcc/mvccpb api/
% git mv etcdserver/etcdserverpb api/
% git mv auth/authpb api/
% git mv etcdserver/api/membership/membershippb api/
2020-10-06 11:53:36 +02:00