--- title: Upgrade etcd from 3.3 to 3.4 --- In the general case, upgrading from etcd 3.3 to 3.4 can be a zero-downtime, rolling upgrade: - one by one, stop the etcd v3.3 processes and replace them with etcd v3.4 processes - after running all v3.4 processes, new features in v3.4 are available to the cluster Before [starting an upgrade](#upgrade-procedure), read through the rest of this guide to prepare. ### Upgrade checklists **NOTE:** When [migrating from v2 with no v3 data](https://github.com/etcd-io/etcd/issues/9480), etcd server v3.2+ panics when etcd restores from existing snapshots but no v3 `ETCD_DATA_DIR/member/snap/db` file. This happens when the server had migrated from v2 with no previous v3 data. This also prevents accidental v3 data loss (e.g. `db` file might have been moved). etcd requires that post v3 migration can only happen with v3 data. Do not upgrade to newer v3 versions until v3.0 server contains v3 data. Highlighted breaking changes in 3.4. #### Make `ETCDCTL_API=3 etcdctl` default `ETCDCTL_API=3` is now the default. ```diff etcdctl set foo bar Error: unknown command "set" for "etcdctl" -etcdctl set foo bar +ETCDCTL_API=2 etcdctl set foo bar bar ETCDCTL_API=3 etcdctl put foo bar OK -ETCDCTL_API=3 etcdctl put foo bar +etcdctl put foo bar ``` #### Make `etcd --enable-v2=false` default [`etcd --enable-v2=false`](https://github.com/etcd-io/etcd/pull/10935) is now the default. This means, unless `etcd --enable-v2=true` is specified, etcd v3.4 server would not serve v2 API requests. If v2 API were used, make sure to enable v2 API in v3.4: ```diff -etcd +etcd --enable-v2=true ``` Other HTTP APIs will still work (e.g. `[CLIENT-URL]/metrics`, `[CLIENT-URL]/health`, v3 gRPC gateway). #### Deprecated `etcd --ca-file` and `etcd --peer-ca-file` flags `--ca-file` and `--peer-ca-file` flags are deprecated; they have been deprecated since v2.1. ```diff -etcd --ca-file ca-client.crt +etcd --trusted-ca-file ca-client.crt ``` ```diff -etcd --peer-ca-file ca-peer.crt +etcd --peer-trusted-ca-file ca-peer.crt ``` #### Deprecated `grpc.ErrClientConnClosing` error `grpc.ErrClientConnClosing` has been [deprecated in gRPC >= 1.10](https://github.com/grpc/grpc-go/pull/1854). ```diff import ( + "go.etcd.io/etcd/clientv3" "google.golang.org/grpc" + "google.golang.org/grpc/codes" + "google.golang.org/grpc/status" ) _, err := kvc.Get(ctx, "a") -if err == grpc.ErrClientConnClosing { +if clientv3.IsConnCanceled(err) { // or +s, ok := status.FromError(err) +if ok { + if s.Code() == codes.Canceled ``` #### Require `grpc.WithBlock` for client dial [The new client balancer](https://github.com/etcd-io/etcd/blob/master/Documentation/learning/design-client.md) uses an asynchronous resolver to pass endpoints to the gRPC dial function. As a result, v3.4 client requires `grpc.WithBlock` dial option to wait until the underlying connection is up. ```diff import ( "time" "go.etcd.io/etcd/clientv3" + "google.golang.org/grpc" ) +// "grpc.WithBlock()" to block until the underlying connection is up ccfg := clientv3.Config{ Endpoints: []string{"localhost:2379"}, DialTimeout: time.Second, + DialOptions: []grpc.DialOption{grpc.WithBlock()}, DialKeepAliveTime: time.Second, DialKeepAliveTimeout: 500 * time.Millisecond, } ``` #### Deprecating `etcd_debugging_mvcc_db_total_size_in_bytes` Prometheus metrics v3.4 promotes `etcd_debugging_mvcc_db_total_size_in_bytes` Prometheus metrics to `etcd_mvcc_db_total_size_in_bytes`, in order to encourage etcd storage monitoring. `etcd_debugging_mvcc_db_total_size_in_bytes` is still served in v3.4 for backward compatibilities. It will be completely deprecated in v3.5. ```diff -etcd_debugging_mvcc_db_total_size_in_bytes +etcd_mvcc_db_total_size_in_bytes ``` Note that `etcd_debugging_*` namespace metrics have been marked as experimental. As we improve monitoring guide, we may promote more metrics. #### Deprecating `etcd_debugging_mvcc_put_total` Prometheus metrics v3.4 promotes `etcd_debugging_mvcc_put_total` Prometheus metrics to `etcd_mvcc_put_total`, in order to encourage etcd storage monitoring. `etcd_debugging_mvcc_put_total` is still served in v3.4 for backward compatibilities. It will be completely deprecated in v3.5. ```diff -etcd_debugging_mvcc_put_total +etcd_mvcc_put_total ``` Note that `etcd_debugging_*` namespace metrics have been marked as experimental. As we improve monitoring guide, we may promote more metrics. #### Deprecating `etcd_debugging_mvcc_delete_total` Prometheus metrics v3.4 promotes `etcd_debugging_mvcc_delete_total` Prometheus metrics to `etcd_mvcc_delete_total`, in order to encourage etcd storage monitoring. `etcd_debugging_mvcc_delete_total` is still served in v3.4 for backward compatibilities. It will be completely deprecated in v3.5. ```diff -etcd_debugging_mvcc_delete_total +etcd_mvcc_delete_total ``` Note that `etcd_debugging_*` namespace metrics have been marked as experimental. As we improve monitoring guide, we may promote more metrics. #### Deprecating `etcd_debugging_mvcc_txn_total` Prometheus metrics v3.4 promotes `etcd_debugging_mvcc_txn_total` Prometheus metrics to `etcd_mvcc_txn_total`, in order to encourage etcd storage monitoring. `etcd_debugging_mvcc_txn_total` is still served in v3.4 for backward compatibilities. It will be completely deprecated in v3.5. ```diff -etcd_debugging_mvcc_txn_total +etcd_mvcc_txn_total ``` Note that `etcd_debugging_*` namespace metrics have been marked as experimental. As we improve monitoring guide, we may promote more metrics. #### Deprecating `etcd_debugging_mvcc_range_total` Prometheus metrics v3.4 promotes `etcd_debugging_mvcc_range_total` Prometheus metrics to `etcd_mvcc_range_total`, in order to encourage etcd storage monitoring. `etcd_debugging_mvcc_range_total` is still served in v3.4 for backward compatibilities. It will be completely deprecated in v3.5. ```diff -etcd_debugging_mvcc_range_total +etcd_mvcc_range_total ``` Note that `etcd_debugging_*` namespace metrics have been marked as experimental. As we improve monitoring guide, we may promote more metrics. #### Deprecating `etcd --log-output` flag (now `--log-outputs`) Rename [`etcd --log-output` to `--log-outputs`](https://github.com/etcd-io/etcd/pull/9624) to support multiple log outputs. **`etcd --logger=capnslog` does not support multiple log outputs.** **`etcd --log-output`** will be deprecated in v3.5. **`etcd --logger=capnslog` will be deprecated in v3.5**. ```diff -etcd --log-output=stderr +etcd --log-outputs=stderr +# to write logs to stderr and a.log file at the same time +# only "--logger=zap" supports multiple writers +etcd --logger=zap --log-outputs=stderr,a.log ``` v3.4 adds `etcd --logger=zap --log-outputs=stderr` support for structured logging and multiple log outputs. Main motivation is to promote automated etcd monitoring, rather than looking back server logs when it starts breaking. Future development will make etcd log as few as possible, and make etcd easier to monitor with metrics and alerts. **`etcd --logger=capnslog` will be deprecated in v3.5**. #### Changed `log-outputs` field type in `etcd --config-file` to `[]string` Now that `log-outputs` (old field name `log-output`) accepts multiple writers, etcd configuration YAML file `log-outputs` field must be changed to `[]string` type as below: ```diff # Specify 'stdout' or 'stderr' to skip journald logging even when running under systemd. -log-output: default +log-outputs: [default] ``` #### Renamed `embed.Config.LogOutput` to `embed.Config.LogOutputs` Renamed [**`embed.Config.LogOutput`** to **`embed.Config.LogOutputs`**](https://github.com/etcd-io/etcd/pull/9624) to support multiple log outputs. And changed [`embed.Config.LogOutput` type from `string` to `[]string`](https://github.com/etcd-io/etcd/pull/9579) to support multiple log outputs. ```diff import "github.com/coreos/etcd/embed" cfg := &embed.Config{Debug: false} -cfg.LogOutput = "stderr" +cfg.LogOutputs = []string{"stderr"} ``` #### v3.5 deprecates `capnslog` **v3.5 will deprecate `etcd --log-package-levels` flag for `capnslog`**; `etcd --logger=zap --log-outputs=stderr` will the default. **v3.5 will deprecate `[CLIENT-URL]/config/local/log` endpoint.** ```diff -etcd +etcd --logger zap ``` #### Deprecating `etcd --debug` flag (now `--log-level=debug`) v3.4 deprecates [`etcd --debug`](https://github.com/etcd-io/etcd/pull/10947) flag. Instead, use `etcd --log-level=debug` flag. ```diff -etcd --debug +etcd --logger zap --log-level debug ``` #### Deprecated `pkg/transport.TLSInfo.CAFile` field Deprecated `pkg/transport.TLSInfo.CAFile` field. ```diff import "github.com/coreos/etcd/pkg/transport" tlsInfo := transport.TLSInfo{ CertFile: "/tmp/test-certs/test.pem", KeyFile: "/tmp/test-certs/test-key.pem", - CAFile: "/tmp/test-certs/trusted-ca.pem", + TrustedCAFile: "/tmp/test-certs/trusted-ca.pem", } tlsConfig, err := tlsInfo.ClientConfig() if err != nil { panic(err) } ``` #### Changed `embed.Config.SnapCount` to `embed.Config.SnapshotCount` To be consistent with the flag name `etcd --snapshot-count`, `embed.Config.SnapCount` field has been renamed to `embed.Config.SnapshotCount`: ```diff import "github.com/coreos/etcd/embed" cfg := embed.NewConfig() -cfg.SnapCount = 100000 +cfg.SnapshotCount = 100000 ``` #### Changed `etcdserver.ServerConfig.SnapCount` to `etcdserver.ServerConfig.SnapshotCount` To be consistent with the flag name `etcd --snapshot-count`, `etcdserver.ServerConfig.SnapCount` field has been renamed to `etcdserver.ServerConfig.SnapshotCount`: ```diff import "github.com/coreos/etcd/etcdserver" srvcfg := etcdserver.ServerConfig{ - SnapCount: 100000, + SnapshotCount: 100000, ``` #### Changed function signature in package `wal` Changed `wal` function signatures to support structured logger. ```diff import "github.com/coreos/etcd/wal" +import "go.uber.org/zap" +lg, _ = zap.NewProduction() -wal.Open(dirpath, snap) +wal.Open(lg, dirpath, snap) -wal.OpenForRead(dirpath, snap) +wal.OpenForRead(lg, dirpath, snap) -wal.Repair(dirpath) +wal.Repair(lg, dirpath) -wal.Create(dirpath, metadata) +wal.Create(lg, dirpath, metadata) ``` #### Changed `IntervalTree` type in package `pkg/adt` `pkg/adt.IntervalTree` is now defined as an `interface`. ```diff import ( "fmt" "go.etcd.io/etcd/pkg/adt" ) func main() { - ivt := &adt.IntervalTree{} + ivt := adt.NewIntervalTree() ``` #### Deprecated `embed.Config.SetupLogging` `embed.Config.SetupLogging` has been removed in order to prevent wrong logging configuration, and now set up automatically. ```diff import "github.com/coreos/etcd/embed" cfg := &embed.Config{Debug: false} -cfg.SetupLogging() ``` #### Changed gRPC gateway HTTP endpoints (replaced `/v3beta` with `/v3`) Before ```bash curl -L http://localhost:2379/v3beta/kv/put \ -X POST -d '{"key": "Zm9v", "value": "YmFy"}' ``` After ```bash curl -L http://localhost:2379/v3/kv/put \ -X POST -d '{"key": "Zm9v", "value": "YmFy"}' ``` Requests to `/v3beta` endpoints will redirect to `/v3`, and `/v3beta` will be removed in 3.5 release. #### Deprecated container image tags `latest` and minor version images tags are deprecated: ```diff -docker pull gcr.io/etcd-development/etcd:latest +docker pull gcr.io/etcd-development/etcd:v3.4.0 -docker pull gcr.io/etcd-development/etcd:v3.4 +docker pull gcr.io/etcd-development/etcd:v3.4.0 -docker pull gcr.io/etcd-development/etcd:v3.4 +docker pull gcr.io/etcd-development/etcd:v3.4.1 -docker pull gcr.io/etcd-development/etcd:v3.4 +docker pull gcr.io/etcd-development/etcd:v3.4.2 ``` ### Server upgrade checklists #### Upgrade requirements To upgrade an existing etcd deployment to 3.4, the running cluster must be 3.3 or greater. If it's before 3.3, please [upgrade to 3.3](upgrade_3_3.md) before upgrading to 3.4. Also, to ensure a smooth rolling upgrade, the running cluster must be healthy. Check the health of the cluster by using the `etcdctl endpoint health` command before proceeding. #### Preparation Before upgrading etcd, always test the services relying on etcd in a staging environment before deploying the upgrade to the production environment. Before beginning, [download the snapshot backup](../op-guide/maintenance.md#snapshot-backup). Should something go wrong with the upgrade, it is possible to use this backup to [downgrade](#downgrade) back to existing etcd version. Please note that the `snapshot` command only backs up the v3 data. For v2 data, see [backing up v2 datastore](../v2/admin_guide.md#backing-up-the-datastore). #### Mixed versions While upgrading, an etcd cluster supports mixed versions of etcd members, and operates with the protocol of the lowest common version. The cluster is only considered upgraded once all of its members are upgraded to version 3.4. Internally, etcd members negotiate with each other to determine the overall cluster version, which controls the reported version and the supported features. #### Limitations Note: If the cluster only has v3 data and no v2 data, it is not subject to this limitation. If the cluster is serving a v2 data set larger than 50MB, each newly upgraded member may take up to two minutes to catch up with the existing cluster. Check the size of a recent snapshot to estimate the total data size. In other words, it is safest to wait for 2 minutes between upgrading each member. For a much larger total data size, 100MB or more , this one-time process might take even more time. Administrators of very large etcd clusters of this magnitude can feel free to contact the [etcd team][etcd-contact] before upgrading, and we'll be happy to provide advice on the procedure. #### Downgrade If all members have been upgraded to v3.4, the cluster will be upgraded to v3.4, and downgrade from this completed state is **not possible**. If any single member is still v3.3, however, the cluster and its operations remains "v3.3", and it is possible from this mixed cluster state to return to using a v3.3 etcd binary on all members. Please [download the snapshot backup](../op-guide/maintenance.md#snapshot-backup) to make downgrading the cluster possible even after it has been completely upgraded. ### Upgrade procedure This example shows how to upgrade a 3-member v3.3 ectd cluster running on a local machine. #### Step 1: check upgrade requirements Is the cluster healthy and running v3.3.x? ```bash etcdctl --endpoints=localhost:2379,localhost:22379,localhost:32379 endpoint health <} WARNING: 2018/05/17 12:45:21 grpc: addrConn.transportMonitor exits due to: grpc: the connection is closing 21.193589 I | raft: 7339c4e5e833c029 [term: 8] received a MsgVote message with higher term from 729934363faa4a24 [term: 9] 21.193626 I | raft: 7339c4e5e833c029 became follower at term 9 21.193651 I | raft: 7339c4e5e833c029 [logterm: 8, index: 9, vote: 0] cast MsgVote for 729934363faa4a24 [logterm: 8, index: 9] at term 9 21.193675 I | raft: raft.node: 7339c4e5e833c029 lost leader 7339c4e5e833c029 at term 9 21.194424 I | raft: raft.node: 7339c4e5e833c029 elected leader 729934363faa4a24 at term 9 21.292898 I | etcdserver: 7339c4e5e833c029 finished leadership transfer from 7339c4e5e833c029 to 729934363faa4a24 (took 100.436391ms) 21.292975 I | rafthttp: stopping peer 729934363faa4a24... 21.293206 I | rafthttp: closed the TCP streaming connection with peer 729934363faa4a24 (stream MsgApp v2 writer) 21.293225 I | rafthttp: stopped streaming with peer 729934363faa4a24 (writer) 21.293437 I | rafthttp: closed the TCP streaming connection with peer 729934363faa4a24 (stream Message writer) 21.293459 I | rafthttp: stopped streaming with peer 729934363faa4a24 (writer) 21.293514 I | rafthttp: stopped HTTP pipelining with peer 729934363faa4a24 21.293590 W | rafthttp: lost the TCP streaming connection with peer 729934363faa4a24 (stream MsgApp v2 reader) 21.293610 I | rafthttp: stopped streaming with peer 729934363faa4a24 (stream MsgApp v2 reader) 21.293680 W | rafthttp: lost the TCP streaming connection with peer 729934363faa4a24 (stream Message reader) 21.293700 I | rafthttp: stopped streaming with peer 729934363faa4a24 (stream Message reader) 21.293711 I | rafthttp: stopped peer 729934363faa4a24 21.293720 I | rafthttp: stopping peer b548c2511513015... 21.293987 I | rafthttp: closed the TCP streaming connection with peer b548c2511513015 (stream MsgApp v2 writer) 21.294063 I | rafthttp: stopped streaming with peer b548c2511513015 (writer) 21.294467 I | rafthttp: closed the TCP streaming connection with peer b548c2511513015 (stream Message writer) 21.294561 I | rafthttp: stopped streaming with peer b548c2511513015 (writer) 21.294742 I | rafthttp: stopped HTTP pipelining with peer b548c2511513015 21.294867 W | rafthttp: lost the TCP streaming connection with peer b548c2511513015 (stream MsgApp v2 reader) 21.294892 I | rafthttp: stopped streaming with peer b548c2511513015 (stream MsgApp v2 reader) 21.294990 W | rafthttp: lost the TCP streaming connection with peer b548c2511513015 (stream Message reader) 21.295004 E | rafthttp: failed to read b548c2511513015 on stream Message (context canceled) 21.295013 I | rafthttp: peer b548c2511513015 became inactive 21.295024 I | rafthttp: stopped streaming with peer b548c2511513015 (stream Message reader) 21.295035 I | rafthttp: stopped peer b548c2511513015 ``` #### Step 4: restart the etcd server with same configuration Restart the etcd server with same configuration but with the new etcd binary. ```diff -etcd-old --name s1 \ +etcd-new --name s1 \ --data-dir /tmp/etcd/s1 \ --listen-client-urls http://localhost:2379 \ --advertise-client-urls http://localhost:2379 \ --listen-peer-urls http://localhost:2380 \ --initial-advertise-peer-urls http://localhost:2380 \ --initial-cluster s1=http://localhost:2380,s2=http://localhost:22380,s3=http://localhost:32380 \ --initial-cluster-token tkn \ + --initial-cluster-state new \ + --logger zap \ + --log-outputs stderr ``` The new v3.4 etcd will publish its information to the cluster. At this point, cluster still operates as v3.3 protocol, which is the lowest common version. > `{"level":"info","ts":1526586617.1647713,"caller":"membership/cluster.go:485","msg":"set initial cluster version","cluster-id":"7dee9ba76d59ed53","local-member-id":"7339c4e5e833c029","cluster-version":"3.0"}` > `{"level":"info","ts":1526586617.1648536,"caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.0"}` > `{"level":"info","ts":1526586617.1649303,"caller":"membership/cluster.go:473","msg":"updated cluster version","cluster-id":"7dee9ba76d59ed53","local-member-id":"7339c4e5e833c029","from":"3.0","from":"3.3"}` > `{"level":"info","ts":1526586617.1649797,"caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.3"}` > `{"level":"info","ts":1526586617.2107732,"caller":"etcdserver/server.go:1770","msg":"published local member to cluster through raft","local-member-id":"7339c4e5e833c029","local-member-attributes":"{Name:s1 ClientURLs:[http://localhost:2379]}","request-path":"/0/members/7339c4e5e833c029/attributes","cluster-id":"7dee9ba76d59ed53","publish-timeout":7}` Verify that each member, and then the entire cluster, becomes healthy with the new v3.4 etcd binary: ```bash etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379 < `{"level":"info","ts":1526586949.0920913,"caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.4"}` > `{"level":"info","ts":1526586949.0921566,"caller":"etcdserver/server.go:2272","msg":"cluster version is updated","cluster-version":"3.4"}` Member 2: > `{"level":"info","ts":1526586949.092117,"caller":"membership/cluster.go:473","msg":"updated cluster version","cluster-id":"7dee9ba76d59ed53","local-member-id":"729934363faa4a24","from":"3.3","from":"3.4"}` > `{"level":"info","ts":1526586949.0923078,"caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.4"}` Member 3: > `{"level":"info","ts":1526586949.0921423,"caller":"membership/cluster.go:473","msg":"updated cluster version","cluster-id":"7dee9ba76d59ed53","local-member-id":"b548c2511513015","from":"3.3","from":"3.4"}` > `{"level":"info","ts":1526586949.0922918,"caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.4"}` ```bash endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379 <