* Documentation: enhance description of lock and lease
* Documentation: an executable implementation of fencing
* docs: api guarantees
cleanup lease grammar slightly
* docs: learning/lock/README.md improve grammar
Co-Authored-By: Steven E. Harris <seh@panix.com>
* docs: learning: improve locks and leases grammar
Co-authored-by: Brandon Philips <brandon@ifup.org>
Co-authored-by: Steven E. Harris <seh@panix.com>
These changes start in etcdctl under auth.go and stub out everything down into the internal raft. The .proto files were changed and regenerated so that the local version builds successfully.
The `etcdHighNumberOfLeaderChanges` alert had a copy and paste
error when it was converted from docs to mixin in 10244 - we moved
from "increase over 15m > 3" to "rate over 15m > 3" which is not
the same (rate is measured per second, so it should have been
"rate over 15m > (3 / 60 / 15)"). As part of fixing that, we
need to capture when Prometheus starts or when new etcd clusters
are first scraped with a high leader change count - i.e. if you start a
new etcd cluster and at the moment Prometheus first scrapes it you are
already at 5 leader changes, we should fire on that transition.
This alert is also now more responsive, so if you get a quick
burst of 3 leader changes we'll alert within 5m rather than 15m.
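
The unit mismatch described above can be checked with a little arithmetic.
The following is a minimal sketch, not the alert rule itself: the helper
names are hypothetical, and it ignores the extrapolation Prometheus applies
when computing `increase` and `rate`.

```python
# Sketch: why "rate over 15m > 3" is not the same as "increase over 15m > 3".
# Helper names are illustrative assumptions, not part of the mixin.

WINDOW_SECONDS = 15 * 60  # the 15m window used by the original alert


def increase_to_rate(increase, window_seconds=WINDOW_SECONDS):
    """Per-second rate that corresponds to a total increase over the window."""
    return increase / window_seconds


def rate_to_increase(rate, window_seconds=WINDOW_SECONDS):
    """Total increase over the window implied by a per-second rate."""
    return rate * window_seconds


# Threshold the rate-based rule should have used: 3 / 60 / 15 per second.
correct_rate_threshold = increase_to_rate(3)

# Leader changes the mistaken "rate > 3" rule would have needed in 15 minutes.
changes_needed_by_mistaken_rule = rate_to_increase(3)
```

Under these assumptions the mistaken rule would only have fired after
thousands of leader changes within the window, which is why the threshold
had to be rescaled.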
Etcd-backup-restore is a collection of components to back up and restore etcd. It features periodic full and incremental backups, automated restore, and validation of the etcd data directory, with multi-cloud provider support.
In the Background section, the document describes various challenges for cluster membership change.
Added a section header for each case described, for better readability.
New tool: ETCD Manager
ETCD Manager is a multi-platform ETCD v3 client. Currently, builds are available for Mac, Windows and Linux, but iOS / Android builds will also be added in the future.
It aims to be a modern, efficient and easy-to-use GUI with full coverage of ETCD APIs / functionality. The first public (beta) release is already available.
See the issue created here:
https://github.com/etcd-io/etcd/issues/10989#issuecomment-518726038
doc: fix broken links referring to etcd.redhatdocs.io
Adding links to internal Documentation within github.com.
Update runtime-configuration.md
Update CHANGELOG-3.3.md
Remove extra space
Keep the formatting similar to original
An etcd member being down is an important failure state - while
normal admin operations such as rotation may cause transient outages,
when any member is down the cluster is operating in a degraded
fashion. Add an alert that records when any members are down
so that administrators know whether the next failure is fatal.
The rule is more complicated than `up{...} == 0` because not all
failure modes for etcd may have an `up{...}` entry for each member.
For instance, a Kubernetes service in front of an etcd cluster
might only have 2 endpoints recorded in `up` because the third
pod is evicted by the kubelet - the cluster is degraded but
`count(up{...})` would not return the full quorum size. Instead,
use network peer send failures as a failure detector and attempt
to return the max of down services or failing peers. We may
undercount the number of total failures, but we will at least
alert that a member is down.
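
The "max of down services or failing peers" idea can be sketched outside of
PromQL. This is only an illustration under stated assumptions: the function
name, the input shapes and the `failure_threshold` parameter are hypothetical
and do not reproduce the actual rule expression.

```python
# Sketch of the failure detector described above (hypothetical names).

def members_down(up, peer_send_failure_rate, failure_threshold=0.0):
    """Estimate how many etcd members are down.

    up: dict mapping scraped instance -> 1 (up) or 0 (down). Members that
        were never scraped (for example, an evicted pod) are simply absent.
    peer_send_failure_rate: dict mapping peer id -> network send failures
        per second as observed by the other members.
    failure_threshold: assumed cut-off above which a peer counts as failing.
    """
    down_services = sum(1 for value in up.values() if value == 0)
    failing_peers = sum(
        1 for rate in peer_send_failure_rate.values() if rate > failure_threshold
    )
    # Either signal alone may undercount, so take the larger of the two.
    return max(down_services, failing_peers)


# Third pod evicted: `up` only knows two healthy endpoints, but the
# surviving members report send failures to the missing peer.
example = members_down({"etcd-0": 1, "etcd-1": 1}, {"etcd-2": 0.5})
```

In the evicted-pod case `up` alone reports nothing wrong, while the peer
send failures still reveal one member down.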
Etcd currently supports validating peers based on their TLS certificate's
CN field. The current best practice for creation and validation of TLS
certs is to use the Subject Alternative Name (SAN) fields instead, so that
a certificate might be issued with a unique CN and its logical
identities in the SANs.
This commit extends the peer validation logic to use Go's
`(*"crypto/x509".Certificate).VerifyHostname` function for name
validation, which allows SANs to be used for peer access control.
In addition, it allows name validation to be enabled on clients as well.
This is used when running Etcd behind an authenticating proxy, or as
an internal component in a larger system (like a Kubernetes master).
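
To illustrate what SAN-based access control means in practice, here is a
simplified sketch of matching an allowed hostname against a certificate's
DNS SAN entries. It is an assumption-laden illustration of the idea only:
it is not Go's implementation, the function names are hypothetical, and it
handles just exact matches and a leftmost-label wildcard.

```python
# Simplified sketch of SAN-based name validation (hypothetical helpers,
# not the matching rules of any real TLS library).

def name_matches(pattern, host):
    """Return True if `host` matches a DNS SAN `pattern`.

    Supports exact, case-insensitive matches and a wildcard that stands
    in for exactly one leftmost label, e.g. "*.example.com".
    """
    pattern_labels = pattern.lower().rstrip(".").split(".")
    host_labels = host.lower().rstrip(".").split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        return pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


def peer_allowed(san_dns_names, allowed_hostname):
    """Accept a peer if any DNS SAN on its certificate covers the allowed name."""
    return any(name_matches(san, allowed_hostname) for san in san_dns_names)


# A certificate can carry a unique CN while its logical identity lives in SANs.
ok = peer_allowed(["etcd-0.example.com", "*.etcd.internal"], "etcd-0.example.com")
```

The point of the sketch is that the decision depends only on the SAN list,
so the CN is free to hold a unique per-certificate value.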