500ms keepalive delay on proxy side causes client to sometimes send
a second keepalive since it waits more than 500ms for the first response.
Fixes#7658
If the balancer update notification loop starts with a downed
connection and endpoints are switched while the old connection is up,
the balancer can potentially wait forever for an up connection without
refreshing the connections to reflect the current endpoints.
Instead, fetch upc/downc together, only caring about a single transition
either from down->up or up->down for each iteration
Simple way to reproduce failures: add time.Sleep(time.Second) to the
beginning of the update notification loop.
Watching from the leader's ModRevision could cause live-locking on
observe retry loops when the ModRevision is less than the compacted
revision. Instead, start watching the leader from at least the store
revision of the linearized read used to detect the current leader.
Fixes#7815
Turns out the optimization to ignore setting the init rev for
current revision watches breaks some ordering assumptions. Since
Watch only returns a channel once it gets a response, it should
bind the revision at the time of the first create response.
Was causing TestWatchReconnInit to fail.
Server Stop+Restart sometimes takes more than 500ms, so with a
one second window the lease client may not get a chance to issue
a keepalive and get a lease extension before the lease client
timer elapses. Instead, sleep for a shorter period of time (while
still guaranteeing a keepalive resend during quorum loss) and
skip the test if server restart takes longer than the lease TTL.
Fixes#7346
Pure Snapshot isolation would permit read conflicts. Change the name
from Snapshot to SerializableSnapshot to reflect that it will also
reject read conflicts.
The Get for the leader key will fetch based on the latest revision
instead of the deletion revision, missing leader updates between
the delete and the Get.
Although it's usually safe to skip these updates since they're
stale, it makes testing more difficult and in some cases the
full leader update history is desirable.
Addresses a case where two clients share the same lease. A client resigns but
disconnects / crashes and doesn't realize it. Another client reuses the
lease and gets leadership with a new key. The old client comes back and
tries to resign again, revoking the new leadership of the new client.
The full information about the leader's key is necessary to
safely use elections with transactions. Instead of returning
only the value on Leader(), return the entire GetResposne.