This also happens without gRPC proxy.
Fix panic when gRPC proxy leader watcher is restored:
```
go test -v -tags cluster_proxy -cpu 4 -race -run TestV3WatchRestoreSnapshotUnsync
=== RUN TestV3WatchRestoreSnapshotUnsync
panic: watcher minimum revision 9223372036854775805 should not exceed current revision 16
goroutine 156 [running]:
github.com/coreos/etcd/mvcc.(*watcherGroup).chooseAll(0xc4202b8720, 0x10, 0xffffffffffffffff, 0x1)
/home/gyuho/go/src/github.com/coreos/etcd/mvcc/watcher_group.go:242 +0x3b5
github.com/coreos/etcd/mvcc.(*watcherGroup).choose(0xc4202b8720, 0x200, 0x10, 0xffffffffffffffff, 0xc420253378, 0xc420253378)
/home/gyuho/go/src/github.com/coreos/etcd/mvcc/watcher_group.go:225 +0x289
github.com/coreos/etcd/mvcc.(*watchableStore).syncWatchers(0xc4202b86e0, 0x0)
/home/gyuho/go/src/github.com/coreos/etcd/mvcc/watchable_store.go:340 +0x237
github.com/coreos/etcd/mvcc.(*watchableStore).syncWatchersLoop(0xc4202b86e0)
/home/gyuho/go/src/github.com/coreos/etcd/mvcc/watchable_store.go:214 +0x280
created by github.com/coreos/etcd/mvcc.newWatchableStore
/home/gyuho/go/src/github.com/coreos/etcd/mvcc/watchable_store.go:90 +0x477
exit status 2
FAIL github.com/coreos/etcd/integration 2.551s
```
gRPC proxy spawns a watcher with a key "proxy-namespace__lostleader"
and watch revision "int64(math.MaxInt64 - 2)" to detect leader loss.
But, when the partitioned node restores, this watcher triggers
panic with "watcher minimum revision ... should not exceed current ...".
This check was added a long time ago, by my PR, when there was no gRPC proxy:
https://github.com/coreos/etcd/pull/4043#discussion_r48457145
> we can remove this checking actually. it is impossible for a unsynced watching to have a future rev. or we should just panic here.
However, now it's possible that a unsynced watcher has a future
revision, when it was moved from a synced watcher group through
restore operation.
This PR adds "restore" flag to indicate that a watcher was moved
from the synced watcher group with restore operation. Otherwise,
the watcher with future revision in an unsynced watcher group
would still panic.
Example logs with future revision watcher from restore operation:
```
{"level":"info","ts":1527196358.9057755,"caller":"mvcc/watcher_group.go:261","msg":"choosing future revision watcher from restore operation","watch-key":"proxy-namespace__lostleader","watch-revision":9223372036854775805,"current-revision":16}
{"level":"info","ts":1527196358.910349,"caller":"mvcc/watcher_group.go:261","msg":"choosing future revision watcher from restore operation","watch-key":"proxy-namespace__lostleader","watch-revision":9223372036854775805,"current-revision":16}
```
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
We replace/insert into in-memory B-tree, which means
we only keep a single node per key thus do not support
delete by revision on B-tree. So, (*keyIndex).tombstone
has always been marked with latest revision.
tombstone with key's modified revision panics:
panic: store.keyindex: put with unexpected smaller revision [{2 0} / {2 0}]
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
For compaction, clone the original Btree for traversal purposes, so as to
not hold the lock for the duration of compaction. This allows read/write
throughput by not blocking when the index tree is large (> 1M entries).
mvcc: add comment for index compaction lock
mvcc: explicitly unlock store to do index compaction synchronously
mvcc: formatting index bench
mvcc: add release note for index compaction changes
mvcc: add license header
This allows for watchers to be created concurrently
without needing potentially complex and latency-adding
queuing on the client.
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
It was getting revisions with "atRev==0", which makes
"available" from "keep" method always empty since
"walk" on "keyIndex" only returns true.
"available" should be populated with all revisions to be
kept if the compaction happens with the given revision.
But, "available" was being empty when "kvindex.Keep(0)"
since it's always the case that "rev.main > atRev==0".
Fix https://github.com/coreos/etcd/issues/9022.
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
If Close() is called before Cancel()'s cancel() completes, the
watch channel will be closed while the watch is still in the
synced list. If there's an event, etcd will try to write to a
closed channel. Instead, remove the watch from the bookkeeping
structures only after cancel completes, so Close() will always
call it.
Fixes#8443