ForceGosched() performs bad when GOMAXPROCS>1. When GOMAXPROCS=1, it
could promise that other goroutines run long enough
because it always yield the processor to other goroutines. But it cannot
yield processor to goroutine running on other processors. So when
GOMAXPROCS>1, the yield may finish when goroutine on the other
processor just runs for little time.
Here is a test to confirm the case:
```
package main
import (
"fmt"
"runtime"
"testing"
)
func ForceGosched() {
// possibility enough to sched up to 10 go routines.
for i := 0; i < 10000; i++ {
runtime.Gosched()
}
}
var d int
func loop(c chan struct{}) {
for {
select {
case <-c:
for i := 0; i < 1000; i++ {
fmt.Sprintf("come to time %d", i)
}
d++
}
}
}
func TestLoop(t *testing.T) {
c := make(chan struct{}, 1)
go loop(c)
c <- struct{}{}
ForceGosched()
if d != 1 {
t.Fatal("d is not incremented")
}
}
```
`go test -v -race` runs well, but `GOMAXPROCS=2 go test -v -race` fails.
Change the functionality to waiting for schedule to happen.
This PR sets etcd version and min cluster version in request header,
and let server check version compatibility. rafthttp server
will reject any message from peer with incompatible version(too low
version or too high version), and print out warning logs.
Because etcd 2.1 will build stream to any existing peers and etcd 2.0
requires the remote to provide most updated term, it is
necessary for streamReader to know the latest term.
This helps etcd 2.1 connect to msgappV1 handler when the remote member
doesn't support msgappV2. And it doesn't print out unsupported handler
error to make log clean.
Fix#2841.
From Prometheus developer:
```
the recommended way for etcd as an open source project and under
consideration of its size would be etcd_<subsystem>_<name>.
```
We made the naming change accordingly.
In all stream types, streamMsgApp needs to be closed when
updating term because its stream connection can only be used under
a certain term. But there is no need to close other streams, which
may waste time and reduce performance.
The original process is stopping etcd only when pipeline message finds itself
has been removed. After this PR, stream dial has this functionality too.
It helps fast etcd stop, which doesn't need to wait for stream break to
fall back to pipeline, and wait for election timeout to send out message
to detect self removal.
Add remotes to rafthttp, who help newly joined members catch up the
progress of the cluster. It supports basic message sending to remote, and
has no stream connection for simplicity. remotes will not be used
after the latest peers have been added into rafthttp.
The patch decreases the allocs when sending one AppEntry in msgappv2
stream from 30 to 9. This helps reduce CPU load when etcd is under
high write load.
msgappv2 stream is used to send all MsgApp, and replaces the
functionality of msgapp stream. Compared to v1, it has several
advantanges:
1. The output message is exactly the same with the input one, which
cannot be done in v1.
2. It uses one connection to stream persistently, which prevents message
reorder and saves the time to request stream.
3. It transmits 10 addiontional bytes in the procedure of committing one
proposal, which is trivia for idle time.
4. It transmits less bytes when committing mutliple proposals or keep
committing proposals.
The messages in channel are outdated, and there is no need to send
them in the future. It also reports unreachable if there are messages
in the channel.
The size of MsgSnap may be very big, e.g., 1G.
If its size is big and general streaming is used to send it, it may block
the following messages for several ten seconds, which interrupts the
heartbeat heavily.
Only use pipeline to send MsgSnap.
New rafthttp uses /raft/stream/msgapp for MsgApp stream, but v2.0 rafthttp
cannot understand it. Use the old endpoint /raft/stream instead for backward
compatibility, and plan to move to new endpoint in the version after the
next one.
Dial timeout is set shorter because
1. etcd is supposed to work in good environment, and the new value is long
enough
2. shorter dial timeout makes dial fail faster, which is good for
performance
It is not sent out because it is useless to let remote raft step the
message.
Moreover, MsgApp stream reader can always assume that the length
of entries sent is > 0.
If amounts of MsgProp goes to the follower, it could batch them and
forward to the leader. This can avoid dropping MsgProp in good path
due to exceed maximal serving in sender.
Moreover, batching MsgProp can increase the throughput of proposals that
come from follower.
Streaming buffer is used for:
1. hand over data to io goroutine in non-blocking way
2. hold pressure for temprorary network delay
3. be able to wait on I/O instead of data coming under high throughput
The old 1024 value is too small and is very likely to be full and
break the streaming when suffering temprorary network delay.