Fixes client hangs possible after stopping & restarting an osd.
Hangs happened when a connection was closed in the middle of reading a READ
operation reply from the network. In this case the operation being read was
in read_op and the client didn't free it when closing the connection.
Test case for msgr_read.cpp:
- Partially read reply for a READ operation
- stop_client()
- Check that the READ operation returns EPIPE
The bug was actually introduced in 0.5.11.
Basic naive implementation works, but it's highly non-optimal as
RNR retransmissions occur all the time. RDMA expects the receiver
to always have place for incoming WRs...
The bug reproduced if fio was temporarily stopped with SIGSTOP
during write test and then resumed after 10 seconds. In this case
"pings" were failed for all clients and fio process crashed with
'use-after-free' in keepalive_timer. It happened because it called
stop_client while having a live iterator to the map.
Previous implementation didn't respect write ordering and could lead
to corrupted data when restarting writes after an OSD outage
Also rework cluster_client queueing logic and add tests for it to verify the correct behaviour