This is the simplest and, as usual, the best implementation :)
100% zero-copy implementation is also possible (see rdma-zerocopy branch),
but it requires to create A LOT of queues (~128 per client) to use QPN as a 'tag'
because of the lack of receive tags and the server may simply run out of queues.
Hardware limit is 262144 on Mellanox ConnectX-4 which amounts to only 2048
'connections' per host. And even with that amount of queues it's still less optimal
than the non-zerocopy one.
In fact, newest hardware like Mellanox ConnectX-5 does have Tag Matching
support, but it's still unsuitable for us because it doesn't support scatter/gather
(tm_caps.max_sge=1).
Basic naive implementation works, but it's highly non-optimal as
RNR retransmissions occur all the time. RDMA expects the receiver
to always have place for incoming WRs...
Rework client operation queue from a vector to a linked list.
This is required to rework continue_ops() as its current implementation
consumes ~25% of client process CPU.
Warning: upgrading from 0.5.x is currently not supported!
Please create an issue if you really need upgrade capability.
New features:
- Snapshots and Copy-on-Write clones
- Inode (image) names
- Inode I/O and space statistics
- Write throttling for smoothing random write workloads in SSD+HDD configurations
The new protocol is almost compatible - it has bitmaps, but also it has
a "bitmap_length" field. It's not hard to make 0.5-0.6 OSDs and clients
compatible, but for now I just assume nobody needs it.
If I'm wrong and anybody requests to upgrade their production 0.5.x system
to 0.6.x I'll fix it.
Each inode has: image name, parent inode number & pool, size and readonly flag
Snapshots are created by switching image name to a different inode number
while using the older inode as parent.