vitastor

Commit Graph

Author	SHA1	Message	Date
Vitaliy Filippov	7d79c58095	Use the larger sockaddr_storage structure	2022-02-12 11:22:56 +03:00
Vitaliy Filippov	20a4406acc	Support IPv6 OSD addresses	2021-12-19 10:42:17 +03:00
Vitaliy Filippov	660c3f7b0d	Change default RDMA settings to 128x 129K buffers. 129K to leave extra space for the header. The problem with 8x 1M buffers is that the following happens with, for example, 2 OSDs and 4M T1Q1 write: (1) Server posts 8 receives; (2) Client posts 8 sends; (3) WRs are processed by the RDMA stack, but the OSD doesn't have the time to handle them and doesn't refill buffers; (4) Client posts 1 more send; (5) RNR retransmission happens and performance drops to zero. Overall it seems that RDMA support should be reworked to use real 'RDMA' operations, i.e. operations writing into remote memory. This has an additional advantage of avoiding a copy at the receive side of the OSD.	2021-11-21 12:05:52 +03:00
Vitaliy Filippov	fc3a1e076a	Fix minor bugs in snapshot removal, check it in tests	2021-09-25 19:30:29 +03:00
Vitaliy Filippov	72aa2fd819	Make OSD and client read common configuration from /etc/vitastor/vitastor.conf	2021-04-30 01:11:27 +03:00
Vitaliy Filippov	483c5ab380	Negotiate max_msg instead of max_sge, make buffer settings more conservative :-)	2021-04-29 11:10:35 +03:00
Vitaliy Filippov	971aa4ae4f	Implement RDMA receive with memory copying (send remains zero-copy). This is the simplest and, as usual, the best implementation :) A 100% zero-copy implementation is also possible (see rdma-zerocopy branch), but it requires creating A LOT of queues (~128 per client) to use QPN as a 'tag' because of the lack of receive tags, and the server may simply run out of queues. The hardware limit is 262144 on Mellanox ConnectX-4, which amounts to only 2048 'connections' per host. And even with that amount of queues it's still less optimal than the non-zerocopy one. In fact, the newest hardware like Mellanox ConnectX-5 does have Tag Matching support, but it's still unsuitable for us because it doesn't support scatter/gather (tm_caps.max_sge=1).	2021-04-29 02:34:45 +03:00
Vitaliy Filippov	9e6cbc6ebc	Negotiate max_sge between RDMA client & server	2021-04-29 02:15:20 +03:00
Vitaliy Filippov	ce777319c3	WIP RDMA support. Basic naive implementation works, but it's highly non-optimal as RNR retransmissions occur all the time. RDMA expects the receiver to always have room for incoming WRs...	2021-04-29 02:03:54 +03:00
Vitaliy Filippov	860ac24762	Add "external" bitmap support to the secondary OSD protocol	2021-04-10 17:44:12 +03:00
Vitaliy Filippov	b0b2e7df3c	Fix use-after-free in keepalive_timer and rework stop_client(). The bug reproduced if fio was temporarily stopped with SIGSTOP during a write test and then resumed after 10 seconds. In this case "pings" failed for all clients and the fio process crashed with 'use-after-free' in keepalive_timer. It happened because keepalive_timer called stop_client while holding a live iterator to the map.	2021-04-07 11:06:31 +03:00
Vitaliy Filippov	a48e2bbf18	Fix write replay ordering when immediate_commit != all. Previous implementation didn't respect write ordering and could lead to corrupted data when restarting writes after an OSD outage. Also rework cluster_client queueing logic and add tests for it to verify the correct behaviour.	2021-04-03 14:51:52 +03:00
Vitaliy Filippov	829381b335	Extract some definitions to msgr_op.{cpp,h}	2021-04-03 14:36:04 +03:00
Vitaliy Filippov	ad577c4aac	Add PING operation and timeouts to detect OSD failures when a host goes down	2021-03-09 02:15:38 +03:00
Vitaliy Filippov	bf9a175efc	Move C/C++ sources to src subdirectory	2021-02-25 23:59:03 +03:00

15 Commits (2a2e914ef9ea450d537fbf368a6a766da529b3e7)