ODP (On-Demand Paging) turns out to be slower than regular RDMA even when the memory copy overhead of regular pre-registered buffers is taken into account
Example numbers:
- 3950000 random read iops without ODP vs 240000 iops with ODP
- 1447000 random write iops without ODP vs 101000 iops with ODP
Reference: https://tkygtr6.github.io/pub/ISPASS21_slides.pdf
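For context, a minimal sketch of what the two modes look like at the verbs level (assuming an existing protection domain; the function name is illustrative, not Vitastor code). Regular RDMA requires pinned, pre-registered memory, hence the copy into registered buffers; ODP registers with IBV_ACCESS_ON_DEMAND and lets the HCA fault pages in on access:

    #include <infiniband/verbs.h>
    #include <stddef.h>

    // Illustrative helper: register a buffer either as regular pinned
    // memory or as an on-demand paging (ODP) region
    static ibv_mr *register_buffer(ibv_pd *pd, void *buf, size_t len, bool use_odp)
    {
        int access = IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE;
        if (use_odp)
            access |= IBV_ACCESS_ON_DEMAND; // pages are faulted in by the HCA, not pinned
        return ibv_reg_mr(pd, buf, len, access);
    }

The numbers above show that avoiding the copy this way costs far more than the copy itself.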
Build problems fixed:
- void* pointer arithmetic, which is a GNU extension (void* is treated as a byte pointer); see the sketch after the next list
- "variable size object may not be initialized" errors (GCC accepts such code)
- a nullptr_t-related error in json11 (nullptr_t lacks 'operator <' under clang)
Warnings fixed:
- empty nested struct initializer { 0 } replaced by {}
- removed several unused lambda captures
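For illustration, minimal sketches of the void* fix and the initializer warning fix (all names are made up):

    #include <stdint.h>
    #include <stddef.h>
    #include <sys/socket.h>

    // void* arithmetic like `buf + offset` is a GNU extension that treats
    // void* as a byte pointer; casting to uint8_t* is the portable equivalent
    static uint8_t *at_offset(void *buf, size_t offset)
    {
        return (uint8_t*)buf + offset;
    }

    static void zero_init_example()
    {
        // `= { 0 }` on a struct with nested members triggers
        // "missing initializer" warnings; `= {}` zero-initializes
        // the whole aggregate without them
        msghdr mh = {};
        (void)mh;
    }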
Buffer size is set to 129K to leave extra space for the header
The problem with 8x 1M buffers is that the following happens with,
for example, 2 OSDs and a 4M T1Q1 write:
- Server posts 8 receives
- Client posts 8 sends
- WRs are processed by the RDMA stack, but the OSD doesn't have the time
to handle them and doesn't refill buffers
- Client posts 1 more send
- RNR retransmission happens and performance drops to zero
Overall it seems that RDMA support should be reworked to use real 'RDMA'
operations, i.e. operations that write into remote memory. This has the
additional advantage of avoiding a copy on the receive side of the OSD.
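For illustration, a minimal sketch of such a one-sided write (assuming the remote buffer address and rkey were exchanged out of band; all names are illustrative):

    #include <infiniband/verbs.h>
    #include <stdint.h>

    // Post an RDMA WRITE: the data lands directly in remote memory and
    // no receive WR is consumed on the peer, so no receive-side copy
    static int rdma_write_remote(ibv_qp *qp, ibv_mr *mr, void *buf, uint32_t len,
        uint64_t remote_addr, uint32_t rkey)
    {
        ibv_sge sge = {};
        sge.addr = (uintptr_t)buf;
        sge.length = len;
        sge.lkey = mr->lkey;
        ibv_send_wr wr = {}, *bad_wr = NULL;
        wr.wr_id = 1;
        wr.opcode = IBV_WR_RDMA_WRITE;
        wr.send_flags = IBV_SEND_SIGNALED;
        wr.sg_list = &sge;
        wr.num_sge = 1;
        wr.wr.rdma.remote_addr = remote_addr;
        wr.wr.rdma.rkey = rkey;
        return ibv_post_send(qp, &wr, &bad_wr);
    }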
From now on, reads will return the server-side object version numbers,
and writes and deletes will have an additional "version" parameter
which, if set to a non-zero value, will be atomically compared with
the current version of the object plus 1; the modification will
fail if it doesn't match.
This feature opens the road to correct online flattening of snapshot
layers and other interesting things.
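For illustration, the check boils down to something like this (illustrative names, not the actual Vitastor code):

    #include <stdint.h>

    // `requested_version` comes from the client; 0 means an unconditional write
    static bool cas_check(uint64_t cur_version, uint64_t requested_version)
    {
        if (requested_version != 0 && requested_version != cur_version + 1)
            return false; // mismatch -> fail the write/delete
        return true;      // on success the object version becomes cur_version+1
    }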
This is the simplest and, as usual, the best implementation :)
A 100% zero-copy implementation is also possible (see the rdma-zerocopy branch),
but it requires creating A LOT of queues (~128 per client) to use the QPN as a 'tag'
because of the lack of receive tags, and the server may simply run out of queues.
The hardware limit is 262144 QPs on a Mellanox ConnectX-4, which at ~128 queues
per client amounts to only 2048 'connections' per host. And even with that number
of queues it's still less optimal than the non-zerocopy implementation.
In fact, newer hardware like the Mellanox ConnectX-5 does have Tag Matching
support, but it's still unsuitable for us because it doesn't support scatter/gather
(tm_caps.max_sge=1).
The basic naive implementation works, but it's highly non-optimal:
RNR retransmissions occur all the time. RDMA expects the receiver
to always have room for incoming WRs...
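For illustration, a minimal sketch of keeping the receive queue filled (illustrative names): every buffer handed back by a receive completion has to be re-posted right away, otherwise the sender hits an RNR NAK:

    #include <infiniband/verbs.h>
    #include <stdint.h>

    // Re-post one receive buffer so the next incoming message has a place to land
    static int post_recv_buffer(ibv_qp *qp, ibv_mr *mr, void *buf, uint32_t len, uint64_t id)
    {
        ibv_sge sge = {};
        sge.addr = (uintptr_t)buf;
        sge.length = len;
        sge.lkey = mr->lkey;
        ibv_recv_wr wr = {}, *bad_wr = NULL;
        wr.wr_id = id;
        wr.sg_list = &sge;
        wr.num_sge = 1;
        return ibv_post_recv(qp, &wr, &bad_wr);
    }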
The new protocol is almost compatible: it has bitmaps, but it also has
a "bitmap_length" field. It wouldn't be hard to make 0.5 and 0.6 OSDs and
clients compatible with each other, but for now I just assume nobody needs it.
If I'm wrong and anybody asks to upgrade their production 0.5.x system
to 0.6.x, I'll fix it.
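For illustration only, a hypothetical sketch of the incompatibility (not the actual wire format):

    #include <stdint.h>

    // Hypothetical 0.6-style header fragment: the bitmap length is carried
    // explicitly, while 0.5 peers don't expect this field at all
    struct reply_bitmap_part
    {
        uint32_t bitmap_length; // bytes of bitmap that follow
        // ... bitmap_length bytes of bitmap, then the payload
    };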
The bug reproduced when fio was temporarily stopped with SIGSTOP
during a write test and then resumed after 10 seconds. In this case
"pings" failed for all clients and the fio process crashed with a
use-after-free in keepalive_timer. It happened because the timer callback
called stop_client while holding a live iterator into the client map.
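For illustration, the bug class looks like this (illustrative names, assuming clients live in a std::map):

    #include <map>
    #include <vector>

    static std::map<int, int> clients; // peer_fd -> state (illustrative)
    static bool ping_failed(int peer_fd) { (void)peer_fd; return true; } // placeholder
    static void stop_client(int peer_fd) { clients.erase(peer_fd); }

    static void keepalive_check_buggy()
    {
        for (auto it = clients.begin(); it != clients.end(); it++)
            if (ping_failed(it->first))
                stop_client(it->first); // erases the element `it` points to ->
                                        // the following `it++` is a use-after-free
    }

    static void keepalive_check_fixed()
    {
        std::vector<int> to_stop;
        for (auto & kv : clients)
            if (ping_failed(kv.first))
                to_stop.push_back(kv.first);
        for (int fd : to_stop)
            stop_client(fd); // erase only after iteration is done
    }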
The previous implementation didn't respect write ordering and could lead
to corrupted data when restarting writes after an OSD outage.
Also rework the cluster_client queueing logic and add tests for it to verify the correct behaviour.
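For illustration, the ordering requirement amounts to something like this (illustrative names, not the actual cluster_client code):

    #include <stdint.h>
    #include <deque>
    #include <functional>

    struct pending_write
    {
        uint64_t id;
        std::function<void()> resend; // re-submits the write to the OSD
    };

    static std::deque<pending_write> write_queue; // FIFO preserves submission order

    static void replay_after_reconnect()
    {
        // Resending in any other order (or letting new overlapping writes
        // jump ahead of queued ones) could apply an older write on top of
        // a newer one and corrupt the data
        for (auto & w : write_queue)
            w.resend();
    }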