Vitaliy Filippov
6561d4e040
Validate pool ID before executing the operation
...
Reply -EPIPE for non-existing pools because we assume that it means
that pool config isn't loaded yet. Previously OSD crashed on such operations
2020-10-30 01:02:46 +03:00
Vitaliy Filippov
e8ac08be14
Allow to overwrite incomplete objects or parts of objects to recover them
2020-10-24 02:14:41 +03:00
Vitaliy Filippov
701eb79422
Stabilize writes before deleting extra chunks to not stall peer journals
2020-10-23 22:45:05 +03:00
Vitaliy Filippov
776fe954a5
Fix crashes on multiple OSD reconnects
...
Identify clients by pointers instead of peer_fd as peer may be dropped
and reconnected between callbacks
Yeah maybe I need some Rust, but ... maybe in the future :)
2020-10-17 10:53:04 +00:00
Vitaliy Filippov
9f2a948712
Make pg_stripe_size a per-pool config
2020-10-01 18:51:49 +03:00
Vitaliy Filippov
0471b09b9c
Add license notices to all source code files
2020-09-17 23:07:06 +03:00
Vitaliy Filippov
53832d184a
Allow to use lazy sync with replicated pools
2020-09-06 12:08:44 +03:00
Vitaliy Filippov
352caeba14
Fix one more bug with replicated reads
2020-09-06 02:27:44 +03:00
Vitaliy Filippov
44973e7f27
Fix replicated pool bugs
2020-09-05 21:45:04 +03:00
Vitaliy Filippov
e051db5a73
Check for unsuccessful memory allocations
2020-09-05 01:42:11 +03:00
Vitaliy Filippov
4f9b5286a0
Add replicated pool support to OSD logic
...
...in theory :-D now it needs some testing
2020-09-05 01:42:11 +03:00
Vitaliy Filippov
168cc2c803
Add pool support to OSD, part 1
...
This just fixes all the code so it builds and works like before,
but doesn't yet bring the support for replicated pools.
2020-09-04 17:04:17 +03:00
Vitaliy Filippov
a7929931eb
Implement PG epochs to prevent the "version split"
...
The "version split" is when:
- A block is written to 1 OSD out of 3, all of them die
- OSDs 2 and 3 come up, the same block is written to both of them
- The remaining OSD comes up. Now all 3 OSDs have the same version of the same object,
but with different data.
2020-07-04 00:55:27 +03:00
Vitaliy Filippov
e680d6c1c3
Rename reconstruct_stripe and calc_rmw_parity to indicate that they are only for XOR N+1
2020-06-30 10:40:43 +03:00
Vitaliy Filippov
badf68c039
Support iovecs for read operations
2020-06-19 19:47:05 +03:00
Vitaliy Filippov
0f6d193d73
Postpone op callbacks to the end of handle_read(), fix a bug where primary OSD could reply -EPIPE with data to a read operation
2020-06-16 01:36:38 +03:00
Vitaliy Filippov
c573bc6bb3
(Probably almost) implement cluster client
2020-06-07 00:09:36 +03:00
Vitaliy Filippov
571be0f380
Make deletions instantly stable
...
"2-phase" (write->stabilize) process is pointless for deletions because it
doesn't protect us from incomplete objects. This happens because it removes
the version information from metadata after stabilization. Deletions require
"3-phase" process with a potentially very long 3rd phase.
So, deletions will be allowed to generate degraded and incomplete objects,
and for it to not affect users' ability to delete something, the cluster
will allow to delete whole inodes while storing a list of them in etcd.
Proper TRIM will be impossible until the implementation of the aforementioned
"3-phase" process, though.
By the way, this change also fixes a possible write stall after rebalancing
which was caused by the lack of "stabilize delete" operations.
2020-06-02 23:45:22 +03:00
Vitaliy Filippov
46e111272f
Replace assert(this_it == cur_op) with if() for the case of PG repeering
2020-06-02 14:30:57 +03:00
Vitaliy Filippov
c3fe9ad0d1
Fix rebalancing writes (add a forgotten state resume)
2020-06-02 01:26:14 +03:00
Vitaliy Filippov
45b1c2fbf1
Fix canceling of write operations on PG re-peer (which led to use-after-free, too...)
2020-06-01 16:18:14 +03:00
Vitaliy Filippov
b466e215f0
Fix queued OP_SYNC execution
2020-05-27 13:55:25 +03:00
Vitaliy Filippov
0aca6e9ca8
Extract peer connect and read-write loop into a separate file (to be shared with the client library)
2020-05-26 22:11:30 +03:00
Vitaliy Filippov
e09d0e0678
Several bug fixes
...
- Do not block flock() requests
- Fix stop_client(0) attempts leading to std::bad_function_call
- Fix degraded writes crashing due to an unset stripes[i].missing (at least with a missing parity device)
- Fix recovery B/W reporting
2020-05-24 01:51:35 +03:00
Vitaliy Filippov
393fe75900
Fix creepy (osd_op_t*)(long) casts
2020-05-23 15:43:37 +03:00
Vitaliy Filippov
19f25c7cd5
Handle integer overflow of the op_stat_count
2020-05-15 01:37:17 +03:00
Vitaliy Filippov
5084ff7c6c
Measure & report recovery op count and bandwidth
2020-05-15 01:29:15 +03:00
Vitaliy Filippov
e8149e5848
Implement OSD_OP_DELETE
2020-05-05 00:39:51 +03:00
Vitaliy Filippov
00cf24fbd7
Split osd_primary.cpp
2020-05-03 11:04:20 +03:00
Vitaliy Filippov
bd0fe6e4cc
Fix PGs not stopping during sync, fix state reporting autovivification of erased PGs
2020-05-01 01:33:14 +03:00
Vitaliy Filippov
7b57eeeeb3
Implement PG state locking and PG moving in response to etcd events
2020-04-29 22:23:38 +03:00
Vitaliy Filippov
37b27c3025
Implement basic OSD status reporting to Consul
2020-04-14 14:52:06 +03:00
Vitaliy Filippov
298b013eae
Add simple http request function
2020-04-11 12:05:58 +03:00
Vitaliy Filippov
0880a77c1a
2 FIXME for the future
2020-04-06 00:55:47 +03:00
Vitaliy Filippov
d59be0e8b4
Delete misplaced chunks after moving the object, reset object state in primary_write
2020-04-05 15:51:22 +03:00
Vitaliy Filippov
cf7de0f181
(Almost) Implement misplaced recovery, integrating it into calc_rmw()
2020-04-05 15:50:53 +03:00
Vitaliy Filippov
dfb6e15eaa
Implement graceful stopping of PGs
2020-04-03 13:03:42 +03:00
Vitaliy Filippov
afe2e76c87
Implement regular automatic syncs, split osd_t constructor into some methods
2020-04-02 22:16:46 +03:00
Vitaliy Filippov
0f43f6d3f6
Fix crashes, print some stats
...
Notably:
- fix the `delete op` inside lambda callback crash (it frees the lambda itself
which results in use-after-free with g++)
- fix stop_client() reenterability
- fix a bug in the blockstore layer which resulted in always returning version=0
for zero-length reads
- change error codes for blockstore_stabilize
2020-03-31 17:55:31 +03:00
Vitaliy Filippov
92c800bb64
Forget unstable writes when re-peering, rename parity_block_size -> pg_stripe_size, pg_parity_size -> pg_block_size
2020-03-31 02:09:25 +03:00
Vitaliy Filippov
8a8b619875
Handle secondary OSD connection errors [in theory]
2020-03-30 19:51:34 +03:00
Vitaliy Filippov
43fe1d88e7
Fix memory leaks with subops, fix recovery crashes
2020-03-28 19:09:20 +03:00
Vitaliy Filippov
1b30120918
Fix stripe reconstruction in recovery, only write modified object parts
2020-03-28 13:58:42 +03:00
Vitaliy Filippov
c0a22d825d
Fix degraded object recovery (it seems to work now)
2020-03-25 02:17:41 +03:00
Vitaliy Filippov
250f22c0b6
Implement basic degraded object recovery (integrated into primary_write)
2020-03-25 01:17:50 +03:00
Vitaliy Filippov
dbd8418798
Reply using a single finish_op() method, allow to call OSD ops from inside the OSD
2020-03-24 00:18:52 +03:00
Vitaliy Filippov
21d0b06959
Implement flushing (stabilize/rollback) of unstable entries on start of the PG
2020-03-14 02:49:34 +03:00
Vitaliy Filippov
31f9445030
Use immediate_commit to benefit the primary OSD
2020-03-10 02:20:16 +03:00
Vitaliy Filippov
56765ab750
Send all iovecs at once
2020-02-29 02:27:19 +03:00
Vitaliy Filippov
2be4824a7a
Fix a small memory leak and BS_OP_SYNC mishandling, now fio does not hang during primary-osd test
2020-02-28 01:46:39 +03:00