vitastor

Commit Graph

Author	SHA1	Message	Date
Vitaliy Filippov	0aa2dd2890	Send bitmaps with primary-reads, actually read bitmaps for READ ops	2021-04-10 17:44:12 +03:00
Vitaliy Filippov	6bf88883ac	Allocate bitmaps along with stripes to avoid memory fragmentation	2021-04-10 17:44:12 +03:00
Vitaliy Filippov	004f265393	Remove cryptic bitmap inlining from bs_op_t and osd_op_t, use bitmap in primary OSD code	2021-04-10 17:44:12 +03:00
Vitaliy Filippov	860ac24762	Add "external" bitmap support to the secondary OSD protocol	2021-04-10 17:44:12 +03:00
Vitaliy Filippov	6107a4d07b	Add "external" bitmap support to blockstore	2021-04-10 17:44:12 +03:00
Vitaliy Filippov	95c29b9dc3	Add "external" bitmap support to osd_rmw	2021-04-10 17:44:12 +03:00
Vitaliy Filippov	6909807068	Allow to start the OSD just to flush the journal completely	2021-04-10 17:44:12 +03:00
Vitaliy Filippov	18c72f4835	Correct reenterability fix (now verified with a test). It's rather funny but 0.5.12 has to be re-published again	2021-04-09 12:10:16 +03:00
Vitaliy Filippov	40b7c21fb1	Followup to `307c1731c1` - fix mark_stable	2021-04-08 15:47:18 +03:00
Vitaliy Filippov	efb3678606	Fix qemu-img broken in 0.5.11. Caused by the lack of reenterability of the main cluster_client function	2021-04-08 14:59:20 +03:00
Vitaliy Filippov	8d87e32175	Fix msgr_op.h includes	2021-04-08 01:18:46 +03:00
Vitaliy Filippov	b0b2e7df3c	Fix use-after-free in keepalive_timer and rework stop_client(). The bug reproduced if fio was temporarily stopped with SIGSTOP during write test and then resumed after 10 seconds. In this case "pings" were failed for all clients and fio process crashed with 'use-after-free' in keepalive_timer. It happened because it called stop_client while having a live iterator to the map.	2021-04-07 11:06:31 +03:00
Vitaliy Filippov	97efb9e299	Do not crash on PG re-peering events when operations are in progress	2021-04-07 11:06:31 +03:00
Vitaliy Filippov	f6d705383a	Fix client connection recovery bugs, add dirty_ops limit	2021-04-07 11:06:31 +03:00
Vitaliy Filippov	68567c0e1f	Fix messenger possibly trying to connect to the same OSD twice	2021-04-07 01:30:38 +03:00
Vitaliy Filippov	04b00003e9	Log ping failures	2021-04-07 01:30:38 +03:00
Vitaliy Filippov	307c1731c1	Forget all dirty_entries before stable big_write or delete during initialisation. This fixes a 'double_alloc' assertion in the following case: big_write object #1 v1 to block #100; big_write object #1 v2 to block #101; big_write object #2 v1 to block #100	2021-04-07 01:30:38 +03:00
Vitaliy Filippov	a48e2bbf18	Fix write replay ordering when immediate_commit != all. Previous implementation didn't respect write ordering and could lead to corrupted data when restarting writes after an OSD outage. Also rework cluster_client queueing logic and add tests for it to verify the correct behaviour	2021-04-03 14:51:52 +03:00
Vitaliy Filippov	688821665a	Remove stoull_full() from etcd_state_client.cpp	2021-04-03 14:36:04 +03:00
Vitaliy Filippov	3e162d95a0	Remove http_client.h include from etcd_state_client.h	2021-04-03 14:36:04 +03:00
Vitaliy Filippov	829381b335	Extract some definitions to msgr_op.{cpp,h}	2021-04-03 14:36:04 +03:00
Vitaliy Filippov	54f2353f24	Use bitmap granularity for alignment checks	2021-04-03 14:36:04 +03:00
Vitaliy Filippov	e47f6fba60	Remove cluster_client_t::stop()	2021-04-03 14:35:42 +03:00
Vitaliy Filippov	883bf84a16	Fix build	2021-04-03 01:47:15 +03:00
Vitaliy Filippov	52097c4856	Stop flushing when less than min_flusher_count operations are available (unless a trim is forced)	2021-04-03 00:53:28 +03:00
Vitaliy Filippov	e1355cbc74	Report failed operation name in cluster_client	2021-04-03 00:53:28 +03:00
Vitaliy Filippov	8f8b90be7a	Add min_flusher_count configuration	2021-04-03 00:53:28 +03:00
Vitaliy Filippov	ad9f619370	Skip double allocs when reading journal	2021-04-03 00:53:28 +03:00
Vitaliy Filippov	f4769ba7c7	Collapse create+delete journal entry pairs if they're already flushed. Old journal replay mechanism could lead to a double allocation of the same block and a "Fatal error: tried to overwrite non-zero metadata entry"	2021-04-03 00:53:28 +03:00
Vitaliy Filippov	843b7052d2	Add an assertion when clearing deleted metadata entries, add debug details when freeing blocks	2021-04-03 00:53:28 +03:00
Vitaliy Filippov	4095bcc558	Do not ignore object deletion journal entries when they are preceded by a big write	2021-03-25 11:00:10 +03:00
Vitaliy Filippov	564d64e271	Add some details for debug prints	2021-03-25 11:00:10 +03:00
Vitaliy Filippov	cf54741c95	Followup to `05db1308aa`. Don't do anything with the object state after errors because it's freed by PG re-peer in this case	2021-03-25 11:00:10 +03:00
Vitaliy Filippov	18a5fafa2a	Fix rollback	2021-03-25 02:41:58 +03:00
Vitaliy Filippov	06f4978085	Fix fsync check in blockstore_flush (data fsyncs were disabled instead of journal fsyncs)	2021-03-25 02:41:58 +03:00
Vitaliy Filippov	7ebf1588c5	Check for immediate_commit==small in the OSD code	2021-03-25 02:41:58 +03:00
Vitaliy Filippov	b0ad1e1e6d	Remember writes as "unsynced" only after completing them. Previously BS_OP_SYNC could take unfinished writes and add them into the journal before they were actually completed. This was leading to crashes with the message "BUG: Unexpected dirty_entry 2000000000001:9f2a0000 v3 unstable state during flush: 338"	2021-03-25 02:41:58 +03:00
Vitaliy Filippov	0949f08407	Extract osd_primary write and sync code into separate files	2021-03-24 14:20:56 +03:00
Vitaliy Filippov	04a1f18fa5	Assign .req as a whole to always zero out the remaining part. Also clear .reply before processing the operation	2021-03-24 14:20:56 +03:00
Vitaliy Filippov	cf9a641d66	Skip disconnected OSDs during sync	2021-03-24 14:20:56 +03:00
Vitaliy Filippov	05db1308aa	Fix two potential read/write ordering problems (even though not yet seen in tests): (1) Write operations could be 'stabilized' and previous versions could be purged from OSDs before the removal of version_override and following reads could potentially hit different version in EC pools; (2) Object was marked clean after completing the delete during recovery, so reads could in theory hit a deleted version and return nothing	2021-03-24 14:20:56 +03:00
Vitaliy Filippov	98b54ca948	Don't try to "recover" misplaced objects if it would make them degraded	2021-03-21 01:37:23 +03:00
Vitaliy Filippov	23225c5e62	Do not run ping on clients that are not yet connected	2021-03-21 01:37:23 +03:00
Vitaliy Filippov	435045751d	Delete objects only after a SYNC during rebalance in the non-immediate_commit mode. Previously OSDs could commit deletes before writes during recovery or rebalance in the "lazy fsync" (immediate_commit=off) mode which could result in lost objects	2021-03-16 12:48:26 +03:00
Vitaliy Filippov	c5fb1d5987	Do not duplicate blockstore operations when io_uring fills up. This bug was leading to OSDs dying with "Assertion `fulfilled == read_op->len' failed" when testing fio -rw=randread -numjobs=8 -iodepth=128	2021-03-16 12:48:26 +03:00
Vitaliy Filippov	9ac7e75178	Allow to specify etcd URLs for OSDs with http://, do not die with a strange error if -etcd option is missing for fio	2021-03-16 12:48:26 +03:00
Vitaliy Filippov	88671cf745	Fix a bug causing all flushers to wait for an fsync without actually trying to do it. This happened because flusher_count became dynamic and fsync_batch() was comparing the number of flushers currently ready to do an fsync with the maximum number of flushers. Also the number wasn't rechecked on every loop which was also incorrect. Now the interrupted_rebalance test passes even without IMMEDIATE_COMMIT=1.	2021-03-13 17:27:29 +03:00
Vitaliy Filippov	ceb9c28de7	Set default log_level before passing config to etcd_state_client	2021-03-13 17:19:45 +03:00
Vitaliy Filippov	299d7d7c95	Use common macro for get_sqe	2021-03-13 17:19:45 +03:00
Vitaliy Filippov	d1526b415f	Correctly resume writes when OSD is full to return an error	2021-03-13 17:19:45 +03:00
Vitaliy Filippov	f49fd53d55	Fix a bug where allocator was unable to allocate up to last (n%64) blocks, add tests for it	2021-03-13 02:19:02 +03:00
Vitaliy Filippov	b44f49aab2	Ignore zero OSDs in history osd_sets	2021-03-12 12:40:15 +03:00
Vitaliy Filippov	af5155fcd9	Implement "no_recovery" and "no_rebalance" flags	2021-03-11 00:36:31 +03:00
Vitaliy Filippov	c4ba24c305	Do not print ping op latency	2021-03-10 02:01:44 +03:00
Vitaliy Filippov	bd178ac20f	Fix history osd_set check - local OSD is always available!	2021-03-09 02:18:18 +03:00
Vitaliy Filippov	ad577c4aac	Add PING operation and timeouts to detect OSD failures when a host goes down	2021-03-09 02:15:38 +03:00
Vitaliy Filippov	e91ff2a9ec	Only forget offline PGs if their state is not changed during reporting	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	086667f568	Do not check PG state key ownership if it doesn't exist yet. This fixes the bug where OSDs were sometimes trying to report updated PG states infinitely without luck when PGs transitioned from 'starting' to 'peering' too fast	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	1be94da437	Check & remove extra chunks for degraded / incomplete objects, too	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	80e12358a2	Use pg_data_size instead of pg_minsize for object state calculation	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	36c935ace6	Use std::vector for the blockstore submission queue	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	0d8b5e2ef9	Remove unused enqueue_op_first()	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	98f1e2c277	Rework write/sync ordering. Make syncs wait for all previous writes because it's the only way to make sure that OSDs do not receive incomplete writes in LIST results during peering when some writes are still in progress. Also simplify blockstore submission queue logic.	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	21e7686037	Fix possible "assertion failed: pg.inflight >= 0" error during PG stop	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	ab21a1908b	Check for the dirty PG flag when trying to continue to stop it after sync	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	30d1ccd43e	Fix an infinite loop when discarding list operations during stop_pg()	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	8bdd6d8d78	Reset PG state when stopping them	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	09b3e4e789	Fix OSDs being unable to stop PGs that are 'peering', not 'active'. This was sometimes leading to incorrect misplaced and degraded object count statistics	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	bc742ccf8c	Fix a small memory leak in etcd_state_client	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	314b20437b	Do not break subsequent small writes badly when a big write is canceled	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	29d8ac8b1b	Do not report statistics for the empty operation	2021-03-01 16:20:57 +03:00
Vitaliy Filippov	6155b23a7e	Replace pgs[id] with pgs.at(id) to prevent accidental auto-vivification	2021-02-28 19:36:59 +03:00
Vitaliy Filippov	46e79f3306	Wait for PGs to become clean before stopping them	2021-02-28 19:36:59 +03:00
Vitaliy Filippov	41fd14e024	Fix deletes not increasing write_iodepth	2021-02-28 19:36:59 +03:00
Vitaliy Filippov	2d73b19a6c	Fix online PG count change bugs	2021-02-25 23:59:33 +03:00
Vitaliy Filippov	c974cb539c	Make flusher_count adaptive and limit write iodepth	2021-02-25 23:59:33 +03:00
Vitaliy Filippov	bf9a175efc	Move C/C++ sources to src subdirectory	2021-02-25 23:59:03 +03:00

477 Commits (7b358016472eb4278cb2363d17006ce216a2b4e7)