vitastor

Commit Graph

Author	SHA1	Message	Date
Vitaliy Filippov	0949f08407	Extract osd_primary write and sync code into separate files	2021-03-24 14:20:56 +03:00
Vitaliy Filippov	04a1f18fa5	Assign .req as a whole to always zero out the remaining part. Also clear .reply before processing the operation	2021-03-24 14:20:56 +03:00
Vitaliy Filippov	cf9a641d66	Skip disconnected OSDs during sync	2021-03-24 14:20:56 +03:00
Vitaliy Filippov	05db1308aa	Fix two potential read/write ordering problems (even though not yet seen in tests) - Write operations could be 'stabilized' and previous versions could be purged from OSDs before the removal of version_override and following reads could potentially hit a different version in EC pools - Object was marked clean after completing the delete during recovery, so reads could in theory hit a deleted version and return nothing	2021-03-24 14:20:56 +03:00
Vitaliy Filippov	98b54ca948	Don't try to "recover" misplaced objects if it would make them degraded	2021-03-21 01:37:23 +03:00
Vitaliy Filippov	23225c5e62	Do not run ping on clients that are not yet connected	2021-03-21 01:37:23 +03:00
Vitaliy Filippov	7e6e1a5a82	Release 0.5.10 The version seems to be stable after this bunch of fixes :) - Fix delete & write operation ordering during rebalance to not lose objects in the immediate_commit=off mode - Fix a possible crash caused by very high iodepths - Re-distribute PG primaries over OSDs that come up after a short downtime - Allow to specify etcd URLs for OSDs with http://, do not die with a strange error if -etcd option is missing for fio - Fix a journal flushing deadlock which sometimes occurred in the immediate_commit=off mode - Fix a bug where OSDs could hang if the data device filled up - Fix an allocator bug where it was unable to allocate up to last (n%64) data device blocks - Fix monitor crash that occurred on removal of some etcd keys - Fix a bug where PGs could remain incomplete due to incorrect PG history with just zeroes in osd_sets	2021-03-16 12:48:26 +03:00
Vitaliy Filippov	435045751d	Delete objects only after a SYNC during rebalance in the non-immediate_commit mode Previously OSDs could commit deletes before writes during recovery or rebalance in the "lazy fsync" (immediate_commit=off) mode which could result in lost objects	2021-03-16 12:48:26 +03:00
Vitaliy Filippov	c5fb1d5987	Do not duplicate blockstore operations when io_uring fills up This bug was leading to OSDs dying with "Assertion `fulfilled == read_op->len' failed" when testing fio -rw=randread -numjobs=8 -iodepth=128	2021-03-16 12:48:26 +03:00
Vitaliy Filippov	9f59381bea	Re-distribute PG primaries over OSDs that come up after a short downtime	2021-03-16 12:48:26 +03:00
Vitaliy Filippov	9ac7e75178	Allow to specify etcd URLs for OSDs with http://, do not die with a strange error if -etcd option is missing for fio	2021-03-16 12:48:26 +03:00
Vitaliy Filippov	88671cf745	Fix a bug causing all flushers to wait for an fsync without actually trying to do it This happened because flusher_count became dynamic and fsync_batch() was comparing the number of flushers currently ready to do an fsync with the maximum number of flushers. Also the number wasn't rechecked on every loop which was also incorrect. Now the interrupted_rebalance test passes even without IMMEDIATE_COMMIT=1.	2021-03-13 17:27:29 +03:00
Vitaliy Filippov	fe1749c427	Fix the multiple_interrupted_rebalance test	2021-03-13 17:19:45 +03:00
Vitaliy Filippov	ceb9c28de7	Set default log_level before passing config to etcd_state_client	2021-03-13 17:19:45 +03:00
Vitaliy Filippov	299d7d7c95	Use common macro for get_sqe	2021-03-13 17:19:45 +03:00
Vitaliy Filippov	d1526b415f	Correctly resume writes when OSD is full to return an error	2021-03-13 17:19:45 +03:00
Vitaliy Filippov	f49fd53d55	Fix a bug where allocator was unable to allocate up to last (n%64) blocks, add tests for it	2021-03-13 02:19:02 +03:00
Vitaliy Filippov	dd76eda5e5	Test multiple interrupted rebalancings Currently only passes with immediate_commit=all configuration (env variable IMMEDIATE_COMMIT=1 for the bash script)	2021-03-12 12:55:44 +03:00
Vitaliy Filippov	87dbd8fa57	Use empty hash as the default value for some etcd keys in the monitor	2021-03-12 12:40:15 +03:00
Vitaliy Filippov	b44f49aab2	Ignore zero OSDs in history osd_sets	2021-03-12 12:40:15 +03:00
Vitaliy Filippov	036555638e	Release 0.5.9 - Fix two monitor bugs which led to objects being "logically lost" (physically present on some secondary OSDs while primary doesn't know about it) after multiple interrupted rebalancings - Implement "no_recovery" and "no_rebalance" flags	2021-03-11 00:39:10 +03:00
Vitaliy Filippov	af5155fcd9	Implement "no_recovery" and "no_rebalance" flags	2021-03-11 00:36:31 +03:00
Vitaliy Filippov	0d2efbecc9	Preserve previous PG history when changing PG distribution Fixes incorrect PG history in case when a new rebalance is started before the finish of the previous one which could make primary OSDs unable to locate some objects on some secondaries.	2021-03-11 00:16:10 +03:00
Vitaliy Filippov	e62e8b6bae	Use real pg configuration instead of the "last clean" one for generating PG history Basically fixes the bug introduced in 0.5.7 where a rebalance interrupted by the monitor could result in forgetting objects moved to the new place	2021-03-10 02:01:44 +03:00
Vitaliy Filippov	c4ba24c305	Do not print ping op latency	2021-03-10 02:01:44 +03:00
Vitaliy Filippov	19e47a0279	Release 0.5.8 - Add heartbeats (fixes failover in case of network issues or offline nodes) - Fix a bug where a PG could incorrectly become listed as 'incomplete' if historical osd_sets included a set with the PG's primary OSD as the only alive one - Use osd_out_time = 10 minutes by default instead of 30 minutes - Make monitors stick to a single selected etcd URL on start and not try to select random ones on every request - this was leading to etcd interaction errors when some etcds were unavailable	2021-03-09 02:38:17 +03:00
Vitaliy Filippov	bd178ac20f	Fix history osd_set check - local OSD is always available!	2021-03-09 02:18:18 +03:00
Vitaliy Filippov	7006875a24	Make monitor stick to one etcd until the restart	2021-03-09 02:15:38 +03:00
Vitaliy Filippov	ad577c4aac	Add PING operation and timeouts to detect OSD failures when a host goes down	2021-03-09 02:15:38 +03:00
Vitaliy Filippov	836635c518	Use osd_out_time = 10 minutes by default	2021-03-09 02:15:38 +03:00
Vitaliy Filippov	88a03f4e98	Release 0.5.7 - Fix multiple bugs leading to OSDs sometimes being unable to correctly activate PGs when a lot of PG peering events occurred in a small amount of time - Fix a bug where OSDs could list incomplete object versions during peering. The bug manifested with "local rollback operation failed" messages in OSD logs - Fix a bug where misplaced chunks for degraded and incomplete objects were not removed from extra OSDs during recovery - Fix incorrect PG history configuration resulting in OSDs being unable to find some of the objects after a PG count change - Simplify block layer write ordering logic - Avoid extra data move when a lot of OSDs are first stopped for a long time and then restarted - Fix incorrect degraded & misplaced object statistics after a completed rebalance - Fix incorrect usage of pg_minsize instead of the minimal possible object chunk count in EC pools	2021-03-08 23:37:02 +03:00
Vitaliy Filippov	2a5036669d	Fix PG count change procedure In previous versions PG histories were calculated incorrectly during PG count change which led to objects being lost on OSDs not in PG's osd set.	2021-03-08 23:15:58 +03:00
Vitaliy Filippov	2e0c853180	Make test_change_pg_count check if any objects are lost during the test	2021-03-08 23:15:07 +03:00
Vitaliy Filippov	e91ff2a9ec	Only forget offline PGs if their state is not changed during reporting	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	086667f568	Do not check PG state key ownership if it doesn't exist yet This fixes the bug where OSDs were sometimes trying to report updated PG states infinitely without luck when PGs transitioned from 'starting' to 'peering' too fast	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	73ce20e246	Add a test for the "reappear after move" case	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	1be94da437	Check & remove extra chunks for degraded / incomplete objects, too	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	80e12358a2	Use pg_data_size instead of pg_minsize for object state calculation	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	36c935ace6	Use std::vector for the blockstore submission queue	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	0d8b5e2ef9	Remove unused enqueue_op_first()	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	98f1e2c277	Rework write/sync ordering Make syncs wait for all previous writes because it's the only way to make sure that OSDs do not receive incomplete writes in LIST results during peering when some writes are still in progress. Also simplify blockstore submission queue logic.	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	21e7686037	Fix possible "assertion failed: pg.inflight >= 0" error during PG stop	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	ab21a1908b	Check for the dirty PG flag when trying to continue to stop it after sync	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	30d1ccd43e	Fix an infinite loop when discarding list operations during stop_pg()	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	8bdd6d8d78	Reset PG state when stopping them	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	09b3e4e789	Fix OSDs being unable to stop PGs that are 'peering', not 'active' This was sometimes leading to incorrect misplaced and degraded object count statistics	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	07912fd670	Use history/last_clean_pgs to avoid extra data move when observing a series of changes in the cluster	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	bc742ccf8c	Fix a small memory leak in etcd_state_client	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	314b20437b	Do not break subsequent small writes badly when a big write is canceled	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	29bac892ad	Add .gitignore	2021-03-08 17:04:10 +03:00

626 Commits (0949f0840707f16b6ea1935bf294155a0e27a6bf)
