vitastor

Commit Graph

Author	SHA1	Message	Date
Vitaliy Filippov	88671cf745	Fix a bug causing all flushers to wait for an fsync without actually trying to do it. This happened because flusher_count became dynamic and fsync_batch() was comparing the number of flushers currently ready to do an fsync with the maximum number of flushers. Also, the number wasn't rechecked on every loop, which was also incorrect. Now the interrupted_rebalance test passes even without IMMEDIATE_COMMIT=1.	2021-03-13 17:27:29 +03:00
Vitaliy Filippov	fe1749c427	Fix the multiple_interrupted_rebalance test	2021-03-13 17:19:45 +03:00
Vitaliy Filippov	ceb9c28de7	Set default log_level before passing config to etcd_state_client	2021-03-13 17:19:45 +03:00
Vitaliy Filippov	299d7d7c95	Use common macro for get_sqe	2021-03-13 17:19:45 +03:00
Vitaliy Filippov	d1526b415f	Correctly resume writes when OSD is full to return an error	2021-03-13 17:19:45 +03:00
Vitaliy Filippov	f49fd53d55	Fix a bug where allocator was unable to allocate up to last (n%64) blocks, add tests for it	2021-03-13 02:19:02 +03:00
Vitaliy Filippov	dd76eda5e5	Test multiple interrupted rebalancings. Currently only passes with the immediate_commit=all configuration (env variable IMMEDIATE_COMMIT=1 for the bash script)	2021-03-12 12:55:44 +03:00
Vitaliy Filippov	87dbd8fa57	Use empty hash as the default value for some etcd keys in the monitor	2021-03-12 12:40:15 +03:00
Vitaliy Filippov	b44f49aab2	Ignore zero OSDs in history osd_sets	2021-03-12 12:40:15 +03:00
Vitaliy Filippov	036555638e	Release 0.5.9 - Fix two monitor bugs which led to objects being "logically lost" (physically present on some secondary OSDs while primary doesn't know about it) after multiple interrupted rebalancings - Implement "no_recovery" and "no_rebalance" flags	2021-03-11 00:39:10 +03:00
Vitaliy Filippov	af5155fcd9	Implement "no_recovery" and "no_rebalance" flags	2021-03-11 00:36:31 +03:00
Vitaliy Filippov	0d2efbecc9	Preserve previous PG history when changing PG distribution. Fixes incorrect PG history when a new rebalance is started before the previous one finishes, which could make primary OSDs unable to locate some objects on some secondaries.	2021-03-11 00:16:10 +03:00
Vitaliy Filippov	e62e8b6bae	Use real pg configuration instead of the "last clean" one for generating PG history. Basically fixes the bug introduced in 0.5.7 where a rebalance interrupted by the monitor could result in forgetting objects moved to the new place	2021-03-10 02:01:44 +03:00
Vitaliy Filippov	c4ba24c305	Do not print ping op latency	2021-03-10 02:01:44 +03:00
Vitaliy Filippov	19e47a0279	Release 0.5.8 - Add heartbeats (fixes failover in case of network issues or offline nodes) - Fix a bug where a PG could incorrectly become listed as 'incomplete' if historical osd_sets included a set with the PG's primary OSD as the only alive one - Use osd_out_time = 10 minutes by default instead of 30 minutes - Make monitors stick to a single selected etcd URL on start and not try to select random ones on every request - this was leading to etcd interaction errors when some etcds were unavailable	2021-03-09 02:38:17 +03:00
Vitaliy Filippov	bd178ac20f	Fix history osd_set check - local OSD is always available!	2021-03-09 02:18:18 +03:00
Vitaliy Filippov	7006875a24	Make monitor stick to one etcd until the restart	2021-03-09 02:15:38 +03:00
Vitaliy Filippov	ad577c4aac	Add PING operation and timeouts to detect OSD failures when a host goes down	2021-03-09 02:15:38 +03:00
Vitaliy Filippov	836635c518	Use osd_out_time = 10 minutes by default	2021-03-09 02:15:38 +03:00
Vitaliy Filippov	88a03f4e98	Release 0.5.7 - Fix multiple bugs leading to OSDs sometimes being unable to correctly activate PGs when a lot of PG peering events occurred in a small amount of time - Fix a bug where OSDs could list incomplete object versions during peering. The bug manifested with "local rollback operation failed" messages in OSD logs - Fix a bug where misplaced chunks for degraded and incomplete objects were not removed from extra OSDs during recovery - Fix incorrect PG history configuration resulting in OSDs being unable to find some of the objects after a PG count change - Simplify block layer write ordering logic - Avoid extra data move when a lot of OSDs are first stopped for long time and then restarted - Fix incorrect degraded & misplaced object statistics after a completed rebalance - Fix incorrect usage of pg_minsize instead of the minimal possible object chunk count in EC pools	2021-03-08 23:37:02 +03:00
Vitaliy Filippov	2a5036669d	Fix PG count change procedure. In previous versions, PG histories were calculated incorrectly during PG count change, which led to objects being lost on OSDs not in the PG's osd set.	2021-03-08 23:15:58 +03:00
Vitaliy Filippov	2e0c853180	Make test_change_pg_count check if any objects are lost during the test	2021-03-08 23:15:07 +03:00
Vitaliy Filippov	e91ff2a9ec	Only forget offline PGs if their state is not changed during reporting	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	086667f568	Do not check PG state key ownership if it doesn't exist yet. This fixes the bug where OSDs sometimes kept trying, endlessly and unsuccessfully, to report updated PG states when PGs transitioned from 'starting' to 'peering' too fast	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	73ce20e246	Add a test for the "reappear after move" case	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	1be94da437	Check & remove extra chunks for degraded / incomplete objects, too	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	80e12358a2	Use pg_data_size instead of pg_minsize for object state calculation	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	36c935ace6	Use std::vector for the blockstore submission queue	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	0d8b5e2ef9	Remove unused enqueue_op_first()	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	98f1e2c277	Rework write/sync ordering. Make syncs wait for all previous writes because it's the only way to make sure that OSDs do not receive incomplete writes in LIST results during peering when some writes are still in progress. Also simplify blockstore submission queue logic.	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	21e7686037	Fix possible "assertion failed: pg.inflight >= 0" error during PG stop	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	ab21a1908b	Check for the dirty PG flag when trying to continue to stop it after sync	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	30d1ccd43e	Fix an infinite loop when discarding list operations during stop_pg()	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	8bdd6d8d78	Reset PG state when stopping them	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	09b3e4e789	Fix OSDs being unable to stop PGs that are 'peering', not 'active'. This was sometimes leading to incorrect misplaced and degraded object count statistics	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	07912fd670	Use history/last_clean_pgs to avoid extra data move when observing a series of changes in the cluster	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	bc742ccf8c	Fix a small memory leak in etcd_state_client	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	314b20437b	Do not break subsequent small writes badly when a big write is canceled	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	29bac892ad	Add .gitignore	2021-03-08 17:04:10 +03:00
Vitaliy Filippov	cf7547faf3	Fix *.sh build scripts	2021-03-02 02:17:11 +03:00
Vitaliy Filippov	ab90ed747f	Release 0.5.6 - Fix operation statistics - Fix a rebalance hang introduced in 0.5.5 - Test PG count changes with actual data moving - Fix a possible 'unexpected pg state: 0' error during PG count change	2021-03-01 16:26:04 +03:00
Vitaliy Filippov	29d8ac8b1b	Do not report statistics for the empty operation	2021-03-01 16:20:57 +03:00
Vitaliy Filippov	97795ea1b1	Use pg_minsize=2 in the pg_count change test. Also don't check for has_degraded, because it's not a bug that objects are _temporarily_ listed as degraded during PG peering: the new primary is not required to connect to _all_ older peers to start peering. The test may be improved in the future by temporarily disabling degraded recovery during it and restoring the has_degraded check.	2021-03-01 16:18:08 +03:00
Vitaliy Filippov	24e7075f08	Fix monitor's statistics aggregation	2021-02-28 19:51:16 +03:00
Vitaliy Filippov	6155b23a7e	Replace pgs[id] with pgs.at(id) to prevent accidental auto-vivification	2021-02-28 19:36:59 +03:00
Vitaliy Filippov	7d49706c07	Improve the pg_count change test: add more OSDs and actually move data between them	2021-02-28 19:36:59 +03:00
Vitaliy Filippov	46e79f3306	Wait for PGs to become clean before stopping them	2021-02-28 19:36:59 +03:00
Vitaliy Filippov	41fd14e024	Fix deletes not increasing write_iodepth	2021-02-28 19:36:59 +03:00
Vitaliy Filippov	bb2d9a3afe	Release 0.5.5 - Transition to CMake build system - Fix Monitor being unable to change PG sizes - Fix PG optimizer not using some OSDs in some cases - Fix inability to change PG count online - Improve journal flusher performance - Add a little better systemd unit generator - Use w=8 with jerasure (breaking change for EC pools)	2021-02-26 01:59:18 +03:00
Vitaliy Filippov	e899ed2c25	Make OSDs with 256 flushers (as they are now dynamic)	2021-02-26 01:59:18 +03:00

615 Commits (88671cf745e631ff303cafd3d5c6da44b01ed920)