Commit Graph

52 Commits (master)

Author SHA1 Message Date
Vitaliy Filippov 6875a838e0 Capture all by value in qemu_proxy 2021-03-16 12:48:36 +03:00
Vitaliy Filippov ee44f64927 Introduce image names and metadata storage in etcd
Each inode has: image name, parent inode number & pool, size and readonly flag

Snapshots are created by switching image name to a different inode number
while using the older inode as parent.
2021-03-16 12:48:36 +03:00
Vitaliy Filippov abf0611d93 Use clean_entry_bitmap_size instead of entry_attr_size back because of changed bitmap handling 2021-03-16 12:48:36 +03:00
Vitaliy Filippov edbf0eb040 Add a test for snapshots, fix bugs. Now the test passes 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 18f71b059a Fix part bitmap addresses 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 2db2ed22ea Fix several snapshot I/O bugs 2021-03-16 12:48:36 +03:00
Vitaliy Filippov aa7699da24 Fix subop generation for snapshot implementation 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 853ecba780 Actual snapshot support (untested) 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 2f9c76b8fc Report inode I/O statistics, aggregate it in the monitor 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 8da7f26459 Report inode space usage statistics to etcd, aggregate it in the monitor 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 9998b50c7e Add inode space usage statistics tracking to blockstore 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 0422d94a70 Send bitmaps with primary-reads, actually read bitmaps for READ ops 2021-03-16 12:48:36 +03:00
Vitaliy Filippov ff2208ae70 Allocate bitmaps along with stripes to avoid memory fragmentation 2021-03-16 12:48:36 +03:00
Vitaliy Filippov ae54dddb0c Remove cryptic bitmap inlining from bs_op_t and osd_op_t, use bitmap in primary OSD code 2021-03-16 12:48:36 +03:00
Vitaliy Filippov bfc175fe0f Add "external" bitmap support to the secondary OSD protocol 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 07e10210b6 Use bitmap granularity for alignment checks 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 221b728fc9 Add "external" bitmap support to blockstore 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 6625aaae00 Add "external" bitmap support to osd_rmw 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 435045751d Delete objects only after a SYNC during rebalance in the non-immediate_commit mode
Previously OSDs could commit deletes before writes during recovery or rebalance
in the "lazy fsync" (immediate_commit=off) mode which could result in lost objects
2021-03-16 12:48:26 +03:00
Vitaliy Filippov c5fb1d5987 Do not duplicate blockstore operations when io_uring fills up
This bug was leading to OSDs dying with "Assertion `fulfilled == read_op->len' failed"
when testing fio -rw=randread -numjobs=8 -iodepth=128
2021-03-16 12:48:26 +03:00
Vitaliy Filippov 9ac7e75178 Allow to specify etcd URLs for OSDs with http://, do not die with a strange error if -etcd option is missing for fio 2021-03-16 12:48:26 +03:00
Vitaliy Filippov 88671cf745 Fix a bug causing all flushers to wait for an fsync without actually trying to do it
This happened because flusher_count became dynamic and fsync_batch() was comparing the number
of flushers currently ready to do an fsync with the maximum number of flushers. Also the number
wasn't rechecked on every loop which was also incorrect.

Now the interrupted_rebalance test passes even without IMMEDIATE_COMMIT=1.
2021-03-13 17:27:29 +03:00
Vitaliy Filippov ceb9c28de7 Set default log_level before passing config to etcd_state_client 2021-03-13 17:19:45 +03:00
Vitaliy Filippov 299d7d7c95 Use common macro for get_sqe 2021-03-13 17:19:45 +03:00
Vitaliy Filippov d1526b415f Correctly resume writes when OSD is full to return an error 2021-03-13 17:19:45 +03:00
Vitaliy Filippov f49fd53d55 Fix a bug where allocator was unable to allocate up to last (n%64) blocks, add tests for it 2021-03-13 02:19:02 +03:00
Vitaliy Filippov b44f49aab2 Ignore zero OSDs in history osd_sets 2021-03-12 12:40:15 +03:00
Vitaliy Filippov af5155fcd9 Implement "no_recovery" and "no_rebalance" flags 2021-03-11 00:36:31 +03:00
Vitaliy Filippov c4ba24c305 Do not print ping op latency 2021-03-10 02:01:44 +03:00
Vitaliy Filippov bd178ac20f Fix history osd_set check - local OSD is always available! 2021-03-09 02:18:18 +03:00
Vitaliy Filippov ad577c4aac Add PING operation and timeouts to detect OSD failures when a host goes down 2021-03-09 02:15:38 +03:00
Vitaliy Filippov e91ff2a9ec Only forget offline PGs if their state is not changed during reporting 2021-03-08 17:04:10 +03:00
Vitaliy Filippov 086667f568 Do not check PG state key ownership if it doesn't exist yet
This fixes the bug where OSDs were sometimes trying to report updated PG states
infinitely without luck when PGs transitioned from 'starting' to 'peering' too fast
2021-03-08 17:04:10 +03:00
Vitaliy Filippov 1be94da437 Check & remove extra chunks for degraded / incomplete objects, too 2021-03-08 17:04:10 +03:00
Vitaliy Filippov 80e12358a2 Use pg_data_size instead of pg_minsize for object state calculation 2021-03-08 17:04:10 +03:00
Vitaliy Filippov 36c935ace6 Use std::vector for the blockstore submission queue 2021-03-08 17:04:10 +03:00
Vitaliy Filippov 0d8b5e2ef9 Remove unused enqueue_op_first() 2021-03-08 17:04:10 +03:00
Vitaliy Filippov 98f1e2c277 Rework write/sync ordering
Make syncs wait for all previous writes because it's the only way
to make sure that OSDs do not receive incomplete writes in LIST results
during peering when some writes are still in progress.

Also simplify blockstore submission queue logic.
2021-03-08 17:04:10 +03:00
Vitaliy Filippov 21e7686037 Fix possible "assertion failed: pg.inflight >= 0" error during PG stop 2021-03-08 17:04:10 +03:00
Vitaliy Filippov ab21a1908b Check for the dirty PG flag when trying to continue to stop it after sync 2021-03-08 17:04:10 +03:00
Vitaliy Filippov 30d1ccd43e Fix an infinite loop when discarding list operations during stop_pg() 2021-03-08 17:04:10 +03:00
Vitaliy Filippov 8bdd6d8d78 Reset PG state when stopping them 2021-03-08 17:04:10 +03:00
Vitaliy Filippov 09b3e4e789 Fix OSDs being unable to stop PGs that are 'peering', not 'active'
This was sometimes leading to incorrect misplaced and degraded object count statistics
2021-03-08 17:04:10 +03:00
Vitaliy Filippov bc742ccf8c Fix a small memory leak in etcd_state_client 2021-03-08 17:04:10 +03:00
Vitaliy Filippov 314b20437b Do not break subsequent small writes badly when a big write is canceled 2021-03-08 17:04:10 +03:00
Vitaliy Filippov 29d8ac8b1b Do not report statistics for the empty operation 2021-03-01 16:20:57 +03:00
Vitaliy Filippov 6155b23a7e Replace pgs[id] with pgs.at(id) to prevent accidental auto-vivification 2021-02-28 19:36:59 +03:00
Vitaliy Filippov 46e79f3306 Wait for PGs to become clean before stopping them 2021-02-28 19:36:59 +03:00
Vitaliy Filippov 41fd14e024 Fix deletes not increasing write_iodepth 2021-02-28 19:36:59 +03:00
Vitaliy Filippov 2d73b19a6c Fix online PG count change bugs 2021-02-25 23:59:33 +03:00