Commit Graph

1217 Commits (rdma-v2)

Author SHA1 Message Date
Vitaliy Filippov b4b6407716 WIP Implement RDMA v2 based on IBV_WR_RDMA_WRITE with remote buffer management
One BIG FIXME remaining - handling large operations :))
3 months ago
Vitaliy Filippov 8139a34e97 Fix json11: allow trailing comma 4 months ago
Vitaliy Filippov 4ab630b44d Use just sfdisk --json, --dump is not needed 4 months ago
Vitaliy Filippov 2c8241b7db Remove PG "peered" state 4 months ago
Vitaliy Filippov 36a7dd3671 Move tests to "make test" 4 months ago
Vitaliy Filippov 936122bbcf Initialize msgr lazily in client to speed up vitastor-cli with RDMA enabled 4 months ago
Vitaliy Filippov 1a1ba0d1e7 Add set_immediate to ringloop and use it for bs/osd ops to prevent reentrancy issues 4 months ago
Vitaliy Filippov 3d09c9cec7 Remove unused wait_sqe() from ringloop 4 months ago
Vitaliy Filippov 3d08a1ad6c Fix cluster_client test after last reentrancy fixes 4 months ago
Vitaliy Filippov 499881d81c Fix typo 4 months ago
Vitaliy Filippov aba93b951b Fix incorrect EC free space statistics in vitastor-cli df output 4 months ago
Vitaliy Filippov d125fb1f30 Release 0.8.5
- Fix a possible "double free" bug in the client library happening on OSD restart
- Fix a possible write hang on PG history update when only epoch is changed
- Fix incorrect systemd target "local.target" in mon/make-etcd
- Allow "content" option in PVE storage plugin to make it possible to enable containers
- Build client library without tcmalloc which fixes "attempt to free invalid pointer"
  errors when, for example, trying to run QEMU with both Vitastor and Ceph RBD disks
4 months ago
Vitaliy Filippov 9d3fd72298 Require liburing < 2 in rpm specs 4 months ago
Vitaliy Filippov 8b552a01f9 Do not retry successful operation parts in client (could lead to "double free" bugs) 4 months ago
Vitaliy Filippov 0385b2f9e8 Fix write hangs on PG epoch update - always set pg.history_changed to true 4 months ago
Vitaliy Filippov 749c837045 Replace non-existing local.target with multi-user.target 4 months ago
Vitaliy Filippov 98001d845b Remove version from vitastor-release.rpm links 5 months ago
Vitaliy Filippov c96bcae74b Allow "content" option in PVE storage plugin to make it possible to enable containers 5 months ago
Vitaliy Filippov 9f4e34a8cc Build client library without tcmalloc
Fixes "[src/tcmalloc.cc:332] Attempt to free invalid pointer ..." when trying
to run QEMU with both Vitastor and Ceph RBD disks and other possible allocator
collisions.
5 months ago
Vitaliy Filippov 81fc8bb94c Release 0.8.4
New features:
- Implement QCOW2 image/snapshot export via qemu-img (bdrv_co_block_status in the driver)
- Remove OSDs from PG history during `vitastor-cli rm-osd` to prevent `left_on_dead` PG states after deletion
- Add a new recovery_pg_switch setting to mix all PGs during recovery, which nearly
  eliminates the probability of ENOSPC during rebalance
- Introduce partial ENOSPC ("OSD is full") handling - now ENOSPC doesn't turn
  into cascades of crashes
- Add migration support to Proxmox VE Vitastor driver
- Track last_clean_pgs on a per-pool basis, thus reducing data movement in a cluster
  where some pools remain unclean/degraded for a long time

Bug fixes:
- Fix a bug where monitor could generate degraded PGs if one of the hosts had no OSDs
- Fix a bug where monitor could skip PG redistribution with a lot of OSDs in cluster
- Report PG history synchronously on the first write, which improves PG consistency
  and availability at the same time, because history is now reported correctly
  and only when it is actually needed
- Fix possible write and recovery stalls which could happen in a cluster with both EC and replicated pools
- Make OSD and monitors sanitize & deduplicate PG history items in etcd
- Fix non-working OSD peer config safety check
- Fix a rare journal flush stall where flushing wasn't activated when the journal was full but the flush queue was empty
- Fix builds without ISA-L (jerasure-only) crashing with EC N+K, K>=2 due to the lack of 16-byte buffer alignment
- Fix a possible crash for EC N+K, K>=2 when calculating a parity chunk with previous parity chunk missing
- Fix a bug where vitastor-disk purge with suppressed warnings didn't work
5 months ago
Vitaliy Filippov bc465c16de Fix arithmetic on void* for clang 5 months ago
Vitaliy Filippov 8763e9211c Fix qemu driver compilation warning/error 5 months ago
Vitaliy Filippov 9e1a80bd17 Replace apt-key with trusted.gpg.d 5 months ago
Vitaliy Filippov 3e280f2f08 Mark vitastor as shared storage in PVE driver 5 months ago
Vitaliy Filippov fe87b4076b Fix backwards compatibility in cluster_client 5 months ago
Vitaliy Filippov a38957c1a7 Skip empty hosts in lp-optimizer 5 months ago
Vitaliy Filippov 137309cf29 Implement bdrv_co_block_status for snapshot export support 5 months ago
Vitaliy Filippov 373f9d0387 Try to re-peer PGs on history change 5 months ago
Vitaliy Filippov c4516ea971 Also remove deleted OSD from PG configuration and last_clean_pgs 5 months ago
Vitaliy Filippov 91065c80fc Try to prevent left_on_dead when deleting OSDs by removing them from PG history 5 months ago
Vitaliy Filippov 0f6b946add Time changes with every stat change, do not schedule checks based on it 5 months ago
Vitaliy Filippov 465cbf0b2f Do not re-schedule recheck indefinitely, run it after mon_change_timeout in any case 5 months ago
Vitaliy Filippov 41add50e4e Track last_clean_pgs on a per-pool basis 5 months ago
Vitaliy Filippov 02e7be7dc9 Prevent reentrancy side effects during PG history operation resume 5 months ago
Vitaliy Filippov 73940adf07 Prioritize EC (non-instantly-stable) operations under journal pressure
This reduces the probability of hitting OSD stalls with EC due to "deadlocks"
where two parallel write operations wait for each other to complete
5 months ago
Vitaliy Filippov e950c024d3 Do not sync peer OSDs before listing
Sync before listing was added to wait for all PG writes possibly left in queue
from the previous master to finish before listing it

But in fact it may block the cluster when EC is used and some unstable writes
are left in the queue - they block journal flushing, rollback/stabilize is
required to unblock them, but rollback/stabilize may only happen after PG is
peered. But peering needs listings, listings are requested only after sync, and
sync itself waits for currently blocked writes waiting in the queue
5 months ago
Vitaliy Filippov 71d6d9f868 Fix possible crash on ENOSPC during operation cancel in blockstore 5 months ago
Vitaliy Filippov a4dfa519af Report PG history synchronously during write
This has 2 effects:
1) OSD sets aren't added into PG history until actual write attempts anymore
   which removes unneeded extra osd_sets in PG history
2) New OSD sets are reported synchronously and can't be lost on PG restarts
   happening at the same time as reconfiguration
5 months ago
Vitaliy Filippov 37a6aff2fa Write OSD numbers always as numbers in mon 5 months ago
Vitaliy Filippov 67019f5b02 Make OSD sort & sanitize PG history items 5 months ago
Vitaliy Filippov 0593e5c21c Fix OSD peer config safety check 5 months ago
Vitaliy Filippov 998e24adf8 Add a new recovery_pg_switch setting to mix all PGs during recovery 5 months ago
Vitaliy Filippov d7bd36dc32 Fix another rare journal flush stall 5 months ago
Vitaliy Filippov cf5c562800 Log all object locations when peering PGs 5 months ago
Vitaliy Filippov 629200b0cc Return ENOSPC as the primary OSD 5 months ago
Vitaliy Filippov 3589ccec22 Do not disconnect peer on ENOSPC during write 5 months ago
Vitaliy Filippov 8d55a1e780 Build osd_rmw_test both with and without ISA-L 5 months ago
Vitaliy Filippov 65f6b3a4eb Fix jerasure crashing on bitmap calculation/restoration due to the lack of 16-byte alignment 5 months ago
Vitaliy Filippov fd216eac77 Add a test for missing parity chunk calculation 5 months ago
Vitaliy Filippov 61fca7c426 Fix crash when calculating a parity chunk with previous parity chunk missing (test coming shortly) 5 months ago