Supposed fix for "unexpected state during flush: 0x51" with EC

Vitaliy Filippov 2024-02-20 20:42:49 +03:00
parent 3d16cde23c
commit a4cb915448
1 changed file with 32 additions and 22 deletions

@@ -307,35 +307,45 @@ int blockstore_impl_t::dequeue_stable(blockstore_op_t *op)
return STAB_SPLIT_DONE;
}
}
else if (IS_IN_FLIGHT(dirty_it->second.state))
{
// Object write is still in progress. Wait until the write request completes
return STAB_SPLIT_WAIT;
}
else if (!IS_SYNCED(dirty_it->second.state))
{
// Object not synced yet - sync it
// In previous versions we returned EBUSY here and required
// the caller (OSD) to issue a global sync first. But a global sync
// waits for all writes in the queue including inflight writes. And
// inflight writes may themselves be blocked by unstable writes being
// still present in the journal and not flushed away from it.
// So we must sync specific objects here.
//
// Even more, we have to process "stabilize" request in parts. That is,
// we must stabilize all objects which are already synced. Otherwise
// they may block objects which are NOT synced yet.
return STAB_SPLIT_SYNC;
}
else if (IS_STABLE(dirty_it->second.state))
{
// Already stable
return STAB_SPLIT_DONE;
}
else
while (true)
{
return STAB_SPLIT_TODO;
if (IS_IN_FLIGHT(dirty_it->second.state))
{
// Object write is still in progress. Wait until the write request completes
return STAB_SPLIT_WAIT;
}
else if (!IS_SYNCED(dirty_it->second.state))
{
// Object not synced yet - sync it
// In previous versions we returned EBUSY here and required
// the caller (OSD) to issue a global sync first. But a global sync
// waits for all writes in the queue including inflight writes. And
// inflight writes may themselves be blocked by unstable writes being
// still present in the journal and not flushed away from it.
// So we must sync specific objects here.
//
// Even more, we have to process "stabilize" request in parts. That is,
// we must stabilize all objects which are already synced. Otherwise
// they may block objects which are NOT synced yet.
return STAB_SPLIT_SYNC;
}
// Check previous versions too
if (dirty_it == dirty_db.begin())
{
break;
}
dirty_it--;
if (dirty_it->first.oid != ov.oid)
{
break;
}
}
return STAB_SPLIT_TODO;
});
if (r != 1)
{
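
The "Check previous versions too" loop in this hunk can be read in isolation as a backward walk over all dirty versions of one object. Below is a minimal, self-contained sketch of that walk. It is not the actual blockstore code: `ObjVer`, `DirtyEntry`, `DirtyDb`, `SplitResult` and `check_versions` are hypothetical stand-ins, the two boolean flags replace the `IS_IN_FLIGHT()` and `IS_SYNCED()` state macros, and the sketch assumes that the dirty map is ordered by object ID first and version second, so that older versions of the same object sit directly before the starting entry.

```cpp
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical stand-in for the (oid, version) key of the dirty map
struct ObjVer
{
    uint64_t oid;
    uint64_t version;
    bool operator < (const ObjVer & other) const
    {
        // Assumption: entries are ordered by object first, then by version
        return std::tie(oid, version) < std::tie(other.oid, other.version);
    }
};

// Hypothetical stand-in for a dirty entry; the real code tests state bits
struct DirtyEntry
{
    bool in_flight; // stands in for IS_IN_FLIGHT(state)
    bool synced;    // stands in for IS_SYNCED(state)
};

enum SplitResult { SPLIT_WAIT, SPLIT_SYNC, SPLIT_TODO };

using DirtyDb = std::map<ObjVer, DirtyEntry>;

// Walk from `it` back through older versions of the same object.
// `it` must point to a valid entry of `db`.
SplitResult check_versions(const DirtyDb & db, DirtyDb::const_iterator it)
{
    const uint64_t oid = it->first.oid;
    while (true)
    {
        if (it->second.in_flight)
        {
            // Some version is still being written: wait for it
            return SPLIT_WAIT;
        }
        else if (!it->second.synced)
        {
            // Some version is written but not synced: sync it first
            return SPLIT_SYNC;
        }
        // Check previous versions too
        if (it == db.begin())
        {
            break;
        }
        it--;
        if (it->first.oid != oid)
        {
            break;
        }
    }
    // Every inspected version is synced, so it can be stabilized
    return SPLIT_TODO;
}
```

In this sketch an older unsynced or in-flight version of the same object changes the result even when the requested version itself is already synced, which is the case the previous `else if` chain did not examine.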