Commit Graph

5935 Commits (36df75a0a903c8ba3e2e28eb7162c43f8dd5d8f0)

Author SHA1 Message Date
Stefan Hajnoczi 66547f416a block/nvme: invoke blk_io_plug_call() outside q->lock
blk_io_plug_call() is invoked outside a blk_io_plug()/blk_io_unplug()
section while opening the NVMe drive from:

  nvme_file_open() ->
  nvme_init() ->
  nvme_identify() ->
  nvme_admin_cmd_sync() ->
  nvme_submit_command() ->
  blk_io_plug_call()

blk_io_plug_call() immediately invokes the given callback when the
current thread is not plugged, as is the case during nvme_file_open().

Unfortunately, nvme_submit_command() calls blk_io_plug_call() with
q->lock still held:

    ...
    q->sq.tail = (q->sq.tail + 1) % NVME_QUEUE_SIZE;
    q->need_kick++;
    blk_io_plug_call(nvme_unplug_fn, q);
    qemu_mutex_unlock(&q->lock);
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^

nvme_unplug_fn() deadlocks trying to acquire q->lock because the lock is
already acquired by the same thread. The symptom is that QEMU hangs
during startup while opening the NVMe drive.

Fix this by moving the blk_io_plug_call() outside q->lock. This is safe
because no other thread runs code related to this queue and
blk_io_plug_call()'s internal state is immune to thread safety issues
since it is thread-local.

Reported-by: Lukáš Doktor <ldoktor@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Lukas Doktor <ldoktor@redhat.com>
Message-id: 20230712191628.252806-1-stefanha@redhat.com
Fixes: f2e590002b ("block/nvme: convert to blk_io_plug_call() API")
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-17 09:17:41 -04:00
Stefan Hajnoczi c21eae1ccc block/blkio: fix module_block.py parsing
When QEMU is built with --enable-modules, the module_block.py script
parses block/*.c to find block drivers that are built as modules. The
script generates a table of block drivers called block_driver_modules[].
This table is used for block driver module loading.

The blkio.c driver uses macros to define its BlockDriver structs. This
was done to avoid code duplication but the module_block.py script is
unable to parse the macro. The result is that libblkio-based block
drivers can be built as modules but will not be found at runtime.

One fix is to make the module_block.py script or build system fancier so
it can parse C macros (e.g. by parsing the preprocessed source code). I
chose not to do this because it raises the complexity of the build,
making future issues harder to debug.

Keep things simple: use the macro to avoid duplicating BlockDriver
function pointers but define .format_name and .protocol_name manually
for each BlockDriver. This way the module_block.py is able to parse the
code.

Also get rid of the block driver name macros (e.g. DRIVER_IO_URING)
because module_block.py cannot parse them either.

Fixes: fd66dbd424 ("blkio: add libblkio block driver")
Reported-by: Qing Wang <qinwang@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20230704123436.187761-1-stefanha@redhat.com
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-04 17:28:25 +02:00
Richard Henderson 0eb8f90ede Block layer patches
- Re-enable the graph lock
 - More fixes to coroutine_fn marking
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmScQCQRHGt3b2xmQHJl
 ZGhhdC5jb20ACgkQfwmycsiPL9bNSA//WIzPT45rFhl2U9QgyOJu26ho6ahsgwgI
 Z3QM5kCDB1dAN9USRPxhGboLGo8CyY7eeSwSrR7RtwBGYrWrAoJfGp5gK/7d9s5Q
 o0AGgRPnJGhFkBhRRMytsDsewM6Kk4IRmk4HMK3cOH3rsSM8RHs6KmDSBKesllu0
 QVGf3qW4u8LHyZyGM5OlPVUbtuDuK6/52FGhpXBp+x4oyNegOhjwO4mGOvTG+xIk
 Q5zwWZaPfjxaEDkvW8iahB6/D7Tpt64BmMf1Ydhxcd5eKEp932CiBI36aAlNKoRD
 Al5wztRx1GEh12ekN39jIi7Ypp3JX26keJcieKU0q656pT551UFRYjU0Rk08/Cca
 qv2oiQDu6bHgQ9zCQ1nMfa9+K2MyBwx0b5qfYkvs2RzgCTl8ImgBQANHfw8tz6Bq
 HUo1zsFBXCaK0boUB5iFwdf3rlx3t9UTEuDej/RaHqZjZD5xeG/smCcOlSfHaKUa
 wXfYxvm8ZfefJn1D6io1A+7M956uvIQNtmh13cU44clgFX9Y/bBNMg/5lMRsJKo8
 xxjvqCAyxo/pPfUsVWx4pc8AXbfVa85gyoSiaLEYZnqP54sJ2lFccqykCsTy58Lo
 VDcoPnoSc+LNqBOvtzxXgQbEWFCXU6fe0+TZgVYUvExWFIAOImeDWg2GD1JVrwsX
 e9QrPhL3DXg=
 =ZQcP
 -----END PGP SIGNATURE-----

Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging

Block layer patches

- Re-enable the graph lock
- More fixes to coroutine_fn marking

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmScQCQRHGt3b2xmQHJl
# ZGhhdC5jb20ACgkQfwmycsiPL9bNSA//WIzPT45rFhl2U9QgyOJu26ho6ahsgwgI
# Z3QM5kCDB1dAN9USRPxhGboLGo8CyY7eeSwSrR7RtwBGYrWrAoJfGp5gK/7d9s5Q
# o0AGgRPnJGhFkBhRRMytsDsewM6Kk4IRmk4HMK3cOH3rsSM8RHs6KmDSBKesllu0
# QVGf3qW4u8LHyZyGM5OlPVUbtuDuK6/52FGhpXBp+x4oyNegOhjwO4mGOvTG+xIk
# Q5zwWZaPfjxaEDkvW8iahB6/D7Tpt64BmMf1Ydhxcd5eKEp932CiBI36aAlNKoRD
# Al5wztRx1GEh12ekN39jIi7Ypp3JX26keJcieKU0q656pT551UFRYjU0Rk08/Cca
# qv2oiQDu6bHgQ9zCQ1nMfa9+K2MyBwx0b5qfYkvs2RzgCTl8ImgBQANHfw8tz6Bq
# HUo1zsFBXCaK0boUB5iFwdf3rlx3t9UTEuDej/RaHqZjZD5xeG/smCcOlSfHaKUa
# wXfYxvm8ZfefJn1D6io1A+7M956uvIQNtmh13cU44clgFX9Y/bBNMg/5lMRsJKo8
# xxjvqCAyxo/pPfUsVWx4pc8AXbfVa85gyoSiaLEYZnqP54sJ2lFccqykCsTy58Lo
# VDcoPnoSc+LNqBOvtzxXgQbEWFCXU6fe0+TZgVYUvExWFIAOImeDWg2GD1JVrwsX
# e9QrPhL3DXg=
# =ZQcP
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 28 Jun 2023 04:13:56 PM CEST
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]

* tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (23 commits)
  block: use bdrv_co_debug_event in coroutine context
  block: use bdrv_co_getlength in coroutine context
  qcow2: mark more functions as coroutine_fns and GRAPH_RDLOCK
  vhdx: mark more functions as coroutine_fns and GRAPH_RDLOCK
  vmdk: mark more functions as coroutine_fns and GRAPH_RDLOCK
  dmg: mark more functions as coroutine_fns and GRAPH_RDLOCK
  cloop: mark more functions as coroutine_fns and GRAPH_RDLOCK
  block: mark another function as coroutine_fns and GRAPH_UNLOCKED
  bochs: mark more functions as coroutine_fns and GRAPH_RDLOCK
  vpc: mark more functions as coroutine_fns and GRAPH_RDLOCK
  qed: mark more functions as coroutine_fns and GRAPH_RDLOCK
  file-posix: remove incorrect coroutine_fn calls
  Revert "graph-lock: Disable locking for now"
  graph-lock: Unlock the AioContext while polling
  blockjob: Fix AioContext locking in block_job_add_bdrv()
  block: Fix AioContext locking in bdrv_open_backing_file()
  block: Fix AioContext locking in bdrv_open_inherit()
  block: Fix AioContext locking in bdrv_reopen_parse_file_or_backing()
  block: Fix AioContext locking in bdrv_attach_child_common()
  block: Fix AioContext locking in bdrv_open_child()
  ...

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-28 17:29:53 +02:00
Paolo Bonzini 17362398ee block: use bdrv_co_debug_event in coroutine context
bdrv_co_debug_event was recently introduced, with bdrv_debug_event
becoming a wrapper for use in unknown context.  Because most of the
time bdrv_debug_event is used on a BdrvChild via the wrapper macro
BLKDBG_EVENT, introduce a similar macro BLKDBG_CO_EVENT that calls
bdrv_co_debug_event, and switch whenever possible.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-13-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:34 +02:00
Paolo Bonzini 0af02bd107 block: use bdrv_co_getlength in coroutine context
bdrv_co_getlength was recently introduced, with bdrv_getlength becoming
a wrapper for use in unknown context.  Switch to bdrv_co_getlength when
possible.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-12-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:33 +02:00
Paolo Bonzini 70bacc4453 qcow2: mark more functions as coroutine_fns and GRAPH_RDLOCK
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-11-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:32 +02:00
Paolo Bonzini f6b0899493 vhdx: mark more functions as coroutine_fns and GRAPH_RDLOCK
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-10-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:30 +02:00
Paolo Bonzini 28944f99c4 vmdk: mark more functions as coroutine_fns and GRAPH_RDLOCK
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-9-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:29 +02:00
Paolo Bonzini 688dc49da5 dmg: mark more functions as coroutine_fns and GRAPH_RDLOCK
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-8-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:28 +02:00
Paolo Bonzini cf8d4c582b cloop: mark more functions as coroutine_fns and GRAPH_RDLOCK
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-7-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:27 +02:00
Paolo Bonzini e7918e9619 bochs: mark more functions as coroutine_fns and GRAPH_RDLOCK
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-5-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:24 +02:00
Paolo Bonzini 517b5dfffd vpc: mark more functions as coroutine_fns and GRAPH_RDLOCK
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-4-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:23 +02:00
Paolo Bonzini bba667da7f qed: mark more functions as coroutine_fns and GRAPH_RDLOCK
Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-3-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:20 +02:00
Paolo Bonzini 36c6c8773a file-posix: remove incorrect coroutine_fn calls
raw_co_getlength is called by handle_aiocb_write_zeroes, which is not a coroutine
function.  This is harmless because raw_co_getlength does not actually suspend,
but in the interest of clarity make it a non-coroutine_fn that is just wrapped
by the coroutine_fn raw_co_getlength.  Likewise, check_cache_dropped was only
a coroutine_fn because it called raw_co_getlength, so it can be made non-coroutine
as well.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-2-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 09:46:14 +02:00
Kevin Wolf 3cce22defb Revert "graph-lock: Disable locking for now"
Now that bdrv_graph_wrlock() temporarily drops the AioContext lock that
its caller holds, it can poll without causing deadlocks. We can now
re-enable graph locking.

This reverts commit ad128dff0bf4b6f971d05eb4335a627883a19c1d.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230605085711.21261-12-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 08:46:26 +02:00
Kevin Wolf 31b2ddfea3 graph-lock: Unlock the AioContext while polling
If the caller keeps the AioContext lock for a block node in an iothread,
polling in bdrv_graph_wrlock() deadlocks if the condition isn't
fulfilled immediately.

Now that all callers make sure to actually have the AioContext locked
when they call bdrv_replace_child_noperm() like they should, we can
change bdrv_graph_wrlock() to take a BlockDriverState whose AioContext
lock the caller holds (NULL if it doesn't) and unlock it temporarily
while polling.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230605085711.21261-11-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28 08:46:23 +02:00
Manos Pitsidianakis f8ed3648b5 vhost-user: fully use new backend/frontend naming
Slave/master nomenclature was replaced with backend/frontend in commit
1fc19b6527 ("vhost-user: Adopt new backend naming")

This patch replaces all remaining uses of master and slave in the
codebase.

Signed-off-by: Emmanouil Pitsidianakis <manos.pitsidianakis@linaro.org>
Message-Id: <20230613080849.2115347-1-manos.pitsidianakis@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2023-06-26 09:50:00 -04:00
Philippe Mathieu-Daudé de6cd7599b meson: Replace softmmu_ss -> system_ss
We use the user_ss[] array to hold the user emulation sources,
and the softmmu_ss[] array to hold the system emulation ones.
Hold the latter in the 'system_ss[]' array for parity with user
emulation.

Mechanical change doing:

  $ sed -i -e s/softmmu_ss/system_ss/g $(git grep -l softmmu_ss)

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230613133347.82210-10-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-20 10:01:30 +02:00
Richard Henderson 7ce5a15fa6 * Fix emulated LCCB, LOCFHR, MXDB and MXDBR s390x instructions
* Fix the malta machine on s390x (big endian) hosts
 * Emulate /proc/cpuinfo on s390x
 * Remove pointless QOM casts
 * Improve the inclusion logic for libkeyutils and ipmi-bt-test in meson.build
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmR+ycgRHHRodXRoQHJl
 ZGhhdC5jb20ACgkQLtnXdP5wLbWXXw//WPz3ng50KLS+M1t3/ULEjO6XkGfP2LQZ
 RsZq3hf9THFPZgcREk+6SQvttOSTuvHNfakfujS6U1Ou5thReWqLe4itFW6+hB5j
 kQ+Sm6YJ+fpezkBnSefcUoL5nA9VVKZ6KE6kxq5CUBZNoIk1sSsfrU8y8wjzW0yg
 2nraOcG10aLpO2BfvKHVEAhJtwl9pHJsFANmHC2/h2wC9BZIAzdxiytzdcJ909gN
 AAa0hIrLK/oFgJjkSSxu+QTaVGPARXqkx5WV546F/zmDMFUWd9nrXaegwqxjgPBN
 m9Ua0SXll5hX2Z57vjJWlbTYkD+JUB22L0N7p5/xzhYRpLVSq1pdveo9psrzIC3E
 Bt7chZB58acQepJHxxa3UHDOHcnfdfaN+Dd9wD29wHr7nK8lOcsen7/7V+5YXomc
 qenkCtkpjKTl07OBxe6MDGZtPZYA8fK1CjEyYwHCe8QvxEzsyg96Bm3j4N2VPxQU
 +f/sFPX7SgogZI4mB4wdoxOF1RmQ+DXQ2tnB970txZRkmFq2jJHpW86jkkbq2Jl1
 KIjgdIXjVgy+MPtuQzO5cT+jfhGQL7FQynGXHjv/UidBid5XD3TDVNa9AthN3Mng
 +rPT90VJ7j9soMqvmNT1COSIRD+M49dQKBIQuq/gWplaTOHaAcJrCwYScwqe0u0P
 zmjCNeuPVw8=
 =dfJr
 -----END PGP SIGNATURE-----

Merge tag 'pull-request-2023-06-06' of https://gitlab.com/thuth/qemu into staging

* Fix emulated LCCB, LOCFHR, MXDB and MXDBR s390x instructions
* Fix the malta machine on s390x (big endian) hosts
* Emulate /proc/cpuinfo on s390x
* Remove pointless QOM casts
* Improve the inclusion logic for libkeyutils and ipmi-bt-test in meson.build

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmR+ycgRHHRodXRoQHJl
# ZGhhdC5jb20ACgkQLtnXdP5wLbWXXw//WPz3ng50KLS+M1t3/ULEjO6XkGfP2LQZ
# RsZq3hf9THFPZgcREk+6SQvttOSTuvHNfakfujS6U1Ou5thReWqLe4itFW6+hB5j
# kQ+Sm6YJ+fpezkBnSefcUoL5nA9VVKZ6KE6kxq5CUBZNoIk1sSsfrU8y8wjzW0yg
# 2nraOcG10aLpO2BfvKHVEAhJtwl9pHJsFANmHC2/h2wC9BZIAzdxiytzdcJ909gN
# AAa0hIrLK/oFgJjkSSxu+QTaVGPARXqkx5WV546F/zmDMFUWd9nrXaegwqxjgPBN
# m9Ua0SXll5hX2Z57vjJWlbTYkD+JUB22L0N7p5/xzhYRpLVSq1pdveo9psrzIC3E
# Bt7chZB58acQepJHxxa3UHDOHcnfdfaN+Dd9wD29wHr7nK8lOcsen7/7V+5YXomc
# qenkCtkpjKTl07OBxe6MDGZtPZYA8fK1CjEyYwHCe8QvxEzsyg96Bm3j4N2VPxQU
# +f/sFPX7SgogZI4mB4wdoxOF1RmQ+DXQ2tnB970txZRkmFq2jJHpW86jkkbq2Jl1
# KIjgdIXjVgy+MPtuQzO5cT+jfhGQL7FQynGXHjv/UidBid5XD3TDVNa9AthN3Mng
# +rPT90VJ7j9soMqvmNT1COSIRD+M49dQKBIQuq/gWplaTOHaAcJrCwYScwqe0u0P
# zmjCNeuPVw8=
# =dfJr
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 05 Jun 2023 10:53:12 PM PDT
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [unknown]
# gpg:                 aka "Thomas Huth <thuth@redhat.com>" [unknown]
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* tag 'pull-request-2023-06-06' of https://gitlab.com/thuth/qemu:
  linux-user: Emulate /proc/cpuinfo on s390x
  linux-user/elfload: Introduce elf_hwcap_str() on s390x
  linux-user/elfload: Expose get_elf_hwcap() on s390x
  s390x/tcg: Fix CPU address returned by STIDP
  bulk: Remove pointless QOM casts
  scripts: Add qom-cast-macro-clean-cocci-gen.py
  hw/mips/malta: Fix the malta machine on big endian hosts
  gitlab-ci: Remove unused Python package
  tests/qtest: Run ipmi-bt-test only if CONFIG_IPMI_EXTERN is set
  tests/tcg/s390x: Test MXDB and MXDBR
  target/s390x: Fix MXDB and MXDBR
  Add conditional dependency for libkeyutils
  tests/tcg/s390x: Test single-stepping SVC
  linux-user/s390x: Fix single-stepping SVC
  tests/tcg/s390x: Test LOCFHR
  target/s390x: Fix LOCFHR taking the wrong half of R2
  tests/tcg/s390x: Test LCBB
  target/s390x: Fix LCBB overwriting the top 32 bits

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-06 07:07:37 -07:00
Philippe Mathieu-Daudé 7d5b0d6864 bulk: Remove pointless QOM casts
Mechanical change running Coccinelle spatch with content
generated from the qom-cast-macro-clean-cocci-gen.py added
in the previous commit.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230601093452.38972-3-philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-06-05 20:48:34 +02:00
Jean-Louis Dupond 42a2890a76 qcow2: add discard-no-unref option
When we for example have a sparse qcow2 image and discard: unmap is enabled,
there can be a lot of fragmentation in the image after some time. Especially on VM's
that do a lot of writes/deletes.
This causes the qcow2 image to grow even over 110% of its virtual size,
because the free gaps in the image get too small to allocate new
continuous clusters. So it allocates new space at the end of the image.

Disabling discard is not an option, as discard is needed to keep the
incremental backup size as low as possible. Without discard, the
incremental backups would become large, as qemu thinks it's just dirty
blocks but it doesn't know the blocks are unneeded.
So we need to avoid fragmentation but also 'empty' the unneeded blocks in
the image to have a small incremental backup.

In addition, we also want to send the discards further down the stack, so
the underlying blocks are still discarded.

Therefor we introduce a new qcow2 option "discard-no-unref".
When setting this option to true, discards will no longer have the qcow2
driver relinquish cluster allocations. Other than that, the request is
handled as normal: All clusters in range are marked as zero, and, if
pass-discard-request is true, it is passed further down the stack.
The only difference is that the now-zero clusters are preallocated
instead of being unallocated.
This will avoid fragmentation on the qcow2 image.

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1621
Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
Message-Id: <20230605084523.34134-2-jean-louis@dupond.be>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:15:42 +02:00
Alexander Ivanov 029136f261 parallels: Incorrect condition in out-of-image check
All the offsets in the BAT must be lower than the file size.
Fix the check condition for correct check.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-Id: <20230424093147.197643-13-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:59 +02:00
Alexander Ivanov c0fc051dd4 parallels: Replace qemu_co_mutex_lock by WITH_QEMU_LOCK_GUARD
Replace the way we use mutex in parallels_co_check() for simplier
and less error prone code.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-Id: <20230424093147.197643-12-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:58 +02:00
Alexander Ivanov 7e259e2540 parallels: Move statistic collection to a separate function
We will add more and more checks so we need a better code structure
in parallels_co_check. Let each check performs in a separate loop
in a separate helper.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20230424093147.197643-11-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:57 +02:00
Alexander Ivanov 09a21edfaf parallels: Move check of leaks to a separate function
We will add more and more checks so we need a better code structure
in parallels_co_check. Let each check performs in a separate loop
in a separate helper.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Message-Id: <20230424093147.197643-10-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:56 +02:00
Alexander Ivanov 9616f7a6c2 parallels: Fix statistics calculation
Exclude out-of-image clusters from allocated and fragmented clusters
calculation.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Message-Id: <20230424093147.197643-9-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:55 +02:00
Alexander Ivanov 6d416e56a7 parallels: Move check of cluster outside image to a separate function
We will add more and more checks so we need a better code structure in
parallels_co_check. Let each check performs in a separate loop in a
separate helper.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-Id: <20230424093147.197643-8-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:54 +02:00
Alexander Ivanov 96de69c7d9 parallels: Move check of unclean image to a separate function
We will add more and more checks so we need a better code structure
in parallels_co_check. Let each check performs in a separate loop
in a separate helper.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20230424093147.197643-7-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:53 +02:00
Alexander Ivanov 3569cb7b78 parallels: Use generic infrastructure for BAT writing in parallels_co_check()
BAT is written in the context of conventional operations over the image
inside bdrv_co_flush() when it calls parallels_co_flush_to_os() callback.
Thus we should not modify BAT array directly, but call
parallels_set_bat_entry() helper and bdrv_co_flush() further on. After
that there is no need to manually write BAT and track its modification.

This makes code more generic and allows to split parallels_set_bat_entry()
for independent pieces.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-Id: <20230424093147.197643-6-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:52 +02:00
Alexander Ivanov b64b29b96b parallels: create parallels_set_bat_entry_helper() to assign BAT value
This helper will be reused in next patches during parallels_co_check
rework to simplify its code.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20230424093147.197643-5-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:51 +02:00
Alexander Ivanov 679749ce41 parallels: Fix image_end_offset and data_end after out-of-image check
Set data_end to the end of the last cluster inside the image. In such a
way we can be sure that corrupted offsets in the BAT can't affect on the
image size. If there are no allocated clusters set image_end_offset by
data_end.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-Id: <20230424093147.197643-4-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:50 +02:00
Alexander Ivanov ab2d739c41 parallels: Fix high_off calculation in parallels_co_check()
Don't let high_off be more than the file size even if we don't fix the
image.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20230424093147.197643-3-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:49 +02:00
Alexander Ivanov f5e715dbbb parallels: Out of image offset in BAT leads to image inflation
data_end field in BDRVParallelsState is set to the biggest offset present
in BAT. If this offset is outside of the image, any further write will
create the cluster at this offset and/or the image will be truncated to
this offset on close. This is definitely not correct.

Raise an error in parallels_open() if data_end points outside the image
and it is not a check (let the check to repaire the image). Set data_end
to the end of the cluster with the last correct offset.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Message-Id: <20230424093147.197643-2-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05 13:13:47 +02:00
Hanna Czenczek 18743311b8 block: Collapse padded I/O vecs exceeding IOV_MAX
When processing vectored guest requests that are not aligned to the
storage request alignment, we pad them by adding head and/or tail
buffers for a read-modify-write cycle.

The guest can submit I/O vectors up to IOV_MAX (1024) in length, but
with this padding, the vector can exceed that limit.  As of
4c002cef0e ("util/iov: make
qemu_iovec_init_extended() honest"), we refuse to pad vectors beyond the
limit, instead returning an error to the guest.

To the guest, this appears as a random I/O error.  We should not return
an I/O error to the guest when it issued a perfectly valid request.

Before 4c002cef0e, we just made the vector
longer than IOV_MAX, which generally seems to work (because the guest
assumes a smaller alignment than we really have, file-posix's
raw_co_prw() will generally see bdrv_qiov_is_aligned() return false, and
so emulate the request, so that the IOV_MAX does not matter).  However,
that does not seem exactly great.

I see two ways to fix this problem:
1. We split such long requests into two requests.
2. We join some elements of the vector into new buffers to make it
   shorter.

I am wary of (1), because it seems like it may have unintended side
effects.

(2) on the other hand seems relatively simple to implement, with
hopefully few side effects, so this patch does that.

To do this, the use of qemu_iovec_init_extended() in bdrv_pad_request()
is effectively replaced by the new function bdrv_create_padded_qiov(),
which not only wraps the request IOV with padding head/tail, but also
ensures that the resulting vector will not have more than IOV_MAX
elements.  Putting that functionality into qemu_iovec_init_extended() is
infeasible because it requires allocating a bounce buffer; doing so
would require many more parameters (buffer alignment, how to initialize
the buffer, and out parameters like the buffer, its length, and the
original elements), which is not reasonable.

Conversely, it is not difficult to move qemu_iovec_init_extended()'s
functionality into bdrv_create_padded_qiov() by using public
qemu_iovec_* functions, so that is what this patch does.

Because bdrv_pad_request() was the only "serious" user of
qemu_iovec_init_extended(), the next patch will remove the latter
function, so the functionality is not implemented twice.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2141964
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230411173418.19549-3-hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-05 13:11:06 +02:00
Eric Blake bd1386cce1 cutils: Adjust signature of parse_uint[_full]
It's already confusing that we have two very similar functions for
wrapping the parse of a 64-bit unsigned value, differing mainly on
whether they permit leading '-'.  Adjust the signature of parse_uint()
and parse_uint_full() to be like all of qemu_strto*(): put the result
parameter last, use the same types (uint64_t and unsigned long long
have the same width, but are not always the same type), and mark
endptr const (this latter change only affects the rare caller of
parse_uint).  Adjust all callers in the tree.

While at it, note that since cutils.c already includes:

    QEMU_BUILD_BUG_ON(sizeof(int64_t) != sizeof(long long));

we are guaranteed that the result of parse_uint* cannot exceed
UINT64_MAX (or the build would have failed), so we can drop
pre-existing dead comparisons in opts-visitor.c that were never false.

Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230522190441.64278-8-eblake@redhat.com>
[eblake: Drop dead code spotted by Markus]
Signed-off-by: Eric Blake <eblake@redhat.com>
2023-06-02 12:27:19 -05:00
Stefano Garzarella cad2ccc395 block/blkio: use qemu_open() to support fd passing for virtio-blk
Some virtio-blk drivers (e.g. virtio-blk-vhost-vdpa) supports the fd
passing. Let's expose this to the user, so the management layer
can pass the file descriptor of an already opened path.

If the libblkio virtio-blk driver supports fd passing, let's always
use qemu_open() to open the `path`, so we can handle fd passing
from the management layer through the "/dev/fdset/N" special path.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20230530071941.8954-2-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01 11:08:21 -04:00
Stefan Hajnoczi 2a0d7cb6b7 block: remove bdrv_co_io_plug() API
No block driver implements .bdrv_co_io_plug() anymore. Get rid of the
function pointers.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230530180959.1108766-7-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01 08:59:24 -04:00
Stefan Hajnoczi 076682885d block/linux-aio: convert to blk_io_plug_call() API
Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Note that a dev_max_batch check is dropped in laio_io_unplug() because
the semantics of unplug_fn() are different from .bdrv_co_unplug():
1. unplug_fn() is only called when the last blk_io_unplug() call occurs,
   not every time blk_io_unplug() is called.
2. unplug_fn() is per-thread, not per-BlockDriverState, so there is no
   way to get per-BlockDriverState fields like dev_max_batch.

Therefore this condition cannot be moved to laio_unplug_fn(). It is not
obvious that this condition affects performance in practice, so I am
removing it instead of trying to come up with a more complex mechanism
to preserve the condition.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20230530180959.1108766-6-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01 07:34:03 -04:00
Stefan Hajnoczi 6a6da231b7 block/io_uring: convert to blk_io_plug_call() API
Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230530180959.1108766-5-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01 07:34:03 -04:00
Stefan Hajnoczi 28ff7b4dfb block/blkio: convert to blk_io_plug_call() API
Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230530180959.1108766-4-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01 07:34:03 -04:00
Stefan Hajnoczi f2e590002b block/nvme: convert to blk_io_plug_call() API
Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230530180959.1108766-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01 07:34:03 -04:00
Stefan Hajnoczi 41abca8c39 block: add blk_io_plug_call() API
Introduce a new API for thread-local blk_io_plug() that does not
traverse the block graph. The goal is to make blk_io_plug() multi-queue
friendly.

Instead of having block drivers track whether or not we're in a plugged
section, provide an API that allows them to defer a function call until
we're unplugged: blk_io_plug_call(fn, opaque). If blk_io_plug_call() is
called multiple times with the same fn/opaque pair, then fn() is only
called once at the end of the function - resulting in batching.

This patch introduces the API and changes blk_io_plug()/blk_io_unplug().
blk_io_plug()/blk_io_unplug() no longer require a BlockBackend argument
because the plug state is now thread-local.

Later patches convert block drivers to blk_io_plug_call() and then we
can finally remove .bdrv_co_io_plug() once all block drivers have been
converted.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230530180959.1108766-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01 07:34:03 -04:00
Stefan Hajnoczi 60f782b6b7 aio: remove aio_disable_external() API
All callers now pass is_external=false to aio_set_fd_handler() and
aio_set_event_notifier(). The aio_disable_external() API that
temporarily disables fd handlers that were registered is_external=true
is therefore dead code.

Remove aio_disable_external(), aio_enable_external(), and the
is_external arguments to aio_set_fd_handler() and
aio_set_event_notifier().

The entire test-fdmon-epoll test is removed because its sole purpose was
testing aio_disable_external().

Parts of this patch were generated using the following coccinelle
(https://coccinelle.lip6.fr/) semantic patch:

  @@
  expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque;
  @@
  - aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque)
  + aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque)

  @@
  expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready;
  @@
  - aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, io_poll_ready)
  + aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready)

Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230516190238.8401-21-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:37:26 +02:00
Stefan Hajnoczi 17b69c0fc1 block/fuse: do not set is_external=true on FUSE fd
This is part of ongoing work to remove the aio_disable_external() API.

Use BlockDevOps .drained_begin/end/poll() instead of
aio_set_fd_handler(is_external=true).

As a side-effect the FUSE export now follows AioContext changes like the
other export types.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-16-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:32:02 +02:00
Stefan Hajnoczi 3d499a43a2 block/export: don't require AioContext lock around blk_exp_ref/unref()
The FUSE export calls blk_exp_ref/unref() without the AioContext lock.
Instead of fixing the FUSE export, adjust blk_exp_ref/unref() so they
work without the AioContext lock. This way it's less error-prone.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-15-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:32:02 +02:00
Stefan Hajnoczi 195332c1d9 block/export: rewrite vduse-blk drain code
vduse_blk_detach_ctx() waits for in-flight requests using
AIO_WAIT_WHILE(). This is not allowed according to a comment in
bdrv_set_aio_context_commit():

  /*
   * Take the old AioContex when detaching it from bs.
   * At this point, new_context lock is already acquired, and we are now
   * also taking old_context. This is safe as long as bdrv_detach_aio_context
   * does not call AIO_POLL_WHILE().
   */

Use this opportunity to rewrite the drain code in vduse-blk:

- Use the BlockExport refcount so that vduse_blk_exp_delete() is only
  called when there are no more requests in flight.

- Implement .drained_poll() so in-flight request coroutines are stopped
  by the time .bdrv_detach_aio_context() is called.

- Remove AIO_WAIT_WHILE() from vduse_blk_detach_ctx() to solve the
  .bdrv_detach_aio_context() constraint violation. It's no longer
  needed due to the previous changes.

- Always handle the VDUSE file descriptor, even in drained sections. The
  VDUSE file descriptor doesn't submit I/O, so it's safe to handle it in
  drained sections. This ensures that the VDUSE kernel code gets a fast
  response.

- Suspend virtqueue fd handlers in .drained_begin() and resume them in
  .drained_end(). This eliminates the need for the
  aio_set_fd_handler(is_external=true) flag, which is being removed from
  QEMU.

This is a long list but splitting it into individual commits would
probably lead to git bisect failures - the changes are all related.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-14-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:32:02 +02:00
Stefan Hajnoczi ab61335025 block: drain from main loop thread in bdrv_co_yield_to_drain()
For simplicity, always run BlockDevOps .drained_begin/end/poll()
callbacks in the main loop thread. This makes it easier to implement the
callbacks and avoids extra locks.

Move the function pointer declarations from the I/O Code section to the
Global State section for BlockDevOps, BdrvChildClass, and BlockDriver.

Narrow IO_OR_GS_CODE() to GLOBAL_STATE_CODE() where appropriate.

The test-bdrv-drain test case calls bdrv_drain() from an IOThread. This
is now only allowed from coroutine context, so update the test case to
run in a coroutine.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-11-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:32:02 +02:00
Stefan Hajnoczi ff82b7835b block: add blk_in_drain() API
The BlockBackend quiesce_counter is greater than zero during drained
sections. Add an API to check whether the BlockBackend is in a drained
section.

The next patch will use this API.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-10-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:32:02 +02:00
Stefan Hajnoczi e1054cd4aa block/export: stop using is_external in vhost-user-blk server
vhost-user activity must be suspended during bdrv_drained_begin/end().
This prevents new requests from interfering with whatever is happening
in the drained section.

Previously this was done using aio_set_fd_handler()'s is_external
argument. In a multi-queue block layer world the aio_disable_external()
API cannot be used since multiple AioContext may be processing I/O, not
just one.

Switch to BlockDevOps->drained_begin/end() callbacks.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-8-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:32:02 +02:00
Stefan Hajnoczi 8f5e9a8ee1 block/export: wait for vhost-user-blk requests when draining
Each vhost-user-blk request runs in a coroutine. When the BlockBackend
enters a drained section we need to enter a quiescent state. Currently
any in-flight requests race with bdrv_drained_begin() because it is
unaware of vhost-user-blk requests.

When blk_co_preadv/pwritev()/etc returns it wakes the
bdrv_drained_begin() thread but vhost-user-blk request processing has
not yet finished. The request coroutine continues executing while the
main loop thread thinks it is in a drained section.

One example where this is unsafe is for blk_set_aio_context() where
bdrv_drained_begin() is called before .aio_context_detached() and
.aio_context_attach(). If request coroutines are still running after
bdrv_drained_begin(), then the AioContext could change underneath them
and they race with new requests processed in the new AioContext. This
could lead to virtqueue corruption, for example.

(This example is theoretical, I came across this while reading the
code and have not tried to reproduce it.)

It's easy to make bdrv_drained_begin() wait for in-flight requests: add
a .drained_poll() callback that checks the VuServer's in-flight counter.
VuServer just needs an API that returns true when there are requests in
flight. The in-flight counter needs to be atomic.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-7-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30 17:32:02 +02:00