python/machine: handle "fast" QEMU terminations

In the case that the QEMU process actually launches -- but then dies so
quickly that we can't establish a QMP connection to it -- QEMUMachine
currently calls _post_shutdown() assuming that it never launched the VM
process.

This isn't true, though: we may "merely" have failed to establish a QMP
connection while the process is in the middle of its own exit path.

If we don't wait for the subprocess, the caller may get a bogus `None`
return from .exitcode(). This behavior was observed in
device-crash-test: after the switch to Async QMP, the timings changed
such that it seemingly became possible to witness the failure of
"vm.launch()" *prior* to the exitcode becoming available.
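
The race described above can be reproduced outside QEMU with a plain
subprocess. This is a minimal sketch, not QEMUMachine code: CMD is a
hypothetical stand-in for a QEMU process that is still on its exit path,
and the helper names are invented for illustration. It assumes only that
the exit code is read through a non-blocking poll of the child.

```python
import subprocess
import sys

# Hypothetical stand-in for a QEMU process that is "in the middle of its
# own exit path": it lingers briefly, then exits with status 3.
CMD = [sys.executable, "-c", "import sys, time; time.sleep(0.5); sys.exit(3)"]


def exitcode_without_wait():
    """Read the exit code right after launch, without waiting."""
    proc = subprocess.Popen(CMD)
    code = proc.poll()  # non-blocking; None while the child is still running
    proc.wait()         # reap the child so the sketch leaves no zombie behind
    return code


def exitcode_after_wait():
    """Wait for the child first, then read the exit code."""
    proc = subprocess.Popen(CMD)
    proc.wait()         # blocks until the child has exited
    return proc.poll()


if __name__ == "__main__":
    print("without wait:", exitcode_without_wait())
    print("after wait:  ", exitcode_after_wait())
```

The first helper models a caller that asks for the exit code before the
child has finished dying; the second models the behavior after waiting.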

This patch changes the semantics of the `_launched` property. Instead
of representing the condition "launch() executed successfully", it now
represents "has forked a child process successfully". This way, wait()
will not become a no-op when called in the exit path.
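
The changed meaning of `_launched` can be sketched with a toy class.
ToyMachine and demo() are hypothetical names made up for this example,
and the ConnectionError stands in for a failed QMP connection; none of
this is the real QEMUMachine implementation.

```python
import subprocess
import sys
from typing import List, Optional


class ToyMachine:
    """Hypothetical stand-in for QEMUMachine; not the real class."""

    def __init__(self, args: List[str]) -> None:
        self._args = args
        self._popen: Optional[subprocess.Popen] = None
        # Means "has forked a child process successfully",
        # not "launch() executed successfully".
        self._launched = False

    def _launch(self) -> None:
        self._popen = subprocess.Popen(self._args)
        self._launched = True  # set as soon as the child exists
        # Simulated failure: the child is dying, so connecting to it fails.
        raise ConnectionError("could not connect to child")

    def wait(self) -> None:
        # Would be a no-op here if _launched were only set after a fully
        # successful launch.
        if self._launched and self._popen is not None:
            self._popen.wait()
            self._launched = False

    def exitcode(self) -> Optional[int]:
        return None if self._popen is None else self._popen.poll()

    def launch(self) -> None:
        try:
            self._launch()
        except Exception:
            if self._launched:
                self.wait()  # child was forked: reap it before re-raising
            raise


def demo() -> Optional[int]:
    script = "import sys, time; time.sleep(0.2); sys.exit(3)"
    vm = ToyMachine([sys.executable, "-c", script])
    try:
        vm.launch()
    except ConnectionError:
        pass
    return vm.exitcode()
```

Because launch() waits for the forked child before re-raising, demo()
sees the child's real exit status rather than `None`.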

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Willian Rampazzo <willianr@redhat.com>
Message-id: 20211118204620.1897674-6-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
John Snow 2021-11-18 15:46:18 -05:00
parent b1ca991993
commit 1611e6cf4e
1 changed files with 12 additions and 7 deletions

@@ -349,9 +349,6 @@ class QEMUMachine:
         Called to cleanup the VM instance after the process has exited.
         May also be called after a failed launch.
         """
-        # Comprehensive reset for the failed launch case:
-        self._early_cleanup()
         try:
             self._close_qmp_connection()
         except Exception as err:  # pylint: disable=broad-except
@@ -400,8 +397,15 @@ class QEMUMachine:
         try:
             self._launch()
-            self._launched = True
         except:
-            self._post_shutdown()
+            # We may have launched the process but it may
+            # have exited before we could connect via QMP.
+            # Assume the VM didn't launch or is exiting.
+            # If we don't wait for the process, exitcode() may still be
+            # 'None' by the time control is ceded back to the caller.
+            if self._launched:
+                self.wait()
+            else:
+                self._post_shutdown()
             LOG.debug('Error launching VM')
@@ -426,6 +430,7 @@ class QEMUMachine:
                                        stderr=subprocess.STDOUT,
                                        shell=False,
                                        close_fds=False)
+        self._launched = True
         self._post_launch()
     def _close_qmp_connection(self) -> None:
@@ -457,8 +462,8 @@ class QEMUMachine:
         """
         Perform any cleanup that needs to happen before the VM exits.
-        May be invoked by both soft and hard shutdown in failover scenarios.
-        Called additionally by _post_shutdown for comprehensive cleanup.
+        This method may be called twice upon shutdown, once each by soft
+        and hard shutdown in failover scenarios.
         """
         # If we keep the console socket open, we may deadlock waiting
         # for QEMU to exit, while QEMU is waiting for the socket to