Commit Graph

658 Commits (e462c6d27d0175a67cd373b3027ea5276d25dd4e)

Author SHA1 Message Date
Peter Xu 08401c0426 migration: Allow migrate-recover to run multiple times
Previously migration didn't have an easy way to cleanup the listening
transport, migrate recovery only allows to execute once.  That's done with a
trick flag in postcopy_recover_triggered.

Now the facility is already there.

Drop postcopy_recover_triggered and instead allows a new migrate-recover to
release the previous listener transport.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220331150857.74406-8-peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-04-21 19:36:46 +01:00
Peter Xu a39e933962 migration: Move channel setup out of postcopy_try_recover()
We used to use postcopy_try_recover() to replace migration_incoming_setup() to
setup incoming channels.  That's fine for the old world, but in the new world
there can be more than one channels that need setup.  Better move the channel
setup out of it so that postcopy_try_recover() only handles the last phase of
switching to the recovery phase.

To do that in migration_fd_process_incoming(), move the postcopy_try_recover()
call to be after migration_incoming_setup(), which will setup the channels.
While in migration_ioc_process_incoming(), postpone the recover() routine right
before we'll jump into migration_incoming_process().

A side benefit is we don't need to pass in QEMUFile* to postcopy_try_recover()
anymore.  Remove it.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220331150857.74406-7-peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-04-21 19:36:46 +01:00
Peter Xu f444eeda71 migration: Move migrate_allow_multifd and helpers into migration.c
This variable, along with its helpers, is used to detect whether multiple
channel will be supported for migration.  In follow up patches, there'll be
other capability that requires multi-channels.  Hence move it outside multifd
specific code and make it public.  Meanwhile rename it from "multifd" to
"multi_channels" to show its real meaning.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220331150857.74406-5-peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-04-21 19:36:46 +01:00
Peter Xu 83174765da migration: Postpone releasing MigrationState.hostname
We used to release it right after migrate_fd_connect().  That's not good
enough when there're more than one socket pair required, because it'll be
needed to establish TLS connection for the rest channels.

One example is multifd, where we copied over the hostname for each channel
but that's actually not needed.

Keeping the hostname until the cleanup phase of migration.

Cc: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220331150857.74406-2-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Fixup checkpatch error; don't need to check for NULL
  around g_free
2022-04-21 19:36:46 +01:00
Marc-André Lureau 8e3b0cbb72 Replace qemu_real_host_page variables with inlined functions
Replace the global variables with inlined helper functions. getpagesize() is very
likely annotated with a "const" function attribute (at least with glibc), and thus
optimization should apply even better.

This avoids the need for a constructor initialization too.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06 10:50:38 +02:00
Emanuele Giuseppe Esposito 3b71719462 block: rename bdrv_invalidate_cache_all, blk_invalidate_cache and test_sync_op_invalidate_cache
Following the bdrv_activate renaming, change also the name
of the respective callers.

bdrv_invalidate_cache_all -> bdrv_activate_all
blk_invalidate_cache -> blk_activate
test_sync_op_invalidate_cache -> test_sync_op_activate

No functional change intended.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220209105452.1694545-5-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04 18:14:40 +01:00
Peter Xu e031149c78 migration: Add migration_incoming_transport_cleanup()
Add a helper to cleanup the transport listener.

When do it, we should also null-ify the cleanup hook and the data, then it's
even safe to call it multiple times.

Move the socket_address_list cleanup altogether, because that's a mirror of the
listener channels and only for the purpose of query-migrate.  Hence when
someone wants to cleanup the listener transport, it should also want to cleanup
the socket list too, always.

No functional change intended.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220301083925.33483-15-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-03-02 18:20:45 +00:00
Peter Xu d5c8f2afe8 migration: Enlarge postcopy recovery to capture !-EIO too
We used to have quite a few places making sure -EIO happened and that's the
only way to trigger postcopy recovery.  That's based on the assumption that
we'll only return -EIO for channel issues.

It'll work in 99.99% cases but logically that won't cover some corner cases.
One example is e.g. ram_block_from_stream() could fail with an interrupted
network, then -EINVAL will be returned instead of -EIO.

I remembered Dave Gilbert pointed that out before, but somehow this is
overlooked.  Neither did I encounter anything outside the -EIO error.

However we'd better touch that up before it triggers a rare VM data loss during
live migrating.

To cover as much those cases as possible, remove the -EIO restriction on
triggering the postcopy recovery, because even if it's not a channel failure,
we can't do anything better than halting QEMU anyway - the corpse of the
process may even be used by a good hand to dig out useful memory regions, or
the admin could simply kill the process later on.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220301083925.33483-11-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-03-02 18:20:45 +00:00
David Edmondson ae68066880 migration: Tally pre-copy, downtime and post-copy bytes independently
Provide information on the number of bytes copied in the pre-copy,
downtime and post-copy phases of migration.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-01-28 15:38:23 +01:00
Peter Xu 739fcc1b0e migration: Drop return code for disgard ram process
It will just never fail.  Drop those return values where they're constantly
zeros.

A tiny touch-up on the tracepoint so trace_ram_postcopy_send_discard_bitmap()
is called after the logic itself (which sounds more reasonable).

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-01-28 15:38:23 +01:00
Zhang Chen 01ee5e3556 migration/migration.c: Remove the MIGRATION_STATUS_ACTIVE when migration finished
The MIGRATION_STATUS_ACTIVE indicates that migration is running.
Remove it to be handled by the default operation,
It should be part of the unknown ending states.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-01-28 15:38:23 +01:00
Zhang Chen eeeb48ee33 migration/migration.c: Avoid COLO boot in postcopy migration
COLO dose not support postcopy migration and remove the Fixme.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-01-28 15:38:23 +01:00
Zhang Chen 444252b96a migration/migration.c: Add missed default error handler for migration state
In the migration_completion() no other status is expected, for
example MIGRATION_STATUS_CANCELLING, MIGRATION_STATUS_CANCELLED, etc.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-01-28 15:38:23 +01:00
Laurent Vivier 1b529d908d failover: Silence warning messages during qtest
virtio-net-failover test tries several device combinations that produces
some expected warnings.
These warning can be confusing, so we disable them during the qtest
sequence.

Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20211220145314.390697-1-lvivier@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
[thuth: Fix memory leak by using error_free()]
Signed-off-by: Thomas Huth <thuth@redhat.com>
2021-12-22 08:12:45 +01:00
Juan Quintela 144fa06b34 migration: Never call twice qemu_target_page_size()
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-12-15 10:31:42 +01:00
Zhang Chen 751fe4c608 migration/colo: Optimize COLO primary node start code path
Optimize COLO primary start path from:
MIGRATION_STATUS_XXX --> MIGRATION_STATUS_ACTIVE --> MIGRATION_STATUS_COLO --> MIGRATION_STATUS_COMPLETED
To:
MIGRATION_STATUS_XXX --> MIGRATION_STATUS_COLO --> MIGRATION_STATUS_COMPLETED
No need to start primary COLO through "MIGRATION_STATUS_ACTIVE".

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-12-15 10:31:42 +01:00
Rao, Lei 795969ab1f Fixed a QEMU hang when guest poweroff in COLO mode
When the PVM guest poweroff, the COLO thread may wait a semaphore
in colo_process_checkpoint().So, we should wake up the COLO thread
before migration shutdown.

Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-12-15 10:31:42 +01:00
Rao, Lei 684bfd1820 Fixed SVM hang when do failover before PVM crash
This patch fixed as follows:
    Thread 1 (Thread 0x7f34ee738d80 (LWP 11212)):
    #0 __pthread_clockjoin_ex (threadid=139847152957184, thread_return=0x7f30b1febf30, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
    #1 0x0000563401998e36 in qemu_thread_join (thread=0x563402d66610) at util/qemu-thread-posix.c:587
    #2 0x00005634017a79fa in process_incoming_migration_co (opaque=0x0) at migration/migration.c:502
    #3 0x00005634019b59c9 in coroutine_trampoline (i0=63395504, i1=22068) at util/coroutine-ucontext.c:115
    #4 0x00007f34ef860660 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 from /lib/x86_64-linux-gnu/libc.so.6
    #5 0x00007f30b21ee730 in ?? ()
    #6 0x0000000000000000 in ?? ()

    Thread 13 (Thread 0x7f30b3dff700 (LWP 11747)):
    #0  __lll_lock_wait (futex=futex@entry=0x56340218ffa0 <qemu_global_mutex>, private=0) at lowlevellock.c:52
    #1  0x00007f34efa000a3 in _GI__pthread_mutex_lock (mutex=0x56340218ffa0 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
    #2  0x0000563401997f99 in qemu_mutex_lock_impl (mutex=0x56340218ffa0 <qemu_global_mutex>, file=0x563401b7a80e "migration/colo.c", line=806) at util/qemu-thread-posix.c:78
    #3  0x0000563401407144 in qemu_mutex_lock_iothread_impl (file=0x563401b7a80e "migration/colo.c", line=806) at /home/workspace/colo-qemu/cpus.c:1899
    #4  0x00005634017ba8e8 in colo_process_incoming_thread (opaque=0x563402d664c0) at migration/colo.c:806
    #5  0x0000563401998b72 in qemu_thread_start (args=0x5634039f8370) at util/qemu-thread-posix.c:519
    #6  0x00007f34ef9fd609 in start_thread (arg=<optimized out>) at pthread_create.c:477
    #7  0x00007f34ef924293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

    The QEMU main thread is holding the lock:
    (gdb) p qemu_global_mutex
    $1 = {lock = {_data = {lock = 2, __count = 0, __owner = 11212, __nusers = 9, __kind = 0, __spins = 0, __elision = 0, __list = {_prev = 0x0, __next = 0x0}},
     __size = "\002\000\000\000\000\000\000\000\314+\000\000\t", '\000' <repeats 26 times>, __align = 2}, file = 0x563401c07e4b "util/main-loop.c", line = 240,
    initialized = true}

>From the call trace, we can see it is a deadlock bug. and the QEMU main thread holds the global mutex to wait until the COLO thread ends. and the colo thread
wants to acquire the global mutex, which will cause a deadlock. So, we should release the qemu_global_mutex before waiting colo thread ends.

Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-03 09:38:53 +01:00
Rao, Lei aa505f8e0e Fixed qemu crash when guest power off in COLO mode
This patch fixes the following:
qemu-system-x86_64: invalid runstate transition: 'shutdown' -> 'running'
Aborted (core dumped)
The gdb bt as following:
0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
1  0x00007faa3d613859 in __GI_abort () at abort.c:79
2  0x000055c5a21268fd in runstate_set (new_state=RUN_STATE_RUNNING) at vl.c:723
3  0x000055c5a1f8cae4 in vm_prepare_start () at /home/workspace/colo-qemu/cpus.c:2206
4  0x000055c5a1f8cb1b in vm_start () at /home/workspace/colo-qemu/cpus.c:2213
5  0x000055c5a2332bba in migration_iteration_finish (s=0x55c5a4658810) at migration/migration.c:3376
6  0x000055c5a2332f3b in migration_thread (opaque=0x55c5a4658810) at migration/migration.c:3527
7  0x000055c5a251d68a in qemu_thread_start (args=0x55c5a5491a70) at util/qemu-thread-posix.c:519
8  0x00007faa3d7e9609 in start_thread (arg=<optimized out>) at pthread_create.c:477
9  0x00007faa3d710293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-03 09:38:53 +01:00
yuxiating fa0b31d585 migration: initialise compression_counters for a new migration
If the compression migration fails or is canceled, the query for the value of
compression_counters during the next compression migration is wrong.

Signed-off-by: yuxiating <yuxiating@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-03 09:38:53 +01:00
Laurent Vivier 458fecca80 migration: provide an error message to migration_cancel()
This avoids to call migrate_get_current() in the caller function
whereas migration_cancel() already needs the pointer to the current
migration state.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-03 09:38:53 +01:00
David Hildenbrand 7648297d40 migration: Simplify alignment and alignment checks
Let's use QEMU_ALIGN_DOWN() and friends to make the code a bit easier to
read.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01 22:56:44 +01:00
Peter Xu 60fd680193 migration: Add migrate_add_blocker_internal()
An internal version that removes -only-migratable implications.  It can be used
for temporary migration blockers like dump-guest-memory.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01 22:56:44 +01:00
Peter Xu 4c170330aa migration: Make migration blocker work for snapshots too
save_snapshot() checks migration blocker, which looks sane.  At the meantime we
should also teach the blocker add helper to fail if during a snapshot, just
like for migrations.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01 22:56:43 +01:00
Li Zhijian 5ad15e8614 migration: allow enabling mutilfd for specific protocol only
To: <quintela@redhat.com>, <dgilbert@redhat.com>, <qemu-devel@nongnu.org>
CC: Li Zhijian <lizhijian@cn.fujitsu.com>
Date: Sat, 31 Jul 2021 22:05:52 +0800 (5 weeks, 4 days, 17 hours ago)

And change the default to true so that in '-incoming defer' case, user is able
to change multifd capability.

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-10-19 08:39:04 +02:00
Li Zhijian b7acd65707 migration: allow multifd for socket protocol only
To: <quintela@redhat.com>, <dgilbert@redhat.com>, <qemu-devel@nongnu.org>
CC: Li Zhijian <lizhijian@cn.fujitsu.com>
Date: Sat, 31 Jul 2021 22:05:51 +0800 (5 weeks, 4 days, 17 hours ago)

multifd with unsupported protocol will cause a segment fault.
(gdb) bt
 #0  0x0000563b4a93faf8 in socket_connect (addr=0x0, errp=0x7f7f02675410) at ../util/qemu-sockets.c:1190
 #1 0x0000563b4a797a03 in qio_channel_socket_connect_sync
(ioc=0x563b4d16e8c0, addr=0x0, errp=0x7f7f02675410) at
../io/channel-socket.c:145
 #2  0x0000563b4a797abf in qio_channel_socket_connect_worker (task=0x563b4cd86c30, opaque=0x0) at ../io/channel-socket.c:168
 #3  0x0000563b4a792631 in qio_task_thread_worker (opaque=0x563b4cd86c30) at ../io/task.c:124
 #4  0x0000563b4a91da69 in qemu_thread_start (args=0x563b4c44bb80) at ../util/qemu-thread-posix.c:541
 #5  0x00007f7fe9b5b3f9 in ?? ()
 #6  0x0000000000000000 in ?? ()

It's enough to check migrate_multifd_is_allowed() in multifd cleanup() and
multifd setup() though there are so many other places using migrate_use_multifd().

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-10-19 08:39:04 +02:00
Emanuele Giuseppe Esposito 68b88468f6 migration: add missing qemu_mutex_lock_iothread in migration_completion
qemu_savevm_state_complete_postcopy assumes the iothread lock (BQL)
to be held, but instead it isn't.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20211005080751.3797161-3-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-05 13:10:29 +02:00
Markus Armbruster 7d6f6933aa migration: Handle migration_incoming_setup() errors consistently
Commit b673eab4e2 "multifd: Make multifd_load_setup() get an Error
parameter" changed migration_incoming_setup() to take an Error **
argument, and adjusted the callers accordingly.  It neglected to
change adjust multifd_load_setup(): it still exit()s on error.  Clean
that up.

The error now gets propagated up two call chains: via
migration_fd_process_incoming() to rdma_accept_incoming_migration(),
and via migration_ioc_process_incoming() to
migration_channel_process_incoming().  Both chain ends report the
error with error_report_err(), but otherwise ignore it.  Behavioral
change: we no longer exit() on this error.

This is consistent with how we handle other errors here, e.g. from
multifd_recv_new_channel() via migration_ioc_process_incoming() to
migration_channel_process_incoming().  Whether it's consistently right
or consistently wrong I can't tell.

Also clean up the return value from the unusual 0 on success, 1 on
error to the more common true on success, false on error.

Cc: Juan Quintela <quintela@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210720125408.387910-11-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
2021-08-26 17:15:28 +02:00
Markus Armbruster f9734d5d40 error: Use error_fatal to simplify obvious fatal errors (again)
We did this with scripts/coccinelle/use-error_fatal.cocci before, in
commit 50beeb6809 and 007b06578a.  This commit cleans up rarer
variations that don't seem worth matching with Coccinelle.

Cc: Thomas Huth <thuth@redhat.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210720125408.387910-2-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2021-08-26 17:15:28 +02:00
Peter Xu 39675ffffb migration: Move the yank unregister of channel_close out
It's efficient, but hackish to call yank unregister calls in channel_close(),
especially it'll be hard to debug when qemu crashed with some yank function
leaked.

Remove that hack, but instead explicitly unregister yank functions at the
places where needed, they are:

  (on src)
  - migrate_fd_cleanup
  - postcopy_pause

  (on dst)
  - migration_incoming_state_destroy
  - postcopy_pause_incoming

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210722175841.938739-6-peterx@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-26 12:45:03 +01:00
Peter Xu 43044ac0ee migration: Make from_dst_file accesses thread-safe
Accessing from_dst_file is potentially racy in current code base like below:

  if (s->from_dst_file)
    do_something(s->from_dst_file);

Because from_dst_file can be reset right after the check in another
thread (rp_thread).  One example is migrate_fd_cancel().

Use the same qemu_file_lock to protect it too, just like to_dst_file.

When it's safe to access without lock, comment it.

There's one special reference in migration_thread() that can be replaced by
the newly introduced rp_thread_created flag.

Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Message-Id: <20210722175841.938739-3-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  with Peter's fixup
2021-07-26 12:44:46 +01:00
Peter Xu 53021ea165 migration: Fix missing join() of rp_thread
It's possible that the migration thread skip the join() of the rp_thread in
below race and crash on src right at finishing migration:

       migration_thread                     rp_thread
       ----------------                     ---------
    migration_completion()
                                        (before rp_thread quits)
                                        from_dst_file=NULL
                                        [thread got scheduled out]
      s->rp_state.from_dst_file==NULL
        (skip join() of rp_thread)
    migrate_fd_cleanup()
      qemu_fclose(s->to_dst_file)
      yank_unregister_instance()
        assert(yank_find_entry())  <------- crash

It could mostly happen with postcopy, but that shouldn't be required, e.g., I
think it could also trigger with MIGRATION_CAPABILITY_RETURN_PATH set.

It's suspected that above race could be the root cause of a recent (but rare)
migration-test break reported by either Dave or PMM:

https://lore.kernel.org/qemu-devel/YPamXAHwan%2FPPXLf@work-vm/

The issue is: from_dst_file is reset in the rp_thread, so if the thread reset
it to NULL fast enough then the migration thread will assume there's no
rp_thread at all.

This could potentially cause more severe issue (e.g. crash) after the yank code.

Fix it by using a boolean to keep "whether we've created rp_thread".

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210722175841.938739-2-peterx@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-26 12:44:26 +01:00
Peter Xu ca7bd0821b migration: Clear error at entry of migrate_fd_connect()
For each "migrate" command, remember to clear the s->error before going on.
For one reason, when there's a new error it'll be still remembered; see
migrate_set_error() who only sets the error if error==NULL.  Meanwhile if a
failed migration completes (e.g., postcopy recovered and finished), we
shouldn't dump an error when calling migrate_fd_cleanup() at last.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210708190653.252961-4-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-13 16:21:57 +01:00
Peter Xu ca30f24d12 migration: Don't do migrate cleanup if during postcopy resume
Below process could crash qemu with postcopy recovery:

  1. (hmp) migrate -d ..
  2. (hmp) migrate_start_postcopy
  3. [network down, postcopy paused]
  4. (hmp) migrate -r $WRONG_PORT
     when try the recover on an invalid $WRONG_PORT, cleanup_bh will be cleared
  5. (hmp) migrate -r $RIGHT_PORT
     [qemu crash on assert(cleanup_bh)]

The thing is we shouldn't cleanup if it's postcopy resume; the error is set
mostly because the channel is wrong, so we return directly waiting for the user
to retry.

migrate_fd_cleanup() should only be called when migration is cancelled or
completed.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210708190653.252961-3-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-13 16:21:57 +01:00
Peter Xu 2e3e3da3c2 migration: Release return path early for paused postcopy
When postcopy pause triggered, we rely on the migration thread to cleanup the
to_dst_file handle, and the return path thread to cleanup the from_dst_file
handle (which is stored in the local variable "rp").

Within the process, from_dst_file cleanup (qemu_fclose) is postponed until it's
setup again due to a postcopy recovery.

It used to work before yank was born; after yank is introduced we rely on the
refcount of IOC to correctly unregister yank function in channel_close().  If
without the early and on-time release of from_dst_file handle the yank function
will be leftover during paused postcopy.

Without this patch, below steps (quoted from Xiaohui) could trigger qemu src
crash:

  1.Boot vm on src host
  2.Boot vm on dst host
  3.Enable postcopy on src&dst host
  4.Load stressapptest in vm and set postcopy speed to 50M
  5.Start migration from src to dst host, change into postcopy mode when migration is active.
  6.When postcopy is active, down the network card(do migration via this network) on dst host.
  7.Wait untill postcopy is paused on src&dst host.
  8.Before up network card, recover migration on dst host, will get error like following.
  9.Ignore the error of step 8, go on recovering migration on src host:

  After step 9, qemu on src host will core dump after some seconds:
  qemu-kvm: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed.
  1.sh: line 38: 44662 Aborted                 (core dumped)

Reported-by: Li Xiaohui <xiaohli@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210708190653.252961-2-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-13 16:21:57 +01:00
Laurent Vivier a51dcef08b migration: failover: emit a warning when the card is not fully unplugged
When the migration fails or is canceled we wait the end of the unplug
operation to be able to plug it back. But if the unplug operation
is never finished we stop to wait and QEMU emits a warning to inform
the user.

Based-on: 20210629155007.629086-1-lvivier@redhat.com
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20210701131458.112036-1-lvivier@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-13 16:21:57 +01:00
Laurent Vivier 944bc52842 migration: failover: continue to wait card unplug on error
If the user cancels the migration in the unplug-wait state,
QEMU will try to plug back the card and this fails because the card
is partially unplugged.
To avoid the problem, continue to wait the card unplug, but to
allow the migration to be canceled if the card never finishes to unplug
use a timeout.

Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1976852
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210629155007.629086-3-lvivier@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Laurent Vivier fde93d99d9 migration: move wait-unplug loop to its own function
The loop is used in migration_thread() and bg_migration_thread(),
so we can move it to its own function and call it from these both places.

Moreover, in migration_thread() we have a wrong state transition from
SETUP to ACTIVE while state could be WAIT_UNPLUG. This is correctly
managed in bg_migration_thread() so use this code instead.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20210629155007.629086-2-lvivier@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Peter Xu b7f9afd48e migration: Allow reset of postcopy_recover_triggered when failed
It's possible qemu_start_incoming_migration() failed at any point, when it
happens we should reset postcopy_recover_triggered to false so that the user
can still retry with a saner incoming port.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210629181356.217312-3-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Peter Xu cc48c587d2 migration: Move yank outside qemu_start_incoming_migration()
Starting from commit b5eea99ec2, qmp_migrate_recover() calls unregister
before calling qemu_start_incoming_migration(). I believe it wanted to mitigate
the next call to yank_register_instance(), but I think that's wrong.

Firstly, if during recover, we should keep the yank instance there, not
"quickly removing and adding it back".

Meanwhile, calling qmp_migrate_recover() twice with b5eea99ec2 will directly
crash the dest qemu (right now it can't; but it'll start to work right after
the next patch) because the 1st call of qmp_migrate_recover() will unregister
permanently when the channel failed to establish, then the 2nd call of
qmp_migrate_recover() crashes at yank_unregister_instance().

This patch fixes it by moving yank ops out of qemu_start_incoming_migration()
into qmp_migrate_incoming.  For qmp_migrate_recover(), drop the unregister of
yank instance too since we keep it there during the recovery phase.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210629181356.217312-2-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Daniel P. Berrangé 626ff6515d migration: add trace point when vm_stop_force_state fails
This is a critical failure scenario for migration that is hard to
diagnose from existing probes. Most likely it is caused by an error
from bdrv_flush(), but we're not logging the errno anywhere, hence
this new probe.

Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-06-14 13:28:50 +01:00
Rao, Lei 3ba024457f Remove migrate_set_block_enabled in checkpoint
We can detect disk migration in migrate_prepare, if disk migration
is enabled in COLO mode, we can directly report an error.and there
is no need to disable block migration at every checkpoint.

Signed-off-by: Lei Rao <lei.rao@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2021-06-11 10:30:13 +08:00
Dr. David Alan Gilbert 1df6ddb43b migration: Add cleanup hook for inwards migration
Add a cleanup hook for incoming migration that gets called
at the end as a way for a transport to allow cleanup.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210421112834.107651-4-dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-06-08 19:36:18 +01:00
Peter Maydell 9b1e81d1c2 * Replace YAML anchors by extends in the gitlab-CI yaml files
* Many small qtest fixes (e.g. to fix issues discovered by Coverity)
 * Poison more config switches in common code
 * Fix the failing Travis-CI and Cirrus-CI tasks
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmCeXFMRHHRodXRoQHJl
 ZGhhdC5jb20ACgkQLtnXdP5wLbUV7g//VRxN74v2pO6zSsSNavYTkMa3jZE1Io3l
 w9lsOr3DM0HIMhyEMwSJsiygj0s8TNanUtFBHfDfo1k1gmZYd5FiM1sB7ZB/sB/S
 9Q9wjmeV2NMG7pcr9zLmiJaEM2LGfGbso44/m0c+gpSyxzg2TeH7sF+38AUZI9/R
 J6/gOzIs4WWdSV4f8kp+YPPQCyOtxZsfxDuk1z1fKsBgXE5iEBUqYyyZUm4gYJkN
 4ALmqc/NVeLszt08qkPHxwXQc488nCJP31psxx6MQ7toKfCAamT07Pp3oi70cakC
 y+HVfsIHc7SxaZyj8sbJCU3LJHwDd8N82ydJUrv216qDffb38fNb+m6y9mcYy2MH
 C25uGe9mucTuTP1WIttC6nCKg/MCgi4PWIzqEhkeKj3TxpTUJJP+BUIfSubV38Gc
 T+XUCNkWWW8sTeRiyE3m9pEZ+gz1MFubaIr/Owephch1SjRYn4zUwJFHHi3I4PY4
 7XjEo8y2M9tKckW3pCfYi+aIDzC1DcRtzvXUGtdemX5xVjTAGlKkFLWl/brdCn2U
 y+JrwrL8cQ3Fy7bXzmK6m9mmhcIW6PcSOBE7RnSESyKzqM5VBSBgnjMqnHcP3FrV
 FYzihTCDcFKy3ELUe3Kbc1TgG/07stQs9xInGXN7ZFsIwadfHgoNVQ6JpOqq1oQa
 fvmLPBhq+a4=
 =QZuG
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/thuth-gitlab/tags/pull-request-2021-05-14' into staging

* Replace YAML anchors by extends in the gitlab-CI yaml files
* Many small qtest fixes (e.g. to fix issues discovered by Coverity)
* Poison more config switches in common code
* Fix the failing Travis-CI and Cirrus-CI tasks

# gpg: Signature made Fri 14 May 2021 12:17:39 BST
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg:                 aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>" [full]
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* remotes/thuth-gitlab/tags/pull-request-2021-05-14:
  cirrus.yml: Fix the MSYS2 task
  pc-bios/s390-ccw: Fix inline assembly for older versions of Clang
  tests/qtest/migration-test: Use g_autofree to avoid leaks on error paths
  configure: Poison all current target-specific #defines
  migration: Move populate_vfio_info() into a separate file
  include/sysemu: Poison all accelerator CONFIG switches in common code
  tests: Avoid side effects inside g_assert() arguments
  tests/qtest/rtc-test: Remove pointless NULL check
  tests/qtest/tpm-util.c: Free memory with correct free function
  tests/migration-test: Fix "true" vs true
  tests/qtest/npcm7xx_pwm-test.c: Avoid g_assert_true() for non-test assertions
  tests/qtest/ahci-test.c: Calculate iso_size with 64-bit arithmetic
  util/compatfd.c: Replaced a malloc call with g_malloc.
  libqtest: refuse QTEST_QEMU_BINARY=qemu-kvm
  docs/devel/qgraph: add troubleshooting information
  libqos/qgraph: fix "UNAVAILBLE" typo
  gitlab-ci: Replace YAML anchors by extends (native_test_job)
  gitlab-ci: Replace YAML anchors by extends (native_build_job)
  gitlab-ci: Replace YAML anchors by extends (container_job)
  tests/docker/dockerfiles: Add ccache to containers where it was missing

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-05-14 19:33:23 +01:00
Thomas Huth 43bd0bf30f migration: Move populate_vfio_info() into a separate file
The CONFIG_VFIO switch only works in target specific code. Since
migration/migration.c is common code, the #ifdef does not have
the intended behavior here. Move the related code to a separate
file now which gets compiled via specific_ss instead.

Fixes: 3710586caa ("qapi: Add VFIO devices migration stats in Migration stats")
Message-Id: <20210414112004.943383-3-thuth@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2021-05-14 12:31:51 +02:00
David Hildenbrand c7c0e72408 migration/ram: Handle RAM block resizes during precopy
Resizing while migrating is dangerous and does not work as expected.
The whole migration code works on the usable_length of ram blocks and does
not expect this to change at random points in time.

In the case of precopy, the ram block size must not change on the source,
after syncing the RAM block list in ram_save_setup(), so as long as the
guest is still running on the source.

Resizing can be trigger *after* (but not during) a reset in
ACPI code by the guest
- hw/arm/virt-acpi-build.c:acpi_ram_update()
- hw/i386/acpi-build.c:acpi_ram_update()

Use the ram block notifier to get notified about resizes. Let's simply
cancel migration and indicate the reason. We'll continue running on the
source. No harm done.

Update the documentation. Postcopy will be handled separately.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210429112708.12291-5-david@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  Manual merge
2021-05-13 18:21:13 +01:00
Markus Armbruster 372043f389 migration: Drop redundant query-migrate result @blocked
Result @blocked is redundant.  Unfortunately, we realized this too
close to the release to risk dropping it, so we deprecated it
instead, in commit e11ce6c06.

Since it was deprecated from the start, we can delete it without
the customary grace period.  Do so.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210429140424.2802929-1-armbru@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-05-13 18:21:13 +01:00
Andrey Gruzdev eeccb99c9d migration: Pre-fault memory before starting background snasphot
This commit solves the issue with userfault_fd WP feature that
background snapshot is based on. For any never poluated or discarded
memory page, the UFFDIO_WRITEPROTECT ioctl() would skip updating
PTE for that page, thereby loosing WP setting for it.

So we need to pre-fault pages for each RAM block to be protected
before making a userfault_fd wr-protect ioctl().

Fixes: 278e2f551a (migration: support
  UFFD write fault processing in ram_save_iterate())
Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Reported-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210401092226.102804-4-andrey.gruzdev@virtuozzo.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert:
  Bodged ifdef __linux__ on ram_write_tracking_prepare, should really
      go in a stub
2021-04-07 18:37:28 +01:00
Andrey Gruzdev 1a8e44a89f migration: Inhibit virtio-balloon for the duration of background snapshot
The same thing as for incoming postcopy - we cannot deal with concurrent
RAM discards when using background snapshot feature in outgoing migration.

Fixes: 8518278a6a (migration: implementation
  of background snapshot thread)
Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Reported-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210401092226.102804-3-andrey.gruzdev@virtuozzo.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-04-06 18:56:01 +01:00
Andrey Gruzdev ecb23efea0 migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread
Added missing qemu_fflush() on buffer file holding precopy device state.
Increased initial QIOChannelBuffer allocation to 512KB to avoid reallocs.
Typical configurations often require >200KB for device state and VMDESC.

Fixes: 8518278a6a (migration: implementation
  of background snapshot thread)
Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Message-Id: <20210401092226.102804-2-andrey.gruzdev@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-04-06 18:56:01 +01:00
Daniel P. Berrangé cbde7be900 migrate: remove QMP/HMP commands for speed, downtime and cache size
The generic 'migrate_set_parameters' command handle all types of param.

Only the QMP commands were documented in the deprecations page, but the
rationale for deprecating applies equally to HMP, and the replacements
exist. Furthermore the HMP commands are just shims to the QMP commands,
so removing the latter breaks the former unless they get re-implemented.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-03-18 09:22:55 +00:00
Mahmoud Mandour 373969507a migration: Replaced qemu_mutex_lock calls with QEMU_LOCK_GUARD
Replaced various qemu_mutex_lock calls and their respective
qemu_mutex_unlock calls with QEMU_LOCK_GUARD macro. This simplifies
the code by eliminating the respective qemu_mutex_unlock calls.

Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com>
Message-Id: <20210311031538.5325-7-ma.mandourr@gmail.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-03-15 20:01:55 +00:00
Dr. David Alan Gilbert 3af8554bd0 migration: Add blocker information
Modify query-migrate so that it has a flag indicating if outbound
migration is blocked, and if it is a list of reasons.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210202135522.127380-2-dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08 11:19:51 +00:00
Markus Armbruster 54270c450a migration: Fix a few absurdly defective error messages
migrate_params_check() has a number of error messages of the form

    Parameter 'NAME' expects is invalid, it should be ...

Fix them to something like

    Parameter 'NAME' expects a ...

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210202141734.2488076-5-armbru@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08 11:19:51 +00:00
Markus Armbruster 8b9407a09f migration: Clean up signed vs. unsigned XBZRLE cache-size
73af8dd8d7 "migration: Make xbzrle_cache_size a migration
parameter" (v2.11.0) made the new parameter unsigned (QAPI type
'size', uint64_t in C).  It neglected to update existing code, which
continues to use int64_t.

migrate_xbzrle_cache_size() returns the new parameter.  Adjust its
return type.

QMP query-migrate-cache-size returns migrate_xbzrle_cache_size().
Adjust its return type.

migrate-set-parameters passes the new parameter to
xbzrle_cache_resize().  Adjust its parameter type.

xbzrle_cache_resize() passes it on to cache_init().  Adjust its
parameter type.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210202141734.2488076-3-armbru@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08 11:19:51 +00:00
Andrey Gruzdev 8518278a6a migration: implementation of background snapshot thread
Introducing implementation of 'background' snapshot thread
which in overall follows the logic of precopy migration
while internally utilizes completely different mechanism
to 'freeze' vmstate at the start of snapshot creation.

This mechanism is based on userfault_fd with wr-protection
support and is Linux-specific.

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210129101407.103458-5-andrey.gruzdev@virtuozzo.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08 11:19:51 +00:00
Andrey Gruzdev 6e8c25b4c6 migration: introduce 'background-snapshot' migration capability
Add new capability to 'qapi/migration.json' schema.
Update migrate_caps_check() to validate enabled capability set
against introduced one. Perform checks for required kernel features
and compatibility with guest memory backends.

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210129101407.103458-2-andrey.gruzdev@virtuozzo.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08 11:19:51 +00:00
Eric Blake 95b3a8c8a8 qapi: More complex uses of QAPI_LIST_APPEND
These cases require a bit more thought to review; in each case, the
code was appending to a list, but not with a FOOList **tail variable.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210113221013.390592-6-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Flawed change to qmp_guest_network_get_interfaces() dropped]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2021-01-28 08:08:45 +01:00
Lukas Straub b5eea99ec2 migration: Add yank feature
Register yank functions on sockets to shut them down.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <484c6a14cc2506bebedd5a237259b91363ff8f88.1609167865.git.lukasstraub2@web.de>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2021-01-13 10:21:17 +01:00
Peter Maydell 41192db338 Machine queue, 2020-12-23
Cleanup:
 * qdev code cleanup (Eduardo Habkost)
 
 Bug fix:
 * hostmem: Free host_nodes list right after visited (Keqian Zhu)
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEWjIv1avE09usz9GqKAeTb5hNxaYFAl/jteYUHGVoYWJrb3N0
 QHJlZGhhdC5jb20ACgkQKAeTb5hNxaZUHw//c40nRlYdGSV5j6w3ZCSlmZRFxZTU
 UiLK51Z3hI9Q9kyLcoIQitEYlQTIbgp0qlIJ6evDd/HvQvZ+P4P0Lzm1UGOZhD0h
 nJk5+bBkP/mzMh0P9oiN20DSLk6a3Wvdiu/bQR8gm/WdLvTM1Zjek1ns5tL06ZvA
 MziG6gIypgScu2FeNxD0zC8sDO16oVrzKq7mjZcQe6XYFRsJmYjZw84v+uu/Bdf7
 MBxolkA8vYwwBJNdVsAf7I0Gw3BeArgPUOwbWyt8/tuGIOZxYjdKIj55S7j2fuju
 524sg8Di+YzxmLZaNAGksEBMj9uY39nwdHGhNElMtWCM9oOPumlps9eyLtpTagfM
 wmiVrMGWVlXV6c4kZo8R2NSF8hcDr02S7eyrUpITrh09p4nT6fBGG2ufEYiCyNao
 o9ZqMf7NUO5J60zM5EOfdGxpaN2O0M5pXCCN48NtmqvO0wIAfTc9l/OkCrrfVbEO
 Q/X1jqbj6ZcilSIl9OeLAPi7Xjx26jMeeLPUQtoZnkqDvpk/Vz6Ka1RgGG86QA5z
 2W/KCAoVrg6dO4f9vY3x84rf0Ta5kJtp2LezPgG8d++4bMSf2jN00wYvAQuCyqqW
 zbm8f57YST3vm8XMHPlmtnlKfiLI4wbVUmrDYu3rNI+JgdvhdXseGoErt15ejAcL
 B5IH2SK4AwMpSsk=
 =bnjc
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ehabkost-gl/tags/machine-next-pull-request' into staging

Machine queue, 2020-12-23

Cleanup:
* qdev code cleanup (Eduardo Habkost)

Bug fix:
* hostmem: Free host_nodes list right after visited (Keqian Zhu)

# gpg: Signature made Wed 23 Dec 2020 21:25:58 GMT
# gpg:                using RSA key 5A322FD5ABC4D3DBACCFD1AA2807936F984DC5A6
# gpg:                issuer "ehabkost@redhat.com"
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" [full]
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost-gl/tags/machine-next-pull-request:
  bugfix: hostmem: Free host_nodes list right after visited
  qdev: Avoid unnecessary DeviceState* variable at set_prop_arraylen()
  qdev: Rename qdev_get_prop_ptr() to object_field_prop_ptr()
  qdev: Move qdev_prop_tpm declaration to tpm_prop.h
  qdev: Make qdev_class_add_property() more flexible
  qdev: Make PropertyInfo.create return ObjectProperty*
  qdev: Move dev->realized check to qdev_property_set()
  qdev: Wrap getters and setters in separate helpers
  qdev: Add name argument to PropertyInfo.create method
  qdev: Add name parameter to qdev_class_add_property()
  qdev: Avoid using prop->name unnecessarily
  qdev: Get just property name at error_set_from_qdev_prop_error()
  sparc: Use DEFINE_PROP for nwindows property
  qdev: Reuse DEFINE_PROP in all DEFINE_PROP_* macros
  qdev: Move softmmu properties to qdev-properties-system.h

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-01-01 22:57:15 +00:00
Peter Maydell 1f7c02797f QAPI patches patches for 2020-12-19
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl/dynUSHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZT3igP/3bWwsKR5vKVsDUTmMfrhcgaFvQiaYoG
 F29Bond8Xy0Zd0gl7OWh/5jKL0vGlrEVPrKfYLUjMnfkeRec/pOkIB2oOmIxpnPs
 9zi4kh2hQ3dEoRBuvSnnZzedetYPTuCpWMIjlztkgfxgcimqm8TPNVSxRaSApjC3
 Y8108wGwBWVf2C0rhKO9E2xA51uo6khy05i1psUtqUlC+PuDQ/OwzQHM2dnWdDB6
 kUwBDK17nhL6WwsYqCyKLSiDModReYfDiY8GS5MDLo74dzwXiatEefCR7+sbM4xq
 eX/SBoqoeS1jLPNuCryNeGNKvNA2KAbEJTnbQA2NxBXHgZ9/1SxVZFxuPp4nDMSQ
 N7BDuDI8YtJE479RjT/ZzRG65xadGBSe/HXkXM9mZwh1zitop8SVZ9fArFBHvNzw
 Y5zAv3fQd54+87psffg4dYFK0wGmqTabLEEuVzM8KIVqcAdYA2yC2b2EHy+vsxuq
 GMkr0WaA6Sq2gthXmzdTjmUPuHdan/NIhuV6d66SbPNH2oH31piptFxuznyFWSKV
 isciFFdUrkg5QrF8DSt2nmdwMFf8QGbszqP8QIGMzhJCCS9GXIiGG8f149++q8X8
 HO1lFAdLQJdrDwCYmfx36tOvi2rS/rcoTGgvg66UX3xKko1ruoxR1ZWcS54obJN6
 vEQDZ+PxubDg
 =vGLy
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2020-12-19' into staging

QAPI patches patches for 2020-12-19

# gpg: Signature made Sat 19 Dec 2020 09:40:05 GMT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-qapi-2020-12-19: (33 commits)
  qobject: Make QString immutable
  block: Use GString instead of QString to build filenames
  keyval: Use GString to accumulate value strings
  json: Use GString instead of QString to accumulate strings
  migration: Replace migration's JSON writer by the general one
  qobject: Factor JSON writer out of qobject_to_json()
  qobject: Factor quoted_str() out of to_json()
  qobject: Drop qstring_get_try_str()
  qobject: Drop qobject_get_try_str()
  Revert "qobject: let object_property_get_str() use new API"
  block: Avoid qobject_get_try_str()
  qmp: Fix tracing of non-string command IDs
  qobject: Move internals to qobject-internal.h
  hw/rdma: Replace QList by GQueue
  Revert "qstring: add qstring_free()"
  qobject: Change qobject_to_json()'s value to GString
  qobject: Use GString instead of QString to accumulate JSON
  qobject: Make qobject_to_json_pretty() take a pretty argument
  monitor: Use GString instead of QString for output buffer
  hmp: Simplify how qmp_human_monitor_command() gets output
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-01-01 14:33:03 +00:00
Eric Blake 54aa3de72e qapi: Use QAPI_LIST_PREPEND() where possible
Anywhere we create a list of just one item or by prepending items
(typically because order doesn't matter), we can use
QAPI_LIST_PREPEND().  But places where we must keep the list in order
by appending remain open-coded until later patches.

Note that as a side effect, this also performs a cleanup of two minor
issues in qga/commands-posix.c: the old code was performing
 new = g_malloc0(sizeof(*ret));
which 1) is confusing because you have to verify whether 'new' and
'ret' are variables with the same type, and 2) would conflict with C++
compilation (not an actual problem for this file, but makes
copy-and-paste harder).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201113011340.463563-5-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
[Straightforward conflicts due to commit a8aa94b5f8 "qga: update
schema for guest-get-disks 'dependents' field" and commit a10b453a52
"target/mips: Move mips_cpu_add_definition() from helper.c to cpu.c"
resolved.  Commit message tweaked.]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2020-12-19 10:20:14 +01:00
Eric Blake eaedde5255 migration: Refactor migrate_cap_add
Instead of taking a list parameter and returning a new head at a
distance, just return the new item for the caller to insert into a
list via QAPI_LIST_PREPEND.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201113011340.463563-4-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2020-12-19 10:15:08 +01:00
Eduardo Habkost ce35e2295e qdev: Move softmmu properties to qdev-properties-system.h
Move the property types and property macros implemented in
qdev-properties-system.c to a new qdev-properties-system.h
header.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20201211220529.2290218-16-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-12-18 15:20:17 -05:00
Tuguoyi 36d0fe6516 migration: Don't allow migration if vm is in POSTMIGRATE
The following steps will cause qemu assertion failure:
- pause vm by executing 'virsh suspend'
- create external snapshot of memory and disk using 'virsh snapshot-create-as'
- doing the above operation again will cause qemu crash

The backtrace looks like:
    at /build/qemu-5.0/migration/savevm.c:1401
    at /build/qemu-5.0/migration/savevm.c:1453

When the first migration completes, bs->open_flags will set BDRV_O_INACTIVE
flag by bdrv_inactivate_all(), and during the second migration the
bdrv_inactivate_recurse assert that the bs->open_flags is already
BDRV_O_INACTIVE enabled which cause crash.

As Vladimir suggested, this patch makes migrate_prepare check the state of vm and
return error if it is in RUN_STATE_POSTMIGRATE state.

Signed-off-by: Tuguoyi <tu.guoyi@h3c.com>
Message-Id: <6b704294ad2e405781c38fb38d68c744@h3c.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reported-by: Li Zhang <li.zhang@cloud.ionos.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-12-18 10:08:25 +00:00
Paolo Bonzini e69d50d621 migration, vl: start migration via qmp_migrate_incoming
Make qemu_start_incoming_migration local to migration/migration.c.
By using the runstate instead of a separate flag, vl need not do
anything to setup deferred incoming migration.

qmp_migrate_incoming also does not need the deferred_incoming flag
anymore, because "-incoming PROTOCOL" will clear the "once" flag
before the main loop starts.  Therefore, later invocations of
the migrate-incoming command will fail with the existing
"The incoming migration has already been started" error message.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-10 12:15:14 -05:00
Paolo Bonzini e0d17dfd22 vl: move various initialization routines out of qemu_init
Some very simple initialization routines can be nested in existing
subsystem-level functions, do that to simplify qemu_init.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-10 12:15:11 -05:00
Longpeng(Mike) 6ba11211bd migration: handle CANCELLING state in migration_completion()
The following sequence may cause the VM abort during migration:

1. RUN_STATE_RUNNING,MIGRATION_STATUS_ACTIVE

2. before call migration_completion(), we send migrate_cancel
   QMP command, the state machine is changed to:
     RUN_STATE_RUNNING,MIGRATION_STATUS_CANCELLING

3. call migration_completion(), and the state machine is
   switch to: RUN_STATE_RUNNING,MIGRATION_STATUS_COMPLETED

4. call migration_iteration_finish(), because the migration
   status is COMPLETED, so it will try to set the runstate
   to POSTMIGRATE, but RUNNING-->POSTMIGRATE is an invalid
   transition, so abort().

The migration_completion() should not change the migration state
to COMPLETED if it is already changed to CANCELLING.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Message-Id: <20201105091726.148-1-longpeng2@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-12 15:52:20 +00:00
Chen Qun a24292830b migration: fix uninitialized variable warning in migrate_send_rp_req_pages()
After the WITH_QEMU_LOCK_GUARD macro is added, the compiler cannot identify
 that the statements in the macro must be executed. As a result, some variables
 assignment statements in the macro may be considered as unexecuted by the compiler.

When the -Wmaybe-uninitialized capability is enabled on GCC9,the compiler showed warning:
migration/migration.c: In function ‘migrate_send_rp_req_pages’:
migration/migration.c:384:8: warning: ‘received’ may be used uninitialized in this function [-Wmaybe-uninitialized]
 384 |     if (received) {
     |        ^

Add a default value for 'received' to prevented the warning.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20201111142203.2359370-6-kuhn.chenqun@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-12 14:49:16 +00:00
Kirti Wankhede 3710586caa qapi: Add VFIO devices migration stats in Migration stats
Added amount of bytes transferred to the VM at destination by all VFIO
devices

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:51 -07:00
Peter Maydell d55450df99 migration pull: 2020-10-26
Another go at Peter's postcopy fixes
 
 Cleanups from Bihong Yu and Peter Maydell.
 
 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAl+W9n8ACgkQBRYzHrxb
 /ef2uRAAqWTFLXuBF8+evEd1mMq2SM3ZYTuc7QKTY3MzAH6J/OMvJbZ112itqWOb
 iZ5NuuWH4PvzOhlR/PNNf1Yv3hTfv36HinG+OCh6s+6aqVx9yHOAfdBgmJIdYAeg
 Sk1jx43dvCyN2FwPs31ir3L6mwsrtfkRsS+2FeyrvRoEl4WE9mOoypCft3vdd9Dw
 zZea0Pw7vIs454D4n1vpJiQtq6B4eSAlQKpTLfQbglpTm4MgqLERzGvpT6hbQXJR
 eQyTOqRe08viIOZ+oN0B/+RVO6T9jc4Y1bEl2NSak1v4Tf7NNfDkFpLAjFm07V/1
 tIhL/NOOsHdzfHQtrZpzKQgwaceb1N5qo0PfxD6/tRf9HlXY54iw6yY75+5c5Y89
 UK8VSIYKnM2yXeVDLShxixIr3A1Z+zA41XydDwaLZczjeV7+nwrAXAjO8a+j6Dox
 zj4IyN2g5elEOmarC8qkvbDZ+TVvA2tookhWVwoz+D8ChYkcRDKP9eoYomfRwg+e
 NKRFuLBkyVPb0eEhyOV6HqJbMfTLpHneTM94v6HGz8tiK8IlMZfTTnC2Mr5gTXuS
 /cgOVhsY7+l+pKpxpGJmU3aUCYRk1CuK6MhXgjYEFMh5Siba8s0ZPZVaEm/BUyO1
 rD+tVup87xMiJq3xnmLX+opblYE9G+b67hH1KuPc5vZXiSwuTkQ=
 =OL0Q
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20201026a' into staging

migration pull: 2020-10-26

Another go at Peter's postcopy fixes

Cleanups from Bihong Yu and Peter Maydell.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

# gpg: Signature made Mon 26 Oct 2020 16:17:03 GMT
# gpg:                using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert/tags/pull-migration-20201026a:
  migration-test: Only hide error if !QTEST_LOG
  migration/postcopy: Release fd before going into 'postcopy-pause'
  migration: Sync requested pages after postcopy recovery
  migration: Maintain postcopy faulted addresses
  migration: Introduce migrate_send_rp_message_req_pages()
  migration: Pass incoming state into qemu_ufd_copy_ioctl()
  migration: using trace_ to replace DPRINTF
  migration: Delete redundant spaces
  migration: Open brace '{' following function declarations go on the next line
  migration: Do not initialise statics and globals to 0 or NULL
  migration: Add braces {} for if statement
  migration: Open brace '{' following struct go on the same line
  migration: Add spaces around operator
  migration: Don't use '#' flag of printf format
  migration: Do not use C99 // comments
  migration: Drop unused VMSTATE_FLOAT64 support

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-10-27 10:25:42 +00:00
Peter Xu d246ea5039 migration/postcopy: Release fd before going into 'postcopy-pause'
Logically below race could trigger with the old code:

          test program                        migration thread
          ------------                        ----------------
       wait_until('postcopy-pause')
                                          postcopy_pause()
                                            set_state('postcopy-pause')
       do_postcopy_recover()
         arm s->to_dst_file with new fd
                                            release s->to_dst_file [1]

Here [1] could have released the just-installed recoverying channel.  Then the
migration could hang without really resuming.

Instead, it should be very safe to release the fd before setting the state into
'postcopy-pause', because there's no reason for any other thread to touch it
during 'postcopy-active'.

Dave reported a very rare postcopy recovery hang that the migration-test
program waited for the migration to complete in migrate_postcopy_complete().
We do suspect it's the same thing that we're gonna fix here.  Hard to tell.
However since we've noticed this, fix this irrelevant of the hang report.

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201021212721.440373-6-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26 16:15:04 +00:00
Peter Xu 8f8bfffcf1 migration: Maintain postcopy faulted addresses
Maintain a list of faulted addresses on the destination host for which we're
waiting on.  This is implemented using a GTree rather than a real list to make
sure even there're plenty of vCPUs/threads that are faulting, the lookup will
still be fast with O(log(N)) (because we'll do that after placing each page).
It should bring a slight overhead, but ideally that shouldn't be a big problem
simply because in most cases the requested page list will be short.

Actually we did similar things for postcopy blocktime measurements.  This patch
didn't use that simply because:

  (1) blocktime measurement is towards vcpu threads only, but here we need to
      record all faulted addresses, including main thread and external
      thread (like, DPDK via vhost-user).

  (2) blocktime measurement will require UFFD_FEATURE_THREAD_ID, but here we
      don't want to add that extra dependency on the kernel version since not
      necessary.  E.g., we don't need to know which thread faulted on which
      page, we also don't care about multiple threads faulting on the same
      page.  But we only care about what addresses are faulted so waiting for a
      page copying from src.

  (3) blocktime measurement is not enabled by default.  However we need this by
      default especially for postcopy recover.

Another thing to mention is that this patch introduced a new mutex to serialize
the receivedmap and the page_requested tree, however that serialization does
not cover other procedures like UFFDIO_COPY.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201021212721.440373-4-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26 16:15:04 +00:00
Peter Xu 7a267fc49b migration: Introduce migrate_send_rp_message_req_pages()
This is another layer wrapper for sending a page request to the source VM.  The
new migrate_send_rp_message_req_pages() will be used elsewhere in coming
patches.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201021212721.440373-3-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26 16:15:04 +00:00
Bihong Yu 395cb45009 migration: Add spaces around operator
Signed-off-by: Bihong Yu <yubihong@huawei.com>
Reviewed-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1603163448-27122-4-git-send-email-yubihong@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26 16:15:04 +00:00
Paolo Bonzini 9f2931bc65 machine: remove deprecated -machine enforce-config-section option
Deprecated since 3.1 and complicates the initialization sequence,
remove it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-26 07:08:39 -04:00
Markus Armbruster 2155ceaf25 error: Remove NULL checks on error_propagate() calls (again)
Patch created mechanically by rerunning:

    $ spatch --sp-file scripts/coccinelle/error_propagate_null.cocci \
             --macro-file scripts/cocci-macro-file.h \
             --use-gitgrep .

Cc: Jens Freimann <jfreimann@redhat.com>
Cc: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
Cc: Juan Quintela <quintela@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200722084048.1726105-4-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-10-09 08:36:23 +02:00
Dr. David Alan Gilbert 1783c00fc9 Revert "migration: Properly destroy variables on incoming side"
This reverts commit c02039a6f3.

This is breaking test 068 that does a loadvm twice.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-09-28 16:18:02 +01:00
Chuan Zheng d8053e73fb migration/tls: save hostname into MigrationState
hostname is need in multifd-tls, save hostname into MigrationState.

Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Signed-off-by: Yan Jin <jinyan12@huawei.com>
Message-Id: <1600139042-104593-2-git-send-email-zhengchuan@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-09-25 12:45:58 +01:00
Laurent Vivier 7590a2ae09 migration: increase max-bandwidth to 128 MiB/s (1 Gib/s)
max-bandwidth is set by default to 32 MiB/s (256 Mib/s)
since 2008 (5bb7910af0).

Most of the CPUs can dirty memory faster than that now,
and this is clearly a problem with POWER where the page
size is 64 kiB and not 4 KiB.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20200921144957.979989-1-lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-09-25 12:45:58 +01:00
Peter Xu 2e2bce167e migration: Rework migrate_send_rp_req_pages() function
We duplicated the logic of maintaining the last_rb variable at both callers of
this function.  Pass *rb pointer into the function so that we can avoid
duplicating the logic.  Also, when we have the rb pointer, it's also easier to
remove the original 2nd & 4th parameters, because both of them (name of the
ramblock when needed, or the page size) can be fetched from the ramblock
pointer too.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200908203022.341615-3-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-09-25 11:11:01 +01:00
Peter Xu c02039a6f3 migration: Properly destroy variables on incoming side
In migration_incoming_state_destroy(), we've got a few variables that aren't
destroyed properly, namely:

    main_thread_load_event
    postcopy_pause_sem_dst
    postcopy_pause_sem_fault
    rp_mutex

Destroy them properly.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200908203022.341615-2-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-09-25 11:11:01 +01:00
Stefan Hajnoczi d73415a315 qemu/atomic.h: rename atomic_ to qatomic_
clang's C11 atomic_fetch_*() functions only take a C11 atomic type
pointer argument. QEMU uses direct types (int, etc) and this causes a
compiler error when a QEMU code calls these functions in a source file
that also included <stdatomic.h> via a system header file:

  $ CC=clang CXX=clang++ ./configure ... && make
  ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid)

Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h
and <stdatomic.h> can co-exist. I checked /usr/include on my machine and
searched GitHub for existing "qatomic_" users but there seem to be none.

This patch was generated using:

  $ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \
    sort -u >/tmp/changed_identifiers
  $ for identifier in $(</tmp/changed_identifiers); do
        sed -i "s%\<$identifier\>%q$identifier%g" \
            $(git grep -I -l "\<$identifier\>")
    done

I manually fixed line-wrap issues and misaligned rST tables.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
2020-09-23 16:07:44 +01:00
Longpeng(Mike) 9ba3b2baa1 migration: add vsock as data channel support
The vsock channel is more widely use in some new features, for example,
the Nitro/Enclave. It can also be used as the migration channel.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Message-Id: <20200806074030.174-3-longpeng2@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-08-28 13:34:52 +01:00
Longpeng(Mike) d658f65c16 migration: unify the framework of socket-type channel
Currently, the only difference of tcp channel and unix channel in
migration/socket.c is the way to build SocketAddress, but socket_parse()
can handle these two types, so use it to instead of tcp_build_address()
and unix_build_address().

The socket-type channel can be further unified based on the up, this
would be helpful for us to add other socket-type channels.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Message-Id: <20200806074030.174-2-longpeng2@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-08-28 13:34:52 +01:00
Max Reitz 31e4c354b3 migration: Add block-bitmap-mapping parameter
This migration parameter allows mapping block node names and bitmap
names to aliases for the purpose of block dirty bitmap migration.

This way, management tools can use different node and bitmap names on
the source and destination and pass the mapping of how bitmaps are to be
transferred to qemu (on the source, the destination, or even both with
arbitrary aliases in the migration stream).

While touching this code, fix a bug where bitmap names longer than 255
bytes would fail an assertion in qemu_put_counted_string().

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200820150725.68687-2-mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-08-21 08:56:09 -05:00
Vladimir Sementsov-Ogievskiy 1499ab0969 migration/block-dirty-bitmap: cancel migration on shutdown
If target is turned off prior to postcopy finished, target crashes
because busy bitmaps are found at shutdown.
Canceling incoming migration helps, as it removes all unfinished (and
therefore busy) bitmaps.

Similarly on source we crash in bdrv_close_all which asserts that all
bdrv states are removed, because bdrv states involved into dirty bitmap
migration are referenced by it. So, we need to cancel outgoing
migration as well.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-Id: <20200727194236.19551-17-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-07-27 15:41:34 -05:00
Vladimir Sementsov-Ogievskiy d0cccbd118 migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init
No reasons to keep two public init functions.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200727194236.19551-11-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-07-27 15:39:59 -05:00
Liao Pingfang eb9bd46ff6 migration/migration.c: Remove superfluous breaks
Remove superfluous breaks, as there is a "return" before them.

Signed-off-by: Liao Pingfang <liao.pingfang@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2020-07-13 18:15:36 +02:00
Zheng Chuan 9728ebfb77 migration: fix memory leak in qmp_migrate_set_parameters
"tmp.tls_hostname" and "tmp.tls_creds" allocated by migrate_params_test_apply()
is forgot to free at the end of qmp_migrate_set_parameters(). Fix that.

The leak stack:
Direct leak of 2 byte(s) in 2 object(s) allocated from:
   #0 0xffffb597c20b in __interceptor_malloc (/usr/lib64/libasan.so.4+0xd320b)
   #1 0xffffb52dcb1b in g_malloc (/usr/lib64/libglib-2.0.so.0+0x58b1b)
   #2 0xffffb52f8143 in g_strdup (/usr/lib64/libglib-2.0.so.0+0x74143)
   #3 0xaaaac52447fb in migrate_params_test_apply (/usr/src/debug/qemu-4.1.0/migration/migration.c:1377)
   #4 0xaaaac52fdca7 in qmp_migrate_set_parameters (/usr/src/debug/qemu-4.1.0/qapi/qapi-commands-migration.c:192)
   #5 0xaaaac551d543 in qmp_dispatch (/usr/src/debug/qemu-4.1.0/qapi/qmp-dispatch.c:165)
   #6 0xaaaac52a0a8f in qmp_dispatch (/usr/src/debug/qemu-4.1.0/monitor/qmp.c:125)
   #7 0xaaaac52a1c7f in monitor_qmp_dispatch (/usr/src/debug/qemu-4.1.0/monitor/qmp.c:214)
   #8 0xaaaac55cb0cf in aio_bh_call (/usr/src/debug/qemu-4.1.0/util/async.c:117)
   #9 0xaaaac55d4543 in aio_bh_poll (/usr/src/debug/qemu-4.1.0/util/aio-posix.c:459)
   #10 0xaaaac55cae0f in aio_dispatch (/usr/src/debug/qemu-4.1.0/util/async.c:268)
   #11 0xffffb52d6a7b in g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0+0x52a7b)
   #12 0xaaaac55d1e3b(/usr/bin/qemu-kvm-4.1.0+0x1622e3b)
   #13 0xaaaac4e314bb(/usr/bin/qemu-kvm-4.1.0+0xe824bb)
   #14 0xaaaac47f45ef(/usr/bin/qemu-kvm-4.1.0+0x8455ef)
   #15 0xffffb4bfef3f in __libc_start_main (/usr/lib64/libc.so.6+0x23f3f)
   #16 0xaaaac47ffacb(/usr/bin/qemu-kvm-4.1.0+0x850acb)

Direct leak of 2 byte(s) in 2 object(s) allocated from:
   #0 0xffffb597c20b in __interceptor_malloc (/usr/lib64/libasan.so.4+0xd320b)
   #1 0xffffb52dcb1b in g_malloc (/usr/lib64/libglib-2.0.so.0+0x58b1b)
   #2 0xffffb52f8143 in g_strdup (/usr/lib64/libglib-2.0.so.0+0x74143)
   #3 0xaaaac5244893 in migrate_params_test_apply (/usr/src/debug/qemu-4.1.0/migration/migration.c:1382)
   #4 0xaaaac52fdca7 in qmp_migrate_set_parameters (/usr/src/debug/qemu-4.1.0/qapi/qapi-commands-migration.c:192)
   #5 0xaaaac551d543 in qmp_dispatch (/usr/src/debug/qemu-4.1.0/qapi/qmp-dispatch.c)
   #6 0xaaaac52a0a8f in qmp_dispatch (/usr/src/debug/qemu-4.1.0/monitor/qmp.c:125)
   #7 0xaaaac52a1c7f in monitor_qmp_dispatch (/usr/src/debug/qemu-4.1.0/monitor/qmp.c:214)
   #8 0xaaaac55cb0cf in aio_bh_call (/usr/src/debug/qemu-4.1.0/util/async.c:117)
   #9 0xaaaac55d4543 in aio_bh_poll (/usr/src/debug/qemu-4.1.0/util/aio-posix.c:459)
   #10 0xaaaac55cae0f in in aio_dispatch (/usr/src/debug/qemu-4.1.0/util/async.c:268)
   #11 0xffffb52d6a7b in g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0+0x52a7b)
   #12 0xaaaac55d1e3b(/usr/bin/qemu-kvm-4.1.0+0x1622e3b)
   #13 0xaaaac4e314bb(/usr/bin/qemu-kvm-4.1.0+0xe824bb)
   #14 0xaaaac47f45ef (/usr/bin/qemu-kvm-4.1.0+0x8455ef)
   #15 0xffffb4bfef3f in __libc_start_main (/usr/lib64/libc.so.6+0x23f3f)
   #16 0xaaaac47ffacb(/usr/bin/qemu-kvm-4.1.0+0x850acb)

Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Reviewed-by: KeQian Zhu <zhukeqian1@huawei.com>
Reviewed-by: HaiLiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2020-07-13 18:13:49 +02:00
Claudio Fontana b0c3cf9407 cpu-throttle: new module, extracted from cpus.c
move the vcpu throttling functionality into its own module.

This functionality is not specific to any accelerator,
and it is used currently by migration to slow down guests to try to
have migrations converge, and by the cocoa MacOS UI to throttle speed.

cpu-throttle contains the controls to adjust and inspect throttle
settings, start (set) and stop vcpu throttling, and the throttling
function itself that is run periodically on vcpus to make them take a nap.

Execution of the throttling function on all vcpus is triggered by a timer,
registered at module initialization.

No functionality change.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20200629093504.3228-3-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 18:04:49 -04:00
David Hildenbrand 18b1d3c952 migration/colo: Use ram_block_discard_disable()
COLO will copy all memory in a RAM block, disable discarding of RAM.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20200626072248.78761-10-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-07-02 05:54:59 -04:00
David Hildenbrand 06df2e692a virtio-balloon: Rip out qemu_balloon_inhibit()
The only remaining special case is postcopy. It cannot handle
concurrent discards yet, which would result in requesting already sent
pages from the source. Special-case it in virtio-balloon instead.

Introduce migration_in_incoming_postcopy(), to find out if incoming
postcopy is active.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20200626072248.78761-7-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-07-02 05:54:59 -04:00
Markus Armbruster 2194abd623 qdev: qdev_create(), qdev_try_create() are now unused, drop
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200610053247.1583243-31-armbru@redhat.com>
2020-06-15 22:05:28 +02:00
Lukas Straub 773861274a migration/migration.c: Fix hang in ram_save_host_page
migration_rate_limit will erroneously ratelimit a shutdown socket,
which causes the migration thread to hang in ram_save_host_page
if the socket is shutdown.

Fix this by explicitly testing if the socket has errors or was
shutdown in migration_rate_limit.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Message-Id: <e79085bbe2d46dfa007dd41820194d5e2d4fcd80.1590007004.git.lukasstraub2@web.de>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-06-01 18:44:27 +01:00
Wei Wang e460a4b1a4 migration/xbzrle: add encoding rate
Users may need to check the xbzrle encoding rate to know if the guest
memory is xbzrle encoding-friendly, and dynamically turn off the
encoding if the encoding rate is low.

Signed-off-by: Yi Sun <yi.y.sun@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Message-Id: <1588208375-19556-1-git-send-email-wei.w.wang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-05-07 17:40:24 +01:00
Keqian Zhu cbbf818224 migration/throttle: Add cpu-throttle-tailslow migration parameter
At the tail stage of throttling, the Guest is very sensitive to
CPU percentage while the @cpu-throttle-increment is excessive
usually at tail stage.

If this parameter is true, we will compute the ideal CPU percentage
used by the Guest, which may exactly make the dirty rate match the
dirty rate threshold. Then we will choose a smaller throttle increment
between the one specified by @cpu-throttle-increment and the one
generated by ideal CPU percentage.

Therefore, it is compatible to traditional throttling, meanwhile
the throttle increment won't be excessive at tail stage. This may
make migration time longer, and is disabled by default.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Message-Id: <20200413101508.54793-1-zhukeqian1@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-05-07 17:40:24 +01:00
Mao Zhongyi 7ac5529afb migration: move the units of migrate parameters from milliseconds to ms
Signed-off-by: Mao Zhongyi <maozhongyi@cmss.chinamobile.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <474bb6cf67defb8be9de5035c11aee57a680557a.1585641083.git.maozhongyi@cmss.chinamobile.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-05-07 17:40:24 +01:00
Mao Zhongyi 2ee30cf078 migration/migration: improve error reporting for migrate parameters
use QERR_INVALID_PARAMETER_VALUE instead of
"Parameter '%s' expects" for consistency.

Signed-off-by: Mao Zhongyi <maozhongyi@cmss.chinamobile.com>
Message-Id: <4ce71da4a5f98ad6ead0806ec71043473dcb4c07.1585641083.git.maozhongyi@cmss.chinamobile.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-05-07 17:40:24 +01:00
Mao Zhongyi ed8b2828cc migration: fix bad indentation in error_report()
bad indentation conflicts with CODING_STYLE doc.

Signed-off-by: Mao Zhongyi <maozhongyi@cmss.chinamobile.com>
Message-Id: <09f7529c665cac0c6a5e032ac6fdb6ca701f7e37.1585329482.git.maozhongyi@cmss.chinamobile.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-05-07 17:40:24 +01:00
Daniel Brodsky 6e8a355de6 lockable: replaced locks with lock guard macros where appropriate
- ran regexp "qemu_mutex_lock\(.*\).*\n.*if" to find targets
- replaced result with QEMU_LOCK_GUARD if all unlocks at function end
- replaced result with WITH_QEMU_LOCK_GUARD if unlock not at end

Signed-off-by: Daniel Brodsky <dnbrdsky@gmail.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-id: 20200404042108.389635-3-dnbrdsky@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-05-04 16:07:43 +01:00
Marc-André Lureau 9cbc36497c migration: fix cleanup_bh leak on resume
Since commit 8c6b0356b5 ("util/async:
make bh_aio_poll() O(1)"), migration-test reveals a leak:

QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64
tests/qtest/migration-test  -p /x86_64/migration/postcopy/recovery
tests/qtest/libqtest.c:140: kill_qemu() tried to terminate QEMU
process but encountered exit status 1 (expected 0)

=================================================================
==2082571==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7f25971dfc58 in __interceptor_malloc (/lib64/libasan.so.5+0x10dc58)
    #1 0x7f2596d08358 in g_malloc (/lib64/libglib-2.0.so.0+0x57358)
    #2 0x560970d006f8 in qemu_bh_new /home/elmarco/src/qemu/util/main-loop.c:532
    #3 0x5609704afa02 in migrate_fd_connect
/home/elmarco/src/qemu/migration/migration.c:3407
    #4 0x5609704b6b6f in migration_channel_connect
/home/elmarco/src/qemu/migration/channel.c:92
    #5 0x5609704b2bfb in socket_outgoing_migration
/home/elmarco/src/qemu/migration/socket.c:108
    #6 0x560970b9bd6c in qio_task_complete /home/elmarco/src/qemu/io/task.c:196
    #7 0x560970b9aa97 in qio_task_thread_result
/home/elmarco/src/qemu/io/task.c:111
    #8 0x7f2596cfee3a  (/lib64/libglib-2.0.so.0+0x4de3a)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20200325184723.2029630-2-marcandre.lureau@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-02 14:55:45 -04:00
Mao Zhongyi 7cd75cbdb8 migration: use "" instead of (null) for tls-authz
run:
(qemu) info migrate_parameters
announce-initial: 50 ms
...
announce-max: 550 ms
multifd-compression: none
xbzrle-cache-size: 4194304
max-postcopy-bandwidth: 0
 tls-authz: '(null)'

Migration parameter 'tls-authz' is used to provide the QOM ID
of a QAuthZ subclass instance that provides the access control
check, default is NULL. But the empty string is not a valid
object ID, so use "" instead of the default. Although it will
fail when lookup an object with ID "", it is harmless, just
consistent with tls_creds.

As a bonus, this patch also fixed the bad indentation on the
last line and removed 'has_tls_authz' redundant check in
'hmp_info_migrate_parameters'.

Signed-off-by: Mao Zhongyi <maozhongyi@cmss.chinamobile.com>
Message-Id: <119f539a9f4d198bc3bcced46b8280520d60bc51.1585100802.git.maozhongyi@cmss.chinamobile.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-03-25 12:31:38 +00:00
Mao Zhongyi 06b1c6f8b7 xbzrle: update xbzrle doc
Add new parameter description, also:
1. Remove unsociable space.
2. Nit picking: s/two/2 in report

Signed-off-by: Mao Zhongyi <maozhongyi@cmss.chinamobile.com>
Message-Id: <20200320143216.423374-1-maozhongyi@cmss.chinamobile.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-03-25 12:31:38 +00:00
zhanghailiang 19dd408a47 migration: recognize COLO as part of activating process
We will migrate parts of dirty pages backgroud lively during the gap time
of two checkpoints, without this modification, it will not work
because ram_save_iterate() will check it before send RAM_SAVE_FLAG_EOS
at the end of it.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Message-Id: <20200224065414.36524-7-zhang.zhanghailiang@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-03-13 09:36:30 +00:00
Keqian Zhu dc14a47076 migration/throttle: Add throttle-trig-thres migration parameter
Currently, if the bytes_dirty_period is more than the 50% of
bytes_xfer_period, we start or increase throttling.

If we make this percentage higher, then we can tolerate higher
dirty rate during migration, which means less impact on guest.
The side effect of higher percentage is longer migration time.
We can make this parameter configurable to switch between mig-
ration time first or guest performance first.

The default value is 50 and valid range is 1 to 100.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Message-Id: <20200224023142.39360-1-zhukeqian1@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-03-13 09:36:30 +00:00
zhanghailiang f51d0b4178 savevm: Don't call colo_init_ram_cache twice
This helper has been called twice which is wrong.
Left the one where called while get COLO enable message
from source side.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2020-02-28 10:13:54 +01:00
zhanghailiang 0306dae5ac migration: fix COLO broken caused by a previous commit
This commit "migration: Create migration_is_running()" broke
COLO. Becuase there is a process broken by this commit.

colo_process_checkpoint
 ->colo_do_checkpoint_transaction
   ->migrate_set_block_enabled
     ->qmp_migrate_set_capabilities

It can be fixed by make COLO process as an exception,
Maybe we need a better way to fix it.

Cc: Juan Quintela <quintela@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2020-02-28 10:13:54 +01:00
Juan Quintela 6a9ad15420 multifd: Add multifd-zstd-level parameter
This parameter specifies the zstd compression level. The next patch
will put it to use.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
2020-02-28 09:25:28 +01:00
Juan Quintela 9004db48c0 multifd: Add multifd-zlib-level parameter
This parameter specifies the zlib compression level. The next patch
will put it to use.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-02-28 09:24:43 +01:00
Juan Quintela ab7cbb0b9a multifd: Make no compression operations into its own structure
It will be used later.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

---

No comp value needs to be zero.
2020-02-28 09:24:43 +01:00
Juan Quintela 96eef04238 multifd: Add multifd-compression parameter
This will store the compression method to use.  We start with none.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

---

Rename multifd-method to multifd-compression
2020-02-28 09:24:43 +01:00
Keqian Zhu d05de9e39a migration: Optimization about wait-unplug migration state
qemu_savevm_nr_failover_devices() is originally designed to
get the number of failover devices, but it actually returns
the number of "unplug-pending" failover devices now. Moreover,
what drives migration state to wait-unplug should be the number
of "unplug-pending" failover devices, not all failover devices.

We can also notice that qemu_savevm_state_guest_unplug_pending()
and qemu_savevm_nr_failover_devices() is equivalent almost (from
the code view). So the latter is incorrect semantically and
useless, just delete it.

In the qemu_savevm_state_guest_unplug_pending(), once hit a
unplug-pending failover device, then it can return true right
now to save cpu time.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Tested-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2020-02-13 10:53:10 +01:00
Zhimin Feng 8958338b10 migration: Maybe VM is paused when migration is cancelled
If the migration is cancelled when it is in the completion phase,
the migration state is set to MIGRATION_STATUS_CANCELLING.
The VM maybe wait for the 'pause_sem' semaphore in migration_maybe_pause
function, so that VM always is paused.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Zhimin Feng <fengzhimin1@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2020-02-13 10:52:58 +01:00
Juan Quintela d32ca5ad79 multifd: Split multifd code into its own file
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-29 11:28:59 +01:00
Juan Quintela b673eab4e2 multifd: Make multifd_load_setup() get an Error parameter
We need to change the full chain to pass the Error parameter.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-29 11:28:59 +01:00
Juan Quintela 00f4b572e6 multifd: Make multifd_save_setup() get an Error parameter
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-29 11:28:59 +01:00
Juan Quintela 392d87e213 migration: Create migration_is_running()
This function returns true if we are in the middle of a migration.
It is like migration_is_setup_or_active() with CANCELLING and COLO.
Adapt all callers that are needed.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-29 11:28:59 +01:00
Marc-André Lureau 4f67d30b5e qdev: set properties with device_class_set_props()
The following patch will need to handle properties registration during
class_init time. Let's use a device_class_set_props() setter.

spatch --macro-file scripts/cocci-macro-file.h  --sp-file
./scripts/coccinelle/qdev-set-props.cocci --keep-comments --in-place
--dir .

@@
typedef DeviceClass;
DeviceClass *d;
expression val;
@@
- d->props = val
+ device_class_set_props(d, val)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20200110153039.1379601-20-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24 20:59:15 +01:00
Wei Yang 644acf99b8 migration/postcopy: enable compress during postcopy
postcopy requires to place a whole host page, while migration thread
migrate memory in target page size. This makes postcopy need to collect
all target pages in one host page before placing via userfaultfd.

To enable compress during postcopy, there are two problems to solve:

    1. Random order for target page arrival
    2. Target pages in one host page arrives without interrupt by target
       page from other host page

The first one is handled by previous cleanup patch.

This patch handles the second one by:

    1. Flush compress thread for each host page
    2. Wait for decompress thread for before placing host page

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2020-01-20 09:10:23 +01:00
Yury Kotov 603d5a42d3 migration: Fix the re-run check of the migrate-incoming command
The current check sets an error but doesn't fail the command.
This may cause a problem if new connection attempt by the same URI
affects the first connection.

Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2020-01-20 09:10:22 +01:00
Fangrui Song 2667c98722 migration: Fix incorrect integer->float conversion caught by clang
Clang does not like qmp_migrate_set_downtime()'s code to clamp double
@value to 0..INT64_MAX:

    qemu/migration/migration.c:2038:24: error: implicit conversion from 'long' to 'double' changes value from 9223372036854775807 to 9223372036854775808 [-Werror,-Wimplicit-int-float-conversion]

The warning will be enabled by default in clang 10. It is not
available for clang <= 9.

The clamp is actually useless; @value is checked to be within
0..MAX_MIGRATE_DOWNTIME_SECONDS immediately before.  Delete it.

While there, make the conversion from double to int64_t explicit.

Signed-off-by: Fangrui Song <i@maskray.me>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
[Patch split, commit message improved]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2020-01-20 09:10:22 +01:00
Dr. David Alan Gilbert 97e1e06780 migration: Rate limit inside host pages
When using hugepages, rate limiting is necessary within each huge
page, since a 1G huge page can take a significant time to send, so
you end up with bursty behaviour.

Fixes: 4c011c37ec ("postcopy: Send whole huge pages")
Reported-by: Lin Ma <LMa@suse.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2020-01-20 09:10:22 +01:00
Jens Freimann c7e0acd5a3 migration: add new migration state wait-unplug
This patch adds a new migration state called wait-unplug.  It is entered
after the SETUP state if failover devices are present. It will transition
into ACTIVE once all devices were succesfully unplugged from the guest.

So if a guest doesn't respond or takes long to honor the unplug request
the user will see the migration state 'wait-unplug'.

In the migration thread we query failover devices if they're are still
pending the guest unplug. When all are unplugged the migration
continues. If one device won't unplug migration will stay in wait_unplug
state.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20191029114905.6856-9-jfreimann@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-10-29 18:55:26 -04:00
Wei Yang 038adc2f58 core: replace getpagesize() with qemu_real_host_page_size
There are three page size in qemu:

  real host page size
  host page size
  target page size

All of them have dedicate variable to represent. For the last two, we
use the same form in the whole qemu project, while for the first one we
use two forms: qemu_real_host_page_size and getpagesize().

qemu_real_host_page_size is defined to be a replacement of
getpagesize(), so let it serve the role.

[Note] Not fully tested for some arch or device.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-26 15:38:06 +02:00
Wei Yang 17d9351bf2 migration: pass in_postcopy instead of check state again
Not necessary to do the check again.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20191005220517.24029-4-richardw.yang@linux.intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-10-11 14:31:27 +01:00
Dr. David Alan Gilbert fb14a42ade migration: Don't try and recover return path in non-postcopy
In normal precopy we can't do reconnection recovery - but we also
don't need to, since you can just rerun migration.
At the moment if the 'return-path' capability is on, we use
the return path in precopy to give a positive 'OK' to the end
of migration; however if migration fails then we fall into
the postcopy recovery path and hang.  This fixes it by only
running the return path in the postcopy case.

Reported-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-10-11 14:25:26 +01:00
Wei Yang 8f8d528e73 migration: use migration_is_active to represent active state
Wrap the check into a function to make it easy to read.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190717005341.14140-1-richardw.yang@linux.intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-10-11 14:18:13 +01:00
Dr. David Alan Gilbert 3748fef9b9 migration/postcopy: Recognise the recovery states as 'in_postcopy'
Various parts of the migration code do different things when they're
in postcopy mode; prior to this patch this has been 'postcopy-active'.
This patch extends 'in_postcopy' to include 'postcopy-paused' and
'postcopy-recover'.

In particular, when you set the max-postcopy-bandwidth parameter, this
only affects the current migration fd if we're 'in_postcopy';
this leads to a race in the postcopy recovery test where it increases
the speed from 4k/sec to unlimited, but that increase can get ignored
if the change is made between the point at which the reconnection
happens and it transitions back to active.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190923174942.12182-1-dgilbert@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-09-25 15:51:19 +01:00
Wei Yang 268dcd46ae migration: fix one typo in comment of function migration_total_bytes()
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190912024957.11780-1-richardw.yang@linux.intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-09-12 11:25:06 +01:00
Peter Xu 8504ddeca0 migration: Fix postcopy bw for recovery
We've got max-postcopy-bandwidth parameter but it's not applied
correctly after a postcopy recovery so the recovered migration stream
will still eat the whole net bandwidth.  Fix that up.

Reported-by: Xiaohui Li <xiaohli@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190906130103.20961-1-peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-09-12 11:21:25 +01:00
Yury Kotov b9d68df62a migration: Add validate-uuid capability
This capability realizes simple source validation by UUID.
It's useful for live migration between hosts.

Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
Message-Id: <20190903162246.18524-2-yury-kotov@yandex-team.ru>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-09-12 11:19:23 +01:00
Peter Maydell 95a9457fd4 Header cleanup patches for 2019-08-13
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl1WleASHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZTBBYQALQLzIYb2Zux95bAxoJdhqNuEOGLfxeu
 gx0i0roPe6SBleHozUK+gf7kVYyw7he58n2dZURGqrpqktgZOFcea2a6Dq1rnVw6
 JMJ2Oy7V326bHwJT0Np9rW4n+FHsMQZoAUEHjl9EeGCZfO/zy2aSWPsD8mbcbm0g
 hUW5Jr4+cpm28BCL8I+2HhWFazB6G2IPAF9oEXmNsOM6J1Ho8WGrTAjASe0Il5Yi
 m2B4QWG+4uz77WYnkttnssm41K1S95HYyaKluIVyNwTnsPTN303V/sUj+wdRaooL
 k1O6WqaavGhal7QeRqy+vCpF8m6qLq7NaYCzSCOrrkkuC8TAnpVn7Xmi9qI+vb6O
 kGBpDWhq5wOnphsEhnFvhPZgD+WZo3mwTgW4h0d3UhB6orOTPTMvWKEwFJ1j/O6/
 gntV61o542c9gpZjS133221HRmNjteHF/5/TFzmX/G50sgivJn+WOP87naM2aBAz
 8MW5HatTox+qQqYD4VMUIVnVkguxHDVhFRBunYu0HvZZ1Rud+Lc6Xzi6H4jDlZ81
 vtOmAlMU3dbp97gNvJrAVqV4JIL3puOWbu0MMaQWoG53Kcdfu46LIr57TTg3dw61
 R9e7HSOQjYILChoodwELlyeAsVeZo3IzX9vPX8aw7MoHvneyTUNqtha/rHsLEwsb
 97G19dydGEC6
 =eSUz
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-include-2019-08-13-v2' into staging

Header cleanup patches for 2019-08-13

# gpg: Signature made Fri 16 Aug 2019 12:39:12 BST
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-include-2019-08-13-v2: (29 commits)
  sysemu: Split sysemu/runstate.h off sysemu/sysemu.h
  sysemu: Move the VMChangeStateEntry typedef to qemu/typedefs.h
  Include sysemu/sysemu.h a lot less
  Clean up inclusion of sysemu/sysemu.h
  numa: Move remaining NUMA declarations from sysemu.h to numa.h
  Include sysemu/hostmem.h less
  numa: Don't include hw/boards.h into sysemu/numa.h
  Include hw/boards.h a bit less
  Include hw/qdev-properties.h less
  Include qemu/main-loop.h less
  Include qemu/queue.h slightly less
  Include hw/hw.h exactly where needed
  Include qom/object.h slightly less
  Include exec/memory.h slightly less
  Include migration/vmstate.h less
  migration: Move the VMStateDescription typedef to typedefs.h
  Clean up inclusion of exec/cpu-common.h
  Include hw/irq.h a lot less
  typedefs: Separate incomplete types and function types
  ide: Include hw/ide/internal a bit less outside hw/ide/
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-08-16 14:53:43 +01:00
Markus Armbruster 54d31236b9 sysemu: Split sysemu/runstate.h off sysemu/sysemu.h
sysemu/sysemu.h is a rather unfocused dumping ground for stuff related
to the system-emulator.  Evidence:

* It's included widely: in my "build everything" tree, changing
  sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600
  objects (not counting tests and objects that don't depend on
  qemu/osdep.h, down from 5400 due to the previous two commits).

* It pulls in more than a dozen additional headers.

Split stuff related to run state management into its own header
sysemu/runstate.h.

Touching sysemu/sysemu.h now recompiles some 850 objects.  qemu/uuid.h
also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400
to 4200.  Touching new sysemu/runstate.h recompiles some 500 objects.

Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also
add qemu/main-loop.h.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-30-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
[Unbreak OS-X build]
2019-08-16 13:37:36 +02:00
Markus Armbruster 46517dd497 Include sysemu/sysemu.h a lot less
In my "build everything" tree, changing sysemu/sysemu.h triggers a
recompile of some 5400 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).

hw/qdev-core.h includes sysemu/sysemu.h since recent commit e965ffa70a
"qdev: add qdev_add_vm_change_state_handler()".  This is a bad idea:
hw/qdev-core.h is widely included.

Move the declaration of qdev_add_vm_change_state_handler() to
sysemu/sysemu.h, and drop the problematic include from hw/qdev-core.h.

Touching sysemu/sysemu.h now recompiles some 1800 objects.
qemu/uuid.h also drops from 5400 to 1800.  A few more headers show
smaller improvement: qemu/notify.h drops from 5600 to 5200,
qemu/timer.h from 5600 to 4500, and qapi/qapi-types-run-state.h from
5500 to 5000.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20190812052359.30071-28-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2019-08-16 13:31:53 +02:00
Markus Armbruster a27bd6c779 Include hw/qdev-properties.h less
In my "build everything" tree, changing hw/qdev-properties.h triggers
a recompile of some 2700 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).

Many places including hw/qdev-properties.h (directly or via hw/qdev.h)
actually need only hw/qdev-core.h.  Include hw/qdev-core.h there
instead.

hw/qdev.h is actually pointless: all it does is include hw/qdev-core.h
and hw/qdev-properties.h, which in turn includes hw/qdev-core.h.
Replace the remaining uses of hw/qdev.h by hw/qdev-properties.h.

While there, delete a few superfluous inclusions of hw/qdev-core.h.

Touching hw/qdev-properties.h now recompiles some 1200 objects.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190812052359.30071-22-armbru@redhat.com>
2019-08-16 13:31:53 +02:00
Markus Armbruster db72581598 Include qemu/main-loop.h less
In my "build everything" tree, changing qemu/main-loop.h triggers a
recompile of some 5600 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).  It includes block/aio.h,
which in turn includes qemu/event_notifier.h, qemu/notify.h,
qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h,
qemu/thread.h, qemu/timer.h, and a few more.

Include qemu/main-loop.h only where it's needed.  Touching it now
recompiles only some 1700 objects.  For block/aio.h and
qemu/event_notifier.h, these numbers drop from 5600 to 2800.  For the
others, they shrink only slightly.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-21-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 13:31:52 +02:00
Ivan Ren 87f3bd8717 migration: always initialise ram_counters for a new migration
This patch fix a multifd migration bug in migration speed calculation, this
problem can be reproduced as follows:
1. start a vm and give a heavy memory write stress to prevent the vm be
   successfully migrated to destination
2. begin a migration with multifd
3. migrate for a long time [actually, this can be measured by transferred bytes]
4. migrate cancel
5. begin a new migration with multifd, the migration will directly run into
   migration_completion phase

Reason as follows:

Migration update bandwidth and s->threshold_size in function
migration_update_counters after BUFFER_DELAY time:

    current_bytes = migration_total_bytes(s);
    transferred = current_bytes - s->iteration_initial_bytes;
    time_spent = current_time - s->iteration_start_time;
    bandwidth = (double)transferred / time_spent;
    s->threshold_size = bandwidth * s->parameters.downtime_limit;

In multifd migration, migration_total_bytes function return
qemu_ftell(s->to_dst_file) + ram_counters.multifd_bytes.
s->iteration_initial_bytes will be initialized to 0 at every new migration,
but ram_counters is a global variable, and history migration data will be
accumulated. So if the ram_counters.multifd_bytes is big enough, it may lead
pending_size >= s->threshold_size become false in migration_iteration_run
after the first migration_update_counters.

Signed-off-by: Ivan Ren <ivanren@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Suggested-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <1564741121-1840-1-git-send-email-ivanren@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-08-14 17:33:14 +01:00
Wei Yang 14adf288d3 migration: remove unused field bytes_xfer
MigrationState->bytes_xfer is only set to 0 in migrate_init().

Remove this unnecessary field.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190402003106.17614-1-richardw.yang@linux.intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-08-14 17:33:14 +01:00
Wei Yang 52aec70923 migration/postcopy: start_postcopy could be true only when migrate_postcopy() return true
There is only one place to set start_postcopy to true,
qmp_migrate_start_postcopy(), which make sure start_postcopy could be
set to true when migrate_postcopy() return true.

So start_postcopy is true implies the other one.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190718083747.5859-1-richardw.yang@linux.intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-08-14 17:33:14 +01:00
Wei Yang 640dfb14db migration: consolidate time info into populate_time_info
Consolidate time information fill up into its function for better
readability.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190716005411.4156-1-richardw.yang@linux.intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-08-14 17:33:14 +01:00
Yury Kotov 3d661c8ab1 migration: Add error_desc for file channel errors
Currently, there is no information about error if outgoing migration was failed
because of file channel errors.
Example (QMP session):
-> { "execute": "migrate", "arguments": { "uri": "exec:head -c 1" }}
<- { "return": {} }
...
-> { "execute": "query-migrate" }
<- { "return": { "status": "failed" }} // There is not error's description

And even in the QEMU's output there is nothing.

This patch
1) Adds errp for the most of QEMUFileOps
2) Adds qemu_file_get_error_obj/qemu_file_set_error_obj
3) And finally using of qemu_file_get_error_obj in migration.c

And now, the status for the mentioned fail will be:
-> { "execute": "query-migrate" }
<- { "return": { "status": "failed",
                 "error-desc": "Unable to write to command: Broken pipe" }}

Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
Message-Id: <20190422103420.15686-1-yury-kotov@yandex-team.ru>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-08-14 17:33:14 +01:00
Peter Xu 002cad6b16 migration: Split log_clear() into smaller chunks
Currently we are doing log_clear() right after log_sync() which mostly
keeps the old behavior when log_clear() was still part of log_sync().

This patch tries to further optimize the migration log_clear() code
path to split huge log_clear()s into smaller chunks.

We do this by spliting the whole guest memory region into memory
chunks, whose size is decided by MigrationState.clear_bitmap_shift (an
example will be given below).  With that, we don't do the dirty bitmap
clear operation on the remote node (e.g., KVM) when we fetch the dirty
bitmap, instead we explicitly clear the dirty bitmap for the memory
chunk for each of the first time we send a page in that chunk.

Here comes an example.

Assuming the guest has 64G memory, then before this patch the KVM
ioctl KVM_CLEAR_DIRTY_LOG will be a single one covering 64G memory.
If after the patch, let's assume when the clear bitmap shift is 18,
then the memory chunk size on x86_64 will be 1UL<<18 * 4K = 1GB.  Then
instead of sending a big 64G ioctl, we'll send 64 small ioctls, each
of the ioctl will cover 1G of the guest memory.  For each of the 64
small ioctls, we'll only send if any of the page in that small chunk
was going to be sent right away.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190603065056.25211-12-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-07-15 15:39:03 +02:00
Greg Kurz b6eca81e1b migration: Fix typo in migrate_add_blocker() error message
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <155800428514.543845.17558475870097990036.stgit@bahia.lan>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-05-22 17:35:27 +02:00
Yury Kotov fd392cfa8e migration: Fix use-after-free during process exit
It fixes heap-use-after-free which was found by clang's ASAN.

Control flow of this use-after-free:
main_thread:
    * Got SIGTERM and completes main loop
    * Calls migration_shutdown
      - migrate_fd_cancel (so, migration_thread begins to complete)
      - object_unref(OBJECT(current_migration));

migration_thread:
    * migration_iteration_finish -> schedule cleanup bh
    * object_unref(OBJECT(s)); (Now, current_migration is freed)
    * exits

main_thread:
    * Calls vm_shutdown -> drain bdrvs -> main loop
      -> cleanup_bh -> use after free

If you want to reproduce, these couple of sleeps will help:
vl.c:4613:
     migration_shutdown();
+    sleep(2);
migration.c:3269:
+    sleep(1);
     trace_migration_thread_after_loop();
     migration_iteration_finish(s);

Original output:
qemu-system-x86_64: terminating on signal 15 from pid 31980 (<unknown process>)
=================================================================
==31958==ERROR: AddressSanitizer: heap-use-after-free on address 0x61900001d210
  at pc 0x555558a535ca bp 0x7fffffffb190 sp 0x7fffffffb188
READ of size 8 at 0x61900001d210 thread T0 (qemu-vm-0)
    #0 0x555558a535c9 in migrate_fd_cleanup migration/migration.c:1502:23
    #1 0x5555594fde0a in aio_bh_call util/async.c:90:5
    #2 0x5555594fe522 in aio_bh_poll util/async.c:118:13
    #3 0x555559524783 in aio_poll util/aio-posix.c:725:17
    #4 0x555559504fb3 in aio_wait_bh_oneshot util/aio-wait.c:71:5
    #5 0x5555573bddf6 in virtio_blk_data_plane_stop
      hw/block/dataplane/virtio-blk.c:282:5
    #6 0x5555589d5c09 in virtio_bus_stop_ioeventfd hw/virtio/virtio-bus.c:246:9
    #7 0x5555589e9917 in virtio_pci_stop_ioeventfd hw/virtio/virtio-pci.c:287:5
    #8 0x5555589e22bf in virtio_pci_vmstate_change hw/virtio/virtio-pci.c:1072:9
    #9 0x555557628931 in virtio_vmstate_change hw/virtio/virtio.c:2257:9
    #10 0x555557c36713 in vm_state_notify vl.c:1605:9
    #11 0x55555716ef53 in do_vm_stop cpus.c:1074:9
    #12 0x55555716eeff in vm_shutdown cpus.c:1092:12
    #13 0x555557c4283e in main vl.c:4617:5
    #14 0x7fffdfdb482f in __libc_start_main
      (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #15 0x555556ecb118 in _start (x86_64-softmmu/qemu-system-x86_64+0x1977118)

0x61900001d210 is located 144 bytes inside of 952-byte region
  [0x61900001d180,0x61900001d538)
freed by thread T6 (live_migration) here:
    #0 0x555556f76782 in __interceptor_free
      /tmp/final/llvm.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:124:3
    #1 0x555558d5fa94 in object_finalize qom/object.c:618:9
    #2 0x555558d57651 in object_unref qom/object.c:1068:9
    #3 0x555558a55588 in migration_thread migration/migration.c:3272:5
    #4 0x5555595393f2 in qemu_thread_start util/qemu-thread-posix.c:502:9
    #5 0x7fffe057f6b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)

previously allocated by thread T0 (qemu-vm-0) here:
    #0 0x555556f76b03 in __interceptor_malloc
      /tmp/final/llvm.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:146:3
    #1 0x7ffff6ee37b8 in g_malloc (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x4f7b8)
    #2 0x555558d58031 in object_new qom/object.c:640:12
    #3 0x555558a31f21 in migration_object_init migration/migration.c:139:25
    #4 0x555557c41398 in main vl.c:4320:5
    #5 0x7fffdfdb482f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)

Thread T6 (live_migration) created by T0 (qemu-vm-0) here:
    #0 0x555556f5f0dd in pthread_create
      /tmp/final/llvm.src/projects/compiler-rt/lib/asan/asan_interceptors.cc:210:3
    #1 0x555559538cf9 in qemu_thread_create util/qemu-thread-posix.c:539:11
    #2 0x555558a53304 in migrate_fd_connect migration/migration.c:3332:5
    #3 0x555558a72bd8 in migration_channel_connect migration/channel.c:92:5
    #4 0x555558a6ef87 in exec_start_outgoing_migration migration/exec.c:42:5
    #5 0x555558a4f3c2 in qmp_migrate migration/migration.c:1922:9
    #6 0x555558bb4f6a in qmp_marshal_migrate qapi/qapi-commands-migration.c:607:5
    #7 0x555559363738 in do_qmp_dispatch qapi/qmp-dispatch.c:131:5
    #8 0x555559362a15 in qmp_dispatch qapi/qmp-dispatch.c:174:11
    #9 0x5555571bac15 in monitor_qmp_dispatch monitor.c:4124:11
    #10 0x55555719a22d in monitor_qmp_bh_dispatcher monitor.c:4207:9
    #11 0x5555594fde0a in aio_bh_call util/async.c:90:5
    #12 0x5555594fe522 in aio_bh_poll util/async.c:118:13
    #13 0x5555595201e0 in aio_dispatch util/aio-posix.c:460:5
    #14 0x555559503553 in aio_ctx_dispatch util/async.c:261:5
    #15 0x7ffff6ede196 in g_main_context_dispatch
      (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x4a196)

SUMMARY: AddressSanitizer: heap-use-after-free migration/migration.c:1502:23
  in migrate_fd_cleanup
Shadow bytes around the buggy address:
  0x0c327fffb9f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c327fffba00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c327fffba10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c327fffba20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c327fffba30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c327fffba40: fd fd[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c327fffba50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c327fffba60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c327fffba70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c327fffba80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c327fffba90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable: 00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone: fa
  Freed heap region: fd
  Stack left redzone: f1
  Stack mid redzone: f2
  Stack right redzone: f3
  Stack after return: f5
  Stack use after scope: f8
  Global redzone: f9
  Global init order: f6
  Poisoned by user: f7
  Container overflow: fc
  Array cookie: ac
  Intra object redzone: bb
  ASan internal: fe
  Left alloca redzone: ca
  Right alloca redzone: cb
  Shadow gap: cc
==31958==ABORTING

Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
Message-Id: <20190408113343.2370-1-yury-kotov@yandex-team.ru>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  Fixed up comment formatting
2019-05-14 18:59:54 +01:00
Wei Yang 15d2d64cf5 migration: remove not used field xfer_limit
MigrationState->xfer_limit is only set to 0 in migrate_init().

Remove this unnecessary field.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190326055726.10539-1-richardw.yang@linux.intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-05-14 17:33:35 +01:00
Markus Armbruster daff7f0bbe migration: Support adding migration blockers earlier
migrate_add_blocker() asserts we have a current_migration object, in
migrate_get_current().  We do only after migration_object_init().

This contributes to the following dependency cycle:

* configure_blockdev() must run before machine_set_property()
  so machine properties can refer to block backends

* machine_set_property() before configure_accelerator()
  so machine properties like kvm-irqchip get applied

* configure_accelerator() before migration_object_init()
  so that Xen's accelerator compat properties get applied.

* migration_object_init() before configure_blockdev()
  so configure_blockdev() can add migration blockers

The cycle was closed when recent commit cda4aa9a5a "Create block
backends before setting machine properties" added the first
dependency, and satisfied it by violating the last one.  Broke block
backends that add migration blockers, as demonstrated by qemu-iotests
055.

To fix it, break the last dependency: make migrate_add_blocker()
usable before migration_object_init().

The previous commit already removed the use of migrate_get_current()
from migrate_add_blocker() itself.  Didn't quite do the trick, as
there's another one hiding in migration_is_idle().

The use there isn't actually necessary: when no migration object has
been created yet, migration is surely idle.  Make migration_is_idle()
return true then.

Fixes: cda4aa9a5a
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190401090827.20793-4-armbru@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2019-04-02 13:49:36 +02:00
Markus Armbruster 811f865271 Revert "migration: move only_migratable to MigrationState"
This reverts commit 3df663e575.
This reverts commit b605c47b57.

Command line option --only-migratable is for disallowing any
configuration that can block migration.

Initially, --only-migratable set global variable @only_migratable.

Commit 3df663e575 "migration: move only_migratable to MigrationState"
replaced it by MigrationState member @only_migratable.  That was a
mistake.

First, it doesn't make sense on the design level.  MigrationState
captures the state of an individual migration, but --only-migratable
isn't a property of an individual migration, it's a restriction on
QEMU configuration.  With fault tolerance, we could have several
migrations at once.  --only-migratable would certainly protect all of
them.  Storing it in MigrationState feels inappropriate.

Second, it contributes to a dependency cycle that manifests itself as
a bug now.

Putting @only_migratable into MigrationState means its available only
after migration_object_init().

We can't set it before migration_object_init(), so we delay setting it
with a global property (this is fixup commit b605c47b57 "migration:
fix handling for --only-migratable").

We can't get it before migration_object_init(), so anything that uses
it can only run afterwards.

Since migrate_add_blocker() needs to obey --only-migratable, any code
adding migration blockers can run only afterwards.  This contributes
to the following dependency cycle:

* configure_blockdev() must run before machine_set_property()
  so machine properties can refer to block backends

* machine_set_property() before configure_accelerator()
  so machine properties like kvm-irqchip get applied

* configure_accelerator() before migration_object_init()
  so that Xen's accelerator compat properties get applied.

* migration_object_init() before configure_blockdev()
  so configure_blockdev() can add migration blockers

The cycle was closed when recent commit cda4aa9a5a "Create block
backends before setting machine properties" added the first
dependency, and satisfied it by violating the last one.  Broke block
backends that add migration blockers.

Moving @only_migratable into MigrationState was a mistake.  Revert it.

This doesn't quite break the "migration_object_init() before
configure_blockdev() dependency, since migrate_add_blocker() still has
another dependency on migration_object_init().  To be addressed the
next commit.

Note that the reverted commit made -only-migratable sugar for -global
migration.only-migratable=on below the hood.  Documentation has only
ever mentioned -only-migratable.  This commit removes the arcane &
undocumented alternative to -only-migratable again.  Nobody should be
using it.

Conflicts:
	include/migration/misc.h
	migration/migration.c
	migration/migration.h
	vl.c

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190401090827.20793-3-armbru@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2019-04-02 13:38:05 +02:00
Dr. David Alan Gilbert c38c1c142e migration/postcopy: Update the bandwidth during postcopy
The recently added max-postcopy-bandwidth parameter is only read
at the transition from precopy->postcopy where as the older
max-bandwidth parameter updates the migration bandwidth when changed
even if the migration is already running.

Fix this discrepency so that:
  a) You can change the bandwidth during postcopy by setting
     max-postcopy-bandwidth

  b) Changing max-bandwidth during postcopy has no effect
     (it currently changes the postcopy bandwidth which isn't
     expected).

Fixes: 7e555c6c
bz: https://bugzilla.redhat.com/show_bug.cgi?id=1686321
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-03-25 18:46:03 +01:00
Daniel P. Berrange d2f1d29b95 migration: add support for a "tls-authz" migration parameter
The QEMU instance that runs as the server for the migration data
transport (ie the target QEMU) needs to be able to configure access
control so it can prevent unauthorized clients initiating an incoming
migration. This adds a new 'tls-authz' migration parameter that is used
to provide the QOM ID of a QAuthZ subclass instance that provides the
access control check. This is checked against the x509 certificate
obtained during the TLS handshake.

For example, when starting a QEMU for incoming migration, it is
possible to give an example identity of the source QEMU that is
intended to be connecting later:

  $QEMU \
     -monitor stdio \
     -incoming defer \
     ...other args...

  (qemu) object_add tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
             endpoint=server,verify-peer=yes \
  (qemu) object_add authz-simple,id=auth0,identity=CN=laptop.example.com,,\
             O=Example Org,,L=London,,ST=London,,C=GB \
  (qemu) migrate_incoming tcp:localhost:9000

Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-03-25 18:13:47 +01:00
Juan Quintela cbfd6c957a multifd: Drop x-
We make it supported from now on.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-03-25 18:13:45 +01:00
Juan Quintela efd1a1d640 multifd: Drop x-multifd-page-count parameter
Libvirt don't want to expose (and explain it).  From now on we measure
the number of packages in bytes instead of pages, so it is the same
independently of architecture.  We choose the page size of x86.
Notice that in the following patch we make this variable.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-03-25 18:13:41 +01:00
Zhang Chen db00972922 Migration/colo.c: Make COLO node running after failover
Delay to close COLO for auto start VM after failover.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190303145021.2962-4-chen.zhang@intel.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-06 10:49:18 +00:00
Juan Quintela 9aca82ba31 migration: Create socket-address parameter
It will be used to store the uri parameters. We want this only for
tcp, so we don't set it for other uris.  We need it to know what port
is migration running.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Removed DummyStruct as suggested by Eric & Markus

--
2019-03-06 10:49:17 +00:00
Yury Kotov 18269069c3 migration: Introduce ignore-shared capability
We want to use local migration to update QEMU for running guests.
In this case we don't need to migrate shared (file backed) RAM.
So, add a capability to ignore such blocks during live migration.

Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
Message-Id: <20190215174548.2630-3-yury-kotov@yandex-team.ru>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-06 10:49:17 +00:00
Dr. David Alan Gilbert 892ae715b6 migration: Cleanup during exit
Currently we cleanup the migration object as we exit main after the
main_loop finishes; however if there's a migration running things
get messy and we can end up with the migration thread still trying
to access freed structures.

We now take a ref to the object around the migration thread itself,
so the act of dropping the ref during exit doesn't cause us to lose
the state until the thread quits.

Cancelling the migration during migration also tries to get the thread
to quit.

We do this a bit earlier; so hopefully migration gets out of the way
before all the devices etc are freed.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20190227164900.16378-1-dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-06 10:49:17 +00:00
Dr. David Alan Gilbert c3c5eae6ac migration: Fix cancel state
During a cancelled migration there's a race where the fd can
go into an error state before we get back around the migration loop
and migration_detect_error transitions from cancelling->failed.

Check for cancelled/cancelling and don't change the state.

Red Hat bug: https://bugzilla.redhat.com/show_bug.cgi?id=1608649

Fixes: b23c2ade25
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190219195928.12289-1-dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
2019-03-06 10:49:17 +00:00
Dr. David Alan Gilbert 7659505c16 migration: Switch to using announce timer
Switch the announcements to using the new announce timer.
Move the code that does it to announce.c rather than savevm
because it really has nothing to do with the actual migration.

Migration starts the announce from bh's and so they're all
in the main thread/bql, and so there's never any racing with
the timers themselves.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2019-03-05 11:27:41 +08:00
Dr. David Alan Gilbert ee3d96baf3 migration: Add announce parameters
Add migration parameters that control RARP/GARP announcement timeouts.

Based on earlier patches by myself and
  Vladislav Yasevich <vyasevic@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2019-03-05 11:27:41 +08:00
Dr. David Alan Gilbert 50510ea2c2 net: Introduce announce timer
The 'announce timer' will be used by migration, and explicit
requests for qemu to perform network announces.

Based on the work by Germano Veit Michel <germano@redhat.com>
 and Vlad Yasevich <vyasevic@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2019-03-05 11:27:41 +08:00
Xiao Guangrong aecbfe9c64 migration: introduce pages-per-second
It introduces a new statistic, pages-per-second, as bandwidth or mbps is
not enough to measure the performance of posting pages out as we have
compression, xbzrle, which can significantly reduce the amount of the
data size, instead, pages-per-second is the one we want

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20190111063732.10484-2-xiaoguangrong@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  With typo's Eric spotted fixed
2019-01-23 15:51:47 +00:00
Fei Li 6d99c2d41c migration: unify error handling for process_incoming_migration_co
In the current code, if process_incoming_migration_co() fails we do
the same error handing: set the error state, close the source file,
do the cleanup for multifd, and then exit(EXIT_FAILURE). To make the
code clearer, add a "goto fail" to unify the error handling.

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190113140849.38339-6-lifei1214@126.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-01-23 15:02:07 +00:00
Fei Li 1398b2e3fe migration: multifd_save_cleanup() can't fail, simplify
multifd_save_cleanup() takes an Error ** argument and returns an
error code even though it can't actually fail.  Its callers
dutifully check for failure.  Remove the useless argument and return
value, and simplify the callers.

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20190113140849.38339-4-lifei1214@126.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-01-23 15:02:07 +00:00
Fei Li 49ed0d24a4 migration: fix the multifd code when receiving less channels
In our current code, when multifd is used during migration, if there
is an error before the destination receives all new channels, the
source keeps running, however the destination does not exit but keeps
waiting until the source is killed deliberately.

Fix this by dumping the specific error and let users decide whether
to quit from the destination side when failing to receive packet via
some channel. And update the comment for multifd_recv_new_channel().

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190113140849.38339-3-lifei1214@126.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-01-23 15:02:07 +00:00
Daniel Henrique Barboza fb06411210 qmp hmp: Make system_wakeup check wake-up support and run state
The qmp/hmp command 'system_wakeup' is simply a direct call to
'qemu_system_wakeup_request' from vl.c. This function verifies if
runstate is SUSPENDED and if the wake up reason is valid before
proceeding. However, no error or warning is thrown if any of those
pre-requirements isn't met. There is no way for the caller to
differentiate between a successful wakeup or an error state caused
when trying to wake up a guest that wasn't suspended.

This means that system_wakeup is silently failing, which can be
considered a bug. Adding error handling isn't an API break in this
case - applications that didn't check the result will remain broken,
the ones that check it will have a chance to deal with it.

Adding to that, the commit before previous created a new QMP API called
query-current-machine, with a new flag called wakeup-suspend-support,
that indicates if the guest has the capability of waking up from suspended
state. Although such guest will never reach SUSPENDED state and erroring
it out in this scenario would suffice, it is more informative for the user
to differentiate between a failure because the guest isn't suspended versus
a failure because the guest does not have support for wake up at all.

All this considered, this patch changes qmp_system_wakeup to check if
the guest is capable of waking up from suspend, and if it is suspended.
After this patch, this is the output of system_wakeup in a guest that
does not have wake-up from suspend support (ppc64):

(qemu) system_wakeup
wake-up from suspend is not supported by this guest
(qemu)

And this is the output of system_wakeup in a x86 guest that has the
support but isn't suspended:

(qemu) system_wakeup
Unable to wake up: guest is not in suspended state
(qemu)

Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20181205194701.17836-4-danielhb413@gmail.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-12-18 07:55:47 +01:00
Zhang Chen 7e934f5b27 migration/migration.c: Add COLO dependency checks
Current COLO mode(independent disk mode) need replication module work
together. Suggested by Dr. David Alan Gilbert <dgilbert@redhat.com>.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Message-Id: <20181114190912.7242-1-chen.zhang@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-11-21 11:38:12 +00:00
Jia Lina 3d63da16fb migration: avoid segmentfault when take a snapshot of a VM which being migrated
During an active background migration, snapshot will trigger a
segmentfault. As snapshot clears the "current_migration" struct
and updates "to_dst_file" before it finds out that there is a
migration task, Migration accesses the null pointer in
"current_migration" struct and qemu crashes eventually.

Signed-off-by: Jia Lina <jialina01@baidu.com>
Signed-off-by: Chai Wen <chaiwen@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Message-Id: <20181026083620.10172-1-jialina01@baidu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-10-31 09:38:59 +00:00
Peter Maydell 13399aad4f Error reporting patches for 2018-10-22
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbzcCHAAoJEDhwtADrkYZT3YsP/2qE4HNY/htj3IP6vNJuSaqw
 CLPRTz7zWmUBTE6FqSkvLsq3X2BMFFLeaIPA9EFcbyn2km6qPqBYgg9ElXXvPZBm
 6hDeRIoC8FdRD0Apozd5MGC94/lE47PheDRV8V+4KrGLaaMXEPxMZ0wP4AfdS5pS
 6Pt2xuF7nPu1+OWVxMk0fXadGjGLEuOQQmTh3B21J5RaynQ3gtd6h7XFC/LJyOGG
 LC/6GyPc0h7KU83VnvrRjH/EOpu1wENgrsvWsS0sem8op35Z+i9jU5BfCp4qFkDy
 gCHHUEyEeyexS+W+Tj87eBtK2gfrqQx9ovo8CIsWcUwpKbdD6AMK4FKGsDNMNHab
 Kg5u/M+O8nHCB7DuursF+3mqEbZHb05cfKe6JEtiq49EuORMV5hp4Ap966noSwTw
 UEU0NJNA1p8EdmXVudyyyYR7wpoSSmZpoenA+bJ3nthK8K0KcU4RUGk6ZEbxfJy+
 7ENl+3R2IxmxzgXv/x0tz0uFisaVW1rltTXtMte+ElQsO0qy74iHdfR7JHsmLxj9
 CO/ABMVoYsWq2OJv8pWLrdKpT4v3HQLJdHhknyu0ZcJGDyICqX29ULLEhPrNEZvW
 rxVxAkiemlaqxlUjbrM46CDQQm+w03OCnk7aCYcV4oK+u5+o3mCag705gMPErapZ
 6uOE3fAjiWw43sA31mek
 =kPZX
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2018-10-22' into staging

Error reporting patches for 2018-10-22

# gpg: Signature made Mon 22 Oct 2018 13:20:23 BST
# gpg:                using RSA key 3870B400EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-error-2018-10-22: (40 commits)
  error: Drop bogus "use error_setg() instead" admonitions
  vpc: Fail open on bad header checksum
  block: Clean up bdrv_img_create()'s error reporting
  vl: Simplify call of parse_name()
  vl: Fix exit status for -drive format=help
  blockdev: Convert drive_new() to Error
  vl: Assert drive_new() does not fail in default_drive()
  fsdev: Clean up error reporting in qemu_fsdev_add()
  spice: Clean up error reporting in add_channel()
  tpm: Clean up error reporting in tpm_init_tpmdev()
  numa: Clean up error reporting in parse_numa()
  vnc: Clean up error reporting in vnc_init_func()
  ui: Convert vnc_display_init(), init_keyboard_layout() to Error
  ui/keymaps: Fix handling of erroneous include files
  vl: Clean up error reporting in device_init_func()
  vl: Clean up error reporting in parse_fw_cfg()
  vl: Clean up error reporting in mon_init_func()
  vl: Clean up error reporting in machine_set_property()
  vl: Clean up error reporting in chardev_init_func()
  qom: Clean up error reporting in user_creatable_add_opts_foreach()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-10-23 17:20:23 +01:00
Markus Armbruster 4b5766488f error: Fix use of error_prepend() with &error_fatal, &error_abort
From include/qapi/error.h:

  * Pass an existing error to the caller with the message modified:
  *     error_propagate(errp, err);
  *     error_prepend(errp, "Could not frobnicate '%s': ", name);

Fei Li pointed out that doing error_propagate() first doesn't work
well when @errp is &error_fatal or &error_abort: the error_prepend()
is never reached.

Since I doubt fixing the documentation will stop people from getting
it wrong, introduce error_propagate_prepend(), in the hope that it
lures people away from using its constituents in the wrong order.
Update the instructions in error.h accordingly.

Convert existing error_prepend() next to error_propagate to
error_propagate_prepend().  If any of these get reached with
&error_fatal or &error_abort, the error messages improve.  I didn't
check whether that's the case anywhere.

Cc: Fei Li <fli@suse.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181017082702.5581-2-armbru@redhat.com>
2018-10-19 14:51:34 +02:00
Zhang Chen 13af18f222 COLO: Load dirty pages into SVM's RAM cache firstly
We should not load PVM's state directly into SVM, because there maybe some
errors happen when SVM is receving data, which will break SVM.

We need to ensure receving all data before load the state into SVM. We use
an extra memory to cache these data (PVM's ram). The ram cache in secondary side
is initially the same as SVM/PVM's memory. And in the process of checkpoint,
we cache the dirty pages of PVM into this ram cache firstly, so this ram cache
always the same as PVM's memory at every checkpoint, then we flush this cached ram
to SVM after we receive all PVM's state.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-10-19 11:15:03 +08:00
Zhang Chen aad555c229 COLO: Remove colo_state migration struct
We need to know if migration is going into COLO state for
incoming side before start normal migration.

Instead by using the VMStateDescription to send colo_state
from source side to destination side, we use MIG_CMD_ENABLE_COLO
to indicate whether COLO is enabled or not.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-10-19 11:15:03 +08:00
Zhang Chen 8e48ac9586 COLO: Add block replication into colo process
Make sure master start block replication after slave's block
replication started.

Besides, we need to activate VM's blocks before goes into
COLO state.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-10-19 11:15:03 +08:00
Zhang Chen 131b2153fc COLO: integrate colo compare with colo frame
For COLO FT, both the PVM and SVM run at the same time,
only sync the state while it needs.

So here, let SVM runs while not doing checkpoint, change
DEFAULT_MIGRATE_X_CHECKPOINT_DELAY to 200*100.

Besides, we forgot to release colo_checkpoint_semd and
colo_delay_timer, fix them here.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-10-19 11:15:03 +08:00
Marc-André Lureau 0284a2a81c migration: fix QEMUFile leak
Spotted by ASAN while running:

$ tests/migration-test -p /x86_64/migration/postcopy/recovery

=================================================================
==18034==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 33864 byte(s) in 1 object(s) allocated from:
    #0 0x7f3da7f31e50 in calloc (/lib64/libasan.so.5+0xeee50)
    #1 0x7f3da644441d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5241d)
    #2 0x55af9db15440 in qemu_fopen_channel_input /home/elmarco/src/qemu/migration/qemu-file-channel.c:183
    #3 0x55af9db15413 in channel_get_output_return_path /home/elmarco/src/qemu/migration/qemu-file-channel.c:159
    #4 0x55af9db0d4ac in qemu_file_get_return_path /home/elmarco/src/qemu/migration/qemu-file.c:78
    #5 0x55af9dad5e4f in open_return_path_on_source /home/elmarco/src/qemu/migration/migration.c:2295
    #6 0x55af9dadb3bf in migrate_fd_connect /home/elmarco/src/qemu/migration/migration.c:3111
    #7 0x55af9dae1bf3 in migration_channel_connect /home/elmarco/src/qemu/migration/channel.c:91
    #8 0x55af9daddeca in socket_outgoing_migration /home/elmarco/src/qemu/migration/socket.c:108
    #9 0x55af9e13d3db in qio_task_complete /home/elmarco/src/qemu/io/task.c:158
    #10 0x55af9e13ca03 in qio_task_thread_result /home/elmarco/src/qemu/io/task.c:89
    #11 0x7f3da643b1ca in g_idle_dispatch gmain.c:5535

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180925092245.29565-1-marcandre.lureau@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-09-26 17:29:01 +01:00
Xiao Guangrong 76e030004f migration: show the statistics of compression
Currently, it includes:
pages: amount of pages compressed and transferred to the target VM
busy: amount of count that no free thread to compress data
busy-rate: rate of thread busy
compressed-size: amount of bytes after compression
compression-rate: rate of compressed size

Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180906070101.27280-3-xiaoguangrong@tencent.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-09-26 12:27:27 +01:00
Peter Xu 3ab72385b2 qapi: Drop qapi_event_send_FOO()'s Error ** argument
The generated qapi_event_send_FOO() take an Error ** argument.  They
can't actually fail, because all they do with the argument is passing it
to functions that can't fail: the QObject output visitor, and the
@qmp_emit callback, which is either monitor_qapi_event_queue() or
event_test_emit().

Drop the argument, and pass &error_abort to the QObject output visitor
and @qmp_emit instead.

Suggested-by: Eric Blake <eblake@redhat.com>
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180815133747.25032-4-peterx@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Commit message rewritten, update to qapi-code-gen.txt corrected]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-08-28 18:21:38 +02:00
Xiao Guangrong 1d58872a91 migration: do not wait for free thread
Instead of putting the main thread to sleep state to wait for
free compression thread, we can directly post it out as normal
page that reduces the latency and uses CPUs more efficiently

A parameter, compress-wait-thread, is introduced, it can be
enabled if the user really wants the old behavior

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-08-22 12:34:11 +02:00
Lidong Chen 923709896b migration: poll the cm event for destination qemu
The destination qemu only poll the comp_channel->fd in
qemu_rdma_wait_comp_channel. But when source qemu disconnnect
the rdma connection, the destination qemu should be notified.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-08-22 12:17:43 +02:00
Lidong Chen 74637e6f08 migration: implement bi-directional RDMA QIOChannel
This patch implements bi-directional RDMA QIOChannel. Because different
threads may access RDMAQIOChannel currently, this patch use RCU to protect it.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-08-22 12:12:26 +02:00
Li Qiang 4cbc9c7ffd migrate/cpu-throttle: Add max-cpu-throttle migration parameter
Currently, the default maximum CPU throttle for migration is
99(CPU_THROTTLE_PCT_MAX). This is too big and can make a remarkable
performance effect for the guest. We see a lot of packets latency
exceed 500ms when the CPU_THROTTLE_PCT_MAX reached. This patch set
adds a new max-cpu-throttle parameter to limit the CPU throttle.

Signed-off-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-08-22 11:42:34 +02:00
Lidong Chen 4b3fb65db9 migration: fix duplicate initialization for expected_downtime and cleanup_bh
migrate_fd_connect duplicate initialize expected_downtime and cleanup_bh.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Message-Id: <1532434585-14732-2-git-send-email-lidongchen@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-24 17:28:57 +01:00
Peter Xu 97ca211c62 migration: disallow recovery for release-ram
Postcopy recovery won't work well with release-ram capability since
release-ram will drop the page buffer as long as the page is put into
the send buffer.  So if there is a network failure happened, any page
buffers that have not yet reached the destination VM but have already
been sent from the source VM will be lost forever.  Let's refuse the
client from resuming such a postcopy migration.  Luckily release-ram was
designed to only be used when src and destination VMs are on the same
host, so it should be fine.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180723123305.24792-3-peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-24 17:10:59 +01:00
Dr. David Alan Gilbert 57225e5f32 migrate: Fix cancelling state warning
We've been getting the warning:

migration_iteration_finish: Unknown ending state 2

on a cancel.

I think that's originally due to 39b9e17905c;  although
I've only seen the warning, I think that in some cases
that we could find the VM stays paused after a cancel where
it should restart.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20180719092257.12703-1-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-24 17:00:45 +01:00
Peter Xu 3c9928d9f9 migration: show pause/recover state on dst host
These two states will be missing when doing "query-migrate" on
destination VM.  Add these states so that we can get the query results
as expected.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180710091902.28780-5-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-10 12:56:37 +01:00
Peter Xu a429e7f488 migration: unify incoming processing
This is the 2nd patch to unbreak postcopy recovery.

Let's unify the migration_incoming_process() call at a single place
rather than calling it in connection setup codes.  This fixes a problem
that we will go into incoming migration procedure even if we are trying
to recovery from a paused postcopy migration.

Fixes: 36c2f8be2c ("migration: Delay start of migration main routines")
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180627132246.5576-5-peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-10 12:48:53 +01:00
Peter Xu 884835fa1e migration: unbreak postcopy recovery
The whole postcopy recovery logic was accidentally broken.  We need to
fix it in two steps.

This is the first step that we should do the recovery when needed.  It
was bypassed before after commit 36c2f8be2c.

Introduce postcopy_try_recovery() helper for the postcopy recovery
logic.  Call it both in migration_fd_process_incoming() and
migration_ioc_process_incoming().

Fixes: 36c2f8be2c ("migration: Delay start of migration main routines")
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180627132246.5576-4-peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-10 12:48:53 +01:00
Peter Xu 81e620531f migration: move income process out of multifd
Move the call to migration_incoming_process() out of multifd code.  It's
a bit strange that we can migration generic calls in multifd code.
Instead, let multifd_recv_new_channel() return a boolean showing whether
it's ready to continue the incoming migration.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180627132246.5576-3-peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-10 12:48:53 +01:00
Daniel P. Berrangé ca273df301 migration: fix crash in when incoming client channel setup fails
The way we determine if we can start the incoming migration was
changed to use migration_has_all_channels() in:

  commit 428d89084c
  Author: Juan Quintela <quintela@redhat.com>
  Date:   Mon Jul 24 13:06:25 2017 +0200

    migration: Create migration_has_all_channels

This method in turn calls multifd_recv_all_channels_created()
which is hardcoded to always return 'true' when multifd is
not in use. This is a latent bug...

...activated in a following commit where that return result
ends up acting as the flag to indicate whether it is possible
to start processing the migration:

  commit 36c2f8be2c
  Author: Juan Quintela <quintela@redhat.com>
  Date:   Wed Mar 7 08:40:52 2018 +0100

    migration: Delay start of migration main routines

This means that if channel initialization fails with normal
migration, it'll never notice and attempt to start the
incoming migration regardless and crash on a NULL pointer.

This can be seen, for example, if a client connects to a server
requiring TLS, but has an invalid x509 certificate:

qemu-system-x86_64: The certificate hasn't got a known issuer
qemu-system-x86_64: migration/migration.c:386: process_incoming_migration_co: Assertion `mis->from_src_file' failed.

 #0  0x00007fffebd24f2b in raise () at /lib64/libc.so.6
 #1  0x00007fffebd0f561 in abort () at /lib64/libc.so.6
 #2  0x00007fffebd0f431 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
 #3  0x00007fffebd1d692 in  () at /lib64/libc.so.6
 #4  0x0000555555ad027e in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:386
 #5  0x0000555555c45e8b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116
 #6  0x00007fffebd3a6a0 in __start_context () at /lib64/libc.so.6
 #7  0x0000000000000000 in  ()

To handle the non-multifd case, we check whether mis->from_src_file
is non-NULL. With this in place, the migration server drops the
rejected client and stays around waiting for another, hopefully
valid, client to arrive.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20180619163552.18206-1-berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-06-27 13:29:35 +02:00
Juan Quintela b9ee2f7d70 migration: Create ram_save_multifd_page
The function still don't use multifd, but we have simplified
ram_save_page, xbzrle and RDMA stuff is gone.  We have added a new
counter.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

--
Add last_page parameter
Add commets for done and address
Remove multifd field, it is the same than normal pages
Merge next patch, now we send multiple pages at a time
Remove counter for multifd pages, it is identical to normal pages
Use iovec's instead of creating the equivalent.
Clear memory used by pages (dave)
Use g_new0(danp)
define MULTIFD_CONTINUE
now pages member is a pointer
Fix off-by-one in number of pages in one packet
Remove RAM_SAVE_FLAG_MULTIFD_PAGE
s/multifd_pages_t/MultiFDPages_t/
add comment explaining what it means
2018-06-27 13:28:11 +02:00
Juan Quintela a61c45bd22 migration: Create multifd_bytes ram_counter
This will include how many bytes they are sent through multifd.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-27 13:28:11 +02:00
Juan Quintela 0c8f0efdd4 migration: Abstract the number of bytes sent
Right now we use the "position" inside the QEMUFile, but things like
RDMA already do weird things to be able to maintain that counter
right, and multifd will have some similar problems.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-27 13:28:11 +02:00
Juan Quintela 6cde6fbe2b migration: Calculate mbps only during transfer time
We used to include in this calculation the setup time, but that can be
quite big in rdma or multifd.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-27 13:28:11 +02:00
Balamuruhan S 650af8907b migration: calculate expected_downtime with ram_bytes_remaining()
expected_downtime value is not accurate with dirty_pages_rate * page_size,
using ram_bytes_remaining() would yeild it resonable.

consider to read the remaining ram just after having updated the dirty
pages count later migration_bitmap_sync_range() in migration_bitmap_sync()
and reuse the `remaining` field in ram_counters to hold ram_bytes_remaining()
for calculating expected_downtime.

Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20180612085009.17594-2-bala24@linux.vnet.ibm.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-15 14:40:56 +01:00
Dr. David Alan Gilbert ad767bed5a migration: Wake rate limiting for urgent requests
Rate limiting sleeps the migration thread for a while when it runs
out of bandwidth; but sometimes we want to wake up to get on with
something more urgent (like a postcopy request).  Here we use
a semaphore with a timedwait instead of a simple sleep; Incrementing
the sempahore will wake it up sooner.  Anything that consumes
these urgent events must decrement the sempahore.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20180613102642.23995-3-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-15 14:40:56 +01:00
Dr. David Alan Gilbert 7e555c6c58 migration/postcopy: Add max-postcopy-bandwidth parameter
Limit the background transfer bandwidth during the postcopy
phase to the value set on this new parameter.  The default, 0,
corresponds to the existing behaviour which is unlimited bandwidth.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20180613102642.23995-2-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-15 14:40:56 +01:00
Dr. David Alan Gilbert 0f073f44df migration: Don't activate block devices if using -S
Activating the block devices causes the locks to be taken on
the backing file.  If we're running with -S and the destination libvirt
hasn't started the destination with 'cont', it's expecting the locks are
still untaken.

Don't activate the block devices if we're not going to autostart the VM;
'cont' already will do that anyway.   This change is tied to the new
migration capability 'late-block-activate' that defaults to off, keeping
the old behaviour by default.

bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-06-04 05:46:15 +02:00
Xiao Guangrong f548222c24 migration: introduce decompress-error-check
QEMU 3.0 enables strict check for compression & decompression to
make the migration more robust, that depends on the source to fix
the internal design which triggers the unexpected error conditions

To make it work for migrating old version QEMU to 2.13 QEMU, we
introduce this parameter to disable the error check on the
destination which is the default behavior of the machine type
which is older than 2.13, alternately, the strict check can be
enabled explicitly as followings:
      -M pc-q35-2.11 -global migration.decompress-error-check=true

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-06-04 05:46:15 +02:00
Peter Xu bfbf89c2b5 migration/qmp: add command migrate-pause
It pauses an ongoing migration.  Currently it only supports postcopy.
Note that this command will work on either side of the migration.
Basically when we trigger this on one side, it'll interrupt the other
side as well since the other side will get notified on the disconnect
event.

However, it's still possible that the other side is not notified, for
example, when the network is totally broken, or due to some firewall
configuration changes.  In that case, we will also need to run the same
command on the other side so both sides will go into the paused state.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-24-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

---
s/2.12/2.13/
2018-05-15 22:12:57 +02:00
Peter Xu 62df066fff migration: introduce lock for to_dst_file
Let's introduce a lock for that QEMUFile since we are going to operate
on it in multiple threads.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-23-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15 22:12:41 +02:00
Peter Xu 02affd41b1 qmp/migration: new command migrate-recover
The first allow-oob=true command.  It's used on destination side when
the postcopy migration is paused and ready for a recovery.  After
execution, a new migration channel will be established for postcopy to
continue.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-21-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
s/2.12/2.13/
2018-05-15 22:11:45 +02:00