Commit Graph

630 Commits (7d61d892327d803ae43d14500601e48031b4632c)

Author SHA1 Message Date
Laurent Vivier 7d61d89232 linux-user: fix netlink memory corruption
Netlink is byte-swapping data in the guest memory (it's bad).

It's ok when the data come from the host as they are generated by the
host.

But it doesn't work when data come from the guest: the guest can
try to reuse these data whereas they have been byte-swapped.

This is what happens in glibc:

glibc generates a sequence number in nlh.nlmsg_seq and calls
sendto() with this nlh. In sendto(), we byte-swap nlmsg.seq.

Later, after the recvmsg(), glibc compares nlh.nlmsg_seq with
sequence number given in return, and of course it fails (hangs),
because nlh.nlmsg_seq is not valid anymore.

The involved code in glibc is:

sysdeps/unix/sysv/linux/check_pf.c:make_request()
...
  req.nlh.nlmsg_seq = time (NULL);
...
  if (TEMP_FAILURE_RETRY (__sendto (fd, (void *) &req, sizeof (req), 0,
                                    (struct sockaddr *) &nladdr,
                                    sizeof (nladdr))) < 0)
<here req.nlh.nlmsg_seq has been byte-swapped>
...
  do
    {
...
      ssize_t read_len = TEMP_FAILURE_RETRY (__recvmsg (fd, &msg, 0));
...
      struct nlmsghdr *nlmh;
      for (nlmh = (struct nlmsghdr *) buf;
           NLMSG_OK (nlmh, (size_t) read_len);
           nlmh = (struct nlmsghdr *) NLMSG_NEXT (nlmh, read_len))
        {
<we compare nlmh->nlmsg_seq with corrupted req.nlh.nlmsg_seq>
          if (nladdr.nl_pid != 0 || (pid_t) nlmh->nlmsg_pid != pid
              || nlmh->nlmsg_seq != req.nlh.nlmsg_seq)
            continue;
...
          else if (nlmh->nlmsg_type == NLMSG_DONE)
            /* We found the end, leave the loop.  */
            done = true;
        }
    }
  while (! done);

As we have a continue on "nlmh->nlmsg_seq != req.nlh.nlmsg_seq",
"done" cannot be set to "true" and we have an infinite loop.

It's why commands like "apt-get update" or "dnf update hangs".

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-07-19 15:20:59 +03:00
Laurent Vivier ef759f6fcc linux-user: fd_trans_*_data() returns the length
fd_trans_target_to_host_data() and fd_trans_host_to_target_data() must
return the length of processed data.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-07-19 15:20:58 +03:00
Peter Maydell ddf31aa853 linux-user: Fix compilation when F_SETPIPE_SZ isn't defined
Older kernels don't have F_SETPIPE_SZ and F_GETPIPE_SZ (in
particular RHEL6's system headers don't define these). Add
ifdefs so that we can gracefully fall back to not supporting
those guest ioctls rather than failing to build.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-id: 1467304429-21470-1-git-send-email-peter.maydell@linaro.org
2016-06-30 19:27:51 +01:00
Peter Maydell 845d1e7e42 -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQEcBAABAgAGBQJXcuu4AAoJEJykq7OBq3PIG0EH/0qjHztvJXFcOB/pMYt9fRoe
 /zEuqSaSQJYBNjRonp4jUslziJG7ULyrcqbW4BmIswQG3R8577z5vQdJ2XKSRoAf
 sUb4vVnBiTGdZPWPvO7pxuqgYSHoGGSsHFCuzs7gL+mOkJs3U8Uw+9DExAe80J/R
 C4ZU40A4EW/UUebrWeasW8DG5b0h6dFSnIKtSo6DcGbcGDHQS1FwM/atpFk58Zev
 lpS+T8Q1JEuIYK6RAa1Oc9KxeJO3OROen2HqlPDGCFl32t3k7lEmR77J4bNg/mDj
 XbYQZbVtDeIN02/o5GSAF4ros0UNZ/Ut9Z0YeWoCzBumu9ZK0iGbitgLG8w7psA=
 =gHyO
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging

# gpg: Signature made Tue 28 Jun 2016 22:27:20 BST
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/tracing-pull-request:
  trace: [*-user] Add events to trace guest syscalls in syscall emulation mode
  trace: enable tracing in qemu-img
  qemu-img: move common options parsing before commands processing
  trace: enable tracing in qemu-nbd
  trace: enable tracing in qemu-io
  trace: move qemu_trace_opts to trace/control.c
  doc: move text describing --trace to specific .texi file
  doc: sync help description for --trace with man for qemu.1

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-29 14:07:57 +01:00
Lluís Vilanova 9c15e70086 trace: [*-user] Add events to trace guest syscalls in syscall emulation mode
Adds two events to trace syscalls in syscall emulation mode (*-user):

* guest_user_syscall: Emitted before the syscall is emulated; contains
  the syscall number and arguments.

* guest_user_syscall_ret: Emitted after the syscall is emulated;
  contains the syscall number and return value.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Message-id: 146651712411.12388.10024905980452504938.stgit@fimbulvetr.bsc.es
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-28 21:14:12 +01:00
Laurent Vivier b9403979b5 linux-user: don't swap NLMSG_DATA() fields
If the structure pointed by NLMSG_DATA() is bigger
than the size of NLMSG_DATA(), don't swap its fields
to avoid memory corruption.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-26 13:17:22 +03:00
Laurent Vivier 48dc0f2c3d linux-user: fd_trans_host_to_target_data() must process only received data
if we process the whole buffer, the netlink helpers can try
to swap invalid data.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-26 13:17:22 +03:00
Laurent Vivier 84f34b00c8 linux-user: add missing return in netlink switch statement
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-26 13:17:21 +03:00
Peter Maydell 7e3b92ece0 linux-user: Support F_GETPIPE_SZ and F_SETPIPE_SZ fcntls
Support the F_GETPIPE_SZ and F_SETPIPE_SZ fcntl operations.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-26 13:17:21 +03:00
Peter Maydell 4debae6fa5 linux-user: Fix wrong type used for argument to rt_sigqueueinfo
The third argument to the rt_sigqueueinfo syscall is a pointer to
a siginfo_t, not a pointer to a sigset_t. Fix the error in the
arguments to lock_user(), which meant that we would not have
detected some faults that we should.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-26 13:17:21 +03:00
Peter Maydell 1d48fdd9d8 linux-user: Don't use sigfillset() on uc->uc_sigmask
The kernel and libc have different ideas about what a sigset_t
is -- for the kernel it is only _NSIG / 8 bytes in size (usually
8 bytes), but for libc it is much larger, 128 bytes. In most
situations the difference doesn't matter, because if you pass a
pointer to a libc sigset_t to the kernel it just acts on the first
8 bytes of it, but for the ucontext_t* argument to a signal handler
it trips us up. The kernel allocates this ucontext_t on the stack
according to its idea of the sigset_t type, but the type of the
ucontext_t defined by the libc headers uses the libc type, and
so do the manipulator functions like sigfillset(). This means that
 (1) sizeof(uc->uc_sigmask) is much larger than the actual
     space used on the stack
 (2) sigfillset(&uc->uc_sigmask) will write garbage 0xff bytes
     off the end of the structure, which can trash data that
     was on the stack before the signal handler was invoked,
     and may result in a crash after the handler returns

To avoid this, we use a memset() of the correct size to fill
the signal mask rather than using the libc function.

This fixes a problem where we would crash at least some of the
time on an i386 host when a signal was taken.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-26 13:17:20 +03:00
Peter Maydell 435da5e709 linux-user: Use safe_syscall wrapper for fcntl
Use the safe_syscall wrapper for fcntl. This is straightforward now
that we always use 'struct fcntl64' on the host, as we don't need
to select whether to call the host's fcntl64 or fcntl syscall
(a detail that the libc previously hid for us).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-26 13:17:20 +03:00
Peter Maydell 213d3e9ea2 linux-user: Use __get_user() and __put_user() to handle structs in do_fcntl()
Use the __get_user() and __put_user() to handle reading and writing the
guest structures in do_ioctl(). This has two benefits:
 * avoids possible errors due to misaligned guest pointers
 * correctly sign extends signed fields (like l_start in struct flock)
   which might be different sizes between guest and host

To do this we abstract out into copy_from/to_user functions. We
also standardize on always using host flock64 and the F_GETLK64
etc flock commands, as this means we always have 64 bit offsets
whether the host is 64-bit or 32-bit and we don't need to support
conversion to both host struct flock and struct flock64.

In passing we fix errors in converting l_type from the host to
the target (where we were doing a byteswap of the host value
before trying to do the convert-bitmasks operation rather than
otherwise, and inexplicably shifting left by 1); these were
accidentally left over when the original simple "just shift by 1"
arm<->x86 conversion of commit 43f238d was changed to the more
general scheme of using target_to_host_bitmask() functions in 2ba7f73.

[RV: fixed ifdef guard for eabi functions]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-26 13:16:41 +03:00
Paolo Bonzini 02d0e09503 os-posix: include sys/mman.h
qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check
is bogus without a previous inclusion of sys/mman.h.  Include it in
sysemu/os-posix.h and remove it from everywhere else.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:03 +02:00
Peter Maydell b66e10e4c9 linux-user pull request for June 2016
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIVAwUAV1gdMrRIkN7ePJvAAQhLcg/+Kby99taEuewItrA1yDs75jxOlLqaJopd
 cVzo4LFRFPhIn4UEKqRQS0CGoIeU/DYOmObvuUzJxs2LyUoHoqmQOwEm5obC2a85
 JrHo/NOppYBbyvvIEAAXzZDCZo0KZKVclrlT+AX5obpOSNSvAnKvEuLWq1aQ9WGN
 n4AzHuFEl885cd4nFd8VK/xth89bqz6U/z8CjgIuw3mczp1XNrK5IJJwAy5epHay
 GCBr9XHooW3SU971WS20RTRS0D33tKPHgCU3ZeZ3rKh4g3JNj6/ixdVgzi9NqFsQ
 5DzAj/iBGhN1LtCOednRS6tUt32Bhy8G/g4O3GiXdejagAmNe2wz31cveNJ8S3W5
 DK8SZAnJlz06zN5uIpOVQgDOqfXZkCp7ndq779QJoHOAnuOjJBcUbhw1myz2R3eR
 6208tStWl3R0+ATEK8CZ7ejg1cUHvdzyqGJA+1nC2HaFUrBWipxN8jf2fz9vO/wG
 G7zNbahvVgyJWO7bPNK4mxkb6qkWCETnCnLJsq2ZbmtPEMcINjD8vNWLNvFGVG8b
 2HbinDrzh0Z9Zik5gLZfiVyP5HFaWSrJn9QRVIgaUjuIH9n3/25sl9OvW/JLjxJ+
 h2P17CLnAK6dhUYc4R3wQTx2X/N2FvO4DD8iMYOcgDY6fhZ2b6EEyE9yBgQrIDbF
 gU1AlC/CX+M=
 =AXqa
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/riku/tags/pull-linux-user-20160608' into staging

linux-user pull request for June 2016

# gpg: Signature made Wed 08 Jun 2016 14:27:14 BST
# gpg:                using RSA key 0xB44890DEDE3C9BC0
# gpg: Good signature from "Riku Voipio <riku.voipio@iki.fi>"
# gpg:                 aka "Riku Voipio <riku.voipio@linaro.org>"

* remotes/riku/tags/pull-linux-user-20160608: (44 commits)
  linux-user: In fork_end(), remove correct CPUs from CPU list
  linux-user: Special-case ERESTARTSYS in target_strerror()
  linux-user: Make target_strerror() return 'const char *'
  linux-user: Correct signedness of target_flock l_start and l_len fields
  linux-user: Use safe_syscall wrapper for ioctl
  linux-user: Use safe_syscall wrapper for accept and accept4 syscalls
  linux-user: Use safe_syscall wrapper for semop
  linux-user: Use safe_syscall wrapper for epoll_wait syscalls
  linux-user: Use safe_syscall wrapper for poll and ppoll syscalls
  linux-user: Use safe_syscall wrapper for sleep syscalls
  linux-user: Use safe_syscall wrapper for rt_sigtimedwait syscall
  linux-user: Use safe_syscall wrapper for flock
  linux-user: Use safe_syscall wrapper for mq_timedsend and mq_timedreceive
  linux-user: Use safe_syscall wrapper for msgsnd and msgrcv
  linux-user: Use safe_syscall wrapper for send* and recv* syscalls
  linux-user: Use safe_syscall wrapper for connect syscall
  linux-user: Use safe_syscall wrapper for readv and writev syscalls
  linux-user: Fix error conversion in 64-bit fadvise syscall
  linux-user: Fix NR_fadvise64 and NR_fadvise64_64 for 32-bit guests
  linux-user: Fix handling of arm_fadvise64_64 syscall
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Conflicts:
	configure
	scripts/qemu-binfmt-conf.sh
2016-06-08 18:34:32 +01:00
Peter Maydell da2a34f7f9 linux-user: Special-case ERESTARTSYS in target_strerror()
Since TARGET_ERESTARTSYS and TARGET_ESIGRETURN are internal-to-QEMU
error numbers, handle them specially in target_strerror(), to avoid
confusing strace output like:

9521 rt_sigreturn(14,8,274886297808,8,0,268435456) = -1 errno=513 (Unknown error 513)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-08 12:06:57 +03:00
Peter Maydell 7dcdaeafe0 linux-user: Make target_strerror() return 'const char *'
Make target_strerror() return 'const char *' rather than just 'char *';
this will allow us to return constant strings from it for some special
cases.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
2016-06-08 12:06:57 +03:00
Peter Maydell 49ca6f3e24 linux-user: Use safe_syscall wrapper for ioctl
Use the safe_syscall wrapper to implement the ioctl syscall.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-08 10:13:47 +03:00
Peter Maydell ff6dc13079 linux-user: Use safe_syscall wrapper for accept and accept4 syscalls
Use the safe_syscall wrapper for the accept and accept4 syscalls.
accept4 has been in the kernel since 2.6.28 so we can assume it
is always present.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-08 10:13:46 +03:00
Peter Maydell ffb7ee796a linux-user: Use safe_syscall wrapper for semop
Use the safe_syscall wrapper for the semop syscall or IPC operation.
(We implement via the semtimedop syscall to make it easier to
implement the guest semtimedop syscall later.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-08 10:13:46 +03:00
Peter Maydell 227f02143f linux-user: Use safe_syscall wrapper for epoll_wait syscalls
Use the safe_syscall wrapper for epoll_wait and epoll_pwait syscalls.

Since we now directly use the host epoll_pwait syscall for both
epoll_wait and epoll_pwait, we don't need the configure machinery
to check whether glibc supports epoll_pwait(). (The kernel has
supported the syscall since 2.6.19 so we can assume it's always there.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-08 10:13:46 +03:00
Peter Maydell a6130237b8 linux-user: Use safe_syscall wrapper for poll and ppoll syscalls
Use the safe_syscall wrapper for the poll and ppoll syscalls.
Since not all host architectures will have a poll syscall, we
have to rewrite the TARGET_NR_poll handling to use ppoll instead
(we can assume everywhere has ppoll by now).

We take the opportunity to switch to the code structure
already used in the implementation of epoll_wait and epoll_pwait,
which uses a switch() to avoid interleaving #if and if (),
and to stop using a variable with a leading '_' which is in
the implementation's namespace.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-08 10:13:46 +03:00
Peter Maydell 9e518226f4 linux-user: Use safe_syscall wrapper for sleep syscalls
Use the safe_syscall wrapper for the clock_nanosleep and nanosleep
syscalls.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-08 10:13:46 +03:00
Peter Maydell b3f8233068 linux-user: Use safe_syscall wrapper for rt_sigtimedwait syscall
Use the safe_syscall wrapper for the rt_sigtimedwait syscall.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-08 10:13:46 +03:00
Peter Maydell 2a8459892f linux-user: Use safe_syscall wrapper for flock
Use the safe_syscall wrapper for the flock syscall.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-08 10:13:46 +03:00
Peter Maydell d40ecd6618 linux-user: Use safe_syscall wrapper for mq_timedsend and mq_timedreceive
Use the safe_syscall wrapper for mq_timedsend and mq_timedreceive syscalls.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-08 10:13:46 +03:00
Peter Maydell 89f9fe4452 linux-user: Use safe_syscall wrapper for msgsnd and msgrcv
Use the safe_syscall wrapper for msgsnd and msgrcv syscalls.
This is made slightly awkward by some host architectures providing
only a single 'ipc' syscall rather than separate syscalls per
operation; we provide safe_msgsnd() and safe_msgrcv() as wrappers
around safe_ipc() to handle this if needed.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-08 10:13:45 +03:00
Peter Maydell 666875306e linux-user: Use safe_syscall wrapper for send* and recv* syscalls
Use the safe_syscall wrapper for the send, sendto, sendmsg, recv,
recvfrom and recvmsg syscalls.

RV: adjusted to apply
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-08 10:13:45 +03:00
Peter Maydell 2a3c761928 linux-user: Use safe_syscall wrapper for connect syscall
Use the safe_syscall wrapper for the connect syscall.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-08 10:13:45 +03:00
Peter Maydell 918c03ed9a linux-user: Use safe_syscall wrapper for readv and writev syscalls
Use the safe_syscall wrapper for readv and writev syscalls.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-08 10:13:45 +03:00
Peter Maydell 977d8241c1 linux-user: Fix error conversion in 64-bit fadvise syscall
Fix a missing host-to-target errno conversion in the 64-bit
fadvise syscall emulation.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-08 10:13:45 +03:00
Peter Maydell badd3cd880 linux-user: Fix NR_fadvise64 and NR_fadvise64_64 for 32-bit guests
Fix errors in the implementation of NR_fadvise64 and NR_fadvise64_64
for 32-bit guests, which pass their off_t values in register pairs.
We can't use the 64-bit code path for this, so split out the 32-bit
cases, so that we can correctly handle the "only offset is 64-bit"
and "both offset and length are 64-bit" syscall flavours, and
"uses aligned register pairs" and "does not" flavours of target.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-08 10:13:45 +03:00
Peter Maydell e0156a9dc4 linux-user: Fix handling of arm_fadvise64_64 syscall
32-bit ARM has an odd variant of the fadvise syscall which has
rearranged arguments, which we try to implement. Unfortunately we got
the rearrangement wrong.

This is a six-argument syscall whose arguments are:
 * fd
 * advise parameter
 * offset high half
 * offset low half
 * len high half
 * len low half

Stop trying to share code with the standard fadvise syscalls,
and just implement the syscall with the correct argument order.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-08 10:13:45 +03:00
Laurent Vivier b1b2db29bd linux-user: Use DIV_ROUND_UP
Replace (((n) + (d) - 1) /(d)) by DIV_ROUND_UP(n,d).

This patch is the result of coccinelle script
scripts/coccinelle/round.cocci

CC: Riku Voipio <riku.voipio@iki.fi>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-06-07 18:19:25 +03:00
Timothy E Baldwin 7d92d34ee4 linux-user: Restart fork() if signals pending
If there is a signal pending during fork() the signal handler will
erroneously be called in both the parent and child, so handle any
pending signals first.

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-20-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-07 16:39:08 +03:00
Peter Maydell bef653d92e linux-user: Use safe_syscall for kill, tkill and tgkill syscalls
Use the safe_syscall wrapper for the kill, tkill and tgkill syscalls.
Without this, if a thread sent a SIGKILL to itself it could kill the
thread before we had a chance to process a signal that arrived just
before the SIGKILL, and that signal would get lost.

We drop all the ifdeffery for tkill and tgkill, because every guest
architecture we support implements them, and they've been in Linux
since 2003 so we can assume the host headers define the __NR_tkill
and __NR_tgkill constants.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-07 16:39:08 +03:00
Timothy E Baldwin a0995886e2 linux-user: Restart exit() if signal pending
Without this a signal could vanish on thread exit.

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-26-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-07 16:39:08 +03:00
Timothy E Baldwin f59ec60610 linux-user: pause() should not pause if signal pending
Fix races between signal handling and the pause syscall by
reimplementing it using block_signals() and sigsuspend().
(Using safe_syscall(pause) would also work, except that the
pause syscall doesn't exist on all architectures.)

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-28-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
[PMM: tweaked commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-07 16:39:07 +03:00
Peter Maydell 3d3efba020 linux-user: Fix race between multiple signals
If multiple host signals are received in quick succession they would
be queued in TaskState then delivered to the guest in spite of
signals being supposed to be blocked by the guest signal handler's
sa_mask. Fix this by decoupling the guest signal mask from the
host signal mask, so we can have protected sections where all
host signals are blocked. In particular we block signals from
when host_signal_handler() queues a signal from the guest until
process_pending_signals() has unqueued it. We also block signals
while we are manipulating the guest signal mask in emulation of
sigprocmask and similar syscalls.

Blocking host signals also ensures the correct behaviour with respect
to multiple threads and the overrun count of timer related signals.
Alas blocking and queuing in qemu is still needed because of virtual
processor exceptions, SIGSEGV and SIGBUS.

Blocking signals inside process_pending_signals() protects against
concurrency problems that would otherwise happen if host_signal_handler()
ran and accessed the signal data structures while process_pending_signals()
was manipulating them.

Since we now track the guest signal mask separately from that
of the host, the sigsuspend system calls must track the signal
mask passed to them, because when we process signals as we leave
the sigsuspend the guest signal mask in force is that passed to
sigsuspend.

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-19-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
[PMM: make signal_pending a simple flag rather than a word with two flag bits;
 ensure we don't call block_signals() twice in sigreturn codepaths;
 document and assert() the guarantee that using do_sigprocmask() to
 get the current mask never fails;  use the qemu atomics.h functions
 rather than raw volatile variable access; add extra commentary and
 documentation; block SIGSEGV/SIGBUS in block_signals() and in
 process_pending_signals() because they can't occur synchronously here;
 check the right do_sigprocmask() call for errors in ssetmask syscall;
 expand commit message; fixed sigsuspend() hanging]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-07 16:39:07 +03:00
Peter Maydell 2fe4fba115 linux-user: Use safe_syscall for sigsuspend syscalls
Use the safe_syscall wrapper for sigsuspend syscalls. This
means that we will definitely deliver a signal that arrives
before we do the sigsuspend call, rather than blocking first
and delivering afterwards.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-07 16:39:06 +03:00
Peter Maydell b28a1f333a linux-user: Define macro for size of host kernel sigset_t
Some host syscalls take an argument specifying the size of a
host kernel's sigset_t (which isn't necessarily the same as
that of the host libc's type of that name). Instead of hardcoding
_NSIG / 8 where we do this, define and use a SIGSET_T_SIZE macro.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-07 16:39:06 +03:00
Laurent Vivier 575b22b1b7 linux-user: check if NETLINK_ROUTE is available
Some IFLA_* symbols can be missing in the host linux/if_link.h,
but as they are enums and not "#defines", check in "configure" if
last known  (IFLA_PROTO_DOWN) is available and if not, disable
management of NETLINK_ROUTE protocol.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-07 11:39:00 +03:00
Laurent Vivier 5ce9bb5937 linux-user: add netlink audit
This is, for instance, needed to log in a container.

Without this, the user cannot be identified and the console login
fails with "Login incorrect".

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-07 11:37:14 +03:00
Laurent Vivier b265620bfb linux-user: support netlink protocol NETLINK_KOBJECT_UEVENT
This is the protocol used by udevd to manage kernel events.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-07 11:34:36 +03:00
Laurent Vivier 6c5b5645ae linux-user: add rtnetlink(7) support
rtnetlink is needed to use iproute package (ip addr, ip route)
and dhcp client.

Examples:

Without this patch:
    # ip link
    Cannot open netlink socket: Address family not supported by protocol
    # ip addr
    Cannot open netlink socket: Address family not supported by protocol
    # ip route
    Cannot open netlink socket: Address family not supported by protocol
    # dhclient eth0
    Cannot open netlink socket: Address family not supported by protocol
    Cannot open netlink socket: Address family not supported by protocol

With this patch:
    # ip link
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    51: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT qlen 1000
        link/ether 00:16:3e:89:6b:d7 brd ff:ff:ff:ff:ff:ff
    # ip addr show eth0
    51: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP qlen 1000
        link/ether 00:16:3e:89:6b:d7 brd ff:ff:ff:ff:ff:ff
        inet 192.168.122.197/24 brd 192.168.122.255 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::216:3eff:fe89:6bd7/64 scope link
           valid_lft forever preferred_lft forever
    # ip route
    default via 192.168.122.1 dev eth0
    192.168.122.0/24 dev eth0  proto kernel  scope link  src 192.168.122.197
    # ip addr flush eth0
    # ip addr add 192.168.122.10 dev eth0
    # ip addr show eth0
    51: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP qlen 1000
        link/ether 00:16:3e:89:6b:d7 brd ff:ff:ff:ff:ff:ff
        inet 192.168.122.10/32 scope global eth0
           valid_lft forever preferred_lft forever
    # ip route add 192.168.122.0/24 via 192.168.122.10
    # ip route
        192.168.122.0/24 via 192.168.122.10 dev eth0

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-06-07 11:33:36 +03:00
Peter Maydell fd6f7798ac linux-user: Use direct syscalls for setuid(), etc
On Linux the setuid(), setgid(), etc system calls have different semantics
from the libc functions. The libc functions follow POSIX and update the
credentials for all threads in the process; the system calls update only
the thread which makes the call. (This impedance mismatch is worked around
in libc by signalling all threads to tell them to do a syscall, in a
byzantine and fragile way; see http://ewontfix.com/17/.)

Since in linux-user we are trying to emulate the system call semantics,
we must implement all these syscalls to directly call the underlying
host syscall, rather than calling the host libc function.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:50:39 +03:00
Peter Maydell 415d847110 linux-user: Use g_try_malloc() in do_msgrcv()
In do_msgrcv() we want to allocate a message buffer, whose size
is passed to us by the guest. That means we could legitimately
fail, so use g_try_malloc() and handle the error case, in the same
way that do_msgsnd() does.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:50:39 +03:00
Peter Maydell 99874f6552 linux-user: Handle msgrcv error case correctly
The msgrcv ABI is a bit odd -- the msgsz argument is a size_t, which is
unsigned, but it must fail EINVAL if the value is negative when cast
to a long. We were incorrectly passing the value through an
"unsigned int", which meant that if the guest was 32-bit longs and
the host was 64-bit longs an input of 0xffffffff (which should trigger
EINVAL) would simply be passed to the host msgrcv() as 0xffffffff,
where it does not cause the host kernel to reject it.
Follow the same approach as do_msgsnd() in using a ssize_t and
doing the check for negative values by hand, so we correctly fail
in this corner case.

This fixes the msgrcv03 Linux Test Project test case, which otherwise
hangs.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:50:39 +03:00
Peter Maydell c7e35da348 linux-user: Handle negative values in timespec conversion
In a struct timespec, both fields are signed longs. Converting
them from guest to host with code like
    host_ts->tv_sec = tswapal(target_ts->tv_sec);
mishandles negative values if the guest has 32-bit longs and
the host has 64-bit longs because tswapal()'s return type is
abi_ulong: the assignment will zero-extend into the host long
type rather than sign-extending it.

Make the conversion routines use __get_user() and __set_user()
instead: this automatically picks up the signedness of the
field type and does the correct kind of sign or zero extension.
It also handles the possibility that the target struct is not
sufficiently aligned for the host's requirements.

In particular, this fixes a hang when running the Linux Test Project
mq_timedsend01 and mq_timedreceive01 tests: one of the test cases
sets the timeout to -1 and expects an EINVAL failure, but we were
setting a very long timeout instead.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:50:39 +03:00
Peter Maydell d509eeb13c linux-user: Use safe_syscall for futex syscall
Use the safe_syscall wrapper for the futex syscall.

In particular, this fixes hangs when using programs that link
against the Boehm garbage collector, including the Mono runtime.

(We don't change the sys_futex() call in the implementation of
the exit syscall, because as the FIXME comment there notes
that should be handled by disabling signals, since we can't
easily back out if the futex were to return ERESTARTSYS.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:50:39 +03:00