mirror_qemu

Commit Graph

Author	SHA1	Message	Date
Ilya Maximets	cb039ef3d9	net: add initial support for AF_XDP network backend AF_XDP is a network socket family that allows communication directly with the network device driver in the kernel, bypassing most or all of the kernel networking stack. In the essence, the technology is pretty similar to netmap. But, unlike netmap, AF_XDP is Linux-native and works with any network interfaces without driver modifications. Unlike vhost-based backends (kernel, user, vdpa), AF_XDP doesn't require access to character devices or unix sockets. Only access to the network interface itself is necessary. This patch implements a network backend that communicates with the kernel by creating an AF_XDP socket. A chunk of userspace memory is shared between QEMU and the host kernel. 4 ring buffers (Tx, Rx, Fill and Completion) are placed in that memory along with a pool of memory buffers for the packet data. Data transmission is done by allocating one of the buffers, copying packet data into it and placing the pointer into Tx ring. After transmission, device will return the buffer via Completion ring. On Rx, device will take a buffer form a pre-populated Fill ring, write the packet data into it and place the buffer into Rx ring. AF_XDP network backend takes on the communication with the host kernel and the network interface and forwards packets to/from the peer device in QEMU. Usage example: -device virtio-net-pci,netdev=guest1,mac=00:16:35:AF:AA:5C -netdev af-xdp,ifname=ens6f1np1,id=guest1,mode=native,queues=1 XDP program bridges the socket with a network interface. It can be attached to the interface in 2 different modes: 1. skb - this mode should work for any interface and doesn't require driver support. With a caveat of lower performance. 2. native - this does require support from the driver and allows to bypass skb allocation in the kernel and potentially use zero-copy while getting packets in/out userspace. By default, QEMU will try to use native mode and fall back to skb. Mode can be forced via 'mode' option. To force 'copy' even in native mode, use 'force-copy=on' option. This might be useful if there is some issue with the driver. Option 'queues=N' allows to specify how many device queues should be open. Note that all the queues that are not open are still functional and can receive traffic, but it will not be delivered to QEMU. So, the number of device queues should generally match the QEMU configuration, unless the device is shared with something else and the traffic re-direction to appropriate queues is correctly configured on a device level (e.g. with ethtool -N). 'start-queue=M' option can be used to specify from which queue id QEMU should start configuring 'N' queues. It might also be necessary to use this option with certain NICs, e.g. MLX5 NICs. See the docs for examples. In a general case QEMU will need CAP_NET_ADMIN and CAP_SYS_ADMIN or CAP_BPF capabilities in order to load default XSK/XDP programs to the network interface and configure BPF maps. It is possible, however, to run with no capabilities. For that to work, an external process with enough capabilities will need to pre-load default XSK program, create AF_XDP sockets and pass their file descriptors to QEMU process on startup via 'sock-fds' option. Network backend will need to be configured with 'inhibit=on' to avoid loading of the program. QEMU will need 32 MB of locked memory (RLIMIT_MEMLOCK) per queue or CAP_IPC_LOCK. There are few performance challenges with the current network backends. First is that they do not support IO threads. This means that data path is handled by the main thread in QEMU and may slow down other work or may be slowed down by some other work. This also means that taking advantage of multi-queue is generally not possible today. Another thing is that data path is going through the device emulation code, which is not really optimized for performance. The fastest "frontend" device is virtio-net. But it's not optimized for heavy traffic either, because it expects such use-cases to be handled via some implementation of vhost (user, kernel, vdpa). In practice, we have virtio notifications and rcu lock/unlock on a per-packet basis and not very efficient accesses to the guest memory. Communication channels between backend and frontend devices do not allow passing more than one packet at a time as well. Some of these challenges can be avoided in the future by adding better batching into device emulation or by implementing vhost-af-xdp variant. There are also a few kernel limitations. AF_XDP sockets do not support any kinds of checksum or segmentation offloading. Buffers are limited to a page size (4K), i.e. MTU is limited. Multi-buffer support implementation for AF_XDP is in progress, but not ready yet. Also, transmission in all non-zero-copy modes is synchronous, i.e. done in a syscall. That doesn't allow high packet rates on virtual interfaces. However, keeping in mind all of these challenges, current implementation of the AF_XDP backend shows a decent performance while running on top of a physical NIC with zero-copy support. Test setup: 2 VMs running on 2 physical hosts connected via ConnectX6-Dx card. Network backend is configured to open the NIC directly in native mode. The driver supports zero-copy. NIC is configured to use 1 queue. Inside a VM - iperf3 for basic TCP performance testing and dpdk-testpmd for PPS testing. iperf3 result: TCP stream : 19.1 Gbps dpdk-testpmd (single queue, single CPU core, 64 B packets) results: Tx only : 3.4 Mpps Rx only : 2.0 Mpps L2 FWD Loopback : 1.5 Mpps In skb mode the same setup shows much lower performance, similar to the setup where pair of physical NICs is replaced with veth pair: iperf3 result: TCP stream : 9 Gbps dpdk-testpmd (single queue, single CPU core, 64 B packets) results: Tx only : 1.2 Mpps Rx only : 1.0 Mpps L2 FWD Loopback : 0.7 Mpps Results in skb mode or over the veth are close to results of a tap backend with vhost=on and disabled segmentation offloading bridged with a NIC. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> (docker/lcitool) Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-09-18 14:36:13 +08:00
Laurent Vivier	5166fe0ae4	qapi: net: add stream and dgram netdevs Copied from socket netdev file and modified to use SocketAddress to be able to introduce new features like unix socket. "udp" and "mcast" are squashed into dgram netdev, multicast is detected according to the IP address type. "listen" and "connect" modes are managed by stream netdev. An optional parameter "server" defines the mode (off by default) The two new types need to be parsed the modern way with -netdev, because with the traditional way, the "type" field of netdev structure collides with the "type" field of SocketAddress and prevents the correct evaluation of the command line option. Moreover the traditional way doesn't allow to use the same type (SocketAddress) several times with the -netdev option (needed to specify "local" and "remote" addresses). The previous commit paved the way for parsing the modern way, but omitted one detail: how to pick modern vs. traditional, in netdev_is_modern(). We want to pick based on the value of parameter "type". But how to extract it from the option argument? Parsing the option argument, either the modern or the traditional way, extracts it for us, but only if parsing succeeds. If parsing fails, there is no good option. No matter which parser we pick, it'll be the wrong one for some arguments, and the error reporting will be confusing. Fortunately, the traditional parser accepts anything when called in a certain way. This maximizes our chance to extract the value of "type", and in turn minimizes the risk of confusing error reporting. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2022-10-28 13:28:52 +08:00
Vladislav Yaroshchuk	81ad2964e9	net/vmnet: add vmnet backends to qapi/net Create separate netdevs for each vmnet operating mode: - vmnet-host - vmnet-shared - vmnet-bridged Reviewed-by: Akihiko Odaki <akihiko.odaki@gmail.com> Tested-by: Akihiko Odaki <akihiko.odaki@gmail.com> Acked-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Vladislav Yaroshchuk <Vladislav.Yaroshchuk@jetbrains.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2022-05-17 16:48:23 +08:00
Cindy Lu	1e0a84ea49	vhost-vdpa: introduce vhost-vdpa net client This patch set introduces a new net client type: vhost-vdpa. vhost-vdpa net client will set up a vDPA device which is specified by a "vhostdev" parameter. Signed-off-by: Lingshan Zhu <lingshan.zhu@intel.com> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: Cindy Lu <lulu@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20200701145538.22333-15-lulu@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2020-07-07 07:59:51 -04:00
Markus Armbruster	522ece32d2	Drop superfluous includes of qapi-types.h and test-qapi-types.h Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-4-armbru@redhat.com>	2018-02-09 05:05:11 +01:00
Kővágó, Zoltán	cebea51057	net: use Netdev instead of NetClientOptions in client init This way we no longer need NetClientOptions and can convert Netdev into a flat union. Signed-off-by: Kővágó, Zoltán <DirtY.iCE.hu@gmail.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <93ffdfed7054529635e6acb935150d95dc173a12.1441627176.git.DirtY.iCE.hu@gmail.com> [rework net_client_init1() to pass Netdev by copying from NetdevLegacy, rather than merging the two types - which means that we still need NetClientOptions after all. Rebase to qapi changes. The bulk of the patch is mechanical, replacing 'opts' by 'netdev->opts', while net_client_init1() takes care of converting between legacy and modern types.] Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1468468228-27827-2-git-send-email-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2016-07-19 13:21:08 +02:00
Markus Armbruster	a30ecde6e7	net: Permit incremental conversion of init functions to Error Error reporting for netdev_add is broken: the net_client_init_fun[] report the actual errors with (at best) error_report(), and their caller net_client_init1() makes up a generic error on top. For command line and HMP, this produces an mildly ugly error cascade. In QMP, the actual errors go to stderr, and the generic error becomes the command's error reply. To fix this, we need to convert the net_client_init_fun[] to Error. To permit fixing them one by one, add an Error ** parameter to the net_client_init_fun[]. If the call fails without returning an Error, make up the same generic Error as before. But if it returns one, use that instead. Since none of them does so far, no functional change. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1431691143-1015-3-git-send-email-armbru@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-27 09:51:04 +01:00
Anton Ivanov	3fb69aa1d1	net: L2TPv3 transport This transport allows to connect a QEMU nic to a static Ethernet over L2TPv3 tunnel. The transport supports all options present in the Linux kernel implementation. It allows QEMU to connect to any Linux host running kernel 3.3+, most routers and network devices as well as other QEMU instances. [Fixed up net_client_init1() switch statement to support -netdev --Stefan] Signed-off-by: Anton Ivanov <antivano@cisco.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-06-27 10:39:10 +02:00
Nikolay Nikolaev	d314f586b3	Add new vhost-user netdev backend Add a new QEMU netdev backend that is intended to invoke vhost_net with the vhost-user backend. It uses an Unix socket chardev to establish a communication with the 'slave' (client and server mode supported). At runtime the netdev will handle OPEN/CLOSE events from the chardev. Upon disconnection it will set link_down accordingly and notify virtio-net; the virtio-net interface will go down. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2014-06-19 16:41:57 +03:00
Vincenzo Maffione	58952137b0	net: Adding netmap network backend This patch adds support for a network backend based on netmap. netmap is a framework for high speed packet I/O. You can use it to build extremely fast traffic generators, monitors, software switches or network middleboxes. Its companion software switch VALE lets you interconnect virtual machines. netmap and VALE are implemented as a non-intrusive kernel module, support NICs from multiple vendors, are part of standard FreeBSD distributions and available in source format for Linux too. To compile QEMU with netmap support, use the following configure options: ./configure [...] --enable-netmap --extra-cflags=-I/path/to/netmap/sys where "/path/to/netmap" contains the netmap source code, available at http://info.iet.unipi.it/~luigi/netmap/ The same webpage contains more information about the netmap project (together with papers and presentations). Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-12-09 13:33:19 +01:00
Paolo Bonzini	1422e32db5	net: reorganize headers Move public headers to include/net, and leave private headers in net/. Put the virtio headers in include/net/tap.h, removing the multiple copies that existed. Leave include/net/tap.h as the interface for NICs, and net/tap_int.h as the interface for OS-specific parts of the tap backend. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:31:29 +01:00
Paolo Bonzini	a245fc1835	net: consolidate NetClientState header files into one This patch doesn't seem much useful alone, I must admit. However, it makes sense as part of the upcoming directory reorganization, where I want to have include/net/tap.h as the net<->hw interface for tap. Then having both net/tap.h and include/net/tap.h does not work. "Fixed" by moving all the init functions to a single header file net/clients.h. The patch also adopts a uniform style for including net/.h files from net/.c, without the net/ path. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>	2012-10-08 13:59:40 +02:00

12 Commits (bb59f3548f0df66689b3fef676b2ac29ca00973c)