Automatically size the number of virtio-scsi-pci, vhost-scsi-pci, and
vhost-user-scsi-pci request virtqueues to match the number of vCPUs.
Other transports continue to default to 1 request virtqueue.
A 1:1 virtqueue:vCPU mapping ensures that completion interrupts are
handled on the same vCPU that submitted the request. No IPI is
necessary to complete an I/O request and performance is improved. The
maximum number of MSI-X vectors and virtqueues limit are respected.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200818143348.310613-6-stefanha@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The event and control virtqueues are always present, regardless of the
multi-queue configuration. Define a constant so that virtqueue number
calculations are easier to read.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20200818143348.310613-5-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Before the patch, seg_max parameter was immutable and hardcoded
to 126 (128 - 2) without respect to queue size. This has two negative effects:
1. when queue size is < 128, we have Virtio 1.1 specfication violation:
(2.6.5.3.1 Driver Requirements) seq_max must be <= queue_size.
This violation affects the old Linux guests (ver < 4.14). These guests
crash on these queue_size setups.
2. when queue_size > 128, as was pointed out by Denis Lunev <den@virtuozzo.com>,
seg_max restrics guest's block request length which affects guests'
performance making them issues more block request than needed.
https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg03721.html
To mitigate this two effects, the patch adds the property adjusting seg_max
to queue size automaticaly. Since seg_max is a guest visible parameter,
the property is machine type managable and allows to choose between
old (seg_max = 126 always) and new (seg_max = queue_size - 2) behaviors.
Not to change the behavior of the older VMs, prevent setting the default
seg_max_adjust value for older machine types.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Message-Id: <20191220140905.1718-2-dplotnikov@virtuozzo.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The argument is not used and passing it clutters error propagation in the
callers. So, get rid of it.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since Linux switched to blk-mq as the default in Linux commit
5c279bd9e406 ("scsi: default to scsi-mq"), virtio-scsi LUNs consume
about 10x as much guest kernel memory.
This commit allows you to choose the virtqueue size for each
virtio-scsi-pci controller like this:
-device virtio-scsi-pci,id=scsi,virtqueue_size=16
The default is still 128 as before. Using smaller virtqueue_size
allows many more disks to be added to small memory virtual machines.
For a 1 vCPU, 500 MB, no swap VM I observed:
With scsi-mq enabled (upstream kernel): 175 disks
-"- ditto -"- virtqueue_size=64: 318 disks
-"- ditto -"- virtqueue_size=16: 775 disks
With scsi-mq disabled (kernel before 5c279bd9e406): 1755 disks
Note that to have any effect, this requires a kernel patch:
https://lkml.org/lkml/2017/8/10/689
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20170810165255.20865-1-rjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This commit introduces a vhost-user device for SCSI. This is based
on the existing vhost-scsi implementation, but done over vhost-user
instead. It also uses a chardev to connect to the backend. Unlike
vhost-scsi (today), VMs using vhost-user-scsi can be live migrated.
To use it, start Qemu with a command line equivalent to:
qemu-system-x86_64 \
-chardev socket,id=vus0,path=/tmp/vus.sock \
-device vhost-user-scsi-pci,chardev=vus0,bus=pci.0,addr=...
A separate commit presents a sample application linked with libiscsi to
provide a backend for vhost-user-scsi.
Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Message-Id: <1488479153-21203-4-git-send-email-felipe@nutanix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In order to introduce a new vhost-user-scsi host device type, it makes
sense to abstract part of vhost-scsi into a common parent class. This
commit does exactly that.
Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Message-Id: <1488479153-21203-3-git-send-email-felipe@nutanix.com>
They will be used in virtio-scsi-dataplane.c as well, so move them to
header.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170317061447.16243-2-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In virtio_queue_host_notifier_aio_poll, not all "!virtio_queue_empty()"
cases are making true progress.
Currently the offending one is virtio-scsi event queue, whose handler
does nothing if no event is pending. As a result aio_poll() will spin on
the "non-empty" VQ and take 100% host CPU.
Fix this by reporting actual progress from virtio queue aio handlers.
Reported-by: Ed Swierk <eswierk@skyportsystems.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Tested-by: Ed Swierk <eswierk@skyportsystems.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Dataplane has been omitting forever the step of setting ISR when
an interrupt is raised. This caused little breakage, because the
specification actually says that ISR may not be updated in MSI mode.
Some versions of the Windows drivers however didn't clear MSI mode
correctly, and proceeded using polling mode (using ISR, not the used
ring index!) for crashdump and hibernation. If it were just crashdump
and hibernation it would not be a big deal, but recent releases of
Windows do not really shut down, but rather log out and hibernate to
make the next startup faster. Hence, this manifested as a more serious
hang during shutdown with e.g. Windows 8.1 and virtio-win 1.8.0 RPMs.
Newer versions fixed this, while older versions do not use MSI at all.
The failure has always been there for virtio dataplane, but it became
visible after commits 9ffe337 ("virtio-blk: always use dataplane path
if ioeventfd is active", 2016-10-30) and ad07cd6 ("virtio-scsi: always
use dataplane path if ioeventfd is active", 2016-10-30) made virtio-blk
and virtio-scsi always use the dataplane code under KVM. The good news
therefore is that it was not a bug in the patches---they were doing
exactly what they were meant for, i.e. shake out remaining dataplane bugs.
The fix is not hard, so it's worth arranging for the broken drivers.
The virtio_should_notify+event_notifier_set pair that is common to
virtio-blk and virtio-scsi dataplane is replaced with a new public
function virtio_notify_irqfd that also sets ISR. The irqfd emulation
code now need not set ISR anymore, so virtio_irq is removed.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Override start_ioeventfd and stop_ioeventfd to start/stop the
whole dataplane logic. This has some positive side effects:
- no need anymore for virtio_add_queue_aio (i.e. a revert of
commit 1c627137c1)
- no need anymore to switch from generic ioeventfd handlers to
dataplane
It detects some errors better:
$ qemu-system-x86_64 -object iothread,id=io \
-device virtio-scsi-pci,ioeventfd=off,iothread=io
qemu-system-x86_64: -device virtio-scsi-pci,ioeventfd=off,iothread=io:
ioeventfd is required for iothread
while previously it would have started just fine.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
There is a new common one in virtio.h, use it.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
The previous patch dropped all op blockers from virtio-blk data plane.
The situation of virtio-scsi is exactly the same it can drop them too.
Signed-off-by: Fam Zheng <famz@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 1463969978-24970-5-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In addition to handling IO in vcpu thread and in io thread, dataplane
introduces yet another mode: handling it by AioContext.
This reuses the same handler as previous modes, which triggers races as
these were not designed to be reentrant. Use a separate handler just
for aio, and disable regular handlers when dataplane is active.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add two missing checks for s->dataplane_fenced. In one case, QEMU
would skip injecting an IRQ due to a write to an uninitialized
EventNotifier's file descriptor.
In the second case, the dataplane_disabled field was used by mistake;
in fact after fixing this occurrence it is completely unused.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
The return code of virtqueue_pop/vring_pop is unused except to check for
errors or 0. We can thus easily move allocation inside the functions
and just return a pointer to the VirtQueueElement.
The advantage is that we will be able to allocate only the space that
is needed for the actual size of the s/g list instead of the full
VIRTQUEUE_MAX_SIZE items. Currently VirtQueueElement takes about 48K
of memory, and this kind of allocation puts a lot of stress on malloc.
By cutting the size by two or three orders of magnitude, malloc can
use much more efficient algorithms.
The patch is pretty large, but changes to each device are testable
more or less independently. Splitting it would mostly add churn.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
The next patch will make virtqueue_pop/vring_pop allocate memory for
the VirtQueueElement. In some cases (blk, scsi, gpu) the device wants
to extend VirtQueueElement with device-specific fields and, until now,
the place of the VirtQueueElement within the containing struct didn't
matter. When allocating the entire block in virtqueue_pop/vring_pop,
however, the containing struct must basically be a "subclass" of
VirtQueueElement, with the VirtQueueElement as the first field. Make
that the case for blk and scsi; gpu is already doing it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Make use of the BDS-BB removal and insertion notifiers to remove or set
up, respectively, virtio-scsi's op blockers.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
As only one place in virtio-scsi.c uses DEFINE_VIRTIO_SCSI_PROPERTIES
and DEFINE_VIRTIO_SCSI_FEATURES, there is no need to expose them. Inline
them into virtio-scsi.c to avoid wrongly use.
Signed-off-by: Shannon Zhao <zhaoshenglong@huawei.com>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
So far virtio-scsi-device can't expose host features to guest while
using virtio-mmio because it doesn't set DEFINE_VIRTIO_SCSI_FEATURES on
backend or transport.
The host features belong to the backends while virtio-scsi-pci,
virtio-scsi-s390 and virtio-scsi-ccw set the DEFINE_VIRTIO_SCSI_FEATURES
on transports. But they already have the ability to forward property
accesses to the backend child. So if we move the host features to
backends, it doesn't break the backwards compatibility for them and
make host features work while using virtio-mmio.
Move DEFINE_VIRTIO_SCSI_FEATURES to the backend virtio-scsi. The
transports just sync the host features from backends.
Signed-off-by: Shannon Zhao <zhaoshenglong@huawei.com>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
The anonymous struct only has a single field now, drop the wrapper
structure.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
cdb is now part of cmd, drop it from req.
There's also nothing to check using build assert now.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Commit "virtio-scsi: use standard-headers" added
cdb and sense into req/rep structures, which
breaks uses of sizeof for these structures,
since qemu adds its own arrays on top.
To fix, redefine CDB/sense field size to 0.
Reported-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Because Qemu only accept an wwpn argument for vhost-scsi, we
cannot assign a tpgt. That's say tpg is transparent for Qemu, Qemu
doesn't know which tpg can boot, but vhost-scsi driver module
doesn't know too for one assigned wwpn.
At present, we assume that the first tpg can boot only, and add
a boot_tpgt property that defaults to 0. Of course, people can
pass a valid value by qemu command line.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The cdb is not zeroed by virtio_scsi_init_req, so fix the misleading
comment.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There's no use to constantly trying to enable dataplane if we failed
to set up guest or host notifiers, so fence it off in that case.
We'll try again if the device is reinitialized.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We need this to protect dataplane thread from race conditions with block
jobs until the latter is made dataplane-safe.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For VIRTIO_SCSI_T_TMF_ABORT_TASK and VIRTIO_SCSI_T_TMF_ABORT_TASK_SET,
use scsi_req_cancel_async to start the cancellation.
Because each tmf command may cancel multiple requests, we need to use a
counter to track the number of remaining requests we still need to wait
for.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Queue the popped requests while calling
virtio_scsi_handle_cmd_req_prepare(), then submit them after all
prepared.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Mechanical change, in preparation for bdrv_io_plug/bdrv_io_unplug.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Similar to virtio-blk-dataplane, we stop the iothread while migration
starts and restart it when migration finishes.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This implements the core part of dataplane feature of virtio-scsi.
A few fields are added in VirtIOSCSICommon to maintain the dataplane
status. These fields are managed by a new source file:
virtio-scsi-dataplane.c.
Most code in this file will run on an iothread, unless otherwise
commented as in a global mutex context, such as those functions to
start, stop and setting the iothread property.
Upon start, we set up guest/host event notifiers, in a same way as
virtio-blk does. The handlers then pop request from vring and call into
virtio-scsi.c functions to process it. So we need to make sure make all
those called functions work with iothread, too.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move VirtIOSCSIReq to header and add one field "vring" as a wrapper
structure of Vring, VirtIOSCSIVring.
This is necessary for coming dataplane code that runs uses vring on
iothread.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Similar to this property in virtio-blk for dataplane, add it as a QOM
link in virtio-scsi and an alias in virtio-scsi-pci and virtio-scsi-ccw,
in order to assign an iothread to the device.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is the "common part" to handle one cmd request. Refactor out for
later usage of dataplane iothread code.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Right now starting a machine with virtio-scsi and a <= 2.0 machine type
fails with:
qemu-system-x86_64: -device virtio-scsi-pci: Property .any_layout not found
This is because the any_layout bit was actually never set after
virtio-scsi was changed to support arbitrary layout for virtio buffers.
(This was just a cleanup and a preparation for virtio 1.0; no guest
actually checks the bit, but the new request parsing algorithms are
tested even with old guest).
Reported-by: David Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gilbert <dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The two common virtio features can be defined per bus, so move all
into bus class device to make code more clean.
As discussed with cornelia, s390-virtio-blk doesn't support
the two features at all, so keep s390-virtio as it.
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> #for s390 ccw
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
MST: rebase and resolve conflicts
vhost userspace needn't to handle vq's notification from guest,
so define dummy handle_output callback for all vqs of vhost-scsi.
In some corner cases(such as when handling vq's reset from VM), virtio-pci
still trys to handle pending virtio-scsi events, then object check failure
inside virtio_scsi_handle_event() for vhost-scsi can be triggered.
The issue can be reproduced by 'rmmod virtio-scsi', 'system sleep' or reboot
inside VM.
Cc: qemu-stable@nongnu.org
Cc: Anthony Liguori <aliguori@amazon.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Store the request and response headers by value, and let
virtio_scsi_parse_req check that there is only one of datain
and dataout.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>