Errors should be logged using error_report() so they go to the
appropriate monitor.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Fix virtio-blk to use the usual completion path that involves werror handling
instead of directly completing the request in cases where bdrv_aio_flush
returns NULL.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The werror option now affects not only write requests, but also flush requests.
Previously, it was not possible to stop a VM on a failed flush.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
in_sg[].iovec and out_sg[].ioved are pointer to (source) host memory and
therefore invalid after migration. When loading the device state we must
create a new mapping on the destination host.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Changing block.h or blockdev.h resulted in recompiling most objects.
Move DriveInfo typedef and BlockInterfaceType enum definitions
to qemu-common.h and rearrange blockdev.h use to decrease churn.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Otherwise we can't migrate after we've removed a virtio block device.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Disks without media make no sense. For SCSI, a Linux guest kernel
complains during boot. I didn't try other combinations.
scsi-generic doesn't need the additional check, because it already
requires bdrv_is_sg(), which fails without media.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move the check from virtio_blk_init_pci(), where it protects only
virtio-blk-pci, to virtio_blk_init(). Without that, virtio-blk-s390
initializes without a drive. I figure that can lead to null pointer
dereferences.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When available, we'd like to be able to access the DeviceState
when registering a savevm. For buses with a get_dev_path()
function, this will allow us to create more unique savevm
id strings.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch adds the final missing bits for support of
passing a serial/id string to a virtio-blk guest driver.
The guest-side component already exists in the virtio
driver, and has recently been reworked by Ryan to export
a /sys interface for retrieval of the id from guest userland.
Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
BlockDriverState member removable controls whether virtual media
change (monitor commands change, eject) is allowed. It is set when
the "type hint" is BDRV_TYPE_CDROM or BDRV_TYPE_FLOPPY.
The type hint is only set by drive_init(). It sets BDRV_TYPE_FLOPPY
for if=floppy. It sets BDRV_TYPE_CDROM for media=cdrom and if=ide,
scsi, xen, or none.
if=ide and if=scsi work, because the type hint makes it a CD-ROM.
if=xen likewise, I think.
For the same reason, if=none works when it's used by ide-drive or
scsi-disk. For other guest devices, there are problems:
* fdc: you can't change virtual media
$ qemu [...] -drive if=none,id=foo,... -global isa-fdc.driveA=foo
QEMU 0.12.50 monitor - type 'help' for more information
(qemu) eject foo
Device 'foo' is not removable
unless you add media=cdrom, but that makes it readonly.
* virtio: if you add media=cdrom, you can change virtual media. If
you eject, the guest gets I/O errors. If you change, the guest sees
the drive's contents suddenly change.
* scsi-generic: if you add media=cdrom, you can change virtual media.
I didn't test what that does to the guest or the physical device,
but it can't be pretty.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Make the property point to BlockDriverState, cutting out the DriveInfo
middleman. This prepares the ground for block devices that don't have
a DriveInfo.
Currently all user-defined ones have a DriveInfo, because the only way
to define one is -drive & friends (they go through drive_init()).
DriveInfo is closely tied to -drive, and like -drive, it mixes
information about host and guest part of the block device. I'm
working towards a new way to define block devices, with clean
host/guest separation, and I need to get DriveInfo out of the way for
that.
Fortunately, the device models are perfectly happy with
BlockDriverState, except for two places: ide_drive_initfn() and
scsi_disk_initfn() need to check the DriveInfo for a serial number set
with legacy -drive serial=... Use drive_get_by_blockdev() there.
Device model code should now use DriveInfo only when explicitly
dealing with drives defined the old way, i.e. without -device.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Although it is really rare to get in to the while loop, the list
operation in the loop is obviously wrong.
Signed-off-by: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
That's where they belong semantically (block device host part), even
though the actions are actually executed by guest device code.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Pass the MultiReqBuffer structure down all the way to the I/O submission
instead of takin it apart. Also mark num_writes unsigned as it can't
go negative, and take the check for any pending I/O requests into the
submission function. Last but not least rename do_multiwrite to
virtio_submit_multiwrite to fit the general naming scheme and make clear
what it does.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
There is a 1:1 relation between VirtIOBlock and BlockDriverState instances,
no need to track it because it won't change.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Anything that moves hundreds of lines out of vl.c can't be all bad.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Clean up virtio-blk.c to be more consistent using BDRV_SECTOR_SIZE
instead of hard coded 512 values.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Before issuing the barrier to the block driver we need to flush our oustanding
queue of write requests, as the flush is supposed to be issued after them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The VirtIOBlockRequest structure is about 40 KB in size. This patch
avoids zeroing every request by only initializing fields that are read.
The other fields are either written to or may not be used at all.
Oprofile shows about 10% of CPU samples in memset called by
virtio_blk_alloc_request(). The workload is
dd if=/dev/vda of=/dev/null iflag=direct bs=8k running concurrently 4
times. This patch makes memset disappear to the bottom of the profile.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The bdrv_set_geometry_hint call below is not needed - it's just setting
what was just read.
Signed-off-by: Bruce Rogers <brogers@novell.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
virtio_blk_req_complete frees the request, so we can't access it any more when
calling bdrv_mon_event. Use the pointer that was copied earlier.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Add a logical block size attribute as various guest side tools only
increase the filesystem sector size based on it, not the advisory
physical block size.
For scsi we already have support for a different logical block size
in place for CDROMs that we can built upon. Only my recent block
device characteristics VPD page needs some fixups. Note that we
leave the logial block size for CDROMs hardcoded as the 2k value
is expected for it in general.
For virtio-blk we already have a feature flag claiming to support
a variable logical block size that was added for the s390 kuli
hypervisor. Interestingly it does not actually change the units
in which the protocol works, which is still fixed at 512 bytes,
but only communicates a different minimum I/O granularity. So
all we need to do in virtio is to add a trap for unaligned I/O
and round down the device size to the next multiple of the logical
block size.
IDE does not support any other logical block size than 512 bytes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The next commit will move the STOP event into do_vm_stop(), to
have the expected event sequence we need to emit the I/O error
event before calling vm_stop().
The expected sequence is:
{ "event": "BLOCK_IO_ERROR" [...] }
{ "event": "STOP" }
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Export all topology information in the block config structure,
guarded by a new VIRTIO_BLK_F_TOPOLOGY feature flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Add three new qdev properties to export block topology information to
the guest. This is needed to get optimal I/O alignment for RAID arrays
or SSDs.
The options are:
- physical_block_size to specify the physical block size of the device,
this is going to increase from 512 bytes to 4096 kilobytes for many
modern storage devices
- min_io_size to specify the minimal I/O size without performance impact,
this is typically set to the RAID chunk size for arrays.
- opt_io_size to specify the optimal sustained I/O size, this is
typically the RAID stripe width for arrays.
I decided to not auto-probe these values from blkid which might easily
be possible as I don't know how to deal with these issues on migration.
Note that we specificly only set the physical_block_size, and not the
logial one which is the unit all I/O is described in. The reason for
that is that IDE does not support increasing the logical block size and
at last for now I want to stick to one meachnisms in queue and allow
for easy switching of transports for a given backing image which would
not be possible if scsi and virtio use real 4k sectors, while ide only
uses the physical block exponent.
To make this more common for the different block drivers introduce a
new BlockConf structure holding all common block properties and a
DEFINE_BLOCK_PROPERTIES macro to add them all together, mirroring
what is done for network drivers. Also switch over all block drivers
to use it, except for the floppy driver which has weird driveA/driveB
properties and probably won't require any advanced block options ever.
Example usage for a virtio device with 4k physical block size and
8k optimal I/O size:
-drive file=scratch.img,media=disk,cache=none,id=scratch \
-device virtio-blk-pci,drive=scratch,physical_block_size=4096,opt_io_size=8192
aliguori: updated patch to take into account BLOCK events
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The addition of the whole ATA IDENTIY page caused the config space to
go above the allowed size in the PCI spec, and thus the feature was
already reverted in the Linux guest driver and disabled by default in
qemu.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Just call bdrv_mon_event() in the right place.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
If an I/O request fails right away instead of getting an error only in the
callback, we still need to consider rerror/werror.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Current code assumes that only write requests are ever going to be restarted.
This is wrong since rerror=stop exists. Instead of directly starting writes,
use the same request processing as used for new requests.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We need a function that handles a single request. Create one by splitting out
code from virtio_blk_handle_output.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
As pointed out by clang size is only ever written to, but never actually
used.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Add feature bits as properties to virtio. This makes it possible to e.g. define
machine without indirect buffer support, which is required for 0.10
compatibility, or without hardware checksum support, which is required for 0.11
compatibility. Since default values for optional features are now set by qdev,
get_features callback has been modified: it sets non-optional bits, and clears
bits not supported by host.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Either rename variables and functions to refer to write errors (which is what
they actually do) or introduce a parameter to distinguish reads and writes.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We need to signal not only write errors, but also read errors to the guest
driver. This fixes a regression introduced by 869a5c6d.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Changes:
* drive_uninit() wants a DriveInfo now.
* drive_uninit() also calls bdrv_delete(),
so callers don't need to do that.
* drive_uninit() calls are moved over to the ->exit()
callbacks, destroy_bdrvs() is zapped.
* setting bdrv->private is not needed any more as the
only user (destroy_bdrvs) is gone.
* usb-storage needs no drive_uninit, scsi-disk will
handle that.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Add a new VIRTIO_BLK_F_WCACHE feature to virtio-blk to indicate that we have
a volatile write cache that needs controlled flushing. Implement a
VIRTIO_BLK_T_FLUSH operation to flush it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
commit bf011293fa made virtio-blk-pci not
PCI-compliant, since it makes region 0 (which is an i/o region)
size > 256, and, since PCI 2.1, i/o regions are limited to 256 bytes size.
When the ATA serial number feature is off, which is the default,
make the device spec compliant again, by making region 0 smaller.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Tested-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
It is quite common for virtio-blk to submit more than one write request in a
row to the qemu block layer. Use bdrv_aio_multiwrite to allow block drivers to
optimize its handling of the requests.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The bdrv_aio_{read,write} routines can return a NULL pointer when the
I/O submission fails. Currently we ignore this and will wait forever
for an I/O completion and leading to a hang of the guest.
I can easily reproduce this using the native Linux AIO patch, but it's
also possible using normal pthreads-based AIO.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
First user of the new drive property. With this patch applied host
and guest config can be specified separately, like this:
-drive if=none,id=disk1,file=/path/to/disk.img
-device virtio-blk-pci,drive=disk1
You can set any property for virtio-blk-pci now. You can set the pci
address via addr=. You can switch the device into 0.10 compat mode
using class=0x0180. As this is per device you can have one 0.10 and one
0.11 virtio block device in a single virtual machine.
Old syntax continues to work. Internally it does the same as the two
lines above though. One side effect this has is a different
initialization order, which might result in a different pci address
being assigned by default.
Long term plan here is to have this working for all block devices, i.e.
once all scsi is properly qdev-ified you will be able to do something
like this:
-drive if=none,id=sda,file=/path/to/disk.img
-device lsi,id=lsi,addr=<pciaddr>
-device scsi-disk,drive=sda,bus=lsi.0,lun=<n>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Message-Id:
When a VM state change handler changes VM state, other VM state change
handlers can see the state transitions out of order.
bmdma_map(), scsi_disk_init() and virtio_blk_init() install VM state
change handlers to restart DMA. These handlers can vm_stop() by
running into a write error on a drive with werror=stop. This throws
the VM state change handler callback into disarray. Here's an example
case I observed:
0. The virtual IDE drive goes south. All future writes return errors.
1. Something encounters a write error, and duly stops the VM with
vm_stop().
2. vm_stop() calls vm_state_notify(0).
3. vm_state_notify() runs the callbacks in list vm_change_state_head.
It contains ide_dma_restart_cb() installed by bmdma_map(). It also
contains audio_vm_change_state_handler() installed by audio_init().
4. audio_vm_change_state_handler() stops audio stuff.
5. User continues VM with monitor command "c". This runs vm_start().
6. vm_start() calls vm_state_notify(1).
7. vm_state_notify() runs the callbacks in vm_change_state_head.
8. ide_dma_restart_cb() happens to come first. It does its work, runs
into a write error, and duly stops the VM with vm_stop().
9. vm_stop() runs vm_state_notify(0).
10. vm_state_notify() runs the callbacks in vm_change_state_head.
11. audio_vm_change_state_handler() stops audio stuff. Which isn't
running.
12. vm_stop() finishes, ide_dma_restart_cb() finishes, step 7's
vm_state_notify() resumes running handlers.
13. audio_vm_change_state_handler() starts audio stuff. Oopsie.
Fix this by moving the actual write from each VM state change handler
into a new bottom half (suggested by Gleb Natapov).
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
[brought forward to current qemu-kvm.git]
This patch implements the missing qemu logic to
interpret a '-drive .. serial=XYZ ..' flag for
a virtio_blk device.
The serial number string is contained in a
skeletal IDENTIFY DEVICE data structure and
this structure is made available to the guest
virtio_blk driver via pci i/o region 0.
Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>