Monitor operations that manipulate image files must not execute while a
background job (like image streaming) is in progress. This prevents
corruptions from happening when two pieces of code are manipulating the
image file without knowledge of each other.
The monitor "commit" command raises QERR_DEVICE_IN_USE when
bdrv_commit() returns -EBUSY but "commit all" has no error handling.
This is easy to fix, although note that we do not deliver a detailed
error about which device was busy in the "commit all" case.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is a QAPI/QMP only command to take a snapshot of a group of
devices. This is similar to the blockdev-snapshot-sync command, except
blockdev-group-snapshot-sync accepts a list devices, filenames, and
formats.
It is attempted to keep the snapshot of the group atomic; if the
creation or open of any of the new snapshots fails, then all of
the new snapshots are abandoned, and the name of the snapshot image
that failed is returned. The failure case should not interrupt
any operations.
Rather than use bdrv_close() along with a subsequent bdrv_open() to
perform the pivot, the original image is never closed and the new
image is placed 'in front' of the original image via manipulation
of the BlockDriverState fields. Thus, once the new snapshot image
has been successfully created, there are no more failure points
before pivoting to the new snapshot.
This allows the group of disks to remain consistent with each other,
even across snapshot failures.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Acked-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Floppies must be read at a specific transfer rate, depending of its own format.
Update floppy description table to include required transfer rate.
Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It's emitted whenever the tray is moved by the guest or by HMP/QMP
commands.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
They are QMP events, not monitor events. Rename them accordingly.
Also, move bdrv_emit_qmp_error_event() up in the file. A new event will
be added soon and it's good to have them next each other.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Copy-on-Read populates the image file with data read from a backing
image. In order to avoid bloating the image file when all zeroes are
read we should scan the buffer and perform an optimized zero write
operation.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The ability to zero regions of an image file is a useful primitive for
higher-level features such as image streaming or zero write detection.
Image formats may support an optimized metadata representation instead
of writing zeroes into the image file. This allows zero writes to be
potentially faster than regular write operations and also preserve
sparseness of the image file.
The .bdrv_co_write_zeroes() interface should be implemented by block
drivers that wish to provide efficient zeroing.
Note that this operation is different from the discard operation, which
may leave the contents of the region indeterminate. That means
discarded blocks are not guaranteed to contain zeroes and may contain
junk data instead.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add bdrv_find_backing_image: given a BlockDriverState pointer, and an id,
traverse the backing image chain to locate the id.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Previously copy-on-read could only be enabled for all requests to a
block device. This means requests coming from the guest as well as
QEMU's internal requests would perform copy-on-read when enabled.
For image streaming we want to support finer-grained behavior than just
populating the image file from its backing image. Image streaming
supports partial streaming where a common backing image is preserved.
In this case guest requests should not perform copy-on-read because they
would indiscriminately copy data which should be left in a backing image
from the backing chain.
Introduce a per-request flag for copy-on-read so that a block device can
process both regular and copy-on-read requests. Overlapping reads and
writes still need to be serialized for correctness when copy-on-read is
happening, so add an in-flight reference count to track this.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Long-running block operations like block migration and image streaming
must have continual access to their block device. It is not safe to
perform operations like hotplug, eject, change, resize, commit, or
external snapshot while a long-running operation is in progress.
This patch adds the missing bdrv_in_use() checks so that block migration
and image streaming never have the rug pulled out from underneath them.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Coverity is confused by this "if" and reports leaks on acb->bh.
The bottom half is always deleted before releasing the AIOCB,
in either bdrv_aio_cancel_em or bdrv_aio_bh_cb.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that early failure of bdrv_aio_writev is not possible anymore,
mcb->num_requests can be set before the loop starts.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Initially done with the following semantic patch:
@ rule1 @
expression E;
statement S;
@@
E =
(
bdrv_aio_readv
| bdrv_aio_writev
| bdrv_aio_flush
| bdrv_aio_discard
| bdrv_aio_ioctl
)
(...);
(
- if (E == NULL) { ... }
|
- if (E)
{ <... S ...> }
)
which however missed the occurrence in block/blkverify.c
(as it should have done), and left behind some unused
variables.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Many places in QEMU call qemu_aio_flush() to complete all pending
asynchronous I/O. Most of these places actually want to drain all block
requests but there is no block layer API to do so.
This patch introduces the bdrv_drain_all() API to wait for requests
across all BlockDriverStates to complete. As a bonus we perform checks
after qemu_aio_wait() to ensure that requests really have finished.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Debugging a reentrant request deadlock was fun but in the future we need
a quick and obvious way of detecting such bugs. Add an assert that
checks we are not about to deadlock when waiting for another request.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Cases beyond the end of the disk image are only implemented for block
drivers that do not provide .bdrv_co_is_allocated(). It's worth making
these cases generic so that block drivers that do implement
.bdrv_co_is_allocated() also get them for free.
Suggested-by: Mark Wu <wudxw@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Detect overlapping requests and remember to align to cluster boundaries
if the image format uses them. This assumes that allocating I/O is
performed in cluster granularity - which is true for qcow2, qed, etc.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When copy-on-read is enabled it is necessary to wait for overlapping
requests before issuing new requests. This prevents races between the
copy-on-read and a write request.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The bdrv_enable_copy_on_read()/bdrv_disable_copy_on_read() functions can
be used to programmatically enable or disable copy-on-read for a block
device. Later patches add the actual copy-on-read logic.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The block layer does not know about pending requests. This information
is necessary for copy-on-read since overlapping requests must be
serialized to prevent races that corrupt the image.
The BlockDriverState gets a new tracked_request list field which
contains all pending requests. Each request is a BdrvTrackedRequest
record with sector_num, nb_sectors, and is_write fields.
Note that request tracking is always enabled but hopefully this extra
work is so small that it doesn't justify adding an enable/disable flag.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch introduces the public bdrv_co_is_allocated() interface which
can be used to query image allocation status while the VM is running.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that all block drivers have been converted to
.bdrv_co_is_allocated() we can drop .bdrv_is_allocated().
Note that the public bdrv_is_allocated() interface is still available
but is in fact a synchronous wrapper around .bdrv_co_is_allocated().
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds the .bdrv_co_is_allocated() interface which is identical
to .bdrv_is_allocated() but runs in coroutine context. Running in
coroutine context implies that other coroutines might be performing I/O
at the same time. Therefore it must be safe to run while the following
BlockDriver functions are in-flight:
.bdrv_co_readv()
.bdrv_co_writev()
.bdrv_co_flush()
.bdrv_co_is_allocated()
The new .bdrv_co_is_allocated() interface is useful because it can be
used when a VM is running, whereas .bdrv_is_allocated() is a synchronous
interface that does not cope with parallel requests.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There is no need for bdrv_commit() to use the BlockDriver
.bdrv_is_allocated() interface directly. Converting to the public
interface gives us the freedom to drop .bdrv_is_allocated() entirely in
favor of a new .bdrv_co_is_allocated() in the future.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Image files have two types of data: immutable data that describes things like
image size, backing files, etc. and mutable data that includes offset and
reference count tables.
Today, image formats aggressively cache mutable data to improve performance. In
some cases, this happens before a guest even starts. When dealing with live
migration, since a file is open on two machines, the caching of meta data can
lead to data corruption.
This patch addresses this by introducing a mechanism to invalidate any cached
mutable data a block driver may have which is then used by the live migration
code.
NB, this still requires coherent shared storage. Addressing migration without
coherent shared storage (i.e. NFS) requires additional work.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
cache=unsafe completely ignored bdrv_flush, because flushing the host disk
costs a lot of performance. However, this means that qcow2 images (and
potentially any other format) can lose data even after the guest has issued a
flush if the qemu process crashes/is killed. In case of a host crash, data loss
is certainly expected with cache=unsafe, but if just the qemu process dies this
is a bit too unsafe.
Now that we have two separate flush functions, we can choose to flush
everythign to the OS, but don't enforce that it's physically written to the
disk.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qcow2 has a writeback metadata cache, so flushing a qcow2 image actually
consists of writing back that cache to the protocol and only then flushes the
protocol in order to get everything stable on disk.
This introduces a separate bdrv_co_flush_to_os to reflect the split.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There are two different types of flush that you can do: Flushing one level up
to the OS (i.e. writing data to the host page cache) or flushing it all the way
down to the disk. The existing functions flush to the disk, reflect this in the
function name.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Recent versions of udev always keep the tray locked so that the kernel
can observe "eject request" events (aka tray button presses) even on
discs that aren't mounted. Add support for these events in the ATAPI
and SCSI cd drive device models.
To let management cope with the behavior of udev, an event should also
be added for "tray opened/closed". This way, after issuing an "eject"
command, management can poll until the guests actually reacts to the
command. They can then issue the "change" command after the tray has been
opened, or try with "eject -f" after a (configurable?) timeout. However,
with this patch and the corresponding support in the device models,
at least it is possible to do a manual two-step eject+change sequence.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Several BlockDriverState fields are not being reinitialized across
bdrv_close()/bdrv_open(). Make sure they are reset to their default
values.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Several block drivers set bs->read_only in .bdrv_open() but
block.c:bdrv_open_common() clobbers its value. Additionally, QED uses
bdrv_is_read_only() in .bdrv_open() to decide whether to perform
consistency checks.
The correct ordering is to initialize bs->read_only from the open flags
before calling .bdrv_open(). This way block drivers can override it if
necessary and can use bdrv_is_read_only() in .bdrv_open().
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
tmp_filename was used outside the block it was defined in, i.e. after it went
out of scope. Move its declaration to the top level.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Previous commits dropped most qobjects usage from qemu modules
(now they are a low level interface used by the QAPI). However,
some modules still include the qemu-objects.h header file.
This commit drops qemu-objects.h from some of those modules
and includes qjson.h instead, which is what they actually need.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
The biggest change is to rename its prefix from BDRV_IOS to
BLOCK_DEVICE_IO_STATUS.
Next commit will convert the query-block command to the QAPI
and that's how the enumeration is going to be generated.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
A future commit will convert bdrv_info() to the QAPI and it won't
provide IOS_INVAL.
Luckily all we have to do is to add a new 'iostatus_enabled'
member to BlockDriverState and use it instead.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Since coroutine operation is now mandatory, convert both bdrv_discard
implementations to coroutines. For qcow2, this means taking the lock
around the operation. raw-posix remains synchronous.
The bdrv_discard callback is then unused and can be eliminated.
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Since coroutine operation is now mandatory, convert all bdrv_flush
implementations to coroutines. For qcow2, this means taking the lock.
Other implementations are simpler and just forward bdrv_flush to the
underlying protocol, so they can avoid the lock.
The bdrv_flush callback is then unused and can be eliminated.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This similarly adds support for coroutine and asynchronous discard.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add coroutine support for flush and apply the same emulation that
we already do for read/write. bdrv_aio_flush is simplified to always
go through a coroutine.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>