This is a more flexible alternative to bdrv_iterate().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
do_commit() and mux_proc_byte() iterate over the list of drives
defined with drive_init(). This misses host block devices defined by
other means. Such means don't exist now, but will be introduced later
in this series.
Change them to use new bdrv_commit_all(), which iterates over all host
block devices.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
That's where they belong semantically (block device host part), even
though the actions are actually executed by guest device code.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Both bdrv_can_snapshot() and bdrv_has_snapshot() does not work as advertized.
First issue: Their names implies different porpouses, but they do the same thing
and have exactly the same code. Maybe copied and pasted and forgotten?
bdrv_has_snapshot() is called in various places for actually checking if there
is snapshots or not.
Second issue: the way bdrv_can_snapshot() verifies if a block driver supports or
not snapshots does not catch all cases. E.g.: a raw image.
So when do_savevm() is called, first thing it does is to set a global
BlockDriverState to save the VM memory state calling get_bs_snapshots().
static BlockDriverState *get_bs_snapshots(void)
{
BlockDriverState *bs;
DriveInfo *dinfo;
if (bs_snapshots)
return bs_snapshots;
QTAILQ_FOREACH(dinfo, &drives, next) {
bs = dinfo->bdrv;
if (bdrv_can_snapshot(bs))
goto ok;
}
return NULL;
ok:
bs_snapshots = bs;
return bs;
}
bdrv_can_snapshot() may return a BlockDriverState that does not support
snapshots and do_savevm() goes on.
Later on in do_savevm(), we find:
QTAILQ_FOREACH(dinfo, &drives, next) {
bs1 = dinfo->bdrv;
if (bdrv_has_snapshot(bs1)) {
/* Write VM state size only to the image that contains the state */
sn->vm_state_size = (bs == bs1 ? vm_state_size : 0);
ret = bdrv_snapshot_create(bs1, sn);
if (ret < 0) {
monitor_printf(mon, "Error while creating snapshot on '%s'\n",
bdrv_get_device_name(bs1));
}
}
}
bdrv_has_snapshot(bs1) is not checking if the device does support or has
snapshots as explained above. Only in bdrv_snapshot_create() the device is
actually checked for snapshot support.
So, in cases where the first device supports snapshots, and the second does not,
the snapshot on the first will happen anyways. I believe this is not a good
behavior. It should be an all or nothing process.
This patch addresses these issues by making bdrv_can_snapshot() actually do
what it must do and enforces better tests to avoid errors in the middle of
do_savevm(). bdrv_has_snapshot() is removed and replaced by bdrv_can_snapshot()
where appropriate.
bdrv_can_snapshot() was moved from savevm.c to block.c. It makes more sense to me.
The loadvm_state() function was updated too to enforce that when loading a VM at
least all writable devices must support snapshots too.
Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When snapshot handlers are not defined in the format driver, it is
better to call the ones of the protocol driver. This enables us to
implement snapshot support in the protocol driver.
We need to call bdrv_close() and bdrv_open() handlers of the format
driver before and after bdrv_snapshot_goto() call of the protocol. It is
because the contents of the block driver state may need to be changed
after loading vmstate.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch calls the close handler of the block driver before the qemu
process exits.
This is necessary because the sheepdog block driver releases the lock
of VM images in the close handler.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qemu -cdrom /dev/cdrom with an empty CD-ROM drive doesn't work any more because
we try to guess the format and when this fails (because there is no medium) we
exit with an error message.
This patch should restore the old behaviour by assuming raw format for such
drives.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Clean up block.c and use BDRV_SECTOR_SIZE rather than hard coded
numbers (512) when referring to sector size throughout the code.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In bdrv_open() there is no need to shift total_size >> 9 just to
multiply it by 512 again just a few lines later, since this is the
only place the variable is used.
Mask with BDRV_SECTOR_MASK to protect against case where we are
passed a corrupted image.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Previous commit added QMP documentation to the qemu-monitor.hx
file, it's is a copy of this information.
While it's good to keep it near code, maintaining two copies of
the same information is too hard and has little benefit as we
don't expect client writers to consult the code to find how to
use a QMP command.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch adds a missing bdrv_delete() call in find_image_format() so that a
SG_IO BlockDriver properly releases the temporary BlockDriverState *bs created
from bdrv_file_open()
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Reported-by: Chris Krumme <chris.krumme@windriver.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch enables protocol drivers to use their create options which
are not supported by the format. For example, protcol drivers can use
a backing_file option with raw format.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
With overlapping requests, the total number of sectors is smaller than the sum
of the nb_sectors of both requests.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Usually the guest can tell the host to flush data to disk. In some cases we
don't want to flush though, but try to keep everything in cache.
So let's add a new cache value to -drive that allows us to set the cache
policy to most aggressive, disabling flushes. We call this mode "unsafe",
as guest data is not guaranteed to survive host crashes anymore.
This patch also adds a noop function for aio, so we can do nothing in AIO
fashion.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This patch adds a special case check for scsi-generic devices in
refresh_total_sectors() to skip the subsequent BlockDriver->bdrv_getlength()
that will be returning -ESPIPE from block/raw-posic.c:raw_getlength() for
BlockDriverState->sg=1 devices.
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds a special BlockDriverState->sg check in block.c:find_image_format()
after bdrv_file_open() -> block/raw-posix.c:hdev_open() has been called to determine
if we are dealing with a Linux host scsi-generic device.
The patch then returns the BlockDriver * from bdrv_find_format("raw"), skipping the
subsequent bdrv_read() and rest of find_image_format().
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The difference between the start sectors of two requests can be larger
than the size of the "int" type, which can lead to a not correctly
sorted multiwrite array and thus spurious I/O errors and filesystem
corruption due to incorrect request merges.
So instead of doing the cute sector arithmetics trick spell out the
exact comparisms.
Spotted by Kevin Wolf based on a testcase from Michael Tokarev.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The special case doesn't really us buy anything. Without it vvfat works more
consistently as a protocol. We get raw on top of vvfat now, which works just
as well as using vvfat directly.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The 'parent' field in the 'query-blockstats' monitor command is
part of the top level block device QDict, not part of the 2nd
level 'stats' QDict.
* block.c: Fix docs for 'parent' field in block stats monitor
command output
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There is a call to free() where qemu_free() should instead be used.
Signed-off-by: Bruce Rogers <brogers@novell.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When reopening the image, don't guess the driver, but use the same driver as
was used before. This is important if the format=... option was used for that
image.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We can't assume the file protocol for Windows devices, they need the same
detection as other files for which an explicit protocol is not specified.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
They aren't used afterwards nor supposed to be stored by a bdrv_create
handler.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds the wr_highest_sector blockstat which implements what is generally
known as the high watermark. It is the highest offset of a sector written to
the respective BlockDriverState since it has been opened.
The query-blockstat QMP command is extended to add this value to the result,
and also to add the statistics of the underlying protocol in a new "parent"
field. Note that to get the "high watermark" of a qcow2 image, you need to look
into the wr_highest_sector field of the parent (which can be a file, a
host_device, ...). The wr_highest_sector of the qcow2 BlockDriverState itself
is the highest offset on the _virtual_ disk that the guest has written to.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The BlockDriver bdrv_getlength function is called from the I/O code path
when checking that the request falls within the device. Unfortunately
this involves an lseek system call in the raw protocol; every read or
write request will incur this lseek cost.
Jan Kiszka <jan.kiszka@siemens.com> identified this issue and its
latency overhead. This patch caches device length in the existing
total_sectors variable so lseek calls can be avoided for fixed size
devices.
Growable devices fall back to the full bdrv_getlength code path because
I have not added logic to detect extending the size of the device in a
write.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It is safer to set backing_hd to NULL after deleting it so that any use
after deletion is obvious during development. Happy segfaulting!
This patch should be applied after Kevin Wolf's "vmdk: Convert to
bdrv_open" so that vmdk does not segfault on close.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This fixes the problem that qemu-img's use of no_zero_init only considered the
no_zero_init flag of the format driver, but not of the underlying protocols.
Between the raw/file split and this fix, converting to host devices is broken.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Format drivers shouldn't need to bother with things like file names, but rather
just get an open BlockDriverState for the underlying protocol. This patch
introduces this behaviour for bdrv_open implementation. For protocols which
need to access the filename to open their file/device/connection/... a new
callback bdrv_file_open is introduced which doesn't get an underlying file
opened.
For now, also some of the more obscure formats use bdrv_file_open because they
open() the file themselves instead of using the block.c functions. They need to
be fixed in later patches.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_open contains quite some code that is only useful for opening images (as
opposed to opening files by a protocol), for example snapshots.
This patch splits the code so that we have bdrv_open_file() for files (uses
protocols), bdrv_open() for images (uses format drivers) and bdrv_open_common()
for the code common for opening both images and files.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We're running into various problems because the "raw" file access, which
is used internally by the various image formats is entangled with the
"raw" image format, which maps the VM view 1:1 to a file system.
This patch renames the raw file backends to the file protocol which
is treated like other protocols (e.g. nbd and http) and adds a new
"raw" image format which is just a wrapper around calls to the underlying
protocol.
The patch is surprisingly simple, besides changing the probing logical
in block.c to only look for image formats when using bdrv_open and
renaming of the old raw protocols to file there's almost nothing in there.
For creating images, a new bdrv_create_file is introduced which guesses the
protocol to use. This allows using qemu-img create -f raw (or just using the
default) for both files and host devices. Converting the other format drivers
to use this function to create their images is left for later patches.
The only issues still open are in the handling of the host devices.
Firstly in current qemu we can specifiy the host* format names
on various command line acceping images, but the new code can't
do that without adding some translation. Second the layering breaks
the no_zero_init flag in the BlockDriver used by qemu-img. I'm not
happy how this is done per-driver instead of per-state so I'll
prepare a separate patch to clean this up.
There's some more cleanup opportunity after this patch, e.g. using
separate lists and registration functions for image formats vs
protocols and maybe even host drivers, but this can be done at a
later stage.
Also there's a check for protocol in bdrv_open for the BDRV_O_SNAPSHOT
case that I don't quite understand, but which I fear won't work as
expected - possibly even before this patch.
Note that this patch requires various recent block patches from Kevin
and me, which should all be in his block queue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
A new iovec array is allocated when creating a merged write request.
This patch ensures that the iovec array is deleted in addition to its
qiov owner.
Reported-by: Leszek Urbanski <tygrys@moo.pl>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The bdrv_first linked list of BlockDriverStates is currently extern so
that block migration can iterate the list. However, since there is
already a bdrv_iterate() function there is no need to expose bdrv_first.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
BDRV_O_FILE is only used to communicate between bdrv_file_open and bdrv_open.
It affects two things: first bdrv_open only searches for protocols using
find_protocol instead of all image formats and host drivers. We can easily
move that to the caller and pass the found driver to bdrv_open. Second
it is used to not force a read-write open of a snapshot file. But we never
use bdrv_file_open to open snapshots and this behaviour doesn't make sense
to start with.
qemu-io abused the BDRV_O_FILE for it's growable option, switch it to
using bdrv_file_open to make sure we only open files as growable were
we can actually support that.
This patch requires Kevin's "[PATCH] Replace calls of old bdrv_open" to
be applied first.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
What is known today as bdrv_open2 becomes the new bdrv_open. All remaining
callers of the old function are converted to the new one. In some places they
even know the right format, so they should have used bdrv_open2 from the
beginning.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Block drivers can trigger a blkdebug event whenever they reach a place where it
could be useful to inject an error for testing/debugging purposes.
Rules are read from a blkdebug config file and describe which action is taken
when an event is triggered. For now this is only injecting an error (with a few
options) or changing the state (which is an integer). Rules can be declared to
be active only in a specific state; this way later rules can distiguish on
which path we came to trigger their event.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Previously multiwrite_user_cb was never called if a request in the multiwrite
batch failed right away because it did set mcb->error immediately. Make it look
more like a normal callback to fix this.
Reported-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
When two requests of the same multiwrite batch fail, the callback of all
requests in that batch were called twice. This could have any kind of nasty
effects, in my case it lead to use after free and eventually a segfault.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Open backing file read-only where possible
Upgrade backing file to read-write during commit, back to read-only after commit
If upgrade fail, back to read-only. If also fail, "disconnect" the drive.
Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
It's not needed to check the return of qobject_from_jsonf()
anymore, as an assert() has been added there.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Clean up the current mess about figuring out which flags to pass to the
driver. BDRV_O_FILE, BDRV_O_SNAPSHOT and BDRV_O_NO_BACKING are flags
only used by the block layer internally so filter them out directly.
Previously BDRV_O_NO_BACKING could accidentally be passed to the drivers,
but wasn't ever used.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This commit introduces the bdrv_mon_event() function, which
should be called by block subsystems (eg. IDE) when a I/O
error occurs, so that an QMP event is emitted.
The following information is currently provided in the event:
- device name
- operation (ie. "read" or "write")
- action taken (eg. "stop")
Event example:
{ "event": "BLOCK_IO_ERROR",
"data": { "device": "ide0-hd1",
"operation": "write",
"action": "stop" },
"timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This will manage dirty counter for each device and will allow to get the
dirty counter from above.
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
If we go over the maximum number of iovecs support by syscall we get
back EINVAL from the kernel which translate to I/O errors for the guest.
Add a MAX_IOV defintion for platforms that don't have it. For now we use
the same 1024 define that's used on Linux and various other platforms,
but until the windows block backend implements some kind of vectored I/O
it doesn't matter.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>