Commit Graph

4192 Commits

Author SHA1 Message Date
Vladimir Sementsov-Ogievskiy
19c021e194 Revert "hbitmap: Add @advance param to hbitmap_iter_next()"
This reverts commit a33fbb4f8b.

The functionality is unused.

Note: in addition to automatic revert, drop second parameter in
hbitmap_iter_next() call from hbitmap_next_dirty_area() too.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2019-01-15 18:26:50 -05:00
Vladimir Sementsov-Ogievskiy
166cd55125 Revert "block/dirty-bitmap: Add bdrv_dirty_iter_next_area"
This reverts commit 72d10a9421.

The function is unused now.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2019-01-15 18:26:50 -05:00
Vladimir Sementsov-Ogievskiy
1eaf1b0fdf block/mirror: fix and improve do_sync_target_write
Use bdrv_dirty_bitmap_next_dirty_area() instead of
bdrv_dirty_iter_next_area(), because of the following problems of
bdrv_dirty_iter_next_area():

1. Using HBitmap iterators we should carefully handle unaligned offset,
as first call to hbitmap_iter_next() may return a value less than
original offset (actually, it will be original offset rounded down to
bitmap granularity). This handling is not done in
do_sync_target_write().

2. bdrv_dirty_iter_next_area() handles unaligned max_offset
incorrectly:

look at the code:
    if (max_offset == iter->bitmap->size) {
        /* If max_offset points to the image end, round it up by the
         * bitmap granularity */
        gran_max_offset = ROUND_UP(max_offset, granularity);
    } else {
        gran_max_offset = max_offset;
    }

    ret = hbitmap_iter_next(&iter->hbi, false);
    if (ret < 0 || ret + granularity > gran_max_offset) {
        return false;
    }

and assume that max_offset != iter->bitmap->size but still unaligned.
if 0 < ret < max_offset we found dirty area, but the function can
return false in this case (if ret + granularity > max_offset).

3. bdrv_dirty_iter_next_area() uses inefficient loop to find the end of
the dirty area. Let's use more efficient hbitmap_next_zero instead
(bdrv_dirty_bitmap_next_dirty_area() do so)

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-01-15 18:26:50 -05:00
Vladimir Sementsov-Ogievskiy
a78a1a48cd dirty-bitmap: add bdrv_dirty_bitmap_next_dirty_area
The function alters bdrv_dirty_iter_next_area(), which is wrong and
less efficient (see further commit
"block/mirror: fix and improve do_sync_target_write" for description).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-01-15 18:26:50 -05:00
Vladimir Sementsov-Ogievskiy
76d570dc49 dirty-bitmap: improve bdrv_dirty_bitmap_next_zero
Add bytes parameter to the function, to limit searched range.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-01-15 18:26:49 -05:00
John Snow
07d5a8df6a block/dirty-bitmap: remove assertion from restore
When making a backup of a dirty bitmap (for transactions), we want to
restore that backup whether or not the bitmap is enabled.

It is perfectly valid to write into bitmaps that are disabled. It is
only illegitimate for the guest to have done so.

Remove this assertion.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20181221093529.23855-3-jsnow@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-01-14 10:09:46 -06:00
Paolo Bonzini
7d37435bd5 avoid TABs in files that only contain a few
Most files that have TABs only contain a handful of them.  Change
them to spaces so that we don't confuse people.

disas, standard-headers, linux-headers and libdecnumber are imported
from other projects and probably should be exempted from the check.
Outside those, after this patch the following files still contain both
8-space and TAB sequences at the beginning of the line.  Many of them
have a majority of TABs, or were initially committed with all tabs.

    bsd-user/i386/target_syscall.h
    bsd-user/x86_64/target_syscall.h
    crypto/aes.c
    hw/audio/fmopl.c
    hw/audio/fmopl.h
    hw/block/tc58128.c
    hw/display/cirrus_vga.c
    hw/display/xenfb.c
    hw/dma/etraxfs_dma.c
    hw/intc/sh_intc.c
    hw/misc/mst_fpga.c
    hw/net/pcnet.c
    hw/sh4/sh7750.c
    hw/timer/m48t59.c
    hw/timer/sh_timer.c
    include/crypto/aes.h
    include/disas/bfd.h
    include/hw/sh4/sh.h
    libdecnumber/decNumber.c
    linux-headers/asm-generic/unistd.h
    linux-headers/linux/kvm.h
    linux-user/alpha/target_syscall.h
    linux-user/arm/nwfpe/double_cpdo.c
    linux-user/arm/nwfpe/fpa11_cpdt.c
    linux-user/arm/nwfpe/fpa11_cprt.c
    linux-user/arm/nwfpe/fpa11.h
    linux-user/flat.h
    linux-user/flatload.c
    linux-user/i386/target_syscall.h
    linux-user/ppc/target_syscall.h
    linux-user/sparc/target_syscall.h
    linux-user/syscall.c
    linux-user/syscall_defs.h
    linux-user/x86_64/target_syscall.h
    slirp/cksum.c
    slirp/if.c
    slirp/ip.h
    slirp/ip_icmp.c
    slirp/ip_icmp.h
    slirp/ip_input.c
    slirp/ip_output.c
    slirp/mbuf.c
    slirp/misc.c
    slirp/sbuf.c
    slirp/socket.c
    slirp/socket.h
    slirp/tcp_input.c
    slirp/tcpip.h
    slirp/tcp_output.c
    slirp/tcp_subr.c
    slirp/tcp_timer.c
    slirp/tftp.c
    slirp/udp.c
    slirp/udp.h
    target/cris/cpu.h
    target/cris/mmu.c
    target/cris/op_helper.c
    target/sh4/helper.c
    target/sh4/op_helper.c
    target/sh4/translate.c
    tcg/sparc/tcg-target.inc.c
    tests/tcg/cris/check_addo.c
    tests/tcg/cris/check_moveq.c
    tests/tcg/cris/check_swap.c
    tests/tcg/multiarch/test-mmap.c
    ui/vnc-enc-hextile-template.h
    ui/vnc-enc-zywrle.h
    util/envlist.c
    util/readline.c

The following have only TABs:

    bsd-user/i386/target_signal.h
    bsd-user/sparc64/target_signal.h
    bsd-user/sparc64/target_syscall.h
    bsd-user/sparc/target_signal.h
    bsd-user/sparc/target_syscall.h
    bsd-user/x86_64/target_signal.h
    crypto/desrfb.c
    hw/audio/intel-hda-defs.h
    hw/core/uboot_image.h
    hw/sh4/sh7750_regnames.c
    hw/sh4/sh7750_regs.h
    include/hw/cris/etraxfs_dma.h
    linux-user/alpha/termbits.h
    linux-user/arm/nwfpe/fpopcode.h
    linux-user/arm/nwfpe/fpsr.h
    linux-user/arm/syscall_nr.h
    linux-user/arm/target_signal.h
    linux-user/cris/target_signal.h
    linux-user/i386/target_signal.h
    linux-user/linux_loop.h
    linux-user/m68k/target_signal.h
    linux-user/microblaze/target_signal.h
    linux-user/mips64/target_signal.h
    linux-user/mips/target_signal.h
    linux-user/mips/target_syscall.h
    linux-user/mips/termbits.h
    linux-user/ppc/target_signal.h
    linux-user/sh4/target_signal.h
    linux-user/sh4/termbits.h
    linux-user/sparc64/target_syscall.h
    linux-user/sparc/target_signal.h
    linux-user/x86_64/target_signal.h
    linux-user/x86_64/termbits.h
    pc-bios/optionrom/optionrom.h
    slirp/mbuf.h
    slirp/misc.h
    slirp/sbuf.h
    slirp/tcp.h
    slirp/tcp_timer.h
    slirp/tcp_var.h
    target/i386/svm.h
    target/sparc/asi.h
    target/xtensa/core-dc232b/xtensa-modules.inc.c
    target/xtensa/core-dc233c/xtensa-modules.inc.c
    target/xtensa/core-de212/core-isa.h
    target/xtensa/core-de212/xtensa-modules.inc.c
    target/xtensa/core-fsf/xtensa-modules.inc.c
    target/xtensa/core-sample_controller/core-isa.h
    target/xtensa/core-sample_controller/xtensa-modules.inc.c
    target/xtensa/core-test_kc705_be/core-isa.h
    target/xtensa/core-test_kc705_be/xtensa-modules.inc.c
    tests/tcg/cris/check_abs.c
    tests/tcg/cris/check_addc.c
    tests/tcg/cris/check_addcm.c
    tests/tcg/cris/check_addoq.c
    tests/tcg/cris/check_bound.c
    tests/tcg/cris/check_ftag.c
    tests/tcg/cris/check_int64.c
    tests/tcg/cris/check_lz.c
    tests/tcg/cris/check_openpf5.c
    tests/tcg/cris/check_sigalrm.c
    tests/tcg/cris/crisutils.h
    tests/tcg/cris/sys.c
    tests/tcg/i386/test-i386-ssse3.c
    ui/vgafont.h

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20181213223737.11793-3-pbonzini@redhat.com>
Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Eric Blake <eblake@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Stefan Markovic <smarkovic@wavecomp.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 15:46:56 +01:00
Paolo Bonzini
b58deb344d qemu/queue.h: leave head structs anonymous unless necessary
Most list head structs need not be given a name.  In most cases the
name is given just in case one is going to use QTAILQ_LAST, QTAILQ_PREV
or reverse iteration, but this does not apply to lists of other kinds,
and even for QTAILQ in practice this is only rarely needed.  In addition,
we will soon reimplement those macros completely so that they do not
need a name for the head struct.  So clean up everything, not giving a
name except in the rare case where it is necessary.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 15:46:55 +01:00
Stefan Hajnoczi
88e94fd238 block/iscsi: cancel libiscsi task when ABORT TASK TMF completes
The libiscsi iscsi_task_mgmt_async() API documentation says:

  abort_task will also cancel the scsi task. The callback for the scsi
  task will be invoked with SCSI_STATUS_CANCELLED

The libiscsi implementation does not fulfil this promise.  The task's
callback is not invoked and its struct iscsi_pdu remains in the internal
list (effectively leaked).

This patch invokes the libiscsi iscsi_scsi_cancel_task() API to force
the task's callback to be invoked with SCSI_STATUS_CANCELLED when the
ABORT TASK TMF completes and the task's callback hasn't been invoked
yet.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20180215111526.2464-1-stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 13:57:24 +01:00
Stefan Hajnoczi
c100448790 block/iscsi: fix ioctl cancel use-after-free
iscsi_aio_cancel() does not increment the request's reference count,
causing a use-after-free when ABORT TASK finishes after the request has
already completed.

There are some additional issues with iscsi_aio_cancel():
1. Several ABORT TASKs may be sent for the same task if
   iscsi_aio_cancel() is invoked multiple times.  It's better to avoid
   this just in case the command identifier is reused.
2. The iscsilun->mutex protection is missing in iscsi_aio_cancel().

Reported-by: Felipe Franciosi <felipe@nutanix.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20180203061621.7033-4-stefanha@redhat.com>
Reviewed-by: Felipe Franciosi <felipe@nutanix.com>
Tested-by: Sreejith Mohanan <sreejit.mohanan@nutanix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 13:57:24 +01:00
Stefan Hajnoczi
83d11973fa block/iscsi: take iscsilun->mutex in iscsi_timed_check_events()
Commit d045c466d9 ("iscsi: do not use
aio_context_acquire/release") introduced iscsilun->mutex but appears to
have overlooked iscsi_timed_check_events() when introducing the mutex.

iscsi_service() and iscsi_set_events() must be called with
iscsilun->mutex held.

iscsi_timed_check_events() is invoked from the AioContext and does not
take the mutex.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20180203061621.7033-3-stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 13:57:24 +01:00
Stefan Hajnoczi
f92fa85f98 block/iscsi: drop unused IscsiAIOCB->buf field
The IscsiAIOCB->buf field has not been used since commit
e49ab19fca ("block/iscsi: bump libiscsi
requirement to 1.9.0").  It used to be a linear buffer for old libiscsi
versions that didn't support scatter-gather.  The minimum libiscsi
version supports scatter-gather so we don't linearize buffers anymore.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20180203061621.7033-2-stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 13:57:24 +01:00
Li Feng
2916405a11 block/nvme: optimize the performance of nvme driver based on vfio-pci
When the IO size is larger than 2 pages, we move the the pointer one by
one in the pagelist, this is inefficient.

This is a simple benchmark result:

Before:
$ qemu-io -c 'write 0 1G' nvme://0000:00:04.0/1

wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:02.41 (424.504 MiB/sec and 0.4146 ops/sec)

 $ qemu-io -c 'read 0 1G' nvme://0000:00:04.0/1

read 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:02.03 (503.055 MiB/sec and 0.4913 ops/sec)

After:
$ qemu-io -c 'write 0 1G' nvme://0000:00:04.0/1

wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:02.17 (471.517 MiB/sec and 0.4605 ops/sec)

 $ qemu-io -c 'read 0 1G' nvme://0000:00:04.0/1

read 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:01.94 (526.770 MiB/sec and 0.5144 ops/sec)

Signed-off-by: Li Feng <lifeng1519@gmail.com>
Message-Id: <20181101103807.25862-1-lifeng1519@gmail.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2019-01-09 09:38:34 +08:00
Peter Maydell
a29644590f nbd patches for 2019-01-05
Error and trace improvements in NBD code, such as less noise for
 common disconnect scenarios.
 
 - Vladimir Sementsov-Ogievskiy: 0/3 nbd-client: drop extra error noise
 - Eric Blake: portions of 0/22 nbd: add qemu-nbd --list
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJcMLgeAAoJEKeha0olJ0NqzXgH/RVRrCtmz0dOrrAkoJUfFX3t
 AcnOSb8nTWXTLGvGpU6x5l9yGL77VaN83j3z/4tXAraGA3sZsFx4cNNepT7P3zBB
 GlyT/b/Aie4EQAJPjuZdIZBe5pR6seSLoMFoGiga2oS2ik/wx4Wxxn9Gf9ZHUwVO
 muCtsJQypNrfbv7GY2Qf5NKyB2IQxt4ljSCqOmi0/T2//mQrdmRhcHbgTTDUA5HS
 3Rpr9gPV0u6YzA9foCu+UqUpyhF3RBQY9sSMLS+Y9L6W72WgynrG7y0A637o4Q61
 W+2UDDjuvyyCDR8DjD+LnWyGn4QuxIglfta2kFRcFGoPr1GqcmP4gDLeqqdJKFs=
 =R1PV
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2019-01-05' into staging

nbd patches for 2019-01-05

Error and trace improvements in NBD code, such as less noise for
common disconnect scenarios.

- Vladimir Sementsov-Ogievskiy: 0/3 nbd-client: drop extra error noise
- Eric Blake: portions of 0/22 nbd: add qemu-nbd --list

# gpg: Signature made Sat 05 Jan 2019 13:58:54 GMT
# gpg:                using RSA key A7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>"
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>"
# gpg:                 aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2019-01-05:
  nbd/client: Drop pointless buf variable
  qemu-nbd: Fail earlier for -c/-d on non-linux
  nbd/client: More consistent error messages
  nbd: Document timeline of various features
  qemu-nbd: Use program name in error messages
  block/nbd-client: use traces instead of noisy error_report_err
  nbd/client: Trace all server option error messages
  nbd: publish _lookup functions

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-01-07 11:55:52 +00:00
Vladimir Sementsov-Ogievskiy
d8b4bad846 block/nbd-client: use traces instead of noisy error_report_err
Reduce extra noise of nbd-client, change 083 correspondingly.

In various commits (be41c100 in 2.10, f140e300 in 2.11, 78a33ab
in 2.12), we added spots where qemu as an NBD client would report
problems communicating with the server to stderr, because there
was no where else to send the error to.  However, this is racy,
particularly since the most common source of these errors is when
either the client or the server abruptly hangs up, leaving one
coroutine to report the error only if it wins (or loses) the
race in attempting the read from the server before another
thread completes its cleanup of a protocol error that caused the
disconnect in the first place.  The race is also apparent in the
fact that differences in the flush behavior of the server can
alter the frequency of encountering the race in the client (see
commit 6d39db96).

Rather than polluting stderr, it's better to just trace these
situations, for use by developers debugging a flaky connection,
particularly since the real error that either triggers the abrupt
disconnection in the first place, or that results from the EIO
when a request can't receive a reply, DOES make it back to the
user in the normal Error propagation channels.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20181102151152.288399-4-vsementsov@virtuozzo.com>
[eblake: drop depedence on error hint, enhance commit message]
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-01-04 17:34:58 -06:00
yuchenlin
39a0408e76 dmg: don't skip zero chunk
The dmg file has many tables which describe: "start from sector XXX to
sector XXX, the compression method is XXX and where the compressed data
resides on".

Each sector in the expanded file should be covered by a table. The table
will describe the offset of compressed data (or raw depends on the type)
in the dmg.

For example:

[-----------The expanded file------------]
[---bzip table ---]/* zeros */[---zlib---]
    ^
    | if we want to read this sector.

we will find bzip table which contains this sector, and get the
compressed data offset, read it from dmg, uncompress it, finally write to
expanded file.

If we skip zero chunk (table), some sector cannot find the table which
will cause search_chunk() return s->n_chunks, dmg_read_chunk() return -1
and finally causing dmg_co_preadv() return EIO.

See:

[-----------The expanded file------------]
[---bzip table ---]/* zeros */[---zlib---]
                    ^
                    | if we want to read this sector.

Oops, we cannot find the table contains it...

In the original implementation, we don't have zero table. When we try to
read sector inside the zero chunk. We will get EIO, and skip reading.

After this patch, we treat zero chunk the same as ignore chunk, it will
directly write zero and avoid some sector may not find the table.

After this patch:

[-----------The expanded file------------]
[---bzip table ---][--zeros--][---zlib---]

Signed-off-by: yuchenlin <npes87184@gmail.com>
Reviewed-by: Julio Faracco <jcfaracco@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190103114700.9686-4-npes87184@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-01-04 11:15:09 +00:00
yuchenlin
e80fcf590e dmg: use enumeration type instead of hard coding number
Signed-off-by: yuchenlin <npes87184@gmail.com>
Reviewed-by: Julio Faracco <jcfaracco@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190103114700.9686-3-npes87184@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-01-04 11:15:09 +00:00
yuchenlin
5ef40828ba dmg: fix binary search
There is a possible hang in original binary search implementation. That is
if chunk1 = 4, chunk2 = 5, chunk3 = 4, and we go else case.

The chunk1 will be still 4, and so on.

Signed-off-by: yuchenlin <npes87184@gmail.com>
Message-id: 20190103114700.9686-2-npes87184@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-01-04 11:15:09 +00:00
Julio Faracco
b47c7d5325 dmg: Fixing wrong dmg block type value for block terminator.
This is a trivial patch to fix a wrong value for block terminator.
The old value was 0x7fffffff which is wrong. It was not affecting the
code because QEMU dmg block is not handling block terminator right now.
Neverthless, it should be fixed.

Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Reviewed-by: yuchenlin <yuchenlin@synology.com>
Message-id: 20181228145055.18039-1-jcfaracco@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-01-04 11:06:02 +00:00
Stefan Hajnoczi
537c3d4f64 block/mirror: add missing coroutine_fn annotations
Marking a function coroutine_fn currently has no effect on the compiler,
but it documents that this function must be called from coroutine
context and it may yield.  This is important information for the
programmer.

Also, if we ever transition to a stackless coroutine implementation,
then it's likely that the annotation will become mandatory so the
compiler can use the correct calling convention for coroutine functions.

Cc: Max Reitz <mreitz@redhat.com>
Cc: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:55:02 +01:00
Alberto Garcia
2e891722c5 block: Remove flags parameter from bdrv_reopen_queue()
Now that all callers are passing all flag changes as QDict options,
the flags parameter is no longer necessary, so we can get rid of it.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:55:02 +01:00
Alberto Garcia
3c4e964762 block: Clean up reopen_backing_file() in block/replication.c
This function is used to put the hidden and secondary disks in
read-write mode before launching the backup job, and back in read-only
mode afterwards.

This patch does the following changes:

  - Use an options QDict with the "read-only" option instead of
    passing the changes as flags only.

  - Simplify the code (it was unnecessarily complicated and verbose).

  - Fix a bug due to which the secondary disk was not being put back
    in read-only mode when writable=false (because in this case
    orig_secondary_flags always had the BDRV_O_RDWR flag set).

  - Stop clearing the BDRV_O_INACTIVE flag.

The flags parameter to bdrv_reopen_queue() becomes redundant and we'll
be able to get rid of it in a subsequent patch.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:55:02 +01:00
Alberto Garcia
1ba7938895 block: Use bdrv_reopen_set_read_only() in the mirror driver
The 'block-commit' QMP command is implemented internally using two
different drivers. If the source image is the active layer then the
mirror driver is used (commit_active_start()), otherwise the commit
driver is used (commit_start()).

In both cases the destination image must be put temporarily in
read-write mode. This is done correctly in the latter case, but what
commit_active_start() does is copy all flags instead.

This patch replaces the bdrv_reopen() calls in that function with
bdrv_reopen_set_read_only() so that only the read-only status is
changed.

A similar change is made in mirror_exit(), which is also used by the
'drive-mirror' and 'blockdev-mirror' commands.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:55:02 +01:00
Alberto Garcia
e7d22f8bc6 block: Use bdrv_reopen_set_read_only() in stream_start/complete()
This patch replaces the bdrv_reopen() calls that set and remove the
BDRV_O_RDWR flag with the new bdrv_reopen_set_read_only() function.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:55:02 +01:00
Alberto Garcia
c742a3643f block: Use bdrv_reopen_set_read_only() in bdrv_commit()
This patch replaces the bdrv_reopen() calls that set and remove the
BDRV_O_RDWR flag with the new bdrv_reopen_set_read_only() function.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:55:01 +01:00
Alberto Garcia
e70cdc57da block: Use bdrv_reopen_set_read_only() in commit_start/complete()
This patch replaces the bdrv_reopen() calls that set and remove the
BDRV_O_RDWR flag with the new bdrv_reopen_set_read_only() function.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:55:01 +01:00
Kevin Wolf
0342567115 file-posix: Avoid aio_worker() for QEMU_AIO_IOCTL
aio_worker() doesn't add anything interesting, it's only a useless
indirection. Call the handler function directly instead.

As we know that this handler function is only called from coroutine
context and the coroutine stays around until the worker thread finishes,
we can keep RawPosixAIOData on the stack.

This was the last user of aio_worker(), so the function goes away now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:54:49 +01:00
Kevin Wolf
2f3a7ab39b file-posix: Switch to .bdrv_co_ioctl
No real reason to keep using the callback based mechanism here when the
rest of the file-posix driver is coroutine based. Changing it brings
ioctls more in line with how other request types work.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:41 +01:00
Kevin Wolf
c9db2b6489 file-posix: Remove paio_submit_co()
The function is not used any more, remove it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:41 +01:00
Kevin Wolf
999e6b69ce file-posix: Avoid aio_worker() for QEMU_AIO_READ/WRITE
aio_worker() doesn't add anything interesting, it's only a useless
indirection. Call the handler function directly instead.

As we know that this handler function is only called from coroutine
context and the coroutine stays around until the worker thread finishes,
we can keep RawPosixAIOData on the stack.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:41 +01:00
Kevin Wolf
54c7ca1b81 file-posix: Move read/write operation logic out of aio_worker()
aio_worker() for reads and writes isn't boring enough yet. It still does
some postprocessing for handling short reads and turning the result into
the right return value.

However, there is no reason why handle_aiocb_rw() couldn't do the same,
and even without duplicating code between the read and write path. So
move the code there.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:41 +01:00
Kevin Wolf
06dc9bd571 file-posix: Avoid aio_worker() for QEMU_AIO_FLUSH
aio_worker() doesn't add anything interesting, it's only a useless
indirection. Call the handler function directly instead.

As we know that this handler function is only called from coroutine
context and the coroutine stays around until the worker thread finishes,
we can keep RawPosixAIOData on the stack.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:41 +01:00
Kevin Wolf
46ee0f462b file-posix: Avoid aio_worker() for QEMU_AIO_DISCARD
aio_worker() doesn't add anything interesting, it's only a useless
indirection. Call the handler function directly instead.

As we know that this handler function is only called from coroutine
context and the coroutine stays around until the worker thread finishes,
we can keep RawPosixAIOData on the stack.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:41 +01:00
Kevin Wolf
7154d8ae66 file-posix: Avoid aio_worker() for QEMU_AIO_WRITE_ZEROES
aio_worker() doesn't add anything interesting, it's only a useless
indirection. Call the handler function directly instead.

As we know that this handler function is only called from coroutine
context and the coroutine stays around until the worker thread finishes,
we can keep RawPosixAIOData on the stack.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:41 +01:00
Kevin Wolf
58a209c437 file-posix: Avoid aio_worker() for QEMU_AIO_COPY_RANGE
aio_worker() doesn't add anything interesting, it's only a useless
indirection. Call the handler function directly instead.

As we know that this handler function is only called from coroutine
context and the coroutine stays around until the worker thread finishes,
we can keep RawPosixAIOData on the stack.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:41 +01:00
Kevin Wolf
29cb4c01e7 file-posix: Avoid aio_worker() for QEMU_AIO_TRUNCATE
aio_worker() doesn't add anything interesting, it's only a useless
indirection. Call the handler function directly instead.

As we know that this handler function is only called from coroutine
context and the coroutine stays around until the worker thread finishes,
we can keep RawPosixAIOData on the stack.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:41 +01:00
Kevin Wolf
5d5de25005 file-posix: Factor out raw_thread_pool_submit()
Getting the thread pool of the AioContext of a block node and scheduling
some work in it is an operation that is already done twice, and we'll
get more instances. Factor it out into a separate function.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:41 +01:00
Kevin Wolf
d57c44d00f file-posix: Reorganise RawPosixAIOData
RawPosixAIOData contains a lot of fields for several separate operations
that are to be processed in a worker thread and that need different
parameters. The struct is currently rather unorganised, with unions that
cover some, but not all operations, and even one #define for field names
instead of a union.

Clean this up to have some common fields and a single union. As a side
effect, on x86_64 the struct shrinks from 72 to 48 bytes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:41 +01:00
Vladimir Sementsov-Ogievskiy
e23c9d7a1c qcow2: do decompression in threads
Do decompression in threads, like it is already done for compression.
This improves asynchronous compressed reads performance.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:41 +01:00
Vladimir Sementsov-Ogievskiy
c3c10f7295 qcow2: aio support for compressed cluster read
Allocate buffers locally and release qcow2 lock. Than, reads inside
qcow2_co_preadv_compressed may be done in parallel, however all
decompression is still done synchronously. Let's improve it in the
following commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:41 +01:00
Vladimir Sementsov-Ogievskiy
c068a1cd52 qcow2: use byte-based read in qcow2_decompress_cluster
We are gradually moving away from sector-based interfaces, towards
byte-based.  Get rid of it here too.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:40 +01:00
Vladimir Sementsov-Ogievskiy
341926ab83 qcow2: refactor decompress_buffer
- make it look more like a pair of qcow2_compress - rename the function
  and its parameters
- drop extra out_len variable, check filling of output buffer by strm
  structure itself
- fix code style
- add some documentation

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:40 +01:00
Vladimir Sementsov-Ogievskiy
f4b3e2a960 qcow2: move decompression from qcow2-cluster.c to qcow2.c
Compression is done in threads in qcow2.c. We want to do decompression
in the same way, so, firstly, move it to the same file.

The only change is braces around if-body in decompress_buffer, to
satisfy checkpatch.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:40 +01:00
Vladimir Sementsov-Ogievskiy
6994fd78b9 qcow2: make more generic interface for qcow2_compress
Give explicit size both for source and destination buffers, to make it
similar with decompression path and than cleanly reuse parameter
structure for decompression threads.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:40 +01:00
Vladimir Sementsov-Ogievskiy
19a4448853 qcow2: use Z_OK instead of 0 for deflateInit2 return code check
Use appropriate macro, corresponding to deflateInit2 spec.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:40 +01:00
Vladimir Sementsov-Ogievskiy
3a75187fd8 block/backup: drop unused synchronization interface
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:40 +01:00
Vladimir Sementsov-Ogievskiy
e4f9752c4a block/replication: drop extra synchronization
After commit f8d59dfb40
    "block/backup: fix fleecing scheme: use serialized writes" fleecing
(specifically reading from backup target, when backup source is in
backing chain of backup target) is safe, because all backup-job writes
to target are serialized. Therefore we don't need additional
synchronization for these reads.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:40 +01:00
Julio Faracco
95a156f689 dmg: exchanging hardcoded dmg UDIF block types to enum.
This change is better to understand what kind of block type is being
handled by the code. Using a syntax similar to the DMG documentation is
easier than tracking all hex values assigned to a block type.

Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:40 +01:00
Julio Faracco
7a40b418ec dmg: including dmg-lzfse module inside dmg block driver.
This commit includes the support to new module dmg-lzfse into dmg block
driver. It includes the support for block type ULFO (0x80000007).

Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:40 +01:00
Julio Faracco
83bc1f9768 configure: adding support to lzfse library.
This commit includes the support to lzfse opensource library. With this
library dmg block driver can decompress images with this type of
compression inside.

Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:40 +01:00
Julio Faracco
c13e80d792 block: adding lzfse decompressing support as a module.
QEMU dmg support includes zlib and bzip2, but it does not contains lzfse
support. This commit adds the source file to extend compression support
for new DMGs.

Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14 11:52:40 +01:00
Vladimir Sementsov-Ogievskiy
c972fa123c crypto: support multiple threads accessing one QCryptoBlock
The two thing that should be handled are cipher and ivgen. For ivgen
the solution is just mutex, as iv calculations should not be long in
comparison with encryption/decryption. And for cipher let's just keep
per-thread ciphers.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2018-12-12 11:16:49 +00:00
Peter Maydell
9225cd127d nbd patches for 2018-12-03
Improve x-dirty-bitmap handling for experimenting with pull mode
 incremental backups.
 
 - Eric Blake: 0/3 NBD dirty bitmap cleanups
 -----BEGIN PGP SIGNATURE-----
 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
 
 iQEcBAABCAAGBQJcBVInAAoJEKeha0olJ0Nq+68H/RoRtXAycWm1bNidVNC9J8pO
 hkXsZwe39V/6Hl5eXch5J6K9V+443A4+WJSpvfISnB7jKEIHQ2GVMiuXPjmeyap+
 BIbAlHt+frQYz+x1E6fKrLE67mxQViRCQrfx8ijDRUJTUlDxq+yPDjwmGk25DRDt
 zlFqpkCr1NhEREm6JDNmM/WJqBaueSAXY5Fi7MqeHTyRKlZ5RclZaZ5bw4YpB+ip
 kwsWPHsz9uoe9caMxcdU0Y8PqB7KwvwJ+4RjtGpFgp0Auqe2FvaRyiqYE4n5EU2S
 5kvbvKgGK1XUe4RyBLVg1cqcB/LoHLbNohJUJ03pdU1anxcjKIDEv/Oh801OPmE=
 =w1HF
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2018-12-03' into staging

nbd patches for 2018-12-03

Improve x-dirty-bitmap handling for experimenting with pull mode
incremental backups.

- Eric Blake: 0/3 NBD dirty bitmap cleanups

# gpg: Signature made Mon 03 Dec 2018 15:56:23 GMT
# gpg:                using RSA key A7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>"
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>"
# gpg:                 aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2018-12-03:
  nbd/client: Send NBD_CMD_DISC if open fails after connect
  nbd/client: Make x-dirty-bitmap more reliable
  nbd/server: Advertise all contexts in response to bare LIST

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-12-03 17:43:20 +00:00
Vladimir Sementsov-Ogievskiy
d12ade5732 mirror: fix dead-lock
Let start from the beginning:

Commit b9e413dd37 (in 2.9)
"block: explicitly acquire aiocontext in aio callbacks that need it"
added pairs of aio_context_acquire/release to mirror_write_complete and
mirror_read_complete, when they were aio callbacks for blk_aio_* calls.

Then, commit 2e1990b26e (in 3.0) "block/mirror: Convert to coroutines"
dropped these blk_aio_* calls, than mirror_write_complete and
mirror_read_complete are not callbacks more, and don't need additional
aiocontext acquiring. Furthermore, mirror_read_complete calls
blk_co_pwritev inside these pair of aio_context_acquire/release, which
leads to the following dead-lock with mirror:

 (gdb) info thr
   Id   Target Id         Frame
   3    Thread (LWP 145412) "qemu-system-x86" syscall ()
   2    Thread (LWP 145416) "qemu-system-x86" __lll_lock_wait ()
 * 1    Thread (LWP 145411) "qemu-system-x86" __lll_lock_wait ()

 (gdb) bt
 #0  __lll_lock_wait ()
 #1  _L_lock_812 ()
 #2  __GI___pthread_mutex_lock
 #3  qemu_mutex_lock_impl (mutex=0x561032dce420 <qemu_global_mutex>,
     file=0x5610327d8654 "util/main-loop.c", line=236) at
     util/qemu-thread-posix.c:66
 #4  qemu_mutex_lock_iothread_impl
 #5  os_host_main_loop_wait (timeout=480116000) at util/main-loop.c:236
 #6  main_loop_wait (nonblocking=0) at util/main-loop.c:497
 #7  main_loop () at vl.c:1892
 #8  main

Printing contents of qemu_global_mutex, I see that "__owner = 145416",
so, thr1 is main loop, and now it wants BQL, which is owned by thr2.

 (gdb) thr 2
 (gdb) bt
 #0  __lll_lock_wait ()
 #1  _L_lock_870 ()
 #2  __GI___pthread_mutex_lock
 #3  qemu_mutex_lock_impl (mutex=0x561034d25dc0, ...
 #4  aio_context_acquire (ctx=0x561034d25d60)
 #5  dma_blk_cb
 #6  dma_blk_io
 #7  dma_blk_read
 #8  ide_dma_cb
 #9  bmdma_cmd_writeb
 #10 bmdma_write
 #11 memory_region_write_accessor
 #12 access_with_adjusted_size
 #15 flatview_write
 #16 address_space_write
 #17 address_space_rw
 #18 kvm_handle_io
 #19 kvm_cpu_exec
 #20 qemu_kvm_cpu_thread_fn
 #21 qemu_thread_start
 #22 start_thread
 #23 clone ()

Printing mutex in fr 2, I see "__owner = 145411", so thr2 wants aio
context mutex, which is owned by thr1. Classic dead-lock.

Then, let's check that aio context is hold by mirror coroutine: just
print coroutine stack of first tracked request in mirror job target:

 (gdb) [...]
 (gdb) qemu coroutine 0x561035dd0860
 #0  qemu_coroutine_switch
 #1  qemu_coroutine_yield
 #2  qemu_co_mutex_lock_slowpath
 #3  qemu_co_mutex_lock
 #4  qcow2_co_pwritev
 #5  bdrv_driver_pwritev
 #6  bdrv_aligned_pwritev
 #7  bdrv_co_pwritev
 #8  blk_co_pwritev
 #9  mirror_read_complete () at block/mirror.c:232
 #10 mirror_co_read () at block/mirror.c:370
 #11 coroutine_trampoline
 #12 __start_context

Yes it is mirror_read_complete calling blk_co_pwritev after acquiring
aio context.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-03 16:50:58 +01:00
Eric Blake
c688e6ca7b nbd/client: Send NBD_CMD_DISC if open fails after connect
If nbd_client_init() fails after we are already connected,
then the server will spam logs with:

Disconnect client, due to: Unexpected end-of-file before all bytes were read

unless we gracefully disconnect before closing the connection.

Ways to trigger this:

$ opts=driver=nbd,export=foo,server.type=inet,server.host=localhost,server.port=10809
$  qemu-img map --output=json --image-opts $opts,read-only=off
$  qemu-img map --output=json --image-opts $opts,x-dirty-bitmap=nosuch:

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181130023232.3079982-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2018-11-30 14:13:37 -06:00
Eric Blake
47829c4079 nbd/client: Make x-dirty-bitmap more reliable
The implementation of x-dirty-bitmap in qemu 3.0 (commit 216ee365)
silently falls back to treating the server as not supporting
NBD_CMD_BLOCK_STATUS if a requested meta_context name was not
negotiated, which in turn means treating the _entire_ image as
data. Since our hack relied on using 'qemu-img map' to view
which portions of the image were dirty by seeing what the
redirected bdrv_block_status() treats as holes, this means
that our fallback treats the entire image as clean.  Better
would have been to treat the entire image as dirty, or to fail
to connect because the user's request for a specific context
could not be honored. This patch goes with the latter.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181130023232.3079982-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2018-11-30 14:11:20 -06:00
Max Reitz
577a133988 file-posix: Fix shared locks on reopen commit
s->locked_shared_perm is the set of bits locked in the file, which is
the inverse of the permissions actually shared.  So we need to pass them
as they are to raw_apply_lock_bytes() instead of inverting them again.

Reported-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-19 14:32:01 +01:00
Eric Blake
77d6a21558 qcow2: Don't allow overflow during cluster allocation
Our code was already checking that we did not attempt to
allocate more clusters than what would fit in an INT64 (the
physical maximimum if we can access a full off_t's worth of
data).  But this does not catch smaller limits enforced by
various spots in the qcow2 image description: L1 and normal
clusters of L2 are documented as having bits 63-56 reserved
for other purposes, capping our maximum offset at 64PB (bit
55 is the maximum bit set).  And for compressed images with
2M clusters, the cap drops the maximum offset to bit 48, or
a maximum offset of 512TB.  If we overflow that offset, we
would write compressed data into one place, but try to
decompress from another, which won't work.

It's actually possible to prove that overflow can cause image
corruption without this patch; I'll add the iotests separately
in the next commit.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-19 12:51:40 +01:00
Kevin Wolf
443ba6befa vvfat: Fix memory leak
Don't leak 'cluster' in the mapping == NULL case. Found by Coverity
(CID 1055918).

Fixes: 8d9401c279
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2018-11-19 12:51:40 +01:00
Liam Merwick
7cb6d3c9be qcow2: Read outside array bounds in qcow2_pre_write_overlap_check()
The commit for 0e4e4318ea increments QCOW2_OL_MAX_BITNR but does not
add an array entry for QCOW2_OL_BITMAP_DIRECTORY_BITNR to metadata_ol_names[].
As a result, an array dereference of metadata_ol_names[8] in
qcow2_pre_write_overlap_check() could result in a read outside of the array bounds.

Fixes: 0e4e4318ea ('qcow2: add overlap check for bitmap directory')

Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1541453919-25973-6-git-send-email-Liam.Merwick@oracle.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-11-12 17:49:21 +01:00
Liam Merwick
8d9401c279 block: Fix potential Null pointer dereferences in vvfat.c
The calls to find_mapping_for_cluster() may return NULL but it
isn't always checked for before dereferencing the value returned.
Additionally, add some asserts to cover cases where NULL can't
be returned but which might not be obvious at first glance.

Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
Message-id: 1541453919-25973-5-git-send-email-Liam.Merwick@oracle.com
[mreitz: Dropped superfluous check of "mapping" following an assertion
         that it is not NULL, and fixed some indentation]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-11-12 17:49:21 +01:00
Liam Merwick
602414d123 block: Null pointer dereference in blk_root_get_parent_desc()
The dev_id returned by the call to blk_get_attached_dev_id() in
blk_root_get_parent_desc() can be NULL (an internal call to
object_get_canonical_path may have returned NULL).

Instead of just checking this case before before dereferencing,
adjust blk_get_attached_dev_id() to return the empty string if no
object path can be found (similar to the case when blk->dev is NULL
and an empty string is returned).

Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
Message-id: 1541453919-25973-3-git-send-email-Liam.Merwick@oracle.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-11-12 17:49:21 +01:00
Jeff Cody
2f74013655 block: Make more block drivers compile-time configurable
This adds configure options to control the following block drivers:

* Bochs
* Cloop
* Dmg
* Qcow (V1)
* Vdi
* Vvfat
* qed
* parallels
* sheepdog

Each of these defaults to being enabled.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20181107063644.2254-1-armbru@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-11-12 17:49:21 +01:00
Fam Zheng
f2e3af29b7 file-posix: Drop s->lock_fd
The lock_fd field is not strictly necessary because transferring locked
bytes from old fd to the new one shouldn't fail anyway. This spares the
user one fd per image.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-12 17:46:57 +01:00
Fam Zheng
2996ffad3a file-posix: Skip effectiveless OFD lock operations
If we know we've already locked the bytes, don't do it again; similarly
don't unlock a byte if we haven't locked it. This doesn't change the
behavior, but fixes a corner case explained below.

Libvirt had an error handling bug that an image can get its (ownership,
file mode, SELinux) permissions changed (RHBZ 1584982) by mistake behind
QEMU. Specifically, an image in use by Libvirt VM has:

    $ ls -lhZ b.img
    -rw-r--r--. qemu qemu system_u:object_r:svirt_image_t:s0:c600,c690 b.img

Trying to attach it a second time won't work because of image locking.
And after the error, it becomes:

    $ ls -lhZ b.img
    -rw-r--r--. root root system_u:object_r:virt_image_t:s0 b.img

Then, we won't be able to do OFD lock operations with the existing fd.
In other words, the code such as in blk_detach_dev:

    blk_set_perm(blk, 0, BLK_PERM_ALL, &error_abort);

can abort() QEMU, out of environmental changes.

This patch is an easy fix to this and the change is regardlessly
reasonable, so do it.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-12 17:46:57 +01:00
Fam Zheng
db0754df88 file-posix: Use error API properly
Use error_report for situations that affect user operation (i.e.  we're
actually returning error), and warn_report/warn_report_err when some
less critical error happened but the user operation can still carry on.

For raw_normalize_devicepath, add Error parameter to propagate to
its callers.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-12 17:46:57 +01:00
Leonid Bloch
3dd5b8f471 vdi: Use a literal number of bytes for DEFAULT_CLUSTER_SIZE
If an expression is used to define DEFAULT_CLUSTER_SIZE, when compiled,
it will be embedded as a literal expression in the binary (as the
default value) because it is stringified to mark the size of the default
value. Now this is fixed by using a defined number to define this value.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-05 15:28:48 +01:00
Kevin Wolf
8f3bf50d34 iscsi: Support auto-read-only option
If read-only=off, but auto-read-only=on is given, open the volume
read-write if we have the permissions, but instead of erroring out for
read-only volumes, just degrade to read-only.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-11-05 15:09:55 +01:00
Kevin Wolf
54ea21bd16 gluster: Support auto-read-only option
If read-only=off, but auto-read-only=on is given, open the file
read-write if we have the permissions, but instead of erroring out for
read-only files, just degrade to read-only.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
2018-11-05 15:09:55 +01:00
Kevin Wolf
6ceef36acb curl: Support auto-read-only option
If read-only=off, but auto-read-only=on is given, just degrade to
read-only.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-11-05 15:09:55 +01:00
Kevin Wolf
64107dc044 file-posix: Support auto-read-only option
If read-only=off, but auto-read-only=on is given, open the file
read-write if we have the permissions, but instead of erroring out for
read-only files, just degrade to read-only.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-11-05 15:09:55 +01:00
Kevin Wolf
6c2e581d4d nbd: Support auto-read-only option
If read-only=off, but auto-read-only=on is given, open a read-write NBD
connection if the server provides a read-write export, but instead of
erroring out for read-only exports, just degrade to read-only.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-11-05 15:09:55 +01:00
Kevin Wolf
eaa2410f1e block: Require auto-read-only for existing fallbacks
Some block drivers have traditionally changed their node to read-only
mode without asking the user. This behaviour has been marked deprecated
since 2.11, expecting users to provide an explicit read-only=on option.

Now that we have auto-read-only=on, enable these drivers to make use of
the option.

This is the only use of bdrv_set_read_only(), so we can make it a bit
more specific and turn it into a bdrv_apply_auto_read_only() that is
more convenient for drivers to use.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-11-05 15:09:55 +01:00
Kevin Wolf
a51b9c4862 rbd: Close image in qemu_rbd_open() error path
Commit e2b8247a32 introduced an error path in qemu_rbd_open() after
calling rbd_open(), but neglected to close the image again in this error
path. The error path should contain everything that the regular close
function qemu_rbd_close() contains.

This adds the missing rbd_close() call.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-11-05 15:09:55 +01:00
Kevin Wolf
e35bdc123a block: Add auto-read-only option
If a management application builds the block graph node by node, the
protocol layer doesn't inherit its read-only option from the format
layer any more, so it must be set explicitly.

Backing files should work on read-only storage, but at the same time, a
block job like commit should be able to reopen them read-write if they
are on read-write storage. However, without option inheritance, reopen
only changes the read-only option for the root node (typically the
format layer), but not the protocol layer, so reopening fails (the
format layer wants to get write permissions, but the protocol layer is
still read-only).

A simple workaround for the problem in the management tool would be to
open the protocol layer always read-write and to make only the format
layer read-only for backing files. However, sometimes the file is
actually stored on read-only storage and we don't know whether the image
can be opened read-write (for example, for NBD it depends on the server
we're trying to connect to). This adds an option that makes QEMU try to
open the image read-write, but allows it to degrade to a read-only mode
without returning an error.

The documentation for this option is consciously phrased in a way that
allows QEMU to switch to a better model eventually: Instead of trying
when the image is first opened, making the read-only flag dynamic and
changing it automatically whenever the first BLK_PERM_WRITE user is
attached or the last one is detached would be much more useful
behaviour.

Unfortunately, this more useful behaviour is also a lot harder to
implement, and libvirt needs a solution now before it can switch to
-blockdev, so let's start with this easier approach for now.

Instead of adding a new auto-read-only option, turning the existing
read-only into an enum (with a bool alternate for compatibility) was
considered, but it complicated the implementation to the point that it
didn't seem to be worth it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-11-05 15:09:55 +01:00
Alberto Garcia
808b27d464 quorum: Forbid adding children in blkverify mode
The blkverify mode of Quorum only works when the number of children is
exactly two, so any attempt to add a new one must return an error.

quorum_del_child() on the other hand doesn't need any additional check
because decreasing the number of children would make it go under the
vote threshold.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-05 15:09:54 +01:00
Alberto Garcia
83aedca872 quorum: Return an error if the blkverify mode has invalid settings
The blkverify mode of Quorum can only be enabled if the number of
children is exactly two and the value of vote-threshold is also two.

If the user tries to enable it but the other settings are incorrect
then QEMU simply prints an error message to stderr and carries on
disabling the blkverify setting.

This patch makes quorum_open() fail and return an error in this case.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-05 15:09:54 +01:00
Alberto Garcia
6840e8d8ae quorum: Remove quorum_err()
This is a static function with only one caller, so there's no need to
keep it. Inlining the code in quorum_compare() makes it much simpler.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-05 15:09:54 +01:00
Peter Maydell
091901841a block/vdi: Don't take address of fields in packed structs
Taking the address of a field in a packed struct is a bad idea, because
it might not be actually aligned enough for that pointer type (and
thus cause a crash on dereference on some host architectures). Newer
versions of clang warn about this. Avoid the bug by not using the
"modify in place" byte swapping functions.

There are a few places where the in-place swap function is
used on something other than a packed struct field; we convert
those anyway, for consistency.

Patch produced with scripts/coccinelle/inplace-byteswaps.cocci.

There are other places where we take the address of a packed member
in this file for other purposes than passing it to a byteswap
function (all the calls to qemu_uuid_*()); we leave those for now.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-05 15:09:54 +01:00
Peter Maydell
1229e46d3c block/vhdx: Don't take address of fields in packed structs
Taking the address of a field in a packed struct is a bad idea, because
it might not be actually aligned enough for that pointer type (and
thus cause a crash on dereference on some host architectures). Newer
versions of clang warn about this. Avoid the bug by not using the
"modify in place" byte swapping functions.

There are a few places where the in-place swap function is
used on something other than a packed struct field; we convert
those anyway, for consistency.

Patch produced with scripts/coccinelle/inplace-byteswaps.cocci.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-05 15:09:54 +01:00
Kevin Wolf
c317b646d7 vpc: Don't leak opts in vpc_open()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2018-11-05 15:09:54 +01:00
Li Qiang
967105651b block: change some function return type to bool
Signed-off-by: Li Qiang <liq3ea@163.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-05 15:09:54 +01:00
Alberto Garcia
6f8f015c0c qcow2: Get the request alignment for encrypted images from QCryptoBlock
This doesn't have any practical effect at the moment because the
values of BDRV_SECTOR_SIZE, QCRYPTO_BLOCK_LUKS_SECTOR_SIZE and
QCRYPTO_BLOCK_QCOW_SECTOR_SIZE are all the same (512 bytes), but
future encryption methods could have different requirements.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-05 15:09:54 +01:00
Peter Maydell
caacea4b2e block/qcow2-bitmap: Don't take address of fields in packed structs
Taking the address of a field in a packed struct is a bad idea, because
it might not be actually aligned enough for that pointer type (and
thus cause a crash on dereference on some host architectures). Newer
versions of clang warn about this. Avoid the bug by not using the
"modify in place" byte swapping functions.

There are a few places where the in-place swap function is
used on something other than a packed struct field; we convert
those anyway, for consistency.

This patch was produced with the following spatch script:

@@
expression E;
@@
-be16_to_cpus(&E);
+E = be16_to_cpu(E);
@@
expression E;
@@
-be32_to_cpus(&E);
+E = be32_to_cpu(E);
@@
expression E;
@@
-be64_to_cpus(&E);
+E = be64_to_cpu(E);
@@
expression E;
@@
-cpu_to_be16s(&E);
+E = cpu_to_be16(E);
@@
expression E;
@@
-cpu_to_be32s(&E);
+E = cpu_to_be32(E);
@@
expression E;
@@
-cpu_to_be64s(&E);
+E = cpu_to_be64(E);

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: John Snow <jsnow@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-05 15:09:54 +01:00
Peter Maydell
a5fdff18a7 block/qcow: Don't take address of fields in packed structs
Taking the address of a field in a packed struct is a bad idea, because
it might not be actually aligned enough for that pointer type (and
thus cause a crash on dereference on some host architectures). Newer
versions of clang warn about this. Avoid the bug by not using the
"modify in place" byte swapping functions.

There are a few places where the in-place swap function is
used on something other than a packed struct field; we convert
those anyway, for consistency.

This patch was produced with the following spatch script:

@@
expression E;
@@
-be16_to_cpus(&E);
+E = be16_to_cpu(E);
@@
expression E;
@@
-be32_to_cpus(&E);
+E = be32_to_cpu(E);
@@
expression E;
@@
-be64_to_cpus(&E);
+E = be64_to_cpu(E);
@@
expression E;
@@
-cpu_to_be16s(&E);
+E = cpu_to_be16(E);
@@
expression E;
@@
-cpu_to_be32s(&E);
+E = cpu_to_be32(E);
@@
expression E;
@@
-cpu_to_be64s(&E);
+E = cpu_to_be64(E);

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: John Snow <jsnow@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-05 15:09:54 +01:00
Peter Maydell
3b698f52f9 block/qcow2: Don't take address of fields in packed structs
Taking the address of a field in a packed struct is a bad idea, because
it might not be actually aligned enough for that pointer type (and
thus cause a crash on dereference on some host architectures). Newer
versions of clang warn about this. Avoid the bug by not using the
"modify in place" byte swapping functions.

There are a few places where the in-place swap function is
used on something other than a packed struct field; we convert
those anyway, for consistency.

This patch was produced with the following spatch script
(and hand-editing to fold a few resulting overlength lines):

@@
expression E;
@@
-be16_to_cpus(&E);
+E = be16_to_cpu(E);
@@
expression E;
@@
-be32_to_cpus(&E);
+E = be32_to_cpu(E);
@@
expression E;
@@
-be64_to_cpus(&E);
+E = be64_to_cpu(E);
@@
expression E;
@@
-cpu_to_be16s(&E);
+E = cpu_to_be16(E);
@@
expression E;
@@
-cpu_to_be32s(&E);
+E = cpu_to_be32(E);
@@
expression E;
@@
-cpu_to_be64s(&E);
+E = cpu_to_be64(E);

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: John Snow <jsnow@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-05 15:09:54 +01:00
Thomas Huth
a2b83a5165 block/vvfat: Fix crash when reporting error about too many files in directory
When using the vvfat driver with a directory that contains too many files,
QEMU currently crashes. This can be triggered like this for example:

 mkdir /tmp/vvfattest
 cd /tmp/vvfattest
 for ((x=0;x<=513;x++)); do mkdir $x; done
 qemu-system-x86_64 -drive \
   file.driver=vvfat,file.dir=.,read-only=on,media=cdrom

Seems like read_directory() is changing the mapping->path variable. Make
sure we use the right pointer instead.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-05 15:09:54 +01:00
Vladimir Sementsov-Ogievskiy
9c98f145df dirty-bitmaps: clean-up bitmaps loading and migration logic
This patch aims to bring the following behavior:

1. We don't load bitmaps, when started in inactive mode. It's the case
of incoming migration. In this case we wait for bitmaps migration
through migration channel (if 'dirty-bitmaps' capability is enabled) or
for invalidation (to load bitmaps from the image).

2. We don't remove persistent bitmaps on inactivation. Instead, we only
remove bitmaps after storing. This is the only way to restore bitmaps,
if we decided to resume source after [failed] migration with
'dirty-bitmaps' capability enabled (which means, that bitmaps were not
stored).

3. We load bitmaps on open and any invalidation, it's ok for all cases:
  - normal open
  - migration target invalidation with dirty-bitmaps capability
    (bitmaps are migrating through migration channel, the are not
     stored, so they should have IN_USE flag set and will be skipped
     when loading. However, it would fail if bitmaps are read-only[1])
  - migration target invalidation without dirty-bitmaps capability
    (normal load of the bitmaps, if migrated with shared storage)
  - source invalidation with dirty-bitmaps capability
    (skip because IN_USE)
  - source invalidation without dirty-bitmaps capability
    (bitmaps were dropped, reload them)

[1]: to accurately handle this, migration of read-only bitmaps is
     explicitly forbidden in this patch.

New mechanism for not storing bitmaps when migrate with dirty-bitmaps
capability is introduced: migration filed in BdrvDirtyBitmap.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
2018-10-29 16:23:17 -04:00
John Snow
0be37c9e19 block/dirty-bitmaps: allow clear on disabled bitmaps
Similarly to merge, it's OK to allow clear operations on disabled
bitmaps, as this condition only means that they are not recording
new writes. We are free to clear it if the user requests it.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20181002230218.13949-4-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2018-10-29 16:23:16 -04:00
John Snow
283d7a04f2 block/dirty-bitmaps: fix merge permissions
In prior commits that made merge transactionable, we removed the
assertion that merge cannot operate on disabled bitmaps. In addition,
we want to make sure that we are prohibiting merges to "locked" bitmaps.

Use the new user_locked function to check.

Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20181002230218.13949-3-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2018-10-29 16:23:16 -04:00
John Snow
993edc0ce0 block/dirty-bitmaps: add user_locked status checker
Instead of both frozen and qmp_locked checks, wrap it into one check.
frozen implies the bitmap is split in two (for backup), and shouldn't
be modified. qmp_locked implies it's being used by another operation,
like being exported over NBD. In both cases it means we shouldn't allow
the user to modify it in any meaningful way.

Replace any usages where we check both frozen and qmp_locked with the
new check.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20181002230218.13949-2-jsnow@redhat.com
[w/edits Suggested-By: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>]
Signed-off-by: John Snow <jsnow@redhat.com>
2018-10-29 16:23:16 -04:00
Vladimir Sementsov-Ogievskiy
2ea427efff bloc/qcow2: drop dirty_bitmaps_loaded state variable
This variable doesn't work as it should, because it is actually cleared
in qcow2_co_invalidate_cache() by memset(). Drop it, as the following
patch will introduce new behavior.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
2018-10-29 16:23:15 -04:00
Vladimir Sementsov-Ogievskiy
132adb6820 block/qcow2: improve error message in qcow2_inactivate
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[Maintainer edit -- touched up error message. --js]
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
2018-10-29 16:23:15 -04:00
Vladimir Sementsov-Ogievskiy
fa000f2f9f dirty-bitmap: make it possible to restore bitmap after merge
Add backup parameter to bdrv_merge_dirty_bitmap() to be used then with
bdrv_restore_dirty_bitmap() if it needed to restore the bitmap after
merge operation.

This is needed to implement bitmap merge transaction action in further
commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-10-29 16:23:15 -04:00
Vladimir Sementsov-Ogievskiy
56bd662497 dirty-bitmap: rename bdrv_undo_clear_dirty_bitmap
Use more generic names to reuse the function for bitmap merge in the
following commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-10-29 16:23:14 -04:00
Vladimir Sementsov-Ogievskiy
06bf50068a dirty-bitmap: switch assert-fails to errors in bdrv_merge_dirty_bitmap
Move checks from qmp_x_block_dirty_bitmap_merge() to
bdrv_merge_dirty_bitmap(), to share them with dirty bitmap merge
transaction action in future commit.

Note: for now, only qmp_x_block_dirty_bitmap_merge() calls
bdrv_merge_dirty_bitmap().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-10-29 16:23:14 -04:00
Markus Armbruster
04788ba2ed vpc: Fail open on bad header checksum
vpc_open() merely prints a warning when it finds a bad header
checksum.  Turn that into a hard error.

Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20181017082702.5581-39-armbru@redhat.com>
[Error message capitalized for local consistency]
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2018-10-19 14:55:46 +02:00
Markus Armbruster
5197f44584 block: Use warn_report() & friends to report warnings
Calling error_report() in a function that takes an Error ** argument
is suspicious.  Convert a few that are actually warnings to
warn_report().

While there, split warnings consisting of multiple sentences to
conform to conventions spelled out in warn_report()'s contract, and
improve a rather useless warning in sheepdog.c.

Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Lieven <pl@kamp.de>
Cc: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181017082702.5581-4-armbru@redhat.com>

Drop changes to "without an explicit read-only=on" warnings, because
there's a series removing them pending.  Also drop a cc: to a former
Sheepdog maintainer.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2018-10-19 14:51:34 +02:00
Markus Armbruster
4b5766488f error: Fix use of error_prepend() with &error_fatal, &error_abort
From include/qapi/error.h:

  * Pass an existing error to the caller with the message modified:
  *     error_propagate(errp, err);
  *     error_prepend(errp, "Could not frobnicate '%s': ", name);

Fei Li pointed out that doing error_propagate() first doesn't work
well when @errp is &error_fatal or &error_abort: the error_prepend()
is never reached.

Since I doubt fixing the documentation will stop people from getting
it wrong, introduce error_propagate_prepend(), in the hope that it
lures people away from using its constituents in the wrong order.
Update the instructions in error.h accordingly.

Convert existing error_prepend() next to error_propagate to
error_propagate_prepend().  If any of these get reached with
&error_fatal or &error_abort, the error messages improve.  I didn't
check whether that's the case anywhere.

Cc: Fei Li <fli@suse.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181017082702.5581-2-armbru@redhat.com>
2018-10-19 14:51:34 +02:00
Paolo Bonzini
6388147296 nvme: correct locking around completion
nvme_poll_queues is already protected by q->lock, and
AIO callbacks are invoked outside the AioContext lock.
So remove the acquire/release pair in nvme_handle_event.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180814062739.19640-1-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2018-10-12 09:46:14 +08:00
Kevin Wolf
cb53460b70 block-backend: Set werror/rerror defaults in blk_new()
Currently, the default values for werror and rerror have to be set
explicitly with blk_set_on_error() by the callers of blk_new(). The only
caller actually doing this is blockdev_init(), which is called for
BlockBackends created using -drive.

In particular, anonymous BlockBackends created with
-device ...,drive=<node-name> didn't get the correct default set and
instead defaulted to the integer value 0 (= BLOCKDEV_ON_ERROR_REPORT).
This is the intended default for rerror anyway, but the default for
werror should be BLOCKDEV_ON_ERROR_ENOSPC.

Set the defaults in blk_new() instead so that they apply no matter what
way the BlockBackend was created.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2018-10-01 19:13:46 +02:00
Leonid Bloch
bd016b912c qcow2: Explicit number replaced by a constant
Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:12 +02:00
Leonid Bloch
e957b50b8d qcow2: Set the default cache-clean-interval to 10 minutes
The default cache-clean-interval is set to 10 minutes, in order to lower
the overhead of the qcow2 caches (before the default was 0, i.e.
disabled).

* For non-Linux platforms the default is kept at 0, because
  cache-clean-interval is not supported there yet.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:12 +02:00
Leonid Bloch
45b4949c7b qcow2: Resize the cache upon image resizing
The caches are now recalculated upon image resizing. This is done
because the new default behavior of assigning L2 cache relatively to
the image size, implies that the cache will be adapted accordingly
after an image resize.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:12 +02:00
Leonid Bloch
80668d0fb7 qcow2: Increase the default upper limit on the L2 cache size
The upper limit on the L2 cache size is increased from 1 MB to 32 MB
on Linux platforms, and to 8 MB on other platforms (this difference is
caused by the ability to set intervals for cache cleaning on Linux
platforms only).

This is done in order to allow default full coverage with the L2 cache
for images of up to 256 GB in size (was 8 GB). Note, that only the
needed amount to cover the full image is allocated. The value which is
changed here is just the upper limit on the L2 cache size, beyond which
it will not grow, even if the size of the image will require it to.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:12 +02:00
Leonid Bloch
b749562d98 qcow2: Assign the L2 cache relatively to the image size
Sufficient L2 cache can noticeably improve the performance when using
large images with frequent I/O.

Previously, unless 'cache-size' was specified and was large enough, the
L2 cache was set to a certain size without taking the virtual image size
into account.

Now, the L2 cache assignment is aware of the virtual size of the image,
and will cover the entire image, unless the cache size needed for that is
larger than a certain maximum. This maximum is set to 1 MB by default
(enough to cover an 8 GB image with the default cluster size) but can
be increased or decreased using the 'l2-cache-size' option. This option
was previously documented as the *maximum* L2 cache size, and this patch
makes it behave as such, instead of as a constant size. Also, the
existing option 'cache-size' can limit the sum of both L2 and refcount
caches, as previously.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:12 +02:00
Leonid Bloch
657ada52ab qcow2: Avoid duplication in setting the refcount cache size
The refcount cache size does not need to be set to its minimum value in
read_cache_sizes(), as it is set to at least its minimum value in
qcow2_update_options_prepare().

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:12 +02:00
Leonid Bloch
b6a95c6d10 qcow2: Make sizes more humanly readable
Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:12 +02:00
Alberto Garcia
8d3245750b file-posix: Forbid trying to change unsupported options during reopen
The file-posix code is used for the "file", "host_device" and
"host_cdrom" drivers, and it allows reopening images. However the only
option that is actually processed is "x-check-cache-dropped", and
changes in all other options (e.g. "filename") are silently ignored:

   (qemu) qemu-io virtio0 "reopen -o file.filename=no-such-file"

While we could allow changing some of the other options, let's keep
things as they are for now but return an error if the user tries to
change any of them.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:12 +02:00
Alberto Garcia
589f20dccd file-posix: x-check-cache-dropped should default to false on reopen
The default value of x-check-cache-dropped is false. There's no reason
to use the previous value as a default in raw_reopen_prepare() because
bdrv_reopen_queue_child() already takes care of putting the old
options in the BDRVReopenState.options QDict.

If x-check-cache-dropped was previously set but is now missing from
the reopen QDict then it should be reset to false.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:11 +02:00
Fam Zheng
b857431d2a file-posix: Include filename in locking error message
Image locking errors happening at device initialization time doesn't say
which file cannot be locked, for instance,

    -device scsi-disk,drive=drive-1: Failed to get shared "write" lock
    Is another process using the image?

could refer to either the overlay image or its backing image.

Hoist the error_append_hint to the caller of raw_check_lock_bytes where
file name is known, and include it in the error hint.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01 12:51:11 +02:00
Peter Maydell
099bea113f Block and testing patches
- Paolo's AIO fixes.
 - VMDK streamOptimized corner case fix
 - VM testing improvment on -cpu
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCAAuFiEEUAN8t5cGD3bwIa1WyjViTGqRccYFAluq9NAQHGZhbXpAcmVk
 aGF0LmNvbQAKCRDKNWJMapFxxm6mB/0XBuWySxCchAZDmkdcIqosrO7XZ6dKFpvW
 3uegPH3gUCpH6tw1YtigQSS+Se7tdnrUkOA5/5yOt8v2h9+6ORbIEYkkjGFpsPAL
 jITjIGSxXb2yjR0Ss+zKoR4tFQGYEVRmQSGZ8UHYiDVFHU0FbcwNkagHIyluoRuF
 +IspfB7bMXqrZ1qCCWROQDy7Cd2TJDeZ+jUtoiOACh7kyq5Q+PcVoeyeKKk1berH
 Kv8ZhdOk2t/te/CSaCpUTe5zYv3QI1tjcU1D6huYAkT9M8/3SasOeXHBNNG9zLE+
 zAtNKufd+VQoG7l24AfcPaUVqJlQ4GRDXFdumcgW3zA2Qf5CGyN7
 =pEHW
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/famz/tags/staging-pull-request' into staging

Block and testing patches

- Paolo's AIO fixes.
- VMDK streamOptimized corner case fix
- VM testing improvment on -cpu

# gpg: Signature made Wed 26 Sep 2018 03:54:08 BST
# gpg:                using RSA key CA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021  AD56 CA35 624C 6A91 71C6

* remotes/famz/tags/staging-pull-request:
  vmdk: align end of file to a sector boundary
  tests/vm: Use -cpu max rather than -cpu host
  aio-posix: do skip system call if ctx->notifier polling succeeds
  aio-posix: compute timeout before polling
  aio-posix: fix concurrent access to poll_disable_cnt

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-09-28 13:35:26 +01:00
yuchenlin
51b3c6b73a vmdk: align end of file to a sector boundary
There is a rare case which the size of last compressed cluster
is larger than the cluster size, which will cause the file is
not aligned at the sector boundary.

There are three reasons to do it. First, if vmdk doesn't align at
the sector boundary, there may be many undefined behaviors,
such as, in vbox it will show VMDK: Compressed image is corrupted
'syno-vm-disk1.vmdk' (VERR_ZIP_CORRUPTED) when we try to import an
ova with unaligned vmdk. Second, all the cluster_sector is aligned
to sector, the last one should be like this, too. Third, it ease
reading with sector based I/Os.

Signed-off-by: yuchenlin <yuchenlin@synology.com>
Message-Id: <20180913082952.3675-1-yuchenlin@synology.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2018-09-26 10:47:18 +08:00
Peter Maydell
866ba83854 - Deprecate the usage of a network backend via "name" instead of "id"
- Deprecate the "enforce-config-section" machine parameter
 - Re-enable the wdt_ib700, endianness and vmxnet3 qtests
 - Some trivial fixes and doc update patches that crossed my way
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJbqlsyAAoJEC7Z13T+cC21RbAP/3IvGfBxuRm6rBWoghjQgbl8
 KU8nPnlZUtqjxmfUTILO/h+pJ3na5MQ8hh7v8JHi+xlQ2DPkECW21DtnfdxntVjw
 +b+N5Ap6J22GHyEq4HJXPWAk2rDInqkU966DvL40RiMvOTfXdg9EO0TDX0VsVgZv
 BR1r7/t3T0P7hiQ0XWb9U2JchRIC+Zgk34gXZPSTpoIv89fUhzNoK5LvAA6yV1FQ
 TvE8VTKJm4wkqThH1ShtbJCBKjHjW/W8LYZr3YMothcs8vGjEdEcDL4BoJZDn3bF
 h4VTkU+k8lp7W9LmlnPnu1WH/5ezhzdwJTeFaPJt4U10WKJptAS4vbK03DXlds9O
 9d2BOXKrima2kSr1ejSe1f0kcE8fis1XFmSuhF61Nbw6ngT5+pP2JSc1XwFazd2K
 zQwV4GXBLzAGnd4F2Ec+5TKzbGFVfczxeBDiBkkVmG+XdX/UXJpkpPYGAaw7DDiK
 JwKVVYIPk1ll6MAbR6qEGsvE/adHNEm8lUdjXqwgbQlIeUZ2H0hCu9lJ0X81mtoQ
 WZP+nMa/87COnlPX6VPVgxM2TXQOH/UbGz/WmYzZ6/gPKTX+gfwrHQGdp7Tjl33U
 KxFKWioFnoqGuyWasvTtKEK67/IlrY+w1nXuuqKJg8J2/qx1SVtx45FHkRkxkIDx
 4boRpx0XUqpDVdf8VhRB
 =dXgp
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2018-09-25' into staging

- Deprecate the usage of a network backend via "name" instead of "id"
- Deprecate the "enforce-config-section" machine parameter
- Re-enable the wdt_ib700, endianness and vmxnet3 qtests
- Some trivial fixes and doc update patches that crossed my way

# gpg: Signature made Tue 25 Sep 2018 16:58:42 BST
# gpg:                using RSA key 2ED9D774FE702DB5
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>"
# gpg:                 aka "Thomas Huth <thuth@redhat.com>"
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>"
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>"
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* remotes/huth-gitlab/tags/pull-request-2018-09-25:
  Revert "check: Move VMXNET3 test to common"
  Revert "check: Move endianess test to common"
  Revert "check: Move wdt_ib700 test to common"
  tests/migration: Speed up the test on ppc64
  hw/qdev-core: Fix description of instance_init
  qdev: fix a typo in comment
  docs: Fix some typos (most found by codespell)
  trivial: Make bios files and source files non-executable
  memfd: fix possible usage of the uninitialized file descriptor
  hw/core/machine: Officially deprecate the enforce-config-section parameter
  net/slirp: Deprecate the [hub_id name] parameter tuple
  net: Deprecate the "name" parameter of -net
  Makefile: Add missing dependency for qemu-deprecated.texi

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-09-25 18:09:52 +01:00
Thomas Huth
55d38d10b8 trivial: Make bios files and source files non-executable
These files can not be executed on the host, so they should not be
marked as executable.

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-09-25 17:26:18 +02:00
Kevin Wolf
cfe29d8294 block: Use a single global AioWait
When draining a block node, we recurse to its parent and for subtree
drains also to its children. A single AIO_WAIT_WHILE() is then used to
wait for bdrv_drain_poll() to become true, which depends on all of the
nodes we recursed to. However, if the respective child or parent becomes
quiescent and calls bdrv_wakeup(), only the AioWait of the child/parent
is checked, while AIO_WAIT_WHILE() depends on the AioWait of the
original node.

Fix this by using a single AioWait for all callers of AIO_WAIT_WHILE().

This may mean that the draining thread gets a few more unnecessary
wakeups because an unrelated operation got completed, but we already
wake it up when something _could_ have changed rather than only if it
has certainly changed.

Apart from that, drain is a slow path anyway. In theory it would be
possible to use wakeups more selectively and still correctly, but the
gains are likely not worth the additional complexity. In fact, this
patch is a nice simplification for some places in the code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25 15:50:15 +02:00
Kevin Wolf
4cf077b59f block: Remove aio_poll() in bdrv_drain_poll variants
bdrv_drain_poll_top_level() was buggy because it didn't release the
AioContext lock of the node to be drained before calling aio_poll().
This way, callbacks called by aio_poll() would possibly take the lock a
second time and run into a deadlock with a nested AIO_WAIT_WHILE() call.

However, it turns out that the aio_poll() call isn't actually needed any
more. It was introduced in commit 91af091f92, which is effectively
reverted by this patch. The cases it was supposed to fix are now covered
by bdrv_drain_poll(), which waits for block jobs to reach a quiescent
state.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25 15:50:15 +02:00
Kevin Wolf
46aaf2a566 block-backend: Decrease in_flight only after callback
Request callbacks can do pretty much anything, including operations that
will yield from the coroutine (such as draining the backend). In that
case, a decreased in_flight would be visible to other code and could
lead to a drain completing while the callback hasn't actually completed
yet.

Note that reordering these operations forbids calling drain directly
inside an AIO callback. As Paolo explains, indirectly calling it is
okay:

- Calling it through a coroutine is okay, because then
  bdrv_drained_begin() goes through bdrv_co_yield_to_drain() and you
  have in_flight=2 when bdrv_co_yield_to_drain() yields, then soon
  in_flight=1 when the aio_co_wake() in the AIO callback completes, then
  in_flight=0 after the bottom half starts.

- Calling it through a bottom half would be okay too, as long as the AIO
  callback remembers to do inc_in_flight/dec_in_flight just like
  bdrv_co_yield_to_drain() and bdrv_co_drain_bh_cb() do

A few more important cases that come to mind:

- A coroutine that yields because of I/O is okay, with a sequence
  similar to bdrv_co_yield_to_drain().

- A coroutine that yields with no I/O pending will correctly decrease
  in_flight to zero before yielding.

- Calling more AIO from the callback won't overflow the counter just
  because of mutual recursion, because AIO functions always yield at
  least once before invoking the callback.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-25 15:50:15 +02:00
Kevin Wolf
5ca9d21bd1 block-backend: Fix potential double blk_delete()
blk_unref() first decreases the refcount of the BlockBackend and calls
blk_delete() if the refcount reaches zero. Requests can still be in
flight at this point, they are only drained during blk_delete():

At this point, arbitrary callbacks can run. If any callback takes a
temporary BlockBackend reference, it will first increase the refcount to
1 and then decrease it to 0 again, triggering another blk_delete(). This
will cause a use-after-free crash in the outer blk_delete().

Fix it by draining the BlockBackend before decreasing to refcount to 0.
Assert in blk_ref() that it never takes the first refcount (which would
mean that the BlockBackend is already being deleted).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25 15:50:15 +02:00
Kevin Wolf
fe5258a503 block-backend: Add .drained_poll callback
A bdrv_drain operation must ensure that all parents are quiesced, this
includes BlockBackends. Otherwise, callbacks called by requests that are
completed on the BDS layer, but not quite yet on the BlockBackend layer
could still create new requests.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25 15:50:15 +02:00
Kevin Wolf
aa1361d54a block: Add missing locking in bdrv_co_drain_bh_cb()
bdrv_do_drained_begin/end() assume that they are called with the
AioContext lock of bs held. If we call drain functions from a coroutine
with the AioContext lock held, we yield and schedule a BH to move out of
coroutine context. This means that the lock for the home context of the
coroutine is released and must be re-acquired in the bottom half.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25 15:50:15 +02:00
Sergio Lopez
e091f0e905 block/linux-aio: acquire AioContext before qemu_laio_process_completions
In qemu_laio_process_completions_and_submit, the AioContext is acquired
before the ioq_submit iteration and after qemu_laio_process_completions,
but the latter is not thread safe either.

This change avoids a number of random crashes when the Main Thread and
an IO Thread collide processing completions for the same AioContext.
This is an example of such crash:

 - The IO Thread is trying to acquire the AioContext at aio_co_enter,
   which evidences that it didn't lock it before:

Thread 3 (Thread 0x7fdfd8bd8700 (LWP 36743)):
 #0  0x00007fdfe0dd542d in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
 #1  0x00007fdfe0dd0de6 in _L_lock_870 () at /lib64/libpthread.so.0
 #2  0x00007fdfe0dd0cdf in __GI___pthread_mutex_lock (mutex=mutex@entry=0x5631fde0e6c0)
    at ../nptl/pthread_mutex_lock.c:114
 #3  0x00005631fc0603a7 in qemu_mutex_lock_impl (mutex=0x5631fde0e6c0, file=0x5631fc23520f "util/async.c", line=511) at util/qemu-thread-posix.c:66
 #4  0x00005631fc05b558 in aio_co_enter (ctx=0x5631fde0e660, co=0x7fdfcc0c2b40) at util/async.c:493
 #5  0x00005631fc05b5ac in aio_co_wake (co=<optimized out>) at util/async.c:478
 #6  0x00005631fbfc51ad in qemu_laio_process_completion (laiocb=<optimized out>) at block/linux-aio.c:104
 #7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
 #8  0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670)
    at block/linux-aio.c:237
 #9  0x00005631fc05d978 in aio_dispatch_handlers (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:406
 #10 0x00005631fc05e3ea in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=true)
    at util/aio-posix.c:693
 #11 0x00005631fbd7ad96 in iothread_run (opaque=0x5631fde0e1c0) at iothread.c:64
 #12 0x00007fdfe0dcee25 in start_thread (arg=0x7fdfd8bd8700) at pthread_create.c:308
 #13 0x00007fdfe0afc34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

 - The Main Thread is also processing completions from the same
   AioContext, and crashes due to failed assertion at util/iov.c:78:

Thread 1 (Thread 0x7fdfeb5eac80 (LWP 36740)):
 #0  0x00007fdfe0a391f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
 #1  0x00007fdfe0a3a8e8 in __GI_abort () at abort.c:90
 #2  0x00007fdfe0a32266 in __assert_fail_base (fmt=0x7fdfe0b84e68 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset")
    at assert.c:92
 #3  0x00007fdfe0a32312 in __GI___assert_fail (assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset") at assert.c:101
 #4  0x00005631fc065287 in iov_memset (iov=<optimized out>, iov_cnt=<optimized out>, offset=<optimized out>, offset@entry=65536, fillc=fillc@entry=0, bytes=15515191315812405248) at util/iov.c:78
 #5  0x00005631fc065a63 in qemu_iovec_memset (qiov=<optimized out>, offset=offset@entry=65536, fillc=fillc@entry=0, bytes=<optimized out>) at util/iov.c:410
 #6  0x00005631fbfc5178 in qemu_laio_process_completion (laiocb=0x7fdd920df630) at block/linux-aio.c:88
 #7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
 #8  0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670)
    at block/linux-aio.c:237
 #9  0x00005631fbfc54ed in qemu_laio_poll_cb (opaque=<optimized out>) at block/linux-aio.c:272
 #10 0x00005631fc05d85e in run_poll_handlers_once (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:497
 #11 0x00005631fc05e2ca in aio_poll (blocking=false, ctx=0x5631fde0e660) at util/aio-posix.c:574
 #12 0x00005631fc05e2ca in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=false)
    at util/aio-posix.c:604
 #13 0x00005631fbfcb8a3 in bdrv_do_drained_begin (ignore_parent=<optimized out>, recursive=<optimized out>, bs=<optimized out>) at block/io.c:273
 #14 0x00005631fbfcb8a3 in bdrv_do_drained_begin (bs=0x5631fe8b6200, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:390
 #15 0x00005631fbfbcd2e in blk_drain (blk=0x5631fe83ac80) at block/block-backend.c:1590
 #16 0x00005631fbfbe138 in blk_remove_bs (blk=blk@entry=0x5631fe83ac80) at block/block-backend.c:774
 #17 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:401
 #18 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:449
 #19 0x00005631fbfc9a69 in commit_complete (job=0x5631fe8b94b0, opaque=0x7fdfcc1bb080)
    at block/commit.c:92
 #20 0x00005631fbf7d662 in job_defer_to_main_loop_bh (opaque=0x7fdfcc1b4560) at job.c:973
 #21 0x00005631fc05ad41 in aio_bh_poll (bh=0x7fdfcc01ad90) at util/async.c:90
 #22 0x00005631fc05ad41 in aio_bh_poll (ctx=ctx@entry=0x5631fddffdb0) at util/async.c:118
 #23 0x00005631fc05e210 in aio_dispatch (ctx=0x5631fddffdb0) at util/aio-posix.c:436
 #24 0x00005631fc05ac1e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
 #25 0x00007fdfeaae44c9 in g_main_context_dispatch (context=0x5631fde00140) at gmain.c:3201
 #26 0x00007fdfeaae44c9 in g_main_context_dispatch (context=context@entry=0x5631fde00140) at gmain.c:3854
 #27 0x00005631fc05d503 in main_loop_wait () at util/main-loop.c:215
 #28 0x00005631fc05d503 in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:238
 #29 0x00005631fc05d503 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:497
 #30 0x00005631fbd81412 in main_loop () at vl.c:1866
 #31 0x00005631fbc18ff3 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at vl.c:4647

 - A closer examination shows that s->io_q.in_flight appears to have
   gone backwards:

(gdb) frame 7
 #7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
222	            qemu_laio_process_completion(laiocb);
(gdb) p s
$2 = (LinuxAioState *) 0x7fdfc0297670
(gdb) p *s
$3 = {aio_context = 0x5631fde0e660, ctx = 0x7fdfeb43b000, e = {rfd = 33, wfd = 33}, io_q = {plugged = 0,
    in_queue = 0, in_flight = 4294967280, blocked = false, pending = {sqh_first = 0x0,
      sqh_last = 0x7fdfc0297698}}, completion_bh = 0x7fdfc0280ef0, event_idx = 21, event_max = 241}
(gdb) p/x s->io_q.in_flight
$4 = 0xfffffff0

Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-09-25 15:50:15 +02:00
John Snow
1b57488acf block/stream: refactor stream to use job callbacks
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180906130225.5118-8-jsnow@redhat.com
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-09-25 15:31:15 +02:00
John Snow
737efc1eda block/mirror: conservative mirror_exit refactor
For purposes of minimum code movement, refactor the mirror_exit
callback to use the post-finalization callbacks in a trivial way.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 20180906130225.5118-7-jsnow@redhat.com
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
[mreitz: Added comment for the mirror_exit() function]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-09-25 15:31:15 +02:00
John Snow
c2924ceaa7 block/mirror: don't install backing chain on abort
In cases where we abort the block/mirror job, there's no point in
installing the new backing chain before we finish aborting.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 20180906130225.5118-6-jsnow@redhat.com
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-09-25 15:31:15 +02:00
John Snow
22dffcbec6 block/commit: refactor commit to use job callbacks
Use the component callbacks; prepare, abort, and clean.

NB: prepare is only called when the job has not yet failed;
and abort can be called after prepare.

complete -> prepare -> abort -> clean
complete -> abort -> clean

During refactor, a potential problem with bdrv_drop_intermediate
was identified, the patched behavior is no worse than the pre-patch
behavior, so leave a FIXME for now to be fixed in a future patch.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180906130225.5118-5-jsnow@redhat.com
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-09-25 15:31:15 +02:00
John Snow
cf6320df58 block/stream: add block job creation flags
Add support for taking and passing forward job creation flags.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20180906130225.5118-4-jsnow@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-09-25 15:31:15 +02:00
John Snow
a1999b3348 block/mirror: add block job creation flags
Add support for taking and passing forward job creation flags.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20180906130225.5118-3-jsnow@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-09-25 15:31:15 +02:00
John Snow
5360782d08 block/commit: add block job creation flags
Add support for taking and passing forward job creation flags.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20180906130225.5118-2-jsnow@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-09-25 15:31:15 +02:00
Richard W.M. Jones
637fa44ab8 curl: Make sslverify=off disable host as well as peer verification.
The sslverify setting is supposed to turn off all TLS certificate
checks in libcurl.  However because of the way we use it, it only
turns off peer certificate authenticity checks
(CURLOPT_SSL_VERIFYPEER).  This patch makes it also turn off the check
that the server name in the certificate is the same as the server
you're connecting to (CURLOPT_SSL_VERIFYHOST).

We can use Google's server at 8.8.8.8 which happens to have a bad TLS
certificate to demonstrate this:

$ ./qemu-img create -q -f qcow2 -b 'json: { "file.sslverify": "off", "file.driver": "https", "file.url": "https://8.8.8.8/foo" }' /var/tmp/file.qcow2
qemu-img: /var/tmp/file.qcow2: CURL: Error opening file: SSL: no alternative certificate subject name matches target host name '8.8.8.8'
Could not open backing image to determine size.

With this patch applied, qemu-img connects to the server regardless of
the bad certificate:

$ ./qemu-img create -q -f qcow2 -b 'json: { "file.sslverify": "off", "file.driver": "https", "file.url": "https://8.8.8.8/foo" }' /var/tmp/file.qcow2
qemu-img: /var/tmp/file.qcow2: CURL: Error opening file: The requested URL returned error: 404 Not Found

(The 404 error is expected because 8.8.8.8 is not actually serving a
file called "/foo".)

Of course the default (without sslverify=off) remains to always check
the certificate:

$ ./qemu-img create -q -f qcow2 -b 'json: { "file.driver": "https", "file.url": "https://8.8.8.8/foo" }' /var/tmp/file.qcow2
qemu-img: /var/tmp/file.qcow2: CURL: Error opening file: SSL: no alternative certificate subject name matches target host name '8.8.8.8'
Could not open backing image to determine size.

Further information about the two settings is available here:

https://curl.haxx.se/libcurl/c/CURLOPT_SSL_VERIFYPEER.html
https://curl.haxx.se/libcurl/c/CURLOPT_SSL_VERIFYHOST.html

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Message-id: 20180914095622.19698-1-rjones@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-09-24 23:46:05 -04:00
Jeff Cody
084d1d13bd block/rbd: Attempt to parse legacy filenames
When we converted rbd to get rid of the older key/value-centric
encoding format, we broke compatibility with image files with backing
file strings encoded in the old format.

This leaves a bit of an ugly conundrum, and a hacky solution.

If the initial attempt to parse the "proper" options fails, it assumes
that we may have an older key/value encoded filename.  Fall back to
attempting to parse the filename, and extract the required options from
it.  If that fails, pass along the original error message.

We do not support mixed modern usage alongside legacy keyvalue pair
usage.

A deprecation warning has been added, although care should be taken
when actually deprecating since the impact is not limited to
commandline or qapi usage, but also opening existing images.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Message-id: 15b332e5432ad069441f7275a46080f465d789a0.1536704901.git.jcody@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-09-24 23:46:05 -04:00
Jeff Cody
f24b03b56c block/rbd: pull out qemu_rbd_convert_options
Code movement to pull the conversion from Qdict to BlockdevOptionsRbd
into a helper function.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Message-id: 5b49a980f2cde6610ab1df41bb0277d00b5db893.1536704901.git.jcody@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-09-24 23:31:09 -04:00
Peter Maydell
d6f71af654 Block patches:
- (Block) job exit refactoring, part 1
   (removing job_defer_to_main_loop())
 - test-bdrv-drain leak fix
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbiVEJAAoJEPQH2wBh1c9AHb4H/0P6yjozmu8J6cQhfXmFsJQk
 72Y6Lu9w7kNL43dEBkAU3vzDPUkzHRwpO5pLPKqQh0ojCz45KfTMozh/iMoJtuKP
 Hev4ZlRlFpcr0NHLQnysxsgV7FYbDEVS9xdQ6KlgFXyDBLgZVGykjq67kwDtXfnp
 eQof9Nf0T+m3bNJey6C43l4YqPzPIUCfoSgCqkoB1W6QGtfglGx8I4evjjgxv7GT
 s8IzBg7WSi7h8+mouZcXOs8/w7nJNeSSbMb921NXCWXCzIVHLpw5SImDiQfxEvcy
 pnBtVttty6pAmQvOC6GphqHPNIeRYLTIxkEzBxZUAePonsEa9zw33pjBGdWtSPU=
 =60m9
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/xanclic/tags/pull-block-2018-08-31-v2' into staging

Block patches:
- (Block) job exit refactoring, part 1
  (removing job_defer_to_main_loop())
- test-bdrv-drain leak fix

# gpg: Signature made Fri 31 Aug 2018 15:30:33 BST
# gpg:                using RSA key F407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* remotes/xanclic/tags/pull-block-2018-08-31-v2:
  jobs: remove job_defer_to_main_loop
  jobs: remove ret argument to job_completed; privatize it
  block/backup: make function variables consistently named
  jobs: utilize job_exit shim
  block/mirror: utilize job_exit shim
  block/commit: utilize job_exit shim
  jobs: add exit shim
  jobs: canonize Error object
  jobs: change start callback to run callback
  tests: fix bdrv-drain leak

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-09-24 14:35:58 +01:00
John Snow
6870277535 block/backup: make function variables consistently named
Rename opaque_job to job to be consistent with other job implementations.
Rename 'job', the BackupBlockJob object, to 's' to also be consistent.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180830015734.19765-8-jsnow@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-08-31 16:28:33 +02:00
John Snow
eb23654dbe jobs: utilize job_exit shim
Utilize the job_exit shim by not calling job_defer_to_main_loop, and
where applicable, converting the deferred callback into the job_exit
callback.

This converts backup, stream, create, and the unit tests all at once.
Most of these jobs do not see any changes to the order in which they
clean up their resources, except the test-blockjob-txn test, which
now puts down its bs before job_completed is called.

This is safe for the same reason the reordering in the mirror job is
safe, because job_completed no longer runs under two locks, making
the unref safe even if it causes a flush.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180830015734.19765-7-jsnow@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-08-31 16:28:33 +02:00
John Snow
7b508f6b7a block/mirror: utilize job_exit shim
Change the manual deferment to mirror_exit into the implicit
callback to job_exit and the mirror_exit callback.

This does change the order of some bdrv_unref calls and job_completed,
but thanks to the new context in which we call .exit, this is safe to
defer the possible flushing of any nodes to the job_finalize_single
cleanup stage.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 20180830015734.19765-6-jsnow@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-08-31 16:28:33 +02:00
John Snow
f369b48dc4 block/commit: utilize job_exit shim
Change the manual deferment to commit_complete into the implicit
callback to job_exit, renaming commit_complete to commit_exit.

This conversion does change the timing of when job_completed is
called to after the bdrv_replace_node and bdrv_unref calls, which
could have implications for bjob->blk which will now be put down
after this cleanup.

Kevin highlights that we did not take any permissions for that backend
at job creation time, so it is safe to reorder these operations.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180830015734.19765-5-jsnow@redhat.com
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-08-31 16:28:33 +02:00
John Snow
3d1f8b07a4 jobs: canonize Error object
Jobs presently use both an Error object in the case of the create job,
and char strings in the case of generic errors elsewhere.

Unify the two paths as just j->err, and remove the extra argument from
job_completed. The integer error code for job_completed is kept for now,
to be removed shortly in a separate patch.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 20180830015734.19765-3-jsnow@redhat.com
[mreitz: Dropped a superfluous g_strdup()]
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-08-31 16:28:33 +02:00
John Snow
f67432a201 jobs: change start callback to run callback
Presently we codify the entry point for a job as the "start" callback,
but a more apt name would be "run" to clarify the idea that when this
function returns we consider the job to have "finished," except for
any cleanup which occurs in separate callbacks later.

As part of this clarification, change the signature to include an error
object and a return code. The error ptr is not yet used, and the return
code while captured, will be overwritten by actions in the job_completed
function.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180830015734.19765-2-jsnow@redhat.com
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-08-31 16:28:33 +02:00
Peter Xu
3ab72385b2 qapi: Drop qapi_event_send_FOO()'s Error ** argument
The generated qapi_event_send_FOO() take an Error ** argument.  They
can't actually fail, because all they do with the argument is passing it
to functions that can't fail: the QObject output visitor, and the
@qmp_emit callback, which is either monitor_qapi_event_queue() or
event_test_emit().

Drop the argument, and pass &error_abort to the QObject output visitor
and @qmp_emit instead.

Suggested-by: Eric Blake <eblake@redhat.com>
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180815133747.25032-4-peterx@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Commit message rewritten, update to qapi-code-gen.txt corrected]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-08-28 18:21:38 +02:00
Peter Maydell
d3bd57d9f6 Block layer patches:
- Remove deprecated -drive options for geometry/serial/addr
 - luks: Allow shared writers if the parents allow them (share-rw=on)
 - qemu-img: Fix error when trying to convert to encrypted target image
 - mirror: Fail gracefully for source == target
 - I/O throttling: Fix behaviour during drain (always ignore the limits)
 - bdrv_reopen() related fixes for bs->options/explicit_options content
 - Documentation improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbdApvAAoJEH8JsnLIjy/WJB8P/3JX843sJCKXG2GijuEvgblB
 QEJrMowaxWea7LSwc4DoU3Bduj/xri05w3dFvkKzrW+dk7wJB5hb8rfTcROVGP9X
 IByZW/Bqom41kvKCshbaGa7WsaOEu24/nW/pzn69iBpEQdZv5xrM1CAVHwlBgKtf
 Rc/angKQRE4Lm12Jb40R/r/Fnr1JTRDzb6rwk4w3zOeeKVPYxWT1F6Jk8XJC+R0W
 n6fGl6FRiv6kue7UkWrjpASOdKhsZSla0M8nyE/ABuHFXFIcwusPuWkm+qXMWlbH
 uXRTLfnzc3brzn4IYR1VVbHCZUBLpyfeuE5S5a8kHFTjLYnNzZH0Crdh7ofpYTnV
 AsyL1xnAI238XhExV37c7vnIn9UsYBRm8KrFmFYuGQ2PYFNTUrVRmKAUhh4m89jw
 vXhmQckhefyyLRwL/4OQqzbBuDhCewXyBFkj9kq9FonFLmWUo5VzpMUbTYw47QAB
 Y/hDYz854UCxoPU/tqevAcpKBgK3rFIXfHcsxqcnsQrqQwgEjhJnNzNwEl+VQWER
 nYxad9L91BDjYZiW0MKjqF9d6jKKFVO9HYFY62pYSM5lwkHCirs2j39kfEaWK46T
 6WZWSoLNB0uUt9WUn1uM3uo6UWlUWEyEoV9XsKQCzx/XRuLoxMhfTUkMAgMcjqaD
 hXcLbldpvtpXj8YAdn8/
 =aYYo
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- Remove deprecated -drive options for geometry/serial/addr
- luks: Allow shared writers if the parents allow them (share-rw=on)
- qemu-img: Fix error when trying to convert to encrypted target image
- mirror: Fail gracefully for source == target
- I/O throttling: Fix behaviour during drain (always ignore the limits)
- bdrv_reopen() related fixes for bs->options/explicit_options content
- Documentation improvements

# gpg: Signature made Wed 15 Aug 2018 12:11:43 BST
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (21 commits)
  qapi: block: Remove mentions of error types which were removed
  block: Simplify append_open_options()
  block: Update bs->options if bdrv_reopen() succeeds
  block: Simplify bdrv_reopen_abort()
  block: Remove children options from bs->{options,explicit_options}
  qdict: Make qdict_extract_subqdict() accept dst = NULL
  block: drop empty .bdrv_close handlers
  block: make .bdrv_close optional
  qemu-img: fix regression copying secrets during convert
  mirror: Fail gracefully for source == target
  qapi/block: Document restrictions for node names
  block: Remove dead deprecation warning code
  block: Remove deprecated -drive option serial
  block: Remove deprecated -drive option addr
  block: Remove deprecated -drive geometry options
  luks: Allow share-rw=on
  throttle-groups: Don't allow timers without throttled requests
  qemu-iotests: Update 093 to improve the draining test
  throttle-groups: Skip the round-robin if a member is being drained
  qemu-iotests: Test removing a throttle group member with a pending timer
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-08-15 22:11:08 +01:00
Vladimir Sementsov-Ogievskiy
f66b1f0e27 block: drop empty .bdrv_close handlers
.bdrv_close handler is optional after previous commit, no needs to keep
empty functions more.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-08-15 12:50:39 +02:00
Vladimir Sementsov-Ogievskiy
3c005293c2 block: make .bdrv_close optional
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-08-15 12:50:39 +02:00
Kevin Wolf
86fae10c64 mirror: Fail gracefully for source == target
blockdev-mirror with the same node for source and target segfaults
today: A node is in its own backing chain, so mirror_start_job() decides
that this is an active commit. When adding the intermediate nodes with
block_job_add_bdrv(), it starts the iteration through the subchain with
the backing file of source, though, so it never reaches target and
instead runs into NULL at the base.

While we could fix that by starting with source itself, there is no
point in allowing mirroring a node into itself and I wouldn't be
surprised if this caused more problems later.

So just check for this scenario and error out.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-08-15 12:50:39 +02:00
Kevin Wolf
572023f7b2 block: Remove deprecated -drive option serial
This reinstates commit b008326744,
which was temporarily reverted for the 3.0 release so that libvirt gets
some extra time to update their command lines.

The -drive option serial was deprecated in QEMU 2.10. It's time to
remove it.

Tests need to be updated to set the serial number with -global instead
of using the -drive option.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2018-08-15 12:50:39 +02:00
Fam Zheng
497da8236a luks: Allow share-rw=on
Format drivers such as qcow2 don't allow sharing the same image between
two QEMU instances in order to prevent image corruptions, because of
metadata cache. LUKS driver don't modify metadata except for when
creating image, so it is safe to relax the permission. This makes
share-rw=on property work on virtual devices.

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-08-15 12:50:39 +02:00
Alberto Garcia
25b8e4db7f throttle-groups: Don't allow timers without throttled requests
Commit 6fccbb475b fixed a bug caused by
QEMU attempting to remove a throttle group member with no pending
requests but an active timer set. This was the result of a previous
bdrv_drained_begin() call processing the throttled requests but
leaving the timer untouched.

Although the commit does solve the problem, the situation shouldn't
happen in the first place. If we try to drain a throttle group member
which has a timer set, we should cancel the timer instead of ignoring
it.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-08-15 12:50:39 +02:00
Alberto Garcia
5d8e4ca035 throttle-groups: Skip the round-robin if a member is being drained
In the throttling code after an I/O request has been completed the
next one is selected from a different member using a round-robin
algorithm. This ensures that all members get a chance to finish their
pending I/O requests.

However, if a group member has its I/O limits disabled (because it's
being drained) then we should always give it priority in order to have
all its pending requests finished as soon as possible.

If we don't do this we could have a member in the process of being
drained waiting for the throttled requests of other members, for which
the I/O limits still apply.

This can have additional consequences: if we're running in qtest mode
(with QEMU_CLOCK_VIRTUAL) then timers can only fire if we advance the
clock manually, so attempting to drain a block device can hang QEMU in
the BDRV_POLL_WHILE() loop at the end of bdrv_do_drained_begin().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-08-15 12:50:39 +02:00
Kevin Wolf
f62492bb8d block/qapi: Fix memory leak in qmp_query_blockstats()
For BlockBackends that are skipped in query-blockstats, we would leak
info since commit 567dcb31. Allocate info only later to avoid the memory
leak.

Fixes: CID 1394727
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2018-08-15 12:50:39 +02:00
Paolo Bonzini
2f0d8947a6 nvme: simplify plug/unplug
bdrv_io_plug/bdrv_io_unplug take care of keeping a nesting count,
so change s->plugged to just a bool.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180813144320.12382-2-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2018-08-15 10:12:35 +08:00