Commit Graph

2638 Commits

Author SHA1 Message Date
Vladimir Sementsov-Ogievskiy
ebf7bba090 qcow2: do not allocate extra memory
There are no needs to allocate more than one cluster, as we set
avail_out for deflate to one cluster.

Zlib docs (http://www.zlib.net/manual.html) says:
"deflate compresses as much data as possible, and stops when the input
buffer becomes empty or the output buffer becomes full."

So, deflate will not write more than avail_out to output buffer. If
there is not enough space in output buffer for compressed data (it may
be larger than input data) deflate just returns Z_OK. (if all data is
compressed and written to output buffer deflate returns Z_STREAM_END).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 1468515565-81313-1-git-send-email-vsementsov@virtuozzo.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-07-26 17:46:37 +02:00
Peter Maydell
206d0c2436 pc, pci, virtio: new features, cleanups, fixes
- interrupt remapping for intel iommus
 - a bunch of virtio cleanups
 - fixes all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXkQsqAAoJECgfDbjSjVRpanoIAJ9JVlc1aEjt9sa0cSBcs+NQ
 J7JmgU9FqFsj+4FrNTouO3AxTjHurd1UAULP1WMPD+V3JpbnHct8r6SCBLQ5EBMN
 VOjYo4DwWs1g+DqnQ9WZmbadu06XvYi/yiAKNUzWfZk0MR11D0D/S5hmarNKw0Kq
 tGHeTWjGeY4WqFLV7m+qB4+cqkAByn6um99UtUvgLL05RgIEIP2IEMKYZ+rXvAa9
 iGUvzqlO7mbq/+LbL18kaWywa4TCwbbd2eSGWaqhX4CuB62Rl33mWTXFcfaYhkyp
 Z3FgwaJ09h0lAjSVEbyAuLFMfO/BnMcsoKqwl4xc4vkn/xBCqFtgH9JcEVm3O8U=
 =ge2D
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc, pci, virtio: new features, cleanups, fixes

- interrupt remapping for intel iommus
- a bunch of virtio cleanups
- fixes all over the place

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Thu 21 Jul 2016 18:49:30 BST
# gpg:                using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (57 commits)
  intel_iommu: avoid unnamed fields
  virtio: Update migration docs
  virtio-gpu: Wrap in vmstate
  virtio-gpu: Use migrate_add_blocker for virgl migration blocking
  virtio-input: Wrap in vmstate
  9pfs: Wrap in vmstate
  virtio-serial: Wrap in vmstate
  virtio-net: Wrap in vmstate
  virtio-balloon: Wrap in vmstate
  virtio-rng: Wrap in vmstate
  virtio-blk: Wrap in vmstate
  virtio-scsi: Wrap in vmstate
  virtio: Migration helper function and macro
  virtio-serial: Remove old migration version support
  virtio-net: Remove old migration version support
  virtio-scsi: Replace HandleOutput typedef
  Revert "mirror: Workaround for unexpected iohandler events during completion"
  virtio-scsi: Call virtio_add_queue_aio
  virtio-blk: Call virtio_add_queue_aio
  virtio: Introduce virtio_add_queue_aio
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-07-21 20:12:37 +01:00
Fam Zheng
d4a92a8420 Revert "mirror: Workaround for unexpected iohandler events during completion"
This reverts commit ab27c3b5e7.

The virtio storage device host notifiers now work with
bdrv_drained_begin/end, so we don't need this hack any more.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-21 20:44:19 +03:00
Peter Maydell
61ead113ae Pull request
v2:
  * Resolved merge conflict with block/iscsi.c [Peter]
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXj6TkAAoJEJykq7OBq3PI15oIAL24OFnMTFOAWyY2h3yfzGwe
 Gn1qzFOTHCXgbgVsolVuZnFU/SDn9WzMS9y6o7jcJykv++FdMSYUO7paPQAg9etX
 o9KOgyfUUDhskbaXEbTKxhqy7UvcyJsZYmjN+TD5tupfnrqGlmu6rkUPpSwGo++G
 u7CkAZewJc5SvsCdQRe3N10LbG4xU/j06cjCIHWDPRr+AV9qvsb9KkIOg3wkrhWN
 R83yj+yX6vhfQouHLoOlh3RYZ5HNo020JPum0jT1tGcoshKoFyu0prSRqNVVMqTY
 BepvWMhXbiq219HnFaQBtnysve9gWix4WhuCHtaAlaZDTBh945ZztsW7w7dubtA=
 =7ics
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

Pull request

v2:
 * Resolved merge conflict with block/iscsi.c [Peter]

# gpg: Signature made Wed 20 Jul 2016 17:20:52 BST
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request: (25 commits)
  raw_bsd: Convert to byte-based interface
  nbd: Convert to byte-based interface
  block: Kill .bdrv_co_discard()
  sheepdog: Switch .bdrv_co_discard() to byte-based
  raw_bsd: Switch .bdrv_co_discard() to byte-based
  qcow2: Switch .bdrv_co_discard() to byte-based
  nbd: Switch .bdrv_co_discard() to byte-based
  iscsi: Switch .bdrv_co_discard() to byte-based
  gluster: Switch .bdrv_co_discard() to byte-based
  blkreplay: Switch .bdrv_co_discard() to byte-based
  block: Add .bdrv_co_pdiscard() driver callback
  block: Convert .bdrv_aio_discard() to byte-based
  rbd: Switch rbd_start_aio() to byte-based
  raw-posix: Switch paio_submit() to byte-based
  block: Convert BB interface to byte-based discards
  block: Convert bdrv_aio_discard() to byte-based
  block: Switch BlockRequest to byte-based
  block: Convert bdrv_discard() to byte-based
  block: Convert bdrv_co_discard() to byte-based
  iscsi: Rely on block layer to break up large requests
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Conflicts:
	block/gluster.c
2016-07-21 11:00:36 +01:00
Eric Blake
decaeed773 raw_bsd: Convert to byte-based interface
Since the raw format driver is just passing things through, we can
do byte-based read and write if the underlying protocol does
likewise.

There's one tricky part - if we probed the image format, we document
that we restrict operations on the initial sector.  It's easiest to
keep this guarantee by enforcing read-modify-write on sub-sector
operations (yes, this partially reverts commit ad82be2f).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1468624988-423-20-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:24:25 +01:00
Eric Blake
70c4fb2648 nbd: Convert to byte-based interface
The NBD protocol doesn't have any notion of sectors, so it is
a fairly easy conversion to use byte-based read and write.

Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468624988-423-19-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:24:25 +01:00
Eric Blake
02aefe43cb block: Kill .bdrv_co_discard()
Now that all drivers have a byte-based .bdrv_co_pdiscard(), we
no longer need to worry about the sector-based version.  We can
also relax our minimum alignment to 1 for drivers that support it.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-18-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:24:25 +01:00
Eric Blake
dde4753763 sheepdog: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-17-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:24:25 +01:00
Eric Blake
5f61ad079a raw_bsd: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-16-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:24:25 +01:00
Eric Blake
82e8a7888b qcow2: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-15-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:24:25 +01:00
Eric Blake
447e57c3b0 nbd: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.

While at it, call directly into nbd-client.c instead of having
a pointless trivial wrapper in nbd.c.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-14-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:24:25 +01:00
Eric Blake
97c7e85cfe iscsi: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.

Unlike write_zeroes, where we can be handed unaligned requests
and must fail gracefully with -ENOTSUP for a fallback, we are
guaranteed that discard requests are always aligned because the
block layer already ignored unaligned head/tail.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-13-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:24:25 +01:00
Eric Blake
1014170b82 gluster: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-12-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake
aba76e2f03 blkreplay: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-11-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake
47a5486d59 block: Add .bdrv_co_pdiscard() driver callback
There's enough drivers with a sector-based callback that it will
be easier to switch one at a time.  This patch adds a byte-based
callback, and then after all drivers are swapped, we'll drop the
sector-based callback.

[checkpatch doesn't like the space after coroutine_fn in
block_int.h, but it's consistent with the rest of the file]

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-10-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake
4da444a0bb block: Convert .bdrv_aio_discard() to byte-based
Another step towards byte-based interfaces everywhere.  Replace
the sector-based driver callback .bdrv_aio_discard() with a new
byte-based .bdrv_aio_pdiscard().  Only raw-posix and RBD drivers
are affected, so it was not worth splitting into multiple patches.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-9-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake
7bbca9e290 rbd: Switch rbd_start_aio() to byte-based
The internal function converts to byte-based before calling into
RBD code; hoist the conversion to the callers so that callers
can then be switched to byte-based themselves.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-8-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake
36e3b2e733 raw-posix: Switch paio_submit() to byte-based
The only remaining uses of paio_submit() were flush (with no
offset or count) and discard (which we are switching to byte-based);
furthermore, the similarly named paio_submit_co() is already
byte-based.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-7-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake
1c6c4bb7f0 block: Convert BB interface to byte-based discards
Change sector-based blk_discard(), blk_co_discard(), and
blk_aio_discard() to instead be byte-based blk_pdiscard(),
blk_co_pdiscard(), and blk_aio_pdiscard().  NBD gets a lot
simpler now that ignoring the unaligned portion of a
byte-based discard request is handled under the hood by
the block layer.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-6-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake
60ebac16bc block: Convert bdrv_aio_discard() to byte-based
Another step towards byte-based interfaces everywhere.  Replace
the sector-based bdrv_aio_discard() with a new byte-based
bdrv_aio_pdiscard(), which silently ignores any unaligned head
or tail.  Driver callbacks will be converted in followup patches.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1468624988-423-5-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake
b15404e027 block: Switch BlockRequest to byte-based
BlockRequest is the internal struct used by bdrv_aio_*.  At the
moment, all such calls were sector-based, but we will eventually
convert to byte-based; start by changing the internal variables
to be byte-based.  No change to behavior, although the read and
write code can now go byte-based through more of the stack.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1468624988-423-4-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake
0c51a893b6 block: Convert bdrv_discard() to byte-based
Another step towards byte-based interfaces everywhere.  Replace
the sector-based bdrv_discard() with a new byte-based
bdrv_pdiscard(), which silently ignores any unaligned head
or tail.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-3-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake
9f1963b3f7 block: Convert bdrv_co_discard() to byte-based
Another step towards byte-based interfaces everywhere.  Replace
the sector-based bdrv_co_discard() with a new byte-based
bdrv_co_pdiscard(), which silently ignores any unaligned head
or tail.  Driver callbacks will be converted in followup patches.

By calculating the alignment outside of the loop, and clamping
the max discard to an aligned value, we can simplify the actions
done within the loop.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-2-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:54 +01:00
Eric Blake
6bd01f14db iscsi: Rely on block layer to break up large requests
Now that the block layer honors max_request, we don't need to
bother with an EINVAL on overlarge requests, but can instead
assert that requests are well-behaved.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468607524-19021-7-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:54 +01:00
Eric Blake
1e2a77a851 nbd: Drop unused offset parameter
Now that NBD relies on the block layer to fragment things, we no
longer need to track an offset argument for which fragment of
a request we are actually servicing.

While at it, use true and false instead of 0 and 1 for a bool
parameter.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468607524-19021-6-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:54 +01:00
Eric Blake
fb1a6de14a nbd: Rely on block layer to break up large requests
Now that the block layer will honor max_transfer, we can simplify
our code to rely on that guarantee.

The readv code can call directly into nbd-client, just as the
writev code has done since commit 52a4650.

Interestingly enough, while qemu-io 'w 0 40m' splits into a 32M
and 8M transaction, 'w -z 0 40m' splits into two 16M and an 8M,
because the block layer caps the bounce buffer for writing zeroes
at 16M.  When we later introduce support for NBD_CMD_WRITE_ZEROES,
we can get a full 32M zero write (or larger, if the client and
server negotiate that write zeroes can use a larger size than
ordinary writes).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468607524-19021-5-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:54 +01:00
Eric Blake
04ed95f484 block: Fragment writes to max transfer length
Drivers should be able to rely on the block layer honoring the
max transfer length, rather than needing to return -EINVAL
(iscsi) or manually fragment things (nbd).  We already fragment
write zeroes at the block layer; this patch adds the fragmentation
for normal writes, after requests have been aligned (fragmenting
before alignment would lead to multiple unaligned requests, rather
than just the head and tail).

When fragmenting a large request where FUA was requested, but
where we know that FUA is implemented by flushing all requests
rather than the given request, then we can still get by with
only one flush.  Note, however, that we need a followup patch
to the raw format driver to avoid a regression in the number of
flushes actually issued.

The return value was previously nebulous on success (sometimes
zero, sometimes the length written); since we never have a short
write, and since fragmenting may store yet another positive
value in 'ret', change the function to always return 0 on success,
matching what we do in bdrv_aligned_preadv().

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1468607524-19021-4-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:54 +01:00
Eric Blake
8a39b4d6e2 raw_bsd: Don't advertise flags not supported by protocol layer
The raw format layer supports all flags via passthrough - but
it only makes sense to pass through flags that the lower layer
actually supports.

The next patch gives stronger reasoning for why this is correct.
At the moment, the raw format layer ignores the max_transfer
limit of its protocol layer, and an attempt to do the qemu-io
'w -f 0 40m' to an NBD server that lacks FUA will pass the entire
40m request to the NBD driver, which then fragments the request
itself into a 32m write, 8m write, and flush.  But once the block
layer starts honoring limits and fragmenting packets, the raw
driver will hand the NBD driver two separate requests; if both
requests have BDRV_REQ_FUA set, then this would result in a 32m
write, flush, 8m write, and second flush.  By having the raw
layer no longer advertise FUA support when the protocol layer
lacks it, we are back to a single flush at the block layer for
the overall 40m request.

Note that 'w -f -z 0 40m' does not currently exhibit the same
problem, because there, the fragmentation does not occur until
at the NBD layer (the raw layer has .bdrv_co_pwrite_zeroes, and
the NBD layer doesn't advertise max_pwrite_zeroes to constrain
things at the raw layer) - but the problem is latent and we
would again have too many flushes without this patch once the
NBD layer implements support for the new NBD_CMD_WRITE_ZEROES
command, if it sets max_pwrite_zeroes to the same 32m limit as
recommended by the NBD protocol.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468607524-19021-3-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:54 +01:00
Eric Blake
1a62d0accd block: Fragment reads to max transfer length
Drivers should be able to rely on the block layer honoring the
max transfer length, rather than needing to return -EINVAL
(iscsi) or manually fragment things (nbd).  This patch adds
the fragmentation in the block layer, after requests have been
aligned (fragmenting before alignment would lead to multiple
unaligned requests, rather than just the head and tail).

The return value was previously nebulous on success on whether
it was zero or the length read; and fragmenting may introduce
yet other non-zero values if we use the last length read.  But
as at least some callers are sloppy and expect only zero on
success, it is easiest to just guarantee 0.

[Fix uninitialized ret local variable in bdrv_aligned_preadv().
--Stefan]

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1468607524-19021-2-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:54 +01:00
Prasanna Kumar Kalever
6c7189bb29 block/gluster: add support for multiple gluster servers
This patch adds a way to specify multiple volfile servers to the gluster
block backend of QEMU with tcp|rdma transport types and their port numbers.

Problem:

Currently VM Image on gluster volume is specified like this:

file=gluster[+tcp]://host[:port]/testvol/a.img

Say we have three hosts in a trusted pool with replica 3 volume in action.
When the host mentioned in the command above goes down for some reason,
the other two hosts are still available. But there's currently no way
to tell QEMU about them.

Solution:

New way of specifying VM Image on gluster volume with volfile servers:
(We still support old syntax to maintain backward compatibility)

Basic command line syntax looks like:

Pattern I:
 -drive driver=gluster,
        volume=testvol,path=/path/a.raw,[debug=N,]
        server.0.type=tcp,
        server.0.host=1.2.3.4,
        server.0.port=24007,
        server.1.type=unix,
        server.1.socket=/path/socketfile

Pattern II:
 'json:{"driver":"qcow2","file":{"driver":"gluster",
       "volume":"testvol","path":"/path/a.qcow2",["debug":N,]
       "server":[{hostinfo_1}, ...{hostinfo_N}]}}'

   driver      => 'gluster' (protocol name)
   volume      => name of gluster volume where our VM image resides
   path        => absolute path of image in gluster volume
  [debug]      => libgfapi loglevel [(0 - 9) default 4 -> Error]

  {hostinfo}   => {{type:"tcp",host:"1.2.3.4"[,port=24007]},
                   {type:"unix",socket:"/path/sockfile"}}

   type        => transport type used to connect to gluster management daemon,
                  it can be tcp|unix
   host        => host address (hostname/ipv4/ipv6 addresses/socket path)
   port        => port number on which glusterd is listening.
   socket      => path to socket file

Examples:
1.
 -drive driver=qcow2,file.driver=gluster,
        file.volume=testvol,file.path=/path/a.qcow2,file.debug=9,
        file.server.0.type=tcp,
        file.server.0.host=1.2.3.4,
        file.server.0.port=24007,
        file.server.1.type=unix,
        file.server.1.socket=/var/run/glusterd.socket
2.
  'json:{"driver":"qcow2","file":{"driver":"gluster","volume":"testvol",
         "path":"/path/a.qcow2","debug":9,"server":
         [{"type":"tcp","host":"1.2.3.4","port":"24007"},
          {"type":"unix","socket":"/var/run/glusterd.socket"}
         ]}}'

This patch gives a mechanism to provide all the server addresses, which are in
replica set, so in case host1 is down VM can still boot from any of the
active hosts.

This is equivalent to the backup-volfile-servers option supported by
mount.glusterfs (FUSE way of mounting gluster volume)

credits: sincere thanks to all the supporters

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1468947453-5433-6-git-send-email-prasanna.kalever@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 17:38:50 -04:00
Prasanna Kumar Kalever
7edac2ddeb block/gluster: using new qapi schema
this patch adds 'GlusterServer' related schema in qapi/block-core.json

[Jeff: minor fix-ups of comments and formatting, per patch reviews]

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1468947453-5433-5-git-send-email-prasanna.kalever@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 17:36:11 -04:00
Prasanna Kumar Kalever
0552ff2465 block/gluster: deprecate rdma support
gluster volfile server fetch happens through unix and/or tcp, it doesn't
support volfile fetch over rdma. The rdma code may actually mislead,
so to make sure things do not break, for now we fallback to tcp when requested
for rdma, with a warning.

If you are wondering how this worked all these days, its the gluster libgfapi
code which handles anything other than unix transport as socket/tcp, sad but
true.

Also gluster doesn't support ipv6 addresses, removing the ipv6 related
comments/docs section

[Jeff: Minor grammatical fixes in comments and commit message, per
review comments]

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1468947453-5433-4-git-send-email-prasanna.kalever@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 17:20:44 -04:00
Prasanna Kumar Kalever
f70c50c817 block/gluster: code cleanup
unified coding styles of multiline function arguments and other error functions
moved random declarations of structures and other list variables

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1468947453-5433-3-git-send-email-prasanna.kalever@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 17:08:12 -04:00
Prasanna Kumar Kalever
d5cf4079ca block/gluster: rename [server, volname, image] -> [host, volume, path]
A future patch will add support for multiple gluster servers. Existing
terminology is a bit unusual in relation to what names are used by
other networked devices, and doesn't map very well to the terminology
we expect to use for multiple servers.  Therefore, rename the following
options:
'server'  -> 'host'
'image'   -> 'path'
'volname' -> 'volume'

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1468947453-5433-2-git-send-email-prasanna.kalever@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 17:08:12 -04:00
Denis V. Lunev
cf56a3c632 mirror: fix request throttling in drive-mirror
There are 2 deficiencies here:
- mirror_iteration could start several requests inside. Thus we could
  simply have more in_flight requests than MAX_IN_FLIGHT.
- keeping this in mind throttling in mirror_run which is checking
  s->in_flight == MAX_IN_FLIGHT is wrong.

The patch adds the check and throttling into mirror_iteration and fixes
the check in mirror_run() to be sure.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1466598927-5990-1-git-send-email-den@openvz.org
CC: Jeff Cody <jcody@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
(cherry picked from commit e648dc95c28fbca12e67be26a1fc4b9a0676c3fe)
2016-07-19 17:03:44 -04:00
Denis V. Lunev
4b5004d9fc mirror: improve performance of mirroring of empty disk
We should not take into account zero blocks for delay calculations.
They are not read and thus IO throttling is not required. In the
other case VM migration with 16 Tb QCOW2 disk with 4 Gb of data takes
days.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-9-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
CC: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 17:02:49 -04:00
Denis V. Lunev
c7c2769c0e mirror: efficiently zero out target
With a bdrv_co_write_zeroes method on a target BDS and when this method
is working as indicated by the bdrv_can_write_zeroes_with_unmap(), zeroes
will not be placed into the wire. Thus the target could be very efficiently
zeroed out. This should be done with the largest chunk possible.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-8-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
CC: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 16:54:46 -04:00
Denis V. Lunev
b7d5062c9c mirror: optimize dirty bitmap filling in mirror_run a bit
There is no need to scan allocation tables if we have mark_all_dirty flag
set. Just mark it all dirty.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-7-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 16:54:46 -04:00
Denis V. Lunev
c0b363ad43 mirror: create mirror_dirty_init helper for mirror_run
The code inside the helper will be extended in the next patch. mirror_run
itself is overbloated at the moment.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-5-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
CC: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 16:54:46 -04:00
Denis V. Lunev
49efb1f5b0 mirror: create mirror_throttle helper
The patch also places last_pause_ns from stack in mirror_run into
MirrorBlockJob structure. This helper will be useful in next patches.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1468503209-19498-4-git-send-email-den@openvz.org
CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Eric Blake <eblake@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 16:54:46 -04:00
Denis V. Lunev
531509ba28 mirror: make sectors_in_flight int64_t
We keep here the sum of int fields. Thus this could easily overflow,
especially when we will start sending big requests in next patches.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-3-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 16:54:46 -04:00
Denis V. Lunev
6d07859926 dirty-bitmap: operate with int64_t amount
Underlying HBitmap operates even with uint64_t. Thus this change is safe.
This would be useful f.e. to mark entire bitmap dirty in one call.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-2-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 16:54:46 -04:00
Peter Maydell
a3b3437721 * two old patches from prospective GSoC students
* i386 -kernel device tree support
 * Coverity fix
 * memory usage improvement from Peter
 * checkpatch fix
 * g_path_get_dirname cleanup
 * caching of block status for iSCSI
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJXjcwdAAoJEL/70l94x66D0s8IAKyEBxtATSXZG90jJR+uoCbv
 oK6ea9RDWUZxa2hKjcXh9Be+g4pTv99BqTKJJt+uwkFHAAJl0gvVty+EHE/2sfyo
 Nlt9FlWibxBdSoHxeCq4jg9APWORxcSx3rspg0I8TdxbweKdm9onvXEjfvmhucqG
 FfPSIHg5vsoutCPEDTXfaJDFiLw+rV7Em53kxD/y4VhHZlAWhahpCHSL/lWGRoLp
 B0mKvoHhqHR/EqJr7y7fuga+Aoimyh8R6dUfpxuXQely3309V7znhq0erPQWvSwX
 JKRITmGbWW3HOjplqBT971eH5iH0bDEryx91Oas9VNpm9eGr6qygePhc1eMszxE=
 =pBxf
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* two old patches from prospective GSoC students
* i386 -kernel device tree support
* Coverity fix
* memory usage improvement from Peter
* checkpatch fix
* g_path_get_dirname cleanup
* caching of block status for iSCSI

# gpg: Signature made Tue 19 Jul 2016 07:43:41 BST
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  target-i386: Remove redundant HF_SOFTMMU_MASK
  block/iscsi: allow caching of the allocation map
  block/iscsi: fix rounding in iscsi_allocationmap_set
  Move README to markdown
  cpu-exec: Move down some declarations in cpu_exec()
  exec: avoid realloc in phys_map_node_reserve
  checkpatch: consider git extended headers valid patches
  megasas: remove useless check for cmd->frame
  compiler: never omit assertions if using a static analysis tool
  hw/i386: add device tree support
  Changed malloc to g_malloc, free to g_free in bsd-user/qemu.h
  use g_path_get_dirname instead of dirname

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-07-19 15:08:05 +01:00
Peter Maydell
ad31cd4c69 -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQIcBAABAgAGBQJXjV3bAAoJEH3vgQaq/DkOArAQAKc03QhwukqT2aZ5zs7kZ1Hv
 PCHrISL0dKGK3YfyiSitppoxr6eBoBR9UIEjaNlW0XwujqwWdWfJ7kIbVSGAyWqR
 YWmV+bA+TUWXz+tFbeDI0maMH9GNVCAuvQiqldmJxhZ303pf5cksZ9CqiALSylTY
 t5XUB6nhV7MPto63C2X2xjLkKxlsT9KOTsYxGVgVXwUzgW1lAuu8Lo6eNULXCgUa
 j+azgSFAiUOKwfKxcKD25kPOxgWlrxkGRc2LdFlopEzShENROhR2r9ut3okgAoM4
 KPVIE3jSsLMhNr9bRQRUJw53vRSL/bxFvlCdzKiBFSo7wNMKWNA9RWF6+If1Jvoi
 Am+BzINCfNfoFmqlXppqWGlapk9ZtmDGbPwaUyT6NJ9axAASTQxcj8QOjGEX07UE
 ubvzIXx7D1Amo59/4RWRXVDpMb9+p3npqkuCL+DWZzq7EVB42ig8+fKhijhS4jUK
 2DA7uL4orUjUIoTbJZsKciw7MfaWuP2/SnP1VRNRXSsiNg5N4qJDUB6Wo5AKksQB
 LWP4Ou4irPj/ZGvMhJBMMOQ5kl2maqj8beP9pVjltVjGVzAihAyuXHUqjPdqe1R9
 PLQidHKCb1OaJWi3c53vayuzlINH9iY+adjgSuYxi1QOZ+uqEsjSs+3+m1gyTOeh
 sd8VU/TtJbiGBCh9VTTf
 =9N8+
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into staging

# gpg: Signature made Mon 18 Jul 2016 23:53:15 BST
# gpg:                using RSA key 0x7DEF8106AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>"
# Primary key fingerprint: FAEB 9711 A12C F475 812F  18F2 88A9 064D 1835 61EB
#      Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76  CBD0 7DEF 8106 AAFC 390E

* remotes/jnsnow/tags/ide-pull-request:
  block: ignore flush requests when storage is clean
  tests: in IDE and AHCI tests perform DMA write before flushing
  ide: set retry_unit for PIO and FLUSH requests
  ide: refactor retry_unit set and clear into separate function

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-07-19 11:47:07 +01:00
Peter Lieven
e1123a3b40 block/iscsi: allow caching of the allocation map
until now the allocation map was used only as a hint if a cluster
is allocated or not. If a block was not allocated (or Qemu had
no info about the allocation status) a get_block_status call was
issued to check the allocation status and possibly avoid
a subsequent read of unallocated sectors. If a block known to be
allocated the get_block_status call was omitted. In the other case
a get_block_status call was issued before every read to avoid
the necessity for a consistent allocation map. To avoid the
potential overhead of calling get_block_status for each and
every read request this took only place for the bigger requests.

This patch enhances this mechanism to cache the allocation
status and avoid calling get_block_status for blocks where
the allocation status has been queried before. This allows
for bypassing the read request even for smaller requests and
additionally omits calling get_block_status for known to be
unallocated blocks.

Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1468831940-15556-3-git-send-email-pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-19 08:34:53 +02:00
Peter Lieven
eb36b953e0 block/iscsi: fix rounding in iscsi_allocationmap_set
when setting clusters as alloacted the boundaries have
to be expanded. As Paolo pointed out the calculation of
the number of clusters is wrong:

Suppose cluster_sectors is 2, sector_num = 1, nb_sectors = 6:

In the "mark allocated" case, you want to set 0..8, i.e.
cluster_num=0, nb_clusters=4.

   0--.--2--.--4--.--6--.--8
   <--|_________________|-->  (<--> = expanded)

Instead you are setting nb_clusters=3, so that 6..8 is not marked.

   0--.--2--.--4--.--6--.--8
   <--|______________|!!!     (! = wrong)

Cc: qemu-stable@nongnu.org
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1468831940-15556-2-git-send-email-pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-19 08:34:53 +02:00
Evgeny Yakovlev
3ff2f67a7c block: ignore flush requests when storage is clean
Some guests (win2008 server for example) do a lot of unnecessary
flushing when underlying media has not changed. This adds additional
overhead on host when calling fsync/fdatasync.

This change introduces a write generation scheme in BlockDriverState.
Current write generation is checked against last flushed generation to
avoid unnessesary flushes.

The problem with excessive flushing was found by a performance test
which does parallel directory tree creation (from 2 processes).
Results improved from 0.424 loops/sec to 0.432 loops/sec.
Each loop creates 10^3 directories with 10 files in each.

This affected some blkdebug testcases that were expecting error logs from
failure-injected flushes which are now skipped entirely
(tests 026 071 089).

This also affects the performance of block jobs and thus BLOCK_JOB_READY
events for driver-mirror and active block-commit commands now arrives
faster, before QMP send successfully returns to caller (tests 141 144).

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-5-git-send-email-den@openvz.org
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
2016-07-18 18:19:01 -04:00
Roman Pen
5e1b34a3fa linux-aio: prevent submitting more than MAX_EVENTS
Invoking io_setup(MAX_EVENTS) we ask kernel to create ring buffer for us
with specified number of events.  But kernel ring buffer allocation logic
is a bit tricky (ring buffer is page size aligned + some percpu allocation
are required) so eventually more than requested events number is allocated.

From a userspace side we have to follow the convention and should not try
to io_submit() more or logic, which consumes completed events, should be
changed accordingly.  The pitfall is in the following sequence:

    MAX_EVENTS = 128
    io_setup(MAX_EVENTS)

    io_submit(MAX_EVENTS)
    io_submit(MAX_EVENTS)

    /* now 256 events are in-flight */

    io_getevents(MAX_EVENTS) = 128

    /* we can handle only 128 events at once, to be sure
     * that nothing is pended the io_getevents(MAX_EVENTS)
     * call must be invoked once more or hang will happen. */

To prevent the hang or reiteration of io_getevents() call this patch
restricts the number of in-flights, which is now limited to MAX_EVENTS.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468415004-31755-1-git-send-email-roman.penyaev@profitbricks.com
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18 15:10:52 +01:00
Paolo Bonzini
0187f5c9cb linux-aio: share one LinuxAioState within an AioContext
This has better performance because it executes fewer system calls
and does not use a bottom half per disk.

Originally proposed by Ming Lei.

[Changed #include "raw-aio.h" to "block/raw-aio.h" in win32-aio.c to fix
build error as reported by Peter Maydell <peter.maydell@linaro.org>.
--Stefan]

Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1467650000-51385-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

squash! linux-aio: share one LinuxAioState within an AioContext
2016-07-18 15:09:31 +01:00
Max Reitz
c4b48bfdc5 vvfat: Fix qcow write target driver specification
First, bdrv_open_child() expects all options for the child to be
prefixed by the child's name (and a separating dot). Second,
bdrv_open_child() does not take ownership of the QDict passed to it but
only extracts all options for the child, so if a QDict is created for
the sole purpose of passing it to bdrv_open_child(), it needs to be
freed afterwards.

This patch makes vvfat adhere to both of these rules.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160711135452.11304-1-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-07-13 13:41:39 +02:00