mirrors/qemu

mirror of https://github.com/qemu/qemu.git synced 2024-12-12 21:23:36 +08:00

Author	SHA1	Message	Date
Eric Blake	8a39b4d6e2	raw_bsd: Don't advertise flags not supported by protocol layer The raw format layer supports all flags via passthrough - but it only makes sense to pass through flags that the lower layer actually supports. The next patch gives stronger reasoning for why this is correct. At the moment, the raw format layer ignores the max_transfer limit of its protocol layer, and an attempt to do the qemu-io 'w -f 0 40m' to an NBD server that lacks FUA will pass the entire 40m request to the NBD driver, which then fragments the request itself into a 32m write, 8m write, and flush. But once the block layer starts honoring limits and fragmenting packets, the raw driver will hand the NBD driver two separate requests; if both requests have BDRV_REQ_FUA set, then this would result in a 32m write, flush, 8m write, and second flush. By having the raw layer no longer advertise FUA support when the protocol layer lacks it, we are back to a single flush at the block layer for the overall 40m request. Note that 'w -f -z 0 40m' does not currently exhibit the same problem, because there, the fragmentation does not occur until at the NBD layer (the raw layer has .bdrv_co_pwrite_zeroes, and the NBD layer doesn't advertise max_pwrite_zeroes to constrain things at the raw layer) - but the problem is latent and we would again have too many flushes without this patch once the NBD layer implements support for the new NBD_CMD_WRITE_ZEROES command, if it sets max_pwrite_zeroes to the same 32m limit as recommended by the NBD protocol. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1468607524-19021-3-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-20 14:11:54 +01:00
Eric Blake	1a62d0accd	block: Fragment reads to max transfer length Drivers should be able to rely on the block layer honoring the max transfer length, rather than needing to return -EINVAL (iscsi) or manually fragment things (nbd). This patch adds the fragmentation in the block layer, after requests have been aligned (fragmenting before alignment would lead to multiple unaligned requests, rather than just the head and tail). The return value was previously nebulous on success on whether it was zero or the length read; and fragmenting may introduce yet other non-zero values if we use the last length read. But as at least some callers are sloppy and expect only zero on success, it is easiest to just guarantee 0. [Fix uninitialized ret local variable in bdrv_aligned_preadv(). --Stefan] Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 1468607524-19021-2-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-20 14:11:54 +01:00
Prasanna Kumar Kalever	6c7189bb29	block/gluster: add support for multiple gluster servers This patch adds a way to specify multiple volfile servers to the gluster block backend of QEMU with tcp\|rdma transport types and their port numbers. Problem: Currently VM Image on gluster volume is specified like this: file=gluster[+tcp]://host[:port]/testvol/a.img Say we have three hosts in a trusted pool with replica 3 volume in action. When the host mentioned in the command above goes down for some reason, the other two hosts are still available. But there's currently no way to tell QEMU about them. Solution: New way of specifying VM Image on gluster volume with volfile servers: (We still support old syntax to maintain backward compatibility) Basic command line syntax looks like: Pattern I: -drive driver=gluster, volume=testvol,path=/path/a.raw,[debug=N,] server.0.type=tcp, server.0.host=1.2.3.4, server.0.port=24007, server.1.type=unix, server.1.socket=/path/socketfile Pattern II: 'json:{"driver":"qcow2","file":{"driver":"gluster", "volume":"testvol","path":"/path/a.qcow2",["debug":N,] "server":[{hostinfo_1}, ...{hostinfo_N}]}}' driver => 'gluster' (protocol name) volume => name of gluster volume where our VM image resides path => absolute path of image in gluster volume [debug] => libgfapi loglevel [(0 - 9) default 4 -> Error] {hostinfo} => {{type:"tcp",host:"1.2.3.4"[,port=24007]}, {type:"unix",socket:"/path/sockfile"}} type => transport type used to connect to gluster management daemon, it can be tcp\|unix host => host address (hostname/ipv4/ipv6 addresses/socket path) port => port number on which glusterd is listening. socket => path to socket file Examples: 1. -drive driver=qcow2,file.driver=gluster, file.volume=testvol,file.path=/path/a.qcow2,file.debug=9, file.server.0.type=tcp, file.server.0.host=1.2.3.4, file.server.0.port=24007, file.server.1.type=unix, file.server.1.socket=/var/run/glusterd.socket 2. 'json:{"driver":"qcow2","file":{"driver":"gluster","volume":"testvol", "path":"/path/a.qcow2","debug":9,"server": [{"type":"tcp","host":"1.2.3.4","port":"24007"}, {"type":"unix","socket":"/var/run/glusterd.socket"} ]}}' This patch gives a mechanism to provide all the server addresses, which are in replica set, so in case host1 is down VM can still boot from any of the active hosts. This is equivalent to the backup-volfile-servers option supported by mount.glusterfs (FUSE way of mounting gluster volume) credits: sincere thanks to all the supporters Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-id: 1468947453-5433-6-git-send-email-prasanna.kalever@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-19 17:38:50 -04:00
Prasanna Kumar Kalever	7edac2ddeb	block/gluster: using new qapi schema this patch adds 'GlusterServer' related schema in qapi/block-core.json [Jeff: minor fix-ups of comments and formatting, per patch reviews] Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1468947453-5433-5-git-send-email-prasanna.kalever@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-19 17:36:11 -04:00
Prasanna Kumar Kalever	0552ff2465	block/gluster: deprecate rdma support gluster volfile server fetch happens through unix and/or tcp, it doesn't support volfile fetch over rdma. The rdma code may actually mislead, so to make sure things do not break, for now we fallback to tcp when requested for rdma, with a warning. If you are wondering how this worked all these days, its the gluster libgfapi code which handles anything other than unix transport as socket/tcp, sad but true. Also gluster doesn't support ipv6 addresses, removing the ipv6 related comments/docs section [Jeff: Minor grammatical fixes in comments and commit message, per review comments] Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1468947453-5433-4-git-send-email-prasanna.kalever@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-19 17:20:44 -04:00
Prasanna Kumar Kalever	f70c50c817	block/gluster: code cleanup unified coding styles of multiline function arguments and other error functions moved random declarations of structures and other list variables Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1468947453-5433-3-git-send-email-prasanna.kalever@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-19 17:08:12 -04:00
Prasanna Kumar Kalever	d5cf4079ca	block/gluster: rename [server, volname, image] -> [host, volume, path] A future patch will add support for multiple gluster servers. Existing terminology is a bit unusual in relation to what names are used by other networked devices, and doesn't map very well to the terminology we expect to use for multiple servers. Therefore, rename the following options: 'server' -> 'host' 'image' -> 'path' 'volname' -> 'volume' Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1468947453-5433-2-git-send-email-prasanna.kalever@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-19 17:08:12 -04:00
Denis V. Lunev	cf56a3c632	mirror: fix request throttling in drive-mirror There are 2 deficiencies here: - mirror_iteration could start several requests inside. Thus we could simply have more in_flight requests than MAX_IN_FLIGHT. - keeping this in mind throttling in mirror_run which is checking s->in_flight == MAX_IN_FLIGHT is wrong. The patch adds the check and throttling into mirror_iteration and fixes the check in mirror_run() to be sure. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1466598927-5990-1-git-send-email-den@openvz.org CC: Jeff Cody <jcody@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> (cherry picked from commit e648dc95c28fbca12e67be26a1fc4b9a0676c3fe)	2016-07-19 17:03:44 -04:00
Denis V. Lunev	4b5004d9fc	mirror: improve performance of mirroring of empty disk We should not take into account zero blocks for delay calculations. They are not read and thus IO throttling is not required. In the other case VM migration with 16 Tb QCOW2 disk with 4 Gb of data takes days. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1468503209-19498-9-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Jeff Cody <jcody@redhat.com> CC: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-19 17:02:49 -04:00
Denis V. Lunev	c7c2769c0e	mirror: efficiently zero out target With a bdrv_co_write_zeroes method on a target BDS and when this method is working as indicated by the bdrv_can_write_zeroes_with_unmap(), zeroes will not be placed into the wire. Thus the target could be very efficiently zeroed out. This should be done with the largest chunk possible. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1468503209-19498-8-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Jeff Cody <jcody@redhat.com> CC: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-19 16:54:46 -04:00
Denis V. Lunev	b7d5062c9c	mirror: optimize dirty bitmap filling in mirror_run a bit There is no need to scan allocation tables if we have mark_all_dirty flag set. Just mark it all dirty. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1468503209-19498-7-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-19 16:54:46 -04:00
Denis V. Lunev	c0b363ad43	mirror: create mirror_dirty_init helper for mirror_run The code inside the helper will be extended in the next patch. mirror_run itself is overbloated at the moment. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1468503209-19498-5-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Jeff Cody <jcody@redhat.com> CC: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-19 16:54:46 -04:00
Denis V. Lunev	49efb1f5b0	mirror: create mirror_throttle helper The patch also places last_pause_ns from stack in mirror_run into MirrorBlockJob structure. This helper will be useful in next patches. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1468503209-19498-4-git-send-email-den@openvz.org CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> CC: Eric Blake <eblake@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-19 16:54:46 -04:00
Denis V. Lunev	531509ba28	mirror: make sectors_in_flight int64_t We keep here the sum of int fields. Thus this could easily overflow, especially when we will start sending big requests in next patches. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1468503209-19498-3-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-19 16:54:46 -04:00
Denis V. Lunev	6d07859926	dirty-bitmap: operate with int64_t amount Underlying HBitmap operates even with uint64_t. Thus this change is safe. This would be useful f.e. to mark entire bitmap dirty in one call. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1468503209-19498-2-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-19 16:54:46 -04:00
Peter Maydell	a3b3437721	* two old patches from prospective GSoC students * i386 -kernel device tree support * Coverity fix * memory usage improvement from Peter * checkpatch fix * g_path_get_dirname cleanup * caching of block status for iSCSI -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJXjcwdAAoJEL/70l94x66D0s8IAKyEBxtATSXZG90jJR+uoCbv oK6ea9RDWUZxa2hKjcXh9Be+g4pTv99BqTKJJt+uwkFHAAJl0gvVty+EHE/2sfyo Nlt9FlWibxBdSoHxeCq4jg9APWORxcSx3rspg0I8TdxbweKdm9onvXEjfvmhucqG FfPSIHg5vsoutCPEDTXfaJDFiLw+rV7Em53kxD/y4VhHZlAWhahpCHSL/lWGRoLp B0mKvoHhqHR/EqJr7y7fuga+Aoimyh8R6dUfpxuXQely3309V7znhq0erPQWvSwX JKRITmGbWW3HOjplqBT971eH5iH0bDEryx91Oas9VNpm9eGr6qygePhc1eMszxE= =pBxf -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * two old patches from prospective GSoC students * i386 -kernel device tree support * Coverity fix * memory usage improvement from Peter * checkpatch fix * g_path_get_dirname cleanup * caching of block status for iSCSI # gpg: Signature made Tue 19 Jul 2016 07:43:41 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: target-i386: Remove redundant HF_SOFTMMU_MASK block/iscsi: allow caching of the allocation map block/iscsi: fix rounding in iscsi_allocationmap_set Move README to markdown cpu-exec: Move down some declarations in cpu_exec() exec: avoid realloc in phys_map_node_reserve checkpatch: consider git extended headers valid patches megasas: remove useless check for cmd->frame compiler: never omit assertions if using a static analysis tool hw/i386: add device tree support Changed malloc to g_malloc, free to g_free in bsd-user/qemu.h use g_path_get_dirname instead of dirname Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-07-19 15:08:05 +01:00
Peter Maydell	ad31cd4c69	-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXjV3bAAoJEH3vgQaq/DkOArAQAKc03QhwukqT2aZ5zs7kZ1Hv PCHrISL0dKGK3YfyiSitppoxr6eBoBR9UIEjaNlW0XwujqwWdWfJ7kIbVSGAyWqR YWmV+bA+TUWXz+tFbeDI0maMH9GNVCAuvQiqldmJxhZ303pf5cksZ9CqiALSylTY t5XUB6nhV7MPto63C2X2xjLkKxlsT9KOTsYxGVgVXwUzgW1lAuu8Lo6eNULXCgUa j+azgSFAiUOKwfKxcKD25kPOxgWlrxkGRc2LdFlopEzShENROhR2r9ut3okgAoM4 KPVIE3jSsLMhNr9bRQRUJw53vRSL/bxFvlCdzKiBFSo7wNMKWNA9RWF6+If1Jvoi Am+BzINCfNfoFmqlXppqWGlapk9ZtmDGbPwaUyT6NJ9axAASTQxcj8QOjGEX07UE ubvzIXx7D1Amo59/4RWRXVDpMb9+p3npqkuCL+DWZzq7EVB42ig8+fKhijhS4jUK 2DA7uL4orUjUIoTbJZsKciw7MfaWuP2/SnP1VRNRXSsiNg5N4qJDUB6Wo5AKksQB LWP4Ou4irPj/ZGvMhJBMMOQ5kl2maqj8beP9pVjltVjGVzAihAyuXHUqjPdqe1R9 PLQidHKCb1OaJWi3c53vayuzlINH9iY+adjgSuYxi1QOZ+uqEsjSs+3+m1gyTOeh sd8VU/TtJbiGBCh9VTTf =9N8+ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into staging # gpg: Signature made Mon 18 Jul 2016 23:53:15 BST # gpg: using RSA key 0x7DEF8106AAFC390E # gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" # Primary key fingerprint: FAEB 9711 A12C F475 812F 18F2 88A9 064D 1835 61EB # Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76 CBD0 7DEF 8106 AAFC 390E * remotes/jnsnow/tags/ide-pull-request: block: ignore flush requests when storage is clean tests: in IDE and AHCI tests perform DMA write before flushing ide: set retry_unit for PIO and FLUSH requests ide: refactor retry_unit set and clear into separate function Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-07-19 11:47:07 +01:00
Peter Lieven	e1123a3b40	block/iscsi: allow caching of the allocation map until now the allocation map was used only as a hint if a cluster is allocated or not. If a block was not allocated (or Qemu had no info about the allocation status) a get_block_status call was issued to check the allocation status and possibly avoid a subsequent read of unallocated sectors. If a block known to be allocated the get_block_status call was omitted. In the other case a get_block_status call was issued before every read to avoid the necessity for a consistent allocation map. To avoid the potential overhead of calling get_block_status for each and every read request this took only place for the bigger requests. This patch enhances this mechanism to cache the allocation status and avoid calling get_block_status for blocks where the allocation status has been queried before. This allows for bypassing the read request even for smaller requests and additionally omits calling get_block_status for known to be unallocated blocks. Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <1468831940-15556-3-git-send-email-pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-07-19 08:34:53 +02:00
Peter Lieven	eb36b953e0	block/iscsi: fix rounding in iscsi_allocationmap_set when setting clusters as alloacted the boundaries have to be expanded. As Paolo pointed out the calculation of the number of clusters is wrong: Suppose cluster_sectors is 2, sector_num = 1, nb_sectors = 6: In the "mark allocated" case, you want to set 0..8, i.e. cluster_num=0, nb_clusters=4. 0--.--2--.--4--.--6--.--8 <--\|_________________\|--> (<--> = expanded) Instead you are setting nb_clusters=3, so that 6..8 is not marked. 0--.--2--.--4--.--6--.--8 <--\|______________\|!!! (! = wrong) Cc: qemu-stable@nongnu.org Reported-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <1468831940-15556-2-git-send-email-pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-07-19 08:34:53 +02:00
Evgeny Yakovlev	3ff2f67a7c	block: ignore flush requests when storage is clean Some guests (win2008 server for example) do a lot of unnecessary flushing when underlying media has not changed. This adds additional overhead on host when calling fsync/fdatasync. This change introduces a write generation scheme in BlockDriverState. Current write generation is checked against last flushed generation to avoid unnessesary flushes. The problem with excessive flushing was found by a performance test which does parallel directory tree creation (from 2 processes). Results improved from 0.424 loops/sec to 0.432 loops/sec. Each loop creates 10^3 directories with 10 files in each. This affected some blkdebug testcases that were expecting error logs from failure-injected flushes which are now skipped entirely (tests 026 071 089). This also affects the performance of block jobs and thus BLOCK_JOB_READY events for driver-mirror and active block-commit commands now arrives faster, before QMP send successfully returns to caller (tests 141 144). Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1468870792-7411-5-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: John Snow <jsnow@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com>	2016-07-18 18:19:01 -04:00
Roman Pen	5e1b34a3fa	linux-aio: prevent submitting more than MAX_EVENTS Invoking io_setup(MAX_EVENTS) we ask kernel to create ring buffer for us with specified number of events. But kernel ring buffer allocation logic is a bit tricky (ring buffer is page size aligned + some percpu allocation are required) so eventually more than requested events number is allocated. From a userspace side we have to follow the convention and should not try to io_submit() more or logic, which consumes completed events, should be changed accordingly. The pitfall is in the following sequence: MAX_EVENTS = 128 io_setup(MAX_EVENTS) io_submit(MAX_EVENTS) io_submit(MAX_EVENTS) /* now 256 events are in-flight / io_getevents(MAX_EVENTS) = 128 / we can handle only 128 events at once, to be sure * that nothing is pended the io_getevents(MAX_EVENTS) * call must be invoked once more or hang will happen. */ To prevent the hang or reiteration of io_getevents() call this patch restricts the number of in-flights, which is now limited to MAX_EVENTS. Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1468415004-31755-1-git-send-email-roman.penyaev@profitbricks.com Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: qemu-devel@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-18 15:10:52 +01:00
Paolo Bonzini	0187f5c9cb	linux-aio: share one LinuxAioState within an AioContext This has better performance because it executes fewer system calls and does not use a bottom half per disk. Originally proposed by Ming Lei. [Changed #include "raw-aio.h" to "block/raw-aio.h" in win32-aio.c to fix build error as reported by Peter Maydell <peter.maydell@linaro.org>. --Stefan] Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1467650000-51385-1-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> squash! linux-aio: share one LinuxAioState within an AioContext	2016-07-18 15:09:31 +01:00
Max Reitz	c4b48bfdc5	vvfat: Fix qcow write target driver specification First, bdrv_open_child() expects all options for the child to be prefixed by the child's name (and a separating dot). Second, bdrv_open_child() does not take ownership of the QDict passed to it but only extracts all options for the child, so if a QDict is created for the sole purpose of passing it to bdrv_open_child(), it needs to be freed afterwards. This patch makes vvfat adhere to both of these rules. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20160711135452.11304-1-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-07-13 13:41:39 +02:00
Reda Sallahi	524089bce4	vmdk: fix metadata write regression Commit "cdeaf1f vmdk: add bdrv_co_write_zeroes" causes a regression on writes. It writes metadata after every write instead of doing it only once for each cluster. vmdk_pwritev() writes metadata whenever m_data is set as valid so this patch sets m_data as valid only when we have a new cluster which hasn't been allocated before or a zero grain. Signed-off-by: Reda Sallahi <fullmanet@gmail.com> Message-id: 20160707084249.29084-1-fullmanet@gmail.com Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-07-13 13:41:39 +02:00
Sascha Silbe	f14a39ccb9	Improve block job rate limiting for small bandwidth values ratelimit_calculate_delay() previously reset the accounting every time slice, no matter how much data had been processed before. This had (at least) two consequences: 1. The minimum speed is rather large, e.g. 5 MiB/s for commit and stream. Not sure if there are real-world use cases where this would be a problem. Mirroring and backup over a slow link (e.g. DSL) would come to mind, though. 2. Tests for block job operations (e.g. cancel) were rather racy All block jobs currently use a time slice of 100ms. That's a reasonable value to get smooth output during regular operation. However this also meant that the state of block jobs changed every 100ms, no matter how low the configured limit was. On busy hosts, qemu often transferred additional chunks until the test case had a chance to cancel the job. Fix the block job rate limit code to delay for more than one time slice to address the above issues. To make it easier to handle oversized chunks we switch the semantics from returning a delay _before_ the current request to a delay _after_ the current request. If necessary, this delay consists of multiple time slice units. Since the mirror job sends multiple chunks in one go even if the rate limit was exceeded in between, we need to keep track of the start of the current time slice so we can correctly re-compute the delay for the updated amount of data. The minimum bandwidth now is 1 data unit per time slice. The block jobs are currently passing the amount of data transferred in sectors and using 100ms time slices, so this translates to 5120 bytes/second. With chunk sizes usually being O(512KiB), tests have plenty of time (O(100s)) to operate on block jobs. The chance of a race condition now is fairly remote, except possibly on insanely loaded systems. Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com> Message-id: 1467127721-9564-2-git-send-email-silbe@linux.vnet.ibm.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-07-13 13:41:38 +02:00
Max Reitz	c834cba905	qcow2: Fix qcow2_get_cluster_offset() Recently, qcow2_get_cluster_offset() has been changed to work with bytes instead of sectors. This invalidated some assertions and introduced a possible integer multiplication overflow. This could be reproduced using e.g. $ qemu-img create -f qcow2 -o cluster_size=1M blub.qcow2 8G Formatting 'foo.qcow2', fmt=qcow2 size=8589934592 encryption=off cluster_size=1048576 lazy_refcounts=off refcount_bits=16 $ qemu-io -c map blub.qcow2 qemu-io: qemu/block/qcow2-cluster.c:504: qcow2_get_cluster_offset: Assertion `bytes_needed <= INT_MAX' failed. [1] 20775 abort (core dumped) qemu-io -c map foo.qcow2 This patch removes the now wrong assertion, adding comments and more assertions to prove its correctness (and fixing the overflow which would become apparent with the original assertion removed). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20160620142623.24471-3-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-07-13 13:41:38 +02:00
Max Reitz	84c26520d3	qcow2: Avoid making the L1 table too big We refuse to open images whose L1 table we deem "too big". Consequently, we should not produce such images ourselves. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20160615153630.2116-3-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> [mreitz: Added QEMU_BUILD_BUG_ON()] Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-07-13 13:41:38 +02:00
Kevin Wolf	8c39825218	block/qdev: Allow configuring rerror/werror with qdev properties The rerror/werror policies are implemented in the devices, so that's where they should be configured. In comparison to the old options in -drive, the qdev properties are only added to those devices that actually support them. If the option isn't given (or "auto" is specified), the setting of the BlockBackend is used for compatibility with the old options. For block jobs, "auto" is the same as "enospc". Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-07-13 13:32:27 +02:00
Kevin Wolf	1e8fb7f1ee	commit: Fix use of error handling policy Commit implemented the 'enospc' policy as 'ignore' if the error was not ENOSPC. The QAPI documentation promises that it's treated as 'stop'. Using the common block job error handling function fixes this and also adds the missing QMP event. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-07-13 13:32:27 +02:00
Paolo Bonzini	0b8b8753e4	coroutine: move entry argument to qemu_coroutine_create In practice the entry argument is always known at creation time, and it is confusing that sometimes qemu_coroutine_enter is used with a non-NULL argument to re-enter a coroutine (this happens in block/sheepdog.c and tests/test-coroutine.c). So pass the opaque value at creation time, for consistency with e.g. aio_bh_new. Mostly done with the following semantic patch: @ entry1 @ expression entry, arg, co; @@ - co = qemu_coroutine_create(entry); + co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry2 @ expression entry, arg; identifier co; @@ - Coroutine co = qemu_coroutine_create(entry); + Coroutine co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry3 @ expression entry, arg; @@ - qemu_coroutine_enter(qemu_coroutine_create(entry), arg); + qemu_coroutine_enter(qemu_coroutine_create(entry, arg)); @ reentry @ expression co; @@ - qemu_coroutine_enter(co, NULL); + qemu_coroutine_enter(co); except for the aforementioned few places where the semantic patch stumbled (as expected) and for test_co_queue, which would otherwise produce an uninitialized variable warning. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Fam Zheng	5af7045bd0	raw-posix: Use qemu_dup Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Alberto Garcia	fd62c609ed	commit: Add 'job-id' parameter to 'block-commit' This patch adds a new optional 'job-id' parameter to 'block-commit', allowing the user to specify the ID of the block job to be created. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Alberto Garcia	2323322ed0	stream: Add 'job-id' parameter to 'block-stream' This patch adds a new optional 'job-id' parameter to 'block-stream', allowing the user to specify the ID of the block job to be created. The HMP 'block_stream' command remains unchanged. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Alberto Garcia	70559d499c	backup: Add 'job-id' parameter to 'blockdev-backup' and 'drive-backup' This patch adds a new optional 'job-id' parameter to 'blockdev-backup' and 'drive-backup', allowing the user to specify the ID of the block job to be created. The HMP 'drive_backup' command remains unchanged. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Alberto Garcia	71aa98678c	mirror: Add 'job-id' parameter to 'blockdev-mirror' and 'drive-mirror' This patch adds a new optional 'job-id' parameter to 'blockdev-mirror' and 'drive-mirror', allowing the user to specify the ID of the block job to be created. The HMP 'drive_mirror' command remains unchanged. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Alberto Garcia	7f0317cfc8	blockjob: Add 'job_id' parameter to block_job_create() When a new job is created, the job ID is taken from the device name of the BDS. This patch adds a new 'job_id' parameter to let the caller provide one instead. This patch also verifies that the ID is always unique and well-formed. This causes problems in a couple of places where no ID is being set, because the BDS does not have a device name. In the case of test_block_job_start() (from test-blockjob-txn.c) we can simply use this new 'job_id' parameter to set the missing ID. In the case of img_commit() (from qemu-img.c) we still don't have the API to make commit_active_start() set the job ID, so we solve it by setting a default value. We'll get rid of this as soon as we extend the API. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Alberto Garcia	9df229c3ca	blockjob: Update description of the 'id' field The 'id' field of the BlockJob structure will be able to hold any ID, not only a device name. This patch updates the description of that field and the error messages where it is being used. Soon we'll add the ability to set an arbitrary ID when creating a block job. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Markus Armbruster	a9c94277f0	Use #include "..." for our own headers, <...> for others Tracked down with an ugly, brittle and probably buggy Perl script. Also move includes converted to <...> up so they get included before ours where that's obviously okay. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>	2016-07-12 16:19:16 +02:00
Peter Maydell	975b1c3ac6	QAPI patches for 2016-07-06 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXfMjDAAoJEDhwtADrkYZTyKwP/i5Lva3qhDk9AJhvPNSbnps/ sxVGVkUTeUM0UoXdKmDChkk2zScv2AOVxL3oibb22PmU1MvDU+XoKYokae/VucVS WL16t/Z77QxSJ1yKhXT3i0P7+9sR9Gq2eDHchSSITLreVO8/jNb7u1JjnV4nPyQP Scg6sCCnfgSw8W4EOz0wDhwXHbRv7obSY0lGUD1R6FifETHQ98rTpII0BS1AyvbF HIJMcGpirAe62BGNQeJv933mrD1EgRfBytQAkEL/l6DDlz/xk9AE26bsjdP5Xa9A MwEBje9CPm0ICleqq1ZVKNQQqng2PVfXXXdJYKQKVnIHL+REzRDKbTGTlS4njdVi v2i4E7MwDPIf86zRUXJ6eUkWDHgld4of2WZmoxNPZmWWmHMb31iqGcb0//mYmP06 QOPd/uxpxg9pze4JeQMhzHCbFum2/s43cyqaH24lrwhTw3EsI45t4sgJQTG1NIFt ZdPXVwNBfa6ur50dsRWHYonQimpofW94YYupWLggiVGKTzvfSWVQJGc9Ho65P23y ZJrkbN3bWNnwVFzIbWFI5TKnE/TsnWvTPwk+9cGSP4MKf1sy6t5nIycoTiP3vC9x PesENYYKy5ZkOkKQpC6lGwjkMl24ii3lVl5g4vWXXjtZHwGNmq/7/UJ7Dm4htqIk B5gm5gQ/2Tul4Dihlteb =BgpV -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2016-07-06' into staging QAPI patches for 2016-07-06 # gpg: Signature made Wed 06 Jul 2016 10:00:51 BST # gpg: using RSA key 0x3870B400EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-qapi-2016-07-06: replay: Use new QAPI cloning sockets: Use new QAPI cloning qapi: Add new clone visitor qapi: Add new visit_complete() function tests: Factor out common code in qapi output tests tests: Clean up test-string-output-visitor qmp-output-visitor: Favor new visit_free() function string-output-visitor: Favor new visit_free() function qmp-input-visitor: Favor new visit_free() function string-input-visitor: Favor new visit_free() function opts-visitor: Favor new visit_free() function qapi: Add new visit_free() function qapi: Add parameter to visit_end_* qemu-img: Don't leak errors when outputting JSON qapi: Improve use of qmp/types.h Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-07-06 11:38:09 +01:00
Eric Blake	3b098d5697	qapi: Add new visit_complete() function Making each output visitor provide its own output collection function was the only remaining reason for exposing visitor sub-types to the rest of the code base. Add a polymorphic visit_complete() function which is a no-op for input visitors, and which populates an opaque pointer for output visitors. For maximum type-safety, also add a parameter to the output visitor constructors with a type-correct version of the output pointer, and assert that the two uses match. This approach was considered superior to either passing the output parameter only during construction (action at a distance during visit_free() feels awkward) or only during visit_complete() (defeating type safety makes it easier to use incorrectly). Most callers were function-local, and therefore a mechanical conversion; the testsuite was a bit trickier, but the previous cleanup patch minimized the churn here. The visit_complete() function may be called at most once; doing so lets us use transfer semantics rather than duplication or ref-count semantics to get the just-built output back to the caller, even though it means our behavior is not idempotent. Generated code is simplified as follows for events: \|@@ -26,7 +26,7 @@ void qapi_event_send_acpi_device_ost(ACP \| QDict qmp; \| Error err = NULL; \| QMPEventFuncEmit emit; \|- QmpOutputVisitor qov; \|+ QObject obj; \| Visitor v; \| q_obj_ACPI_DEVICE_OST_arg param = { \| info \|@@ -39,8 +39,7 @@ void qapi_event_send_acpi_device_ost(ACP \| \| qmp = qmp_event_build_dict("ACPI_DEVICE_OST"); \| \|- qov = qmp_output_visitor_new(); \|- v = qmp_output_get_visitor(qov); \|+ v = qmp_output_visitor_new(&obj); \| \| visit_start_struct(v, "ACPI_DEVICE_OST", NULL, 0, &err); \| if (err) { \|@@ -55,7 +54,8 @@ void qapi_event_send_acpi_device_ost(ACP \| goto out; \| } \| \|- qdict_put_obj(qmp, "data", qmp_output_get_qobject(qov)); \|+ visit_complete(v, &obj); \|+ qdict_put_obj(qmp, "data", obj); \| emit(QAPI_EVENT_ACPI_DEVICE_OST, qmp, &err); and for commands: \| { \| Error err = NULL; \|- QmpOutputVisitor qov = qmp_output_visitor_new(); \| Visitor v; \| \|- v = qmp_output_get_visitor(qov); \|+ v = qmp_output_visitor_new(ret_out); \| visit_type_AddfdInfo(v, "unused", &ret_in, &err); \|- if (err) { \|- goto out; \|+ if (!err) { \|+ visit_complete(v, ret_out); \| } \|- *ret_out = qmp_output_get_qobject(qov); \|- \|-out: \| error_propagate(errp, err); Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1465490926-28625-13-git-send-email-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2016-07-06 10:52:04 +02:00
Eric Blake	1830f22a67	qmp-output-visitor: Favor new visit_free() function Now that we have a polymorphic visit_free(), we no longer need qmp_output_visitor_cleanup(); however, we still need to expose the subtype for qmp_output_get_qobject(). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1465490926-28625-10-git-send-email-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2016-07-06 10:52:04 +02:00
Eric Blake	09204eac9b	opts-visitor: Favor new visit_free() function Now that we have a polymorphic visit_free(), we no longer need opts_visitor_cleanup(); which in turn means we no longer need to return a subtype from opts_visitor_new() nor a public upcast function. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1465490926-28625-6-git-send-email-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2016-07-06 10:52:04 +02:00
Eric Blake	1158bb2a05	qapi: Add parameter to visit_end_* Rather than making the dealloc visitor track of stack of pointers remembered during visit_start_* in order to free them during visit_end_, it's a lot easier to just make all callers pass the same pointer to visit_end_. The generated code has access to the same pointer, while all other users are doing virtual walks and can pass NULL. The dealloc visitor is then greatly simplified. All three visit_end_() functions intentionally take a void, even though the visit_start_() functions differ between void, GenericList, and GenericAlternate*. This is done for several reasons: when doing a virtual walk, passing NULL doesn't care what the type is, but when doing a generated walk, we already have to cast the caller's specific FOO to call visit_start, while using void** lets us use visit_end without a cast. Also, an upcoming patch will add a clone visitor that wants to use the same implementation for all three visit_end callbacks, which is made easier if all three share the same signature. For visitors with already track per-object state (the QMP visitors via a stack, and the string visitors which do not allow nesting), add an assertion that the caller is indeed passing the same pointer to paired calls. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1465490926-28625-4-git-send-email-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2016-07-06 10:52:04 +02:00
Peter Maydell	f1f7a1ddf3	block/qcow2: Don't use cpu_to_w() Don't use the cpu_to_w() functions, which we are trying to deprecate. Instead either just use cpu_to_() to do the byteswap, or use st_be_p() if we need to do the store somewhere other than to a variable that's already the correct type. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1466093177-17890-1-git-send-email-peter.maydell@linaro.org Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-07-05 16:54:04 +02:00
Kevin Wolf	a03ef88f77	block: Convert bdrv_co_preadv/pwritev to BdrvChild This is the final patch for converting the common I/O path to take a BdrvChild parameter instead of BlockDriverState. The completion of this conversion means that all users that perform I/O on an image need to actually hold a reference (in the form of BdrvChild, possible as part of a BlockBackend) to that image. This also protects against inconsistent use of BlockBackend vs. BlockDriverState functions because direct use of a BlockDriverState isn't possible any more and blk->root is private for block-backends.c. In addition, we can now distinguish different users in the I/O path, and the future op blockers work is going to add assertions based on permissions stored in BdrvChild. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:27 +02:00
Kevin Wolf	e293b7a3df	block: Convert bdrv_prwv_co() to BdrvChild Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:27 +02:00
Kevin Wolf	720ff280e7	block: Convert bdrv_pwrite_zeroes() to BdrvChild Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:27 +02:00
Kevin Wolf	d9ca2ea2e2	block: Convert bdrv_pwrite(v/_sync) to BdrvChild Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:27 +02:00
Kevin Wolf	cf2ab8fc34	block: Convert bdrv_pread(v) to BdrvChild Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:27 +02:00
Kevin Wolf	18d51c4bac	block: Convert bdrv_write() to BdrvChild Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:27 +02:00
Kevin Wolf	fbcbbf4e80	block: Convert bdrv_read() to BdrvChild Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:27 +02:00
Kevin Wolf	f8e2bd538d	block: Use BlockBackend for I/O in bdrv_commit() Just like block jobs, the HMP commit command should use its own BlockBackend for doing I/O on BlockDriverStates. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:27 +02:00
Kevin Wolf	83fd6dd3e7	block: Move bdrv_commit() to block/commit.c No code changes, just moved from one file to another. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:27 +02:00
Kevin Wolf	adad6496c5	block: Convert bdrv_co_do_readv/writev to BdrvChild Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:27 +02:00
Kevin Wolf	0d1049c7d1	block: Convert bdrv_aio_writev() to BdrvChild Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:26 +02:00
Kevin Wolf	ebb7af2173	block: Convert bdrv_aio_readv() to BdrvChild Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:26 +02:00
Kevin Wolf	25ec177d90	block: Convert bdrv_co_writev() to BdrvChild Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:26 +02:00
Kevin Wolf	28b04a8f65	block: Convert bdrv_co_readv() to BdrvChild Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:26 +02:00
Kevin Wolf	db1e80ee2e	vhdx: Some more BlockBackend use in vhdx_create() This does some easy conversions from bdrv_* to blk_* functions in vhdx_create(). We should avoid bypassing the BlockBackend layer whenever possible. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:26 +02:00
Kevin Wolf	e858a9705c	blkreplay: Convert to byte-based I/O The blkreplay driver only forwards the requests it gets, so converting it to byte granularity is trivial. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:26 +02:00
Kevin Wolf	eecc77473b	vvfat: Use BdrvChild for s->qcow vvfat uses a temporary qcow file to cache written data in read-write mode. In order to do things properly, this should show up in the BDS graph and I/O should go through BdrvChild like for every other node. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:26 +02:00
Denis V. Lunev	1c42f149dd	block: fix return code for partial write for Linux AIO Partial write most likely means that there is not space rather than "something wrong happens". Thus it would be more natural to return ENOSPC rather than EINVAL. The problem actually happens with NBD server, which has reported EINVAL rather then ENOSPC on the first error using its protocol, which makes report to the user wrong. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Pavel Borzenkov <pborzenkov@virtuozzo.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:26 +02:00
Eric Blake	5411541270	block: Use bool as appropriate for BDS members Using int for values that are only used as booleans is confusing. While at it, rearrange a couple of members so that all the bools are contiguous. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:26 +02:00
Eric Blake	8cc9c6e92b	block: Fix error message style error_setg() is not supposed to be used for multi-sentence messages; tweak the message to append a hint instead. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:26 +02:00
Eric Blake	a5b8dd2ce8	block: Move request_alignment into BlockLimit It makes more sense to have ALL block size limit constraints in the same struct. Improve the documentation while at it. Simplify a couple of conditionals, now that we have audited and documented that request_alignment is always non-zero. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:26 +02:00
Eric Blake	d9e0dfa246	block: Split bdrv_merge_limits() from bdrv_refresh_limits() During bdrv_merge_limits(), we were computing initial limits based on another BDS in two places. At first glance, the two computations are not identical (one is doing straight copying, the other is doing merging towards or away from zero) - but when you realize that the first round is starting with all-0 memory, all of the merging happens to work. Factoring out the merging makes it easier to track how two BDS limits are merged, in case we have future reasons to merge in even more limits. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:26 +02:00
Eric Blake	ad82be2f4f	block: Drop raw_refresh_limits() The raw block driver was blindly copying all limits from bs->file, even though: 1. the main bdrv_refresh_limits() already does this for many of the limits, and 2. blindly copying from the children can weaken any stricter limits that were already inherited from the backing chain during the main bdrv_refresh_limits(). Also, a future patch is about to move .request_alignment into BlockLimits, and that is a limit that should NOT be copied from other layers in the BDS chain. Thus, we can completely drop raw_refresh_limits(), and rely on the block layer setting up the proper limits. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:25 +02:00
Eric Blake	b9f7855a50	block: Switch discard length bounds to byte-based Sector-based limits are awkward to think about; in our on-going quest to move to byte-based interfaces, convert max_discard and discard_alignment. Rename them, using 'pdiscard' as an aid to track which remaining discard interfaces need conversion, and so that the compiler will help us catch the change in semantics across any rebased code. The BlockLimits type is now completely byte-based; and in iscsi.c, sector_limits_lun2qemu() is no longer needed. pdiscard_alignment is made unsigned (we use power-of-2 alignments as bitmasks, where unsigned is easier to think about) while leaving max_pdiscard signed (since we still have an 'int' interface); this is comparable to what commit `cf081fc` did for write zeroes limits. We may later want to make everything an unsigned 64-bit limit - but that requires a bigger code audit. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:25 +02:00
Eric Blake	5def6b80e1	block: Switch transfer length bounds to byte-based Sector-based limits are awkward to think about; in our on-going quest to move to byte-based interfaces, convert max_transfer_length and opt_transfer_length. Rename them (dropping the _length suffix) so that the compiler will help us catch the change in semantics across any rebased code, and improve the documentation. Use unsigned values, so that we don't have to worry about negative values and so that bit-twiddling is easier; however, we are still constrained by 2^31 of signed int in most APIs. When a value comes from an external source (iscsi and raw-posix), sanitize the results to ensure that opt_transfer is a power of 2. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:25 +02:00
Eric Blake	79ba8c986a	block: Set default request_alignment during bdrv_refresh_limits() We want to eventually stick request_alignment alongside other BlockLimits, but first, we must ensure it is populated at the same time as all other limits, rather than being a special case that is set only when a block is first opened. Now that all drivers have been updated to supply an override of request_alignment during their .bdrv_refresh_limits(), as needed, the block layer itself can defer setting the default alignment until part of the overall bdrv_refresh_limits(). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:25 +02:00
Eric Blake	a65064816d	block: Set request_alignment during .bdrv_refresh_limits() We want to eventually stick request_alignment alongside other BlockLimits, but first, we must ensure it is populated at the same time as all other limits, rather than being a special case that is set only when a block is first opened. Add a .bdrv_refresh_limits() to all four of our legacy devices that will always be sector-only (bochs, cloop, dmg, vvfat), in spite of their recent conversion to expose a byte interface. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:25 +02:00
Eric Blake	2914a1de99	raw-win32: Set request_alignment during .bdrv_refresh_limits() We want to eventually stick request_alignment alongside other BlockLimits, but first, we must ensure it is populated at the same time as all other limits, rather than being a special case that is set only when a block is first opened. In this case, raw_probe_alignment() already did what we needed, so just fix its signature and wire it in correctly. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:25 +02:00
Eric Blake	a84178ccff	qcow2: Set request_alignment during .bdrv_refresh_limits() We want to eventually stick request_alignment alongside other BlockLimits, but first, we must ensure it is populated at the same time as all other limits, rather than being a special case that is set only when a block is first opened. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:25 +02:00
Eric Blake	c8b3b998e2	iscsi: Set request_alignment during .bdrv_refresh_limits() We want to eventually stick request_alignment alongside other BlockLimits, but first, we must ensure it is populated at the same time as all other limits, rather than being a special case that is set only when a block is first opened. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:25 +02:00
Eric Blake	835db3ee7b	blkdebug: Set request_alignment during .bdrv_refresh_limits() We want to eventually stick request_alignment alongside other BlockLimits, but first, we must ensure it is populated at the same time as all other limits, rather than being a special case that is set only when a block is first opened. Note that when the user does not provide "align", then we were defaulting to bs->request_alignment - but at this stage in the initialization, that was always 512. We were also rejecting an explicit "align":0 from the user; this patch now allows that, as an explicit request for the default alignment (which may not always be 512 in the future). qemu-iotests 77 is particularly sensitive to the fact that we can specify an artificial alignment override in blkdebug, and that override must continue to work even when limits are refreshed on an already open device. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:25 +02:00
Eric Blake	24ce9a2026	block: Give nonzero result to blk_get_max_transfer_length() Making all callers special-case 0 as unlimited is awkward, and we DO have a hard maximum of BDRV_REQUEST_MAX_SECTORS given our current block layer API limits. In the case of scsi, this means that we now always advertise a limit to the guest, even in cases where the underlying layers previously use 0 for no inherent limit beyond the block layer. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:25 +02:00
Eric Blake	f9e95af0a6	iscsi: Advertise realistic limits to block layer The function sector_limits_lun2qemu() returns a value in units of the block layer's 512-byte sector, and can be as large as 0x40000000, which is much larger than the block layer's inherent limit of BDRV_REQUEST_MAX_SECTORS. The block layer already handles '0' as a synonym to the inherent limit, and it is nicer to return this value than it is to calculate an arbitrary maximum, for two reasons: we want to ensure that the block layer continues to special-case '0' as 'no limit beyond the inherent limits'; and we want to be able to someday expand the block layer to allow 64-bit limits, where auditing for uses of BDRV_REQUEST_MAX_SECTORS will help us make sure we aren't artificially constraining iscsi to old block layer limits. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:25 +02:00
Eric Blake	202204717a	nbd: Advertise realistic limits to block layer We were basing the advertisement of maximum discard and transfer length off of UINT32_MAX, but since the rest of the block layer has signed int limits on a transaction, nothing could ever reach that maximum, and we risk overflowing an int once things are converted to byte-based rather than sector-based limits. What's more, we DO have a much smaller limit: both the current kernel and qemu-nbd have a hard limit of 32M on a read or write transaction, and while they may also permit up to a full 32 bits on a discard transaction, the upstream NBD protocol is proposing wording that without any explicit advertisement otherwise, clients should limit ALL requests to the same limits as read and write, even though the other requests do not actually require as many bytes across the wire. So the better limit to tell the block layer is 32M for both values. Behavior doesn't actually change with this patch (the block layer is currently ignoring the max_transfer advertisements); but when that problem is fixed in a later series, this patch will prevent the exposure of a latent bug. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:24 +02:00
Eric Blake	476b923c32	nbd: Allow larger requests The NBD layer was breaking up request at a limit of 2040 sectors (just under 1M) to cater to old qemu-nbd. But the server limit was raised to 32M in commit `2d8214885` to match the kernel, more than three years ago; and the upstream NBD Protocol is proposing documentation that without any explicit communication to state otherwise, a client should be able to safely assume that a 32M transaction will work. It is time to rely on the larger sizing, and any downstream distro that cares about maximum interoperability to older qemu-nbd servers can just tweak the value of #define NBD_MAX_SECTORS. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: qemu-stable@nongnu.org Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:24 +02:00
Eric Blake	82524274ea	block: Fix harmless off-by-one in bdrv_aligned_preadv() If the amount of data to read ends exactly on the total size of the bs, then we were wasting time creating a local qiov to read the data in preparation for what would normally be appending zeroes beyond the end, even though this corner case has nothing further to do. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:24 +02:00
Eric Blake	a604fa2ba5	block: Document supported flags during bdrv_aligned_preadv() We don't pass any flags on to drivers to handle. Tighten an assert to explain why we pass 0 to bdrv_driver_preadv(), and add some comments on things to be aware of if we want to turn on per-BDS BDRV_REQ_FUA support during reads in the future. Also, document that we may want to consider using unmap during copy-on-read operations where the read is all zeroes. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:24 +02:00
Eric Blake	cff86b38ac	block: Tighter assertions on bdrv_aligned_pwritev() For symmetry with bdrv_aligned_preadv(), assert that the caller really has aligned things properly. This requires adding an align parameter, which is used now only in the new asserts, but will come in handy in a later patch that adds auto-fragmentation to the max transfer size, since that value need not always be a multiple of the alignment, and therefore must be rounded down. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:24 +02:00
Peter Maydell	1ec20c2a3a	* serial port fixes (Paolo) * Q35 modeling improvements (Paolo, Vasily) * chardev cleanup improvements (Marc-André) * iscsi bugfix (Peter L.) * cpu_exec patch from multi-arch patches (Peter C.) * pci-assign tweak (Lin Ma) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJXc+GeAAoJEL/70l94x66DtIAH/3+eUBqSxVJ3SMUxJep2Op07 lIWqw1GHAdw1gWQDG4HzokKWrVVp/+NFYQjRFcNMfF8L+/Xm6hHAYc7Y4DMkDxSw zHX2BT93gPcaFJRz3Md8n2anzFHaWePx7LucPjaoas2OzrbVKXC8JT6n3GGnKQzZ 0CxDoyW4keI4ZVAOy9SOKsLPxdSvG8uLvaZU98l/YS/TuiGzpv8IWcdHR+k1hua+ FIenzj7jD9+JFoLEUWkU0pYs33J6yYKPiZn7HgGL9RNWKPFR88+CtMdYXgfOPo7z i05L9RTmL4SpahmStPN2r72MC0T0ub0czk/+qxBNms4r/2gBwaSyldmcTfAXM9o= =DA8v -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * serial port fixes (Paolo) * Q35 modeling improvements (Paolo, Vasily) * chardev cleanup improvements (Marc-André) * iscsi bugfix (Peter L.) * cpu_exec patch from multi-arch patches (Peter C.) * pci-assign tweak (Lin Ma) # gpg: Signature made Wed 29 Jun 2016 15:56:30 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (35 commits) socket: unlink unix socket on remove socket: add listen feature char: clean up remaining chardevs when leaving vhost-user: disable chardev handlers on close vhost-user-test: fix g_cond_wait_until compat implementation vl: smp_parse: fix regression ich9: implement SCI_IRQ_SEL register ich9: implement ACPI_EN register serial: reinstate watch after migration serial: remove watch on reset char: change qemu_chr_fe_add_watch to return unsigned serial: separate serial_xmit and serial_watch_cb serial: simplify tsr_retry reset serial: make tsr_retry unsigned iscsi: fix assertion in is_sector_request_lun_aligned target-*: Don't redefine cpu_exec() pci-assign: Move "Invalid ROM" error message to pci-assign-load-rom.c vnc: generalize "VNC server running on ..." message scsi: esp: fix migration MC146818 RTC: add GPIO access to output IRQ ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-06-29 19:14:48 +01:00
Peter Lieven	0ead93120e	iscsi: fix assertion in is_sector_request_lun_aligned Commit `94d047a` added an assertion the the request alignment check. This introduced 2 issues: a) A off-by-one error since a request of BDRV_REQUEST_MAX_SECTORS is actually allowed. b) The bdrv_get_block_status call in the read path to check the allocation status requests up to INT_MAX sectors which triggers the assertion. Fixes: `94d047a35b` Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <1466414680-18383-1-git-send-email-pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-06-29 14:03:47 +02:00
Changlong Xie	15d6729850	mirror: fix misleading comments s/target bs/to_replace/, also we check to_replace bs is not blocked in qmp_drive_mirror() not here Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1466672241-22485-3-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-06-28 23:08:25 -04:00
Changlong Xie	b48100cf07	blockjob: assert(cb) when create job Callback for block job should always exist Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1466672241-22485-2-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-06-28 23:08:13 -04:00
John Snow	e4808881cb	mirror: limit niov to IOV_MAX elements, again During the refactor of mirror_iteration in `e5b43573`, we regressed the fix introduced in `cae98cb8`. This patch re-adds IOV_MAX checking to cases where we aren't checking alignment (and size) already. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1466625064-11280-3-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-06-28 22:53:03 -04:00
John Snow	176129552f	mirror: clarify mirror_do_read return code mirror_do_read intends to return the number of sectors processed after the starting sector, without regard to how many sectors were processed before the starting sector due to alignment. Clean up the comments and code to hopefully illustrate this more clearly. This also fixes an issue in initialization where if the mirror buffer size is initialized to smaller than the number of sectors being requested for transfer, we report back an incorrectly large number to the caller. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1466625064-11280-2-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-06-28 22:53:03 -04:00
Jeff Cody	7eac868a50	block/gluster: add support for selecting debug logging level This adds commandline support for the logging level of the gluster protocol driver, output to stdout. The option is 'debug', e.g.: -drive filename=gluster://192.168.15.180/gv2/test.qcow2,debug=9 Debug levels are 0-9, with 9 being the most verbose, and 0 representing no debugging output. The default is the same as it was before, which is a level of 4. The current logging levels defined in the gluster source are: 0 - None 1 - Emergency 2 - Alert 3 - Critical 4 - Error 5 - Warning 6 - Notice 7 - Info 8 - Debug 9 - Trace (From: glusterfs/logging.h) Reviewed-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-06-28 22:52:45 -04:00
Denis V. Lunev	ff04198bf5	mirror: fix trace_mirror_yield_in_flight usage in mirror_iteration() trace_mirror_yield_in_flight accepts 2nd arguments in sectors while here we pass chunks instead. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1466518157-27140-1-git-send-email-den@openvz.org CC: Jeff Cody <jcody@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-06-28 22:52:45 -04:00
Peter Lieven	d99b26c42a	block/nfs: add support for libnfs pagecache upcoming libnfs will have support for a read cache that can significantly help to speed up requests since libnfs by design circumvents the kernel cache. Example: qemu -cdrom nfs://127.0.0.1/iso/my.iso?pagecache=1024 The pagecache parameters takes the maximum amount of pages to cache. A page in libnfs is always the NFS_BLKSIZE which is 4KB. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1463662083-20814-3-git-send-email-pl@kamp.de Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-06-28 22:52:45 -04:00
Peter Lieven	38f8d5e025	block/nfs: refuse readahead if cache.direct is on if we open a NFS export with disabled cache we should refuse the readahead feature as it will cache data inside libnfs. If a export was opened with readahead enabled it should futher not be allowed to disable the cache while running. Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1463662083-20814-2-git-send-email-pl@kamp.de Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-06-28 22:52:45 -04:00
Niels de Vos	947eb2030e	block/gluster: add support for SEEK_DATA/SEEK_HOLE GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes it possible to detect sparse areas in files. Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2016-06-28 22:52:45 -04:00
Peter Maydell	b0ad00b8c9	-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJXaFInAAoJEJykq7OBq3PI6VsH/0Sfgbdo1RksYuQwb/y92sCW EN+lxUZ+OLfgrc8PYgNZwfSM3rsfYhznL0MAXOeEe7Ahabi07w7DhGR8WvwfAOlI G96FRuvrIPfv5u6U6fwS4CvG3TIHVLxfHKCsTpPUmH8U5CNx/x/tpjNiWN1dj6t+ sXybSjYHfZfiZy2tI9MFIFWCdxnF/pl0QAPhbRqc8Y/RQTDrPKRjLpz+nitN/u96 5TS7KlELyQuP91YMmLceYSmIkHbxW703h+iE2n4hov0uZCP8Jil+2Jsd3ziQSRlL j6LqexQ2ViBGdDSfiZGYES2VPlsHOCwb4G+IgWBStfZg1ppaXENvcDzPrgrB+L4= =eUnF -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging # gpg: Signature made Mon 20 Jun 2016 21:29:27 BST # gpg: using RSA key 0x9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/tracing-pull-request: (42 commits) trace: split out trace events for linux-user/ directory trace: split out trace events for qom/ directory trace: split out trace events for target-ppc/ directory trace: split out trace events for target-s390x/ directory trace: split out trace events for target-sparc/ directory trace: split out trace events for net/ directory trace: split out trace events for audio/ directory trace: split out trace events for ui/ directory trace: split out trace events for hw/alpha/ directory trace: split out trace events for hw/arm/ directory trace: split out trace events for hw/acpi/ directory trace: split out trace events for hw/vfio/ directory trace: split out trace events for hw/s390x/ directory trace: split out trace events for hw/pci/ directory trace: split out trace events for hw/ppc/ directory trace: split out trace events for hw/9pfs/ directory trace: split out trace events for hw/i386/ directory trace: split out trace events for hw/isa/ directory trace: split out trace events for hw/sd/ directory trace: split out trace events for hw/sparc/ directory ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-06-20 22:30:34 +01:00
Daniel P. Berrange	b54ca48e40	trace: split out trace events for block/ directory Move all trace-events for files in the block/ directory to their own file. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 1466066426-16657-7-git-send-email-berrange@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-20 17:22:14 +01:00
Peter Maydell	7fa124b273	Error reporting patches for 2016-06-20 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIbBAABAgAGBQJXaAQPAAoJEDhwtADrkYZTbPMP+OmXlf1IwcUONhaHqD0SXAdC Q5NNHheNHbIdzrsOmgj2GAxCeGG08REPrdYLb8N9lI3Fev8ggQ0v3d9qiStL6Nvz 2B7DeTxXba9Qmc2gY86RR4f7T6VK+8k1Kj9f3YcAoXHucdjDLRmtUozwES3Kk8mn oeBqd07yuTAO93xQhF6BDv3mfagPZV+kp15Se04ufWPpT9eG3Cp9Pl4/Yad/wJUr 2B4pIvYnhISVlZIOvpuzdydBaCb/U0YefjWBDkawLwRBtM/kpNKoDdqr2UA0yChQ Xulgpt+UQLZ4aOYGLkzqBT7v3kDeEnYUFDdE03xZBLbHddOU05NEhLgzRVUEAdUS EAKWvWu692USDh9a2r0AYLqZW6J0+Fyi4fZ34gSuP69oi74CIvjJHDAEwsSdt6bm WHCchvfIk9kinzGiFA3hkAbVWAbI0EpB0J7k3mgCVVVelBpF9b9U4M2ydx9ZPufX t4ULXxQO9Qyxr7r0uiznugQAjIRwLMDOPPd1F5Vi+LX0wF3Pcrtc/F+3ehoLBjBD 6zZA5lZVlUEISdzzytfjeX3ch5vRFwlujrTd2anxRuduuBV1/ztvE1v0Nsiii6Sw 15BaHQJvBnMnAIO5LfhnAaVpIBk0jpQpwVyRcNUpeZ1VU2qlqVS4zxYZ0FDQgyer IZa1qYJ0Sz+oMkq9RXU= =TptL -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2016-06-20' into staging Error reporting patches for 2016-06-20 # gpg: Signature made Mon 20 Jun 2016 15:56:15 BST # gpg: using RSA key 0x3870B400EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-error-2016-06-20: log: Fix qemu_set_log_filename() error handling log: Fix qemu_set_dfilter_ranges() error reporting log: Plug memory leak on multiple -dfilter coccinelle: Remove unnecessary variables for function return value error: Remove unnecessary local_err variables error: Remove NULL checks on error_propagate() calls vl: Error messages need to go to stderr, fix some Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-06-20 16:19:18 +01:00
Eduardo Habkost	9be385980d	coccinelle: Remove unnecessary variables for function return value Use Coccinelle script to replace 'ret = E; return ret' with 'return E'. The script will do the substitution only when the function return type and variable type are the same. Manual fixups: * audio/audio.c: coding style of "read (...)" and "write (...)" * block/qcow2-cluster.c: wrap line to make it shorter * block/qcow2-refcount.c: change indentation of wrapped line * target-tricore/op_helper.c: fix coding style of "remainder\|quotient" * target-mips/dsp_helper.c: reverted changes because I don't want to argue about checkpatch.pl * ui/qemu-pixman.c: fix line indentation * block/rbd.c: restore blank line between declarations and statements Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <1465855078-19435-4-git-send-email-ehabkost@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Unused Coccinelle rule name dropped along with a redundant comment; whitespace touched up in block/qcow2-cluster.c; stale commit message paragraph deleted] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2016-06-20 16:38:13 +02:00
Eduardo Habkost	6b62d96137	error: Remove unnecessary local_err variables This patch simplifies code that uses a local_err variable just to immediately use it for an error_propagate() call. Coccinelle patch used to perform the changes added to scripts/coccinelle/remove_local_err.cocci. Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <1465855078-19435-3-git-send-email-ehabkost@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Blank line in s390-virtio-ccw.c restored] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2016-06-20 16:38:13 +02:00
Eduardo Habkost	621ff94d50	error: Remove NULL checks on error_propagate() calls error_propagate() already ignores local_err==NULL, so there's no need to check it before calling. Coccinelle patch used to perform the changes added to scripts/coccinelle/error_propagate_null.cocci. Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <1465855078-19435-2-git-send-email-ehabkost@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2016-06-20 16:38:13 +02:00
Stefan Hajnoczi	5ab4b69ce2	backup: follow AioContext change gracefully Move s->target to the new AioContext when there is an AioContext change. The backup_run() coroutine does not use asynchronous I/O so there is no need to wait for in-flight requests in a BlockJobDriver->pause() callback. Guest writes are intercepted by the backup job. Treat them as guest activity and do it even while the job is paused. This is necessary since the only alternative would be to fail a job that experienced guest writes during pause once the job is resumed. In practice the guest writes don't interfere with AioContext switching since bdrv_drain() is used by bdrv_set_aio_context(). Loops already contain pause points because of block_job_sleep_ns() calls in the yield_and_check() helper function. It is necessary to convert a raw qemu_coroutine_yield() to block_job_yield() so the MIRROR_SYNC_MODE_NONE case can pause. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1466096189-6477-9-git-send-email-stefanha@redhat.com	2016-06-20 14:25:41 +01:00
Stefan Hajnoczi	565ac01f8d	mirror: follow AioContext change gracefully Add block_job_pause_point() calls to mark quiescent points and make sure to complete in-flight requests when switching AioContexts. This patch solves undefined behavior in the mirror block job when the BDS AioContext is changed by dataplane. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1466096189-6477-8-git-send-email-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-20 14:25:41 +01:00
Denis V. Lunev	ec050f77a5	block: process before_write_notifiers in bdrv_co_discard This is mandatory for correct backup creation. In the other case the content under this area would be lost. Dirty bits are set exactly like in bdrv_aligned_pwritev, i.e. they are set even if notifier has returned a error. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1466093381-6120-4-git-send-email-den@openvz.org CC: Fam Zheng <famz@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-20 11:44:12 +01:00
Denis V. Lunev	968d8b0627	block: fix race in bdrv_co_discard with drive-mirror Actually we must set dirty bitmap dirty after we have written all our zeroes for correct processing in drive mirror code. In the other case we can face not zeroes in this area in mirror_iteration. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1466093381-6120-3-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-20 11:44:12 +01:00
Denis V. Lunev	3a36e474f2	block: fixed BdrvTrackedRequest filling in bdrv_co_discard The request area is specified in bytes, not in sectors. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1466093381-6120-2-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-20 11:44:12 +01:00
Paolo Bonzini	02d0e09503	os-posix: include sys/mman.h qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check is bogus without a previous inclusion of sys/mman.h. Include it in sysemu/os-posix.h and remove it from everywhere else. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-06-16 18:39:03 +02:00
Max Reitz	67882b1535	block/null: Implement bdrv_refresh_filename() The null block driver ignores any filename used for creating its BDSs, which allows creating such BDSs even without any filename at all. In that case, we currently construct a JSON filename when queried instead of a plain "null-co://" or "null-aio://". This patch implements bdrv_refresh_filename() to remedy this behavior. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20160610185750.30956-4-mreitz@redhat.com [mreitz@redhat.com: Added commit message] Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-06-16 15:20:37 +02:00
Max Reitz	274fccee2b	block/mirror: Fix target backing BDS Currently, we are trying to move the backing BDS from the source to the target in bdrv_replace_in_backing_chain() which is called from mirror_exit(). However, mirror_complete() already tries to open the target's backing chain with a call to bdrv_open_backing_file(). First, we should only set the target's backing BDS once. Second, the mirroring block job has a better idea of what to set it to than the generic code in bdrv_replace_in_backing_chain() (in fact, the latter's conditions on when to move the backing BDS from source to target are not really correct). Therefore, remove that code from bdrv_replace_in_backing_chain() and leave it to mirror_complete(). Depending on what kind of mirroring is performed, we furthermore want to use different strategies to open the target's backing chain: - If blockdev-mirror is used, we can assume the user made sure that the target already has the correct backing chain. In particular, we should not try to open a backing file if the target does not have any yet. - If drive-mirror with mode=absolute-paths is used, we can and should reuse the already existing chain of nodes that the source BDS is in. In case of sync=full, no backing BDS is required; with sync=top, we just link the source's backing BDS to the target, and with sync=none, we use the source BDS as the target's backing BDS. We should not try to open these backing files anew because this would lead to two BDSs existing per physical file in the backing chain, and we would like to avoid such concurrent access. - If drive-mirror with mode=existing is used, we have to use the information provided in the physical image file which means opening the target's backing chain completely anew, just as it has been done already. If the target's backing chain shares images with the source, this may lead to multiple BDSs per physical image file. But since we cannot reliably ascertain this case, there is nothing we can do about it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20160610185750.30956-3-mreitz@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-06-16 15:20:37 +02:00
Vikhyat Umrao	87cd3d20e1	rbd:change error_setg() to error_setg_errno() Ceph RBD block driver does not use error_setg_errno() where it is possible to use. This patch replaces error_setg() from error_setg_errno(). Signed-off-by: Vikhyat Umrao <vumrao@redhat.com> Message-id: 1462780319-5796-1-git-send-email-vumrao@redhat.com Reviewed-by: Josh Durgin <jdurgin@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-06-16 15:20:37 +02:00
Alberto Garcia	834fe28ddf	block: Create the commit block job before reopening any image If the base or overlay images need to be reopened in read-write mode but the block_job_create() call fails then no one will put those images back in read-only mode. We can solve this problem easily by calling block_job_create() first. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: aa495045770a6f1a7cc5d408397a17c75097fdd8.1464346103.git.berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-06-16 15:20:37 +02:00
Alberto Garcia	eb1364ceac	block: use the block job list in bdrv_drain_all() bdrv_drain_all() pauses all block jobs by using bdrv_next() to iterate over all top-level BlockDriverStates. Therefore the code is unable to find block jobs in other nodes. This patch uses block_job_next() to iterate over all block jobs. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 55ee7d7d4a65c28aa1a1b28823897ef326f328e2.1464346103.git.berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-06-16 15:20:37 +02:00
Kevin Wolf	c9d20029f4	block: Remove bs->zero_beyond_eof It is always true for open images now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:56 +02:00
Kevin Wolf	734a77584a	qcow2: Let vmstate call qcow2_co_preadv/pwrite directly We don't really want to go through the block layer in order to read from or write to the vmstate in a qcow2 image. Doing so required a few ugly hacks like saving and restoring the old image size (because writing to vmstate offsets would increase the image size) or disabling the "reads after EOF = zeroes" logic. When calling the right functions directly, these hacks aren't necessary any more. Note that .bdrv_vmstate_load/save() return 0 instead of the number of bytes in case of success now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:56 +02:00
Kevin Wolf	1a8ae82217	block: Make bdrv_load/save_vmstate coroutine_fns This allows drivers to share code between normal I/O and vmstate accesses. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:56 +02:00
Kevin Wolf	b433d9424d	block: Allow .bdrv_load/save_vmstate() to return 0/-errno The return value of .bdrv_load/save_vmstate() can be any non-negative number in case of success now. It used to be bytes/-errno. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	5ddda0b8f0	block: Make .bdrv_load_vmstate() vectored This brings it in line with .bdrv_save_vmstate(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	f1e8474115	block: Introduce bdrv_preadv() We already have a byte-based bdrv_pwritev(), but the read counterpart was still missing. This commit adds it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	ccb9dc1012	linux-aio: Cancel BH if not needed linux-aio uses a BH in order to make sure that the remaining completions are processed even in nested event loops of completion callbacks in order to avoid deadlocks. There is no need, however, to have the BH overhead for the first call into qemu_laio_completion_bh() or after all pending completions have already been processed. Therefore, this patch calls directly into qemu_laio_completion_bh() in qemu_laio_completion_cb() and cancels the BH after qemu_laio_completion_bh() has processed all pending completions. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	23b0d9fb1d	block: Don't enforce 512 byte minimum alignment If block drivers say that they can do an alignment < 512 bytes, let's just suppose they mean it. raw-posix used to be an offender with respect to this, but it can actually deal with byte-aligned requests now. The default is still 512 bytes for any drivers that only implement sector-based interfaces, but it is 1 now for drivers that implement .bdrv_co_preadv. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	9d52aa3c38	raw-posix: Implement .bdrv_co_preadv/pwritev The raw-posix block driver actually supports byte-aligned requests now on non-O_DIRECT images, like it already (and previously incorrectly) claimed in bs->request_alignment. For some block drivers this means that a RMW cycle can be avoided when they write sub-sector metadata e.g. for cluster allocation. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	2174f12bde	raw-posix: Switch to bdrv_co_* interfaces In order to use the modern byte-based .bdrv_co_preadv/pwritev() interface, this patch switches raw-posix to coroutine-based interfaces as a first step. In terms of semantics and performance, it doesn't make a difference with the existing code whether we go from a coroutine to a callback-based interface already in block/io.c or only in linux-aio.c As there have been concerns in the past that this change may be a step in the wrong direction with respect to a possible AIO fast path, the old callback-based interface for linux-aio is left around and can be reactivated when a fast path (e.g. directly from virtio-blk dataplane, bypassing the whole block layer) is implemented. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	9896c8765f	block: Prepare bdrv_aligned_pwritev() for byte-aligned requests Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	49c0752600	block: Prepare bdrv_aligned_preadv() for byte-aligned requests This patch makes bdrv_aligned_preadv() ready to accept byte-aligned requests. Note that this doesn't mean that such requests are actually made. The caller still ensures that all requests are aligned to at least 512 bytes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	244483e64e	block: Byte-based bdrv_co_do_copy_on_readv() In a first step to convert the common I/O path to work on bytes rather than sectors, this converts the copy-on-read logic that is used by bdrv_aligned_preadv(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Daniel P. Berrange	8c0dcbc4ad	block: drop support for using qcow[2] encryption with system emulators Back in the 2.3.0 release we declared qcow[2] encryption as deprecated, warning people that it would be removed in a future release. commit `a1f688f415` Author: Markus Armbruster <armbru@redhat.com> Date: Fri Mar 13 21:09:40 2015 +0100 block: Deprecate QCOW/QCOW2 encryption The code still exists today, but by a (happy?) accident we entirely broke the ability to use qcow[2] encryption in the system emulators in the 2.4.0 release due to commit `8336aafae1` Author: Daniel P. Berrange <berrange@redhat.com> Date: Tue May 12 17:09:18 2015 +0100 qcow2/qcow: protect against uninitialized encryption key This commit was designed to prevent future coding bugs which might cause QEMU to read/write data on an encrypted block device in plain text mode before a decryption key is set. It turns out this preventative measure was a little too good, because we already had a long standing bug where QEMU read encrypted data in plain text mode during system emulator startup, in order to guess disk geometry: Thread 10 (Thread 0x7fffd3fff700 (LWP 30373)): #0 0x00007fffe90b1a28 in raise () at /lib64/libc.so.6 #1 0x00007fffe90b362a in abort () at /lib64/libc.so.6 #2 0x00007fffe90aa227 in __assert_fail_base () at /lib64/libc.so.6 #3 0x00007fffe90aa2d2 in () at /lib64/libc.so.6 #4 0x000055555587ae19 in qcow2_co_readv (bs=0x5555562accb0, sector_num=0, remaining_sectors=1, qiov=0x7fffffffd260) at block/qcow2.c:1229 #5 0x000055555589b60d in bdrv_aligned_preadv (bs=bs@entry=0x5555562accb0, req=req@entry=0x7fffd3ffea50, offset=offset@entry=0, bytes=bytes@entry=512, align=align@entry=512, qiov=qiov@entry=0x7fffffffd260, flags=0) at block/io.c:908 #6 0x000055555589b8bc in bdrv_co_do_preadv (bs=0x5555562accb0, offset=0, bytes=512, qiov=0x7fffffffd260, flags=<optimized out>) at block/io.c:999 #7 0x000055555589c375 in bdrv_rw_co_entry (opaque=0x7fffffffd210) at block/io.c:544 #8 0x000055555586933b in coroutine_thread (opaque=0x555557876310) at coroutine-gthread.c:134 #9 0x00007ffff64e1835 in g_thread_proxy (data=0x5555562b5590) at gthread.c:778 #10 0x00007ffff6bb760a in start_thread () at /lib64/libpthread.so.0 #11 0x00007fffe917f59d in clone () at /lib64/libc.so.6 Thread 1 (Thread 0x7ffff7ecab40 (LWP 30343)): #0 0x00007fffe91797a9 in syscall () at /lib64/libc.so.6 #1 0x00007ffff64ff87f in g_cond_wait (cond=cond@entry=0x555555e085f0 <coroutine_cond>, mutex=mutex@entry=0x555555e08600 <coroutine_lock>) at gthread-posix.c:1397 #2 0x00005555558692c3 in qemu_coroutine_switch (co=<optimized out>) at coroutine-gthread.c:117 #3 0x00005555558692c3 in qemu_coroutine_switch (from_=0x5555562b5e30, to_=to_@entry=0x555557876310, action=action@entry=COROUTINE_ENTER) at coroutine-gthread.c:175 #4 0x0000555555868a90 in qemu_coroutine_enter (co=0x555557876310, opaque=0x0) at qemu-coroutine.c:116 #5 0x0000555555859b84 in thread_pool_completion_bh (opaque=0x7fffd40010e0) at thread-pool.c:187 #6 0x0000555555859514 in aio_bh_poll (ctx=ctx@entry=0x5555562953b0) at async.c:85 #7 0x0000555555864d10 in aio_dispatch (ctx=ctx@entry=0x5555562953b0) at aio-posix.c:135 #8 0x0000555555864f75 in aio_poll (ctx=ctx@entry=0x5555562953b0, blocking=blocking@entry=true) at aio-posix.c:291 #9 0x000055555589c40d in bdrv_prwv_co (bs=bs@entry=0x5555562accb0, offset=offset@entry=0, qiov=qiov@entry=0x7fffffffd260, is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at block/io.c:591 #10 0x000055555589c503 in bdrv_rw_co (bs=bs@entry=0x5555562accb0, sector_num=sector_num@entry=0, buf=buf@entry=0x7fffffffd2e0 "\321,", nb_sectors=nb_sectors@entry=21845, is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at block/io.c:614 #11 0x000055555589c562 in bdrv_read_unthrottled (nb_sectors=21845, buf=0x7fffffffd2e0 "\321,", sector_num=0, bs=0x5555562accb0) at block/io.c:622 #12 0x000055555589c562 in bdrv_read_unthrottled (bs=0x5555562accb0, sector_num=sector_num@entry=0, buf=buf@entry=0x7fffffffd2e0 "\321,", nb_sectors=nb_sectors@entry=21845) at block/io.c:634 nb_sectors@entry=1) at block/block-backend.c:504 #14 0x0000555555752e9f in guess_disk_lchs (blk=blk@entry=0x5555562a5290, pcylinders=pcylinders@entry=0x7fffffffd52c, pheads=pheads@entry=0x7fffffffd530, psectors=psectors@entry=0x7fffffffd534) at hw/block/hd-geometry.c:68 #15 0x0000555555752ff7 in hd_geometry_guess (blk=0x5555562a5290, pcyls=pcyls@entry=0x555557875d1c, pheads=pheads@entry=0x555557875d20, psecs=psecs@entry=0x555557875d24, ptrans=ptrans@entry=0x555557875d28) at hw/block/hd-geometry.c:133 #16 0x0000555555752b87 in blkconf_geometry (conf=conf@entry=0x555557875d00, ptrans=ptrans@entry=0x555557875d28, cyls_max=cyls_max@entry=65536, heads_max=heads_max@entry=16, secs_max=secs_max@entry=255, errp=errp@entry=0x7fffffffd5e0) at hw/block/block.c:71 #17 0x0000555555799bc4 in ide_dev_initfn (dev=0x555557875c80, kind=IDE_HD) at hw/ide/qdev.c:174 #18 0x0000555555768394 in device_realize (dev=0x555557875c80, errp=0x7fffffffd640) at hw/core/qdev.c:247 #19 0x0000555555769a81 in device_set_realized (obj=0x555557875c80, value=<optimized out>, errp=0x7fffffffd730) at hw/core/qdev.c:1058 #20 0x00005555558240ce in property_set_bool (obj=0x555557875c80, v=<optimized out>, opaque=0x555557875de0, name=<optimized out>, errp=0x7fffffffd730) at qom/object.c:1514 #21 0x0000555555826c87 in object_property_set_qobject (obj=obj@entry=0x555557875c80, value=value@entry=0x55555784bcb0, name=name@entry=0x55555591cb3d "realized", errp=errp@entry=0x7fffffffd730) at qom/qom-qobject.c:24 #22 0x0000555555825760 in object_property_set_bool (obj=obj@entry=0x555557875c80, value=value@entry=true, name=name@entry=0x55555591cb3d "realized", errp=errp@entry=0x7fffffffd730) at qom/object.c:905 #23 0x000055555576897b in qdev_init_nofail (dev=dev@entry=0x555557875c80) at hw/core/qdev.c:380 #24 0x0000555555799ead in ide_create_drive (bus=bus@entry=0x555557629630, unit=unit@entry=0, drive=0x5555562b77e0) at hw/ide/qdev.c:122 #25 0x000055555579a746 in pci_ide_create_devs (dev=dev@entry=0x555557628db0, hd_table=hd_table@entry=0x7fffffffd830) at hw/ide/pci.c:440 #26 0x000055555579b165 in pci_piix3_ide_init (bus=<optimized out>, hd_table=0x7fffffffd830, devfn=<optimized out>) at hw/ide/piix.c:218 #27 0x000055555568ca55 in pc_init1 (machine=0x5555562960a0, pci_enabled=1, kvmclock_enabled=<optimized out>) at /home/berrange/src/virt/qemu/hw/i386/pc_piix.c:256 #28 0x0000555555603ab2 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4249 So the safety net is correctly preventing QEMU reading cipher text as if it were plain text, during startup and aborting QEMU to avoid bad usage of this data. For added fun this bug only happens if the encrypted qcow2 file happens to have data written to the first cluster, otherwise the cluster won't be allocated and so qcow2 would not try the decryption routines at all, just return all 0's. That no one even noticed, let alone reported, this bug that has shipped in 2.4.0, 2.5.0 and 2.6.0 shows that the number of actual users of encrypted qcow2 is approximately zero. So rather than fix the crash, and backport it to stable releases, just go ahead with what we have warned users about and disable any use of qcow2 encryption in the system emulators. qemu-img/qemu-io/qemu-nbd are still able to access qcow2 encrypted images for the sake of data conversion. In the future, qcow2 will gain support for the alternative luks format, but when this happens it'll be using the '-object secret' infrastructure for getting keys, which avoids this problematic scenario entirely. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-16 15:19:55 +02:00
Eric Blake	fa16653874	block: Assert that flags are in range Add a new BDRV_REQ_MASK constant, and use it to make sure that caller flags are always valid. Tested with 'make check' and with qemu-iotests on both '-raw' and '-qcow2'; the only failure turned up was fixed in the previous commit. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-16 15:19:55 +02:00
Eric Blake	73698c30ca	block: Avoid bogus flags during mirroring Commit `e253f4b8` converted mirroring from sector-based bdrv_aio_* to byte-based blk_aio_*, but failed to account for the subtle difference in signatures (the former takes a semi-redundant length, the latter takes a flags parameter). Since all of our flags are currently smaller in size than BDRV_SECTOR_SIZE, it has no ill effects until we either perform sub-sector mirroring, or we start asserting that no unexpected flags are set. I found it while testing new asserts when qemu-iotests 132 started warning about an unknown flag 0x200000. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	d46a0bb24d	qcow2: Implement .bdrv_co_pwritev() This changes qcow2 to implement the byte-based .bdrv_co_pwritev interface rather than the sector-based old one. As preallocation uses the same allocation function as normal writes, and the interface of that function needs to be changed, it is converted in the same patch. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	8556739355	qcow2: Use bytes instead of sectors for QCowL2Meta In preparation for implementing .bdrv_co_pwritev in qcow2. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	aaa4d20b49	qcow2: Make copy_sectors() byte based This will allow copy on write operations where the overwritten part of the cluster is not aligned to sector boundaries. Also rename the function because it has nothing to do with sectors any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	ecfe186380	qcow2: Implement .bdrv_co_preadv() Reading from qcow2 images is now byte granularity. Most of the affected code in qcow2 actually gets simpler with this change. The only exception is encryption, which is fixed on 512 bytes blocks; in order to keep this working, bs->request_alignment is set for encrypted images. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	b2f65d6b02	qcow2: Work with bytes in qcow2_get_cluster_offset() This patch changes the units that qcow2_get_cluster_offset() uses internally, without touching the interface just yet. This will be done in another patch. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	515c2f431e	block: Don't emulate natively supported pwritev flags Drivers that implement .bdrv_co_pwritev() get the flags passed as an argument to said function, but we also unconditionally emulate the flags anyway. We shouldn't do that. Fix this by clearing all flags that the driver supports natively after it returns from .bdrv_co_pwritev(). Fixes: `4df863f3` ('block: Make supported_write_flags a per-bds property') Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-06-08 10:21:09 +02:00
Kevin Wolf	2a9170bcd4	block: Fix bdrv_all_delete_snapshot() error handling The code to exit the loop after bdrv_snapshot_delete_by_id_or_name() returned failure was duplicated. The first copy of it was too early so that the AioContext lock would not be freed. This patch removes it so that only the second, correct copy remains. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-08 10:21:09 +02:00
Denis V. Lunev	f3c3b87dae	qcow2: avoid extra flushes in qcow2 The problem with excessive flushing was found by a couple of performance tests: - parallel directory tree creation (from 2 processes) - 32 cached writes + fsync at the end in a loop For the first one results improved from 2.6 loops/sec to 3.5 loops/sec. Each loop creates 10^3 directories with 10 files in each. For the second one results improved from ~600 fsync/sec to ~1100 fsync/sec. Though, it was run on SSD so it probably won't show such performance gain on rotational media. qcow2_cache_flush() calls bdrv_flush() unconditionally after writing cache entries of a particular cache. This can lead to as many as 2 additional fdatasyncs inside bdrv_flush. We can simply skip all fdatasync calls inside qcow2_co_flush_to_os as bdrv_flush for sure will do the job. These flushes are necessary to keep the right order of writes to the different caches. Though this is not necessary in the current code base as this ordering is ensured through the flush in qcow2_cache_flush_dependency(). Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Pavel Borzenkov <pborzenkov@virtuozzo.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:09 +02:00
Fam Zheng	6f6071745b	raw-posix: Fetch max sectors for host block device This is sometimes a useful value we should count in. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:09 +02:00
Eric Blake	c1499a5e73	block: Kill bdrv_co_write_zeroes() Now that all drivers have been converted to a byte interface, we no longer need a sector interface. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	a620f2ae15	vmdk: Convert to bdrv_co_pwrite_zeroes() Another step on our continuing quest to switch to byte-based interfaces. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	39ad937e16	raw_bsd: Convert to bdrv_co_pwrite_zeroes() Another step on our continuing quest to switch to byte-based interfaces. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	2ffa76c2bf	raw-posix: Convert to bdrv_co_pwrite_zeroes() Another step on our continuing quest to switch to byte-based interfaces. Signed-off-by: Eric Blake <eblake@redhat.com> [ kwolf: Fixed up trace_paio_submit_co() call for qiov == NULL ] Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	49a2e48348	qed: Convert to bdrv_co_pwrite_zeroes() Another step on our continuing quest to switch to byte-based interfaces. Kill an abuse of the comma operator while at it (fortunately, the semantics were still right). Also, the test for requests not aligned to clusters should be applied always, not just when a backing file is present. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	e88a36ebad	gluster: Convert to bdrv_co_pwrite_zeroes() Another step on our continuing quest to switch to byte-based interfaces. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	9c21a4220b	blkreplay: Convert to bdrv_co_pwrite_zeroes() Another step on our continuing quest to switch to byte-based interfaces. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	5544b59f8e	qcow2: Convert to bdrv_co_pwrite_zeroes() Another step on our continuing quest to switch to byte-based interfaces. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	94d047a35b	iscsi: Convert to bdrv_co_pwrite_zeroes() Another step on our continuing quest to switch to byte-based interfaces. As this is the first byte-based iscsi interface, convert is_request_lun_aligned() into two versions, one for sectors and one for bytes. Also, change from outright -EINVAL failure on an unaligned request, to instead failing with -ENOTSUP to trigger a read-modify-write fallback, particularly since the block layer should be honoring bs->request_alignment to avoid -EINVAL on read/write requests. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	74021bc497	block: Switch bdrv_write_zeroes() to byte interface Rename to bdrv_pwrite_zeroes() to let the compiler ensure we cater to the updated semantics. Do the same for bdrv_co_write_zeroes(). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	d05aa8bb4a	block: Add .bdrv_co_pwrite_zeroes() Update bdrv_co_do_write_zeroes() to be byte-based, and select between the new byte-based bdrv_co_pwrite_zeroes() or the old bdrv_co_write_zeroes(). The next patches will convert drivers, then remove the old interface. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	cf081fca4e	block: Track write zero limits in bytes Another step towards removing sector-based interfaces: convert the maximum write and minimum alignment values from sectors to bytes. Rename the variables to let the compiler check that all users are converted to the new semantics. The maximum remains an int as long as BDRV_REQUEST_MAX_SECTORS is constrained by INT_MAX (this means that we can't even support a 2G write_zeroes, but just under it) - changing operation lengths to unsigned or to 64-bits is a much bigger audit, and debatable if we even want to do it (since at the core, a 32-bit platform will still have ssize_t as its underlying limit on write()). Meanwhile, alignment is changed to 'uint32_t', since it makes no sense to have an alignment larger than the maximum write, and less painful to use an unsigned type with well-defined behavior in bit operations than to have to worry about what happens if a driver mistakenly supplies a negative alignment. Add an assert that no one was trying to use sectors to get a write zeroes larger than 2G, and therefore that a later conversion to bytes won't be impacted by keeping the limit at 32 bits. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	8b18474451	iscsi: Use block size as minimum zero/discard alignment If hardware does not advertise a minimum zero/discard alignment, we still want to guarantee that the block layer will align requests to our blocks, rather than the arbitrary 512-byte BDRV sector size. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	ebb718a5c7	qcow2: Catch more unaligned write_zero into zero cluster is_zero_cluster() and is_zero_cluster_top_locked() are used only by qcow2_co_write_zeroes(). The former is too broad (we don't care if the sectors we are about to overwrite are non-zero, only that all other sectors in the cluster are zero), so it needs to be called up to twice but with smaller limits - rename it along with adding the neeeded parameter. The latter can be inlined for more compact code. The testsuite change shows that we now have a sparser top file when an unaligned write_zeroes overwrites the only portion of the backing file with data. Based on a patch proposal by Denis V. Lunev. CC: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Denis V. Lunev	5a64e94251	qcow2: add tracepoints for qcow2_co_write_zeroes This patch follows guidelines of all other tracepoints in qcow2, like ones in qcow2_co_writev. I think that they should dump values in the same quantities or be changed all together. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Eric Blake <eblake@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Message-Id: <1463476543-3087-4-git-send-email-den@openvz.org> [eblake: typo fix in commit message] Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Denis V. Lunev	ba142846b0	qcow2: simplify logic in qcow2_co_write_zeroes Unaligned requests will occupy only one cluster. This is true since the previous commit. Simplify the code taking this consideration into account. In other words, the caller is now buggy if it ever passes us an unaligned request that crosses cluster boundaries (the only requests that can cross boundaries will be aligned). There are no other changes so far. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Eric Blake <eblake@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Message-Id: <1463476543-3087-3-git-send-email-den@openvz.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Denis V. Lunev	443668ca40	block: split write_zeroes always We should split requests even if they are less than write_zeroes_alignment. For example we can have the following request: offset 62k size 4k write_zeroes_alignment 64k The original code sent 1 request covering 2 qcow2 clusters, and resulted in both clusters being allocated. But by splitting the request, we can cater to the case where one of the two clusters can be zeroed as a whole, for only 1 cluster allocated after the operation. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Eric Blake <eblake@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Message-Id: <1463476543-3087-2-git-send-email-den@openvz.org> [eblake: Avoid exceeding nb_sectors, hoist alignment checks out of loop, and update testsuite to show that patch works] Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Peter Maydell	6ed5546fa7	trivial patches for 2016-06-07 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJXVuZUAAoJEL7lnXSkw9fb1VQH/ioMIWojXXTQS4sDmelGMMlK P3RnXmTtCqA5QYy5BXyngmjNY29ne2o5UiRm8uESLHbsf0dpQyUudLoVQhXIG10n TtT4veK8boO+96heCCBbA3X53SIo/Ta+gHmQlCD24wqgtZphbdXnjRTSEA8IT3N6 1mIyEfG/avMUgq/qTF2JoeKBXscY6CVuQdH/a8QzQ0Cnmf6nBJ0NfZ7XiuEQAJ2S eVUkeh3SJKrJeMrk9yw8rnxKBIP27IwbEvrVcryMOdBltiW91rI/79CveANxY5S0 6TeZrIFyb11pAPz741BpzUWjwEzs4PB6ZeJ8uy28ldJPbQfLIe18ELIV7vhpzVQ= =foxH -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2016-06-07' into staging trivial patches for 2016-06-07 # gpg: Signature made Tue 07 Jun 2016 16:20:52 BST # gpg: using RSA key 0xBEE59D74A4C3D7DB # gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>" # gpg: aka "Michael Tokarev <mjt@corpit.ru>" # gpg: aka "Michael Tokarev <mjt@debian.org>" * remotes/mjt/tags/pull-trivial-patches-2016-06-07: (51 commits) hbitmap: Use DIV_ROUND_UP qemu-timer: Use DIV_ROUND_UP linux-user: Use DIV_ROUND_UP slirp: Use DIV_ROUND_UP usb: Use DIV_ROUND_UP rocker: Use DIV_ROUND_UP SPICE: Use DIV_ROUND_UP audio: Use DIV_ROUND_UP xen: Use DIV_ROUND_UP crypto: Use DIV_ROUND_UP block: Use DIV_ROUND_UP qed: Use DIV_ROUND_UP qcow/qcow2: Use DIV_ROUND_UP parallels: Use DIV_ROUND_UP coccinelle: use macro DIV_ROUND_UP instead of (((n) + (d) - 1) /(d)) thunk: Rename args and fields in host-target bitmask conversion code thunk: Drop unused NO_THUNK_TYPE_SIZE guards qemu-common.h: Drop WORDS_ALIGNED define host-utils: Prefer 'false' for bool type docs/multi-thread-compression: Fix wrong command string ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-06-07 16:34:45 +01:00
Laurent Vivier	13385ae168	block: Use DIV_ROUND_UP Replace (((n) + (d) - 1) /(d)) by DIV_ROUND_UP(n,d). This patch is the result of coccinelle script scripts/coccinelle/round.cocci CC: qemu-block@nongnu.org Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2016-06-07 18:19:24 +03:00
Laurent Vivier	c41a73ffaf	qed: Use DIV_ROUND_UP Replace (((n) + (d) - 1) /(d)) by DIV_ROUND_UP(n,d). This patch is the result of coccinelle script scripts/coccinelle/round.cocci CC: qemu-block@nongnu.org Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2016-06-07 18:19:24 +03:00
Laurent Vivier	d737b78cc1	qcow/qcow2: Use DIV_ROUND_UP Replace (((n) + (d) - 1) /(d)) by DIV_ROUND_UP(n,d). This patch is the result of coccinelle script scripts/coccinelle/round.cocci CC: qemu-block@nongnu.org Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2016-06-07 18:19:24 +03:00
Laurent Vivier	969401fe76	parallels: Use DIV_ROUND_UP Replace (((n) + (d) - 1) /(d)) by DIV_ROUND_UP(n,d). This patch is the result of coccinelle script scripts/coccinelle/round.cocci CC: qemu-block@nongnu.org Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2016-06-07 18:19:24 +03:00
Peter Maydell	030c98aff1	all: Remove unnecessary glib.h includes Remove glib.h includes, as it is provided by osdep.h. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2016-06-07 18:19:24 +03:00
Fam Zheng	c8a9fd8071	block: Drop bdrv_ioctl_bh_cb Similar to the "!drv \|\| !drv->bdrv_aio_ioctl" case above, here it is okay to set co.ret and return. As pointed out by Paolo, a BH will be created as necessary by the caller (bdrv_co_maybe_schedule_bh). Besides, as pointed out by Kevin, "data" was leaked before. Reported-by: Kevin Wolf <kwolf@redhat.com> Reported-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20160601015223.19277-1-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-07 14:40:51 +01:00
Eric Blake	41574268b7	block: Move BlockRequest type to io.c I was thrown by the fact that the public type BlockRequest had an anonymous union, but no obvious discriminator. Turns out that the only client of the second branch of the union was code internal to io.c, now that commit `91c6e4b` killed public multiwrite, so move it into io.c and improve the comments. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1463699150-19445-1-git-send-email-eblake@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2016-06-07 14:40:51 +01:00
Peter Lieven	117bc3fa22	block/io: optimize bdrv_co_pwritev for small requests in a read-modify-write cycle a small request might cause head and tail to fall into the same aligned block. Currently QEMU reads the same block twice in this case which is not necessary. Signed-off-by: Peter Lieven <pl@kamp.de> Message-id: 1464607873-28206-1-git-send-email-pl@kamp.de Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-07 14:40:51 +01:00
Kevin Wolf	a7944dfad0	block/io: Remove unused bdrv_aio_write_zeroes() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1464599852-15392-1-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-07 14:40:51 +01:00
Peter Lieven	a6b3167fa0	block/iscsi: avoid potential overflow of acb->task->cdb at least in the path via virtio-blk the maximum size is not restricted. Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <1464080368-29584-1-git-send-email-pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-05-29 09:11:11 +02:00
Kevin Wolf	4653456a5f	commit: Use BlockBackend for I/O This changes the commit block job to use the job's BlockBackend for performing its I/O. job->bs isn't used by the commit code any more afterwards. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-25 19:04:21 +02:00
Kevin Wolf	5c438bc68c	backup: Use BlockBackend for I/O This changes the backup block job to use the job's BlockBackend for performing its I/O. job->bs isn't used by the backup code any more afterwards. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-25 19:04:21 +02:00
Kevin Wolf	8543c27414	backup: Remove bs parameter from backup_do_cow() Now that we pass the job to the function, bs is implied by that. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2016-05-25 19:04:21 +02:00
John Snow	12b3e52e48	backup: Pack Notifier within BackupBlockJob Instead of relying on peeking at bs->job, we want to explicitly get a reference to the job that was involved in this notifier callback. Pack the Notifier inside of the BackupBlockJob so we can use container_of to get a reference back to the BackupBlockJob object. This cuts out one more case where we rely unnecessarily on bs->job. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-25 19:04:21 +02:00
Kevin Wolf	91ab688379	backup: Don't leak BackupBlockJob in error path Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2016-05-25 19:04:21 +02:00
Kevin Wolf	e253f4b897	mirror: Use BlockBackend for I/O This changes the mirror block job to use the job's BlockBackend for performing its I/O. job->bs isn't used by the mirroring code any more afterwards. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-25 19:04:21 +02:00
Kevin Wolf	b880481579	mirror: Allow target that already has a BlockBackend We had to forbid mirroring to a target BDS that already had a BB attached because the node swapping at job completion would add a second BB and we didn't support multiple BBs on a single BDS at the time. Now we do, so we can lift the restriction. As we allow additional BlockBackends for the target, we must expect other users to be sending requests. There may no requests be in flight during the graph modification, so we have to drain those users now. The core part of this patch is a revert of commit `40365552`. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-25 19:04:21 +02:00
Kevin Wolf	03e35d820d	stream: Use BlockBackend for I/O This changes the streaming block job to use the job's BlockBackend for performing the COR reads. job->bs isn't used by the streaming code any more afterwards. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-25 19:04:21 +02:00
Kevin Wolf	1e98fefd95	block: Make blk_co_preadv/pwritev() public Also add trace points now that the function can be directly called. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2016-05-25 19:04:21 +02:00
Kevin Wolf	b6d2e59995	block: Convert block job core to BlockBackend This adds a new BlockBackend field to the BlockJob struct, which coexists with the BlockDriverState while converting the individual jobs. When creating a block job, a new BlockBackend is created on top of the given BlockDriverState, and it is destroyed when the BlockJob ends. The reference to the BDS is now held by the BlockBackend instead of calling bdrv_ref/unref manually. We have to be careful when we use bdrv_replace_in_backing_chain() in block jobs because this changes the BDS that job->blk points to. At the moment block jobs are too tightly coupled with their BDS, so that moving a job to another BDS isn't easily possible; therefore, we need to just manually undo this change afterwards. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-25 19:04:21 +02:00
Kevin Wolf	0c3169dffa	block: Default to enabled write cache in blk_new() The existing users of the function are: 1. blk_new_open(), which already enabled the write cache 2. Some test cases that don't care about the setting 3. blockdev_init() for empty drives, where the cache mode is overridden with the value from the options when a medium is inserted Therefore, this patch doesn't change the current behaviour. It will be convenient, however, for additional users of blk_new() (like block jobs) if the most sensible WCE setting is the default. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2016-05-25 19:04:21 +02:00
Eric Blake	d004bd52aa	block: Rename blk_write_zeroes() Commit `983a1600` changed the semantics of blk_write_zeroes() to be byte-based rather than sector-based, but did not change the name, which is an open invitation for other code to misuse the function. Renaming to pwrite_zeroes() makes it more in line with other byte-based interfaces, and will help make it easier to track which remaining write_zeroes interfaces still need conversion. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-25 19:04:21 +02:00
Kevin Wolf	36fe13317b	block: Fix reconfiguring graph with drained nodes When changing the BlockDriverState that a BdrvChild points to while the node is currently drained, we must call the .drained_end() parent callback. Conversely, when this means attaching a new node that is already drained, we need to call .drained_begin(). bdrv_root_attach_child() takes now an opaque parameter, which is needed because the callbacks must also be called if we're attaching a new child to the BlockBackend when the root node is already drained, and they need a way to identify the BlockBackend. Previously, child->opaque was set too late and the callbacks would still see it as NULL. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-25 19:04:10 +02:00
Kevin Wolf	6820643fdb	block: Make bdrv_drain() use bdrv_drained_begin/end() Until now, bdrv_drained_begin() used bdrv_drain() internally to drain the queue. This is kind of backwards and caused quiescing code to be duplicated because bdrv_drained_begin() had to ensure that no new requests come in even after bdrv_drain() returns, whereas bdrv_drain() had to have them because it could be called from other places. Instead move the bdrv_drain() code to bdrv_drained_begin() and make bdrv_drain() a simple wrapper around bdrv_drained_begin/end(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-25 19:04:10 +02:00
Max Reitz	109525ad6a	block: Drop errp parameter from blk_new() blk_new() cannot fail so its Error ** parameter has become superfluous. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-25 19:04:10 +02:00
Max Reitz	5b3639371c	block: Make bdrv_open() return a BDS There are no callers to bdrv_open() or bdrv_open_inherit() left that pass a pointer to a non-NULL BDS pointer as the first argument of these functions, so we can finally drop that parameter and just make them return the new BDS. Generally, the following pattern is applied: bs = NULL; ret = bdrv_open(&bs, ..., &local_err); if (ret < 0) { error_propagate(errp, local_err); ... } by bs = bdrv_open(..., errp); if (!bs) { ret = -EINVAL; ... } Of course, there are only a few instances where the pattern is really pure. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-25 19:04:10 +02:00
Max Reitz	28eb9b12f7	block: Drop blk_new_with_bs() Its only caller is blk_new_open(), so we can just inline it there. The bdrv_new_root() call is dropped in the process because we can just let bdrv_open() create the BDS. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-25 19:04:10 +02:00
Kevin Wolf	88be7b4be4	block: Fix bdrv_next() memory leak The bdrv_next() users all leaked the BdrvNextIterator after completing the iteration. Simply changing bdrv_next() to free the iterator before returning NULL at the end of list doesn't work because some callers exit the loop before looking at all BDSes. This patch moves the BdrvNextIterator from the heap to the stack of the caller and switches to a bdrv_first()/bdrv_next() interface for initialising the iterator. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-25 19:04:10 +02:00
Vadim Rozenfeld	644c6869d3	iscsi: pass SCSI status back for SG_IO Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-05-23 16:53:46 +02:00
Peter Maydell	6bd8ab6889	Block layer patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJXPdcnAAoJEH8JsnLIjy/WPEoQAK5vlRYqvQrrevMJviT4ZPUX cGGbabOcmfTBHGAgGwRLg+vQ043Sgu14JjtNbrsoSsBwAl9eAhAVGOimiieaY3vR 35OOUxECswArJzK8I4XRx4KhI871Yq+8kHILPoXpF8L7YU38Zqa1D5z2dcOKYrL8 Oy5IEfd1+Qfpxg/txKIioP5BzKVpz3V9/8GRNo0iAl7c806NoYFpnM0TXsed9Fjr YvUn1AdGHUF0/pV6vU46Qxz4yy1Q+cuoh923z6+YvXTcwok7PbjhAQWWA0qvSTuG otnPKMPBhYa6g7XOPD9Mra986vs6vBEGiPS5uqXoM5FqxF4Hc9LIeHEr+3hb+m53 NLOmGqfct0USY9r6rXsOhZQb7nZCDuhaedv33ZfgE0T0cYxIilHs5PhgFAWfthhP aNJYlzbJUhqhTi7CJrJcFoGbNQDxux5qtlFo43M4vz/WYYDrwu8P7O3YO+sH0jU1 EXJnbtztQvwfsiIEbIzvBRQl3XD9QmCfYO3lRbOwdCnd3ZLy47E2bze4gV3DwzK7 CsBr+sa49xI8LMswPxTms+A+Inndn8O0mGI32Zi4nBKapjpy5Fb4YG6z8+WPfTKp Il1PsSgG84wm4YxGWty/UI4DoPY+hqlIIz1CNuRRNQtZTybLgNCK8ZKYbVlRppmf pGPpQ8pmqkeFLmx8hecm =ntKz -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches # gpg: Signature made Thu 19 May 2016 16:09:27 BST using RSA key ID C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" * remotes/kevin/tags/for-upstream: (31 commits) qemu-iotests: Fix regression in 136 on aio_read invalid qemu-iotests: Simplify 109 with unaligned qemu-img compare qemu-io: Fix recent UI updates block: clarify error message for qmp-eject qemu-iotests: Some more write_zeroes tests qcow2: Fix write_zeroes with partially allocated backing file cluster qcow2: fix condition in is_zero_cluster block: Propagate AioContext change to all children block: Remove BlockDriverState.blk block: Don't return throttling info in query-named-block-nodes block: Avoid bs->blk in bdrv_next() block: Add bdrv_has_blk() block: Remove bdrv_aio_multiwrite() blockjob: Don't touch BDS iostatus blockjob: Don't set iostatus of target block: User BdrvChild callback for device name block: Use BdrvChild callbacks for change_media/resize block: Don't check throttled reqs in bdrv_requests_pending() Revert "block: Forbid I/O throttling on nodes with multiple parents for 2.6" block: Remove bdrv_move_feature_fields() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-05-19 16:54:12 +01:00
Kevin Wolf	5efdf53227	qcow2: Fix write_zeroes with partially allocated backing file cluster In order to correctly check whether a given cluster is read as zero, we don't only need to check whether bdrv_get_block_status_above() sets BDRV_BLOCK_ZERO, but also if all sectors for the whole cluster have the same status. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org>	2016-05-19 16:45:31 +02:00
Denis V. Lunev	f575f145f4	qcow2: fix condition in is_zero_cluster We should check for (res & BDRV_BLOCK_ZERO) only. The situation when we will have !(res & BDRV_BLOCK_DATA) and will not have BDRV_BLOCK_ZERO is not possible for images with bdi.unallocated_blocks_are_zero == true. For those images where it's false, however, it can happen and we must not consider the data zeroed then or we would corrupt the image. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-19 16:45:31 +02:00
Max Reitz	b97511c7bc	block: Propagate AioContext change to all children Instead of propagating any change of a BDS's AioContext only to its file and backing children and letting driver-specific code do the rest, just propagate it to all and drop the thus superfluous implementations of bdrv_{at,de}tach_aio_context() in Quorum, blkverify and VMDK. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	1f0c461b82	block: Remove BlockDriverState.blk This patch removes the remaining users of bs->blk, which will allow us to have multiple BBs on top of a single BDS. In the meantime, all checks that are currently in place to prevent the user from creating such setups can be switched to bdrv_has_blk() instead of accessing BDS.blk. Future patches can allow them and e.g. enable users to mirror to a block device that already has a BlockBackend on it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	79c719b755	block: Don't return throttling info in query-named-block-nodes query-named-block-nodes should not return information that is related to the attached BlockBackend rather than the node itself, so throttling information needs to be removed from it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	7c8eece45b	block: Avoid bs->blk in bdrv_next() We need to introduce a separate BdrvNextIterator struct that can keep more state than just the current BDS in order to avoid using the bs->blk pointer. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	dde33812a8	block: Add bdrv_has_blk() In many cases we just want to know whether a BDS has at least one BB attached, without needing to know the exact BB that is attached. In contrast to bs->blk, this is still a valid question when more than one BB can be attached, so just answer it by checking the parents list. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	91c6e4b7bb	block: Remove bdrv_aio_multiwrite() Since virtio-blk implements request merging itself these days, the only remaining users are test cases for the function. That doesn't make the function exactly useful any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	66a0fae438	blockjob: Don't touch BDS iostatus Block jobs don't actually make use of the iostatus for their BDSes, but they manage a separate block job iostatus. Still, they require that it is enabled for the source BDS and they enable it automatically for the target and set the error handling mode - which ends up never being used by the job. This patch removes all of the BDS iostatus handling from the block job, which removes another few bs->blk accesses. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	81e254dc83	blockjob: Don't set iostatus of target When block job errors were introduced, we assigned the iostatus of the target BDS "just in case". The field has never been accessible for the user because the target isn't listed in query-block. Before we can allow the user to have a second BlockBackend on the target, we need to clean this up. If anything, we would want to set the iostatus for the internal BB of the job (which we can always do later), but certainly not for a separate BB which the job doesn't even use. As a nice side effect, this gets us rid of another bs->blk use. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	4c265bf9f4	block: User BdrvChild callback for device name In order to get rid of bs->blk for bdrv_get_device_name() and bdrv_get_device_or_node_name(), ask all parents for their name and simply pick the first one. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	5c8cab4808	block: Use BdrvChild callbacks for change_media/resize We want to get rid of BlockDriverState.blk in order to allow multiple BlockBackends per BDS. Converting the device callbacks in block.c (which assume a single BlockBackend) to per-child callbacks gets us rid of the first few instances. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	cbe1beb7a1	block: Don't check throttled reqs in bdrv_requests_pending() Checking whether there are throttled requests requires going to the associated BlockBackend, which we want to avoid. All users of bdrv_requests_pending() in block/io.c already call bdrv_parent_drained_begin() first, which restarts all throttled requests, so no throttled requests can be left here and this is removal of dead code. The remaining users (assertions during graph manipulation in block.c) don't care about requests that are still queued in the BlockBackend and haven't been issued for a BlockDriverState yet. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	7ca7f0f6db	block: Decouple throttling from BlockDriverState This moves the throttling related part of the BDS life cycle management to BlockBackend. The throttling group reference is now kept even when no medium is inserted. With this commit, throttling isn't disabled and then re-enabled any more during graph reconfiguration. This fixes the temporary breakage of I/O throttling when used with live snapshots or block jobs that manipulate the graph. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	bb9aaecaf1	block/io: Quiesce parents between drained_begin/end So far, bdrv_parent_drained_begin/end() was called for the duration of the actual bdrv_drain() at the beginning of a drained section, but we really should keep parents quiesced until the end of the drained section. This does not actually change behaviour at this point because the only user of the .drained_begin/end BdrvChildRole callback is I/O throttling, which already doesn't send any new requests after flushing its queue in .drained_begin. The patch merely removes a trap for future users. Reported-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	c2066af051	block: Drain throttling queue with BdrvChild callback This removes the last part of I/O throttling from block/io.c and moves it to the BlockBackend. Instead of having knowledge about throttling inside io.c, we can call a BdrvChild callback .drained_begin/end, which happens to drain the throttled requests for BlockBackend parents. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	22aa8b246a	block: Introduce BdrvChild.opaque BlockBackends use it to get a back pointer from BdrvChild to BlockBackend in any BdrvChildRole callbacks. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	97148076e8	block: Move I/O throttling configuration functions to BlockBackend Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	441565b279	block: Move actual I/O throttling to BlockBackend Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	27ccdd5259	block: Move throttling fields from BDS to BB This patch changes where the throttling state is stored (used to be the BlockDriverState, now it is the BlockBackend), but it doesn't actually make it a BB level feature yet. For example, throttling is still disabled when the BDS is detached from the BB. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	49d2165d7d	block: Convert throttle_group_get_name() to BlockBackend Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:29 +02:00
Kevin Wolf	31dce3ccca	block: throttle-groups: Use BlockBackend pointers internally As a first step towards moving I/O throttling to the BlockBackend level, this patch changes all pointers in struct ThrottleGroup from referencing a BlockDriverState to referencing a BlockBackend. This change is valid because we made sure that throttling can only be enabled on BDSes which have a BB attached. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:29 +02:00
Kevin Wolf	f2cd875d54	block: Introduce BlockBackendPublic Some features, like I/O throttling, are implemented outside block-backend.c, but still want to keep information in BlockBackend, e.g. list entries that allow keeping a list of BlockBackends. In order to avoid exposing the whole struct layout in the public header file, this patch introduces an embedded public struct where such information can be added and a pair of functions to convert between BlockBackend and BlockBackendPublic. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:29 +02:00
Kevin Wolf	a5614993d7	block: Make sure throttled BDSes always have a BB It was already true in principle that a throttled BDS always has a BB attached, except that the order of operations while attaching or detaching a BDS to/from a BB wasn't careful enough. This commit breaks graph manipulations while I/O throttling is enabled. It would have been possible to keep things working with some temporary hacks, but quite cumbersome, so it's not worth the hassle. We'll fix things again in a minute. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:29 +02:00
Paolo Bonzini	58369e22cf	qemu-common: stop including qemu/bswap.h from qemu-common.h Move it to the actual users. There are still a few includes of qemu/bswap.h in headers; removing them is left for future work. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-05-19 16:42:28 +02:00
Peter Maydell	f68419eee9	Block layer patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJXNIcBAAoJEH8JsnLIjy/WiUQP/Rzfo8pe7TWA2InxdcDOPsx4 2/tHHJdVkffnNX5rdBvc0mOUZNJxej0NtJu2e63BB+ydYju//xw8gruKU7TR+Nd3 nPNSsqk80prK3RNgWu7qymBvIkHDDcDQhlp48HKq+dxrfConXtHmoXapGsqc0S47 xu03oC6WzSIyLf7TLytcjUmEprQSaCGOwsb/XaHAWL750fFAGcdy/K5PWBpUv6DN T0jZ3u4UneE1jeabRmqAwjgDJXC9l6riH9fP/ZtYhgNlNj84zlMXajUHSULhGknP cTGjwwg9tOvhcjTdhdRmWlvG1m0T77ZX3icfZLhcTdb/Uz68NXVqs8P25IGV9McD DPrb3T/M8JUoqLXJxIpxUm2Levof5v0dUF1PHmN5bT7pshcqv/1J7v8Fdtf9l9mp zI0+FK1TZ102C0H2F7AWYZSlo2EfNUSd02QQx6MbfDokDIlIxY+EgP1/Es5XlkqC wc7HrJvq+uix2zXw9bn9Vg9p/nDuxlRx+ppRRarNNRonaqTrx/1qAaas4bsqc9Gz H6gxw7BHybm0TZFdHqAdIonpesecYw6yWUXT/mQehbfphsmQmu/d2HvF2C9uUm4X O0JduBlKOTm2hMcg5qL6Gko8WaQIctdCJH/1Onts92cZnm8Vr/9zcmMgwGoCd7sE +t6Yg0jqpTUJwhZhIuCw =NbjJ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches # gpg: Signature made Thu 12 May 2016 14:37:05 BST using RSA key ID C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" * remotes/kevin/tags/for-upstream: (69 commits) qemu-iotests: iotests: fail hard if not run via "check" block: enable testing of LUKS driver with block I/O tests block: add support for encryption secrets in block I/O tests block: add support for --image-opts in block I/O tests qemu-io: Add 'write -z -u' to test MAY_UNMAP flag qemu-io: Add 'write -f' to test FUA flag qemu-io: Allow unaligned access by default qemu-io: Use bool for command line flags qemu-io: Make 'open' subcommand more like command line qemu-io: Add missing option documentation qmp: add monitor command to add/remove a child quorum: implement bdrv_add_child() and bdrv_del_child() Add new block driver interface to add/delete a BDS's child qemu-img: check block status of backing file when converting. iotests: fix the redirection order in 083 block: Inactivate all children block: Drop superfluous invalidating bs->file from drivers block: Invalidate all children nbd: Simplify client FUA handling block: Honor BDRV_REQ_FUA during write_zeroes ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-05-12 16:33:40 +01:00
Wen Congyang	98292c61bc	quorum: implement bdrv_add_child() and bdrv_del_child() Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Message-id: 1462865799-19402-3-git-send-email-xiecl.fnst@cn.fujitsu.com Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-05-12 15:33:23 +02:00
Fam Zheng	c9e9e9c66c	block: Drop superfluous invalidating bs->file from drivers Now they are invalidated by the block layer, so it's not necessary to do this in block drivers' implementations of .bdrv_invalidate_cache. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:09 +02:00
Eric Blake	52a4650574	nbd: Simplify client FUA handling Now that the block layer honors per-bds FUA support, we don't have to duplicate the fallback flush at the NBD layer. The static function nbd_co_writev_flags() is no longer needed, and the driver can just directly use nbd_client_co_writev(). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:09 +02:00
Eric Blake	465fe887cc	block: Honor BDRV_REQ_FUA during write_zeroes The block layer has a couple of cases where it can lose Force Unit Access semantics when writing a large block of zeroes, such that the request returns before the zeroes have been guaranteed to land on underlying media. SCSI does not support FUA during WRITESAME(10/16); FUA is only supported if it falls back to WRITE(10/16). But where the underlying device is new enough to not need a fallback, it means that any upper layer request with FUA semantics was silently ignoring BDRV_REQ_FUA. Conversely, NBD has situations where it can support FUA but not ZERO_WRITE; when that happens, the generic block layer fallback to bdrv_driver_pwritev() (or the older bdrv_co_writev() in qemu 2.6) was losing the FUA flag. The problem of losing flags unrelated to ZERO_WRITE has been latent in bdrv_co_do_write_zeroes() since commit `aa7bfbff`, but back then, it did not matter because there was no FUA flag. It became observable when commit `93f5e6d8` paved the way for flags that can impact correctness, when we should have been using bdrv_co_writev_flags() with modified flags. Compare to commit `9eeb6dd`, which got flag manipulation right in bdrv_co_do_zero_pwritev(). Symptoms: I tested with qemu-io with default writethrough cache (which is supposed to use FUA semantics on every write), and targetted an NBD client connected to a server that intentionally did not advertise NBD_FLAG_SEND_FUA. When doing 'write 0 512', the NBD client sent two operations (NBD_CMD_WRITE then NBD_CMD_FLUSH) to get the fallback FUA semantics; but when doing 'write -z 0 512', the NBD client sent only NBD_CMD_WRITE. The fix is do to a cleanup bdrv_co_flush() at the end of the operation if any step in the middle relied on a BDS that does not natively support FUA for that step (note that we don't need to flush after every operation, if the operation is broken into chunks based on bounce-buffer sizing). Each BDS gains a new flag .supported_zero_flags, which parallels the use of .supported_write_flags but only when accessing a zero write operation (the flags MUST be different, because of SCSI having different semantics based on WRITE vs. WRITESAME; and also because BDRV_REQ_MAY_UNMAP only makes sense on zero writes). Also fix some documentation to describe -ENOTSUP semantics, particularly since iscsi depends on those semantics. Down the road, we may want to add a driver where its .bdrv_co_pwritev() honors all three of BDRV_REQ_FUA, BDRV_REQ_ZERO_WRITE, and BDRV_REQ_MAY_UNMAP, and advertise this via bs->supported_write_flags for blocks opened by that driver; such a driver should NOT supply .bdrv_co_write_zeroes nor .supported_zero_flags. But none of the drivers touched in this patch want to do that (the act of writing zeroes is different enough from normal writes to deserve a second callback). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:09 +02:00
Eric Blake	4df863f336	block: Make supported_write_flags a per-bds property Pre-patch, .supported_write_flags lives at the driver level, which means we are blindly declaring that all block devices using a given driver will either equally support FUA, or that we need a fallback at the block layer. But there are drivers where FUA support is a per-block decision: the NBD block driver is dependent on the remote server advertising NBD_FLAG_SEND_FUA (and has fallback code to duplicate the flush that the block layer would do if NBD had not set .supported_write_flags); and the iscsi block driver is dependent on the mode sense bits advertised by the underlying device (and is currently silently ignoring FUA requests if the underlying device does not support FUA). The fix is to make supported flags as a per-BDS option, set during .bdrv_open(). This patch moves the variable and fixes NBD and iscsi to set it only conditionally; later patches will then further simplify the NBD driver to quit duplicating work done at the block layer, as well as tackle the fact that SCSI does not support FUA semantics on WRITESAME(10/16) but only on WRITE(10/16). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:09 +02:00
Denis V. Lunev	2928abce6d	qcow2: improve qcow2_co_write_zeroes() There is a possibility that qcow2_co_write_zeroes() will be called with the partial block. This could be synthetically triggered with qemu-io -c "write -z 32k 4k" and can happen in the real life in qemu-nbd. The latter happens under the following conditions: (1) qemu-nbd is started with --detect-zeroes=on and is connected to the kernel NBD client (2) third party program opens kernel NBD device with O_DIRECT (3) third party program performs write operation with memory buffer not aligned to the page In this case qcow2_co_write_zeroes() is unable to perform the operation and mark entire cluster as zeroed and returns ENOTSUP. Thus the caller switches to non-optimized version and writes real zeroes to the disk. The patch creates a shortcut. If the block is read as zeroes, f.e. if it is unallocated, the request is extended to cover full block. User-visible situation with this block is not changed. Before the patch the block is filled in the image with real zeroes. After that patch the block is marked as zeroed in metadata. Thus any subsequent changes in backing store chain are not affected. Kevin, thank you for a cool suggestion. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:09 +02:00
Eric Blake	7b1deac84e	block: Kill unused sector-based blk_* functions Now that there are no remaining clients, we can drop the sector-based blk_read(), blk_write(), blk_aio_readv(), and blk_aio_writev(). Sadly, there are still remaining sector-based interfaces, such as blk_*discard(), or blk_write_compressed(); those will have to wait for another day. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:09 +02:00
Eric Blake	60cb2fa7eb	block: Introduce byte-based aio read/write blk_aio_readv() and blk_aio_writev() are annoying in that they can't access sub-sector granularity, and cannot pass flags. Also, they require the caller to pass redundant information about the size of the I/O (qiov->size in bytes must match nb_sectors in sectors). Add new blk_aio_preadv() and blk_aio_pwritev() functions to fix the flaws. The next few patches will upgrade callers, then finally delete the old interfaces. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:08 +02:00
Eric Blake	983a160050	block: Switch blk_*write_zeroes() to byte interface Sector-based blk_write() should die; convert the one-off variant blk_write_zeroes() to use an offset/count interface instead. Likewise for blk_co_write_zeroes() and blk_aio_write_zeroes(). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:08 +02:00
Eric Blake	b7d17f9fa4	block: Switch blk_read_unthrottled() to byte interface Sector-based blk_read() should die; convert the one-off variant blk_read_unthrottled(). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:08 +02:00
Eric Blake	8341f00dc2	block: Allow BDRV_REQ_FUA through blk_pwrite() We have several block drivers that understand BDRV_REQ_FUA, and emulate it in the block layer for the rest by a full flush. But without a way to actually request BDRV_REQ_FUA during a pass-through blk_pwrite(), FUA-aware block drivers like NBD are forced to repeat the emulation logic of a full flush regardless of whether the backend they are writing to could do it more efficiently. This patch just wires up a flags argument; followup patches will actually make use of it in the NBD driver and in qemu-io. Signed-off-by: Eric Blake <eblake@redhat.com> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:08 +02:00
Janne Karhunen	f249924e96	Allow users to specify the vmdk virtual hardware version. Vmdk images have metadata to indicate the vmware virtual hardware version image was created/tested to run with. Allow users to specify that version via new 'hwversion' option. [ kwolf: Adjust qemu-iotests common.filter ] Signed-off-by: Janne Karhunen <Janne.Karhunen@gmail.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:08 +02:00
Zhou Jie	ed79f37d9b	block: always compile-check debug prints Files with conditional debug statements should ensure that the printf is always compiled. This prevents bitrot of the format string of the debug statement. And switch debug output to stderr. Signed-off-by: Zhou Jie <zhoujie2011@cn.fujitsu.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	e3ddef25e9	block: Remove BlockDriver.bdrv_read/write There are no block drivers left that implement the old .bdrv_read/write interface, so it can be removed now. This gets us rid of the corresponding emulation functions, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	4575eb496d	vvfat: Implement .bdrv_co_preadv/pwritev interfaces This doesn't really convert any of the actual vvfat logic to use vectored I/O (and it's doubtful whether that would make sense), but instead just adapts the wrappers to the modern interface. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	513b0f026b	vpc: Implement .bdrv_co_pwritev() interface Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	d46b7cc680	vpc: Implement .bdrv_co_preadv() interface Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	37b1d7d8c9	vmdk: Implement .bdrv_co_pwritev() interface Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	f10cc24359	vmdk: Implement .bdrv_co_preadv() interface Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	a844a2b0d4	vmdk: Add vmdk_find_offset_in_cluster() This is a byte granularity version of vmdk_find_index_in_cluster(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	fde9d56f5b	vdi: Implement .bdrv_co_pwritev() interface Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	0865bb6f04	vdi: Implement .bdrv_co_preadv() interface Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	3edf1e73d5	dmg: Implement .bdrv_co_preadv() interface This implements .bdrv_co_preadv() for the cloop block driver. While updating the error paths, change -1 to a valid -errno code. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	5cd230819e	cloop: Implement .bdrv_co_preadv() interface This implements .bdrv_co_preadv() for the cloop block driver. While updating the error paths, change -1 to a valid -errno code. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	3b8fd33011	bochs: Implement .bdrv_co_preadv() interface Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	3fb06697ae	block: Introduce .bdrv_co_preadv/pwritev BlockDriver function Many parts of the block layer are already byte granularity. The block driver interface, however, was still missing an interface that allows making use of this. This patch introduces a new BlockDriver interface, which is based on coroutines, vectored, has flags and uses a byte granularity. This is now the preferred interface for new drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	cab3a3563c	block: Rename bdrv_co_do_preadv/writev to bdrv_co_preadv/writev It used to be an internal helper function just for implementing bdrv_co_do_readv/writev(), but now that it's a public interface, it deserves a name without "do" in it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	0884447382	block: Support AIO drivers in bdrv_driver_preadv/pwritev() Instead of registering emulation functions as .bdrv_co_writev, just directly check whether the function is there or not, and use the AIO interface if it isn't. This makes the read/write functions more consistent with how things are done in other places (flush, discard, etc.) Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:07 +02:00
Kevin Wolf	78a07294d5	block: Introduce bdrv_driver_pwritev() This is a function that simply calls into the block driver for doing a write, providing the byte granularity interface we want to eventually have everywhere, and using whatever interface that driver supports. This one is a bit more interesting than the version for reads: It adds support for .bdrv_co_writev_flags() everywhere, so that drivers implementing this function can drop .bdrv_co_writev() now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:07 +02:00
Kevin Wolf	166fe96051	block: Introduce bdrv_driver_preadv() This is a function that simply calls into the block driver for doing a read, providing the byte granularity interface we want to eventually have everywhere, and using whatever interface that driver supports. For now, this is just a wrapper for calling bs->drv->bdrv_co_readv(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:07 +02:00
Paolo Bonzini	dd7f7ed104	linux-aio: make it more type safe Replace void* with an opaque LinuxAioState type. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Paolo Bonzini	6b98bd6495	block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end Extract the handling of io_plug "depth" from linux-aio.c and let the main bdrv_drain loop do nothing but wait on I/O. Like the two newly introduced functions, bdrv_io_plug and bdrv_io_unplug now operate on all children. The visit order is now symmetrical between plug and unplug, making it possible for formats to implement plug/unplug. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Paolo Bonzini	ce0f141259	block: introduce bdrv_no_throttling_begin/end Extract the handling of throttling from bdrv_flush_io_queue. These new functions will soon become BdrvChildRole callbacks, as they can be generalized to "beginning of drain" and "end of drain". Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Paolo Bonzini	b6e84c97ed	block: extract bdrv_drain_poll/bdrv_co_yield_to_drain from bdrv_drain/bdrv_co_drain Do not call bdrv_drain_recurse twice in bdrv_co_drain. A small tweak to the logic in Fam's patch, which is harmless since no one implements bdrv_drain anyway. But better get it right. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Paolo Bonzini	a72f641407	block: move restarting of throttled reqs to block/throttle-groups.c We want to remove throttled_reqs from block/io.c. This is the easy part---hide the handling of throttled_reqs during disable/enable of throttling within throttle-groups.c. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Paolo Bonzini	733bbc8cea	block: make bdrv_start_throttled_reqs return void The return value is unused and I am not sure why it would be useful. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Kevin Wolf	90c78624f1	block: Don't disable I/O throttling on sync requests We had to disable I/O throttling with synchronous requests because we didn't use to run timers in nested event loops when the code was introduced. This isn't true any more, and throttling works just fine even when using the synchronous API. The removed code is in fact dead code since commit `a8823a3b` ('block: Use blk_co_pwritev() for blk_write()') because I/O throttling can only be set on the top layer, but BlockBackend always uses the coroutine interface now instead of using the sync API emulation in block.c. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <1458660792-3035-2-git-send-email-kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Eric Blake	15c2f669e3	qapi: Split visit_end_struct() into pieces As mentioned in previous patches, we want to call visit_end_struct() functions unconditionally, so that visitors can release resources tied up since the matching visit_start_struct() without also having to worry about error priority if more than one error occurs. Even though error_propagate() can be safely used to ignore a second error during cleanup caused by a first error, it is simpler if the cleanup cannot set an error. So, split out the error checking portion (basically, input visitors checking for unvisited keys) into a new function visit_check_struct(), which can be safely skipped if any earlier errors are encountered, and leave the cleanup portion (which never fails, but must be called unconditionally if visit_start_struct() succeeded) in visit_end_struct(). Generated code in qapi-visit.c has diffs resembling: \|@@ -59,10 +59,12 @@ void visit_type_ACPIOSTInfo(Visitor *v, \| goto out_obj; \| } \| visit_type_ACPIOSTInfo_members(v, obj, &err); \|- error_propagate(errp, err); \|- err = NULL; \|+ if (err) { \|+ goto out_obj; \|+ } \|+ visit_check_struct(v, &err); \| out_obj: \|- visit_end_struct(v, &err); \|+ visit_end_struct(v); \| out: and in qapi-event.c: @@ -47,7 +47,10 @@ void qapi_event_send_acpi_device_ost(ACP \| goto out; \| } \| visit_type_q_obj_ACPI_DEVICE_OST_arg_members(v, &param, &err); \|- visit_end_struct(v, err ? NULL : &err); \|+ if (!err) { \|+ visit_check_struct(v, &err); \|+ } \|+ visit_end_struct(v); \| if (err) { \| goto out; Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1461879932-9020-20-git-send-email-eblake@redhat.com> [Conflict with a doc fixup resolved] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2016-05-12 09:47:55 +02:00
Kevin Wolf	d208c50d9d	vvfat: Fix default volume label Commit `d5941dd` documented that it leaves the default volume name as it was ("QEMU VVFAT"), but it doesn't actually implement this. You get an empty name (eleven space characters) instead. This fixes the implementation to apply the advertised default. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-04-29 11:14:13 +02:00
Kevin Wolf	ebb72c9f06	vvfat: Fix volume name assertion Commit `d5941dd` made the volume name configurable, but it didn't consider that the rw code compares the volume name string to assert that the first directory entry is the volume name. This made vvfat crash in rw mode. This fixes the assertion to compare with the configured volume name instead of a literal string. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-04-29 11:14:08 +02:00
Fam Zheng	ab27c3b5e7	mirror: Workaround for unexpected iohandler events during completion Commit `5a7e7a0ba` moved mirror_exit to a BH handler but didn't add any protection against new requests that could sneak in just before the BH is dispatched. For example (assuming a code base at that commit): main_loop_wait # 1 os_host_main_loop_wait g_main_context_dispatch aio_ctx_dispatch aio_dispatch ... mirror_run bdrv_drain (a) block_job_defer_to_main_loop qemu_iohandler_poll virtio_queue_host_notifier_read ... virtio_submit_multiwrite (b) blk_aio_multiwrite main_loop_wait # 2 <snip> aio_dispatch aio_bh_poll (c) mirror_exit At (a) we know the BDS has no pending request. However, the same main_loop_wait call is going to dispatch iohandlers (EventNotifier events), which may lead to a new I/O from guest. So the invariant is already broken at (c). Data loss. Commit `f3926945c8` made iohandler to use aio API. The order of virtio_queue_host_notifier_read and block_job_defer_to_main_loop within a main_loop_wait becomes unpredictable, and even worse, if the host notifier event arrives at the next main_loop_wait call, the unpredictable order between mirror_exit and virtio_queue_host_notifier_read is also a trouble. As shown below, this commit made the bug easier to trigger: - Bug case 1: main_loop_wait # 1 os_host_main_loop_wait g_main_context_dispatch aio_ctx_dispatch (qemu_aio_context) ... mirror_run bdrv_drain (a) block_job_defer_to_main_loop aio_ctx_dispatch (iohandler_ctx) virtio_queue_host_notifier_read ... virtio_submit_multiwrite (b) blk_aio_multiwrite main_loop_wait # 2 ... aio_dispatch aio_bh_poll (c) mirror_exit - Bug case 2: main_loop_wait # 1 os_host_main_loop_wait g_main_context_dispatch aio_ctx_dispatch (qemu_aio_context) ... mirror_run bdrv_drain (a) block_job_defer_to_main_loop main_loop_wait # 2 ... aio_ctx_dispatch (iohandler_ctx) virtio_queue_host_notifier_read ... virtio_submit_multiwrite (b) blk_aio_multiwrite aio_dispatch aio_bh_poll (c) mirror_exit In both cases, (b) breaks the invariant wanted by (a) and (c). Until then, the request loss has been silent. Later, `3f09bfbc7b` added asserts at (c) to check the invariant (in bdrv_replace_in_backing_chain), and Max reported an assertion failure first visible there, by doing active committing while the guest is running bonnie++. 2.5 added bdrv_drained_begin at (a) to protect the dataplane case from similar problems, but we never realize the main loop bug until now. As a bandage, this patch disables iohandler's external events temporarily together with bs->ctx. Launchpad Bug: 1570134 Cc: qemu-stable@nongnu.org Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-22 16:44:09 +02:00

... 3 4 5 6 7 ...

2811 Commits