qemu-ga's socket activation support was not obeying the LISTEN_PID
environment variable, which prevents a process from using a socket-activation
file descriptor meant for its parent.
Mess can ensue, for example, if a process forks a child before consuming
the socket-activation file descriptor and therefore setting O_CLOEXEC
on it.
Luckily, qemu-nbd also got socket activation code, and its copy does
support LISTEN_PID. Some extra fixups are needed to ensure that the
code can be used for both, but that's what this patch does. The
main change is to replace get_listen_fds's "consume" argument with
the FIRST_SOCKET_ACTIVATION_FD macro from the qemu-nbd code.
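A hedged sketch of the shared check, following the systemd socket-activation
protocol (illustrative code, not necessarily the exact helper shared between
qemu-ga and qemu-nbd):

    #include <stdlib.h>
    #include <unistd.h>

    /* Number of socket-activation fds meant for *this* process; they start
     * at FIRST_SOCKET_ACTIVATION_FD (i.e. fd 3).  Returns 0 if there are
     * none, including when LISTEN_PID names a different process. */
    static unsigned int check_socket_activation(void)
    {
        const char *pid_str = getenv("LISTEN_PID");
        const char *fds_str = getenv("LISTEN_FDS");

        if (!pid_str || !fds_str) {
            return 0;
        }
        /* Ignore fds inherited from a parent: LISTEN_PID must name us. */
        if ((pid_t)strtol(pid_str, NULL, 10) != getpid()) {
            return 0;
        }
        return (unsigned int)strtoul(fds_str, NULL, 10);
    }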
Cc: "Richard W.M. Jones" <rjones@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
si_band is not available on OpenBSD. It is marked as obsolescent in
POSIX, so we can delete it without any remorse.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20170317152214.6148-1-pbonzini@redhat.com
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Commit 3c80ca15 fixed a deadlock scenario with nested aio_poll invocations.
However, the rescheduling of the completion BH introduced unnecessary spinning
in the main loop. On very fast file backends this can even lead to the
"WARNING: I/O thread spun for 1000 iterations" message popping up.
Callgrind reports about 3-4% fewer instructions with this patch when running
qemu-img bench on a ramdisk-based VMDK file.
Fixes: 3c80ca158c
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When using a memory-backend object with prealloc turned on, QEMU
will memset() the first byte in every memory page to zero. While
this might have been acceptable for memory backends associated
with RAM, this corrupts application data for NVDIMMs.
Instead of setting every page to zero, read the current byte
value and then just write that same value back, so we are not
corrupting the original data. Directly write the value instead
of memset()ing it, since there's no benefit to memset for a
single byte write.
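A hedged sketch of the per-page touch described above (illustrative loop;
area, numpages and hpagesize stand in for whatever the surrounding code uses):

    /* Touch the first byte of every page without destroying its contents:
     * read the current value and write the same value back.  The volatile
     * access keeps the compiler from optimizing the self-assignment away. */
    for (size_t i = 0; i < numpages; i++) {
        volatile char *p = (volatile char *)area + i * hpagesize;
        *p = *p;
    }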
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Message-id: 20170303113255.28262-1-berrange@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
icount has become much slower after tcg_cpu_exec has stopped
using the BQL. There is also a latent bug that is masked by
the slowness.
The slowness happens because every occurrence of a QEMU_CLOCK_VIRTUAL
timer now has to wake up the I/O thread and wait for it. The rendez-vous
is mediated by the BQL QemuMutex:
- handle_icount_deadline wakes up the I/O thread with BQL taken
- the I/O thread wakes up and waits on the BQL
- the VCPU thread releases the BQL a little later
- the I/O thread raises an interrupt, which calls qemu_cpu_kick
- the VCPU thread notices the interrupt, takes the BQL to
process it and waits on it
All this back and forth is extremely expensive, causing a 6 to 8-fold
slowdown when icount is turned on.
One may think that the issue is that the VCPU thread is too dependent
on the BQL, but then the latent bug comes in. I first tried removing
the BQL completely from the x86 cpu_exec, only to see everything break.
The only way to fix it (and make everything slow again) was to add a dummy
BQL lock/unlock pair.
This is because in -icount mode you really have to process the events
before the CPU restarts executing the next instruction. Therefore, this
series moves the processing of QEMU_CLOCK_VIRTUAL timers straight into
the vCPU thread when running in icount mode.
The required changes include:
- make the timer notification callback wake up TCG's single vCPU thread
when run from another thread (a sketch follows this list). By using
async_run_on_cpu, the callback can override all_cpu_threads_idle() when
the CPU is halted.
- move handle_icount_deadline after qemu_tcg_wait_io_event, so that
the timer notification callback is invoked after the dummy work item
wakes up the vCPU thread
- make handle_icount_deadline run the timers instead of just waking the
I/O thread.
- stop processing the timers in the main loop
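A hedged sketch of the notification-callback change referenced in the first
item above (it assumes QEMU-internal helpers such as async_run_on_cpu(),
first_cpu and qemu_in_vcpu_thread(); it is not a verbatim copy of the patch):

    static void do_nothing(CPUState *cpu, run_on_cpu_data unused)
    {
        /* Dummy work item: its only purpose is to kick the vCPU thread out
         * of qemu_tcg_wait_io_event() so the deadline is handled there. */
    }

    static void timer_notify_cb(void *opaque, QEMUClockType type)
    {
        if (!use_icount || type != QEMU_CLOCK_VIRTUAL) {
            qemu_notify_event();
            return;
        }
        if (!qemu_in_vcpu_thread() && first_cpu) {
            async_run_on_cpu(first_cpu, do_nothing, RUN_ON_CPU_NULL);
        }
    }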
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There is no change for now, because the callback just invokes
qemu_notify_event.
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This dependency is the wrong way around, and we will need util/qemu-timer.h
from sysemu/cpus.h in the next patch.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the first timer is exactly at the current value of the clock, the
deadline is met and the timer should fire. This fixes itself on the next
iteration of the loop without icount; with icount, however, execution
of instructions will stop exactly at the deadline and won't proceed.
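In code terms (purely illustrative, not the actual patch), the deadline test
has to treat equality as "already due":

    #include <stdbool.h>
    #include <stdint.h>

    /* A timer whose expire_time equals the current clock value must fire. */
    static bool timer_due(int64_t expire_time, int64_t now)
    {
        return expire_time <= now;   /* "<=" rather than "<" */
    }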
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Suramya Shah <shah.suramya@gmail.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170310163948.7567-1-shah.suramya@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using "-mem-prealloc" option for a large guest leads to higher guest
start-up and migration time. This is because with "-mem-prealloc" option
qemu tries to map every guest page (create address translations), and
make sure the pages are available during runtime. virsh/libvirt by
default, seems to use "-mem-prealloc" option in case the guest is
configured to use huge pages. The patch tries to map all guest pages
simultaneously by spawning multiple threads. Currently limiting the
change to QEMU library functions on POSIX compliant host only, as we are
not sure if the problem exists on win32. Below are some stats with
"-mem-prealloc" option for guest configured to use huge pages.
------------------------------------------------------------------------
Idle Guest | Start-up time | Migration time
------------------------------------------------------------------------
Guest stats with 2M HugePage usage - single threaded (existing code)
------------------------------------------------------------------------
64 Core - 4TB | 54m11.796s | 75m43.843s
64 Core - 1TB | 8m56.576s | 14m29.049s
64 Core - 256GB | 2m11.245s | 3m26.598s
------------------------------------------------------------------------
Guest stats with 2M HugePage usage - map guest pages using 8 threads
------------------------------------------------------------------------
64 Core - 4TB | 5m1.027s | 34m10.565s
64 Core - 1TB | 1m10.366s | 8m28.188s
64 Core - 256GB | 0m19.040s | 2m10.148s
-----------------------------------------------------------------------
Guest stats with 2M HugePage usage - map guest pages using 16 threads
-----------------------------------------------------------------------
64 Core - 4TB | 1m58.970s | 31m43.400s
64 Core - 1TB | 0m39.885s | 7m55.289s
64 Core - 256GB | 0m11.960s | 2m0.135s
-----------------------------------------------------------------------
Changed in v2:
- modify number of memset threads spawned to min(smp_cpus, 16).
- removed 64GB memory restriction for spawning memset threads.
Changed in v3:
- limit number of threads spawned based on
min(sysconf(_SC_NPROCESSORS_ONLN), 16, smp_cpus)
- implement memset thread specific siglongjmp in SIGBUS signal_handler.
Changed in v4:
- remove sigsetjmp/siglongjmp and SIGBUS unblock/block for main thread
as main thread no longer touches any pages.
- simplify code by returning memset_thread_failed status from
touch_all_pages.
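A hedged sketch of the approach (hypothetical names; the real change lives in
the POSIX oslib code):

    #include <pthread.h>
    #include <string.h>
    #include <unistd.h>

    /* Bound the number of page-touching threads as described for v3 above. */
    static int memset_num_threads(int smp_cpus)
    {
        long host = sysconf(_SC_NPROCESSORS_ONLN);
        int n = 16;

        if (host > 0 && host < n) {
            n = host;
        }
        if (smp_cpus > 0 && smp_cpus < n) {
            n = smp_cpus;
        }
        return n;
    }

    typedef struct {
        char *start;
        size_t numpages;
        size_t hpagesize;
    } TouchJob;

    /* Each thread touches the first byte of every page in its slice. */
    static void *touch_pages(void *arg)
    {
        TouchJob *job = arg;

        for (size_t i = 0; i < job->numpages; i++) {
            memset(job->start + i * job->hpagesize, 0, 1);
        }
        return NULL;
    }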
Signed-off-by: Jitendra Kolhe <jitendra.kolhe@hpe.com>
Message-Id: <1487907103-32350-1-git-send-email-jitendra.kolhe@hpe.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Additionally permit non-negative integers as key components. A
dictionary's keys must either be all integers or none. If all keys
are integers, convert the dictionary to a list. The set of keys must
be [0,N].
Examples:
* list.1=goner,list.0=null,list.1=eins,list.2=zwei
is equivalent to JSON [ "null", "eins", "zwei" ]
* a.b.c=1,a.b.0=2
is inconsistent: a.b.c clashes with a.b.0
* list.0=null,list.2=eins,list.2=zwei
has a hole: list.1 is missing
Similar design flaw as for objects: there is no way to denote an empty
list. While interpreting "key absent" as empty list seems natural
(removing a list member from the input string works when there are
multiple ones, so why not when there's just one), it doesn't work:
"key absent" already means "optional list absent", which isn't the
same as "empty list present".
Update the keyval object visitor to use this a.0 syntax in error
messages rather than the usual a[0].
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1488317230-26248-25-git-send-email-armbru@redhat.com>
[Off-by-one fix squashed in, as per Kevin's review]
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Until now, key components are separated by '.'. This leaves little
room for evolving the syntax, and is incompatible with the __RFQDN_
prefix convention for downstream extensions.
Since key components will be commonly used as QAPI member names by the
QObject input visitor, we can just as well borrow the QAPI naming
rules here: letters, digits, hyphen and period starting with a letter,
with an optional __RFQDN_ prefix for downstream extensions.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <1488317230-26248-20-git-send-email-armbru@redhat.com>
keyval_parse() parses KEY=VALUE,... into a QDict. Works like
qemu_opts_parse(), except:
* Returns a QDict instead of a QemuOpts (d'oh).
* Supports nesting, unlike QemuOpts: a KEY is split into key
fragments at '.' (dotted key convention; the block layer does
something similar on top of QemuOpts). The key fragments are QDict
keys, and the last one's value is updated to VALUE.
* Each key fragment may be up to 127 bytes long. qemu_opts_parse()
limits the entire key to 127 bytes.
* Overlong key fragments are rejected. qemu_opts_parse() silently
truncates them.
* Empty key fragments are rejected. qemu_opts_parse() happily
accepts empty keys.
* It does not store the returned value. qemu_opts_parse() stores it
in the QemuOptsList.
* It does not treat parameter "id" specially. qemu_opts_parse()
ignores all but the first "id", and fails when its value isn't
id_wellformed(), or duplicate (a QemuOpts with the same ID is
already stored). It also screws up when a value contains ",id=".
* Implied value is not supported. qemu_opts_parse() desugars "foo" to
"foo=on", and "nofoo" to "foo=off".
* An implied key's value can't be empty, and can't contain ','.
I intend to grow this into a saner replacement for QemuOpts. It'll
take time, though.
Note: keyval_parse() provides no way to do lists, and its key syntax
is incompatible with the __RFQDN_ prefix convention for downstream
extensions, because it blindly splits at '.', even in __RFQDN_. Both
issues will be addressed later in the series.
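A hedged usage sketch, assuming a signature along the lines of
keyval_parse(params, implied_key, &errp) returning a QDict:

    Error *err = NULL;
    QDict *dict;

    /* Key fragments become nested QDicts:
     * { "vnc": { "listen": { "host": "localhost", "port": "5901" } } } */
    dict = keyval_parse("vnc.listen.host=localhost,vnc.listen.port=5901",
                        NULL, &err);
    if (!dict) {
        error_report_err(err);
    }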
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1488317230-26248-4-git-send-email-armbru@redhat.com>
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170303' into staging
ppc patch queue for 2017-03-03
This will probably be my last pull request before the hard freeze. It
has some new work, but that has all been posted in draft before the
soft freeze, so I think it's reasonable to include in qemu-2.9.
This batch has:
* A substantial amount of POWER9 work
* Implements the legacy (hash) MMU for POWER9
* Some more preliminaries for implementing the POWER9 radix
MMU
* POWER9 has_work
* Basic POWER9 compatibility mode handling
* Removal of some premature tests
* Some cleanups and fixes to the existing MMU code to make the
POWER9 work simpler
* A bugfix for TCG multiply adds on power
* Allow pseries guests to access PCIe extended config space
This also includes a code-motion not strictly in ppc code - moving
getrampagesize() from ppc code to exec.c. This will make some future
VFIO improvements easier, Paolo said it was ok to merge via my tree.
# gpg: Signature made Fri 03 Mar 2017 03:20:36 GMT
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.9-20170303:
target/ppc: rewrite f[n]m[add,sub] using float64_muladd
spapr: Small cleanup of PPC MMU enums
spapr_pci: Advertise access to PCIe extended config space
target/ppc: Rework hash mmu page fault code and add defines for clarity
target/ppc: Move no-execute and guarded page checking into new function
target/ppc: Add execute permission checking to access authority check
target/ppc: Add Instruction Authority Mask Register Check
hw/ppc/spapr: Add POWER9 to pseries cpu models
target/ppc/POWER9: Add cpu_has_work function for POWER9
target/ppc/POWER9: Add POWER9 pa-features definition
target/ppc/POWER9: Add POWER9 mmu fault handler
target/ppc: Don't gen an SDR1 on POWER9 and rework register creation
target/ppc: Add patb_entry to sPAPRMachineState
target/ppc/POWER9: Add POWERPC_MMU_V3 bit
powernv: Don't test POWER9 CPU yet
exec, kvm, target-ppc: Move getrampagesize() to common code
target/ppc: Add POWER9/ISAv3.00 to compat_table
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The cast is there because sigbus_handler is invoked via sigfd_handler.
But it feels just wrong to use struct qemu_signalfd_siginfo in the
prototype of a function that is passed to sigaction.
Instead, do a simple-minded conversion of qemu_signalfd_siginfo to
siginfo_t.
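A hedged sketch of that conversion (only the fields the handler actually
consults are copied; the wrapper name is illustrative):

    static void deliver_sigbus(const struct qemu_signalfd_siginfo *qsi)
    {
        siginfo_t si;

        memset(&si, 0, sizeof(si));
        si.si_signo = qsi->ssi_signo;
        si.si_code = qsi->ssi_code;
        si.si_addr = (void *)(uintptr_t)qsi->ssi_addr;

        /* sigbus_handler now has the prototype sigaction expects. */
        sigbus_handler(qsi->ssi_signo, &si, NULL);
    }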
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
getrampagesize() returns the largest supported page size and is mainly
used to know if huge pages are enabled.
However, it is implemented in target-ppc/kvm.c and is not available
in TCG or on other architectures.
This renames and moves gethugepagesize() to mmap-alloc.c, where an
fd-based analog of it is already implemented. It also renames and moves
getrampagesize() to exec.c, as that seems to be the common place for
helpers like this.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Similarly to allocation, do it from an inline function. This allows
tests to use only the headers for allocation/freeing of timers.
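A minimal sketch of the idea, assuming the free helper simply wraps g_free():

    /* Inline free lets tests allocate and free timers from the header alone,
     * without linking the timer implementation. */
    static inline void timer_free(QEMUTimer *ts)
    {
        g_free(ts);
    }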
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 75cdcd1 neglected to update tests/qemu-iotests/049.out, and
made the error message for negative size worse. Fix that.
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
parse_option_size()'s checking for overflow and trailing crap is
wrong. Has always been that way. qemu_strtosz() gets it right, so
use that.
This adds support for size suffixes 'P', 'E', and ignores case for all
suffixes, not just 'k'.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-25-git-send-email-armbru@redhat.com>
This will permit its use in parse_option_size().
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com> (maintainer:X86)
Cc: Kevin Wolf <kwolf@redhat.com> (supporter:Block layer core)
Cc: Max Reitz <mreitz@redhat.com> (supporter:Block layer core)
Cc: qemu-block@nongnu.org (open list:Block layer core)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1487708048-2131-24-git-send-email-armbru@redhat.com>
This makes qemu_strtosz(), qemu_strtosz_mebi() and
qemu_strtosz_metric() similar to qemu_strtoi64(), except negative
values are rejected.
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com> (maintainer:X86)
Cc: Kevin Wolf <kwolf@redhat.com> (supporter:Block layer core)
Cc: Max Reitz <mreitz@redhat.com> (supporter:Block layer core)
Cc: qemu-block@nongnu.org (open list:Block layer core)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1487708048-2131-23-git-send-email-armbru@redhat.com>
Change the qemu_strtosz() & friends to return -EINVAL when @endptr is
null and the conversion doesn't consume the string completely.
Matches how qemu_strtol() & friends work.
Only test_qemu_strtosz_simple() passes a null @endptr. No functional
change there, because its conversion consumes the string.
Simplify callers that use @endptr only to fail when it doesn't point
to '\0' to pass a null @endptr instead.
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com> (maintainer:X86)
Cc: Kevin Wolf <kwolf@redhat.com> (supporter:Block layer core)
Cc: Max Reitz <mreitz@redhat.com> (supporter:Block layer core)
Cc: qemu-block@nongnu.org (open list:Block layer core)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1487708048-2131-22-git-send-email-armbru@redhat.com>
Writing QEMU_STRTOSZ_DEFSUFFIX_* instead of '*' gains nothing. Get
rid of these eyesores.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1487708048-2131-18-git-send-email-armbru@redhat.com>
Most callers of qemu_strtosz_suffix() pass QEMU_STRTOSZ_DEFSUFFIX_B.
Capture the pattern in new qemu_strtosz().
Inline qemu_strtosz_suffix() into its only remaining caller.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-17-git-send-email-armbru@redhat.com>
With qemu_strtosz(), no suffix means mebibytes. It's used rarely.
I'm going to add a similar function where no suffix means bytes.
Rename qemu_strtosz() to qemu_strtosz_MiB() to make the name
qemu_strtosz() available for the new function.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-16-git-send-email-armbru@redhat.com>
To parse numbers with metric suffixes, we use
qemu_strtosz_suffix_unit(nptr, &eptr, QEMU_STRTOSZ_DEFSUFFIX_B, 1000)
Capture this in a new function for legibility:
qemu_strtosz_metric(nptr, &eptr)
Replace test_qemu_strtosz_suffix_unit() by test_qemu_strtosz_metric().
Rename qemu_strtosz_suffix_unit() to do_strtosz() and give it internal
linkage.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-15-git-send-email-armbru@redhat.com>
parse_option_number() fails to check for conversion errors after
strtoull(). Has always been broken. Fix that.
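For reference, this is the kind of checking that was missing (a generic
sketch, not the exact patch):

    #include <errno.h>
    #include <stdlib.h>

    /* Reject out-of-range values and trailing junk after strtoull(). */
    static int parse_number(const char *value, unsigned long long *result)
    {
        char *end;
        unsigned long long v;

        errno = 0;
        v = strtoull(value, &end, 0);
        if (errno == ERANGE || end == value || *end != '\0') {
            return -1;
        }
        *result = v;
        return 0;
    }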
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-10-git-send-email-armbru@redhat.com>
Reorder check_strtox_error() to make it obvious that we always store
through a non-null @endptr.
Transform

    if (some error) {
        error case ...
        err = value for error case;
    } else {
        normal case ...
        err = value for normal case;
    }
    return err;

to

    if (some error) {
        error case ...
        return value for error case;
    }
    normal case ...
    return value for normal case;
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-9-git-send-email-armbru@redhat.com>
Name same things the same, different things differently.
* qemu_strtol()'s parameter @nptr is called @p in
check_strtox_error(). Rename the latter.
* qemu_strtol()'s parameter @endptr is called @next in
check_strtox_error(). Rename the latter.
* qemu_strtol()'s variable @p is called @endptr in
check_strtox_error(). Rename both to @ep.
* qemu_strtol()'s variable @err is *negative* errno,
check_strtox_error()'s parameter @err is *positive*. Rename the
latter to @libc_errno.
Same for qemu_strtoul(), qemu_strtoi64(), qemu_strtou64(), of course.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-8-git-send-email-armbru@redhat.com>
The name qemu_strtoll() suggests conversion to long long, but it
actually converts to int64_t. Rename to qemu_strtoi64().
The name qemu_strtoull() suggests conversion to unsigned long long,
but it actually converts to uint64_t. Rename to qemu_strtou64().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1487708048-2131-7-git-send-email-armbru@redhat.com>
Fixes the following documentation bugs:
* Fails to document that null @nptr is safe.
* Fails to document that we return -EINVAL when no conversion could be
performed (commit 47d4be1).
* Confuses long long with int64_t, and unsigned long long with
uint64_t.
* Claims the unsigned conversions can underflow. They can't.
While there, mark problematic assumptions that int64_t is long long,
and uint64_t is unsigned long long with FIXME comments.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1487708048-2131-6-git-send-email-armbru@redhat.com>
Plenty of code relies on QemuOpt member @str not being null, including
qemu_opts_print(), qemu_opts_to_qdict(), and callbacks passed to
qemu_opt_foreach().
Begs the question whether it can be null. Only opt_set() creates
QemuOpt. It sets member @str to its argument @value. Passing null
for @value would plant a time bomb. Callers:
* opts_do_parse() can't pass null.
* qemu_opt_set() passes its argument @value. Callers:
- qemu_opts_from_qdict_1() can't pass null
- qemu_opts_set() passes its argument @value, but none of its
callers pass null.
- Many more outside qemu-option.c, but they shouldn't pass null,
either.
Assert member @str isn't null, so that misuse is caught right away.
Simplify parse_option_bool(), parse_option_number() and
parse_option_size() accordingly. Best viewed with whitespace changes
ignored.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-3-git-send-email-armbru@redhat.com>
This adds a CoMutex around the existing CoQueue. Because the write-side
can just take CoMutex, the old "writer" field is not necessary anymore.
Instead of removing it altogether, count the number of pending writers
during a read-side critical section and forbid further readers from
entering.
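A hedged sketch of the read-side idea (field names are illustrative, not
necessarily the exact patch):

    void coroutine_fn qemu_co_rwlock_rdlock(CoRwlock *lock)
    {
        qemu_co_mutex_lock(&lock->mutex);
        /* For fairness, hold back new readers while a writer is waiting. */
        while (lock->pending_writer) {
            qemu_co_queue_wait(&lock->queue, &lock->mutex);
        }
        lock->reader++;
        qemu_co_mutex_unlock(&lock->mutex);
    }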
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-7-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
All that CoQueue needs in order to become thread-safe is help
from an external mutex. Add this to the API.
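A sketch of the condition-variable-style pattern this enables (the resource
names are placeholders); qemu_co_queue_wait() releases the mutex while the
coroutine sleeps and re-acquires it before returning:

    qemu_co_mutex_lock(&lock);
    while (!resource_available) {
        qemu_co_queue_wait(&queue, &lock);
    }
    consume_resource();
    qemu_co_mutex_unlock(&lock);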
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-6-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Running a very small critical section on pthread_mutex_t and CoMutex
shows that pthread_mutex_t is much faster because it doesn't actually
go to sleep. What happens is that the critical section is shorter
than the latency of entering the kernel and thus FUTEX_WAIT always
fails. With CoMutex there is no such latency but you still want to
avoid wait and wakeup. So introduce it artificially.
This only works with one waiter; because CoMutex is fair, it will
always have more waits and wakeups than a pthread_mutex_t.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-3-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This uses the lock-free mutex described in the paper '"Blocking without
Locking", or LFTHREADS: A lock-free thread library' by Gidenstam and
Papatriantafilou. The same technique is used in OSv, and in fact
the code is essentially a conversion to C of OSv's code.
[Added missing coroutine_fn in tests/test-aio-multithread.c.
--Stefan]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-2-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Pull the increment/decrement pair out of aio_bh_poll and into the
callers.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-18-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch prepares for the removal of unnecessary lockcnt inc/dec pairs.
Extract the dispatching loop for file descriptor handlers into a new
function aio_dispatch_handlers, and then inline aio_dispatch into
aio_poll.
aio_dispatch can now become void.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-17-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-16-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-15-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This covers both file descriptor callbacks and polling callbacks,
since they execute related code.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-14-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-13-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The AioContext data structures are now protected by list_lock and/or
they are walked with FOREACH_RCU primitives. There is no need anymore
to acquire the AioContext for the entire duration of aio_dispatch.
Instead, just acquire it before and after invoking the callbacks.
The next step is then to push it further down.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-12-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
As a small step towards the introduction of multiqueue, we want
coroutines to remain on the same AioContext that started them,
unless they are moved explicitly with e.g. aio_co_schedule. This patch
prevents coroutines from switching AioContext when they use a CoMutex.
For now it does not make much of a difference, because the CoMutex
is not thread-safe and the AioContext itself is used to protect the
CoMutex from concurrent access. However, this is going to change.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-9-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
aio_co_wake provides the infrastructure to start a coroutine on a "home"
AioContext. It will be used by CoMutex and CoQueue, so that coroutines
don't jump from one context to another when they go to sleep on a
mutex or waitqueue. However, it can also be used as a more efficient
alternative to one-shot bottom halves, and saves the effort of tracking
which AioContext a coroutine is running on.
aio_co_schedule is the part of aio_co_wake that starts a coroutine
on a remote AioContext, but it is also useful to implement e.g.
bdrv_set_aio_context callbacks.
The implementation of aio_co_schedule is based on a lock-free
multiple-producer, single-consumer queue. The multiple producers use
cmpxchg to add to a LIFO stack. The consumer (a per-AioContext bottom
half) grabs all items added so far, inverts the list to make it FIFO,
and goes through it one item at a time until it's empty. The data
structure was inspired by OSv, which uses it in the very code we'll
"port" to QEMU for the thread-safe CoMutex.
Most of the new code is really tests.
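A hedged sketch of the queue discipline described above, as a standalone
illustration using C11 atomics rather than QEMU's own wrappers:

    #include <stdatomic.h>
    #include <stddef.h>

    typedef struct Item {
        struct Item *next;
    } Item;

    static _Atomic(Item *) list_head;

    /* Producers: lock-free push onto a LIFO stack. */
    void mpsc_push(Item *item)
    {
        Item *old = atomic_load(&list_head);
        do {
            item->next = old;
        } while (!atomic_compare_exchange_weak(&list_head, &old, item));
    }

    /* The single consumer grabs everything at once, reverses the list to
     * obtain FIFO order, and then processes it one item at a time. */
    Item *mpsc_pop_all_fifo(void)
    {
        Item *lifo = atomic_exchange(&list_head, NULL);
        Item *fifo = NULL;

        while (lifo) {
            Item *next = lifo->next;
            lifo->next = fifo;
            fifo = lifo;
            lifo = next;
        }
        return fifo;
    }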
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213135235.12274-3-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
AioContext is fairly self-contained; the only dependency is QEMUTimer, but
that in turn doesn't need anything else. So move them out of block-obj-y
to avoid introducing a dependency from io/ to block-obj-y.
main-loop and its dependency iohandler also need to be moved, because
later in this series io/ will call iohandler_get_aio_context.
[Changed copyright "the QEMU team" to "other QEMU contributors" as
suggested by Daniel Berrange and agreed by Paolo.
--Stefan]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213135235.12274-2-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The member VMStateField.start is used for two things: partial data
migration for VBUFFER data (basically providing migration for a
sub-buffer) and locating the next element in QTAILQ.
The implementation of the VBUFFER feature is broken when VMSTATE_ALLOC
is used. This, however, goes unnoticed because partial migration
for VBUFFER is not actually used at all.
Let's consolidate the usage of VMStateField.start by removing support
for partial migration for VBUFFER.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Message-Id: <20170203175217.45562-1-pasic@linux.vnet.ibm.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
To iterate over all QemuOpts currently requires using a callback
function, which is inconvenient for control flow. Add support for
using iterator functions more directly:
    QemuOptsIter iter;
    QemuOpt *opt;

    qemu_opts_iter_init(&iter, opts, "repeated-key");
    while ((opt = qemu_opts_iter_next(&iter)) != NULL) {
        ....do something...
    }
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170203120649.15637-8-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Implements 128-bit left shift and right shift as well as their
testcases. By design, shift silently mods by 128, so the caller is
responsible for asserting the shift range if necessary.
Left shift sets the overflow flag if any non-zero digit is shifted out.
Examples:
ulshift(&low, &high, 250, &overflow);
equivalent: n << 122
urshift(&low, &high, -2);
equivalent: n >> 126
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[dwg: Added test-shift128 to .gitignore]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>