Since iommu devices can be created with '-device' there is
no need to keep iommu as machine and mch property.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Skip bus_master_enable region creation on PCI device init
in order to be sure the IOMMU device (if present) would
be created in advance. Add this memory region at machine_done time.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Here's the current ppc patch queue. This is a fairly large batch,
containing:
* A number of further preliminary patches towards full hypervisor
mode emulation
* Some further fixes / cleanups for the recently merged device_add
based CPU hotplug
* Preliminary patches towards supporting a native (rather than
paravirtualized) XICS device. This will be needed to emulate a
physical Power machine, including hypervisor capabilities
* Assorted bug fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXdgYTAAoJEGw4ysog2bOS8qIQAKjNHf4U1KAf1cI6SWqtrtOJ
y6PVbSZc/ywc3KF/on/o6owt1VidoLe8mkd1rTlQminY0/RXL5Fv+kwIqQ4F54h9
0kKf9GDrLwjKI6IXEeQawzSa8wlfU1b3zjQJhJpztjwfdeTKAydOh5XEujF+hoCB
NkESZbJbH1gdWlxSOUD+1+DOxOfcHXsE4aJmqaw9F6l2GSHLdO//t9+KPxYfSY02
b9v6FTUkvZ/2xIQ1k8/0laE0tWqfLpTNCwkRbacHauZu1zUQVZa+fJDx7XgIlcGr
5undbLWy9/kBONQbjN9BeU9pc7XUPeNWKYZx7MX6ClbV2V74OPgjNdDHgcpTQim8
/ThiF8389qZ9tzoL+LPIdaIlOok0/I8NEcp2jfa2W2Z+AQnyjJ7nFJcjDhtRmhsI
YHUYtTrceY40scUcucDth4osZ0w/PDNr94qb6HwnYxrekXmptfTMf2dq4i6yoH31
VHQYTyBZ5dYg4DjzYuZLpNjC1Bok683Y5g4MtKf0h0A/LUaKdcHYLtuGw+jV9RtP
qC1RZMv2ZmU2jT0Jp2hUcj+S9M1OcitUUwemT2YrMDV0iXiWV6yiK7Rw28VBDLCV
TPYoDDK9xlqYsw0wOAM3uS9ywLn4tIRSG6hp6iIh8ZWt8z66KSyRikWnSjAVI+GO
Fm6xNz0a/r9UomjPB0wa
=KhVy
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160701' into staging
ppc patch queue 2016-07-01
Here's the current ppc patch queue. This is a fairly large batch,
containing:
* A number of further preliminary patches towards full hypervisor
mode emulation
* Some further fixes / cleanups for the recently merged device_add
based CPU hotplug
* Preliminary patches towards supporting a native (rather than
paravirtualized) XICS device. This will be needed to emulate a
physical Power machine, including hypervisor capabilities
* Assorted bug fixes
# gpg: Signature made Fri 01 Jul 2016 06:56:35 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.7-20160701: (23 commits)
qmp: fix spapr example of query-hotpluggable-cpus
spapr: drop duplicate variable in spapr_core_release()
spapr: do proper error propagation in spapr_cpu_core_realize_child()
spapr: drop reference on child object during core realization
spapr: Restore support for 970MP and POWER8NVL CPU cores
target-ppc: gen_pause for instructions: yield, mdoio, mdoom, miso
ppc/xics: Replace "icp" with "xics" in most places
ppc/xics: Implement H_IPOLL using an accessor
ppc/xics: Move SPAPR specific code to a separate file
ppc/xics: Rename existing xics to xics_spapr
ppc: Fix 64K pages support in full emulation
target-ppc: Eliminate redundant and incorrect function booke206_page_size_to_tlb
spapr: Restore support for older PowerPC CPU cores
spapr: fix write-past-end-of-array error in cpu core device init code
hw/ppc/spapr: Add some missing hcall function set strings
ppc: Print HSRR0/HSRR1 in "info registers"
ppc: LPCR is a HV resource
ppc: Initial HDEC support
ppc: Enforce setting MSR:EE,IR and DR when MSR:PR is set
ppc: Fix conditions for delivering external interrupts to a guest
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The "ICP" is a different object than the "XICS". For historical reasons,
we have a number of places where we name a variable "icp" while it contains
a XICSState pointer. There *is* an ICPState structure too so this makes
the code really confusing.
This is a mechanical replacement of all those instances to use the name
"xics" instead. There should be no functional change.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[spapr_cpu_init has been moved to spapr_cpu_core.c, change there]
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
None of the other presenter functions directly mucks with the
internal state, so don't do it there either.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Leave the core ICP/ICS logic in xics.c and move the top level
class wrapper, hypercall and RTAS handlers to xics_spapr.c
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[add cpu.h in xics_spapr.c, move set_nr_irqs and set_nr_servers to
xics_spapr.c]
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The common class doesn't change, the KVM one is sPAPR specific. Rename
variables and functions to xics_spapr.
Retain the type name as "xics" to preserve migration for existing sPAPR
guests.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The IOMMU driver may change behavior depending on whether a notifier
client is present. In the case of POWER, this represents a change in
the visibility of the IOTLB, for other drivers such as intel-iommu and
future AMD-Vi emulation, notifier support is not yet enabled and this
provides the opportunity to flag that incompatibility.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
[new log & extracted from [PATCH qemu v17 12/12] spapr_iommu, vfio, memory: Notify IOMMU about starting/stopping listening]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The link property that was added to the pcspk device has the wrong type:
it is only correct for TCG and for KVM's userspace or split irqchip
options. The default KVM option (fully in-kernel irqchip) breaks
because it uses a PIT whose type is a sibling of TYPE_I8254.
Fixes: 873b4d3f05
Tested-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1467298657-6588-1-git-send-email-pbonzini@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Calling our function g_list_insert_sorted_merged is a misnomer,
since we are NOT writing a glib function. Furthermore, we are
making every caller pass the same comparator function of
range_merge(): any caller that would try otherwise would break
in weird ways since our internal call to ranges_can_merge() is
hard-coded to operate only on ranges, rather than paying
attention to the caller's comparator.
Better is to fix things so that callers don't have to care about
our internal comparator, by picking a function name and updating
the parameter type away from a gratuitous use of void*, to make
it obvious that we are operating specifically on a list of ranges
and not a generic list. Plus, refactoring the code here will
make it easier to plug a memory leak in the next patch.
range_compare() is now internal only, and moves to the .c file.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1464712890-14262-3-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
g_list_insert_sorted_merged() is rather large to be an inline
function; move it to its own file. range_merge() and
ranges_can_merge() can likewise move, as they are only used
internally. Also, it becomes obvious that the condition within
range_merge() is already satisfied by its caller, and that the
return value is not used.
The diffstat is misleading, because of the copyright boilerplate.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1464712890-14262-2-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
qemu leaves unix socket files behind when removing a listening chardev
or leaving. qemu could clean that up, even if doing so isn't race-free.
Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=1347077
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1466105332-10285-4-git-send-email-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Otherwise, this can cause serial_xmit to be entered with LSR.TEMT=0,
which is invalid and causes an assertion failure.
Reported-by: Bret Ketchum <bcketchum@gmail.com>
Tested-by: Bret Ketchum <bcketchum@gmail.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
g_source_attach can return any value between 1 and UINT_MAX if you let
QEMU run long enough. However, qemu_chr_fe_add_watch can also return
a negative errno value when the device is disconnected or does not
support chr_add_watch. Change it to return zero to avoid overloading
these values.
Fix the cadence_uart which asserts in this case (easily obtained with
"-serial pty").
Tested-by: Bret Ketchum <bcketchum@gmail.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It can never become negative; reflect this in the type of the field
and simplify the conditions.
Tested-by: Bret Ketchum <bcketchum@gmail.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This function needs to be converted to QOM hook and virtualised for
multi-arch. This rename interferes, as cpu-qom will not have access
to the renaming causing name divergence. This rename doesn't really do
anything anyway so just delete it.
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-Id: <69bd25a8678b8b31b91cd9760c777bed1aafb44e.1437212383.git.crosthwaite.peter@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Crosthwaite <crosthwaitepeter@gmail.com>
Commit 926cde5 ("scsi: esp: make cmdbuf big enough for maximum CDB size",
2016-06-16) changed the size of a migrated field. Split it in two
parts, and only migrate the second part in a new vmstate version.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently a direct access to the device structure field is used to connect ISA
device IRQ to the bus. GPIO access should be used instead if possible.
The patch adds wrapper isa_connect_gpio_out. The function connects specified
output GPIO to specified ISA IRQ.
Signed-off-by: Efimov Vasily <real@ispras.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The ICH9 LPC bridge has 24 output IRQs connected to GSI. Currently the IRQs are
referenced by pointers. The pointers are initialized at startup by direct access
to the structure fields. This violates Qemu device model.
The patch makes the IRQs handling to use GPIO model.
Signed-off-by: Efimov Vasily <real@ispras.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
ich9->pic and ich9->ioapic differ for the first 16 GSIs (because
ich9->pic is wired to 8259+IOAPIC but ich9->ioapic is wired to
IOAPIC only). However, ich9->ioapic is never used for the first
16 GSIs, so the two vectors can be merged.
Reviewed-by: Efimov Vasily <real@ispras.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
ICH9 SMB bridge can be created using qdev API despite existence of helper
function. The type name is needed for such creation. Using a preprocessor
alias instead the string type name itself is preferable.
The patch makes the alias accessible through the header.
Signed-off-by: Efimov Vasily <real@ispras.ru>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The port92 device has outgouing IRQ line A20. Currently the IRQ is referenced
by a pointer which normally is set during machine initialization. The
pointer is never changed at runtime. Hence, common GPIO model can be applied
to A20 IRQ line. Note that checking for IRQ to be connected as in
previous version of code is not required qemu_set_irq will do it.
Signed-off-by: Efimov Vasily <real@ispras.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The i8042 device has outgouing IRQ line A20. Currently the IRQ is referenced
by a pointer which normally is set during machine initialization. The pointer
is never changed at runtime. So common GPIO model can be applied to A20 IRQ
line. Note that checking for IRQ to be connected as in previous version
of code is not required because qemu_set_irq will do it.
Signed-off-by: Efimov Vasily <real@ispras.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
During creation of Q35 instance several parameters are set using direct access.
It violates Qemu device model. Correctly, the parameters should be handled as
object properties.
The patch adds four link type properties for fields:
mch.ram_memory
mch.pci_address_space
mch.system_memory
mch.address_space_io
And, it adds two size type properties for fields:
mch.below_4g_mem_size
mch.above_4g_mem_size
Signed-off-by: Efimov Vasily <real@ispras.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
qdev API can be used to create CFI pflash devices despite existance of helper
functions. The type name is needed in course of such creation. Using the
preprocessor alias instead of the string literal itself is preferable.
The patch makes the aliases accessible through the header.
Signed-off-by: Efimov Vasily <real@ispras.ru>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently vmport device is identified by the string literal. Using a
preprocessor alias instead is preferable.
Signed-off-by: Efimov Vasily <real@ispras.ru>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The speaker device needs pointer to ISA PIT device to operate. But according to
qdev-properties.h, properties of pointer type should be avoided. It seems a
link type property is a good substitution.
Signed-off-by: Efimov Vasily <real@ispras.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Multiqueue virtio-blk can be enabled as follows:
qemu -device virtio-blk-pci,num-queues=8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1466511196-12612-8-git-send-email-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Multiqueue requires that each request knows to which virtqueue it
belongs.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1466511196-12612-5-git-send-email-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The num_queues field is always 1 for the time being. A later patch will
make it a configurable device property so that multiqueue can be
enabled.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1466511196-12612-2-git-send-email-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Use socket_*() functions from include/qemu/sockets.h instead of
listen()/bind()/connect()/parse_host_port(). socket_*() fucntions are
QAPI based and this patch performs this api conversion since
everything will be using QAPI based sockets in the future. Also add a
helper function socket_address_to_string() in util/qemu-sockets.c
which returns the string representation of socket address. Thetask was
listed on http://wiki.qemu.org/BiteSizedTasks page.
Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
By specifying the silicon revision we select the appropriate reset
values for the SoC.
Additionally, expose hardware strapping properties aliasing those
provided by the SCU for board-specific configuration.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1466744305-23163-3-git-send-email-andrew@aj.id.au
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The SCU is a collection of chip-level control registers that manage the
various functions supported by ASPEED SoCs. Typically the bits control
interactions with clocks, external hardware or reset behaviour, and we
can largly take a hands-off approach to reads and writes.
Firmware makes heavy use of the state to determine how to boot, but the
reset values vary from SoC to SoC (eg AST2400 vs AST2500). A qdev
property is exposed so that the integrating SoC model can configure the
silicon revision, which in-turn selects the appropriate reset values.
Further qdev properties are exposed so the board model can configure the
board-dependent hardware strapping.
Almost all provided AST2400 reset values are specified by the datasheet.
The notable exception is SOC_SCRATCH1, where we mark the DRAM as
successfully initialised to avoid unnecessary dark corners in the SoC's
u-boot support.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Message-id: 1466744305-23163-2-git-send-email-andrew@aj.id.au
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: drop unnecessary inttypes.h include]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Small queue this time. Main reason for sending it is the pair of
patches to fix up the new cpu hotplug model used on Power to what
should be an actually usable state. There's also a small BookE bugfix
and a XICS trivial cleanup.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXcLmFAAoJEGw4ysog2bOSKH4QAJ2vz596UFDlFvH1sZJ2tBo3
3hFOykyRKuJ0dGk2HiC8ncThkTY/0xQCt6ayqj8a0alPEDr2RxnuAcF1k9grzgrO
DtNmmgKzSHxAfajsBY5tBw9sUx76bn8wEVnln1jzScZS0oAnUa8m6hwPuiquUz1P
8GgtanYpWUBguZojQFj8IlCt5sLl6/+v3Vt0xyPsNA7nMjSrdJxzRw0DMWhJkYbs
P3qn3rdOnDpBPerFPQ5bvLcmqpEl/ISNg9Vwa35PXClF8xBP4ERToPhlfcTlnF5w
SxubGiHcXSxqEBIseTO74r0mZ/voqA7ndrgEcaFBH/jZnxl2QomeZsIUdRV/OdiT
8PB0lJIfruPK0Rs0gJB7n/EF5qD6R8uEvOqnlIktQJUswdLhj/J8RsGgEXVa8pSi
E9+T1Dom/zanDrrtLRJJCSb0s+ZmWAs5dOUwCCeSAFEkFq0zNc7pCv0oDcEpyS17
VgcVyPrBGCtHDdW7SITe7pw+chddA4puIzpZb42NRB59QHP56HLrJc14h/pqj8K4
0qrMPKvqpeCdhKuF5WR3Lt0ZFBu2PDXqyJhp1wL/0rOOSUqU4NCQ+3uB+FJeGJ91
WjWfWAeXaroTpT67pgo4crpq+x1TR/neRpe+1s3uCRQTJKzi4vuLS+ZN6BtloUID
IR4Q7wCJKgmwRoyGQpGU
=ltJi
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160627' into staging
ppc patch queue for 2016-06-27
Small queue this time. Main reason for sending it is the pair of
patches to fix up the new cpu hotplug model used on Power to what
should be an actually usable state. There's also a small BookE bugfix
and a XICS trivial cleanup.
# gpg: Signature made Mon 27 Jun 2016 06:28:37 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.7-20160627:
qapi: keep names in 'CpuInstanceProperties' in sync with struct CPUCore
qapi: Report support for -device cpu hotplug in query-machines
ppc/xics: Remove unused xics_set_irq_type()
target-ppc: ppce500_spin.c uses SPR_PIR, should use SPR_BOOKE_PIR
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
struct CPUCore uses 'id' suffix in the property name. As docs for
query-hotpluggable-cpus state that the cpu core properties should be
passed back to device_add by management in case new members are added
and thus the names for the fields should be kept in sync.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
[dwg: Removed a duplicated word in comment]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
[dwg: Adjusted for context to apply without original series]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Missing values EF_MIPS_FP64 and EF_MIPS_NAN2008 added.
Signed-off-by: Thomas Schwinge <thomas@codesourcery.com>
Signed-off-by: Maciej W. Rozycki <macro@codesourcery.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
This patch modifies SoftFloat library so that it can be configured in
run-time in relation to the meaning of signaling NaN bit, while, at the
same time, strictly preserving its behavior on all existing platforms.
Background:
In floating-point calculations, there is a need for denoting undefined or
unrepresentable values. This is achieved by defining certain floating-point
numerical values to be NaNs (which stands for "not a number"). For additional
reasons, virtually all modern floating-point unit implementations use two
kinds of NaNs: quiet and signaling. The binary representations of these two
kinds of NaNs, as a rule, differ only in one bit (that bit is, traditionally,
the first bit of mantissa).
Up to 2008, standards for floating-point did not specify all details about
binary representation of NaNs. More specifically, the meaning of the bit
that is used for distinguishing between signaling and quiet NaNs was not
strictly prescribed. (IEEE 754-2008 was the first floating-point standard
that defined that meaning clearly, see [1], p. 35) As a result, different
platforms took different approaches, and that presented considerable
challenge for multi-platform emulators like QEMU.
Mips platform represents the most complex case among QEMU-supported
platforms regarding signaling NaN bit. Up to the Release 6 of Mips
architecture, "1" in signaling NaN bit denoted signaling NaN, which is
opposite to IEEE 754-2008 standard. From Release 6 on, Mips architecture
adopted IEEE standard prescription, and "0" denotes signaling NaN. On top of
that, Mips architecture for SIMD (also known as MSA, or vector instructions)
also specifies signaling bit in accordance to IEEE standard. MSA unit can be
implemented with both pre-Release 6 and Release 6 main processor units.
QEMU uses SoftFloat library to implement various floating-point-related
instructions on all platforms. The current QEMU implementation allows for
defining meaning of signaling NaN bit during build time, and is implemented
via preprocessor macro called SNAN_BIT_IS_ONE.
On the other hand, the change in this patch enables SoftFloat library to be
configured in run-time. This configuration is meant to occur during CPU
initialization, at the moment when it is definitely known what desired
behavior for particular CPU (or any additional FPUs) is.
The change is implemented so that it is consistent with existing
implementation of similar cases. This means that structure float_status is
used for passing the information about desired signaling NaN bit on each
invocation of SoftFloat functions. The additional field in float_status is
called snan_bit_is_one, which supersedes macro SNAN_BIT_IS_ONE.
IMPORTANT:
This change is not meant to create any change in emulator behavior or
functionality on any platform. It just provides the means for SoftFloat
library to be used in a more flexible way - in other words, it will just
prepare SoftFloat library for usage related to Mips platform and its
specifics regarding signaling bit meaning, which is done in some of
subsequent patches from this series.
Further break down of changes:
1) Added field snan_bit_is_one to the structure float_status, and
correspondent setter function set_snan_bit_is_one().
2) Constants <float16|float32|float64|floatx80|float128>_default_nan
(used both internally and externally) converted to functions
<float16|float32|float64|floatx80|float128>_default_nan(float_status*).
This is necessary since they are dependent on signaling bit meaning.
At the same time, for the sake of code cleanup and simplicity, constants
<floatx80|float128>_default_nan_<low|high> (used only internally within
SoftFloat library) are removed, as not needed.
3) Added a float_status* argument to SoftFloat library functions
XXX_is_quiet_nan(XXX a_), XXX_is_signaling_nan(XXX a_),
XXX_maybe_silence_nan(XXX a_). This argument must be present in
order to enable correct invocation of new version of functions
XXX_default_nan(). (XXX is <float16|float32|float64|floatx80|float128>
here)
4) Updated code for all platforms to reflect changes in SoftFloat library.
This change is twofolds: it includes modifications of SoftFloat library
functions invocations, and an addition of invocation of function
set_snan_bit_is_one() during CPU initialization, with arguments that
are appropriate for each particular platform. It was established that
all platforms zero their main CPU data structures, so snan_bit_is_one(0)
in appropriate places is not added, as it is not needed.
[1] "IEEE Standard for Floating-Point Arithmetic",
IEEE Computer Society, August 29, 2008.
Signed-off-by: Thomas Schwinge <thomas@codesourcery.com>
Signed-off-by: Maciej W. Rozycki <macro@codesourcery.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Tested-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Tested-by: Leon Alrae <leon.alrae@imgtec.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[leon.alrae@imgtec.com:
* cherry-picked 2 chunks from patch #2 to fix compilation warnings]
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
All users have been converted to the new ioevent callbacks.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Introduce a set of ioeventfd callbacks on the virtio-bus level
that can be implemented by the individual transports. At the
virtio-bus level, do common handling for host notifiers (which
is actually most of it).
Two things of note:
- When setting the host notifier, we only switch from/to the
generic ioeventfd handler. This fixes a latent bug where we
had no ioeventfd assigned for a certain window.
- We always iterate over all possible virtio queues, even though
ccw (currently) has a lower limit. It does not really matter
here.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
PCMachineState.node_cpu was used for mapping APIC ID
to numa node id as CPU entries in SRAT used to be
built on sparse APIC ID bitmap (up to apic_id_limit).
However since commit
5803fce pc: acpi: SRAT: create only valid processor lapic entries
CPU entries in SRAT aren't build using apic bitmap
but using 0..maxcpus index instead which is also used
for creating numa_info[x].node_cpu map.
So instead of doing useless intermediate conversion from
1. node by cpu index -> node by apic id
i.e. numa_info[x].node_cpu -> PCMachineState.node_cpu
2. apic id -> srat entry PMX
PCMachineState.node_cpu[apic id] -> PMX value
use numa_info[x].node_cpu map directly like ARM does and do
1. numa_info[x].node_cpu -> PMX value using index
in range 0..maxcpus
and drop not necessary PCMachineState.node_cpu and related
code.
That also removes the last (not counting legacy hotplug)
dependency of ACPI code on apic_id_limit and need to allocate
huge sparse PCMachineState.node_cpu array in case of 32-bit
APIC IDs.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
For compatibility reasons PC/Q35 will start with legacy
CPU hotplug interface by default but with new CPU hotplug
AML code since 2.7 machine type. That way legacy firmware
that doesn't use QEMU generated ACPI tables will be
able to continue using legacy CPU hotplug interface.
While new machine type, with firmware supporting QEMU
provided ACPI tables, will generate new CPU hotplug AML,
which will switch to new CPU hotplug interface when
guest OS executes its _INI method on ACPI tables
loading.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it adds HW and AML parts for CPU_Device._OST method
handling to allow OSPM reports status of hot-(un)plug
operation.
And extends QMP command query-acpi-ospm-status to report
CPU's OST info along with already reported PC-DIMM devices.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it adds hw registers needed for handling CPU hot-remove and
corresponding AML methods to request and eject a CPU with
necessary hotplug callbacks in pc,piix4,ich9 code.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it adds hw registers needed for handling CPU hot-add and
corresponding AML methods to handle hot-add events on
guest side.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add madt_cpu callback to AcpiDeviceIfClass and use
it for generating LAPIC MADT entries for CPUs.
Later it will be used for generating x2APIC
entries in case of more than 255 CPUs and also
would be reused by ARM target when ACPI CPU hotplug
is introduced there.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it adds CPU objects to DSDT with _STA method
and QEMU side of CPU hotplug interface initialization
with registers sufficient to handle _STA requests,
including necessary hotplug callbacks in piix4,ich9 code.
Hot-(un)plug hw/acpi parts will be added by
corresponding follow up patches.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It will be used to select which hotplug call-back is called
and for switching from legacy mode into new one.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It will be used by NVDIMM ACPI
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Implement ObjectType which is used by NVDIMM _DSM method in
later patch
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Introduce a parameter, 'label-size', which is the size of nvdimm label
data area which is reserved at the end of backend memory. It is required
at least 128k
Two callbacks, read_label_data() and write_label_data(), are used to
operate the label area
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This callback returns the MemoryRegion that is the memory of dimm should
be kept during live migration
nvdimm device is different with pc-dimm as its memory includes not only
the MemoryRegion directly mapping to guest's address space but also the
memory used as label data
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Use the ACPI table construction tools to create an ACPI entry
for IPMI. This adds a function called build_acpi_ipmi_devices
to add an DSDT entry for IPMI if IPMI is compiled in and an
IPMI device exists. It also adds a dummy function if IPMI
is not compiled in.
This conforms to section "C3-2 Locating IPMI System Interfaces in
ACPI Name Space" in the IPMI 2.0 specification.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add an IPMI table entry to the SMBIOS.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently outstanding patches for spapr, target-ppc and related
devices. This batch has:
* Significant new progress towards full support for hypervisor
mode
* Assorted bugfixes
* Some preliminary patches towards dynamic DMA window support
The last involves a change to memory.c, which Paolo has said I can
take through this tree.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXa3gJAAoJEGw4ysog2bOSzRQQAI0K+F2CoKuiGJ3csr40edlA
rDvHJd0a3DvF5ez8YBY9iiG2nXaphz7LrsfkFUQ/UmqlIpVLNdJaJ8l/RnjwrVnP
coJ9sFkzzARQF39BybGYPOVMalgi/MhnN/+zm87JaSD+C7gihG+ghYR4sgLHWalb
xWDFLnZVg3uTC1IQzS14NMjqswQN1O0d9wyfYjBogAAEvP4daGKVTQCgpoVwGaXw
JNE4Sm1vVIqJtiYr48V4cNVQWodpTMoHdVPJ6LPXWBlJMd3K5BEy3QsFc+jra+kG
Uj8dayNJwu4CjIwDWz4kpL/H0XtLNxfC2/6n1khc53OEnG5SzrgN80yJBvAKzU5S
KDQemwyOd3R2YTItkHR++QhLqHVx39Hr3BsFGXGyugAaczD7v4NLx87xmMEQOyQB
ai+B4EpQqS/lEOCxyf7nRzgRxA5uB2My9L31peWt862G/LH+UixMWn7CwKW4aKul
CuV4kqVdVQHrWe966HNYnDwfGpmaEXF9W5uLu7GenhFtW6R6IRhT1oORiyqHRfEi
9lTVx/vvHK2U3Ie3RCu3ZYuSbwDhP482J8ysfSo1iCkbSXHwNNACkLQmh0s/WmOs
X2erIOUAja/Ku2DyZ6bXngXN/W/JR6Hw1IpEMeo6LxeK8VIm7F8PkCgrJCg88EEp
ZFxGGYkc3HqksgJy1tuV
=Da1O
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160623' into staging
ppc patch queue for 2016-06-23
Currently outstanding patches for spapr, target-ppc and related
devices. This batch has:
* Significant new progress towards full support for hypervisor
mode
* Assorted bugfixes
* Some preliminary patches towards dynamic DMA window support
The last involves a change to memory.c, which Paolo has said I can
take through this tree.
# gpg: Signature made Thu 23 Jun 2016 06:47:53 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.7-20160623:
ppc: Disable huge page support if it is not available for main RAM
ppc: Add P7/P8 Power Management instructions
ppc: Move exception generation code out of line
ppc: Turn a bunch of booleans from int to bool
ppc: Add real mode CI load/store instructions for P7 and P8
ppc: Rework generation of priv and inval interrupts
ppc: Fix generation if ISI/DSI vs. HV mode
ppc: Fix POWER7 and POWER8 exception definitions
ppc: fix exception model for HV mode
ppc: define a default LPCR value
ppc: Fix rfi/rfid/hrfi/... emulation
memory: Add reporting of supported page sizes
ppc: Improve emulation of THRM registers
target-ppc: Fix rlwimi, rlwinm, rlwnm again
ppc64: disable gen_pause() for linux-user mode
tests: Use '+=' to add additional tests, not '='
powerpc/mm: Update the WIMG check during H_ENTER
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
USB devices in attached state are visible to the guest. This patch adds
a QOM property for this. Write access is opt-in per device. Some
devices manage attached state automatically (usb-host, usb-serial,
usb-redir), so we can't enable write access universally but have to do
it on a case by case base. So far, no device opts in.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1465984019-28963-4-git-send-email-kraxel@redhat.com
[ minor codestyle fix ]
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Every IOMMU has some granularity which MemoryRegionIOMMUOps::translate
uses when translating, however this information is not available outside
the translate context for various checks.
This adds a get_min_page_size callback to MemoryRegionIOMMUOps and
a wrapper for it so IOMMU users (such as VFIO) can know the minimum
actual page size supported by an IOMMU.
As IOMMU MR represents a guest IOMMU, this uses TARGET_PAGE_SIZE
as fallback.
This removes vfio_container_granularity() and uses new helper in
memory_region_iommu_replay() when replaying IOMMU mappings on added
IOMMU memory region.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
[dwg: Removed an unnecessary calculation]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Version: GnuPG v1
iQEcBAABAgAGBQJXaFInAAoJEJykq7OBq3PI6VsH/0Sfgbdo1RksYuQwb/y92sCW
EN+lxUZ+OLfgrc8PYgNZwfSM3rsfYhznL0MAXOeEe7Ahabi07w7DhGR8WvwfAOlI
G96FRuvrIPfv5u6U6fwS4CvG3TIHVLxfHKCsTpPUmH8U5CNx/x/tpjNiWN1dj6t+
sXybSjYHfZfiZy2tI9MFIFWCdxnF/pl0QAPhbRqc8Y/RQTDrPKRjLpz+nitN/u96
5TS7KlELyQuP91YMmLceYSmIkHbxW703h+iE2n4hov0uZCP8Jil+2Jsd3ziQSRlL
j6LqexQ2ViBGdDSfiZGYES2VPlsHOCwb4G+IgWBStfZg1ppaXENvcDzPrgrB+L4=
=eUnF
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging
# gpg: Signature made Mon 20 Jun 2016 21:29:27 BST
# gpg: using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
* remotes/stefanha/tags/tracing-pull-request: (42 commits)
trace: split out trace events for linux-user/ directory
trace: split out trace events for qom/ directory
trace: split out trace events for target-ppc/ directory
trace: split out trace events for target-s390x/ directory
trace: split out trace events for target-sparc/ directory
trace: split out trace events for net/ directory
trace: split out trace events for audio/ directory
trace: split out trace events for ui/ directory
trace: split out trace events for hw/alpha/ directory
trace: split out trace events for hw/arm/ directory
trace: split out trace events for hw/acpi/ directory
trace: split out trace events for hw/vfio/ directory
trace: split out trace events for hw/s390x/ directory
trace: split out trace events for hw/pci/ directory
trace: split out trace events for hw/ppc/ directory
trace: split out trace events for hw/9pfs/ directory
trace: split out trace events for hw/i386/ directory
trace: split out trace events for hw/isa/ directory
trace: split out trace events for hw/sd/ directory
trace: split out trace events for hw/sparc/ directory
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The event is described in "trace-events". Note that the "MO_AMASK" flag
is not traced, since it does not seem to affect the visible semantics of
instructions.
[s/inline inline/inline/ to fix clang build.
--Stefan]
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 146549350711.18437.726780393247474362.stgit@fimbulvetr.bsc.es
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
When qemu_set_log_filename() detects an invalid file name, it reports
an error, closes the log file (if any), and starts logging to stderr
(unless daemonized or nothing is being logged).
This is wrong. Asking for an invalid log file on the command line
should be fatal. Asking for one in the monitor should fail without
messing up an existing logfile.
Fix by converting qemu_set_log_filename() to Error. Pass it
&error_fatal, except for hmp_logfile report errors.
This also permits testing without a subprocess, so do that.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1466011636-6112-4-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
g_error() is not an acceptable way to report errors to the user:
$ qemu-system-x86_64 -dfilter 1000+0
** (process:17187): ERROR **: Failed to parse range in: 1000+0
Trace/breakpoint trap (core dumped)
g_assert() isn't, either:
$ qemu-system-x86_64 -dfilter 1000x+64
**
ERROR:/work/armbru/qemu/util/log.c:180:qemu_set_dfilter_ranges: assertion failed: (e == range_op)
Aborted (core dumped)
Convert qemu_set_dfilter_ranges() to Error. Rework its deeply nested
control flow. Touch up the error messages. Call it with
&error_fatal.
This also permits testing without a subprocess, so do that.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1466011636-6112-3-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Block jobs that use additional BDSes or event loop resources need a
callback to get their affairs in order when the AioContext is switched.
Simple block jobs don't need an attach callback, they automatically work
thanks to the generic attach/detach notifiers that this patch adds.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1466096189-6477-7-git-send-email-stefanha@redhat.com
It's possible that an AioContext notifier user was close to finishing
when .detach_aio_context() or .attached_aio_context() is called. In
that case they may call bdrv_remove_aio_context_notifier() during the
callback.
Use safe iteration to avoid crashing when the notifier list is modified
during iteration. We must not only handle the case where the current
aio notifier is removed during a callback but also the one where any
other aio notifier is removed.
The next patch adds an AioContext notifier for block jobs and they
really could be terminating just as .detach_aio_context() is invoked.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1466096189-6477-6-git-send-email-stefanha@redhat.com
Block jobs are coroutines that usually perform I/O but sometimes also
sleep or yield. Currently only sleeping or yielded block jobs can be
paused. This means jobs that do not sleep or yield (using
block_job_yield()) are unaffected by block_job_pause().
Add block_job_pause_point() so that block jobs can mark quiescent points
that are suitable for pausing. This solves the problem that it can take
a block job a long time to pause if it is performing a long series of
I/O operations.
Transitioning to paused state involves a .pause()/.resume() callback.
These callbacks are used to ensure that I/O and event loop activity has
ceased while the job is at a pause point.
Note that this patch introduces a stricter pause state than previously.
The job->busy flag was incorrectly documented as a quiescent state
without I/O pending. This is violated by any job that has I/O pending
across sleep or block_job_yield(), like the mirror block job.
[Add missing block_job_should_pause() check to avoid deadlock after
job->driver->pause() in block_job_pause_point().
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1466096189-6477-4-git-send-email-stefanha@redhat.com
The block_job_is_paused() function name is not great because callers
only use it to determine whether pausing has been requested. Rename it
to highlight those semantics and remove it from the public header file
as there are no external callers.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1466096189-6477-3-git-send-email-stefanha@redhat.com
In ACPI 5.1 Errata, it adds GIC version in GIC distributor structure.
This is useful for guest kernel to identify which version GIC hardware
is. Update GIC distributor structure and present GIC version in MADT
table.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 1465960955-17388-1-git-send-email-zhaoshenglong@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Value matching allows Linux to boot with CONFIG_NO_HZ_IDLE=y on the
palmetto-bmc machine. Two match registers are provided for each timer.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Message-id: 1465974248-20434-1-git-send-email-andrew@aj.id.au
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Implement the GICv3 logic to recalculate the highest priority pending
interrupt for each CPU after some part of the GIC state has changed.
We avoid unnecessary full recalculation where possible.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org>
Tested-by: Shannon Zhao <shannon.zhao@linaro.org>
Message-id: 1465915112-29272-11-git-send-email-peter.maydell@linaro.org
This patch includes the device class itself, some ID register
value functions which will be needed by both distributor
and redistributor, and some skeleton functions for handling
interrupts coming in and going out, which will be filled in
in a subsequent patch.
Signed-off-by: Shlomo Pongratz <shlomo.pongratz@huawei.com>
Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org>
Tested-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1465915112-29272-10-git-send-email-peter.maydell@linaro.org
[PMM: pulled this patch earlier in the sequence, and left
some code out of it for a later patch]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org>
Move the GICv3 parent_irq and parent_fiq pointers into the
GICv3CPUState structure rather than giving them their own array.
This will make it easy to assert the IRQ and FIQ lines for a
particular CPU interface without having to know or calculate
the CPU index for the GICv3CPUState we are working on.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org>
Tested-by: Shannon Zhao <shannon.zhao@linaro.org>
Message-id: 1465915112-29272-8-git-send-email-peter.maydell@linaro.org
Add state information to GICv3 object structure and implement
arm_gicv3_common_reset().
This commit includes accessor functions for the fields which are
stored as bitmaps in uint32_t arrays.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org>
Tested-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1465915112-29272-7-git-send-email-peter.maydell@linaro.org
[PMM: significantly overhauled:
* Add missing qom/cpu.h include
* Remove legacy-only state fields (we can add them later if/when we add
legacy emulation)
* Use arrays of uint32_t to store the various distributor bitmaps,
and provide accessor functions for the various set/test/etc operations
* Add various missing register offset #defines
* Accessor macros which combine distributor and redistributor behaviour
removed
* Fields in state structures renamed to match architectural register names
* Corrected the reset value for GICR_IENABLER0 since we don't support
legacy mode
* Added ARM_LINUX_BOOT_IF interface for "we are directly booting a kernel in
non-secure" so that we can fake up the firmware-mandated reconfiguration
only when we need it
]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org>
A half-shuffle operation takes a word with zeros in the high half:
0000 0000 0000 0000 ABCD EFGH IJKL MNOP
and spreads the bits out so they are in every other bit of the word:
0A0B 0C0D 0E0F 0G0H 0I0J 0K0L 0M0N 0O0P
A half-unshuffle performs the reverse operation.
Provide functions in bitops.h which implement these operations
for 32-bit and 64-bit inputs, and add tests for them.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org>
Tested-by: Shannon Zhao <shannon.zhao@linaro.org>
Message-id: 1465915112-29272-3-git-send-email-peter.maydell@linaro.org
Define a VMSTATE_UINT64_2DARRAY macro, to go with the ones we
already have for other type sizes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org>
Tested-by: Shannon Zhao <shannon.zhao@linaro.org>
Message-id: 1465915112-29272-2-git-send-email-peter.maydell@linaro.org
If the same GlobalProperty struct is registered twice, the list
entry gets corrupted, making tqe_next points to itself, and
qdev_prop_set_globals() gets stuck in a loop. The bug can be
easily reproduced by running:
$ qemu-system-x86_64 -rtc-td-hack -rtc-td-hack
Change global_props to use GList instead of queue.h, making the
code simpler and able to deal with properties being registered
twice.
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Here's the current accumulated set of spapr, ppc and related patches.
* The big thing in here is CPU hotplug for spapr
- This includes a number of acked generic changes adding new
infrastructure for hotplugging cpu cores
* A number of TCG bug fixes are also included
* This adds a new testcase to make it harder to accidentally break
Macintosh (and other openbios) platforms
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXY5oxAAoJEGw4ysog2bOSMu0QAN5e3H3qt2or5dnxS6/bC7QZ
9UrDfCFiJ1HppWfesj0lSAQvHlkEdkml8O7fxeG7bzHQDdaIwy4+q2RIvDiAgnOW
u1he5yhaN6PDLzr9zowtfUySWz6bixMyrBeY2I6/hOKgPKVeifryudmlRNJ5zc1R
edwiSI4saNBYT2+KlaaQW26g765Xk0CG0FjxAyTTPH62WsUvxB1haSKHd7QTn57c
ovdrCTHzfhXaQ0WmbDvIz2v0kY6lO71Re9DRFQmlZktVtGr6E07afr+2VzBTuYLb
r6iHyi/Ed7x+kncH/5F52K7HIddNEp0gbZJgWnfM0gUSIAnya65FP/Qgkc4RdGFd
yeR6V99s/i5dU5gxoFK0MKNn61/QLLUZJiS9l6XNs92mDZn5C03C/edINOv8tbRe
TG+/TghwoFfojodi3cEM+TrDZv7E73RgWjoLY/Eq29KOAuOQaWV0ctbPgnKOZApq
u2LwhPjzU8ae4LCA1c0sMZnMLnskplnNWhmz9sLyvsfy12Lau86/PnT43/rGZ2uu
lW1SEgrk1zpYkOdbyAB01ZOPw0bQVL8uHQpm7df4/RtiyJuWCG+nPBXJoWXq3hN2
VO5PEd33Ec8Cdln9modOwHOIkIgH5zMsG8yLcvHoAeDWccPUkr9Z1mv7vZ4ENQFO
JaOyIOC8JKJJgA/2DQA+
=AGZ4
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160617' into staging
ppc patch queue for 2016-06-17
Here's the current accumulated set of spapr, ppc and related patches.
* The big thing in here is CPU hotplug for spapr
- This includes a number of acked generic changes adding new
infrastructure for hotplugging cpu cores
* A number of TCG bug fixes are also included
* This adds a new testcase to make it harder to accidentally break
Macintosh (and other openbios) platforms
# gpg: Signature made Fri 17 Jun 2016 07:35:29 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.7-20160617:
spapr: implement query-hotpluggable-cpus callback
hmp: Add 'info hotpluggable-cpus' HMP command
QMP: Add query-hotpluggable-cpus
spapr: CPU hot unplug support
spapr: CPU hotplug support
spapr: convert boot CPUs into CPU core devices
spapr: Move spapr_cpu_init() to spapr_cpu_core.c
spapr: Abstract CPU core device and type specific core devices
qom: API to get instance_size of a type
spapr_drc: Prevent detach racing against attach for CPU DR
xics,xics_kvm: Handle CPU unplug correctly
cpu: Abstract CPU core type
qdev: hotplug: Introduce HotplugHandler.pre_plug() callback
target-ppc: Fix rlwimi, rlwinm, rlwnm
vfio: Fix broken EEH
target-ppc: Bug in BookE wait instruction
ppc / sparc: Add a tester for checking whether OpenBIOS runs successfully
hw/ppc/spapr: Silence deprecation message in qtest mode
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Beginning of reconnect support for vhost-user.
Misc cleanups and fixes.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXY0Q3AAoJECgfDbjSjVRpkVcH/2gTHRE9yUoWe6ROvPV67BKx
8Iy9GzJ3BMO3RolVZEA5KXIevn5TG+pV274BZEuXMD3AL/molv279p0o/gvBYoqq
V0jNH2MO+MV6D9OzhUXcgWSejvybF5W07ojPDU/hlgtFXPZFbJDyt95MWaLiilOg
cCtTuRqgrrRaypcnnk/CIDbC+Ek2kAYdgQHQbfj9ihle3TWO8R0bSXnFqSaqCIkM
4slMlv8y82fODeiO83nkpfAP1NCnfnRC8r8Gv7hbEUTlZQntavx5DuYdiIx6nsJE
W0g+Gpe1o0+jRuMnucGIUZvqzZ0e/I0wZuV16Nsfx+Rbd5+4CzTxZda5Qb05v7I=
=BHbJ
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pc, pci, virtio: new features, cleanups, fixes
Beginning of reconnect support for vhost-user.
Misc cleanups and fixes.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Fri 17 Jun 2016 01:28:39 BST
# gpg: using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream:
MAINTAINERS: add Marcel to PCI
msi_init: change return value to 0 on success
fix some coding style problems
pci core: assert ENOSPC when add capability
test: start vhost-user reconnect test
tests: append i386 tests
vhost-net: save & restore vring enable state
vhost-net: save & restore vhost-user acked features
vhost-net: do not crash if backend is not present
vhost-user: disconnect on start failure
qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd
tests/vhost-user-bridge: workaround stale vring base
tests/vhost-user-bridge: add client mode
vhost-user: add ability to know vhost-user backend disconnection
pci: fix pci_requester_id()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Conflicts:
tests/Makefile.include
It will allow mgmt to query present and hotpluggable CPU objects,
it is required from a target platform that wishes to support command
to implement and set MachineClass.query_hotpluggable_cpus callback,
which will return a list of possible CPU objects with options that
would be needed for hotplugging possible CPU objects.
There are:
'type': 'str' - QOM CPU object type for usage with device_add
'vcpus-count': 'int' - number of logical VCPU threads per
CPU object (mgmt needs to know)
and a set of optional fields that are to used for hotplugging a CPU
objects and would allows mgmt tools to know what/where it could be
hotplugged;
[node],[socket],[core],[thread]
For present CPUs there is a 'qom-path' field which would allow mgmt to
inspect whatever object/abstraction the target platform considers
as CPU object.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Remove the CPU core device by removing the underlying CPU thread devices.
Hot removal of CPU for sPAPR guests is achieved by sending the hot unplug
notification to the guest. Release the vCPU object after CPU hot unplug so
that vCPU fd can be parked and reused.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Set up device tree entries for the hotplugged CPU core and use the
exising RTAS event logging infrastructure to send CPU hotplug notification
to the guest.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Introduce sPAPRMachineClass.dr_cpu_enabled to indicate support for
CPU core hotplug. Initialize boot time CPUs as core deivces and prevent
topologies that result in partially filled cores. Both of these are done
only if CPU core hotplug is supported.
Note: An unrelated change in the call to xics_system_init() is done
in this patch as it makes sense to use the local variable smt introduced
in this patch instead of kvmppc_smt_threads() call here.
TODO: We derive sPAPR core type by looking at -cpu <model>. However
we don't take care of "compat=" feature yet for boot time as well
as hotplug CPUs.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Start consolidating CPU init related routines in spapr_cpu_core.c. As
part of this, move spapr_cpu_init() and its dependencies from spapr.c
to spapr_cpu_core.c
No functionality change in this patch.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
[dwg: Rename TIMEBASE_FREQ to SPAPR_TIMEBASE_FREQ, since it's now in a
public(ish) header]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add sPAPR specific abastract CPU core device that is based on generic
CPU core device. Use this as base type to create sPAPR CPU specific core
devices.
TODO:
- Add core types for other remaining CPU types
- Handle CPU model alias correctly
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add an API object_type_get_size(const char *typename) that returns the
instance_size of the give typename.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
If a CPU is hot removed while hotplug of the same is still in progress,
the guest crashes. Prevent this by ensuring that detach is done only
after attach has completed.
The existing code already prevents such race for PCI hotplug. However
given that CPU is a logical DR unlike PCI and starts with ISOLATED
state, we need a logic that works for CPU too.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
[Don't set awaiting_attach for PCI devices]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
XICS is setup for each CPU during initialization. Provide a routine
to undo the same when CPU is unplugged. While here, move ss->cs management
into xics from xics_kvm since there is nothing KVM specific in it.
Also ensure xics reset doesn't set irq for CPUs that are already unplugged.
This allows reboot of a VM that has undergone CPU hotplug and unplug
to work correctly.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add an abstract CPU core type that could be used by machines that want
to define and hotplug CPUs in core granularity.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
[Integer core property]
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
[dwg: changed property names to 'core-id' and 'nr-threads']
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
pre_plug callback is to be called before device.realize() is executed.
This would allow to check/set device's properties from HotplugHandler.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A driver may change the vring enable state at run time but vhost-user
backend may not be present (a contrived example is when the backend is
disconnected and the device is reconfigured after driver rebinding)
Restore the vring state when the vhost-user backend is started, so it
can process the ring.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The initial vhost-user connection sets the features to be negotiated
with the driver. Renegotiation isn't possible without device reset.
To handle reconnection of vhost-user backend, ensure the same set of
features are provided, and reuse already acked features.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The patch introduces qemu_chr_disconnect(). The function is used for
closing a fd accepted by listen fd. Though we already have qemu_chr_delete(),
but it closes not only accepted fd but also listen fd. This new function
is used when we still want to keep listen fd.
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This fix SID verification failure when IOMMU IR is enabled with PCI
bridges. Existing pci_requester_id() is more like getting BDF info
only. Renaming it to pci_get_bdf(). Meanwhile, we provide the correct
implementation to get requester ID. VT-d spec 5.1.1 is a good reference
to go, though it talks only about interrupt delivery, the rule works
exactly the same for non-interrupt cases.
Currently, there are three use cases for pci_requester_id():
- PCIX status bits: here we need BDF only, not requester ID. Replacing
with pci_get_bdf().
- PCIe Error injection and MSI delivery: for both these cases, we are
looking for requester IDs. Here we should use the new impl.
To avoid a PCI walk every time we send MSI message, one requester_id
cache field is added to PCIDevice to cache the result when initialize
PCI device.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
While doing DMA read into ESP command buffer 's->cmdbuf', it could
write past the 's->cmdbuf' area, if it was transferring more than 16
bytes. Increase the command buffer size to 32, which is maximum when
's->do_cmd' is set, and add a check on 'len' to avoid OOB access.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Declare a constant and use that when determining if an export
name fits within the constraints we are willing to support.
Note that upstream NBD recently documented that clients MUST
support export names of 256 bytes (not including trailing NUL),
and SHOULD support names up to 4096 bytes. 4096 is a bit big
(we would lose benefits of stack-allocation of a name array),
and we already have other limits in place (for example, qcow2
snapshot names are clamped around 1024). So for now, just
stick to the required minimum, as that's easier to audit than
a full-scale support for larger names.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-12-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These structs are never used to represent the bytes that go over the
network. The big-endian network data is built into a uint8_t array
in nbd_{receive,send}_{request,reply}. Remove the unused magic field,
reorder the struct to avoid holes, and remove the packed attribute.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check
is bogus without a previous inclusion of sys/mman.h. Include it in
sysemu/os-posix.h and remove it from everywhere else.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently, we are trying to move the backing BDS from the source to the
target in bdrv_replace_in_backing_chain() which is called from
mirror_exit(). However, mirror_complete() already tries to open the
target's backing chain with a call to bdrv_open_backing_file().
First, we should only set the target's backing BDS once. Second, the
mirroring block job has a better idea of what to set it to than the
generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
conditions on when to move the backing BDS from source to target are not
really correct).
Therefore, remove that code from bdrv_replace_in_backing_chain() and
leave it to mirror_complete().
Depending on what kind of mirroring is performed, we furthermore want to
use different strategies to open the target's backing chain:
- If blockdev-mirror is used, we can assume the user made sure that the
target already has the correct backing chain. In particular, we should
not try to open a backing file if the target does not have any yet.
- If drive-mirror with mode=absolute-paths is used, we can and should
reuse the already existing chain of nodes that the source BDS is in.
In case of sync=full, no backing BDS is required; with sync=top, we
just link the source's backing BDS to the target, and with sync=none,
we use the source BDS as the target's backing BDS.
We should not try to open these backing files anew because this would
lead to two BDSs existing per physical file in the backing chain, and
we would like to avoid such concurrent access.
- If drive-mirror with mode=existing is used, we have to use the
information provided in the physical image file which means opening
the target's backing chain completely anew, just as it has been done
already.
If the target's backing chain shares images with the source, this may
lead to multiple BDSs per physical image file. But since we cannot
reliably ascertain this case, there is nothing we can do about it.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160610185750.30956-3-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
It is always true for open images now.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
This allows drivers to share code between normal I/O and vmstate
accesses.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
This brings it in line with .bdrv_save_vmstate().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
We already have a byte-based bdrv_pwritev(), but the read counterpart
was still missing. This commit adds it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
In a first step to convert the common I/O path to work on bytes rather
than sectors, this converts the copy-on-read logic that is used by
bdrv_aligned_preadv().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Add a new BDRV_REQ_MASK constant, and use it to make sure that
caller flags are always valid.
Tested with 'make check' and with qemu-iotests on both '-raw'
and '-qcow2'; the only failure turned up was fixed in the
previous commit.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Apply the following renames for starting incoming migration:
process_incoming_migration -> migration_fd_process_incoming
migration_set_incoming_channel -> migration_channel_process_incoming
migration_tls_set_incoming_channel -> migration_tls_channel_process_incoming
and for starting outgoing migration:
migration_set_outgoing_channel -> migration_channel_connect
migration_tls_set_outgoing_channel -> migration_tls_channel_connect
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1464776234-9910-3-git-send-email-berrange@redhat.com
Message-Id: <1464776234-9910-3-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
On the source, add a count of page requests received from the
destination.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-id: 1465816605-29488-4-git-send-email-dgilbert@redhat.com
Message-Id: <1465816605-29488-4-git-send-email-dgilbert@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
I looked at a dozen Intel CPU that have this CPUID and all of them
always had Core offset as 1 (a wasted bit when hyperthreading is
disabled) and Package offset at least 4 (wasted bits at <= 4 cores).
QEMU uses more compact IDs and it doesn't make much sense to change it
now. I keep the SMT and Core sub-leaves even if there is just one
thread/core; it makes the code simpler and there should be no harm.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This adds the DP and the DPDMA to the Zynq MP platform.
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Tested-By: Hyun Kwon <hyun.kwon@xilinx.com>
Message-id: 1465833014-21982-10-git-send-email-fred.konrad@greensocs.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This is the implementation of the DisplayPort.
It has an aux-bus to access dpcd and edid.
Graphic plane is connected to the channel 3.
Video plane is connected to the channel 0.
Audio stream are connected to the channels 4 and 5.
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Tested-By: Hyun Kwon <hyun.kwon@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: 1465833014-21982-9-git-send-email-fred.konrad@greensocs.com
[PMM: fixed format strings]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This is the implementation of the DPDMA.
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Tested-By: Hyun Kwon <hyun.kwon@xilinx.com>
Message-id: 1465833014-21982-8-git-send-email-fred.konrad@greensocs.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Implement an I2C slave which implements DDC and returns the
EDID data for an attached monitor.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Tested-by: Hyun Kwon <hyun.kwon@xilinx.com>
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Message-id: 1465833014-21982-7-git-send-email-fred.konrad@greensocs.com
- Rebased on the current master.
- Modified for QOM.
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Tested-By: Hyun Kwon <hyun.kwon@xilinx.com>
[PMM: actually wire up the vmstate to dc->vmsd]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This introduces dpcd module.
It wires on a aux-bus and can be accessed by the driver to get lane-speed, etc.
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Tested-By: Hyun Kwon <hyun.kwon@xilinx.com>
Message-id: 1465833014-21982-6-git-send-email-fred.konrad@greensocs.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This introduces a new bus: aux-bus.
It contains an address space for aux slaves devices and a bridge to an I2C bus
for I2C through AUX transactions.
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Tested-By: Hyun Kwon <hyun.kwon@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: 1465833014-21982-5-git-send-email-fred.konrad@greensocs.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Most of the control flow logic between send and recv (error checking
etc) is the same. Factor this out into a common send_recv() API.
This is then usable by clients, where the control logic for send
and receive differs only by a boolean. E.g.
if (send)
i2c_send(...):
else
i2c_recv(...);
becomes:
i2c_send_recv(... , send);
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Message-id: 1465833014-21982-4-git-send-email-fred.konrad@greensocs.com
Changes from FK:
* Rebased on master.
* Rebased on my i2c broadcast patch.
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Add a virtual PMU device for virt machine while use PPI 7 for PMU
overflow interrupt number.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 1465267577-1808-3-git-send-email-zhaoshenglong@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Let's introduce a CssDevId to handle device ids of the xx.x.xxxx
type used for channel devices. This has some benefits:
- We can use them in virtio-ccw and split the validity checks for
a channel device id in general from the constraint checking
within the virtio-ccw scope.
- We can reuse the device id type for future non-virtio channel
devices.
While we're at it, improve the validity checks and disallow e.g.
trailing characters.
Suggested-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Acked-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
According to the platform specification, under certain conditions,
pending IO interruptions have to be cleared. Let's add an interface
for that.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Memory hotplug can fail for some combinations of RAM and maxmem when
DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends
on maximum addressable memory returned by guest and this value is currently
being calculated wrongly by the guest kernel routine memory_hotplug_max().
While there is an attempt to fix the guest kernel, this patch works
around the problem within QEMU itself.
memory_hotplug_max() routine in the guest kernel arrives at max
addressable memory by multiplying lmb-size with the lmb-count obtained
from ibm,dynamic-memory property. There are two assumptions here:
- All LMBs are part of ibm,dynamic memory: This is not true for PowerKVM
where only hot-pluggable LMBs are present in this property.
- The memory area comprising of RAM and hotplug region is contiguous: This
needn't be true always for PowerKVM as there can be gap between
boot time RAM and hotplug region.
To work around this guest kernel bug, ensure that ibm,dynamic-memory
has information about all the LMBs (RMA, boot-time LMBs, future
hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and
hotpluggable region).
RMA is represented separately by memory@0 node. Hence mark RMA LMBs
and also the LMBs for the gap b/n RAM and hotpluggable region as
reserved and as having no valid DRC so that these LMBs are not considered
by the guest.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This ensures that the underlying memory is marked dirty once the transfer
is complete and resolves cache coherency problems under MacOS 9.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We need the PPC_FEATURE2_HAS_HTM bit in a subsequent patch, so
add the PowerPC AT_HWCAP2 definitions.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
OpenSSL's libcrypto always defines AES symbols with the same names as
qemu's local aes code. This is problematic when enabling at least curl
as that frequently also uses libcrypto. It might not be noticed when
running, but if you try to statically link, everything falls down.
An example snippet:
LINK qemu-nbd
.../libcrypto.a(aes-x86_64.o): In function 'AES_encrypt':
(.text+0x460): multiple definition of 'AES_encrypt'
crypto/aes.o:aes.c:(.text+0x670): first defined here
.../libcrypto.a(aes-x86_64.o): In function 'AES_decrypt':
(.text+0x9f0): multiple definition of 'AES_decrypt'
crypto/aes.o:aes.c:(.text+0xb30): first defined here
.../libcrypto.a(aes-x86_64.o): In function 'AES_cbc_encrypt':
(.text+0xf90): multiple definition of 'AES_cbc_encrypt'
crypto/aes.o:aes.c:(.text+0xff0): first defined here
collect2: error: ld returned 1 exit status
.../qemu-2.6.0/rules.mak:105: recipe for target 'qemu-nbd' failed
make: *** [qemu-nbd] Error 1
The aes.h header has redefines already for FreeBSD, but go ahead and
enable that for everyone since there's no real good reason to not use
a namespace all the time.
Signed-off-by: Mike Frysinger <vapier@chromium.org>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This wrapper for machine_usb(current_machine) is not necessary,
replace all usages of usb_enabled() with machine_usb().
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: qemu-arm@nongnu.org
Cc: qemu-ppc@nongnu.org
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 1465419025-21519-3-git-send-email-ehabkost@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Having a fixed-size hash table for keeping track of all translation blocks
is suboptimal: some workloads are just too big or too small to get maximum
performance from the hash table. The MRU promotion policy helps improve
performance when the hash table is a little undersized, but it cannot
make up for severely undersized hash tables.
Furthermore, frequent MRU promotions result in writes that are a scalability
bottleneck. For scalability, lookups should only perform reads, not writes.
This is not a big deal for now, but it will become one once MTTCG matures.
The appended fixes these issues by using qht as the implementation of
the TB hash table. This solution is superior to other alternatives considered,
namely:
- master: implementation in QEMU before this patchset
- xxhash: before this patch, i.e. fixed buckets + xxhash hashing + MRU.
- xxhash-rcu: fixed buckets + xxhash + RCU list + MRU.
MRU is implemented here by adding an intermediate struct
that contains the u32 hash and a pointer to the TB; this
allows us, on an MRU promotion, to copy said struct (that is not
at the head), and put this new copy at the head. After a grace
period, the original non-head struct can be eliminated, and
after another grace period, freed.
- qht-fixed-nomru: fixed buckets + xxhash + qht without auto-resize +
no MRU for lookups; MRU for inserts.
The appended solution is the following:
- qht-dyn-nomru: dynamic number of buckets + xxhash + qht w/ auto-resize +
no MRU for lookups; MRU for inserts.
The plots below compare the considered solutions. The Y axis shows the
boot time (in seconds) of a debian jessie image with arm-softmmu; the X axis
sweeps the number of buckets (or initial number of buckets for qht-autoresize).
The plots in PNG format (and with errorbars) can be seen here:
http://imgur.com/a/Awgnq
Each test runs 5 times, and the entire QEMU process is pinned to a
single core for repeatability of results.
Host: Intel Xeon E5-2690
28 ++------------+-------------+-------------+-------------+------------++
A***** + + + master **A*** +
27 ++ * xxhash ##B###++
| A******A****** xxhash-rcu $$C$$$ |
26 C$$ A******A****** qht-fixed-nomru*%%D%%%++
D%%$$ A******A******A*qht-dyn-mru A*E****A
25 ++ %%$$ qht-dyn-nomru &&F&&&++
B#####% |
24 ++ #C$$$$$ ++
| B### $ |
| ## C$$$$$$ |
23 ++ # C$$$$$$ ++
| B###### C$$$$$$ %%%D
22 ++ %B###### C$$$$$$C$$$$$$C$$$$$$C$$$$$$C$$$$$$C
| D%%%%%%B###### @E@@@@@@ %%%D%%%@@@E@@@@@@E
21 E@@@@@@E@@@@@@F&&&@@@E@@@&&&D%%%%%%B######B######B######B######B######B
+ E@@@ F&&& + E@ + F&&& + +
20 ++------------+-------------+-------------+-------------+------------++
14 16 18 20 22 24
log2 number of buckets
Host: Intel i7-4790K
14.5 ++------------+------------+-------------+------------+------------++
A** + + + master **A*** +
14 ++ ** xxhash ##B###++
13.5 ++ ** xxhash-rcu $$C$$$++
| qht-fixed-nomru %%D%%% |
13 ++ A****** qht-dyn-mru @@E@@@++
| A*****A******A****** qht-dyn-nomru &&F&&& |
12.5 C$$ A******A******A*****A****** ***A
12 ++ $$ A*** ++
D%%% $$ |
11.5 ++ %% ++
B### %C$$$$$$ |
11 ++ ## D%%%%% C$$$$$ ++
| # % C$$$$$$ |
10.5 F&&&&&&B######D%%%%% C$$$$$$C$$$$$$C$$$$$$C$$$$$C$$$$$$ $$$C
10 E@@@@@@E@@@@@@B#####B######B######E@@@@@@E@@@%%%D%%%%%D%%%###B######B
+ F&& D%%%%%%B######B######B#####B###@@@D%%% +
9.5 ++------------+------------+-------------+------------+------------++
14 16 18 20 22 24
log2 number of buckets
Note that the original point before this patch series is X=15 for "master";
the little sensitivity to the increased number of buckets is due to the
poor hashing function in master.
xxhash-rcu has significant overhead due to the constant churn of allocating
and deallocating intermediate structs for implementing MRU. An alternative
would be do consider failed lookups as "maybe not there", and then
acquire the external lock (tb_lock in this case) to really confirm that
there was indeed a failed lookup. This, however, would not be enough
to implement dynamic resizing--this is more complex: see
"Resizable, Scalable, Concurrent Hash Tables via Relativistic
Programming" by Triplett, McKenney and Walpole. This solution was
discarded due to the very coarse RCU read critical sections that we have
in MTTCG; resizing requires waiting for readers after every pointer update,
and resizes require many pointer updates, so this would quickly become
prohibitive.
qht-fixed-nomru shows that MRU promotion is advisable for undersized
hash tables.
However, qht-dyn-mru shows that MRU promotion is not important if the
hash table is properly sized: there is virtually no difference in
performance between qht-dyn-nomru and qht-dyn-mru.
Before this patch, we're at X=15 on "xxhash"; after this patch, we're at
X=15 @ qht-dyn-nomru. This patch thus matches the best performance that we
can achieve with optimum sizing of the hash table, while keeping the hash
table scalable for readers.
The improvement we get before and after this patch for booting debian jessie
with arm-softmmu is:
- Intel Xeon E5-2690: 10.5% less time
- Intel i7-4790K: 5.2% less time
We could get this same improvement _for this particular workload_ by
statically increasing the size of the hash table. But this would hurt
workloads that do not need a large hash table. The dynamic (upward)
resizing allows us to start small and enlarge the hash table as needed.
A quick note on downsizing: the table is resized back to 2**15 buckets
on every tb_flush; this makes sense because it is not guaranteed that the
table will reach the same number of TBs later on (e.g. most bootup code is
thrown away after boot); it makes sense to grow the hash table as
more code blocks are translated. This also avoids the complication of
having to build downsizing hysteresis logic into qht.
Reviewed-by: Sergey Fedorov <serge.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-15-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This is a fast, scalable chained hash table with optional auto-resizing, allowing
reads that are concurrent with reads, and reads/writes that are concurrent
with writes to separate buckets.
A hash table with these features will be necessary for the scalability
of the ongoing MTTCG work; before those changes arrive we can already
benefit from the single-threaded speedup that qht also provides.
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-11-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Sometimes it is useful to have a quick histogram to represent a certain
distribution -- for example, when investigating a performance regression
in a hash table due to inadequate hashing.
The appended allows us to easily represent a distribution using Unicode
characters. Further, the data structure keeping track of the distribution
is so simple that obtaining its values for off-line processing is trivial.
Example, taking the last 10 commits to QEMU:
Characters in commit title Count
-----------------------------------
39 1
48 1
53 1
54 2
57 1
61 1
67 1
78 1
80 1
qdist_init(&dist);
qdist_inc(&dist, 39);
[...]
qdist_inc(&dist, 80);
char *str = qdist_pr(&dist, 9, QDIST_PR_LABELS);
// -> [39.0,43.6)▂▂ █▂ ▂ ▄[75.4,80.0]
g_free(str);
char *str = qdist_pr(&dist, 4, QDIST_PR_LABELS);
// -> [39.0,49.2)▁█▁▁[69.8,80.0]
g_free(str);
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-9-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
For some workloads such as arm bootup, tb_phys_hash is performance-critical.
The is due to the high frequency of accesses to the hash table, originated
by (frequent) TLB flushes that wipe out the cpu-private tb_jmp_cache's.
More info:
https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg05098.html
To dig further into this I modified an arm image booting debian jessie to
immediately shut down after boot. Analysis revealed that quite a bit of time
is unnecessarily spent in tb_phys_hash: the cause is poor hashing that
results in very uneven loading of chains in the hash table's buckets;
the longest observed chain had ~550 elements.
The appended addresses this with two changes:
1) Use xxhash as the hash table's hash function. xxhash is a fast,
high-quality hashing function.
2) Feed the hashing function with not just tb_phys, but also pc and flags.
This improves performance over using just tb_phys for hashing, since that
resulted in some hash buckets having many TB's, while others getting very few;
with these changes, the longest observed chain on a single hash bucket is
brought down from ~550 to ~40.
Tests show that the other element checked for in tb_find_physical,
cs_base, is always a match when tb_phys+pc+flags are a match,
so hashing cs_base is wasteful. It could be that this is an ARM-only
thing, though. UPDATE:
On Tue, Apr 05, 2016 at 08:41:43 -0700, Richard Henderson wrote:
> The cs_base field is only used by i386 (in 16-bit modes), and sparc (for a TB
> consisting of only a delay slot).
> It may well still turn out to be reasonable to ignore cs_base for hashing.
BTW, after this change the hash table should not be called "tb_hash_phys"
anymore; this is addressed later in this series.
This change gives consistent bootup time improvements. I tested two
host machines:
- Intel Xeon E5-2690: 11.6% less time
- Intel i7-4790K: 19.2% less time
Increasing the number of hash buckets yields further improvements. However,
using a larger, fixed number of buckets can degrade performance for other
workloads that do not translate as many blocks (600K+ for debian-jessie arm
bootup). This is dealt with later in this series.
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-8-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This will be used by upcoming changes for hashing the tb hash.
Add this into a separate file to include the copyright notice from
xxhash.
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-7-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Signed-off-by: Guillaume Delbergue <guillaume.delbergue@greensocs.com>
[Rewritten. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Emilio's additions: use TAS instead of atomic_xchg; emit acquire/release
barriers; return bool from trylock; call cpu_relax() while spinning;
optimize for uncontended locks by acquiring the lock with TAS instead
of TATAS; add qemu_spin_locked().]
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-6-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Taken from the linux kernel.
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-5-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
It is a more appropriate name, now that the mutex embedded
in the seqlock is gone.
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-4-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This option is unused; besides, it bloats the struct when not needed.
Let's just let writers define their own locks elsewhere.
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-3-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>