LoPAPR dictates that during system reset all DMA windows must be removed
and the default DMA32 window must be created so does the patch.
At the moment there is just one window supported so no change in
behaviour is expected.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We are going to have multiple DMA windows at different offsets on
a PCI bus. For the sake of migration, we will have as many TCE table
objects pre-created as many windows supported.
So we need a way to map windows dynamically onto a PCI bus
when migration of a table is completed but at this stage a TCE table
object does not have access to a PHB to ask it to map a DMA window
backed by just migrated TCE table.
This adds a "root" memory region (UINT64_MAX long) to the TCE object.
This new region is mapped on a PCI bus with enabled overlapping as
there will be one root MR per TCE table, each of them mapped at 0.
The actual IOMMU memory region is a subregion of the root region and
a TCE table enables/disables this subregion and maps it at
the specific offset inside the root MR which is 1:1 mapping of
a PCI address space.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The source guest could have reallocated the default TCE table and
migrate bigger/smaller table. This adds reallocation in post_load()
if the default table size is different on source and destination.
This adds @bus_offset, @page_shift to the migration stream as
a subsection so when DDW is added, migration to older machines will
still be possible. As @bus_offset and @page_shift are not used yet,
this makes no change in behavior.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently TCE tables are created once at start and their sizes never
change. We are going to change that by introducing a Dynamic DMA windows
support where DMA configuration may change during the guest execution.
This changes spapr_tce_new_table() to create an empty zero-size IOMMU
memory region (IOMMU MR). Only LIOBN is assigned by the time of creation.
It still will be called once at the owner object (VIO or PHB) creation.
This introduces an "enabled" state for TCE table objects, some
helper functions are added:
- spapr_tce_table_enable() receives TCE table parameters, stores in
sPAPRTCETable and allocates a guest view of the TCE table
(in the user space or KVM) and sets the correct size on the IOMMU MR;
- spapr_tce_table_disable() disposes the table and resets the IOMMU MR
size; it is made public as the following DDW code will be using it.
This changes the PHB reset handler to do the default DMA initialization
instead of spapr_phb_realize(). This does not make differenct now but
later with more than just one DMA window, we will have to remove them all
and create the default one on a system reset.
No visible change in behaviour is expected except the actual table
will be reallocated every reset. We might optimize this later.
The other way to implement this would be dynamically create/remove
the TCE table QOM objects but this would make migration impossible
as the migration code expects all QOM objects to exist at the receiver
so we have to have TCE table objects created when migration begins.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On ppc64 especially, we flush the tlb on any slbie or tlbie instruction.
However, those instructions often come in bursts of 3 or more (context
switch will favor a series of slbie's for example to an slbia if the
SLB has less than a certain number of entries in it, and tlbie's can
happen in a series, with PAPR, H_BULK_REMOVE can remove up to 4 entries
at a time.
Doing a tlb_flush() each time is a waste of time. We end up doing a memset
of the whole TLB, reloading it for the next instruction, memset'ing again,
etc...
Those instructions don't have to take effect immediately. For slbie, they
can wait for the next context synchronizing event. For tlbie, the next
tlbsync.
This implements batching by keeping a flag that indicates that we have a
TLB in need of flushing. We check it on interrupts, rfi's, isync's and
tlbsync and flush the TLB if needed.
This reduces the number of tlb_flush() on a boot to a ubuntu installer
first dialog screen from roughly 360K down to 36K.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: added a 'CPUPPCState *' variable in h_remove() and
h_bulk_remove() ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: removed spurious whitespace change, use 0/1 not true/false
consistently, since tlb_need_flush has int type]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
At the moment presence of vfio-pci devices on a bus affect the way
the guest view table is allocated. If there is no vfio-pci on a PHB
and the host kernel supports KVM acceleration of H_PUT_TCE, a table
is allocated in KVM. However, if there is vfio-pci and we do yet not
KVM acceleration for these, the table has to be allocated by
the userspace. At the moment the table is allocated once at boot time
but next patches will reallocate it.
This moves kvmppc_create_spapr_tce/g_malloc0 and their counterparts
to helpers.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The user could have picked LIOBN via the CLI but the device tree
rendering code would still use the value derived from the PHB index
(which is the default fallback if LIOBN is not set in the CLI).
This replaces SPAPR_PCI_LIOBN() with the actual DMA LIOBN value.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There are possible racing situations involving hotplug events and
guest migration. For cases where a hotplug event is migrated, or
the guest is in the process of fetching device tree at the time of
migration, we need to ensure the device tree is created and
associated with the corresponding DRC for devices that were
hotplugged on the source, but 'coldplugged' on the target.
Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This patch adds check for negative return value from get_image_size(),
where it is missing. It avoids unnecessary two function calls.
Signed-off-by: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since a788f227 "memory: Allow replay of IOMMU mapping notifications"
when new VFIO listener is added, all existing IOMMU mappings are
replayed. However there is a problem that the base address of
an IOMMU memory region (IOMMU MR) is ignored which is not a problem
for the existing user (which is pseries) with its default 32bit DMA
window starting at 0 but it is if there is another DMA window.
This stores the IOMMU's offset_within_address_space and adjusts
the IOVA before calling vfio_dma_map/vfio_dma_unmap.
As the IOMMU notifier expects IOVA offset rather than the absolute
address, this also adjusts IOVA in sPAPR H_PUT_TCE handler before
calling notifier(s).
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Switch to adding compat properties incrementaly instead of
completly overwriting compat_props per machine type.
That removes data duplication which we have due to nested
[PC|SPAPR]_COMPAT_* macros.
It also allows to set default device properties from
default foo_machine_options() hook, which will be used
in following patch for putting VMGENID device as
a function if ISA bridge on pc/q35 machines.
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
[ehabkost: Fixed CCW_COMPAT_* and PC_COMPAT_0_* defines]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
exec-all.h contains TCG-specific definitions. It is not needed outside
TCG-specific files such as translate.c, exec.c or *helper.c.
One generic function had snuck into include/exec/exec-all.h; move it to
include/qom/cpu.h.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Memory barriers are needed also by Xen and, when the ioeventfd
bugs are fixed, by TCG as well.
sysemu/kvm.h is not anymore needed in sysemu/dma.h, move it to
the actual users.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reserve this to CPU state serialization.
Luckily, they were only used by sPAPR devices and these are ppc64
only. So there is no change to migration format.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This changes a cpu.h dependency for hw/ppc/ppc.h into a cpu-qom.h
dependency. For it to compile we also need to clean up a few unused
definitions.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The semantics of the list visit are somewhat baroque, with the
following pseudocode when FooList is used:
start()
for (prev = head; cur = next(prev); prev = &cur) {
visit(&cur->value)
}
Note that these semantics (advance before visit) requires that
the first call to next() return the list head, while all other
calls return the next element of the list; that is, every visitor
implementation is required to track extra state to decide whether
to return the input as-is, or to advance. It also requires an
argument of 'GenericList **' to next(), solely because the first
iteration might need to modify the caller's GenericList head, so
that all other calls have to do a layer of dereferencing.
Thankfully, we only have two uses of list visits in the entire
code base: one in spapr_drc (which completely avoids
visit_next_list(), feeding in integers from a different source
than uint8List), and one in qapi-visit.py. That is, all other
list visitors are generated in qapi-visit.c, and share the same
paradigm based on a qapi FooList type, so we can refactor how
lists are laid out with minimal churn among clients.
We can greatly simplify things by hoisting the special case
into the start() routine, and flipping the order in the loop
to visit before advance:
start(head)
for (tail = *head; tail; tail = next(tail)) {
visit(&tail->value)
}
With the simpler semantics, visitors have less state to track,
the argument to next() is reduced to 'GenericList *', and it
also becomes obvious whether an input visitor is allocating a
FooList during visit_start_list() (rather than the old way of
not knowing if an allocation happened until the first
visit_next_list()). As a minor drawback, we now allocate in
two functions instead of one, and have to pass the size to
both functions (unless we were to tweak the input visitors to
cache the size to start_list for reuse during next_list, but
that defeats the goal of less visitor state).
The signature of visit_start_list() is chosen to match
visit_start_struct(), with the new parameters after 'name'.
The spapr_drc case is a virtual visit, done by passing NULL for
list, similarly to how NULL is passed to visit_start_struct()
when a qapi type is not used in those visits. It was easy to
provide these semantics for qmp-output and dealloc visitors,
and a bit harder for qmp-input (several prerequisite patches
refactored things to make this patch straightforward). But it
turned out that the string and opts visitors munge enough other
state during visit_next_list() to make it easier to just
document and require a GenericList visit for now; an assertion
will remind us to adjust things if we need the semantics in the
future.
Several pre-requisite cleanup patches made the reshuffling of
the various visitors easier; particularly the qmp input visitor.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-24-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
As mentioned in previous patches, we want to call visit_end_struct()
functions unconditionally, so that visitors can release resources
tied up since the matching visit_start_struct() without also having
to worry about error priority if more than one error occurs.
Even though error_propagate() can be safely used to ignore a second
error during cleanup caused by a first error, it is simpler if the
cleanup cannot set an error. So, split out the error checking
portion (basically, input visitors checking for unvisited keys) into
a new function visit_check_struct(), which can be safely skipped if
any earlier errors are encountered, and leave the cleanup portion
(which never fails, but must be called unconditionally if
visit_start_struct() succeeded) in visit_end_struct().
Generated code in qapi-visit.c has diffs resembling:
|@@ -59,10 +59,12 @@ void visit_type_ACPIOSTInfo(Visitor *v,
| goto out_obj;
| }
| visit_type_ACPIOSTInfo_members(v, obj, &err);
|- error_propagate(errp, err);
|- err = NULL;
|+ if (err) {
|+ goto out_obj;
|+ }
|+ visit_check_struct(v, &err);
| out_obj:
|- visit_end_struct(v, &err);
|+ visit_end_struct(v);
| out:
and in qapi-event.c:
@@ -47,7 +47,10 @@ void qapi_event_send_acpi_device_ost(ACP
| goto out;
| }
| visit_type_q_obj_ACPI_DEVICE_OST_arg_members(v, ¶m, &err);
|- visit_end_struct(v, err ? NULL : &err);
|+ if (!err) {
|+ visit_check_struct(v, &err);
|+ }
|+ visit_end_struct(v);
| if (err) {
| goto out;
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-20-git-send-email-eblake@redhat.com>
[Conflict with a doc fixup resolved]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Now that the QMP output visitor supports an explicit null
output, we should utilize it to make it easier to diagnose
the difference between a missing fdt ('null') vs. a
present-but-empty one ('{}').
(Note that this reverts the behavior of commit ab8bf1d, taking
us back to the behavior of commit 6c2f9a1 [which in turn
stemmed from a crash fix in 1d10b44]; but that this time,
the change is intentional and not an accidental side-effect.)
Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1461879932-9020-17-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
CPU/memory resources can be signalled en-masse via
spapr_hotplug_req_add_by_count(), and when doing so, actually change
the meaning of the 'drc' parameter passed to
spapr_hotplug_req_event() to be a count rather than an index.
f40eb92 added a hook in spapr_hotplug_req_event() to record when a
device had been 'signalled' to the guest, but that code assumes that
drc is always an index. In cases where it's a count, such as memory
hotplug, the DRC lookup will fail, leading to an assert.
Fix this by only explicitly setting the signalled state for cases where
we are doing PCI hotplug.
For other resources types, since we cannot selectively track whether a
resource has been signalled in cases where we signal attach as a count,
set the 'signalled' state to true immediately upon making the
resource available via drck->attach().
Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: david@gibson.dropbear.id.au
Cc: qemu-ppc@nongnu.org
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
QEMU currently crashes when using bad parameters for the
spapr-pci-host-bridge device:
$ qemu-system-ppc64 -device spapr-pci-host-bridge,buid=0x123,liobn=0x321,mem_win_addr=0x1,io_win_addr=0x10
Segmentation fault
The problem is that spapr_tce_find_by_liobn() might return NULL, but
the code in spapr_populate_pci_dt() does not check for this condition
and then tries to dereference this NULL pointer.
Apart from that, the return value of spapr_populate_pci_dt() also
has to be checked for all PCI buses, not only for the last one, to
make sure we catch all errors.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Just a single bugfix for spapr in this batch, but I want to make sure
it gets in for 2.6.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXBzt1AAoJEGw4ysog2bOSyGQQAIL4aADwOhNoVLjtvBN3eoPQ
cP+Ps3DCK/9Z9l00cMR6/8zk5Q2Nb1FLf2Y3f3c2JVEFER8XCnsYPIyYfOZaMex4
/8DUfVueTh0RmpxhWwA4vQJtDqrilB0tUkkqgWFPE2luJcTVTUU7mig788d2yrmp
J35ncNaMcrXGy0Uh/wBlnOpfHD17ds8Sgpw02TT9QusqIjq8MWIkgat0v+h4RmRL
lzEE5N1Vp8vOvJENTEnuuKFbFTxcvhBS+A2K1y+s10k7c1CuFFJpAZY7g3T4hpqU
NZAirty5WeMlSYk9A0gQhgHWq2XSgbDWWj6tMGd5sCEQH5D6Kty0TPWnCpzSxjgu
aqGr7BqAV+NV/Rr/jGy4gvE432f1pZWUIxq271OH9H5aniCWSYFBR7w4UEaM1BPQ
I5tzkp7P1PMWIm/K5ryFVo083kU08KFXZDSbQR/vu4O+DuohPUKYid5cv4wJj/W+
GSzBwTwtp8iY2rs/nbMptSYHKYFYtd5PuALf4BoK62sF72NtWq+41X3QV8I4cIQd
hM03NyuObgnY7aygPmo9OGsvW/Dx8DKKoEO0QX+2gFa22rJ+j7RLSu7pHFW1JEXa
5VkVlTtN8L5NeeG0PdkgkChcgiqahUA6bRjekpFzdoncfsmmiPkiP5xQqK1DVKhW
SoJacddcj86QGpT1aioU
=4ZAr
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.6-20160408' into staging
ppc patch queue for 2016-04-08
Just a single bugfix for spapr in this batch, but I want to make sure
it gets in for 2.6.
# gpg: Signature made Fri 08 Apr 2016 06:02:45 BST using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.6-20160408:
spapr: Fix ibm,lrdr-capacity
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
ibm,lrdr-capacity has a field to describe the maximum address in bytes
and therefore, the most memory that can be allocated to this guest. We
are using maxmem for this field, but instead should use the actual RAM
address corresponding to the end of hotplug region.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently spapr doesn't support "aborting" hotplug of PCI
devices by allowing device_del to immediately remove the
device if we haven't signalled the presence of the device
to the guest.
In the past this wasn't an issue, since we always immediately
signalled device attach and simply relied on full guest-aware
add->remove path for device removal. However, as of 788d259,
we now defer signalling for PCI functions until function 0
is attached, so now we need to deal with these "abort" operations
for cases where a user hotplugs a non-0 function, then opts to
remove it prior hotplugging function 0. Currently they'd have to
reboot before the unplug completed. PCIe multifunction hotplug
does not have this requirement however, so from a management
implementation perspective it would be good to address this within
the same release as 788d259.
We accomplish this by simply adding a 'signalled' flag to track
whether a device hotplug event has been sent to the guest. If it
hasn't, we allow immediate removal under the assumption that the
guest will not be using the device. Devices present at boot/reset
time are also assumed to be 'signalled'.
For CPU/memory/etc, signalling will still happen immediately
as part of device_add, so only PCI functions should be affected.
Cc: bharata@linux.vnet.ibm.com
Cc: david@gibson.dropbear.id.au
Cc: sbhat@linux.vnet.ibm.com
Cc: qemu-ppc@nongnu.org
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
[dwg: This fixes a regression where an incorrect hot-add of a non-zero
function can no longer be backed out until function 0 is added]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch fixes the current AIL implementation for POWER8. The
interrupt vector address can be calculated directly from LPCR when the
exception is handled. The excp_prefix update becomes useless and we
can cleanup the H_SET_MODE hcall.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: Removed LPES0/1 handling for HV vs. !HV
Fixed LPCR_ILE case for POWERPC_EXCP_POWER8 ]
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
[dwg: This was written as a cleanup, but it also fixes a real bug
where setting an alternative interrupt location would not be
correctly migrated]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* Chardev fix from Marc-André
* config.status tweak from David
* Header file tweaks from Markus, myself and Veronia (Outreachy candidate)
* get_ticks_per_sec() removal from Rutuja (Outreachy candidate)
* Coverity fix from myself
* PKE implementation from myself, based on rth's XSAVE support
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJW9ErPAAoJEL/70l94x66DJfEH/A/QkMpAhrgNdyVsahzsGrzE
wx5gHFIc1nBYxyr62w4apUb5jPB7zaXu0LA7EAWDeAe0pyP8hZzLT9kJyOEDsuJu
zwKN2QeLSNMtPbnbKN0I/YQ2za2xX1V5ruhSeOJoVslUI214hgnAURaGshhQNzuZ
2CluDT9KgL5cQifAnKs5kJrwhIYShYNQB+1eDC/7wk28dd/EH+sPALIoF+rqrSmt
Zu4Mdqd+9Ns+oKOjA6br9ULq/Hzg0aDfY82J+XLVVqfF3PXQe8rTDmuMf/7jTn+M
Un7ZOcei9oZF2/9vfAfKQpDCcgD9HvOUSbgqV/ubmkPPmN/LNJzeKj0fBhrRN+Y=
=K12D
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* Log filtering from Alex and Peter
* Chardev fix from Marc-André
* config.status tweak from David
* Header file tweaks from Markus, myself and Veronia (Outreachy candidate)
* get_ticks_per_sec() removal from Rutuja (Outreachy candidate)
* Coverity fix from myself
* PKE implementation from myself, based on rth's XSAVE support
# gpg: Signature made Thu 24 Mar 2016 20:15:11 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
* remotes/bonzini/tags/for-upstream: (28 commits)
target-i386: implement PKE for TCG
config.status: Pass extra parameters
char: translate from QIOChannel error to errno
exec: fix error handling in file_ram_alloc
cputlb: modernise the debug support
qemu-log: support simple pid substitution for logs
target-arm: dfilter support for in_asm
qemu-log: dfilter-ise exec, out_asm, op and opt_op
qemu-log: new option -dfilter to limit output
qemu-log: Improve the "exec" TB execution logging
qemu-log: Avoid function call for disabled qemu_log_mask logging
qemu-log: correct help text for -d cpu
tcg: pass down TranslationBlock to tcg_code_gen
util: move declarations out of qemu-common.h
Replaced get_tick_per_sec() by NANOSECONDS_PER_SECOND
hw: explicitly include qemu-common.h and cpu.h
include/crypto: Include qapi-types.h or qemu/bswap.h instead of qemu-common.h
isa: Move DMA_transfer_handler from qemu-common.h to hw/isa/isa.h
Move ParallelIOArg from qemu-common.h to sysemu/char.h
Move QEMU_ALIGN_*() from qemu-common.h to qemu/osdep.h
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Conflicts:
scripts/clean-includes
RX buffer pools are now enabled by default for new machine types.
For older machine types, they are still disabled to avoid breaking
migration.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
And move the code adjusting the MSR mask and calling kvmppc_set_papr()
to it. This allows us to add a few more things such as disabling setting
of MSR:HV and appropriate LPCR bits which will be used when fixing
the exception model.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[clg: removed LPCR setting ]
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ePAPR defines "hcall-instructions" device-tree property which contains
code to call hypercalls in ePAPR paravirtualized guests. In general
pseries guests won't use this property, instead using the PAPR defined
hypercall interface.
However, this property has been re-used to implement a hack to allow
PR KVM to run (slightly modified) guests in some situations where it
otherwise wouldn't be able to (because the system's L0 hypervisor
doesn't forward the PAPR hypercalls to the PR KVM kernel).
Hence, this property is always present in the device tree for pseries
guests. All KVM guests use it at least to read features via the
KVM_HC_FEATURES hypercall.
The property is populated by the code returned from the KVM's
KVM_PPC_GET_PVINFO ioctl; if not implemented in the KVM, QEMU supplies
code which will fail all hypercall attempts. If QEMU does not create
the property, and the guest kernel is compiled with
CONFIG_EPAPR_PARAVIRT (which is normally the case), there is exactly
the same stub at @epapr_hypercall_start already.
Rather than maintaining this fairly useless stub implementation, it
makes more sense not to create the property in the device tree in the
first place if the host kernel does not implement it.
This changes kvmppc_get_hypercall() to return 1 if the host kernel
does not implement KVM_CAP_PPC_GET_PVINFO. The caller can use it to decide
on whether to create the property or not.
This changes the pseries machine to not create the property if KVM does
not implement KVM_PPC_GET_PVINFO. In practice this means that from now
on the property will not be created if either HV KVM or TCG is used.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[reworded commit message for clarity --dwg]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Move declarations out of qemu-common.h for functions declared in
utils/ files: e.g. include/qemu/path.h for utils/path.c.
Move inline functions out of qemu-common.h and into new files (e.g.
include/qemu/bcd.h)
Signed-off-by: Veronia Bahaa <veroniabahaa@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch replaces get_ticks_per_sec() calls with the macro
NANOSECONDS_PER_SECOND. Also, as there are no callers, get_ticks_per_sec()
is then removed. This replacement improves the readability and
understandability of code.
For example,
timer_mod(fdctrl->result_timer,
qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + (get_ticks_per_sec() / 50));
NANOSECONDS_PER_SECOND makes it obvious that qemu_clock_get_ns
matches the unit of the expression on the right side of the plus.
Signed-off-by: Rutuja Shah <rutu.shah.26@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef. Since then, we've moved to include qemu/osdep.h
everywhere. Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h. That's in excess of
100KiB of crap most .c files don't actually need.
Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h. Include qapi/error.h in .c files that need it and don't
get it now. Include qapi-types.h in qom/object.h for uint16List.
Update scripts/clean-includes accordingly. Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h. Update the list of includes in the qemu/osdep.h
comment quoted above similarly.
This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third. Unfortunately, the number depending on
qapi-types.h shrinks only a little. More work is needed for that one.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Change all machine_init() users that simply call type_register*()
to use type_init().
Cc: Evgeny Voevodin <e.voevodin@samsung.com>
Cc: Maksim Kozlov <m.kozlov@samsung.com>
Cc: Igor Mitsyanko <i.mitsyanko@gmail.com>
Cc: Dmitry Solodkiy <d.solodkiy@samsung.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Michael Walle <michael@walle.cc>
Cc: "Hervé Poussineau" <hpoussin@reactos.org>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Now that spapr-pci-vfio-host-bridge is reduced to just a stub, there is
only one implementation of the finish_realize hook in sPAPRPHBClass. So,
we can fold that implementation into its (single) caller, and remove the
hook. That's the last thing left in sPAPRPHBClass, so that can go away as
well.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Now that the regular spapr-pci-host-bridge can handle EEH, there are only
two things that spapr-pci-vfio-host-bridge does differently:
1. automatically sizes its DMA window to match the host IOMMU
2. checks if the attached VFIO container is backed by the
VFIO_SPAPR_TCE_IOMMU type on the host
(1) is not particularly useful, since the default window used by the
regular host bridge will work with the host IOMMU configuration on all
current systems anyway.
Plus, automatically changing guest visible configuration (such as the DMA
window) based on host settings is generally a bad idea. It's not
definitively broken, since spapr-pci-vfio-host-bridge is only supposed to
support VFIO devices which can't be migrated anyway, but still.
(2) is not really useful, because if a guest tries to configure EEH on a
different host IOMMU, the first call will fail and that will be that.
It's possible there are scripts or tools out there which expect
spapr-pci-vfio-host-bridge, so we don't remove it entirely. This patch
reduces it to just a stub for backwards compatibility.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Now that the EEH code is independent of the special
spapr-vfio-pci-host-bridge device, we can allow it on all spapr PCI
host bridges instead. We do this by changing spapr_phb_eeh_available()
to be based on the vfio_eeh_as_ok() call instead of the host bridge class.
Because the value of vfio_eeh_as_ok() can change with devices being
hotplugged or unplugged, this can potentially lead to some strange edge
cases where the guest starts using EEH, then it starts failing because
of a change in status.
However, it's not really any worse than the current situation. Cases that
would have worked previously will still work (i.e. VFIO devices from at
most one VFIO IOMMU group per vPHB), it's just that it's no longer
necessary to use spapr-vfio-pci-host-bridge with the groupid pre-specified.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
The EEH operations in the spapr-vfio-pci-host-bridge no longer rely on the
special groupid field in sPAPRPHBVFIOState. So we can simplify, removing
the class specific callbacks with direct calls based on a simple
spapr_phb_eeh_enabled() helper. For now we implement that in terms of
a boolean in the class, but we'll continue to clean that up later.
On its own this is a rather strange way of doing things, but it's a useful
intermediate step to further cleanups.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
This switches all EEH on VFIO operations in spapr_pci_vfio.c from the
broken vfio_container_ioctl() interface to the new vfio_as_eeh_op()
interface.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Since commit "60253ed1e6ec rng: add request queue support to rng-random",
the use of a spapr_rng device may hang vCPU threads.
The following path is taken without holding the lock to the main loop mutex:
h_random()
rng_backend_request_entropy()
rng_random_request_entropy()
qemu_set_fd_handler()
The consequence is that entropy_available() may be called before the vCPU
thread could even queue the request: depending on the scheduling, it may
happen that entropy_available() does not call random_recv()->qemu_sem_post().
The vCPU thread will then sleep forever in h_random()->qemu_sem_wait().
This could not happen before 60253ed1e6 because entropy_available() used
to call random_recv() unconditionally.
This patch ensures the lock is held to avoid the race.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Cédric Le Goater <clg@fr.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
fa48b43 "target-ppc: Remove hack for ppc_hash64_load_hpte*() with HV KVM"
purports to remove a hack in the handling of hash page tables (HPTs)
managed by KVM instead of qemu. However, it actually went in the wrong
direction.
That patch requires anything looking for an external HPT (that is one not
managed by the guest itself) to check both env->external_htab (for a qemu
managed HPT) and kvmppc_kern_htab (for a KVM managed HPT). That's a
problem because kvmppc_kern_htab is local to mmu-hash64.c, but some places
which need to check for an external HPT are outside that, such as
kvm_arch_get_registers(). The latter was subtly broken by the earlier
patch such that gdbstub can no longer access memory.
Basically a KVM managed HPT is much more like a qemu managed HPT than it is
like a guest managed HPT, so the original "hack" was actually on the right
track.
This partially reverts fa48b43, so we again mark a KVM managed external HPT
by putting a special but non-NULL value in env->external_htab. It then
goes further, using that marker to eliminate the kvmppc_kern_htab global
entirely. The ppc_hash64_set_external_hpt() helper function is extended
to set that marker if passed a NULL value (if you're setting an external
HPT, but don't have an actual HPT to set, the assumption is that it must
be a KVM managed HPT).
This also has some flow-on changes to the HPT access helpers, required by
the above changes.
Reported-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
When a Power cpu with 64-bit hash MMU has it's hash page table (HPT)
pointer updated by a write to the SDR1 register we need to update some
derived variables. Likewise, when the cpu is configured for an external
HPT (one not in the guest memory space) some derived variables need to be
updated.
Currently the logic for this is (partially) duplicated in ppc_store_sdr1()
and in spapr_cpu_reset(). In future we're going to need it in some other
places, so make some common helpers for this update.
In addition the new ppc_hash64_set_external_hpt() helper also updates
SDR1 in KVM - it's not updated by the normal runtime KVM <-> qemu CPU
synchronization. In a sense this belongs logically in the
ppc_hash64_set_sdr1() helper, but that is called from
kvm_arch_get_registers() so can't itself call cpu_synchronize_state()
without infinite recursion. In practice this doesn't matter because
the only other caller is TCG specific.
Currently there aren't situations where updating SDR1 at runtime in KVM
matters, but there are going to be in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Since 3f1e147, QEMU has adopted a convention of supporting function
hotplug by deferring hotplug events until func 0 is hotplugged.
This is likely how management tools like libvirt would expose
such support going forward.
Since sPAPR guests rely on per-func events rather than
slot-based, our protocol has been to hotplug func 0 *first* to
avoid cases where devices appear within guests without func 0
present to avoid undefined behavior.
To remain compatible with new convention, defer hotplug in a
similar manner, but then generate events in 0-first order as we
did in the past. Once func 0 present, fail any attempts to plug
additional functions (as we do with PCIe).
For unplug, defer unplug operations in a similar manner, but
generate unplug events such that function 0 is removed last in guest.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Some CPUs are of an opposite data-endianness to other components in the
system. Sometimes elfs have the data sections layed out with this CPU
data-endianness accounting for when loaded via the CPU, so byte swaps
(relative to other system components) will occur.
The leading example, is ARM's BE32 mode, which is is basically LE with
address manipulation on half-word and byte accesses to access the
hw/byte reversed address. This means that word data is invariant
across LE and BE32. This also means that instructions are still LE.
The expectation is that the elf will be loaded via the CPU in this
endianness scheme, which means the data in the elf is reversed at
compile time.
As QEMU loads via the system memory directly, rather than the CPU, we
need a mechanism to reverse elf data endianness to implement this
possibility.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Using the return value to report errors is error prone:
- xics_alloc() returns -1 on error but spapr_vio_busdev_realize() errors
on 0
- xics_alloc_block() returns the unclear value of ics->offset - 1 on error
but both rtas_ibm_change_msi() and spapr_phb_realize() error on 0
This patch adds an errp argument to xics_alloc() and xics_alloc_block() to
report errors. The return value of these functions is a valid IRQ number
if errp is NULL. It is undefined otherwise.
The corresponding error traces get promotted to error messages. Note that
the "can't allocate IRQ" error message in spapr_vio_busdev_realize() also
moves to xics_alloc(). Similar error message consolidation isn't really
applicable to xics_alloc_block() because callers have extra context (device
config address, MSI or MSIX).
This fixes the issues mentioned above.
Based on previous work from Brian W. Hart.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since QEMU 2.4, we have a configuration section in the migration stream.
This must be skipped for older machines, like it is already done for x86.
This patch fixes the migration of pseries-2.3 from/to QEMU 2.3, but it
breaks migration of the same machine from/to QEMU 2.4/2.4.1/2.5. We do
that anyway because QEMU 2.3 is likely to be more widely deployed than
newer QEMU versions.
Fixes: 61964c23e5
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since QEMU 2.3, we have a vmdesc section in the migration stream.
This section is not mandatory but when migrating a pseries-2.2
machine from QEMU 2.2, you get a warning at the destination:
qemu-system-ppc64: Expected vmdescription section, but got 0
The warning goes away if we decide to skip vmdesc as well for
older pseries, like it is already done for pc's.
This can only be observed with -cpu POWER7 because POWER8
cannot migrate from QEMU 2.2 to 2.3 (insns_flags2 mismatch).
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This RTAS call is used to request new interrupts or to free all interrupts.
If the driver has already allocated interrupts and asks again for a non-null
number of irqs, then the rtas_ibm_change_msi() function will silently leak
the previous interrupts.
It happens because xics_free() is only called when the driver releases all
interrupts (!req_num case). Note that the previously allocated spapr_pci_msi
is not leaked because the GHashTable is created with destroy functions and
g_hash_table_insert() hence frees the old value.
This patch makes sure any previously allocated MSIs are released when a
new allocation succeeds.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The num local variable is initialized to zero and has no writer.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It is currently possible to hotplug a spapr_rng device but QEMU crashes
when we try to hot unplug:
ERROR:hw/core/qdev.c:295:qdev_unplug: assertion failed: (hotplug_ctrl)
Aborted
This happens because spapr_rng isn't plugged to any bus and sPAPR does
not provide hotplug support for it: qdev_get_hotplug_handler() hence
return NULL and we hit the assertion.
And anyway, it doesn't make much sense to unplug this device since hcalls
cannot be unregistered. Even the idea of hotplugging a RNG device instead
of declaring it on the QEMU command line looks weird.
This patch simply disables hotpluggability for the spapr-rng class.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This fixes a crash in the target QEMU during migration.
Broken in commit c5f54f3.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[reworded commit message]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This hypercall either initializes a page with zeros, or copies
another page.
According to LoPAPR, the i-cache of the page should also be
flushed if using H_ICACHE_INVALIDATE or H_ICACHE_SYNCHRONIZE,
and the d-cache should be synchronized to the RAM if the
H_ICACHE_SYNCHRONIZE flag is used. For this, two new functions
are introduced, kvmppc_dcbst_range() and kvmppc_icbi()_range, which
use the corresponding assembler instructions to flush the caches
if running with KVM on Power. If the code runs with TCG instead,
the code only uses tb_flush(), assuming that this will be
enough for synchronization.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The LoPAPR specification defines the following for the RTAS
power-off call: "On successful operation, does not return".
However, the implementation in QEMU currently returns and runs
the guest CPU again for some more cycles. This caused some
trouble with the new ppc implementation of the kvm-unit-tests
recently. So let's make sure that the QEMU implementation
follows the spec, thus stop the CPU to make sure that the
RTAS call does not return to the guest anymore.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Tested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 4b23699 "pseries: Add pseries-2.6 machine type" added a new
SPAPR_COMPAT_2_5 macro in the usual way. However, it didn't add this
macro to the existing SPAPR_COMPAT_2_4 macro so that pseries-2.4
inherits newer compatibility properties which are needed for 2.5 and
earlier.
This corrects the oversight.
Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Also implement the command, by taking device list mask into account
when polling ADB devices.
Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Also implement the command, by removing the hardcoded period of 20 ms/50 Hz
and replacing it by the one requested by user.
Update VMState version to store this new parameter.
Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The H_SET_XDABR hypercall is similar to H_SET_DABR, but also sets
the extended DABR (DABRX) register.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
According to LoPAPR, h_set_dabr should simply set DABRX to 3
(if the register is available), and load the parameter into DABR.
If DABRX is not available, the hypervisor has to check the
"Breakpoint Translation" bit of the DABR register first.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is a very simple hypercall that only sets up the SPRG0
register for the guest (since writing to SPRG0 was only permitted
to the hypervisor in older versions of the PowerISA).
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
htab_save_first_pass could return without finishing its work due to
timeout. The patch checks if another invocation of it is necessary and
will call it in htab_save_complete if necessary.
Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
[removed overlong line]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
With HV KVM, the guest's hash page table (HPT) is managed by the kernel and
not directly accessible to QEMU. This means that spapr->htab is NULL
and normally env->external_htab would also be NULL for each cpu.
However, that would cause ppc_hash64_load_hpte*() to do the wrong thing in
the few cases where QEMU does need to load entries from the in-kernel HPT.
Specifically, seeing external_htab is NULL, they would look for an HPT
within the guest's address space instead.
To stop that we have an ugly hack in the pseries machine type code to
set external htab to (void *)1 instead.
This patch removes that hack by having ppc_hash64_load_hpte*() explicitly
check kvmppc_kern_htab instead, which makes more sense.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
At the moment the size of the hash page table (HPT) is fixed based on the
maximum memory allowed to the guest. As such, we allocate the table during
machine construction, and just clear it at reset.
However, we're planning to implement a PAPR extension allowing the hash
page table to be resized at runtime. This will mean that on reset we want
to revert it to the default size. It also means that when migrating, we
need to make sure the destination allocates an HPT of size matching the
host, since the guest could have changed it before the migration.
This patch replaces the spapr_alloc_htab() and spapr_reset_htab() functions
with a new spapr_reallocate_hpt() function. This is called at reset and
inbound migration only, not during machine init any more.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
At present we calculate the recommended hash page table (HPT) size for a
pseries guest just once in ppc_spapr_init() before allocating the HPT.
In future patches we're going to want this calculation in other places, so
this splits it out into a helper function. While we're at it, change the
calculation to use ctz() instead of an explicit loop.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
When migrating the 'pseries' machine type with KVM, we use a special fd
to access the hash page table stored within KVM. Usually, this fd is
opened at the beginning of migration, and kept open until the migration
is complete.
However, if there is a guest reset during the migration, the fd can become
stale and we need to re-open it. At the moment we use an 'htab_fd_stale'
flag in sPAPRMachineState to signal this, which is checked in the migration
iterators.
But that's rather ugly. It's simpler to just close and invalidate the
fd on reset, and lazily re-open it in migration if necessary. This patch
implements that change.
This requires a small addition to the machine state's instance_init,
so that htab_fd is initialized to -1 (telling the migration code it
needs to open it) instead of 0, which could be a valid fd.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
No backend was setting an error when ending the visit of a list or
implicit struct, or when moving to the next list node. Make the
callers a bit easier to follow by making this a part of the contract,
and removing the errp argument - callers can then unconditionally end
an object as part of cleanup without having to think about whether a
second error is dominated by a first, because there is no second
error.
A later patch will then tackle the larger task of splitting
visit_end_struct(), which can indeed set an error.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1454075341-13658-24-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
visit_start_struct() and visit_type_enum() had a 'kind' argument
that was usually set to either the stringized version of the
corresponding qapi type name, or to NULL (although some clients
didn't even get that right). But nothing ever used the argument.
It's even hard to argue that it would be useful in a debugger,
as a stack backtrace also tells which type is being visited.
Therefore, drop the 'kind' argument as dead.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1454075341-13658-22-git-send-email-eblake@redhat.com>
[Harmless rebase mistake cleaned up]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Similar to the previous patch, it's nice to have all functions
in the tree that involve a visitor and a name for conversion to
or from QAPI to consistently stick the 'name' parameter next
to the Visitor parameter.
Done by manually changing include/qom/object.h and qom/object.c,
then running this Coccinelle script and touching up the fallout
(Coccinelle insisted on adding some trailing whitespace).
@ rule1 @
identifier fn;
typedef Object, Visitor, Error;
identifier obj, v, opaque, name, errp;
@@
void fn
- (Object *obj, Visitor *v, void *opaque, const char *name,
+ (Object *obj, Visitor *v, const char *name, void *opaque,
Error **errp) { ... }
@@
identifier rule1.fn;
expression obj, v, opaque, name, errp;
@@
fn(obj, v,
- opaque, name,
+ name, opaque,
errp)
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1454075341-13658-20-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
JSON uses "name":value, but many of our visitor interfaces were
called with visit_type_FOO(v, &value, name, errp). This can be
a bit confusing to have to mentally swap the parameter order to
match JSON order. It's particularly bad for visit_start_struct(),
where the 'name' parameter is smack in the middle of the
otherwise-related group of 'obj, kind, size' parameters! It's
time to do a global swap of the parameter ordering, so that the
'name' parameter is always immediately after the Visitor argument.
Additional reason in favor of the swap: the existing include/qjson.h
prefers listing 'name' first in json_prop_*(), and I have plans to
unify that file with the qapi visitors; listing 'name' first in
qapi will minimize churn to the (admittedly few) qjson.h clients.
Later patches will then fix docs, object.h, visitor-impl.h, and
those clients to match.
Done by first patching scripts/qapi*.py by hand to make generated
files do what I want, then by running the following Coccinelle
script to affect the rest of the code base:
$ spatch --sp-file script `git grep -l '\bvisit_' -- '**/*.[ch]'`
I then had to apply some touchups (Coccinelle insisted on TAB
indentation in visitor.h, and botched the signature of
visit_type_enum() by rewriting 'const char *const strings[]' to
the syntactically invalid 'const char*const[] strings'). The
movement of parameters is sufficient to provoke compiler errors
if any callers were missed.
// Part 1: Swap declaration order
@@
type TV, TErr, TObj, T1, T2;
identifier OBJ, ARG1, ARG2;
@@
void visit_start_struct
-(TV v, TObj OBJ, T1 ARG1, const char *name, T2 ARG2, TErr errp)
+(TV v, const char *name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp)
{ ... }
@@
type bool, TV, T1;
identifier ARG1;
@@
bool visit_optional
-(TV v, T1 ARG1, const char *name)
+(TV v, const char *name, T1 ARG1)
{ ... }
@@
type TV, TErr, TObj, T1;
identifier OBJ, ARG1;
@@
void visit_get_next_type
-(TV v, TObj OBJ, T1 ARG1, const char *name, TErr errp)
+(TV v, const char *name, TObj OBJ, T1 ARG1, TErr errp)
{ ... }
@@
type TV, TErr, TObj, T1, T2;
identifier OBJ, ARG1, ARG2;
@@
void visit_type_enum
-(TV v, TObj OBJ, T1 ARG1, T2 ARG2, const char *name, TErr errp)
+(TV v, const char *name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp)
{ ... }
@@
type TV, TErr, TObj;
identifier OBJ;
identifier VISIT_TYPE =~ "^visit_type_";
@@
void VISIT_TYPE
-(TV v, TObj OBJ, const char *name, TErr errp)
+(TV v, const char *name, TObj OBJ, TErr errp)
{ ... }
// Part 2: swap caller order
@@
expression V, NAME, OBJ, ARG1, ARG2, ERR;
identifier VISIT_TYPE =~ "^visit_type_";
@@
(
-visit_start_struct(V, OBJ, ARG1, NAME, ARG2, ERR)
+visit_start_struct(V, NAME, OBJ, ARG1, ARG2, ERR)
|
-visit_optional(V, ARG1, NAME)
+visit_optional(V, NAME, ARG1)
|
-visit_get_next_type(V, OBJ, ARG1, NAME, ERR)
+visit_get_next_type(V, NAME, OBJ, ARG1, ERR)
|
-visit_type_enum(V, OBJ, ARG1, ARG2, NAME, ERR)
+visit_type_enum(V, NAME, OBJ, ARG1, ARG2, ERR)
|
-VISIT_TYPE(V, OBJ, NAME, ERR)
+VISIT_TYPE(V, NAME, OBJ, ERR)
)
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1454075341-13658-19-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
h_enter() in the spapr code needs to know the page size of the HPTE it's
about to insert. Unlike other paths that do this, it doesn't have access
to the SLB, so at the moment it determines this with some open-coded
tests which assume POWER7 or POWER8 page size encodings.
To make this more flexible add ppc_hash64_hpte_page_shift_noslb() to
determine both the "base" page size per segment, and the individual
effective page size from an HPTE alone.
This means that the spapr code should now be able to handle any page size
listed in the env->sps table.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Alexander Graf <agraf@suse.de>
When HPTEs are removed or modified by hypercalls on spapr, we need to
invalidate the relevant pages in the qemu TLB.
Currently we do that by doing some complicated calculations to work out the
right encoding for the tlbie instruction, then passing that to
ppc_tlb_invalidate_one()... which totally ignores the argument and flushes
the whole tlb.
Avoid that by adding a new flush-by-hpte helper in mmu-hash64.c.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Alexander Graf <agraf@suse.de>
Like a lot of places these files include a mixture of functions taking
both the older CPUPPCState *env and newer PowerPCCPU *cpu. Move a step
closer to cleaning this up by standardizing on PowerPCCPU, except for the
helper_* functions which are called with the CPUPPCState * from tcg.
Callers and some related functions are updated as well, the boundaries of
what's changed here are a bit arbitrary.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
The implementation of the H_ENTER hypercall for PAPR guests needs to
enforce correct access attributes on the inserted HPTE. This means
determining if the HPTE's real address is a regular RAM address (which
requires attributes for coherent access) or an IO address (which requires
attributes for cache-inhibited access).
At the moment this check is implemented with (raddr < machine->ram_size),
but that only handles addresses in the base RAM area, not any hotplugged
RAM.
This patch corrects the problem with a new helper.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
The functions for migrating the hash page table on pseries machine type
(htab_save_setup() and htab_load()) can report some errors with an
explicit fprintf() before returning an appropriate error code. Change some
of these to use error_report() instead. htab_save_setup() is omitted for
now to avoid conflicts with some other in-progress work.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
This function includes a number of explicit fprintf()s for errors.
Change these to use error_report() instead.
Also replace the single exit(EXIT_FAILURE) with an explicit exit(1), since
the latter is the more usual idiom in qemu by a large margin.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Use the error handling infrastructure to pass an error out from
try_create_xics() instead of assuming &error_abort - the caller is in a
better position to decide on error handling policy.
Also change the error handling from an &error_abort to &error_fatal, since
this occurs during the initial machine construction and could be triggered
by bad configuration rather than a program error.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
The errors detected in this function necessarily indicate bugs in the rest
of the qemu code, rather than an external or configuration problem.
So, a simple assert() is more appropriate than any more complex error
reporting.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Use error_setg() to return an error rather than an explicit exit().
Previously it was an exit(0) instead of a non-zero exit code, which was
simply a bug. Also improve the error message.
While we're at it change the type of spapr_vga_init() to bool since that's
how we're using it anyway.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Use error_setg() and return an error, rather than using an explicit exit().
Also improve messages, and be more explicit about which constraint failed.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Currently spapr_cpu_init() is hardcoded to handle any errors as fatal.
That works for now, since it's only called from initial setup where an
error here means we really can't proceed.
However, we'll want to handle this more flexibly for cpu hotplug in future
so generalize this using the error reporting infrastructure. While we're
at it make a small cleanup in a related part of ppc_spapr_init() to use
error_report() instead of an old-style explicit fprintf().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Current ppc_set_compat() returns -1 for errors, and also (unconditionally)
reports an error message. The caller in h_client_architecture_support()
may then report it again using an outdated fprintf().
Clean this up by using the modern error reporting mechanisms. Also add
strerror(errno) to the error message.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
If guest doesn't have any dynamically reconfigurable (DR) logical memory
blocks (LMB), then we shouldn't create ibm,dynamic-reconfiguration-memory
device tree node.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
h_client_architecture_support() uses rtas_ld() for general purpose memory
access, despite the fact that it's not an RTAS routine at all and rtas_ld
makes things more awkward.
Clean this up by replacing rtas_ld() calls with appropriate ldXX_phys()
calls.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
rtas_st_buffer_direct() is a not particularly useful wrapper around
cpu_physical_memory_write(). All the callers are in
rtas_ibm_configure_connector, where it's better handled by local helper.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
rtas_st_buffer() appears in spapr.h as though it were a widely used helper,
but in fact it is only used for saving data in a format used by
rtas_ibm_get_system_parameter(). This changes it to a local helper more
specifically for that function.
While we're there fix a couple of small defects in
rtas_ibm_get_system_parameter:
- For the string value SPLPAR_CHARACTERISTICS, it wasn't including the
terminating \0 in the length which it should according to LoPAPR
7.3.16.1
- It now checks that the supplied buffer has at least enough space for
the length of the returned data, and returns an error if it does not.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Currently the aiocb is held within MACIOIDEState, however the IDE core code
assumes that the current actvie DMA aiocb is held in aiocb in a few places,
e.g. ide_bus_reset() and ide_reset().
Switch over to using IDEDMA aiocb to store the aiocb for the current active
DMA request so that bus resets and restarts are handled correctly. As a
consequence we can now use ide_set_inactive() rather than handling its
functionality ourselves.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-6-git-send-email-peter.maydell@linaro.org
Replace the uint32 softfloat-specific typedef with uint32_t.
This change was made with
find include hw fpu target-* -name '*.[ch]' | xargs sed -i -e 's/\buint32\b/uint32_t/g'
together with manual removal of the typedef definition,
manual undoing of various mis-hits, and another couple of
fixes found via test compilation.
All the uses in hw/ were using the wrong type by mistake.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Acked-by: Leon Alrae <leon.alrae@imgtec.com>
Acked-by: James Hogan <james.hogan@imgtec.com>
Message-id: 1452603315-27030-5-git-send-email-peter.maydell@linaro.org
Currently the ObjectProperty iterator API works as follows:
ObjectPropertyIterator *iter;
iter = object_property_iter_init(obj);
while ((prop = object_property_iter_next(iter))) {
...
}
object_property_iter_free(iter);
This has the benefit that the ObjectPropertyIterator struct
can be opaque, but has the downside that callers need to
explicitly call a free function. It is also not in keeping
with iterator style used elsewhere in QEMU/GLib2.
This patch changes the API to use stack allocation instead:
ObjectPropertyIterator iter;
object_property_iter_init(&iter, obj);
while ((prop = object_property_iter_next(&iter))) {
...
}
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[AF: Fused ObjectPropertyIterator struct with typedef]
Signed-off-by: Andreas Färber <afaerber@suse.de>
Commit 6daf194d, be62a2eb and 312fd5f got rid of a bunch, but they
keep coming back. Tracked down with the Coccinelle semantic patch
from commit 312fd5f.
Cc: Fam Zheng <famz@redhat.com>
Cc: Peter Crosthwaite <crosthwaitepeter@gmail.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Cc: Stefan Berger <stefanb@linux.vnet.ibm.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Changchun Ouyang <changchun.ouyang@intel.com>
Cc: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Markus Armbruster <armbru@pond.sub.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Acked-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1450452927-8346-17-git-send-email-armbru@redhat.com>
Not caught by Coccinelle, because we report the error only
conditionally here.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1450452927-8346-14-git-send-email-armbru@redhat.com>
Done with this Coccinelle semantic patch
@@
expression FMT, E, S;
expression list ARGS;
@@
- error_report(FMT, ARGS, error_get_pretty(E));
+ error_reportf_err(E, FMT/*@@@*/, ARGS);
(
- error_free(E);
|
exit(S);
|
abort();
)
followed by a replace of '%s"/*@@@*/' by '"' and some line rewrapping,
because I can't figure out how to make Coccinelle transform strings.
We now use the error whole instead of just its message obtained with
error_get_pretty(). This avoids suppressing its hint (see commit
50b7b00), but I can't see how the errors touched in this commit could
come with hints.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1450452927-8346-12-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Same Coccinelle semantic patch as in commit 565f65d.
We now use the original error whole instead of just its message
obtained with error_get_pretty(). This avoids suppressing its hint
(see commit 50b7b00), but I don't think the errors touched in this
commit can come with hints.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1450452927-8346-3-git-send-email-armbru@redhat.com>
Printing CPU registers is not helpful during machine initialization.
Moreover, these are straightforward configuration or "can get
resources" errors, so dumping core isn't appropriate either. Replace
hw_error() by error_report(); exit(1). Matches how we report these
errors in other machine initializations.
Cc: Richard Henderson <rth@twiddle.net>
Cc: qemu-arm@nongnu.org
Cc: qemu-ppc@nongnu.org
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: Markus Armbruster <armbru@pond.sub.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1450370121-5768-2-git-send-email-armbru@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1449764955-10741-3-git-send-email-armbru@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Alexander Graf <agraf@suse.de>
Cc: qemu-ppc@nongnu.org
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
[fixed return type of spapr_machine_finalizefn()]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The OHCI has some bugs and performance issues, so for
newer machines it's preferable to use XHCI instead.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This tweaks the way the default machine version is controlled, so that
there will be a bit less churn when each new version is introduced.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Currently each of the *_class_options() functions for the pseries-2.1 ..
pseries-2.5 machine types are standalone. This will become harder to
maintain as new versions are added.
This patch restructures them similarly to x86 where each function calls
the one from the next version, then overrides anything necessary for
compatibility with the specific version and older.
The default behaviour - that for the most recent machine are set up in
the base class initializer spapr_machine_class_init(). Previously it had
some things set up to default to older behaviour with the more recent
machines overriding it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
At the moment all the class_init functions and TypeInfo structures for the
various versioned pseries machine types are open-coded. As more versions
are created this is getting increasingly clumsy.
This patch borrows the approach used in PC, using a DEFINE_SPAPR_MACHINE()
macro to construct most of the boilerplate from simpler 'class_options' and
'instance_options' functions.
This patch makes a small semantic change - the versioned machine types are
now registered through machine_init() instead of type_init(). Since the
new way is how PC already did it, I'm assuming that's correct.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
To make the spapr_machine_*_class_init() functions a little less bulky.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Currently, the versioned spapr machine types put the machine type version
into the description string. PC does not do this, using just the name
itself to distinguish. Doing the same lets us move setting the description
into the common base class, simplifying the code slightly.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
The instance_init() functions for several of the pseries-x.y versioned
machine types explicitly call spapr_machine_initfn(). But that's the
instance_init function for the common parent of all those machine types,
so will already have been called beforehand by the QOM infrastructure.
Remove the redundant calls.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
hw/ppc/spapr.c has a number of definitions related to the various versioned
machine types ("pseries-2.1" .. "pseries-2.5") it defines. These are
mostly arranged by type of function first, then machine version second, and
it's not consistent about whether it goes in increasing or decreasing
version order.
This rearranges the code to keep all the definitions for a particular
machine version together, and arrange then consistently in order most
recent to least recent.
This brings us closer to matching the way PC does things, and makes later
cleanups easier to follow.
Apart from adding some comments marking each section, this is a pure
mechanical rearrangement with no semantic changes.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
98cec76 "machine: Set MachineClass::name automatically" removed the setting
of mc->name for the pseries machine types, since it can be derived
automatically from the type names constructed with MACHINE_TYPE_NAME().
Unfortunately fb0fc8f "spapr: Create pseries-2.5 machine" went in later and
brought one of them back.
This removes it again.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Section B.6.2.1 Root Node Properties of PAPR specification defines
a set of properties which shall be present in the device tree root,
one of these properties is "system-id" which "should be unique across
all systems and all manufacturers". Since UUID is meant to be unique,
it makes sense to use it as "system-id".
This adds "system-id" property to the device tree root when not empty.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
class_size = sizeof(XICSStateClass) does not make much sense
in the RTC code and likely was just a copy-n-paste error.
Let's simply remove it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
prop_get_fdt() misuses the visitor API: when fdt is null, it doesn't
visit anything. object_property_get_qobject() happily
object_property_get_qobject(). Amazingly, the latter survives the
misuse. Turns out we've papered over it long before prop_get_fdt()
existed, in commit 1d10b44.
However, commit 6c2f9a1 changed how we paper over it, and as a side
effect changed qom-get's value from {} to null. Change it right back
by fixing the visitor misuse.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It should only be created via spapr_dr_connector_new(). Attempting to
create it with -device crashes.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since prop_get_fdt() is only used with QmpOutputVisitor, errors
shouldn't actually happen, so this is only a latent bug.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The taihu_cpld_writel() function had an obvious typo that meant that
if it was ever called it would go into an infinite recursion. Newer
versions of clang will detect and warn about this:
hw/ppc/ppc405_boards.c:481:1: warning: all paths through this function will call itself [-Winfinite-recursion]
Fix this by converting taihu_cpld from the legacy old_mmio accessors
to new-style ones, with an impl {} declaration to cause the core
memory code to do the splitting of 16 bit and 32 bit accesses into
multiple 8-bit accesses.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The "pseries" alias is currently set twice, one time for the
pseries-2.4 machine and one time for the "pseries-2.5" machine.
To avoid confusion with the alias, let's remove the one from
the older machine class. And while we're at it, also remove
the "is_default = 0" there since the is_default variable
should be set to zero by default already.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
hw/ppc/spapr.c: Fix memory leak on error, it was introduced in bc09e0611
hw/acpi/memory_hotplug.c: Fix memory leak on error, it was introduced in 34f2af3d
Signed-off-by: Stefano Dong (董兴水) <opensource.dxs@aliyun.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stop directly accessing the Object::properties field data
structure and instead use the formal object property iterator
APIs. This insulates the code from future data structure
changes in the Object struct.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
MacOS 9 is racy when it comes to accessing the shift register. Fix this by
introducing a small delay between data accesses and raising the SR_INT
interrupt bit.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The mac99 machines always have a USB controller. Usually not having one around
doesn't hurt quite as much, but Mac OS 9 really really wants one or it crashes
on bootup.
So always add OHCI to make it happy.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
KVM_PPC_ALLOCATE_HTAB ioctl can return -ENOMEM for KVM guests and QEMU
never handled this correctly. But this didn't cause any problems till
now as KVM_PPC_ALLOCATE_HTAB ioctl returned with smaller than requested
HTAB when enough contiguous memory wasn't available in the host.
After the proposed kernel change: https://patchwork.ozlabs.org/patch/530501/,
KVM_PPC_ALLOCATE_HTAB ioctl will not fallback to lower sized HTAB
allocation and will fail if requested HTAB size can't be met.
Check for such failures in QEMU and abort appropriately. This will
prevent guest kernel from hanging/freezing during early boot by doing
graceful exit when host is unable to allocate requested HTAB.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In postcopy we're going to need to perform the complete phase
for postcopiable devices at a different point, start out by
renaming all of the 'complete's to make the difference obvious.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
HW_COMPAT_2_4 will become non-empty: prepare for it.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Message-id: 1444991154-79217-3-git-send-email-cornelia.huck@de.ibm.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
These messages are disabled by default; a perfect usecase for tracepoints.
Convert them over.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
LoPAPR defines a "ibm,pa-features" per-CPU device tree property which
describes extended features of the Processor Architecture.
This adds the property to the device tree. At the moment this is the
copy of what pHyp advertises except "I=1 (cache inhibited) Large Pages"
which is enabled for TCG and disabled when running under HV KVM host
with 4K system page size.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[aik: rebased, changed commit log, moved ci_large_pages initialization,
renamed pa_features arrays]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The core VFIO infrastructure more or less allows VFIO devices to work
on any normal guest PCI host bridge (PHB) without extra logic.
However, the "spapr-pci-host-bridge" device (as opposed to the special
"spapr-pci-vfio-host-bridge" device) breaks this by using a partially
KVM accelerated implementation of the guest kernel IOMMU which won't
work with VFIO devices, without additional kernel support.
This patch allows VFIO devices to work on the spapr-pci-host-bridge,
by having it switch off KVM TCE acceleration when a VFIO device is
added to the PHB (either on startup, or by hotplug).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Because of the way non-VFIO guest IOMMU operations are KVM accelerated, not
all TCE tables (guest IOMMU contexts) can support VFIO devices. Currently,
this is decided at creation time.
To support hotplug of VFIO devices, we need to allow a TCE table which
previously didn't allow VFIO devices to be switched so that it can. This
patch adds an spapr_tce_set_need_vfio() function to do this, by
reallocating the table in userspace if necessary.
Currently this doesn't allow the KVM acceleration to be re-enabled if all
the VFIO devices are removed. That's an optimization for another time.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
The vfio_accel parameter used when creating a new TCE table (guest IOMMU
context) has a confusing name. What it really means is whether we need the
TCE table created to be able to support VFIO devices.
VFIO is relevant, because when available we use in-kernel acceleration of
the TCE table, but that may not work with VFIO devices because updates to
the table are handled in kernel, bypass qemu and so don't hit qemu's
infrastructure for keeping the VFIO host IOMMU state in sync with the guest
IOMMU state.
Rename the parameter to "need_vfio" throughout. This is a cosmetic change,
with no impact on the logic.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
At present the PCI host bridge (PHB) for the pseries machine type has a
fixed DMA window from 0..1GB (in PCI address space) which is mapped to real
memory via the PAPR paravirtualized IOMMU.
For better support of VFIO devices, we're going to want to allow for
different configurations of the DMA window.
Eventually we'll want to allow the guest itself to reconfigure the window
via the PAPR dynamic DMA window interface, but as a preliminary this patch
allows the user to reconfigure the window with new properties on the PHB
device.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
According to a commit message in the Linux kernel (see here
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b60c31d85a2a
for example), the name of the property that carries the information
about the number of SLB entries should be called "slb-size", and
not "ibm,slb-size". The Linux kernel can deal with both names, but
to be on the safe side we should support the official name, too.
[Now that LoPAPR is public, the relevant requirement can be found in
section C.6.1.8 --dwg]
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Terminate the guest when HTAB of requested size isn't allocated by
the host.
When memory hotplug is attempted on a guest that has booted with
less than requested HTAB size, the guest kernel will not be able
to gracefully fail the hotplug request. This patch will ensure that
we never end up in a situation where memory hotplug fails due to
less than requested HTAB size.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Allocate HTAB from ppc_spapr_init() so that we can abort the guest
if requested HTAB size is't allocated by the host. However retain the
htab reset call in spapr_reset_htab() so that HTAB gets reset (and
not allocated) during machine reset.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It works fine with the Linux driver out of the box
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
This should help clarify the purpose of the function that returns
the host system's CPU cycle count.
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
ppc portion
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
setting gap to TRUE will make sparse DIMM
address auto allocation, leaving gaps between
a new DIMM address and preceeding existing DIMM.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Rename ELF_MACHINE to be PPC specific. This is used as-is by the
various PPC bootloaders and is locally defined to ELF_MACHINE in linux
user in PPC specific ifdeffery.
This removes another architecture specific definition from the global
namespace (as desired by multi-arch).
Cc: Alexander Graf <agraf@suse.de>
Cc: qemu-ppc@nongnu.org
Reviewed-by: Richard Henderson <rth@twiddle.net>
Acked-By: Riku Voipio <riku.voipio@linaro.org>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This checks if the PCI device retrieved from the PCI device address
is VFIO PCI device when enabling EEH functionality. If it's not
VFIO PCI device, the EEH functonality isn't enabled.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This reverts commit 7cb18007 ("sPAPR: Don't enable EEH on emulated
PCI devices") as rtas_ibm_set_eeh_option() isn't the right place
to check if there has the corresponding PCI device for the input
address, which can be PE address, not PCI device address.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The PAPR interface defines a hypercall to pass high-quality
hardware generated random numbers to guests. Recent kernels can
already provide this hypercall to the guest if the right hardware
random number generator is available. But in case the user wants
to use another source like EGD, or QEMU is running with an older
kernel, we should also have this call in QEMU, so that guests that
do not support virtio-rng yet can get good random numbers, too.
This patch now adds a new pseudo-device to QEMU that either
directly provides this hypercall to the guest or is able to
enable the in-kernel hypercall if available. The in-kernel
hypercall can be enabled with the use-kvm property, e.g.:
qemu-system-ppc64 -device spapr-rng,use-kvm=true
For handling the hypercall in QEMU instead, a "RngBackend" is
required since the hypercall should provide "good" random data
instead of pseudo-random (like from a "simple" library function
like rand() or g_random_int()). Since there are multiple RngBackends
available, the user must select an appropriate back-end via the
"rng" property of the device, e.g.:
qemu-system-ppc64 -object rng-random,filename=/dev/hwrng,id=gid0 \
-device spapr-rng,rng=gid0 ...
See http://wiki.qemu-project.org/Features-Done/VirtIORNG for
other example of specifying RngBackends.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The buffer that is allocated in spapr_populate_drconf_memory()
is used for setting both, the "ibm,dynamic-memory" and the
"ibm,associativity-lookup-arrays" property. However, only the
size of the first one is taken into account when allocating the
memory. So if the length of the second property is larger than
the length of the first one, we run into a buffer overflow here!
Fix it by taking the length of the second property into account,
too.
Fixes: "spapr: Support ibm,dynamic-reconfiguration-memory" patch
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
At present, if guest numa nodes are requested, but the cpus in each node
are not specified, spapr just uses the default behaviour or assigning each
vcpu round-robin to nodes.
If smp_threads != 1, that will assign adjacent threads in a core to
different NUMA nodes. As well as being just weird, that's a configuration
that can't be represented in the device tree we give to the guest, which
means the guest and qemu end up with different ideas of the NUMA topology.
This patch implements mc->cpu_index_to_socket_id in the spapr code to
make sure vcpus get assigned to nodes only at the socket granularity.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Till now memory hotplug used RTAS_LOG_V6_HP_ID_DRC_INDEX hotplug type
which meant that we generated one hotplug type of EPOW event for every
256MB (SPAPR_MEMORY_BLOCK_SIZE). This quickly overruns the kernel
rtas log buffer thus resulting in loss of memory hotplug events. Switch
to RTAS_LOG_V6_HP_ID_DRC_COUNT hotplug type for memory so that we
generate only one event per hotplug request.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Support hotplug identifier type RTAS_LOG_V6_HP_ID_DRC_COUNT that allows
hotplugging of DRCs by specifying the DRC count.
While we are here, rename
spapr_hotplug_req_add_event() to spapr_hotplug_req_add_by_index()
spapr_hotplug_req_remove_event() to spapr_hotplug_req_remove_by_index()
so that they match with spapr_hotplug_req_add_by_count().
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Don't represent non-hotluggable memory under drconf node. With this
we don't have to create DRC objects for them.
The effect of this patch is that we revert back to memory@XXXX representation
for all the memory specified with -m option and represent the cold
plugged memory and hot-pluggable memory under
ibm,dynamic-reconfiguration-memory.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When NUMA isn't configured explicitly, assume node 0 is present for
the purpose of creating ibm,associativity-lookup-arrays property
under ibm,dynamic-reconfiguration-memory DT node. This ensures that
the associativity index property is correctly updated in ibm,dynamic-memory
for the LMB that is hotplugged.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently when user specifies more slots than allowed max of
SPAPR_MAX_RAM_SLOTS (32), we error out like this:
qemu-system-ppc64: unsupported amount of memory slots: 64
Let the user know about the max allowed slots like this:
qemu-system-ppc64: Specified number of memory slots 64 exceeds max supported 32
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently PowerPC kernel doesn't allow hot-adding memory to memory-less
node, but instead will silently add the memory to the first node that has
some memory. This causes two unexpected behaviours for the user.
- Memory gets hotplugged to a different node than what the user specified.
- Since pc-dimm subsystem in QEMU still thinks that memory belongs to
memory-less node, a reboot will set things accordingly and the previously
hotplugged memory now ends in the right node. This appears as if some
memory moved from one node to another.
So until kernel starts supporting memory hotplug to memory-less
nodes, just prevent such attempts upfront in QEMU.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Make use of pc-dimm infrastructure to support memory hotplug
for PowerPC.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The hash table size is dependent on ram_size, but since with hotplug
the memory can grow till maxram_size. Hence make hash table size dependent
on maxram_size.
This allows to hotplug huge amounts of memory to the guest.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Parse ibm,architecture.vec table obtained from the guest and enable
memory node configuration via ibm,dynamic-reconfiguration-memory if guest
supports it. This is in preparation to support memory hotplug for
sPAPR guests.
This changes the way memory node configuration is done. Currently all
memory nodes are built upfront. But after this patch, only memory@0 node
for RMA is built upfront. Guest kernel boots with just that and rest of
the memory nodes (via memory@XXX or ibm,dynamic-reconfiguration-memory)
are built when guest does ibm,client-architecture-support call.
Note: This patch needs a SLOF enhancement which is already part of
SLOF binary in QEMU.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Enable memory hotplug for pseries 2.4 and add LMB DR connectors.
With memory hotplug, enforce RAM size, NUMA node memory size and maxmem
to be a multiple of SPAPR_MEMORY_BLOCK_SIZE (256M) since that's the
granularity in which LMBs are represented and hot-added.
LMB DR connectors will be used by the memory hotplug code.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
[spapr_drc_reset implementation]
[since this missed the 2.4 cutoff, changing to only enable for 2.5]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
sPAPR uses hard coded limit of maximum 255 supported CPUs which is
exactly the same as QEMU-wide limit which is MAX_CPUMASK_BITS and also
defined as 255.
This makes use of a global CPU number limit for the "pseries" machine.
In order to anticipate future increase of the MAX_CPUMASK_BITS
(or to help debugging large systems), this also bumps the FDT_MAX_SIZE
limit from 256K to 1M assuming that 1 CPU core needs roughly 512 bytes
in the device tree so the new limit can cover up to 2048 CPU cores.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The dynamic reconfiguration (hotplug) code for the pseries machine type
uses a "DR connector" QOM object for each resource it will be possible
to hotplug. Each of these is added to its owner using
object_property_add_child(owner, "dr-connector[*], ...);
That works ok, mostly, but it means that the property indices are
arbitrary, depending on the order in which the connectors are constructed.
That might line up to something useful, but it doesn't have to.
It will get worse once we add hotplug RAM support. That will add a DR
connector object for every 256MB of potential memory. So if maxmem=2T,
for example, there are 8192 objects under the same parent.
The QOM interfaces aren't really designed for this. In particular
object_property_add() with [*] has O(n^2) time complexity (in the number of
existing children): first it has a linear search through array indices to
find a free slot, each of which is attempted to a recursive call to
object_property_add() with a specific [N]. Those calls are O(n) because
there's a linear search through all properties to check for duplicates.
By using a meaningful index value, which we already know is unique we can
avoid the [*] special behaviour. That lets us reduce the total time for
creating the DR objects from O(n^3) to O(n^2).
O(n^2) is still kind of crappy, but it's enough to reduce the startup time
of qemu (with in-progress memory hotplug support) with maxmem=2T from ~20
minutes to ~4 seconds.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Tested-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Certain methods in sPAPRDRConnector objects are only ever called by
RTAS and in many cases are responsible for the logic that determines
the RTAS return codes.
Rather than having a level of indirection requiring RTAS code to
re-interpret return values from such methods to determine the
appropriate return code, just pass them through directly.
This requires changing method return types to uint32_t to match the
type of values currently passed to RTAS helpers.
In the case of read accesses like drc->entity_sense() where we weren't
previously reporting any errors, just the read value, we modify the
function to return RTAS return code, and pass the read value back via
reference.
Suggested-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Initialize a hotplug memory region under which all the hotplugged
memory is accommodated. Also enable memory hotplug by setting
CONFIG_MEM_HOTPLUG.
Modelled on i386 memory hotplug.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Logical resources start with allocation-state:UNUSABLE /
isolation-state:ISOLATED. During hotplug, guests will transition
them to allocation-state:USABLE, and then to
isolation-state:UNISOLATED.
For cases where we cannot transition to allocation-state:USABLE,
in this case due to no device/resource being association with
the logical DRC, we should return an error -3.
For physical DRCs, we default to allocation-state:USABLE and stay
there, so in this case we should report an error -3 when the guest
attempts to make the isolation-state:ISOLATED transition for a DRC
with no device associated.
These are as documented in PAPR 2.7, 13.5.3.4.
We also ensure allocation-state:USABLE when the guest attempts
transition to isolation-state:UNISOLATED to deal with misbehaving
guests attempting to bring online an unallocated logical resource.
This is as documented in PAPR 2.7, 13.7.
Currently we implement no such error logic. Fix this by handling
these error cases as PAPR defines.
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
PAPR requires ibm,req#msi and ibm,req#msi-x to be present in the
device node to define the number of msi/msi-x interrupts the device
supports, respectively.
Currently we have ibm,req#msi-x hardcoded to a non-sensical constant
that happens to be 2, and are missing ibm,req#msi entirely. The result
of that is that msi-x capable devices get limited to 2 msi-x
interrupts (which can impact performance), and msi-only devices likely
wouldn't work at all. Additionally, if devices expect a minimum that
exceeds 2, the guest driver may fail to load entirely.
SLOF still owns the generation of these properties at boot-time
(although other device properties have since been offloaded to QEMU),
but for hotplugged devices we rely on the values generated by QEMU
and thus hit the limitations above.
Fix this by generating these properties in QEMU as expected by guests.
In the future it may make sense to modify SLOF to pass through these
values directly as we do with other props since we're duplicating SLOF
code.
Cc: qemu-ppc@nongnu.org
Cc: qemu-stable@nongnu.org
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For setting debug watchpoints, sPAPR guests use H_SET_MODE hypercall.
The existing QEMU H_SET_MODE handler does not support this but
the KVM handler in HV KVM does. However it is not enabled.
This enables the in-kernel H_SET_MODE handler which handles:
- Completed Instruction Address Breakpoint Register
- Watch point 0 registers.
The rest is still handled in QEMU.
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The device tree presented to pseries machine type guests includes an
ibm,chip-id property which gives essentially the socket number of each
vcpu core (individual vcpu threads don't get a node in the device
tree).
To calculate this, it uses a vcpus_per_socket variable computed as
(smp_cpus / #sockets). This is correct for the usual case where
smp_cpus == smp_threads * smp_cores * #sockets.
However, you can start QEMU with the number of cores and threads
mismatching the total number of vcpus (whether that _should_ be
permitted is a topic for another day). It's a bit hard to say what
the "real" number of vcpus per socket here is, but for most purposes
(smp_threads * smp_cores) will more meaningfully match how QEMU
behaves with respect to socket boundaries.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
When a device is hotplugged, attach() sets "configured" to
false, waiting an action from the OS to configure it and then
to call ibm,configure-connector. On ibm,configure-connector,
the hypervisor sets "configured" to true.
In case of coldplugged device, attach() sets "configured" to
false, but firmware and OS never call the ibm,configure-connector
in this case, so it remains set to false.
It could be harmless, but when we unplug a device, hypervisor
waits the device becomes configured because for it, a not configured
device is a device being configured, so it waits the end of configuration
to unplug it... and it never happens, so it is never unplugged.
This patch set by default coldplugged device to "configured=true",
hotplugged device to "configured=false".
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This introduces rtas_ldq() to load 64-bits parameter from continuous
two 4-bytes memory chunk of RTAS parameter buffer, to simplify the
code.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
If drmgr is used in the guest to hotplug a device before a device_add
has been issued via the QEMU monitor, QEMU segfaults in configure_connector
call. This occurs due to accessing of NULL FDT which otherwise would have
been created and associated with the DRC during device_add command.
Check for NULL FDT and return failure from configure_connector call.
As per PAPR+, an error value of -9003 seems appropriate for this failure.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
To see the output of the hcall_dprintf statements, you currently have
to enable the DEBUG_SPAPR_HCALLS macro in include/hw/ppc/spapr.h.
This is ugly because a) not every user who wants to debug guest
problems can or wants to recompile QEMU to be able to see such issues,
and b) since this macro is disabled by default, the code in the
hcall_dprintf() brackets tends to bitrot until somebody temporarily
enables that macro again.
Since the hcall_dprintf statements except one indicate guest
problems, let's always use qemu_log_mask(LOG_GUEST_ERROR, ...) for
this macro instead. One spot indicated an unimplemented host feature,
so this is changed into qemu_log_mask(LOG_UNIMP, ...) instead. Now
it's possible to see all those messages by simply adding the CLI
parameter "-d guest_errors,unimp", without the need to re-compile
the binary.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The DRC_INDEX_ID_MASK macro does a left shift on ~0, which is a signed
quantity, and therefore undefined behaviour according to the C spec. In
particular this causes warnings from the clang sanitizer.
This fixes it by calculating the same mask without using ~0 (I think the
new method is a more common idiom for generating masks anyway). For good
measure I also use 1ULL to force the expression's type to unsigned long
long, which should be good for assigning to anything we're going to want
to.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
dumpdtb (-machine dumpdtb=<file>) allows one to inspect the generated
device tree of machine types that generate device trees. This is
useful for a) seeing what's there b) debugging/testing device tree
generator patches. It can be used as follows
$QEMU_CMDLINE -machine dumpdtb=dtb
dtc -I dtb -O dts dtb
Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Improve the SPLPAR Characteristics information:
Add MaxPlatProcs: set to max_cpus, the maximum CPUs that could be
addded to the system.
Add DesMem: set to the initial memory of the system.
Add DesProcs: set to smp_cpus, the inital number of CPUs in the
system.
These tokens and values are specified by PAPR.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently, rtas_ibm_change_msi() always returns four values even if
less are specified.
Correct this by only returning the fourth parameter if it was
requested.
This is specified by PAPR.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
QEMU is MSI-X capable and makes it available via ibm,change-msi, so
we should indicate this by adding /rtas/ibm,change-msix-capable to the
device tree.
This is specificed by PAPR.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
QEMU has a notion of the guest name, so if it's present we might as
well put that into the device tree as /ibm,partition-name.
This is specificed by PAPR.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add pseries-2.5 machine version.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
[Altered to merge before memory hotplug -- dwg]
[Altered to work with b9f072d01 -- dwg]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Include an error message when migration fails due to mismatch in
htab_shift values at source and target. This should provide a bit more
verbose message in addition to the current migration failure message
that reads like:
qemu-system-ppc64: error while loading state for instance 0x0 of device 'spapr/htab'
After this patch, the failure message will look like this:
qemu-system-ppc64: htab_shift mismatch: source 29 target 24
qemu-system-ppc64: error while loading state for instance 0x0 of device 'spapr/htab'
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
QEMU does have an I/O thread now, that can be interrupted at any time
because the VCPU thread runs outside the iothread mutex.
Therefore, the kvmppc_timer_hack is obsolete. Remove it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
The script used for converting from QEMUMachine had used one
DEFINE_MACHINE() per machine registered. In cases where multiple
machines are registered from one source file, avoid the excessive
generation of module init functions by reverting this unrolling.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Convert all machines to use DEFINE_MACHINE() instead of QEMUMachine
automatically using a script.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
[AF: Style cleanups, convert imx25_pdk machine]
Signed-off-by: Andreas Färber <afaerber@suse.de>
Now all TYPE_MACHINE subclasses use MACHINE_TYPE_NAME to generate the
class name. So instead of requiring each subclass to set
MachineClass::name manually, we can now set it automatically at the
TYPE_MACHINE class_base_init() function.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
[AF/ehabkost: Updated for s390-ccw machines]
[AF: Cleanup of intermediate virt and vexpress name handling]
Signed-off-by: Andreas Färber <afaerber@suse.de>
It will result in exactly the same class name, but it will make the code
consistent with the other classes.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Machine class names should use the "-machine" suffix to allow
class-name-based machine class lookup to work. Rename the the pseries
machine classes using the MACHINE_TYPE_NAME macro.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Symptom:
$ qemu-system-x86_64 -m 10000000
Unexpected error in ram_block_add() at /work/armbru/qemu/exec.c:1456:
upstream-qemu: cannot set up guest memory 'pc.ram': Cannot allocate memory
Aborted (core dumped)
Root cause: commit ef701d7 screwed up handling of out-of-memory
conditions. Before the commit, we report the error and exit(1), in
one place, ram_block_add(). The commit lifts the error handling up
the call chain some, to three places. Fine. Except it uses
&error_abort in these places, changing the behavior from exit(1) to
abort(), and thus undoing the work of commit 3922825 "exec: Don't
abort when we can't allocate guest memory".
The three places are:
* memory_region_init_ram()
Commit 4994653 (right after commit ef701d7) lifted the error
handling further, through memory_region_init_ram(), multiplying the
incorrect use of &error_abort. Later on, imitation of existing
(bad) code may have created more.
* memory_region_init_ram_ptr()
The &error_abort is still there.
* memory_region_init_rom_device()
Doesn't need fixing, because commit 33e0eb5 (soon after commit
ef701d7) lifted the error handling further, and in the process
changed it from &error_abort to passing it up the call chain.
Correct, because the callers are realize() methods.
Fix the error handling after memory_region_init_ram() with a
Coccinelle semantic patch:
@r@
expression mr, owner, name, size, err;
position p;
@@
memory_region_init_ram(mr, owner, name, size,
(
- &error_abort
+ &error_fatal
|
err@p
)
);
@script:python@
p << r.p;
@@
print "%s:%s:%s" % (p[0].file, p[0].line, p[0].column)
When the last argument is &error_abort, it gets replaced by
&error_fatal. This is the fix.
If the last argument is anything else, its position is reported. This
lets us check the fix is complete. Four positions get reported:
* ram_backend_memory_alloc()
Error is passed up the call chain, ultimately through
user_creatable_complete(). As far as I can tell, it's callers all
handle the error sanely.
* fsl_imx25_realize(), fsl_imx31_realize(), dp8393x_realize()
DeviceClass.realize() methods, errors handled sanely further up the
call chain.
We're good. Test case again behaves:
$ qemu-system-x86_64 -m 10000000
qemu-system-x86_64: cannot set up guest memory 'pc.ram': Cannot allocate memory
[Exit 1 ]
The next commits will repair the rest of commit ef701d7's damage.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1441983105-26376-3-git-send-email-armbru@redhat.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
* qemu_mutex_lock_iothread "No such process" fix
* cutils: qemu_strto* wrappers
* iohandler.c simplification
* Many other fixes and misc patches.
And some MTTCG work (with Emilio's fixes squashed):
* Signal-free TCG kick
* Removing spinlock in favor of QemuMutex
* User-mode emulation multi-threading fixes/docs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJV8Tk7AAoJEL/70l94x66Ds3QH/3bi0RRR2NtKIXAQrGo5tfuD
NPMu1K5Hy+/26AC6mEVNRh4kh7dPH5E4NnDGbxet1+osvmpjxAjc2JrxEybhHD0j
fkpzqynuBN6cA2Gu5GUNoKzxxTmi2RrEYigWDZqCftRXBeO2Hsr1etxJh9UoZw5H
dgpU3j/n0Q8s08jUJ1o789knZI/ckwL4oXK4u2KhSC7ZTCWhJT7Qr7c0JmiKReaF
JEYAsKkQhICVKRVmC8NxML8U58O8maBjQ62UN6nQpVaQd0Yo/6cstFTZsRrHMHL3
7A2Tyg862cMvp+1DOX3Bk02yXA+nxnzLF8kUe0rYo6llqDBDStzqyn1j9R0qeqA=
=nB06
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* Support for jemalloc
* qemu_mutex_lock_iothread "No such process" fix
* cutils: qemu_strto* wrappers
* iohandler.c simplification
* Many other fixes and misc patches.
And some MTTCG work (with Emilio's fixes squashed):
* Signal-free TCG kick
* Removing spinlock in favor of QemuMutex
* User-mode emulation multi-threading fixes/docs
# gpg: Signature made Thu 10 Sep 2015 09:03:07 BST using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
* remotes/bonzini/tags/for-upstream: (44 commits)
cutils: work around platform differences in strto{l,ul,ll,ull}
cpu-exec: fix lock hierarchy for user-mode emulation
exec: make mmap_lock/mmap_unlock globally available
tcg: comment on which functions have to be called with mmap_lock held
tcg: add memory barriers in page_find_alloc accesses
remove unused spinlock.
replace spinlock by QemuMutex.
cpus: remove tcg_halt_cond and tcg_cpu_thread globals
cpus: protect work list with work_mutex
scripts/dump-guest-memory.py: fix after RAMBlock change
configure: Add support for jemalloc
add macro file for coccinelle
configure: factor out adding disas configure
vhost-scsi: fix wrong vhost-scsi firmware path
checkpatch: remove tests that are not relevant outside the kernel
checkpatch: adapt some tests to QEMU
CODING_STYLE: update mixed declaration rules
qmp: Add example usage of strto*l() qemu wrapper
cutils: Add qemu_strtoull() wrapper
cutils: Add qemu_strtoll() wrapper
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
My Coccinelle semantic patch finds a few more, because it also fixes up
the equally pointless conditional
if (foo) {
free(foo);
foo = NULL;
}
Result (feel free to squash it into your patch):
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Use the same API to trigger interruption of a CPU, no matter if
under TCG or KVM. There is no difference: these calls come from
the CPU thread, so the qemu_cpu_kick calls will send a signal
to the running thread and it will be processed synchronously,
just like a call to cpu_exit. The only difference is in the
overhead, but neither call to cpu_exit (now qemu_cpu_kick)
is in a hot path.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is unused. cpu_exit now is almost exclusively an internal function
to the CPU execution loop. In a few patches, we'll change the remaining
occurrences to qemu_cpu_kick, making it truly internal.
Reviewed-by: Richard henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some kernels program a 0 address for io regions. PCI 3.0 spec
section 6.2.5.1 doesn't seem to disallow this.
based on patch by Michael Roth <mdroth@linux.vnet.ibm.com>
Add pci_allow_0_addr in MachineClass to conditionally
allow addr 0 for pseries, as this can break other architectures.
This patch allows to hotplug PCI card in pseries machine, as the first
added card BAR0 is always set to 0 address.
This as a temporary hack, waiting to fix PCI memory priorities for more
machine types...
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit e0cf11f31c ("timer: Use a single
definition of NSEC_PER_SEC for the whole codebase") renamed
NANOSECONDS_PER_SECOND to NSEC_PER_SEC.
On Mac OS X there is a <dispatch/time.h> system header which also
defines NSEC_PER_SEC. This causes compiler warnings.
Let's use the old name instead. It's longer but it doesn't clash.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1436364609-7929-1-git-send-email-stefanha@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
A few last minute PPC changes for 2.4:
- spapr: Update SLOF
- spapr: Fix a few bugs
- spapr: Preparation for hotplug
- spapr: Minor code cleanups
- linux-user: Add mftb handling
- kvm: Enable hugepage support with memory-backend-file
- mac99: Remove nonexistent interrupt pin (Mac OS 9 fix)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJVm/TZAAoJECszeR4D/txg0rUP/R1C5IAuY0vM7LOYRbp1jFmn
EO6AZpJaXvT2xP0wUd/rqJct/O41vDVbEmnhpUAQwZcgsyw1UaKhRQbnCtY9PHD2
d7NZiBdv3AAbh8pLFadRjDJr/HrfuWVfjKKep5cM87/o3zjVreeIX8Hs77xHia6/
90n3hcDF4QL8qx6fxCMT4mGpTtbxw85IcK2wyIU45cZSN0VYaTjDwcYokeSKqgxV
pi7UjZSM5nZcn7VI1Uray4NkgXGs92Lorrbg08OFQt0AoXROJOk4V/LX3HkHfDbI
BYUgaOQNdBkytkB3fJCsTgl2Up82bVP/tghMyZOIyBAU4MslnAOW6HAMX2TtNswx
7itnIb7DQsVDE/U234Xzf5qoH5x4nB9xKh2qLHPKSpgLChs6lAW37Af3N+V03JVb
k/WX6i0n70a6kUqCxcMTnVSINWandU2jdJ/S8woIqs6XhfLk7hh0ucg+VhgoQxW7
QpeD69c25eVHeZDoMKR/ZTigJg/EQGuV9B9OSx6SyA9WMS4dImt1m0PBdaUlIAFT
759lMMwQIb5sQYghJ63tgrOI5PBneGnelM1zDWt2SCS0rbSjLWIWP47pHoNmnzzp
vIhJX5GgVuzf0NrbZPSR7/6NuKKU6UW5CTGh3vFgRib/CWIbEgCE3yWQfflZKy5q
Q2xBuAjyWnBoipzI4hlz
=+Uma
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/agraf/tags/signed-ppc-for-upstream' into staging
Patch queue for ppc - 2015-07-07
A few last minute PPC changes for 2.4:
- spapr: Update SLOF
- spapr: Fix a few bugs
- spapr: Preparation for hotplug
- spapr: Minor code cleanups
- linux-user: Add mftb handling
- kvm: Enable hugepage support with memory-backend-file
- mac99: Remove nonexistent interrupt pin (Mac OS 9 fix)
# gpg: Signature made Tue Jul 7 16:48:41 2015 BST using RSA key ID 03FEDC60
# gpg: Good signature from "Alexander Graf <agraf@suse.de>"
# gpg: aka "Alexander Graf <alex@csgraf.de>"
* remotes/agraf/tags/signed-ppc-for-upstream: (30 commits)
sPAPR: Clear stale MSIx table during EEH reset
sPAPR: Reenable EEH functionality on reboot
sPAPR: Don't enable EEH on emulated PCI devices
spapr-vty: Use TYPE_ definition instead of hardcoding
spapr_vty: lookup should only return valid VTY objects
spapr_pci: drop redundant args in spapr_[populate, create]_pci_child_dt
spapr_pci: populate ibm,loc-code
spapr_pci: enumerate and add PCI device tree
xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled
ppc: Update cpu_model in MachineState
spapr: Consolidate cpu init code into a routine
spapr: Reorganize CPU dt generation code
cpus: Add a macro to walk CPUs in reverse
spapr: Support ibm, lrdr-capacity device tree property
spapr: Consider max_cpus during xics initialization
Revert "hw/ppc/spapr_pci.c: Avoid functions not in glib 2.12 (g_hash_table_iter_*)"
spapr_iommu: translate sPAPRTCEAccess to IOMMUAccessFlags
spapr_iommu: drop erroneous check in h_put_tce_indirect()
spapr_pci: set device node unit address as hex
spapr_pci: encode class code including Prog IF register
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The PCI device MSIx table is cleaned out in hardware after EEH PE
reset. However, we still hold the stale MSIx entries in QEMU, which
should be cleared accordingly. Otherwise, we will run into another
(recursive) EEH error and the PCI devices contained in the PE have
to be offlined exceptionally.
The patch introduces function spapr_phb_vfio_eeh_pre_reset(), which
is called by sPAPR when asserting hot or fundamental reset, to clear
stale MSIx table for VFIO PCI devices before EEH PE reset so that
MSIx table could be restored properly after EEH PE reset.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
When rebooting the guest, some PEs might be in frozen state. The
contained PCI devices won't work properly if their frozen states
aren't cleared in time. One case running into this situation would
be maximal EEH error times encountered in the guest.
The patch reenables the EEH functinality on PEs on PHB's reset
callback, which will clear their frozen states if needed.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
There might have emulated PCI devices, together with VFIO PCI
devices under one PHB. The EEH capability shouldn't enabled
on emulated PCI devices.
The patch returns error when enabling EEH capability on emulated
PCI devices by RTAS call "ibm,set-eeh-option".
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
* phb_index is not being used and if required can be obtained from sphb
* use helper to get drc_index in spapr_populate_pci_child_dt()
* Check if drc_index is zero
Suggested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Each hardware instance has a platform unique location code. The OF
device tree that describes a part of a hardware entity must include
the “ibm,loc-code” property with a value that represents the location
code for that hardware entity.
Populate ibm,loc-code.
1) PCI passthru devices need to identify with its own ibm,loc-code
available on the host. In failure cases use:
vfio_<name>:<phb-index>:<bus>:<slot>.<fn>
2) Emulated devices encode as following:
qemu_<name>:<phb-index>:<bus>:<slot>.<fn>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
All the PCI enumeration and device node creation was off-loaded to
SLOF. With PCI hotplug support, code needed to be added to add device
node. This creates multiple copy of the code one in SLOF and other in
hotplug code. To unify this, the patch adds the pci device node
creation in Qemu. For backward compatibility, a flag
"qemu,phb-enumerated" is added to the phb, suggesting to SLOF to not
do device node creation.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
[ Squashed Michael's drc_index changes ]
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Keep cpu_model field in MachineState uptodate so that it can be used
from the CPU hotplug path.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Factor out bits of sPAPR specific CPU initialization code into
a separate routine so that it can be called from CPU hotplug
path too.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Reorganize CPU device tree generation code so that it be reused from
hotplug path. CPU dt entries are now generated from spapr_finalize_fdt()
instead of spapr_create_fdt_skel().
Note: This is how the split-up looks like now:
Boot path
---------
spapr_finalize_fdt
spapr_populate_cpus_dt_node
spapr_populate_cpu_dt
spapr_fixup_cpu_numa_dt
spapr_fixup_cpu_smt_dt
ibm,cas path
------------
spapr_h_cas_compose_response
spapr_fixup_cpu_dt
spapr_fixup_cpu_numa_dt
spapr_fixup_cpu_smt_dt
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Add support for ibm,lrdr-capacity since this is needed by the guest
kernel to know about the possible hot-pluggable CPUs and Memory. With
this, pseries kernels will start reporting correct maxcpus in
/sys/devices/system/cpu/possible.
Also define the minimum hotpluggable memory size as 256MB.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[agraf: Fix compile error on 32bit hosts]
Signed-off-by: Alexander Graf <agraf@suse.de>
Use max_cpus instead of smp_cpus when intializating xics system. Also
report max_cpus in ibm,interrupt-server-ranges device tree property of
interrupt controller node.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Since we now require GLib 2.22+ (commit f40685c), we don't have to
work around lack of g_hash_table_iter_init() & friends anymore.
This reverts commit f8833a37c0.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The fact that these enums have matching values is pure coincidence. We
actually need to translate from the PAPR definition to the QEMU one.
This patch doesn't fix any bug, it is only code cleanup.
Suggested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The tce_list variable is not a TCE but the address to a TCE: we shouldn't
clear permission bits as we do now. And this is dead code anyway since we
check tce_list is 4K aligned a few lines above.
This patch doesn't fix any bug, it is only code cleanup.
Suggested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Device node names should encode the unit address as hex, while the
code was encodind it as integers.
Also, use FDT_NAME_MAX macro for allocating and composing the name.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Current code missed the Prog IF register. All Class Code, Subclass,
and Prog IF registers are needed to identify the accurate device type.
For example: USB controllers use the PROG IF for denoting: USB
FullSpeed, HighSpeed or SuperSpeed.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The properties reg/assigned-resources need to encode 64-bit memory
address space as part of phys.hi dword.
00 if configuration space
01 if IO region,
10 if 32-bit MEM region
11 if 64-bit MEM region
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently although we have an sPAPRMachineState descended from MachineState
we don't have an sPAPRMAchineClass descended from MachineClass. So far it
hasn't been needed, but several upcoming features are going to want it,
so this patch creates a stub implementation.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The sPAPRMachineState structure includes an entry_point field containing
the initial PC value for starting the machine, even though this always has
the value 0x100.
I think this is a hangover from very early versions which bypassed the
firmware when using -kernel. In any case it has no function now, so remove
it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The ram_limit field was imported from sPAPREnvironment where it predates
the machine's ram size being available generically from machine->ram_size.
Worse, the existing code was inconsistent about where it got the ram size
from. Sometimes it used spapr->ram_limit, sometimes the global 'ram_size'
and sometimes a local 'ram_size' masking the global.
This cleans up the code to consistently use machine->ram_size, eliminating
spapr->ram_limit in the process.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The code for -machine pseries maintains a global sPAPREnvironment structure
which keeps track of general state information about the guest platform.
This predates the existence of the MachineState structure, but performs
basically the same function.
Now that we have the generic MachineState, fold sPAPREnvironment into
sPAPRMachineState, the pseries specific subclass of MachineState.
This is mostly a matter of search and replace, although a few places which
relied on the global spapr variable are changed to find the structure via
qdev_get_machine().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
XICS needs to know the upper value for cpu_index as it is used to compute
the number of servers:
smp_cpus * kvmppc_smt_threads() / smp_threads
When passing -smp cpus=1,threads=9 on a POWER8 host, we end up with:
1 * 8 / 9 = 0
... which leads to an assertion in both emulated:
Number of servers needs to be greater 0
Aborted (core dumped)
... and in-kernel XICS:
xics_kvm_realize: Assertion `icp->nr_servers' failed.
Aborted (core dumped)
With this patch, we are sure that nr_servers > 0. Passing the same bogus
-smp option then leads to:
qemu-system-ppc64: Cannot support more than 8 threads on PPC with KVM
... which is a lot more explicit than the XICS errors.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This section would be sent:
a- for all new machine types
b- for old machine types if section state is different form {running,paused}
that were the only giving us troubles.
So, in new qemus: it is alwasy there. In old qemus: they are only
there if it an error has happened, basically stoping on target.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
I forgot to add compatibility for Power when adding section footers.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Fixes: 37fb569c01
Signed-off-by: Juan Quintela <quintela@redhat.com>
In particular, don't include it into headers.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
In particular, don't include it into headers.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
These macros expand into error class enumeration constant, comma,
string. Unclean. Has been that way since commit 13f59ae.
The error class is always ERROR_CLASS_GENERIC_ERROR since the previous
commit.
Clean up as follows:
* Prepend every use of a QERR_ macro by ERROR_CLASS_GENERIC_ERROR, and
delete it from the QERR_ macro. No change after preprocessing.
* Rewrite error_set(ERROR_CLASS_GENERIC_ERROR, ...) into
error_setg(...). Again, no change after preprocessing.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
On ppc, sparc, and sparc64, the value of the FW_CFG_BOOT_DEVICE 16bit
fw_cfg entry is repeatedly modified from a series of callbacks, which
currently results in the previous value's dynamically allocated memory
being leaked.
This patch switches updating to the new fw_cfg_modify_i16() call, which
does not cause memory leaks.
Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
qemu currently implements the hypercalls H_LOGICAL_CI_LOAD and
H_LOGICAL_CI_STORE as PAPR extensions. These are used by the SLOF firmware
for IO, because performing cache inhibited MMIO accesses with the MMU off
(real mode) is very awkward on POWER.
This approach breaks when SLOF needs to access IO devices implemented
within KVM instead of in qemu. The simplest example would be virtio-blk
using an iothread, because the iothread / dataplane mechanism relies on
an in-kernel implementation of the virtio queue notification MMIO.
To fix this, an in-kernel implementation of these hypercalls has been made,
(kernel commit 99342cf "kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM"
however, the hypercalls still need to be enabled from qemu. This performs
the necessary calls to do so.
It would be nice to provide some warning if we encounter a problematic
device with a kernel which doesn't support the new calls. Unfortunately,
I can't see a way to detect this case which won't either warn in far too
many cases that will probably work, or which is horribly invasive.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This uses extension of existing EPOW interrupt/event mechanism
to notify userspace tools like librtas/drmgr to handle
in-guest configuration/cleanup operations in response to
device_add/device_del.
Userspace tools that don't implement this extension will need
to be run manually in response/advance of device_add/device_del,
respectively.
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This enables hotplug of PCI devices to a PHB. Upon hotplug we
generate the OF-nodes required by PAPR specification and
IEEE 1275-1994 "PCI Bus Binding to Open Firmware" for the
device.
We associate the corresponding FDT for these nodes with the DRC
corresponding to the slot, which will be fetched via
ibm,configure-connector RTAS calls by the guest as described by PAPR
specification.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
These will be used to support hotplug/unplug of PCI devices to the PCI
bus associated with a particular PHB.
We also set up device-tree properties in each PHBs initial FDT to
describe the DRCs associated with them. This advertises to guests that
each PHB is DR-capable device with physical hotpluggable slots, each
managed by the corresponding DRC. This is necessary for allowing
hotplugging of devices to it later via bus rescan or guest rpaphp
hotplug module.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This option enables/disables PCI hotplug for a particular PHB.
Also add machine compatibility code to disable it by default for machine
types prior to pseries-2.4.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[agraf: move commas for compat fields]
Signed-off-by: Alexander Graf <agraf@suse.de>
This function handles generation of ibm,drc-* array device tree
properties to describe DRC topology to guests. This will by used
by the guest to direct RTAS calls to manage any dynamic resources
we associate with a particular DR Connector as part of
hotplug/unplug.
Since general management of boot-time device trees are handled
outside of sPAPRDRConnector, we insert these values blindly given
an FDT and offset. A mask of sPAPRDRConnector types is given to
instruct us on what types of connectors entries should be generated
for, since descriptions for different connectors may live in
different parts of the device tree.
Based on code originally written by Nathan Fontenot.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
We don't actually rely on this interface to surface hotplug events, and
instead rely on the similar-but-interrupt-driven check-exception RTAS
interface used for EPOW events. However, the existence of this interface
is needed to ensure guest kernels initialize the event-reporting
interfaces which will in turn be used by userspace tools to handle these
events, so we implement this interface here.
Since events surfaced by this call are mutually exclusive to those
surfaced via check-exception, we also update the RTAS event queue code
to accept a boolean to mark/filter for events accordingly.
Events of this sort are not currently generated by QEMU, but the interface
has been tested by surfacing hotplug events via event-scan in place
of check-exception.
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This extends the data structures currently used to report EPOW events to
guests via the check-exception RTAS interfaces to also include event types
for hotplug/unplug events.
This is currently undocumented and being finalized for inclusion in PAPR
specification, but we implement this here as an extension for guest
userspace tools to implement (existing guest kernels simply log these
events via a sysfs interface that's read by rtas_errd, and current
versions of rtas_errd/powerpc-utils already support the use of this
mechanism for initiating hotplug operations).
We also add support for queues of pending RTAS events, since in the
case of hotplug there's chance for multiple events being in-flight
at any point in time.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This interface is used to fetch an OF device-tree nodes that describes a
newly-attached device to guest. It is called multiple times to walk the
device-tree node and fetch individual properties into a 'workarea'/buffer
provided by the guest.
The device-tree is generated by QEMU and passed to an sPAPRDRConnector during
the initial hotplug operation, and the state of these RTAS calls is tracked by
the sPAPRDRConnector. When the last of these properties is successfully
fetched, we report as special return value to the guest and transition
the device to a 'configured' state on the QEMU/DRC side.
See docs/specs/ppc-spapr-hotplug.txt for a complete description of
this interface.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This interface allows a guest to read various platform/device sensors.
initially, we only implement support necessary to support hotplug:
reading of the dr-entity-sense sensor, which communicates the state of
a hotplugged resource/device to the guest (EMPTY/PRESENT/UNUSABLE).
See docs/specs/ppc-spapr-hotplug.txt for a complete description of
this interface.
Signed-off-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This interface allows a guest to control various platform/device
sensors. Initially, we only implement support necessary to control
sensors that are required for hotplug: DR connector indicators/LEDs,
resource allocation state, and resource isolation state.
See docs/specs/ppc-spapr-hotplug.txt for a complete description of
this interface.
Signed-off-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
These interfaces manage the power domains that guest devices are
assigned to and are used to power on/off devices. Currently we
only utilize 1 power domain, the 'live-insertion' domain, which
automates power management of plugged/unplugged devices, essentially
making these calls no-ops, but the RTAS interfaces are still required
by guest hotplug code and PAPR+.
See docs/specs/ppc-spapr-hotplug.txt for a complete description of
these interfaces.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This device emulates a firmware abstraction used by pSeries guests to
manage hotplug/dynamic-reconfiguration of host-bridges, PCI devices,
memory, and CPUs. It is conceptually similar to an SHPC device,
complete with LED indicators to identify individual slots to physical
physical users and indicate when it is safe to remove a device. In
some cases it is also used to manage virtualized resources, such a
memory, CPUs, and physical-host bridges, which in the case of pSeries
guests are virtualized resources where the physical components are
managed by the host.
Guests communicate with these DR Connectors using RTAS calls,
generally by addressing the unique DRC index associated with a
particular connector for a particular resource. For introspection
purposes we expose this state initially as QOM properties, and
in subsequent patches will introduce the RTAS calls that make use of
it. This constitutes to the 'guest' interface.
On the QEMU side we provide an attach/detach interface to associate
or cleanup a DeviceState with a particular sPAPRDRConnector in
response to hotplug/unplug, respectively. This constitutes the
'physical' interface to the DR Connector.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
hw_error() is designed for printing CPU-related error messages
(e.g. it also prints a full CPU register dump). For error messages
that are not directly related to CPU problems, a function like
error_report() should be used instead.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
When specifying a non-existing file with the "-bios" parameter, QEMU
complained that it "could not find LPAR rtas". That's obviously a
copy-n-paste bug from the code which loads the spapr-rtas.bin, it
should complain about a missing firmware file instead.
Additionally the error message was printed with hw_error() - which
also dumps the whole CPU state. However, this does not make much
sense here since the CPU is not running yet and thus the registers
only contain zeroes. So let's use error_report() here instead.
And while we're at it, let's also bail out if the firmware file
had zero length.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Now that 2.4 development has opened, create a new pseries machine type
variant. For now it is identical to the pseries-2.3 machine type, but
a number of new features are coming that will need to set backwards
compatibility options.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The check "liobn & 0xFFFFFFFF00000000ULL" in spapr_tce_find_by_liobn()
is completely useless since liobn is only declared as an uint32_t
parameter. Fix this by using target_ulong instead (this is what most
of the callers of this function are using, too).
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Useful for debugging.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This replaces object_child_foreach() and callback with existing
SPAPR_PCI_LIOBN() and spapr_tce_find_by_liobn() to make the code easier
to read.
This is a mechanical patch so no behaviour change is expected.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
At the moment spapr_tce_find_by_liobn() is used by H_PUT_TCE/...
handlers to find an IOMMU by LIOBN.
We are going to implement Dynamic DMA windows (DDW), new code
will go to a new file and we will use spapr_tce_find_by_liobn()
there too so let's make it public.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This makes find_phb()/find_dev() public and changed its names
to spapr_pci_find_phb()/spapr_pci_find_dev() as they are going to
be used from other parts of QEMU such as VFIO DDW (dynamic DMA window)
or VFIO PCI error injection or VFIO EEH handling - in all these
cases there are RTAS calls which are addressed to BUID+config_addr
in IEEE1275 format.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This is to reduce VIO noise while debugging PCI DMA.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This gets rid of a magic constant describing the default DMA window size
for an emulated PHB.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This introduces a macro which makes up a LIOBN from fixed prefix and
VIO device address (@reg property).
This is to keep LIOBN macros rendering consistent - the same macro for
PCI has been added by the previous patch.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
We are going to have multiple DMA windows per PHB and we want them to
migrate so we need a predictable way of assigning LIOBNs.
This introduces a macro which makes up a LIOBN from fixed prefix,
PHB index (unique PHB id) and window number.
This introduces a SPAPR_PCI_DMA_WINDOW_NUM() to know the window number
from LIOBN. It is used to distinguish the default 32bit windows from
dynamic windows and avoid picking default DMA window properties from
a wrong TCE table.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
PAPR is defined as big endian so TCEs need an adjustment so
does this patch.
This changes code to have ldq_be_phys() in one place.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The existing KVM_CREATE_SPAPR_TCE ioctl only support 4G windows max as
the window size parameter to the kernel ioctl() is 32-bit so
there's no way of expressing a TCE window > 4GB.
We are going to add huge DMA windows support so this will create small
window and unexpectedly fail later.
This disables KVM_CREATE_SPAPR_TCE for windows bigger that 4GB.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
spapr_pci.c contains a number of expressions of the form (uval == -1) or
(uval != -1), where 'uval' is an unsigned value.
This mostly works in practice, because as long as the width of uval is
greater or equal than that of (int), the -1 will be promoted to the
unsigned type, which is the expected outcome.
However, at least for the cases where uval is uint32_t, this would break
on platforms where sizeof(int) > 4 (and a few such do exist), because then
the uint32_t value would be promoted to the larger int type, and never be
equal to -1.
This patch fixes these errors. The fixes for the (uint32_t) cases are
necessary as described above. I've made similar fixes to (uint64_t) and
(hwaddr) cases. Those are strictly theoretical, since I don't know of any
platforms where sizeof(int) > 8, but hey, it's not that hard so we might
as well be strictly C standard compliant.
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>