Now that the EEH code is independent of the special
spapr-vfio-pci-host-bridge device, we can allow it on all spapr PCI
host bridges instead. We do this by changing spapr_phb_eeh_available()
to be based on the vfio_eeh_as_ok() call instead of the host bridge class.
Because the value of vfio_eeh_as_ok() can change with devices being
hotplugged or unplugged, this can potentially lead to some strange edge
cases where the guest starts using EEH, then it starts failing because
of a change in status.
However, it's not really any worse than the current situation. Cases that
would have worked previously will still work (i.e. VFIO devices from at
most one VFIO IOMMU group per vPHB), it's just that it's no longer
necessary to use spapr-vfio-pci-host-bridge with the groupid pre-specified.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
The EEH operations in the spapr-vfio-pci-host-bridge no longer rely on the
special groupid field in sPAPRPHBVFIOState. So we can simplify, removing
the class specific callbacks with direct calls based on a simple
spapr_phb_eeh_enabled() helper. For now we implement that in terms of
a boolean in the class, but we'll continue to clean that up later.
On its own this is a rather strange way of doing things, but it's a useful
intermediate step to further cleanups.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
At present the PCI host bridge (PHB) for the pseries machine type has a
fixed DMA window from 0..1GB (in PCI address space) which is mapped to real
memory via the PAPR paravirtualized IOMMU.
For better support of VFIO devices, we're going to want to allow for
different configurations of the DMA window.
Eventually we'll want to allow the guest itself to reconfigure the window
via the PAPR dynamic DMA window interface, but as a preliminary this patch
allows the user to reconfigure the window with new properties on the PHB
device.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
The code for -machine pseries maintains a global sPAPREnvironment structure
which keeps track of general state information about the guest platform.
This predates the existence of the MachineState structure, but performs
basically the same function.
Now that we have the generic MachineState, fold sPAPREnvironment into
sPAPRMachineState, the pseries specific subclass of MachineState.
This is mostly a matter of search and replace, although a few places which
relied on the global spapr variable are changed to find the structure via
qdev_get_machine().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This option enables/disables PCI hotplug for a particular PHB.
Also add machine compatibility code to disable it by default for machine
types prior to pseries-2.4.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[agraf: move commas for compat fields]
Signed-off-by: Alexander Graf <agraf@suse.de>
This makes find_phb()/find_dev() public and changed its names
to spapr_pci_find_phb()/spapr_pci_find_dev() as they are going to
be used from other parts of QEMU such as VFIO DDW (dynamic DMA window)
or VFIO PCI error injection or VFIO EEH handling - in all these
cases there are RTAS calls which are addressed to BUID+config_addr
in IEEE1275 format.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This gets rid of a magic constant describing the default DMA window size
for an emulated PHB.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The emulation for EEH RTAS requests from guest isn't covered
by QEMU yet and the patch implements them.
The patch defines constants used by EEH RTAS calls and adds
callbacks sPAPRPHBClass::{eeh_set_option, eeh_get_state, eeh_reset,
eeh_configure}, which are going to be used as follows:
* RTAS calls are received in spapr_pci.c, sanity check is done
there.
* RTAS handlers handle what they can. If there is something it
cannot handle and the corresponding sPAPRPHBClass callback is
defined, it is called.
* Those callbacks are only implemented for VFIO now. They do ioctl()
to the IOMMU container fd to complete the calls. Error codes from
that ioctl() are transferred back to the guest.
[aik: defined RTAS tokens for EEH RTAS calls]
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
At the moment sPAPR only supports 512MB window for MMIO BARs. However
modern devices might want bigger 64bit BARs.
This extends MMIO window from 512MB to 62GB (aligned to
SPAPR_PCI_WINDOW_SPACING) and advertises it in 2 records in
the PHB "ranges" property. 32bit gets the space from
SPAPR_PCI_MEM_WIN_BUS_OFFSET till the end of 4GB, 64bit gets the rest
of the space. If no space is left, 64bit range is not advertised.
The MMIO space size is set to old value of 0x20000000 by default
for pseries machines older than 2.3.
The approach changes the device tree which is a guest visible change, however
it won't break migration as:
1. we do not support migration to older QEMU versions
2. migration to newer QEMU will migrate the device tree as well and since
the new layout only extends the old one and does not change address mappigns,
no breakage is expected here too.
SLOF change is required to utilize this extension.
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
pseries guests can have large numbers of PCI host bridges. To avoid the
user having to specify a number of different configuration values for every
one, the device supports an "index" property which is a shorthand setting
the various window and configuration addresses from a predefined sensible
set.
There are some problems with the details at present:
* The "index" propery is signed, but negative values will create PCI
windows below where we expect, potentially colliding with other devices
* No limit is imposed on the "index" property and large values can
translate to extremely large window addresses. With PCI passthrough in
particular this can mean we exceed various mapping and physical address
limits causing the guest host bridge to not work in strange ways.
This patch addresses this, by making "index" unsigned, and imposing a
limit. Currently the limit allows indices from 0..255 which is probably
enough host bridges for the time being. It's fairly easy to extend if
we discover we need more.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
On sPAPR, virtio devices are connected to the PCI bus and use MSI-X.
Commit cc943c36fa has modified MSI-X
so that writes are made using the bus master address space and follow
the IOMMU path.
Unfortunately, the IOMMU address space address space does not have an
MSI window: the notification is silently dropped in unassigned_mem_write
instead of reaching the guest... The most visible effect is that all
virtio devices are non-functional on sPAPR since then. :(
This patch does the following:
1) map the MSI window into the IOMMU address space for each PHB
- since each PHB instantiates its own IOMMU address space, we
can safely map the window at a fixed address (SPAPR_PCI_MSI_WINDOW)
- no real need to keep the MSI window setup in a separate function,
the spapr_pci_msi_init() code moves to spapr_phb_realize().
2) kill the global MSI window as it is not needed in the end
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently SPAPR PHB keeps track of all allocated MSI (here and below
MSI stands for both MSI and MSIX) interrupt because
XICS used to be unable to reuse interrupts. This is a problem for
dynamic MSI reconfiguration which happens when guest reloads a driver
or performs PCI hotplug. Another problem is that the existing
implementation can enable MSI on 32 devices maximum
(SPAPR_MSIX_MAX_DEVS=32) and there is no good reason for that.
This makes use of new XICS ability to reuse interrupts.
This reorganizes MSI information storage in sPAPRPHBState. Instead of
static array of 32 descriptors (one per a PCI function), this patch adds
a GHashTable when @config_addr is a key and (first_irq, num) pair is
a value. GHashTable can dynamically grow and shrink so the initial limit
of 32 devices is gone.
This changes migration stream as @msi_table was a static array while new
@msi_devs is a dynamic hash table. This adds temporary array which is
used for migration, it is populated in "spapr_pci"::pre_save() callback
and expanded into the hash table in post_load() callback. Since
the destination side does not know the number of MSI-enabled devices
in advance and cannot pre-allocate the temporary array to receive
migration state, this makes use of new VMSTATE_STRUCT_VARRAY_ALLOC macro
which allocates the array automatically.
This resets the MSI configuration space when interrupts are released by
the ibm,change-msi RTAS call.
This fixed traces to be more informative.
This changes vmstate_spapr_pci_msi name from "...lsi" to "...msi" which
was incorrect by accident. As the internal representation changed,
thus bumps migration version number.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[agraf: drop g_malloc_n usage]
Signed-off-by: Alexander Graf <agraf@suse.de>
The patch adds a spapr-pci-vfio-host-bridge device type
which is a PCI Host Bridge with VFIO support. The new device
inherits from the spapr-pci-host-bridge device and adds an "iommu"
property which is an IOMMU id. This ID represents a minimal entity
for which IOMMU isolation can be guaranteed. In SPAPR architecture IOMMU
group is called a Partitionable Endpoint (PE).
Current implementation supports one IOMMU id per QEMU VFIO PHB. Since
SPAPR allows multiple PHB for no extra cost, this does not seem to
be a problem. This limitation may change in the future though.
Example of use:
Configure and Add 3 functions of a multifunctional device to QEMU:
(the NEC PCI USB card is used as an example here):
-device spapr-pci-vfio-host-bridge,id=USB,iommu=4,index=7 \
-device vfio-pci,host=4:0:1.0,addr=1.0,bus=USB,multifunction=true
-device vfio-pci,host=4:0:1.1,addr=1.1,bus=USB
-device vfio-pci,host=4:0:1.2,addr=1.2,bus=USB
where:
* index=7 is a QEMU PHB index (used as source for MMIO/MSI/IO windows
offset);
* iommu=4 is an IOMMU id which can be found in sysfs:
[aik@vpl2 ~]$ cd /sys/bus/pci/devices/0004:00:00.0/
[aik@vpl2 0004:00:00.0]$ ls -l iommu_group
lrwxrwxrwx 1 root root 0 Jun 5 12:49 iommu_group -> ../../../kernel/iommu_groups/4
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
This removes window_size as it is basically a copy of nb_table
shifted by SPAPR_TCE_PAGE_SHIFT. As new dynamic DMA windows are
going to support windows as big as the entire RAM and this number
will be bigger that 32 capacity, we will have to do something
about @window_size anyway and removal seems to be the right way to go.
This removes dma_window_start/dma_window_size from sPAPRPHBState as
they are no longer used.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
At the moment sPAPRPHBState contains a @tcet pointer to the only
TCE table. However sPAPR spec allows having more than one DMA window.
Since the TCE object is already a child of SPAPR PHB object, there is
no need to keep an additional pointer to it in sPAPRPHBState so remove it.
This changes the way sPAPRPHBState::reset performs reset of sPAPRTCETable
objects.
This changes the default DMA window properties calculation.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently the default DMA window is represented by a single MemoryRegion.
However there can be more than just one window so we need
a "root" memory region to be separated from the actual DMA window(s).
This introduces a "root" IOMMU memory region and adds a subregion for
the default DMA 32bit window. Following patches will add other
subregion(s).
This initializes a default DMA window subregion size to the guest RAM
size as this window can be switched into "bypass" mode which implements
direct DMA mapping.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
The spapr-pci PHB initializes IOMMU for emulated devices only.
The upcoming VFIO support will do it different. However both emulated
and VFIO PHB types share most of the initialization code.
For the type specific things a new finish_realize() callback is
introduced.
This introduces sPAPRPHBClass derived from PCIHostBridgeClass and
adds the callback pointer.
This implements finish_realize() for emulated devices.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[agraf: Fix compilation]
Signed-off-by: Alexander Graf <agraf@suse.de>
On the sPAPR platform a guest allocates MSI/MSIX vectors via RTAS
hypercalls which return global IRQ numbers to a guest so it only
operates with those and never touches MSIMessage.
Therefore MSIMessage handling is completely hidden in QEMU.
Previously every sPAPR PCI host bridge implemented its own MSI window
to catch msi_notify()/msix_notify() calls from QEMU devices (virtio-pci
or vfio) and route them to the guest via qemu_pulse_irq().
MSIMessage used to be encoded as:
.addr - address within the PHB MSI window;
.data - the device index on PHB plus vector number.
The MSI MR write function translated this MSIMessage to a global IRQ
number and called qemu_pulse_irq().
However the total number of IRQs is not really big (at the moment it is
1024 IRQs starting from 4096) and even 16bit data field of MSIMessage
seems to be enough to store an IRQ number there.
This simplifies MSI handling in sPAPR PHB. Specifically, this does:
1. remove a MSI window from a PHB;
2. add a single memory region for all MSIs to sPAPREnvironment
and spapr_pci_msi_init() to initialize it;
3. encode MSIMessage as:
* .addr - a fixed address of SPAPR_PCI_MSI_WINDOW==0x40000000000ULL;
* .data as an IRQ number.
4. change IRQ allocator to align first IRQ number in a block for MSI.
MSI uses lower bits to specify the vector number so the first IRQ has to
be aligned. MSIX does not need any special allocator though.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds the necessary support for saving the state of the PAPR virtual
PCI host bridge (or host bridges).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Message-id: 1374175984-8930-10-git-send-email-aliguori@us.ibm.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Use the new iommu support in the memory core for iommu support. The only
user, spapr, is also converted, but it still provides a DMAContext
interface until the non-PCI bits switch to AddressSpace.
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
[ Do not calls memory_region_del_subregion() on the device's
bus_master_enable_region, it is an alias; return an AddressSpace
from the IOMMU hook and remove the destructor hook. - David Gibson ]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The TCE table is currently returned as a DMAContext, and non-type-safe
APIs are called later passing back the DMAContext. Since we want to move
away from DMAContext, use an opaque type instead, and add an accessor
to retrieve the DMAContext from it.
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Many of these should be cleaned up with proper qdev-/QOM-ification.
Right now there are many catch-all headers in include/hw/ARCH depending
on cpu.h, and this makes it necessary to compile these files per-target.
However, fixing this does not belong in these patches.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>