iommu_bus_init() registers a bus notifier on the given bus by using
a statically defined notifier block:
static struct notifier_block iommu_bus_nb = {
.notifier_call = iommu_bus_notifier,
};
This same notifier block is used for all busses. This causes a
problem for notifiers registered after iommu has registered this
callback on multiple busses. The problem is that a subsequent
notifier being registered on a bus which has this iommu notifier
will also get linked in to the notifier list of all other busses
which have this iommu notifier.
This patch fixes this by allocating the notifier_block at runtime.
Some error checking is also added to catch any allocation failure
or notifier registration error.
Signed-off-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
It turns out that our assumption that aliases are always to the same
slot isn't true. One particular platform reports an IVRS alias of the
SATA controller (00:11.0) for the legacy IDE controller (00:14.1).
When we hit this, we attempt to use a single IOMMU group for
everything on the same bus, which in this case is the root complex.
We already have multiple groups defined for the root complex by this
point, resulting in multiple WARN_ON hits.
This patch makes these sorts of aliases work again with IOMMU groups
by reworking how we search through the PCI address space to find
existing groups. This should also now handle looped dependencies and
all sorts of crazy inter-dependencies that we'll likely never see.
The recursion used here should never be very deep. It's unlikely to
have individual aliases and only theoretical that we'd ever see a
chain where one alias causes us to search through to yet another
alias. We're also only dealing with PCIe device on a single bus,
which means we'll typically only see multiple slots in use on the root
complex. Loops are also a theoretically possibility, which I've
tested using fake DMA alias quirks and prevent from causing problems
using a bitmap of the devfn space that's been visited.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org # 3.17
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This function will replace the current iommu_domain_has_cap
function and clean up the interface while at it.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When a non-PCI device is passed to that function it might
pass group == NULL to iommu_group_add_device() which then
dereferences it and cause a crash this way. Fix it by
just returning an error for non-PCI devices.
Fixes: 104a1c13ac
Cc: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This structure is read-only data and should never be modified.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Currently each IOMMU driver that supports IOMMU groups has its own
code for discovering the base device used in grouping. This code
is generally not specific to the IOMMU hardware, but to the bus of
the devices managed by the IOMMU. We can therefore create a common
interface for supporting devices on different buses.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Commit 6197ca82 (iommu: Use %pa and %zx instead of casting) introduced the
usage of '%pa', but still kept the '0x', which leads to printing '0x0x'.
Remove the '0x' when '%pa' is used.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Change iommu driver to call unmap trace event. This iommu_map_unmap class
event can be enabled to trigger when iommu unmap iommu ops is called. Trace
information includes iova, physical address (map event only), and size.
Testing:
Added trace calls to iommu_prepare_identity_map() for testing some of the
conditions that are hard to trigger. Here is the trace from the testing:
swapper/0-1 [003] .... 1.854102: unmap: IOMMU: iova=0x00000000cb800000 size=0x400
Signed-off-by: Shuah Khan <shuah.kh@samsung.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Change iommu driver to call map trace event. This iommu_map_unmap class event
can be enabled to trigger when iommu map iommu ops is called. Trace information
includes iova, physical address (map event only), and size.
Testing:
Added trace calls to iommu_prepare_identity_map() for testing some of the
conditions that are hard to trigger. Here is the trace from the testing:
swapper/0-1 [003] .... 1.854102: map: IOMMU: iova=0x00000000cb800000 paddr=0x00000000cf9fffff size=0x400
Signed-off-by: Shuah Khan <shuah.kh@samsung.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Change iommu driver to call detach_device_to_domain trace event. This
iommu_device class event can be enabled to trigger when devices are detached
from a domain. Trace information includes device name.
Testing:
Added trace calls to iommu_prepare_identity_map() for testing some of the
conditions that are hard to trigger. Here is the trace from the testing:
swapper/0-1 [003] .... 1.854102: detach_device_from_domain: IOMMU: device=0000:00:02.0
Signed-off-by: Shuah Khan <shuah.kh@samsung.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Change iommu driver to call attach_device_to_domain trace event. This
iommu_device class event can be enabled to trigger when devices are attached
to a domain. Trace information includes device name.
Testing:
Added trace calls to iommu_prepare_identity_map() for testing some of the
conditions that are hard to trigger. Here is the trace from the testing:
swapper/0-1 [003] .... 1.854102: attach_device_to_domain: IOMMU: device=0000:00:02.0
Signed-off-by: Shuah Khan <shuah.kh@samsung.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Change iommu driver to call remove_device_to_group trace event. This
iommu_group class event can be enabled to trigger when devices get
removed from an iommu group. Trace information includes iommu group id and
device name.
Testing:
Added trace calls to iommu_prepare_identity_map() for testing some of the
conditions that are hard to trigger. Here is the trace from the testing:
swapper/0-1 [003] .... 1.854101: remove_device_from_group: IOMMU: groupID=0 device=0000:00:02.0
Signed-off-by: Shuah Khan <shuah.kh@samsung.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Change iommu driver to call add_device_to_group trace event. This iommu_group
class event can be enabled to trigger when devices get added to an iommu group.
Trace information includes iommu group id and device name.
Testing:
The following is trace is generated when intel-iommu driver adds devices to
to iommu groups during boot-time during its initialization:
swapper/0-1 [003] .... 1.854793: add_device_to_group: IOMMU: groupID=0 device=0000:00:00.0
swapper/0-1 [003] .... 1.854797: add_device_to_group: IOMMU: groupID=1 device=0000:00:02.0
Signed-off-by: Shuah Khan <shuah.kh@samsung.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Add tracing feature to iommu to report various iommu events. Classes
iommu_group, iommu_device, and iommu_map_unmap are defined.
iommu_group class events can be enabled to trigger when devices get added
to and removed from an iommu group. Trace information includes iommu group
id and device name.
iommu:add_device_to_group
iommu:remove_device_from_group
iommu_device class events can be enabled to trigger when devices are attached
to and detached from a domain. Trace information includes device name.
iommu:attach_device_to_domain
iommu:detach_device_from_domain
iommu_map_unmap class events can be enabled to trigger when iommu map and
unmap iommu ops. Trace information includes iova, physical address (map event
only), and size.
iommu:map
iommu:unmap
Signed-off-by: Shuah Khan <shuah.kh@samsung.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
printk supports using %pa for phys_addr_t and
%zx for size_t so use those instead of %lx and
casts to unsigned long.
Other miscellaneous changes around this:
Always use 0x%zx for size instead of one use of decimal.
Coalesce format and align arguments.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
iommu_map splits requests into pages that the iommu driver reports
that it can handle. The iommu_unmap path does not do the same. This
can cause problems not only from callers that might expect the same
behavior as the map path, but even from the failure path of iommu_map,
should it fail at a point where it has mapped and needs to unwind a
set of pages that the iommu driver cannot handle directly. amd_iommu,
for example, will BUG_ON if asked to unmap a non power of 2 size.
Fix this by extracting and generalizing the sizing code from the
iommu_map path and use it for both map and unmap.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
As IOMMU groups are exposed to the user space by their numbers,
the user space can use them in various kernel APIs so the kernel
might need an API to find a group by its ID.
As an example, QEMU VFIO on PPC64 platform needs it to associate
a logical bus number (LIOBN) with a specific IOMMU group in order
to support in-kernel handling of DMA map/unmap requests.
The patch adds the iommu_group_get_by_id(id) function which performs
such search.
v2: fixed reference counting.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Each iommu window can have access permissions associated with it. Extended the
window_enable API to incorporate window access permissions.
In case of PAMU each window can have its specific set of permissions.
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
This is required in case of PAMU, as it can support a window size of up
to 64G (even on 32bit).
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Add the iommu_domain_window_enable() and iommu_domain_window_disable()
functions to the IOMMU-API. These functions will be used to setup
domains that are based on subwindows and not on paging.
Signed-off-by: Joerg Roedel <joro@8bytes.org>
This attribute of a domain can be queried to find out if the
domain supports setting up page-tables using the iommu_map()
and iommu_unmap() functions.
Signed-off-by: Joerg Roedel <joro@8bytes.org>
In case the page-size bitmap is zero the code path in
iommu_map and iommu_unmap is undefined. Make it defined and
return -ENODEV in this case.
Signed-off-by: Joerg Roedel <joro@8bytes.org>
The iommu_init() initializes IOMMU internal structures and data
required for the IOMMU API as iommu_group_alloc().
It is registered as a subsys_initcall now.
One of the IOMMU users is going to be a PCI subsystem on POWER.
It discovers new IOMMU tables during the PCI scan so the logical
place to call iommu_group_alloc() is the moment when a new group
is discovered. However PCI scan is done from subsys_initcall hook
as IOMMU does so PCI hook can be (and is) called before the IOMMU one.
The patch moves IOMMU subsystem initialization one step earlier
to make sure that IOMMU is initialized before PCI scan begins.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
This patch introduces an extension to the iommu-api to get
and set attributes for an iommu_domain. Two functions are
introduced for this:
* iommu_domain_get_attr()
* iommu_domain_set_attr()
These functions will be used to make the iommu-api suitable
for GART-like IOMMUs and to implement hardware-specifc
api-extensions.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
IOMMU device groups are currently a rather vague associative notion
with assembly required by the user or user level driver provider to
do anything useful. This patch intends to grow the IOMMU group concept
into something a bit more consumable.
To do this, we first create an object representing the group, struct
iommu_group. This structure is allocated (iommu_group_alloc) and
filled (iommu_group_add_device) by the iommu driver. The iommu driver
is free to add devices to the group using it's own set of policies.
This allows inclusion of devices based on physical hardware or topology
limitations of the platform, as well as soft requirements, such as
multi-function trust levels or peer-to-peer protection of the
interconnects. Each device may only belong to a single iommu group,
which is linked from struct device.iommu_group. IOMMU groups are
maintained using kobject reference counting, allowing for automatic
removal of empty, unreferenced groups. It is the responsibility of
the iommu driver to remove devices from the group
(iommu_group_remove_device).
IOMMU groups also include a userspace representation in sysfs under
/sys/kernel/iommu_groups. When allocated, each group is given a
dynamically assign ID (int). The ID is managed by the core IOMMU group
code to support multiple heterogeneous iommu drivers, which could
potentially collide in group naming/numbering. This also keeps group
IDs to small, easily managed values. A directory is created under
/sys/kernel/iommu_groups for each group. A further subdirectory named
"devices" contains links to each device within the group. The iommu_group
file in the device's sysfs directory, which formerly contained a group
number when read, is now a link to the iommu group. Example:
$ ls -l /sys/kernel/iommu_groups/26/devices/
total 0
lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 ->
../../../../devices/pci0000:00/0000:00:1e.0
lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 ->
../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0
lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 ->
../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1
$ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group
[truncating perms/owner/timestamp]
/sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group ->
../../../kernel/iommu_groups/26
/sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group ->
../../../../kernel/iommu_groups/26
/sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group ->
../../../../kernel/iommu_groups/26
Groups also include several exported functions for use by user level
driver providers, for example VFIO. These include:
iommu_group_get(): Acquires a reference to a group from a device
iommu_group_put(): Releases reference
iommu_group_for_each_dev(): Iterates over group devices using callback
iommu_group_[un]register_notifier(): Allows notification of device add
and remove operations relevant to the group
iommu_group_id(): Return the group number
This patch also extends the IOMMU API to allow attaching groups to
domains. This is currently a simple wrapper for iterating through
devices within a group, but it's expected that the IOMMU API may
eventually make groups a more integral part of domains.
Groups intentionally do not try to manage group ownership. A user
level driver provider must independently acquire ownership for each
device within a group before making use of the group as a whole.
This may change in the future if group usage becomes more pervasive
across both DMA and IOMMU ops.
Groups intentionally do not provide a mechanism for driver locking
or otherwise manipulating driver matching/probing of devices within
the group. Such interfaces are generic to devices and beyond the
scope of IOMMU groups. If implemented, user level providers have
ready access via iommu_group_for_each_dev and group notifiers.
iommu_device_group() is removed here as it has no users. The
replacement is:
group = iommu_group_get(dev);
id = iommu_group_id(group);
iommu_group_put(group);
AMD-Vi & Intel VT-d support re-added in following patches.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Sometimes a single IOMMU user may have to deal with several
different IOMMU devices (e.g. remoteproc).
When an IOMMU fault happens, such users have to regain their
context in order to deal with the fault.
Users can't use the private fields of neither the iommu_domain nor
the IOMMU device, because those are already used by the IOMMU core
and low level driver (respectively).
This patch just simply allows users to pass a private token (most
notably their own context pointer) to iommu_set_fault_handler(),
and then makes sure it is provided back to the users whenever
an IOMMU fault happens.
The patch also adopts remoteproc to the new fault handling
interface, but the real functionality using this (recovery of
remote processors) will only be added later in a subsequent patch
set.
Cc: Fernando Guzman Lugo <fernando.lugo@ti.com>
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Since it is not guaranteed that an iommu driver initializes in its
domain_init() function, it must be initialized with NULL to prevent
calling a function in an arbitrary location when iommu fault occurred.
Signed-off-by: KyongHo Cho <pullip.cho@samsung.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
An IOMMU group is a set of devices for which the IOMMU cannot
distinguish transactions. For PCI devices, a group often occurs
when a PCI bridge is involved. Transactions from any device
behind the bridge appear to be sourced from the bridge itself.
We leave it to the IOMMU driver to define the grouping restraints
for their platform.
Using this new interface, the group for a device can be retrieved
using the iommu_device_group() callback. Users will compare the
value returned against the value returned for other devices to
determine whether they are part of the same group. Devices with
no group are not translated by the IOMMU. There should be no
expectations about the group numbers as they may be arbitrarily
assigned by the IOMMU driver and may not be persistent across boots.
We also provide a sysfs interface to the group numbers here so
that userspace can understand IOMMU dependencies between devices
for managing safe, userspace drivers.
[Some code changes by Joerg Roedel <joerg.roedel@amd.com>]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Now that all IOMMU drivers are exporting their supported pgsizes,
we can remove the default pgsize settings in register_iommu().
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
When mapping a memory region, split it to page sizes as supported
by the iommu hardware. Always prefer bigger pages, when possible,
in order to reduce the TLB pressure.
The logic to do that is now added to the IOMMU core, so neither the iommu
drivers themselves nor users of the IOMMU API have to duplicate it.
This allows a more lenient granularity of mappings; traditionally the
IOMMU API took 'order' (of a page) as a mapping size, and directly let
the low level iommu drivers handle the mapping, but now that the IOMMU
core can split arbitrary memory regions into pages, we can remove this
limitation, so users don't have to split those regions by themselves.
Currently the supported page sizes are advertised once and they then
remain static. That works well for OMAP and MSM but it would probably
not fly well with intel's hardware, where the page size capabilities
seem to have the potential to be different between several DMA
remapping devices.
register_iommu() currently sets a default pgsize behavior, so we can convert
the IOMMU drivers in subsequent patches. After all the drivers
are converted, the temporary default settings will be removed.
Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted
to deal with bytes instead of page order.
Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review!
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: David Brown <davidb@codeaurora.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <Joerg.Roedel@amd.com>
Cc: Stepan Moskovchenko <stepanm@codeaurora.org>
Cc: KyongHo Cho <pullip.cho@samsung.com>
Cc: Hiroshi DOYU <hdoyu@nvidia.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Express sizes in bytes rather than in page order, to eliminate the
size->order->size conversions we have whenever the IOMMU API is calling
the low level drivers' map/unmap methods.
Adopt all existing drivers.
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: David Brown <davidb@codeaurora.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <Joerg.Roedel@amd.com>
Cc: Stepan Moskovchenko <stepanm@codeaurora.org>
Cc: KyongHo Cho <pullip.cho@samsung.com>
Cc: Hiroshi DOYU <hdoyu@nvidia.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
With all IOMMU drivers being converted to bus_set_iommu the
global iommu_ops are no longer required. The same is true
for the deprecated register_iommu function.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
With per-bus iommu_ops the iommu_found function needs to
work on a bus_type too. This patch adds a bus_type parameter
to that function and converts all call-places.
The function is also renamed to iommu_present because the
function now checks if an iommu is present for a given bus
and does not check for a global iommu anymore.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This is necessary to store a pointer to the bus-specific
iommu_ops in the iommu-domain structure. It will be used
later to call into bus-specific iommu-ops.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This is the starting point to make the iommu_ops used for
the iommu-api a per-bus-type structure. It is required to
easily implement bus-specific setup in the iommu-layer.
The first user will be the iommu-group attribute in sysfs.
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Make report_iommu_fault() return -ENOSYS whenever an iommu fault
handler isn't installed, so IOMMU drivers can then do their own
platform-specific default behavior if they wanted.
Fault handlers can still return -ENOSYS in case they want to elicit the
default behavior of the IOMMU drivers.
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
commit 4f3f8d9 "iommu/core: Add fault reporting mechanism" added
the public iommu_set_fault_handler() symbol but forgot to export it.
Fix that.
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>