Attaching an IOMMU domain to a PASID of a device is a generic operation
for modern IOMMU drivers which support PASID-granular DMA address
translation. Currently visible usage scenarios include (but not limited):
- SVA (Shared Virtual Address)
- kernel DMA with PASID
- hardware-assist mediated device
This adds the set_dev_pasid domain ops for setting the domain onto a
PASID of a device and remove_dev_pasid iommu ops for removing any setup
on a PASID of device. This also adds interfaces for device drivers to
attach/detach/retrieve a domain for a PASID of a device.
If multiple devices share a single group, it's fine as long the fabric
always routes every TLP marked with a PASID to the host bridge and only
the host bridge. For example, ACS achieves this universally and has been
checked when pci_enable_pasid() is called. As we can't reliably tell the
source apart in a group, all the devices in a group have to be considered
as the same source, and mapped to the same PASID table.
The DMA ownership is about the whole device (more precisely, iommu group),
including the RID and PASIDs. When the ownership is converted, the pasid
array must be empty. This also adds necessary checks in the DMA ownership
interfaces.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The current kernel DMA with PASID support is based on the SVA with a flag
SVM_FLAG_SUPERVISOR_MODE. The IOMMU driver binds the kernel memory address
space to a PASID of the device. The device driver programs the device with
kernel virtual address (KVA) for DMA access. There have been security and
functional issues with this approach:
- The lack of IOTLB synchronization upon kernel page table updates.
(vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC etc.)
- Other than slight more protection, using kernel virtual address (KVA)
has little advantage over physical address. There are also no use
cases yet where DMA engines need kernel virtual addresses for in-kernel
DMA.
This removes SVM_FLAG_SUPERVISOR_MODE support from the IOMMU interface.
The device drivers are suggested to handle kernel DMA with PASID through
the kernel DMA APIs.
The drvdata parameter in iommu_sva_bind_device() and all callbacks is not
needed anymore. Cleanup them as well.
Link: https://lore.kernel.org/linux-iommu/20210511194726.GP1002214@nvidia.com/
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Use this field to save the number of PASIDs that a device is able to
consume. It is a generic attribute of a device and lifting it into the
per-device dev_iommu struct could help to avoid the boilerplate code
in various IOMMU drivers.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Cases like VFIO wish to attach a device to an existing domain that was
not allocated specifically from the device. This raises a condition
where the IOMMU driver can fail the domain attach because the domain and
device are incompatible with each other.
This is a soft failure that can be resolved by using a different domain.
Provide a dedicated errno EINVAL from the IOMMU driver during attach that
the reason why the attach failed is because of domain incompatibility.
VFIO can use this to know that the attach is a soft failure and it should
continue searching. Otherwise, the attach will be a hard failure and VFIO
will return the code to userspace.
Update kdocs to add rules of return value to the attach_dev op and APIs.
Link: https://lore.kernel.org/r/bd56d93c18621104a0fa1b0de31e9b760b81b769.1666042872.git.nicolinc@nvidia.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add gfp parameter to iommu_alloc_resv_region() for the callers to specify
the memory allocation behavior. Thus iommu_alloc_resv_region() could also
be available in critical contexts.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20220927053109.4053662-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The AMD IOMMU driver cannot activate PASID mode on a RID without the RID's
translation being set to IDENTITY. Further it requires changing the RID's
page table layout from the normal v1 IOMMU_DOMAIN_IDENTITY layout to a
different v2 layout.
It does this by creating a new iommu_domain, configuring that domain for
v2 identity operation and then attaching it to the group, from within the
driver. This logic assumes the group is already set to the IDENTITY domain
and is being used by the DMA API.
However, since the ownership logic is based on the group's domain pointer
equaling the default domain to detect DMA API ownership, this causes it to
look like the group is not attached to the DMA API any more. This blocks
attaching drivers to any other devices in the group.
In a real system this manifests itself as the HD-audio devices on some AMD
platforms losing their device drivers.
Work around this unique behavior of the AMD driver by checking for
equality of IDENTITY domains based on their type, not their pointer
value. This allows the AMD driver to have two IDENTITY domains for
internal purposes without breaking the check.
Have the AMD driver properly declare that the special domain it created is
actually an IDENTITY domain.
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: stable@vger.kernel.org
Fixes: 512881eacf ("bus: platform,amba,fsl-mc,PCI: Add device DMA ownership management")
Reported-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/0-v1-ea566e16b06b+811-amd_owner_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Clean up the remaining trivial bus_set_iommu() callsites along
with the implementation. Now drivers only have to know and care
about iommu_device instances, phew!
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> # s390
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> # s390
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/ea383d5f4d74ffe200ab61248e5de6e95846180a.1660572783.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Move the bus setup to iommu_device_register(). This should allow
bus_iommu_probe() to be correctly replayed for multiple IOMMU instances,
and leaves bus_set_iommu() as a glorified no-op to be cleaned up next.
At this point we can also handle cleanup better than just rolling back
the most-recently-touched bus upon failure - which may release devices
owned by other already-registered instances, and still leave devices on
other buses with dangling pointers to the failed instance. Now it's easy
to clean up the exact footprint of a given instance, no more, no less.
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Krishna Reddy <vdumpa@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> # s390
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> # s390
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/d342b6f27efb5ef3e93aacaa3012d25386d74866.1660572783.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The number of bus types that the IOMMU subsystem deals with is small and
manageable, so pull that list into core code as a first step towards
cleaning up all the boilerplate bus-awareness from drivers. Calling
iommu_probe_device() before bus->iommu_ops is set will simply return
-ENODEV and not break the notifier call chain, so there should be no
harm in proactively registering all our bus notifiers at init time.
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> # s390
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> # s390
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/7462347bf938bd6eedb629a3a318434f6516e712.1660572783.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
With all callers now converted to the device-specific version, retire
the old bus-based interface, and give drivers the chance to indicate
accurate per-instance capabilities.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/d8bd8777d06929ad8f49df7fc80e1b9af32a41b5.1660574547.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In the iommu_group_alloc, when the kobject_init_and_add
failed, the group->kobj is associate with iommu_group_ktype,
thus its release function iommu_group_release will be called
by the following kobject_put. The iommu_group_release calls
ida_free with the group->id, so we do not need to do it before
kobject_put.
Signed-off-by: Yuan Can <yuancan@huawei.com>
Link: https://lore.kernel.org/r/20220815031423.94548-1-yuancan@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In order to make the underneath API easier to change in the future,
prevent users from dereferencing fwnode from struct device.
Instead, use the specific dev_fwnode() API for that.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20220801164758.20664-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
- convert arm32 to the common dma-direct code (Arnd Bergmann, Robin Murphy,
Christoph Hellwig)
- restructure the PCIe peer to peer mapping support (Logan Gunthorpe)
- allow the IOMMU code to communicate an optional DMA mapping length
and use that in scsi and libata (John Garry)
- split the global swiotlb lock (Tianyu Lan)
- various fixes and cleanup (Chao Gao, Dan Carpenter, Dongli Zhang,
Lukas Bulwahn, Robin Murphy)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmLuIYULHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYPS5A//Ty1ZNyXExmwZ6J6g7/oIvQlpAHilDr22mCd8tR8Y
Ne7TgLa/X+usFvJTxJfkvg/LNMDjD7qx0J/mhDGm4reOFcEL4/PBy0rDSOgnmntV
k/fPhgwnpuztiAQ+s+WkJ3pkrmG1HaEId7GGj2JaoYdas6RX2mGX7vL8uvUFepjw
lYPAqWMtJHkOfsDK0PqqyQsr7dcC6lyFLqnn/wqvHtTJeKCfGs6W/SIrlWme2SZY
3dNx84ZR1uPjaazAmtf2IWfjh/TBmd0ETRYycgUUKRP9iwsCkBQDBwsBGSIYXiWj
BUKQ5oMvjAlUGRF0jYz9e77KuedE6GxWiXNQstitBmid142M37DHA5tvZRf65MPS
THHcjTDmmoaO4YfFhhXOcFOrjG4/V8bF7fgHB6XkHDjhVVTcnIx8zuOAXIVBZvIV
VAALmamBqEfIZZrCqgr7hzFssK2bip+TIMkdoD46Wcr+D7bAlujhuzWxubn9+ulT
23v/pAvC80ut6LvKj6EA+GpRm/pejfOtEbjXPoO2hguNxvuUKvPQqNh9hy0q+v1e
8n2Y/4lhy5bv02S7wKooNkfCoV753jBY1TIru45UmEYc3EkTQPii6okYe0DvW4QX
VCnKgo156wSBfE+9eWdxCROv2SZqJFMV/wL3vw54dpJQMbDy7VkNsh4mGREdUkU1
uek=
=Bv19
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-5.20-2022-08-06' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- convert arm32 to the common dma-direct code (Arnd Bergmann, Robin
Murphy, Christoph Hellwig)
- restructure the PCIe peer to peer mapping support (Logan Gunthorpe)
- allow the IOMMU code to communicate an optional DMA mapping length
and use that in scsi and libata (John Garry)
- split the global swiotlb lock (Tianyu Lan)
- various fixes and cleanup (Chao Gao, Dan Carpenter, Dongli Zhang,
Lukas Bulwahn, Robin Murphy)
* tag 'dma-mapping-5.20-2022-08-06' of git://git.infradead.org/users/hch/dma-mapping: (45 commits)
swiotlb: fix passing local variable to debugfs_create_ulong()
dma-mapping: reformat comment to suppress htmldoc warning
PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()
RDMA/rw: drop pci_p2pdma_[un]map_sg()
RDMA/core: introduce ib_dma_pci_p2p_dma_supported()
nvme-pci: convert to using dma_map_sgtable()
nvme-pci: check DMA ops when indicating support for PCI P2PDMA
iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg
iommu: Explicitly skip bus address marked segments in __iommu_map_sg()
dma-mapping: add flags to dma_map_ops to indicate PCI P2PDMA support
dma-direct: support PCI P2PDMA pages in dma-direct map_sg
dma-mapping: allow EREMOTEIO return code for P2PDMA transfers
PCI/P2PDMA: Introduce helpers for dma_map_sg implementations
PCI/P2PDMA: Attempt to set map_type if it has not been set
lib/scatterlist: add flag for indicating P2PDMA segments in an SGL
swiotlb: clean up some coding style and minor issues
dma-mapping: update comment after dmabounce removal
scsi: sd: Add a comment about limiting max_sectors to shost optimal limit
ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors
scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit
...
In order to support PCI P2PDMA mappings with dma-iommu, explicitly skip
any segments marked with sg_dma_mark_bus_address() in __iommu_map_sg().
These segments should not be mapped into the IOVA and will be handled
separately in as subsequent patch for dma-iommu.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
All drivers that implement get_resv_regions just use
generic_put_resv_regions to implement the put side. Remove the
indirections and document the allocations constraints.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220708080616.238833-4-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Many drivers do nothing meaningful for .release_device, and it's neatly
abstracted to just two callsites in the core code, so let's make it
optional to implement.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/bda9d3eb4527eac8f6544a15067e2529cca54a2e.1655822151.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
A callback is introduced to struct iommu_resv_region to free memory
allocations associated with the reserved region. This will be useful
when we introduce support for IORT RMR based reserved regions.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Steven Price <steven.price@arm.com>
Tested-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20220615101044.1972-2-shameerali.kolothum.thodi@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Use ida_alloc()/ida_free() instead of deprecated
ida_simple_get()/ida_simple_remove().
Signed-off-by: Ke Liu <liuke94@huawei.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20220608021655.1538087-1-liuke94@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Once the group enters 'owned' mode it can never be assigned back to the
default_domain or to a NULL domain. It must always be actively assigned to
a current domain. If the caller hasn't provided a domain then the core
must provide an explicit DMA blocking domain that has no DMA map.
Lazily create a group-global blocking DMA domain when
iommu_group_claim_dma_owner is first called and immediately assign the
group to it. This ensures that DMA is immediately fully isolated on all
IOMMU drivers.
If the user attaches/detaches while owned then detach will set the group
back to the blocking domain.
Slightly reorganize the call chains so that
__iommu_group_set_core_domain() is the function that removes any caller
configured domain and sets the domains back a core owned domain with an
appropriate lifetime.
__iommu_group_set_domain() is the worker function that can change the
domain assigned to a group to any target domain, including NULL.
Add comments clarifying how the NULL vs detach_dev vs default_domain works
based on Robin's remarks.
This fixes an oops with VFIO and SMMUv3 because VFIO will call
iommu_detach_group() and then immediately iommu_domain_free(), but
SMMUv3 has no way to know that the domain it is holding a pointer to
has been freed. Now the iommu_detach_group() will assign the blocking
domain and SMMUv3 will no longer hold a stale domain reference.
Fixes: 1ea2a07a53 ("iommu: Add DMA ownership management interfaces")
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Tested-by: Baolu Lu <baolu.lu@linux.intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Co-developed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
--
Just minor polishing as discussed
v3:
- Change names to __iommu_group_set_domain() /
__iommu_group_set_core_domain()
- Clarify comments
- Call __iommu_group_set_domain() directly in
iommu_group_release_dma_owner() since we know it is always selecting
the default_domain
- Remove redundant detach_dev ops check in __iommu_detach_device and
make the added WARN_ON fail instead
- Check for blocking_domain in __iommu_attach_group() so VFIO can
actually attach a new group
- Update comments and spelling
- Fix missed change to new_domain in iommu_group_do_detach_device()
v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com
v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Groups created by VFIO backends outside the core IOMMU API should never
be passed directly into the API itself, however they still expose their
standard sysfs attributes, so we can still stumble across them that way.
Take care to consider those cases before jumping into our normal
assumptions of a fully-initialised core API group.
Fixes: 3f6634d997 ("iommu: Use right way to retrieve iommu_ops")
Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/86ada41986988511a8424e84746dfe9ba7f87573.1651667683.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The iommu group changes notifer is not referenced in the tree. Remove it
to avoid dead code.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220418005000.897664-12-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Multiple devices may be placed in the same IOMMU group because they
cannot be isolated from each other. These devices must either be
entirely under kernel control or userspace control, never a mixture.
This adds dma ownership management in iommu core and exposes several
interfaces for the device drivers and the device userspace assignment
framework (i.e. VFIO), so that any conflict between user and kernel
controlled dma could be detected at the beginning.
The device driver oriented interfaces are,
int iommu_device_use_default_domain(struct device *dev);
void iommu_device_unuse_default_domain(struct device *dev);
By calling iommu_device_use_default_domain(), the device driver tells
the iommu layer that the device dma is handled through the kernel DMA
APIs. The iommu layer will manage the IOVA and use the default domain
for DMA address translation.
The device user-space assignment framework oriented interfaces are,
int iommu_group_claim_dma_owner(struct iommu_group *group,
void *owner);
void iommu_group_release_dma_owner(struct iommu_group *group);
bool iommu_group_dma_owner_claimed(struct iommu_group *group);
The device userspace assignment must be disallowed if the DMA owner
claiming interface returns failure.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
iommu_capable() only really works for systems where all IOMMU instances
are completely homogeneous, and all devices are IOMMU-mapped. Implement
the new variant which will be able to give a more accurate answer for
whichever device the caller is actually interested in, and even more so
once all the external users have been converted and we can reliably pass
the device pointer through the internal driver interface too.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/8407eb9586677995b7a9fd70d0fd82d85929a9bb.1650878781.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Move the domain specific operations out of struct iommu_ops into a new
structure that only has domain specific operations. This solves the
problem of needing to know if the method vector for a given operation
needs to be retrieved from the device or the domain. Logically the domain
ops are the ones that make sense for external subsystems and endpoint
drivers to use, while device ops, with the sole exception of domain_alloc,
are IOMMU API internals.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-10-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The is_attach_deferred iommu_ops callback is a device op. The domain
argument is unnecessary and never used. Remove it to make code clean.
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-9-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The common iommu_ops is hooked to both device and domain. When a helper
has both device and domain pointer, the way to get the iommu_ops looks
messy in iommu core. This sorts out the way to get iommu_ops. The device
related helpers go through device pointer, while the domain related ones
go through domain pointer.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The apply_resv_region callback in iommu_ops was introduced to reserve an
IOVA range in the given DMA domain when the IOMMU driver manages the IOVA
by itself. As all drivers converted to use dma-iommu in the core, there's
no driver using this anymore. Remove it to avoid dead code.
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The aux-domain related interfaces and iommu_ops are not referenced
anywhere in the tree. We've also reached a consensus to redesign it
based the new iommufd framework. Remove them to avoid dead code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The guest pasid related uapi interfaces and definitions are not referenced
anywhere in the tree. We've also reached a consensus to replace them with
a new iommufd design. Remove them to avoid dead code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The code is mostly free of W=1 warning, so fix the following:
drivers/iommu/iommu.c:996: warning: expecting prototype for iommu_group_for_each_dev(). Prototype was for __iommu_group_for_each_dev() instead
drivers/iommu/iommu.c:3048: warning: Function parameter or member 'drvdata' not described in 'iommu_sva_bind_device'
drivers/iommu/ioasid.c:354: warning: Function parameter or member 'ioasid' not described in 'ioasid_get'
drivers/iommu/omap-iommu.c:1098: warning: expecting prototype for omap_iommu_suspend_prepare(). Prototype was for omap_iommu_prepare() instead
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/1643366673-26803-1-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Extend the scope of holding group->mutex so that it can cover the default
domain check/attachment and direct mappings of reserved regions.
Cc: Ashish Mhetre <amhetre@nvidia.com>
Fixes: 211ff31b3d ("iommu: Fix race condition during default domain allocation")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211108061349.1985579-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Including:
- Intel IOMMU Updates fro Lu Baolu:
- Dump DMAR translation structure when DMA fault occurs
- An optimization in the page table manipulation code
- Use second level for GPA->HPA translation
- Various cleanups
- Arm SMMU Updates from Will
- Minor optimisations to SMMUv3 command creation and submission
- Numerous new compatible string for Qualcomm SMMUv2 implementations
- Fixes for the SWIOTLB based implemenation of dma-iommu code for
untrusted devices
- Add support for r8a779a0 to the Renesas IOMMU driver and DT matching
code for r8a77980
- A couple of cleanups and fixes for the Apple DART IOMMU driver
- Make use of generic report_iommu_fault() interface in the AMD IOMMU
driver
- Various smaller fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmGD6NQACgkQK/BELZcB
GuOSfg/9FKXl5ym86BP3tAS1fREKH7p59JRGZrrIR89NyHAcEUjtNG3YLPao+YxU
3CDgLkru+vlDpYY54QoyqcY5FgIHT3Cna/Cdk4zekRmSO/14gHp47jtZRheOUzLF
rvwfaplcbbtT8akpsVFzvw8YpQLGSDiDQSl7xL2+40Z9hiYX/gS9Af+PH98tAXsa
yZKZj6gU+JXM58VihO3M7umyE06tovyBaYgcsBZtbf66bGc0ySu+fe75UVWbueRt
Z8jwqa7TUfVXiYC8h+LqtGET6gtzNSsxAU3VllRe7Brf6K8i/yaRs/TO2Hp83d7/
q/fcK3vNQ5v3aDNci/DjBB8SEySzCmRz/9ocCOCx8ByuRp+5lwVRPPq3WcUMtsZY
QpYo9Fk7luFz2Gj5LObKAVBvOoeBZ5Km3oPs4HVmQ6epxn/rVckJDnJnVSLJuATq
tSZC2heRfFlg1dT6WFaynCTP2RI1LlNEdKhHirV6L368rSjmF0ZdQxdTpHULsHr1
yMjqL21OfcSkLW91rvfb3g68EsIwDbCPGTOlQWZLmAtwOWtHSCLPgwwEG7WefZbH
yaslpmlUTOurUnFmpxlfLicy5sqsBL2ASzGJkEKrgunw82Ke96zzkRzi+9j9HeS6
g0AyIWMi1cUAjONVUZtV4yjImXh63HIPiKx730a9teodusoxm+Q=
=waUR
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- Intel IOMMU Updates fro Lu Baolu:
- Dump DMAR translation structure when DMA fault occurs
- An optimization in the page table manipulation code
- Use second level for GPA->HPA translation
- Various cleanups
- Arm SMMU Updates from Will
- Minor optimisations to SMMUv3 command creation and submission
- Numerous new compatible string for Qualcomm SMMUv2 implementations
- Fixes for the SWIOTLB based implemenation of dma-iommu code for
untrusted devices
- Add support for r8a779a0 to the Renesas IOMMU driver and DT matching
code for r8a77980
- A couple of cleanups and fixes for the Apple DART IOMMU driver
- Make use of generic report_iommu_fault() interface in the AMD IOMMU
driver
- Various smaller fixes and cleanups
* tag 'iommu-updates-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (35 commits)
iommu/dma: Fix incorrect error return on iommu deferred attach
iommu/dart: Initialize DART_STREAMS_ENABLE
iommu/dma: Use kvcalloc() instead of kvzalloc()
iommu/tegra-smmu: Use devm_bitmap_zalloc when applicable
iommu/dart: Use kmemdup instead of kzalloc and memcpy
iommu/vt-d: Avoid duplicate removing in __domain_mapping()
iommu/vt-d: Convert the return type of first_pte_in_page to bool
iommu/vt-d: Clean up unused PASID updating functions
iommu/vt-d: Delete dev_has_feat callback
iommu/vt-d: Use second level for GPA->HPA translation
iommu/vt-d: Check FL and SL capability sanity in scalable mode
iommu/vt-d: Remove duplicate identity domain flag
iommu/vt-d: Dump DMAR translation structure when DMA fault occurs
iommu/vt-d: Do not falsely log intel_iommu is unsupported kernel option
iommu/arm-smmu-qcom: Request direct mapping for modem device
iommu: arm-smmu-qcom: Add compatible for QCM2290
dt-bindings: arm-smmu: Add compatible for QCM2290 SoC
iommu/arm-smmu-qcom: Add SM6350 SMMU compatible
dt-bindings: arm-smmu: Add compatible for SM6350 SoC
iommu/arm-smmu-v3: Properly handle the return value of arm_smmu_cmdq_build_cmd()
...
Replace uses of mem_encrypt_active() with calls to cc_platform_has() with
the CC_ATTR_MEM_ENCRYPT attribute.
Remove the implementation of mem_encrypt_active() across all arches.
For s390, since the default implementation of the cc_platform_has()
matches the s390 implementation of mem_encrypt_active(), cc_platform_has()
does not need to be implemented in s390 (the config option
ARCH_HAS_CC_PLATFORM is not set).
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210928191009.32551-9-bp@alien8.de
IOVA cookies are now got and put by core code, so we no longer need to
export these to modular drivers. The export for getting MSI cookies
stays, since VFIO can still be a module, but it was already relying on
someone else putting them, so that aspect is unaffected.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/ef89db54a27df7d8bc0af094c7d7b204fd61774c.1631531973.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Including:
- New DART IOMMU driver for Apple Silicon M1 chips.
- Optimizations for iommu_[map/unmap] performance
- Selective TLB flush support for the AMD IOMMU driver to make
it more efficient on emulated IOMMUs.
- Rework IOVA setup and default domain type setting to move more
code out of IOMMU drivers and to support runtime switching
between certain types of default domains.
- VT-d Updates from Lu Baolu:
- Update the virtual command related registers
- Enable Intel IOMMU scalable mode by default
- Preset A/D bits for user space DMA usage
- Allow devices to have more than 32 outstanding PRs
- Various cleanups
- ARM SMMU Updates from Will Deacon:
- SMMUv3: Minor optimisation to avoid zeroing struct members on CMD submission
- SMMUv3: Increased use of batched commands to reduce submission latency
- SMMUv3: Refactoring in preparation for ECMDQ support
- SMMUv2: Fix races when probing devices with identical StreamIDs
- SMMUv2: Optimise walk cache flushing for Qualcomm implementations
- SMMUv2: Allow deep sleep states for some Qualcomm SoCs with shared clocks
- Various smaller optimizations, cleanups, and fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmEyKYAACgkQK/BELZcB
GuOzAxAAnJ02PG07BnFFFGN2/o3eVON4LQUXquMePjcZ8A8oQf073jO/ybWNnpJK
5V+DHRg2CAugFHks/EwIrxFXAWZuStrcnk81d8t6T6ROQl47Zv1qshksUTnsDQnz
V7mQ1P/pcsBwCUf73aD9ncLmLkiuTVfHKoKfe3gHeQI+H+2Lw4ijzB8kIwUqhkHI
heJZLDmO87S2Mr7zlCmMQH5R550fHrTKSbUCx9QqFu3GgWsjkU+3u1S17xR1bEoW
hmhJhyAw+MLrSgdeG4U9o+6AcQuRELEHfVSq7PtDxQ6hEVziGYGY4Nk+YiEcXFiv
mu9qfEkaP/2QOKszvks+nhHrwDnJ9WLnEEskiEFwjsaFauIKsRscfYVUBTWeYXJT
9t/PVngigWLDhGO0NEPthQvJExvJJs1MQQ72CcA6dd0XdGpN+aRglIUWUJP/nQHd
doAx4/1YWnHVkWWUef8NgmVvlHdoXjA7vy4QGL9FYCqV6ImfhAkJYKJ99X6Ovlmk
gje/Kx+5wUPT2nXNbTkjalIylyUNpugMY4xD7K06VXjvRMUf2SbYNDQxYJaDDld6
nDt0F0NvEyrj7HO8egwIZbX3MOikhMGHur48yEyCTbm+9oHQffkODq1o4OfuxJh2
nq0G5Plln9CEmhQVwzibcPSNlYPe8AZbbXqQ9DrJFusEpYj+01c=
=zVaQ
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- New DART IOMMU driver for Apple Silicon M1 chips
- Optimizations for iommu_[map/unmap] performance
- Selective TLB flush support for the AMD IOMMU driver to make it more
efficient on emulated IOMMUs
- Rework IOVA setup and default domain type setting to move more code
out of IOMMU drivers and to support runtime switching between certain
types of default domains
- VT-d Updates from Lu Baolu:
- Update the virtual command related registers
- Enable Intel IOMMU scalable mode by default
- Preset A/D bits for user space DMA usage
- Allow devices to have more than 32 outstanding PRs
- Various cleanups
- ARM SMMU Updates from Will Deacon:
SMMUv3:
- Minor optimisation to avoid zeroing struct members on CMD submission
- Increased use of batched commands to reduce submission latency
- Refactoring in preparation for ECMDQ support
SMMUv2:
- Fix races when probing devices with identical StreamIDs
- Optimise walk cache flushing for Qualcomm implementations
- Allow deep sleep states for some Qualcomm SoCs with shared clocks
- Various smaller optimizations, cleanups, and fixes
* tag 'iommu-updates-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (85 commits)
iommu/io-pgtable: Abstract iommu_iotlb_gather access
iommu/arm-smmu: Fix missing unlock on error in arm_smmu_device_group()
iommu/vt-d: Add present bit check in pasid entry setup helpers
iommu/vt-d: Use pasid_pte_is_present() helper function
iommu/vt-d: Drop the kernel doc annotation
iommu/vt-d: Allow devices to have more than 32 outstanding PRs
iommu/vt-d: Preset A/D bits for user space DMA usage
iommu/vt-d: Enable Intel IOMMU scalable mode by default
iommu/vt-d: Refactor Kconfig a bit
iommu/vt-d: Remove unnecessary oom message
iommu/vt-d: Update the virtual command related registers
iommu: Allow enabling non-strict mode dynamically
iommu: Merge strictness and domain type configs
iommu: Only log strictness for DMA domains
iommu: Expose DMA domain strictness via sysfs
iommu: Express DMA strictness via the domain type
iommu/vt-d: Prepare for multiple DMA domain types
iommu/arm-smmu: Prepare for multiple DMA domain types
iommu/amd: Prepare for multiple DMA domain types
iommu: Introduce explicit type for non-strict DMA domains
...
- fix debugfs initialization order (Anthony Iliopoulos)
- use memory_intersects() directly (Kefeng Wang)
- allow to return specific errors from ->map_sg
(Logan Gunthorpe, Martin Oliveira)
- turn the dma_map_sg return value into an unsigned int (me)
- provide a common global coherent pool іmplementation (me)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmEvY+8LHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYPaehAAsgnBzzzoLHO83pgs0KL92c+0DiMNHYmaMCJOvZXk
x2Irv+O74WikRJc4S7uQ26p2spjmUxjmiOjld+8+NN0liD4QO9BQ/SZpIp8emuKS
/yPG6Xh86xSl/OrPL1y7kGeHkRi5sm3mRhcTdILFQFPLcSReupe++GRfnvrpbOPk
tj3pBGXluD6iJH12BBt00ushUVzZ0F2xaF6xUDAs94RSZ3tlqsfx6c928Y1KxSZh
f89q/KuaokyogFG7Ujj/nYgIUETaIs2W6UmxBfRzdEMJFSffwomUMbw+M+qGJ7/d
2UjamFYRX16FReE8WNsndbX1E6k5JBW12E1qwV3dUwatlNLWEaRq3PNiWkF7zcFH
LDkpDYN6s5bIDPTfDp21XfPygoH8KQhnD9lVf0aB7n04uu8VJrGB9+10PpkCJVXD
0b2dcuSwCO7hAfTfNGVV8f3EI/1XPflr1hJvMgcVtY53CR96ldp+4QaElzWLXumN
MyptirmrVITNVyVwGzhGAblXBLWdarXD0EXudyiaF4Xbrj3AkIOSUCghEwKLpjQf
UwMFFwSE8yGxKTRK4HfU5gMzy6G751fU7TUe5lmxZLovDflQoSXMWgHE8e7r0Qel
o5v6lmUzoWz2fAISf3xjauo2ncgmfWMwYM6C7OJy5nG73QXLQId9J+ReXbmrgrrN
DgI=
=spje
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-5.15' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- fix debugfs initialization order (Anthony Iliopoulos)
- use memory_intersects() directly (Kefeng Wang)
- allow to return specific errors from ->map_sg (Logan Gunthorpe,
Martin Oliveira)
- turn the dma_map_sg return value into an unsigned int (me)
- provide a common global coherent pool іmplementation (me)
* tag 'dma-mapping-5.15' of git://git.infradead.org/users/hch/dma-mapping: (31 commits)
hexagon: use the generic global coherent pool
dma-mapping: make the global coherent pool conditional
dma-mapping: add a dma_init_global_coherent helper
dma-mapping: simplify dma_init_coherent_memory
dma-mapping: allow using the global coherent pool for !ARM
ARM/nommu: use the generic dma-direct code for non-coherent devices
dma-direct: add support for dma_coherent_default_memory
dma-mapping: return an unsigned int from dma_map_sg{,_attrs}
dma-mapping: disallow .map_sg operations from returning zero on error
dma-mapping: return error code from dma_dummy_map_sg()
x86/amd_gart: don't set failed sg dma_address to DMA_MAPPING_ERROR
x86/amd_gart: return error code from gart_map_sg()
xen: swiotlb: return error code from xen_swiotlb_map_sg()
parisc: return error code from .map_sg() ops
sparc/iommu: don't set failed sg dma_address to DMA_MAPPING_ERROR
sparc/iommu: return error codes from .map_sg() ops
s390/pci: don't set failed sg dma_address to DMA_MAPPING_ERROR
s390/pci: return error code from s390_dma_map_sg()
powerpc/iommu: don't set failed sg dma_address to DMA_MAPPING_ERROR
powerpc/iommu: return error code from .map_sg() ops
...
Allocating and enabling a flush queue is in fact something we can
reasonably do while a DMA domain is active, without having to rebuild it
from scratch. Thus we can allow a strict -> non-strict transition from
sysfs without requiring to unbind the device's driver, which is of
particular interest to users who want to make selective relaxations to
critical devices like the one serving their root filesystem.
Disabling and draining a queue also seems technically possible to
achieve without rebuilding the whole domain, but would certainly be more
involved. Furthermore there's not such a clear use-case for tightening
up security *after* the device may already have done whatever it is that
you don't trust it not to do, so we only consider the relaxation case.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/d652966348c78457c38bf18daf369272a4ebc2c9.1628682049.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
To parallel the sysfs behaviour, merge the new build-time option
for DMA domain strictness into the default domain type choice.
Suggested-by: Joerg Roedel <joro@8bytes.org>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/d04af35b9c0f2a1d39605d7a9b451f5e1f0c7736.1628682049.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When passthrough is enabled, the default strictness policy becomes
irrelevant, since any subsequent runtime override to a DMA domain type
now embodies an explicit choice of strictness as well. Save on noise by
only logging the default policy when it is meaningfully in effect.
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/9d2bcba880c6d517d0751ed8bd4960853030b4d7.1628682049.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The sysfs interface for default domain types exists primarily so users
can choose the performance/security tradeoff relevant to their own
workload. As such, the choice between the policies for DMA domains fits
perfectly as an additional point on that scale - downgrading a
particular device from a strict default to non-strict may be enough to
let it reach the desired level of performance, while still retaining
more peace of mind than with a wide-open identity domain. Now that we've
abstracted non-strict mode as a distinct type of DMA domain, allow it to
be chosen through the user interface as well.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/0e08da5ed4069fd3473cfbadda758ca983becdbf.1628682049.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Eliminate the iommu_get_dma_strict() indirection and pipe the
information through the domain type from the beginning. Besides
the flow simplification this also has several nice side-effects:
- Automatically implies strict mode for untrusted devices by
virtue of their IOMMU_DOMAIN_DMA override.
- Ensures that we only end up using flush queues for drivers
which are aware of them and can actually benefit.
- Allows us to handle flush queue init failure by falling back
to strict mode instead of leaving it to possibly blow up later.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/47083d69155577f1367877b1594921948c366eb3.1628682049.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Promote the difference between strict and non-strict DMA domains from an
internal detail to a distinct domain feature and type, to pave the road
for exposing it through the sysfs default domain interface.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/08cd2afaf6b63c58ad49acec3517c9b32c2bb946.1628682049.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Now that everyone has converged on iommu-dma for IOMMU_DOMAIN_DMA
support, we can abandon the notion of drivers being responsible for the
cookie type, and consolidate all the management into the core code.
CC: Yong Wu <yong.wu@mediatek.com>
CC: Chunyan Zhang <chunyan.zhang@unisoc.com>
CC: Maxime Ripard <mripard@kernel.org>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/46a2c0e7419c7d1d931762dc7b6a69fa082d199a.1628682048.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When two devices with same SID are getting probed concurrently through
iommu_probe_device(), the iommu_domain sometimes is getting allocated more
than once as call to iommu_alloc_default_domain() is not protected for
concurrency. Furthermore, it leads to each device holding a different
iommu_domain pointer, separate IOVA space and only one of the devices'
domain is used for translations from IOMMU. This causes accesses from other
device to fault or see incorrect translations.
Fix this by protecting iommu_alloc_default_domain() call with group->mutex
and let all devices with same SID share same iommu_domain.
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
Link: https://lore.kernel.org/r/1628570641-9127-2-git-send-email-amhetre@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
Convert to ssize_t return code so the return code from __iommu_map()
can be returned all the way down through dma_iommu_map_sg().
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
If probe_device is failing, iommu_group is not initialized because
iommu_group_add_device is not reached, so freeing it will result
in NULL pointer access.
iommu_bus_init
->bus_iommu_probe
->probe_iommu_group in for each:/* return -22 in fail case */
->iommu_probe_device
->__iommu_probe_device /* return -22 here.*/
-> ops->probe_device /* return -22 here.*/
-> iommu_group_get_for_dev
-> ops->device_group
-> iommu_group_add_device //good case
->remove_iommu_group //in fail case, it will remove group
->iommu_release_device
->iommu_group_remove_device // here we don't have group
In my case ops->probe_device (mtk_iommu_probe_device from
mtk_iommu_v1.c) is due to failing fwspec->ops mismatch.
Fixes: d72e31c937 ("iommu: IOMMU Groups")
Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Link: https://lore.kernel.org/r/20210731074737.4573-1-linux@fw-web.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
If people are going to insist on calling iommu_iova_to_phys()
pointlessly and expecting it to work, we can at least do ourselves a
favour by handling those cases in the core code, rather than repeatedly
across an inconsistent handful of drivers.
Since all the existing drivers implement the internal callback, and any
future ones are likely to want to work with iommu-dma which relies on
iova_to_phys a fair bit, we may as well remove that currently-redundant
check as well and consider it mandatory.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/f564f3f6ff731b898ff7a898919bf871c2c7745a.1626354264.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
We only ever now set strict mode enabled in iommu_set_dma_strict(), so
just remove the argument.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1626088340-5838-7-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
First, add build options IOMMU_DEFAULT_{LAZY|STRICT}, so that we have the
opportunity to set {lazy|strict} mode as default at build time. Then put
the two config options in an choice, as they are mutually exclusive.
[jpg: Make choice between strict and lazy only (and not passthrough)]
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1626088340-5838-4-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
As well as the default domain type, it's useful to know whether strict
or lazy for DMA domains, so add this info in a separate print.
The (stict/lazy) mode may be also set via iommu.strict earlyparm, but
this will be processed prior to iommu_subsys_init(), so that print will be
accurate for drivers which don't set the mode via custom means.
For the drivers which set the mode via custom means - AMD and Intel drivers
- they maintain prints to inform a change in policy or that custom cmdline
methods to change policy are deprecated.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1626088340-5838-3-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Since iommu_pgsize can calculate how many pages of the
same size can be mapped/unmapped before the next largest
page size boundary, add support for invoking an IOMMU
driver's map_pages() callback, if it provides one.
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1623850736-389584-9-git-send-email-quic_c_gdjako@quicinc.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Extend iommu_pgsize() to populate an optional 'count' parameter so that
we can direct unmapping operation to the ->unmap_pages callback if it
has been provided by the driver.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1623850736-389584-8-git-send-email-quic_c_gdjako@quicinc.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The 'addr_merge' parameter to iommu_pgsize() is a fabricated address
intended to describe the alignment requirements to consider when
choosing an appropriate page size. On the iommu_map() path, this address
is the logical OR of the virtual and physical addresses.
Subsequent improvements to iommu_pgsize() will need to check the
alignment of the virtual and physical components of 'addr_merge'
independently, so pass them in as separate parameters and reconstruct
'addr_merge' locally.
No functional change.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1623850736-389584-7-git-send-email-quic_c_gdjako@quicinc.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Avoid the potential for shifting values by amounts greater than the
width of their type by using a bitmap to compute page size in
iommu_pgsize().
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1623850736-389584-6-git-send-email-quic_c_gdjako@quicinc.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Function iommu_group_store_type() is the only caller of the static
function iommu_change_dev_def_domain() and has performed
"if (WARN_ON(!group))" detection before calling it. So the one here is
redundant.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210513075815.6382-1-thunder.leizhen@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Rather than have separate opaque setter functions that are easy to
overlook and lead to repetitive boilerplate in drivers, let's pass the
relevant initialisation parameters directly to iommu_device_register().
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/ab001b87c533b6f4db71eb90db6f888953986c36.1617285386.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Use an explicit set_pgtable_quirks method instead that just passes
the actual quirk bitmask instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-20-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Instead make the global iommu_dma_strict paramete in iommu.c canonical by
exporting helpers to get and set it and use those directly in the drivers.
This make sure that the iommu.strict parameter also works for the AMD and
Intel IOMMU drivers on x86. As those default to lazy flushing a new
IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to
represent the default if not overriden by an explicit parameter.
[ported on top of the other iommu_attr changes and added a few small
missing bits]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210401155256.298656-19-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Use an explicit enable_nesting method instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-17-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The geometry information can be trivially queried from the iommu_domain
struture.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-16-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
DOMAIN_ATTR_PAGING is never used.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-15-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The only thing that fsl_pamu_window_enable does for the current caller
is to fill in the prot value in the only dma_window structure, and to
propagate a few values from the iommu_domain_geometry struture into the
dma_window. Remove the dma_window entirely, hardcode the prot value and
otherwise use the iommu_domain_geometry structure instead.
Remove the now unused ->domain_window_enable iommu method.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-7-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The device iommu probe/attach might have failed leaving dev->iommu
to NULL and device drivers may still invoke these functions resulting
in a crash in iommu vendor driver code.
Hence make sure we check that.
Fixes: a3a195929d ("iommu: Add APIs for multiple domains per device")
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20210303173611.520-1-shameerali.kolothum.thodi@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Sleeping while atomic = bad. Let's fix an obvious typo to try to avoid it.
The warning that was seen (on a downstream kernel with the problematic
patch backported):
BUG: sleeping function called from invalid context at mm/page_alloc.c:4726
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9, name: ksoftirqd/0
CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 5.4.93-12508-gc10c93e28e39 #1
Call trace:
dump_backtrace+0x0/0x154
show_stack+0x20/0x2c
dump_stack+0xa0/0xfc
___might_sleep+0x11c/0x12c
__might_sleep+0x50/0x84
__alloc_pages_nodemask+0xf8/0x2bc
__arm_lpae_alloc_pages+0x48/0x1b4
__arm_lpae_map+0x124/0x274
__arm_lpae_map+0x1cc/0x274
arm_lpae_map+0x140/0x170
arm_smmu_map+0x78/0xbc
__iommu_map+0xd4/0x210
_iommu_map+0x4c/0x84
iommu_map_atomic+0x44/0x58
__iommu_dma_map+0x8c/0xc4
iommu_dma_map_page+0xac/0xf0
Fixes: d8c1df02ac ("iommu: Move iotlb_sync_map out from __iommu_map")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Yong Wu <yong.wu@mediatek.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210201170611.1.I64a7b62579287d668d7c89e105dcedf45d641063@changeid
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Currently, because domain attach allows to be deferred from iommu
driver to device driver, and when iommu initializes, the devices
on the bus will be scanned and the default groups will be allocated.
Due to the above changes, some devices could be added to the same
group as below:
[ 3.859417] pci 0000:01:00.0: Adding to iommu group 16
[ 3.864572] pci 0000:01:00.1: Adding to iommu group 16
[ 3.869738] pci 0000:02:00.0: Adding to iommu group 17
[ 3.874892] pci 0000:02:00.1: Adding to iommu group 17
But when attaching these devices, it doesn't allow that a group has
more than one device, otherwise it will return an error. This conflicts
with the deferred attaching. Unfortunately, it has two devices in the
same group for my side, for example:
[ 9.627014] iommu_group_device_count(): device name[0]:0000:01:00.0
[ 9.633545] iommu_group_device_count(): device name[1]:0000:01:00.1
...
[ 10.255609] iommu_group_device_count(): device name[0]:0000:02:00.0
[ 10.262144] iommu_group_device_count(): device name[1]:0000:02:00.1
Finally, which caused the failure of tg3 driver when tg3 driver calls
the dma_alloc_coherent() to allocate coherent memory in the tg3_test_dma().
[ 9.660310] tg3 0000:01:00.0: DMA engine test failed, aborting
[ 9.754085] tg3: probe of 0000:01:00.0 failed with error -12
[ 9.997512] tg3 0000:01:00.1: DMA engine test failed, aborting
[ 10.043053] tg3: probe of 0000:01:00.1 failed with error -12
[ 10.288905] tg3 0000:02:00.0: DMA engine test failed, aborting
[ 10.334070] tg3: probe of 0000:02:00.0 failed with error -12
[ 10.578303] tg3 0000:02:00.1: DMA engine test failed, aborting
[ 10.622629] tg3: probe of 0000:02:00.1 failed with error -12
In addition, the similar situations also occur in other drivers such
as the bnxt_en driver. That can be reproduced easily in kdump kernel
when SME is active.
Let's move the handling currently in iommu_dma_deferred_attach() into
the iommu core code so that it can call the __iommu_attach_device()
directly instead of the iommu_attach_device(). The external interface
iommu_attach_device() is not suitable for handling this situation.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20210126115337.20068-3-lijiang@redhat.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
iotlb_sync_map allow IOMMU drivers tlb sync after completing the whole
mapping. This patch adds iova and size as the parameters in it. then the
IOMMU driver could flush tlb with the whole range once after iova mapping
to improve performance.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210107122909.16317-3-yong.wu@mediatek.com
Signed-off-by: Will Deacon <will@kernel.org>
In the end of __iommu_map, It alway call iotlb_sync_map.
This patch moves iotlb_sync_map out from __iommu_map since it is
unnecessary to call this for each sg segment especially iotlb_sync_map
is flush tlb all currently. Add a little helper _iommu_map for this.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210107122909.16317-2-yong.wu@mediatek.com
Signed-off-by: Will Deacon <will@kernel.org>
Function iommu_dev_has_feature() has never been referenced in the tree,
and there does not appear to be anything coming soon to use it, so delete
it.
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/1609940111-28563-7-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Function iommu_map_sg_atomic() is only referenced in dma-iommu.c, which
can only be built-in, so stop exporting.
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/1609940111-28563-5-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Merge in IOMMU fixes for 5.10 in order to resolve conflicts against the
queue for 5.11.
* for-next/iommu/fixes:
iommu/amd: Set DTE[IntTabLen] to represent 512 IRTEs
iommu/vt-d: Don't read VCCAP register unless it exists
x86/tboot: Don't disable swiotlb when iommu is forced on
iommu: Check return of __iommu_attach_device()
arm-smmu-qcom: Ensure the qcom_scm driver has finished probing
iommu/amd: Enforce 4k mapping for certain IOMMU data structures
MAINTAINERS: Temporarily add myself to the IOMMU entry
iommu/vt-d: Fix compile error with CONFIG_PCI_ATS not set
iommu/vt-d: Avoid panic if iommu init fails in tboot system
iommu/vt-d: Cure VF irqdomain hickup
x86/platform/uv: Fix copied UV5 output archtype
x86/platform/uv: Drop last traces of uv_flush_tlb_others
Miscellaneous IOMMU changes for 5.11. Largely cosmetic, apart from a
change to the way in which identity-mapped domains are configured so
that the requests are now batched and can potentially use larger pages
for the mapping.
* for-next/iommu/misc:
iommu/io-pgtable-arm: Remove unused 'level' parameter from iopte_type() macro
iommu: Defer the early return in arm_(v7s/lpae)_map
iommu: Improve the performance for direct_mapping
iommu: return error code when it can't get group
iommu: Modify the description of iommu_sva_unbind_device
Currently direct_mapping always use the smallest pgsize which is SZ_4K
normally to mapping. This is unnecessary. we could gather the size, and
call iommu_map then, iommu_map could decide how to map better with the
just right pgsize.
>From the original comment, we should take care overlap, otherwise,
iommu_map may return -EEXIST. In this overlap case, we should map the
previous region before overlap firstly. then map the left part.
Each a iommu device will call this direct_mapping when its iommu
initialize, This patch is effective to improve the boot/initialization
time especially while it only needs level 1 mapping.
Signed-off-by: Anan Sun <anan.sun@mediatek.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Link: https://lore.kernel.org/r/20201207093553.8635-1-yong.wu@mediatek.com
Signed-off-by: Will Deacon <will@kernel.org>
Although iommu_group_get() in iommu_probe_device() will always succeed
thanks to __iommu_probe_device() creating the group if it's not present,
it's still worth initialising 'ret' to -ENODEV in case this path is
reachable in the future.
For now, this patch results in no functional change.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20201126133825.3643852-1-yangyingliang@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
"/sys/kernel/iommu_groups/<grp_id>/type" file could be read to find out the
default domain type of an iommu group. The default domain of an iommu group
doesn't change after booting and hence could be read directly. But,
after addding support to dynamically change iommu group default domain, the
above assumption no longer stays valid.
iommu group default domain type could be changed at any time by writing to
"/sys/kernel/iommu_groups/<grp_id>/type". So, take group mutex before
reading iommu group default domain type so that the user wouldn't see stale
values or iommu_group_show_type() doesn't try to derefernce stale pointers.
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Sohil Mehta <sohil.mehta@intel.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20201124130604.2912899-4-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
Presently, the default domain of an iommu group is allocated during boot
time and it cannot be changed later. So, the device would typically be
either in identity (also known as pass_through) mode or the device would be
in DMA mode as long as the machine is up and running. There is no way to
change the default domain type dynamically i.e. after booting, a device
cannot switch between identity mode and DMA mode.
But, assume a use case wherein the user trusts the device and believes that
the OS is secure enough and hence wants *only* this device to bypass IOMMU
(so that it could be high performing) whereas all the other devices to go
through IOMMU (so that the system is protected). Presently, this use case
is not supported. It will be helpful if there is some way to change the
default domain of an iommu group dynamically. Hence, add such support.
A privileged user could request the kernel to change the default domain
type of a iommu group by writing to
"/sys/kernel/iommu_groups/<grp_id>/type" file. Presently, only three values
are supported
1. identity: all the DMA transactions from the device in this group are
*not* translated by the iommu
2. DMA: all the DMA transactions from the device in this group are
translated by the iommu
3. auto: change to the type the device was booted with
Note:
1. Default domain of an iommu group with two or more devices cannot be
changed.
2. The device in the iommu group shouldn't be bound to any driver.
3. The device shouldn't be assigned to user for direct access.
4. The change request will fail if any device in the group has a mandatory
default domain type and the requested one conflicts with that.
Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more
information.
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Sohil Mehta <sohil.mehta@intel.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20201124130604.2912899-3-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
Currently iommu_create_device_direct_mappings() is called
without checking the return of __iommu_attach_device(). This
may result in failures in iommu driver if dev attach returns
error.
Fixes: ce574c27ae ("iommu: Move iommu_group_create_direct_mappings() out of iommu_group_add_device()")
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Link: https://lore.kernel.org/r/20201119165846.34180-1-shameerali.kolothum.thodi@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
The "data->flags" variable is a u64 so if one of the high 32 bits is
set the original code will allow it, but it should be rejected. The
fix is to declare "mask" as a u64 instead of a u32.
Fixes: d90573812e ("iommu/uapi: Handle data and argsz filled by users")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20201103101623.GA1127762@mwanda
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Including:
- ARM-SMMU Updates from Will:
- Continued SVM enablement, where page-table is shared with
CPU
- Groundwork to support integrated SMMU with Adreno GPU
- Allow disabling of MSI-based polling on the kernel
command-line
- Minor driver fixes and cleanups (octal permissions, error
messages, ...)
- Secure Nested Paging Support for AMD IOMMU. The IOMMU will
fault when a device tries DMA on memory owned by a guest. This
needs new fault-types as well as a rewrite of the IOMMU memory
semaphore for command completions.
- Allow broken Intel IOMMUs (wrong address widths reported) to
still be used for interrupt remapping.
- IOMMU UAPI updates for supporting vSVA, where the IOMMU can
access address spaces of processes running in a VM.
- Support for the MT8167 IOMMU in the Mediatek IOMMU driver.
- Device-tree updates for the Renesas driver to support r8a7742.
- Several smaller fixes and cleanups all over the place.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl+Fy9MACgkQK/BELZcB
GuNxtRAA0TdYHXt6XyLWmvRAX/ySZSz6KOneZWWwpsQ9wh2/iv1PtBsrV0ltf+6g
CaX4ROZUVRbV9wPD+7maBRbzxrG3QhfEaaV+K45Q2J/QE1wjkyV8qj1eORWTUUoc
nis4FhGDKk2ER/Gsajy2Hjs4+6i43gdWG/+ghVGaCRo8mCZyoz1/6AyMQyN3deuO
NqWOv9E7hsavZjRs/w/LXG7eSE20cZwtt//kPVJF0r9eQqC6i1eJDQj48iRqJVqd
R0dwBQZaLz++qQptyKebDNlmH/3aAsb+A8nCeS7ZwHqWC1QujTWOUYWpFyPPbOmC
KVsQXzTzRfnVTDECF1Pk5d3yi45KILLU3B4zDJfUJjbL3KDYjuVUvhHF/pcGcjC3
H1LWJqHSAL8sJwHvKhpi0VtQ5SOxXnLO5fGG/CZT/Xb4QyM+mkwkFLdn1TryZTR/
M4XA+QuI96TzY7HQUJdSoEDANxoBef6gPnxdDKOnK1v4hfNsPAl7o8hZkM3w0DK8
GoFZUV+vjBhFcymGcQegSNiea28Hfi+hBe+PPHCmw+tJm47cketD5uP5jJ5NGaUe
MKU/QXWXc6oqeBTQT6ki5zJbJXKttbPa8eEmp+FrMatc9kruvBVhQoMbj7Vd3CA1
dC4zK9Awy7yj24ZhZfnAFx2DboCmBTUI3QKjDt9K5PRZyMeyoP8=
=C0Sg
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- ARM-SMMU Updates from Will:
- Continued SVM enablement, where page-table is shared with CPU
- Groundwork to support integrated SMMU with Adreno GPU
- Allow disabling of MSI-based polling on the kernel command-line
- Minor driver fixes and cleanups (octal permissions, error
messages, ...)
- Secure Nested Paging Support for AMD IOMMU. The IOMMU will fault when
a device tries DMA on memory owned by a guest. This needs new
fault-types as well as a rewrite of the IOMMU memory semaphore for
command completions.
- Allow broken Intel IOMMUs (wrong address widths reported) to still be
used for interrupt remapping.
- IOMMU UAPI updates for supporting vSVA, where the IOMMU can access
address spaces of processes running in a VM.
- Support for the MT8167 IOMMU in the Mediatek IOMMU driver.
- Device-tree updates for the Renesas driver to support r8a7742.
- Several smaller fixes and cleanups all over the place.
* tag 'iommu-updates-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (57 commits)
iommu/vt-d: Gracefully handle DMAR units with no supported address widths
iommu/vt-d: Check UAPI data processed by IOMMU core
iommu/uapi: Handle data and argsz filled by users
iommu/uapi: Rename uapi functions
iommu/uapi: Use named union for user data
iommu/uapi: Add argsz for user filled data
docs: IOMMU user API
iommu/qcom: add missing put_device() call in qcom_iommu_of_xlate()
iommu/arm-smmu-v3: Add SVA device feature
iommu/arm-smmu-v3: Check for SVA features
iommu/arm-smmu-v3: Seize private ASID
iommu/arm-smmu-v3: Share process page tables
iommu/arm-smmu-v3: Move definitions to a header
iommu/io-pgtable-arm: Move some definitions to a header
iommu/arm-smmu-v3: Ensure queue is read after updating prod pointer
iommu/amd: Re-purpose Exclusion range registers to support SNP CWWB
iommu/amd: Add support for RMP_PAGE_FAULT and RMP_HW_ERR
iommu/amd: Use 4K page for completion wait write-back semaphore
iommu/tegra-smmu: Allow to group clients in same swgroup
iommu/tegra-smmu: Fix iova->phys translation
...
IOMMU user APIs are responsible for processing user data. This patch
changes the interface such that user pointers can be passed into IOMMU
code directly. Separate kernel APIs without user pointers are introduced
for in-kernel users of the UAPI functionality.
IOMMU UAPI data has a user filled argsz field which indicates the data
length of the structure. User data is not trusted, argsz must be
validated based on the current kernel data size, mandatory data size,
and feature flags.
User data may also be extended, resulting in possible argsz increase.
Backward compatibility is ensured based on size and flags (or
the functional equivalent fields) checking.
This patch adds sanity checks in the IOMMU layer. In addition to argsz,
reserved/unused fields in padding, flags, and version are also checked.
Details are documented in Documentation/userspace-api/iommu.rst
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/1601051567-54787-6-git-send-email-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
User APIs such as iommu_sva_unbind_gpasid() may also be used by the
kernel. Since we introduced user pointer to the UAPI functions,
in-kernel callers cannot share the same APIs. In-kernel callers are also
trusted, there is no need to validate the data.
We plan to have two flavors of the same API functions, one called
through ioctls, carrying a user pointer and one called directly with
valid IOMMU UAPI structs. To differentiate both, let's rename existing
functions with an iommu_uapi_ prefix.
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/1601051567-54787-5-git-send-email-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
PASID is defined as a few different types in iommu including "int",
"u32", and "unsigned int". To be consistent and to match with uapi
definitions, define PASID and its variations (e.g. max PASID) as "u32".
"u32" is also shorter and a little more explicit than "unsigned int".
No PASID type change in uapi although it defines PASID as __u64 in
some places.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/1600187413-163670-2-git-send-email-fenghua.yu@intel.com
To keep naming consistent we should stick with *iotlb*. This patch
renames a few remaining functions.
Signed-off-by: Tom Murphy <murphyt7@tcd.ie>
Link: https://lore.kernel.org/r/20200817210051.13546-1-murphyt7@tcd.ie
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The sparse tool complains as follows:
drivers/iommu/iommu.c:386:5: warning:
symbol 'iommu_insert_resv_region' was not declared. Should it be static?
drivers/iommu/iommu.c:2182:5: warning:
symbol '__iommu_map' was not declared. Should it be static?
Those functions are not used outside of iommu.c, so mark them static.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Link: https://lore.kernel.org/r/20200713142542.50294-1-weiyongjun1@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Some PCIe devices do not expect a PASID value in PRI Page Responses.
If the "PRG Response PASID Required" bit in the PRI capability is zero,
then the OS should not set the PASID field. Similarly on Arm SMMU,
responses to stall events do not have a PASID.
Currently iommu_page_response() systematically checks that the PASID in
the page response corresponds to the one in the page request. This can't
work with virtualization because a page response coming from a guest OS
won't have a PASID if the passed-through device does not require one.
Add a flag to page requests that declares whether the corresponding
response needs to have a PASID. When this flag isn't set, allow page
responses without PASID.
Reported-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20200616144712.748818-1-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The iommu_group_do_dma_attach() must not attach devices which have
deferred_attach set. Otherwise devices could cause IOMMU faults when
re-initialized in a kdump kernel.
Fixes: deac0b3bed ("iommu: Split off default domain allocation from group assignment")
Reported-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20200604091944.26402-1-joro@8bytes.org
kobject_init_and_add() takes reference even when it fails.
Thus, when kobject_init_and_add() returns an error,
kobject_put() must be called to properly clean up the kobject.
Fixes: d72e31c937 ("iommu: IOMMU Groups")
Signed-off-by: Qiushi Wu <wu000273@umn.edu>
Link: https://lore.kernel.org/r/20200527210020.6522-1-wu000273@umn.edu
Signed-off-by: Joerg Roedel <jroedel@suse.de>
After binding a device to an mm, device drivers currently need to
register a mm_exit handler. This function is called when the mm exits,
to gracefully stop DMA targeting the address space and flush page faults
to the IOMMU.
This is deemed too complex for the MMU release() notifier, which may be
triggered by any mmput() invocation, from about 120 callsites [1]. The
upcoming SVA module has an example of such complexity: the I/O Page
Fault handler would need to call mmput_async() instead of mmput() after
handling an IOPF, to avoid triggering the release() notifier which would
in turn drain the IOPF queue and lock up.
Another concern is the DMA stop function taking too long, up to several
minutes [2]. For some mmput() callers this may disturb other users. For
example, if the OOM killer picks the mm bound to a device as the victim
and that mm's memory is locked, if the release() takes too long, it
might choose additional innocent victims to kill.
To simplify the MMU release notifier, don't forward the notification to
device drivers. Since they don't stop DMA on mm exit anymore, the PASID
lifetime is extended:
(1) The device driver calls bind(). A PASID is allocated.
Here any DMA fault is handled by mm, and on error we don't print
anything to dmesg. Userspace can easily trigger errors by issuing DMA
on unmapped buffers.
(2) exit_mmap(), for example the process took a SIGKILL. This step
doesn't happen during normal operations. Remove the pgd from the
PASID table, since the page tables are about to be freed. Invalidate
the IOTLBs.
Here the device may still perform DMA on the address space. Incoming
transactions are aborted but faults aren't printed out. ATS
Translation Requests return Successful Translation Completions with
R=W=0. PRI Page Requests return with Invalid Request.
(3) The device driver stops DMA, possibly following release of a fd, and
calls unbind(). PASID table is cleared, IOTLB invalidated if
necessary. The page fault queues are drained, and the PASID is
freed.
If DMA for that PASID is still running here, something went seriously
wrong and errors should be reported.
For now remove iommu_sva_ops entirely. We might need to re-introduce
them at some point, for example to notify device drivers of unhandled
IOPF.
[1] https://lore.kernel.org/linux-iommu/20200306174239.GM31668@ziepe.ca/
[2] https://lore.kernel.org/linux-iommu/4d68da96-0ad5-b412-5987-2f7a6aa796c3@amd.com/
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20200423125329.782066-3-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The iommu_alloc_default_domain() function takes a reference to an IOMMU
group without releasing it. This causes the group to never be released,
with undefined side effects.
The function has only one call-site, which takes a group reference on
its own, so to fix this leak, do not take another reference in
iommu_alloc_default_domain() and pass the group as a function parameter
instead.
Fixes: 6e1aa20491 ("iommu: Move default domain allocation to iommu_probe_device()")
Reported-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Cc: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Link: https://lore.kernel.org/r/20200525130122.380-1-joro@8bytes.org
Reference: https://lore.kernel.org/lkml/20200522130145.30067-1-saiprakash.ranjan@codeaurora.org/
The .probe_finalize() call-back of some IOMMU drivers calls into
arm_iommu_attach_device(). This function will call back into the
IOMMU core code, where it tries to take group->mutex again, resulting
in a deadlock.
As there is no reason why .probe_finalize() needs to be called under
that mutex, move it after the lock has been released to fix the
deadlock.
Fixes: deac0b3bed ("iommu: Split off default domain allocation from group assignment")
Reported-by: Yong Wu <yong.wu@mediatek.com>
Tested-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Cc: Yong Wu <yong.wu@mediatek.com>
Link: https://lore.kernel.org/r/20200519132824.15163-1-joro@8bytes.org
The IOMMU core code has support for deferring the attachment of a domain
to a device. This is needed in kdump kernels where the new domain must
not be attached to a device before the device driver takes it over.
When the AMD IOMMU driver got converted to use the dma-iommu
implementation, the deferred attaching got lost. The code in
dma-iommu.c has support for deferred attaching, but it calls into
iommu_attach_device() to actually do it. But iommu_attach_device()
will check if the device should be deferred in it code-path and do
nothing, breaking deferred attachment.
Move the is_deferred_attach() check out of the attach_device path and
into iommu_group_add_device() to make deferred attaching work from the
dma-iommu code.
Fixes: 795bbbb9b6 ("iommu/dma-iommu: Handle deferred devices")
Reported-by: Jerry Snitselaar <jsnitsel@redhat.com>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Jerry Snitselaar <jsnitsel@redhat.com>
Cc: Jerry Snitselaar <jsnitsel@redhat.com>
Cc: Tom Murphy <murphyt7@tcd.ie>
Cc: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20200519130340.14564-1-joro@8bytes.org
After moving iommu_group setup to iommu core code [1][2] and removing
private domain support in vt-d [3], there are no users for functions such
as iommu_request_dm_for_dev(), iommu_request_dma_domain_for_dev() and
request_default_domain_for_dev(). So, remove these functions.
[1] commit dce8d6964e ("iommu/amd: Convert to probe/release_device()
call-backs")
[2] commit e5d1841f18 ("iommu/vt-d: Convert to probe/release_device()
call-backs")
[3] commit 327d5b2fee ("iommu/vt-d: Allow 32bit devices to uses DMA
domain")
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20200513224721.20504-1-sai.praneeth.prakhya@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The host1x bus implemented on Tegra SoCs is primarily an abstraction to
create logical device from multiple platform devices. Since the devices
in such a setup are typically hierarchical, DMA setup still needs to be
done so that DMA masks can be properly inherited, but we don't actually
want to attach the host1x logical devices to any IOMMU. The platform
devices that make up the logical device are responsible for memory bus
transactions, so it is them that will need to be attached to the IOMMU.
Add a check to __iommu_probe_device() that aborts IOMMU setup early for
busses that don't have the IOMMU operations pointer set since they will
cause a crash otherwise.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20200511161000.3853342-1-thierry.reding@gmail.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The function is now only used in IOMMU core code and shouldn't be used
outside of it anyway, so remove the export for it.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20200429133712.31431-35-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Move the calls to dev_iommu_get() and try_module_get() into
__iommu_probe_device(), so that the callers don't have to do it on
their own.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20200429133712.31431-34-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
All drivers are converted to use the probe/release_device()
call-backs, so the add_device/remove_device() pointers are unused and
the code using them can be removed.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20200429133712.31431-33-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a check to the bus_iommu_probe() call-path to make sure it ignores
devices which have already been successfully probed. Then export the
bus_iommu_probe() function so it can be used by IOMMU drivers.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20200429133712.31431-14-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
After the previous changes the iommu group may not have a default
domain when iommu_group_add_device() is called. With no default domain
iommu_group_create_direct_mappings() will do nothing and no direct
mappings will be created.
Rename iommu_group_create_direct_mappings() to
iommu_create_device_direct_mappings() to better reflect that the
function creates direct mappings only for one device and not for all
devices in the group. Then move the call to the places where a default
domain actually exists.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20200429133712.31431-13-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When a bus is initialized with iommu-ops, all devices on the bus are
scanned and iommu-groups are allocated for them, and each groups will
also get a default domain allocated.
Until now this happened as soon as the group was created and the first
device added to it. When other devices with different default domain
requirements were added to the group later on, the default domain was
re-allocated, if possible.
This resulted in some back and forth and unnecessary allocations, so
change the flow to defer default domain allocation until all devices
have been added to their respective IOMMU groups.
The default domains are allocated for newly allocated groups after
each device on the bus is handled and was probed by the IOMMU driver.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20200429133712.31431-12-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This makes it easier to remove to old code-path when all drivers are
converted. As a side effect that it also fixes the error cleanup
path.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20200429133712.31431-11-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This is needed to defer default_domain allocation for new IOMMU groups
until all devices have been added to the group.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20200429133712.31431-10-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Well, not really. The call to iommu_alloc_default_domain() in
iommu_group_get_for_dev() has to stay around as long as there are
IOMMU drivers using the add/remove_device() call-backs instead of
probe/release_device().
Those drivers expect that iommu_group_get_for_dev() returns the device
attached to a group and the group set up with a default domain (and
the device attached to the groups current domain).
But when all drivers are converted this compatability mess can be
removed.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20200429133712.31431-9-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add call-backs to 'struct iommu_ops' as an alternative to the
add_device() and remove_device() call-backs, which will be removed when
all drivers are converted.
The new call-backs will not setup IOMMU groups and domains anymore,
so also add a probe_finalize() call-back where the IOMMU driver can do
per-device setup work which require the device to be set up with a
group and a domain.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20200429133712.31431-8-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Some devices are reqired to use a specific type (identity or dma)
of default domain when they are used with a vendor iommu. When the
system level default domain type is different from it, the vendor
iommu driver has to request a new default domain with
iommu_request_dma_domain_for_dev() and iommu_request_dm_for_dev()
in the add_dev() callback. Unfortunately, these two helpers only
work when the group hasn't been assigned to any other devices,
hence, some vendor iommu driver has to use a private domain if
it fails to request a new default one.
This adds def_domain_type() callback in the iommu_ops, so that
any special requirement of default domain for a device could be
aware by the iommu generic layer.
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
[ jroedel@suse.de: Added iommu_get_def_domain_type() function and use
it to allocate the default domain ]
Co-developed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20200429133712.31431-3-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Move the code out of iommu_group_get_for_dev() into a separate
function.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20200429133712.31431-2-joro@8bytes.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In commit a7ba5c3d00 ("drivers/iommu: Export core IOMMU API symbols to
permit modular drivers") a bunch of iommu symbols were exported, all
with _GPL markings except iommu_group_get_for_dev(). That export should
also be _GPL like the others.
Fixes: a7ba5c3d00 ("drivers/iommu: Export core IOMMU API symbols to permit modular drivers")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: John Garry <john.garry@huawei.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200430120120.2948448-1-gregkh@linuxfoundation.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Move the iommu_fwspec pointer in struct device into struct dev_iommu.
This is a step in the effort to reduce the iommu related pointers in
struct device to one.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Will Deacon <will@kernel.org> # arm-smmu
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20200326150841.10083-7-joro@8bytes.org
Although the 1-element array was a typical pre-C99 way to implement
variable-length structures, and indeed is a fundamental construct in the
APIs of certain other popular platforms, there's no good reason for it
here (and in particular the sizeof() trick is far too "clever" for its
own good). We can just as easily implement iommu_fwspec's preallocation
behaviour using a standard flexible array member, so let's make it look
the way most readers would expect.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This adds the missing teardown step that removes the device link from
the group when the device addition fails.
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Fixes: 797a8b4d76 ("iommu: Handle default domain attach failure")
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Implement a generic function for removing reserved regions. This can be
used by drivers that don't do anything fancy with these regions other
than allocating memory for them.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
'bus_set_iommu()' allows IOMMU drivers to register their ops for a given
bus type. Unfortunately, it then doesn't allow them to be removed, which
is necessary for modular drivers to shutdown cleanly so that they can be
reloaded later on.
Allow 'bus_set_iommu()' to take a NULL 'ops' argument, which clear the
ops pointer for the selected bus_type.
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: John Garry <john.garry@huawei.com> # smmu v3
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
To avoid accidental removal of an active IOMMU driver module, take a
reference to the driver module in 'iommu_probe_device()' immediately
prior to invoking the '->add_device()' callback and hold it until the
after the device has been removed by '->remove_device()'.
Suggested-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: John Garry <john.garry@huawei.com> # smmu v3
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Building IOMMU drivers as modules requires that the core IOMMU API
symbols are exported as GPL symbols.
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: John Garry <john.garry@huawei.com> # smmu v3
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Including:
- Fix kmemleak warning in IOVA code
- Fix compile warnings on ARM32/64 in dma-iommu code due to
dma_mask type mismatches
- Make ISA reserved regions relaxable, so that VFIO can assign
devices which have such regions defined
- Fix mapping errors resulting in IO page-faults in the VT-d
driver
- Make sure direct mappings for a domain are created after the
default domain is updated
- Map ISA reserved regions in the VT-d driver with correct
permissions
- Remove unneeded check for PSI capability in the IOTLB flush
code of the VT-d driver
- Lockdep fix iommu_dma_prepare_msi()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl38rzEACgkQK/BELZcB
GuNPChAAzFdZw0GRphdnsrGog7vJICukFshPifLD8NeJXYzqLzRY89LT/sg4gZrZ
K3Uibg8+0OmWl21JqAzDzXeHYUwDV0Xe/ygjeOdqFn3LY8zCo6UcY4OLCZ1az/XU
om/yjTBgZjgBcUAxkzRJSdditQ2p7ItEa4dXnlpeCV07vQEmS/5x8JkNsea7CG2h
bvBLYW5DpJ1LsJo1WjONHw0DvRkExQsXZaA3zj/6BzfQIXUnkF1Imkgr9gTbXXOl
nGHHRLVtsFqv0U5JWz6fZh4/UgvInq45gZIkvvxQWAM/Kn9wxe2RKDwpQJ1wZ8wc
S5fwSPa5g5k2X73BbEHx7AFYESpgCRFOeG74i9b7/DlzsbM+aTGPZ1/4kLt9fl+u
+AOUV3l9/rqrrmeUEBF7F3kFC9/OL0KIT17xdJfQG1x3RBm9OHy1q0GQH4q8ZbWM
aoWg3Ryc4uO/4Majm/kIjADKR0512LvplXsRXhWpud37szhL6vMJDxBb1zKZJgQ1
j/PFUWgolCvmSG1Q048I9pljrsqfE9pgQhmITQ9VAny6eAaZgT7Y21MbBTyQksem
/O08TWGAFddH4U9pGnQ1ST/q5hcVvnUgzy12A3MOuOYh5tWfAbeZparjQsu7bFhp
uaJudGyXg9Xgrg82XQ2x/Nk710wLRAGG4VKnUz+mBDEzOOmcems=
=VWME
-----END PGP SIGNATURE-----
Merge tag 'iommu-fixes-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Fix kmemleak warning in IOVA code
- Fix compile warnings on ARM32/64 in dma-iommu code due to dma_mask
type mismatches
- Make ISA reserved regions relaxable, so that VFIO can assign devices
which have such regions defined
- Fix mapping errors resulting in IO page-faults in the VT-d driver
- Make sure direct mappings for a domain are created after the default
domain is updated
- Map ISA reserved regions in the VT-d driver with correct permissions
- Remove unneeded check for PSI capability in the IOTLB flush code of
the VT-d driver
- Lockdep fix iommu_dma_prepare_msi()
* tag 'iommu-fixes-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/dma: Relax locking in iommu_dma_prepare_msi()
iommu/vt-d: Remove incorrect PSI capability check
iommu/vt-d: Allocate reserved region for ISA with correct permission
iommu: set group default domain before creating direct mappings
iommu/vt-d: Fix dmar pte read access not set error
iommu/vt-d: Set ISA bridge reserved region as relaxable
iommu/dma: Rationalise types for DMA masks
iommu/iova: Init the struct iova to fix the possible memleak
iommu_group_create_direct_mappings uses group->default_domain, but
right after it is called, request_default_domain_for_dev calls
iommu_domain_free for the default domain, and sets the group default
domain to a different domain. Move the
iommu_group_create_direct_mappings call to after the group default
domain is set, so the direct mappings get associated with that domain.
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: stable@vger.kernel.org
Fixes: 7423e01741 ("iommu: Add API to request DMA domain for device")
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In case the new region gets merged into another one, the nr list node is
freed. Checking its type while completing the merge algorithm leads to
a use-after-free. Use new->type instead.
Fixes: 4dbd258ff6 ("iommu: Revisit iommu_insert_resv_region() implementation")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reported-by: Qian Cai <cai@lca.pw>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Cc: Stable <stable@vger.kernel.org> #v5.3+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Guest shared virtual address (SVA) may require host to shadow guest
PASID tables. Guest PASID can also be allocated from the host via
enlightened interfaces. In this case, guest needs to bind the guest
mm, i.e. cr3 in guest physical address to the actual PASID table in
the host IOMMU. Nesting will be turned on such that guest virtual
address can go through a two level translation:
- 1st level translates GVA to GPA
- 2nd level translates GPA to HPA
This patch introduces APIs to bind guest PASID data to the assigned
device entry in the physical IOMMU. See the diagram below for usage
explanation.
.-------------. .---------------------------.
| vIOMMU | | Guest process mm, FL only |
| | '---------------------------'
.----------------/
| PASID Entry |--- PASID cache flush -
'-------------' |
| | V
| | GP
'-------------'
Guest
------| Shadow |----------------------- GP->HP* ---------
v v |
Host v
.-------------. .----------------------.
| pIOMMU | | Bind FL for GVA-GPA |
| | '----------------------'
.----------------/ |
| PASID Entry | V (Nested xlate)
'----------------\.---------------------.
| | |Set SL to GPA-HPA |
| | '---------------------'
'-------------'
Where:
- FL = First level/stage one page tables
- SL = Second level/stage two page tables
- GP = Guest PASID
- HP = Host PASID
* Conversion needed if non-identity GP-HP mapping option is chosen.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In any virtualization use case, when the first translation stage
is "owned" by the guest OS, the host IOMMU driver has no knowledge
of caching structure updates unless the guest invalidation activities
are trapped by the virtualizer and passed down to the host.
Since the invalidation data can be obtained from user space and will be
written into physical IOMMU, we must allow security check at various
layers. Therefore, generic invalidation data format are proposed here,
model specific IOMMU drivers need to convert them into their own format.
Signed-off-by: Yi L Liu <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a gfp_t parameter to the iommu_ops::map function.
Remove the needless locking in the AMD iommu driver.
The iommu_ops::map function (or the iommu_map function which calls it)
was always supposed to be sleepable (according to Joerg's comment in
this thread: https://lore.kernel.org/patchwork/patch/977520/ ) and so
should probably have had a "might_sleep()" since it was written. However
currently the dma-iommu api can call iommu_map in an atomic context,
which it shouldn't do. This doesn't cause any problems because any iommu
driver which uses the dma-iommu api uses gfp_atomic in it's
iommu_ops::map function. But doing this wastes the memory allocators
atomic pools.
Signed-off-by: Tom Murphy <murphyt7@tcd.ie>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Switch to the generic function mem_encrypt_active() because
sme_active() is x86 specific and can't be called from
generic code on other platforms than x86.
Fixes: 2cc13bb4f5 ("iommu: Disable passthrough mode when SME is active")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
These comments are wrong. request_default_domain_for_dev doesn't just
handle direct mapped domains.
Signed-off-by: Tom Murphy <murphyt7@tcd.ie>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Current implementation is recursive and in case of allocation
failure the existing @regions list is altered. A non recursive
version looks better for maintainability and simplifies the
error handling. We use a separate stack for overlapping segment
merging. The elements are sorted by start address and then by
type, if their start address match.
Note this new implementation may change the region order of
appearance in /sys/kernel/iommu_groups/<n>/reserved_regions
files but this order has never been documented, see
commit bc7d12b91b ("iommu: Implement reserved_regions
iommu-group sysfs file").
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Using Passthrough mode when SME is active causes certain
devices to use the SWIOTLB bounce buffer. The bounce buffer
code has an upper limit of 256kb for the size of DMA
allocations, which is too small for certain devices and
causes them to fail.
With this patch we enable IOMMU by default when SME is
active in the system, making the default configuration work
for more systems than it does now.
Users that don't want IOMMUs to be enabled still can disable
them with kernel parameters.
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Set the default domain-type at runtime, not at compile-time.
This keeps default domain type setting in one place when we
have to change it at runtime.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
There are functions now to set the default domain type which
take care of updating other necessary state. Don't open-code
it in iommu_set_def_domain_type() and use those functions
instead.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a couple of functions to allow changing the default
domain type from architecture code and a function for iommu
drivers to request whether the default domain is
passthrough.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Introduce an extensible concept to remember when certain
configuration settings for the IOMMU code have been set on
the kernel command line.
This will be used later to prevent overwriting these
settings with other defaults.
Signed-off-by: Joerg Roedel <jroedel@suse.de>