When invalidating the ATC for an PCIe endpoint using ATS, we must take
care to complete invalidation of the main SMMU TLBs beforehand, otherwise
the device could immediately repopulate its ATC with stale translations.
Hooking the ATC invalidation into ->unmap() as we currently do does the
exact opposite: it ensures that the ATC is invalidated *before* the
main TLBs, which is bogus.
Move ATC invalidation into the actual (leaf) invalidation routines so
that it is always called after completing main TLB invalidation.
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
To prevent any potential issues arising from speculative Address
Translation Requests from an ATS-enabled PCIe endpoint, rework our ATS
enabling/disabling logic so that we enable ATS at the SMMU before we
enable it at the endpoint, and disable things in the opposite order.
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Calling arm_smmu_tlb_inv_range() with a size of zero, perhaps due to
an empty 'iommu_iotlb_gather' structure, should be a NOP. Elide the
CMD_SYNC when there is no invalidation to be performed.
Signed-off-by: Will Deacon <will@kernel.org>
There's really no need for this to be a bitfield, particularly as we
don't have bitwise addressing on arm64.
Signed-off-by: Will Deacon <will@kernel.org>
Detecting the ATS capability of the SMMU at probe time introduces a
spinlock into the ->unmap() fast path, even when ATS is not actually
in use. Furthermore, the ATC invalidation that exists is broken, as it
occurs before invalidation of the main SMMU TLB which leaves a window
where the ATC can be repopulated with stale entries.
Given that ATS is both a new feature and a specialist sport, disable it
for now whilst we fix it properly in subsequent patches. Since PRI
requires ATS, disable that too.
Cc: <stable@vger.kernel.org>
Fixes: 9ce27afc08 ("iommu/arm-smmu-v3: Add support for PCI ATS")
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
It turns out that we've always relied on some subtle ordering guarantees
when inserting commands into the SMMUv3 command queue. With the recent
changes to elide locking when possible, these guarantees become more
subtle and even more important.
Add a comment documented the barrier semantics of command insertion so
that we don't have to derive the behaviour from scratch each time it
comes up on the list.
Signed-off-by: Will Deacon <will@kernel.org>
Update the iommu_iotlb_gather structure passed to ->tlb_add_page() and
use this information to defer all TLB invalidation until ->iotlb_sync().
This drastically reduces contention on the command queue, since we can
insert our commands in batches rather than one-by-one.
Tested-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>
Signed-off-by: Will Deacon <will@kernel.org>
The SMMU command queue is a bottleneck in large systems, thanks to the
spin_lock which serialises accesses from all CPUs to the single queue
supported by the hardware.
Attempt to improve this situation by moving to a new algorithm for
inserting commands into the queue, which is lock-free on the fast-path.
Tested-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>
Signed-off-by: Will Deacon <will@kernel.org>
In preparation for rewriting the command queue insertion code to use a
new algorithm, rework many of our queue macro accessors and manipulation
functions so that they operate on the arm_smmu_ll_queue structure where
possible. This will allow us to call these helpers on local variables
without having to construct a full-blown arm_smmu_queue on the stack.
No functional change.
Tested-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>
Signed-off-by: Will Deacon <will@kernel.org>
In preparation for rewriting the command queue insertion code to use a
new algorithm, introduce a new arm_smmu_ll_queue structure which contains
only the information necessary to perform queue arithmetic for a queue
and will later be extended so that we can perform complex atomic
manipulation on some of the fields.
No functional change.
Tested-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>
Signed-off-by: Will Deacon <will@kernel.org>
The Q_OVF macro doesn't need to access the arm_smmu_queue structure, so
drop the unused macro argument.
No functional change.
Tested-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>
Signed-off-by: Will Deacon <will@kernel.org>
In preparation for rewriting the command queue insertion code to use a
new algorithm, separate the software and hardware views of the prod and
cons indexes so that manipulating the software state doesn't
automatically update the hardware state at the same time.
No functional change.
Tested-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>
Signed-off-by: Will Deacon <will@kernel.org>
With all the pieces in place, we can finally propagate the
iommu_iotlb_gather structure from the call to unmap() down to the IOMMU
drivers' implementation of ->tlb_add_page(). Currently everybody ignores
it, but the machinery is now there to defer invalidation.
Signed-off-by: Will Deacon <will@kernel.org>
Update the io-pgtable ->unmap() function to take an iommu_iotlb_gather
pointer as an argument, and update the callers as appropriate.
Signed-off-by: Will Deacon <will@kernel.org>
The ->tlb_add_flush() callback in the io-pgtable API now looks a bit
silly:
- It takes a size and a granule, which are always the same
- It takes a 'bool leaf', which is always true
- It only ever flushes a single page
With that in mind, replace it with an optional ->tlb_add_page() callback
that drops the useless parameters.
Signed-off-by: Will Deacon <will@kernel.org>
Now that all IOMMU drivers using the io-pgtable API implement the
->tlb_flush_walk() and ->tlb_flush_leaf() callbacks, we can use them in
the io-pgtable code instead of ->tlb_add_flush() immediately followed by
->tlb_sync().
Signed-off-by: Will Deacon <will@kernel.org>
Hook up ->tlb_flush_walk() and ->tlb_flush_leaf() in drivers using the
io-pgtable API so that we can start making use of them in the page-table
code. For now, they can just wrap the implementations of ->tlb_add_flush
and ->tlb_sync pending future optimisation in each driver.
Signed-off-by: Will Deacon <will@kernel.org>
To allow IOMMU drivers to batch up TLB flushing operations and postpone
them until ->iotlb_sync() is called, extend the prototypes for the
->unmap() and ->iotlb_sync() IOMMU ops callbacks to take a pointer to
the current iommu_iotlb_gather structure.
All affected IOMMU drivers are updated, but there should be no
functional change since the extra parameter is ignored for now.
Signed-off-by: Will Deacon <will@kernel.org>
To permit batching of TLB flushes across multiple calls to the IOMMU
driver's ->unmap() implementation, introduce a new structure for
tracking the address range to be flushed and the granularity at which
the flushing is required.
This is hooked into the IOMMU API and its caller are updated to make use
of the new structure. Subsequent patches will plumb this into the IOMMU
drivers as well, but for now the gathering information is ignored.
Signed-off-by: Will Deacon <will@kernel.org>
In preparation for TLB flush gathering in the IOMMU API, rename the
iommu_gather_ops structure in io-pgtable to iommu_flush_ops, which
better describes its purpose and avoids the potential for confusion
between different levels of the API.
$ find linux/ -type f -name '*.[ch]' | xargs sed -i 's/gather_ops/flush_ops/g'
Signed-off-by: Will Deacon <will@kernel.org>
Commit b6b65ca20b ("iommu/io-pgtable-arm: Add support for non-strict
mode") added an unconditional call to io_pgtable_tlb_sync() immediately
after the case where we replace a block entry with a table entry during
an unmap() call. This is redundant, since the IOMMU API will call
iommu_tlb_sync() on this path and the patch in question mentions this:
| To save having to reason about it too much, make sure the invalidation
| in arm_lpae_split_blk_unmap() just performs its own unconditional sync
| to minimise the window in which we're technically violating the break-
| before-make requirement on a live mapping. This might work out redundant
| with an outer-level sync for strict unmaps, but we'll never be splitting
| blocks on a DMA fastpath anyway.
However, this sync gets in the way of deferred TLB invalidation for leaf
entries and is at best a questionable, unproven hack. Remove it.
Signed-off-by: Will Deacon <will@kernel.org>
Commit add02cfdc9 ("iommu: Introduce Interface for IOMMU TLB Flushing")
added three new TLB flushing operations to the IOMMU API so that the
underlying driver operations can be batched when unmapping large regions
of IO virtual address space.
However, the ->iotlb_range_add() callback has not been implemented by
any IOMMU drivers (amd_iommu.c implements it as an empty function, which
incurs the overhead of an indirect branch). Instead, drivers either flush
the entire IOTLB in the ->iotlb_sync() callback or perform the necessary
invalidation during ->unmap().
Attempting to implement ->iotlb_range_add() for arm-smmu-v3.c revealed
two major issues:
1. The page size used to map the region in the page-table is not known,
and so it is not generally possible to issue TLB flushes in the most
efficient manner.
2. The only mutable state passed to the callback is a pointer to the
iommu_domain, which can be accessed concurrently and therefore
requires expensive synchronisation to keep track of the outstanding
flushes.
Remove the callback entirely in preparation for extending ->unmap() and
->iotlb_sync() to update a token on the caller's stack.
Signed-off-by: Will Deacon <will@kernel.org>
The commit b3aa14f022 ("iommu: remove the mapping_error dma_map_ops
method") incorrectly changed the checking from dma_ops_alloc_iova() in
map_sg() causes a crash under memory pressure as dma_ops_alloc_iova()
never return DMA_MAPPING_ERROR on failure but 0, so the error handling
is all wrong.
kernel BUG at drivers/iommu/iova.c:801!
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:iova_magazine_free_pfns+0x7d/0xc0
Call Trace:
free_cpu_cached_iovas+0xbd/0x150
alloc_iova_fast+0x8c/0xba
dma_ops_alloc_iova.isra.6+0x65/0xa0
map_sg+0x8c/0x2a0
scsi_dma_map+0xc6/0x160
pqi_aio_submit_io+0x1f6/0x440 [smartpqi]
pqi_scsi_queue_command+0x90c/0xdd0 [smartpqi]
scsi_queue_rq+0x79c/0x1200
blk_mq_dispatch_rq_list+0x4dc/0xb70
blk_mq_sched_dispatch_requests+0x249/0x310
__blk_mq_run_hw_queue+0x128/0x200
blk_mq_run_work_fn+0x27/0x30
process_one_work+0x522/0xa10
worker_thread+0x63/0x5b0
kthread+0x1d2/0x1f0
ret_from_fork+0x22/0x40
Fixes: b3aa14f022 ("iommu: remove the mapping_error dma_map_ops method")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
new iommu device
vhost guest memory access using vmap (just meta-data for now)
minor fixes
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Note: due to code driver changes the driver-core tree, the following
patch is needed when merging tree with commit 92ce7e83b4
("driver_find_device: Unify the match function with
class_find_device()") in the driver-core tree:
From: Nathan Chancellor <natechancellor@gmail.com>
Subject: [PATCH] iommu/virtio: Constify data parameter in viommu_match_node
After commit 92ce7e83b4 ("driver_find_device: Unify the match
function with class_find_device()") in the driver-core tree.
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/iommu/virtio-iommu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 4620dd221ffd..433f4d2ee956 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -839,7 +839,7 @@ static void viommu_put_resv_regions(struct device *dev, struct list_head *head)
static struct iommu_ops viommu_ops;
static struct virtio_driver virtio_iommu_drv;
-static int viommu_match_node(struct device *dev, void *data)
+static int viommu_match_node(struct device *dev, const void *data)
{
return dev->parent->fwnode == data;
}
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJdJ5qUAAoJECgfDbjSjVRpQs0H/2qWcIG1zjGKyh9KWrfgOusG
/QIqeP50d7SC6oqdyd00tzmExqO1xdGLPFzYixdOsU817te1gHBP4Rfmzo01jZRd
CUzZNnZQ2JRsDshiA6G2ui+wn1/a/cB3RPN4rT1mquDYS53QmsRGDQDnpp84TXMV
aocB8TS6halbRzKMq3VmaWHIvzNXnt4dwQR542+PyeLLn9bUx2QwWj2ON3QwxixK
dVRZow3GwLGBhKTA/Z1Z/Bta4fEfOKjUGP2XWgvL6zOr+nZR4eQ8w5WXVJYzR+d6
1JCfqTxleweT2k6Tu5VwtTNlQkxn/XvQAeisppOiEE6NnPjubyI9wMQIvL7bkpo=
=uJbC
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio, vhost updates from Michael Tsirkin:
"Fixes, features, performance:
- new iommu device
- vhost guest memory access using vmap (just meta-data for now)
- minor fixes"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio-mmio: add error check for platform_get_irq
scsi: virtio_scsi: Use struct_size() helper
iommu/virtio: Add event queue
iommu/virtio: Add probe request
iommu: Add virtio-iommu driver
PCI: OF: Initialize dev->fwnode appropriately
of: Allow the iommu-map property to omit untranslated devices
dt-bindings: virtio: Add virtio-pci-iommu node
dt-bindings: virtio-mmio: Add IOMMU description
vhost: fix clang build warning
vhost: access vq metadata through kernel virtual address
vhost: factor out setting vring addr and num
vhost: introduce helpers to get the size of metadata area
vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch()
vhost: fine grain userspace memory accessors
vhost: generalize adding used elem
- move the USB special case that bounced DMA through a device
bar into the USB code instead of handling it in the common
DMA code (Laurentiu Tudor and Fredrik Noring)
- don't dip into the global CMA pool for single page allocations
(Nicolin Chen)
- fix a crash when allocating memory for the atomic pool failed
during boot (Florian Fainelli)
- move support for MIPS-style uncached segments to the common
code and use that for MIPS and nios2 (me)
- make support for DMA_ATTR_NON_CONSISTENT and
DMA_ATTR_NO_KERNEL_MAPPING generic (me)
- convert nds32 to the generic remapping allocator (me)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl0nPqgLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYNj2hAAxIv2O3wv6V5xhzWwOVo8e/xW1ZLlGAF0/z92u0do
32Tm8jkdAGjZDnyxam7qisMSIjCNykpauQzVVxyUNBRSsn1V5t7KSaH3/OXCOVcr
x2VWBirxGO2BbRseaCBjIcA/2qna+VIDGFcNXCtf6rM00YUK6qaJzkMwBKQAeYcM
uJMJkaf8qaW4hygLJP8axXiGFdIJyFNLAlJ+ok6kYsJHHJNceOp0bo3CDa2mJBK9
IhraK2zVkyE5EQkQM5cE/Kw1ppPelUKUkHwjgM4wpz2b18WbLu11nKP0hmUcvKRQ
heY8xWiKxN0QTgS03ou7EVylyrSAE4dIKgzuA4VO32QCGsWypcAg4iU6s5TX6p9g
tZEW2ckE6wbmRdQPyKoDpZg299/eQjRHc4MAA1yinT8tFMokw2tk8Fq1FWyltwL1
8EiP5oNs2qUNvNgqUresl6/f6YOacFi1Q6IhgBVj6d6lyhMhlsHfW4w1XA1siv/I
6l4qJbLohYab6hY7i+mBOd8iG/KrAlr4P6admnv2jDchswbb5t2j+ABE9xv++PFi
u1HFqMlxqdWQaXGca2UeCUxUjkwO9N+kHpP+VRz+6D2b64dtCWSu8CN23sYXm2tO
ubWIlrQQZPhhMkoFg7XqKSTacd+ut+SXN9Nxsyv548ETV0l1xbiLRHIbhyoIESD5
RAI=
=01Fr
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-5.3' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- move the USB special case that bounced DMA through a device bar into
the USB code instead of handling it in the common DMA code (Laurentiu
Tudor and Fredrik Noring)
- don't dip into the global CMA pool for single page allocations
(Nicolin Chen)
- fix a crash when allocating memory for the atomic pool failed during
boot (Florian Fainelli)
- move support for MIPS-style uncached segments to the common code and
use that for MIPS and nios2 (me)
- make support for DMA_ATTR_NON_CONSISTENT and
DMA_ATTR_NO_KERNEL_MAPPING generic (me)
- convert nds32 to the generic remapping allocator (me)
* tag 'dma-mapping-5.3' of git://git.infradead.org/users/hch/dma-mapping: (29 commits)
dma-mapping: mark dma_alloc_need_uncached as __always_inline
MIPS: only select ARCH_HAS_UNCACHED_SEGMENT for non-coherent platforms
usb: host: Fix excessive alignment restriction for local memory allocations
lib/genalloc.c: Add algorithm, align and zeroed family of DMA allocators
nios2: use the generic uncached segment support in dma-direct
nds32: use the generic remapping allocator for coherent DMA allocations
arc: use the generic remapping allocator for coherent DMA allocations
dma-direct: handle DMA_ATTR_NO_KERNEL_MAPPING in common code
dma-direct: handle DMA_ATTR_NON_CONSISTENT in common code
dma-mapping: add a dma_alloc_need_uncached helper
openrisc: remove the partial DMA_ATTR_NON_CONSISTENT support
arc: remove the partial DMA_ATTR_NON_CONSISTENT support
arm-nommu: remove the partial DMA_ATTR_NON_CONSISTENT support
ARM: dma-mapping: allow larger DMA mask than supported
dma-mapping: truncate dma masks to what dma_addr_t can hold
iommu/dma: Apply dma_{alloc,free}_contiguous functions
dma-remap: Avoid de-referencing NULL atomic_pool
MIPS: use the generic uncached segment support in dma-direct
dma-direct: provide generic support for uncached kernel segments
au1100fb: fix DMA API abuse
...
Here is the "big" driver core and debugfs changes for 5.3-rc1
It's a lot of different patches, all across the tree due to some api
changes and lots of debugfs cleanups. Because of this, there is going
to be some merge issues with your tree at the moment, I'll follow up
with the expected resolutions to make it easier for you.
Other than the debugfs cleanups, in this set of changes we have:
- bus iteration function cleanups (will cause build warnings
with s390 and coresight drivers in your tree)
- scripts/get_abi.pl tool to display and parse Documentation/ABI
entries in a simple way
- cleanups to Documenatation/ABI/ entries to make them parse
easier due to typos and other minor things
- default_attrs use for some ktype users
- driver model documentation file conversions to .rst
- compressed firmware file loading
- deferred probe fixes
All of these have been in linux-next for a while, with a bunch of merge
issues that Stephen has been patient with me for. Other than the merge
issues, functionality is working properly in linux-next :)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXSgpnQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykcwgCfS30OR4JmwZydWGJ7zK/cHqk+KjsAnjOxjC1K
LpRyb3zX29oChFaZkc5a
=XrEZ
-----END PGP SIGNATURE-----
Merge tag 'driver-core-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core and debugfs updates from Greg KH:
"Here is the "big" driver core and debugfs changes for 5.3-rc1
It's a lot of different patches, all across the tree due to some api
changes and lots of debugfs cleanups.
Other than the debugfs cleanups, in this set of changes we have:
- bus iteration function cleanups
- scripts/get_abi.pl tool to display and parse Documentation/ABI
entries in a simple way
- cleanups to Documenatation/ABI/ entries to make them parse easier
due to typos and other minor things
- default_attrs use for some ktype users
- driver model documentation file conversions to .rst
- compressed firmware file loading
- deferred probe fixes
All of these have been in linux-next for a while, with a bunch of
merge issues that Stephen has been patient with me for"
* tag 'driver-core-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (102 commits)
debugfs: make error message a bit more verbose
orangefs: fix build warning from debugfs cleanup patch
ubifs: fix build warning after debugfs cleanup patch
driver: core: Allow subsystems to continue deferring probe
drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDT
arch_topology: Remove error messages on out-of-memory conditions
lib: notifier-error-inject: no need to check return value of debugfs_create functions
swiotlb: no need to check return value of debugfs_create functions
ceph: no need to check return value of debugfs_create functions
sunrpc: no need to check return value of debugfs_create functions
ubifs: no need to check return value of debugfs_create functions
orangefs: no need to check return value of debugfs_create functions
nfsd: no need to check return value of debugfs_create functions
lib: 842: no need to check return value of debugfs_create functions
debugfs: provide pr_fmt() macro
debugfs: log errors when something goes wrong
drivers: s390/cio: Fix compilation warning about const qualifiers
drivers: Add generic helper to match by of_node
driver_find_device: Unify the match function with class_find_device()
bus_find_device: Unify the match callback with class_find_device
...
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: Joerg Roedel <joro@8bytes.org>
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
We make the invalid assumption in arm_smmu_detach_dev() that the ATC is
clear after calling pci_disable_ats(). For one thing, only enabling the
PCIe ATS capability constitutes an implicit invalidation event, so the
comment was wrong. More importantly, the ATS capability isn't necessarily
disabled by pci_disable_ats() in a PF, if the associated VFs have ATS
enabled. Explicitly invalidate all ATC entries in arm_smmu_detach_dev().
The endpoint cannot form new ATC entries because STE.EATS is clear.
Fixes: 9ce27afc08 ("iommu/arm-smmu-v3: Add support for PCI ATS")
Reported-by: Manoj Kumar <Manoj.Kumar3@arm.com>
Reported-by: Robin Murphy <Robin.Murphy@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When compiling a kernel without support for CMA, CONFIG_CMA_ALIGNMENT
is not defined which results in the following build failure:
In file included from ./include/linux/list.h:9:0
from ./include/linux/kobject.h:19,
from ./include/linux/of.h:17
from ./include/linux/irqdomain.h:35,
from ./include/linux/acpi.h:13,
from drivers/iommu/arm-smmu-v3.c:12:
drivers/iommu/arm-smmu-v3.c: In function ‘arm_smmu_device_hw_probe’:
drivers/iommu/arm-smmu-v3.c:194:40: error: ‘CONFIG_CMA_ALIGNMENT’ undeclared (first use in this function)
#define Q_MAX_SZ_SHIFT (PAGE_SHIFT + CONFIG_CMA_ALIGNMENT)
Fix the breakage by capping the maximum queue size based on MAX_ORDER
when CMA is not enabled.
Reported-by: Zhangshaokun <zhangshaokun@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Linux IRQ number virq is not used in IRTE allocation. Remove it.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
check if there is a not-present cache present and flush it if there is.
Signed-off-by: Tom Murphy <murphyt7@tcd.ie>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When amd_iommu=off was specified on the command line, free_X_resources
functions were called immediately after early_amd_iommu_init. They were
then called again when amd_iommu_init also failed (as expected).
Instead, call them only once: at the end of state_next() whenever
there's an error. These functions should be safe to call any time and
any number of times. However, since state_next is never called again in
an error state, the cleanup will only ever be run once.
This also ensures that cleanup code is run as soon as possible after an
error is detected rather than waiting for amd_iommu_init() to be called.
Signed-off-by: Kevin Mitchell <kevmitch@arista.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The fallback to the GART driver in the case amd_iommu doesn't work was
executed in a function called free_iommu_resources, which didn't really
make sense. This was even being called twice if amd_iommu=off was
specified on the command line.
The only complication is that it needs to be verified that amd_iommu has
fully relinquished control by calling free_iommu_resources and emptying
the amd_iommu_list.
Signed-off-by: Kevin Mitchell <kevmitch@arista.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Make it safe to call iommu_disable during early init error conditions
before mmio_base is set, but after the struct amd_iommu has been added
to the amd_iommu_list. For example, this happens if firmware fails to
fill in mmio_phys in the ACPI table leading to a NULL pointer
dereference in iommu_feature_disable.
Fixes: 2c0ae1720c ('iommu/amd: Convert iommu initialization to state machine')
Signed-off-by: Kevin Mitchell <kevmitch@arista.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Describe the memory related to page table walks as non-cacheable for
iommu instances that are not DMA coherent.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
[will: Use cfg->coherent_walk, fix arm-v7s, ensure outer-shareable for NC]
Signed-off-by: Will Deacon <will@kernel.org>
IO_PGTABLE_QUIRK_NO_DMA is a bit of a misnomer, since it's really just
an indication of whether or not the page-table walker for the IOMMU is
coherent with the CPU caches. Since cache coherency is more than just a
quirk, replace the flag with its own field in the io_pgtable_cfg
structure.
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
The driver_find_device() accepts a match function pointer to
filter the devices for lookup, similar to bus/class_find_device().
However, there is a minor difference in the prototype for the
match parameter for driver_find_device() with the now unified
version accepted by {bus/class}_find_device(), where it doesn't
accept a "const" qualifier for the data argument. This prevents
us from reusing the generic match functions for driver_find_device().
For this reason, change the prototype of the driver_find_device() to
make the "match" parameter in line with {bus/class}_find_device()
and adjust its callers to use the const qualifier. Also, we could
now promote the "data" parameter to const as we pass it down
as a const parameter to the match functions.
Cc: Corey Minyard <minyard@acm.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: Sebastian Ott <sebott@linux.ibm.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Nehal Shah <nehal-bakulchandra.shah@amd.com>
Cc: Shyam Sundar S K <shyam-sundar.s-k@amd.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
- Revert a commit from the previous pile of fixes which causes
new lockdep splats. It is better to revert it for now and work
on a better and more well tested fix.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl0Og/QACgkQK/BELZcB
GuOkZhAAqzTHRndHUzXpSr7L+MrioQI3pmPKuyK72pxQyhuNlwRNn84c8pUxyQe/
0I6xxvPRu/763BepA0iHW4wKuFtuM299y95STxh7pcEV9+LSTdsrl7TRDxebDG+A
I7ZJoPlDSrZfnAdlzXAXF1GDmNS+6tM3zRYaH/DSBk5EQhmYLZ6qGiFfrY3DZ7A4
Y6MhP4OK0bsTE6nL8MFvxGXQA00CdiYIwxBbdCy0uYyOMIrGRE6/Wu+n6Ld1f3PM
bXuJrj3ZV4Wm1emolG5CWACtJcwvvC+/gn+qQ23mTQvSe4cwK97lUgZ7dS3sTpfr
ZZFAyzTKAwLN7mA3n/5vu0HI6ZwfipE1izkZhTqHzR/Q+Gy0y0afhGuJOWY5N7Y7
2AGxp78e3cuIMN8VChHjSlpRvg/EUA9xgX1oylX3lR/YJLW62V1QXmUiHU66KTuo
QESUSw6IPedqsoLmAJeaP7a5hC+MdliliQEJKOhbluyLfHnMZMo0Ao0jCn/Qtpqr
S+PfiroXKWaP6jsRKPD6p9mE4g4S01pREGnfTULH5RKqBYu8LjPjV++xWNxoy66Q
bhoxkIX8+5x9eUlX9zKIm9faqC8P4VbFdu7P6E/wwzeQrRlMYHpWCyBHhmSEj1Xh
loLvJe5mQv9l+eKWWh4CUngripM1tSnAONWBRsnz0Bz1LoraWqQ=
=JRtp
-----END PGP SIGNATURE-----
Merge tag 'iommu-fix-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fix from Joerg Roedel:
"Revert a commit from the previous pile of fixes which causes new
lockdep splats. It is better to revert it for now and work on a better
and more well tested fix"
* tag 'iommu-fix-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock"
This reverts commit 7560cc3ca7.
With 5.2.0-rc5 I can easily trigger this with lockdep and iommu=pt:
======================================================
WARNING: possible circular locking dependency detected
5.2.0-rc5 #78 Not tainted
------------------------------------------------------
swapper/0/1 is trying to acquire lock:
00000000ea2b3beb (&(&iommu->lock)->rlock){+.+.}, at: domain_context_mapping_one+0xa5/0x4e0
but task is already holding lock:
00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (device_domain_lock){....}:
_raw_spin_lock_irqsave+0x3c/0x50
dmar_insert_one_dev_info+0xbb/0x510
domain_add_dev_info+0x50/0x90
dev_prepare_static_identity_mapping+0x30/0x68
intel_iommu_init+0xddd/0x1422
pci_iommu_init+0x16/0x3f
do_one_initcall+0x5d/0x2b4
kernel_init_freeable+0x218/0x2c1
kernel_init+0xa/0x100
ret_from_fork+0x3a/0x50
-> #0 (&(&iommu->lock)->rlock){+.+.}:
lock_acquire+0x9e/0x170
_raw_spin_lock+0x25/0x30
domain_context_mapping_one+0xa5/0x4e0
pci_for_each_dma_alias+0x30/0x140
dmar_insert_one_dev_info+0x3b2/0x510
domain_add_dev_info+0x50/0x90
dev_prepare_static_identity_mapping+0x30/0x68
intel_iommu_init+0xddd/0x1422
pci_iommu_init+0x16/0x3f
do_one_initcall+0x5d/0x2b4
kernel_init_freeable+0x218/0x2c1
kernel_init+0xa/0x100
ret_from_fork+0x3a/0x50
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(device_domain_lock);
lock(&(&iommu->lock)->rlock);
lock(device_domain_lock);
lock(&(&iommu->lock)->rlock);
*** DEADLOCK ***
2 locks held by swapper/0/1:
#0: 00000000033eb13d (dmar_global_lock){++++}, at: intel_iommu_init+0x1e0/0x1422
#1: 00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0
stack backtrace:
CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5 #78
Hardware name: LENOVO 20KGS35G01/20KGS35G01, BIOS N23ET50W (1.25 ) 06/25/2018
Call Trace:
dump_stack+0x85/0xc0
print_circular_bug.cold.57+0x15c/0x195
__lock_acquire+0x152a/0x1710
lock_acquire+0x9e/0x170
? domain_context_mapping_one+0xa5/0x4e0
_raw_spin_lock+0x25/0x30
? domain_context_mapping_one+0xa5/0x4e0
domain_context_mapping_one+0xa5/0x4e0
? domain_context_mapping_one+0x4e0/0x4e0
pci_for_each_dma_alias+0x30/0x140
dmar_insert_one_dev_info+0x3b2/0x510
domain_add_dev_info+0x50/0x90
dev_prepare_static_identity_mapping+0x30/0x68
intel_iommu_init+0xddd/0x1422
? printk+0x58/0x6f
? lockdep_hardirqs_on+0xf0/0x180
? do_early_param+0x8e/0x8e
? e820__memblock_setup+0x63/0x63
pci_iommu_init+0x16/0x3f
do_one_initcall+0x5d/0x2b4
? do_early_param+0x8e/0x8e
? rcu_read_lock_sched_held+0x55/0x60
? do_early_param+0x8e/0x8e
kernel_init_freeable+0x218/0x2c1
? rest_init+0x230/0x230
kernel_init+0xa/0x100
ret_from_fork+0x3a/0x50
domain_context_mapping_one() is taking device_domain_lock first then
iommu lock, while dmar_insert_one_dev_info() is doing the reverse.
That should be introduced by commit:
7560cc3ca7 ("iommu/vt-d: Fix lock inversion between iommu->lock and
device_domain_lock", 2019-05-27)
So far I still cannot figure out how the previous deadlock was
triggered (I cannot find iommu lock taken before calling of
iommu_flush_dev_iotlb()), however I'm pretty sure that that change
should be incomplete at least because it does not fix all the places
so we're still taking the locks in different orders, while reverting
that commit is very clean to me so far that we should always take
device_domain_lock first then the iommu lock.
We can continue to try to find the real culprit mentioned in
7560cc3ca7, but for now I think we should revert it to fix current
breakage.
CC: Joerg Roedel <joro@8bytes.org>
CC: Lu Baolu <baolu.lu@linux.intel.com>
CC: dave.jiang@intel.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not see http www gnu org
licenses
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 503 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Few Qualcomm platforms such as, sdm845 have an additional outer
cache called as System cache, aka. Last level cache (LLC) that
allows non-coherent devices to upgrade to using caching.
This cache sits right before the DDR, and is tightly coupled
with the memory controller. The clients using this cache request
their slices from this system cache, make it active, and can then
start using it.
There is a fundamental assumption that non-coherent devices can't
access caches. This change adds an exception where they *can* use
some level of cache despite still being non-coherent overall.
The coherent devices that use cacheable memory, and CPU make use of
this system cache by default.
Looking at memory types, we have following -
a) Normal uncached :- MAIR 0x44, inner non-cacheable,
outer non-cacheable;
b) Normal cached :- MAIR 0xff, inner read write-back non-transient,
outer read write-back non-transient;
attribute setting for coherenet I/O devices.
and, for non-coherent i/o devices that can allocate in system cache
another type gets added -
c) Normal sys-cached :- MAIR 0xf4, inner non-cacheable,
outer read write-back non-transient
Coherent I/O devices use system cache by marking the memory as
normal cached.
Non-coherent I/O devices should mark the memory as normal
sys-cached in page tables to use system cache.
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We've been artificially limiting the size of our queues to 4k so that we
don't end up allocating huge amounts of physically-contiguous memory at
probe time. However, 4k is only enough for 256 commands in the command
queue, so instead let's try to allocate the largest queue that the SMMU
supports, retrying with a smaller size if the allocation fails.
The caveat here is that we have to limit our upper bound based on
CONFIG_CMA_ALIGNMENT to ensure that our queue allocations remain
natually aligned, which is required by the SMMU architecture.
Signed-off-by: Will Deacon <will.deacon@arm.com>
The commit "iommu/vt-d: Probe DMA-capable ACPI name space devices"
introduced a compilation warning due to the "iommu" variable in
for_each_active_iommu() but never used the for each element, i.e,
"drhd->iommu".
drivers/iommu/intel-iommu.c: In function 'probe_acpi_namespace_devices':
drivers/iommu/intel-iommu.c:4639:22: warning: variable 'iommu' set but
not used [-Wunused-but-set-variable]
struct intel_iommu *iommu;
Silence the warning the same way as in the commit d3ed71e5cc
("drivers/iommu/intel-iommu.c: fix variable 'iommu' set but not used")
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The linux-next commit "iommu/vt-d: Duplicate iommu_resv_region objects
per device list" [1] left out an unused variable,
drivers/iommu/intel-iommu.c: In function 'dmar_parse_one_rmrr':
drivers/iommu/intel-iommu.c:4014:9: warning: variable 'length' set but
not used [-Wunused-but-set-variable]
[1] https://lore.kernel.org/patchwork/patch/1083073/
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>