If CONFIG_PM is not set, init_iommu_pm_ops() introduced by commit
134fac3f45 (PCI / Intel IOMMU: Use
syscore_ops instead of sysdev class and sysdev) is not defined
appropriately. Fix this issue.
Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* git://git.infradead.org/iommu-2.6:
intel-iommu: Fix off-by-one in RMRR setup
intel-iommu: Add domain check in domain_remove_one_dev_info
intel-iommu: Remove Host Bridge devices from identity mapping
intel-iommu: Use coherent DMA mask when requested
intel-iommu: Dont cache iova above 32bit
intel-iommu: Speed up processing of the identity_mapping function
intel-iommu: Check for identity mapping candidate using system dma mask
intel-iommu: Only unlink device domains from iommu
intel-iommu: Enable super page (2MiB, 1GiB, etc.) support
intel-iommu: Flush unmaps at domain_exit
intel-iommu: Remove obsolete comment from detect_intel_iommu
intel-iommu: fix VT-d PMR disable for TXT on S3 resume
We were mapping an extra byte (and hence usually an extra page):
iommu_prepare_identity_map() expects to be given an 'end' argument which
is the last byte to be mapped; not the first byte *not* to be mapped.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The comment in domain_remove_one_dev_info() states "No need to compare
PCI domain; it has to be the same". But for the si_domain that isn't
going to be true, as it consists of all the PCI devices that are
identity mapped thus multiple PCI domains can be in si_domain. The
code needs to validate the PCI domain too.
Signed-off-by: Mike Habeck <habeck@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: stable@kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
When using the 1:1 (identity) PCI DMA remapping, PCI Host Bridge devices
that do not use the IOMMU causes a kernel panic. Fix that by not
inserting those devices into the si_domain.
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Mike Habeck <habeck@sgi.com>
Cc: stable@kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The __intel_map_single function is not honoring the passed in DMA mask.
This results in not using the coherent DMA mask when called from
intel_alloc_coherent().
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Reviewed-by: Mike Habeck <habeck@sgi.com>
Cc: stable@kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
When there are a large count of PCI devices, and the pass through
option for iommu is set, much time is spent in the identity_mapping
function hunting though the iommu domains to check if a specific
device is "identity mapped".
Speed up the function by checking the cached info to see if
it's mapped to the static identity domain.
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Mike Habeck <habeck@sgi.com>
Cc: stable@kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The identity mapping code appears to make the assumption that if the
devices dma_mask is greater than 32bits the device can use identity
mapping. But that is not true: take the case where we have a 40bit
device in a 44bit architecture. The device can potentially receive a
physical address that it will truncate and cause incorrect addresses
to be used.
Instead check to see if the device's dma_mask is large enough
to address the system's dma_mask.
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Mike Habeck <habeck@sgi.com>
Cc: stable@kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Commit a97590e5 added unlinking domains from iommus to reciprocate the
iommu from domains unlinking that was already done. We actually want
to only do this for device domains and never for the static
identity map domain or VM domains. The SI domain is special and
never freed, while VM domain->id lives in their own special address
space, separate from iommu->domain_ids.
In the current code, a VM can get domain->id zero, then mark that
domain unused when unbound from pci-stub. This leads to DMAR
write faults when the device is re-bound to the host driver.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
There are no externally-visible changes with this. In the loop in the
internal __domain_mapping() function, we simply detect if we are mapping:
- size >= 2MiB, and
- virtual address aligned to 2MiB, and
- physical address aligned to 2MiB, and
- on hardware that supports superpages.
(and likewise for larger superpages).
We automatically use a superpage for such mappings. We never have to
worry about *breaking* superpages, since we trust that we will always
*unmap* the same range that was mapped. So all we need to do is ensure
that dma_pte_clear_range() will also cope with superpages.
Adjust pfn_to_dma_pte() to take a superpage 'level' as an argument, so
it can return a PTE at the appropriate level rather than always
extending the page tables all the way down to level 1. Again, this is
simplified by the fact that we should never encounter existing small
pages when we're creating a mapping; any old mapping that used the same
virtual range will have been entirely removed and its obsolete page
tables freed.
Provide an 'intel_iommu=sp_off' argument on the command line as a
chicken bit. Not that it should ever be required.
==
The original commit seen in the iommu-2.6.git was Youquan's
implementation (and completion) of my own half-baked code which I'd
typed into an email. Followed by half a dozen subsequent 'fixes'.
I've taken the unusual step of rewriting history and collapsing the
original commits in order to keep the main history simpler, and make
life easier for the people who are going to have to backport this to
older kernels. And also so I can give it a more coherent commit comment
which (hopefully) gives a better explanation of what's going on.
The original sequence of commits leading to identical code was:
Youquan Song (3):
intel-iommu: super page support
intel-iommu: Fix superpage alignment calculation error
intel-iommu: Fix superpage level calculation error in dma_pfn_level_pte()
David Woodhouse (4):
intel-iommu: Precalculate superpage support for dmar_domain
intel-iommu: Fix hardware_largepage_caps()
intel-iommu: Fix inappropriate use of superpages in __domain_mapping()
intel-iommu: Fix phys_pfn in __domain_mapping for sglist pages
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We typically batch unmaps to be lazily flushed out at
regular intervals. When we destroy a domain, we need
to force a flush of these lazy unmaps to be sure none
reference the domain we're about to free.
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=35062
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org
This patch is a follow on to https://lkml.org/lkml/2011/3/21/239, which
was merged as commit 51a63e67da.
This patch adds support for S3, as pointed out by Chris Wright.
Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* git://git.infradead.org/iommu-2.6:
intel_iommu: disable all VT-d PMRs when TXT launched
intel-iommu: Fix get_domain_for_dev() error path
intel-iommu: Unlink domain from iommu
intel-iommu: Fix use after release during device attach
Intel VT-d Protected Memory Regions (PMRs) are supposed to be disabled,
on each VT-d engine, after DMA remapping is enabled on the engines.
This is because the behavior of having both enabled is not deterministic
and because, if TXT has been used to launch the kernel, the PMRs may be
programmed to cover memory regions that will be used for DMA.
Under some circumstances (certain quirks detected, lack of multiple
devices, etc.), the current code does not set up DMA remapping on some
VT-d engines. In such cases it also skips disabling the PMRs. This
causes failures when the kernel is launched with TXT (most often this
occurs on the graphics engine and results in colored vertical bars on
the display).
This patch detects when the kernel has been launched with TXT and then
disables the PMRs on all VT-d engines. In some cases where the reason
that remapping is not being enabled is due to possible ACPI DMAR table
errors, the VT-d engine addresses may not be correct and thus not able
to be safely programmed even to disable PMRs. Because part of the TXT
launch process is the verification of these addresses, it will always be
safe to disable PMRs if the TXT launch has succeeded and hence only
doing this in such cases.
Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch moves the relevant declarations from the local
header file in drivers/pci to a more accessible locations so
that it can be used by the AMD IOMMU driver too.
The file is named pci-ats.h because support for the PCI PRI
capability will also be added there in a later patch-set.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The Intel IOMMU subsystem uses a sysdev class and a sysdev for
executing iommu_suspend() after interrupts have been turned off
on the boot CPU (during system suspend) and for executing
iommu_resume() before turning on interrupts on the boot CPU
(during system resume). However, since both of these functions
ignore their arguments, the entire mechanism may be replaced with a
struct syscore_ops object which is simpler.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
If we run out of domain_ids and fail iommu_attach_domain(), we
fall into domain_exit() without having setup enough of the
domain structure for this to do anything useful. In fact, it
typically runs off into the weeds walking the bogus domain->devices
list. Just free the domain.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org
When we remove a device, we unlink the iommu from the domain, but
we never do the reverse unlinking of the domain from the iommu.
This means that we never clear iommu->domain_ids, eventually leading
to resource exhaustion if we repeatedly bind and unbind a device
to a driver. Also free empty domains to avoid a resource leak.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org
Obtain the new pgd pointer before releasing the page containing this
value.
Cc: stable@kernel.org
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* git://git.infradead.org/iommu-2.6:
intel-iommu: Use symbolic values instead of magic numbers in Lenovo w/a
intel-iommu: Abort IOMMU setup for igfx if BIOS gave no shadow GTT space
drivers/pci/intel-iommu.c: In function `__iommu_calculate_agaw':
drivers/pci/intel-iommu.c:437: sorry, unimplemented: inlining failed in call to 'width_to_agaw': function body not available
drivers/pci/intel-iommu.c:445: sorry, unimplemented: called from here
Move the offending function (and its siblings) to top-of-file, remove the
forward declaration.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=17441
Reported-by: Martin Mokrejs <mmokrejs@ribosome.natur.cuni.cz>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 9eecabcb9a ("intel-iommu: Abort
IOMMU setup for igfx if BIOS gave no shadow GTT space") uses a bunch of
magic numbers. Provide #defines for those to make it look slightly saner.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
drivers/pci/intel-iommu.c: In function 'dma_pte_addr':
drivers/pci/intel-iommu.c:239: warning: passing argument 1 of '__cmpxchg64' from incompatible pointer type
It seems that __cmpxchg64() now cares about the type of its pointer argument,
so give it a (uint64_t *) instead of a pointer to a structure which contains
only that.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
On some platforms (MacPro3,1) the BIOS assigns the ioatdma device to the
incorrect iommu causing faults when the driver initializes. Add a quirk
to catch this misconfiguration and try falling back to untranslated
operation (which works in the MacPro3,1 case).
Assuming there are other platforms with misconfigured iommus teach the
ioatdma driver to treat initialization failures as non-fatal (just fail
the driver load and emit a warning instead of triggering a BUG_ON).
This can be classified as a boot regression since 2.6.32 on affected
platforms since the ioatdma module did not autoload prior to that
kernel.
Cc: <stable@kernel.org>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Reported-by: Chris Li <lkml@chrisli.org>
Tested-by: Chris Li <lkml@chrisli.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch allows IOMMU users to determine whether the
hardware and software support safe, isolated interrupt
remapping. Not all Intel IOMMUs have the hardware, and the
software for AMD is not there yet.
Signed-off-by: Tom Lyon <pugs@cisco.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Certain revisions of this chipset appear to be broken. There is a shadow
GTT which mirrors the real GTT but contains pre-translated physical
addresses, for performance reasons. When a GTT update happens, the
translations are done once and the resulting physical addresses written
back to the shadow GTT.
Except sometimes, the physical address is actually written back to the
_real_ GTT, not the shadow GTT. Thus we start to see faults when that
physical address is fed through translation again.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
stanse found the following double lock.
In get_domain_for_dev:
spin_lock_irqsave(&device_domain_lock, flags);
domain_exit(domain);
domain_remove_dev_info(domain);
spin_lock_irqsave(&device_domain_lock, flags);
spin_unlock_irqrestore(&device_domain_lock, flags);
spin_unlock_irqrestore(&device_domain_lock, flags);
This happens when the domain is created by another CPU at the same time
as this function is creating one, and the other CPU wins the race to
attach it to the device in question, so we have to destroy our own
newly-created one.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Commit a99c47a2 "intel-iommu: errors with smaller iommu widths" replace the
dmar_domain->pgd with the first entry of page table when iommu's supported
width is smaller than dmar_domain's. But it use physical address directly
for new dmar_domain->pgd...
This result in KVM oops with VT-d on some machines.
Reported-by: Allen Kay <allen.m.kay@intel.com>
Cc: Tom Lyon <pugs@cisco.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* git://git.infradead.org/iommu-2.6:
intel-iommu: Set a more specific taint flag for invalid BIOS DMAR tables
intel-iommu: Combine the BIOS DMAR table warning messages
panic: Add taint flag TAINT_FIRMWARE_WORKAROUND ('I')
panic: Allow warnings to set different taint flags
intel-iommu: intel_iommu_map_range failed at very end of address space
intel-iommu: errors with smaller iommu widths
intel-iommu: Fix boot inside 64bit virtualbox with io-apic disabled
intel-iommu: use physfn to search drhd for VF
intel-iommu: Print out iommu seq_id
intel-iommu: Don't complain that ACPI_DMAR_SCOPE_TYPE_IOAPIC is not supported
intel-iommu: Avoid global flushes with caching mode.
intel-iommu: Use correct domain ID when caching mode is enabled
intel-iommu mistakenly uses offset_pfn when caching mode is enabled
intel-iommu: use for_each_set_bit()
intel-iommu: Fix section mismatch dmar_ir_support() uses dmar_tbl.
intel_iommu_map_range() doesn't allow allocation at the very end of the
address space; that code has been simplified and corrected.
Signed-off-by: Tom Lyon <pugs@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
When using iommu_domain_alloc with the Intel iommu, the domain address
width is always initialized to 48 bits (agaw 2). This domain->agaw value
is then used by pfn_to_dma_pte to (always) build a 4 level page table.
However, not all systems support iommu width of 48 or 4 level page tables.
In particular, the Core i5-660 and i5-670 support an address width of 36
bits (not 39!), an agaw of only 1, and only 3 level page tables.
This version of the patch simply lops off extra levels of the page tables
if the agaw value of the iommu is less than what is currently allocated
for the domain (in intel_iommu_attach_device). If there were already
allocated addresses above what the new iommu can handle, EFAULT is
returned.
Signed-off-by: Tom Lyon <pugs@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
While it may be efficient on real hardware, emulation of global
invalidations is very expensive as all shadow entries must be examined.
This patch changes the behaviour when caching mode is enabled (which is
the case when IOMMU emulation takes place). In this case, page specific
invalidation is used instead.
Signed-off-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
In caching-mode mappings of pages (changes from non-present to present)
require invalidation.
Currently, this IOTLB flush is performed with domain ID of zero.
This is not according to the VT-d spec and causes big problems for
emulating software.
This patch uses the correct domain ID in IOTLB flushes.
Device IOTLB invalidation is performed only on present to non-present
changes. This decision is now based on explicit parameter instead of
zero domain-ID.
Signed-off-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
intel_map_sg used offset_pfn which was set to zero when invalidating the IOTLB.
intel_map_sg now uses size variable for this matter.
Signed-off-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Replace open-coded loop with for_each_set_bit().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch changes the iommu-api functions for mapping and
unmapping page ranges to use the new page-size based
interface. This allows to remove the range based functions
later.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The new function pointer names match better with the
top-level functions of the iommu-api which are using them.
Main intention of this change is to make the ->{un}map
pointer names free for two new mapping functions.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI/cardbus: Add a fixup hook and fix powerpc
PCI: change PCI nomenclature in drivers/pci/ (non-comment changes)
PCI: change PCI nomenclature in drivers/pci/ (comment changes)
PCI: fix section mismatch on update_res()
PCI: add Intel 82599 Virtual Function specific reset method
PCI: add Intel USB specific reset method
PCI: support device-specific reset methods
PCI: Handle case when no pci device can provide cache line size hint
PCI/PM: Propagate wake-up enable for PCIe devices too
vgaarbiter: fix a typo in the vgaarbiter Documentation
* git://git.infradead.org/iommu-2.6:
implement early_io{re,un}map for ia64
Revert "Intel IOMMU: Avoid memory allocation failures in dma map api calls"
intel-iommu: ignore page table validation in pass through mode
intel-iommu: Fix oops with intel_iommu=igfx_off
intel-iommu: Check for an RMRR which ends before it starts.
intel-iommu: Apply BIOS sanity checks for interrupt remapping too.
intel-iommu: Detect DMAR in hyperspace at probe time.
dmar: Fix build failure without NUMA, warn on bogus RHSA tables and don't abort
iommu: Allocate dma-remapping structures using numa locality info
intr_remap: Allocate intr-remapping table using numa locality info
dmar: Allocate queued invalidation structure using numa locality info
dmar: support for parsing Remapping Hardware Static Affinity structure
commit eb3fa7cb51 said Intel IOMMU
Intel IOMMU driver needs memory during DMA map calls to setup its
internal page tables and for other data structures. As we all know
that these DMA map calls are mostly called in the interrupt context
or with the spinlock held by the upper level drivers(network/storage
drivers), so in order to avoid any memory allocation failure due to
low memory issues, this patch makes memory allocation by temporarily
setting PF_MEMALLOC flags for the current task before making memory
allocation calls.
We evaluated mempools as a backup when kmem_cache_alloc() fails
and found that mempools are really not useful here because
1) We don't know for sure how much to reserve in advance
2) And mempools are not useful for GFP_ATOMIC case (as we call
memory alloc functions with GFP_ATOMIC)
(akpm: point 2 is wrong...)
The above description doesn't justify to waste system emergency memory
at all. Non MM subsystem must not use PF_MEMALLOC. Memory reclaim need
few memory, anyone must not prevent it. Otherwise the system cause
mysterious hang-up and/or OOM Killer invokation.
Plus, akpm already pointed out what we should do.
Then, this patch revert it.
Cc: Keshavamurthy Anil S <anil.s.keshavamurthy@intel.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We are seeing a bug when booting w/ iommu=pt with current upstream
(bisect blames 19943b0e30 "intel-iommu:
Unify hardware and software passthrough support).
The issue is specific to this loop during identity map initialization
of each device:
domain_context_mapping_one(si_domain, ..., CONTEXT_TT_PASS_THROUGH)
...
/* Skip top levels of page tables for
* iommu which has less agaw than default.
*/
for (agaw = domain->agaw; agaw != iommu->agaw; agaw--) {
pgd = phys_to_virt(dma_pte_addr(pgd));
if (!dma_pte_present(pgd)) { <------ failing here
spin_unlock_irqrestore(&iommu->lock, flags);
return -ENOMEM;
}
This box has 2 iommu's in it. The catchall iommu has MGAW == 48, and
SAGAW == 4. The other iommu has MGAW == 39, SAGAW == 2.
The device that's failing the above pgd test is the only device connected
to the non-catchall iommu, which has a smaller address width than the
domain default. This test is not necessary since the context is in PT
mode and the ASR is ignored.
Thanks to Don Dutile for discovering and debugging this one.
Cc: stable@kernel.org
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>