* pci/locking:
PCI: Check parent kobject in pci_destroy_dev()
xen/pcifront: Use global PCI rescan-remove locking
powerpc/eeh: Use global PCI rescan-remove locking
MPT / PCI: Use pci_stop_and_remove_bus_device_locked()
platform / x86: Use global PCI rescan-remove locking
PCI: hotplug: Use global PCI rescan-remove locking
pcmcia: Use global PCI rescan-remove locking
ACPI / hotplug / PCI: Use global PCI rescan-remove locking
ACPI / PCI: Use global PCI rescan-remove locking in PCI root hotplug
PCI: Add global pci_lock_rescan_remove()
PCI resets will attempt to take the device_lock for any device to be
reset. This is a problem if that lock is already held, for instance
in the device remove path. It's not sufficient to simply kill the
user process or skip the reset if called after .remove as a race could
result in the same deadlock. Instead, we handle all resets as "best
effort" using the PCI "try" reset interfaces. This prevents the user
from being able to induce a deadlock by triggering a reset.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
If pci_stop_and_remove_bus_device() is run concurrently for a device and
its parent bridge via remove_callback(), both code paths attempt to acquire
pci_rescan_remove_lock. If the child device removal acquires it first,
there will be no problems. However, if the parent bridge removal acquires
it first, it will eventually execute pci_destroy_dev() for the child
device, but that device object will not be freed yet due to the reference
held by the concurrent child removal. Consequently, both
pci_stop_bus_device() and pci_remove_bus_device() will be executed for that
device unnecessarily and pci_destroy_dev() will see a corrupted list head
in that object. Moreover, an excess put_device() will be executed for that
device in that case which may lead to a use-after-free in the final
kobject_put() done by sysfs_schedule_callback_work().
To avoid that problem, make pci_destroy_dev() check if the device's parent
kobject is NULL, which only happens after device_del() has already run for
it. Make pci_destroy_dev() return immediately whithout doing anything in
that case.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Multiple race conditions are possible between the Xen pcifront device
addition and removal and the generic PCI device addition and removal that
can be triggered via sysfs.
To avoid those race conditions make the Xen pcifront code use global PCI
rescan-remove locking.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Race conditions are theoretically possible between the PCI device addition
and removal in the PPC64 PCI error recovery driver and the generic PCI bus
rescan and device removal that can be triggered via sysfs.
To avoid those race conditions make PPC64 PCI error recovery driver use
global PCI rescan-remove locking around PCI device addition and removal.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When doing a function/slot/bus reset PCI grabs the device_lock for each
device to block things like suspend and driver probes, but call paths exist
where this lock may already be held. This creates an opportunity for
deadlock. For instance, vfio allows userspace to issue resets so long as
it owns the device(s). If a driver unbind .remove callback races with
userspace issuing a reset, we have a deadlock as userspace gets stuck
waiting on device_lock while another thread has device_lock and waits for
.remove to complete. To resolve this, we can make a version of the reset
interfaces which use trylock. With this, we can safely attempt a reset and
return error to userspace if there is contention.
[bhelgaas: the deadlock happens when A (userspace) has a file descriptor for
the device, and B waits in this path:
driver_detach
device_lock # take device_lock
__device_release_driver
pci_device_remove # pci_bus_type.remove
vfio_pci_remove # pci_driver .remove
vfio_del_group_dev
wait_event(vfio.release_q, !vfio_dev_present) # wait (holding device_lock)
Now B is stuck until A gives up the file descriptor. If A tries to acquire
device_lock for any reason, we deadlock because A is waiting for B to release
the lock, and B is waiting for A to release the file descriptor.]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Race conditions are theoretically possible between the MPT PCI device
removal and the generic PCI bus rescan and device removal that can be
triggered via sysfs.
To avoid those race conditions make the MPT PCI code use
pci_stop_and_remove_bus_device_locked().
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Multiple race conditions are possible between the rfkill hotplug in the
asus-wmi and eeepc-laptop drivers and the generic PCI bus rescan and device
removal that can be triggered via sysfs.
To avoid those race conditions make asus-wmi and eeepc-laptop use global
PCI rescan-remove locking around the rfkill hotplug.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Multiple race conditions are possible between PCI hotplug and the generic
PCI bus rescan and device removal that can be triggered via sysfs.
To avoid those race conditions make PCI hotplug use global PCI
rescan-remove locking.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Multiple race conditions are possible between the cardbus PCI device
addition and removal and the generic PCI bus rescan and device removal that
can be triggered via sysfs.
To avoid those race conditions make the cardbus code use global PCI
rescan-remove locking.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Multiple race conditions are possible between the ACPI-based PCI hotplug
(ACPIPHP) and the generic PCI bus rescan and device removal that can be
triggered via sysfs.
To avoid those race conditions make the ACPIPHP code use global PCI
rescan-remove locking.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Multiple race conditions are possible between the addition and removal of
PCI devices during ACPI PCI host bridge hotplug and the generic PCI bus
rescan and device removal that can be triggered via sysfs.
To avoid those race conditions make the ACPI PCI host bridge addition and
removal code use global PCI rescan-remove locking.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
There are multiple PCI device addition and removal code paths that may be
run concurrently with the generic PCI bus rescan and device removal that
can be triggered via sysfs. If that happens, it may lead to multiple
different, potentially dangerous race conditions.
The most straightforward way to address those problems is to run
the code in question under the same lock that is used by the
generic rescan/remove code in pci-sysfs.c. To prepare for those
changes, move the definition of the global PCI remove/rescan lock
to probe.c and provide global wrappers, pci_lock_rescan_remove()
and pci_unlock_rescan_remove(), allowing drivers to manipulate
that lock. Also provide pci_stop_and_remove_bus_device_locked()
for the callers of pci_stop_and_remove_bus_device() who only need
to hold the rescan/remove lock around it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Consistently use the:
#ifdef CONFIG_PCI_FOO
int pci_foo(...);
#else
static inline int pci_foo(...) { return -1; }
#endif
pattern, instead of sometimes using "#ifndef CONFIG_PCI_FOO".
No functional change.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* pci/aer:
PCI/AER: Support ACPI HEST AER error sources for PCI domains other than 0
ACPICA: Add helper macros to extract bus/segment numbers from HEST table.
In the discussion for this set of patches [link below], Bjorn Helgaas
pointed out that the ACPI HEST AER error sources do not have the PCIe
segment number associated with the bus. I worked with the ACPI spec and
got this change to definition of the "Bus" field into the recently released
ACPI Spec 5.0a section 18.3.2.3-5:
Identifies the PCI Bus and Segment of the device. The Bus is encoded in
bits 0-7. For systems that expose multiple PCI segment groups, the
segment number is encoded in bits 8-23 and bits 24-31 must be zero. For
systems that do not expose multiple PCI segment groups, bits 8-31 must be
zero. If the GLOBAL flag is specified, this field is ignored.
This patch makes use of the new definition in the only place in the kernel
that uses the acpi_hest_aer_common's bus field.
This depends on 36f3615152 ("ACPICA: Add helper macros to extract
bus/segment numbers from HEST table.")
Link: http://lkml.kernel.org/r/1370542251-27387-1-git-send-email-betty.dall@hp.com
Signed-off-by: Betty Dall <betty.dall@hp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This change adds two macros to extract the encoded bus and segment
numbers from the HEST Bus field.
Signed-off-by: Betty Dall <betty.dall@hp.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Using 'make namespacecheck' identify code which should be declared static.
Checked for users in other driver/archs as well. Compile tested only.
This stops exporting the following interfaces to modules:
pci_target_state()
pci_load_saved_state()
[bhelgaas: retained pci_find_next_ext_capability() and pci_cfg_space_size()]
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
My philosophy is unused code is dead code. And dead code is subject to bit
rot and is a likely source of bugs. Use it or lose it.
This removes this unused and deprecated interface:
alloc_pci_dev()
[bhelgaas: split to separate patch]
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
My philosophy is unused code is dead code. And dead code is subject to bit
rot and is a likely source of bugs. Use it or lose it.
This reverts part of f46753c5e3 ("PCI: introduce pci_slot") and
d25b7c8d6b ("PCI: rename pci_update_slot_number to pci_renumber_slot"),
removing this interface:
pci_renumber_slot()
[bhelgaas: split to separate patch, add historical link from Alex]
Link: http://lkml.kernel.org/r/20081009043140.8678.44164.stgit@bob.kio
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Chiang <achiang@canonical.com>
My philosophy is unused code is dead code. And dead code is subject to bit
rot and is a likely source of bugs. Use it or lose it.
This reverts part of 3e1b16002a ("ACPI/PCI: PCIe ASPM _OSC support
capabilities called when root bridge added"), removing this interface:
pcie_aspm_enabled()
[bhelgaas: split to separate patch]
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Andrew Patterson <andrew.patterson@hp.com>
My philosophy is unused code is dead code. And dead code is subject to bit
rot and is a likely source of bugs. Use it or lose it.
This reverts db5679437a ("PCI: add interface to set visible size of
VPD"), removing this interface:
pci_vpd_truncate()
[bhelgaas: split to separate patch, also remove prototype from pci.h]
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
My philosophy is unused code is dead code. And dead code is subject to bit
rot and is a likely source of bugs. Use it or lose it.
This reverts b48d4425b6 ("PCI: add ID-based ordering enable/disable
support"), removing these interfaces:
pci_enable_ido()
pci_disable_ido()
[bhelgaas: split to separate patch, also remove prototypes from pci.h]
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
My philosophy is unused code is dead code. And dead code is subject to bit
rot and is a likely source of bugs. Use it or lose it.
This reverts 48a92a8179 ("PCI: add OBFF enable/disable support"),
removing these interfaces:
pci_enable_obff()
pci_disable_obff()
[bhelgaas: split to separate patch, also remove prototypes from pci.h]
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
My philosophy is unused code is dead code. And dead code is subject to bit
rot and is a likely source of bugs. Use it or lose it.
This reverts 51c2e0a7e5 ("PCI: add latency tolerance reporting
enable/disable support"), removing these interfaces:
pci_enable_ltr()
pci_disable_ltr()
pci_set_ltr()
[bhelgaas: split to separate patch, also remove prototypes from pci.h]
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
* pci/resource:
PCI: Allocate 64-bit BARs above 4G when possible
PCI: Enforce bus address limits in resource allocation
PCI: Split out bridge window override of minimum allocation address
agp/ati: Use PCI_COMMAND instead of hard-coded 4
agp/intel: Use CPU physical address, not bus address, for ioremap()
agp/intel: Use pci_bus_address() to get GTTADR bus address
agp/intel: Use pci_bus_address() to get MMADR bus address
agp/intel: Support 64-bit GMADR
agp/intel: Rename gtt_bus_addr to gtt_phys_addr
drm/i915: Rename gtt_bus_addr to gtt_phys_addr
agp: Use pci_resource_start() to get CPU physical address for BAR
agp: Support 64-bit APBASE
PCI: Add pci_bus_address() to get bus address of a BAR
PCI: Convert pcibios_resource_to_bus() to take a pci_bus, not a pci_dev
PCI: Change pci_bus_region addresses to dma_addr_t
My philosophy is unused code is dead code. And dead code is subject to bit
rot and is a likely source of bugs. Use it or lose it.
This reverts parts of c320b976d7 ("PCI: Add implementation for PRI
capability"), removing these interfaces:
pci_pri_enabled()
pci_pri_stopped()
pci_pri_status()
[bhelgaas: split to separate patch]
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Joerg Roedel <joro@8bytes.org>
Per the SR-IOV spec rev 1.1:
3.4.1.9 Header Type (Offset 0Eh)
"... For VFs, this register must be RO Zero."
Unfortunately some devices get this wrong, ex. Emulex OneConnect 10Gb NIC.
When they do it makes us handle ACS testing and therefore IOMMU groups as
if they were actual multifunction devices and require ACS capabilities to
make sure there's no peer-to-peer between functions. VFs are never
traditional multifunction devices, so simply clear this bit before we get
any further into setup.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=68431
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The PCI-DMA-mapping.txt moved to general docs and became DMA-API-HOWTO.txt
in 5e07c2c730 ("Documentation: rename PCI/PCI-DMA-mapping.txt to
DMA-API-HOWTO.txt"). Add new file about PCI Express I/O Virtualization.
Signed-off-by: Erik Ekman <erik@kryo.se>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Try to allocate space for 64-bit BARs above 4G first, to preserve the space
below 4G for 32-bit BARs. If there's no space above 4G available, fall
back to allocating anywhere.
[bhelgaas: reworked starting from http://lkml.kernel.org/r/1387485843-17403-2-git-send-email-yinghai@kernel.org]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When allocating space for 32-bit BARs, we previously limited RESOURCE
addresses so they would fit in 32 bits. However, the BUS address need not
be the same as the resource address, and it's the bus address that must fit
in the 32-bit BAR.
This patch adds:
- pci_clip_resource_to_region(), which clips a resource so it contains
only the range that maps to the specified bus address region, e.g., to
clip a resource to 32-bit bus addresses, and
- pci_bus_alloc_from_region(), which allocates space for a resource from
the specified bus address region,
and changes pci_bus_alloc_resource() to allocate space for 64-bit BARs from
the entire bus address region, and space for 32-bit BARs from only the bus
address region below 4GB.
If we had this window:
pci_root HWP0002:0a: host bridge window [mem 0xf0180000000-0xf01fedfffff] (bus address [0x80000000-0xfedfffff])
we previously could not put a 32-bit BAR there, because the CPU addresses
don't fit in 32 bits. This patch fixes this, so we can use this space for
32-bit BARs.
It's also possible (though unlikely) to have resources with 32-bit CPU
addresses but bus addresses above 4GB. In this case the previous code
would allocate space that a 32-bit BAR could not map.
Remove PCIBIOS_MAX_MEM_32, which is no longer used.
[bhelgaas: reworked starting from http://lkml.kernel.org/r/1386658484-15774-3-git-send-email-yinghai@kernel.org]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pci_bus_alloc_resource() avoids allocating space below the "min" supplied
by the caller (usually PCIBIOS_MIN_IO or PCIBIOS_MIN_MEM). This is to
protect badly documented motherboard resources. But if we're allocating
space inside an already-configured PCI-PCI bridge window, we ignore "min".
See 688d191821 ("pci: make bus resource start address override minimum IO
address").
This patch moves the check to make it more visible and simplify future
patches. No functional change.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
We're accessing the PCI_COMMAND register here, so use the appropriate
#define. The bit we're writing (1 << 14) isn't defined by the PCI or PCIe
spec, so we don't have a name for it.
No functional change.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
In i810_setup(), i830_setup(), and i9xx_setup(), we use the result of
pci_bus_address() as an argument to ioremap() and to compute gtt_phys_addr.
These should use pci_resource_start() instead because we want the CPU
physical address, not the bus address.
If there were an AGP device behind a host bridge that translated addresses,
e.g., a PNP0A08 device with _TRA != 0, this would fix a bug. I'm not aware
of any of those, but they are possible.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Per the Intel 915G/915GV/... Chipset spec (document number 301467-005),
GTTADR is a standard PCI BAR.
The PCI core reads GTTADR at enumeration-time. Use pci_bus_address()
instead of reading it again in the driver. This works correctly for both
32-bit and 64-bit BARs. The spec above only mentions 32-bit GTTADR, but we
should still use the standard interface.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Per the Intel 915G/915GV/... Chipset spec (document number 301467-005),
MMADR is a standard PCI BAR.
The PCI core reads MMADR at enumeration-time. Use pci_bus_address()
instead of reading it again in the driver. This works correctly for both
32-bit and 64-bit BARs. The spec above only mentions 32-bit MMADR, but we
should still use the standard interface.
Also, stop clearing the low 19 bits of the bus address because it's invalid
to use addresses outside the region defined by the BAR. The spec claims
MMADR is 512KB; if that's the case, those bits will be zero anyway.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Per the Intel 915G/915GV/... Chipset spec (document number 301467-005),
GMADR is a standard PCI BAR.
The PCI core reads GMADR at enumeration-time. Use pci_bus_address()
instead of reading it again in the driver. This works correctly for both
32-bit and 64-bit BARs. The spec above only mentions 32-bit GMADR, but
Yinghai's patch (link below) indicates some devices have a 64-bit GMADR.
[bhelgaas: reworked starting from http://lkml.kernel.org/r/1385851238-21085-13-git-send-email-yinghai@kernel.org]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The only use of gtt_bus_addr is as an argument to ioremap(), so it is a CPU
physical address, not a bus address. Rename it to gtt_phys_addr to reflect
this.
No functional change.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We're dealing with CPU physical addresses here, which may be different from
bus addresses, so rename gtt_bus_addr to gtt_phys_addr to avoid confusion.
No functional change.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
amd_irongate_configure(), ati_configure(), and nvidia_configure() call
ioremap() on an address read directly from a BAR. But a BAR contains a
bus address, and ioremap() expects a CPU physical address. Use
pci_resource_start() to obtain the physical address.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>