In order to have better power management for Thunderbolt PCIe chains,
Windows enables power management for native PCIe hotplug ports if there is
the following ACPI _DSD attached to the root port:
Name (_DSD, Package () {
ToUUID ("6211e2c0-58a3-4af3-90e1-927a4e0c55a4"),
Package () {
Package () {"HotPlugSupportInD3", 1}
}
})
This is also documented in:
https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-pcie-root-ports-supporting-hot-plug-in-d3
Do the same in Linux by introducing new firmware PM callback
(->bridge_d3()) and then implement it for ACPI based systems so that the
above property is checked.
There is one catch, though. The initial pci_dev->bridge_d3 is set before
the root port has ACPI companion bound (the device is not added to the PCI
bus either) so we need to look up the ACPI companion manually in that case
in acpi_pci_bridge_d3().
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit baecc470d5 ("PCI / PM: Skip bridges in pci_enable_wake()") changed
pci_enable_wake() so that all bridges are skipped when wakeup is enabled
(or disabled) with the reasoning that bridges can only signal wakeup on
behalf of their subordinate devices.
However, there are bridges that can signal wakeup themselves. For example
PCIe downstream and root ports supporting hotplug may signal wakeup upon
hotplug event.
For this reason change pci_enable_wake() so that it skips all bridges
except those that we power manage (->bridge_d3 is set). Those are the ones
that can go into low power states and may need to signal wakeup.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The spec has timing requirements when waiting for a link to become active
after a conventional reset. Implement those hard delays when waiting for
an active link so pciehp and dpc drivers don't need to duplicate this.
For devices that don't support data link layer active reporting, wait the
fixed time recommended by the PCIe spec.
Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
PCIe r4.0, sec 7.5.1.1.4 defines a new bit in the Status Register:
Immediate Readiness – This optional bit, when Set, indicates the Function
is guaranteed to be ready to successfully complete valid configuration
accesses at any time following any reset that the host is capable of
issuing Configuration Requests to this Function.
When this bit is Set, for accesses to this Function, software is exempt
from all requirements to delay configuration accesses following any type
of reset, including but not limited to the timing requirements defined in
Section 6.6.
This means that all delays after a Conventional or Function Reset can be
skipped.
This patch reads such bit and caches its value in a flag inside struct
pci_dev to be checked later if we should delay or can skip delays after a
reset. While at that, also move the explicit msleep(100) call from
pcie_flr() and pci_af_flr() to pci_dev_wait().
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
[bhelgaas: rename PCI_STATUS_IMMEDIATE to PCI_STATUS_IMM_READY]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
On 38+ Intel-based ASUS products, the NVIDIA GPU becomes unusable after S3
suspend/resume. The affected products include multiple generations of
NVIDIA GPUs and Intel SoCs. After resume, nouveau logs many errors such
as:
fifo: fault 00 [READ] at 0000005555555000 engine 00 [GR] client 04
[HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
DRM: failed to idle channel 0 [DRM]
Similarly, the NVIDIA proprietary driver also fails after resume (black
screen, 100% CPU usage in Xorg process). We shipped a sample to NVIDIA for
diagnosis, and their response indicated that it's a problem with the parent
PCI bridge (on the Intel SoC), not the GPU.
Runtime suspend/resume works fine, only S3 suspend is affected.
We found a workaround: on resume, rewrite the Intel PCI bridge
'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In the
cases that I checked, this register has value 0 and we just have to rewrite
that value.
Linux already saves and restores PCI config space during suspend/resume,
but this register was being skipped because upon resume, it already has
value 0 (the correct, pre-suspend value).
Intel appear to have previously acknowledged this behaviour and the
requirement to rewrite this register:
https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23
Based on that, rewrite the prefetch register values even when that appears
unnecessary.
We have confirmed this solution on all the affected models we have in-hands
(X542UQ, UX533FD, X530UN, V272UN).
Additionally, this solves an issue where r8169 MSI-X interrupts were broken
after S3 suspend/resume on ASUS X441UAR. This issue was recently worked
around in commit 7bb05b85bc ("r8169: don't use MSI-X on RTL8106e"). It
also fixes the same issue on RTL6186evl/8111evl on an Aimfor-tech laptop
that we had not yet patched. I suspect it will also fix the issue that was
worked around in commit 7c53a72245 ("r8169: don't use MSI-X on
RTL8168g").
Thomas Martitz reports that this change also solves an issue where the AMD
Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive after S3
suspend/resume.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069
Signed-off-by: Daniel Drake <drake@endlessm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-By: Peter Wu <peter@lekensteyn.nl>
CC: stable@vger.kernel.org
The secondary bus reset may have link side effects that a hotplug capable
port may incorrectly react to. Use the slot specific reset for hotplug
ports, fixing the undesirable link down-up handling during error
recovering.
Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: fold in
https://lore.kernel.org/linux-pci/20180926152326.14821-1-keith.busch@intel.com
for issue reported by Stephen Rothwell <sfr@canb.auug.org.au>]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
This patch provides DPC save and restore capabilities. This is necessary
for the driver to observe DPC events in the event the configuration space
needs to be restored after a reset.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Hotplug drivers cannot declare their hotplug_slot_ops const, making them
attractive targets for attackers, because upon registration of a hotplug
slot, __pci_hp_initialize() writes to the "owner" and "mod_name" members
in that struct.
Fix by moving these members to struct hotplug_slot and constify every
driver's hotplug_slot_ops except for pciehp.
pciehp constructs its hotplug_slot_ops at runtime based on the PCIe
port's capabilities, hence cannot declare them const. It can be
converted to __write_rarely once that's mainlined:
http://www.openwall.com/lists/kernel-hardening/2016/11/16/3
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> # drivers/pci/hotplug/rpa*
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # drivers/platform/x86
Cc: Len Brown <lenb@kernel.org>
Cc: Scott Murray <scott@spiteful.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Corentin Chary <corentin.chary@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>
Switch to bitmap_zalloc() to show clearly what we are allocating. Besides
that it returns pointer of bitmap type ("unsigned long *") instead of the
opaque "void *".
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Calling into the new API to reset the secondary bus results in a deadlock.
This occurs because the device/bus is already locked at probe time.
Reverting back to the old behavior while the API is improved.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200985
Fixes: c6a44ba950 ("PCI: Rename pci_try_reset_bus() to pci_reset_bus()")
Fixes: 409888e096 ("IB/hfi1: Use pci_try_reset_bus() for initiating PCI Secondary Bus Reset")
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Cc: Sinan Kaya <okaya@codeaurora.org>
The pci_reset_bus() function calls pci_probe_reset_slot() to determine
whether to call the slot or bus reset. The check has faulty logic in that
it does not account for pci_probe_reset_slot() being able to return an
errno. Fix by only calling the slot reset when the function returns 0.
Fixes: 811c5cb37d ("PCI: Unify try slot and bus reset API")
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Cc: Sinan Kaya <okaya@codeaurora.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlt1f9AUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vxbdhAArnhRvkwOk4m4/LCuKF6HpmlxbBNC
TjnBCenNf+lFXzWskfDFGFl/Wif4UzGbRTSCNQrwMzj3Ww3f/6R2QIq9rEJvyNC4
VdxQnaBEZSUgN87q5UGqgdjMTo3zFvlFH6fpb5XDiQ5IX/QZeXeYqoB64w+HvKPU
M+IsoOvnA5gb7pMcpchrGUnSfS1e6AqQbbTt6tZflore6YCEA4cH5OnpGx8qiZIp
ut+CMBvQjQB01fHeBc/wGrVte4NwXdONrXqpUb4sHF7HqRNfEh0QVyPhvebBi+k1
kquqoBQfPFTqgcab31VOcQhg70dEx+1qGm5/YBAwmhCpHR/g2gioFXoROsr+iUOe
BtF6LZr+Y8cySuhJnkCrJBqWvvBaKbJLg0KMbI+7p4o9MZpod2u7LS5LFrlRDyKW
3nz3o+b1+v3tCCKVKIhKo0ljolgkweQtR1f6KIHvq93wBODHVQnAOt9NlPfHVyks
ryGBnOhMjoU5hvfexgIWFk9Ph9MEVQSffkI+TeFPO/tyGBfGfQyGtESiXuEaMQaH
FGdZHX2RLkY3pWHOtWeMzRHzOnr2XjpDFcAqL3HBGPdJ30K3Umv3WOgoFe2SaocG
0gaddPjKSwwM4Sa/VP+O5cjGuzi7QnczSDdpYjxIGZzBav32hqx4/rsnLw7bHH8y
XkEme7cYJc8MGsA=
=2Dmn
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci updates from Bjorn Helgaas:
- Decode AER errors with names similar to "lspci" (Tyler Baicar)
- Expose AER statistics in sysfs (Rajat Jain)
- Clear AER status bits selectively based on the type of recovery (Oza
Pawandeep)
- Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST (Alexandru
Gagniuc)
- Don't clear AER status bits if we're using the "Firmware-First"
strategy where firmware owns the registers (Alexandru Gagniuc)
- Use sysfs_match_string() to simplify ASPM sysfs parsing (Andy
Shevchenko)
- Remove unnecessary includes of <linux/pci-aspm.h> (Bjorn Helgaas)
- Defer DPC event handling to work queue (Keith Busch)
- Use threaded IRQ for DPC bottom half (Keith Busch)
- Print AER status while handling DPC events (Keith Busch)
- Work around IDT switch ACS Source Validation erratum (James
Puthukattukaran)
- Emit diagnostics for all cases of PCIe Link downtraining (Links
operating slower than they're capable of) (Alexandru Gagniuc)
- Skip VFs when configuring Max Payload Size (Myron Stowe)
- Reduce Root Port Max Payload Size if necessary when hot-adding a
device below it (Myron Stowe)
- Simplify SHPC existence/permission checks (Bjorn Helgaas)
- Remove hotplug sample skeleton driver (Lukas Wunner)
- Convert pciehp to threaded IRQ handling (Lukas Wunner)
- Improve pciehp tolerance of missed events and initially unstable
links (Lukas Wunner)
- Clear spurious pciehp events on resume (Lukas Wunner)
- Add pciehp runtime PM support, including for Thunderbolt controllers
(Lukas Wunner)
- Support interrupts from pciehp bridges in D3hot (Lukas Wunner)
- Mark fall-through switch cases before enabling -Wimplicit-fallthrough
(Gustavo A. R. Silva)
- Move DMA-debug PCI init from arch code to PCI core (Christoph
Hellwig)
- Fix pci_request_irq() usage of IRQF_ONESHOT when no handler is
supplied (Heiner Kallweit)
- Unify PCI and DMA direction #defines (Shunyong Yang)
- Add PCI_DEVICE_DATA() macro (Andy Shevchenko)
- Check for VPD completion before checking for timeout (Bert Kenward)
- Limit Netronome NFP5000 config space size to work around erratum
(Jakub Kicinski)
- Set IRQCHIP_ONESHOT_SAFE for PCI MSI irqchips (Heiner Kallweit)
- Document ACPI description of PCI host bridges (Bjorn Helgaas)
- Add "pci=disable_acs_redir=" parameter to disable ACS redirection for
peer-to-peer DMA support (we don't have the peer-to-peer support yet;
this is just one piece) (Logan Gunthorpe)
- Clean up devm_of_pci_get_host_bridge_resources() resource allocation
(Jan Kiszka)
- Fixup resizable BARs after suspend/resume (Christian König)
- Make "pci=earlydump" generic (Sinan Kaya)
- Fix ROM BAR access routines to stay in bounds and check for signature
correctly (Rex Zhu)
- Add DMA alias quirk for Microsemi Switchtec NTB (Doug Meyer)
- Expand documentation for pci_add_dma_alias() (Logan Gunthorpe)
- To avoid bus errors, enable PASID only if entire path supports
End-End TLP prefixes (Sinan Kaya)
- Unify slot and bus reset functions and remove hotplug knowledge from
callers (Sinan Kaya)
- Add Function-Level Reset quirks for Intel and Samsung NVMe devices to
fix guest reboot issues (Alex Williamson)
- Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD
Controller (Bjorn Helgaas)
- Remove Xilinx AXI-PCIe host bridge arch dependency (Palmer Dabbelt)
- Remove Aardvark outbound window configuration (Evan Wang)
- Fix Aardvark bridge window sizing issue (Zachary Zhang)
- Convert Aardvark to use pci_host_probe() to reduce code duplication
(Thomas Petazzoni)
- Correct the Cadence cdns_pcie_writel() signature (Alan Douglas)
- Add Cadence support for optional generic PHYs (Alan Douglas)
- Add Cadence power management ops (Alan Douglas)
- Remove redundant variable from Cadence driver (Colin Ian King)
- Add Kirin MSI support (Xiaowei Song)
- Drop unnecessary root_bus_nr setting from exynos, imx6, keystone,
armada8k, artpec6, designware-plat, histb, qcom, spear13xx (Shawn
Guo)
- Move link notification settings from DesignWare core to individual
drivers (Gustavo Pimentel)
- Add endpoint library MSI-X interfaces (Gustavo Pimentel)
- Correct signature of endpoint library IRQ interfaces (Gustavo
Pimentel)
- Add DesignWare endpoint library MSI-X callbacks (Gustavo Pimentel)
- Add endpoint library MSI-X test support (Gustavo Pimentel)
- Remove unnecessary GFP_ATOMIC from Hyper-V "new child" allocation
(Jia-Ju Bai)
- Add more devices to Broadcom PAXC quirk (Ray Jui)
- Work around corrupted Broadcom PAXC config space to enable SMMU and
GICv3 ITS (Ray Jui)
- Disable MSI parsing to work around broken Broadcom PAXC logic in some
devices (Ray Jui)
- Hide unconfigured functions to work around a Broadcom PAXC defect
(Ray Jui)
- Lower iproc log level to reduce console output during boot (Ray Jui)
- Fix mobiveil iomem/phys_addr_t type usage (Lorenzo Pieralisi)
- Fix mobiveil missing include file (Lorenzo Pieralisi)
- Add mobiveil Kconfig/Makefile support (Lorenzo Pieralisi)
- Fix mvebu I/O space remapping issues (Thomas Petazzoni)
- Use generic pci_host_bridge in mvebu instead of ARM-specific API
(Thomas Petazzoni)
- Whitelist VMD devices with fast interrupt handlers to avoid sharing
vectors with slow handlers (Keith Busch)
* tag 'pci-v4.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (153 commits)
PCI/AER: Don't clear AER bits if error handling is Firmware-First
PCI: Limit config space size for Netronome NFP5000
PCI/MSI: Set IRQCHIP_ONESHOT_SAFE for PCI-MSI irqchips
PCI/VPD: Check for VPD access completion before checking for timeout
PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry
PCI: Match Root Port's MPS to endpoint's MPSS as necessary
PCI: Skip MPS logic for Virtual Functions (VFs)
PCI: Add function 1 DMA alias quirk for Marvell 88SS9183
PCI: Check for PCIe Link downtraining
PCI: Add ACS Redirect disable quirk for Intel Sunrise Point
PCI: Add device-specific ACS Redirect disable infrastructure
PCI: Convert device-specific ACS quirks from NULL termination to ARRAY_SIZE
PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support
PCI: Allow specifying devices using a base bus and path of devfns
PCI: Make specifying PCI devices in kernel parameters reusable
PCI: Hide ACS quirk declarations inside PCI core
PCI: Delay after FLR of Intel DC P3700 NVMe
PCI: Disable Samsung SM961/PM961 NVMe before FLR
PCI: Export pcie_has_flr()
PCI: mvebu: Drop bogus comment above mvebu_pcie_map_registers()
...
- To avoid bus errors, enable PASID only if entire path supports End-End
TLP prefixes (Sinan Kaya)
- Unify slot and bus reset functions and remove hotplug knowledge from
callers (Sinan Kaya)
- Add Function-Level Reset quirks for Intel and Samsung NVMe devices to
fix guest reboot issues (Alex Williamson)
- Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD Controller
(Bjorn Helgaas)
* pci/virtualization:
PCI: Add function 1 DMA alias quirk for Marvell 88SS9183
PCI: Delay after FLR of Intel DC P3700 NVMe
PCI: Disable Samsung SM961/PM961 NVMe before FLR
PCI: Export pcie_has_flr()
PCI: Rename pci_try_reset_bus() to pci_reset_bus()
PCI: Deprecate pci_reset_bus() and pci_reset_slot() functions
PCI: Unify try slot and bus reset API
PCI: Hide pci_reset_bridge_secondary_bus() from drivers
IB/hfi1: Use pci_try_reset_bus() for initiating PCI Secondary Bus Reset
PCI: Handle error return from pci_reset_bridge_secondary_bus()
PCI/IOV: Tidy pci_sriov_set_totalvfs()
PCI: Enable PASID only if entire path supports End-End TLP prefixes
# Conflicts:
# drivers/pci/hotplug/pciehp_hpc.c
- Add DMA alias quirk for Microsemi Switchtec NTB (Doug Meyer)
- Expand documentation for pci_add_dma_alias() (Logan Gunthorpe)
* pci/switchtec:
PCI: Expand documentation for pci_add_dma_alias()
PCI: Add DMA alias quirk for Microsemi Switchtec NTB
switchtec: Use generic PCI Vendor ID and Class Code
# Conflicts:
# drivers/pci/quirks.c
- Clean up devm_of_pci_get_host_bridge_resources() resource allocation
(Jan Kiszka)
- Fixup resizable BARs after suspend/resume (Christian König)
- Make "pci=earlydump" generic (Sinan Kaya)
- Fix ROM BAR access routines to stay in bounds and check for signature
correctly (Rex Zhu)
* pci/resource:
PCI: Make pci_get_rom_size() static
PCI: Add check code for last image indicator not set
PCI: Avoid accessing memory outside the ROM BAR
PCI: Make early dump functionality generic
PCI: Cleanup PCI_REBAR_CTRL_BAR_SHIFT handling
PCI: Restore resized BAR state on resume
PCI: Clean up resource allocation in devm_of_pci_get_host_bridge_resources()
# Conflicts:
# Documentation/admin-guide/kernel-parameters.txt
- Add "pci=disable_acs_redir=" parameter to disable ACS redirection for
peer-to-peer DMA support (we don't have the peer-to-peer support yet;
this is just one piece) (Logan Gunthorpe)
* pci/peer-to-peer:
PCI: Add ACS Redirect disable quirk for Intel Sunrise Point
PCI: Add device-specific ACS Redirect disable infrastructure
PCI: Convert device-specific ACS quirks from NULL termination to ARRAY_SIZE
PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support
PCI: Allow specifying devices using a base bus and path of devfns
PCI: Make specifying PCI devices in kernel parameters reusable
PCI: Hide ACS quirk declarations inside PCI core
- Mark fall-through switch cases before enabling -Wimplicit-fallthrough
(Gustavo A. R. Silva)
- Move DMA-debug PCI init from arch code to PCI core (Christoph Hellwig)
- Fix pci_request_irq() usage of IRQF_ONESHOT when no handler is supplied
(Heiner Kallweit)
- Unify PCI and DMA direction #defines (Shunyong Yang)
- Add PCI_DEVICE_DATA() macro (Andy Shevchenko)
- Check for VPD completion before checking for timeout (Bert Kenward)
- Limit Netronome NFP5000 config space size to work around erratum (Jakub
Kicinski)
* pci/misc:
PCI: Limit config space size for Netronome NFP5000
PCI/VPD: Check for VPD access completion before checking for timeout
PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry
PCI: Unify PCI and normal DMA direction definitions
PCI: Use IRQF_ONESHOT if pci_request_irq() called with no handler
PCI: Call dma_debug_add_bus() for pci_bus_type from PCI core
PCI: Mark fall-through switch cases before enabling -Wimplicit-fallthrough
# Conflicts:
# drivers/pci/hotplug/pciehp_ctrl.c
- Work around IDT switch ACS Source Validation erratum (James
Puthukattukaran)
- Emit diagnostics for all cases of PCIe Link downtraining (Links
operating slower than they're capable of) (Alexandru Gagniuc)
- Skip VFs when configuring Max Payload Size (Myron Stowe)
- Reduce Root Port Max Payload Size if necessary when hot-adding a device
below it (Myron Stowe)
* pci/enumeration:
PCI: Match Root Port's MPS to endpoint's MPSS as necessary
PCI: Skip MPS logic for Virtual Functions (VFs)
PCI: Check for PCIe Link downtraining
PCI: Workaround IDT switch ACS Source Validation erratum
- Use sysfs_match_string() to simplify ASPM sysfs parsing (Andy
Shevchenko)
- Remove unnecessary includes of <linux/pci-aspm.h> (Bjorn Helgaas)
* pci/aspm:
PCI: Remove unnecessary include of <linux/pci-aspm.h>
iwlwifi: Remove unnecessary include of <linux/pci-aspm.h>
ath9k: Remove unnecessary include of <linux/pci-aspm.h>
igb: Remove unnecessary include of <linux/pci-aspm.h>
PCI/ASPM: Convert to use sysfs_match_string() helper
When both ends of a PCIe Link are capable of a higher bandwidth than is
currently in use, the Link is said to be "downtrained". A downtrained Link
may indicate hardware or configuration problems in the system, but it's
hard to identify such Links from userspace.
Refactor pcie_print_link_status() so it continues to always print PCIe
bandwidth information, as several NIC drivers desire.
Add a new internal __pcie_print_link_status() to emit a message only when a
device's bandwidth is constrained by the fabric and call it from the PCI
core for all devices, which identifies all downtrained Links. It also
emits messages for a few cases that are technically not downtrained, such
as a x4 device in an open-ended x1 slot.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
[bhelgaas: changelog, move __pcie_print_link_status() declaration to
drivers/pci/, rename pcie_check_upstream_link() to
pcie_report_downtraining()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Intel Sunrise Point (SPT) PCH hardware has an implementation of the ACS
bits that does not comply with the PCIe standard. To deal with this we
need device-specific quirks to disable ACS redirection.
Add a new pci_dev_specific_disable_acs_redir() quirk and a new
.disable_acs_redir() function pointer for use by non-compliant devices. No
functional change intended.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: split to separate patch, move
pci_dev_specific_disable_acs_redir() declarations to drivers/pci/pci.h]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
To support peer-to-peer traffic on a segment of the PCI hierarchy, we must
disable the ACS redirect bits for select PCI bridges. The bridges must be
selected before the devices are discovered by the kernel and the IOMMU
groups created. Therefore, add a kernel command line parameter to specify
devices which must have their ACS bits disabled.
The new parameter takes a list of devices separated by a semicolon. Each
device specified will have its ACS redirect bits disabled. This is
similar to the existing 'resource_alignment' parameter.
The ACS Request P2P Request Redirect, P2P Completion Redirect and P2P
Egress Control bits are disabled, which is sufficient to always allow
passing P2P traffic uninterrupted. The bits are set after the kernel
(optionally) enables the ACS bits itself. It is also done regardless of
whether the kernel or platform firmware sets the bits.
If the user tries to disable the ACS redirect for a device without the ACS
capability, print a warning to dmesg.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: reorder to add the generic code first and move the
device-specific quirk to subsequent patches]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
When specifying PCI devices on the kernel command line using a
bus/device/function address, bus numbers can change when adding or
replacing a device, changing motherboard firmware, or applying kernel
parameters like "pci=assign-buses". When bus numbers change, it's likely
the command line tweak will be applied to the wrong device.
Therefore, it is useful to be able to specify devices with a base bus
number and the path of devfns needed to get to it, similar to the "device
scope" structure in the Intel VT-d spec, Section 8.3.1.
Thus, we add an option to specify devices in the following format:
[<domain>:]<bus>:<device>.<func>[/<device>.<func>]*
The path can be any segment within the PCI hierarchy of any length and
determined through the use of 'lspci -t'. When specified this way, it is
less likely that a renumbered bus will result in a valid device
specification and the tweak won't be applied to the wrong device.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: use "device" instead of "slot" in documentation since that's the
usual language in the PCI specs]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
Separate out the code to match a PCI device with a string (typically
originating from a kernel parameter) from the
pci_specified_resource_alignment() function into its own helper function.
While we are at it, this change fixes the kernel style of the function
(fixing a number of long lines and extra parentheses).
Additionally, make the analogous change to the kernel parameter
documentation: Separate the description of how to specify a PCI device
into its own section at the head of the "pci=" parameter.
This patch should have no functional alterations.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: use "device" instead of "slot" in documentation since that's the
usual language in the PCI specs]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
pcie_flr() suggests pcie_has_flr() to ensure that PCIe FLR support is
present prior to calling. pcie_flr() is exported while pcie_has_flr()
is not. Resolve this.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Several PCI core files include pci-aspm.h even though they don't need
anything provided by that file. Remove the unnecessary includes of it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Thunderbolt controllers can be runtime suspended to D3cold to save ~1.5W.
This requires that runtime D3 is allowed on its PCIe ports, so whitelist
them.
The 2015 BIOS cutoff that we've instituted for runtime D3 on PCIe ports
is unnecessary on Thunderbolt because we know that even the oldest
controller, Light Ridge (2010), is able to suspend its ports to D3 just
fine -- specifically including its hotplug ports. And the power saving
should be afforded to machines even if their BIOS predates 2015.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Andreas Noever <andreas.noever@gmail.com>
Previously we blacklisted PCIe hotplug ports for runtime D3 because:
(a) Ports handled by the firmware must not be transitioned to D3 by the
OS behind the firmware's back:
https://bugzilla.kernel.org/show_bug.cgi?id=53811
(b) Ports handled natively by the OS lacked runtime D3 support in the
pciehp driver.
We've just rectified the latter, so allow users to manually enable and
test it by passing pcie_port_pm=force on the command line. Vendors are
thus put in a position to validate hotplug ports for runtime D3 and
perhaps we can someday enable it by default, but with a BIOS cutoff date.
Ashok Raj tested runtime D3 on hotplug ports of a SkyLake Xeon-SP in
2017 and encountered Hardware Error NMIs, so this feature clearly cannot
be enabled for everyone yet:
https://lkml.kernel.org/r/20170503180426.GA4058@otc-nc-03
While at it, remove an erroneous code comment I added with 97a90aee5d
("PCI: Consolidate conditions to allow runtime PM on PCIe ports") which
claims that parents of a hotplug port must stay awake lest interrupts
cannot be delivered. That has turned out to be wrong at least for
Thunderbolt hotplug ports.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Now that the old implementation of pci_reset_bus() is gone, replace
pci_try_reset_bus() with pci_reset_bus().
Compared to the old implementation, new code will fail immmediately with
-EAGAIN if object lock cannot be obtained.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pci_reset_bus() and pci_reset_slot() functions are not being used by any
code. Remove them from the kernel in favor of pci_try_reset_bus() and
pci_try_reset_slot() functions.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Drivers are expected to call pci_try_reset_slot() or pci_try_reset_bus() by
querying if a system supports hotplug or not. A survey showed that most
drivers don't do this and we are leaking hotplug capability to the user.
Hide pci_try_slot_reset() from drivers and embed into pci_try_bus_reset().
Change pci_try_reset_bus() parameter from struct pci_bus to struct pci_dev.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Rename pci_reset_bridge_secondary_bus() to pci_bridge_secondary_bus_reset()
and move the declaration from linux/pci.h to drivers/pci.h to be used
internally in PCI directory only.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Commit 01fd61c0b9 ("PCI: Add a return type for
pci_reset_bridge_secondary_bus()") added a return value to the function to
return if a device is accessible following a reset. Callers are not
checking the value.
Pass error code up high in the stack if device is not accessible.
Fixes: 01fd61c0b9 ("PCI: Add a return type for pci_reset_bridge_secondary_bus()")
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where
we are expecting to fall through.
Warning level 2 was used: -Wimplicit-fallthrough=2
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
So drivers can use them. This can be used to replace
duplicate code in the drm subsystem.
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Seeing there's been some confusion about the use of pci_add_dma_alias(),
expand the comment to describe why it must be called early and how
early it must be called.
Also, expand on the purpose of this function and common reasons it would
be used.
[The comment was reworded to some extent by Alex Williamson]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Doug Meyer <dmeyer@gigaio.com>
Move early dump functionality into common code so that it is available for
all architectures. No need to carry arch-specific reads around as the read
hooks are already initialized by the time pci_setup_device() is getting
called during scan.
Tested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cleanup PCI_REBAR_CTRL_BAR_SHIFT handling. That was hard coded instead of
properly defined in the header for some reason.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Resize BARs after resume to the expected size again.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=199959
Fixes: d6895ad39f ("drm/amdgpu: resize VRAM BAR for CPU access v6")
Fixes: 276b738deb ("PCI: Add resizable BAR infrastructure")
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v4.15+
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlsZdg0UHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vwJOBAAsuuWsOdiJRRhQLU5WfEMFgzcL02R
gsumqZkK7E8LOq0DPNMtcgv9O0KgYZyCiZyTMJ8N7sEYohg04lMz8mtYXOibjcwI
p+nVMko8jQXV9FXwSMGVqigEaLLcrbtkbf/mPriD63DDnRMa/+/Jh15SwfLTydIH
QRTJbIxkS3EiOauj5C8QY3UwzjlvV9mDilzM/x+MSK27k2HFU9Pw/3lIWHY716rr
grPZTwBTfIT+QFZjwOm6iKzHjxRM830sofXARkcH4CgSNaTeq5UbtvAs293MHvc+
v/v/1dfzUh00NxfZDWKHvTUMhjazeTeD9jEVS7T+HUcGzvwGxMSml6bBdznvwKCa
46ynePOd1VcEBlMYYS+P4njRYBLWeUwt6/TzqR4yVwb0keQ6Yj3Y9H2UpzscYiCl
O+0qz6RwyjKY0TpxfjoojgHn4U5ByI5fzVDJHbfr2MFTqqRNaabVrfl6xU4sVuhh
OluT5ym+/dOCTI/wjlolnKNb0XThVre8e2Busr3TRvuwTMKMIWqJ9sXLovntdbqE
furPD/UnuZHkjSFhQ1SQwYdWmsZI5qAq2C9haY8sEWsXEBEcBGLJ2BEleMxm8UsL
KXuy4ER+R4M+sFtCkoWf3D4NTOBUdPHi4jyk6Ooo1idOwXCsASVvUjUEG5YcQC6R
kpJ1VPTKK1XN64I=
=aFAi
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- unify AER decoding for native and ACPI CPER sources (Alexandru
Gagniuc)
- add TLP header info to AER tracepoint (Thomas Tai)
- add generic pcie_wait_for_link() interface (Oza Pawandeep)
- handle AER ERR_FATAL by removing and re-enumerating devices, as
Downstream Port Containment does (Oza Pawandeep)
- factor out common code between AER and DPC recovery (Oza Pawandeep)
- stop triggering DPC for ERR_NONFATAL errors (Oza Pawandeep)
- share ERR_FATAL recovery path between AER and DPC (Oza Pawandeep)
- disable ASPM L1.2 substate if we don't have LTR (Bjorn Helgaas)
- respect platform ownership of LTR (Bjorn Helgaas)
- clear interrupt status in top half to avoid interrupt storm (Oza
Pawandeep)
- neaten pci=earlydump output (Andy Shevchenko)
- avoid errors when extended config space inaccessible (Gilles Buloz)
- prevent sysfs disable of device while driver attached (Christoph
Hellwig)
- use core interface to report PCIe link properties in bnx2x, bnxt_en,
cxgb4, ixgbe (Bjorn Helgaas)
- remove unused pcie_get_minimum_link() (Bjorn Helgaas)
- fix use-before-set error in ibmphp (Dan Carpenter)
- fix pciehp timeouts caused by Command Completed errata (Bjorn
Helgaas)
- fix refcounting in pnv_php hotplug (Julia Lawall)
- clear pciehp Presence Detect and Data Link Layer Status Changed on
resume so we don't miss hotplug events (Mika Westerberg)
- only request pciehp control if we support it, so platform can use
ACPI hotplug otherwise (Mika Westerberg)
- convert SHPC to be builtin only (Mika Westerberg)
- request SHPC control via _OSC if we support it (Mika Westerberg)
- simplify SHPC handoff from firmware (Mika Westerberg)
- fix an SHPC quirk that mistakenly included *all* AMD bridges as well
as devices from any vendor with device ID 0x7458 (Bjorn Helgaas)
- assign a bus number even to non-native hotplug bridges to leave
space for acpiphp additions, to fix a common Thunderbolt xHCI
hot-add failure (Mika Westerberg)
- keep acpiphp from scanning native hotplug bridges, to fix common
Thunderbolt hot-add failures (Mika Westerberg)
- improve "partially hidden behind bridge" messages from core (Mika
Westerberg)
- add macros for PCIe Link Control 2 register (Frederick Lawler)
- replace IB/hfi1 custom macros with PCI core versions (Frederick
Lawler)
- remove dead microblaze and xtensa code (Bjorn Helgaas)
- use dev_printk() when possible in xtensa and mips (Bjorn Helgaas)
- remove unused pcie_port_acpi_setup() and portdrv_acpi.c (Bjorn
Helgaas)
- add managed interface to get PCI host bridge resources from OF (Jan
Kiszka)
- add support for unbinding generic PCI host controller (Jan Kiszka)
- fix memory leaks when unbinding generic PCI host controller (Jan
Kiszka)
- request legacy VGA framebuffer only for VGA devices to avoid false
device conflicts (Bjorn Helgaas)
- turn on PCI_COMMAND_IO & PCI_COMMAND_MEMORY in pci_enable_device()
like everybody else, not in pcibios_fixup_bus() (Bjorn Helgaas)
- add generic enable function for simple SR-IOV hardware (Alexander
Duyck)
- use generic SR-IOV enable for ena, nvme (Alexander Duyck)
- add ACS quirk for Intel 7th & 8th Gen mobile (Alex Williamson)
- add ACS quirk for Intel 300 series (Mika Westerberg)
- enable register clock for Armada 7K/8K (Gregory CLEMENT)
- reduce Keystone "link already up" log level (Fabio Estevam)
- move private DT functions to drivers/pci/ (Rob Herring)
- factor out dwc CONFIG_PCI Kconfig dependencies (Rob Herring)
- add DesignWare support to the endpoint test driver (Gustavo
Pimentel)
- add DesignWare support for endpoint mode (Gustavo Pimentel)
- use devm_ioremap_resource() instead of devm_ioremap() in dra7xx and
artpec6 (Gustavo Pimentel)
- fix Qualcomm bitwise NOT issue (Dan Carpenter)
- add Qualcomm runtime PM support (Srinivas Kandagatla)
- fix DesignWare enumeration below bridges (Koen Vandeputte)
- use usleep() instead of mdelay() in endpoint test (Jia-Ju Bai)
- add configfs entries for pci_epf_driver device IDs (Kishon Vijay
Abraham I)
- clean up pci_endpoint_test driver (Gustavo Pimentel)
- update Layerscape maintainer email addresses (Minghuan Lian)
- add COMPILE_TEST to improve build test coverage (Rob Herring)
- fix Hyper-V bus registration failure caused by domain/serial number
confusion (Sridhar Pitchai)
- improve Hyper-V refcounting and coding style (Stephen Hemminger)
- avoid potential Hyper-V hang waiting for a response that will never
come (Dexuan Cui)
- implement Mediatek chained IRQ handling (Honghui Zhang)
- fix vendor ID & class type for Mediatek MT7622 (Honghui Zhang)
- add Mobiveil PCIe host controller driver (Subrahmanya Lingappa)
- add Mobiveil MSI support (Subrahmanya Lingappa)
- clean up clocks, MSI, IRQ mappings in R-Car probe failure paths
(Marek Vasut)
- poll more frequently (5us vs 5ms) while waiting for R-Car data link
active (Marek Vasut)
- use generic OF parsing interface in R-Car (Vladimir Zapolskiy)
- add R-Car V3H (R8A77980) "compatible" string (Sergei Shtylyov)
- add R-Car gen3 PHY support (Sergei Shtylyov)
- improve R-Car PHYRDY polling (Sergei Shtylyov)
- clean up R-Car macros (Marek Vasut)
- use runtime PM for R-Car controller clock (Dien Pham)
- update arm64 defconfig for Rockchip (Shawn Lin)
- refactor Rockchip code to facilitate both root port and endpoint
mode (Shawn Lin)
- add Rockchip endpoint mode driver (Shawn Lin)
- support VMD "membar shadow" feature (Jon Derrick)
- support VMD bus number offsets (Jon Derrick)
- add VMD "no AER source ID" quirk for more device IDs (Jon Derrick)
- remove unnecessary host controller CONFIG_PCIEPORTBUS Kconfig
selections (Bjorn Helgaas)
- clean up quirks.c organization and whitespace (Bjorn Helgaas)
* tag 'pci-v4.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (144 commits)
PCI/AER: Replace struct pcie_device with pci_dev
PCI/AER: Remove unused parameters
PCI: qcom: Include gpio/consumer.h
PCI: Improve "partially hidden behind bridge" log message
PCI: Improve pci_scan_bridge() and pci_scan_bridge_extend() doc
PCI: Move resource distribution for single bridge outside loop
PCI: Account for all bridges on bus when distributing bus numbers
ACPI / hotplug / PCI: Drop unnecessary parentheses
ACPI / hotplug / PCI: Mark stale PCI devices disconnected
ACPI / hotplug / PCI: Don't scan bridges managed by native hotplug
PCI: hotplug: Add hotplug_is_native()
PCI: shpchp: Add shpchp_is_native()
PCI: shpchp: Fix AMD POGO identification
PCI: mobiveil: Add MSI support
PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP driver
PCI/AER: Decode Error Source Requester ID
PCI/AER: Remove aer_recover_work_func() forward declaration
PCI/DPC: Use the generic pcie_do_fatal_recovery() path
PCI/AER: Pass service type to pcie_do_fatal_recovery()
PCI/DPC: Disable ERR_NONFATAL handling by DPC
...
- add generic enable function for simple SR-IOV hardware (Alexander
Duyck)
- use generic SR-IOV enable for ena, nvme (Alexander Duyck)
- add ACS quirk for Intel 7th & 8th Gen mobile (Alex Williamson)
- add ACS quirk for Intel 300 series (Mika Westerberg)
* pci/virtualization:
PCI/IOV: Allow PF drivers to limit total_VFs to 0
PCI: Add "pci=noats" boot parameter
PCI: Add ACS quirk for Intel 300 series
PCI: Add ACS quirk for Intel 7th & 8th Gen mobile
nvme-pci: Use pci_sriov_configure_simple() to enable VFs
net: ena: Use pci_sriov_configure_simple() to enable VFs
PCI/IOV: Add pci-pf-stub driver for PFs that only enable VFs
PCI/IOV: Add pci_sriov_configure_simple()
- neaten pci=earlydump output (Andy Shevchenko)
- avoid errors when extended config space inaccessible (Gilles Buloz)
- prevent sysfs disable of device while driver attached (Christoph
Hellwig)
- use core interface to report PCIe link properties in bnx2x, bnxt_en,
cxgb4, ixgbe (Bjorn Helgaas)
- remove unused pcie_get_minimum_link() (Bjorn Helgaas)
* pci/enumeration:
PCI: Remove unused pcie_get_minimum_link()
ixgbe: Report PCIe link properties with pcie_print_link_status()
cxgb4: Report PCIe link properties with pcie_print_link_status()
bnxt_en: Report PCIe link properties with pcie_print_link_status()
bnx2x: Report PCIe link properties with pcie_print_link_status()
PCI: Prevent sysfs disable of device while driver is attached
PCI: Check whether bridges allow access to extended config space
x86/PCI: Make pci=earlydump output neat
In some cases pcie_get_minimum_link() returned misleading information
because it found the slowest link and the narrowest link without
considering the total bandwidth of the link.
For example, consider a path with these two links:
- 16.0 GT/s x1 link (16.0 * 10^9 * 128 / 130) * 1 / 8 = 1969 MB/s
- 2.5 GT/s x16 link ( 2.5 * 10^9 * 8 / 10) * 16 / 8 = 4000 MB/s
The available bandwidth of the path is limited by the 16 GT/s link to about
1969 MB/s, but pcie_get_minimum_link() returned 2.5 GT/s x1, which
corresponds to only 250 MB/s.
Callers should use pcie_print_link_status() instead, or
pcie_bandwidth_available() if they need more detailed information.
Remove pcie_get_minimum_link() since there are no callers left.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Two comments in pci_target_state() are outdated, as the function
doesn't set the target power state for the device any more, only
finds one for it, so fix them accordingly.
Reported-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Clients such as hotplug and Downstream Port Containment (DPC) both need to
wait until a link becomes active or inactive.
Add a generic pcie_wait_link_active() interface and use it instead of
duplicating the code.
Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
The only user of pci_get_new_domain_nr() is of_pci_bus_find_domain_nr().
Since they are defined in the same file, pci_get_new_domain_nr() can be
made static, which also simplifies preprocessor conditionals.
No functional change intended.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Adds a "pci=noats" boot parameter. When supplied, all ATS related
functions fail immediately and the IOMMU is configured to not use
device-IOTLB.
Any function that checks for ATS capabilities directly against the devices
should also check this flag. Currently, such functions exist only in IOMMU
drivers, and they are covered by this patch.
The motivation behind this patch is the existence of malicious devices.
Lots of research has been done about how to use the IOMMU as protection
from such devices. When ATS is supported, any I/O device can access any
physical address by faking device-IOTLB entries. Adding the ability to
ignore these entries lets sysadmins enhance system security.
Signed-off-by: Gil Kupfer <gilkup@cs.technion.ac.il>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Commit 0847684cfc (PCI / PM: Simplify device wakeup settings code)
went too far and dropped the device_may_wakeup() check from
pci_enable_wake() which causes wakeup to be enabled during system
suspend, hibernation or shutdown for some PCI devices that are not
allowed by user space to wake up the system from sleep (or power off).
As a result of this, excessive power is drawn by some of the affected
systems while in sleep states or off.
Restore the device_may_wakeup() check in pci_enable_wake(), but make
sure that the PCI bus type's runtime suspend callback will not call
device_may_wakeup() which is about system wakeup from sleep and not
about device wakeup from runtime suspend.
Fixes: 0847684cfc (PCI / PM: Simplify device wakeup settings code)
Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Cc: 4.13+ <stable@vger.kernel.org> # 4.13+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
USB controller ASM1042 stops working after commit de3ef1eb1c (PM /
core: Drop run_wake flag from struct dev_pm_info).
The device in question is not power managed by platform firmware,
furthermore, it only supports PME# from D3cold:
Capabilities: [78] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Before commit de3ef1eb1c, the device never gets runtime suspended.
After that commit, the device gets runtime suspended to D3hot, which can
not generate any PME#.
usb_hcd_pci_probe() unconditionally calls device_wakeup_enable(), hence
device_can_wakeup() in pci_dev_run_wake() always returns true.
So pci_dev_run_wake() needs to check PME wakeup capability as its first
condition.
In addition, change wakeup flag passed to pci_target_state() from false
to true, because we want to find the deepest state different from D3cold
that the device can still generate PME#. In this case, it's D0 for the
device in question.
Fixes: de3ef1eb1c (PM / core: Drop run_wake flag from struct dev_pm_info)
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: 4.13+ <stable@vger.kernel.org> # 4.13+
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently the pcie_print_link_status() will print PCIe bandwidth and link
width information but does not mention it is pertaining to the PCIe. Since
this and related functions are used exclusively by networking drivers today
users may get confused into thinking that it's the NIC bandwidth that is
being talked about. Insert a "PCIe" into the messages.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When reassigning device resources to increase their alignment, e.g.,
because of a "pci=resource_alignment=" kernel parameter or because the
platform aligns resources to its page size, we previously emitted messages
like this:
pci 0000:00:00.0: Disabling memory decoding and releasing memory resources
pci 0000:00:00.0: disabling bridge mem windows
These messages don't convey any useful information, so remove them.
Fixes: 3827463769 ("powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned")
Signed-off-by: Desnes A. Nunes do Rosario <desnesn@linux.vnet.ibm.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlrHeY8UHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vxhLRAAndV/0NDyWZU0eZNM6twri2SEFnF7
E4ar+YthxDxxJG4TLJbIA12jc5NgHZy4WuttDa6Jb99KreBXIHJFlNi/V/tme6zf
+yXUuxWae7wJzBiaay57VqLGSc80gt/LTgjLa1siwQqjTbO3wSXR6JJXNaE9FtQ4
/jL61t8bD1Peb5cWTpt9p0hrnKI0/pHwASdReyFS4F/HDKdvpof7BxE/OU3HSxxA
XKC2v6RjY4S93vkzvApDXQ+vhKquVRK7/ojyTXQUO/GIzcARprO7H4k62N4ar0x/
qbXLkR8IMkwA8ecsNmcL92ftb/cXoHfd+wdK8WpijqzF4kW4SdteVWbIhUzI0gbr
0gjDYIzjplvH3pZGv/qvx+8sFtAP95OdPjuAAW2qJ9TCVfmiS8naNFCvcxg87RhD
gjyQD3If1X7F8wy309lhq7VNyRexTHgIMgTXHyFvuZMzn/Qe1huL2XCwDcEAg/OX
AvU2iuSE5tWAh7gIUMF/aWi3uoeJUyyoru5ZR//gqdFfx9YxpSimO1UDXnpPi8SR
Iz/jzHJc0aWGYdQ9l6HiSbJF3P/QQcWYs9igt0A7BRGB05SPdWCh7sSO70FJa8ME
f4WID5/qEiaH26kiSRX4cUqpc8Amk8bT0DXw2OT57qy3JM0ZdV5ENQX11pSpr9hv
uLEf0DU7AEmdvzQ=
=T++R
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- move pci_uevent_ers() out of pci.h (Michael Ellerman)
- skip ASPM common clock warning if BIOS already configured it (Sinan
Kaya)
- fix ASPM Coverity warning about threshold_ns (Gustavo A. R. Silva)
- remove last user of pci_get_bus_and_slot() and the function itself
(Sinan Kaya)
- add decoding for 16 GT/s link speed (Jay Fang)
- add interfaces to get max link speed and width (Tal Gilboa)
- add pcie_bandwidth_capable() to compute max supported link bandwidth
(Tal Gilboa)
- add pcie_bandwidth_available() to compute bandwidth available to
device (Tal Gilboa)
- add pcie_print_link_status() to log link speed and whether it's
limited (Tal Gilboa)
- use PCI core interfaces to report when device performance may be
limited by its slot instead of doing it in each driver (Tal Gilboa)
- fix possible cpqphp NULL pointer dereference (Shawn Lin)
- rescan more of the hierarchy on ACPI hotplug to fix Thunderbolt/xHCI
hotplug (Mika Westerberg)
- add support for PCI I/O port space that's neither directly accessible
via CPU in/out instructions nor directly mapped into CPU physical
memory space. This is fairly intrusive and includes minor changes to
interfaces used for I/O space on most platforms (Zhichang Yuan, John
Garry)
- add support for HiSilicon Hip06/Hip07 LPC I/O space (Zhichang Yuan,
John Garry)
- use PCI_EXP_DEVCTL2_COMP_TIMEOUT in rapidio/tsi721 (Bjorn Helgaas)
- remove possible NULL pointer dereference in of_pci_bus_find_domain_nr()
(Shawn Lin)
- report quirk timings with dev_info (Bjorn Helgaas)
- report quirks that take longer than 10ms (Bjorn Helgaas)
- add and use Altera Vendor ID (Johannes Thumshirn)
- tidy Makefiles and comments (Bjorn Helgaas)
- don't set up INTx if MSI or MSI-X is enabled to align cris, frv,
ia64, and mn10300 with x86 (Bjorn Helgaas)
- move pcieport_if.h to drivers/pci/pcie/ to encapsulate it (Frederick
Lawler)
- merge pcieport_if.h into portdrv.h (Bjorn Helgaas)
- move workaround for BIOS PME issue from portdrv to PCI core (Bjorn
Helgaas)
- completely disable portdrv with "pcie_ports=compat" (Bjorn Helgaas)
- remove portdrv link order dependency (Bjorn Helgaas)
- remove support for unused VC portdrv service (Bjorn Helgaas)
- simplify portdrv feature permission checking (Bjorn Helgaas)
- remove "pcie_hp=nomsi" parameter (use "pci=nomsi" instead) (Bjorn
Helgaas)
- remove unnecessary "pcie_ports=auto" parameter (Bjorn Helgaas)
- use cached AER capability offset (Frederick Lawler)
- don't enable DPC if BIOS hasn't granted AER control (Mika Westerberg)
- rename pcie-dpc.c to dpc.c (Bjorn Helgaas)
- use generic pci_mmap_resource_range() instead of powerpc and xtensa
arch-specific versions (David Woodhouse)
- support arbitrary PCI host bridge offsets on sparc (Yinghai Lu)
- remove System and Video ROM reservations on sparc (Bjorn Helgaas)
- probe for device reset support during enumeration instead of runtime
(Bjorn Helgaas)
- add ACS quirk for Ampere (née APM) root ports (Feng Kan)
- add function 1 DMA alias quirk for Marvell 88SE9220 (Thomas
Vincent-Cross)
- protect device restore with device lock (Sinan Kaya)
- handle failure of FLR gracefully (Sinan Kaya)
- handle CRS (config retry status) after device resets (Sinan Kaya)
- skip various config reads for SR-IOV VFs as an optimization
(KarimAllah Ahmed)
- consolidate VPD code in vpd.c (Bjorn Helgaas)
- add Tegra dependency on PCI_MSI_IRQ_DOMAIN (Arnd Bergmann)
- add DT support for R-Car r8a7743 (Biju Das)
- fix a PCI_EJECT vs PCI_BUS_RELATIONS race condition in Hyper-V host
bridge driver that causes a general protection fault (Dexuan Cui)
- fix Hyper-V host bridge hang in MSI setup on 1-vCPU VMs with SR-IOV
(Dexuan Cui)
- fix Hyper-V host bridge hang when ejecting a VF before setting up MSI
(Dexuan Cui)
- make several structures static (Fengguang Wu)
- increase number of MSI IRQs supported by Synopsys DesignWare bridges
from 32 to 256 (Gustavo Pimentel)
- implemented multiplexed IRQ domain API and remove obsolete MSI IRQ
API from DesignWare drivers (Gustavo Pimentel)
- add Tegra power management support (Manikanta Maddireddy)
- add Tegra loadable module support (Manikanta Maddireddy)
- handle 64-bit BARs correctly in endpoint support (Niklas Cassel)
- support optional regulator for HiSilicon STB (Shawn Guo)
- use regulator bulk API for Qualcomm apq8064 (Srinivas Kandagatla)
- support power supplies for Qualcomm msm8996 (Srinivas Kandagatla)
* tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (123 commits)
MAINTAINERS: Add John Garry as maintainer for HiSilicon LPC driver
HISI LPC: Add ACPI support
ACPI / scan: Do not enumerate Indirect IO host children
ACPI / scan: Rename acpi_is_serial_bus_slave() for more general use
HISI LPC: Support the LPC host on Hip06/Hip07 with DT bindings
of: Add missing I/O range exception for indirect-IO devices
PCI: Apply the new generic I/O management on PCI IO hosts
PCI: Add fwnode handler as input param of pci_register_io_range()
PCI: Remove __weak tag from pci_register_io_range()
MAINTAINERS: Add missing /drivers/pci/cadence directory entry
fm10k: Report PCIe link properties with pcie_print_link_status()
net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth
net/mlx5: Report PCIe link properties with pcie_print_link_status()
net/mlx4_core: Report PCIe link properties with pcie_print_link_status()
PCI: Add pcie_print_link_status() to log link speed and whether it's limited
PCI: Add pcie_bandwidth_available() to compute bandwidth available to device
misc: pci_endpoint_test: Handle 64-bit BARs properly
PCI: designware-ep: Make dw_pcie_ep_reset_bar() handle 64-bit BARs properly
PCI: endpoint: Make sure that BAR_5 does not have 64-bit flag set when clearing
PCI: endpoint: Make epc->ops->clear_bar()/pci_epc_clear_bar() take struct *epf_bar
...
- probe for device reset support during enumeration instead of runtime
(Bjorn Helgaas)
- add ACS quirk for Ampere (née APM) root ports (Feng Kan)
- add function 1 DMA alias quirk for Marvell 88SE9220 (Thomas
Vincent-Cross)
- protect device restore with device lock (Sinan Kaya)
- handle failure of FLR gracefully (Sinan Kaya)
- handle CRS (config retry status) after device resets (Sinan Kaya)
- skip various config reads for SR-IOV VFs as an optimization (KarimAllah
Ahmed)
* pci/virtualization:
PCI/IOV: Add missing prototypes for powerpc pcibios interfaces
PCI/IOV: Use VF0 cached config registers for other VFs
PCI/IOV: Skip BAR sizing for VFs
PCI/IOV: Skip INTx config reads for VFs
PCI: Wait for device to become ready after secondary bus reset
PCI: Add a return type for pci_reset_bridge_secondary_bus()
PCI: Wait for device to become ready after a power management reset
PCI: Rename pci_flr_wait() to pci_dev_wait() and make it generic
PCI: Handle FLR failure and allow other reset types
PCI: Protect restore with device lock to be consistent
PCI: Add function 1 DMA alias quirk for Marvell 88SE9220
PCI: Add ACS quirk for Ampere root ports
PCI: Remove redundant probes for device reset support
PCI: Probe for device reset support during enumeration
Conflicts:
include/linux/pci.h
- move pcieport_if.h to drivers/pci/pcie/ to encapsulate it (Frederick
Lawler)
- merge pcieport_if.h into portdrv.h (Bjorn Helgaas)
- move workaround for BIOS PME issue from portdrv to PCI core (Bjorn
Helgaas)
- completely disable portdrv with "pcie_ports=compat" (Bjorn Helgaas)
- remove portdrv link order dependency (Bjorn Helgaas)
- remove support for unused VC portdrv service (Bjorn Helgaas)
- simplify portdrv feature permission checking (Bjorn Helgaas)
- remove "pcie_hp=nomsi" parameter (use "pci=nomsi" instead) (Bjorn
Helgaas)
- remove unnecessary "pcie_ports=auto" parameter (Bjorn Helgaas)
- use cached AER capability offset (Frederick Lawler)
- don't enable DPC if BIOS hasn't granted AER control (Mika Westerberg)
- rename pcie-dpc.c to dpc.c (Bjorn Helgaas)
* pci/portdrv:
PCI/DPC: Rename from pcie-dpc.c to dpc.c
PCI/DPC: Do not enable DPC if AER control is not allowed by the BIOS
PCI/AER: Use cached AER Capability offset
PCI/portdrv: Rename and reverse sense of pcie_ports_auto
PCI/portdrv: Encapsulate pcie_ports_auto inside the port driver
PCI/portdrv: Remove unnecessary "pcie_ports=auto" parameter
PCI/portdrv: Remove "pcie_hp=nomsi" kernel parameter
PCI/portdrv: Remove unnecessary include of <linux/pci-aspm.h>
PCI/portdrv: Simplify PCIe feature permission checking
PCI/portdrv: Remove unused PCIE_PORT_SERVICE_VC
PCI/portdrv: Remove pcie_port_bus_type link order dependency
PCI/portdrv: Disable port driver in compat mode
PCI/PM: Clear PCIe PME Status bit for Root Complex Event Collectors
PCI/PM: Clear PCIe PME Status bit in core, not PCIe port driver
PCI/PM: Move pcie_clear_root_pme_status() to core
PCI/portdrv: Merge pcieport_if.h into portdrv.h
PCI/portdrv: Move pcieport_if.h to drivers/pci/pcie/
Conflicts:
drivers/pci/pcie/Makefile
drivers/pci/pcie/portdrv.h
- use PCI_EXP_DEVCTL2_COMP_TIMEOUT in rapidio/tsi721 (Bjorn Helgaas)
- remove possible NULL pointer dereference in of_pci_bus_find_domain_nr()
(Shawn Lin)
- report quirk timings with dev_info (Bjorn Helgaas)
- report quirks that take longer than 10ms (Bjorn Helgaas)
- add and use Altera Vendor ID (Johannes Thumshirn)
- tidy Makefiles and comments (Bjorn Helgaas)
* pci/misc:
PCI: Always define the of_node helpers
PCI: Tidy comments
PCI: Tidy Makefiles
mcb: Add Altera PCI ID to mcb-pci
PCI: Add Altera vendor ID
PCI: Report quirks that take more than 10ms
PCI: Report quirk timings with pci_info() instead of pr_debug()
PCI: Fix NULL pointer dereference in of_pci_bus_find_domain_nr()
rapidio/tsi721: use PCI_EXP_DEVCTL2_COMP_TIMEOUT macro
- add support for PCI I/O port space that's neither directly accessible
via CPU in/out instructions nor directly mapped into CPU physical
memory space (Zhichang Yuan)
- add support for HiSilicon Hip06/Hip07 LPC I/O space (Zhichang Yuan,
John Garry)
* pci/lpc:
MAINTAINERS: Add John Garry as maintainer for HiSilicon LPC driver
HISI LPC: Add ACPI support
ACPI / scan: Do not enumerate Indirect IO host children
ACPI / scan: Rename acpi_is_serial_bus_slave() for more general use
HISI LPC: Support the LPC host on Hip06/Hip07 with DT bindings
of: Add missing I/O range exception for indirect-IO devices
PCI: Apply the new generic I/O management on PCI IO hosts
PCI: Add fwnode handler as input param of pci_register_io_range()
PCI: Remove __weak tag from pci_register_io_range()
lib: Add generic PIO mapping method
After introducing the new generic I/O space management (Logical PIO), the
original PCI MMIO relevant helpers need to be updated based on the new
interfaces defined in logical PIO.
Adapt the corresponding code to match the changes introduced by logical
PIO.
Tested-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Zhichang Yuan <yuanzhichang@hisilicon.com>
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de> # earlier draft
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
In preparation for having the PCI MMIO helpers use the new generic I/O
space management (logical PIO) we need to add the fwnode handler as an
extra input parameter.
Changes the signature of pci_register_io_range() and its callers as
needed.
Tested-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
pci_register_io_range() has only one definition, so there is no need for
the __weak attribute. Remove it.
Tested-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Add pcie_print_link_status(). This logs the current settings of the link
(speed, width, and total available bandwidth).
If the device is capable of more bandwidth but is limited by a slower
upstream link, we include information about the link that limits the
device's performance.
The user may be able to move the device to a different slot for better
performance.
This provides a unified method for all PCI devices to report status and
issues, instead of each device reporting in a different way, using
different code.
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
[bhelgaas: changelog, reword log messages, print device capabilities when
not limited, print bandwidth in Gb/s]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add pcie_bandwidth_available() to compute the bandwidth available to a
device. This may be limited by the device itself or by a slower upstream
link leading to the device.
The available bandwidth at each link along the path is computed as:
link_width * link_speed * (1 - encoding_overhead)
2.5 and 5.0 GT/s links use 8b/10b encoding, which reduces the raw bandwidth
available by 20%; 8.0 GT/s and faster links use 128b/130b encoding, which
reduces it by about 1.5%.
The result is in Mb/s, i.e., megabits/second, of raw bandwidth.
Also return the device with the slowest link and the speed and width of
that link.
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
[bhelgaas: changelog, leave pcie_get_minimum_link() alone for now, return
bw directly, use pci_upstream_bridge(), check "next_bw <= bw" to find
uppermost limiting device, return speed/width of the limiting device]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Pull x86 platform updates from Ingo Molnar:
"The main changes in this cycle were:
- Add "Jailhouse" hypervisor support (Jan Kiszka)
- Update DeviceTree support (Ivan Gorinov)
- Improve DMI date handling (Andy Shevchenko)"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/PCI: Fix a potential regression when using dmi_get_bios_year()
firmware/dmi_scan: Uninline dmi_get_bios_year() helper
x86/devicetree: Use CPU description from Device Tree
of/Documentation: Specify local APIC ID in "reg"
MAINTAINERS: Add entry for Jailhouse
x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
x86: Consolidate PCI_MMCONFIG configs
x86: Align x86_64 PCI_MMCONFIG with 32-bit variant
x86/jailhouse: Enable PCI mmconfig access in inmates
PCI: Scan all functions when running over Jailhouse
jailhouse: Provide detection for non-x86 systems
x86/devicetree: Fix device IRQ settings in DT
x86/devicetree: Initialize device tree before using it
pci: Simplify code by using the new dmi_get_bios_year() helper
ACPI/sleep: Simplify code by using the new dmi_get_bios_year() helper
x86/pci: Simplify code by using the new dmi_get_bios_year() helper
dmi: Introduce the dmi_get_bios_year() helper function
x86/platform/quark: Re-use DEFINE_SHOW_ATTRIBUTE() macro
x86/platform/atom: Re-use DEFINE_SHOW_ATTRIBUTE() macro
Add pcie_bandwidth_capable() to compute the max link bandwidth supported by
a device, based on the max link speed and width, adjusted by the encoding
overhead.
The maximum bandwidth of the link is computed as:
max_link_width * max_link_speed * (1 - encoding_overhead)
2.5 and 5.0 GT/s links use 8b/10b encoding, which reduces the raw bandwidth
available by 20%; 8.0 GT/s and faster links use 128b/130b encoding, which
reduces it by about 1.5%.
The result is in Mb/s, i.e., megabits/second, of raw bandwidth.
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
[bhelgaas: add 16 GT/s, adjust for pcie_get_speed_cap() and
pcie_get_width_cap() signatures, don't export outside drivers/pci]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add pcie_get_width_cap() to find the max link width supported by a device.
Change max_link_width_show() to use pcie_get_width_cap().
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
[bhelgaas: return width directly instead of error and *width, don't export
outside drivers/pci]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Add pcie_get_speed_cap() to find the max link speed supported by a device.
Change max_link_speed_show() to use pcie_get_speed_cap().
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
[bhelgaas: return speed directly instead of error and *speed, don't export
outside drivers/pci]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Remove pointless comments that tell us the file name, remove blank line
comments, follow multi-line comment conventions. No functional change
intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
There are PCI devices which are power-manageable by a nonstandard means,
such as a custom ACPI method. One example are discrete GPUs in hybrid
graphics laptops, another are Thunderbolt controllers in Macs.
Such devices can't be put into D3cold with pci_set_power_state() because
pci_platform_power_transition() fails with -ENODEV. Instead they're put
into D3hot by pci_set_power_state() and subsequently into D3cold by
invoking the nonstandard means. However as a consequence the cached
current_state is incorrectly left at D3hot.
What we need to do is walk the hierarchy below such a PCI device on
powerdown and update the current_state to D3cold. On powerup the PCI
device itself and the hierarchy below it is in D0uninitialized, so we
need to walk the hierarchy again and wake all devices, causing them to
be put into D0active and then letting them autosuspend as they see fit.
To this end make pci_wakeup_bus() & pci_bus_set_current_state() public
so PCI drivers don't have to reinvent the wheel.
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Link: https://patchwork.freedesktop.org/patch/msgid/2962443259e7faec577274b4ef8c54aad66f9a94.1520068884.git.lukas@wunner.de
Move pcie_clear_root_pme_status() from the port driver to the PCI core so
it will be available even when the port driver isn't present. No
functional change intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Setting Secondary Bus Reset of a downstream port sends a hot reset. PCIe
r4.0, sec 2.3.1, Request Handling Rules, indicates that a device can return
CRS Completion Status following such a reset. Wait until the device
becomes ready in that situation.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Add a return value to pci_reset_bridge_secondary_bus() so we can return an
error if the device doesn't become ready after the reset.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
PCIe r4.0, sec 2.3.1, Request Handling Rules, indicates that a device can
return CRS Completion Status following a D3hot to D0 transition. Wait
until the device becomes ready in that situation.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
If the "parent" pointer passed to of_pci_bus_find_domain_nr() is NULL,
don't dereference it.
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
PCIe r4.0, sec 2.3.1, Request Handling Rules, says:
Valid reset conditions after which a device is permitted to return CRS
are:
* Cold, Warm, and Hot Resets,
* FLR
* A reset initiated in response to a D3hot to D0 uninitialized
Try to reuse FLR implementation towards other reset types.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pci_flr_wait() and pci_af_flr() functions assume graceful return even
though the device is inaccessible under error conditions.
Return -ENOTTY in error cases so that __pci_reset_function_locked() can
try other reset types if AF_FLR/FLR reset fails.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Commit b014e96d1a ("PCI: Protect pci_error_handlers->reset_notify() usage
with device_lock()") added protection around pci_dev_restore() function so
a device-specific remove callback does not cause a race condition with
hotplug.
pci_dev_lock() usage has been forgotten in two places. Add locks for
pci_slot_restore() and moving pci_dev_restore() inside the locks for
pci_try_reset_function().
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
...instead of open coding its functionality.
No changes in functionality.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-acpi@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Link: http://lkml.kernel.org/r/20180222125923.57385-4-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We probe every device for whether it supports reset so we can tell whether
to create a sysfs "reset" file for it. We do that probe in
pci_init_capabilities() during enumeration and save the result in
dev->reset_fn. The result doesn't depend on any other devices on the bus
and shouldn't change after boot, so we don't need to do the probe again.
Remove the pci_probe_reset_function() calls and rely on the dev->reset_fn
we found during enumeration. No functional change intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJad5lgAAoJEFmIoMA60/r8s2kQAI3PztawDpaCP9Z12pkbBHSt
Ho0xTyk9rCZi9kQJbNjc+a+QrlA3QmTHXIXerB3LSWoh7M+XhsECjem92eHpgLNS
JvYPhTfOrCr0vdiAmOz6hD0AqN/psrbfzgiJhSwomsGEFS77k7kERSJckRv81sxb
Aj5F/WjucAgLorwm4auveAJEQ7atE7/6pkXzoqYm4G6NLOb46jUcRGndrnvXZBlz
fws8fBM4BHyi7i25CYQl24tFq1CGax1rIPgLg+4KnH76bQk/N6Ju0sGVSzfh+hG8
SIerK9bJbzGRAuNKoxB3aO1dyzsK3x9WztE2mG98w5trOISPIR1FqnvC/225FWAU
d6eIXiC7wKnEx+DElNTzCjzfHc7SAJoupO32H7CoiTe5zPUlWlxJ1zLYkK1gt50q
m8PRBiYTglxyznzrO0drtcdjEzvbdZNRrsYnul4wi1vSHzjk6F6XLtzT10XWM1M1
1pXLB8384FTj0Hu4bq6Y3Aivkmz0Sf+eQM2NaOwe+Zj7/1VV0d3lvi4LUXkqzLCA
FoXPJSMxG2Qu+iflCeYRQBJjExaZH3eNLZ3dT6QpcJrjaFVedd9u5DeeFqNL27zV
bhr8TdqrR4p4rc8EBAGoCapw96IxLZROKB3gxbrZVOpfIZpzthwHbElHX6aqUgF4
w/EV1JWs36WXWaxFk8wd
=ttq9
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- skip AER driver error recovery callbacks for correctable errors
reported via ACPI APEI, as we already do for errors reported via the
native path (Tyler Baicar)
- fix DPC shared interrupt handling (Alex Williamson)
- print full DPC interrupt number (Keith Busch)
- enable DPC only if AER is available (Keith Busch)
- simplify DPC code (Bjorn Helgaas)
- calculate ASPM L1 substate parameter instead of hardcoding it (Bjorn
Helgaas)
- enable Latency Tolerance Reporting for ASPM L1 substates (Bjorn
Helgaas)
- move ASPM internal interfaces out of public header (Bjorn Helgaas)
- allow hot-removal of VGA devices (Mika Westerberg)
- speed up unplug and shutdown by assuming Thunderbolt controllers
don't support Command Completed events (Lukas Wunner)
- add AtomicOps support for GPU and Infiniband drivers (Felix Kuehling,
Jay Cornwall)
- expose "ari_enabled" in sysfs to help NIC naming (Stuart Hayes)
- clean up PCI DMA interface usage (Christoph Hellwig)
- remove PCI pool API (replaced with DMA pool) (Romain Perier)
- deprecate pci_get_bus_and_slot(), which assumed PCI domain 0 (Sinan
Kaya)
- move DT PCI code from drivers/of/ to drivers/pci/ (Rob Herring)
- add PCI-specific wrappers for dev_info(), etc (Frederick Lawler)
- remove warnings on sysfs mmap failure (Bjorn Helgaas)
- quiet ROM validation messages (Alex Deucher)
- remove redundant memory alloc failure messages (Markus Elfring)
- fill in types for compile-time VGA and other I/O port resources
(Bjorn Helgaas)
- make "pci=pcie_scan_all" work for Root Ports as well as Downstream
Ports to help AmigaOne X1000 (Bjorn Helgaas)
- add SPDX tags to all PCI files (Bjorn Helgaas)
- quirk Marvell 9128 DMA aliases (Alex Williamson)
- quirk broken INTx disable on Ceton InfiniTV4 (Bjorn Helgaas)
- fix CONFIG_PCI=n build by adding dummy pci_irqd_intx_xlate() (Niklas
Cassel)
- use DMA API to get MSI address for DesignWare IP (Niklas Cassel)
- fix endpoint-mode DMA mask configuration (Kishon Vijay Abraham I)
- fix ARTPEC-6 incorrect IS_ERR() usage (Wei Yongjun)
- add support for ARTPEC-7 SoC (Niklas Cassel)
- add endpoint-mode support for ARTPEC (Niklas Cassel)
- add Cadence PCIe host and endpoint controller driver (Cyrille
Pitchen)
- handle multiple INTx status bits being set in dra7xx (Vignesh R)
- translate dra7xx hwirq range to fix INTD handling (Vignesh R)
- remove deprecated Exynos PHY initialization code (Jaehoon Chung)
- fix MSI erratum workaround for HiSilicon Hip06/Hip07 (Dongdong Liu)
- fix NULL pointer dereference in iProc BCMA driver (Ray Jui)
- fix Keystone interrupt-controller-node lookup (Johan Hovold)
- constify qcom driver structures (Julia Lawall)
- rework Tegra config space mapping to increase space available for
endpoints (Vidya Sagar)
- simplify Tegra driver by using bus->sysdata (Manikanta Maddireddy)
- remove PCI_REASSIGN_ALL_BUS usage on Tegra (Manikanta Maddireddy)
- add support for Global Fabric Manager Server (GFMS) event to
Microsemi Switchtec switch driver (Logan Gunthorpe)
- add IDs for Switchtec PSX 24xG3 and PSX 48xG3 (Kelvin Cao)
* tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits)
PCI: cadence: Add EndPoint Controller driver for Cadence PCIe controller
dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe endpoint controller
PCI: endpoint: Fix EPF device name to support multi-function devices
PCI: endpoint: Add the function number as argument to EPC ops
PCI: cadence: Add host driver for Cadence PCIe controller
dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe host controller
PCI: Add vendor ID for Cadence
PCI: Add generic function to probe PCI host controllers
PCI: generic: fix missing call of pci_free_resource_list()
PCI: OF: Add generic function to parse and allocate PCI resources
PCI: Regroup all PCI related entries into drivers/pci/Makefile
PCI/DPC: Reformat DPC register definitions
PCI/DPC: Add and use DPC Status register field definitions
PCI/DPC: Squash dpc_rp_pio_get_info() into dpc_process_rp_pio_error()
PCI/DPC: Remove unnecessary RP PIO register structs
PCI/DPC: Push dpc->rp_pio_status assignment into dpc_rp_pio_get_info()
PCI/DPC: Squash dpc_rp_pio_print_error() into dpc_rp_pio_get_info()
PCI/DPC: Make RP PIO log size check more generic
PCI/DPC: Rename local "status" to "dpc_status"
PCI/DPC: Squash dpc_rp_pio_print_tlp_header() into dpc_rp_pio_print_error()
...
* pci/spdx:
PCI: Add SPDX GPL-2.0+ to replace implicit GPL v2 or later statement
PCI: Add SPDX GPL-2.0+ to replace GPL v2 or later boilerplate
PCI: Add SPDX GPL-2.0 to replace COPYING boilerplate
PCI: Add SPDX GPL-2.0 to replace GPL v2 boilerplate
PCI: Add SPDX GPL-2.0 when no license was specified
* pci/misc:
PCI: Add dummy pci_irqd_intx_xlate() for CONFIG_PCI=n build
PCI: Add wrappers for dev_printk()
PCI: Remove unnecessary messages for memory allocation failures
PCI: Add #defines for Completion Timeout Disable feature
hinic: Replace PCI pool old API
net: e100: Replace PCI pool old API
block: DAC960: Replace PCI pool old API
MAINTAINERS: Include more PCI files
PCI: Remove unneeded kallsyms include
powerpc/pci: Unroll two pass loop when scanning bridges
powerpc/pci: Use for_each_pci_bridge() helper
b24413180f ("License cleanup: add SPDX GPL-2.0 license identifier to
files with no license") added SPDX GPL-2.0 to several PCI files that
previously contained no license information.
Add SPDX GPL-2.0 to all other PCI files that did not contain any license
information and hence were under the default GPL version 2 license of the
kernel.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Atomic Operations feature (PCIe r4.0, sec 6.15) allows atomic
transctions to be requested by, routed through and completed by PCIe
components. Routing and completion do not require software support.
Component support for each is detectable via the DEVCAP2 register.
A Requester may use AtomicOps only if its PCI_EXP_DEVCTL2_ATOMIC_REQ is
set. This should be set only if the Completer and all intermediate routing
elements support AtomicOps.
A concrete example is the AMD Fiji-class GPU (which is capable of making
AtomicOp requests), below a PLX 8747 switch (advertising AtomicOp routing)
with a Haswell host bridge (advertising AtomicOp completion support).
Add pci_enable_atomic_ops_to_root() for per-device control over AtomicOp
requests. This checks to be sure the Root Port supports completion of the
desired AtomicOp sizes and the path to the Root Port supports routing the
AtomicOps.
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[bhelgaas: changelog, comments, whitespace]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add PCI-specific dev_printk() wrappers and use them to simplify the code
slightly. No functional change intended.
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
[bhelgaas: squash into one patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add pcim_set_mwi(), a device-managed version of pci_set_mwi().
First user is the Realtek r8169 driver.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* pci/host-thunder:
PCI: Avoid slot reset if bridge itself is broken
PCI: Avoid bus reset if bridge itself is broken
PCI: Mark Cavium CN8xxx to avoid bus reset
* pci/virtualization:
PCI: Document reset method return values
PCI: Detach driver before procfs & sysfs teardown on device remove
PCI: Apply Cavium ThunderX ACS quirk to more Root Ports
PCI: Set Cavium ACS capability quirk flags to assert RR/CR/SV/UF
PCI: Restore ARI Capable Hierarchy before setting numVFs
PCI: Create SR-IOV virtfn/physfn links before attaching driver
PCI: Expose SR-IOV offset, stride, and VF device ID via sysfs
PCI: Cache the VF device ID in the SR-IOV structure
PCI: Add Kconfig PCI_IOV dependency for PCI_REALLOC_ENABLE_AUTO
PCI: Remove unused function __pci_reset_function()
PCI: Remove reset argument from pci_iov_{add,remove}_virtfn()
* pci/resource:
PCI: Fail pci_map_rom() if the option ROM is invalid
PCI: Move pci_map_rom() error path
x86/PCI: Enable a 64bit BAR on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)
PCI: Add pci_resize_resource() for resizing BARs
PCI: Add resizable BAR infrastructure
PCI: Add PCI resource type mask #define
Fix build error in kernel-doc notation:
../drivers/pci/pci.c:3479: ERROR: Unexpected indentation.
"::" tells the kernel-doc "reStructuredText" processor that the following
block is a literal block of some blob that should be kept as is.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
[bhelgaas: add hint about "::" meaning]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Replace the PCI-specific flag PCI_DEV_FLAGS_NEEDS_RESUME with the
PM core's DPM_FLAG_NEVER_SKIP one everywhere and drop it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
The pci_reset_function() path may try several different reset methods:
device-specific resets, PCIe Function Level Resets, PCI Advanced Features
Function Level Reset, etc.
Add a comment about what the return values from these methods mean. If one
of the methods fails, in some cases we want to continue and try the next
one in the list, but sometimes we want to stop trying.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add resizable BAR infrastructure, including defines and helper functions to
read the possible sizes of a BAR and update its size. See PCIe r3.1, sec
7.22.
Link: https://pcisig.com/sites/default/files/specification_documents/ECN_Resizable-BAR_24Apr2008.pdf
Signed-off-by: Christian König <christian.koenig@amd.com>
[bhelgaas: rename to functions with "rebar" (to match #defines), drop shift
#defines, drop "_MASK" suffixes, fix typos, fix kerneldoc]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
When checking to see if a PCI slot can safely be reset, we previously
checked to see if any of the children had their PCI_DEV_FLAGS_NO_BUS_RESET
flag set.
Some PCIe root port bridges do not behave well after a slot reset, and may
cause the device in the slot to become unusable.
Add a check for PCI_DEV_FLAGS_NO_BUS_RESET being set in the bridge device
to prevent the slot from being reset.
Signed-off-by: Jan Glauber <jglauber@cavium.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
When checking to see if a PCI bus can safely be reset, we previously
checked to see if any of the children had their PCI_DEV_FLAGS_NO_BUS_RESET
flag set. Children marked with that flag are known not to behave well
after a bus reset.
Some PCIe root port bridges also do not behave well after a bus reset,
sometimes causing the devices behind the bridge to become unusable.
Add a check for PCI_DEV_FLAGS_NO_BUS_RESET being set in the bridge device
to allow these bridges to be flagged, and prevent their secondary buses
from being reset.
Signed-off-by: David Daney <david.daney@cavium.com>
[jglauber@cavium.com: fixed typo]
Signed-off-by: Jan Glauber <jglauber@cavium.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
The last caller of __pci_reset_function() has been removed. Remove the
function as well.
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This reverts commit 40f11adc7c.
Jens found that iwlwifi firmware loading failed on a Lenovo X1 Carbon,
gen4:
iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-8000C-34.ucode failed with error -2
iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-8000C-33.ucode failed with error -2
iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-8000C-32.ucode failed with error -2
iwlwifi 0000:04:00.0: loaded firmware version 31.532993.0 op_mode iwlmvm
iwlwifi 0000:04:00.0: Detected Intel(R) Dual Band Wireless AC 8260, REV=0x208
...
iwlwifi 0000:04:00.0: Failed to load firmware chunk!
iwlwifi 0000:04:00.0: Could not load the [0] uCode section
iwlwifi 0000:04:00.0: Failed to start INIT ucode: -110
iwlwifi 0000:04:00.0: Failed to run INIT ucode: -110
He bisected it to 40f11adc7c ("PCI: Avoid race while enabling upstream
bridges"). Revert that commit to fix the regression.
Link: http://lkml.kernel.org/r/4bcbcbc1-7c79-09f0-5071-bc2f53bf6574@kernel.dk
Fixes: 40f11adc7c ("PCI: Avoid race while enabling upstream bridges")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Srinath Mannam <srinath.mannam@broadcom.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Luca Coelho <luca@coelho.fi>
CC: Johannes Berg <johannes@sipsolutions.net>
CC: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZsr8cAAoJEFmIoMA60/r8lXYQAKViYIRMJDD4n3NhjMeLOsnJ
vwaBmWlLRjSFIEpag5kMjS1RJE17qAvmkBZnDvSNZ6cT28INkkZnVM2IW96WECVq
64MIvDijVPcvqGuWePCfWdDiSXApiDWwJuw55BOhmvV996wGy0gYgzpPY+1g0Knh
XzH9IOzDL79hZleLfsxX0MLV6FGBVtOsr0jvQ04k4IgEMIxEDTlbw85rnrvzQUtc
0Vj2koaxWIESZsq7G/wiZb2n6ekaFdXO/VlVvvhmTSDLCBaJ63Hb/gfOhwMuVkS6
B3cVprNrCT0dSzWmU4ZXf+wpOyDpBexlemW/OR/6CQUkC6AUS6kQ5si1X44dbGmJ
nBPh414tdlm/6V4h/A3UFPOajSGa/ZWZ/uQZPfvKs1R6WfjUerWVBfUpAzPbgjam
c/mhJ19HYT1J7vFBfhekBMeY2Px3JgSJ9rNsrFl48ynAALaX5GEwdpo4aqBfscKz
4/f9fU4ysumopvCEuKD2SsJvsPKd5gMQGGtvAhXM1TxvAoQ5V4cc99qEetAPXXPf
h2EqWm4ph7YP4a+n/OZBjzluHCmZJn1CntH5+//6wpUk6HnmzsftGELuO9n12cLE
GGkreI3T9ctV1eOkzVVa0l0QTE1X/VLyEyKCtb9obXsDaG4Ud7uKQoZgB19DwyTJ
EG76ridTolUFVV+wzJD9
=9cLP
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- add enhanced Downstream Port Containment support, which prints more
details about Root Port Programmed I/O errors (Dongdong Liu)
- add Layerscape ls1088a and ls2088a support (Hou Zhiqiang)
- add MediaTek MT2712 and MT7622 support (Ryder Lee)
- add MediaTek MT2712 and MT7622 MSI support (Honghui Zhang)
- add Qualcom IPQ8074 support (Varadarajan Narayanan)
- add R-Car r8a7743/5 device tree support (Biju Das)
- add Rockchip per-lane PHY support for better power management (Shawn
Lin)
- fix IRQ mapping for hot-added devices by replacing the
pci_fixup_irqs() boot-time design with a host bridge hook called at
probe-time (Lorenzo Pieralisi, Matthew Minter)
- fix race when enabling two devices that results in upstream bridge
not being enabled correctly (Srinath Mannam)
- fix pciehp power fault infinite loop (Keith Busch)
- fix SHPC bridge MSI hotplug events by enabling bus mastering
(Aleksandr Bezzubikov)
- fix a VFIO issue by correcting PCIe capability sizes (Alex
Williamson)
- fix an INTD issue on Xilinx and possibly other drivers by unifying
INTx IRQ domain support (Paul Burton)
- avoid IOMMU stalls by marking AMD Stoney GPU ATS as broken (Joerg
Roedel)
- allow APM X-Gene device assignment to guests by adding an ACS quirk
(Feng Kan)
- fix driver crashes by disabling Extended Tags on Broadcom HT2100
(Extended Tags support is required for PCIe Receivers but not
Requesters, and we now enable them by default when Requesters support
them) (Sinan Kaya)
- fix MSIs for devices that use phantom RIDs for DMA by assuming MSIs
use the real Requester ID (not a phantom RID) (Robin Murphy)
- prevent assignment of Intel VMD children to guests (which may be
supported eventually, but isn't yet) by not associating an IOMMU with
them (Jon Derrick)
- fix Intel VMD suspend/resume by releasing IRQs on suspend (Scott
Bauer)
- fix a Function-Level Reset issue with Intel 750 NVMe by waiting
longer (up to 60sec instead of 1sec) for device to become ready
(Sinan Kaya)
- fix a Function-Level Reset issue on iProc Stingray by working around
hardware defects in the CRS implementation (Oza Pawandeep)
- fix an issue with Intel NVMe P3700 after an iProc reset by adding a
delay during shutdown (Oza Pawandeep)
- fix a Microsoft Hyper-V lockdep issue by polling instead of blocking
in compose_msi_msg() (Stephen Hemminger)
- fix a wireless LAN driver timeout by clearing DesignWare MSI
interrupt status after it is handled, not before (Faiz Abbas)
- fix DesignWare ATU enable checking (Jisheng Zhang)
- reduce Layerscape dependencies on the bootloader by doing more
initialization in the driver (Hou Zhiqiang)
- improve Intel VMD performance allowing allocation of more IRQ vectors
than present CPUs (Keith Busch)
- improve endpoint framework support for initial DMA mask, different
BAR sizes, configurable page sizes, MSI, test driver, etc (Kishon
Vijay Abraham I, Stan Drozd)
- rework CRS support to add periodic messages while we poll during
enumeration and after Function-Level Reset and prepare for possible
other uses of CRS (Sinan Kaya)
- clean up Root Port AER handling by removing unnecessary code and
moving error handler methods to struct pcie_port_service_driver
(Christoph Hellwig)
- clean up error handling paths in various drivers (Bjorn Andersson,
Fabio Estevam, Gustavo A. R. Silva, Harunobu Kurokawa, Jeffy Chen,
Lorenzo Pieralisi, Sergei Shtylyov)
- clean up SR-IOV resource handling by disabling VF decoding before
updating the corresponding resource structs (Gavin Shan)
- clean up DesignWare-based drivers by unifying quirks to update Class
Code and Interrupt Pin and related handling of write-protected
registers (Hou Zhiqiang)
- clean up by adding empty generic pcibios_align_resource() and
pcibios_fixup_bus() and removing empty arch-specific implementations
(Palmer Dabbelt)
- request exclusive reset control for several drivers to allow cleanup
elsewhere (Philipp Zabel)
- constify various structures (Arvind Yadav, Bhumika Goyal)
- convert from full_name() to %pOF (Rob Herring)
- remove unused variables from iProc, HiSi, Altera, Keystone (Shawn
Lin)
* tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (170 commits)
PCI: xgene: Clean up whitespace
PCI: xgene: Define XGENE_PCI_EXP_CAP and use generic PCI_EXP_RTCTL offset
PCI: xgene: Fix platform_get_irq() error handling
PCI: xilinx-nwl: Fix platform_get_irq() error handling
PCI: rockchip: Fix platform_get_irq() error handling
PCI: altera: Fix platform_get_irq() error handling
PCI: spear13xx: Fix platform_get_irq() error handling
PCI: artpec6: Fix platform_get_irq() error handling
PCI: armada8k: Fix platform_get_irq() error handling
PCI: dra7xx: Fix platform_get_irq() error handling
PCI: exynos: Fix platform_get_irq() error handling
PCI: iproc: Clean up whitespace
PCI: iproc: Rename PCI_EXP_CAP to IPROC_PCI_EXP_CAP
PCI: iproc: Add 500ms delay during device shutdown
PCI: Fix typos and whitespace errors
PCI: Remove unused "res" variable from pci_resource_io()
PCI: Correct kernel-doc of pci_vpd_srdt_size(), pci_vpd_srdt_tag()
PCI/AER: Reformat AER register definitions
iommu/vt-d: Prevent VMD child devices from being remapping targets
x86/PCI: Use is_vmd() rather than relying on the domain number
...
Sporadic reset issues have been observed with an Intel 750 NVMe drive while
assigning the physical function to the guest machine. The sequence of
events observed is as follows:
- perform a Function Level Reset (FLR)
- sleep up to 1000ms total
- read ~0 from PCI_COMMAND (CRS completion for config read)
- warn that the device didn't return from FLR
- touch the device before it's ready
- device drops config writes when we restore register settings (there's
no mechanism for software to learn about CRS completions for writes)
- incomplete register restore leaves device in inconsistent state
- device probe fails because device is in inconsistent state
After reset, an endpoint may respond to config requests with Configuration
Request Retry Status (CRS) to indicate that it is not ready to accept new
requests. See PCIe r3.1, sec 2.3.1 and 6.6.2.
Increase the timeout value from 1 second to 60 seconds to cover the period
where device responds with CRS and also report polling progress.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
[bhelgaas: include the mandatory 100ms in the delays we print]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Now that we have a custom printf format specifier, convert users of
full_name() to use %pOF instead. This is preparation for removing storing
of the full path string for each node.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Jonathan Hunter <jonathanh@nvidia.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
When we enable a device, we first enable any upstream bridges. If a bridge
has multiple downstream devices and we enable them simultaneously, the race
to enable the upstream bridge may cause problems. Consider this hierarchy:
bridge A --+-- device B
+-- device C
If drivers for B and C call pci_enable_device() simultaneously, both will
attempt to enable A, which involves setting PCI_COMMAND_MASTER via
pci_set_master() and PCI_COMMAND_MEMORY via pci_enable_resources().
In the following sequence, B's update to set A's PCI_COMMAND_MEMORY is
lost, and neither B nor C will work correctly:
B C
pci_set_master(A)
cmd = read(A, PCI_COMMAND)
cmd |= PCI_COMMAND_MASTER
pci_set_master(A)
cmd = read(A, PCI_COMMAND)
cmd |= PCI_COMMAND_MASTER
write(A, PCI_COMMAND, cmd)
pci_enable_device(A)
pci_enable_resources(A)
cmd = read(A, PCI_COMMAND)
cmd |= PCI_COMMAND_MEMORY
write(A, PCI_COMMAND, cmd)
write(A, PCI_COMMAND, cmd)
Avoid this race by holding a new pci_bridge_mutex while enabling a bridge.
This ensures that both PCI_COMMAND_MASTER and PCI_COMMAND_MEMORY will be
updated before another thread can start enabling the bridge.
Note that although pci_enable_bridge() is recursive, it enables any
upstream bridges *before* acquiring the mutex. When it acquires the mutex
and calls pci_set_master() and pci_enable_device(), any upstream bridges
have already been enabled so pci_enable_device() will not deadlock by
calling pci_enable_bridge() again.
Signed-off-by: Srinath Mannam <srinath.mannam@broadcom.com>
[bhelgaas: changelog, comment]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
If the pci_find_pcie_root_port() function is called on a root port
itself, return the root port rather than NULL.
This effectively reverts commit 0e40523287 ("PCI: fix oops when
try to find Root Port for a PCI device") which added an extra check
that would now be redundant.
Fixes: a99b646afa ("PCI: Disable PCIe Relaxed Ordering if unsupported")
Fixes: c56d4450eb ("PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum")
Signed-off-by: Thierry Reding <treding@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shawn Lin <shawn.lin@rock-chips.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix TCP checksum offload handling in iwlwifi driver, from Emmanuel
Grumbach.
2) In ksz DSA tagging code, free SKB if skb_put_padto() fails. From
Vivien Didelot.
3) Fix two regressions with bonding on wireless, from Andreas Born.
4) Fix build when busypoll is disabled, from Daniel Borkmann.
5) Fix copy_linear_skb() wrt. SO_PEEK_OFF, from Eric Dumazet.
6) Set SKB cached route properly in inet_rtm_getroute(), from Florian
Westphal.
7) Fix PCI-E relaxed ordering handling in cxgb4 driver, from Ding
Tianhong.
8) Fix module refcnt leak in ULP code, from Sabrina Dubroca.
9) Fix use of GFP_KERNEL in atomic contexts in AF_KEY code, from Eric
Dumazet.
10) Need to purge socket write queue in dccp_destroy_sock(), also from
Eric Dumazet.
11) Make bpf_trace_printk() work properly on 32-bit architectures, from
Daniel Borkmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
bpf: fix bpf_trace_printk on 32 bit archs
PCI: fix oops when try to find Root Port for a PCI device
sfc: don't try and read ef10 data on non-ef10 NIC
net_sched: remove warning from qdisc_hash_add
net_sched/sfq: update hierarchical backlog when drop packet
net_sched: reset pointers to tcf blocks in classful qdiscs' destructors
ipv4: fix NULL dereference in free_fib_info_rcu()
net: Fix a typo in comment about sock flags.
ipv6: fix NULL dereference in ip6_route_dev_notify()
tcp: fix possible deadlock in TCP stack vs BPF filter
dccp: purge write queue in dccp_destroy_sock()
udp: fix linear skb reception with PEEK_OFF
ipv6: release rt6->rt6i_idev properly during ifdown
af_key: do not use GFP_KERNEL in atomic contexts
tcp: ulp: avoid module refcnt leak in tcp_set_ulp
net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
PCI: Disable Relaxed Ordering Attributes for AMD A1100
PCI: Disable Relaxed Ordering for some Intel processors
PCI: Disable PCIe Relaxed Ordering if unsupported
...
Add two reasons for returning 0 value to the description of
pci_set_power_state() to include the cases when:
- the transition is to D1 or D2 but D1 and D2 are not supported
- the transition is to D3 but D3 is not supported
Signed-off-by: Piotr Gregor <piotrgregor@rsyncme.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The implementation of PCI workarounds may require that the device is reset
from its probe function. This implies that the PCI device lock is already
held, and makes calling pci_reset_function() impossible (since it will
itself try to take that lock).
Add pci_reset_function_locked(), which is the equivalent of
pci_reset_function(), except that it requires the PCI device lock to be
already held by the caller.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[bhelgaas: folded in fix for conflict with 52354b9d1f ("PCI: Remove
__pci_dev_reset() and pci_dev_reset()")]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # 4.11: 52354b9d1f: PCI: Remove __pci_dev_reset() and pci_dev_reset()
Cc: stable@vger.kernel.org # 4.11
PCI bridges only have a reason to generate wakeup signals on behalf
of devices below them, so avoid preparing bridges for wakeup directly
in pci_enable_wake().
Also drop the pci_has_subordinate() check from pci_pm_default_resume()
as this will be done by pci_enable_wake() itself now.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Commit dc15e71eef (PCI / PM: Restore PME Enable if skipping wakeup
setup) introduced a mechanism by which the PME Enable bit can be
restored by pci_enable_wake() if dev->wakeup_prepared is set in
case it has been overwritten by PCI config space restoration.
However, that commit overlooked the fact that on some systems (Dell
XPS13 9360 in particular) the AML handling wakeup events checks PME
Status and PME Enable and it won't trigger a Notify() for devices
where those bits are not set while it is running.
That happens during resume from suspend-to-idle when pci_restore_state()
invoked by pci_pm_default_resume_early() clears PME Enable before the
wakeup events are processed by AML, effectively causing those wakeup
events to be ignored.
Fix this issue by restoring the PME Enable configuration right after
pci_restore_state() has been called instead of doing that in
pci_enable_wake().
Fixes: dc15e71eef (PCI / PM: Restore PME Enable if skipping wakeup setup)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
* pci/virtualization:
PCI: Remove __pci_dev_reset() and pci_dev_reset()
PCI: Split ->reset_notify() method into ->reset_prepare() and ->reset_done()
PCI: Protect pci_error_handlers->reset_notify() usage with device_lock()
PCI: Protect pci_driver->sriov_configure() usage with device_lock()
PCI: Mark Intel XXV710 NIC INTx masking as broken
PCI: Restore PRI and PASID state after Function-Level Reset
PCI: Cache PRI and PASID bits in pci_dev
The pci_error_handlers->reset_notify() method had a flag to indicate
whether to prepare for or clean up after a reset. The prepare and done
cases have no shared functionality whatsoever, so split them into separate
methods.
[bhelgaas: changelog, update locking comments]
Link: http://lkml.kernel.org/r/20170601111039.8913-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* pci/resource:
PCI: Work around poweroff & suspend-to-RAM issue on Macbook Pro 11
PCI: Do not disregard parent resources starting at 0x0
Conflicts:
arch/x86/pci/fixup.c
* pci/pm:
PCI/PM: Avoid using device_may_wakeup() for runtime PM
x86/PCI: Avoid AMD SB7xx EHCI USB wakeup defect
PCI/PM: Restore the status of PCI devices across hibernation
drm/radeon: make MacBook Pro d3_delay quirk more generic
drm/amdgpu: remove unnecessary save/restore of pdev->d3_delay
PCI/PM: Add needs_resume flag to avoid suspend complete optimization
PCI: imx6: Fix config read timeout handling
switchtec: Fix minor bug with partition ID register
switchtec: Use new cdev_device_add() helper function
PCI: endpoint: Make PCI_ENDPOINT depend on HAS_DMA
pci_target_state() calls device_may_wakeup() which checks whether or not
the device may wake up the system from sleep states, but pci_target_state()
is used for runtime PM too.
Since runtime PM is expected to always enable remote wakeup if possible,
modify pci_target_state() to take additional argument indicating whether or
not it should look for a state from which the device can signal wakeup and
pass either the return value of device_can_wakeup(), or "false" (if the
device itself is not wakeup-capable) to it from the code related to runtime
PM.
While at it, fix the comment in pci_dev_run_wake() which is not about sleep
states.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
The run_wake flag in struct dev_pm_info is used to indicate whether
or not the device is capable of generating remote wakeup signals at
run time (or in the system working state), but the distinction
between runtime remote wakeup and system wakeup signaling has always
been rather artificial. The only practical reason for it to exist
at the core level was that ACPI and PCI treated those two cases
differently, but that's not the case any more after recent changes.
For this reason, get rid of the run_wake flag and, when applicable,
use device_set_wakeup_capable() and device_can_wakeup() instead of
device_set_run_wake() and device_run_wake(), respectively.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
After previous changes it is not necessary to distinguish between
device wakeup for run time and device wakeup from system sleep states
any more, so rework the PCI device wakeup settings code accordingly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
The test for INTx masking via PCI_COMMAND_INTX_DISABLE performed in
pci_intx_mask_supported() should be done before the device can be used.
This is to avoid writing PCI_COMMAND while the driver owns the device, in
case that has any effect on MSI/MSI-X interrupts.
Move the content of pci_intx_mask_supported() to pci_intx_mask_broken() and
call it from pci_setup_device().
The test result can be queried at any time later using the same
pci_intx_mask_supported() interface as before (though with changed
implementation), so callers (uio, vfio) should be unaffected.
Signed-off-by: Piotr Gregor <piotrgregor@rsyncme.org>
[bhelgaas: changelog, remove quirk check, remove locking, move
dev->broken_intx_masking assignment to caller]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Every method in struct device_driver or structures derived from it like
struct pci_driver MUST provide exclusion vs the driver's ->remove() method,
usually by using device_lock().
Protect use of pci_error_handlers->reset_notify() by holding the device
lock while calling it.
Note:
- pci_dev_lock() calls device_lock() in addition to blocking user-space
config accesses.
- pci_err_handlers->reset_notify() is used inside
pci_dev_save_and_disable() and pci_dev_restore(). We could hold the
device lock directly in pci_reset_notify(), but we expand the region
since we have several calls following each other.
Without this, ->reset_notify() may race with ->remove() calls, which can be
easily triggered in NVMe.
[bhelgaas: changelog, add pci_reset_notify() comment]
[bhelgaas: fold in fix from Dan Carpenter <dan.carpenter@oracle.com>:
http://lkml.kernel.org/r/20170701135323.x5vaj4e2wcs2mcro@mwanda]
Link: http://lkml.kernel.org/r/20170601111039.8913-2-hch@lst.de
Reported-by: Rakesh Pandit <rakesh@tuxera.com>
Tested-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The wakeup_prepared PCI device flag is used for preventing subsequent
changes of PCI device wakeup settings in the same way (e.g. enabling
device wakeup twice in a row).
However, in some cases PME Enable may be updated by things like PCI
configuration space restoration in the meantime and it may need to be
set again even though the rest of the settings need not change, so
modify __pci_enable_wake() to do that when it is about to return
early.
Also, it is reasonable to expect that __pci_enable_wake() will always
clear PME Status when invoked to disable device wakeup, so make it do
so even if it is going to return early then.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
After a Function-Level Reset, PCI states need to be restored. Save PASID
features and PRI reqs cached.
[bhelgaas: search for capability only if PRI/PASID were enabled]
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jean-Phillipe Brucker <jean-philippe.brucker@arm.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Some drivers - like i915 - may not support the system suspend direct
complete optimization due to differences in their runtime and system
suspend sequence. Add a flag that when set resumes the device before
calling the driver's system suspend handlers which effectively disables
the optimization.
Needed by a future patch fixing suspend/resume on i915.
Suggested by Rafael.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: stable@vger.kernel.org
Commit f44116ae88 ("PCI: Remove pci_find_parent_resource() use for
allocation") updated the logic that iterates over all bus resources and
compares them to a given resource, in order to decide whether one is the
parent of the latter.
This change inadvertently causes pci_find_parent_resource() to disregard
resources starting at address 0x0, resulting in an error such as the one
below on ARM systems whose I/O window starts at 0x0.
pci_bus 0000:00: root bus resource [mem 0x10000000-0x3efeffff window]
pci_bus 0000:00: root bus resource [io 0x0000-0xffff window]
pci_bus 0000:00: root bus resource [mem 0x8000000000-0xffffffffff window]
pci_bus 0000:00: root bus resource [bus 00-0f]
pci 0000:00:01.0: PCI bridge to [bus 01]
pci 0000:00:02.0: PCI bridge to [bus 02]
pci 0000:00:03.0: PCI bridge to [bus 03]
pci 0000:00:03.0: can't claim BAR 13 [io 0x0000-0x0fff]: no compatible bridge window
pci 0000:03:01.0: can't claim BAR 0 [io 0x0000-0x001f]: no compatible bridge window
While this never happens on x86, it is perfectly legal in general for a PCI
MMIO or IO window to start at address 0x0, and it was supported in the code
before commit f44116ae88.
Drop the test for res->start != 0; resource_contains() already checks
whether [start, end) completely covers the resource, and so it should be
redundant.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* pci/virtualization:
ixgbe: Use pcie_flr() instead of duplicating it
IB/hfi1: Use pcie_flr() instead of duplicating it
PCI: Call pcie_flr() from reset_chelsio_generic_dev()
PCI: Call pcie_flr() from reset_intel_82599_sfp_virtfn()
PCI: Export pcie_flr()
PCI: Add sysfs sriov_drivers_autoprobe to control VF driver binding
PCI: Avoid FLR for Intel 82579 NICs
Conflicts:
include/linux/pci.h
* pci/resource:
PCI: Don't resize resources when realigning all devices in system
PCI: Don't reassign resources that are already aligned
PCI: Factor pci_reassigndev_resource_alignment()
powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned
PCI: Add pcibios_default_alignment() for arch-specific alignment control
PCI: Fix calculation of bridge window's size and alignment
PCI: Ignore requested alignment for IOV BARs
PCI: Make PCI_ROM_ADDRESS_MASK a 32-bit constant
* pci/ioremap:
PCI: versatile: Update PCI config space remap function
PCI: keystone-dw: Update PCI config space remap function
PCI: layerscape: Update PCI config space remap function
PCI: hisi: Update PCI config space remap function
PCI: tegra: Update PCI config space remap function
PCI: xgene: Update PCI config space remap function
PCI: armada8k: Update PCI config space remap function
PCI: designware: Update PCI config space remap function
PCI: iproc-platform: Update PCI config space remap function
PCI: qcom: Update PCI config space remap function
PCI: rockchip: Update PCI config space remap function
PCI: spear13xx: Update PCI config space remap function
PCI: xilinx-nwl: Update PCI config space remap function
PCI: xilinx: Update PCI config space remap function
PCI: ECAM: Map config region with pci_remap_cfgspace()
PCI: Implement devm_pci_remap_cfgspace()
devres: fix devm_ioremap_*() offset parameter kerneldoc description
ARM: Implement pci_remap_cfgspace() interface
ARM64: Implement pci_remap_cfgspace() interface
linux/io.h: Add pci_remap_cfgspace() interface
PCI: Remove __weak tag from pci_remap_iospace()
The introduction of the pci_remap_cfgspace() interface allows PCI host
controller drivers to map PCI config space through a dedicated kernel
interface. Current PCI host controller drivers use the devm_ioremap_*()
devres interfaces to map PCI configuration space regions so in order to
update them to the new pci_remap_cfgspace() mapping interface a new set of
devres interfaces should be implemented so that PCI host controller drivers
can make use of them.
Introduce two new functions in the PCI kernel layer and Devres
documentation:
- devm_pci_remap_cfgspace()
- devm_pci_remap_cfg_resource()
so that PCI host controller drivers can make use of them to map PCI
configuration space regions.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
These are useful for PCIe host drivers, and those drivers can be modules.
[bhelgaas: don't remove __weak; it's removed elsewhere]
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Shawn Lin <shawn.lin@rock-chips.com>
Currently we opencode the FLR sequence in lots of place; export a core
helper instead. We split out the probing for FLR support as all the
non-core callers already know their hardware.
Note that in the new pci_has_flr() function the quirk check has been moved
before the capability check as there is no point in reading the capability
in this case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pci_remap_iospace() is marked as a weak symbol even though no architecture
is currently overriding it; given that its implementation internals have
already code paths that are arch specific (ie PCI_IOBASE and
ioremap_page_range() attributes) there is no need to leave the weak symbol
in the kernel since the same functionality can be achieved by customizing
per-arch the corresponding functionality.
Remove the __weak symbol from pci_remap_iospace().
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
The "pci=resource_alignment" argument aligns BARs of designated devices by
artificially increasing their size. Increasing the size increases the
alignment and prevents other resources from being assigned in the same
alignment region, e.g., in the same page, but it can break drivers that use
the BAR size to locate things, e.g., ilo_map_device() does this:
off = pci_resource_len(pdev, bar) - 0x2000;
The new pcibios_default_alignment() interface allows an arch to request
that *all* BARs in the system be aligned to a larger size. In this case,
we don't need to artificially increase the resource size because we know
every BAR of every device will be realigned, so nothing will share the same
alignment region.
Use IORESOURCE_STARTALIGN to request realignment of PCI BARs when we know
we're realigning all BARs in the system.
[bhelgaas: comment, changelog]
Signed-off-by: Yongji Xie <elohimes@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The "pci=resource_alignment=" kernel argument designates devices for which
we want alignment greater than is required by the PCI specs. Previously we
set IORESOURCE_UNSET for every MEM resource of those devices, even if the
resource was *already* sufficiently aligned.
If a resource is already sufficiently aligned, leave it alone and don't try
to reassign it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Pull the BAR size adjustment out into a new function,
pci_request_resource_alignment(), and add a comment about how and why we
increase the resource size and alignment.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When VFIO passes through a PCI device to a guest, it does not allow the
guest to mmap BARs that are smaller than PAGE_SIZE unless it can reserve
the rest of the page (see vfio_pci_probe_mmaps()). This is because a page
might contain several small BARs for unrelated devices and a guest should
not be able to access all of them.
VFIO emulates guest accesses to non-mappable BARs, which is functional but
slow. On systems with large page sizes, e.g., PowerNV with 64K pages, BARs
are more likely to share a page and performance is more likely to be a
problem.
Add a weak function to set default alignment for all PCI devices. An arch
can override it to force the PCI core to place memory BARs on their own
pages.
Signed-off-by: Yongji Xie <elohimes@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Laurent Pinchart reported that the Renesas R-Car H2 Lager board (r8a7790)
crashes during suspend tests. Geert Uytterhoeven managed to reproduce the
issue on an M2-W Koelsch board (r8a7791):
It occurs when the PME scan runs, once per second. During PME scan, the
PCI host bridge (rcar-pci) registers are accessed while its module clock
has already been disabled, leading to the crash.
One reproducer is to configure s2ram to use "s2idle" instead of "deep"
suspend:
# echo 0 > /sys/module/printk/parameters/console_suspend
# echo s2idle > /sys/power/mem_sleep
# echo mem > /sys/power/state
Another reproducer is to write either "platform" or "processors" to
/sys/power/pm_test. It does not (or is less likely) to happen during full
system suspend ("core" or "none") because system suspend also disables
timers, and thus the workqueue handling PME scans no longer runs. Geert
believes the issue may still happen in the small window between disabling
module clocks and disabling timers:
# echo 0 > /sys/module/printk/parameters/console_suspend
# echo platform > /sys/power/pm_test # Or "processors"
# echo mem > /sys/power/state
(Make sure CONFIG_PCI_RCAR_GEN2 and CONFIG_USB_OHCI_HCD_PCI are enabled.)
Rafael Wysocki agrees that PME scans should be suspended before the host
bridge registers become inaccessible. To that end, queue the task on a
workqueue that gets frozen before devices suspend.
Rafael notes however that as a result, some wakeup events may be missed if
they are delivered via PME from a device without working IRQ (which hence
must be polled) and occur after the workqueue has been frozen. If that
turns out to be an issue in practice, it may be possible to solve it by
calling pci_pme_list_scan() once directly from one of the host bridge's
pm_ops callbacks.
Stacktrace for posterity:
PM: Syncing filesystems ... [ 38.566237] done.
PM: Preparing system for sleep (mem)
Freezing user space processes ... [ 38.579813] (elapsed 0.001 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
PM: Suspending system (mem)
PM: suspend of devices complete after 152.456 msecs
PM: late suspend of devices complete after 2.809 msecs
PM: noirq suspend of devices complete after 29.863 msecs
suspend debug: Waiting for 5 second(s).
Unhandled fault: asynchronous external abort (0x1211) at 0x00000000
pgd = c0003000
[00000000] *pgd=80000040004003, *pmd=00000000
Internal error: : 1211 [#1] SMP ARM
Modules linked in:
CPU: 1 PID: 20 Comm: kworker/1:1 Not tainted
4.9.0-rc1-koelsch-00011-g68db9bc814362e7f #3383
Hardware name: Generic R8A7791 (Flattened Device Tree)
Workqueue: events pci_pme_list_scan
task: eb56e140 task.stack: eb58e000
PC is at pci_generic_config_read+0x64/0x6c
LR is at rcar_pci_cfg_base+0x64/0x84
pc : [<c041d7b4>] lr : [<c04309a0>] psr: 600d0093
sp : eb58fe98 ip : c041d750 fp : 00000008
r10: c0e2283c r9 : 00000000 r8 : 600d0013
r7 : 00000008 r6 : eb58fed6 r5 : 00000002 r4 : eb58feb4
r3 : 00000000 r2 : 00000044 r1 : 00000008 r0 : 00000000
Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Control: 30c5387d Table: 6a9f6c80 DAC: 55555555
Process kworker/1:1 (pid: 20, stack limit = 0xeb58e210)
Stack: (0xeb58fe98 to 0xeb590000)
fe80: 00000002 00000044
fea0: eb6f5800 c041d9b0 eb58feb4 00000008 00000044 00000000 eb78a000 eb78a000
fec0: 00000044 00000000 eb9aff00 c0424bf0 eb78a000 00000000 eb78a000 c0e22830
fee0: ea8a6fc0 c0424c5c eaae79c0 c0424ce0 eb55f380 c0e22838 eb9a9800 c0235fbc
ff00: eb55f380 c0e22838 eb55f380 eb9a9800 eb9a9800 eb58e000 eb9a9824 c0e02100
ff20: eb55f398 c02366c4 eb56e140 eb5631c0 00000000 eb55f380 c023641c 00000000
ff40: 00000000 00000000 00000000 c023a928 cd105598 00000000 40506a34 eb55f380
ff60: 00000000 00000000 dead4ead ffffffff ffffffff eb58ff74 eb58ff74 00000000
ff80: 00000000 dead4ead ffffffff ffffffff eb58ff90 eb58ff90 eb58ffac eb5631c0
ffa0: c023a844 00000000 00000000 c0206d68 00000000 00000000 00000000 00000000
ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 3a81336c 10ccd1dd
[<c041d7b4>] (pci_generic_config_read) from [<c041d9b0>]
(pci_bus_read_config_word+0x58/0x80)
[<c041d9b0>] (pci_bus_read_config_word) from [<c0424bf0>]
(pci_check_pme_status+0x34/0x78)
[<c0424bf0>] (pci_check_pme_status) from [<c0424c5c>] (pci_pme_wakeup+0x28/0x54)
[<c0424c5c>] (pci_pme_wakeup) from [<c0424ce0>] (pci_pme_list_scan+0x58/0xb4)
[<c0424ce0>] (pci_pme_list_scan) from [<c0235fbc>]
(process_one_work+0x1bc/0x308)
[<c0235fbc>] (process_one_work) from [<c02366c4>] (worker_thread+0x2a8/0x3e0)
[<c02366c4>] (worker_thread) from [<c023a928>] (kthread+0xe4/0xfc)
[<c023a928>] (kthread) from [<c0206d68>] (ret_from_fork+0x14/0x2c)
Code: ea000000 e5903000 f57ff04f e3a00000 (e5843000)
---[ end trace 667d43ba3aa9e589 ]---
Fixes: df17e62e5b ("PCI: Add support for polling PME state on suspended legacy PCI devices")
Reported-and-tested-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reported-and-tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: stable@vger.kernel.org # 2.6.37+
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Cc: Simon Horman <horms+renesas@verge.net.au>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
We would call pci_reassigndev_resource_alignment() before
pci_init_capabilities(). So the requested alignment would never work for
IOV BARs.
Furthermore, it's meaningless to request additional alignment for IOV BARs,
the IOV BAR alignment is only determined by the VF BAR size.
Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Per Intel Specification Update 335553-002 (see link below), some 82579
network adapters advertise a Function Level Reset (FLR) capability, but
they can hang when an FLR is triggered.
To reproduce the problem, attach the device to a VM, then detach and try to
attach again.
Add a quirk to prevent the use of FLR on these devices.
[bhelgaas: changelog, comments]
Link: http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/82579lm-82579v-gigabit-network-connection-spec-update.pdf
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
If the PCI device is disconnected, return false immediately from
pci_device_is_present(). pci_device_is_present() uses the bus accessors,
so the early return in the device accessors doesn't help here.
Tested-by: Krishna Dhulipala <krishnad@fb.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Wei Zhang <wzhang@fb.com>
msleep() still sleeps 1 jiffy even when told to sleep for zero
milliseconds. That can end up being 1-2 milliseconds or more. In the
cases of d3_delay and d3cold_delay, that unnecessarily increases suspend
and/or resume latencies.
Do not sleep at all for the respective cases if d3_delay is zero or
d3cold_delay is zero.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>