linux/drivers/pci
Lukas Wunner ea401499e9 PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset
Stuart Hayes reports that an error handled by DPC at a Root Port results
in pciehp gratuitously bringing down a subordinate hotplug port:

  RP -- UP -- DP -- UP -- DP (hotplug) -- EP

pciehp brings the slot down because the Link to the Endpoint goes down.
That is caused by a Hot Reset being propagated as a result of DPC.
Per PCIe Base Spec 5.0, section 6.6.1 "Conventional Reset":

  For a Switch, the following must cause a hot reset to be sent on all
  Downstream Ports: [...]

  * The Data Link Layer of the Upstream Port reporting DL_Down status.
    In Switches that support Link speeds greater than 5.0 GT/s, the
    Upstream Port must direct the LTSSM of each Downstream Port to the
    Hot Reset state, but not hold the LTSSMs in that state. This permits
    each Downstream Port to begin Link training immediately after its
    hot reset completes. This behavior is recommended for all Switches.

  * Receiving a hot reset on the Upstream Port.

Once DPC recovers, pcie_do_recovery() walks down the hierarchy and
invokes pcie_portdrv_slot_reset() to restore each port's config space.
At that point, a hotplug interrupt is signaled per PCIe Base Spec r5.0,
section 6.7.3.4 "Software Notification of Hot-Plug Events":

  If the Port is enabled for edge-triggered interrupt signaling using
  MSI or MSI-X, an interrupt message must be sent every time the logical
  AND of the following conditions transitions from FALSE to TRUE: [...]

  * The Hot-Plug Interrupt Enable bit in the Slot Control register is
    set to 1b.

  * At least one hot-plug event status bit in the Slot Status register
    and its associated enable bit in the Slot Control register are both
    set to 1b.

Prevent pciehp from gratuitously bringing down the slot by clearing the
error-induced Data Link Layer State Changed event before restoring
config space.  Afterwards, check whether the link has unexpectedly
failed to retrain and synthesize a DLLSC event if so.

Allow each pcie_port_service_driver (one of them being pciehp) to define
a slot_reset callback and re-use the existing pm_iter() function to
iterate over the callbacks.

Thereby, the Endpoint driver remains bound throughout error recovery and
may restore the device to working state.

Surprise removal during error recovery is detected through a Presence
Detect Changed event.  The hotplug port is expected to not signal that
event as a result of a Hot Reset.

The issue isn't DPC-specific, it also occurs when an error is handled by
AER through aer_root_reset().  So while the issue was noticed only now,
it's been around since 2006 when AER support was first introduced.

[bhelgaas: drop PCI_ERROR_RECOVERY Kconfig, split pm_iter() rename to
preparatory patch]
Link: https://lore.kernel.org/linux-pci/08c046b0-c9f2-3489-eeef-7e7aca435bb9@gmail.com/
Fixes: 6c2b374d74 ("PCI-Express AER implemetation: AER core and aerdriver")
Link: https://lore.kernel.org/r/251f4edcc04c14f873ff1c967bc686169cd07d2d.1627638184.git.lukas@wunner.de
Reported-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v2.6.19+: ba952824e6: PCI/portdrv: Report reset for frozen channel
Cc: Keith Busch <kbusch@kernel.org>
2021-10-15 14:23:46 -05:00
..
controller More ACPI updates for 5.15-rc1 2021-09-08 16:33:21 -07:00
endpoint pci-v5.15-changes 2021-09-07 19:13:42 -07:00
hotplug PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset 2021-10-15 14:23:46 -05:00
pcie PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset 2021-10-15 14:23:46 -05:00
switch PCI/sysfs: Use sysfs_emit() and sysfs_emit_at() in "show" functions 2021-06-03 22:14:47 -05:00
access.c Merge branch 'pci/misc' 2020-08-05 18:24:16 -05:00
ats.c PCI: Allow PASID on fake PCIe devices without TLP prefixes 2021-08-26 14:21:42 -05:00
bus.c PCI: Add device even if driver attach failed 2020-07-07 17:33:41 -05:00
ecam.c PCI: Dynamically map ECAM regions 2021-06-16 17:20:40 -05:00
host-bridge.c PCI: VMD: ACPI: Make ACPI companion lookup work for VMD bus 2021-09-02 17:59:58 +02:00
iov.c Merge branch 'pci/virtualization' 2021-07-06 10:56:26 -05:00
irq.c PCI: Remove unused pci_lost_interrupt() 2020-07-29 14:25:18 -05:00
Kconfig pci-v5.10-changes 2020-10-22 12:41:00 -07:00
Makefile PCI: Apply CONFIG_PCI_DEBUG to entire drivers/pci hierarchy 2021-02-09 15:10:20 -06:00
mmap.c
msi.c Updates to the interrupt core and driver subsystems: 2021-08-30 14:38:37 -07:00
of.c PCI: of: Don't fail devm_pci_alloc_host_bridge() on missing 'ranges' 2021-08-04 12:20:00 +01:00
p2pdma.c Merge branch 'pci/sysfs' 2021-07-06 10:56:25 -05:00
pci-acpi.c PCI/ACPI: Don't reset a fwnode set by OF 2021-09-15 16:26:59 -05:00
pci-bridge-emul.c PCI: pci-bridge-emul: Fix array overruns, improve safety 2021-02-17 17:25:31 -06:00
pci-bridge-emul.h PCI: pci-bridge-emul: Add PCIe Root Capabilities Register 2021-08-05 10:51:49 +01:00
pci-driver.c VFIO update for v5.15-rc1 2021-09-02 13:41:33 -07:00
pci-label.c PCI/sysfs: Use sysfs_emit() and sysfs_emit_at() in "show" functions 2021-06-03 22:14:47 -05:00
pci-mid.c
pci-pf-stub.c PCI/IOV: Simplify pci-pf-stub with module_pci_driver() 2020-09-17 12:40:20 -05:00
pci-stub.c
pci-sysfs.c pci-v5.15-changes 2021-09-07 19:13:42 -07:00
pci.c pci-v5.15-changes 2021-09-07 19:13:42 -07:00
pci.h pci-v5.15-changes 2021-09-07 19:13:42 -07:00
probe.c Merge branch 'remotes/lorenzo/pci/hyper-v' 2021-09-02 14:56:47 -05:00
proc.c pci-v5.15-changes 2021-09-07 19:13:42 -07:00
quirks.c PCI: Add AMD GPU multi-function power dependencies 2021-09-15 16:31:41 -05:00
remove.c PCI: Remove reset_fn field from pci_dev 2021-08-17 17:44:38 -05:00
rom.c PCI: Use ioremap(), not phys_to_virt() for platform ROM 2020-03-30 09:52:23 -05:00
search.c PCI: Remove WARN_ON(in_interrupt()) 2021-02-10 16:46:29 -06:00
setup-bus.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
setup-irq.c
setup-res.c PCI: Decline to resize resources if boot config must be preserved 2021-01-12 16:39:52 -06:00
slot.c PCI/sysfs: Use sysfs_emit() and sysfs_emit_at() in "show" functions 2021-06-03 22:14:47 -05:00
syscall.c PCI: Return int from pciconfig_read() syscall 2021-08-03 16:55:48 -05:00
vc.c PCI: Fix kerneldoc warnings 2020-08-05 18:23:14 -05:00
vpd.c PCI/VPD: Defer VPD sizing until first access 2021-09-15 16:26:13 -05:00
xen-pcifront.c Merge branch 'stable/for-linus-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb 2021-09-03 10:34:44 -07:00