2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2025-01-14 08:34:02 +08:00
linux-next/drivers/pci
Chen Yu e60514bd44 PCI/PM: Restore the status of PCI devices across hibernation
Currently we saw a lot of "No irq handler" errors during hibernation, which
caused the system hang finally:

  ata4.00: qc timeout (cmd 0xec)
  ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
  ata4.00: revalidation failed (errno=-5)
  ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
  do_IRQ: 31.151 No irq handler for vector

According to above logs, there is an interrupt triggered and it is
dispatched to CPU31 with a vector number 151, but there is no handler for
it, thus this IRQ will not get acked and will cause an IRQ flood which
kills the system.  To be more specific, the 31.151 is an interrupt from the
AHCI host controller.

After some investigation, the reason why this issue is triggered is because
the thaw_noirq() function does not restore the MSI/MSI-X settings across
hibernation.

The scenario is illustrated below:

  1. Before hibernation, IRQ 34 is the handler for the AHCI device, which
     is bound to CPU31.

  2. Hibernation starts, the AHCI device is put into low power state.

  3. All the nonboot CPUs are put offline, so IRQ 34 has to be migrated to
     the last alive one - CPU0.

  4. After the snapshot has been created, all the nonboot CPUs are brought
     up again; IRQ 34 remains bound to CPU0.

  5. AHCI devices are put into D0.

  6. The snapshot is written to the disk.

The issue is triggered in step 6.  The AHCI interrupt should be delivered
to CPU0, however it is delivered to the original CPU31 instead, which
causes the "No irq handler" issue.

Ying Huang has provided a clue that, in step 3 it is possible that writing
to the register might not take effect as the PCI devices have been
suspended.

In step 3, the IRQ 34 affinity should be modified from CPU31 to CPU0, but
in fact it is not.  In __pci_write_msi_msg(), if the device is already in
low power state, the low level MSI message entry will not be updated but
cached.  During the device restore process after a normal suspend/resume,
pci_restore_msi_state() writes the cached MSI back to the hardware.

But this is not the case for hibernation.  pci_restore_msi_state() is not
currently called in pci_pm_thaw_noirq(), although pci_save_state() has
saved the necessary PCI cached information in pci_pm_freeze_noirq().

Restore the PCI status for the device during hibernation.  Otherwise the
status might be lost across hibernation (for example, settings for MSI,
MSI-X, ATS, ACS, IOV, etc.), which might cause problems during hibernation.

Suggested-by: Ying Huang <ying.huang@intel.com>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: stable@vger.kernel.org
Cc: Len Brown <len.brown@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: Ying Huang <ying.huang@intel.com>
2017-06-30 11:15:07 -05:00
..
dwc PCI: imx6: Fix config read timeout handling 2017-05-22 17:06:30 -05:00
endpoint PCI: endpoint: Make PCI_ENDPOINT depend on HAS_DMA 2017-05-22 16:23:59 -05:00
host PCI: Don't allow unbinding host controllers that aren't prepared 2017-04-28 10:38:00 -05:00
hotplug Annotation of module parameters that specify device settings 2017-05-10 19:13:03 -07:00
pcie Merge branch 'pci/enumeration' into next 2017-04-28 10:33:55 -05:00
switch switchtec: Fix minor bug with partition ID register 2017-05-22 16:52:30 -05:00
access.c Merge branch 'pci/misc' into next 2017-04-28 10:34:14 -05:00
ats.c PCI: Remove pci_ats_enabled() 2015-08-13 15:59:59 -05:00
bus.c PCI: Autosense device removal in pci_bridge_d3_update() 2016-11-17 18:44:56 -06:00
ecam.c PCI: ECAM: Map config region with pci_remap_cfgspace() 2017-04-24 13:53:14 -05:00
host-bridge.c cxl: use pcibios_free_controller_deferred() when removing vPHBs 2016-08-22 11:09:33 +10:00
hotplug-pci.c
htirq.c x86/htirq: Use hierarchical irqdomain to manage Hypertransport interrupts 2015-04-24 15:36:50 +02:00
iov.c PCI: Add sysfs sriov_drivers_autoprobe to control VF driver binding 2017-04-20 08:53:51 -05:00
irq.c pci-v4.12-changes 2017-05-08 19:03:25 -07:00
Kconfig Merge branch 'pci/switchtec' into next 2017-04-28 10:33:41 -05:00
Makefile Merge branch 'pci/resource-mmap' into next 2017-04-28 10:34:34 -05:00
mmap.c PCI: Add I/O BAR support to generic pci_mmap_resource_range() 2017-04-20 08:47:47 -05:00
msi.c pci-v4.12-changes 2017-05-08 19:03:25 -07:00
of.c PCI/MSI: Use of_msi_get_domain instead of open-coded "msi-parent" parsing 2015-10-16 13:07:14 +01:00
pci-acpi.c Merge branch 'pci/pm' into next 2016-12-12 11:25:04 -06:00
pci-driver.c PCI/PM: Restore the status of PCI devices across hibernation 2017-06-30 11:15:07 -05:00
pci-label.c PCI: Fix broken URL for Dell biosdevname 2016-02-29 12:03:19 -06:00
pci-mid.c pci-v4.10-changes 2016-12-15 12:46:48 -08:00
pci-stub.c
pci-sysfs.c Merge branch 'pci/virtualization' into next 2017-04-28 10:36:12 -05:00
pci.c PCI/PM: Add needs_resume flag to avoid suspend complete optimization 2017-05-23 14:18:17 -05:00
pci.h pci-v4.12-changes 2017-05-08 19:03:25 -07:00
probe.c IOMMU Updates for Linux v4.12 2017-05-09 15:15:47 -07:00
proc.c PCI: Add BAR index argument to pci_mmap_page_range() 2017-04-20 08:47:47 -05:00
quirks.c drm/radeon: make MacBook Pro d3_delay quirk more generic 2017-06-30 11:15:07 -05:00
remove.c PCI: Autosense device removal in pci_bridge_d3_update() 2016-11-17 18:44:56 -06:00
rom.c PCI: Add comments about ROM BAR updating 2016-11-29 18:05:09 -06:00
search.c PCI: Add device flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT 2017-04-13 18:49:50 -05:00
setup-bus.c PCI: Fix calculation of bridge window's size and alignment 2017-04-18 14:47:20 -05:00
setup-irq.c
setup-res.c PCI: Make PCI_ROM_ADDRESS_MASK a 32-bit constant 2017-04-18 14:46:57 -05:00
slot.c locking/atomic, kref: Add kref_read() 2017-01-14 11:37:18 +01:00
syscall.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
vc.c PCI: Fix unaligned accesses in VC code 2016-06-20 13:24:20 -05:00
vpd.c
xen-pcifront.c xen: make use of xenbus_read_unsigned() in xen-pcifront 2016-11-07 13:55:26 +01:00