linux/arch/powerpc/platforms/powernv
Gavin Shan d2b0f6f77e powerpc/eeh: No hotplug on permanently removed dev
The issue was detected in a fairly complicated test case with
multiple hierarchical PEs, as shown in the following figure:

                +-----------------+
                | PE#3     p2p#0  |
                |          p2p#1  |
                +-----------------+
                        |
                +-----------------+
                | PE#4     pdev#0 |
                |          pdev#1 |
                +-----------------+

PE#4 (which has 2 PCI devices) is the child of PE#3, which has 2 p2p
bridges. We accidentally hit a less-known scenario: PE#4 was removed
permanently from the system because of a permanent failure (e.g.
exceeding the max allowed number of failures in the last hour), then
we detected EEH errors on PE#3 and tried to recover it. However, the
eeh_dev instances for pdev#0/1 were not detached from PE#4, which was
still connected to PE#3. That happened because we rely on the
count-based pcibios_release_device(), which isn't reliable enough.
When doing recovery for PE#3, we still applied hotplug to PE#4 and
pdev#0/1, which were no longer valid. Eventually, we ran into a
kernel crash.

The patch fixes the above issue from two aspects. For unplug, we simply
skip permanently removed PEs, i.e. those whose state is
(EEH_PE_STATE_ISOLATED && !EEH_PE_STATE_RECOVERING) and whose frozen
count is greater than EEH_MAX_ALLOWED_FREEZES. For plug, we mark all
permanently removed EEH devices with EEH_DEV_REMOVED and return 0xFF's
on reads of their PCI config space so that the PCI core will omit them.
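
The two checks described above can be sketched in isolation as follows.
This is a minimal user-space model, not the code from the patch: every
sketch_* / SKETCH_* name, the struct layouts, the flag values and the
freeze limit of 5 are assumptions made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the EEH flags; the values are arbitrary. */
#define SKETCH_PE_ISOLATED    (1u << 0)
#define SKETCH_PE_RECOVERING  (1u << 1)
#define SKETCH_DEV_REMOVED    (1u << 0)
#define SKETCH_MAX_FREEZES    5          /* assumed limit, for illustration */
#define SKETCH_CFG_WORDS      4

struct sketch_pe {
	unsigned int state;        /* SKETCH_PE_* bits */
	int freeze_count;          /* how often the PE has been frozen */
};

struct sketch_dev {
	unsigned int mode;         /* SKETCH_DEV_* bits */
	uint32_t config[SKETCH_CFG_WORDS]; /* toy config space */
};

/*
 * Unplug side: a PE counts as permanently removed when it is isolated,
 * not being recovered, and has exceeded the freeze limit. Such a PE
 * would be skipped when walking the hierarchy.
 */
static bool sketch_pe_permanently_removed(const struct sketch_pe *pe)
{
	return (pe->state & SKETCH_PE_ISOLATED) &&
	       !(pe->state & SKETCH_PE_RECOVERING) &&
	       pe->freeze_count > SKETCH_MAX_FREEZES;
}

/*
 * Plug side: config reads on a device marked as removed return all
 * ones, which a bus scan interprets as "no device present".
 */
static int sketch_read_config(const struct sketch_dev *dev,
			      unsigned int index, uint32_t *val)
{
	if ((dev->mode & SKETCH_DEV_REMOVED) || index >= SKETCH_CFG_WORDS) {
		*val = 0xFFFFFFFFu;
		return 0;
	}

	*val = dev->config[index];
	return 0;
}
```

In this model, a PE that is isolated but still recovering is not
treated as permanently removed, so recovery of a parent PE can still
reach it; only PEs that exceeded the freeze limit are skipped.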

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28 17:34:32 +10:00
..
eeh-ioda.c powerpc/eeh: Allow to disable EEH 2014-04-28 17:34:27 +10:00
eeh-powernv.c powerpc/eeh: Use cached capability for log dump 2014-04-28 17:34:19 +10:00
Kconfig cpufreq: powernv: Select CPUFreq related Kconfig options for powernv 2014-04-07 14:35:28 +02:00
Makefile powerpc/powernv: Add OPAL message log interface 2014-04-09 12:53:19 +10:00
opal-async.c powerpc/powernv: Fix endian issues with OPAL async code 2014-04-07 10:34:27 +10:00
opal-dump.c powerpc/powernv: Fix little endian issues in OPAL dump code 2014-04-28 13:11:24 +10:00
opal-elog.c powerpc/powernv: Fix little endian issues in OPAL error log code 2014-04-28 13:11:23 +10:00
opal-flash.c powerpc/powernv: Create OPAL sglist helper functions and fix endian issues 2014-04-28 13:11:23 +10:00
opal-lpc.c powerpc/powernv: Fix OPAL LPC access in Little Endian 2013-12-13 15:55:15 +11:00
opal-memory-errors.c powerpc/powernv: Get FSP memory errors and plumb into memory poison infrastructure. 2013-12-09 11:41:14 +11:00
opal-msglog.c powerpc/powernv: Add OPAL message log interface 2014-04-09 12:53:19 +10:00
opal-nvram.c powerpc/powernv: Make OPAL NVRAM device tree accesses endian safe 2013-10-11 16:48:47 +11:00
opal-rtc.c powernv: Remove get/set_rtc_time when they are not present 2013-12-05 16:08:22 +11:00
opal-sensor.c powerpc/powernv: Fix endian issues with sensor code 2014-04-09 12:52:49 +10:00
opal-sysparam.c powerpc/powernv: Check sysparam size before creation 2014-04-28 13:08:49 +10:00
opal-takeover.S powerpc: Merge STK_REG/PARAM/FRAMESIZE 2012-07-10 19:18:03 +10:00
opal-wrappers.S powerpc/powernv: Add invalid OPAL call 2014-04-09 12:53:23 +10:00
opal-xscom.c powerpc/powernv: Fix indirect XSCOM unmangling 2014-02-28 19:15:49 +11:00
opal.c powerpc/powernv: Create OPAL sglist helper functions and fix endian issues 2014-04-28 13:11:23 +10:00
pci-ioda.c powerpc/powernv: Release the refcount for pci_dev 2014-04-28 13:11:20 +10:00
pci-p5ioc2.c PPC: POWERNV: move iommu_add_device earlier 2013-12-05 16:08:17 +11:00
pci.c powerpc/eeh: No hotplug on permanently removed dev 2014-04-28 17:34:32 +10:00
pci.h powerpc/eeh: Allow to disable EEH 2014-04-28 17:34:27 +10:00
powernv.h powerpc/powernv: Add iommu DMA bypass support for IODA2 2014-02-11 16:07:37 +11:00
rng.c powerpc: Make cpu_to_chip_id() available when SMP=n 2013-11-21 10:33:44 +11:00
setup.c powerpc/powernv: Fix kexec races going back to OPAL 2014-04-28 13:08:50 +10:00
smp.c ppc/powernv: Set the runlatch bits correctly for offline cpus 2014-04-28 16:32:40 +10:00