2006-07-31 15:21:33 +08:00
|
|
|
/*
|
|
|
|
* drivers/pci/pcie/aer/aerdrv_core.c
|
|
|
|
*
|
|
|
|
* This file is subject to the terms and conditions of the GNU General Public
|
|
|
|
* License. See the file "COPYING" in the main directory of this archive
|
|
|
|
* for more details.
|
|
|
|
*
|
|
|
|
* This file implements the core part of PCI-Express AER. When a PCI Express
|
|
|
|
* error is delivered, an error message will be collected and printed to
|
|
|
|
* console, then, an error recovery procedure will be executed by following
|
|
|
|
* the pci error recovery rules.
|
|
|
|
*
|
|
|
|
* Copyright (C) 2006 Intel Corp.
|
|
|
|
* Tom Long Nguyen (tom.l.nguyen@intel.com)
|
|
|
|
* Zhang Yanmin (yanmin.zhang@intel.com)
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
|
|
|
|
#include <linux/module.h>
|
|
|
|
#include <linux/pci.h>
|
|
|
|
#include <linux/kernel.h>
|
|
|
|
#include <linux/errno.h>
|
|
|
|
#include <linux/pm.h>
|
|
|
|
#include <linux/suspend.h>
|
|
|
|
#include <linux/delay.h>
|
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers, which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 16:04:11 +08:00
|
|
|
#include <linux/slab.h>
|
2006-07-31 15:21:33 +08:00
|
|
|
#include "aerdrv.h"
|
|
|
|
|
|
|
|
static int forceload;
|
2009-06-16 13:35:11 +08:00
|
|
|
static int nosourceid;
|
2006-07-31 15:21:33 +08:00
|
|
|
module_param(forceload, bool, 0);
|
2009-06-16 13:35:11 +08:00
|
|
|
module_param(nosourceid, bool, 0);
|
2006-07-31 15:21:33 +08:00
|
|
|
|
|
|
|
int pci_enable_pcie_error_reporting(struct pci_dev *dev)
|
|
|
|
{
|
|
|
|
u16 reg16 = 0;
|
|
|
|
int pos;
|
|
|
|
|
PCI: PCIe AER: honor ACPI HEST FIRMWARE FIRST mode
Feedback from Hidetoshi Seto and Kenji Kaneshige incorporated. This
correctly handles PCI-X bridges, PCIe root ports and endpoints, and
prints debug messages when invalid/reserved types are found in the
HEST. PCI devices not in domain/segment 0 are not represented in
HEST, thus will be ignored.
Today, the PCIe Advanced Error Reporting (AER) driver attaches itself
to every PCIe root port for which BIOS reports it should, via ACPI
_OSC.
However, _OSC alone is insufficient for newer BIOSes. Part of ACPI
4.0 is the new APEI (ACPI Platform Error Interfaces) which is a way
for OS and BIOS to handshake over which errors for which components
each will handle. One table in ACPI 4.0 is the Hardware Error Source
Table (HEST), where BIOS can define that errors for certain PCIe
devices (or all devices), should be handled by BIOS ("Firmware First
mode"), rather than be handled by the OS.
Dell PowerEdge 11G server BIOS defines Firmware First mode in HEST, so
that it may manage such errors, log them to the System Event Log, and
possibly take other actions. The aer driver should honor this, and
not attach itself to devices noted as such.
Furthermore, Kenji Kaneshige reminded us to disallow changing the AER
registers when respecting Firmware First mode. Platform firmware is
expected to manage these, and if changes to them are allowed, it could
break that firmware's behavior.
The HEST parsing code may be replaced in the future by a more
feature-rich implementation. This patch provides the minimum needed
to prevent breakage until that implementation is available.
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-03 01:51:24 +08:00
|
|
|
if (dev->aer_firmware_first)
|
|
|
|
return -EIO;
|
|
|
|
|
2008-10-19 20:35:20 +08:00
|
|
|
pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
|
2006-07-31 15:21:33 +08:00
|
|
|
if (!pos)
|
|
|
|
return -EIO;
|
|
|
|
|
2009-11-11 13:31:38 +08:00
|
|
|
pos = pci_pcie_cap(dev);
|
2008-10-19 08:33:19 +08:00
|
|
|
if (!pos)
|
|
|
|
return -EIO;
|
|
|
|
|
2006-07-31 15:21:33 +08:00
|
|
|
pci_read_config_word(dev, pos+PCI_EXP_DEVCTL, &reg16);
|
|
|
|
reg16 = reg16 |
|
|
|
|
PCI_EXP_DEVCTL_CERE |
|
|
|
|
PCI_EXP_DEVCTL_NFERE |
|
|
|
|
PCI_EXP_DEVCTL_FERE |
|
|
|
|
PCI_EXP_DEVCTL_URRE;
|
PCI: pcie, aer: checkpatch style cleanup in pcie/aer/*
Before:
drivers/pci/pcie/aer/aer_inject.c
total: 4 errors, 4 warnings, 473 lines checked
drivers/pci/pcie/aer/aerdrv.c
total: 5 errors, 2 warnings, 333 lines checked
drivers/pci/pcie/aer/aerdrv.h
total: 1 errors, 0 warnings, 139 lines checked
drivers/pci/pcie/aer/aerdrv_core.c
total: 4 errors, 3 warnings, 872 lines checked
drivers/pci/pcie/aer/aerdrv_errprint.c
total: 12 errors, 11 warnings, 248 lines checked
After:
drivers/pci/pcie/aer/aer_inject.c
total: 0 errors, 0 warnings, 466 lines checked
drivers/pci/pcie/aer/aerdrv.c
total: 0 errors, 0 warnings, 335 lines checked
drivers/pci/pcie/aer/aerdrv.h
total: 0 errors, 0 warnings, 139 lines checked
drivers/pci/pcie/aer/aerdrv_core.c
total: 0 errors, 0 warnings, 869 lines checked
drivers/pci/pcie/aer/aerdrv_errprint.c
total: 0 errors, 10 warnings, 247 lines checked
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Reviewed-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-07 16:07:29 +08:00
|
|
|
pci_write_config_word(dev, pos+PCI_EXP_DEVCTL, reg16);
|
|
|
|
|
2006-07-31 15:21:33 +08:00
|
|
|
return 0;
|
|
|
|
}
|
2009-09-07 16:07:29 +08:00
|
|
|
EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
|
2006-07-31 15:21:33 +08:00
|
|
|
|
|
|
|
int pci_disable_pcie_error_reporting(struct pci_dev *dev)
|
|
|
|
{
|
|
|
|
u16 reg16 = 0;
|
|
|
|
int pos;
|
|
|
|
|
2009-11-03 01:51:24 +08:00
|
|
|
if (dev->aer_firmware_first)
|
|
|
|
return -EIO;
|
|
|
|
|
2009-11-11 13:31:38 +08:00
|
|
|
pos = pci_pcie_cap(dev);
|
2006-07-31 15:21:33 +08:00
|
|
|
if (!pos)
|
|
|
|
return -EIO;
|
|
|
|
|
|
|
|
pci_read_config_word(dev, pos+PCI_EXP_DEVCTL, &reg16);
|
|
|
|
reg16 = reg16 & ~(PCI_EXP_DEVCTL_CERE |
|
|
|
|
PCI_EXP_DEVCTL_NFERE |
|
|
|
|
PCI_EXP_DEVCTL_FERE |
|
|
|
|
PCI_EXP_DEVCTL_URRE);
|
2009-09-07 16:07:29 +08:00
|
|
|
pci_write_config_word(dev, pos+PCI_EXP_DEVCTL, reg16);
|
|
|
|
|
2006-07-31 15:21:33 +08:00
|
|
|
return 0;
|
|
|
|
}
|
2009-09-07 16:07:29 +08:00
|
|
|
EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);
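The enable and disable paths above are both a read-modify-write of the four error-reporting enable bits in the PCIe Device Control register. The following is a minimal userspace sketch of just that bit arithmetic, not kernel code. The `DEVCTL_*` constants are assumed to mirror the `PCI_EXP_DEVCTL_*` values in `pci_regs.h`, and `devctl_enable_reporting()` / `devctl_disable_reporting()` are hypothetical helper names introduced for illustration.

```c
#include <stdint.h>

/* Assumed to mirror PCI_EXP_DEVCTL_* from include/linux/pci_regs.h. */
#define DEVCTL_CERE  0x0001u /* Correctable Error Reporting Enable */
#define DEVCTL_NFERE 0x0002u /* Non-Fatal Error Reporting Enable */
#define DEVCTL_FERE  0x0004u /* Fatal Error Reporting Enable */
#define DEVCTL_URRE  0x0008u /* Unsupported Request Reporting Enable */

#define DEVCTL_AER_MASK \
	(DEVCTL_CERE | DEVCTL_NFERE | DEVCTL_FERE | DEVCTL_URRE)

/* Hypothetical helper: value to write back when enabling reporting. */
static uint16_t devctl_enable_reporting(uint16_t devctl)
{
	return (uint16_t)(devctl | DEVCTL_AER_MASK);
}

/* Hypothetical helper: value to write back when disabling reporting. */
static uint16_t devctl_disable_reporting(uint16_t devctl)
{
	return (uint16_t)(devctl & ~DEVCTL_AER_MASK);
}
```

Only the low four bits change in either direction; every other Device Control field read from the device is written back untouched, which is why the functions read the register first instead of writing a constant.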
|
2006-07-31 15:21:33 +08:00
|
|
|
|
|
|
|
int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
|
|
|
|
{
|
|
|
|
int pos;
|
2009-12-04 01:28:20 +08:00
|
|
|
u32 status;
|
2006-07-31 15:21:33 +08:00
|
|
|
|
2008-10-19 08:33:19 +08:00
|
|
|
pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
|
2006-07-31 15:21:33 +08:00
|
|
|
if (!pos)
|
|
|
|
return -EIO;
|
|
|
|
|
|
|
|
pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, &status);
|
2009-12-04 01:28:20 +08:00
|
|
|
if (status)
|
|
|
|
pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, status);
|
2006-07-31 15:21:33 +08:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
2009-09-07 16:07:29 +08:00
|
|
|
EXPORT_SYMBOL_GPL(pci_cleanup_aer_uncorrect_error_status);
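The cleanup function above reads the uncorrectable status register and writes the same value back. That only makes sense because AER status bits are write-1-to-clear: writing a 1 clears the bit, writing a 0 leaves it alone. The sketch below models that behaviour in plain userspace C. `struct rw1c_reg`, `rw1c_write()` and `rw1c_clear_pending()` are hypothetical names for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a write-1-to-clear (RW1C) status register. */
struct rw1c_reg {
	uint32_t bits;
};

/* A write clears exactly the bits that are set in `value`. */
static void rw1c_write(struct rw1c_reg *reg, uint32_t value)
{
	assert(reg != NULL);
	reg->bits &= ~value;
}

/*
 * Mirrors the read-then-write-back pattern: clear whatever is
 * pending and return the status that was observed.
 */
static uint32_t rw1c_clear_pending(struct rw1c_reg *reg)
{
	uint32_t status;

	assert(reg != NULL);
	status = reg->bits;
	if (status)
		rw1c_write(reg, status);
	return status;
}
```

Writing back only the bits that were read avoids clearing an error that was latched between the read and the write, and skipping the write when nothing is pending saves a config-space access.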
|
2006-07-31 15:21:33 +08:00
|
|
|
|
2010-04-15 12:14:17 +08:00
|
|
|
/**
|
|
|
|
* add_error_device - list device to be handled
|
|
|
|
* @e_info: pointer to error info
|
|
|
|
* @dev: pointer to pci_dev to be added
|
|
|
|
*/
|
2009-06-16 13:35:16 +08:00
|
|
|
static int add_error_device(struct aer_err_info *e_info, struct pci_dev *dev)
|
|
|
|
{
|
|
|
|
if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) {
|
|
|
|
e_info->dev[e_info->error_dev_num] = dev;
|
|
|
|
e_info->error_dev_num++;
|
2010-04-15 12:14:17 +08:00
|
|
|
return 0;
|
2009-09-07 16:07:29 +08:00
|
|
|
}
|
2010-04-15 12:14:17 +08:00
|
|
|
return -ENOSPC;
|
2009-06-16 13:35:16 +08:00
|
|
|
}
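`add_error_device()` is a bounded append: it stores the device while there is room and reports `-ENOSPC` once the fixed-size array is full. A self-contained sketch of the same pattern follows. `MAX_ERR_DEVICES` is an assumed stand-in for `AER_MAX_MULTI_ERR_DEVICES` (its value here is an assumption), and `struct err_list` / `err_list_add()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ERR_DEVICES 5 /* assumed stand-in for AER_MAX_MULTI_ERR_DEVICES */

/* Fixed-capacity list of opaque device pointers. */
struct err_list {
	const void *dev[MAX_ERR_DEVICES];
	int count;
};

/* Append `dev`; return 0 on success or -ENOSPC when the list is full. */
static int err_list_add(struct err_list *list, const void *dev)
{
	assert(list != NULL);
	if (list->count < MAX_ERR_DEVICES) {
		list->dev[list->count++] = dev;
		return 0;
	}
	return -ENOSPC;
}
```

Returning an error instead of silently dropping the device lets the caller decide what to do on overflow, for example stopping a bus walk early.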
|
|
|
|
|
2009-06-16 13:35:11 +08:00
|
|
|
#define PCI_BUS(x) (((x) >> 8) & 0xff)
|
|
|
|
|
2010-04-15 12:12:21 +08:00
|
|
|
/**
|
|
|
|
* is_error_source - check whether the device is source of reported error
|
|
|
|
* @dev: pointer to pci_dev to be checked
|
|
|
|
* @e_info: pointer to reported error info
|
|
|
|
*/
|
|
|
|
static bool is_error_source(struct pci_dev *dev, struct aer_err_info *e_info)
|
2009-06-16 13:35:11 +08:00
|
|
|
{
|
|
|
|
int pos;
|
2010-04-15 12:12:21 +08:00
|
|
|
u32 status, mask;
|
2009-06-16 13:35:11 +08:00
|
|
|
u16 reg16;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* When bus id is equal to 0, it might be a bad id
|
|
|
|
* reported by root port.
|
|
|
|
*/
|
|
|
|
if (!nosourceid && (PCI_BUS(e_info->id) != 0)) {
|
2010-04-15 12:13:41 +08:00
|
|
|
/* Device ID match? */
|
|
|
|
if (e_info->id == ((dev->bus->number << 8) | dev->devfn))
|
2010-04-15 12:12:21 +08:00
|
|
|
return true;
|
2009-06-16 13:35:16 +08:00
|
|
|
|
2010-04-15 12:12:21 +08:00
|
|
|
/* An ID mismatch is final unless multiple errors were reported */
|
2009-09-07 16:16:20 +08:00
|
|
|
if (!e_info->multi_error_valid)
|
2010-04-15 12:12:21 +08:00
|
|
|
return false;
|
2009-06-16 13:35:11 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2009-06-16 13:35:16 +08:00
|
|
|
* When either
|
|
|
|
* 1) nosourceid==y;
|
|
|
|
* 2) bus id is equal to 0. Some ports might lose the bus
|
|
|
|
* id of error source id;
|
|
|
|
* 3) There are multiple errors and the prior ID comparison failed;
|
2010-04-15 12:12:21 +08:00
|
|
|
* We check AER status registers to find possible reporter.
|
2009-06-16 13:35:11 +08:00
|
|
|
*/
|
|
|
|
if (atomic_read(&dev->enable_cnt) == 0)
|
2010-04-15 12:12:21 +08:00
|
|
|
return false;
|
2009-11-11 13:31:38 +08:00
|
|
|
pos = pci_pcie_cap(dev);
|
2009-06-16 13:35:11 +08:00
|
|
|
if (!pos)
|
2010-04-15 12:12:21 +08:00
|
|
|
return false;
|
|
|
|
|
2009-06-16 13:35:11 +08:00
|
|
|
/* Check if AER is enabled */
|
2010-04-15 12:12:21 +08:00
|
|
|
pci_read_config_word(dev, pos + PCI_EXP_DEVCTL, &reg16);
|
2009-06-16 13:35:11 +08:00
|
|
|
if (!(reg16 & (
|
|
|
|
PCI_EXP_DEVCTL_CERE |
|
|
|
|
PCI_EXP_DEVCTL_NFERE |
|
|
|
|
PCI_EXP_DEVCTL_FERE |
|
|
|
|
PCI_EXP_DEVCTL_URRE)))
|
2010-04-15 12:12:21 +08:00
|
|
|
return false;
|
2009-06-16 13:35:11 +08:00
|
|
|
pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
|
|
|
|
if (!pos)
|
2010-04-15 12:12:21 +08:00
|
|
|
return false;
|
2009-06-16 13:35:11 +08:00
|
|
|
|
2010-04-15 12:12:21 +08:00
|
|
|
/* Check if error is recorded */
|
2009-06-16 13:35:11 +08:00
|
|
|
if (e_info->severity == AER_CORRECTABLE) {
|
2009-09-07 16:12:25 +08:00
|
|
|
pci_read_config_dword(dev, pos + PCI_ERR_COR_STATUS, &status);
|
|
|
|
pci_read_config_dword(dev, pos + PCI_ERR_COR_MASK, &mask);
|
2009-06-16 13:35:11 +08:00
|
|
|
} else {
|
2009-09-07 16:12:25 +08:00
|
|
|
pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, &status);
|
|
|
|
pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, &mask);
|
2006-07-31 15:21:33 +08:00
|
|
|
}
|
2010-04-15 12:12:21 +08:00
|
|
|
if (status & ~mask)
|
|
|
|
return true;
|
2006-07-31 15:21:33 +08:00
|
|
|
|
2010-04-15 12:12:21 +08:00
|
|
|
return false;
|
|
|
|
}
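`is_error_source()` relies on two small pieces of arithmetic: the 16-bit source ID packs the bus number into bits 15:8 and `devfn` into bits 7:0, and an error counts as recorded only if a status bit is set that is not masked. A userspace sketch of both checks is shown below. The helper names `id_bus()`, `make_id()` and `has_unmasked_error()` are hypothetical, chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bus number lives in bits 15:8 of the 16-bit source ID. */
static uint8_t id_bus(uint16_t id)
{
	return (uint8_t)((id >> 8) & 0xff);
}

/* Compose a source ID from a bus number and devfn (device << 3 | function). */
static uint16_t make_id(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((bus << 8) | devfn);
}

/* True if any status bit is set that the mask register does not hide. */
static bool has_unmasked_error(uint32_t status, uint32_t mask)
{
	return (status & ~mask) != 0;
}
```

The status check is the fallback path: it is used when the reported ID cannot be trusted (bus number 0 or `nosourceid` set) or when multiple errors were reported and the ID did not match this device.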
|
2009-06-16 13:35:16 +08:00
|
|
|
|
2010-04-15 12:12:21 +08:00
|
|
|
static int find_device_iter(struct pci_dev *dev, void *data)
|
|
|
|
{
|
|
|
|
struct aer_err_info *e_info = (struct aer_err_info *)data;
|
|
|
|
|
|
|
|
if (is_error_source(dev, e_info)) {
|
2010-04-15 12:14:17 +08:00
|
|
|
/* List this device */
|
|
|
|
if (add_error_device(e_info, dev)) {
|
|
|
|
/* We cannot handle more... Stop iteration */
|
|
|
|
/* TODO: Should print error message here? */
|
|
|
|
return 1;
|
|
|
|
}
|
2010-04-15 12:12:21 +08:00
|
|
|
|
|
|
|
/* If there is only a single error, stop iteration */
|
|
|
|
if (!e_info->multi_error_valid)
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
return 0;
|
2006-07-31 15:21:33 +08:00
|
|
|
}
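`find_device_iter()` uses its return value to control the walk: a nonzero return stops the iteration, zero lets it continue. The sketch below reproduces that contract over a plain array, under the assumption that the walker stops at the first nonzero return. `walk()`, `struct match_state` and `match_iter()` are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*walk_cb)(int item, void *data);

/* Call `cb` on each item; stop as soon as it returns nonzero. */
static void walk(const int *items, size_t n, walk_cb cb, void *data)
{
	for (size_t i = 0; i < n; i++)
		if (cb(items[i], data))
			break;
}

struct match_state {
	int wanted; /* value that identifies an error source */
	int multi;  /* nonzero if several sources may exist */
	int hits;   /* number of sources found so far */
};

/* Count matches; with a single expected source, stop at the first hit. */
static int match_iter(int item, void *data)
{
	struct match_state *st = data;

	assert(st != NULL);
	if (item == st->wanted) {
		st->hits++;
		if (!st->multi)
			return 1;
	}
	return 0;
}
```

With `multi` clear the walk ends at the first match, which is the cheap common case; with `multi` set every item is visited so that all candidates are collected.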
|
|
|
|
|
|
|
|
/**
|
|
|
|
* find_source_device - search through device hierarchy for source device
|
2007-11-29 01:04:23 +08:00
|
|
|
* @parent: pointer to Root Port pci_dev data structure
|
2010-04-15 12:11:42 +08:00
|
|
|
* @e_info: detailed error information, such as the source ID
|
2006-07-31 15:21:33 +08:00
|
|
|
*
|
2010-04-15 12:11:42 +08:00
|
|
|
* Return true if found.
|
|
|
|
*
|
|
|
|
* Invoked by DPC when error is detected at the Root Port.
|
2007-11-29 01:04:23 +08:00
|
|
|
*/
|
2010-04-15 12:11:42 +08:00
|
|
|
static bool find_source_device(struct pci_dev *parent,
|
2009-06-16 13:35:11 +08:00
|
|
|
struct aer_err_info *e_info)
|
2006-07-31 15:21:33 +08:00
|
|
|
{
|
|
|
|
struct pci_dev *dev = parent;
|
2009-06-16 13:35:11 +08:00
|
|
|
int result;
|
2006-07-31 15:21:33 +08:00
|
|
|
|
|
|
|
/* Is Root Port an agent that sends error message? */
|
2009-06-16 13:35:11 +08:00
|
|
|
result = find_device_iter(dev, e_info);
|
|
|
|
if (result)
|
2010-04-15 12:11:42 +08:00
|
|
|
return true;
|
2006-07-31 15:21:33 +08:00
|
|
|
|
2009-06-16 13:35:11 +08:00
|
|
|
pci_walk_bus(parent->subordinate, find_device_iter, e_info);
|
2010-04-15 12:11:42 +08:00
|
|
|
|
|
|
|
if (!e_info->error_dev_num) {
|
|
|
|
dev_printk(KERN_DEBUG, &parent->dev,
|
|
|
|
"can't find device of ID%04x\n",
|
|
|
|
e_info->id);
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
return true;
|
2006-07-31 15:21:33 +08:00
|
|
|
}
|
|
|
|
|
2009-06-16 13:34:38 +08:00
|
|
|
static int report_error_detected(struct pci_dev *dev, void *data)
|
2006-07-31 15:21:33 +08:00
|
|
|
{
|
|
|
|
pci_ers_result_t vote;
|
|
|
|
struct pci_error_handlers *err_handler;
|
|
|
|
struct aer_broadcast_data *result_data;
|
|
|
|
result_data = (struct aer_broadcast_data *) data;
|
|
|
|
|
|
|
|
dev->error_state = result_data->state;
|
|
|
|
|
|
|
|
if (!dev->driver ||
|
|
|
|
!dev->driver->err_handler ||
|
|
|
|
!dev->driver->err_handler->error_detected) {
|
|
|
|
if (result_data->state == pci_channel_io_frozen &&
|
|
|
|
!(dev->hdr_type & PCI_HEADER_TYPE_BRIDGE)) {
|
|
|
|
/*
|
|
|
|
* In case of fatal recovery, a downstream device
* without a driver may leave us unable to recover,
* because a driver loaded later (insmod) for this
* device is unaware of its hardware state.
|
|
|
|
*/
|
2008-06-14 00:52:12 +08:00
|
|
|
dev_printk(KERN_DEBUG, &dev->dev, "device has %s\n",
|
|
|
|
dev->driver ?
|
|
|
|
"no AER-aware driver" : "no driver");
|
2006-07-31 15:21:33 +08:00
|
|
|
}
|
2009-06-16 13:34:38 +08:00
|
|
|
return 0;
|
2006-07-31 15:21:33 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
err_handler = dev->driver->err_handler;
|
|
|
|
vote = err_handler->error_detected(dev, result_data->state);
|
|
|
|
result_data->result = merge_result(result_data->result, vote);
|
2009-06-16 13:34:38 +08:00
|
|
|
return 0;
|
2006-07-31 15:21:33 +08:00
|
|
|
}
|
|
|
|
|
2009-06-16 13:34:38 +08:00
|
|
|
static int report_mmio_enabled(struct pci_dev *dev, void *data)
|
2006-07-31 15:21:33 +08:00
|
|
|
{
|
|
|
|
pci_ers_result_t vote;
|
|
|
|
struct pci_error_handlers *err_handler;
|
|
|
|
struct aer_broadcast_data *result_data;
|
|
|
|
result_data = (struct aer_broadcast_data *) data;
|
|
|
|
|
|
|
|
if (!dev->driver ||
|
|
|
|
!dev->driver->err_handler ||
|
|
|
|
!dev->driver->err_handler->mmio_enabled)
|
2009-06-16 13:34:38 +08:00
|
|
|
return 0;
|
2006-07-31 15:21:33 +08:00
|
|
|
|
|
|
|
err_handler = dev->driver->err_handler;
|
|
|
|
vote = err_handler->mmio_enabled(dev);
|
|
|
|
result_data->result = merge_result(result_data->result, vote);
|
2009-06-16 13:34:38 +08:00
|
|
|
return 0;
|
2006-07-31 15:21:33 +08:00
|
|
|
}
|
|
|
|
|
2009-06-16 13:34:38 +08:00
|
|
|
static int report_slot_reset(struct pci_dev *dev, void *data)
|
2006-07-31 15:21:33 +08:00
|
|
|
{
|
|
|
|
pci_ers_result_t vote;
|
|
|
|
struct pci_error_handlers *err_handler;
|
|
|
|
struct aer_broadcast_data *result_data;
|
|
|
|
result_data = (struct aer_broadcast_data *) data;
|
|
|
|
|
|
|
|
if (!dev->driver ||
|
|
|
|
!dev->driver->err_handler ||
|
|
|
|
!dev->driver->err_handler->slot_reset)
|
2009-06-16 13:34:38 +08:00
|
|
|
return 0;
|
2006-07-31 15:21:33 +08:00
|
|
|
|
|
|
|
err_handler = dev->driver->err_handler;
|
|
|
|
vote = err_handler->slot_reset(dev);
|
|
|
|
result_data->result = merge_result(result_data->result, vote);
|
2009-06-16 13:34:38 +08:00
|
|
|
return 0;
|
2006-07-31 15:21:33 +08:00
|
|
|
}
|
|
|
|
|
2009-06-16 13:34:38 +08:00
|
|
|
static int report_resume(struct pci_dev *dev, void *data)
|
2006-07-31 15:21:33 +08:00
|
|
|
{
|
|
|
|
struct pci_error_handlers *err_handler;
|
|
|
|
|
|
|
|
dev->error_state = pci_channel_io_normal;
|
|
|
|
|
|
|
|
if (!dev->driver ||
|
|
|
|
!dev->driver->err_handler ||
|
2008-12-01 15:31:06 +08:00
|
|
|
!dev->driver->err_handler->resume)
|
2009-06-16 13:34:38 +08:00
|
|
|
return 0;
|
2006-07-31 15:21:33 +08:00
|
|
|
|
|
|
|
err_handler = dev->driver->err_handler;
|
|
|
|
err_handler->resume(dev);
|
2009-06-16 13:34:38 +08:00
|
|
|
return 0;
|
2006-07-31 15:21:33 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* broadcast_error_message - handle message broadcast to downstream drivers
|
2007-11-29 01:04:23 +08:00
|
|
|
* @dev: pointer to the device in the hierarchy from which the message is broadcast down
|
2006-07-31 15:21:33 +08:00
|
|
|
* @state: error state
|
2007-11-29 01:04:23 +08:00
|
|
|
* @error_mesg: message to print
|
|
|
|
* @cb: callback to be broadcast
|
2006-07-31 15:21:33 +08:00
|
|
|
*
|
|
|
|
* Invoked during the error recovery process. The error severity is
* broadcast to all downstream drivers in the hierarchy in question.
|
2007-11-29 01:04:23 +08:00
|
|
|
*/
|
2006-07-31 15:21:33 +08:00
|
|
|
static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
|
|
|
|
enum pci_channel_state state,
|
|
|
|
char *error_mesg,
|
2009-06-16 13:34:38 +08:00
|
|
|
int (*cb)(struct pci_dev *, void *))
|
2006-07-31 15:21:33 +08:00
|
|
|
{
|
|
|
|
struct aer_broadcast_data result_data;
|
|
|
|
|
2008-06-14 00:52:12 +08:00
|
|
|
dev_printk(KERN_DEBUG, &dev->dev, "broadcast %s message\n", error_mesg);
|
2006-07-31 15:21:33 +08:00
|
|
|
result_data.state = state;
|
|
|
|
if (cb == report_error_detected)
|
|
|
|
result_data.result = PCI_ERS_RESULT_CAN_RECOVER;
|
|
|
|
else
|
|
|
|
result_data.result = PCI_ERS_RESULT_RECOVERED;
|
|
|
|
|
|
|
|
if (dev->hdr_type & PCI_HEADER_TYPE_BRIDGE) {
|
|
|
|
/*
|
|
|
|
* If the error is reported by a bridge, we think this error
|
|
|
|
* is related to the downstream link of the bridge, so we
|
|
|
|
* do error recovery on all subordinates of the bridge instead
|
|
|
|
* of the bridge and clear the error status of the bridge.
|
|
|
|
*/
|
|
|
|
if (cb == report_error_detected)
|
|
|
|
dev->error_state = state;
|
|
|
|
pci_walk_bus(dev->subordinate, cb, &result_data);
|
|
|
|
if (cb == report_resume) {
|
|
|
|
pci_cleanup_aer_uncorrect_error_status(dev);
|
|
|
|
dev->error_state = pci_channel_io_normal;
|
|
|
|
}
|
2009-09-07 16:07:29 +08:00
|
|
|
} else {
|
2006-07-31 15:21:33 +08:00
|
|
|
/*
|
|
|
|
* If the error is reported by an end point, we think this
|
|
|
|
* error is related to the upstream link of the end point.
|
|
|
|
*/
|
|
|
|
pci_walk_bus(dev->bus, cb, &result_data);
|
|
|
|
}
|
|
|
|
|
|
|
|
return result_data.result;
|
|
|
|
}
|
|
|
|
|
|
|
|
struct find_aer_service_data {
|
|
|
|
struct pcie_port_service_driver *aer_driver;
|
|
|
|
int is_downstream;
|
|
|
|
};
|
|
|
|
|
|
|
|
static int find_aer_service_iter(struct device *device, void *data)
|
|
|
|
{
|
|
|
|
struct device_driver *driver;
|
|
|
|
struct pcie_port_service_driver *service_driver;
|
|
|
|
struct find_aer_service_data *result;
|
|
|
|
|
|
|
|
result = (struct find_aer_service_data *) data;
|
|
|
|
|
|
|
|
if (device->bus == &pcie_port_bus_type) {
|
2009-11-25 20:06:15 +08:00
|
|
|
struct pcie_device *pcie = to_pcie_device(device);
|
2009-01-13 21:46:46 +08:00
|
|
|
|
2009-11-25 20:06:15 +08:00
|
|
|
if (pcie->port->pcie_type == PCI_EXP_TYPE_DOWNSTREAM)
|
2006-07-31 15:21:33 +08:00
|
|
|
result->is_downstream = 1;
|
|
|
|
|
|
|
|
driver = device->driver;
|
|
|
|
if (driver) {
|
|
|
|
service_driver = to_service_driver(driver);
|
2009-01-13 21:46:46 +08:00
|
|
|
if (service_driver->service == PCIE_PORT_SERVICE_AER) {
|
2006-07-31 15:21:33 +08:00
|
|
|
result->aer_driver = service_driver;
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void find_aer_service(struct pci_dev *dev,
|
|
|
|
struct find_aer_service_data *data)
|
|
|
|
{
|
2006-08-29 02:43:25 +08:00
|
|
|
int retval;
|
|
|
|
retval = device_for_each_child(&dev->dev, data, find_aer_service_iter);
|
2006-07-31 15:21:33 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static pci_ers_result_t reset_link(struct pcie_device *aerdev,
|
|
|
|
struct pci_dev *dev)
|
|
|
|
{
|
|
|
|
struct pci_dev *udev;
|
|
|
|
pci_ers_result_t status;
|
|
|
|
struct find_aer_service_data data;
|
|
|
|
|
|
|
|
if (dev->hdr_type & PCI_HEADER_TYPE_BRIDGE)
|
|
|
|
udev = dev;
|
|
|
|
else
|
2009-09-07 16:07:29 +08:00
|
|
|
udev = dev->bus->self;
|
2006-07-31 15:21:33 +08:00
|
|
|
|
|
|
|
data.is_downstream = 0;
|
|
|
|
data.aer_driver = NULL;
|
|
|
|
find_aer_service(udev, &data);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Use the AER driver of the error agent first.
|
|
|
|
* If it has no AER driver, use the root port's.
|
|
|
|
*/
|
|
|
|
if (!data.aer_driver || !data.aer_driver->reset_link) {
|
|
|
|
if (data.is_downstream &&
|
|
|
|
aerdev->device.driver &&
|
|
|
|
to_service_driver(aerdev->device.driver)->reset_link) {
|
|
|
|
data.aer_driver =
|
|
|
|
to_service_driver(aerdev->device.driver);
|
|
|
|
} else {
|
2008-06-14 00:52:12 +08:00
|
|
|
dev_printk(KERN_DEBUG, &dev->dev, "no link-reset "
|
|
|
|
"support\n");
|
2006-07-31 15:21:33 +08:00
|
|
|
return PCI_ERS_RESULT_DISCONNECT;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
status = data.aer_driver->reset_link(udev);
|
|
|
|
if (status != PCI_ERS_RESULT_RECOVERED) {
|
2008-06-14 00:52:12 +08:00
|
|
|
dev_printk(KERN_DEBUG, &dev->dev, "link reset at upstream "
|
|
|
|
"device %s failed\n", pci_name(udev));
|
2006-07-31 15:21:33 +08:00
|
|
|
return PCI_ERS_RESULT_DISCONNECT;
|
|
|
|
}
|
|
|
|
|
|
|
|
return status;
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* do_recovery - handle nonfatal/fatal error recovery process
|
|
|
|
* @aerdev: pointer to a pcie_device data structure of root port
|
|
|
|
* @dev: pointer to a pci_dev data structure of agent detecting an error
|
|
|
|
* @severity: error severity type
|
|
|
|
*
|
|
|
|
* Invoked when an error is nonfatal or fatal. Broadcasts the error_detected
|
|
|
|
* message to all downstream drivers in the affected hierarchy, runs the
|
|
|
|
* remaining recovery steps, and returns the resulting status.
|
2007-11-29 01:04:23 +08:00
|
|
|
*/
|
2006-07-31 15:21:33 +08:00
|
|
|
static pci_ers_result_t do_recovery(struct pcie_device *aerdev,
|
|
|
|
struct pci_dev *dev,
|
|
|
|
int severity)
|
|
|
|
{
|
|
|
|
pci_ers_result_t status, result = PCI_ERS_RESULT_RECOVERED;
|
|
|
|
enum pci_channel_state state;
|
|
|
|
|
|
|
|
if (severity == AER_FATAL)
|
|
|
|
state = pci_channel_io_frozen;
|
|
|
|
else
|
|
|
|
state = pci_channel_io_normal;
|
|
|
|
|
|
|
|
status = broadcast_error_message(dev,
|
|
|
|
state,
|
|
|
|
"error_detected",
|
|
|
|
report_error_detected);
|
|
|
|
|
|
|
|
if (severity == AER_FATAL) {
|
|
|
|
result = reset_link(aerdev, dev);
|
|
|
|
if (result != PCI_ERS_RESULT_RECOVERED) {
|
|
|
|
/* TODO: Should panic here? */
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (status == PCI_ERS_RESULT_CAN_RECOVER)
|
|
|
|
status = broadcast_error_message(dev,
|
|
|
|
state,
|
|
|
|
"mmio_enabled",
|
|
|
|
report_mmio_enabled);
|
|
|
|
|
|
|
|
if (status == PCI_ERS_RESULT_NEED_RESET) {
|
|
|
|
/*
|
|
|
|
* TODO: Should call platform-specific
|
|
|
|
* functions to reset slot before calling
|
|
|
|
* drivers' slot_reset callbacks?
|
|
|
|
*/
|
|
|
|
status = broadcast_error_message(dev,
|
|
|
|
state,
|
|
|
|
"slot_reset",
|
|
|
|
report_slot_reset);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (status == PCI_ERS_RESULT_RECOVERED)
|
|
|
|
broadcast_error_message(dev,
|
|
|
|
state,
|
|
|
|
"resume",
|
|
|
|
report_resume);
|
|
|
|
|
|
|
|
return status;
|
|
|
|
}
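The step ordering in do_recovery() above can be modelled in isolation. The following is a minimal sketch, not the kernel's code: it assumes each broadcast collapses to a single reply, and every name prefixed `sketch_` or `SK_` is hypothetical and introduced only for illustration.

```c
#include <assert.h>

/* Hypothetical stand-ins for the pci_ers_result_t values used above. */
enum sketch_result {
	SK_NONE,
	SK_CAN_RECOVER,
	SK_NEED_RESET,
	SK_DISCONNECT,
	SK_RECOVERED
};

/* Assumption: one aggregated reply per recovery step. */
struct sketch_replies {
	enum sketch_result error_detected;
	enum sketch_result link_reset;	/* only consulted for fatal errors */
	enum sketch_result mmio_enabled;
	enum sketch_result slot_reset;
};

/*
 * Walk the same sequence as do_recovery(): error_detected, then a link
 * reset for fatal errors, then mmio_enabled, slot_reset and resume,
 * each step gated on the status returned by the previous one.
 */
static enum sketch_result sketch_recover(int fatal,
					 const struct sketch_replies *r,
					 int *resumed)
{
	enum sketch_result status = r->error_detected;

	*resumed = 0;
	if (fatal && r->link_reset != SK_RECOVERED)
		return r->link_reset;	/* a failed link reset ends recovery */
	if (status == SK_CAN_RECOVER)
		status = r->mmio_enabled;
	if (status == SK_NEED_RESET)
		status = r->slot_reset;
	if (status == SK_RECOVERED)
		*resumed = 1;		/* the "resume" broadcast would go out */
	return status;
}

/* Usage example: a nonfatal error whose drivers ask for a slot reset. */
static void sketch_recover_selfcheck(void)
{
	struct sketch_replies r = {
		SK_NEED_RESET, SK_NONE, SK_NONE, SK_RECOVERED
	};
	int resumed;

	assert(sketch_recover(0, &r, &resumed) == SK_RECOVERED);
	assert(resumed == 1);
}
```

The self-check mirrors the sample log in this listing, where error_detected is followed directly by slot_reset and then resume.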
|
|
|
|
|
|
|
|
/**
|
|
|
|
* handle_error_source - handle logging error into an event log
|
|
|
|
* @aerdev: pointer to pcie_device data structure of the root port
|
|
|
|
* @dev: pointer to pci_dev data structure of error source device
|
|
|
|
* @info: comprehensive error information
|
|
|
|
*
|
|
|
|
* Invoked when an error is detected by the Root Port.
|
2007-11-29 01:04:23 +08:00
|
|
|
*/
|
2009-09-07 16:07:29 +08:00
|
|
|
static void handle_error_source(struct pcie_device *aerdev,
|
2006-07-31 15:21:33 +08:00
|
|
|
struct pci_dev *dev,
|
2009-06-16 13:35:11 +08:00
|
|
|
struct aer_err_info *info)
|
2006-07-31 15:21:33 +08:00
|
|
|
{
|
|
|
|
pci_ers_result_t status = 0;
|
|
|
|
int pos;
|
|
|
|
|
2009-06-16 13:35:11 +08:00
|
|
|
if (info->severity == AER_CORRECTABLE) {
|
2006-07-31 15:21:33 +08:00
|
|
|
/*
|
|
|
|
* A correctable error does not need software intervention.
|
|
|
|
* No need to go through error recovery process.
|
|
|
|
*/
|
2008-10-19 08:33:19 +08:00
|
|
|
pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
|
2006-07-31 15:21:33 +08:00
|
|
|
if (pos)
|
|
|
|
pci_write_config_dword(dev, pos + PCI_ERR_COR_STATUS,
|
2009-06-16 13:35:11 +08:00
|
|
|
info->status);
|
2006-07-31 15:21:33 +08:00
|
|
|
} else {
|
2009-06-16 13:35:11 +08:00
|
|
|
status = do_recovery(aerdev, dev, info->severity);
|
2006-07-31 15:21:33 +08:00
|
|
|
if (status == PCI_ERS_RESULT_RECOVERED) {
|
2008-06-14 00:52:12 +08:00
|
|
|
dev_printk(KERN_DEBUG, &dev->dev, "AER driver "
|
|
|
|
"successfully recovered\n");
|
2006-07-31 15:21:33 +08:00
|
|
|
} else {
|
|
|
|
/* TODO: Should kernel panic here? */
|
2008-06-14 00:52:12 +08:00
|
|
|
dev_printk(KERN_DEBUG, &dev->dev, "AER driver didn't "
|
|
|
|
"recover\n");
|
2006-07-31 15:21:33 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* get_e_source - retrieve an error source
|
|
|
|
* @rpc: pointer to the root port which holds an error
|
|
|
|
*
|
|
|
|
* Invoked by DPC handler to consume an error.
|
2007-11-29 01:04:23 +08:00
|
|
|
*/
|
2009-09-07 16:07:29 +08:00
|
|
|
static struct aer_err_source *get_e_source(struct aer_rpc *rpc)
|
2006-07-31 15:21:33 +08:00
|
|
|
{
|
|
|
|
struct aer_err_source *e_source;
|
|
|
|
unsigned long flags;
|
|
|
|
|
|
|
|
/* Lock access to Root error producer/consumer index */
|
|
|
|
spin_lock_irqsave(&rpc->e_lock, flags);
|
|
|
|
if (rpc->prod_idx == rpc->cons_idx) {
|
|
|
|
spin_unlock_irqrestore(&rpc->e_lock, flags);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
e_source = &rpc->e_sources[rpc->cons_idx];
|
|
|
|
rpc->cons_idx++;
|
|
|
|
if (rpc->cons_idx == AER_ERROR_SOURCES_MAX)
|
|
|
|
rpc->cons_idx = 0;
|
|
|
|
spin_unlock_irqrestore(&rpc->e_lock, flags);
|
|
|
|
|
|
|
|
return e_source;
|
|
|
|
}
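The consumer side of the producer/consumer index pair in get_e_source() above can be exercised outside the kernel. This is a minimal sketch under stated assumptions: locking is omitted, the capacity is arbitrary, and the producer `sketch_put()` is hypothetical, since the real producer is not shown in this part of the listing.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SOURCES_MAX 4	/* hypothetical capacity, not the kernel's */

struct sketch_source {
	unsigned int id;
};

struct sketch_ring {
	struct sketch_source slots[SKETCH_SOURCES_MAX];
	unsigned int prod_idx;	/* next slot the producer writes */
	unsigned int cons_idx;	/* next slot the consumer reads */
};

/*
 * Hypothetical producer: store one entry, or return 0 when the ring is
 * full. One slot always stays unused so that "full" and "empty" differ.
 */
static int sketch_put(struct sketch_ring *r, unsigned int id)
{
	unsigned int next = r->prod_idx + 1;

	if (next == SKETCH_SOURCES_MAX)
		next = 0;
	if (next == r->cons_idx)
		return 0;
	r->slots[r->prod_idx].id = id;
	r->prod_idx = next;
	return 1;
}

/* Same index handling as get_e_source(): NULL when the ring is empty. */
static struct sketch_source *sketch_get(struct sketch_ring *r)
{
	struct sketch_source *s;

	assert(r->cons_idx < SKETCH_SOURCES_MAX);
	if (r->prod_idx == r->cons_idx)
		return NULL;
	s = &r->slots[r->cons_idx];
	r->cons_idx++;
	if (r->cons_idx == SKETCH_SOURCES_MAX)
		r->cons_idx = 0;
	return s;
}
```

Comparing the two indices for equality is what lets the consumer detect an empty ring without a separate counter; the cost is that a ring of N slots holds at most N - 1 entries.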
|
|
|
|
|
PCI: pcie, aer: report all error before recovery
This patch is required not to lost error records by action invoked on
error recovery, such as slot reset etc.
Following sample (real machine + dummy record injected by aer-inject)
shows that record of 28:00.1 could not be retrieved by recovery of 28:00.0:
- Before:
pcieport-driver 0000:00:02.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=2801
e1000e 0000:28:00.0: PCIE Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=2800(Receiver ID)
e1000e 0000:28:00.0: device [8086:1096] error status/mask=00001000/00100000
e1000e 0000:28:00.0: [12] Poisoned TLP (First)
e1000e 0000:28:00.0: TLP Header: 00000000 00000001 00000002 00000003
e1000e 0000:28:00.0: broadcast error_detected message
e1000e 0000:28:00.0: broadcast slot_reset message
e1000e 0000:28:00.0: setting latency timer to 64
e1000e 0000:28:00.0: restoring config space at offset 0x1 (was 0x100547, writing 0x100147)
e1000e 0000:28:00.0: PME# disabled
e1000e 0000:28:00.0: PME# disabled
e1000e 0000:28:00.1: setting latency timer to 64
e1000e 0000:28:00.1: restoring config space at offset 0x1 (was 0x100547, writing 0x100147)
e1000e 0000:28:00.1: PME# disabled
e1000e 0000:28:00.1: PME# disabled
e1000e 0000:28:00.0: broadcast resume message
e1000e 0000:28:00.0: AER driver successfully recovered
e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
- After:
pcieport-driver 0000:00:02.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=2801
e1000e 0000:28:00.0: PCIE Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=2800(Receiver ID)
e1000e 0000:28:00.0: device [8086:1096] error status/mask=00001000/00100000
e1000e 0000:28:00.0: [12] Poisoned TLP (First)
e1000e 0000:28:00.0: TLP Header: 00000000 00000001 00000002 00000003
e1000e 0000:28:00.1: PCIE Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=2801(Receiver ID)
e1000e 0000:28:00.1: device [8086:1096] error status/mask=00081000/00100000
e1000e 0000:28:00.1: [12] Poisoned TLP (First)
e1000e 0000:28:00.1: [19] ECRC
e1000e 0000:28:00.1: TLP Header: 00000000 00000001 00000002 00000003
e1000e 0000:28:00.1: Error of this Agent(2801) is reported first
e1000e 0000:28:00.0: broadcast error_detected message
e1000e 0000:28:00.0: broadcast slot_reset message
e1000e 0000:28:00.0: setting latency timer to 64
e1000e 0000:28:00.0: restoring config space at offset 0x1 (was 0x100547, writing 0x100147)
e1000e 0000:28:00.0: PME# disabled
e1000e 0000:28:00.0: PME# disabled
e1000e 0000:28:00.1: setting latency timer to 64
e1000e 0000:28:00.1: restoring config space at offset 0x1 (was 0x100547, writing 0x100147)
e1000e 0000:28:00.1: PME# disabled
e1000e 0000:28:00.1: PME# disabled
e1000e 0000:28:00.0: broadcast resume message
e1000e 0000:28:00.0: AER driver successfully recovered
e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-07 16:16:59 +08:00
|
|
|
/**
|
|
|
|
* get_device_error_info - read error status from dev and store it to info
|
|
|
|
* @dev: pointer to the device expected to have an error record
|
|
|
|
* @info: pointer to structure to store the error record
|
|
|
|
*
|
|
|
|
* Return 1 on success, 0 on error.
|
|
|
|
*/
|
2006-07-31 15:21:33 +08:00
|
|
|
static int get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
|
|
|
|
{
|
2009-09-07 16:13:42 +08:00
|
|
|
int pos, temp;
|
2006-07-31 15:21:33 +08:00
|
|
|
|
2009-09-07 16:09:58 +08:00
|
|
|
info->status = 0;
|
2009-09-07 16:16:20 +08:00
|
|
|
info->tlp_header_valid = 0;
|
2009-09-07 16:09:58 +08:00
|
|
|
|
2008-10-19 08:33:19 +08:00
|
|
|
pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
|
2006-07-31 15:21:33 +08:00
|
|
|
|
|
|
|
/* The device might not support AER */
|
|
|
|
if (!pos)
|
2009-09-07 16:16:59 +08:00
|
|
|
return 1;
|
2006-07-31 15:21:33 +08:00
|
|
|
|
|
|
|
if (info->severity == AER_CORRECTABLE) {
|
|
|
|
pci_read_config_dword(dev, pos + PCI_ERR_COR_STATUS,
|
|
|
|
&info->status);
|
2009-09-07 16:12:25 +08:00
|
|
|
pci_read_config_dword(dev, pos + PCI_ERR_COR_MASK,
|
|
|
|
&info->mask);
|
|
|
|
if (!(info->status & ~info->mask))
|
2009-09-07 16:16:59 +08:00
|
|
|
return 0;
|
2006-07-31 15:21:33 +08:00
|
|
|
} else if (dev->hdr_type & PCI_HEADER_TYPE_BRIDGE ||
|
|
|
|
info->severity == AER_NONFATAL) {
|
|
|
|
|
|
|
|
/* Link is still healthy for IO reads */
|
|
|
|
pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS,
|
|
|
|
&info->status);
|
2009-09-07 16:12:25 +08:00
|
|
|
pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_MASK,
|
|
|
|
&info->mask);
|
|
|
|
if (!(info->status & ~info->mask))
|
2009-09-07 16:16:59 +08:00
|
|
|
return 0;
|
2006-07-31 15:21:33 +08:00
|
|
|
|
2009-09-07 16:13:42 +08:00
|
|
|
/* Get First Error Pointer */
|
|
|
|
pci_read_config_dword(dev, pos + PCI_ERR_CAP, &temp);
|
2009-09-07 16:16:20 +08:00
|
|
|
info->first_error = PCI_ERR_CAP_FEP(temp);
|
2009-09-07 16:13:42 +08:00
|
|
|
|
2006-07-31 15:21:33 +08:00
|
|
|
if (info->status & AER_LOG_TLP_MASKS) {
|
2009-09-07 16:16:20 +08:00
|
|
|
info->tlp_header_valid = 1;
|
2006-07-31 15:21:33 +08:00
|
|
|
pci_read_config_dword(dev,
|
|
|
|
pos + PCI_ERR_HEADER_LOG, &info->tlp.dw0);
|
|
|
|
pci_read_config_dword(dev,
|
|
|
|
pos + PCI_ERR_HEADER_LOG + 4, &info->tlp.dw1);
|
|
|
|
pci_read_config_dword(dev,
|
|
|
|
pos + PCI_ERR_HEADER_LOG + 8, &info->tlp.dw2);
|
|
|
|
pci_read_config_dword(dev,
|
|
|
|
pos + PCI_ERR_HEADER_LOG + 12, &info->tlp.dw3);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2009-09-07 16:16:59 +08:00
|
|
|
return 1;
|
2006-07-31 15:21:33 +08:00
|
|
|
}
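The two bit tests in get_device_error_info() above, `status & ~mask` and the First Error Pointer extraction, can be checked with plain integers. The sketch below assumes the First Error Pointer occupies the low five bits of the register read at PCI_ERR_CAP; the `sketch_` helpers are hypothetical names, and the register values come from the sample log quoted in this listing.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero when at least one status bit is set and not masked. */
static int sketch_has_unmasked_error(uint32_t status, uint32_t mask)
{
	return (status & ~mask) != 0;
}

/* Assumption: the First Error Pointer is the low five bits of the value. */
static unsigned int sketch_first_error(uint32_t cap)
{
	return cap & 0x1f;
}

/* Usage example with the status/mask pairs from the sample log. */
static void sketch_status_selfcheck(void)
{
	/* 28:00.0: bit 12 (Poisoned TLP) set, only bit 20 masked. */
	assert(sketch_has_unmasked_error(0x00001000, 0x00100000));
	/* 28:00.1: bits 12 and 19 set, only bit 20 masked. */
	assert(sketch_has_unmasked_error(0x00081000, 0x00100000));
	/* Only a masked bit set: nothing to report. */
	assert(!sketch_has_unmasked_error(0x00100000, 0x00100000));
	/* Bits above the low five are ignored: 0xac yields 12. */
	assert(sketch_first_error(0xac) == 12);
}
```

A status word whose set bits are all masked is the case where the function above returns 0 without reading the TLP header log.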
|
|
|
|
|
2009-06-16 13:35:16 +08:00
|
|
|
static inline void aer_process_err_devices(struct pcie_device *p_device,
|
|
|
|
struct aer_err_info *e_info)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
2009-09-07 16:16:59 +08:00
|
|
|
/* Report all errors before handling them, so a reset loses no records. */
|
2009-06-16 13:35:16 +08:00
|
|
|
for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
|
2009-09-07 16:16:59 +08:00
|
|
|
if (get_device_error_info(e_info->dev[i], e_info))
|
2009-06-16 13:35:16 +08:00
|
|
|
aer_print_error(e_info->dev[i], e_info);
|
2009-09-07 16:16:59 +08:00
|
|
|
}
|
|
|
|
for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
|
|
|
|
if (get_device_error_info(e_info->dev[i], e_info))
|
|
|
|
handle_error_source(p_device, e_info->dev[i], e_info);
|
2009-06-16 13:35:16 +08:00
|
|
|
}
|
|
|
|
}
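The two loops above can be illustrated outside the kernel. The following is a minimal user-space sketch, not the driver's code: `struct demo_dev`, `demo_slot_reset`, `demo_interleaved` and `demo_report_first` are hypothetical names, and the sketch assumes that a slot reset clears the pending records of every function in the slot. It contrasts reporting and recovering in one pass with reporting everything first.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for a PCIe function with a pending AER record. */
struct demo_dev {
	const char *name;
	unsigned int status;	/* non-zero while an error record is pending */
};

/* Assumption for this sketch: a slot reset wipes every function's record. */
static void demo_slot_reset(struct demo_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		devs[i].status = 0;
}

/* One pass: report a device, then recover at once. Returns records printed. */
static int demo_interleaved(struct demo_dev *devs, size_t n)
{
	int reported = 0;

	for (size_t i = 0; i < n; i++) {
		if (!devs[i].status)
			continue;
		printf("%s: status=%08x\n", devs[i].name, devs[i].status);
		reported++;
		demo_slot_reset(devs, n);	/* later records are gone now */
	}
	return reported;
}

/* Two passes: report every pending record, then recover. */
static int demo_report_first(struct demo_dev *devs, size_t n)
{
	int reported = 0;

	for (size_t i = 0; i < n; i++) {
		if (!devs[i].status)
			continue;
		printf("%s: status=%08x\n", devs[i].name, devs[i].status);
		reported++;
	}
	if (reported)
		demo_slot_reset(devs, n);
	return reported;
}
```

With two functions that both hold a record, the one-pass variant prints only the first, which mirrors the "Before" log in the commit message; the two-pass variant prints both before any reset runs.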
|
|
|
|
|
2006-07-31 15:21:33 +08:00
|
|
|
/**
|
|
|
|
* aer_isr_one_error - consume an error detected by root port
|
|
|
|
* @p_device: pointer to error root port service device
|
|
|
|
* @e_src: pointer to an error source
|
2007-11-29 01:04:23 +08:00
|
|
|
*/
|
2006-07-31 15:21:33 +08:00
|
|
|
static void aer_isr_one_error(struct pcie_device *p_device,
|
|
|
|
struct aer_err_source *e_src)
|
|
|
|
{
|
2009-06-16 13:35:11 +08:00
|
|
|
struct aer_err_info *e_info;
|
2006-07-31 15:21:33 +08:00
|
|
|
int i;
|
2009-06-16 13:35:11 +08:00
|
|
|
|
|
|
|
/* struct aer_err_info might be big, so we allocate it with slab */
|
|
|
|
e_info = kmalloc(sizeof(struct aer_err_info), GFP_KERNEL);
|
|
|
|
if (e_info == NULL) {
|
|
|
|
dev_printk(KERN_DEBUG, &p_device->port->dev,
|
|
|
|
"Can't allocate mem when processing AER errors\n");
|
|
|
|
return;
|
|
|
|
}
|
2006-07-31 15:21:33 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* There is a possibility that both a correctable error and an
|
|
|
|
* uncorrectable error are logged. Report the correctable error first.
|
|
|
|
*/
|
|
|
|
for (i = 1; i & ROOT_ERR_STATUS_MASKS; i <<= 2) {
|
|
|
|
if (i > 4)
|
|
|
|
break;
|
|
|
|
if (!(e_src->status & i))
|
|
|
|
continue;
|
|
|
|
|
2009-06-16 13:35:11 +08:00
|
|
|
memset(e_info, 0, sizeof(struct aer_err_info));
|
|
|
|
|
2006-07-31 15:21:33 +08:00
|
|
|
/* Init comprehensive error information */
|
|
|
|
if (i & PCI_ERR_ROOT_COR_RCV) {
|
2009-06-16 13:35:11 +08:00
|
|
|
e_info->id = ERR_COR_ID(e_src->id);
|
|
|
|
e_info->severity = AER_CORRECTABLE;
|
2006-07-31 15:21:33 +08:00
|
|
|
} else {
|
2009-06-16 13:35:11 +08:00
|
|
|
e_info->id = ERR_UNCOR_ID(e_src->id);
|
|
|
|
e_info->severity = ((e_src->status >> 6) & 1);
|
2006-07-31 15:21:33 +08:00
|
|
|
}
|
|
|
|
if (e_src->status &
|
|
|
|
(PCI_ERR_ROOT_MULTI_COR_RCV |
|
|
|
|
PCI_ERR_ROOT_MULTI_UNCOR_RCV))
|
2009-09-07 16:16:20 +08:00
|
|
|
e_info->multi_error_valid = 1;
|
2009-06-16 13:35:11 +08:00
|
|
|
|
PCI: pcie, aer: change error print format
Use a dev_printk-like format.
Sample (real machine + dummy error injected by aer-inject):
- Before:
+------ PCI-Express Device Error ------+
Error Severity : Corrected
PCIE Bus Error type : Data Link Layer
Bad TLP :
Receiver ID : 2800
VendorID=8086h, DeviceID=1096h, Bus=28h, Device=00h, Function=00h
+------ PCI-Express Device Error ------+
Error Severity : Corrected
PCIE Bus Error type : Data Link Layer
Bad TLP :
Bad DLLP :
Receiver ID : 2801
VendorID=8086h, DeviceID=1096h, Bus=28h, Device=00h, Function=01h
Error of this Agent(2801) is reported first
- After:
pcieport-driver 0000:00:02.0: AER: Multiple Corrected error received: id=2801
e1000e 0000:28:00.0: PCIE Bus Error: severity=Corrected, type=Data Link Layer, id=2800(Receiver ID)
e1000e 0000:28:00.0: device [8086:1096] error status/mask=00000040/00000000
e1000e 0000:28:00.0: [ 6] Bad TLP
e1000e 0000:28:00.1: PCIE Bus Error: severity=Corrected, type=Data Link Layer, id=2801(Receiver ID)
e1000e 0000:28:00.1: device [8086:1096] error status/mask=000000c0/00000000
e1000e 0000:28:00.1: [ 6] Bad TLP
e1000e 0000:28:00.1: [ 7] Bad DLLP
e1000e 0000:28:00.1: Error of this Agent(2801) is reported first
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-07 16:16:45 +08:00
|
|
|
aer_print_port_info(p_device->port, e_info);
|
|
|
|
|
2010-04-15 12:11:42 +08:00
|
|
|
if (find_source_device(p_device->port, e_info))
|
|
|
|
aer_process_err_devices(p_device, e_info);
|
2006-07-31 15:21:33 +08:00
|
|
|
}
|
2009-06-16 13:35:11 +08:00
|
|
|
|
|
|
|
kfree(e_info);
|
2006-07-31 15:21:33 +08:00
|
|
|
}
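The status loop in aer_isr_one_error() steps `i` through 1 and 4, that is, the "correctable received" and "uncorrectable received" bits, and derives the source ID and severity for each. The sketch below restates that decoding in user space. The `DEMO_` constants and the `demo_` names are hypothetical; their values are assumed to match the kernel's PCI_ERR_ROOT_* bits, the ERR_COR_ID()/ERR_UNCOR_ID() split of the source ID register, and the AER severity numbering, so check them against the headers before relying on them.

```c
#include <stdint.h>

/* Assumed to mirror PCI_ERR_ROOT_* in the kernel headers. */
#define DEMO_ROOT_COR_RCV		0x01u
#define DEMO_ROOT_MULTI_COR_RCV		0x02u
#define DEMO_ROOT_UNCOR_RCV		0x04u
#define DEMO_ROOT_MULTI_UNCOR_RCV	0x08u

/* Assumed to mirror AER_NONFATAL, AER_FATAL and AER_CORRECTABLE. */
enum demo_severity { DEMO_NONFATAL = 0, DEMO_FATAL = 1, DEMO_CORRECTABLE = 2 };

struct demo_err {
	uint16_t id;			/* requester ID of the error source */
	enum demo_severity severity;
	int multi;			/* more than one error was received */
};

/*
 * Decode one root error status / source ID pair into at most two
 * records, correctable first. Returns the number of records filled in.
 */
static int demo_decode(uint32_t status, uint32_t id, struct demo_err out[2])
{
	int n = 0;

	for (uint32_t bit = DEMO_ROOT_COR_RCV; bit <= DEMO_ROOT_UNCOR_RCV;
	     bit <<= 2) {
		if (!(status & bit))
			continue;
		if (bit == DEMO_ROOT_COR_RCV) {
			/* low half of the source ID register */
			out[n].id = (uint16_t)(id & 0xffffu);
			out[n].severity = DEMO_CORRECTABLE;
		} else {
			/* high half; bit 6 of status marks a fatal error */
			out[n].id = (uint16_t)(id >> 16);
			out[n].severity =
				(enum demo_severity)((status >> 6) & 1u);
		}
		out[n].multi = !!(status & (DEMO_ROOT_MULTI_COR_RCV |
					    DEMO_ROOT_MULTI_UNCOR_RCV));
		n++;
	}
	return n;
}
```

Bounding the loop with `bit <= DEMO_ROOT_UNCOR_RCV` plays the role of the `if (i > 4) break;` guard in the original.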
|
|
|
|
|
|
|
|
/**
|
|
|
|
* aer_isr - consume errors detected by root port
|
2006-11-22 22:55:48 +08:00
|
|
|
* @work: definition of this work item
|
2006-07-31 15:21:33 +08:00
|
|
|
*
|
|
|
|
* Invoked, as DPC, when the root port records a newly detected error
|
2007-11-29 01:04:23 +08:00
|
|
|
*/
|
2006-11-22 22:55:48 +08:00
|
|
|
void aer_isr(struct work_struct *work)
|
2006-07-31 15:21:33 +08:00
|
|
|
{
|
2006-11-22 22:55:48 +08:00
|
|
|
struct aer_rpc *rpc = container_of(work, struct aer_rpc, dpc_handler);
|
|
|
|
struct pcie_device *p_device = rpc->rpd;
|
2006-07-31 15:21:33 +08:00
|
|
|
struct aer_err_source *e_src;
|
|
|
|
|
|
|
|
mutex_lock(&rpc->rpc_mutex);
|
|
|
|
e_src = get_e_source(rpc);
|
|
|
|
while (e_src) {
|
|
|
|
aer_isr_one_error(p_device, e_src);
|
|
|
|
e_src = get_e_source(rpc);
|
|
|
|
}
|
|
|
|
mutex_unlock(&rpc->rpc_mutex);
|
|
|
|
|
|
|
|
wake_up(&rpc->wait_release);
|
|
|
|
}
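aer_isr() receives only a pointer to the embedded work item and recovers the enclosing `struct aer_rpc` from it with container_of(). The following user-space sketch shows the pointer arithmetic behind that idiom; `demo_container_of`, `struct demo_work` and `struct demo_rpc` are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stddef.h>

/* Simplified user-space equivalent of the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct work_struct. */
struct demo_work {
	int pending;
};

/* Hypothetical stand-in for struct aer_rpc with an embedded work item. */
struct demo_rpc {
	int id;
	struct demo_work dpc_handler;
};

/* Map a pointer to the embedded member back to its enclosing object. */
static struct demo_rpc *demo_rpc_from_work(struct demo_work *work)
{
	return demo_container_of(work, struct demo_rpc, dpc_handler);
}
```

Embedding the work item in the per-port structure lets the handler reach its context without a separate lookup or a global table.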
|
|
|
|
|
|
|
|
/**
|
|
|
|
* aer_init - provide AER initialization
|
|
|
|
* @dev: pointer to AER pcie device
|
|
|
|
*
|
|
|
|
* Invoked when AER service driver is loaded.
|
2007-11-29 01:04:23 +08:00
|
|
|
*/
|
2006-07-31 15:21:33 +08:00
|
|
|
int aer_init(struct pcie_device *dev)
|
|
|
|
{
|
PCI: PCIe AER: honor ACPI HEST FIRMWARE FIRST mode
Feedback from Hidetoshi Seto and Kenji Kaneshige incorporated. This
correctly handles PCI-X bridges, PCIe root ports and endpoints, and
prints debug messages when invalid/reserved types are found in the
HEST. PCI devices not in domain/segment 0 are not represented in
HEST, thus will be ignored.
Today, the PCIe Advanced Error Reporting (AER) driver attaches itself
to every PCIe root port for which BIOS reports it should, via ACPI
_OSC.
However, _OSC alone is insufficient for newer BIOSes. Part of ACPI
4.0 is the new APEI (ACPI Platform Error Interfaces) which is a way
for OS and BIOS to handshake over which errors for which components
each will handle. One table in ACPI 4.0 is the Hardware Error Source
Table (HEST), where BIOS can define that errors for certain PCIe
devices (or all devices), should be handled by BIOS ("Firmware First
mode"), rather than be handled by the OS.
Dell PowerEdge 11G server BIOS defines Firmware First mode in HEST, so
that it may manage such errors, log them to the System Event Log, and
possibly take other actions. The aer driver should honor this, and
not attach itself to devices noted as such.
Furthermore, Kenji Kaneshige reminded us to disallow changing the AER
registers when respecting Firmware First mode. Platform firmware is
expected to manage these, and if changes to them are allowed, it could
break that firmware's behavior.
The HEST parsing code may be replaced in the future by a more
feature-rich implementation. This patch provides the minimum needed
to prevent breakage until that implementation is available.
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-03 01:51:24 +08:00
|
|
|
if (dev->port->aer_firmware_first) {
|
|
|
|
dev_printk(KERN_DEBUG, &dev->device,
|
|
|
|
"PCIe errors handled by platform firmware.\n");
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (aer_osc_setup(dev))
|
|
|
|
goto out;
|
2006-07-31 15:21:33 +08:00
|
|
|
|
2009-09-07 16:16:59 +08:00
|
|
|
return 0;
|
2009-11-03 01:51:24 +08:00
|
|
|
out:
|
|
|
|
if (forceload) {
|
|
|
|
dev_printk(KERN_DEBUG, &dev->device,
|
|
|
|
"aerdrv forceload requested.\n");
|
|
|
|
dev->port->aer_firmware_first = 0;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
return -ENXIO;
|
2006-07-31 15:21:33 +08:00
|
|
|
}
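The control flow of aer_init() reduces to a small decision: succeed when the platform leaves AER to the OS and the _OSC handshake works, otherwise fail with -ENXIO unless forceload overrides. The sketch below models that as a pure function. `struct demo_port` and `demo_aer_init` are hypothetical, and the `osc_failed` parameter stands in for a non-zero return from aer_osc_setup().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the port's aer_firmware_first flag. */
struct demo_port {
	bool firmware_first;
};

/*
 * Returns 0 when the OS should handle AER for this port, -ENXIO when
 * it should not. forceload overrides both failure paths and clears the
 * firmware-first flag, as the out: label in aer_init() does.
 */
static int demo_aer_init(struct demo_port *port, bool osc_failed,
			 bool forceload)
{
	if (!port->firmware_first && !osc_failed)
		return 0;

	if (forceload) {
		port->firmware_first = false;
		return 0;
	}
	return -ENXIO;
}
```

Note that in this flow the firmware-first check comes before the _OSC call, so a firmware-first port never reaches the handshake.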
|