linux/drivers/acpi/apei
Shuai Xue 7cb4103166 ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events
[ Upstream commit a70297d221 ]

There are two major types of uncorrected recoverable (UCR) errors :

 - Synchronous error: The error is detected and raised at the point of
   the consumption in the execution flow, e.g. when a CPU tries to
   access a poisoned cache line. The CPU will take a synchronous error
   exception such as Synchronous External Abort (SEA) on Arm64 and
   Machine Check Exception (MCE) on X86. OS requires to take action (for
   example, offline failure page/kill failure thread) to recover this
   uncorrectable error.

 - Asynchronous error: The error is detected out of processor execution
   context, e.g. when an error is detected by a background scrubber.
   Some data in the memory are corrupted. But the data have not been
   consumed. OS is optional to take action to recover this uncorrectable
   error.

When APEI firmware first is enabled, a platform may describe one error
source for the handling of synchronous errors (e.g. MCE or SEA notification
), or for handling asynchronous errors (e.g. SCI or External Interrupt
notification). In other words, we can distinguish synchronous errors by
APEI notification. For synchronous errors, kernel will kill the current
process which accessing the poisoned page by sending SIGBUS with
BUS_MCEERR_AR. In addition, for asynchronous errors, kernel will notify the
process who owns the poisoned page by sending SIGBUS with BUS_MCEERR_AO in
early kill mode. However, the GHES driver always sets mf_flags to 0 so that
all synchronous errors are handled as asynchronous errors in memory failure.

To this end, set memory failure flags as MF_ACTION_REQUIRED on synchronous
events.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Tested-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Xiaofei Tan <tanxiaofei@huawei.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23 08:54:38 +01:00
..
apei-base.c Merge ACPI APEI material for v5.11. 2020-11-23 12:50:17 +01:00
apei-internal.h
bert.c ACPI: APEI: Better fix to avoid spamming the console with old error logs 2022-08-11 13:07:51 +02:00
einj.c ACPI: APEI: Fix _EINJ vs EFI_MEMORY_SP 2022-08-17 14:23:10 +02:00
erst-dbg.c acpi: Use pr_warn instead of pr_warning 2019-10-18 15:00:19 +02:00
erst.c ACPI: APEI: fix return value of __setup handlers 2022-04-08 14:23:09 +02:00
ghes.c ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events 2024-02-23 08:54:38 +01:00
hest.c ACPI: APEI: fix return value of __setup handlers 2022-04-08 14:23:09 +02:00
Kconfig ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue 2019-02-07 23:10:45 +01:00
Makefile