linux/mm
Tony Luck 6713b8f11a mm, hwpoison: try to recover from copy-on write faults
commit a873dfe103 upstream.

Patch series "Copy-on-write poison recovery", v3.

Part 1 deals with the process that triggered the copy on write fault with
a store to a shared read-only page.  That process is send a SIGBUS with
the usual machine check decoration to specify the virtual address of the
lost page, together with the scope.

Part 2 sets up to asynchronously take the page with the uncorrected error
offline to prevent additional machine check faults.  H/t to Miaohe Lin
<linmiaohe@huawei.com> and Shuai Xue <xueshuai@linux.alibaba.com> for
pointing me to the existing function to queue a call to memory_failure().

On x86 there is some duplicate reporting (because the error is also
signalled by the memory controller as well as by the core that triggered
the machine check).  Console logs look like this:

This patch (of 2):

If the kernel is copying a page as the result of a copy-on-write
fault and runs into an uncorrectable error, Linux will crash because
it does not have recovery code for this case where poison is consumed
by the kernel.

It is easy to set up a test case. Just inject an error into a private
page, fork(2), and have the child process write to the page.

I wrapped that neatly into a test at:

  git://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git

just enable ACPI error injection and run:

  # ./einj_mem-uc -f copy-on-write

Add a new copy_user_highpage_mc() function that uses copy_mc_to_kernel()
on architectures where that is available (currently x86 and powerpc).
When an error is detected during the page copy, return VM_FAULT_HWPOISON
to caller of wp_page_copy(). This propagates up the call stack. Both x86
and powerpc have code in their fault handler to deal with this code by
sending a SIGBUS to the application.

Note that this patch avoids a system crash and signals the process that
triggered the copy-on-write action. It does not take any action for the
memory error that is still in the shared page. To handle that a call to
memory_failure() is needed. But this cannot be done from wp_page_copy()
because it holds mmap_lock(). Perhaps the architecture fault handlers
can deal with this loose end in a subsequent patch?

On Intel/x86 this loose end will often be handled automatically because
the memory controller provides an additional notification of the h/w
poison in memory, the handler for this will call memory_failure(). This
isn't a 100% solution. If there are multiple errors, not all may be
logged in this way.

Cc: <stable@vger.kernel.org>
[tony.luck@intel.com: add call to kmsan_unpoison_memory(), per Miaohe Lin]
  Link: https://lkml.kernel.org/r/20221031201029.102123-2-tony.luck@intel.com
Link: https://lkml.kernel.org/r/20221021200120.175753-1-tony.luck@intel.com
Link: https://lkml.kernel.org/r/20221021200120.175753-2-tony.luck@intel.com
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Due to missing commits
  c89357e27f ("mm: support GUP-triggered unsharing of anonymous pages")
  662ce1dc9c ("delayacct: track delays from write-protect copy")
  b073d7f8ae ("mm: kmsan: maintain KMSAN metadata for page operations")
  The impact of c89357e27f is a name change from cow_user_page() to
  __wp_page_copy_user().
  The impact of 662ce1dc9c is the introduction of a new feature of
  tracking write-protect copy in delayacct.
  The impact of b073d7f8ae is an introduction of KASAN feature.
  None of these commits establishes meaningful dependency, hence resolve by
  ignoring them. - jane]
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-05 18:25:04 +01:00
..
damon mm/damon/dbgfs: check if rm_contexts input is for a real context 2022-11-16 09:58:27 +01:00
kasan panic: Consolidate open-coded panic_on_warn checks 2023-02-01 08:27:22 +01:00
kfence mm: kfence: fix using kfence_metadata without initialization in show_object() 2023-03-30 12:48:01 +02:00
backing-dev.c writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs 2023-05-11 23:00:18 +09:00
balloon_compaction.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
bootmem_info.c bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem 2022-08-31 17:16:48 +02:00
cleancache.c
cma_debug.c mm/cma: change cma mutex to irq safe spinlock 2021-05-05 11:27:21 -07:00
cma_sysfs.c mm: cma: support sysfs 2021-05-05 11:27:24 -07:00
cma.c Revert "mm/cma.c: remove redundant cma_mutex lock" 2022-06-09 10:23:27 +02:00
cma.h mm: cma: support sysfs 2021-05-05 11:27:24 -07:00
compaction.c mm, compaction: fix fast_isolate_around() to stay within boundaries 2023-01-12 11:58:47 +01:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: remove pte entry from the page table 2022-02-08 18:34:05 +01:00
debug.c mm/debug: sync up latest migrate_reason to migrate_reason_names 2021-09-24 16:13:35 -07:00
dmapool.c mm/dmapool: use DEVICE_ATTR_RO macro 2021-06-29 10:53:52 -07:00
early_ioremap.c mm/early_ioremap.c: remove redundant early_ioremap_shutdown() 2021-09-08 11:50:24 -07:00
fadvise.c
failslab.c
filemap.c mm/filemap: fix page end in filemap_get_read_batch 2023-02-22 12:57:10 +01:00
frontswap.c mm/mempool: minor coding style tweaks 2021-05-05 11:27:27 -07:00
gup_test.c selftests/vm: gup_test: test faulting in kernel, and verify pinnable pages 2021-05-05 11:27:26 -07:00
gup_test.h selftests/vm: gup_test: fix test flag 2021-05-05 11:27:26 -07:00
gup.c mm/migration: return errno when isolate_huge_page failed 2023-02-14 19:17:56 +01:00
highmem.c highmem: fix checks in __kmap_local_sched_{in,out} 2022-04-13 20:59:21 +02:00
hmm.c mm/hmm: fault non-owner device private entries 2022-08-03 12:03:54 +02:00
huge_memory.c mm/userfaultfd: propagate uffd-wp bit when PTE-mapping the huge zeropage 2023-03-22 13:31:35 +01:00
hugetlb_cgroup.c hugetlb: make free_huge_page irq safe 2021-05-05 11:27:22 -07:00
hugetlb_vmemmap.c mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON 2021-06-30 20:47:26 -07:00
hugetlb_vmemmap.h mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate 2021-06-30 20:47:25 -07:00
hugetlb.c mm/migration: return errno when isolate_huge_page failed 2023-02-14 19:17:56 +01:00
hwpoison-inject.c mm/hwpoison: avoid the impact of hwpoison_filter() return value on mce handler 2022-07-12 16:35:05 +02:00
init-mm.c mm: add setup_initial_init_mm() helper 2021-07-08 11:48:21 -07:00
internal.h mm/numa: automatically generate node migration order 2021-09-03 09:58:16 -07:00
interval_tree.c mm/interval_tree: add comments to improve code readability 2021-04-30 11:20:38 -07:00
io-mapping.c mm: add a io_mapping_map_user helper 2021-04-30 11:20:39 -07:00
ioremap.c mm: move ioremap_page_range to vmalloc.c 2021-09-08 11:50:24 -07:00
Kconfig kmap_local: don't assume kmap PTEs are linear arrays in memory 2021-11-25 09:48:43 +01:00
Kconfig.debug
khugepaged.c mm/khugepaged: check again on anon uffd-wp during isolation 2023-04-26 13:51:52 +02:00
kmemleak.c Revert "mm: kmemleak: take a full lowmem check in kmemleak_*_phys()" 2022-09-15 11:30:00 +02:00
ksm.c mm/ksm: remove old GCC 4.9+ check 2021-09-13 10:18:28 -07:00
list_lru.c mm: vmscan: consolidate shrinker_maps handling code 2021-05-05 11:27:23 -07:00
maccess.c maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault() 2022-11-26 09:24:47 +01:00
madvise.c mm: fix madivse_pageout mishandling on non-LRU page 2022-10-05 10:39:39 +02:00
Makefile mm: introduce Data Access MONitor (DAMON) 2021-09-08 11:50:24 -07:00
mapping_dirty_helpers.c mm/mapping_dirty_helpers: remove double Note in kerneldoc 2021-07-01 11:06:02 -07:00
memblock.c Revert "mm: Always release pages to the buddy allocator in memblock_free_late()." 2023-02-22 12:57:07 +01:00
memcontrol.c mm: memcontrol: deprecate charge moving 2023-03-10 09:40:09 +01:00
memfd.c memfd: check for non-NULL file_seals in memfd_create() syscall 2023-06-28 10:29:45 +02:00
memory_hotplug.c mm/migration: return errno when isolate_huge_page failed 2023-02-14 19:17:56 +01:00
memory-failure.c mm/migration: return errno when isolate_huge_page failed 2023-02-14 19:17:56 +01:00
memory.c mm, hwpoison: try to recover from copy-on write faults 2023-07-05 18:25:04 +01:00
mempolicy.c migrate: hugetlb: check for hugetlb shared PMD in node migration 2023-02-14 19:17:56 +01:00
mempool.c kasan: use separate (un)poison implementation for integrated init 2021-06-04 19:32:21 +01:00
memremap.c mm/memremap.c: map FS_DAX device memory as decrypted 2022-11-16 09:58:27 +01:00
memtest.c
migrate.c mm/migration: return errno when isolate_huge_page failed 2023-02-14 19:17:56 +01:00
mincore.c
mlock.c mm/mlock: fix potential imbalanced rlimit ucounts adjustment 2022-05-15 20:18:53 +02:00
mm_init.c include/linux/page-flags-layout.h: cleanups 2021-04-30 11:20:42 -07:00
mmap_lock.c mm: mmap_lock: fix disabling preemption directly 2021-07-23 17:43:28 -07:00
mmap.c mm/mmap: undo ->mmap() when arch_validate_flags() fails 2022-10-26 12:34:24 +02:00
mmu_gather.c mm/khugepaged: fix GUP-fast interaction by sending IPI 2022-12-14 11:37:17 +01:00
mmu_notifier.c mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() 2022-04-27 14:38:58 +02:00
mmzone.c
mprotect.c mm: don't try to NUMA-migrate COW pages that have other uses 2022-02-23 12:03:03 +01:00
mremap.c mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0) 2022-04-13 20:59:22 +02:00
msync.c mm/msync: exit early when the flags is an MS_ASYNC and start < vm_start 2021-04-30 11:20:37 -07:00
nommu.c Merge tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux 2021-09-04 11:35:47 -07:00
oom_kill.c oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup 2022-04-27 14:38:58 +02:00
page_alloc.c mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock 2023-04-26 13:51:55 +02:00
page_counter.c mm: page_counter: mitigate consequences of a page_counter underflow 2021-04-30 11:20:38 -07:00
page_ext.c mm/migrate: add CPU hotplug to demotion #ifdef 2021-10-18 20:22:02 -10:00
page_idle.c mm/idle_page_tracking: make PG_idle reusable 2021-09-08 11:50:24 -07:00
page_io.c mm: fix unexpected zeroed page mapping with zram swap 2022-04-20 09:34:18 +02:00
page_isolation.c Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
page_owner.c mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE 2021-09-08 11:50:22 -07:00
page_poison.c mm: page_poison: print page info when corruption is caught 2021-04-30 11:20:36 -07:00
page_reporting.c mm/page_reporting: allow driver to specify reporting order 2021-06-29 10:53:47 -07:00
page_reporting.h mm/page_reporting: export reporting order as module parameter 2021-06-29 10:53:47 -07:00
page_vma_mapped.c mm: device exclusive memory access 2021-07-01 11:06:03 -07:00
page-writeback.c writeback: avoid use-after-free after removing device 2022-08-31 17:16:47 +02:00
pagewalk.c mm: pagewalk: Fix race between unmap and page walker 2022-09-08 12:28:05 +02:00
percpu-internal.h Merge branch 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu 2021-07-01 17:17:24 -07:00
percpu-km.c percpu: flush tlb in pcpu_reclaim_populated() 2021-07-04 18:30:17 +00:00
percpu-stats.c percpu: rework memcg accounting 2021-06-05 20:43:15 +00:00
percpu-vm.c percpu: flush tlb in pcpu_reclaim_populated() 2021-07-04 18:30:17 +00:00
percpu.c Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
pgalloc-track.h mm: fix typos in comments 2021-05-07 00:26:35 -07:00
pgtable-generic.c mm/thp: fix __split_huge_pmd_locked() on shmem migration entry 2021-06-16 09:24:42 -07:00
process_vm_access.c mm/process_vm_access.c: remove duplicate include 2021-05-05 11:27:27 -07:00
ptdump.c mm: pagewalk: Fix race between unmap and page walker 2022-09-08 12:28:05 +02:00
readahead.c mm: Protect operations adding pages to page cache with invalidate_lock 2021-07-13 13:14:27 +02:00
rmap.c mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse 2022-09-05 10:30:07 +02:00
rodata_test.c
secretmem.c mm: fix dereferencing possible ERR_PTR 2022-10-05 10:39:39 +02:00
shmem.c mm: shmem: don't truncate page if memory failure happens 2022-11-26 09:24:28 +01:00
shuffle.c mm: eliminate "expecting prototype" kernel-doc warnings 2021-04-16 16:10:36 -07:00
shuffle.h mm/shuffle: fix section mismatch warning 2021-05-22 15:09:07 -10:00
slab_common.c mm, kfence: support kmem_dump_obj() for KFENCE objects 2022-04-27 14:38:51 +02:00
slab.c mm/slab: Fix undefined init_cache_node_node() for NUMA and !SMP 2023-03-30 12:47:56 +02:00
slab.h mm, kfence: support kmem_dump_obj() for KFENCE objects 2022-04-27 14:38:51 +02:00
slob.c mm, kfence: support kmem_dump_obj() for KFENCE objects 2022-04-27 14:38:51 +02:00
slub.c mm: slub: fix flush_cpu_slab()/__free_slab() invocations in task context. 2022-09-28 11:11:44 +02:00
sparse-vmemmap.c mm: sparsemem: split the huge PMD mapping of vmemmap pages 2021-06-30 20:47:26 -07:00
sparse.c mm: introduce memmap_alloc() to unify memory map allocation 2021-09-03 09:58:15 -07:00
swap_cgroup.c
swap_slots.c mm: Replace deprecated CPU-hotplug functions. 2021-08-28 01:46:17 +02:00
swap_state.c mm: swap: get rid of livelock in swapin readahead 2022-03-23 09:16:41 +01:00
swap.c mm: fs: invalidate bh_lrus for only cold path 2021-09-24 16:13:35 -07:00
swapfile.c mm/swap: fix swap_info_struct race between swapoff and get_swap_pages() 2023-04-13 16:48:26 +02:00
truncate.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
usercopy.c mm/usercopy: return 1 from hardened_usercopy __setup() handler 2022-04-08 14:24:14 +02:00
userfaultfd.c mm: shmem: don't truncate page if memory failure happens 2022-11-26 09:24:28 +01:00
util.c mm: vmalloc: introduce array allocation functions 2022-07-12 16:35:01 +02:00
vmacache.c
vmalloc.c mm: vmalloc: avoid warn_alloc noise caused by fatal signal 2023-04-13 16:48:25 +02:00
vmpressure.c mm/vmpressure: replace vmpressure_to_css() with vmpressure_to_memcg() 2021-09-03 09:58:17 -07:00
vmscan.c mm: __isolate_lru_page_prepare() in isolate_migratepages_block() 2022-12-08 11:28:44 +01:00
vmstat.c mm/vmstat: protect per cpu variables with preempt disable on RT 2021-09-08 15:32:34 -07:00
workingset.c memcg: sync flush only if periodic flush is delayed 2022-04-27 14:38:57 +02:00
z3fold.c mm/z3fold: add kerneldoc fields for z3fold_pool 2021-07-01 11:06:03 -07:00
zbud.c mm/zbud: add kerneldoc fields for zbud_pool 2021-07-01 11:06:03 -07:00
zpool.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
zsmalloc.c zsmalloc: fix races between asynchronous zspage free and page migration 2022-06-06 08:43:39 +02:00
zswap.c mm/zswap.c: fix two bugs in zswap_writeback_entry() 2021-06-30 20:47:31 -07:00