linux/mm
Peter Xu 2ea7ff1e39 mm/hugetlb: fix race condition of uffd missing/minor handling
Patch series "mm/hugetlb: Fix selftest failures with write check", v3.

Currently akpm mm-unstable randomly fails the uffd hugetlb private mapping
test on a write check.

The initial bisection of that points to the recent pmd unshare series, but
it turns out there's no direct relationship with the series; some timing
change merely caused the race to start triggering.

The race should be fixed in patch 1.  Patch 2 is a trivial cleanup of a
similar race with hugetlb migrations, and patch 3 comments on the write
check so that anyone who reads it again will see why it's there.


This patch (of 3):

After the recent rework patchset of hugetlb locking on pmd sharing,
kselftest for userfaultfd sometimes fails on hugetlb private tests with
unexpected write fault checks.

It turns out there's nothing wrong within the locking series regarding
this matter, but it could have changed the timing of threads so it can
trigger an old bug.

The real bug is that when we call hugetlb_no_page() we don't hold the
pgtable lock.  It means we're reading the pte values locklessly.  It's
perfectly fine in most cases because before we do normal page allocations
we'll take the lock and check pte_same() again.  However before that,
there are actually two paths on userfaultfd missing/minor handling that
may directly move on with the fault process without rechecking the pte
values.

It means for these two paths we may be generating an uffd message based on
an unstable pte, while an unstable pte can legally be anything as long as
the modifier holds the pgtable lock.

One example, which is also what happened in the failing kselftest and
caused the test failure, is that for private mappings wr-protection
changes can happen on one page.  Because hugetlb_change_protection()
generally requires the pte to be cleared before it is changed, there can
be a race condition like:

        thread 1                              thread 2
        --------                              --------

      UFFDIO_WRITEPROTECT                     hugetlb_fault
        hugetlb_change_protection
          pgtable_lock()
          huge_ptep_modify_prot_start
                                              pte==NULL
                                              hugetlb_no_page
                                                generate uffd missing event
                                                even if page existed!!
          huge_ptep_modify_prot_commit
          pgtable_unlock()

Fix this by rechecking the pte after pgtable lock for both userfaultfd
missing & minor fault paths.
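
The recheck described above can be sketched in user space.  This is only
an illustration under stated assumptions, not the kernel patch itself:
struct slot, slot_init(), slot_read_lockless(), handle_missing() and the
fault_action values are hypothetical names, a pthread mutex stands in for
the pgtable lock, and a plain integer comparison stands in for pte_same().

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for one page-table slot and its pgtable lock. */
struct slot {
	pthread_mutex_t lock;
	_Atomic uint64_t pte;
};

enum fault_action { FAULT_RETRY, FAULT_SEND_EVENT };

static void slot_init(struct slot *s, uint64_t pte)
{
	pthread_mutex_init(&s->lock, NULL);
	atomic_init(&s->pte, pte);
}

/* Lockless snapshot: may observe a transient value written by a lock holder. */
static uint64_t slot_read_lockless(struct slot *s)
{
	return atomic_load(&s->pte);
}

/*
 * Before acting on the snapshot, take the lock and compare again.  If the
 * value changed, the snapshot was unstable: back off and let the fault be
 * retried instead of reporting a MISSING event for a page that exists.
 */
static enum fault_action handle_missing(struct slot *s, uint64_t old_pte)
{
	enum fault_action action = FAULT_SEND_EVENT;

	pthread_mutex_lock(&s->lock);
	if (atomic_load(&s->pte) != old_pte)
		action = FAULT_RETRY;
	pthread_mutex_unlock(&s->lock);

	return action;
}
```

If the snapshot was taken while the slot was temporarily cleared by a lock
holder, the comparison under the lock fails once the final value has been
committed, and the fault is retried rather than reported.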

This bug should have been around since uffd hugetlb support was
introduced, so attach a Fixes tag for that commit.  Also attach another
Fixes tag for the minor support commit for easier tracking.

Note that userfaultfd is actually fine with false positives (e.g.  caused
by a pte change), but not with wrong logical events (e.g.  caused by
reading a pte while it is being changed).  The latter can confuse
userspace, so the strictness is very much preferred.  E.g., a MISSING
event should never happen on the page after UFFDIO_COPY has correctly
installed the page and returned.

Link: https://lkml.kernel.org/r/20221004193400.110155-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20221004193400.110155-2-peterx@redhat.com
Fixes: 1a1aad8a9b ("userfaultfd: hugetlbfs: add userfaultfd hugetlb hook")
Fixes: 7677f7fd8b ("userfaultfd: add minor fault registration mode")
Signed-off-by: Peter Xu <peterx@redhat.com>
Co-developed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-12 18:51:50 -07:00
..
damon mm/damon: use damon_sz_region() in appropriate place 2022-10-12 18:51:49 -07:00
kasan kasan: fix array-bounds warnings in tests 2022-10-12 18:51:50 -07:00
kfence mm: kfence: convert to DEFINE_SEQ_ATTRIBUTE 2022-10-03 14:03:07 -07:00
kmsan x86: kmsan: handle CPU entry area 2022-10-03 14:03:26 -07:00
backing-dev.c mm: backing-dev: Remove the unneeded result variable 2022-09-11 20:26:02 -07:00
balloon_compaction.c mm: Convert all PageMovable users to movable_operations 2022-08-02 12:34:03 -04:00
bootmem_info.c bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem 2022-08-28 14:02:45 -07:00
cma_debug.c mm/cma_debug: show complete cma name in debugfs directories 2022-09-11 20:25:50 -07:00
cma_sysfs.c
cma.c Revert "mm/cma.c: remove redundant cma_mutex lock" 2022-05-13 15:11:26 -07:00
cma.h mm/cma: provide option to opt out from exposing pages on activation failure 2022-03-22 15:57:09 -07:00
compaction.c mm/compaction: fix set skip in fast_find_migrateblock 2022-10-12 15:56:45 -07:00
debug_page_ref.c
debug_vm_pgtable.c docs: rename Documentation/vm to Documentation/mm 2022-06-27 12:52:53 -07:00
debug.c mm: remove the vma linked list 2022-09-26 19:46:26 -07:00
dmapool.c mm/dmapool.c: revert "make dma pool to use kmalloc_node" 2022-01-15 16:30:28 +02:00
early_ioremap.c mm/early_ioremap: declare early_memremap_pgprot_adjust() 2022-03-22 15:57:11 -07:00
fadvise.c riscv: compat: syscall: Add compat_sys_call_table implementation 2022-04-26 13:36:25 -07:00
failslab.c mm: fix missing handler for __GFP_NOWARN 2022-05-19 14:08:55 -07:00
filemap.c mm: fs: initialize fsdata passed to write_begin/write_end interface 2022-10-03 14:03:25 -07:00
folio-compat.c mm: remove try_to_free_swap() 2022-10-03 14:02:53 -07:00
frontswap.c frontswap: don't call ->init if no ops are registered 2022-09-26 12:14:34 -07:00
gup_test.c mm: rename is_pinnable_page() to is_longterm_pinnable_page() 2022-07-17 17:14:27 -07:00
gup_test.h
gup.c mm/gup: use gup_can_follow_protnone() also in GUP-fast 2022-09-26 19:46:28 -07:00
highmem.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
hmm.c mm/swap: add swp_offset_pfn() to fetch PFN from swap entry 2022-09-26 19:46:05 -07:00
huge_memory.c mm/shmem: add flag to enforce shmem THP in hugepage_vma_check() 2022-10-03 14:03:33 -07:00
hugetlb_cgroup.c hugetlb_cgroup: use helper for_each_hstate and hstate_index 2022-09-11 20:25:53 -07:00
hugetlb_vmemmap.c mm: hugetlb_vmemmap: simplify reset_struct_pages() 2022-09-11 20:25:58 -07:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability 2022-08-08 18:06:43 -07:00
hugetlb.c mm/hugetlb: fix race condition of uffd missing/minor handling 2022-10-12 18:51:50 -07:00
hwpoison-inject.c mm/hwpoison: add __init/__exit annotations to module init/exit funcs 2022-10-03 14:03:05 -07:00
init-mm.c mm: remove rb tree. 2022-09-26 19:46:16 -07:00
internal.h mm/page_alloc: make boot_nodestats static 2022-10-03 14:03:30 -07:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: Add ioremap/iounmap_allowed() 2022-06-27 12:22:31 +01:00
Kconfig mm: multi-gen LRU: admin guide 2022-09-26 19:46:10 -07:00
Kconfig.debug Two followon fixes for the post-5.19 series "Use pageblock_order for cma 2022-05-27 11:40:49 -07:00
khugepaged.c mm/khugepaged: add tracepoint to hpage_collapse_scan_file() 2022-10-03 14:03:33 -07:00
kmemleak.c mm/kmemleak: make create_object return void 2022-09-11 20:26:10 -07:00
ksm.c ksm: use a folio in replace_page() 2022-10-03 14:02:53 -07:00
list_lru.c mm: kmem: make mem_cgroup_from_obj() vmalloc()-safe 2022-06-16 19:48:31 -07:00
maccess.c asm-generic updates for 5.18 2022-03-23 18:03:08 -07:00
madvise.c madvise: convert madvise_free_pte_range() to use a folio 2022-10-03 14:02:52 -07:00
Makefile mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol 2022-10-03 14:03:36 -07:00
mapping_dirty_helpers.c mm: move tlb_flush_pending inline helpers to mm_inline.h 2022-01-15 16:30:27 +02:00
memblock.c mm: add pageblock_align() macro 2022-10-03 14:03:04 -07:00
memcontrol.c mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol 2022-10-03 14:03:36 -07:00
memfd.c memfd: fix F_SEAL_WRITE after shmem huge page allocated 2022-03-05 11:08:32 -08:00
memory_hotplug.c mm: add pageblock_aligned() macro 2022-10-03 14:03:04 -07:00
memory-failure.c rmap: remove page_unlock_anon_vma_read() 2022-10-03 14:02:54 -07:00
memory-tiers.c mm/demotion: expose memory tier details via sysfs 2022-09-26 19:46:13 -07:00
memory.c mm: use update_mmu_tlb() on the second thread 2022-10-12 18:51:50 -07:00
mempolicy.c mm/mempolicy: use PAGE_ALIGN instead of open-coding it 2022-10-03 14:03:15 -07:00
mempool.c mm/mempool: use might_alloc() 2022-06-16 19:48:30 -07:00
memremap.c mm/memremap.c: take a pgmap reference on page allocation 2022-10-12 18:51:49 -07:00
memtest.c
migrate_device.c mm/migrate_device.c: add migrate_device_range() 2022-10-12 18:51:49 -07:00
migrate.c mm/memory.c: fix race when faulting a device private page 2022-10-12 18:51:49 -07:00
mincore.c mm: teach core mm about pte markers 2022-05-13 07:20:09 -07:00
mlock.c mm/mlock: drop dead code in count_mm_mlocked_page_nr() 2022-09-26 19:46:27 -07:00
mm_init.c mm: multi-gen LRU: groundwork 2022-09-26 19:46:09 -07:00
mm_slot.h mm: introduce common struct mm_slot 2022-10-03 14:02:43 -07:00
mmap_lock.c mm: mmap_lock: fix disabling preemption directly 2021-07-23 17:43:28 -07:00
mmap.c mm/mmap: undo ->mmap() when arch_validate_flags() fails 2022-10-12 18:51:36 -07:00
mmu_gather.c kmsan: unpoison @tlb in arch_tlb_gather_mmu() 2022-10-12 18:51:48 -07:00
mmu_notifier.c mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() 2022-04-21 20:01:10 -07:00
mmzone.c mm: multi-gen LRU: groundwork 2022-09-26 19:46:09 -07:00
mprotect.c mm/uffd: fix warning without PTE_MARKER_UFFD_WP compiled in 2022-10-12 15:56:46 -07:00
mremap.c mm: add merging after mremap resize 2022-09-26 19:46:28 -07:00
msync.c mm/msync: use vma_find() instead of vma linked list 2022-09-26 19:46:25 -07:00
nommu.c mm: remove the vma linked list 2022-09-26 19:46:26 -07:00
oom_kill.c mm: reduce noise in show_mem for lowmem allocations 2022-09-26 19:46:29 -07:00
page_alloc.c mm: free device private pages have zero refcount 2022-10-12 18:51:49 -07:00
page_counter.c mm: page_counter: remove unneeded atomic ops for low/min 2022-09-11 20:26:01 -07:00
page_ext.c page_ext: introduce boot parameter 'early_page_ext' 2022-09-11 20:26:02 -07:00
page_idle.c mm: don't be stuck to rmap lock on reclaim path 2022-05-19 14:08:54 -07:00
page_io.c swap: convert swap_writepage() to use a folio 2022-10-03 14:02:52 -07:00
page_isolation.c mm: add pageblock_aligned() macro 2022-10-03 14:03:04 -07:00
page_owner.c mm: reuse pageblock_start/end_pfn() macro 2022-10-03 14:03:03 -07:00
page_poison.c
page_reporting.c mm/page_reporting: allow driver to specify reporting order 2021-06-29 10:53:47 -07:00
page_reporting.h mm/page_reporting: export reporting order as module parameter 2021-06-29 10:53:47 -07:00
page_table_check.c mm/page_table_check: fix typos 2022-10-03 14:03:27 -07:00
page_vma_mapped.c mm/swap: add swp_offset_pfn() to fetch PFN from swap entry 2022-09-26 19:46:05 -07:00
page-writeback.c writeback: avoid use-after-free after removing device 2022-08-28 14:02:43 -07:00
pagewalk.c mm/pagewalk: use vma_find() instead of vma linked list 2022-09-26 19:46:25 -07:00
percpu-internal.h percpu: improve percpu_alloc_percpu event trace 2022-05-13 07:20:18 -07:00
percpu-km.c percpu: flush tlb in pcpu_reclaim_populated() 2021-07-04 18:30:17 +00:00
percpu-stats.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
percpu-vm.c percpu: flush tlb in pcpu_reclaim_populated() 2021-07-04 18:30:17 +00:00
percpu.c mm: percpu: use kmemleak_ignore_phys() instead of kmemleak_free() 2022-07-17 17:14:47 -07:00
pgalloc-track.h
pgtable-generic.c mm: avoid unnecessary flush on change_huge_pmd() 2022-05-13 07:20:05 -07:00
process_vm_access.c
ptdump.c mm: sparsemem: use page table lock to protect kernel pmd operations 2022-03-22 15:57:08 -07:00
readahead.c filemap: Fix serialization adding transparent huge pages to page cache 2022-06-23 12:22:00 -04:00
rmap.c hugetlb: use new vma_lock for pmd sharing synchronization 2022-10-03 14:03:17 -07:00
rodata_test.c mm/rodata_test: use PAGE_ALIGNED() helper 2022-10-03 14:03:05 -07:00
secretmem.c mm/secretmem: remove reduntant return value 2022-10-03 14:03:36 -07:00
shmem.c mm/shmem: add flag to enforce shmem THP in hugepage_vma_check() 2022-10-03 14:03:33 -07:00
shrinker_debug.c mm: shrinkers: fix double kfree on shrinker name 2022-07-29 18:07:13 -07:00
shuffle.c mm/shuffle: convert module_param_call to module_param_cb 2022-10-03 14:03:07 -07:00
shuffle.h mm/shuffle: fix section mismatch warning 2021-05-22 15:09:07 -10:00
slab_common.c mm/slab_common: move generic bulk alloc/free functions to SLOB 2022-07-20 13:30:12 +02:00
slab.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
slab.h mm: kmsan: call KMSAN hooks from SLUB code 2022-10-03 14:03:20 -07:00
slob.c mm/slab_common: move generic bulk alloc/free functions to SLOB 2022-07-20 13:30:12 +02:00
slub.c mm: kmsan: call KMSAN hooks from SLUB code 2022-10-03 14:03:20 -07:00
sparse-vmemmap.c mm: hugetlb_vmemmap: move vmemmap code related to HugeTLB to hugetlb_vmemmap.c 2022-08-08 18:06:42 -07:00
sparse.c mm: memory_hotplug: enumerate all supported section flags 2022-07-03 18:08:49 -07:00
swap_cgroup.c mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled 2022-10-03 14:03:36 -07:00
swap_slots.c mm/swap: convert put_swap_page() to put_swap_folio() 2022-10-03 14:02:46 -07:00
swap_state.c swap_state: convert free_swap_cache() to use a folio 2022-10-03 14:02:51 -07:00
swap.c mm: add folio_add_lru_vma() 2022-10-03 14:02:45 -07:00
swap.h mm: remove lookup_swap_cache() 2022-10-03 14:02:51 -07:00
swapfile.c memcg: convert mem_cgroup_swap_full() to take a folio 2022-10-03 14:02:53 -07:00
truncate.c mm: add split_folio() 2022-10-03 14:02:45 -07:00
usercopy.c usercopy: use unsigned long instead of uintptr_t 2022-07-01 17:03:38 -07:00
userfaultfd.c hugetlb: use new vma_lock for pmd sharing synchronization 2022-10-03 14:03:17 -07:00
util.c mm: remove the vma linked list 2022-09-26 19:46:26 -07:00
vmalloc.c mm: kmsan: maintain KMSAN metadata for page operations 2022-10-03 14:03:20 -07:00
vmpressure.c mm/vmpressure: fix data-race with memcg->socket_pressure 2021-11-06 13:30:40 -07:00
vmscan.c mglru: mm/vmscan.c: fix imprecise comments 2022-10-07 14:28:40 -07:00
vmstat.c mm: remove vmacache 2022-09-26 19:46:18 -07:00
workingset.c mm: multi-gen LRU: minimal implementation 2022-09-26 19:46:09 -07:00
z3fold.c mm: Convert all PageMovable users to movable_operations 2022-08-02 12:34:03 -04:00
zbud.c mm/zbud: add kerneldoc fields for zbud_pool 2021-07-01 11:06:03 -07:00
zpool.c zpool: remove the list of pools_head 2022-01-15 16:30:31 +02:00
zsmalloc.c zsmalloc: use correct types in _first_obj_offset functions 2022-10-03 14:03:07 -07:00
zswap.c mm/swap: remove the end_write_func argument to __swap_writepage 2022-09-11 20:25:50 -07:00