linux/mm
Mike Kravetz 04ada095dc hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
madvise(MADV_DONTNEED) ends up calling zap_page_range() to clear page
tables associated with the address range.  For hugetlb vmas,
zap_page_range will call __unmap_hugepage_range_final.  However,
__unmap_hugepage_range_final assumes the passed vma is about to be removed
and deletes the vma_lock to prevent pmd sharing as the vma is on the way
out.  In the case of madvise(MADV_DONTNEED) the vma remains, but the
missing vma_lock prevents pmd sharing and could potentially lead to issues
with truncation/fault races.

This issue was originally reported here [1] as a BUG triggered in
page_try_dup_anon_rmap.  Prior to the introduction of the hugetlb
vma_lock, __unmap_hugepage_range_final cleared the VM_MAYSHARE flag to
prevent pmd sharing.  Subsequent faults on this vma were confused as
VM_MAYSHARE indicates a sharable vma, but was not set so page_mapping was
not set in new pages added to the page table.  This resulted in pages that
appeared anonymous in a VM_SHARED vma and triggered the BUG.

Address issue by adding a new zap flag ZAP_FLAG_UNMAP to indicate an unmap
call from unmap_vmas().  This is used to indicate the 'final' unmapping of
a hugetlb vma.  When called via MADV_DONTNEED, this flag is not set and
the vm_lock is not deleted.

[1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6JudD0g@mail.gmail.com/

Link: https://lkml.kernel.org/r/20221114235507.294320-3-mike.kravetz@oracle.com
Fixes: 90e7e7f5ef ("mm: enable MADV_DONTNEED for hugetlb mappings")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Wei Chen <harperchen1110@gmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 14:49:40 -08:00
..
damon mm/damon/sysfs-schemes: skip stats update if the scheme directory is removed 2022-11-22 18:50:42 -08:00
kasan Random number generator fixes for Linux 6.1-rc1. 2022-10-16 15:27:07 -07:00
kfence kfence: fix stack trace pruning 2022-11-22 18:50:44 -08:00
kmsan kmsan: core: kmsan_in_runtime() should return true in NMI context 2022-11-08 15:57:24 -08:00
backing-dev.c mm: backing-dev: Remove the unneeded result variable 2022-09-11 20:26:02 -07:00
balloon_compaction.c mm: Convert all PageMovable users to movable_operations 2022-08-02 12:34:03 -04:00
bootmem_info.c bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem 2022-08-28 14:02:45 -07:00
cma_debug.c mm/cma_debug: show complete cma name in debugfs directories 2022-09-11 20:25:50 -07:00
cma_sysfs.c
cma.c Revert "mm/cma.c: remove redundant cma_mutex lock" 2022-05-13 15:11:26 -07:00
cma.h mm/cma: provide option to opt out from exposing pages on activation failure 2022-03-22 15:57:09 -07:00
compaction.c - Alistair Popple has a series which addresses a race which causes page 2022-10-14 12:28:43 -07:00
debug_page_ref.c
debug_vm_pgtable.c docs: rename Documentation/vm to Documentation/mm 2022-06-27 12:52:53 -07:00
debug.c mm: remove the vma linked list 2022-09-26 19:46:26 -07:00
dmapool.c mm/dmapool.c: revert "make dma pool to use kmalloc_node" 2022-01-15 16:30:28 +02:00
early_ioremap.c mm/early_ioremap: declare early_memremap_pgprot_adjust() 2022-03-22 15:57:11 -07:00
fadvise.c riscv: compat: syscall: Add compat_sys_call_table implementation 2022-04-26 13:36:25 -07:00
failslab.c mm: fix unexpected changes to {failslab|fail_page_alloc}.attr 2022-11-22 18:50:44 -08:00
filemap.c - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
folio-compat.c mm: remove try_to_free_swap() 2022-10-03 14:02:53 -07:00
frontswap.c frontswap: don't call ->init if no ops are registered 2022-09-26 12:14:34 -07:00
gup_test.c mm: rename is_pinnable_page() to is_longterm_pinnable_page() 2022-07-17 17:14:27 -07:00
gup_test.h
gup.c Five hotfixes - three for nilfs2, two for MM. For are cc:stable, one is 2022-10-12 11:16:58 -07:00
highmem.c highmem: fix kmap_to_page() for kmap_local_page() addresses 2022-10-12 18:51:51 -07:00
hmm.c mm/swap: add swp_offset_pfn() to fetch PFN from swap entry 2022-09-26 19:46:05 -07:00
huge_memory.c Partly revert "mm/thp: carry over dirty bit when thp splits on pmd" 2022-11-08 15:57:23 -08:00
hugetlb_cgroup.c hugetlb_cgroup: use helper for_each_hstate and hstate_index 2022-09-11 20:25:53 -07:00
hugetlb_vmemmap.c mm: hugetlb_vmemmap: include missing linux/moduleparam.h 2022-11-08 15:57:23 -08:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability 2022-08-08 18:06:43 -07:00
hugetlb.c hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing 2022-11-30 14:49:40 -08:00
hwpoison-inject.c mm/hwpoison: add __init/__exit annotations to module init/exit funcs 2022-10-03 14:03:05 -07:00
init-mm.c mm: remove rb tree. 2022-09-26 19:46:16 -07:00
internal.h mm/page_alloc: make boot_nodestats static 2022-10-03 14:03:30 -07:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: Add ioremap/iounmap_allowed() 2022-06-27 12:22:31 +01:00
Kconfig - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
Kconfig.debug Two followon fixes for the post-5.19 series "Use pageblock_order for cma 2022-05-27 11:40:49 -07:00
khugepaged.c mm/khugepaged: refactor mm_khugepaged_scan_file tracepoint to remove filename from function call 2022-11-22 18:50:41 -08:00
kmemleak.c mm/kmemleak: prevent soft lockup in kmemleak_scan()'s object iteration loops 2022-10-28 13:37:22 -07:00
ksm.c ksm: use a folio in replace_page() 2022-10-03 14:02:53 -07:00
list_lru.c mm: kmem: make mem_cgroup_from_obj() vmalloc()-safe 2022-06-16 19:48:31 -07:00
maccess.c asm-generic updates for 5.18 2022-03-23 18:03:08 -07:00
madvise.c madvise: use zap_page_range_single for madvise dontneed 2022-11-30 14:49:40 -08:00
Makefile mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol 2022-10-03 14:03:36 -07:00
mapping_dirty_helpers.c mm: move tlb_flush_pending inline helpers to mm_inline.h 2022-01-15 16:30:27 +02:00
memblock.c mm: add pageblock_align() macro 2022-10-03 14:03:04 -07:00
memcontrol.c mm: correctly charge compressed memory to its memcg 2022-11-22 18:50:42 -08:00
memfd.c memfd: fix F_SEAL_WRITE after shmem huge page allocated 2022-03-05 11:08:32 -08:00
memory_hotplug.c mm: add pageblock_aligned() macro 2022-10-03 14:03:04 -07:00
memory-failure.c hugetlbfs: don't delete error page from pagecache 2022-11-08 15:57:22 -08:00
memory-tiers.c memory tier, sysfs: rename attribute "nodes" to "nodelist" 2022-10-28 13:37:22 -07:00
memory.c hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing 2022-11-30 14:49:40 -08:00
mempolicy.c mm/mempolicy: fix mbind_range() arguments to vma_merge() 2022-10-20 21:27:21 -07:00
mempool.c mm/mempool: use might_alloc() 2022-06-16 19:48:30 -07:00
memremap.c mm/memremap.c: map FS_DAX device memory as decrypted 2022-11-08 15:57:23 -08:00
memtest.c
migrate_device.c mm/migrate_device: return number of migrating pages in args->cpages 2022-11-22 18:50:43 -08:00
migrate.c mm: migrate: fix return value if all subpages of THPs are migrated successfully 2022-10-28 13:37:22 -07:00
mincore.c mm: teach core mm about pte markers 2022-05-13 07:20:09 -07:00
mlock.c mm/mlock: drop dead code in count_mm_mlocked_page_nr() 2022-09-26 19:46:27 -07:00
mm_init.c mm: multi-gen LRU: groundwork 2022-09-26 19:46:09 -07:00
mm_slot.h mm: introduce common struct mm_slot 2022-10-03 14:02:43 -07:00
mmap_lock.c
mmap.c mm: mmap: fix documentation for vma_mas_szero 2022-11-22 18:50:42 -08:00
mmu_gather.c kmsan: unpoison @tlb in arch_tlb_gather_mmu() 2022-10-12 18:51:48 -07:00
mmu_notifier.c mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() 2022-04-21 20:01:10 -07:00
mmzone.c mm: multi-gen LRU: groundwork 2022-09-26 19:46:09 -07:00
mprotect.c mm/uffd: fix warning without PTE_MARKER_UFFD_WP compiled in 2022-10-12 15:56:46 -07:00
mremap.c mm: add merging after mremap resize 2022-09-26 19:46:28 -07:00
msync.c mm/msync: use vma_find() instead of vma linked list 2022-09-26 19:46:25 -07:00
nommu.c mm: remove the vma linked list 2022-09-26 19:46:26 -07:00
oom_kill.c mm: reduce noise in show_mem for lowmem allocations 2022-09-26 19:46:29 -07:00
page_alloc.c mm: fix unexpected changes to {failslab|fail_page_alloc}.attr 2022-11-22 18:50:44 -08:00
page_counter.c mm: page_counter: remove unneeded atomic ops for low/min 2022-09-11 20:26:01 -07:00
page_ext.c mm/page_exit: fix kernel doc warning in page_ext_put() 2022-11-22 18:50:41 -08:00
page_idle.c mm: don't be stuck to rmap lock on reclaim path 2022-05-19 14:08:54 -07:00
page_io.c swap: convert swap_writepage() to use a folio 2022-10-03 14:02:52 -07:00
page_isolation.c mm/page_isolation: fix clang deadcode warning 2022-10-28 13:37:22 -07:00
page_owner.c mm: reuse pageblock_start/end_pfn() macro 2022-10-03 14:03:03 -07:00
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c mm/page_table_check: fix typos 2022-10-03 14:03:27 -07:00
page_vma_mapped.c mm/swap: add swp_offset_pfn() to fetch PFN from swap entry 2022-09-26 19:46:05 -07:00
page-writeback.c mm: export balance_dirty_pages_ratelimited_flags() 2022-09-26 12:28:07 +02:00
pagewalk.c - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
percpu-internal.h percpu: improve percpu_alloc_percpu event trace 2022-05-13 07:20:18 -07:00
percpu-km.c
percpu-stats.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
percpu-vm.c
percpu.c mm: percpu: use kmemleak_ignore_phys() instead of kmemleak_free() 2022-07-17 17:14:47 -07:00
pgalloc-track.h
pgtable-generic.c mm: avoid unnecessary flush on change_huge_pmd() 2022-05-13 07:20:05 -07:00
process_vm_access.c
ptdump.c mm: pagewalk: Fix race between unmap and page walker 2022-09-03 10:13:13 -07:00
readahead.c mm: add PSI accounting around ->read_folio and ->readahead calls 2022-09-20 08:24:38 -06:00
rmap.c - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
rodata_test.c mm/rodata_test: use PAGE_ALIGNED() helper 2022-10-03 14:03:05 -07:00
secretmem.c mm/secretmem: remove reduntant return value 2022-10-03 14:03:36 -07:00
shmem.c mm/shmem: ensure proper fallback if page faults 2022-10-28 13:37:23 -07:00
shrinker_debug.c mm: shrinkers: fix double kfree on shrinker name 2022-07-29 18:07:13 -07:00
shuffle.c mm/shuffle: convert module_param_call to module_param_cb 2022-10-03 14:03:07 -07:00
shuffle.h
slab_common.c - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
slab.c Random number generator fixes for Linux 6.1-rc1. 2022-10-16 15:27:07 -07:00
slab.h - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
slob.c Merge branch 'slab/for-6.1/kmalloc_size_roundup' into slab/for-next 2022-09-29 11:30:55 +02:00
slub.c treewide: use prandom_u32_max() when possible, part 1 2022-10-11 17:42:55 -06:00
sparse-vmemmap.c mm: hugetlb_vmemmap: move vmemmap code related to HugeTLB to hugetlb_vmemmap.c 2022-08-08 18:06:42 -07:00
sparse.c mm: memory_hotplug: enumerate all supported section flags 2022-07-03 18:08:49 -07:00
swap_cgroup.c mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled 2022-10-03 14:03:36 -07:00
swap_slots.c mm/swap: convert put_swap_page() to put_swap_folio() 2022-10-03 14:02:46 -07:00
swap_state.c swap_state: convert free_swap_cache() to use a folio 2022-10-03 14:02:51 -07:00
swap.c mm: add folio_add_lru_vma() 2022-10-03 14:02:45 -07:00
swap.h mm: remove lookup_swap_cache() 2022-10-03 14:02:51 -07:00
swapfile.c swapfile: fix soft lockup in scan_swap_map_slots 2022-11-22 18:50:44 -08:00
truncate.c mm: add split_folio() 2022-10-03 14:02:45 -07:00
usercopy.c usercopy: use unsigned long instead of uintptr_t 2022-07-01 17:03:38 -07:00
userfaultfd.c mm/shmem: use page_mapping() to detect page cache for uffd continue 2022-11-08 15:57:23 -08:00
util.c - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
vmalloc.c mm: kmsan: maintain KMSAN metadata for page operations 2022-10-03 14:03:20 -07:00
vmpressure.c mm/vmpressure: fix data-race with memcg->socket_pressure 2021-11-06 13:30:40 -07:00
vmscan.c mm/cgroup/reclaim: fix dirty pages throttling on cgroup v1 2022-11-22 18:50:45 -08:00
vmstat.c - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
workingset.c mm: multi-gen LRU: minimal implementation 2022-09-26 19:46:09 -07:00
z3fold.c mm: Convert all PageMovable users to movable_operations 2022-08-02 12:34:03 -04:00
zbud.c
zpool.c zpool: remove the list of pools_head 2022-01-15 16:30:31 +02:00
zsmalloc.c zsmalloc: zs_destroy_pool: add size_class NULL check 2022-10-20 21:27:21 -07:00
zswap.c mm/swap: remove the end_write_func argument to __swap_writepage 2022-09-11 20:25:50 -07:00