linux/mm
Huang Ying 5ee2fa2f06 mm/rmap: fix potential batched TLB flush race
In theory, the following race is possible for batched TLB flushing.

  CPU0                               CPU1
  ----                               ----
  shrink_page_list()
                                     unmap
                                       zap_pte_range()
                                         flush_tlb_batched_pending()
                                           flush_tlb_mm()
    try_to_unmap()
      set_tlb_ubc_flush_pending()
        mm->tlb_flush_batched = true
                                           mm->tlb_flush_batched = false

After the TLB is flushed on CPU1 via flush_tlb_mm() and before
mm->tlb_flush_batched is set to false, some PTE is unmapped on CPU0 and
the TLB flushing is pended.  Then the pended TLB flushing will be lost.
Although both set_tlb_ubc_flush_pending() and
flush_tlb_batched_pending() are called with PTL locked, different PTL
instances may be used.

Because the race window is really small, and the lost TLB flushing will
cause problem only if a TLB entry is inserted before the unmapping in
the race window, the race is only theoretical.  But the fix is simple
and cheap too.

Syzbot has reported this too as follows:

    ==================================================================
    BUG: KCSAN: data-race in flush_tlb_batched_pending / try_to_unmap_one

    write to 0xffff8881072cfbbc of 1 bytes by task 17406 on cpu 1:
     flush_tlb_batched_pending+0x5f/0x80 mm/rmap.c:691
     madvise_free_pte_range+0xee/0x7d0 mm/madvise.c:594
     walk_pmd_range mm/pagewalk.c:128 [inline]
     walk_pud_range mm/pagewalk.c:205 [inline]
     walk_p4d_range mm/pagewalk.c:240 [inline]
     walk_pgd_range mm/pagewalk.c:277 [inline]
     __walk_page_range+0x981/0x1160 mm/pagewalk.c:379
     walk_page_range+0x131/0x300 mm/pagewalk.c:475
     madvise_free_single_vma mm/madvise.c:734 [inline]
     madvise_dontneed_free mm/madvise.c:822 [inline]
     madvise_vma mm/madvise.c:996 [inline]
     do_madvise+0xe4a/0x1140 mm/madvise.c:1202
     __do_sys_madvise mm/madvise.c:1228 [inline]
     __se_sys_madvise mm/madvise.c:1226 [inline]
     __x64_sys_madvise+0x5d/0x70 mm/madvise.c:1226
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x44/0xae

    write to 0xffff8881072cfbbc of 1 bytes by task 71 on cpu 0:
     set_tlb_ubc_flush_pending mm/rmap.c:636 [inline]
     try_to_unmap_one+0x60e/0x1220 mm/rmap.c:1515
     rmap_walk_anon+0x2fb/0x470 mm/rmap.c:2301
     try_to_unmap+0xec/0x110
     shrink_page_list+0xe91/0x2620 mm/vmscan.c:1719
     shrink_inactive_list+0x3fb/0x730 mm/vmscan.c:2394
     shrink_list mm/vmscan.c:2621 [inline]
     shrink_lruvec+0x3c9/0x710 mm/vmscan.c:2940
     shrink_node_memcgs+0x23e/0x410 mm/vmscan.c:3129
     shrink_node+0x8f6/0x1190 mm/vmscan.c:3252
     kswapd_shrink_node mm/vmscan.c:4022 [inline]
     balance_pgdat+0x702/0xd30 mm/vmscan.c:4213
     kswapd+0x200/0x340 mm/vmscan.c:4473
     kthread+0x2c7/0x2e0 kernel/kthread.c:327
     ret_from_fork+0x1f/0x30

    value changed: 0x01 -> 0x00

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 71 Comm: kswapd0 Not tainted 5.16.0-rc1-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    ==================================================================

[akpm@linux-foundation.org: tweak comments]

Link: https://lkml.kernel.org/r/20211201021104.126469-1-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reported-by: syzbot+aa5bebed695edaccf0df@syzkaller.appspotmail.com
Cc: Nadav Amit <namit@vmware.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:31 +02:00
..
damon mm/damon/dbgfs: fix 'struct pid' leaks in 'dbgfs_target_ids_write()' 2021-12-31 09:20:12 -08:00
kasan kasan: fix quarantine conflicting with init_on_free 2022-01-15 16:30:26 +02:00
kfence kfence: fix memory leak when cat kfence objects 2021-12-25 12:20:55 -08:00
backing-dev.c mm: bdi: initialize bdi_min_ratio when bdi is unregistered 2021-12-10 17:10:56 -08:00
balloon_compaction.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
bootmem_info.c mm/bootmem_info.c: mark __init on register_page_bootmem_info_section 2021-09-03 09:58:14 -07:00
cleancache.c
cma_debug.c mm/cma: change cma mutex to irq safe spinlock 2021-05-05 11:27:21 -07:00
cma_sysfs.c mm: cma: support sysfs 2021-05-05 11:27:24 -07:00
cma.c memblock: rename memblock_free to memblock_phys_free 2021-11-06 13:30:41 -07:00
cma.h mm: cma: support sysfs 2021-05-05 11:27:24 -07:00
compaction.c mm: compaction: fix the migration stats in trace_mm_compaction_migratepages() 2022-01-15 16:30:30 +02:00
debug_page_ref.c
debug_vm_pgtable.c mm: ptep_clear() page table helper 2022-01-15 16:30:28 +02:00
debug.c mm,fs: split dump_mapping() out from dump_page() 2022-01-15 16:30:26 +02:00
dmapool.c mm/dmapool.c: revert "make dma pool to use kmalloc_node" 2022-01-15 16:30:28 +02:00
early_ioremap.c mm/early_ioremap.c: remove redundant early_ioremap_shutdown() 2021-09-08 11:50:24 -07:00
fadvise.c
failslab.c
filemap.c filemap: remove PageHWPoison check from next_uptodate_page() 2021-12-10 17:10:55 -08:00
folio-compat.c mm/filemap: Add FGP_STABLE 2021-10-18 07:49:41 -04:00
frontswap.c mm/frontswap.c: use non-atomic '__set_bit()' when possible 2022-01-15 16:30:26 +02:00
gup_test.c selftests/vm: gup_test: test faulting in kernel, and verify pinnable pages 2021-05-05 11:27:26 -07:00
gup_test.h selftests/vm: gup_test: fix test flag 2021-05-05 11:27:26 -07:00
gup.c mm/gup.c: stricter check on THP migration entry during follow_pmd_mask 2022-01-15 16:30:26 +02:00
highmem.c Fixes for 5.16 folios: 2021-11-25 10:13:56 -08:00
hmm.c mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled 2021-09-08 18:45:52 -07:00
huge_memory.c mm: remove the total_mapcount argument from page_trans_huge_mapcount() 2022-01-15 16:30:28 +02:00
hugetlb_cgroup.c hugetlb: add hugetlb.*.numa_stat file 2022-01-15 16:30:29 +02:00
hugetlb_vmemmap.c mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON 2021-06-30 20:47:26 -07:00
hugetlb_vmemmap.h mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate 2021-06-30 20:47:25 -07:00
hugetlb.c mm: change page type prior to adding page table entry 2022-01-15 16:30:28 +02:00
hwpoison-inject.c mm: hwpoison: don't drop slab caches for offlining non-LRU page 2021-09-03 09:58:15 -07:00
init-mm.c mm: add setup_initial_init_mm() helper 2021-07-08 11:48:21 -07:00
internal.h mm: make slab and vmalloc allocators __GFP_NOLOCKDEP aware 2022-01-15 16:30:29 +02:00
interval_tree.c mm/interval_tree: add comments to improve code readability 2021-04-30 11:20:38 -07:00
io-mapping.c mm: add a io_mapping_map_user helper 2021-04-30 11:20:39 -07:00
ioremap.c mm: move ioremap_page_range to vmalloc.c 2021-09-08 11:50:24 -07:00
Kconfig mm: add a field to store names for private anonymous memory 2022-01-15 16:30:27 +02:00
Kconfig.debug mm: page table check 2022-01-15 16:30:28 +02:00
khugepaged.c mm/vmstat: add events for THP max_ptes_* exceeds 2022-01-15 16:30:29 +02:00
kmemleak.c kmemleak: fix kmemleak false positive report with HW tag-based kasan enable 2022-01-15 16:30:25 +02:00
ksm.c mm: ksm: fix use-after-free kasan report in ksm_might_need_to_copy 2022-01-15 16:30:31 +02:00
list_lru.c mm: list_lru: only add memcg-aware lrus to the global lru list 2021-11-06 13:30:35 -07:00
maccess.c ARM: 9115/1: mm/maccess: fix unaligned copy_{from,to}_kernel_nofault 2021-08-20 11:39:25 +01:00
madvise.c mm: move anon_vma declarations to linux/mm_inline.h 2022-01-15 16:30:27 +02:00
Makefile mm: page table check 2022-01-15 16:30:28 +02:00
mapping_dirty_helpers.c mm: move tlb_flush_pending inline helpers to mm_inline.h 2022-01-15 16:30:27 +02:00
memblock.c arm64 fixes for -rc1 2021-11-10 11:29:30 -08:00
memcontrol.c memcg: add per-memcg vmalloc stat 2022-01-15 16:30:27 +02:00
memfd.c mm,hugetlb: remove mlock ulimit for SHM_HUGETLB 2021-11-09 10:02:48 -08:00
memory_hotplug.c treewide: Add missing includes masked by cgroup -> bpf dependency 2021-12-03 10:58:13 -08:00
memory-failure.c mm/hwpoison: fix unpoison_memory() 2022-01-15 16:30:31 +02:00
memory.c mm: remove last argument of reuse_swap_page() 2022-01-15 16:30:28 +02:00
mempolicy.c mm/mempolicy: fix all kernel-doc warnings 2022-01-15 16:30:30 +02:00
mempool.c mm: remove spurious blkdev.h includes 2021-10-18 06:17:01 -06:00
memremap.c mm/memremap: add ZONE_DEVICE support for compound pages 2022-01-15 16:30:25 +02:00
memtest.c
migrate.c mm/migrate: remove redundant variables used in a for-loop 2022-01-15 16:30:31 +02:00
mincore.c inode: make init and permission helpers idmapped mount aware 2021-01-24 14:27:16 +01:00
mlock.c mm: add a field to store names for private anonymous memory 2022-01-15 16:30:27 +02:00
mm_init.c include/linux/page-flags-layout.h: cleanups 2021-04-30 11:20:42 -07:00
mmap_lock.c mm: mmap_lock: fix disabling preemption directly 2021-07-23 17:43:28 -07:00
mmap.c mm: protect free_pgtables with mmap_lock write lock in exit_mmap 2022-01-15 16:30:27 +02:00
mmu_gather.c mm: move tlb_flush_pending inline helpers to mm_inline.h 2022-01-15 16:30:27 +02:00
mmu_notifier.c mm/mmu_notifiers: ensure range_end() is paired with range_start() 2021-03-25 09:22:55 -07:00
mmzone.c
mprotect.c mm: add a field to store names for private anonymous memory 2022-01-15 16:30:27 +02:00
mremap.c mm, hugepages: add mremap() support for hugepage backed vma 2021-11-06 13:30:39 -07:00
msync.c mm/msync: exit early when the flags is an MS_ASYNC and start < vm_start 2021-04-30 11:20:37 -07:00
nommu.c Merge branch 'akpm' (patches from Andrew) 2021-11-06 14:08:17 -07:00
oom_kill.c mm, oom: OOM sysrq should always kill a process 2022-01-15 16:30:30 +02:00
page_alloc.c mm/hwpoison: fix unpoison_memory() 2022-01-15 16:30:31 +02:00
page_counter.c mm/page_counter: remove an incorrect call to propagate_protected_usage() 2022-01-15 16:30:27 +02:00
page_ext.c mm: page table check 2022-01-15 16:30:28 +02:00
page_idle.c mm/idle_page_tracking: make PG_idle reusable 2021-09-08 11:50:24 -07:00
page_io.c for-5.16/block-2021-10-29 2021-11-01 09:19:50 -07:00
page_isolation.c mm/page_isolation: unset migratetype directly for non Buddy page 2022-01-15 16:30:30 +02:00
page_owner.c mm/page_owner.c: modify the type of argument "order" in some functions 2021-11-11 09:34:35 -08:00
page_poison.c mm: page_poison: print page info when corruption is caught 2021-04-30 11:20:36 -07:00
page_reporting.c mm/page_reporting: allow driver to specify reporting order 2021-06-29 10:53:47 -07:00
page_reporting.h mm/page_reporting: export reporting order as module parameter 2021-06-29 10:53:47 -07:00
page_table_check.c mm: page table check 2022-01-15 16:30:28 +02:00
page_vma_mapped.c mm: device exclusive memory access 2021-07-01 11:06:03 -07:00
page-writeback.c folio: Add a function to get the host inode for a folio 2021-11-10 21:16:52 +00:00
pagewalk.c mm: pagewalk: fix walk for hugepage tables 2021-06-29 10:53:49 -07:00
percpu-internal.h mm: memcg/percpu: account extra objcg space to memory cgroups 2022-01-15 16:30:31 +02:00
percpu-km.c percpu: flush tlb in pcpu_reclaim_populated() 2021-07-04 18:30:17 +00:00
percpu-stats.c percpu: rework memcg accounting 2021-06-05 20:43:15 +00:00
percpu-vm.c percpu: flush tlb in pcpu_reclaim_populated() 2021-07-04 18:30:17 +00:00
percpu.c mm: memcg/percpu: account extra objcg space to memory cgroups 2022-01-15 16:30:31 +02:00
pgalloc-track.h mm: fix typos in comments 2021-05-07 00:26:35 -07:00
pgtable-generic.c mm: move tlb_flush_pending inline helpers to mm_inline.h 2022-01-15 16:30:27 +02:00
process_vm_access.c mm/process_vm_access.c: remove duplicate include 2021-05-05 11:27:27 -07:00
ptdump.c mm: ptdump: fix build failure 2021-04-16 16:10:37 -07:00
readahead.c Merge branch 'akpm' (patches from Andrew) 2021-11-06 14:08:17 -07:00
rmap.c mm/rmap: fix potential batched TLB flush race 2022-01-15 16:30:31 +02:00
rodata_test.c
secretmem.c mm/secretmem: avoid letting secretmem_users drop to zero 2021-10-28 17:18:55 -07:00
shmem.c mm: drop node from alloc_pages_vma 2022-01-15 16:30:29 +02:00
shuffle.c mm: eliminate "expecting prototype" kernel-doc warnings 2021-04-16 16:10:36 -07:00
shuffle.h mm/shuffle: fix section mismatch warning 2021-05-22 15:09:07 -10:00
slab_common.c mm: memcontrol: make cgroup_memory_nokmem static 2022-01-15 16:30:27 +02:00
slab.c mm: emit the "free" trace report before freeing memory in kmem_cache_free() 2021-11-20 10:35:54 -08:00
slab.h mm: slab: make slab iterator functions static 2022-01-15 16:30:25 +02:00
slob.c mm: emit the "free" trace report before freeing memory in kmem_cache_free() 2021-11-20 10:35:54 -08:00
slub.c mm/slub: fix endianness bug for alloc/free_traces attributes 2021-12-10 17:10:56 -08:00
sparse-vmemmap.c mm: remove redundant smp_wmb() 2021-11-06 13:30:36 -07:00
sparse.c memblock: use memblock_free for freeing virtual pointers 2021-11-06 13:30:41 -07:00
swap_cgroup.c
swap_slots.c treewide: Add missing includes masked by cgroup -> bpf dependency 2021-12-03 10:58:13 -08:00
swap_state.c mm/workingset: Convert workingset_refault() to take a folio 2021-10-18 07:49:40 -04:00
swap.c mm/swap.c:put_pages_list(): reinitialise the page list 2021-11-20 10:35:54 -08:00
swapfile.c mm: remove the total_mapcount argument from page_trans_huge_mapcount() 2022-01-15 16:30:28 +02:00
truncate.c mm/truncate.c: remove unneeded variable 2022-01-15 16:30:26 +02:00
usercopy.c
userfaultfd.c mm: shmem: don't truncate page if memory failure happens 2022-01-15 16:30:26 +02:00
util.c mm: allow !GFP_KERNEL allocations for kvmalloc 2022-01-15 16:30:29 +02:00
vmacache.c
vmalloc.c mm/vmalloc: be more explicit about supported gfp flags. 2022-01-15 16:30:28 +02:00
vmpressure.c mm/vmpressure: fix data-race with memcg->socket_pressure 2021-11-06 13:30:40 -07:00
vmscan.c vmscan: make drop_slab_node static 2022-01-15 16:30:30 +02:00
vmstat.c mm/vmstat: add events for THP max_ptes_* exceeds 2022-01-15 16:30:29 +02:00
workingset.c Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
z3fold.c mm/z3fold: add kerneldoc fields for z3fold_pool 2021-07-01 11:06:03 -07:00
zbud.c mm/zbud: add kerneldoc fields for zbud_pool 2021-07-01 11:06:03 -07:00
zpool.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
zsmalloc.c mm/zsmalloc.c: close race window between zs_pool_dec_isolated() and zs_unregister_migration() 2021-11-06 13:30:43 -07:00
zswap.c mm/zswap.c: fix two bugs in zswap_writeback_entry() 2021-06-30 20:47:31 -07:00