linux/mm
Vlastimil Babka b1683d6a7e mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations
commit 803de9000f upstream.

Sven reports an infinite loop in __alloc_pages_slowpath() for costly order
__GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO.  Such combination
can happen in a suspend/resume context where a GFP_KERNEL allocation can
have __GFP_IO masked out via gfp_allowed_mask.

Quoting Sven:

1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER)
   with __GFP_RETRY_MAYFAIL set.

2. page alloc's __alloc_pages_slowpath tries to get a page from the
   freelist. This fails because there is nothing free of that costly
   order.

3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim,
   which bails out because a zone is ready to be compacted; it pretends
   to have made a single page of progress.

4. page alloc tries to compact, but this always bails out early because
   __GFP_IO is not set (it's not passed by the snd allocator, and even
   if it were, we are suspending so the __GFP_IO flag would be cleared
   anyway).

5. page alloc believes reclaim progress was made (because of the
   pretense in item 3) and so it checks whether it should retry
   compaction. The compaction retry logic thinks it should try again,
   because:
    a) reclaim is needed because of the early bail-out in item 4
    b) a zonelist is suitable for compaction

6. goto 2. indefinite stall.

(end quote)

The immediate root cause is confusing the COMPACT_SKIPPED returned from
__alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be
indicating a lack of order-0 pages, and in step 5 evaluating that in
should_compact_retry() as a reason to retry, before incrementing and
limiting the number of retries.  There are however other places that
wrongly assume that compaction can happen while we lack __GFP_IO.

To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO
evaluation and switch the open-coded test in try_to_compact_pages() to use
it.

Also use the new helper in:
- compaction_ready(), which will make reclaim not bail out in step 3, so
  there's at least one attempt to actually reclaim, even if chances are
  small for a costly order
- in_reclaim_compaction() which will make should_continue_reclaim()
  return false and we don't over-reclaim unnecessarily
- in __alloc_pages_slowpath() to set a local variable can_compact,
  which is then used to avoid retrying reclaim/compaction for costly
  allocations (step 5) if we can't compact and also to skip the early
  compaction attempt that we do in some cases

Link: https://lkml.kernel.org/r/20240221114357.13655-2-vbabka@suse.cz
Fixes: 3250845d05 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Sven van Ashbrook <svenva@chromium.org>
Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBzPUVOZF%2Bg@mail.gmail.com/
Tested-by: Karthikeyan Ramasubramanian <kramasub@chromium.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Curtis Malainey <cujomalainey@chromium.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-03 15:11:40 +02:00
..
damon mm/damon/reclaim: fix quota stauts loss due to online tunings 2024-03-01 13:41:47 +01:00
kasan kasan/test: avoid gcc warning for intentional overflow 2024-04-03 15:11:11 +02:00
kfence LoongArch changes for v6.6 2023-09-08 12:16:52 -07:00
kmsan mm: kmsan: panic on failure to allocate early boot metadata 2023-10-25 16:47:10 -07:00
backing-dev.c blk-wbt: Fix detection of dirty-throttled tasks 2024-02-23 09:51:56 +01:00
balloon_compaction.c
bootmem_info.c bootmem: use kmemleak_free_part_phys in put_page_bootmem 2023-10-25 16:47:13 -07:00
cma_debug.c
cma_sysfs.c mm: cma: make kobj_type structure constant 2023-03-28 16:20:06 -07:00
cma.c mm/cma: use nth_page() in place of direct struct page manipulation 2023-10-04 10:32:29 -07:00
cma.h
compaction.c mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations 2024-04-03 15:11:40 +02:00
debug_page_alloc.c mm: page_alloc: split out DEBUG_PAGEALLOC 2023-06-09 16:25:23 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: fix BUG_ON with pud advanced test 2024-03-06 14:53:57 +00:00
debug.c mm: update validate_mm() to use vma iterator 2023-06-09 16:25:31 -07:00
dmapool_test.c dmapool: add alloc/free performance test 2023-04-05 19:42:38 -07:00
dmapool.c dmapool: create/destroy cleanup 2023-06-09 16:25:17 -07:00
early_ioremap.c mm/early_ioremap.c: improve the execution efficiency of early_ioremap_setup() 2023-06-09 16:25:56 -07:00
fadvise.c mm: remove unnecessary pagevec includes 2023-06-23 16:59:31 -07:00
fail_page_alloc.c mm: page_alloc: split out FAIL_PAGE_ALLOC 2023-06-09 16:25:23 -07:00
failslab.c mm: fix unexpected changes to {failslab|fail_page_alloc}.attr 2022-11-22 18:50:44 -08:00
filemap.c mm: cachestat: fix folio read-after-free in cache walk 2024-03-06 14:53:55 +00:00
folio-compat.c filemap: Add fgf_t typedef 2023-07-24 18:04:30 -04:00
gup_test.c Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes. 2023-06-23 16:58:19 -07:00
gup_test.h
gup.c mm/gup: make failure to pin an error if FOLL_NOWAIT not specified 2023-10-18 14:34:15 -07:00
highmem.c mm: ptep_get() conversion 2023-06-19 16:19:25 -07:00
hmm.c mm: enable page walking API to lock vmas during the walk 2023-08-21 13:07:20 -07:00
huge_memory.c mm: thp_get_unmapped_area must honour topdown preference 2024-02-23 09:51:28 +01:00
hugetlb_cgroup.c mm, hugetlb: remove HUGETLB_CGROUP_MIN_ORDER 2023-10-18 14:34:17 -07:00
hugetlb_vmemmap.c hugetlb_vmemmap: use folio argument for hugetlb_vmemmap_* functions 2023-10-25 16:47:08 -07:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: fix reference to nonexistent file 2023-10-25 16:47:14 -07:00
hugetlb.c hugetlb: fix null-ptr-deref in hugetlb_vma_lock_write 2023-12-06 16:12:43 -08:00
hwpoison-inject.c
init-mm.c mm: move dummy_vm_ops out of a header 2023-08-21 13:37:46 -07:00
internal.h mm: add page_rmappable_folio() wrapper 2023-10-25 16:47:16 -07:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: remove unneeded ioremap_allowed and iounmap_allowed 2023-08-18 10:12:36 -07:00
Kconfig mm/Kconfig: make userfaultfd a menuconfig 2023-12-06 16:12:47 -08:00
Kconfig.debug mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM 2023-05-29 16:14:28 +01:00
khugepaged.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
kmemleak.c mm/kmemleak: move set_track_prepare() outside raw_spinlocks 2023-12-06 16:12:44 -08:00
ksm.c mm: more ptep_get() conversion 2023-11-15 15:30:09 -08:00
list_lru.c
maccess.c mm: Fix copy_from_user_nofault(). 2023-04-12 17:36:23 -07:00
madvise.c mm/madvise: add cond_resched() in madvise_cold_or_pageout_pte_range() 2023-12-06 16:12:50 -08:00
Makefile mm: vmscan: move shrinker-related code into a separate file 2023-10-04 10:32:23 -07:00
mapping_dirty_helpers.c mm: fix clean_record_shared_mapping_range kernel-doc 2023-08-24 16:20:30 -07:00
memblock.c x86/numa: Fix the address overlap check in numa_fill_memblks() 2024-03-01 13:41:54 +01:00
memcontrol.c eventfd: simplify eventfd_signal() 2024-04-03 15:11:23 +02:00
memfd.c memfd: drop warning for missing exec-related flags 2023-10-04 10:32:22 -07:00
memory_hotplug.c mm/memory_hotplug: fix memmap_on_memory sysfs value retrieval 2024-01-20 11:53:02 +01:00
memory-failure.c fs/hugetlbfs/inode.c: mm/memory-failure.c: fix hugetlbfs hwpoison handling 2024-02-23 09:51:29 +01:00
memory-tiers.c dax, kmem: calculate abstract distance with general interface 2023-10-16 15:44:39 -07:00
memory.c mm/swap: fix race when skipping swapcache 2024-03-01 13:41:47 +01:00
mempolicy.c Many singleton patches against the MM code. The patch series which are 2023-11-02 19:38:47 -10:00
mempool.c mempool: do not use ksize() for poisoning 2022-11-30 15:58:41 -08:00
memremap.c mm/memremap.c: fix outdated comment in devm_memremap_pages 2023-02-09 16:51:46 -08:00
memtest.c memtest: use {READ,WRITE}_ONCE in memory scanning 2024-04-03 15:11:26 +02:00
migrate_device.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
migrate.c mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index 2024-03-06 14:53:57 +00:00
mincore.c mm: enable page walking API to lock vmas during the walk 2023-08-21 13:07:20 -07:00
mlock.c mm: mlock: avoid folio_within_range() on KSM pages 2023-10-25 16:47:14 -07:00
mm_init.c efi: disable mirror feature during crashkernel 2024-01-31 16:21:02 -08:00
mm_slot.h
mmap_lock.c
mmap.c mm, mmap: fix vma_merge() case 7 with vma_ops->close 2024-04-03 15:11:36 +02:00
mmu_gather.c mm: fix kernel-doc warning from tlb_flush_rmaps() 2023-08-24 16:20:30 -07:00
mmu_notifier.c mmu_notifiers: rename invalidate_range notifier 2023-08-18 10:12:41 -07:00
mmzone.c mm: remove page_cpupid_xchg_last() 2023-10-25 16:47:13 -07:00
mprotect.c mm: mprotect: use a folio in change_pte_range() 2023-10-25 16:47:12 -07:00
mremap.c mm: abstract VMA merge and extend into vma_merge_extend() helper 2023-10-18 14:34:18 -07:00
msync.c
nommu.c Many singleton patches against the MM code. The patch series which are 2023-11-02 19:38:47 -10:00
oom_kill.c mm/oom_killer: simplify OOM killer info dump helper 2023-10-25 16:47:10 -07:00
page_alloc.c mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations 2024-04-03 15:11:40 +02:00
page_counter.c
page_ext.c mm/page_ext: move functions around for minor cleanups to page_ext 2023-08-18 10:12:31 -07:00
page_idle.c mm: page_idle: convert page idle to use a folio 2023-01-18 17:12:52 -08:00
page_io.c mm: memcg: add THP swap out info for anonymous reclaim 2023-10-04 10:32:27 -07:00
page_isolation.c mm/hugetlb: get rid of page_hstate() 2023-08-18 10:12:39 -07:00
page_owner.c mm/page_owner: remove free_ts from page_owner output 2023-10-18 14:34:19 -07:00
page_poison.c mm/page_poison: remove unused page_ext.h from page_poison 2023-08-21 13:37:30 -07:00
page_reporting.c mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
page_reporting.h
page_table_check.c mm: convert page_table_check_pte_set() to page_table_check_ptes_set() 2023-08-24 16:20:18 -07:00
page_vma_mapped.c mm: correct stale comment of function check_pte 2023-08-18 10:12:13 -07:00
page-writeback.c blk-wbt: Fix detection of dirty-throttled tasks 2024-02-23 09:51:56 +01:00
pagewalk.c mm/pagewalk: fix bootstopping regression from extra pte_unmap() 2023-09-02 08:39:21 -07:00
percpu-internal.h percpu-internal/pcpu_chunk: re-layout pcpu_chunk structure to reduce false sharing 2023-06-19 16:19:29 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c mm: Introduce flush_cache_vmap_early() 2024-02-16 19:14:25 +01:00
pgalloc-track.h
pgtable-generic.c mm/pgtable: notes on pte_offset_map[_lock]() 2023-08-18 10:12:25 -07:00
process_vm_access.c mm/gup: remove unused vmas parameter from pin_user_pages_remote() 2023-06-09 16:25:25 -07:00
ptdump.c mm: ptdump should use ptep_get_lockless() 2023-06-19 16:19:24 -07:00
readahead.c readahead: avoid multiple marked readahead pages 2024-03-15 10:48:22 -04:00
rmap.c mm/rmap: convert page_move_anon_rmap() to folio_move_anon_rmap() 2023-10-18 14:34:14 -07:00
rodata_test.c
secretmem.c mm/secretmem: use a folio in secretmem_fault() 2023-08-21 13:38:02 -07:00
shmem_quota.c shmem: Add default quota limit mount options 2023-08-09 09:15:40 +02:00
shmem.c quota: Properly annotate i_dquot arrays with __rcu 2024-03-26 18:18:32 -04:00
show_mem.c mm: refactor si_mem_available() 2023-10-04 10:32:19 -07:00
shrinker_debug.c mm: shrinker: convert shrinker_rwsem to mutex 2023-10-04 10:32:26 -07:00
shrinker.c mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info() 2024-01-05 09:58:32 -08:00
shuffle.c
shuffle.h mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
slab_common.c RCU pull request for v6.7 2023-10-30 18:01:41 -10:00
slab.c Randomized slab caches for kmalloc() 2023-07-18 10:07:47 +02:00
slab.h mm: kmem: scoped objcg protection 2023-10-25 16:47:11 -07:00
slub.c mm/slub: refactor calculate_order() and calc_slab_order() 2023-10-02 11:55:47 +02:00
sparse-vmemmap.c mm/vmemmap: allow architectures to override how vmemmap optimization works 2023-08-18 10:12:53 -07:00
sparse.c mm/sparsemem: fix race in accessing memory_section->usage 2024-01-31 16:21:03 -08:00
swap_cgroup.c
swap_slots.c
swap_state.c mempolicy: alloc_pages_mpol() for NUMA policy without vma 2023-10-25 16:47:16 -07:00
swap.c mm: remove references to pagevec 2023-06-23 16:59:30 -07:00
swap.h mm/swap: fix race when skipping swapcache 2024-03-01 13:41:47 +01:00
swapfile.c mm: swap: fix race between free_swap_and_cache() and swapoff() 2024-04-03 15:11:19 +02:00
truncate.c - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
usercopy.c mm: Fix copy_from_user_nofault(). 2023-04-12 17:36:23 -07:00
userfaultfd.c userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb 2024-02-23 09:51:28 +01:00
util.c parisc: fix mmap_base calculation when stack grows upwards 2023-11-15 15:30:09 -08:00
vmalloc.c mm/vmalloc: fix the unchecked dereference warning in vread_iter() 2023-11-01 12:38:35 -07:00
vmpressure.c eventfd: simplify eventfd_signal() 2024-04-03 15:11:23 +02:00
vmscan.c mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations 2024-04-03 15:11:40 +02:00
vmstat.c mm: tune PCP high automatically 2023-10-25 16:47:10 -07:00
workingset.c mm/mglru: fix underprotected page cache 2023-12-12 17:20:19 -08:00
z3fold.c mm/z3fold: remove obsolete comment for struct z3fold_pool 2023-08-21 13:37:51 -07:00
zbud.c mm: zswap: remove shrink from zpool interface 2023-06-19 16:19:27 -07:00
zpool.c mm: zswap: remove shrink from zpool interface 2023-06-19 16:19:27 -07:00
zsmalloc.c zsmalloc: use copy_page for full page copy 2023-10-18 14:34:16 -07:00
zswap.c mm/zswap: invalidate duplicate entry when !zswap_enabled 2024-03-01 13:42:00 +01:00