linux/mm
Charan Teja Kalla 26c3817cc8 mm: page_alloc: unreserve highatomic page blocks before oom
commit ac3f3b0a55 upstream.

__alloc_pages_direct_reclaim() is called from slowpath allocation where
high atomic reserves can be unreserved after there is a progress in
reclaim and yet no suitable page is found.  Later should_reclaim_retry()
gets called from slow path allocation to decide if the reclaim needs to be
retried before OOM kill path is taken.

should_reclaim_retry() checks the available(reclaimable + free pages)
memory against the min wmark levels of a zone and returns:

a) true, if it is above the min wmark so that slow path allocation will
   do the reclaim retries.

b) false, thus slowpath allocation takes oom kill path.

should_reclaim_retry() can also unreserves the high atomic reserves **but
only after all the reclaim retries are exhausted.**

In a case where there are almost none reclaimable memory and free pages
contains mostly the high atomic reserves but allocation context can't use
these high atomic reserves, makes the available memory below min wmark
levels hence false is returned from should_reclaim_retry() leading the
allocation request to take OOM kill path.  This can turn into a early oom
kill if high atomic reserves are holding lot of free memory and
unreserving of them is not attempted.

(early)OOM is encountered on a VM with the below state:
[  295.998653] Normal free:7728kB boost:0kB min:804kB low:1004kB
high:1204kB reserved_highatomic:8192KB active_anon:4kB inactive_anon:0kB
active_file:24kB inactive_file:24kB unevictable:1220kB writepending:0kB
present:70732kB managed:49224kB mlocked:0kB bounce:0kB free_pcp:688kB
local_pcp:492kB free_cma:0kB
[  295.998656] lowmem_reserve[]: 0 32
[  295.998659] Normal: 508*4kB (UMEH) 241*8kB (UMEH) 143*16kB (UMEH)
33*32kB (UH) 7*64kB (UH) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB
0*4096kB = 7752kB

Per above log, the free memory of ~7MB exist in the high atomic reserves
is not freed up before falling back to oom kill path.

Fix it by trying to unreserve the high atomic reserves in
should_reclaim_retry() before __alloc_pages_direct_reclaim() can fallback
to oom kill path.

Link: https://lkml.kernel.org/r/1700823445-27531-1-git-send-email-quic_charante@quicinc.com
Fixes: 0aaa29a56e ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand")
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Reported-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-31 16:18:57 -08:00
..
damon mm/damon/core: make damon_start() waits until kdamond_fn() starts 2024-01-01 12:42:23 +00:00
kasan kasan: disable kasan_non_canonical_hook() for HW tags 2023-10-18 12:12:41 -07:00
kfence LoongArch changes for v6.6 2023-09-08 12:16:52 -07:00
kmsan mm: kmsan: use helper macros PAGE_ALIGN and PAGE_ALIGN_DOWN 2023-08-21 13:37:29 -07:00
backing-dev.c writeback: remove redundant checks for root memcg 2023-08-21 13:37:48 -07:00
balloon_compaction.c
bootmem_info.c
cma_debug.c
cma_sysfs.c mm: cma: make kobj_type structure constant 2023-03-28 16:20:06 -07:00
cma.c mm/cma: use nth_page() in place of direct struct page manipulation 2023-11-28 17:20:06 +00:00
cma.h
compaction.c merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes 2023-08-21 14:26:20 -07:00
debug_page_alloc.c mm: page_alloc: split out DEBUG_PAGEALLOC 2023-06-09 16:25:23 -07:00
debug_page_ref.c
debug_vm_pgtable.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
debug.c mm: update validate_mm() to use vma iterator 2023-06-09 16:25:31 -07:00
dmapool_test.c dmapool: add alloc/free performance test 2023-04-05 19:42:38 -07:00
dmapool.c dmapool: create/destroy cleanup 2023-06-09 16:25:17 -07:00
early_ioremap.c mm/early_ioremap.c: improve the execution efficiency of early_ioremap_setup() 2023-06-09 16:25:56 -07:00
fadvise.c mm: remove unnecessary pagevec includes 2023-06-23 16:59:31 -07:00
fail_page_alloc.c mm: page_alloc: split out FAIL_PAGE_ALLOC 2023-06-09 16:25:23 -07:00
failslab.c
filemap.c mm/filemap: avoid buffered read/write race to read inconsistent data 2024-01-05 15:19:43 +01:00
folio-compat.c filemap: Add fgf_t typedef 2023-07-24 18:04:30 -04:00
gup_test.c Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes. 2023-06-23 16:58:19 -07:00
gup_test.h
gup.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
highmem.c mm: ptep_get() conversion 2023-06-19 16:19:25 -07:00
hmm.c mm: enable page walking API to lock vmas during the walk 2023-08-21 13:07:20 -07:00
huge_memory.c mm: fix for negative counter: nr_file_hugepages 2023-11-28 17:20:13 +00:00
hugetlb_cgroup.c mm/hugetlb: increase use of folios in alloc_huge_page() 2023-02-13 15:54:27 -08:00
hugetlb_vmemmap.c mm: hugetlb_vmemmap: fix a race between vmemmap pmd split 2023-08-18 10:12:14 -07:00
hugetlb_vmemmap.h
hugetlb.c hugetlb: fix null-ptr-deref in hugetlb_vma_lock_write 2023-12-13 18:45:24 +01:00
hwpoison-inject.c
init-mm.c mm: move dummy_vm_ops out of a header 2023-08-21 13:37:46 -07:00
internal.h Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: remove unneeded ioremap_allowed and iounmap_allowed 2023-08-18 10:12:36 -07:00
Kconfig - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
Kconfig.debug mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM 2023-05-29 16:14:28 +01:00
khugepaged.c - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
kmemleak.c mm/kmemleak: move up cond_resched() call in page scanning loop 2023-09-02 15:17:34 -07:00
ksm.c mm: memory-failure: use rcu lock instead of tasklist_lock when collect_procs() 2023-09-05 11:11:52 -07:00
list_lru.c
maccess.c mm: Fix copy_from_user_nofault(). 2023-04-12 17:36:23 -07:00
madvise.c swap: remove remnants of polling from read_swap_cache_async 2023-08-24 16:20:16 -07:00
Makefile - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
mapping_dirty_helpers.c mm: fix clean_record_shared_mapping_range kernel-doc 2023-08-24 16:20:30 -07:00
memblock.c mm: disable kernelcore=mirror when no mirror memory 2023-08-21 13:37:43 -07:00
memcontrol.c mm: kmem: drop __GFP_NOFAIL when allocating objcg vectors 2023-11-28 17:20:13 +00:00
memfd.c revert "memfd: improve userspace warnings for missing exec-related flags". 2023-09-05 11:11:52 -07:00
memory_hotplug.c mm/memory_hotplug: fix memmap_on_memory sysfs value retrieval 2024-01-20 11:51:49 +01:00
memory-failure.c mm/memory-failure: pass the folio and the page to collect_procs() 2024-01-10 17:16:54 +01:00
memory-tiers.c memory tier: use helper macro __ATTR_RW() 2023-08-18 10:12:38 -07:00
memory.c mm: fix unmap_mapping_range high bits shift bug 2024-01-10 17:17:00 +01:00
mempolicy.c numa: Generalize numa_map_to_online_node() 2023-11-20 11:58:51 +01:00
mempool.c
memremap.c mm/memremap.c: fix outdated comment in devm_memremap_pages 2023-02-09 16:51:46 -08:00
memtest.c mm: memtest: convert to memtest_report_meminfo() 2023-08-21 13:37:47 -07:00
migrate_device.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
migrate.c mm: migrate high-order folios in swap cache correctly 2024-01-05 15:19:43 +01:00
mincore.c mm: enable page walking API to lock vmas during the walk 2023-08-21 13:07:20 -07:00
mlock.c merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes 2023-08-21 14:26:20 -07:00
mm_init.c efi: disable mirror feature during crashkernel 2024-01-31 16:18:56 -08:00
mm_slot.h
mmap_lock.c
mmap.c mmap: fix error paths with dup_anon_vma() 2023-10-06 14:11:38 -07:00
mmu_gather.c mm: fix kernel-doc warning from tlb_flush_rmaps() 2023-08-24 16:20:30 -07:00
mmu_notifier.c mmu_notifiers: rename invalidate_range notifier 2023-08-18 10:12:41 -07:00
mmzone.c
mprotect.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
mremap.c vm: fix move_vma() memory accounting being off 2023-09-16 15:23:31 -07:00
msync.c
nommu.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
oom_kill.c mm: remove redundant K() macro definition 2023-08-21 13:37:44 -07:00
page_alloc.c mm: page_alloc: unreserve highatomic page blocks before oom 2024-01-31 16:18:57 -08:00
page_counter.c
page_ext.c mm/page_ext: move functions around for minor cleanups to page_ext 2023-08-18 10:12:31 -07:00
page_idle.c mm: page_idle: convert page idle to use a folio 2023-01-18 17:12:52 -08:00
page_io.c zswap: make zswap_load() take a folio 2023-08-21 13:37:27 -07:00
page_isolation.c mm/hugetlb: get rid of page_hstate() 2023-08-18 10:12:39 -07:00
page_owner.c mm/page_ext: use page_ext_data helper in page_owner 2023-08-21 13:37:27 -07:00
page_poison.c mm/page_poison: remove unused page_ext.h from page_poison 2023-08-21 13:37:30 -07:00
page_reporting.c mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
page_reporting.h
page_table_check.c mm: convert page_table_check_pte_set() to page_table_check_ptes_set() 2023-08-24 16:20:18 -07:00
page_vma_mapped.c mm: correct stale comment of function check_pte 2023-08-18 10:12:13 -07:00
page-writeback.c filemap: add a per-mapping stable writes flag 2023-12-03 07:33:03 +01:00
pagewalk.c mm/pagewalk: fix bootstopping regression from extra pte_unmap() 2023-09-02 08:39:21 -07:00
percpu-internal.h percpu-internal/pcpu_chunk: re-layout pcpu_chunk structure to reduce false sharing 2023-06-19 16:19:29 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c mm/percpu.c: print error message too if atomic alloc failed 2023-08-25 08:04:59 -07:00
pgalloc-track.h
pgtable-generic.c mm/pgtable: notes on pte_offset_map[_lock]() 2023-08-18 10:12:25 -07:00
process_vm_access.c mm/gup: remove unused vmas parameter from pin_user_pages_remote() 2023-06-09 16:25:25 -07:00
ptdump.c mm: ptdump should use ptep_get_lockless() 2023-06-19 16:19:24 -07:00
readahead.c vfs: fix readahead(2) on block devices 2023-11-20 11:58:52 +01:00
rmap.c mm: hugetlb: add huge page size param to set_huge_pte_at() 2023-09-29 17:20:47 -07:00
rodata_test.c
secretmem.c mm/secretmem: use a folio in secretmem_fault() 2023-08-21 13:38:02 -07:00
shmem_quota.c shmem: Add default quota limit mount options 2023-08-09 09:15:40 +02:00
shmem.c mm/shmem: fix race in shmem_undo_range w/THP 2023-12-20 17:02:03 +01:00
show_mem.c mm,thp: no space after colon in Mem-Info fields 2023-08-21 13:38:01 -07:00
shrinker_debug.c Revert "mm: shrinkers: make count and scan in shrinker debugfs lockless" 2023-06-19 13:19:34 -07:00
shuffle.c
shuffle.h mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
slab_common.c mm: slab: Do not create kmalloc caches smaller than arch_slab_minalign() 2023-10-11 15:24:49 +02:00
slab.c Randomized slab caches for kmalloc() 2023-07-18 10:07:47 +02:00
slab.h Randomized slab caches for kmalloc() 2023-07-18 10:07:47 +02:00
slub.c mm/slub: remove freelist_dereference() 2023-07-14 09:57:21 +02:00
sparse-vmemmap.c mm/vmemmap: allow architectures to override how vmemmap optimization works 2023-08-18 10:12:53 -07:00
sparse.c mm/sparsemem: fix race in accessing memory_section->usage 2024-01-31 16:18:56 -08:00
swap_cgroup.c
swap_slots.c
swap_state.c mm/swap: inline folio_set_swap_entry() and folio_swap_entry() 2023-08-24 16:20:28 -07:00
swap.c mm: remove references to pagevec 2023-06-23 16:59:30 -07:00
swap.h swap: remove remnants of polling from read_swap_cache_async 2023-08-24 16:20:16 -07:00
swapfile.c mm/swap: inline folio_set_swap_entry() and folio_swap_entry() 2023-08-24 16:20:28 -07:00
truncate.c - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
usercopy.c mm: Fix copy_from_user_nofault(). 2023-04-12 17:36:23 -07:00
userfaultfd.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
util.c parisc: fix mmap_base calculation when stack grows upwards 2023-11-28 17:20:08 +00:00
vmalloc.c mm: hugetlb: add huge page size param to set_huge_pte_at() 2023-09-29 17:20:47 -07:00
vmpressure.c net-memcg: Fix scope of sockmem pressure indicators 2023-08-16 12:21:32 +01:00
vmscan.c mm/mglru: skip special VMAs in lru_gen_look_around() 2024-01-10 17:17:00 +01:00
vmstat.c mm/vmstat: remove unused page_ext.h from vmstat 2023-08-21 13:37:30 -07:00
workingset.c mm/mglru: fix underprotected page cache 2023-12-20 17:02:02 +01:00
z3fold.c mm/z3fold: remove obsolete comment for struct z3fold_pool 2023-08-21 13:37:51 -07:00
zbud.c mm: zswap: remove shrink from zpool interface 2023-06-19 16:19:27 -07:00
zpool.c mm: zswap: remove shrink from zpool interface 2023-06-19 16:19:27 -07:00
zsmalloc.c merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes 2023-08-21 14:26:20 -07:00
zswap.c mm: zswap: fix pool refcount bug around shrink_worker() 2023-10-18 12:12:40 -07:00