linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-22 20:23:57 +08:00

Michal Hocko aac4536355 mm, oom: introduce oom reaper	2016-03-25 16:37:42 -07:00

This patch (of 5):

This is based on the idea from Mel Gorman discussed during LSFMM 2015 and independently brought up by Oleg Nesterov.

The OOM killer currently kills only a single task, in the hope that the task will terminate in a reasonable time and free up its memory. Such a task (the oom victim) gets access to memory reserves via mark_oom_victim to allow forward progress should additional memory be needed during the exit path.

It has been shown (e.g. by Tetsuo Handa) that it is not that hard to construct workloads which break the core assumption mentioned above. The OOM victim might take an unbounded amount of time to exit because it might be blocked in the uninterruptible state, waiting for an event (e.g. a lock) which is blocked by another task looping in the page allocator.

This patch reduces the probability of such a lockup by introducing a specialized kernel thread (oom_reaper) which tries to reclaim additional memory by preemptively reaping the anonymous or swapped out memory owned by the oom victim, under the assumption that such memory won't be needed once its owner is killed and kicked from userspace anyway. There is one notable exception to this, though: if the OOM victim was in the process of coredumping, the result would be incomplete. This is considered a reasonable constraint because overall system health is more important than the debuggability of a particular application.

A kernel thread has been chosen because we need a reliable way of invocation. Workqueue context is not appropriate because all the workers might be busy (e.g. allocating memory). Kswapd, which sounds like another good fit, is not appropriate either because it might get blocked on locks during reclaim as well.

oom_reaper has to take mmap_sem on the target task for reading, so the solution is not 100% reliable because the semaphore might be held or blocked for write. The probability is nevertheless reduced considerably compared to basically any lock blocking forward progress as described above. To avoid blocking on the lock without any forward progress, we use only a trylock and retry 10 times with a short sleep in between. Users of mmap_sem which need it for write should be carefully reviewed to use _killable waiting as much as possible, and to reduce allocation requests done with the lock held to the absolute minimum, to reduce the risk even further.

The API between the oom killer and the oom reaper is quite trivial. wake_oom_reaper updates mm_to_reap with cmpxchg to guarantee only a NULL->mm transition, and oom_reaper clears this atomically once it is done with the work. This means that only a single mm_struct can be reaped at a time. As the operation is potentially disruptive, we are trying to limit it to the necessary minimum, and the reaper blocks any updates while it operates on an mm. mm_struct is pinned by mm_count to allow a parallel exit_mmap, and a race is detected by atomic_inc_not_zero(mm_users).

Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Mel Gorman <mgorman@suse.de>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

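
The handoff and bounded retry described in the commit message above can be illustrated with a small userspace sketch in C11. This is not the kernel code: the names mm_sketch, mm_to_reap_sketch, wake_reaper_sketch, reaper_finish_sketch, reap_with_retries and try_after_failures are hypothetical, C11 atomics stand in for the kernel's cmpxchg, and the short sleep between attempts is left to an optional callback.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's mm_struct. */
struct mm_sketch {
    int id;
};

/* Single handoff slot; NULL means the reaper is idle. */
static _Atomic(struct mm_sketch *) mm_to_reap_sketch = NULL;

/*
 * Killer side: publish mm only if the slot is empty, so the only
 * possible transition is NULL -> mm. Returns false if a reap is
 * already pending.
 */
static bool wake_reaper_sketch(struct mm_sketch *mm)
{
    struct mm_sketch *expected = NULL;

    return atomic_compare_exchange_strong(&mm_to_reap_sketch, &expected, mm);
}

/*
 * Reaper side: atomically clear the slot once the work is done and
 * return the mm that was being reaped (NULL if there was none).
 */
static struct mm_sketch *reaper_finish_sketch(void)
{
    return atomic_exchange(&mm_to_reap_sketch, (struct mm_sketch *)NULL);
}

#define MAX_REAP_RETRIES 10

/*
 * Call a non-blocking attempt (a trylock-style function) at most
 * MAX_REAP_RETRIES times, invoking pause() after each failure if one
 * is given. Returns true as soon as an attempt succeeds.
 */
static bool reap_with_retries(bool (*try_reap)(void *), void *arg,
                              void (*pause)(void))
{
    for (int attempt = 0; attempt < MAX_REAP_RETRIES; attempt++) {
        if (try_reap(arg))
            return true;
        if (pause)
            pause();
    }
    return false;
}

/* Demo attempt: fails while *remaining > 0, decrementing it each time. */
static bool try_after_failures(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}
```

With a single slot, a second wake_reaper_sketch call fails until reaper_finish_sketch has cleared the slot, which mirrors the "only a single mm_struct can be reaped at a time" property stated in the message.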
kasan	kernel: add kcov code coverage	2016-03-22 15:36:02 -07:00
backing-dev.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
balloon_compaction.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	2016-03-17 21:38:27 -07:00
bootmem.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
cleancache.c	cleancache: constify cleancache_ops structure	2016-01-27 09:09:57 -05:00
cma_debug.c	mm/cma_debug: correct size input to bitmap function	2015-07-17 16:39:54 -07:00
cma.c	mm/cma.c: suppress warning	2015-11-05 19:34:48 -08:00
cma.h	mm: cma: mark cma_bitmap_maxno() inline in header	2015-08-14 15:56:32 -07:00
compaction.c	mm, kswapd: replace kswapd compaction with waking up kcompactd	2016-03-17 15:09:34 -07:00
debug_page_ref.c	mm/page_ref: add tracepoint to track down page reference manipulation	2016-03-17 15:09:34 -07:00
debug.c	mm: introduce page reference manipulation functions	2016-03-17 15:09:34 -07:00
dmapool.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
early_ioremap.c	mm/early_ioremap: use offset_in_page macro	2015-11-05 19:34:48 -08:00
fadvise.c	writeback: implement and use inode_congested()	2015-06-02 08:33:35 -06:00
failslab.c	mm: fault-inject take over bootstrap kmem_cache check	2016-03-15 16:55:16 -07:00
filemap.c	mm: use radix_tree_iter_retry()	2016-03-17 15:09:34 -07:00
frame_vector.c	mm/gup: Switch all callers of get_user_pages() to not pass tsk/mm	2016-02-16 10:11:12 +01:00
frontswap.c	frontswap: allow multiple backends	2015-06-24 17:49:45 -07:00
gup.c	mm/core, x86/mm/pkeys: Differentiate instruction fetches	2016-02-18 19:46:29 +01:00
highmem.c	mm/highmem: make kmap cache coloring aware	2014-08-06 18:01:22 -07:00
huge_memory.c	powerpc updates for 4.6	2016-03-19 15:38:41 -07:00
hugetlb_cgroup.c	mm: make compound_head() robust	2015-11-06 17:50:42 -08:00
hugetlb.c	mm: convert pr_warning to pr_warn	2016-03-17 15:09:34 -07:00
hwpoison-inject.c	hwpoison: use page_cgroup_ino for filtering by memcg	2015-09-10 13:29:01 -07:00
init-mm.c
internal.h	mm, oom: introduce oom reaper	2016-03-25 16:37:42 -07:00
interval_tree.c	mm: replace vma->sharead.linear with vma->shared	2015-02-10 14:30:31 -08:00
Kconfig	Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2016-03-20 19:08:56 -07:00
Kconfig.debug	mm/page_ref: add tracepoint to track down page reference manipulation	2016-03-17 15:09:34 -07:00
kmemcheck.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
kmemleak-test.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
kmemleak.c	mm: coalesce split strings	2016-03-17 15:09:34 -07:00
ksm.c	mm/core: Do not enforce PKEY permissions on remote mm access	2016-02-18 19:46:28 +01:00
list_lru.c	mm: memcontrol: move kmem accounting code to CONFIG_MEMCG	2016-01-20 17:09:18 -08:00
maccess.c	mm/maccess.c: actually return -EFAULT from strncpy_from_unsafe	2015-11-05 19:34:48 -08:00
madvise.c	mm/madvise: update comment on sys_madvise()	2016-03-15 16:55:16 -07:00
Makefile	kernel: add kcov code coverage	2016-03-22 15:36:02 -07:00
memblock.c	mm: coalesce split strings	2016-03-17 15:09:34 -07:00
memcontrol.c	mm: memcontrol: zap oom_info_lock	2016-03-17 15:09:34 -07:00
memory_hotplug.c	mm: coalesce split strings	2016-03-17 15:09:34 -07:00
memory-failure.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
memory.c	mm, oom: introduce oom reaper	2016-03-25 16:37:42 -07:00
mempolicy.c	Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2016-03-20 19:08:56 -07:00
mempool.c	mm, mempool: only set __GFP_NOMEMALLOC if there are free elements	2016-03-17 15:09:34 -07:00
memtest.c	memtest: remove unused header files	2015-09-08 15:35:28 -07:00
migrate.c	mm: make remove_migration_ptes() beyond mm/migration.c	2016-03-17 15:09:34 -07:00
mincore.c	thp: change pmd_trans_huge_lock() interface to return ptl	2016-01-21 17:20:51 -08:00
mlock.c	mm: fix mlock accouting	2016-01-21 17:20:51 -08:00
mm_init.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
mmap.c	Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2016-03-20 19:08:56 -07:00
mmu_context.c	sched/mm: call finish_arch_post_lock_switch in idle_task_exit and use_mm	2014-02-21 08:50:17 +01:00
mmu_notifier.c	fix Christoph's email addresses	2016-03-17 15:09:34 -07:00
mmzone.c	mm/mmzone.c: memmap_valid_within() can be boolean	2016-01-14 16:00:49 -08:00
mprotect.c	mm/mprotect.c: don't imply PROT_EXEC on non-exec fs	2016-03-22 15:36:02 -07:00
mremap.c	mm: cleanup pte_alloc interfaces	2016-03-17 15:09:34 -07:00
msync.c	mm/msync: use offset_in_page macro	2015-11-05 19:34:48 -08:00
nobootmem.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
nommu.c	Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2016-03-20 19:08:56 -07:00
oom_kill.c	mm, oom: introduce oom reaper	2016-03-25 16:37:42 -07:00
page_alloc.c	mm,oom: do not loop !__GFP_FS allocation if the OOM killer is disabled	2016-03-17 15:09:34 -07:00
page_counter.c	mm: page_counter: let page_counter_try_charge() return bool	2015-11-05 19:34:48 -08:00
page_ext.c	mm/page_poisoning.c: allow for zero poisoning	2016-03-15 16:55:16 -07:00
page_idle.c	mm: add page_check_address_transhuge() helper	2016-01-15 17:56:32 -08:00
page_io.c	zram: revive swap_slot_free_notify	2016-03-22 15:36:02 -07:00
page_isolation.c	mm/page_isolation: do some cleanup in "undo_isolate_page_range"	2016-01-15 17:56:32 -08:00
page_owner.c	mm: coalesce split strings	2016-03-17 15:09:34 -07:00
page_poison.c	mm/page_poisoning.c: allow for zero poisoning	2016-03-15 16:55:16 -07:00
page-writeback.c	mm: remove unnecessary uses of lock_page_memcg()	2016-03-15 16:55:16 -07:00
pagewalk.c	thp: rename split_huge_page_pmd() to split_huge_pmd()	2016-01-15 17:56:32 -08:00
percpu-km.c	mm: percpu: use pr_fmt to prefix output	2016-03-17 15:09:34 -07:00
percpu-vm.c	percpu: move region iterations out of pcpu_[de]populate_chunk()	2014-09-02 14:46:02 -04:00
percpu.c	mm: percpu: use pr_fmt to prefix output	2016-03-17 15:09:34 -07:00
pgtable-generic.c	mm/thp/migration: switch from flush_tlb_range to flush_pmd_tlb_range	2016-03-17 15:09:34 -07:00
process_vm_access.c	mm/gup: Introduce get_user_pages_remote()	2016-02-16 10:04:09 +01:00
quicklist.c	fix Christoph's email addresses	2016-03-17 15:09:34 -07:00
readahead.c	mm: move lru_to_page to mm_inline.h	2016-01-14 16:00:49 -08:00
rmap.c	thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers	2016-03-17 15:09:34 -07:00
shmem.c	radix-tree,shmem: introduce radix_tree_iter_next()	2016-03-17 15:09:34 -07:00
slab_common.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
slab.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
slab.h	mm: memcontrol: report slab usage in cgroup2 memory.stat	2016-03-17 15:09:34 -07:00
slob.c	mm: slab: free kmem_cache_node after destroy sysfs file	2016-02-18 16:23:24 -08:00
slub.c	mm: coalesce split strings	2016-03-17 15:09:34 -07:00
sparse-vmemmap.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
sparse.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
swap_cgroup.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
swap_state.c	mm: memcontrol: charge swap to cgroup2	2016-01-20 17:09:18 -08:00
swap.c	mm, x86: get_user_pages() for dax mappings	2016-01-15 17:56:32 -08:00
swapfile.c	Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux	2016-03-21 13:48:00 -07:00
truncate.c	mm: remove unnecessary uses of lock_page_memcg()	2016-03-15 16:55:16 -07:00
userfaultfd.c	mm: cleanup pte_alloc interfaces	2016-03-17 15:09:34 -07:00
util.c	Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2016-03-20 19:08:56 -07:00
vmacache.c	mm/vmacache: inline vmacache_valid_mm()	2015-11-05 19:34:48 -08:00
vmalloc.c	mm/vmalloc: use PAGE_ALIGNED() to check PAGE_SIZE alignment	2016-03-17 15:09:34 -07:00
vmpressure.c	mm/vmpressure.c: fix subtree pressure detection	2016-02-03 08:28:43 -08:00
vmscan.c	mm: introduce page reference manipulation functions	2016-03-17 15:09:34 -07:00
vmstat.c	thp, vmstats: count deferred split events	2016-03-17 15:09:34 -07:00
workingset.c	mm: workingset: make shadow node shrinker memcg aware	2016-03-17 15:09:34 -07:00
zbud.c	mm/zbud.c: use list_last_entry() instead of list_tail_entry()	2016-01-15 11:40:52 -08:00
zpool.c	mm: zsmalloc: constify struct zs_pool name	2015-11-06 17:50:42 -08:00
zsmalloc.c	mm/zsmalloc: add `freeable' column to pool stat	2016-03-17 15:09:34 -07:00
zswap.c	mm/zswap: change incorrect strncmp use to strcmp	2015-12-18 14:25:40 -08:00