linux/mm
Michal Hocko e5e3f4c4f0 mm, oom_reaper: make sure that mmput_async is called only when memory was reaped
Tetsuo is worried that mmput_async might still lead to a premature new
oom victim selection due to the following race:

__oom_reap_task				exit_mm
  find_lock_task_mm
  atomic_inc(mm->mm_users) # = 2
  task_unlock
  					  task_lock
					  task->mm = NULL
					  up_read(&mm->mmap_sem)
		< somebody write locks mmap_sem >
					  task_unlock
					  mmput
  					    atomic_dec_and_test # = 1
					  exit_oom_victim
  down_read_trylock # failed - no reclaim
  mmput_async # Takes unpredictable amount of time
  		< new OOM situation >

the final __mmput will be executed in the delayed context which might
happen far in the future.  Such a race is highly unlikely because the
write holder of mmap_sem would have to be an external task (all direct
holders are already killed or exiting) and it usually have to pin
mm_users in order to do anything reasonable.

We can, however, make sure that the mmput_async is only called when we
do not back off and reap some memory.  That would reduce the impact of
the delayed __mmput because the real content would be already freed.
Pin mm_count to keep it alive after we drop task_lock and before we try
to get mmap_sem.  If the mmap_sem succeeds we can try to grab mm_users
reference and then go on with unmapping the address space.

It is not clear whether this race is possible at all but it is better to
be more robust and do not pin mm_users unless we are sure we are
actually doing some real work during __oom_reap_task.

Link: http://lkml.kernel.org/r/1465306987-30297-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
..
kasan kasan/quarantine: fix bugs on qlist_move_cache() 2016-07-15 14:54:27 +09:00
backing-dev.c mm: throttle on IO only when there are too many dirty and writeback pages 2016-05-20 17:58:30 -07:00
balloon_compaction.c mm: balloon: use general non-lru movable page feature 2016-07-26 16:19:19 -07:00
bootmem.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
cleancache.c cleancache: constify cleancache_ops structure 2016-01-27 09:09:57 -05:00
cma_debug.c mm/cma_debug: correct size input to bitmap function 2015-07-17 16:39:54 -07:00
cma.c mm/cma: silence warnings due to max() usage 2016-05-27 14:49:37 -07:00
cma.h mm: cma: mark cma_bitmap_maxno() inline in header 2015-08-14 15:56:32 -07:00
compaction.c mm/page_alloc: introduce post allocation processing on page allocator 2016-07-26 16:19:19 -07:00
debug_page_ref.c mm/page_ref: add tracepoint to track down page reference manipulation 2016-03-17 15:09:34 -07:00
debug.c mm: introduce page reference manipulation functions 2016-03-17 15:09:34 -07:00
dmapool.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
early_ioremap.c mm/early_ioremap: use offset_in_page macro 2015-11-05 19:34:48 -08:00
fadvise.c mm/fadvise.c: do not discard partial pages with POSIX_FADV_DONTNEED 2016-06-09 14:23:11 -07:00
failslab.c mm: fault-inject take over bootstrap kmem_cache check 2016-03-15 16:55:16 -07:00
filemap.c Revert "mm: make faultaround produce old ptes" 2016-06-24 17:23:52 -07:00
frame_vector.c mm/gup: Switch all callers of get_user_pages() to not pass tsk/mm 2016-02-16 10:11:12 +01:00
frontswap.c mm, frontswap: convert frontswap_enabled to static key 2016-07-26 16:19:19 -07:00
gup.c mm: thp: check pmd_trans_unstable() after split_huge_pmd() 2016-07-26 16:19:19 -07:00
highmem.c mm/highmem: make nr_free_highpages() handles all highmem zones by itself 2016-05-19 19:12:14 -07:00
huge_memory.c mm/mmu_gather: track page size with mmu gather and force flush if page size change 2016-07-26 16:19:19 -07:00
hugetlb_cgroup.c mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size 2016-05-20 17:58:30 -07:00
hugetlb.c mm/mmu_gather: track page size with mmu gather and force flush if page size change 2016-07-26 16:19:19 -07:00
hwpoison-inject.c hwpoison: use page_cgroup_ino for filtering by memcg 2015-09-10 13:29:01 -07:00
init-mm.c
internal.h mm/page_alloc: introduce post allocation processing on page allocator 2016-07-26 16:19:19 -07:00
interval_tree.c mm: replace vma->sharead.linear with vma->shared 2015-02-10 14:30:31 -08:00
Kconfig mm: disable DEFERRED_STRUCT_PAGE_INIT on !NO_BOOTMEM 2016-05-27 14:49:37 -07:00
Kconfig.debug mm/page_ref: add tracepoint to track down page reference manipulation 2016-03-17 15:09:34 -07:00
kmemcheck.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
kmemleak-test.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
kmemleak.c mm: prevent KASAN false positives in kmemleak 2016-06-24 17:23:52 -07:00
ksm.c mm: migrate: support non-lru movable page migration 2016-07-26 16:19:19 -07:00
list_lru.c mm: memcontrol: move kmem accounting code to CONFIG_MEMCG 2016-01-20 17:09:18 -08:00
maccess.c x86: remove more uaccess_32.h complexity 2016-05-22 17:21:27 -07:00
madvise.c mm: make mmap_sem for write waits killable for mm syscalls 2016-05-23 17:04:14 -07:00
Makefile z3fold: the 3-fold allocator for compressed pages 2016-05-20 17:58:30 -07:00
memblock.c mm/memblock.c: remove unnecessary always-true comparison 2016-05-20 17:58:30 -07:00
memcontrol.c mm,oom: remove unused argument from oom_scan_process_thread(). 2016-07-26 16:19:19 -07:00
memory_hotplug.c memory-hotplug: more general validation of zone during online 2016-07-26 16:19:19 -07:00
memory-failure.c mm/memory-failure.c: replace "MCE" with "Memory failure" 2016-05-20 17:58:30 -07:00
memory.c mm/mmu_gather: track page size with mmu gather and force flush if page size change 2016-07-26 16:19:19 -07:00
mempolicy.c mm: thp: check pmd_trans_unstable() after split_huge_pmd() 2016-07-26 16:19:19 -07:00
mempool.c mm: mempool: kasan: don't poot mempool objects in quarantine 2016-06-24 17:23:52 -07:00
memtest.c memtest: remove unused header files 2015-09-08 15:35:28 -07:00
migrate.c mm: balloon: use general non-lru movable page feature 2016-07-26 16:19:19 -07:00
mincore.c mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage 2016-04-04 10:41:08 -07:00
mlock.c mm: make mmap_sem for write waits killable for mm syscalls 2016-05-23 17:04:14 -07:00
mm_init.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
mmap.c x86/vdso: Add mremap hook to vm_special_mapping 2016-07-08 14:17:51 +02:00
mmu_context.c mm/mmu_context, sched/core: Fix mmu_context.h assumption 2016-04-28 11:44:19 +02:00
mmu_notifier.c fix Christoph's email addresses 2016-03-17 15:09:34 -07:00
mmzone.c mm, page_alloc: inline the fast path of the zonelist iterator 2016-05-19 19:12:14 -07:00
mprotect.c mm: thp: check pmd_trans_unstable() after split_huge_pmd() 2016-07-26 16:19:19 -07:00
mremap.c mm: thp: check pmd_trans_unstable() after split_huge_pmd() 2016-07-26 16:19:19 -07:00
msync.c mm/msync: use offset_in_page macro 2015-11-05 19:34:48 -08:00
nobootmem.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
nommu.c mm: remove more IS_ERR_VALUE abuses 2016-05-27 15:57:31 -07:00
oom_kill.c mm, oom_reaper: make sure that mmput_async is called only when memory was reaped 2016-07-26 16:19:19 -07:00
page_alloc.c mm: charge/uncharge kmemcg from generic page allocator paths 2016-07-26 16:19:19 -07:00
page_counter.c mm: page_counter: let page_counter_try_charge() return bool 2015-11-05 19:34:48 -08:00
page_ext.c mm: use early_pfn_to_nid in page_ext_init 2016-05-27 14:49:37 -07:00
page_idle.c mm: add page_check_address_transhuge() helper 2016-01-15 17:56:32 -08:00
page_io.c Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-05-17 15:05:23 -07:00
page_isolation.c mm/page_isolation: clean up confused code 2016-07-26 16:19:19 -07:00
page_owner.c mm/page_owner: use stackdepot to store stacktrace 2016-07-26 16:19:19 -07:00
page_poison.c mm: check the return value of lookup_page_ext for all call sites 2016-06-03 15:06:22 -07:00
page-writeback.c fs/fs-writeback.c: add a new writeback list for sync 2016-07-26 16:19:19 -07:00
pagewalk.c thp: rename split_huge_page_pmd() to split_huge_pmd() 2016-01-15 17:56:32 -08:00
percpu-km.c mm: percpu: use pr_fmt to prefix output 2016-03-17 15:09:34 -07:00
percpu-vm.c percpu: move region iterations out of pcpu_[de]populate_chunk() 2014-09-02 14:46:02 -04:00
percpu.c percpu: fix synchronization between synchronous map extension and chunk destruction 2016-05-25 11:48:25 -04:00
pgtable-generic.c mm/thp/migration: switch from flush_tlb_range to flush_pmd_tlb_range 2016-03-17 15:09:34 -07:00
process_vm_access.c mm/gup: Introduce get_user_pages_remote() 2016-02-16 10:04:09 +01:00
quicklist.c fix Christoph's email addresses 2016-03-17 15:09:34 -07:00
readahead.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
rmap.c mm: thp: refix false positive BUG in page_move_anon_rmap() 2016-07-15 14:54:27 +09:00
shmem.c tmpfs: fix regression hang in fallocate undo 2016-07-10 20:08:44 -07:00
slab_common.c mm: charge/uncharge kmemcg from generic page allocator paths 2016-07-26 16:19:19 -07:00
slab.c mm/slab: use list_move instead of list_del/list_add 2016-07-26 16:19:19 -07:00
slab.h mm: memcontrol: cleanup kmem charge functions 2016-07-26 16:19:19 -07:00
slob.c mm: slab: free kmem_cache_node after destroy sysfs file 2016-02-18 16:23:24 -08:00
slub.c mm: charge/uncharge kmemcg from generic page allocator paths 2016-07-26 16:19:19 -07:00
sparse-vmemmap.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
sparse.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
swap_cgroup.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
swap_state.c mm: thp: broken page count after commit aa88b68c3b 2016-06-09 14:23:11 -07:00
swap.c mm/swap.c: flush lru pvecs on compound page arrival 2016-06-24 17:23:52 -07:00
swapfile.c mm, frontswap: convert frontswap_enabled to static key 2016-07-26 16:19:19 -07:00
truncate.c dax: New fault locking 2016-05-19 15:20:54 -06:00
userfaultfd.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
util.c mm: migrate: support non-lru movable page migration 2016-07-26 16:19:19 -07:00
vmacache.c mm/vmacache: inline vmacache_valid_mm() 2015-11-05 19:34:48 -08:00
vmalloc.c mm: charge/uncharge kmemcg from generic page allocator paths 2016-07-26 16:19:19 -07:00
vmpressure.c mm/vmpressure.c: fix subtree pressure detection 2016-02-03 08:28:43 -08:00
vmscan.c mm: balloon: use general non-lru movable page feature 2016-07-26 16:19:19 -07:00
vmstat.c mm: add NR_ZSMALLOC to vmstat 2016-07-26 16:19:19 -07:00
workingset.c mm: workingset: printk missing log level, use pr_info() 2016-07-15 14:54:27 +09:00
z3fold.c mm/z3fold.c: avoid modifying HEADLESS page and minor cleanup 2016-06-03 16:02:55 -07:00
zbud.c mm/zbud.c: use list_last_entry() instead of list_tail_entry() 2016-01-15 11:40:52 -08:00
zpool.c mm: zsmalloc: constify struct zs_pool name 2015-11-06 17:50:42 -08:00
zsmalloc.c mm: add NR_ZSMALLOC to vmstat 2016-07-26 16:19:19 -07:00
zswap.c mm/zswap: use workqueue to destroy pool 2016-05-20 17:58:30 -07:00