linux/mm
Vlastimil Babka 3bc48f96cf mm, page_alloc: split smallest stolen page in fallback
The __rmqueue_fallback() function is called when there's no free page of
requested migratetype, and we need to steal from a different one.

There are various heuristics to make this event infrequent and reduce
permanent fragmentation.  The main one is to try stealing from a
pageblock that has the most free pages, and possibly steal them all at
once and convert the whole pageblock.  Precise searching for such
pageblock would be expensive, so instead the heuristics walks the free
lists from MAX_ORDER down to requested order and assumes that the block
with highest-order free page is likely to also have the most free pages
in total.

Chances are that together with the highest-order page, we steal also
pages of lower orders from the same block.  But then we still split the
highest order page.  This is wasteful and can contribute to
fragmentation instead of avoiding it.

This patch thus changes __rmqueue_fallback() to just steal the page(s)
and put them on the freelist of the requested migratetype, and only
report whether it was successful.  Then we pick (and eventually split)
the smallest page with __rmqueue_smallest().  This all happens under
zone lock, so nobody can steal it from us in the process.  This should
reduce fragmentation due to fallbacks.  At worst we are only stealing a
single highest-order page and waste some cycles by moving it between
lists and then removing it, but fallback is not exactly hot path so that
should not be a concern.  As a side benefit the patch removes some
duplicate code by reusing __rmqueue_smallest().

[vbabka@suse.cz: fix endless loop in the modified __rmqueue()]
  Link: http://lkml.kernel.org/r/59d71b35-d556-4fc9-ee2e-1574259282fd@suse.cz
Link: http://lkml.kernel.org/r/20170307131545.28577-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:09 -07:00
..
kasan kasan: separate report parts by empty lines 2017-05-03 15:52:12 -07:00
backing-dev.c bdi: Drop 'parent' argument from bdi_register[_va]() 2017-04-20 12:09:55 -06:00
balloon_compaction.c mm: balloon: use general non-lru movable page feature 2016-07-26 16:19:19 -07:00
bootmem.c mm/bootmem.c: cosmetic improvement of code readability 2017-02-22 16:41:29 -08:00
cleancache.c cleancache: constify cleancache_ops structure 2016-01-27 09:09:57 -05:00
cma_debug.c cma: Store a name in the cma structure 2017-04-18 20:41:12 +02:00
cma.c cma: Introduce cma_for_each_area 2017-04-18 20:41:12 +02:00
cma.h cma: Store a name in the cma structure 2017-04-18 20:41:12 +02:00
compaction.c mm, compaction: remove redundant watermark check in compact_finished() 2017-05-08 17:15:09 -07:00
debug_page_ref.c mm/page_ref: add tracepoint to track down page reference manipulation 2016-03-17 15:09:34 -07:00
debug.c mm, debug: print raw struct page data in __dump_page() 2016-12-12 18:55:08 -08:00
dmapool.c lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
early_ioremap.c mm/early_ioremap: use offset_in_page macro 2015-11-05 19:34:48 -08:00
fadvise.c mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED 2016-12-20 09:48:46 -08:00
failslab.c mm: fault-inject take over bootstrap kmem_cache check 2016-03-15 16:55:16 -07:00
filemap.c fs: fix data invalidation in the cleancache during direct IO 2017-05-03 15:52:12 -07:00
frame_vector.c mm: replace get_vaddr_frames() write/force parameters with gup_flags 2016-10-19 08:11:24 -07:00
frontswap.c mm, frontswap: convert frontswap_enabled to static key 2016-07-26 16:19:19 -07:00
gup.c mm/gup.c: fix access_ok() argument type 2017-05-03 15:52:12 -07:00
highmem.c mm/highmem: make nr_free_highpages() handles all highmem zones by itself 2016-05-19 19:12:14 -07:00
huge_memory.c mm: make ttu's return boolean 2017-05-03 15:52:10 -07:00
hugetlb_cgroup.c mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size 2016-05-20 17:58:30 -07:00
hugetlb.c mm/hugetlb.c: don't call region_abort if region_chg fails 2017-03-31 17:13:30 -07:00
hwpoison-inject.c mm: hwpoison: call shake_page() unconditionally 2017-05-03 15:52:12 -07:00
init-mm.c mm: Add a user_ns owner to mm_struct and fix ptrace permission checks 2016-11-22 11:49:48 -06:00
internal.h mm, compaction: reorder fields in struct compact_control 2017-05-08 17:15:09 -07:00
interval_tree.c
Kconfig mm: remove AVR32 arch special handling in mm/Kconfig 2017-05-01 09:36:31 +02:00
Kconfig.debug mm: enable page poisoning early at boot 2017-05-03 15:52:10 -07:00
khugepaged.c mm: do not use double negation for testing page flags 2017-05-03 15:52:09 -07:00
kmemcheck.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
kmemleak-test.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
kmemleak.c mm: fix section name for .data..ro_after_init 2017-03-31 17:13:30 -07:00
ksm.c mm: make rmap_one boolean function 2017-05-03 15:52:10 -07:00
list_lru.c mm/list_lru.c: avoid error-path NULL pointer deref 2016-10-27 18:43:42 -07:00
maccess.c x86: remove more uaccess_32.h complexity 2016-05-22 17:21:27 -07:00
madvise.c mm/madvise: move up the behavior parameter validation 2017-05-03 15:52:11 -07:00
Makefile mm: add arch-independent testcases for RODATA 2017-02-27 18:43:48 -08:00
memblock.c memblock: add memblock_cap_memory_range() 2017-04-05 18:26:50 +01:00
memcontrol.c mm: memcontrol: use node page state naming scheme for memcg 2017-05-03 15:52:11 -07:00
memory_hotplug.c mm, vmscan: prevent kswapd sleeping prematurely due to mismatched classzone_idx 2017-05-03 15:52:09 -07:00
memory-failure.c mm: hwpoison: call shake_page() after try_to_unmap() for mlocked page 2017-05-03 15:52:12 -07:00
memory.c Merge branch 'parisc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux into uaccess.parisc 2017-04-02 10:33:48 -04:00
mempolicy.c mm/mempolicy.c: fix error handling in set_mempolicy and mbind. 2017-04-08 10:57:55 -07:00
mempool.c Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements" 2016-07-28 16:07:41 -07:00
memtest.c memtest: remove unused header files 2015-09-08 15:35:28 -07:00
migrate.c mm: make rmap_one boolean function 2017-05-03 15:52:10 -07:00
mincore.c mm: remove shmem_mapping() shmem_zero_setup() duplicates 2017-02-24 17:46:56 -08:00
mlock.c mm: make try_to_munlock() return void 2017-05-03 15:52:10 -07:00
mm_init.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
mmap.c mm/mmap: replace SHM_HUGE_MASK with MAP_HUGE_MASK inside mmap_pgoff 2017-05-03 15:52:10 -07:00
mmu_context.c sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
mmu_notifier.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h> 2017-03-02 08:42:28 +01:00
mmzone.c mm/mmzone.c: swap likely to unlikely as code logic is different for next_zones_zonelist() 2017-02-22 16:41:29 -08:00
mprotect.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
mremap.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
msync.c mm/msync: use offset_in_page macro 2015-11-05 19:34:48 -08:00
nobootmem.c mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping 2016-10-11 15:06:33 -07:00
nommu.c Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-03-03 10:16:38 -08:00
oom_kill.c oom: improve oom disable handling 2017-05-03 15:52:10 -07:00
page_alloc.c mm, page_alloc: split smallest stolen page in fallback 2017-05-08 17:15:09 -07:00
page_counter.c mm: page_counter: let page_counter_try_charge() return bool 2015-11-05 19:34:48 -08:00
page_ext.c mm: enable page poisoning early at boot 2017-05-03 15:52:10 -07:00
page_idle.c mm: make rmap_one boolean function 2017-05-03 15:52:10 -07:00
page_io.c writeback: add wbc_to_write_flags() 2016-11-02 10:24:03 -06:00
page_isolation.c mm: use is_migrate_isolate_page() to simplify the code 2017-05-03 15:52:08 -07:00
page_owner.c mm/page_owner: don't define fields on struct page_ext by hard-coding 2016-10-07 18:46:27 -07:00
page_poison.c mm: enable page poisoning early at boot 2017-05-03 15:52:10 -07:00
page_vma_mapped.c mm: fix page_vma_mapped_walk() for ksm pages 2017-04-08 00:47:48 -07:00
page-writeback.c mm: memcontrol: use node page state naming scheme for memcg 2017-05-03 15:52:11 -07:00
pagewalk.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
percpu-km.c mm: percpu: use pr_fmt to prefix output 2016-03-17 15:09:34 -07:00
percpu-vm.c percpu: remove unused chunk_alloc parameter from pcpu_get_pages() 2017-03-06 15:56:55 -05:00
percpu.c Merge branch 'sched/core' into locking/core 2017-04-04 11:31:12 +02:00
pgtable-generic.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
process_vm_access.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h> 2017-03-02 08:42:28 +01:00
quicklist.c fix Christoph's email addresses 2016-03-17 15:09:34 -07:00
readahead.c mm: don't cap request size based on read-ahead setting 2016-12-12 18:55:08 -08:00
rmap.c mm: memcontrol: use node page state naming scheme for memcg 2017-05-03 15:52:11 -07:00
rodata_test.c mm: remove rodata_test_data export, add pr_fmt 2017-05-03 15:52:09 -07:00
shmem.c Merge branch 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-03-03 11:38:56 -08:00
slab_common.c kasan: drain quarantine of memcg slab objects 2017-02-24 17:46:56 -08:00
slab.c slab: avoid IPIs when creating kmem caches 2017-05-03 15:52:07 -07:00
slab.h slab: remove synchronous synchronize_sched() from memcg cache deactivation path 2017-02-22 16:41:27 -08:00
slob.c slab: introduce __kmemcg_cache_deactivate() 2017-02-22 16:41:27 -08:00
slub.c slub: make sysfs directories for memcg sub-caches optional 2017-02-22 16:41:27 -08:00
sparse-vmemmap.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
sparse.c mm/sparse: refine usemap_size() a little 2017-05-03 15:52:09 -07:00
swap_cgroup.c mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff() 2017-04-08 00:47:49 -07:00
swap_slots.c mm/swap_slots.c: add warning if swap slots cache failed to initialize 2017-05-03 15:52:10 -07:00
swap_state.c mm, swap: fix comment in __read_swap_cache_async 2017-05-03 15:52:10 -07:00
swap.c mm: move MADV_FREE pages into LRU_INACTIVE_FILE list 2017-05-03 15:52:08 -07:00
swapfile.c mm/swapfile.c: fix swap space leak in error path of swap_free_entries() 2017-05-03 15:52:12 -07:00
truncate.c mm/truncate: avoid pointless cleancache_invalidate_inode() calls. 2017-05-03 15:52:12 -07:00
usercopy.c mm/usercopy: Drop extra is_vmalloc_or_module() check 2017-04-05 12:30:18 -07:00
userfaultfd.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
util.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h> 2017-03-02 08:42:36 +01:00
vmacache.c sched/headers: Prepare to move 'init_task' and 'init_thread_union' from <linux/sched.h> to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
vmalloc.c A reasonably busy cycle for documentation this time around. There is a new 2017-05-02 10:21:17 -07:00
vmpressure.c mm: vmpressure: fix sending wrong events on underflow 2017-02-24 17:46:56 -08:00
vmscan.c mm: memcontrol: use node page state naming scheme for memcg 2017-05-03 15:52:11 -07:00
vmstat.c mm, vmstat: suppress pcp stats for unpopulated zones in zoneinfo 2017-05-03 15:52:08 -07:00
workingset.c mm: memcontrol: use node page state naming scheme for memcg 2017-05-03 15:52:11 -07:00
z3fold.c z3fold: fix page locking in z3fold_alloc() 2017-04-13 18:24:20 -07:00
zbud.c mm/zbud.c: use list_last_entry() instead of list_tail_entry() 2016-01-15 11:40:52 -08:00
zpool.c mm: zsmalloc: constify struct zs_pool name 2015-11-06 17:50:42 -08:00
zsmalloc.c zsmalloc: expand class bit 2017-04-13 18:24:21 -07:00
zswap.c zswap: don't param_set_charp while holding spinlock 2017-02-27 18:43:45 -08:00