korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-14 22:44:27 +08:00

Johannes Weiner 1276ad68e2 mm: vmscan: scan dirty pages even in laptop mode	2017-02-24 17:46:54 -08:00

Patch series "mm: vmscan: fix kswapd writeback regression".

We noticed a regression on multiple hadoop workloads when moving from 3.10 to 4.0 and 4.6, which involves kswapd getting tangled up in page writeout, causing direct reclaim herds that also don't make progress.

I tracked it down to the thrash avoidance efforts after 3.10 that make the kernel better at keeping use-once cache and use-many cache sorted on the inactive and active list, with more aggressive protection of the active list as long as there is inactive cache. Unfortunately, our workload's use-once cache is mostly from streaming writes. Waiting for writes to avoid potential reloads in the future is not a good tradeoff.

These patches do the following:

1. Wake the flushers when kswapd sees a lump of dirty pages. It's possible to be below the dirty background limit and still have cache velocity push them through the LRU. So start a-flushin'.

2. Let kswapd only write pages that have been rotated twice. This makes sure we really tried to get all the clean pages on the inactive list before resorting to horrible LRU-order writeback.

3. Move rotating dirty pages off the inactive list. Instead of churning or waiting on page writeback, we'll go after clean active cache. This might lead to thrashing, but in this state memory demand outstrips IO speed anyway, and reads are faster than writes.

Mel backported the series to 4.10-rc5 with one minor conflict and ran a couple of tests on it. Mix of read/write random workload didn't show anything interesting. Write-only database didn't show much difference in performance but there were slight reductions in IO -- probably in the noise.

simoop did show big differences although not as big as Mel expected. This is Chris Mason's workload that simulates the VM activity of hadoop. Mel won't go through the full details but over the samples measured during an hour it reported

                            4.10.0-rc5             4.10.0-rc5
                               vanilla          johannes-v1r1
Amean p50-Read             21346531.56 (  0.00%)  21697513.24 ( -1.64%)
Amean p95-Read             24700518.40 (  0.00%)  25743268.98 ( -4.22%)
Amean p99-Read             27959842.13 (  0.00%)  28963271.11 ( -3.59%)
Amean p50-Write                1138.04 (  0.00%)       989.82 ( 13.02%)
Amean p95-Write             1106643.48 (  0.00%)     12104.00 ( 98.91%)
Amean p99-Write             1569213.22 (  0.00%)     36343.38 ( 97.68%)
Amean p50-Allocation          85159.82 (  0.00%)     79120.70 (  7.09%)
Amean p95-Allocation         204222.58 (  0.00%)    129018.43 ( 36.82%)
Amean p99-Allocation         278070.04 (  0.00%)    183354.43 ( 34.06%)
Amean final-p50-Read       21266432.00 (  0.00%)  21921792.00 ( -3.08%)
Amean final-p95-Read       24870912.00 (  0.00%)  26116096.00 ( -5.01%)
Amean final-p99-Read       28147712.00 (  0.00%)  29523968.00 ( -4.89%)
Amean final-p50-Write          1130.00 (  0.00%)       977.00 ( 13.54%)
Amean final-p95-Write       1033216.00 (  0.00%)      2980.00 ( 99.71%)
Amean final-p99-Write       1517568.00 (  0.00%)     32672.00 ( 97.85%)
Amean final-p50-Allocation    86656.00 (  0.00%)     78464.00 (  9.45%)
Amean final-p95-Allocation   211712.00 (  0.00%)    116608.00 ( 44.92%)
Amean final-p99-Allocation   287232.00 (  0.00%)    168704.00 ( 41.27%)

The latencies are actually completely horrific in comparison to 4.4 (and 4.10-rc5 is worse than 4.9 according to historical data for reasons Mel hasn't analysed yet).

Still, 95% of write latency (p95-write) is halved by the series and allocation latency is way down. Direct reclaim activity is one fifth of what it was according to vmstats. Kswapd activity is higher but this is not necessarily surprising. Kswapd efficiency is unchanged at 99% (99% of pages scanned were reclaimed) but direct reclaim efficiency went from 77% to 99%.

In the vanilla kernel, 627MB of data was written back from reclaim context. With the series, no data was written back. With or without the patch, pages are being immediately reclaimed after writeback completes. However, with the patch, only 1/8th of the pages are reclaimed like this.

This patch (of 5):

We have an elaborate dirty/writeback throttling mechanism inside the reclaim scanner, but for that to work the pages have to go through shrink_page_list() and get counted for what they are. Otherwise, we mess up the LRU order and don't match reclaim speed to writeback.

Especially during deactivation, there is never a reason to skip dirty pages; nothing is even trying to write them out from there. Don't mess up the LRU order for nothing, shuffle these pages along.

Link: http://lkml.kernel.org/r/20170123181641.23938-2-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
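
The percentages in parentheses in the simoop results of the commit message above appear to be relative reductions against the vanilla kernel, so a positive value means lower latency with the series applied. The following is a minimal sketch under that assumption; the formula is inferred from the reported numbers rather than stated in the commit, and the helper name `percent_change` is hypothetical.

```python
def percent_change(baseline: float, patched: float) -> float:
    """Relative reduction against the baseline, in percent.

    Assumed convention (inferred from the reported figures, not stated in
    the commit message): positive means the patched value is lower.
    """
    return (baseline - patched) / baseline * 100.0


# (vanilla, patched) pairs copied from the commit message's results.
rows = {
    "p50-Read": (21346531.56, 21697513.24),
    "p95-Write": (1106643.48, 12104.00),
    "p99-Allocation": (278070.04, 183354.43),
}

for name, (vanilla, patched) in rows.items():
    print(f"{name:<16} {percent_change(vanilla, patched):7.2f}%")
```

Rounded to two decimals, these three rows reproduce the reported -1.64%, 98.91% and 34.06%, which supports reading the column as a reduction relative to vanilla.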
..
kasan	arm64 updates for 4.11:	2017-02-22 10:46:44 -08:00
backing-dev.c	mm/backing-dev.c: use rb_entry()	2017-02-22 16:41:30 -08:00
balloon_compaction.c	mm: balloon: use general non-lru movable page feature	2016-07-26 16:19:19 -07:00
bootmem.c	mm/bootmem.c: cosmetic improvement of code readability	2017-02-22 16:41:29 -08:00
cleancache.c	cleancache: constify cleancache_ops structure	2016-01-27 09:09:57 -05:00
cma_debug.c	mm/cma_debug: correct size input to bitmap function	2015-07-17 16:39:54 -07:00
cma.c	mm/cma: Cleanup highmem check	2017-01-11 13:56:49 +00:00
cma.h	mm: cma: mark cma_bitmap_maxno() inline in header	2015-08-14 15:56:32 -07:00
compaction.c	mm,compaction: serialize waitqueue_active() checks	2017-02-22 16:41:29 -08:00
debug_page_ref.c	mm/page_ref: add tracepoint to track down page reference manipulation	2016-03-17 15:09:34 -07:00
debug.c	mm, debug: print raw struct page data in __dump_page()	2016-12-12 18:55:08 -08:00
dmapool.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
early_ioremap.c	mm/early_ioremap: use offset_in_page macro	2015-11-05 19:34:48 -08:00
fadvise.c	mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED	2016-12-20 09:48:46 -08:00
failslab.c	mm: fault-inject take over bootstrap kmem_cache check	2016-03-15 16:55:16 -07:00
filemap.c	mm: fix filemap.c kernel-doc warnings	2017-02-22 16:41:29 -08:00
frame_vector.c	mm: replace get_vaddr_frames() write/force parameters with gup_flags	2016-10-19 08:11:24 -07:00
frontswap.c	mm, frontswap: convert frontswap_enabled to static key	2016-07-26 16:19:19 -07:00
gup.c	userfaultfd: hugetlbfs: gup: support VM_FAULT_RETRY	2017-02-22 16:41:28 -08:00
highmem.c	mm/highmem: make nr_free_highpages() handles all highmem zones by itself	2016-05-19 19:12:14 -07:00
huge_memory.c	mm, thp: add new defer+madvise defrag option	2017-02-22 16:41:30 -08:00
hugetlb_cgroup.c	mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size	2016-05-20 17:58:30 -07:00
hugetlb.c	userfaultfd: hugetlbfs: add UFFDIO_COPY support for shared mappings	2017-02-22 16:41:28 -08:00
hwpoison-inject.c	hwpoison: use page_cgroup_ino for filtering by memcg	2015-09-10 13:29:01 -07:00
init-mm.c	mm: Add a user_ns owner to mm_struct and fix ptrace permission checks	2016-11-22 11:49:48 -06:00
internal.h	oom-reaper: use madvise_dontneed() logic to decide if unmap the VMA	2017-02-22 16:41:30 -08:00
interval_tree.c	mm: replace vma->sharead.linear with vma->shared	2015-02-10 14:30:31 -08:00
Kconfig	mm: THP page cache support for ppc64	2016-12-12 18:55:08 -08:00
Kconfig.debug	PM / Hibernate: allow hibernation with PAGE_POISONING_ZERO	2016-09-13 02:35:27 +02:00
khugepaged.c	mm: get rid of __GFP_OTHER_NODE	2017-01-10 18:31:55 -08:00
kmemcheck.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
kmemleak-test.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
kmemleak.c	kmemleak: fix reference to Documentation	2016-12-12 18:55:07 -08:00
ksm.c	mm/ksm: improve deduplication of zero pages with colouring	2017-02-24 17:46:53 -08:00
list_lru.c	mm/list_lru.c: avoid error-path NULL pointer deref	2016-10-27 18:43:42 -07:00
maccess.c	x86: remove more uaccess_32.h complexity	2016-05-22 17:21:27 -07:00
madvise.c	userfaultfd: non-cooperative: add madvise() event for MADV_REMOVE request	2017-02-24 17:46:54 -08:00
Makefile	mm/swap: add cache for swap slots allocation	2017-02-22 16:41:30 -08:00
memblock.c	memblock: embed memblock type name within struct memblock_type	2017-02-24 17:46:54 -08:00
memcontrol.c	slab: use memcg_kmem_cache_wq for slab destruction operations	2017-02-22 16:41:27 -08:00
memory_hotplug.c	mm/memory_hotplug.c: unexport __remove_pages()	2017-02-24 17:46:53 -08:00
memory-failure.c	mm: Use owner_priv bit for PageSwapCache, valid when PageSwapBacked	2016-12-25 11:54:48 -08:00
memory.c	mm: drop unused argument of zap_page_range()	2017-02-22 16:41:30 -08:00
mempolicy.c	mm/mempolicy.c: do not put mempolicy before using its nodemask	2017-01-24 16:26:14 -08:00
mempool.c	Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"	2016-07-28 16:07:41 -07:00
memtest.c	memtest: remove unused header files	2015-09-08 15:35:28 -07:00
migrate.c	mm: Use owner_priv bit for PageSwapCache, valid when PageSwapBacked	2016-12-25 11:54:48 -08:00
mincore.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
mlock.c	thp: fix corner case of munlock() of PTE-mapped THPs	2016-11-30 16:32:52 -08:00
mm_init.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
mmap.c	powerpc: do not make the entire heap executable	2017-02-22 16:41:29 -08:00
mmu_context.c	mm/mmu_context, sched/core: Fix mmu_context.h assumption	2016-04-28 11:44:19 +02:00
mmu_notifier.c	fix Christoph's email addresses	2016-03-17 15:09:34 -07:00
mmzone.c	mm/mmzone.c: swap likely to unlikely as code logic is different for next_zones_zonelist()	2017-02-22 16:41:29 -08:00
mprotect.c	mm: mprotect: use pmd_trans_unstable instead of taking the pmd_lock	2017-02-22 16:41:29 -08:00
mremap.c	userfaultfd: non-cooperative: optimize mremap_userfaultfd_complete()	2017-02-22 16:41:28 -08:00
msync.c	mm/msync: use offset_in_page macro	2015-11-05 19:34:48 -08:00
nobootmem.c	mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping	2016-10-11 15:06:33 -07:00
nommu.c	lib/show_mem.c: teach show_mem to work with the given nodemask	2017-02-22 16:41:30 -08:00
oom_kill.c	mm, oom: header nodemask is NULL when cpusets are disabled	2017-02-24 17:46:53 -08:00
page_alloc.c	mm, page_alloc: warn_alloc nodemask is NULL when cpusets are disabled	2017-02-22 16:41:30 -08:00
page_counter.c	mm: page_counter: let page_counter_try_charge() return bool	2015-11-05 19:34:48 -08:00
page_ext.c	mm/page_ext: support extra space allocation by page_ext user	2016-10-07 18:46:27 -07:00
page_idle.c	mm, vmscan: move lru_lock to the node	2016-07-28 16:07:41 -07:00
page_io.c	writeback: add wbc_to_write_flags()	2016-11-02 10:24:03 -06:00
page_isolation.c	mm, page_alloc: avoid page_to_pfn() when merging buddies	2017-02-22 16:41:27 -08:00
page_owner.c	mm/page_owner: don't define fields on struct page_ext by hard-coding	2016-10-07 18:46:27 -07:00
page_poison.c	mm: check the return value of lookup_page_ext for all call sites	2016-06-03 15:06:22 -07:00
page-writeback.c	block: Use pointer to backing_dev_info from request_queue	2017-02-02 08:20:48 -07:00
pagewalk.c	thp: rename split_huge_page_pmd() to split_huge_pmd()	2016-01-15 17:56:32 -08:00
percpu-km.c	mm: percpu: use pr_fmt to prefix output	2016-03-17 15:09:34 -07:00
percpu-vm.c
percpu.c	Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu	2016-12-13 12:34:47 -08:00
pgtable-generic.c	mm/thp/migration: switch from flush_tlb_range to flush_pmd_tlb_range	2016-03-17 15:09:34 -07:00
process_vm_access.c	mm: unexport __get_user_pages_unlocked()	2016-12-14 16:04:09 -08:00
quicklist.c	fix Christoph's email addresses	2016-03-17 15:09:34 -07:00
readahead.c	mm: don't cap request size based on read-ahead setting	2016-12-12 18:55:08 -08:00
rmap.c	mm, rmap: handle anon_vma_prepare() common case inline	2016-12-12 18:55:08 -08:00
shmem.c	userfaultfd: shmem: avoid leaking blocks and used blocks in UFFDIO_COPY	2017-02-22 16:41:29 -08:00
slab_common.c	slab: use memcg_kmem_cache_wq for slab destruction operations	2017-02-22 16:41:27 -08:00
slab.c	slab: introduce __kmemcg_cache_deactivate()	2017-02-22 16:41:27 -08:00
slab.h	slab: remove synchronous synchronize_sched() from memcg cache deactivation path	2017-02-22 16:41:27 -08:00
slob.c	slab: introduce __kmemcg_cache_deactivate()	2017-02-22 16:41:27 -08:00
slub.c	slub: make sysfs directories for memcg sub-caches optional	2017-02-22 16:41:27 -08:00
sparse-vmemmap.c	treewide: replace obsolete _refok by __ref	2016-08-02 17:31:41 -04:00
sparse.c	mm/memory_hotplug: set magic number to page->freelist instead of page->lru.next	2017-02-22 16:41:29 -08:00
swap_cgroup.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
swap_slots.c	mm/swap: skip readahead only when swap slot cache is enabled	2017-02-22 16:41:30 -08:00
swap_state.c	mm/swap: skip readahead only when swap slot cache is enabled	2017-02-22 16:41:30 -08:00
swap.c	mm/swap: split swap cache into 64MB trunks	2017-02-22 16:41:30 -08:00
swapfile.c	mm/swap: enable swap slots cache usage	2017-02-22 16:41:30 -08:00
truncate.c	mm: Invalidate DAX radix tree entries only if appropriate	2016-12-26 20:29:24 -08:00
usercopy.c	mm/usercopy: Switch to using lm_alias	2017-01-11 13:56:50 +00:00
userfaultfd.c	userfaultfd: hugetlbfs: add UFFDIO_COPY support for shared mappings	2017-02-22 16:41:28 -08:00
util.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
vmacache.c	mm: unrig VMA cache hit ratio	2016-10-07 18:46:27 -07:00
vmalloc.c	mm, page_alloc: warn_alloc print nodemask	2017-02-22 16:41:30 -08:00
vmpressure.c	mm/vmpressure.c: fix subtree pressure detection	2016-02-03 08:28:43 -08:00
vmscan.c	mm: vmscan: scan dirty pages even in laptop mode	2017-02-24 17:46:54 -08:00
vmstat.c	mm, compaction: add vmstats for kcompactd work	2017-02-22 16:41:29 -08:00
workingset.c	mm, vmscan: cleanup lru size claculations	2017-02-22 16:41:30 -08:00
z3fold.c	mm/z3fold.c: limit first_num to the actual range of possible buddy indexes	2017-02-22 16:41:31 -08:00
zbud.c	mm/zbud.c: use list_last_entry() instead of list_tail_entry()	2016-01-15 11:40:52 -08:00
zpool.c	mm: zsmalloc: constify struct zs_pool name	2015-11-06 17:50:42 -08:00
zsmalloc.c	mm: fix some typos in mm/zsmalloc.c	2017-02-22 16:41:29 -08:00
zswap.c	zswap: disable changing params if init fails	2017-02-03 14:13:19 -08:00