linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-16 01:04:08 +08:00

Davidlohr Bueso 615d6e8756 mm: per-thread vma caching	2014-04-07 16:35:53 -07:00

This patch is a continuation of efforts trying to optimize find_vma(), avoiding potentially expensive rbtree walks to locate a vma upon faults. The original approach (https://lkml.org/lkml/2013/11/1/410), where the largest vma was also cached, ended up being too specific and random, thus further comparison with other approaches were needed. There are two things to consider when dealing with this, the cache hit rate and the latency of find_vma(). Improving the hit-rate does not necessarily translate in finding the vma any faster, as the overhead of any fancy caching schemes can be too high to consider.

We currently cache the last used vma for the whole address space, which provides a nice optimization, reducing the total cycles in find_vma() by up to 250%, for workloads with good locality. On the other hand, this simple scheme is pretty much useless for workloads with poor locality. Analyzing ebizzy runs shows that, no matter how many threads are running, the mmap_cache hit rate is less than 2%, and in many situations below 1%.

The proposed approach is to replace this scheme with a small per-thread cache, maximizing hit rates at a very low maintenance cost. Invalidations are performed by simply bumping up a 32-bit sequence number. The only expensive operation is in the rare case of a seq number overflow, where all caches that share the same address space are flushed. Upon a miss, the proposed replacement policy is based on the page number that contains the virtual address in question.

Concretely, the following results are seen on an 80 core, 8 socket x86-64 box:

1) System bootup: Most programs are single threaded, so the per-thread scheme does improve ~50% hit rate by just adding a few more slots to the cache.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 50.61%   | 19.90            |
| patched        | 73.45%   | 13.58            |
+----------------+----------+------------------+

2) Kernel build: This one is already pretty good with the current approach as we're dealing with good locality.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 75.28%   | 11.03            |
| patched        | 88.09%   | 9.31             |
+----------------+----------+------------------+

3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 70.66%   | 17.14            |
| patched        | 91.15%   | 12.57            |
+----------------+----------+------------------+

4) Ebizzy: There's a fair amount of variation from run to run, but this approach always shows nearly perfect hit rates, while baseline is just about non-existent. The amounts of cycles can fluctuate between anywhere from ~60 to ~116 for the baseline scheme, but this approach reduces it considerably. For instance, with 80 threads:

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 1.06%    | 91.54            |
| patched        | 99.97%   | 14.18            |
+----------------+----------+------------------+

[akpm@linux-foundation.org: fix nommu build, per Davidlohr]
[akpm@linux-foundation.org: document vmacache_valid() logic]
[akpm@linux-foundation.org: attempt to untangle header files]
[akpm@linux-foundation.org: add vmacache_find() BUG_ON]
[hughd@google.com: add vmacache_valid_mm() (from Oleg)]
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: adjust and enhance comments]

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Tested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
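
The scheme described in the commit message (a small per-thread cache, invalidation by bumping a 32-bit sequence number, a flush of all sharers on overflow, and a replacement slot picked from the page number) can be sketched in user space as follows. This is an illustrative model, not the kernel code in vmacache.c: the struct and function names, the slot count of 4, the 4 KiB page size, and the linear array standing in for the rbtree are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT     12   /* 4 KiB pages (assumed) */
#define VMACACHE_SIZE  4    /* slots per thread (assumed; must be a power of two) */
#define VMACACHE_HASH(addr) (((addr) >> PAGE_SHIFT) & (VMACACHE_SIZE - 1))

struct vma { unsigned long start, end; };  /* covers [start, end) */

struct mm {
    uint32_t seqnum;        /* bumped on every change to the address space */
    struct vma *vmas;       /* plain array standing in for the rbtree */
    size_t nr_vmas;
};

struct thread {
    struct mm *mm;
    uint32_t seqnum;        /* mm->seqnum value the cache was filled under */
    struct vma *cache[VMACACHE_SIZE];
    unsigned long hits, misses;
};

void cache_flush(struct thread *t)
{
    memset(t->cache, 0, sizeof(t->cache));
}

/* Invalidate every per-thread cache of this mm by bumping the sequence number. */
void cache_invalidate(struct mm *mm, struct thread *threads, size_t nr_threads)
{
    mm->seqnum++;
    if (mm->seqnum != 0)
        return;
    /* Rare wrap-around: a stale thread could match again, so flush all sharers. */
    for (size_t i = 0; i < nr_threads; i++)
        if (threads[i].mm == mm)
            cache_flush(&threads[i]);
}

/* Slow path: linear scan standing in for the rbtree walk. */
struct vma *slow_lookup(struct mm *mm, unsigned long addr)
{
    for (size_t i = 0; i < mm->nr_vmas; i++)
        if (mm->vmas[i].start <= addr && addr < mm->vmas[i].end)
            return &mm->vmas[i];
    return NULL;
}

struct vma *cached_lookup(struct thread *t, unsigned long addr)
{
    assert(t->mm != NULL);
    if (t->seqnum != t->mm->seqnum) {   /* stale: resync and start empty */
        t->seqnum = t->mm->seqnum;
        cache_flush(t);
    }
    for (int i = 0; i < VMACACHE_SIZE; i++) {
        struct vma *v = t->cache[i];
        if (v && v->start <= addr && addr < v->end) {
            t->hits++;
            return v;
        }
    }
    t->misses++;
    struct vma *v = slow_lookup(t->mm, addr);
    if (v)
        t->cache[VMACACHE_HASH(addr)] = v;  /* slot chosen by page number */
    return v;
}
```

In this model an invalidation costs a single increment, and each thread pays for it lazily on its next lookup when it notices the mismatch. Only the wrap-around case has to visit every thread sharing the address space, which matches the "only expensive operation" noted in the commit message.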
..
backing-dev.c	bdi: avoid oops on device removal	2014-04-03 16:20:49 -07:00
balloon_compaction.c	mm: print more details for bad_page()	2014-01-23 16:36:50 -08:00
bootmem.c	mm/bootmem.c: remove unused local `map'	2013-11-13 12:09:09 +09:00
bounce.c	block: Convert bio_for_each_segment() to bvec_iter	2013-11-23 22:33:49 -08:00
cleancache.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
compaction.c	mm/compaction: clean-up code on success of ballon isolation	2014-04-07 16:35:51 -07:00
debug-pagealloc.c
dmapool.c	dmapool: make DMAPOOL_DEBUG detect corruption of free marker	2012-12-11 17:22:24 -08:00
fadvise.c	teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long	2013-03-03 22:46:22 -05:00
failslab.c
filemap_xip.c	seqcount: Add lockdep functionality to seqcount/seqlock structures	2013-11-06 12:40:26 +01:00
filemap.c	mm: cleanup size checks in filemap_fault() and filemap_map_pages()	2014-04-07 16:35:53 -07:00
fremap.c	mm: fix bad rss-counter if remap_file_pages raced migration	2014-03-19 16:21:49 -07:00
frontswap.c	frontswap: fix incorrect zeroing and allocation size for frontswap_map	2013-06-12 16:29:46 -07:00
highmem.c	Some nice cleanups, and even a patch my wife did as a "live" demo for	2012-12-20 08:37:05 -08:00
huge_memory.c	mm: revert "thp: make MADV_HUGEPAGE check for mm->def_flags"	2014-04-07 16:35:51 -07:00
hugetlb_cgroup.c	cgroup: drop const from @buffer of cftype->write_string()	2014-03-19 10:23:54 -04:00
hugetlb.c	mm: move mmu notifier call from change_protection to change_pmd_range	2014-04-07 16:35:50 -07:00
hwpoison-inject.c	mm/hwpoison: add '#' to hwpoison_inject	2014-01-21 16:19:48 -08:00
init-mm.c
internal.h	mm/page-writeback.c: do not count anon pages as dirtyable memory	2014-01-29 16:22:39 -08:00
interval_tree.c	mm: add CONFIG_DEBUG_VM_RB build option	2012-10-09 16:22:42 +09:00
Kconfig	mm: disable split page table lock for !MMU	2014-04-07 16:35:52 -07:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c	kmemleak: change some global variables to int	2014-04-03 16:20:50 -07:00
ksm.c	mm: close PageTail race	2014-03-04 07:55:47 -08:00
list_lru.c	mm: keep page cache radix tree nodes in check	2014-04-03 16:21:01 -07:00
maccess.c
madvise.c	mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood	2013-09-30 14:31:02 -07:00
Makefile	mm: per-thread vma caching	2014-04-07 16:35:53 -07:00
memblock.c	ARM: 7993/1: mm/memblock: add memblock_get_current_limit	2014-03-12 00:16:56 +00:00
memcontrol.c	Merge branch 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup	2014-04-03 13:05:42 -07:00
memory_hotplug.c	mm/memory_hotplug.c: move register_memory_resource out of the lock_memory_hotplug	2014-01-23 16:36:52 -08:00
memory-failure.c	Merge branch 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup	2014-04-03 13:05:42 -07:00
memory.c	mm: add debugfs tunable for fault_around_order	2014-04-07 16:35:53 -07:00
mempolicy.c	mm: optimize put_mems_allowed() usage	2014-04-03 16:20:58 -07:00
mempool.c	mm/mempool.c: convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...)	2013-09-11 15:58:14 -07:00
migrate.c	mm: fix swapops.h:131 bug if remap_file_pages raced migration	2014-03-20 22:09:09 -07:00
mincore.c	mm + fs: prepare for non-page entries in page cache radix trees	2014-04-03 16:21:00 -07:00
mlock.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
mm_init.c	mm: bring back /sys/kernel/mm	2014-01-27 21:02:39 -08:00
mmap.c	mm: per-thread vma caching	2014-04-07 16:35:53 -07:00
mmu_context.c	sched/mm: call finish_arch_post_lock_switch in idle_task_exit and use_mm	2014-02-21 08:50:17 +01:00
mmu_notifier.c	mm: audit/fix non-modular users of module_init in core code	2014-01-23 16:36:52 -08:00
mmzone.c	mm: numa: Change page last {nid,pid} into {cpu,pid}	2013-10-09 14:47:45 +02:00
mprotect.c	mm: move mmu notifier call from change_protection to change_pmd_range	2014-04-07 16:35:50 -07:00
mremap.c	mm: revert mremap pud_free anti-fix	2013-10-16 21:35:53 -07:00
msync.c
nobootmem.c	mm/nobootmem.c: mark function as static	2014-04-03 16:21:02 -07:00
nommu.c	mm: per-thread vma caching	2014-04-07 16:35:53 -07:00
oom_kill.c	mm, oom: base root bonus on current usage	2014-01-30 16:56:56 -08:00
page_alloc.c	mm: exclude memoryless nodes from zone_reclaim	2014-04-07 16:35:50 -07:00
page_cgroup.c	mm/page_cgroup.c: mark functions as static	2014-04-03 16:21:02 -07:00
page_io.c	Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block	2014-01-30 11:19:05 -08:00
page_isolation.c	mm: memory-hotplug: enable memory hotplug to handle hugepage	2013-09-11 15:57:48 -07:00
page-writeback.c	mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq()	2014-02-06 13:48:51 -08:00
pagewalk.c	mm/pagewalk.c: fix walk_page_range() access of wrong PTEs	2013-10-30 14:27:03 -07:00
percpu-km.c
percpu-vm.c
percpu.c	percpu: renew the max_contig if we merge the head and previous block	2014-03-29 09:29:42 -04:00
pgtable-generic.c	mm: fix TLB flush race between migration, and change_protection_range	2013-12-18 19:04:51 -08:00
process_vm_access.c	mm/process_vm_access.c: mark function as static	2014-04-03 16:21:02 -07:00
quicklist.c
readahead.c	mm/readahead.c: fix readahead failure for memoryless NUMA nodes and limit readahead pages	2014-04-03 16:21:05 -07:00
rmap.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux	2014-03-31 14:35:30 -07:00
shmem.c	mm: implement ->map_pages for shmem/tmpfs	2014-04-07 16:35:53 -07:00
slab_common.c	slab: fix wrong retval on kmem_cache_create_memcg error path	2014-01-29 16:22:40 -08:00
slab.c	mm: optimize put_mems_allowed() usage	2014-04-03 16:20:58 -07:00
slab.h	memcg, slab: RCU protect memcg_params for root caches	2014-01-23 16:36:51 -08:00
slob.c	mm/sl[aou]b: Move kmallocXXX functions to common code	2013-09-04 20:51:33 +03:00
slub.c	slub: do not drop slab_mutex for sysfs_slab_add	2014-04-03 16:21:05 -07:00
sparse-vmemmap.c	mm/sparse: use memblock apis for early memory allocations	2014-01-21 16:19:47 -08:00
sparse.c	sparse: fix comment	2014-04-02 09:16:17 +02:00
swap_state.c	swap: add a simple detector for inappropriate swapin readahead	2014-02-06 13:48:51 -08:00
swap.c	mm: thrash detection-based file cache sizing	2014-04-03 16:21:01 -07:00
swapfile.c	mm/swap: fix race on swap_info reuse between swapoff and swapon	2014-02-06 13:48:51 -08:00
truncate.c	mm: keep page cache radix tree nodes in check	2014-04-03 16:21:01 -07:00
util.c	mm: add overcommit_kbytes sysctl variable	2014-01-21 16:19:44 -08:00
vmacache.c	mm: per-thread vma caching	2014-04-07 16:35:53 -07:00
vmalloc.c	Revert "mm/vmalloc: interchage the implementation of vmalloc_to_{pfn,page}"	2014-01-27 21:02:39 -08:00
vmpressure.c	arm, pm, vmpressure: add missing slab.h includes	2014-02-03 13:24:01 -05:00
vmscan.c	mm/vmscan: do not check compaction_ready on promoted zones	2014-04-07 16:35:50 -07:00
vmstat.c	drop_caches: add some documentation and info message	2014-04-03 16:21:04 -07:00
workingset.c	mm: keep page cache radix tree nodes in check	2014-04-03 16:21:01 -07:00
zbud.c	mm/zbud: fix some trivial typos in comments	2013-09-11 15:57:35 -07:00
zsmalloc.c	zsmalloc: add copyright	2014-01-30 16:56:55 -08:00
zswap.c	mm/zswap.c: change params from hidden to ro	2014-01-23 16:36:50 -08:00