linux/mm
Waiman Long 930668c344 hugetlbfs: take read_lock on i_mmap for PMD sharing
A customer with large SMP systems (up to 16 sockets) with application
that uses large amount of static hugepages (~500-1500GB) are
experiencing random multisecond delays.  These delays were caused by the
long time it took to scan the VMA interval tree with mmap_sem held.

The sharing of huge PMD does not require changes to the i_mmap at all.
Therefore, we can just take the read lock and let other threads
searching for the right VMA share it in parallel.  Once the right VMA is
found, either the PMD lock (2M huge page for x86-64) or the
mm->page_table_lock will be acquired to perform the actual PMD sharing.

Lock contention, if present, will happen in the spinlock.  That is much
better than contention in the rwsem where the time needed to scan the
the interval tree is indeterminate.

With this patch applied, the customer is seeing significant performance
improvement over the unpatched kernel.

Link: http://lkml.kernel.org/r/20191107211809.9539-1-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01 12:59:08 -08:00
..
kasan kasan: support backing vmalloc space with real shadow memory 2019-12-01 12:59:05 -08:00
backing-dev.c bdi: Do not use freezable workqueue 2019-10-06 09:11:35 -06:00
balloon_compaction.c mm/balloon_compaction: suppress allocation warnings 2019-09-04 07:42:01 -04:00
cleancache.c Driver Core and debugfs changes for 5.3-rc1 2019-07-12 12:24:03 -07:00
cma_debug.c mm/cma_debug.c: fix the break condition in cma_maxchunk_get() 2019-05-14 09:47:45 -07:00
cma.c mm/cma.c: fail if fixed declaration can't be honored 2019-07-16 19:23:21 -07:00
cma.h
compaction.c mm, compaction: fix wrong pfn handling in __reset_isolation_pfn() 2019-10-14 15:04:01 -07:00
debug_page_ref.c
debug.c mm/debug.c: PageAnon() is true for PageKsm() pages 2019-11-15 18:34:00 -08:00
dmapool.c mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options 2019-07-12 11:05:46 -07:00
early_ioremap.c mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep 2017-12-11 14:54:44 +01:00
fadvise.c fs: Export generic_fadvise() 2019-08-30 22:43:58 -07:00
failslab.c mm/failslab.c: by default, do not fail allocations with direct reclaim only 2019-07-12 11:05:43 -07:00
filemap.c mm: drop mmap_sem before calling balance_dirty_pages() in write fault 2019-12-01 06:29:18 -08:00
frame_vector.c mm: untag user pointers in get_vaddr_frames 2019-09-25 17:51:41 -07:00
frontswap.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482 2019-06-19 17:09:52 +02:00
gup_benchmark.c mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM 2019-05-14 09:47:45 -07:00
gup.c mm/gup.c: fix comments of __get_user_pages() and get_user_pages_remote() 2019-12-01 06:29:18 -08:00
highmem.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
hmm.c mm/hmm: remove hmm_range_dma_map and hmm_range_dma_unmap 2019-11-23 19:56:45 -04:00
huge_memory.c mm/thp: fix node page state in split_huge_page_to_list() 2019-10-19 06:32:32 -04:00
hugetlb_cgroup.c mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() 2019-11-15 18:34:00 -08:00
hugetlb.c hugetlbfs: take read_lock on i_mmap for PMD sharing 2019-12-01 12:59:08 -08:00
hwpoison-inject.c hwpoison-inject: no need to check return value of debugfs_create functions 2019-06-03 15:39:40 +02:00
init-mm.c mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch 2019-10-19 06:32:32 -04:00
internal.h mm, pcpu: make zone pcp updates and reset internal to the mm 2019-12-01 12:59:06 -08:00
interval_tree.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 248 2019-06-19 17:09:08 +02:00
Kconfig hmm related patches for 5.5 2019-11-30 10:33:14 -08:00
Kconfig.debug mm, page_owner, debug_pagealloc: save and dump freeing stack trace 2019-09-24 15:54:08 -07:00
khugepaged.c mm,thp: recheck each page before collapsing file THP 2019-11-15 18:34:00 -08:00
kmemleak-test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333 2019-06-05 17:37:06 +02:00
kmemleak.c kmemleak: Do not corrupt the object_list during clean-up 2019-10-14 08:56:16 -07:00
ksm.c mm/ksm.c: don't WARN if page is still mapped in remove_stable_node() 2019-11-22 09:11:18 -08:00
list_lru.c mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages 2019-07-12 11:05:44 -07:00
maccess.c uaccess: Add strict non-pagefault kernel-space read function 2019-11-02 12:39:12 -07:00
madvise.c mm, soft-offline: convert parameter to pfn 2019-12-01 12:59:04 -08:00
Makefile mm: Add write-protect and clean utilities for address space ranges 2019-11-06 13:03:36 +01:00
mapping_dirty_helpers.c mm: Add write-protect and clean utilities for address space ranges 2019-11-06 13:03:36 +01:00
memblock.c mm: support memblock alloc on the exact node for sparse_buffer_init() 2019-12-01 12:59:08 -08:00
memcontrol.c mm: clean up and clarify lruvec lookup procedure 2019-12-01 12:59:06 -08:00
memfd.c mm: page cache: store only head pages in i_pages 2019-09-24 15:54:08 -07:00
memory_hotplug.c mm/memory_hotplug.c: don't allow to online/offline memory blocks with holes 2019-12-01 12:59:05 -08:00
memory-failure.c mm/memory-failure.c: use page_shift() in add_to_kill() 2019-12-01 12:59:04 -08:00
memory.c mm/memory.c: fix a huge pud insertion race during faulting 2019-12-01 06:29:19 -08:00
mempolicy.c mm/mempolicy.c: fix checking unmapped holes for mbind 2019-12-01 12:59:07 -08:00
mempool.c docs/core-api/mm: fix return value descriptions in mm/ 2019-03-05 21:07:20 -08:00
memremap.c mm/memunmap: don't access uninitialized memmap in memunmap_pages() 2019-10-19 06:32:32 -04:00
memtest.c
migrate.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
mincore.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
mlock.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
mm_init.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
mmap.c mm/mmap.c: use IS_ERR_VALUE to check return value of get_unmapped_area 2019-12-01 06:29:19 -08:00
mmu_context.c
mmu_gather.c mm: remove quicklist page table caches 2019-09-24 15:54:09 -07:00
mmu_notifier.c hmm related patches for 5.5 2019-11-30 10:33:14 -08:00
mmzone.c
mprotect.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
mremap.c mm/mmap.c: use IS_ERR_VALUE to check return value of get_unmapped_area 2019-12-01 06:29:19 -08:00
msync.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
nommu.c mm/mmap.c: rb_parent is not necessary in __vma_link_list() 2019-12-01 06:29:19 -08:00
oom_kill.c mm: introduce MADV_COLD 2019-09-25 17:51:41 -07:00
page_alloc.c mm: clean up and clarify lruvec lookup procedure 2019-12-01 12:59:06 -08:00
page_counter.c memcg: introduce memory.min 2018-06-07 17:34:36 -07:00
page_ext.c mm, page_owner: fix off-by-one error in __set_page_owner_handle() 2019-10-14 15:04:00 -07:00
page_idle.c mm/page_idle.c: fix oops because end_pfn is larger than max_pfn 2019-06-29 16:43:45 +08:00
page_io.c mm/page_io.c: do not free shared swap slots 2019-11-15 18:34:00 -08:00
page_isolation.c mm/page_isolation.c: convert SKIP_HWPOISON to MEMORY_OFFLINE 2019-12-01 12:59:04 -08:00
page_owner.c mm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo 2019-10-19 06:32:31 -04:00
page_poison.c mm/page_poison.c: fix a typo in a comment 2019-09-24 15:54:08 -07:00
page_vma_mapped.c mm: introduce page_size() 2019-09-24 15:54:08 -07:00
page-writeback.c writeback, memcg: Implement foreign dirty flushing 2019-08-27 09:22:38 -06:00
pagewalk.c mm: Add a walk_page_mapping() function to the pagewalk code 2019-11-06 13:02:43 +01:00
percpu-internal.h percpu: convert chunk hints to be based on pcpu_block_md 2019-03-13 12:25:31 -07:00
percpu-km.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-stats.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-vm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu.c percpu: Use struct_size() helper 2019-09-04 13:40:49 -07:00
pgtable-generic.c asm-generic/mm: stub out p{4,u}d_clear_bad() if __PAGETABLE_P{4,U}D_FOLDED 2019-12-01 06:29:19 -08:00
process_vm_access.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
readahead.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
rmap.c mm/rmap.c: use VM_BUG_ON_PAGE() in __page_check_anon_rmap() 2019-12-01 06:29:19 -08:00
rodata_test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
shmem.c mm, memfd: fix COW issue on MAP_PRIVATE and F_SEAL_FUTURE_WRITE mappings 2019-12-01 12:59:03 -08:00
shuffle.c mm: fix -Wmissing-prototypes warnings 2019-10-07 15:47:19 -07:00
shuffle.h mm: maintain randomization of page free lists 2019-05-14 19:52:48 -07:00
slab_common.c mm, slab_common: use enum kmalloc_cache_type to iterate over kmalloc caches 2019-12-01 06:29:17 -08:00
slab.c mm, slab: remove unused kmalloc_size() 2019-12-01 06:29:17 -08:00
slab.h mm: clean up and clarify lruvec lookup procedure 2019-12-01 12:59:06 -08:00
slob.c mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) 2019-10-07 15:47:20 -07:00
slub.c mm/slub.c: clean up validate_slab() 2019-12-01 06:29:18 -08:00
sparse-vmemmap.c mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap() 2019-07-18 17:08:07 -07:00
sparse.c mm: support memblock alloc on the exact node for sparse_buffer_init() 2019-12-01 12:59:08 -08:00
swap_cgroup.c
swap_slots.c mm, swap, get_swap_pages: use entry_size instead of cluster in parameter 2018-08-22 10:52:44 -07:00
swap_state.c mm: page cache: store only head pages in i_pages 2019-09-24 15:54:08 -07:00
swap.c mm/swap.c: piggyback lru_add_drain_all() calls 2019-12-01 06:29:19 -08:00
swapfile.c mm, swap: disallow swapon() on zoned block devices 2019-12-01 06:29:18 -08:00
truncate.c mm/thp: allow dropping THP from page cache 2019-10-19 06:32:33 -04:00
usercopy.c usercopy: Avoid HIGHMEM pfn warning 2019-09-17 15:20:17 -07:00
userfaultfd.c hugetlbfs: hugetlb_fault_mutex_hash() cleanup 2019-12-01 12:59:08 -08:00
util.c mm/mmap.c: rb_parent is not necessary in __vma_link_list() 2019-12-01 06:29:19 -08:00
vmacache.c mm: get rid of vmacache_flush_all() entirely 2018-09-13 15:18:04 -10:00
vmalloc.c kasan: support backing vmalloc space with real shadow memory 2019-12-01 12:59:05 -08:00
vmpressure.c mm/vmpressure.c: fix a signedness bug in vmpressure_register_event() 2019-10-07 15:47:19 -07:00
vmscan.c mm/vmscan.c: fix typo in comment 2019-12-01 12:59:07 -08:00
vmstat.c mm, vmstat: reduce zone->lock holding time by /proc/pagetypeinfo 2019-11-06 08:47:50 -08:00
workingset.c mm: vmscan: detect file thrashing at the reclaim root 2019-12-01 12:59:07 -08:00
z3fold.c mm/z3fold.c: add inter-page compaction 2019-12-01 12:59:07 -08:00
zbud.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
zpool.c zpool: add malloc_support_movable to zpool_driver 2019-09-24 15:54:12 -07:00
zsmalloc.c mm/zsmalloc.c: fix a -Wunused-function warning 2019-09-24 15:54:12 -07:00
zswap.c zswap: do not map same object twice 2019-09-24 15:54:12 -07:00