linux/mm
Qi Zheng f95bdb700b mm: vmscan: make global slab shrink lockless
The shrinker_rwsem is a global read-write lock in shrinkers subsystem,
which protects most operations such as slab shrink, registration and
unregistration of shrinkers, etc.  This can easily cause problems in the
following cases.

1) When the memory pressure is high and there are many
   filesystems mounted or unmounted at the same time,
   slab shrink will be affected (down_read_trylock()
   failed).

   Such as the real workload mentioned by Kirill Tkhai:

   ```
   One of the real workloads from my experience is start
   of an overcommitted node containing many starting
   containers after node crash (or many resuming containers
   after reboot for kernel update). In these cases memory
   pressure is huge, and the node goes round in long reclaim.
   ```

2) If a shrinker is blocked (such as the case mentioned
   in [1]) and a writer comes in (such as mount a fs),
   then this writer will be blocked and cause all
   subsequent shrinker-related operations to be blocked.

Even if there is no competitor when shrinking slab, there may still be a
problem.  If we have a long shrinker list and we do not reclaim enough
memory with each shrinker, then the down_read_trylock() may be called with
high frequency.  Because of the poor multicore scalability of atomic
operations, this can lead to a significant drop in IPC (instructions per
cycle).

So many times in history ([2],[3],[4],[5]), some people wanted to replace
shrinker_rwsem trylock with SRCU in the slab shrink, but all these patches
were abandoned because SRCU was not unconditionally enabled.

But now, since commit 1cd0bd06093c ("rcu: Remove CONFIG_SRCU"), the SRCU
is unconditionally enabled.  So it's time to use SRCU to protect readers
who previously held shrinker_rwsem.

This commit uses SRCU to make global slab shrink lockless,
the memcg slab shrink is handled in the subsequent patch.

[1]. https://lore.kernel.org/lkml/20191129214541.3110-1-ptikhomirov@virtuozzo.com/
[2]. https://lore.kernel.org/all/1437080113.3596.2.camel@stgolabs.net/
[3]. https://lore.kernel.org/lkml/1510609063-3327-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp/
[4]. https://lore.kernel.org/lkml/153365347929.19074.12509495712735843805.stgit@localhost.localdomain/
[5]. https://lore.kernel.org/lkml/20210927074823.5825-1-sultan@kerneltoast.com/

Link: https://lkml.kernel.org/r/20230313112819.38938-3-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill Tkhai <tkhai@ya.ru>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Christian König <christian.koenig@amd.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28 16:20:16 -07:00
..
damon mm/damon/paddr: fix folio_nr_pages() after folio_put() in damon_pa_mark_accessed_or_deactivate() 2023-03-07 17:04:55 -08:00
kasan kasan: remove PG_skip_kasan_poison flag 2023-03-28 16:20:16 -07:00
kfence mm: kfence: fix handling discontiguous page 2023-03-28 15:24:32 -07:00
kmsan kmsan: add test_stackdepot_roundtrip 2023-03-28 16:20:14 -07:00
backing-dev.c mm: add /sys/class/bdi/<bdi>/min_ratio_fine knob 2022-11-30 15:59:06 -08:00
balloon_compaction.c mm: Convert all PageMovable users to movable_operations 2022-08-02 12:34:03 -04:00
bootmem_info.c bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem 2022-08-28 14:02:45 -07:00
cma_debug.c mm/cma_debug: show complete cma name in debugfs directories 2022-09-11 20:25:50 -07:00
cma_sysfs.c mm: cma: make kobj_type structure constant 2023-03-28 16:20:06 -07:00
cma.c mm/cma: fix potential memory loss on cma_declare_contiguous_nid 2023-02-02 22:33:24 -08:00
cma.h
compaction.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
debug_page_ref.c
debug_vm_pgtable.c mm: prefer xxx_page() alloc/free functions for order-0 pages 2023-03-28 16:20:16 -07:00
debug.c mm/debug: use %pGt to display page_type in dump_page() 2023-03-28 16:20:09 -07:00
dmapool.c
early_ioremap.c
fadvise.c mm: support POSIX_FADV_NOREUSE 2023-01-18 17:12:57 -08:00
failslab.c mm: fix unexpected changes to {failslab|fail_page_alloc}.attr 2022-11-22 18:50:44 -08:00
filemap.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
folio-compat.c mm: change to return bool for isolate_lru_page() 2023-02-20 12:46:17 -08:00
frontswap.c frontswap: don't call ->init if no ops are registered 2022-09-26 12:14:34 -07:00
gup_test.c mm/gup_test: free memory allocated via kvcalloc() using kvfree() 2022-12-15 16:37:48 -08:00
gup_test.h mm/gup_test: start/stop/read functionality for PIN LONGTERM test 2022-11-08 17:37:15 -08:00
gup.c mm/gup.c: fix typo in comments 2023-03-28 16:20:14 -07:00
highmem.c highmem: fix kmap_to_page() for kmap_local_page() addresses 2022-10-12 18:51:51 -07:00
hmm.c mm/hugetlb: make walk_hugetlb_range() safe to pmd unshare 2023-01-18 17:12:39 -08:00
huge_memory.c mm: huge_memory: convert __do_huge_pmd_anonymous_page() to use a folio 2023-03-28 16:20:09 -07:00
hugetlb_cgroup.c mm/hugetlb: increase use of folios in alloc_huge_page() 2023-02-13 15:54:27 -08:00
hugetlb_vmemmap.c mm: prefer xxx_page() alloc/free functions for order-0 pages 2023-03-28 16:20:16 -07:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability 2022-08-08 18:06:43 -07:00
hugetlb.c mm: hugetlb: change to return bool for isolate_hugetlb() 2023-02-20 12:46:17 -08:00
hwpoison-inject.c mm/hwpoison: add __init/__exit annotations to module init/exit funcs 2022-10-03 14:03:05 -07:00
init-mm.c mm: remove rb tree. 2022-09-26 19:46:16 -07:00
internal.h mm, printk: introduce new format %pGt for page_type 2023-03-28 16:20:09 -07:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: Add ioremap/iounmap_allowed() 2022-06-27 12:22:31 +01:00
Kconfig zsmalloc: set default zspage chain size to 8 2023-02-02 22:33:23 -08:00
Kconfig.debug mm: move KMEMLEAK's Kconfig items from lib to mm 2023-02-02 22:33:26 -08:00
khugepaged.c mm/khugepaged: cleanup memcg uncharge for failure path 2023-03-28 16:20:11 -07:00
kmemleak.c lib/stackdepot, mm: rename stack_depot_want_early_init 2023-02-16 20:43:49 -08:00
ksm.c mm: add tracepoints to ksm 2023-03-28 16:20:08 -07:00
list_lru.c mm: kmem: make mem_cgroup_from_obj() vmalloc()-safe 2022-06-16 19:48:31 -07:00
maccess.c maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault() 2022-11-11 11:44:46 -08:00
madvise.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
Makefile mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol 2022-10-03 14:03:36 -07:00
mapping_dirty_helpers.c mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export 2023-02-02 22:32:54 -08:00
memblock.c memblock: small optimizations 2023-02-27 09:34:53 -08:00
memcontrol.c mm, memcg: Prevent memory.soft_limit_in_bytes load/store tearing 2023-03-28 16:20:13 -07:00
memfd.c mm/memfd: add write seals when apply SEAL_EXEC to executable memfd 2023-01-18 17:12:37 -08:00
memory_hotplug.c mm/memory_hotplug: cleanup return value handing in do_migrate_range() 2023-02-20 12:46:18 -08:00
memory-failure.c mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON 2023-02-27 17:00:14 -08:00
memory-tiers.c memory tier: release the new_memtier in find_create_memory_tier() 2023-02-09 16:51:40 -08:00
memory.c mm: add PTE pointer parameter to flush_tlb_fix_spurious_fault() 2023-03-28 16:20:12 -07:00
mempolicy.c mm: hugetlb: change to return bool for isolate_hugetlb() 2023-02-20 12:46:17 -08:00
mempool.c mempool: do not use ksize() for poisoning 2022-11-30 15:58:41 -08:00
memremap.c mm/memremap.c: fix outdated comment in devm_memremap_pages 2023-02-09 16:51:46 -08:00
memtest.c
migrate_device.c mm: change to return bool for isolate_lru_page() 2023-02-20 12:46:17 -08:00
migrate.c mm/migrate: drop pte_mkhuge() in remove_migration_pte() 2023-03-28 16:20:11 -07:00
mincore.c mm: teach mincore_hugetlb about pte markers 2023-03-07 17:04:53 -08:00
mlock.c mm: introduce vm_flags_reset_once to replace WRITE_ONCE vm_flags updates 2023-02-09 16:51:41 -08:00
mm_init.c memory: move hotplug memory notifier priority to same file for easy sorting 2022-11-08 17:37:17 -08:00
mm_slot.h mm: introduce common struct mm_slot 2022-10-03 14:02:43 -07:00
mmap_lock.c
mmap.c mm: deduplicate error handling for map_deny_write_exec 2023-03-23 17:18:32 -07:00
mmu_gather.c mm: prefer xxx_page() alloc/free functions for order-0 pages 2023-03-28 16:20:16 -07:00
mmu_notifier.c mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export 2023-02-02 22:32:54 -08:00
mmzone.c mm: multi-gen LRU: groundwork 2022-09-26 19:46:09 -07:00
mprotect.c mm: fix error handling for map_deny_write_exec 2023-03-23 17:18:33 -07:00
mremap.c x86/mm/pat: clear VM_PAT if copy_p4d_range failed 2023-03-28 16:20:07 -07:00
msync.c mm/msync: use vma_find() instead of vma linked list 2022-09-26 19:46:25 -07:00
nommu.c mm: replace vma->vm_flags direct modifications with modifier calls 2023-02-09 16:51:39 -08:00
oom_kill.c mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export 2023-02-02 22:32:54 -08:00
page_alloc.c mm: prefer xxx_page() alloc/free functions for order-0 pages 2023-03-28 16:20:16 -07:00
page_counter.c mm: page_counter: remove unneeded atomic ops for low/min 2022-09-11 20:26:01 -07:00
page_ext.c mm/page_ext: init page_ext early if there are no deferred struct pages 2023-02-02 22:33:22 -08:00
page_idle.c mm: page_idle: convert page idle to use a folio 2023-01-18 17:12:52 -08:00
page_io.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
page_isolation.c mm/page_isolation: fix clang deadcode warning 2022-10-28 13:37:22 -07:00
page_owner.c lib/stackdepot, mm: rename stack_depot_want_early_init 2023-02-16 20:43:49 -08:00
page_poison.c
page_reporting.c mm/page_reporting: replace rcu_access_pointer() with rcu_dereference_protected() 2023-01-18 17:12:50 -08:00
page_reporting.h
page_table_check.c mm/page_ext: do not allocate space for page_ext->flags if not needed 2023-02-02 22:33:11 -08:00
page_vma_mapped.c mm/hugetlb: introduce hugetlb_walk() 2023-01-18 17:12:39 -08:00
page-writeback.c mm,jfs: move write_one_page/folio_write_one to jfs 2023-03-28 16:20:14 -07:00
pagewalk.c mm/hugetlb: introduce hugetlb_walk() 2023-01-18 17:12:39 -08:00
percpu-internal.h mm: percpu: fix incorrect size in pcpu_obj_full_size() 2023-02-16 20:43:55 -08:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c mm: memcontrol: rename memcg_kmem_enabled() 2023-02-16 20:43:56 -08:00
pgalloc-track.h
pgtable-generic.c mm: add PTE pointer parameter to flush_tlb_fix_spurious_fault() 2023-03-28 16:20:12 -07:00
process_vm_access.c use less confusing names for iov_iter direction initializers 2022-11-25 13:01:55 -05:00
ptdump.c mm: pagewalk: Fix race between unmap and page walker 2022-09-03 10:13:13 -07:00
readahead.c readahead: convert readahead_expand() to use a folio 2023-02-02 22:33:21 -08:00
rmap.c mm/rmap: use atomic_try_cmpxchg in set_tlb_ubc_flush_pending 2023-03-28 16:20:09 -07:00
rodata_test.c mm/rodata_test: use PAGE_ALIGNED() helper 2022-10-03 14:03:05 -07:00
secretmem.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
shmem.c shmem: add support to ignore swap 2023-03-28 16:20:15 -07:00
shrinker_debug.c mm: shrinkers: fix deadlock in shrinker debugfs 2023-02-09 15:56:51 -08:00
shuffle.c mm/shuffle: convert module_param_call to module_param_cb 2022-10-03 14:03:07 -07:00
shuffle.h
slab_common.c mm/kasan: simplify and refine kasan_cache code 2023-01-18 17:12:55 -08:00
slab.c slab fix for 6.3-rc4 2023-03-24 10:12:14 -07:00
slab.h mm: memcontrol: rename memcg_kmem_enabled() 2023-02-16 20:43:56 -08:00
slob.c Merge branch 'slab/for-6.1/kmalloc_size_roundup' into slab/for-next 2022-09-29 11:30:55 +02:00
slub.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
sparse-vmemmap.c mm/sparse-vmemmap: generalise vmemmap_populate_hugepages() 2022-12-11 18:12:12 -08:00
sparse.c mm/sparse: fix "unused function 'pgdat_to_phys'" warning 2023-02-02 22:33:29 -08:00
swap_cgroup.c mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled 2022-10-03 14:03:36 -07:00
swap_slots.c mm/swap: convert put_swap_page() to put_swap_folio() 2022-10-03 14:02:46 -07:00
swap_state.c swap_state: update shadow_nodes for anonymous page 2023-02-02 22:33:24 -08:00
swap.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
swap.h mm: remove the __swap_writepage return value 2023-02-02 22:33:33 -08:00
swapfile.c mm: swap: remove unneeded cgroup_throttle_swaprate() 2023-03-28 16:20:10 -07:00
truncate.c folio-compat: remove lru_cache_add() 2022-12-11 18:12:13 -08:00
usercopy.c mm: use kstrtobool() instead of strtobool() 2022-11-30 15:58:45 -08:00
userfaultfd.c mm/userfaultfd: support WP on multiple VMAs 2023-03-28 16:20:07 -07:00
util.c mm: fix typo in __vm_enough_memory warning 2023-02-13 15:54:33 -08:00
vmalloc.c mm: prefer xxx_page() alloc/free functions for order-0 pages 2023-03-28 16:20:16 -07:00
vmpressure.c
vmscan.c mm: vmscan: make global slab shrink lockless 2023-03-28 16:20:16 -07:00
vmstat.c mm: vmscan: split khugepaged stats from direct reclaim stats 2022-11-30 15:58:41 -08:00
workingset.c swap_state: update shadow_nodes for anonymous page 2023-02-02 22:33:24 -08:00
z3fold.c mm: remove PageMovable export 2023-01-18 17:12:57 -08:00
zbud.c zpool: clean out dead code 2022-12-11 18:12:10 -08:00
zpool.c zpool: clean out dead code 2022-12-11 18:12:10 -08:00
zsmalloc.c zsmalloc: show per fullness group class stats 2023-03-28 16:20:12 -07:00
zswap.c mm/zswap: try to avoid worst-case scenario on same element pages 2023-03-28 16:20:07 -07:00