linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-11 04:18:39 +08:00


Yosry Ahmed f82e6bf9bb mm: memcg: use rstat for non-hierarchical stats

Currently, memcg uses rstat to maintain aggregated hierarchical stats. Counters are maintained for hierarchical stats at each memcg. Rstat tracks which cgroups have updates on which cpus to keep those counters fresh on the read-side.

Non-hierarchical stats are currently not covered by rstat. Their per-cpu counters are summed up on every read, which is expensive. The original implementation did the same. At some point before rstat, non-hierarchical aggregated counters were introduced by commit `a983b5ebee` ("mm: memcontrol: fix excessive complexity in memory.stat reporting"). However, those counters were updated on the performance-critical write-side, which caused regressions, so they were later removed by commit `815744d751` ("mm: memcontrol: don't batch updates of local VM stats and events"). See [1] for more detailed history.

Kernel versions in between `a983b5ebee` & `815744d751` (a year and a half) enjoyed cheap reads of non-hierarchical stats, specifically on cgroup v1. When moving to more recent kernels, a performance regression for reading non-hierarchical stats is observed.

Now that we have rstat, we know exactly which percpu counters have updates for each stat. We can maintain non-hierarchical counters again, making reads much more efficient, without affecting the performance-critical write-side. Hence, add non-hierarchical (i.e. local) counters for the stats, and extend rstat flushing to keep those up-to-date.

A caveat is that we now need a stats flush before reading local/non-hierarchical stats through {memcg/lruvec}_page_state_local() or memcg_events_local(), where we previously only needed a flush to read hierarchical stats. Most contexts reading non-hierarchical stats already do a flush; add a flush to the only missing context, count_shadow_nodes().
With this patch, reading memory.stat from 1000 memcgs is 3x faster on a machine with 256 cpus on cgroup v1:

  # for i in $(seq 1000); do mkdir /sys/fs/cgroup/memory/cg$i; done
  # time cat /sys/fs/cgroup/memory/cg*/memory.stat > /dev/null
  real 0m0.125s
  user 0m0.005s
  sys  0m0.120s

After:

  real 0m0.032s
  user 0m0.005s
  sys  0m0.027s

To make sure there are no regressions on cgroup v2, I ran an artificial reclaim/refault stress test [2] that creates (NR_CPUS * 2) cgroups, assigns them limits, runs a worker process in each cgroup that allocates tmpfs memory equal to quadruple the limit (to invoke reclaim continuously), and then reads back the entire file (to invoke refaults). All workers are run in parallel, and zram is used as a swapping backend. Both reclaim and refault have conditional stats flushing. I ran this on a machine with 112 cpus, once on mm-unstable, and once on mm-unstable with this patch reverted.

(1) A few runs without this patch:

  # time ./stress_reclaim_refault.sh
  real 0m9.949s
  user 0m0.496s
  sys  14m44.974s

  # time ./stress_reclaim_refault.sh
  real 0m10.049s
  user 0m0.486s
  sys  14m55.791s

  # time ./stress_reclaim_refault.sh
  real 0m9.984s
  user 0m0.481s
  sys  14m53.841s

(2) A few runs with this patch:

  # time ./stress_reclaim_refault.sh
  real 0m9.885s
  user 0m0.486s
  sys  14m48.753s

  # time ./stress_reclaim_refault.sh
  real 0m9.903s
  user 0m0.495s
  sys  14m48.339s

  # time ./stress_reclaim_refault.sh
  real 0m9.861s
  user 0m0.507s
  sys  14m49.317s

No regressions are observed with this patch. There is actually a very slight improvement. If I have to guess, maybe it's because we avoid the percpu loop in count_shadow_nodes() when calling lruvec_page_state_local(), but I could not prove this using perf; it's probably in the noise.
[1] https://lore.kernel.org/lkml/20230725201811.GA1231514@cmpxchg.org/
[2] https://lore.kernel.org/lkml/CAJD7tkb17x=qwoO37uxyYXLEUVp15BQKR+Xfh7Sg9Hx-wTQ_=w@mail.gmail.com/

Link: https://lkml.kernel.org/r/20230803185046.1385770-1-yosryahmed@google.com
Link: https://lkml.kernel.org/r/20230726153223.821757-2-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2023-08-24 16:20:18 -07:00
damon	merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes	2023-08-21 14:26:20 -07:00
kasan	kasan, slub: fix HW_TAGS zeroing with slub_debug	2023-07-08 09:29:32 -07:00
kfence	mm: kfence: allocate kfence_metadata at runtime	2023-08-18 10:12:39 -07:00
kmsan	mm: kmsan: use helper macros PAGE_ALIGN and PAGE_ALIGN_DOWN	2023-08-21 13:37:29 -07:00
backing-dev.c	writeback: remove redundant checks for root memcg	2023-08-21 13:37:48 -07:00
balloon_compaction.c
bootmem_info.c
cma_debug.c
cma_sysfs.c	mm: cma: make kobj_type structure constant	2023-03-28 16:20:06 -07:00
cma.c	mm: cma: print cma name as well in cma_alloc debug	2023-08-18 10:12:12 -07:00
cma.h
compaction.c	merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes	2023-08-21 14:26:20 -07:00
debug_page_alloc.c	mm: page_alloc: split out DEBUG_PAGEALLOC	2023-06-09 16:25:23 -07:00
debug_page_ref.c
debug_vm_pgtable.c	mm: change pudp_huge_get_and_clear_full take vm_area_struct as arg	2023-08-18 10:12:53 -07:00
debug.c	mm: update validate_mm() to use vma iterator	2023-06-09 16:25:31 -07:00
dmapool_test.c	dmapool: add alloc/free performance test	2023-04-05 19:42:38 -07:00
dmapool.c	dmapool: create/destroy cleanup	2023-06-09 16:25:17 -07:00
early_ioremap.c	mm/early_ioremap.c: improve the execution efficiency of early_ioremap_setup()	2023-06-09 16:25:56 -07:00
fadvise.c	mm: remove unnecessary pagevec includes	2023-06-23 16:59:31 -07:00
fail_page_alloc.c	mm: page_alloc: split out FAIL_PAGE_ALLOC	2023-06-09 16:25:23 -07:00
failslab.c	mm: fix unexpected changes to {failslab|fail_page_alloc}.attr	2022-11-22 18:50:44 -08:00
filemap.c	mm: handle swap page faults under per-VMA lock	2023-08-24 16:20:17 -07:00
folio-compat.c	- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of	2023-04-27 19:42:02 -07:00
gup_test.c	Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes.	2023-06-23 16:58:19 -07:00
gup_test.h	mm/gup_test: start/stop/read functionality for PIN LONGTERM test	2022-11-08 17:37:15 -08:00
gup.c	mm/gup: don't implicitly set FOLL_HONOR_NUMA_FAULT	2023-08-21 14:28:41 -07:00
highmem.c	mm: ptep_get() conversion	2023-06-19 16:19:25 -07:00
hmm.c	mm: enable page walking API to lock vmas during the walk	2023-08-21 13:07:20 -07:00
huge_memory.c	merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes	2023-08-24 15:25:56 -07:00
hugetlb_cgroup.c	mm/hugetlb: increase use of folios in alloc_huge_page()	2023-02-13 15:54:27 -08:00
hugetlb_vmemmap.c	mm: hugetlb_vmemmap: fix a race between vmemmap pmd split	2023-08-18 10:12:14 -07:00
hugetlb_vmemmap.h
hugetlb.c	hugetlb: clear flags in tail pages that will be freed individually	2023-08-24 16:20:15 -07:00
hwpoison-inject.c
init-mm.c	mm: move dummy_vm_ops out of a header	2023-08-21 13:37:46 -07:00
internal.h	mm: free up a word in the first tail page	2023-08-21 14:28:45 -07:00
interval_tree.c
io-mapping.c
ioremap.c	mm: ioremap: remove unneeded ioremap_allowed and iounmap_allowed	2023-08-18 10:12:36 -07:00
Kconfig	mm/memory_hotplug: simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig	2023-08-21 13:37:48 -07:00
Kconfig.debug	mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM	2023-05-29 16:14:28 +01:00
khugepaged.c	mm/khugepaged: fix collapse_pte_mapped_thp() versus uffd	2023-08-24 16:20:15 -07:00
kmemleak.c	Rename kmemleak_initialized to kmemleak_late_initialized	2023-08-21 13:38:02 -07:00
ksm.c	merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes	2023-08-21 14:26:20 -07:00
list_lru.c
maccess.c	mm: Fix copy_from_user_nofault().	2023-04-12 17:36:23 -07:00
madvise.c	swap: remove remnants of polling from read_swap_cache_async	2023-08-24 16:20:16 -07:00
Makefile	mm: kill frontswap	2023-08-21 13:37:26 -07:00
mapping_dirty_helpers.c	mm: ptep_get() conversion	2023-06-19 16:19:25 -07:00
memblock.c	mm: disable kernelcore=mirror when no mirror memory	2023-08-21 13:37:43 -07:00
memcontrol.c	mm: memcg: use rstat for non-hierarchical stats	2023-08-24 16:20:18 -07:00
memfd.c	memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy	2023-08-21 13:37:59 -07:00
memory_hotplug.c	mm/memory_hotplug: embed vmem_altmap details in memory block	2023-08-21 13:37:49 -07:00
memory-failure.c	mm: memory-failure: fix potential page refcnt leak in memory_failure()	2023-08-24 16:20:16 -07:00
memory-tiers.c	memory tier: use helper macro __ATTR_RW()	2023-08-18 10:12:38 -07:00
memory.c	mm: handle userfaults under VMA lock	2023-08-24 16:20:17 -07:00
mempolicy.c	mm: convert prep_transhuge_page() to folio_prep_large_rmappable()	2023-08-21 14:28:43 -07:00
mempool.c	mempool: do not use ksize() for poisoning	2022-11-30 15:58:41 -08:00
memremap.c	mm/memremap.c: fix outdated comment in devm_memremap_pages	2023-02-09 16:51:46 -08:00
memtest.c	mm: memtest: convert to memtest_report_meminfo()	2023-08-21 13:37:47 -07:00
migrate_device.c	merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes	2023-08-21 14:26:20 -07:00
migrate.c	migrate: use folio_set_bh() instead of set_bh_page()	2023-08-18 10:12:30 -07:00
mincore.c	mm: enable page walking API to lock vmas during the walk	2023-08-21 13:07:20 -07:00
mlock.c	merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes	2023-08-21 14:26:20 -07:00
mm_init.c	mm/mm_init: use helper macro BITS_PER_LONG and BITS_PER_BYTE	2023-08-21 13:37:47 -07:00
mm_slot.h
mmap_lock.c
mmap.c	mm: move vma locking out of vma_prepare and dup_anon_vma	2023-08-21 13:37:46 -07:00
mmu_gather.c	mm: prefer xxx_page() alloc/free functions for order-0 pages	2023-03-28 16:20:16 -07:00
mmu_notifier.c	mmu_notifiers: rename invalidate_range notifier	2023-08-18 10:12:41 -07:00
mmzone.c
mprotect.c	merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes	2023-08-21 14:26:20 -07:00
mremap.c	mm/huge pud: use transparent huge pud helpers only with CONFIG_TRANSPARENT_HUGEPAGE	2023-08-18 10:12:54 -07:00
msync.c
nommu.c	mm/nommu.c: use helper macro K()	2023-08-21 13:37:44 -07:00
oom_kill.c	mm: remove redundant K() macro definition	2023-08-21 13:37:44 -07:00
page_alloc.c	mm: add large_rmappable page flag	2023-08-21 14:28:44 -07:00
page_counter.c
page_ext.c	mm/page_ext: move functions around for minor cleanups to page_ext	2023-08-18 10:12:31 -07:00
page_idle.c	mm: page_idle: convert page idle to use a folio	2023-01-18 17:12:52 -08:00
page_io.c	zswap: make zswap_load() take a folio	2023-08-21 13:37:27 -07:00
page_isolation.c	mm/hugetlb: get rid of page_hstate()	2023-08-18 10:12:39 -07:00
page_owner.c	mm/page_ext: use page_ext_data helper in page_owner	2023-08-21 13:37:27 -07:00
page_poison.c	mm/page_poison: remove unused page_ext.h from page_poison	2023-08-21 13:37:30 -07:00
page_reporting.c	mm, treewide: redefine MAX_ORDER sanely	2023-04-05 19:42:46 -07:00
page_reporting.h
page_table_check.c	mm/page_ext: use page_ext_data helper in page_table_check	2023-08-21 13:37:27 -07:00
page_vma_mapped.c	mm: correct stale comment of function check_pte	2023-08-18 10:12:13 -07:00
page-writeback.c	writeback: account the number of pages written back	2023-07-08 09:29:30 -07:00
pagewalk.c	mm: enable page walking API to lock vmas during the walk	2023-08-21 13:07:20 -07:00
percpu-internal.h	percpu-internal/pcpu_chunk: re-layout pcpu_chunk structure to reduce false sharing	2023-06-19 16:19:29 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c	mm: memcontrol: rename memcg_kmem_enabled()	2023-02-16 20:43:56 -08:00
pgalloc-track.h
pgtable-generic.c	mm/pgtable: notes on pte_offset_map[_lock]()	2023-08-18 10:12:25 -07:00
process_vm_access.c	mm/gup: remove unused vmas parameter from pin_user_pages_remote()	2023-06-09 16:25:25 -07:00
ptdump.c	mm: ptdump should use ptep_get_lockless()	2023-06-19 16:19:24 -07:00
readahead.c	mm: remove unnecessary pagevec includes	2023-06-23 16:59:31 -07:00
rmap.c	mmu_notifiers: don't invalidate secondary TLBs as part of mmu_notifier_invalidate_range_end()	2023-08-18 10:12:41 -07:00
rodata_test.c
secretmem.c	mm/secretmem: use a folio in secretmem_fault()	2023-08-21 13:38:02 -07:00
shmem.c	merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes	2023-08-24 15:25:56 -07:00
show_mem.c	mm,thp: no space after colon in Mem-Info fields	2023-08-21 13:38:01 -07:00
shrinker_debug.c	Revert "mm: shrinkers: make count and scan in shrinker debugfs lockless"	2023-06-19 13:19:34 -07:00
shuffle.c
shuffle.h	mm, treewide: redefine MAX_ORDER sanely	2023-04-05 19:42:46 -07:00
slab_common.c	slab updates for 6.5	2023-06-29 16:34:12 -07:00
slab.c	slab updates for 6.5	2023-06-29 16:34:12 -07:00
slab.h	kasan, slub: fix HW_TAGS zeroing with slub_debug	2023-07-08 09:29:32 -07:00
slub.c	slab updates for 6.5	2023-06-29 16:34:12 -07:00
sparse-vmemmap.c	mm/vmemmap: allow architectures to override how vmemmap optimization works	2023-08-18 10:12:53 -07:00
sparse.c	mm/sparse: remove redundant judgments from macro for_each_present_section_nr	2023-08-18 10:12:14 -07:00
swap_cgroup.c
swap_slots.c
swap_state.c	swap: remove remnants of polling from read_swap_cache_async	2023-08-24 16:20:16 -07:00
swap.c	mm: remove references to pagevec	2023-06-23 16:59:30 -07:00
swap.h	swap: remove remnants of polling from read_swap_cache_async	2023-08-24 16:20:16 -07:00
swapfile.c	merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes	2023-08-21 14:26:20 -07:00
truncate.c	mm: merge folio_has_private()/filemap_release_folio() call pairs	2023-08-18 10:12:12 -07:00
usercopy.c	mm: Fix copy_from_user_nofault().	2023-04-12 17:36:23 -07:00
userfaultfd.c	mm: userfaultfd: support UFFDIO_POISON for hugetlbfs	2023-08-18 10:12:17 -07:00
util.c	mm: remove page_rmapping()	2023-08-18 10:12:01 -07:00
vmalloc.c	mm: add a call to flush_cache_vmap() in vmap_pfn()	2023-08-21 13:07:21 -07:00
vmpressure.c
vmscan.c	merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes	2023-08-21 14:26:20 -07:00
vmstat.c	mm/vmstat: remove unused page_ext.h from vmstat	2023-08-21 13:37:30 -07:00
workingset.c	mm: memcg: use rstat for non-hierarchical stats	2023-08-24 16:20:18 -07:00
z3fold.c	mm/z3fold: remove obsolete comment for struct z3fold_pool	2023-08-21 13:37:51 -07:00
zbud.c	mm: zswap: remove shrink from zpool interface	2023-06-19 16:19:27 -07:00
zpool.c	mm: zswap: remove shrink from zpool interface	2023-06-19 16:19:27 -07:00
zsmalloc.c	merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes	2023-08-21 14:26:20 -07:00
zswap.c	mm: zswap: update comment for struct zswap_entry	2023-08-21 13:37:47 -07:00