linux/mm
Shakeel Butt cfdab60bfa mm: page_counter: remove unneeded atomic ops for low/min
Patch series "memcg: optimize charge codepath", v2.

Recently Linux networking stack has moved from a very old per socket
pre-charge caching to per-cpu caching to avoid pre-charge fragmentation
and unwarranted OOMs.  One impact of this change is that for network
traffic workloads, memcg charging codepath can become a bottleneck.  The
kernel test robot has also reported this regression[1].  This patch series
tries to improve the memcg charging for such workloads.

This patch series implement three optimizations:
(A) Reduce atomic ops in page counter update path.
(B) Change layout of struct page_counter to eliminate false sharing
    between usage and high.
(C) Increase the memcg charge batch to 64.

To evaluate the impact of these optimizations, on a 72 CPUs machine, we
ran the following workload in root memcg and then compared with scenario
where the workload is run in a three level of cgroup hierarchy with top
level having min and low setup appropriately.

 $ netserver -6
 # 36 instances of netperf with following params
 $ netperf -6 -H ::1 -l 60 -t TCP_SENDFILE -- -m 10K

Results (average throughput of netperf):
1. root memcg		21694.8 Mbps
2. 6.0-rc1		10482.7 Mbps (-51.6%)
3. 6.0-rc1 + (A)	14542.5 Mbps (-32.9%)
4. 6.0-rc1 + (B)	12413.7 Mbps (-42.7%)
5. 6.0-rc1 + (C)	17063.7 Mbps (-21.3%)
6. 6.0-rc1 + (A+B+C)	20120.3 Mbps (-7.2%)

With all three optimizations, the memcg overhead of this workload has
been reduced from 51.6% to just 7.2%.

[1] https://lore.kernel.org/linux-mm/20220619150456.GB34471@xsang-OptiPlex-9020/


This patch (of 3):

For cgroups using low or min protections, the function
propagate_protected_usage() was doing an atomic xchg() operation
irrespectively.  We can optimize out this atomic operation for one
specific scenario where the workload is using the protection (i.e.  min >
0) and the usage is above the protection (i.e.  usage > min).

This scenario is actually very common where the users want a part of their
workload to be protected against the external reclaim.  Though this
optimization does introduce a race when the usage is around the protection
and concurrent charges and uncharged trip it over or under the protection.
In such cases, we might see lower effective protection but the subsequent
charge/uncharge will correct it.

To evaluate the impact of this optimization, on a 72 CPUs machine, we ran
the following workload in a three level of cgroup hierarchy with top level
having min and low setup appropriately to see if this optimization is
effective for the mentioned case.

 $ netserver -6
 # 36 instances of netperf with following params
 $ netperf -6 -H ::1 -l 60 -t TCP_SENDFILE -- -m 10K

Results (average throughput of netperf):
Without (6.0-rc1)	10482.7 Mbps
With patch		14542.5 Mbps (38.7% improvement)

With the patch, the throughput improved by 38.7%

Link: https://lkml.kernel.org/r/20220825000506.239406-1-shakeelb@google.com
Link: https://lkml.kernel.org/r/20220825000506.239406-2-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Feng Tang <feng.tang@intel.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oliver Sang <oliver.sang@intel.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 20:26:01 -07:00
..
damon mm/damon: replace pmd_huge() with pmd_trans_huge() for THP 2022-09-11 20:25:59 -07:00
kasan - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
kfence kfence: add sysfs interface to disable kfence for selected slabs. 2022-09-11 20:25:52 -07:00
backing-dev.c writeback: avoid use-after-free after removing device 2022-08-28 14:02:43 -07:00
balloon_compaction.c mm: Convert all PageMovable users to movable_operations 2022-08-02 12:34:03 -04:00
bootmem_info.c bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem 2022-08-28 14:02:45 -07:00
cma_debug.c mm/cma_debug: show complete cma name in debugfs directories 2022-09-11 20:25:50 -07:00
cma_sysfs.c
cma.c Revert "mm/cma.c: remove redundant cma_mutex lock" 2022-05-13 15:11:26 -07:00
cma.h mm/cma: provide option to opt out from exposing pages on activation failure 2022-03-22 15:57:09 -07:00
compaction.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
debug_page_ref.c
debug_vm_pgtable.c docs: rename Documentation/vm to Documentation/mm 2022-06-27 12:52:53 -07:00
debug.c mm: unexport page_init_poison 2022-03-24 19:06:45 -07:00
dmapool.c mm/dmapool.c: revert "make dma pool to use kmalloc_node" 2022-01-15 16:30:28 +02:00
early_ioremap.c mm/early_ioremap: declare early_memremap_pgprot_adjust() 2022-03-22 15:57:11 -07:00
fadvise.c riscv: compat: syscall: Add compat_sys_call_table implementation 2022-04-26 13:36:25 -07:00
failslab.c mm: fix missing handler for __GFP_NOWARN 2022-05-19 14:08:55 -07:00
filemap.c mm/filemap.c: convert page_endio() to use a folio 2022-09-11 20:25:48 -07:00
folio-compat.c mm/folio-compat: Remove migration compatibility functions 2022-08-02 12:34:04 -04:00
frontswap.c docs: rename Documentation/vm to Documentation/mm 2022-06-27 12:52:53 -07:00
gup_test.c mm: rename is_pinnable_page() to is_longterm_pinnable_page() 2022-07-17 17:14:27 -07:00
gup_test.h
gup.c mm/gup.c: refactor check_and_migrate_movable_pages() 2022-09-11 20:26:00 -07:00
highmem.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
hmm.c mm/hmm: fault non-owner device private entries 2022-07-29 11:33:37 -07:00
huge_memory.c mm: release private data before split THP 2022-09-11 20:25:59 -07:00
hugetlb_cgroup.c hugetlb_cgroup: use helper for_each_hstate and hstate_index 2022-09-11 20:25:53 -07:00
hugetlb_vmemmap.c mm: hugetlb_vmemmap: simplify reset_struct_pages() 2022-09-11 20:25:58 -07:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability 2022-08-08 18:06:43 -07:00
hugetlb.c mm/hugetlb: make detecting shared pte more reliable 2022-09-11 20:25:55 -07:00
hwpoison-inject.c mm/memory-failure: disable unpoison once hw error happens 2022-06-16 19:11:32 -07:00
init-mm.c kernel/fork: Initialize mm's PASID 2022-02-14 19:51:47 +01:00
internal.h mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds hugepage 2022-09-11 20:25:46 -07:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: Add ioremap/iounmap_allowed() 2022-06-27 12:22:31 +01:00
Kconfig mm: remove EXPERIMENTAL flag for zswap 2022-09-11 20:26:01 -07:00
Kconfig.debug Two followon fixes for the post-5.19 series "Use pageblock_order for cma 2022-05-27 11:40:49 -07:00
khugepaged.c mm/khugepaged: rename prefix of shared collapse functions 2022-09-11 20:25:46 -07:00
kmemleak.c mm/kmemleak: prevent soft lockup in first object iteration loop of kmemleak_scan() 2022-06-16 19:48:32 -07:00
ksm.c mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds hugepage 2022-09-11 20:25:46 -07:00
list_lru.c mm: kmem: make mem_cgroup_from_obj() vmalloc()-safe 2022-06-16 19:48:31 -07:00
maccess.c asm-generic updates for 5.18 2022-03-23 18:03:08 -07:00
madvise.c mm/madvise: add MADV_COLLAPSE to process_madvise() 2022-09-11 20:25:46 -07:00
Makefile mm: shrinkers: introduce debugfs interface for memory shrinkers 2022-07-03 18:08:40 -07:00
mapping_dirty_helpers.c mm: move tlb_flush_pending inline helpers to mm_inline.h 2022-01-15 16:30:27 +02:00
memblock.c memblock updates for v5.20 2022-08-09 09:48:30 -07:00
memcontrol.c mm: memcg: export workingset refault stats for cgroup v1 2022-09-11 20:25:59 -07:00
memfd.c memfd: fix F_SEAL_WRITE after shmem huge page allocated 2022-03-05 11:08:32 -08:00
memory_hotplug.c mm: use is_zone_movable_page() helper 2022-07-29 18:07:20 -07:00
memory-failure.c mm: memory-failure: kill __soft_offline_page() 2022-09-11 20:25:58 -07:00
memory.c memory tiering: hot page selection with hint page fault latency 2022-09-11 20:25:54 -07:00
mempolicy.c mm/hugetlb: add dedicated func to get 'allowed' nodemask for current process 2022-09-11 20:25:50 -07:00
mempool.c mm/mempool: use might_alloc() 2022-06-16 19:48:30 -07:00
memremap.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
memtest.c
migrate_device.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
migrate.c memory tiering: hot page selection with hint page fault latency 2022-09-11 20:25:54 -07:00
mincore.c mm: teach core mm about pte markers 2022-05-13 07:20:09 -07:00
mlock.c mm: handling Non-LRU pages returned by vm_normal_pages 2022-07-17 17:14:28 -07:00
mm_init.c
mmap_lock.c mm: mmap_lock: fix disabling preemption directly 2021-07-23 17:43:28 -07:00
mmap.c mm: align larger anonymous mappings on THP boundaries 2022-09-11 20:25:47 -07:00
mmu_gather.c mm/mmu_gather: limit free batch count and add schedule point in tlb_batch_pages_flush 2022-04-28 23:16:12 -07:00
mmu_notifier.c mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() 2022-04-21 20:01:10 -07:00
mmzone.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
mprotect.c memory tiering: hot page selection with hint page fault latency 2022-09-11 20:25:54 -07:00
mremap.c Yang Shi has improved the behaviour of khugepaged collapsing of readonly 2022-05-26 12:32:41 -07:00
msync.c
nommu.c mm: nommu: pass a pointer to virt_to_page() 2022-07-17 17:14:37 -07:00
oom_kill.c mm/oom_kill.c: fix vm_oom_kill_table[] ifdeffery 2022-06-01 15:57:16 -07:00
page_alloc.c mm: kill find_min_pfn_with_active_regions() 2022-09-11 20:25:56 -07:00
page_counter.c mm: page_counter: remove unneeded atomic ops for low/min 2022-09-11 20:26:01 -07:00
page_ext.c mm: fix use-after free of page_ext after race with memory-offline 2022-09-11 20:25:57 -07:00
page_idle.c mm: don't be stuck to rmap lock on reclaim path 2022-05-19 14:08:54 -07:00
page_io.c mm/swap: remove the end_write_func argument to __swap_writepage 2022-09-11 20:25:50 -07:00
page_isolation.c mm/page_isolation.c: fix one kernel-doc comment 2022-06-16 19:11:30 -07:00
page_owner.c mm/page_owner.c: add llseek for page_owner 2022-09-11 20:25:59 -07:00
page_poison.c
page_reporting.c mm/page_reporting: allow driver to specify reporting order 2021-06-29 10:53:47 -07:00
page_reporting.h mm/page_reporting: export reporting order as module parameter 2021-06-29 10:53:47 -07:00
page_table_check.c mm: fix use-after free of page_ext after race with memory-offline 2022-09-11 20:25:57 -07:00
page_vma_mapped.c mm/page_vma_mapped.c: use helper function huge_pte_lock 2022-07-17 17:14:47 -07:00
page-writeback.c writeback: avoid use-after-free after removing device 2022-08-28 14:02:43 -07:00
pagewalk.c mm: pagewalk: add api documentation for walk_page_range_novma() 2022-09-11 20:26:00 -07:00
percpu-internal.h percpu: improve percpu_alloc_percpu event trace 2022-05-13 07:20:18 -07:00
percpu-km.c percpu: flush tlb in pcpu_reclaim_populated() 2021-07-04 18:30:17 +00:00
percpu-stats.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
percpu-vm.c percpu: flush tlb in pcpu_reclaim_populated() 2021-07-04 18:30:17 +00:00
percpu.c mm: percpu: use kmemleak_ignore_phys() instead of kmemleak_free() 2022-07-17 17:14:47 -07:00
pgalloc-track.h
pgtable-generic.c mm: avoid unnecessary flush on change_huge_pmd() 2022-05-13 07:20:05 -07:00
process_vm_access.c
ptdump.c mm: sparsemem: use page table lock to protect kernel pmd operations 2022-03-22 15:57:08 -07:00
readahead.c filemap: Fix serialization adding transparent huge pages to page cache 2022-06-23 12:22:00 -04:00
rmap.c mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds hugepage 2022-09-11 20:25:46 -07:00
rodata_test.c
secretmem.c Folio changes for 6.0 2022-08-03 10:35:43 -07:00
shmem.c shmem: update folio if shmem_replace_page() updates the page 2022-08-28 14:02:43 -07:00
shrinker_debug.c mm: shrinkers: fix double kfree on shrinker name 2022-07-29 18:07:13 -07:00
shuffle.c
shuffle.h
slab_common.c mm/slab_common: move generic bulk alloc/free functions to SLOB 2022-07-20 13:30:12 +02:00
slab.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
slab.h mm/slab_common: move generic bulk alloc/free functions to SLOB 2022-07-20 13:30:12 +02:00
slob.c mm/slab_common: move generic bulk alloc/free functions to SLOB 2022-07-20 13:30:12 +02:00
slub.c kfence: add sysfs interface to disable kfence for selected slabs. 2022-09-11 20:25:52 -07:00
sparse-vmemmap.c mm: hugetlb_vmemmap: move vmemmap code related to HugeTLB to hugetlb_vmemmap.c 2022-08-08 18:06:42 -07:00
sparse.c mm: memory_hotplug: enumerate all supported section flags 2022-07-03 18:08:49 -07:00
swap_cgroup.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
swap_slots.c arm64: enable THP_SWAP for arm64 2022-07-20 10:52:40 +01:00
swap_state.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
swap.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
swap.h mm/swap: remove the end_write_func argument to __swap_writepage 2022-09-11 20:25:50 -07:00
swapfile.c mm/swap: convert delete_from_swap_cache() to take a folio 2022-07-03 18:08:48 -07:00
truncate.c mm: Remove __delete_from_page_cache() 2022-06-29 08:51:05 -04:00
usercopy.c usercopy: use unsigned long instead of uintptr_t 2022-07-01 17:03:38 -07:00
userfaultfd.c mm/uffd: reset write protection when unregister with wp-mode 2022-08-20 15:17:45 -07:00
util.c mm/util.c: add warning if __vm_enough_memory fails 2022-09-11 20:25:54 -07:00
vmacache.c
vmalloc.c mm/vmalloc.c: support HIGHMEM pages in vmap_pages_range_noflush() 2022-09-11 20:25:56 -07:00
vmpressure.c mm/vmpressure: fix data-race with memcg->socket_pressure 2021-11-06 13:30:40 -07:00
vmscan.c mm/vmscan: make the annotations of refaults code at the right place 2022-09-11 20:25:51 -07:00
vmstat.c memory tiering: rate limit NUMA migration throughput 2022-09-11 20:25:54 -07:00
workingset.c mm: shrinkers: provide shrinkers with names 2022-07-03 18:08:40 -07:00
z3fold.c mm: Convert all PageMovable users to movable_operations 2022-08-02 12:34:03 -04:00
zbud.c mm/zbud: add kerneldoc fields for zbud_pool 2021-07-01 11:06:03 -07:00
zpool.c zpool: remove the list of pools_head 2022-01-15 16:30:31 +02:00
zsmalloc.c zsmalloc: zs_object_copy: replace email link to doc 2022-09-11 20:25:56 -07:00
zswap.c mm/swap: remove the end_write_func argument to __swap_writepage 2022-09-11 20:25:50 -07:00