linux/mm
Nhat Pham 4b569387c0 memcontrol: add helpers for hugetlb memcg accounting
Patch series "hugetlb memcg accounting", v4.

Currently, hugetlb memory usage is not acounted for in the memory
controller, which could lead to memory overprotection for cgroups with
hugetlb-backed memory.  This has been observed in our production system.

For instance, here is one of our usecases: suppose there are two 32G
containers.  The machine is booted with hugetlb_cma=6G, and each container
may or may not use up to 3 gigantic page, depending on the workload within
it.  The rest is anon, cache, slab, etc.  We can set the hugetlb cgroup
limit of each cgroup to 3G to enforce hugetlb fairness.  But it is very
difficult to configure memory.max to keep overall consumption, including
anon, cache, slab etcetera fair.

What we have had to resort to is to constantly poll hugetlb usage and
readjust memory.max.  Similar procedure is done to other memory limits
(memory.low for e.g).  However, this is rather cumbersome and buggy. 
Furthermore, when there is a delay in memory limits correction, (for e.g
when hugetlb usage changes within consecutive runs of the userspace
agent), the system could be in an over/underprotected state.

This patch series rectifies this issue by charging the memcg when the
hugetlb folio is allocated, and uncharging when the folio is freed.  In
addition, a new selftest is added to demonstrate and verify this new
behavior.


This patch (of 4):

This patch exposes charge committing and cancelling as parts of the memory
controller interface.  These functionalities are useful when the
try_charge() and commit_charge() stages have to be separated by other
actions in between (which can fail).  One such example is the new hugetlb
accounting behavior in the following patch.

The patch also adds a helper function to obtain a reference to the
current task's memcg.

Link: https://lkml.kernel.org/r/20231006184629.155543-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20231006184629.155543-2-nphamcs@gmail.com
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Frank van der Linden <fvdl@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun heo <tj@kernel.org>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 14:34:17 -07:00
..
damon Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes. 2023-10-18 14:32:58 -07:00
kasan mm: delete checks for xor_unlock_is_negative_byte() 2023-10-18 14:34:17 -07:00
kfence LoongArch changes for v6.6 2023-09-08 12:16:52 -07:00
kmsan kmsan: introduce test_memcpy_initialized_gap() 2023-10-04 10:32:22 -07:00
backing-dev.c writeback: remove redundant checks for root memcg 2023-08-21 13:37:48 -07:00
balloon_compaction.c
bootmem_info.c
cma_debug.c
cma_sysfs.c mm: cma: make kobj_type structure constant 2023-03-28 16:20:06 -07:00
cma.c mm/cma: use nth_page() in place of direct struct page manipulation 2023-10-04 10:32:29 -07:00
cma.h
compaction.c mm/compaction: factor out code to test if we should run compaction for target order 2023-10-04 10:32:19 -07:00
debug_page_alloc.c mm: page_alloc: split out DEBUG_PAGEALLOC 2023-06-09 16:25:23 -07:00
debug_page_ref.c
debug_vm_pgtable.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
debug.c mm: update validate_mm() to use vma iterator 2023-06-09 16:25:31 -07:00
dmapool_test.c dmapool: add alloc/free performance test 2023-04-05 19:42:38 -07:00
dmapool.c dmapool: create/destroy cleanup 2023-06-09 16:25:17 -07:00
early_ioremap.c mm/early_ioremap.c: improve the execution efficiency of early_ioremap_setup() 2023-06-09 16:25:56 -07:00
fadvise.c mm: remove unnecessary pagevec includes 2023-06-23 16:59:31 -07:00
fail_page_alloc.c mm: page_alloc: split out FAIL_PAGE_ALLOC 2023-06-09 16:25:23 -07:00
failslab.c
filemap.c mm: use folio_xor_flags_has_waiters() in folio_end_writeback() 2023-10-18 14:34:17 -07:00
folio-compat.c filemap: Add fgf_t typedef 2023-07-24 18:04:30 -04:00
gup_test.c Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes. 2023-06-23 16:58:19 -07:00
gup_test.h
gup.c mm/gup: make failure to pin an error if FOLL_NOWAIT not specified 2023-10-18 14:34:15 -07:00
highmem.c mm: ptep_get() conversion 2023-06-19 16:19:25 -07:00
hmm.c mm: enable page walking API to lock vmas during the walk 2023-08-21 13:07:20 -07:00
huge_memory.c mm/rmap: convert page_move_anon_rmap() to folio_move_anon_rmap() 2023-10-18 14:34:14 -07:00
hugetlb_cgroup.c mm, hugetlb: remove HUGETLB_CGROUP_MIN_ORDER 2023-10-18 14:34:17 -07:00
hugetlb_vmemmap.c mm: hugetlb: skip initialization of gigantic tail struct pages if freed by HVO 2023-10-04 10:32:30 -07:00
hugetlb_vmemmap.h mm: hugetlb: skip initialization of gigantic tail struct pages if freed by HVO 2023-10-04 10:32:30 -07:00
hugetlb.c mm, hugetlb: remove HUGETLB_CGROUP_MIN_ORDER 2023-10-18 14:34:17 -07:00
hwpoison-inject.c
init-mm.c mm: move dummy_vm_ops out of a header 2023-08-21 13:37:46 -07:00
internal.h mm: use folio_xor_flags_has_waiters() in folio_end_writeback() 2023-10-18 14:34:17 -07:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: remove unneeded ioremap_allowed and iounmap_allowed 2023-08-18 10:12:36 -07:00
Kconfig zswap: change zswap's default allocator to zsmalloc 2023-10-04 10:32:20 -07:00
Kconfig.debug mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM 2023-05-29 16:14:28 +01:00
khugepaged.c - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
kmemleak.c mm/kmemleak: move up cond_resched() call in page scanning loop 2023-09-02 15:17:34 -07:00
ksm.c mm/ksm: add pages_skipped metric 2023-10-16 15:44:39 -07:00
list_lru.c
maccess.c mm: Fix copy_from_user_nofault(). 2023-04-12 17:36:23 -07:00
madvise.c mm: remove remnants of SPLIT_RSS_COUNTING 2023-10-04 10:32:20 -07:00
Makefile mm: vmscan: move shrinker-related code into a separate file 2023-10-04 10:32:23 -07:00
mapping_dirty_helpers.c mm: fix clean_record_shared_mapping_range kernel-doc 2023-08-24 16:20:30 -07:00
memblock.c memblock: introduce MEMBLOCK_RSRV_NOINIT flag 2023-10-04 10:32:30 -07:00
memcontrol.c memcontrol: add helpers for hugetlb memcg accounting 2023-10-18 14:34:17 -07:00
memfd.c memfd: drop warning for missing exec-related flags 2023-10-04 10:32:22 -07:00
memory_hotplug.c mm/memory_hotplug: use pfn math in place of direct struct page manipulation 2023-10-04 10:32:29 -07:00
memory-failure.c mm: convert DAX lock/unlock page to lock/unlock folio 2023-10-04 10:32:20 -07:00
memory-tiers.c dax, kmem: calculate abstract distance with general interface 2023-10-16 15:44:39 -07:00
memory.c mm/gup: adapt get_user_page_vma_remote() to never return NULL 2023-10-18 14:34:15 -07:00
mempolicy.c Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes. 2023-10-18 14:32:58 -07:00
mempool.c
memremap.c mm/memremap.c: fix outdated comment in devm_memremap_pages 2023-02-09 16:51:46 -08:00
memtest.c mm: memtest: convert to memtest_report_meminfo() 2023-08-21 13:37:47 -07:00
migrate_device.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
migrate.c mm/migrate: remove unused mm argument from do_move_pages_to_node 2023-10-18 14:34:14 -07:00
mincore.c mm: enable page walking API to lock vmas during the walk 2023-08-21 13:07:20 -07:00
mlock.c mm: mlock: update mlock_pte_range to handle large folio 2023-10-04 10:32:32 -07:00
mm_init.c mm: hugetlb: skip initialization of gigantic tail struct pages if freed by HVO 2023-10-04 10:32:30 -07:00
mm_slot.h
mmap_lock.c
mmap.c mmap: add clarifying comment to vma_merge() code 2023-10-18 14:34:15 -07:00
mmu_gather.c mm: fix kernel-doc warning from tlb_flush_rmaps() 2023-08-24 16:20:30 -07:00
mmu_notifier.c mmu_notifiers: rename invalidate_range notifier 2023-08-18 10:12:41 -07:00
mmzone.c
mprotect.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
mremap.c mm/mremap: allow moves within the same VMA for stack moves 2023-10-04 10:32:20 -07:00
msync.c
nommu.c mm: make __access_remote_vm() static 2023-10-18 14:34:15 -07:00
oom_kill.c mm: remove redundant K() macro definition 2023-08-21 13:37:44 -07:00
page_alloc.c mm/page_alloc: remove unnecessary next_page in break_down_buddy_pages 2023-10-18 14:34:15 -07:00
page_counter.c
page_ext.c mm/page_ext: move functions around for minor cleanups to page_ext 2023-08-18 10:12:31 -07:00
page_idle.c
page_io.c mm: memcg: add THP swap out info for anonymous reclaim 2023-10-04 10:32:27 -07:00
page_isolation.c mm/hugetlb: get rid of page_hstate() 2023-08-18 10:12:39 -07:00
page_owner.c mm/page_ext: use page_ext_data helper in page_owner 2023-08-21 13:37:27 -07:00
page_poison.c mm/page_poison: remove unused page_ext.h from page_poison 2023-08-21 13:37:30 -07:00
page_reporting.c mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
page_reporting.h
page_table_check.c mm: convert page_table_check_pte_set() to page_table_check_ptes_set() 2023-08-24 16:20:18 -07:00
page_vma_mapped.c mm: correct stale comment of function check_pte 2023-08-18 10:12:13 -07:00
page-writeback.c mm: use folio_xor_flags_has_waiters() in folio_end_writeback() 2023-10-18 14:34:17 -07:00
pagewalk.c mm/pagewalk: fix bootstopping regression from extra pte_unmap() 2023-09-02 08:39:21 -07:00
percpu-internal.h percpu-internal/pcpu_chunk: re-layout pcpu_chunk structure to reduce false sharing 2023-06-19 16:19:29 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c mm/percpu.c: print error message too if atomic alloc failed 2023-08-25 08:04:59 -07:00
pgalloc-track.h
pgtable-generic.c mm/pgtable: notes on pte_offset_map[_lock]() 2023-08-18 10:12:25 -07:00
process_vm_access.c mm/gup: remove unused vmas parameter from pin_user_pages_remote() 2023-06-09 16:25:25 -07:00
ptdump.c mm: ptdump should use ptep_get_lockless() 2023-06-19 16:19:24 -07:00
readahead.c filemap: Allow __filemap_get_folio to allocate large folios 2023-07-24 18:04:30 -04:00
rmap.c mm/rmap: convert page_move_anon_rmap() to folio_move_anon_rmap() 2023-10-18 14:34:14 -07:00
rodata_test.c
secretmem.c mm/secretmem: use a folio in secretmem_fault() 2023-08-21 13:38:02 -07:00
shmem_quota.c shmem: Add default quota limit mount options 2023-08-09 09:15:40 +02:00
shmem.c shmem,percpu_counter: add _limited_add(fbc, limit, amount) 2023-10-18 14:34:14 -07:00
show_mem.c mm: refactor si_mem_available() 2023-10-04 10:32:19 -07:00
shrinker_debug.c mm: shrinker: convert shrinker_rwsem to mutex 2023-10-04 10:32:26 -07:00
shrinker.c mm: shrinker: convert shrinker_rwsem to mutex 2023-10-04 10:32:26 -07:00
shuffle.c
shuffle.h mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
slab_common.c slab fixes for 6.6-rc4 2023-09-29 12:10:12 -07:00
slab.c Randomized slab caches for kmalloc() 2023-07-18 10:07:47 +02:00
slab.h Randomized slab caches for kmalloc() 2023-07-18 10:07:47 +02:00
slub.c mm/slub: remove freelist_dereference() 2023-07-14 09:57:21 +02:00
sparse-vmemmap.c mm/vmemmap: allow architectures to override how vmemmap optimization works 2023-08-18 10:12:53 -07:00
sparse.c mm/sparse: remove redundant judgments from macro for_each_present_section_nr 2023-08-18 10:12:14 -07:00
swap_cgroup.c
swap_slots.c
swap_state.c mm/swap: inline folio_set_swap_entry() and folio_swap_entry() 2023-08-24 16:20:28 -07:00
swap.c mm: remove references to pagevec 2023-06-23 16:59:30 -07:00
swap.h swap: remove remnants of polling from read_swap_cache_async 2023-08-24 16:20:16 -07:00
swapfile.c mm/swap: inline folio_set_swap_entry() and folio_swap_entry() 2023-08-24 16:20:28 -07:00
truncate.c - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
usercopy.c mm: Fix copy_from_user_nofault(). 2023-04-12 17:36:23 -07:00
userfaultfd.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
util.c rcu: dump vmalloc memory info safely 2023-09-05 10:13:45 -07:00
vmalloc.c mm: hugetlb: add huge page size param to set_huge_pte_at() 2023-09-29 17:20:47 -07:00
vmpressure.c net-memcg: Fix scope of sockmem pressure indicators 2023-08-16 12:21:32 +01:00
vmscan.c mm: multi-gen LRU: reuse some legacy trace events 2023-10-18 14:34:14 -07:00
vmstat.c mm/vmstat: use this_cpu_try_cmpxchg in mod_{zone,node}_state 2023-10-04 10:32:20 -07:00
workingset.c mm: workingset: dynamically allocate the mm-shadow shrinker 2023-10-04 10:32:24 -07:00
z3fold.c mm/z3fold: remove obsolete comment for struct z3fold_pool 2023-08-21 13:37:51 -07:00
zbud.c mm: zswap: remove shrink from zpool interface 2023-06-19 16:19:27 -07:00
zpool.c mm: zswap: remove shrink from zpool interface 2023-06-19 16:19:27 -07:00
zsmalloc.c zsmalloc: use copy_page for full page copy 2023-10-18 14:34:16 -07:00
zswap.c mm: zswap: fix pool refcount bug around shrink_worker() 2023-10-18 12:12:40 -07:00