linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-15 16:24:13 +08:00

Johannes Weiner e82553c10b Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"	2021-02-09 17:26:44 -08:00

This reverts commit `536d3bf261`, as it can cause writers to memory.high to get stuck in the kernel forever, performing page reclaim and consuming excessive amounts of CPU cycles.

Before the patch, a write to memory.high would first put the new limit in place for the workload, and then reclaim the requested delta. After the patch, the kernel tries to reclaim the delta before putting the new limit into place, in order to not overwhelm the workload with a sudden, large excess over the limit. However, if reclaim is actively racing with new allocations from the uncurbed workload, it can keep the write() working inside the kernel indefinitely.

This is causing problems in Facebook production. A privileged system-level daemon that adjusts memory.high for various workloads running on a host can get unexpectedly stuck in the kernel and essentially turn into a sort of involuntary kswapd for one of the workloads. We've observed that daemon busy-spin in a write() for minutes at a time, neglecting its other duties on the system, and expending privileged system resources on behalf of a workload.

To remedy this, we have first considered changing the reclaim logic to break out after a couple of loops - whether the workload has converged to the new limit or not - and bound the write() call this way. However, the root cause that inspired the sequence change in the first place has been fixed through other means, and so a revert back to the proven limit-setting sequence, also used by memory.max, is preferable.

The sequence was changed to avoid extreme latencies in the workload when the limit was lowered: the sudden, large excess created by the limit lowering would erroneously trigger the penalty sleeping code that is meant to throttle excessive growth from below. Allocating threads could end up sleeping long after the write() had already reclaimed the delta for which they were being punished.

However, erroneous throttling also caused problems in other scenarios at around the same time. This resulted in commit `b3ff92916a` ("mm, memcg: reclaim more aggressively before high allocator throttling"), included in the same release as the offending commit. When allocating threads now encounter large excess caused by a racing write() to memory.high, instead of entering punitive sleeps, they will simply be tasked with helping reclaim down the excess, and will be held no longer than it takes to accomplish that. This is in line with regular limit enforcement - i.e. if the workload allocates up against or over an otherwise unchanged limit from below.

With the patch breaking userspace, and the root cause addressed by other means already, revert it again.

Link: https://lkml.kernel.org/r/20210122184341.292461-1-hannes@cmpxchg.org
Fixes: `536d3bf261` ("mm: memcontrol: avoid workload stalls when lowering memory.high")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Tejun Heo <tj@kernel.org>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: <stable@vger.kernel.org> [5.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
..
kasan	kasan: fix stack traces dependency for HW_TAGS	2021-02-09 17:26:44 -08:00
backing-dev.c	mm:backing-dev: use sysfs_emit in macro defining functions	2020-12-15 12:13:47 -08:00
balloon_compaction.c	mm/balloon_compaction: suppress allocation warnings	2019-09-04 07:42:01 -04:00
cleancache.c	Driver Core and debugfs changes for 5.3-rc1	2019-07-12 12:24:03 -07:00
cma_debug.c	debugfs: make sure we can remove u32_array files cleanly	2020-07-10 13:54:00 -07:00
cma.c	mm: cma: improve pr_debug log in cma_release()	2020-12-15 12:13:46 -08:00
cma.h	mm: cma: use CMA_MAX_NAME to define the length of cma name array	2020-09-01 09:19:43 +02:00
compaction.c	mm, compaction: move high_pfn to the for loop scope	2021-02-05 11:03:47 -08:00
debug_page_ref.c
debug_vm_pgtable.c	mm/debug_vm_pgtable: avoid doing memory allocation with pgtable_t mapped.	2020-10-16 11:11:14 -07:00
debug.c	mm: memcontrol: Use helpers to read page's memcg data	2020-12-02 18:28:05 -08:00
dmapool.c	mm/dmapool.c: replace hard coded function name with __func__	2020-10-13 18:38:32 -07:00
early_ioremap.c	mm/early_ioremap.c: use %pa to print resource_size_t variables	2020-01-31 10:30:38 -08:00
fadvise.c	mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED	2020-10-13 18:38:29 -07:00
failslab.c	mm/failslab.c: by default, do not fail allocations with direct reclaim only	2019-07-12 11:05:43 -07:00
filemap.c	mm/filemap: add missing mem_cgroup_uncharge() to __add_to_page_cache_locked()	2021-02-05 11:03:47 -08:00
frame_vector.c	mmap locking API: convert mmap_sem comments	2020-06-09 09:39:14 -07:00
frontswap.c	mm/frontswap: mark various intentional data races	2020-08-14 19:56:56 -07:00
gup_test.c	mm/gup_test.c: mark gup_test_init as __init function	2020-12-15 12:13:38 -08:00
gup_test.h	selftests/vm: gup_test: introduce the dump_pages() sub-test	2020-12-15 12:13:38 -08:00
gup.c	Merge branch 'akpm' (patches from Andrew)	2020-12-15 12:53:37 -08:00
highmem.c	mm/highmem: prepare for overriding set_pte_at()	2021-01-24 10:34:52 -08:00
hmm.c	mm: do page fault accounting in handle_mm_fault	2020-08-12 10:58:02 -07:00
huge_memory.c	mm: thp: fix MADV_REMOVE deadlock on shmem THP	2021-02-05 11:03:47 -08:00
hugetlb_cgroup.c	hugetlb_cgroup: fix offline of hugetlb cgroup with reservations	2020-12-06 10:19:07 -08:00
hugetlb.c	mm: hugetlb: fix missing put_page in gather_surplus_pages()	2021-02-05 11:03:47 -08:00
hwpoison-inject.c	mm,hwpoison-inject: don't pin for hwpoison_filter	2020-10-16 11:11:16 -07:00
init-mm.c	mm/gup: prevent gup_fast from racing with COW during fork	2020-12-15 12:13:39 -08:00
internal.h	mm, page_alloc: disable pcplists during memory offline	2020-12-15 12:13:43 -08:00
interval_tree.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 248	2019-06-19 17:09:08 +02:00
ioremap.c	mm: move p?d_alloc_track to separate header file	2020-08-07 11:33:26 -07:00
Kconfig	mm/Kconfig: fix spelling mistake "whats" -> "what's"	2020-12-19 11:25:41 -08:00
Kconfig.debug	mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO	2020-12-15 12:13:46 -08:00
khugepaged.c	mm: fix some spelling mistakes in comments	2020-12-15 22:46:19 -08:00
kmemleak.c	mm/kmemleak: rely on rcu for task stack scanning	2020-10-13 18:38:27 -07:00
ksm.c	mm: cleanup kstrto*() usage	2020-12-15 12:13:47 -08:00
list_lru.c	mm: list_lru: set shrinker map bit when child nr_items is not zero	2020-12-06 10:19:07 -08:00
maccess.c	uaccess: add force_uaccess_{begin,end} helpers	2020-08-12 10:57:59 -07:00
madvise.c	mm,memory_failure: always pin the page in madvise_inject_error	2020-12-15 12:13:44 -08:00
Makefile	mm: mmap_lock: add tracepoints around lock acquisition	2020-12-15 12:13:41 -08:00
mapping_dirty_helpers.c	mm/mapping_dirty_helpers: enhance the kernel-doc markups	2020-12-15 12:13:41 -08:00
memblock.c	memblock: do not start bottom-up allocations with kernel_end	2021-02-05 11:03:47 -08:00
memcontrol.c	Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"	2021-02-09 17:26:44 -08:00
memfd.c	mm: page cache: store only head pages in i_pages	2019-09-24 15:54:08 -07:00
memory_hotplug.c	mm: memmap defer init doesn't work as expected	2020-12-29 15:36:49 -08:00
memory-failure.c	mm: fix page reference leak in soft_offline_page()	2021-01-24 10:34:52 -08:00
memory.c	mm: generalise COW SMC TLB flushing race comment	2020-12-29 15:36:49 -08:00
mempolicy.c	mm: migrate: initialize err in do_migrate_pages	2021-01-12 18:12:54 -08:00
mempool.c	kasan, mm: rename kasan_poison_kfree	2020-12-22 12:55:09 -08:00
memremap.c	mm/mremap_pages: fix static key devmap_managed_key updates	2020-11-02 12:14:18 -08:00
memtest.c
migrate.c	mm: migrate: do not migrate HugeTLB page whose refcount is one	2021-02-05 11:03:47 -08:00
mincore.c	mm: factor find_get_incore_page out of mincore_page	2020-10-13 18:38:29 -07:00
mlock.c	mm/lru: introduce relock_page_lruvec()	2020-12-15 14:48:04 -08:00
mm_init.c	mm: fix fall-through warnings for Clang	2020-12-15 12:13:47 -08:00
mmap_lock.c	mm: mmap_lock: add tracepoints around lock acquisition	2020-12-15 12:13:41 -08:00
mmap.c	UAPI Changes:	2020-12-18 12:38:28 -08:00
mmu_gather.c	mmap locking API: convert mmap_sem comments	2020-06-09 09:39:14 -07:00
mmu_notifier.c	mm: track mmu notifiers in fs_reclaim_acquire/release	2020-12-15 12:13:41 -08:00
mmzone.c	mm/lru: replace pgdat lru_lock with lruvec lock	2020-12-15 14:48:04 -08:00
mprotect.c	mm: Add 'mprotect' hook to struct vm_operations_struct	2020-11-17 14:36:14 +01:00
mremap.c	mm/mremap: fix BUILD_BUG_ON() error in get_extent	2021-02-09 17:26:44 -08:00
msync.c	mmap locking API: use coccinelle to convert mmap_sem rwsem call sites	2020-06-09 09:39:14 -07:00
nommu.c	mm: cleanup: remove unused tsk arg from __access_remote_vm	2020-12-15 12:13:40 -08:00
oom_kill.c	mm/oom_kill: change comment and rename is_dump_unreclaim_slabs()	2020-12-15 12:13:45 -08:00
page_alloc.c	Revert "mm: fix initialization of struct page for holes in memory layout"	2021-01-26 10:39:46 -08:00
page_counter.c	mm/page_counter: use page_counter_read in page_counter_set_max	2020-12-15 12:13:40 -08:00
page_ext.c	mm: fix some spelling mistakes in comments	2020-12-15 22:46:19 -08:00
page_idle.c	mm: page_idle_get_page() does not need lru_lock	2020-12-15 14:48:03 -08:00
page_io.c	mm: memcontrol: Use helpers to read page's memcg data	2020-12-02 18:28:05 -08:00
page_isolation.c	mm/page_isolation: do not isolate the max order page	2020-12-15 12:13:45 -08:00
page_owner.c	mm/page_owner: record timestamp and pid	2020-12-15 12:13:38 -08:00
page_poison.c	kasan, mm: reset tags when accessing metadata	2020-12-22 12:55:08 -08:00
page_reporting.c	mm: rename page_order() to buddy_order()	2020-10-16 11:11:19 -07:00
page_reporting.h	mm: introduce include/linux/pgtable.h	2020-06-09 09:39:13 -07:00
page_vma_mapped.c	mm/page_vma_mapped.c: add colon to fix kernel-doc markups error for check_pte	2020-12-15 12:13:41 -08:00
page-writeback.c	mm: make wait_on_page_writeback() wait for multiple pending writebacks	2021-01-05 11:33:00 -08:00
pagewalk.c	mmap locking API: convert mmap_sem comments	2020-06-09 09:39:14 -07:00
percpu-internal.h	mm: memcg/percpu: account percpu memory to memory cgroups	2020-08-12 10:57:55 -07:00
percpu-km.c	mm: memcg/percpu: account percpu memory to memory cgroups	2020-08-12 10:57:55 -07:00
percpu-stats.c	mm: memcg/percpu: account percpu memory to memory cgroups	2020-08-12 10:57:55 -07:00
percpu-vm.c	mm: memcg/percpu: account percpu memory to memory cgroups	2020-08-12 10:57:55 -07:00
percpu.c	percpu: convert flexible array initializers to use struct_size()	2020-10-30 23:02:28 +00:00
pgalloc-track.h	mm: move p?d_alloc_track to separate header file	2020-08-07 11:33:26 -07:00
pgtable-generic.c	mm: introduce include/linux/pgtable.h	2020-06-09 09:39:13 -07:00
process_vm_access.c	mm/process_vm_access.c: include compat.h	2021-01-12 18:12:54 -08:00
ptdump.c	kasan, arm64: expand CONFIG_KASAN checks	2020-12-22 12:55:08 -08:00
readahead.c	mm: use limited read-ahead to satisfy read	2020-10-17 13:49:08 -06:00
rmap.c	mm/lru: revise the comments of lru_lock	2020-12-15 14:48:04 -08:00
rodata_test.c	mm/rodata_test.c: fix missing function declaration	2020-08-21 09:52:53 -07:00
shmem.c	mm: shmem: convert shmem_enabled_show to use sysfs_emit_at	2020-12-15 12:13:47 -08:00
shuffle.c	mm: rename page_order() to buddy_order()	2020-10-16 11:11:19 -07:00
shuffle.h	mm/shuffle: remove dynamic reconfiguration	2020-08-07 11:33:29 -07:00
slab_common.c	kasan, mm: allow cache merging with no metadata	2020-12-22 12:55:09 -08:00
slab.c	mm: introduce debug_pagealloc_{map,unmap}_pages() helpers	2020-12-15 12:13:43 -08:00
slab.h	Networking updates for 5.11	2020-12-15 13:22:29 -08:00
slob.c	mm: extract might_alloc() debug check	2020-12-15 12:13:41 -08:00
slub.c	Revert "mm/slub: fix a memory leak in sysfs_slab_add()"	2021-01-28 09:05:44 -08:00
sparse-vmemmap.c	mm/sparse: only sub-section aligned range would be populated	2020-08-07 11:33:27 -07:00
sparse.c	mm/memory_hotplug: guard more declarations by CONFIG_MEMORY_HOTPLUG	2020-10-16 11:11:18 -07:00
swap_cgroup.c	mm: memcontrol: make swap tracking an integral part of memory control	2020-06-03 20:09:48 -07:00
swap_slots.c	mm/swap_slots.c: remove always zero and unused return value of enable_swap_slots_cache()	2020-10-13 18:38:30 -07:00
swap_state.c	mm: use sysfs_emit for struct kobject * uses	2020-12-15 12:13:47 -08:00
swap.c	mm/lru: introduce relock_page_lruvec()	2020-12-15 14:48:04 -08:00
swapfile.c	mm: fix a race on nr_swap_pages	2020-12-15 22:46:15 -08:00
truncate.c	mm: fix kernel-doc markups	2020-12-15 12:13:47 -08:00
usercopy.c	mm/usercopy.c: delete duplicated word	2020-08-12 10:57:58 -07:00
userfaultfd.c	mm/vmscan: protect the workingset on anonymous LRU	2020-08-12 10:57:55 -07:00
util.c	mm: introduce vma_set_file function v5	2020-11-19 10:36:36 +01:00
vmacache.c	kernel: better document the use_mm/unuse_mm API contract	2020-06-10 19:14:18 -07:00
vmalloc.c	mm/vmalloc.c: fix potential memory leak	2021-01-12 18:12:54 -08:00
vmpressure.c	mm: vmpressure: use mem_cgroup_is_root API	2020-04-02 09:35:31 -07:00
vmscan.c	mm: don't put pinned pages into the swap cache	2021-01-17 12:08:04 -08:00
vmstat.c	arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL	2020-12-15 12:13:42 -08:00
workingset.c	Merge branch 'akpm' (patches from Andrew)	2020-12-15 14:55:10 -08:00
z3fold.c	z3fold: remove preempt disabled sections for RT	2020-12-15 12:13:45 -08:00
zbud.c	mm/zbud: remove redundant initialization	2020-10-13 18:38:34 -07:00
zpool.c	mm/zpool.c: delete duplicated word and fix grammar	2020-08-12 10:57:58 -07:00
zsmalloc.c	mm/zsmalloc.c: rework the list_add code in insert_zspage()	2020-12-15 12:13:46 -08:00
zswap.c	mm/zswap: move to use crypto_acomp API for hardware acceleration	2020-12-15 12:13:46 -08:00