Go to file
Johannes Weiner 63fd327016 mm: memcontrol: don't throttle dying tasks on memory.high
While investigating hosts with high cgroup memory pressures, Tejun
found culprit zombie tasks that had were holding on to a lot of
memory, had SIGKILL pending, but were stuck in memory.high reclaim.

In the past, we used to always force-charge allocations from tasks
that were exiting in order to accelerate them dying and freeing up
their rss. This changed for memory.max in a4ebf1b6ca ("memcg:
prohibit unconditional exceeding the limit of dying tasks"); it noted
that this can cause (userspace inducable) containment failures, so it
added a mandatory reclaim and OOM kill cycle before forcing charges.
At the time, memory.high enforcement was handled in the userspace
return path, which isn't reached by dying tasks, and so memory.high
was still never enforced by dying tasks.

When c9afe31ec4 ("memcg: synchronously enforce memory.high for large
overcharges") added synchronous reclaim for memory.high, it added
unconditional memory.high enforcement for dying tasks as well. The
callstack shows that this path is where the zombie is stuck in.

We need to accelerate dying tasks getting past memory.high, but we
cannot do it quite the same way as we do for memory.max: memory.max is
enforced strictly, and tasks aren't allowed to move past it without
FIRST reclaiming and OOM killing if necessary. This ensures very small
levels of excess. With memory.high, though, enforcement happens lazily
after the charge, and OOM killing is never triggered. A lot of
concurrent threads could have pushed, or could actively be pushing,
the cgroup into excess. The dying task will enter reclaim on every
allocation attempt, with little hope of restoring balance.

To fix this, skip synchronous memory.high enforcement on dying tasks
altogether again. Update memory.high path documentation while at it.

[hannes@cmpxchg.org: also handle tasks are being killed during the reclaim]
  Link: https://lkml.kernel.org/r/20240111192807.GA424308@cmpxchg.org
Link: https://lkml.kernel.org/r/20240111132902.389862-1-hannes@cmpxchg.org
Fixes: c9afe31ec4 ("memcg: synchronously enforce memory.high for large overcharges")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-25 23:52:20 -08:00
arch powerpc fixes for 6.8 #2 2024-01-21 11:04:29 -08:00
block for-6.8/block-2024-01-18 2024-01-18 18:22:40 -08:00
certs This update includes the following changes: 2023-11-02 16:15:30 -10:00
crypto crypto: scomp - fix req->dst buffer overflow 2023-12-29 11:25:56 +08:00
Documentation Updates for time and clocksources: 2024-01-21 11:14:40 -08:00
drivers Updates for time and clocksources: 2024-01-21 11:14:40 -08:00
fs fs/hugetlbfs/inode.c: mm/memory-failure.c: fix hugetlbfs hwpoison handling 2024-01-25 23:52:20 -08:00
include mm: mmap: map MAP_STACK to VM_NOHUGEPAGE 2024-01-25 23:52:20 -08:00
init Driver core changes for 6.8-rc1 2024-01-18 09:48:40 -08:00
io_uring for-6.8/io_uring-2024-01-18 2024-01-18 18:17:57 -08:00
ipc shm: Slim down dependencies 2023-12-20 19:26:31 -05:00
kernel uprobes: use pagesize-aligned virtual address when replacing pages 2024-01-25 23:52:20 -08:00
lib RISC-V Patches for the 6.8 Merge Window, Part 4 2024-01-20 11:06:04 -08:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
mm mm: memcontrol: don't throttle dying tasks on memory.high 2024-01-25 23:52:20 -08:00
net Assorted CephFS fixes and cleanups with nothing standing out. 2024-01-19 09:58:55 -08:00
rust Rust changes for v6.8 2024-01-11 13:05:41 -08:00
samples RISC-V Patches for the 6.8 Merge Window, Part 4 2024-01-20 11:06:04 -08:00
scripts Coccinelle change for v6.8 2024-01-20 14:20:34 -08:00
security + Features 2024-01-19 10:53:55 -08:00
sound sound fixes for 6.8-rc1 2024-01-19 12:30:29 -08:00
tools selftests/mm: mremap_test: fix build warning 2024-01-25 23:52:20 -08:00
usr Kbuild updates for v6.8 2024-01-18 17:57:07 -08:00
virt Generic: 2024-01-17 13:03:37 -08:00
.clang-format clang-format: Update with v6.7-rc4's for_each macro list 2023-12-08 23:54:38 +01:00
.cocciconfig
.editorconfig Add .editorconfig file for basic formatting 2023-12-28 16:22:47 +09:00
.get_maintainer.ignore get_maintainer: add Alan to .get_maintainer.ignore 2022-08-20 15:17:44 -07:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore Add .editorconfig file for basic formatting 2023-12-28 16:22:47 +09:00
.mailmap Char/Misc and other Driver changes for 6.8-rc1 2024-01-17 16:47:17 -08:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS Including fixes from bpf and netfilter. 2024-01-18 17:33:50 -08:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS Various smb client fixes, including multichannel and for SMB3.1.1 POSIX extensions 2024-01-20 16:48:07 -08:00
Makefile Linux 6.8-rc1 2024-01-21 14:11:32 -08:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.