Go to file
Vlastimil Babka 666716fd26 mm, slub: stop freeing kmem_cache_node structures on node offline
Patch series "mm, slab, slub: remove cpu and memory hotplug locks".

Some related work caused me to look at how we use get/put_mems_online()
and get/put_online_cpus() during kmem cache
creation/descruction/shrinking, and realize that it should be actually
safe to remove all of that with rather small effort (as e.g.  Michal Hocko
suspected in some of the past discussions already).  This has the benefit
to avoid rather heavy locks that have caused locking order issues already
in the past.  So this is the result, Patches 2 and 3 remove memory hotplug
and cpu hotplug locking, respectively.  Patch 1 is due to realization that
in fact some races exist despite the locks (even if not removed), but the
most sane solution is not to introduce more of them, but rather accept
some wasted memory in scenarios that should be rare anyway (full memory
hot remove), as we do the same in other contexts already.

This patch (of 3):

Commit e4f8e513c3 ("mm/slub: fix a deadlock in show_slab_objects()") has
fixed a problematic locking order by removing the memory hotplug lock
get/put_online_mems() from show_slab_objects().  During the discussion, it
was argued [1] that this is OK, because existing slabs on the node would
prevent a hotremove to proceed.

That's true, but per-node kmem_cache_node structures are not necessarily
allocated on the same node and may exist even without actual slab pages on
the same node.  Any path that uses get_node() directly or via
for_each_kmem_cache_node() (such as show_slab_objects()) can race with
freeing of kmem_cache_node even with the !NULL check, resulting in
use-after-free.

To that end, commit e4f8e513c3 argues in a comment that:

 * We don't really need mem_hotplug_lock (to hold off
 * slab_mem_going_offline_callback) here because slab's memory hot
 * unplug code doesn't destroy the kmem_cache->node[] data.

While it's true that slab_mem_going_offline_callback() doesn't free the
kmem_cache_node, the later callback slab_mem_offline_callback() actually
does, so the race and use-after-free exists.  Not just for
show_slab_objects() after commit e4f8e513c3, but also many other places
that are not under slab_mutex.  And adding slab_mutex locking or other
synchronization to SLUB paths such as get_any_partial() would be bad for
performance and error-prone.

The easiest solution is therefore to make the abovementioned comment true
and stop freeing the kmem_cache_node structures, accepting some wasted
memory in the full memory node removal scenario.  Analogically we also
don't free hotremoved pgdat as mentioned in [1], nor the similar per-node
structures in SLAB.  Importantly this approach will not block the
hotremove, as generally such nodes should be movable in order to succeed
hotremove in the first place, and thus the GFP_KERNEL allocated
kmem_cache_node will come from elsewhere.

[1] https://lore.kernel.org/linux-mm/20190924151147.GB23050@dhcp22.suse.cz/

Link: https://lkml.kernel.org/r/20210113131634.3671-1-vbabka@suse.cz
Link: https://lkml.kernel.org/r/20210113131634.3671-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Qian Cai <cai@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24 13:38:27 -08:00
arch hexagon: remove CONFIG_EXPERIMENTAL from defconfigs 2021-02-24 13:38:26 -08:00
block for-5.12/block-ipi-2021-02-21 2021-02-22 10:53:05 -08:00
certs certs: Replace K{U,G}IDT_INIT() with GLOBAL_ROOT_{U,G}ID 2021-01-21 16:16:10 +00:00
crypto Keyrings miscellany 2021-02-23 16:09:23 -08:00
Documentation Keyrings miscellany 2021-02-23 16:09:23 -08:00
drivers Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc 2021-02-23 15:09:53 -08:00
fs ramfs: support O_TMPFILE 2021-02-24 13:38:26 -08:00
include mm, tracing: record slab name for kmem_cache_free() 2021-02-24 13:38:26 -08:00
init Kbuild: disable TRIM_UNUSED_KSYMS option 2021-02-23 12:21:58 -08:00
ipc fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
kernel Keyrings miscellany 2021-02-23 16:09:23 -08:00
lib Modules updates for v5.12 2021-02-23 10:15:33 -08:00
LICENSES LICENSES: Add the CC-BY-4.0 license 2020-12-08 10:33:27 -07:00
mm mm, slub: stop freeing kmem_cache_node structures on node offline 2021-02-24 13:38:27 -08:00
net idmapped-mounts-v5.12 2021-02-23 13:39:45 -08:00
samples Keyrings miscellany 2021-02-23 16:09:23 -08:00
scripts scripts/spelling.txt: add more spellings to spelling.txt 2021-02-24 13:38:26 -08:00
security Keyrings miscellany 2021-02-23 16:09:23 -08:00
sound ARM updates for 5.12-rc1: 2021-02-22 14:27:07 -08:00
tools clang-lto for v5.12-rc1 (part2) 2021-02-23 15:13:45 -08:00
usr arch: ia64: Remove rest of perfmon support 2021-01-22 12:12:20 +05:30
virt KVM/arm64 fixes for 5.11, take #2 2021-02-12 14:07:39 +00:00
.clang-format clang-format: Update with the latest for_each macro list 2021-01-29 15:00:23 +01:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore clang-lto series for v5.12-rc1 2021-02-23 09:28:51 -08:00
.mailmap Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2021-02-21 17:23:56 -08:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: dccp: move Gerrit Renker to CREDITS 2021-01-14 10:53:49 -08:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS dmaengine updates for v5.12-rc1 2021-02-23 15:05:10 -08:00
Makefile clang-lto for v5.12-rc1 (part2) 2021-02-23 15:13:45 -08:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.