Go to file
Vlastimil Babka ec6e8c7e03 mm, page_alloc: disable pcplists during memory offline
Memory offlining relies on page isolation to guarantee a forward progress
because pages cannot be reused while they are isolated.  But the page
isolation itself doesn't prevent from races while freed pages are stored
on pcp lists and thus can be reused.  This can be worked around by
repeated draining of pcplists, as done by commit 9683182612
("mm/memory_hotplug: drain per-cpu pages again during memory offline").

David and Michal would prefer that this race was closed in a way that
callers of page isolation who need stronger guarantees don't need to
repeatedly drain.  David suggested disabling pcplists usage completely
during page isolation, instead of repeatedly draining them.

To achieve this without adding special cases in alloc/free fastpath, we
can use the same approach as boot pagesets - when pcp->high is 0, any
pcplist addition will be immediately flushed.

The race can thus be closed by setting pcp->high to 0 and draining
pcplists once, before calling start_isolate_page_range().  The draining
will serialize after processes that already disabled interrupts and read
the old value of pcp->high in free_unref_page_commit(), and processes that
have not yet disabled interrupts, will observe pcp->high == 0 when they
are rescheduled, and skip pcplists.  This guarantees no stray pages on
pcplists in zones where isolation happens.

This patch thus adds zone_pcp_disable() and zone_pcp_enable() functions
that page isolation users can call before start_isolate_page_range() and
after unisolating (or offlining) the isolated pages.

Also, drain_all_pages() is optimized to only execute on cpus where
pcplists are not empty.  The check can however race with a free to pcplist
that has not yet increased the pcp->count from 0 to 1.  Thus make the
drain optionally skip the racy check and drain on all cpus, and use this
option in zone_pcp_disable().

As we have to avoid external updates to high and batch while pcplists are
disabled, we take pcp_batch_high_lock in zone_pcp_disable() and release it
in zone_pcp_enable().  This also synchronizes multiple users of
zone_pcp_disable()/enable().

Currently the only user of this functionality is offline_pages().

[vbabka@suse.cz: add comment, per David]
  Link: https://lkml.kernel.org/r/527480ef-ed72-e1c1-52a0-1c5b0113df45@suse.cz

Link: https://lkml.kernel.org/r/20201111092812.11329-8-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:43 -08:00
arch arch, mm: make kernel_page_present() always available 2020-12-15 12:13:43 -08:00
block block-5.10-2020-12-05 2020-12-05 14:45:30 -08:00
certs .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
crypto drivers-5.10-2020-10-12 2020-10-13 13:04:41 -07:00
Documentation arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL 2020-12-15 12:13:42 -08:00
drivers lkdtm: disable KASAN for rodata.o 2020-12-15 12:13:42 -08:00
fs arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL 2020-12-15 12:13:42 -08:00
include mm, page_alloc: cache pageset high and batch in struct zone 2020-12-15 12:13:43 -08:00
init mm: fix page_owner initializing issue for arm32 2020-12-15 12:13:38 -08:00
ipc vm_ops: rename .split() callback to .may_split() 2020-12-15 12:13:41 -08:00
kernel PM: hibernate: make direct map manipulations more explicit 2020-12-15 12:13:43 -08:00
lib lib/test_kasan.c: add workqueue test case 2020-12-15 12:13:42 -08:00
LICENSES LICENSES/deprecated: add Zlib license text 2020-09-16 14:33:49 +02:00
mm mm, page_alloc: disable pcplists during memory offline 2020-12-15 12:13:43 -08:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 2020-12-10 14:29:30 -08:00
samples Tracing fixes for 5.10-rc6 2020-12-01 15:30:18 -08:00
scripts Kbuild fixes for v5.10 (2nd) 2020-12-06 10:31:39 -08:00
security selinux/stable-5.10 PR 20201113 2020-11-14 12:04:02 -08:00
sound ALSA: pcm: use krealloc_array() 2020-12-15 12:13:37 -08:00
tools kselftests: vm: add mremap tests 2020-12-15 12:13:40 -08:00
usr Merge branch 'work.fdpic' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 13:29:39 -07:00
virt kvm: x86/mmu: Support dirty logging for the TDP MMU 2020-10-23 03:42:13 -04:00
.clang-format RDMA 5.10 pull request 2020-10-17 11:18:18 -07:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: docs: ignore sphinx_*/ directories 2020-09-10 10:44:31 -06:00
.mailmap mailmap: add two more addresses of Uwe Kleine-König 2020-12-06 10:19:07 -08:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Move Jason Cooper to CREDITS 2020-11-30 10:20:34 +01:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-12-10 15:30:13 -08:00
Makefile Linux 5.10 2020-12-13 14:41:30 -08:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.