mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline() instead of PageReserved()

We currently initialize the memmap such that PG_reserved is set and the
refcount of the page is 1.  In virtio-mem code, we have to manually clear
that PG_reserved flag to make memory offlining with partially hotplugged
memory blocks possible: has_unmovable_pages() would otherwise bail out on
such pages.

We want to avoid PG_reserved where possible and move to typed pages
instead.  Further, we want to further enlighten memory offlining code
about PG_offline: offline pages in an online memory section.  One example
is handling managed page count adjustments in a cleaner way during memory
offlining.

So let's initialize the pages with PG_offline instead of PG_reserved. 
generic_online_page()->__free_pages_core() will now clear that flag before
handing that memory to the buddy.

Note that the page refcount is still 1 and would forbid offlining of such
memory except when special care is take during GOING_OFFLINE as currently
only implemented by virtio-mem.

With this change, we can now get non-PageReserved() pages in the XEN
balloon list.  From what I can tell, that can already happen via
decrease_reservation(), so that should be fine.

HV-balloon should not really observe a change: partial online memory
blocks still cannot get surprise-offlined, because the refcount of these
PageOffline() pages is 1.

Update virtio-mem, HV-balloon and XEN-balloon code to be aware that
hotplugged pages are now PageOffline() instead of PageReserved() before
they are handed over to the buddy.

We'll leave the ZONE_DEVICE case alone for now.

Note that self-hosted vmemmap pages will no longer be marked as
reserved.  This matches ordinary vmemmap pages allocated from the buddy
during memory hotplug.  Now, really only vmemmap pages allocated from
memblock during early boot will be marked reserved.  Existing
PageReserved() checks seem to be handling all relevant cases correctly
even after this change.

Link: https://lkml.kernel.org/r/20240607090939.89524-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Oscar Salvador <osalvador@suse.de> [generic memory-hotplug bits]
Cc: Alexander Potapenko <glider@google.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eugenio Pérez <eperezma@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Marco Elver <elver@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This commit is contained in:
David Hildenbrand 2024-06-07 11:09:37 +02:00 committed by Andrew Morton
parent 13c526540b
commit 503b158fc3
7 changed files with 67 additions and 35 deletions

View File

@ -683,9 +683,8 @@ static void hv_page_online_one(struct hv_hotadd_state *has, struct page *pg)
if (!PageOffline(pg))
__SetPageOffline(pg);
return;
}
if (PageOffline(pg))
__ClearPageOffline(pg);
} else if (!PageOffline(pg))
return;
/* This frame is currently backed; online the page. */
generic_online_page(pg, 0);

View File

@ -1146,12 +1146,16 @@ static void virtio_mem_set_fake_offline(unsigned long pfn,
for (; nr_pages--; pfn++) {
struct page *page = pfn_to_page(pfn);
__SetPageOffline(page);
if (!onlined) {
if (!onlined)
/*
* Pages that have not been onlined yet were initialized
* to PageOffline(). Remember that we have to route them
* through generic_online_page().
*/
SetPageDirty(page);
/* FIXME: remove after cleanups */
ClearPageReserved(page);
}
else
__SetPageOffline(page);
VM_WARN_ON_ONCE(!PageOffline(page));
}
page_offline_end();
}
@ -1166,9 +1170,11 @@ static void virtio_mem_clear_fake_offline(unsigned long pfn,
for (; nr_pages--; pfn++) {
struct page *page = pfn_to_page(pfn);
__ClearPageOffline(page);
if (!onlined)
/* generic_online_page() will clear PageOffline(). */
ClearPageDirty(page);
else
__ClearPageOffline(page);
}
}

View File

@ -146,7 +146,8 @@ static DECLARE_WAIT_QUEUE_HEAD(balloon_wq);
/* balloon_append: add the given page to the balloon. */
static void balloon_append(struct page *page)
{
__SetPageOffline(page);
if (!PageOffline(page))
__SetPageOffline(page);
/* Lowmem is re-populated first, so highmem pages go at list tail. */
if (PageHighMem(page)) {
@ -412,7 +413,11 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
xenmem_reservation_va_mapping_update(1, &page, &frame_list[i]);
/* Relinquish the page back to the allocator. */
/*
* Relinquish the page back to the allocator. Note that
* some pages, including ones added via xen_online_page(), might
* not be marked reserved; free_reserved_page() will handle that.
*/
free_reserved_page(page);
}

View File

@ -30,16 +30,11 @@
* - Pages falling into physical memory gaps - not IORESOURCE_SYSRAM. Trying
* to read/write these pages might end badly. Don't touch!
* - The zero page(s)
* - Pages not added to the page allocator when onlining a section because
* they were excluded via the online_page_callback() or because they are
* PG_hwpoison.
* - Pages allocated in the context of kexec/kdump (loaded kernel image,
* control pages, vmcoreinfo)
* - MMIO/DMA pages. Some architectures don't allow to ioremap pages that are
* not marked PG_reserved (as they might be in use by somebody else who does
* not respect the caching strategy).
* - Pages part of an offline section (struct pages of offline sections should
* not be trusted as they will be initialized when first onlined).
* - MCA pages on ia64
* - Pages holding CPU notes for POWER Firmware Assisted Dump
* - Device memory (e.g. PMEM, DAX, HMM)
@ -1020,6 +1015,10 @@ PAGE_TYPE_OPS(Buddy, buddy, buddy)
* The content of these pages is effectively stale. Such pages should not
* be touched (read/write/dump/save) except by their owner.
*
* When a memory block gets onlined, all pages are initialized with a
* refcount of 1 and PageOffline(). generic_online_page() will
* take care of clearing PageOffline().
*
* If a driver wants to allow to offline unmovable PageOffline() pages without
* putting them back to the buddy, it can do so via the memory notifier by
* decrementing the reference count in MEM_GOING_OFFLINE and incrementing the
@ -1027,8 +1026,7 @@ PAGE_TYPE_OPS(Buddy, buddy, buddy)
* pages (now with a reference count of zero) are treated like free pages,
* allowing the containing memory block to get offlined. A driver that
* relies on this feature is aware that re-onlining the memory block will
* require to re-set the pages PageOffline() and not giving them to the
* buddy via online_page_callback_t.
* require not giving them to the buddy via generic_online_page().
*
* There are drivers that mark a page PageOffline() and expect there won't be
* any further access to page content. PFN walkers that read content of random

View File

@ -734,7 +734,7 @@ static inline void section_taint_zone_device(unsigned long pfn)
/*
* Associate the pfn range with the given zone, initializing the memmaps
* and resizing the pgdat/zone data to span the added pages. After this
* call, all affected pages are PG_reserved.
* call, all affected pages are PageOffline().
*
* All aligned pageblocks are initialized to the specified migratetype
* (usually MIGRATE_MOVABLE). Besides setting the migratetype, no related
@ -1101,8 +1101,12 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);
for (i = 0; i < nr_pages; i++)
SetPageVmemmapSelfHosted(pfn_to_page(pfn + i));
for (i = 0; i < nr_pages; i++) {
struct page *page = pfn_to_page(pfn + i);
__ClearPageOffline(page);
SetPageVmemmapSelfHosted(page);
}
/*
* It might be that the vmemmap_pages fully span sections. If that is
@ -1960,9 +1964,9 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
* Don't allow to offline memory blocks that contain holes.
* Consequently, memory blocks with holes can never get onlined
* via the hotplug path - online_pages() - as hotplugged memory has
* no holes. This way, we e.g., don't have to worry about marking
* memory holes PG_reserved, don't need pfn_valid() checks, and can
* avoid using walk_system_ram_range() later.
* no holes. This way, we don't have to worry about memory holes,
* don't need pfn_valid() checks, and can avoid using
* walk_system_ram_range() later.
*/
walk_system_ram_range(start_pfn, nr_pages, &system_ram_pages,
count_system_ram_pages_cb);

View File

@ -893,8 +893,14 @@ void __meminit memmap_init_range(unsigned long size, int nid, unsigned long zone
page = pfn_to_page(pfn);
__init_single_page(page, pfn, zone, nid);
if (context == MEMINIT_HOTPLUG)
__SetPageReserved(page);
if (context == MEMINIT_HOTPLUG) {
#ifdef CONFIG_ZONE_DEVICE
if (zone == ZONE_DEVICE)
__SetPageReserved(page);
else
#endif
__SetPageOffline(page);
}
/*
* Usually, we want to mark the pageblock MIGRATE_MOVABLE,

View File

@ -1230,18 +1230,23 @@ void __free_pages_core(struct page *page, unsigned int order,
* When initializing the memmap, __init_single_page() sets the refcount
* of all pages to 1 ("allocated"/"not free"). We have to set the
* refcount of all involved pages to 0.
*
* Note that hotplugged memory pages are initialized to PageOffline().
* Pages freed from memblock might be marked as reserved.
*/
prefetchw(p);
for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
prefetchw(p + 1);
__ClearPageReserved(p);
set_page_count(p, 0);
}
__ClearPageReserved(p);
set_page_count(p, 0);
if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) &&
unlikely(context == MEMINIT_HOTPLUG)) {
prefetchw(p);
for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
prefetchw(p + 1);
VM_WARN_ON_ONCE(PageReserved(p));
__ClearPageOffline(p);
set_page_count(p, 0);
}
VM_WARN_ON_ONCE(PageReserved(p));
__ClearPageOffline(p);
set_page_count(p, 0);
/*
* Freeing the page with debug_pagealloc enabled will try to
* unmap it; some archs don't like double-unmappings, so
@ -1250,6 +1255,15 @@ void __free_pages_core(struct page *page, unsigned int order,
debug_pagealloc_map_pages(page, nr_pages);
adjust_managed_page_count(page, nr_pages);
} else {
prefetchw(p);
for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
prefetchw(p + 1);
__ClearPageReserved(p);
set_page_count(p, 0);
}
__ClearPageReserved(p);
set_page_count(p, 0);
/* memblock adjusts totalram_pages() manually. */
atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
}