linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-15 00:04:15 +08:00

Ryan Roberts 845982eb26 mm: swap: allow storage of all mTHP orders		2024-04-25 20:56:37 -07:00

Multi-size THP enables performance improvements by allocating large, pte-mapped folios for anonymous memory. However, I've observed that on an arm64 system running a parallel workload (e.g. kernel compilation) across many cores, under high memory pressure, the speed regresses. This is due to bottlenecking on the increased number of TLBIs added due to all the extra folio splitting when the large folios are swapped out.

Therefore, solve this regression by adding support for swapping out mTHP without needing to split the folio, just like is already done for PMD-sized THP. This change only applies when CONFIG_THP_SWAP is enabled, and when the swap backing store is a non-rotating block device. These are the same constraints as for the existing PMD-sized THP swap-out support.

Note that no attempt is made to swap-in (m)THP here - this is still done page-by-page, like for PMD-sized THP. But swapping-out mTHP is a prerequisite for swapping-in mTHP.

The main change here is to improve the swap entry allocator so that it can allocate any power-of-2 number of contiguous entries between [1, (1 << PMD_ORDER)]. This is done by allocating a cluster for each distinct order and allocating sequentially from it until the cluster is full. This ensures that we don't need to search the map and we get no fragmentation due to alignment padding for different orders in the cluster. If there is no current cluster for a given order, we attempt to allocate a free cluster from the list. If there are no free clusters, we fail the allocation and the caller can fall back to splitting the folio and allocating individual entries (as per existing PMD-sized THP fallback).

The per-order current clusters are maintained per-cpu using the existing infrastructure. This is done to avoid interleaving pages from different tasks, which would prevent IO being batched. This is already done for the order-0 allocations so we follow the same pattern.

As is done for order-0 per-cpu clusters, the scanner now can steal order-0 entries from any per-cpu-per-order reserved cluster. This ensures that when the swap file is getting full, space doesn't get tied up in the per-cpu reserves.

This change only modifies swap to be able to accept any order mTHP. It doesn't change the callers to elide doing the actual split. That will be done in separate changes.

Link: https://lkml.kernel.org/r/20240408183946.2991168-6-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
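
The allocation scheme described in the commit message (one current cluster per order, sequential allocation until the cluster is full, failure when no free cluster is left) can be illustrated with a small userspace sketch. This is not the code in swapfile.c: the type `swap_sketch`, the functions `sketch_init()`, `take_free_cluster()` and `sketch_alloc()`, and the constants below are all hypothetical names chosen for illustration. The sketch also leaves out the per-cpu bookkeeping, locking, and the scanner that steals order-0 entries.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sizes only; these are assumptions, not the kernel's values. */
#define CLUSTER_SIZE 16u            /* entries per cluster, a power of 2 */
#define NR_CLUSTERS  4u
#define MAX_ORDER    4u             /* 1 << MAX_ORDER == CLUSTER_SIZE */
#define NO_CLUSTER   (~0u)
#define ALLOC_FAILED (~0u)

struct swap_sketch {
    bool cluster_free[NR_CLUSTERS];           /* stands in for the free list */
    unsigned int cur_cluster[MAX_ORDER + 1];  /* current cluster per order */
    unsigned int next_off[MAX_ORDER + 1];     /* next offset in that cluster */
};

static void sketch_init(struct swap_sketch *s)
{
    for (unsigned int i = 0; i < NR_CLUSTERS; i++)
        s->cluster_free[i] = true;
    for (unsigned int o = 0; o <= MAX_ORDER; o++) {
        s->cur_cluster[o] = NO_CLUSTER;
        s->next_off[o] = 0;
    }
}

/* Take the first free cluster, or return NO_CLUSTER if none is left. */
static unsigned int take_free_cluster(struct swap_sketch *s)
{
    for (unsigned int i = 0; i < NR_CLUSTERS; i++) {
        if (s->cluster_free[i]) {
            s->cluster_free[i] = false;
            return i;
        }
    }
    return NO_CLUSTER;
}

/*
 * Allocate (1 << order) contiguous entries. Returns the index of the first
 * entry, or ALLOC_FAILED, in which case a caller would fall back to
 * splitting the folio and allocating order-0 entries.
 */
static unsigned int sketch_alloc(struct swap_sketch *s, unsigned int order)
{
    assert(order <= MAX_ORDER);
    unsigned int nr = 1u << order;

    if (s->cur_cluster[order] == NO_CLUSTER) {
        unsigned int c = take_free_cluster(s);
        if (c == NO_CLUSTER)
            return ALLOC_FAILED;
        s->cur_cluster[order] = c;
        s->next_off[order] = 0;
    }

    /*
     * A cluster only ever serves one order, so the offset is always a
     * multiple of nr: no map search and no alignment padding are needed.
     */
    unsigned int first = s->cur_cluster[order] * CLUSTER_SIZE + s->next_off[order];

    s->next_off[order] += nr;
    if (s->next_off[order] == CLUSTER_SIZE)
        s->cur_cluster[order] = NO_CLUSTER;   /* full: pick a new one next time */
    return first;
}
```

In this sketch, two order-2 requests land back to back in the same cluster, while an order-0 request in between is served from a different cluster, which is the property the commit message relies on to avoid padding between orders.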
..
damon	mm: madvise: pageout: ignore references rather than clearing young	2024-03-04 17:01:18 -08:00
kasan	fix missing vmalloc.h includes	2024-04-25 20:55:49 -07:00
kfence	mm: introduce slabobj_ext to support slab object extensions	2024-04-25 20:55:51 -07:00
kmsan	mm: kmsan: remove runtime checks from kmsan_unpoison_memory()	2024-02-22 10:24:41 -08:00
backing-dev.c	mm: backing-dev: use group allocation/free of per-cpu counters API	2024-04-25 20:56:12 -07:00
balloon_compaction.c
bootmem_info.c	bootmem: use kmemleak_free_part_phys in put_page_bootmem	2023-10-25 16:47:13 -07:00
cma_debug.c
cma_sysfs.c	mm/cma: add sysfs file 'release_pages_success'	2024-02-22 10:24:57 -08:00
cma.c	mm/cma: add sysfs file 'release_pages_success'	2024-02-22 10:24:57 -08:00
cma.h	mm/cma: add sysfs file 'release_pages_success'	2024-02-22 10:24:57 -08:00
compaction.c	memory: remove the now superfluous sentinel element from ctl_table array	2024-04-25 20:56:32 -07:00
debug_page_alloc.c	mm: page_alloc: consolidate free page accounting	2024-04-25 20:56:04 -07:00
debug_page_ref.c
debug_vm_pgtable.c	fix missing vmalloc.h includes	2024-04-25 20:55:49 -07:00
debug.c	mm: switch mm->get_unmapped_area() to a flag	2024-04-25 20:56:25 -07:00
dmapool_test.c	dmapool: add alloc/free performance test	2023-04-05 19:42:38 -07:00
dmapool.c	mm/mempool/dmapool: remove CONFIG_DEBUG_SLAB ifdefs	2023-12-05 11:17:58 +01:00
early_ioremap.c	mm/early_ioremap.c: improve the execution efficiency of early_ioremap_setup()	2023-06-09 16:25:56 -07:00
fadvise.c	mm: remove unnecessary pagevec includes	2023-06-23 16:59:31 -07:00
fail_page_alloc.c	mm: page_alloc: split out FAIL_PAGE_ALLOC	2023-06-09 16:25:23 -07:00
failslab.c	mm: fix unexpected changes to {failslab|fail_page_alloc}.attr	2022-11-22 18:50:44 -08:00
filemap.c	mm/filemap: optimize filemap folio adding	2024-04-25 20:56:09 -07:00
folio-compat.c	mm: remove __set_page_dirty_nobuffers()	2024-04-25 20:56:25 -07:00
gup_test.c	Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes.	2023-06-23 16:58:19 -07:00
gup_test.h	mm/gup_test: start/stop/read functionality for PIN LONGTERM test	2022-11-08 17:37:15 -08:00
gup.c	mm/gup: handle hugetlb in the generic follow_page_mask code	2024-04-25 20:56:23 -07:00
highmem.c	x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs	2023-12-29 12:22:28 -08:00
hmm.c	mm/treewide: replace pXd_huge() with pXd_leaf()	2024-04-25 20:55:46 -07:00
huge_memory.c	mm: swap: remove CLUSTER_FLAG_HUGE from swap_cluster_info:flags	2024-04-25 20:56:36 -07:00
hugetlb_cgroup.c	mm, hugetlb: remove HUGETLB_CGROUP_MIN_ORDER	2023-10-18 14:34:17 -07:00
hugetlb_vmemmap.c	memory: remove the now superfluous sentinel element from ctl_table array	2024-04-25 20:56:32 -07:00
hugetlb_vmemmap.h	mm: hugetlb_vmemmap: fix reference to nonexistent file	2023-10-25 16:47:14 -07:00
hugetlb.c	memory: remove the now superfluous sentinel element from ctl_table array	2024-04-25 20:56:32 -07:00
hwpoison-inject.c	mm/hwpoison: add __init/__exit annotations to module init/exit funcs	2022-10-03 14:03:05 -07:00
init-mm.c	mm: Deprecate pasid field	2023-12-12 10:11:32 +01:00
internal.h	mm: swap: free_swap_and_cache_nr() as batched free_swap_and_cache()	2024-04-25 20:56:37 -07:00
interval_tree.c
io-mapping.c
ioremap.c	mm: ioremap: remove unneeded ioremap_allowed and iounmap_allowed	2023-08-18 10:12:36 -07:00
Kconfig	mm/Kconfig: CONFIG_PGTABLE_HAS_HUGE_LEAVES	2024-04-25 20:56:20 -07:00
Kconfig.debug	mm/slub: unify all sl[au]b parameters with "slab_$param"	2024-01-22 10:31:08 +01:00
khugepaged.c	khugepaged: use a folio throughout hpage_collapse_scan_file()	2024-04-25 20:56:34 -07:00
kmemleak.c	mm/kmemleak: compact kmemleak_object further	2024-04-25 20:56:05 -07:00
ksm.c	mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]()	2023-12-29 11:58:56 -08:00
list_lru.c	mm/zswap: stop lru list shrinking when encounter warm region	2024-02-22 10:24:54 -08:00
maccess.c	mm: Fix copy_from_user_nofault().	2023-04-12 17:36:23 -07:00
madvise.c	mm: swap: free_swap_and_cache_nr() as batched free_swap_and_cache()	2024-04-25 20:56:37 -07:00
Makefile	mm/kmemleak: disable KASAN instrumentation in kmemleak	2024-04-25 20:56:05 -07:00
mapping_dirty_helpers.c	mm: fix clean_record_shared_mapping_range kernel-doc	2023-08-24 16:20:30 -07:00
memblock.c	cxl fixes for 6.8-rc6	2024-02-24 15:53:40 -08:00
memcontrol.c	mm, slab: move slab_memcg hooks to mm/memcontrol.c	2024-04-25 20:56:16 -07:00
memfd.c	mm/memfd: refactor memfd_tag_pins() and memfd_wait_for_pins()	2024-03-04 17:01:21 -08:00
memory_hotplug.c	mm: record the migration reason for struct migration_target_control	2024-04-25 20:56:06 -07:00
memory-failure.c	memory: remove the now superfluous sentinel element from ctl_table array	2024-04-25 20:56:32 -07:00
memory-tiers.c	mm/demotion: print demotion targets	2024-02-22 10:24:55 -08:00
memory.c	mm: swap: free_swap_and_cache_nr() as batched free_swap_and_cache()	2024-04-25 20:56:37 -07:00
mempolicy.c	mm: add pmd_folio()	2024-04-25 20:56:19 -07:00
mempool.c	mempool: hook up to memory allocation profiling	2024-04-25 20:55:56 -07:00
memremap.c	mm: remove stale example from comment	2023-12-29 11:58:26 -08:00
memtest.c	memtest: use {READ,WRITE}_ONCE in memory scanning	2024-03-13 12:12:21 -07:00
migrate_device.c	mm: convert migrate_vma_collect_pmd to use a folio	2024-04-25 20:56:19 -07:00
migrate.c	remove references to page->flags in documentation	2024-04-25 20:56:15 -07:00
mincore.c	mm: enable page walking API to lock vmas during the walk	2023-08-21 13:07:20 -07:00
mlock.c	mm: add pmd_folio()	2024-04-25 20:56:19 -07:00
mm_init.c	mm: init_mlocked_on_free_v3	2024-04-25 20:56:29 -07:00
mm_slot.h	mm: introduce common struct mm_slot	2022-10-03 14:02:43 -07:00
mmap_lock.c
mmap.c	mm: take placement mappings gap into account	2024-04-25 20:56:28 -07:00
mmu_gather.c	mm/mmu_gather: improve cond_resched() handling with large folios and expensive page freeing	2024-02-22 15:27:17 -08:00
mmu_notifier.c	mmu_notifiers: rename invalidate_range notifier	2023-08-18 10:12:41 -07:00
mmzone.c	zswap: shrink zswap pool based on memory pressure	2023-12-12 10:57:02 -08:00
mprotect.c	mm: support multi-size THP numa balancing	2024-04-25 20:56:30 -07:00
mremap.c	mm: remove "prot" parameter from move_pte()	2024-04-25 20:56:24 -07:00
msync.c	mm/msync: use vma_find() instead of vma linked list	2022-09-26 19:46:25 -07:00
nommu.c	mm: remove follow_pfn	2024-04-25 20:56:12 -07:00
oom_kill.c	memory: remove the now superfluous sentinel element from ctl_table array	2024-04-25 20:56:32 -07:00
page_alloc.c	mm: page_alloc: use the correct THP order for THP PCP	2024-04-25 20:56:36 -07:00
page_counter.c
page_ext.c	mm: make page_ext_get() take a const argument	2024-04-25 20:56:14 -07:00
page_idle.c	mm: page_idle: convert page idle to use a folio	2023-01-18 17:12:52 -08:00
page_io.c	arm64: mm: swap: support THP_SWAP on hardware with MTE	2024-04-25 20:56:07 -07:00
page_isolation.c	mm: page_isolation: prepare for hygienic freelists	2024-04-25 20:56:04 -07:00
page_owner.c	mm: introduce slabobj_ext to support slab object extensions	2024-04-25 20:55:51 -07:00
page_poison.c	mm/page_poison: replace kmap_atomic() with kmap_local_page()	2023-12-10 16:51:50 -08:00
page_reporting.c	mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER	2024-01-08 15:27:15 -08:00
page_reporting.h
page_table_check.c	mm: convert page_table_check_pte_set() to page_table_check_ptes_set()	2023-08-24 16:20:18 -07:00
page_vma_mapped.c	mm: rename vma_pgoff_address back to vma_address	2024-04-25 20:56:31 -07:00
page-writeback.c	memory: remove the now superfluous sentinel element from ctl_table array	2024-04-25 20:56:32 -07:00
pagewalk.c	mm: pagewalk: assert write mmap lock only for walking the user page tables	2023-12-10 16:51:53 -08:00
percpu-internal.h	mm: percpu: add codetag reference into pcpuobj_ext	2024-04-25 20:55:56 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c	percpu: clean up all mappings when pcpu_map_pages() fails	2024-04-25 20:55:49 -07:00
percpu.c	mm: percpu: enable per-cpu allocation tagging	2024-04-25 20:55:56 -07:00
pgalloc-track.h
pgtable-generic.c	mm/pgtable: notes on pte_offset_map[_lock]()	2023-08-18 10:12:25 -07:00
process_vm_access.c	mm: fix process_vm_rw page counts	2023-12-10 16:51:39 -08:00
ptdump.c	mm: ptdump: add check_wx_pages debugfs attribute	2024-02-22 10:24:47 -08:00
readahead.c	mm/readahead: break read-ahead loop if filemap_add_folio return -ENOMEM	2024-04-25 20:56:07 -07:00
rmap.c	mm: rename vma_pgoff_address back to vma_address	2024-04-25 20:56:31 -07:00
rodata_test.c	mm/rodata_test: use PAGE_ALIGNED() helper	2022-10-03 14:03:05 -07:00
secretmem.c	mm/secretmem: use a folio in secretmem_fault()	2023-08-21 13:38:02 -07:00
shmem_quota.c	tmpfs: fix race on handling dquot rbtree	2024-03-26 11:07:23 -07:00
shmem.c	mm: switch mm->get_unmapped_area() to a flag	2024-04-25 20:56:25 -07:00
show_mem.c	lib: add memory allocations report in show_mem()	2024-04-25 20:55:57 -07:00
shrinker_debug.c	mm: shrinker: convert shrinker_rwsem to mutex	2023-10-04 10:32:26 -07:00
shrinker.c	mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info()	2024-01-05 09:58:32 -08:00
shuffle.c	mm/shuffle: convert module_param_call to module_param_cb	2022-10-03 14:03:07 -07:00
shuffle.h	mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER	2024-01-08 15:27:15 -08:00
slab_common.c	mm/slab: enable slab allocation tagging for kmalloc and friends	2024-04-25 20:55:55 -07:00
slab.h	mm, slab: move slab_memcg hooks to mm/memcontrol.c	2024-04-25 20:56:16 -07:00
slub.c	mm, slab: move slab_memcg hooks to mm/memcontrol.c	2024-04-25 20:56:16 -07:00
sparse-vmemmap.c	mm/vmemmap: allow architectures to override how vmemmap optimization works	2023-08-18 10:12:53 -07:00
sparse.c	mm: move array mem_section init code out of memory_present()	2024-04-25 20:56:16 -07:00
swap_cgroup.c	mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled	2022-10-03 14:03:36 -07:00
swap_slots.c	mm: swap: update get_swap_pages() to take folio order	2024-04-25 20:56:37 -07:00
swap_state.c	mm: add is_huge_zero_folio()	2024-04-25 20:56:18 -07:00
swap.c	mm: add is_huge_zero_folio()	2024-04-25 20:56:18 -07:00
swap.h	mm/swap: fix race when skipping swapcache	2024-02-20 14:20:48 -08:00
swapfile.c	mm: swap: allow storage of all mTHP orders	2024-04-25 20:56:37 -07:00
truncate.c	fs: convert error_remove_page to error_remove_folio	2023-12-10 16:51:42 -08:00
usercopy.c	mm: Fix copy_from_user_nofault().	2023-04-12 17:36:23 -07:00
userfaultfd.c	mm: add pmd_folio()	2024-04-25 20:56:19 -07:00
util.c	mm: switch mm->get_unmapped_area() to a flag	2024-04-25 20:56:25 -07:00
vmalloc.c	mm/vmalloc.c: optimize to reduce arguments of alloc_vmap_area()	2024-04-25 20:56:08 -07:00
vmpressure.c	eventfd: simplify eventfd_signal()	2023-11-28 14:08:38 +01:00
vmscan.c	mm: hold PTL from the first PTE while reclaiming a large folio	2024-04-25 20:56:08 -07:00
vmstat.c	mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER	2024-01-08 15:27:15 -08:00
workingset.c	mm: move mapping_set_update out of <linux/swap.h>	2024-02-21 11:36:50 +05:30
z3fold.c	mm: zpool: return pool size in pages	2024-04-25 20:55:48 -07:00
zbud.c	mm: zpool: return pool size in pages	2024-04-25 20:55:48 -07:00
zpool.c	mm: zpool: return pool size in pages	2024-04-25 20:55:48 -07:00
zsmalloc.c	mm: zpool: return pool size in pages	2024-04-25 20:55:48 -07:00
zswap.c	zswap: replace RB tree with xarray	2024-04-25 20:56:18 -07:00