linux/mm
Barry Song bea67dcc5e mm: attempt to batch free swap entries for zap_pte_range()
Zhiguo reported that swap release could be a serious bottleneck during
process exits[1].  With mTHP, we have the opportunity to batch free swaps.

Thanks to the work of Chris and Kairui[2], I was able to achieve this
optimization with minimal code changes by building on their efforts.

If swap_count is 1, which is likely true as most anon memory are private,
we can free all contiguous swap slots all together.

Ran the below test program for measuring the bandwidth of munmap
using zRAM and 64KiB mTHP:

 #include <sys/mman.h>
 #include <sys/time.h>
 #include <stdlib.h>

 unsigned long long tv_to_ms(struct timeval tv)
 {
        return tv.tv_sec * 1000 + tv.tv_usec / 1000;
 }

 main()
 {
        struct timeval tv_b, tv_e;
        int i;
 #define SIZE 1024*1024*1024
        void *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (!p) {
                perror("fail to get memory");
                exit(-1);
        }

        madvise(p, SIZE, MADV_HUGEPAGE);
        memset(p, 0x11, SIZE); /* write to get mem */

        madvise(p, SIZE, MADV_PAGEOUT);

        gettimeofday(&tv_b, NULL);
        munmap(p, SIZE);
        gettimeofday(&tv_e, NULL);

        printf("munmap in bandwidth: %ld bytes/ms\n",
                        SIZE/(tv_to_ms(tv_e) - tv_to_ms(tv_b)));
 }

The result is as below (munmap bandwidth):
                mm-unstable  mm-unstable-with-patch
   round1       21053761      63161283
   round2       21053761      63161283
   round3       21053761      63161283
   round4       20648881      67108864
   round5       20648881      67108864

munmap bandwidth becomes 3X faster.

[1] https://lore.kernel.org/linux-mm/20240731133318.527-1-justinjiang@vivo.com/
[2] https://lore.kernel.org/linux-mm/20240730-swap-allocator-v5-0-cb9c148b9297@kernel.org/

[v-songbaohua@oppo.com: check all swaps belong to same swap_cgroup in swap_pte_batch()]
  Link: https://lkml.kernel.org/r/20240815215308.55233-1-21cnbao@gmail.com
[hughd@google.com: add mem_cgroup_disabled() check]
  Link: https://lkml.kernel.org/r/33f34a88-0130-5444-9b84-93198eeb50e7@google.com
[21cnbao@gmail.com: add missing zswap_invalidate()]
  Link: https://lkml.kernel.org/r/20240821054921.43468-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240807215859.57491-3-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:33 -07:00
..
damon mm/damon/lru_sort: adjust local variable to dynamic allocation 2024-09-01 20:25:45 -07:00
kasan kasan: fix bad call to unpoison_slab_object 2024-06-24 20:52:09 -07:00
kfence kfence: save freeing stack trace at calling time instead of freeing time 2024-09-01 20:26:12 -07:00
kmsan kmsan: do not pass NULL pointers as 0 2024-07-03 19:30:26 -07:00
backing-dev.c writeback: support retrieving per group debug writeback stats of bdi 2024-05-05 17:53:51 -07:00
balloon_compaction.c mm: remove MIGRATE_SYNC_NO_COPY mode 2024-07-03 19:30:00 -07:00
bootmem_info.c
cma_debug.c
cma_sysfs.c mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
cma.c mm/cma: change the addition of totalcma_pages in the cma_init_reserved_mem 2024-09-01 20:25:56 -07:00
cma.h mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
compaction.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
debug_page_alloc.c mm: page_alloc: consolidate free page accounting 2024-04-25 20:56:04 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: drop RANDOM_ORVALUE trick 2024-06-15 10:43:08 -07:00
debug.c mm/debug: print only page mapcount (excluding folio entire mapcount) in __dump_folio() 2024-05-05 17:53:31 -07:00
dmapool_test.c mm/dmapool: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
dmapool.c
early_ioremap.c
execmem.c mm/execmem, arch: convert remaining overrides of module_alloc to execmem 2024-05-14 00:31:43 -07:00
fadvise.c
fail_page_alloc.c mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC 2024-07-17 21:05:18 -07:00
failslab.c mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB 2024-07-17 21:05:18 -07:00
filemap.c filemap: add trace events for get_pages, map_pages, and fault 2024-09-01 20:26:10 -07:00
folio-compat.c mm: remove page_mapping() 2024-07-03 19:29:59 -07:00
gup_test.c
gup_test.h
gup.c mm: remove follow_page() 2024-09-01 20:26:01 -07:00
highmem.c mm/highmem: make nr_free_highpages() return "unsigned long" 2024-07-03 19:30:06 -07:00
hmm.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
huge_memory.c mm: override mTHP "enabled" defaults at kernel cmdline 2024-09-01 20:26:18 -07:00
hugetlb_cgroup.c mm: memcg: don't call propagate_protected_usage() needlessly 2024-09-01 20:25:50 -07:00
hugetlb_vmemmap.c mm/hugetlb_vmemmap: don't synchronize_rcu() without HVO 2024-09-01 20:25:45 -07:00
hugetlb_vmemmap.h
hugetlb.c mm/hugetlb_vmemmap: batch HVO work when demoting 2024-09-01 20:26:10 -07:00
hwpoison-inject.c mm/hwpoison: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
init-mm.c mm: Deprecate pasid field 2023-12-12 10:11:32 +01:00
internal.h mm: attempt to batch free swap entries for zap_pte_range() 2024-09-03 21:15:33 -07:00
interval_tree.c
io-mapping.c
ioremap.c
Kconfig mm: introduce numa_emulation 2024-09-03 21:15:31 -07:00
Kconfig.debug mm/slub: unify all sl[au]b parameters with "slab_$param" 2024-01-22 10:31:08 +01:00
khugepaged.c - 875fa64577 ("mm/hugetlb_vmemmap: fix race with speculative PFN 2024-07-21 17:15:46 -07:00
kmemleak.c kmemleak: enable tracking for percpu pointers 2024-09-01 20:25:49 -07:00
ksm.c mm/ksm: convert break_ksm() from walk_page_range_vma() to folio_walk 2024-09-01 20:26:02 -07:00
list_lru.c mm: list_lru: fix UAF for memory cgroup 2024-08-07 18:33:56 -07:00
maccess.c
madvise.c Random number generator updates for Linux 6.11-rc1. 2024-07-24 10:29:50 -07:00
Makefile mm: introduce numa_emulation 2024-09-03 21:15:31 -07:00
mapping_dirty_helpers.c
memblock.c mm: rework accept memory helpers 2024-09-01 20:26:07 -07:00
memcontrol-v1.c memcg: initiate deprecation of pressure_level 2024-09-01 20:26:21 -07:00
memcontrol-v1.h memcg: allocate v1 event percpu only on v1 deployment 2024-09-01 20:26:20 -07:00
memcontrol.c memcg: make PGPGIN and PGPGOUT v1 only 2024-09-01 20:26:20 -07:00
memfd.c mm/gup: introduce memfd_pin_folios() for pinning memfd folios 2024-07-12 15:52:09 -07:00
memory_hotplug.c mm/memory_hotplug: get rid of __ref 2024-09-01 20:25:56 -07:00
memory-failure.c mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu 2024-08-15 22:16:14 -07:00
memory-tiers.c memory tiering: introduce folio_use_access_time() check 2024-09-01 20:25:47 -07:00
memory.c mm/migrate: move common code to numa_migrate_check (was numa_migrate_prep) 2024-09-01 20:26:06 -07:00
mempolicy.c mm: improve code consistency with zonelist_* helper functions 2024-09-01 20:25:55 -07:00
mempool.c mm: fix xyz_noprof functions calling profiled functions 2024-06-05 19:19:26 -07:00
memremap.c mm: convert put_devmap_managed_page_refs() to put_devmap_managed_folio_refs() 2024-05-05 17:53:49 -07:00
memtest.c memtest: use {READ,WRITE}_ONCE in memory scanning 2024-03-13 12:12:21 -07:00
migrate_device.c mm: extend rmap flags arguments for folio_add_new_anon_rmap 2024-07-03 19:30:18 -07:00
migrate.c fs: remove calls to set and clear the folio error flag 2024-09-01 20:26:04 -07:00
mincore.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
mlock.c Random number generator updates for Linux 6.11-rc1. 2024-07-24 10:29:50 -07:00
mm_init.c mm: drop CONFIG_HAVE_ARCH_NODEDATA_EXTENSION 2024-09-03 21:15:28 -07:00
mm_slot.h
mmap_lock.c mm: mmap_lock: replace get_memcg_path_buf() with on-stack buffer 2024-07-03 19:30:26 -07:00
mmap.c mm: remove legacy install_special_mapping() code 2024-09-01 20:26:13 -07:00
mmu_gather.c mm/mmu_gather: improve cond_resched() handling with large folios and expensive page freeing 2024-02-22 15:27:17 -08:00
mmu_notifier.c mm: move internal core VMA manipulation functions to own file 2024-09-01 20:25:54 -07:00
mmzone.c mm: improve code consistency with zonelist_* helper functions 2024-09-01 20:25:55 -07:00
mprotect.c mm/mprotect: fix dax pud handlings 2024-09-01 20:26:10 -07:00
mremap.c mm: remove page_mkclean() 2024-07-03 19:30:17 -07:00
mseal.c mseal: fix is_madv_discard() 2024-08-15 22:16:13 -07:00
msync.c
nommu.c mm: remove follow_page() 2024-09-01 20:26:01 -07:00
numa_emulation.c mm: introduce numa_emulation 2024-09-03 21:15:31 -07:00
numa_memblks.c mm: make range-to-target_node lookup facility a part of numa_memblks 2024-09-03 21:15:32 -07:00
numa.c mm: make range-to-target_node lookup facility a part of numa_memblks 2024-09-03 21:15:32 -07:00
oom_kill.c memory: remove the now superfluous sentinel element from ctl_table array 2024-04-25 20:56:32 -07:00
page_alloc.c mm: accept to promo watermark 2024-09-01 20:26:07 -07:00
page_counter.c mm, memcg: cg2 memory{.swap,}.peak write handlers 2024-09-01 20:25:53 -07:00
page_ext.c mm: don't account memmap per-node 2024-08-15 22:16:14 -07:00
page_idle.c
page_io.c fs: remove calls to set and clear the folio error flag 2024-09-01 20:26:04 -07:00
page_isolation.c mm: page_isolation: handle unaccepted memory isolation 2024-09-01 20:26:07 -07:00
page_owner.c mm/page-owner: use gfp_nested_mask() instead of open coded masking 2024-05-19 14:40:44 -07:00
page_poison.c mm/page_poison: replace kmap_atomic() with kmap_local_page() 2023-12-10 16:51:50 -08:00
page_reporting.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
page_reporting.h
page_table_check.c mm/page_table_check: fix crash on ZONE_DEVICE 2024-06-15 10:43:04 -07:00
page_vma_mapped.c mm: make page_mapped_in_vma conditional on CONFIG_MEMORY_FAILURE 2024-05-05 17:53:45 -07:00
page-writeback.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
pagewalk.c mm/pagewalk: introduce folio_walk_start() + folio_walk_end() 2024-09-01 20:25:59 -07:00
percpu-internal.h mm: remove CONFIG_MEMCG_KMEM 2024-07-10 12:14:54 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c percpu: clean up all mappings when pcpu_map_pages() fails 2024-04-25 20:55:49 -07:00
percpu.c percpu: remove pcpu_alloc_size() 2024-09-01 20:26:04 -07:00
pgalloc-track.h
pgtable-generic.c mm: fix race between __split_huge_pmd_locked() and GUP-fast 2024-05-07 10:37:00 -07:00
process_vm_access.c mm: fix process_vm_rw page counts 2023-12-10 16:51:39 -08:00
ptdump.c mm: ptdump: add check_wx_pages debugfs attribute 2024-02-22 10:24:47 -08:00
readahead.c Merge branch 'mm-hotfixes-stable' into mm-stable to pick up "mm: fix 2024-07-06 11:44:41 -07:00
rmap.c mm/rmap: minimize folio->_nr_pages_mapped updates when batching PTE (un)mapping 2024-09-01 20:26:04 -07:00
rodata_test.c
secretmem.c
shmem_quota.c shmem_quota: build the object file conditionally to the config option 2024-09-01 20:25:45 -07:00
shmem.c mm: shmem: move shmem_huge_global_enabled() into shmem_allowable_huge_orders() 2024-09-01 20:25:44 -07:00
show_mem.c lib: add memory allocations report in show_mem() 2024-04-25 20:55:57 -07:00
shrinker_debug.c
shrinker.c mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info() 2024-01-05 09:58:32 -08:00
shuffle.c
shuffle.h mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
slab_common.c - 875fa64577 ("mm/hugetlb_vmemmap: fix race with speculative PFN 2024-07-21 17:15:46 -07:00
slab.h - 875fa64577 ("mm/hugetlb_vmemmap: fix race with speculative PFN 2024-07-21 17:15:46 -07:00
slub.c mm, slub: do not call do_slab_free for kfence object 2024-07-30 11:50:00 +02:00
sparse-vmemmap.c mm: don't account memmap per-node 2024-08-15 22:16:14 -07:00
sparse.c mm: don't account memmap per-node 2024-08-15 22:16:14 -07:00
swap_cgroup.c mm: attempt to batch free swap entries for zap_pte_range() 2024-09-03 21:15:33 -07:00
swap_slots.c mm: swap: update get_swap_pages() to take folio order 2024-04-25 20:56:37 -07:00
swap_state.c mm: return the folio from swapin_readahead 2024-09-01 20:26:05 -07:00
swap.c mm/swap: take folio refcount after testing the LRU flag 2024-09-01 20:26:10 -07:00
swap.h mm: return the folio from swapin_readahead 2024-09-01 20:26:05 -07:00
swapfile.c mm: attempt to batch free swap entries for zap_pte_range() 2024-09-03 21:15:33 -07:00
truncate.c mm: Fix missing folio invalidation calls during truncation 2024-08-24 16:09:16 +02:00
usercopy.c
userfaultfd.c userfaultfd: move core VMA manipulation logic to mm/userfaultfd.c 2024-09-01 20:25:53 -07:00
util.c mm: only enforce minimum stack gap size if it's sensible 2024-09-01 20:26:02 -07:00
vma_internal.h mm: remove duplicated include in vma_internal.h 2024-09-01 20:26:02 -07:00
vma.c mm: remove arch_unmap() 2024-09-01 20:26:13 -07:00
vma.h mm: move internal core VMA manipulation functions to own file 2024-09-01 20:25:54 -07:00
vmalloc.c mm: vmalloc: add optimization hint on page existence check 2024-09-01 20:26:08 -07:00
vmpressure.c
vmscan.c memcg: use ratelimited stats flush in the reclaim 2024-09-01 20:26:13 -07:00
vmstat.c mm: print the promo watermark in zoneinfo 2024-09-01 20:25:59 -07:00
workingset.c cachestat: do not flush stats in recency check 2024-07-03 22:40:37 -07:00
z3fold.c mm/z3fold: add __percpu annotation to *unbuddied pointer in struct z3fold_pool 2024-09-01 20:25:56 -07:00
zbud.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zpool.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zsmalloc.c minmax: make generic MIN() and MAX() macros available everywhere 2024-07-28 15:49:18 -07:00
zswap.c zswap: implement a second chance algorithm for dynamic zswap shrinker 2024-09-01 20:26:02 -07:00