linux/mm
Nadav Amit d84887739d mm/mprotect: allow clean exclusive anon pages to be writable
Patch series "mm/autonuma: replace savedwrite infrastructure", v2.

As discussed in my talk at LPC, we can reuse the same mechanism for
deciding whether to map a pte writable when upgrading permissions via
mprotect() -- e.g., PROT_READ -> PROT_READ|PROT_WRITE -- to replace the
savedwrite infrastructure used for NUMA hinting faults (e.g., PROT_NONE ->
PROT_READ|PROT_WRITE).

Instead of maintaining previous write permissions for a pte/pmd, we
re-determine if the pte/pmd can be writable.  The big benefit is that we
have a common logic for deciding whether we can map a pte/pmd writable on
protection changes.

For private mappings, there should be no difference -- from what I
understand, that is what autonuma benchmarks care about.

I ran autonumabench for v1 on a system with 2 NUMA nodes, 96 GiB each via:
	perf stat --null --repeat 10
The numa01 benchmark is quite noisy in my environment and I failed to
reduce the noise so far.

numa01:
	mm-unstable:   146.88 +- 6.54 seconds time elapsed  ( +-  4.45% )
	mm-unstable++: 147.45 +- 13.39 seconds time elapsed  ( +-  9.08% )

numa02:
	mm-unstable:   16.0300 +- 0.0624 seconds time elapsed  ( +-  0.39% )
	mm-unstable++: 16.1281 +- 0.0945 seconds time elapsed  ( +-  0.59% )

It is worth noting that for shared writable mappings that require
writenotify, we will only avoid write faults if the pte/pmd is dirty
(inherited from the older mprotect logic).  If we ever care about
optimizing that further, we'd need a different mechanism to identify
whether the FS still needs to get notified on the next write access.

In any case, such an optimization will then not be autonuma-specific, but
mprotect() permission upgrades would similarly benefit from it.


This patch (of 7):

Anonymous pages might have the dirty bit clear, but this should not
prevent mprotect from making them writable if they are exclusive. 
Therefore, skip the test whether the page is dirty in this case.

Note that there are already other ways to get a writable PTE mapping an
anonymous page that is clean: for example, via MADV_FREE.  In an ideal
world, we'd have a different indication from the FS whether writenotify is
still required.

[david@redhat.com: return directly; update description]
Link: https://lkml.kernel.org/r/20221108174652.198904-1-david@redhat.com
Link: https://lkml.kernel.org/r/20221108174652.198904-2-david@redhat.com
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 15:58:48 -08:00
..
damon mm/damon: use kstrtobool() instead of strtobool() 2022-11-30 15:58:45 -08:00
kasan memory: move hotplug memory notifier priority to same file for easy sorting 2022-11-08 17:37:17 -08:00
kfence kfence: fix stack trace pruning 2022-11-22 18:50:44 -08:00
kmsan kmsan: core: kmsan_in_runtime() should return true in NMI context 2022-11-08 15:57:24 -08:00
backing-dev.c mm: backing-dev: Remove the unneeded result variable 2022-09-11 20:26:02 -07:00
balloon_compaction.c mm: Convert all PageMovable users to movable_operations 2022-08-02 12:34:03 -04:00
bootmem_info.c bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem 2022-08-28 14:02:45 -07:00
cma_debug.c mm/cma_debug: show complete cma name in debugfs directories 2022-09-11 20:25:50 -07:00
cma_sysfs.c
cma.c Revert "mm/cma.c: remove redundant cma_mutex lock" 2022-05-13 15:11:26 -07:00
cma.h mm/cma: provide option to opt out from exposing pages on activation failure 2022-03-22 15:57:09 -07:00
compaction.c mm: migrate: fix THP's mapcount on isolation 2022-11-30 14:49:41 -08:00
debug_page_ref.c
debug_vm_pgtable.c mm: debug_vm_pgtable: use VM_ACCESS_FLAGS 2022-11-08 17:37:19 -08:00
debug.c mm,thp,rmap: simplify compound page mapcount handling 2022-11-30 15:58:46 -08:00
dmapool.c
early_ioremap.c mm/early_ioremap: declare early_memremap_pgprot_adjust() 2022-03-22 15:57:11 -07:00
fadvise.c riscv: compat: syscall: Add compat_sys_call_table implementation 2022-04-26 13:36:25 -07:00
failslab.c mm: fix unexpected changes to {failslab|fail_page_alloc}.attr 2022-11-22 18:50:44 -08:00
filemap.c filemap: find_get_entries() now updates start offset 2022-11-08 17:37:12 -08:00
folio-compat.c mm,thp,rmap: simplify compound page mapcount handling 2022-11-30 15:58:46 -08:00
frontswap.c frontswap: don't call ->init if no ops are registered 2022-09-26 12:14:34 -07:00
gup_test.c mm/gup_test: start/stop/read functionality for PIN LONGTERM test 2022-11-08 17:37:15 -08:00
gup_test.h mm/gup_test: start/stop/read functionality for PIN LONGTERM test 2022-11-08 17:37:15 -08:00
gup.c hugetlb: simplify hugetlb handling in follow_page_mask 2022-11-08 17:37:10 -08:00
highmem.c highmem: fix kmap_to_page() for kmap_local_page() addresses 2022-10-12 18:51:51 -07:00
hmm.c mm/swap: add swp_offset_pfn() to fetch PFN from swap entry 2022-09-26 19:46:05 -07:00
huge_memory.c mm,thp,rmap: clean up the end of __split_huge_pmd_locked() 2022-11-30 15:58:48 -08:00
hugetlb_cgroup.c mm/hugeltb_cgroup: convert hugetlb_cgroup_commit_charge*() to folios 2022-11-30 15:58:43 -08:00
hugetlb_vmemmap.c mm/hugetlb_vmemmap: remap head page to newly allocated page 2022-11-30 15:58:47 -08:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability 2022-08-08 18:06:43 -07:00
hugetlb.c mm,thp,rmap: simplify compound page mapcount handling 2022-11-30 15:58:46 -08:00
hwpoison-inject.c mm/hwpoison: add __init/__exit annotations to module init/exit funcs 2022-10-03 14:03:05 -07:00
init-mm.c mm: remove rb tree. 2022-09-26 19:46:16 -07:00
internal.h mm/hwpoison: introduce per-memory_block hwpoison counter 2022-11-08 17:37:22 -08:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: Add ioremap/iounmap_allowed() 2022-06-27 12:22:31 +01:00
Kconfig mm,hugetlb: use folio fields in second tail page 2022-11-30 15:58:46 -08:00
Kconfig.debug Two followon fixes for the post-5.19 series "Use pageblock_order for cma 2022-05-27 11:40:49 -07:00
khugepaged.c mm,thp,rmap: simplify compound page mapcount handling 2022-11-30 15:58:46 -08:00
kmemleak.c mm/kmemleak: prevent soft lockup in kmemleak_scan()'s object iteration loops 2022-10-28 13:37:22 -07:00
ksm.c memory: move hotplug memory notifier priority to same file for easy sorting 2022-11-08 17:37:17 -08:00
list_lru.c mm: kmem: make mem_cgroup_from_obj() vmalloc()-safe 2022-06-16 19:48:31 -07:00
maccess.c asm-generic updates for 5.18 2022-03-23 18:03:08 -07:00
madvise.c madvise: use zap_page_range_single for madvise dontneed 2022-11-30 14:49:40 -08:00
Makefile mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol 2022-10-03 14:03:36 -07:00
mapping_dirty_helpers.c
memblock.c mm: add pageblock_align() macro 2022-10-03 14:03:04 -07:00
memcontrol.c mm: vmscan: split khugepaged stats from direct reclaim stats 2022-11-30 15:58:41 -08:00
memfd.c
memory_hotplug.c mm: add pageblock_aligned() macro 2022-10-03 14:03:04 -07:00
memory-failure.c mm,hugetlb: use folio fields in second tail page 2022-11-30 15:58:46 -08:00
memory-tiers.c memory: move hotplug memory notifier priority to same file for easy sorting 2022-11-08 17:37:17 -08:00
memory.c mm: use pte markers for swap errors 2022-11-30 15:58:46 -08:00
mempolicy.c mm/mempolicy: fix mbind_range() arguments to vma_merge() 2022-10-20 21:27:21 -07:00
mempool.c mempool: do not use ksize() for poisoning 2022-11-30 15:58:41 -08:00
memremap.c mm/memremap.c: map FS_DAX device memory as decrypted 2022-11-08 15:57:23 -08:00
memtest.c
migrate_device.c mm/migrate_device: return number of migrating pages in args->cpages 2022-11-22 18:50:43 -08:00
migrate.c mm/hugetlb: convert move_hugetlb_state() to folios 2022-11-30 15:58:43 -08:00
mincore.c mm: convert find_get_incore_page() to filemap_get_incore_folio() 2022-11-08 17:37:18 -08:00
mlock.c mm/mlock: drop dead code in count_mm_mlocked_page_nr() 2022-09-26 19:46:27 -07:00
mm_init.c memory: move hotplug memory notifier priority to same file for easy sorting 2022-11-08 17:37:17 -08:00
mm_slot.h mm: introduce common struct mm_slot 2022-10-03 14:02:43 -07:00
mmap_lock.c
mmap.c Merge branch 'mm-hotfixes-stable' into mm-stable 2022-11-30 14:58:42 -08:00
mmu_gather.c mm/khugepaged: fix GUP-fast interaction by sending IPI 2022-11-30 14:49:42 -08:00
mmu_notifier.c mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() 2022-04-21 20:01:10 -07:00
mmzone.c mm: multi-gen LRU: groundwork 2022-09-26 19:46:09 -07:00
mprotect.c mm/mprotect: allow clean exclusive anon pages to be writable 2022-11-30 15:58:48 -08:00
mremap.c mm: add merging after mremap resize 2022-09-26 19:46:28 -07:00
msync.c mm/msync: use vma_find() instead of vma linked list 2022-09-26 19:46:25 -07:00
nommu.c mm: remove the vma linked list 2022-09-26 19:46:26 -07:00
oom_kill.c mm: reduce noise in show_mem for lowmem allocations 2022-09-26 19:46:29 -07:00
page_alloc.c mm,thp,rmap: subpages_mapcount COMPOUND_MAPPED if PMD-mapped 2022-11-30 15:58:48 -08:00
page_counter.c mm: page_counter: remove unneeded atomic ops for low/min 2022-09-11 20:26:01 -07:00
page_ext.c Merge branch 'mm-hotfixes-stable' into mm-stable 2022-11-30 14:58:42 -08:00
page_idle.c mm: don't be stuck to rmap lock on reclaim path 2022-05-19 14:08:54 -07:00
page_io.c swap: convert swap_writepage() to use a folio 2022-10-03 14:02:52 -07:00
page_isolation.c mm/page_isolation: fix clang deadcode warning 2022-10-28 13:37:22 -07:00
page_owner.c mm: reuse pageblock_start/end_pfn() macro 2022-10-03 14:03:03 -07:00
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c mm: use kstrtobool() instead of strtobool() 2022-11-30 15:58:45 -08:00
page_vma_mapped.c mm/swap: add swp_offset_pfn() to fetch PFN from swap entry 2022-09-26 19:46:05 -07:00
page-writeback.c mm: export balance_dirty_pages_ratelimited_flags() 2022-09-26 12:28:07 +02:00
pagewalk.c - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
percpu-internal.h percpu: improve percpu_alloc_percpu event trace 2022-05-13 07:20:18 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c mm: percpu: use kmemleak_ignore_phys() instead of kmemleak_free() 2022-07-17 17:14:47 -07:00
pgalloc-track.h
pgtable-generic.c mm: avoid unnecessary flush on change_huge_pmd() 2022-05-13 07:20:05 -07:00
process_vm_access.c
ptdump.c mm: pagewalk: Fix race between unmap and page walker 2022-09-03 10:13:13 -07:00
readahead.c mm: add PSI accounting around ->read_folio and ->readahead calls 2022-09-20 08:24:38 -06:00
rmap.c mm,thp,rmap: subpages_mapcount COMPOUND_MAPPED if PMD-mapped 2022-11-30 15:58:48 -08:00
rodata_test.c mm/rodata_test: use PAGE_ALIGNED() helper 2022-10-03 14:03:05 -07:00
secretmem.c mm/secretmem: remove reduntant return value 2022-10-03 14:03:36 -07:00
shmem.c mm: use pte markers for swap errors 2022-11-30 15:58:46 -08:00
shrinker_debug.c mm: shrinkers: fix double kfree on shrinker name 2022-07-29 18:07:13 -07:00
shuffle.c mm/shuffle: convert module_param_call to module_param_cb 2022-10-03 14:03:07 -07:00
shuffle.h
slab_common.c - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
slab.c Random number generator fixes for Linux 6.1-rc1. 2022-10-16 15:27:07 -07:00
slab.h - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
slob.c Merge branch 'slab/for-6.1/kmalloc_size_roundup' into slab/for-next 2022-09-29 11:30:55 +02:00
slub.c mm/slub.c: use hotplug_memory_notifier() directly 2022-11-08 17:37:16 -08:00
sparse-vmemmap.c mm: hugetlb_vmemmap: move vmemmap code related to HugeTLB to hugetlb_vmemmap.c 2022-08-08 18:06:42 -07:00
sparse.c mm/hwpoison: introduce per-memory_block hwpoison counter 2022-11-08 17:37:22 -08:00
swap_cgroup.c mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled 2022-10-03 14:03:36 -07:00
swap_slots.c mm/swap: convert put_swap_page() to put_swap_folio() 2022-10-03 14:02:46 -07:00
swap_state.c mm: convert find_get_incore_page() to filemap_get_incore_folio() 2022-11-08 17:37:18 -08:00
swap.c swap: add a limit for readahead page-cluster value 2022-11-08 17:37:22 -08:00
swap.h mm: convert find_get_incore_page() to filemap_get_incore_folio() 2022-11-08 17:37:18 -08:00
swapfile.c mm: use pte markers for swap errors 2022-11-30 15:58:46 -08:00
truncate.c filemap: find_get_entries() now updates start offset 2022-11-08 17:37:12 -08:00
usercopy.c mm: use kstrtobool() instead of strtobool() 2022-11-30 15:58:45 -08:00
userfaultfd.c mm/shmem: use page_mapping() to detect page cache for uffd continue 2022-11-08 15:57:23 -08:00
util.c mm,thp,rmap: simplify compound page mapcount handling 2022-11-30 15:58:46 -08:00
vmalloc.c mm: vmalloc: use trace_free_vmap_area_noflush event 2022-11-08 17:37:17 -08:00
vmpressure.c
vmscan.c mm: vmscan: split khugepaged stats from direct reclaim stats 2022-11-30 15:58:41 -08:00
vmstat.c mm: vmscan: split khugepaged stats from direct reclaim stats 2022-11-30 15:58:41 -08:00
workingset.c mm: vmscan: make rotations a secondary factor in balancing anon vs file 2022-11-08 17:37:11 -08:00
z3fold.c mm: Convert all PageMovable users to movable_operations 2022-08-02 12:34:03 -04:00
zbud.c
zpool.c
zsmalloc.c zsmalloc: replace IS_ERR() with IS_ERR_VALUE() 2022-11-30 15:58:46 -08:00
zswap.c mm/swap: remove the end_write_func argument to __swap_writepage 2022-09-11 20:25:50 -07:00