Go to file
Lorenzo Stoakes 9b91432985 mm/mprotect: allow unfaulted VMAs to be unaccounted on mprotect()
When mprotect() is used to make unwritable VMAs writable, they have the
VM_ACCOUNT flag applied and memory accounted accordingly.

If the VMA has had no pages faulted in and is then made unwritable once
again, it will remain accounted for, despite not being capable of
extending memory usage.

Consider:-

ptr = mmap(NULL, page_size * 3, PROT_READ, MAP_ANON | MAP_PRIVATE, -1, 0);
mprotect(ptr + page_size, page_size, PROT_READ | PROT_WRITE);
mprotect(ptr + page_size, page_size, PROT_READ);

The first mprotect() splits the range into 3 VMAs and the second fails to
merge the three as the middle VMA has VM_ACCOUNT set and the others do
not, rendering them unmergeable.

This is unnecessary, since no pages have actually been allocated and the
middle VMA is not capable of utilising more memory, thereby introducing
unnecessary VMA fragmentation (and accounting for more memory than is
necessary).

Since we cannot efficiently determine which pages map to an anonymous VMA,
we have to be very conservative - determining whether any pages at all
have been faulted in, by checking whether vma->anon_vma is NULL.

We can see that the lack of anon_vma implies that no anonymous pages are
present as evidenced by vma_needs_copy() utilising this on fork to
determine whether page tables need to be copied.

The only place where anon_vma is set NULL explicitly is on fork with
VM_WIPEONFORK set, however since this flag is intended to cause the child
process to not CoW on a given memory range, it is right to interpret this
as indicating the VMA has no faulted-in anonymous memory mapped.

If the VMA was forked without VM_WIPEONFORK set, then anon_vma_fork() will
have ensured that a new anon_vma is assigned (and correctly related to its
parent anon_vma) should any pages be CoW-mapped.

The overall operation is safe against races as we hold a write lock against
mm->mmap_lock.

If we could efficiently look up the VMA's faulted-in pages then we would
unaccount all those pages not yet faulted in.  However as the original
comment alludes this simply isn't currently possible, so we are
conservative and account all pages or none at all.

Link: https://lkml.kernel.org/r/ad5540371a16623a069f03f4db1739f33cde1fab.1696921767.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 14:34:18 -07:00
arch mm: delete checks for xor_unlock_is_negative_byte() 2023-10-18 14:34:17 -07:00
block block: fix kernel-doc for disk_force_media_change() 2023-09-26 00:43:34 -06:00
certs certs: Reference revocation list for all keyrings 2023-08-17 20:12:41 +00:00
crypto crypto: sm2 - Fix crash caused by uninitialized context 2023-09-20 13:10:10 +08:00
Documentation hugetlb: memcg: account hugetlb-backed memory in memory controller 2023-10-18 14:34:17 -07:00
drivers dax, kmem: calculate abstract distance with general interface 2023-10-16 15:44:39 -07:00
fs iomap: use folio_end_read() 2023-10-18 14:34:16 -07:00
include mm: add printf attribute to shrinker_debugfs_name_alloc 2023-10-18 14:34:18 -07:00
init workqueue: Changes for v6.6 2023-09-01 16:06:32 -07:00
io_uring io_uring/fs: remove sqe->rw_flags checking from LINKAT 2023-09-29 03:07:09 -06:00
ipc Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
kernel hugetlb: memcg: account hugetlb-backed memory in memory controller 2023-10-18 14:34:17 -07:00
lib percpu_counter: extend _limited_add() to negative amounts 2023-10-18 14:34:14 -07:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
mm mm/mprotect: allow unfaulted VMAs to be unaccounted on mprotect() 2023-10-18 14:34:18 -07:00
net sunrpc: dynamically allocate the sunrpc_cred shrinker 2023-10-04 10:32:24 -07:00
rust Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
samples VFIO updates for v6.6-rc1 2023-08-30 20:36:01 -07:00
scripts kbuild: remove stale code for 'source' symlink in packaging scripts 2023-10-01 23:06:06 +09:00
security selinux: fix handling of empty opts in selinux_fs_context_submount() 2023-09-12 17:31:08 -04:00
sound ASoC: Fixes for v6.6 2023-09-20 15:02:16 +02:00
tools selftests: add a selftest to verify hugetlb usage in memcg 2023-10-18 14:34:18 -07:00
usr initramfs: Encode dependency on KBUILD_BUILD_TIMESTAMP 2023-06-06 17:54:49 +09:00
virt ARM: 2023-09-07 13:52:20 -07:00
.clang-format iommu: Add for_each_group_device() 2023-05-23 08:15:51 +02:00
.cocciconfig
.get_maintainer.ignore get_maintainer: add Alan to .get_maintainer.ignore 2022-08-20 15:17:44 -07:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore kbuild: rpm-pkg: rename binkernel.spec to kernel.spec 2023-07-25 00:59:33 +09:00
.mailmap mailmap: correct email aliasing for Oleksij Rempel 2023-10-18 12:12:41 -07:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS USB: Remove Wireless USB and UWB documentation 2023-08-09 14:17:32 +02:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS selftests: add a selftest to verify hugetlb usage in memcg 2023-10-18 14:34:18 -07:00
Makefile Linux 6.6-rc4 2023-10-01 14:15:13 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.