linux/Documentation
Zach O'Keefe 58ac9a8993 mm/khugepaged: attempt to map file/shmem-backed pte-mapped THPs by pmds
The main benefit of THPs is that they can be mapped at the pmd level,
increasing the likelihood of a TLB hit and spending fewer cycles in page
table walks.  pte-mapped hugepages - that is - hugepage-aligned compound
pages of order HPAGE_PMD_ORDER mapped by ptes - although being contiguous
in physical memory, don't have this advantage.  In fact, one could argue
they are detrimental to system performance overall since they occupy a
precious hugepage-aligned/sized region of physical memory that could
otherwise be used more effectively.  Additionally, pte-mapped hugepages
can be the cheapest memory to collapse for khugepaged since no new
hugepage allocation or copying of memory contents is necessary - we only
need to update the mapping page tables.
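
To make the sizes concrete, here is a minimal sketch of the arithmetic
behind "hugepage-aligned/sized".  It assumes 4 KiB base pages and an
HPAGE_PMD_ORDER of 9 (the usual x86-64 values); other architectures and
configurations differ.  The EX_* macros and both helper functions are
hypothetical userspace stand-ins, not kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: 4 KiB base pages and pmd order 9 (typical x86-64). */
#define EX_PAGE_SHIFT       12
#define EX_HPAGE_PMD_ORDER  9
#define EX_HPAGE_PMD_NR     (1UL << EX_HPAGE_PMD_ORDER)
#define EX_HPAGE_PMD_SIZE   ((uint64_t)1 << (EX_PAGE_SHIFT + EX_HPAGE_PMD_ORDER))

/* True if addr sits on a hugepage boundary (hypothetical helper). */
static bool ex_hpage_aligned(uint64_t addr)
{
	return (addr & (EX_HPAGE_PMD_SIZE - 1)) == 0;
}

/* Number of pte entries that a single pmd mapping stands in for. */
static unsigned long ex_ptes_per_pmd(void)
{
	return EX_HPAGE_PMD_NR;
}
```

Under these assumptions a pte-mapped hugepage uses 512 pte entries to
describe a 2 MiB region that one pmd entry could map instead.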

In the anonymous collapse path, we are able to collapse pte-mapped
hugepages (albeit, perhaps suboptimally), but the file/shmem path makes no
effort when compound pages (of any order) are encountered.

Identify pte-mapped hugepages in the file/shmem collapse path.  The
final step of this identification makes a racy check of the value of the
pmd to ensure it maps a pte table.  This should be fine, since races
that result in false-positives (i.e.  we attempt collapse even though we
shouldn't) will fail later in collapse_pte_mapped_thp() once we actually
lock mmap_lock and reinspect the pmd value.  Races that result in
false-negatives (i.e.  where we decide to not attempt collapse, but
should have) shouldn't be an issue, since in the worst case, we do
nothing - which is what we've done up to this point.  We make a similar
check in retract_page_tables().  If we do think we've found a
pte-mapped hugepage in khugepaged context, attempt to update page
tables mapping this hugepage.
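
The reasoning about the racy check can be modelled in a few lines.  This
is a simplified userspace sketch, not the kernel code: the enum values
and function names are hypothetical, and the possibly stale unlocked
read is represented by passing the value that was seen separately from
the value found later under the lock.

```c
#include <stdbool.h>

/* Hypothetical model of what a pmd entry may describe. */
enum ex_pmd_state { EX_PMD_NONE, EX_PMD_PTE_TABLE, EX_PMD_HUGE };

enum ex_outcome { EX_SKIPPED, EX_COLLAPSED, EX_FAILED_RECHECK };

/* Lockless hint: the value may have changed by the time we act on it. */
static bool ex_racy_should_attempt(enum ex_pmd_state seen)
{
	return seen == EX_PMD_PTE_TABLE;
}

/* Authoritative check, standing in for reinspection under mmap_lock. */
static bool ex_locked_recheck(enum ex_pmd_state actual)
{
	return actual == EX_PMD_PTE_TABLE;
}

static enum ex_outcome ex_try_collapse(enum ex_pmd_state seen,
				       enum ex_pmd_state actual)
{
	/* False negative: we do nothing, same as before this change. */
	if (!ex_racy_should_attempt(seen))
		return EX_SKIPPED;
	/* False positive: caught here, so no wrong update is made. */
	if (!ex_locked_recheck(actual))
		return EX_FAILED_RECHECK;
	return EX_COLLAPSED;
}
```

In this model a stale "pte table" reading costs only a wasted attempt,
and a missed pte table costs only a missed optimization, which is why
the unlocked check is acceptable as a filter.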

Note that these collapses still count towards the
/sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed counter,
and if the pte-mapped hugepage is mapped into multiple processes'
address spaces, the counter could be incremented for each page table
update.  Since we
increment the counter when a pte-mapped hugepage is successfully added to
the list of to-collapse pte-mapped THPs, it's possible that we never
actually update the page table either.  This is different from how
file/shmem pages_collapsed accounting works today where only a successful
page cache update is counted (it's also possible here that no page tables
are actually changed).  Though it incurs some slop, this is preferred to
either not accounting for the event at all, or plumbing through data in
struct mm_slot on whether to account for the collapse or not.
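
For observing the counter from userspace, a small helper along these
lines may be enough.  It is a sketch only: ex_read_counter() is a
hypothetical name, and it assumes the sysfs file holds a single decimal
number.

```c
#include <stdio.h>

/*
 * Read one unsigned decimal value from a sysfs-style file.
 * Returns 0 on success and -1 if the file cannot be opened or parsed.
 */
static int ex_read_counter(const char *path, unsigned long long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%llu", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

Calling it with the pages_collapsed path named above before and after a
workload gives a delta.  Given the accounting described here, that delta
is best read as an upper bound on collapse events rather than an exact
count of page table updates.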

Also note that work still needs to be done to support arbitrary compound
pages, and that this should all be converted to using folios.

[shy828301@gmail.com: Spelling mistake, update comment, and add Documentation]
  Link: https://lore.kernel.org/linux-mm/CAHbLzkpHwZxFzjfX9nxVoRhzup8WMjMfyL6Xiq8mZ9M-N3ombw@mail.gmail.com/
Link: https://lkml.kernel.org/r/20220907144521.3115321-3-zokeefe@google.com
Link: https://lkml.kernel.org/r/20220922224046.1143204-3-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03 14:03:33 -07:00
ABI mm/demotion: expose memory tier details via sysfs 2022-09-26 19:46:13 -07:00
accounting filemap: make the accounting of thrashing more consistent 2022-09-26 19:46:06 -07:00
admin-guide mm/khugepaged: attempt to map file/shmem-backed pte-mapped THPs by pmds 2022-10-03 14:03:33 -07:00
arc
arm SPDX changes for 6.0-rc1 2022-08-04 12:12:54 -07:00
arm64 arm64: errata: add detection for AMEVCNTR01 incrementing incorrectly 2022-08-23 11:06:48 +01:00
block null_blk: add module parameters for 4 options 2022-08-02 17:14:50 -06:00
bpf bpf: Update bpf_design_QA.rst to clarify that BTF_ID does not ABIify a function 2022-08-04 13:17:24 -07:00
cdrom
core-api mm/page_alloc: remove obsolete gfpflags_normal_context() 2022-10-03 14:03:30 -07:00
cpu-freq
crypto
dev-tools kmsan: add ReST documentation 2022-10-03 14:03:18 -07:00
devicetree Merge branch 'thermal-core' 2022-08-27 15:07:58 +02:00
doc-guide
driver-api cxl for 6.0 2022-08-10 11:07:26 -07:00
fault-injection SUNRPC: Fix server-side fault injection documentation 2022-07-29 20:08:56 -04:00
fb
features Xtensa updates for v5.20 2022-08-04 15:35:58 -07:00
filesystems f2fs-for-6.0 2022-08-08 11:18:31 -07:00
firmware_class
firmware-guide Documentation: ACPI: EINJ: Fix obsolete example 2022-07-21 17:05:42 +02:00
fpga
gpu Merge tag 'amd-drm-next-5.20-2022-07-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-next 2022-07-27 09:33:45 +10:00
hid
hwmon This was a moderately busy cycle for documentation, but nothing all that 2022-08-02 19:24:24 -07:00
i2c docs: i2c: i2c-sysfs: fix hyperlinks 2022-08-11 23:25:05 +02:00
ia64
iio
images
infiniband
input
isdn
kbuild asm goto: eradicate CC_HAS_ASM_GOTO 2022-08-21 10:06:28 -07:00
kernel-hacking docs: process: remove outdated submitting-drivers.rst 2022-07-14 15:03:57 -06:00
leds
litmus-tests
livepatch
locking
loongarch docs/LoongArch: Add I14 description 2022-08-12 13:10:11 +08:00
m68k video: fbdev: atari: Fix inverse handling 2022-07-18 07:56:17 +02:00
maintainer
mhi
mips
misc-devices
mm ksm: add the ksm prefix to the names of the ksm private structures 2022-10-03 14:02:43 -07:00
netlabel
networking docs: net: bonding: remove mentions of trans_start 2022-08-03 19:20:13 -07:00
nios2
nvdimm
openrisc
parisc
PCI Fix of heap data and clang warnings, support for a new Intel NTB device, 2022-08-13 14:00:45 -07:00
pcmcia
peci
power Merge branches 'pm-devfreq', 'pm-qos', 'pm-tools' and 'pm-docs' 2022-07-29 19:46:00 +02:00
powerpc docs: powerpc: add elf_hwcaps to table of contents 2022-07-28 16:19:47 +10:00
process sound updates for 6.0-rc1 2022-08-06 10:19:51 -07:00
RCU
riscv
s390 s390/docs: fix warnings for vfio_ap driver doc 2022-07-22 13:54:07 +02:00
scheduler
scsi SCSI misc on 20220804 2022-08-04 19:47:37 -07:00
security Documentation: siphash: Fix typo in the name of offsetofend macro 2022-07-13 14:01:22 -06:00
sh
sound ASoC: Merge up fixes 2022-07-11 15:51:01 +01:00
sparc
sphinx
sphinx-static
spi
staging
target
timers
tools rtla: Fix tracer name 2022-08-10 11:43:59 -04:00
trace Tracing updates for 5.20 / 6.0 2022-08-05 09:41:12 -07:00
translations LoongArch changes for v5.20 2022-08-12 09:44:23 -07:00
usb usb: gadget: f_mass_storage: forced_eject attribute 2022-07-14 16:06:42 +02:00
userspace-api SCSI misc on 20220804 2022-08-04 19:47:37 -07:00
virt KVM: x86/MMU: properly format KVM_CAP_VM_DISABLE_NX_HUGE_PAGES capability table 2022-08-11 02:35:37 -04:00
w1
watchdog watchdog/pseries-wdt: initial support for H_WATCHDOG-based watchdog timers 2022-07-20 21:57:39 +10:00
x86 dma-mapping updates 2022-08-06 10:56:45 -07:00
xtensa
.gitignore
arch.rst
asm-annotations.rst
atomic_bitops.txt wait_on_bit: add an acquire memory barrier 2022-08-26 09:30:25 -07:00
atomic_t.txt
Changes
CodingStyle
conf.py
docutils.conf
dontdiff
index.rst
Kconfig
Makefile
memory-barriers.txt
SubmittingPatches