linux/Documentation
Liam R. Howlett 54a611b605 Maple Tree: add new data structure
Patch series "Introducing the Maple Tree"

The maple tree is an RCU-safe range based B-tree designed to use modern
processor cache efficiently.  There are a number of places in the kernel
that a non-overlapping range-based tree would be beneficial, especially
one with a simple interface.  If you use an rbtree with other data
structures to improve performance or an interval tree to track
non-overlapping ranges, then this is for you.

The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
nodes.  With the increased branching factor, it is significantly shorter
than the rbtree so it has fewer cache misses.  The removal of the linked
list between subsequent entries also reduces the cache misses and the need
to pull in the previous and next VMA during many tree alterations.

The first user that is covered in this patch set is the vm_area_struct,
where three data structures are replaced by the maple tree: the augmented
rbtree, the vma cache, and the linked list of VMAs in the mm_struct.  The
long term goal is to reduce or remove the mmap_lock contention.

The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers.  A single write operation will be
allowed at a time.  A reader re-walks if stale data is encountered.  VMAs
would be RCU enabled and this mode would be entered once multiple tasks
are using the mm_struct.

Davidlor said

: Yes I like the maple tree, and at this stage I don't think we can ask for
: more from this series wrt the MM - albeit there seems to still be some
: folks reporting breakage.  Fundamentally I see Liam's work to (re)move
: complexity out of the MM (not to say that the actual maple tree is not
: complex) by consolidating the three complimentary data structures very
: much worth it considering performance does not take a hit.  This was very
: much a turn off with the range locking approach, which worst case scenario
: incurred in prohibitive overhead.  Also as Liam and Matthew have
: mentioned, RCU opens up a lot of nice performance opportunities, and in
: addition academia[1] has shown outstanding scalability of address spaces
: with the foundation of replacing the locked rbtree with RCU aware trees.

A similar work has been discovered in the academic press

	https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf

Sheer coincidence.  We designed our tree with the intention of solving the
hardest problem first.  Upon settling on a b-tree variant and a rough
outline, we researched ranged based b-trees and RCU b-trees and did find
that article.  So it was nice to find reassurances that we were on the
right path, but our design choice of using ranges made that paper unusable
for us.

This patch (of 70):

The maple tree is an RCU-safe range based B-tree designed to use modern
processor cache efficiently.  There are a number of places in the kernel
that a non-overlapping range-based tree would be beneficial, especially
one with a simple interface.  If you use an rbtree with other data
structures to improve performance or an interval tree to track
non-overlapping ranges, then this is for you.

The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
nodes.  With the increased branching factor, it is significantly shorter
than the rbtree so it has fewer cache misses.  The removal of the linked
list between subsequent entries also reduces the cache misses and the need
to pull in the previous and next VMA during many tree alterations.

The first user that is covered in this patch set is the vm_area_struct,
where three data structures are replaced by the maple tree: the augmented
rbtree, the vma cache, and the linked list of VMAs in the mm_struct.  The
long term goal is to reduce or remove the mmap_lock contention.

The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers.  A single write operation will be
allowed at a time.  A reader re-walks if stale data is encountered.  VMAs
would be RCU enabled and this mode would be entered once multiple tasks
are using the mm_struct.

There is additional BUG_ON() calls added within the tree, most of which
are in debug code.  These will be replaced with a WARN_ON() call in the
future.  There is also additional BUG_ON() calls within the code which
will also be reduced in number at a later date.  These exist to catch
things such as out-of-range accesses which would crash anyways.

Link: https://lkml.kernel.org/r/20220906194824.2110408-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220906194824.2110408-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 19:46:13 -07:00
..
ABI mm/demotion: expose memory tier details via sysfs 2022-09-26 19:46:13 -07:00
accounting filemap: make the accounting of thrashing more consistent 2022-09-26 19:46:06 -07:00
admin-guide mm: multi-gen LRU: admin guide 2022-09-26 19:46:10 -07:00
arc
arm SPDX changes for 6.0-rc1 2022-08-04 12:12:54 -07:00
arm64 arm64: errata: add detection for AMEVCNTR01 incrementing incorrectly 2022-08-23 11:06:48 +01:00
block null_blk: add module parameters for 4 options 2022-08-02 17:14:50 -06:00
bpf bpf: Update bpf_design_QA.rst to clarify that BTF_ID does not ABIify a function 2022-08-04 13:17:24 -07:00
cdrom
core-api Maple Tree: add new data structure 2022-09-26 19:46:13 -07:00
cpu-freq
crypto
dev-tools - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
devicetree Merge branch 'thermal-core' 2022-08-27 15:07:58 +02:00
doc-guide
driver-api cxl for 6.0 2022-08-10 11:07:26 -07:00
fault-injection SUNRPC: Fix server-side fault injection documentation 2022-07-29 20:08:56 -04:00
fb
features Xtensa updates for v5.20 2022-08-04 15:35:58 -07:00
filesystems f2fs-for-6.0 2022-08-08 11:18:31 -07:00
firmware_class
firmware-guide Documentation: ACPI: EINJ: Fix obsolete example 2022-07-21 17:05:42 +02:00
fpga
gpu Merge tag 'amd-drm-next-5.20-2022-07-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-next 2022-07-27 09:33:45 +10:00
hid
hwmon This was a moderately busy cycle for documentation, but nothing all that 2022-08-02 19:24:24 -07:00
i2c docs: i2c: i2c-sysfs: fix hyperlinks 2022-08-11 23:25:05 +02:00
ia64
iio
images
infiniband
input
isdn
kbuild asm goto: eradicate CC_HAS_ASM_GOTO 2022-08-21 10:06:28 -07:00
kernel-hacking docs: process: remove outdated submitting-drivers.rst 2022-07-14 15:03:57 -06:00
leds
litmus-tests
livepatch
locking
loongarch docs/LoongArch: Add I14 description 2022-08-12 13:10:11 +08:00
m68k video: fbdev: atari: Fix inverse handling 2022-07-18 07:56:17 +02:00
maintainer
mhi
mips
misc-devices
mm mm: multi-gen LRU: design doc 2022-09-26 19:46:11 -07:00
netlabel
networking docs: net: bonding: remove mentions of trans_start 2022-08-03 19:20:13 -07:00
nios2
nvdimm
openrisc
parisc
PCI Fix of heap data and clang warnings, support for a new Intel NTB device, 2022-08-13 14:00:45 -07:00
pcmcia
peci
power Merge branches 'pm-devfreq', 'pm-qos', 'pm-tools' and 'pm-docs' 2022-07-29 19:46:00 +02:00
powerpc docs: powerpc: add elf_hwcaps to table of contents 2022-07-28 16:19:47 +10:00
process sound updates for 6.0-rc1 2022-08-06 10:19:51 -07:00
RCU
riscv
s390 s390/docs: fix warnings for vfio_ap driver doc 2022-07-22 13:54:07 +02:00
scheduler
scsi SCSI misc on 20220804 2022-08-04 19:47:37 -07:00
security Documentation: siphash: Fix typo in the name of offsetofend macro 2022-07-13 14:01:22 -06:00
sh
sound ASoC: Merge up fixes 2022-07-11 15:51:01 +01:00
sparc
sphinx docs: automarkup: do not look up symbols twice 2022-07-07 12:57:55 -06:00
sphinx-static
spi
staging
target
timers
tools rtla: Fix tracer name 2022-08-10 11:43:59 -04:00
trace Tracing updates for 5.20 / 6.0 2022-08-05 09:41:12 -07:00
translations LoongArch changes for v5.20 2022-08-12 09:44:23 -07:00
usb usb: gadget: f_mass_storage: forced_eject attribute 2022-07-14 16:06:42 +02:00
userspace-api SCSI misc on 20220804 2022-08-04 19:47:37 -07:00
virt KVM: x86/MMU: properly format KVM_CAP_VM_DISABLE_NX_HUGE_PAGES capability table 2022-08-11 02:35:37 -04:00
w1
watchdog watchdog/pseries-wdt: initial support for H_WATCHDOG-based watchdog timers 2022-07-20 21:57:39 +10:00
x86 dma-mapping updates 2022-08-06 10:56:45 -07:00
xtensa
.gitignore
arch.rst
asm-annotations.rst
atomic_bitops.txt wait_on_bit: add an acquire memory barrier 2022-08-26 09:30:25 -07:00
atomic_t.txt
Changes
CodingStyle
conf.py
docutils.conf
dontdiff
index.rst
Kconfig
Makefile
memory-barriers.txt
SubmittingPatches