linux/Documentation/core-api
Liam R. Howlett 54a611b605 Maple Tree: add new data structure
Patch series "Introducing the Maple Tree"

The maple tree is an RCU-safe range based B-tree designed to use modern
processor cache efficiently.  There are a number of places in the kernel
that a non-overlapping range-based tree would be beneficial, especially
one with a simple interface.  If you use an rbtree with other data
structures to improve performance or an interval tree to track
non-overlapping ranges, then this is for you.

The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
nodes.  With the increased branching factor, it is significantly shorter
than the rbtree so it has fewer cache misses.  The removal of the linked
list between subsequent entries also reduces the cache misses and the need
to pull in the previous and next VMA during many tree alterations.

The first user that is covered in this patch set is the vm_area_struct,
where three data structures are replaced by the maple tree: the augmented
rbtree, the vma cache, and the linked list of VMAs in the mm_struct.  The
long term goal is to reduce or remove the mmap_lock contention.

The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers.  A single write operation will be
allowed at a time.  A reader re-walks if stale data is encountered.  VMAs
would be RCU enabled and this mode would be entered once multiple tasks
are using the mm_struct.

Davidlor said

: Yes I like the maple tree, and at this stage I don't think we can ask for
: more from this series wrt the MM - albeit there seems to still be some
: folks reporting breakage.  Fundamentally I see Liam's work to (re)move
: complexity out of the MM (not to say that the actual maple tree is not
: complex) by consolidating the three complimentary data structures very
: much worth it considering performance does not take a hit.  This was very
: much a turn off with the range locking approach, which worst case scenario
: incurred in prohibitive overhead.  Also as Liam and Matthew have
: mentioned, RCU opens up a lot of nice performance opportunities, and in
: addition academia[1] has shown outstanding scalability of address spaces
: with the foundation of replacing the locked rbtree with RCU aware trees.

A similar work has been discovered in the academic press

	https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf

Sheer coincidence.  We designed our tree with the intention of solving the
hardest problem first.  Upon settling on a b-tree variant and a rough
outline, we researched ranged based b-trees and RCU b-trees and did find
that article.  So it was nice to find reassurances that we were on the
right path, but our design choice of using ranges made that paper unusable
for us.

This patch (of 70):

The maple tree is an RCU-safe range based B-tree designed to use modern
processor cache efficiently.  There are a number of places in the kernel
that a non-overlapping range-based tree would be beneficial, especially
one with a simple interface.  If you use an rbtree with other data
structures to improve performance or an interval tree to track
non-overlapping ranges, then this is for you.

The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
nodes.  With the increased branching factor, it is significantly shorter
than the rbtree so it has fewer cache misses.  The removal of the linked
list between subsequent entries also reduces the cache misses and the need
to pull in the previous and next VMA during many tree alterations.

The first user that is covered in this patch set is the vm_area_struct,
where three data structures are replaced by the maple tree: the augmented
rbtree, the vma cache, and the linked list of VMAs in the mm_struct.  The
long term goal is to reduce or remove the mmap_lock contention.

The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers.  A single write operation will be
allowed at a time.  A reader re-walks if stale data is encountered.  VMAs
would be RCU enabled and this mode would be entered once multiple tasks
are using the mm_struct.

There is additional BUG_ON() calls added within the tree, most of which
are in debug code.  These will be replaced with a WARN_ON() call in the
future.  There is also additional BUG_ON() calls within the code which
will also be reduced in number at a later date.  These exist to catch
things such as out-of-range accesses which would crash anyways.

Link: https://lkml.kernel.org/r/20220906194824.2110408-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220906194824.2110408-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 19:46:13 -07:00
..
irq irq: remove handle_domain_{irq,nmi}() 2021-10-26 10:13:31 +01:00
assoc_array.rst Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
boot-time-mm.rst docs/boot-time-mm: remove bootmem documentation 2018-10-31 08:54:16 -07:00
cachetlb.rst mm: Add flush_dcache_folio() 2021-10-18 07:49:36 -04:00
circular-buffers.rst doc: Remove ".vnet" from paulmck email addresses 2019-05-28 09:02:57 -07:00
cpu_hotplug.rst Documentation: core-api/cpuhotplug: Rewrite the API section 2021-09-11 00:41:21 +02:00
debug-objects.rst doc: debugobjects: actually pull in the kerneldoc comments 2016-11-29 14:44:14 -07:00
debugging-via-ohci1394.rst docs: debugging-via-ohci1394.txt: add it to the core-api book 2020-05-15 11:59:17 -06:00
dma-api-howto.rst arch/*/: remove CONFIG_VIRT_TO_BUS 2022-06-28 13:20:21 +02:00
dma-api.rst dma-mapping: add dma_opt_mapping_size() 2022-07-19 06:05:41 +02:00
dma-attributes.rst Reinstate some of "swiotlb: rework "fix info leak with DMA_FROM_DEVICE"" 2022-03-28 11:37:05 -07:00
dma-isa-lpc.rst docs: core-api: avoid using ReST :doc:foo markup 2021-06-17 13:24:37 -06:00
entry.rst Documentation: core-api: entry: Add comments about nesting 2022-01-27 11:32:40 -07:00
errseq.rst errseq: Add to documentation tree 2018-01-01 12:40:27 -07:00
genalloc.rst lib/genalloc.c: rename addr_in_gen_pool to gen_pool_has_addr 2019-12-04 19:44:13 -08:00
generic-radix-tree.rst generic radix trees 2019-03-12 10:04:02 -07:00
genericirq.rst docs: genericirq.rst: don't document chip.c functions twice 2020-10-15 07:49:41 +02:00
gfp_mask-from-fs-io.rst docs: core-api/gfp_mask-from-fs-io: add a label for cross-referencing 2018-09-20 11:02:32 -06:00
idr.rst IDR: Note that the IDR API is deprecated 2022-07-10 21:17:30 -04:00
index.rst Maple Tree: add new data structure 2022-09-26 19:46:13 -07:00
kernel-api.rst doc: module: update file references 2022-07-01 14:50:01 -07:00
kobject.rst kobject documentation: remove default_attrs information 2022-01-07 11:23:37 +01:00
kref.rst docs: move the kref doc into the core-api book 2020-05-15 12:02:19 -06:00
librs.rst docs-rst: convert librs book to ReST 2017-05-16 08:44:16 -03:00
local_ops.rst timer: Remove init_timer() interface 2017-11-21 15:57:09 -08:00
maple_tree.rst Maple Tree: add new data structure 2022-09-26 19:46:13 -07:00
memory-allocation.rst mm: slab: provide krealloc_array() 2020-12-15 12:13:37 -08:00
memory-hotplug.rst mm/memory_hotplug: remove HIGHMEM leftovers 2021-11-06 13:30:42 -07:00
mm-api.rst headers/deps: mm: align MANITAINERS and Docs with new gfp.h structure 2022-07-15 06:35:54 -07:00
packing.rst docs: packing: move it to core-api book and adjust markups 2019-07-31 13:30:01 -06:00
padata.rst padata: fold padata_alloc_possible() into padata_alloc() 2020-07-23 17:34:18 +10:00
pin_user_pages.rst mm: Make compound_pincount always available 2022-03-21 12:56:35 -04:00
printk-basics.rst printk: Move the printk() kerneldoc comment to its new home 2021-07-26 12:36:44 +02:00
printk-formats.rst vsprintf: Update %pGp documentation about that it prints hex value 2021-11-01 15:55:06 +01:00
printk-index.rst printk/index: Printk index feature documentation 2022-04-13 14:25:31 +02:00
protection-keys.rst Documentation/protection-keys: Clean up documentation for User Space pkeys 2022-06-07 16:06:22 -07:00
rbtree.rst docs: rbtree.rst: Fix a typo 2021-03-25 11:38:51 -06:00
refcount-vs-atomic.rst docs: remove :c:func: from refcount-vs-atomic.rst 2019-10-07 09:08:56 -06:00
symbol-namespaces.rst doc: module: update file references 2022-07-01 14:50:01 -07:00
this_cpu_ops.rst docs: move other kAPI documents to core-api 2020-06-26 11:33:42 -06:00
timekeeping.rst timekeeping: Introduce fast accessor to clock tai 2022-04-14 16:19:30 +02:00
tracepoint.rst doc: Sphinxify the tracepoint docbook 2016-11-29 14:44:23 -07:00
unaligned-memory-access.rst docs: move other kAPI documents to core-api 2020-06-26 11:33:42 -06:00
watch_queue.rst Documentation: move watch_queue to core-api 2022-04-22 09:47:25 -06:00
workqueue.rst workqueue: doc: Call out the non-reentrance conditions 2021-10-25 07:18:40 -10:00
xarray.rst XArray: Document the locking requirement for the xa_state 2022-02-03 15:56:50 -05:00