linux/include
Kees Cook b32801d125 mm/slab: Introduce kmem_buckets_create() and family
Dedicated caches are available for fixed size allocations via
kmem_cache_alloc(), but for dynamically sized allocations there is only
the global kmalloc API's set of buckets available. This means it isn't
possible to separate specific sets of dynamically sized allocations into
a separate collection of caches.

This leads to a use-after-free exploitation weakness in the Linux
kernel since many heap memory spraying/grooming attacks depend on using
userspace-controllable dynamically sized allocations to collide with
fixed size allocations that end up in the same cache.

While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
against these kinds of "type confusion" attacks, including for fixed
same-size heap objects, we can create a complementary deterministic
defense for dynamically sized allocations that are directly user
controlled. Addressing these cases is limited in scope, so isolating these
kinds of interfaces will not become an unbounded game of whack-a-mole. For
example, many pass through memdup_user(), making isolation there very
effective.

In order to isolate user-controllable dynamically sized
allocations from the common system kmalloc allocations, introduce
kmem_buckets_create(), which behaves like kmem_cache_create(). Introduce
kmem_buckets_alloc(), which behaves like kmem_cache_alloc(). Introduce
kmem_buckets_alloc_track_caller() for where caller tracking is
needed. Introduce kmem_buckets_valloc() for cases where vmalloc fallback
is needed. Note that these caches are specifically flagged with
SLAB_NO_MERGE, since merging would defeat the entire purpose of the
mitigation.
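
The isolation idea described above can be illustrated with a small
userspace model. This is only a sketch and not kernel code: every name
in it (toy_buckets, toy_pick_cache(), and so on) is hypothetical, and
the power-of-two size classes are an assumption made for simplicity
rather than the kernel's actual kmalloc size table. It models a cache
as a (bucket set, size class) pair and shows that two allocations can
only share a cache when they come from the same bucket set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code: all names here are hypothetical. */

#define TOY_NUM_BUCKETS 11	/* assumed size classes: 8, 16, ..., 8192 bytes */

/* One collection of caches, e.g. the global set or a dedicated set. */
struct toy_buckets {
	int set_id;
};

/* Identity of the cache that would serve a given allocation. */
struct toy_cache_id {
	int set_id;
	int bucket;	/* size class index, or -1 if the request is too large */
};

/* Map a request size to the smallest size class that can hold it. */
static int toy_size_to_bucket(size_t size)
{
	size_t cap = 8;

	for (int i = 0; i < TOY_NUM_BUCKETS; i++, cap <<= 1)
		if (size <= cap)
			return i;
	return -1;
}

/* Pick the cache inside bucket set 'b' that would serve 'size' bytes. */
static struct toy_cache_id toy_pick_cache(const struct toy_buckets *b,
					  size_t size)
{
	assert(b != NULL);

	struct toy_cache_id id = { b->set_id, toy_size_to_bucket(size) };
	return id;
}

/* Two allocations can collide only if set and size class both match. */
static bool toy_same_cache(struct toy_cache_id a, struct toy_cache_id b)
{
	return a.bucket >= 0 && a.set_id == b.set_id && a.bucket == b.bucket;
}

/* Stand-ins for the global kmalloc buckets and one dedicated set. */
static const struct toy_buckets toy_global = { 0 };
static const struct toy_buckets toy_user = { 1 };
```

In this model, a 100-byte user-controlled request and a 128-byte object
that both use toy_global map to the same cache, while routing the
user-controlled request through toy_user makes a collision impossible
for any size.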

This can also be used in the future to extend allocation profiling's use
of code tagging to implement per-caller allocation cache isolation[1]
even for dynamic allocations.

Memory allocation pinning[2] is still needed to plug the use-after-free
cross-allocator weakness (where attackers can arrange to free an
entire slab page and have it reallocated to a different cache),
but that is an existing and separate issue which is complementary
to this improvement. Development continues for that feature via the
SLAB_VIRTUAL[3] series (which could also provide guard pages -- another
complementary improvement).

Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [1]
Link: https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html [2]
Link: https://lore.kernel.org/lkml/20230915105933.495735-1-matteorizzo@google.com/ [3]
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-07-03 12:24:20 +02:00
acpi The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
asm-generic asm-generic cleanups for 6.10 2024-05-20 15:18:34 -07:00
clocksource
crypto This push fixes a bug in the new ecc P521 code as well as a buggy 2024-05-20 08:47:54 -07:00
drm drm fixes for 6.10-rc1 2024-05-24 17:28:02 -07:00
dt-bindings - Core Frameworks 2024-05-22 10:49:54 -07:00
keys Hi, 2024-05-13 10:40:15 -07:00
kunit kunit: Print last test location on fault 2024-05-06 14:22:02 -06:00
kvm Merge branch kvm-arm64/misc-6.10 into kvmarm-master/next 2024-05-08 16:41:50 +01:00
linux mm/slab: Introduce kmem_buckets_create() and family 2024-07-03 12:24:20 +02:00
math-emu
media media: cec.h: Fix kerneldoc 2024-05-04 10:19:59 +02:00
memory
misc
net more s390 updates for 6.10 merge window 2024-05-21 12:09:36 -07:00
pcmcia
ras tracing/treewide: Remove second parameter of __assign_str() 2024-05-22 20:14:47 -04:00
rdma The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
rv
scsi SCSI misc on 20240514 2024-05-14 18:25:53 -07:00
soc I'm actually surprised this time. There aren't any new Qualcomm SoC clk 2024-05-18 12:48:37 -07:00
sound ASoC: Fixes for v6.10 2024-05-23 13:29:27 +02:00
target
trace block-6.10-20240523 2024-05-23 13:44:47 -07:00
uapi drm fixes for 6.10-rc1 2024-05-24 17:28:02 -07:00
ufs
vdso
video
xen