mirror of
https://mirrors.bfsu.edu.cn/git/linux.git
synced 2024-11-28 22:54:05 +08:00
6991a564f5
1040 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Linus Torvalds
|
28e92f9903 |
Merge branch 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney: - Bitmap parsing support for "all" as an alias for all bits - Documentation updates - Miscellaneous fixes, including some that overlap into mm and lockdep - kvfree_rcu() updates - mem_dump_obj() updates, with acks from one of the slab-allocator maintainers - RCU NOCB CPU updates, including limited deoffloading - SRCU updates - Tasks-RCU updates - Torture-test updates * 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (78 commits) tasks-rcu: Make show_rcu_tasks_gp_kthreads() be static inline rcu-tasks: Make ksoftirqd provide RCU Tasks quiescent states rcu: Add missing __releases() annotation rcu: Remove obsolete rcu_read_unlock() deadlock commentary rcu: Improve comments describing RCU read-side critical sections rcu: Create an unrcu_pointer() to remove __rcu from a pointer srcu: Early test SRCU polling start rcu: Fix various typos in comments rcu/nocb: Unify timers rcu/nocb: Prepare for fine-grained deferred wakeup rcu/nocb: Only cancel nocb timer if not polling rcu/nocb: Delete bypass_timer upon nocb_gp wakeup rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup rcu/nocb: Allow de-offloading rdp leader rcu/nocb: Directly call __wake_nocb_gp() from bypass timer rcu: Don't penalize priority boosting when there is nothing to boost rcu: Point to documentation of ordering guarantees rcu: Make rcu_gp_cleanup() be noinline for tracing rcu: Restrict RCU_STRICT_GRACE_PERIOD to at most four CPUs rcu: Make show_rcu_gp_kthreads() dump rcu_node structures blocking GP ... |
||
Georgi Djakov
|
65ebdeef10 |
mm/slub: add taint after the errors are printed
When running the kernel with panic_on_taint, the usual slub debug error messages are not being printed when object corruption happens. That's because we panic in add_taint(), which is called before printing the additional information. This is a bit unfortunate as the error messages are actually very useful, especially before a panic. Let's fix this by moving add_taint() after the errors are printed on the console. Link: https://lkml.kernel.org/r/1623860738-146761-1-git-send-email-quic_c_gdjako@quicinc.com Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com> Acked-by: Rafael Aquini <aquini@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Faiyaz Mohammed
|
64dd68497b |
mm: slub: move sysfs slab alloc/free interfaces to debugfs
alloc_calls and free_calls implementation in sysfs have two issues, one is PAGE_SIZE limitation of sysfs and other is it does not adhere to "one value per file" rule. To overcome this issues, move the alloc_calls and free_calls implementation to debugfs. Debugfs cache will be created if SLAB_STORE_USER flag is set. Rename the alloc_calls/free_calls to alloc_traces/free_traces, to be inline with what it does. [faiyazm@codeaurora.org: fix the leak of alloc/free traces debugfs interface] Link: https://lkml.kernel.org/r/1624248060-30286-1-git-send-email-faiyazm@codeaurora.org Link: https://lkml.kernel.org/r/1623438200-19361-1-git-send-email-faiyazm@codeaurora.org Signed-off-by: Faiyaz Mohammed <faiyazm@codeaurora.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Stephen Boyd
|
792702911f |
slub: force on no_hash_pointers when slub_debug is enabled
Obscuring the pointers that slub shows when debugging makes for some confusing slub debug messages: Padding overwritten. 0x0000000079f0674a-0x000000000d4dce17 Those addresses are hashed for kernel security reasons. If we're trying to be secure with slub_debug on the commandline we have some big problems given that we dump whole chunks of kernel memory to the kernel logs. Let's force on the no_hash_pointers commandline flag when slub_debug is on the commandline. This makes slub debug messages more meaningful and if by chance a kernel address is in some slub debug object dump we will have a better chance of figuring out what went wrong. Note that we don't use %px in the slub code because we want to reduce the number of places that %px is used in the kernel. This also nicely prints a big fat warning at kernel boot if slub_debug is on the commandline so that we know that this kernel shouldn't be used on production systems. [akpm@linux-foundation.org: fix build with CONFIG_SLUB_DEBUG=n] Link: https://lkml.kernel.org/r/20210601182202.3011020-5-swboyd@chromium.org Signed-off-by: Stephen Boyd <swboyd@chromium.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Petr Mladek <pmladek@suse.com> Cc: Joe Perches <joe@perches.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Joe Perches
|
582d1212ed |
slub: indicate slab_fix() uses printf formats
Ideally, slab_fix() would be marked with __printf and the format here would not use \n as that's emitted by the slab_fix(). Make these changes. Link: https://lkml.kernel.org/r/20210601182202.3011020-4-swboyd@chromium.org Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Stephen Boyd <swboyd@chromium.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Stephen Boyd
|
1a88ef87f8 |
slub: actually use 'message' in restore_bytes()
The message argument isn't used here. Let's pass the string to the printk message so that the developer can figure out what's happening, instead of guessing that a redzone is being restored, etc. Link: https://lkml.kernel.org/r/20210601182202.3011020-3-swboyd@chromium.org Signed-off-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joe Perches <joe@perches.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Stephen Boyd
|
02ac47d0cd |
slub: restore slub_debug=- behavior
Petch series "slub: Print non-hashed pointers in slub debugging", v3.
I was doing some debugging recently and noticed that my pointers were
being hashed while slub_debug was on the kernel commandline. Let's force
on the no hash pointer option when slub_debug is on the kernel commandline
so that the prints are more meaningful.
The first two patches are something else I noticed while looking at the
code. The message argument is never used so the debugging messages are
not as clear as they could be and the slub_debug=- behavior seems to be
busted. Then there's a printf fixup from Joe and the final patch is the
one that force disables pointer hashing.
This patch (of 4):
Passing slub_debug=- on the kernel commandline is supposed to disable slub
debugging. This is especially useful with CONFIG_SLUB_DEBUG_ON where the
default is to have slub debugging enabled in the build. Due to some code
reorganization this behavior was dropped, but the code to make it work
mostly stuck around. Restore the previous behavior by disabling the
static key when we parse the commandline and see that we're trying to
disable slub debugging.
Link: https://lkml.kernel.org/r/20210601182202.3011020-1-swboyd@chromium.org
Link: https://lkml.kernel.org/r/20210601182202.3011020-2-swboyd@chromium.org
Fixes:
|
||
Oliver Glitta
|
3d8e374c6d |
slub: remove resiliency_test() function
Function resiliency_test() is hidden behind #ifdef SLUB_RESILIENCY_TEST that is not part of Kconfig, so nobody runs it. This function is replaced with KUnit test for SLUB added by the previous patch "selftests: add a KUnit test for SLUB debugging functionality". Link: https://lkml.kernel.org/r/20210511150734.3492-3-glittao@gmail.com Signed-off-by: Oliver Glitta <glittao@gmail.com> Reviewed-by: Marco Elver <elver@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Oliver Glitta <glittao@gmail.com> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Daniel Latypov <dlatypov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Oliver Glitta
|
1f9f78b1b3 |
mm/slub, kunit: add a KUnit test for SLUB debugging functionality
SLUB has resiliency_test() function which is hidden behind #ifdef SLUB_RESILIENCY_TEST that is not part of Kconfig, so nobody runs it. KUnit should be a proper replacement for it. Try changing byte in redzone after allocation and changing pointer to next free node, first byte, 50th byte and redzone byte. Check if validation finds errors. There are several differences from the original resiliency test: Tests create own caches with known state instead of corrupting shared kmalloc caches. The corruption of freepointer uses correct offset, the original resiliency test got broken with freepointer changes. Scratch changing random byte test, because it does not have meaning in this form where we need deterministic results. Add new option CONFIG_SLUB_KUNIT_TEST in Kconfig. Tests next_pointer, first_word and clobber_50th_byte do not run with KASAN option on. Because the test deliberately modifies non-allocated objects. Use kunit_resource to count errors in cache and silence bug reports. Count error whenever slab_bug() or slab_fix() is called or when the count of pages is wrong. [glittao@gmail.com: remove unused function test_exit(), from SLUB KUnit test] Link: https://lkml.kernel.org/r/20210512140656.12083-1-glittao@gmail.com [akpm@linux-foundation.org: export kasan_enable/disable_current to modules] Link: https://lkml.kernel.org/r/20210511150734.3492-2-glittao@gmail.com Signed-off-by: Oliver Glitta <glittao@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Daniel Latypov <dlatypov@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrew Morton
|
1b3865d016 |
mm/slub.c: include swab.h
Fixes build with CONFIG_SLAB_FREELIST_HARDENED=y.
Hopefully. But it's the right thing to do anwyay.
Fixes:
|
||
Kees Cook
|
e41a49fadb |
mm/slub: actually fix freelist pointer vs redzoning
It turns out that SLUB redzoning ("slub_debug=Z") checks from
s->object_size rather than from s->inuse (which is normally bumped to
make room for the freelist pointer), so a cache created with an object
size less than 24 would have the freelist pointer written beyond
s->object_size, causing the redzone to be corrupted by the freelist
pointer. This was very visible with "slub_debug=ZF":
BUG test (Tainted: G B ): Right Redzone overwritten
-----------------------------------------------------------------------------
INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb
INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200
INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620
Redzone (____ptrval____): bb bb bb bb bb bb bb bb ........
Object (____ptrval____): 00 00 00 00 00 f6 f4 a5 ........
Redzone (____ptrval____): 40 1d e8 1a aa @....
Padding (____ptrval____): 00 00 00 00 00 00 00 00 ........
Adjust the offset to stay within s->object_size.
(Note that no caches of in this size range are known to exist in the
kernel currently.)
Link: https://lkml.kernel.org/r/20210608183955.280836-4-keescook@chromium.org
Link: https://lore.kernel.org/linux-mm/20200807160627.GA1420741@elver.google.com/
Link: https://lore.kernel.org/lkml/0f7dd7b2-7496-5e2d-9488-2ec9f8e90441@suse.cz/Fixes:
|
||
Kees Cook
|
74c1d3e081 |
mm/slub: fix redzoning for small allocations
The redzone area for SLUB exists between s->object_size and s->inuse
(which is at least the word-aligned object_size). If a cache were
created with an object_size smaller than sizeof(void *), the in-object
stored freelist pointer would overwrite the redzone (e.g. with boot
param "slub_debug=ZF"):
BUG test (Tainted: G B ): Right Redzone overwritten
-----------------------------------------------------------------------------
INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb
INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200
INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620
Redzone (____ptrval____): bb bb bb bb bb bb bb bb ........
Object (____ptrval____): f6 f4 a5 40 1d e8 ...@..
Redzone (____ptrval____): 1a aa ..
Padding (____ptrval____): 00 00 00 00 00 00 00 00 ........
Store the freelist pointer out of line when object_size is smaller than
sizeof(void *) and redzoning is enabled.
Additionally remove the "smaller than sizeof(void *)" check under
CONFIG_DEBUG_VM in kmem_cache_sanity_check() as it is now redundant:
SLAB and SLOB both handle small sizes.
(Note that no caches within this size range are known to exist in the
kernel currently.)
Link: https://lkml.kernel.org/r/20210608183955.280836-3-keescook@chromium.org
Fixes:
|
||
Kees Cook
|
8669dbab2a |
mm/slub: clarify verification reporting
Patch series "Actually fix freelist pointer vs redzoning", v4. This fixes redzoning vs the freelist pointer (both for middle-position and very small caches). Both are "theoretical" fixes, in that I see no evidence of such small-sized caches actually be used in the kernel, but that's no reason to let the bugs continue to exist, especially since people doing local development keep tripping over it. :) This patch (of 3): Instead of repeating "Redzone" and "Poison", clarify which sides of those zones got tripped. Additionally fix column alignment in the trailer. Before: BUG test (Tainted: G B ): Redzone overwritten ... Redzone (____ptrval____): bb bb bb bb bb bb bb bb ........ Object (____ptrval____): f6 f4 a5 40 1d e8 ...@.. Redzone (____ptrval____): 1a aa .. Padding (____ptrval____): 00 00 00 00 00 00 00 00 ........ After: BUG test (Tainted: G B ): Right Redzone overwritten ... Redzone (____ptrval____): bb bb bb bb bb bb bb bb ........ Object (____ptrval____): f6 f4 a5 40 1d e8 ...@.. Redzone (____ptrval____): 1a aa .. Padding (____ptrval____): 00 00 00 00 00 00 00 00 ........ The earlier commits that slowly resulted in the "Before" reporting were: |
||
Alexander Potapenko
|
f70b00496f |
kasan: slab: always reset the tag in get_freepointer_safe()
With CONFIG_DEBUG_PAGEALLOC enabled, the kernel should also untag the object pointer, as done in get_freepointer(). Failing to do so reportedly leads to SLUB freelist corruptions that manifest as boot-time crashes. Link: https://lkml.kernel.org/r/20210514072228.534418-1-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Marco Elver <elver@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Elliot Berman <eberman@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Vlastimil Babka
|
afe0c26d19 |
mm, slub: move slub_debug static key enabling outside slab_mutex
Paul E. McKenney reported [1] that commit |
||
Maninder Singh
|
e548eaa116 |
mm/slub: Add Support for free path information of an object
This commit adds enables a stack dump for the last free of an object: slab kmalloc-64 start c8ab0140 data offset 64 pointer offset 0 size 64 allocated at meminfo_proc_show+0x40/0x4fc [ 20.192078] meminfo_proc_show+0x40/0x4fc [ 20.192263] seq_read_iter+0x18c/0x4c4 [ 20.192430] proc_reg_read_iter+0x84/0xac [ 20.192617] generic_file_splice_read+0xe8/0x17c [ 20.192816] splice_direct_to_actor+0xb8/0x290 [ 20.193008] do_splice_direct+0xa0/0xe0 [ 20.193185] do_sendfile+0x2d0/0x438 [ 20.193345] sys_sendfile64+0x12c/0x140 [ 20.193523] ret_fast_syscall+0x0/0x58 [ 20.193695] 0xbeeacde4 [ 20.193822] Free path: [ 20.193935] meminfo_proc_show+0x5c/0x4fc [ 20.194115] seq_read_iter+0x18c/0x4c4 [ 20.194285] proc_reg_read_iter+0x84/0xac [ 20.194475] generic_file_splice_read+0xe8/0x17c [ 20.194685] splice_direct_to_actor+0xb8/0x290 [ 20.194870] do_splice_direct+0xa0/0xe0 [ 20.195014] do_sendfile+0x2d0/0x438 [ 20.195174] sys_sendfile64+0x12c/0x140 [ 20.195336] ret_fast_syscall+0x0/0x58 [ 20.195491] 0xbeeacde4 Acked-by: Vlastimil Babka <vbabka@suse.cz> Co-developed-by: Vaneet Narang <v.narang@samsung.com> Signed-off-by: Vaneet Narang <v.narang@samsung.com> Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Maninder Singh
|
0cbc124bce |
mm/slub: Fix backtrace of objects to handle redzone adjustment
This commit fixes commit
|
||
Ingo Molnar
|
f0953a1bba |
mm: fix typos in comments
Fix ~94 single-word typos in locking code comments, plus a few very obvious grammar mistakes. Link: https://lkml.kernel.org/r/20210322212624.GA1963421@gmail.com Link: https://lore.kernel.org/r/20210322205203.GB1959563@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrey Konovalov
|
d57a964e09 |
kasan, mm: integrate slab init_on_free with HW_TAGS
This change uses the previously added memory initialization feature of HW_TAGS KASAN routines for slab memory when init_on_free is enabled. With this change, memory initialization memset() is no longer called when both HW_TAGS KASAN and init_on_free are enabled. Instead, memory is initialized in KASAN runtime. For SLUB, the memory initialization memset() is moved into slab_free_hook() that currently directly follows the initialization loop. A new argument is added to slab_free_hook() that indicates whether to initialize the memory or not. To avoid discrepancies with which memory gets initialized that can be caused by future changes, both KASAN hook and initialization memset() are put together and a warning comment is added. Combining setting allocation tags with memory initialization improves HW_TAGS KASAN performance when init_on_free is enabled. Link: https://lkml.kernel.org/r/190fd15c1886654afdec0d19ebebd5ade665b601.1615296150.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrey Konovalov
|
da844b7872 |
kasan, mm: integrate slab init_on_alloc with HW_TAGS
This change uses the previously added memory initialization feature of HW_TAGS KASAN routines for slab memory when init_on_alloc is enabled. With this change, memory initialization memset() is no longer called when both HW_TAGS KASAN and init_on_alloc are enabled. Instead, memory is initialized in KASAN runtime. The memory initialization memset() is moved into slab_post_alloc_hook() that currently directly follows the initialization loop. A new argument is added to slab_post_alloc_hook() that indicates whether to initialize the memory or not. To avoid discrepancies with which memory gets initialized that can be caused by future changes, both KASAN hook and initialization memset() are put together and a warning comment is added. Combining setting allocation tags with memory initialization improves HW_TAGS KASAN performance when init_on_alloc is enabled. Link: https://lkml.kernel.org/r/c1292aeb5d519da221ec74a0684a949b027d7720.1615296150.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Bhaskar Chowdhury
|
dc84207d00 |
mm/slub.c: trivial typo fixes
s/operatios/operations/ s/Mininum/Minimum/ s/mininum/minimum/ ......two different places. Link: https://lkml.kernel.org/r/20210325044940.14516-1-unixbhaskar@gmail.com Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Vlastimil Babka
|
1f0723a4c0 |
mm, slub: enable slub_debug static key when creating cache with explicit debug flags
Commit |
||
Linus Torvalds
|
9a45da9270 |
RCU changes for this cycle were:
- Bitmap support for "N" as alias for last bit - kvfree_rcu updates - mm_dump_obj() updates. (One of these is to mm, but was suggested by Andrew Morton.) - RCU callback offloading update - Polling RCU grace-period interfaces - Realtime-related RCU updates - Tasks-RCU updates - Torture-test updates - Torture-test scripting updates - Miscellaneous fixes Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmCJCZERHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1hRjw/+Jkb9KvR9odPt/zqN/KPtIlburCUWgsFb 2zAlWN4uMocPAiXT2Xq58/8gqMkpyn7ZVZtL1tD8fZSvlwEr0U8Z74+/NdoQvYE+ kMXIYIuhIAGRyAupmzkriqN33iY+BSZPacX3u6ziPj57/0OZzbWVN/DAhbuvyLqG J/oL4PHCa7XAqXbf95rd5Zjs680QJ3CbTRh4nA8uHArzJmKZOaaHJ05Pxd1LpULe SJ+5p1GQnnwxd1HqmlHMDu/dW+2hE35BGykF8zi78je9OJXualDoM/6JpIYGhMNY 5qlhU55QYP1jzjuNGVZZUS4L77eS2/W7SpPAaTmMEy/SsVB59G8Kf22oNDpVaEqQ m+2ErqwaHvlkMjqnsx+JQbsOP0yCi2NZBoEPFdfk1H23E2deVlSDbxPso4Zb1oUD E12769kN+SWDytuLSOAe1PY/KXqmNUKjPZl1GDCGXL7HlCnWyggUDschTsKJa19O XXl+yCTGMUH4XAPSqavAKQbBjurqpT6i4zfooSH4TBtOHm1ExgZOUS8gglZ1JuJd q+uJdZIgS8BcGkGw/k1bYDWY5TA4Rjv3sAOKQL1PgYBl1t/yLK441mE7LI9gWOwz Crz7vlSxD6Jc2cYQeUVW0KPGt5aVd63Gd9HjpXxGkqYQSDRqYMCebHEAGagz+jj7 Nv/nOnf34Uc= =mpNt -----END PGP SIGNATURE----- Merge tag 'core-rcu-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: - Support for "N" as alias for last bit in bitmap parsing library (eg using syntax like "nohz_full=2-N") - kvfree_rcu updates - mm_dump_obj() updates. (One of these is to mm, but was suggested by Andrew Morton.) - RCU callback offloading update - Polling RCU grace-period interfaces - Realtime-related RCU updates - Tasks-RCU updates - Torture-test updates - Torture-test scripting updates - Miscellaneous fixes * tag 'core-rcu-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits) rcutorture: Test start_poll_synchronize_rcu() and poll_state_synchronize_rcu() rcu: Provide polling interfaces for Tiny RCU grace periods torture: Fix kvm.sh --datestamp regex check torture: Consolidate qemu-cmd duration editing into kvm-transform.sh torture: Print proper vmlinux path for kvm-again.sh runs torture: Make TORTURE_TRUST_MAKE available in kvm-again.sh environment torture: Make kvm-transform.sh update jitter commands torture: Add --duration argument to kvm-again.sh torture: Add kvm-again.sh to rerun a previous torture-test torture: Create a "batches" file for build reuse torture: De-capitalize TORTURE_SUITE torture: Make upper-case-only no-dot no-slash scenario names official torture: Rename SRCU-t and SRCU-u to avoid lowercase characters torture: Remove no-mpstat error message torture: Record kvm-test-1-run.sh and kvm-test-1-run-qemu.sh PIDs torture: Record jitter start/stop commands torture: Extract kvm-test-1-run-qemu.sh from kvm-test-1-run.sh torture: Record TORTURE_KCONFIG_GDB_ARG in qemu-cmd torture: Abstract jitter.sh start/stop into scripts rcu: Provide polling interfaces for Tree RCU grace periods ... |
||
Linus Torvalds
|
7f3d08b255 |
printk changes for 5.13
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmCIBMIACgkQUqAMR0iA lPIt9w//bbHUN/JsNtLCs/849oExdUn/thVajrD5yELttYZXhdzbXncNdkGX9tlU 4JmExmUoqKYdN6JhSnrcYvckHj7XXZM7pVh9IdzqRh10MEXIQ+7IUHjQc8034Zs/ W4/oZmfMtBjszap+cJ9hvdp9qaJkPz/fRLGlrbjc1K4hhxDa1gGmeD35SKswGltm q6RzX3uRl5JbBrYsLoqb28MGYRHhjf2+Pvndoj+5Nn9FtwPSot6jAkyqY5Y6iJlS W2EsFqOt+Kv7/I93FyQlnXC6Nx7vntmow7knmmGPXDf2BqLb0J8Bxl3fwuzpQoao nZzL/p9GQ4ZXF6y8gRV8+RzPIcftBdayOswEDGH0LzlTkbAe/9Sq9Lo7a4Z8jxHW ro0P+PSRK5Ksm7jvpVmSTg+Nt+XqDA5zA1lAorX1UjsyeDDNF9ndQ4C+ZNhCKo54 y+RDgtAArJMIvsHLQ53ReoOct5NnGVNb8G/r3bIAu+Dn6K3nesr6fP1XG8iduseL yFlLB7w214BQMr2B/C+8lQvj54wWE4lea2+LNvObxC5b8puYj0fEniUxTYP6bcB5 QT+LfTToufYz4US7ggJy6hoEfohifGWVvDHbn9tXmyXotSTHH7pHdYypqY+UO+kl 7BkwzNFCm4qCIKsg8nyJxT2hDOlpcCrQx1dBIjveMqJ0c5+ahXU= =ovSn -----END PGP SIGNATURE----- Merge tag 'printk-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Stop synchronizing kernel log buffer readers by logbuf_lock. As a result, the access to the buffer is fully lockless now. Note that printk() itself still uses locks because it tries to flush the messages to the console immediately. Also the per-CPU temporary buffers are still there because they prevent infinite recursion and serialize backtraces from NMI. All this is going to change in the future. - kmsg_dump API rework and cleanup as a side effect of the logbuf_lock removal. - Make bstr_printf() aware that %pf and %pF formats could deference the given pointer. - Show also page flags by %pGp format. - Clarify the documentation for plain pointer printing. - Do not show no_hash_pointers warning multiple times. - Update Senozhatsky email address. - Some clean up. * tag 'printk-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (24 commits) lib/vsprintf.c: remove leftover 'f' and 'F' cases from bstr_printf() printk: clarify the documentation for plain pointer printing kernel/printk.c: Fixed mundane typos printk: rename vprintk_func to vprintk vsprintf: dump full information of page flags in pGp mm, slub: don't combine pr_err with INFO mm, slub: use pGp to print page flags MAINTAINERS: update Senozhatsky email address lib/vsprintf: do not show no_hash_pointers message multiple times printk: console: remove unnecessary safe buffer usage printk: kmsg_dump: remove _nolock() variants printk: remove logbuf_lock printk: introduce a kmsg_dump iterator printk: kmsg_dumper: remove @active field printk: add syslog_lock printk: use atomic64_t for devkmsg_user.seq printk: use seqcount_latch for clear_seq printk: introduce CONSOLE_LOG_MAX printk: consolidate kmsg_dump_get_buffer/syslog_print_all code printk: refactor kmsg_dump_get_buffer() ... |
||
Ingo Molnar
|
120b566d1d |
Merge branch 'for-mingo-rcu' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney: - Bitmap support for "N" as alias for last bit - kvfree_rcu updates - mm_dump_obj() updates. (One of these is to mm, but was suggested by Andrew Morton.) - RCU callback offloading update - Polling RCU grace-period interfaces - Realtime-related RCU updates - Tasks-RCU updates - Torture-test updates - Torture-test scripting updates - Miscellaneous fixes Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
Yafang Shao
|
96b94abc12 |
mm, slub: don't combine pr_err with INFO
It is strange to combine "pr_err" with "INFO", so let's remove the prefix completely. This patch is motivated by David's comment[1]. - before the patch [ 8846.517809] INFO: Slab 0x00000000f42a2c60 objects=33 used=3 fp=0x0000000060d32ca8 flags=0x17ffffc0010200(slab|head) - after the patch [ 6343.396602] Slab 0x000000004382e02b objects=33 used=3 fp=0x000000009ae06ffc flags=0x17ffffc0010200(slab|head) [1] https://lore.kernel.org/linux-mm/b9c0f2b6-e9b0-0c36-ebdd-2bc684c5a762@redhat.com/#t Suggested-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Matthew Wilcox <willy@infradead.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210319101246.73513-3-laoar.shao@gmail.com |
||
Yafang Shao
|
4a8ef190c1 |
mm, slub: use pGp to print page flags
As pGp has been already introduced in printk, we'd better use it to make the output human readable. Before this change, the output is, [ 6155.716018] INFO: Slab 0x000000004027dd4f objects=33 used=3 fp=0x000000008cd1579c flags=0x17ffffc0010200 While after this change, the output is, [ 8846.517809] INFO: Slab 0x00000000f42a2c60 objects=33 used=3 fp=0x0000000060d32ca8 flags=0x17ffffc0010200(slab|head) Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Christoph Lameter <cl@linux.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210319101246.73513-2-laoar.shao@gmail.com |
||
Linus Torvalds
|
9b1ea29bc0 |
Revert "mm, slub: consider rest of partial list if acquire_slab() fails"
This reverts commit |
||
Paul E. McKenney
|
5bb1bb353c |
mm: Don't build mm_dump_obj() on CONFIG_PRINTK=n kernels
The mem_dump_obj() functionality adds a few hundred bytes, which is a small price to pay. Except on kernels built with CONFIG_PRINTK=n, in which mem_dump_obj() messages will be suppressed. This commit therefore makes mem_dump_obj() be a static inline empty function on kernels built with CONFIG_PRINTK=n and excludes all of its support functions as well. This avoids kernel bloat on systems that cannot use mem_dump_obj(). Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <linux-mm@kvack.org> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Andrey Konovalov
|
e2db1a9aa3 |
kasan, mm: optimize kmalloc poisoning
For allocations from kmalloc caches, kasan_kmalloc() always follows kasan_slab_alloc(). Currenly, both of them unpoison the whole object, which is unnecessary. This patch provides separate implementations for both annotations: kasan_slab_alloc() unpoisons the whole object, and kasan_kmalloc() only poisons the redzone. For generic KASAN, the redzone start might not be aligned to KASAN_GRANULE_SIZE. Therefore, the poisoning is split in two parts: kasan_poison_last_granule() poisons the unaligned part, and then kasan_poison() poisons the rest. This patch also clarifies alignment guarantees of each of the poisoning functions and drops the unnecessary round_up() call for redzone_end. With this change, the early SLUB cache annotation needs to be changed to kasan_slab_alloc(), as kasan_kmalloc() doesn't unpoison objects now. The number of poisoned bytes for objects in this cache stays the same, as kmem_cache_node->object_size is equal to sizeof(struct kmem_cache_node). Link: https://lkml.kernel.org/r/7e3961cb52be380bc412860332063f5f7ce10d13.1612546384.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Alexander Potapenko
|
b89fb5ef0c |
mm, kfence: insert KFENCE hooks for SLUB
Inserts KFENCE hooks into the SLUB allocator. To pass the originally requested size to KFENCE, add an argument 'orig_size' to slab_alloc*(). The additional argument is required to preserve the requested original size for kmalloc() allocations, which uses size classes (e.g. an allocation of 272 bytes will return an object of size 512). Therefore, kmem_cache::size does not represent the kmalloc-caller's requested size, and we must introduce the argument 'orig_size' to propagate the originally requested size to KFENCE. Without the originally requested size, we would not be able to detect out-of-bounds accesses for objects placed at the end of a KFENCE object page if that object is not equal to the kmalloc-size class it was bucketed into. When KFENCE is disabled, there is no additional overhead, since slab_alloc*() functions are __always_inline. Link: https://lkml.kernel.org/r/20201103175841.3495947-6-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Jann Horn <jannh@google.com> Co-developed-by: Marco Elver <elver@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joern Engel <joern@purestorage.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SeongJae Park <sjpark@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrey Konovalov
|
027b37b552 |
kasan: move _RET_IP_ to inline wrappers
Generic mm functions that call KASAN annotations that might report a bug pass _RET_IP_ to them as an argument. This allows KASAN to include the name of the function that called the mm function in its report's header. Now that KASAN has inline wrappers for all of its annotations, move _RET_IP_ to those wrappers to simplify annotation call sites. Link: https://linux-review.googlesource.com/id/I8fb3c06d49671305ee184175a39591bc26647a67 Link: https://lkml.kernel.org/r/5c1490eddf20b436b8c4eeea83fce47687d5e4a4.1610733117.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Muchun Song
|
96403bfe50 |
mm: memcontrol: fix slub memory accounting
SLUB currently account kmalloc() and kmalloc_node() allocations larger
than order-1 page per-node. But it forget to update the per-memcg
vmstats. So it can lead to inaccurate statistics of "slab_unreclaimable"
which is from memory.stat. Fix it by using mod_lruvec_page_state instead
of mod_node_page_state.
Link: https://lkml.kernel.org/r/20210223092423.42420-1-songmuchun@bytedance.com
Fixes:
|
||
Roman Gushchin
|
2e9bd48315 |
mm: memcg/slab: pre-allocate obj_cgroups for slab caches with SLAB_ACCOUNT
In general it's unknown in advance if a slab page will contain accounted objects or not. In order to avoid memory waste, an obj_cgroup vector is allocated dynamically when a need to account of a new object arises. Such approach is memory efficient, but requires an expensive cmpxchg() to set up the memcg/objcgs pointer, because an allocation can race with a different allocation on another cpu. But in some common cases it's known for sure that a slab page will contain accounted objects: if the page belongs to a slab cache with a SLAB_ACCOUNT flag set. It includes such popular objects like vm_area_struct, anon_vma, task_struct, etc. In such cases we can pre-allocate the objcgs vector and simple assign it to the page without any atomic operations, because at this early stage the page is not visible to anyone else. A very simplistic benchmark (allocating 10000000 64-bytes objects in a row) shows ~15% win. In the real life it seems that most workloads are not very sensitive to the speed of (accounted) slab allocations. [guro@fb.com: open-code set_page_objcgs() and add some comments, by Johannes] Link: https://lkml.kernel.org/r/20201113001926.GA2934489@carbon.dhcp.thefacebook.com [akpm@linux-foundation.org: fix it for mm-slub-call-account_slab_page-after-slab-page-initialization-fix.patch] Link: https://lkml.kernel.org/r/20201110195753.530157-2-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Zhiyuan Dai
|
457c82c351 |
mm/slub: minor coding style tweaks
Add whitespace to fix coding style issues, improve code reading. Link: https://lkml.kernel.org/r/1612847403-5594-1-git-send-email-daizhiyuan@phytium.com.cn Signed-off-by: Zhiyuan Dai <daizhiyuan@phytium.com.cn> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Vlastimil Babka
|
fe2cce15d6 |
mm, slub: remove slub_memcg_sysfs boot param and CONFIG_SLUB_MEMCG_SYSFS_ON
The boot param and config determine the value of memcg_sysfs_enabled,
which is unused since commit
|
||
Vlastimil Babka
|
d930ff03c4 |
mm, slub: splice cpu and page freelists in deactivate_slab()
In deactivate_slab() we currently move all but one objects on the cpu freelist to the page freelist one by one using the costly cmpxchg_double() operation. Then we unfreeze the page while moving the last object on page freelist, with a final cmpxchg_double(). This can be optimized to avoid the cmpxchg_double() per object. Just count the objects on cpu freelist (to adjust page->inuse properly) and also remember the last object in the chain. Then splice page->freelist to the last object and effectively add the whole cpu freelist to page->freelist while unfreezing the page, with a single cmpxchg_double(). Link: https://lkml.kernel.org/r/20210115183543.15097-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Jann Horn <jannh@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Vlastimil Babka
|
7e1fa93def |
mm, slab, slub: stop taking memory hotplug lock
Since commit |
||
Vlastimil Babka
|
666716fd26 |
mm, slub: stop freeing kmem_cache_node structures on node offline
Patch series "mm, slab, slub: remove cpu and memory hotplug locks". Some related work caused me to look at how we use get/put_mems_online() and get/put_online_cpus() during kmem cache creation/descruction/shrinking, and realize that it should be actually safe to remove all of that with rather small effort (as e.g. Michal Hocko suspected in some of the past discussions already). This has the benefit to avoid rather heavy locks that have caused locking order issues already in the past. So this is the result, Patches 2 and 3 remove memory hotplug and cpu hotplug locking, respectively. Patch 1 is due to realization that in fact some races exist despite the locks (even if not removed), but the most sane solution is not to introduce more of them, but rather accept some wasted memory in scenarios that should be rare anyway (full memory hot remove), as we do the same in other contexts already. This patch (of 3): Commit |
||
Johannes Berg
|
ca22059320 |
mm/slub: disable user tracing for kmemleak caches by default
If kmemleak is enabled, it uses a kmem cache for its own objects. These objects are used to hold information kmemleak uses, including a stack trace. If slub_debug is also turned on, each of them has *another* stack trace, so the overhead adds up, and on my tests (on ARCH=um, admittedly) 2/3rds of the allocations end up being doing the stack tracing. Turn off SLAB_STORE_USER if SLAB_NOLEAKTRACE was given, to avoid storing the essentially same data twice. Link: https://lkml.kernel.org/r/20210113215114.d94efa13ba30.I117b6764e725b3192318bbcf4269b13b709539ae@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Nikolay Borisov
|
3754000872 |
mm/sl?b.c: remove ctor argument from kmem_cache_flags
This argument hasn't been used since
|
||
Jacob Wen
|
3544de8ee6 |
mm, tracing: record slab name for kmem_cache_free()
Currently, a trace record generated by the RCU core is as below. ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=00000000f3b49a66 It doesn't tell us what the RCU core has freed. This patch adds the slab name to trace_kmem_cache_free(). The new format is as follows. ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=0000000037f79c8d name=dentry ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=00000000f78cb7b5 name=sock_inode_cache ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=0000000018768985 name=pool_workqueue ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=000000006a6cb484 name=radix_tree_node We can use it to understand what the RCU core is going to free. For example, some users maybe interested in when the RCU core starts freeing reclaimable slabs like dentry to reduce memory pressure. Link: https://lkml.kernel.org/r/20201216072804.8838-1-jian.w.wen@oracle.com Signed-off-by: Jacob Wen <jian.w.wen@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Ingo Molnar
|
85e853c5ec |
Merge branch 'for-mingo-rcu' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney: - Documentation updates. - Miscellaneous fixes. - kfree_rcu() updates: Addition of mem_dump_obj() to provide allocator return addresses to more easily locate bugs. This has a couple of RCU-related commits, but is mostly MM. Was pulled in with akpm's agreement. - Per-callback-batch tracking of numbers of callbacks, which enables better debugging information and smarter reactions to large numbers of callbacks. - The first round of changes to allow CPUs to be runtime switched from and to callback-offloaded state. - CONFIG_PREEMPT_RT-related changes. - RCU CPU stall warning updates. - Addition of polling grace-period APIs for SRCU. - Torture-test and torture-test scripting updates, including a "torture everything" script that runs rcutorture, locktorture, scftorture, rcuscale, and refscale. Plus does an allmodconfig build. Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
Vlastimil Babka
|
3286222fc6 |
mm, slub: better heuristic for number of cpus when calculating slab order
When creating a new kmem cache, SLUB determines how large the slab pages will based on number of inputs, including the number of CPUs in the system. Larger slab pages mean that more objects can be allocated/free from per-cpu slabs before accessing shared structures, but also potentially more memory can be wasted due to low slab usage and fragmentation. The rough idea of using number of CPUs is that larger systems will be more likely to benefit from reduced contention, and also should have enough memory to spare. Number of CPUs used to be determined as nr_cpu_ids, which is number of possible cpus, but on some systems many will never be onlined, thus commit |
||
Wang Hai
|
757fed1d08 |
Revert "mm/slub: fix a memory leak in sysfs_slab_add()"
This reverts commit |
||
Andrey Konovalov
|
ce5716c618 |
kasan, mm: fix conflicts with init_on_alloc/free
A few places where SLUB accesses object's data or metadata were missed
in a previous patch. This leads to false positives with hardware
tag-based KASAN when bulk allocations are used with init_on_alloc/free.
Fix the false-positives by resetting pointer tags during these accesses.
(The kasan_reset_tag call is removed from slab_alloc_node, as it's added
into maybe_wipe_obj_freeptr.)
Link: https://linux-review.googlesource.com/id/I50dd32838a666e173fe06c3c5c766f2c36aae901
Link: https://lkml.kernel.org/r/093428b5d2ca8b507f4a79f92f9929b35f7fada7.1610731872.git.andreyknvl@google.com
Fixes:
|
||
Paul E. McKenney
|
8e7f37f2aa |
mm: Add mem_dump_obj() to print source of memory block
There are kernel facilities such as per-CPU reference counts that give error messages in generic handlers or callbacks, whose messages are unenlightening. In the case of per-CPU reference-count underflow, this is not a problem when creating a new use of this facility because in that case the bug is almost certainly in the code implementing that new use. However, trouble arises when deploying across many systems, which might exercise corner cases that were not seen during development and testing. Here, it would be really nice to get some kind of hint as to which of several uses the underflow was caused by. This commit therefore exposes a mem_dump_obj() function that takes a pointer to memory (which must still be allocated if it has been dynamically allocated) and prints available information on where that memory came from. This pointer can reference the middle of the block as well as the beginning of the block, as needed by things like RCU callback functions and timer handlers that might not know where the beginning of the memory block is. These functions and handlers can use mem_dump_obj() to print out better hints as to where the problem might lie. The information printed can depend on kernel configuration. For example, the allocation return address can be printed only for slab and slub, and even then only when the necessary debug has been enabled. For slab, build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space to the next power of two or use the SLAB_STORE_USER when creating the kmem_cache structure. For slub, build with CONFIG_SLUB_DEBUG=y and boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create() if more focused use is desired. Also for slub, use CONFIG_STACKTRACE to enable printing of the allocation-time stack trace. Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <linux-mm@kvack.org> Reported-by: Andrii Nakryiko <andrii@kernel.org> [ paulmck: Convert to printing and change names per Joonsoo Kim. ] [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ] [ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ] [ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ] [ paulmck: Extract more info from !SLUB_DEBUG per Joonsoo Kim. ] [ paulmck: Explicitly check for small pointers per Naresh Kamboju. ] Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Jann Horn
|
8ff60eb052 |
mm, slub: consider rest of partial list if acquire_slab() fails
acquire_slab() fails if there is contention on the freelist of the page
(probably because some other CPU is concurrently freeing an object from
the page). In that case, it might make sense to look for a different page
(since there might be more remote frees to the page from other CPUs, and
we don't want contention on struct page).
However, the current code accidentally stops looking at the partial list
completely in that case. Especially on kernels without CONFIG_NUMA set,
this means that get_partial() fails and new_slab_objects() falls back to
new_slab(), allocating new pages. This could lead to an unnecessary
increase in memory fragmentation.
Link: https://lkml.kernel.org/r/20201228130853.1871516-1-jannh@google.com
Fixes:
|
||
Roman Gushchin
|
1f3147b49d |
mm: slub: call account_slab_page() after slab page initialization
It's convenient to have page->objects initialized before calling into account_slab_page(). In particular, this information can be used to pre-alloc the obj_cgroup vector. Let's call account_slab_page() a bit later, after the initialization of page->objects. This commit doesn't bring any functional change, but is required for further optimizations. [akpm@linux-foundation.org: undo changes needed by forthcoming mm-memcg-slab-pre-allocate-obj_cgroups-for-slab-caches-with-slab_account.patch] Link: https://lkml.kernel.org/r/20201110195753.530157-1-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrey Konovalov
|
aa1ef4d7b3 |
kasan, mm: reset tags when accessing metadata
Kernel allocator code accesses metadata for slab objects, that may lie out-of-bounds of the object itself, or be accessed when an object is freed. Such accesses trigger tag faults and lead to false-positive reports with hardware tag-based KASAN. Software KASAN modes disable instrumentation for allocator code via KASAN_SANITIZE Makefile macro, and rely on kasan_enable/disable_current() annotations which are used to ignore KASAN reports. With hardware tag-based KASAN neither of those options are available, as it doesn't use compiler instrumetation, no tag faults are ignored, and MTE is disabled after the first one. Instead, reset tags when accessing metadata (currently only for SLUB). Link: https://lkml.kernel.org/r/a0f3cefbc49f34c843b664110842de4db28179d0.1606161801.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Acked-by: Marco Elver <elver@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Joe Perches
|
bf16d19aab |
mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at
Convert the unbounded uses of sprintf to sysfs_emit. A few conversions may now not end in a newline if the output buffer is overflowed. Link: https://lkml.kernel.org/r/0c90a90f466167f8c37de4b737553cf49c4a277f.1605376435.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Bharata B Rao
|
045ab8c948 |
mm/slub: let number of online CPUs determine the slub page order
The page order of the slab that gets chosen for a given slab cache depends
on the number of objects that can be fit in the slab while meeting other
requirements. We start with a value of minimum objects based on
nr_cpu_ids that is driven by possible number of CPUs and hence could be
higher than the actual number of CPUs present in the system. This leads
to calculate_order() chosing a page order that is on the higher side
leading to increased slab memory consumption on systems that have bigger
page sizes.
Hence rely on the number of online CPUs when determining the mininum
objects, thereby increasing the chances of chosing a lower conservative
page order for the slab.
Vlastimil said:
"Ideally, we would react to hotplug events and update existing caches
accordingly. But for that, recalculation of order for existing caches
would have to be made safe, while not affecting hot paths. We have
removed the sysfs interface with
|
||
Vlastimil Babka
|
965c484815 |
mm, slub: use kmem_cache_debug_flags() in deactivate_slab()
Commit
|
||
Vlastimil Babka
|
0c06dd7551 |
mm, slab, slub: clear the slab_cache field when freeing page
The page allocator expects that page->mapping is NULL for a page being freed. SLAB and SLUB use the slab_cache field which is in union with mapping, but before freeing the page, the field is referenced with the "mapping" name when set to NULL. It's IMHO more correct (albeit functionally the same) to use the slab_cache name as that's the field we use in SL*B, and document why we clear it in a comment (we don't clear fields such as s_mem or freelist, as page allocator doesn't care about those). While using the 'mapping' name would automagically keep the code correct if the unions in struct page changed, such changes should be done consciously and needed changes evaluated - the comment should help with that. Link: https://lkml.kernel.org/r/20201210160020.21562-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Laurent Dufour
|
22e4663e91 |
mm/slub: fix panic in slab_alloc_node()
While doing memory hot-unplug operation on a PowerPC VM running 1024 CPUs with 11TB of ram, I hit the following panic: BUG: Kernel NULL pointer dereference on read at 0x00000007 Faulting instruction address: 0xc000000000456048 Oops: Kernel access of bad area, sig: 11 [#2] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS= 2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp CPU: 160 PID: 1 Comm: systemd Tainted: G D 5.9.0 #1 NIP: c000000000456048 LR: c000000000455fd4 CTR: c00000000047b350 REGS: c00006028d1b77a0 TRAP: 0300 Tainted: G D (5.9.0) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24004228 XER: 00000000 CFAR: c00000000000f1b0 DAR: 0000000000000007 DSISR: 40000000 IRQMASK: 0 GPR00: c000000000455fd4 c00006028d1b7a30 c000000001bec800 0000000000000000 GPR04: 0000000000000dc0 0000000000000000 00000000000374ef c00007c53df99320 GPR08: 000007c53c980000 0000000000000000 000007c53c980000 0000000000000000 GPR12: 0000000000004400 c00000001e8e4400 0000000000000000 0000000000000f6a GPR16: 0000000000000000 c000000001c25930 c000000001d62528 00000000000000c1 GPR20: c000000001d62538 c00006be469e9000 0000000fffffffe0 c0000000003c0ff8 GPR24: 0000000000000018 0000000000000000 0000000000000dc0 0000000000000000 GPR28: c00007c513755700 c000000001c236a4 c00007bc4001f800 0000000000000001 NIP [c000000000456048] __kmalloc_node+0x108/0x790 LR [c000000000455fd4] __kmalloc_node+0x94/0x790 Call Trace: kvmalloc_node+0x58/0x110 mem_cgroup_css_online+0x10c/0x270 online_css+0x48/0xd0 cgroup_apply_control_enable+0x2c4/0x470 cgroup_mkdir+0x408/0x5f0 kernfs_iop_mkdir+0x90/0x100 vfs_mkdir+0x138/0x250 do_mkdirat+0x154/0x1c0 system_call_exception+0xf8/0x200 system_call_common+0xf0/0x27c Instruction dump: e93e0000 e90d0030 39290008 7cc9402a e94d0030 e93e0000 7ce95214 7f89502a 2fbc0000 419e0018 41920230 e9270010 <89290007> 7f994800 419e0220 7ee6bb78 This pointing to the following code: mm/slub.c:2851 if (unlikely(!object || !node_match(page, node))) { c000000000456038: 00 00 bc 2f cmpdi cr7,r28,0 c00000000045603c: 18 00 9e 41 beq cr7,c000000000456054 <__kmalloc_node+0x114> node_match(): mm/slub.c:2491 if (node != NUMA_NO_NODE && page_to_nid(page) != node) c000000000456040: 30 02 92 41 beq cr4,c000000000456270 <__kmalloc_node+0x330> page_to_nid(): include/linux/mm.h:1294 c000000000456044: 10 00 27 e9 ld r9,16(r7) c000000000456048: 07 00 29 89 lbz r9,7(r9) <<<< r9 = NULL node_match(): mm/slub.c:2491 c00000000045604c: 00 48 99 7f cmpw cr7,r25,r9 c000000000456050: 20 02 9e 41 beq cr7,c000000000456270 <__kmalloc_node+0x330> The panic occurred in slab_alloc_node() when checking for the page's node: object = c->freelist; page = c->page; if (unlikely(!object || !node_match(page, node))) { object = __slab_alloc(s, gfpflags, node, addr, c); stat(s, ALLOC_SLOWPATH); The issue is that object is not NULL while page is NULL which is odd but may happen if the cache flush happened after loading object but before loading page. Thus checking for the page pointer is required too. The cache flush is done through an inter processor interrupt when a piece of memory is off-lined. That interrupt is triggered when a memory hot-unplug operation is initiated and offline_pages() is calling the slub's MEM_GOING_OFFLINE callback slab_mem_going_offline_callback() which is calling flush_cpu_slab(). If that interrupt is caught between the reading of c->freelist and the reading of c->page, this could lead to such a situation. That situation is expected and the later call to this_cpu_cmpxchg_double() will detect the change to c->freelist and redo the whole operation. In commit |
||
Chen Tao
|
70b6d25ec5 |
mm: fix some comments formatting
Correct one function name "get_partials" with "get_partial". Update the old struct name of list3 with kmem_cache_node. Signed-off-by: Chen Tao <chentao3@hotmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Bharata B Rao
|
d1b2cf6cb8 |
mm: memcg/slab: uncharge during kmem_cache_free_bulk()
Object cgroup charging is done for all the objects during allocation, but
during freeing, uncharging ends up happening for only one object in the
case of bulk allocation/freeing.
Fix this by having a separate call to uncharge all the objects from
kmem_cache_free_bulk() and by modifying memcg_slab_free_hook() to take
care of bulk uncharging.
Fixes:
|
||
Abel Wu
|
9cf7a11183 |
mm/slub: make add_full() condition more explicit
The commit below is incomplete, as it didn't handle the add_full() part.
commit
|
||
Abel Wu
|
9f986d998a |
mm/slub: fix missing ALLOC_SLOWPATH stat when bulk alloc
The ALLOC_SLOWPATH statistics is missing in bulk allocation now. Fix it by doing statistics in alloc slow path. Signed-off-by: Abel Wu <wuyun.wu@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Hewenliang <hewenliang4@huawei.com> Cc: Hu Shiyuan <hushiyuan@huawei.com> Link: http://lkml.kernel.org/r/20200811022427.1363-1-wuyun.wu@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Abel Wu
|
c270cf3041 |
mm/slub.c: branch optimization in free slowpath
The two conditions are mutually exclusive and gcc compiler will optimise this into if-else-like pattern. Given that the majority of free_slowpath is free_frozen, let's provide some hint to the compilers. Tests (perf bench sched messaging -g 20 -l 400000, executed 10x after reboot) are done and the summarized result: un-patched patched max. 192.316 189.851 min. 187.267 186.252 avg. 189.154 188.086 stdev. 1.37 0.99 Signed-off-by: Abel Wu <wuyun.wu@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Hewenliang <hewenliang4@huawei.com> Cc: Hu Shiyuan <hushiyuan@huawei.com> Link: http://lkml.kernel.org/r/20200813101812.1617-1-wuyun.wu@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Eric Farman
|
484cfaca95 |
mm, slub: restore initial kmem_cache flags
The routine that applies debug flags to the kmem_cache slabs
inadvertantly prevents non-debug flags from being applied to those
same objects. That is, if slub_debug=<flag>,<slab> is specified,
non-debugged slabs will end up having flags of zero, and the slabs
may be unusable.
Fix this by including the input flags for non-matching slabs with the
contents of slub_debug, so that the caches are created as expected
alongside any debugging options that may be requested. With this, we
can remove the check for a NULL slub_debug_string, since it's covered
by the loop itself.
Fixes:
|
||
Eugeniu Rosca
|
dc07a728d4 |
mm: slub: fix conversion of freelist_corrupted()
Commit |
||
Roman Gushchin
|
74d555bed5 |
mm: slab: rename (un)charge_slab_page() to (un)account_slab_page()
charge_slab_page() and uncharge_slab_page() are not related anymore to memcg charging and uncharging. In order to make their names less confusing, let's rename them to account_slab_page() and unaccount_slab_page() respectively. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20200707173612.124425-2-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Roman Gushchin
|
849504809f |
mm: memcg/slab: remove unused argument by charge_slab_page()
charge_slab_page() is not using the gfp argument anymore, remove it. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Link: http://lkml.kernel.org/r/20200707173612.124425-1-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Roman Gushchin
|
10befea91b |
mm: memcg/slab: use a single set of kmem_caches for all allocations
Instead of having two sets of kmem_caches: one for system-wide and non-accounted allocations and the second one shared by all accounted allocations, we can use just one. The idea is simple: space for obj_cgroup metadata can be allocated on demand and filled only for accounted allocations. It allows to remove a bunch of code which is required to handle kmem_cache clones for accounted allocations. There is no more need to create them, accumulate statistics, propagate attributes, etc. It's a quite significant simplification. Also, because the total number of slab_caches is reduced almost twice (not all kmem_caches have a memcg clone), some additional memory savings are expected. On my devvm it additionally saves about 3.5% of slab memory. [guro@fb.com: fix build on MIPS] Link: http://lkml.kernel.org/r/20200717214810.3733082-1-guro@fb.com Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Link: http://lkml.kernel.org/r/20200623174037.3951353-18-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Roman Gushchin
|
c7094406fc |
mm: memcg/slab: deprecate slab_root_caches
Currently there are two lists of kmem_caches: 1) slab_caches, which contains all kmem_caches, 2) slab_root_caches, which contains only root kmem_caches. And there is some preprocessor magic to have a single list if CONFIG_MEMCG_KMEM isn't enabled. It was required earlier because the number of non-root kmem_caches was proportional to the number of memory cgroups and could reach really big values. Now, when it cannot exceed the number of root kmem_caches, there is really no reason to maintain two lists. We never iterate over the slab_root_caches list on any hot paths, so it's perfectly fine to iterate over slab_caches and filter out non-root kmem_caches. It allows to remove a lot of config-dependent code and two pointers from the kmem_cache structure. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20200623174037.3951353-16-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Roman Gushchin
|
9855609bde |
mm: memcg/slab: use a single set of kmem_caches for all accounted allocations
This is fairly big but mostly red patch, which makes all accounted slab allocations use a single set of kmem_caches instead of creating a separate set for each memory cgroup. Because the number of non-root kmem_caches is now capped by the number of root kmem_caches, there is no need to shrink or destroy them prematurely. They can be perfectly destroyed together with their root counterparts. This allows to dramatically simplify the management of non-root kmem_caches and delete a ton of code. This patch performs the following changes: 1) introduces memcg_params.memcg_cache pointer to represent the kmem_cache which will be used for all non-root allocations 2) reuses the existing memcg kmem_cache creation mechanism to create memcg kmem_cache on the first allocation attempt 3) memcg kmem_caches are named <kmemcache_name>-memcg, e.g. dentry-memcg 4) simplifies memcg_kmem_get_cache() to just return memcg kmem_cache or schedule it's creation and return the root cache 5) removes almost all non-root kmem_cache management code (separate refcounter, reparenting, shrinking, etc) 6) makes slab debugfs to display root_mem_cgroup css id and never show :dead and :deact flags in the memcg_slabinfo attribute. Following patches in the series will simplify the kmem_cache creation. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20200623174037.3951353-13-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Roman Gushchin
|
964d4bd370 |
mm: memcg/slab: save obj_cgroup for non-root slab objects
Store the obj_cgroup pointer in the corresponding place of page->obj_cgroups for each allocated non-root slab object. Make sure that each allocated object holds a reference to obj_cgroup. Objcg pointer is obtained from the memcg->objcg dereferencing in memcg_kmem_get_cache() and passed from pre_alloc_hook to post_alloc_hook. Then in case of successful allocation(s) it's getting stored in the page->obj_cgroups vector. The objcg obtaining part look a bit bulky now, but it will be simplified by next commits in the series. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20200623174037.3951353-9-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Roman Gushchin
|
4138fdfc8b |
mm: slub: implement SLUB version of obj_to_index()
This commit implements SLUB version of the obj_to_index() function, which
will be required to calculate the offset of obj_cgroup in the obj_cgroups
vector to store/obtain the objcg ownership data.
To make it faster, let's repeat the SLAB's trick introduced by commit
|
||
Roman Gushchin
|
d42f3245c7 |
mm: memcg: convert vmstat slab counters to bytes
In order to prepare for per-object slab memory accounting, convert NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE vmstat items to bytes. To make it obvious, rename them to NR_SLAB_RECLAIMABLE_B and NR_SLAB_UNRECLAIMABLE_B (similar to NR_KERNEL_STACK_KB). Internally global and per-node counters are stored in pages, however memcg and lruvec counters are stored in bytes. This scheme may look weird, but only for now. As soon as slab pages will be shared between multiple cgroups, global and node counters will reflect the total number of slab pages. However memcg and lruvec counters will be used for per-memcg slab memory tracking, which will take separate kernel objects in the account. Keeping global and node counters in pages helps to avoid additional overhead. The size of slab memory shouldn't exceed 4Gb on 32-bit machines, so it will fit into atomic_long_t we use for vmstats. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20200623174037.3951353-4-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Marco Elver
|
cfbe1636c3 |
mm, kcsan: instrument SLAB/SLUB free with "ASSERT_EXCLUSIVE_ACCESS"
Provide the necessary KCSAN checks to assist with debugging racy use-after-frees. While KASAN is more reliable at generally catching such use-after-frees (due to its use of a quarantine), it can be difficult to debug racy use-after-frees. If a reliable reproducer exists, KCSAN can assist in debugging such issues. Note: ASSERT_EXCLUSIVE_ACCESS is a convenience wrapper if the size is simply sizeof(var). Instead, here we just use __kcsan_check_access() explicitly to pass the correct size. Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200623072653.114563-1-elver@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Sebastian Andrzej Siewior
|
b3cb9fc3ae |
mm/slub.c: drop lockdep_assert_held() from put_map()
There is no point in using lockdep_assert_held() unlock that is about to be unlocked. It works only with lockdep and lockdep will complain if spin_unlock() is used on a lock that has not been locked. Remove superfluous lockdep_assert_held(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Yu Zhao <yuzhao@google.com> Cc: Christopher Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20200618201234.795692-2-bigeasy@linutronix.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Vlastimil Babka
|
e42f174e43 |
mm, slab/slub: improve error reporting and overhead of cache_from_obj()
cache_from_obj() was added by commit |
||
Vlastimil Babka
|
d3c58f24be |
mm, slab/slub: move and improve cache_from_obj()
The function cache_from_obj() was added by commit |
||
Vlastimil Babka
|
8fc8d66642 |
mm, slub: extend checks guarded by slub_debug static key
There are few more places in SLUB that could benefit from reduced overhead of the static key introduced by a previous patch: - setup_object_debug() called on each object in newly allocated slab page - setup_page_debug() called on newly allocated slab page - __free_slab() called on freed slab page Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20200610163135.17364-9-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Vlastimil Babka
|
59052e89fc |
mm, slub: introduce kmem_cache_debug_flags()
There are few places that call kmem_cache_debug(s) (which tests if any of debug flags are enabled for a cache) immediately followed by a test for a specific flag. The compiler can probably eliminate the extra check, but we can make the code nicer by introducing kmem_cache_debug_flags() that works like kmem_cache_debug() (including the static key check) but tests for specific flag(s). The next patches will add more users. [vbabka@suse.cz: change return from int to bool, per Kees. Add VM_WARN_ON_ONCE() for invalid flags, per Roman] Link: http://lkml.kernel.org/r/949b90ed-e0f0-07d7-4d21-e30ec0958a7c@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Jann Horn <jannh@google.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20200610163135.17364-8-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Vlastimil Babka
|
ca0cab65ea |
mm, slub: introduce static key for slub_debug()
One advantage of CONFIG_SLUB_DEBUG is that a generic distro kernel can be built with the option enabled, but it's inactive until simply enabled on boot, without rebuilding the kernel. With a static key, we can further eliminate the overhead of checking whether a cache has a particular debug flag enabled if we know that there are no such caches (slub_debug was not enabled during boot). We use the same mechanism also for e.g. page_owner, debug_pagealloc or kmemcg functionality. This patch introduces the static key and makes the general check for per-cache debug flags kmem_cache_debug() use it. This benefits several call sites, including (slow path but still rather frequent) __slab_free(). The next patches will add more uses. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Jann Horn <jannh@google.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20200610163135.17364-7-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Vlastimil Babka
|
8f58119ac4 |
mm, slub: make reclaim_account attribute read-only
The attribute reflects the SLAB_RECLAIM_ACCOUNT cache flag. It's not clear why this attribute was writable in the first place, as it's tied to how the cache is used by its creator, it's not a user tunable. Furthermore: - it affects slab merging, but that's not being checked while toggled - if affects whether __GFP_RECLAIMABLE flag is used to allocate page, but the runtime toggle doesn't update allocflags - it affects cache_vmstat_idx() so runtime toggling might lead to incosistency of NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE Thus make it read-only. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Roman Gushchin <guro@fb.com> Cc: Christoph Lameter <cl@linux.com> Cc: Jann Horn <jannh@google.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20200610163135.17364-6-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Vlastimil Babka
|
060807f841 |
mm, slub: make remaining slub_debug related attributes read-only
SLUB_DEBUG creates several files under /sys/kernel/slab/<cache>/ that can be read to check if the respective debugging options are enabled for given cache. Some options, namely sanity_checks, trace, and failslab can be also enabled and disabled at runtime by writing into the files. The runtime toggling is racy. Some options disable __CMPXCHG_DOUBLE when enabled, which means that in case of concurrent allocations, some can still use __CMPXCHG_DOUBLE and some not, leading to potential corruption. The s->flags field is also not updated or checked atomically. The simplest solution is to remove the runtime toggling. The extended slub_debug boot parameter syntax introduced by earlier patch should allow to fine-tune the debugging configuration during boot with same granularity. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Roman Gushchin <guro@fb.com> Cc: Christoph Lameter <cl@linux.com> Cc: Jann Horn <jannh@google.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20200610163135.17364-5-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Vlastimil Babka
|
32a6f409b6 |
mm, slub: remove runtime allocation order changes
SLUB allows runtime changing of page allocation order by writing into the /sys/kernel/slab/<cache>/order file. Jann has reported [1] that this interface allows the order to be set too small, leading to crashes. While it's possible to fix the immediate issue, closer inspection reveals potential races. Storing the new order calls calculate_sizes() which non-atomically updates a lot of kmem_cache fields while the cache is still in use. Unexpected behavior might occur even if the fields are set to the same value as they were. This could be fixed by splitting out the part of calculate_sizes() that depends on forced_order, so that we only update kmem_cache.oo field. This could still race with init_cache_random_seq(), shuffle_freelist(), allocate_slab(). Perhaps it's possible to audit and e.g. add some READ_ONCE/WRITE_ONCE accesses, it might be easier just to remove the runtime order changes, which is what this patch does. If there are valid usecases for per-cache order setting, we could e.g. extend the boot parameters to do that. [1] https://lore.kernel.org/r/CAG48ez31PP--h6_FzVyfJ4H86QYczAFPdxtJHUEEan+7VJETAQ@mail.gmail.com Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20200610163135.17364-4-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Vlastimil Babka
|
ad38b5b113 |
mm, slub: make some slub_debug related attributes read-only
SLUB_DEBUG creates several files under /sys/kernel/slab/<cache>/ that can be read to check if the respective debugging options are enabled for given cache. The options can be also toggled at runtime by writing into the files. Some of those, namely red_zone, poison, and store_user can be toggled only when no objects yet exist in the cache. Vijayanand reports [1] that there is a problem with freelist randomization if changing the debugging option's state results in different number of objects per page, and the random sequence cache needs thus needs to be recomputed. However, another problem is that the check for "no objects yet exist in the cache" is racy, as noted by Jann [2] and fixing that would add overhead or otherwise complicate the allocation/freeing paths. Thus it would be much simpler just to remove the runtime toggling support. The documentation describes it's "In case you forgot to enable debugging on the kernel command line", but the neccessity of having no objects limits its usefulness anyway for many caches. Vijayanand describes an use case [3] where debugging is enabled for all but zram caches for memory overhead reasons, and using the runtime toggles was the only way to achieve such configuration. After the previous patch it's now possible to do that directly from the kernel boot option, so we can remove the dangerous runtime toggles by making the /sys attribute files read-only. While updating it, also improve the documentation of the debugging /sys files. [1] https://lkml.kernel.org/r/1580379523-32272-1-git-send-email-vjitta@codeaurora.org [2] https://lore.kernel.org/r/CAG48ez31PP--h6_FzVyfJ4H86QYczAFPdxtJHUEEan+7VJETAQ@mail.gmail.com [3] https://lore.kernel.org/r/1383cd32-1ddc-4dac-b5f8-9c42282fa81c@codeaurora.org Reported-by: Vijayanand Jitta <vjitta@codeaurora.org> Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Roman Gushchin <guro@fb.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20200610163135.17364-3-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Vlastimil Babka
|
e17f1dfba3 |
mm, slub: extend slub_debug syntax for multiple blocks
Patch series "slub_debug fixes and improvements". The slub_debug kernel boot parameter can either apply a single set of options to all caches or a list of caches. There is a use case where debugging is applied for all caches and then disabled at runtime for specific caches, for performance and memory consumption reasons [1]. As runtime changes are dangerous, extend the boot parameter syntax so that multiple blocks of either global or slab-specific options can be specified, with blocks delimited by ';'. This will also support the use case of [1] without runtime changes. For details see the updated Documentation/vm/slub.rst [1] https://lore.kernel.org/r/1383cd32-1ddc-4dac-b5f8-9c42282fa81c@codeaurora.org [weiyongjun1@huawei.com: make parse_slub_debug_flags() static] Link: http://lkml.kernel.org/r/20200702150522.4940-1-weiyongjun1@huawei.com Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Jann Horn <jannh@google.com> Cc: Roman Gushchin <guro@fb.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200610163135.17364-2-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Long Li
|
444050990d |
mm, slab: check GFP_SLAB_BUG_MASK before alloc_pages in kmalloc_order
kmalloc cannot allocate memory from HIGHMEM. Allocating large amounts of memory currently bypasses the check and will simply leak the memory when page_address() returns NULL. To fix this, factor the GFP_SLAB_BUG_MASK check out of slab & slub, and call it from kmalloc_order() as well. In order to make the code clear, the warning message is put in one place. Signed-off-by: Long Li <lonuxli.64@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200704035027.GA62481@lilong Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Kees Cook
|
3f649ab728 |
treewide: Remove uninitialized_var() usage
Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org> |
||
Sebastian Andrzej Siewior
|
55860d96ca |
slub: cure list_slab_objects() from double fix
According to Christopher Lameter two fixes have been merged for the same
problem. As far as I can tell, the code does not acquire the list_lock
and invoke kmalloc(). list_slab_objects() misses an unlock (the
counterpart to get_map()) and the memory allocated in free_partial()
isn't used.
Revert the mentioned commit.
Link: http://lkml.kernel.org/r/20200618201234.795692-1-bigeasy@linutronix.de
Fixes:
|
||
Christoph Hellwig
|
fe557319aa |
maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault
Better describe what these functions do. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Ethon Paul
|
0d645ed19c |
mm/slub: fix a typo in comment "disambiguiation"->"disambiguation"
There is a typo in comment, fix it. Signed-off-by: Ethon Paul <ethp@qq.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/20200411002247.14468-1-ethp@qq.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Joonsoo Kim
|
97a225e69a |
mm/page_alloc: integrate classzone_idx and high_zoneidx
classzone_idx is just different name for high_zoneidx now. So, integrate them and add some comment to struct alloc_context in order to reduce future confusion about the meaning of this variable. The accessor, ac_classzone_idx() is also removed since it isn't needed after integration. In addition to integration, this patch also renames high_zoneidx to highest_zoneidx since it represents more precise meaning. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Baoquan He <bhe@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Ye Xiaolong <xiaolong.ye@intel.com> Link: http://lkml.kernel.org/r/1587095923-7515-3-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Wang Hai
|
dde3c6b72a |
mm/slub: fix a memory leak in sysfs_slab_add()
syzkaller reports for memory leak when kobject_init_and_add() returns an
error in the function sysfs_slab_add() [1]
When this happened, the function kobject_put() is not called for the
corresponding kobject, which potentially leads to memory leak.
This patch fixes the issue by calling kobject_put() even if
kobject_init_and_add() fails.
[1]
BUG: memory leak
unreferenced object 0xffff8880a6d4be88 (size 8):
comm "syz-executor.3", pid 946, jiffies 4295772514 (age 18.396s)
hex dump (first 8 bytes):
70 69 64 5f 33 00 ff ff pid_3...
backtrace:
kstrdup+0x35/0x70 mm/util.c:60
kstrdup_const+0x3d/0x50 mm/util.c:82
kvasprintf_const+0x112/0x170 lib/kasprintf.c:48
kobject_set_name_vargs+0x55/0x130 lib/kobject.c:289
kobject_add_varg lib/kobject.c:384 [inline]
kobject_init_and_add+0xd8/0x170 lib/kobject.c:473
sysfs_slab_add+0x1d8/0x290 mm/slub.c:5811
__kmem_cache_create+0x50a/0x570 mm/slub.c:4384
create_cache+0x113/0x1e0 mm/slab_common.c:407
kmem_cache_create_usercopy+0x1a1/0x260 mm/slab_common.c:505
kmem_cache_create+0xd/0x10 mm/slab_common.c:564
create_pid_cachep kernel/pid_namespace.c:54 [inline]
create_pid_namespace kernel/pid_namespace.c:96 [inline]
copy_pid_ns+0x77c/0x8f0 kernel/pid_namespace.c:148
create_new_namespaces+0x26b/0xa30 kernel/nsproxy.c:95
unshare_nsproxy_namespaces+0xa7/0x1e0 kernel/nsproxy.c:229
ksys_unshare+0x3d2/0x770 kernel/fork.c:2969
__do_sys_unshare kernel/fork.c:3037 [inline]
__se_sys_unshare kernel/fork.c:3035 [inline]
__x64_sys_unshare+0x2d/0x40 kernel/fork.c:3035
do_syscall_64+0xa1/0x530 arch/x86/entry/common.c:295
Fixes:
|
||
Linus Torvalds
|
faa392181a |
drm pull for 5.8-rc1
core: - uapi: error out EBUSY when existing master - uapi: rework SET/DROP MASTER permission handling - remove drm_pci.h - drm_pci* are now legacy - introduced managed DRM resources - subclassing support for drm_framebuffer - simple encoder helper - edid improvements - vblank + writeback documentation improved - drm/mm - optimise tree searches - port drivers to use devm_drm_dev_alloc dma-buf: - add flag for p2p buffer support mst: - ACT timeout improvements - remove drm_dp_mst_has_audio - don't use 2nd TX slot - spec recommends against it bridge: - dw-hdmi various improvements - chrontel ch7033 support - fix stack issues with old gcc hdmi: - add unpack function for drm infoframe fbdev: - misc fbdev driver fixes i915: - uapi: global sseu pinning - uapi: OA buffer polling - uapi: remove generated perf code - uapi: per-engine default property values in sysfs - Tigerlake GEN12 enabled. - Lots of gem refactoring - Tigerlake enablement patches - move to drm_device logging - Icelake gamma HW readout - push MST link retrain to hotplug work - bandwidth atomic helpers - ICL fixes - RPS/GT refactoring - Cherryview full-ppgtt support - i915 locking guidelines documented - require linear fb stride to be 512 multiple on gen9 - Tigerlake SAGV support amdgpu: - uapi: encrypted GPU memory handling - uapi: add MEM_SYNC IB flag - p2p dma-buf support - export VRAM dma-bufs - FRU chip access support - RAS/SR-IOV updates - Powerplay locking fixes - VCN DPG (powergating) enablement - GFX10 clockgating fixes - DC fixes - GPU reset fixes - navi SDMA fix - expose FP16 for modesetting - DP 1.4 compliance fixes - gfx10 soft recovery - Improved Critical Thermal Faults handling - resizable BAR on gmc10 amdkfd: - uapi: GWS resource management - track GPU memory per process - report PCI domain in topology radeon: - safe reg list generator fixes nouveau: - HD audio fixes on recent systems - vGPU detection (fail probe if we're on one, for now) - Interlaced mode fixes (mostly avoidance on Turing, which doesn't support it) - SVM improvements/fixes - NVIDIA format modifier support - Misc other fixes. adv7511: - HDMI SPDIF support ast: - allocate crtc state size - fix double assignment - fix suspend bochs: - drop connector register cirrus: - move to tiny drivers. exynos: - fix imported dma-buf mapping - enable runtime PM - fixes and cleanups mediatek: - DPI pin mode swap - config mipi_tx current/impedance lima: - devfreq + cooling device support - task handling improvements - runtime PM support pl111: - vexpress init improvements - fix module auto-load rcar-du: - DT bindings conversion to YAML - Planes zpos sanity check and fix - MAINTAINERS entry for LVDS panel driver mcde: - fix return value mgag200: - use managed config init stm: - read endpoints from DT vboxvideo: - use PCI managed functions - drop WC mtrr vkms: - enable cursor by default rockchip: - afbc support virtio: - various cleanups qxl: - fix cursor notify port hisilicon: - 128-byte stride alignment fix sun4i: - improved format handling -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJe1edsAAoJEAx081l5xIa+bKEQAJAZv/8OMM2rx+p+GyKgrNpl ihTX/oyToy8dw97s1kWF7V5kKU+qjF8aWlKoPS0xovzaMAzYSFz9FRNEUgqtTXMI zIAzSXioqP21oL9/ZTHcXDULtz8Gk3uiPomgXMWLlNBdt3X5qvCwsmPRIYSwG0GJ 00VCvxDbVxGSM3wzcvbfyRwHCq3SrFvIusXv5jHnnxEFGH0C7Mj2/FLYMKLNjvli Q8VEI2wQPZj1QdA8fLFVneIQsR6YUSko9OfFMANP8VJGpPMnUkvVxTJ5ACGJspvn U/h6NYqJeUU2Y3BSKqtjIC3a1LY51tp5tL9q4H9TD1hqMckt6F2V7T2IeFU8i6+V YzUsSiT4q1xB+uiFVcgopx2hyIp8INOEyWrVdYgw2JviROeRD+pDHvJd13ZNMnTe GvLWQ/PfBFrcz8eligjiYjOf66ZTU+j/rivaOBFyrs9gdlsaEW2QRurFrcNX+0lZ kDbLsIFjhYnPXsvHP87x4BuQCKQIEh8wWuxXuJjunBPdqVrJyltZWbBiKO571b5/ BtX6xj6ztUOffR2RdiVanzY546I2hEi7SHMUuWnMqXsOV46GBN0QvlpZad/47n9x ZUy8HDDD0/qWuGwvPOJGIeAnUteWge9AhWXTeN5+1h5m+QEOzYkPKqC3Hp8TW1pM gToTWgAhnu731fhzLWyt =H7IS -----END PGP SIGNATURE----- Merge tag 'drm-next-2020-06-02' of git://anongit.freedesktop.org/drm/drm Pull drm updates from Dave Airlie: "Highlights: - Core DRM had a lot of refactoring around managed drm resources to make drivers simpler. - Intel Tigerlake support is on by default - amdgpu now support p2p PCI buffer sharing and encrypted GPU memory Details: core: - uapi: error out EBUSY when existing master - uapi: rework SET/DROP MASTER permission handling - remove drm_pci.h - drm_pci* are now legacy - introduced managed DRM resources - subclassing support for drm_framebuffer - simple encoder helper - edid improvements - vblank + writeback documentation improved - drm/mm - optimise tree searches - port drivers to use devm_drm_dev_alloc dma-buf: - add flag for p2p buffer support mst: - ACT timeout improvements - remove drm_dp_mst_has_audio - don't use 2nd TX slot - spec recommends against it bridge: - dw-hdmi various improvements - chrontel ch7033 support - fix stack issues with old gcc hdmi: - add unpack function for drm infoframe fbdev: - misc fbdev driver fixes i915: - uapi: global sseu pinning - uapi: OA buffer polling - uapi: remove generated perf code - uapi: per-engine default property values in sysfs - Tigerlake GEN12 enabled. - Lots of gem refactoring - Tigerlake enablement patches - move to drm_device logging - Icelake gamma HW readout - push MST link retrain to hotplug work - bandwidth atomic helpers - ICL fixes - RPS/GT refactoring - Cherryview full-ppgtt support - i915 locking guidelines documented - require linear fb stride to be 512 multiple on gen9 - Tigerlake SAGV support amdgpu: - uapi: encrypted GPU memory handling - uapi: add MEM_SYNC IB flag - p2p dma-buf support - export VRAM dma-bufs - FRU chip access support - RAS/SR-IOV updates - Powerplay locking fixes - VCN DPG (powergating) enablement - GFX10 clockgating fixes - DC fixes - GPU reset fixes - navi SDMA fix - expose FP16 for modesetting - DP 1.4 compliance fixes - gfx10 soft recovery - Improved Critical Thermal Faults handling - resizable BAR on gmc10 amdkfd: - uapi: GWS resource management - track GPU memory per process - report PCI domain in topology radeon: - safe reg list generator fixes nouveau: - HD audio fixes on recent systems - vGPU detection (fail probe if we're on one, for now) - Interlaced mode fixes (mostly avoidance on Turing, which doesn't support it) - SVM improvements/fixes - NVIDIA format modifier support - Misc other fixes. adv7511: - HDMI SPDIF support ast: - allocate crtc state size - fix double assignment - fix suspend bochs: - drop connector register cirrus: - move to tiny drivers. exynos: - fix imported dma-buf mapping - enable runtime PM - fixes and cleanups mediatek: - DPI pin mode swap - config mipi_tx current/impedance lima: - devfreq + cooling device support - task handling improvements - runtime PM support pl111: - vexpress init improvements - fix module auto-load rcar-du: - DT bindings conversion to YAML - Planes zpos sanity check and fix - MAINTAINERS entry for LVDS panel driver mcde: - fix return value mgag200: - use managed config init stm: - read endpoints from DT vboxvideo: - use PCI managed functions - drop WC mtrr vkms: - enable cursor by default rockchip: - afbc support virtio: - various cleanups qxl: - fix cursor notify port hisilicon: - 128-byte stride alignment fix sun4i: - improved format handling" * tag 'drm-next-2020-06-02' of git://anongit.freedesktop.org/drm/drm: (1401 commits) drm/amd/display: Fix potential integer wraparound resulting in a hang drm/amd/display: drop cursor position check in atomic test drm/amdgpu: fix device attribute node create failed with multi gpu drm/nouveau: use correct conflicting framebuffer API drm/vblank: Fix -Wformat compile warnings on some arches drm/amdgpu: Sync with VM root BO when switching VM to CPU update mode drm/amd/display: Handle GPU reset for DC block drm/amdgpu: add apu flags (v2) drm/amd/powerpay: Disable gfxoff when setting manual mode on picasso and raven drm/amdgpu: fix pm sysfs node handling (v2) drm/amdgpu: move gpu_info parsing after common early init drm/amdgpu: move discovery gfx config fetching drm/nouveau/dispnv50: fix runtime pm imbalance on error drm/nouveau: fix runtime pm imbalance on error drm/nouveau: fix runtime pm imbalance on error drm/nouveau/debugfs: fix runtime pm imbalance on error drm/nouveau/nouveau/hmm: fix migrate zero page to GPU drm/nouveau/nouveau/hmm: fix nouveau_dmem_chunk allocations drm/nouveau/kms/nv50-: Share DP SST mode_valid() handling with MST drm/nouveau/kms/nv50-: Move 8BPC limit for MST into nv50_mstc_get_modes() ... |
||
Qian Cai
|
a68ee05739 |
mm/slub: fix stack overruns with SLUB_STATS
There is no need to copy SLUB_STATS items from root memcg cache to new
memcg cache copies. Doing so could result in stack overruns because the
store function only accepts 0 to clear the stat and returns an error for
everything else while the show method would print out the whole stat.
Then, the mismatch of the lengths returns from show and store methods
happens in memcg_propagate_slab_attrs():
else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf))
buf = mbuf;
max_attr_size is only 2 from slab_attr_store(), then, it uses mbuf[64]
in show_stat() later where a bounch of sprintf() would overrun the stack
variable. Fix it by always allocating a page of buffer to be used in
show_stat() if SLUB_STATS=y which should only be used for debug purpose.
# echo 1 > /sys/kernel/slab/fs_cache/shrink
BUG: KASAN: stack-out-of-bounds in number+0x421/0x6e0
Write of size 1 at addr ffffc900256cfde0 by task kworker/76:0/53251
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
Call Trace:
number+0x421/0x6e0
vsnprintf+0x451/0x8e0
sprintf+0x9e/0xd0
show_stat+0x124/0x1d0
alloc_slowpath_show+0x13/0x20
__kmem_cache_create+0x47a/0x6b0
addr ffffc900256cfde0 is located in stack of task kworker/76:0/53251 at offset 0 in frame:
process_one_work+0x0/0xb90
this frame has 1 object:
[32, 72) 'lockdep_map'
Memory state around the buggy address:
ffffc900256cfc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffc900256cfd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffc900256cfd80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
^
ffffc900256cfe00: 00 00 00 00 00 f2 f2 f2 00 00 00 00 00 00 00 00
ffffc900256cfe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __kmem_cache_create+0x6ac/0x6b0
Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
Call Trace:
__kmem_cache_create+0x6ac/0x6b0
Fixes:
|
||
Christopher Lameter
|
aa456c7aeb |
slub: remove kmalloc under list_lock from list_slab_objects() V2
list_slab_objects() is called when a slab is destroyed and there are
objects still left to list the objects in the syslog. This is a pretty
rare event.
And there it seems we take the list_lock and call kmalloc while holding
that lock.
Perform the allocation in free_partial() before the list_lock is taken.
Fixes:
|
||
Christoph Lameter
|
d7660ce591 |
slub: Remove userspace notifier for cache add/remove
I came across some unnecessary uevents once again which reminded me this. The patch seems to be lost in the leaves of the original discussion [1], so resending. [1] https://lore.kernel.org/r/alpine.DEB.2.21.2001281813130.745@www.lameter.com Kmem caches are internal kernel structures so it is strange that userspace notifiers would be needed. And I am not aware of any use of these notifiers. These notifiers may just exist because in the initial slub release the sysfs code was copied from another subsystem. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Koutný <mkoutny@suse.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200423115721.19821-1-mkoutny@suse.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Dongli Zhang
|
52f2347808 |
mm/slub.c: fix corrupted freechain in deactivate_slab()
The slub_debug is able to fix the corrupted slab freelist/page. However, alloc_debug_processing() only checks the validity of current and next freepointer during allocation path. As a result, once some objects have their freepointers corrupted, deactivate_slab() may lead to page fault. Below is from a test kernel module when 'slub_debug=PUF,kmalloc-128 slub_nomerge'. The test kernel corrupts the freepointer of one free object on purpose. Unfortunately, deactivate_slab() does not detect it when iterating the freechain. BUG: unable to handle page fault for address: 00000000123456f8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI ... ... RIP: 0010:deactivate_slab.isra.92+0xed/0x490 ... ... Call Trace: ___slab_alloc+0x536/0x570 __slab_alloc+0x17/0x30 __kmalloc+0x1d9/0x200 ext4_htree_store_dirent+0x30/0xf0 htree_dirblock_to_tree+0xcb/0x1c0 ext4_htree_fill_tree+0x1bc/0x2d0 ext4_readdir+0x54f/0x920 iterate_dir+0x88/0x190 __x64_sys_getdents+0xa6/0x140 do_syscall_64+0x49/0x170 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Therefore, this patch adds extra consistency check in deactivate_slab(). Once an object's freepointer is corrupted, all following objects starting at this object are isolated. [akpm@linux-foundation.org: fix build with CONFIG_SLAB_DEBUG=n] Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Joe Jin <joe.jin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200331031450.12182-1-dongli.zhang@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Waiman Long
|
cbfc35a486 |
mm/slub: fix incorrect interpretation of s->offset
In a couple of places in the slub memory allocator, the code uses "s->offset" as a check to see if the free pointer is put right after the object. That check is no longer true with commit |
||
Dave Airlie
|
1aa63ddf72 |
drm-misc-next for 5.8:
UAPI Changes: - drm: error out with EBUSY when device has existing master - drm: rework SET_MASTER and DROP_MASTER perm handling Cross-subsystem Changes: - fbdev: savage: fix -Wextra build warning - video: omap2: Use scnprintf() for avoiding potential buffer overflow Core Changes: - Remove drm_pci.h - drm_pci_{alloc/free)() are now legacy - Introduce managed DRM resourcesA - Allow drivers to subclass struct drm_framebuffer - Introduce struct drm_afbc_framebuffer and helpers - fbdev: remove return value from generic fbdev setup - Introduce simple-encoder helper - vram-helpers: set fence on plane - dp_mst: ACT timeout improvements - dp_mst: Remove drm_dp_mst_has_audio() - TTM: ttm_trace_dma_{map/unmap}() cleanups - dma-buf: add flag for PCIP2P support - EDID: Various improvements - Encoder: cleanup semantics of possible_clones and possible_crtcs - VBLANK documentation updates - Writeback documentation updates Driver Changes: - Convert several drivers to i2c_new_client_device() - Drop explicit drm_mode_config_cleanup() calls from drivers - Auto-release device structures with drmm_add_final_kfree() - Init bfdev console after registering DRM device - Make various .debugfs functions return 0 unconditionally; ignore errors - video: Use scnprintf() to avoid buffer overflows - Convert drivers to simple encoders - drm/amdgpu: note that we can handle peer2peer DMA-buf - drm/amdgpu: add support for exporting VRAM using DMA-buf v3 - drm/kirin: Revert change to register connectors - drm/lima: Add optional devfreq and cooling device support - drm/lima: Various improvements wrt. task handling - drm/panel: nt39016: Support multiple modes and 50Hz - drm/panel: Support Leadtek LTK050H3146W - drm/rockchip: Add support for afbc - drm/virtio: Various cleanups - drm/hisilicon/hibmc: Enforce 128-byte stride alignment - drm/qxl: Fix notify port address of cursor ring buffer - drm/sun4i: Improvements to format handling - drm/bridge: dw-hdmi: Various improvements -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAl6VfAAACgkQaA3BHVML eiNjBwgAtzRaqrKX3c4aL4NCBmfWzqxvKN0fVcx8tHtjhmrPTLITsHCM+wfcD2qC lkr/RMYJT02pNPGnX3jamQk0q/2GKGagChVZgORRsdYOOf5IqGIjvllhkg+U+7YV X0pHAfvGk2VyriHYj3s/cnwi9OwZ2UFjdS+f/u2Qp9jQYG/k8u9CCSnzgratY99I bI4jZi6JIoRkwuBpBEc9NbrduenKhcYNgPLDiYXY2TFmVz89NwITPnLyf5FWG5zd HsQ+dfIS9eoIxL3DTRgBZrPMvrqgiUjztB7cM4bdE0ttwTS7MW6M50/iV553qb9k DZ1+/pWFFyZLOPUYc3EK/QYdu8R3QA== =MQkd -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2020-04-14' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 5.8: UAPI Changes: - drm: error out with EBUSY when device has existing master - drm: rework SET_MASTER and DROP_MASTER perm handling Cross-subsystem Changes: - mm: export two symbols from slub/slob - fbdev: savage: fix -Wextra build warning - video: omap2: Use scnprintf() for avoiding potential buffer overflow Core Changes: - Remove drm_pci.h - drm_pci_{alloc/free)() are now legacy - Introduce managed DRM resourcesA - Allow drivers to subclass struct drm_framebuffer - Introduce struct drm_afbc_framebuffer and helpers - fbdev: remove return value from generic fbdev setup - Introduce simple-encoder helper - vram-helpers: set fence on plane - dp_mst: ACT timeout improvements - dp_mst: Remove drm_dp_mst_has_audio() - TTM: ttm_trace_dma_{map/unmap}() cleanups - dma-buf: add flag for PCIP2P support - EDID: Various improvements - Encoder: cleanup semantics of possible_clones and possible_crtcs - VBLANK documentation updates - Writeback documentation updates Driver Changes: - Convert several drivers to i2c_new_client_device() - Drop explicit drm_mode_config_cleanup() calls from drivers - Auto-release device structures with drmm_add_final_kfree() - Init bfdev console after registering DRM device - Make various .debugfs functions return 0 unconditionally; ignore errors - video: Use scnprintf() to avoid buffer overflows - Convert drivers to simple encoders - drm/amdgpu: note that we can handle peer2peer DMA-buf - drm/amdgpu: add support for exporting VRAM using DMA-buf v3 - drm/kirin: Revert change to register connectors - drm/lima: Add optional devfreq and cooling device support - drm/lima: Various improvements wrt. task handling - drm/panel: nt39016: Support multiple modes and 50Hz - drm/panel: Support Leadtek LTK050H3146W - drm/rockchip: Add support for afbc - drm/virtio: Various cleanups - drm/hisilicon/hibmc: Enforce 128-byte stride alignment - drm/qxl: Fix notify port address of cursor ring buffer - drm/sun4i: Improvements to format handling - drm/bridge: dw-hdmi: Various improvements Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20200414090738.GA16827@linux-uq9g |
||
Kees Cook
|
89b83f282d |
slub: avoid redzone when choosing freepointer location
Marco Elver reported system crashes when booting with "slub_debug=Z".
The freepointer location (s->offset) was not taking into account that
the "inuse" size that includes the redzone area should not be used by
the freelist pointer. Change the calculation to save the area of the
object that an inline freepointer may be written into.
Fixes:
|
||
Jules Irenge
|
81aba9e06b |
mm/slub: add missing annotation for put_map()
Sparse reports a warning at put_map()() warning: context imbalance in put_map() - unexpected unlock The root cause is the missing annotation at put_map() Add the missing __releases(&object_map_lock) annotation Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200214204741.94112-10-jbi.octave@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Jules Irenge
|
31364c2e16 |
mm/slub: add missing annotation for get_map()
Sparse reports a warning at get_map()() warning: context imbalance in get_map() - wrong count at exit The root cause is the missing annotation at get_map() Add the missing __acquires(&object_map_lock) annotation Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200214204741.94112-9-jbi.octave@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Kees Cook
|
3202fa62fb |
slub: relocate freelist pointer to middle of object
In a recent discussion[1] with Vitaly Nikolenko and Silvio Cesare, it became clear that moving the freelist pointer away from the edge of allocations would likely improve the overall defensive posture of the inline freelist pointer. My benchmarks show no meaningful change to performance (they seem to show it being faster), so this looks like a reasonable change to make. Instead of having the freelist pointer at the very beginning of an allocation (offset 0) or at the very end of an allocation (effectively offset -sizeof(void *) from the next allocation), move it away from the edges of the allocation and into the middle. This provides some protection against small-sized neighboring overflows (or underflows), for which the freelist pointer is commonly the target. (Large or well controlled overwrites are much more likely to attack live object contents, instead of attempting freelist corruption.) The vaunted kernel build benchmark, across 5 runs. Before: Mean: 250.05 Std Dev: 1.85 and after, which appears mysteriously faster: Mean: 247.13 Std Dev: 0.76 Attempts at running "sysbench --test=memory" show the change to be well in the noise (sysbench seems to be pretty unstable here -- it's not really measuring allocation). Hackbench is more allocation-heavy, and while the std dev is above the difference, it looks like may manifest as an improvement as well: 20 runs of "hackbench -g 20 -l 1000", before: Mean: 36.322 Std Dev: 0.577 and after: Mean: 36.056 Std Dev: 0.598 [1] https://twitter.com/vnik5287/status/1235113523098685440 Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Christoph Lameter <cl@linux.com> Cc: Vitaly Nikolenko <vnik@duasynt.com> Cc: Silvio Cesare <silvio.cesare@gmail.com> Cc: Christoph Lameter <cl@linux.com>Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/202003051624.AAAC9AECC@keescook Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Kees Cook
|
1ad53d9fa3 |
slub: improve bit diffusion for freelist ptr obfuscation
Under CONFIG_SLAB_FREELIST_HARDENED=y, the obfuscation was relatively weak
in that the ptr and ptr address were usually so close that the first XOR
would result in an almost entirely 0-byte value[1], leaving most of the
"secret" number ultimately being stored after the third XOR. A single
blind memory content exposure of the freelist was generally sufficient to
learn the secret.
Add a swab() call to mix bits a little more. This is a cheap way (1
cycle) to make attacks need more than a single exposure to learn the
secret (or to know _where_ the exposure is in memory).
kmalloc-32 freelist walk, before:
ptr ptr_addr stored value secret
ffff90c22e019020@ffff90c22e019000 is 86528eb656b3b5bd (86528eb656b3b59d)
ffff90c22e019040@ffff90c22e019020 is 86528eb656b3b5fd (86528eb656b3b59d)
ffff90c22e019060@ffff90c22e019040 is 86528eb656b3b5bd (86528eb656b3b59d)
ffff90c22e019080@ffff90c22e019060 is 86528eb656b3b57d (86528eb656b3b59d)
ffff90c22e0190a0@ffff90c22e019080 is 86528eb656b3b5bd (86528eb656b3b59d)
...
after:
ptr ptr_addr stored value secret
ffff9eed6e019020@ffff9eed6e019000 is 793d1135d52cda42 (86528eb656b3b59d)
ffff9eed6e019040@ffff9eed6e019020 is 593d1135d52cda22 (86528eb656b3b59d)
ffff9eed6e019060@ffff9eed6e019040 is 393d1135d52cda02 (86528eb656b3b59d)
ffff9eed6e019080@ffff9eed6e019060 is 193d1135d52cdae2 (86528eb656b3b59d)
ffff9eed6e0190a0@ffff9eed6e019080 is f93d1135d52cdac2 (86528eb656b3b59d)
[1] https://blog.infosectcbr.com.au/2020/03/weaknesses-in-linux-kernel-heap.html
Fixes:
|
||
chenqiwu
|
bbd4e305e3 |
mm/slub.c: replace kmem_cache->cpu_partial with wrapped APIs
There are slub_cpu_partial() and slub_set_cpu_partial() APIs to wrap kmem_cache->cpu_partial. This patch will use the two APIs to replace kmem_cache->cpu_partial in slub code. Signed-off-by: chenqiwu <chenqiwu@xiaomi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/1582079562-17980-1-git-send-email-qiwuchen55@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
chenqiwu
|
4c7ba22e4c |
mm/slub.c: replace cpu_slab->partial with wrapped APIs
There are slub_percpu_partial() and slub_set_percpu_partial() APIs to wrap kmem_cache->cpu_partial. This patch will use the two to replace cpu_slab->partial in slub code. Signed-off-by: chenqiwu <chenqiwu@xiaomi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/1581951895-3038-1-git-send-email-qiwuchen55@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Daniel Vetter
|
fd7cb5753e |
mm/sl[uo]b: export __kmalloc_track(_node)_caller
slab does this already, and I want to use this in a memory allocation tracker in drm for stuff that's tied to the lifetime of a drm_device, not the underlying struct device. Kinda like devres, but for drm. Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Link: https://patchwork.freedesktop.org/patch/msgid/20200323144950.3018436-2-daniel.vetter@ffwll.ch |
||
Vlastimil Babka
|
0715e6c516 |
mm, slub: prevent kmalloc_node crashes and memory leaks
Sachin reports [1] a crash in SLUB __slab_alloc(): BUG: Kernel NULL pointer dereference on read at 0x000073b0 Faulting instruction address: 0xc0000000003d55f4 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 19 PID: 1 Comm: systemd Not tainted 5.6.0-rc2-next-20200218-autotest #1 NIP: c0000000003d55f4 LR: c0000000003d5b94 CTR: 0000000000000000 REGS: c0000008b37836d0 TRAP: 0300 Not tainted (5.6.0-rc2-next-20200218-autotest) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24004844 XER: 00000000 CFAR: c00000000000dec4 DAR: 00000000000073b0 DSISR: 40000000 IRQMASK: 1 GPR00: c0000000003d5b94 c0000008b3783960 c00000000155d400 c0000008b301f500 GPR04: 0000000000000dc0 0000000000000002 c0000000003443d8 c0000008bb398620 GPR08: 00000008ba2f0000 0000000000000001 0000000000000000 0000000000000000 GPR12: 0000000024004844 c00000001ec52a00 0000000000000000 0000000000000000 GPR16: c0000008a1b20048 c000000001595898 c000000001750c18 0000000000000002 GPR20: c000000001750c28 c000000001624470 0000000fffffffe0 5deadbeef0000122 GPR24: 0000000000000001 0000000000000dc0 0000000000000002 c0000000003443d8 GPR28: c0000008b301f500 c0000008bb398620 0000000000000000 c00c000002287180 NIP ___slab_alloc+0x1f4/0x760 LR __slab_alloc+0x34/0x60 Call Trace: ___slab_alloc+0x334/0x760 (unreliable) __slab_alloc+0x34/0x60 __kmalloc_node+0x110/0x490 kvmalloc_node+0x58/0x110 mem_cgroup_css_online+0x108/0x270 online_css+0x48/0xd0 cgroup_apply_control_enable+0x2ec/0x4d0 cgroup_mkdir+0x228/0x5f0 kernfs_iop_mkdir+0x90/0xf0 vfs_mkdir+0x110/0x230 do_mkdirat+0xb0/0x1a0 system_call+0x5c/0x68 This is a PowerPC platform with following NUMA topology: available: 2 nodes (0-1) node 0 cpus: node 0 size: 0 MB node 0 free: 0 MB node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 node 1 size: 35247 MB node 1 free: 30907 MB node distances: node 0 1 0: 10 40 1: 40 10 possible numa nodes: 0-31 This only happens with a mmotm patch "mm/memcontrol.c: allocate shrinker_map on appropriate NUMA node" [2] which effectively calls kmalloc_node for each possible node. SLUB however only allocates kmem_cache_node on online N_NORMAL_MEMORY nodes, and relies on node_to_mem_node to return such valid node for other nodes since commit |
||
Linus Torvalds
|
5076190dad |
mm: slub: be more careful about the double cmpxchg of freelist
This is just a cleanup addition to Jann's fix to properly update the
transaction ID for the slub slowpath in commit
|
||
Jann Horn
|
fd4d9c7d0c |
mm: slub: add missing TID bump in kmem_cache_alloc_bulk()
When kmem_cache_alloc_bulk() attempts to allocate N objects from a percpu
freelist of length M, and N > M > 0, it will first remove the M elements
from the percpu freelist, then call ___slab_alloc() to allocate the next
element and repopulate the percpu freelist. ___slab_alloc() can re-enable
IRQs via allocate_slab(), so the TID must be bumped before ___slab_alloc()
to properly commit the freelist head change.
Fix it by unconditionally bumping c->tid when entering the slowpath.
Cc: stable@vger.kernel.org
Fixes:
|
||
Yu Zhao
|
90e9f6a66c |
mm/slub.c: avoid slub allocation while holding list_lock
If we are already under list_lock, don't call kmalloc(). Otherwise we will run into a deadlock because kmalloc() also tries to grab the same lock. Fix the problem by using a static bitmap instead. WARNING: possible recursive locking detected -------------------------------------------- mount-encrypted/4921 is trying to acquire lock: (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437 but task is already holding lock: (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&n->list_lock)->rlock); lock(&(&n->list_lock)->rlock); *** DEADLOCK *** Link: http://lkml.kernel.org/r/20191108193958.205102-2-yuzhao@google.com Signed-off-by: Yu Zhao <yuzhao@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
c677124e63 |
Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar: "These were the main changes in this cycle: - More -rt motivated separation of CONFIG_PREEMPT and CONFIG_PREEMPTION. - Add more low level scheduling topology sanity checks and warnings to filter out nonsensical topologies that break scheduling. - Extend uclamp constraints to influence wakeup CPU placement - Make the RT scheduler more aware of asymmetric topologies and CPU capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y - Make idle CPU selection more consistent - Various fixes, smaller cleanups, updates and enhancements - please see the git log for details" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits) sched/fair: Define sched_idle_cpu() only for SMP configurations sched/topology: Assert non-NUMA topology masks don't (partially) overlap idle: fix spelling mistake "iterrupts" -> "interrupts" sched/fair: Remove redundant call to cpufreq_update_util() sched/psi: create /proc/pressure and /proc/pressure/{io|memory|cpu} only when psi enabled sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP sched/fair: calculate delta runnable load only when it's needed sched/cputime: move rq parameter in irqtime_account_process_tick stop_machine: Make stop_cpus() static sched/debug: Reset watchdog on all CPUs while processing sysrq-t sched/core: Fix size of rq::uclamp initialization sched/uclamp: Fix a bug in propagating uclamp value in new cgroups sched/fair: Load balance aggressively for SCHED_IDLE CPUs sched/fair : Improve update_sd_pick_busiest for spare capacity case watchdog: Remove soft_lockup_hrtimer_cnt and related code sched/rt: Make RT capacity-aware sched/fair: Make EAS wakeup placement consider uclamp restrictions sched/fair: Make task_fits_capacity() consider uclamp restrictions sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with() sched/uclamp: Make uclamp util helpers use and return UL values ... |
||
Sebastian Andrzej Siewior
|
cb923159bb |
smp: Remove allocation mask from on_each_cpu_cond.*()
The allocation mask is no longer used by on_each_cpu_cond() and on_each_cpu_cond_mask() and can be removed. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20200117090137.1205765-4-bigeasy@linutronix.de |
||
Vlastimil Babka
|
8e57f8acbb |
mm, debug_pagealloc: don't rely on static keys too early
Commit |
||
Thomas Gleixner
|
923717cbab |
sched/rt, mm: Use CONFIG_PREEMPTION
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same functionality which today depends on CONFIG_PREEMPT. Switch the pte_unmap_same() and SLUB code over to use CONFIG_PREEMPTION. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Chistoph Lameter <cl@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20191015191821.11479-26-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
Yu Zhao
|
dd98afd4d6 |
mm/slub.c: clean up validate_slab()
The function doesn't need to return any value, and the check can be done in one pass. There is a behavior change: before the patch, we stop at the first invalid free object; after the patch, we stop at the first invalid object, free or in use. This shouldn't matter because the original behavior isn't intended anyway. Link: http://lkml.kernel.org/r/20191108193958.205102-1-yuzhao@google.com Signed-off-by: Yu Zhao <yuzhao@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Yu Zhao
|
aed6814894 |
mm/slub.c: update comments
Slub doesn't use PG_active and PG_error anymore. Link: http://lkml.kernel.org/r/20191007222023.162256-1-yuzhao@google.com Signed-off-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Miles Chen
|
e1b70dd1e6 |
mm: slub: print the offset of fault addresses
With commit
|
||
Laura Abbott
|
aea4df4c53 |
mm: slub: really fix slab walking for init_on_free
Commit |
||
Alexander Potapenko
|
0f181f9fbe |
mm/slub.c: init_on_free=1 should wipe freelist ptr for bulk allocations
slab_alloc_node() already zeroed out the freelist pointer if
init_on_free was on. Thibaut Sautereau noticed that the same needs to
be done for kmem_cache_alloc_bulk(), which performs the allocations
separately.
kmem_cache_alloc_bulk() is currently used in two places in the kernel,
so this change is unlikely to have a major performance impact.
SLAB doesn't require a similar change, as auto-initialization makes the
allocator store the freelist pointers off-slab.
Link: http://lkml.kernel.org/r/20191007091605.30530-1-glider@google.com
Fixes:
|
||
Qian Cai
|
e4f8e513c3 |
mm/slub: fix a deadlock in show_slab_objects()
A long time ago we fixed a similar deadlock in show_slab_objects() [1]. However, it is apparently due to the commits like |
||
Vlastimil Babka
|
6a486c0ad4 |
mm, sl[ou]b: improve memory accounting
Patch series "guarantee natural alignment for kmalloc()", v2. This patch (of 2): SLOB currently doesn't account its pages at all, so in /proc/meminfo the Slab field shows zero. Modifying a counter on page allocation and freeing should be acceptable even for the small system scenarios SLOB is intended for. Since reclaimable caches are not separated in SLOB, account everything as unreclaimable. SLUB currently doesn't account kmalloc() and kmalloc_node() allocations larger than order-1 page, that are passed directly to the page allocator. As they also don't appear in /proc/slabinfo, it might look like a memory leak. For consistency, account them as well. (SLAB doesn't actually use page allocator directly, so no change there). Ideally SLOB and SLUB would be handled in separate patches, but due to the shared kmalloc_order() function and different kfree() implementations, it's easier to patch both at once to prevent inconsistencies. Link: http://lkml.kernel.org/r/20190826111627.7505-2-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Darrick J . Wong" <darrick.wong@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Matthew Wilcox (Oracle)
|
a50b854e07 |
mm: introduce page_size()
Patch series "Make working with compound pages easier", v2. These three patches add three helpers and convert the appropriate places to use them. This patch (of 3): It's unnecessarily hard to find out the size of a potentially huge page. Replace 'PAGE_SIZE << compound_order(page)' with page_size(page). Link: http://lkml.kernel.org/r/20190721104612.19120-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Qian Cai
|
9d5f0be0f7 |
mm/slub.c: fix -Wunused-function compiler warnings
tid_to_cpu() and tid_to_event() are only used in note_cmpxchg_failure() when SLUB_DEBUG_CMPXCHG=y, so when SLUB_DEBUG_CMPXCHG=n by default, Clang will complain that those unused functions. Link: http://lkml.kernel.org/r/1568752232-5094-1-git-send-email-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Waiman Long
|
04f768a39d |
mm, slab: extend slab/shrink to shrink all memcg caches
Currently, a value of '1" is written to /sys/kernel/slab/<slab>/shrink file to shrink the slab by flushing out all the per-cpu slabs and free slabs in partial lists. This can be useful to squeeze out a bit more memory under extreme condition as well as making the active object counts in /proc/slabinfo more accurate. This usually applies only to the root caches, as the SLUB_MEMCG_SYSFS_ON option is usually not enabled and "slub_memcg_sysfs=1" not set. Even if memcg sysfs is turned on, it is too cumbersome and impractical to manage all those per-memcg sysfs files in a real production system. So there is no practical way to shrink memcg caches. Fix this by enabling a proper write to the shrink sysfs file of the root cache to scan all the available memcg caches and shrink them as well. For a non-root memcg cache (when SLUB_MEMCG_SYSFS_ON or slub_memcg_sysfs is on), only that cache will be shrunk when written. On a 2-socket 64-core 256-thread arm64 system with 64k page after a parallel kernel build, the the amount of memory occupied by slabs before shrinking slabs were: # grep task_struct /proc/slabinfo task_struct 53137 53192 4288 61 4 : tunables 0 0 0 : slabdata 872 872 0 # grep "^S[lRU]" /proc/meminfo Slab: 3936832 kB SReclaimable: 399104 kB SUnreclaim: 3537728 kB After shrinking slabs (by echoing "1" to all shrink files): # grep "^S[lRU]" /proc/meminfo Slab: 1356288 kB SReclaimable: 263296 kB SUnreclaim: 1092992 kB # grep task_struct /proc/slabinfo task_struct 2764 6832 4288 61 4 : tunables 0 0 0 : slabdata 112 112 0 Link: http://lkml.kernel.org/r/20190723151445.7385-1-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Laura Abbott
|
1b7e816fc8 |
mm: slub: Fix slab walking for init_on_free
To properly clear the slab on free with slab_want_init_on_free, we walk
the list of free objects using get_freepointer/set_freepointer.
The value we get from get_freepointer may not be valid. This isn't an
issue since an actual value will get written later but this means
there's a chance of triggering a bug if we use this value with
set_freepointer:
kernel BUG at mm/slub.c:306!
invalid opcode: 0000 [#1] PREEMPT PTI
CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-05754-g6471384a #4
RIP: 0010:kfree+0x58a/0x5c0
Code: 48 83 05 78 37 51 02 01 0f 0b 48 83 05 7e 37 51 02 01 48 83 05 7e 37 51 02 01 48 83 05 7e 37 51 02 01 48 83 05 d6 37 51 02 01 <0f> 0b 48 83 05 d4 37 51 02 01 48 83 05 d4 37 51 02 01 48 83 05 d4
RSP: 0000:ffffffff82603d90 EFLAGS: 00010002
RAX: ffff8c3976c04320 RBX: ffff8c3976c04300 RCX: 0000000000000000
RDX: ffff8c3976c04300 RSI: 0000000000000000 RDI: ffff8c3976c04320
RBP: ffffffff82603db8 R08: 0000000000000000 R09: 0000000000000000
R10: ffff8c3976c04320 R11: ffffffff8289e1e0 R12: ffffd52cc8db0100
R13: ffff8c3976c01a00 R14: ffffffff810f10d4 R15: ffff8c3976c04300
FS: 0000000000000000(0000) GS:ffffffff8266b000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8c397ffff000 CR3: 0000000125020000 CR4: 00000000000406b0
Call Trace:
apply_wqattrs_prepare+0x154/0x280
apply_workqueue_attrs_locked+0x4e/0xe0
apply_workqueue_attrs+0x36/0x60
alloc_workqueue+0x25a/0x6d0
workqueue_init_early+0x246/0x348
start_kernel+0x3c7/0x7ec
x86_64_start_reservations+0x40/0x49
x86_64_start_kernel+0xda/0xe4
secondary_startup_64+0xb6/0xc0
Modules linked in:
---[ end trace f67eb9af4d8d492b ]---
Fix this by ensuring the value we set with set_freepointer is either NULL
or another value in the chain.
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Fixes:
|
||
Alexander Potapenko
|
6471384af2 |
mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
Patch series "add init_on_alloc/init_on_free boot options", v10. Provide init_on_alloc and init_on_free boot options. These are aimed at preventing possible information leaks and making the control-flow bugs that depend on uninitialized values more deterministic. Enabling either of the options guarantees that the memory returned by the page allocator and SL[AU]B is initialized with zeroes. SLOB allocator isn't supported at the moment, as its emulation of kmem caches complicates handling of SLAB_TYPESAFE_BY_RCU caches correctly. Enabling init_on_free also guarantees that pages and heap objects are initialized right after they're freed, so it won't be possible to access stale data by using a dangling pointer. As suggested by Michal Hocko, right now we don't let the heap users to disable initialization for certain allocations. There's not enough evidence that doing so can speed up real-life cases, and introducing ways to opt-out may result in things going out of control. This patch (of 2): The new options are needed to prevent possible information leaks and make control-flow bugs that depend on uninitialized values more deterministic. This is expected to be on-by-default on Android and Chrome OS. And it gives the opportunity for anyone else to use it under distros too via the boot args. (The init_on_free feature is regularly requested by folks where memory forensics is included in their threat models.) init_on_alloc=1 makes the kernel initialize newly allocated pages and heap objects with zeroes. Initialization is done at allocation time at the places where checks for __GFP_ZERO are performed. init_on_free=1 makes the kernel initialize freed pages and heap objects with zeroes upon their deletion. This helps to ensure sensitive data doesn't leak via use-after-free accesses. Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator returns zeroed memory. The two exceptions are slab caches with constructors and SLAB_TYPESAFE_BY_RCU flag. Those are never zero-initialized to preserve their semantics. Both init_on_alloc and init_on_free default to zero, but those defaults can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and CONFIG_INIT_ON_FREE_DEFAULT_ON. If either SLUB poisoning or page poisoning is enabled, those options take precedence over init_on_alloc and init_on_free: initialization is only applied to unpoisoned allocations. Slowdown for the new features compared to init_on_free=0, init_on_alloc=0: hackbench, init_on_free=1: +7.62% sys time (st.err 0.74%) hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%) Linux build with -j12, init_on_free=1: +8.38% wall time (st.err 0.39%) Linux build with -j12, init_on_free=1: +24.42% sys time (st.err 0.52%) Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%) Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%) The slowdown for init_on_free=0, init_on_alloc=0 compared to the baseline is within the standard error. The new features are also going to pave the way for hardware memory tagging (e.g. arm64's MTE), which will require both on_alloc and on_free hooks to set the tags for heap objects. With MTE, tagging will have the same cost as memory initialization. Although init_on_free is rather costly, there are paranoid use-cases where in-memory data lifetime is desired to be minimized. There are various arguments for/against the realism of the associated threat models, but given that we'll need the infrastructure for MTE anyway, and there are people who want wipe-on-free behavior no matter what the performance cost, it seems reasonable to include it in this series. [glider@google.com: v8] Link: http://lkml.kernel.org/r/20190626121943.131390-2-glider@google.com [glider@google.com: v9] Link: http://lkml.kernel.org/r/20190627130316.254309-2-glider@google.com [glider@google.com: v10] Link: http://lkml.kernel.org/r/20190628093131.199499-2-glider@google.com Link: http://lkml.kernel.org/r/20190617151050.92663-2-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.cz> [page and dmapool parts Acked-by: James Morris <jamorris@linux.microsoft.com>] Cc: Christoph Lameter <cl@linux.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Sandeep Patil <sspatil@android.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Jann Horn <jannh@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Roman Gushchin
|
6cea1d569d |
mm: memcg/slab: unify SLAB and SLUB page accounting
Currently the page accounting code is duplicated in SLAB and SLUB internals. Let's move it into new (un)charge_slab_page helpers in the slab_common.c file. These helpers will be responsible for statistics (global and memcg-aware) and memcg charging. So they are replacing direct memcg_(un)charge_slab() calls. Link: http://lkml.kernel.org/r/20190611231813.3148843-6-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Waiman Long <longman@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Andrei Vagin <avagin@gmail.com> Cc: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Roman Gushchin
|
4348669475 |
mm: memcg/slab: generalize postponed non-root kmem_cache deactivation
Currently SLUB uses a work scheduled after an RCU grace period to deactivate a non-root kmem_cache. This mechanism can be reused for kmem_caches release, but requires generalization for SLAB case. Introduce kmemcg_cache_deactivate() function, which calls allocator-specific __kmem_cache_deactivate() and schedules execution of __kmem_cache_deactivate_after_rcu() with all necessary locks in a worker context after an rcu grace period. Here is the new calling scheme: kmemcg_cache_deactivate() __kmemcg_cache_deactivate() SLAB/SLUB-specific kmemcg_rcufn() rcu kmemcg_workfn() work __kmemcg_cache_deactivate_after_rcu() SLAB/SLUB-specific instead of: __kmemcg_cache_deactivate() SLAB/SLUB-specific slab_deactivate_memcg_cache_rcu_sched() SLUB-only kmemcg_rcufn() rcu kmemcg_workfn() work kmemcg_cache_deact_after_rcu() SLUB-only For consistency, all allocator-specific functions start with "__". Link: http://lkml.kernel.org/r/20190611231813.3148843-4-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Waiman Long <longman@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Andrei Vagin <avagin@gmail.com> Cc: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Roman Gushchin
|
c03914b7aa |
mm: memcg/slab: postpone kmem_cache memcg pointer initialization to memcg_link_cache()
Patch series "mm: reparent slab memory on cgroup removal", v7. # Why do we need this? We've noticed that the number of dying cgroups is steadily growing on most of our hosts in production. The following investigation revealed an issue in the userspace memory reclaim code [1], accounting of kernel stacks [2], and also the main reason: slab objects. The underlying problem is quite simple: any page charged to a cgroup holds a reference to it, so the cgroup can't be reclaimed unless all charged pages are gone. If a slab object is actively used by other cgroups, it won't be reclaimed, and will prevent the origin cgroup from being reclaimed. Slab objects, and first of all vfs cache, is shared between cgroups, which are using the same underlying fs, and what's even more important, it's shared between multiple generations of the same workload. So if something is running periodically every time in a new cgroup (like how systemd works), we do accumulate multiple dying cgroups. Strictly speaking pagecache isn't different here, but there is a key difference: we disable protection and apply some extra pressure on LRUs of dying cgroups, and these LRUs contain all charged pages. My experiments show that with the disabled kernel memory accounting the number of dying cgroups stabilizes at a relatively small number (~100, depends on memory pressure and cgroup creation rate), and with kernel memory accounting it grows pretty steadily up to several thousands. Memory cgroups are quite complex and big objects (mostly due to percpu stats), so it leads to noticeable memory losses. Memory occupied by dying cgroups is measured in hundreds of megabytes. I've even seen a host with more than 100Gb of memory wasted for dying cgroups. It leads to a degradation of performance with the uptime, and generally limits the usage of cgroups. My previous attempt [3] to fix the problem by applying extra pressure on slab shrinker lists caused a regressions with xfs and ext4, and has been reverted [4]. The following attempts to find the right balance [5, 6] were not successful. So instead of trying to find a maybe non-existing balance, let's do reparent accounted slab caches to the parent cgroup on cgroup removal. # Implementation approach There is however a significant problem with reparenting of slab memory: there is no list of charged pages. Some of them are in shrinker lists, but not all. Introducing of a new list is really not an option. But fortunately there is a way forward: every slab page has a stable pointer to the corresponding kmem_cache. So the idea is to reparent kmem_caches instead of slab pages. It's actually simpler and cheaper, but requires some underlying changes: 1) Make kmem_caches to hold a single reference to the memory cgroup, instead of a separate reference per every slab page. 2) Stop setting page->mem_cgroup pointer for memcg slab pages and use page->kmem_cache->memcg indirection instead. It's used only on slab page release, so performance overhead shouldn't be a big issue. 3) Introduce a refcounter for non-root slab caches. It's required to be able to destroy kmem_caches when they become empty and release the associated memory cgroup. There is a bonus: currently we release all memcg kmem_caches all together with the memory cgroup itself. This patchset allows individual kmem_caches to be released as soon as they become inactive and free. Some additional implementation details are provided in corresponding commit messages. # Results Below is the average number of dying cgroups on two groups of our production hosts. They do run some sort of web frontend workload, the memory pressure is moderate. As we can see, with the kernel memory reparenting the number stabilizes in 60s range; however with the original version it grows almost linearly and doesn't show any signs of plateauing. The difference in slab and percpu usage between patched and unpatched versions also grows linearly. In 7 days it exceeded 200Mb. day 0 1 2 3 4 5 6 7 original 56 362 628 752 1070 1250 1490 1560 patched 23 46 51 55 60 57 67 69 mem diff(Mb) 22 74 123 152 164 182 214 241 # Links [1]: commit |
||
Marco Elver
|
10d1f8cb39 |
mm/slab: refactor common ksize KASAN logic into slab_common.c
This refactors common code of ksize() between the various allocators into slab_common.c: __ksize() is the allocator-specific implementation without instrumentation, whereas ksize() includes the required KASAN logic. Link: http://lkml.kernel.org/r/20190626142014.141844-5-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Acked-by: Christoph Lameter <cl@linux.com> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Shakeel Butt
|
cb097cd483 |
slub: don't panic for memcg kmem cache creation failure
Currently for CONFIG_SLUB, if a memcg kmem cache creation is failed and the corresponding root kmem cache has SLAB_PANIC flag, the kernel will be crashed. This is unnecessary as the kernel can handle the creation failures of memcg kmem caches. Additionally CONFIG_SLAB does not implement this behavior. So, to keep the behavior consistent between SLAB and SLUB, removing the panic for memcg kmem cache creation failures. The root kmem cache creation failure for SLAB_PANIC correctly panics for both SLAB and SLUB. Link: http://lkml.kernel.org/r/20190619232514.58994-1-shakeelb@google.com Reported-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: Roman Gushchin <guro@fb.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Yury Norov
|
9cf3a8d847 |
mm/slub.c: avoid double string traverse in kmem_cache_flags()
If ',' is not found, kmem_cache_flags() calls strlen() to find the end of line. We can do it in a single pass using strchrnul(). Link: http://lkml.kernel.org/r/20190501053111.7950-1-ynorov@marvell.com Signed-off-by: Yury Norov <ynorov@marvell.com> Acked-by: Aaron Tomlin <atomlin@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Liu Xiang
|
632b2ef0c7 |
mm/slub.c: update the comment about slab frozen
Now frozen slab can only be on the per cpu partial list. Link: http://lkml.kernel.org/r/1554022325-11305-1-git-send-email-liu.xiang6@zte.com.cn Signed-off-by: Liu Xiang <liu.xiang6@zte.com.cn> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Liu Xiang
|
a4d3f8916c |
slub: remove useless kmem_cache_debug() before remove_full()
When CONFIG_SLUB_DEBUG is not enabled, remove_full() is empty. While CONFIG_SLUB_DEBUG is enabled, remove_full() can check s->flags by itself. So kmem_cache_debug() is useless and can be removed. Link: http://lkml.kernel.org/r/1552577313-2830-1-git-send-email-liu.xiang6@zte.com.cn Signed-off-by: Liu Xiang <liu.xiang6@zte.com.cn> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Tobin C. Harding
|
916ac05278 |
slub: use slab_list instead of lru
Currently we use the page->lru list for maintaining lists of slabs. We have a list in the page structure (slab_list) that can be used for this purpose. Doing so makes the code cleaner since we are not overloading the lru list. Use the slab_list instead of the lru list for maintaining lists of slabs. Link: http://lkml.kernel.org/r/20190402230545.2929-6-tobin@kernel.org Signed-off-by: Tobin C. Harding <tobin@kernel.org> Acked-by: Christoph Lameter <cl@linux.com> Reviewed-by: Roman Gushchin <guro@fb.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Tobin C. Harding
|
6dfd1b653c |
slub: add comments to endif pre-processor macros
SLUB allocator makes heavy use of ifdef/endif pre-processor macros. The pairing of these statements is at times hard to follow e.g. if the pair are further than a screen apart or if there are nested pairs. We can reduce cognitive load by adding a comment to the endif statement of form #ifdef CONFIG_FOO ... #endif /* CONFIG_FOO */ Add comments to endif pre-processor macros if ifdef/endif pair is not immediately apparent. Link: http://lkml.kernel.org/r/20190402230545.2929-5-tobin@kernel.org Signed-off-by: Tobin C. Harding <tobin@kernel.org> Acked-by: Christoph Lameter <cl@linux.com> Reviewed-by: Roman Gushchin <guro@fb.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Thomas Gleixner
|
7971679994 |
mm/slub: Simplify stack trace retrieval
Replace the indirection through struct stack_trace with an invocation of the storage array based interface. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: linux-mm@kvack.org Cc: David Rientjes <rientjes@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Alexander Potapenko <glider@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: kasan-dev@googlegroups.com Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: iommu@lists.linux-foundation.org Cc: Robin Murphy <robin.murphy@arm.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: David Sterba <dsterba@suse.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: linux-btrfs@vger.kernel.org Cc: dm-devel@redhat.com Cc: Mike Snitzer <snitzer@redhat.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: intel-gfx@lists.freedesktop.org Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: dri-devel@lists.freedesktop.org Cc: David Airlie <airlied@linux.ie> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Miroslav Benes <mbenes@suse.cz> Cc: linux-arch@vger.kernel.org Link: https://lkml.kernel.org/r/20190425094801.771410441@linutronix.de |
||
Thomas Gleixner
|
b8ca7ff773 |
mm/slub: Remove the ULONG_MAX stack trace hackery
No architecture terminates the stack trace with ULONG_MAX anymore. Remove the cruft. While at it remove the pointless loop of clearing the stack array completely. It's sufficient to clear the last entry as the consumers break out on the first zeroed entry anyway. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: linux-mm@kvack.org Cc: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Link: https://lkml.kernel.org/r/20190410103644.574058244@linutronix.de |
||
Nicolas Boichat
|
6d6ea1e967 |
mm: add support for kmem caches in DMA32 zone
Patch series "iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables",
v6.
This is a followup to the discussion in [1], [2].
IOMMUs using ARMv7 short-descriptor format require page tables (level 1
and 2) to be allocated within the first 4GB of RAM, even on 64-bit
systems.
For L1 tables that are bigger than a page, we can just use
__get_free_pages with GFP_DMA32 (on arm64 systems only, arm would still
use GFP_DMA).
For L2 tables that only take 1KB, it would be a waste to allocate a full
page, so we considered 3 approaches:
1. This series, adding support for GFP_DMA32 slab caches.
2. genalloc, which requires pre-allocating the maximum number of L2 page
tables (4096, so 4MB of memory).
3. page_frag, which is not very memory-efficient as it is unable to reuse
freed fragments until the whole page is freed. [3]
This series is the most memory-efficient approach.
stable@ note:
We confirmed that this is a regression, and IOMMU errors happen on 4.19
and linux-next/master on MT8173 (elm, Acer Chromebook R13). The issue
most likely starts from commit
|
||
Alexey Dobriyan
|
b9726c26dc |
numa: make "nr_node_ids" unsigned int
Number of NUMA nodes can't be negative. This saves a few bytes on x86_64: add/remove: 0/0 grow/shrink: 4/21 up/down: 27/-265 (-238) Function old new delta hv_synic_alloc.cold 88 110 +22 prealloc_shrinker 260 262 +2 bootstrap 249 251 +2 sched_init_numa 1566 1567 +1 show_slab_objects 778 777 -1 s_show 1201 1200 -1 kmem_cache_init 346 345 -1 __alloc_workqueue_key 1146 1145 -1 mem_cgroup_css_alloc 1614 1612 -2 __do_sys_swapon 4702 4699 -3 __list_lru_init 655 651 -4 nic_probe 2379 2374 -5 store_user_store 118 111 -7 red_zone_store 106 99 -7 poison_store 106 99 -7 wq_numa_init 348 338 -10 __kmem_cache_empty 75 65 -10 task_numa_free 186 173 -13 merge_across_nodes_store 351 336 -15 irq_create_affinity_masks 1261 1246 -15 do_numa_crng_init 343 321 -22 task_numa_fault 4760 4737 -23 swapfile_init 179 156 -23 hv_synic_alloc 536 492 -44 apply_wqattrs_prepare 746 695 -51 Link: http://lkml.kernel.org/r/20190201223029.GA15820@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Wei Yang
|
8bb4e7a2ee |
mm: fix some typos in mm directory
No functional change. Link: http://lkml.kernel.org/r/20190118235123.27843-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Wei Yang
|
9234bae9b2 |
mm, slub: make the comment of put_cpu_partial() complete
There are two cases when put_cpu_partial() is invoked. * __slab_free * get_partial_node This patch just makes it cover these two cases. Link: http://lkml.kernel.org/r/20181025094437.18951-3-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Qian Cai
|
278d7756df |
mm/slub.c: remove an unused addr argument
"addr" function argument is not used in alloc_consistency_checks() at
all, so remove it.
Link: http://lkml.kernel.org/r/20190211123214.35592-1-cai@lca.pw
Fixes:
|
||
Peng Wang
|
edde82b6df |
mm/slub.c: freelist is ensured to be NULL when new_slab() fails
new_slab_objects() will return immediately if freelist is not NULL. if (freelist) return freelist; One more assignment operation could be avoided. Link: http://lkml.kernel.org/r/20181229062512.30469-1-rocking@whu.edu.cn Signed-off-by: Peng Wang <rocking@whu.edu.cn> Reviewed-by: Pekka Enberg <penberg@kernel.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Qian Cai
|
6373dca16c |
slub: fix a crash with SLUB_DEBUG + KASAN_SW_TAGS
In process_slab(), "p = get_freepointer()" could return a tagged pointer, but "addr = page_address()" always return a native pointer. As the result, slab_index() is messed up here, return (p - addr) / s->size; All other callers of slab_index() have the same situation where "addr" is from page_address(), so just need to untag "p". # cat /sys/kernel/slab/hugetlbfs_inode_cache/alloc_calls Unable to handle kernel paging request at virtual address 2bff808aa4856d48 Mem abort info: ESR = 0x96000007 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000007 CM = 0, WnR = 0 swapper pgtable: 64k pages, 48-bit VAs, pgdp = 0000000002498338 [2bff808aa4856d48] pgd=00000097fcfd0003, pud=00000097fcfd0003, pmd=00000097fca30003, pte=00e8008b24850712 Internal error: Oops: 96000007 [#1] SMP CPU: 3 PID: 79210 Comm: read_all Tainted: G L 5.0.0-rc7+ #84 Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.0.6 07/10/2018 pstate: 00400089 (nzcv daIf +PAN -UAO) pc : get_map+0x78/0xec lr : get_map+0xa0/0xec sp : aeff808989e3f8e0 x29: aeff808989e3f940 x28: ffff800826200000 x27: ffff100012d47000 x26: 9700000000002500 x25: 0000000000000001 x24: 52ff8008200131f8 x23: 52ff8008200130a0 x22: 52ff800820013098 x21: ffff800826200000 x20: ffff100013172ba0 x19: 2bff808a8971bc00 x18: ffff1000148f5538 x17: 000000000000001b x16: 00000000000000ff x15: ffff1000148f5000 x14: 00000000000000d2 x13: 0000000000000001 x12: 0000000000000000 x11: 0000000020000002 x10: 2bff808aa4856d48 x9 : 0000020000000000 x8 : 68ff80082620ebb0 x7 : 0000000000000000 x6 : ffff1000105da1dc x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000010 x2 : 2bff808a8971bc00 x1 : ffff7fe002098800 x0 : ffff80082620ceb0 Process read_all (pid: 79210, stack limit = 0x00000000f65b9361) Call trace: get_map+0x78/0xec process_slab+0x7c/0x47c list_locations+0xb0/0x3c8 alloc_calls_show+0x34/0x40 slab_attr_show+0x34/0x48 sysfs_kf_seq_show+0x2e4/0x570 kernfs_seq_show+0x12c/0x1a0 seq_read+0x48c/0xf84 kernfs_fop_read+0xd4/0x448 __vfs_read+0x94/0x5d4 vfs_read+0xcc/0x194 ksys_read+0x6c/0xe8 __arm64_sys_read+0x68/0xb0 el0_svc_handler+0x230/0x3bc el0_svc+0x8/0xc Code: d3467d2a 9ac92329 8b0a0e6a f9800151 (c85f7d4b) ---[ end trace a383a9a44ff13176 ]--- Kernel panic - not syncing: Fatal exception SMP: stopping secondary CPUs SMP: failed to stop secondary CPUs 1-7,32,40,127 Kernel Offset: disabled CPU features: 0x002,20000c18 Memory Limit: none ---[ end Kernel panic - not syncing: Fatal exception ]--- Link: http://lkml.kernel.org/r/20190220020251.82039-1-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Qian Cai
|
338cfaad49 |
slub: fix SLAB_CONSISTENCY_CHECKS + KASAN_SW_TAGS
Enabling SLUB_DEBUG's SLAB_CONSISTENCY_CHECKS with KASAN_SW_TAGS triggers endless false positives during boot below due to check_valid_pointer() checks tagged pointers which have no addresses that is valid within slab pages: BUG radix_tree_node (Tainted: G B ): Freelist Pointer check fails ----------------------------------------------------------------------------- INFO: Slab objects=69 used=69 fp=0x (null) flags=0x7ffffffc000200 INFO: Object @offset=15060037153926966016 fp=0x Redzone: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 18 6b 06 00 08 80 ff d0 .........k...... Object : 18 6b 06 00 08 80 ff d0 00 00 00 00 00 00 00 00 .k.............. Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Redzone: bb bb bb bb bb bb bb bb ........ Padding: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ CPU: 0 PID: 0 Comm: swapper/0 Tainted: G B 5.0.0-rc5+ #18 Call trace: dump_backtrace+0x0/0x450 show_stack+0x20/0x2c __dump_stack+0x20/0x28 dump_stack+0xa0/0xfc print_trailer+0x1bc/0x1d0 object_err+0x40/0x50 alloc_debug_processing+0xf0/0x19c ___slab_alloc+0x554/0x704 kmem_cache_alloc+0x2f8/0x440 radix_tree_node_alloc+0x90/0x2fc idr_get_free+0x1e8/0x6d0 idr_alloc_u32+0x11c/0x2a4 idr_alloc+0x74/0xe0 worker_pool_assign_id+0x5c/0xbc workqueue_init_early+0x49c/0xd50 start_kernel+0x52c/0xac4 FIX radix_tree_node: Marking all objects used Link: http://lkml.kernel.org/r/20190209044128.3290-1-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrey Konovalov
|
d36a63a943 |
kasan, slub: fix more conflicts with CONFIG_SLAB_FREELIST_HARDENED
When CONFIG_KASAN_SW_TAGS is enabled, ptr_addr might be tagged. Normally, this doesn't cause any issues, as both set_freepointer() and get_freepointer() are called with a pointer with the same tag. However, there are some issues with CONFIG_SLUB_DEBUG code. For example, when __free_slub() iterates over objects in a cache, it passes untagged pointers to check_object(). check_object() in turns calls get_freepointer() with an untagged pointer, which causes the freepointer to be restored incorrectly. Add kasan_reset_tag to freelist_ptr(). Also add a detailed comment. Link: http://lkml.kernel.org/r/bf858f26ef32eb7bd24c665755b3aee4bc58d0e4.1550103861.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reported-by: Qian Cai <cai@lca.pw> Tested-by: Qian Cai <cai@lca.pw> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrey Konovalov
|
18e5066102 |
kasan, slub: fix conflicts with CONFIG_SLAB_FREELIST_HARDENED
CONFIG_SLAB_FREELIST_HARDENED hashes freelist pointer with the address of the object where the pointer gets stored. With tag based KASAN we don't account for that when building freelist, as we call set_freepointer() with the first argument untagged. This patch changes the code to properly propagate tags throughout the loop. Link: http://lkml.kernel.org/r/3df171559c52201376f246bf7ce3184fe21c1dc7.1549921721.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reported-by: Qian Cai <cai@lca.pw> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Evgeniy Stepanov <eugenis@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrey Konovalov
|
a710122428 |
kasan, slub: move kasan_poison_slab hook before page_address
With tag based KASAN page_address() looks at the page flags to see whether the resulting pointer needs to have a tag set. Since we don't want to set a tag when page_address() is called on SLAB pages, we call page_kasan_tag_reset() in kasan_poison_slab(). However in allocate_slab() page_address() is called before kasan_poison_slab(). Fix it by changing the order. [andreyknvl@google.com: fix compilation error when CONFIG_SLUB_DEBUG=n] Link: http://lkml.kernel.org/r/ac27cc0bbaeb414ed77bcd6671a877cf3546d56e.1550066133.git.andreyknvl@google.com Link: http://lkml.kernel.org/r/cd895d627465a3f1c712647072d17f10883be2a1.1549921721.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgeniy Stepanov <eugenis@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Qian Cai <cai@lca.pw> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrey Konovalov
|
a2f775751d |
kmemleak: account for tagged pointers when calculating pointer range
kmemleak keeps two global variables, min_addr and max_addr, which store the range of valid (encountered by kmemleak) pointer values, which it later uses to speed up pointer lookup when scanning blocks. With tagged pointers this range will get bigger than it needs to be. This patch makes kmemleak untag pointers before saving them to min_addr and max_addr and when performing a lookup. Link: http://lkml.kernel.org/r/16e887d442986ab87fe87a755815ad92fa431a5f.1550066133.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Qian Cai <cai@lca.pw> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgeniy Stepanov <eugenis@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrey Konovalov
|
53128245b4 |
kasan, kmemleak: pass tagged pointers to kmemleak
Right now we call kmemleak hooks before assigning tags to pointers in KASAN hooks. As a result, when an objects gets allocated, kmemleak sees a differently tagged pointer, compared to the one it sees when the object gets freed. Fix it by calling KASAN hooks before kmemleak's ones. Link: http://lkml.kernel.org/r/cd825aa4897b0fc37d3316838993881daccbe9f5.1549921721.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reported-by: Qian Cai <cai@lca.pw> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgeniy Stepanov <eugenis@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrey Konovalov
|
96fedce27e |
kasan: make tag based mode work with CONFIG_HARDENED_USERCOPY
With CONFIG_HARDENED_USERCOPY enabled __check_heap_object() compares and then subtracts a potentially tagged pointer with a non-tagged address of the page that this pointer belongs to, which leads to unexpected behavior. Untag the pointer in __check_heap_object() before doing any of these operations. Link: http://lkml.kernel.org/r/7e756a298d514c4482f52aea6151db34818d395d.1546540962.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |