linux/lib/stackdepot.c
Vlastimil Babka 2dba5eb1c7 lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()
Currently, enabling CONFIG_STACKDEPOT means its stack_table will be
allocated from memblock, even if stack depot ends up not actually used.
The default size of stack_table is 4MB on 32-bit, 8MB on 64-bit.

This is fine for use cases such as KASAN, which is also a config option
and has overhead of its own.  But it's an issue for functionality that
has to be enabled explicitly on boot (page_owner) or that depends on
hardware (GPU drivers), where the memory may simply be wasted.  This was
raised as an issue [1] when attempting to add stackdepot support for
SLUB's debug object tracking functionality.  It's common to build
kernels with CONFIG_SLUB_DEBUG and enable slub_debug on boot only when
needed, or to create only specific kmem caches with debugging for
testing purposes.

It would thus be more efficient if stackdepot's table was allocated only
when it is actually going to be used.  This patch makes the allocation
(and the whole stack_depot_init() call) optional:

 - Add a CONFIG_STACKDEPOT_ALWAYS_INIT flag to keep using the current
   well-defined point of allocation as part of mem_init(). Make
   CONFIG_KASAN select this flag.

 - Other users have to call stack_depot_init() as part of their own init
   when it's determined that stack depot will actually be used. This may
   depend on both config and runtime conditions (see the sketch further
   below). Convert the current users, which are page_owner and several
   places in the DRM subsystem. The same will be done for SLUB later.

 - Because the init might now be called after the boot-time memblock
   allocation has given all memory to the buddy allocator, change
   stack_depot_init() to allocate stack_table with kvmalloc() when
   memblock is no longer available. Also handle allocation failure by
   disabling stackdepot (could have theoretically happened even with
   memblock allocation previously), and don't unnecessarily align the
   memblock allocation to its own size anymore.

[1] https://lore.kernel.org/all/CAMuHMdW=eoVzM1Re5FVoEN87nKfiLmM2+Ah7eNu2KXEhCvbZyA@mail.gmail.com/
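
For illustration, a minimal sketch of the new opt-in pattern (my_debug,
its boot parameter and my_debug_init() are hypothetical names, not part
of this patch); the hash table is only allocated once we know the
feature will really be used:

	static bool my_debug_enabled __initdata;

	static int __init early_my_debug(char *buf)
	{
		return kstrtobool(buf, &my_debug_enabled);
	}
	early_param("my_debug", early_my_debug);

	static int __init my_debug_init(void)
	{
		if (!my_debug_enabled)
			return 0;	/* stack_table never allocated */

		return stack_depot_init();
	}

Such a call is safe to repeat: stack_depot_init() takes a mutex and only
allocates stack_table once, and it switches from memblock_alloc() to
kvmalloc() once slab_is_available() is true.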

Link: https://lkml.kernel.org/r/20211013073005.11351-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Marco Elver <elver@google.com> # stackdepot
Cc: Marco Elver <elver@google.com>
Cc: Vijayanand Jitta <vjitta@codeaurora.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Oliver Glitta <glittao@gmail.com>
Cc: Imran Khan <imran.f.khan@oracle.com>
From: Colin Ian King <colin.king@canonical.com>
Subject: lib/stackdepot: fix spelling mistake and grammar in pr_err message

There is a spelling mistake in the word "allocation", so fix it and
rephrase the message to make it easier to read.

Link: https://lkml.kernel.org/r/20211015104159.11282-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
From: Vlastimil Babka <vbabka@suse.cz>
Subject: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup

On FLATMEM, we call page_ext_init_flatmem_late() just before
kmem_cache_init(), which means stack_depot_init() (called by page owner
init) will not properly recognize that it should use kvmalloc() rather
than memblock_alloc().  At that point memblock_alloc() will not issue a
warning either, but will return a block of memory that can be invalid
and cause a kernel page fault when saving stacks, as reported by the
kernel test robot [1].

Fix this by moving page_ext_init_flatmem_late() below kmem_cache_init(),
so that slab_is_available() is true during stack_depot_init().
SPARSEMEM doesn't have this issue, as it doesn't use
page_ext_init_flatmem_late(), but a different page_ext_init() even later
in the boot process.

Thanks to Mike Rapoport for pointing out the FLATMEM init ordering issue.

While at it, also resolve a checkpatch warning in stack_depot_init()
flagged by DRM CI; that fix was supposed to be part of the original
patch already.

[1] https://lore.kernel.org/all/20211014085450.GC18719@xsang-OptiPlex-9020/

Link: https://lkml.kernel.org/r/6abd9213-19a9-6d58-cedc-2414386d2d81@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: kernel test robot <oliver.sang@intel.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
From: Vlastimil Babka <vbabka@suse.cz>
Subject: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup3

Commit cd06ab2fd4 ("drm/locking: add backtrace for locking contended
locks without backoff"), which recently landed in -next, adds a new
stack depot user in drivers/gpu/drm/drm_modeset_lock.c, so we need to
add an appropriate call to stack_depot_init() there as well.

Link: https://lkml.kernel.org/r/2a692365-cfa1-64f2-34e0-8aa5674dce5e@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Marco Elver <elver@google.com>
Cc: Vijayanand Jitta <vjitta@codeaurora.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Oliver Glitta <glittao@gmail.com>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
From: Vlastimil Babka <vbabka@suse.cz>
Subject: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup4

Commit 4e66934eaa ("lib: add reference counting tracking
infrastructure"), which recently landed in net-next, adds a new stack
depot user in lib/ref_tracker.c, so we need to add an appropriate call
to stack_depot_init() there as well.
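
As a brief, hedged sketch of the idea (my_tracking_init() is a made-up
name and the exact hook point in ref_tracker is not spelled out here),
the code that will later call stack_depot_save() simply initializes the
depot up front:

	int my_tracking_init(void)
	{
		/* idempotent; may sleep, so fine for normal init paths */
		return stack_depot_init();
	}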

Link: https://lkml.kernel.org/r/45c1b738-1a2f-5b5f-2f6d-86fab206d01c@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Slab <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:37 +02:00


// SPDX-License-Identifier: GPL-2.0-only
/*
 * Generic stack depot for storing stack traces.
 *
 * Some debugging tools need to save stack traces of certain events which can
 * be later presented to the user. For example, KASAN needs to save alloc and
 * free stacks for each object, but storing two stack traces per object
 * requires too much memory (e.g. SLUB_DEBUG needs 256 bytes per object for
 * that).
 *
 * Instead, stack depot maintains a hashtable of unique stacktraces. Since alloc
 * and free stacks repeat a lot, we save about 100x space.
 * Stacks are never removed from depot, so we store them contiguously one after
 * another in a contiguous memory allocation.
 *
 * Author: Alexander Potapenko <glider@google.com>
 * Copyright (C) 2016 Google, Inc.
 *
 * Based on code by Dmitry Chernenkov.
 */
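
/*
 * A minimal usage sketch (illustrative only, not code from this file):
 * a caller first collects a raw trace with stack_trace_save() and then
 * hands it to the depot for deduplication; the returned handle can later
 * be expanded again with stack_depot_fetch():
 *
 *	unsigned long entries[16], *saved;
 *	unsigned int nr;
 *	depot_stack_handle_t handle;
 *
 *	nr = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
 *	handle = stack_depot_save(entries, nr, GFP_KERNEL);
 *	...
 *	nr = stack_depot_fetch(handle, &saved);
 *	stack_trace_print(saved, nr, 0);
 */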
#include <linux/gfp.h>
#include <linux/jhash.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/mutex.h>
#include <linux/percpu.h>
#include <linux/printk.h>
#include <linux/slab.h>
#include <linux/stacktrace.h>
#include <linux/stackdepot.h>
#include <linux/string.h>
#include <linux/types.h>
#include <linux/memblock.h>

#define DEPOT_STACK_BITS (sizeof(depot_stack_handle_t) * 8)

#define STACK_ALLOC_NULL_PROTECTION_BITS 1
#define STACK_ALLOC_ORDER 2 /* 'Slab' size order for stack depot, 4 pages */
#define STACK_ALLOC_SIZE (1LL << (PAGE_SHIFT + STACK_ALLOC_ORDER))
#define STACK_ALLOC_ALIGN 4
#define STACK_ALLOC_OFFSET_BITS (STACK_ALLOC_ORDER + PAGE_SHIFT - \
					STACK_ALLOC_ALIGN)
#define STACK_ALLOC_INDEX_BITS (DEPOT_STACK_BITS - \
		STACK_ALLOC_NULL_PROTECTION_BITS - STACK_ALLOC_OFFSET_BITS)
#define STACK_ALLOC_SLABS_CAP 8192
#define STACK_ALLOC_MAX_SLABS \
	(((1LL << (STACK_ALLOC_INDEX_BITS)) < STACK_ALLOC_SLABS_CAP) ? \
	 (1LL << (STACK_ALLOC_INDEX_BITS)) : STACK_ALLOC_SLABS_CAP)
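
/*
 * A depot_stack_handle_t is a u32 that packs the location of a stored stack:
 * STACK_ALLOC_INDEX_BITS select the stack slab, STACK_ALLOC_OFFSET_BITS
 * locate the record inside that slab (in 1 << STACK_ALLOC_ALIGN byte units),
 * and one bit marks the handle as valid, so a handle of 0 never refers to a
 * stored stack.
 */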
/* The compact structure to store the reference to stacks. */
union handle_parts {
	depot_stack_handle_t handle;
	struct {
		u32 slabindex : STACK_ALLOC_INDEX_BITS;
		u32 offset : STACK_ALLOC_OFFSET_BITS;
		u32 valid : STACK_ALLOC_NULL_PROTECTION_BITS;
	};
};

struct stack_record {
	struct stack_record *next;	/* Link in the hashtable */
	u32 hash;			/* Hash in the hash table */
	u32 size;			/* Number of frames in the stack */
	union handle_parts handle;
	unsigned long entries[];	/* Variable-sized array of entries. */
};

static void *stack_slabs[STACK_ALLOC_MAX_SLABS];

static int depot_index;
static int next_slab_inited;
static size_t depot_offset;
static DEFINE_RAW_SPINLOCK(depot_lock);
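
/*
 * Stack slabs are page blocks allocated with alloc_pages() outside of
 * depot_lock (see __stack_depot_save()) and handed to init_stack_slab(),
 * which installs them as the current or the next slab; next_slab_inited
 * tells savers whether a fresh preallocation is still needed.
 */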
static bool init_stack_slab(void **prealloc)
{
	if (!*prealloc)
		return false;
	/*
	 * This smp_load_acquire() pairs with smp_store_release() to
	 * |next_slab_inited| below and in depot_alloc_stack().
	 */
	if (smp_load_acquire(&next_slab_inited))
		return true;
	if (stack_slabs[depot_index] == NULL) {
		stack_slabs[depot_index] = *prealloc;
		*prealloc = NULL;
	} else {
		/* If this is the last depot slab, do not touch the next one. */
		if (depot_index + 1 < STACK_ALLOC_MAX_SLABS) {
			stack_slabs[depot_index + 1] = *prealloc;
			*prealloc = NULL;
		}
		/*
		 * This smp_store_release pairs with smp_load_acquire() from
		 * |next_slab_inited| above and in stack_depot_save().
		 */
		smp_store_release(&next_slab_inited, 1);
	}
	return true;
}

/* Allocation of a new stack in raw storage */
static struct stack_record *
depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
{
	struct stack_record *stack;
	size_t required_size = struct_size(stack, entries, size);

	required_size = ALIGN(required_size, 1 << STACK_ALLOC_ALIGN);

	if (unlikely(depot_offset + required_size > STACK_ALLOC_SIZE)) {
		if (unlikely(depot_index + 1 >= STACK_ALLOC_MAX_SLABS)) {
			WARN_ONCE(1, "Stack depot reached limit capacity");
			return NULL;
		}
		depot_index++;
		depot_offset = 0;
		/*
		 * smp_store_release() here pairs with smp_load_acquire() from
		 * |next_slab_inited| in stack_depot_save() and
		 * init_stack_slab().
		 */
		if (depot_index + 1 < STACK_ALLOC_MAX_SLABS)
			smp_store_release(&next_slab_inited, 0);
	}
	init_stack_slab(prealloc);
	if (stack_slabs[depot_index] == NULL)
		return NULL;

	stack = stack_slabs[depot_index] + depot_offset;

	stack->hash = hash;
	stack->size = size;
	stack->handle.slabindex = depot_index;
	stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
	stack->handle.valid = 1;
	memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
	depot_offset += required_size;

	return stack;
}
#define STACK_HASH_SIZE (1L << CONFIG_STACK_HASH_ORDER)
#define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
#define STACK_HASH_SEED 0x9747b28c

static bool stack_depot_disable;
static struct stack_record **stack_table;

static int __init is_stack_depot_disabled(char *str)
{
	int ret;

	ret = kstrtobool(str, &stack_depot_disable);
	if (!ret && stack_depot_disable) {
		pr_info("Stack Depot is disabled\n");
		stack_table = NULL;
	}
	return 0;
}
early_param("stack_depot_disable", is_stack_depot_disabled);

/*
 * __ref because of memblock_alloc(), which will not be actually called after
 * the __init code is gone, because at that point slab_is_available() is true
 */
__ref int stack_depot_init(void)
{
	static DEFINE_MUTEX(stack_depot_init_mutex);

	mutex_lock(&stack_depot_init_mutex);
	if (!stack_depot_disable && !stack_table) {
		size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
		int i;

		if (slab_is_available()) {
			pr_info("Stack Depot allocating hash table with kvmalloc\n");
			stack_table = kvmalloc(size, GFP_KERNEL);
		} else {
			pr_info("Stack Depot allocating hash table with memblock_alloc\n");
			stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
		}
		if (stack_table) {
			for (i = 0; i < STACK_HASH_SIZE; i++)
				stack_table[i] = NULL;
		} else {
			pr_err("Stack Depot hash table allocation failed, disabling\n");
			stack_depot_disable = true;
			mutex_unlock(&stack_depot_init_mutex);
			return -ENOMEM;
		}
	}
	mutex_unlock(&stack_depot_init_mutex);
	return 0;
}
EXPORT_SYMBOL_GPL(stack_depot_init);
/* Calculate hash for a stack */
static inline u32 hash_stack(unsigned long *entries, unsigned int size)
{
	return jhash2((u32 *)entries,
		      array_size(size, sizeof(*entries)) / sizeof(u32),
		      STACK_HASH_SEED);
}

/* Use our own, non-instrumented version of memcmp().
 *
 * We actually don't care about the order, just the equality.
 */
static inline
int stackdepot_memcmp(const unsigned long *u1, const unsigned long *u2,
			unsigned int n)
{
	for ( ; n-- ; u1++, u2++) {
		if (*u1 != *u2)
			return 1;
	}
	return 0;
}

/* Find a stack that is equal to the one stored in entries in the hash */
static inline struct stack_record *find_stack(struct stack_record *bucket,
					      unsigned long *entries, int size,
					      u32 hash)
{
	struct stack_record *found;

	for (found = bucket; found; found = found->next) {
		if (found->hash == hash &&
		    found->size == size &&
		    !stackdepot_memcmp(entries, found->entries, size))
			return found;
	}
	return NULL;
}
/**
 * stack_depot_snprint - print stack entries from a depot into a buffer
 *
 * @handle:	Stack depot handle which was returned from
 *		stack_depot_save().
 * @buf:	Pointer to the print buffer
 *
 * @size:	Size of the print buffer
 *
 * @spaces:	Number of leading spaces to print
 *
 * Return:	Number of bytes printed.
 */
int stack_depot_snprint(depot_stack_handle_t handle, char *buf, size_t size,
			int spaces)
{
	unsigned long *entries;
	unsigned int nr_entries;

	nr_entries = stack_depot_fetch(handle, &entries);
	return nr_entries ? stack_trace_snprint(buf, size, entries, nr_entries,
						spaces) : 0;
}
EXPORT_SYMBOL_GPL(stack_depot_snprint);

/**
 * stack_depot_print - print stack entries from a depot
 *
 * @stack:	Stack depot handle which was returned from
 *		stack_depot_save().
 *
 */
void stack_depot_print(depot_stack_handle_t stack)
{
	unsigned long *entries;
	unsigned int nr_entries;

	nr_entries = stack_depot_fetch(stack, &entries);
	if (nr_entries > 0)
		stack_trace_print(entries, nr_entries, 0);
}
EXPORT_SYMBOL_GPL(stack_depot_print);

/**
 * stack_depot_fetch - Fetch stack entries from a depot
 *
 * @handle:	Stack depot handle which was returned from
 *		stack_depot_save().
 * @entries:	Pointer to store the entries address
 *
 * Return:	The number of trace entries for this depot.
 */
unsigned int stack_depot_fetch(depot_stack_handle_t handle,
			       unsigned long **entries)
{
	union handle_parts parts = { .handle = handle };
	void *slab;
	size_t offset = parts.offset << STACK_ALLOC_ALIGN;
	struct stack_record *stack;

	*entries = NULL;
	if (!handle)
		return 0;

	if (parts.slabindex > depot_index) {
		WARN(1, "slab index %d out of bounds (%d) for stack id %08x\n",
			parts.slabindex, depot_index, handle);
		return 0;
	}
	slab = stack_slabs[parts.slabindex];
	if (!slab)
		return 0;

	stack = slab + offset;
	*entries = stack->entries;
	return stack->size;
}
EXPORT_SYMBOL_GPL(stack_depot_fetch);
/**
 * __stack_depot_save - Save a stack trace from an array
 *
 * @entries:		Pointer to storage array
 * @nr_entries:		Size of the storage array
 * @alloc_flags:	Allocation gfp flags
 * @can_alloc:		Allocate stack slabs (increased chance of failure if false)
 *
 * Saves a stack trace from @entries array of size @nr_entries. If @can_alloc is
 * %true, is allowed to replenish the stack slab pool in case no space is left
 * (allocates using GFP flags of @alloc_flags). If @can_alloc is %false, avoids
 * any allocations and will fail if no space is left to store the stack trace.
 *
 * Context: Any context, but setting @can_alloc to %false is required if
 *          alloc_pages() cannot be used from the current context. Currently
 *          this is the case from contexts where neither %GFP_ATOMIC nor
 *          %GFP_NOWAIT can be used (NMI, raw_spin_lock).
 *
 * Return: The handle of the stack struct stored in depot, 0 on failure.
 */
depot_stack_handle_t __stack_depot_save(unsigned long *entries,
					unsigned int nr_entries,
					gfp_t alloc_flags, bool can_alloc)
{
	struct stack_record *found = NULL, **bucket;
	depot_stack_handle_t retval = 0;
	struct page *page = NULL;
	void *prealloc = NULL;
	unsigned long flags;
	u32 hash;

	if (unlikely(nr_entries == 0) || stack_depot_disable)
		goto fast_exit;

	hash = hash_stack(entries, nr_entries);
	bucket = &stack_table[hash & STACK_HASH_MASK];

	/*
	 * Fast path: look the stack trace up without locking.
	 * The smp_load_acquire() here pairs with smp_store_release() to
	 * |bucket| below.
	 */
	found = find_stack(smp_load_acquire(bucket), entries,
			   nr_entries, hash);
	if (found)
		goto exit;

	/*
	 * Check if the current or the next stack slab need to be initialized.
	 * If so, allocate the memory - we won't be able to do that under the
	 * lock.
	 *
	 * The smp_load_acquire() here pairs with smp_store_release() to
	 * |next_slab_inited| in depot_alloc_stack() and init_stack_slab().
	 */
	if (unlikely(can_alloc && !smp_load_acquire(&next_slab_inited))) {
		/*
		 * Zero out zone modifiers, as we don't have specific zone
		 * requirements. Keep the flags related to allocation in atomic
		 * contexts and I/O.
		 */
		alloc_flags &= ~GFP_ZONEMASK;
		alloc_flags &= (GFP_ATOMIC | GFP_KERNEL);
		alloc_flags |= __GFP_NOWARN;
		page = alloc_pages(alloc_flags, STACK_ALLOC_ORDER);
		if (page)
			prealloc = page_address(page);
	}

	raw_spin_lock_irqsave(&depot_lock, flags);

	found = find_stack(*bucket, entries, nr_entries, hash);
	if (!found) {
		struct stack_record *new = depot_alloc_stack(entries, nr_entries, hash, &prealloc);

		if (new) {
			new->next = *bucket;
			/*
			 * This smp_store_release() pairs with
			 * smp_load_acquire() from |bucket| above.
			 */
			smp_store_release(bucket, new);
			found = new;
		}
	} else if (prealloc) {
		/*
		 * We didn't need to store this stack trace, but let's keep
		 * the preallocated memory for the future.
		 */
		WARN_ON(!init_stack_slab(&prealloc));
	}

	raw_spin_unlock_irqrestore(&depot_lock, flags);
exit:
	if (prealloc) {
		/* Nobody used this memory, ok to free it. */
		free_pages((unsigned long)prealloc, STACK_ALLOC_ORDER);
	}
	if (found)
		retval = found->handle.handle;
fast_exit:
	return retval;
}
EXPORT_SYMBOL_GPL(__stack_depot_save);

/**
 * stack_depot_save - Save a stack trace from an array
 *
 * @entries:		Pointer to storage array
 * @nr_entries:		Size of the storage array
 * @alloc_flags:	Allocation gfp flags
 *
 * Context: Contexts where allocations via alloc_pages() are allowed.
 *          See __stack_depot_save() for more details.
 *
 * Return: The handle of the stack struct stored in depot, 0 on failure.
 */
depot_stack_handle_t stack_depot_save(unsigned long *entries,
				      unsigned int nr_entries,
				      gfp_t alloc_flags)
{
	return __stack_depot_save(entries, nr_entries, alloc_flags, true);
}
EXPORT_SYMBOL_GPL(stack_depot_save);