linux/kernel/rcu/Kconfig.debug

172 lines
6.0 KiB
Plaintext
Raw Normal View History

# SPDX-License-Identifier: GPL-2.0-only
#
# RCU-related debugging configuration options
#
menu "RCU Debugging"
config PROVE_RCU
def_bool PROVE_LOCKING
config PROVE_RCU_LIST
bool "RCU list lockdep debugging"
depends on PROVE_RCU && RCU_EXPERT
default n
help
Enable RCU lockdep checking for list usages. By default it is
turned off since there are several list RCU users that still
need to be converted to pass a lockdep expression. To prevent
false-positive splats, we keep it default disabled but once all
users are converted, we can remove this config option.
config TORTURE_TEST
tristate
default n
config RCU_SCALE_TEST
tristate "performance tests for RCU"
depends on DEBUG_KERNEL
select TORTURE_TEST
default n
help
This option provides a kernel module that runs performance
tests on the RCU infrastructure. The kernel module may be built
after the fact on the running kernel to be tested, if desired.
Say Y here if you want RCU performance tests to be built into
the kernel.
Say M if you want the RCU performance tests to build as a module.
Say N if you are unsure.
config RCU_TORTURE_TEST
tristate "torture tests for RCU"
depends on DEBUG_KERNEL
select TORTURE_TEST
default n
help
This option provides a kernel module that runs torture tests
on the RCU infrastructure. The kernel module may be built
after the fact on the running kernel to be tested, if desired.
Say Y here if you want RCU torture tests to be built into
the kernel.
Say M if you want the RCU torture tests to build as a module.
Say N if you are unsure.
config RCU_REF_SCALE_TEST
tristate "Scalability tests for read-side synchronization (RCU and others)"
refperf: Add a test to measure performance of read-side synchronization Add a test for comparing the performance of RCU with various read-side synchronization mechanisms. The test has proved useful for collecting data and performing these comparisons. Currently RCU, SRCU, reader-writer lock, reader-writer semaphore and reference counting can be measured using refperf.perf_type parameter. Each invocation of the test runs measures performance of a specific mechanism. The maximum number of CPUs to concurrently run readers on is chosen by the test itself and is 75% of the total number of CPUs. So if you had 24 CPUs, the test runs with a maximum of 18 parallel readers. A number of experiments are conducted, and in each experiment, the number of readers is increased by 1, upto the 75% of CPUs mark. During each experiment, all readers execute an empty loop with refperf.loops iterations and time the total loop duration. This is then averaged. Example output: Parameters "refperf.perf_type=srcu refperf.loops=2000000" looks like: [ 3.347133] srcu-ref-perf: [ 3.347133] Threads Time(ns) [ 3.347133] 1 36 [ 3.347133] 2 34 [ 3.347133] 3 34 [ 3.347133] 4 34 [ 3.347133] 5 33 [ 3.347133] 6 33 [ 3.347133] 7 33 [ 3.347133] 8 33 [ 3.347133] 9 33 [ 3.347133] 10 33 [ 3.347133] 11 33 [ 3.347133] 12 33 [ 3.347133] 13 33 [ 3.347133] 14 33 [ 3.347133] 15 32 [ 3.347133] 16 33 [ 3.347133] 17 33 [ 3.347133] 18 34 Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-05-25 12:36:48 +08:00
depends on DEBUG_KERNEL
select TORTURE_TEST
default n
help
This option provides a kernel module that runs performance tests
useful comparing RCU with various read-side synchronization mechanisms.
The kernel module may be built after the fact on the running kernel to be
tested, if desired.
Say Y here if you want these performance tests built into the kernel.
Say M if you want to build it as a module instead.
Say N if you are unsure.
config RCU_CPU_STALL_TIMEOUT
int "RCU CPU stall timeout in seconds"
depends on RCU_STALL_COMMON
range 3 300
default 21
help
If a given RCU grace period extends more than the specified
number of seconds, a CPU stall warning is printed. If the
RCU grace period persists, additional CPU stall warnings are
printed at more widely spaced intervals.
config RCU_EXP_CPU_STALL_TIMEOUT
int "Expedited RCU CPU stall timeout in milliseconds"
depends on RCU_STALL_COMMON
range 0 300000
default 0
help
If a given expedited RCU grace period extends more than the
specified number of milliseconds, a CPU stall warning is printed.
If the RCU grace period persists, additional CPU stall warnings
are printed at more widely spaced intervals. A value of zero
says to use the RCU_CPU_STALL_TIMEOUT value converted from
seconds to milliseconds.
rcu: Add RCU stall diagnosis information Because RCU CPU stall warnings are driven from the scheduling-clock interrupt handler, a workload consisting of a very large number of short-duration hardware interrupts can result in misleading stall-warning messages. On systems supporting only a single level of interrupts, that is, where interrupts handlers cannot be interrupted, this can produce misleading diagnostics. The stack traces will show the innocent-bystander interrupted task, not the interrupts that are at the very least exacerbating the stall. This situation can be improved by displaying the number of interrupts and the CPU time that they have consumed. Diagnosing other types of stalls can be eased by also providing the count of softirqs and the CPU time that they consumed as well as the number of context switches and the task-level CPU time consumed. Consider the following output given this change: rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 0-....: (1250 ticks this GP) <omitted> rcu: hardirqs softirqs csw/system rcu: number: 624 45 0 rcu: cputime: 69 1 2425 ==> 2500(ms) This output shows that the number of hard and soft interrupts is small, there are no context switches, and the system takes up a lot of time. This indicates that the current task is looping with preemption disabled. The impact on system performance is negligible because snapshot is recorded only once for all continuous RCU stalls. This added debugging information is suppressed by default and can be enabled by building the kernel with CONFIG_RCU_CPU_STALL_CPUTIME=y or by booting with rcupdate.rcu_cpu_stall_cputime=1. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-11-19 17:25:06 +08:00
config RCU_CPU_STALL_CPUTIME
bool "Provide additional RCU stall debug information"
depends on RCU_STALL_COMMON
default n
help
Collect statistics during the sampling period, such as the number of
(hard interrupts, soft interrupts, task switches) and the cputime of
(hard interrupts, soft interrupts, kernel tasks) are added to the
RCU stall report. For multiple continuous RCU stalls, all sampling
periods begin at half of the first RCU stall timeout.
The boot option rcupdate.rcu_cpu_stall_cputime has the same function
as this one, but will override this if it exists.
rcu: Restrict access to RCU CPU stall notifiers Although the RCU CPU stall notifiers can be useful for dumping state when tracking down delicate forward-progress bugs where NUMA effects cause cache lines to be delivered to a given CPU regularly, but always in a state that prevents that CPU from making forward progress. These bugs can be detected by the RCU CPU stall-warning mechanism, but in some cases, the stall-warnings printk()s disrupt the forward-progress bug before any useful state can be obtained. Unfortunately, the notifier mechanism added by commit 5b404fdabacf ("rcu: Add RCU CPU stall notifier") can make matters worse if used at all carelessly. For example, if the stall warning was caused by a lock not being released, then any attempt to acquire that lock in the notifier will hang. This will prevent not only the notifier from producing any useful output, but it will also prevent the stall-warning message from ever appearing. This commit therefore hides this new RCU CPU stall notifier mechanism under a new RCU_CPU_STALL_NOTIFIER Kconfig option that depends on both DEBUG_KERNEL and RCU_EXPERT. In addition, the rcupdate.rcu_cpu_stall_notifiers=1 kernel boot parameter must also be specified. The RCU_CPU_STALL_NOTIFIER Kconfig option's help text contains a warning and explains the dangers of careless use, recommending lockless notifier code. In addition, a WARN() is triggered each time that an attempt is made to register a stall-warning notifier in kernels built with CONFIG_RCU_CPU_STALL_NOTIFIER=y. This combination of measures will keep use of this mechanism confined to debug kernels and away from routine deployments. [ paulmck: Apply Dan Carpenter feedback. ] Fixes: 5b404fdabacf ("rcu: Add RCU CPU stall notifier") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
2023-11-02 09:28:38 +08:00
config RCU_CPU_STALL_NOTIFIER
bool "Provide RCU CPU-stall notifiers"
depends on RCU_STALL_COMMON
depends on DEBUG_KERNEL
depends on RCU_EXPERT
default n
help
WARNING: You almost certainly do not want this!!!
Enable RCU CPU-stall notifiers, which are invoked just before
printing the RCU CPU stall warning. As such, bugs in notifier
callbacks can prevent stall warnings from being printed.
And the whole reason that a stall warning is being printed is
that something is hung up somewhere. Therefore, the notifier
callbacks must be written extremely carefully, preferably
containing only lockless code. After all, it is quite possible
that the whole reason that the RCU CPU stall is happening in
the first place is that someone forgot to release whatever lock
that you are thinking of acquiring. In which case, having your
notifier callback acquire that lock will hang, preventing the
RCU CPU stall warning from appearing.
Say Y here if you want RCU CPU stall notifiers (you don't want them)
Say N if you are unsure.
config RCU_TRACE
bool "Enable tracing for RCU"
depends on DEBUG_KERNEL
default y if TREE_RCU
select TRACE_CLOCK
help
This option enables additional tracepoints for ftrace-style
event tracing.
Say Y here if you want to enable RCU tracing
Say N if you are unsure.
config RCU_EQS_DEBUG
bool "Provide debugging asserts for adding NO_HZ support to an arch"
depends on DEBUG_KERNEL
help
This option provides consistency checks in RCU's handling of
NO_HZ. These checks have proven quite helpful in detecting
bugs in arch-specific NO_HZ code.
Say N here if you need ultimate kernel/user switch latencies
Say Y if you are unsure
config RCU_STRICT_GRACE_PERIOD
bool "Provide debug RCU implementation with short grace periods"
depends on DEBUG_KERNEL && RCU_EXPERT && NR_CPUS <= 4 && !TINY_RCU
default n
select PREEMPT_COUNT if PREEMPT=n
help
Select this option to build an RCU variant that is strict about
grace periods, making them as short as it can. This limits
scalability, destroys real-time response, degrades battery
lifetime and kills performance. Don't try this on large
machines, as in systems with more than about 10 or 20 CPUs.
But in conjunction with tools like KASAN, it can be helpful
when looking for certain types of RCU usage bugs, for example,
too-short RCU read-side critical sections.
endmenu # "RCU Debugging"