linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-17 01:34:00 +08:00

Author	SHA1	Message	Date
Paul E. McKenney	574de8766f	rcu-tasks: Selectively enable more RCU Tasks Trace IPIs Many workloads are quite sensitive to IPIs, and such workloads should build kernels with CONFIG_TASKS_TRACE_RCU_READ_MB=y to prevent RCU Tasks Trace from using them under normal conditions. However, other workloads are quite happy to permit more IPIs if doing so makes BPF program updates go faster. This commit therefore sets the default value for the rcupdate.rcu_task_ipi_delay kernel parameter to zero for kernels that have been built with CONFIG_TASKS_TRACE_RCU_READ_MB=n, while retaining the old default of (HZ / 10) for kernels that have indicated an aversion to IPIs via CONFIG_TASKS_TRACE_RCU_READ_MB=y. Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/ Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: <bpf@vger.kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-09-16 16:32:37 -07:00
Paul E. McKenney	78edc005f4	rcu-tasks: Prevent complaints of unused show_rcu_tasks_classic_gp_kthread() Commit `8344496e8b` ("rcu-tasks: Conditionally compile show_rcu_tasks_gp_kthreads()") introduced conditional compilation of several functions, but forgot one occurrence of show_rcu_tasks_classic_gp_kthread() that causes the compiler to warn of an unused static function. This commit uses "static inline" to avoid these complaints and possibly also to avoid emitting an actual definition of this function. Fixes: `8344496e8b` ("rcu-tasks: Conditionally compile show_rcu_tasks_gp_kthreads()") Cc: <stable@vger.kernel.org> # 5.8.x Reported-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-09-16 16:32:36 -07:00
Paul E. McKenney	2393a613d2	rcu-tasks: Use more aggressive polling for RCU Tasks Trace The RCU Tasks Trace grace periods are too slow, as in 40x slower than those of RCU Tasks. This is due to my having assumed a one-second grace period was OK, and thus not having optimized any further. This commit provides the first step in this optimization process, namely by allowing the task_list scan backoff interval to be specified on a per-flavor basis, and then speeding up the scans for RCU Tasks Trace. However, kernels built with CONFIG_TASKS_TRACE_RCU_READ_MB=y continue to use the old slower backoff, consistent with that Kconfig option's goal of reducing IPIs. Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/ Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: <bpf@vger.kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-09-16 16:32:36 -07:00
Paul E. McKenney	6731da9e0f	rcu-tasks: Mark variables static The n_heavy_reader_attempts, n_heavy_reader_updates, and n_heavy_reader_ofl_updates variables are not used outside of their translation unit, so this commit marks them static. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-09-16 16:32:36 -07:00
Paul E. McKenney	7fbe67e46a	Merge branch 'strictgp.2020.08.24a' into HEAD strictgp.2020.08.24a: Strict grace periods for KASAN testing.	2020-09-03 09:47:42 -07:00
Paul E. McKenney	f511ce1424	Merge branch 'scftorture.2020.08.24a' into HEAD scftorture.2020.08.24a: Torture tests for smp_call_function() and friends.	2020-09-03 09:47:01 -07:00
Paul E. McKenney	cfb2c1070a	Merge branches 'doc.2020.08.24a', 'fixes.2020.09.03b' and 'torture.2020.08.24a' into HEAD doc.2020.08.24a: Documentation updates. fixes.2020.09.03b: Miscellaneous fixes. torture.2020.08.24a: Torture-test updates.	2020-09-03 09:42:02 -07:00
Zqiang	70060b8770	rcu: Shrink each possible cpu krcp CPUs can go offline shortly after kfree_call_rcu() has been invoked, which can leave memory stranded until those CPUs come back online. This commit therefore drains the kcrp of each CPU, not just the ones that happen to be online. Acked-by: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Zqiang <qiang.zhang@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-09-03 09:40:13 -07:00
Joel Fernandes (Google)	53922270d2	rcu/segcblist: Prevent useless GP start if no CBs to accelerate The rcu_segcblist_accelerate() function returns true iff it is necessary to request another grace period. A tracing session showed that this function unnecessarily requests grace periods. For example, consider the following sequence of events: 1. Callbacks are queued only on the NEXT segment of CPU A's callback list. 2. CPU A runs RCU_SOFTIRQ, accelerating these callbacks from NEXT to WAIT. 3. Thus rcu_segcblist_accelerate() returns true, requesting grace period N. 4. RCU's grace-period kthread wakes up on CPU B and starts grace period N. 4. CPU A notices the new grace period and invokes RCU_SOFTIRQ. 5. CPU A's RCU_SOFTIRQ again invokes rcu_segcblist_accelerate(), but there are no new callbacks. However, rcu_segcblist_accelerate() nevertheless (uselessly) requests a new grace period N+1. This extra grace period results in additional lock contention and also additional wakeups, all for no good reason. This commit therefore adds a check to rcu_segcblist_accelerate() that prevents the return of true when there are no new callbacks. This change reduces the number of grace periods (GPs) and wakeups in each of eleven five-second rcutorture runs as follows: +----+-------------------+-------------------+ \| # \| Number of GPs \| Number of Wakeups \| +====+=========+=========+=========+=========+ \| 1 \| With \| Without \| With \| Without \| +----+---------+---------+---------+---------+ \| 2 \| 75 \| 89 \| 113 \| 119 \| +----+---------+---------+---------+---------+ \| 3 \| 62 \| 91 \| 105 \| 123 \| +----+---------+---------+---------+---------+ \| 4 \| 60 \| 79 \| 98 \| 110 \| +----+---------+---------+---------+---------+ \| 5 \| 63 \| 79 \| 99 \| 112 \| +----+---------+---------+---------+---------+ \| 6 \| 57 \| 89 \| 96 \| 123 \| +----+---------+---------+---------+---------+ \| 7 \| 64 \| 85 \| 97 \| 118 \| +----+---------+---------+---------+---------+ \| 8 \| 58 \| 83 \| 98 \| 113 \| +----+---------+---------+---------+---------+ \| 9 \| 57 \| 77 \| 89 \| 104 \| +----+---------+---------+---------+---------+ \| 10 \| 66 \| 82 \| 98 \| 119 \| +----+---------+---------+---------+---------+ \| 11 \| 52 \| 82 \| 83 \| 117 \| +----+---------+---------+---------+---------+ The reduction in the number of wakeups ranges from 5% to 40%. Cc: urezki@gmail.com [ paulmck: Rework commit log and comment. ] Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-09-03 09:39:59 -07:00
Paul E. McKenney	d685514260	rcutorture: Allow pointer leaks to test diagnostic code This commit adds an rcutorture.leakpointer module parameter that intentionally leaks an RCU-protected pointer out of the RCU read-side critical section and checks to see if the corresponding grace period has elapsed, emitting a WARN_ON_ONCE() if so. This module parameter can be used to test facilities like CONFIG_RCU_STRICT_GRACE_PERIOD that end grace periods quickly. While in the area, also document rcutorture.irqreader, which was previously left out. Reported-by Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:45:36 -07:00
Paul E. McKenney	299c7d94f6	rcutorture: Hoist OOM registry up one level Currently, registering and unregistering the OOM notifier is done right before and after the test, respectively. This will not work well for multi-threaded tests, so this commit hoists this registering and unregistering up into the rcu_torture_fwd_prog_init() and rcu_torture_fwd_prog_cleanup() functions. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:45:35 -07:00
Colin Ian King	58db5785b0	refperf: Avoid null pointer dereference when buf fails to allocate Currently in the unlikely event that buf fails to be allocated it is dereferenced a few times. Use the errexit flag to determine if buf should be written to to avoid the null pointer dereferences. Addresses-Coverity: ("Dereference after null check") Fixes: `f518f154ec` ("refperf: Dynamically allocate experiment-summary output buffer") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:45:35 -07:00
Paul E. McKenney	57f602022e	rcutorture: Properly synchronize with OOM notifier The current rcutorture forward-progress code assumes that it is the only cause of out-of-memory (OOM) events. For script-based rcutorture testing, this assumption is in fact correct. However, testing based on modprobe/rmmod might well encounter external OOM events, which could happen at any time. This commit therefore properly synchronizes the interaction between rcutorture's forward-progress testing and its OOM notifier by adding a global mutex. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:45:34 -07:00
Paul E. McKenney	c8fa637147	rcutorture: Properly set rcu_fwds for OOM handling The conversion of rcu_fwds to dynamic allocation failed to actually allocate the required structure. This commit therefore allocates it, frees it, and updates rcu_fwds accordingly. While in the area, it abstracts the cleanup actions into rcu_torture_fwd_prog_cleanup(). Fixes: `5155be9994` ("rcutorture: Dynamically allocate rcu_fwds structure") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:45:34 -07:00
Joel Fernandes (Google)	959954df0c	rcutorture: Output number of elapsed grace periods This commit adds code to print the grace-period number at the start of the test along with both the grace-period number and the number of elapsed grace periods at the end of the test. Note that variants of RCU)without the notion of a grace-period number (for example, Tiny RCU) just print zeroes. [ paulmck: Adjust commit log. ] Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:45:31 -07:00
Paul E. McKenney	83224afd11	rcutorture: Remove KCSAN stubs KCSAN is now in mainline, so this commit removes the stubs for the data_race(), ASSERT_EXCLUSIVE_WRITER(), and ASSERT_EXCLUSIVE_ACCESS() macros. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:45:31 -07:00
Paul E. McKenney	cfeac3977a	rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp() The "cpu" parameter to rcu_report_qs_rdp() is not used, with rdp->cpu being used instead. Furtheremore, every call to rcu_report_qs_rdp() invokes it on rdp->cpu. This commit therefore removes this unused "cpu" parameter and converts a check of rdp->cpu against smp_processor_id() to a WARN_ON_ONCE(). Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:40:28 -07:00
Paul E. McKenney	aa40c138cc	rcu: Report QS for outermost PREEMPT=n rcu_read_unlock() for strict GPs The CONFIG_PREEMPT=n instance of rcu_read_unlock is even more aggressively than that of CONFIG_PREEMPT=y in deferring reporting quiescent states to the RCU core. This is just what is wanted in normal use because it reduces overhead, but the resulting delay is not what is wanted for kernels built with CONFIG_RCU_STRICT_GRACE_PERIOD=y. This commit therefore adds an rcu_read_unlock_strict() function that checks for exceptional conditions, and reports the newly started quiescent state if it is safe to do so, also doing a spin-delay if requested via rcutree.rcu_unlock_delay. This commit also adds a call to rcu_read_unlock_strict() from the CONFIG_PREEMPT=n instance of __rcu_read_unlock(). [ paulmck: Fixed bug located by kernel test robot <lkp@intel.com> ] Reported-by Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:40:28 -07:00
Paul E. McKenney	a657f26170	rcu: Execute RCU reader shortly after rcu_core for strict GPs A kernel built with CONFIG_RCU_STRICT_GRACE_PERIOD=y needs a quiescent state to appear very shortly after a CPU has noticed a new grace period. Placing an RCU reader immediately after this point is ineffective because this normally happens in softirq context, which acts as a big RCU reader. This commit therefore introduces a new per-CPU work_struct, which is used at the end of rcu_core() processing to schedule an RCU read-side critical section from within a clean environment. Reported-by Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:40:27 -07:00
Paul E. McKenney	3d29aaf1ef	rcu: Provide optional RCU-reader exit delay for strict GPs The goal of this series is to increase the probability of tools like KASAN detecting that an RCU-protected pointer was used outside of its RCU read-side critical section. Thus far, the approach has been to make grace periods and callback processing happen faster. Another approach is to delay the pointer leaker. This commit therefore allows a delay to be applied to exit from RCU read-side critical sections. This slowdown is specified by a new rcutree.rcu_unlock_delay kernel boot parameter that specifies this delay in microseconds, defaulting to zero. Reported-by Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:40:27 -07:00
Paul E. McKenney	4e025f52a1	rcu: IPI all CPUs at GP end for strict GPs Currently, each CPU discovers the end of a given grace period on its own time, which is again good for efficiency but bad for fast grace periods, given that it is things like kfree() within the RCU callbacks that will cause trouble for pointers leaked from RCU read-side critical sections. This commit therefore uses on_each_cpu() to IPI each CPU after grace-period cleanup in order to inform each CPU of the end of the old grace period in a timely manner, but only in kernels build with CONFIG_RCU_STRICT_GRACE_PERIOD=y. Reported-by Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:40:26 -07:00
Paul E. McKenney	933ada2c33	rcu: IPI all CPUs at GP start for strict GPs Currently, each CPU discovers the beginning of a given grace period on its own time, which is again good for efficiency but bad for fast grace periods. This commit therefore uses on_each_cpu() to IPI each CPU after grace-period initialization in order to inform each CPU of the new grace period in a timely manner, but only in kernels build with CONFIG_RCU_STRICT_GRACE_PERIOD=y. Reported-by Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:40:26 -07:00
Paul E. McKenney	1a2f5d57a3	rcu: Attempt QS when CPU discovers GP for strict GPs A given CPU normally notes a new grace period during one RCU_SOFTIRQ, but avoids reporting the corresponding quiescent state until some later RCU_SOFTIRQ. This leisurly approach improves efficiency by increasing the number of update requests served by each grace period, but is not what is needed for kernels built with CONFIG_RCU_STRICT_GRACE_PERIOD=y. This commit therefore adds a new rcu_strict_gp_check_qs() function which, in CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels, simply enters and immediately exist an RCU read-side critical section. If the CPU is in a quiescent state, the rcu_read_unlock() will attempt to report an immediate quiescent state. This rcu_strict_gp_check_qs() function is invoked from note_gp_changes(), so that a CPU just noticing a new grace period might immediately report a quiescent state for that grace period. Reported-by Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:40:26 -07:00
Paul E. McKenney	44bad5b3cc	rcu: Do full report for .need_qs for strict GPs The rcu_preempt_deferred_qs_irqrestore() function is invoked at the end of an RCU read-side critical section (for example, directly from rcu_read_unlock()) and, if .need_qs is set, invokes rcu_qs() to report the new quiescent state. This works, except that rcu_qs() only updates per-CPU state, leaving reporting of the actual quiescent state to a later call to rcu_report_qs_rdp(), for example from within a later RCU_SOFTIRQ instance. Although this approach is exactly what you want if you are more concerned about efficiency than about short grace periods, in CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels, short grace periods are the name of the game. This commit therefore makes rcu_preempt_deferred_qs_irqrestore() directly invoke rcu_report_qs_rdp() in CONFIG_RCU_STRICT_GRACE_PERIOD=y, thus shortening grace periods. Historical note: To the best of my knowledge, causing rcu_read_unlock() to directly report a quiescent state first appeared in Jim Houston's and Joe Korty's JRCU. This is the second instance of a Linux-kernel RCU feature being inspired by JRCU, the first being RCU callback offloading (as in the RCU_NOCB_CPU Kconfig option). Reported-by Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:40:25 -07:00
Paul E. McKenney	f19920e412	rcu: Always set .need_qs from __rcu_read_lock() for strict GPs The ->rcu_read_unlock_special.b.need_qs field in the task_struct structure indicates that the RCU core needs a quiscent state from the corresponding task. The __rcu_read_unlock() function checks this (via an eventual call to rcu_preempt_deferred_qs_irqrestore()), and if set reports a quiscent state immediately upon exit from the outermost RCU read-side critical section. Currently, this flag is only set when the scheduling-clock interrupt decides that the current RCU grace period is too old, as in about one full second too old. But if the kernel has been built with CONFIG_RCU_STRICT_GRACE_PERIOD=y, we clearly do not want to wait that long. This commit therefore sets the .need_qs field immediately at the start of the RCU read-side critical section from within __rcu_read_lock() in order to unconditionally enlist help from __rcu_read_unlock(). But note the additional check for rcu_state.gp_kthread, which prevents attempts to awaken RCU's grace-period kthread during early boot before there is a scheduler. Leaving off this check results in early boot hangs. So early that there is no console output. Thus, this additional check fails until such time as RCU's grace-period kthread has been created, avoiding these empty-console hangs. Reported-by Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:40:25 -07:00
Paul E. McKenney	29fc5f9332	rcu: Force DEFAULT_RCU_BLIMIT to 1000 for strict RCU GPs The value of DEFAULT_RCU_BLIMIT is normally set to 10, the idea being to avoid needless response-time degradation due to RCU callback invocation. However, when CONFIG_RCU_STRICT_GRACE_PERIOD=y it is better to avoid throttling callback execution in order to better detect pointer leaks from RCU read-side critical sections. This commit therefore sets the value of DEFAULT_RCU_BLIMIT to 1000 in kernels built with CONFIG_RCU_STRICT_GRACE_PERIOD=y. Reported-by Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:40:24 -07:00
Paul E. McKenney	aecd34b976	rcu: Restrict default jiffies_till_first_fqs for strict RCU GPs If there are idle CPUs, RCU's grace-period kthread will wait several jiffies before even thinking about polling them. This promotes efficiency, which is normally a good thing, but when the kernel has been built with CONFIG_RCU_STRICT_GRACE_PERIOD=y, we care more about short grace periods. This commit therefore restricts the default jiffies_till_first_fqs value to zero in kernels built with CONFIG_RCU_STRICT_GRACE_PERIOD=y, which causes RCU's grace-period kthread to poll for idle CPUs immediately after starting a grace period. Reported-by Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:40:24 -07:00
Paul E. McKenney	dc1269186b	rcu: Reduce leaf fanout for strict RCU grace periods Because strict RCU grace periods will complete more quickly, they will experience greater lock contention on each leaf rcu_node structure's ->lock. This commit therefore reduces the leaf fanout in order to reduce this lock contention. Note that this also has the effect of reducing the number of CPUs supported to 16 in the case of CONFIG_RCU_FANOUT_LEAF=2 or 81 in the case of CONFIG_RCU_FANOUT_LEAF=3. However, greater numbers of CPUs are probably a bad idea when using CONFIG_RCU_STRICT_GRACE_PERIOD=y. Those wishing to live dangerously are free to edit their kernel/rcu/Kconfig files accordingly. Reported-by Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:40:23 -07:00
Paul E. McKenney	8cbd0e38a9	rcu: Add Kconfig option for strict RCU grace periods People running automated tests have asked for a way to make RCU minimize grace-period duration in order to increase the probability of KASAN detecting a pointer being improperly leaked from an RCU read-side critical section, for example, like this: rcu_read_lock(); p = rcu_dereference(gp); do_something_with(p); // OK rcu_read_unlock(); do_something_else_with(p); // BUG!!! The rcupdate.rcu_expedited boot parameter is a start in this direction, given that it makes calls to synchronize_rcu() instead invoke the faster (and more wasteful) synchronize_rcu_expedited(). However, this does nothing to shorten RCU grace periods that are instead initiated by call_rcu(), and RCU pointer-leak bugs can involve call_rcu() just as surely as they can synchronize_rcu(). This commit therefore adds a RCU_STRICT_GRACE_PERIOD Kconfig option that will be used to shorten normal (non-expedited) RCU grace periods. This commit also dumps out a message when this option is in effect. Later commits will actually shorten grace periods. Reported-by Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:40:23 -07:00
Paul E. McKenney	4e88ec4a9e	rcuperf: Change rcuperf to rcuscale This commit further avoids conflation of rcuperf with the kernel's perf feature by renaming kernel/rcu/rcuperf.c to kernel/rcu/rcuscale.c, and also by similarly renaming the functions and variables inside this file. This has the side effect of changing the names of the kernel boot parameters, so kernel-parameters.txt and ver_functions.sh are also updated. The rcutorture --torture type was also updated from rcuperf to rcuscale. [ paulmck: Fix bugs located by Stephen Rothwell. ] Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:39:24 -07:00
Paul E. McKenney	7f2a53c231	rcu: Remove unused __rcu_is_watching() function The x86/entry work removed all uses of __rcu_is_watching(), therefore this commit removes it entirely. Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <x86@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:37:56 -07:00
Joel Fernandes (Google)	666ca2907e	rcu: Make FQS more aggressive in complaining about offline CPUs The RCU grace-period kthread's force-quiescent state (FQS) loop should never see an offline CPU that has not yet reported a quiescent state. After all, the offline CPU should have reported a quiescent state during the CPU-offline process, or, failing that, by rcu_gp_init() if it ran concurrently with either the CPU going offline or the last task on a leaf rcu_node structure exiting its RCU read-side critical section while all CPUs corresponding to that structure are offline. The FQS loop should therefore complain if it does see an offline CPU that has not yet reported a quiescent state. And it does, but only once the grace period has been in force for a full second. This commit therefore makes this warning more aggressive, so that it will trigger as soon as the condition makes its appearance. Light testing with TREE03 and hotplug shows no warnings. This commit also converts the warning to WARN_ON_ONCE() in order to stave off possible log spam. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:37:55 -07:00
Joel Fernandes (Google)	f37599e6f0	rcu: Clarify comments about FQS loop reporting quiescent states Since at least v4.19, the FQS loop no longer reports quiescent states for offline CPUs except in emergency situations. This commit therefore fixes the comment in rcu_gp_init() to match the current code. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:37:55 -07:00
Paul E. McKenney	4569c5ee95	rcu/nocb: Add a warning for non-GP kthread running GP code This commit increases RCU's ability to defend itself by emitting a warning if one of the nocb CB kthreads invokes the GP kthread's wait function. This warning augments a similar check that is carried out at the end of rcutorture testing and when RCU CPU stall warnings are emitted. The problem with those checks is that the miscreants have long since departed and disposed of any and all evidence. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:37:54 -07:00
Paul E. McKenney	c0f97f20e5	rcu: Move rcu_cpu_started per-CPU variable to rcu_data When the rcu_cpu_started per-CPU variable was added by commit `f64c6013a2` ("rcu/x86: Provide early rcu_cpu_starting() callback"), there were multiple sets of per-CPU rcu_data structures. Therefore, the rcu_cpu_started flag was added as a separate per-CPU variable. But now there is only one set of per-CPU rcu_data structures, so this commit moves rcu_cpu_started to a new ->cpu_started field in that structure. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:37:54 -07:00
Paul E. McKenney	1ef5a442a1	rcu: Add READ_ONCE() to rcu_do_batch() access to rcu_cpu_stall_ftrace_dump Given that sysfs can change the value of rcu_cpu_stall_ftrace_dump at any time, this commit adds a READ_ONCE() to the accesses to that variable. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:36:08 -07:00
Paul E. McKenney	fe63b723cc	rcu: Add READ_ONCE() to rcu_do_batch() access to rcu_kick_kthreads Given that sysfs can change the value of rcu_kick_kthreads at any time, this commit adds a READ_ONCE() to the sole access to that variable. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:36:07 -07:00
Paul E. McKenney	a2b354b995	rcu: Add READ_ONCE() to rcu_do_batch() access to rcu_resched_ns Given that sysfs can change the value of rcu_resched_ns at any time, this commit adds a READ_ONCE() to the sole access to that variable. While in the area, this commit also adds bounds checking, clamping the value to at least a millisecond, but no longer than a second. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:36:07 -07:00
Paul E. McKenney	b5374b2df0	rcu: Add READ_ONCE() to rcu_do_batch() access to rcu_divisor Given that sysfs can change the value of rcu_divisor at any time, this commit adds a READ_ONCE to the sole access to that variable. While in the area, this commit also adds bounds checking, clamping the value to a shift that makes sense for a signed long. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:36:06 -07:00
Paul E. McKenney	2130c6b4f6	nocb: Remove show_rcu_nocb_state() false positive printout The rcu_data structure's ->nocb_timer field is used to defer wakeups of the corresponding no-CBs CPU's grace-period kthread ("rcuog*"), and that structure's ->nocb_defer_wakeup field is used to track such deferral. This means that the show_rcu_nocb_state() printing an error when those fields are set for a CPU not corresponding to a no-CBs grace-period kthread is erroneous. This commit therefore switches the check from ->nocb_timer to ->nocb_bypass_timer and removes the check of ->nocb_defer_wakeup. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:36:06 -07:00
Neeraj Upadhyay	9b1ce0acb5	rcu/tree: Remove CONFIG_PREMPT_RCU check in force_qs_rnp() Originally, the call to rcu_preempt_blocked_readers_cgp() from force_qs_rnp() had to be conditioned on CONFIG_PREEMPT_RCU=y, as in commit `a77da14ce9` ("rcu: Yet another fix for preemption and CPU hotplug"). However, there is now a CONFIG_PREEMPT_RCU=n definition of rcu_preempt_blocked_readers_cgp() that unconditionally returns zero, so invoking it is now safe. In addition, the CONFIG_PREEMPT_RCU=n definition of rcu_initiate_boost() simply releases the rcu_node structure's ->lock, which is what happens when the "if" condition evaluates to false. This commit therefore drops the IS_ENABLED(CONFIG_PREEMPT_RCU) check, so that rcu_initiate_boost() is called only in CONFIG_PREEMPT_RCU=y kernels when there are readers blocking the current grace period. This does not change the behavior, but reduces code-reader confusion by eliminating non-CONFIG_PREEMPT_RCU=y calls to rcu_initiate_boost(). Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:36:06 -07:00
Neeraj Upadhyay	9c39245382	rcu/tree: Force quiescent state on callback overload On callback overload, it is necessary to quickly detect idle CPUs, and rcu_gp_fqs_check_wake() checks for this condition. Unfortunately, the code following the call to this function does not repeat this check, which means that in reality no actual quiescent-state forcing, instead only a couple of quick and pointless wakeups at the beginning of the grace period. This commit therefore adds a check for the RCU_GP_FLAG_OVLD flag in the post-wakeup "if" statement in rcu_gp_fqs_loop(). Fixes: `1fca4d12f4` ("rcu: Expedite first two FQS scans under callback-overload conditions") Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:36:05 -07:00
Paul E. McKenney	e082c7b381	nocb: Clarify RCU nocb CPU error message A message of the form "rcu: !!! lDTs ." can be tracked down, but doing so is not trivial. This commit therefore eases this process by adding text so that this error message now reads as follows: "rcu: nocb GP activity on CB-only CPU!!! lDTs ." Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:36:05 -07:00
Joel Fernandes (Google)	a7886e899f	rcu/trace: Use gp_seq_req in acceleration's rcu_grace_period tracepoint During acceleration of CB, the rsp's gp_seq is rcu_seq_snap'd. This is the value used for acceleration - it is the value of gp_seq at which it is safe the execute all callbacks in the callback list. The rdp's gp_seq is not very useful for this scenario. Make rcu_grace_period report the gp_seq_req instead as it allows one to reason about how the acceleration works. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:36:04 -07:00
Paul E. McKenney	7487ea07df	rcu: Initialize at declaration time in rcu_exp_handler() This commit moves the initialization of the CONFIG_PREEMPT=n version of the rcu_exp_handler() function's rdp and rnp local variables into their respective declarations to save a couple lines of code. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:36:03 -07:00
Paul E. McKenney	d9b6074131	srcu: Remove KCSAN stubs KCSAN is now in mainline, so this commit removes the stubs for the data_race(), ASSERT_EXCLUSIVE_WRITER(), and ASSERT_EXCLUSIVE_ACCESS() macros. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:36:03 -07:00
Paul E. McKenney	beb27bd649	rcu: Remove KCSAN stubs from update.c KCSAN is now in mainline, so this commit removes the stubs for the data_race(), ASSERT_EXCLUSIVE_WRITER(), and ASSERT_EXCLUSIVE_ACCESS() macros. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:36:03 -07:00
Paul E. McKenney	ebc3505d50	rcu: Remove KCSAN stubs KCSAN is now in mainline, so this commit removes the stubs for the data_race(), ASSERT_EXCLUSIVE_WRITER(), and ASSERT_EXCLUSIVE_ACCESS() macros. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-08-24 18:36:02 -07:00
Walter Wu	26e760c9a7	rcu: kasan: record and print call_rcu() call stack Patch series "kasan: memorize and print call_rcu stack", v8. This patchset improves KASAN reports by making them to have call_rcu() call stack information. It is useful for programmers to solve use-after-free or double-free memory issue. The KASAN report was as follows(cleaned up slightly): BUG: KASAN: use-after-free in kasan_rcu_reclaim+0x58/0x60 Freed by task 0: kasan_save_stack+0x24/0x50 kasan_set_track+0x24/0x38 kasan_set_free_info+0x18/0x20 __kasan_slab_free+0x10c/0x170 kasan_slab_free+0x10/0x18 kfree+0x98/0x270 kasan_rcu_reclaim+0x1c/0x60 Last call_rcu(): kasan_save_stack+0x24/0x50 kasan_record_aux_stack+0xbc/0xd0 call_rcu+0x8c/0x580 kasan_rcu_uaf+0xf4/0xf8 Generic KASAN will record the last two call_rcu() call stacks and print up to 2 call_rcu() call stacks in KASAN report. it is only suitable for generic KASAN. This feature considers the size of struct kasan_alloc_meta and kasan_free_meta, we try to optimize the structure layout and size, lets it get better memory consumption. [1]https://bugzilla.kernel.org/show_bug.cgi?id=198437 [2]https://groups.google.com/forum/#!searchin/kasan-dev/better$20stack$20traces$20for$20rcu%7Csort:date/kasan-dev/KQsjT_88hDE/7rNUZprRBgAJ This patch (of 4): This feature will record the last two call_rcu() call stacks and prints up to 2 call_rcu() call stacks in KASAN report. When call_rcu() is called, we store the call_rcu() call stack into slub alloc meta-data, so that the KASAN report can print rcu stack. [1]https://bugzilla.kernel.org/show_bug.cgi?id=198437 [2]https://groups.google.com/forum/#!searchin/kasan-dev/better$20stack$20traces$20for$20rcu%7Csort:date/kasan-dev/KQsjT_88hDE/7rNUZprRBgAJ [walter-zh.wu@mediatek.com: build fix] Link: http://lkml.kernel.org/r/20200710162401.23816-1-walter-zh.wu@mediatek.com Suggested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Walter Wu <walter-zh.wu@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthias Brugger <matthias.bgg@gmail.com> Link: http://lkml.kernel.org/r/20200710162123.23713-1-walter-zh.wu@mediatek.com Link: http://lkml.kernel.org/r/20200601050847.1096-1-walter-zh.wu@mediatek.com Link: http://lkml.kernel.org/r/20200601050927.1153-1-walter-zh.wu@mediatek.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-08-07 11:33:28 -07:00
Linus Torvalds	6d2b84a4e5	This tree adds the sched_set_fifo() encapsulation APIs to remove static priority level knowledge from non-scheduler code. The three APIs for non-scheduler code to set SCHED_FIFO are: - sched_set_fifo() - sched_set_fifo_low() - sched_set_normal() These are two FIFO priority levels: default (high), and a 'low' priority level, plus sched_set_normal() to set the policy back to non-SCHED_FIFO. Since the changes affect a lot of non-scheduler code, we kept this in a separate tree. When merging to the latest upstream tree there's a conflict in drivers/spi/spi.c, which can be resolved via: sched_set_fifo(ctlr->kworker_task); Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8pPQIRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1j0Jw/+LlSyX6gD2ATy3cizGL7DFPZogD5MVKTb IXbhXH/ACpuPQlBe1+haRLbJj6XfXqbOlAleVKt7eh+jZ1jYjC972RCSTO4566mJ 0v8Iy9kkEeb2TDbYx1H3bnk78lf85t0CB+sCzyKUYFuTrXU04eRj7MtN3vAQyRQU xJg83x/sT5DGdDTP50sL7lpbwk3INWkD0aDCJEaO/a9yHElMsTZiZBKoXxN/s30o FsfzW56jqtng771H2bo8ERN7+abwJg10crQU5mIaLhacNMETuz0NZ/f8fY/fydCL Ju8HAdNKNXyphWkAOmixQuyYtWKe2/GfbHg8hld0jmpwxkOSTgZjY+pFcv7/w306 g2l1TPOt8e1n5jbfnY3eig+9Kr8y0qHkXPfLfgRqKwMMaOqTTYixEzj+NdxEIRX9 Kr7oFAv6VEFfXGSpb5L1qyjIGVgQ5/JE/p3OC3GHEsw5VKiy5yjhNLoSmSGzdS61 1YurVvypSEUAn3DqTXgeGX76f0HH365fIKqmbFrUWxliF+YyflMhtrj2JFtejGzH Md3RgAzxusE9S6k3gw1ev4byh167bPBbY8jz0w3Gd7IBRKy9vo92h6ZRYIl6xeoC BU2To1IhCAydIr6hNsIiCSDTgiLbsYQzPuVVovUxNh+l1ZvKV2X+csEHhs8oW4pr 4BRU7dKL2NE= =/7JH -----END PGP SIGNATURE----- Merge tag 'sched-fifo-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull sched/fifo updates from Ingo Molnar: "This adds the sched_set_fifo() encapsulation APIs to remove static priority level knowledge from non-scheduler code. The three APIs for non-scheduler code to set SCHED_FIFO are: - sched_set_fifo() - sched_set_fifo_low() - sched_set_normal() These are two FIFO priority levels: default (high), and a 'low' priority level, plus sched_set_normal() to set the policy back to non-SCHED_FIFO. Since the changes affect a lot of non-scheduler code, we kept this in a separate tree" * tag 'sched-fifo-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) sched,tracing: Convert to sched_set_fifo() sched: Remove sched_set_() return value sched: Remove sched_setscheduler() EXPORTs sched,psi: Convert to sched_set_fifo_low() sched,rcutorture: Convert to sched_set_fifo_low() sched,rcuperf: Convert to sched_set_fifo_low() sched,locktorture: Convert to sched_set_fifo() sched,irq: Convert to sched_set_fifo() sched,watchdog: Convert to sched_set_fifo() sched,serial: Convert to sched_set_fifo() sched,powerclamp: Convert to sched_set_fifo() sched,ion: Convert to sched_set_normal() sched,powercap: Convert to sched_set_fifo() sched,spi: Convert to sched_set_fifo() sched,mmc: Convert to sched_set_fifo() sched,ivtv: Convert to sched_set_fifo() sched,drm/scheduler: Convert to sched_set_fifo() sched,msm: Convert to sched_set_fifo() sched,psci: Convert to sched_set_fifo() sched,drbd: Convert to sched_set_fifo() ...	2020-08-06 11:55:43 -07:00
Ingo Molnar	c1cc4784ce	Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull the v5.9 RCU bits from Paul E. McKenney: - Documentation updates - Miscellaneous fixes - kfree_rcu updates - RCU tasks updates - Read-side scalability tests - SRCU updates - Torture-test updates Signed-off-by: Ingo Molnar <mingo@kernel.org>	2020-07-31 00:15:53 +02:00
Linus Torvalds	5465a324af	A single fix for a printk format warning in RCU. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8B8gcTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoXDoEADCdEVs/qLktFXrN17i6Oeju7h6oQ2m 8iI1GDStY5zj4Jk6FdQB864XXHbkNO7C096IqJOZud66k+lj3sEk9lE24tpgZqi/ gHVCueUoKF5ZYNyEtPkSDzHcr/IJg3iueQyShTbGotvGbF/gBAWJtuIq3sVpaD+Q qvZYASQMkBNrRcEgxzaTr286MJ4lIQ61ujwRWQJV4woQgAqjeTrOKQ+qOoKCZVfB c1glieDNLwZvs/534zsBLRj7ApvuJ2SyHXhfC9byIitUb1RdZ/1gAkiteX/K6ici PXoPamBsd+gSEdfWN69HB+cWqPqJ8Gq8M2zcmp3KSrg4IrXTVrnYHmymH3tN5Vbe p3my9/rH/yDv1kgcRgOlgL7ykz6W2oCr7LrTrQ7fupOXrR7UrW0dSsEcFRbWwoBn 7dlfdEI4Q7ay9GPN2f7QOiaGGE+Bi76iCXTjRTFzcEQHiwO6W1bLoSu8qtncYvke 2PaDrE4V/2CWjOuE37mw3IPsjUEOJNKC2+H3y+J9ma94CX4kAuZzH2LS4oHO8ww1 eyF1HSHOKh1tuY9RhNnsyh+1V2Iao6T0BjUMnG/c01xFeEz+lE+e1JRdTSxa5BOr zKSSuEei7z6m9t7Yn2DSU8YaKn8ZbF8JpfC5nGFZTMBh64EOEaWGhQUr7ttuFQxi 7JF6WVarhn5FHw== =p6ry -----END PGP SIGNATURE----- Merge tag 'core-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull rcu fixlet from Thomas Gleixner: "A single fix for a printk format warning in RCU" * tag 'core-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcuperf: Fix printk format warning	2020-07-05 12:21:28 -07:00
Paul E. McKenney	13625c0a40	Merge branches 'doc.2020.06.29a', 'fixes.2020.06.29a', 'kfree_rcu.2020.06.29a', 'rcu-tasks.2020.06.29a', 'scale.2020.06.29a', 'srcu.2020.06.29a' and 'torture.2020.06.29a' into HEAD doc.2020.06.29a: Documentation updates. fixes.2020.06.29a: Miscellaneous fixes. kfree_rcu.2020.06.29a: kfree_rcu() updates. rcu-tasks.2020.06.29a: RCU Tasks updates. scale.2020.06.29a: Read-side scalability tests. srcu.2020.06.29a: SRCU updates. torture.2020.06.29a: Torture-test updates.	2020-06-29 12:03:15 -07:00
Paul E. McKenney	7752275118	rcutorture: Check for unwatched readers RCU is supposed to be watching all non-idle kernel code and also all softirq handlers. This commit adds some teeth to this statement by adding a WARN_ON_ONCE(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:01:44 -07:00
Jules Irenge	8f43d5911b	rcu/rcutorture: Replace 0 with false Coccinelle reports a warning WARNING: Assignment of 0/1 to bool variable The root cause is that the variable lastphase is a bool, but is initialised with integer 0. This commit therefore replaces the 0 with a false. Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:01:44 -07:00
Paul E. McKenney	cae7cc6ba5	rcutorture: NULL rcu_torture_current earlier in cleanup code Currently, the rcu_torture_current variable remains non-NULL until after all readers have stopped. During this time, rcu_torture_stats_print() will think that the test is still ongoing, which can result in confusing dmesg output. This commit therefore NULLs rcu_torture_current immediately after the rcu_torture_writer() kthread has decided to stop, thus informing rcu_torture_stats_print() much sooner. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:01:44 -07:00
Paul E. McKenney	4a5f133c15	rcutorture: Add races with task-exit processing Several variants of Linux-kernel RCU interact with task-exit processing, including preemptible RCU, Tasks RCU, and Tasks Trace RCU. This commit therefore adds testing of this interaction to rcutorture by adding rcutorture.read_exit_burst and rcutorture.read_exit_delay kernel-boot parameters. These kernel parameters control the frequency and spacing of special read-then-exit kthreads that are spawned. [ paulmck: Apply feedback from Dan Carpenter's static checker. ] [ paulmck: Reduce latency to avoid false-positive shutdown hangs. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:01:44 -07:00
Sebastian Andrzej Siewior	bde50d8ff8	srcu: Avoid local_irq_save() before acquiring spinlock_t SRCU disables interrupts to get a stable per-CPU pointer and then acquires the spinlock which is in the per-CPU data structure. The release uses spin_unlock_irqrestore(). While this is correct on a non-RT kernel, this conflicts with the RT semantics because the spinlock is converted to a 'sleeping' spinlock. Sleeping locks can obviously not be acquired with interrupts disabled. Acquire the per-CPU pointer `ssp->sda' without disabling preemption and then acquire the spinlock_t of the per-CPU data structure. The lock will ensure that the data is consistent. The added call to check_init_srcu_struct() is now needed because a statically defined srcu_struct may remain uninitialized until this point and the newly introduced locking operation requires an initialized spinlock_t. This change was tested for four hours with 8SRCU-N and 8SRCU-P without causing any warnings. Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: rcu@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:01:22 -07:00
Ethon Paul	7fef6cff8f	srcu: Fix a typo in comment "amoritized"->"amortized" This commit fixes a typo in a comment. Signed-off-by: Ethon Paul <ethp@qq.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:01:22 -07:00
Paul E. McKenney	1fbeb3a8c4	refperf: Rename refperf.c to refscale.c and change internal names This commit further avoids conflation of refperf with the kernel's perf feature by renaming kernel/rcu/refperf.c to kernel/rcu/refscale.c, and also by similarly renaming the functions and variables inside this file. This has the side effect of changing the names of the kernel boot parameters, so kernel-parameters.txt and ver_functions.sh are also updated. The rcutorture --torture type remains refperf, and this will be addressed in a separate commit. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:46 -07:00
Paul E. McKenney	8e4ec3d02b	refperf: Rename RCU_REF_PERF_TEST to RCU_REF_SCALE_TEST The old Kconfig option name is all too easy to conflate with the unrelated "perf" feature, so this commit renames RCU_REF_PERF_TEST to RCU_REF_SCALE_TEST. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:46 -07:00
Paul E. McKenney	c7dcf8106f	rcu-tasks: Fix synchronize_rcu_tasks_trace() header comment The synchronize_rcu_tasks_trace() header comment incorrectly claims that any number of things delimit RCU Tasks Trace read-side critical sections, when in fact only rcu_read_lock_trace() and rcu_read_unlock_trace() do so. This commit therefore fixes this comment, and, while in the area, fixes a typo in the rcu_read_lock_trace() header comment. Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:46 -07:00
Paul E. McKenney	e13ef442fe	refperf: Add test for RCU Tasks readers This commit adds testing for RCU Tasks readers to the refperf module. This also applies to RCU Rude readers, as both flavors have empty (as in non-existent) read-side markers. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:46 -07:00
Paul E. McKenney	72bb749e70	refperf: Add test for RCU Tasks Trace readers. This commit adds testing for RCU Tasks Trace readers to the refperf module. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:45 -07:00
Paul E. McKenney	918b351d96	refperf: Change readdelay module parameter to nanoseconds The current units of microseconds are too coarse, so this commit changes the units to nanoseconds. However, ndelay is used only for the nanoseconds with udelay being used for whole microseconds. For example, setting refperf.readdelay=1500 results in a udelay(1) followed by an ndelay(500). Suggested-by: Akira Yokosawa <akiyks@gmail.com> [ paulmck: Abstracted delay per Akira feedback and move from 80 to 100 lines. ] [ paulmck: Fix names as suggested by kbuild test robot. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:45 -07:00
Arnd Bergmann	7c944d7c67	refperf: Work around 64-bit division A 64-bit division was introduced in refperf, breaking compilation on all 32-bit architectures: kernel/rcu/refperf.o: in function `main_func': refperf.c:(.text+0x57c): undefined reference to `__aeabi_uldivmod' Fix this by using div_u64 to mark the expensive operation. [ paulmck: Update primitive and format per Nathan Chancellor. ] Fixes: bd5b16d6c88d ("refperf: Allow decimal nanoseconds") Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Valdis Klētnieks <valdis.kletnieks@vt.edu> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:45 -07:00
Paul E. McKenney	4dd72a338a	refperf: Adjust refperf.loop default value With the various measurement optimizations, 10,000 loops normally suffices. This commit therefore reduces the refperf.loops default value from 10,000,000 to 10,000. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:45 -07:00
Paul E. McKenney	b4d1e34f65	refperf: Add read-side delay module parameter This commit adds a refperf.readdelay module parameter that controls the duration of each critical section. This parameter allows gathering data showing how the performance differences between the various primitives vary with critical-section length. Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:45 -07:00
Paul E. McKenney	96af866959	refperf: Simplify initialization-time wakeup protocol This commit moves the reader-launch wait loop from ref_perf_init() to main_func(), removing one layer of wakeup and allowing slightly faster system boot. Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:45 -07:00
Paul E. McKenney	6efb063408	refperf: Label experiment-number column "Runs" The experiment-number column is currently labeled "Threads", which is misleading at best. This commit therefore relabels it as "Runs", and adjusts the scripts accordingly. Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:45 -07:00
Paul E. McKenney	2db0bda384	refperf: Add warmup and cooldown processing phases This commit causes all the readers to start running unmeasured load until all readers have done at least one such run (thus having warmed up), then run the measured load, and then run unmeasured load until all readers have completed their measured load. This approach avoids any thread running measured load while other readers are idle. Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:45 -07:00
Paul E. McKenney	86e0da2bb8	refperf: More closely synchronize reader start times Currently, readers are awakened individually. On most systems, this results in significant wakeup delay from one reader to the next, which can result in the first and last reader having sole access to the synchronization primitive in question. If that synchronization primitive involves shared memory, those readers will rack up a huge number of operations in a very short time, causing large perturbations in the results. This commit therefore has the readers busy-wait after being awakened, and uses a new n_started variable to synchronize their start times. Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:45 -07:00
Paul E. McKenney	af2789db13	refperf: Convert reader_task structure's "start" field to int This commit converts the reader_task structure's "start" field to int in order to demote a full barrier to an smp_load_acquire() and also to simplify the code a bit. While in the area, and to enlist the compiler's help in ensuring that nothing was missed, the field's name was changed to start_reader. Also while in the area, change the main_func() store to use smp_store_release() to further fortify against wait/wake races. Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:45 -07:00
Paul E. McKenney	b864f89ff6	refperf: Tune reader measurement interval This commit moves a printk() out of the measurement interval, converts a atomic_dec()/atomic_read() pair to atomic_dec_and_test(), and adds a smp_mb__before_atomic() to avoid potential wake/wait hangs. These changes have the added benefit of reducing the number of loops required for amortizing loop overhead for CONFIG_PREEMPT=n RCU measurements from 1,000,000 to 10,000. This reduction in turn shortens the test, reducing the probability of interference. Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:45 -07:00
Paul E. McKenney	2990750bce	refperf: Make functions static Because the reset_readers() and process_durations() functions are used only within kernel/rcu/refperf.c, this commit makes them static. Reported-by: kbuild test robot <lkp@intel.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:45 -07:00
Paul E. McKenney	2e90de76f2	refperf: Dynamically allocate thread-summary output buffer Currently, the buffer used to accumulate the thread-summary output is fixed size, which will cause problems if someone decides to run on a large number of PCUs. This commit therefore dynamically allocates this buffer. [ paulmck: Fix memory allocation as suggested by KASAN. ] Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:44 -07:00
Paul E. McKenney	f518f154ec	refperf: Dynamically allocate experiment-summary output buffer Currently, the buffer used to accumulate the experiment-summary output is fixed size, which will cause problems if someone decides to run one hundred experiments. This commit therefore dynamically allocates this buffer. Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:44 -07:00
Paul E. McKenney	dbf28efdae	refperf: Provide module parameter to specify number of experiments The current code uses the number of threads both to limit the number of threads and to specify the number of experiments, but also varies the number of threads as the experiments progress. This commit takes a different approach by adding an refperf.nruns module parameter that specifies the number of experiments, and furthermore uses the same number of threads for each experiment. Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:44 -07:00
Paul E. McKenney	8fc28783a0	refperf: Convert nreaders to a module parameter This commit converts nreaders to a module parameter, with the default of -1 specifying the old behavior of using 75% of the readers. Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:44 -07:00
Paul E. McKenney	83b88c86da	refperf: Allow decimal nanoseconds The CONFIG_PREEMPT=n rcu_read_lock()/rcu_read_unlock() pair's overhead, even including loop overhead, is far less than one nanosecond. Since logscale plots are not all that happy with zero values, provide picoseconds as decimals. Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:44 -07:00
Paul E. McKenney	75dd8efef5	refperf: Hoist function-pointer calls out of the loop Current runs show PREEMPT=n rcu_read_lock()/rcu_read_unlock() pairs consuming between 20 and 30 nanoseconds, when in fact the actual value is zero, give or take the barrier() asm's effect on compiler optimizations. The additional overhead is caused by function calls through pointers (especially in these days of Spectre mitigations) and perhaps also needless argument passing, a non-const loop limit, and an upcounting loop. This commit therefore combines the ->readlock() and ->readunlock() function pointers into a single ->readsection() function pointer that takes the loop count as a const parameter and keeps any data passed from the read-lock to the read-unlock internal to this new function. These changes reduce the measured overhead of the aforementioned PREEMPT=n rcu_read_lock()/rcu_read_unlock() pairs from between 20 and 30 nanoseconds to somewhere south of 500 picoseconds. Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:44 -07:00
Paul E. McKenney	777a54c908	refperf: Add holdoff parameter to allow CPUs to come online This commit adds an rcuperf module parameter named "holdoff" that defaults to 10 seconds if refperf is built in and to zero otherwise. The assumption is that all the CPUs are online by the time that the modprobe and insmod commands are going to do anything, and that normal systems will have all the CPUs online within ten seconds. Larger systems may take many tens of seconds or even minutes to get to this point, hence this being a module parameter instead of being a hard-coded constant. Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:44 -07:00
Paul E. McKenney	708cda3165	rcuperf: Add comments explaining the high reader overhead This commit adds comments explaining why the readers have otherwise insane levels of measurement overhead, namely that they are intended as a test load for update-side performance measurements, not as a straight-up read-side performance test. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:44 -07:00
Joel Fernandes (Google)	653ed64b01	refperf: Add a test to measure performance of read-side synchronization Add a test for comparing the performance of RCU with various read-side synchronization mechanisms. The test has proved useful for collecting data and performing these comparisons. Currently RCU, SRCU, reader-writer lock, reader-writer semaphore and reference counting can be measured using refperf.perf_type parameter. Each invocation of the test runs measures performance of a specific mechanism. The maximum number of CPUs to concurrently run readers on is chosen by the test itself and is 75% of the total number of CPUs. So if you had 24 CPUs, the test runs with a maximum of 18 parallel readers. A number of experiments are conducted, and in each experiment, the number of readers is increased by 1, upto the 75% of CPUs mark. During each experiment, all readers execute an empty loop with refperf.loops iterations and time the total loop duration. This is then averaged. Example output: Parameters "refperf.perf_type=srcu refperf.loops=2000000" looks like: [ 3.347133] srcu-ref-perf: [ 3.347133] Threads Time(ns) [ 3.347133] 1 36 [ 3.347133] 2 34 [ 3.347133] 3 34 [ 3.347133] 4 34 [ 3.347133] 5 33 [ 3.347133] 6 33 [ 3.347133] 7 33 [ 3.347133] 8 33 [ 3.347133] 9 33 [ 3.347133] 10 33 [ 3.347133] 11 33 [ 3.347133] 12 33 [ 3.347133] 13 33 [ 3.347133] 14 33 [ 3.347133] 15 32 [ 3.347133] 16 33 [ 3.347133] 17 33 [ 3.347133] 18 34 Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:44 -07:00
Joel Fernandes (Google)	7e866460cc	rcuperf: Remove useless while loops around wait_event wait_event() already retries if the condition for the wake up is not satisifed after wake up. Remove them from the rcuperf test. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:44 -07:00
Paul E. McKenney	30d8aa5128	rcu-tasks: Fix code-style issues This commit declares trc_n_readers_need_end and trc_wait static and replaced a "&" with "&&". The "&" happened to work because the values are bool, but accidents waiting to happen and all that... Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:22 -07:00
Paul E. McKenney	8344496e8b	rcu-tasks: Conditionally compile show_rcu_tasks_gp_kthreads() The show_rcu_tasks_gp_kthreads() function is not invoked by Tiny RCU, but is nevertheless defined in Tiny RCU builds that enable Tasks Trace RCU. This commit therefore conditionally compiles this function so that it is defined only in builds that actually use it. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:22 -07:00
Paul E. McKenney	5b3cc99bed	rcu-tasks: Add #include of rcupdate_trace.h to update.c Although this is in some strict sense unnecessary, it is good to allow the compiler to compare the function declaration with its definition. This commit therefore adds a #include of linux/rcupdate_trace.h to kernel/rcu/update.c. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:22 -07:00
Paul E. McKenney	04a3c5aa7a	rcu-tasks: Make rcu_tasks_postscan() be static The rcu_tasks_postscan() function is not used outside of RCU's tasks.h file, so this commit makes it be static. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:22 -07:00
Paul E. McKenney	ea6eed9f7d	rcu-tasks: Convert sleeps to idle priority This commit converts the long-standing schedule_timeout_interruptible() and schedule_timeout_uninterruptible() calls used by the various Tasks RCU's grace-period kthreads to schedule_timeout_idle(). This conversion avoids polluting the load-average with Tasks-RCU-related sleeping. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 12:00:22 -07:00
Uladzislau Rezki (Sony)	3042f83f19	rcu: Support reclaim for head-less object Update the kvfree_call_rcu() function with head-less support. This allows RCU to reclaim objects without an embedded rcu_head. tree-RCU: We introduce two chains of arrays to store SLAB-backed and vmalloc pointers, each. Storage in either of these arrays does not require embedding an rcu_head within the object. Maintaining the arrays may become impossible due to high memory pressure. For such cases there is an emergency path. Objects with rcu_head inside are just queued on a backup rcu_head list. Later on that list is drained. As for the head-less variant, as the current context can sleep, the following emergency measures are applied: a) Synchronously wait until a grace period has elapsed. b) Call kvfree(). tiny-RCU: For double argument calls, there are no new changes in behavior. For single argument call, kvfree() is directly inlined on the current stack after a synchronize_rcu() call. Note that for tiny-RCU, any call to synchronize_rcu() is actually a quiescent state, therefore it does nothing. Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:59:26 -07:00
Uladzislau Rezki (Sony)	c408b215f5	rcu: Rename _kfree_callback/_kfree_rcu_offset/kfree_call_* The following changes are introduced: 1. Rename rcu_invoke_kfree_callback() to rcu_invoke_kvfree_callback(), as well as the associated trace events, so the rcu_kfree_callback(), becomes rcu_kvfree_callback(). The reason is to be aligned with kvfree() notation. 2. Rename __is_kfree_rcu_offset to __is_kvfree_rcu_offset. All RCU paths use kvfree() now instead of kfree(), thus rename it. 3. Rename kfree_call_rcu() to the kvfree_call_rcu(). The reason is, it is capable of freeing vmalloc() memory now. Do the same with __kfree_rcu() macro, it becomes __kvfree_rcu(), the goal is the same. Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:59:25 -07:00
Uladzislau Rezki (Sony)	64d1d06ccb	rcu/tiny: support vmalloc in tiny-RCU Replace kfree() with kvfree() in rcu_reclaim_tiny(). This makes it possible to release either SLAB or vmalloc objects after a GP. Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:59:25 -07:00
Uladzislau Rezki (Sony)	5f3c8d6204	rcu/tree: Maintain separate array for vmalloc ptrs To do so, we use an array of kvfree_rcu_bulk_data structures. It consists of two elements: - index number 0 corresponds to slab pointers. - index number 1 corresponds to vmalloc pointers. Keeping vmalloc pointers separated from slab pointers makes it possible to invoke the right freeing API for the right kind of pointer. It also prepares us for future headless support for vmalloc and SLAB objects. Such objects cannot be queued on a linked list and are instead directly into an array. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:59:25 -07:00
Uladzislau Rezki (Sony)	53c72b590b	rcu/tree: cache specified number of objects In order to reduce the dynamic need for pages in kfree_rcu(), pre-allocate a configurable number of pages per CPU and link them in a list. When kfree_rcu() reclaims objects, the object's container page is cached into a list instead of being released to the low-level page allocator. Such an approach provides O(1) access to free pages while also reducing the number of requests to the page allocator. It also makes the kfree_rcu() code to have free pages available during a low memory condition. A read-only sysfs parameter (rcu_min_cached_objs) reflects the minimum number of allowed cached pages per CPU. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:59:25 -07:00
Sebastian Andrzej Siewior	69f08d3999	rcu/tree: Use static initializer for krc.lock The per-CPU variable is initialized at runtime in kfree_rcu_batch_init(). This function is invoked before 'rcu_scheduler_active' is set to 'RCU_SCHEDULER_RUNNING'. After the initialisation, '->initialized' is to true. The raw_spin_lock is only acquired if '->initialized' is set to true. The worqueue item is only used if 'rcu_scheduler_active' set to RCU_SCHEDULER_RUNNING which happens after initialisation. Use a static initializer for krc.lock and remove the runtime initialisation of the lock. Since the lock can now be always acquired, remove the '->initialized' check. Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:59:25 -07:00
Uladzislau Rezki (Sony)	952371d6fc	rcu/tree: Move kfree_rcu_cpu locking/unlocking to separate functions Introduce helpers to lock and unlock per-cpu "kfree_rcu_cpu" structures. That will make kfree_call_rcu() more readable and prevent programming errors. Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:59:25 -07:00
Uladzislau Rezki (Sony)	3af8486281	rcu/tree: Simplify KFREE_BULK_MAX_ENTR macro We can simplify KFREE_BULK_MAX_ENTR macro and get rid of magic numbers which were used to make the structure to be exactly one page. Suggested-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:59:25 -07:00
Joel Fernandes (Google)	446044eb9c	rcu/tree: Make debug_objects logic independent of rcu_head kfree_rcu()'s debug_objects logic uses the address of the object's embedded rcu_head to queue/unqueue. Instead of this, make use of the object's address itself as preparation for future headless kfree_rcu() support. Reviewed-by: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:59:25 -07:00
Uladzislau Rezki (Sony)	594aa5975b	rcu/tree: Repeat the monitor if any free channel is busy It is possible that one of the channels cannot be detached because its free channel is busy and previously queued data has not been processed yet. On the other hand, another channel can be successfully detached causing the monitor work to stop. Prevent that by rescheduling the monitor work if there are any channels in the pending state after a detach attempt. Fixes: `34c8817455` ("rcu: Support kfree_bulk() interface in kfree_rcu()") Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:59:25 -07:00
Joel Fernandes (Google)	4d29194118	rcu/tree: Skip entry into the page allocator for PREEMPT_RT To keep the kfree_rcu() code working in purely atomic sections on RT, such as non-threaded IRQ handlers and raw spinlock sections, avoid calling into the page allocator which uses sleeping locks on RT. In fact, even if the caller is preemptible, the kfree_rcu() code is not, as the krcp->lock is a raw spinlock. Calling into the page allocator is optional and avoiding it should be Ok, especially with the page pre-allocation support in future patches. Such pre-allocation would further avoid the a need for a dynamically allocated page in the first place. Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Uladzislau Rezki <urezki@gmail.com> Co-developed-by: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:59:25 -07:00
Joel Fernandes (Google)	8ac88f7177	rcu/tree: Keep kfree_rcu() awake during lock contention On PREEMPT_RT kernels, the krcp spinlock gets converted to an rt-mutex and causes kfree_rcu() callers to sleep. This makes it unusable for callers in purely atomic sections such as non-threaded IRQ handlers and raw spinlock sections. Fix it by converting the spinlock to a raw spinlock. Vetting all code paths, there is no reason to believe that the raw spinlock will hurt RT latencies as it is not held for a long time. Cc: bigeasy@linutronix.de Cc: Uladzislau Rezki <urezki@gmail.com> Reviewed-by: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:59:25 -07:00
Mauro Carvalho Chehab	8e11690d2f	rcu: Fix a kernel-doc warnings for "count" There are some kernel-doc warnings: ./kernel/rcu/tree.c:2915: warning: Function parameter or member 'count' not described in 'kfree_rcu_cpu' This commit therefore moves the comment for "count" to the kernel-doc markup. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:59:24 -07:00
Randy Dunlap	c3cb47a6cc	kernel/rcu/tree.c: Fix kernel-doc warnings Fix kernel-doc warning: ../kernel/rcu/tree.c:959: warning: Excess function parameter 'irq' description in 'rcu_nmi_enter' Fixes: `cf7614e13c` ("rcu: Refactor rcu_{nmi,irq}_{enter,exit}()") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:58:51 -07:00
Wei Yang	7a0c2b0940	rcu: grpnum just records group number The ->grpnum field in the rcu_node structure contains the bit position in this structure's parent's bitmasks, which is not the CPU number. This commit therefore adjusts this field's comment accordingly. Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:58:51 -07:00
Wei Yang	a2dae43088	rcu: grplo/grphi just records CPU number The ->grplo and ->grphi fields store the lowest and highest CPU number covered by to a rcu_node structure, which is not the group number. This commit therefore adjusts these fields' comments to match reality. Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:58:51 -07:00
Wei Yang	00943a609d	rcu: gp_max is protected by root rcu_node's lock Because gp_max is protected by root rcu_node's lock, this commit moves the gp_max definition to the region of the rcu_node structure containing fields protected by this lock. Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:58:51 -07:00
Peter Enderborg	c6dfd72b7a	rcu: Stop shrinker loop The count and scan can be separated in time, and there is a fair chance that all work is already done when the scan starts, which might in turn result in a needless retry. This commit therefore avoids this retry by returning SHRINK_STOP. Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Peter Enderborg <peter.enderborg@sony.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:58:51 -07:00
Jules Irenge	e40bb92111	rcu: Replace 1 with true Coccinelle reports a warning WARNING: Assignment of 0/1 to bool variable The root cause is that the variable lastphase is a bool, but is initialised with integer 1. This commit therefore replaces the 1 with a true. Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:58:51 -07:00
Paul E. McKenney	04b25a495b	rcu: Mark rcu_nmi_enter() call to rcu_cleanup_after_idle() noinstr The objtool complains about the call to rcu_cleanup_after_idle() from rcu_nmi_enter(), so this commit adds instrumentation_begin() before that call and instrumentation_end() after it. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:58:50 -07:00
Paul E. McKenney	55fbe86ef3	rcu: Remove initialized but unused rnp from check_slow_task() This commit removes the variable rnp from check_slow_task(), which is defined, assigned to, but not otherwise used. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:58:50 -07:00
Lihao Liang	360fbbb489	rcu: Update comment from rsp->rcu_gp_seq to rsp->gp_seq Signed-off-by: Lihao Liang <lihaoliang@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:58:50 -07:00
Paul E. McKenney	68c2f27e01	rcu: Expedited grace-period sleeps to idle priority This commit converts the schedule_timeout_uninterruptible() call used by RCU's expedited grace-period processing to schedule_timeout_idle(). This conversion avoids polluting the load-average with RCU-related sleeping. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:58:50 -07:00
Paul E. McKenney	f5ca34643b	rcu: No-CBs-related sleeps to idle priority This commit converts the schedule_timeout_interruptible() call used by RCU's no-CBs grace-period kthreads to schedule_timeout_idle(). This conversion avoids polluting the load-average with RCU-related sleeping. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:58:50 -07:00
Paul E. McKenney	a9352f72d6	rcu: Priority-boost-related sleeps to idle priority This commit converts the long-standing schedule_timeout_interruptible() call used by RCU's priority-boosting kthreads to schedule_timeout_idle(). This conversion avoids polluting the load-average with RCU-related sleeping. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:58:50 -07:00
Paul E. McKenney	77865dea25	rcu: Grace-period-kthread related sleeps to idle priority This commit converts the long-standing schedule_timeout_interruptible() and schedule_timeout_uninterruptible() calls used by RCU's grace-period kthread to schedule_timeout_idle(). This conversion avoids polluting the load-average with RCU-related sleeping. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:58:49 -07:00
Paul E. McKenney	f8466f9468	rcu: Add comment documenting rcu_callback_map's purpose The rcu_callback_map lockdep_map structure was added back in 2013, but its purpose has become obscure. This commit therefore documments that the purpose of rcu_callback map is, in the words of commit `24ef659a85` ("rcu: Provide better diagnostics for blocking in RCU callback functions"), to help lockdep to tie an "inappropriate voluntary context switch back to the fact that the function is being invoked from within a callback." Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:58:49 -07:00
Paul E. McKenney	e816d56fad	rcu: Add callbacks-invoked counters This commit adds a count of the callbacks invoked to the per-CPU rcu_data structure. This count is printed by the show_rcu_gp_kthreads() that is invoked by rcutorture and the RCU CPU stall-warning code. It is also intended for use by drgn. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:58:49 -07:00
Wei Yang	abfce04148	rcu: Simplify the calculation of rcu_state.ncpus There is only 1 bit set in mask, which means that the only difference between oldmask and the new one will be at the position where the bit is set in mask. This commit therefore updates rcu_state.ncpus by checking whether the bit in mask is already set in rnp->expmaskinitnext. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:58:49 -07:00
Wei Yang	7ee880b7bf	rcu: Initialize and destroy rcu_synchronize only when necessary The __wait_rcu_gp() function unconditionally initializes and cleans up each element of rs_array[], whether used or not. This is slightly wasteful and rather confusing, so this commit skips both initialization and cleanup for duplicate callback functions. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:58:49 -07:00
Mauro Carvalho Chehab	f2286ab995	docs: RCU: Convert stallwarn.txt to ReST - Add a SPDX header; - Adjust document and section titles; - Fix list markups; - Some whitespace fixes and new line breaks; - Mark literal blocks as such; - Add it to RCU/index.rst. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:58:11 -07:00
Mauro Carvalho Chehab	43cb5451df	docs: RCU: Convert torture.txt to ReST - Add a SPDX header; - Adjust document and section titles; - Some whitespace fixes and new line breaks; - Mark literal blocks as such; - Add it to RCU/index.rst. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-29 11:58:11 -07:00
Peter Zijlstra	b58e733fd7	rcu: Fixup noinstr warnings A KCSAN build revealed we have explicit annoations through atomic_() usage, switch to arch_atomic_() for the respective functions. vmlinux.o: warning: objtool: rcu_nmi_exit()+0x4d: call to __kcsan_check_access() leaves .noinstr.text section vmlinux.o: warning: objtool: rcu_dynticks_eqs_enter()+0x25: call to __kcsan_check_access() leaves .noinstr.text section vmlinux.o: warning: objtool: rcu_nmi_enter()+0x4f: call to __kcsan_check_access() leaves .noinstr.text section vmlinux.o: warning: objtool: rcu_dynticks_eqs_exit()+0x2a: call to __kcsan_check_access() leaves .noinstr.text section vmlinux.o: warning: objtool: __rcu_is_watching()+0x25: call to __kcsan_check_access() leaves .noinstr.text section Additionally, without the NOP in instrumentation_begin(), objtool would not detect the lack of the 'else instrumentation_begin();' branch in rcu_nmi_enter(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-25 08:24:32 -07:00
Peter Zijlstra	8b700983de	sched: Remove sched_set_() return value Ingo suggested that since the new sched_set_() functions are implemented using the 'nocheck' variants, they really shouldn't ever fail, so remove the return value. Cc: axboe@kernel.dk Cc: daniel.lezcano@linaro.org Cc: sudeep.holla@arm.com Cc: airlied@redhat.com Cc: broonie@kernel.org Cc: paulmck@kernel.org Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org>	2020-06-15 14:10:26 +02:00
Peter Zijlstra	ce1c8afd3f	sched,rcutorture: Convert to sched_set_fifo_low() Because SCHED_FIFO is a broken scheduler model (see previous patches) take away the priority field, the kernel can't possibly make an informed decision. Effectively no change. Cc: paulmck@kernel.org Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-15 14:10:25 +02:00
Peter Zijlstra	b1433395c4	sched,rcuperf: Convert to sched_set_fifo_low() Because SCHED_FIFO is a broken scheduler model (see previous patches) take away the priority field, the kernel can't possibly make an informed decision. Effectively no change. Cc: paulmck@kernel.org Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-15 14:10:24 +02:00
Linus Torvalds	081096d98b	TTY/Serial driver updates for 5.8-rc1 Here is the tty and serial driver updates for 5.8-rc1 Nothing huge at all, just a lot of little serial driver fixes, updates for new devices and features, and other small things. Full details are in the shortlog. Note, you will get a conflict merging with your tree in the Documentation/devicetree/bindings/serial/rs485.yaml file, but it should be pretty obvious what to do. If not, I'm sure Rob will clean it all up afterwards :) All of these have been in linux-next with no issues for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXtzpCg8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ylRxACgjGtOKPjahONL4lWd0F8ZYEcyw7sAn34woBCO BDUV3kolrRQ4OYNJWsHP =TvqG -----END PGP SIGNATURE----- Merge tag 'tty-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver updates from Greg KH: "Here is the tty and serial driver updates for 5.8-rc1 Nothing huge at all, just a lot of little serial driver fixes, updates for new devices and features, and other small things. Full details are in the shortlog. All of these have been in linux-next with no issues for a while" * tag 'tty-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (67 commits) tty: serial: qcom_geni_serial: Add 51.2MHz frequency support tty: serial: imx: clear Ageing Timer Interrupt in handler serial: 8250_fintek: Add F81966 Support sc16is7xx: Add flag to activate IrDA mode dt-bindings: sc16is7xx: Add flag to activate IrDA mode serial: 8250: Support rs485 bus termination GPIO serial: 8520_port: Fix function param documentation dt-bindings: serial: Add binding for rs485 bus termination GPIO vt: keyboard: avoid signed integer overflow in k_ascii serial: 8250: Enable 16550A variants by default on non-x86 tty: hvc_console, fix crashes on parallel open/close serial: imx: Initialize lock for non-registered console sc16is7xx: Read the LSR register for basic device presence check sc16is7xx: Allow sharing the IRQ line sc16is7xx: Use threaded IRQ sc16is7xx: Always use falling edge IRQ tty: n_gsm: Fix bogus i++ in gsm_data_kick tty: n_gsm: Remove unnecessary test in gsm_print_packet() serial: stm32: add no_console_suspend support tty: serial: fsl_lpuart: Use __maybe_unused instead of #if CONFIG_PM_SLEEP ...	2020-06-07 09:52:36 -07:00
Ingo Molnar	5fdeefa053	Merge branch 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent Pull RCU fix from Paul E. McKenney. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2020-06-03 11:14:37 +02:00
Kefeng Wang	b3e2d20973	rcuperf: Fix printk format warning Using "%zu" to fix following warning, kernel/rcu/rcuperf.c: In function ‘kfree_perf_init’: include/linux/kern_levels.h:5:18: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘unsigned int’ [-Wformat=] Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-06-02 08:41:37 -07:00
Ingo Molnar	cb3cb6733f	Merge branch 'WIP.core/rcu' into core/rcu, to pick up two x86/entry dependencies Signed-off-by: Ingo Molnar <mingo@kernel.org>	2020-06-01 11:47:29 +02:00
Peter Zijlstra	806f04e9fd	rcu: Allow for smp_call_function() running callbacks from idle Current RCU hard relies on smp_call_function() callbacks running from interrupt context. A pending optimization is going to break that, it will allow idle CPUs to run the callbacks from the idle loop. This avoids raising the IPI on the requesting CPU and avoids handling an exception on the receiving CPU. Change rcu_is_cpu_rrupt_from_idle() to also accept task context, provided it is the idle task. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Link: https://lore.kernel.org/r/20200527171236.GC706495@hirez.programming.kicks-ass.net	2020-05-28 10:50:12 +02:00
Thomas Gleixner	07325d4a90	rcu: Provide rcu_irq_exit_check_preempt() Provide a debug check which can be invoked from exception return to kernel mode before an attempt is made to schedule. Warn if RCU is not ready for this. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20200521202117.089709607@linutronix.de	2020-05-26 19:05:11 +02:00
Paul E. McKenney	aaf2bc50df	rcu: Abstract out rcu_irq_enter_check_tick() from rcu_nmi_enter() There will likely be exception handlers that can sleep, which rules out the usual approach of invoking rcu_nmi_enter() on entry and also rcu_nmi_exit() on all exit paths. However, the alternative approach of just not calling anything can prevent RCU from coaxing quiescent states from nohz_full CPUs that are looping in the kernel: RCU must instead IPI them explicitly. It would be better to enable the scheduler tick on such CPUs to interact with RCU in a lighter-weight manner, and this enabling is one of the things that rcu_nmi_enter() currently does. What is needed is something that helps RCU coax quiescent states while not preventing subsequent sleeps. This commit therefore splits out the nohz_full scheduler-tick enabling from the rest of the rcu_nmi_enter() logic into a new function named rcu_irq_enter_check_tick(). [ tglx: Renamed the function and made it a nop when context tracking is off ] [ mingo: Fixed a CONFIG_NO_HZ_FULL assumption, harmonized and fixed all the comment blocks and cleaned up rcu_nmi_enter()/exit() definitions. ] Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200521202116.996113173@linutronix.de	2020-05-26 19:04:18 +02:00
Thomas Gleixner	b1fcf9b83c	rcu: Provide __rcu_is_watching() Same as rcu_is_watching() but without the preempt_disable/enable() pair inside the function. It is merked noinstr so it ends up in the non-instrumentable text section. This is useful for non-preemptible code especially in the low level entry section. Using rcu_is_watching() there results in a call to the preempt_schedule_notrace() thunk which triggers noinstr section warnings in objtool. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200512213810.518709291@linutronix.de	2020-05-19 15:51:21 +02:00
Thomas Gleixner	8ae0ae6737	rcu: Provide rcu_irq_exit_preempt() Interrupts and exceptions invoke rcu_irq_enter() on entry and need to invoke rcu_irq_exit() before they either return to the interrupted code or invoke the scheduler due to preemption. The general assumption is that RCU idle code has to have preemption disabled so that a return from interrupt cannot schedule. So the return from interrupt code invokes rcu_irq_exit() and preempt_schedule_irq(). If there is any imbalance in the rcu_irq/nmi* invocations or RCU idle code had preemption enabled then this goes unnoticed until the CPU goes idle or some other RCU check is executed. Provide rcu_irq_exit_preempt() which can be invoked from the interrupt/exception return code in case that preemption is enabled. It invokes rcu_irq_exit() and contains a few sanity checks in case that CONFIG_PROVE_RCU is enabled to catch such issues directly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200505134904.364456424@linutronix.de	2020-05-19 15:51:21 +02:00
Paul E. McKenney	9ea366f669	rcu: Make RCU IRQ enter/exit functions rely on in_nmi() The rcu_nmi_enter_common() and rcu_nmi_exit_common() functions take an "irq" parameter that indicates whether these functions have been invoked from an irq handler (irq==true) or an NMI handler (irq==false). However, recent changes have applied notrace to a few critical functions such that rcu_nmi_enter_common() and rcu_nmi_exit_common() many now rely on in_nmi(). Note that in_nmi() works no differently than before, but rather that tracing is now prohibited in code regions where in_nmi() would incorrectly report NMI state. Therefore remove the "irq" parameter and inline rcu_nmi_enter_common() and rcu_nmi_exit_common() into rcu_nmi_enter() and rcu_nmi_exit(), respectively. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Link: https://lkml.kernel.org/r/20200505134101.617130349@linutronix.de	2020-05-19 15:51:21 +02:00
Thomas Gleixner	ff5c4f5cad	rcu/tree: Mark the idle relevant functions noinstr These functions are invoked from context tracking and other places in the low level entry code. Move them into the .noinstr.text section to exclude them from instrumentation. Mark the places which are safe to invoke traceable functions with instrumentation_begin/end() so objtool won't complain. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lkml.kernel.org/r/20200505134100.575356107@linutronix.de	2020-05-19 15:51:20 +02:00
Emil Velikov	0ca650c430	rcu: constify sysrq_key_op With earlier commits, the API no longer discards the const-ness of the sysrq_key_op. As such we can add the notation. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jslaby@suse.com> Cc: linux-kernel@vger.kernel.org Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: rcu@vger.kernel.org Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Link: https://lore.kernel.org/r/20200513214351.2138580-11-emil.l.velikov@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-05-15 14:53:20 +02:00
Paul E. McKenney	f736e0f1a5	Merge branches 'fixes.2020.04.27a', 'kfree_rcu.2020.04.27a', 'rcu-tasks.2020.04.27a', 'stall.2020.04.27a' and 'torture.2020.05.07a' into HEAD fixes.2020.04.27a: Miscellaneous fixes. kfree_rcu.2020.04.27a: Changes related to kfree_rcu(). rcu-tasks.2020.04.27a: Addition of new RCU-tasks flavors. stall.2020.04.27a: RCU CPU stall-warning updates. torture.2020.05.07a: Torture-test updates.	2020-05-07 10:18:32 -07:00
Paul E. McKenney	3c80b40245	rcutorture: Convert ULONG_CMP_LT() to time_before() This commit converts three ULONG_CMP_LT() invocations in rcutorture to time_before() to reflect the fact that they are comparing timestamps to the jiffies counter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-05-07 10:15:29 -07:00
Jason Yan	afbc1574f1	rcutorture: Make rcu_fwds and rcu_fwd_emergency_stop static This commit fixes the following sparse warning: kernel/rcu/rcutorture.c:1695:16: warning: symbol 'rcu_fwds' was not declared. Should it be static? kernel/rcu/rcutorture.c:1696:6: warning: symbol 'rcu_fwd_emergency_stop' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-05-07 10:15:29 -07:00
Paul E. McKenney	55b2dcf587	rcu: Allow rcutorture to starve grace-period kthread This commit provides an rcutorture.stall_gp_kthread module parameter to allow rcutorture to starve the grace-period kthread. This allows testing the code that detects such starvation. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-05-07 10:15:28 -07:00
Paul E. McKenney	19a8ff956c	rcutorture: Add flag to produce non-busy-wait task stalls This commit aids testing of RCU task stall warning messages by adding an rcutorture.stall_cpu_block module parameter that results in the induced stall sleeping within the RCU read-side critical section. Spinning with interrupts disabled is still available via the rcutorture.stall_cpu_irqsoff module parameter, and specifying neither of these two module parameters will spin with preemption disabled. Note that sleeping (as opposed to preemption) results in additional complaints from RCU at context-switch time, so yet more testing. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-05-07 10:15:28 -07:00
Paul E. McKenney	c9527bebb0	rcutorture: Mark data-race potential for rcu_barrier() test statistics The n_barrier_successes, n_barrier_attempts, and n_rcu_torture_barrier_error variables are updated (without access markings) by the main rcu_barrier() test kthread, and accessed (also without access markings) by the rcu_torture_stats() kthread. This of course can result in KCSAN complaints. Because the accesses are in diagnostic prints, this commit uses data_race() to excuse the diagnostic prints from the data race. If this were to ever cause bogus statistics prints (for example, due to store tearing), any misleading information would be disambiguated by the presence or absence of an rcutorture splat. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely and due to the mild consequences of the failure, namely a confusing rcutorture console message. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>	2020-04-27 11:05:13 -07:00
Paul E. McKenney	3b2a473985	rcutorture: Add KCSAN stubs This commit adds stubs for KCSAN's data_race(), ASSERT_EXCLUSIVE_WRITER(), and ASSERT_EXCLUSIVE_ACCESS() macros to allow code using these macros to move ahead. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:05:13 -07:00
Paul E. McKenney	33b2b93bd8	rcu: Remove self-stack-trace when all quiescent states seen When all quiescent states have been seen, it is normally the grace-period kthread that is in trouble. Although the existing stack trace from the current CPU might possibly provide useful information, experience indicates that there is too much noise for this to be worthwhile. This commit therefore removes this stack trace from the output. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:04:37 -07:00
Paul E. McKenney	8837582517	rcu: When GP kthread is starved, tag idle threads as false positives If the grace-period kthread is starved, idle threads' extended quiescent states are not reported. These idle threads thus wrongly appear to be blocking the current grace period. This commit therefore tags such idle threads as probable false positives when the grace-period kthread is being starved. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:04:37 -07:00
Paul E. McKenney	654db05cee	rcu: Use data_race() for RCU expedited CPU stall-warning prints Although the accesses used to determine whether or not an expedited stall should be printed are an integral part of the concurrency algorithm governing use of the corresponding variables, the values that are simply printed are ancillary. As such, it is best to use data_race() for these accesses in order to provide the greatest latitude in the use of KCSAN for the other accesses that are an integral part of the algorithm. This commit therefore changes the relevant uses of READ_ONCE() to data_race(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:04:37 -07:00
Paul E. McKenney	25246fc831	rcu-tasks: Allow standalone use of TASKS_{TRACE_,}RCU This commit allows TASKS_TRACE_RCU to be used independently of TASKS_RCU and vice versa. [ paulmck: Fix conditional compilation per kbuild test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:53 -07:00
Paul E. McKenney	7e0669c3e9	rcu-tasks: Add IPI failure count to statistics This commit adds a failure-return count for smp_call_function_single(), and adds this to the console messages for rcutorture writer stalls and at the end of rcutorture testing. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:53 -07:00
Paul E. McKenney	edf3775f0a	rcu-tasks: Add count for idle tasks on offline CPUs This commit adds a counter for the number of times the quiescent state was an idle task associated with an offline CPU, and prints this count at the end of rcutorture runs and at stall time. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:53 -07:00
Paul E. McKenney	40471509be	rcu-tasks: Add rcu_dynticks_zero_in_eqs() effectiveness statistics This commit adds counts of the number of calls and number of successful calls to rcu_dynticks_zero_in_eqs(), which are printed at the end of rcutorture runs and at stall time. This allows evaluation of the effectiveness of rcu_dynticks_zero_in_eqs(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:52 -07:00
Paul E. McKenney	9796e1ae73	rcu-tasks: Make RCU tasks trace also wait for idle tasks This commit scans the CPUs, adding each CPU's idle task to the list of tasks that need quiescent states. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:52 -07:00
Paul E. McKenney	7e3b70e070	rcu-tasks: Handle the running-offline idle-task special case The idle task corresponding to an offline CPU can appear to be running while that CPU is offline. This commit therefore adds checks for this situation, treating it as a quiescent state. Because the tasklist scan and the holdout-list scan now exclude CPU-hotplug operations, readers on the CPU-hotplug paths are still waited for. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:52 -07:00
Paul E. McKenney	81b4a7bc3b	rcu-tasks: Disable CPU hotplug across RCU tasks trace scans This commit disables CPU hotplug across RCU tasks trace scans, which is a first step towards correctly recognizing idle tasks "running" on offline CPUs. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:52 -07:00
Paul E. McKenney	b38f57c1fe	rcu-tasks: Allow rcu_read_unlock_trace() under scheduler locks The rcu_read_unlock_trace() can invoke rcu_read_unlock_trace_special(), which in turn can call wake_up(). Therefore, if any scheduler lock is held across a call to rcu_read_unlock_trace(), self-deadlock can occur. This commit therefore uses the irq_work facility to defer the wake_up() to a clean environment where no scheduler locks will be held. Reported-by: Steven Rostedt <rostedt@goodmis.org> [ paulmck: Update #includes for m68k per kbuild test robot. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:52 -07:00
Paul E. McKenney	7d0c9c50c5	rcu-tasks: Avoid IPIing userspace/idle tasks if kernel is so built Systems running CPU-bound real-time task do not want IPIs sent to CPUs executing nohz_full userspace tasks. Battery-powered systems don't want IPIs sent to idle CPUs in low-power mode. Unfortunately, RCU tasks trace can and will send such IPIs in some cases. Both of these situations occur only when the target CPU is in RCU dyntick-idle mode, in other words, when RCU is not watching the target CPU. This suggests that CPUs in dyntick-idle mode should use memory barriers in outermost invocations of rcu_read_lock_trace() and rcu_read_unlock_trace(), which would allow the RCU tasks trace grace period to directly read out the target CPU's read-side state. One challenge is that RCU tasks trace is not targeting a specific CPU, but rather a task. And that task could switch from one CPU to another at any time. This commit therefore uses try_invoke_on_locked_down_task() and checks for task_curr() in trc_inspect_reader_notrunning(). When this condition holds, the target task is running and cannot move. If CONFIG_TASKS_TRACE_RCU_READ_MB=y, the new rcu_dynticks_zero_in_eqs() function can be used to check if the specified integer (in this case, t->trc_reader_nesting) is zero while the target CPU remains in that same dyntick-idle sojourn. If so, the target task is in a quiescent state. If not, trc_read_check_handler() must indicate failure so that the grace-period kthread can take appropriate action or retry after an appropriate delay, as the case may be. With this change, given CONFIG_TASKS_TRACE_RCU_READ_MB=y, if a given CPU remains idle or a given task continues executing in nohz_full mode, the RCU tasks trace grace-period kthread will detect this without the need to send an IPI. Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:52 -07:00
Paul E. McKenney	9ae58d7bd1	rcu-tasks: Add Kconfig option to mediate smp_mb() vs. IPI This commit provides a new TASKS_TRACE_RCU_READ_MB Kconfig option that enables use of read-side memory barriers by both rcu_read_lock_trace() and rcu_read_unlock_trace() when the are executed with the current->trc_reader_special.b.need_mb flag set. This flag is currently never set. Doing that is the subject of a later commit. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:52 -07:00
Paul E. McKenney	238dbce39e	rcu-tasks: Add grace-period and IPI counts to statistics This commit adds a grace-period count and a count of IPIs sent since boot, which is printed in response to rcutorture writer stalls and at the end of rcutorture testing. These counts will be used to evaluate various schemes to reduce the number of IPIs sent. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:52 -07:00
Paul E. McKenney	276c410448	rcu-tasks: Split ->trc_reader_need_end This commit splits ->trc_reader_need_end by using the rcu_special union. This change permits readers to check to see if a memory barrier is required without any added overhead in the common case where no such barrier is required. This commit also adds the read-side checking. Later commits will add the machinery to properly set the new ->trc_reader_special.b.need_mb field. This commit also makes rcu_read_unlock_trace_special() tolerate nested read-side critical sections within interrupt and NMI handlers. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:52 -07:00
Paul E. McKenney	b0afa0f056	rcu-tasks: Provide boot parameter to delay IPIs until late in grace period This commit provides a rcupdate.rcu_task_ipi_delay kernel boot parameter that specifies how old the RCU tasks trace grace period must be before the grace-period kthread starts sending IPIs. This delay allows more tasks to pass through rcu_tasks_qs() quiescent states, thus reducing (or even eliminating) the number of IPIs that must be sent. On a short rcutorture test setting this kernel boot parameter to HZ/2 resulted in zero IPIs for all 877 RCU-tasks trace grace periods that elapsed during that test. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:52 -07:00
Paul E. McKenney	88092d0c99	rcu-tasks: Add a grace-period start time for throttling and debug This commit adds a place to record the grace-period start in jiffies. This will be used by later commits for debugging purposes and to throttle IPIs early in the grace period. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:52 -07:00
Paul E. McKenney	43766c3ead	rcu-tasks: Make RCU Tasks Trace make use of RCU scheduler hooks This commit makes the calls to rcu_tasks_qs() detect and report quiescent states for RCU tasks trace. If the task is in a quiescent state and if ->trc_reader_checked is not yet set, the task sets its own ->trc_reader_checked. This will cause the grace-period kthread to remove it from the holdout list if it still remains there. [ paulmck: Fix conditional compilation per kbuild test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:52 -07:00
Paul E. McKenney	af051ca4e4	rcu-tasks: Make rcutorture writer stall output include GP state This commit adds grace-period state and time to the rcutorture writer stall output. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:52 -07:00
Paul E. McKenney	e21408ceec	rcu-tasks: Add RCU tasks to rcutorture writer stall output This commit adds state for each RCU-tasks flavor to the rcutorture writer stall output. The initial state is minimal, but you have to start somewhere. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [ paulmck: Fixes based on feedback from kbuild test robot. ]	2020-04-27 11:03:51 -07:00
Paul E. McKenney	8fd8ca388c	rcu-tasks: Move #ifdef into tasks.h This commit pushes the #ifdef CONFIG_TASKS_RCU_GENERIC from kernel/rcu/update.c to kernel/rcu/tasks.h in order to improve readability as more APIs are added. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:51 -07:00
Paul E. McKenney	4593e772b5	rcu-tasks: Add stall warnings for RCU Tasks Trace This commit adds RCU CPU stall warnings for RCU Tasks Trace. These dump out any tasks blocking the current grace period, as well as any CPUs that have not responded to an IPI request. This happens in two phases, when initially extracting state from the tasks and later when waiting for any holdout tasks to check in. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:51 -07:00
Paul E. McKenney	c1a76c0b6a	rcutorture: Add torture tests for RCU Tasks Trace This commit adds the definitions required to torture the tracing flavor of RCU tasks. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:51 -07:00
Paul E. McKenney	d5f177d35c	rcu-tasks: Add an RCU Tasks Trace to simplify protection of tracing hooks Because RCU does not watch exception early-entry/late-exit, idle-loop, or CPU-hotplug execution, protection of tracing and BPF operations is needlessly complicated. This commit therefore adds a variant of Tasks RCU that: o Has explicit read-side markers to allow finite grace periods in the face of in-kernel loops for PREEMPT=n builds. These markers are rcu_read_lock_trace() and rcu_read_unlock_trace(). o Protects code in the idle loop, exception entry/exit, and CPU-hotplug code paths. In this respect, RCU-tasks trace is similar to SRCU, but with lighter-weight readers. o Avoids expensive read-side instruction, having overhead similar to that of Preemptible RCU. There are of course downsides: o The grace-period code can send IPIs to CPUs, even when those CPUs are in the idle loop or in nohz_full userspace. This is mitigated by later commits. o It is necessary to scan the full tasklist, much as for Tasks RCU. o There is a single callback queue guarded by a single lock, again, much as for Tasks RCU. However, those early use cases that request multiple grace periods in quick succession are expected to do so from a single task, which makes the single lock almost irrelevant. If needed, multiple callback queues can be provided using any number of schemes. Perhaps most important, this variant of RCU does not affect the vanilla flavors, rcu_preempt and rcu_sched. The fact that RCU Tasks Trace readers can operate from idle, offline, and exception entry/exit in no way enables rcu_preempt and rcu_sched readers to do so. The memory ordering was outlined here: https://lore.kernel.org/lkml/20200319034030.GX3199@paulmck-ThinkPad-P72/ This effort benefited greatly from off-list discussions of BPF requirements with Alexei Starovoitov and Andrii Nakryiko. At least some of the on-list discussions are captured in the Link: tags below. In addition, KCSAN was quite helpful in finding some early bugs. Link: https://lore.kernel.org/lkml/20200219150744.428764577@infradead.org/ Link: https://lore.kernel.org/lkml/87mu8p797b.fsf@nanos.tec.linutronix.de/ Link: https://lore.kernel.org/lkml/20200225221305.605144982@linutronix.de/ Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andriin@fb.com> [ paulmck: Apply feedback from Steve Rostedt and Joel Fernandes. ] [ paulmck: Decrement trc_n_readers_need_end upon IPI failure. ] [ paulmck: Fix locking issue reported by rcutorture. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:51 -07:00
Paul E. McKenney	d01aa2633b	rcu-tasks: Code movement to allow more Tasks RCU variants This commit does nothing but move rcu_tasks_wait_gp() up to a new section for common code. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:51 -07:00
Paul E. McKenney	e4fe5dd6f2	rcu-tasks: Further refactor RCU-tasks to allow adding more variants This commit refactors RCU tasks to allow variants to be added. These variants will share the current Tasks-RCU tasklist scan and the holdout list processing. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:51 -07:00
Paul E. McKenney	c97d12a63c	rcu-tasks: Use unique names for RCU-Tasks kthreads and messages This commit causes the flavors of RCU Tasks to use different names for their kthreads and in their console messages. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:51 -07:00
Paul E. McKenney	3d6e43c75d	rcutorture: Add torture tests for RCU Tasks Rude This commit adds the definitions required to torture the rude flavor of RCU tasks. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:51 -07:00
Paul E. McKenney	c84aad7654	rcu-tasks: Add an RCU-tasks rude variant This commit adds a "rude" variant of RCU-tasks that has as quiescent states schedule(), cond_resched_tasks_rcu_qs(), userspace execution, and (in theory, anyway) cond_resched(). In other words, RCU-tasks rude readers are regions of code with preemption disabled, but excluding code early in the CPU-online sequence and late in the CPU-offline sequence. Updates make use of IPIs and force an IPI and a context switch on each online CPU. This variant is useful in some situations in tracing. Suggested-by: Steven Rostedt <rostedt@goodmis.org> [ paulmck: Apply EXPORT_SYMBOL_GPL() feedback from Qiujun Huang. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [ paulmck: Apply review feedback from Steve Rostedt. ]	2020-04-27 11:03:51 -07:00
Paul E. McKenney	5873b8a94e	rcu-tasks: Refactor RCU-tasks to allow variants to be added This commit splits out generic processing from RCU-tasks-specific processing in order to allow additional flavors to be added. It also adds a def_bool TASKS_RCU_GENERIC to enable the common RCU-tasks infrastructure code. This is primarily, but not entirely, a code-movement commit. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:51 -07:00
Paul E. McKenney	9cf8fc6fab	rcutorture: Add a test for synchronize_rcu_mult() This commit adds a crude test for synchronize_rcu_mult(). This is currently a smoke test rather than a high-quality stress test. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:51 -07:00
Paul E. McKenney	07e105158d	rcu-tasks: Create struct to hold state information This commit creates an rcu_tasks struct to hold state information for RCU Tasks. This is a preparation commit for adding additional flavors of Tasks RCU, each of which would have its own rcu_tasks struct. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:50 -07:00
Paul E. McKenney	eacd6f04a1	rcu-tasks: Move Tasks RCU to its own file This code-movement-only commit is in preparation for adding an additional flavor of Tasks RCU, which relies on workqueues to detect grace periods. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:50 -07:00
Paul E. McKenney	5bef8da66a	rcu: Add per-task state to RCU CPU stall warnings Currently, an RCU-preempt CPU stall warning simply lists the PIDs of those tasks holding up the current grace period. This can be helpful, but more can be even more helpful. To this end, this commit adds the nesting level, whether the task thinks it was preempted in its current RCU read-side critical section, whether RCU core has asked this task for a quiescent state, whether the expedited-grace-period hint is set, and whether the task believes that it is on the blocked-tasks list (it must be, or it would not be printed, but if things are broken, best not to take too much for granted). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:50 -07:00
Paul E. McKenney	66777e5821	rcu-tasks: Use context-switch hook for PREEMPT=y kernels Currently, the PREEMPT=y version of rcu_note_context_switch() does not invoke rcu_tasks_qs(), and we need it to in order to keep RCU Tasks Trace's IPIs down to a dull roar. This commit therefore enables this hook. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:50 -07:00
Paul E. McKenney	ac3caf8274	rcu: Add comments marking transitions between RCU watching and not It is not as clear as it might be just where in RCU's idle entry/exit code RCU stops and starts watching the current CPU. This commit therefore adds comments calling out the transitions. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:50 -07:00
Paul E. McKenney	52b1fc3f79	rcutorture: Add test of holding scheduler locks across rcu_read_unlock() Now that it should be safe to hold scheduler locks across rcu_read_unlock(), even in cases where the corresponding RCU read-side critical section might have been preempted and boosted, the commit adds a test of this capability to rcutorture. This has been tested on current mainline (which can deadlock in this situation), and lockdep duly reported the expected deadlock. On -rcu, lockdep is silent, thus far, anyway. Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:50 -07:00
Lai Jiangshan	5f5fa7ea89	rcu: Don't use negative nesting depth in __rcu_read_unlock() Now that RCU flavors have been consolidated, an RCU-preempt rcu_read_unlock() in an interrupt or softirq handler cannot possibly end the RCU read-side critical section. Consider the old vulnerability involving rcu_read_unlock() being invoked within such a handler that interrupted an __rcu_read_unlock_special(), in which a wakeup might be invoked with a scheduler lock held. Because rcu_read_unlock_special() no longer does wakeups in such situations, it is no longer necessary for __rcu_read_unlock() to set the nesting level negative. This commit therefore removes this recursion-protection code from __rcu_read_unlock(). [ paulmck: Let rcu_exp_handler() continue to call rcu_report_exp_rdp(). ] [ paulmck: Adjust other checks given no more negative nesting. ] Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:50 -07:00
Lai Jiangshan	f0bdf6d473	rcu: Remove unused ->rcu_read_unlock_special.b.deferred_qs field The ->rcu_read_unlock_special.b.deferred_qs field is set to true in rcu_read_unlock_special() but never set to false. This is not particularly useful, so this commit removes this field. The only possible justification for this field is to ease debugging of RCU deferred quiscent states, but the combination of the other ->rcu_read_unlock_special fields plus ->rcu_blocked_node and of course ->rcu_read_lock_nesting should cover debugging needs. And if this last proves incorrect, this patch can always be reverted, along with the required setting of ->rcu_read_unlock_special.b.deferred_qs to false in rcu_preempt_deferred_qs_irqrestore(). Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:50 -07:00
Lai Jiangshan	07b4a930fc	rcu: Don't set nesting depth negative in rcu_preempt_deferred_qs() Now that RCU flavors have been consolidated, an RCU-preempt rcu_read_unlock() in an interrupt or softirq handler cannot possibly end the RCU read-side critical section. Consider the old vulnerability involving rcu_preempt_deferred_qs() being invoked within such a handler that interrupted an extended RCU read-side critical section, in which a wakeup might be invoked with a scheduler lock held. Because rcu_read_unlock_special() no longer does wakeups in such situations, it is no longer necessary for rcu_preempt_deferred_qs() to set the nesting level negative. This commit therefore removes this recursion-protection code from rcu_preempt_deferred_qs(). [ paulmck: Fix typo in commit log per Steve Rostedt. ] Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:50 -07:00
Paul E. McKenney	e4453d8a1c	rcu: Make rcu_read_unlock_special() safe for rq/pi locks The scheduler is currently required to hold rq/pi locks across the entire RCU read-side critical section or not at all. This is inconvenient and leaves traps for the unwary, including the author of this commit. But now that excessively long grace periods enable scheduling-clock interrupts for holdout nohz_full CPUs, the nohz_full rescue logic in rcu_read_unlock_special() can be dispensed with. In other words, the rcu_read_unlock_special() function can refrain from doing wakeups unless such wakeups are guaranteed safe. This commit therefore avoids unsafe wakeups, freeing the scheduler to hold rq/pi locks across rcu_read_unlock() even if the corresponding RCU read-side critical section might have been preempted. This commit also updates RCU's requirements documentation. This commit is inspired by a patch from Lai Jiangshan: https://lore.kernel.org/lkml/20191102124559.1135-2-laijs@linux.alibaba.com This commit is further intended to be a step towards his goal of permitting the inlining of RCU-preempt's rcu_read_lock() and rcu_read_unlock(). Cc: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:50 -07:00
Paul E. McKenney	c76e7e0bce	rcu: Add KCSAN stubs to update.c This commit adds stubs for KCSAN's data_race(), ASSERT_EXCLUSIVE_WRITER(), and ASSERT_EXCLUSIVE_ACCESS() macros to allow code using these macros to move ahead. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:03:50 -07:00
Paul E. McKenney	6be7436d22	rcu: Add rcu_gp_might_be_stalled() This commit adds rcu_gp_might_be_stalled(), which returns true if there is some reason to believe that the RCU grace period is stalled. The use case is where an RCU free-memory path needs to allocate memory in order to free it, a situation that should be avoided where possible. But where it is necessary, there is always the alternative of using synchronize_rcu() to wait for a grace period in order to avoid the allocation. And if the grace period is stalled, allocating memory to asynchronously wait for it is a bad idea of epic proportions: Far better to let others use the memory, because these others might actually be able to free that memory before the grace period ends. Thus, rcu_gp_might_be_stalled() can be used to help decide whether allocating memory on an RCU free path is a semi-reasonable course of action. Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:02:50 -07:00
Joel Fernandes (Google)	a6a82ce18b	rcu/tree: Count number of batched kfree_rcu() locklessly We can relax the correctness of counting of number of queued objects in favor of not hurting performance, by locklessly sampling per-cpu counters. This should be Ok since under high memory pressure, it should not matter if we are off by a few objects while counting. The shrinker will still do the reclaim. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> [ paulmck: Remove unused "flags" variable. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:02:50 -07:00
Joel Fernandes (Google)	9154244c1a	rcu/tree: Add a shrinker to prevent OOM due to kfree_rcu() batching To reduce grace periods and improve kfree() performance, we have done batching recently dramatically bringing down the number of grace periods while giving us the ability to use kfree_bulk() for efficient kfree'ing. However, this has increased the likelihood of OOM condition under heavy kfree_rcu() flood on small memory systems. This patch introduces a shrinker which starts grace periods right away if the system is under memory pressure due to existence of objects that have still not started a grace period. With this patch, I do not observe an OOM anymore on a system with 512MB RAM and 8 CPUs, with the following rcuperf options: rcuperf.kfree_loops=20000 rcuperf.kfree_alloc_num=8000 rcuperf.kfree_rcu_test=1 rcuperf.kfree_mult=2 Otherwise it easily OOMs with the above parameters. NOTE: 1. On systems with no memory pressure, the patch has no effect as intended. 2. In the future, we can use this same mechanism to prevent grace periods from happening even more, by relying on shrinkers carefully. Cc: urezki@gmail.com Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:02:50 -07:00
Joel Fernandes (Google)	f87dc80800	rcuperf: Add ability to increase object allocation size This allows us to increase memory pressure dynamically using a new rcuperf boot command line parameter called 'rcumult'. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:02:50 -07:00
Paul E. McKenney	e2f3ccfa62	rcu: Convert rcu_nohz_full_cpu() ULONG_CMP_LT() to time_before() This commit converts the ULONG_CMP_LT() in rcu_nohz_full_cpu() to time_before() to reflect the fact that it is comparing a timestamp to the jiffies counter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:01:17 -07:00
Paul E. McKenney	7b2413111a	rcu: Convert rcu_initiate_boost() ULONG_CMP_GE() to time_after() This commit converts the ULONG_CMP_GE() in rcu_initiate_boost() to time_after() to reflect the fact that it is comparing a timestamp to the jiffies counter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:01:16 -07:00
Paul E. McKenney	29ffebc5fc	rcu: Convert ULONG_CMP_GE() to time_after() for jiffy comparison This commit converts the ULONG_CMP_GE() in rcu_gp_fqs_loop() to time_after() to reflect the fact that it is comparing a timestamp to the jiffies counter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:01:16 -07:00
Jules Irenge	da44cd6c8e	rcu: Replace 1 by true Coccinelle reports a warning at use_softirq declaration WARNING: Assignment of 0/1 to bool variable The root cause is use_softirq a variable of bool type is initialised with the integer 1 Replacing 1 with value true solve the issue. Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:01:16 -07:00
Jules Irenge	a66dbda789	rcu: Replace assigned pointer ret value by corresponding boolean value Coccinelle reports warnings at rcu_read_lock_held_common() WARNING: Assignment of 0/1 to bool variable To fix this, the assigned pointer ret values are replaced by corresponding boolean value. Given that ret is a pointer of bool type Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:01:16 -07:00
Paul E. McKenney	62ae19511f	rcu: Mark rcu_state.gp_seq to detect more concurrent writes The rcu_state structure's gp_seq field is only to be modified by the RCU grace-period kthread, which is single-threaded. This commit therefore enlists KCSAN's help in enforcing this restriction. This commit applies KCSAN-specific primitives, so cannot go upstream until KCSAN does. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:01:16 -07:00
Mauro Carvalho Chehab	c28d5c09d0	rcu: Get rid of some doc warnings in update.c This commit escapes *ret, because otherwise the documentation system thinks that this is an incomplete emphasis block: ./kernel/rcu/update.c:65: WARNING: Inline emphasis start-string without end-string. ./kernel/rcu/update.c:65: WARNING: Inline emphasis start-string without end-string. ./kernel/rcu/update.c:70: WARNING: Inline emphasis start-string without end-string. ./kernel/rcu/update.c:82: WARNING: Inline emphasis start-string without end-string. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:01:16 -07:00
Zhaolong Zhang	fcbcc0e700	rcu: Fix the (t=0 jiffies) false positive It is possible that an over-long grace period will end while the RCU CPU stall warning message is printing. In this case, the estimate of the offending grace period's duration can be erroneous due to refetching of rcu_state.gp_start, which will now be the time of the newly started grace period. Computation of this duration clearly needs to use the start time for the old over-long grace period, not the fresh new one. This commit avoids such errors by causing both print_other_cpu_stall() and print_cpu_stall() to reuse the value previously fetched by their caller. Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:01:16 -07:00
Paul E. McKenney	1fca4d12f4	rcu: Expedite first two FQS scans under callback-overload conditions Even if some CPUs have excessive numbers of callbacks, RCU's grace-period kthread will still wait normally between successive force-quiescent-state scans. The first two are the most important, as they are the ones that enlist aid from the scheduler when overloaded. This commit therefore omits the wait before the first and the second force-quiescent-state scan under callback-overload conditions. This approach was inspired by a discussion with Jeff Roberson. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-04-27 11:01:16 -07:00

... 2 3 4 5 6 ...

1766 Commits