linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-20 18:54:09 +08:00

Author	SHA1	Message	Date
Ingo Molnar	40e7d7bdc1	Merge branch 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent Pull RCU fix from Paul E. McKenney. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2020-04-14 08:36:41 +02:00
Paul E. McKenney	bf37da98c5	rcu: Don't acquire lock in NMI handler in rcu_nmi_enter_common() The rcu_nmi_enter_common() function can be invoked both in interrupt and NMI handlers. If it is invoked from process context (as opposed to userspace or idle context) on a nohz_full CPU, it might acquire the CPU's leaf rcu_node structure's ->lock. Because this lock is held only with interrupts disabled, this is safe from an interrupt handler, but doing so from an NMI handler can result in self-deadlock. This commit therefore adds "irq" to the "if" condition so as to only acquire the ->lock from irq handlers or process context, never from an NMI handler. Fixes: `5b14557b07` ("rcu: Avoid tick_dep_set_cpu() misordering") Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: <stable@vger.kernel.org> # 5.5.x	2020-04-05 14:22:15 -07:00
Linus Torvalds	4b9fd8a829	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle were: - Continued user-access cleanups in the futex code. - percpu-rwsem rewrite that uses its own waitqueue and atomic_t instead of an embedded rwsem. This addresses a couple of weaknesses, but the primary motivation was complications on the -rt kernel. - Introduce raw lock nesting detection on lockdep (CONFIG_PROVE_RAW_LOCK_NESTING=y), document the raw_lock vs. normal lock differences. This too originates from -rt. - Reuse lockdep zapped chain_hlocks entries, to conserve RAM footprint on distro-ish kernels running into the "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!" depletion of the lockdep chain-entries pool. - Misc cleanups, smaller fixes and enhancements - see the changelog for details" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits) fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t Documentation/locking/locktypes: Minor copy editor fixes Documentation/locking/locktypes: Further clarifications and wordsmithing m68knommu: Remove mm.h include from uaccess_no.h x86: get rid of user_atomic_cmpxchg_inatomic() generic arch_futex_atomic_op_inuser() doesn't need access_ok() x86: don't reload after cmpxchg in unsafe_atomic_op2() loop x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end() objtool: whitelist __sanitizer_cov_trace_switch() [parisc, s390, sparc64] no need for access_ok() in futex handling sh: no need of access_ok() in arch_futex_atomic_op_inuser() futex: arch_futex_atomic_op_inuser() calling conventions change completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all() lockdep: Add posixtimer context tracing bits lockdep: Annotate irq_work lockdep: Add hrtimer context tracing bits lockdep: Introduce wait-type checks completion: Use simple wait queues sched/swait: Prepare usage in completions ...	2020-03-30 16:17:15 -07:00
Paul E. McKenney	aa93ec620b	Merge branches 'doc.2020.02.27a', 'fixes.2020.03.21a', 'kfree_rcu.2020.02.20a', 'locktorture.2020.02.20a', 'ovld.2020.02.20a', 'rcu-tasks.2020.02.20a', 'srcu.2020.02.20a' and 'torture.2020.02.20a' into HEAD doc.2020.02.27a: Documentation updates. fixes.2020.03.21a: Miscellaneous fixes. kfree_rcu.2020.02.20a: Updates to kfree_rcu(). locktorture.2020.02.20a: Lock torture-test updates. ovld.2020.02.20a: Updates to callback-overload handling. rcu-tasks.2020.02.20a: RCU-tasks updates. srcu.2020.02.20a: SRCU updates. torture.2020.02.20a: Torture-test updates.	2020-03-21 17:15:11 -07:00
Paul E. McKenney	127e29815b	rcu: Make rcu_barrier() account for offline no-CBs CPUs Currently, rcu_barrier() ignores offline CPUs, However, it is possible for an offline no-CBs CPU to have callbacks queued, and rcu_barrier() must wait for those callbacks. This commit therefore makes rcu_barrier() directly invoke the rcu_barrier_func() with interrupts disabled for such CPUs. This requires passing the CPU number into this function so that it can entrain the rcu_barrier() callback onto the correct CPU's callback list, given that the code must instead execute on the current CPU. While in the area, this commit fixes a bug where the first CPU's callback might have been invoked before rcu_segcblist_entrain() returned, which would also result in an early wakeup. Fixes: `5d6742b377` ("rcu/nocb: Use rcu_segcblist for no-CBs CPUs") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [ paulmck: Apply optimization feedback from Boqun Feng. ] Cc: <stable@vger.kernel.org> # 5.5.x	2020-03-21 16:14:25 -07:00
Paul E. McKenney	0f11ad323d	rcu: Mark rcu_state.gp_seq to detect concurrent writes The rcu_state structure's gp_seq field is only to be modified by the RCU grace-period kthread, which is single-threaded. This commit therefore enlists KCSAN's help in enforcing this restriction. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-03-21 16:13:39 -07:00
Sebastian Andrzej Siewior	49915ac35c	lockdep: Annotate irq_work Mark irq_work items with IRQ_WORK_HARD_IRQ which should be invoked in hardirq context even on PREEMPT_RT. IRQ_WORK without this flag will be invoked in softirq context on PREEMPT_RT. Set ->irq_config to 1 for the IRQ_WORK items which are invoked in softirq context so lockdep knows that these can safely acquire a spinlock_t. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200321113242.643576700@linutronix.de	2020-03-21 16:00:24 +01:00
Peter Zijlstra	de8f5e4f2d	lockdep: Introduce wait-type checks Extend lockdep to validate lock wait-type context. The current wait-types are: LD_WAIT_FREE, /* wait free, rcu etc.. / LD_WAIT_SPIN, / spin loops, raw_spinlock_t etc.. / LD_WAIT_CONFIG, / CONFIG_PREEMPT_LOCK, spinlock_t etc.. / LD_WAIT_SLEEP, / sleeping locks, mutex_t etc.. */ Where lockdep validates that the current lock (the one being acquired) fits in the current wait-context (as generated by the held stack). This ensures that there is no attempt to acquire mutexes while holding spinlocks, to acquire spinlocks while holding raw_spinlocks and so on. In other words, its a more fancy might_sleep(). Obviously RCU made the entire ordeal more complex than a simple single value test because RCU can be acquired in (pretty much) any context and while it presents a context to nested locks it is not the same as it got acquired in. Therefore its necessary to split the wait_type into two values, one representing the acquire (outer) and one representing the nested context (inner). For most 'normal' locks these two are the same. [ To make static initialization easier we have the rule that: .outer == INV means .outer == .inner; because INV == 0. ] It further means that its required to find the minimal .inner of the held stack to compare against the outer of the new lock; because while 'normal' RCU presents a CONFIG type to nested locks, if it is taken while already holding a SPIN type it obviously doesn't relax the rules. Below is an example output generated by the trivial test code: raw_spin_lock(&foo); spin_lock(&bar); spin_unlock(&bar); raw_spin_unlock(&foo); [ BUG: Invalid wait context ] ----------------------------- swapper/0/1 is trying to lock: ffffc90000013f20 (&bar){....}-{3:3}, at: kernel_init+0xdb/0x187 other info that might help us debug this: 1 lock held by swapper/0/1: #0: ffffc90000013ee0 (&foo){+.+.}-{2:2}, at: kernel_init+0xd1/0x187 The way to read it is to look at the new -{n,m} part in the lock description; -{3:3} for the attempted lock, and try and match that up to the held locks, which in this case is the one: -{2,2}. This tells that the acquiring lock requires a more relaxed environment than presented by the lock stack. Currently only the normal locks and RCU are converted, the rest of the lockdep users defaults to .inner = INV which is ignored. More conversions can be done when desired. The check for spinlock_t nesting is not enabled by default. It's a separate config option for now as there are known problems which are currently addressed. The config option allows to identify these problems and to verify that the solutions found are indeed solving them. The config switch will be removed and the checks will permanently enabled once the vast majority of issues has been addressed. [ bigeasy: Move LD_WAIT_FREE,… out of CONFIG_LOCKDEP to avoid compile failure with CONFIG_DEBUG_SPINLOCK + !CONFIG_LOCKDEP] [ tglx: Add the config option ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200321113242.427089655@linutronix.de	2020-03-21 16:00:24 +01:00
Paul E. McKenney	9470a18fab	rcutorture: Manually clean up after rcu_barrier() failure Currently, if rcu_barrier() returns too soon, the test waits 100ms and then does another instance of the test. However, if rcu_barrier() were to have waited for more than 100ms too short a time, this could cause the test's rcu_head structures to be reused while they were still on RCU's callback lists. This can result in knock-on errors that obscure the original rcu_barrier() test failure. This commit therefore adds code that attempts to wait until all of the test's callbacks have been invoked. Of course, if RCU completely lost track of the corresponding rcu_head structures, this wait could be forever. This commit therefore also complains if this attempted recovery takes more than one second, and it also gives up when the test ends. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 16:03:31 -08:00
Paul E. McKenney	50d4b62970	rcutorture: Make rcu_torture_barrier_cbs() post from corresponding CPU Currently, rcu_torture_barrier_cbs() posts callbacks from whatever CPU it is running on, which means that all these kthreads might well be posting from the same CPU, which would drastically reduce the effectiveness of this test. This commit therefore uses IPIs to make the callbacks be posted from the corresponding CPU (given by local variable myid). If the IPI fails (which can happen if the target CPU is offline or does not exist at all), the callback is posted on whatever CPU is currently running. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 16:03:31 -08:00
Joel Fernandes (Google)	12af660321	rcuperf: Measure memory footprint during kfree_rcu() test During changes to kfree_rcu() code, we often check the amount of free memory. As an alternative to checking this manually, this commit adds a measurement in the test itself. It measures four times during the test for available memory, digitally filters these measurements to produce a running average with a weight of 0.5, and compares this digitally filtered value with the amount of available memory at the beginning of the test. Something like the following is printed at the end of the run: Total time taken by all kfree'ers: 6369738407 ns, loops: 10000, batches: 764, memory footprint: 216MB Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 16:03:31 -08:00
Paul E. McKenney	5396d31d3a	rcutorture: Annotation lockless accesses to rcu_torture_current The rcutorture global variable rcu_torture_current is accessed locklessly, so it must use the RCU pointer load/store primitives. This commit therefore adds several that were missed. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely and due to this being used only by rcutorture. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 16:03:31 -08:00
Paul E. McKenney	f042a436c8	rcutorture: Add READ_ONCE() to rcu_torture_count and rcu_torture_batch The rcutorture rcu_torture_count and rcu_torture_batch per-CPU variables are read locklessly, so this commit adds the READ_ONCE() to a load in order to avoid various types of compiler vandalism^Woptimization. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely and due to this being rcutorture. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 16:03:31 -08:00
Paul E. McKenney	102c14d2f8	rcutorture: Fix stray access to rcu_fwd_cb_nodelay The rcu_fwd_cb_nodelay variable suppresses excessively long read-side delays while carrying out an rcutorture forward-progress test. As such, it is accessed both by readers and updaters, and most of the accesses therefore use *_ONCE(). Except for one in rcu_read_delay(), which this commit fixes. This data race was reported by KCSAN. Not appropriate for backporting due to this being rcutorture. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 16:03:31 -08:00
Paul E. McKenney	202489101f	rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race The ->rtort_pipe_count field in the rcu_torture structure checks for too-short grace periods, and is therefore read by rcutorture's readers while being updated by rcutorture's writers. This commit therefore adds the needed READ_ONCE() and WRITE_ONCE() invocations. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely and due to this being rcutorture. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 16:03:31 -08:00
Paul E. McKenney	4ab00bdd99	rcutorture: Suppress boottime bad-sequence warnings In normal production, an excessively long wait on a grace period (synchronize_rcu(), for example) at boottime is often just as bad as at any other time. In fact, given the desire for fast boot, any sort of long wait at boot is a bad idea. However, heavy rcutorture testing on large hyperthreaded systems can generate such long waits during boot as a matter of course. This commit therefore causes the rcupdate.rcu_cpu_stall_suppress_at_boot kernel boot parameter to suppress reporting of bootime bad-sequence warning due to excessively long grace-period waits. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 16:03:30 -08:00
Paul E. McKenney	58c53360b3	rcutorture: Allow boottime stall warnings to be suppressed In normal production, an RCU CPU stall warning at boottime is often just as bad as at any other time. In fact, given the desire for fast boot, any sort of long-term stall at boot is a bad idea. However, heavy rcutorture testing on large hyperthreaded systems can generate boottime RCU CPU stalls as a matter of course. This commit therefore provides a kernel boot parameter that suppresses reporting of boottime RCU CPU stall warnings and similarly of rcutorture writer stalls. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 16:03:30 -08:00
Paul E. McKenney	435508095a	rcutorture: Refrain from callback flooding during boot Additional rcutorture aggression can result in, believe it or not, boot times in excess of three minutes on large hyperthreaded systems. This is long enough for rcutorture to decide to do some callback flooding, which seems a bit excessive given that userspace cannot have started until long after boot, and it is userspace that does the real-world callback flooding. Worse yet, because Tiny RCU lacks forward-progress functionality, the looping-in-the-kernel tests can also be problematic during early boot. This commit therefore causes rcutorture to hold off on callback flooding until about the time that init is spawned, and the same for looping-in-the-kernel tests for Tiny RCU. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 16:03:30 -08:00
Paul E. McKenney	59ee0326cc	rcutorture: Suppress forward-progress complaints during early boot Some larger systems can take in excess of 50 seconds to complete their early boot initcalls prior to spawing init. This does not in any way help the forward-progress judgments of built-in rcutorture (when rcutorture is built as a module, the insmod or modprobe command normally cannot happen until some time after boot completes). This commit therefore suppresses such complaints until about the time that init is spawned. This also includes a fix to a stupid error located by kbuild test robot. [ paulmck: Apply kbuild test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [ paulmck: Fix to nohz_full slow-expediting recovery logic, per bpetkov. ] [ paulmck: Restrict splat to CONFIG_PREEMPT_RT=y kernels and simplify. ] Tested-by: Borislav Petkov <bp@alien8.de>	2020-02-20 16:03:30 -08:00
Paul E. McKenney	710426068d	srcu: Hold srcu_struct ->lock when updating ->srcu_gp_seq A read of the srcu_struct structure's ->srcu_gp_seq field should not need READ_ONCE() when that structure's ->lock is held. Except that this lock is not always held when updating this field. This commit therefore acquires the lock around updates and removes a now-unneeded READ_ONCE(). This data race was reported by KCSAN. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [ paulmck: Switch from READ_ONCE() to lock per Peter Zilstra question. ] Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>	2020-02-20 16:01:11 -08:00
Paul E. McKenney	39f91504a0	srcu: Fix process_srcu()/srcu_batches_completed() datarace The srcu_struct structure's ->srcu_idx field is accessed locklessly, so reads must use READ_ONCE(). This commit therefore adds the needed READ_ONCE() invocation where it was missed. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 16:01:11 -08:00
Paul E. McKenney	8c9e0cb323	srcu: Fix __call_srcu()/srcu_get_delay() datarace The srcu_struct structure's ->srcu_gp_seq_needed_exp field is accessed locklessly, so updates must use WRITE_ONCE(). This commit therefore adds the needed WRITE_ONCE() invocations. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 16:01:11 -08:00
Paul E. McKenney	7ff8b4502b	srcu: Fix __call_srcu()/process_srcu() datarace The srcu_node structure's ->srcu_gp_seq_needed_exp field is accessed locklessly, so updates must use WRITE_ONCE(). This commit therefore adds the needed WRITE_ONCE() invocations. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 16:01:11 -08:00
Jules Irenge	90ba11ba99	rcu: Add missing annotation for exit_tasks_rcu_finish() Sparse reports a warning at exit_tasks_rcu_finish(void) \|warning: context imbalance in exit_tasks_rcu_finish() \|- wrong count at exit To fix this, this commit adds a __releases(&tasks_rcu_exit_srcu). Given that exit_tasks_rcu_finish() does actually call __srcu_read_lock(), this not only fixes the warning but also improves on the readability of the code. Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>	2020-02-20 16:00:45 -08:00
Jules Irenge	e1e9bdc00a	rcu: Add missing annotation for exit_tasks_rcu_start() Sparse reports a warning at exit_tasks_rcu_start(void) \|warning: context imbalance in exit_tasks_rcu_start() - wrong count at exit To fix this, this commit adds an __acquires(&tasks_rcu_exit_srcu). Given that exit_tasks_rcu_start() does actually call __srcu_read_lock(), this not only fixes the warning but also improves on the readability of the code. Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>	2020-02-20 16:00:45 -08:00
Paul E. McKenney	fcb7381265	rcu-tasks: *_ONCE() for rcu_tasks_cbs_head The RCU tasks list of callbacks, rcu_tasks_cbs_head, is sampled locklessly by rcu_tasks_kthread() when waiting for work to do. This commit therefore applies READ_ONCE() to that lockless sampling and WRITE_ONCE() to the single potential store outside of rcu_tasks_kthread. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 16:00:45 -08:00
Paul E. McKenney	b692dc4adf	rcu: Update __call_rcu() comments The __call_rcu() function's header comment refers to a cpu argument that no longer exists, and the comment of the return path from rcu_nocb_try_bypass() ignores the non-no-CBs CPU case. This commit therefore update both comments. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 16:00:20 -08:00
Colin Ian King	aa96a93ba2	rcu: Fix spelling mistake "leval" -> "level" This commit fixes a spelling mistake in a pr_info() message. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 16:00:20 -08:00
Paul E. McKenney	8c14263d35	rcu: React to callback overload by boosting RCU readers RCU priority boosting currently is not applied until the grace period is at least 250 milliseconds old (or the number of milliseconds specified by the CONFIG_RCU_BOOST_DELAY Kconfig option). Although this has worked well, it can result in OOM under conditions of RCU callback flooding. One can argue that the real-time systems using RCU priority boosting should carefully avoid RCU callback flooding, but one can just as well argue that an OOM is a rather obnoxious error message. This commit therefore disables the RCU priority boosting delay when there are excessive numbers of callbacks queued. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 16:00:20 -08:00
Paul E. McKenney	b2b00ddf19	rcu: React to callback overload by aggressively seeking quiescent states In default configutions, RCU currently waits at least 100 milliseconds before asking cond_resched() and/or resched_rcu() for help seeking quiescent states to end a grace period. But 100 milliseconds can be one good long time during an RCU callback flood, for example, as can happen when user processes repeatedly open and close files in a tight loop. These 100-millisecond gaps in successive grace periods during a callback flood can result in excessive numbers of callbacks piling up, unnecessarily increasing memory footprint. This commit therefore asks cond_resched() and/or resched_rcu() for help as early as the first FQS scan when at least one of the CPUs has more than 20,000 callbacks queued, a number that can be changed using the new rcutree.qovld kernel boot parameter. An auxiliary qovld_calc variable is used to avoid acquisition of locks that have not yet been initialized. Early tests indicate that this reduces the RCU-callback memory footprint during rcutorture floods by from 50% to 4x, depending on configuration. Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reported-by: Tejun Heo <tj@kernel.org> [ paulmck: Fix bug located by Qian Cai. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Dexuan Cui <decui@microsoft.com> Tested-by: Qian Cai <cai@lca.pw>	2020-02-20 16:00:20 -08:00
Paul E. McKenney	b5ea03709d	rcu: Clear ->core_needs_qs at GP end or self-reported QS The rcu_data structure's ->core_needs_qs field does not necessarily get cleared in a timely fashion after the corresponding CPUs' quiescent state has been reported. From a functional viewpoint, no harm done, but this can result in excessive invocation of RCU core processing, as witnessed by the kernel test robot, which saw greatly increased softirq overhead. This commit therefore restores the rcu_report_qs_rdp() function's clearing of this field, but only when running on the corresponding CPU. Cases where some other CPU reports the quiescent state (for example, on behalf of an idle CPU) are handled by setting this field appropriately within the __note_gp_changes() function's end-of-grace-period checks. This handling is carried out regardless of whether the end of a grace period actually happened, thus handling the case where a CPU goes non-idle after a quiescent state is reported on its behalf, but before the grace period ends. This fix also avoids cross-CPU updates to ->core_needs_qs, While in the area, this commit changes the __note_gp_changes() need_gp variable's name to need_qs because it is a quiescent state that is needed from the CPU in question. Fixes: `ed93dfc6bc` ("rcu: Confine ->core_needs_qs accesses to the corresponding CPU") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 16:00:20 -08:00
Uladzislau Rezki (Sony)	613707929b	rcu: Add a trace event for kfree_rcu() use of kfree_bulk() The event is given three parameters, first one is the name of RCU flavour, second one is the number of elements in array for free and last one is an address of the array holding pointers to be freed by the kfree_bulk() function. To enable the trace event your kernel has to be build with CONFIG_RCU_TRACE=y, after that it is possible to track the events using ftrace subsystem. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 15:58:51 -08:00
Uladzislau Rezki (Sony)	34c8817455	rcu: Support kfree_bulk() interface in kfree_rcu() The kfree_rcu() logic can be improved further by using kfree_bulk() interface along with "basic batching support" introduced earlier. The are at least two advantages of using "bulk" interface: - in case of large number of kfree_rcu() requests kfree_bulk() reduces the per-object overhead caused by calling kfree() per-object. - reduces the number of cache-misses due to "pointer chasing" between objects which can be far spread between each other. This approach defines a new kfree_rcu_bulk_data structure that stores pointers in an array with a specific size. Number of entries in that array depends on PAGE_SIZE making kfree_rcu_bulk_data structure to be exactly one page. Since it deals with "block-chain" technique there is an extra need in dynamic allocation when a new block is required. Memory is allocated with GFP_NOWAIT \| __GFP_NOWARN flags, i.e. that allows to skip direct reclaim under low memory condition to prevent stalling and fails silently under high memory pressure. The "emergency path" gets maintained when a system is run out of memory. In that case objects are linked into regular list. The "rcuperf" was run to analyze this change in terms of memory consumption and kfree_bulk() throughput. 1) Testing on the Intel(R) Xeon(R) W-2135 CPU @ 3.70GHz, 12xCPUs with following parameters: kfree_loops=200000 kfree_alloc_num=1000 kfree_rcu_test=1 kfree_vary_obj_size=1 dev.2020.01.10a branch Default / CONFIG_SLAB 53607352517 ns, loops: 200000, batches: 1885, memory footprint: 1248MB 53529637912 ns, loops: 200000, batches: 1921, memory footprint: 1193MB 53570175705 ns, loops: 200000, batches: 1929, memory footprint: 1250MB Patch / CONFIG_SLAB 23981587315 ns, loops: 200000, batches: 810, memory footprint: 1219MB 23879375281 ns, loops: 200000, batches: 822, memory footprint: 1190MB 24086841707 ns, loops: 200000, batches: 794, memory footprint: 1380MB Default / CONFIG_SLUB 51291025022 ns, loops: 200000, batches: 1713, memory footprint: 741MB 51278911477 ns, loops: 200000, batches: 1671, memory footprint: 719MB 51256183045 ns, loops: 200000, batches: 1719, memory footprint: 647MB Patch / CONFIG_SLUB 50709919132 ns, loops: 200000, batches: 1618, memory footprint: 456MB 50736297452 ns, loops: 200000, batches: 1633, memory footprint: 507MB 50660403893 ns, loops: 200000, batches: 1628, memory footprint: 429MB in case of CONFIG_SLAB there is double increase in performance and slightly higher memory usage. As for CONFIG_SLUB, the performance figures are better together with lower memory usage. 2) Testing on the HiKey-960, arm64, 8xCPUs with below parameters: CONFIG_SLAB=y kfree_loops=200000 kfree_alloc_num=1000 kfree_rcu_test=1 102898760401 ns, loops: 200000, batches: 5822, memory footprint: 158MB 89947009882 ns, loops: 200000, batches: 6715, memory footprint: 115MB rcuperf shows approximately ~12% better throughput in case of using "bulk" interface. The "drain logic" or its RCU callback does the work faster that leads to better throughput. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 15:58:51 -08:00
Paul E. McKenney	3d05031ae6	rcu: Make nocb_gp_wait() double-check unexpected-callback warning Currently, nocb_gp_wait() unconditionally complains if there is a callback not already associated with a grace period. This assumes that either there was no such callback initially on the one hand, or that the rcu_advance_cbs() function assigned all such callbacks to a grace period on the other. However, in theory there are some situations that would prevent rcu_advance_cbs() from assigning all of the callbacks. This commit therefore checks for unassociated callbacks immediately after rcu_advance_cbs() returns, while the corresponding rcu_node structure's ->lock is still held. If there are unassociated callbacks at that point, the subsequent WARN_ON_ONCE() is disabled. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 15:58:23 -08:00
Paul E. McKenney	13817dd589	rcu: Tighten rcu_lockdep_assert_cblist_protected() check The ->nocb_lock lockdep assertion is currently guarded by cpu_online(), which is incorrect for no-CBs CPUs, whose callback lists must be protected by ->nocb_lock regardless of whether or not the corresponding CPU is online. This situation could result in failure to detect bugs resulting from failing to hold ->nocb_lock for offline CPUs. This commit therefore removes the cpu_online() guard. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 15:58:23 -08:00
Paul E. McKenney	faa059c397	rcu: Optimize and protect atomic_cmpxchg() loop This commit reworks the atomic_cmpxchg() loop in rcu_eqs_special_set() to do only the initial read from the current CPU's rcu_data structure's ->dynticks field explicitly. On subsequent passes, this value is instead retained from the failing atomic_cmpxchg() operation. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 15:58:23 -08:00
Jules Irenge	92c0b889f2	rcu/nocb: Add missing annotation for rcu_nocb_bypass_unlock() Sparse reports warning at rcu_nocb_bypass_unlock() warning: context imbalance in rcu_nocb_bypass_unlock() - unexpected unlock The root cause is a missing annotation of rcu_nocb_bypass_unlock() which causes the warning. This commit therefore adds the missing __releases(&rdp->nocb_bypass_lock) annotation. Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Boqun Feng <boqun.feng@gmail.com>	2020-02-20 15:58:23 -08:00
Jules Irenge	9ced454807	rcu: Add missing annotation for rcu_nocb_bypass_lock() Sparse reports warning at rcu_nocb_bypass_lock() \|warning: context imbalance in rcu_nocb_bypass_lock() - wrong count at exit To fix this, this commit adds an __acquires(&rdp->nocb_bypass_lock). Given that rcu_nocb_bypass_lock() does actually call raw_spin_lock() when raw_spin_trylock() fails, this not only fixes the warning but also improves on the readability of the code. Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 15:58:23 -08:00
Paul E. McKenney	5648d65912	rcu: Don't flag non-starting GPs before GP kthread is running Currently rcu_check_gp_start_stall() complains if a grace period takes too long to start, where "too long" is roughly one RCU CPU stall-warning interval. This has worked well, but there are some debugging Kconfig options (such as CONFIG_EFI_PGT_DUMP=y) that can make booting take a very long time, so much so that the stall-warning interval has expired before RCU's grace-period kthread has even been spawned. This commit therefore resets the rcu_state.gp_req_activity and rcu_state.gp_activity timestamps just before the grace-period kthread is spawned, and modifies the checks and adds ordering to ensure that if rcu_check_gp_start_stall() sees that the grace-period kthread has been spawned, that it will also see the resets applied to the rcu_state.gp_req_activity and rcu_state.gp_activity timestamps. Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [ paulmck: Fix whitespace issues reported by Qian Cai. ] Tested-by: Qian Cai <cai@lca.pw> [ paulmck: Simplify grace-period wakeup check per Steve Rostedt feedback. ]	2020-02-20 15:58:23 -08:00
Paul E. McKenney	aa24f93753	rcu: Fix rcu_barrier_callback() race condition The rcu_barrier_callback() function does an atomic_dec_and_test(), and if it is the last CPU to check in, does the required wakeup. Either way, it does an event trace. Unfortunately, this is susceptible to the following sequence of events: o CPU 0 invokes rcu_barrier_callback(), but atomic_dec_and_test() says that it is not last. But at this point, CPU 0 is delayed, perhaps due to an NMI, SMI, or vCPU preemption. o CPU 1 invokes rcu_barrier_callback(), and atomic_dec_and_test() says that it is last. So CPU 1 traces completion and does the needed wakeup. o The awakened rcu_barrier() function does cleanup and releases rcu_state.barrier_mutex. o Another CPU now acquires rcu_state.barrier_mutex and starts another round of rcu_barrier() processing, including updating rcu_state.barrier_sequence. o CPU 0 gets its act back together and does its tracing. Except that rcu_state.barrier_sequence has already been updated, so its tracing is incorrect and probably quite confusing. (Wait! Why did this CPU check in twice for one rcu_barrier() invocation???) This commit therefore causes rcu_barrier_callback() to take a snapshot of the value of rcu_state.barrier_sequence before invoking atomic_dec_and_test(), thus guaranteeing that the event-trace output is sensible, even if the timing of the event-trace output might still be confusing. (Wait! Why did the old rcu_barrier() complete before all of its CPUs checked in???) But being that this is RCU, only so much confusion can reasonably be eliminated. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely and due to the mild consequences of the failure, namely a confusing event trace. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 15:58:23 -08:00
Paul E. McKenney	59881bcd85	rcu: Add WRITE_ONCE() to rcu_state ->gp_start The rcu_state structure's ->gp_start field is read locklessly, so this commit adds the WRITE_ONCE() to an update in order to provide proper documentation and READ_ONCE()/WRITE_ONCE() pairing. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 15:58:23 -08:00
Paul E. McKenney	57721fd15a	rcu: Remove dead code from rcu_segcblist_insert_pend_cbs() The rcu_segcblist_insert_pend_cbs() function currently (partially) initializes the rcu_cblist that it pulls callbacks from. However, all the resulting stores are dead because all callers pass in the address of an on-stack cblist that is not used afterwards. This commit therefore removes this pointless initialization. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 15:58:23 -08:00
Paul E. McKenney	3ca3b0e2cb	rcu: Add *_ONCE() to rcu_node ->boost_kthread_status The rcu_node structure's ->boost_kthread_status field is accessed locklessly, so this commit causes all updates to use WRITE_ONCE() and all reads to use READ_ONCE(). This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 15:58:22 -08:00
Paul E. McKenney	2a2ae872ef	rcu: Add *_ONCE() to rcu_data ->rcu_forced_tick The rcu_data structure's ->rcu_forced_tick field is read locklessly, so this commit adds WRITE_ONCE() to all updates and READ_ONCE() to all lockless reads. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 15:58:22 -08:00
Paul E. McKenney	a5b8950180	rcu: Add READ_ONCE() to rcu_data ->gpwrap The rcu_data structure's ->gpwrap field is read locklessly, and so this commit adds the required READ_ONCE() to a pair of laods in order to avoid destructive compiler optimizations. This data race was reported by KCSAN. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 15:58:22 -08:00
SeongJae Park	65bb0dc437	rcu: Fix typos in file-header comments Convert to plural and add a note that this is for Tree RCU. Signed-off-by: SeongJae Park <sjpark@amazon.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 15:58:22 -08:00
Paul E. McKenney	8ff37290d6	rcu: Add *_ONCE() for grace-period progress indicators The various RCU structures' ->gp_seq, ->gp_seq_needed, ->gp_req_activity, and ->gp_activity fields are read locklessly, so they must be updated with WRITE_ONCE() and, when read locklessly, with READ_ONCE(). This commit makes these changes. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 15:58:22 -08:00
Paul E. McKenney	bfeebe2421	rcu: Add READ_ONCE() to rcu_segcblist ->tails[] The rcu_segcblist structure's ->tails[] array entries are read locklessly, so this commit adds the READ_ONCE() to a load in order to avoid destructive compiler optimizations. This data race was reported by KCSAN. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 15:58:22 -08:00
Paul E. McKenney	105abf82b0	rcu: Add WRITE_ONCE() to rcu_node ->qsmaskinitnext The rcu_state structure's ->qsmaskinitnext field is read locklessly, so this commit adds the WRITE_ONCE() to an update in order to provide proper documentation and READ_ONCE()/WRITE_ONCE() pairing. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely for systems not doing incessant CPU-hotplug operations. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 15:58:22 -08:00
Paul E. McKenney	2906d2154c	rcu: Add WRITE_ONCE() to rcu_state ->gp_req_activity The rcu_state structure's ->gp_req_activity field is read locklessly, so this commit adds the WRITE_ONCE() to an update in order to provide proper documentation and READ_ONCE()/WRITE_ONCE() pairing. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 15:58:22 -08:00

1 2 3 4 5 ...

1408 Commits