linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-18 16:44:27 +08:00

Author	SHA1	Message	Date
Paul E. McKenney	6aacd88d17	rcu/nocb: EXP Check use and usefulness of ->nocb_lock_contended Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:38:24 -07:00
Paul E. McKenney	d1b222c6be	rcu/nocb: Add bypass callback queueing Use of the rcu_data structure's segmented ->cblist for no-CBs CPUs takes advantage of unrelated grace periods, thus reducing the memory footprint in the face of floods of call_rcu() invocations. However, the ->cblist field is a more-complex rcu_segcblist structure which must be protected via locking. Even though there are only three entities which can acquire this lock (the CPU invoking call_rcu(), the no-CBs grace-period kthread, and the no-CBs callbacks kthread), the contention on this lock is excessive under heavy stress. This commit therefore greatly reduces contention by provisioning an rcu_cblist structure field named ->nocb_bypass within the rcu_data structure. Each no-CBs CPU is permitted only a limited number of enqueues onto the ->cblist per jiffy, controlled by a new nocb_nobypass_lim_per_jiffy kernel boot parameter that defaults to about 16 enqueues per millisecond (16 * 1000 / HZ). When that limit is exceeded, the CPU instead enqueues onto the new ->nocb_bypass. The ->nocb_bypass is flushed into the ->cblist every jiffy or when the number of callbacks on ->nocb_bypass exceeds qhimark, whichever happens first. During call_rcu() floods, this flushing is carried out by the CPU during the course of its call_rcu() invocations. However, a CPU could simply stop invoking call_rcu() at any time. The no-CBs grace-period kthread therefore carries out less-aggressive flushing (every few jiffies or when the number of callbacks on ->nocb_bypass exceeds (2 * qhimark), whichever comes first). This means that the no-CBs grace-period kthread cannot be permitted to do unbounded waits while there are callbacks on ->nocb_bypass. A ->nocb_bypass_timer is used to provide the needed wakeups. [ paulmck: Apply Coverity feedback reported by Colin Ian King. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:37:32 -07:00
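The per-jiffy rate limit described in the d1b222c6be entry can be pictured with the following userspace C simulation. It is a minimal sketch under stated assumptions, not the kernel's implementation. The struct, its fields, and all `sketch_*` names are hypothetical. HZ=1000 and qhimark=10000 are assumed values. The less-aggressive flushing done by the no-CBs grace-period kthread and `->nocb_bypass_timer` is omitted.

```c
#include <stdbool.h>

/* Assumed values for illustration; the real ones come from the kernel config and boot parameters. */
#define SKETCH_HZ 1000
#define SKETCH_QHIMARK 10000L
/* Matches the commit's stated default of about 16 enqueues per millisecond. */
#define SKETCH_NOBYPASS_LIM_PER_JIFFY (16 * 1000 / SKETCH_HZ)

/* Hypothetical stand-in for the relevant rcu_data state. */
struct sketch_nocb {
	long cblist_len;       /* callbacks on the stand-in ->cblist */
	long bypass_len;       /* callbacks on the stand-in ->nocb_bypass */
	unsigned long first;   /* jiffy in which the current window started */
	int nobypass_count;    /* direct ->cblist enqueues in this jiffy */
};

/* Move everything from the bypass list onto the main list. */
static void sketch_flush_bypass(struct sketch_nocb *s)
{
	s->cblist_len += s->bypass_len;
	s->bypass_len = 0;
}

/* Enqueue one callback at time "jiffies"; returns true if it went to the bypass. */
static bool sketch_enqueue(struct sketch_nocb *s, unsigned long jiffies)
{
	if (jiffies != s->first) {
		/* A new jiffy: flush the bypass and reset the per-jiffy budget. */
		sketch_flush_bypass(s);
		s->first = jiffies;
		s->nobypass_count = 0;
	}
	if (s->nobypass_count < SKETCH_NOBYPASS_LIM_PER_JIFFY) {
		/* Still under the limit: enqueue directly onto ->cblist. */
		s->nobypass_count++;
		s->cblist_len++;
		return false;
	}
	/* Over the limit: use the bypass, flushing once it exceeds qhimark. */
	s->bypass_len++;
	if (s->bypass_len > SKETCH_QHIMARK)
		sketch_flush_bypass(s);
	return true;
}
```

In this sketch, 20 enqueues within one jiffy put 16 callbacks on the main list and 4 on the bypass. The first enqueue in the next jiffy flushes those 4 before adding itself.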
Paul E. McKenney	faca5c2509	rcu/nocb: Unconditionally advance and wake for excessive CBs When there are excessive numbers of callbacks, and when either the corresponding no-CBs callback kthread is asleep or there are no more ready-to-invoke callbacks, and when at least one callback is pending, __call_rcu_nocb_wake() will advance the callbacks, but refrain from awakening the corresponding no-CBs grace-period kthread. However, because rcu_advance_cbs_nowake() is used, it is possible (if a bit unlikely) that the needed advancement could not happen due to a grace period not being in progress. Plus there will always be at least one pending callback due to one having just now been enqueued. This commit therefore attempts to advance callbacks and awakens the no-CBs grace-period kthread when there are excessive numbers of callbacks posted and when the no-CBs callback kthread is not in a position to do anything helpful. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	4fd8c5f153	rcu/nocb: Reduce ->nocb_lock contention with separate ->nocb_gp_lock The sleep/wakeup of the no-CBs grace-period kthreads is synchronized using the ->nocb_lock of the first CPU corresponding to that kthread. This commit provides a separate ->nocb_gp_lock for this purpose, thus reducing contention on ->nocb_lock. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	523bddd553	rcu/nocb: Reduce contention at no-CBs invocation-done time Currently, nocb_cb_wait() unconditionally acquires the leaf rcu_node ->lock to advance callbacks when done invoking the previous batch. It does this while holding ->nocb_lock, which means that contention on the leaf rcu_node ->lock visits itself on the ->nocb_lock. This commit therefore makes this lock acquisition conditional, forgoing callback advancement when the leaf rcu_node ->lock is not immediately available. (In this case, the no-CBs grace-period kthread will eventually do any needed callback advancement.) Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	6608c3a027	rcu/nocb: Reduce contention at no-CBs registry-time CB advancement Currently, __call_rcu_nocb_wake() conditionally acquires the leaf rcu_node structure's ->lock, and only afterwards does rcu_advance_cbs_nowake() check to see if it is possible to advance callbacks without potentially needing to awaken the grace-period kthread. Given that the no-awaken check can be done locklessly, this commit reverses the order, so that rcu_advance_cbs_nowake() is invoked without holding the leaf rcu_node structure's ->lock and rcu_advance_cbs_nowake() checks the grace-period state before conditionally acquiring that lock, thus reducing the number of needless acquisitions of the leaf rcu_node structure's ->lock. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	9fcb09bddd	rcu/nocb: Round down for number of no-CBs grace-period kthreads Currently, when the square root of the number of CPUs is rounded down by int_sqrt(), this round-down is applied to the number of callback kthreads per grace-period kthreads. This makes almost no difference for large systems, but results in oddities such as three no-CBs grace-period kthreads for a five-CPU system, which is a bit excessive. This commit therefore causes the round-down to apply to the number of no-CBs grace-period kthreads, so that systems with from four to eight CPUs have only two no-CBs grace period kthreads. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
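The arithmetic in the 9fcb09bddd entry can be checked with a small C sketch. It assumes two simplified formulas that are not the kernel's rcu_organize_nocb_kthreads() code. In the old scheme, each group holds int_sqrt(N) CPUs, so the number of grace-period kthreads rounds up. In the new scheme, the number of grace-period kthreads is int_sqrt(N) itself. The `sketch_*` helpers are hypothetical, and `sketch_int_sqrt()` is a naive stand-in for the kernel's int_sqrt().

```c
/* Naive integer square root, rounded down (stand-in for int_sqrt()). */
static unsigned long sketch_int_sqrt(unsigned long x)
{
	unsigned long r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/* Old behavior (assumed): int_sqrt(N) CPUs per group, so ceil(N / int_sqrt(N)) GP kthreads. */
static unsigned long sketch_old_nr_gp_kthreads(unsigned long ncpus)
{
	unsigned long per_group = sketch_int_sqrt(ncpus);

	if (!per_group)
		per_group = 1;
	return (ncpus + per_group - 1) / per_group;
}

/* New behavior (assumed): the round-down applies to the GP kthread count itself. */
static unsigned long sketch_new_nr_gp_kthreads(unsigned long ncpus)
{
	unsigned long ngp = sketch_int_sqrt(ncpus);

	return ngp ? ngp : 1;
}
```

Under these assumptions, a five-CPU system gets three grace-period kthreads in the old scheme and two in the new one. Every system with four to eight CPUs gets two. A 4096-CPU system gets 64 either way, which is consistent with the estimate in the 12f54c3a84 entry.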
Paul E. McKenney	81c0b3d724	rcu/nocb: Avoid ->nocb_lock capture by corresponding CPU A given rcu_data structure's ->nocb_lock can be acquired very frequently by the corresponding CPU and occasionally by the corresponding no-CBs grace-period and callbacks kthreads. In particular, these two kthreads will have frequent gaps between ->nocb_lock acquisitions that are roughly a grace period in duration. This means that any excessive ->nocb_lock contention will be due to the CPU's acquisitions, and this in turn enables a very naive contention-avoidance strategy to be quite effective. This commit therefore modifies rcu_nocb_lock() to first attempt a raw_spin_trylock(), and to atomically increment a separate ->nocb_lock_contended across a raw_spin_lock(). This new ->nocb_lock_contended field is checked in __call_rcu_nocb_wake() when interrupts are enabled, with a spin-wait for contending acquisitions to complete, thus allowing the kthreads a chance to acquire the lock. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
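The trylock-then-count pattern described in the 81c0b3d724 entry can be sketched in portable C11. This is a userspace illustration only. It assumes a test-and-set spinlock built on `atomic_flag` in place of the kernel's raw spinlock, and every `sketch_*` name is hypothetical. The point is that only acquirers that fail the fast path are counted in `contended`. The owning CPU can then spin until that count drains, which gives the kthreads a chance at the lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock state: the lock plus a count of slow-path acquirers. */
struct sketch_lock_state {
	atomic_flag locked;    /* set while the lock is held */
	atomic_int contended;  /* acquirers currently spinning in the slow path */
};

#define SKETCH_LOCK_STATE_INIT { ATOMIC_FLAG_INIT, 0 }

/* Returns true if the lock was free and is now held by the caller. */
static bool sketch_trylock(struct sketch_lock_state *l)
{
	return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

static void sketch_nocb_lock(struct sketch_lock_state *l)
{
	/* Fast path: uncontended acquisition, nothing recorded. */
	if (sketch_trylock(l))
		return;
	/* Slow path: advertise contention for the duration of the spin. */
	atomic_fetch_add(&l->contended, 1);
	while (!sketch_trylock(l))
		;
	atomic_fetch_sub(&l->contended, 1);
}

static void sketch_nocb_unlock(struct sketch_lock_state *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Owner side: wait until every contending acquirer has obtained the lock. */
static void sketch_wait_for_contenders(struct sketch_lock_state *l)
{
	while (atomic_load(&l->contended) != 0)
		; /* a kernel implementation would relax the CPU here */
}
```

A single-threaded run exercises only the fast path, so `contended` stays at zero. Under real contention, the owner calls `sketch_wait_for_contenders()` before its next acquisition.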
Paul E. McKenney	7f36ef82e5	rcu/nocb: Avoid needless wakeups of no-CBs grace-period kthread Currently, the code provides an extra wakeup for the no-CBs grace-period kthread if one of its CPUs is generating excessive numbers of callbacks. But satisfying though it is to wake something up when things are going south, unless the thing being awakened can actually help solve the problem, that extra wakeup does nothing but consume additional CPU time, which is exactly what you don't want during a call_rcu() flood. This commit therefore avoids doing anything if the corresponding no-CBs callback kthread is going full tilt. Otherwise, if advancing callbacks immediately might help and if the leaf rcu_node structure's lock is immediately available, this commit invokes a new variant of rcu_advance_cbs() that advances callbacks only if doing so won't require awakening the grace-period kthread (not to be confused with any of the no-CBs grace-period kthreads). Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	ce0a825e40	rcu/nocb: Make __call_rcu_nocb_wake() safe for many callbacks It might be hard to imagine having more than two billion callbacks queued on a single CPU's ->cblist, but someone will do it sometime. This commit therefore makes __call_rcu_nocb_wake() handle this situation by upgrading local variable "len" from "int" to "long". Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	383e133283	rcu/nocb: Never downgrade ->nocb_defer_wakeup in wake_nocb_gp_defer() Currently, wake_nocb_gp_defer() simply stores whatever waketype was passed in, which can result in a RCU_NOCB_WAKE_FORCE being downgraded to RCU_NOCB_WAKE, which could in turn delay callback processing. This commit therefore adds a check so that wake_nocb_gp_defer() only updates ->nocb_defer_wakeup when the update increases the forcefulness, thus avoiding downgrades. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
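The "never downgrade" rule in the 383e133283 entry reduces to a monotonic maximum over ordered wake types. The sketch below assumes the wake types are small integers ordered by forcefulness. The enum and function names are hypothetical, and the kernel's accompanying timer handling and memory ordering are left out.

```c
/* Hypothetical wake types, ordered so that a larger value is more forceful. */
enum sketch_nocb_wake {
	SKETCH_NOCB_WAKE_NOT = 0,
	SKETCH_NOCB_WAKE = 1,
	SKETCH_NOCB_WAKE_FORCE = 2,
};

/* Record a deferred wakeup, only ever increasing its forcefulness. */
static void sketch_wake_gp_defer(enum sketch_nocb_wake *defer_wakeup,
				 enum sketch_nocb_wake waketype)
{
	if (*defer_wakeup < waketype)
		*defer_wakeup = waketype;
}
```

With this rule, a pending FORCE wakeup survives a later plain wakeup request, which is the delay the commit avoids.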
Paul E. McKenney	aeeacd9d84	rcu/nocb: Enable re-awakening under high callback load The __call_rcu_nocb_wake() function and its predecessors set ->qlen_last_fqs_check to zero for the first callback and to LONG_MAX / 2 for forced reawakenings. The former can result in a too-quick reawakening when there are many callbacks ready to invoke and the latter prevents a second reawakening. This commit therefore sets ->qlen_last_fqs_check to the current number of callbacks in both cases. While in the area, this commit also moves both assignments under ->nocb_lock. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	0bd55c6936	rcu/nohz: Turn off tick for offloaded CPUs Historically, no-CBs CPUs allowed the scheduler-clock tick to be unconditionally disabled on any transition to idle or nohz_full userspace execution (see the rcu_needs_cpu() implementations). Unfortunately, the checks used by rcu_needs_cpu() are defeated now that no-CBs CPUs use ->cblist, which might make users of battery-powered devices rather unhappy. This commit therefore adds explicit rcu_segcblist_is_offloaded() checks to return to the historical energy-efficient semantics. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	969974e5c5	rcu/nocb: Suppress uninitialized false-positive in nocb_gp_wait() Some compilers complain that wait_gp_seq might be used uninitialized in nocb_gp_wait(). This cannot actually happen because when wait_gp_seq is uninitialized, needwait_gp must be false, which prevents wait_gp_seq from being used. But this analysis is apparently beyond some compilers, so this commit adds a bogus initialization of wait_gp_seq for the sole purpose of suppressing the false-positive warning. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	2a777de757	rcu/nocb: Remove obsolete nocb_cb_tail and nocb_cb_head fields Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	c035280f17	rcu/nocb: Remove obsolete nocb_q_count and nocb_q_count_lazy fields This commit removes the obsolete nocb_q_count and nocb_q_count_lazy fields, also removing rcu_get_n_cbs_nocb_cpu(), adjusting rcu_get_n_cbs_cpu(), and making rcutree_migrate_callbacks() once again disable the ->cblist fields of offline CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	e7f4c5b399	rcu/nocb: Remove obsolete nocb_head and nocb_tail fields Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	5d6742b377	rcu/nocb: Use rcu_segcblist for no-CBs CPUs Currently the RCU callbacks for no-CBs CPUs are queued on a series of ad-hoc linked lists, which means that these callbacks cannot benefit from "drive-by" grace periods, thus suffering needless delays prior to invocation. In addition, the no-CBs grace-period kthreads first wait for callbacks to appear and later wait for a new grace period, which means that callbacks appearing during a grace-period wait can be delayed. These delays increase memory footprint, and could even result in an out-of-memory condition. This commit therefore enqueues RCU callbacks from no-CBs CPUs on the rcu_segcblist structure that is already used by non-no-CBs CPUs. It also restructures the no-CBs grace-period kthread to be checking for incoming callbacks while waiting for grace periods. Also, instead of waiting for a new grace period, it waits for the closest grace period that will cause some of the callbacks to be safe to invoke. All of these changes reduce callback latency and thus the number of outstanding callbacks, in turn reducing the probability of an out-of-memory condition. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	e83e73f5b0	rcu/nocb: Leave ->cblist enabled for no-CBs CPUs As a first step towards making no-CBs CPUs use the ->cblist, this commit leaves the ->cblist enabled for these CPUs. The main reason to make no-CBs CPUs use ->cblist is to take advantage of callback numbering, which will reduce the effects of missed grace periods which in turn will reduce forward-progress problems for no-CBs CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	ca5c825808	rcu/nocb: Remove deferred wakeup checks for extended quiescent states The idea behind the checks for extended quiescent states at the end of __call_rcu_nocb() is to handle cases where call_rcu() is invoked directly from within an extended quiescent state, for example, from the idle loop. However, this will result in a timer-mediated deferred wakeup, which will cause the needed wakeup to happen within a jiffy or thereabouts. There should be no forward-progress concerns, and if there are, the proper response is to exit the extended quiescent state while executing the endless blast of call_rcu() invocations, for example, using RCU_NONIDLE(). Given the more realistic case of an isolated call_rcu() invocation, there should be no problem. This commit therefore removes the checks for invoking call_rcu() within an extended quiescent state on no-CBs CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	ce5215c134	rcu/nocb: Use separate flag to indicate offloaded ->cblist RCU callback processing currently uses rcu_is_nocb_cpu() to determine whether or not the current CPU's callbacks are to be offloaded. This works, but it is not so good for cache locality. Plus use of ->cblist for offloaded callbacks will greatly increase the frequency of these checks. This commit therefore adds a ->offloaded flag to the rcu_segcblist structure to provide a more flexible and cache-friendly means of checking for callback offloading. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	1bb5f9b95a	rcu/nocb: Use separate flag to indicate disabled ->cblist NULLing the RCU_NEXT_TAIL pointer was a clever way to save a byte, but forward-progress considerations would require that this pointer be both NULL and non-NULL, which, absent a quantum-computer port of the Linux kernel, simply won't happen. This commit therefore creates a separate ->enabled flag to replace the current NULL checks. [ paulmck: Add include files per 0day test robot and -next. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:34:50 -07:00
Paul E. McKenney	18cd8c93e6	rcu/nocb: Print gp/cb kthread hierarchy if dump_tree This commit causes the no-CBs grace-period/callback hierarchy to be printed to the console when the dump_tree kernel boot parameter is set. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:32:39 -07:00
Paul E. McKenney	f7c612b000	rcu/nocb: Rename rcu_nocb_leader_stride kernel boot parameter This commit changes the name of the rcu_nocb_leader_stride kernel boot parameter to rcu_nocb_gp_stride in order to account for the new distinction between callback and grace-period no-CBs kthreads. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:32:39 -07:00
Paul E. McKenney	f7c9a9b664	rcu/nocb: Rename and document no-CB CB kthread sleep trace event The nocb_cb_wait() function traces a "FollowerSleep" trace_rcu_nocb_wake() event, which never was documented and is now misleading. This commit therefore changes "FollowerSleep" to "CBSleep", documents this, and updates the documentation for "Sleep" as well. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:32:39 -07:00
Paul E. McKenney	0bdc33daef	rcu/nocb: Rename rcu_organize_nocb_kthreads() local variable This commit renames rdp_leader to rdp_gp in order to account for the new distinction between callback and grace-period no-CBs kthreads. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:32:39 -07:00
Paul E. McKenney	0d52a6652f	rcu/nocb: Rename wake_nocb_leader_defer() to wake_nocb_gp_defer() This commit adjusts naming to account for the new distinction between callback and grace-period no-CBs kthreads. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:32:39 -07:00
Paul E. McKenney	5f675ba6eb	rcu/nocb: Rename __wake_nocb_leader() to __wake_nocb_gp() This commit adjusts naming to account for the new distinction between callback and grace-period no-CBs kthreads. While in the area, it also updates local variables. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:32:39 -07:00
Paul E. McKenney	5d62c08c5f	rcu/nocb: Rename wake_nocb_leader() to wake_nocb_gp() This commit adjusts naming to account for the new distinction between callback and grace-period no-CBs kthreads. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:32:39 -07:00
Paul E. McKenney	9fa471a881	rcu/nocb: Rename nocb_follower_wait() to nocb_cb_wait() This commit adjusts naming to account for the new distinction between callback and grace-period no-CBs kthreads. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:32:39 -07:00
Paul E. McKenney	12f54c3a84	rcu/nocb: Provide separate no-CBs grace-period kthreads Currently, there is one no-CBs rcuo kthread per CPU, and these kthreads are divided into groups. The first rcuo kthread to come online in a given group is that group's leader, and the leader both waits for grace periods and invokes its CPU's callbacks. The non-leader rcuo kthreads only invoke callbacks. This works well in the real-time/embedded environments for which it was intended because such environments tend not to generate all that many callbacks. However, given huge floods of callbacks, it is possible for the leader kthread to be stuck invoking callbacks while its followers wait helplessly while their callbacks pile up. This is a good recipe for an OOM, and rcutorture's new callback-flood capability does generate such OOMs. One strategy would be to wait until such OOMs start happening in production, but similar OOMs have in fact happened starting in 2018. It would therefore be wise to take a more proactive approach. This commit therefore features per-CPU rcuo kthreads that do nothing but invoke callbacks. Instead of having one of these kthreads act as leader, each group has a separate rcuog kthread that handles grace periods for its group. Because these rcuog kthreads do not invoke callbacks, callback floods on one CPU no longer block callbacks from reaching the rcuc callback-invocation kthreads on other CPUs. This change does introduce additional kthreads, however: 1. The number of additional kthreads is about the square root of the number of CPUs, so that a 4096-CPU system would have only about 64 additional kthreads. Note that recent changes decreased the number of rcuo kthreads by a factor of two (CONFIG_PREEMPT=n) or even three (CONFIG_PREEMPT=y), so this still represents a significant improvement on most systems. 2. 
The leading "rcuo" of the rcuog kthreads should allow existing scripting to affinity these additional kthreads as needed, the same as for the rcuop and rcuos kthreads. (There are no longer any rcuob kthreads.) 3. A state-machine approach was considered and rejected. Although this would allow the rcuo kthreads to continue their dual leader/follower roles, it complicates callback invocation and makes it more difficult to consolidate rcuo callback invocation with existing softirq callback invocation. The introduction of rcuog kthreads should thus be acceptable. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:32:39 -07:00
Paul E. McKenney	6484fe54b5	rcu/nocb: Update comments to prepare for forward-progress work This commit simply rewords comments to prepare for leader nocb kthreads doing only grace-period work and callback shuffling. This will mean the addition of replacement kthreads to invoke callbacks. The "leader" and "follower" thus become less meaningful, so the commit changes no-CB comments with these strings to "GP" and "CB", respectively. (Give or take the usual grammatical transformations.) Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:32:39 -07:00
Paul E. McKenney	58bf6f77c6	rcu/nocb: Rename rcu_data fields to prepare for forward-progress work This commit simply renames rcu_data fields to prepare for leader nocb kthreads doing only grace-period work and callback shuffling. This will mean the addition of replacement kthreads to invoke callbacks. The "leader" and "follower" thus become less meaningful, so the commit changes no-CB fields with these strings to "gp" and "cb", respectively. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:32:39 -07:00
Paul E. McKenney	31da067023	Merge branches 'consolidate.2019.08.01b', 'fixes.2019.08.12a', 'lists.2019.08.13a' and 'torture.2019.08.01b' into HEAD consolidate.2019.08.01b: Further consolidation cleanups fixes.2019.08.12a: Miscellaneous fixes lists.2019.08.13a: Optional lockdep arguments for RCU list macros torture.2019.08.01b: Torture-test updates	2019-08-13 14:30:30 -07:00
Byungchul Park	3545832fc2	rcu: Change return type of rcu_spawn_one_boost_kthread() The return value of rcu_spawn_one_boost_kthread() is not used any longer. This commit therefore changes its return type from int to void, and removes the cast to void from its callers. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-01 14:05:51 -07:00
Paul E. McKenney	1f3ebc8253	rcu: Restore barrier() to rcu_read_lock() and rcu_read_unlock() Commit `bb73c52bad` ("rcu: Don't disable preemption for Tiny and Tree RCU readers") removed the barrier() calls from rcu_read_lock() and rcu_read_unlock() in CONFIG_PREEMPT=n&&CONFIG_PREEMPT_COUNT=n kernels. Within RCU, this commit was OK, but it failed to account for things like get_user() that can pagefault and that can be reordered by the compiler. Lack of the barrier() calls in rcu_read_lock() and rcu_read_unlock() can cause these page faults to migrate into RCU read-side critical sections, which in CONFIG_PREEMPT=n kernels could result in too-short grace periods and arbitrary misbehavior. Please see commit `386afc9114` ("spinlocks and preemption points need to be at least compiler barriers") and Linus's commit `66be4e66a7` ("rcu: locking and unlocking need to always be at least barriers"), this last of which restores the barrier() call to both rcu_read_lock() and rcu_read_unlock(). This commit removes barrier() calls that are no longer needed given their addition in Linus's commit noted above. The combination of this commit and Linus's commit effectively reverts commit `bb73c52bad` ("rcu: Don't disable preemption for Tiny and Tree RCU readers"). Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> [ paulmck: Fix embarrassing typo located by Alan Stern. ]	2019-08-01 14:05:51 -07:00
Joel Fernandes (Google)	cb4dbbfaa1	rcu: Simplify rcu_note_context_switch exit from critical section Because __rcu_read_unlock() can be preempted just before the call to rcu_read_unlock_special(), it is possible for a task to be preempted just before it would have fully exited its RCU read-side critical section. This would result in a needless extension of that critical section until that task was resumed, which might in turn result in a needlessly long grace period, needless RCU priority boosting, and needless force-quiescent-state actions. Therefore, rcu_note_context_switch() invokes __rcu_read_unlock() followed by rcu_preempt_deferred_qs() when it detects this situation. This action by rcu_note_context_switch() ends the RCU read-side critical section immediately. Of course, once the task resumes, it will invoke rcu_read_unlock_special() redundantly. This is harmless because the fact that a preemption happened means that interrupts, preemption, and softirqs cannot have been disabled, so there would be no deferred quiescent state. While ->rcu_read_lock_nesting remains less than zero, none of the ->rcu_read_unlock_special.b bits can be set, and they were all zeroed by the call to rcu_note_context_switch() at task-preemption time. Therefore, setting ->rcu_read_unlock_special.b.exp_hint to false has no effect. Therefore, the extra call to rcu_preempt_deferred_qs_irqrestore() would return immediately. With one possible exception, which is if an expedited grace period started just as the task was being resumed, which could leave ->exp_deferred_qs set. This will cause rcu_preempt_deferred_qs_irqrestore() to invoke rcu_report_exp_rdp(), reporting the quiescent state, just as it should. (Such an expedited grace period won't affect the preemption code path due to interrupts having already been disabled.) 
But when rcu_note_context_switch() invokes __rcu_read_unlock(), it is doing so with preemption disabled, hence __rcu_read_unlock() will unconditionally defer the quiescent state, only to immediately invoke rcu_preempt_deferred_qs(), thus immediately reporting the deferred quiescent state. It turns out to be safe (and faster) to instead just invoke rcu_preempt_deferred_qs() without the __rcu_read_unlock() middleman. Because this is the invocation during the preemption (as opposed to the invocation just after the resume), at least one of the bits in ->rcu_read_unlock_special.b must be set and ->rcu_read_lock_nesting must be negative. This means that rcu_preempt_need_deferred_qs() must return true, avoiding the early exit from rcu_preempt_deferred_qs(). Thus, rcu_preempt_deferred_qs_irqrestore() will be invoked immediately, as required. This commit therefore simplifies the CONFIG_PREEMPT=y version of rcu_note_context_switch() by removing the "else if" branch of its "if" statement. This change means that all callers that would have invoked rcu_read_unlock_special() followed by rcu_preempt_deferred_qs() will now simply invoke rcu_preempt_deferred_qs(), thus avoiding the rcu_read_unlock_special() middleman when __rcu_read_unlock() is preempted. Cc: rcu@vger.kernel.org Cc: kernel-team@android.com Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-01 14:04:20 -07:00
Paul E. McKenney	87446b4874	rcu: Make rcu_read_unlock_special() checks match raise_softirq_irqoff() Threaded interrupts provide additional interesting interactions between RCU and raise_softirq() that can result in self-deadlocks in v5.0-2 of the Linux kernel. These self-deadlocks can be provoked in susceptible kernels within a few minutes using the following rcutorture command on an 8-CPU system: tools/testing/selftests/rcutorture/bin/kvm.sh --duration 5 --configs "TREE03" --bootargs "threadirqs" Although post-v5.2 RCU commits have at least greatly reduced the probability of these self-deadlocks, this was entirely by accident. Although this sort of accident should be rowdily celebrated on those rare occasions when it does occur, such celebrations should be quickly followed by a principled patch, which is what this patch purports to be. The key point behind this patch is that when in_interrupt() returns true, __raise_softirq_irqoff() will never attempt a wakeup. Therefore, if in_interrupt(), calls to raise_softirq() are both safe and extremely cheap. This commit therefore replaces the in_irq() calls in the "if" statement in rcu_read_unlock_special() with in_interrupt() and simplifies the "if" condition to the following: if (irqs_were_disabled && use_softirq && (in_interrupt() || (exp && !t->rcu_read_unlock_special.b.deferred_qs))) { raise_softirq_irqoff(RCU_SOFTIRQ); } else { /* Appeal to the scheduler. */ } The rationale behind the "if" condition is as follows: 1. irqs_were_disabled: If interrupts are enabled, we should instead appeal to the scheduler so as to let the upcoming irq_enable()/local_bh_enable() do the rescheduling for us. 2. use_softirq: If this kernel isn't using softirq, then raise_softirq_irqoff() will be unhelpful. 3. a. in_interrupt(): If this returns true, the subsequent call to raise_softirq_irqoff() is guaranteed not to do a wakeup, so that call will be both very cheap and quite safe. b. 
Otherwise, if !in_interrupt() the raise_softirq_irqoff() might do a wakeup, which is expensive and, in some contexts, unsafe. i. The "exp" (an expedited RCU grace period is being blocked) says that the wakeup is worthwhile, and: ii. The !.deferred_qs says that scheduler locks cannot be held, so the wakeup will be safe. Backporting this requires considerable care, so no auto-backport, please! Fixes: `05f415715c` ("rcu: Speed up expedited GPs when interrupting RCU reader") Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-01 14:04:20 -07:00
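The "if" condition quoted in the 87446b4874 entry can be restated as a pure C predicate, which makes each numbered rationale item easy to check case by case. This is a sketch only. The function name is hypothetical, and its plain boolean parameters stand in for the kernel state named in the commit: `irqs_were_disabled`, `use_softirq`, in_interrupt(), `exp`, and `->rcu_read_unlock_special.b.deferred_qs`.

```c
#include <stdbool.h>

/*
 * Returns true for "raise_softirq_irqoff(RCU_SOFTIRQ)" and false for
 * "appeal to the scheduler", following the condition in the commit message.
 */
static bool sketch_use_raise_softirq(bool irqs_were_disabled, bool use_softirq,
				     bool in_interrupt, bool exp, bool deferred_qs)
{
	return irqs_were_disabled && use_softirq &&
	       (in_interrupt || (exp && !deferred_qs));
}
```

Items 1 and 2 act as hard gates. Item 3a alone is sufficient once those gates pass. Item 3b requires both an expedited grace period and a clear `deferred_qs` bit.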
Paul E. McKenney	d143b3d1cd	rcu: Simplify rcu_read_unlock_special() deferred wakeups In !use_softirq runs, we clearly cannot rely on raise_softirq() and its lightweight bit setting, so we must instead do some form of wakeup. In the absence of a self-IPI when interrupts are disabled, these wakeups can be delayed until the next interrupt occurs. This means that calling invoke_rcu_core() doesn't actually do any expediting. In this case, it is better to take the "else" clause, which sets the current CPU's resched bits and, if there is an expedited grace period in flight, uses IRQ-work to force the needed self-IPI. This commit therefore removes the "else if" clause that calls invoke_rcu_core(). Reported-by: Scott Wood <swood@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-01 14:04:20 -07:00
Paul E. McKenney	11ca7a9d54	Merge branches 'consolidate.2019.05.28a', 'doc.2019.05.28a', 'fixes.2019.06.13a', 'srcu.2019.05.28a', 'sync.2019.05.28a' and 'torture.2019.05.28a' into HEAD consolidate.2019.05.28a: RCU flavor consolidation cleanups and optimizations. doc.2019.05.28a: Documentation updates. fixes.2019.06.13a: Miscellaneous fixes. srcu.2019.05.28a: SRCU updates. sync.2019.05.28a: RCU-sync flavor consolidation. torture.2019.05.28a: Torture-test updates.	2019-06-19 09:21:46 -07:00
Neeraj Upadhyay	cd6d17b4a4	rcu: Dump specified number of blocked tasks The dump_blkd_tasks() function dumps at most 10 blocked tasks, ignoring the value of the ncheck parameter. This commit therefore substitutes the value of ncheck for the hard-coded value of 10. Because all callers currently pass 10 as the number, this patch does not change behavior, but it is clearly an accident waiting to happen. Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 09:02:57 -07:00
Paul E. McKenney	1bb336443c	rcu: Rename rcu_data's ->deferred_qs to ->exp_deferred_qs The rcu_data structure's ->deferred_qs field is used to indicate that the current CPU is blocking an expedited grace period (perhaps a future one). Given that it is used only for expedited grace periods, its current name is misleading, so this commit renames it to ->exp_deferred_qs. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 08:48:19 -07:00
Paul E. McKenney	0864f057b0	rcu: Use irq_work to get scheduler's attention in clean context When rcu_read_unlock_special() is invoked with interrupts disabled, either outside of an interrupt handler or without RCU_SOFTIRQ in use, from other than the first RCU read-side critical section in the chain, and either while an expedited grace period is in flight or in a NO_HZ_FULL kernel, the end of the grace period can be unduly delayed. The reason for this is that it is not safe to do wakeups in this situation. This commit fixes this problem by using the irq_work subsystem to force a later interrupt handler in a clean environment. Because set_tsk_need_resched(current) and set_preempt_need_resched() are invoked prior to this, the scheduler will force a context switch upon return from this interrupt (though perhaps at the end of any interrupted preempt-disable or BH-disable region of code), which will invoke rcu_note_context_switch() (again in a clean environment), which will in turn give RCU the chance to report the deferred quiescent state. Of course, by then this task might be within another RCU read-side critical section. But that will be detected at that time and reporting will be further deferred to the outermost rcu_read_unlock(). See rcu_preempt_need_deferred_qs() and rcu_preempt_deferred_qs() for more details on the checking. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-25 14:50:49 -07:00
Paul E. McKenney	385b599e8c	rcu: Allow rcu_read_unlock_special() to raise_softirq() if in_irq() When running in an interrupt handler, raise_softirq() and raise_softirq_irqoff() have extremely low overhead: They simply set a bit in a per-CPU mask, which is checked upon exit from that interrupt handler. Therefore, if rcu_read_unlock_special() is invoked within an interrupt handler and RCU_SOFTIRQ is in use, this commit makes use of raise_softirq_irqoff() even if there is no expedited grace period in flight and even if this is not a nohz_full CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-25 14:50:48 -07:00
Paul E. McKenney	25102de65f	rcu: Only do rcu_read_unlock_special() wakeups if expedited Currently, rcu_read_unlock_special() will do wakeups whenever it is safe to do so. However, wakeups are expensive, and they are only really needed when the just-ended RCU read-side critical section is blocking an expedited grace period (in which case speed is of the essence) or on a nohz_full CPU (where it might be a good long time before an interrupt arrives). This commit therefore checks for these conditions, and does the expensive wakeups only if doing so would be useful. Note it can be rather expensive to determine whether or not the current task (as opposed to the current CPU) is blocking the current expedited grace period. Doing so requires traversing the ->blkd_tasks list, which can be quite long. This commit therefore cheats: If the current task is on a given ->blkd_tasks list, and some task on that list is blocking the current expedited grace period, the code assumes that the current task is blocking that expedited grace period. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-25 14:50:48 -07:00
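As a rough illustration of the "cheat" described in commit 25102de65f, the following C sketch models how rcu_read_unlock_special() might decide whether an expensive wakeup is worthwhile. The struct and function names (sketch_exp_node, sketch_exp_task, and so on) are hypothetical simplifications, not the kernel's rcu_node or task_struct. The sketch covers only the two conditions that the commit message names: a possibly blocked expedited grace period and a nohz_full CPU.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an rcu_node structure's ->blkd_tasks bookkeeping. */
struct sketch_exp_node {
	void *exp_tasks;	/* Non-NULL if some queued task blocks the current expedited GP. */
};

/* Hypothetical stand-in for the relevant part of task_struct. */
struct sketch_exp_task {
	struct sketch_exp_node *blocked_node;	/* Node whose ->blkd_tasks list holds us, or NULL. */
};

/*
 * The "cheat": rather than walking the (possibly long) ->blkd_tasks list
 * to see whether this particular task blocks the expedited grace period,
 * assume it does whenever it is queued on a list with any expedited blocker.
 */
static bool sketch_task_maybe_blocks_exp_gp(const struct sketch_exp_task *t)
{
	return t->blocked_node != NULL && t->blocked_node->exp_tasks != NULL;
}

/* Do the expensive wakeup only when it might actually help. */
static bool sketch_should_do_wakeup(const struct sketch_exp_task *t,
				    bool cpu_is_nohz_full)
{
	return sketch_task_maybe_blocks_exp_gp(t) || cpu_is_nohz_full;
}
```

The trade-off is an O(1) check in place of a list walk. The cost is an occasional unnecessary wakeup when some other task on the same list is the real expedited blocker.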
Paul E. McKenney	23634ebc1d	rcu: Check for wakeup-safe conditions in rcu_read_unlock_special() When RCU core processing is offloaded from RCU_SOFTIRQ to the rcuc kthreads, a full and unconditional wakeup is required to initiate RCU core processing. In contrast, when RCU core processing is carried out by RCU_SOFTIRQ, a raise_softirq() suffices. Of course, there are situations where raise_softirq() does a full wakeup, but these do not occur with normal usage of rcu_read_unlock(). The reason that full wakeups can be problematic is that the scheduler sometimes invokes rcu_read_unlock() with its pi or rq locks held, which can of course result in deadlock in CONFIG_PREEMPT=y kernels when rcu_read_unlock() invokes the scheduler. Scheduler invocations can happen in the following situations: (1) The just-ended reader has been subjected to RCU priority boosting, in which case rcu_read_unlock() must deboost, (2) Interrupts were disabled across the call to rcu_read_unlock(), so the quiescent state must be deferred, requiring a wakeup of the rcuc kthread corresponding to the current CPU. Now, the scheduler may hold one of its locks across rcu_read_unlock() only if preemption has been disabled across the entire RCU read-side critical section, which in the days prior to RCU flavor consolidation meant that rcu_read_unlock() never needed to do wakeups. However, this is no longer the case for any but the first rcu_read_unlock() following a condition (e.g., preempted RCU reader) requiring special rcu_read_unlock() attention. For example, an RCU read-side critical section might be preempted, but preemption might be disabled across the rcu_read_unlock(). The rcu_read_unlock() must defer the quiescent state, and therefore leaves the task queued on its leaf rcu_node structure. If a scheduler interrupt occurs, the scheduler might well invoke rcu_read_unlock() with one of its locks held. 
However, the preempted task is still queued, so rcu_read_unlock() will attempt to defer the quiescent state once more. When RCU core processing is carried out by RCU_SOFTIRQ, this works just fine: The raise_softirq() function simply sets a bit in a per-CPU mask and the RCU core processing will be undertaken upon return from interrupt. Not so when RCU core processing is carried out by the rcuc kthread: In this case, the required wakeup can result in deadlock. The initial solution to this problem was to use set_tsk_need_resched() and set_preempt_need_resched() to force a future context switch, which allows rcu_preempt_note_context_switch() to report the deferred quiescent state to RCU's core processing. Unfortunately for expedited grace periods, there can be a significant delay between the call for a context switch and the actual context switch. This commit therefore introduces a ->deferred_qs flag to the task_struct structure's rcu_special structure. This flag is initially false, and is set to true by the first call to rcu_read_unlock() requiring special attention, then reset back to false when the quiescent state is finally reported. Then rcu_read_unlock() attempts full wakeups only when ->deferred_qs is false, that is, on the first rcu_read_unlock() requiring special attention. Note that a chain of RCU readers linked by some other sort of reader may find that a later rcu_read_unlock() is once again able to do a full wakeup, courtesy of an intervening preemption:
    rcu_read_lock(); /* preempted */
    local_irq_disable();
    rcu_read_unlock(); /* Can do full wakeup, sets ->deferred_qs. */
    rcu_read_lock();
    local_irq_enable();
    preempt_disable();
    rcu_read_unlock(); /* Cannot do full wakeup, ->deferred_qs set. */
    rcu_read_lock();
    preempt_enable(); /* preempted, ->deferred_qs reset. */
    local_irq_disable();
    rcu_read_unlock(); /* Can again do full wakeup, sets ->deferred_qs. */
Such linked RCU readers do not yet seem to appear in the Linux kernel, and it is probably best if they don't. However, RCU needs to handle them, and some variations on this theme could make even raise_softirq() unsafe due to the possibility of its doing a full wakeup. This commit therefore also avoids invoking raise_softirq() when the ->deferred_qs flag is set. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2019-05-25 14:50:47 -07:00
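The ->deferred_qs flag from commit 23634ebc1d behaves like a one-shot latch. The following C model captures only that latch. It is hypothetical and is not kernel code; the type and function names are invented for illustration.

- The first special rcu_read_unlock() that must defer its quiescent state may do a full wakeup, and it sets the flag.
- Later calls may not wake anything, and under that commit they may not call raise_softirq() either.
- Reporting the quiescent state clears the flag again.

```c
#include <stdbool.h>

/* Hypothetical model of the per-task ->deferred_qs flag. */
struct sketch_dq_task {
	bool deferred_qs;	/* Set once a deferred QS has already requested a wakeup. */
};

enum sketch_dq_action {
	SKETCH_FULL_WAKEUP,	/* First special unlock: a wakeup is permitted. */
	SKETCH_NO_WAKEUP,	/* Later unlock: scheduler locks might be held, so stay quiet. */
};

/* Special rcu_read_unlock() whose quiescent state must be deferred. */
static enum sketch_dq_action sketch_defer_qs(struct sketch_dq_task *t)
{
	if (t->deferred_qs)
		return SKETCH_NO_WAKEUP;
	t->deferred_qs = true;
	return SKETCH_FULL_WAKEUP;
}

/* The deferred quiescent state has finally been reported (e.g., at a context switch). */
static void sketch_report_qs(struct sketch_dq_task *t)
{
	t->deferred_qs = false;
}
```

Mapped onto the reader chain in that commit message, the three rcu_read_unlock() calls correspond to sketch_defer_qs() returning SKETCH_FULL_WAKEUP, then SKETCH_NO_WAKEUP, then SKETCH_FULL_WAKEUP again once the intervening preemption has reported the quiescent state.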
Sebastian Andrzej Siewior	48d07c04b4	rcu: Enable elimination of Tree-RCU softirq processing Some workloads need to change kthread priority for RCU core processing without affecting other softirq work. This commit therefore introduces the rcutree.use_softirq kernel boot parameter, which moves the RCU core work from softirq to a per-CPU SCHED_OTHER kthread named rcuc. Use of the SCHED_OTHER approach avoids the scalability problems that appeared with the earlier attempt to move RCU core processing from softirq to kthreads. That said, kernels built with RCU_BOOST=y will run the rcuc kthreads at the RCU-boosting priority. Note that rcutree.use_softirq=0 must be specified to move RCU core processing to the rcuc kthreads: rcutree.use_softirq=1 is the default. Reported-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> [ paulmck: Adjust for invoke_rcu_callbacks() only ever being invoked from RCU core processing, in contrast to softirq->rcuc transition in old mainline RCU priority boosting. ] [ paulmck: Avoid wakeups when scheduler might have invoked rcu_read_unlock() while holding rq or pi locks, also possibly fixing a pre-existing latent bug involving raise_softirq()-induced wakeups. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-25 14:50:46 -07:00
Paul E. McKenney	6cdbc07a5a	Merge branches 'consolidate.2019.04.09a', 'doc.2019.03.26b', 'fixes.2019.03.26b', 'srcu.2019.03.26b', 'stall.2019.03.26b' and 'torture.2019.03.26b' into HEAD consolidate.2019.04.09a: Lingering RCU flavor consolidation cleanups. doc.2019.03.26b: Documentation updates. fixes.2019.03.26b: Miscellaneous fixes. srcu.2019.03.26b: SRCU updates. stall.2019.03.26b: RCU CPU stall warning updates. torture.2019.03.26b: Torture-test updates.	2019-04-09 08:08:13 -07:00
Paul E. McKenney	59b73a2768	rcu: Move FAST_NO_HZ stall-warning code to tree_stall.h This commit further consolidates the stall-warning code by moving print_cpu_stall_info() and its helper functions along with zero_cpu_stall_ticks() to kernel/rcu/tree_stall.h. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-03-26 14:40:13 -07:00
Paul E. McKenney	40e69ac7d0	rcu: Inline RCU stall-warning info helper functions The print_cpu_stall_info_begin() and print_cpu_stall_info_end() functions each print a single character onto the console, and are a holdover from a time when RCU CPU stall warning messages could be abbreviated using a long-gone Kconfig option. This commit therefore adds these single characters to already-printed strings in the calling functions, and then eliminates both print_cpu_stall_info_begin() and print_cpu_stall_info_end(). Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-03-26 14:40:13 -07:00
Paul E. McKenney	d87cda5094	rcu: Move rcu_print_task_exp_stall() to tree_exp.h Because expedited CPU stall warnings are contained within the kernel/rcu/tree_exp.h file, rcu_print_task_exp_stall() should live there too. This commit carries out the required code motion. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-03-26 14:40:13 -07:00
Paul E. McKenney	3fc3d1709f	rcu: Move RCU CPU stall-warning code out of tree_plugin.h The RCU CPU stall-warning code for normal grace periods is currently scattered across two files, due to earlier Tiny RCU support for RCU CPU stall warnings and for old Kconfig options that have long since been retired. Given that it is hard for the lead RCU maintainer to find relevant stall-warning code, it would be good to consolidate it. This commit continues this process by moving stall-warning code from kernel/rcu/tree_plugin.h to a new kernel/rcu/tree_stall.h file. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-03-26 14:40:13 -07:00
Paul E. McKenney	add0d37b4f	rcu: Correct READ_ONCE()/WRITE_ONCE() for ->rcu_read_unlock_special The task_struct structure's ->rcu_read_unlock_special field is only ever read or written by the owning task, but it is accessed both at process and interrupt levels. It may therefore be accessed using plain reads and writes while interrupts are disabled, but must be accessed using READ_ONCE() and WRITE_ONCE() or better otherwise. This commit makes a few adjustments to align with this discipline. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-03-26 14:38:38 -07:00
Paul E. McKenney	a2badefa85	rcu: Eliminate redundant NULL-pointer check Because rcu_wake_cond() checks for a null task_struct pointer, there is no need for its callers to do so. This commit eliminates the redundant check. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-03-26 14:38:38 -07:00
Paul E. McKenney	497e42600b	rcu: Report error for bad rcu_nocbs= parameter values This commit prints a console message when cpulist_parse() reports a bad list of CPUs, and sets all CPUs' bits in that case. The reason for setting all CPUs' bits is that this is the safe(r) choice for real-time workloads, which would normally be the ones using the rcu_nocbs= kernel boot parameter. Either way, later RCU console log messages list the actual set of CPUs whose RCU callbacks will be offloaded. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-03-26 14:37:49 -07:00
Paul E. McKenney	da8739f23f	rcu: Allow rcu_nocbs= to specify all CPUs Currently, the rcu_nocbs= kernel boot parameter requires that a specific list of CPUs be specified, and has no way to say "all of them". As noted by user RavFX in a comment to Phoronix topic 1002538, this is an inconvenient side effect of the removal of the RCU_NOCB_CPU_ALL Kconfig option. This commit therefore enables the rcu_nocbs= kernel boot parameter to be given the string "all", as in "rcu_nocbs=all" to specify that all CPUs on the system are to have their RCU callbacks offloaded. Another approach would be to make cpulist_parse() check for "all", but there are uses of cpulist_parse() that do other checking, which could conflict with an "all". This commit therefore focuses on the specific use of cpulist_parse() in rcu_nocb_setup(). Just a note to other people who would like changes to Linux-kernel RCU: If you send your requests to me directly, they might get fixed somewhat faster. RavFX's comment was posted on January 22, 2018 and I first saw it on March 5, 2019. And the only reason that I found it -at- -all- was that I was looking for projects using RCU, and my search engine showed me that Phoronix comment quite by accident. Your choice, though! ;-) Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-03-26 14:37:49 -07:00
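The rcu_nocbs= handling described in commits 497e42600b and da8739f23f can be sketched as a small user-space parser. It follows three rules:

- The string "all" selects every CPU.
- A well-formed list such as "0-3,6" selects exactly those CPUs.
- A malformed list prints a message and falls back to all CPUs, which the commit calls the safe(r) choice for real-time workloads.

This is a hypothetical illustration only. sketch_cpulist_parse() is a deliberately simplified stand-in, not the kernel's cpulist_parse(). The printed message is invented. CPUs are limited to a 64-bit mask.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Mask of CPUs 0..nr_cpus-1, capped at 64 CPUs for this sketch. */
static uint64_t sketch_all_cpus(unsigned int nr_cpus)
{
	return nr_cpus >= 64 ? UINT64_MAX : (UINT64_C(1) << nr_cpus) - 1;
}

/* Simplified stand-in for cpulist_parse(): accepts "N" and "N-M" items separated by commas. */
static bool sketch_cpulist_parse(const char *s, unsigned int nr_cpus, uint64_t *mask)
{
	*mask = 0;
	while (*s) {
		char *end;
		unsigned long lo = strtoul(s, &end, 10), hi;

		if (end == s)
			return false;			/* Expected a CPU number. */
		hi = lo;
		if (*end == '-') {
			s = end + 1;
			hi = strtoul(s, &end, 10);
			if (end == s)
				return false;		/* Range missing its upper bound. */
		}
		if (lo > hi || hi >= nr_cpus || hi >= 64)
			return false;			/* Bad range or nonexistent CPU. */
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			*mask |= UINT64_C(1) << cpu;
		if (*end == ',')
			end++;
		else if (*end != '\0')
			return false;			/* Trailing garbage. */
		s = end;
	}
	return true;
}

/* Hypothetical model of the rcu_nocbs= boot-parameter handling. */
static uint64_t sketch_rcu_nocb_setup(const char *str, unsigned int nr_cpus)
{
	uint64_t mask;

	if (strcmp(str, "all") == 0)
		return sketch_all_cpus(nr_cpus);
	if (!sketch_cpulist_parse(str, nr_cpus, &mask)) {
		fprintf(stderr, "sketch: bad rcu_nocbs= list, offloading all CPUs\n");
		return sketch_all_cpus(nr_cpus);	/* The safe(r) choice. */
	}
	return mask;
}
```

For example, on an eight-CPU system "0-3,6" yields 0x4f, while a typo such as "9" or "3-1" yields 0xff, meaning every CPU is offloaded.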
Paul E. McKenney	884157cef0	rcu: Make exit_rcu() handle non-preempted RCU readers The purpose of exit_rcu() is to handle cases where buggy code causes a task to exit within an RCU read-side critical section. It currently does that in the case where said RCU read-side critical section was preempted at least once, but fails to handle cases where preemption did not occur. This case needs to be handled because otherwise the final context switch away from the exiting task will incorrectly behave as if task exit were instead a preemption of an RCU read-side critical section, and will therefore queue the exiting task. The exiting task will have exited, and thus won't ever execute rcu_read_unlock(), which means that it will remain queued forever, blocking all subsequent grace periods, and eventually resulting in OOM. Although this is arguably better than letting grace periods proceed and having a later rcu_read_unlock() access the now-freed task structure that once belonged to the exiting task, it would obviously be better to correctly handle this case. This commit therefore sets ->rcu_read_lock_nesting to 1 in that case, so that the subsequent call to __rcu_read_unlock() causes the exiting task to exit its dangling RCU read-side critical section. Note that deferred quiescent states need not be considered. The reason is that removing the task from the ->blkd_tasks[] list in the call to rcu_preempt_deferred_qs() handles the per-task component of any deferred quiescent state, and all other components of any deferred quiescent state are associated with the CPU, which isn't going anywhere until some later CPU-hotplug operation, which will report any remaining deferred quiescent states from within the rcu_report_dead() function. Note also that negative values of ->rcu_read_lock_nesting need not be considered. 
First, these won't show up in exit_rcu() unless there is a serious bug in RCU, and second, setting ->rcu_read_lock_nesting sets the state so that the RCU read-side critical section will be exited normally. Again, this code has no effect unless there has been some prior bug that prevents a task from leaving an RCU read-side critical section before exiting. Furthermore, there have been no reports of the bug fixed by this commit appearing in production. This commit is therefore absolutely -not- recommended for backporting to -stable. Reported-by: ABHISHEK DUBEY <dabhishek@iisc.ac.in> Reported-by: BHARATH Y MOURYA <bharathm@iisc.ac.in> Reported-by: Aravinda Prasad <aravinda@iisc.ac.in> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Tested-by: ABHISHEK DUBEY <dabhishek@iisc.ac.in>	2019-03-26 14:37:49 -07:00
Paul E. McKenney	e7ffb4eb9a	Merge branches 'doc.2019.01.26a', 'fixes.2019.01.26a', 'sil.2019.01.26a', 'spdx.2019.02.09a', 'srcu.2019.01.26a' and 'torture.2019.01.26a' into HEAD doc.2019.01.26a: Documentation updates. fixes.2019.01.26a: Miscellaneous fixes. sil.2019.01.26a: Removal of a few more spin_is_locked() instances. spdx.2019.02.09a: Add SPDX identifiers to RCU files. srcu.2019.01.26a: SRCU updates. torture.2019.01.26a: Torture-test updates.	2019-02-09 08:47:52 -08:00
Paul E. McKenney	22e4092531	rcu/tree: Convert to SPDX license identifier Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> [ paulmck: Update .h file SPDX comment format per Joe Perches. ] Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2019-02-09 08:44:10 -08:00
Paul E. McKenney	c98cac603f	rcu: Rename rcu_check_callbacks() to rcu_sched_clock_irq() The name rcu_check_callbacks() arguably made sense back in the early 2000s when RCU was quite a bit simpler than it is today, but it has become quite misleading, especially with the advent of dyntick-idle and NO_HZ_FULL. The rcu_check_callbacks() function is RCU's hook into the scheduling-clock interrupt, and is now but one of many ways that callbacks get promoted to invocable state. This commit therefore changes the name to rcu_sched_clock_irq(), which is the same number of characters and clearly indicates this function's relation to the rest of the Linux kernel. In addition, for the sake of consistency, rcu_flavor_check_callbacks() is also renamed to rcu_flavor_sched_clock_irq(). While in the area, the header comments for both functions are reworked. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-01-25 15:35:21 -08:00
Paul E. McKenney	7a968bb26a	Merge branches 'consolidate.2019.01.26a' and 'fwd.2019.01.26a' into HEAD consolidate.2019.01.26a: RCU flavor consolidation cleanups. fwd.2019.01.26a: RCU grace-period forward-progress fixes.	2019-01-25 15:32:01 -08:00
Paul E. McKenney	a9fefdb257	rcu: Update NOCB comments This commit updates a few obsolete comments in the RCU callback-offload code. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-01-25 15:29:57 -08:00
Paul E. McKenney	f7e972ee12	rcu: Move rcu_cpu_has_work to rcu_data structure Given that RCU has a perfectly good per-CPU rcu_data structure, most per-CPU quantities should be stored there. This commit therefore moves the rcu_cpu_has_work per-CPU variable to the rcu_data structure. This also makes this variable unconditionally present, which should be acceptable given the memory reduction due to the RCU flavor consolidation and also due to simplifications this will enable. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-01-25 15:29:56 -08:00
Paul E. McKenney	8b4d0f4858	rcu: Remove unused rcu_cpu_kthread_loops per-CPU variable The rcu_cpu_kthread_loops variable used to provide debugfs information, but is no longer used. This commit therefore removes it. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-01-25 15:29:55 -08:00
Paul E. McKenney	6ffdde28b7	rcu: Move rcu_cpu_kthread_status to rcu_data structure Given that RCU has a perfectly good per-CPU rcu_data structure, most per-CPU quantities should be stored there. This commit therefore moves the rcu_cpu_kthread_status per-CPU variable to the rcu_data structure. This also makes this variable unconditionally present, which should be acceptable given the memory reduction due to the RCU flavor consolidation and also due to simplifications this will enable. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-01-25 15:29:54 -08:00
Paul E. McKenney	37f62d7cf0	rcu: Move rcu_cpu_kthread_task to rcu_data structure Given that RCU has a perfectly good per-CPU rcu_data structure, most per-CPU quantities should be stored there. This commit therefore moves the rcu_cpu_kthread_task per-CPU variable to the rcu_data structure. This also makes this variable unconditionally present, which should be acceptable given the memory reduction due to the RCU flavor consolidation and also due to simplifications this will enable. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-01-25 15:29:53 -08:00
Paul E. McKenney	260e1e4fd8	rcu: Discard separate per-CPU callback counts Back when there were multiple flavors of RCU, it was necessary to separately count lazy and non-lazy callbacks for each CPU. These counts were used in CONFIG_RCU_FAST_NO_HZ kernels to determine how long a newly idle CPU should be allowed to sleep before handling its RCU callbacks. But now that there is only one flavor, the callback counts for a given CPU's sole rcu_data structure are the counts for that CPU. This commit therefore removes the rcu_data structure's ->nonlazy_posted and ->nonlazy_posted_snap fields, the rcu_idle_count_callbacks_posted() and rcu_cpu_has_callbacks() functions, repurposes the rcu_data structure's ->all_lazy field to record the laziness state at the beginning of the latest idle sojourn, and modifies CONFIG_RCU_FAST_NO_HZ RCU CPU stall warnings accordingly. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-01-25 15:28:30 -08:00
Paul E. McKenney	e5bc3af773	rcu: Consolidate PREEMPT and !PREEMPT synchronize_rcu() Now that rcu_blocking_is_gp() makes the correct immediate-return decision for both PREEMPT and !PREEMPT, a single implementation of synchronize_rcu() will work correctly under both configurations. This commit therefore eliminates a few lines of code by consolidating the two implementations of synchronize_rcu(). Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-01-25 15:28:28 -08:00
Paul E. McKenney	c46f497a61	rcu: Inline rcu_kthread_do_work() into its sole remaining caller The rcu_kthread_do_work() function has a single-line body and only one remaining caller. This commit therefore saves a few lines of code by inlining rcu_kthread_do_work() into its sole remaining caller. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-01-25 15:28:25 -08:00
Paul E. McKenney	ad368d15b0	rcu: Rename and comment changes due to only one rcuo kthread per CPU Given RCU flavor consolidation, the name rcu_spawn_all_nocb_kthreads() is quite misleading. It no longer ever creates more than one kthread, and it does so only for the specified CPU. This commit therefore changes this name to the more descriptive rcu_spawn_cpu_nocb_kthread(), and also fixes up a similar issue in its header comment while in the area. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-01-25 15:28:23 -08:00
Paul E. McKenney	903ee83d91	rcu: Account for nocb-CPU callback counts in RCU CPU stall warnings The RCU CPU stall warnings print an estimate of the total number of RCU callbacks queued in the system, but this estimate leaves out the callbacks queued for nocbs CPUs. This commit therefore introduces rcu_get_n_cbs_cpu(), which gives an accurate callback estimate for both nocbs and normal CPUs, and uses this new function as needed. This commit also introduces a rcu_get_n_cbs_nocb_cpu() helper function that returns the number of callbacks for nocbs CPUs or zero otherwise, and also uses this function in place of direct access to ->nocb_q_count while in the area (fewer characters, you see). Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2018-12-01 12:45:37 -08:00
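To make the split in commit 903ee83d91 concrete, here is a hypothetical C model of its two helpers. It assumes that an offloaded (nocbs) CPU keeps its callbacks counted in something like ->nocb_q_count, while a normal CPU counts them on its ->cblist. The struct and function names are invented stand-ins for rcu_get_n_cbs_nocb_cpu() and rcu_get_n_cbs_cpu(), not their actual implementations.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant rcu_data fields. */
struct sketch_rcu_data {
	bool nocb;		/* Are this CPU's callbacks offloaded? */
	long cblist_len;	/* Callbacks on the normal ->cblist. */
	long nocb_q_count;	/* Callbacks queued for offloaded processing. */
};

/* Like rcu_get_n_cbs_nocb_cpu(): the nocbs count, or zero for a normal CPU. */
static long sketch_n_cbs_nocb_cpu(const struct sketch_rcu_data *rdp)
{
	return rdp->nocb ? rdp->nocb_q_count : 0;
}

/* Like rcu_get_n_cbs_cpu(): an accurate count for either kind of CPU. */
static long sketch_n_cbs_cpu(const struct sketch_rcu_data *rdp)
{
	return rdp->nocb ? sketch_n_cbs_nocb_cpu(rdp) : rdp->cblist_len;
}

/* System-wide estimate of the kind printed by an RCU CPU stall warning. */
static long sketch_total_cbs(const struct sketch_rcu_data *rdp, int ncpus)
{
	long total = 0;

	for (int cpu = 0; cpu < ncpus; cpu++)
		total += sketch_n_cbs_cpu(&rdp[cpu]);
	return total;
}
```

Take a two-CPU system whose normal CPU has 5 callbacks and whose offloaded CPU has 7. Summing only ->cblist lengths gives 5, whereas sketch_total_cbs() gives 12.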
Paul E. McKenney	5ab7ab8362	rcutorture: Affinity forward-progress test to avoid housekeeping CPUs This commit affinities the forward-progress tests to avoid hogging a housekeeping CPU on the theory that the offloaded callbacks will be running on those housekeeping CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> [ paulmck: Fix NULL-pointer issue located by kbuild test robot. ] Tested-by: Rong Chen <rong.a.chen@intel.com>	2018-12-01 12:45:34 -08:00
Paul E. McKenney	eaaf055f27	Merge branches 'bug.2018.11.12a', 'consolidate.2018.12.01a', 'doc.2018.11.12a', 'fixes.2018.11.12a', 'initrd.2018.11.08b', 'sil.2018.11.12a' and 'srcu.2018.11.27a' into HEAD bug.2018.11.12a: Get rid of BUG_ON() and friends consolidate.2018.12.01a: Continued RCU flavor-consolidation cleanup doc.2018.11.12a: Documentation updates fixes.2018.11.12a: Miscellaneous fixes initrd.2018.11.08b: Automate creation of rcutorture initrd sil.2018.11.12a: Remove more spin_unlock_wait() calls	2018-12-01 12:43:16 -08:00
Paul E. McKenney	5f1a6ef374	rcu: Avoid signed integer overflow in rcu_preempt_deferred_qs() Subtracting INT_MIN can be interpreted as unconditional signed integer overflow, which according to the C standard is undefined behavior. Therefore, kernel build arguments notwithstanding, it would be good to future-proof the code. This commit therefore substitutes INT_MAX for INT_MIN in order to avoid undefined behavior. While in the neighborhood, this commit also creates some meaningful names for INT_MAX and friends in order to improve readability, as suggested by Joel Fernandes. Reported-by: Ran Rozenstein <ranro@mellanox.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2018-11-12 09:03:59 -08:00
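The overflow that commit 5f1a6ef374 avoids can be shown with a small sketch. For any non-negative nesting count x, x - INT_MIN equals x + 2^31, which exceeds INT_MAX and is therefore undefined behavior. In contrast, x - INT_MAX lies between -INT_MAX and 0 for every x in [0, INT_MAX], so it cannot overflow, and adding INT_MAX back restores x exactly. The macro and function names below are hypothetical and are not the "meaningful names" the commit actually introduced. The sketch assumes, as its comment notes, that the bias is applied only to a non-negative nesting count.

```c
#include <limits.h>

/*
 * Hypothetical name for the bias: large enough to drive any non-negative
 * nesting count negative, yet subtracting it from such a count never overflows.
 */
#define SKETCH_NEST_BIAS INT_MAX

/*
 * Old style (not executed here): nesting - INT_MIN == nesting + 2^31,
 * which is signed overflow, hence undefined behavior, for nesting >= 0.
 */

/* Temporarily force a non-negative nesting count negative. Requires nesting >= 0. */
static int sketch_bias_nesting(int nesting)
{
	return nesting - SKETCH_NEST_BIAS;	/* Result lies in [-INT_MAX, 0]. */
}

/* Undo sketch_bias_nesting(). */
static int sketch_unbias_nesting(int biased)
{
	return biased + SKETCH_NEST_BIAS;	/* Result lies in [0, INT_MAX]. */
}
```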
Paul E. McKenney	117f683c6e	rcu: Replace this_cpu_ptr() with __this_cpu_read() Because __this_cpu_read() can be lighter weight than equivalent uses of this_cpu_ptr(), this commit replaces the latter with the former. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2018-11-12 09:03:59 -08:00
Paul E. McKenney	05f415715c	rcu: Speed up expedited GPs when interrupting RCU reader In PREEMPT kernels, an expedited grace period might send an IPI to a CPU that is executing an RCU read-side critical section. In that case, it would be nice if the rcu_read_unlock() directly interacted with the RCU core code to immediately report the quiescent state. And this does happen in the case where the reader has been preempted. But it would also be a nice performance optimization if immediate reporting also happened in the preemption-free case. This commit therefore adds an ->exp_hint field to the task_struct structure's ->rcu_read_unlock_special field. The IPI handler sets this hint when it has interrupted an RCU read-side critical section, and this causes the outermost rcu_read_unlock() call to invoke rcu_read_unlock_special(), which, if preemption is enabled, reports the quiescent state immediately. If preemption is disabled, then the report is required to be deferred until preemption (or bottom halves or interrupts or whatever) is re-enabled. Because this is a hint, it does nothing for more complicated cases. For example, if the IPI interrupts an RCU reader, but interrupts are disabled across the rcu_read_unlock(), but another rcu_read_lock() is executed before interrupts are re-enabled, the hint will already have been cleared. If you do crazy things like this, reporting will be deferred until some later RCU_SOFTIRQ handler, context switch, cond_resched(), or similar. Reported-by: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>	2018-11-12 09:03:59 -08:00
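The ->exp_hint flow from commit 05f415715c can be modeled in a few lines. The IPI handler sets the hint only if it interrupted an RCU reader. The outermost rcu_read_unlock() then either reports the quiescent state at once (preemption enabled) or defers it (preemption disabled). This is a hypothetical single-task model with invented names. It ignores the real ->rcu_read_unlock_special union, interrupts, and bottom halves.

```c
#include <stdbool.h>

/* Hypothetical, simplified per-task state. */
struct sketch_hint_task {
	int nesting;		/* rcu_read_lock() nesting depth. */
	bool exp_hint;		/* Set by the expedited-GP IPI handler. */
	bool qs_reported;	/* Quiescent state reported immediately. */
	bool qs_deferred;	/* Report deferred until preemption is re-enabled. */
};

/* Expedited IPI handler: leave a hint only if it interrupted an RCU reader. */
static void sketch_exp_ipi(struct sketch_hint_task *t)
{
	if (t->nesting > 0)
		t->exp_hint = true;
}

/* rcu_read_unlock(): only the outermost unlock acts on the hint. */
static void sketch_read_unlock(struct sketch_hint_task *t, bool preempt_enabled)
{
	if (--t->nesting != 0 || !t->exp_hint)
		return;			/* Not outermost, or nothing special to do. */
	t->exp_hint = false;
	if (preempt_enabled)
		t->qs_reported = true;	/* Report the quiescent state right away. */
	else
		t->qs_deferred = true;	/* Must wait until preemption is re-enabled. */
}
```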
Paul E. McKenney	9213784b48	rcu: Eliminate BUG_ON() for kernel/rcu/tree_plugin.h The tree_plugin.h file has a number of calls to BUG_ON(), which panics the kernel, which is not a good strategy for devices (like embedded) that don't have a way to capture console output. This commit therefore converts these BUG_ON() calls to WARN_ON_ONCE() and WARN_ONCE(). Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> [ paulmck: Fix typo: s/rcuo/rcub/. ]	2018-11-12 08:15:16 -08:00
Paul E. McKenney	dc5a4f2932	rcu: Switch ->dynticks to rcu_data structure, remove rcu_dynticks This commit moves ->dynticks from the rcu_dynticks structure to the rcu_data structure, replacing the field of the same name. It also updates the code to access ->dynticks from the rcu_data structure and to use the rcu_data structure rather than following the now-gone ->dynticks field to the now-gone rcu_dynticks structure. While in the area, this commit also fixes up comments. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:52 -07:00
Paul E. McKenney	4c5273bf2b	rcu: Switch dyntick nesting counters to rcu_data structure This commit removes ->dynticks_nesting and ->dynticks_nmi_nesting from the rcu_dynticks structure and updates the code to access them from the rcu_data structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:51 -07:00
Paul E. McKenney	2dba13f0b6	rcu: Switch urgent quiescent-state requests to rcu_data structure This commit removes ->rcu_need_heavy_qs and ->rcu_urgent_qs from the rcu_dynticks structure and updates the code to access them from the rcu_data structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:50 -07:00
Paul E. McKenney	c458a89e96	rcu: Switch lazy counts to rcu_data structure This commit removes ->all_lazy, ->nonlazy_posted and ->nonlazy_posted_snap from the rcu_dynticks structure and updates the code to access them from the rcu_data structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:49 -07:00
Paul E. McKenney	5998a75adb	rcu: Switch last accelerate/advance to rcu_data structure This commit removes ->last_accelerate and ->last_advance_all from the rcu_dynticks structure and updates the code to access them from the rcu_data structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:48 -07:00
Paul E. McKenney	0fd79e7521	rcu: Switch ->tick_nohz_enabled_snap to rcu_data structure This commit removes ->tick_nohz_enabled_snap from the rcu_dynticks structure and updates the code to access it from the rcu_data structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:47 -07:00
Paul E. McKenney	fced9c8cfe	rcu: Avoid resched_cpu() when rescheduling the current CPU The resched_cpu() interface is quite handy, but it does acquire the specified CPU's runqueue lock, which does not come for free. This commit therefore substitutes the following when directing resched_cpu() at the current CPU: set_tsk_need_resched(current); set_preempt_need_resched(); Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org>	2018-08-30 16:03:45 -07:00
Paul E. McKenney	d3052109c0	rcu: More aggressively enlist scheduler aid for nohz_full CPUs Because nohz_full CPUs can leave the scheduler-clock interrupt disabled even when in kernel mode, RCU cannot rely on rcu_check_callbacks() to enlist the scheduler's aid in extracting a quiescent state from such CPUs. This commit therefore more aggressively uses resched_cpu() on nohz_full CPUs that fail to pass through a quiescent state in a timely manner. By default, the resched_cpu() beating starts 300 milliseconds into the quiescent state. While in the neighborhood, add a ->last_fqs_resched field to the rcu_data structure in order to rate-limit resched_cpu() calls from the RCU grace-period kthread. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:44 -07:00
Paul E. McKenney	c06aed0e31	rcu: Compute jiffies_till_sched_qs from other kernel parameters The jiffies_till_sched_qs value is used to determine how old a grace period must be before RCU enlists the help of the scheduler to force a quiescent state on the holdout CPU. Currently, this defaults to HZ/10 regardless of system size and may be set only at boot time. This can be a problem for very large systems, because if the values of the jiffies_till_first_fqs and jiffies_till_next_fqs kernel parameters are left at their defaults, they are calculated to increase as the number of CPUs actually configured on the system increases. Thus, on a sufficiently large system, RCU would enlist the help of the scheduler before the grace-period kthread had a chance to scan for idle CPUs, which wastes CPU time. This commit therefore allows jiffies_till_sched_qs to be set, if desired, but if left as default, computes it as jiffies_till_first_fqs plus twice jiffies_till_next_fqs, thus allowing three force-quiescent-state scans for idle CPUs. This scales with the number of CPUs, providing sensible default values. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:43 -07:00
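The default computation in commit c06aed0e31 is plain arithmetic. The first force-quiescent-state scan happens after jiffies_till_first_fqs, and the next two after one jiffies_till_next_fqs interval each. Their sum is therefore the earliest point at which the scheduler's help is requested. A minimal sketch follows. It assumes a hypothetical sentinel meaning "not set on the kernel command line"; the kernel's own representation of "unset" may differ.

```c
#include <limits.h>

/* Hypothetical sentinel: jiffies_till_sched_qs was not given at boot. */
#define SKETCH_UNSET ULONG_MAX

/* Honor an explicit setting; otherwise allow three FQS scans first. */
static unsigned long sketch_till_sched_qs(unsigned long user_value,
					  unsigned long till_first_fqs,
					  unsigned long till_next_fqs)
{
	if (user_value != SKETCH_UNSET)
		return user_value;
	return till_first_fqs + 2 * till_next_fqs;
}
```

With jiffies_till_first_fqs and jiffies_till_next_fqs both at 3 jiffies, the default works out to 9 jiffies. Because both inputs grow with the CPU count, so does the result.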
Paul E. McKenney	7e28c5af4e	rcu: Eliminate ->rcu_qs_ctr from the rcu_dynticks structure The ->rcu_qs_ctr counter was intended to allow providing a lightweight report of a quiescent state to all RCU flavors. But now that there is only one flavor of RCU in any one running kernel, there is no point in having this feature. This commit therefore removes the ->rcu_qs_ctr field from the rcu_dynticks structure and the ->rcu_qs_ctr_snap field from the rcu_data structure. This results in the "rqc" option to the rcu_fqs trace event no longer being used, so this commit also removes the "rqc" description from the header comment. While in the neighborhood, this commit also causes the forward-progress request .rcu_need_heavy_qs to be set one jiffies_till_sched_qs interval later in the grace period than the first setting of .rcu_urgent_qs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:42 -07:00
Paul E. McKenney	dd46a7882c	rcu: Inline _rcu_barrier() into its sole remaining caller Because rcu_barrier() is a one-line wrapper function for _rcu_barrier() and because nothing else calls _rcu_barrier(), this commit inlines _rcu_barrier() into rcu_barrier(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:39 -07:00
Paul E. McKenney	395a2f097e	rcu: Define rcu_all_qs() only in !PREEMPT builds Now that rcu_all_qs() is used only in !PREEMPT builds, move it to tree_plugin.h so that it is defined only in those builds. This in turn means that rcu_momentary_dyntick_idle() is only used in !PREEMPT builds, but it is simply marked __maybe_unused in order to keep it near the rest of the dyntick-idle code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:37 -07:00
Paul E. McKenney	0ae86a2726	rcu: Clean up flavor-related definitions and comments in tree_plugin.h Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:35 -07:00
Paul E. McKenney	4e95020cdd	rcu: Inline increment_cpu_stall_ticks() into its sole caller Consolidation of the RCU flavors into one makes increment_cpu_stall_ticks() a trivial one-line function with only one caller. This commit therefore inlines it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:23 -07:00
Paul E. McKenney	b97d23c51c	rcu: Remove for_each_rcu_flavor() flavor-traversal macro Now that there is only ever a single flavor of RCU in a given kernel build, there isn't a whole lot of point in having a flavor-traversal macro. This commit therefore removes it and converts calls to it to straightline code, inlining trivial functions as appropriate. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:18 -07:00
Paul E. McKenney	564a9ae604	rcu: Remove last non-flavor-traversal rsp local variable from tree_plugin.h This commit removes the last non-flavor-traversal rsp local variable from kernel/rcu/tree_plugin.h in favor of &rcu_state. The flavor-traversal locals will be removed with the removal of flavor traversal. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:17 -07:00
Paul E. McKenney	88d1bead85	rcu: Remove rcu_data structure's ->rsp field Now that there is only one rcu_state structure, there is no need for the rcu_data structure to indicate which it corresponds to. This commit therefore removes the rcu_data structure's ->rsp field, replacing all remaining uses of it with &rcu_state. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:17 -07:00
Paul E. McKenney	aedf4ba984	rcu: Remove rsp parameter from rcu_node tree accessor macros There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's rcu_node tree's accessor macros. This commit therefore removes the rsp parameter from those macros in kernel/rcu/rcu.h, and removes some now-unused rsp local variables while in the area. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:16 -07:00
Paul E. McKenney	63d4c8c979	rcu: Remove rsp parameter from expedited grace-period functions There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from the code in kernel/rcu/tree_exp.h, and removes all of the rsp local variables while in the area. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:14 -07:00
Paul E. McKenney	4580b0541b	rcu: Remove rsp parameter from no-CBs CPU functions There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_nocb_cpu_needs_barrier(), rcu_spawn_one_nocb_kthread(), rcu_organize_nocb_kthreads(), and rcu_nohz_full_cpu(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:13 -07:00
Paul E. McKenney	b21ebed951	rcu: Remove rsp parameter from print_cpu_stall_info() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from print_cpu_stall_info(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:12 -07:00
Paul E. McKenney	6dbfdc1409	rcu: Remove rsp parameter from rcu_spawn_one_boost_kthread() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_spawn_one_boost_kthread(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:11 -07:00
Paul E. McKenney	81ab59a3ad	rcu: Remove rsp parameter from dump_blkd_tasks() and friend There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from dump_blkd_tasks() and rcu_preempt_blocked_readers_cgp(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:10 -07:00
Paul E. McKenney	a2887cd85f	rcu: Remove rsp parameter from rcu_print_detail_task_stall() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_print_detail_task_stall(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:09 -07:00
Paul E. McKenney	5bb5d09cc4	rcu: Remove rsp parameter from rcu_do_batch() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_do_batch(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:05 -07:00
Paul E. McKenney	15cabdffbb	rcu: Remove rsp parameter from note_gp_changes() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from note_gp_changes(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:02 -07:00
Paul E. McKenney	02f501423d	rcu: Remove rsp parameter from rcu_accelerate_cbs() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_accelerate_cbs(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:00 -07:00
Paul E. McKenney	532c00c97f	rcu: Remove rsp parameter from rcu_gp_kthread_wake() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_gp_kthread_wake(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:59 -07:00
Paul E. McKenney	336a4f6c45	rcu: Remove rsp parameter from rcu_get_root() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_get_root(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:55 -07:00
Paul E. McKenney	de8e87305a	rcu: Remove rsp parameter from rcu_gp_in_progress() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_gp_in_progress(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:54 -07:00
Paul E. McKenney	139ad4da5a	rcu: Remove rsp parameter from rcu_report_unblock_qs_rnp() There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_report_unblock_qs_rnp(), which is particularly appropriate in this case given that this parameter is no longer used. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:53 -07:00
Paul E. McKenney	2280ee5a7d	rcu: Remove rcu_data_p pointer to default rcu_data structure The rcu_data_p pointer references the default set of per-CPU rcu_data structures, that is, those that call_rcu() uses, as opposed to call_rcu_bh() and sometimes call_rcu_sched(). But there is now only one set of per-CPU rcu_data structures, so that one set is by definition the default, which means that the rcu_data_p pointer no longer serves any useful purpose. This commit therefore removes it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:51 -07:00
Paul E. McKenney	16fc9c600b	rcu: Remove rcu_state_p pointer to default rcu_state structure The rcu_state_p pointer references the default rcu_state structure, that is, the one that call_rcu() uses, as opposed to call_rcu_bh() and sometimes call_rcu_sched(). But there is now only one rcu_state structure, so that one structure is by definition the default, which means that the rcu_state_p pointer no longer serves any useful purpose. This commit therefore removes it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:50 -07:00
Paul E. McKenney	da1df50d16	rcu: Remove rcu_state structure's ->rda field The rcu_state structure's ->rda field was used to find the per-CPU rcu_data structures corresponding to that rcu_state structure. But now there is only one rcu_state structure (creatively named "rcu_state") and one set of per-CPU rcu_data structures (creatively named "rcu_data"). Therefore, uses of the ->rda field can always be replaced by "rcu_data", and this commit makes that change and removes the ->rda field. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:49 -07:00
Paul E. McKenney	45975c7d21	rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds Now that RCU-preempt knows about preemption disabling, its implementation of synchronize_rcu() works for synchronize_sched(), and likewise for the other RCU-sched update-side API members. This commit therefore confines the RCU-sched update-side code to CONFIG_PREEMPT=n builds, and defines RCU-sched's update-side API members in terms of those of RCU-preempt. This means that any given build of the Linux kernel has only one update-side flavor of RCU, namely RCU-preempt for CONFIG_PREEMPT=y builds and RCU-sched for CONFIG_PREEMPT=n builds. This in turn means that kernels built with CONFIG_RCU_NOCB_CPU=y have only one rcuo kthread per CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com>	2018-08-30 16:02:45 -07:00
Paul E. McKenney	2bbfc25b09	rcu: Drop "wake" parameter from rcu_report_exp_rdp() The rcu_report_exp_rdp() function is always invoked with its "wake" argument set to "true", so this commit drops this parameter. The only potential call site that would use "false" is in the code driving the expedited grace period, and that code uses rcu_report_exp_cpu_mult() instead, which therefore retains its "wake" parameter. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:43 -07:00
Paul E. McKenney	65cfe3583b	rcu: Define RCU-bh update API in terms of RCU Now that the main RCU API knows about softirq disabling and softirq's quiescent states, the RCU-bh update code can be dispensed with. This commit therefore removes the RCU-bh update-side implementation and defines RCU-bh's update-side API in terms of that of either RCU-preempt or RCU-sched, depending on the setting of the CONFIG_PREEMPT Kconfig option. In kernels built with CONFIG_RCU_NOCB_CPU=y this has the knock-on effect of reducing by one the number of rcuo kthreads per CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:40 -07:00
Paul E. McKenney	ba1c64c272	rcu: Report expedited grace periods at context-switch time This commit reduces the latency of expedited RCU grace periods by reporting a quiescent state for the CPU at context-switch time. In CONFIG_PREEMPT=y kernels, if the outgoing task is still within an RCU read-side critical section (and thus still blocking some grace period, perhaps including this expedited grace period), then that task will already have been placed on one of the leaf rcu_node structures' ->blkd_tasks list. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:38 -07:00
Paul E. McKenney	d28139c4e9	rcu: Apply RCU-bh QSes to RCU-sched and RCU-preempt when safe One necessary step towards consolidating the three flavors of RCU is to make sure that the resulting consolidated "one flavor to rule them all" correctly handles networking denial-of-service attacks. One thing that allows RCU-bh to do so is that __do_softirq() invokes rcu_bh_qs() every so often, and so something similar has to happen for consolidated RCU. This must be done carefully. For example, if a preemption-disabled region of code takes an interrupt which does softirq processing before returning, consolidated RCU must ignore the resulting rcu_bh_qs() invocations -- preemption is still disabled, and that means an RCU reader for the consolidated flavor. This commit therefore creates a new rcu_softirq_qs() that is called only from the ksoftirqd task, thus avoiding the interrupted-a-preempted-region problem. This new rcu_softirq_qs() function invokes rcu_sched_qs(), rcu_preempt_qs(), and rcu_preempt_deferred_qs(). The latter call handles any deferred quiescent states. Note that __do_softirq() still invokes rcu_bh_qs(). It will continue to do so until a later stage of cleanup when the RCU-bh flavor is removed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Fix !SMP issue located by kbuild test robot. ]	2018-08-30 16:02:38 -07:00
Paul E. McKenney	fcc878e4df	rcu: Remove now-unused ->b.exp_need_qs field from the rcu_special union The ->b.exp_need_qs field is now set only to false, so this commit removes it. The job this field used to do is now done by the rcu_data structure's ->deferred_qs field, which is a consequence of a better split between task-based (the rcu_node structure's ->exp_tasks field) and CPU-based (the aforementioned rcu_data structure's ->deferred_qs field) tracking of quiescent states for RCU-preempt expedited grace periods. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:36 -07:00
Paul E. McKenney	27c744e32a	rcu: Allow processing deferred QSes for exiting RCU-preempt readers If an RCU-preempt read-side critical section is exiting, that is, ->rcu_read_lock_nesting is negative, then it is a good time to look at the possibility of reporting deferred quiescent states. This commit therefore updates the checks in rcu_preempt_need_deferred_qs() to allow exiting critical sections to report deferred quiescent states. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:35 -07:00
Paul E. McKenney	3e31009898	rcu: Defer reporting RCU-preempt quiescent states when disabled This commit defers reporting of RCU-preempt quiescent states at rcu_read_unlock_special() time when any of interrupts, softirq, or preemption are disabled. These deferred quiescent states are reported at a later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug offline operation. Of course, if another RCU read-side critical section has started in the meantime, the reporting of the quiescent state will be further deferred. This also means that disabling preemption, interrupts, and/or softirqs will act as an RCU-preempt read-side critical section. This is enforced by checking preempt_count() as needed. Some special cases must be handled on an ad-hoc basis, for example, context switch is a quiescent state even though both the scheduler and do_exit() disable preemption. In these cases, additional calls to rcu_preempt_deferred_qs() override the preemption disabling. Similar logic overrides disabled interrupts in rcu_preempt_check_callbacks() because in this case the quiescent state happened just before the corresponding scheduling-clock interrupt. In theory, this change lifts a long-standing restriction that required that if interrupts were disabled across a call to rcu_read_unlock() that the matching rcu_read_lock() also be contained within that interrupts-disabled region of code. Because the reporting of the corresponding RCU-preempt quiescent state is now deferred until after interrupts have been enabled, it is no longer possible for this situation to result in deadlocks involving the scheduler's runqueue and priority-inheritance locks. This may allow some code simplification that might reduce interrupt latency a bit. Unfortunately, in practice this would also defer deboosting a low-priority task that had been subjected to RCU priority boosting, so real-time-response considerations might well force this restriction to remain in place. 
Because RCU-preempt grace periods are now blocked not only by RCU read-side critical sections, but also by disabling of interrupts, preemption, and softirqs, it will be possible to eliminate RCU-bh and RCU-sched in favor of RCU-preempt in CONFIG_PREEMPT=y kernels. This may require some additional plumbing to provide the network denial-of-service guarantees that have been traditionally provided by RCU-bh. Once these are in place, CONFIG_PREEMPT=n kernels will be able to fold RCU-bh into RCU-sched. This would mean that all kernels would have but one flavor of RCU, which would open the door to significant code cleanup. Moving to a single flavor of RCU would also have the beneficial effect of reducing the NOCB kthreads by at least a factor of two. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Apply rcu_read_unlock_special() preempt_count() feedback from Joel Fernandes. ] [ paulmck: Adjust rcu_eqs_enter() call to rcu_preempt_deferred_qs() in response to bug reports from kbuild test robot. ] [ paulmck: Fix bug located by kbuild test robot involving recursion via rcu_preempt_deferred_qs(). ]	2018-08-30 16:02:34 -07:00
Linus Torvalds	f7951c33f0	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Thomas Gleixner: - Cleanup and improvement of NUMA balancing - Refactoring and improvements to the PELT (Per Entity Load Tracking) code - Watchdog simplification and related cleanups - The usual pile of small incremental fixes and improvements * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) watchdog: Reduce message verbosity stop_machine: Reflow cpu_stop_queue_two_works() sched/numa: Move task_numa_placement() closer to numa_migrate_preferred() sched/numa: Use group_weights to identify if migration degrades locality sched/numa: Update the scan period without holding the numa_group lock sched/numa: Remove numa_has_capacity() sched/numa: Modify migrate_swap() to accept additional parameters sched/numa: Remove unused task_capacity from 'struct numa_stats' sched/numa: Skip nodes that are at 'hoplimit' sched/debug: Reverse the order of printing faults sched/numa: Use task faults only if numa_group is not yet set up sched/numa: Set preferred_node based on best_cpu sched/numa: Simplify load_too_imbalanced() sched/numa: Evaluate move once per node sched/numa: Remove redundant field sched/debug: Show the sum wait time of a task group sched/fair: Remove #ifdefs from scale_rt_capacity() sched/core: Remove get_cpu() from sched_fork() sched/cpufreq: Clarify sugov_get_util() sched/sysctl: Remove unused sched_time_avg_ms sysctl ...	2018-08-13 11:25:07 -07:00
Paul E. McKenney	89b4cd4b9e	rcu: Print stall-warning NMI dyntick state in hexadecimal The ->dynticks_nmi_nesting field records the nesting depth of both interrupt and NMI handlers. Because the kernel can enter interrupts and never leave them (and vice versa) and because NMIs can interrupt manipulation of the ->dynticks_nmi_nesting field, the values in this field must be both chosen and manipulated very carefully. As a result, although the value is zero when the corresponding CPU is executing neither an interrupt nor an NMI handler, it is 4,611,686,018,427,387,906 on 64-bit systems when there is a single level of interrupt/NMI handling in progress. This number is difficult to remember and interpret, so this commit switches the output to hexadecimal, resulting in the much nicer 0x4000000000000002. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:39:24 -07:00
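The readability gain in 89b4cd4b9e comes from the fact that the quoted value is 2^62 + 2, which is opaque in decimal but obvious in hexadecimal. A small userspace sketch (the helper name format_nesting() is hypothetical, and the format strings are standard C, not the kernel's actual stall-warning format) shows both renderings:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Render a 64-bit nesting value in decimal and in "0x"-prefixed
 * hexadecimal. The %#llx conversion adds the "0x" prefix for
 * non-zero values.
 */
static void format_nesting(unsigned long long v,
			   char *dec, size_t declen,
			   char *hex, size_t hexlen)
{
	snprintf(dec, declen, "%llu", v);
	snprintf(hex, hexlen, "%#llx", v);
}
```

Feeding it (1ULL << 62) + 2 yields "4611686018427387906" and "0x4000000000000002", matching the two values quoted in the commit message.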
Paul E. McKenney	ab6b82147f	rcu: Remove unused local variable "cpu" One danger of using __maybe_unused is that the compiler doesn't yell at you when you remove the last reference, witness rcu_bind_gp_kthread() and its local variable "cpu". This commit removes this local variable. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:39:17 -07:00
Paul E. McKenney	164ba3fc48	rcu: Remove unused rcu_kick_nohz_cpu() function The rcu_kick_nohz_cpu() function is no longer used, and the functionality it used to provide is now provided by a call to resched_cpu() in the force-quiescent-state function rcu_implicit_dynticks_qs(). This commit therefore removes rcu_kick_nohz_cpu(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:39:17 -07:00
Paul E. McKenney	c7037ff524	rcu: Clarify and correct the rcu_preempt_qs() header comment The rcu_preempt_qs() function only applies to the CPU, not the task. A task really is allowed to invoke this function while in an RCU-preempt read-side critical section, but only if it has first added itself to some leaf rcu_node structure's ->blkd_tasks list. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:39:16 -07:00
Paul E. McKenney	15651201fa	rcu: Mark task as .need_qs less aggressively If any scheduling-clock interrupt interrupts an RCU-preempt read-side critical section, the interrupted task's ->rcu_read_unlock_special.b.need_qs field is set. This causes the outermost rcu_read_unlock() to incur the extra overhead of calling into rcu_read_unlock_special(). This commit reduces that overhead by setting ->rcu_read_unlock_special.b.need_qs only if the grace period has been in effect for more than one second. Why one second? Because this is comfortably smaller than the minimum RCU CPU stall-warning timeout of three seconds, but long enough that the .need_qs marking should happen quite rarely. And if your RCU read-side critical section has run on-CPU for a full second, it is not unreasonable to invest some CPU time in ending the grace period quickly. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:39:15 -07:00
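The one-second threshold in 15651201fa amounts to a wraparound-safe jiffies comparison against the grace-period start time. The following is a simplified userspace sketch: HZ=1000, the time_after_sketch() macro, and should_mark_need_qs() are all assumptions for illustration, and the real check involves additional per-CPU conditions not shown here.

```c
#include <stdbool.h>

#define HZ 1000 /* assumed tick rate for this sketch */

/* Wraparound-safe "a is later than b" for free-running jiffies counters. */
#define time_after_sketch(a, b) ((long)((b) - (a)) < 0)

/*
 * Hypothetical predicate: mark the interrupted task's .need_qs only if it
 * is inside an RCU-preempt read-side critical section, is not already
 * marked, and the current grace period started more than one second ago.
 */
static bool should_mark_need_qs(int read_lock_nesting, bool need_qs_already,
				unsigned long now, unsigned long gp_start)
{
	return read_lock_nesting > 0 && !need_qs_already &&
	       time_after_sketch(now, gp_start + HZ);
}
```

The signed-difference comparison keeps the check correct even when the jiffies counter wraps between gp_start and now.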
Joe Perches	a7538352da	rcu: Use pr_fmt to prefix "rcu: " to logging output This commit also adjusts some whitespace while in the area. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Revert string-breaking %s as requested by Andy Shevchenko. ]	2018-07-12 15:39:13 -07:00
Paul E. McKenney	3949fa9bac	rcu: Make rcu_read_unlock_special() static Because rcu_read_unlock_special() is no longer used outside of kernel/rcu/tree_plugin.h, this commit makes it static. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:39:11 -07:00
Paul E. McKenney	5773894231	rcu: Add CPU online/offline state to dump_blkd_tasks() Interactions between CPU-hotplug operations and grace-period initialization can result in dump_blkd_tasks() being invoked. One of the first debugging actions in this case is to search back in dmesg to work out which of the affected rcu_node structure's CPUs are online and to determine the last CPU-hotplug operation affecting any of those CPUs. This can be laborious and error-prone, especially when console output is lost. This commit therefore causes dump_blkd_tasks() to dump the state of the affected rcu_node structure's CPUs and the last grace period during which the last offline and online operation affected each of these CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:39:09 -07:00
Paul E. McKenney	ff3cee3908	rcu: Add up-tree information to dump_blkd_tasks() diagnostics This commit updates dump_blkd_tasks() to print out quiescent-state bitmasks for the rcu_node structures further up the tree. This information helps debugging of interactions between CPU-hotplug operations and RCU grace-period initialization. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:39:08 -07:00
Paul E. McKenney	1f3e5f51b9	rcu: Add RCU-preempt check for waiting on newly onlined CPU RCU should only be waiting on CPUs that were online at the time that the current grace period started. Failure to abide by this rule can result in confusing splats during grace-period cleanup and initialization. This commit therefore adds a check to RCU-preempt's preempted-task queuing that checks for waiting on newly onlined CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:39:05 -07:00
Paul E. McKenney	0b107d24d9	rcu: Suppress false-positive splats from mid-init task resume Consider the following sequence of events in a PREEMPT=y kernel: 1. All CPUs corresponding to a given leaf rcu_node structure are offline. 2. The first phase of the rcu_gp_init() function's grace-period initialization runs, and sets that rcu_node structure's ->qsmaskinit to zero, as it should. 3. One of the CPUs corresponding to that rcu_node structure comes back online. Note that because this CPU came online after the grace period started, this grace period can safely ignore this newly onlined CPU. 4. A task running on the newly onlined CPU enters an RCU-preempt read-side critical section, and is then preempted. Because the corresponding rcu_node structure's ->qsmask is zero, rcu_preempt_ctxt_queue() leaves the rcu_node structure's ->gp_tasks field NULL, as it should. 5. The rcu_gp_init() function continues running the second phase of grace-period initialization. The ->qsmask field of the parent of the aforementioned leaf rcu_node structure is set to not expect a quiescent state from the leaf, as is only right and proper. However, when rcu_gp_init() reaches the leaf, it invokes rcu_preempt_check_blocked_tasks(), which sees that the leaf's ->blkd_tasks list is non-empty, and therefore sets the leaf's ->gp_tasks field to reference the first task on that list. 6. The grace period ends before the preempted task resumes, which is perfectly fine, given that this grace period was under no obligation to wait for that task to exit its late-starting RCU-preempt read-side critical section. Unfortunately, the leaf's ->gp_tasks field is non-NULL, so rcu_gp_cleanup() splats. After all, it appears to rcu_gp_cleanup() that the grace period failed to wait for a task that was supposed to be blocking that grace period. This commit avoids this false-positive splat by adding a check of both ->qsmaskinit and ->wait_blkd_tasks to rcu_preempt_check_blocked_tasks(). 
If both ->qsmaskinit and ->wait_blkd_tasks are zero, then the task must have entered its RCU-preempt read-side critical section late (after all, the CPU that it is running on was not online at that time), which means that the upper-level rcu_node structure won't be waiting for anything on the leaf anyway. If ->wait_blkd_tasks is non-zero, then there is at least one task on this rcu_node structure's ->blkd_tasks list whose RCU read-side critical section predates the current grace period. If ->qsmaskinit is non-zero, there is at least one CPU that was online at the start of the current grace period. Thus, if both are zero, there is nothing to wait for. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:39:03 -07:00
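The decision added by 0b107d24d9 reduces to a small predicate over the leaf rcu_node structure. The sketch below is hypothetical: struct leaf_node and its field types are pared-down stand-ins that reuse the field names from the commit message, and blkd_tasks_nonempty models "the ->blkd_tasks list is non-empty".

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant parts of a leaf rcu_node. */
struct leaf_node {
	unsigned long qsmaskinit;  /* CPUs online at grace-period start */
	bool wait_blkd_tasks;      /* blocked readers predating this GP */
	bool blkd_tasks_nonempty;  /* ->blkd_tasks list has any entries */
};

/*
 * Should grace-period initialization point ->gp_tasks at the first
 * blocked task? Only if some task is queued and at least one of
 * ->qsmaskinit or ->wait_blkd_tasks shows that the grace period could
 * actually need to wait on this leaf.
 */
static bool should_set_gp_tasks(const struct leaf_node *rnp)
{
	if (!rnp->blkd_tasks_nonempty)
		return false;
	/* Both zero: every queued reader started after the GP began. */
	return rnp->qsmaskinit != 0 || rnp->wait_blkd_tasks;
}
```

In the scenario from steps 1-6 above, the leaf has a queued task but both ->qsmaskinit and ->wait_blkd_tasks are zero, so the predicate returns false and ->gp_tasks stays NULL.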
Paul E. McKenney	77cfc7bf24	rcu: Fix typo and add additional debug This commit fixes a typo and adds some additional debugging to the message emitted when a task blocking the current grace period is listed as blocking it when either that grace period ends or the next grace period begins. This commit also reformats the console message for readability. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:39:00 -07:00
Paul E. McKenney	ff3bb6f4d0	rcu: Remove ->gpnum and ->completed Now that everything has been converted to use ->gp_seq instead of ->gpnum and ->completed, this commit removes ->gpnum and ->completed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:38:48 -07:00
Paul E. McKenney	db023296f0	rcu: Convert rcu_quiescent_state_report tracepoint to ->gp_seq This commit makes the rcu_quiescent_state_report tracepoint use ->gp_seq instead of ->gpnum. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:38:47 -07:00
Paul E. McKenney	865aa1e08d	rcu: Convert rcu_unlock_preempted_task tracepoint to ->gp_seq This commit makes the rcu_unlock_preempted_task tracepoint use ->gp_seq instead of ->gpnum. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:38:46 -07:00
Paul E. McKenney	598ce09480	rcu: Convert rcu_preempt_task tracepoint to ->gp_seq This commit makes the rcu_preempt_task tracepoint use ->gp_seq instead of ->gpnum. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:38:45 -07:00
Paul E. McKenney	477351f782	rcu: Convert rcu_grace_period tracepoint to gp_seq This commit makes the rcu_grace_period tracepoint use gp_seq instead of ->gpnum or ->completed. It also introduces a "cpuofl-bgp" string to less obscurely indicate when a CPU has gone offline while a grace period is waiting on it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:38:43 -07:00
Paul E. McKenney	ab5e869c1f	rcu: Make rcu_nocb_wait_gp() check if GP already requested This commit makes rcu_nocb_wait_gp() check rdp->gp_seq_needed to see if the current CPU already knows about the needed grace period having already been requested. If so, it avoids acquiring the corresponding leaf rcu_node structure's ->lock, thus decreasing contention. This optimization is intended for cases where either multiple leader rcuo kthreads are running on the same CPU or these kthreads are running on a non-offloaded (e.g., housekeeping) CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Move lock release past "if" as suggested by Joel Fernandes. ] [ paulmck: Fix caching of furthest-future requested grace period. ]	2018-07-12 15:38:42 -07:00
Paul E. McKenney	471f87c3d9	rcu: Make RCU CPU stall warnings use ->gp_seq This commit makes the RCU CPU stall-warning code in print_other_cpu_stall(), print_cpu_stall(), and check_cpu_stall() use ->gp_seq instead of ->gpnum and ->completed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 14:27:56 -07:00
Paul E. McKenney	29365e563b	rcu: Convert grace-period requests to ->gp_seq This commit converts the grace-period request code paths from ->completed and ->gpnum to ->gp_seq. The need_future_gp_element() macro encapsulates the shift operation required to use ->gp_seq as an index to the ->need_future_gp[] array. The rcu_cbs_completed() function is removed in favor of the rcu_seq_snap() function. The rcu_start_this_gp() function gets some temporary consistency checks and uses rcu_seq_done(), rcu_seq_current(), rcu_seq_state(), and rcu_gp_in_progress() in place of the earlier open-coded comparisons of ->gpnum and ->completed. The rcu_future_gp_cleanup() function replaces use of ->completed with ->gp_seq. The rcu_accelerate_cbs() function replaces a call to rcu_cbs_completed() with one to rcu_seq_snap(). The rcu_advance_cbs() function replaces an access to ->completed with one to ->gp_seq and adds some temporary warnings. The rcu_nocb_wait_gp() function replaces a call to rcu_cbs_completed() with one to rcu_seq_snap() and an open-coded comparison with rcu_seq_done(). The temporary warnings will be removed when the various ->gpnum and ->completed fields are removed. Their purpose is to locate code that might still be using ->gpnum and ->completed. (Much easier that way than trying to trace down the causes of too-short grace periods and grace-period hangs!) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 14:27:55 -07:00
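The ->gp_seq helpers named in the commit above can be approximated in userspace C to show how one counter carries both a grace-period number and a state. This is a simplified sketch, assuming the kernel's layout of two low-order state bits (RCU_SEQ_CTR_SHIFT of 2) and omitting the READ_ONCE()/WRITE_ONCE() accesses, memory barriers, and sanity warnings that the real helpers use. It is meant to illustrate the arithmetic, not to reproduce the kernel code.

```c
#include <limits.h>
#include <stdbool.h>

/* Low-order two bits of ->gp_seq hold grace-period state; the rest is a counter. */
#define RCU_SEQ_CTR_SHIFT  2
#define RCU_SEQ_STATE_MASK ((1UL << RCU_SEQ_CTR_SHIFT) - 1)
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))

static unsigned long rcu_seq_ctr(unsigned long s)   { return s >> RCU_SEQ_CTR_SHIFT; }
static unsigned long rcu_seq_state(unsigned long s) { return s & RCU_SEQ_STATE_MASK; }

/* Start a grace period: the state bits go from zero to nonzero. */
static void rcu_seq_start(unsigned long *sp) { *sp += 1; }

/* End a grace period: clear the state bits and advance the counter. */
static void rcu_seq_end(unsigned long *sp) { *sp = (*sp | RCU_SEQ_STATE_MASK) + 1; }

/* Value ->gp_seq must reach before a full grace period has elapsed from now. */
static unsigned long rcu_seq_snap(const unsigned long *sp)
{
	return (*sp + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK;
}

/* Has the grace period identified by snapshot s completed? */
static bool rcu_seq_done(const unsigned long *sp, unsigned long s)
{
	return ULONG_CMP_GE(*sp, s);
}
```

When a grace period is already in progress (for example, ->gp_seq == 5), rcu_seq_snap() returns 12 rather than 8. The in-progress grace period may have started before the caller's callbacks were queued, so one additional full grace period is required.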
Paul E. McKenney	d43a5d32e1	rcu: Convert ->completedqs to ->gp_seq This commit switches the quiescent-state no-backtracking checks from ->gpnum and ->completed to ->gp_seq. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 14:27:54 -07:00
Paul E. McKenney	8aa670cdac	rcu: Convert ->rcu_iw_gpnum to ->gp_seq This commit switches the interrupt-disabled detection mechanism to ->gp_seq. This mechanism is used as part of RCU CPU stall warnings, and detects cases where the stall is due to a CPU having interrupts disabled. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 14:27:53 -07:00
Paul E. McKenney	e0da2374c3	rcu: Move rcu_nocb_gp_get() to ->gp_seq This commit makes rcu_nocb_gp_get() use ->gp_seq. It uses rcu_seq_ctr() in order to shift away the state bits, so that the low-order bits of the result may safely be used to index ->nocb_gp_wq[]. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 14:27:52 -07:00
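Why rcu_seq_ctr() must shift away the state bits before indexing a two-element array such as ->nocb_gp_wq[] can be shown with a small contrast in C. This sketch assumes the same two-bit state layout as ->gp_seq. The helper names wq_index_from_ctr() and wq_index_raw() are hypothetical and exist only for this illustration.

```c
#define RCU_SEQ_CTR_SHIFT 2

static unsigned long rcu_seq_ctr(unsigned long s) { return s >> RCU_SEQ_CTR_SHIFT; }

/* Hypothetical: select one of two wait queues by grace-period counter parity. */
static int wq_index_from_ctr(unsigned long s) { return (int)(rcu_seq_ctr(s) & 0x1); }

/* Hypothetical, for contrast only: the raw low bit is a state bit, not counter parity. */
static int wq_index_raw(unsigned long s) { return (int)(s & 0x1); }
```

With gp_seq values 4 (counter 1, idle) and 5 (counter 1, in progress), wq_index_from_ctr() selects the same queue for both, while wq_index_raw() splits them across queues only because the state changed.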
Paul E. McKenney	03c8cb765a	rcu: Move rcu_try_advance_all_cbs() to ->gp_seq This commit makes rcu_try_advance_all_cbs() use ->gp_seq, with the exception of tracing, which will be converted later. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 14:27:51 -07:00
Boqun Feng	ce11fae8d4	rcu: Use the proper lockdep annotation in dump_blkd_tasks() Sparse reported this: | kernel/rcu/tree_plugin.h:814:9: warning: incorrect type in argument 1 (different modifiers) | kernel/rcu/tree_plugin.h:814:9: expected struct lockdep_map const lock | kernel/rcu/tree_plugin.h:814:9: got struct lockdep_map [noderef] <noident> This is caused by using vanilla lockdep annotations on rcu_node::lock, and that requires accessing ->lock of rcu_node directly. However we need to keep rcu_node::lock __private to avoid breaking its extra ordering guarantee. And we have a dedicated lockdep annotation for rcu_node::lock, so use it. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-06-26 12:25:55 -07:00
Paul E. McKenney	4bc8d55574	rcu: Add debugging info to assertion The WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp()) in rcu_gp_cleanup() triggers (inexplicably, of course) every so often. This commit therefore extracts more information. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-06-26 12:25:55 -07:00
Peter Zijlstra	b3dae109fa	sched/swait: Rename to exclusive Since swait basically implemented exclusive waits only, make sure the API reflects that. $ git grep -l -e "\<swake_up\>" -e "\<swait_event[^ (]" -e "\<prepare_to_swait\>" | while read file; do sed -i -e 's/\<swake_up\>/&_one/g' -e 's/\<swait_event[^ (]/&_exclusive/g' -e 's/\<prepare_to_swait\>/&_exclusive/g' $file; done With a few manual touch-ups. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: bigeasy@linutronix.de Cc: oleg@redhat.com Cc: paulmck@linux.vnet.ibm.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180612083909.261946548@infradead.org	2018-06-20 11:35:56 +02:00
Paul E. McKenney	22df7316ac	Merge branches 'exp.2018.05.15a', 'fixes.2018.05.15a', 'lock.2018.05.15a' and 'torture.2018.05.15a' into HEAD exp.2018.05.15a: Parallelize expedited grace-period initialization. fixes.2018.05.15a: Miscellaneous fixes. lock.2018.05.15a: Decrease lock contention on root rcu_node structure, which is a step towards merging RCU flavors. torture.2018.05.15a: Torture-test updates.	2018-05-15 10:33:05 -07:00
Paul E. McKenney	41e80595ab	rcu: Make rcu_start_future_gp() caller select grace period The rcu_accelerate_cbs() function selects a grace-period target, which it uses to have rcu_segcblist_accelerate() assign numbers to recently queued callbacks. Then it invokes rcu_start_future_gp(), which selects a grace-period target again, which is a bit pointless. This commit therefore changes rcu_start_future_gp() to take the grace-period target as a parameter, thus avoiding double selection. This commit also changes the name of rcu_start_future_gp() to rcu_start_this_gp() to reflect this change in functionality, and also makes a similar change to the name of trace_rcu_future_gp(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:30:32 -07:00
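The refactoring in the commit above (select the grace-period target once in rcu_accelerate_cbs() and pass it to rcu_start_this_gp()) can be sketched as a toy model in C. This is a heavily simplified sketch. The gp_state structure, the seq_snap(), start_this_gp(), and accelerate_cbs() functions, and the snapshots_taken counter are all hypothetical. Locking, the rcu_node tree, and the real rcu_segcblist_accelerate() call are omitted; a single assignment stands in for the callback numbering.

```c
#include <limits.h>
#include <stdbool.h>

#define RCU_SEQ_STATE_MASK 3UL
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))

/* Hypothetical, heavily simplified stand-in for grace-period bookkeeping. */
struct gp_state {
	unsigned long gp_seq;        /* current grace-period sequence */
	unsigned long gp_seq_needed; /* furthest-future grace period requested */
	int snapshots_taken;         /* instrumentation: times the target was computed */
};

/* rcu_seq_snap()-style target selection, counted for the example. */
static unsigned long seq_snap(struct gp_state *g)
{
	g->snapshots_taken++;
	return (g->gp_seq + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK;
}

/* Caller supplies target c; returns true if a new request was recorded. */
static bool start_this_gp(struct gp_state *g, unsigned long c)
{
	if (ULONG_CMP_GE(g->gp_seq_needed, c))
		return false;            /* already requested, nothing to do */
	g->gp_seq_needed = c;
	return true;
}

/* Select the target once, use it to number callbacks, and pass it along. */
static bool accelerate_cbs(struct gp_state *g, unsigned long *cb_gp_seq)
{
	unsigned long c = seq_snap(g);

	*cb_gp_seq = c;              /* stands in for rcu_segcblist_accelerate() */
	return start_this_gp(g, c);
}
```

Because the target is a parameter, the callbacks and the grace-period request are guaranteed to agree on the same number, and the target is computed exactly once per call.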
Paul E. McKenney	fb31340f8a	rcu: Make rcu_gp_cleanup() more accurately predict need for new GP Currently, rcu_gp_cleanup() scans the rcu_node tree in order to reset state to reflect the end of the grace period. It also checks to see whether a new grace period is needed, but in a number of cases, rather than directly cause the new grace period to be immediately started, it instead leaves the grace-period-needed state where various fail-safes can find it. This works fine, but results in higher contention on the root rcu_node structure's ->lock, which is undesirable, and contention on that lock has recently become noticeable. This commit therefore makes rcu_gp_cleanup() immediately start a new grace period if there is any need for one. It is quite possible that it will later be necessary to throttle the grace-period rate, but that can be dealt with when and if. Reported-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:29:28 -07:00
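The behavioral change in the commit above (start a new grace period at cleanup time whenever one is needed, rather than leaving the request for fail-safes to find) can be illustrated with a toy model in C. This sketch rests on several assumptions. The gp_ctl structure and gp_cleanup() function are hypothetical, and the grace period is started inline here for brevity. In the kernel, the grace-period kthread performs the actual start, with locking and rcu_node-tree traversal that this model omits. ULONG_CMP_LT() mirrors the kernel's wrap-safe "a < b" macro.

```c
#include <limits.h>
#include <stdbool.h>

#define RCU_SEQ_STATE_MASK 3UL
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))

/* Hypothetical, heavily simplified grace-period control state. */
struct gp_ctl {
	unsigned long gp_seq;        /* current grace-period sequence */
	unsigned long gp_seq_needed; /* furthest-future grace period requested */
};

/* End the current GP; if a later one is needed, start it now. Returns true if started. */
static bool gp_cleanup(struct gp_ctl *g)
{
	g->gp_seq = (g->gp_seq | RCU_SEQ_STATE_MASK) + 1; /* rcu_seq_end()-style */
	if (ULONG_CMP_LT(g->gp_seq, g->gp_seq_needed)) {
		g->gp_seq++;             /* rcu_seq_start()-style: begin the next GP */
		return true;
	}
	return false;
}
```

Starting immediately avoids a later round trip through the root rcu_node structure's ->lock to notice the pending request, which is the contention the commit targets.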

487 Commits