linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-12 05:24:12 +08:00

Author	SHA1	Message	Date
Paul E. McKenney	8c42b1f39f	rcu: Exclude near-simultaneous RCU CPU stall warnings There is a two-jiffy delay between the time that a CPU will self-report an RCU CPU stall warning and the time that some other CPU will report a warning on behalf of the first CPU. This has worked well in the past, but on busy systems, it is possible for the two warnings to overlap, which makes interpreting them extremely difficult. This commit therefore uses a cmpxchg-based timing decision that allows only one report in a given one-minute period (assuming default stall-warning Kconfig parameters). This approach will of course fail if you are seeing minute-long vCPU preemption, but in that case the overlapping RCU CPU stall warnings are the least of your worries. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-06-26 12:25:56 -07:00
Boqun Feng	ce11fae8d4	rcu: Use the proper lockdep annotation in dump_blkd_tasks() Sparse reported this: \| kernel/rcu/tree_plugin.h:814:9: warning: incorrect type in argument 1 (different modifiers) \| kernel/rcu/tree_plugin.h:814:9: expected struct lockdep_map const lock \| kernel/rcu/tree_plugin.h:814:9: got struct lockdep_map [noderef] <noident> This is caused by using vanilla lockdep annotations on rcu_node::lock, and that requires accessing ->lock of rcu_node directly. However we need to keep rcu_node::lock __private to avoid breaking its extra ordering guarantee. And we have a dedicated lockdep annotation for rcu_node::lock, so use it. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-06-26 12:25:55 -07:00
Paul E. McKenney	4bc8d55574	rcu: Add debugging info to assertion The WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp()) in rcu_gp_cleanup() triggers (inexplicably, of course) every so often. This commit therefore extracts more information. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-06-26 12:25:55 -07:00
Paul E. McKenney	6050003763	torture: Keep old-school dmesg format This commit adds "#define pr_fmt(fmt) fmt" to the torture-test files in order to keep the current dmesg format. Once Joe's commits have hit mainline, these definitions will be changed in order to automatically generate the dmesg line prefix that the scripts expect. This will have the beneficial side-effect of allowing printk() formats to be used more widely and of shortening some pr_*() lines. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Joe Perches <joe@perches.com>	2018-06-25 11:30:10 -07:00
Paul E. McKenney	90127d605f	torture: Make online/offline messages appear only for verbose=2 Some bugs reproduce quickly only at high CPU-hotplug rates, so the rcutorture TREE03 scenario now has only 200 milliseconds spacing between CPU-hotplug operations. At this rate, the torture-test pair of console messages per operation becomes a bit voluminous. This commit therefore converts the torture-test set of "verbose" kernel-boot arguments from bool to int, and prints the extra console messages only when verbose=2. The default is still verbose=1. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-06-25 11:30:10 -07:00
Paul E. McKenney	5ab07a8df4	srcu: Add address of first callback to rcutorture output This commit adds the address of the first callback to the per-CPU rcutorture output in order to allow lost wakeups to be more efficiently tracked down. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-06-25 11:26:24 -07:00
Paul E. McKenney	17294ce6a4	srcu: Document that srcu_funnel_gp_start() implies srcu_funnel_exp_start() This commit updates the header comment of srcu_funnel_gp_start() to document the fact that srcu_funnel_gp_start() does the work of srcu_funnel_exp_start(), in some cases by invoking it directly. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-06-25 11:26:24 -07:00
Paul E. McKenney	5ef98a6328	srcu: Fix typos in __call_srcu() header comment This commit simply changes some copy-pasta call_rcu() instances to the correct call_srcu(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-06-25 11:26:24 -07:00
Paul E. McKenney	5257514d88	rcu: Make expedited grace period use direct call on last leaf During expedited grace-period initialization, a work item is scheduled for each leaf rcu_node structure. However, that initialization code is itself (normally) executing from a workqueue, so one of the leaf rcu_node structures could just as well be handled by that pre-existing workqueue, and with less overhead. This commit therefore uses a shiny new rcu_is_leaf_node() macro to execute the last leaf rcu_node structure's initialization directly from the pre-existing workqueue. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-06-25 11:25:41 -07:00
Peter Zijlstra	b3dae109fa	sched/swait: Rename to exclusive Since swait basically implemented exclusive waits only, make sure the API reflects that. $ git grep -l -e "\<swake_up\>" -e "\<swait_event[^ (]" -e "\<prepare_to_swait\>" \| while read file; do sed -i -e 's/\<swake_up\>/&_one/g' -e 's/\<swait_event[^ (]/&_exclusive/g' -e 's/\<prepare_to_swait\>/&_exclusive/g' $file; done With a few manual touch-ups. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: bigeasy@linutronix.de Cc: oleg@redhat.com Cc: paulmck@linux.vnet.ibm.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180612083909.261946548@infradead.org	2018-06-20 11:35:56 +02:00
Kees Cook	42bc47b353	treewide: Use array_size() in vmalloc() The vmalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: vmalloc(a * b) with: vmalloc(array_size(a, b)) as well as handling cases of: vmalloc(a * b * c) with: vmalloc(array3_size(a, b, c)) This does, however, attempt to ignore constant size factors like: vmalloc(4 * 1024) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( vmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) \| vmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( vmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) \| vmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) \| vmalloc( - sizeof(char) * (COUNT) + COUNT , ...) \| vmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) \| vmalloc( - sizeof(u8) * COUNT + COUNT , ...) \| vmalloc( - sizeof(__u8) * COUNT + COUNT , ...) \| vmalloc( - sizeof(char) * COUNT + COUNT , ...) \| vmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( vmalloc( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) \| vmalloc( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) \| vmalloc( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) \| vmalloc( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) \| vmalloc( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) \| vmalloc( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) \| vmalloc( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) \| vmalloc( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ vmalloc( - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( vmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| vmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| vmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| vmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| vmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| vmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| vmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| vmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( vmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| vmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| vmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| vmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) \| vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( vmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| vmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| vmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| vmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| vmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| vmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| vmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| vmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( vmalloc(C1 * C2 * C3, ...) \| vmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@ ( vmalloc(C1 * C2, ...) \| vmalloc( - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>	2018-06-12 16:19:22 -07:00
Peter Zijlstra	f64c6013a2	rcu/x86: Provide early rcu_cpu_starting() callback The x86/mtrr code does horrific things because hardware. It uses stop_machine_from_inactive_cpu(), which does a wakeup (of the stopper thread on another CPU), which uses RCU, all before the CPU is onlined. RCU complains about this, because wakeups use RCU and RCU does (rightfully) not consider offline CPUs for grace-periods. Fix this by initializing RCU way early in the MTRR case. Tested-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Add !SMP support, per 0day Test Robot report. ]	2018-05-22 16:12:26 -07:00
Paul E. McKenney	22df7316ac	Merge branches 'exp.2018.05.15a', 'fixes.2018.05.15a', 'lock.2018.05.15a' and 'torture.2018.05.15a' into HEAD exp.2018.05.15a: Parallelize expedited grace-period initialization. fixes.2018.05.15a: Miscellaneous fixes. lock.2018.05.15a: Decrease lock contention on root rcu_node structure, which is a step towards merging RCU flavors. torture.2018.05.15a: Torture-test updates.	2018-05-15 10:33:05 -07:00
Paul E. McKenney	034777d7f5	rcutorture: Print end-of-test state This commit adds end-of-test state printout to help check whether RCU shut down nicely. Note that this printout only helps for flavors of RCU that are not used much by the kernel. In particular, for normal RCU having a grace period in progress is expected behavior. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:32:08 -07:00
Paul E. McKenney	a458360af6	rcu: Drop early GP request check from rcu_gp_kthread() Now that grace-period requests use funnel locking and now that they set ->gp_flags to RCU_GP_FLAG_INIT even when the RCU grace-period kthread has not yet started, rcu_gp_kthread() no longer needs to check need_any_future_gp() at startup time. This commit therefore removes this check. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:31:04 -07:00
Paul E. McKenney	c1935209df	rcu: Simplify and inline cpu_needs_another_gp() Now that RCU no longer relies on failsafe checks, cpu_needs_another_gp() can be greatly simplified. This simplification eliminates the last call to rcu_future_needs_gp() and to rcu_segcblist_future_gp_needed(), both of which which can then be eliminated. And then, because cpu_needs_another_gp() is called only from __rcu_pending(), it can be inlined and eliminated. This commit carries out the simplification, inlining, and elimination called out above. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:30:59 -07:00
Paul E. McKenney	384f77f4cb	rcu: The rcu_gp_cleanup() function does not need cpu_needs_another_gp() All of the cpu_needs_another_gp() function's checks (except for newly arrived callbacks) have been subsumed into the rcu_gp_cleanup() function's scan of the rcu_node tree. This commit therefore drops the call to cpu_needs_another_gp(). The check for newly arrived callbacks is supplied by rcu_accelerate_cbs(). Any needed advancing (as in the earlier rcu_advance_cbs() call) will be supplied when the corresponding CPU becomes aware of the end of the now-completed grace period. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:30:54 -07:00
Paul E. McKenney	665f08f1ce	rcu: Make rcu_start_this_gp() check for out-of-range requests If rcu_start_this_gp() is invoked with a requested grace period more than three in the future, then either the ->need_future_gp[] array needs to be bigger or the caller needs to be repaired. This commit therefore adds a WARN_ON_ONCE() checking for this condition. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:30:48 -07:00
Paul E. McKenney	360e0da67e	rcu: Add funnel locking to rcu_start_this_gp() The rcu_start_this_gp() function had a simple form of funnel locking that used only the leaves and root of the rcu_node tree, which is fine for systems with only a few hundred CPUs, but sub-optimal for systems having thousands of CPUs. This commit therefore adds full-tree funnel locking. This variant of funnel locking is unusual in the following ways: 1. The leaf-level rcu_node structure's ->lock is held throughout. Other funnel-locking implementations drop the leaf-level lock before progressing to the next level of the tree. 2. Funnel locking can be started at the root, which is convenient for code that already holds the root rcu_node structure's ->lock. Other funnel-locking implementations start at the leaves. 3. If an rcu_node structure other than the initial one believes that a grace period is in progress, it is not necessary to go further up the tree. This is because grace-period cleanup scans the full tree, so that marking the need for a subsequent grace period anywhere in the tree suffices -- but only if a grace period is currently in progress. 4. It is possible that the RCU grace-period kthread has not yet started, and this case must be handled appropriately. However, the general approach of using a tree to control lock contention is still in place. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:30:37 -07:00
Paul E. McKenney	41e80595ab	rcu: Make rcu_start_future_gp() caller select grace period The rcu_accelerate_cbs() function selects a grace-period target, which it uses to have rcu_segcblist_accelerate() assign numbers to recently queued callbacks. Then it invokes rcu_start_future_gp(), which selects a grace-period target again, which is a bit pointless. This commit therefore changes rcu_start_future_gp() to take the grace-period target as a parameter, thus avoiding double selection. This commit also changes the name of rcu_start_future_gp() to rcu_start_this_gp() to reflect this change in functionality, and also makes a similar change to the name of trace_rcu_future_gp(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:30:32 -07:00
Paul E. McKenney	d5cd96851d	rcu: Inline rcu_start_gp_advanced() into rcu_start_future_gp() The rcu_start_gp_advanced() is invoked only from rcu_start_future_gp() and much of its code is redundant when invoked from that context. This commit therefore inlines rcu_start_gp_advanced() into rcu_start_future_gp(), then removes rcu_start_gp_advanced(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:30:27 -07:00
Paul E. McKenney	a824a287f6	rcu: Clear request other than RCU_GP_FLAG_INIT at GP end Once the grace period has ended, any RCU_GP_FLAG_FQS requests are irrelevant: The grace period has ended, so there is no longer any point in forcing quiescent states in order to try to make it end sooner. This commit therefore causes rcu_gp_cleanup() to clear any bits other than RCU_GP_FLAG_INIT from ->gp_flags at the end of the grace period. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:30:20 -07:00
Paul E. McKenney	a508aa597e	rcu: Cleanup, don't put ->completed into an int It is true that currently only the low-order two bits are used, so there should be no problem given modern machines and compilers, but good hygiene and maintainability dictates use of an unsigned long instead of an int. This commit therefore makes this change. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:30:15 -07:00
Paul E. McKenney	bd7af8463b	rcu: Switch __rcu_process_callbacks() to rcu_accelerate_cbs() The __rcu_process_callbacks() function currently checks to see if the current CPU needs a grace period and also if there is any other reason to kick off a new grace period. This is one of the fail-safe checks that has been rendered unnecessary by the changes that increase the accuracy of rcu_gp_cleanup()'s estimate as to whether another grace period is required. Because this particular fail-safe involved acquiring the root rcu_node structure's ->lock, which has seen excessive contention in real life, this fail-safe needs to go. However, one check must remain, namely the check for newly arrived RCU callbacks that have not yet been associated with a grace period. One might hope that the checks in __note_gp_changes(), which is invoked indirectly from rcu_check_quiescent_state(), would suffice, but this function won't be invoked at all if RCU is idle. It is therefore necessary to replace the fail-safe checks with a simpler check for newly arrived callbacks during an RCU idle period, which is exactly what this commit does. This change removes the final call to rcu_start_gp(), so this function is removed as well. Note that lockless use of cpu_needs_another_gp() is racy, but that these races are harmless in this case. If RCU really is idle, the values will not change, so the return value from cpu_needs_another_gp() will be correct. If RCU is not idle, the resulting redundant call to rcu_accelerate_cbs() will be harmless, and might even have the benefit of reducing grace-period latency a bit. This commit also moves interrupt disabling into the "if" statement to improve real-time response a bit. Reported-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:30:03 -07:00
Paul E. McKenney	a6058d85a2	rcu: Avoid __call_rcu_core() root rcu_node ->lock acquisition When __call_rcu_core() notices excessive numbers of callbacks pending on the current CPU, we know that at least one of them is not yet classified, namely the one that was just now queued. Therefore, it is not necessary to invoke rcu_start_gp() and thus not necessary to acquire the root rcu_node structure's ->lock. This commit therefore replaces the rcu_start_gp() with rcu_accelerate_cbs(), thus replacing an acquisition of the root rcu_node structure's ->lock with that of this CPU's leaf rcu_node structure. This decreases contention on the root rcu_node structure's ->lock. Reported-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:29:57 -07:00
Paul E. McKenney	ec4eaccef4	rcu: Make rcu_migrate_callbacks wake GP kthread when needed The rcu_migrate_callbacks() function invokes rcu_advance_cbs() twice, ignoring the return value. This is OK at pressent because of failsafe code that does the wakeup when needed. However, this failsafe code acquires the root rcu_node structure's lock frequently, while rcu_migrate_callbacks() does so only once per CPU-offline operation. This commit therefore makes rcu_migrate_callbacks() wake up the RCU GP kthread when either call to rcu_advance_cbs() returns true, thus removing need for the failsafe code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:29:51 -07:00
Paul E. McKenney	6f576e2816	rcu: Convert ->need_future_gp[] array to boolean There is no longer any need for ->need_future_gp[] to count the number of requests for future grace periods, so this commit converts the additions to assignments to "true" and reduces the size of each element to one byte. While we are in the area, fix an obsolete comment. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:29:46 -07:00
Paul E. McKenney	0ae94e00ce	rcu: Make rcu_future_needs_gp() check all ->need_future_gps[] elements Currently, the rcu_future_needs_gp() function checks only the current element of the ->need_future_gps[] array, which might miss elements that were offset from the expected element, for example, due to races with the start or the end of a grace period. This commit therefore makes rcu_future_needs_gp() use the need_any_future_gp() macro to check all of the elements of this array. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:29:41 -07:00
Paul E. McKenney	51af970d19	rcu: Avoid losing ->need_future_gp[] values due to GP start/end races The rcu_cbs_completed() function provides the value of ->completed at which new callbacks can safely be invoked. This is recorded in two-element ->need_future_gp[] arrays in the rcu_node structure, and the elements of these arrays corresponding to the just-completed grace period are zeroed at the end of that grace period. However, the rcu_cbs_completed() function can return the current ->completed value plus either one or two, so it is possible for the corresponding ->need_future_gp[] entry to be cleared just after it was set, thus losing a request for a future grace period. This commit avoids this race by expanding ->need_future_gp[] to four elements. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:29:33 -07:00
Paul E. McKenney	fb31340f8a	rcu: Make rcu_gp_cleanup() more accurately predict need for new GP Currently, rcu_gp_cleanup() scans the rcu_node tree in order to reset state to reflect the end of the grace period. It also checks to see whether a new grace period is needed, but in a number of cases, rather than directly cause the new grace period to be immediately started, it instead leaves the grace-period-needed state where various fail-safes can find it. This works fine, but results in higher contention on the root rcu_node structure's ->lock, which is undesirable, and contention on that lock has recently become noticeable. This commit therefore makes rcu_gp_cleanup() immediately start a new grace period if there is any need for one. It is quite possible that it will later be necessary to throttle the grace-period rate, but that can be dealt with when and if. Reported-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:29:28 -07:00
Paul E. McKenney	5fe0a56298	rcu: Make rcu_gp_kthread() check for early-boot activity The rcu_gp_kthread() function immediately sleeps waiting to be notified of the need for a new grace period, which currently works because there are a number of code sequences that will provide the needed wakeup later. However, some of these code sequences need to acquire the root rcu_node structure's ->lock, and contention on that lock has started manifesting. This commit therefore makes rcu_gp_kthread() check for early-boot activity when it starts up, omitting the initial sleep in that case. Reported-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:29:24 -07:00
Paul E. McKenney	c91a8675b9	rcu: Add accessor macros for the ->need_future_gp[] array Accessors for the ->need_future_gp[] array are currently open-coded, which makes them difficult to change. To improve maintainability, this commit adds need_future_gp_mask() to compute the indexing mask from the array size, need_future_gp_element() to access the element corresponding to the specified grace-period number, and need_any_future_gp() to determine if any future grace period is needed. This commit also applies need_future_gp_element() to existing open-coded single-element accesses. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:29:18 -07:00
Paul E. McKenney	825a9911f6	rcu: Make rcu_start_future_gp()'s grace-period check more precise The rcu_start_future_gp() function uses a sloppy check for a grace period being in progress, which works today because there are a number of code sequences that resolve the resulting races. However, some of these race-resolution code sequences must acquire the root rcu_node structure's ->lock, and contention on that lock has started manifesting. This commit therefore makes rcu_start_future_gp() check more precise, eliminating the sloppy lockless check of the rcu_state structure's ->gpnum and ->completed fields. The effect is that rcu_start_future_gp() will sometimes unnecessarily attempt to start a new grace period, but this overhead will be reduced later using funnel locking. Reported-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:29:13 -07:00
Paul E. McKenney	9036c2ffd5	rcu: Improve non-root rcu_cbs_completed() accuracy When rcu_cbs_completed() is invoked on a non-root rcu_node structure, it unconditionally assumes that two grace periods must complete before the callbacks at hand can be invoked. This is overly conservative because if that non-root rcu_node structure believes that no grace period is in progress, and if the corresponding rcu_state structure's ->gpnum field has not yet been incremented, then these callbacks may safely be invoked after only one grace period has completed. This change is required to permit grace-period start requests to use funnel locking, which is in turn permitted to reduce root rcu_node ->lock contention, which has been observed by Nick Piggin. Furthermore, such contention will likely be increased by the merging of RCU-bh, RCU-preempt, and RCU-sched, so it makes sense to take steps to decrease it. This commit therefore improves the accuracy of rcu_cbs_completed() when invoked on a non-root rcu_node structure as described above. Reported-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:29:07 -07:00
Paul E. McKenney	5b4c11d54b	rcu: Add leaf-node macros This commit adds rcu_first_leaf_node() that returns a pointer to the first leaf rcu_node structure in the specified RCU flavor and an rcu_is_leaf_node() that returns true iff the specified rcu_node structure is a leaf. This commit also uses these macros where appropriate. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:28:07 -07:00
Paul E. McKenney	f7194ac32c	srcu: Add cleanup_srcu_struct_quiesced() The current cleanup_srcu_struct() flushes work, which prevents it from being invoked from some workqueue contexts, as well as from atomic (non-blocking) contexts. This patch therefore introduced a cleanup_srcu_struct_quiesced(), which can be invoked only after all activity on the specified srcu_struct has completed. This restriction allows cleanup_srcu_struct_quiesced() to be invoked from workqueue contexts as well as from atomic contexts. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nitzan Carmi <nitzanc@mellanox.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:27:56 -07:00
Yury Norov	17672480fb	rcu: Declare rcu_eqs_special_set() in public header Because rcu_eqs_special_set() is declared only in internal header kernel/rcu/tree.h and stubbed in include/linux/rcutiny.h, it is inaccessible outside of the RCU implementation. This patch therefore moves the rcu_eqs_special_set() declaration to include/linux/rcutree.h, which allows it to be used in non-rcu kernel code. Signed-off-by: Yury Norov <ynorov@caviumnetworks.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:27:51 -07:00
Paul E. McKenney	265f5f28f0	rcu: Update rcu_bind_gp_kthread() header comment The header comment for rcu_bind_gp_kthread() refers to sysidle, which is no longer with us. However, it is still important to bind RCU's grace-period kthreads to the housekeeping CPU(s), so rather than remove rcu_bind_gp_kthread(), this commit updates the comment. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:27:46 -07:00
Paul E. McKenney	0e5da22e3f	rcu: Move __rcu_read_lock() and __rcu_read_unlock() to tree_plugin.h The __rcu_read_lock() and __rcu_read_unlock() functions were moved to kernel/rcu/update.c in order to implement tiny preemptible RCU. However, tiny preemptible RCU was removed from the kernel a long time ago, so this commit belatedly moves them back into the only remaining preemptible-RCU code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:27:41 -07:00
Paul E. McKenney	cee4393989	rcu: Rename cond_resched_rcu_qs() to cond_resched_tasks_rcu_qs() Commit `e31d28b6ab` ("trace: Eliminate cond_resched_rcu_qs() in favor of cond_resched()") substituted cond_resched() for the earlier call to cond_resched_rcu_qs(). However, the new-age cond_resched() does not do anything to help RCU-tasks grace periods because (1) RCU-tasks is only enabled when CONFIG_PREEMPT=y and (2) cond_resched() is a complete no-op when preemption is enabled. This situation results in hangs when running the trace benchmarks. A number of potential fixes were discussed on LKML (https://lkml.kernel.org/r/20180224151240.0d63a059@vmware.local.home), including making cond_resched() not be a no-op; making cond_resched() not be a no-op, but only when running tracing benchmarks; reverting the aforementioned commit (which works because cond_resched_rcu_qs() does provide an RCU-tasks quiescent state; and adding a call to the scheduler/RCU rcu_note_voluntary_context_switch() function. All were deemed unsatisfactory, either due to added cond_resched() overhead or due to magic functions inviting cargo culting. This commit renames cond_resched_rcu_qs() to cond_resched_tasks_rcu_qs(), which provides a clear hint as to what this function is doing and why and where it should be used, and then replaces the call to cond_resched() with cond_resched_tasks_rcu_qs() in the trace benchmark's benchmark_event_kthread() function. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:27:29 -07:00
Byungchul Park	6fba2b3767	rcu: Remove deprecated RCU debugfs tracing code Commit `ae91aa0adb` ("rcu: Remove debugfs tracing") removed the RCU debugfs tracing code, but did not remove the no-longer used ->exp_workdone{0,1,2,3} fields in the srcu_data structure. This commit therefore removes these fields along with the code that uselessly updates them. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:27:23 -07:00
Byungchul Park	efcd2d5436	rcu: Call wake_nocb_leader_defer() with 'FORCE' when nocb_q_count is high If an excessive number of callbacks have been queued, but the NOCB leader kthread's wakeup must be deferred, then we should wake up the leader unconditionally once it is safe to do so. This was handled correctly in commit `fbce7497ee` ("rcu: Parallelize and economize NOCB kthread wakeups"), but then commit `8be6e1b15c` ("rcu: Use timer as backstop for NOCB deferred wakeups") passed RCU_NOCB_WAKE instead of the correct RCU_NOCB_WAKE_FORCE to wake_nocb_leader_defer(). As an interesting aside, RCU_NOCB_WAKE_FORCE is never passed to anything, which should have been taken as a hint. ;-) This commit therefore passes RCU_NOCB_WAKE_FORCE instead of RCU_NOCB_WAKE to wake_nocb_leader_defer() when a callback is queued onto a NOCB CPU that already has an excessive number of callbacks pending. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:27:18 -07:00
Paul E. McKenney	ef12620626	rcu: Don't allocate rcu_nocb_mask if no one needs it Commit `44c65ff2e3` ("rcu: Eliminate NOCBs CPU-state Kconfig options") made allocation of rcu_nocb_mask depend only on the rcu_nocbs=, nohz_full=, or isolcpus= kernel boot parameters. However, it failed to change the initial value of rcu_init_nohz()'s local variable need_rcu_nocb_mask to false, which can result in useless allocation of an all-zero rcu_nocb_mask. This commit therefore fixes this bug by changing the initial value of need_rcu_nocb_mask to false. While we are in the area, also correct the error message that is printed when someone specifies that can-never-exist CPUs should be NOCBs CPUs. Reported-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Byungchul Park <byungchul.park@lge.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:27:11 -07:00
Byungchul Park	be01b4cab1	rcu: Inline rcu_preempt_do_callback() into its sole caller The rcu_preempt_do_callbacks() function was introduced in commit 09223371dea(rcu: Use softirq to address performance regression), where it was necessary to handle kernel builds both containing and not containing RCU-preempt. Since then, various changes (most notably `f8b7fc6b51` ("rcu: use softirq instead of kthreads except when RCU_BOOST=y")) have resulted in this function being invoked only from rcu_kthread_do_work(), which is present only in kernels containing RCU-preempt, which in turn means that the rcu_preempt_do_callbacks() function is no longer needed. This commit therefore inlines rcu_preempt_do_callbacks() into its sole remaining caller and also removes the rcu_state_p and rcu_data_p indirection for added clarity. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> [ paulmck: Remove the rcu_state_p and rcu_data_p indirection. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:27:00 -07:00
Boqun Feng	55ebfce060	rcu: exp: Protect all sync_rcu_preempt_exp_done() with rcu_node lock Currently some callsites of sync_rcu_preempt_exp_done() are not called with the corresponding rcu_node's ->lock held, which could introduces bugs as per Paul: o CPU 0 in sync_rcu_preempt_exp_done() reads ->exp_tasks and sees that it is NULL. o CPU 1 blocks within an RCU read-side critical section, so it enqueues the task and points ->exp_tasks at it and clears CPU 1's bit in ->expmask. o All other CPUs clear their bits in ->expmask. o CPU 0 reads ->expmask, sees that it is zero, so incorrectly concludes that all quiescent states have completed, despite the fact that ->exp_tasks is non-NULL. To fix this, sync_rcu_preempt_exp_unlocked() is introduced to replace lockless callsites of sync_rcu_preempt_exp_done(). Further, a lockdep annotation is added into sync_rcu_preempt_exp_done() to prevent mis-use in the future. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:26:07 -07:00
Boqun Feng	7be8c56f8f	rcu: exp: Fix "must hold exp_mutex" comments for QS reporting functions Since commit `d9a3da0699` ("rcu: Add expedited grace-period support for preemptible RCU"), there are comments for some funtions in rcu_report_exp_rnp()'s call-chain saying that exp_mutex or its predecessors needs to be held. However, exp_mutex and its predecessors were used only to synchronize between GPs, and it is clear that all variables visited by those functions are under the protection of rcu_node's ->lock. Moreover, those functions are currently called without held exp_mutex, and seems that doesn't introduce any trouble. So this patch fixes this problem by updating the comments to match the current code. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Fixes: `d9a3da0699` ("rcu: Add expedited grace-period support for preemptible RCU") Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:26:01 -07:00
Paul E. McKenney	25f3d7effa	rcu: Parallelize expedited grace-period initialization The latency of RCU expedited grace periods grows with increasing numbers of CPUs, eventually failing to be all that expedited. Much of the growth in latency is in the initialization phase, so this commit uses workqueues to carry out this initialization concurrently on a rcu_node-by-rcu_node basis. This change makes use of a new rcu_par_gp_wq because flushing a work item from another work item running from the same workqueue can result in deadlock. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:25:44 -07:00
Paul E. McKenney	338c46403f	Merge branches 'fixes.2018.02.23a', 'srcu.2018.02.20a' and 'torture.2018.02.20a' into HEAD fixes.2018.02.23a: Miscellaneous fixes srcu.2018.02.20a: SRCU updates torture.2018.02.20a: Torture-test updates	2018-02-23 15:15:41 -08:00
Paul E. McKenney	ad7c946b35	rcu: Create RCU-specific workqueues with rescuers RCU's expedited grace periods can participate in out-of-memory deadlocks due to all available system_wq kthreads being blocked and there not being memory available to create more. This commit prevents such deadlocks by allocating an RCU-specific workqueue_struct at early boot time, and providing it with a rescuer to ensure forward progress. This uses the shiny new init_rescuer() function provided by Tejun (but indirectly). This commit also causes SRCU to use this new RCU-specific workqueue_struct. Note that SRCU's use of workqueues never blocks them waiting for readers, so this should be safe from a forward-progress viewpoint. Note that this moves SRCU from system_power_efficient_wq to a normal workqueue. In the unlikely event that this results in measurable degradation, a separate power-efficient workqueue will be creates for SRCU. Reported-by: Prateek Sood <prsood@codeaurora.org> Reported-by: Tejun Heo <tj@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Tejun Heo <tj@kernel.org>	2018-02-23 15:14:40 -08:00
Paul E. McKenney	85ba6bfe8b	torture: Provide more sensible nreader/nwriter defaults for rcuperf The default values for nreader and nwriter are apparently not all that user-friendly, resulting in people doing scalability tests that ran all runs at large scale. This commit therefore makes both the nreaders and nwriters module default to the number of CPUs, and adds a comment to rcuperf.c stating that the number of CPUs should be specified using the nr_cpus kernel boot parameter. This commit also eliminates the redundant rcuperf scripting specification of default values for these parameters. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:22:01 -08:00
Paul E. McKenney	db0c1a8aba	rcutorture: Record which grace-period primitives are tested The rcu_torture_writer() function adapts to requested testing from module parameters as well as the function pointers in the structure referenced by cur_ops. However, as long as the module parameters do not conflict with the function pointers, this adaptation is silent. This silence can result in confusion as to exactly what was tested, which could in turn result in untested RCU code making its way into mainline. This commit therefore makes rcu_torture_writer() announce exactly which portions of RCU's API it ends up testing. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:21:58 -08:00
Paul E. McKenney	f7c0e6ad4b	rcutorture: Re-enable testing of dynamic expediting During boot, normal grace periods are processed as expedited. When rcutorture is built into the kernel, it starts during boot and thus detects that normal grace periods are unconditionally expedited. Therefore, rcutorture concludes that there is no point in trying to dynamically enable expediting, do it disables this aspect of testing, which is a bit of an overreaction to the temporary boot-time expediting. This commit therefore rechecks forced expediting throughout the test, enabling dynamic expediting if normal grace periods are processed normally at any point. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:21:57 -08:00
Paul E. McKenney	eb0339934f	rcutorture: Avoid fake-writer use of undefined primitives Currently the rcu_torture_fakewriter() function invokes cur_ops->sync() and cur_ops->exp_sync() without first checking to see if they are in fact non-NULL. This results in kernel NULL pointer dereferences when testing RCU implementations that choose not to provide the full set of primitives. Given that it is perfectly reasonable to have specialized RCU implementations that provide only a subset of the RCU API, this is a bug in rcutorture. This commit therefore makes rcu_torture_fakewriter() check function pointers before invoking them, thus allowing it to test subsetted RCU implementations. Reported-by: Lihao Liang <lianglihao@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:21:56 -08:00
Paul E. McKenney	e0d31a34c6	rcutorture: Abstract function and module names This commit moves to __func__ for function names and for KBUILD_MODNAME for module names, all in the name of better resilience to change. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:21:56 -08:00
Paul E. McKenney	68a675d433	rcutorture: Replace multi-instance kzalloc() with kcalloc() This commit replaces array-allocation calls to kzalloc() with equivalent calls to kcalloc(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:21:55 -08:00
Paul E. McKenney	6308f34775	rcu: Remove SRCU throttling The code in srcu_gp_end() inserts a delay every 0x3ff grace periods in order to prevent SRCU grace-period work from consuming an entire CPU when there is a long sequence of expedited SRCU grace-period requests. However, all of SRCU's grace-period work is carried out in workqueues, which are in turn within kthreads, which are automatically throttled as needed by the scheduler. In particular, if there is plenty of idle time, there is no point in throttling. This commit therefore removes the expedited SRCU grace-period throttling. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:21:13 -08:00
Byungchul Park	a72da917f1	srcu: Remove dead code in srcu_gp_end() Of course, compilers will optimize out a dead code. Anyway, remove any dead code for better readibility. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:21:12 -08:00
Ildar Ismagilov	8ddbd8832d	srcu: Reduce scans of srcu_data in counter wrap check Currently, given a multi-level srcu_node tree, SRCU can scan the full set of srcu_data structures at each level when cleaning up after a grace period. This, though harmless otherwise, represents pointless overhead. This commit therefore eliminates this overhead by scanning the srcu_data structures only when traversing the leaf srcu_node structures. Signed-off-by: Ildar Ismagilov <devix84@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:21:12 -08:00
Ildar Ismagilov	a35d13ec36	srcu: Prevent sdp->srcu_gp_seq_needed_exp counter wrap SRCU checks each srcu_data structure's grace-period number for counter wrap four times per cycle by default. This frequency guarantees that normal comparisons will detect potential wrap. However, the expedited grace-period number is not checked. The consquences are not too horrible (a failure to expedite a grace period when requested), but it would be good to avoid such things. This commit therefore adds this check to the expedited grace-period number. Signed-off-by: Ildar Ismagilov <devix84@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:21:11 -08:00
Paul E. McKenney	cb4081cd4e	srcu: Abstract function name This commit moves to __func__ for function names in the name of better resilience to change. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:21:11 -08:00
Paul E. McKenney	65963d2461	rcu: Make expedited RCU CPU selection avoid unnecessary stores This commit reworks the first loop in sync_rcu_exp_select_cpus() to avoid doing unnecssary stores to other CPUs' rcu_data structures. This speeds up that first loop by roughly a factor of two on an old x86 system. In the case where the system is mostly idle, this loop incurs a large fraction of the overhead of the synchronize_rcu_expedited(). There is less benefit on busy systems because the overhead of the smp_call_function_single() in the second loop dominates in that case. However, it is not unusual to do configuration chances involving RCU grace periods (both expedited and normal) while the system is mostly idle, so this optimization is worth doing. While we are in the area, this commit also adds parentheses to arguments used by the for_each_leaf_node_possible_cpu() macro. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:12:29 -08:00
Paul E. McKenney	7f5d42d051	rcu: Trace expedited GP delays due to transitioning CPUs If a CPU is transitioning to or from offline state, an expedited grace period may undergo a timed wait. This timed wait can unduly delay grace periods, so this commit adds a trace statement to make it visible. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:12:28 -08:00
Paul E. McKenney	9a414201ae	rcu: Add more tracing of expedited grace periods This commit adds more tracing of expedited grace periods to enable improved debugging of slowdowns. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:12:27 -08:00
Ildar Ismagilov	274afd6bfa	rcu: Fix misprint in srcu_funnel_exp_start The srcu_funnel_exp_start() function checks to see if the srcu_struct structure's expedited grace period counter needs updating to reflect a newly arrived request for an expedited SRCU grace period. Unfortunately, the check is backwards, so this commit reverses the sense of the test. Signed-off-by: Ildar Ismagilov <devix84@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:12:27 -08:00
Matthew Wilcox	a32e01ee68	rcu: Use wrapper for lockdep asserts Commits `c0b334c5bf` and `ea9b0c8a26` introduced new sparse warnings by accessing rcu_node->lock directly and ignoring the __private marker. Introduce a new wrapper and use it. Also fix a similar problem in srcutree.c introduced by `a3883df393`. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:12:26 -08:00
Liu, Changcheng	65518db86b	rcu: Remove redundant nxttail index macro define RCU's nxttail has been optimized to be a rcu_segcblist, which is a multi-tailed linked list with macros defined for the indexes for each tail. The indexes have been defined in linux/rcu_segcblist.h, so this commit removes the redundant definitions in kernel/rcu/tree.h. Signed-off-by: Liu Changcheng <changcheng.liu@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:10:31 -08:00
Paul E. McKenney	bfbd767d4d	rcu: Consolidate rcu.h #ifdefs The kernel/rcu/rcu.h file has a pair of consecutive #ifdefs on CONFIG_TINY_RCU, so this commit consolidates them, thus saving a few lines of code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:10:30 -08:00
Paul E. McKenney	d07aee2c03	rcu: More clearly identify grace-period kthread stack dump It is not always obvious that the stack dump from a starved grace-period kthread isn't instead that of a CPU stalling the current grace period. This commit therefore adds a pr_err() flagging these dumps. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:10:29 -08:00
Paul E. McKenney	d62df57370	rcu: Remove obsolete force-quiescent-state statistics for debugfs The debugfs interface displayed statistics on RCU-pending checks but this interface has since been removed. This commit therefore removes the no-longer-used rcu_state structure's ->n_force_qs_lh and ->n_force_qs_ngp fields along with their updates. (Though the ->n_force_qs_ngp field was actually not used at all, embarrassingly enough.) If this information proves necessary in the future, the corresponding event traces will be added. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:10:29 -08:00
Paul E. McKenney	01c495f72a	rcu: Remove obsolete __rcu_pending() statistics for debugfs The debugfs interface displayed statistics on RCU-pending checks but this interface has since been removed. This commit therefore removes the no-longer-used rcu_data structure's ->n_rcu_pending, ->n_rp_core_needs_qs, ->n_rp_report_qs, ->n_rp_cb_ready, ->n_rp_cpu_needs_gp, ->n_rp_gp_completed, ->n_rp_gp_started, ->n_rp_nocb_defer_wakeup, and ->n_rp_need_nothing fields along with their updates. If this information proves necessary in the future, the corresponding event traces will be added. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:10:28 -08:00
Paul E. McKenney	62df63e048	rcu: Remove obsolete callback-invocation statistics for debugfs The debugfs interface displayed statistics on RCU callback invocation but this interface has since been removed. This commit therefore removes the no-longer-used rcu_data structure's ->n_cbs_invoked and ->n_nocbs_invoked fields along with their updates. If this information proves necessary in the future, the corresponding event traces will be added. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:10:27 -08:00
Paul E. McKenney	bec06785fe	rcu: Remove obsolete boost statistics for debugfs The debugfs interface displayed statistics on RCU priority boosting, but this interface has since been removed. This commit therefore removes the no-longer-used rcu_data structure's ->n_tasks_boosted, ->n_exp_boosts, and ->n_exp_boosts and their updates. If this information proves necessary in the future, the corresponding event traces will be added. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:10:27 -08:00
Tejun Heo	3caa973b7a	rcu: Call touch_nmi_watchdog() while printing stall warnings When RCU stall warning triggers, it can print out a lot of messages while holding spinlocks. If the console device is slow (e.g. an actual or IPMI serial console), it may end up triggering NMI hard lockup watchdog like the following. * CPU printking while holding RCU spinlock PID: 4149739 TASK: ffff881a46baa880 CPU: 13 COMMAND: "CPUThreadPool8" #0 [ffff881fff945e48] crash_nmi_callback at ffffffff8103f7d0 #1 [ffff881fff945e58] nmi_handle at ffffffff81020653 #2 [ffff881fff945eb0] default_do_nmi at ffffffff81020c36 #3 [ffff881fff945ed0] do_nmi at ffffffff81020d32 #4 [ffff881fff945ef0] end_repeat_nmi at ffffffff81956a7e [exception RIP: io_serial_in+21] RIP: ffffffff81630e55 RSP: ffff881fff943b88 RFLAGS: 00000002 RAX: 000000000000ca00 RBX: ffffffff8230e188 RCX: 0000000000000000 RDX: 00000000000002fd RSI: 0000000000000005 RDI: ffffffff8230e188 RBP: ffff881fff943bb0 R8: 0000000000000000 R9: ffffffff820cb3c4 R10: 0000000000000019 R11: 0000000000002000 R12: 00000000000026e1 R13: 0000000000000020 R14: ffffffff820cd398 R15: 0000000000000035 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 --- <NMI exception stack> --- #5 [ffff881fff943b88] io_serial_in at ffffffff81630e55 #6 [ffff881fff943b90] wait_for_xmitr at ffffffff8163175c #7 [ffff881fff943bb8] serial8250_console_putchar at ffffffff816317dc #8 [ffff881fff943bd8] uart_console_write at ffffffff8162ac00 #9 [ffff881fff943c08] serial8250_console_write at ffffffff81634691 #10 [ffff881fff943c80] univ8250_console_write at ffffffff8162f7c2 #11 [ffff881fff943c90] console_unlock at ffffffff810dfc55 #12 [ffff881fff943cf0] vprintk_emit at ffffffff810dffb5 #13 [ffff881fff943d50] vprintk_default at ffffffff810e01bf #14 [ffff881fff943d60] vprintk_func at ffffffff810e1127 #15 [ffff881fff943d70] printk at ffffffff8119a8a4 #16 [ffff881fff943dd0] print_cpu_stall_info at ffffffff810eb78c #17 [ffff881fff943e88] rcu_check_callbacks at ffffffff810ef133 #18 [ffff881fff943ee8] update_process_times at ffffffff810f3497 #19 [ffff881fff943f10] tick_sched_timer at ffffffff81103037 #20 [ffff881fff943f38] __hrtimer_run_queues at ffffffff810f3f38 #21 [ffff881fff943f88] hrtimer_interrupt at ffffffff810f442b * CPU triggering the hardlockup watchdog PID: 4149709 TASK: ffff88010f88c380 CPU: 26 COMMAND: "CPUThreadPool35" #0 [ffff883fff1059d0] machine_kexec at ffffffff8104a874 #1 [ffff883fff105a30] __crash_kexec at ffffffff811116cc #2 [ffff883fff105af0] __crash_kexec at ffffffff81111795 #3 [ffff883fff105b08] panic at ffffffff8119a6ae #4 [ffff883fff105b98] watchdog_overflow_callback at ffffffff81135dbd #5 [ffff883fff105bb0] __perf_event_overflow at ffffffff81186866 #6 [ffff883fff105be8] perf_event_overflow at ffffffff81192bc4 #7 [ffff883fff105bf8] intel_pmu_handle_irq at ffffffff8100b265 #8 [ffff883fff105df8] perf_event_nmi_handler at ffffffff8100489f #9 [ffff883fff105e58] nmi_handle at ffffffff81020653 #10 [ffff883fff105eb0] default_do_nmi at ffffffff81020b94 #11 [ffff883fff105ed0] do_nmi at ffffffff81020d32 #12 [ffff883fff105ef0] end_repeat_nmi at ffffffff81956a7e [exception RIP: queued_spin_lock_slowpath+248] RIP: ffffffff810da958 RSP: ffff883fff103e68 RFLAGS: 00000046 RAX: 0000000000000000 RBX: 0000000000000046 RCX: 00000000006d0000 RDX: ffff883fff49a950 RSI: 0000000000d10101 RDI: ffffffff81e54300 RBP: ffff883fff103e80 R8: ffff883fff11a950 R9: 0000000000000000 R10: 000000000e5873ba R11: 000000000000010f R12: ffffffff81e54300 R13: 0000000000000000 R14: ffff88010f88c380 R15: ffffffff81e54300 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #13 [ffff883fff103e68] queued_spin_lock_slowpath at ffffffff810da958 #14 [ffff883fff103e70] _raw_spin_lock_irqsave at ffffffff8195550b #15 [ffff883fff103e88] rcu_check_callbacks at ffffffff810eed18 #16 [ffff883fff103ee8] update_process_times at ffffffff810f3497 #17 [ffff883fff103f10] tick_sched_timer at ffffffff81103037 #18 [ffff883fff103f38] __hrtimer_run_queues at ffffffff810f3f38 #19 [ffff883fff103f88] hrtimer_interrupt at ffffffff810f442b --- <IRQ stack> --- Avoid spuriously triggering NMI hardlockup watchdog by touching it from the print functions. show_state_filter() shares the same problem and solution. v2: Relocate the comment to where it belongs. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:10:26 -08:00
Paul E. McKenney	3016611eed	rcu: Fix CPU offload boot message when no CPUs are offloaded In CONFIG_RCU_NOCB_CPU=y kernels, if the boot parameters indicate that none of the CPUs should in fact be offloaded, the following somewhat obtuse message appears: Offload RCU callbacks from CPUs: . This commit therefore makes the message at least grammatically correct in this case: Offload RCU callbacks from CPUs: (none) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-20 16:10:19 -08:00
Lihao Liang	398953e62c	rcu: Remove unnecessary spinlock in rcu_boot_init_percpu_data() Since rcu_boot_init_percpu_data() is only called at boot time, there is no data race and spinlock is not needed. Signed-off-by: Lihao Liang <lianglihao@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-02-15 15:40:36 -08:00
Linus Torvalds	28bc6fb959	SCSI misc on 20180131 This is mostly updates of the usual driver suspects: arcmsr, scsi_debug, mpt3sas, lpfc, cxlflash, qla2xxx, aacraid, megaraid_sas, hisi_sas. We also have a rework of the libsas hotplug handling to make it more robust, a slew of 32 bit time conversions and fixes, and a host of the usual minor updates and style changes. The biggest potential for regressions is the libsas hotplug changes, but so far they seem stable under testing. Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCWnH+5SYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishWxuAP0UvuJp MNR/yU/wv/emSzOc48Ldwd7I0xD2XxSnloGUgwD+IGZZT5yNUQA1THCbm+en4hkB WvyBieQs9qRit+2czd4= =gJMf -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This is mostly updates of the usual driver suspects: arcmsr, scsi_debug, mpt3sas, lpfc, cxlflash, qla2xxx, aacraid, megaraid_sas, hisi_sas. We also have a rework of the libsas hotplug handling to make it more robust, a slew of 32 bit time conversions and fixes, and a host of the usual minor updates and style changes. The biggest potential for regressions is the libsas hotplug changes, but so far they seem stable under testing" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (313 commits) scsi: qla2xxx: Fix logo flag for qlt_free_session_done() scsi: arcmsr: avoid do_gettimeofday scsi: core: Add VENDOR_SPECIFIC sense code definitions scsi: qedi: Drop cqe response during connection recovery scsi: fas216: fix sense buffer initialization scsi: ibmvfc: Remove unneeded semicolons scsi: hisi_sas: fix a bug in hisi_sas_dev_gone() scsi: hisi_sas: directly attached disk LED feature for v2 hw scsi: hisi_sas: devicetree: bindings: add LED feature for v2 hw scsi: megaraid_sas: NVMe passthrough command support scsi: megaraid: use ktime_get_real for firmware time scsi: fnic: use 64-bit timestamps scsi: qedf: Fix error return code in __qedf_probe() scsi: devinfo: fix format of the device list scsi: qla2xxx: Update driver version to 10.00.00.05-k scsi: qla2xxx: Add XCB counters to debugfs scsi: qla2xxx: Fix queue ID for async abort with Multiqueue scsi: qla2xxx: Fix warning for code intentation in __qla24xx_handle_gpdb_event() scsi: qla2xxx: Fix warning during port_name debug print scsi: qla2xxx: Fix warning in qla2x00_async_iocb_timeout() ...	2018-01-31 11:23:28 -08:00
Paul E. McKenney	1dfa55e019	Merge branches 'cond_resched.2017.12.04a', 'dyntick.2017.11.28a', 'fixes.2017.12.11a', 'srbd.2017.12.05a' and 'torture.2017.12.11a' into HEAD cond_resched.2017.12.04a: Convert cond_resched_rcu_qs() to cond_resched() dyntick.2017.11.28a: Make RCU dynticks handle interrupts from NMI fixes.2017.12.11a: Miscellaneous fixes srbd.2017.12.05a: Remove now-redundant smp_read_barrier_depends() torture.2017.12.11a: Torture-testing update	2017-12-11 09:21:58 -08:00
Paul E. McKenney	a2f2577d96	torture: Eliminate torture_runnable and perf_runnable The purpose of torture_runnable is to allow rcutorture and locktorture to be started and stopped via sysfs when they are built into the kernel (as in not compiled as loadable modules). However, the 0444 permissions for both instances of torture_runnable prevent this use case from ever being put into practice. Given that there have been no complaints about this deficiency, it is reasonable to conclude that no one actually makes use of this sysfs capability. The perf_runnable module parameter for rcuperf is in the same situation. This commit therefore removes both torture_runnable instances as well as perf_runnable. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-12-11 09:18:29 -08:00
Paul E. McKenney	e8302739aa	rcutorture: Preempt RCU-preempt readers more vigorously This commit attempts to make a very rare rcutorture failure happen more often by increasing the fraction of RCU-preempt read-side critical sections that are preempted. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-12-11 09:18:22 -08:00
Paul E. McKenney	cc1321c96f	torture: Reduce #ifdefs for preempt_schedule() This commit adds a torture_preempt_schedule() that is nothingness in !PREEMPT builds and is preempt_schedule() otherwise. Then torture_preempt_schedule() is used to eliminate several ugly #ifdefs, both in rcutorture and in locktorture. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-12-11 09:18:22 -08:00
Rakib Mullick	84b12b752f	rcu: Remove have_rcu_nocb_mask from tree_plugin.h Currently have_rcu_nocb_mask is used to avoid double allocation of rcu_nocb_mask during boot up. Due to different representation of cpumask_var_t on different kernel config CPUMASK=y(or n) it was okay. But now we have a helper cpumask_available(), which can be utilized to check whether rcu_nocb_mask has been allocated or not without using a variable. Removing the variable also reduces vmlinux size. Unpatched version: text data bss dec hex filename 13050393 7852470 14543408 35446271 21cddff vmlinux Patched version: text data bss dec hex filename 13050390 7852438 14543408 35446236 21cdddc vmlinux Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-12-11 09:17:40 -08:00
Paul E. McKenney	efd88b02bb	rcu: Add comment giving debug strategy for double call_rcu() The following statement has for some reason proven non-intuitive: WARN_ON_ONCE(rcu_segcblist_empty(&rdp->cblist) != (count == 0)); This commit therefore adds a comment that states that this warning usually triggers in response to a double call_rcu(), which is sort of like a double free. The comment also suggests building with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y to track down the double call_rcu(). Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-12-11 09:17:39 -08:00
Paul E. McKenney	156baec397	rcu: Export init_rcu_head() and destroy_rcu_head() to GPL modules Use of init_rcu_head() and destroy_rcu_head() from modules results in the following build-time error with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y: ERROR: "init_rcu_head" [drivers/scsi/scsi_mod.ko] undefined! ERROR: "destroy_rcu_head" [drivers/scsi/scsi_mod.ko] undefined! This commit therefore adds EXPORT_SYMBOL_GPL() for each to allow them to be used by GPL-licensed kernel modules. Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-07 19:51:49 -05:00
Paul E. McKenney	d633198088	srcu: Prohibit call_srcu() use under raw spinlocks Invoking queue_delayed_work() while holding a raw spinlock is forbidden in -rt kernels, which is exactly what __call_srcu() does, indirectly via srcu_funnel_gp_start(). This commit therefore downgrades Tree SRCU's locking from raw to non-raw spinlocks, which works because call_srcu() is not ever called while holding a raw spinlock. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-11-28 15:52:33 -08:00
Paul E. McKenney	e68bbb266d	rcu: Simplify rcu_eqs_{enter,exit}() non-idle task debug code The code that checks for non-idle non-nohz_idle-usermode tasks invoking rcu_eqs_enter() and rcu_eqs_exit() prints a considerable quantity of helpful information. However, these checks fire rarely, so the extra complexity is no longer worth it. This commit therefore replaces this debug code with simple WARN_ON_ONCE() statements. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-11-28 15:51:21 -08:00
Paul E. McKenney	9dd238e286	rcu: Fold rcu_eqs_exit_common() into rcu_eqs_exit() There is now only one call to rcu_eqs_exit_common() and there is no other reason to keep it separate. This commit therefore inlines it into its sole call site, saving a few lines of code in the process. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-11-28 15:51:21 -08:00
Paul E. McKenney	215bba9f59	rcu: Fold rcu_eqs_enter_common() into rcu_eqs_enter() There is now only one call to rcu_eqs_enter_common() and there is no other reason to keep it separate. This commit therefore inlines it into its sole call site, saving a few lines of code in the process. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-11-28 15:51:20 -08:00
Paul E. McKenney	2342172fd6	rcu: Avoid ->dynticks_nesting store tearing Although ->dynticks_nesting is updated only by process level, it is accessed from hardirq to check for interrupt-from-idle quiescent states. Store tearing is thus possible, so this commit applies WRITE_ONCE() to ->dynticks_nesting stores. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-11-28 15:51:20 -08:00
Paul E. McKenney	914955e18c	rcu: Stop duplicating lockdep checks in RCU's idle-entry code The three RCU_LOCKDEP_WARN() calls in rcu_eqs_enter_common() are redundant with other lockdep checks, so this commit removes them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-11-28 15:51:19 -08:00
Paul E. McKenney	dec98900ea	rcu: Add ->dynticks field to rcu_dyntick trace event Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-11-28 15:51:19 -08:00
Paul E. McKenney	84585aa8b6	rcu: Shrink ->dynticks_{nmi_,}nesting from long long to long Because the ->dynticks_nesting field now only contains the process-based nesting level instead of a value encoding both the process nesting level and the irq "nesting" level, we no longer need a long long, even on 32-bit systems. This commit therefore changes both the ->dynticks_nesting and ->dynticks_nmi_nesting fields to long. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-11-28 15:51:18 -08:00
Paul E. McKenney	bd2b879a1c	rcu: Add tracing to irq/NMI dyntick-idle transitions Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-11-28 15:51:18 -08:00
Paul E. McKenney	844ccdd7dc	rcu: Eliminate rcu_irq_enter_disabled() Now that the irq path uses the rcu_nmi_{enter,exit}() algorithm, rcu_irq_enter() and rcu_irq_exit() may be used from any context. There is thus no need for rcu_irq_enter_disabled() and for the checks using it. This commit therefore eliminates rcu_irq_enter_disabled(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-11-27 08:42:03 -08:00
Paul E. McKenney	51a1fd30f1	rcu: Make ->dynticks_nesting be a simple counter Now that ->dynticks_nesting counts only process-level dyntick-idle entry and exit, there is no need for the elaborate segmented counter with its guard fields and overflow checking. This commit therefore makes ->dynticks_nesting be a simple counter. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-11-27 08:42:03 -08:00
Paul E. McKenney	58721f5da4	rcu: Define rcu_irq_{enter,exit}() in terms of rcu_nmi_{enter,exit}() RCU currently uses two different mechanisms for tracking irqs and NMIs. This is unnecessary complexity: Given that NMIs can nest and given that RCU's tracking handles such nesting, the NMI tracking mechanism can also be used to track irqs. This commit therefore defines rcu_irq_enter() in terms of rcu_nmi_enter() and rcu_irq_exit() in terms of rcu_nmi_exit(). Unfortunately, callers must still distinguish between the irq and NMI functions because additional actions are taken when an irq interrupts idle or nohz_full usermode execution, and these actions cannot always be taken from NMI handlers. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-11-27 08:42:03 -08:00
Paul E. McKenney	6136d6e48a	rcu: Clamp ->dynticks_nmi_nesting at eqs entry/exit In preparation for merging dyntick-idle irq handling into the NMI algorithm, clamp ->dynticks_nmi_nesting value to allow for interrupts that enter but never leave and vice versa. It is important that the clamping happen outside of the extended quiescent state. Otherwise, there will be short windows where irqs and NMIs fail to convince RCU to start watching. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-11-27 08:40:10 -08:00
Paul E. McKenney	fd581a91ac	rcu: Move rcu_nmi_{enter,exit}() to prepare for consolidation This is a code-motion-only commit that prepares to define rcu_irq_enter() in terms of rcu_nmi_enter() and rcu_irq_exit() in terms of rcu_irq_exit(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-11-27 08:40:10 -08:00
Paul E. McKenney	a0eb22bf64	rcu: Reduce dyntick-idle state space Both extended-quiescent-state entry and exit first update the nesting counter and then adjust the dyntick-idle state. This means that there are four states: (1) Both nesting and dyntick idle indicate idle, (2) Nesting indicates idle but dyntick idle does not, (3) Nesting indicates non-idle and dyntick idle does not, and (4) Both nesting and dyntick idle indicate non-idle. This commit simplifies the state space by eliminating #3, reversing the order of updates on exit from extended quiescent state. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-11-27 08:40:10 -08:00
Paul E. McKenney	2e672ab2d4	rcu: Avoid ->dynticks_nmi_nesting store tearing NMIs can nest, and store tearing could in theory happen on carries from one byte to the next. This commit therefore adds the WRITE_ONCE() macros preventing this. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-11-27 08:40:09 -08:00
Linus Torvalds	2bcc673101	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Yet another big pile of changes: - More year 2038 work from Arnd slowly reaching the point where we need to think about the syscalls themself. - A new timer function which allows to conditionally (re)arm a timer only when it's either not running or the new expiry time is sooner than the armed expiry time. This allows to use a single timer for multiple timeout requirements w/o caring about the first expiry time at the call site. - A new NMI safe accessor to clock real time for the printk timestamp work. Can be used by tracing, perf as well if required. - A large number of timer setup conversions from Kees which got collected here because either maintainers requested so or they simply got ignored. As Kees pointed out already there are a few trivial merge conflicts and some redundant commits which was unavoidable due to the size of this conversion effort. - Avoid a redundant iteration in the timer wheel softirq processing. - Provide a mechanism to treat RTC implementations depending on their hardware properties, i.e. don't inflict the write at the 0.5 seconds boundary which originates from the PC CMOS RTC to all RTCs. No functional change as drivers need to be updated separately. - The usual small updates to core code clocksource drivers. Nothing really exciting" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits) timers: Add a function to start/reduce a timer pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday() timer: Prepare to change all DEFINE_TIMER() callbacks netfilter: ipvs: Convert timers to use timer_setup() scsi: qla2xxx: Convert timers to use timer_setup() block/aoe: discover_timer: Convert timers to use timer_setup() ide: Convert timers to use timer_setup() drbd: Convert timers to use timer_setup() mailbox: Convert timers to use timer_setup() crypto: Convert timers to use timer_setup() drivers/pcmcia: omap1: Fix error in automated timer conversion ARM: footbridge: Fix typo in timer conversion drivers/sgi-xp: Convert timers to use timer_setup() drivers/pcmcia: Convert timers to use timer_setup() drivers/memstick: Convert timers to use timer_setup() drivers/macintosh: Convert timers to use timer_setup() hwrng/xgene-rng: Convert timers to use timer_setup() auxdisplay: Convert timers to use timer_setup() sparc/led: Convert timers to use timer_setup() mips: ip22/32: Convert timers to use timer_setup() ...	2017-11-13 17:56:58 -08:00
Linus Torvalds	3e2014637c	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main updates in this cycle were: - Group balancing enhancements and cleanups (Brendan Jackman) - Move CPU isolation related functionality into its separate kernel/sched/isolation.c file, with related 'housekeeping_()' namespace and nomenclature et al. (Frederic Weisbecker) - Improve the interactive/cpu-intense fairness calculation (Josef Bacik) - Improve the PELT code and related cleanups (Peter Zijlstra) - Improve the logic of pick_next_task_fair() (Uladzislau Rezki) - Improve the RT IPI based balancing logic (Steven Rostedt) - Various micro-optimizations: - better !CONFIG_SCHED_DEBUG optimizations (Patrick Bellasi) - better idle loop (Cheng Jian) - ... plus misc fixes, cleanups and updates" 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) sched/core: Optimize sched_feat() for !CONFIG_SCHED_DEBUG builds sched/sysctl: Fix attributes of some extern declarations sched/isolation: Document isolcpus= boot parameter flags, mark it deprecated sched/isolation: Add basic isolcpus flags sched/isolation: Move isolcpus= handling to the housekeeping code sched/isolation: Handle the nohz_full= parameter sched/isolation: Introduce housekeeping flags sched/isolation: Split out new CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL sched/isolation: Rename is_housekeeping_cpu() to housekeeping_cpu() sched/isolation: Use its own static key sched/isolation: Make the housekeeping cpumask private sched/isolation: Provide a dynamic off-case to housekeeping_any_cpu() sched/isolation, watchdog: Use housekeeping_cpumask() instead of ad-hoc version sched/isolation: Move housekeeping related code to its own file sched/idle: Micro-optimize the idle loop sched/isolcpus: Fix "isolcpus=" boot parameter handling when !CONFIG_CPUMASK_OFFSTACK x86/tsc: Append the 'tsc=' description for the 'tsc=unstable' boot parameter sched/rt: Simplify the IPI based RT balancing logic block/ioprio: Use a helper to check for RT prio sched/rt: Add a helper to test for a RT task ...	2017-11-13 13:37:52 -08:00
Linus Torvalds	8e9a2dba86	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking updates from Ingo Molnar: "The main changes in this cycle are: - Another attempt at enabling cross-release lockdep dependency tracking (automatically part of CONFIG_PROVE_LOCKING=y), this time with better performance and fewer false positives. (Byungchul Park) - Introduce lockdep_assert_irqs_enabled()/disabled() and convert open-coded equivalents to lockdep variants. (Frederic Weisbecker) - Add down_read_killable() and use it in the VFS's iterate_dir() method. (Kirill Tkhai) - Convert remaining uses of ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE(). Most of the conversion was Coccinelle driven. (Mark Rutland, Paul E. McKenney) - Get rid of lockless_dereference(), by strengthening Alpha atomics, strengthening READ_ONCE() with smp_read_barrier_depends() and thus being able to convert users of lockless_dereference() to READ_ONCE(). (Will Deacon) - Various micro-optimizations: - better PV qspinlocks (Waiman Long), - better x86 barriers (Michael S. Tsirkin) - better x86 refcounts (Kees Cook) - ... plus other fixes and enhancements. (Borislav Petkov, Juergen Gross, Miguel Bernal Marin)" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits) locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE rcu: Use lockdep to assert IRQs are disabled/enabled netpoll: Use lockdep to assert IRQs are disabled/enabled timers/posix-cpu-timers: Use lockdep to assert IRQs are disabled/enabled sched/clock, sched/cputime: Use lockdep to assert IRQs are disabled/enabled irq_work: Use lockdep to assert IRQs are disabled/enabled irq/timings: Use lockdep to assert IRQs are disabled/enabled perf/core: Use lockdep to assert IRQs are disabled/enabled x86: Use lockdep to assert IRQs are disabled/enabled smp/core: Use lockdep to assert IRQs are disabled/enabled timers/hrtimer: Use lockdep to assert IRQs are disabled/enabled timers/nohz: Use lockdep to assert IRQs are disabled/enabled workqueue: Use lockdep to assert IRQs are disabled/enabled irq/softirqs: Use lockdep to assert IRQs are disabled/enabled locking/lockdep: Add IRQs disabled/enabled assertion APIs: lockdep_assert_irqs_enabled()/disabled() locking/pvqspinlock: Implement hybrid PV queued/unfair locks locking/rwlocks: Fix comments x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized block, locking/lockdep: Assign a lock_class per gendisk used for wait_for_completion() workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes ...	2017-11-13 12:38:26 -08:00
Linus Torvalds	6098850e7e	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main changes in this cycle are: - Documentation updates - RCU CPU stall-warning updates - Torture-test updates - Miscellaneous fixes Size wise the biggest updates are to documentation. Excluding documentation most of the code increase comes from a single commit which expands debugging" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) srcu: Add parameters to SRCU docbook comments doc: Rewrite confusing statement about memory barriers memory-barriers.txt: Fix typo in pairing example rcu/segcblist: Include rcupdate.h rcu: Add extended-quiescent-state testing advice rcu: Suppress lockdep false-positive ->boost_mtx complaints rcu: Do not include rtmutex_common.h unconditionally torture: Provide TMPDIR environment variable to specify tmpdir rcutorture: Dump writer stack if stalled rcutorture: Add interrupt-disable capability to stall-warning tests rcu: Suppress RCU CPU stall warnings while dumping trace rcu: Turn off tracing before dumping trace rcu: Make RCU CPU stall warnings check for irq-disabled CPUs sched,rcu: Make cond_resched() provide RCU quiescent state sched: Make resched_cpu() unconditional irq_work: Map irq_work_on_queue() to irq_work_on() in !SMP rcu: Create call_rcu_tasks() kthread at boot time rcu: Fix up pending cbs check in rcu_prepare_for_idle memory-barriers: Rework multicopy-atomicity section memory-barriers: Replace uses of "transitive" ...	2017-11-13 12:18:10 -08:00
Frederic Weisbecker	b04db8e19f	rcu: Use lockdep to assert IRQs are disabled/enabled Lockdep now has an integrated IRQs disabled/enabled sanity check. Just use it instead of the ad-hoc RCU version. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: David S . Miller <davem@davemloft.net> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1509980490-4285-15-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-11-08 11:13:55 +01:00
Ingo Molnar	8a103df440	Merge branch 'linus' into sched/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-11-08 10:17:15 +01:00
Kees Cook	fd30b717b8	rcu: Convert timers to use timer_setup() In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org>	2017-11-02 15:44:09 -07:00
Greg Kroah-Hartman	b24413180f	License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a /uapi/ one with no licensing information in it, - file was a /uapi/ one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non /uapi/ files that summary was: SPDX license identifier # files ---------------------------------------------------\|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a /uapi/ path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------\|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the /uapi/ ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------\|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-02 11:10:55 +01:00
Frederic Weisbecker	de201559df	sched/isolation: Introduce housekeeping flags Before we implement isolcpus under housekeeping, we need the isolation features to be more finegrained. For example some people want NOHZ_FULL without the full scheduler isolation, others want full scheduler isolation without NOHZ_FULL. So let's cut all these isolation features piecewise, at the risk of overcutting it right now. We can still merge some flags later if they always make sense together. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1509072159-31808-9-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-10-27 09:55:29 +02:00
Frederic Weisbecker	7863406143	sched/isolation: Move housekeeping related code to its own file The housekeeping code is currently tied to the NOHZ code. As we are planning to make housekeeping independent from it, start with moving the relevant code to its own file. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1509072159-31808-2-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-10-27 09:55:24 +02:00
Ingo Molnar	72bc286b81	Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Documentation updates - Miscellaneous fixes - RCU CPU stall-warning updates - Torture-test updates Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-10-24 10:49:44 +02:00
Paul E. McKenney	ad4e25a3a1	Merge branches 'doc.2017.10.20a', 'fixes.2017.10.19a', 'stall.2017.10.09a' and 'torture.2017.10.09a' into HEAD doc.2017.10.20a: Documentation updates. fixes.2017.10.19a: Miscellaneous fixes. stall.2017.10.09a: RCU CPU stall-warning updates. torture.2017.10.09a: Torture-test updates.	2017-10-20 11:11:15 -07:00
Paul E. McKenney	e4d0b679a8	srcu: Add parameters to SRCU docbook comments Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-10-20 11:09:33 -07:00
Paul E. McKenney	27fdb35fe9	doc: Fix various RCU docbook comment-header problems Because many of RCU's files have not been included into docbook, a number of errors have accumulated. This commit fixes them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-10-19 22:26:11 -04:00
Sebastian Andrzej Siewior	56628a7fc8	rcu/segcblist: Include rcupdate.h The RT build on ARM complains about non-existing ULONG_CMP_LT. This commit therefore includes rcupdate.h into rcu_segcblist.c. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-10-19 12:13:36 -07:00
Paul E. McKenney	c0da313e09	rcu: Add extended-quiescent-state testing advice If you add or remove calls to rcu_idle_enter(), rcu_user_enter(), rcu_irq_exit(), rcu_irq_exit_irqson(), rcu_idle_exit(), rcu_user_exit(), rcu_irq_enter(), rcu_irq_enter_irqson(), rcu_nmi_enter(), or rcu_nmi_exit(), you should run a full set of tests on a kernel built with CONFIG_RCU_EQS_DEBUG=y. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-10-19 12:13:36 -07:00
Paul E. McKenney	02a7c234e5	rcu: Suppress lockdep false-positive ->boost_mtx complaints RCU priority boosting uses rt_mutex_init_proxy_locked() to initialize an rt_mutex structure in locked state held by some other task. When that other task releases it, lockdep complains (quite accurately, but a bit uselessly) that the other task never acquired it. This complaint can suppress other, more helpful, lockdep complaints, and in any case it is a false positive. This commit therefore switches from rt_mutex_unlock() to rt_mutex_futex_unlock(), thereby avoiding the lockdep annotations. Of course, if lockdep ever learns about rt_mutex_init_proxy_locked(), addtional adjustments will be required. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-10-19 12:13:36 -07:00
Sebastian Andrzej Siewior	b88697810d	rcu: Do not include rtmutex_common.h unconditionally This commit adjusts include files and provides definitions in preparation for suppressing lockdep false-positive ->boost_mtx complaints. Without this preparation, architectures not supporting rt_mutex will get build failures. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-10-19 12:12:06 -07:00
Paul E. McKenney	0032f4e889	rcutorture: Dump writer stack if stalled Right now, rcutorture warns if an rcu_torture_writer() kthread stalls, but this warning is not always all that helpful. This commit therefore makes the first such warning include a stack dump. This in turn requires that sched_show_task() be exported to GPL modules, so this commit makes that change as well. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-10-09 14:26:09 -07:00
Paul E. McKenney	2b1516e55f	rcutorture: Add interrupt-disable capability to stall-warning tests When rcutorture sees the rcutorture.stall_cpu kernel boot parameter, it loops with preemption disabled, which does in fact normally generate an RCU CPU stall warning message. However, there are test scenarios that need the stalling CPU to have interrupts disabled. This commit therefore adds an rcutorture.stall_cpu_irqsoff kernel boot parameter that causes the stalling CPU to disable interrupts. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-10-09 14:26:08 -07:00
Paul E. McKenney	f22ce09157	rcu: Suppress RCU CPU stall warnings while dumping trace Currently, RCU emits Suppress RCU CPU stall warnings during its automatically initiated ftrace_dump() calls after detecting an error condition, which can result in excessively excessive console output and lost trace events. This commit therefore suppresses RCU CPU stall warnings across any of these ftrace_dump() calls. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-10-09 14:25:17 -07:00
Paul E. McKenney	83b6ca1fed	rcu: Turn off tracing before dumping trace Currently, RCU allows tracing to continue when it automatically does ftrace_dump() after detecting an error condition, which can result in excessively large traces and lost trace events. This commit therefore does a tracing_off() before any of these ftrace_dump() calls. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-10-09 14:25:17 -07:00
Paul E. McKenney	9b9500da81	rcu: Make RCU CPU stall warnings check for irq-disabled CPUs One common question upon seeing an RCU CPU stall warning is "did the stalled CPUs have interrupts disabled?" However, the current stall warnings are silent on this point. This commit therefore uses irq_work to check whether stalled CPUs still respond to IPIs, and flags this state in the RCU CPU stall warning console messages. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-10-09 14:25:17 -07:00
Paul E. McKenney	f79c3ad618	sched,rcu: Make cond_resched() provide RCU quiescent state There is some confusion as to which of cond_resched() or cond_resched_rcu_qs() should be added to long in-kernel loops. This commit therefore eliminates the decision by adding RCU quiescent states to cond_resched(). This commit also simplifies the code that used to interact with cond_resched_rcu_qs(), and that now interacts with cond_resched(), to reduce its overhead. This reduction is necessary to allow the heavier-weight cond_resched_rcu_qs() mechanism to be invoked everywhere that cond_resched() is invoked. Part of that reduction in overhead converts the jiffies_till_sched_qs kernel parameter to read-only at runtime, thus eliminating the need for bounds checking. Reported-by: Michal Hocko <mhocko@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> [ paulmck: Keep PREEMPT=n cond_resched a no-op, per Peter Zijlstra. ]	2017-10-09 14:25:17 -07:00
Paul E. McKenney	c63eb17ff0	rcu: Create call_rcu_tasks() kthread at boot time Currently the call_rcu_tasks() kthread is created upon first invocation of call_rcu_tasks(). This has the advantage of avoiding creation if there are never any invocations of call_rcu_tasks() and of synchronize_rcu_tasks(), but it requires an unreliable heuristic to determine when it is safe to create the kthread. For example, it is not safe to create the kthread when call_rcu_tasks() is invoked with a spinlock held, but there is no good way to detect this in !PREEMPT kernels. This commit therefore creates this kthread unconditionally at core_initcall() time. If you don't want this kthread created, then build with CONFIG_TASKS_RCU=n. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-10-09 14:24:14 -07:00
Neeraj Upadhyay	135bd1a230	rcu: Fix up pending cbs check in rcu_prepare_for_idle The pending-callbacks check in rcu_prepare_for_idle() is backwards. It should accelerate if there are pending callbacks, but the check rather uselessly accelerates only if there are no callbacks. This commit therefore inverts this check. Fixes: `15fecf89e4` ("srcu: Abstract multi-tail callback list handling") Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> # 4.12.x	2017-10-09 14:24:14 -07:00
Linus Torvalds	013a8ee628	Two updates. - A memory fix with left over code from spliting out ftrace_ops and function graph tracer, where the function graph tracer could reset the trampoline pointer, leaving the old trampoline not to be freed (memory leak). - The update to Paul's patch that added the unnecessary READ_ONCE(). This removes the unnecessary READ_ONCE() instead of having to rebase the branch to update the patch that added it. -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEEQEw9Eu0DdyUUkuUUybkF8mrZjcsFAlnU++sUHHJvc3RlZHRA Z29vZG1pcy5vcmcACgkQybkF8mrZjcujzgf/ebIzGKe5vQKNrL4ITAcIz0T7Hvzl pWw4uJp8kqO9x9EHMnztAkltQigvjvgDKZozJpUGgtNsFLuvdgQSBMK24YV8vLHs UmXEnQ2tSB/2Sg2ccEnpjVXaMzL9aqlbeTmACbdd9UgZnvPiUYPejq2jFfECFQjb k/gZT911ukBtx4mXYKzGFbTEZHdc/YUs6Y/wzB1ox5BBIUh71ZDZXxQTUHfXHlwS Cst69/9dKl4nBEGDGas6/95iR+ORVv85osI/pqPtjSj4EkRnWfVRotaH1kNuSQil gDIHSoy35NfXJx77/5IFHfrjFBAkr0IYRNL/jZaWazwM7rdqfAN8TwMQuA== =4CtF -----END PGP SIGNATURE----- Merge tag 'trace-v4.14-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixlets from Steven Rostedt: "Two updates: - A memory fix with left over code from spliting out ftrace_ops and function graph tracer, where the function graph tracer could reset the trampoline pointer, leaving the old trampoline not to be freed (memory leak). - The update to Paul's patch that added the unnecessary READ_ONCE(). This removes the unnecessary READ_ONCE() instead of having to rebase the branch to update the patch that added it" * tag 'trace-v4.14-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: rcu: Remove extraneous READ_ONCE()s from rcu_irq_{enter,exit}() ftrace: Fix kmemleak in unregister_ftrace_graph	2017-10-04 08:34:01 -07:00
Paul E. McKenney	f39b536ce9	rcu: Remove extraneous READ_ONCE()s from rcu_irq_{enter,exit}() The read of ->dynticks_nmi_nesting in rcu_irq_enter() and rcu_irq_exit() is currently protected with READ_ONCE(). However, this protection is unnecessary because (1) ->dynticks_nmi_nesting is updated only by the current CPU, (2) Although NMI handlers can update this field, they reset it back to its old value before return, and (3) Interrupts are disabled, so nothing else can modify it. The value of ->dynticks_nmi_nesting is thus effectively constant, and so no protection is required. This commit therefore removes the READ_ONCE() protection from these two accesses. Link: http://lkml.kernel.org/r/20170926031902.GA2074@linux.vnet.ibm.com Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-10-03 10:27:32 -04:00
Linus Torvalds	ac0a36461f	Stack tracing and RCU has been having issues with each other and lockdep has been pointing out constant problems. The changes have been going into the stack tracer, but it has been discovered that the problem isn't with the stack tracer itself, but it is with calling save_stack_trace() from within the internals of RCU. The stack tracer is the one that can trigger the issue the easiest, but examining the problem further, it could also happen from a WARN() in the wrong place, or even if an NMI happened in this area and it did an rcu_read_lock(). The critical area is where RCU is not watching. Which can happen while going to and from idle, or bringing up or taking down a CPU. The final fix was to put the protection in kernel_text_address() as it is the one that requires RCU to be watching while doing the stack trace. To make this work properly, Paul had to allow rcu_irq_enter() happen after rcu_nmi_enter(). This should have been done anyway, since an NMI can page fault (reading vmalloc area), and a page fault triggers rcu_irq_enter(). One patch is just a consolidation of code so that the fix only needed to be done in one location. -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEEQEw9Eu0DdyUUkuUUybkF8mrZjcsFAlnGyXoUHHJvc3RlZHRA Z29vZG1pcy5vcmcACgkQybkF8mrZjctKtwf8CeKGqOdlqkZEafIpWaIASXmAVMO/ WE+hQK+rCydWFvzADgb/rOmsR0ou8WGEXcuUPxVxmvMyqhKhZ6AU1hE/7Y8P0pMq F4bev+j3lAJC65ezFAh+ZQcIjaRIH4MFVPsUTaibSPSN7xziMNIpbf9VOVfpUm8A jf9p6YAmyhFVi6DstCc29SWnywEVwC2ZWRVKRPXKry8/dPxjfVcLclGX680Eqi9I EnYaOdC/mGbtvHPOUSs/P0cfxExHmyEErQHeOV8FPymj6KJ6+KoYIiELNlTHUBj/ eeKzrHc/b3j+lz0RPlA8WxYmpmEm4SE5cV3vRebdBNUBrABSN1RxeOozyQ== =1KkS -----END PGP SIGNATURE----- Merge tag 'trace-v4.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Stack tracing and RCU has been having issues with each other and lockdep has been pointing out constant problems. The changes have been going into the stack tracer, but it has been discovered that the problem isn't with the stack tracer itself, but it is with calling save_stack_trace() from within the internals of RCU. The stack tracer is the one that can trigger the issue the easiest, but examining the problem further, it could also happen from a WARN() in the wrong place, or even if an NMI happened in this area and it did an rcu_read_lock(). The critical area is where RCU is not watching. Which can happen while going to and from idle, or bringing up or taking down a CPU. The final fix was to put the protection in kernel_text_address() as it is the one that requires RCU to be watching while doing the stack trace. To make this work properly, Paul had to allow rcu_irq_enter() happen after rcu_nmi_enter(). This should have been done anyway, since an NMI can page fault (reading vmalloc area), and a page fault triggers rcu_irq_enter(). One patch is just a consolidation of code so that the fix only needed to be done in one location" * tag 'trace-v4.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Remove RCU work arounds from stack tracer extable: Enable RCU if it is not watching in kernel_text_address() extable: Consolidate *kernel_text_address() functions rcu: Allow for page faults in NMI handlers	2017-09-25 15:22:31 -07:00
Paul E. McKenney	28585a8326	rcu: Allow for page faults in NMI handlers A number of architecture invoke rcu_irq_enter() on exception entry in order to allow RCU read-side critical sections in the exception handler when the exception is from an idle or nohz_full CPU. This works, at least unless the exception happens in an NMI handler. In that case, rcu_nmi_enter() would already have exited the extended quiescent state, which would mean that rcu_irq_enter() would (incorrectly) cause RCU to think that it is again in an extended quiescent state. This will in turn result in lockdep splats in response to later RCU read-side critical sections. This commit therefore causes rcu_irq_enter() and rcu_irq_exit() to take no action if there is an rcu_nmi_enter() in effect, thus avoiding the unscheduled return to RCU quiescent state. This in turn should make the kernel safe for on-demand RCU voyeurism. Link: http://lkml.kernel.org/r/20170922211022.GA18084@linux.vnet.ibm.com Cc: stable@vger.kernel.org Fixes: `0be964be0` ("module: Sanitize RCU usage and locking") Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-09-23 16:49:42 -04:00
Alexey Dobriyan	9b130ad5bb	treewide: make "nr_cpu_ids" unsigned First, number of CPUs can't be negative number. Second, different signnnedness leads to suboptimal code in the following cases: 1) kmalloc(nr_cpu_ids * sizeof(X)); "int" has to be sign extended to size_t. 2) while (loff_t *pos < nr_cpu_ids) MOVSXD is 1 byte longed than the same MOV. Other cases exist as well. Basically compiler is told that nr_cpu_ids can't be negative which can't be deduced if it is "int". Code savings on allyesconfig kernel: -3KB add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370) function old new delta coretemp_cpu_online 450 512 +62 rcu_init_one 1234 1272 +38 pci_device_probe 374 399 +25 ... pgdat_reclaimable_pages 628 556 -72 select_fallback_rq 446 369 -77 task_numa_find_cpu 1923 1807 -116 Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-09-08 18:26:48 -07:00
Paul E. McKenney	656e7c0c0a	Merge branches 'doc.2017.08.17a', 'fixes.2017.08.17a', 'hotplug.2017.07.25b', 'misc.2017.08.17a', 'spin_unlock_wait_no.2017.08.17a', 'srcu.2017.07.27c' and 'torture.2017.07.24c' into HEAD doc.2017.08.17a: Documentation updates. fixes.2017.08.17a: RCU fixes. hotplug.2017.07.25b: CPU-hotplug updates. misc.2017.08.17a: Miscellaneous fixes outside of RCU (give or take conflicts). spin_unlock_wait_no.2017.08.17a: Remove spin_unlock_wait(). srcu.2017.07.27c: SRCU updates. torture.2017.07.24c: Torture-test updates.	2017-08-17 08:10:04 -07:00
Paul E. McKenney	16c0b10607	rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter() The rcu_idle_exit() and rcu_idle_enter() functions are exported because they were originally used by RCU_NONIDLE(), which was intended to be usable from modules. However, RCU_NONIDLE() now instead uses rcu_irq_enter_irqson() and rcu_irq_exit_irqson(), which are not exported, and there have been no complaints. This commit therefore removes the exports from rcu_idle_exit() and rcu_idle_enter(). Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-08-17 07:26:25 -07:00
Paul E. McKenney	d4db30af51	rcu: Add warning to rcu_idle_enter() for irqs enabled All current callers of rcu_idle_enter() have irqs disabled, and rcu_idle_enter() relies on this, but doesn't check. This commit therefore adds a RCU_LOCKDEP_WARN() to add some verification to the trust. While we are there, pass "true" rather than "1" to rcu_eqs_enter(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-08-17 07:26:25 -07:00
Peter Zijlstra (Intel)	3a60799269	rcu: Make rcu_idle_enter() rely on callers disabling irqs All callers to rcu_idle_enter() have irqs disabled, so there is no point in rcu_idle_enter disabling them again. This commit therefore replaces the irq disabling with a RCU_LOCKDEP_WARN(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-08-17 07:26:24 -07:00
Paul E. McKenney	2dee9404fa	rcu: Add assertions verifying blocked-tasks list This commit adds assertions verifying the consistency of the rcu_node structure's ->blkd_tasks list and its ->gp_tasks, ->exp_tasks, and ->boost_tasks pointers. In particular, the ->blkd_tasks lists must be empty except for leaf rcu_node structures. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-08-17 07:26:23 -07:00
Masami Hiramatsu	35fe723bda	rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit() Set disable_rcu_irq_enter on not only rcu_eqs_enter_common() but also rcu_eqs_exit(), since rcu_eqs_exit() suffers from the same issue as was fixed for rcu_eqs_enter_common() by commit `03ecd3f48e` ("rcu/tracing: Add rcu_disabled to denote when rcu_irq_enter() will not work"). Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-08-17 07:26:23 -07:00
Paul E. McKenney	d8db2e86d8	rcu: Add TPS() protection for _rcu_barrier_trace strings The _rcu_barrier_trace() function is a wrapper for trace_rcu_barrier(), which needs TPS() protection for strings passed through the second argument. However, it has escaped prior TPS()-ification efforts because it _rcu_barrier_trace() does not start with "trace_". This commit therefore adds the needed TPS() protection Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-08-17 07:26:22 -07:00
Luis R. Rodriguez	d5374226c3	rcu: Use idle versions of swait to make idle-hack clear These RCU waits were set to use interruptible waits to avoid the kthreads contributing to system load average, even though they are not interruptible as they are spawned from a kthread. Use the new TASK_IDLE swaits which makes our goal clear, and removes confusion about these paths possibly being interruptible -- they are not. When the system is idle the RCU grace-period kthread will spend all its time blocked inside the swait_event_interruptible(). If the interruptible() was not used, then this kthread would contribute to the load average. This means that an idle system would have a load average of 2 (or 3 if PREEMPT=y), rather than the load average of 0 that almost fifty years of UNIX has conditioned sysadmins to expect. The same argument applies to swait_event_interruptible_timeout() use. The RCU grace-period kthread spends its time blocked inside this call while waiting for grace periods to complete. In particular, if there was only one busy CPU, but that CPU was frequently invoking call_rcu(), then the RCU grace-period kthread would spend almost all its time blocked inside the swait_event_interruptible_timeout(). This would mean that the load average would be 2 rather than the expected 1 for the single busy CPU. Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-08-17 07:26:15 -07:00
Paul E. McKenney	c5ebe66ce7	rcu: Add event tracing to ->gp_tasks update at GP start There is currently event tracing to track when a task is preempted within a preemptible RCU read-side critical section, and also when that task subsequently reaches its outermost rcu_read_unlock(), but none indicating when a new grace period starts when that grace period must wait on pre-existing readers that have been been preempted at least once since the beginning of their current RCU read-side critical sections. This commit therefore adds an event trace at grace-period start in the case where there are such readers. Note that only the first reader in the list is traced. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-08-17 07:26:06 -07:00
Paul E. McKenney	7414fac050	rcu: Move rcu.h to new trivial-function style This commit saves a few lines in kernel/rcu/rcu.h by moving to single-line definitions for trivial functions, instead of the old style where the two curly braces each get their own line. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-08-17 07:26:06 -07:00
Paul E. McKenney	bedbb648ef	rcu: Add TPS() to event-traced strings Strings used in event tracing need to be specially handled, for example, using the TPS() macro. Without the TPS() macro, although output looks fine from within a running kernel, extracting traces from a crash dump produces garbage instead of strings. This commit therefore adds the TPS() macro to some unadorned strings that were passed to event-tracing macros. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-08-17 07:26:05 -07:00
Paul E. McKenney	ccdd29ffff	rcu: Create reasonable API for do_exit() TASKS_RCU processing Currently, the exit-time support for TASKS_RCU is open-coded in do_exit(). This commit creates exit_tasks_rcu_start() and exit_tasks_rcu_finish() APIs for do_exit() use. This has the benefit of confining the use of the tasks_rcu_exit_srcu variable to one file, allowing it to become static. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-08-17 07:26:05 -07:00
Paul E. McKenney	7e42776d5e	rcu: Drive TASKS_RCU directly off of PREEMPT The actual use of TASKS_RCU is only when PREEMPT, otherwise RCU-sched is used instead. This commit therefore makes synchronize_rcu_tasks() and call_rcu_tasks() available always, but mapped to synchronize_sched() and call_rcu_sched(), respectively, when !PREEMPT. This approach also allows some #ifdefs to be removed from rcutorture. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org>	2017-08-17 07:26:04 -07:00
Paul E. McKenney	35732cf9dd	srcu: Provide ordering for CPU not involved in grace period Tree RCU guarantees that every online CPU has a memory barrier between any given grace period and any of that CPU's RCU read-side sections that must be ordered against that grace period. Since RCU doesn't always know where read-side critical sections are, the actual implementation guarantees order against prior and subsequent non-idle non-offline code, whether in an RCU read-side critical section or not. As a result, there does not need to be a memory barrier at the end of synchronize_rcu() and friends because the ordering internal to the grace period has ordered every CPU's post-grace-period execution against each CPU's pre-grace-period execution, again for all non-idle online CPUs. In contrast, SRCU can have non-idle online CPUs that are completely uninvolved in a given SRCU grace period, for example, a CPU that never runs any SRCU read-side critical sections and took no part in the grace-period processing. It is in theory possible for a given synchronize_srcu()'s wakeup to be delivered to a CPU that was completely uninvolved in the prior SRCU grace period, which could mean that the code following that synchronize_srcu() would end up being unordered with respect to both the grace period and any pre-existing SRCU read-side critical sections. This commit therefore adds an smp_mb() to the end of __synchronize_srcu(), which prevents this scenario from occurring. Reported-by: Lance Roy <ldr709@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Lance Roy <ldr709@gmail.com> Cc: <stable@vger.kernel.org> # 4.12.x	2017-07-27 15:53:04 -07:00
Paul E. McKenney	09efeeee17	rcu: Move callback-list warning to irq-disable region After adopting callbacks from a newly offlined CPU, the adopting CPU checks to make sure that its callback list's count is zero only if the list has no callbacks and vice versa. Unfortunately, it does so after enabling interrupts, which means that false positives are possible due to interrupt handlers invoking call_rcu(). Although these false positives are improbable, rcutorture did make it happen once. This commit therefore moves this check to an irq-disabled region of code, thus suppressing the false positive. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-07-25 13:04:50 -07:00
Paul E. McKenney	aed4e04686	rcu: Remove unused RCU list functions Given changes to callback migration, rcu_cblist_head(), rcu_cblist_tail(), rcu_cblist_count_cbs(), rcu_segcblist_segempty(), rcu_segcblist_dequeued_lazy(), and rcu_segcblist_new_cbs() are no longer used. This commit therefore removes them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-07-25 13:04:49 -07:00
Paul E. McKenney	f2dbe4a562	rcu: Localize rcu_state ->orphan_pend and ->orphan_done Given that the rcu_state structure's >orphan_pend and ->orphan_done fields are used only during migration of callbacks from the recently offlined CPU to a surviving CPU, if rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() are combined, these fields can become local variables in the combined function. This commit therefore combines rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() into a new rcu_segcblist_merge() function and removes the ->orphan_pend and ->orphan_done fields. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-07-25 13:04:49 -07:00
Paul E. McKenney	21cc248384	rcu: Advance callbacks after migration When migrating callbacks from a newly offlined CPU, we are already holding the root rcu_node structure's lock, so it costs almost nothing to advance and accelerate the newly migrated callbacks. This patch therefore makes this advancing and acceleration happen. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-07-25 13:04:48 -07:00
Paul E. McKenney	537b85c870	rcu: Eliminate rcu_state ->orphan_lock The ->orphan_lock is acquired and released only within the rcu_migrate_callbacks() function, which now acquires the root rcu_node structure's ->lock. This commit therefore eliminates the ->orphan_lock in favor of the root rcu_node structure's ->lock. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-07-25 13:04:48 -07:00
Paul E. McKenney	9fa46fb8c9	rcu: Advance outgoing CPU's callbacks before migrating them It is possible that the outgoing CPU is unaware of recent grace periods, and so it is also possible that some of its pending callbacks are actually ready to be invoked. The current callback-migration code would needlessly force these callbacks to pass through another grace period. This commit therefore invokes rcu_advance_cbs() on the outgoing CPU's callbacks in order to give them full credit for having passed through any recent grace periods. This also fixes an odd theoretical bug where there are no callbacks in the system except for those on the outgoing CPU, none of those callbacks have yet been associated with a grace-period number, there is never again another callback registered, and the surviving CPU never again takes a scheduling-clock interrupt, never goes idle, and never enters nohz_full userspace execution. Yes, this is (just barely) possible. It requires that the surviving CPU be a nohz_full CPU, that its scheduler-clock interrupt be shut off, and that it loop forever in the kernel. You get bonus points if you can make this one happen! ;-) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-07-25 13:04:47 -07:00

1 2 3 4 5 ...

936 Commits