.. SPDX-License-Identifier: GPL-2.0

================================
Review Checklist for RCU Patches
================================


This document contains a checklist for producing and reviewing patches
that make use of RCU. Violating any of the rules listed below will
result in the same sorts of problems that leaving out a locking primitive
would cause. This list is based on experiences reviewing such patches
over a rather long period of time, but improvements are always welcome!

0.	Is RCU being applied to a read-mostly situation? If the data
	structure is updated more than about 10% of the time, then you
	should strongly consider some other approach, unless detailed
	performance measurements show that RCU is nonetheless the right
	tool for the job. Yes, RCU does reduce read-side overhead by
	increasing write-side overhead, which is exactly why normal uses
	of RCU will do much more reading than updating.

	Another exception is where performance is not an issue, and RCU
	provides a simpler implementation. An example of this situation
	is the dynamic NMI code in the Linux 2.6 kernel, at least on
	architectures where NMIs are rare.

	Yet another exception is where the low real-time latency of RCU's
	read-side primitives is critically important.

	One final exception is where RCU readers are used to prevent
	the ABA problem (https://en.wikipedia.org/wiki/ABA_problem)
	for lockless updates. This does result in the mildly
	counter-intuitive situation where rcu_read_lock() and
	rcu_read_unlock() are used to protect updates; however, this
	approach can provide the same simplifications to certain types
	of lockless algorithms that garbage collectors do.

1.	Does the update code have proper mutual exclusion?

	RCU does allow *readers* to run (almost) naked, but *writers* must
	still use some sort of mutual exclusion, such as:

	a.	locking,
	b.	atomic operations, or
	c.	restricting updates to a single task.

	If you choose #b, be prepared to describe how you have handled
	memory barriers on weakly ordered machines (pretty much all of
	them -- even x86 allows later loads to be reordered to precede
	earlier stores), and be prepared to explain why this added
	complexity is worthwhile. If you choose #c, be prepared to
	explain how this single task does not become a major bottleneck
	on large systems (for example, if the task is updating information
	relating to itself that other tasks can read, there by definition
	can be no bottleneck). Note that the definition of "large" has
	changed significantly: Eight CPUs was "large" in the year 2000,
	but a hundred CPUs was unremarkable in 2017.

2.	Do the RCU read-side critical sections make proper use of
	rcu_read_lock() and friends? These primitives are needed
	to prevent grace periods from ending prematurely, which
	could result in data being unceremoniously freed out from
	under your read-side code, which can greatly increase the
	actuarial risk of your kernel.

	As a rough rule of thumb, any dereference of an RCU-protected
	pointer must be covered by rcu_read_lock(), rcu_read_lock_bh(),
	rcu_read_lock_sched(), or by the appropriate update-side lock.
	Explicit disabling of preemption (preempt_disable(), for example)
	can serve as rcu_read_lock_sched(), but is less readable and
	prevents lockdep from detecting locking issues.

	Please note that you *cannot* rely on code known to be built
	only in non-preemptible kernels. Such code can and will break,
	especially in kernels built with CONFIG_PREEMPT_COUNT=y.

	Letting RCU-protected pointers "leak" out of an RCU read-side
	critical section is every bit as bad as letting them leak out
	from under a lock. Unless, of course, you have arranged some
	other means of protection, such as a lock or a reference count
	*before* letting them out of the RCU read-side critical section.

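The reference-count approach to letting a pointer out of a read-side
critical section might look roughly like the following userspace sketch.
The rcu_read_lock(), rcu_read_unlock(), and rcu_dereference() definitions
here are simplified stand-ins built on C11 atomics, and struct foo, gp,
foo_tryget(), foo_put(), and foo_get_current() are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins (assumptions): the kernel primitives do real work. */
static int rcu_nesting;			/* demo-only read-side nesting count */
static void rcu_read_lock(void)   { rcu_nesting++; }
static void rcu_read_unlock(void) { rcu_nesting--; }
#define rcu_dereference(p) atomic_load_explicit(&(p), memory_order_acquire)

struct foo {
	atomic_int refcnt;	/* zero means the object is being torn down */
	int data;
};

static struct foo *_Atomic gp;	/* hypothetical RCU-protected pointer */

/* Take a reference, but only if the count has not already dropped to zero. */
static bool foo_tryget(struct foo *p)
{
	int old = atomic_load(&p->refcnt);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&p->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference previously obtained from foo_tryget(). */
static void foo_put(struct foo *p)
{
	atomic_fetch_sub(&p->refcnt, 1);
}

/* Return a referenced pointer that remains usable after rcu_read_unlock(). */
static struct foo *foo_get_current(void)
{
	struct foo *p;

	rcu_read_lock();
	p = rcu_dereference(gp);
	if (p && !foo_tryget(p))
		p = NULL;	/* lost the race with teardown */
	rcu_read_unlock();
	return p;		/* protected by the reference, not by RCU */
}
```

The point of the sketch is only the ordering: the reference is taken
*before* rcu_read_unlock(), so the pointer never exists unprotected.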
3.	Does the update code tolerate concurrent accesses?

	The whole point of RCU is to permit readers to run without
	any locks or atomic operations. This means that readers will
	be running while updates are in progress. There are a number
	of ways to handle this concurrency, depending on the situation:

	a.	Use the RCU variants of the list and hlist update
		primitives to add, remove, and replace elements on
		an RCU-protected list. Alternatively, use the other
		RCU-protected data structures that have been added to
		the Linux kernel.

		This is almost always the best approach.

	b.	Proceed as in (a) above, but also maintain per-element
		locks (that are acquired by both readers and writers)
		that guard per-element state. Fields that the readers
		refrain from accessing can be guarded by some other lock
		acquired only by updaters, if desired.

		This also works quite well.

	c.	Make updates appear atomic to readers. For example,
		pointer updates to properly aligned fields will
		appear atomic, as will individual atomic primitives.
		Sequences of operations performed under a lock will *not*
		appear to be atomic to RCU readers, nor will sequences
		of multiple atomic primitives. One alternative is to
		move multiple individual fields to a separate structure,
		thus solving the multiple-field problem by imposing an
		additional level of indirection.

		This can work, but is starting to get a bit tricky.

	d.	Carefully order the updates and the reads so that readers
		see valid data at all phases of the update. This is often
		more difficult than it sounds, especially given modern
		CPUs' tendency to reorder memory references. One must
		usually liberally sprinkle memory-ordering operations
		through the code, making it difficult to understand and
		to test. Where it works, it is better to use things
		like smp_store_release() and smp_load_acquire(), but in
		some cases the smp_mb() full memory barrier is required.

		As noted earlier, it is usually better to group the
		changing data into a separate structure, so that the
		change may be made to appear atomic by updating a pointer
		to reference a new structure containing updated values.

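Grouping the changing fields into a separate structure and publishing a
new version with a single pointer update could be sketched as follows.
This is a userspace illustration under stated assumptions: the
rcu_assign_pointer() and rcu_dereference() macros are stand-ins built on
C11 release/acquire atomics, and struct limits, cur_limits,
limits_update(), and limits_width() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Userspace stand-ins (assumptions) for the kernel primitives. */
#define rcu_assign_pointer(p, v) \
	atomic_store_explicit(&(p), (v), memory_order_release)
#define rcu_dereference(p) \
	atomic_load_explicit(&(p), memory_order_acquire)

/* Fields that must change together live in their own structure. */
struct limits {
	int low;
	int high;
};

static struct limits *_Atomic cur_limits;	/* hypothetical RCU-protected pointer */

/*
 * Updater, assumed to hold the update-side lock.  Builds a complete new
 * version and publishes it with one pointer store.  The old version is
 * handed back through *oldp; in the kernel it could be freed only after
 * a grace period (synchronize_rcu(), call_rcu(), or kfree_rcu()).
 * Returns 0 on success, -1 if allocation fails.
 */
static int limits_update(int low, int high, struct limits **oldp)
{
	struct limits *newp = malloc(sizeof(*newp));

	if (!newp)
		return -1;
	newp->low = low;
	newp->high = high;
	*oldp = atomic_load_explicit(&cur_limits, memory_order_relaxed);
	rcu_assign_pointer(cur_limits, newp);
	return 0;
}

/*
 * Reader: sees either the old pair or the new pair, never a mixture.
 * In the kernel this would run inside an RCU read-side critical section.
 */
static int limits_width(void)
{
	struct limits *p = rcu_dereference(cur_limits);

	return p ? p->high - p->low : 0;
}
```

Because readers fetch the pointer once and then read both fields through
it, the two-field change appears atomic to them without any read-side
locking.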
4.	Weakly ordered CPUs pose special challenges. Almost all CPUs
	are weakly ordered -- even x86 CPUs allow later loads to be
	reordered to precede earlier stores. RCU code must take all of
	the following measures to prevent memory-corruption problems:

	a.	Readers must maintain proper ordering of their memory
		accesses. The rcu_dereference() primitive ensures that
		the CPU picks up the pointer before it picks up the data
		that the pointer points to. This really is necessary
		on Alpha CPUs.

		The rcu_dereference() primitive is also an excellent
		documentation aid, letting the person reading the
		code know exactly which pointers are protected by RCU.
		Please note that compilers can also reorder code, and
		they are becoming increasingly aggressive about doing
		just that. The rcu_dereference() primitive therefore also
		prevents destructive compiler optimizations. However,
		with a bit of devious creativity, it is possible to
		mishandle the return value from rcu_dereference().
		Please see rcu_dereference.rst for more information.

		The rcu_dereference() primitive is used by the
		various "_rcu()" list-traversal primitives, such
		as list_for_each_entry_rcu(). Note that it is
		perfectly legal (if redundant) for update-side code to
		use rcu_dereference() and the "_rcu()" list-traversal
		primitives. This is particularly useful in code that
		is common to readers and updaters. However, lockdep
		will complain if you use rcu_dereference() outside
		of an RCU read-side critical section. See lockdep.rst
		to learn what to do about this.

		Of course, neither rcu_dereference() nor the "_rcu()"
		list-traversal primitives can substitute for a good
		concurrency design coordinating among multiple updaters.

	b.	If the list macros are being used, the list_add_tail_rcu()
		and list_add_rcu() primitives must be used in order
		to prevent weakly ordered machines from misordering
		structure initialization and pointer planting.
		Similarly, if the hlist macros are being used, the
		hlist_add_head_rcu() primitive is required.

	c.	If the list macros are being used, the list_del_rcu()
		primitive must be used to keep list_del()'s pointer
		poisoning from inflicting toxic effects on concurrent
		readers. Similarly, if the hlist macros are being used,
		the hlist_del_rcu() primitive is required.

		The list_replace_rcu() and hlist_replace_rcu() primitives
		may be used to replace an old structure with a new one
		in their respective types of RCU-protected lists.

	d.	Rules similar to (4b) and (4c) apply to the "hlist_nulls"
		type of RCU-protected linked lists.

	e.	Updates must ensure that initialization of a given
		structure happens before pointers to that structure are
		publicized. Use the rcu_assign_pointer() primitive
		when publicizing a pointer to a structure that can
		be traversed by an RCU read-side critical section.

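The "initialize first, then publish" rule, together with the matching
reader-side ordering, can be illustrated with a minimal singly linked
list. This is a userspace sketch, not the kernel's list implementation:
the two macros are stand-ins built on C11 release/acquire atomics, and
struct node, head, node_add_head(), and list_sum() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace stand-ins (assumptions) for the kernel primitives. */
#define rcu_assign_pointer(p, v) \
	atomic_store_explicit(&(p), (v), memory_order_release)
#define rcu_dereference(p) \
	atomic_load_explicit(&(p), memory_order_acquire)

struct node {
	int key;
	struct node *_Atomic next;
};

static struct node *_Atomic head;	/* hypothetical RCU-protected list head */

/* Updater, assumed to hold the update-side lock. */
static void node_add_head(struct node *n, int key)
{
	/* The node is not yet visible to readers, so plain stores suffice. */
	n->key = key;
	atomic_store_explicit(&n->next,
			      atomic_load_explicit(&head, memory_order_relaxed),
			      memory_order_relaxed);

	/* Publish: the release store orders the initialization above first. */
	rcu_assign_pointer(head, n);
}

/*
 * Reader: each pointer is fetched with rcu_dereference() before it is
 * followed.  In the kernel this loop would sit inside an RCU read-side
 * critical section.
 */
static int list_sum(void)
{
	int sum = 0;
	struct node *n;

	for (n = rcu_dereference(head); n; n = rcu_dereference(n->next))
		sum += n->key;
	return sum;
}
```

In kernel code the list_add_rcu() and list_for_each_entry_rcu() families
already contain this ordering, which is why the checklist asks for them
rather than for open-coded pointer manipulation.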
5.	If any of call_rcu(), call_srcu(), call_rcu_tasks(),
	call_rcu_tasks_rude(), or call_rcu_tasks_trace() is used,
	the callback function may be invoked from softirq context,
	and in any case with bottom halves disabled. In particular,
	this callback function cannot block. If you need the callback
	to block, run that code in a workqueue handler scheduled from
	the callback. The queue_rcu_work() function does this for you
	in the case of call_rcu().

6.	Since synchronize_rcu() can block, it cannot be called
	from any sort of irq context. The same rule applies
	for synchronize_srcu(), synchronize_rcu_expedited(),
	synchronize_srcu_expedited(), synchronize_rcu_tasks(),
	synchronize_rcu_tasks_rude(), and synchronize_rcu_tasks_trace().

	The expedited forms of these primitives have the same semantics
	as the non-expedited forms, but expediting is more CPU intensive.
	Use of the expedited primitives should be restricted to rare
	configuration-change operations that would not normally be
	undertaken while a real-time workload is running. Note that
	IPI-sensitive real-time workloads can use the rcupdate.rcu_normal
	kernel boot parameter to completely disable expedited grace
	periods, though this might have performance implications.

	In particular, if you find yourself invoking one of the expedited
	primitives repeatedly in a loop, please do everyone a favor:
	Restructure your code so that it batches the updates, allowing
	a single non-expedited primitive to cover the entire batch.
	This will very likely be faster than the loop containing the
	expedited primitive, and will be much easier on the rest of
	the system, especially on any real-time workloads running there.
	Alternatively, use asynchronous primitives such as call_rcu().

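The difference between waiting once per element and waiting once per
batch can be sketched as below. This is a userspace illustration only:
wait_for_grace_period() is a hypothetical stand-in for synchronize_rcu()
that merely counts how often it is called, and struct item,
retire_one_at_a_time(), and retire_batch() are likewise invented for the
example. The unlinking of each element from the RCU-protected structure
is elided.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int grace_periods_waited;	/* demo-only counter */

/* Stand-in (assumption) for synchronize_rcu(): it only counts calls. */
static void wait_for_grace_period(void)
{
	grace_periods_waited++;
}

struct item {
	struct item *next_free;	/* links elements awaiting the batch free */
};

/* Anti-pattern: one grace-period wait per removed element. */
static void retire_one_at_a_time(struct item **items, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		/* ...unlink items[i] from the RCU-protected structure... */
		wait_for_grace_period();
		free(items[i]);
	}
}

/* Batched: unlink everything first, then one wait covers the batch. */
static void retire_batch(struct item **items, size_t n)
{
	struct item *free_list = NULL;

	for (size_t i = 0; i < n; i++) {
		/* ...unlink items[i] from the RCU-protected structure... */
		items[i]->next_free = free_list;
		free_list = items[i];
	}

	wait_for_grace_period();

	while (free_list) {
		struct item *next = free_list->next_free;

		free(free_list);
		free_list = next;
	}
}
```

With n elements, the first function waits for n grace periods and the
second for exactly one, which is the restructuring the item above asks for.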
7.	As of v4.20, a given kernel implements only one RCU flavor, which
	is RCU-sched for PREEMPTION=n and RCU-preempt for PREEMPTION=y.
	If the updater uses call_rcu() or synchronize_rcu(), then
	the corresponding readers may use: (1) rcu_read_lock() and
	rcu_read_unlock(), (2) any pair of primitives that disables
	and re-enables softirq, for example, rcu_read_lock_bh() and
	rcu_read_unlock_bh(), or (3) any pair of primitives that disables
	and re-enables preemption, for example, rcu_read_lock_sched() and
	rcu_read_unlock_sched(). If the updater uses synchronize_srcu()
	or call_srcu(), then the corresponding readers must use
	srcu_read_lock() and srcu_read_unlock(), and with the same
	srcu_struct. The rules for the expedited RCU grace-period-wait
	primitives are the same as for their non-expedited counterparts.

	If the updater uses call_rcu_tasks() or synchronize_rcu_tasks(),
	then the readers must refrain from executing voluntary
	context switches, that is, from blocking. If the updater uses
	call_rcu_tasks_trace() or synchronize_rcu_tasks_trace(), then
	the corresponding readers must use rcu_read_lock_trace() and
	rcu_read_unlock_trace(). If an updater uses call_rcu_tasks_rude()
	or synchronize_rcu_tasks_rude(), then the corresponding readers
	must use anything that disables preemption, for example,
	preempt_disable() and preempt_enable().

	Mixing things up will result in confusion and broken kernels, and
	has even resulted in an exploitable security issue. Therefore,
	when using non-obvious pairs of primitives, commenting is
	of course a must. One example of non-obvious pairing is
	the XDP feature in networking, which calls BPF programs from
	network-driver NAPI (softirq) context. BPF relies heavily on RCU
	protection for its data structures, but because the BPF program
	invocation happens entirely within a single local_bh_disable()
	section in a NAPI poll cycle, this usage is safe. The reason
	that this usage is safe is that readers can use anything that
	disables BH when updaters use call_rcu() or synchronize_rcu().

8.	Although synchronize_rcu() is slower than is call_rcu(),
	it usually results in simpler code. So, unless update
	performance is critically important, the updaters cannot block,
	or the latency of synchronize_rcu() is visible from userspace,
	synchronize_rcu() should be used in preference to call_rcu().
	Furthermore, kfree_rcu() and kvfree_rcu() usually result
	in even simpler code than does synchronize_rcu() without
	synchronize_rcu()'s multi-millisecond latency. So please take
	advantage of kfree_rcu()'s and kvfree_rcu()'s "fire and forget"
	memory-freeing capabilities where it applies.

	An especially important property of the synchronize_rcu()
	primitive is that it automatically self-limits: if grace periods
	are delayed for whatever reason, then the synchronize_rcu()
	primitive will correspondingly delay updates. In contrast,
	code using call_rcu() should explicitly limit update rate in
	cases where grace periods are delayed, as failing to do so can
	result in excessive realtime latencies or even OOM conditions.

	Ways of gaining this self-limiting property when using call_rcu(),
	kfree_rcu(), or kvfree_rcu() include:

	a.	Keeping a count of the number of data-structure elements
		used by the RCU-protected data structure, including
		those waiting for a grace period to elapse. Enforce a
		limit on this number, stalling updates as needed to allow
		previously deferred frees to complete. Alternatively,
		limit only the number awaiting deferred free rather than
		the total number of elements.

		One way to stall the updates is to acquire the update-side
		mutex. (Don't try this with a spinlock -- other CPUs
		spinning on the lock could prevent the grace period
		from ever ending.) Another way to stall the updates
		is for the updates to use a wrapper function around
		the memory allocator, so that this wrapper function
		simulates OOM when there is too much memory awaiting an
		RCU grace period. There are of course many other
		variations on this theme.

	b.	Limiting update rate. For example, if updates occur only
		once per hour, then no explicit rate limiting is
		required, unless your system is already badly broken.
		Older versions of the dcache subsystem take this approach,
		guarding updates with a global lock, limiting their rate.

	c.	Trusted update -- if updates can only be done manually by
		superuser or some other trusted user, then it might not
		be necessary to automatically limit them. The theory
		here is that superuser already has lots of ways to crash
		the machine.

	d.	Periodically invoke rcu_barrier(), permitting a limited
		number of updates per grace period.

	The same cautions apply to call_srcu(), call_rcu_tasks(),
	call_rcu_tasks_rude(), and call_rcu_tasks_trace(). This is
	why there are srcu_barrier(), rcu_barrier_tasks(),
	rcu_barrier_tasks_rude(), and rcu_barrier_tasks_trace()
	primitives, respectively.

	Note that although these primitives do take action to avoid
	memory exhaustion when any given CPU has too many callbacks,
	a determined user or administrator can still exhaust memory.
	This is especially the case if a system with a large number of
	CPUs has been configured to offload all of its RCU callbacks onto
	a single CPU, or if the system has relatively little free memory.

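The allocator-wrapper variation of approach (a) in the item above might
be sketched as follows. This is a userspace model under stated
assumptions: elem_free_deferred() stands in for call_rcu(),
grace_period_ends() stands in for RCU invoking the queued callbacks, and
MAX_PENDING_FREES, struct elem, elem_alloc(), and the deferred list are
hypothetical. The model is single-threaded apart from the atomic counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_PENDING_FREES 2	/* hypothetical limit chosen for the demo */

static atomic_int pending_frees;	/* elements awaiting a grace period */

struct elem {
	struct elem *next;	/* demo-only queue of deferred frees */
	int value;
};

static struct elem *deferred;	/* stands in for RCU's callback list */

/* Allocator wrapper: simulate OOM while too much memory is pending. */
static struct elem *elem_alloc(void)
{
	if (atomic_load(&pending_frees) >= MAX_PENDING_FREES)
		return NULL;
	return calloc(1, sizeof(struct elem));
}

/* Stand-in (assumption) for call_rcu(): queue the element for later free. */
static void elem_free_deferred(struct elem *e)
{
	atomic_fetch_add(&pending_frees, 1);
	e->next = deferred;
	deferred = e;
}

/* Stand-in for the end of a grace period: run the queued "callbacks". */
static void grace_period_ends(void)
{
	while (deferred) {
		struct elem *e = deferred;

		deferred = e->next;
		free(e);
		atomic_fetch_sub(&pending_frees, 1);
	}
}
```

An updater that receives NULL from elem_alloc() fails or retries just as
it would under real memory pressure, which throttles updates until
previously deferred frees have completed.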
9.	All RCU list-traversal primitives, which include
	rcu_dereference(), list_for_each_entry_rcu(), and
	list_for_each_safe_rcu(), must be either within an RCU read-side
	critical section or must be protected by appropriate update-side
	locks. RCU read-side critical sections are delimited by
	rcu_read_lock() and rcu_read_unlock(), or by similar primitives
	such as rcu_read_lock_bh() and rcu_read_unlock_bh(), in which
	case the matching rcu_dereference() primitive must be used in
	order to keep lockdep happy, in this case, rcu_dereference_bh().

	The reason that it is permissible to use RCU list-traversal
	primitives when the update-side lock is held is that doing so
	can be quite helpful in reducing code bloat when common code is
	shared between readers and updaters. Additional primitives
	are provided for this case, as discussed in lockdep.rst.

	One exception to this rule is when data is only ever added to
	the linked data structure, and is never removed during any
	time that readers might be accessing that structure. In such
	cases, READ_ONCE() may be used in place of rcu_dereference()
	and the read-side markers (rcu_read_lock() and rcu_read_unlock(),
	for example) may be omitted.

10.	Conversely, if you are in an RCU read-side critical section,
	and you don't hold the appropriate update-side lock, you *must*
	use the "_rcu()" variants of the list macros. Failing to do so
	will break Alpha, cause aggressive compilers to generate bad code,
	and confuse people trying to understand your code.

11.	Any lock acquired by an RCU callback must be acquired elsewhere
	with softirq disabled, e.g., via spin_lock_bh(). Failing to
	disable softirq on a given acquisition of that lock will result
	in deadlock as soon as the RCU softirq handler happens to run
	your RCU callback while interrupting that acquisition's critical
	section.

12.	RCU callbacks can be and are executed in parallel. In many cases,
	the callback code is simply a wrapper around kfree(), so that this
	is not an issue (or, more accurately, to the extent that it is
	an issue, the memory-allocator locking handles it). However,
	if the callbacks do manipulate a shared data structure, they
	must use whatever locking or other synchronization is required
	to safely access and/or modify that data structure.

	Do not assume that RCU callbacks will be executed on the same
	CPU that executed the corresponding call_rcu() or call_srcu().
	For example, if a given CPU goes offline while having an RCU
	callback pending, then that RCU callback will execute on some
	surviving CPU. (If this were not the case, a self-spawning RCU
	callback would prevent the victim CPU from ever going offline.)
	Furthermore, CPUs designated by rcu_nocbs= might well *always*
	have their RCU callbacks executed on some other CPUs; in fact,
	for some real-time workloads, this is the whole point of using
	the rcu_nocbs= kernel boot parameter.

	In addition, do not assume that callbacks queued in a given order
	will be invoked in that order, even if they all are queued on the
	same CPU. Furthermore, do not assume that same-CPU callbacks will
	be invoked serially. For example, in recent kernels, CPUs can be
	switched between offloaded and de-offloaded callback invocation,
	and while a given CPU is undergoing such a switch, its callbacks
	might be concurrently invoked by that CPU's softirq handler and
	that CPU's rcuo kthread. At such times, that CPU's callbacks
	might be executed both concurrently and out of order.

13.	Unlike most flavors of RCU, it *is* permissible to block in an
	SRCU read-side critical section (demarcated by srcu_read_lock()
	and srcu_read_unlock()), hence the "SRCU": "sleepable RCU".
	Please note that if you don't need to sleep in read-side critical
	sections, you should be using RCU rather than SRCU, because RCU
	is almost always faster and easier to use than is SRCU.

	Also unlike other forms of RCU, explicit initialization and
	cleanup are required either at build time via DEFINE_SRCU()
	or DEFINE_STATIC_SRCU() or at runtime via init_srcu_struct()
	and cleanup_srcu_struct(). These last two are passed a
	"struct srcu_struct" that defines the scope of a given
	SRCU domain. Once initialized, the srcu_struct is passed
	to srcu_read_lock(), srcu_read_unlock(), synchronize_srcu(),
	synchronize_srcu_expedited(), and call_srcu(). A given
	synchronize_srcu() waits only for SRCU read-side critical
	sections governed by srcu_read_lock() and srcu_read_unlock()
	calls that have been passed the same srcu_struct. This property
	is what makes sleeping read-side critical sections tolerable --
	a given subsystem delays only its own updates, not those of other
	subsystems using SRCU. Therefore, SRCU is less prone to OOM the
	system than RCU would be if RCU's read-side critical sections
	were permitted to sleep.

	The ability to sleep in read-side critical sections does not
	come for free. First, corresponding srcu_read_lock() and
	srcu_read_unlock() calls must be passed the same srcu_struct.
	Second, grace-period-detection overhead is amortized only
	over those updates sharing a given srcu_struct, rather than
	being globally amortized as they are for other forms of RCU.
	Therefore, SRCU should be used in preference to rw_semaphore
	only in extremely read-intensive situations, or in situations
	requiring SRCU's read-side deadlock immunity or low read-side
	realtime latency. You should also consider percpu_rw_semaphore
	when you need lightweight readers.

|
|
|
|
2017-06-07 06:04:03 +08:00
|
|
|
SRCU's expedited primitive (synchronize_srcu_expedited())
|
|
|
|
never sends IPIs to other CPUs, so it is easier on
|
doc: Remove obsolete RCU update functions from RCU documentation
Now that synchronize_rcu_bh, synchronize_rcu_bh_expedited, call_rcu_bh,
rcu_barrier_bh, synchronize_sched, synchronize_sched_expedited,
call_rcu_sched, rcu_barrier_sched, get_state_synchronize_sched,
and cond_synchronize_sched are obsolete, let's remove them from the
documentation aside from a small historical section.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-10 06:48:09 +08:00
|
|
|
real-time workloads than is synchronize_rcu_expedited().
|
2017-06-07 06:04:03 +08:00
|
|
|
|
2022-09-09 19:46:26 +08:00
|
|
|
It is also permissible to sleep in RCU Tasks Trace read-side
|
|
|
|
critical, which are delimited by rcu_read_lock_trace() and
|
|
|
|
rcu_read_unlock_trace(). However, this is a specialized flavor
|
|
|
|
of RCU, and you should not use it without first checking with
|
|
|
|
its current users. In most cases, you should instead use SRCU.
|
|
|
|
|
2019-03-07 03:24:35 +08:00
|
|
|
Note that rcu_assign_pointer() relates to SRCU just as it does to
|
|
|
|
other forms of RCU, but instead of rcu_dereference() you should
|
|
|
|
use srcu_dereference() in order to avoid lockdep splats.
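The SRCU calling convention described above can be sketched as follows. This is a single-threaded userspace model, not kernel code: struct srcu_struct, srcu_read_lock(), srcu_read_unlock(), and srcu_dereference() are simplified stand-ins defined here so that the example compiles outside the kernel, and struct config, cfg_srcu, cur_cfg, and read_timeout() are hypothetical names. The point is only that the reader passes the same srcu_struct, and the index returned by srcu_read_lock(), to the matching srcu_read_unlock(), and that the protected pointer is fetched with srcu_dereference().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins (assumptions), single-threaded, for illustration only. */
struct srcu_struct {
	int idx;		/* index handed to new readers */
	int nesting[2];		/* readers currently using each index */
};

static int srcu_read_lock(struct srcu_struct *ssp)
{
	int idx = ssp->idx;

	ssp->nesting[idx]++;
	return idx;
}

static void srcu_read_unlock(struct srcu_struct *ssp, int idx)
{
	assert(ssp->nesting[idx] > 0);	/* must pair with a prior lock */
	ssp->nesting[idx]--;
}

/* The kernel macro lets lockdep check for an SRCU reader; this model asserts it. */
#define srcu_dereference(p, ssp) \
	(assert((ssp)->nesting[0] + (ssp)->nesting[1] > 0), (p))

struct config {			/* hypothetical protected structure */
	int timeout_ms;
};

static struct srcu_struct cfg_srcu;	/* one srcu_struct per use of SRCU */
static struct config *cur_cfg;		/* would carry __rcu in the kernel */

static int read_timeout(void)
{
	struct config *c;
	int idx, ret = -1;

	idx = srcu_read_lock(&cfg_srcu);
	c = srcu_dereference(cur_cfg, &cfg_srcu);
	if (c)
		ret = c->timeout_ms;	/* a kernel SRCU reader could sleep here */
	srcu_read_unlock(&cfg_srcu, idx);	/* same srcu_struct, same index */
	return ret;
}
```

In this model read_timeout() returns -1 until cur_cfg is published, and the per-index reader counts return to zero once the reader exits.
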
14. The whole point of call_rcu(), synchronize_rcu(), and friends
is to wait until all pre-existing readers have finished before
carrying out some otherwise-destructive operation. It is
therefore critically important to *first* remove any path
that readers can follow that could be affected by the
destructive operation, and *only then* invoke call_rcu(),
synchronize_rcu(), or friends.

Because these primitives only wait for pre-existing readers, it
is the caller's responsibility to guarantee that any subsequent
readers will execute safely.

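The required ordering can be sketched as follows. This is a single-threaded userspace model under stated assumptions: synchronize_rcu() here is a stand-in that only records whether the pointer had already been unpublished when it was called, free() stands in for kfree(), and struct foo, global_foo, retire_foo(), and unpublished_before_wait are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct foo {
	int data;
};

static struct foo *global_foo;		/* hypothetical RCU-protected pointer */
static int unpublished_before_wait;	/* test hook, not part of the pattern */

/*
 * Stand-in for the kernel's synchronize_rcu().  This model has no
 * concurrent readers, so it only records the ordering being illustrated.
 */
static void synchronize_rcu(void)
{
	unpublished_before_wait = (global_foo == NULL);
}

static void retire_foo(void)
{
	struct foo *old = global_foo;

	/* 1. First remove the path that readers follow.  In the kernel:
	 *    rcu_assign_pointer(global_foo, NULL) under the update-side lock. */
	global_foo = NULL;

	/* 2. Only then wait for pre-existing readers. */
	synchronize_rcu();

	/* 3. Now the destructive operation is safe (kfree() in the kernel). */
	free(old);
}
```

Swapping steps 1 and 2 would let readers that start after the grace period still find the old structure, which is exactly the failure this item warns about.
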
15. The various RCU read-side primitives do *not* necessarily contain
memory barriers. You should therefore plan for the CPU
and the compiler to freely reorder code into and out of RCU
read-side critical sections. It is the responsibility of the
RCU update-side primitives to deal with this.

For SRCU readers, you can use smp_mb__after_srcu_read_unlock()
immediately after an srcu_read_unlock() to get a full barrier.

16. Use CONFIG_PROVE_LOCKING, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the
__rcu sparse checks to validate your RCU code. These can help
find problems as follows:

CONFIG_PROVE_LOCKING:
	check that accesses to RCU-protected data structures
	are carried out under the proper RCU read-side critical
	section, while holding the right combination of locks,
	or whatever other conditions are appropriate.

CONFIG_DEBUG_OBJECTS_RCU_HEAD:
	check that you don't pass the same object to call_rcu()
	(or friends) before an RCU grace period has elapsed
	since the last time that you passed that same object to
	call_rcu() (or friends).

__rcu sparse checks:
	tag the pointer to the RCU-protected data structure
	with __rcu, and sparse will warn you if you access that
	pointer without the services of one of the variants
	of rcu_dereference().

These debugging aids can help you find problems that are
otherwise extremely difficult to spot.

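As a rough illustration of what the __rcu tag and the lockdep-based checks look at, here is a userspace model. The macros below are stand-ins, not the kernel definitions: __rcu is empty, rcu_dereference_protected() merely asserts its condition, rcu_assign_pointer() is a plain store, and the gp_lock_held flag models what lockdep_is_held() would report for a hypothetical update-side lock. The names struct foo, gp, and replace_foo() are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins (assumptions) for the kernel definitions. */
#define __rcu	/* sparse address-space annotation in the kernel; empty here */
#define rcu_dereference_protected(p, c)	(assert(c), (p))
#define rcu_assign_pointer(p, v)	((p) = (v))	/* kernel version also orders the store */

struct foo {
	int a;
};

static struct foo __rcu *gp;	/* sparse would flag plain accesses to gp */
static int gp_lock_held;	/* models lockdep_is_held(&gp_lock) */

/* Update side: legal only while the (modeled) update-side lock is held. */
static struct foo *replace_foo(struct foo *newp)
{
	struct foo *old;

	old = rcu_dereference_protected(gp, gp_lock_held);
	rcu_assign_pointer(gp, newp);
	return old;	/* caller frees it after a grace period */
}
```

Calling replace_foo() with gp_lock_held clear trips the assertion in this model, which loosely mirrors the splat that CONFIG_PROVE_LOCKING produces when the stated condition does not hold.
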
17. If you pass a callback function defined within a module to one of
call_rcu(), call_srcu(), call_rcu_tasks(), call_rcu_tasks_rude(),
or call_rcu_tasks_trace(), then it is necessary to wait for all
pending callbacks to be invoked before unloading that module.
Note that it is absolutely *not* sufficient to wait for a grace
period! For example, the synchronize_rcu() implementation is *not*
guaranteed to wait for callbacks registered on other CPUs via
call_rcu(), or even on the current CPU if that CPU recently
went offline and came back online.

You instead need to use one of the barrier functions:

- call_rcu() -> rcu_barrier()
- call_srcu() -> srcu_barrier()
- call_rcu_tasks() -> rcu_barrier_tasks()
- call_rcu_tasks_rude() -> rcu_barrier_tasks_rude()
- call_rcu_tasks_trace() -> rcu_barrier_tasks_trace()

However, these barrier functions are absolutely *not* guaranteed
to wait for a grace period. For example, if there are no
call_rcu() callbacks queued anywhere in the system, rcu_barrier()
can and will return immediately.

So if you need to wait for both a grace period and for all
pre-existing callbacks, you will need to invoke both functions,
with the pair depending on the flavor of RCU:

- Either synchronize_rcu() or synchronize_rcu_expedited(),
  together with rcu_barrier()
- Either synchronize_srcu() or synchronize_srcu_expedited(),
  together with srcu_barrier()
- synchronize_rcu_tasks() and rcu_barrier_tasks()
- synchronize_rcu_tasks_rude() and rcu_barrier_tasks_rude()
- synchronize_rcu_tasks_trace() and rcu_barrier_tasks_trace()

If necessary, you can use something like workqueues to execute
the requisite pair of functions concurrently.

See rcubarrier.rst for more information.

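The module-unload rule can be sketched with a small userspace model. Everything here is an assumption made for illustration: struct rcu_head, call_rcu(), and rcu_barrier() are single-threaded stand-ins in which rcu_barrier() simply invokes every queued callback, and struct foo, foo_reclaim(), foo_freed, and foo_module_exit() are hypothetical names. The model shows only the unload sequence: stop queueing new callbacks, then wait for the already-queued ones to run before the code containing the callback function goes away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins (assumptions), single-threaded. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

static struct rcu_head *cb_list;	/* queued, not-yet-invoked callbacks */

static void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *head))
{
	head->func = func;
	head->next = cb_list;
	cb_list = head;
}

/* Model: returns only after every previously queued callback has run. */
static void rcu_barrier(void)
{
	while (cb_list) {
		struct rcu_head *head = cb_list;

		cb_list = head->next;	/* read ->next before the callback frees it */
		head->func(head);
	}
}

struct foo {			/* hypothetical module-private structure */
	int data;
	struct rcu_head rh;
};

static int foo_freed;		/* test hook: number of callbacks invoked */

/* Callback defined "within the module". */
static void foo_reclaim(struct rcu_head *rh)
{
	struct foo *fp = (struct foo *)((char *)rh - offsetof(struct foo, rh));

	free(fp);
	foo_freed++;
}

/* Returns 1 if no callbacks remain queued, that is, unloading is safe. */
static int foo_module_exit(void)
{
	/* 1. Prevent any new call_rcu() invocations (not shown). */
	/* 2. Wait for all already-queued callbacks to be invoked. */
	rcu_barrier();
	/* 3. Only now may the text containing foo_reclaim() be unloaded. */
	return cb_list == NULL;
}
```

If the exit path also needs a grace period to elapse, the text above says to pair the barrier with the matching synchronize function, since neither implies the other.
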