mirror of
https://mirrors.bfsu.edu.cn/git/linux.git
synced 2024-11-11 04:18:39 +08:00
doc: Update whatisRCU.rst
This commit updates whatisRCU.rst with wordsmithing and updates provoked by the passage of time. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
parent
3f58c55e23
commit
148750d736
@ -16,18 +16,23 @@ to start learning about RCU:
|
|||||||
| 6. The RCU API, 2019 Edition https://lwn.net/Articles/777036/
|
| 6. The RCU API, 2019 Edition https://lwn.net/Articles/777036/
|
||||||
| 2019 Big API Table https://lwn.net/Articles/777165/
|
| 2019 Big API Table https://lwn.net/Articles/777165/
|
||||||
|
|
||||||
|
For those preferring video:
|
||||||
|
|
||||||
|
| 1. Unraveling RCU Mysteries: Fundamentals https://www.linuxfoundation.org/webinars/unraveling-rcu-usage-mysteries
|
||||||
|
| 2. Unraveling RCU Mysteries: Additional Use Cases https://www.linuxfoundation.org/webinars/unraveling-rcu-usage-mysteries-additional-use-cases
|
||||||
|
|
||||||
|
|
||||||
What is RCU?
|
What is RCU?
|
||||||
|
|
||||||
RCU is a synchronization mechanism that was added to the Linux kernel
|
RCU is a synchronization mechanism that was added to the Linux kernel
|
||||||
during the 2.5 development effort that is optimized for read-mostly
|
during the 2.5 development effort that is optimized for read-mostly
|
||||||
situations. Although RCU is actually quite simple once you understand it,
|
situations. Although RCU is actually quite simple, making effective use
|
||||||
getting there can sometimes be a challenge. Part of the problem is that
|
of it requires you to think differently about your code. Another part
|
||||||
most of the past descriptions of RCU have been written with the mistaken
|
of the problem is the mistaken assumption that there is "one true way" to
|
||||||
assumption that there is "one true way" to describe RCU. Instead,
|
describe and to use RCU. Instead, the experience has been that different
|
||||||
the experience has been that different people must take different paths
|
people must take different paths to arrive at an understanding of RCU,
|
||||||
to arrive at an understanding of RCU. This document provides several
|
depending on their experiences and use cases. This document provides
|
||||||
different paths, as follows:
|
several different paths, as follows:
|
||||||
|
|
||||||
:ref:`1. RCU OVERVIEW <1_whatisRCU>`
|
:ref:`1. RCU OVERVIEW <1_whatisRCU>`
|
||||||
|
|
||||||
@ -157,34 +162,36 @@ rcu_read_lock()
|
|||||||
^^^^^^^^^^^^^^^
|
^^^^^^^^^^^^^^^
|
||||||
void rcu_read_lock(void);
|
void rcu_read_lock(void);
|
||||||
|
|
||||||
Used by a reader to inform the reclaimer that the reader is
|
This temporal primitive is used by a reader to inform the
|
||||||
entering an RCU read-side critical section. It is illegal
|
reclaimer that the reader is entering an RCU read-side critical
|
||||||
to block while in an RCU read-side critical section, though
|
section. It is illegal to block while in an RCU read-side
|
||||||
kernels built with CONFIG_PREEMPT_RCU can preempt RCU
|
critical section, though kernels built with CONFIG_PREEMPT_RCU
|
||||||
read-side critical sections. Any RCU-protected data structure
|
can preempt RCU read-side critical sections. Any RCU-protected
|
||||||
accessed during an RCU read-side critical section is guaranteed to
|
data structure accessed during an RCU read-side critical section
|
||||||
remain unreclaimed for the full duration of that critical section.
|
is guaranteed to remain unreclaimed for the full duration of that
|
||||||
Reference counts may be used in conjunction with RCU to maintain
|
critical section. Reference counts may be used in conjunction
|
||||||
longer-term references to data structures.
|
with RCU to maintain longer-term references to data structures.
|
||||||
|
|
||||||
rcu_read_unlock()
|
rcu_read_unlock()
|
||||||
^^^^^^^^^^^^^^^^^
|
^^^^^^^^^^^^^^^^^
|
||||||
void rcu_read_unlock(void);
|
void rcu_read_unlock(void);
|
||||||
|
|
||||||
Used by a reader to inform the reclaimer that the reader is
|
This temporal primitive is used by a reader to inform the
|
||||||
exiting an RCU read-side critical section. Note that RCU
|
reclaimer that the reader is exiting an RCU read-side critical
|
||||||
read-side critical sections may be nested and/or overlapping.
|
section. Note that RCU read-side critical sections may be nested
|
||||||
|
and/or overlapping.
|
||||||
|
|
||||||
synchronize_rcu()
|
synchronize_rcu()
|
||||||
^^^^^^^^^^^^^^^^^
|
^^^^^^^^^^^^^^^^^
|
||||||
void synchronize_rcu(void);
|
void synchronize_rcu(void);
|
||||||
|
|
||||||
Marks the end of updater code and the beginning of reclaimer
|
This temporal primitive marks the end of updater code and the
|
||||||
code. It does this by blocking until all pre-existing RCU
|
beginning of reclaimer code. It does this by blocking until
|
||||||
read-side critical sections on all CPUs have completed.
|
all pre-existing RCU read-side critical sections on all CPUs
|
||||||
Note that synchronize_rcu() will **not** necessarily wait for
|
have completed. Note that synchronize_rcu() will **not**
|
||||||
any subsequent RCU read-side critical sections to complete.
|
necessarily wait for any subsequent RCU read-side critical
|
||||||
For example, consider the following sequence of events::
|
sections to complete. For example, consider the following
|
||||||
|
sequence of events::
|
||||||
|
|
||||||
CPU 0 CPU 1 CPU 2
|
CPU 0 CPU 1 CPU 2
|
||||||
----------------- ------------------------- ---------------
|
----------------- ------------------------- ---------------
|
||||||
@ -211,13 +218,13 @@ synchronize_rcu()
|
|||||||
to be useful in all but the most read-intensive situations,
|
to be useful in all but the most read-intensive situations,
|
||||||
synchronize_rcu()'s overhead must also be quite small.
|
synchronize_rcu()'s overhead must also be quite small.
|
||||||
|
|
||||||
The call_rcu() API is a callback form of synchronize_rcu(),
|
The call_rcu() API is an asynchronous callback form of
|
||||||
and is described in more detail in a later section. Instead of
|
synchronize_rcu(), and is described in more detail in a later
|
||||||
blocking, it registers a function and argument which are invoked
|
section. Instead of blocking, it registers a function and
|
||||||
after all ongoing RCU read-side critical sections have completed.
|
argument which are invoked after all ongoing RCU read-side
|
||||||
This callback variant is particularly useful in situations where
|
critical sections have completed. This callback variant is
|
||||||
it is illegal to block or where update-side performance is
|
particularly useful in situations where it is illegal to block
|
||||||
critically important.
|
or where update-side performance is critically important.
|
||||||
|
|
||||||
However, the call_rcu() API should not be used lightly, as use
|
However, the call_rcu() API should not be used lightly, as use
|
||||||
of the synchronize_rcu() API generally results in simpler code.
|
of the synchronize_rcu() API generally results in simpler code.
|
||||||
@ -236,11 +243,13 @@ rcu_assign_pointer()
|
|||||||
would be cool to be able to declare a function in this manner.
|
would be cool to be able to declare a function in this manner.
|
||||||
(Compiler experts will no doubt disagree.)
|
(Compiler experts will no doubt disagree.)
|
||||||
|
|
||||||
The updater uses this function to assign a new value to an
|
The updater uses this spatial macro to assign a new value to an
|
||||||
RCU-protected pointer, in order to safely communicate the change
|
RCU-protected pointer, in order to safely communicate the change
|
||||||
in value from the updater to the reader. This macro does not
|
in value from the updater to the reader. This is a spatial (as
|
||||||
evaluate to an rvalue, but it does execute any memory-barrier
|
opposed to temporal) macro. It does not evaluate to an rvalue,
|
||||||
instructions required for a given CPU architecture.
|
but it does execute any memory-barrier instructions required
|
||||||
|
for a given CPU architecture. Its ordering properties are that
|
||||||
|
of a store-release operation.
|
||||||
|
|
||||||
Perhaps just as important, it serves to document (1) which
|
Perhaps just as important, it serves to document (1) which
|
||||||
pointers are protected by RCU and (2) the point at which a
|
pointers are protected by RCU and (2) the point at which a
|
||||||
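The store-release ordering described for rcu_assign_pointer() in the hunk above can be illustrated with a minimal userspace sketch. This is not the kernel macro: it assumes C11 atomics as a stand-in, and the names struct foo, gp_sketch, and publish_foo() are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct foo {
	int a;
	int b;
};

/* Hypothetical stand-in for an RCU-protected global pointer. */
static _Atomic(struct foo *) gp_sketch;

/*
 * Userspace analogue of an updater using rcu_assign_pointer():
 * fully initialize the new structure, then publish it with a
 * store-release so that a reader that observes the new pointer
 * also observes the initialized fields.
 *
 * Returns 0 on success and -1 on allocation failure.  On success,
 * *oldp receives the previously published pointer (possibly NULL),
 * which the caller may reclaim only after readers are done with it.
 */
static int publish_foo(int a, int b, struct foo **oldp)
{
	struct foo *newp;

	assert(oldp);
	newp = malloc(sizeof(*newp));
	if (!newp)
		return -1;
	newp->a = a;	/* Initialization happens before ... */
	newp->b = b;
	*oldp = atomic_load_explicit(&gp_sketch, memory_order_relaxed);
	/* ... publication, which is ordered by the store-release. */
	atomic_store_explicit(&gp_sketch, newp, memory_order_release);
	return 0;
}
```

In kernel code, the two explicit atomic operations would instead be a single rcu_assign_pointer() invocation, and the old structure would be freed only after a grace period has elapsed. The sketch also assumes a single updater; concurrent updaters would need a lock around the load and store.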
@ -255,14 +264,15 @@ rcu_dereference()
|
|||||||
Like rcu_assign_pointer(), rcu_dereference() must be implemented
|
Like rcu_assign_pointer(), rcu_dereference() must be implemented
|
||||||
as a macro.
|
as a macro.
|
||||||
|
|
||||||
The reader uses rcu_dereference() to fetch an RCU-protected
|
The reader uses the spatial rcu_dereference() macro to fetch
|
||||||
pointer, which returns a value that may then be safely
|
an RCU-protected pointer, which returns a value that may
|
||||||
dereferenced. Note that rcu_dereference() does not actually
|
then be safely dereferenced. Note that rcu_dereference()
|
||||||
dereference the pointer, instead, it protects the pointer for
|
does not actually dereference the pointer; instead, it
|
||||||
later dereferencing. It also executes any needed memory-barrier
|
protects the pointer for later dereferencing. It also
|
||||||
instructions for a given CPU architecture. Currently, only Alpha
|
executes any needed memory-barrier instructions for a given
|
||||||
needs memory barriers within rcu_dereference() -- on other CPUs,
|
CPU architecture. Currently, only Alpha needs memory barriers
|
||||||
it compiles to nothing, not even a compiler directive.
|
within rcu_dereference() -- on other CPUs, it compiles to a
|
||||||
|
volatile load.
|
||||||
|
|
||||||
Common coding practice uses rcu_dereference() to copy an
|
Common coding practice uses rcu_dereference() to copy an
|
||||||
RCU-protected pointer to a local variable, then dereferences
|
RCU-protected pointer to a local variable, then dereferences
|
||||||
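The common coding practice mentioned in the hunk above (fetch the RCU-protected pointer once into a local variable, then dereference only the local copy) can be sketched in userspace as follows. This is an illustration under stated assumptions rather than the kernel's rcu_dereference(): it uses a C11 acquire load, which is stronger than the dependency ordering the kernel relies on, and struct bar, bar_gp, and read_bar_sum() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct bar {
	int a;
	int b;
};

/* Hypothetical stand-in for an RCU-protected global pointer. */
static _Atomic(struct bar *) bar_gp;

/*
 * Userspace analogue of a reader using rcu_dereference().
 * The pointer is loaded exactly once into a local variable, and
 * all subsequent accesses go through that local copy, so both
 * fields are read from the same version of the structure.
 *
 * Returns 0 and stores a + b in *sum if a structure is published,
 * or -1 if the pointer is currently NULL.
 */
static int read_bar_sum(int *sum)
{
	struct bar *p;

	assert(sum);
	/* In kernel code, rcu_read_lock() would go here. */
	p = atomic_load_explicit(&bar_gp, memory_order_acquire);
	if (!p)
		return -1;
	*sum = p->a + p->b;	/* Use only the local copy. */
	/* In kernel code, rcu_read_unlock() would go here. */
	return 0;
}
```

Reloading the global pointer for each field access would risk mixing fields from two different versions of the structure if an updater published a new one in between, which is why the single fetch into a local variable matters.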
@ -355,12 +365,15 @@ reader, updater, and reclaimer.
|
|||||||
synchronize_rcu() & call_rcu()
|
synchronize_rcu() & call_rcu()
|
||||||
|
|
||||||
|
|
||||||
The RCU infrastructure observes the time sequence of rcu_read_lock(),
|
The RCU infrastructure observes the temporal sequence of rcu_read_lock(),
|
||||||
rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in
|
rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in
|
||||||
order to determine when (1) synchronize_rcu() invocations may return
|
order to determine when (1) synchronize_rcu() invocations may return
|
||||||
to their callers and (2) call_rcu() callbacks may be invoked. Efficient
|
to their callers and (2) call_rcu() callbacks may be invoked. Efficient
|
||||||
implementations of the RCU infrastructure make heavy use of batching in
|
implementations of the RCU infrastructure make heavy use of batching in
|
||||||
order to amortize their overhead over many uses of the corresponding APIs.
|
order to amortize their overhead over many uses of the corresponding APIs.
|
||||||
|
The rcu_assign_pointer() and rcu_dereference() invocations communicate
|
||||||
|
spatial changes via stores to and loads from the RCU-protected pointer in
|
||||||
|
question.
|
||||||
|
|
||||||
There are at least three flavors of RCU usage in the Linux kernel. The diagram
|
There are at least three flavors of RCU usage in the Linux kernel. The diagram
|
||||||
above shows the most common one. On the updater side, the rcu_assign_pointer(),
|
above shows the most common one. On the updater side, the rcu_assign_pointer(),
|
||||||
@ -392,7 +405,9 @@ b. RCU applied to networking data structures that may be subjected
|
|||||||
c. RCU applied to scheduler and interrupt/NMI-handler tasks.
|
c. RCU applied to scheduler and interrupt/NMI-handler tasks.
|
||||||
|
|
||||||
Again, most uses will be of (a). The (b) and (c) cases are important
|
Again, most uses will be of (a). The (b) and (c) cases are important
|
||||||
for specialized uses, but are relatively uncommon.
|
for specialized uses, but are relatively uncommon. The SRCU, RCU-Tasks,
|
||||||
|
RCU-Tasks-Rude, and RCU-Tasks-Trace flavors have similar relationships among
|
||||||
|
their assorted primitives.
|
||||||
|
|
||||||
.. _3_whatisRCU:
|
.. _3_whatisRCU:
|
||||||
|
|
||||||
@ -468,7 +483,7 @@ So, to sum up:
|
|||||||
- Within an RCU read-side critical section, use rcu_dereference()
|
- Within an RCU read-side critical section, use rcu_dereference()
|
||||||
to dereference RCU-protected pointers.
|
to dereference RCU-protected pointers.
|
||||||
|
|
||||||
- Use some solid scheme (such as locks or semaphores) to
|
- Use some solid design (such as locks or semaphores) to
|
||||||
keep concurrent updates from interfering with each other.
|
keep concurrent updates from interfering with each other.
|
||||||
|
|
||||||
- Use rcu_assign_pointer() to update an RCU-protected pointer.
|
- Use rcu_assign_pointer() to update an RCU-protected pointer.
|
||||||
@ -579,6 +594,14 @@ to avoid having to write your own callback::
|
|||||||
|
|
||||||
kfree_rcu(old_fp, rcu);
|
kfree_rcu(old_fp, rcu);
|
||||||
|
|
||||||
|
If the occasional sleep is permitted, the single-argument form may
|
||||||
|
be used, omitting the rcu_head structure from struct foo.
|
||||||
|
|
||||||
|
kfree_rcu(old_fp);
|
||||||
|
|
||||||
|
This variant of kfree_rcu() almost never blocks, but might do so by
|
||||||
|
invoking synchronize_rcu() in response to memory-allocation failure.
|
||||||
|
|
||||||
Again, see checklist.rst for additional rules governing the use of RCU.
|
Again, see checklist.rst for additional rules governing the use of RCU.
|
||||||
|
|
||||||
.. _5_whatisRCU:
|
.. _5_whatisRCU:
|
||||||
@ -596,7 +619,7 @@ lacking both functionality and performance. However, they are useful
|
|||||||
in getting a feel for how RCU works. See kernel/rcu/update.c for a
|
in getting a feel for how RCU works. See kernel/rcu/update.c for a
|
||||||
production-quality implementation, and see:
|
production-quality implementation, and see:
|
||||||
|
|
||||||
http://www.rdrop.com/users/paulmck/RCU
|
https://docs.google.com/document/d/1X0lThx8OK0ZgLMqVoXiR4ZrGURHrXK6NyLRbeXe3Xac/edit
|
||||||
|
|
||||||
for papers describing the Linux kernel RCU implementation. The OLS'01
|
for papers describing the Linux kernel RCU implementation. The OLS'01
|
||||||
and OLS'02 papers are a good introduction, and the dissertation provides
|
and OLS'02 papers are a good introduction, and the dissertation provides
|
||||||
@ -929,6 +952,8 @@ unfortunately any spinlock in a ``SLAB_TYPESAFE_BY_RCU`` object must be
|
|||||||
initialized after each and every call to kmem_cache_alloc(), which renders
|
initialized after each and every call to kmem_cache_alloc(), which renders
|
||||||
reference-free spinlock acquisition completely unsafe. Therefore, when
|
reference-free spinlock acquisition completely unsafe. Therefore, when
|
||||||
using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter.
|
using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter.
|
||||||
|
(Those willing to use a kmem_cache constructor may also use locking,
|
||||||
|
including cache-friendly sequence locking.)
|
||||||
|
|
||||||
With traditional reference counting -- such as that implemented by the
|
With traditional reference counting -- such as that implemented by the
|
||||||
kref library in Linux -- there is typically code that runs when the last
|
kref library in Linux -- there is typically code that runs when the last
|
||||||
@ -1047,6 +1072,30 @@ sched::
|
|||||||
rcu_read_lock_sched_held
|
rcu_read_lock_sched_held
|
||||||
|
|
||||||
|
|
||||||
|
RCU-Tasks::
|
||||||
|
|
||||||
|
Critical sections Grace period Barrier
|
||||||
|
|
||||||
|
N/A call_rcu_tasks rcu_barrier_tasks
|
||||||
|
synchronize_rcu_tasks
|
||||||
|
|
||||||
|
|
||||||
|
RCU-Tasks-Rude::
|
||||||
|
|
||||||
|
Critical sections Grace period Barrier
|
||||||
|
|
||||||
|
N/A call_rcu_tasks_rude rcu_barrier_tasks_rude
|
||||||
|
synchronize_rcu_tasks_rude
|
||||||
|
|
||||||
|
|
||||||
|
RCU-Tasks-Trace::
|
||||||
|
|
||||||
|
Critical sections Grace period Barrier
|
||||||
|
|
||||||
|
rcu_read_lock_trace call_rcu_tasks_trace rcu_barrier_tasks_trace
|
||||||
|
rcu_read_unlock_trace synchronize_rcu_tasks_trace
|
||||||
|
|
||||||
|
|
||||||
SRCU::
|
SRCU::
|
||||||
|
|
||||||
Critical sections Grace period Barrier
|
Critical sections Grace period Barrier
|
||||||
@ -1087,35 +1136,43 @@ list can be helpful:
|
|||||||
|
|
||||||
a. Will readers need to block? If so, you need SRCU.
|
a. Will readers need to block? If so, you need SRCU.
|
||||||
|
|
||||||
b. What about the -rt patchset? If readers would need to block
|
b. Will readers need to block and are you doing tracing, for
|
||||||
in a non-rt kernel, you need SRCU. If readers would block
|
example, ftrace or BPF? If so, you need RCU-tasks,
|
||||||
in a -rt kernel, but not in a non-rt kernel, SRCU is not
|
RCU-tasks-rude, and/or RCU-tasks-trace.
|
||||||
necessary. (The -rt patchset turns spinlocks into sleeplocks,
|
|
||||||
hence this distinction.)
|
|
||||||
|
|
||||||
c. Do you need to treat NMI handlers, hardirq handlers,
|
c. What about the -rt patchset? If readers would need to block in
|
||||||
|
a non-rt kernel, you need SRCU. If readers would block when
|
||||||
|
acquiring spinlocks in a -rt kernel, but not in a non-rt kernel,
|
||||||
|
SRCU is not necessary. (The -rt patchset turns spinlocks into
|
||||||
|
sleeplocks, hence this distinction.)
|
||||||
|
|
||||||
|
d. Do you need to treat NMI handlers, hardirq handlers,
|
||||||
and code segments with preemption disabled (whether
|
and code segments with preemption disabled (whether
|
||||||
via preempt_disable(), local_irq_save(), local_bh_disable(),
|
via preempt_disable(), local_irq_save(), local_bh_disable(),
|
||||||
or some other mechanism) as if they were explicit RCU readers?
|
or some other mechanism) as if they were explicit RCU readers?
|
||||||
If so, RCU-sched is the only choice that will work for you.
|
If so, RCU-sched readers are the only choice that will work
|
||||||
|
for you, but since about v4.20 you can use the vanilla RCU
|
||||||
|
update primitives.
|
||||||
|
|
||||||
d. Do you need RCU grace periods to complete even in the face
|
e. Do you need RCU grace periods to complete even in the face of
|
||||||
of softirq monopolization of one or more of the CPUs? For
|
softirq monopolization of one or more of the CPUs? For example,
|
||||||
example, is your code subject to network-based denial-of-service
|
is your code subject to network-based denial-of-service attacks?
|
||||||
attacks? If so, you should disable softirq across your readers,
|
If so, you should disable softirq across your readers, for
|
||||||
for example, by using rcu_read_lock_bh().
|
example, by using rcu_read_lock_bh(). Since about v4.20 you
|
||||||
|
can use the vanilla RCU update primitives.
|
||||||
|
|
||||||
e. Is your workload too update-intensive for normal use of
|
f. Is your workload too update-intensive for normal use of
|
||||||
RCU, but inappropriate for other synchronization mechanisms?
|
RCU, but inappropriate for other synchronization mechanisms?
|
||||||
If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
|
If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
|
||||||
named SLAB_DESTROY_BY_RCU). But please be careful!
|
named SLAB_DESTROY_BY_RCU). But please be careful!
|
||||||
|
|
||||||
f. Do you need read-side critical sections that are respected
|
g. Do you need read-side critical sections that are respected even
|
||||||
even though they are in the middle of the idle loop, during
|
on CPUs that are deep in the idle loop, during entry to or exit
|
||||||
user-mode execution, or on an offlined CPU? If so, SRCU is the
|
from user-mode execution, or on an offlined CPU? If so, SRCU
|
||||||
only choice that will work for you.
|
and RCU Tasks Trace are the only choices that will work for you,
|
||||||
|
with SRCU being strongly preferred in almost all cases.
|
||||||
|
|
||||||
g. Otherwise, use RCU.
|
h. Otherwise, use RCU.
|
||||||
|
|
||||||
Of course, this all assumes that you have determined that RCU is in fact
|
Of course, this all assumes that you have determined that RCU is in fact
|
||||||
the right tool for your job.
|
the right tool for your job.
|
||||||
|