linux/kernel/rcu
Joel Fernandes (Google) 754aa6427e srcu: Clarify comments on memory barrier "E"
There is an smp_mb() named "E" in srcu_flip() immediately before the
increment (flip) of the srcu_struct structure's ->srcu_idx.

The purpose of E is to order the preceding scan's read of lock counters
against the flipping of the ->srcu_idx, in order to prevent new readers
from continuing to use the old ->srcu_idx value, which might needlessly
extend the grace period.
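
For reference, here is an abridged sketch of srcu_flip() as found in
srcutree.c (comments paraphrased, details omitted):

static void srcu_flip(struct srcu_struct *ssp)
{
        /*
         * E: Order the preceding scan's reads of the lock counters
         * before the flip, so that a reader observing the new
         * ->srcu_idx cannot have had its increment counted by that
         * scan.
         */
        smp_mb(); /* E */  /* Pairs with B and C. */

        WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);

        /*
         * D: If this updater misses an __srcu_read_unlock() increment,
         * that task's next __srcu_read_lock() must see the new value
         * of ->srcu_idx.
         */
        smp_mb(); /* D */  /* Pairs with C. */
}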

However, this ordering is already enforced because of the control
dependency between the preceding scan and the ->srcu_idx flip.
This control dependency exists because atomic_long_read() is used
to scan the counts, because WRITE_ONCE() is used to flip ->srcu_idx,
and because ->srcu_idx is not flipped until the ->srcu_lock_count[] and
->srcu_unlock_count[] counts match.  And such a match cannot happen when
there is an in-flight reader that started before the flip (observation
courtesy of Mathieu Desnoyers).
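
In outline, the updater side follows the pattern below (a minimal
sketch with illustrative names, not the actual srcutree.c code): the
conditional WRITE_ONCE() cannot be speculated ahead of the loads that
its condition depends on, so the scan's reads are ordered before the
flip even without an explicit barrier.

        unlocks = atomic_long_read(&unlock_count); // scan
        smp_mb(); // A
        locks = atomic_long_read(&lock_count);     // scan
        if (locks == unlocks)             // control dependency...
                WRITE_ONCE(idx, idx + 1); // ...orders the scan before the flip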

The litmus test below (courtesy of Frederic Weisbecker, with changes
for ctrldep by Boqun and Joel) shows this:

C srcu
(*
 * Bad condition: P0's first scan (SCAN1) saw P1's idx=0 LOCK count increment, even though P1 saw the flip.
 *
 * The ->po ordering on both P0 and P1 is thus enforced via ->ppo
 * (control dependencies) on both sides, and P0 and P1 are interconnected
 * by ->rf relations.  Combining ->ppo with ->rf, a cycle is impossible.
 *)

{}

// updater
P0(int *IDX, int *LOCK0, int *UNLOCK0, int *LOCK1, int *UNLOCK1)
{
        int lock1;
        int unlock1;
        int lock0;
        int unlock0;

        // SCAN1
        unlock1 = READ_ONCE(*UNLOCK1);
        smp_mb(); // A
        lock1 = READ_ONCE(*LOCK1);

        // FLIP
        if (lock1 == unlock1) {   // Control dep
                smp_mb(); // E    // Remove E and the test still passes.
                WRITE_ONCE(*IDX, 1);
                smp_mb(); // D

                // SCAN2
                unlock0 = READ_ONCE(*UNLOCK0);
                smp_mb(); // A
                lock0 = READ_ONCE(*LOCK0);
        }
}

// reader
P1(int *IDX, int *LOCK0, int *UNLOCK0, int *LOCK1, int *UNLOCK1)
{
        int tmp;
        int idx1;
        int idx2;

        // 1st reader
        idx1 = READ_ONCE(*IDX);
        if (idx1 == 0) {         // Control dep
                tmp = READ_ONCE(*LOCK0);
                WRITE_ONCE(*LOCK0, tmp + 1);
                smp_mb(); /* B and C */
                tmp = READ_ONCE(*UNLOCK0);
                WRITE_ONCE(*UNLOCK0, tmp + 1);
        } else {
                tmp = READ_ONCE(*LOCK1);
                WRITE_ONCE(*LOCK1, tmp + 1);
                smp_mb(); /* B and C */
                tmp = READ_ONCE(*UNLOCK1);
                WRITE_ONCE(*UNLOCK1, tmp + 1);
        }
}

exists (0:lock1=1 /\ 1:idx1=1)
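
The test can be checked against the Linux-kernel memory model with the
herd7 tool, run from the kernel's tools/memory-model directory, for
example (assuming the test is saved as srcu.litmus):

        herd7 -conf linux-kernel.cfg srcu.litmus

With or without barrier E, herd7 should report the "exists" clause as
never satisfied ("Never").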

More complicated litmus tests with multiple SRCU readers also show that
memory barrier E is not needed.

This commit therefore clarifies the comment on memory barrier E.

Why not also remove that redundant smp_mb()?

Because control dependencies are quite fragile due to their not being
recognized by most compilers and tools.  Control dependencies therefore
exact an ongoing maintenance burden, and such a burden cannot be justified
in this slowpath.  Therefore, that smp_mb() stays until such time as
its overhead becomes a measurable problem in a real workload running on
a real production system, or until such time as compilers start paying
attention to this sort of control dependency.
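
As an illustration of that fragility, consider the pattern warned about
in Documentation/memory-barriers.txt (illustrative only, not SRCU code):
when both branches store the same value, the compiler may collapse the
two stores into one, destroying the ordering that the control dependency
was supposed to provide.

        if (READ_ONCE(*flag))
                WRITE_ONCE(*out, 1);
        else
                WRITE_ONCE(*out, 1);
        // At high optimization levels, the compiler may merge these into
        // a single store that no longer depends on the branch, after
        // which the CPU is free to reorder it against the READ_ONCE().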

Co-developed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Co-developed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Co-developed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2023-04-05 13:47:18 +00:00
Kconfig printk changes for 6.2 2022-12-12 09:01:36 -08:00
Kconfig.debug rcu: Allow up to five minutes expedited RCU CPU stall-warning timeouts 2023-01-09 12:09:52 -08:00
Makefile rcuperf: Change rcuperf to rcuscale 2020-08-24 18:39:24 -07:00
rcu_segcblist.c rcu: Throttle callback invocation based on number of ready callbacks 2023-01-03 17:28:34 -08:00
rcu_segcblist.h rcu: Throttle callback invocation based on number of ready callbacks 2023-01-03 17:28:34 -08:00
rcu.h rcu: Further comment and explain the state space of GP sequences 2023-04-05 13:47:17 +00:00
rcuscale.c rcu/rcuscale: Use call_rcu_hurry() for async reader test 2022-11-29 14:04:33 -08:00
rcutorture.c rcutorture: Drop sparse lock-acquisition annotations 2023-01-05 12:10:35 -08:00
refscale.c refscale: Add tests using SLAB_TYPESAFE_BY_RCU 2023-01-05 12:09:42 -08:00
srcutiny.c srcu: Make Tiny synchronize_srcu() check for readers 2022-12-01 15:49:12 -08:00
srcutree.c srcu: Clarify comments on memory barrier "E" 2023-04-05 13:47:18 +00:00
sync.c rcu/sync: Use call_rcu_hurry() instead of call_rcu 2022-11-29 14:04:33 -08:00
tasks.h rcu-tasks: Handle queue-shrink/callback-enqueue race condition 2023-01-03 17:52:17 -08:00
tiny.c rcu: Refactor kvfree_call_rcu() and high-level helpers 2023-01-03 17:48:40 -08:00
tree_exp.h rcu: Allow expedited RCU CPU stall warnings to dump task stacks 2023-01-03 17:47:44 -08:00
tree_nocb.h rcu: Shrinker for lazy rcu 2022-11-29 14:02:52 -08:00
tree_plugin.h rcu: Synchronize ->qsmaskinitnext in rcu_boost_kthread_setaffinity() 2022-10-18 14:59:57 -07:00
tree_stall.h rcu: Allow up to five minutes expedited RCU CPU stall-warning timeouts 2023-01-09 12:09:52 -08:00
tree.c Merge branch 'stall.2023.01.09a' into HEAD 2023-02-02 16:40:07 -08:00
tree.h rcu: Add RCU stall diagnosis information 2023-01-05 12:21:11 -08:00
update.c Merge branch 'stall.2023.01.09a' into HEAD 2023-02-02 16:40:07 -08:00