Commit Graph

131 Commits

Author SHA1 Message Date
Paul E. McKenney
660349ac79 srcu: Move ->sda_is_static from srcu_struct to srcu_usage
This commit moves the ->sda_is_static field from the srcu_struct structure
to the srcu_usage structure to reduce the size of the former in order
to improve cache locality.

Suggested-by: Christoph Hellwig <hch@lst.de>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-04-04 08:36:28 -07:00
Paul E. McKenney
3b46679c62 srcu: Move heuristics fields from srcu_struct to srcu_usage
This commit moves the ->srcu_size_jiffies, ->srcu_n_lock_retries,
and ->srcu_n_exp_nodelay fields from the srcu_struct structure to the
srcu_usage structure to reduce the size of the former in order to improve
cache locality.

Suggested-by: Christoph Hellwig <hch@lst.de>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-04-04 08:36:23 -07:00
Paul E. McKenney
03200b5ca3 srcu: Move grace-period fields from srcu_struct to srcu_usage
This commit moves the ->srcu_gp_seq, ->srcu_gp_seq_needed,
->srcu_gp_seq_needed_exp, ->srcu_gp_start, and ->srcu_last_gp_end fields
from the srcu_struct structure to the srcu_usage structure to reduce
the size of the former in order to improve cache locality.

Suggested-by: Christoph Hellwig <hch@lst.de>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-04-04 08:36:19 -07:00
Paul E. McKenney
e3a6ab25cf srcu: Move ->srcu_gp_mutex from srcu_struct to srcu_usage
This commit moves the ->srcu_gp_mutex field from the srcu_struct structure
to the srcu_usage structure to reduce the size of the former in order
to improve cache locality.

Suggested-by: Christoph Hellwig <hch@lst.de>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-04-04 08:36:14 -07:00
Paul E. McKenney
b3fb11f7e9 srcu: Move ->lock from srcu_struct to srcu_usage
This commit moves the ->lock field from the srcu_struct structure to
the srcu_usage structure to reduce the size of the former in order to
improve cache locality.

Suggested-by: Christoph Hellwig <hch@lst.de>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-04-04 08:36:07 -07:00
Paul E. McKenney
0839ade94b srcu: Move ->lock initialization after srcu_usage allocation
Currently, both __init_srcu_struct() in CONFIG_DEBUG_LOCK_ALLOC=y kernels
and init_srcu_struct() in CONFIG_DEBUG_LOCK_ALLOC=n kernel initialize
the srcu_struct structure's ->lock before the srcu_usage structure has
been allocated.  This of course prevents the ->lock from being moved
to the srcu_usage structure, so this commit moves the initialization
into the init_srcu_struct_fields() after the srcu_usage structure has
been allocated.

Cc: Christoph Hellwig <hch@lst.de>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-04-04 08:36:01 -07:00
Paul E. McKenney
574dc1a7ef srcu: Move ->srcu_cb_mutex from srcu_struct to srcu_usage
This commit moves the ->srcu_cb_mutex field from the srcu_struct structure
to the srcu_usage structure to reduce the size of the former in order
to improve cache locality.

Suggested-by: Christoph Hellwig <hch@lst.de>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-04-04 08:35:56 -07:00
Paul E. McKenney
a0d8cbd382 srcu: Move ->srcu_size_state from srcu_struct to srcu_usage
This commit moves the ->srcu_size_state field from the srcu_struct
structure to the srcu_usage structure to reduce the size of the former
in order to improve cache locality.

Suggested-by: Christoph Hellwig <hch@lst.de>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-04-04 08:35:50 -07:00
Paul E. McKenney
208f41b131 srcu: Move ->level from srcu_struct to srcu_usage
This commit moves the ->level[] array from the srcu_struct structure to
the srcu_usage structure to reduce the size of the former in order to
improve cache locality.

Suggested-by: Christoph Hellwig <hch@lst.de>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-04-04 08:35:44 -07:00
Paul E. McKenney
95433f7263 srcu: Begin offloading srcu_struct fields to srcu_update
The current srcu_struct structure is on the order of 200 bytes in size
(depending on architecture and .config), which is much better than the
old-style 26K bytes, but still all too inconvenient when one is trying
to achieve good cache locality on a fastpath involving SRCU readers.

However, only a few fields in srcu_struct are used by SRCU readers.
The remaining fields could be offloaded to a new srcu_update
structure, thus shrinking the srcu_struct structure down to a few
tens of bytes.  This commit begins this noble quest, a quest that is
complicated by open-coded initialization of the srcu_struct within the
srcu_notifier_head structure.  This complication is addressed by updating
the srcu_notifier_head structure's open coding, given that there does
not appear to be a straightforward way of abstracting that initialization.

This commit moves only the ->node pointer to srcu_update.  Later commits
will move additional fields.

[ paulmck: Fold in qiang1.zhang@intel.com's memory-leak fix. ]

Link: https://lore.kernel.org/all/20230320055751.4120251-1-qiang1.zhang@intel.com/
Suggested-by: Christoph Hellwig <hch@lst.de>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-04-04 08:35:34 -07:00
Paul E. McKenney
f4d01a2593 srcu: Use static init for statically allocated in-module srcu_struct
Further shrinking the srcu_struct structure is eased by requiring
that in-module srcu_struct structures rely more heavily on static
initialization.  In particular, this preserves the property that
a module-load-time srcu_struct initialization can fail only due
to memory-allocation failure of the per-CPU srcu_data structures.
It might also slightly improve robustness by keeping the number of memory
allocations that must succeed down percpu_alloc() call.

This is in preparation for splitting an srcu_usage structure out
of the srcu_struct structure.

[ paulmck: Fold in qiang1.zhang@intel.com feedback. ]

Cc: Christoph Hellwig <hch@lst.de>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-04-04 08:35:28 -07:00
Paul E. McKenney
dafc4d1603 srcu: Update comment after the index flip
Because there is not guaranteed to be a full memory barrier between
the ->srcu_unlock_count increment of an srcu_read_unlock() and the
->srcu_lock_count increment of the next srcu_read_lock(), this next
srcu_read_lock() is not guaranteed to see the effect of the index flip
just prior to this comment.  However, this next srcu_read_lock() will
execute a full memory barrier, so the srcu_read_lock() after that is
guaranteed to see that index flip.

This guarantee is illustrated by the following diagram of events and
the litmus test following that.

------------------------------------------------------------------------

READER                  UPDATER
-------------           ----------
                           // idx is initially 0.

                           srcu_flip() {
                              smp_mb();
// RSCS

srcu_read_unlock() {
  smp_mb();
                              idx++;    // P
                              smp_mb(); // QQ
                           }

                           srcu_readers_unlock_idx(0) {
        ,--counted------------ count all unlock[0]; // Q
        |
  unlock[0]++;  // X

}
                               smp_mb();
srcu_read_lock() {
  READ(idx) = 0;         ,---- count all lock[0]; // contributes imbalance of 1.
  lock[0]++;  ----counted              |
  smp_mb(); // PP          }           |
}                                      |
                                       |
// RSCS                             not going to effect above scan
                                       |
srcu_read_unlock() {                   |
  smp_mb();                            |
  unlock[0]++;                         |
}                                      |
                                      /
                                     /
srcu_read_lock() {                  |
  READ(idx);  // Y  -----cannot be counted because of P (has to sample idx as 1)
  lock[1]++;
  ...
}

------------------------------------------------------------------------

This makes it similar to the store buffer pattern. Using X, Y, P and Q
annotated above, we get:

------------------------------------------------------------------------

READER                    UPDATER
X (write)                 P (write)

smp_mb(); //PP            smp_mb(); //QQ

Y (read)                  Q (read)

------------------------------------------------------------------------

ASCII art courtesy of Joel Fernandes.

Reported-by: Joel Fernandes <joel@joelfernandes.org>
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Reported-by: Frederic Weisbecker <frederic@kernel.org>
Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03 17:49:23 -08:00
Paul E. McKenney
0cd4b50b12 srcu: Yet more detail for srcu_readers_active_idx_check() comments
The comment in srcu_readers_active_idx_check() following the smp_mb()
is out of date, hailing from a simpler time when preemption was disabled
across the bulk of __srcu_read_lock().  The fact that preemption was
disabled meant that the number of tasks that had fetched the old index
but not yet incremented counters was limited by the number of CPUs.

In our more complex modern times, the number of CPUs is no longer a limit.
This commit therefore updates this comment, additionally giving more
memory-ordering detail.

[ paulmck: Apply Nt->Nc feedback from Joel Fernandes. ]

Reported-by: Boqun Feng <boqun.feng@gmail.com>
Reported-by: Frederic Weisbecker <frederic@kernel.org>
Reported-by: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Reported-by: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Reported-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03 17:49:23 -08:00
Pingfan Liu
1bafbfb3e1 srcu: Remove needless rcu_seq_done() check while holding read lock
The srcu_gp_start_if_needed() function now read-holds the srcu_struct
whose grace period is being started, which means that the corresponding
SRCU grace period cannot end.  This in turn means that the SRCU
grace-period sequence number returned by rcu_seq_snap() cannot expire
during this time.  And that means that the calls to rcu_seq_done() in
srcu_funnel_exp_start() and srcu_funnel_gp_start() can never return true.

This commit therefore removes these rcu_seq_done() checks, but adds checks
in kernels built with CONFIG_PROVE_RCU=y that splats if rcu_seq_done()
does somehow return true.

[ paulmck: Rearrange checks to handle kernels built with lockdep. ]

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: rcu@vger.kernel.org
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03 17:49:23 -08:00
Pingfan Liu
50be0c0439 srcu: Fix the comparision in srcu_invl_snp_seq()
A grace-period sequence number contains two fields: counter and
state.  SRCU_SNP_INIT_SEQ provides a guaranteed invalid value for
grace-period sequence numbers in newly allocated srcu_node structures'
->srcu_have_cbs[] and ->srcu_gp_seq_needed_exp fields.  The point of the
comparison in srcu_invl_snp_seq() is not to detect invalid grace-period
sequence numbers in general, but rather to detect a newly allocated
srcu_node structure whose ->srcu_have_cbs[] and ->srcu_gp_seq_needed_exp
fields need to be brought into line with the srcu_struct structure's
->srcu_gp_seq field.

This commit therefore causes srcu_invl_snp_seq() to compare both fields
of the specified grace-period sequence number.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: <rcu@vger.kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03 17:49:22 -08:00
Pingfan Liu
7f24626d6d srcu: Delegate work to the boot cpu if using SRCU_SIZE_SMALL
Commit 994f706872 ("srcu: Make Tree SRCU able to operate without
snp_node array") assumes that cpu 0 is always online.  However, there
really are situations when some other CPU is the boot CPU, for example,
when booting a kdump kernel with the maxcpus=1 boot parameter.

On PowerPC, the kdump kernel can hang as follows:
...
[    1.740036] systemd[1]: Hostname set to <xyz.com>
[  243.686240] INFO: task systemd:1 blocked for more than 122 seconds.
[  243.686264]       Not tainted 6.1.0-rc1 #1
[  243.686272] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.686281] task:systemd         state:D stack:0     pid:1     ppid:0      flags:0x00042000
[  243.686296] Call Trace:
[  243.686301] [c000000016657640] [c000000016657670] 0xc000000016657670 (unreliable)
[  243.686317] [c000000016657830] [c00000001001dec0] __switch_to+0x130/0x220
[  243.686333] [c000000016657890] [c000000010f607b8] __schedule+0x1f8/0x580
[  243.686347] [c000000016657940] [c000000010f60bb4] schedule+0x74/0x140
[  243.686361] [c0000000166579b0] [c000000010f699b8] schedule_timeout+0x168/0x1c0
[  243.686374] [c000000016657a80] [c000000010f61de8] __wait_for_common+0x148/0x360
[  243.686387] [c000000016657b20] [c000000010176bb0] __flush_work.isra.0+0x1c0/0x3d0
[  243.686401] [c000000016657bb0] [c0000000105f2768] fsnotify_wait_marks_destroyed+0x28/0x40
[  243.686415] [c000000016657bd0] [c0000000105f21b8] fsnotify_destroy_group+0x68/0x160
[  243.686428] [c000000016657c40] [c0000000105f6500] inotify_release+0x30/0xa0
[  243.686440] [c000000016657cb0] [c0000000105751a8] __fput+0xc8/0x350
[  243.686452] [c000000016657d00] [c00000001017d524] task_work_run+0xe4/0x170
[  243.686464] [c000000016657d50] [c000000010020e94] do_notify_resume+0x134/0x140
[  243.686478] [c000000016657d80] [c00000001002eb18] interrupt_exit_user_prepare_main+0x198/0x270
[  243.686493] [c000000016657de0] [c00000001002ec60] syscall_exit_prepare+0x70/0x180
[  243.686505] [c000000016657e10] [c00000001000bf7c] system_call_vectored_common+0xfc/0x280
[  243.686520] --- interrupt: 3000 at 0x7fffa47d5ba4
[  243.686528] NIP:  00007fffa47d5ba4 LR: 0000000000000000 CTR: 0000000000000000
[  243.686538] REGS: c000000016657e80 TRAP: 3000   Not tainted  (6.1.0-rc1)
[  243.686548] MSR:  800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE>  CR: 42044440  XER: 00000000
[  243.686572] IRQMASK: 0
[  243.686572] GPR00: 0000000000000006 00007ffffa606710 00007fffa48e7200 0000000000000000
[  243.686572] GPR04: 0000000000000002 000000000000000a 0000000000000000 0000000000000001
[  243.686572] GPR08: 000001000c172dd0 0000000000000000 0000000000000000 0000000000000000
[  243.686572] GPR12: 0000000000000000 00007fffa4ff4bc0 0000000000000000 0000000000000000
[  243.686572] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  243.686572] GPR20: 0000000132dfdc50 000000000000000e 0000000000189375 0000000000000000
[  243.686572] GPR24: 00007ffffa606ae0 0000000000000005 000001000c185490 000001000c172570
[  243.686572] GPR28: 000001000c172990 000001000c184850 000001000c172e00 00007fffa4fedd98
[  243.686683] NIP [00007fffa47d5ba4] 0x7fffa47d5ba4
[  243.686691] LR [0000000000000000] 0x0
[  243.686698] --- interrupt: 3000
[  243.686708] INFO: task kworker/u16:1:24 blocked for more than 122 seconds.
[  243.686717]       Not tainted 6.1.0-rc1 #1
[  243.686724] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.686733] task:kworker/u16:1   state:D stack:0     pid:24    ppid:2      flags:0x00000800
[  243.686747] Workqueue: events_unbound fsnotify_mark_destroy_workfn
[  243.686758] Call Trace:
[  243.686762] [c0000000166736e0] [c00000004fd91000] 0xc00000004fd91000 (unreliable)
[  243.686775] [c0000000166738d0] [c00000001001dec0] __switch_to+0x130/0x220
[  243.686788] [c000000016673930] [c000000010f607b8] __schedule+0x1f8/0x580
[  243.686801] [c0000000166739e0] [c000000010f60bb4] schedule+0x74/0x140
[  243.686814] [c000000016673a50] [c000000010f699b8] schedule_timeout+0x168/0x1c0
[  243.686827] [c000000016673b20] [c000000010f61de8] __wait_for_common+0x148/0x360
[  243.686840] [c000000016673bc0] [c000000010210840] __synchronize_srcu.part.0+0xa0/0xe0
[  243.686855] [c000000016673c30] [c0000000105f2c64] fsnotify_mark_destroy_workfn+0xc4/0x1a0
[  243.686868] [c000000016673ca0] [c000000010174ea8] process_one_work+0x2a8/0x570
[  243.686882] [c000000016673d40] [c000000010175208] worker_thread+0x98/0x5e0
[  243.686895] [c000000016673dc0] [c0000000101828d4] kthread+0x124/0x130
[  243.686908] [c000000016673e10] [c00000001000cd40] ret_from_kernel_thread+0x5c/0x64
[  366.566274] INFO: task systemd:1 blocked for more than 245 seconds.
[  366.566298]       Not tainted 6.1.0-rc1 #1
[  366.566305] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  366.566314] task:systemd         state:D stack:0     pid:1     ppid:0      flags:0x00042000
[  366.566329] Call Trace:
...

The above splat occurs because PowerPC really does use maxcpus=1
instead of nr_cpus=1 in the kernel command line.  Consequently, the
(quite possibly non-zero) kdump CPU is the only online CPU in the kdump
kernel.  SRCU unconditionally queues a sdp->work on cpu 0, for which no
worker thread has been created, so sdp->work will be never executed and
__synchronize_srcu() will never be completed.

This commit therefore replaces CPU ID 0 with get_boot_cpu_id() in key
places in Tree SRCU.  Since the CPU indicated by get_boot_cpu_id()
is guaranteed to be online, this avoids the above splat.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: rcu@vger.kernel.org
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03 17:49:22 -08:00
Frederic Weisbecker
e29a4915db srcu: Debug NMI safety even on archs that don't require it
Currently the NMI safety debugging is only performed on architectures
that don't support NMI-safe this_cpu_inc().

Reorder the code so that other architectures like x86 also detect bad
uses.

[ paulmck: Apply kernel test robot, Stephen Rothwell, and Zqiang feedback. ]

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-21 10:44:11 -07:00
Frederic Weisbecker
ae3c070616 srcu: Explain the reason behind the read side critical section on GP start
Tell about the need to protect against concurrent updaters who may
overflow the GP counter behind the current update.

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-21 10:16:15 -07:00
Frederic Weisbecker
6b77bb9b99 srcu: Warn when NMI-unsafe API is used in NMI
Using the NMI-unsafe reader API from within an NMI handler is very likely
to be buggy for three reasons:

1) NMIs aren't strictly re-entrant (a pending nested NMI will execute at
   the end of the current one) so it should be fine to use a non-atomic
   increment here. However, breakpoints can still interrupt NMIs and if
   a breakpoint callback has a reader on that same ssp, a racy increment
   can happen.

2) If the only reader site for a given srcu_struct structure is in an
   NMI handler, then RCU should be used instead of SRCU.

3) Because of the previous reason (2), an srcu_struct structure having
   an SRCU read side critical section in an NMI handler is likely to
   have another one from a task context.

For all these reasons, warn if an NMI-unsafe reader API is used from an
NMI handler.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-21 10:15:53 -07:00
Paul E. McKenney
36f65f1d15 srcu: Check for consistent global per-srcu_struct NMI safety
This commit adds runtime checks to verify that a given srcu_struct uses
consistent NMI-safe (or not) read-side primitives globally, but based
on the per-CPU data.  These global checks are made by the grace-period
code that must scan the srcu_data structures anyway, and are done only
in kernels built with CONFIG_PROVE_RCU=y.

Link: https://lore.kernel.org/all/20220910221947.171557773@linutronix.de/

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Petr Mladek <pmladek@suse.com>
2022-10-20 15:02:27 -07:00
Paul E. McKenney
27120e7d2c srcu: Check for consistent per-CPU per-srcu_struct NMI safety
This commit adds runtime checks to verify that a given srcu_struct uses
consistent NMI-safe (or not) read-side primitives on a per-CPU basis.

Link: https://lore.kernel.org/all/20220910221947.171557773@linutronix.de/

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Petr Mladek <pmladek@suse.com>
2022-10-20 15:02:15 -07:00
Paul E. McKenney
2e83b879fb srcu: Create an srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe()
On strict load-store architectures, the use of this_cpu_inc() by
srcu_read_lock() and srcu_read_unlock() is not NMI-safe in TREE SRCU.
To see this suppose that an NMI arrives in the middle of srcu_read_lock(),
just after it has read ->srcu_lock_count, but before it has written
the incremented value back to memory.  If that NMI handler also does
srcu_read_lock() and srcu_read_lock() on that same srcu_struct structure,
then upon return from that NMI handler, the interrupted srcu_read_lock()
will overwrite the NMI handler's update to ->srcu_lock_count, but
leave unchanged the NMI handler's update by srcu_read_unlock() to
->srcu_unlock_count.

This can result in a too-short SRCU grace period, which can in turn
result in arbitrary memory corruption.

If the NMI handler instead interrupts the srcu_read_unlock(), this
can result in eternal SRCU grace periods, which is not much better.

This commit therefore creates a pair of new srcu_read_lock_nmisafe()
and srcu_read_unlock_nmisafe() functions, which allow SRCU readers in
both NMI handlers and in process and IRQ context.  It is bad practice
to mix the existing and the new _nmisafe() primitives on the same
srcu_struct structure.  Use one set or the other, not both.

Just to underline that "bad practice" point, using srcu_read_lock() at
process level and srcu_read_lock_nmisafe() in your NMI handler will not,
repeat NOT, work.  If you do not immediately understand why this is the
case, please review the earlier paragraphs in this commit log.

[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Apply feedback from Randy Dunlap. ]
[ paulmck: Apply feedback from John Ogness. ]
[ paulmck: Apply feedback from Frederic Weisbecker. ]

Link: https://lore.kernel.org/all/20220910221947.171557773@linutronix.de/

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Petr Mladek <pmladek@suse.com>
2022-10-20 14:39:18 -07:00
Paul E. McKenney
5d0f5953b6 srcu: Convert ->srcu_lock_count and ->srcu_unlock_count to atomic
NMI-safe variants of srcu_read_lock() and srcu_read_unlock() are needed
by printk(), which on many architectures entails read-modify-write
atomic operations.  This commit prepares Tree SRCU for this change by
making both ->srcu_lock_count and ->srcu_unlock_count by atomic_long_t.

[ paulmck: Apply feedback from John Ogness. ]

Link: https://lore.kernel.org/all/20220910221947.171557773@linutronix.de/

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Petr Mladek <pmladek@suse.com>
2022-10-18 11:24:24 -07:00
Neeraj Upadhyay
4f2bfd9494 srcu: Make expedited RCU grace periods block even less frequently
The purpose of commit 282d8998e9 ("srcu: Prevent expedited GPs
and blocking readers from consuming CPU") was to prevent a long
series of never-blocking expedited SRCU grace periods from blocking
kernel-live-patching (KLP) progress.  Although it was successful, it also
resulted in excessive boot times on certain embedded workloads running
under qemu with the "-bios QEMU_EFI.fd" command line.  Here "excessive"
means increasing the boot time up into the three-to-four minute range.
This increase in boot time was due to the more than 6000 back-to-back
invocations of synchronize_rcu_expedited() within the KVM host OS, which
in turn resulted from qemu's emulation of a long series of MMIO accesses.

Commit 640a7d37c3 ("srcu: Block less aggressively for expedited grace
periods") did not significantly help this particular use case.

Zhangfei Gao and Shameerali Kolothum Thodi did experiments varying the
value of SRCU_MAX_NODELAY_PHASE with HZ=250 and with various values
of non-sleeping per phase counts on a system with preemption enabled,
and observed the following boot times:

+──────────────────────────+────────────────+
| SRCU_MAX_NODELAY_PHASE   | Boot time (s)  |
+──────────────────────────+────────────────+
| 100                      | 30.053         |
| 150                      | 25.151         |
| 200                      | 20.704         |
| 250                      | 15.748         |
| 500                      | 11.401         |
| 1000                     | 11.443         |
| 10000                    | 11.258         |
| 1000000                  | 11.154         |
+──────────────────────────+────────────────+

Analysis on the experiment results show additional improvements with
CPU-bound delays approaching one jiffy in duration. This improvement was
also seen when number of per-phase iterations were scaled to one jiffy.

This commit therefore scales per-grace-period phase number of non-sleeping
polls so that non-sleeping polls extend for about one jiffy. In addition,
the delay-calculation call to srcu_get_delay() in srcu_gp_end() is
replaced with a simple check for an expedited grace period.  This change
schedules callback invocation immediately after expedited grace periods
complete, which results in greatly improved boot times.  Testing done
by Marc and Zhangfei confirms that this change recovers most of the
performance degradation in boottime; for CONFIG_HZ_250 configuration,
specifically, boot times improve from 3m50s to 41s on Marc's setup;
and from 2m40s to ~9.7s on Zhangfei's setup.

In addition to the changes to default per phase delays, this
change adds 3 new kernel parameters - srcutree.srcu_max_nodelay,
srcutree.srcu_max_nodelay_phase, and srcutree.srcu_retry_check_delay.
This allows users to configure the srcu grace period scanning delays in
order to more quickly react to additional use cases.

Fixes: 640a7d37c3 ("srcu: Block less aggressively for expedited grace periods")
Fixes: 282d8998e9 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU")
Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reported-by: yueluck <yueluck@163.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-07-19 11:39:59 -07:00
Paul E. McKenney
8f870e6eb8 srcu: Block less aggressively for expedited grace periods
Commit 282d8998e9 ("srcu: Prevent expedited GPs and blocking readers
from consuming CPU") fixed a problem where a long-running expedited SRCU
grace period could block kernel live patching.  It did so by giving up
on expediting once a given SRCU expedited grace period grew too old.

Unfortunately, this added excessive delays to boots of virtual embedded
systems specifying "-bios QEMU_EFI.fd" to qemu.  This commit therefore
makes the transition away from expediting less aggressive, increasing
the per-grace-period phase number of non-sleeping polls of readers from
one to three and increasing the required grace-period age from one jiffy
(actually from zero to one jiffies) to two jiffies (actually from one
to two jiffies).

Fixes: 282d8998e9 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reported-by: chenxiang (M)" <chenxiang66@hisilicon.com>
Cc: Shameerali Kolothum Thodi  <shameerali.kolothum.thodi@huawei.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
2022-07-19 11:39:59 -07:00
Lukas Bulwahn
586e31d59c srcu: Drop needless initialization of sdp in srcu_gp_start()
Commit 9c7ef4c30f12 ("srcu: Make Tree SRCU able to operate without
snp_node array") initializes the local variable sdp differently depending
on the srcu's state in srcu_gp_start().  Either way, this initialization
overwrites the value used when sdp is defined.

This commit therefore drops this pointless definition-time initialization.
Although there is no functional change, compiler code generation may
be affected.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-05-03 10:20:57 -07:00
Paul E. McKenney
282d8998e9 srcu: Prevent expedited GPs and blocking readers from consuming CPU
If an SRCU reader blocks while a synchronize_srcu_expedited() waits for
that same reader, then that grace period will spawn an endless series of
workqueue handlers, consuming a full CPU.  This quickly gets pointless
because consuming more CPU isn't going to make that reader get done
faster, especially if it is blocked waiting for an external event.

This commit therefore spawns at most one pair of back-to-back workqueue
handlers per expedited grace period phase, instead inserting increasing
delays as that grace period phase grows older, but capped at 10 jiffies.
In any case, if there have been at least 100 back-to-back workqueue
handlers within a single jiffy, regardless of grace period or grace-period
phase, then a one-jiffy delay is inserted.

[ paulmck:  Apply feedback from kernel test robot. ]

Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Reported-by: Song Liu <song@kernel.org>
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-05-03 10:20:57 -07:00
Paul E. McKenney
c2445d3878 srcu: Add contention check to call_srcu() srcu_data ->lock acquisition
This commit increases the sensitivity of contention detection by adding
checks to the acquisition of the srcu_data structure's lock on the
call_srcu() code path.

Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-05-03 10:20:57 -07:00
Paul E. McKenney
a57ffb3c6b srcu: Automatically determine size-transition strategy at boot
This commit adds a srcutree.convert_to_big option of zero that causes
SRCU to decide at boot whether to wait for contention (small systems) or
immediately expand to large (large systems).  A new srcutree.big_cpu_lim
(defaulting to 128) defines how many CPUs constitute a large system.

Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-05-03 10:19:39 -07:00
Paul E. McKenney
9f2e91d94c srcu: Add contention-triggered addition of srcu_node tree
This commit instruments the acquisitions of the srcu_struct structure's
->lock, enabling the initiation of a transition from SRCU_SIZE_SMALL
to SRCU_SIZE_BIG when sufficient contention is experienced.  The
instrumentation counts the number of trylock failures within the confines
of a single jiffy.  If that number exceeds the value specified by the
srcutree.small_contention_lim kernel boot parameter (which defaults to
100), and if the value specified by the srcutree.convert_to_big kernel
boot parameter has the 0x10 bit set (defaults to 0), then a transition
will be automatically initiated.

By default, there will never be any transitions, so that none of the
srcu_struct structures ever gains an srcu_node array.

The useful values for srcutree.convert_to_big are:

0x00:  Never convert.
0x01:  Always convert at init_srcu_struct() time.
0x02:  Convert when rcutorture prints its first round of statistics.
0x03:  Decide conversion approach at boot given system size.
0x10:  Convert if contention is encountered.
0x12:  Convert if contention is encountered or when rcutorture prints
        its first round of statistics, whichever comes first.

The value 0x11 acts the same as 0x01 because the conversion happens
before there is any chance of contention.

[ paulmck: Apply "static" feedback from kernel test robot. ]

Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 15:52:30 -07:00
Paul E. McKenney
99659f64b1 srcu: Create concurrency-safe helper for initiating size transition
Once there are contention-initiated size transitions, it will be
possible for rcutorture to initiate a transition at the same time
as a contention-initiated transition.  This commit therefore creates
a concurrency-safe helper function named srcu_transition_to_big() to
safely initiate size transitions.

Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 15:52:30 -07:00
Paul E. McKenney
ee5e2448bc srcu: Explain srcu_funnel_gp_start() call to list_add() is safe
This commit adds a comment explaining why an unprotected call to
list_add() from srcu_funnel_gp_start() can be safe.  TL;DR: It is only
called during very early boot when we don't have no steeking concurrency!

Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 15:52:30 -07:00
Paul E. McKenney
46470cf85d srcu: Prevent cleanup_srcu_struct() from freeing non-dynamic ->sda
When an srcu_struct structure is created (but not in a kernel module)
by DEFINE_SRCU() and friends, the per-CPU srcu_data structure is
statically allocated.  In all other cases, that structure is obtained
from alloc_percpu(), in which case cleanup_srcu_struct() must invoke
free_percpu() on the resulting ->sda pointer in the srcu_struct pointer.

Which it does.

Except that it also invokes free_percpu() on the ->sda pointer
referencing the statically allocated per-CPU srcu_data structures.
Which free_percpu() is surprisingly OK with.

This commit nevertheless stops cleanup_srcu_struct() from freeing
statically allocated per-CPU srcu_data structures.

Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 15:52:30 -07:00
Paul E. McKenney
4a230f8046 srcu: Avoid NULL dereference in srcu_torture_stats_print()
You really shouldn't invoke srcu_torture_stats_print() after invoking
cleanup_srcu_struct(), but there is really no reason to get a
compiler-obfuscated per-CPU-variable NULL pointer dereference as the
diagnostic.  This commit therefore checks for NULL ->sda and makes a
more polite console-message complaint in that case.

Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 15:52:30 -07:00
Paul E. McKenney
c69a00a12e srcu: Add boot-time control over srcu_node array allocation
This commit adds an srcu_tree.convert_to_big kernel parameter that either
refuses to convert at all (0), converts immediately at init_srcu_struct()
time (1), or lets rcutorture convert it (2).  An addition contention-based
dynamic conversion choice will be added, along with documentation.

[ paulmck: Apply callback-scanning feedback from Neeraj Upadhyay. ]

Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 15:51:08 -07:00
Neeraj Upadhyay
0b56f95390 srcu: Ensure snp nodes tree is fully initialized before traversal
For configurations where snp node tree is not initialized at
init time (added in subsequent commits), srcu_funnel_gp_start()
and srcu_funnel_exp_start() can potential traverse and observe
the snp nodes' transient (uninitialized) states. This can potentially
happen, when init_srcu_struct_nodes() initialization of sdp->mynode
races with srcu_funnel_gp_start() and srcu_funnel_exp_start()

Consider the case below where srcu_funnel_gp_start() observes
sdp->mynode to be not NULL and uses an uninitialized sdp->grpmask

          P1                                  P2

init_srcu_struct_nodes()           void srcu_funnel_gp_start(...)
{
for_each_possible_cpu(cpu) {
  ...
  sdp->mynode = &snp_first[...];
  for (snp = sdp->mynode;...)       struct srcu_node *snp_leaf =
                                       smp_load_acquire(&sdp->mynode)
    ...                             if (snp_leaf) {
                                      for (snp = snp_leaf; ...)
                                        ...
					if (snp == snp_leaf)
                                         snp->srcu_data_have_cbs[idx] |=
                                           sdp->grpmask;
    sdp->grpmask =
      1 << (cpu - sdp->mynode->grplo);
  }
}

Similarly, init_srcu_struct_nodes() and srcu_funnel_exp_start() can
race, where srcu_funnel_exp_start() could observe state of snp lock
before spin_lock_init().

          P1                                      P2

init_srcu_struct_nodes()               void srcu_funnel_exp_start(...)
{
  srcu_for_each_node_breadth_first(ssp, snp) {      for (; ...) {
                                                      spin_lock_...(snp, )
	spin_lock_init(&ACCESS_PRIVATE(snp, lock));
    ...
  }
  for_each_possible_cpu(cpu) {
    ...
    sdp->mynode = &snp_first[...];

To avoid these issues, ensure that snp node tree initialization is
complete i.e. after SRCU_SIZE_WAIT_BARRIER srcu_size_state is reached,
before traversing the tree. Given that srcu_funnel_gp_start() and
srcu_funnel_exp_start() are called within SRCU read side critical
sections, this check is safe, in the sense that all callbacks are
enqueued on CPU0 srcu_cblist until SRCU_SIZE_WAIT_CALL is entered,
and these read side critical sections (containing srcu_funnel_gp_start()
and srcu_funnel_exp_start()) need to complete, before SRCU_SIZE_WAIT_CALL
is reached.

Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 15:31:03 -07:00
Paul E. McKenney
cbdc98e93e srcu: Use invalid initial value for srcu_node GP sequence numbers
Currently, tree SRCU relies on the srcu_node structures being initialized
at the same time that the srcu_struct itself is initialized, and thus
use the initial grace-period sequence number as the initial value for
the srcu_node structure's ->srcu_have_cbs[] and ->srcu_gp_seq_needed_exp
fields.  Although this has a high probability of also working when the
srcu_node array is allocated and initialized at some random later time,
it would be better to avoid leaving such things to chance.

This commit therefore initializes these fields with 0x2, which is a
recognizable invalid value.  It then adds the required checks for this
invalid value in order to avoid confusion on long-running kernels
(especially those on 32-bit systems) that allocate and initialize
srcu_node arrays late in life.

Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 15:31:03 -07:00
Paul E. McKenney
aeb9b39b8f srcu: Compute snp_seq earlier in srcu_funnel_gp_start()
Currently, srcu_funnel_gp_start() tests snp->srcu_have_cbs[idx] and then
separately assigns it to the snp_seq local variable.  This commit does
the assignment earlier to simplify the code a bit.  While in the area,
this commit also takes advantage of the 100-character line limit to put
the call to srcu_schedule_cbs_sdp() on a single line.

Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 15:31:02 -07:00
Paul E. McKenney
3bedebcf63 srcu: Make rcutorture dump the SRCU size state
This commit adds the numeric and string version of ->srcu_size_state to
the Tree-SRCU-specific portion of the rcutorture output.

[ paulmck: Apply feedback from kernel test robot and Dan Carpenter. ]
[ quic_neeraju: Apply feedback from Jiapeng Chong. ]

Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 15:31:02 -07:00
Paul E. McKenney
e2f638365d srcu: Add size-state transitioning code
This is just dead code at the moment, and will be used once
the state-transition code is activated.

Because srcu_barrier() must be aware of transition before call_srcu(), the
state machine waits for an SRCU grace period before callbacks are queued
to the non-CPU-0 queues.  This requres that portions of srcu_barrier()
be enclosed in an SRCU read-side critical section.

Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 15:31:02 -07:00
Paul E. McKenney
2ec303113d srcu: Dynamically allocate srcu_node array
This commit shrinks the srcu_struct structure by converting its ->node
field from a fixed-size compile-time array to a pointer to a dynamically
allocated array.  In kernels built with large values of NR_CPUS that boot
on systems with smaller numbers of CPUs, this can save significant memory.

[ paulmck: Apply kernel test robot feedback. ]

Reported-by: A cast of thousands
Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 15:31:02 -07:00
Paul E. McKenney
994f706872 srcu: Make Tree SRCU able to operate without snp_node array
This commit makes Tree SRCU able to operate without an snp_node
array, that is, when the srcu_data structures' ->mynode pointers
are NULL.  This can result in high contention on the srcu_struct
structure's ->lock, but only when there are lots of call_srcu(),
synchronize_srcu(), and synchronize_srcu_expedited() calls.

Note that when there is no snp_node array, all SRCU callbacks use
CPU 0's callback queue.  This is optimal in the common case of low
update-side load because it removes the need to search each CPU
for the single callback that made the grace period happen.

Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 15:31:02 -07:00
Paul E. McKenney
7b9e9b5856 srcu: Make srcu_funnel_gp_start() cache ->mynode in snp_leaf
Currently, the srcu_funnel_gp_start() walks its local variable snp up the
tree and reloads sdp->mynode whenever it is necessary to check whether
it is still at the leaf srcu_node level.  This works, but is a bit more
obtuse than absolutely necessary.  In addition, upcoming commits will
dynamically size srcu_struct structures, in which case sdp->mynode will
no longer necessarily be a constant, and this commit helps prepare for
that dynamic sizing.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 15:31:02 -07:00
Paul E. McKenney
8ed0076020 srcu: Tighten cleanup_srcu_struct() GP checks
Currently, cleanup_srcu_struct() checks for a grace period in progress,
but it does not check for a grace period that has not yet started but
which might start at any time.  Such a situation could result in a
use-after-free bug, so this commit adds a check for a grace period that
is needed but not yet started to cleanup_srcu_struct().

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 15:31:01 -07:00
Paul E. McKenney
641faf1b90 Merge branches 'bitmaprange.2021.05.10c', 'doc.2021.05.10c', 'fixes.2021.05.13a', 'kvfree_rcu.2021.05.10c', 'mmdumpobj.2021.05.10c', 'nocb.2021.05.12a', 'srcu.2021.05.12a', 'tasks.2021.05.18a' and 'torture.2021.05.10c' into HEAD
bitmaprange.2021.05.10c: Allow "all" for bitmap ranges.
doc.2021.05.10c: Documentation updates.
fixes.2021.05.13a: Miscellaneous fixes.
kvfree_rcu.2021.05.10c: kvfree_rcu() updates.
mmdumpobj.2021.05.10c: mem_dump_obj() updates.
nocb.2021.05.12a: RCU NOCB CPU updates, including limited deoffloading.
srcu.2021.05.12a: SRCU updates.
tasks.2021.05.18a: Tasks-RCU updates.
torture.2021.05.10c: Torture-test updates.
2021-05-18 10:56:19 -07:00
Ingo Molnar
a616aec9aa rcu: Fix various typos in comments
Fix ~12 single-word typos in RCU code comments.

[ paulmck: Apply feedback from Randy Dunlap. ]
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-12 12:11:05 -07:00
Paul E. McKenney
3d3a0d1b50 rcu: Point to documentation of ordering guarantees
Add comments to synchronize_rcu() and friends that point to
Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:22:54 -07:00
Frederic Weisbecker
b5befe842e srcu: Fix broken node geometry after early ssp init
An srcu_struct structure that is initialized before rcu_init_geometry()
will have its srcu_node hierarchy based on CONFIG_NR_CPUS.  Once
rcu_init_geometry() is called, this hierarchy is compressed as needed
for the actual maximum number of CPUs for this system.

Later on, that srcu_struct structure is confused, sometimes referring
to its initial CONFIG_NR_CPUS-based hierarchy, and sometimes instead
to the new num_possible_cpus() hierarchy.  For example, each of its
->mynode fields continues to reference the original leaf rcu_node
structures, some of which might no longer exist.  On the other hand,
srcu_for_each_node_breadth_first() traverses to the new node hierarchy.

There are at least two bad possible outcomes to this:

1) a) A callback enqueued early on an srcu_data structure (call it
      *sdp) is recorded pending on sdp->mynode->srcu_data_have_cbs in
      srcu_funnel_gp_start() with sdp->mynode pointing to a deep leaf
      (say 3 levels).

   b) The grace period ends after rcu_init_geometry() shrinks the
      nodes level to a single one.  srcu_gp_end() walks through the new
      srcu_node hierarchy without ever reaching the old leaves so the
      callback is never executed.

   This is easily reproduced on an 8 CPUs machine with CONFIG_NR_CPUS >= 32
   and "rcupdate.rcu_self_test=1". The srcu_barrier() after early tests
   verification never completes and the boot hangs:

	[ 5413.141029] INFO: task swapper/0:1 blocked for more than 4915 seconds.
	[ 5413.147564]       Not tainted 5.12.0-rc4+ #28
	[ 5413.151927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	[ 5413.159753] task:swapper/0       state:D stack:    0 pid:    1 ppid:     0 flags:0x00004000
	[ 5413.168099] Call Trace:
	[ 5413.170555]  __schedule+0x36c/0x930
	[ 5413.174057]  ? wait_for_completion+0x88/0x110
	[ 5413.178423]  schedule+0x46/0xf0
	[ 5413.181575]  schedule_timeout+0x284/0x380
	[ 5413.185591]  ? wait_for_completion+0x88/0x110
	[ 5413.189957]  ? mark_held_locks+0x61/0x80
	[ 5413.193882]  ? mark_held_locks+0x61/0x80
	[ 5413.197809]  ? _raw_spin_unlock_irq+0x24/0x50
	[ 5413.202173]  ? wait_for_completion+0x88/0x110
	[ 5413.206535]  wait_for_completion+0xb4/0x110
	[ 5413.210724]  ? srcu_torture_stats_print+0x110/0x110
	[ 5413.215610]  srcu_barrier+0x187/0x200
	[ 5413.219277]  ? rcu_tasks_verify_self_tests+0x50/0x50
	[ 5413.224244]  ? rdinit_setup+0x2b/0x2b
	[ 5413.227907]  rcu_verify_early_boot_tests+0x2d/0x40
	[ 5413.232700]  do_one_initcall+0x63/0x310
	[ 5413.236541]  ? rdinit_setup+0x2b/0x2b
	[ 5413.240207]  ? rcu_read_lock_sched_held+0x52/0x80
	[ 5413.244912]  kernel_init_freeable+0x253/0x28f
	[ 5413.249273]  ? rest_init+0x250/0x250
	[ 5413.252846]  kernel_init+0xa/0x110
	[ 5413.256257]  ret_from_fork+0x22/0x30

2) An srcu_struct structure that is initialized before rcu_init_geometry()
   and used afterward will always have stale rdp->mynode references,
   resulting in callbacks to be missed in srcu_gp_end(), just like in
   the previous scenario.

This commit therefore causes init_srcu_struct_nodes to initialize the
geometry, if needed.  This ensures that the srcu_node hierarchy is
properly built and distributed from the get-go.

Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:03:35 -07:00
Frederic Weisbecker
8e9c01c717 srcu: Initialize SRCU after timers
Once srcu_init() is called, the SRCU core will make use of delayed
workqueues, which rely on timers.  However init_timers() is called
several steps after rcu_init().  This means that a call_srcu() after
rcu_init() but before init_timers() would find itself within a dangerously
uninitialized timer core.

This commit therefore creates a separate call to srcu_init() after
init_timer() completes, which ensures that we stay in early SRCU mode
until timers are safe(r).

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:03:35 -07:00
Frederic Weisbecker
c75e9d2915 srcu: Remove superfluous ssp initialization for early callbacks
Pre-srcu_init() invocations of call_srcu() initialize the srcu_struct
structure in question, so there is no need to check this initialization
in srcu_init() when initiating grace periods for srcu_struct structures
that had early call_srcu() invocations.  This commit therefore drops
the calls to check_init_srcu_struct() in srcu_init().

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:03:34 -07:00