linux/kernel/rcu
Michal Hocko 093590c16b rcu: Back off upon fill_page_cache_func() allocation failure
The fill_page_cache_func() function allocates couple of pages to store
kvfree_rcu_bulk_data structures. This is a lightweight (GFP_NORETRY)
allocation which can fail under memory pressure. The function will,
however keep retrying even when the previous attempt has failed.

This retrying is in theory correct, but in practice the allocation is
invoked from workqueue context, which means that if the memory reclaim
gets stuck, these retries can hog the worker for quite some time.
Although the workqueues subsystem automatically adjusts concurrency, such
adjustment is not guaranteed to happen until the worker context sleeps.
And the fill_page_cache_func() function's retry loop is not guaranteed
to sleep (see the should_reclaim_retry() function).

And we have seen this function cause workqueue lockups:

kernel: BUG: workqueue lockup - pool cpus=93 node=1 flags=0x1 nice=0 stuck for 32s!
[...]
kernel: pool 74: cpus=37 node=0 flags=0x1 nice=0 hung=32s workers=2 manager: 2146
kernel:   pwq 498: cpus=249 node=1 flags=0x1 nice=0 active=4/256 refcnt=5
kernel:     in-flight: 1917:fill_page_cache_func
kernel:     pending: dbs_work_handler, free_work, kfree_rcu_monitor

Originally, we thought that the root cause of this lockup was several
retries with direct reclaim, but this is not yet confirmed.  Furthermore,
we have seen similar lockups without any heavy memory pressure.  This
suggests that there are other factors contributing to these lockups.
However, it is not really clear that endless retries are desireable.

So let's make the fill_page_cache_func() function back off after
allocation failure.

Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31 05:06:50 -07:00
..
Kconfig Merge branch 'ctxt.2022.07.05a' into HEAD 2022-07-21 17:46:18 -07:00
Kconfig.debug Char / Misc driver changes for 6.0-rc1 2022-08-04 11:05:48 -07:00
Makefile rcuperf: Change rcuperf to rcuscale 2020-08-24 18:39:24 -07:00
rcu_segcblist.c rcu: Clarify fill-the-gap comment in rcu_segcblist_advance() 2022-04-11 17:28:48 -07:00
rcu_segcblist.h rcu: Mark writes to the rcu_segcblist structure's ->flags field 2022-02-14 10:36:58 -08:00
rcu.h Merge branch 'ctxt.2022.07.05a' into HEAD 2022-07-21 17:46:18 -07:00
rcuscale.c rcuscale: Fix smp_processor_id()-in-preemptible warnings 2022-06-21 15:57:04 -07:00
rcutorture.c Merge branches 'doc.2022.06.21a', 'fixes.2022.07.19a', 'nocb.2022.07.19a', 'poll.2022.07.21a', 'rcu-tasks.2022.06.21a' and 'torture.2022.06.21a' into HEAD 2022-07-21 17:43:16 -07:00
refscale.c refscale: Convert test_lock spinlock to raw_spinlock 2022-06-21 15:57:04 -07:00
srcutiny.c srcu: Prevent redundant __srcu_read_unlock() wakeup 2021-11-30 17:28:16 -08:00
srcutree.c srcu: Make expedited RCU grace periods block even less frequently 2022-07-19 11:39:59 -07:00
sync.c rcu_sync: Fix comment to properly reflect rcu_sync_exit() behavior 2022-04-20 16:51:11 -07:00
tasks.h rcu-tasks: Use delayed_work to delay rcu_tasks_verify_self_tests() 2022-06-21 15:49:38 -07:00
tiny.c Merge branches 'doc.2022.06.21a', 'fixes.2022.07.19a', 'nocb.2022.07.19a', 'poll.2022.07.21a', 'rcu-tasks.2022.06.21a' and 'torture.2022.06.21a' into HEAD 2022-07-21 17:43:16 -07:00
tree_exp.h Merge branch 'ctxt.2022.07.05a' into HEAD 2022-07-21 17:46:18 -07:00
tree_nocb.h rcu/nocb: Avoid polling when my_rdp->nocb_head_rdp list is empty 2022-07-19 11:43:55 -07:00
tree_plugin.h Merge branch 'ctxt.2022.07.05a' into HEAD 2022-07-21 17:46:18 -07:00
tree_stall.h RCU pull request for v5.20 (or whatever) 2022-08-02 19:12:45 -07:00
tree.c rcu: Back off upon fill_page_cache_func() allocation failure 2022-08-31 05:06:50 -07:00
tree.h Merge branch 'ctxt.2022.07.05a' into HEAD 2022-07-21 17:46:18 -07:00
update.c Merge branch 'ctxt.2022.07.05a' into HEAD 2022-07-21 17:46:18 -07:00