linux/net/smc
D. Wythe e99fe4137b net/smc: Prevent smc_release() from long blocking
[ Upstream commit 5c15b3123f ]

In the nginx/wrk benchmark, the client hangs with high probability in
the following case (the client takes several minutes to exit):

server: smc_run nginx

client: smc_run wrk -c 10000 -t 1 http://server

Client hangs with the following backtrace:

0 [ffffa7ce80f3bbf8] __schedule at ffffffff9f9e0d5f
1 [ffffa7ce80f3bc88] schedule at ffffffff9f9e10e6
2 [ffffa7ce80f3bca0] schedule_timeout at ffffffff9f9e3f3c
3 [ffffa7ce80f3bd20] wait_for_common at ffffffff9f9e19de
4 [ffffa7ce80f3bd80] __flush_work at ffffffff9f0fe013
5 [ffffa7ce80f3bdf0] smc_release at ffffffffc0697d24 [smc]
6 [ffffa7ce80f3be20] __sock_release at ffffffff9f802e2d
7 [ffffa7ce80f3be40] sock_close at ffffffff9f802eb1
8 [ffffa7ce80f3be48] __fput at ffffffff9f334f93
9 [ffffa7ce80f3be78] task_work_run at ffffffff9f101ff5
10 [ffffa7ce80f3bea0] do_exit at ffffffff9f0e5012
11 [ffffa7ce80f3bf10] do_group_exit at ffffffff9f0e592a
12 [ffffa7ce80f3bf38] __x64_sys_exit_group at ffffffff9f0e5994
13 [ffffa7ce80f3bf40] do_syscall_64 at ffffffff9f9d4373
14 [ffffa7ce80f3bf50] entry_SYSCALL_64_after_hwframe at ffffffff9fa0007c

This issue is due to flush_work(), which smc_release() uses to wait
for smc_connect_work() to finish. Once many smc_connect_work() items
are pending, or all workers are busy executing work, smc_release()
has to block until a worker becomes free, which is equivalent to
waiting for another smc_connect_work() to finish.
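
The difference between waiting on a queued item and cancelling it can be
sketched with a small userspace model. This is only an illustration under
stated assumptions, not kernel code: it assumes a single worker serving a
FIFO queue, and model_state, model_flush_wait() and model_cancel_wait()
are hypothetical names that do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a FIFO work queue served by one worker.
 * None of these names exist in the kernel. */
enum model_state { MODEL_IDLE, MODEL_PENDING, MODEL_RUNNING };

/* Flush-style wait: the target cannot finish until every item queued
 * ahead of it has run, so the caller waits for all of them plus the
 * target itself. Returns the number of items waited for. */
size_t model_flush_wait(const enum model_state *queue, size_t target)
{
    size_t waited = 0;

    assert(queue != NULL);
    for (size_t i = 0; i <= target; i++)
        if (queue[i] != MODEL_IDLE)
            waited++;
    return waited;
}

/* Cancel-style wait: a pending target is simply taken off the queue;
 * only a target that is already running has to be waited for. */
size_t model_cancel_wait(const enum model_state *queue, size_t target)
{
    assert(queue != NULL);
    return queue[target] == MODEL_RUNNING ? 1 : 0;
}
```

In this model, with one running item followed by three pending ones, a
flush-style wait on the last item covers four items, while a cancel-style
wait on it covers none.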

To fix this, there are two changes:

1. For an idle smc_connect_work(), cancel it from the workqueue; for an
   executing smc_connect_work(), wait for it to finish. For that
   purpose, replace flush_work() with cancel_work_sync().

2. Since smc_connect() holds a reference for passive closing, release
   that reference if smc_connect_work() has been cancelled.
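
The two changes can be sketched together in a userspace model. This is a
minimal sketch, not the patch itself: struct model_sock and the model_*()
functions are hypothetical stand-ins for the socket, its reference count
and its connect work. The one kernel behaviour it relies on is that
cancel_work_sync() returns true only when the work was still pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the socket and its connect work. */
struct model_sock {
    int refcnt;                  /* models the socket reference count */
    bool connect_work_pending;   /* queued, not yet started */
    bool connect_work_running;   /* currently executing */
};

/* Models the connect path: take a reference, then queue the work. */
void model_connect(struct model_sock *sk)
{
    sk->refcnt++;
    sk->connect_work_pending = true;
}

/* Models cancel_work_sync()-style semantics: returns true only if a
 * pending item was removed from the queue. */
bool model_cancel_sync(struct model_sock *sk)
{
    if (sk->connect_work_pending) {
        sk->connect_work_pending = false;
        return true;
    }
    /* A running item would be waited for here; the model marks it done. */
    sk->connect_work_running = false;
    return false;
}

/* Models the release path after the fix: if the work was cancelled
 * before it ran, drop the reference taken in model_connect(). */
void model_release(struct model_sock *sk)
{
    assert(sk->refcnt > 0);
    if (model_cancel_sync(sk))
        sk->refcnt--;
}
```

In this model, a socket whose connect work is still pending loses exactly
the reference taken at connect time, while a socket whose work is already
running keeps its count unchanged and leaves that reference to be dropped
elsewhere.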

Fixes: 24ac3a08e6 ("net/smc: rebuild nonblocking connect")
Reported-by: Tony Lu <tonylu@linux.alibaba.com>
Tested-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/1639571361-101128-1-git-send-email-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-22 09:32:44 +01:00
af_smc.c net/smc: Prevent smc_release() from long blocking 2021-12-22 09:32:44 +01:00
Kconfig treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
Makefile net/smc: Add SMC statistics support 2021-06-16 12:54:02 -07:00
smc_cdc.c net/smc: improved fix wait on already cleared link 2021-10-08 17:00:16 +01:00
smc_cdc.h net/smc: pre-fetch send buffer outside of send_lock 2020-05-30 18:12:25 -07:00
smc_clc.c net/smc: add missing error check in smc_clc_prfx_set() 2021-09-21 10:54:16 +01:00
smc_clc.h net/smc: Add support for obtaining system information 2020-12-01 17:56:13 -08:00
smc_close.c net/smc: Keep smc_close_final rc during active close 2021-12-08 09:04:50 +01:00
smc_close.h net/smc: remove close abort worker 2019-10-22 11:23:44 -07:00
smc_core.c net/smc: fix wrong list_del in smc_lgr_cleanup_early 2021-12-08 09:04:49 +01:00
smc_core.h net/smc: Correct smc link connection counter in case of smc client 2021-08-09 10:46:59 +01:00
smc_diag.c net/smc: Introduce SMCR get link command 2020-12-01 17:56:13 -08:00
smc_ib.c net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
smc_ib.h net/smc: Add support for obtaining SMCR device list 2020-12-01 17:56:13 -08:00
smc_ism.c net/smc: no need to flush smcd_dev's event_wq before destroying it 2021-06-03 13:54:49 -07:00
smc_ism.h net/smc: Add support for obtaining SMCD device list 2020-12-01 17:56:13 -08:00
smc_llc.c net/smc: Fix smc_link->llc_testlink_time overflow 2021-10-28 13:04:28 +01:00
smc_llc.h net/smc: move add link processing for new device into llc layer 2020-07-19 15:30:22 -07:00
smc_netlink.c net/smc: Add netlink support for SMC fallback statistics 2021-06-16 12:54:02 -07:00
smc_netlink.h net/smc: Add netlink support for SMC fallback statistics 2021-06-16 12:54:02 -07:00
smc_netns.h net/smc: introduce list of pnetids for Ethernet devices 2020-09-28 15:19:03 -07:00
smc_pnet.c net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
smc_pnet.h net/smc: determine proposed ISM devices 2020-09-28 15:19:03 -07:00
smc_rx.c net/smc: Make SMC statistics network namespace aware 2021-06-16 12:54:02 -07:00
smc_rx.h smc: add support for splice() 2018-05-04 11:45:06 -04:00
smc_stats.c net/smc: Fix ENODATA tests in smc_nl_get_fback_stats() 2021-06-21 12:16:58 -07:00
smc_stats.h net/smc: Make SMC statistics network namespace aware 2021-06-16 12:54:02 -07:00
smc_tx.c net/smc: improved fix wait on already cleared link 2021-10-08 17:00:16 +01:00
smc_tx.h net/smc: eliminate cursor read and write calls 2018-07-23 10:57:14 -07:00
smc_wr.c net/smc: fix wait on already cleared link 2021-08-09 10:46:59 +01:00
smc_wr.h net/smc: improved fix wait on already cleared link 2021-10-08 17:00:16 +01:00
smc.h net/smc: introduce CLC first contact extension 2020-09-28 15:19:03 -07:00