linux/net
santosh.shilimkar@oracle.com 7b4b000951 RDS: fix rds-ping deadlock over TCP transport
Sowmini found a hang with rds-ping while testing RDS over TCP. It's
a corner case and doesn't always happen. The issue is not reproducible
with the IB transport. It's clear from the dump below why we see it
with RDS TCP.

 [<ffffffff8153b7e5>] do_tcp_setsockopt+0xb5/0x740
 [<ffffffff8153bec4>] tcp_setsockopt+0x24/0x30
 [<ffffffff814d57d4>] sock_common_setsockopt+0x14/0x20
 [<ffffffffa096071d>] rds_tcp_xmit_prepare+0x5d/0x70 [rds_tcp]
 [<ffffffffa093b5f7>] rds_send_xmit+0xd7/0x740 [rds]
 [<ffffffffa093bda2>] rds_send_pong+0x142/0x180 [rds]
 [<ffffffffa0939d34>] rds_recv_incoming+0x274/0x330 [rds]
 [<ffffffff810815ae>] ? ttwu_queue+0x11e/0x130
 [<ffffffff814dcacd>] ? skb_copy_bits+0x6d/0x2c0
 [<ffffffffa0960350>] rds_tcp_data_recv+0x2f0/0x3d0 [rds_tcp]
 [<ffffffff8153d836>] tcp_read_sock+0x96/0x1c0
 [<ffffffffa0960060>] ? rds_tcp_recv_init+0x40/0x40 [rds_tcp]
 [<ffffffff814d6a90>] ? sock_def_write_space+0xa0/0xa0
 [<ffffffffa09604d1>] rds_tcp_data_ready+0xa1/0xf0 [rds_tcp]
 [<ffffffff81545249>] tcp_data_queue+0x379/0x5b0
 [<ffffffffa0960cdb>] ? rds_tcp_write_space+0xbb/0x110 [rds_tcp]
 [<ffffffff81547fd2>] tcp_rcv_established+0x2e2/0x6e0
 [<ffffffff81552602>] tcp_v4_do_rcv+0x122/0x220
 [<ffffffff81553627>] tcp_v4_rcv+0x867/0x880
 [<ffffffff8152e0b3>] ip_local_deliver_finish+0xa3/0x220

This happens because the rds_send_xmit() chain wants to take the
sock_lock, which is already held by tcp_v4_rcv() on its
way to rds_tcp_data_ready(). Commit db6526dcb5 ("RDS: use
rds_send_xmit() state instead of RDS_LL_SEND_FULL") was
trying to opportunistically finish the send request
in the same thread context.

But because of the above recursive lock hang with RDS TCP,
the send work from rds_send_pong() needs to be deferred to a
worker to avoid the lockup. Given that RDS ping is more of a
connectivity test than a performance-critical path, it should be
OK even for transports like IB.
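
The locking problem and the deferral described above can be illustrated
with a small userspace sketch. This is only a toy model, not the actual
patch: every name in it (toy_sock, toy_trylock, toy_receive_ping,
toy_worker_run, and so on) is hypothetical, and a failed trylock stands
in for the point where the kernel would hang on the already-held socket
lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; none are kernel APIs. */
struct toy_sock {
    bool locked;     /* models the socket lock held by the receive path */
    int pongs_sent;  /* pongs actually transmitted */
    int pending;     /* pongs queued for the worker */
};

/* A false return models the spot where the real code would hang. */
static bool toy_trylock(struct toy_sock *sk)
{
    if (sk->locked)
        return false;
    sk->locked = true;
    return true;
}

static void toy_unlock(struct toy_sock *sk)
{
    sk->locked = false;
}

/* The send path needs the socket lock, as in the trace above. */
static bool toy_send_xmit(struct toy_sock *sk)
{
    if (!toy_trylock(sk))
        return false;            /* recursive acquisition: would deadlock */
    sk->pongs_sent++;
    toy_unlock(sk);
    return true;
}

/* Worker context: runs later, after the receive path dropped the lock. */
int toy_worker_run(struct toy_sock *sk)
{
    int done = 0;

    while (sk->pending > 0 && toy_send_xmit(sk)) {
        sk->pending--;
        done++;
    }
    return done;
}

/* Receive path: holds the lock while the incoming ping is processed. */
bool toy_receive_ping(struct toy_sock *sk, bool defer)
{
    bool ok = true;

    if (!toy_trylock(sk))
        return false;
    if (defer)
        sk->pending++;           /* only queue the pong for the worker */
    else
        ok = toy_send_xmit(sk);  /* inline send under the held lock */
    toy_unlock(sk);
    return ok;
}
```

Calling toy_receive_ping() with defer set to false fails because the send
path tries to take a lock the caller already holds, while defer set to
true followed by toy_worker_run() sends the pong once the lock has been
released.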

Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18 22:45:55 -07:00
6lowpan 6lowpan: move shared settings to lowpan_netdev_setup 2015-10-08 14:25:34 +02:00
9p net/9p: Remove ib_get_dma_mr calls 2015-08-30 18:12:36 -04:00
802
8021q net: 8021q: convert to using IFF_NO_QUEUE 2015-08-18 11:55:06 -07:00
appletalk
atm atm: deal with setting entry before mkip was called 2015-09-17 22:13:32 -07:00
ax25 NET: AX.25: Stop heartbeat timer on disconnect. 2015-07-15 15:59:58 -07:00
batman-adv batman-adv: turn batadv_neigh_node_get() into local function 2015-08-27 20:15:34 +02:00
bluetooth Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2015-10-11 05:15:30 -07:00
bridge bridge: defer switchdev fdb del call in fdb_del_external_learn 2015-10-15 06:09:50 -07:00
caif net: caif: convert to using IFF_NO_QUEUE 2015-08-18 11:55:07 -07:00
can can: avoid using timeval for uapi 2015-10-13 17:42:34 +02:00
ceph libceph: don't access invalid memory in keepalive2 path 2015-09-17 20:14:15 +03:00
core net: introduce pre-change upper device notifier 2015-10-16 07:15:05 -07:00
dcb net/dcb: make dcbnl.c explicitly non-modular 2015-10-09 07:52:27 -07:00
dccp tcp/dccp: add inet_csk_reqsk_queue_drop_and_put() helper 2015-10-16 00:52:18 -07:00
decnet dst: Pass net into dst->output 2015-10-08 04:27:03 -07:00
dns_resolver
dsa switchdev: remove pointers from switchdev objects 2015-10-15 06:09:49 -07:00
ethernet net: help compiler generate better code in eth_get_headlen 2015-09-28 22:51:15 -07:00
hsr net: hsr: convert to using IFF_NO_QUEUE 2015-08-18 11:55:07 -07:00
ieee802154 6lowpan: move shared settings to lowpan_netdev_setup 2015-10-08 14:25:34 +02:00
ipv4 tcp: do not set queue_mapping on SYNACK 2015-10-18 22:26:02 -07:00
ipv6 tcp: do not set queue_mapping on SYNACK 2015-10-18 22:26:02 -07:00
ipx
irda
iucv s390/iucv: do not use arrays as argument 2015-09-21 16:03:04 -07:00
key net: Fix RCU splat in af_key 2015-08-24 14:48:10 -07:00
l2tp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-10-02 07:21:25 -07:00
l3mdev net: Add netif_is_l3_slave 2015-10-07 04:27:43 -07:00
lapb
llc tcp: fix recv with flags MSG_WAITALL | MSG_PEEK 2015-07-27 01:06:53 -07:00
mac80211 For the current cycle, we have the following right now: 2015-10-07 04:29:18 -07:00
mac802154 ieee802154: change mtu size behaviour 2015-09-30 13:21:32 +02:00
mpls dst: Pass net into dst->output 2015-10-08 04:27:03 -07:00
netfilter ipv4: Pass struct net into ip_defrag and ip_check_defrag 2015-10-12 19:44:16 -07:00
netlabel
netlink net/netlink: lockdep_genl_is_held can be boolean 2015-10-09 07:48:59 -07:00
netrom netfilter: Remove spurios included of netfilter.h 2015-06-18 21:14:32 +02:00
nfc nfc: netlink: Add capability to reply to vendor_cmd with data 2015-08-20 22:00:11 +02:00
openvswitch ipv6: Pass struct net into nf_ct_frag6_gather 2015-10-12 19:44:17 -07:00
packet ipv4: Pass struct net into ip_defrag and ip_check_defrag 2015-10-12 19:44:16 -07:00
phonet
rds RDS: fix rds-ping deadlock over TCP transport 2015-10-18 22:45:55 -07:00
rfkill rfkill: Copy "all" global state to other types 2015-09-04 14:26:56 +02:00
rose Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-06-24 02:58:51 -07:00
rxrpc rxrpc: Replace get_seconds with ktime_get_seconds 2015-09-20 21:53:56 -07:00
sched net: synack packets can be attached to request sockets 2015-10-11 05:05:06 -07:00
sctp net: sctp: avoid incorrect time_t use 2015-10-05 03:16:48 -07:00
sunrpc Changes for 4.3-rc4 2015-10-01 16:38:52 -04:00
switchdev switchdev: assert rtnl mutex when going over lower netdevs 2015-10-15 06:09:53 -07:00
tipc tipc: update node FSM when peer RESET message is received 2015-10-15 23:55:23 -07:00
unix af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag 2015-09-29 13:47:08 -07:00
vmw_vsock
wimax net:wimax: Fix doucble word "the the" in networking.xml 2015-08-09 22:43:52 -07:00
wireless For the current cycle, we have the following right now: 2015-10-07 04:29:18 -07:00
x25
xfrm dst: Pass net into dst->output 2015-10-08 04:27:03 -07:00
compat.c
Kconfig net: Introduce L3 Master device abstraction 2015-09-29 20:40:32 -07:00
Makefile net: Introduce L3 Master device abstraction 2015-09-29 20:40:32 -07:00
socket.c
sysctl_net.c