linux/net/tipc
Jon Paul Maloy f1d048f24e tipc: fix socket timer deadlock
We sometimes observe a 'deadly embrace' type deadlock occurring
between mutually connected sockets on the same node. This happens
when the one-hour peer supervision timers happen to expire
simultaneously in both sockets.

The scenario is as follows:

CPU 1:                          CPU 2:
--------                        --------
tipc_sk_timeout(sk1)            tipc_sk_timeout(sk2)
  lock(sk1.slock)                 lock(sk2.slock)
  msg_create(probe)               msg_create(probe)
  unlock(sk1.slock)               unlock(sk2.slock)
  tipc_node_xmit_skb()            tipc_node_xmit_skb()
    tipc_node_xmit()                tipc_node_xmit()
      tipc_sk_rcv(sk2)                tipc_sk_rcv(sk1)
        lock(sk2.slock)                 lock((sk1.slock)
        filter_rcv()                    filter_rcv()
          tipc_sk_proto_rcv()             tipc_sk_proto_rcv()
            msg_create(probe_rsp)           msg_create(probe_rsp)
            tipc_sk_respond()               tipc_sk_respond()
              tipc_node_xmit_skb()            tipc_node_xmit_skb()
                tipc_node_xmit()                tipc_node_xmit()
                  tipc_sk_rcv(sk1)                tipc_sk_rcv(sk2)
                    lock((sk1.slock)                lock((sk2.slock)
                    ===> DEADLOCK                   ===> DEADLOCK

Further analysis reveals that there are three different locations in the
socket code where tipc_sk_respond() is called within the context of the
socket lock, with ensuing risk of similar deadlocks.

We now solve this by passing a buffer queue along with all upcalls where
sk_lock.slock may potentially be held. Response or rejected message
buffers are accumulated into this queue instead of being sent out
directly, and only sent once we know we are safely outside the slock
context.

Reported-by: GUNA <gbalasun@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17 21:38:10 -07:00
..
addr.c tipc: simplify include dependencies 2015-05-14 12:24:45 -04:00
addr.h tipc: simplify include dependencies 2015-05-14 12:24:45 -04:00
bcast.c tipc: remove pre-allocated message header in link struct 2016-03-06 23:01:20 -05:00
bcast.h tipc: remove pre-allocated message header in link struct 2016-03-06 23:01:20 -05:00
bearer.c tipc: fix suspicious RCU usage 2016-06-15 21:47:23 -07:00
bearer.h tipc: remove remnants of old broadcast code 2016-04-13 17:49:11 -04:00
core.c tipc: redesign connection-level flow control 2016-05-03 15:51:16 -04:00
core.h tipc: make dist queue pernet 2016-04-11 15:22:20 -04:00
discover.c tipc: eliminate buffer leak in bearer layer 2016-04-07 17:00:13 -04:00
discover.h tipc: eliminate buffer leak in bearer layer 2016-04-07 17:00:13 -04:00
eth_media.c tipc: make media address offset a common define 2015-02-27 18:18:48 -05:00
ib_media.c tipc: rename media/msg related definitions 2015-02-27 18:18:48 -05:00
Kconfig tipc: add ip/udp media type 2015-03-05 22:08:42 -05:00
link.c tipc: eliminate uninitialized variable warning 2016-06-15 21:47:23 -07:00
link.h tipc: let first message on link be a state message 2016-04-15 16:09:06 -04:00
Makefile tipc: add ip/udp media type 2015-03-05 22:08:42 -05:00
msg.c tipc: let broadcast packet reception use new link receive function 2015-10-24 06:56:37 -07:00
msg.h tipc: redesign connection-level flow control 2016-05-03 15:51:16 -04:00
name_distr.c tipc: purge deferred updates from dead nodes 2016-04-11 15:22:20 -04:00
name_distr.h tipc: reduce code dependency between binding table and node layer 2015-11-20 14:06:10 -05:00
name_table.c tipc: move netlink policies to netlink.c 2016-03-07 14:56:41 -05:00
name_table.h tipc: convert legacy nl name table dump to nl compat 2015-02-09 13:20:48 -08:00
net.c tipc: move netlink policies to netlink.c 2016-03-07 14:56:41 -05:00
net.h tipc: make tipc node table aware of net namespace 2015-01-12 16:24:32 -05:00
netlink_compat.c tipc: fix an infoleak in tipc_nl_compat_link_dump 2016-06-02 21:32:37 -07:00
netlink.c tipc: move netlink policies to netlink.c 2016-03-07 14:56:41 -05:00
netlink.h tipc: move netlink policies to netlink.c 2016-03-07 14:56:41 -05:00
node.c tipc: eliminate risk of double link_up events 2016-05-12 17:11:27 -04:00
node.h tipc: redesign connection-level flow control 2016-05-03 15:51:16 -04:00
server.c tipc: block BH in TCP callbacks 2016-05-19 11:36:49 -07:00
server.h tipc: fix a race condition leading to subscriber refcnt bug 2016-04-14 16:46:46 -04:00
socket.c tipc: fix socket timer deadlock 2016-06-17 21:38:10 -07:00
socket.h tipc: redesign connection-level flow control 2016-05-03 15:51:16 -04:00
subscr.c tipc: remove an unnecessary NULL check 2016-04-28 16:54:12 -04:00
subscr.h tipc: remove struct tipc_name_seq from struct tipc_subscription 2016-02-06 03:40:43 -05:00
sysctl.c tipc: add name distributor resiliency queue 2014-09-01 17:51:48 -07:00
udp_media.c tipc: make sure IPv6 header fits in skb headroom 2016-03-14 12:23:12 -04:00