linux/drivers/infiniband/hw
Steve Wise c337374bf2 RDMA/cxgb4: Use completion objects for event blocking
There exists a race condition when using wait_queue_head_t objects
that are declared on the stack.  This was being done in a few places
where we are sending work requests to the FW and awaiting replies, but
we don't have an endpoint structure with an embedded c4iw_wr_wait
struct.  So the code was allocating it locally on the stack.  Bad
design.  The race is:

  1) thread on cpuX declares the wait_queue_head_t on the stack, then
     posts a firmware WR with that wait object ptr as the cookie to be
     returned in the WR reply.  This thread will proceed to block in
     wait_event_timeout() but before it does:

  2) An interrupt runs on cpuY with the WR reply.  fw6_msg() handles
     this and calls c4iw_wake_up().  c4iw_wake_up() sets the condition
     variable in the c4iw_wr_wait object to TRUE and will call
     wake_up(), but before it calls wake_up():

  3) The thread on cpuX calls c4iw_wait_for_reply(), which calls
     wait_event_timeout().  The wait_event_timeout() macro checks the
     condition variable and returns immediately since it is TRUE.  So
     this thread never blocks/sleeps.  The function then returns,
     effectively deallocating the c4iw_wr_wait object that was on the
     stack.

  4) So at this point cpuY has a pointer to the c4iw_wr_wait object
     that is no longer valid.  Further, it is pointing to a stack frame
     that might now be in use by some other context/thread.  So cpuY
     continues execution and calls wake_up() on a ptr to a wait object
     that has been effectively deallocated.

This race, when it hits, can cause a crash in wake_up(), which I've
seen under heavy stress. It can also corrupt the referenced stack
which can cause any number of failures.
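
The interleaving in steps 1-4 can be replayed in a small userspace
model.  This is only a sketch: it is single-threaded and runs the
steps in the problematic order on purpose, and every name in it
(wr_wait_model, live, replay_race, ...) is hypothetical and not taken
from the driver.  The "live" flag stands in for the lifetime of the
waiter's stack frame so that the stale access can be observed without
actually touching freed memory.

```c
#include <stdbool.h>

/* Hypothetical stand-in for c4iw_wr_wait (not the driver's struct). */
struct wr_wait_model {
    bool done;  /* the condition the waiter checks before sleeping */
    bool live;  /* true while the waiter's stack frame still exists */
};

/* Step 2: the reply handler sets the condition but has not woken anyone yet. */
static void waker_set_condition(struct wr_wait_model *w)
{
    w->done = true;
}

/*
 * Step 3: the waiter finds the condition already true, so it never sleeps
 * and returns.  Returning pops the stack frame; the model records that by
 * clearing "live".  The return value says whether the waiter slept.
 */
static bool waiter_check_and_return(struct wr_wait_model *w)
{
    bool slept = !w->done;

    w->live = false;
    return slept;
}

/* Step 4: the handler now issues its wake-up; report whether the object is stale. */
static bool waker_wake_up(const struct wr_wait_model *w)
{
    return !w->live;
}

/* Replays steps 2, 3, 4 in order; returns 1 if the wake-up hit a stale object. */
int replay_race(void)
{
    struct wr_wait_model w = { .done = false, .live = true };
    bool slept, stale;

    waker_set_condition(&w);             /* step 2 on cpuY */
    slept = waiter_check_and_return(&w); /* step 3 on cpuX */
    stale = waker_wake_up(&w);           /* step 4 on cpuY */

    return (!slept && stale) ? 1 : 0;
}
```

In this model replay_race() returns 1: the waiter never slept, and
the wake-up was issued against an object whose owner had already
returned.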

The fix:

Use struct completion, which supports on-stack declarations.
Completions use a spinlock around setting the condition to true and
the wake up so that steps 2 and 4 above are atomic and step 3 can
never happen in-between.
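
The locking idea can be sketched in userspace with pthreads.  This is
an analogy under stated assumptions, not the kernel implementation and
not the code in this patch: a mutex plays the role of the completion's
spinlock, and all names (completion_model, complete_model,
post_and_wait, ...) are hypothetical.  The point it shows is that the
flag update and the wake-up happen under one lock, and the waiter must
take that same lock before it can see the flag, so it cannot return
between the two.

```c
#include <pthread.h>
#include <stdbool.h>

/* Userspace model of a completion: flag and wake-up share one lock. */
struct completion_model {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            done;
};

static void completion_model_init(struct completion_model *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

/* Steps 2 and 4 as one critical section: set the flag and wake the waiter. */
static void complete_model(struct completion_model *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* The waiter only reads "done" while holding the lock the waker uses. */
static void wait_for_completion_model(struct completion_model *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Stands in for the reply handler; the cookie is the waiter's object. */
static void *reply_handler(void *cookie)
{
    complete_model(cookie);
    return NULL;
}

/* Declares the completion on the stack, "posts" a request, waits for the reply. */
int post_and_wait(void)
{
    struct completion_model c; /* on-stack, as in the fixed code paths */
    pthread_t t;

    completion_model_init(&c);
    if (pthread_create(&t, NULL, reply_handler, &c) != 0)
        return -1;

    wait_for_completion_model(&c);

    /* Reap the helper thread before tearing down the model's lock objects. */
    pthread_join(t, NULL);
    pthread_cond_destroy(&c.cond);
    pthread_mutex_destroy(&c.lock);
    return 0;
}
```

The kernel counterparts are init_completion(), complete() and the
wait_for_completion*() family; the model above only mirrors their
locking discipline.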

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
2011-05-24 09:47:38 -07:00
amso1100  Fix common misspellings                                 2011-03-31 11:26:23 -03:00
cxgb3     ipv4: Create and use route lookup helpers.              2011-03-12 15:08:42 -08:00
cxgb4     RDMA/cxgb4: Use completion objects for event blocking   2011-05-24 09:47:38 -07:00
ehca      RDMA: Use vzalloc() to replace vmalloc()+memset(0)      2011-01-12 11:11:58 -08:00
ipath     IB/ipath: Use pci_dev->revision, again                  2011-05-09 22:07:31 -07:00
mlx4      mlx4: generalization of multicast steering.             2011-03-23 12:24:21 -07:00
mthca     IB: Increase DMA max_segment_size on Mellanox hardware  2011-03-22 09:39:18 -07:00
nes       RDMA/iwcm: Get rid of enum iw_cm_event_status           2011-05-09 22:23:57 -07:00
qib       IB/qib: Use pci_dev->revision                           2011-05-12 08:57:12 -07:00