linux/drivers/block/drbd
Philipp Reisner b8853dbd8c drbd: fix race between disconnect and receive_state
If the asender thread, request_timer_fn(), or some other part of
the code decided to drop the connection (because of a timeout or for
some other reason) while the receiver was processing a P_STATE packet,
there was a chance that receive_state() would do a hard state change,
"re-establishing" an already failed connection without any additional handshake.

Log excerpt:
  Remote failed to finish a request within ko-count * timeout
  peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
  asender terminated
  ...
  peer( Unknown -> Secondary ) conn( Timeout -> Connected ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 )
  ...
  Connection closed
  peer( Secondary -> Unknown ) conn( Connected -> Unconnected ) pdsk( UpToDate -> DUnknown ) peer_isp( 1 -> 0 )
  receiver terminated

Impact:
While the connection state is erroneously "Connected",
requests may be queued and even sent.
These requests would never be acknowledged,
and the cleanup may have missed them,
so they would never be completed.

The next drbd_suspend_io() will then lock up,
waiting forever for these requests to complete.

Fixed in several code paths:
  Make sure the connection state is NetworkFailure or worse
  before starting the cleanup in drbd_disconnect().
  This should make sure the cleanup won't miss any requests.

  Do not allow receive_state() to "upgrade" the connection state
  from an error state. This makes sure the "illegal" state
  transition cannot happen.

  For all connection failure states,
  relax the safe-guard in sanitize_state() again
  to silently mask out those state changes
  (e.g. Timeout -> Connected becomes Timeout -> Timeout).
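
The three changes above can be pictured with a small stand-alone model.
This is only a sketch, not the driver code: the enum, its ordering, and the
helper names state_before_cleanup(), peer_state_may_apply() and
sanitize_conn() are hypothetical and chosen for illustration. It assumes that
the failure states form one contiguous range that sorts below the
"connected" states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified connection states. The ordering is an
 * assumption of this sketch: failure states sit in one contiguous
 * range, below the states that represent a working connection. */
enum conn_state {
	CS_STANDALONE,
	CS_DISCONNECTING,
	CS_UNCONNECTED,
	CS_TIMEOUT,          /* first failure state */
	CS_BROKEN_PIPE,
	CS_NETWORK_FAILURE,
	CS_PROTOCOL_ERROR,
	CS_TEAR_DOWN,        /* last failure state */
	CS_WF_CONNECTION,
	CS_WF_REPORT_PARAMS,
	CS_CONNECTED
};

static bool conn_is_failure(enum conn_state s)
{
	return s >= CS_TIMEOUT && s <= CS_TEAR_DOWN;
}

/* Change 1 (modelled): before the disconnect cleanup starts, make sure
 * the state is a failure state or lower. A state above the failure
 * range is forced down to NetworkFailure; anything else is kept. */
static enum conn_state state_before_cleanup(enum conn_state cur)
{
	return cur > CS_TEAR_DOWN ? CS_NETWORK_FAILURE : cur;
}

/* Change 2 (modelled): a state packet from the peer may only be applied
 * while the local connection state is above the failure range. */
static bool peer_state_may_apply(enum conn_state cur)
{
	return cur > CS_TEAR_DOWN;
}

/* Change 3 (modelled): silently mask out an "upgrade" that starts from
 * a failure state, so Timeout -> Connected becomes Timeout -> Timeout.
 * Moving down (e.g. Timeout -> Unconnected) stays possible. */
static enum conn_state sanitize_conn(enum conn_state os, enum conn_state ns)
{
	assert(os <= CS_CONNECTED && ns <= CS_CONNECTED);
	if (conn_is_failure(os) && ns > CS_TEAR_DOWN)
		return os;
	return ns;
}
```

In this model, sanitize_conn(CS_TIMEOUT, CS_CONNECTED) yields CS_TIMEOUT,
while sanitize_conn(CS_TIMEOUT, CS_UNCONNECTED) still yields CS_UNCONNECTED,
so a failed connection can be torn down but never revived without a new handshake.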

 Note by Philipp Reisner:
  The third chunk, described as "relax the safe-guard...",
  is not present in 8.4, because the safe-guard is already
  relaxed to the maximum there.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 16:58:12 +01:00
drbd_actlog.c drbd: fix potential spinlock deadlock 2012-11-08 16:58:09 +01:00
drbd_bitmap.c drbd: fix bitmap writeout after aborted resync 2012-11-08 16:58:04 +01:00
drbd_int.h drbd: Load balancing of read requests 2012-11-08 16:58:10 +01:00
drbd_interval.c drbd: Iterate over all overlapping intervals in a tree 2011-10-14 16:47:37 +02:00
drbd_interval.h drbd: Iterate over all overlapping intervals in a tree 2011-10-14 16:47:37 +02:00
drbd_main.c drbd: Move list of epochs from mdev to tconn 2012-11-08 16:58:08 +01:00
drbd_nl.c drbd: Move write_ordering from mdev to tconn 2012-11-08 16:58:07 +01:00
drbd_nla.c drbd: Split off netlink mandatory attribute handling into separate file 2012-11-08 16:57:45 +01:00
drbd_nla.h drbd: Split off netlink mandatory attribute handling into separate file 2012-11-08 16:57:45 +01:00
drbd_proc.c drbd: Move list of epochs from mdev to tconn 2012-11-08 16:58:08 +01:00
drbd_receiver.c drbd: fix race between disconnect and receive_state 2012-11-08 16:58:12 +01:00
drbd_req.c drbd: Do not call generic_make_request() while holding req_lock 2012-11-08 16:58:11 +01:00
drbd_req.h drbd: Get rid of MR_{READ,WRITE}_SHIFT 2012-11-08 16:58:00 +01:00
drbd_state.c drbd: Fixes from the drbd-8.3 branch 2012-11-08 16:58:06 +01:00
drbd_state.h drbd: Improved logging of state changes 2012-11-08 16:45:06 +01:00
drbd_strings.c drbd: Allow volumes to become primary only on one side 2012-11-04 00:16:31 +01:00
drbd_vli.h Fix common misspellings 2011-03-31 11:26:23 -03:00
drbd_worker.c drbd: Fixed an obvious copy-n-paste mistake 2012-11-08 16:58:06 +01:00
drbd_wrappers.h drbd: Split off netlink mandatory attribute handling into separate file 2012-11-08 16:57:45 +01:00
Kconfig drbd: Kconfig fix 2009-12-29 17:38:28 +01:00
Makefile drbd: Split off netlink mandatory attribute handling into separate file 2012-11-08 16:57:45 +01:00