linux/drivers/block
Philipp Reisner b8853dbd8c drbd: fix race between disconnect and receive_state
If the asender thread, or request_timer_fn(), or some other part of
the code, decided to drop the connection (because of timeout or other),
but the receiver just now was processing a P_STATE packet, there was a
chance that receive_state() would do a hard state change
"re-establishing" an already failed connection without additional handshake.

Log excerpt:
  Remote failed to finish a request within ko-count * timeout
  peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
  asender terminated
  ...
  peer( Unknown -> Secondary ) conn( Timeout -> Connected ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 )
  ...
  Connection closed
  peer( Secondary -> Unknown ) conn( Connected -> Unconnected ) pdsk( UpToDate -> DUnknown ) peer_isp( 1 -> 0 )
  receiver terminated

Impact:
while the connection state is erroneously "Connected",
requests may be queued and even sent,
which would never be acknowledged,
and may have been missed by the cleanup.
These requests would never be completed.

The next drbd_suspend_io() will then lock up,
waiting forever for these requests to complete.

Fixed in several code paths:
  Make sure the connection state is NetworkFailure or worse
  before starting the cleanup in drbd_disconnect().
  This should make sure the cleanup won't miss any requests.

  Disallow receive_state() to "upgrade" the connection state
  from an error state. This will make sure the "illegal" state
  transition won't happen.

  For all connection failure states,
  relax the safe-guard in sanitize_state() again
  to silently mask out those state changes
  (e.g. Timeout -> Connected becomes Timeout -> Timeout).

 Note by Philipp Reisner:
  The 3rd chunk described as "relax the safe-guard..."
  is not there in 8.4 as it is relaxed to the maximum in
  8.4 already

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 16:58:12 +01:00
..
aoe drivers/block/aoe/Makefile: replace the use of <module>-objs with <module>-y 2011-01-19 08:25:02 -07:00
drbd drbd: fix race between disconnect and receive_state 2012-11-08 16:58:12 +01:00
paride block: fix mismerge of the DISK_EVENT_MEDIA_CHANGE removal 2011-06-02 05:29:19 +09:00
xen-blkback Merge branch 'for-3.1/drivers' of git://git.kernel.dk/linux-block 2011-07-25 10:38:18 -07:00
amiflop.c block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers 2011-04-21 21:33:05 +02:00
ataflop.c block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers 2011-04-21 21:33:05 +02:00
brd.c brd: export module parameters 2011-05-26 21:06:50 +02:00
cciss_cmd.h cciss: use new doorbell-bit-5 reset method 2011-05-06 08:23:55 -06:00
cciss_scsi.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
cciss_scsi.h cciss: add cciss_tape_cmds module paramter 2011-05-06 08:23:59 -06:00
cciss.c cciss: add transport mode attribute to sys 2011-08-08 11:40:17 +02:00
cciss.h cciss: Adds simple mode functionality 2011-08-08 11:40:15 +02:00
cpqarray.c block: remove per-queue plugging 2011-03-10 08:52:07 +01:00
cpqarray.h
cryptoloop.c
DAC960.c block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers 2011-04-21 21:33:05 +02:00
DAC960.h
floppy.c Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 2011-05-29 11:18:09 -07:00
hd.c i8253: Create linux/i8253.h and use it in all 8253 related files 2011-06-09 15:01:37 +02:00
ida_cmd.h
ida_ioctl.h
Kconfig xen/blkback: Flesh out the description in the Kconfig. 2011-05-12 17:55:40 -04:00
loop.c loop: always allow userspace partitions and optionally support automatic scanning 2011-08-23 20:12:04 +02:00
Makefile xen/blkback: Move it from drivers/xen to drivers/block 2011-04-18 14:30:26 -04:00
mg_disk.c block: switch s390 tape_block and mg_disk to elevator_change() 2010-08-23 14:02:44 +02:00
nbd.c nbd-replace-some-printk-with-dev_warn-and-dev_info-checkpatch-fixes 2011-08-19 14:48:28 +02:00
osdblk.c block: remove spurious uses of REQ_HARDBARRIER 2010-09-10 12:35:36 +02:00
pktcdvd.c kill useless checks for sb->s_op == NULL 2011-07-20 01:44:21 -04:00
ps3disk.c Merge branch 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block 2010-10-22 17:07:18 -07:00
ps3vram.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
rbd_types.h rbd: introduce rados block device (rbd), based on libceph 2010-10-20 15:38:13 -07:00
rbd.c rbd: set blk_queue request sizes to object size 2011-07-26 11:29:35 -07:00
smart1,2.h fix typos 'comamnd' -> 'command' in comments 2011-02-02 11:31:21 +01:00
sunvdc.c block: Consolidate phys_segment and hw_segment limits 2010-02-26 13:58:08 +01:00
swim3.c block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers 2011-04-21 21:33:05 +02:00
swim_asm.S m68k: mac - Add SWIM floppy support 2009-03-26 21:15:27 +01:00
swim.c block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers 2011-04-21 21:33:05 +02:00
sx8.c block: Consolidate phys_segment and hw_segment limits 2010-02-26 13:58:08 +01:00
ub.c block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers 2011-04-21 21:33:05 +02:00
umem.c Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/core 2011-03-10 08:58:35 +01:00
umem.h
viodasd.c Fix common misspellings 2011-03-31 11:26:23 -03:00
virtio_blk.c drivers, block: virtio_blk: Replace cryptic number with the macro 2011-05-30 11:14:13 +09:30
xd.c block: autoconvert trivial BKL users to private mutex 2010-10-05 15:01:10 +02:00
xd.h [PATCH] switch xd 2008-10-21 07:48:11 -04:00
xen-blkfront.c xen-blkfront: Introduce BLKIF_OP_FLUSH_DISKCACHE support. 2011-05-12 08:56:03 -04:00
xsysace.c dt: remove extra xsysace platform_driver registration 2011-07-14 05:33:52 -06:00
z2ram.c drivers/block/z2ram.c: correct printing of sector_t 2010-10-28 06:15:26 -06:00