linux/drivers
Coly Li 3fd47bfe55 bcache: stop dc->writeback_rate_update properly
struct delayed_work writeback_rate_update in struct cache_dev is a delayed
worker to call function update_writeback_rate() in period (the interval is
defined by dc->writeback_rate_update_seconds).

When a metadate I/O error happens on cache device, bcache error handling
routine bch_cache_set_error() will call bch_cache_set_unregister() to
retire whole cache set. On the unregister code path, this delayed work is
stopped by calling cancel_delayed_work_sync(&dc->writeback_rate_update).

dc->writeback_rate_update is a special delayed work from others in bcache.
In its routine update_writeback_rate(), this delayed work is re-armed
itself. That means when cancel_delayed_work_sync() returns, this delayed
work can still be executed after several seconds defined by
dc->writeback_rate_update_seconds.

The problem is, after cancel_delayed_work_sync() returns, the cache set
unregister code path will continue and release memory of struct cache set.
Then the delayed work is scheduled to run, __update_writeback_rate()
will reference the already released cache_set memory, and trigger a NULL
pointer deference fault.

This patch introduces two more bcache device flags,
- BCACHE_DEV_WB_RUNNING
  bit set:  bcache device is in writeback mode and running, it is OK for
            dc->writeback_rate_update to re-arm itself.
  bit clear:bcache device is trying to stop dc->writeback_rate_update,
            this delayed work should not re-arm itself and quit.
- BCACHE_DEV_RATE_DW_RUNNING
  bit set:  routine update_writeback_rate() is executing.
  bit clear: routine update_writeback_rate() quits.

This patch also adds a function cancel_writeback_rate_update_dwork() to
wait for dc->writeback_rate_update quits before cancel it by calling
cancel_delayed_work_sync(). In order to avoid a deadlock by unexpected
quit dc->writeback_rate_update, after time_out seconds this function will
give up and continue to call cancel_delayed_work_sync().

And here I explain how this patch stops self re-armed delayed work properly
with the above stuffs.

update_writeback_rate() sets BCACHE_DEV_RATE_DW_RUNNING at its beginning
and clears BCACHE_DEV_RATE_DW_RUNNING at its end. Before calling
cancel_writeback_rate_update_dwork() clear flag BCACHE_DEV_WB_RUNNING.

Before calling cancel_delayed_work_sync() wait utill flag
BCACHE_DEV_RATE_DW_RUNNING is clear. So when calling
cancel_delayed_work_sync(), dc->writeback_rate_update must be already re-
armed, or quite by seeing BCACHE_DEV_WB_RUNNING cleared. In both cases
delayed work routine update_writeback_rate() won't be executed after
cancel_delayed_work_sync() returns.

Inside update_writeback_rate() before calling schedule_delayed_work(), flag
BCACHE_DEV_WB_RUNNING is checked before. If this flag is cleared, it means
someone is about to stop the delayed work. Because flag
BCACHE_DEV_RATE_DW_RUNNING is set already and cancel_delayed_work_sync()
has to wait for this flag to be cleared, we don't need to worry about race
condition here.

If update_writeback_rate() is scheduled to run after checking
BCACHE_DEV_RATE_DW_RUNNING and before calling cancel_delayed_work_sync()
in cancel_writeback_rate_update_dwork(), it is also safe. Because at this
moment BCACHE_DEV_WB_RUNNING is cleared with memory barrier. As I mentioned
previously, update_writeback_rate() will see BCACHE_DEV_WB_RUNNING is clear
and quit immediately.

Because there are more dependences inside update_writeback_rate() to struct
cache_set memory, dc->writeback_rate_update is not a simple self re-arm
delayed work. After trying many different methods (e.g. hold dc->count, or
use locks), this is the only way I can find which works to properly stop
dc->writeback_rate_update delayed work.

Changelog:
v3: change values of BCACHE_DEV_WB_RUNNING and BCACHE_DEV_RATE_DW_RUNNING
    to bit index, for test_bit().
v2: Try to fix the race issue which is pointed out by Junhui.
v1: The initial version for review

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Junhui Tang <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Cc: Michael Lyle <mlyle@lyle.org>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-18 20:15:20 -06:00
..
accessibility
acpi Merge branches 'acpi-ec', 'acpi-tables' and 'acpi-doc' 2018-02-15 12:02:42 +01:00
amba
android ANDROID: binder: synchronize_rcu() when using POLLFREE. 2018-02-16 11:16:38 +01:00
ata pci-v4.16-changes 2018-02-06 09:59:40 -08:00
atm atm: he: use 64-bit arithmetic instead of 32-bit 2018-02-08 15:05:16 -05:00
auxdisplay
base ACPI updates for v4.16-rc2 2018-02-15 14:50:32 -08:00
bcma
block block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into <linux/blkdev.h> 2018-03-17 14:45:23 -06:00
bluetooth vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
bus ARM: SoC driver updates for 4.16 2018-02-01 16:35:31 -08:00
cdrom cdrom: do not call check_disk_change() inside cdrom_open() 2018-03-09 08:06:35 -07:00
char Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-02-14 17:02:15 -08:00
clk MIPS changes for 4.16 2018-02-07 11:22:44 -08:00
clocksource
connector
cpufreq Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-02-14 17:02:15 -08:00
cpuidle powerpc updates for 4.16 2018-02-02 10:01:04 -08:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2018-02-22 16:38:10 -08:00
dax Merge branch 'for-4.16/dax' into libnvdimm-for-next 2018-02-03 00:26:10 -07:00
dca
devfreq
dio
dma Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm 2018-02-02 09:50:51 -08:00
dma-buf vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
edac Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-02-14 17:02:15 -08:00
eisa
extcon extcon: int3496: process id-pin first so that we start with the right status 2018-02-14 06:37:33 +09:00
firewire vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
firmware 2nd set of arm64 updates for 4.16: 2018-02-08 10:44:25 -08:00
fmc
fpga
fsi
gpio vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
gpu Merge branch 'linux-4.16' of git://github.com/skeggsb/linux into drm-fixes 2018-02-16 14:26:01 +10:00
hid usb: ldusb: add PIDs for new CASSY devices supported by this driver 2018-02-15 18:44:03 +01:00
hsi vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
hv vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
hwmon Fix bad temperature display on Ryzen/Threadripper 2018-02-15 14:31:28 -08:00
hwspinlock
hwtracing Char/Misc driver patches for 4.16-rc1 2018-02-01 10:31:17 -08:00
i2c Merge branch 'i2c/for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2018-02-04 10:57:43 -08:00
ide block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into <linux/blkdev.h> 2018-03-17 14:45:23 -06:00
idle
iio First round of IIO fixes for the 4.16 cycle. 2018-02-20 10:03:22 +01:00
infiniband RDMA/uverbs: Fix kernel panic while using XRC_TGT QP type 2018-02-21 13:52:19 -05:00
input vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
iommu IOMMU Updates for Linux v4.16 2018-02-08 12:03:54 -08:00
ipack
irqchip irqchip/bcm: Remove hashed address printing 2018-02-16 14:22:16 +00:00
isdn vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
leds vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
lightnvm block: Use blk_queue_flag_*() in drivers instead of queue_flag_*() 2018-03-08 14:13:48 -07:00
macintosh powerpc/macio: set a proper dma_coherent_mask 2018-02-13 08:58:53 -08:00
mailbox vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
mcb
md bcache: stop dc->writeback_rate_update properly 2018-03-18 20:15:20 -06:00
media vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
memory ARM: SoC driver updates for 4.16 2018-02-01 16:35:31 -08:00
memstick
message
mfd vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
misc misc: rtsx: rename SG_END macro 2018-03-01 08:33:05 -07:00
mmc block: Use blk_queue_flag_*() in drivers instead of queue_flag_*() 2018-03-08 14:13:48 -07:00
mtd block: Use blk_queue_flag_*() in drivers instead of queue_flag_*() 2018-03-08 14:13:48 -07:00
mux Char/Misc driver patches for 4.16-rc1 2018-02-01 10:31:17 -08:00
net tg3: APE heartbeat changes 2018-02-19 14:16:52 -05:00
nfc
ntb NTB: ntb_perf: fix cast to restricted __le32 2018-01-28 22:17:24 -05:00
nubus
nvdimm block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into <linux/blkdev.h> 2018-03-17 14:45:23 -06:00
nvme block: Use blk_queue_flag_*() in drivers instead of queue_flag_*() 2018-03-08 14:13:48 -07:00
nvmem
of device property: Constify device_get_match_data() 2018-02-12 10:41:11 +01:00
opp opp: cpu: Replace GFP_ATOMIC with GFP_KERNEL in dev_pm_opp_init_cpufreq_table 2018-02-12 15:07:46 +05:30
oprofile
parisc
parport
pci PCI/cxgb4: Extend T3 PCI quirk to T4+ devices 2018-02-16 15:41:53 -05:00
pcmcia Merge branch 'pcmcia' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia 2018-02-08 11:48:49 -08:00
perf bitmap: replace bitmap_{from,to}_u32array 2018-02-06 18:32:44 -08:00
phy USB/PHY updates for 4.16-rc1 2018-02-01 09:40:49 -08:00
pinctrl This is the bulk of pin control changes for the v4.16 kernel cycle: 2018-02-02 14:22:53 -08:00
platform platform/x86: dell-laptop: Removed duplicates in DMI whitelist 2018-02-15 12:18:33 +02:00
pnp
power power supply and reset changes for the v4.16 series 2018-01-31 12:55:31 -08:00
powercap
pps vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
ps3
ptp vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
pwm
rapidio vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
ras mm/memory_failure: Remove unused trapno from memory_failure 2018-01-23 12:17:42 -06:00
regulator regulator: Fix suspend to idle 2018-01-30 12:25:59 +00:00
remoteproc remoteproc updates for v4.16 2018-02-05 10:07:40 -08:00
reset
rpmsg vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
rtc vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
s390 bsg-lib: introduce a timeout field in struct bsg_job 2018-03-13 11:40:21 -06:00
sbus pci-v4.16-changes 2018-02-06 09:59:40 -08:00
scsi block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into <linux/blkdev.h> 2018-03-17 14:45:23 -06:00
sfi
sh cpufreq: Add and use cpufreq_for_each_{valid_,}entry_idx() 2018-02-08 10:21:39 +01:00
siox
slimbus
sn
soc ARM: SoC driver updates for 4.16 2018-02-01 16:35:31 -08:00
soundwire
spi Merge remote-tracking branch 'spi/topic/xilinx' into spi-next 2018-01-26 17:57:34 +00:00
spmi
ssb Merge git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git 2018-02-01 10:37:39 +02:00
staging staging: rts5208: rename SG_END macro 2018-03-01 08:33:06 -07:00
target target/tcm_loop: Use blk_queue_flag_set() 2018-03-08 14:13:48 -07:00
tc
tee
thermal Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux 2018-02-06 15:04:58 -08:00
thunderbolt
tty vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
uio vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
usb USB fixes for 4.16-rc3 2018-02-22 12:13:01 -08:00
uwb
vfio vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
vhost vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
video Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-02-14 17:02:15 -08:00
virt vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
virtio virtio_pci: don't kfree device on register failure 2018-02-01 16:26:45 +02:00
visorbus
vlynq
vme
w1 Documentation updates for 4.16. New stuff includes refcount_t 2018-01-31 19:25:25 -08:00
watchdog linux-watchdog 4.16-rc1 merge window tag 2018-02-07 11:54:34 -08:00
xen mm, swap, frontswap: fix THP swap if frontswap enabled 2018-02-21 15:35:43 -08:00
zorro
Kconfig Char/Misc driver patches for 4.16-rc1 2018-02-01 10:31:17 -08:00
Makefile pci-v4.16-changes 2018-02-06 09:59:40 -08:00