linux/drivers
Coly Li 41fe8d088e bcache: avoid oversized read request in cache missing code path
In the cache missing code path of a cached device, if a proper location
from the internal B+ tree is matched for a cache miss range, function
cached_dev_cache_miss() will be called in cache_lookup_fn() in the
following code block,
[code block 1]
        unsigned int sectors = KEY_INODE(k) == s->iop.inode
                ? min_t(uint64_t, INT_MAX,
                        KEY_START(k) - bio->bi_iter.bi_sector)
                : INT_MAX;
        int ret = s->d->cache_miss(b, s, bio, sectors);

Here s->d->cache_miss() is the callback function pointer initialized as
cached_dev_cache_miss(). The last parameter 'sectors' is an important
hint for calculating the size of the read request sent to the backing
device for the missing cache data.
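
The calculation in code block 1 can be modelled in userspace as the
following minimal sketch. The function and parameter names are
hypothetical stand-ins for the kernel fields, and the precondition that
a key of the same inode starts beyond the bio position is an assumption
about the calling context, not something stated above.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Userspace model of the 'sectors' hint from code block 1.
 * Names are illustrative only, they are not kernel symbols.
 */
static unsigned int sketch_miss_sectors(uint64_t key_inode,
                                        uint64_t lookup_inode,
                                        uint64_t key_start,
                                        uint64_t bi_sector)
{
        uint64_t gap;

        /* A key from another inode gives no bound other than INT_MAX. */
        if (key_inode != lookup_inode)
                return INT_MAX;

        /* Assumed precondition: the matched key starts past the bio. */
        assert(key_start > bi_sector);

        /* Distance from the bio position to the start of the key. */
        gap = key_start - bi_sector;
        return gap < INT_MAX ? (unsigned int)gap : INT_MAX;
}
```

The only upper bound applied here is INT_MAX, so a gap of 131072
sectors is passed on unchanged as the hint.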

The current calculation in the above code block may generate an
oversized value of 'sectors', which may consequently trigger two
different kernel panics by BUG() or BUG_ON(), as listed below,

1) BUG_ON() inside bch_btree_insert_key(),
[code block 2]
        BUG_ON(b->ops->is_extents && !KEY_SIZE(k));
2) BUG() inside biovec_slab(),
[code block 3]
        default:
                BUG();
                return NULL;

Both of the above panics originate from cached_dev_cache_miss(), caused
by the oversized parameter 'sectors'.

Inside cached_dev_cache_miss(), parameter 'sectors' is used to calculate
the size of the data read from the backing device for the cache miss.
This size is stored in s->insert_bio_sectors by the following line of
code,
[code block 4]
        s->insert_bio_sectors = min(sectors, bio_sectors(bio) + reada);

Then the actual key to be inserted into the internal B+ tree is
generated and stored in s->iop.replace_key by the following lines of
code,
[code block 5]
        s->iop.replace_key = KEY(s->iop.inode,
                         bio->bi_iter.bi_sector + s->insert_bio_sectors,
                         s->insert_bio_sectors);
The oversized parameter 'sectors' may trigger panic 1) by BUG_ON() from
the above code block.

The bio sent to the backing device for the missing data is allocated
with a hint from s->insert_bio_sectors by the following lines of code,
[code block 6]
        cache_bio = bio_alloc_bioset(GFP_NOWAIT,
                     DIV_ROUND_UP(s->insert_bio_sectors, PAGE_SECTORS),
                     &dc->disk.bio_split);
The oversized parameter 'sectors' may trigger panic 2) by BUG() from the
above code block.

Now let me explain how the panics happen with the oversized 'sectors'.
In code block 5, replace_key is generated by macro KEY(). From the
definition of macro KEY(),
[code block 7]
  #define KEY(inode, offset, size)                                  \
  ((struct bkey) {                                                  \
       .high = (1ULL << 63) | ((__u64) (size) << 20) | (inode),     \
       .low = (offset)                                              \
  })

Here 'size' is a 16-bit field embedded in the 64-bit member 'high' of
struct bkey. But in code block 1, "KEY_START(k) - bio->bi_iter.bi_sector"
can easily be larger than (1<<16) - 1, which makes the bkey size
calculation in code block 5 overflow. In one bug report the value of
parameter 'sectors' is 131072 (= 1 << 17). The oversized 'sectors'
results in an oversized s->insert_bio_sectors in code block 4, which
then makes the size field of s->iop.replace_key 0 in code block 5. The
0-sized s->iop.replace_key is then inserted into the internal B+ tree as
the cache missing check key (a special key to detect and avoid a race
between a normal write request and a cache missing read request) as,
[code block 8]
        ret = bch_btree_insert_check_key(b, &s->op, &s->iop.replace_key);

Then the 0-sized s->iop.replace_key as 3rd parameter triggers the bkey
size check BUG_ON() in code block 2, and causes the kernel panic 1).
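
The truncation can be reproduced in userspace with the minimal sketch
below. The sketch_ names are hypothetical; the layout follows the KEY()
definition in code block 7, and the size accessor assumes that the size
field occupies exactly the 16 bits starting at bit 20, as described
above.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_KEY_SIZE_BITS    16      /* width of the size field */
#define SKETCH_KEY_SIZE_SHIFT   20      /* shift used by KEY() */

struct sketch_bkey {
        uint64_t high;
        uint64_t low;
};

/* Mirrors the layout of KEY() from code block 7. */
static struct sketch_bkey sketch_key(uint64_t inode, uint64_t offset,
                                     uint64_t size)
{
        struct sketch_bkey k = {
                .high = (1ULL << 63) | (size << SKETCH_KEY_SIZE_SHIFT) | inode,
                .low = offset,
        };
        return k;
}

/* Reads back only the 16 bits that belong to the size field. */
static unsigned int sketch_key_size(const struct sketch_bkey *k)
{
        return (unsigned int)((k->high >> SKETCH_KEY_SIZE_SHIFT) &
                              ((1ULL << SKETCH_KEY_SIZE_BITS) - 1));
}
```

For size = 131072 the shifted value lands entirely above the 16-bit
field, so sketch_key_size() reads back 0, which is the condition checked
by the BUG_ON() in code block 2.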

The other kernel panic, from code block 6, is caused by an oversized
bvecs number derived from the value of s->insert_bio_sectors in code
block 4,
        min(sectors, bio_sectors(bio) + reada)
There are two possibilities for an oversized result,
- bio_sectors(bio) is valid, but bio_sectors(bio) + reada is oversized.
- sectors < bio_sectors(bio) + reada, but sectors is oversized.

From a bug report, the result of "DIV_ROUND_UP(s->insert_bio_sectors,
PAGE_SECTORS)" from code block 6 can be 344, 282, 946, 342 and many
other values which are larger than BIO_MAX_VECS (a.k.a. 256). When
calling bio_alloc_bioset() with such a larger-than-256 value as the 2nd
parameter, this value will eventually be sent to biovec_slab() as
parameter 'nr_vecs' in the following code path,
   bio_alloc_bioset() ==> bvec_alloc() ==> biovec_slab()
Because parameter 'nr_vecs' is a larger-than-256 value, the panic by
BUG() in code block 3 is triggered inside biovec_slab().
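
The bvecs arithmetic can be checked with the following minimal sketch.
The sketch_ names are hypothetical, and the value 8 for PAGE_SECTORS is
an assumption (4 KiB pages with 512-byte sectors); 256 is the
BIO_MAX_VECS value quoted above.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SECTORS     8u      /* assumes 4 KiB pages, 512B sectors */
#define SKETCH_BIO_MAX_VECS     256u    /* value quoted in the text */

/* Same rounding as DIV_ROUND_UP(s->insert_bio_sectors, PAGE_SECTORS). */
static unsigned int sketch_nr_vecs(unsigned int insert_bio_sectors)
{
        return (insert_bio_sectors + SKETCH_PAGE_SECTORS - 1) /
               SKETCH_PAGE_SECTORS;
}

/* True when the request can be allocated without exceeding the limit. */
static bool sketch_vecs_fit(unsigned int insert_bio_sectors)
{
        return sketch_nr_vecs(insert_bio_sectors) <= SKETCH_BIO_MAX_VECS;
}
```

Under these assumptions any s->insert_bio_sectors above 2048 needs more
than 256 bvecs; for example 2752 sectors round up to 344 bvecs, one of
the values seen in the bug report.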

From the above analysis, we know that the 4th parameter 'sectors' sent
into cached_dev_cache_miss() may cause overflow in code blocks 5 and 6,
and finally cause kernel panics in code blocks 2 and 3. And if the
result of bio_sectors(bio) + reada exceeds the valid bvecs number, it
may also trigger the kernel panic in code block 3 from code block 6.

Now that the almost-useless readahead size for cache missing requests to
the backing device has been removed, this patch can fix the oversized
issue with a simpler method.
- add a local variable size_limit, set to the minimum of the max bkey
  size and the max bio bvecs number.
- set s->insert_bio_sectors to the minimum of size_limit, sectors, and
  the size of bio in sectors.
- replace sectors by s->insert_bio_sectors to do bio_next_split.

By the above method with size_limit, s->insert_bio_sectors will never
result in an oversized replace_key size or bio bvecs number. And the
split bio 'miss' from bio_next_split() will always match the size of
'cache_bio', which is the current maximum bio size we can send to the
backing device for fetching the cache missing data.
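
The clamping described above can be sketched in userspace as follows.
This is not the patch itself: the sketch_ names are hypothetical, the
bvecs limit is converted to sectors under the assumption of one page per
bvec with 8 sectors per page, and the max bkey size is taken as the
largest value that fits the 16-bit size field.

```c
#include <assert.h>

#define SKETCH_PAGE_SECTORS     8u      /* assumes 4 KiB pages, 512B sectors */
#define SKETCH_BIO_MAX_VECS     256u    /* value quoted in the text */
#define SKETCH_KEY_SIZE_BITS    16      /* width of the bkey size field */

static unsigned int sketch_min(unsigned int a, unsigned int b)
{
        return a < b ? a : b;
}

/* Models how s->insert_bio_sectors is bounded once size_limit is applied. */
static unsigned int sketch_insert_bio_sectors(unsigned int sectors,
                                              unsigned int bio_sectors)
{
        /* Largest size representable in the bkey size field. */
        unsigned int max_key_sectors = (1u << SKETCH_KEY_SIZE_BITS) - 1;
        /* Largest request that still fits the bvecs limit (assumed). */
        unsigned int max_bio_sectors = SKETCH_BIO_MAX_VECS * SKETCH_PAGE_SECTORS;
        unsigned int size_limit = sketch_min(max_bio_sectors, max_key_sectors);
        unsigned int result = sketch_min(size_limit,
                                         sketch_min(sectors, bio_sectors));

        /* The result can exceed neither limit by construction. */
        assert(result <= size_limit);
        return result;
}
```

With the reported hint of 131072 sectors and a 4096-sector bio, the
sketch yields 2048 sectors, which fits both the 16-bit size field and
the 256 bvecs limit under the stated assumptions.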

Parts of the current problematic code can be found since Linux
v3.13-rc1, therefore all maintained stable kernels should try to apply
this fix.

Reported-by: Alexander Ullrich <ealex1979@gmail.com>
Reported-by: Diego Ercolani <diego.ercolani@gmail.com>
Reported-by: Jan Szubiak <jan.szubiak@linuxpolska.pl>
Reported-by: Marco Rebhan <me@dblsaiko.net>
Reported-by: Matthias Ferdinand <bcache@mfedv.net>
Reported-by: Victor Westerhuis <victor@westerhu.is>
Reported-by: Vojtech Pavlik <vojtech@suse.cz>
Reported-and-tested-by: Rolf Fokkens <rolf@rolffokkens.nl>
Reported-and-tested-by: Thorsten Knabe <linux@thorsten-knabe.de>
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Nix <nix@esperi.org.uk>
Cc: Takashi Iwai <tiwai@suse.com>
Link: https://lore.kernel.org/r/20210607125052.21277-3-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-08 15:06:03 -06:00
accessibility TTY/Serial driver updates for 5.13-rc1 2021-04-26 11:20:10 -07:00
acpi CFI on arm64 series for v5.13-rc1 2021-04-27 10:16:46 -07:00
amba
android selinux/stable-5.13 PR 20210426 2021-04-27 13:42:11 -07:00
ata SCSI misc on 20210428 2021-04-28 17:22:10 -07:00
atm The usual updates from the irq departement: 2021-04-26 09:43:16 -07:00
auxdisplay
base regmap: Updates for v5.13 2021-04-26 16:21:16 -07:00
bcma
block nbd: share nbd_put and return by goto put_nbd 2021-05-12 08:42:43 -06:00
bluetooth TTY/Serial driver updates for 5.13-rc1 2021-04-26 11:20:10 -07:00
bus ARM: SoC drivers for v5.13 2021-04-26 12:11:52 -07:00
cdrom gdrom: fix compilation error 2021-04-11 19:32:06 -06:00
char A bunch of little cleanups 2021-04-28 15:54:57 -07:00
clk Here's a collection of largely clk driver updates for the merge window. The 2021-04-28 17:13:56 -07:00
clocksource ARM: platform support for Apple M1 2021-04-26 12:30:36 -07:00
comedi staging: comedi: move out of staging directory 2021-04-15 09:26:25 +02:00
connector
counter
cpufreq Power management updates for 5.13-rc1 2021-04-26 15:10:25 -07:00
cpuidle Merge back earlier cpuidle updates for v5.13. 2021-04-08 20:05:49 +02:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2021-04-26 08:51:23 -07:00
cxl cxl/mem: Fix memory device capacity probing 2021-04-16 18:21:56 -07:00
dax
dca
devfreq PM / devfreq: imx8m-ddrc: Remove unneeded of_match_ptr() 2021-04-08 13:14:51 +09:00
dio
dma ARM: SoC drivers for v5.13 2021-04-26 12:11:52 -07:00
dma-buf drm/syncobj: use newly allocated stub fences 2021-04-08 12:21:13 +02:00
edac
eisa
extcon - Core Frameworks 2021-04-28 15:59:13 -07:00
firewire The usual updates from the irq departement: 2021-04-26 09:43:16 -07:00
firmware - removed get_fs/set_fs 2021-04-29 11:28:08 -07:00
fpga ARM: SoC drivers for v5.13 2021-04-26 12:11:52 -07:00
fsi
gnss
gpio - Core Frameworks 2021-04-28 15:59:13 -07:00
gpu VFIO updates for v5.13-rc1 2021-04-28 17:19:47 -07:00
greybus greybus: es2: fix kernel-doc warnings 2021-04-16 07:26:50 +02:00
hid
hsi HSI: core: fix resource leaks in hsi_add_client_from_dt() 2021-04-16 00:14:49 +02:00
hv printk changes for 5.13 2021-04-27 18:09:44 -07:00
hwmon ACPI updates for 5.13-rc1 2021-04-26 15:03:23 -07:00
hwspinlock
hwtracing coresight: etm-perf: Fix define build issue when built as module 2021-04-16 09:34:57 +02:00
i2c - Core Frameworks 2021-04-28 15:59:13 -07:00
i3c
ide
idle intel_idle: add Iclelake-D support 2021-04-08 19:18:07 +02:00
iio spi: Updates for v5.13 2021-04-26 16:32:11 -07:00
infiniband RDMA/rtrs: fix uninitialized symbol 'cnt' 2021-05-03 11:00:11 -06:00
input - Core Frameworks 2021-04-28 15:59:13 -07:00
interconnect CFI on arm64 series for v5.13-rc1 2021-04-27 10:16:46 -07:00
iommu
ipack
irqchip - removed get_fs/set_fs 2021-04-29 11:28:08 -07:00
isdn
leds treewide: change my e-mail address, fix my name 2021-04-09 14:54:23 -07:00
lightnvm lightnvm: deprecated OCSSD support and schedule it for removal in Linux 5.15 2021-04-13 09:16:12 -06:00
macintosh
mailbox - qcom: enable support for SM8350 and SC7280 2021-04-28 16:10:33 -07:00
mcb
md bcache: avoid oversized read request in cache missing code path 2021-06-08 15:06:03 -06:00
media drm for 5.13-rc1 2021-04-28 10:01:40 -07:00
memory Power management updates for 5.13-rc1 2021-04-26 15:10:25 -07:00
memstick memstick: r592: ignore kfifo_out() return code again 2021-04-26 11:08:23 +02:00
message scsi: message: fusion: Remove unused local variable 'vtarget' 2021-04-13 01:39:12 -04:00
mfd - Core Frameworks 2021-04-28 15:59:13 -07:00
misc CFI on arm64 series for v5.13-rc1 2021-04-27 10:16:46 -07:00
mmc MMC core: 2021-04-28 15:56:51 -07:00
most Staging/IIO driver updates for 5.13-rc1 2021-04-26 11:14:21 -07:00
mtd printk changes for 5.13 2021-04-27 18:09:44 -07:00
mux
net Locking changes for this cycle were: 2021-04-28 12:37:53 -07:00
nfc
ntb
nubus
nvdimm libnvdimm/region: Fix nvdimm_has_flush() to handle ND_REGION_ASYNC 2021-04-09 21:56:01 -07:00
nvme nvmet: fix freeing unallocated p2pmem 2021-06-02 10:10:38 +03:00
nvmem
of Devicetree updates for v5.13: 2021-04-28 15:50:24 -07:00
opp
parisc
parport
pci s390 updates for 5.13 merge window 2021-04-27 17:54:15 -07:00
pcmcia
perf
phy phy: Revert "phy: ti: j721e-wiz: add missing of_node_put" 2021-04-16 07:27:37 +02:00
pinctrl ARM: SoC drivers for v5.13 2021-04-26 12:11:52 -07:00
platform USB/Thunderbolt patches for 5.13-rc1 2021-04-26 11:32:23 -07:00
pnp
power power supply and reset changes for the v5.13 series 2021-04-28 15:43:58 -07:00
powercap
pps TTY/Serial driver updates for 5.13-rc1 2021-04-26 11:20:10 -07:00
ps3
ptp
pwm - Core Frameworks 2021-04-28 15:59:13 -07:00
rapidio
ras
regulator - Core Frameworks 2021-04-28 15:59:13 -07:00
remoteproc
reset ARM SCMI updates for v5.13 2021-04-08 17:38:20 +02:00
rpmsg
rtc - Core Frameworks 2021-04-28 15:59:13 -07:00
s390 s390/dasd: add missing discipline function 2021-05-25 12:54:00 -06:00
sbus
scsi SCSI misc on 20210428 2021-04-28 17:22:10 -07:00
sh The usual updates from the irq departement: 2021-04-26 09:43:16 -07:00
siox
slimbus
soc ARM: SoC drivers for v5.13 2021-04-26 12:11:52 -07:00
soundwire
spi CFI on arm64 series for v5.13-rc1 2021-04-27 10:16:46 -07:00
spmi
ssb
staging Here's a collection of largely clk driver updates for the merge window. The 2021-04-28 17:13:56 -07:00
target SCSI misc on 20210428 2021-04-28 17:22:10 -07:00
tc
tee ARM: SoC drivers for v5.13 2021-04-26 12:11:52 -07:00
thermal
thunderbolt thunderbolt: Changes for v5.13 merge window 2021-04-13 12:17:14 +02:00
tty Power management updates for 5.13-rc1 2021-04-26 15:10:25 -07:00
uio
usb SCSI misc on 20210428 2021-04-28 17:22:10 -07:00
vdpa vdpa/mlx5: Set err = -ENOMEM in case dma_map_sg_attrs fails 2021-04-22 18:15:31 -04:00
vfio VFIO updates for v5.13-rc1 2021-04-28 17:19:47 -07:00
vhost SCSI misc on 20210428 2021-04-28 17:22:10 -07:00
video - New Device Support 2021-04-28 16:02:58 -07:00
virt
virtio
visorbus
vlynq
vme
w1 w1: ds28e17: Use module_w1_family to simplify the code 2021-04-10 10:58:21 +02:00
watchdog - Core Frameworks 2021-04-28 15:59:13 -07:00
xen SCSI misc on 20210428 2021-04-28 17:22:10 -07:00
zorro
Kconfig staging: comedi: move out of staging directory 2021-04-15 09:26:25 +02:00
Makefile staging: comedi: move out of staging directory 2021-04-15 09:26:25 +02:00