2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2025-01-22 20:43:56 +08:00
linux-next/block
Gabriel Krisman Bertazi 36e1f3d107 blk-mq: Avoid memory reclaim when remapping queues
While stressing memory and IO at the same time we changed SMT settings,
we were able to consistently trigger deadlocks in the mm system, which
froze the entire machine.

I think that under memory stress conditions, the large allocations
performed by blk_mq_init_rq_map may trigger a reclaim, which stalls
waiting on the block layer remmaping completion, thus deadlocking the
system.  The trace below was collected after the machine stalled,
waiting for the hotplug event completion.

The simplest fix for this is to make allocations in this path
non-reclaimable, with GFP_NOIO.  With this patch, We couldn't hit the
issue anymore.

This should apply on top of Jens's for-next branch cleanly.

Changes since v1:
  - Use GFP_NOIO instead of GFP_NOWAIT.

 Call Trace:
[c000000f0160aaf0] [c000000f0160ab50] 0xc000000f0160ab50 (unreliable)
[c000000f0160acc0] [c000000000016624] __switch_to+0x2e4/0x430
[c000000f0160ad20] [c000000000b1a880] __schedule+0x310/0x9b0
[c000000f0160ae00] [c000000000b1af68] schedule+0x48/0xc0
[c000000f0160ae30] [c000000000b1b4b0] schedule_preempt_disabled+0x20/0x30
[c000000f0160ae50] [c000000000b1d4fc] __mutex_lock_slowpath+0xec/0x1f0
[c000000f0160aed0] [c000000000b1d678] mutex_lock+0x78/0xa0
[c000000f0160af00] [d000000019413cac] xfs_reclaim_inodes_ag+0x33c/0x380 [xfs]
[c000000f0160b0b0] [d000000019415164] xfs_reclaim_inodes_nr+0x54/0x70 [xfs]
[c000000f0160b0f0] [d0000000194297f8] xfs_fs_free_cached_objects+0x38/0x60 [xfs]
[c000000f0160b120] [c0000000003172c8] super_cache_scan+0x1f8/0x210
[c000000f0160b190] [c00000000026301c] shrink_slab.part.13+0x21c/0x4c0
[c000000f0160b2d0] [c000000000268088] shrink_zone+0x2d8/0x3c0
[c000000f0160b380] [c00000000026834c] do_try_to_free_pages+0x1dc/0x520
[c000000f0160b450] [c00000000026876c] try_to_free_pages+0xdc/0x250
[c000000f0160b4e0] [c000000000251978] __alloc_pages_nodemask+0x868/0x10d0
[c000000f0160b6f0] [c000000000567030] blk_mq_init_rq_map+0x160/0x380
[c000000f0160b7a0] [c00000000056758c] blk_mq_map_swqueue+0x33c/0x360
[c000000f0160b820] [c000000000567904] blk_mq_queue_reinit+0x64/0xb0
[c000000f0160b850] [c00000000056a16c] blk_mq_queue_reinit_notify+0x19c/0x250
[c000000f0160b8a0] [c0000000000f5d38] notifier_call_chain+0x98/0x100
[c000000f0160b8f0] [c0000000000c5fb0] __cpu_notify+0x70/0xe0
[c000000f0160b930] [c0000000000c63c4] notify_prepare+0x44/0xb0
[c000000f0160b9b0] [c0000000000c52f4] cpuhp_invoke_callback+0x84/0x250
[c000000f0160ba10] [c0000000000c570c] cpuhp_up_callbacks+0x5c/0x120
[c000000f0160ba60] [c0000000000c7cb8] _cpu_up+0xf8/0x1d0
[c000000f0160bac0] [c0000000000c7eb0] do_cpu_up+0x120/0x150
[c000000f0160bb40] [c0000000006fe024] cpu_subsys_online+0x64/0xe0
[c000000f0160bb90] [c0000000006f5124] device_online+0xb4/0x120
[c000000f0160bbd0] [c0000000006f5244] online_store+0xb4/0xc0
[c000000f0160bc20] [c0000000006f0a68] dev_attr_store+0x68/0xa0
[c000000f0160bc60] [c0000000003ccc30] sysfs_kf_write+0x80/0xb0
[c000000f0160bca0] [c0000000003cbabc] kernfs_fop_write+0x17c/0x250
[c000000f0160bcf0] [c00000000030fe6c] __vfs_write+0x6c/0x1e0
[c000000f0160bd90] [c000000000311490] vfs_write+0xd0/0x270
[c000000f0160bde0] [c0000000003131fc] SyS_write+0x6c/0x110
[c000000f0160be30] [c000000000009204] system_call+0x38/0xec

Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Douglas Miller <dougmill@linux.vnet.ibm.com>
Cc: linux-block@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-12-14 08:15:25 -07:00
..
partitions block: atari: Return early for unsupported sector size 2016-07-13 09:31:44 -07:00
badblocks.c badblocks: badblocks_set/clear update unacked_exist 2016-10-21 15:45:47 -06:00
bio-integrity.c block: remove bio_is_rw 2016-10-28 08:45:17 -06:00
bio.c block: improve handling of the magic discard payload 2016-12-09 08:30:51 -07:00
blk-cgroup.c block,blkcg: use __GFP_NOWARN for best-effort allocations in blkcg 2016-11-22 08:59:49 -07:00
blk-core.c Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata 2016-12-13 13:26:24 -08:00
blk-exec.c block: split out request-only flags into a new namespace 2016-10-28 08:45:17 -06:00
blk-flush.c Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block 2016-12-13 10:19:16 -08:00
blk-integrity.c block, libnvdimm, nvme: provide a built-in blk_integrity nop profile 2015-10-21 14:43:45 -06:00
blk-ioc.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
blk-lib.c block: improve handling of the magic discard payload 2016-12-09 08:30:51 -07:00
blk-map.c Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block 2016-12-13 10:19:16 -08:00
blk-merge.c block: improve handling of the magic discard payload 2016-12-09 08:30:51 -07:00
blk-mq-cpumap.c blk-mq: allow the driver to pass in a queue mapping 2016-09-15 08:42:03 -06:00
blk-mq-pci.c blk_mq: linux/blk-mq.h does not include all the headers it depends on 2016-09-19 08:21:51 -06:00
blk-mq-sysfs.c block: add scalable completion tracking of requests 2016-11-10 13:53:26 -07:00
blk-mq-tag.c Merge branch 'for-4.9/block-irq' of git://git.kernel.dk/linux-block 2016-10-09 17:29:33 -07:00
blk-mq-tag.h Merge branch 'for-4.9/block-irq' of git://git.kernel.dk/linux-block 2016-10-09 17:29:33 -07:00
blk-mq.c blk-mq: Avoid memory reclaim when remapping queues 2016-12-14 08:15:25 -07:00
blk-mq.h blk-mq: abstract out blk_mq_dispatch_rq_list() helper 2016-12-09 09:03:02 -07:00
blk-settings.c Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block 2016-12-13 10:19:16 -08:00
blk-softirq.c This adds a new gcc plugin named "latent_entropy". It is designed to 2016-10-15 10:03:15 -07:00
blk-stat.c blk-stat: fix a few cases of missing batch flushing 2016-12-09 13:08:35 -07:00
blk-stat.h block: add scalable completion tracking of requests 2016-11-10 13:53:26 -07:00
blk-sysfs.c Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block 2016-12-13 10:19:16 -08:00
blk-tag.c block: split out request-only flags into a new namespace 2016-10-28 08:45:17 -06:00
blk-throttle.c block: replace REQ_THROTTLED with a bio flag 2016-10-28 08:45:17 -06:00
blk-timeout.c block: remove REQ_NO_TIMEOUT flag 2015-12-22 09:38:34 -07:00
blk-wbt.c blk-wbt: don't throttle discard or write zeroes 2016-12-09 08:29:35 -07:00
blk-wbt.h blk-wbt: allow wbt to be enabled always through sysfs 2016-11-28 10:27:03 -07:00
blk-zoned.c block: zoned: fix harmless maybe-uninitialized warning 2016-10-24 20:51:22 -06:00
blk.h blk-mq: implement hybrid poll mode for sync O_DIRECT 2016-11-17 13:34:51 -07:00
bounce.c Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2015-09-19 18:57:09 -07:00
bsg-lib.c bsg: Add sparse annotations to bsg_request_fn() 2016-11-14 09:57:03 -07:00
bsg.c block: drop q argument from bsg_validate_sgv4_hdr 2016-11-03 07:56:14 -06:00
cfq-iosched.c blk-wbt: cleanup disable-by-default for CFQ 2016-11-28 10:27:03 -07:00
cmdline-parser.c block: remove unrelated header files and export symbol 2014-01-21 20:18:26 -08:00
compat_ioctl.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
deadline-iosched.c block: do not merge requests without consulting with io scheduler 2016-07-20 21:35:12 -06:00
elevator.c elevator: make the rqhash helpers exported 2016-12-09 09:03:02 -07:00
genhd.c block: fix bdi vs gendisk lifetime mismatch 2016-08-04 14:19:16 -06:00
ioctl.c blk-zoned: implement ioctls 2016-10-18 10:05:42 -06:00
ioprio.c block: fix use-after-free in sys_ioprio_get() 2016-07-01 08:39:24 -06:00
Kconfig block: hook up writeback throttling 2016-11-10 13:53:40 -07:00
Kconfig.iosched blkcg: make CONFIG_BLK_CGROUP bool 2012-03-06 21:27:21 +01:00
Makefile blk-wbt: add general throttling mechanism 2016-11-10 13:53:32 -07:00
noop-iosched.c elevator: use list_{first,prev,next}_entry 2015-11-16 15:21:48 -07:00
partition-generic.c block: Check partition alignment on zoned block devices 2016-12-01 07:56:53 -07:00
scsi_ioctl.c mm, page_alloc: rename __GFP_WAIT to __GFP_RECLAIM 2015-11-06 17:50:42 -08:00
t10-pi.c block: Consolidate static integrity profile properties 2015-10-21 14:42:38 -06:00