linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-11 20:48:49 +08:00

Author	SHA1	Message	Date
Linus Torvalds	3689f9f8b0	bitmap patches for 5.17-rc1 -----BEGIN PGP SIGNATURE----- iQHJBAABCgAzFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmHi+xgVHHl1cnkubm9y b3ZAZ21haWwuY29tAAoJELFEgP06H77IxdoMAMf3E+L51Ys/4iAiyJQNVoT3aIBC A8ZVOB9he1OA3o3wBNIRKmICHk+ovnfCWcXTr9fG/Ade2wJz88NAsGPQ1Phywb+s iGlpySllFN72RT9ZqtJhLEzgoHHOL0CzTW07TN9GJy4gQA2h2G9CTP+OmsQdnVqE m9Fn3PSlJ5lhzePlKfnln8rGZFgrriJakfEFPC79n/7an4+2Hvkb5rWigo7KQc4Z 9YNqYUcHWZFUgq80adxEb9LlbMXdD+Z/8fCjOrAatuwVkD4RDt6iKD0mFGjHXGL7 MZ9KRS8AfZXawmetk3jjtsV+/QkeS+Deuu7k0FoO0Th2QV7BGSDhsLXAS5By/MOC nfSyHhnXHzCsBMyVNrJHmNhEZoN29+tRwI84JX9lWcf/OLANcCofnP6f2UIX7tZY CAZAgVELp+0YQXdybrfzTQ8BT3TinjS/aZtCrYijRendI1GwUXcyl69vdOKqAHuk 5jy8k/xHyp+ZWu6v+PyAAAEGowY++qhL0fmszA== =RKW4 -----END PGP SIGNATURE----- Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linux Pull bitmap updates from Yury Norov: - introduce for_each_set_bitrange() - use find_first__bit() instead of find_next__bit() where possible - unify for_each_bit() macros * tag 'bitmap-5.17-rc1' of git://github.com/norov/linux: vsprintf: rework bitmap_list_string lib: bitmap: add performance test for bitmap_print_to_pagebuf bitmap: unify find_bit operations mm/percpu: micro-optimize pcpu_is_populated() Replace for_each__bit_from() with for_each__bit() where appropriate find: micro-optimize for_each_{set,clear}_bit() include/linux: move for_each_bit() macros from bitops.h to find.h cpumask: replace cpumask_next_* with cpumask_first_* where appropriate tools: sync tools/bitmap with mother linux all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate cpumask: use find_first_and_bit() lib: add find_first_and_bit() arch: remove GENERIC_FIND_FIRST_BIT entirely include: move find.h from asm_generic to linux bitops: move find_bit_*_le functions from le.h to find.h bitops: protect find_first_{,zero}_bit properly	2022-01-23 06:20:44 +02:00
Christoph Hellwig	0a4ee51818	mm: remove cleancache Patch series "remove Xen tmem leftovers". Since the removal of the Xen tmem driver in 2019, the cleancache hooks are entirely unused, as are large parts of frontswap. This series against linux-next (with the folio changes included) removes cleancaches, and cuts down frontswap to the bits actually used by zswap. This patch (of 13): The cleancache subsystem is unused since the removal of Xen tmem driver in commit `814bbf49dc` ("xen: remove tmem driver"). [akpm@linux-foundation.org: remove now-unreachable code] Link: https://lkml.kernel.org/r/20211224062246.1258487-1-hch@lst.de Link: https://lkml.kernel.org/r/20211224062246.1258487-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-22 08:33:38 +02:00
Linus Torvalds	3c7c25038b	block-5.17-2022-01-21 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmHqtecQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgph8iD/9nahzCdiPYRE+POHneiZbfaEnBEVFH7cz1 rbEjiAR5EbkLxGZohEkIjbHuZyiF8cP6l8f1D5aEmqiFZfiuib8UOVURk9ZQdEMU lXnOhEuRopQnGGyzSs0yXdx8rZ8xvijmg2UDjwl/VZ4UMgkyD4NjFqNEjdXkmQPP pWWDkg4CQJIJ9jYeIKtfwijfeyi2LMkYniZFuwiYTAf+9Zt8OIrg7LtDkHulhMqk V/c5TSho9p22Hv0q6edQSbWhdm6QZ+MRz71Nsycr9cdvvO1jKoLKlcuXwlhqEB1q BMkwuJI4hhcauqKtwIqNIM+ulNj8HsPqRxP6n9b4RL017dhDLIrbeiOL0qG3PUNi VbC7EGvQIqTNp0zeyeIV3xM9jaBMbh+FpCqtzdT1ZKlPI4jOB89x7lXKpG30ixA2 8nWXOiRE+UxXT96EbP6cLS/ykfvMiPqbVOSXdPl9d78R1j+xQVnBdMQoX2Yp/j1Y qN40Lp2mQgNJjkIiLOZxncx2xSx1/EVTDW1OPEm2Atv/NGxSK5vaN1P+X9DKB3e7 pjpKHhvJuNy6c3yeJs5tyZrBu1zZl1dCMxC3fhK8XNTTWJ3zBiUxicDCsGN7YCwR 5VJ+FbVATrzauBPtT7uQYRFnFePu1RxY5xTCdbg04hgGZmSSIqmJvZSpqp5Nn90s M0NbwyQrLg== =cebW -----END PGP SIGNATURE----- Merge tag 'block-5.17-2022-01-21' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Various little minor fixes that should go into this release: - Fix issue with cloned bios and IO accounting (Christoph) - Remove redundant assignments (Colin, GuoYong) - Fix an issue with the mq-deadline async_depth sysfs interface (me) - Fix brd module loading race (Tetsuo) - Shared tag map wakeup fix (Laibin) - End of bdev read fix (OGAWA) - srcu leak fix (Ming)" * tag 'block-5.17-2022-01-21' of git://git.kernel.dk/linux-block: block: fix async_depth sysfs interface for mq-deadline block: Fix wrong offset in bio_truncate() block: assign bi_bdev for cloned bios in blk_rq_prep_clone block: cleanup q->srcu block: Remove unnecessary variable assignment brd: remove brd_devices_mutex mutex aoe: remove redundant assignment on variable n loop: remove redundant initialization of pointer node blk-mq: fix tag_get wait task can't be awakened	2022-01-21 16:17:03 +02:00
Jens Axboe	46cdc45acb	block: fix async_depth sysfs interface for mq-deadline A previous commit added this feature, but it inadvertently used the wrong variable to show/store the setting from/to, victimized by copy/paste. Fix it up so that the async_depth sysfs interface reads and writes from the right setting. Fixes: `07757588e5` ("block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests") Link: https://bugzilla.kernel.org/show_bug.cgi?id=215485 Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-01-20 10:54:02 -07:00
OGAWA Hirofumi	3ee859e384	block: Fix wrong offset in bio_truncate() bio_truncate() clears the buffer outside of last block of bdev, however current bio_truncate() is using the wrong offset of page. So it can return the uninitialized data. This happened when both of truncated/corrupted FS and userspace (via bdev) are trying to read the last of bdev. Reported-by: syzbot+ac94ae5f68b84197f41c@syzkaller.appspotmail.com Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/875yqt1c9g.fsf@mail.parknet.co.jp Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-01-20 06:30:12 -07:00
Christoph Hellwig	fd9f4e62a3	block: assign bi_bdev for cloned bios in blk_rq_prep_clone bio_clone_fast() sets the cloned bio to have the same ->bi_bdev as the source bio. This means that when request-based dm called setup_clone(), the cloned bio had its ->bi_bdev pointing to the dm device. After Commit `0b6e522cdc` ("blk-mq: use ->bi_bdev for I/O accounting") __blk_account_io_start() started using the request's ->bio->bi_bdev for I/O accounting, if it was set. This caused IO going to the underlying devices to use the dm device for their I/O accounting. Set up the proper ->bi_bdev in blk_rq_prep_clone based on the whole device bdev for the queue the request is cloned onto. Fixes: `0b6e522cdc` ("blk-mq: use ->bi_bdev for I/O accounting") Reported-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> [hch: the commit message is mostly from a different patch from Benjamin] Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com> Link: https://lore.kernel.org/r/20220118070444.1241739-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-01-18 06:34:05 -07:00
Ming Lei	850fd2abbe	block: cleanup q->srcu srcu structure has to be cleanup via cleanup_srcu_struct(), so fix it. Reported-by: syzbot+4f789823c1abc5accf13@syzkaller.appspotmail.com Fixes: `704b914f15` ("blk-mq: move srcu from blk_mq_hw_ctx to request_queue") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220111123401.520192-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-01-17 07:24:45 -07:00
GuoYong Zheng	e6a2e5116e	block: Remove unnecessary variable assignment The parameter "ret" should be zero when running to this line, no need to set to zero again, remove it. Signed-off-by: GuoYong Zheng <zhenggy@chinatelecom.cn> Link: https://lore.kernel.org/r/1642414957-6785-1-git-send-email-zhenggy@chinatelecom.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-01-17 07:24:04 -07:00
Yury Norov	9b51d9d866	cpumask: replace cpumask_next_* with cpumask_first_* where appropriate cpumask_first() is a more effective analogue of 'next' version if n == -1 (which means start == 0). This patch replaces 'next' with 'first' where things look trivial. There's no cpumask_first_zero() function, so create it. Signed-off-by: Yury Norov <yury.norov@gmail.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>	2022-01-15 08:47:31 -08:00
Linus Torvalds	e1a7aa25ff	SCSI misc on 20220113 This series consists of the usual driver updates (ufs, pm80xx, lpfc, mpi3mr, mpt3sas, hisi_sas, libsas) and minor updates and bug fixes. The most impactful change is likely the switch from GFP_DMA to GFP_KERNEL in a bunch of drivers, but even that shouldn't affect too many people. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYeCJZyYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishfMnAQCsERG9 V4yX8LDpBjD7leIccf+6krJNNWaIWYYkEdxpzQD9FShB7/yDakFq3erW2y5mVqac dZ065M0ckE4bxk9uMIE= =gPHF -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This series consists of the usual driver updates (ufs, pm80xx, lpfc, mpi3mr, mpt3sas, hisi_sas, libsas) and minor updates and bug fixes. The most impactful change is likely the switch from GFP_DMA to GFP_KERNEL in a bunch of drivers, but even that shouldn't affect too many people" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (121 commits) scsi: mpi3mr: Bump driver version to 8.0.0.61.0 scsi: mpi3mr: Fixes around reply request queues scsi: mpi3mr: Enhanced Task Management Support Reply handling scsi: mpi3mr: Use TM response codes from MPI3 headers scsi: mpi3mr: Add io_uring interface support in I/O-polled mode scsi: mpi3mr: Print cable mngnt and temp threshold events scsi: mpi3mr: Support Prepare for Reset event scsi: mpi3mr: Add Event acknowledgment logic scsi: mpi3mr: Gracefully handle online FW update operation scsi: mpi3mr: Detect async reset that occurred in firmware scsi: mpi3mr: Add IOC reinit function scsi: mpi3mr: Handle offline FW activation in graceful manner scsi: mpi3mr: Code refactor of IOC init - part2 scsi: mpi3mr: Code refactor of IOC init - part1 scsi: mpi3mr: Fault IOC when internal command gets timeout scsi: mpi3mr: Display IOC firmware package version scsi: mpi3mr: Handle unaligned PLL in unmap cmnds scsi: mpi3mr: Increase internal cmnds timeout to 60s scsi: mpi3mr: Do access status validation before adding devices scsi: mpi3mr: Add support for PCIe Managed Switch SES device ...	2022-01-14 14:37:34 +01:00
Laibin Qiu	180dccb0db	blk-mq: fix tag_get wait task can't be awakened In case of shared tags, there might be more than one hctx which allocates from the same tags, and each hctx is limited to allocate at most: hctx_max_depth = max((bt->sb.depth + users - 1) / users, 4U); tag idle detection is lazy, and may be delayed for 30sec, so there could be just one real active hctx(queue) but all others are actually idle and still accounted as active because of the lazy idle detection. Then if wake_batch is > hctx_max_depth, driver tag allocation may wait forever on this real active hctx. Fix this by recalculating wake_batch when inc or dec active_queues. Fixes: `0d2602ca30` ("blk-mq: improve support for shared tags maps") Suggested-by: Ming Lei <ming.lei@redhat.com> Suggested-by: John Garry <john.garry@huawei.com> Signed-off-by: Laibin Qiu <qiulaibin@huawei.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220113025536.1479653-1-qiulaibin@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-01-13 12:52:14 -07:00
Linus Torvalds	f079ab01b5	Convert xfs/iomap to use folios This should be all that is needed for XFS to use large folios. There is no code in this pull request to create large folios, but no additional changes should be needed to XFS or iomap once they are created. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmHcpaUACgkQDpNsjXcp gj4MUAf+ItcKfgFo1QCMT+6Y0mohVqPme/vdyOCNv6yOOfZZqN5ZQc+2hmxXrRz9 XPOPwZKL0TttlHSYEJmrm8mqwN8UXl0kqMu4kQqOXMziiD9qpVlaLXOZ7iLdkQxu z/xe1iACcGfJUaQCsaMP6BZqp6iETA4qP72dBE4jc6PC4H3OI0pN/900gEbAcLxD Yn0a5NhrdS/EySU2aHLB6OcwhqnSiHBVjUbFiuXxuvOYyzLaERIh00Kx3jLdj4DR 82K4TF8h2IZpALfIDSt0JG+gHLCc+EfF7Yd/xkeEv0md3ncyi+jWvFCFPNJbyFjm cYoDTSunfbxwszA2n01R4JM8/KkGwA== =IeFX -----END PGP SIGNATURE----- Merge tag 'iomap-5.17' of git://git.infradead.org/users/willy/linux Pull iomap updates from Matthew Wilcox: "Convert xfs/iomap to use folios. This should be all that is needed for XFS to use large folios. There is no code in this pull request to create large folios, but no additional changes should be needed to XFS or iomap once they are created. Usually this would have come from Darrick, and we had intended that it would come that route. Between the holidays and various things which Darrick needed to work on, he asked if I could send things directly. There weren't any other iomap patches pending for this release, which probably also played a role" * tag 'iomap-5.17' of git://git.infradead.org/users/willy/linux: (26 commits) iomap: Inline __iomap_zero_iter into its caller xfs: Support large folios iomap: Support large folios in invalidatepage iomap: Convert iomap_migrate_page() to use folios iomap: Convert iomap_add_to_ioend() to take a folio iomap: Simplify iomap_do_writepage() iomap: Simplify iomap_writepage_map() iomap,xfs: Convert ->discard_page to ->discard_folio iomap: Convert iomap_write_end_inline to take a folio iomap: Convert iomap_write_begin() and iomap_write_end() to folios iomap: Convert __iomap_zero_iter to use a folio iomap: Allow iomap_write_begin() to be called with the full length iomap: Convert iomap_page_mkwrite to use a folio iomap: Convert readahead and readpage to use a folio iomap: Convert iomap_read_inline_data to take a folio iomap: Use folio offsets instead of page offsets iomap: Convert bio completions to use folios iomap: Pass the iomap_page into iomap_set_range_uptodate iomap: Add iomap_invalidate_folio iomap: Convert iomap_releasepage to use a folio ...	2022-01-12 12:51:41 -08:00
Linus Torvalds	d3c8108035	for-5.17/block-2022-01-11 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmHd8DAQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpnhRD/wMAjsNO65PCA+o/bPpVi4ulx9EejAzrJnB 5vHFvREAoOOGKvRpYGe4w3TcKyW+zPb+GtlXFjPfK+wuVzWhrQtW/+vkjKlBt8wK o7rzeMwTKJ9ZGvYaaQpp1yC0WURBB3qnCRQhb8dOQzhJgEXinhIOznZsut4mniLv fTqcDmKAb/+G6K6CQCCqnH0I/+OJZyUeSFo1kk2i4ZqCBepQpBkOL6H2rBOtGxUg bt1jiGHbbhCRYEE3u2kV0HP10qAChNaMQC705jV4Qpf4+3EntSxs+6nSb74dvMkX 3+Wmp8Ctq6lpPnDL1nrAFGz3jZnB0Y+GdgOclQn3ViQd1FCXZzuYWQ3fTaBfURCZ /RE5nc047SqpwCFLOynM++OkaeQZ1zSxeyoFTtzDaPF4tLuaX3JHswvTzNGPw8SN BnexseNnNBCjJliZSEE7fOkjJDcev2dvRxPtI8/wkF4lHUgETc5IW563C53xo/Tx 32yFjZwCVIpNWk21su/0H3iEq80wZ7PnriiN/E3JA6XbnevlRPu0NPMb0D258GCm yCcdPVDNZsQCB8hluqZcu0g6LSgZRo90Yg1oqKqEpAllJJMBaEAPPPuUIJh998mo iKGxZzgr7d9jrbGJTInp0F8b3B3/oV/hxgzy0Hu/mHP3AsnaAk9o/oEQZ7rX4Khr 6biloqkIMA== =RWnJ -----END PGP SIGNATURE----- Merge tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: - Unify where the struct request handling code is located in the blk-mq code (Christoph) - Header cleanups (Christoph) - Clean up the io_context handling code (Christoph, me) - Get rid of ->rq_disk in struct request (Christoph) - Error handling fix for add_disk() (Christoph) - request allocation cleanusp (Christoph) - Documentation updates (Eric, Matthew) - Remove trivial crypto unregister helper (Eric) - Reduce shared tag overhead (John) - Reduce poll_stats memory overhead (me) - Known indirect function call for dio (me) - Use atomic references for struct request (me) - Support request list issue for block and NVMe (me) - Improve queue dispatch pinning (Ming) - Improve the direct list issue code (Keith) - BFQ improvements (Jan) - Direct completion helper and use it in mmc block (Sebastian) - Use raw spinlock for the blktrace code (Wander) - fsync error handling fix (Ye) - Various fixes and cleanups (Lukas, Randy, Yang, Tetsuo, Ming, me) * tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block: (132 commits) MAINTAINERS: add entries for block layer documentation docs: block: remove queue-sysfs.rst docs: sysfs-block: document virt_boundary_mask docs: sysfs-block: document stable_writes docs: sysfs-block: fill in missing documentation from queue-sysfs.rst docs: sysfs-block: add contact for nomerges docs: sysfs-block: sort alphabetically docs: sysfs-block: move to stable directory block: don't protect submit_bio_checks by q_usage_counter block: fix old-style declaration nvme-pci: fix queue_rqs list splitting block: introduce rq_list_move block: introduce rq_list_for_each_safe macro block: move rq_list macros to blk-mq.h block: drop needless assignment in set_task_ioprio() block: remove unnecessary trailing '\' bio.h: fix kernel-doc warnings block: check minor range in device_add_disk() block: use "unsigned long" for blk_validate_block_size(). block: fix error unwinding in device_add_disk ...	2022-01-12 10:26:52 -08:00
Ming Lei	9d497e2941	block: don't protect submit_bio_checks by q_usage_counter Commit `cc9c884dd7` ("block: call submit_bio_checks under q_usage_counter") uses q_usage_counter to protect submit_bio_checks for avoiding IO after disk is deleted by del_gendisk(). Turns out the protection isn't necessary, because once blk_mq_freeze_queue_wait() in del_gendisk() returns: 1) all in-flight IO has been done 2) all new IO will be failed in __bio_queue_enter() because q_usage_counter is dead, and GD_DEAD is set 3) both disk and request queue instance are safe since caller of submit_bio() guarantees that the disk can't be closed. Once submit_bio_checks() needn't the protection of q_usage_counter, we can move submit_bio_checks before calling blk_mq_submit_bio() and ->submit_bio(). With this change, we needn't to throttle queue with holding one allocated request, then precise driver tag or request won't be wasted in throttling. Meantime we can unify the bio check for both bio based and request based driver. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220104134223.590803-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-01-09 18:54:52 -07:00
Lukas Bulwahn	669a064625	block: drop needless assignment in set_task_ioprio() Commit `5fc11eebb4` ("block: open code create_task_io_context in set_task_ioprio") introduces a needless assignment 'ioc = task->io_context', as the local variable ioc is not further used before returning. Even after the further fix, commit `a957b61254` ("block: fix error in handling dead task for ioprio setting"), the assignment still remains needless. Drop this needless assignment in set_task_ioprio(). This code smell was identified with 'make clang-analyzer'. Fixes: `5fc11eebb4` ("block: open code create_task_io_context in set_task_ioprio") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211223125300.20691-1-lukas.bulwahn@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-23 07:10:07 -07:00
Alan Stern	6e1fcab00a	scsi: block: pm: Always set request queue runtime active in blk_post_runtime_resume() John Garry reported a deadlock that occurs when trying to access a runtime-suspended SATA device. For obscure reasons, the rescan procedure causes the link to be hard-reset, which disconnects the device. The rescan tries to carry out a runtime resume when accessing the device. scsi_rescan_device() holds the SCSI device lock and won't release it until it can put commands onto the device's block queue. This can't happen until the queue is successfully runtime-resumed or the device is unregistered. But the runtime resume fails because the device is disconnected, and __scsi_remove_device() can't do the unregistration because it can't get the device lock. The best way to resolve this deadlock appears to be to allow the block queue to start running again even after an unsuccessful runtime resume. The idea is that the driver or the SCSI error handler will need to be able to use the queue to resolve the runtime resume failure. This patch removes the err argument to blk_post_runtime_resume() and makes the routine act as though the resume was successful always. This fixes the deadlock. Link: https://lore.kernel.org/r/1639999298-244569-4-git-send-email-chenxiang66@hisilicon.com Fixes: `e27829dc92` ("scsi: serialize ->rescan against ->remove") Reported-and-tested-by: John Garry <john.garry@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-22 23:38:29 -05:00
Tetsuo Handa	e338924bd0	block: check minor range in device_add_disk() ioctl(fd, LOOP_CTL_ADD, 1048576) causes sysfs: cannot create duplicate filename '/dev/block/7:0' message because such request is treated as if ioctl(fd, LOOP_CTL_ADD, 0) due to MINORMASK == 1048575. Verify that all minor numbers for that device fit in the minor range. Reported-by: wangyangbo <wangyangbo@uniontech.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/b1b19379-23ee-5379-0eb5-94bf5f79f1b4@i-love.sakura.ne.jp Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-21 09:34:29 -07:00
Christoph Hellwig	99d8690aae	block: fix error unwinding in device_add_disk One device_add is called disk->ev will be freed by disk_release, so we should free it twice. Fix this by allocating disk->ev after device_add so that the extra local unwinding can be removed entirely. Based on an earlier patch from Tetsuo Handa. Reported-by: syzbot <syzbot+28a66a9fbc621c939000@syzkaller.appspotmail.com> Tested-by: syzbot <syzbot+28a66a9fbc621c939000@syzkaller.appspotmail.com> Fixes: `83cbce9574` ("block: add error handling for device_add_disk / add_disk") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211221161851.788424-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-21 09:31:51 -07:00
Ming Lei	37e11c3616	block: call blk_exit_queue() before freeing q->stats blk_stat_disable_accounting() is added in commit `68497092bd` ("block: make queue stat accounting a reference"), and called in kyber_exit_sched(). So we have to free q->stats after elevator is unloaded from blk_exit_queue() in blk_release_queue(). Otherwise kernel panic is caused. Fixes: `68497092bd` ("block: make queue stat accounting a reference") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211221040436.1333880-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-20 21:07:51 -07:00
Jens Axboe	a957b61254	block: fix error in handling dead task for ioprio setting Don't combine the task exiting and "already have io_context" case, we need to just abort if the task is marked as dead. Return -ESRCH, which is the documented value for ioprio_set() if the specified task could not be found. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: syzbot+8836466a79f4175961b0@syzkaller.appspotmail.com Fixes: `5fc11eebb4` ("block: open code create_task_io_context in set_task_ioprio") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-20 20:32:24 -07:00
Keith Busch	518579a9af	blk-mq: blk-mq: check quiesce state before queue_rqs The low level drivers don't expect to see new requests after a successful quiesce completes. Check the queue quiesce state within the rcu protected area prior to calling the driver's queue_rqs(). Fixes: `3c67d44de7` ("block: add mq_ops->queue_rqs hook") Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20211220205919.180191-1-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-20 14:03:59 -07:00
Linus Torvalds	2da09da4ae	block-5.16-2021-12-19 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmG/Z/MQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpjo7EADdal+Z26fAabpI/wHJJQQKOE9WIy3hP5kA G466lPP4FvW+2JIclxRGWmBfkvnzzKLjEFKg2OJZQgOvSycut217qBNxG4xpxy6O a0B0jKV3Dh8uLGaaOwvwcYmeBzuEnzmr+qh5hQJQDr+9Cpx491Uh7+GABOt+sceY +j+KNNNShcK1qUYtYHsKqSvxEAyTmKuN0uz6rOTbGr/Z2TjWXVVBjRw5g3J52U7y DjueqM0Y7qebCN+DexdOXZDC4h3A5/j1kCABPe3Vx3yRRWR1RJDLGDfQes/QZDpF sf5kkr/1iPki7Dh2laLin4686sNa4HyIJgS8wlrVXGBnRl38SGTShcbuP94m5okJ JsdBD38pmmBsSaokKGpxfsG1N5kf3MpBD9zc2WWeiYpwZH+Mr3cRwSvrz3jUpVqL MfrHoCHtgqOgIdC7Jdv3ilet3ujfp9yX9QcgIljgkxlBHZHQxX2mXcobROdWbMgz v+sn0Ot9+bEf4FrwSeq43fWgBGTK3IbiLajtxWozTKvy3Re+hnDnfoHFqUZE9yfe nFzKJPXeplKdM9Y5J75OaJQ+Ohp0F+0jcPNGHeseEWqvJhK0A5mgssM8TqrTQkAo GJj8DZQU8Szc0q7yO7gqVJ398sjyKEtL8Rg3qTGLmKJrNdFis+bmjtaDzyDd4HeW Uij+cetUuw== =dbu4 -----END PGP SIGNATURE----- Merge tag 'block-5.16-2021-12-19' of git://git.kernel.dk/linux-block Pull block revert from Jens Axboe: "It turns out that the fix for not hammering on the delayed work timer too much caused a performance regression for BFQ, so let's revert the change for now. I've got some ideas on how to fix it appropriately, but they should wait for 5.17" * tag 'block-5.16-2021-12-19' of git://git.kernel.dk/linux-block: Revert "block: reduce kblockd_mod_delayed_work_on() CPU consumption"	2021-12-19 12:38:53 -08:00
Jens Axboe	87959fa16c	Revert "block: reduce kblockd_mod_delayed_work_on() CPU consumption" This reverts commit `cb2ac2912a`. Alex and the kernel test robot report that this causes a significant performance regression with BFQ. I can reproduce that result, so let's revert this one as we're close to -rc6 and we there's no point in trying to rush a fix. Link: https://lore.kernel.org/linux-block/1639853092.524jxfaem2.none@localhost/ Link: https://lore.kernel.org/lkml/20211219141852.GH14057@xsang-OptiPlex-9020/ Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-19 07:58:44 -07:00
Linus Torvalds	fa09ca5ebc	block-5.16-2021-12-17 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmG8wJ4QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpp+aD/9C6yo/8I2rbP0iJqIWKaADKoU9WGo4E4Q5 NFPsEF17IY2mxZGfvpjMMx2fWPbgfvMuXnWPyfHnFv/00Bm3sr79eQ4Y7Ak8pZWQ r0YzNz1lbgHn8S+EsqWGxWbDZisptaYN2d0skT81dTE62s2dcZc7WYHBJnCW5JZb OTwx68e5QoHrWnoHVnCx8toUszFr0OMiQVRVwbrRtJKVh+2rQODIKz2qs2ajceHo IN2Bk35W1xsv6TWGLKCK4zk9k+/Nz5Cqx+FF59bKAIs3kPy3dt10PDjN30jQGR17 DuGZe08wZ+lCMQGXd1XNcjz3GS9qWlc+mbzVpqUATNMQSGbqwW8JyheiNZwfNClw SzWasQD0KipdBiU0hOxBbDko2n4TVFHbnISfBwz0vHEz54HHHtltNd4iBowJuaq4 VG/zuD2CyWaj7p9Xpymrir1xMonVNTRFRL3mkkU8feTxqTEhy2zIix9CbKdGkU5e rEhAAb5i1+/58A8dcCh9X+zULvXjFL6FfbnCGSMrLEq0ymyysc/+w0BPNzq93Jva F8qSx2nUGq0250u+Boc+wL9uB7AXWwR8addyWWlwaAZ52nqVhciWus4gh1jn8CAV EhO+NFbx/9K1rmPkvbJau1/fgnz4+gJS8iBC0pxlyhvt6rNzMFCJB719jcdr05tG JO0131r51g== =Huz8 -----END PGP SIGNATURE----- Merge tag 'block-5.16-2021-12-17' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: - Fix for hammering on the delayed run queue timer (me) - bcache regression fix for this merge window (Lin) - Fix a divide-by-zero in the blk-iocost code (Tejun) * tag 'block-5.16-2021-12-17' of git://git.kernel.dk/linux-block: bcache: fix NULL pointer reference in cached_dev_detach_finish block: reduce kblockd_mod_delayed_work_on() CPU consumption iocost: Fix divide-by-zero on donation from low hweight cgroup	2021-12-17 11:46:07 -08:00
Matthew Wilcox (Oracle)	85f5a74c2b	block: Add bio_add_folio() This is a thin wrapper around bio_add_page(). The main advantage here is the documentation that folios larger than 2GiB are not supported. It's not currently possible to allocate folios that large, but if it ever becomes possible, this function will fail gracefully instead of doing I/O to the wrong bytes. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-12-16 15:49:51 -05:00
Christoph Hellwig	5ef1630586	block: only build the icq tracking code when needed Only bfq needs to code to track icq, so make it conditional. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:02 -07:00
Christoph Hellwig	90b627f542	block: fold create_task_io_context into ioc_find_get_icq Fold create_task_io_context into the only remaining caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:02 -07:00
Christoph Hellwig	5fc11eebb4	block: open code create_task_io_context in set_task_ioprio The flow in set_task_ioprio can be simplified by simply open coding create_task_io_context, which removes a refcount roundtrip on the I/O context. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:02 -07:00
Christoph Hellwig	8472161b77	block: fold get_task_io_context into set_task_ioprio Fold get_task_io_context into its only caller, and simplify the code as no reference to the I/O context is required to just set the ioprio field. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:02 -07:00
Christoph Hellwig	a411cd3cfd	block: move set_task_ioprio to blk-ioc.c Keep set_task_ioprio with the other low-level code that accesses the io_context structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:01 -07:00
Christoph Hellwig	091abcb3ef	block: cleanup ioc_clear_queue Fold __ioc_clear_queue into ioc_clear_queue and switch to always use plain _irq locking instead of the more expensive _irqsave that is not needed here. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:01 -07:00
Christoph Hellwig	edf70ff5a1	block: refactor put_io_context Move the code to delay freeing the icqs into a separate helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:01 -07:00
Christoph Hellwig	8a20c0c7e0	block: remove the NULL ioc check in put_io_context No caller passes in a NULL pointer, so remove the check. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:01 -07:00
Christoph Hellwig	4be8a2eaff	block: refactor put_iocontext_active Factor out a ioc_exit_icqs helper to tear down the icqs and the fold the rest of put_iocontext_active into exit_io_context. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:01 -07:00
Christoph Hellwig	0aed2f162b	block: simplify struct io_context refcounting Don't hold a reference to ->refcount for each active reference, but just one for all active references. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:01 -07:00
Christoph Hellwig	8a2ba1785c	block: remove the nr_task field from struct io_context Nothing ever looks at ->nr_tasks, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:01 -07:00
Jens Axboe	3c67d44de7	block: add mq_ops->queue_rqs hook If we have a list of requests in our plug list, send it to the driver in one go, if possible. The driver must set mq_ops->queue_rqs() to support this, if not the usual one-by-one path is used. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 08:49:17 -07:00
Jens Axboe	fcade2ce06	block: use singly linked list for bio cache Pointless to maintain a head/tail for the list, as we never need to access the tail. Entries are always LIFO for cache hotness reasons. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 08:43:09 -07:00
Jens Axboe	5581a5ddfe	block: add completion handler for fast path The batched completions only deal with non-partial requests anyway, and it doesn't deal with any requests that have errors. Add a completion handler that assumes it's a full request and that it's all being ended successfully. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 08:42:06 -07:00
Jens Axboe	cb2ac2912a	block: reduce kblockd_mod_delayed_work_on() CPU consumption Dexuan reports that he's seeing spikes of very heavy CPU utilization when running 24 disks and using the 'none' scheduler. This happens off the sched restart path, because SCSI requires the queue to be restarted async, and hence we're hammering on mod_delayed_work_on() to ensure that the work item gets run appropriately. Avoid hammering on the timer and just use queue_work_on() if no delay has been specified. Reported-and-tested-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/linux-block/BYAPR21MB1270C598ED214C0490F47400BF719@BYAPR21MB1270.namprd21.prod.outlook.com/ Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-14 20:08:05 -07:00
Jens Axboe	68497092bd	block: make queue stat accounting a reference kyber turns on IO statistics when it is loaded on a queue, which means that even if kyber is then later unloaded, we're still stuck with stats enabled on the queue. Change the account enabled from a bool to an int, and pair the enable call with the equivalent disable call. This ensures that stats gets turned off again appropriately. Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-14 17:23:05 -07:00
Tejun Heo	edaa26334c	iocost: Fix divide-by-zero on donation from low hweight cgroup The donation calculation logic assumes that the donor has non-zero after-donation hweight, so the lowest active hweight a donating cgroup can have is 2 so that it can donate 1 while keeping the other 1 for itself. Earlier, we only donated from cgroups with sizable surpluses so this condition was always true. However, with the precise donation algorithm implemented, `f1de2439ec` ("blk-iocost: revamp donation amount determination") made the donation amount calculation exact enabling even low hweight cgroups to donate. This means that in rare occasions, a cgroup with active hweight of 1 can enter donation calculation triggering the following warning and then a divide-by-zero oops. WARNING: CPU: 4 PID: 0 at block/blk-iocost.c:1928 transfer_surpluses.cold+0x0/0x53 [884/94867] ... RIP: 0010:transfer_surpluses.cold+0x0/0x53 Code: 92 ff 48 c7 c7 28 d1 ab b5 65 48 8b 34 25 00 ae 01 00 48 81 c6 90 06 00 00 e8 8b 3f fe ff 48 c7 c0 ea ff ff ff e9 95 ff 92 ff <0f> 0b 48 c7 c7 30 da ab b5 e8 71 3f fe ff 4c 89 e8 4d 85 ed 74 0 4 ... Call Trace: <IRQ> ioc_timer_fn+0x1043/0x1390 call_timer_fn+0xa1/0x2c0 __run_timers.part.0+0x1ec/0x2e0 run_timer_softirq+0x35/0x70 ... iocg: invalid donation weights in /a/b: active=1 donating=1 after=0 Fix it by excluding cgroups w/ active hweight < 2 from donating. Excluding these extreme low hweight donations shouldn't affect work conservation in any meaningful way. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: `f1de2439ec` ("blk-iocost: revamp donation amount determination") Cc: stable@vger.kernel.org # v5.10+ Link: https://lore.kernel.org/r/Ybfh86iSvpWKxhVM@slm.duckdns.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-14 06:58:15 -07:00
Matthew Wilcox (Oracle)	0ba4566cd8	bdev: Improve lookup_bdev documentation Add a Context section and rewrite the rest to be clearer. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Link: https://lore.kernel.org/r/20211213171113.3097631-1-willy@infradead.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-13 11:29:16 -07:00
Linus Torvalds	eccea80be2	block-5.16-2021-12-10 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmG0PvAQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpjiDEACP9AWpsoLiN5CCRsrQpneErFRaYsov59OD n299Gy9hsccVVbl1hjgp3lF/RrCLkdQ3K1WNlIZ+pG4OO20us9b8zdgTO7u9ZOvi sKyucmPhzpUnbUidmdoytW0KdvQ7SXO95OI2C7OSJhNCzJdpemiu6WpFJKwYiB2V Agv73knEtjhNIM8biWgy5B8KekYulkGUy9JO3tLgnpLbnnln210WFvUNamW0JJKg 671OwT7TSbK0tZF1wqtaxv4iaf1CqHPLDxJaABZvo6FGTLc878xD8KFmCFMsWNGq dl+R9HdG+fGvaaZiEBMoI6Gy02M7BcQimGnra33nWghe+XCpR10efZ3K/Tl3HrwX eiw2GGBASejrgioMqa15XVw36e6MMQwZeqisHHNiULUm0UG9SipJYTWlpy0FqfbG vRdkArHNXCJgNN6kacRqnfspe2vf0VGOMLODcwGKzl7iwf4shAHlWsvcvE/bPpe1 GwQosO/6XUiF6jIFu05QsqU7FHjk7rMknDVrNX6oxxnn84ZOMUHfZDMkFbK5vDsZ n5f0uEohgJWzOCMrHbrtW252lwWLGxxa0XZdTZLsD8xnS3axZf9JpgAXINf1lHey YGRsvo3veZNFRI1S/HgF2/IcqyP4rMeFZDDJK/L5Tq5IvDo2usjcFEeyzNTOosKd hhdETVuZHg== =aqTJ -----END PGP SIGNATURE----- Merge tag 'block-5.16-2021-12-10' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "A few block fixes that should go into this release: - NVMe pull request: - set ana_log_size to 0 after freeing ana_log_buf (Hou Tao) - show subsys nqn for duplicate cntlids (Keith Busch) - disable namespace access for unsupported metadata (Keith Busch) - report write pointer for a full zone as zone start + zone len (Niklas Cassel) - fix use after free when disconnecting a reconnecting ctrl (Ruozhu Li) - fix a list corruption in nvmet-tcp (Sagi Grimberg) - Fix for a regression on DIO single bio async IO (Pavel) - ioprio seteuid fix (Davidlohr) - mtd fix that subsequently got reverted as it was broken, will get re-done and submitted for the next round - Two MD fixes via Song (Markus, zhangyue)" * tag 'block-5.16-2021-12-10' of git://git.kernel.dk/linux-block: Revert "mtd_blkdevs: don't scan partitions for plain mtdblock" block: fix ioprio_get(IOPRIO_WHO_PGRP) vs setuid(2) md: fix double free of mddev->private in autorun_array() md: fix update super 1.0 on rdev size change nvmet-tcp: fix possible list corruption for unexpected command failure block: fix single bio async DIO error handling nvme: fix use after free when disconnecting a reconnecting ctrl nvme-multipath: set ana_log_size to 0 after free ana_log_buf mtd_blkdevs: don't scan partitions for plain mtdblock nvme: report write pointer for a full zone as zone start + zone len nvme: disable namespace access for unsupported metadata nvme: show subsys nqn for duplicate cntlids	2021-12-11 09:25:07 -08:00
Davidlohr Bueso	e6a59aac8a	block: fix ioprio_get(IOPRIO_WHO_PGRP) vs setuid(2) do_each_pid_thread(PIDTYPE_PGID) can race with a concurrent change_pid(PIDTYPE_PGID) that can move the task from one hlist to another while iterating. Serialize ioprio_get to take the tasklist_lock in this case, just like it's set counterpart. Fixes: `d69b78ba1d` (ioprio: grab rcu_read_lock in sys_ioprio_{set,get}()) Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Link: https://lore.kernel.org/r/20211210182058.43417-1-dave@stgolabs.net Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-10 11:26:07 -07:00
Jakub Kicinski	6efcdadc15	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== bpf 2021-12-08 We've added 12 non-merge commits during the last 22 day(s) which contain a total of 29 files changed, 659 insertions(+), 80 deletions(-). The main changes are: 1) Fix an off-by-two error in packet range markings and also add a batch of new tests for coverage of these corner cases, from Maxim Mikityanskiy. 2) Fix a compilation issue on MIPS JIT for R10000 CPUs, from Johan Almbladh. 3) Fix two functional regressions and a build warning related to BTF kfunc for modules, from Kumar Kartikeya Dwivedi. 4) Fix outdated code and docs regarding BPF's migrate_disable() use on non- PREEMPT_RT kernels, from Sebastian Andrzej Siewior. 5) Add missing includes in order to be able to detangle cgroup vs bpf header dependencies, from Jakub Kicinski. 6) Fix regression in BPF sockmap tests caused by missing detachment of progs from sockets when they are removed from the map, from John Fastabend. 7) Fix a missing "no previous prototype" warning in x86 JIT caused by BPF dispatcher, from Björn Töpel. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Add selftests to cover packet access corner cases bpf: Fix the off-by-two error in range markings treewide: Add missing includes masked by cgroup -> bpf dependency tools/resolve_btfids: Skip unresolved symbol warning for empty BTF sets bpf: Fix bpf_check_mod_kfunc_call for built-in modules bpf: Make CONFIG_DEBUG_INFO_BTF depend upon CONFIG_BPF_SYSCALL mips, bpf: Fix reference to non-existing Kconfig symbol bpf: Make sure bpf_disable_instrumentation() is safe vs preemption. Documentation/locking/locktypes: Update migrate_disable() bits. bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap bpf, sockmap: Attach map progs to psock early for feature probes bpf, x86: Fix "no previous prototype" warning ==================== Link: https://lore.kernel.org/r/20211208155125.11826-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-08 16:06:44 -08:00
Pavel Begunkov	75feae73a2	block: fix single bio async DIO error handling BUG: KASAN: use-after-free in io_submit_one+0x496/0x2fe0 fs/aio.c:1882 CPU: 2 PID: 15100 Comm: syz-executor873 Not tainted 5.16.0-rc1-syzk #1 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29 04/01/2014 Call Trace: [...] refcount_dec_and_test include/linux/refcount.h:333 [inline] iocb_put fs/aio.c:1161 [inline] io_submit_one+0x496/0x2fe0 fs/aio.c:1882 __do_sys_io_submit fs/aio.c:1938 [inline] __se_sys_io_submit fs/aio.c:1908 [inline] __x64_sys_io_submit+0x1c7/0x4a0 fs/aio.c:1908 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3a/0x80 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae __blkdev_direct_IO_async() returns errors from bio_iov_iter_get_pages() directly, in which case upper layers won't be expecting ->ki_complete to be called by the block layer and will terminate the request. However, there is also bio_endio() leading to a second ->ki_complete and a double free. Fixes: `54a88eb838` ("block: add single bio async direct IO helper") Reported-by: George Kennedy <george.kennedy@oracle.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c9eb786f6cef041e159e6287de131bec0719ad5c.1638907997.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-07 15:07:40 -07:00
John Garry	fea9f92f17	blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags Kashyap reports high CPU usage in blk_mq_queue_tag_busy_iter() and callees using megaraid SAS RAID card since moving to shared tags [0]. Previously, when shared tags was shared sbitmap, this function was less than optimum since we would iter through all tags for all hctx's, yet only ever match upto tagset depth number of rqs. Since the change to shared tags, things are even less efficient if we have parallel callers of blk_mq_queue_tag_busy_iter(). This is because in bt_iter() -> blk_mq_find_and_get_req() there would be more contention on accessing each request ref and tags->lock since they are now shared among all HW queues. Optimise by having separate calls to bt_for_each() for when we're using shared tags. In this case no longer pass a hctx, as it is no longer relevant, and teach bt_iter() about this. Ming suggested something along the lines of this change, apart from a different implementation. [0] https://lore.kernel.org/linux-block/e4e92abbe9d52bcba6b8cc6c91c442cc@mail.gmail.com/ Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reported-and-tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Fixes: `e155b0c238` ("blk-mq: Use shared tags for shared sbitmap support") Link: https://lore.kernel.org/r/1638794990-137490-4-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-06 13:18:47 -07:00
John Garry	fc39f8d2d1	blk-mq: Delete busy_iter_fn Typedefs busy_iter_fn and busy_tag_iter_fn are now identical, so delete busy_iter_fn to reduce duplication. It would be nicer to delete busy_tag_iter_fn, as the name busy_iter_fn is less specific. However busy_tag_iter_fn is used in many different parts of the tree, unlike busy_iter_fn which is just use in block/, so just take the straightforward path now, so that we could rename later treewide. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Link: https://lore.kernel.org/r/1638794990-137490-3-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-06 13:18:47 -07:00
John Garry	8ab30a3319	blk-mq: Drop busy_iter_fn blk_mq_hw_ctx argument The only user of blk_mq_hw_ctx blk_mq_hw_ctx argument is blk_mq_rq_inflight(). Function blk_mq_rq_inflight() uses the hctx to find the associated request queue to match against the request. However this same check is already done in caller bt_iter(), so drop this check. With that change there are no more users of busy_iter_fn blk_mq_hw_ctx argument, so drop the argument. Reviewed-by Hannes Reinecke <hare@suse.de> Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Link: https://lore.kernel.org/r/1638794990-137490-2-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-06 13:18:47 -07:00

1 2 3 4 5 ...

6037 Commits