linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2025-01-15 00:54:03 +08:00

Author	SHA1	Message	Date
Guoqing Jiang	5addeae1be	blk-cgroup: remove blkcg_drain_queue Since blk_drain_queue had already been removed, so this function is not needed anymore. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-12 09:26:55 -07:00
Logan Gunthorpe	ecb6186cf7	block: fix NULL pointer dereference in account statistics with IDE The IDE driver creates some passthru requests which never get submitted to the block layer in such a way that blk_account_io_start() gets called. However, the driver still calls __blk_mq_end_request() in ide_end_rq() which will call blk_account_io_completion() which tries to dereferences req->part which is never set. See ide_prep_sense() for an example of where these requests come from. To fix this, blk_account_io_completion() and blk_account_io_done() should do nothing if req->part is not set. The back trace of this bug is: BUG: kernel NULL pointer dereference, address: 000002ac #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page *pde = 00000000 Oops: 0002 [#1] CPU: 0 PID: 237 Comm: kworker/0:1H Not tainted 5.4.0-rc2-00011-g48d9b0d43105e #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Workqueue: kblockd drive_rq_insert_work EIP: blk_account_io_completion+0x7a/0xf0 Code: 89 54 24 08 31 d2 89 4c 24 04 31 c9 c7 04 24 02 00 00 00 c1 ee 09 e8 f5 21 a6 ff e8 70 5c a7 ff 8b 53 60 8d 04 bd 00 00 00 00 <01> b4 02 ac 02 00 00 8b 9a 88 02 00 00 85 db 74 11 85 d2 74 51 8b EAX: 00000000 EBX: f5b80000 ECX: 00000000 EDX: 00000000 ESI: 00000000 EDI: 00000000 EBP: f3031e70 ESP: f3031e54 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010046 CR0: 80050033 CR2: 000002ac CR3: 03c25000 CR4: 000406d0 Call Trace: <IRQ> blk_update_request+0x85/0x420 ide_end_rq+0x38/0xa0 ide_complete_rq+0x3d/0x70 cdrom_newpc_intr+0x258/0xba0 ide_intr+0x135/0x250 __handle_irq_event_percpu+0x3e/0x250 handle_irq_event_percpu+0x1f/0x50 handle_irq_event+0x32/0x60 handle_level_irq+0x6c/0x110 handle_irq+0x72/0xa0 </IRQ> do_IRQ+0x45/0xad common_interrupt+0x115/0x11c Fixes: `48d9b0d431` ("block: account statistics for passthrough requests") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-12 08:12:50 -07:00
Jens Axboe	296aec45a6	Merge branch 'md-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-linus Pull MD fixes from Song. * 'md-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: make sure desc_nr less than MD_SB_DISKS md: raid1: check rdev before reference in raid1_sync_request func raid5: need to set STRIPE_HANDLE for batch head	2019-12-11 20:49:58 -07:00
Yufen Yu	3b7436cc94	md: make sure desc_nr less than MD_SB_DISKS For super_90_load, we need to make sure 'desc_nr' less than MD_SB_DISKS, avoiding invalid memory access of 'sb->disks'. Fixes: `228fc7d76d` ("md: avoid invalid memory access for array sb->dev_roles") Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2019-12-11 10:38:08 -08:00
Zhiqiang Liu	028288df63	md: raid1: check rdev before reference in raid1_sync_request func In raid1_sync_request func, rdev should be checked before reference. Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2019-12-11 10:36:13 -08:00
Guoqing Jiang	a7ede3d168	raid5: need to set STRIPE_HANDLE for batch head With commit `6ce220dd2f` ("raid5: don't set STRIPE_HANDLE to stripe which is in batch list"), we don't want to set STRIPE_HANDLE flag for sh which is already in batch list. However, the stripe which is the head of batch list should set this flag, otherwise panic could happen inside init_stripe at BUG_ON(sh->batch_head), it is reproducible with raid5 on top of nvdimm devices per Xiao oberserved. Thanks for Xiao's effort to verify the change. Fixes: `6ce220dd2f` ("raid5: don't set STRIPE_HANDLE to stripe which is in batch list") Reported-by: Xiao Ni <xni@redhat.com> Tested-by: Xiao Ni <xni@redhat.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2019-12-11 10:12:09 -08:00
Andreas Gruenbacher	cc90bc6842	block: fix "check bi_size overflow before merge" This partially reverts commit `e3a5d8e386`. Commit `e3a5d8e386` ("check bi_size overflow before merge") adds a bio_full check to __bio_try_merge_page. This will cause __bio_try_merge_page to fail when the last bi_io_vec has been reached. Instead, what we want here is only the bi_size overflow check. Fixes: `e3a5d8e386` ("block: check bi_size overflow before merge") Cc: stable@vger.kernel.org # v5.4+ Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-09 22:04:35 -07:00
Jens Axboe	dc3ecfc981	Merge branch 'nvme/for-5.5' of git://git.infradead.org/nvme into for-linus Pull NVMe fixes from Keith * 'nvme/for-5.5' of git://git.infradead.org/nvme: nvme/pci: Fix read queue count nvme/pci Limit write queue sizes to possible cpus nvme/pci: Fix write and poll queue types nvme/pci: Remove last_cq_head nvme: Namepace identification descriptor list is optional nvme-fc: fix double-free scenarios on hw queues nvme: else following return is not needed nvme: add error message on mismatching controller ids nvme_fc: add module to ops template to allow module references nvmet-loop: Avoid preallocating big SGL for data nvme-fc: Avoid preallocating big SGL for data nvme-rdma: Avoid preallocating big SGL for data	2019-12-06 17:27:56 -07:00
Keith Busch	7e4c6b9a5d	nvme/pci: Fix read queue count If nvme.write_queues equals the number of CPUs, the driver had decreased the number of interrupts available such that there could only be one read queue even if the controller could support more. Remove the interrupt count reduction in this case. The driver wouldn't request more IRQs than it wants queues anyway. Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Keith Busch <kbusch@kernel.org>	2019-12-07 02:52:47 +09:00
Keith Busch	17c3316734	nvme/pci Limit write queue sizes to possible cpus The driver can never use more queues of any type than the number of possible CPUs, so a higher value causes the driver to allocate more memory for IO queues than it could ever use. Limit the parameter at module load time to the number of possible cpus. Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Keith Busch <kbusch@kernel.org>	2019-12-07 02:52:42 +09:00
Keith Busch	3f68baf706	nvme/pci: Fix write and poll queue types The number of poll or write queues should never be negative. Use unsigned types so that it's not possible to break have the driver not allocate any queues. Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Keith Busch <kbusch@kernel.org>	2019-12-07 02:52:24 +09:00
Jens Axboe	8539429917	Merge branch 'io_uring-5.5' into for-linus * io_uring-5.5: io_uring: fix a typo in a comment io_uring: hook all linked requests via link_list io_uring: fix error handling in io_queue_link_head io_uring: use hash table for poll command lookups	2019-12-05 19:09:26 -07:00
Justin Tee	ece841abbe	block: fix memleak of bio integrity data `7c20f11680` ("bio-integrity: stop abusing bi_end_io") moves bio_integrity_free from bio_uninit() to bio_integrity_verify_fn() and bio_endio(). This way looks wrong because bio may be freed without calling bio_endio(), for example, blk_rq_unprep_clone() is called from dm_mq_queue_rq() when the underlying queue of dm-mpath is busy. So memory leak of bio integrity data is caused by commit `7c20f11680`. Fixes this issue by re-adding bio_integrity_free() to bio_uninit(). Fixes: `7c20f11680` ("bio-integrity: stop abusing bi_end_io") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by Justin Tee <justin.tee@broadcom.com> Add commit log, and simplify/fix the original patch wroten by Justin. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-05 11:38:36 -07:00
LimingWu	0b4295b5e2	io_uring: fix a typo in a comment thatn -> than. Signed-off-by: Liming Wu <19092205@suning.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-05 07:59:37 -07:00
Hou Tao	08802ed665	bfq-iosched: Ensure bio->bi_blkg is valid before using it bio->bi_blkg will be NULL when the issue of the request has bypassed the block layer as shown in the following oops: Internal error: Oops: 96000005 [#1] SMP CPU: 17 PID: 2996 Comm: scsi_id Not tainted 5.4.0 #4 Call trace: percpu_counter_add_batch+0x38/0x4c8 bfqg_stats_update_legacy_io+0x9c/0x280 bfq_insert_requests+0xbac/0x2190 blk_mq_sched_insert_request+0x288/0x670 blk_execute_rq_nowait+0x140/0x178 blk_execute_rq+0x8c/0x140 sg_io+0x604/0x9c0 scsi_cmd_ioctl+0xe38/0x10a8 scsi_cmd_blk_ioctl+0xac/0xe8 sd_ioctl+0xe4/0x238 blkdev_ioctl+0x590/0x20e0 block_ioctl+0x60/0x98 do_vfs_ioctl+0xe0/0x1b58 ksys_ioctl+0x80/0xd8 __arm64_sys_ioctl+0x40/0x78 el0_svc_handler+0xc4/0x270 so ensure its validity before using it. Fixes: `fd41e60331` ("bfq-iosched: stop using blkg->stat_bytes and ->stat_ios") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-05 07:10:09 -07:00
Pavel Begunkov	4493233edc	io_uring: hook all linked requests via link_list Links are created by chaining requests through req->list with an exception that head uses req->link_list. (e.g. link_list->list->list) Because of that, io_req_link_next() needs complex splicing to advance. Link them all through list_list. Also, it seems to be simpler and more consistent IMHO. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-05 06:54:52 -07:00
Pavel Begunkov	2e6e1fde32	io_uring: fix error handling in io_queue_link_head In case of an error io_submit_sqe() drops a request and continues without it, even if the request was a part of a link. Not only it doesn't cancel links, but also may execute wrong sequence of actions. Stop consuming sqes, and let the user handle errors. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-05 06:54:51 -07:00
Jens Axboe	78076bb64a	io_uring: use hash table for poll command lookups We recently changed this from a single list to an rbtree, but for some real life workloads, the rbtree slows down the submission/insertion case enough so that it's the top cycle consumer on the io_uring side. In testing, using a hash table is a more well rounded compromise. It is fast for insertion, and as long as it's sized appropriately, it works well for the cancellation case as well. Running TAO with a lot of network sockets, this removes io_poll_req_insert() from spending 2% of the CPU cycles. Reported-by: Dan Melnic <dmm@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-04 20:12:58 -07:00
Jens Axboe	08bdcc35f0	io-wq: clear node->next on list deletion If someone removes a node from a list, and then later adds it back to a list, we can have invalid data in ->next. This can cause all sorts of issues. One such use case is the IORING_OP_POLL_ADD command, which will do just that if we race and get woken twice without any pending events. This is a pretty rare case, but can happen under extreme loads. Dan reports that he saw the following crash: BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD d283ce067 P4D d283ce067 PUD e5ca04067 PMD 0 Oops: 0002 [#1] SMP CPU: 17 PID: 10726 Comm: tao:fast-fiber Kdump: loaded Not tainted 5.2.9-02851-gac7bc042d2d1 #116 Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019 RIP: 0010:io_wqe_enqueue+0x3e/0xd0 Code: 34 24 74 55 8b 47 58 48 8d 6f 50 85 c0 74 50 48 89 df e8 35 7c 75 00 48 83 7b 08 00 48 8b 14 24 0f 84 84 00 00 00 48 8b 4b 10 <48> 89 11 48 89 53 10 83 63 20 fe 48 89 c6 48 89 df e8 0c 7a 75 00 RSP: 0000:ffffc90006858a08 EFLAGS: 00010082 RAX: 0000000000000002 RBX: ffff889037492fc0 RCX: 0000000000000000 RDX: ffff888e40cc11a8 RSI: ffff888e40cc11a8 RDI: ffff889037492fc0 RBP: ffff889037493010 R08: 00000000000000c3 R09: ffffc90006858ab8 R10: 0000000000000000 R11: 0000000000000000 R12: ffff888e40cc11a8 R13: 0000000000000000 R14: 00000000000000c3 R15: ffff888e40cc1100 FS: 00007fcddc9db700(0000) GS:ffff88903fa40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000e479f5003 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> io_poll_wake+0x12f/0x2a0 __wake_up_common+0x86/0x120 __wake_up_common_lock+0x7a/0xc0 sock_def_readable+0x3c/0x70 tcp_rcv_established+0x557/0x630 tcp_v6_do_rcv+0x118/0x3c0 tcp_v6_rcv+0x97e/0x9d0 ip6_protocol_deliver_rcu+0xe3/0x440 ip6_input+0x3d/0xc0 ? ip6_protocol_deliver_rcu+0x440/0x440 ipv6_rcv+0x56/0xd0 ? ip6_rcv_finish_core.isra.18+0x80/0x80 __netif_receive_skb_one_core+0x50/0x70 netif_receive_skb_internal+0x2f/0xa0 napi_gro_receive+0x125/0x150 mlx5e_handle_rx_cqe+0x1d9/0x5a0 ? mlx5e_poll_tx_cq+0x305/0x560 mlx5e_poll_rx_cq+0x49f/0x9c5 mlx5e_napi_poll+0xee/0x640 ? smp_reschedule_interrupt+0x16/0xd0 ? reschedule_interrupt+0xf/0x20 net_rx_action+0x286/0x3d0 __do_softirq+0xca/0x297 irq_exit+0x96/0xa0 do_IRQ+0x54/0xe0 common_interrupt+0xf/0xf </IRQ> RIP: 0033:0x7fdc627a2e3a Code: 31 c0 85 d2 0f 88 f6 00 00 00 55 48 89 e5 41 57 41 56 4c 63 f2 41 55 41 54 53 48 83 ec 18 48 85 ff 0f 84 c7 00 00 00 48 8b 07 <41> 89 d4 49 89 f5 48 89 fb 48 85 c0 0f 84 64 01 00 00 48 83 78 10 when running a networked workload with about 5000 sockets being polled for. Fix this by clearing node->next when the node is being removed from the list. Fixes: `6206f0e180` ("io-wq: shrink io_wq_work a bit") Reported-by: Dan Melnic <dmm@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-04 17:26:57 -07:00
Jens Axboe	2d28390aff	io_uring: ensure deferred timeouts copy necessary data If we defer a timeout, we should ensure that we copy the timespec when we have consumed the sqe. This is similar to commit `f67676d160` for read/write requests. We already did this correctly for timeouts deferred as links, but do it generally and use the infrastructure added by commit `1a6b74fc87` instead of having the timeout deferral use its own. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-04 11:12:08 -07:00
Jens Axboe	901e59bba9	io_uring: allow IO_SQE_* flags on IORING_OP_TIMEOUT There's really no reason why we forbid things like link/drain etc on regular timeout commands. Enable the usual SQE flags on timeouts. Reported-by: 李通洲 <carter.li@eoitek.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-04 10:34:03 -07:00
Jens Axboe	bca1c43cb2	null_blk: remove unused variable warning on !CONFIG_BLK_DEV_ZONED If BLK_DEV_ZONED isn't set, 'ret' isn't used. This makes gcc complain, rightfully. Move ret where it is used. Fixes: `979d54475e` ("null_blk: cleanup null_gendisk_register") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-04 09:17:41 -07:00
Ming Lei	f1acbf2186	brd: warn on un-aligned buffer Queue dma alignment limit requires users(fs, target, ...) of block layer to pass aligned buffer. So far brd doesn't support un-aligned buffer, even though it is easy to support it. However, given brd is often used for debug purpose, and there are other drivers which can't support un-aligned buffer too. So add warning so that brd users know what to fix. Reported-by: Stephen Rust <srust@blockbridge.com> Cc: Stephen Rust <srust@blockbridge.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-04 09:04:26 -07:00
Ming Lei	36582a5a45	brd: remove max_hw_sectors queue limit Now we depend on blk_queue_split() to respect most of queue limit (the only one exception could be dma alignment), however blk_queue_split() isn't used for brd, so this limit isn't respected since v4.3. Also max_hw_sectors limit doesn't play a big role for brd, which is added since brd is added to tree for unknown reason. So remove it. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-04 09:04:25 -07:00
SeongJae Park	f9bd84a8a8	xen/blkback: Avoid unmapping unmapped grant pages For each I/O request, blkback first maps the foreign pages for the request to its local pages. If an allocation of a local page for the mapping fails, it should unmap every mapping already made for the request. However, blkback's handling mechanism for the allocation failure does not mark the remaining foreign pages as unmapped. Therefore, the unmap function merely tries to unmap every valid grant page for the request, including the pages not mapped due to the allocation failure. On a system that fails the allocation frequently, this problem leads to following kernel crash. [ 372.012538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001 [ 372.012546] IP: [<ffffffff814071ac>] gnttab_unmap_refs.part.7+0x1c/0x40 [ 372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0 [ 372.012562] Oops: 0002 [#1] SMP [ 372.012566] Modules linked in: act_police sch_ingress cls_u32 ... [ 372.012746] Call Trace: [ 372.012752] [<ffffffff81407204>] gnttab_unmap_refs+0x34/0x40 [ 372.012759] [<ffffffffa0335ae3>] xen_blkbk_unmap+0x83/0x150 [xen_blkback] ... [ 372.012802] [<ffffffffa0336c50>] dispatch_rw_block_io+0x970/0x980 [xen_blkback] ... Decompressing Linux... Parsing ELF... done. Booting the kernel. [ 0.000000] Initializing cgroup subsys cpuset This commit fixes this problem by marking the grant pages of the given request that didn't mapped due to the allocation failure as invalid. Fixes: `c6cc142dac` ("xen-blkback: use balloon pages for all mappings") Reviewed-by: David Woodhouse <dwmw@amazon.de> Reviewed-by: Maximilian Heyne <mheyne@amazon.de> Reviewed-by: Paul Durrant <pdurrant@amazon.co.uk> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: SeongJae Park <sjpark@amazon.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-03 14:03:24 -07:00
Jens Axboe	87f80d623c	io_uring: handle connect -EINPROGRESS like -EAGAIN Right now we return it to userspace, which means the application has to poll for the socket to be writeable. Let's just treat it like -EAGAIN and have io_uring handle it internally, this makes it much easier to use. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-03 11:23:54 -07:00
Christoph Hellwig	6c6b354914	block: set the zone size in blk_revalidate_disk_zones atomically The current zone revalidation code has a major problem in that it doesn't update the zone size and q->nr_zones atomically, leading to a short window where an out of bounds access to the zone arrays is possible. To fix this move the setting of the zone size into the crticial sections blk_revalidate_disk_zones so that it gets updated together with the zone bitmaps and q->nr_zones. This also slightly simplifies the caller as it deducts the zone size from the report_zones. This change also allows to check for a power of two zone size in generic code. Reported-by: Hans Holmberg <hans@owltronix.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-03 10:18:22 -07:00
Christoph Hellwig	ae58954d87	block: don't handle bio based drivers in blk_revalidate_disk_zones bio based drivers only need to update q->nr_zones. Do that manually instead of overloading blk_revalidate_disk_zones to keep that function simpler for the next round of changes that will rely even more on the request based functionality. Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-03 08:51:25 -07:00
Christoph Hellwig	e94f581944	block: allocate the zone bitmaps lazily Allocate the conventional zone bitmap and the sequential zone locking bitmap only when we find a zone of the respective type. This avoids wasting memory on the conventional zone bitmap for devices that only have sequential zones, and will also prepare for other future changes. Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-03 08:51:25 -07:00
Christoph Hellwig	f216fdd77b	block: replace seq_zones_bitmap with conv_zones_bitmap Invert the meaning of seq_zones_bitmap by keeping a bitmap of conventional zones. This allows not having a bitmap for devices that do not have conventional zones. Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-03 08:51:25 -07:00
Christoph Hellwig	9b38bb4b1e	block: simplify blkdev_nr_zones Simplify the arguments to blkdev_nr_zones by passing a gendisk instead of the block_device and capacity. This also removes the need for __blkdev_nr_zones as all callers are outside the fast path and can deal with the additional branch. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-03 08:51:24 -07:00
Christoph Hellwig	bb55628288	block: remove the empty line at the end of blk-zoned.c Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-03 08:51:24 -07:00
Christoph Hellwig	979d54475e	null_blk: cleanup null_gendisk_register Use a saner size calculation, and do a trivial cleanup on the zone revalidation to prepare to future changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-03 08:51:24 -07:00
Damien Le Moal	5c4bd1f40c	null_blk: fix zone size paramter check For zoned=1 mode, the zone size must be a power of 2. Check this not only when the zone size is specified during modprobe, but also when creating a zoned null_blk device using configfs. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-03 08:51:24 -07:00
Keith Busch	f6c4d97b0d	nvme/pci: Remove last_cq_head We had been saving the last_cq_head seen from an interrupt so that a polled queue wouldn't mistakenly trigger spruious interrupt detection. We don't poll interrupt driven queues any more, so saving this value is pointless. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2019-12-04 00:38:06 +09:00
Pavel Begunkov	795ee49c1a	block: optimise bvec_iter_advance() bvec_iter_advance() is quite popular, but compilers fail to do proper alias analysis and optimise it good enough. The assembly is checked for gcc 9.2, x86-64. - remove @iter->bi_size from min(...), as it's always less than @bytes. Modify at the beginning and forget about it. - the compiler isn't able to collapse memory dependencies and remove writes in the loop. Help it by explicitely using local vars. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-03 07:36:38 -07:00
Jackie Liu	8cdda87a44	io_uring: remove io_wq_current_is_worker Since commit `b18fdf71e0` ("io_uring: simplify io_req_link_next()"), the io_wq_current_is_worker function is no longer needed, clean it up. Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-03 07:04:32 -07:00
Jackie Liu	22efde5998	io_uring: remove parameter ctx of io_submit_state_start Parameter ctx we have never used, clean it up. Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-03 07:04:32 -07:00
Jens Axboe	da8c969069	io_uring: mark us with IORING_FEAT_SUBMIT_STABLE If this flag is set, applications can be certain that any data for async offload has been consumed when the kernel has consumed the SQE. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-03 07:04:32 -07:00
Jens Axboe	f499a021ea	io_uring: ensure async punted connect requests copy data Just like commit `f67676d160` for read/write requests, this one ensures that the sockaddr data has been copied for IORING_OP_CONNECT if we need to punt the request to async context. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-03 07:04:30 -07:00
Jens Axboe	03b1230ca1	io_uring: ensure async punted sendmsg/recvmsg requests copy data Just like commit `f67676d160` for read/write requests, this one ensures that the msghdr data is fully copied if we need to punt a recvmsg or sendmsg system call to async context. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-03 07:03:35 -07:00
Jens Axboe	f67676d160	io_uring: ensure async punted read/write requests copy iovec Currently we don't copy the iovecs when we punt to async context. This can be problematic for applications that store the iovec on the stack, as they often assume that it's safe to let the iovec go out of scope as soon as IO submission has been called. This isn't always safe, as we will re-copy the iovec once we're in async context. Make this 100% safe by copying the iovec just once. With this change, applications may safely store the iovec on the stack for all cases. Reported-by: 李通洲 <carter.li@eoitek.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-02 21:33:25 -07:00
Jens Axboe	1a6b74fc87	io_uring: add general async offload context Right now we just copy the sqe for async offload, but we want to store more context across an async punt. In preparation for doing so, put the sqe copy inside a structure that we can expand. With this pointer added, we can get rid of REQ_F_FREE_SQE, as that is now indicated by whether req->io is NULL or not. No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-02 18:49:33 -07:00
Eric Biggers	490547ca2d	block: don't send uevent for empty disk when not invalidating Commit `6917d06899` ("block: merge invalidate_partitions into rescan_partitions") caused a regression where systemd-udevd spins forever using max CPU starting at boot time. It's caused by a behavior change where a KOBJ_CHANGE uevent is now sent in a case where previously it wasn't. Restore the old behavior. Fixes: `6917d06899` ("block: merge invalidate_partitions into rescan_partitions") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-02 18:49:30 -07:00
Jens Axboe	441cdbd544	io_uring: transform send/recvmsg() -ERESTARTSYS to -EINTR We should never return -ERESTARTSYS to userspace, transform it into -EINTR. Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-02 18:49:10 -07:00
Keith Busch	22802bf742	nvme: Namepace identification descriptor list is optional Despite NVM Express specification 1.3 requires a controller claiming to be 1.3 or higher implement Identify CNS 03h (Namespace Identification Descriptor list), the driver doesn't really need this identification in order to use a namespace. The code had already documented in comments that we're not to consider an error to this command. Return success if the controller provided any response to an namespace identification descriptors command. Fixes: `538af88ea7` ("nvme: make nvme_report_ns_ids propagate error back") Link: https://bugzilla.kernel.org/show_bug.cgi?id=205679 Reported-by: Ingo Brunberg <ingo_brunberg@web.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: stable@vger.kernel.org # 5.4+ Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2019-12-03 06:10:00 +09:00
Jens Axboe	0b8c0ec7ee	io_uring: use current task creds instead of allocating a new one syzbot reports: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 9217 Comm: io_uring-sq Not tainted 5.4.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline] RIP: 0010:__validate_creds include/linux/cred.h:187 [inline] RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550 Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c 24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318 RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010 RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849 R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000 R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: io_sq_thread+0x1c7/0xa20 fs/io_uring.c:3274 kthread+0x361/0x430 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 Modules linked in: ---[ end trace f2e1a4307fbe2245 ]--- RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline] RIP: 0010:__validate_creds include/linux/cred.h:187 [inline] RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550 Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c 24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318 RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010 RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849 R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000 R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 which is caused by slab fault injection triggering a failure in prepare_creds(). We don't actually need to create a copy of the creds as we're not modifying it, we just need a reference on the current task creds. This avoids the failure case as well, and propagates the const throughout the stack. Fixes: `181e448d87` ("io_uring: async workers should inherit the user creds") Reported-by: syzbot+5320383e16029ba057ff@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-02 08:50:00 -07:00
Linus Torvalds	72c0870e3a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: - updates to Ilitech driver to support ILI2117 - face lift of st1232 driver to support MT-B protocol - a new driver for i.MX system controller keys - mpr121 driver now supports polling mode - various input drivers have been switched away from input_polled_dev to use polled mode of regular input devices - other assorted cleanups and fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (70 commits) Input: synaptics-rmi4 - fix various V4L2 compliance problems in F54 Input: synaptics - switch another X1 Carbon 6 to RMI/SMbus Input: fix Kconfig indentation Input: imx_sc_key - correct SCU message structure to avoid stack corruption Input: ili210x - optionally show calibrate sysfs attribute Input: ili210x - add resolution to chip operations structure Input: ili210x - do not retrieve/print chip firmware version Input: mms114 - use device_get_match_data Input: ili210x - remove unneeded suspend and resume handlers Input: ili210x - do not unconditionally mark touchscreen as wakeup source Input: ili210x - define and use chip operations structure Input: ili210x - do not set parent device explicitly Input: ili210x - handle errors from input_mt_init_slots() Input: ili210x - switch to using threaded IRQ Input: ili210x - add ILI2117 support dt-bindings: input: touchscreen: ad7879: generic node names in example Input: ar1021 - fix typo in preprocessor macro name Input: synaptics-rmi4 - simplify data read in rmi_f54_work Input: kxtj9 - switch to using polled mode of input devices Input: kxtj9 - switch to using managed resources ...	2019-12-01 18:45:29 -08:00
Linus Torvalds	d10032dd53	libnvdimm for 5.5 - Updates to better support vmalloc space restrictions on PowerPC platforms. - Cleanups to move common sysfs attributes to core 'struct device_type' objects. - Export the 'target_node' attribute (the effective numa node if pmem is marked online) for regions and namespaces. - Miscellaneous fixups and optimizations. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEf41QbsdZzFdA8EfZHtKRamZ9iAIFAl3hZEAACgkQHtKRamZ9 iAJ9Sg/+MVuwazQyL8dLpvEl534SncDjurTCrRE9SOHhMXGp78AN3t6zDKB2sr2Y /iE4gSvg6DTj2xI2Hg1KFh5AMiSOtI8qJkhb2IL+cbmGhfYpwKWnQUStkoMMZpxJ sCEsk1js0KsGRkPDCayDGosrzKoO0K2VKVY/kGgFdP9cEOhm/H6CVNARrkDtZDzD P9GQ+7VCTjS2OLCFHVECdsDQD1XfzL6pW8GW2f/WpKy7NbxaNG3FFTZ5NOFUh+v6 5VZaOXFIPo8DCot+K2bXJgtWDqVU4TscRoEJcFM8G74Ggi7L1gG84lA/1IABfg16 GFYQ3qaKlyE9mvy147FZvHzIHDTx/TT5WNB8Efoy61xiH+ACtlu5ss1GksX+7Pl8 CPLrM2vy0dgSCJ65qOe9/ztoohj+7Xidx9roctx3gtRSURq6txsIzmhG4rn7bdRx s7VGz4Ov4VhrdA1ILCDMGr2Rm8yjf2RnhEj8IzA7e4VqsQ59/hRbXZNm6jmFdkyU zNbq8m5Y2Y1bOTcxYMIRS9xEdcbRIv1PyZ8ByvwpvbW1zSFbRYmZNhmG531pkUSU tIBpVWTmcsxvvKIL+LkZHQ+jzrE2wOeQtFIKZedDKKBKw9YxaYAfSEldQQfI3FrX 7GruA2ipU787bDX1K/QChnEbGk0R9nBo3ET/0vsnCCtEJADs794= =ITT3 -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "The highlight this cycle is continuing integration fixes for PowerPC and some resulting optimizations. Summary: - Updates to better support vmalloc space restrictions on PowerPC platforms. - Cleanups to move common sysfs attributes to core 'struct device_type' objects. - Export the 'target_node' attribute (the effective numa node if pmem is marked online) for regions and namespaces. - Miscellaneous fixups and optimizations" * tag 'libnvdimm-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits) MAINTAINERS: Remove Keith from NVDIMM maintainers libnvdimm: Export the target_node attribute for regions and namespaces dax: Add numa_node to the default device-dax attributes libnvdimm: Simplify root read-only definition for the 'resource' attribute dax: Simplify root read-only definition for the 'resource' attribute dax: Create a dax device_type libnvdimm: Move nvdimm_bus_attribute_group to device_type libnvdimm: Move nvdimm_attribute_group to device_type libnvdimm: Move nd_mapping_attribute_group to device_type libnvdimm: Move nd_region_attribute_group to device_type libnvdimm: Move nd_numa_attribute_group to device_type libnvdimm: Move nd_device_attribute_group to device_type libnvdimm: Move region attribute group definition libnvdimm: Move attribute groups to device type libnvdimm: Remove prototypes for nonexistent functions libnvdimm/btt: fix variable 'rc' set but not used libnvdimm/pmem: Delete include of nd-core.h libnvdimm/namespace: Differentiate between probe mapping and runtime mapping libnvdimm/pfn_dev: Don't clear device memmap area during generic namespace probe libnvdimm: Trivial comment fix ...	2019-12-01 18:43:25 -08:00
Linus Torvalds	43fd4bd72c	mailbox changes for v5.5 - omap : misc - catch error returned from pm_runtime_put_sync - hisi : misc - drop .owner from platform_driver - stm : change how wakeup is handled - imx : fix - bailout on error and nuke correct irq - imx : add support for imx7ulp platform -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE6EwehDt/SOnwFyTyf9lkf8eYP5UFAl3jUDYACgkQf9lkf8eY P5VVsQ//drP3rGcUKUzIh0zhO478YlpjNy/kTZ93rUAFmN9JSL/MzBexM8fTcTrD O+0JAU8P4a+qANxqKpRUxDsLROaD8Vq8q+oSGUgtr8y/XGDlVb4O74LLQy/277U8 FFDzUkQYtYsRBOv6fQU3u5cZt4Q9soVFU/+w10sF3Px4UiZOox+qsBKP5eik6xBe EaxMM8ear03ZA3nHq/EK3aHEXfNSu3xYfusS4BLEIFenmAaVRl9pCTcEdkQRoDea Y6cUWX7KgJWn8+cghtLmHHMz154Q5j+6CqFnEPRZbP/JvD9sHqFPRNxYlec/i+yL gQagKlj3OdCLUzbolxayhUO/SDmTHeUsvDClZMy+CpROUKOmm1yAcDlnsXmP6wtg LTGHB/WFtYFNBMko9nBDZS2HyzUtEo9s7uHjTktXn0MRJvadifrJPB+eYaXZADyA wsbbqbxn4DFeATw24mYBcNAkdLxsXYOO0a7Gl2v+NPnh6aNHcmxeiFUF2jAa8o4N ACXeQiUofMwKahIQu228zBPgT35JqhJJpPqesYoce1TmaHQiCqiocf0wv3aENFrZ WiEFLQ/N6k7NSeXo8b1KA5XiqA8Sry9SJlEeJfv0uakNPXihsv+cC5a6EANzrDzM ahZ6KXJ4vIYrpCl7o7wcZkXydjlOffgB+6qCXpKOWJBxayjmF08= =KJ0/ -----END PGP SIGNATURE----- Merge tag 'mailbox-v5.5' of git://git.linaro.org/landing-teams/working/fujitsu/integration Pull mailbox updates from Jassi Brar: - omap : misc - catch error returned from pm_runtime_put_sync - hisi : misc - drop .owner from platform_driver - stm : change how wakeup is handled - imx : fix - bailout on error and nuke correct irq - imx : add support for imx7ulp platform * tag 'mailbox-v5.5' of git://git.linaro.org/landing-teams/working/fujitsu/integration: mailbox: imx: add support for imx v1 mu dt-bindings: mailbox: imx-mu: add imx7ulp MU support mailbox: imx: Clear the right interrupts at shutdown mailbox: imx: Fix Tx doorbell shutdown path mailbox: stm32-ipcc: Update wakeup management mailbox: no need to set .owner platform_driver_register mailbox/omap: Handle if CONFIG_PM is disabled	2019-12-01 18:42:02 -08:00

1 2 3 4 5 ...

883522 Commits