linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-11 04:18:39 +08:00

Author	SHA1	Message	Date
Eric Dumazet	56440d7ec2	genetlink: hold RCU in genlmsg_mcast() While running net selftests with CONFIG_PROVE_RCU_LIST=y I saw one lockdep splat [1]. genlmsg_mcast() uses for_each_net_rcu(), and must therefore hold RCU. Instead of letting all callers guard genlmsg_multicast_allns() with a rcu_read_lock()/rcu_read_unlock() pair, do it in genlmsg_mcast(). This also means the @flags parameter is useless, we need to always use GFP_ATOMIC. [1] [10882.424136] ============================= [10882.424166] WARNING: suspicious RCU usage [10882.424309] 6.12.0-rc2-virtme #1156 Not tainted [10882.424400] ----------------------------- [10882.424423] net/netlink/genetlink.c:1940 RCU-list traversed in non-reader section!! [10882.424469] other info that might help us debug this: [10882.424500] rcu_scheduler_active = 2, debug_locks = 1 [10882.424744] 2 locks held by ip/15677: [10882.424791] #0: ffffffffb6b491b0 (cb_lock){++++}-{3:3}, at: genl_rcv (net/netlink/genetlink.c:1219) [10882.426334] #1: ffffffffb6b49248 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg (net/netlink/genetlink.c:61 net/netlink/genetlink.c:57 net/netlink/genetlink.c:1209) [10882.426465] stack backtrace: [10882.426805] CPU: 14 UID: 0 PID: 15677 Comm: ip Not tainted 6.12.0-rc2-virtme #1156 [10882.426919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [10882.427046] Call Trace: [10882.427131] <TASK> [10882.427244] dump_stack_lvl (lib/dump_stack.c:123) [10882.427335] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822) [10882.427387] genlmsg_multicast_allns (net/netlink/genetlink.c:1940 (discriminator 7) net/netlink/genetlink.c:1977 (discriminator 7)) [10882.427436] l2tp_tunnel_notify.constprop.0 (net/l2tp/l2tp_netlink.c:119) l2tp_netlink [10882.427683] l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:253) l2tp_netlink [10882.427748] genl_family_rcv_msg_doit (net/netlink/genetlink.c:1115) [10882.427834] genl_rcv_msg (net/netlink/genetlink.c:1195 net/netlink/genetlink.c:1210) [10882.427877] ? __pfx_l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:186) l2tp_netlink [10882.427927] ? __pfx_genl_rcv_msg (net/netlink/genetlink.c:1201) [10882.427959] netlink_rcv_skb (net/netlink/af_netlink.c:2551) [10882.428069] genl_rcv (net/netlink/genetlink.c:1220) [10882.428095] netlink_unicast (net/netlink/af_netlink.c:1332 net/netlink/af_netlink.c:1357) [10882.428140] netlink_sendmsg (net/netlink/af_netlink.c:1901) [10882.428210] ____sys_sendmsg (net/socket.c:729 (discriminator 1) net/socket.c:744 (discriminator 1) net/socket.c:2607 (discriminator 1)) Fixes: `33f72e6f0c` ("l2tp : multicast notification to the registered listeners") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Chapman <jchapman@katalix.com> Cc: Tom Parkin <tparkin@katalix.com> Cc: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20241011171217.3166614-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-15 17:52:58 -07:00
Kees Cook	caf22c969e	scsi: target: tcmu: Annotate struct tcmu_tmr with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct tcmu_tmr. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Bodo Stroesser <bostroesser@gmail.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: linux-scsi@vger.kernel.org Cc: target-devel@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230922175300.work.148-kees@kernel.org Reviewed-by: "Gustavo A. R. Silva" <gustavoars@kernel.org> Reviewed-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-09-27 11:28:48 -04:00
Azeem Shaikh	4b2e28758d	scsi: target: tcmu: Replace strlcpy() with strscpy() strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). No return values were used, so direct replacement is safe. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Link: https://lore.kernel.org/r/20230621030033.3800351-3-azeemshaikh38@gmail.com Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-06-21 21:13:00 -04:00
Suren Baghdasaryan	1c71222e5f	mm: replace vma->vm_flags direct modifications with modifier calls Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. [akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo] Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-09 16:51:39 -08:00
Jakub Kicinski	9c5d03d362	genetlink: start to validate reserved header bytes We had historically not checked that genlmsghdr.reserved is 0 on input which prevents us from using those precious bytes in the future. One use case would be to extend the cmd field, which is currently just 8 bits wide and 256 is not a lot of commands for some core families. To make sure that new families do the right thing by default put the onus of opting out of validation on existing families. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel) Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-29 12:47:15 +01:00
Bodo Stroesser	325d5c5fb2	scsi: target: tcmu: Avoid holding XArray lock when calling lock_page In tcmu_blocks_release(), lock_page() is called to prevent a race causing possible data corruption. Since lock_page() might sleep, calling it while holding XArray lock is a bug. To fix this, replace the xas_for_each() call with xa_for_each_range(). Since the latter does its own handling of XArray locking, the xas_lock() and xas_unlock() calls around the original loop are no longer necessary. The switch to xa_for_each_range() slows down the loop slightly. This is acceptable since tcmu_blocks_release() is not relevant for performance. Link: https://lore.kernel.org/r/20220517192913.21405-1-bostroesser@gmail.com Fixes: `bb9b9eb0ae` ("scsi: target: tcmu: Fix possible data corruption") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-05-19 20:16:27 -04:00
Xiaoguang Wang	bb9b9eb0ae	scsi: target: tcmu: Fix possible data corruption When tcmu_vma_fault() gets a page successfully, before the current context completes page fault procedure, find_free_blocks() may run and call unmap_mapping_range() to unmap the page. Assume that when find_free_blocks() initially completes and the previous page fault procedure starts to run again and completes, then one truncated page has been mapped to userspace. But note that tcmu_vma_fault() has gotten a refcount for the page so any other subsystem won't be able to use the page unless the userspace address is unmapped later. If another command subsequently runs and needs to extend dbi_thresh it may reuse the corresponding slot for the previous page in data_bitmap. Then though we'll allocate new page for this slot in data_area, no page fault will happen because we have a valid map and the real request's data will be lost. Filesystem implementations will also run into this issue but they usually lock the page when vm_operations_struct->fault gets a page and unlock the page after finish_fault() completes. For truncate filesystems lock pages in truncate_inode_pages() to protect against racing wrt. page faults. To fix this possible data corruption scenario we can apply a method similar to the filesystems. For pages that are to be freed, tcmu_blocks_release() locks and unlocks. Make tcmu_vma_fault() also lock found page under cmdr_lock. At the same time, since tcmu_vma_fault() gets an extra page refcount, tcmu_blocks_release() won't free pages if pages are in page fault procedure, which means it is safe to call tcmu_blocks_release() before unmap_mapping_range(). With these changes tcmu_blocks_release() will wait for all page faults to be completed before calling unmap_mapping_range(). And later, if unmap_mapping_range() is called, it will ensure stale mappings are removed. Link: https://lore.kernel.org/r/20220421023735.9018-1-xiaoguang.wang@linux.alibaba.com Reviewed-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-05-02 16:59:11 -04:00
Xiaoguang Wang	a6968f7a36	scsi: target: tcmu: Fix possible page UAF tcmu_try_get_data_page() looks up pages under cmdr_lock, but it does not take refcount properly and just returns page pointer. When tcmu_try_get_data_page() returns, the returned page may have been freed by tcmu_blocks_release(). We need to get_page() under cmdr_lock to avoid concurrent tcmu_blocks_release(). Link: https://lore.kernel.org/r/20220311132206.24515-1-xiaoguang.wang@linux.alibaba.com Reviewed-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-03-29 23:07:56 -04:00
Guixin Liu	c7ede4f044	scsi: target: tcmu: Make cmd_ring_size changeable via configfs Make cmd_ring_size changeable similar to the way it is done for max_data_area_mb. The reason is that our tcmu client will create thousands of tcmu instances, and this will consume lots of mem with default 8Mb cmd ring size for every backstore. One can change the value by typing: echo "cmd_ring_size_mb=N" > control The "N" is a integer between 1 to 8, if set 1, the cmd ring can hold about 6k cmds(tcmu_cmd_entry about 176 byte) at least. The value is printed when doing: cat info In addition, a new readonly attribute 'cmd_ring_size_mb' returns the value in read. Link: https://lore.kernel.org/r/1644978109-14885-1-git-send-email-kanie@linux.alibaba.com Reviewed-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Reviewed-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-02-22 21:11:07 -05:00
Bodo Stroesser	1d2ac7b69d	scsi: target: tcmu: Allocate zeroed pages for data area Tcmu populates the data area (used for communication with userspace) with pages that are allocated by calling alloc_page(GFP_NOIO). Therefore previous content of the allocated pages is exposed to user space. Avoid this by adding __GFP_ZERO flag. Zeroing the pages does (nearly) not affect tcmu throughput, because allocated pages are re-used for the data transfers of later SCSI cmds. Link: https://lore.kernel.org/r/20211013171606.25197-1-bostroesser@gmail.com Signed-off-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-10-18 22:38:35 -04:00
Gustavo A. R. Silva	c20bda3419	scsi: target: tcmu: Use struct_size() helper in kmalloc() Make use of the struct_size() helper instead of an open-coded version, in order to avoid any potential type mistakes or integer overflows that, in the worst scenario, could lead to heap overflows. Link: https://github.com/KSPP/linux/issues/160 Link: https://lore.kernel.org/r/20210927224344.GA190701@embeddedor Reviewed-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-10-04 23:29:38 -04:00
Bodo Stroesser	018c14911d	scsi: target: tcmu: Add new feature KEEP_BUF When running command pipelining for WRITE direction commands (e.g. tape device write), userspace sends cmd completion to cmd ring before processing write data. In that case userspace has to copy data before sending completion, because cmd completion also implicitly releases the data buffer in data area. The new feature KEEP_BUF allows userspace to optionally keep the buffer after completion by setting new bit TCMU_UFLAG_KEEP_BUF in tcmu_cmd_entry_hdr->uflags. In that case buffer has to be released explicitly by writing the cmd_id to new action item free_kept_buf. All kept buffers are released during reset_ring and if userspace closes uio device (tcmu_release). Link: https://lore.kernel.org/r/20210713175021.20103-1-bostroesser@gmail.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-08-03 07:27:42 -04:00
Linus Torvalds	bd31b9efbf	SCSI misc on 20210702 This series consists of the usual driver updates (ufs, ibmvfc, megaraid_sas, lpfc, elx, mpi3mr, qedi, iscsi, storvsc, mpt3sas) with elx and mpi3mr being new drivers. The major core change is a rework to drop the status byte handling macros and the old bit shifted definitions and the rest of the updates are minor fixes. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYN7I6iYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishXpRAQCkngYZ 35yQrqOxgOk2pfrysE95tHrV1MfJm2U49NFTwAEAuZutEvBUTfBF+sbcJ06r6q7i H0hkJN/Io7enFs5v3WA= =zwIa -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This series consists of the usual driver updates (ufs, ibmvfc, megaraid_sas, lpfc, elx, mpi3mr, qedi, iscsi, storvsc, mpt3sas) with elx and mpi3mr being new drivers. The major core change is a rework to drop the status byte handling macros and the old bit shifted definitions and the rest of the updates are minor fixes" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (287 commits) scsi: aha1740: Avoid over-read of sense buffer scsi: arcmsr: Avoid over-read of sense buffer scsi: ips: Avoid over-read of sense buffer scsi: ufs: ufs-mediatek: Add missing of_node_put() in ufs_mtk_probe() scsi: elx: libefc: Fix IRQ restore in efc_domain_dispatch_frame() scsi: elx: libefc: Fix less than zero comparison of a unsigned int scsi: elx: efct: Fix pointer error checking in debugfs init scsi: elx: efct: Fix is_originator return code type scsi: elx: efct: Fix link error for _bad_cmpxchg scsi: elx: efct: Eliminate unnecessary boolean check in efct_hw_command_cancel() scsi: elx: efct: Do not use id uninitialized in efct_lio_setup_session() scsi: elx: efct: Fix error handling in efct_hw_init() scsi: elx: efct: Remove redundant initialization of variable lun scsi: elx: efct: Fix spelling mistake "Unexected" -> "Unexpected" scsi: lpfc: Fix build error in lpfc_scsi.c scsi: target: iscsi: Remove redundant continue statement scsi: qla4xxx: Remove redundant continue statement scsi: ppa: Switch to use module_parport_driver() scsi: imm: Switch to use module_parport_driver() scsi: mpt3sas: Fix error return value in _scsih_expander_add() ...	2021-07-02 15:14:36 -07:00
kernel test robot	824731258b	scsi: target: tcmu: Fix boolreturn.cocci warnings drivers/target/target_core_user.c:1424:9-10: WARNING: return of 0/1 in function 'tcmu_handle_completions' with return type bool Return statements in functions returning bool should use true/false instead of 1/0. Generated by: scripts/coccinelle/misc/boolreturn.cocci Link: https://lore.kernel.org/r/20210515230358.GA97544@60d1edce16e0 Fixes: `9814b55cde` ("scsi: target: tcmu: Return from tcmu_handle_completions() if cmd_id not found") CC: Bodo Stroesser <bostroesser@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: kernel test robot <lkp@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-05-21 17:14:21 -04:00
Bodo Stroesser	b4150b6881	scsi: target: tcmu: Fix xarray RCU warning Commit `f5ce815f34` ("scsi: target: tcmu: Support DATA_BLOCK_SIZE = N * PAGE_SIZE") introduced xas_next() calls to iterate xarray elements. These calls triggered the WARNING "suspicious RCU usage" at tcmu device set up [1]. In the call stack of xas_next(), xas_load() was called. According to its comment, this function requires "the xa_lock or the RCU lock". To avoid the warning: - Guard the small loop calling xas_next() in tcmu_get_empty_block with RCU lock. - In the large loop in tcmu_copy_data using RCU lock would possibly disable preemtion for a long time (copy multi MBs). Therefore replace XA_STATE, xas_set and xas_next with a single xa_load. [1] [ 1899.867091] ============================= [ 1899.871199] WARNING: suspicious RCU usage [ 1899.875310] 5.13.0-rc1+ #41 Not tainted [ 1899.879222] ----------------------------- [ 1899.883299] include/linux/xarray.h:1182 suspicious rcu_dereference_check() usage! [ 1899.890940] other info that might help us debug this: [ 1899.899082] rcu_scheduler_active = 2, debug_locks = 1 [ 1899.905719] 3 locks held by kworker/0:1/1368: [ 1899.910161] #0: ffffa1f8c8b98738 ((wq_completion)target_submission){+.+.}-{0:0}, at: process_one_work+0x1ee/0x580 [ 1899.920732] #1: ffffbd7040cd7e78 ((work_completion)(&q->sq.work)){+.+.}-{0:0}, at: process_one_work+0x1ee/0x580 [ 1899.931146] #2: ffffa1f8d1c99768 (&udev->cmdr_lock){+.+.}-{3:3}, at: tcmu_queue_cmd+0xea/0x160 [target_core_user] [ 1899.941678] stack backtrace: [ 1899.946093] CPU: 0 PID: 1368 Comm: kworker/0:1 Not tainted 5.13.0-rc1+ #41 [ 1899.953070] Hardware name: System manufacturer System Product Name/PRIME Z270-A, BIOS 1302 03/15/2018 [ 1899.962459] Workqueue: target_submission target_queued_submit_work [target_core_mod] [ 1899.970337] Call Trace: [ 1899.972839] dump_stack+0x6d/0x89 [ 1899.976222] xas_descend+0x10e/0x120 [ 1899.979875] xas_load+0x39/0x50 [ 1899.983077] tcmu_get_empty_blocks+0x115/0x1c0 [target_core_user] [ 1899.989318] queue_cmd_ring+0x1da/0x630 [target_core_user] [ 1899.994897] ? rcu_read_lock_sched_held+0x3f/0x70 [ 1899.999695] ? trace_kmalloc+0xa6/0xd0 [ 1900.003501] ? __kmalloc+0x205/0x380 [ 1900.007167] tcmu_queue_cmd+0x12f/0x160 [target_core_user] [ 1900.012746] __target_execute_cmd+0x23/0xa0 [target_core_mod] [ 1900.018589] transport_generic_new_cmd+0x1f3/0x370 [target_core_mod] [ 1900.025046] transport_handle_cdb_direct+0x34/0x50 [target_core_mod] [ 1900.031517] target_queued_submit_work+0x43/0xe0 [target_core_mod] [ 1900.037837] process_one_work+0x268/0x580 [ 1900.041952] ? process_one_work+0x580/0x580 [ 1900.046195] worker_thread+0x55/0x3b0 [ 1900.049921] ? process_one_work+0x580/0x580 [ 1900.054192] kthread+0x143/0x160 [ 1900.057499] ? kthread_create_worker_on_cpu+0x40/0x40 [ 1900.062661] ret_from_fork+0x1f/0x30 Link: https://lore.kernel.org/r/20210519135440.26773-1-bostroesser@gmail.com Fixes: `f5ce815f34` ("scsi: target: tcmu: Support DATA_BLOCK_SIZE = N * PAGE_SIZE") Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-05-21 16:15:48 -04:00
Bodo Stroesser	3ac0fcb4b1	scsi: target: tcmu: Rename TCM_DEV_BIT_PLUGGED to TCMU_DEV_BIT_PLUGGED The bit definition TCM_DEV_BIT_PLUGGED should correctly be named TCMU_DEV_BIT_PLUGGED, since all other bits in the same bitfield have prefix TCMU_. Link: https://lore.kernel.org/r/20210512140654.31249-1-bostroesser@gmail.com Signed-off-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-05-14 22:58:27 -04:00
Bodo Stroesser	9814b55cde	scsi: target: tcmu: Return from tcmu_handle_completions() if cmd_id not found If tcmu_handle_completions() finds an invalid cmd_id while looping over cmd responses from userspace it sets TCMU_DEV_BIT_BROKEN and breaks the loop. This means that it does further handling for the tcmu device. Skip that handling by replacing 'break' with 'return'. Additionally change tcmu_handle_completions() from unsigned int to bool, since the value used in return already is bool. Link: https://lore.kernel.org/r/20210423150123.24468-1-bostroesser@gmail.com Signed-off-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-04-28 22:55:00 -04:00
Bodo Stroesser	08976cb548	scsi: target: tcmu: Make data_pages_per_blk changeable via configfs Make data_pages_per_blk changeable similar to the way it is done for max_data_area_mb. One can change the value by typing: echo "data_pages_per_blk=N" >control The value is printed when doing: cat info In addition, a new readonly attribute 'data_pages_per_blk' returns the value on read. Link: https://lore.kernel.org/r/20210324195758.2021-7-bostroesser@gmail.com Signed-off-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-04-12 22:40:56 -04:00
Bodo Stroesser	e719afdcf6	scsi: target: tcmu: Replace block size definitions with new udev members Replace DATA_PAGES_PER_BLK and DATA_BLOCK_SIZE with new struct elements tcmu_dev->data_pages_per_blk and tcmu_dev->data_blk_size. These new variables are still loaded with constant definition DATA_PAGES_PER_BLK_DEF (= 1) and DATA_PAGES_PER_BLK_DEF * PAGE_SIZE. There is no way yet to set the values via configfs. Link: https://lore.kernel.org/r/20210324195758.2021-6-bostroesser@gmail.com Signed-off-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-04-12 22:40:56 -04:00
Bodo Stroesser	3722e36c4e	scsi: target: tcmu: Remove function tcmu_get_block_page() There is only one caller of tcmu_get_block_page left. Since it is a one-liner, we can remove the function. Link: https://lore.kernel.org/r/20210324195758.2021-5-bostroesser@gmail.com Signed-off-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-04-12 22:40:55 -04:00
Bodo Stroesser	f5ce815f34	scsi: target: tcmu: Support DATA_BLOCK_SIZE = N * PAGE_SIZE Change tcmu to support DATA_BLOCK_SIZE being a multiple of PAGE_SIZE. There are two reasons why one would like to have a bigger DATA_BLOCK_SIZE: 1) If userspace - e.g. due to data compression, encryption or deduplication - needs to have receive or transmit data in a consecutive buffer, we can define DATA_BLOCK_SIZE to the maximum size of a SCSI READ/WRITE to enforce that userspace sees just one consecutive buffer. That way we can avoid the need for doing data copy in userspace. 2) Using a bigger data block size can speed up command processing in tcmu. The number of free data blocks to look up in bitmap is reduced substantially. The lookup for data pages in radix_tree can be done more efficiently if there are multiple pages in a data block. The maximum number of IOVs to set up is lower so cmd entries in the ring become smaller. Link: https://lore.kernel.org/r/20210324195758.2021-4-bostroesser@gmail.com Signed-off-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-04-12 22:40:55 -04:00
Bodo Stroesser	8b084d9dfb	scsi: target: tcmu: Prepare for PAGE_SIZE != DATA_BLOCK_SIZE Rename some variables and definitions as a first preparation for DATA_BLOCK_SIZE != PAGE_SIZE and add the new DATA_PAGES_PER_BLK definition containing the number of pages per data block. Rename tcmu_try_get_block_page() to tcmu_try_get_data_page(). Keep name tcmu_get_block_page() since it will go away in a following commit when there is only one caller left. Subsequent commits will then add full support for DATA_PAGES_PER_BLK != 1, which also means DATA_BLOCK_SIZE = DATA_PAGES_PER_BLK * PAGE_SIZE Link: https://lore.kernel.org/r/20210324195758.2021-3-bostroesser@gmail.com Signed-off-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-04-12 22:40:55 -04:00
Bodo Stroesser	ecddbb7e94	scsi: target: tcmu: Adjust names of variables and definitions Some definitions and members of struct tcmu_dev had misleading names. Examples: - ring_size was used for the size of mailbox + cmd ring + data area - CMDR_SIZE was used for size of mailbox + cmd ring I added the new definition MB_CMDR_SIZE (mailbox + command ring), changed CMDR_SIZE to hold the size of the command ring only and replaced in struct tcmu_dev the member ring_size with mmap_pages, because the member is now used in tcmu_mmap() only, where we need page count, not size. I also added the new struct tcmu_dev member 'cmdr' which is used to replace some occurences of '(void *)mb + CMDR_OFF' with 'udev->cmdr' for better readability. Link: https://lore.kernel.org/r/20210324195758.2021-2-bostroesser@gmail.com Signed-off-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-04-12 22:40:55 -04:00
Bodo Stroesser	471ee95ccc	scsi: target: tcmu: Adjust parameter in call to tcmu_blocks_release() In commit `f7c89771d0` ("scsi: target: tcmu: Replace radix_tree with XArray") the meaning of last parameter of tcmu_blocks_release() was changed. So in the callers we should subtract 1 from the previous parameter. Unfortunately that change got lost at one of the two places where tcmu_blocks_release() is called. That does not lead to any problems, but we should adjust it anyway. Link: https://lore.kernel.org/r/20210310184458.10741-1-bostroesser@gmail.com Signed-off-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-03-15 23:18:37 -04:00
Bodo Stroesser	1080782f13	scsi: target: tcmu: Use GFP_NOIO while handling cmds or holding cmdr_lock Especially when using tcmu with tcm_loop, memory allocations with GFP_KERNEL for a LUN can cause write back to the same LUN. So we have to use GFP_NOIO when allocation is done while handling commands or while holding cmdr_lock. Link: https://lore.kernel.org/r/20210305190009.32242-1-bostroesser@gmail.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-03-09 23:47:17 -05:00
Bodo Stroesser	f7c89771d0	scsi: target: tcmu: Replace radix_tree with XArray An attempt from Matthew Wilcox to replace radix-tree usage by XArray in tcmu more than 1 year ago unfortunately got lost. I rebased that work on latest tcmu and tested it. Link: https://lore.kernel.org/r/20210224185335.13844-3-bostroesser@gmail.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-03-09 23:47:17 -05:00
Bodo Stroesser	d3cbb743c3	scsi: target: tcmu: Replace IDR by XArray An attempt from Matthew Wilcox to replace IDR usage by XArray in tcmu more than 1 year ago unfortunately got lost. I rebased that work on latest tcmu and tested it. Link: https://lore.kernel.org/r/20210224185335.13844-2-bostroesser@gmail.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-03-09 23:47:17 -05:00
Mike Christie	6888da8179	scsi: target: tcmu: Add backend plug/unplug callouts This patch adds plug/unplug callouts for tcmu, so we can avoid the number of times we switch to userspace. Using this driver with tcm_loop is a common config, and dependng on the nr_hw_queues (nr_hw_queues=1 performs much better) and fio jobs (lower num jobs around 4) this patch can increase IOPS by only around 5-10% because we hit other issues like the big per tcmu device mutex. Link: https://lore.kernel.org/r/20210227170006.5077-24-michael.christie@oracle.com Reviewed-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-03-04 17:37:02 -05:00
Bodo Stroesser	8f33bb2400	scsi: target: tcmu: Fix memory leak caused by wrong uio usage When user deletes a tcmu device via configFS, tcmu calls uio_unregister_device(). During that call uio resets its pointer to struct uio_info provided by tcmu. That means, after uio_unregister_device() uio will no longer execute any of the callbacks tcmu had set in uio_info. Especially, if userspace daemon still holds the corresponding uio device open or mmap'ed while tcmu calls uio_unregister_device(), uio will not call tcmu_release() when userspace finally closes and munmaps the uio device. Since tcmu does refcounting for the tcmu device in tcmu_open() and tcmu_release(), in the decribed case refcount does not drop to 0 and tcmu does not free tcmu device's resources. In extreme cases this can cause memory leaking of up to 1 GB for a single tcmu device. After uio_unregister_device(), uio will reject every open, read, write, mmap from userspace with -EOI. But userspace daemon can still access the mmap'ed command ring and data area. Therefore tcmu should wait until userspace munmaps the uio device before it frees the resources, as we don't want to cause SIGSEGV or SIGBUS to user space. That said, current refcounting during tcmu_open and tcmu_release does not work correctly, and refcounting better should be done in the open and close callouts of the vm_operations_struct, which tcmu assigns to each mmap of the uio device (because it wants its own page fault handler). This patch fixes the memory leak by removing refcounting from tcmu_open and tcmu_close, and instead adding new tcmu_vma_open() and tcmu_vma_close() handlers that only do refcounting. Link: https://lore.kernel.org/r/20210218175039.7829-3-bostroesser@gmail.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-02-22 22:35:21 -05:00
Bodo Stroesser	43bf922cdd	scsi: target: tcmu: Move some functions without code change This patch just moves one block of code containing some functions inside target_core_user.c to avoid adding prototypes in next patch. Link: https://lore.kernel.org/r/20210218175039.7829-2-bostroesser@gmail.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-02-22 22:35:21 -05:00
Shin'ichiro Kawasaki	780e138468	scsi: target: tcmu: Fix use-after-free of se_cmd->priv Commit `a35129024e` ("scsi: target: tcmu: Use priv pointer in se_cmd") modified tcmu_free_cmd() to set NULL to priv pointer in se_cmd. However, se_cmd can be already freed by work queue triggered in target_complete_cmd(). This caused BUG KASAN use-after-free [1]. To fix the bug, do not touch priv pointer in tcmu_free_cmd(). Instead, set NULL to priv pointer before target_complete_cmd() calls. Also, to avoid unnecessary priv pointer change in tcmu_queue_cmd(), modify priv pointer in the function only when tcmu_free_cmd() is not called. [1] BUG: KASAN: use-after-free in tcmu_handle_completions+0x1172/0x1770 [target_core_user] Write of size 8 at addr ffff88814cf79a40 by task cmdproc-uio0/14842 CPU: 2 PID: 14842 Comm: cmdproc-uio0 Not tainted 5.11.0-rc2 #1 Hardware name: Supermicro Super Server/X10SRL-F, BIOS 3.2 11/22/2019 Call Trace: dump_stack+0x9a/0xcc ? tcmu_handle_completions+0x1172/0x1770 [target_core_user] print_address_description.constprop.0+0x18/0x130 ? tcmu_handle_completions+0x1172/0x1770 [target_core_user] ? tcmu_handle_completions+0x1172/0x1770 [target_core_user] kasan_report.cold+0x7f/0x10e ? tcmu_handle_completions+0x1172/0x1770 [target_core_user] tcmu_handle_completions+0x1172/0x1770 [target_core_user] ? queue_tmr_ring+0x5d0/0x5d0 [target_core_user] tcmu_irqcontrol+0x28/0x60 [target_core_user] uio_write+0x155/0x230 ? uio_vma_fault+0x460/0x460 ? security_file_permission+0x4f/0x440 vfs_write+0x1ce/0x860 ksys_write+0xe9/0x1b0 ? __ia32_sys_read+0xb0/0xb0 ? syscall_enter_from_user_mode+0x27/0x70 ? trace_hardirqs_on+0x1c/0x110 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fcf8b61905f Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 b9 fc ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 0c fd ff ff 48 RSP: 002b:00007fcf7b3e6c30 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcf8b61905f RDX: 0000000000000004 RSI: 00007fcf7b3e6c78 RDI: 000000000000000c RBP: 00007fcf7b3e6c80 R08: 0000000000000000 R09: 00007fcf7b3e6aa8 R10: 000000000b01c000 R11: 0000000000000293 R12: 00007ffe0c32a52e R13: 00007ffe0c32a52f R14: 0000000000000000 R15: 00007fcf7b3e7640 Allocated by task 383: kasan_save_stack+0x1b/0x40 ____kasan_kmalloc.constprop.0+0x84/0xa0 kmem_cache_alloc+0x142/0x330 tcm_loop_queuecommand+0x2a/0x4e0 [tcm_loop] scsi_queue_rq+0x12ec/0x2d20 blk_mq_dispatch_rq_list+0x30a/0x1db0 __blk_mq_do_dispatch_sched+0x326/0x830 __blk_mq_sched_dispatch_requests+0x2c8/0x3f0 blk_mq_sched_dispatch_requests+0xca/0x120 __blk_mq_run_hw_queue+0x93/0xe0 process_one_work+0x7b6/0x1290 worker_thread+0x590/0xf80 kthread+0x362/0x430 ret_from_fork+0x22/0x30 Freed by task 11655: kasan_save_stack+0x1b/0x40 kasan_set_track+0x1c/0x30 kasan_set_free_info+0x20/0x30 ____kasan_slab_free+0xec/0x120 slab_free_freelist_hook+0x53/0x160 kmem_cache_free+0xf4/0x5c0 target_release_cmd_kref+0x3ea/0x9e0 [target_core_mod] transport_generic_free_cmd+0x28b/0x2f0 [target_core_mod] target_complete_ok_work+0x250/0xac0 [target_core_mod] process_one_work+0x7b6/0x1290 worker_thread+0x590/0xf80 kthread+0x362/0x430 ret_from_fork+0x22/0x30 Last potentially related work creation: kasan_save_stack+0x1b/0x40 kasan_record_aux_stack+0xa3/0xb0 insert_work+0x48/0x2e0 __queue_work+0x4e8/0xdf0 queue_work_on+0x78/0x80 tcmu_handle_completions+0xad0/0x1770 [target_core_user] tcmu_irqcontrol+0x28/0x60 [target_core_user] uio_write+0x155/0x230 vfs_write+0x1ce/0x860 ksys_write+0xe9/0x1b0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Second to last potentially related work creation: kasan_save_stack+0x1b/0x40 kasan_record_aux_stack+0xa3/0xb0 insert_work+0x48/0x2e0 __queue_work+0x4e8/0xdf0 queue_work_on+0x78/0x80 tcm_loop_queuecommand+0x1c3/0x4e0 [tcm_loop] scsi_queue_rq+0x12ec/0x2d20 blk_mq_dispatch_rq_list+0x30a/0x1db0 __blk_mq_do_dispatch_sched+0x326/0x830 __blk_mq_sched_dispatch_requests+0x2c8/0x3f0 blk_mq_sched_dispatch_requests+0xca/0x120 __blk_mq_run_hw_queue+0x93/0xe0 process_one_work+0x7b6/0x1290 worker_thread+0x590/0xf80 kthread+0x362/0x430 ret_from_fork+0x22/0x30 The buggy address belongs to the object at ffff88814cf79800 which belongs to the cache tcm_loop_cmd_cache of size 896. Link: https://lore.kernel.org/r/20210113024508.1264992-1-shinichiro.kawasaki@wdc.com Fixes: `a35129024e` ("scsi: target: tcmu: Use priv pointer in se_cmd") Cc: stable@vger.kernel.org # v5.9+ Acked-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-01-14 21:56:43 -05:00
Linus Torvalds	60f7c503d9	SCSI misc on 20201216 This series consists of the usual driver updates (ufs, qla2xxx, smartpqi, target, zfcp, fnic, mpt3sas, ibmvfc) plus a load of cleanups, a major power management rework and a load of assorted minor updates. There are a few core updates (formatting fixes being the big one) but nothing major this cycle. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCX9o0KSYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishbOZAP9D5NTN J7dJUo2MIMy84YBu+d9ag7yLlNiRWVY2yw5vHwD/Z7JjAVLwz/tzmyjU9//o2J6w hwhOv6Uto89gLCWSEz8= =KUPT -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This consists of the usual driver updates (ufs, qla2xxx, smartpqi, target, zfcp, fnic, mpt3sas, ibmvfc) plus a load of cleanups, a major power management rework and a load of assorted minor updates. There are a few core updates (formatting fixes being the big one) but nothing major this cycle" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits) scsi: mpt3sas: Update driver version to 36.100.00.00 scsi: mpt3sas: Handle trigger page after firmware update scsi: mpt3sas: Add persistent MPI trigger page scsi: mpt3sas: Add persistent SCSI sense trigger page scsi: mpt3sas: Add persistent Event trigger page scsi: mpt3sas: Add persistent Master trigger page scsi: mpt3sas: Add persistent trigger pages support scsi: mpt3sas: Sync time periodically between driver and firmware scsi: qla2xxx: Update version to 10.02.00.104-k scsi: qla2xxx: Fix device loss on 4G and older HBAs scsi: qla2xxx: If fcport is undergoing deletion complete I/O with retry scsi: qla2xxx: Fix the call trace for flush workqueue scsi: qla2xxx: Fix flash update in 28XX adapters on big endian machines scsi: qla2xxx: Handle aborts correctly for port undergoing deletion scsi: qla2xxx: Fix N2N and NVMe connect retry failure scsi: qla2xxx: Fix FW initialization error on big endian machines scsi: qla2xxx: Fix crash during driver load on big endian machines scsi: qla2xxx: Fix compilation issue in PPC systems scsi: qla2xxx: Don't check for fw_started while posting NVMe command scsi: qla2xxx: Tear down session if FW say it is down ...	2020-12-16 13:34:31 -08:00
Gustavo A. R. Silva	8fdaabe1c9	scsi: target: tcmu: Replace zero-length array with flexible-array member There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.9-rc1/process/deprecated.html#zero-length-and-one-element-arrays Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>	2020-10-29 17:22:59 -05:00
Bodo Stroesser	c8ed1ff88c	scsi: target: tcmu: scatter_/gather_data_area() rework scatter_data_area() and gather_data_area() are not easy to understand since data is copied in nested loops over sg_list and tcmu dbi list. Since sg list can contain only partly filled pages, the loop has to be prepared to handle sg pages not matching dbi pages one by one. Existing implementation uses kmap_atomic()/kunmap_atomic() due to performance reasons. But instead of using these calls strictly nested for sg and dpi pages, the code holds the mappings in an overlapping way, which indeed is a bug that would trigger on archs using highmem. The scatterlist lib contains the sg_miter_start/_next/_stop functions which can be used to simplify such complicated loops. The new code now processes the dbi list in the outer loop, while sg list is handled by the inner one. That way the code can take advantage of the sg_miter_* family calls. Calling sg_miter_stop() after the end of the inner loop enforces strict nesting of atomic kmaps. Since the nested loops in scatter_/gather_data_area were very similar, I replaced them by the new helper function tcmu_copy_data(). Link: https://lore.kernel.org/r/20201019115118.11949-1-bostroesser@gmail.com Acked-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bodo Stroesser <bostroesser@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-10-26 18:27:44 -04:00
Linus Torvalds	9ff9b0d392	networking changes for the 5.10 merge window Add redirect_neigh() BPF packet redirect helper, allowing to limit stack traversal in common container configs and improving TCP back-pressure. Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain. Expand netlink policy support and improve policy export to user space. (Ge)netlink core performs request validation according to declared policies. Expand the expressiveness of those policies (min/max length and bitmasks). Allow dumping policies for particular commands. This is used for feature discovery by user space (instead of kernel version parsing or trial and error). Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge. Allow more than 255 IPv4 multicast interfaces. Add support for Type of Service (ToS) reflection in SYN/SYN-ACK packets of TCPv6. In Multi-patch TCP (MPTCP) support concurrent transmission of data on multiple subflows in a load balancing scenario. Enhance advertising addresses via the RM_ADDR/ADD_ADDR options. Support SMC-Dv2 version of SMC, which enables multi-subnet deployments. Allow more calls to same peer in RxRPC. Support two new Controller Area Network (CAN) protocols - CAN-FD and ISO 15765-2:2016. Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit kernel problem. Add TC actions for implementing MPLS L2 VPNs. Improve nexthop code - e.g. handle various corner cases when nexthop objects are removed from groups better, skip unnecessary notifications and make it easier to offload nexthops into HW by converting to a blocking notifier. Support adding and consuming TCP header options by BPF programs, opening the doors for easy experimental and deployment-specific TCP option use. Reorganize TCP congestion control (CC) initialization to simplify life of TCP CC implemented in BPF. Add support for shipping BPF programs with the kernel and loading them early on boot via the User Mode Driver mechanism, hence reusing all the user space infra we have. Support sleepable BPF programs, initially targeting LSM and tracing. Add bpf_d_path() helper for returning full path for given 'struct path'. Make bpf_tail_call compatible with bpf-to-bpf calls. Allow BPF programs to call map_update_elem on sockmaps. Add BPF Type Format (BTF) support for type and enum discovery, as well as support for using BTF within the kernel itself (current use is for pretty printing structures). Support listing and getting information about bpf_links via the bpf syscall. Enhance kernel interfaces around NIC firmware update. Allow specifying overwrite mask to control if settings etc. are reset during update; report expected max time operation may take to users; support firmware activation without machine reboot incl. limits of how much impact reset may have (e.g. dropping link or not). Extend ethtool configuration interface to report IEEE-standard counters, to limit the need for per-vendor logic in user space. Adopt or extend devlink use for debug, monitoring, fw update in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx, dpaa2-eth). In mlxsw expose critical and emergency SFP module temperature alarms. Refactor port buffer handling to make the defaults more suitable and support setting these values explicitly via the DCBNL interface. Add XDP support for Intel's igb driver. Support offloading TC flower classification and filtering rules to mscc_ocelot switches. Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as fixed interval period pulse generator and one-step timestamping in dpaa-eth. Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3) offload. Add Lynx PHY/PCS MDIO module, and convert various drivers which have this HW to use it. Convert mvpp2 to split PCS. Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as 7-port Mediatek MT7531 IP. Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver, and wcn3680 support in wcn36xx. Improve performance for packets which don't require much offloads on recent Mellanox NICs by 20% by making multiple packets share a descriptor entry. Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto subtree to drivers/net. Move MDIO drivers out of the phy directory. Clean up a lot of W=1 warnings, reportedly the actively developed subsections of networking drivers should now build W=1 warning free. Make sure drivers don't use in_interrupt() to dynamically adapt their code. Convert tasklets to use new tasklet_setup API (sadly this conversion is not yet complete). Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+ItRwACgkQMUZtbf5S IrtTMg//UxpdR/MirT1DatBU0K/UGAZY82hV7F/UC8tPgjfHZeHvWlDFxfi3YP81 PtPKbhRZ7DhwBXefUp6nY3UdvjftrJK2lJm8prJUPSsZRye8Wlcb7y65q7/P2y2U Efucyopg6RUrmrM0DUsIGYGJgylQLHnMYUl/keCsD4t5Bp4ksyi9R2t5eitGoWzh r3QGdbSa0AuWx4iu0i+tqp6Tj0ekMBMXLVb35dtU1t0joj2KTNEnSgABN3prOa8E iWYf2erOau68Ogp3yU3miCy0ZU4p/7qGHTtzbcp677692P/ekak6+zmfHLT9/Pjy 2Stq2z6GoKuVxdktr91D9pA3jxG4LxSJmr0TImcGnXbvkMP3Ez3g9RrpV5fn8j6F mZCH8TKZAoD5aJrAJAMkhZmLYE1pvDa7KolSk8WogXrbCnTEb5Nv8FHTS1Qnk3yl wSKXuvutFVNLMEHCnWQLtODbTST9DI/aOi6EctPpuOA/ZyL1v3pl+gfp37S+LUTe owMnT/7TdvKaTD0+gIyU53M6rAWTtr5YyRQorX9awIu/4Ha0F0gYD7BJZQUGtegp HzKt59NiSrFdbSH7UdyemdBF4LuCgIhS7rgfeoUXMXmuPHq7eHXyHZt5dzPPa/xP 81P0MAvdpFVwg8ij2yp2sHS7sISIRKq17fd1tIewUabxQbjXqPc= =bc1U -----END PGP SIGNATURE----- Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: - Add redirect_neigh() BPF packet redirect helper, allowing to limit stack traversal in common container configs and improving TCP back-pressure. Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain. - Expand netlink policy support and improve policy export to user space. (Ge)netlink core performs request validation according to declared policies. Expand the expressiveness of those policies (min/max length and bitmasks). Allow dumping policies for particular commands. This is used for feature discovery by user space (instead of kernel version parsing or trial and error). - Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge. - Allow more than 255 IPv4 multicast interfaces. - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK packets of TCPv6. - In Multi-patch TCP (MPTCP) support concurrent transmission of data on multiple subflows in a load balancing scenario. Enhance advertising addresses via the RM_ADDR/ADD_ADDR options. - Support SMC-Dv2 version of SMC, which enables multi-subnet deployments. - Allow more calls to same peer in RxRPC. - Support two new Controller Area Network (CAN) protocols - CAN-FD and ISO 15765-2:2016. - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit kernel problem. - Add TC actions for implementing MPLS L2 VPNs. - Improve nexthop code - e.g. handle various corner cases when nexthop objects are removed from groups better, skip unnecessary notifications and make it easier to offload nexthops into HW by converting to a blocking notifier. - Support adding and consuming TCP header options by BPF programs, opening the doors for easy experimental and deployment-specific TCP option use. - Reorganize TCP congestion control (CC) initialization to simplify life of TCP CC implemented in BPF. - Add support for shipping BPF programs with the kernel and loading them early on boot via the User Mode Driver mechanism, hence reusing all the user space infra we have. - Support sleepable BPF programs, initially targeting LSM and tracing. - Add bpf_d_path() helper for returning full path for given 'struct path'. - Make bpf_tail_call compatible with bpf-to-bpf calls. - Allow BPF programs to call map_update_elem on sockmaps. - Add BPF Type Format (BTF) support for type and enum discovery, as well as support for using BTF within the kernel itself (current use is for pretty printing structures). - Support listing and getting information about bpf_links via the bpf syscall. - Enhance kernel interfaces around NIC firmware update. Allow specifying overwrite mask to control if settings etc. are reset during update; report expected max time operation may take to users; support firmware activation without machine reboot incl. limits of how much impact reset may have (e.g. dropping link or not). - Extend ethtool configuration interface to report IEEE-standard counters, to limit the need for per-vendor logic in user space. - Adopt or extend devlink use for debug, monitoring, fw update in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx, dpaa2-eth). - In mlxsw expose critical and emergency SFP module temperature alarms. Refactor port buffer handling to make the defaults more suitable and support setting these values explicitly via the DCBNL interface. - Add XDP support for Intel's igb driver. - Support offloading TC flower classification and filtering rules to mscc_ocelot switches. - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as fixed interval period pulse generator and one-step timestamping in dpaa-eth. - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3) offload. - Add Lynx PHY/PCS MDIO module, and convert various drivers which have this HW to use it. Convert mvpp2 to split PCS. - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as 7-port Mediatek MT7531 IP. - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver, and wcn3680 support in wcn36xx. - Improve performance for packets which don't require much offloads on recent Mellanox NICs by 20% by making multiple packets share a descriptor entry. - Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto subtree to drivers/net. Move MDIO drivers out of the phy directory. - Clean up a lot of W=1 warnings, reportedly the actively developed subsections of networking drivers should now build W=1 warning free. - Make sure drivers don't use in_interrupt() to dynamically adapt their code. Convert tasklets to use new tasklet_setup API (sadly this conversion is not yet complete). * tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits) Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH" net, sockmap: Don't call bpf_prog_put() on NULL pointer bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo bpf, sockmap: Add locking annotations to iterator netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements net: fix pos incrementment in ipv6_route_seq_next net/smc: fix invalid return code in smcd_new_buf_create() net/smc: fix valid DMBE buffer sizes net/smc: fix use-after-free of delayed events bpfilter: Fix build error with CONFIG_BPFILTER_UMH cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info bpf: Fix register equivalence tracking. rxrpc: Fix loss of final ack on shutdown rxrpc: Fix bundle counting for exclusive connections netfilter: restore NF_INET_NUMHOOKS ibmveth: Identify ingress large send packets. ibmveth: Switch order of ibmveth_helper calls. cxgb4: handle 4-tuple PEDIT to NAT mode translation selftests: Add VRF route leaking tests ...	2020-10-15 18:42:13 -07:00
Jakub Kicinski	66a9b9287d	genetlink: move to smaller ops wherever possible Bulk of the genetlink users can use smaller ops, move them. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-02 19:11:11 -07:00
John Donnelly	61741d8699	scsi: target: tcmu: Fix warning: 'page' may be used uninitialized Corrects drivers/target/target_core_user.c:688:6: warning: 'page' may be used uninitialized. Link: https://lore.kernel.org/r/20200924001920.43594-1-john.p.donnelly@oracle.com Fixes: `3c58f73723` ("scsi: target: tcmu: Optimize use of flush_dcache_page") Cc: Mike Christie <michael.christie@oracle.com> Acked-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: John Donnelly <john.p.donnelly@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-10-02 21:22:01 -04:00
Bodo Stroesser	3c9a7c58ea	scsi: target: tcmu: Optimize scatter_data_area() scatter_data_area() has two purposes: 1) Create the iovs for the data area buffer of a SCSI cmd. 2) If there is data in DMA_TO_DEVICE direction, copy the data from sg_list to data area buffer. Both are done in a common loop. In case of DMA_FROM_DEVICE data transfer, scatter_data_area() is called with parameter copy_data = false. But this flag is just used to skip memcpy() for data, while radix_tree_lookup still is called for every dbi of the area area buffer, and kmap and kunmap are called for every page from sg_list and data_area as well as flush_dcache_page() for the data area pages. Since the only thing to do with copy_data = false would be to set up the iovs, this is a noticeable overhead. Rework the iov creation in the main loop of scatter_data_area() providing the new function new_block_to_iov(). Based on this, create the short new function tcmu_setup_iovs() that only writes the iovs with no overhead. This new function is now called instead of scatter_data_area() for bidi buffers and for data buffers in those cases where memcpy() would have been skipped. Link: https://lore.kernel.org/r/20200910155041.17654-4-bstroesser@ts.fujitsu.com Acked-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-09-22 17:31:43 -04:00
Bodo Stroesser	7e98905e9d	scsi: target: tcmu: Optimize queue_cmd_ring() queue_cmd_ring() needs to check whether there is enough space in cmd ring and data area for the cmd to queue. Currently the sequence is: 1) Calculate size the cmd will occupy on the ring based on estimation of needed iovs. 2) Check whether there is enough space on the ring based on size from 1) 3) Allocate buffers in data area. 4) Calculate number of iovs the command really needs while copying incoming data (if any) to data area. 5) Re-calculate real size of cmd on ring based on real number of iovs. 6) Set up possible padding and cmd on the ring. Step 1) must not underestimate the cmd size so use max possible number of iovs for the given I/O data size. The resulting overestimation can be really high so this sequence is not ideal. The earliest the real number of iovs can be calculated is after data buffer allocation. Therefore rework the code to implement the following sequence: A) Allocate buffers on data area and calculate number of necessary iovs during this. B) Calculate real size of cmd on ring based on number of iovs. C) Check whether there is enough space on the ring. D) Set up possible padding and cmd on the ring. The new sequence enforces the split of new function tcmu_alloc_data_space() from is_ring_space_avail(). Using this function, change queue_cmd_ring() according to the new sequence. Change routines called by tcmu_alloc_data_space() to allow calculating and returning the iov count. Remove counting of iovs in scatter_data_area(). Link: https://lore.kernel.org/r/20200910155041.17654-3-bstroesser@ts.fujitsu.com Acked-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-09-22 17:31:43 -04:00
Bodo Stroesser	52ef2743f1	scsi: target: tcmu: Join tcmu_cmd_get_data_length() and tcmu_cmd_get_block_cnt() Simplify code by joining tcmu_cmd_get_data_length() and tcmu_cmd_get_block_cnt() into tcmu_cmd_set_block_cnts(). The new function sets tcmu_cmd->dbi_cnt and also the new field tcmu_cmd->dbi_bidi_cnt which is needed for further enhancements in following patches. Simplify some code by using tcmu_cmd->dbi(_bidi)_cnt instead of calculation from length. Please note: The calculation of the number of dbis needed for bidi was wrong. It was based on the length of the first bidi sg only. I changed it to correctly sum up entire length of all bidi sgs. Link: https://lore.kernel.org/r/20200910155041.17654-2-bstroesser@ts.fujitsu.com Acked-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-09-22 17:31:42 -04:00
Xiongfeng Wang	6d70cb3434	scsi: target: tcmu: Add missing newline when printing parameters The tcmu 'global_max_data_area_mb' parameter in sysfs is missing a newline. Add it. Link: https://lore.kernel.org/r/1599132573-33818-1-git-send-email-wangxiongfeng2@huawei.com Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-09-15 20:19:13 -04:00
Bodo Stroesser	59526d7a18	scsi: target: tcmu: Make TMR notification optional Add "tmr_notification" configfs attribute to tcmu devices. If the default value 0 is used, tcmu only removes aborted commands from qfull_queue. If user changes tmr_notification to 1, additionally TMR notifications will be written to the cmd ring. Link: https://lore.kernel.org/r/20200726153510.13077-9-bstroesser@ts.fujitsu.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-07-28 22:25:31 -04:00
Bodo Stroesser	bc2d214af5	scsi: target: tcmu: Implement tmr_notify callback This patch implements the tmr_notify callback for tcmu. When the callback is called, tcmu checks the list of aborted commands it received as parameter: - aborted commands in the qfull_queue are removed from the queue and target_complete_command is called - from the cmd_ids of aborted commands currently uncompleted in cmd ring it creates a list of aborted cmd_ids. Finally a TMR notification is written to cmd ring containing TMR type and cmd_id list. If there is no space in ring, the TMR notification is queued on a TMR specific queue. The TMR specific queue 'tmr_queue' can be seen as a extension of the cmd ring. At the end of each iexecution of tcmu_complete_commands() we check whether tmr_queue contains TMRs and try to move them onto the ring. If tmr_queue is not empty after that, we don't call run_qfull_queue() because commands must not overtake TMRs. This way we guarantee that cmd_ids in TMR notification received by userspace either match an active, not yet completed command or are no longer valid due to userspace having complete some cmd_ids meanwhile. New commands that were assigned to an aborted cmd_id will always appear on the cmd ring _after_ the TMR. Link: https://lore.kernel.org/r/20200726153510.13077-8-bstroesser@ts.fujitsu.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-07-28 22:25:30 -04:00
Bodo Stroesser	ed212ca878	scsi: target: tcmu: Fix and simplify timeout handling During cmd timeout handling in check_timedout_devices(), due to a race, it can happen that tcmu_set_next_deadline() does not start a timer as expected: 1) Either tcmu_check_expired_ring_cmd() checks the inflight_queue or tcmu_check_expired_queue_cmd() checks the qfull_queue while jiffies has the value X 2) At the end of the check the queue contains one remaining command with deadline X (time_after(X, X) is false and thus the command is not handled as being timed out). 3) After tcmu_check_expired_xxxxx_cmd() a timer interrupt happens and jiffies is incremented to X+1. 4) Now tcmu_set_next_deadline() is called, but it skips the command, since time_after(X+1, X) is true. Therefore tcmu_set_next_deadline() finds no new deadline and stops the timer, which it shouldn't. Since commands that time out are removed from inflight_queue or qfull_queue, we don't need the check with time_after() in tcmu_set_next_deadline() but can use the deadline from the first cmd in the queue. Additionally, replace the remaining time_after() calls in tcmu_check_expired_xxxxx_cmd() with time_after_eq(), because it is not useful to set the timeout to deadline but then check for jiffies being greater than deadline. Simplify the end of tcmu_handle_completions() and change the check for no more pending commands from mb->cmd_tail == mb->cmd_head to idr_is_empty(&udev->commands) because the old check doesn't work correctly if paddings or in the future TMRs are in the ring. Finally tcmu_set_next_deadline() was shifted in the source as preparation for later implementation of tmr_notify callback. Link: https://lore.kernel.org/r/20200726153510.13077-7-bstroesser@ts.fujitsu.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-07-28 22:25:29 -04:00
Bodo Stroesser	3d3f9d56a5	scsi: target: tcmu: Factor out new helper ring_insert_padding The new helper ring_insert_padding is split off from and then called by queue_cmd_ring. It inserts a padding if necessary. The new helper will in a subsequent patch be used during writing of TMR notifications to command ring. Link: https://lore.kernel.org/r/20200726153510.13077-6-bstroesser@ts.fujitsu.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-07-28 22:25:28 -04:00
Bodo Stroesser	c968492762	scsi: target: tcmu: Do not queue aborted commands If tcmu receives an already aborted command, tcmu_queue_cmd() should reject it. Link: https://lore.kernel.org/r/20200726153510.13077-5-bstroesser@ts.fujitsu.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-07-28 22:25:28 -04:00
Bodo Stroesser	a35129024e	scsi: target: tcmu: Use priv pointer in se_cmd We initialize and clean up the se_cmd's priv pointer under cmd_ring_lock to point to the corresponding tcmu_cmd. In the patch that implements tmr_notify callback in tcmu we will use the priv pointer. Link: https://lore.kernel.org/r/20200726153510.13077-4-bstroesser@ts.fujitsu.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-07-28 22:25:27 -04:00
Bodo Stroesser	5a0c256d96	scsi: target: tcmu: Fix crash on ARM during cmd completion If tcmu_handle_completions() has to process a padding shorter than sizeof(struct tcmu_cmd_entry), the current call to tcmu_flush_dcache_range() with sizeof(struct tcmu_cmd_entry) as length param is wrong and causes crashes on e.g. ARM, because tcmu_flush_dcache_range() in this case calls flush_dcache_page(vmalloc_to_page(start)); with start being an invalid address above the end of the vmalloc'ed area. The fix is to use the minimum of remaining ring space and sizeof(struct tcmu_cmd_entry) as the length param. The patch was tested on kernel 4.19.118. See https://bugzilla.kernel.org/show_bug.cgi?id=208045#c10 Link: https://lore.kernel.org/r/20200629093756.8947-1-bstroesser@ts.fujitsu.com Tested-by: JiangYu <lnsyyj@hotmail.com> Acked-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-06-29 21:44:57 -04:00
Bodo Stroesser	3145550a7f	scsi: target: tcmu: Fix crash in tcmu_flush_dcache_range on ARM This patch fixes the following crash (see https://bugzilla.kernel.org/show_bug.cgi?id=208045) Process iscsi_trx (pid: 7496, stack limit = 0x0000000010dd111a) CPU: 0 PID: 7496 Comm: iscsi_trx Not tainted 4.19.118-0419118-generic #202004230533 Hardware name: Greatwall QingTian DF720/F601, BIOS 601FBE20 Sep 26 2019 pstate: 80400005 (Nzcv daif +PAN -UAO) pc : flush_dcache_page+0x18/0x40 lr : is_ring_space_avail+0x68/0x2f8 [target_core_user] sp : ffff000015123a80 x29: ffff000015123a80 x28: 0000000000000000 x27: 0000000000001000 x26: ffff000023ea5000 x25: ffffcfa25bbe08b8 x24: 0000000000000078 x23: ffff7e0000000000 x22: ffff000023ea5001 x21: ffffcfa24b79c000 x20: 0000000000000fff x19: ffff7e00008fa940 x18: 0000000000000000 x17: 0000000000000000 x16: ffff2d047e709138 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: ffff2d047fbd0a40 x11: 0000000000000000 x10: 0000000000000030 x9 : 0000000000000000 x8 : ffffc9a254820a00 x7 : 00000000000013b0 x6 : 000000000000003f x5 : 0000000000000040 x4 : ffffcfa25bbe08e8 x3 : 0000000000001000 x2 : 0000000000000078 x1 : ffffcfa25bbe08b8 x0 : ffff2d040bc88a18 Call trace: flush_dcache_page+0x18/0x40 is_ring_space_avail+0x68/0x2f8 [target_core_user] queue_cmd_ring+0x1f8/0x680 [target_core_user] tcmu_queue_cmd+0xe4/0x158 [target_core_user] __target_execute_cmd+0x30/0xf0 [target_core_mod] target_execute_cmd+0x294/0x390 [target_core_mod] transport_generic_new_cmd+0x1e8/0x358 [target_core_mod] transport_handle_cdb_direct+0x50/0xb0 [target_core_mod] iscsit_execute_cmd+0x2b4/0x350 [iscsi_target_mod] iscsit_sequence_cmd+0xd8/0x1d8 [iscsi_target_mod] iscsit_process_scsi_cmd+0xac/0xf8 [iscsi_target_mod] iscsit_get_rx_pdu+0x404/0xd00 [iscsi_target_mod] iscsi_target_rx_thread+0xb8/0x130 [iscsi_target_mod] kthread+0x130/0x138 ret_from_fork+0x10/0x18 Code: f9000bf3 aa0003f3 aa1e03e0 d503201f (f9400260) ---[ end trace 1e451c73f4266776 ]--- The solution is based on patch: "scsi: target: tcmu: Optimize use of flush_dcache_page" which restricts the use of tcmu_flush_dcache_range() to addresses from vmalloc'ed areas only. This patch now replaces the virt_to_page() call in tcmu_flush_dcache_range() - which is wrong for vmalloced addrs - by vmalloc_to_page(). The patch was tested on ARM with kernel 4.19.118 and 5.7.2 Link: https://lore.kernel.org/r/20200618131632.32748-3-bstroesser@ts.fujitsu.com Tested-by: JiangYu <lnsyyj@hotmail.com> Tested-by: Daniel Meyerholt <dxm523@gmail.com> Acked-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-06-19 23:22:48 -04:00
Bodo Stroesser	3c58f73723	scsi: target: tcmu: Optimize use of flush_dcache_page (scatter\|gather)_data_area() need to flush dcache after writing data to or before reading data from a page in uio data area. The two routines are able to handle data transfer to/from such a page in fragments and flush the cache after each fragment was copied by calling the wrapper tcmu_flush_dcache_range(). That means: 1) flush_dcache_page() can be called multiple times for the same page. 2) Calling flush_dcache_page() indirectly using the wrapper does not make sense, because each call of the wrapper is for one single page only and the calling routine already has the correct page pointer. Change (scatter\|gather)_data_area() such that, instead of calling tcmu_flush_dcache_range() before/after each memcpy, it now calls flush_dcache_page() before unmapping a page (when writing is complete for that page) or after mapping a page (when starting to read the page). After this change only calls to tcmu_flush_dcache_range() for addresses in vmalloc'ed command ring are left over. The patch was tested on ARM with kernel 4.19.118 and 5.7.2 Link: https://lore.kernel.org/r/20200618131632.32748-2-bstroesser@ts.fujitsu.com Tested-by: JiangYu <lnsyyj@hotmail.com> Tested-by: Daniel Meyerholt <dxm523@gmail.com> Acked-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-06-19 23:22:47 -04:00

1 2 3 4 5

223 Commits