korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-04 17:44:14 +08:00

Author	SHA1	Message	Date
Michal Koutný	9474c62ab6	net/sched: Add module alias for sch_fq_pie The commit `2c15a5aee2` ("net/sched: Load modules via their alias") starts loading modules via aliases and not canonical names. The new aliases were added in commit `241a94abcf` ("net/sched: Add module aliases for cls_,sch_,act_ modules") via a Coccinelle script. sch_fq_pie.c is missing the module.h header and thus Coccinelle did not patch it. Add the include and module alias manually, so that autoloading works for sch_fq_pie too. (Note: commit message in commit `241a94abcf` ("net/sched: Add module aliases for cls_,sch_,act_ modules") was mangled due to '#' misinterpretation. The predicate haskernel is: | @ haskernel @ | @@ | | #include <linux/module.h> | .) Fixes: `241a94abcf` ("net/sched: Add module aliases for cls_,sch_,act_ modules") Signed-off-by: Michal Koutný <mkoutny@suse.com> Link: https://lore.kernel.org/r/20240315160210.8379-1-mkoutny@suse.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-19 15:33:25 +01:00
Martin Jocić	af1752ecdc	can: kvaser_pciefd: Add additional Xilinx interrupts Since Xilinx-based adapters now support up to eight CAN channels, the TX interrupt mask array must have eight elements. Signed-off-by: Martin Jocic <martin.jocic@kvaser.com> Link: https://lore.kernel.org/all/2ab3c0585c3baba272ede0487182a423a420134b.camel@kvaser.com Fixes: `9b221ba452` ("can: kvaser_pciefd: Add support for Kvaser PCIe 8xCAN") [mkl: replace Link by Fixes tag] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2024-03-19 15:26:01 +01:00
Xiubo Li	825b82f6b8	ceph: set correct cap mask for getattr request for read In case of hitting the file EOF, ceph_read_iter() needs to retrieve the file size from MDS, and Fr caps aren't necessary. [ idryomov: fold into existing retry_op == READ_INLINE branch ] Reported-by: Frank Hsiao <frankhsiao@qnap.com> Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Frank Hsiao <frankhsiao@qnap.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2024-03-19 14:35:55 +01:00
Xiubo Li	1065da21e5	ceph: stop copying to iter at EOF on sync reads If EOF is encountered, ceph_sync_read() return value is adjusted down according to i_size, but the "to" iter is advanced by the actual number of bytes read. Then, when retrying, the remainder of the range may be skipped incorrectly. Ensure that the "to" iter is advanced only until EOF. [ idryomov: changelog ] Fixes: `c3d8e0b5de` ("ceph: return the real size read when it hits EOF") Reported-by: Frank Hsiao <frankhsiao@qnap.com> Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Frank Hsiao <frankhsiao@qnap.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2024-03-19 14:35:55 +01:00
Dave Airlie	5d4e8ae6e5	nouveau/gsp: don't check devinit disable on GSP. GSP should be handling this and I can see no evidence in opengpu driver that this register should be touched. Fixed acceleration on 2080 Ti GPUs. Fixes: `15740541e8` ("drm/nouveau/devinit/tu102-: prepare for GSP-RM") Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Danilo Krummrich <dakr@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240314014521.2695233-1-airlied@gmail.com	2024-03-19 14:34:55 +01:00
Tobias Brunner	c9b3b81716	ipv4: raw: Fix sending packets from raw sockets via IPsec tunnels Since the referenced commit, the xfrm_inner_extract_output() function uses the protocol field to determine the address family. So not setting it for IPv4 raw sockets meant that such packets couldn't be tunneled via IPsec anymore. IPv6 raw sockets are not affected as they already set the protocol since `9c9c9ad5fa` ("ipv6: set skb->protocol on tcp, raw and ip6_append_data genereated skbs"). Fixes: `f4796398f2` ("xfrm: Remove inner/outer modes from output path") Signed-off-by: Tobias Brunner <tobias@strongswan.org> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/c5d9a947-eb19-4164-ac99-468ea814ce20@strongswan.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-19 13:45:58 +01:00
Felix Maurer	3cf28cd492	hsr: Handle failures in module init A failure during registration of the netdev notifier was not handled at all. A failure during netlink initialization did not unregister the netdev notifier. Handle failures of netdev notifier registration and netlink initialization. Both functions should only return negative values on failure and thereby lead to the hsr module not being loaded. Fixes: `f421436a59` ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)") Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/3ce097c15e3f7ace98fc7fd9bcbf299f092e63d1.1710504184.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-19 13:38:17 +01:00
Rafael J. Wysocki	a6d6590917	Merge branches 'pm-em', 'pm-powercap' and 'pm-sleep' Merge additional updates related to the Energy Model, power capping and system-wide power management for 6.9-rc1: - Modify the Energy Model code to bail out and complain if the unit of power is not uW to prevent errors due to unit mismatches (Lukasz Luba). - Make the intel_rapl platform driver use a remove callback returning void (Uwe Kleine-König). - Fix typo in the suspend and interrupts document (Saravana Kannan). * pm-em: PM: EM: Force device drivers to provide power in uW * pm-powercap: powercap: intel_rapl: Convert to platform remove callback returning void * pm-sleep: Documentation: power: Fix typo in suspend and interrupts doc	2024-03-19 13:25:49 +01:00
Roman Smirnov	c2d953276b	fbmon: prevent division by zero in fb_videomode_from_videomode() The expression htotal * vtotal can have a zero value on overflow. It is necessary to prevent division by zero like in fb_var_to_videomode(). Found by Linux Verification Center (linuxtesting.org) with Svace. Signed-off-by: Roman Smirnov <r.smirnov@omp.ru> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Helge Deller <deller@gmx.de>	2024-03-19 13:20:20 +01:00
Rafael J. Wysocki	a873add22a	Merge branch 'acpi-docs' Merge an ACPI documentation update for 6.9-rc1 which adds markup to generate links from footnotes in the enumeration document. * acpi-docs: ACPI: docs: enumeration: Make footnotes links	2024-03-19 13:16:15 +01:00
Mathias Nyman	a788e53c05	usb: usb-acpi: Fix oops due to freeing uninitialized pld pointer If reading the ACPI _PLD port location object fails, or the port doesn't have a _PLD ACPI object then the *pld pointer will remain uninitialized and oops when freed. The patch that caused this is currently in next, on its way to v6.9. So no need to add this to stable or current 6.8 kernel. Reported-by: Klara Modin <klarasmodin@gmail.com> Closes: https://lore.kernel.org/linux-usb/7e92369a-3197-4883-9988-3c93452704f5@gmail.com/ Tested-by: Klara Modin <klarasmodin@gmail.com> Fixes: `f3ac348e6e` ("usb: usb-acpi: Set port connect type of not connectable ports correctly") Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240308113425.1144689-1-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-03-19 13:07:35 +01:00
Yuezhang Mo	dc38fdc51b	exfat: remove duplicate update parent dir For renaming, the directory only needs to be updated once if it is in the same directory. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2024-03-19 20:56:10 +09:00
Yuezhang Mo	96cf51accc	exfat: do not sync parent dir if just update timestamp When sync or dir_sync is enabled, there is no need to sync the parent directory's inode if only for updating its timestamp. 1. If an unexpected power failure occurs, the timestamp of the parent directory is not updated to the storage, which has no impact on the user. 2. The number of writes will be greatly reduced, which can not only improve performance, but also prolong device life. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2024-03-19 20:56:05 +09:00
Yuezhang Mo	4d71455976	exfat: remove unused functions exfat_count_ext_entries() is no longer called, remove it. exfat_update_dir_chksum() is no longer called, remove it and rename exfat_update_dir_chksum_with_entry_set() to it. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2024-03-19 20:56:01 +09:00
Yuezhang Mo	af02c72d0b	exfat: convert exfat_find_empty_entry() to use dentry cache Before this conversion, each dentry traversed needs to be read from the storage device or page cache. There are at least 16 dentries in a sector. This will result in frequent page cache searches. After this conversion, if all directory entries in a sector are used, the sector only needs to be read once. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2024-03-19 20:55:54 +09:00
Yuezhang Mo	d97e060673	exfat: convert exfat_init_ext_entry() to use dentry cache Before this conversion, in exfat_init_ext_entry(), to init the dentries in a dentry set, the number of syncs equals the number of dentries if 'dirsync' or 'sync' is enabled. That affects not only performance but also device life. After this conversion, the dentry set only needs to be synchronized once if 'dirsync' or 'sync' is enabled. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2024-03-19 20:55:49 +09:00
Yuezhang Mo	4e1aa22fea	exfat: move free cluster out of exfat_init_ext_entry() exfat_init_ext_entry() is an init function, it's a bit strange to free cluster in it. And the argument 'inode' will be removed from exfat_init_ext_entry(). So this commit changes to free the cluster in exfat_remove_entries(). Code refinement, no functional changes. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2024-03-19 20:55:45 +09:00
Yuezhang Mo	ff4343da02	exfat: convert exfat_remove_entries() to use dentry cache Before this conversion, in exfat_remove_entries(), to mark the dentries in a dentry set as deleted, the number of syncs equals the number of dentries if 'dirsync' or 'sync' is enabled. That affects not only performance but also device life. After this conversion, the dentry set only needs to be synchronized once if 'dirsync' or 'sync' is enabled. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2024-03-19 20:55:40 +09:00
Yuezhang Mo	cf8663fa99	exfat: convert exfat_add_entry() to use dentry cache After this conversion, if "dirsync" or "sync" is enabled, the number of synchronized dentries in exfat_add_entry() will change from 2 to 1. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2024-03-19 20:55:36 +09:00
Yuezhang Mo	01da3a5176	exfat: add exfat_get_empty_dentry_set() helper This helper is used to look up an empty dentry set. If there are not enough empty dentries at the input location, this helper will return the number of dentries that need to be skipped for the next lookup. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2024-03-19 20:55:33 +09:00
Yuezhang Mo	7b6bab2359	exfat: add __exfat_get_dentry_set() helper Since exfat_get_dentry_set() invokes the validate functions of exfat_validate_entry(), it only supports getting a directory entry set of an existing file, doesn't support getting an empty entry set. To remove the limitation, add this helper. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2024-03-19 20:55:28 +09:00
Yewon Choi	1422f28826	rds: introduce acquire/release ordering in acquire/release_in_xmit() acquire/release_in_xmit() work as bit lock in rds_send_xmit(), so they are expected to ensure acquire/release memory ordering semantics. However, test_and_set_bit/clear_bit() don't imply such semantics, on top of this, following smp_mb__after_atomic() does not guarantee release ordering (memory barrier actually should be placed before clear_bit()). Instead, we use clear_bit_unlock/test_and_set_bit_lock() here. Fixes: `0f4b1c7e89` ("rds: fix rds_send_xmit() serialization") Fixes: `1f9ecd7eac` ("RDS: Pass rds_conn_path to rds_send_xmit()") Signed-off-by: Yewon Choi <woni9911@gmail.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://lore.kernel.org/r/ZfQUxnNTO9AJmzwc@libra05 Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-19 12:15:35 +01:00
Conrad Kostecki	6cd8adc3e1	ahci: asm1064: asm1166: don't limit reported ports Previously, patches have been added to limit the reported count of SATA ports for asm1064 and asm1166 SATA controllers, as those controllers do report more ports than physically having. While it is allowed to report more ports than physically having in CAP.NP, it is not allowed to report more ports than physically having in the PI (Ports Implemented) register, which is what these HBAs do. (This is an AHCI spec violation.) Unfortunately, it seems that the PMP implementation in these ASMedia HBAs is also violating the AHCI and SATA-IO PMP specification. What these HBAs do is that they do not report that they support PMP (CAP.SPM (Supports Port Multiplier) is not set). Instead, they have decided to add extra "virtual" ports in the PI register that is used if a port multiplier is connected to any of the physical ports of the HBA. Enumerating the devices behind the PMP as specified in the AHCI and SATA-IO specifications, by using PMP READ and PMP WRITE commands to the physical ports of the HBA is not possible, you have to use the "virtual" ports. This is of course bad, because this gives us no way to detect the device and vendor ID of the PMP actually connected to the HBA, which means that we can not apply the proper PMP quirks for the PMP that is connected to the HBA. Limiting the port map will thus stop these controllers from working with SATA Port Multipliers. This patch reverts both patches for asm1064 and asm1166, so old behavior is restored and SATA PMP will work again, but it will also reintroduce the (minutes long) extra boot time for the ASMedia controllers that do not have a PMP connected (either on the PCIe card itself, or an external PMP). However, a longer boot time for some, is the lesser evil compared to some other users not being able to detect their drives at all. Fixes: `0077a504e1` ("ahci: asm1166: correct count of reported ports") Fixes: `9815e39617` ("ahci: asm1064: correct count of reported ports") Cc: stable@vger.kernel.org Reported-by: Matt <cryptearth@googlemail.com> Signed-off-by: Conrad Kostecki <conikost@gentoo.org> Reviewed-by: Hans de Goede <hdegoede@redhat.com> [cassel: rewrote commit message] Signed-off-by: Niklas Cassel <cassel@kernel.org>	2024-03-19 12:06:54 +01:00
Jakub Kicinski	9966e329d6	tools: ynl: add header guards for nlctrl I "extracted" YNL C into a GitHub repo to make it easier to use in other projects: https://github.com/linux-netdev/ynl-c GitHub actions use Ubuntu by default, and the kernel headers there are missing `f329a0ebea` ("genetlink: correct uAPI defines"). Add the direct include workaround for nlctrl. Fixes: `768e044a5f` ("doc/netlink/specs: Add spec for nlctrl netlink family") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240315002108.523232-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-19 11:33:02 +01:00
Paolo Abeni	710fe438e3	Merge branch 'wireguard-fixes-for-6-9-rc1' Jason A. Donenfeld says: ==================== wireguard fixes for 6.9-rc1 This series has four WireGuard fixes: 1) Annotate a data race that KCSAN found by using READ_ONCE/WRITE_ONCE, which has been causing syzkaller noise. 2) Use the generic netdev tstats allocation and stats getters instead of doing this within the driver. 3) Explicitly check a flag variable instead of an empty list in the netlink code, to prevent a UaF situation when paging through GET results during a remove-all SET operation. 4) Set a flag in the RISC-V CI config so the selftests continue to boot. ==================== Link: https://lore.kernel.org/r/20240314224911.6653-1-Jason@zx2c4.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-19 11:22:54 +01:00
Jason A. Donenfeld	e995f5dd9a	wireguard: selftests: set RISCV_ISA_FALLBACK on riscv{32,64} This option is needed to continue booting with QEMU. Recent changes that made this optional meant that it gets unset in the test harness, and so WireGuard CI has been broken. Fix this by simply setting this option. Cc: stable@vger.kernel.org Fixes: `496ea826d1` ("RISC-V: provide Kconfig & commandline options to control parsing "riscv,isa"") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-19 11:22:50 +01:00
Jason A. Donenfeld	71cbd32e3d	wireguard: netlink: access device through ctx instead of peer The previous commit fixed a bug that led to a NULL peer->device being dereferenced. It's actually easier and faster performance-wise to instead get the device from ctx->wg. This semantically makes more sense too, since ctx->wg->peer_allowedips.seq is compared with ctx->allowedips_seq, basing them both in ctx. This also acts as a defence in depth provision against freed peers. Cc: stable@vger.kernel.org Fixes: `e7096c131e` ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-19 11:22:50 +01:00
Jason A. Donenfeld	55b6c73867	wireguard: netlink: check for dangling peer via is_dead instead of empty list If all peers are removed via wg_peer_remove_all(), rather than setting peer_list to empty, the peer is added to a temporary list with a head on the stack of wg_peer_remove_all(). If a netlink dump is resumed and the cursored peer is one that has been removed via wg_peer_remove_all(), it will iterate from that peer and then attempt to dump freed peers. Fix this by instead checking peer->is_dead, which was explicitly created for this purpose. Also move up the device_update_lock lockdep assertion, since reading is_dead relies on that. It can be reproduced by a small script like: echo "Setting config..." ip link add dev wg0 type wireguard wg setconf wg0 /big-config ( while true; do echo "Showing config..." wg showconf wg0 > /dev/null done ) & sleep 4 wg setconf wg0 <(printf "[Peer]\nPublicKey=$(wg genkey)\n") Resulting in: BUG: KASAN: slab-use-after-free in __lock_acquire+0x182a/0x1b20 Read of size 8 at addr ffff88811956ec70 by task wg/59 CPU: 2 PID: 59 Comm: wg Not tainted 6.8.0-rc2-debug+ #5 Call Trace: <TASK> dump_stack_lvl+0x47/0x70 print_address_description.constprop.0+0x2c/0x380 print_report+0xab/0x250 kasan_report+0xba/0xf0 __lock_acquire+0x182a/0x1b20 lock_acquire+0x191/0x4b0 down_read+0x80/0x440 get_peer+0x140/0xcb0 wg_get_device_dump+0x471/0x1130 Cc: stable@vger.kernel.org Fixes: `e7096c131e` ("net: WireGuard secure network tunnel") Reported-by: Lillian Berry <lillian@star-ark.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-19 11:22:50 +01:00
Breno Leitao	df9bbb5e77	wireguard: device: remove generic .ndo_get_stats64 Commit `3e2f544dd8` ("net: get stats64 if device if driver is configured") moved the callback to dev_get_tstats64() to net core, so, unless the driver is doing some custom stats collection, it does not need to set .ndo_get_stats64. Since this driver now relies on NETDEV_PCPU_STAT_TSTATS, it doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64 function pointer. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-19 11:22:49 +01:00
Breno Leitao	db2952dfbd	wireguard: device: leverage core stats allocator With commit `34d21de99c` ("net: Move {l,t,d}stats allocation to core and convert veth & vrf"), stats allocation could be done on net core instead of in this driver. With this new approach, the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc). This is core responsibility now. Remove the allocation in this driver and leverage the network core allocation instead. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-19 11:22:49 +01:00
Nikita Zhandarovich	bba045dc4d	wireguard: receive: annotate data-race around receiving_counter.counter Syzkaller with KCSAN identified a data-race issue when accessing keypair->receiving_counter.counter. Use READ_ONCE() and WRITE_ONCE() annotations to mark the data race as intentional. BUG: KCSAN: data-race in wg_packet_decrypt_worker / wg_packet_rx_poll write to 0xffff888107765888 of 8 bytes by interrupt on cpu 0: counter_validate drivers/net/wireguard/receive.c:321 [inline] wg_packet_rx_poll+0x3ac/0xf00 drivers/net/wireguard/receive.c:461 __napi_poll+0x60/0x3b0 net/core/dev.c:6536 napi_poll net/core/dev.c:6605 [inline] net_rx_action+0x32b/0x750 net/core/dev.c:6738 __do_softirq+0xc4/0x279 kernel/softirq.c:553 do_softirq+0x5e/0x90 kernel/softirq.c:454 __local_bh_enable_ip+0x64/0x70 kernel/softirq.c:381 __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline] _raw_spin_unlock_bh+0x36/0x40 kernel/locking/spinlock.c:210 spin_unlock_bh include/linux/spinlock.h:396 [inline] ptr_ring_consume_bh include/linux/ptr_ring.h:367 [inline] wg_packet_decrypt_worker+0x6c5/0x700 drivers/net/wireguard/receive.c:499 process_one_work kernel/workqueue.c:2633 [inline] ... read to 0xffff888107765888 of 8 bytes by task 3196 on cpu 1: decrypt_packet drivers/net/wireguard/receive.c:252 [inline] wg_packet_decrypt_worker+0x220/0x700 drivers/net/wireguard/receive.c:501 process_one_work kernel/workqueue.c:2633 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2706 worker_thread+0x525/0x730 kernel/workqueue.c:2787 ... Fixes: `a9e90d9931` ("wireguard: noise: separate receive counter from send counter") Reported-by: syzbot+d1de830e4ecdaac83d89@syzkaller.appspotmail.com Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-19 11:22:49 +01:00
Eric Dumazet	f6e0a4984c	net: move dev->state into net_device_read_txrx group dev->state can be read in rx and tx fast paths. netif_running() which needs dev->state is called from - enqueue_to_backlog() [RX path] - __dev_direct_xmit() [TX path] Fixes: `43a71cd66b` ("net-device: reorganize net_device fast path variables") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Coco Li <lixiaoyan@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240314200845.3050179-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-19 10:47:47 +01:00
Frederic Weisbecker	0387703986	timers: Fix removed self-IPI on global timer's enqueue in nohz_full While running in nohz_full mode, a task may enqueue a timer while the tick is stopped. However the only places where the timer wheel, alongside the timer migration machinery's decision, may reprogram the next event accordingly with that new timer's expiry are the idle loop or any IRQ tail. However neither the idle task nor an interrupt may run on the CPU if it resumes busy work in userspace for a long while in full dynticks mode. To solve this, the timer enqueue path raises a self-IPI that will re-evaluate the timer wheel on its IRQ tail. This asynchronous solution avoids potential locking inversion. This is supposed to happen both for local and global timers but commit: `b2cf7507e1` ("timers: Always queue timers on the local CPU") broke the global timers case with removing the ->is_idle field handling for the global base. As a result, global timers enqueue may go unnoticed in nohz_full. Fix this with restoring the idle tracking of the global timer's base, allowing self-IPIs again on enqueue time. Fixes: `b2cf7507e1` ("timers: Always queue timers on the local CPU") Reported-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240318230729.15497-3-frederic@kernel.org	2024-03-19 10:14:55 +01:00
Frederic Weisbecker	f55acb1e44	timers/migration: Fix endless timer requeue after idle interrupts When a CPU is an idle migrator, but another CPU wakes up before it, becomes an active migrator and handles the queue, the initial idle migrator may end up endlessly reprogramming its clockevent, chasing ghost timers forever such as in the following scenario: [GRP0:0] migrator = 0 active = 0 nextevt = T1 / \ 0 1 active idle (T1) 0) CPU 1 is idle and has a timer queued (T1), CPU 0 is active and is the active migrator. [GRP0:0] migrator = NONE active = NONE nextevt = T1 / \ 0 1 idle idle (T1) wakeup = T1 1) CPU 0 is now idle and is therefore the idle migrator. It has programmed its next timer interrupt to handle T1. [GRP0:0] migrator = 1 active = 1 nextevt = KTIME_MAX / \ 0 1 idle active wakeup = T1 2) CPU 1 has woken up, it is now active and it has just handled its own timer T1. 3) CPU 0 gets a timer interrupt to handle T1 but tmigr_handle_remote() realizes it is not the migrator anymore. So it early returns without observing that T1 has been expired already and therefore without updating its ->wakeup value. 4) CPU 0 goes into tmigr_cpu_new_timer() which also early returns because it doesn't queue a timer of its own. So ->wakeup is left unchanged and the next timer is programmed to fire now. 5) goto 3) forever This results in timer interrupt storms in idle and also in nohz_full (as observed in rcutorture's TREE07 scenario). Fix this with forcing a re-evaluation of tmc->wakeup while trying remote timer handling when the CPU isn't the migrator anymore. The check is inherently racy but in the worst case the CPU just races setting the KTIME_MAX value that a remote expiry also tries to set. Fixes: `7ee9887703` ("timers: Implement the hierarchical pull model") Reported-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240318230729.15497-2-frederic@kernel.org	2024-03-19 10:14:55 +01:00
Yuli Wang	fea1c949f6	LoongArch/crypto: Clean up useless assignment operations The LoongArch CRC32 hw acceleration is based on arch/mips/crypto/crc32-mips.c. The MIPS code supports both MIPS32 and MIPS64, but LoongArch32 lacks the CRC instruction. As a result, the line "len -= sizeof(u32)" is unnecessary. Removing it can make context code style more unified and improve code readability. Cc: stable@vger.kernel.org Reviewed-by: WANG Xuerui <git@xen0n.name> Suggested-by: Wentao Guan <guanwentao@uniontech.com> Signed-off-by: Yuli Wang <wangyuli@uniontech.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2024-03-19 15:50:34 +08:00
Huacai Chen	9c68ece8b2	LoongArch: Define the __io_aw() hook as mmiowb() Commit `fb24ea52f7` ("drivers: Remove explicit invocations of mmiowb()") removed all mmiowb() in drivers, but it says: "NOTE: mmiowb() has only ever guaranteed ordering in conjunction with spin_unlock(). However, pairing each mmiowb() removal in this patch with the corresponding call to spin_unlock() is not at all trivial, so there is a small chance that this change may regress any drivers incorrectly relying on mmiowb() to order MMIO writes between CPUs using lock-free synchronisation." The mmio in radeon_ring_commit() is protected by a mutex rather than a spinlock, but in the mutex fastpath it behaves similar to spinlock. We can add mmiowb() calls in the radeon driver but the maintainer says he doesn't like such a workaround, and radeon is not the only example of mutex protected mmio. So we should extend the mmiowb tracking system from spinlock to mutex, and maybe other locking primitives. This is not easy and error prone, so we solve it in the architectural code, by simply defining the __io_aw() hook as mmiowb(). And we no longer need to override queued_spin_unlock() so use the generic definition. Without this, we get such an error when running 'glxgears' on weak ordering architectures such as LoongArch: radeon 0000:04:00.0: ring 0 stalled for more than 10324msec radeon 0000:04:00.0: ring 3 stalled for more than 10240msec radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000001f412 last fence id 0x000000000001f414 on ring 3) radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000000f940 last fence id 0x000000000000f941 on ring 0) radeon 0000:04:00.0: scheduling IB failed (-35). [drm:radeon_gem_va_ioctl [radeon]] ERROR Couldn't update BO_VA (-35) radeon 0000:04:00.0: scheduling IB failed (-35). [drm:radeon_gem_va_ioctl [radeon]] ERROR Couldn't update BO_VA (-35) radeon 0000:04:00.0: scheduling IB failed (-35). [drm:radeon_gem_va_ioctl [radeon]] ERROR Couldn't update BO_VA (-35) radeon 0000:04:00.0: scheduling IB failed (-35). [drm:radeon_gem_va_ioctl [radeon]] ERROR Couldn't update BO_VA (-35) radeon 0000:04:00.0: scheduling IB failed (-35). [drm:radeon_gem_va_ioctl [radeon]] ERROR Couldn't update BO_VA (-35) radeon 0000:04:00.0: scheduling IB failed (-35). [drm:radeon_gem_va_ioctl [radeon]] ERROR Couldn't update BO_VA (-35) radeon 0000:04:00.0: scheduling IB failed (-35). [drm:radeon_gem_va_ioctl [radeon]] ERROR Couldn't update BO_VA (-35) Link: https://lore.kernel.org/dri-devel/29df7e26-d7a8-4f67-b988-44353c4270ac@amd.com/T/#t Link: https://lore.kernel.org/linux-arch/20240301130532.3953167-1-chenhuacai@loongson.cn/T/#t Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2024-03-19 15:50:34 +08:00
Huacai Chen	82bf60a6fe	LoongArch: Remove superfluous flush_dcache_page() definition LoongArch doesn't have cache aliases, so flush_dcache_page() is a no-op. There is a generic implementation for this case in include/asm-generic/cacheflush.h. So remove the superfluous flush_dcache_page() definition, which also silences such build warnings: In file included from crypto/scompress.c:12: include/crypto/scatterwalk.h: In function 'scatterwalk_pagedone': include/crypto/scatterwalk.h:76:30: warning: variable 'page' set but not used [-Wunused-but-set-variable] 76 \| struct page *page; \| ^~~~ crypto/scompress.c: In function 'scomp_acomp_comp_decomp': >> crypto/scompress.c:174:38: warning: unused variable 'dst_page' [-Wunused-variable] 174 \| struct page *dst_page = sg_page(req->dst); \| Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202403091614.NeUw5zcv-lkp@intel.com/ Suggested-by: Barry Song <baohua@kernel.org> Acked-by: Barry Song <baohua@kernel.org> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2024-03-19 15:50:34 +08:00
Max Kellermann	d42ab9af60	LoongArch: Move {dmw,tlb}_virt_to_page() definition to page.h These two functions are implemented in pgtable.c, and they are needed only by the virt_to_page() macro in page.h. Having the prototypes in pgtable.h causes a circular dependency between page.h and pgtable.h, because the virt_to_page() macro in page.h needs pgtable.h for these two functions, while pgtable.h needs various definitions from page.h (e.g. pte_t and pgt_t). Let's avoid this circular dependency by moving the function prototypes to page.h. Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2024-03-19 15:50:34 +08:00
Huacai Chen	c87e12e0e8	LoongArch: Change __my_cpu_offset definition to avoid mis-optimization From GCC commit 3f13154553f8546a ("df-scan: remove ad-hoc handling of global regs in asms"), global registers will no longer be forced to add to the def-use chain. Then current_thread_info(), current_stack_pointer and __my_cpu_offset may be lifted out of the loop because they are no longer treated as "volatile variables". This optimization is still correct for the current_thread_info() and current_stack_pointer usages because they are associated to a thread. However it is wrong for __my_cpu_offset because it is associated to a CPU rather than a thread: if the thread migrates to a different CPU in the loop, __my_cpu_offset should be changed. Change __my_cpu_offset definition to treat it as a "volatile variable", in order to avoid such a mis-optimization. Cc: stable@vger.kernel.org Reported-by: Xiaotian Wu <wuxiaotian@loongson.cn> Reported-by: Miao Wang <shankerwangmiao@gmail.com> Signed-off-by: Xing Li <lixing@loongson.cn> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn> Signed-off-by: Rui Wang <wangrui@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2024-03-19 15:50:34 +08:00
Huacai Chen	f48ad26e5e	LoongArch: Select HAVE_ARCH_USERFAULTFD_MINOR in Kconfig This allocates the VM flag needed to support the userfaultfd minor fault functionality. See commit `7677f7fd8b` ("userfaultfd: add minor fault registration mode") for more information. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2024-03-19 15:50:34 +08:00
Huacai Chen	8b5db5e533	LoongArch: Select ARCH_HAS_CURRENT_STACK_POINTER in Kconfig LoongArch has implemented the current_stack_pointer macro, so select ARCH_HAS_CURRENT_STACK_POINTER in Kconfig. This will let it be used in non-arch places (like HARDENED_USERCOPY). Reviewed-by: Guo Ren <guoren@kernel.org> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2024-03-19 15:50:27 +08:00
Xuan Zhuo	5da7137de7	virtio_net: rename free_old_xmit_skbs to free_old_xmit Since free_old_xmit_skbs not only deals with skb, but also xdp frame and subsequent added xsk, so change the name of this function to free_old_xmit. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20240229072044.77388-19-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-03-19 03:19:22 -04:00
Xuan Zhuo	b1dc24aba7	virtio_net: unify the code for recycling the xmit ptr There are two completely similar and independent implementations. This is inconvenient for the subsequent addition of new types. So extract a function from this piece of code and call this function uniformly to recover old xmit ptr. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20240229072044.77388-18-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-03-19 03:19:22 -04:00
Jason Wang	0d197a1471	virtio-net: add cond_resched() to the command waiting loop Adding cond_resched() to the command waiting loop for a better co-operation with the scheduler. This allows to give CPU a breath to run other task(workqueue) instead of busy looping when preemption is not allowed on a device whose CVQ might be slow. Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230720083839.481487-3-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>	2024-03-19 03:19:22 -04:00
Jason Wang	b9f7425239	virtio-net: convert rx mode setting to use workqueue This patch convert rx mode setting to be done in a workqueue, this is a must for allow to sleep when waiting for the cvq command to response since current code is executed under addr spin lock. Note that we need to disable and flush the workqueue during freeze, this means the rx mode setting is lost after resuming. This is not the bug of this patch as we never try to restore rx mode setting during resume. Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230720083839.481487-2-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>	2024-03-19 03:19:22 -04:00
Xuan Zhuo	d5c0ed17fe	virtio: packed: fix unmap leak for indirect desc table When use_dma_api and premapped are true, then the do_unmap is false. Because the do_unmap is false, vring_unmap_extra_packed is not called by detach_buf_packed. if (unlikely(vq->do_unmap)) { curr = id; for (i = 0; i < state->num; i++) { vring_unmap_extra_packed(vq, &vq->packed.desc_extra[curr]); curr = vq->packed.desc_extra[curr].next; } } So the indirect desc table is not unmapped. This causes the unmap leak. So here, we check vq->use_dma_api instead. Synchronously, dma info is updated based on use_dma_api judgment This bug does not occur, because no driver use the premapped with indirect. Fixes: `b319940f83` ("virtio_ring: skip unmap for premapped") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Message-Id: <20240223071833.26095-1-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-03-19 03:19:22 -04:00
Zhu Lingshan	1ac61ddfee	vDPA: report virtio-blk flush info to user space This commit reports whether a virtio-blk device support cache flush command to user space Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-11-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-03-19 02:45:51 -04:00
Zhu Lingshan	ae1374b7f7	vDPA: report virtio-block read-only info to user space This commit report read-only information of virtio-blk devices to user space. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-10-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-03-19 02:45:51 -04:00
Zhu Lingshan	6bdc7846e6	vDPA: report virtio-block write zeroes configuration to user space This commits reports write zeroes configuration of virtio-block devices to user space, includes: 1)maximum write zeroes sectors size 2)maximum write zeroes segment number Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-9-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-03-19 02:45:51 -04:00
Zhu Lingshan	65848f46e1	vDPA: report virtio-block discarding configuration to user space This commit reports virtio-blk discarding configuration to user space,includes: 1) the maximum discard sectors 2) maximum number of discard segments for the block driver to use 3) the alignment for splitting a discarding request Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-8-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2024-03-19 02:45:51 -04:00

1264532 Commits