linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-09-22 12:44:11 +08:00

Author	SHA1	Message	Date
David S. Miller	969afb4347	Merge branch 'l2tp-misc-improvements' James Chapman says: ==================== l2tp: misc improvements This series makes several improvements to l2tp: * update documentation to be consistent with recent l2tp changes. * move l2tp_ip socket tables to per-net data. * fix handling of hash key collisions in l2tp_v3_session_get * implement and use get-next APIs for management and procfs/debugfs. * improve l2tp refcount helpers. * use per-cpu dev->tstats in l2tpeth devices. * fix a lockdep splat. * fix a race between l2tp_pre_exit_net and pppol2tp_release. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 04:38:50 +01:00
James Chapman	c1b2e36b87	l2tp: flush workqueue before draining it syzbot exposes a race where a net used by l2tp is removed while an existing pppol2tp socket is closed. In l2tp_pre_exit_net, l2tp queues TUNNEL_DELETE work items to close each tunnel in the net. When these are run, new SESSION_DELETE work items are queued to delete each session in the tunnel. This all happens in drain_workqueue. However, drain_workqueue allows only new work items if they are queued by other work items which are already in the queue. If pppol2tp_release runs after drain_workqueue has started, it may queue a SESSION_DELETE work item, which results in the warning below in drain_workqueue. Address this by flushing the workqueue before drain_workqueue such that all queued TUNNEL_DELETE work items run before drain_workqueue is started. This will queue SESSION_DELETE work items for each session in the tunnel, hence pppol2tp_release or other API requests won't queue SESSION_DELETE requests once drain_workqueue is started. WARNING: CPU: 1 PID: 5467 at kernel/workqueue.c:2259 __queue_work+0xcd3/0xf50 kernel/workqueue.c:2258 Modules linked in: CPU: 1 UID: 0 PID: 5467 Comm: syz.3.43 Not tainted 6.11.0-rc1-syzkaller-00247-g3608d6aca5e7 #0 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024 RIP: 0010:__queue_work+0xcd3/0xf50 kernel/workqueue.c:2258 Code: ff e8 11 84 36 00 90 0f 0b 90 e9 1e fd ff ff e8 03 84 36 00 eb 13 e8 fc 83 36 00 eb 0c e8 f5 83 36 00 eb 05 e8 ee 83 36 00 90 <0f> 0b 90 48 83 c4 60 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc RSP: 0018:ffffc90004607b48 EFLAGS: 00010093 RAX: ffffffff815ce274 RBX: ffff8880661fda00 RCX: ffff8880661fda00 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffffffff815cd6d4 R09: 0000000000000000 R10: ffffc90004607c20 R11: fffff520008c0f85 R12: ffff88802ac33800 R13: ffff88802ac339c0 R14: dffffc0000000000 R15: 0000000000000008 FS: 00005555713eb500(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 000000001eda6000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> queue_work_on+0x1c2/0x380 kernel/workqueue.c:2392 pppol2tp_release+0x163/0x230 net/l2tp/l2tp_ppp.c:445 __sock_release net/socket.c:659 [inline] sock_close+0xbc/0x240 net/socket.c:1421 __fput+0x24a/0x8a0 fs/file_table.c:422 task_work_run+0x24f/0x310 kernel/task_work.c:228 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop kernel/entry/common.c:114 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0x168/0x370 kernel/entry/common.c:218 do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f061e9779f9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffff1c1fce8 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4 RAX: 0000000000000000 RBX: 000000000001017d RCX: 00007f061e9779f9 RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003 RBP: 00007ffff1c1fdc0 R08: 0000000000000001 R09: 00007ffff1c1ffcf R10: 00007f061e800000 R11: 0000000000000246 R12: 0000000000000032 R13: 00007ffff1c1fde0 R14: 00007ffff1c1fe00 R15: ffffffffffffffff </TASK> Fixes: `fc7ec7f554` ("l2tp: delete sessions using work queue") Reported-by: syzbot+0e85b10481d2f5478053@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0e85b10481d2f5478053 Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 04:38:50 +01:00
James Chapman	dcc59d3e32	l2tp: l2tp_eth: use per-cpu counters from dev->tstats l2tp_eth uses old-style dev->stats for fastpath packet/byte counters. Convert it to use dev->tstats per-cpu counters. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 04:38:50 +01:00
James Chapman	abe7a1a7d0	l2tp: improve tunnel/session refcount helpers l2tp_tunnel_inc_refcount and l2tp_session_inc_refcount wrap refcount_inc. They add no value so just use the refcount APIs directly and drop l2tp's helpers. l2tp already uses refcount_inc_not_zero anyway. Rename l2tp_tunnel_dec_refcount and l2tp_session_dec_refcount to l2tp_tunnel_put and l2tp_session_put to better match their use pairing various _get getters. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 04:38:50 +01:00
James Chapman	1f4c3dce91	l2tp: use get_next APIs for management requests and procfs/debugfs l2tp netlink and procfs/debugfs iterate over tunnel and session lists to obtain data. They currently use very inefficient get_nth functions to do so. Replace these with get_next. For netlink, use nl cb->ctx[] for passing state instead of the obsolete cb->args[]. l2tp_tunnel_get_nth and l2tp_session_get_nth are no longer used so they can be removed. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 04:38:49 +01:00
James Chapman	aa92c1cec9	l2tp: add tunnel/session get_next helpers l2tp management APIs and procfs/debugfs iterate over l2tp tunnel and session lists. Since these lists are now implemented using IDR, we can use IDR get_next APIs to iterate them. Add tunnel/session get_next functions to do so. The session get_next functions get the next session in a given tunnel and need to account for l2tpv2 and l2tpv3 differences: * l2tpv2 sessions are keyed by tunnel ID / session ID. Iteration for a given tunnel ID, TID, can therefore start with a key given by TID/0 and finish when the next entry's tunnel ID is not TID. This is possible only because the tunnel ID part of the key is the upper 16 bits and the session ID part the lower 16 bits; when idr_next increments the key value, it therefore finds the next sessions of the current tunnel before those of the next tunnel. Entries with session ID 0 are always skipped because they are used internally by pppol2tp. * l2tpv3 sessions are keyed by session ID. Iteration starts at the first IDR entry and skips entries where the tunnel does not match. Iteration must also consider session ID collisions and walk the list of colliding sessions (if any) for one which matches the supplied tunnel. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 04:38:49 +01:00
James Chapman	b0a8deda06	l2tp: handle hash key collisions in l2tp_v3_session_get To handle colliding l2tpv3 session IDs, l2tp_v3_session_get searches a hashed list keyed by ID and sk. Although unlikely, if hash keys collide, it is possible that hash_for_each_possible loops over a session which doesn't have the ID that we are searching for. So check for session ID match when looping over possible hash key matches. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 04:38:49 +01:00
James Chapman	ebed6606b9	l2tp: move l2tp_ip and l2tp_ip6 data to pernet l2tp_ip[6] have always used global socket tables. It is therefore not possible to create l2tpip sockets in different namespaces with the same socket address. To support this, move l2tpip socket tables to pernet data. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 04:38:49 +01:00
James Chapman	168464c19e	l2tp: remove inline from functions in c sources Update l2tp to remove the inline keyword from several functions in C sources, since this is now discouraged. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 04:38:49 +01:00
James Chapman	e2b1762cf3	documentation/networking: update l2tp docs l2tp no longer uses sk_user_data in tunnel sockets and now manages tunnel/session lifetimes slightly differently. Update docs to cover this. CC: linux-doc@vger.kernel.org CC: corbet@lwn.net Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 04:38:49 +01:00
Jakub Kicinski	bbfeba2603	Merge branch 'mlx5-misc-patches-2024-08-08' Tariq Toukan says: ==================== mlx5 misc patches 2024-08-08 This patchset contains multiple enhancements from the team to the mlx5 core and Eth drivers. Patch #1 by Chris bumps a defined value to permit more devices doing TC offloads. Patch #2 by Jianbo adds an IPsec fast-path optimization to replace the slow async handling. Patches #3 and #4 by Jianbo add TC offload support for complicated rules to overcome firmware limitation. Patch #5 by Gal unifies the access macro to advertised/supported link modes. Patches #6 to #9 by Gal adds extack messages in ethtool ops to replace prints to the kernel log. Patch #10 by Cosmin switches to using 'update' verb instead of 'replace' to better reflect the operation. Patch #11 by Cosmin exposes an update connection tracking operation to replace the assumed delete+add implementaiton. ==================== Link: https://patch.msgid.link/20240808055927.2059700-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:13:17 -07:00
Cosmin Ratiu	6b5662b759	net/mlx5e: CT: Update connection tracking steering entries Previously, replacing a connection tracking steering entry was done by adding a new rule (with the same tag but possibly different mod hdr actions/labels) then removing the old rule. This approach doesn't work in hardware steering because two steering entries with the same tag cannot coexist in a hardware steering table. This commit prepares for that by adding a new ct_rule_update operation on the ct_fs_ops struct which is used instead of add+delete. Implementations for both dmfs (firmware steering) and smfs (software steering) are provided, which simply add the new rule and delete the old one. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808055927.2059700-12-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:13:15 -07:00
Cosmin Ratiu	486aeb2db5	net/mlx5e: CT: 'update' rules instead of 'replace' Offloaded rules can be updated with a new modify header action containing a changed restore cookie. This was done using the verb 'replace', while in some configurations 'update' is a better fit. This commit renames the functions used to reflect that. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808055927.2059700-11-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:13:15 -07:00
Gal Pressman	b5100b72da	net/mlx5e: Use extack in get module eeprom by page callback In case of errors in get module eeprom by page, reflect it through extack instead of a dmesg print. While at it, make the messages more human friendly. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808055927.2059700-10-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:13:15 -07:00
Gal Pressman	9c4298b466	net/mlx5e: Use extack in set coalesce callback In case of errors in set coalesce, reflect it through extack instead of a dmesg print. While at it, make the messages more human friendly. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808055927.2059700-9-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:13:15 -07:00
Gal Pressman	29a943d71d	net/mlx5e: Use extack in get coalesce callback In case of errors in get coalesce, reflect it through extack instead of a dmesg print. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808055927.2059700-8-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:13:15 -07:00
Gal Pressman	ab666b5287	net/mlx5e: Use extack in set ringparams callback In case of errors in set ringparams, reflect it through extack instead of a dmesg print. While at it, make the messages more human friendly and remove two redundant checks that are already validated by the core. Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808055927.2059700-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:13:15 -07:00
Gal Pressman	4384bcff03	net/mlx5e: Be consistent with bitmap handling of link modes Use the bitmap operations when accessing the advertised/supported link modes and remove places that access them as arrays of unsigned longs (underlying implementation of the bitmap), this makes the code much more readable and clear. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808055927.2059700-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:13:15 -07:00
Jianbo Liu	b11bde5624	net/mlx5e: TC, Offload rewrite and mirror to both internal and external dests Firmware has the limitation that it cannot offload a rule with rewrite and mirror to internal and external destinations simultaneously. This patch adds a workaround to this issue. Here the destination array is split again, just like what's done in previous commit, but after the action indexed by split_count - 1. An extra rule is added for the leftover destinations. Such rule can be offloaded, even there are destinations to both internal and external destinations, because the header rewrite is left in the original FTE. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808055927.2059700-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:13:14 -07:00
Jianbo Liu	16bb8c6133	net/mlx5e: TC, Offload rewrite and mirror on tunnel over ovs internal port To offload the encap rule when the tunnel IP is configured on an openvswitch internal port, driver need to overwrite vport metadata in reg_c0 to the value assigned to the internal port, then forward packets to root table to be processed again by the rules matching on the metadata for such internal port. When such rule is combined with header rewrite and mirror, openvswitch generates the rule like the following, because it resets mirror after packets are modified. in_port(enp8s0f0npf0sf1),.., actions:enp8s0f0npf0sf2,set(tunnel(...)),set(ipv4(...)),vxlan_sys_4789,enp8s0f0npf0sf2 The split_count was introduced before to support rewrite and mirror. Driver splits the rule into two different hardware rules in order to offload it. But it's not enough to offload the above complicated rule because of the limitations, in both driver and firmware. To resolve this issue, the destination array is split again after the destination indexed by split_count. An extra rule is added for the leftover destinations (in the above example, it is enp8s0f0npf0sf2), and is inserted to post_act table. And the extra destination is added in the original rule to forward to post_act table, so the extra mirror is done there. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808055927.2059700-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:13:14 -07:00
Jianbo Liu	88c46f6103	net/mlx5e: Enable remove flow for hard packet limit In the commit `a2a73ea14b` ("net/mlx5e: Don't listen to remove flows event"), remove_flow_enable event is removed, and the hard limit usually relies on software mechanism added in commit `b2f7b01d36` ("net/mlx5e: Simulate missing IPsec TX limits hardware functionality"). But the delayed work is rescheduled every one second, which is slow for fast traffic. As a result, traffic can't be blocked even reaches the hard limit, which usually happens when soft and hard limits are very close. In reality it won't happen because soft limit is much lower than hard limit. But, as an optimization for RX to block traffic when reaching hard limit, need to set remove_flow_enable. When remove flow is enabled, IPSEC HARD_LIFETIME ASO syndrome will be set in the metadata defined in the ASO return register if packets reach hard lifetime threshold. And those packets are dropped immediately by the steering table. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808055927.2059700-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:13:14 -07:00
Chris Mi	6e20d538fb	net/mlx5: E-Switch, Increase max int port number for offload Currently MLX5E_TC_MAX_INT_PORT_NUM is 8. Usually int port has one ingress and one egress rules. But sometimes, a temporary rule can be offloaded as well, eg: recirc_id(0),in_port(br-phy),eth(src=10:70:fd:87:57:c0,dst=33:33:00:00:00:16), eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:180, used:0.060s, actions:enp8s0f0 If one int port device offloads 3 rules, only 2 devices can offload. Other devices will hit the limit and fail to offload. Actually it is insufficient for customers. So increase the number to 32. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20240808055927.2059700-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:13:14 -07:00
Rosen Penev	1862923bf6	net: ag71xx: use phylink_mii_ioctl `f1294617d2` removed the custom function for ndo_eth_ioctl and used the standard phy_do_ioctl which calls phy_mii_ioctl. However since then, this driver was ported to phylink where it makes more sense to call phylink_mii_ioctl. Bring back custom function that calls phylink_mii_ioctl. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240807215834.33980-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:10:09 -07:00
Jakub Kicinski	c89c6757cf	Merge branch 'ibmvnic-ibmvnic-rr-patchset' Nick Child says: ==================== ibmvnic: ibmvnic rr patchset v1 - https://lore.kernel.org/netdev/20240801212340.132607-1-nnac123@linux.ibm.com/ v2 - https://lore.kernel.org/netdev/20240806193706.998148-1-nnac123@linux.ibm.com/ ==================== Link: https://patch.msgid.link/20240807211809.1259563-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:09:25 -07:00
Nick Child	e633e32b60	ibmvnic: Perform tx CSO during send scrq direct During initialization with the vnic server, a bitstring is communicated to the client regarding header info needed during CSO (See "VNIC Capabilities" in PAPR). Most of the time, to be safe, vnic server requests header info for CSO. When header info is needed, multiple TX descriptors are required per skb; This limits the driver to use send_subcrq_indirect instead of send_subcrq_direct. Previously, the vnic server request for header info was ignored. This allowed the use of send_sub_crq_direct. Transmissions were successful because the bitstring returned by vnic server is broad and over cautionary. It was observed that mlx backing devices could actually transmit and handle CSO packets without the vnic server receiving header info (despite the fact that the bitstring requested it). There was a trust issue: The bitstring was overcautionary. This extra precaution (requesting header info when the backing device may not use it) comes at the cost of performance (using direct vs indirect hcalls has a 30% delta in small packet RR transaction rate). So it has been requested that the vnic server team tries to ensure that the bitstring is more exact. In the meantime, disable CSO when it is possible to use the skb in the send_subcrq_direct path. In other words, calculate the checksum before handing the packet to FW when the packet is not segmented and xmit_more is false. Since the code path is only possible if the skb is non GSO and xmit_more is false, the cost of doing checksum in the send_subcrq_direct path is minimal. Any large segmented skb will have xmit_more set to true more frequently and it is inexpensive to do checksumming on a small skb. The worst-case workload would be a 9000 MTU TCP_RR test with close to MTU sized packets (and TSO off). This allows xmit_more to be false more frequently and open the code path up to use send_subcrq_direct. Observing trace data (graph-time = 1) and packet rate with this workload shows minimal performance degradation: 1. NIC does checksum w headers, safely use send_subcrq_indirect: - Packet rate: 631k txs - Trace data: ibmvnic_xmit = 44344685.87 us / 6234576 hits = AVG 7.11 us skb_checksum_help = 4.07 us / 2 hits = AVG 2.04 us ^ Notice hits, tracing this just for reassurance ibmvnic_tx_scrq_flush = 33040649.69 us / 5638441 hits = AVG 5.86 us send_subcrq_indirect = 37438922.24 us / 6030859 hits = AVG 6.21 us 2. NIC does checksum w/o headers, dangerously use send_subcrq_direct: - Packet rate: 831k txs - Trace data: ibmvnic_xmit = 48940092.29 us / `8187630` hits = AVG 5.98 us skb_checksum_help = 2.03 us / 1 hits = AVG 2.03 ibmvnic_tx_scrq_flush = 31141879.57 us / 7948960 hits = AVG 3.92 us send_subcrq_indirect = 8412506.03 us / 728781 hits = AVG 11.54 ^ notice hits is much lower b/c send_subcrq_direct was called ^ wasn't traceable 3. driver does checksum, safely use send_subcrq_direct (THIS PATCH): - Packet rate: 829k txs - Trace data: ibmvnic_xmit = 56696077.63 us / `8066168` hits = AVG 7.03 us skb_checksum_help = 8587456.16 us / 7526072 hits = AVG 1.14 us ibmvnic_tx_scrq_flush = 30219545.55 us / 7782409 hits = AVG 3.88 us send_subcrq_indirect = 8638326.44 us / 763693 hits = AVG 11.31 us When the bitstring ever specifies that CSO does not require headers (dependent on VIOS vnic server changes), then this patch should be removed and replaced with one that investigates the bitstring before using send_subcrq_direct. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Link: https://patch.msgid.link/20240807211809.1259563-8-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:09:18 -07:00
Nick Child	1c33e29245	ibmvnic: Only record tx completed bytes once per handler Byte Queue Limits depends on dql_completed being called once per tx completion round in order to adjust its algorithm appropriately. The dql->limit value is an approximation of the amount of bytes that the NIC can consume per irq interval. If this approximation is too high then the NIC will become over-saturated. Too low and the NIC will starve. The dql->limit depends on dql->prev-* stats to calculate an optimal value. If dql_completed() is called more than once per irq handler then those prev-* values become unreliable (because they are not an accurate representation of the previous state of the NIC) resulting in a sub-optimal limit value. Therefore, move the call to netdev_tx_completed_queue() to the end of ibmvnic_complete_tx(). When performing 150 sessions of TCP rr (request-response 1 byte packets) workloads, one could observe: PREVIOUSLY: - limit and inflight values hovering around 130 - transaction rate of around 750k pps. NOW: - limit rises and falls in response to inflight (130-900) - transaction rate of around 1M pps (33% improvement) Signed-off-by: Nick Child <nnac123@linux.ibm.com> Link: https://patch.msgid.link/20240807211809.1259563-7-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:09:18 -07:00
Nick Child	74839f7a82	ibmvnic: Introduce send sub-crq direct Firmware supports two hcalls to send a sub-crq request: H_SEND_SUB_CRQ_INDIRECT and H_SEND_SUB_CRQ. The indirect hcall allows for submission of batched messages while the other hcall is limited to only one message. This protocol is defined in PAPR section 17.2.3.3. Previously, the ibmvnic xmit function only used the indirect hcall. This allowed the driver to batch it's skbs. A single skb can occupy a few entries per hcall depending on if FW requires skb header information or not. The FW only needs header information if the packet is segmented. By this logic, if an skb is not GSO then it can fit in one sub-crq message and therefore is a candidate for H_SEND_SUB_CRQ. Batching skb transmission is only useful when there are more packets coming down the line (ie netdev_xmit_more is true). As it turns out, H_SEND_SUB_CRQ induces less latency than H_SEND_SUB_CRQ_INDIRECT. Therefore, use H_SEND_SUB_CRQ where appropriate. Small latency gains seen when doing TCP_RR_150 (request/response workload). Ftrace results (graph-time=1): Previous: ibmvnic_xmit = 29618270.83 us / 8860058.0 hits = AVG 3.34 ibmvnic_tx_scrq_flush = 21972231.02 us / 6553972.0 hits = AVG 3.35 Now: ibmvnic_xmit = 22153350.96 us / 8438942.0 hits = AVG 2.63 ibmvnic_tx_scrq_flush = 15858922.4 us / 6244076.0 hits = AVG 2.54 Signed-off-by: Nick Child <nnac123@linux.ibm.com> Link: https://patch.msgid.link/20240807211809.1259563-6-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:09:18 -07:00
Nick Child	6e7a57581a	ibmvnic: Remove duplicate memory barriers in tx send_subcrq_[in]direct() already has a dma memory barrier. Remove the earlier one. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Link: https://patch.msgid.link/20240807211809.1259563-5-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:09:18 -07:00
Nick Child	d95f749a0b	ibmvnic: Reduce memcpys in tx descriptor generation Previously when creating the header descriptors, the driver would: 1. allocate a temporary buffer on the stack (in build_hdr_descs_arr) 2. memcpy the header info into the temporary buffer (in build_hdr_data) 3. memcpy the temp buffer into a local variable (in create_hdr_descs) 4. copy the local variable into the return buffer (in create_hdr_descs) Since, there is no opportunity for errors during this process, the temp buffer is not needed and work can be done on the return buffer directly. Repurpose build_hdr_data() to only calculate the header lengths. Rename it to get_hdr_lens(). Edit create_hdr_descs() to read from the skb directly and copy directly into the returned useful buffer. The process now involves less memory and write operations while also being more readable. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Link: https://patch.msgid.link/20240807211809.1259563-4-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:09:18 -07:00
Nick Child	b41b45ecee	ibmvnic: Use header len helper functions on tx Use the header length helper functions rather than trying to calculate it within the driver. There are defined functions for mac and network headers (skb_mac_header_len and skb_network_header_len) but no such function exists for the transport header length. Also, hdr_data was memset during allocation to all 0's so no need to memset again. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Link: https://patch.msgid.link/20240807211809.1259563-3-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:09:18 -07:00
Nick Child	dda10fc801	ibmvnic: Only replenish rx pool when resources are getting low Previously, the driver would replenish the rx pool if the polling function consumed less than the budget. The logic being that the driver did not exhaust its budget so that must mean that the driver is not busy and has cycles to spare for replenishing the pool. So pool replenishment happens on every poll which did not consume the budget. This can very costly during request-response tests. In fact, an extra ~100pps can be seen in TCP_RR_150 tests when we remove this conditional. Trace results (ftrace, graph-time=1) for the poll function are below: Previous results: ibmvnic_poll = 64951846.0 us / 4167628.0 hits = AVG 15.58 replenish_rx_pool = 17602846.0 us / 4710437.0 hits = AVG 3.74 Now: ibmvnic_poll = 57673941.0 us / 4791737.0 hits = AVG 12.04 replenish_rx_pool = 3938171.6 us / 4314.0 hits = AVG 912.88 While the replenish function takes longer, it is hit less frequently meaning the ibmvnic_poll function, on average, is faster. Furthermore, this change does not have a negative effect on performance bandwidth/latency measurements. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Link: https://patch.msgid.link/20240807211809.1259563-2-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:09:17 -07:00
Christophe Leroy	c146f3d191	net: fs_enet: Fix warning due to wrong type Building fs_enet on powerpc e500 leads to following warning: CC drivers/net/ethernet/freescale/fs_enet/mac-scc.o In file included from ./include/linux/build_bug.h:5, from ./include/linux/container_of.h:5, from ./include/linux/list.h:5, from ./include/linux/module.h:12, from drivers/net/ethernet/freescale/fs_enet/mac-scc.c:15: drivers/net/ethernet/freescale/fs_enet/mac-scc.c: In function 'allocate_bd': ./include/linux/err.h:28:49: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] 28 \| #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO) \| ^ ./include/linux/compiler.h:77:45: note: in definition of macro 'unlikely' 77 \| # define unlikely(x) __builtin_expect(!!(x), 0) \| ^ drivers/net/ethernet/freescale/fs_enet/mac-scc.c:138:13: note: in expansion of macro 'IS_ERR_VALUE' 138 \| if (IS_ERR_VALUE(fep->ring_mem_addr)) \| ^~~~~~~~~~~~ This is due to fep->ring_mem_addr not being a pointer but a DMA address which is 64 bits on that platform while pointers are 32 bits as this is a 32 bits platform with wider physical bus. However, using fep->ring_mem_addr is just wrong because cpm_muram_alloc() returns an offset within the muram and not a physical address directly. So use fpi->dpram_offset instead. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/ec67ea3a3bef7e58b8dc959f7c17d405af0d27e4.1723101144.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:08:16 -07:00
zhangxiangqian	2d5c9dd2cd	net: usb: cdc_ether: don't spew notifications The usbnet_link_change function is not called, if the link has not changed. ... [16913.807393][ 3] cdc_ether 1-2:2.0 enx00e0995fd1ac: kevent 12 may have been dropped [16913.822266][ 2] cdc_ether 1-2:2.0 enx00e0995fd1ac: kevent 12 may have been dropped [16913.826296][ 2] cdc_ether 1-2:2.0 enx00e0995fd1ac: kevent 11 may have been dropped ... kevent 11 is scheduled too frequently and may affect other event schedules. Signed-off-by: zhangxiangqian <zhangxiangqian@kylinos.cn> Acked-by: Oliver Neukum <oneukum@suse.com> Link: https://patch.msgid.link/1723109985-11996-1-git-send-email-zhangxiangqian@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 22:01:01 -07:00
Mina Almasry	916b7d31f7	ethtool: refactor checking max channels Currently ethtool_set_channel calls separate functions to check whether the new channel number violates rss configuration or flow steering configuration. Very soon we need to check whether the new channel number violates memory provider configuration as well. To do all 3 checks cleanly, add a wrapper around ethtool_get_max_rxnfc_channel() and ethtool_get_max_rxfh_channel(), which does both checks. We can later extend this wrapper to add the memory provider check in one place. Note that in the current code, we put a descriptive genl error message when we run into issues. To preserve the error message, we pass the genl_info* to the common helper. The ioctl calls can pass NULL instead. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20240808205345.2141858-1-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-09 21:52:13 -07:00
David S. Miller	600a919310	Merge branch 'selftest-rds' Allison Henderson says: ==================== selftests: rds selftest This series is a new selftest that Vegard, Chuck and myself have been working on to provide some test coverage for rds. I've modified the scripts to include the feedback from the last version, but let me know if there's anything missed. Questions and comments appreciated. Thanks everyone! Allison Changes in v2: - Removed qemu vm creation and related code - Updated README.txt with examples of running the test with virtme - Removed init.sh. run.sh now directly calls test.py - Some clean up done with the return code handling since there is no vm between the scripts anymore - Imported ip python function in tools/testing/selftests/net/lib/py/utils.py into test.py - Adapted test.py to use the imported ip function, and removed the local ip wrapper - Some line wrap clean up - Link to v1: https://lore.kernel.org/netdev/20240626012834.5678-3-allison.henderson@oracle.com/T ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-09 13:18:46 +01:00
Vegard Nossum	3ade6ce125	selftests: rds: add testing infrastructure This adds some basic self-testing infrastructure for RDS-TCP. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-09 13:18:46 +01:00
Vegard Nossum	bc75dcc3ce	net: rds: add option for GCOV profiling To better our unit tests we need code coverage to be part of the kernel. This patch borrows heavily from how CONFIG_GCOV_PROFILE_FTRACE is implemented Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-09 13:18:46 +01:00
Vegard Nossum	a0f6e5e9f1	.gitignore: add .gcda files These files contain the runtime coverage data generated by gcov. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-09 13:18:46 +01:00
Simon Horman	36fb51479e	net: stmmac: xgmac: use const char arrays for string constants Jiri Slaby advises me that the preferred mechanism for declaring string constants is static char arrays, so use that here. This mostly reverts commit `1692b9775e` ("net: stmmac: xgmac: use #define for string constants") That commit was a fix for commit `46eba193d0` ("net: stmmac: xgmac: fix handling of DPP safety error for DMA channels"). The fix being replacing const char * with #defines in order to address compilation failures observed on GCC 6 through 10. Compile tested only. No functional change intended. Suggested-by: Jiri Slaby <jirislaby@kernel.org> Link: https://lore.kernel.org/netdev/485dbc5a-a04b-40c2-9481-955eaa5ce2e2@kernel.org/ Signed-off-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-09 12:07:57 +01:00
Pawel Dembicki	2524d6c28b	net: dsa: vsc73xx: use defined values in phy operations This commit changes magic numbers in phy operations. Some shifted registers was replaced with bitfield macros. No functional changes done. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-09 12:03:33 +01:00
Matthias Schiffer	eb3ab13d99	net: ti: icssg_prueth: populate netdev of_node Allow of_find_net_device_by_node() to find icssg_prueth ports and make the individual ports' of_nodes available in sysfs. Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240807121215.3178964-1-matthias.schiffer@ew.tq-group.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-08 19:59:31 -07:00
Christophe JAILLET	0961257604	net: sungem_phy: Constify struct mii_phy_def 'struct mii_phy_def' are not modified in this driver. Constifying these structures moves some data to a read-only section, so increase overall security. While at it fix the checkpatch warning related to this patch (some missing newlines and spaces around *) On a x86_64, with allmodconfig: Before: ====== 27709 928 0 28637 6fdd drivers/net/sungem_phy.o After: ===== text data bss dec hex filename 28157 476 0 28633 6fd9 drivers/net/sungem_phy.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/54c3b30930f80f4895e6fa2f4234714fdea4ef4e.1723033266.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-08 19:59:06 -07:00
Edward Cree	ceb627435b	net: ethtool: check rxfh_max_num_contexts != 1 at register time A value of 1 doesn't make sense, as it implies the only allowed context ID is 0, which is reserved for the default context - in which case the driver should just not claim to support custom RSS contexts at all. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/c07725b3a3d0b0a63b85e230f9c77af59d4d07f8.1723045898.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-08 19:55:49 -07:00
Rosen Penev	df665ab188	net: atlantic: use ethtool_sprintf Allows simplifying get_strings and avoids manual pointer manipulation. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20240807190303.6143-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-08 19:53:49 -07:00
Simon Horman	a39036847f	bnx2x: Provide declaration of dmae_reg_go_c in header Provide declaration of dmae_reg_go_c in header. This symbol is defined in bnx2x_main.c. And used in that file and bnx2x_stats.c. However, Sparse complains that there is no declaration of the symbol in dmae_reg_go_c nor is the symbol static. .../bnx2x_main.c:291:11: warning: symbol 'dmae_reg_go_c' was not declared. Should it be static? Address this by moving the declaration from bnx2x_stats.c to bnx2x_reg.h. No functional change intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240806-bnx2x-dec-v1-1-ae844ec785e4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-08 19:43:06 -07:00
Jakub Kicinski	e47fd9beb1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts or adjacent changes. Link: https://patch.msgid.link/20240808170148.3629934-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-08 14:04:17 -07:00
Linus Torvalds	ee9a43b7cf	Including fixes from bluetooth. Current release - regressions: - eth: bnxt_en: fix memory out-of-bounds in bnxt_fill_hw_rss_tbl() on older chips Current release - new code bugs: - ethtool: fix off-by-one error / kdoc contradicting the code for max RSS context IDs - Bluetooth: hci_qca: - QCA6390: fix support on non-DT platforms - QCA6390: don't call pwrseq_power_off() twice - fix a NULL-pointer derefence at shutdown - eth: ice: fix incorrect assigns of FEC counters Previous releases - regressions: - mptcp: fix handling endpoints with both 'signal' and 'subflow' flags set - virtio-net: fix changing ring count when vq IRQ coalescing not supported - eth: gve: fix use of netif_carrier_ok() during reconfig / reset Previous releases - always broken: - eth: idpf: fix bugs in queue re-allocation on reconfig / reset - ethtool: fix context creation with no parameters Misc: - linkwatch: use system_unbound_wq to ease RTNL contention Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAma0+MsACgkQMUZtbf5S Iru7UA//e+qaQ4yYQUZnilt4L+HdlMvFGnj6YV2rlXtG2o4leoKBtrvV4Ov5Hrj+ Lzc9Cz4DQBi3tWvaD8SR2wEPYVmUaQm2DLgxTv3NrwSwbL+yd87y6AqnrxyjLglX E/Lg0FKxJC3zVb4Jb6FOX9KwIP60u+kwL5pev7UbpzNU4aVpNovWKo1sICLFYC5L 2rlCWmd2UHDSxDYIqL3nI68IqmhP5etfJnnURxIzJZInKSLRx4Evkke/3Y+Ju8CE LS5HIsTzYmAmgI0+LLPd9fyL8puYuw+tFGMAKSRHLAlzno2Cpre5IEucfbFhQ81+ 7RfJO+lzTvqcIj17bK+QsgAgQUxlRexdct/0KfBG6O/Ll0ptRGl475X5K8hb471r m3WLtNkByYznKpr6ORvNoGqTSAVJpx0Xdl0YWt5G5ANWCMi2kA51K9JOHRNfOnvZ dq0WMu/kwBq8XJ/l0FjB2lT41bhkOzdbfHiqi60miuixzU3e8rQbs59zHr3wmio2 7NAVfmOPJTILHpyJ4c3DZrKNkxyXnIS04OyDo4SO6QwvJ/XbXi22/ir/3NyKgOTG NqIeA2Ap96n0NzqqUrRMGdSPoKW8Cbl7zNO1opHJj/l57XPMVz+g9qA5mbSCT5k1 fNu2lURKzRLcoq5OmwySLJR2ASMfpaLpmooFYBrtftPVwl4YQvo= =bKTC -----END PGP SIGNATURE----- Merge tag 'net-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth. Current release - regressions: - eth: bnxt_en: fix memory out-of-bounds in bnxt_fill_hw_rss_tbl() on older chips Current release - new code bugs: - ethtool: fix off-by-one error / kdoc contradicting the code for max RSS context IDs - Bluetooth: hci_qca: - QCA6390: fix support on non-DT platforms - QCA6390: don't call pwrseq_power_off() twice - fix a NULL-pointer derefence at shutdown - eth: ice: fix incorrect assigns of FEC counters Previous releases - regressions: - mptcp: fix handling endpoints with both 'signal' and 'subflow' flags set - virtio-net: fix changing ring count when vq IRQ coalescing not supported - eth: gve: fix use of netif_carrier_ok() during reconfig / reset Previous releases - always broken: - eth: idpf: fix bugs in queue re-allocation on reconfig / reset - ethtool: fix context creation with no parameters Misc: - linkwatch: use system_unbound_wq to ease RTNL contention" * tag 'net-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (41 commits) net: dsa: microchip: disable EEE for KSZ8567/KSZ9567/KSZ9896/KSZ9897. ethtool: Fix context creation with no parameters net: ethtool: fix off-by-one error in max RSS context IDs net: pse-pd: tps23881: include missing bitfield.h header net: fec: Stop PPS on driver remove net: bcmgenet: Properly overlay PHY and MAC Wake-on-LAN capabilities l2tp: fix lockdep splat net: stmmac: dwmac4: fix PCS duplex mode decode idpf: fix UAFs when destroying the queues idpf: fix memleak in vport interrupt configuration idpf: fix memory leaks and crashes while performing a soft reset bnxt_en : Fix memory out-of-bounds in bnxt_fill_hw_rss_tbl() net: dsa: bcm_sf2: Fix a possible memory leak in bcm_sf2_mdio_register() net/smc: add the max value of fallback reason count Bluetooth: hci_sync: avoid dup filtering when passive scanning with adv monitor Bluetooth: l2cap: always unlock channel in l2cap_conless_channel() Bluetooth: hci_qca: fix a NULL-pointer derefence at shutdown Bluetooth: hci_qca: fix QCA6390 support on non-DT platforms Bluetooth: hci_qca: don't call pwrseq_power_off() twice for QCA6390 ice: Fix incorrect assigns of FEC counts ...	2024-08-08 13:51:44 -07:00
Linus Torvalds	9466b6ae6b	tracing fixes for v6.11: - Have reading of event format files test if the meta data still exists. When a event is freed, a flag (EVENT_FILE_FL_FREED) in the meta data is set to state that it is to prevent any new references to it from happening while waiting for existing references to close. When the last reference closes, the meta data is freed. But the "format" was missing a check to this flag (along with some other files) that allowed new references to happen, and a use-after-free bug to occur. - Have the trace event meta data use the refcount infrastructure instead of relying on its own atomic counters. - Have tracefs inodes use alloc_inode_sb() for allocation instead of using kmem_cache_alloc() directly. - Have eventfs_create_dir() return an ERR_PTR instead of NULL as the callers expect a real object or an ERR_PTR. - Have release_ei() use call_srcu() and not call_rcu() as all the protection is on SRCU and not RCU. - Fix ftrace_graph_ret_addr() to use the task passed in and not current. - Fix overflow bug in get_free_elt() where the counter can overflow the integer and cause an infinite loop. - Remove unused function ring_buffer_nr_pages() - Have tracefs freeing use the inode RCU infrastructure instead of creating its own. When the kernel had randomize structure fields enabled, the rcu field of the tracefs_inode was overlapping the rcu field of the inode structure, and corrupting it. Instead, use the destroy_inode() callback to do the initial cleanup of the code, and then have free_inode() free it. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZrTvXxQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qu39AP9ze6ELpShDrxbXhf0adbNqG2IXMepa MMLqfq8tU8E/vAEAuZXJ6rKXeGvKeONa06ocvWJ0dpb2cy/n4hmx+KtM5gI= =Pkh4 -----END PGP SIGNATURE----- Merge tag 'trace-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Have reading of event format files test if the metadata still exists. When a event is freed, a flag (EVENT_FILE_FL_FREED) in the metadata is set to state that it is to prevent any new references to it from happening while waiting for existing references to close. When the last reference closes, the metadata is freed. But the "format" was missing a check to this flag (along with some other files) that allowed new references to happen, and a use-after-free bug to occur. - Have the trace event meta data use the refcount infrastructure instead of relying on its own atomic counters. - Have tracefs inodes use alloc_inode_sb() for allocation instead of using kmem_cache_alloc() directly. - Have eventfs_create_dir() return an ERR_PTR instead of NULL as the callers expect a real object or an ERR_PTR. - Have release_ei() use call_srcu() and not call_rcu() as all the protection is on SRCU and not RCU. - Fix ftrace_graph_ret_addr() to use the task passed in and not current. - Fix overflow bug in get_free_elt() where the counter can overflow the integer and cause an infinite loop. - Remove unused function ring_buffer_nr_pages() - Have tracefs freeing use the inode RCU infrastructure instead of creating its own. When the kernel had randomize structure fields enabled, the rcu field of the tracefs_inode was overlapping the rcu field of the inode structure, and corrupting it. Instead, use the destroy_inode() callback to do the initial cleanup of the code, and then have free_inode() free it. * tag 'trace-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracefs: Use generic inode RCU for synchronizing freeing ring-buffer: Remove unused function ring_buffer_nr_pages() tracing: Fix overflow in get_free_elt() function_graph: Fix the ret_stack used by ftrace_graph_ret_addr() eventfs: Use SRCU for freeing eventfs_inodes eventfs: Don't return NULL in eventfs_create_dir() tracefs: Fix inode allocation tracing: Use refcount for trace_event_file reference counter tracing: Have format file honor EVENT_FILE_FL_FREED	2024-08-08 13:32:59 -07:00
Linus Torvalds	b3f5620f76	bcachefs fixes for 6.11-rc3 Assorted little stuff: - lockdep fixup for lockdep_set_notrack_class() - we can now remove a device when using erasure coding without deadlocking, though we still hit other issues - the "allocator stuck" timeout is now configurable, and messages are ratelimited; default timeout has been increased from 10 seconds to 30 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAma05JwACgkQE6szbY3K bnYE+Q//VJ6R/UxDoxjk8zgftRcdgwXod6U+/E0Kj3ZBKLYXkcGaWWmmGMkFafBp eL7Y3wtHSKiMsHYX9KEdFUZFLe1KI4c16RgNIXk9nwkF+3/+8pEDHKPFuoGHJH3O HComHGqwVg8Zx2jRNvEkvQ980iH7OBGhCjMFXhJ3xbMGLdw91TQQi49a+Q/vz7QT y3Cl1dgX5xBl7fqKefsYa+X6mpWi4/6t60vJvatI+bvDfznjI6jN3qGVLlQye7tC 6VbJAjHsPPyNMlWa99UaHqDdaM325zR2ES0bsfHd8Up4iAwO8OgjzYQxpYTgi51i 6DTiGEOV2S8gF+Rnprnbzsnau0hEvrtQY2Ub85TCIGbZJa8b+aDIlq9k8jF36O2E 2CUTleQ/E129RxXpkZGsVRpNmemdCi6rHAcluaFEgezX4FJH8BVOwQQq2Xz7rd7E 3ZP5iAWmX0IgOL0VOCP/ZXl/lEMwSk0VAED3jEbT7f2K7rU9nXDO2bIEx1wXDCm1 b32kvmUi2FBjqLHSqvAPEb52tvvZuliMUY7z9dEx+AX9AVC9kGE+amGexosKb/LY nWzey+D0cKHtgbkMFrCClkpg75Tnt9ISJbad53+5qhN8an/a71djdj8Zk0UQnQjv 6Amv4Ns1lDo3XGC1QtYkF5HqiWaupbUXAftptpS4Av4X1zZEQIc= =q1dD -----END PGP SIGNATURE----- Merge tag 'bcachefs-2024-08-08' of git://evilpiepirate.org/bcachefs Pull bcachefs fixes from Kent Overstreet: "Assorted little stuff: - lockdep fixup for lockdep_set_notrack_class() - we can now remove a device when using erasure coding without deadlocking, though we still hit other issues - the 'allocator stuck' timeout is now configurable, and messages are ratelimited. The default timeout has been increased from 10 seconds to 30" * tag 'bcachefs-2024-08-08' of git://evilpiepirate.org/bcachefs: bcachefs: Use bch2_wait_on_allocator() in btree node alloc path bcachefs: Make allocator stuck timeout configurable, ratelimit messages bcachefs: Add missing path_traverse() to btree_iter_next_node() bcachefs: ec should not allocate from ro devs bcachefs: Improved allocator debugging for ec bcachefs: Add missing bch2_trans_begin() call bcachefs: Add a comment for bucket helper types bcachefs: Don't rely on implicit unsigned -> signed integer conversion lockdep: Fix lockdep_set_notrack_class() for CONFIG_LOCK_STAT bcachefs: Fix double free of ca->buckets_nouse	2024-08-08 13:27:31 -07:00
Linus Torvalds	cb5b81bc9a	module: warn about excessively long module waits Russell King reported that the arm cbc(aes) crypto module hangs when loaded, and Herbert Xu bisected it to commit `9b9879fc03` ("modules: catch concurrent module loads, treat them as idempotent"), and noted: "So what's happening here is that the first modprobe tries to load a fallback CBC implementation, in doing so it triggers a load of the exact same module due to module aliases. IOW we're loading aes-arm-bs which provides cbc(aes). However, this needs a fallback of cbc(aes) to operate, which is made out of the generic cbc module + any implementation of aes, or ecb(aes). The latter happens to also be provided by aes-arm-cb so that's why it tries to load the same module again" So loading the aes-arm-bs module ends up wanting to recursively load itself, and the recursive load then ends up waiting for the original module load to complete. This is a regression, in that it used to be that we just tried to load the module multiple times, and then as we went on to install it the second time we would instead just error out because the module name already existed. That is actually also exactly what the original "catch concurrent loads" patch did in commit `9828ed3f69` ("module: error out early on concurrent load of the same module file"), but it turns out that it ends up being racy, in that erroring out before the module has been fully initialized will cause failures in dependent module loading. See commit `ac2263b588` (which was the revert of that "error out early") commit for details about why erroring out before the module has been initialized is actually fundamentally racy. Now, for the actual recursive module load (as opposed to just concurrently loading the same module twice), the race is not an issue. At the same time it's hard for the kernel to see that this is recursion, because the module load is always done from a usermode helper, so the recursion is not some simple callchain within the kernel. End result: this is not the real fix, but this at least adds a warning for the situation (admittedly much too late for all the debugging pain that Russell and Herbert went through) and if we can come to a resolution on how to detect the recursion properly, this re-organizes the code to make that easier. Link: https://lore.kernel.org/all/ZrFHLqvFqhzykuYw@shell.armlinux.org.uk/ Reported-by: Russell King <linux@armlinux.org.uk> Debugged-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2024-08-08 12:29:40 -07:00

1 2 3 4 5 ...

1295128 Commits