linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-15 00:04:15 +08:00

Author	SHA1	Message	Date
Jakub Kicinski	da4f3b72c8	linux-can-next-for-6.12-20240830 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEUEC6huC2BN0pvD5fKDiiPnotvG8FAmbSOo4THG1rbEBwZW5n dXRyb25peC5kZQAKCRAoOKI+ei28b3Q7CACNWVZ79TdlnhayDyXaJ9E1j/jDjUdY oUpce0n8zCceowkrwsa/XqCltmJ9wi/3FaAl4/kAjGu2GoqPReD+P0UkUAxf0pCX 5JllkFJP5JTZyU81N3ykfKWcZJ03KMuSxc2fQpW36auPuy9LlxeNE1sxGvWYcjex IQvzbmszjwWLvDQ2meszb0MGwW2jD1m+iX2LfqzpBZQx5TFQHA91B4IWPI/VxmJE 78aAB89TYCjE/2FodFH8znKCGcjH61QvmNIkQ530Qca4qeIMBwcFF5YatR77SjaZ QPYsm+n5VxolJMv1/PD10t1kglqJ9lFIfCa72U1tCIIANugaxqddpMHB =Uhyk -----END PGP SIGNATURE----- Merge tag 'linux-can-next-for-6.12-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2024-08-30 The first patch is by Duy Nguyen and document the R-Car V4M support in the rcar-canfd DT bindings. Frank Li's patch converts the microchip,mcp251x.txt DT bindings documentation to yaml. A patch by Zhang Changzhong update a comment in the j1939 CAN networking stack. Stefan Mätje's patch updates the CAN configuration netlink code, so that the bit timing calculation doesn't work on stale can_priv::ctrlmode data. Martin Jocic contributes a patch for the kvaser_pciefd driver to convert some ifdefs into if (IS_ENABLED()). The last patch is by Yan Zhen and simplifies the probe() function of the kvaser USB driver by using dev_err_probe(). * tag 'linux-can-next-for-6.12-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: can: kvaser_usb: Simplify with dev_err_probe() can: kvaser_pciefd: Use IS_ENABLED() instead of #ifdef can: netlink: avoid call to do_set_data_bittiming callback with stale can_priv::ctrlmode can: j1939: use correct function name in comment dt-bindings: can: convert microchip,mcp251x.txt to yaml dt-bindings: can: renesas,rcar-canfd: Document R-Car V4M support ==================== Link: https://patch.msgid.link/20240830214406.1605786-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-02 19:10:45 -07:00
Joe Damato	4e3a024b43	netdev-genl: Set extack and fix error on napi-get In commit `27f91aaf49` ("netdev-genl: Add netlink framework functions for napi"), when an invalid NAPI ID is specified the return value -EINVAL is used and no extack is set. Change the return value to -ENOENT and set the extack. Before this commit: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --do napi-get --json='{"id": 451}' Netlink error: Invalid argument nl_len = 36 (20) nl_flags = 0x100 nl_type = 2 error: -22 After this commit: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --do napi-get --json='{"id": 451}' Netlink error: No such file or directory nl_len = 44 (28) nl_flags = 0x300 nl_type = 2 error: -2 extack: {'bad-attr': '.id'} Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20240831121707.17562-1-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-02 18:27:15 -07:00
Ido Schimmel	50033400fc	bpf: Unmask upper DSCP bits in __bpf_redirect_neigh_v4() Unmask the upper DSCP bits when calling ip_route_output_flow() so that in the future it could perform the FIB lookup according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-31 17:44:51 +01:00
Ido Schimmel	6a59526628	ipv6: sit: Unmask upper DSCP bits in ipip6_tunnel_xmit() The function calls flowi4_init_output() to initialize an IPv4 flow key with which it then performs a FIB lookup using ip_route_output_flow(). The 'tos' variable with which the TOS value in the IPv4 flow key (flowi4_tos) is initialized contains the full DS field. Unmask the upper DSCP bits so that in the future the FIB lookup could be performed according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-31 17:44:51 +01:00
Ido Schimmel	13f6538de2	ipv4: Unmask upper DSCP bits in ip_send_unicast_reply() The function calls flowi4_init_output() to initialize an IPv4 flow key with which it then performs a FIB lookup using ip_route_output_flow(). 'arg->tos' with which the TOS value in the IPv4 flow key (flowi4_tos) is initialized contains the full DS field. Unmask the upper DSCP bits so that in the future the FIB lookup could be performed according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-31 17:44:51 +01:00
Ido Schimmel	b261b2c6c1	xfrm: Unmask upper DSCP bits in xfrm_get_tos() The function returns a value that is used to initialize 'flowi4_tos' before being passed to the FIB lookup API in the following call chain: xfrm_bundle_create() tos = xfrm_get_tos(fl, family) xfrm_dst_lookup(..., tos, ...) __xfrm_dst_lookup(..., tos, ...) xfrm4_dst_lookup(..., tos, ...) __xfrm4_dst_lookup(..., tos, ...) fl4->flowi4_tos = tos __ip_route_output_key(net, fl4) Unmask the upper DSCP bits so that in the future the output route lookup could be performed according to the full DSCP value. Remove IPTOS_RT_MASK since it is no longer used. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-31 17:44:51 +01:00
Ido Schimmel	f6c89e9555	ipv4: Unmask upper DSCP bits when building flow key build_sk_flow_key() and __build_flow_key() are used to build an IPv4 flow key before calling one of the FIB lookup APIs. Unmask the upper DSCP bits so that in the future the lookup could be performed according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-31 17:44:51 +01:00
Ido Schimmel	4805646c42	ipv4: icmp: Unmask upper DSCP bits in icmp_route_lookup() The function is called to resolve a route for an ICMP message that is sent in response to a situation. Based on the type of the generated ICMP message, the function is either passed the DS field of the packet that generated the ICMP message or a DS field that is derived from it. Unmask the upper DSCP bits before resolving and output route via ip_route_output_key_hash() so that in the future the lookup could be performed according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-31 17:44:51 +01:00
Ido Schimmel	a63cef46ad	ipv4: Unmask upper DSCP bits in ip_route_output_key_hash() Unmask the upper DSCP bits so that in the future output routes could be looked up according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-31 17:44:50 +01:00
Ido Schimmel	47afa284b9	ipv4: Unmask upper DSCP bits in RTM_GETROUTE output route lookup Unmask the upper DSCP bits when looking up an output route via the RTM_GETROUTE netlink message so that in the future the lookup could be performed according to the full DSCP value. No functional changes intended since the upper DSCP bits are masked when comparing against the TOS selectors in FIB rules and routes. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-31 17:44:50 +01:00
Zhang Changzhong	dc2ddcd136	can: j1939: use correct function name in comment The function j1939_cancel_all_active_sessions() was renamed to j1939_cancel_active_session() but name in comment wasn't updated. Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Fixes: `9d71dd0c70` ("can: add support of SAE J1939 protocol") Link: https://patch.msgid.link/1724935703-44621-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2024-08-30 22:40:22 +02:00
Diogo Jahchan Koike	cff69f72d3	ethtool: pse-pd: move pse validation into set Move validation into set, removing .set_validate operation as its current implementation holds the rtnl lock for acquiring the PHY device, defeating the intended purpose of checking before grabbing the lock. Reported-by: syzbot+ec369e6d58e210135f71@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ec369e6d58e210135f71 Fixes: `31748765be` ("net: ethtool: pse-pd: Target the command to the requested PHY") Signed-off-by: Diogo Jahchan Koike <djahchankoike@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20240829184830.5861-1-djahchankoike@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-30 12:14:57 -07:00
Eric Dumazet	f17bf505ff	icmp: icmp_msgs_per_sec and icmp_msgs_burst sysctls become per netns Previous patch made ICMP rate limits per netns, it makes sense to allow each netns to change the associated sysctl. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20240829144641.3880376-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-30 11:14:06 -07:00
Eric Dumazet	b056b4cd91	icmp: move icmp_global.credit and icmp_global.stamp to per netns storage Host wide ICMP ratelimiter should be per netns, to provide better isolation. Following patch in this series makes the sysctl per netns. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20240829144641.3880376-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-30 11:14:06 -07:00
Eric Dumazet	8c2bd38b95	icmp: change the order of rate limits ICMP messages are ratelimited : After the blamed commits, the two rate limiters are applied in this order: 1) host wide ratelimit (icmp_global_allow()) 2) Per destination ratelimit (inetpeer based) In order to avoid side-channels attacks, we need to apply the per destination check first. This patch makes the following change : 1) icmp_global_allow() checks if the host wide limit is reached. But credits are not yet consumed. This is deferred to 3) 2) The per destination limit is checked/updated. This might add a new node in inetpeer tree. 3) icmp_global_consume() consumes tokens if prior operations succeeded. This means that host wide ratelimit is still effective in keeping inetpeer tree small even under DDOS. As a bonus, I removed icmp_global.lock as the fast path can use a lock-free operation. Fixes: `c0303efeab` ("net: reduce cycles spend on ICMP replies that gets rate limited") Fixes: `4cdf507d54` ("icmp: add a global rate limitation") Reported-by: Keyu Man <keyu.man@email.ucr.edu> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20240829144641.3880376-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-30 11:14:06 -07:00
Yan Zhen	b26b644933	net: openvswitch: Use ERR_CAST() to return Using ERR_CAST() is more reasonable and safer, When it is necessary to convert the type of an error pointer and return it. Signed-off-by: Yan Zhen <yanzhen@vivo.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20240829095509.3151987-1-yanzhen@vivo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-30 11:11:45 -07:00
Jon Maloy	be9a4fb831	tcp: add SO_PEEK_OFF socket option tor TCPv6 When doing further testing of the recently added SO_PEEK_OFF feature for TCP I realized I had omitted to add it for TCP/IPv6. I do that here. Fixes: `05ea491641` ("tcp: add support for SO_PEEK_OFF socket option") Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Tested-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jon Maloy <jmaloy@redhat.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20240828183752.660267-2-jmaloy@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 13:00:36 -07:00
Hongbo Li	82183b03de	net/ipv4: net: prefer strscpy over strcpy The deprecated helper strcpy() performs no bounds checking on the destination buffer. This could result in linear overflows beyond the end of the buffer, leading to all kinds of misbehaviors. The safe replacement is strscpy() [1]. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy [1] Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Link: https://patch.msgid.link/20240828123224.3697672-7-lihongbo22@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 12:33:14 -07:00
Hongbo Li	af1052fd49	net/tipc: replace deprecated strcpy with strscpy The deprecated helper strcpy() performs no bounds checking on the destination buffer. This could result in linear overflows beyond the end of the buffer, leading to all kinds of misbehaviors. The safe replacement is strscpy() [1]. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy [1] Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Link: https://patch.msgid.link/20240828123224.3697672-6-lihongbo22@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 12:33:14 -07:00
Hongbo Li	597be7bd17	net/netrom: prefer strscpy over strcpy The deprecated helper strcpy() performs no bounds checking on the destination buffer. This could result in linear overflows beyond the end of the buffer, leading to all kinds of misbehaviors. The safe replacement is strscpy() [1]. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy [1] Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Link: https://patch.msgid.link/20240828123224.3697672-4-lihongbo22@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 12:33:07 -07:00
Hongbo Li	b19f69a958	net/ipv6: replace deprecated strcpy with strscpy The deprecated helper strcpy() performs no bounds checking on the destination buffer. This could result in linear overflows beyond the end of the buffer, leading to all kinds of misbehaviors. The safe replacement is strscpy() [1]. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy [1] Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Link: https://patch.msgid.link/20240828123224.3697672-3-lihongbo22@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 12:33:07 -07:00
Hongbo Li	68016b9972	net: prefer strscpy over strcpy The deprecated helper strcpy() performs no bounds checking on the destination buffer. This could result in linear overflows beyond the end of the buffer, leading to all kinds of misbehaviors. The safe replacement is strscpy() [1]. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy [1] Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 12:33:07 -07:00
Jakub Kicinski	3cbd2090d3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/faraday/ftgmac100.c `4186c8d9e6` ("net: ftgmac100: Ensure tx descriptor updates are visible") `e24a6c8746` ("net: ftgmac100: Get link speed and duplex for NC-SI") https://lore.kernel.org/0b851ec5-f91d-4dd3-99da-e81b98c9ed28@kernel.org net/ipv4/tcp.c `bac76cf898` ("tcp: fix forever orphan socket caused by tcp_abort") `edefba66d9` ("tcp: rstreason: introduce SK_RST_REASON_TCP_STATE for active reset") https://lore.kernel.org/20240828112207.5c199d41@canb.auug.org.au No adjacent changes. Link: https://patch.msgid.link/20240829130829.39148-1-pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 11:49:10 -07:00
Linus Torvalds	0dd5dd63ba	Including fixes from bluetooth, wireless and netfilter. No known outstanding regressions. Current release - regressions: - wifi: iwlwifi: fix hibernation - eth: ionic: prevent tx_timeout due to frequent doorbell ringing Previous releases - regressions: - sched: fix sch_fq incorrect behavior for small weights - wifi: - iwlwifi: take the mutex before running link selection - wfx: repair open network AP mode - netfilter: restore IP sanity checks for netdev/egress - tcp: fix forever orphan socket caused by tcp_abort - mptcp: close subflow when receiving TCP+FIN - bluetooth: fix random crash seen while removing btnxpuart driver Previous releases - always broken: - mptcp: more fixes for the in-kernel PM - eth: bonding: change ipsec_lock from spin lock to mutex - eth: mana: fix race of mana_hwc_post_rx_wqe and new hwc response Misc: - documentation: drop special comment style for net code Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmbQcqISHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkOJkP+QHZx2LCilc0uvrYkqWBz7aYEigISK+6 NdGiF/c9FO/dvmisUbs7i48TXKplHu56bR0YTVm2pdKUNcXO5jUgy+s4n9uncsCF /Cq8WnaXJ3THqKKNlMnSeTJ1URE47iagI+LdX4g9a5HE5GgrORcHm4mfcn7m68EP pZ+TaPDw9jp+o+1nkpqgPe8Vdz1dPlqC1S2KQMl0S60WcSlYgDpVUtVU5m2mitJ1 giNHXcU2UXxFFyvqhHXyQIFIkKU6sNbD32cm9VXDMw680KjmBM63Fz5EIiZospkz efQWHn/xwqZNEOLdN5sPUtKLv02D8sTusfTaGWaNulmNd346ABrkS+fDjHqRxLFb OBKXOn4AG1OPCs4MgrgtJf7mrcwJErbE/21dLqGPZWkOzqbe+7r8pXn0XtS9m+gR V0IkSpwrS1D394nP2TVmw0B+LwXMA3ItzAjNa8Lemr7TVEpAmuCdAz2XiR13KZfw B0DBtWCWr5skgWZdlhofi71OOXC9bWIq75i+aSZR/KXqRj4I38OsargSVNQve/H4 OmOitt1w0ZRhmb+HmW+xgIffikGsi9YEO165UUzUQCQVn00r0EHP8T3Q2d1/2Hzx FtnaYFsBAI2TV2m2DTuJNWc0Fa3G08tJWkoEqbrWeAuebOk0oKWExsXzVfMOrig1 a9nNA98DmlN3 =yJIE -----END PGP SIGNATURE----- Merge tag 'net-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth, wireless and netfilter. No known outstanding regressions. Current release - regressions: - wifi: iwlwifi: fix hibernation - eth: ionic: prevent tx_timeout due to frequent doorbell ringing Previous releases - regressions: - sched: fix sch_fq incorrect behavior for small weights - wifi: - iwlwifi: take the mutex before running link selection - wfx: repair open network AP mode - netfilter: restore IP sanity checks for netdev/egress - tcp: fix forever orphan socket caused by tcp_abort - mptcp: close subflow when receiving TCP+FIN - bluetooth: fix random crash seen while removing btnxpuart driver Previous releases - always broken: - mptcp: more fixes for the in-kernel PM - eth: bonding: change ipsec_lock from spin lock to mutex - eth: mana: fix race of mana_hwc_post_rx_wqe and new hwc response Misc: - documentation: drop special comment style for net code" * tag 'net-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits) nfc: pn533: Add poll mod list filling check mailmap: update entry for Sriram Yagnaraman selftests: mptcp: join: check re-re-adding ID 0 signal mptcp: pm: ADD_ADDR 0 is not a new address selftests: mptcp: join: validate event numbers mptcp: avoid duplicated SUB_CLOSED events selftests: mptcp: join: check re-re-adding ID 0 endp mptcp: pm: fix ID 0 endp usage after multiple re-creations mptcp: pm: do not remove already closed subflows selftests: mptcp: join: no extra msg if no counter selftests: mptcp: join: check re-adding init endp with != id mptcp: pm: reset MPC endp ID when re-added mptcp: pm: skip connecting to already established sf mptcp: pm: send ACK on an active subflow selftests: mptcp: join: check removing ID 0 endpoint mptcp: pm: fix RM_ADDR ID for the initial subflow mptcp: pm: reuse ID 0 after delete and re-add net: busy-poll: use ktime_get_ns() instead of local_clock() sctp: fix association labeling in the duplicate COOKIE-ECHO case mptcp: pr_debug: add missing \n at the end ...	2024-08-30 06:14:39 +12:00
Matthieu Baerts (NGI0)	57f86203b4	mptcp: pm: ADD_ADDR 0 is not a new address The ADD_ADDR 0 with the address from the initial subflow should not be considered as a new address: this is not something new. If the host receives it, it simply means that the address is available again. When receiving an ADD_ADDR for the ID 0, the PM already doesn't consider it as new by not incrementing the 'add_addr_accepted' counter. But the 'accept_addr' might not be set if the limit has already been reached: this can be bypassed in this case. But before, it is important to check that this ADD_ADDR for the ID 0 is for the same address as the initial subflow. If not, it is not something that should happen, and the ADD_ADDR can be ignored. Note that if an ADD_ADDR is received while there is already a subflow opened using the same address, this ADD_ADDR is ignored as well. It means that if multiple ADD_ADDR for ID 0 are received, there will not be any duplicated subflows created by the client. Fixes: `d0876b2284` ("mptcp: add the incoming RM_ADDR support") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-29 10:39:50 +02:00
Matthieu Baerts (NGI0)	d82809b6c5	mptcp: avoid duplicated SUB_CLOSED events The initial subflow might have already been closed, but still in the connection list. When the worker is instructed to close the subflows that have been marked as closed, it might then try to close the initial subflow again. A consequence of that is that the SUB_CLOSED event can be seen twice: # ip mptcp endpoint 1.1.1.1 id 1 subflow dev eth0 2.2.2.2 id 2 subflow dev eth1 # ip mptcp monitor & [ CREATED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9 [ ESTABLISHED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9 [ SF_ESTABLISHED] remid=0 locid=2 saddr4=2.2.2.2 daddr4=9.9.9.9 # ip mptcp endpoint delete id 1 [ SF_CLOSED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9 [ SF_CLOSED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9 The first one is coming from mptcp_pm_nl_rm_subflow_received(), and the second one from __mptcp_close_subflow(). To avoid doing the post-closed processing twice, the subflow is now marked as closed the first time. Note that it is not enough to check if we are dealing with the first subflow and check its sk_state: the subflow might have been reset or closed before calling mptcp_close_ssk(). Fixes: `b911c97c7d` ("mptcp: add netlink event support") Cc: stable@vger.kernel.org Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-29 10:39:50 +02:00
Matthieu Baerts (NGI0)	9366922adc	mptcp: pm: fix ID 0 endp usage after multiple re-creations 'local_addr_used' and 'add_addr_accepted' are decremented for addresses not related to the initial subflow (ID0), because the source and destination addresses of the initial subflows are known from the beginning: they don't count as "additional local address being used" or "ADD_ADDR being accepted". It is then required not to increment them when the entrypoint used by the initial subflow is removed and re-added during a connection. Without this modification, this entrypoint cannot be removed and re-added more than once. Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/512 Fixes: `3ad14f54bd` ("mptcp: more accurate MPC endpoint tracking") Reported-by: syzbot+455d38ecd5f655fc45cf@syzkaller.appspotmail.com Closes: https://lore.kernel.org/00000000000049861306209237f4@google.com Cc: stable@vger.kernel.org Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-29 10:39:50 +02:00
Matthieu Baerts (NGI0)	58e1b66b4e	mptcp: pm: do not remove already closed subflows It is possible to have in the list already closed subflows, e.g. the initial subflow has been already closed, but still in the list. No need to try to close it again, and increments the related counters again. Fixes: `0ee4261a36` ("mptcp: implement mptcp_pm_remove_subflow") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-29 10:39:50 +02:00
Matthieu Baerts (NGI0)	dce1c6d1e9	mptcp: pm: reset MPC endp ID when re-added The initial subflow has a special local ID: 0. It is specific per connection. When a global endpoint is deleted and re-added later, it can have a different ID -- most services managing the endpoints automatically don't force the ID to be the same as before. It is then important to track these modifications to be consistent with the ID being used for the address used by the initial subflow, not to confuse the other peer or to send the ID 0 for the wrong address. Now when removing an endpoint, msk->mpc_endpoint_id is reset if it corresponds to this endpoint. When adding a new endpoint, the same variable is updated if the address match the one of the initial subflow. Fixes: `3ad14f54bd` ("mptcp: more accurate MPC endpoint tracking") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-29 10:39:50 +02:00
Matthieu Baerts (NGI0)	bc19ff5763	mptcp: pm: skip connecting to already established sf The lookup_subflow_by_daddr() helper checks if there is already a subflow connected to this address. But there could be a subflow that is closing, but taking time due to some reasons: latency, losses, data to process, etc. If an ADD_ADDR is received while the endpoint is being closed, it is better to try connecting to it, instead of rejecting it: the peer which has sent the ADD_ADDR will not be notified that the ADD_ADDR has been rejected for this reason, and the expected subflow will not be created at the end. This helper should then only look for subflows that are established, or going to be, but not the ones being closed. Fixes: `d84ad04941` ("mptcp: skip connecting the connected address") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-29 10:39:50 +02:00
Matthieu Baerts (NGI0)	c07cc3ed89	mptcp: pm: send ACK on an active subflow Taking the first one on the list doesn't work in some cases, e.g. if the initial subflow is being removed. Pick another one instead of not sending anything. Fixes: `84dfe3677a` ("mptcp: send out dedicated ADD_ADDR packet") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-29 10:39:50 +02:00
Matthieu Baerts (NGI0)	87b5896f3f	mptcp: pm: fix RM_ADDR ID for the initial subflow The initial subflow has a special local ID: 0. When an endpoint is being deleted, it is then important to check if its address is not linked to the initial subflow to send the right ID. If there was an endpoint linked to the initial subflow, msk's mpc_endpoint_id field will be set. We can then use this info when an endpoint is being removed to see if it is linked to the initial subflow. So now, the correct IDs are passed to mptcp_pm_nl_rm_addr_or_subflow(), it is no longer needed to use mptcp_local_id_match(). Fixes: `3ad14f54bd` ("mptcp: more accurate MPC endpoint tracking") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-29 10:39:49 +02:00
Matthieu Baerts (NGI0)	8b8ed1b429	mptcp: pm: reuse ID 0 after delete and re-add When the endpoint used by the initial subflow is removed and re-added later, the PM has to force the ID 0, it is a special case imposed by the MPTCP specs. Note that the endpoint should then need to be re-added reusing the same ID. Fixes: `3ad14f54bd` ("mptcp: more accurate MPC endpoint tracking") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-29 10:39:49 +02:00
Maxime Chevallier	ad78337cb2	net: ethtool: cable-test: Release RTNL when the PHY isn't found Use the correct logic to check for the presence of a PHY device, and jump to a label that correctly releases RTNL in case of an error, as we are holding RTNL at that point. Fixes: `3688ff3077` ("net: ethtool: cable-test: Target the command to the requested PHY") Closes: https://lore.kernel.org/netdev/20240827104825.5cbe0602@fedora-3.home/T/#m6bc49cdcc5cfab0d162516b92916b944a01c833f Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20240827092314.2500284-1-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-28 17:18:53 -07:00
Eric Dumazet	c0a11493ee	tcp: annotate data-races around tcptw->tw_rcv_nxt No lock protects tcp tw fields. tcptw->tw_rcv_nxt can be changed from twsk_rcv_nxt_update() while other threads might read this field. Add READ_ONCE()/WRITE_ONCE() annotations, and make sure tcp_timewait_state_process() reads tcptw->tw_rcv_nxt only once. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20240827015250.3509197-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-28 17:08:17 -07:00
Eric Dumazet	3e5cbbb1fb	tcp: remove volatile qualifier on tw_substate Using a volatile qualifier for a specific struct field is unusual. Use instead READ_ONCE()/WRITE_ONCE() where necessary. tcp_timewait_state_process() can change tw_substate while other threads are reading this field. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20240827015250.3509197-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-28 17:08:16 -07:00
Jakub Kicinski	41901c227e	Regressions: * wfx: fix for open network connection * iwlwifi: fix for hibernate (due to fast resume feature) * iwlwifi: fix for a few warnings that were recently added (had previously been messages not warnings) Previously broken: * mwifiex: fix static structures used for per-device data * iwlwifi: some harmless FW related messages were tagged too high priority * iwlwifi: scan buffers weren't checked correctly * mac80211: SKB leak on beacon error path * iwlwifi: fix ACPI table interop with certain BIOSes * iwlwifi: fix locking for link selection * mac80211: fix SSID comparison in beacon validation -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmbO9O8ACgkQ10qiO8sP aACYDA//TwIfA3+SMUducdNfLjRPa8hoxitZqXUdMVHO/gj6b5isz7g15KgL6UV6 NtJyPSLfYPY4UHsXvEs6zreLQMoQgjpk58sq8uE0y9KxjNzeVd9Qy0rq66KqZR5p N9bM10qx1oT6CkFK8F1GGKKmOVuYm3bLtwcvCZICcMBi1ziuK6A5FoOFdbzXdUET QaDFc5P6rtNh0WAtsGjoed+9GQ+AhXcBloohfRZvCHCaY8jMdJyAKACpBQmPUSbX b1WVwp/cLNeUBmkhhfPE2xJmur4WiI2s/Xy94b5naUIn33re1AfAu2msuhQCyoHv LiF9wG++NW2Gbh7F5vFt4PVPgCKgkhgmPXDy+yFJose4Ay6wpq8nr0XzgG4Y/mWQ GYElJrZPQnqxnSMa3VEgLBuGas0KOTfAwXIIGZiFDVlAlPGO1zL5fti25pCQbkKH NYcNuWCUX5jPrCLjx0SL9P7BLpr8eiWtvq9h+8DfkNWrcSFllIJzkCKYbP7ps9cE KmFp4ilqmKlYKcMLQM8ePgx2vPegQfmPczcqdYAMJcnXmOcQS7s47jI6BV41CVPt ls2+/cO2BSexKB4yGWSbF8Xk9+67UL8h10Z0OJ5gD7zWESwyRi2N7q5CiA5XOTLu RrbkH3uPzWLP+A/uXkQBMUEihaJSwfAO67Wrzi91kSdyPWmervo= =024u -----END PGP SIGNATURE----- Merge tag 'wireless-2024-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Regressions: * wfx: fix for open network connection * iwlwifi: fix for hibernate (due to fast resume feature) * iwlwifi: fix for a few warnings that were recently added (had previously been messages not warnings) Previously broken: * mwifiex: fix static structures used for per-device data * iwlwifi: some harmless FW related messages were tagged too high priority * iwlwifi: scan buffers weren't checked correctly * mac80211: SKB leak on beacon error path * iwlwifi: fix ACPI table interop with certain BIOSes * iwlwifi: fix locking for link selection * mac80211: fix SSID comparison in beacon validation * tag 'wireless-2024-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: iwlwifi: clear trans->state earlier upon error wifi: wfx: repair open network AP mode wifi: mac80211: free skb on error path in ieee80211_beacon_get_ap() wifi: iwlwifi: mvm: don't wait for tx queues if firmware is dead wifi: iwlwifi: mvm: allow 6 GHz channels in MLO scan wifi: iwlwifi: mvm: pause TCM when the firmware is stopped wifi: iwlwifi: fw: fix wgds rev 3 exact size wifi: iwlwifi: mvm: take the mutex before running link selection wifi: iwlwifi: mvm: fix iwl_mvm_max_scan_ie_fw_cmd_room() wifi: iwlwifi: mvm: fix iwl_mvm_scan_fits() calculation wifi: iwlwifi: lower message level for FW buffer destination wifi: iwlwifi: mvm: fix hibernation wifi: mac80211: fix beacon SSID mismatch handling wifi: mwifiex: duplicate static structs used in driver instances ==================== Link: https://patch.msgid.link/20240828100151.23662-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-28 16:54:45 -07:00
A K M Fazla Mehrab	85d4cf56e9	net/handshake: use sockfd_put() helper Replace fput() with sockfd_put() in handshake_nl_done_doit(). Signed-off-by: A K M Fazla Mehrab <a.mehrab@bytedance.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Link: https://patch.msgid.link/20240826182652.2449359-1-a.mehrab@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-27 16:09:25 -07:00
Ondrej Mosnacek	3a0504d54b	sctp: fix association labeling in the duplicate COOKIE-ECHO case sctp_sf_do_5_2_4_dupcook() currently calls security_sctp_assoc_request() on new_asoc, but as it turns out, this association is always discarded and the LSM labels never get into the final association (asoc). This can be reproduced by having two SCTP endpoints try to initiate an association with each other at approximately the same time and then peel off the association into a new socket, which exposes the unitialized labels and triggers SELinux denials. Fix it by calling security_sctp_assoc_request() on asoc instead of new_asoc. Xin Long also suggested limit calling the hook only to cases A, B, and D, since in cases C and E the COOKIE ECHO chunk is discarded and the association doesn't enter the ESTABLISHED state, so rectify that as well. One related caveat with SELinux and peer labeling: When an SCTP connection is set up simultaneously in this way, we will end up with an association that is initialized with security_sctp_assoc_request() on both sides, so the MLS component of the security context of the association will get swapped between the peers, instead of just one side setting it to the other's MLS component. However, at that point security_sctp_assoc_request() had already been called on both sides in sctp_sf_do_unexpected_init() (on a temporary association) and thus if the exchange didn't fail before due to MLS, it won't fail now either (most likely both endpoints have the same MLS range). Tested by: - reproducer from https://src.fedoraproject.org/tests/selinux/pull-request/530 - selinux-testsuite (https://github.com/SELinuxProject/selinux-testsuite/) - sctp-tests (https://github.com/sctp/sctp-tests) - no tests failed that wouldn't fail also without the patch applied Fixes: `c081d53f97` ("security: pass asoc to sctp_assoc_request and sctp_sk_clone") Suggested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Xin Long <lucien.xin@gmail.com> Acked-by: Paul Moore <paul@paul-moore.com> (LSM/SELinux) Link: https://patch.msgid.link/20240826130711.141271-1-omosnace@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-27 16:07:12 -07:00
Matthieu Baerts (NGI0)	cb41b195e6	mptcp: pr_debug: add missing \n at the end pr_debug() have been added in various places in MPTCP code to help developers to debug some situations. With the dynamic debug feature, it is easy to enable all or some of them, and asks users to reproduce issues with extra debug. Many of these pr_debug() don't end with a new line, while no 'pr_cont()' are used in MPTCP code. So the goal was not to display multiple debug messages on one line: they were then not missing the '\n' on purpose. Not having the new line at the end causes these messages to be printed with a delay, when something else needs to be printed. This issue is not visible when many messages need to be printed, but it is annoying and confusing when only specific messages are expected, e.g. # echo "func mptcp_pm_add_addr_echoed +fmp" \ > /sys/kernel/debug/dynamic_debug/control # ./mptcp_join.sh "signal address"; \ echo "$(awk '{print $1}' /proc/uptime) - end"; \ sleep 5s; \ echo "$(awk '{print $1}' /proc/uptime) - restart"; \ ./mptcp_join.sh "signal address" 013 signal address (...) 10.75 - end 15.76 - restart 013 signal address [ 10.367935] mptcp:mptcp_pm_add_addr_echoed: MPTCP: msk=(...) (...) => a delay of 5 seconds: printed with a 10.36 ts, but after 'restart' which was printed at the 15.76 ts. The 'Fixes' tag here below points to the first pr_debug() used without '\n' in net/mptcp. This patch could be split in many small ones, with different Fixes tag, but it doesn't seem worth it, because it is easy to re-generate this patch with this simple 'sed' command: git grep -l pr_debug -- net/mptcp \| xargs sed -i "s/$pr_debug(\".*[^n]$$\"[,)]$/\1\\\n\2/g" So in case of conflicts, simply drop the modifications, and launch this command. Fixes: `f870fa0b57` ("mptcp: Add MPTCP socket stubs") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240826-net-mptcp-close-extra-sf-fin-v1-4-905199fe1172@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-27 14:45:16 -07:00
Matthieu Baerts (NGI0)	2a1f596ebb	mptcp: sched: check both backup in retrans The 'mptcp_subflow_context' structure has two items related to the backup flags: - 'backup': the subflow has been marked as backup by the other peer - 'request_bkup': the backup flag has been set by the host Looking only at the 'backup' flag can make sense in some cases, but it is not the behaviour of the default packet scheduler when selecting paths. As explained in the commit `b6a66e521a` ("mptcp: sched: check both directions for backup"), the packet scheduler should look at both flags, because that was the behaviour from the beginning: the 'backup' flag was set by accident instead of the 'request_bkup' one. Now that the latter has been fixed, get_retrans() needs to be adapted as well. Fixes: `b6a66e521a` ("mptcp: sched: check both directions for backup") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240826-net-mptcp-close-extra-sf-fin-v1-3-905199fe1172@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-27 14:45:16 -07:00
Matthieu Baerts (NGI0)	f09b0ad55a	mptcp: close subflow when receiving TCP+FIN When a peer decides to close one subflow in the middle of a connection having multiple subflows, the receiver of the first FIN should accept that, and close the subflow on its side as well. If not, the subflow will stay half closed, and would even continue to be used until the end of the MPTCP connection or a reset from the network. The issue has not been seen before, probably because the in-kernel path-manager always sends a RM_ADDR before closing the subflow. Upon the reception of this RM_ADDR, the other peer will initiate the closure on its side as well. On the other hand, if the RM_ADDR is lost, or if the path-manager of the other peer only closes the subflow without sending a RM_ADDR, the subflow would switch to TCP_CLOSE_WAIT, but that's it, leaving the subflow half-closed. So now, when the subflow switches to the TCP_CLOSE_WAIT state, and if the MPTCP connection has not been closed before with a DATA_FIN, the kernel owning the subflow schedules its worker to initiate the closure on its side as well. This issue can be easily reproduced with packetdrill, as visible in [1], by creating an additional subflow, injecting a FIN+ACK before sending the DATA_FIN, and expecting a FIN+ACK in return. Fixes: `40947e1399` ("mptcp: schedule worker when subflow is closed") Cc: stable@vger.kernel.org Link: https://github.com/multipath-tcp/packetdrill/pull/154 [1] Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240826-net-mptcp-close-extra-sf-fin-v1-1-905199fe1172@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-27 14:45:16 -07:00
Xueming Feng	bac76cf898	tcp: fix forever orphan socket caused by tcp_abort We have some problem closing zero-window fin-wait-1 tcp sockets in our environment. This patch come from the investigation. Previously tcp_abort only sends out reset and calls tcp_done when the socket is not SOCK_DEAD, aka orphan. For orphan socket, it will only purging the write queue, but not close the socket and left it to the timer. While purging the write queue, tp->packets_out and sk->sk_write_queue is cleared along the way. However tcp_retransmit_timer have early return based on !tp->packets_out and tcp_probe_timer have early return based on !sk->sk_write_queue. This caused ICSK_TIME_RETRANS and ICSK_TIME_PROBE0 not being resched and socket not being killed by the timers, converting a zero-windowed orphan into a forever orphan. This patch removes the SOCK_DEAD check in tcp_abort, making it send reset to peer and close the socket accordingly. Preventing the timer-less orphan from happening. According to Lorenzo's email in the v1 thread, the check was there to prevent force-closing the same socket twice. That situation is handled by testing for TCP_CLOSE inside lock, and returning -ENOENT if it is already closed. The -ENOENT code comes from the associate patch Lorenzo made for iproute2-ss; link attached below, which also conform to RFC 9293. At the end of the patch, tcp_write_queue_purge(sk) is removed because it was already called in tcp_done_with_error(). p.s. This is the same patch with v2. Resent due to mis-labeled "changes requested" on patchwork.kernel.org. Link: https://patchwork.ozlabs.org/project/netdev/patch/1450773094-7978-3-git-send-email-lorenzo@google.com/ Fixes: `c1e64e298b` ("net: diag: Support destroying TCP sockets.") Signed-off-by: Xueming Feng <kuro@kuroa.me> Tested-by: Lorenzo Colitti <lorenzo@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240826102327.1461482-1-kuro@kuroa.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-27 14:27:49 -07:00
James Chapman	73d33bd063	l2tp: avoid using drain_workqueue in l2tp_pre_exit_net Recent commit `fc7ec7f554` ("l2tp: delete sessions using work queue") incorrectly uses drain_workqueue. The use of drain_workqueue in l2tp_pre_exit_net is flawed because the workqueue is shared by all nets and it is therefore possible for new work items to be queued for other nets while drain_workqueue runs. Instead of using drain_workqueue, use __flush_workqueue twice. The first one will run all tunnel delete work items and any work already queued. When tunnel delete work items are run, they may queue new session delete work items, which the second __flush_workqueue will run. In l2tp_exit_net, warn if any of the net's idr lists are not empty. Fixes: `fc7ec7f554` ("l2tp: delete sessions using work queue") Signed-off-by: James Chapman <jchapman@katalix.com> Link: https://patch.msgid.link/20240823142257.692667-1-jchapman@katalix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-27 13:37:22 -07:00
Diogo Jahchan Koike	3d6a0c4f45	net: fix unreleased lock in cable test fix an unreleased lock in out_dev_put path by removing the (now) unnecessary path. Reported-by: syzbot+c641161e97237326ea74@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c641161e97237326ea74 Fixes: `3688ff3077` ("net: ethtool: cable-test: Target the command to the requested PHY") Signed-off-by: Diogo Jahchan Koike <djahchankoike@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20240826134656.94892-1-djahchankoike@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-27 12:47:38 -07:00
Eric Dumazet	bc21000e99	net_sched: sch_fq: fix incorrect behavior for small weights fq_dequeue() has a complex logic to find packets in one of the 3 bands. As Neal found out, it is possible that one band has a deficit smaller than its weight. fq_dequeue() can return NULL while some packets are elligible for immediate transmit. In this case, more than one iteration is needed to refill pband->credit. With default parameters (weights 589824 196608 65536) bug can trigger if large BIG TCP packets are sent to the lowest priority band. Bisected-by: John Sperbeck <jsperbeck@google.com> Diagnosed-by: Neal Cardwell <ncardwell@google.com> Fixes: `29f834aa32` ("net_sched: sch_fq: add 3 bands and WRR scheduling") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20240824181901.953776-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-27 08:20:45 -07:00
Boris Sukholitko	9388637270	tc: adjust network header after 2nd vlan push <tldr> skb network header of the single-tagged vlan packet continues to point the vlan payload (e.g. IP) after second vlan tag is pushed by tc act_vlan. This causes problem at the dissector which expects double-tagged packet network header to point to the inner vlan. The fix is to adjust network header in tcf_act_vlan.c but requires refactoring of skb_vlan_push function. </tldr> Consider the following shell script snippet configuring TC rules on the veth interface: ip link add veth0 type veth peer veth1 ip link set veth0 up ip link set veth1 up tc qdisc add dev veth0 clsact tc filter add dev veth0 ingress pref 10 chain 0 flower \ num_of_vlans 2 cvlan_ethtype 0x800 action goto chain 5 tc filter add dev veth0 ingress pref 20 chain 0 flower \ num_of_vlans 1 action vlan push id 100 \ protocol 0x8100 action goto chain 5 tc filter add dev veth0 ingress pref 30 chain 5 flower \ num_of_vlans 2 cvlan_ethtype 0x800 action simple sdata "success" Sending double-tagged vlan packet with the IP payload inside: cat <<ENDS \| text2pcap - - \| tcpreplay -i veth1 - 0000 00 00 00 00 00 11 00 00 00 00 00 22 81 00 00 64 ..........."...d 0010 81 00 00 14 08 00 45 04 00 26 04 d2 00 00 7f 11 ......E..&...... 0020 18 ef 0a 00 00 01 14 00 00 02 00 00 00 00 00 12 ................ 0030 e1 c7 00 00 00 00 00 00 00 00 00 00 ............ ENDS will match rule 10, goto rule 30 in chain 5 and correctly emit "success" to the dmesg. OTOH, sending single-tagged vlan packet: cat <<ENDS \| text2pcap - - \| tcpreplay -i veth1 - 0000 00 00 00 00 00 11 00 00 00 00 00 22 81 00 00 14 ...........".... 0010 08 00 45 04 00 2a 04 d2 00 00 7f 11 18 eb 0a 00 ..E............ 0020 00 01 14 00 00 02 00 00 00 00 00 16 e1 bf 00 00 ................ 0030 00 00 00 00 00 00 00 00 00 00 00 00 ............ ENDS will match rule 20, will push the second vlan tag but will not* match rule 30. IOW, the match at rule 30 fails if the second vlan was freshly pushed by the kernel. Lets look at __skb_flow_dissect working on the double-tagged vlan packet. Here is the relevant code from around net/core/flow_dissector.c:1277 copy-pasted here for convenience: if (dissector_vlan == FLOW_DISSECTOR_KEY_MAX && skb && skb_vlan_tag_present(skb)) { proto = skb->protocol; } else { vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan), data, hlen, &_vlan); if (!vlan) { fdret = FLOW_DISSECT_RET_OUT_BAD; break; } proto = vlan->h_vlan_encapsulated_proto; nhoff += sizeof(*vlan); } The "else" clause above gets the protocol of the encapsulated packet from the skb data at the network header location. printk debugging has showed that in the good double-tagged packet case proto is htons(0x800 == ETH_P_IP) as expected. However in the single-tagged packet case proto is garbage leading to the failure to match tc filter 30. proto is being set from the skb header pointed by nhoff parameter which is defined at the beginning of __skb_flow_dissect (net/core/flow_dissector.c:1055 in the current version): nhoff = skb_network_offset(skb); Therefore the culprit seems to be that the skb network offset is different between double-tagged packet received from the interface and single-tagged packet having its vlan tag pushed by TC. Lets look at the interesting points of the lifetime of the single/double tagged packets as they traverse our packet flow. Both of them will start at __netif_receive_skb_core where the first vlan tag will be stripped: if (eth_type_vlan(skb->protocol)) { skb = skb_vlan_untag(skb); if (unlikely(!skb)) goto out; } At this stage in double-tagged case skb->data points to the second vlan tag while in single-tagged case skb->data points to the network (eg. IP) header. Looking at TC vlan push action (net/sched/act_vlan.c) we have the following code at tcf_vlan_act (interesting points are in square brackets): if (skb_at_tc_ingress(skb)) [1] skb_push_rcsum(skb, skb->mac_len); .... case TCA_VLAN_ACT_PUSH: err = skb_vlan_push(skb, p->tcfv_push_proto, p->tcfv_push_vid \| (p->tcfv_push_prio << VLAN_PRIO_SHIFT), 0); if (err) goto drop; break; .... out: if (skb_at_tc_ingress(skb)) [3] skb_pull_rcsum(skb, skb->mac_len); And skb_vlan_push (net/core/skbuff.c:6204) function does: err = __vlan_insert_tag(skb, skb->vlan_proto, skb_vlan_tag_get(skb)); if (err) return err; skb->protocol = skb->vlan_proto; [2] skb->mac_len += VLAN_HLEN; in the case of pushing the second tag. Lets look at what happens with skb->data of the single-tagged packet at each of the above points: 1. As a result of the skb_push_rcsum, skb->data is moved back to the start of the packet. 2. First VLAN tag is moved from the skb into packet buffer, skb->mac_len is incremented, skb->data still points to the start of the packet. 3. As a result of the skb_pull_rcsum, skb->data is moved forward by the modified skb->mac_len, thus pointing to the network header again. Then __skb_flow_dissect will get confused by having double-tagged vlan packet with the skb->data at the network header. The solution for the bug is to preserve "skb->data at second vlan header" semantics in the skb_vlan_push function. We do this by manipulating skb->network_header rather than skb->mac_len. skb_vlan_push callers are updated to do skb_reset_mac_len. Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-27 11:37:42 +02:00
Eric Dumazet	89683b45f1	ipv6: avoid indirect calls for SOL_IP socket options ipv6_setsockopt() can directly call ip_setsockopt() instead of going through udp_prot.setsockopt() ipv6_getsockopt() can directly call ip_getsockopt() instead of going through udp_prot.getsockopt() These indirections predate git history, not sure why they were there. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240823140019.3727643-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-26 14:53:50 -07:00
Hongbo Li	9ceebd7a26	net/ipv4: fix macro definition sk_for_each_bound_bhash The macro sk_for_each_bound_bhash accepts a parameter __sk, but it was not used, rather the sk2 is directly used, so we replace the sk2 with __sk in macro. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Link: https://patch.msgid.link/20240823070453.3327832-1-lihongbo22@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-26 14:04:25 -07:00
Jamie Bainbridge	a699781c79	ethtool: check device is present when getting link settings A sysfs reader can race with a device reset or removal, attempting to read device state when the device is not actually present. eg: [exception RIP: qed_get_current_link+17] #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede] #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb crash> struct net_device.state ffff9a9d21336000 state = 5, state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100). The device is not present, note lack of __LINK_STATE_PRESENT (0b10). This is the same sort of panic as observed in commit `4224cfd7fb` ("net-sysfs: add check for netdevice being present to speed_show"). There are many other callers of __ethtool_get_link_ksettings() which don't have a device presence check. Move this check into ethtool to protect all callers. Fixes: `d519e17e2d` ("net: export device speed and duplex via sysfs") Fixes: `4224cfd7fb` ("net-sysfs: add check for netdevice being present to speed_show") Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-26 14:03:02 -07:00

1 2 3 4 5 ...

78193 Commits