linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-15 16:24:13 +08:00

Author	SHA1	Message	Date
Luiz Augusto von Dentz	0ece498c27	Bluetooth: MGMT: Make MGMT_OP_LOAD_CONN_PARAM update existing connection This makes MGMT_OP_LOAD_CONN_PARAM update existing connection by dectecting the request is just for one connection, parameters already exists and there is a connection. Since this is a new behavior the revision is also updated to enable userspace to detect it. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2024-07-14 21:33:24 -04:00
Jakub Kicinski	62fdd1708f	ipsec-next-2024-07-13 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmaSU/QACgkQrB3Eaf9P W7etjA/+I8bWTjMCCGFT7AXIisXWQhHbrRuaU6hpROxWUTAyjUuM4qhdXHYUyG6i 2mcg7Ppqn0etEnrvCDJqgWGPonSJuxKRMpRNiB2uRYZAKDK2X7d5gCVVK+xGyuYn rXjAw3yQ9W6oV8lQvm7GqLYOFL5vj9UA5q8QEhyTxH11HDDRBjlHSgzgWovzGsjO 2qLHSh3wuBuuoWS6jhN5n0pA1mFiKxhzPRRvTV2Q8CEBt+JML0gGd08g0s6tSGMJ qlEGdTHIkIGi/QsbOoRm14X5gYYrDz1EEATISZTA9/Pbb03MsQfxUp6EUZNZIM4O /K9XO7LLXOYWXBcI3BDCHCOT1cJPw1WVvYwlwWzu4DpxelPAc+pk2/QZk9wV2cWd MzScbhHKmZ5GnYnlfQAyOnC5tvQXUBG2OntyXMBGh9seh+H5Lcl1RJAflIwRvBx5 7cnR6HiTmLUlbBxKjSJF+xFPnTucp0J637DkY/ONtAA7qNHnOKh3LWqkIH80q/FI 7Ua0EpgTtzAzN6iR2ujMHusfAjJs4yhMGY5KFGcEHwqS2axYq+mpnaShYzNebzl6 9kOmj6UAVP0tivH2Ahmsz2HaNhZaJ3hXftZeF3zwcoN6XTc3jrQ4JuNyiDcsUdnf ggyLMZ7VI6Jf38ep8LEnfpqQm5qFTVfto62goWWLlGgr4wsy66c= =KyYL -----END PGP SIGNATURE----- Merge tag 'ipsec-next-2024-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2024-07-13 1) Support sending NAT keepalives in ESP in UDP states. Userspace IKE daemon had to do this before, but the kernel can better keep track of it. From Eyal Birger. 2) Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated ESP data paths. Currently, IPsec crypto offload is enabled for GRO code path only. This patchset support UDP encapsulation for the non GRO path. From Mike Yu. * tag 'ipsec-next-2024-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: Support crypto offload for outbound IPv4 UDP-encapsulated ESP packet xfrm: Support crypto offload for inbound IPv4 UDP-encapsulated ESP packet xfrm: Allow UDP encapsulation in crypto offload control path xfrm: Support crypto offload for inbound IPv6 ESP packets not in GRO path xfrm: support sending NAT keepalives in ESP in UDP states ==================== Link: https://patch.msgid.link/20240713102416.3272997-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-14 07:56:32 -07:00
Nicolas Dichtel	abb9a68d2c	ipv6: take care of scope when choosing the src addr When the source address is selected, the scope must be checked. For example, if a loopback address is assigned to the vrf device, it must not be chosen for packets sent outside. CC: stable@vger.kernel.org Fixes: `afbac6010a` ("net: ipv6: Address selection needs to consider L3 domains") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20240710081521.3809742-4-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-14 07:34:16 -07:00
Nicolas Dichtel	252442f2ae	ipv6: fix source address selection with route leak By default, an address assigned to the output interface is selected when the source address is not specified. This is problematic when a route, configured in a vrf, uses an interface from another vrf (aka route leak). The original vrf does not own the selected source address. Let's add a check against the output interface and call the appropriate function to select the source address. CC: stable@vger.kernel.org Fixes: `0d240e7811` ("net: vrf: Implement get_saddr for IPv6") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://patch.msgid.link/20240710081521.3809742-3-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-14 07:34:16 -07:00
Nicolas Dichtel	6807352353	ipv4: fix source address selection with route leak By default, an address assigned to the output interface is selected when the source address is not specified. This is problematic when a route, configured in a vrf, uses an interface from another vrf (aka route leak). The original vrf does not own the selected source address. Let's add a check against the output interface and call the appropriate function to select the source address. CC: stable@vger.kernel.org Fixes: `8cbb512c92` ("net: Add source address lookup op for VRF") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20240710081521.3809742-2-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-14 07:34:15 -07:00
Kory Maincent	4cddb0f15e	net: ethtool: pse-pd: Fix possible null-deref Fix a possible null dereference when a PSE supports both c33 and PoDL, but only one of the netlink attributes is specified. The c33 or PoDL PSE capabilities are already validated in the ethnl_set_pse_validate() call. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20240705184116.13d8235a@kernel.org/ Fixes: `4d18e3ddf4` ("net: ethtool: pse-pd: Expand pse commands with the PSE PoE interface") Link: https://patch.msgid.link/20240711-fix_pse_pd_deref-v3-2-edd78fc4fe42@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-14 07:16:18 -07:00
Kory Maincent	93c3a96c30	net: pse-pd: Do not return EOPNOSUPP if config is null For a PSE supporting both c33 and PoDL, setting config for one type of PoE leaves the other type's config null. Currently, this case returns EOPNOTSUPP, which is incorrect. Instead, we should do nothing if the configuration is empty. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Fixes: `d83e13761d` ("net: pse-pd: Use regulator framework within PSE framework") Link: https://patch.msgid.link/20240711-fix_pse_pd_deref-v3-1-edd78fc4fe42@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-14 07:16:18 -07:00
Jakub Kicinski	70c676cb3d	ipsec-2024-07-11 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmaPqSEACgkQrB3Eaf9P W7eOmQ//YVp6OL+oS5lRzLMvhKLXh42qGbaOPAZl/k0cOACsOnNhubTQHUToIMYt FXLVCDrXHU3F4JVGdgzwJb+/2wqElP+3Wlw48WCnycAlB8NpFc24qKwZHWzo04Mv uutWG5oVXXMYsnLEQhsQCMj+rCjDnSJG2bmsQCHS8GFB4PKP/SSGm/H0UFUbYjIE leZ6rPmqmHf/FShqSmm0VTbXyeLE3bIJQ5zfDLzKW9/nO5h/VyZcZCEzEENF5i2i bKaEGSNrK4evyj+9j/B8FDdujEfVbNyanTAkChJgx3Wug6rIy1QdsG2xDpPn3zm+ pdDvSLPAjjLHrCr7yPPnHEdtOYBvnvjW035VBG/q7pNZfHUaKcutvQJESiNVjsV0 hqmL8XhKgdT/0dPrevXVSXcLOXT25EkzLoN8W4P3qOY4OSFQPC8V+ELCOhWGlZwB rKA8/NfEwV2yIlxhEzSYUTaGT3YZVLJsAVuEfR8Y3tq/j7X5G6h4lCKddxNKhLn+ jJroKlKQEHsC7HCMOW9kJijiXWxNjT4cAPRXMSIxf3cL29UwU9zPE1wx1oq1Pr97 FZiGg9IapcK5nKslaim+nwn6PtEJzVzCWtZ5gddtS4qOrZKuveql/B2P1I8EL9S6 LUqOE9gUeQpSdG/M5FqkLJnUE1knHYRZhQw682fA1zvZFj+G9lo= =xFmH -----END PGP SIGNATURE----- Merge tag 'ipsec-2024-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2024-07-11 1) Fix esp_output_tail_tcp() on unsupported ESPINTCP. From Hagar Hemdan. 2) Fix two bugs in the recently introduced SA direction separation. From Antony Antony. 3) Fix unregister netdevice hang on hardware offload. We had to add another list where skbs linked to that are unlinked from the lists (deleted) but not yet freed. 4) Fix netdev reference count imbalance in xfrm_state_find. From Jianbo Liu. 5) Call xfrm_dev_policy_delete when killingi them on offloaded policies. Jianbo Liu. * tag 'ipsec-2024-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: call xfrm_dev_policy_delete when kill policy xfrm: fix netdev reference count imbalance xfrm: Export symbol xfrm_dev_state_delete. xfrm: Fix unregister netdevice hang on hardware offload. xfrm: Log input direction mismatch error in one place xfrm: Fix input error path memory access net: esp: cleanup esp_output_tail_tcp() in case of unsupported ESPINTCP ==================== Link: https://patch.msgid.link/20240711100025.1949454-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-14 07:10:49 -07:00
Danielle Ratson	275a63c9fe	net: ethtool: Monotonically increase the message sequence number Currently, during the module firmware flashing process, unicast notifications are sent from the kernel using the same sequence number, making it impossible for user space to track missed notifications. Monotonically increase the message sequence number, so the order of notifications could be tracked effectively. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20240711080934.2071869-1-danieller@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-13 15:37:36 -07:00
Kuniyuki Iwashima	23e89e8ee7	tcp: Don't drop SYN+ACK for simultaneous connect(). RFC 9293 states that in the case of simultaneous connect(), the connection gets established when SYN+ACK is received. [0] TCP Peer A TCP Peer B 1. CLOSED CLOSED 2. SYN-SENT --> <SEQ=100><CTL=SYN> ... 3. SYN-RECEIVED <-- <SEQ=300><CTL=SYN> <-- SYN-SENT 4. ... <SEQ=100><CTL=SYN> --> SYN-RECEIVED 5. SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ... 6. ESTABLISHED <-- <SEQ=300><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED 7. ... <SEQ=100><ACK=301><CTL=SYN,ACK> --> ESTABLISHED However, since commit `0c24604b68` ("tcp: implement RFC 5961 4.2"), such a SYN+ACK is dropped in tcp_validate_incoming() and responded with Challenge ACK. For example, the write() syscall in the following packetdrill script fails with -EAGAIN, and wrong SNMP stats get incremented. 0 socket(..., SOCK_STREAM\|SOCK_NONBLOCK, IPPROTO_TCP) = 3 +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) +0 > S 0:0(0) <mss 1460,sackOK,TS val 1000 ecr 0,nop,wscale 8> +0 < S 0:0(0) win 1000 <mss 1000> +0 > S. 0:0(0) ack 1 <mss 1460,sackOK,TS val 3308134035 ecr 0,nop,wscale 8> +0 < S. 0:0(0) ack 1 win 1000 +0 write(3, ..., 100) = 100 +0 > P. 1:101(100) ack 1 -- # packetdrill cross-synack.pkt cross-synack.pkt:13: runtime error in write call: Expected result 100 but got -1 with errno 11 (Resource temporarily unavailable) # nstat ... TcpExtTCPChallengeACK 1 0.0 TcpExtTCPSYNChallenge 1 0.0 The problem is that bpf_skops_established() is triggered by the Challenge ACK instead of SYN+ACK. This causes the bpf prog to miss the chance to check if the peer supports a TCP option that is expected to be exchanged in SYN and SYN+ACK. Let's accept a bare SYN+ACK for active-open TCP_SYN_RECV sockets to avoid such a situation. Note that tcp_ack_snd_check() in tcp_rcv_state_process() is skipped not to send an unnecessary ACK, but this could be a bit risky for net.git, so this targets for net-next. Link: https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7 [0] Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240710171246.87533-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-13 15:19:49 -07:00
Jakub Kicinski	69cf87304d	Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== idpf: XDP chapter I: convert Rx to libeth Alexander Lobakin says: XDP for idpf is currently 5 chapters: * convert Rx to libeth (this); * convert Tx and stats to libeth; * generic XDP and XSk code changes, libeth_xdp; * actual XDP for idpf via libeth_xdp; * XSk for idpf (^). Part I does the following: * splits &idpf_queue into 4 (RQ, SQ, FQ, CQ) and puts them on a diet; * ensures optimal cacheline placement, strictly asserts CL sizes; * moves currently unused/dead singleq mode out of line; * reuses libeth's Rx ptype definitions and helpers; * uses libeth's Rx buffer management for both header and payload; * eliminates memcpy()s and coherent DMA uses on hotpath, uses napi_build_skb() instead of in-place short skb allocation. Most idpf patches, except for the queue split, removes more lines than adds. Expect far better memory utilization and +5-8% on Rx depending on the case (+17% on skb XDP_DROP :>). * '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: idpf: use libeth Rx buffer management for payload buffer idpf: convert header split mode to libeth + napi_build_skb() libeth: support different types of buffers for Rx idpf: remove legacy Page Pool Ethtool stats idpf: reuse libeth's definitions of parsed ptype structures idpf: compile singleq code only under default-n CONFIG_IDPF_SINGLEQ idpf: merge singleq and splitq &net_device_ops idpf: strictly assert cachelines of queue and queue vector structures idpf: avoid bloating &idpf_q_vector with big %NR_CPUS idpf: split &idpf_queue into 4 strictly-typed queue structures idpf: stop using macros for accessing queue descriptors libeth: add cacheline / struct layout assertion helpers page_pool: use __cacheline_group_{begin, end}_aligned() cache: add __cacheline_group_{begin, end}_aligned() (+ couple more) ==================== Link: https://patch.msgid.link/20240710203031.188081-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-12 22:27:26 -07:00
Jakub Kicinski	26f453176a	bpf-next-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZpGVmAAKCRDbK58LschI gxB4AQCgquQis63yqTI36j4iXBT+TuxHEBNoQBSLyzYdrLS1dgD/S5DRJDA+3LD+ 394hn/VtB1qvX5vaqjsov4UIwSMyxA0= =OhSn -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-07-12 We've added 23 non-merge commits during the last 3 day(s) which contain a total of 18 files changed, 234 insertions(+), 243 deletions(-). The main changes are: 1) Improve BPF verifier by utilizing overflow.h helpers to check for overflows, from Shung-Hsi Yu. 2) Fix NULL pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT when attr->attach_prog_fd was not specified, from Tengda Wu. 3) Fix arm64 BPF JIT when generating code for BPF trampolines with BPF_TRAMP_F_CALL_ORIG which corrupted upper address bits, from Puranjay Mohan. 4) Remove test_run callback from lwt_seg6local_prog_ops which never worked in the first place and caused syzbot reports, from Sebastian Andrzej Siewior. 5) Relax BPF verifier to accept non-zero offset on KF_TRUSTED_ARGS/ /KF_RCU-typed BPF kfuncs, from Matt Bobrowski. 6) Fix a long standing bug in libbpf with regards to handling of BPF skeleton's forward and backward compatibility, from Andrii Nakryiko. 7) Annotate btf_{seq,snprintf}_show functions with __printf, from Alan Maguire. 8) BPF selftest improvements to reuse common network helpers in sk_lookup test and dropping the open-coded inetaddr_len() and make_socket() ones, from Geliang Tang. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits) selftests/bpf: Test for null-pointer-deref bugfix in resolve_prog_type() bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT selftests/bpf: DENYLIST.aarch64: Skip fexit_sleep again bpf: use check_sub_overflow() to check for subtraction overflows bpf: use check_add_overflow() to check for addition overflows bpf: fix overflow check in adjust_jmp_off() bpf: Eliminate remaining "make W=1" warnings in kernel/bpf/btf.o bpf: annotate BTF show functions with __printf bpf, arm64: Fix trampoline for BPF_TRAMP_F_CALL_ORIG selftests/bpf: Close obj in error path in xdp_adjust_tail selftests/bpf: Null checks for links in bpf_tcp_ca selftests/bpf: Use connect_fd_to_fd in sk_lookup selftests/bpf: Use start_server_addr in sk_lookup selftests/bpf: Use start_server_str in sk_lookup selftests/bpf: Close fd in error path in drop_on_reuseport selftests/bpf: Add ASSERT_OK_FD macro selftests/bpf: Add backlog for network_helper_opts selftests/bpf: fix compilation failure when CONFIG_NF_FLOW_TABLE=m bpf: Remove tst_run from lwt_seg6local_prog_ops. bpf: relax zero fixed offset constraint on KF_TRUSTED_ARGS/KF_RCU ... ==================== Link: https://patch.msgid.link/20240712212448.5378-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-12 22:25:54 -07:00
Jakub Kicinski	e5abd12f3d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt.c `f7ce5eb2cb` ("bnxt_en: Fix crash in bnxt_get_max_rss_ctx_ring()") `20c8ad72eb` ("eth: bnxt: use the RSS context XArray instead of the local list") Adjacent changes: net/ethtool/ioctl.c `503757c809` ("net: ethtool: Fix RSS setting") `eac9122f0c` ("net: ethtool: record custom RSS contexts in the XArray") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-12 22:20:30 -07:00
Jakub Kicinski	28c8757a79	net: ethtool: let drivers declare max size of RSS indir table and key Some drivers (bnxt but I think also mlx5 from ML discussions) change the size of the indirection table depending on the number of Rx rings. Decouple the max table size from the size of the currently used table, so that we can reserve space in the context for table growth. Static members in ethtool_ops are good enough for now, we can add callbacks to read the max size more dynamically if someone needs that. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-12 22:16:22 -07:00
Jakub Kicinski	d69ba6bbaf	net: ethtool: let drivers remove lost RSS contexts RSS contexts may get lost from a device, in various extreme circumstances. Specifically if the firmware leaks resources and resets, or crashes and either recovers in partially working state or the crash causes a different FW version to run - creating the context again may fail. Drivers should do their absolute best to prevent this from happening. When it does, however, telling user that a context exists, when it can't possibly be used any more is counter productive. Add a helper for drivers to discard contexts. Print an error, in the future netlink notification will also be sent. More robust approaches were proposed, like keeping the contexts but marking them as "dead" (but possibly resurrected by next reset). That may be better but it's unclear at this stage whether the effort is worth the benefits. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-12 22:16:21 -07:00
Linus Torvalds	528dd46d0f	A quick follow up to yesterday's PR. We got a regressions report for the bnxt patch as soon as it got to your tree. The ethtool fix is also good to have, although it's an older regression. Current release - regressions: - eth: bnxt_en: fix crash in bnxt_get_max_rss_ctx_ring() on older HW when user tries to decrease the ring count Previous releases - regressions: - ethtool: fix RSS setting, accept "no change" setting if the driver doesn't support the new features - eth: i40e: remove needless retries of NVM update, don't wait 20min when we know the firmware update won't succeed Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmaR1W4ACgkQMUZtbf5S IrtO5g/+O2iKpyCzkvXUq06Xm3ouTAEQNnKBOkcAq002z2OkPruf83Nu3UK/OPRe /Gmed6AnFWFl1a9gP7zK83goFn9avYaEUXiRACrwNQvS5aj7CxxbBXFGpLEoNmnr DT6ugjkVtkS+SfctZh71CwQ6O9U0rk3c3Ro9vD78IG++/pJp5gsv2JGREcqWNzo7 UmNRJozEZ9nVCuUQ1T2bgtzfcAi2qqtTCYxqXWurVZXJnbqwTW4gjjNBCf5qnzKh g0QrZegYrWzJNkhF6i0vsEpc3vrZ8dzGK3OVJASGW23L7gkc7g5MD81ivcQKwQuM 0oQl245rMpqPXRBYmr8tO3HWGeaMXV0Ut5mhAU1tkTIzX9Qz3vKeHowmDwwMWQmY 345MSxTxhHMXIbPe2fBYyfh/LDp2lLFopYOBmUkJbVNZLY33BYJMvyUTRgjcgTvD vLN1tFAvIrqgSiuV8fwtb/tUwMlgGqouunwDlnqhpsedS3XTcZjeWxWlUJjKsxCN Lgh0uqKRKijM6S7b4vDZ8Nbm7XgGtsnLPcpqQldy5wt/v2W7r/CzQTPU+Q53NbKY 5RkMBrNQQ9pQNZihIdCaF5ASQRcHYuzniCaiE3afnFLpQtjGVctSJF9rFNdP9+46 vgV5MxJNZy02FjngBFYu9RVYXrSe0ZiAEdVXWQ1Ao2UAlqhGRNk= =byGC -----END PGP SIGNATURE----- Merge tag 'net-6.10-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull more networking fixes from Jakub Kicinski: "A quick follow up to yesterday's pull. We got a regressions report for the bnxt patch as soon as it got to your tree. The ethtool fix is also good to have, although it's an older regression. Current release - regressions: - eth: bnxt_en: fix crash in bnxt_get_max_rss_ctx_ring() on older HW when user tries to decrease the ring count Previous releases - regressions: - ethtool: fix RSS setting, accept "no change" setting if the driver doesn't support the new features - eth: i40e: remove needless retries of NVM update, don't wait 20min when we know the firmware update won't succeed" * tag 'net-6.10-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: bnxt_en: Fix crash in bnxt_get_max_rss_ctx_ring() octeontx2-af: fix issue with IPv4 match for RSS octeontx2-af: fix issue with IPv6 ext match for RSS octeontx2-af: fix detection of IP layer octeontx2-af: fix a issue with cpt_lf_alloc mailbox octeontx2-af: replace cpt slot with lf id on reg write i40e: fix: remove needless retries of NVM update net: ethtool: Fix RSS setting	2024-07-12 18:33:33 -07:00
Linus Torvalds	a52ff901a1	A fix for a possible use-after-free following "rbd unmap" or "umount" marked for stable and two kernel-doc fixups. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmaRXKYTHGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzi4l2B/9a9jLU/CJwNdvq2wkMn7wis9QlXaz9 shIAefqvCY92pCAbsyHjbG6OnY5hU/eI1l64pnzws8aTtAt4kuhnTwQAMjUIPbjg 6ji1IkDu5Z9csiVxZ0R4KAzjcEOAxnv/TeGMu1FgpRwTvKXqThotHt3N/pm6Nj6x iJlxNpDBgUXjjdOc2Kd6kTVZ2CJwOhaTjMyXphxNbhCIO+2ULHWf0VZecTG4oKbO C7C4fJe+gsJl0GsjzlbTwrj50WTSsKP+QBc6cxutFIaUCO52wgf7k5EBs+5JXb7s mkioLXxS+9Iz58OIrtbW6vMVp35MDSSc8NNRd3uUzsMwIYiU9Uvg38+j =em4x -----END PGP SIGNATURE----- Merge tag 'ceph-for-6.10-rc8' of https://github.com/ceph/ceph-client Pull ceph fixes from Ilya Dryomov: "A fix for a possible use-after-free following "rbd unmap" or "umount" marked for stable and two kernel-doc fixups" * tag 'ceph-for-6.10-rc8' of https://github.com/ceph/ceph-client: libceph: fix crush_choose_firstn() kernel-doc warnings libceph: suppress crush_choose_indep() kernel-doc warnings libceph: fix race between delayed_work() and ceph_monc_stop()	2024-07-12 10:39:29 -07:00
Mike Yu	447bc4b190	xfrm: Support crypto offload for outbound IPv4 UDP-encapsulated ESP packet esp_xmit() is already able to handle UDP encapsulation through the call to esp_output_head(). However, the ESP header and the outer IP header are not correct and need to be corrected. Test: Enabled both dir=in/out IPsec crypto offload, and verified IPv4 UDP-encapsulated ESP packets on both wifi/cellular network Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-07-12 08:43:29 +02:00
Mike Yu	4ecbac84b5	xfrm: Support crypto offload for inbound IPv4 UDP-encapsulated ESP packet If xfrm_input() is called with UDP_ENCAP_ESPINUDP, the packet is already processed in UDP layer that removes the UDP header. Therefore, there should be no much difference to treat it as an ESP packet in the XFRM stack. Test: Enabled dir=in IPsec crypto offload, and verified IPv4 UDP-encapsulated ESP packets on both wifi/cellular network Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-07-12 08:43:29 +02:00
Mike Yu	a10fb4a84a	xfrm: Allow UDP encapsulation in crypto offload control path Unblock this limitation so that SAs with encapsulation specified can be passed to HW drivers. HW drivers can still reject the SA in their implementation of xdo_dev_state_add if the encapsulation is not supported. Test: Verified on Android device Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-07-12 08:43:28 +02:00
Mike Yu	f7e8542d71	xfrm: Support crypto offload for inbound IPv6 ESP packets not in GRO path IPsec crypt offload supports outbound IPv6 ESP packets, but it doesn't support inbound IPv6 ESP packets. This change enables the crypto offload for inbound IPv6 ESP packets that are not handled through GRO code path. If HW drivers add the offload information to the skb, the packet will be handled in the crypto offload rx code path. Apart from the change in crypto offload rx code path, the change in xfrm_policy_check is also needed. Exampe of RX data path: +-----------+ +-------+ \| HW Driver \|-->\| wlan0 \|--------+ +-----------+ +-------+ \| v +---------------+ +------+ +------>\| Network Stack \|-->\| Apps \| \| +---------------+ +------+ \| \| \| v +--------+ +------------+ \| ipsec1 \|<--\| XFRM Stack \| +--------+ +------------+ Test: Enabled both in/out IPsec crypto offload, and verified IPv6 ESP packets on Android device on both wifi/cellular network Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-07-12 08:43:28 +02:00
James Chapman	2146b7dd35	l2tp: fix l2tp_session_register with colliding l2tpv3 IDs When handling colliding L2TPv3 session IDs, we use the existing session IDR entry and link the new session on that using session->coll_list. However, when using an existing IDR entry, we must not do the idr_replace step. Fixes: `aa5e17e1f5` ("l2tp: store l2tpv3 sessions in per-net IDR") Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-12 04:09:18 +01:00
Shigeru Yoshida	b6c6796789	tipc: Consolidate redundant functions link_is_up() and tipc_link_is_up() have the same functionality. Consolidate these functions. Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Tung Nguyen <tung.q.nguyen@endava.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-12 03:47:43 +01:00
Shigeru Yoshida	534ea0a95e	tipc: Remove unused struct declaration struct tipc_name_table in core.h is not used. Remove this declaration. Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Tung Nguyen <tung.q.nguyen@endava.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-12 03:41:32 +01:00
Alexander Lobakin	13cabc47f8	netdevice: define and allocate &net_device _properly_ In fact, this structure contains a flexible array at the end, but historically its size, alignment etc., is calculated manually. There are several instances of the structure embedded into other structures, but also there's ongoing effort to remove them and we could in the meantime declare &net_device properly. Declare the array explicitly, use struct_size() and store the array size inside the structure, so that __counted_by() can be applied. Don't use PTR_ALIGN(), as SLUB itself tries its best to ensure the allocated buffer is aligned to what the user expects. Also, change its alignment from %NETDEV_ALIGN to the cacheline size as per several suggestions on the netdev ML. bloat-o-meter for vmlinux: free_netdev 445 440 -5 netdev_freemem 24 - -24 alloc_netdev_mqs 1481 1450 -31 On x86_64 with several NICs of different vendors, I was never able to get a &net_device pointer not aligned to the cacheline size after the change. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20240710113036.2125584-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-11 18:11:31 -07:00
Adrian Moreno	8341eee81c	net: psample: fix flag being set in wrong skb A typo makes PSAMPLE_ATTR_SAMPLE_RATE netlink flag be added to the wrong sk_buff. Fix the error and make the input sk_buff pointer "const" so that it doesn't happen again. Acked-by: Eelco Chaudron <echaudro@redhat.com> Fixes: `7b1b2b60c6` ("net: psample: allow using rate as probability") Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Antoine Tenart <atenart@kernel.org> Link: https://patch.msgid.link/20240710171004.2164034-1-amorenoz@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-11 18:11:31 -07:00
Jakub Kicinski	80ab5445da	wireless-next patches for v6.11 Most likely the last "new features" pull request for v6.11 with changes both in stack and in drivers. The big thing is the multiple radios for wiphy feature which makes it possible to better advertise radio capabilities to user space. mt76 enabled MLO and iwlwifi re-enabled MLO, ath12k and rtw89 Wi-Fi 6 devices got WoWLAN support. Major changes: cfg80211/mac80211 * remove DEAUTH_NEED_MGD_TX_PREP flag * multiple radios per wiphy support mac80211_hwsim * multi-radio wiphy support ath12k * DebugFS support for datapath statistics * WCN7850: support for WoW (Wake on WLAN) * WCN7850: device-tree bindings ath11k * QCA6390: device-tree bindings iwlwifi * mvm: re-enable Multi-Link Operation (MLO) * aggregation (A-MSDU) optimisations rtw89 * preparation for RTL8852BE-VT support * WoWLAN support for WiFi 6 chips * 36-bit PCI DMA support mt76 * mt7925 Multi-Link Operation (MLO) support -----BEGIN PGP SIGNATURE----- iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmaPsBQRHGt2YWxvQGtl cm5lbC5vcmcACgkQbhckVSbrbZt9EQf/Wevf/RnKyHhcuW4kmv0cxnjLW39K7CAh ZlfN2JNTsVk4Na1EBjUgVyAWGdnGQpEhQlJYDExHcf5iD12pMVMIAQS8JXTDxuva +ErAN1652p2N8nFCkNNuGbjYfO0D61xSIQj2uHhAlafK2k8FwnSn6XPP6jjHWvur Acmw6W6l8eL+MP2K1VN2/2S09Gr6IQs7gXgWQX/6CaoK+OynFbUg8T9GQ2aqjr+d lD17YB+oOHNCBxvg9LtBhKdfV14OBkKT6hW+YEqsrBEbx3N07ogDkPO0NUUPMXN3 IePEhj4XXrJ5UBMTvgWzNG9CwPeZFwuKGga+HZO9RKF5rwu42LsUMA== =MpwE -----END PGP SIGNATURE----- Merge tag 'wireless-next-2024-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.11 Most likely the last "new features" pull request for v6.11 with changes both in stack and in drivers. The big thing is the multiple radios for wiphy feature which makes it possible to better advertise radio capabilities to user space. mt76 enabled MLO and iwlwifi re-enabled MLO, ath12k and rtw89 Wi-Fi 6 devices got WoWLAN support. Major changes: cfg80211/mac80211 * remove DEAUTH_NEED_MGD_TX_PREP flag * multiple radios per wiphy support mac80211_hwsim * multi-radio wiphy support ath12k * DebugFS support for datapath statistics * WCN7850: support for WoW (Wake on WLAN) * WCN7850: device-tree bindings ath11k * QCA6390: device-tree bindings iwlwifi * mvm: re-enable Multi-Link Operation (MLO) * aggregation (A-MSDU) optimisations rtw89 * preparation for RTL8852BE-VT support * WoWLAN support for WiFi 6 chips * 36-bit PCI DMA support mt76 * mt7925 Multi-Link Operation (MLO) support * tag 'wireless-next-2024-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (204 commits) wifi: mac80211: fix AP chandef capturing in CSA wifi: iwlwifi: correctly reference TSO page information wifi: mt76: mt792x: fix scheduler interference in drv own process wifi: mt76: mt7925: enabling MLO when the firmware supports it wifi: mt76: mt7925: remove the unused mt7925_mcu_set_chan_info wifi: mt76: mt7925: update mt7925_mac_link_bss_add for MLO wifi: mt76: mt7925: update mt7925_mcu_bss_basic_tlv for MLO wifi: mt76: mt7925: update mt7925_mcu_set_timing for MLO wifi: mt76: mt7925: update mt7925_mcu_sta_phy_tlv for MLO wifi: mt76: mt7925: update mt7925_mcu_sta_rate_ctrl_tlv for MLO wifi: mt76: mt7925: add mt7925_mcu_sta_eht_mld_tlv for MLO wifi: mt76: mt7925: update mt7925_mcu_sta_update for MLO wifi: mt76: mt7925: update mt7925_mcu_add_bss_info for MLO wifi: mt76: mt7925: update mt7925_mcu_bss_mld_tlv for MLO wifi: mt76: mt7925: update mt7925_mcu_sta_mld_tlv for MLO wifi: mt76: mt7925: add mt7925_[assign,unassign]_vif_chanctx wifi: mt76: add def_wcid to struct mt76_wcid wifi: mt76: mt7925: report link information in rx status wifi: mt76: mt7925: update rate index according to link id wifi: mt76: mt7925: add link handling in the mt7925_ipv6_addr_change ... ==================== Link: https://patch.msgid.link/20240711102353.0C849C116B1@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-11 17:22:04 -07:00
Saeed Mahameed	503757c809	net: ethtool: Fix RSS setting When user submits a rxfh set command without touching XFRM_SYM_XOR, rxfh.input_xfrm is set to RXH_XFRM_NO_CHANGE, which is equal to 0xff. Testing if (rxfh.input_xfrm & RXH_XFRM_SYM_XOR && !ops->cap_rss_sym_xor_supported) return -EOPNOTSUPP; Will always be true on devices that don't set cap_rss_sym_xor_supported, since rxfh.input_xfrm & RXH_XFRM_SYM_XOR is always true, if input_xfrm was not set, i.e RXH_XFRM_NO_CHANGE=0xff, which will result in failure of any command that doesn't require any change of XFRM, e.g RSS context or hash function changes. To avoid this breakage, test if rxfh.input_xfrm != RXH_XFRM_NO_CHANGE before testing other conditions. Note that the problem will only trigger with XFRM-aware userspace, old ethtool CLI would continue to work. Fixes: `0dd415d155` ("net: ethtool: add a NO_CHANGE uAPI for new RXFH's input_xfrm") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Link: https://patch.msgid.link/20240710225538.43368-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-11 17:21:08 -07:00
Eric Dumazet	cef4902b0f	net: reduce rtnetlink_rcv_msg() stack usage IFLA_MAX is increasing slowly but surely. Some compilers use more than 512 bytes of stack in rtnetlink_rcv_msg() because it calls rtnl_calcit() for RTM_GETLINK message. Use noinline_for_stack attribute to not inline rtnl_calcit(), and directly use nla_for_each_attr_type() (Jakub suggestion) because we only care about IFLA_EXT_MASK at this stage. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240710151653.3786604-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-11 17:13:58 -07:00
Chen Ni	b07593edd2	net/sched: act_skbmod: convert comma to semicolon Replace a comma between expression statements by a semicolon. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240709072838.1152880-1-nichen@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-11 17:12:15 -07:00
Jakub Kicinski	24ac7e5440	ethtool: use the rss context XArray in ring deactivation safety-check ethtool_get_max_rxfh_channel() gets called when user requests deactivating Rx channels. Check the additional RSS contexts, too. While we do track whether RSS context has an indirection table explicitly set by the user, no driver looks at that bit. Assume drivers won't auto-regenerate the additional tables, to be safe. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20240710174043.754664-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-11 14:41:42 -07:00
Jakub Kicinski	2899d58462	ethtool: fail closed if we can't get max channel used in indirection tables Commit `0d1b7d6c92` ("bnxt: fix crashes when reducing ring count with active RSS contexts") proves that allowing indirection table to contain channels with out of bounds IDs may lead to crashes. Currently the max channel check in the core gets skipped if driver can't fetch the indirection table or when we can't allocate memory. Both of those conditions should be extremely rare but if they do happen we should try to be safe and fail the channel change. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20240710174043.754664-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-11 14:41:41 -07:00
Jakub Kicinski	7c8267275d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: net/sched/act_ct.c `26488172b0` ("net/sched: Fix UAF when resolving a clash") `3abbd7ed8b` ("act_ct: prepare for stolen verdict coming from conntrack and nat engine") No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-11 12:58:13 -07:00
Jeff Johnson	359bc01d2e	libceph: fix crush_choose_firstn() kernel-doc warnings Currently, when built with "make W=1", the following warnings are generated: net/ceph/crush/mapper.c:466: warning: Function parameter or struct member 'work' not described in 'crush_choose_firstn' net/ceph/crush/mapper.c:466: warning: Function parameter or struct member 'weight' not described in 'crush_choose_firstn' net/ceph/crush/mapper.c:466: warning: Function parameter or struct member 'weight_max' not described in 'crush_choose_firstn' net/ceph/crush/mapper.c:466: warning: Function parameter or struct member 'choose_args' not described in 'crush_choose_firstn' Update the crush_choose_firstn() kernel-doc to document these parameters. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2024-07-11 16:33:07 +02:00
Jeff Johnson	6463c360d6	libceph: suppress crush_choose_indep() kernel-doc warnings Currently, when built with "make W=1", the following warnings are generated: net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'map' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'work' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'bucket' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'weight' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'weight_max' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'x' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'left' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'numrep' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'type' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'out' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'outpos' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'tries' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'recurse_tries' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'recurse_to_leaf' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'out2' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'parent_r' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'choose_args' not described in 'crush_choose_indep' These warnings are generated because the prologue comment for crush_choose_indep() uses the kernel-doc prefix, but the actual comment is a very brief description that is not in kernel-doc format. Since this is a static function there is no need to fully document the function, so replace the kernel-doc comment prefix with a standard comment prefix to remove these warnings. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2024-07-11 16:30:53 +02:00
Paolo Abeni	d7c199e77e	netfilter pull request 24-07-11 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmaPpyIACgkQ1V2XiooU IOQgSw/+O7EHZuK7zy59S0XKfIXsC5dEckbww0GNlXVwwnrOzV0C9f23ZpvUWGO1 TguozO4TUjtH9ZO4e95dn5PBdc3FkfNZMfX80ThIzY6ACs03czzJbjjrV68J4rIA sf9N7dehrdQKHyVgoJQgtepJ31BMpjpAbfxawLaW1SRYQPksbP6YB2FhVW+VOXQD /pyUA1xSTIMlwParnQEvZk5202JQm+LqmfT0DFvd14c0m6/i34C9DXlEgcEbI2zC 4EVfmpQ3T1l5Qvrt5Xw1JAAA8A5H5OJvjt1puTHUr5cqcZ+g6gHzdz5pNBNn2cUB xdGkIY+38Vz7GuEcHxMrXTaoh3G6l1wE7op50UHiTsFz2biIF+ITnxust2Ra2GI3 NLPxx2ylqzS7/sLB5qDutRsVDYA1TKVsJG9QlCotPtgijpM6joJstErRSx1rYZfa ARSTN4+rr9uh4LVUintYVXWjAn/StK1dTIUOFy821zjJMhVTkGdXomfFw0H5bd+X Bf7sSTyKT6u5+jJssgQbg4s+mO67NKOv3+gv1FRLSsIpagQ+clm8WlOzmVa6GlH0 sUgUPPuD3TPHOWmUyuQNsXCpAA7UMkq4PH/vutF/vyuyTanU6HmcgBMkA4rnAx0C EykBjAiQ3aRACq9Of+/VBNM/Gcmeat1Y0CRhgZvDG4HZgDBbC/A= =Bt/k -----END PGP SIGNATURE----- Merge tag 'nf-24-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following batch contains Netfilter fixes for net: Patch #1 fixes a bogus WARN_ON splat in nfnetlink_queue. Patch #2 fixes a crash due to stack overflow in chain loop detection by using the existing chain validation routines Both patches from Florian Westphal. netfilter pull request 24-07-11 * tag 'nf-24-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: prefer nft_chain_validate netfilter: nfnetlink_queue: drop bogus WARN_ON ==================== Link: https://patch.msgid.link/20240711093948.3816-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-11 12:57:10 +02:00
Daniel Borkmann	626dfed5fa	net, sunrpc: Remap EPERM in case of connection failure in xs_tcp_setup_socket When using a BPF program on kernel_connect(), the call can return -EPERM. This causes xs_tcp_setup_socket() to loop forever, filling up the syslog and causing the kernel to potentially freeze up. Neil suggested: This will propagate -EPERM up into other layers which might not be ready to handle it. It might be safer to map EPERM to an error we would be more likely to expect from the network system - such as ECONNREFUSED or ENETDOWN. ECONNREFUSED as error seems reasonable. For programs setting a different error can be out of reach (see handling in `4fbac77d2d`) in particular on kernels which do not have `f10d059661` ("bpf: Make BPF_PROG_RUN_ARRAY return -err instead of allow boolean"), thus given that it is better to simply remap for consistent behavior. UDP does handle EPERM in xs_udp_send_request(). Fixes: `d74bad4e74` ("bpf: Hooks for sys_connect") Fixes: `4fbac77d2d` ("bpf: Hooks for sys_bind") Co-developed-by: Lex Siegel <usiegl00@gmail.com> Signed-off-by: Lex Siegel <usiegl00@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trondmy@kernel.org> Cc: Anna Schumaker <anna@kernel.org> Link: https://github.com/cilium/cilium/issues/33395 Link: https://lore.kernel.org/bpf/171374175513.12877.8993642908082014881@noble.neil.brown.name Link: https://patch.msgid.link/9069ec1d59e4b2129fc23433349fd5580ad43921.1720075070.git.daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-11 12:17:45 +02:00
Chengen Du	26488172b0	net/sched: Fix UAF when resolving a clash KASAN reports the following UAF: BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] Read of size 1 at addr ffff888c07603600 by task handler130/6469 Call Trace: <IRQ> dump_stack_lvl+0x48/0x70 print_address_description.constprop.0+0x33/0x3d0 print_report+0xc0/0x2b0 kasan_report+0xd0/0x120 __asan_load1+0x6c/0x80 tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] tcf_ct_act+0x886/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 __irq_exit_rcu+0x82/0xc0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa1/0xb0 </IRQ> <TASK> asm_common_interrupt+0x27/0x40 Allocated by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_alloc_info+0x1e/0x40 __kasan_krealloc+0x133/0x190 krealloc+0xaa/0x130 nf_ct_ext_add+0xed/0x230 [nf_conntrack] tcf_ct_act+0x1095/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 Freed by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_free_info+0x2b/0x60 ____kasan_slab_free+0x180/0x1f0 __kasan_slab_free+0x12/0x30 slab_free_freelist_hook+0xd2/0x1a0 __kmem_cache_free+0x1a2/0x2f0 kfree+0x78/0x120 nf_conntrack_free+0x74/0x130 [nf_conntrack] nf_ct_destroy+0xb2/0x140 [nf_conntrack] __nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack] nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack] __nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack] tcf_ct_act+0x12ad/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 The ct may be dropped if a clash has been resolved but is still passed to the tcf_ct_flow_table_process_conn function for further usage. This issue can be fixed by retrieving ct from skb again after confirming conntrack. Fixes: `0cc254e5aa` ("net/sched: act_ct: Offload connections with commit action") Co-developed-by: Gerald Yang <gerald.yang@canonical.com> Signed-off-by: Gerald Yang <gerald.yang@canonical.com> Signed-off-by: Chengen Du <chengen.du@canonical.com> Link: https://patch.msgid.link/20240710053747.13223-1-chengen.du@canonical.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-11 12:07:54 +02:00
Kuniyuki Iwashima	5c0b485a8c	udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port(). syzkaller triggered the warning [0] in udp_v4_early_demux(). In udp_v[46]_early_demux() and sk_lookup(), we do not touch the refcount of the looked-up sk and use sock_pfree() as skb->destructor, so we check SOCK_RCU_FREE to ensure that the sk is safe to access during the RCU grace period. Currently, SOCK_RCU_FREE is flagged for a bound socket after being put into the hash table. Moreover, the SOCK_RCU_FREE check is done too early in udp_v[46]_early_demux() and sk_lookup(), so there could be a small race window: CPU1 CPU2 ---- ---- udp_v4_early_demux() udp_lib_get_port() \| \|- hlist_add_head_rcu() \|- sk = __udp4_lib_demux_lookup() \| \|- DEBUG_NET_WARN_ON_ONCE(sk_is_refcounted(sk)); `- sock_set_flag(sk, SOCK_RCU_FREE) We had the same bug in TCP and fixed it in commit `871019b22d` ("net: set SOCK_RCU_FREE before inserting socket into hashtable"). Let's apply the same fix for UDP. [0]: WARNING: CPU: 0 PID: 11198 at net/ipv4/udp.c:2599 udp_v4_early_demux+0x481/0xb70 net/ipv4/udp.c:2599 Modules linked in: CPU: 0 PID: 11198 Comm: syz-executor.1 Not tainted 6.9.0-g93bda33046e7 #13 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:udp_v4_early_demux+0x481/0xb70 net/ipv4/udp.c:2599 Code: c5 7a 15 fe bb 01 00 00 00 44 89 e9 31 ff d3 e3 81 e3 bf ef ff ff 89 de e8 2c 74 15 fe 85 db 0f 85 02 06 00 00 e8 9f 7a 15 fe <0f> 0b e8 98 7a 15 fe 49 8d 7e 60 e8 4f 39 2f fe 49 c7 46 60 20 52 RSP: 0018:ffffc9000ce3fa58 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8318c92c RDX: ffff888036ccde00 RSI: ffffffff8318c2f1 RDI: 0000000000000001 RBP: ffff88805a2dd6e0 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0001ffffffffffff R12: ffff88805a2dd680 R13: 0000000000000007 R14: ffff88800923f900 R15: ffff88805456004e FS: 00007fc449127640(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc449126e38 CR3: 000000003de4b002 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 PKRU: 55555554 Call Trace: <TASK> ip_rcv_finish_core.constprop.0+0xbdd/0xd20 net/ipv4/ip_input.c:349 ip_rcv_finish+0xda/0x150 net/ipv4/ip_input.c:447 NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip_rcv+0x16c/0x180 net/ipv4/ip_input.c:569 __netif_receive_skb_one_core+0xb3/0xe0 net/core/dev.c:5624 __netif_receive_skb+0x21/0xd0 net/core/dev.c:5738 netif_receive_skb_internal net/core/dev.c:5824 [inline] netif_receive_skb+0x271/0x300 net/core/dev.c:5884 tun_rx_batched drivers/net/tun.c:1549 [inline] tun_get_user+0x24db/0x2c50 drivers/net/tun.c:2002 tun_chr_write_iter+0x107/0x1a0 drivers/net/tun.c:2048 new_sync_write fs/read_write.c:497 [inline] vfs_write+0x76f/0x8d0 fs/read_write.c:590 ksys_write+0xbf/0x190 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __x64_sys_write+0x41/0x50 fs/read_write.c:652 x64_sys_call+0xe66/0x1990 arch/x86/include/generated/asm/syscalls_64.h:2 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x4b/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7fc44a68bc1f Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 e9 cf f5 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 3c d0 f5 ff 48 RSP: 002b:00007fc449126c90 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00000000004bc050 RCX: 00007fc44a68bc1f RDX: 0000000000000032 RSI: 00000000200000c0 RDI: 00000000000000c8 RBP: 00000000004bc050 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000032 R11: 0000000000000293 R12: 0000000000000000 R13: 000000000000000b R14: 00007fc44a5ec530 R15: 0000000000000000 </TASK> Fixes: `6acc9b432e` ("bpf: Add helper to retrieve socket in BPF") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240709191356.24010-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-11 11:28:27 +02:00
Florian Westphal	cff3bd012a	netfilter: nf_tables: prefer nft_chain_validate nft_chain_validate already performs loop detection because a cycle will result in a call stack overflow (ctx->level >= NFT_JUMP_STACK_SIZE). It also follows maps via ->validate callback in nft_lookup, so there appears no reason to iterate the maps again. nf_tables_check_loops() and all its helper functions can be removed. This improves ruleset load time significantly, from 23s down to 12s. This also fixes a crash bug. Old loop detection code can result in unbounded recursion: BUG: TASK stack guard page was hit at .... Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN CPU: 4 PID: 1539 Comm: nft Not tainted 6.10.0-rc5+ #1 [..] with a suitable ruleset during validation of register stores. I can't see any actual reason to attempt to check for this from nft_validate_register_store(), at this point the transaction is still in progress, so we don't have a full picture of the rule graph. For nf-next it might make sense to either remove it or make this depend on table->validate_state in case we could catch an error earlier (for improved error reporting to userspace). Fixes: `20a69341f2` ("netfilter: nf_tables: add netlink set API") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-07-11 11:26:35 +02:00
Florian Westphal	631a4b3ddc	netfilter: nfnetlink_queue: drop bogus WARN_ON Happens when rules get flushed/deleted while packet is out, so remove this WARN_ON. This WARN exists in one form or another since v4.14, no need to backport this to older releases, hence use a more recent fixes tag. Fixes: `3f80196888` ("netfilter: move nf_reinject into nfnetlink_queue modules") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202407081453.11ac0f63-lkp@intel.com Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-07-11 11:26:33 +02:00
Oleksij Rempel	c184cf94e7	ethtool: netlink: do not return SQI value if link is down Do not attach SQI value if link is down. "SQI values are only valid if link-up condition is present" per OpenAlliance specification of 100Base-T1 Interoperability Test suite [1]. The same rule would apply for other link types. [1] https://opensig.org/automotive-ethernet-specifications/# Fixes: `8066021915` ("ethtool: provide UAPI for PHY Signal Quality Index (SQI)") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Woojung Huh <woojung.huh@microchip.com> Link: https://patch.msgid.link/20240709061943.729381-1-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-11 11:19:07 +02:00
Eric Dumazet	97a9063518	tcp: avoid too many retransmit packets If a TCP socket is using TCP_USER_TIMEOUT, and the other peer retracted its window to zero, tcp_retransmit_timer() can retransmit a packet every two jiffies (2 ms for HZ=1000), for about 4 minutes after TCP_USER_TIMEOUT has 'expired'. The fix is to make sure tcp_rtx_probe0_timed_out() takes icsk->icsk_user_timeout into account. Before blamed commit, the socket would not timeout after icsk->icsk_user_timeout, but would use standard exponential backoff for the retransmits. Also worth noting that before commit `e89688e3e9` ("net: tcp: fix unexcepted socket die when snd_wnd is 0"), the issue would last 2 minutes instead of 4. Fixes: `b701a99e43` ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Jon Maxwell <jmaxwell37@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20240710001402.2758273-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-10 19:05:27 -07:00
Alexander Lobakin	39daa09d34	page_pool: use __cacheline_group_{begin, end}_aligned() Instead of doing __cacheline_group_begin() __aligned(), use the new __cacheline_group_{begin,end}_aligned(), so that it will take care of the group alignment itself. Also replace open-coded `4 * sizeof(long)` in two places with a definition. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2024-07-10 10:28:23 -07:00
Sebastian Andrzej Siewior	c13fda93ac	bpf: Remove tst_run from lwt_seg6local_prog_ops. The syzbot reported that the lwt_seg6 related BPF ops can be invoked via bpf_test_run() without without entering input_action_end_bpf() first. Martin KaFai Lau said that self test for BPF_PROG_TYPE_LWT_SEG6LOCAL probably didn't work since it was introduced in commit 04d4b274e2a ("ipv6: sr: Add seg6local action End.BPF"). The reason is that the per-CPU variable seg6_bpf_srh_states::srh is never assigned in the self test case but each BPF function expects it. Remove test_run for BPF_PROG_TYPE_LWT_SEG6LOCAL. Suggested-by: Martin KaFai Lau <martin.lau@linux.dev> Reported-by: syzbot+608a2acde8c5a101d07d@syzkaller.appspotmail.com Fixes: `d1542d4ae4` ("seg6: Use nested-BH locking for seg6_bpf_srh_states.") Fixes: `004d4b274e` ("ipv6: sr: Add seg6local action End.BPF") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20240710141631.FbmHcQaX@linutronix.de Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2024-07-10 09:58:52 -07:00
Johannes Berg	408ac28c62	wifi: mac80211: fix AP chandef capturing in CSA When the CSA is announced with only HT elements, the AP chandef isn't captured correctly, leading to crashes in the later code that checks for TPE changes during CSA. Capture the AP chandef correctly in both cases to fix this. Reported-by: Jouni Malinen <j@w1.fi> Fixes: `4540568136` ("wifi: mac80211: handle TPE element during CSA") Link: https://patch.msgid.link/20240709160851.47805f24624d.I024091f701447f7921e93bb23b46e01c2f46347d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-10 12:35:58 +02:00
Ilya Dryomov	69c7b2fe4c	libceph: fix race between delayed_work() and ceph_monc_stop() The way the delayed work is handled in ceph_monc_stop() is prone to races with mon_fault() and possibly also finish_hunting(). Both of these can requeue the delayed work which wouldn't be canceled by any of the following code in case that happens after cancel_delayed_work_sync() runs -- __close_session() doesn't mess with the delayed work in order to avoid interfering with the hunting interval logic. This part was missed in commit `b5d91704f5` ("libceph: behave in mon_fault() if cur_mon < 0") and use-after-free can still ensue on monc and objects that hang off of it, with monc->auth and monc->monmap being particularly susceptible to quickly being reused. To fix this: - clear monc->cur_mon and monc->hunting as part of closing the session in ceph_monc_stop() - bail from delayed_work() if monc->cur_mon is cleared, similar to how it's done in mon_fault() and finish_hunting() (based on monc->hunting) - call cancel_delayed_work_sync() after the session is closed Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/66857 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Xiubo Li <xiubli@redhat.com>	2024-07-10 10:11:55 +02:00
Hugh Dickins	f153831097	net: fix rc7's __skb_datagram_iter() X would not start in my old 32-bit partition (and the "n"-handling looks just as wrong on 64-bit, but for whatever reason did not show up there): "n" must be accumulated over all pages before it's added to "offset" and compared with "copy", immediately after the skb_frag_foreach_page() loop. Fixes: `d2d30a376d` ("net: allow skb_datagram_iter to be called from any context") Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://patch.msgid.link/fef352e8-b89a-da51-f8ce-04bc39ee6481@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-09 11:24:54 -07:00
Simon Horman	0d9e699d34	net: tls: Pass union tls_crypto_context pointer to memzero_explicit Pass union tls_crypto_context pointer, rather than struct tls_crypto_info pointer, to memzero_explicit(). The address of the pointer is the same before and after. But the new construct means that the size of the dereferenced pointer type matches the size being zeroed. Which aids static analysis. As reported by Smatch: .../tls_main.c:842 do_tls_setsockopt_conf() error: memzero_explicit() 'crypto_info' too small (4 vs 56) No functional change intended. Compile tested only. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240708-tls-memzero-v2-1-9694eaf31b79@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-09 11:14:47 -07:00
Paolo Abeni	7b769adc26	bpf-next-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZoxN0AAKCRDbK58LschI g0c5AQDa3ZV9gfbN42y1zSDoM1uOgO60fb+ydxyOYh8l3+OiQQD/fLfpTY3gBFSY 9yi/pZhw/QdNzQskHNIBrHFGtJbMxgs= =p1Zz -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-07-08 The following pull-request contains BPF updates for your net-next tree. We've added 102 non-merge commits during the last 28 day(s) which contain a total of 127 files changed, 4606 insertions(+), 980 deletions(-). The main changes are: 1) Support resilient split BTF which cuts down on duplication and makes BTF as compact as possible wrt BTF from modules, from Alan Maguire & Eduard Zingerman. 2) Add support for dumping kfunc prototypes from BTF which enables both detecting as well as dumping compilable prototypes for kfuncs, from Daniel Xu. 3) Batch of s390x BPF JIT improvements to add support for BPF arena and to implement support for BPF exceptions, from Ilya Leoshkevich. 4) Batch of riscv64 BPF JIT improvements in particular to add 12-argument support for BPF trampolines and to utilize bpf_prog_pack for the latter, from Pu Lehui. 5) Extend BPF test infrastructure to add a CHECKSUM_COMPLETE validation option for skbs and add coverage along with it, from Vadim Fedorenko. 6) Inline bpf_get_current_task/_btf() helpers in the arm64 BPF JIT which gives a small 1% performance improvement in micro-benchmarks, from Puranjay Mohan. 7) Extend the BPF verifier to track the delta between linked registers in order to better deal with recent LLVM code optimizations, from Alexei Starovoitov. 8) Fix bpf_wq_set_callback_impl() kfunc signature where the third argument should have been a pointer to the map value, from Benjamin Tissoires. 9) Extend BPF selftests to add regular expression support for test output matching and adjust some of the selftest when compiled under gcc, from Cupertino Miranda. 10) Simplify task_file_seq_get_next() and remove an unnecessary loop which always iterates exactly once anyway, from Dan Carpenter. 11) Add the capability to offload the netfilter flowtable in XDP layer through kfuncs, from Florian Westphal & Lorenzo Bianconi. 12) Various cleanups in networking helpers in BPF selftests to shave off a few lines of open-coded functions on client/server handling, from Geliang Tang. 13) Properly propagate prog->aux->tail_call_reachable out of BPF verifier, so that x86 JIT does not need to implement detection, from Leon Hwang. 14) Fix BPF verifier to add a missing check_func_arg_reg_off() to prevent an out-of-bounds memory access for dynpointers, from Matt Bobrowski. 15) Fix bpf_session_cookie() kfunc to return __u64 instead of long pointer as it might lead to problems on 32-bit archs, from Jiri Olsa. 16) Enhance traffic validation and dynamic batch size support in xsk selftests, from Tushar Vyavahare. bpf-next-for-netdev * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (102 commits) selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep selftests/bpf: amend for wrong bpf_wq_set_callback_impl signature bpf: helpers: fix bpf_wq_set_callback_impl signature libbpf: Add NULL checks to bpf_object__{prev_map,next_map} selftests/bpf: Remove exceptions tests from DENYLIST.s390x s390/bpf: Implement exceptions s390/bpf: Change seen_reg to a mask bpf: Remove unnecessary loop in task_file_seq_get_next() riscv, bpf: Optimize stack usage of trampoline bpf, devmap: Add .map_alloc_check selftests/bpf: Remove arena tests from DENYLIST.s390x selftests/bpf: Add UAF tests for arena atomics selftests/bpf: Introduce __arena_global s390/bpf: Support arena atomics s390/bpf: Enable arena s390/bpf: Support address space cast instruction s390/bpf: Support BPF_PROBE_MEM32 s390/bpf: Land on the next JITed instruction after exception s390/bpf: Introduce pre- and post- probe functions s390/bpf: Get rid of get_probe_mem_regno() ... ==================== Link: https://patch.msgid.link/20240708221438.10974-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-09 17:01:46 +02:00
Thorsten Blum	0787ab206f	udp: Remove duplicate included header file trace/events/udp.h Remove duplicate included header file trace/events/udp.h and the following warning reported by make includecheck: trace/events/udp.h is included more than once Compile-tested only. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240706071132.274352-2-thorsten.blum@toblux.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-09 13:30:25 +02:00
Felix Fietkau	27d4c03441	wifi: mac80211: add wiphy radio assignment and validation Validate number of channels and interface combinations per radio. Assign each channel context to a radio. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/1d3e9ba70a30ce18aaff337f0a76d7aeb311bafb.1720514221.git-series.nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-09 11:36:12 +02:00
Felix Fietkau	6265c67f26	wifi: mac80211: move code in ieee80211_link_reserve_chanctx to a helper Reduces indentation in preparation for further changes Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/cce95007092336254d51570f4a27e05a6f150a53.1720514221.git-series.nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-09 11:36:12 +02:00
Felix Fietkau	0874bcd0e1	wifi: mac80211: extend ifcomb check functions for multi-radio Add support for counting global and per-radio max/current number of channels, as well as checking radio-specific interface combinations. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/e76307f8ce562a91a74faab274ae01f6a5ba0a2e.1720514221.git-series.nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-09 11:36:12 +02:00
Felix Fietkau	2920bc8d91	wifi: mac80211: add radio index to ieee80211_chanctx_conf Will be used to explicitly assign a channel context to a wiphy radio. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/59f76f57d935f155099276be22badfa671d5bfd9.1720514221.git-series.nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-09 11:36:12 +02:00
Felix Fietkau	a01b1e9f99	wifi: mac80211: add support for DFS with multiple radios DFS can be supported with multi-channel combinations, as long as each DFS capable radio only supports one channel. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/4d27a4adca99fa832af1f7cda4f2e71016bd9fda.1720514221.git-series.nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-09 11:36:12 +02:00
Felix Fietkau	510dba80ed	wifi: cfg80211: add helper for checking if a chandef is valid on a radio Check if the full channel width is in the radio's frequency range. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/7c8ea146feb6f37cee62e5ba6be5370403695797.1720514221.git-series.nbd@nbd.name [add missing Return: documentation] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-09 11:36:00 +02:00
Felix Fietkau	abb4cfe366	wifi: cfg80211: extend interface combination check for multi-radio Add a field in struct iface_combination_params to check per-radio interface combinations instead of per-wiphy ones. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/32b28da89c2d759b0324deeefe2be4cee91de18e.1720514221.git-series.nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-09 11:29:59 +02:00
Felix Fietkau	e6c06ca8f2	wifi: cfg80211: add support for advertising multiple radios belonging to a wiphy The prerequisite for MLO support in cfg80211/mac80211 is that all the links participating in MLO must be from the same wiphy/ieee80211_hw. To meet this expectation, some drivers may need to group multiple discrete hardware each acting as a link in MLO under single wiphy. With this change, supported frequencies and interface combinations of each individual radio are reported to user space. This allows user space to figure out the limitations of what combination of channels can be used concurrently. Even for non-MLO devices, this improves support for devices capable of running on multiple channels at the same time. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/18a88f9ce82b1c9f7c12f1672430eaf2bb0be295.1720514221.git-series.nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-09 11:29:59 +02:00
Zong-Zhe Yang	19b815ed71	wifi: mac80211: chanctx emulation set CHANGE_CHANNEL when in_reconfig Chanctx emulation didn't info IEEE80211_CONF_CHANGE_CHANNEL to drivers during ieee80211_restart_hw (ieee80211_emulate_add_chanctx). It caused non-chanctx drivers to not stand on the correct channel after recovery. RX then behaved abnormally. Finally, disconnection/reconnection occurred. So, set IEEE80211_CONF_CHANGE_CHANNEL when in_reconfig. Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Link: https://patch.msgid.link/20240709073531.30565-1-kevin_yang@realtek.com Cc: stable@vger.kernel.org Fixes: `0a44dfc070` ("wifi: mac80211: simplify non-chanctx drivers") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-09 11:29:09 +02:00
James Chapman	f8ad00f3fb	l2tp: fix possible UAF when cleaning up tunnels syzbot reported a UAF caused by a race when the L2TP work queue closes a tunnel at the same time as a userspace thread closes a session in that tunnel. Tunnel cleanup is handled by a work queue which iterates through the sessions contained within a tunnel, and closes them in turn. Meanwhile, a userspace thread may arbitrarily close a session via either netlink command or by closing the pppox socket in the case of l2tp_ppp. The race condition may occur when l2tp_tunnel_closeall walks the list of sessions in the tunnel and deletes each one. Currently this is implemented using list_for_each_safe, but because the list spinlock is dropped in the loop body it's possible for other threads to manipulate the list during list_for_each_safe's list walk. This can lead to the list iterator being corrupted, leading to list_for_each_safe spinning. One sequence of events which may lead to this is as follows: * A tunnel is created, containing two sessions A and B. * A thread closes the tunnel, triggering tunnel cleanup via the work queue. * l2tp_tunnel_closeall runs in the context of the work queue. It removes session A from the tunnel session list, then drops the list lock. At this point the list_for_each_safe temporary variable is pointing to the other session on the list, which is session B, and the list can be manipulated by other threads since the list lock has been released. * Userspace closes session B, which removes the session from its parent tunnel via l2tp_session_delete. Since l2tp_tunnel_closeall has released the tunnel list lock, l2tp_session_delete is able to call list_del_init on the session B list node. * Back on the work queue, l2tp_tunnel_closeall resumes execution and will now spin forever on the same list entry until the underlying session structure is freed, at which point UAF occurs. The solution is to iterate over the tunnel's session list using list_first_entry_not_null to avoid the possibility of the list iterator pointing at a list item which may be removed during the walk. Also, have l2tp_tunnel_closeall ref each session while it processes it to prevent another thread from freeing it. cpu1 cpu2 --- --- pppol2tp_release() spin_lock_bh(&tunnel->list_lock); for (;;) { session = list_first_entry_or_null(&tunnel->session_list, struct l2tp_session, list); if (!session) break; list_del_init(&session->list); spin_unlock_bh(&tunnel->list_lock); l2tp_session_delete(session); l2tp_session_delete(session); spin_lock_bh(&tunnel->list_lock); } spin_unlock_bh(&tunnel->list_lock); Calling l2tp_session_delete on the same session twice isn't a problem per-se, but if cpu2 manages to destruct the socket and unref the session to zero before cpu1 progresses then it would lead to UAF. Reported-by: syzbot+b471b7c936301a59745b@syzkaller.appspotmail.com Reported-by: syzbot+c041b4ce3a6dfd1e63e2@syzkaller.appspotmail.com Fixes: `d18d3f0a24` ("l2tp: replace hlist with simple list for per-tunnel session list") Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Link: https://patch.msgid.link/20240704152508.1923908-1-jchapman@katalix.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-09 11:06:21 +02:00
Geliang Tang	f0c1802569	skmsg: Skip zero length skb in sk_msg_recvmsg When running BPF selftests (./test_progs -t sockmap_basic) on a Loongarch platform, the following kernel panic occurs: [...] Oops[#1]: CPU: 22 PID: 2824 Comm: test_progs Tainted: G OE 6.10.0-rc2+ #18 Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018 ... ... ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560 ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0 CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) PRMD: 0000000c (PPLV0 +PIE +PWE) EUEN: 00000007 (+FPE +SXE +ASXE -BTE) ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0) BADV: 0000000000000040 PRID: 0014c011 (Loongson-64bit, Loongson-3C5000) Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...) Stack : ... Call Trace: [<9000000004162774>] copy_page_to_iter+0x74/0x1c0 [<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560 [<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0 [<90000000049aae34>] inet_recvmsg+0x54/0x100 [<900000000481ad5c>] sock_recvmsg+0x7c/0xe0 [<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0 [<900000000481e27c>] sys_recvfrom+0x1c/0x40 [<9000000004c076ec>] do_syscall+0x8c/0xc0 [<9000000003731da4>] handle_syscall+0xc4/0x160 Code: ... ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Fatal exception Kernel relocated by 0x3510000 .text @ 0x9000000003710000 .data @ 0x9000000004d70000 .bss @ 0x9000000006469400 ---[ end Kernel panic - not syncing: Fatal exception ]--- [...] This crash happens every time when running sockmap_skb_verdict_shutdown subtest in sockmap_basic. This crash is because a NULL pointer is passed to page_address() in the sk_msg_recvmsg(). Due to the different implementations depending on the architecture, page_address(NULL) will trigger a panic on Loongarch platform but not on x86 platform. So this bug was hidden on x86 platform for a while, but now it is exposed on Loongarch platform. The root cause is that a zero length skb (skb->len == 0) was put on the queue. This zero length skb is a TCP FIN packet, which was sent by shutdown(), invoked in test_sockmap_skb_verdict_shutdown(): shutdown(p1, SHUT_WR); In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and no page is put to this sge (see sg_set_page in sg_set_page), but this empty sge is queued into ingress_msg list. And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is got by sg_page(sge). Pass this NULL page to copy_page_to_iter(), which passes it to kmap_local_page() and to page_address(), then kernel panics. To solve this, we should skip this zero length skb. So in sk_msg_recvmsg(), if copy is zero, that means it's a zero length skb, skip invoking copy_page_to_iter(). We are using the EFAULT return triggered by copy_page_to_iter to check for is_fin in tcp_bpf.c. Fixes: `604326b41a` ("bpf, sockmap: convert to generic sk_msg interface") Suggested-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/e3a16eacdc6740658ee02a33489b1b9d4912f378.1719992715.git.tanggeliang@kylinos.cn	2024-07-09 10:24:50 +02:00
Johannes Berg	946b6c48cc	net: page_pool: fix warning code WARN_ON_ONCE("string") doesn't really do what appears to be intended, so fix that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Fixes: `90de47f020` ("page_pool: fragment API support for 32-bit arch with 64-bit DMA") Link: https://patch.msgid.link/20240705134221.2f4de205caa1.I28496dc0f2ced580282d1fb892048017c4491e21@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-08 20:19:18 -07:00
Daniel Borkmann	1cb6f0bae5	bpf: Fix too early release of tcx_entry Pedro Pinto and later independently also Hyunwoo Kim and Wongi Lee reported an issue that the tcx_entry can be released too early leading to a use after free (UAF) when an active old-style ingress or clsact qdisc with a shared tc block is later replaced by another ingress or clsact instance. Essentially, the sequence to trigger the UAF (one example) can be as follows: 1. A network namespace is created 2. An ingress qdisc is created. This allocates a tcx_entry, and &tcx_entry->miniq is stored in the qdisc's miniqp->p_miniq. At the same time, a tcf block with index 1 is created. 3. chain0 is attached to the tcf block. chain0 must be connected to the block linked to the ingress qdisc to later reach the function tcf_chain0_head_change_cb_del() which triggers the UAF. 4. Create and graft a clsact qdisc. This causes the ingress qdisc created in step 1 to be removed, thus freeing the previously linked tcx_entry: rtnetlink_rcv_msg() => tc_modify_qdisc() => qdisc_create() => clsact_init() [a] => qdisc_graft() => qdisc_destroy() => __qdisc_destroy() => ingress_destroy() [b] => tcx_entry_free() => kfree_rcu() // tcx_entry freed 5. Finally, the network namespace is closed. This registers the cleanup_net worker, and during the process of releasing the remaining clsact qdisc, it accesses the tcx_entry that was already freed in step 4, causing the UAF to occur: cleanup_net() => ops_exit_list() => default_device_exit_batch() => unregister_netdevice_many() => unregister_netdevice_many_notify() => dev_shutdown() => qdisc_put() => clsact_destroy() [c] => tcf_block_put_ext() => tcf_chain0_head_change_cb_del() => tcf_chain_head_change_item() => clsact_chain_head_change() => mini_qdisc_pair_swap() // UAF There are also other variants, the gist is to add an ingress (or clsact) qdisc with a specific shared block, then to replace that qdisc, waiting for the tcx_entry kfree_rcu() to be executed and subsequently accessing the current active qdisc's miniq one way or another. The correct fix is to turn the miniq_active boolean into a counter. What can be observed, at step 2 above, the counter transitions from 0->1, at step [a] from 1->2 (in order for the miniq object to remain active during the replacement), then in [b] from 2->1 and finally [c] 1->0 with the eventual release. The reference counter in general ranges from [0,2] and it does not need to be atomic since all access to the counter is protected by the rtnl mutex. With this in place, there is no longer a UAF happening and the tcx_entry is freed at the correct time. Fixes: `e420bed025` ("bpf: Add fd-based tcx multi-prog infra with link support") Reported-by: Pedro Pinto <xten@osec.io> Co-developed-by: Pedro Pinto <xten@osec.io> Signed-off-by: Pedro Pinto <xten@osec.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Hyunwoo Kim <v4bel@theori.io> Cc: Wongi Lee <qwerty@theori.io> Cc: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240708133130.11609-1-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2024-07-08 14:07:31 -07:00
Gaosheng Cui	a3123341dc	gss_krb5: Fix the error handling path for crypto_sync_skcipher_setkey If we fail to call crypto_sync_skcipher_setkey, we should free the memory allocation for cipher, replace err_return with err_free_cipher to free the memory of cipher. Fixes: `4891f2d008` ("gss_krb5: import functionality to derive keys into the kernel") Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-07-08 14:10:06 -04:00
Jeff Layton	5f71f3c325	sunrpc: refactor pool_mode setting code Allow the pool_mode setting code to be called from internal callers so we can call it from a new netlink op. Add a new svc_pool_map_get function to return the current setting. Change the existing module parameter handling to use the new interfaces under the hood. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-07-08 14:10:05 -04:00
Jeff Layton	8e0c8d2395	sunrpc: fix up the special handling of sv_nrpools == 1 Only pooled services take a reference to the svc_pool_map. The sunrpc code has always used the sv_nrpools value to detect whether the service is pooled. The problem there is that nfsd is a pooled service, but when it's running in "global" pool_mode, it doesn't take a reference to the pool map because it has a sv_nrpools value of 1. This means that we have two separate codepaths for starting the server, depending on whether it's pooled or not. Fix this by adding a new flag to the svc_serv, that indicates whether the serv is pooled. With this we can have the nfsd service unconditionally take a reference, regardless of pool_mode. Note that this is a behavior change for /sys/module/sunrpc/parameters/pool_mode. Usually this file does not allow you to change the pool-mode while there are nfsd threads running, but if the pool-mode is "global" it's allowed. My assumption is that this is a bug, since it probably should never have worked this way. This patch changes the behavior such that you get back EBUSY even when nfsd is running in global mode. I think this is more reasonable behavior, and given that most people set this today using the module parameter, it's doubtful anyone will notice. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-07-08 14:10:04 -04:00
Chuck Lever	3a6adfcae8	SUNRPC: Add a trace point in svc_xprt_deferred_close The trace point in svc_xprt_close() reports only some local close requests. Try to capture more local close requests. Note that "trace-cmd record -T -e sunrpc:svc_xprt_close" will neatly capture the identity of the caller requesting the close. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-07-08 14:10:04 -04:00
Chuck Lever	d1b586e75e	svcrdma: Handle ADDR_CHANGE CM event properly Sagi tells me that when a bonded device reports an address change, the consumer must destroy its listener IDs and create new ones. See commit `a032e4f6d6` ("nvmet-rdma: fix bonding failover possible NULL deref"). Suggested-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-07-08 14:10:02 -04:00
Chuck Lever	283d285462	svcrdma: Refactor the creation of listener CMA ID In a moment, I will add a second consumer of CMA ID creation in svcrdma. Refactor so this code can be reused. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-07-08 14:10:02 -04:00
NeilBrown	6258cf25d5	SUNRPC: avoid soft lockup when transmitting UDP to reachable server. Prior to the commit identified below, call_transmit_status() would handle -EPERM and other errors related to an unreachable server by falling through to call_status() which added a 3-second delay and handled the failure as a timeout. Since that commit, call_transmit_status() falls through to handle_bind(). For UDP this moves straight on to handle_connect() and handle_transmit() so we immediately retransmit - and likely get the same error. This results in an indefinite loop in __rpc_execute() which triggers a soft-lockup warning. For the errors that indicate an unreachable server, call_transmit_status() should fall back to call_status() as it did before. This cannot cause the thundering herd that the previous patch was avoiding, as the call_status() will insert a delay. Fixes: `ed7dc973bd` ("SUNRPC: Prevent thundering herd when the socket is not connected") Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-07-08 13:47:24 -04:00
Chuck Lever	0e13dd9ea8	xprtrdma: Remove temp allocation of rpcrdma_rep objects The original code was designed so that most calls to rpcrdma_rep_create() would occur on the NUMA node that the device preferred. There are a few cases where that's not possible, so those reps are marked as temporary. However, we have the device (and its preferred node) already in rpcrdma_rep_create(), so let's use that to guarantee the memory is allocated from the correct node. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-07-08 13:47:24 -04:00
Chuck Lever	9d53378c2c	xprtrdma: Clean up synopsis of frwr_mr_unmap() Commit `7a03aeb66c` ("xprtrdma: Micro-optimize MR DMA-unmapping") removed the last use of the @r_xprt parameter in this function, but neglected to remove the parameter itself. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-07-08 13:47:24 -04:00
Chuck Lever	3f4eb9ff92	xprtrdma: Handle device removal outside of the CM event handler Wait for all disconnects to complete to ensure the transport has divested all of its hardware resources before the underlying RDMA device can be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-07-08 13:47:24 -04:00
Chuck Lever	7e86845a03	rpcrdma: Implement generic device removal Commit `e87a911fed` ("nvme-rdma: use ib_client API to detect device removal") explains the benefits of handling device removal outside of the CM event handler. Sketch in an IB device removal notification mechanism that can be used by both the client and server side RPC-over-RDMA transport implementations. Suggested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-07-08 13:47:24 -04:00
Chuck Lever	acd9f2dd23	xprtrdma: Fix rpcrdma_reqs_reset() Avoid FastReg operations getting MW_BIND_ERR after a reconnect. rpcrdma_reqs_reset() is called on transport tear-down to get each rpcrdma_req back into a clean state. MRs on req->rl_registered are waiting for a FastReg, are already registered, or are waiting for invalidation. If the transport is being torn down when reqs_reset() is called, the matching LocalInv might never be posted. That leaves these MR registered /and/ on req->rl_free_mrs, where they can be re-used for the next connection. Since xprtrdma does not keep specific track of the MR state, it's not possible to know what state these MRs are in, so the only safe thing to do is release them immediately. Fixes: `5de55ce951` ("xprtrdma: Release in-flight MRs on disconnect") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-07-08 13:47:24 -04:00
Michael-CY Lee	4044b23781	wifi: mac80211: do not check BSS color collision in certain cases Do not check BSS color collision in following cases 1. already under a color change 2. color change is disabled Signed-off-by: Michael-CY Lee <michael-cy.lee@mediatek.com> Link: https://patch.msgid.link/20240705074346.11228-1-michael-cy.lee@mediatek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-08 18:28:47 +02:00
Michael-CY Lee	7cd4456355	wifi: mac80211: cancel color change finalize work when link is stopped The color change finalize work might be called after the link is stopped, which might lead to a kernel crash. Signed-off-by: Michael-CY Lee <michael-cy.lee@mediatek.com> Link: https://patch.msgid.link/20240705074326.11172-1-michael-cy.lee@mediatek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-08 18:28:47 +02:00
Felix Fietkau	574e609c4e	wifi: mac80211: clear vif drv_priv after remove_interface when stopping Avoid reusing stale driver data when an interface is brought down and up again. In order to avoid having to duplicate the memset in every single driver, do it here. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/20240704130947.48609-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-08 18:28:47 +02:00
Felix Fietkau	34ce9c8b8a	wifi: nl80211: split helper function from nl80211_put_iface_combinations Create a helper function that puts the data from struct ieee80211_iface_combination to a nl80211 message. This will be used for adding per-radio interface combination data. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/22a0eee19dbcf98627239328bc66decd3395122c.1719919832.git-series.nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-07-08 17:27:04 +02:00
Tanzir Hasan	146a99aefe	xprtrdma: removed asm-generic headers from verbs.c asm-generic/barrier.h and asm/bitops.h are already brought into the file through the header linux/sunrpc/svc_rdma.h and the file can still be built with their removal. They have been replaced with the preferred linux/bitops.h and asm/barrier.h to remove the need for the asm-generic header. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Tanzir Hasan <tanzirh@google.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2024-07-08 10:55:39 -04:00
Jianbo Liu	89a2aefe4b	xfrm: call xfrm_dev_policy_delete when kill policy xfrm_policy_kill() is called at different places to delete xfrm policy. It will call xfrm_pol_put(). But xfrm_dev_policy_delete() is not called to free the policy offloaded to hardware. The three commits cited here are to handle this issue by calling xfrm_dev_policy_delete() outside xfrm_get_policy(). But they didn't cover all the cases. An example, which is not handled for now, is xfrm_policy_insert(). It is called when XFRM_MSG_UPDPOLICY request is received. Old policy is replaced by new one, but the offloaded policy is not deleted, so driver doesn't have the chance to release hardware resources. To resolve this issue for all cases, move xfrm_dev_policy_delete() into xfrm_policy_kill(), so the offloaded policy can be deleted from hardware when it is called, which avoids hardware resources leakage. Fixes: `919e43fad5` ("xfrm: add an interface to offload policy") Fixes: `bf06fcf4be` ("xfrm: add missed call to delete offloaded policies") Fixes: `982c3aca8b` ("xfrm: delete offloaded policy") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-07-08 13:24:13 +02:00
Jianbo Liu	9199b915e9	xfrm: fix netdev reference count imbalance In cited commit, netdev_tracker_alloc() is called for the newly allocated xfrm state, but dev_hold() is missed, which causes netdev reference count imbalance, because netdev_put() is called when the state is freed in xfrm_dev_state_free(). Fix the issue by replacing netdev_tracker_alloc() with netdev_hold(). Fixes: `f8a70afafc` ("xfrm: add TX datapath support for IPsec packet offload mode") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-07-08 13:24:13 +02:00
Florian Westphal	3abbd7ed8b	act_ct: prepare for stolen verdict coming from conntrack and nat engine At this time, conntrack either returns NF_ACCEPT or NF_DROP. To improve debuging it would be nice to be able to replace NF_DROP verdict with NF_DROP_REASON() helper, This helper releases the skb instantly (so drop_monitor can pinpoint exact location) and returns NF_STOLEN. Prepare call sites to deal with this before introducing such changes in conntrack and nat core. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-08 11:35:31 +01:00
Kory Maincent (Dent Project)	30d7b67277	net: ethtool: Add new power limit get and set features This patch expands the status information provided by ethtool for PSE c33 with available power limit and available power limit ranges. It also adds a call to pse_ethtool_set_pw_limit() to configure the PSE control power limit. Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20240704-feature_poe_power_cap-v6-5-320003204264@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-05 18:30:00 -07:00
Kory Maincent (Dent Project)	e462960021	net: ethtool: pse-pd: Expand C33 PSE status with class, power and extended state This update expands the status information provided by ethtool for PSE c33. It includes details such as the detected class, current power delivered, and extended state information. Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20240704-feature_poe_power_cap-v6-1-320003204264@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-05 18:30:00 -07:00
Neal Cardwell	0ec986ed7b	tcp: fix incorrect undo caused by DSACK of TLP retransmit Loss recovery undo_retrans bookkeeping had a long-standing bug where a DSACK from a spurious TLP retransmit packet could cause an erroneous undo of a fast recovery or RTO recovery that repaired a single really-lost packet (in a sequence range outside that of the TLP retransmit). Basically, because the loss recovery state machine didn't account for the fact that it sent a TLP retransmit, the DSACK for the TLP retransmit could erroneously be implicitly be interpreted as corresponding to the normal fast recovery or RTO recovery retransmit that plugged a real hole, thus resulting in an improper undo. For example, consider the following buggy scenario where there is a real packet loss but the congestion control response is improperly undone because of this bug: + send packets P1, P2, P3, P4 + P1 is really lost + send TLP retransmit of P4 + receive SACK for original P2, P3, P4 + enter fast recovery, fast-retransmit P1, increment undo_retrans to 1 + receive DSACK for TLP P4, decrement undo_retrans to 0, undo (bug!) + receive cumulative ACK for P1-P4 (fast retransmit plugged real hole) The fix: when we initialize undo machinery in tcp_init_undo(), if there is a TLP retransmit in flight, then increment tp->undo_retrans so that we make sure that we receive a DSACK corresponding to the TLP retransmit, as well as DSACKs for all later normal retransmits, before triggering a loss recovery undo. Note that we also have to move the line that clears tp->tlp_high_seq for RTO recovery, so that upon RTO we remember the tp->tlp_high_seq value until tcp_init_undo() and clear it only afterward. Also note that the bug dates back to the original 2013 TLP implementation, commit `6ba8a3b19e` ("tcp: Tail loss probe (TLP)"). However, this patch will only compile and work correctly with kernels that have tp->tlp_retrans, which was added only in v5.8 in 2020 in commit `76be93fc07` ("tcp: allow at most one TLP probe per flight"). So we associate this fix with that later commit. Fixes: `76be93fc07` ("tcp: allow at most one TLP probe per flight") Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Kevin Yang <yyd@google.com> Link: https://patch.msgid.link/20240703171246.1739561-1-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-05 18:03:44 -07:00
Adrian Moreno	71763d8a82	net: openvswitch: store sampling probability in cb. When a packet sample is observed, the sampling rate that was used is important to estimate the real frequency of such event. Store the probability of the parent sample action in the skb's cb area and use it in psample action to pass it down to psample module. Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Link: https://patch.msgid.link/20240704085710.353845-7-amorenoz@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-05 17:45:47 -07:00
Adrian Moreno	aae0b82b46	net: openvswitch: add psample action Add support for a new action: psample. This action accepts a u32 group id and a variable-length cookie and uses the psample multicast group to make the packet available for observability. The maximum length of the user-defined cookie is set to 16, same as tc_cookie, to discourage using cookies that will not be offloadable. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Reviewed-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Link: https://patch.msgid.link/20240704085710.353845-6-amorenoz@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-05 17:45:47 -07:00
Adrian Moreno	7b1b2b60c6	net: psample: allow using rate as probability Although not explicitly documented in the psample module itself, the definition of PSAMPLE_ATTR_SAMPLE_RATE seems inherited from act_sample. Quoting tc-sample(8): "RATE of 100 will lead to an average of one sampled packet out of every 100 observed." With this semantics, the rates that we can express with an unsigned 32-bits number are very unevenly distributed and concentrated towards "sampling few packets". For example, we can express a probability of 2.32E-8% but we cannot express anything between 100% and 50%. For sampling applications that are capable of sampling a decent amount of packets, this sampling rate semantics is not very useful. Add a new flag to the uAPI that indicates that the sampling rate is expressed in scaled probability, this is: - 0 is 0% probability, no packets get sampled. - U32_MAX is 100% probability, all packets get sampled. Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Link: https://patch.msgid.link/20240704085710.353845-5-amorenoz@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-05 17:45:47 -07:00
Adrian Moreno	c35d86a230	net: psample: skip packet copy if no listeners If nobody is listening on the multicast group, generating the sample, which involves copying packet data, seems completely unnecessary. Return fast in this case. Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Link: https://patch.msgid.link/20240704085710.353845-4-amorenoz@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-05 17:45:47 -07:00
Adrian Moreno	03448444ae	net: sched: act_sample: add action cookie to sample If the action has a user_cookie, pass it along to the sample so it can be easily identified. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Link: https://patch.msgid.link/20240704085710.353845-3-amorenoz@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-05 17:45:47 -07:00
Adrian Moreno	093b0f3665	net: psample: add user cookie Add a user cookie to the sample metadata so that sample emitters can provide more contextual information to samples. If present, send the user cookie in a new attribute: PSAMPLE_ATTR_USER_COOKIE. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Link: https://patch.msgid.link/20240704085710.353845-2-amorenoz@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-05 17:45:46 -07:00
Sebastian Andrzej Siewior	fecef4cd42	tun: Assign missing bpf_net_context. During the introduction of struct bpf_net_context handling for XDP-redirect, the tun driver has been missed. Jakub also pointed out that there is another call chain to do_xdp_generic() originating from netif_receive_skb() and drivers may use it outside from the NAPI context. Set the bpf_net_context before invoking BPF XDP program within the TUN driver. Set the bpf_net_context also in do_xdp_generic() if a xdp program is available. Reported-by: syzbot+0b5c75599f1d872bea6f@syzkaller.appspotmail.com Reported-by: syzbot+5ae46b237278e2369cac@syzkaller.appspotmail.com Reported-by: syzbot+c1e04a422bbc0f0f2921@syzkaller.appspotmail.com Fixes: `401cb7dae8` ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240704144815.j8xQda5r@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-05 16:59:37 -07:00
Florian Westphal	c7f79f2620	openvswitch: prepare for stolen verdict coming from conntrack and nat engine At this time, conntrack either returns NF_ACCEPT or NF_DROP. To improve debuging it would be nice to be able to replace NF_DROP verdict with NF_DROP_REASON() helper, This helper releases the skb instantly (so drop_monitor can pinpoint precise location) and returns NF_STOLEN. Prepare call sites to deal with this before introducing such changes in conntrack and nat core. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Aaron Conole <aconole@redhat.om> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-05 11:05:05 +01:00
Edward Cree	caa93b7c25	ethtool: move firmware flashing flag to struct ethtool_netdev_state Commit `31e0aa99dc` ("ethtool: Veto some operations during firmware flashing process") added a flag module_fw_flash_in_progress to struct net_device. As this is ethtool related state, move it to the recently created struct ethtool_netdev_state, accessed via the 'ethtool' member of struct net_device. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20240703121849.652893-1-edward.cree@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-04 15:45:15 -07:00
Jakub Kicinski	76ed626479	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/phy/aquantia/aquantia.h `219343755e` ("net: phy: aquantia: add missing include guards") `61578f6793` ("net: phy: aquantia: add support for PHY LEDs") drivers/net/ethernet/wangxun/libwx/wx_hw.c `bd07a98178` ("net: txgbe: remove separate irq request for MSI and INTx") `b501d261a5` ("net: txgbe: add FDIR ATR support") https://lore.kernel.org/all/20240703112936.483c1975@canb.auug.org.au/ include/linux/mlx5/mlx5_ifc.h `048a403648` ("net/mlx5: IFC updates for changing max EQs") `99be56171f` ("net/mlx5e: SHAMPO, Re-enable HW-GRO") https://lore.kernel.org/all/20240701133951.6926b2e3@canb.auug.org.au/ Adjacent changes: drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c `4130c67cd1` ("wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference") `3f3126515f` ("wifi: iwlwifi: mvm: add mvm-specific guard") include/net/mac80211.h `816c6bec09` ("wifi: mac80211: fix BSS_CHANGED_UNSOL_BCAST_PROBE_RESP") `5a009b42e0` ("wifi: mac80211: track changes in AP's TPE") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-04 14:16:11 -07:00
Linus Torvalds	033771c085	Including fixes from bluetooth, wireless and netfilter. There's one fix for power management with Intel's e1000e here, Thorsten tells us there's another problem that started in v6.9. We're trying to wrap that up but I don't think it's blocking. Current release - new code bugs: - wifi: mac80211: disable softirqs for queued frame handling - af_unix: fix uninit-value in __unix_walk_scc(), with the new garbage collection algo Previous releases - regressions: - Bluetooth: - qca: fix BT enable failure for QCA6390 after warm reboot - add quirk to ignore reserved PHY bits in LE Extended Adv Report, abused by some Broadcom controllers found on Apple machines - wifi: wilc1000: fix ies_len type in connect path Previous releases - always broken: - tcp: fix DSACK undo in fast recovery to call tcp_try_to_open(), avoid premature timeouts - net: make sure skb_datagram_iter maps fragments page by page, in case we somehow get compound highmem mixed in - eth: bnx2x: fix multiple UBSAN array-index-out-of-bounds when more queues are used Misc: - MAINTAINERS: Remembering Larry Finger Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmaGv3sACgkQMUZtbf5S IrviRxAAr7yvDg07P1UzFsXI1khOijGCxOYcDDV+AHVZKwJ9fAq0ky5pcaiW62mY h3HffvEbNgZ07zd9l4Z9dVoel5ke9k4yofYZrI0D8X/T1e/Xo0LlxUFGwb0IidBj IYkTnZfu1lGa73TWCIh369s1HgybupiHQicYSw+KIO1wtfds8gvyZJUyjNhlvUYQ NdB/JQrBp/oxm2JlAMDubZfNuVEFCum5J3Ldj5W32j+H82RbGDi/eMn5w+Cs/tEx rRFSuJ1L0rBhNcB4HDbcfin9jHjLhDjNXyYprlZAauMXK5AEBwRcOuEzyXWt1Npq ZkJ8t/ToVLk9QkXaKA1gR9C6Bo8A+SL5a8ddfj/pHEqOa/GNXKYqEvGOmM7mmbBf 93sU+dBYZ3nLGrUtuTRVGTnbr+J1AhP/kUqIY1c787m3gCSB1qFkF67DQiYTGB9g qf+xTcmJeGpL+4OtXgjpK4gUa152g0VsuAMTzecW/7EU/owD0+zCWuVGK9Gv/bgf si40hgZ7Ipnq8k+N+4e2VQp1ufCduT8zGn6sxiivdS5GSNc8e2BnQH3AfjfIM8Z8 rK15U5WJIVQiCkthYh8cx8pxh2uwtcXevjUh4B682/U4HbLdiYfAQuD4/AOc2i8M EJVzl7/5AaxhjoZPxloe9mtRnMvt7XhUiNOW0lR9fgDYOqcmnSo= =MZAG -----END PGP SIGNATURE----- Merge tag 'net-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, wireless and netfilter. There's one fix for power management with Intel's e1000e here, Thorsten tells us there's another problem that started in v6.9. We're trying to wrap that up but I don't think it's blocking. Current release - new code bugs: - wifi: mac80211: disable softirqs for queued frame handling - af_unix: fix uninit-value in __unix_walk_scc(), with the new garbage collection algo Previous releases - regressions: - Bluetooth: - qca: fix BT enable failure for QCA6390 after warm reboot - add quirk to ignore reserved PHY bits in LE Extended Adv Report, abused by some Broadcom controllers found on Apple machines - wifi: wilc1000: fix ies_len type in connect path Previous releases - always broken: - tcp: fix DSACK undo in fast recovery to call tcp_try_to_open(), avoid premature timeouts - net: make sure skb_datagram_iter maps fragments page by page, in case we somehow get compound highmem mixed in - eth: bnx2x: fix multiple UBSAN array-index-out-of-bounds when more queues are used Misc: - MAINTAINERS: Remembering Larry Finger" * tag 'net-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits) bnxt_en: Fix the resource check condition for RSS contexts mlxsw: core_linecards: Fix double memory deallocation in case of invalid INI file inet_diag: Initialize pad field in struct inet_diag_req_v2 tcp: Don't flag tcp_sk(sk)->rx_opt.saw_unknown for TCP AO. selftests: make order checking verbose in msg_zerocopy selftest selftests: fix OOM in msg_zerocopy selftest ice: use proper macro for testing bit ice: Reject pin requests with unsupported flags ice: Don't process extts if PTP is disabled ice: Fix improper extts handling selftest: af_unix: Add test case for backtrack after finalising SCC. af_unix: Fix uninit-value in __unix_walk_scc() bonding: Fix out-of-bounds read in bond_option_arp_ip_targets_set() net: rswitch: Avoid use-after-free in rswitch_poll() netfilter: nf_tables: unconditionally flush pending work before notifier wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference wifi: iwlwifi: mvm: avoid link lookup in statistics wifi: iwlwifi: mvm: don't wake up rx_sync_waitq upon RFKILL wifi: iwlwifi: properly set WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK wifi: wilc1000: fix ies_len type in connect path ...	2024-07-04 10:11:12 -07:00
Paolo Abeni	e367197166	netfilter pull request 24-07-04 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmaF0OAACgkQ1V2XiooU IOSMGg/9GU+i2sYGNr+XR8eXjVdURQ60EkSsbhamEHL6DZt68Dg5Q/xFu5d+weoS S5bFHJVw+qXBxAG6kfXazLPjpMjikdIsuZuQkfbnooI8YJZgdDbE8fuIFIo8WyCK U+u81fWQ0+c5RevWzCz46NXBC3Kowf+ihW9WDlk4mrnzqHY7YG/1VZtQz1pJqugI OxbKsONvs3d3RQhvHg3dia9Vco/Zd4DieZrad+TanoCSoq9JJKdI8ZTG9aEBWoIL bzkACjbZrqyEiQBFi7tt+aG8gMfqc+ZDv6i+Ne/9j6jFqpR03u4Hdg9SUJvv4QcY uYUlHSBspc55PgbwigRZpCO5YLCEUvKXiOcuIqzt4Kxb649SKJKKrHkzAZoxwhb0 JFpYPHond8WiFUeteWQ/3+XN63LZDfVWYEi6WtgLiwXUa6Snf+v+lwUPKm8a77r7 yLyCO6aj8u56Nqy6UDlF+uP1Zoykw6NfIAzTBoZnByS7BsEafAj842TQw54IjCv0 NwKad+ByDlsiorXywCsBGg/iXQ9nSrR80ihda8JJOa7fAf9WviQr+7+XHED21f1C bSSFXMb457WrTWag+iAVIUxfKzOCjdxVludHXuado0BUKq7FT0N3LpN3eIhy4dmV d6HiuGytwHofTzkGX+OOLSexLGvWNX6kJh23skTJMbNizQOvX6g= =Dutj -----END PGP SIGNATURE----- Merge tag 'nf-24-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following batch contains a oneliner patch to inconditionally flush workqueue containing stale objects to be released, syzbot managed to trigger UaF. Patch from Florian Westphal. netfilter pull request 24-07-04 * tag 'nf-24-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: unconditionally flush pending work before notifier ==================== Link: https://patch.msgid.link/20240703223304.1455-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-04 15:31:27 +02:00
Shigeru Yoshida	61cf1c739f	inet_diag: Initialize pad field in struct inet_diag_req_v2 KMSAN reported uninit-value access in raw_lookup() [1]. Diag for raw sockets uses the pad field in struct inet_diag_req_v2 for the underlying protocol. This field corresponds to the sdiag_raw_protocol field in struct inet_diag_req_raw. inet_diag_get_exact_compat() converts inet_diag_req to inet_diag_req_v2, but leaves the pad field uninitialized. So the issue occurs when raw_lookup() accesses the sdiag_raw_protocol field. Fix this by initializing the pad field in inet_diag_get_exact_compat(). Also, do the same fix in inet_diag_dump_compat() to avoid the similar issue in the future. [1] BUG: KMSAN: uninit-value in raw_lookup net/ipv4/raw_diag.c:49 [inline] BUG: KMSAN: uninit-value in raw_sock_get+0x657/0x800 net/ipv4/raw_diag.c:71 raw_lookup net/ipv4/raw_diag.c:49 [inline] raw_sock_get+0x657/0x800 net/ipv4/raw_diag.c:71 raw_diag_dump_one+0xa1/0x660 net/ipv4/raw_diag.c:99 inet_diag_cmd_exact+0x7d9/0x980 inet_diag_get_exact_compat net/ipv4/inet_diag.c:1404 [inline] inet_diag_rcv_msg_compat+0x469/0x530 net/ipv4/inet_diag.c:1426 sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282 netlink_rcv_skb+0x537/0x670 net/netlink/af_netlink.c:2564 sock_diag_rcv+0x35/0x40 net/core/sock_diag.c:297 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline] netlink_unicast+0xe74/0x1240 net/netlink/af_netlink.c:1361 netlink_sendmsg+0x10c6/0x1260 net/netlink/af_netlink.c:1905 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x332/0x3d0 net/socket.c:745 ____sys_sendmsg+0x7f0/0xb70 net/socket.c:2585 ___sys_sendmsg+0x271/0x3b0 net/socket.c:2639 __sys_sendmsg net/socket.c:2668 [inline] __do_sys_sendmsg net/socket.c:2677 [inline] __se_sys_sendmsg net/socket.c:2675 [inline] __x64_sys_sendmsg+0x27e/0x4a0 net/socket.c:2675 x64_sys_call+0x135e/0x3ce0 arch/x86/include/generated/asm/syscalls_64.h:47 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xd9/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was stored to memory at: raw_sock_get+0x650/0x800 net/ipv4/raw_diag.c:71 raw_diag_dump_one+0xa1/0x660 net/ipv4/raw_diag.c:99 inet_diag_cmd_exact+0x7d9/0x980 inet_diag_get_exact_compat net/ipv4/inet_diag.c:1404 [inline] inet_diag_rcv_msg_compat+0x469/0x530 net/ipv4/inet_diag.c:1426 sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282 netlink_rcv_skb+0x537/0x670 net/netlink/af_netlink.c:2564 sock_diag_rcv+0x35/0x40 net/core/sock_diag.c:297 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline] netlink_unicast+0xe74/0x1240 net/netlink/af_netlink.c:1361 netlink_sendmsg+0x10c6/0x1260 net/netlink/af_netlink.c:1905 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x332/0x3d0 net/socket.c:745 ____sys_sendmsg+0x7f0/0xb70 net/socket.c:2585 ___sys_sendmsg+0x271/0x3b0 net/socket.c:2639 __sys_sendmsg net/socket.c:2668 [inline] __do_sys_sendmsg net/socket.c:2677 [inline] __se_sys_sendmsg net/socket.c:2675 [inline] __x64_sys_sendmsg+0x27e/0x4a0 net/socket.c:2675 x64_sys_call+0x135e/0x3ce0 arch/x86/include/generated/asm/syscalls_64.h:47 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xd9/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Local variable req.i created at: inet_diag_get_exact_compat net/ipv4/inet_diag.c:1396 [inline] inet_diag_rcv_msg_compat+0x2a6/0x530 net/ipv4/inet_diag.c:1426 sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282 CPU: 1 PID: 8888 Comm: syz-executor.6 Not tainted 6.10.0-rc4-00217-g35bb670d65fc #32 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 Fixes: `432490f9d4` ("net: ip, diag -- Add diag interface for raw sockets") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240703091649.111773-1-syoshida@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-04 15:25:09 +02:00
Thorsten Blum	47c130130d	l2tp: Remove duplicate included header file trace.h Remove duplicate included header file trace.h and the following warning reported by make includecheck: trace.h is included more than once Compile-tested only. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/20240703061147.691973-2-thorsten.blum@toblux.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-04 12:37:58 +02:00
Kuniyuki Iwashima	4b74726c01	tcp: Don't flag tcp_sk(sk)->rx_opt.saw_unknown for TCP AO. When we process segments with TCP AO, we don't check it in tcp_parse_options(). Thus, opt_rx->saw_unknown is set to 1, which unconditionally triggers the BPF TCP option parser. Let's avoid the unnecessary BPF invocation. Fixes: `0a3a809089` ("net/tcp: Verify inbound TCP-AO signed segments") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20240703033508.6321-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-04 11:56:12 +02:00
Shigeru Yoshida	927fa5b3e4	af_unix: Fix uninit-value in __unix_walk_scc() KMSAN reported uninit-value access in __unix_walk_scc() [1]. In the list_for_each_entry_reverse() loop, when the vertex's index equals it's scc_index, the loop uses the variable vertex as a temporary variable that points to a vertex in scc. And when the loop is finished, the variable vertex points to the list head, in this case scc, which is a local variable on the stack (more precisely, it's not even scc and might underflow the call stack of __unix_walk_scc(): container_of(&scc, struct unix_vertex, scc_entry)). However, the variable vertex is used under the label prev_vertex. So if the edge_stack is not empty and the function jumps to the prev_vertex label, the function will access invalid data on the stack. This causes the uninit-value access issue. Fix this by introducing a new temporary variable for the loop. [1] BUG: KMSAN: uninit-value in __unix_walk_scc net/unix/garbage.c:478 [inline] BUG: KMSAN: uninit-value in unix_walk_scc net/unix/garbage.c:526 [inline] BUG: KMSAN: uninit-value in __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584 __unix_walk_scc net/unix/garbage.c:478 [inline] unix_walk_scc net/unix/garbage.c:526 [inline] __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584 process_one_work kernel/workqueue.c:3231 [inline] process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312 worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393 kthread+0x3c4/0x530 kernel/kthread.c:389 ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Uninit was stored to memory at: unix_walk_scc net/unix/garbage.c:526 [inline] __unix_gc+0x2adf/0x3c20 net/unix/garbage.c:584 process_one_work kernel/workqueue.c:3231 [inline] process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312 worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393 kthread+0x3c4/0x530 kernel/kthread.c:389 ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Local variable entries created at: ref_tracker_free+0x48/0xf30 lib/ref_tracker.c:222 netdev_tracker_free include/linux/netdevice.h:4058 [inline] netdev_put include/linux/netdevice.h:4075 [inline] dev_put include/linux/netdevice.h:4101 [inline] update_gid_event_work_handler+0xaa/0x1b0 drivers/infiniband/core/roce_gid_mgmt.c:813 CPU: 1 PID: 12763 Comm: kworker/u8:31 Not tainted 6.10.0-rc4-00217-g35bb670d65fc #32 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 Workqueue: events_unbound __unix_gc Fixes: `3484f06317` ("af_unix: Detect Strongly Connected Components.") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20240702160428.10153-1-syoshida@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-03 19:36:22 -07:00
Jakub Kicinski	1a16cdf77e	net: ethtool: fix compat with old RSS context API Device driver gets access to rxfh_dev, while rxfh is just a local copy of user space params. We need to check what RSS context ID driver assigned in rxfh_dev, not rxfh. Using rxfh leads to trying to store all contexts at index 0xffffffff. From the user perspective it leads to "driver chose duplicate ID" warnings when second context is added and inability to access any contexts even tho they were successfully created - xa_load() for the actual context ID will return NULL, and syscall will return -ENOENT. Looks like a rebasing mistake, since rxfh_dev was added relatively recently by commit `fb6e30a725` ("net: ethtool: pass a pointer to parameters to get/set_rxfh ethtool ops"). Fixes: `eac9122f0c` ("net: ethtool: record custom RSS contexts in the XArray") Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/20240702164157.4018425-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-03 19:22:17 -07:00
Florian Westphal	9f6958ba2e	netfilter: nf_tables: unconditionally flush pending work before notifier syzbot reports: KASAN: slab-uaf in nft_ctx_update include/net/netfilter/nf_tables.h:1831 KASAN: slab-uaf in nft_commit_release net/netfilter/nf_tables_api.c:9530 KASAN: slab-uaf int nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597 Read of size 2 at addr ffff88802b0051c4 by task kworker/1:1/45 [..] Workqueue: events nf_tables_trans_destroy_work Call Trace: nft_ctx_update include/net/netfilter/nf_tables.h:1831 [inline] nft_commit_release net/netfilter/nf_tables_api.c:9530 [inline] nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597 Problem is that the notifier does a conditional flush, but its possible that the table-to-be-removed is still referenced by transactions being processed by the worker, so we need to flush unconditionally. We could make the flush_work depend on whether we found a table to delete in nf-next to avoid the flush for most cases. AFAICS this problem is only exposed in nf-next, with commit `e169285f8c` ("netfilter: nf_tables: do not store nft_ctx in transaction objects"), with this commit applied there is an unconditional fetch of table->family which is whats triggering the above splat. Fixes: `2c9f029328` ("netfilter: nf_tables: flush pending destroy work before netlink notifier") Reported-and-tested-by: syzbot+4fd66a69358fc15ae2ad@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=4fd66a69358fc15ae2ad Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-07-04 00:28:27 +02:00
Greg Kroah-Hartman	d69d804845	driver core: have match() callback in struct bus_type take a const * In the match() callback, the struct device_driver * should not be changed, so change the function callback to be a const . This is one step of many towards making the driver core safe to have struct device_driver in read-only memory. Because the match() callback is in all busses, all busses are modified to handle this properly. This does entail switching some container_of() calls to container_of_const() to properly handle the constant . For some busses, like PCI and USB and HV, the const * is cast away in the match callback as those busses do want to modify those structures at this point in time (they have a local lock in the driver structure.) That will have to be changed in the future if they wish to have their struct device * in read-only-memory. Cc: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Alex Elder <elder@kernel.org> Acked-by: Sumit Garg <sumit.garg@linaro.org> Link: https://lore.kernel.org/r/2024070136-wrongdoer-busily-01e8@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-03 15:16:54 +02:00
Xin Long	cda91d5b91	sctp: cancel a blocking accept when shutdown a listen socket As David Laight noticed, "In a multithreaded program it is reasonable to have a thread blocked in accept(). With TCP a subsequent shutdown(listen_fd, SHUT_RDWR) causes the accept to fail. But nothing happens for SCTP." sctp_disconnect() is eventually called when shutdown a listen socket, but nothing is done in this function. This patch sets RCV_SHUTDOWN flag in sk->sk_shutdown there, and adds the check (sk->sk_shutdown & RCV_SHUTDOWN) to break and return in sctp_accept(). Note that shutdown() is only supported on TCP-style SCTP socket. Reported-by: David Laight <David.Laight@aculab.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-03 09:45:39 +01:00
Mina Almasry	4dec64c52e	page_pool: convert to use netmem Abstract the memory type from the page_pool so we can later add support for new memory types. Convert the page_pool to use the new netmem type abstraction, rather than use struct page directly. As of this patch the netmem type is a no-op abstraction: it's always a struct page underneath. All the page pool internals are converted to use struct netmem instead of struct page, and the page pool now exports 2 APIs: 1. The existing struct page API. 2. The new struct netmem API. Keeping the existing API is transitional; we do not want to refactor all the current drivers using the page pool at once. The netmem abstraction is currently a no-op. The page_pool uses page_to_netmem() to convert allocated pages to netmem, and uses netmem_to_page() to convert the netmem back to pages to pass to mm APIs, Follow up patches to this series add non-paged netmem support to the page_pool. This change is factored out on its own to limit the code churn to this 1 patch, for ease of code review. Signed-off-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://patch.msgid.link/20240628003253.1694510-6-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-02 18:59:33 -07:00
Sebastian Andrzej Siewior	e3d69f585d	net: Move flush list retrieval to where it is used. The bpf_net_ctx_get_.*_flush_list() are used at the top of the function. This means the variable is always assigned even if unused. By moving the function to where it is used, it is possible to delay the initialisation until it is unavoidable. Not sure how much this gains in reality but by looking at bq_enqueue() (in devmap.c) gcc pushes one register less to the stack. \o/. Move flush list retrieval to where it is used. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-02 15:26:57 +02:00
Sebastian Andrzej Siewior	d839a73179	net: Optimize xdp_do_flush() with bpf_net_context infos. Every NIC driver utilizing XDP should invoke xdp_do_flush() after processing all packages. With the introduction of the bpf_net_context logic the flush lists (for dev, CPU-map and xsk) are lazy initialized only if used. However xdp_do_flush() tries to flush all three of them so all three lists are always initialized and the likely empty lists are "iterated". Without the usage of XDP but with CONFIG_DEBUG_NET the lists are also initialized due to xdp_do_check_flushed(). Jakub suggest to utilize the hints in bpf_net_context and avoid invoking the flush function. This will also avoiding initializing the lists which are otherwise unused. Introduce bpf_net_ctx_get_all_used_flush_lists() to return the individual list if not-empty. Use the logic in xdp_do_flush() and xdp_do_check_flushed(). Remove the not needed .*_check_flush(). Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-02 15:26:57 +02:00
David Wei	d7f39aee79	page_pool: export page_pool_disable_direct_recycling() `56ef27e3` unexported page_pool_unlink_napi() and renamed it to page_pool_disable_direct_recycling(). This is because there was no in-tree user of page_pool_unlink_napi(). Since then Rx queue API and an implementation in bnxt got merged. In the bnxt implementation, it broadly follows the following steps: allocate new queue memory + page pool, stop old rx queue, swap, then destroy old queue memory + page pool. The existing NAPI instance is re-used so when the old page pool that is no longer used but still linked to this shared NAPI instance is destroyed, it will trigger warnings. In my initial patches I unlinked a page pool from a NAPI instance directly. Instead, export page_pool_disable_direct_recycling() and call that instead to avoid having a driver touch a core struct. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-02 15:00:11 +02:00
Sagi Grimberg	d2d30a376d	net: allow skb_datagram_iter to be called from any context We only use the mapping in a single context, so kmap_local is sufficient and cheaper. Make sure to use skb_frag_foreach_page as skb frags may contain compound pages and we need to map page by page. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202406161539.b5ff7b20-oliver.sang@intel.com Fixes: `950fcaecd5` ("datagram: consolidate datagram copy to iter helpers") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Link: https://patch.msgid.link/20240626100008.831849-1-sagi@grimberg.me Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-02 14:55:15 +02:00
Pavel Begunkov	2ca58ed21c	net: limit scope of a skb_zerocopy_iter_stream var skb_zerocopy_iter_stream() only uses @orig_uarg in the !link_skb path, and we can move the local variable in the appropriate block. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-02 12:06:50 +02:00
Pavel Begunkov	060f4ba6e4	io_uring/net: move charging socket out of zc io_uring Currently, io_uring's io_sg_from_iter() duplicates the part of __zerocopy_sg_from_iter() charging pages to the socket. It'd be too easy to miss while changing it in net/, the chunk is not the most straightforward for outside users and full of internal implementation details. io_uring is not a good place to keep it, deduplicate it by moving out of the callback into __zerocopy_sg_from_iter(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-02 12:06:50 +02:00
Pavel Begunkov	aeb320fc05	net: batch zerocopy_fill_skb_from_iter accounting Instead of accounting every page range against the socket separately, do it in batch based on the change in skb->truesize. It's also moved into __zerocopy_sg_from_iter(), so that zerocopy_fill_skb_from_iter() is simpler and responsible for setting frags but not the accounting. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-02 12:06:50 +02:00
Pavel Begunkov	7fb05423fe	net: split __zerocopy_sg_from_iter() Split a function out of __zerocopy_sg_from_iter() that only cares about the traditional path with refcounted pages and doesn't need to know about ->sg_from_iter. A preparation patch, we'll improve on the function later. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-02 12:06:50 +02:00
Pavel Begunkov	9e2db9d399	net: always try to set ubuf in skb_zerocopy_iter_stream skb_zcopy_set() does nothing if there is already a ubuf_info associated with an skb, and since ->link_skb should have set it several lines above the check here essentially does nothing and can be removed. It's also safer this way, because even if the callback is faulty we'll have it set. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-02 12:06:50 +02:00
Lorenzo Bianconi	391bb6594f	netfilter: Add bpf_xdp_flow_lookup kfunc Introduce bpf_xdp_flow_lookup kfunc in order to perform the lookup of a given flowtable entry based on a fib tuple of incoming traffic. bpf_xdp_flow_lookup can be used as building block to offload in xdp the processing of sw flowtable when hw flowtable is not available. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://lore.kernel.org/bpf/55d38a4e5856f6d1509d823ff4e98aaa6d356097.1719698275.git.lorenzo@kernel.org	2024-07-01 17:03:01 +02:00
Florian Westphal	89cc8f1c5f	netfilter: nf_tables: Add flowtable map for xdp offload This adds a small internal mapping table so that a new bpf (xdp) kfunc can perform lookups in a flowtable. As-is, xdp program has access to the device pointer, but no way to do a lookup in a flowtable -- there is no way to obtain the needed struct without questionable stunts. This allows to obtain an nf_flowtable pointer given a net_device structure. In order to keep backward compatibility, the infrastructure allows the user to add a given device to multiple flowtables, but it will always return the first added mapping performing the lookup since it assumes the right configuration is 1:1 mapping between flowtables and net_devices. Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://lore.kernel.org/bpf/9f20e2c36f494b3bf177328718367f636bb0b2ab.1719698275.git.lorenzo@kernel.org	2024-07-01 17:01:53 +02:00
Heng Qi	74d6529b78	net: ethtool: Fix the panic caused by dev being null when dumping coalesce syzbot reported a general protection fault caused by a null pointer dereference in coalesce_fill_reply(). The issue occurs when req_base->dev is null, leading to an invalid memory access. This panic occurs if dumping coalesce when no device name is specified. Fixes: `f750dfe825` ("ethtool: provide customized dim profile management") Reported-by: syzbot+e77327e34cdc8c36b7d3@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=e77327e34cdc8c36b7d3 Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-01 13:43:50 +01:00
David S. Miller	42391445a8	bluetooth pull request for net: - Ignore too large handle values in BIG - L2CAP: sync sock recv cb and release - hci_bcm4377: Fix msgid release - ISO: Check socket flag instead of hcon - hci_event: Fix setting of unicast qos interval - hci: disallow setting handle bigger than HCI_CONN_HANDLE_MAX - Add quirk to ignore reserved PHY bits in LE Extended Adv Report - hci_core: cancel all works upon hci_unregister_dev - btintel_pcie: Fix REVERSE_INULL issue reported by coverity - qca: Fix BT enable failure again for QCA6390 after warm reboot -----BEGIN PGP SIGNATURE----- iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmZ/BFgZHGx1aXoudm9u LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKfb/EACW5sy6JlUxmN4MjYT5rk6j UVxtiS60XS5y1kMpNBE9FuRnF54vnKV+anuQ1RhsE+ICjla9PrjI5pXCKUXGCD+q z8bshTtzWyZ3RiWnkfictHfXHZ/wYwK1Ly6Sn3I7C4Ttl1O408Mn43zXdMQw7Y7O 6UfUOhli57lgZObvOfNlvIwfHXt0qPoyYkyvBrbtd90PZ7DgcG3QHrj7Md/Scuh5 3IB4YvrX27pHu/VuR6FqaeywgFcAEGH+YnHGOJX1zbcMhwLEf4Uw+0EgpQ33kwdv N6ZBbfdvxOVCxVdsiEQmBEfzsazopqXgq/6QXyq0AqnHl7Wue/nCKag0WOR7frTy LOKo7pGQsW59g8xdA7n38qbk11LyazGIaSeGX2HYHy8jNgH/f6Pq6D4fsoSjc/Jd fqSdZ6pd0p2WmJLG76yoQlXRCRsP3327NBPvQkKk5+EFZi3r1QvunxQbABL5bohB rsIgpeK8OnTn0bNvnSM6clE8aKI6JBYi/ZqNUjblxM9jY2st/jiRdEla9KX6KnHh tmL+d/MsZqkxVsNL/2bkc830FT1QDlx4JZrtcUs8mLTZKRRHoSJ9RgGLbI5qwzDd b3uIrQ93HaQhZCh/46cAUPxJO4afSa9I12YCpLuhEofVsacp3f3YqBOVXRcmh3oH G9mJdYI6JHLwtqHWiGrIaA== =X7As -----END PGP SIGNATURE----- Merge tag 'for-net-2024-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth into main bluetooth pull request for net: - Ignore too large handle values in BIG - L2CAP: sync sock recv cb and release - hci_bcm4377: Fix msgid release - ISO: Check socket flag instead of hcon - hci_event: Fix setting of unicast qos interval - hci: disallow setting handle bigger than HCI_CONN_HANDLE_MAX - Add quirk to ignore reserved PHY bits in LE Extended Adv Report - hci_core: cancel all works upon hci_unregister_dev - btintel_pcie: Fix REVERSE_INULL issue reported by coverity - qca: Fix BT enable failure again for QCA6390 after warm reboot Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-01 13:08:12 +01:00
Steffen Klassert	2d5317753e	xfrm: Export symbol xfrm_dev_state_delete. This fixes a build failure if xfrm_user is build as a module. Fixes: `07b87f9eea` ("xfrm: Fix unregister netdevice hang on hardware offload.") Reported-by: Mark Brown <broonie@kernel.org> Tested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-07-01 12:40:21 +02:00
David S. Miller	1c5fc27bc4	netfilter pull request 24-06-28 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmZ+3ogACgkQ1V2XiooU IOR7HRAAsVkJnKLPqV4lcY2Yx/QHi+o1s0pBCTZIqzs2rRXfaYrdu9xV0225DuPn xuNNV2GChtWftQxvwcxVLgTGHGG/p8bNYiNJoYEE6acftHZMV4ZZ7NG1yCv2TI3x 8Udu3vFnvnQhV9Q4LNR3SMtCtz5Z5QP1KNM74uaksN+9opCNniuG23Eft6YXh7Kf BYLvJX4pn+St2YTvvnNbA6U/ALxy5OZ/YwXP6FjmERp3AGoFPF2w+MEBmBlyGE3X LDKZ05hnKG4Sd/qp7XnZi9kEZoI9iBKg+GPm5ey1BVjZNMCc5hSpCIdYKb8FiwRa cN+UCc82H9/N2mJXSrcBDA6n8+lp0dLpfomliERyieY3m38Rp7BKTh6pUOmQCw+H bmTJ7rz5WBCC5yjts0N7+2SaVOo+RQpSLXV/SQCIKmk+Xl5sJinvP/gnKWAaPWIm 3gC4Bv7JUuB6x62EcRzoWGFDw8dXlQ64gvkwyMpeelFIexR3dFCfoA3zAaqJnlxJ uZXEF9xuQsZht8IYD37Z6C99tVJzVj/4gCKWKZwi3Kcn/G/MRkQ3lNAPyLewIcMV nC1pwU31z1PXNrbSXrXlUEdl1yUzg04wkc4RrVMJgU983kdQdMTp8Q4BbckdhWCV 4agMNuP4brp6iCvDPamcrWQ+4AbXw/zSdqQr8ONExrOgDUd1ePw= =0BtN -----END PGP SIGNATURE----- Merge tag 'nf-next-24-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next into main Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next: Patch #1 to #11 to shrink memory consumption for transaction objects: struct nft_trans_chain { /* size: 120 (-32), cachelines: 2, members: 10 / struct nft_trans_elem { / size: 72 (-40), cachelines: 2, members: 4 / struct nft_trans_flowtable { / size: 80 (-48), cachelines: 2, members: 5 / struct nft_trans_obj { / size: 72 (-40), cachelines: 2, members: 4 / struct nft_trans_rule { / size: 80 (-32), cachelines: 2, members: 6 / struct nft_trans_set { / size: 96 (-24), cachelines: 2, members: 8 / struct nft_trans_table { / size: 56 (-40), cachelines: 1, members: 2 / struct nft_trans_elem can now be allocated from kmalloc-96 instead of kmalloc-128 slab. Series from Florian Westphal. For the record, I have mangled patch #1 to add nft_trans_container_() and use if for every transaction object. I have also added BUILD_BUG_ON to ensure struct nft_trans always comes at the beginning of the container transaction object. And few minor cleanups, any new bugs are of my own. Patch #12 simplify check for SCTP GSO in IPVS, from Ismael Luceno. Patch #13 nf_conncount key length remains in the u32 bound, from Yunjian Wang. Patch #14 removes unnecessary check for CTA_TIMEOUT_L3PROTO when setting default conntrack timeouts via nfnetlink_cttimeout API, from Lin Ma. Patch #15 updates NFT_SECMARK_CTX_MAXLEN to 4096, SELinux could use larger secctx names than the existing 256 bytes length. Patch #16 adds a selftest to exercise nfnetlink_queue listeners leaving nfnetlink_queue, from Florian Westphal. Patch #17 increases hitcount from 255 to 65535 in xt_recent, from Phil Sutter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-01 09:52:35 +01:00
Jakub Kicinski	66be40e622	tcp_metrics: validate source addr length I don't see anything checking that TCP_METRICS_ATTR_SADDR_IPV4 is at least 4 bytes long, and the policy doesn't have an entry for this attribute at all (neither does it for IPv6 but v6 is manually validated). Reviewed-by: Eric Dumazet <edumazet@google.com> Fixes: `3e7013ddf5` ("tcp: metrics: Allow selective get/del of tcp-metrics based on src IP") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-01 09:40:36 +01:00
Linus Torvalds	8282d5af7b	NFS client bugfixes for Linux 6.10 Bugfixes: - SUNRPC one more fix for the NFSv4.x backchannel timeouts -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmaAUuUACgkQZwvnipYK APKO2A//XjinP2LBsPPtjNCrZRujWgYZDblymXeFyZKh8LEsORojvaijtE959VXQ loPxmybv5Ht7Zg1p5vxRF00T6MX4eza50JAUfxa7QaUYTDMF7MX2sR1217aG2ht/ ccmWLz4+JiuSbiPyOHYgY044KXMN9jLth1PjaBn6Efw4waQTyL7nI9dm7Mu/miij eyIYzDl2gb2XyB6+qG528XWi3nA57OGV3vHvET8S15Rvq6LwdYdnJykk/cG7uVkp 00ZsdotVBUtnGyrQrcbZulzqVdKRYCYrmWK33UhP9NSDazf9ou89ieTZqyFi2+8m 46rSmLmW2rx6u37he46ZVjly+0PFWGZaO/U/Qj7e19iA5bp5C8Sp5z8igo3dG93p 1SFNGH4SKJ6IuNz3mWfhza25/sV0ZTdVSDgGazzVpf/dHOdBJ24Btjo2ANo9oxct 3BP4k6mXrZyQse967WjOQIhGXC7DAwzU9rw0hLDd6PPQD+SA8KsIn3P4PFJSGqOP VGE5QK2mGRUHUypo/b2YPueXHxLxN2NJv+xLvd/IiAUr0km8hcITnrhOgAf297/j BK1qv2PdKhTFc0NTZ7frXrBV5hpbws98z5JOVq/lkt9spGqEdtjUzSeCd4gpJAWk abHpHI59OZH8Byt9nJllDn/BbNJTDhk//dnytt4iklm1yllKs6k= =c5Ek -----END PGP SIGNATURE----- Merge tag 'nfs-for-6.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client fix from Trond Myklebust: - One more SUNRPC fix for the NFSv4.x backchannel timeouts * tag 'nfs-for-6.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Fix backchannel reply, again	2024-06-29 13:48:24 -07:00
Jesse Brandeburg	b8c7dd15ce	kernel-wide: fix spelling mistakes like "assocative" -> "associative" There were several instances of the string "assocat" in the kernel, which should have been spelled "associat", with the various endings of -ive, -ed, -ion, and sometimes beginnging with dis-. Add to the spelling dictionary the corrections so that future instances will be caught by checkpatch, and fix the instances found. Originally noticed by accident with a 'git grep socat'. Link: https://lkml.kernel.org/r/20240612001247.356867-1-jesse.brandeburg@intel.com Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-06-28 19:36:28 -07:00
Edward Cree	7964e78846	net: ethtool: use the tracking array for get_rxfh on custom RSS contexts On 'ethtool -x' with rss_context != 0, instead of calling the driver to read the RSS settings for the context, just get the settings from the rss_ctx xarray, and return them to the user with no driver involvement. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/2d0190fa29638f307ea720f882ebd41f6f867694.1719502240.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-28 18:53:21 -07:00
Edward Cree	8792515119	net: ethtool: add a mutex protecting RSS contexts While this is not needed to serialise the ethtool entry points (which are all under RTNL), drivers may have cause to asynchronously access dev->ethtool->rss_ctx; taking dev->ethtool->rss_lock allows them to do this safely without needing to take the RTNL. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/7f9c15eb7525bf87af62c275dde3a8570ee8bf0a.1719502240.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-28 18:53:21 -07:00
Edward Cree	30a32cdf6b	net: ethtool: add an extack parameter to new rxfh_context APIs Currently passed as NULL, but will allow drivers to report back errors when ethnl support for these ops is added. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/6e0012347d175fdd1280363d7bfa76a2f2777e17.1719502240.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-28 18:53:21 -07:00
Edward Cree	847a8ab186	net: ethtool: let the core choose RSS context IDs Add a new API to create/modify/remove RSS contexts, that passes in the newly-chosen context ID (not as a pointer) rather than leaving the driver to choose it on create. Also pass in the ctx, allowing drivers to easily use its private data area to store their hardware-specific state. Keep the existing .set_rxfh API for now as a fallback, but deprecate it for custom contexts (rss_context != 0). Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/45f1fe61df2163c091ec394c9f52000c8b16cc3b.1719502240.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-28 18:53:21 -07:00
Edward Cree	eac9122f0c	net: ethtool: record custom RSS contexts in the XArray Since drivers are still choosing the context IDs, we have to force the XArray to use the ID they've chosen rather than picking one ourselves, and handle the case where they give us an ID that's already in use. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/801f5faa4cec87c65b2c6e27fb220c944bce593a.1719502240.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-28 18:53:21 -07:00
Edward Cree	6ad2962f8a	net: ethtool: attach an XArray of custom RSS contexts to a netdevice Each context stores the RXFH settings (indir, key, and hfunc) as well as optionally some driver private data. Delete any still-existing contexts at netdev unregister time. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/cbd1c402cec38f2e03124f2ab65b4ae4e08bd90d.1719502240.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-28 18:53:21 -07:00
Edward Cree	3ebbd9f6de	net: move ethtool-related netdev state into its own struct net_dev->ethtool is a pointer to new struct ethtool_netdev_state, which currently contains only the wol_enabled field. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/293a562278371de7534ed1eb17531838ca090633.1719502239.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-28 18:53:17 -07:00
Jakub Sitnicki	10154dbded	udp: Allow GSO transmit from devices with no checksum offload Today sending a UDP GSO packet from a TUN device results in an EIO error: import fcntl, os, struct from socket import * TUNSETIFF = 0x400454CA IFF_TUN = 0x0001 IFF_NO_PI = 0x1000 UDP_SEGMENT = 103 tun_fd = os.open("/dev/net/tun", os.O_RDWR) ifr = struct.pack("16sH", b"tun0", IFF_TUN \| IFF_NO_PI) fcntl.ioctl(tun_fd, TUNSETIFF, ifr) os.system("ip addr add 192.0.2.1/24 dev tun0") os.system("ip link set dev tun0 up") s = socket(AF_INET, SOCK_DGRAM) s.setsockopt(SOL_UDP, UDP_SEGMENT, 1200) s.sendto(b"x" * 3000, ("192.0.2.2", 9)) # EIO This is due to a check in the udp stack if the egress device offers checksum offload. While TUN/TAP devices, by default, don't advertise this capability because it requires support from the TUN/TAP reader. However, the GSO stack has a software fallback for checksum calculation, which we can use. This way we don't force UDP_SEGMENT users to handle the EIO error and implement a segmentation fallback. Lift the restriction so that UDP_SEGMENT can be used with any egress device. We also need to adjust the UDP GSO code to match the GSO stack expectation about ip_summed field, as set in commit `8d63bee643` ("net: avoid skb_warn_bad_offload false positives on UFO"). Otherwise we will hit the bad offload check. Users should, however, expect a potential performance impact when batch-sending packets with UDP_SEGMENT without checksum offload on the egress device. In such case the packet payload is read twice: first during the sendmsg syscall when copying data from user memory, and then in the GSO stack for checksum computation. This double memory read can be less efficient than a regular sendmsg where the checksum is calculated during the initial data copy from user memory. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240626-linux-udpgso-v2-1-422dfcbd6b48@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-28 18:12:59 -07:00
Luiz Augusto von Dentz	f1a8f402f1	Bluetooth: L2CAP: Fix deadlock This fixes the following deadlock introduced by `39a92a55be` ("bluetooth/l2cap: sync sock recv cb and release") ============================================ WARNING: possible recursive locking detected 6.10.0-rc3-g4029dba6b6f1 #6823 Not tainted -------------------------------------------- kworker/u5:0/35 is trying to acquire lock: ffff888002ec2510 (&chan->lock#2/1){+.+.}-{3:3}, at: l2cap_sock_recv_cb+0x44/0x1e0 but task is already holding lock: ffff888002ec2510 (&chan->lock#2/1){+.+.}-{3:3}, at: l2cap_get_chan_by_scid+0xaf/0xd0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&chan->lock#2/1); lock(&chan->lock#2/1); * DEADLOCK * May be due to missing lock nesting notation 3 locks held by kworker/u5:0/35: #0: ffff888002b8a940 ((wq_completion)hci0#2){+.+.}-{0:0}, at: process_one_work+0x750/0x930 #1: ffff888002c67dd0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}, at: process_one_work+0x44e/0x930 #2: ffff888002ec2510 (&chan->lock#2/1){+.+.}-{3:3}, at: l2cap_get_chan_by_scid+0xaf/0xd0 To fix the original problem this introduces l2cap_chan_lock at l2cap_conless_channel to ensure that l2cap_sock_recv_cb is called with chan->lock held. Fixes: `89e856e124` ("bluetooth/l2cap: sync sock recv cb and release") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2024-06-28 14:32:02 -04:00
Pavel Skripkin	1cc18c2ab2	bluetooth/hci: disallow setting handle bigger than HCI_CONN_HANDLE_MAX Syzbot hit warning in hci_conn_del() caused by freeing handle that was not allocated using ida allocator. This is caused by handle bigger than HCI_CONN_HANDLE_MAX passed by hci_le_big_sync_established_evt(), which makes code think it's unset connection. Add same check for handle upper bound as in hci_conn_set_handle() to prevent warning. Link: https://syzkaller.appspot.com/bug?extid=b2545b087a01a7319474 Reported-by: syzbot+b2545b087a01a7319474@syzkaller.appspotmail.com Fixes: `181a42eddd` ("Bluetooth: Make handle of hci_conn be unique") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2024-06-28 14:30:50 -04:00
Iulia Tanasescu	596b6f0813	Bluetooth: ISO: Check socket flag instead of hcon This fixes the following Smatch static checker warning: net/bluetooth/iso.c:1364 iso_sock_recvmsg() error: we previously assumed 'pi->conn->hcon' could be null (line 1359) net/bluetooth/iso.c 1347 static int iso_sock_recvmsg(struct socket sock, struct msghdr msg, 1348 size_t len, int flags) 1349 { 1350 struct sock sk = sock->sk; 1351 struct iso_pinfo pi = iso_pi(sk); 1352 1353 BT_DBG("sk %p", sk); 1354 1355 if (test_and_clear_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags)) { 1356 lock_sock(sk); 1357 switch (sk->sk_state) { 1358 case BT_CONNECT2: 1359 if (pi->conn->hcon && ^^^^^^^^^^^^^^ If ->hcon is NULL 1360 test_bit(HCI_CONN_PA_SYNC, &pi->conn->hcon->flags)) { 1361 iso_conn_big_sync(sk); 1362 sk->sk_state = BT_LISTEN; 1363 } else { --> 1364 iso_conn_defer_accept(pi->conn->hcon); ^^^^^^^^^^^^^^ then we're toast 1365 sk->sk_state = BT_CONFIG; 1366 } 1367 release_sock(sk); 1368 return 0; 1369 case BT_CONNECTED: 1370 if (test_bit(BT_SK_PA_SYNC, Fixes: `fbdc4bc472` ("Bluetooth: ISO: Use defer setup to separate PA sync and BIG sync") Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2024-06-28 14:30:46 -04:00
Edward Adam Davis	89e856e124	bluetooth/l2cap: sync sock recv cb and release The problem occurs between the system call to close the sock and hci_rx_work, where the former releases the sock and the latter accesses it without lock protection. CPU0 CPU1 ---- ---- sock_close hci_rx_work l2cap_sock_release hci_acldata_packet l2cap_sock_kill l2cap_recv_frame sk_free l2cap_conless_channel l2cap_sock_recv_cb If hci_rx_work processes the data that needs to be received before the sock is closed, then everything is normal; Otherwise, the work thread may access the released sock when receiving data. Add a chan mutex in the rx callback of the sock to achieve synchronization between the sock release and recv cb. Sock is dead, so set chan data to NULL, avoid others use invalid sock pointer. Reported-and-tested-by: syzbot+b7f6f8c9303466e16c8a@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2024-06-28 14:30:43 -04:00
Edward Adam Davis	015d79c96d	Bluetooth: Ignore too large handle values in BIG hci_le_big_sync_established_evt is necessary to filter out cases where the handle value is belonging to ida id range, otherwise ida will be erroneously released in hci_conn_cleanup. Fixes: `181a42eddd` ("Bluetooth: Make handle of hci_conn be unique") Reported-by: syzbot+b2545b087a01a7319474@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b2545b087a01a7319474 Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2024-06-28 14:30:40 -04:00
Tetsuo Handa	0d151a1037	Bluetooth: hci_core: cancel all works upon hci_unregister_dev() syzbot is reporting that calling hci_release_dev() from hci_error_reset() due to hci_dev_put() from hci_error_reset() can cause deadlock at destroy_workqueue(), for hci_error_reset() is called from hdev->req_workqueue which destroy_workqueue() needs to flush. We need to make sure that hdev->{rx_work,cmd_work,tx_work} which are queued into hdev->workqueue and hdev->{power_on,error_reset} which are queued into hdev->req_workqueue are no longer running by the moment destroy_workqueue(hdev->workqueue); destroy_workqueue(hdev->req_workqueue); are called from hci_release_dev(). Call cancel_work_sync() on these work items from hci_unregister_dev() as soon as hdev->list is removed from hci_dev_list. Reported-by: syzbot <syzbot+da0a9c9721e36db712e8@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=da0a9c9721e36db712e8 Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2024-06-28 14:30:34 -04:00
Luiz Augusto von Dentz	ac65ecccae	Bluetooth: hci_event: Fix setting of unicast qos interval qos->ucast interval reffers to the SDU interval, and should not be set to the interval value reported by the LE CIS Established event since the latter reffers to the ISO interval. These two interval are not the same thing: BLUETOOTH CORE SPECIFICATION Version 5.3 \| Vol 6, Part G Isochronous interval: The time between two consecutive BIS or CIS events (designated ISO_Interval in the Link Layer) SDU interval: The nominal time between two consecutive SDUs that are sent or received by the upper layer. So this instead uses the following formula from the spec to calculate the resulting SDU interface: BLUETOOTH CORE SPECIFICATION Version 5.4 \| Vol 6, Part G page 3075: Transport_Latency_C_To_P = CIG_Sync_Delay + (FT_C_To_P) × ISO_Interval + SDU_Interval_C_To_P Transport_Latency_P_To_C = CIG_Sync_Delay + (FT_P_To_C) × ISO_Interval + SDU_Interval_P_To_C Link: https://github.com/bluez/bluez/issues/823 Fixes: `2be22f1941` ("Bluetooth: hci_event: Fix parsing of CIS Established Event") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2024-06-28 14:30:28 -04:00
Sven Peter	ed2a2ef16a	Bluetooth: Add quirk to ignore reserved PHY bits in LE Extended Adv Report Some Broadcom controllers found on Apple Silicon machines abuse the reserved bits inside the PHY fields of LE Extended Advertising Report events for additional flags. Add a quirk to drop these and correctly extract the Primary/Secondary_PHY field. The following excerpt from a btmon trace shows a report received with "Reserved" for "Primary PHY" on a 4388 controller: > HCI Event: LE Meta Event (0x3e) plen 26 LE Extended Advertising Report (0x0d) Num reports: 1 Entry 0 Event type: 0x2515 Props: 0x0015 Connectable Directed Use legacy advertising PDUs Data status: Complete Reserved (0x2500) Legacy PDU Type: Reserved (0x2515) Address type: Random (0x01) Address: 00:00:00:00:00:00 (Static) Primary PHY: Reserved Secondary PHY: No packets SID: no ADI field (0xff) TX power: 127 dBm RSSI: -60 dBm (0xc4) Periodic advertising interval: 0.00 msec (0x0000) Direct address type: Public (0x00) Direct address: 00:00:00:00:00:00 (Apple, Inc.) Data length: 0x00 Cc: stable@vger.kernel.org Fixes: `2e7ed5f5e6` ("Bluetooth: hci_sync: Use advertised PHYs on hci_le_ext_create_conn_sync") Reported-by: Janne Grunau <j@jannau.net> Closes: https://lore.kernel.org/all/Zjz0atzRhFykROM9@robin Tested-by: Janne Grunau <j@jannau.net> Signed-off-by: Sven Peter <sven@svenpeter.dev> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2024-06-28 14:30:20 -04:00
Linus Torvalds	6c0483dbfe	nfsd-6.10 fixes: - Due to a late review, revert and re-fix a recent crasher fix -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmZ+zksACgkQM2qzM29m f5ey0Q/+O8Z//rVg8L8V1EI44gZ1LRTiJEeDFoJU1Hktn0dOwTvGQJNzNbfpTBbT zTvvDBd9+cbHQ7Jl2r73+wkqgZbGXsRhMRUCmlA5ur0O7ujkYDo3rpOgQhP4V3LV 1G6o2uOJT9d9UykUF74Demlg97kNrc8CY1QNujXjKrj8jYpQY4KcRCDb91KdYgc9 GSque9zMD2jTJkTkSVRUyfWGEDvYtDxs7/4rdErjixaY30dgkPHae6euRzoQd2sC v3aoKHSt5Y/pvOX+3/c2QzDqOiKPOnm3dLazB+rK6lYrnt4iUeP3CiPcFmqNJZk2 kZ3hpCPN8Le3oQFlMe450EljB8wLj2DqLix+Y/iOGyA732GLxNW6Hu8PkwRVFFwx cGGp6KryrcwO/BNHgxceErfhBG4IPKSgnY6C8Io+mPYKwSRAOKA4su02RoJDWJkA PTf67JgZiKv+bn3EGojeTo9YAvb04EYstttK2yBFUR9xylyew9vaZ9X8mYkqfyHN pZEWf/SI8SExZteo3Ymx9HLcX5FYsBjgvigpow33kUCn2hMhuhfBXEtmLNsxOSYC aEICcAYqnBovFJ+3vbz11IDS04ERIFdprlLQuExtJ21pujIlIM9pOxrIGvgu5pPl ++rsDP31Kh6imuZbSaJ5TFTdtDqcpah587m+mQtlaXAjOQL1qfw= =gwat -----END PGP SIGNATURE----- Merge tag 'nfsd-6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Due to a late review, revert and re-fix a recent crasher fix * tag 'nfsd-6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: Revert "nfsd: fix oops when reading pool_stats before server is started" nfsd: initialise nfsd_info.mutex early.	2024-06-28 09:32:33 -07:00
Phil Sutter	f4ebd03496	netfilter: xt_recent: Lift restrictions on max hitcount value Support tracking of up to 65535 packets per table entry instead of just 255 to better facilitate longer term tracking or higher throughput scenarios. Note how this aligns sizes of struct recent_entry's 'nstamps' and 'index' fields when 'nstamps' was larger before. This is unnecessary as the value of 'nstamps' grows along with that of 'index' after being initialized to 1 (see recent_entry_update()). Its value will thus never exceed that of 'index' and therefore does not need to provide space for larger values. Requested-by: Fabio <pedretti.fabio@gmail.com> Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1745 Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-06-28 17:57:50 +02:00
David S. Miller	dc6be0b73f	Merge tag 'ieee802154-for-net-2024-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan into main	2024-06-28 13:10:12 +01:00
Danielle Ratson	32b4c8b53e	ethtool: Add ability to flash transceiver modules' firmware Add the ability to flash the modules' firmware by implementing the interface between the user space and the kernel. Example from a succeeding implementation: # ethtool --flash-module-firmware swp40 file test.bin Transceiver module firmware flashing started for device swp40 Transceiver module firmware flashing in progress for device swp40 Progress: 99% Transceiver module firmware flashing completed for device swp40 In addition, add infrastructure that allows modules to set socket-specific private data. This ensures that when a socket is closed from user space during the flashing process, the right socket halts sending notifications to user space until the work item is completed. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-28 10:48:23 +01:00
Danielle Ratson	c4f78134d4	ethtool: cmis_fw_update: add a layer for supporting firmware update using CDB According to the CMIS standard, the firmware update process is done using a CDB commands sequence. Implement a work that will be triggered from the module layer in the next patch the will initiate and execute all the CDB commands in order, to eventually complete the firmware update process. This flashing process includes, writing the firmware image, running the new firmware image and committing it after testing, so that it will run upon reset. This work will also notify user space about the progress of the firmware update process. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-28 10:48:23 +01:00
Danielle Ratson	a39c84d796	ethtool: cmis_cdb: Add a layer for supporting CDB commands CDB (Command Data Block Message Communication) reads and writes are performed on memory map pages 9Fh-AFh according to the CMIS standard, section 8.20 of revision 5.2. Page 9Fh is used to specify the CDB command to be executed and also provides an area for a local payload (LPL). According to the CMIS standard, the firmware update process is done using a CDB commands sequence that will be implemented in the next patch. The kernel interface that will implement the firmware update using CDB command will include 2 layers that will be added under ethtool: * The upper layer that will be triggered from the module layer, is cmis_fw_update. * The lower one is cmis_cdb. In the future there might be more operations to implement using CDB commands. Therefore, the idea is to keep the CDB interface clean and the cmis_fw_update specific to the CDB commands handling it. These two layers will communicate using the API the consists of three functions: - struct ethtool_cmis_cdb * ethtool_cmis_cdb_init(struct net_device dev, struct ethtool_module_fw_flash_params params); - void ethtool_cmis_cdb_fini(struct ethtool_cmis_cdb cdb); - int ethtool_cmis_cdb_execute_cmd(struct net_device dev, struct ethtool_cmis_cdb_cmd_args args); Add the CDB layer to support initializing, finishing and executing CDB commands: The initialization process will include creating of an ethtool_cmis_cdb instance, querying the module CDB support, entering and validating the password from user space (CMD 0x0000) and querying the module features (CMD 0x0040). * The finishing API will simply free the ethtool_cmis_cdb instance. * The executing process will write the CDB command to EEPROM using set_module_eeprom_by_page() that was presented earlier, and will process the reply from EEPROM. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-28 10:48:23 +01:00
Danielle Ratson	31e0aa99dc	ethtool: Veto some operations during firmware flashing process Some operations cannot be performed during the firmware flashing process. For example: - Port must be down during the whole flashing process to avoid packet loss while committing reset for example. - Writing to EEPROM interrupts the flashing process, so operations like ethtool dump, module reset, get and set power mode should be vetoed. - Split port firmware flashing should be vetoed. In order to veto those scenarios, add a flag in 'struct net_device' that indicates when a firmware flash is taking place on the module and use it to prevent interruptions during the process. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-28 10:48:22 +01:00
Danielle Ratson	d7d4cfc4c9	ethtool: Add flashing transceiver modules' firmware notifications ability Add progress notifications ability to user space while flashing modules' firmware by implementing the interface between the user space and the kernel. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-28 10:48:22 +01:00

1 2 3 4 5 ...

77912 Commits