linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-11-27 12:04:22 +08:00

Author	SHA1	Message	Date
Stephen Rothwell	c4ceb479d4	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git # Conflicts: # drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c	2024-03-28 12:53:26 +11:00
Stephen Rothwell	b866f07294	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git	2024-03-28 12:53:22 +11:00
Stephen Rothwell	319ac933b9	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git # Conflicts: # kernel/bpf/arena.c	2024-03-28 12:53:21 +11:00
Stephen Rothwell	8340e59e58	Merge branch 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git	2024-03-28 09:55:11 +11:00
Stephen Rothwell	cfeebdaaf6	Merge branch '9p-next' of git://github.com/martinetd/linux	2024-03-28 09:14:59 +11:00
Stephen Rothwell	9b16d0833e	Merge branch 'nfsd-next' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux	2024-03-28 09:14:56 +11:00
Martin KaFai Lau	88be2ea40f	bpf: Remove CONFIG_X86 and CONFIG_DYNAMIC_FTRACE guard from the tcp-cc kfuncs The commit `7aae231ac9` ("bpf: tcp: Limit calling some tcp cc functions to CONFIG_DYNAMIC_FTRACE") added CONFIG_DYNAMIC_FTRACE guard because pahole was only generating btf for ftrace-able functions. The ftrace filter had already been removed from pahole, so the CONFIG_DYNAMIC_FTRACE guard can be removed. The commit `569c484f99` ("bpf: Limit static tcp-cc functions in the .BTF_ids list to x86") has added CONFIG_X86 guard because it failed the powerpc arch which prepended a "." to the local static function, so "cubictcp_init" becomes ".cubictcp_init". "__bpf_kfunc" has been added to kfunc since then and it uses the __unused compiler attribute. There is an existing "__bpf_kfunc static u32 bpf_kfunc_call_test_static_unused_arg(u32 arg, u32 unused)" test in bpf_testmod.c to cover the static kfunc case. cross compile on ppc64 with CONFIG_DYNAMIC_FTRACE disabled: > readelf -s vmlinux \| grep cubictcp_ 56938: c00000000144fd00 184 FUNC LOCAL DEFAULT 2 cubictcp_cwnd_event [<localentry>: 8] 56939: c00000000144fdb8 200 FUNC LOCAL DEFAULT 2 cubictcp_recalc_[...] [<localentry>: 8] 56940: c00000000144fe80 296 FUNC LOCAL DEFAULT 2 cubictcp_init [<localentry>: 8] 56941: c00000000144ffa8 228 FUNC LOCAL DEFAULT 2 cubictcp_state [<localentry>: 8] 56942: c00000000145008c 1908 FUNC LOCAL DEFAULT 2 cubictcp_cong_avoid [<localentry>: 8] 56943: c000000001450800 1644 FUNC LOCAL DEFAULT 2 cubictcp_acked [<localentry>: 8] > bpftool btf dump file vmlinux \| grep cubictcp_ [51540] FUNC 'cubictcp_acked' type_id=38137 linkage=static [51541] FUNC 'cubictcp_cong_avoid' type_id=38122 linkage=static [51543] FUNC 'cubictcp_cwnd_event' type_id=51542 linkage=static [51544] FUNC 'cubictcp_init' type_id=9186 linkage=static [51545] FUNC 'cubictcp_recalc_ssthresh' type_id=35021 linkage=static [51547] FUNC 'cubictcp_state' type_id=38141 linkage=static The patch removed both config guards. Cc: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240322191433.4133280-1-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-03-27 15:08:38 -07:00
Andrii Nakryiko	c55e25089e	bpf: add bpf_modify_return_test_tp() kfunc triggering tracepoint Add a simple bpf_modify_return_test_tp() kfunc, available to all program types, that is useful for various testing and benchmarking scenarios, as it allows to trigger most tracing BPF program types from BPF side, allowing to do complex testing and benchmarking scenarios. It is also attachable to for fmod_ret programs, making it a good and simple way to trigger fmod_ret program under test/benchmark. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240326162151.3981687-6-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-03-27 15:01:00 -07:00
Stephen Rothwell	446cfb8de6	Merge branch 'nfsd-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux	2024-03-28 08:22:30 +11:00
Stephen Rothwell	7214e1e17d	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless.git	2024-03-28 08:22:04 +11:00
Bastien Nocera	db4597cc88	Bluetooth: Fix TOCTOU in HCI debugfs implementation struct hci_dev members conn_info_max_age, conn_info_min_age, le_conn_max_interval, le_conn_min_interval, le_adv_max_interval, and le_adv_min_interval can be modified from the HCI core code, as well through debugfs. The debugfs implementation, that's only available to privileged users, will check for boundaries, making sure that the minimum value being set is strictly above the maximum value that already exists, and vice-versa. However, as both minimum and maximum values can be changed concurrently to us modifying them, we need to make sure that the value we check is the value we end up using. For example, with ->conn_info_max_age set to 10, conn_info_min_age_set() gets called from vfs handlers to set conn_info_min_age to 8. In conn_info_min_age_set(), this goes through: if (val == 0 \|\| val > hdev->conn_info_max_age) return -EINVAL; Concurrently, conn_info_max_age_set() gets called to set to set the conn_info_max_age to 7: if (val == 0 \|\| val > hdev->conn_info_max_age) return -EINVAL; That check will also pass because we used the old value (10) for conn_info_max_age. After those checks that both passed, the struct hci_dev access is mutex-locked, disabling concurrent access, but that does not matter because the invalid value checks both passed, and we'll end up with conn_info_min_age = 8 and conn_info_max_age = 7 To fix this problem, we need to lock the structure access before so the check and assignment are not interrupted. This fix was originally devised by the BassCheck[1] team, and considered the problem to be an atomicity one. This isn't the case as there aren't any concerns about the variable changing while we check it, but rather after we check it parallel to another change. This patch fixes CVE-2024-24858 and CVE-2024-24857. [1] https://sites.google.com/view/basscheck/ Co-developed-by: Gui-Dong Han <2045gemini@gmail.com> Signed-off-by: Gui-Dong Han <2045gemini@gmail.com> Link: https://lore.kernel.org/linux-bluetooth/20231222161317.6255-1-2045gemini@gmail.com/ Link: https://nvd.nist.gov/vuln/detail/CVE-2024-24858 Link: https://lore.kernel.org/linux-bluetooth/20231222162931.6553-1-2045gemini@gmail.com/ Link: https://lore.kernel.org/linux-bluetooth/20231222162310.6461-1-2045gemini@gmail.com/ Link: https://nvd.nist.gov/vuln/detail/CVE-2024-24857 Fixes: `31ad169148` ("Bluetooth: Add conn info lifetime parameters to debugfs") Fixes: `729a1051da` ("Bluetooth: Expose default LE advertising interval via debugfs") Fixes: `71c3b60ec6` ("Bluetooth: Move BR/EDR debugfs file creation into hci_debugfs.c") Signed-off-by: Bastien Nocera <hadess@hadess.net> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2024-03-27 11:40:55 -04:00
Hui Wang	a2a52ae8fe	Bluetooth: hci_event: set the conn encrypted before conn establishes We have a BT headset (Lenovo Thinkplus XT99), the pairing and connecting has no problem, once this headset is paired, bluez will remember this device and will auto re-connect it whenever the device is powered on. The auto re-connecting works well with Windows and Android, but with Linux, it always fails. Through debugging, we found at the rfcomm connection stage, the bluetooth stack reports "Connection refused - security block (0x0003)". For this device, the re-connecting negotiation process is different from other BT headsets, it sends the Link_KEY_REQUEST command before the CONNECT_REQUEST completes, and it doesn't send ENCRYPT_CHANGE command during the negotiation. When the device sends the "connect complete" to hci, the ev->encr_mode is 1. So here in the conn_complete_evt(), if ev->encr_mode is 1, link type is ACL and HCI_CONN_ENCRYPT is not set, we set HCI_CONN_ENCRYPT to this conn, and update conn->enc_key_size accordingly. After this change, this BT headset could re-connect with Linux successfully. This is the btmon log after applying the patch, after receiving the "Connect Complete" with "Encryption: Enabled", will send the command to read encryption key size: > HCI Event: Connect Request (0x04) plen 10 Address: 8C:3C:AA:D8:11:67 (OUI 8C-3C-AA) Class: 0x240404 Major class: Audio/Video (headset, speaker, stereo, video, vcr) Minor class: Wearable Headset Device Rendering (Printing, Speaker) Audio (Speaker, Microphone, Headset) Link type: ACL (0x01) ... > HCI Event: Link Key Request (0x17) plen 6 Address: 8C:3C:AA:D8:11:67 (OUI 8C-3C-AA) < HCI Command: Link Key Request Reply (0x01\|0x000b) plen 22 Address: 8C:3C:AA:D8:11:67 (OUI 8C-3C-AA) Link key: ${32-hex-digits-key} ... > HCI Event: Connect Complete (0x03) plen 11 Status: Success (0x00) Handle: 256 Address: 8C:3C:AA:D8:11:67 (OUI 8C-3C-AA) Link type: ACL (0x01) Encryption: Enabled (0x01) < HCI Command: Read Encryption Key... (0x05\|0x0008) plen 2 Handle: 256 < ACL Data TX: Handle 256 flags 0x00 dlen 10 L2CAP: Information Request (0x0a) ident 1 len 2 Type: Extended features supported (0x0002) > HCI Event: Command Complete (0x0e) plen 7 Read Encryption Key Size (0x05\|0x0008) ncmd 1 Status: Success (0x00) Handle: 256 Key size: 16 Cc: stable@vger.kernel.org Link: https://github.com/bluez/bluez/issues/704 Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2024-03-27 11:40:39 -04:00
Luiz Augusto von Dentz	1c3366abdb	Bluetooth: hci_sync: Fix not checking error on hci_cmd_sync_cancel_sync hci_cmd_sync_cancel_sync shall check the error passed to it since it will be propagated using req_result which is __u32 it needs to be properly set to a positive value if it was passed as negative othertise IS_ERR will not trigger as -(errno) would be converted to a positive value. Fixes: `63298d6e75` ("Bluetooth: hci_core: Cancel request on command timeout") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Reported-and-tested-by: Thorsten Leemhuis <linux@leemhuis.info> Closes: https://lore.kernel.org/all/08275279-7462-4f4a-a0ee-8aa015f829bc@leemhuis.info/	2024-03-27 11:40:30 -04:00
Johan Hovold	19b8ed6007	Bluetooth: add quirk for broken address properties Some Bluetooth controllers lack persistent storage for the device address and instead one can be provided by the boot firmware using the 'local-bd-address' devicetree property. The Bluetooth devicetree bindings clearly states that the address should be specified in little-endian order, but due to a long-standing bug in the Qualcomm driver which reversed the address some boot firmware has been providing the address in big-endian order instead. Add a new quirk that can be set on platforms with broken firmware and use it to reverse the address when parsing the property so that the underlying driver bug can be fixed. Fixes: `5c0a1001c8` ("Bluetooth: hci_qca: Add helper to set device address") Cc: stable@vger.kernel.org # 5.1 Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2024-03-27 11:39:53 -04:00
Luiz Augusto von Dentz	042d12b044	Bluetooth: ISO: Don't reject BT_ISO_QOS if parameters are unset Consider certain values (0x00) as unset and load proper default if an application has not set them properly. Fixes: `0fe8c8d071` ("Bluetooth: Split bt_iso_qos into dedicated structures") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2024-03-27 11:39:52 -04:00
Anton Protopopov	2a720ccf30	bpf: Add a check for struct bpf_fib_lookup size The struct bpf_fib_lookup should not grow outside of its 64 bytes. Add a static assert to validate this. Suggested-by: David Ahern <dsahern@kernel.org> Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240326101742.17421-4-aspsk@isovalent.com	2024-03-27 16:33:10 +01:00
Anton Protopopov	e108d6cef9	bpf: Add support for passing mark with bpf_fib_lookup Extend the bpf_fib_lookup() helper by making it to utilize mark if the BPF_FIB_LOOKUP_MARK flag is set. In order to pass the mark the four bytes of struct bpf_fib_lookup are used, shared with the output-only smac/dmac fields. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: David Ahern <dsahern@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240326101742.17421-2-aspsk@isovalent.com	2024-03-27 16:29:20 +01:00
Jakub Kicinski	2a702c2e57	bpf-next-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZgHylwAKCRDbK58LschI gzmaAPwKhDFFSU/DU08k22muJxLIXVR7Xx04baJ9mPiFrqZyyAEA8RFNamC7wZIB AnfwwoDjfDTP60rlXFaEf8UT5PpA7Ao= =/KF6 -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-03-25 We've added 38 non-merge commits during the last 13 day(s) which contain a total of 50 files changed, 867 insertions(+), 274 deletions(-). The main changes are: 1) Add the ability to specify and retrieve BPF cookie also for raw tracepoint programs in order to ease migration from classic to raw tracepoints, from Andrii Nakryiko. 2) Allow the use of bpf_get_{ns_,}current_pid_tgid() helper for all program types and add additional BPF selftests, from Yonghong Song. 3) Several improvements to bpftool and its build, for example, enabling libbpf logs when loading pid_iter in debug mode, from Quentin Monnet. 4) Check the return code of all BPF-related set_memory_() functions during load and bail out in case they fail, from Christophe Leroy. 5) Avoid a goto in regs_refine_cond_op() such that the verifier can be better integrated into Agni tool which doesn't support backedges yet, from Harishankar Vishwanathan. 6) Add a small BPF trie perf improvement by always inlining longest_prefix_match, from Jesper Dangaard Brouer. 7) Small BPF selftest refactor in bpf_tcp_ca.c to utilize start_server() helper instead of open-coding it, from Geliang Tang. 8) Improve test_tc_tunnel.sh BPF selftest to prevent client connect before the server bind, from Alessandro Carminati. 9) Fix BPF selftest benchmark for older glibc and use syscall(SYS_gettid) instead of gettid(), from Alan Maguire. 10) Implement a backward-compatible method for struct_ops types with additional fields which are not present in older kernels, from Kui-Feng Lee. 11) Add a small helper to check if an instruction is addr_space_cast from as(0) to as(1) and utilize it in x86-64 JIT, from Puranjay Mohan. 12) Small cleanup to remove unnecessary error check in bpf_struct_ops_map_update_elem, from Martin KaFai Lau. 13) Improvements to libbpf fd validity checks for BPF map/programs, from Mykyta Yatsenko. tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (38 commits) selftests/bpf: Fix flaky test btf_map_in_map/lookup_update bpf: implement insn_is_cast_user() helper for JITs bpf: Avoid get_kernel_nofault() to fetch kprobe entry IP selftests/bpf: Use start_server in bpf_tcp_ca bpf: Sync uapi bpf.h to tools directory libbpf: Add new sec_def "sk_skb/verdict" selftests/bpf: Mark uprobe trigger functions with nocf_check attribute selftests/bpf: Use syscall(SYS_gettid) instead of gettid() wrapper in bench bpf-next: Avoid goto in regs_refine_cond_op() bpftool: Clean up HOST_CFLAGS, HOST_LDFLAGS for bootstrap bpftool selftests/bpf: scale benchmark counting by using per-CPU counters bpftool: Remove unnecessary source files from bootstrap version bpftool: Enable libbpf logs when loading pid_iter in debug mode selftests/bpf: add raw_tp/tp_btf BPF cookie subtests libbpf: add support for BPF cookie for raw_tp/tp_btf programs bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs bpf: pass whole link instead of prog when triggering raw tracepoint bpf: flatten bpf_probe_register call chain selftests/bpf: Prevent client connect before server bind in test_tc_tunnel.sh selftests/bpf: Add a sk_msg prog bpf_get_ns_current_pid_tgid() test ... ==================== Link: https://lore.kernel.org/r/20240325233940.7154-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-27 07:52:34 -07:00
Aleksandr Aprelkov	769a17e81f	sunrpc: removed redundant procp check since vs_proc pointer is dereferenced before getting it's address there's no need to check for NULL. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: `8e5b67731d` ("SUNRPC: Add a callback to initialise server requests") Signed-off-by: Aleksandr Aprelkov <aaprelkov@usergate.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-03-27 10:05:11 -04:00
Sabrina Dubroca	417e91e856	tls: get psock ref after taking rxlock to avoid leak At the start of tls_sw_recvmsg, we take a reference on the psock, and then call tls_rx_reader_lock. If that fails, we return directly without releasing the reference. Instead of adding a new label, just take the reference after locking has succeeded, since we don't need it before. Fixes: `4cbc325ed6` ("tls: rx: allow only one reader at a time") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/fe2ade22d030051ce4c3638704ed58b67d0df643.1711120964.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-26 20:48:24 -07:00
Sabrina Dubroca	85eef9a41d	tls: adjust recv return with async crypto and failed copy to userspace process_rx_list may not copy as many bytes as we want to the userspace buffer, for example in case we hit an EFAULT during the copy. If this happens, we should only count the bytes that were actually copied, which may be 0. Subtracting async_copy_bytes is correct in both peek and !peek cases, because decrypted == async_copy_bytes + peeked for the peek case: peek is always !ZC, and we can go through either the sync or async path. In the async case, we add chunk to both decrypted and async_copy_bytes. In the sync case, we add chunk to both decrypted and peeked. I missed that in commit `6caaf10442` ("tls: fix peeking with sync+async decryption"). Fixes: `4d42cd6bc2` ("tls: rx: fix return value for async crypto") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/1b5a1eaab3c088a9dd5d9f1059ceecd7afe888d1.1711120964.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-26 20:48:24 -07:00
Sabrina Dubroca	7608a971fd	tls: recv: process_rx_list shouldn't use an offset with kvec Only MSG_PEEK needs to copy from an offset during the final process_rx_list call, because the bytes we copied at the beginning of tls_sw_recvmsg were left on the rx_list. In the KVEC case, we removed data from the rx_list as we were copying it, so there's no need to use an offset, just like in the normal case. Fixes: `692d7b5d1f` ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/e5487514f828e0347d2b92ca40002c62b58af73d.1711120964.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-26 20:48:24 -07:00
Alexander Lobakin	341ee1a584	net: pin system percpu page_pools to the corresponding NUMA nodes System page_pools are percpu and one instance can be used only on one CPU. %NUMA_NO_NODE is fine for allocating pages, as the PP core always allocates local pages in this case. But for the struct &page_pool itself, this node ID means they are allocated on the boot CPU, which may belong to a different node than the target CPU. Pin system page_pools to the corresponding nodes when creating, so that all the allocated data will always be local. Use cpu_to_mem() to account memless nodes. Nodes != 0 win some Kpps when testing with xdp-trafficgen. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20240325160635.3215855-1-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-26 20:46:59 -07:00
Eric Dumazet	6e06312035	net: remove skb_free_datagram_locked() Last user of skb_free_datagram_locked() went away in 2016 with commit `850cbaddb5` ("udp: use it's own memory accounting schema"). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240325134155.620531-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-26 15:37:24 +01:00
Sebastian Andrzej Siewior	765b11f8f4	net: Rename rps_lock to backlog_lock. The rps_lock.() functions use the inner lock of a sk_buff_head for locking. This lock is used if RPS is enabled, otherwise the list is accessed lockless and disabling interrupts is enough for the synchronisation because it is only accessed CPU local. Not only the list is protected but also the NAPI state protected. With the addition of backlog threads, the lock is also needed because of the cross CPU access even without RPS. The clean up of the defer_list list is also done via backlog threads (if enabled). It has been suggested to rename the locking function since it is no longer just RPS. Rename the rps_lock() functions to backlog_lock*(). Suggested-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-26 12:17:18 +01:00
Sebastian Andrzej Siewior	80d2eefcb4	net: Use backlog-NAPI to clean up the defer_list. The defer_list is a per-CPU list which is used to free skbs outside of the socket lock and on the CPU on which they have been allocated. The list is processed during NAPI callbacks so ideally the list is cleaned up. Should the amount of skbs on the list exceed a certain water mark then the softirq is triggered remotely on the target CPU by invoking a remote function call. The raise of the softirqs via a remote function call leads to waking the ksoftirqd on PREEMPT_RT which is undesired. The backlog-NAPI threads already provide the infrastructure which can be utilized to perform the cleanup of the defer_list. The NAPI state is updated with the input_pkt_queue.lock acquired. It order not to break the state, it is needed to also wake the backlog-NAPI thread with the lock held. This requires to acquire the use the lock in rps_lock_irq() if the backlog-NAPI threads are used even with RPS disabled. Move the logic of remotely starting softirqs to clean up the defer_list into kick_defer_list_purge(). Make sure a lock is held in rps_lock_irq() if backlog-NAPI threads are used. Schedule backlog-NAPI for defer_list cleanup if backlog-NAPI is available. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-26 12:17:18 +01:00
Sebastian Andrzej Siewior	dad6b97702	net: Allow to use SMP threads for backlog NAPI. Backlog NAPI is a per-CPU NAPI struct only (with no device behind it) used by drivers which don't do NAPI them self, RPS and parts of the stack which need to avoid recursive deadlocks while processing a packet. The non-NAPI driver use the CPU local backlog NAPI. If RPS is enabled then a flow for the skb is computed and based on the flow the skb can be enqueued on a remote CPU. Scheduling/ raising the softirq (for backlog's NAPI) on the remote CPU isn't trivial because the softirq is only scheduled on the local CPU and performed after the hardirq is done. In order to schedule a softirq on the remote CPU, an IPI is sent to the remote CPU which schedules the backlog-NAPI on the then local CPU. On PREEMPT_RT interrupts are force-threaded. The soft interrupts are raised within the interrupt thread and processed after the interrupt handler completed still within the context of the interrupt thread. The softirq is handled in the context where it originated. With force-threaded interrupts enabled, ksoftirqd is woken up if a softirq is raised from hardirq context. This is the case if it is raised from an IPI. Additionally there is a warning on PREEMPT_RT if the softirq is raised from the idle thread. This was done for two reasons: - With threaded interrupts the processing should happen in thread context (where it originated) and ksoftirqd is the only thread for this context if raised from hardirq. Using the currently running task instead would "punish" a random task. - Once ksoftirqd is active it consumes all further softirqs until it stops running. This changed recently and is no longer the case. Instead of keeping the backlog NAPI in ksoftirqd (in force-threaded/ PREEMPT_RT setups) I am proposing NAPI-threads for backlog. The "proper" setup with threaded-NAPI is not doable because the threads are not pinned to an individual CPU and can be modified by the user. Additionally a dummy network device would have to be assigned. Also CPU-hotplug has to be considered if additional CPUs show up. All this can be probably done/ solved but the smpboot-threads already provide this infrastructure. Sending UDP packets over loopback expects that the packet is processed within the call. Delaying it by handing it over to the thread hurts performance. It is not beneficial to the outcome if the context switch happens immediately after enqueue or after a while to process a few packets in a batch. There is no need to always use the thread if the backlog NAPI is requested on the local CPU. This restores the loopback throuput. The performance drops mostly to the same value after enabling RPS on the loopback comparing the IPI and the tread result. Create NAPI-threads for backlog if request during boot. The thread runs the inner loop from napi_threaded_poll(), the wait part is different. It checks for NAPI_STATE_SCHED (the backlog NAPI can not be disabled). The NAPI threads for backlog are optional, it has to be enabled via the boot argument "thread_backlog_napi". It is mandatory for PREEMPT_RT to avoid the wakeup of ksoftirqd from the IPI. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-26 12:17:18 +01:00
Sebastian Andrzej Siewior	56364c9106	net: Remove conditional threaded-NAPI wakeup based on task state. A NAPI thread is scheduled by first setting NAPI_STATE_SCHED bit. If successful (the bit was not yet set) then the NAPI_STATE_SCHED_THREADED is set but only if thread's state is not TASK_INTERRUPTIBLE (is TASK_RUNNING) followed by task wakeup. If the task is idle (TASK_INTERRUPTIBLE) then the NAPI_STATE_SCHED_THREADED bit is not set. The thread is no relying on the bit but always leaving the wait-loop after returning from schedule() because there must have been a wakeup. The smpboot-threads implementation for per-CPU threads requires an explicit condition and does not support "if we get out of schedule() then there must be something to do". Removing this optimisation simplifies the following integration. Set NAPI_STATE_SCHED_THREADED unconditionally on wakeup and rely on it in the wait path by removing the `woken' condition. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-26 12:17:18 +01:00
Eric Dumazet	151c9c724d	tcp: properly terminate timers for kernel sockets We had various syzbot reports about tcp timers firing after the corresponding netns has been dismantled. Fortunately Josef Bacik could trigger the issue more often, and could test a patch I wrote two years ago. When TCP sockets are closed, we call inet_csk_clear_xmit_timers() to 'stop' the timers. inet_csk_clear_xmit_timers() can be called from any context, including when socket lock is held. This is the reason it uses sk_stop_timer(), aka del_timer(). This means that ongoing timers might finish much later. For user sockets, this is fine because each running timer holds a reference on the socket, and the user socket holds a reference on the netns. For kernel sockets, we risk that the netns is freed before timer can complete, because kernel sockets do not hold reference on the netns. This patch adds inet_csk_clear_xmit_timers_sync() function that using sk_stop_timer_sync() to make sure all timers are terminated before the kernel socket is released. Modules using kernel sockets close them in their netns exit() handler. Also add sock_not_owned_by_me() helper to get LOCKDEP support : inet_csk_clear_xmit_timers_sync() must not be called while socket lock is held. It is very possible we can revert in the future commit `3a58f13a88` ("net: rds: acquire refcount on TCP sockets") which attempted to solve the issue in rds only. (net/smc/af_smc.c and net/mptcp/subflow.c have similar code) We probably can remove the check_net() tests from tcp_out_of_resources() and __tcp_close() in the future. Reported-by: Josef Bacik <josef@toxicpanda.com> Closes: https://lore.kernel.org/netdev/20240314210740.GA2823176@perftesting/ Fixes: `26abe14379` ("net: Modify sk_alloc to not reference count the netns of kernel sockets.") Fixes: `8a68173691` ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket") Link: https://lore.kernel.org/bpf/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Josef Bacik <josef@toxicpanda.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/r/20240322135732.1535772-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-25 19:51:57 -07:00
Ravi Gunasekaran	b11c81731c	net: hsr: hsr_slave: Fix the promiscuous mode in offload mode commit `e748d0fd66` ("net: hsr: Disable promiscuous mode in offload mode") disables promiscuous mode of slave devices while creating an HSR interface. But while deleting the HSR interface, it does not take care of it. It decreases the promiscuous mode count, which eventually enables promiscuous mode on the slave devices when creating HSR interface again. Fix this by not decrementing the promiscuous mode count while deleting the HSR interface when offload is enabled. Fixes: `e748d0fd66` ("net: hsr: Disable promiscuous mode in offload mode") Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240322100447.27615-1-r-gunasekaran@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-25 19:51:56 -07:00
linke li	c2deb2e971	net: mark racy access on sk->sk_rcvbuf sk->sk_rcvbuf in __sock_queue_rcv_skb() and __sk_receive_skb() can be changed by other threads. Mark this as benign using READ_ONCE(). This patch is aimed at reducing the number of benign races reported by KCSAN in order to focus future debugging effort on harmful races. Signed-off-by: linke li <lilinke99@qq.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-03-25 14:46:59 +00:00
Uwe Kleine-König	597d817202	net: rfkill: gpio: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://msgid.link/20240306183538.88777-2-u.kleine-koenig@pengutronix.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-03-25 15:40:22 +01:00
Johannes Berg	f40db02e8f	wifi: mac80211: use kvcalloc() for codel vars This is a big array, but it's only used by software and need not be contiguous in memory. Use kvcalloc() since it's so big (order 5 allocation). Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240325150509.9195643699e4.I1b94b17abc809491080d6312f31ce6b5decdd446@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-03-25 15:40:03 +01:00
Johannes Berg	a0b9ecffc3	wifi: mac80211: reactivate multi-link later in restart In case of restart, we currently reactivate multi-link on interfaces before reconfiguring keys etc. which means the drivers need to handle this case differently. Enable more links later to allow them to handle it the same way. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320091155.d0f18a56335d.Ib3338d93872a4a568f38db0d02546534d3eff810@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-03-25 15:39:29 +01:00
Johannes Berg	6e02ba7c9e	wifi: mac80211: improve drop for action frame return If we use a drop we not only save the extra call to dev_kfree_skb(), but also have a better reason in tracing, so do that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320091155.34daf0a89eb4.I60e0639511f9de64e40e6105b640adf90f8f57f7@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-03-25 15:39:28 +01:00
Johannes Berg	2f51c87a15	wifi: mac80211: don't ask driver about no-op link changes If the links won't actually change, nothing will happen. This was previously done in the inner function (twice in some cases), but we shouldn't bother the driver with it. Clean that up. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320091155.a8190a312a27.If4e6f5ce8228eda7afac0fc8c17dd731c5da9ed9@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-03-25 15:39:28 +01:00
Ayala Beker	80b0aacd1a	wifi: mac80211: don't select link ID if not provided in scan request If scan request doesn't include a link ID to be used for TSF reporting, don't select it as it might become inactive before scan is actually started by the driver. Instead, let the driver select one of the active links. Fixes: `cbde0b49f2` ("wifi: mac80211: Extend support for scanning while MLO connected") Signed-off-by: Ayala Beker <ayala.beker@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320091155.a6b643a15755.Ic28ed9a611432387b7f85e9ca9a97a4ce34a6e0f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-03-25 15:39:28 +01:00
Johannes Berg	21e29016d5	wifi: mac80211: clarify IEEE80211_STATUS_SUBDATA_MASK We have 13 bits for the status_data, so restricting type to 4 and subdata to 8 bits is confusing, even if we don't need more bits now. Change subdata mask to be 9 bits instead, just to make things match up. If we actually need more types or more subdata bits we can later also reshuffle the bits between these, but we should probably keep them at 13 bits together. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Ayala Beker <ayala.beker@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320091155.28ac7b665039.I1abbb13e90f016cab552492e05f5cb5b52de6463@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-03-25 15:39:28 +01:00
Johannes Berg	3c9ff1a1e1	wifi: mac80211: don't enter idle during link switch When doing link switch with a disjoint set of links before and after the switch, we end up removing all channel contexts, adding new ones later. This looks like 'idle' to the code now, and we enter idle which also includes flushing queues. But we can't actually flush since we don't have a link active (bound to a channel context), and entering idle just to leave it again is also wrong. Fix this by passing through an indication that we shouldn't do any idle checks in this case. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320091155.170328bac555.If4a522a9dd3133b91983854b909a4de13aa635da@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-03-25 15:39:28 +01:00
Ayala Beker	a17a58ad2f	wifi: mac80211: add support for tearing down negotiated TTLM In order to activate a link that is currently inactive due to a negotiated TTLM request, need to first tear down the negotiated TTLM request. Add support for sending TTLM teardown request and update the links state accordingly. Signed-off-by: Ayala Beker <ayala.beker@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.d480cbf46fcf.Idedad472469d2c27dd2a088cf80a13a1e1cf9b78@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-03-25 15:38:15 +01:00
Benjamin Berg	97f8df4db4	wifi: cfg80211: ignore non-TX BSSs in per-STA profile If a non-TX BSS is included in a per-STA profile, then we cannot set transmitted_bss for it. Even worse, if we do things properly we should be configuring both bssid_index and max_bssid_indicator correctly. We do not actually have both pieces of information (and, some APs currently do not include either). So, ignore any per-STA profile where the RNR says that the BSS is not transmitted. Also fix transmitted_bss to never be set for per-STA profiles. This fixes issues where mac80211 was setting the reference BSSID to an incorrect value. Fixes: `2481b5da9c` ("wifi: cfg80211: handle BSS data contained in ML probe responses") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.6a0babed655a.Iad447fea417c63f683da793556b97c31d07a4aab@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-03-25 15:38:15 +01:00
Benjamin Berg	c7378d7d8b	wifi: cfg80211: check BSSID Index against MaxBSSID Add a verification that the BSSID Index does not exceed the maximum number of BSSIDs in the Multiple-BSSID set. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.a7574d415adc.I02f40c2920a9f602898190679cc27d0c8ee2c67d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-03-25 15:38:14 +01:00
Benjamin Berg	0dfedd48ac	wifi: mac80211: improve association error reporting slightly There is no reason to check the request flags for each of the links, so pull that out of the loop. Also, within the loop we can set the per-link error everywhere. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.695faa9be279.I71b11a8d66a9cae4c27e242a47d1d92922609b03@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-03-25 15:38:14 +01:00
Johannes Berg	6943e00331	wifi: mac80211: add flag to disallow puncturing in 5 GHz Some devices may not be capable of handling puncturing in 5 GHz only (vs. the current flag that just removes puncturing support completely). Add a flag to support such devices: check and then downgrade the channel width if needed. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.49759510da7d.I12c5a61f0be512e0c4e574c2f794ef4b37ecaf6b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-03-25 15:38:13 +01:00
Anjaneyulu	dc63b1d083	wifi: cfg80211: handle indoor AFC/LPI AP in probe response and beacon Mark Indoor LPI and Indoor AFC power types as valid based on channel flags. While on it, added default case. Signed-off-by: Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.091cfaaa5f45.I23cfa1104a16fd4eb9751b3d0d7b158db4ff3ecd@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-03-25 15:38:13 +01:00
Anjaneyulu	56cc479188	wifi: mac80211: handle indoor AFC/LPI AP on assoc success Update power_type in bss_conf based on Indoor AFC and LPI power types received in HE 6 GHz operation element on assoc success. Signed-off-by: Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.89c25dae34ff.Ifd8b2983f400623ac03dc032fc9a20025c9ca365@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-03-25 15:38:13 +01:00
Johannes Berg	5930a9967c	wifi: mac80211: spectmgmt: simplify 6 GHz HE/EHT handling Clean up the code here a bit to have only a single call to ieee80211_chandef_he_6ghz_oper() by using a local pointer variable for the difference. Link: https://msgid.link/20240312112048.94c421d767f9.Ia7ca2f315b392c74d39b44fa9eb872a2e62e75c1@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-03-25 15:36:36 +01:00
Kang Yang	b919099eba	wifi: mac80211: supplement parsing of puncturing bitmap Current mac80211 won't parsing puncturing bitmap when process EHT Operation element in 6 GHz band or Bandwidth Indication element. This leads to puncturing bitmap cannot be updated in related situations, such as connecting to an EHT AP in 6 GHz band. So supplement parsing of puncturing bitmap for these elements. Signed-off-by: Kang Yang <quic_kangyang@quicinc.com> Link: https://msgid.link/20240312045947.576231-2-quic_kangyang@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-03-25 15:36:22 +01:00
Ayala Beker	134d715e9e	wifi: mac80211: correctly set active links upon TTLM Fix ieee80211_ttlm_set_links() to not set all active links, but instead let the driver know that valid links status changed and select the active links properly. Fixes: `8f500fbc6c` ("wifi: mac80211: process and save negotiated TID to Link mapping request") Signed-off-by: Ayala Beker <ayala.beker@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.acddbbf39584.Ide858f95248fcb3e483c97fcaa14b0cd4e964b10@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-03-25 15:23:07 +01:00
Johannes Berg	2e6bd24339	wifi: mac80211: fix prep_connection error path If prep_channel fails in prep_connection, the code releases the deflink's chanctx, which is wrong since we may be using a different link. It's already wrong to even do that always though, since we might still have the station. Remove it only if prep_channel succeeded and later updates fail. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.2780c1f08c3d.I033c9b15483933088f32a2c0789612a33dd33d82@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2024-03-25 15:23:06 +01:00

1 2 3 4 5 ...

76727 Commits