Commit Graph

75080 Commits

Dmitry Safonov
f7dca36fc5 net/tcp: Add tcp_parse_auth_options()
Introduce a helper that:
(1) shares the common code with TCP-MD5 header options parsing
(2) looks for hash signature only once for both TCP-MD5 and TCP-AO
(3) fails with -EEXIST if any TCP sign option is present twice, see
    RFC5925 (2.2):
    ">> A single TCP segment MUST NOT have more than one TCP-AO in its
    options sequence. When multiple TCP-AOs appear, TCP MUST discard
    the segment."

Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27 10:35:44 +01:00
Dmitry Safonov
1e03d32bea net/tcp: Add TCP-AO sign to outgoing packets
Using precalculated traffic keys, sign TCP segments as prescribed by
RFC5925. Per the RFC, TCP header options are included in the signature calculation:
"The TCP header, by default including options, and where the TCP
checksum and TCP-AO MAC fields are set to zero, all in network-
byte order." (5.1.3)

tcp_ao_hash_header() has an exclude_options parameter to optionally exclude
TCP options from the hash calculation, as described in RFC5925 (9.1). This
is needed for interaction with middleboxes that may change "some TCP
options". It is wired up to AO key flags and setsockopt() later.

Similarly to TCP-MD5, hash TCP segment fragments.

From this moment a user can start sending TCP-AO-signed segments with
any of the crypto ahash algorithms supported by the Linux kernel. The
MAC length can be user-specified, either to save TCP option header space
or to provide stronger protection using a longer signature.
Inbound segments are not yet verified; the TCP-AO option is ignored and
they are accepted.

Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27 10:35:44 +01:00
Dmitry Safonov
7c2ffaf21b net/tcp: Calculate TCP-AO traffic keys
Add traffic key calculation the way it's described in RFC5926.
Wire it up to tcp_finish_connect() and cache the new keys straight away
on already established TCP connections.

Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27 10:35:44 +01:00
Dmitry Safonov
0aadc73995 net/tcp: Prevent TCP-MD5 with TCP-AO being set
Be as conservative as possible: if there is a TCP-MD5 key for a given peer,
regardless of L3 interface, don't allow setting a TCP-AO key for the same
peer. According to RFC5925, TCP-AO is supposed to replace TCP-MD5 and
there can't be any switch between the two on any connected tuple.
This can be relaxed later, if there is a use case, but to begin with
restrict any intersection.

Note: it should still be possible to set both TCP-MD5 and TCP-AO keys
on a listening socket for *different* peers.

Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27 10:35:44 +01:00
Dmitry Safonov
4954f17dde net/tcp: Introduce TCP_AO setsockopt()s
Add 3 setsockopt()s:
1. TCP_AO_ADD_KEY to add a new Master Key Tuple (MKT) on a socket
2. TCP_AO_DEL_KEY to delete present MKT from a socket
3. TCP_AO_INFO to change flags, Current_key/RNext_key on a TCP-AO sk

Userspace has to introduce keys on every socket it wants to use the TCP-AO
option on, similarly to TCP_MD5SIG/TCP_MD5SIG_EXT.
RFC5925 prohibits defining MKTs that would match the same peer, so do
sanity checks on the data provided by userspace. Be as conservative as
possible, including refusing to define an MKT on an established
connection with no AO, refusing to remove the key in use, etc.

(1) and (2) are to be used by a userspace key manager to add/remove keys.
The main purpose of (3) is to set RNext_key, which (as prescribed by
RFC5925) is the KeyID that will be requested in the TCP-AO header from
the peer to sign their segments with.

At this moment the life of ao_info ends in tcp_v4_destroy_sock().
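
A hedged user-space sketch of adding an MKT with (1); TCP_AO_ADD_KEY and struct tcp_ao_add come from the uAPI header added by this series, and the field names used below are assumptions for illustration, not a reference:

  #include <string.h>
  #include <netinet/in.h>
  #include <sys/socket.h>
  #include <linux/tcp.h>          /* TCP_AO_ADD_KEY, struct tcp_ao_add (new kernels) */

  static int tcp_ao_add_key(int sk, const struct sockaddr_in *peer,
                            const void *key, unsigned char keylen,
                            unsigned char sndid, unsigned char rcvid)
  {
  #ifdef TCP_AO_ADD_KEY
          struct tcp_ao_add ao = { 0 };           /* layout assumed, see above */

          memcpy(&ao.addr, peer, sizeof(*peer));  /* peer this MKT matches */
          strcpy(ao.alg_name, "hmac(sha1)");      /* any supported crypto ahash */
          ao.prefix = 32;                         /* full IPv4 address match */
          ao.sndid  = sndid;                      /* SendID placed in the TCP-AO header */
          ao.rcvid  = rcvid;                      /* RecvID expected from the peer */
          ao.keylen = keylen;
          memcpy(ao.key, key, keylen);

          return setsockopt(sk, IPPROTO_TCP, TCP_AO_ADD_KEY, &ao, sizeof(ao));
  #else
          (void)sk; (void)peer; (void)key; (void)keylen; (void)sndid; (void)rcvid;
          return -1;                              /* headers predate TCP-AO */
  #endif
  }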

Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27 10:35:44 +01:00
Dmitry Safonov
c845f5f359 net/tcp: Add TCP-AO config and structures
Introduce new kernel config option and common structures as well as
helpers to be used by TCP-AO code.

Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27 10:35:44 +01:00
Dmitry Safonov
8c73b26315 net/tcp: Prepare tcp_md5sig_pool for TCP-AO
TCP-AO, similarly to TCP-MD5, needs to allocate tfms on the slow path,
which is setsockopt(), and use crypto ahash requests on the fast paths,
which are the RX/TX softirqs. It also needs a temporary/scratch buffer
for preparing the hash.

Rework tcp_md5sig_pool in order to support hashing algorithms other than
MD5. This makes it possible to share pre-allocated crypto_ahash
descriptors and the scratch area between all TCP hash users.

Internally, tcp_sigpool calls the crypto_clone_ahash() API on a
pre-allocated crypto ahash tfm. Kudos to Herbert, who provided this new
crypto API [1].

I was a little concerned over GFP_ATOMIC allocations of ahash and
crypto_request in RX/TX (see tcp_sigpool_start()), so I benchmarked both
"backends" with different algorithms, using a patched version of iperf3 [2].
On my laptop with i7-7600U @ 2.80GHz:

                         clone-tfm                per-CPU-requests
TCP-MD5                  2.25 Gbits/sec           2.30 Gbits/sec
TCP-AO(hmac(sha1))       2.53 Gbits/sec           2.54 Gbits/sec
TCP-AO(hmac(sha512))     1.67 Gbits/sec           1.64 Gbits/sec
TCP-AO(hmac(sha384))     1.77 Gbits/sec           1.80 Gbits/sec
TCP-AO(hmac(sha224))     1.29 Gbits/sec           1.30 Gbits/sec
TCP-AO(hmac(sha3-512))    481 Mbits/sec            480 Mbits/sec
TCP-AO(hmac(md5))        2.07 Gbits/sec           2.12 Gbits/sec
TCP-AO(hmac(rmd160))     1.01 Gbits/sec            995 Mbits/sec
TCP-AO(cmac(aes128))     [not supported yet]      2.11 Gbits/sec

So, it seems that my concerns don't have strong grounds, and the per-CPU
crypto_request allocation can be dropped/removed from tcp_sigpool once
ciphers get crypto_clone_ahash() support.

[1]: https://lore.kernel.org/all/ZDefxOq6Ax0JeTRH@gondor.apana.org.au/T/#u
[2]: https://github.com/0x7f454c46/iperf/tree/tcp-md5-ao
Signed-off-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27 10:35:44 +01:00
Jakub Kicinski
edd68156bc wireless-next patches for v6.7

Merge tag 'wireless-next-2023-10-26' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.7

The third, and most likely the last, features pull request for v6.7.
Fixes all over and only a few small new features.

Major changes:

iwlwifi
 - more Multi-Link Operation (MLO) work

ath12k
 - QCN9274: mesh support

ath11k
 - firmware-2.bin container file format support

* tag 'wireless-next-2023-10-26' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (155 commits)
  wifi: ray_cs: Remove unnecessary (void*) conversions
  Revert "wifi: ath11k: call ath11k_mac_fils_discovery() without condition"
  wifi: ath12k: Introduce and use ath12k_sta_to_arsta()
  wifi: ath12k: fix htt mlo-offset event locking
  wifi: ath12k: fix dfs-radar and temperature event locking
  wifi: ath11k: fix gtk offload status event locking
  wifi: ath11k: fix htt pktlog locking
  wifi: ath11k: fix dfs radar event locking
  wifi: ath11k: fix temperature event locking
  wifi: ath12k: rename the sc naming convention to ab
  wifi: ath12k: rename the wmi_sc naming convention to wmi_ab
  wifi: ath11k: add firmware-2.bin support
  wifi: ath11k: qmi: refactor ath11k_qmi_m3_load()
  wifi: rtw89: cleanup firmware elements parsing
  wifi: rt2x00: rework MT7620 PA/LNA RF calibration
  wifi: rt2x00: rework MT7620 channel config function
  wifi: rt2x00: improve MT7620 register initialization
  MAINTAINERS: wifi: rt2x00: drop Helmut Schaa
  wifi: wlcore: main: replace deprecated strncpy with strscpy
  wifi: wlcore: boot: replace deprecated strncpy with strscpy
  ...
====================

Link: https://lore.kernel.org/r/20231026090411.B2426C433CB@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26 20:27:58 -07:00
Jakub Kicinski
c6f9b7138b bpf-next-for-netdev

Merge tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-10-26

We've added 51 non-merge commits during the last 10 day(s) which contain
a total of 75 files changed, 5037 insertions(+), 200 deletions(-).

The main changes are:

1) Add open-coded task, css_task and css iterator support.
   One of the use cases is customizable OOM victim selection via BPF,
   from Chuyi Zhou.

2) Fix BPF verifier's iterator convergence logic to use exact states
   comparison for convergence checks, from Eduard Zingerman,
   Andrii Nakryiko and Alexei Starovoitov.

3) Add BPF programmable net device where bpf_mprog defines the logic
   of its xmit routine. It can operate in L3 and L2 mode,
   from Daniel Borkmann and Nikolay Aleksandrov.

4) Batch of fixes for BPF per-CPU kptr and re-enable unit_size checking
   for global per-CPU allocator, from Hou Tao.

5) Fix libbpf which eagerly assumed that SHT_GNU_verdef ELF section
   was going to be present whenever a binary has SHT_GNU_versym section,
   from Andrii Nakryiko.

6) Fix BPF ringbuf correctness to fold smp_mb__before_atomic() into
   atomic_set_release(), from Paul E. McKenney.

7) Add a warning if NAPI callback missed xdp_do_flush() under
   CONFIG_DEBUG_NET which helps checking if drivers were missing
   the former, from Sebastian Andrzej Siewior.

8) Fix missed RCU read-lock in bpf_task_under_cgroup() which was throwing
   a warning under sleepable programs, from Yafang Shao.

9) Avoid unnecessary -EBUSY from htab_lock_bucket by disabling IRQ before
   checking map_locked, from Song Liu.

10) Make BPF CI linked_list failure test more robust,
    from Kumar Kartikeya Dwivedi.

11) Enable samples/bpf to be built as PIE in Fedora, from Viktor Malik.

12) Fix xsk starving when multiple xsk sockets were associated with
    a single xsk_buff_pool, from Albert Huang.

13) Clarify the signed modulo implementation for the BPF ISA standardization
    document that it uses truncated division, from Dave Thaler.

14) Improve BPF verifier's JEQ/JNE branch taken logic to also consider
    signed bounds knowledge, from Andrii Nakryiko.

15) Add an option to XDP selftests to use multi-buffer AF_XDP
    xdp_hw_metadata and mark used XDP programs as capable to use frags,
    from Larysa Zaremba.

16) Fix bpftool's BTF dumper wrt printing a pointer value and another
    one to fix struct_ops dump in an array, from Manu Bretelle.

* tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (51 commits)
  netkit: Remove explicit active/peer ptr initialization
  selftests/bpf: Fix selftests broken by mitigations=off
  samples/bpf: Allow building with custom bpftool
  samples/bpf: Fix passing LDFLAGS to libbpf
  samples/bpf: Allow building with custom CFLAGS/LDFLAGS
  bpf: Add more WARN_ON_ONCE checks for mismatched alloc and free
  selftests/bpf: Add selftests for netkit
  selftests/bpf: Add netlink helper library
  bpftool: Extend net dump with netkit progs
  bpftool: Implement link show support for netkit
  libbpf: Add link-based API for netkit
  tools: Sync if_link uapi header
  netkit, bpf: Add bpf programmable net device
  bpf: Improve JEQ/JNE branch taken logic
  bpf: Fold smp_mb__before_atomic() into atomic_set_release()
  bpf: Fix unnecessary -EBUSY from htab_lock_bucket
  xsk: Avoid starving the xsk further down the list
  bpf: print full verifier states on infinite loop detection
  selftests/bpf: test if state loops are detected in a tricky case
  bpf: correct loop detection for iterators convergence
  ...
====================

Link: https://lore.kernel.org/r/20231026150509.2824-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26 20:02:41 -07:00
Jakub Kicinski
ea23fbd2a8 netlink: make range pointers in policies const
struct nla_policy is usually constant itself, but unless
we make the range pointers inside it const we won't be able to
make the range structs const. The ranges are not modified
by the core.
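
A small kernel-style sketch of what this enables (attribute names are placeholders; NLA_POLICY_FULL_RANGE() and struct netlink_range_validation come from include/net/netlink.h): with const range pointers, both the policy and its ranges can live in rodata.

  /* Placeholder attribute, for illustration only. */
  enum { EXAMPLE_ATTR_UNSPEC, EXAMPLE_ATTR_PORT, __EXAMPLE_ATTR_MAX };
  #define EXAMPLE_ATTR_MAX (__EXAMPLE_ATTR_MAX - 1)

  /* The range struct itself can now be const ... */
  static const struct netlink_range_validation example_port_range = {
          .min = 1,
          .max = 65535,
  };

  /* ... and the policy keeps pointing at it through a const pointer. */
  static const struct nla_policy example_policy[EXAMPLE_ATTR_MAX + 1] = {
          [EXAMPLE_ATTR_PORT] = NLA_POLICY_FULL_RANGE(NLA_U32, &example_port_range),
  };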

Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231025162204.132528-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26 16:24:09 -07:00
Jakub Kicinski
ec4c20ca09 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

net/mac80211/rx.c
  91535613b6 ("wifi: mac80211: don't drop all unprotected public action frames")
  6c02fab724 ("wifi: mac80211: split ieee80211_drop_unencrypted_mgmt() return value")

Adjacent changes:

drivers/net/ethernet/apm/xgene/xgene_enet_main.c
  61471264c0 ("net: ethernet: apm: Convert to platform remove callback returning void")
  d2ca43f306 ("net: xgene: Fix unused xgene_enet_of_match warning for !CONFIG_OF")

net/vmw_vsock/virtio_transport.c
  64c99d2d6a ("vsock/virtio: support to send non-linear skb")
  53b08c4985 ("vsock/virtio: initialize the_virtio_vsock before using VQs")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26 13:46:28 -07:00
Paolo Abeni
3967336126 netfilter pull request 23-10-25

Merge tag 'nf-next-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next. Mostly
nf_tables updates with two patches for connlabel and br_netfilter.

1) Rename function name to perform on-demand GC for rbtree elements,
   and replace async GC in rbtree by sync GC. Patches from Florian Westphal.

2) Use commit_mutex for NFT_MSG_GETRULE_RESET to ensure that two
   concurrent threads invoking this command do not underrun stateful
   objects. Patches from Phil Sutter.

3) Use single hook to deal with IP and ARP packets in br_netfilter.
   Patch from Florian Westphal.

4) Use atomic_t in netns->connlabel use counter instead of using a
   spinlock, also patch from Florian.

5) Cleanups for stateful objects infrastructure in nf_tables.
   Patches from Phil Sutter.

6) Flush path uses the opaque set element offered by the iterator, instead of
calling pipapo_deactivate() which looks it up again.

7) The set backend .flush interface always succeeds, make it return void
instead.

8) Add a struct nft_elem_priv placeholder structure and use it to replace the
void * that passes the opaque set element representation from backend to
frontend and defeats compiler type checks.

9) Shrink memory consumption of set element transactions, by reducing
   struct nft_trans_elem object size and reducing stack memory usage.

10) Use struct nft_elem_priv also for set backend .insert operation too.

11) Carry reset flag in nft_set_dump_ctx structure, instead of passing it
    as a function argument, from Phil Sutter.

* tag 'nf-next-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: Carry reset boolean in nft_set_dump_ctx
  netfilter: nf_tables: set->ops->insert returns opaque set element in case of EEXIST
  netfilter: nf_tables: shrink memory consumption of set elements
  netfilter: nf_tables: expose opaque set element as struct nft_elem_priv
  netfilter: nf_tables: set backend .flush always succeeds
  netfilter: nft_set_pipapo: no need to call pipapo_deactivate() from flush
  netfilter: nf_tables: Carry reset boolean in nft_obj_dump_ctx
  netfilter: nf_tables: nft_obj_filter fits into cb->ctx
  netfilter: nf_tables: Carry s_idx in nft_obj_dump_ctx
  netfilter: nf_tables: A better name for nft_obj_filter
  netfilter: nf_tables: Unconditionally allocate nft_obj_filter
  netfilter: nf_tables: Drop pointless memset in nf_tables_dump_obj
  netfilter: conntrack: switch connlabels to atomic_t
  br_netfilter: use single forward hook for ip and arp
  netfilter: nf_tables: Add locking for NFT_MSG_GETRULE_RESET requests
  netfilter: nf_tables: Introduce nf_tables_getrule_single()
  netfilter: nf_tables: Open-code audit log call in nf_tables_getrule()
  netfilter: nft_set_rbtree: prefer sync gc to async worker
  netfilter: nft_set_rbtree: rename gc deactivate+erase function
====================

Link: https://lore.kernel.org/r/20231025212555.132775-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-26 12:20:35 +02:00
Alex Henrie
629df6701c net: ipv6/addrconf: clamp preferred_lft to the minimum required
If the preferred lifetime was less than the minimum required lifetime,
ipv6_create_tempaddr would error out without creating any new address.
On my machine and network, this error happened immediately with the
preferred lifetime set to 1 second, after a few minutes with the
preferred lifetime set to 4 seconds, and not at all with the preferred
lifetime set to 5 seconds. During my investigation, I found a Stack
Exchange post from another person who seems to have had the same
problem: They stopped getting new addresses if they lowered the
preferred lifetime below 3 seconds, and they didn't really know why.

The preferred lifetime is a preference, not a hard requirement. The
kernel does not strictly forbid new connections on a deprecated address,
nor does it guarantee that the address will be disposed of the instant
its total valid lifetime expires. So rather than disable IPv6 privacy
extensions altogether if the minimum required lifetime swells above the
preferred lifetime, it is more in keeping with the user's intent to
increase the temporary address's lifetime to the minimum necessary for
the current network conditions.

With these fixes, setting the preferred lifetime to 3 or 4 seconds "just
works" because the extra fraction of a second is practically
unnoticeable. It's even possible to reduce the time before deprecation
to 1 or 2 seconds by also disabling duplicate address detection (setting
/proc/sys/net/ipv6/conf/*/dad_transmits to 0). I realize that that is a
pretty niche use case, but I know at least one person who would gladly
sacrifice performance and convenience to be sure that they are getting
the maximum possible level of privacy.

Link: https://serverfault.com/a/1031168/310447
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20231024212312.299370-3-alexhenrie24@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-25 18:23:06 -07:00
Alex Henrie
bfbf81b310 net: ipv6/addrconf: clamp preferred_lft to the maximum allowed
Without this patch, there is nothing to stop the preferred lifetime of a
temporary address from being greater than its valid lifetime. If that
was the case, the valid lifetime was effectively ignored.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20231024212312.299370-2-alexhenrie24@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-25 18:23:06 -07:00
Yan Zhai
03d6c848bf ipv6: avoid atomic fragment on GSO packets
When the IPv6 stack outputs a GSO packet, if its gso_size is larger than
the dst MTU, then all segments would be fragmented. However, it is possible
for a GSO packet to have a trailing segment with a smaller actual size
than both gso_size and the MTU, which leads to an "atomic
fragment". Atomic fragments are considered harmful in RFC-8021. An
existing report from APNIC also shows that atomic fragments are more
likely to be dropped even though they are equivalent to a no-op [1].

Add an extra check in the GSO slow output path. For each segment from
the original over-sized packet, if it fits within the path MTU, then avoid
generating an atomic fragment.
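
A simplified sketch of the per-segment decision described above (plain C; this is not the actual ip6_finish_output()/ip6_fragment() code, and the callback shape is purely illustrative):

  #include <stddef.h>

  /* For each segment split out of the over-sized GSO packet, only take the
   * fragmentation path when the segment really exceeds the path MTU; a
   * trailing short segment is sent as-is instead of becoming an "atomic
   * fragment" (fragment header, offset 0, more-fragments bit clear). */
  static int output_one_segment(size_t seg_len, size_t path_mtu,
                                int (*xmit)(void *seg), int (*fragment)(void *seg),
                                void *seg)
  {
          if (seg_len <= path_mtu)
                  return xmit(seg);
          return fragment(seg);
  }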

Link: https://www.potaroo.net/presentations/2022-03-01-ipv6-frag.pdf [1]
Fixes: b210de4f8c ("net: ipv6: Validate GSO SKB before finish IPv6 processing")
Reported-by: David Wragg <dwragg@cloudflare.com>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Link: https://lore.kernel.org/r/90912e3503a242dca0bc36958b11ed03a2696e5e.1698156966.git.yan@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-25 18:04:29 -07:00
Yan Zhai
1f7ec1b372 ipv6: refactor ip6_finish_output for GSO handling
Separate GSO and non-GSO packet handling to make the logic cleaner. For
GSO packets, the frag_max_size check can be omitted because it is only
useful for packets defragmented by netfilter hooks. Neither local output
nor GRO logic will produce GSO packets when defragmentation is needed.
This also mirrors what the IPv4-side code does.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/0e1d4599f858e2becff5c4fe0b5f843236bc3fe8.1698156966.git.yan@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-25 18:04:29 -07:00
Yan Zhai
e57a344785 ipv6: drop feature RTAX_FEATURE_ALLFRAG
RTAX_FEATURE_ALLFRAG was added before the first git commit:

https://www.mail-archive.com/bk-commits-head@vger.kernel.org/msg03399.html

The feature would send packets to the fragmentation path if a box
received a PMTU value of less than 1280 bytes. However, since commit
9d289715eb ("ipv6: stop sending PTB packets for MTU < 1280"), such
messages are simply discarded. The feature flag is not supported by the
iproute2 utility either. In theory one can still manipulate it with a
direct netlink message, but it is not ideal because it was based on the
obsoleted guidance of RFC-2460 (replaced by RFC-8200).

The feature always tests false at the moment, so remove the related
code or mark it as unused.

Signed-off-by: Yan Zhai <yan@cloudflare.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/d78e44dcd9968a252143ffe78460446476a472a1.1698156966.git.yan@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-25 18:04:29 -07:00
Jakub Kicinski
5e5d8b94a4 netfilter pull request 23-10-25

Merge tag 'nf-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

This pull request contains two late Netfilter flowtable fixes for net:

1) Flowtable GC pushes packets back to the classic path in every GC run,
   i.e. every second. This is because NF_FLOW_HW_ESTABLISHED is only
   used by sched/act_ct (never set) and IPS_SEEN_REPLY might be unset
   by the time the flow is offloaded (this status bit is only reliable
   in the sched/act_ct datapath).

2) The sched/act_ct logic to push packets back to the classic path to
   reevaluate whether a UDP flow is unidirectional only applies if
   IPS_HW_OFFLOAD_BIT is set and no hardware offload request is pending
   to be handled. From Vlad Buslov.

These two patches fix two problems that were introduced in the
previous 6.5 development cycle.

* tag 'nf-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  net/sched: act_ct: additional checks for outdated flows
  netfilter: flowtable: GC pushes back packets to classic path
====================

Link: https://lore.kernel.org/r/20231025100819.2664-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-25 16:02:06 -07:00
Alexandru Matei
53b08c4985 vsock/virtio: initialize the_virtio_vsock before using VQs
Once VQs are filled with empty buffers and we kick the host, it can send
connection requests. If the_virtio_vsock is not initialized before,
replies are silently dropped and do not reach the host.

virtio_transport_send_pkt() can queue packets once the_virtio_vsock is
set, but they won't be processed until vsock->tx_run is set to true. We
queue vsock->send_pkt_work when initialization finishes to send those
packets queued earlier.

Fixes: 0deab087b1 ("vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock")
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20231024191742.14259-1-alexandru.matei@uipath.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-25 15:49:47 -07:00
Paolo Abeni
8005184fd1 mptcp: refactor sndbuf auto-tuning
The MPTCP protocol accounts the data enqueued on all the subflows
to the main socket send buffer, while the send buffer auto-tuning
algorithm sets the main socket send buffer size to the max size among
the subflows.

That causes bad performance when at least one subflow is sndbuf
limited, e.g. due to very high latency, as the MPTCP scheduler can't
even fill such a buffer.

Change the send-buffer auto-tuning algorithm to compute the main socket
send buffer size as the sum of all the subflows' buffer sizes.
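
A minimal sketch of the new auto-tuning rule (illustrative only; the real code walks the msk's subflow list under the proper locks):

  /* Old rule: sndbuf = max(subflow sndbuf); new rule: sndbuf = sum of them,
   * so one high-latency, sndbuf-limited subflow no longer caps the others. */
  static unsigned long mptcp_sndbuf_autotune(const unsigned int *subflow_sndbuf,
                                             unsigned int nr_subflows)
  {
          unsigned long sum = 0;
          unsigned int i;

          for (i = 0; i < nr_subflows; i++)
                  sum += subflow_sndbuf[i];
          return sum;
  }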

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-2-v1-9-9dc60939d371@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-25 12:23:35 -07:00
Paolo Abeni
9fdc779331 mptcp: ignore notsent_lowat setting at the subflow level
Any latency-related tuning taking action at the subflow level does
not really affect user space, as only the main MPTCP socket is
relevant.

Anyway, any limiting setting may foul the MPTCP scheduler, making it
unable to fully use the subflow-level cwnd, leading to very poor b/w
usage.

Enforce notsent_lowat to be a no-op on every subflow.

Note that TCP_NOTSENT_LOWAT is currently not supported, and properly
dealing with that will require more invasive changes.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-2-v1-8-9dc60939d371@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-25 12:23:34 -07:00
Paolo Abeni
a1ab24e5fc mptcp: consolidate sockopt synchronization
Move the socket option synchronization for active subflows
to subflow creation time. This allows removing the now-unused
unlocked variant of such helper.

While at it, clean up a bit the mptcp_subflow_create_socket()
error path.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-2-v1-7-9dc60939d371@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-25 12:23:34 -07:00
Paolo Abeni
0ffe8e7490 mptcp: use copy_from_iter helpers on transmit
The perf traces show a high cost for the MPTCP transmit path memcpy.

It turns out that the helper currently in use carries quite a bit
of unneeded overhead, e.g. to map/unmap the memory pages.

Moving to the 'copy_from_iter' variant removes such overhead and
additionally gains the no-cache support.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-2-v1-6-9dc60939d371@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-25 12:23:34 -07:00
Paolo Abeni
5684ab1a0e mptcp: give rcvlowat some love
The MPTCP protocol allows setting sk_rcvlowat, but the value there
is currently ignored.

Additionally, the default subflow sk_rcvlowat basically disables the
per-subflow delayed ack: the MPTCP protocol moves the incoming data from
the subflows into the msk socket as soon as the TCP stack invokes the
subflow data_ready callback. Later, when __tcp_ack_snd_check() takes
action, the subflow-level copied_seq matches rcv_nxt, and that mandates
an immediate ack.

Let the MPTCP receive path be aware of such a threshold, explicitly
tracking the amount of data available to be read and checking it against
sk_rcvlowat in mptcp_poll() and before waking up readers.

Additionally, implement the set_rcvlowat() callback, to properly handle
the rcvbuf auto-tuning on sk_rcvlowat changes.

Finally, to properly handle delayed acks, force the subflow-level threshold
to 0 and instead explicitly ask for an immediate ack when the msk-level
threshold is not reached.
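
A condensed sketch of the resulting receive-side behaviour (illustrative only; callback shape assumed):

  /* With the subflow-level threshold forced to 0, the msk decides: wake
   * readers once sk_rcvlowat bytes are ready, otherwise ack immediately. */
  static void msk_data_ready(unsigned int bytes_ready, unsigned int rcvlowat,
                             void (*wake_readers)(void),
                             void (*send_immediate_ack)(void))
  {
          if (bytes_ready >= rcvlowat)
                  wake_readers();
          else
                  send_immediate_ack();
  }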

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-2-v1-5-9dc60939d371@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-25 12:23:34 -07:00
Paolo Abeni
f1f26512a9 mptcp: use plain bool instead of custom binary enum
The 'data_avail' subflow field is already used as a plain boolean;
drop the custom binary enum type and switch to bool.

No functional change intended.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-2-v1-3-9dc60939d371@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-25 12:23:34 -07:00
Paolo Abeni
bf0e96108f mptcp: properly account fastopen data
Currently the socket-level counter aggregating the received data
does not take into account the data received via fastopen.

Address the issue by updating the counter as required.

Fixes: 38967f424b ("mptcp: track some aggregate data counters")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-2-v1-2-9dc60939d371@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-25 12:23:34 -07:00
Paolo Abeni
d866ae9aaa mptcp: add a new sysctl for make after break timeout
The MPTCP protocol allows sockets with no alive subflows to stay
in ESTABLISHED status for a user-defined timeout, to allow for
later subflow creation.

Currently such a timeout is constant - TCP_TIMEWAIT_LEN. Let
user space configure it via a newly added sysctl, to better cope
with busy servers and to simplify (make them faster) the relevant
packetdrill tests.

Note that the new knob does not apply to orphaned MPTCP sockets
waiting for the data_fin handshake completion: they always wait
TCP_TIMEWAIT_LEN.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-2-v1-1-9dc60939d371@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-25 12:23:33 -07:00
Deming Wang
1711435e3e net: ipv6: fix typo in comments
The word "advertize" should be replaced by "advertise".

Signed-off-by: Deming Wang <wangdeming@inspur.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-25 10:38:07 +01:00
Deming Wang
197f9fba96 net: ipv4: fix typo in comments
The word "advertize" should be replaced by "advertise".

Signed-off-by: Deming Wang <wangdeming@inspur.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-25 10:38:07 +01:00
Vlad Buslov
a63b662212 net/sched: act_ct: additional checks for outdated flows
The current nf_flow_is_outdated() implementation considers for teardown any
flow table flow whose state diverged from its underlying CT connection
status, which can be problematic in the following cases:

- The flow has never been offloaded to hardware in the first place, either
because the flow table has hardware offload disabled (the flag
NF_FLOWTABLE_HW_OFFLOAD is not set) or because it is still pending on the
'add' workqueue to be offloaded for the first time. The former case is
incorrect, the latter generates excessive deletions and additions of flows.

- The flow is already pending to be updated on the workqueue. Tearing down
such flows will also generate excessive removals from the flow table,
especially on a highly loaded system where the latency to re-offload a flow
via the 'add' workqueue can be quite high.

When considering a flow for teardown as outdated, verify that it is both
offloaded to hardware and doesn't have any pending updates.
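
A sketch of the tightened teardown predicate (illustrative; the flag names come from the text above, the predicate shape is assumed):

  #include <stdbool.h>

  /* Only treat a flow as outdated when its state diverged AND it has really
   * been offloaded to hardware AND no update is still pending on the 'add'
   * workqueue; otherwise leave it alone instead of churning the flow table. */
  static bool flow_is_outdated(bool state_diverged, bool hw_offloaded,
                               bool update_pending)
  {
          return state_diverged && hw_offloaded && !update_pending;
  }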

Fixes: 41f2c7c342 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple")
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-25 11:35:57 +02:00
Pablo Neira Ayuso
735795f68b netfilter: flowtable: GC pushes back packets to classic path
Since 41f2c7c342 ("net/sched: act_ct: Fix promotion of offloaded
unreplied tuple"), flowtable GC pushes back flows with IPS_SEEN_REPLY
back to classic path in every run, ie. every second. This is because of
a new check for NF_FLOW_HW_ESTABLISHED which is specific of sched/act_ct.

In Netfilter's flowtable case, NF_FLOW_HW_ESTABLISHED never gets set on
and IPS_SEEN_REPLY is unreliable since users decide when to offload the
flow before, such bit might be set on at a later stage.

Fix it by adding a custom .gc handler that sched/act_ct can use to
deal with its NF_FLOW_HW_ESTABLISHED bit.

Fixes: 41f2c7c342 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple")
Reported-by: Vladimir Smelhaus <vl.sm@email.cz>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-25 11:35:46 +02:00
Florian Westphal
70f06c115b sched: act_ct: switch to per-action label counting
net->ct.labels_used was meant to convey 'number of ip/nftables rules
that need the label extension allocated'.

act_ct enables this for each net namespace, which voids all attempts
to avoid ct->ext allocation when possible.

Move this increment to the control plane to request label extension
space allocation only when it's needed.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-25 10:24:04 +01:00
Jakub Kicinski
00d67093e4 Three more fixes:

Merge tag 'wireless-2023-10-24' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Three more fixes:
 - don't drop all unprotected public action frames since
   some don't have a protected dual
 - fix pointer confusion in scanning code
 - fix warning in some connections with multiple links

* tag 'wireless-2023-10-24' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: mac80211: don't drop all unprotected public action frames
  wifi: cfg80211: fix assoc response warning on failed links
  wifi: cfg80211: pass correct pointer to rdev_inform_bss()
====================

Link: https://lore.kernel.org/r/20231024103540.19198-2-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24 13:10:53 -07:00
Florian Fainelli
87cd83714f net: dsa: Rename IFLA_DSA_MASTER to IFLA_DSA_CONDUIT
This preserves the existing IFLA_DSA_MASTER which is part of the uAPI
and creates an alias named IFLA_DSA_CONDUIT.
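
One way such a uAPI alias can be expressed (a sketch, not necessarily the exact hunk):

  /* Existing enum value is preserved; the new name aliases it, so old
   * userspace binaries and sources keep working unchanged. */
  enum {
          IFLA_DSA_UNSPEC,
          IFLA_DSA_MASTER,                /* existing uAPI value, preserved */
          __IFLA_DSA_MAX,
  };
  /* New, preferred spelling; same value as the old name. */
  #define IFLA_DSA_CONDUIT IFLA_DSA_MASTER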

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20231023181729.1191071-3-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24 13:08:14 -07:00
Florian Fainelli
6ca80638b9 net: dsa: Use conduit and user terms
Use more inclusive terms throughout the DSA subsystem by moving away
from "master" which is replaced by "conduit" and "slave" which is
replaced by "user". No functional changes.

Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20231023181729.1191071-2-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24 13:08:14 -07:00
Jakub Kicinski
ce4cfa2318 net: remove else after return in dev_prep_valid_name()
Remove unnecessary else clauses after return.
I copied this if / else construct from somewhere;
it makes the code harder to read.
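
For illustration, the shape of the cleanup (a generic example, not the actual dev_prep_valid_name() hunk):

  #include <errno.h>

  /* Before: the else clause is redundant once the if-branch returns. */
  static int check_with_else(int ok)
  {
          if (ok)
                  return 0;
          else
                  return -EINVAL;
  }

  /* After: same behaviour, one less construct to read. */
  static int check_without_else(int ok)
  {
          if (ok)
                  return 0;
          return -EINVAL;
  }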

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231023152346.3639749-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24 13:02:59 -07:00
Jakub Kicinski
70e1b14c1b net: remove dev_valid_name() check from __dev_alloc_name()
__dev_alloc_name() is only called by dev_prep_valid_name(),
which already checks that name is valid.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231023152346.3639749-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24 13:02:58 -07:00
Jakub Kicinski
7ad17b04dc net: trust the bitmap in __dev_alloc_name()
Prior to restructuring, __dev_alloc_name() handled both printf
and non-printf names. In a clever attempt at code reuse it
always printed the name into a buffer and checked whether it was
a duplicate.

Trust the bitmap, and return an error if it's full.

This shrinks the possible ID space by one from 32K to 32K - 1,
as previously the max value would have been tried as a valid ID.
It seems very unlikely that anyone would care as we heard
no requests to increase the max beyond 32k.
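
A sketch of the bitmap-driven allocation this relies on (illustrative; the real code uses the kernel bitmap helpers over the 32K possible IDs):

  #include <stdint.h>

  #define MAX_NETDEV_IDS 32768            /* 32K possible "%d" values */

  /* Return the first free ID, or -1 when the bitmap is full; previously the
   * max value itself would still have been tried as a candidate name. */
  static int alloc_name_id(const uint8_t *used /* MAX_NETDEV_IDS bits */)
  {
          int i;

          for (i = 0; i < MAX_NETDEV_IDS; i++)
                  if (!(used[i / 8] & (1u << (i % 8))))
                          return i;
          return -1;
  }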

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231023152346.3639749-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24 13:02:58 -07:00
Jakub Kicinski
9a81046812 net: reduce indentation of __dev_alloc_name()
All callers of __dev_valid_name() go thru dev_prep_valid_name()
which handles the non-printf case. Focus __dev_alloc_name() on
the sprintf case, remove the indentation level.

Minor functional change of returning -EINVAL if % is not found,
which should now never happen.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231023152346.3639749-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24 13:02:58 -07:00
Jakub Kicinski
556c755a4d net: make dev_alloc_name() call dev_prep_valid_name()
__dev_alloc_name() handles both the sprintf and non-sprintf
target names. This complicates the code.

dev_prep_valid_name() already handles the non-sprintf case
before calling __dev_alloc_name(); make the only other caller
also go thru dev_prep_valid_name(). This way we can drop
the non-sprintf handling in __dev_alloc_name() in one of
the next changes.

commit 55a5ec9b77 ("Revert "net: core: dev_get_valid_name is now the same as dev_alloc_name_ns"") and
commit 029b6d1405 ("Revert "net: core: maybe return -EEXIST in __dev_alloc_name"")
tell us that we can't start returning -EEXIST from dev_alloc_name()
on name duplicates. Bite the bullet and pass the expected errno to
dev_prep_valid_name().

dev_prep_valid_name() must now propagate out the allocated id
for printf names.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231023152346.3639749-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24 13:02:58 -07:00
Jakub Kicinski
bd07063dd1 net: don't use input buffer of __dev_alloc_name() as a scratch space
Callers of __dev_alloc_name() want to pass dev->name as
the output buffer. Make __dev_alloc_name() not clobber
that buffer on failure, and remove the workarounds
in callers.

dev_alloc_name_ns() is now completely unnecessary.

The extra strscpy() added here will be gone by the end
of the patch series.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231023152346.3639749-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24 13:02:58 -07:00
Davide Caratti
aab4d85649 net: mptcp: use policy generated by YAML spec
generated with:

 $ ./tools/net/ynl/ynl-gen-c.py --mode kernel \
 > --spec Documentation/netlink/specs/mptcp.yaml --source \
 > -o net/mptcp/mptcp_pm_gen.c
 $ ./tools/net/ynl/ynl-gen-c.py --mode kernel \
 > --spec Documentation/netlink/specs/mptcp.yaml --header \
 > -o net/mptcp/mptcp_pm_gen.h

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/340
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-7-16b1f701f900@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24 13:00:32 -07:00
Davide Caratti
1e07938e29 net: mptcp: rename netlink handlers to mptcp_pm_nl_<blah>_{doit,dumpit}
so that they will match the names generated from the YAML spec.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-6-16b1f701f900@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24 13:00:32 -07:00
Davide Caratti
1d0507f468 net: mptcp: convert netlink from small_ops to ops
In the current MPTCP control plane, all operations use a netlink
attribute of the same type, "MPTCP_PM_ATTR". However, add/del/get/flush
operations only parse the first element in the message, the one that
describes MPTCP endpoints (it was named MPTCP_PM_ATTR_ADDR and is mostly
used in ADD_ADDR operations; probably the similarity of "attr", "addr"
and "add" causes some confusion to human readers).
Convert MPTCP from 'small_ops' to 'ops', thus allowing different attributes
for each single operation and hopefully making all this clearer to human
readers:

- use a separate attribute set for the add/del/get/flush address operations,
  binary compatible with the existing one, to store the endpoint address.
  MPTCP_PM_ENDPOINT_ADDR is added to the uAPI (with the same value as
  MPTCP_PM_ATTR_ADDR) for these operations.
- convert mptcp_pm_ops[] and add policy files accordingly.

This prepares the MPTCP control plane to be described as a YAML spec.
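
A condensed kernel-style sketch of the difference (command, handler and policy names below are placeholders, not the actual MPTCP symbols):

  /* small_ops: policy and maxattr live in the genl_family and are shared
   * by every command. */
  static const struct genl_small_ops example_small_ops[] = {
          { .cmd = EXAMPLE_CMD_ADD_ADDR, .doit = example_add_addr_doit },
  };

  /* ops: each command can declare its own attribute policy and maxattr. */
  static const struct genl_ops example_ops[] = {
          {
                  .cmd     = EXAMPLE_CMD_ADD_ADDR,
                  .doit    = example_add_addr_doit,
                  .policy  = example_endpoint_policy,   /* per-op attribute set */
                  .maxattr = EXAMPLE_ENDPOINT_ADDR_MAX,
          },
  };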

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-3-16b1f701f900@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24 13:00:31 -07:00
Phil Sutter
9cdee06347 netfilter: nf_tables: Carry reset boolean in nft_set_dump_ctx
Relieve the dump callback from having to check nlmsg_type upon each
call. Prep work for set element reset locking.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 15:48:30 +02:00
Pablo Neira Ayuso
078996fcd6 netfilter: nf_tables: set->ops->insert returns opaque set element in case of EEXIST
Return struct nft_elem_priv instead of struct nft_set_ext for
consistency with ("netfilter: nf_tables: expose opaque set element as
struct nft_elem_priv") and to prepare the introduction of element
timeout updates from control path.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:37:46 +02:00
Pablo Neira Ayuso
0e1ea651c9 netfilter: nf_tables: shrink memory consumption of set elements
Instead of copying struct nft_set_elem into struct nft_trans_elem, store
the pointer to the opaque set element object in the transaction. Adapt
set backend API (and set backend implementations) to take the pointer to
opaque set element representation whenever required.

This patch deconstifies the .remove() and .activate() set backend API since
these modify the set element opaque object. It also constifies
nft_set_elem_ext(); this provides access to the nft_set_ext struct
without updating the object.

According to pahole on x86_64, this patch shrinks struct nft_trans_elem
size from 216 to 24 bytes.

This patch also reduces stack memory consumption by removing the
template struct nft_set_elem object, using the opaque set element object
instead such as from the set iterator API, catchall elements and the get
element command.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:37:42 +02:00
Pablo Neira Ayuso
9dad402b89 netfilter: nf_tables: expose opaque set element as struct nft_elem_priv
Add placeholder structure and place it at the beginning of each struct
nft_*_elem for each existing set backend, instead of exposing elements
as void type to the frontend which defeats compiler type checks. Use
this pointer to this new type to replace void *.

This patch updates the following set backend API to use this new struct
nft_elem_priv placeholder structure:

- update
- deactivate
- flush
- get

as well as the following helper functions:

- nft_set_elem_ext()
- nft_set_elem_init()
- nft_set_elem_destroy()
- nf_tables_set_elem_destroy()

This patch adds nft_elem_priv_cast() to cast struct nft_elem_priv to
native element representation from the corresponding set backend.
BUILD_BUG_ON() makes sure this .priv placeholder is always at the top
of the opaque set element representation.
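
A reduced sketch of the placeholder-and-cast pattern described above (illustrative; the backend element type name is invented):

  struct nft_elem_priv { };                    /* opaque placeholder type */

  struct example_backend_elem {
          struct nft_elem_priv priv;           /* must stay the first member */
          /* backend-specific fields follow */
          unsigned int dummy;
  };

  /* The frontend only ever sees struct nft_elem_priv *; the backend casts it
   * back to its native element type. Valid because 'priv' sits at offset 0,
   * which is what the BUILD_BUG_ON() mentioned above enforces. */
  static inline struct example_backend_elem *
  example_elem_cast(struct nft_elem_priv *priv)
  {
          return (struct example_backend_elem *)priv;
  }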

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:16:30 +02:00
Pablo Neira Ayuso
6509a2e410 netfilter: nf_tables: set backend .flush always succeeds
.flush is always successful since it results from iterating over the
set elements to mark each element as inactive in the next
generation.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:16:30 +02:00
Pablo Neira Ayuso
26cec9d414 netfilter: nft_set_pipapo: no need to call pipapo_deactivate() from flush
Use the element object that is already offered instead.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24 13:16:30 +02:00