linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-13 14:24:11 +08:00

Author	SHA1	Message	Date
Abhinav Jain	1820b84f3c	selftests: net: Create veth pair for testing in networkless kernel Check if the netdev list is empty and create veth pair to be used for feature on/off testing. Remove the veth pair after testing is complete. Signed-off-by: Abhinav Jain <jain.abhinav177@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240821171903.118324-2-jain.abhinav177@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-22 16:56:06 -07:00
Simon Horman	5874e0c9f2	net: atlantic: Avoid warning about potential string truncation W=1 builds with GCC 14.2.0 warn that: .../aq_ethtool.c:278:59: warning: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 6 [-Wformat-truncation=] 278 \| snprintf(tc_string, 8, "TC%d ", tc); \| ^~ .../aq_ethtool.c:278:56: note: directive argument in the range [-2147483641, 254] 278 \| snprintf(tc_string, 8, "TC%d ", tc); \| ^~~~~~~ .../aq_ethtool.c:278:33: note: ‘snprintf’ output between 5 and 15 bytes into a destination of size 8 278 \| snprintf(tc_string, 8, "TC%d ", tc); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ tc is always in the range 0 - cfg->tcs. And as cfg->tcs is a u8, the range is 0 - 255. Further, on inspecting the code, it seems that cfg->tcs will never be more than AQ_CFG_TCS_MAX (8), so the range is actually 0 - 8. So, it seems that the condition that GCC flags will not occur. But, nonetheless, it would be nice if it didn't emit the warning. It seems that this can be achieved by changing the format specifier from %d to %u, in which case I believe GCC recognises an upper bound on the range of tc of 0 - 255. After some experimentation I think this is due to the combination of the use of %u and the type of cfg->tcs (u8). Empirically, updating the type of the tc variable to unsigned int has the same effect. As both of these changes seem to make sense in relation to what the code is actually doing - iterating over unsigned values - do both. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240821-atlantic-str-v1-1-fa2cfe38ca00@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-22 16:54:53 -07:00
Yu Jiaoliang	d6f75d86aa	nfp: bpf: Use kmemdup_array instead of kmemdup for multiple allocation Let the kememdup_array() take care about multiplication and possible overflows. Signed-off-by: Yu Jiaoliang <yujiaoliang@vivo.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240821081447.12430-1-yujiaoliang@vivo.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-22 15:12:02 +02:00
Lorenzo Bianconi	812a2751e8	net: airoha: configure hw mac address according to the port id GDM1 port on EN7581 SoC is connected to the lan dsa switch. GDM{2,3,4} can be used as wan port connected to an external phy module. Configure hw mac address registers according to the port id. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20240821-airoha-eth-wan-mac-addr-v2-1-8706d0cd6cd5@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-22 13:25:02 +02:00
Jakub Kicinski	bcc3773c49	selftests: net: add helper for checking if nettest is available A few tests check if nettest exists in the $PATH before adding $PWD to $PATH and re-checking. They don't discard stderr on the first check (and nettest is built as part of selftests, so it's pretty normal for it to not be available in system $PATH). This leads to output noise: which: no nettest in (/home/virtme/tools/fs/bin:/home/virtme/tools/fs/sbin:/home/virtme/tools/fs/usr/bin:/home/virtme/tools/fs/usr/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin) Add a common helper for the check which does silence stderr. There is another small functional change hiding here, because pmtu.sh and fib_rule_tests.sh used to return from the test case rather than completely exit. Building nettest is not hard, there should be no need to maintain the ability to selectively skip cases in its absence. Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20240821012227.1398769-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-22 12:55:47 +02:00
Paolo Abeni	001b98c989	Merge branch 'net-ipv6-ioam6-introduce-tunsrc' Justin Iurman says: ==================== net: ipv6: ioam6: introduce tunsrc This series introduces a new feature called "tunsrc" (just like seg6 already does). v3: - address Jakub's comments v2: - add links to performance result figures (see patch#2 description) - move the ipv6_addr_any() check out of the datapath ==================== Link: https://patch.msgid.link/20240817131818.11834-1-justin.iurman@uliege.be Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-22 10:45:16 +02:00
Justin Iurman	273f8c1420	net: ipv6: ioam6: new feature tunsrc This patch provides a new feature (i.e., "tunsrc") for the tunnel (i.e., "encap") mode of ioam6. Just like seg6 already does, except it is attached to a route. The "tunsrc" is optional: when not provided (by default), the automatic resolution is applied. Using "tunsrc" when possible has a benefit: performance. See the comparison: - before (= "encap" mode): https://ibb.co/bNCzvf7 - after (= "encap" mode with "tunsrc"): https://ibb.co/PT8L6yq Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-22 10:45:12 +02:00
Justin Iurman	924b8bea87	net: ipv6: ioam6: code alignment This patch prepares the next one by correcting the alignment of some lines. Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-22 10:45:12 +02:00
Paolo Abeni	f32c821ae0	tools: ynl: lift an assumption about spec file name Currently the parsing code generator assumes that the yaml specification file name and the main 'name' attribute carried inside correspond, that is the field is the c-name representation of the file basename. The above assumption held true within the current tree, but will be hopefully broken soon by the upcoming net shaper specification. Additionally, it makes the field 'name' itself useless. Lift the assumption, always computing the generated include file name from the generated c file name. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/24da5a3596d814beeb12bd7139a6b4f89756cc19.1724165948.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-21 17:56:06 -07:00
Jakub Kicinski	f2e9b5caac	Merge branch 'net-xilinx-axienet-add-statistics-support' Sean Anderson says: ==================== net: xilinx: axienet: Add statistics support Add support for hardware statistics counters (if they are enabled) in the AXI Ethernet driver. Unfortunately, the implementation is complicated a bit since the hardware might only support 32-bit counters. ==================== Link: https://patch.msgid.link/20240820175343.760389-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-21 17:49:25 -07:00
Sean Anderson	76abb5d675	net: xilinx: axienet: Add statistics support Add support for reading the statistics counters, if they are enabled. The counters may be 64-bit, but we can't detect this statically as there's no ability bit for it and the counters are read-only. Therefore, we assume the counters are 32-bits by default. To ensure we don't miss an overflow, we read all counters at 13-second intervals. This should be often enough to ensure the bytes counters don't wrap at 2.5 Gbit/s. Another complication is that the counters may be reset when the device is reset (depending on configuration). To ensure the counters persist across link up/down (including suspend/resume), we maintain our own versions along with the last counter value we saw. Because we might wait up to 100 ms for the reset to complete, we use a mutex to protect writing hw_stats. We can't sleep in ndo_get_stats64, so we use a seqlock to protect readers. We don't bother disabling the refresh work when we detect 64-bit counters. This is because the reset issue requires us to read hw_stat_base and reset_in_progress anyway, which would still require the seqcount. And I don't think skipping the task is worth the extra bookkeeping. We can't use the byte counters for either get_stats64 or get_eth_mac_stats. This is because the byte counters include everything in the frame (destination address to FCS, inclusive). But rtnl_link_stats64 wants bytes excluding the FCS, and ethtool_eth_mac_stats wants to exclude the L2 overhead (addresses and length/type). It might be possible to calculate the byte values Linux expects based on the frame counters, but I think it is simpler to use the existing software counters. get_ethtool_stats is implemented for nonstandard statistics. This includes the aforementioned byte counters, VLAN and PFC frame counters, and user-defined (e.g. with custom RTL) counters. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20240820175343.760389-3-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-21 17:49:20 -07:00
Sean Anderson	d70e3788da	net: xilinx: axienet: Report RxRject as rx_dropped The Receive Frame Rejected interrupt is asserted whenever there was a receive error (bad FCS, bad length, etc.) or whenever the frame was dropped due to a mismatched address. So this is really a combination of rx_otherhost_dropped, rx_length_errors, rx_frame_errors, and rx_crc_errors. Mismatched addresses are common and aren't really errors at all (much like how fragments are normal on half-duplex links). To avoid confusion, report these events as rx_dropped. This better reflects what's going on: the packet was received by the MAC but dropped before being processed. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Link: https://patch.msgid.link/20240820175343.760389-2-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-21 17:49:20 -07:00
Jakub Kicinski	74b1e94e94	net: repack struct netdev_queue Adding the NAPI pointer to struct netdev_queue made it grow into another cacheline, even though there was 44 bytes of padding available. The struct was historically grouped as follows: /* read-mostly stuff (align) / / ... random control path fields ... / / write-mostly stuff (align) / / ... 40 byte hole ... / / struct dql (align) / It seems that people want to add control path fields after the read only fields. struct dql looks pretty innocent but it forces its own alignment and nothing indicates that there is a lot of empty space above it. Move dql above the xmit_lock. This shifts the empty space to the end of the struct rather than in the middle of it. Move two example fields there to set an example. Hopefully people will now add new fields at the end of the struct. A lot of the read-only stuff is also control path-only, but if we move it all we'll have another hole in the middle. Before: / size: 384, cachelines: 6, members: 16 / / sum members: 284, holes: 3, sum holes: 100 / After: / size: 320, cachelines: 5, members: 16 / / sum members: 284, holes: 1, sum holes: 8 */ Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240820205119.1321322-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-21 17:37:39 -07:00
Dan Carpenter	0ce054f2b8	ice: Fix a 32bit bug BIT() is unsigned long but ->pu.flg_msk and ->pu.flg_val are u64 type. On 32 bit systems, unsigned long is a u32 and the mismatch between u32 and u64 will break things for the high 32 bits. Fixes: `9a4c07aaa0` ("ice: add parser execution main loop") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/ddc231a8-89c1-4ff4-8704-9198bcb41f8d@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-21 17:21:47 -07:00
Xi Huang	d35a3a8f1b	ipv6: remove redundant check err varibale will be set everytime,like -ENOBUFS and in if (err < 0), when code gets into this path. This check will just slowdown the execution and that's all. Signed-off-by: Xi Huang <xuiagnh@gmail.com> Reviewed-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20240820115442.49366-1-xuiagnh@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-21 17:21:09 -07:00
Jinjie Ruan	2d86ecb64b	net: dsa: sja1105: Simplify with scoped for each OF child loop Use scoped for_each_available_child_of_node_scoped() when iterating over device nodes to make code a bit simpler. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://patch.msgid.link/20240820075047.681223-1-ruanjinjie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-21 17:19:23 -07:00
Jinjie Ruan	e58c3f3d51	net: dsa: ocelot: Simplify with scoped for each OF child loop Use scoped for_each_available_child_of_node_scoped() when iterating over device nodes to make code a bit simpler. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://patch.msgid.link/20240820074805.680674-1-ruanjinjie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-21 17:18:44 -07:00
Gustavo A. R. Silva	488d34643e	nfc: pn533: Avoid -Wflex-array-member-not-at-end warnings -Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Remove unnecessary flex-array member `data[]`, and with this fix the following warnings: drivers/nfc/pn533/usb.c:268:38: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/nfc/pn533/usb.c:275:38: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/ZsPw7+6vNoS651Cb@elsanto Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-21 17:15:43 -07:00
Jinjian Song	d785ed945d	net: wwan: t7xx: PCIe reset rescan WWAN device is programmed to boot in normal mode or fastboot mode, when triggering a device reset through ACPI call or fastboot switch command. Maintain state machine synchronization and reprobe logic after a device reset. The PCIe device reset triggered by several ways. E.g.: - fastboot: echo "fastboot_switching" > /sys/bus/pci/devices/${bdf}/t7xx_mode. - reset: echo "reset" > /sys/bus/pci/devices/${bdf}/t7xx_mode. - IRQ: PCIe device request driver to reset itself by an interrupt request. Use pci_reset_function() as a generic way to reset device, save and restore the PCIe configuration before and after reset device to ensure the reprobe process. Suggestion from Bjorn: Link: https://lore.kernel.org/all/20230127133034.GA1364550@bhelgaas/ Signed-off-by: Jinjian Song <jinjian.song@fibocom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-21 12:57:24 +01:00
Florian Fainelli	4d36b2b1de	net: dsa: b53: Use dev_err_probe() Rather than print an error even when we get -EPROBE_DEFER, use dev_err_probe() to filter out those messages. Link: https://github.com/openwrt/openwrt/pull/11680 Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240820004436.224603-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-20 18:22:44 -07:00
James Chapman	bc3dd9ed04	l2tp: use skb_queue_purge in l2tp_ip_destroy_sock Recent commit `ed8ebee6de` ("l2tp: have l2tp_ip_destroy_sock use ip_flush_pending_frames") was incorrect in that l2tp_ip does not use socket cork and ip_flush_pending_frames is for sockets that do. Use __skb_queue_purge instead and remove the unnecessary lock. Also unexport ip_flush_pending_frames since it was originally exported in commit `4ff8863419` ("ipv4: export ip_flush_pending_frames") for l2tp and is not used by other modules. Suggested-by: xiyou.wangcong@gmail.com Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240819143333.3204957-1-jchapman@katalix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-20 16:52:28 -07:00
Kuniyuki Iwashima	8594d9b85c	af_unix: Don't call skb_get() for OOB skb. Since introduced, OOB skb holds an additional reference count with no special reason and caused many issues. Also, kfree_skb() and consume_skb() are used to decrement the count, which is confusing. Let's drop the unnecessary skb_get() in queue_oob() and corresponding kfree_skb(), consume_skb(), and skb_unref(). Now unix_sk(sk)->oob_skb is just a pointer to skb in the receive queue, so special handing is no longer needed in GC. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20240816233921.57800-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-20 15:48:00 -07:00
Krzysztof Kozlowski	2862c9349d	dt-bindings: net: socionext,uniphier-ave4: add top-level constraints Properties with variable number of items per each device are expected to have widest constraints in top-level "properties:" block and further customized (narrowed) in "if:then:". Add missing top-level constraints for clock-names and reset-names. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20240818172905.121829-4-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-20 15:28:42 -07:00
Krzysztof Kozlowski	70d16e1336	dt-bindings: net: renesas,etheravb: add top-level constraints Properties with variable number of items per each device are expected to have widest constraints in top-level "properties:" block and further customized (narrowed) in "if:then:". Add missing top-level constraints for reg, clocks, clock-names, interrupts and interrupt-names. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20240818172905.121829-3-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-20 15:28:41 -07:00
Krzysztof Kozlowski	06ab21c3cb	dt-bindings: net: mediatek,net: add top-level constraints Properties with variable number of items per each device are expected to have widest constraints in top-level "properties:" block and further customized (narrowed) in "if:then:". Add missing top-level constraints for clocks and clock-names. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20240818172905.121829-2-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-20 15:28:41 -07:00
Krzysztof Kozlowski	55da77dec1	dt-bindings: net: mediatek,net: narrow interrupts per variants Each variable-length property like interrupts must have fixed constraints on number of items for given variant in binding. The clauses in "if:then:" block should define both limits: upper and lower. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20240818172905.121829-1-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-20 15:28:41 -07:00
Gal Pressman	13cfd6a6d7	net: Silence false field-spanning write warning in metadata_dst memcpy When metadata_dst struct is allocated (using metadata_dst_alloc()), it reserves room for options at the end of the struct. Change the memcpy() to unsafe_memcpy() as it is guaranteed that enough room (md_size bytes) was allocated and the field-spanning write is intentional. This resolves the following warning: ------------[ cut here ]------------ memcpy: detected field-spanning write (size 104) of single field "&new_md->u.tun_info" at include/net/dst_metadata.h:166 (size 96) WARNING: CPU: 2 PID: 391470 at include/net/dst_metadata.h:166 tun_dst_unclone+0x114/0x138 [geneve] Modules linked in: act_tunnel_key geneve ip6_udp_tunnel udp_tunnel act_vlan act_mirred act_skbedit cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress sbsa_gwdt ipmi_devintf ipmi_msghandler xfrm_interface xfrm6_tunnel tunnel6 tunnel4 xfrm_user xfrm_algo nvme_fabrics overlay optee openvswitch nsh nf_conncount ib_srp scsi_transport_srp rpcrdma rdma_ucm ib_iser rdma_cm ib_umad iw_cm libiscsi ib_ipoib scsi_transport_iscsi ib_cm uio_pdrv_genirq uio mlxbf_pmc pwr_mlxbf mlxbf_bootctl bluefield_edac nft_chain_nat binfmt_misc xt_MASQUERADE nf_nat xt_tcpmss xt_NFLOG nfnetlink_log xt_recent xt_hashlimit xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_mark xt_comment ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink sch_fq_codel dm_multipath fuse efi_pstore ip_tables btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq raid1 raid0 nvme nvme_core mlx5_ib ib_uverbs ib_core ipv6 crc_ccitt mlx5_core crct10dif_ce mlxfw psample i2c_mlxbf gpio_mlxbf2 mlxbf_gige mlxbf_tmfifo CPU: 2 PID: 391470 Comm: handler6 Not tainted 6.10.0-rc1 #1 Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.5.0.12993 Dec 6 2023 pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : tun_dst_unclone+0x114/0x138 [geneve] lr : tun_dst_unclone+0x114/0x138 [geneve] sp : ffffffc0804533f0 x29: ffffffc0804533f0 x28: 000000000000024e x27: 0000000000000000 x26: ffffffdcfc0e8e40 x25: ffffff8086fa6600 x24: ffffff8096a0c000 x23: 0000000000000068 x22: 0000000000000008 x21: ffffff8092ad7000 x20: ffffff8081e17900 x19: ffffff8092ad7900 x18: 00000000fffffffd x17: 0000000000000000 x16: ffffffdcfa018488 x15: 695f6e75742e753e x14: 2d646d5f77656e26 x13: 6d5f77656e262220 x12: 646c65696620656c x11: ffffffdcfbe33ae8 x10: ffffffdcfbe1baa8 x9 : ffffffdcfa0a4c10 x8 : 0000000000017fe8 x7 : c0000000ffffefff x6 : 0000000000000001 x5 : ffffff83fdeeb010 x4 : 0000000000000000 x3 : 0000000000000027 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff80913f6780 Call trace: tun_dst_unclone+0x114/0x138 [geneve] geneve_xmit+0x214/0x10e0 [geneve] dev_hard_start_xmit+0xc0/0x220 __dev_queue_xmit+0xa14/0xd38 dev_queue_xmit+0x14/0x28 [openvswitch] ovs_vport_send+0x98/0x1c8 [openvswitch] do_output+0x80/0x1a0 [openvswitch] do_execute_actions+0x172c/0x1958 [openvswitch] ovs_execute_actions+0x64/0x1a8 [openvswitch] ovs_packet_cmd_execute+0x258/0x2d8 [openvswitch] genl_family_rcv_msg_doit+0xc8/0x138 genl_rcv_msg+0x1ec/0x280 netlink_rcv_skb+0x64/0x150 genl_rcv+0x40/0x60 netlink_unicast+0x2e4/0x348 netlink_sendmsg+0x1b0/0x400 __sock_sendmsg+0x64/0xc0 ____sys_sendmsg+0x284/0x308 ___sys_sendmsg+0x88/0xf0 __sys_sendmsg+0x70/0xd8 __arm64_sys_sendmsg+0x2c/0x40 invoke_syscall+0x50/0x128 el0_svc_common.constprop.0+0x48/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x38/0x100 el0t_64_sync_handler+0xc0/0xc8 el0t_64_sync+0x1a4/0x1a8 ---[ end trace 0000000000000000 ]--- Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20240818114351.3612692-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-20 15:22:17 -07:00
Zhang Zekun	2cbece60a4	net: hns3: Use ARRAY_SIZE() to improve readability There is a helper function ARRAY_SIZE() to help calculating the u32 array size, and we don't need to do it mannually. So, let's use ARRAY_SIZE() to calculate the array size, and improve the code readability. Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Reviewed-by: Jijie Shao<shaojijie@huawei.com> Link: https://patch.msgid.link/20240818052518.45489-1-zhangzekun11@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-20 15:18:21 -07:00
Jakub Kicinski	555e553163	selftests: net/forwarding: spawn sh inside vrf to speed up ping loop Looking at timestamped output of netdev CI reveals that most of the time in forwarding tests for custom route hashing is spent on a single case, namely the test which uses ping (mausezahn does not support flow labels). On a non-debug kernel we spend 714 of 730 total test runtime (97%) on this test case. While having flow label support in a traffic gen tool / mausezahn would be best, we can significantly speed up the loop by putting ip vrf exec outside of the iteration. In a test of 1000 pings using a normal loop takes 50 seconds to finish. While using: ip vrf exec $vrf sh -c "$loop-body" takes 12 seconds (1/4 of the time). Some of the slowness is likely due to our inefficient virtualization setup, but even on my laptop running "ip link help" 16k times takes 25-30 seconds, so I think it's worth optimizing even for fastest setups. Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20240817203659.712085-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-20 15:17:58 -07:00
Zhang Zekun	797653865b	net: ethernet: ibm: Simpify code with for_each_child_of_node() for_each_child_of_node can help to iterate through the device_node, and we don't need to use while loop. No functional change with this conversion. Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240816015837.109627-1-zhangzekun11@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-20 15:15:29 +02:00
Paolo Abeni	6b2efdc454	Merge branch 'preparations-for-fib-rule-dscp-selector' Ido Schimmel says: ==================== Preparations for FIB rule DSCP selector This patchset moves the masking of the upper DSCP bits in 'flowi4_tos' to the core instead of relying on callers of the FIB lookup API to do it. This will allow us to start changing users of the API to initialize the 'flowi4_tos' field with all six bits of the DSCP field. In turn, this will allow us to extend FIB rules with a new DSCP selector. By masking the upper DSCP bits in the core we are able to maintain the behavior of the TOS selector in FIB rules and routes to only match on the lower DSCP bits. While working on this I found two users of the API that do not mask the upper DSCP bits before performing the lookup. The first is an ancient netlink family that is unlikely to be used. It is adjusted in patch #1 to mask both the upper DSCP bits and the ECN bits before calling the API. The second user is a nftables module that differs in this regard from its equivalent iptables module. It is adjusted in patch #2 to invoke the API with the upper DSCP bits masked, like all other callers. The relevant selftest passed, but in the unlikely case that regressions are reported because of this change, we can restore the existing behavior using a new flow information flag as discussed here [1]. The last patch moves the masking of the upper DSCP bits to the core, making the first two patches redundant, but I wanted to post them separately to call attention to the behavior change for these two users of the FIB lookup API. Future patchsets (around 3) will start unmasking the upper DSCP bits throughout the networking stack before adding support for the new FIB rule DSCP selector. Changes from v1 [2]: Patch #3: Include <linux/ip.h> in <linux/in_route.h> instead of including it in net/ip_fib.h [1] https://lore.kernel.org/netdev/ZpqpB8vJU%2FQ6LSqa@debian/ [2] https://lore.kernel.org/netdev/20240725131729.1729103-1-idosch@nvidia.com/ ==================== Link: https://patch.msgid.link/20240814125224.972815-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-20 14:57:09 +02:00
Ido Schimmel	1fa3314c14	ipv4: Centralize TOS matching The TOS field in the IPv4 flow information structure ('flowi4_tos') is matched by the kernel against the TOS selector in IPv4 rules and routes. The field is initialized differently by different call sites. Some treat it as DSCP (RFC 2474) and initialize all six DSCP bits, some treat it as RFC 1349 TOS and initialize it using RT_TOS() and some treat it as RFC 791 TOS and initialize it using IPTOS_RT_MASK. What is common to all these call sites is that they all initialize the lower three DSCP bits, which fits the TOS definition in the initial IPv4 specification (RFC 791). Therefore, the kernel only allows configuring IPv4 FIB rules that match on the lower three DSCP bits which are always guaranteed to be initialized by all call sites: # ip -4 rule add tos 0x1c table 100 # ip -4 rule add tos 0x3c table 100 Error: Invalid tos. While this works, it is unlikely to be very useful. RFC 791 that initially defined the TOS and IP precedence fields was updated by RFC 2474 over twenty five years ago where these fields were replaced by a single six bits DSCP field. Extending FIB rules to match on DSCP can be done by adding a new DSCP selector while maintaining the existing semantics of the TOS selector for applications that rely on that. A prerequisite for allowing FIB rules to match on DSCP is to adjust all the call sites to initialize the high order DSCP bits and remove their masking along the path to the core where the field is matched on. However, making this change alone will result in a behavior change. For example, a forwarded IPv4 packet with a DS field of 0xfc will no longer match a FIB rule that was configured with 'tos 0x1c'. This behavior change can be avoided by masking the upper three DSCP bits in 'flowi4_tos' before comparing it against the TOS selectors in FIB rules and routes. Implement the above by adding a new function that checks whether a given DSCP value matches the one specified in the IPv4 flow information structure and invoke it from the three places that currently match on 'flowi4_tos'. Use RT_TOS() for the masking of 'flowi4_tos' instead of IPTOS_RT_MASK since the latter is not uAPI and we should be able to remove it at some point. Include <linux/ip.h> in <linux/in_route.h> since the former defines IPTOS_TOS_MASK which is used in the definition of RT_TOS() in <linux/in_route.h>. No regressions in FIB tests: # ./fib_tests.sh [...] Tests passed: 218 Tests failed: 0 And FIB rule tests: # ./fib_rule_tests.sh [...] Tests passed: 116 Tests failed: 0 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-20 14:57:08 +02:00
Ido Schimmel	548a2029eb	netfilter: nft_fib: Mask upper DSCP bits before FIB lookup As part of its functionality, the nftables FIB expression module performs a FIB lookup, but unlike other users of the FIB lookup API, it does so without masking the upper DSCP bits. In particular, this differs from the equivalent iptables match ("rpfilter") that does mask the upper DSCP bits before the FIB lookup. Align the module to other users of the FIB lookup API and mask the upper DSCP bits using IPTOS_RT_MASK before the lookup. No regressions in nft_fib.sh: # ./nft_fib.sh PASS: fib expression did not cause unwanted packet drops PASS: fib expression did drop packets for 1.1.1.1 PASS: fib expression did drop packets for 1c3::c01d PASS: fib expression forward check with policy based routing Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-20 14:57:07 +02:00
Ido Schimmel	8fed54758c	ipv4: Mask upper DSCP bits and ECN bits in NETLINK_FIB_LOOKUP family The NETLINK_FIB_LOOKUP netlink family can be used to perform a FIB lookup according to user provided parameters and communicate the result back to user space. However, unlike other users of the FIB lookup API, the upper DSCP bits and the ECN bits of the DS field are not masked, which can result in the wrong result being returned. Solve this by masking the upper DSCP bits and the ECN bits using IPTOS_RT_MASK. The structure that communicates the request and the response is not exported to user space, so it is unlikely that this netlink family is actually in use [1]. [1] https://lore.kernel.org/netdev/ZpqpB8vJU%2FQ6LSqa@debian/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-20 14:57:07 +02:00
Paolo Abeni	ccb445ae46	Merge branch 'net-smc-introduce-ringbufs-usage-statistics' Wen Gu says: ==================== net/smc: introduce ringbufs usage statistics Currently, we have histograms that show the sizes of ringbufs that ever used by SMC connections. However, they are always incremental and since SMC allows the reuse of ringbufs, we cannot know the actual amount of ringbufs being allocated or actively used. So this patch set introduces statistics for the amount of ringbufs that actually allocated by link group and actively used by connections of a certain net namespace, so that we can react based on these memory usage information, e.g. active fallback to TCP. With appropriate adaptations of smc-tools, we can obtain these ringbufs usage information: $ smcr -d linkgroup LG-ID : 00000500 LG-Role : SERV LG-Type : ASYML VLAN : 0 PNET-ID : Version : 1 Conns : 0 Sndbuf : 12910592 B <- RMB : 12910592 B <- or $ smcr -d stats [...] RX Stats Data transmitted (Bytes) 869225943 (869.2M) Total requests 18494479 Buffer usage (Bytes) 12910592 (12.31M) <- [...] TX Stats Data transmitted (Bytes) 12760884405 (12.76G) Total requests 36988338 Buffer usage (Bytes) 12910592 (12.31M) <- [...] [...] Change log: v3->v2 - use new helper nla_put_uint() instead of nla_put_u64_64bit(). v2->v1 https://lore.kernel.org/r/20240807075939.57882-1-guwen@linux.alibaba.com/ - remove inline keyword in .c files. - use local variable in macros to avoid potential side effects. v1 https://lore.kernel.org/r/20240805090551.80786-1-guwen@linux.alibaba.com/ ==================== Link: https://patch.msgid.link/20240814130827.73321-1-guwen@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-20 11:38:30 +02:00
Wen Gu	e0d103542b	net/smc: introduce statistics for ringbufs usage of net namespace The buffer size histograms in smc_stats, namely rx/tx_rmbsize, record the sizes of ringbufs for all connections that have ever appeared in the net namespace. They are incremental and we cannot know the actual ringbufs usage from these. So here introduces statistics for current ringbufs usage of existing smc connections in the net namespace into smc_stats, it will be incremented when new connection uses a ringbuf and decremented when the ringbuf is unused. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-20 11:38:23 +02:00
Wen Gu	d386d59b7c	net/smc: introduce statistics for allocated ringbufs of link group Currently we have the statistics on sndbuf/RMB sizes of all connections that have ever been on the link group, namely smc_stats_memsize. However these statistics are incremental and since the ringbufs of link group are allowed to be reused, we cannot know the actual allocated buffers through these. So here introduces the statistic on actual allocated ringbufs of the link group, it will be incremented when a new ringbuf is added into buf_list and decremented when it is deleted from buf_list. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-08-20 11:38:23 +02:00
Zhang Changzhong	dca9d62a0d	net: remove redundant check in skb_shift() The check for '!to' is redundant here, since skb_can_coalesce() already contains this check. Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1723730983-22912-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-19 18:20:18 -07:00
Yue Haibing	af3dc0ad31	mptcp: Remove unused declaration mptcp_sockopt_sync() Commit `a1ab24e5fc` ("mptcp: consolidate sockopt synchronization") removed the implementation but leave declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240816100404.879598-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-19 17:49:00 -07:00
Yue Haibing	c5e2a1b067	net/mlx5: E-Switch, Remove unused declarations These are never implenmented since commit `b691b1116e` ("net/mlx5: Implement devlink port function cmds to control ipsec_packet"). Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240816101550.881844-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-19 17:48:43 -07:00
Yue Haibing	12906bab44	igbvf: Remove two unused declarations There is no caller and implementations in tree. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240816101638.882072-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-19 17:48:33 -07:00
Yue Haibing	359c5eb0f7	gve: Remove unused declaration gve_rx_alloc_rings() Commit `f13697cc7a` ("gve: Switch to config-aware queue allocation") convert this function to gve_rx_alloc_rings_gqi(). Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240816101906.882743-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-19 17:48:20 -07:00
Jakub Kicinski	a2901083b1	tcp_metrics: use netlink policy for IPv6 addr len validation Use the netlink policy to validate IPv6 address length. Destination address currently has policy for max len set, and source has no policy validation. In both cases the code does the real check. With correct policy check the code can be removed. Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20240816212245.467745-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-19 17:42:57 -07:00
Frank Li	1bf8e07c38	dt-binding: ptp: fsl,ptp: add pci1957,ee02 compatible string for fsl,enetc-ptp fsl,enetc-ptp is embedded pcie device. Add compatible string pci1957,ee02. Fix warning: arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-kbox-a-230-ls.dtb: ethernet@0,4: compatible:0: 'pci1957,ee02' is not one of ['fsl,etsec-ptp', 'fsl,fman-ptp-timer', 'fsl,dpaa2-ptp', 'fsl,enetc-ptp'] Signed-off-by: Frank Li <Frank.Li@nxp.com> Acked-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-19 09:48:53 +01:00
Simon Horman	a99ef548bb	bnx2x: Set ivi->vlan field as an integer In bnx2x_get_vf_config(): * The vlan field of ivi is a 32-bit integer, it is used to store a vlan ID. * The vlan field of bulletin is a 16-bit integer, it is also used to store a vlan ID. In the current code, ivi->vlan is set using memset. But in the case of setting it to the value of bulletin->vlan, this involves reading 32 bits from a 16bit source. This is likely safe, as the following 6 bytes are padding in the same structure, but none the less, it seems undesirable. However, it is entirely unclear to me how this scheme works on big-endian systems. Resolve this by simply assigning integer values to ivi->vlan. Flagged by W=1 builds. f.e. gcc-14 reports: In function 'fortify_memcpy_chk', inlined from 'bnx2x_get_vf_config' at .../bnx2x_sriov.c:2655:4: .../fortify-string.h:580:25: warning: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning] 580 \| __read_overflow2_field(q_size_field, size); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Link: https://patch.msgid.link/20240815-bnx2x-int-vlan-v1-1-5940b76e37ad@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-16 18:02:28 -07:00
Christoph Paasch	f4ae8420f6	mpls: Reduce skb re-allocations due to skb_cow() mpls_xmit() needs to prepend the MPLS-labels to the packet. That implies one needs to make sure there is enough space for it in the headers. Calling skb_cow() implies however that one wants to change even the playload part of the packet (which is not true for MPLS). Thus, call skb_cow_head() instead, which is what other tunnelling protocols do. Running a server with this comm it entirely removed the calls to pskb_expand_head() from the callstack in mpls_xmit() thus having significant CPU-reduction, especially at peak times. Cc: Roopa Prabhu <roopa@nvidia.com> Reported-by: Craig Taylor <cmtaylor@apple.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240815161201.22021-1-cpaasch@apple.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-16 17:53:49 -07:00
Simon Horman	1c66df8625	net: txgbe: Remove unnecessary NULL check before free Remove unnecessary NULL check before freeing using kvfree(). This function will ignore a NULL argument. Flagged by Coccinelle: .../txgbe_hw.c:187:2-8: WARNING: NULL check before some freeing functions is not needed. No functional change intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240815-txgbe-kvfree-v1-1-5ecf8656f555@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-16 16:19:34 -07:00
Aleksander Jan Bajkowski	1f803c9569	net: ethernet: lantiq_etop: remove unused variable Remove a variable that has never been used. Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Link: https://patch.msgid.link/20240815074956.155224-1-olek2@wp.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-16 16:16:59 -07:00
Tariq Toukan	9480fd0cd8	docs: networking: Align documentation with behavior change Following commit `9f7e8fbb91` ("net/mlx5: offset comp irq index in name by one"), which fixed the index in IRQ name to start once again from 0, we change the documentation accordingly. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://patch.msgid.link/20240815142343.2254247-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-16 14:29:44 -07:00
Frank Li	02404bdb81	dt-bindings: net: mdio: change nodename match pattern Change mdio.yaml nodename match pattern to '^mdio(-(bus\|external))?(@.+\|-([0-9]+))$' Fix mdio.yaml wrong parser mdio controller's address instead phy's address when mdio-mux exista. For example: mdio-mux-emi1@54 { compatible = "mdio-mux-mmioreg", "mdio-mux"; mdio@20 { reg = <0x20>; ^^^ This is mdio controller register ethernet-phy@2 { reg = <0x2>; ^^^ This phy's address }; }; }; Only phy's address is limited to 31 because MDIO bus definition. But CHECK_DTBS report below warning: arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: mdio-mux-emi1@54: mdio@20:reg:0:0: 32 is greater than the maximum of 31 The reason is that "mdio-mux-emi1@54" match "nodename: '^mdio(@.*)?'" in mdio.yaml. Change to '^mdio(-(bus\|external))?(@.+\|-([0-9]+))?$' to avoid wrong match mdio mux controller's node. Signed-off-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20240815163408.4184705-1-Frank.Li@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-16 14:28:53 -07:00

1 2 3 4 5 ...

1295659 Commits