linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-23 20:24:12 +08:00

Author	SHA1	Message	Date
Daniel Golle	f95b4725e7	net: phy: mxl-gpy: add missing support for TRIGGER_NETDEV_LINK_10 The PHY also support 10MBit/s links as well as the corresponding link indication trigger to be offloaded. Add TRIGGER_NETDEV_LINK_10 to the supported triggers. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/cc5da0a989af8b0d49d823656d88053c4de2ab98.1728057367.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-07 17:12:36 -07:00
Ronak Doshi	0458cbedfe	vmxnet3: support higher link speeds from vmxnet3 v9 Until now, vmxnet3 was default reporting 10Gbps as link speed. Vmxnet3 v9 adds support for user to configure higher link speeds. User can configure the link speed via VMs advanced parameters options in VCenter. This speed is reported in gbps by hypervisor. This patch adds support for vmxnet3 to report higher link speeds and converts it to mbps as expected by Linux stack. Signed-off-by: Ronak Doshi <ronak.doshi@broadcom.com> Acked-by: Guolin Yang <guolin.yang@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241004174303.5370-1-ronak.doshi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-07 17:04:41 -07:00
Linus Walleij	7651f1149a	dt-bindings: net: realtek: Use proper node names We eventually want to get to a place where we fix all DTS files so that we can simply disallow switch/port/ports without the ethernet-* prefix so the DTS files are more readable. Replace: - switch with ethernet-switch - ports with ethernet-ports - port with ethernet-port Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://patch.msgid.link/20241004-realtek-bindings-fixup-v2-1-667afa08d184@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-07 16:50:27 -07:00
Jakub Kicinski	58ec6857d5	Merge branch 'ipv4-preliminary-work-for-per-netns-rtnl' Eric Dumazet says: ==================== ipv4: preliminary work for per-netns RTNL Inspired by `9b8ca04854` ("ipv4: avoid quadratic behavior in FIB insertion of common address") and per-netns RTNL conversion started by Kuniyuki this week. ip_fib_check_default() can use RCU instead of a shared spinlock. fib_info_lock can be removed, RTNL is already used. fib_info_devhash[] can be removed in favor of a single pointer in net_device. ==================== Link: https://patch.msgid.link/20241004134720.579244-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-07 16:46:32 -07:00
Eric Dumazet	a3f5f4c2f9	ipv4: remove fib_info_devhash[] Upcoming per-netns RTNL conversion needs to get rid of shared hash tables. fib_info_devhash[] is one of them. It is unclear why we used a hash table, because a single hlist_head per net device was cheaper and scalable. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20241004134720.579244-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-07 16:46:27 -07:00
Eric Dumazet	143ca845ec	ipv4: remove fib_info_lock After the prior patch, fib_info_lock became redundant because all of its users are holding RTNL. BH protection is not needed. Remove the READ_ONCE()/WRITE_ONCE() annotations around fib_info_cnt, since it is protected by RTNL. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20241004134720.579244-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-07 16:38:58 -07:00
Eric Dumazet	fc38b28365	ipv4: use rcu in ip_fib_check_default() fib_info_devhash[] is not resized in fib_info_hash_move(). fib_nh structs are already freed after an rcu grace period. This will allow to remove fib_info_lock in the following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20241004134720.579244-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-07 16:38:58 -07:00
Eric Dumazet	8a0f62fdeb	ipv4: remove fib_devindex_hashfn() fib_devindex_hashfn() converts a 32bit ifindex value to a 8bit hash. It makes no sense doing this from fib_info_hashfn() and fib_find_info_nh(). It is better to keep as many bits as possible to let fib_info_hashfn_result() have better spread. Only fib_info_devhash_bucket() needs to make this operation, we can 'inline' trivial fib_devindex_hashfn() in it. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20241004134720.579244-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-07 16:38:58 -07:00
Vladimir Oltean	1405981bbb	lib: packing: catch kunit_kzalloc() failure in the pack() test kunit_kzalloc() may fail. Other call sites verify that this is the case, either using a direct comparison with the NULL pointer, or the KUNIT_ASSERT_NOT_NULL() or KUNIT_ASSERT_NOT_ERR_OR_NULL(). Pick KUNIT_ASSERT_NOT_NULL() as the error handling method that made most sense to me. It's an unlikely thing to happen, but at least we call __kunit_abort() instead of dereferencing this NULL pointer. Fixes: `e9502ea6db` ("lib: packing: add KUnit tests adapted from selftests") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241004110012.1323427-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-07 16:36:25 -07:00
Christophe JAILLET	bec2a32145	mlxsw: spectrum_acl_flex_keys: Constify struct mlxsw_afk_element_inst 'struct mlxsw_afk_element_inst' are not modified in these drivers. Constifying these structures moves some data to a read-only section, so increases overall security. Update a few functions and struct mlxsw_afk_block accordingly. On a x86_64, with allmodconfig, as an example: Before: ====== text data bss dec hex filename 4278 4032 0 8310 2076 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.o After: ===== text data bss dec hex filename 7934 352 0 8286 205e drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/8ccfc7bfb2365dcee5b03c81ebe061a927d6da2e.1727541677.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-07 16:35:03 -07:00
Russell King (Oracle)	5397706165	net: dsa: remove obsolete phylink dsa_switch operations No driver now uses the DSA switch phylink members, so we can now remove the method pointers, but we need to leave empty shim functions to allow those drivers that do not provide phylink MAC operations structure to continue functioning. Signed-off-by: Russell King (oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # sja1105, felix, dsa_loop Link: https://patch.msgid.link/E1swKNV-0060oN-1b@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-07 16:23:10 -07:00
Menglong Dong	269084f748	net: tcp: refresh tcp_mstamp for compressed ack in timer For now, we refresh the tcp_mstamp for delayed acks and keepalives, but not for the compressed ack in tcp_compressed_ack_kick(). I have not found out the effact of the tcp_mstamp when sending ack, but we can still refresh it for the compressed ack to keep consistent. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241003082231.759759-1-dongml2@chinatelecom.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-07 16:01:39 -07:00
Joe Damato	8b641b5e4c	hv_netvsc: Link queues to NAPIs Use netif_queue_set_napi to link queues to NAPI instances so that they can be queried with netlink. Shradha Gupta tested the patch and reported that the results are as expected: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 2}' [{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8195, 'type': 'rx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8196, 'type': 'rx'}, {'id': 4, 'ifindex': 2, 'napi-id': 8197, 'type': 'rx'}, {'id': 5, 'ifindex': 2, 'napi-id': 8198, 'type': 'rx'}, {'id': 6, 'ifindex': 2, 'napi-id': 8199, 'type': 'rx'}, {'id': 7, 'ifindex': 2, 'napi-id': 8200, 'type': 'rx'}, {'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'tx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8195, 'type': 'tx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8196, 'type': 'tx'}, {'id': 4, 'ifindex': 2, 'napi-id': 8197, 'type': 'tx'}, {'id': 5, 'ifindex': 2, 'napi-id': 8198, 'type': 'tx'}, {'id': 6, 'ifindex': 2, 'napi-id': 8199, 'type': 'tx'}, {'id': 7, 'ifindex': 2, 'napi-id': 8200, 'type': 'tx'}] Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Tested-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-10-06 16:24:06 +01:00
David S. Miller	cf95456862	Merge branch 'sfc-per-q-stats' Edward Cree says: ==================== sfc: per-queue stats This series implements the netdev_stat_ops interface for per-queue statistics in the sfc driver, partly using existing counters that were originally added for ethtool -S output. Changed in v4: * remove RFC tags Changed in v3: * make TX stats count completions rather than enqueues * add new patch #4 to account for XDP TX separately from netdev traffic and include it in base_stats * move the tx_queue->old_* members out of the fastpath cachelines * note on patch #6 that our hw_gso stats still count enqueues * RFC since net-next is closed right now Changed in v2: * exclude (dedicated) XDP TXQ stats from per-queue TX stats * explain patch #3 better ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-10-06 16:02:24 +01:00
Edward Cree	b3411dbdaa	sfc: add per-queue RX bytes stats While this does add overhead to the fast path, it should be minimal as the cacheline should already be held for write from updating the queue's rx_packets stat. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-10-06 16:02:23 +01:00
Edward Cree	db3067c8aa	sfc: implement per-queue TSO (hw_gso) stats Use our existing TSO stats, which count enqueued TSO TXes. Users may expect them to count completions, as tx-packets and tx-bytes do; however, these are the counters we have, and the qstats documentation doesn't actually specify. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-10-06 16:02:23 +01:00
Edward Cree	07e5fa5b7f	sfc: implement per-queue rx drop and overrun stats Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-10-06 16:02:23 +01:00
Edward Cree	cfa63b9080	sfc: account XDP TXes in netdev base stats When we handle a TX completion for an XDP packet, it is not counted in the per-TXQ netdev stats. Record it in new internal counters, and include those in the device-wide total in efx_get_base_stats(). Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-10-06 16:02:23 +01:00
Edward Cree	5c24de42f1	sfc: add n_rx_overlength to ethtool stats The previous patch changed when we increment the RX queue's rx_packets counter, to match the semantics of netdev per-queue stats. The differences between the old and new counts are scatter errors (which produce a WARN_ON) and this counter, which is incremented by efx_rx_packet__check_len() when an RX packet (which was placed in a single buffer by SG, i.e. n_frags == 1) has a length (from the RX event) which is too long to fit in the RX buffer. If this occurs, we drop the packet and fire a ratelimited netif_err(). The counter previously was not reported anywhere; add it to ethtool -S output to ensure users still have this information. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-10-06 16:02:23 +01:00
Edward Cree	873e857950	sfc: implement basic per-queue stats Just RX and TX packet counts and TX bytes for now. We do not have per-queue RX byte counts, which causes us to fail stats.pkt_byte_sum selftest with "Drivers should always report basic keys" error. Per-queue counts are since the last time the queue was inited (typically by efx_start_datapath(), on ifup or reconfiguration); device-wide total (efx_get_base_stats()) is since driver probe. This is not the same lifetime as rtnl_link_stats64, which uses firmware stats which count since FW (re)booted; this can cause a "Qstats are lower" or "RTNL stats are lower" failure in stats.pkt_byte_sum selftest. Move the increment of rx_queue->rx_packets to match the semantics specified for netdev per-queue stats, i.e. just before handing the packet to XDP (if present) or the netstack (through GRO). This will affect the existing ethtool -S output which also reports these counters. XDP TX packets are not yet counted into base_stats. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-10-06 16:02:23 +01:00
Edward Cree	65131ea8d3	sfc: remove obsolete counters from struct efx_channel The n_rx_tobe_disc and n_rx_mcast_mismatch counters are a legacy from farch, and are never written in EF10 or EF100 code. Remove them from the struct and from ethtool -S output, saving a bit of memory and avoiding user confusion. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-10-06 16:02:22 +01:00
Jakub Kicinski	d521db38f3	Merge branch 'net-switch-back-to-struct-platform_driver-remove' Uwe Kleine-König says: ==================== net: Switch back to struct platform_driver::remove() I already sent a patch last week that is very similar to patch #1 of this series. However the previous submission was based on plain next. I was asked to resend based on net-next once the merge window closed, so here comes this v2. The additional patches address drivers/net/dsa, drivers/net/mdio and the rest of drivers/net apart from wireless which has its own tree and will addressed separately at a later point in time. ==================== Link: https://patch.msgid.link/cover.1727949050.git.u.kleine-koenig@baylibre.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 16:40:49 -07:00
Uwe Kleine-König	46e338bbd7	net: Switch back to struct platform_driver::remove() After commit `0edb555a65` ("platform: Make platform_driver::remove() return void") .remove() is (again) the right callback to implement for platform drivers. Convert all platform drivers below drivers/net after the previous conversion commits apart from the wireless drivers to use .remove(), with the eventual goal to drop struct platform_driver::remove_new(). As .remove() and .remove_new() have the same prototypes, conversion is done by just changing the structure member name in the driver initializer. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 16:39:57 -07:00
Uwe Kleine-König	a208a39ed0	net: mdio: Switch back to struct platform_driver::remove() After commit `0edb555a65` ("platform: Make platform_driver::remove() return void") .remove() is (again) the right callback to implement for platform drivers. Convert all platform drivers below drivers/net/mdio to use .remove(), with the eventual goal to drop struct platform_driver::remove_new(). As .remove() and .remove_new() have the same prototypes, conversion is done by just changing the structure member name in the driver initializer. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/0b60d8bfc45a3de8193f953794dda241e11032a9.1727949050.git.u.kleine-koenig@baylibre.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 16:39:57 -07:00
Uwe Kleine-König	4818016ded	net: dsa: Switch back to struct platform_driver::remove() After commit `0edb555a65` ("platform: Make platform_driver::remove() return void") .remove() is (again) the right callback to implement for platform drivers. Convert all platform drivers below drivers/net/dsa to use .remove(), with the eventual goal to drop struct platform_driver::remove_new(). As .remove() and .remove_new() have the same prototypes, conversion is done by just changing the structure member name in the driver initializer. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/36da477cb9fa0bffec32d50c2cf3d18e94a0e7e3.1727949050.git.u.kleine-koenig@baylibre.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 16:39:57 -07:00
Uwe Kleine-König	e96321fad3	net: ethernet: Switch back to struct platform_driver::remove() After commit `0edb555a65` ("platform: Make platform_driver::remove() return void") .remove() is (again) the right callback to implement for platform drivers. Convert all platform drivers below drivers/net/ethernet to use .remove(), with the eventual goal to drop struct platform_driver::remove_new(). As .remove() and .remove_new() have the same prototypes, conversion is done by just changing the structure member name in the driver initializer. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://patch.msgid.link/18f7c585a1a8a8ac8b03a2fca7de19bd5c52ac2b.1727949050.git.u.kleine-koenig@baylibre.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 16:39:56 -07:00
Sam Edwards	41378cfdc4	net: dsa: bcm_sf2: fix crossbar port bitwidth logic The SF2 crossbar register is a packed bitfield, giving the index of the external port selected for each of the internal ports. On BCM4908 (the only currently-supported switch family with a crossbar), there are 2 internal ports and 3 external ports, so there are 2 bits per internal port. The driver currently conflates the "bits per port" and "number of ports" concepts, lumping both into the `num_crossbar_int_ports` field. Since it is currently only possible for either of these counts to have a value of 2, there is no behavioral error resulting from this situation for now. Make the code more readable (and support the future possibility of larger crossbars) by adding a `num_crossbar_ext_bits` field to represent the "bits per port" count and relying on this where appropriate instead. Signed-off-by: Sam Edwards <CFSworks@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20241003212301.1339647-1-CFSworks@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 16:16:15 -07:00
Jakub Kicinski	0da7fb3bca	Merge branch 'net-prepare-pacing-offload-support' Eric Dumazet says: ==================== net: prepare pacing offload support Some network devices have the ability to offload EDT (Earliest Departure Time) which is the model used for TCP pacing and FQ packet scheduler. Some of them implement the timing wheel mechanism described in https://saeed.github.io/files/carousel-sigcomm17.pdf with an associated 'timing wheel horizon'. In order to upstream the NIC support, this series adds : 1) timing wheel horizon as a per-device attribute. 2) FQ packet scheduler support, to let paced packets below the timing wheel horizon be handled by the driver. v1: https://lore.kernel.org/20240930152304.472767-2-edumazet@google.com ==================== Link: https://patch.msgid.link/20241003121219.2396589-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 15:37:58 -07:00
Jeffrey Ji	f26080d470	net_sched: sch_fq: add the ability to offload pacing Some network devices have the ability to offload EDT (Earliest Departure Time) which is the model used for TCP pacing and FQ packet scheduler. Some of them implement the timing wheel mechanism described in https://saeed.github.io/files/carousel-sigcomm17.pdf with an associated 'timing wheel horizon'. This patchs adds to FQ packet scheduler TCA_FQ_OFFLOAD_HORIZON attribute. Its value is capped by the device max_pacing_offload_horizon, added in the prior patch. It allows FQ to let packets within pacing offload horizon to be delivered to the device, which will handle the needed delay without host involvement. Signed-off-by: Jeffrey Ji <jeffreyji@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20241003121219.2396589-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 15:37:54 -07:00
Eric Dumazet	f858cc9eed	net: add IFLA_MAX_PACING_OFFLOAD_HORIZON device attribute Some network devices have the ability to offload EDT (Earliest Departure Time) which is the model used for TCP pacing and FQ packet scheduler. Some of them implement the timing wheel mechanism described in https://saeed.github.io/files/carousel-sigcomm17.pdf with an associated 'timing wheel horizon'. This patch adds dev->max_pacing_offload_horizon expressing this timing wheel horizon in nsec units. This is a read-only attribute. Unless a driver sets it, dev->max_pacing_offload_horizon is zero. v2: addressed Jakub feedback ( https://lore.kernel.org/netdev/20240930152304.472767-2-edumazet@google.com/T/#mf6294d714c41cc459962154cc2580ce3c9693663 ) v3: added yaml doc (also per Jakub feedback) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20241003121219.2396589-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 15:37:53 -07:00
Mahesh Bandewar	3d07b691ee	selftest/ptp: update ptp selftest to exercise the gettimex options With the inclusion of commit `c259acab83` ("ptp/ioctl: support MONOTONIC{,_RAW} timestamps for PTP_SYS_OFFSET_EXTENDED") clock_gettime() now allows retrieval of pre/post timestamps for CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW timebases along with the previously supported CLOCK_REALTIME. This patch adds a command line option 'y' to the testptp program to choose one of the allowed timebases [realtime aka system, monotonic, and monotonic-raw). Signed-off-by: Mahesh Bandewar <maheshb@google.com> Cc: Shuah Khan <shuah@kernel.org> Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://patch.msgid.link/20241003101506.769418-1-maheshb@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 15:36:43 -07:00
Jakub Kicinski	2f65168355	Merge branch 'tcp-add-fast-path-in-timer-handlers' Eric Dumazet says: ==================== tcp: add fast path in timer handlers As mentioned in Netconf 2024: TCP retransmit and delack timers are not stopped from inet_csk_clear_xmit_timer() because we do not define INET_CSK_CLEAR_TIMERS. Enabling INET_CSK_CLEAR_TIMERS leads to lower performance, mainly because del_timer() and mod_timer() happen from different cpus quite often. What we can do instead is to add fast paths to tcp_write_timer() and tcp_delack_timer() to avoid socket spinlock acquisition. ==================== Link: https://patch.msgid.link/20241002173042.917928-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 15:34:43 -07:00
Eric Dumazet	81df4fa94e	tcp: add a fast path in tcp_delack_timer() delack timer is not stopped from inet_csk_clear_xmit_timer() because we do not define INET_CSK_CLEAR_TIMERS. This is a conscious choice : inet_csk_clear_xmit_timer() is often called from another cpu. Calling del_timer() would cause false sharing and lock contention. This means that very often, tcp_delack_timer() is called at the timer expiration, while there is no ACK to transmit. This can be detected very early, avoiding the socket spinlock. Notes: - test about tp->compressed_ack is racy, but in the unlikely case there is a race, the dedicated compressed_ack_timer hrtimer would close it. - Even if the fast path is not taken, reading icsk->icsk_ack.pending and tp->compressed_ack before acquiring the socket spinlock reduces acquisition time and chances of contention. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241002173042.917928-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 15:34:40 -07:00
Eric Dumazet	3b78429301	tcp: add a fast path in tcp_write_timer() retransmit timer is not stopped from inet_csk_clear_xmit_timer() because we do not define INET_CSK_CLEAR_TIMERS. This is a conscious choice : for active TCP flows, it is better to only call mod_timer(), because there is more chances of keeping the timer unchanged. Also inet_csk_clear_xmit_timer() is often called from another cpu, and calling del_timer() would cause false sharing and lock contention. This means that very often, tcp_write_timer() is called at the timer expiration, while there is nothing to retransmit. This can be detected very early, avoiding the socket spinlock. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241002173042.917928-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 15:34:39 -07:00
Eric Dumazet	5a9071a760	tcp: annotate data-races around icsk->icsk_pending icsk->icsk_pending can be read locklessly already. Following patch in the series will add another lockless read. Add smp_load_acquire() and smp_store_release() annotations because following patch will add a test in tcp_write_timer(), and READ_ONCE()/WRITE_ONCE() alone would possibly lead to races. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241002173042.917928-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 15:34:39 -07:00
Jakub Kicinski	d454184bba	Merge branch 'selftests-net-ioam-add-tunsrc-support' Justin Iurman says: ==================== selftests: net: ioam: add tunsrc support TL;DR This patch comes from a discussion we had with Jakub and Paolo on aligning the ioam selftests with its new "tunsrc" feature. This patch updates the IOAM selftests to support the new "tunsrc" feature of IOAM. As a consequence, some changes were required. For example, the IPv6 header must be accessed to check some fields (i.e., the source address for the "tunsrc" feature), which is not possible AFAIK with IPv6 raw sockets. The latter is currently used with IPV6_RECVHOPOPTS and was introduced by commit `187bbb6968` ("selftests: ioam: refactoring to align with the fix") to fix an issue. But, we really need packet sockets actually... which is one of the changes in this patch (see the description of the topology at the top of ioam6.sh for explanations). Another change is that all IPv6 addresses used in the topology are now based on the documentation prefix (2001:db8::/32). Also, the tests have been improved and there are now many more of them. Overall, the script is more robust. ==================== Link: https://patch.msgid.link/20241002162731.19847-1-justin.iurman@uliege.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 15:34:11 -07:00
Justin Iurman	2d2b5028b4	selftests: net: add new ioam tests This patch re-adds the (updated) ioam selftests with support for the tunsrc feature. Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Link: https://patch.msgid.link/20241002162731.19847-3-justin.iurman@uliege.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 15:34:07 -07:00
Justin Iurman	897408d5e2	selftests: net: remove ioam tests This patch entirely removes the ioam selftests to prepare for the next patch in this series, which re-adds the new ioam selftests for better readability. Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Link: https://patch.msgid.link/20241002162731.19847-2-justin.iurman@uliege.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 15:34:07 -07:00
Linus Walleij	94a2a84f5e	net: dsa: mv88e6xxx: Support LED control This adds control over the hardware LEDs in the Marvell MV88E6xxx DSA switch and enables it for MV88E6352. This fixes an imminent problem on the Inteno XG6846 which has a WAN LED that simply do not work with hardware defaults: driver amendment is necessary. The patch is modeled after Christian Marangis LED support code for the QCA8k DSA switch, I got help with the register definitions from Tim Harvey. After this patch it is possible to activate hardware link indication like this (or with a similar script): cd /sys/class/leds/Marvell\ 88E6352:05:00:green:wan/ echo netdev > trigger echo 1 > link This makes the green link indicator come up on any link speed. It is also possible to be more elaborate, like this: cd /sys/class/leds/Marvell\ 88E6352:05:00:green:wan/ echo netdev > trigger echo 1 > link_1000 cd /sys/class/leds/Marvell\ 88E6352:05:01:amber:wan/ echo netdev > trigger echo 1 > link_100 Making the green LED come on for a gigabit link and the amber LED come on for a 100 mbit link. Each port has 2 LED slots (the hardware may use just one or none) and the hardware triggers are specified in four bits per LED, and some of the hardware triggers are only available on the SFP (fiber) uplink. The restrictions are described in the port.h header file where the registers are described. For example, selector 1 set for LED 1 on port 5 or 6 will indicate Fiber 1000 (gigabit) and activity with a blinking LED, but ONLY for an SFP connection. If port 5/6 is used with something not SFP, this selector is a noop: something else need to be selected. After the previous series rewriting the MV88E6xxx DT bindings to use YAML a "leds" subnode is already valid for each port, in my scratch device tree it looks like this: leds { #address-cells = <1>; #size-cells = <0>; led@0 { reg = <0>; color = <LED_COLOR_ID_GREEN>; function = LED_FUNCTION_LAN; default-state = "off"; linux,default-trigger = "netdev"; }; led@1 { reg = <1>; color = <LED_COLOR_ID_AMBER>; function = LED_FUNCTION_LAN; default-state = "off"; }; }; This DT config is not yet configuring everything: when the netdev default trigger is assigned the hw acceleration callbacks are not called, and there is no way to set the netdev sub-trigger type (such as link_1000) from the device tree, such as if you want a gigabit link indicator. This has to be done from userspace at this point. We add LED operations to all switches in the 6352 family: 6172, 6176, 6240 and 6352. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241001-mv88e6xxx-leds-v4-1-cc11c4f49b18@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 15:31:28 -07:00
Michael Kelley	c86ab60b92	hv_netvsc: Don't assume cpu_possible_mask is dense Current code allocates the pcpu_sum array with size num_possible_cpus(). This code assumes the cpu_possible_mask is dense, which is not true in the general case per [1]. If cpu_possible_mask is sparse, the array might be indexed by a value beyond the size of the array. However, the configurations that Hyper-V provides to guest VMs on x86 and ARM64 hardware, in combination with how architecture specific code assigns Linux CPU numbers, does always produce a dense cpu_possible_mask. So the dense assumption is not currently causing failures. But for robustness against future changes in how cpu_possible_mask is populated, update the code to no longer assume dense. The correct approach is to allocate and initialize the array using size "nr_cpu_ids". While this leaves unused array entries corresponding to holes in cpu_possible_mask, the holes are assumed to be minimal and hence the amount of memory wasted by unused entries is minimal. [1] https://lore.kernel.org/lkml/SN6PR02MB4157210CC36B2593F8572E5ED4692@SN6PR02MB4157.namprd02.prod.outlook.com/ Signed-off-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20241003035333.49261-6-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 13:09:20 -07:00
Daniel Zahka	5c2ab978f9	ethtool: rss: fix rss key initialization warning This warning is emitted when a driver does not default populate an rss key when one is not provided from userspace. Some devices do not support individual rss keys per context. For these devices, it is ok to leave the key zeroed out in ethtool_rxfh_context. Do not warn on zeroed key when ethtool_ops.rxfh_per_ctx_key == 0. Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20241003162310.1310576-1-daniel.zahka@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 12:34:13 -07:00
Jakub Kicinski	00110c5eeb	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-10-01 (ice) This series contains updates to ice driver only. Karol cleans up current PTP GPIO pin handling, fixes minor bugs, refactors implementation for all products, introduces SDP (Software Definable Pins) for E825C and implements reading SDP section from NVM for E810 products. Sergey replaces multiple aux buses and devices used in the PTP support code with struct ice_adapter holding the necessary shared data. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: Drop auxbus use for PTP to finalize ice_adapter move ice: Use ice_adapter for PTP shared data instead of auxdev ice: Initial support for E825C hardware in ice_adapter ice: Add ice_get_ctrl_ptp() wrapper to simplify the code ice: Introduce ice_get_phy_model() wrapper ice: Enable 1PPS out from CGU for E825C products ice: Read SDP section from NVM for pin definitions ice: Disable shared pin on E810 on setfunc ice: Cache perout/extts requests and check flags ice: Align E810T GPIO to other products ice: Add SDPs support for E825C ice: Implement ice_ptp_pin_desc ==================== Link: https://patch.msgid.link/20241001201702.3252954-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 12:30:18 -07:00
Jakub Kicinski	a73f214e89	Merge branch 'add-option-to-provide-opt_id-value-via-cmsg' Vadim Fedorenko says: ==================== Add option to provide OPT_ID value via cmsg SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX timestamps and packets sent via socket. Unfortunately, there is no way to reliably predict socket timestamp ID value in case of error returned by sendmsg. For UDP sockets it's impossible because of lockless nature of UDP transmit, several threads may send packets in parallel. In case of RAW sockets MSG_MORE option makes things complicated. More details are in the conversation [1]. This patch adds new control message type to give user-space software an opportunity to control the mapping between packets and values by providing ID with each sendmsg. The first patch in the series adds all needed definitions and implements the function for UDP sockets. The explicit check of socket's type is not added because subsequent patches in the series will add support for other types of sockets. The documentation is also included into the first patch. Patch 2/4 adds support for TCP sockets. This part is simple and straight forward. Patch 3/4 adds support for RAW sockets. It's a bit tricky because sock_tx_timestamp functions has to be refactored to receive full socket cookie information to fill in ID. The commit `b534dc46c8` ("net_tstamp: add SOF_TIMESTAMPING_OPT_ID_TCP") did the conversion of sk_tsflags to u32 but sock_tx_timestamp functions were not converted and still receive 16b flags. It wasn't a problem because SOF_TIMESTAMPING_OPT_ID_TCP was not checked in these functions, that's why no backporting is needed. Patch 4/4 adds selftests for new feature. [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/ ==================== Link: https://patch.msgid.link/20241001125716.2832769-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:52:23 -07:00
Vadim Fedorenko	a89568e9be	selftests: txtimestamp: add SCM_TS_OPT_ID test Extend txtimestamp test to run with fixed tskey using SCM_TS_OPT_ID control message for all types of sockets. Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Link: https://patch.msgid.link/20241001125716.2832769-4-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:52:20 -07:00
Vadim Fedorenko	822b5bc6db	net_tstamp: add SCM_TS_OPT_ID for RAW sockets The last type of sockets which supports SOF_TIMESTAMPING_OPT_ID is RAW sockets. To add new option this patch converts all callers (direct and indirect) of _sock_tx_timestamp to provide sockcm_cookie instead of tsflags. And while here fix __sock_tx_timestamp to receive tsflags as __u32 instead of __u16. Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Link: https://patch.msgid.link/20241001125716.2832769-3-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:52:19 -07:00
Vadim Fedorenko	4aecca4c76	net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX timestamps and packets sent via socket. Unfortunately, there is no way to reliably predict socket timestamp ID value in case of error returned by sendmsg. For UDP sockets it's impossible because of lockless nature of UDP transmit, several threads may send packets in parallel. In case of RAW sockets MSG_MORE option makes things complicated. More details are in the conversation [1]. This patch adds new control message type to give user-space software an opportunity to control the mapping between packets and values by providing ID with each sendmsg for UDP sockets. The documentation is also added in this patch. [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/ Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Link: https://patch.msgid.link/20241001125716.2832769-2-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:52:19 -07:00
Jakub Kicinski	34ea1df802	Merge branch 'net-mlx5-hw-counters-refactor' Tariq Toukan says: ==================== net/mlx5: hw counters refactor This is a patchset re-post, see: https://lore.kernel.org/20240815054656.2210494-7-tariqt@nvidia.com In this patchset, Cosmin refactors hw counters and solves perf scaling issue. Series generated against: commit `c824deb1a8` ("cxgb4: clip_tbl: Fix spelling mistake "wont" -> "won't"") HW counters are central to mlx5 driver operations. They are hardware objects created and used alongside most steering operations, and queried from a variety of places. Most counters are queried in bulk from a periodic task in fs_counters.c. Counter performance is important and as such, a variety of improvements have been done over the years. Currently, counters are allocated from pools, which are bulk allocated to amortize the cost of firmware commands. Counters are managed through an IDR, a doubly linked list and two atomic single linked lists. Adding/removing counters is a complex dance between user contexts requesting it and the mlx5_fc_stats_work task which does most of the work. Under high load (e.g. from connection tracking flow insertion/deletion), the counter code becomes a bottleneck, as seen on flame graphs. Whenever a counter is deleted, it gets added to a list and the wq task is scheduled to run immediately to actually delete it. This is done via mod_delayed_work which uses an internal spinlock. In some tests, waiting for this spinlock took up to 66% of all samples. This series refactors the counter code to use a more straight-forward approach, avoiding the mod_delayed_work problem and making the code easier to understand. For that: - patch #1 moves counters data structs to a more appropriate place. - patch #2 simplifies the bulk query allocation scheme by using vmalloc. - patch #3 replaces the IDR+3 lists with an xarray. This is the main patch of the series, solving the spinlock congestion issue. - patch #4 removes an unnecessary cacheline alignment causing a lot of memory to be wasted. - patches #5 and #6 are small cleanups enabled by the refactoring. ==================== Link: https://patch.msgid.link/20241001103709.58127-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:33:48 -07:00
Cosmin Ratiu	d1c9cffe4b	net/mlx5: hw counters: Remove mlx5_fc_create_ex It no longer serves any purpose and is identical to mlx5_fc_create upon which it was originally based of. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241001103709.58127-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:33:47 -07:00
Cosmin Ratiu	4a67ebf85f	net/mlx5: hw counters: Don't maintain a counter count num_counters is only used for deciding whether to grow the bulk query buffer, which is done once more counters than a small initial threshold are present. After that, maintaining num_counters serves no purpose. This commit replaces that with an actual xarray traversal to count the counters. This appears expensive at first sight, but is only done when the number of counters is less than the initial threshold (8) and only once every sampling interval. Once the number of counters goes above the threshold, the bulk query buffer is grown to max size and the xarray traversal is never done again. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241001103709.58127-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:33:46 -07:00
Cosmin Ratiu	d95f77f119	net/mlx5: hw counters: Drop unneeded cacheline alignment The mlx5_fc struct has a cache for values queried from hw, which is cacheline aligned. On x86_64, this results in: struct mlx5_fc { u32 id; /* 0 4 / bool aging; / 4 1 / / XXX 3 bytes hole, try to pack / struct mlx5_fc_bulk bulk; /* 8 8 / / XXX 48 bytes hole, try to pack / / --- cacheline 1 boundary (64 bytes) --- / struct mlx5_fc_cache cache __attribute__((__aligned__(64))); / 64 24 / u64 lastpackets; / 88 8 / u64 lastbytes; / 96 8 / / size: 128, cachelines: 2, members: 6 / / sum members: 53, holes: 2, sum holes: 51 / / padding: 24 / / forced aligns: 1, forced holes: 1, sum forced holes: 48 / } __attribute__((__aligned__(64))); (output from pahole). ...So a 48+24=72 byte waste. As far as I can determine, this serves no purpose other than maybe making sure that the values in the cache do not span two cachelines in the worst case scenario, but that's not a valid enough reason to waste 72 bytes per counter, especially since this code is not performance-critical. There could potentially be hundreds of thousands of counters (e.g. for connection-tracking), so this quickly adds up to multiple MB wasted. This commit removes the alignment, resulting in: struct mlx5_fc { [...] / size: 56, cachelines: 1, members: 6 / / sum members: 53, holes: 1, sum holes: 3 / / last cacheline: 56 bytes */ }; Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241001103709.58127-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 11:33:46 -07:00

... 2 3 4 5 6 ...

1309164 Commits