linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-25 13:14:07 +08:00

Author	SHA1	Message	Date
Coco Li	5eddb24901	gro: add support of (hw)gro packets to gro stack Current GRO stack only supports incoming packets containing one frame/MSS. This patch changes GRO to accept packets that are already GRO. HW-GRO (aka RSC for some vendors) is very often limited in presence of interleaved packets. Linux SW GRO stack can complete the job and provide larger GRO packets, thus reducing rate of ACK packets and cpu overhead. This also means BIG TCP can still be used, even if HW-GRO/RSC was able to cook ~64 KB GRO packets. v2: fix logic in tcp_gro_receive() Only support TCP for the moment (Paolo) Co-Developed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Coco Li <lixiaoyan@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 12:38:34 +01:00
David S. Miller	197060c155	Merge branch 'mptcp-fastclose' Mat Martineau says: ==================== mptcp: Fastclose edge cases and error handling MPTCP has existing code to use the MP_FASTCLOSE option header, which works like a RST for the MPTCP-level connection (regular RSTs only affect specific subflows in MPTCP). This series has some improvements for fastclose. Patch 1 aligns fastclose socket error handling with TCP RST behavior on TCP sockets. Patch 2 adds use of MP_FASTCLOSE in some more edge cases, like file descriptor close, FIN_WAIT timeout, and when the socket has unread data. Patch 3 updates the fastclose self tests. Patch 4 does not change any code, just fixes some outdated comments. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 11:18:53 +01:00
Paolo Abeni	d89e3ed76b	mptcp: update misleading comments. The MPTCP data path is quite complex and hard to understend even without some foggy comments referring to modified code and/or completely misleading from the beginning. Update a few of them to more accurately describing the current status. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 11:18:53 +01:00
Paolo Abeni	6bf41020b7	selftests: mptcp: update and extend fastclose test-cases After the previous patches, the MPTCP protocol can generate fast-closes on both ends of the connection. Rework the relevant test-case to carefully trigger the fast-close code-path on a single end at the time, while ensuring than a predictable amount of data is spooled on both ends. Additionally add another test-cases for the passive socket fast-close. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 11:18:53 +01:00
Paolo Abeni	d21f834855	mptcp: use fastclose on more edge scenarios Daire reported a user-space application hang-up when the peer is forcibly closed before the data transfer completion. The relevant application expects the peer to either do an application-level clean shutdown or a transport-level connection reset. We can accommodate a such user by extending the fastclose usage: at fd close time, if the msk socket has some unread data, and at FIN_WAIT timeout. Note that at MPTCP close time we must ensure that the TCP subflows will reset: set the linger socket option to a suitable value. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 11:18:53 +01:00
Paolo Abeni	69800e516e	mptcp: propagate fastclose error When an mptcp socket is closed due to an incoming FASTCLOSE option, so specific sk_err is set and later syscall will fail usually with EPIPE. Align the current fastclose error handling with TCP reset, properly setting the socket error according to the current msk state and propagating such error. Additionally sendmsg() is currently not handling properly the sk_err, always returning EPIPE. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 11:18:53 +01:00
David S. Miller	7171e8a1a4	Merge branch 'RollBall-Hilink-Turris-10G-copper-SFP-support' Marek Behún says: ==================== RollBall / Hilink / Turris 10G copper SFP support I am resurrecting my attempt to add support for RollBall / Hilink / Turris 10G copper SFPs modules. The modules contain Marvell 88X3310 PHY, which can communicate with the system via sgmii, 2500base-x, 5gbase-r, 10gbase-r or usxgmii mode. Some of the patches I've taken from Russell King's net-queue [1] (with some rebasing). The important change from my previous attempts are: - I am including the changes needed to phylink and marvell10g driver, so that the 88X3310 PHY is configured to use PHY modes supported by the host (the PHY defaults to use 10gbase-r only on host's side) - I have changed the patch that informs phylib about the interfaces supported by the host (patch 5 of this series): it now fills in the phydev->host_interfaces member only when connecting a PHY that is inside a SFP module. This may change in the future. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 11:08:33 +01:00
Marek Behún	324e88cbe3	net: sfp: add support for multigig RollBall transceivers This adds support for multigig copper SFP modules from RollBall/Hilink. These modules have a specific way to access clause 45 registers of the internal PHY. We also need to wait at least 22 seconds after deasserting TX disable before accessing the PHY. The code waits for 25 seconds just to be sure. Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 11:08:33 +01:00
Marek Behún	09bbedac72	net: phy: mdio-i2c: support I2C MDIO protocol for RollBall SFP modules Some multigig SFPs from RollBall and Hilink do not expose functional MDIO access to the internal PHY of the SFP via I2C address 0x56 (although there seems to be read-only clause 22 access on this address). Instead these SFPs PHY can be accessed via I2C via the SFP Enhanced Digital Diagnostic Interface - I2C address 0x51. The SFP_PAGE has to be selected to 3 and the password must be filled with 0xff bytes for this PHY communication to work. This extends the mdio-i2c driver to support this protocol by adding a special parameter to mdio_i2c_alloc function via which this RollBall protocol can be selected. Signed-off-by: Marek Behún <kabel@kernel.org> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 11:08:33 +01:00
Marek Behún	e85b1347ac	net: sfp: create/destroy I2C mdiobus before PHY probe/after PHY release Instead of configuring the I2C mdiobus when SFP driver is probed, create/destroy the mdiobus before the PHY is probed for/after it is released. This way we can tell the mdio-i2c code which protocol to use for each SFP transceiver. Move the code that determines MDIO I2C protocol from sfp_sm_probe_for_phy() to sfp_sm_mod_probe(), where most of the SFP ID parsing is done. Don't allocate I2C bus if no PHY is expected. Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 11:08:33 +01:00
Marek Behún	13c8adcf22	net: sfp: Add and use macros for SFP quirks definitions Add macros SFP_QUIRK(), SFP_QUIRK_M() and SFP_QUIRK_F() for defining SFP quirk table entries. Use them to deduplicate the code a little bit. Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 11:08:33 +01:00
Marek Behún	31eb8907aa	net: phylink: allow attaching phy for SFP modules on 802.3z mode Some SFPs may contain an internal PHY which may in some cases want to connect with the host interface in 1000base-x/2500base-x mode. Do not fail if such PHY is being attached in one of these PHY interface modes. Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Pali Rohár <pali@kernel.org> Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 11:08:33 +01:00
Russell King	d6d2929264	net: phy: marvell10g: select host interface configuration Select the host interface configuration according to the capabilities of the host if the host provided them. This is currently provided only when connecting PHY that is inside a SFP. The PHY supports several configurations of host communication: - always communicate with host in 10gbase-r, even if copper speed is lower (rate matching mode), - the same as above but use xaui/rxaui instead of 10gbase-r, - switch host SerDes mode between 10gbase-r, 5gbase-r, 2500base-x and sgmii according to copper speed, - the same as above but use xaui/rxaui instead of 10gbase-r. This mode of host communication, called MACTYPE, is by default selected by strapping pins, but it can be changed in software. This adds support for selecting this mode according to which modes are supported by the host. This allows the kernel to: - support SFP modules with 88X33X0 or 88E21X0 inside them Note: we use mv3310_select_mactype() for both 88X3310 and 88X3340, although 88X3340 does not support XAUI. This is not a problem because 88X3340 does not declare XAUI in it's supported_interfaces, and so this function will never choose that MACTYPE. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> [ rebase, updated, also added support for 88E21X0 ] Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 11:08:32 +01:00
Marek Behún	3891569b2f	net: phy: marvell10g: Use tabs instead of spaces for indentation Some register definitions were defined with spaces used for indentation. Change them to tabs. Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 11:08:32 +01:00
Marek Behún	eca68a3c7d	net: phylink: pass supported host PHY interface modes to phylib for SFP's PHYs Pass the supported PHY interface types to phylib if the PHY we are connecting is inside a SFP, so that the PHY driver can select an appropriate host configuration mode for their interface according to the host capabilities. For example the Marvell 88X3310 PHY inside RollBall SFP modules defaults to 10gbase-r mode on host's side, and the marvell10g driver currently does not change this setting. But a host may not support 10gbase-r. For example Turris Omnia only supports sgmii, 1000base-x and 2500base-x modes. The PHY can be configured to use those modes, but in order for the PHY driver to do that, it needs to know which modes are supported. Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 11:08:32 +01:00
Russell King (Oracle)	e60846370c	net: phylink: rename phylink_sfp_config() phylink_sfp_config() now only deals with configuring the MAC for a SFP containing a PHY. Rename it to be specific. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 11:08:32 +01:00
Russell King	f81fa96d8a	net: phylink: use phy_interface_t bitmaps for optical modules Where a MAC provides a phy_interface_t bitmap, use these bitmaps to select the operating interface mode for optical SFP modules, rather than using the linkmode bitmaps. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 11:08:32 +01:00
Russell King	fd580c9830	net: sfp: augment SFP parsing with phy_interface_t bitmap We currently parse the SFP EEPROM to a bitmap of ethtool link modes, and then attempt to convert the link modes to a PHY interface mode. While this works at present, there are cases where this is sub-optimal. For example, where a module can operate with several different PHY interface modes. To start addressing this, arrange for the SFP EEPROM parsing to also provide a bitmap of the possible PHY interface modes. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 11:08:32 +01:00
Russell King (Oracle)	1645f44dd5	net: phylink: add ability to validate a set of interface modes Rather than having the ability to validate all supported interface modes or a single interface mode, introduce the ability to validate a subset of supported modes. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [ rebased on current net-next ] Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 11:08:32 +01:00
David S. Miller	3735264d72	Merge branch 'ip_tunnel-netlink-parms' Liu Jian says: ==================== Add helper functions to parse netlink msg of ip_tunnel v1->v2: Move the implementation of the helper function to ip_tunnel_core.c v2->v3: Change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 07:59:06 +01:00
Liu Jian	b86fca800a	net: Add helper function to parse netlink msg of ip_tunnel_parm Add ip_tunnel_netlink_parms to parse netlink msg of ip_tunnel_parm. Reduces duplicate code, no actual functional changes. Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 07:59:06 +01:00
Liu Jian	537dd2d9fb	net: Add helper function to parse netlink msg of ip_tunnel_encap Add ip_tunnel_netlink_encap_parms to parse netlink msg of ip_tunnel_encap. Reduces duplicate code, no actual functional changes. Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 07:59:06 +01:00
David S. Miller	42e8e6d906	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== 1) Refactor selftests to use an array of structs in xfrm_fill_key(). From Gautam Menghani. 2) Drop an unused argument from xfrm_policy_match. From Hongbin Wang. 3) Support collect metadata mode for xfrm interfaces. From Eyal Birger. 4) Add netlink extack support to xfrm. From Sabrina Dubroca. Please note, there is a merge conflict in: include/net/dst_metadata.h between commit: `0a28bfd497` ("net/macsec: Add MACsec skb_metadata_dst Tx Data path support") from the net-next tree and commit: `5182a5d48c` ("net: allow storing xfrm interface metadata in metadata_dst") from the ipsec-next tree. Can be solved as done in linux-next. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-03 07:52:13 +01:00
David S. Miller	9d43507319	Merge branch 'tc-bind_class-hook' Zhengchao Shao says: ==================== refactor duplicate codes in bind_class hook function All the bind_class callback duplicate the same logic, so we can refactor them. First, ensure n arg not empty before call bind_class hook function. Then, add tc_cls_bind_class() helper. Last, use tc_cls_bind_class() in filter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-02 16:07:17 +01:00
Zhengchao Shao	cc9039a134	net: sched: use tc_cls_bind_class() in filter Use tc_cls_bind_class() in filter. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-02 16:07:17 +01:00
Zhengchao Shao	402963e34a	net: sched: cls_api: introduce tc_cls_bind_class() helper All the bind_class callback duplicate the same logic, this patch introduces tc_cls_bind_class() helper for common usage. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-02 16:07:17 +01:00
Zhengchao Shao	4e6263ec8b	net: sched: ensure n arg not empty before call bind_class All bind_class callbacks are directly returned when n arg is empty. Therefore, bind_class is invoked only when n arg is not empty. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-10-02 16:07:17 +01:00
Jakub Kicinski	bc37b24ee0	Merge branch 'mlx5-xsk-updates-part3-2022-09-30' Saeed Mahameed says: ==================== mlx5 xsk updates part3 2022-09-30 The gist of this 4 part series is in this patchset's last patch This series contains performance optimizations. XSK starts using the batching allocator, and XSK data path gets separated from the regular RX, allowing to drop some branches not relevant for non-XSK use cases. Some minor optimizations for indirect calls and need_wakeup are also included. Other than that, this series adds a few features to the mlx5e implementation of XSK: 1. XDP metadata support on XSK RQs. 2. RSS contexts support for XSK RQs. 3. Some other optimizations 4. Last but not least, change the queuing scheme, so that XSK RQs no longer use higher indices, but replace the regular RQs. Maxim Says: ========== In the initial implementation of XSK in mlx5e, XSK RQs coexisted with regular RQs in the same channel. The main idea was to allow RSS work the same for regular traffic, without need to reconfigure RSS to exclude XSK queues. However, this scheme didn't prove to be beneficial, mainly because of incompatibility with other vendors. Some tools don't properly support using higher indices for XSK queues, some tools get confused with the double amount of RQs exposed in sysfs. Some use cases are purely XSK, and allocating the same amount of unused regular RQs is a waste of resources. This commit changes the queuing scheme to the standard one, where XSK RQs replace regular RQs on the channels where XSK sockets are open. Two RQs still exist in the channel to allow failsafe disable of XSK, but only one is exposed at a time. The next commit will achieve the desired memory save by flushing the buffers when the regular RQ is unused. As the result of this transition: 1. It's possible to use RSS contexts over XSK RQs. 2. It's possible to dedicate all queues to XSK. 3. When XSK RQs coexist with regular RQs, the admin should make sure no unwanted traffic goes into XSK RQs by either excluding them from RSS or settings up the XDP program to return XDP_PASS for non-XSK traffic. 4. When using a mixed fleet of mlx5e devices and other netdevs, the same configuration can be applied. If the application supports the fallback to copy mode on unsupported drivers, it will work too. ========== Part 4 will include some final xsk optimizations and minor improvements part 1: https://lore.kernel.org/netdev/20220927203611.244301-1-saeed@kernel.org/ part 2: https://lore.kernel.org/netdev/20220929072156.93299-1-saeed@kernel.org/ ==================== Link: https://lore.kernel.org/r/20220930162903.62262-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-01 13:30:26 -07:00
Maxim Mikityanskiy	3db4c85cde	net/mlx5e: xsk: Use queue indices starting from 0 for XSK queues In the initial implementation of XSK in mlx5e, XSK RQs coexisted with regular RQs in the same channel. The main idea was to allow RSS work the same for regular traffic, without need to reconfigure RSS to exclude XSK queues. However, this scheme didn't prove to be beneficial, mainly because of incompatibility with other vendors. Some tools don't properly support using higher indices for XSK queues, some tools get confused with the double amount of RQs exposed in sysfs. Some use cases are purely XSK, and allocating the same amount of unused regular RQs is a waste of resources. This commit changes the queuing scheme to the standard one, where XSK RQs replace regular RQs on the channels where XSK sockets are open. Two RQs still exist in the channel to allow failsafe disable of XSK, but only one is exposed at a time. The next commit will achieve the desired memory save by flushing the buffers when the regular RQ is unused. As the result of this transition: 1. It's possible to use RSS contexts over XSK RQs. 2. It's possible to dedicate all queues to XSK. 3. When XSK RQs coexist with regular RQs, the admin should make sure no unwanted traffic goes into XSK RQs by either excluding them from RSS or settings up the XDP program to return XDP_PASS for non-XSK traffic. 4. When using a mixed fleet of mlx5e devices and other netdevs, the same configuration can be applied. If the application supports the fallback to copy mode on unsupported drivers, it will work too. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-01 13:30:21 -07:00
Maxim Mikityanskiy	d9ba64deb2	net/mlx5e: Introduce the mlx5e_flush_rq function Add a function to flush an RQ: clean up descriptors, release pages and reset the RQ. This procedure is used by the recovery flow, and it will also be used in a following commit to free some memory when switching a channel to the XSK mode. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-01 13:30:21 -07:00
Maxim Mikityanskiy	a752b2edb5	net/mlx5e: xsk: Support XDP metadata on XSK RQs Add support for XDP metadata on XSK RQs for cross-program communication. The driver no longer calls xdp_set_data_meta_invalid and copies the metadata to a newly allocated SKB on XDP_PASS. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-01 13:30:21 -07:00
Maxim Mikityanskiy	ddb7afeee2	net/mlx5e: Optimize RQ page deallocation mlx5e_free_rx_mpwqe loops over all pages of a MPWQE, calling mlx5e_page_release for ones that are not scheduled for XDP_TX or XDP_REDIRECT; and mlx5e_page_release checks whether it's an XSK RQ or a regular one for each page/XSK frame. This check can be moved outside the loop to reduce the number of branches. mlx5e_free_rx_wqe loops over all fragments, calling mlx5e_page_release for the ones that are last in a page; and mlx5e_page_release checks whether it's an XSK RQ or a regular one for each fragment. Using the fact that XSK doesn't support multiple fragments, it can be optimized for both XSK and regular usages: 1. Make an early check for XSK and call its deallocator directly, saving 3 branches (loop condition, frag->last_in_page and selection of deallocator). 2. Call the regular deallocator directly in the non-XSK case, saving a branch per fragment, except the first one. After the changes, mlx5e_page_release is removed, as there are no callers left. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-01 13:30:21 -07:00
Maxim Mikityanskiy	96d37d861a	net/mlx5e: Call mlx5e_page_release_dynamic directly where possible mlx5e_page_release calls the appropriate deallocator depending on whether it's an XSK RQ or a regular one. Some flows that call this function are not compatible with XSK, so they can call the non-XSK deallocator directly to save a branch. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-01 13:30:21 -07:00
Maxim Mikityanskiy	132857d912	net/mlx5e: Use non-XSK page allocator in SHAMPO The SHAMPO flow is not compatible with XSK, it can call the page pool allocator directly to save a branch. mlx5e_page_alloc is removed, as it's no longer used in any flow. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-01 13:30:20 -07:00
Maxim Mikityanskiy	cf544517c4	net/mlx5e: xsk: Use xsk_buff_alloc_batch on striding RQ XSK provides a function to allocate frames in batches for more efficient processing. This commit starts using this function on striding RQ and creates an optimized flow for XSK. A side effect is an opportunity to optimize the regular RX flow by dropping branching for XSK cases. Performance improvement is up to 6.4% in the aligned mode and up to 7.5% in the unaligned mode. Aligned mode, 2048-byte frames: 12.9 Mpps -> 13.8 Mpps Aligned mode, 4096-byte frames: 11.8 Mpps -> 12.5 Mpps Unaligned mode, 2048-byte frames: 11.9 Mpps -> 12.8 Mpps Unaligned mode, 3072-byte frames: 11.4 Mpps -> 12.1 Mpps Unaligned mode, 4096-byte frames: 11.0 Mpps -> 11.2 Mpps CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-01 13:30:20 -07:00
Maxim Mikityanskiy	259bbc6436	net/mlx5e: xsk: Use xsk_buff_alloc_batch on legacy RQ XSK provides a function to allocate frames in batches for more efficient processing. This commit starts using this function on legacy RQ, adding a special case for XSK. The new branch introduced basically replaces the branch that was removed from the same place a few commits before. A check is made that DMA sync is not needed, because the batching allocator falls back to returning one frame when DMA sync is needed, and this is best handled by the loop in the standard case. Performance improvement is up to 8% in the aligned mode and up to 9% in the unaligned mode. Aligned mode, 2048-byte frames: 12.8 Mpps -> 13.5 Mpps Aligned mode, 4096-byte frames: 11.5 Mpps -> 12.4 Mpps Unaligned mode, 2048-byte frames: 12.2 Mpps -> 13.4 Mpps Unaligned mode, 3072-byte frames: 11.6 Mpps -> 12.5 Mpps Unaligned mode, 4096-byte frames: 11.2 Mpps -> 12.2 Mpps CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-01 13:30:20 -07:00
Maxim Mikityanskiy	a2e5ba242c	net/mlx5e: xsk: Split out WQE allocation for legacy XSK RQ Allocation of XSK frames on legacy RQ may be made more efficient with a specialized routine that relies on certain assumptions, such as there is only one fragment, allocation units (XSK frames) are not shared among multiple packets. It reduces the number of branches both in the XSK code and in the regular RQ, because with this approach there is only a single check whether it's an XSK or regular RQ. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-01 13:30:20 -07:00
Maxim Mikityanskiy	0b48223237	net/mlx5e: Remove the outer loop when allocating legacy RQ WQEs Legacy RQ WQEs are allocated in a loop in small batches (8 WQEs). As partial batches are allowed, there is no point to have a loop in a loop, so the outer loop is removed, and the batch size is increased up to the total number of WQEs to allocate, still not smaller than 8. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-01 13:30:20 -07:00
Maxim Mikityanskiy	3f5fe0b2e6	net/mlx5e: xsk: Use partial batches in legacy RQ with XSK The previous commit allowed allocating WQE batches in legacy RQ partially, however, XSK still checks whether there are enough frames in the fill ring. Remove this check to allow to allocate batches partially also with XSK. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-01 13:30:19 -07:00
Maxim Mikityanskiy	42847fed55	net/mlx5e: Use partial batches in legacy RQ Legacy RQ allocates WQEs in batches. If the batch allocation fails, the pages of the allocated part are released. This commit changes this behavior to allow to use the pages that have been already allocated. After this change, we need to be careful about indexing rq->wqe.frags[]. The WQ size is a power of two that divides by wqe_bulk (8), and the old code used whole bulks, which allowed to use indices [8K; 8K+7] without overflowing. Now that the bulks may be partial, the range can start at any location (not only at 8*K), so we need to wrap them around to avoid out-of-bounds array access. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-01 13:30:19 -07:00
Maxim Mikityanskiy	5758c3145b	net/mlx5e: Make the wqe_index_mask calculation more exact The old calculation of wqe_index_mask may give false positives, i.e. request bulking of pairs of WQEs when not strictly needed, for example, when the first fragment size is equal to the PAGE_SIZE, bulking is not needed, even if the number of fragments is odd. Make the calculation more exact to cut false positives. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-01 13:30:19 -07:00
Maxim Mikityanskiy	a064c60984	net/mlx5e: Introduce wqe_index_mask for legacy RQ When fragments of different WQEs share the same page, mlx5e_post_rx_wqes must wait until the old WQE stops using the page, only then the new WQE can allocate the new page. Essentially, it means that if WQE index i is still in use, the allocation must stop before `i % bulk`, where bulk is the number of WQEs that may share the same page. As bulk is always a power of two, `i % bulk = i & (bulk - 1)`, and the new wqe_index_mask field will be equal to `bulk - 1`. At the same time, wqe_bulk remains for optimization purposes and stores `max(bulk, 8)`, which allows to skip the allocation until we have at least 8 WQEs free. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-01 13:30:19 -07:00
Maxim Mikityanskiy	8cbcafcee1	net/mlx5e: xsk: Drop the check for XSK state in mlx5e_xsk_wakeup The MLX5E_CHANNEL_STATE_XSK flag checked in mlx5e_xsk_wakeup indicates that XSK queues are open, but not necessarily activated. This check is not very useful, because: 0. Both XSK setup and netdev state transitions take the same state_lock mutex, so they can't happen at the same time. 1. If the netdev is up, xsk_is_bound can return true only when MLX5E_CHANNEL_STATE_XSK is set on the corresponding channel. mlx5e_xsk_wakeup is only called when xsk_is_bound is true. 2. If the XSK socket is bound, and the netdev is going up or down, mlx5e_xsk_wakeup can take one of two branches, depending on the return value of napi_if_scheduled_mark_missed: 2.1. True means one of two things: either NAPI was enabled at this point, which means MLX5E_CHANNEL_STATE_XSK was also set; or NAPI was disabled, and nothing really happened. 2.2. False means that NAPI was enabled by this point, which also implies MLX5E_CHANNEL_STATE_XSK was set. Additionally, mlx5e_xsk_wakeup contains a following check for MLX5E_SQ_STATE_ENABLED on async_icosq, and this flag implies MLX5E_CHANNEL_STATE_XSK too on XSK channels. As checking this flag doesn't cut any flows, remove the check. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-01 13:30:19 -07:00
Maxim Mikityanskiy	d54d7194ba	net/mlx5e: xsk: Use mlx5e_trigger_napi_icosq for XSK wakeup mlx5e_xsk_wakeup triggers an IRQ by posting a NOP to async_icosq, taking a spinlock to protect from concurrent access. There is already a function that does the same: mlx5e_trigger_napi_icosq. Use this function in mlx5e_xsk_wakeup. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-01 13:30:19 -07:00
Jakub Kicinski	5fcc2cfc14	Merge branch 'nfp-support-fec-mode-reporting-and-auto-neg' Simon Horman says: ==================== nfp: support FEC mode reporting and auto-neg this series adds support for the following features to the nfp driver: * Patch 1/5: Support active FEC mode * Patch 2/5: Don't halt driver on non-fatal error when interacting with fw * Patch 3/5: Treat port independence as a firmware rather than port property * Patch 4/5: Support link auto negotiation * Patch 5/5: Support restart of link auto negotiation ==================== Link: https://lore.kernel.org/r/20220929085832.622510-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-30 19:03:49 -07:00
Fei Qin	2820a400df	nfp: add support restart of link auto-negotiation Add support restart of link auto-negotiation. This may be initiated using: # ethtool -r <intf> Signed-off-by: Fei Qin <fei.qin@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-30 18:47:53 -07:00
Yinjun Zhang	8d545385bf	nfp: add support for link auto negotiation Report the auto negotiation capability if it's supported in management firmware, and advertise it if it's enabled. Changing port speed is not allowed when autoneg is enabled. The ethtool <intf> command displays the auto-neg capability: # ethtool enp1s0np0 Settings for enp1s0np0: Supported ports: [ FIBRE ] Supported link modes: Not reported Supported pause frame use: Symmetric Supports auto-negotiation: Yes Supported FEC modes: None RS BASER Advertised link modes: Not reported Advertised pause frame use: Symmetric Advertised auto-negotiation: Yes Advertised FEC modes: None RS BASER Speed: 25000Mb/s Duplex: Full Auto-negotiation: on Port: FIBRE PHYAD: 0 Transceiver: internal Link detected: yes Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-30 18:47:53 -07:00
Yinjun Zhang	b1e4f11e42	nfp: refine the ABI of getting `sp_indiff` info Considering that whether application firmware is indifferent to port speed is a firmware property instead of port property, now use a new rtsym to get the property instead of parsing per-port tlv caps. With this change, relevant code is moved to `nfp_main` layer. Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-30 18:47:52 -07:00
Yinjun Zhang	965dd27d98	nfp: avoid halt of driver init process when non-fatal error happens It's not a fatal error when setting `hwinfo` into management firmware fails, no need to halt the whole driver initialization process. Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-30 18:47:52 -07:00
Yinjun Zhang	fc26e70f8a	nfp: add support for reporting active FEC mode The latest management firmware can now report the active FEC mode. Adapt driver accordingly so that user can get the active FEC mode by running command: # ethtool --show-fec <intf> Also correct use of `fec` field. Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-30 18:47:52 -07:00

1 2 3 4 5 ...

1125368 Commits