linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-18 09:44:18 +08:00

Author	SHA1	Message	Date
Ofer Levi	b7cf0806e8	net/mlx5e: Add CQE compression support for multi-strides packets Add CQE compression support for completions of packets that span multiple strides in a Striding RQ, per the HW capability. In our memory model, we use small strides (256B as of today) for the non-linear SKB mode. This feature allows CQE compression to work also for multiple strides packets. In this case decompressing the mini CQE array will use stride index provided by HW as part of the mini CQE. Before this feature, compression was possible only for single-strided packets, i.e. for packets of size up to 256 bytes when in non-linear mode, and the index was maintained by SW. This feature is supported for ConnectX-5 and above. Feature performance test: This was whitebox-tested, we reduced the PCI speed from 125Gb/s to 62.5Gb/s to overload pci and manipulated mlx5 driver to drop incoming packets before building the SKB to achieve low cpu utilization. Outcome is low cpu utilization and bottleneck on pci only. Test setup: Server: Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz server, 32 cores NIC: ConnectX-6 DX. Sender side generates 300 byte packets at full pci bandwidth. Receiver side configuration: Single channel, one cpu processing with one ring allocated. Cpu utilization is ~20% while pci bandwidth is fully utilized. For the generated traffic and interface MTU of 4500B (to activate the non-linear SKB mode), packet rate improvement is about 19% from ~17.6Mpps to ~21Mpps. Without this feature, counters show no CQE compression blocks for this setup, while with the feature, counters show ~20.7Mpps compressed CQEs in ~500K compression blocks. Signed-off-by: Ofer Levi <oferle@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-15 11:59:53 -07:00
Maor Dickman	748cde9a38	net/mlx5e: Add IPv6 traffic class (DSCP) header rewrite support Add support for rewriting of IPV6 DSCP part of traffic class field. Next commands, for example, can be used to offload rewrite action: OVS: $ ovs-ofctl add-flow ovs-sriov "tcpv6, in_port=REP, \ actions=mod_nw_tos:68, output:NIC" iproute2: $ tc filter add dev REP ingress protocol ipv6 prio 1 flower skip_sw \ ip_proto tcp \ action pedit ex munge ip6 traffic_class set 68 retain 0xfc pipe \ action mirred egress redirect dev NIC Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-15 11:59:53 -07:00
Eli Cohen	f02882102b	net/mlx5e: Add support for tc trap Support tc trap such that packets can explicitly be forwarded to slow path if they match a specific rule. In the example below, we want packets with src IP equals 7.7.7.8 to be forwarded to software, in which case it will get to the appropriate representor net device. $ tc filter add dev eth1 protocol ip prio 1 root flower skip_sw \ src_ip 7.7.7.8 action trap Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-15 11:59:52 -07:00
Vu Pham	cd1ef96621	net/mlx5: E-Switch, Use vport metadata matching by default Multiple features use metadata matching such as bond vport in live migration, multi-port RoCE mode, stacked devices; hence, enable vport metadata matching by default. Fixes: `1e62e222db` ("net/mlx5: E-Switch, Use vport metadata matching only when mandatory") Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-15 11:59:52 -07:00
Vu Pham	fc99c3d637	net/mlx5: E-Switch, Setup all vports' metadata to support peer miss rule In merged eswitch configuration, peer miss rule is setup for all vports. If metadata is enabled, peer miss rule with metadata matching will be configured instead of source port matching; however, some vports that have not yet been enabled don't have default_metadata setup and their default_metadata will be zero. Hence, setup/cleanup default metadata for all vports when eswitch moves in/out of offloads mode. Fixes: `133dcfc577` ("net/mlx5: E-Switch, Alloc and free unique metadata for match") Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-15 11:59:52 -07:00
Vu Pham	406493a52f	net/mlx5: E-Switch, Dedicated metadata for uplink vport Uplink vport must have a dedicated metadata with vhca_id being part of the metadata. Fixes: `133dcfc577` ("net/mlx5: E-Switch, Alloc and free unique metadata for match") Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-15 11:59:51 -07:00
Vu Pham	4e9a9ef7d8	net/mlx5: E-Switch, Check and enable metadata support flag before using Check E-Switch capabilities and enable metadata support flag before using it to setup other features that need metadata. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-15 11:59:51 -07:00
Jianbo Liu	9b412cc35f	net/mlx5e: Add LAG warning if bond slave is not lag master LAG offload can't be enabled if the enslaved PF is not lag master, which is indicated by HCA capabilities bit. It is cleared if more than 64 VFs are configured for this PF. Previously, a data structure is created to store lag info, including PFs to be enslaved, then a handler is registered for netdev notifier. However, this initialization is skipped if PF is not lag master. So PF can't handle CHANGEUPPER event from upper bond device. Even worse, PF is enslaved silently, and LAG offload is not activated. Fix this by registering netdev notifier for PFs which are not lag masters. When CHANGEUPPER event is received, and both physical ports (and only them) on the same NIC are about to be enslaved, a warning is returned for user to know it. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Raed Salem <raeds@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-15 11:59:51 -07:00
Jianbo Liu	1a3c911483	net/mlx5e: Add LAG warning for unsupported tx type If bond tx type is not active-backup or hash, LAG offload can't be enabled. When CHANGEUPPER event is received, and both PFs (and only them) under the same lag master are about to be enslaved, a warning is returned for user to know the offload failure, otherwise PFs are enslaved silently without LAG offload activated. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Raed Salem <raeds@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-15 11:59:51 -07:00
Jianbo Liu	f552be54e0	net/mlx5e: Return a valid errno if can't get lag device index Change the return value to -ENOENT, to make it more meaningful. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-15 11:59:50 -07:00
Eran Ben Elisha	0d2ffdc8d4	net/mlx5: Don't call timecounter cyc2time directly from 1PPS flow Before calling timecounter_cyc2time(), clock->lock must be taken. Use mlx5_timecounter_cyc2time instead which guarantees a safe access. Fixes: `afc98a0b46` ("net/mlx5: Update ptp_clock_event foreach PPS event") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>	2020-09-15 11:59:50 -07:00
Eran Ben Elisha	87f3495cbe	net/mlx5: Release clock lock before scheduling a PPS work Holding the clock lock is not required when scheduling a PPS work. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com>	2020-09-15 11:59:50 -07:00
Eran Ben Elisha	aac2df7f02	net/mlx5: Rename ptp clock info Fix a typo in ptp_clock_info naming: mlx5_p2p -> mlx5_ptp. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>	2020-09-15 11:59:49 -07:00
Eran Ben Elisha	fb609b5112	net/mlx5: Always use container_of to find mdev pointer from clock struct Clock struct is part of struct mlx5_core_dev. Code was inconsistent, on some cases used container_of and on another used clock->mdev. Align code to use container_of amd remove clock->mdev pointer. While here, fix reverse xmas tree coding style. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com>	2020-09-15 11:59:49 -07:00
Dan Carpenter	ec529b44ab	net/mlx5: remove erroneous fallthrough This isn't a fall through because it was after a return statement. The fall through annotation leads to a Smatch warning: drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:246 mlx5e_ethtool_get_sset_count() warn: ignoring unreachable code. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>	2020-09-15 11:59:49 -07:00
Moshe Tal	19f5b63bc9	net/mlx5: Fix uninitialized variable warning Add variable initialization to eliminate the warning "variable may be used uninitialized". Fixes: `5f29458b77` ("net/mlx5e: Support dump callback in TX reporter") Signed-off-by: Moshe Tal <moshet@mellanox.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-15 11:59:48 -07:00
Shannon Nelson	ed6d9b0228	ionic: fix up debugfs after queue swap Clean and rebuild the debugfs info for the queues being swapped. Fixes: `a34e25ab97` ("ionic: change the descriptor ring length without full reset") Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 16:55:54 -07:00
Vladimir Oltean	b14a9fc452	__netif_receive_skb_core: don't untag vlan from skb on DSA master A DSA master interface has upper network devices, each representing an Ethernet switch port attached to it. Demultiplexing the source ports and setting skb->dev accordingly is done through the catch-all ETH_P_XDSA packet_type handler. Catch-all because DSA vendors have various header implementations, which can be placed anywhere in the frame: before the DMAC, before the EtherType, before the FCS, etc. So, the ETH_P_XDSA handler acts like an rx_handler more than anything. It is unlikely for the DSA master interface to have any other upper than the DSA switch interfaces themselves. Only maybe a bridge upper, but it is very likely that the DSA master will have no 8021q upper. So __netif_receive_skb_core() will try to untag the VLAN, despite the fact that the DSA switch interface might have an 8021q upper. So the skb will never reach that. So far, this hasn't been a problem because most of the possible placements of the DSA switch header mentioned in the first paragraph will displace the VLAN header when the DSA master receives the frame, so __netif_receive_skb_core() will not actually execute any VLAN-specific code for it. This only becomes a problem when the DSA switch header does not displace the VLAN header (for example with a tail tag). What the patch does is it bypasses the untagging of the skb when there is a DSA switch attached to this net device. So, DSA is the only packet_type handler which requires seeing the VLAN header. Once skb->dev will be changed, __netif_receive_skb_core() will be invoked again and untagging, or delivery to an 8021q upper, will happen in the RX of the DSA switch interface itself. see commit `9eb8eff0cf` ("net: bridge: allow enslaving some DSA master network devices". This is actually the reason why I prefer keeping DSA as a packet_type handler of ETH_P_XDSA rather than converting to an rx_handler. Currently the rx_handler code doesn't support chaining, and this is a problem because a DSA master might be bridged. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 16:34:18 -07:00
David S. Miller	0ca6d8b7d6	Merge branch 'net-next-dsa-mt7530-add-support-for-MT7531' Landen Chao says: ==================== net-next: dsa: mt7530: add support for MT7531 This patch series adds support for MT7531. MT7531 is the next generation of MT7530 which could be found on Mediatek router platforms such as MT7622 or MT7629. It is also a 7-ports switch with 5 giga embedded phys, 2 cpu ports, and the same MAC logic of MT7530. Cpu port 6 only supports SGMII interface. Cpu port 5 supports either RGMII or SGMII in different HW SKU, but cannot be muxed to PHY of port 0/4 like mt7530. Due to support for SGMII interface, pll, and pad setting are different from MT7530. MT7531 SGMII interface can be configured in following mode: - 'SGMII AN mode' with in-band negotiation capability which is compatible with PHY_INTERFACE_MODE_SGMII. - 'SGMII force mode' without in-band negotiation which is compatible with 10B/8B encoding of PHY_INTERFACE_MODE_1000BASEX with fixed full-duplex and fixed pause. - 2.5 times faster clocked 'SGMII force mode' without in-band negotiation which is compatible with 10B/8B encoding of PHY_INTERFACE_MODE_2500BASEX with fixed full-duplex and fixed pause. v4 -> v5 - Add fixed-link node to dsa cpu port in dts file by suggestion of Vladimir Oltean. v3 -> v4 - Adjust the coding style by suggestion of Jakub Kicinski. Remove unnecessary jumping label, merge continuous numeric 'switch cases' into one line, and keep the variables longest to shortest (reverse xmas tree). v2 -> v3 - Keep the same setup logic of mt7530/mt7621 because these series of patches is for adding mt7531 hardware. - Do not adjust rgmii delay when vendor phy driver presents in order to prevent double adjustment by suggestion of Andrew Lunn. - Remove redundant 'Example 4' from dt-bindings by suggestion of Rob Herring. - Fix typo. v1 -> v2 - change phylink_validate callback function to support full-duplex gigabit only to match hardware capability. - add description of SGMII interface. - configure mt7531 cpu port in fastest speed by default. - parse SGMII control word for in-band negotiation mode. - configure RGMII delay based on phy.rst. - Rename the definition in the header file to avoid potential conflicts. - Add wrapper function for mdio read/write to support both C22 and C45. - correct fixed-link speed of 2500base-x in dts. - add MT7531 port mirror setting. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 16:30:39 -07:00
Landen Chao	79a675e6b1	arm64: dts: mt7622: add mt7531 dsa to bananapi-bpi-r64 board Add mt7531 dsa to bananapi-bpi-r64 board for 5 giga Ethernet ports support. Signed-off-by: Landen Chao <landen.chao@mediatek.com> Tested-By: Frank Wunderlich <frank-w@public-files.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 16:30:39 -07:00
Landen Chao	6af064486b	arm64: dts: mt7622: add mt7531 dsa to mt7622-rfb1 board Add mt7531 dsa to mt7622-rfb1 board for 5 giga Ethernet ports support. mt7622 only supports 1 sgmii interface, so either gmac0 or gmac1 can be configured as sgmii interface. In this patch, change to connect mt7622 gmac0 and mt7531 port6 through sgmii interface. Signed-off-by: Landen Chao <landen.chao@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 16:30:39 -07:00
Landen Chao	c288575f78	net: dsa: mt7530: Add the support of MT7531 switch Add new support for MT7531: MT7531 is the next generation of MT7530. It is also a 7-ports switch with 5 giga embedded phys, 2 cpu ports, and the same MAC logic of MT7530. Cpu port 6 only supports SGMII interface. Cpu port 5 supports either RGMII or SGMII in different HW sku, but cannot be muxed to PHY of port 0/4 like mt7530. Due to SGMII interface support, pll, and pad setting are different from MT7530. This patch adds different initial setting, and SGMII phylink handlers of MT7531. MT7531 SGMII interface can be configured in following mode: - 'SGMII AN mode' with in-band negotiation capability which is compatible with PHY_INTERFACE_MODE_SGMII. - 'SGMII force mode' without in-band negotiation which is compatible with 10B/8B encoding of PHY_INTERFACE_MODE_1000BASEX with fixed full-duplex and fixed pause. - 2.5 times faster clocked 'SGMII force mode' without in-band negotiation which is compatible with 10B/8B encoding of PHY_INTERFACE_MODE_2500BASEX with fixed full-duplex and fixed pause. Signed-off-by: Landen Chao <landen.chao@mediatek.com> Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 16:30:39 -07:00
Landen Chao	27834b0223	dt-bindings: net: dsa: add new MT7531 binding to support MT7531 Add devicetree binding to support the compatible mt7531 switch as used in the MediaTek MT7531 switch. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Landen Chao <landen.chao@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 16:30:39 -07:00
Landen Chao	88bdef8be9	net: dsa: mt7530: Extend device data ready for adding a new hardware Add a structure holding required operations for each device such as device initialization, PHY port read or write, a checker whether PHY interface is supported on a certain port, MAC port setup for either bus pad or a specific PHY interface. The patch is done for ready adding a new hardware MT7531, and keep the same setup logic of existing hardware. Signed-off-by: Landen Chao <landen.chao@mediatek.com> Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 16:30:38 -07:00
Landen Chao	dc8ef938c9	net: dsa: mt7530: Refine message in Kconfig Refine message in Kconfig with fixing typo and an explicit MT7621 support. Signed-off-by: Landen Chao <landen.chao@mediatek.com> Signed-off-by: Sean Wang <sean.wang@mediatek.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 16:30:38 -07:00
Xie He	4b46838535	drivers/net/wan/x25_asy: Remove an unnecessary x25_type_trans call x25_type_trans only needs to be called before we call netif_rx to pass the skb to upper layers. It does not need to be called before lapb_data_received. The LAPB module does not need the fields that are set by calling it. In the other two X.25 drivers - lapbether and hdlc_x25. x25_type_trans is only called before netif_rx and not before lapb_data_received. Cc: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Xie He <xie.he.0141@gmail.com> Acked-by: Martin Schiller <ms@dev.tdt.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 14:41:02 -07:00
Paolo Abeni	2de79ee27f	net: try to avoid unneeded backlog flush flush_all_backlogs() may cause deadlock on systems running processes with FIFO scheduling policy. The above is critical in -RT scenarios, where user-space specifically ensure no network activity is scheduled on the CPU running the mentioned FIFO process, but still get stuck. This commit tries to address the problem checking the backlog status on the remote CPUs before scheduling the flush operation. If the backlog is empty, we can skip it. v1 -> v2: - explicitly clear flushed cpu mask - Eric Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 14:39:00 -07:00
David S. Miller	7b2d1b8d9d	Merge branch 'mlxsw-Derive-SBIB-from-maximum-port-speed-and-MTU' Ido Schimmel says: ==================== mlxsw: Derive SBIB from maximum port speed & MTU Petr says: Internal buffer is a part of port headroom used for packets that are mirrored due to triggers that the Spectrum ASIC considers "egress". Besides ACL mirroring on port egresss this includes also packets mirrored due to ECN marking. This patchset changes the way the internal mirroring buffer is reserved. Currently the buffer reflects port MTU and speed accurately. In the future, mlxsw should support dcbnl_setbuffer hook to allow the users to set buffer sizes by hand. In that case, there might not be enough space for growth of the internal mirroring buffer due to MTU and speed changes. While vetoing MTU changes would be merely confusing, port speed changes cannot be vetoed, and such change would simply lead to issues in packet mirroring. For these reasons, with these patches the internal mirroring buffer is derived from maximum MTU and maximum speed achievable on the port. Patches #1 and #2 introduce a new callback to determine the maximum speed a given port can achieve. With patches #3 and #4, the information about, respectively, maximum MTU and maximum port speed, is kept in struct mlxsw_sp_port. In patch #5, maximum MTU and maximum speed are used to determine the size of the internal buffer. MTU update and speed update hooks are dropped, because they are no longer necessary. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 14:37:30 -07:00
Petr Machata	532b49e41e	mlxsw: spectrum_span: Derive SBIB from maximum port speed & MTU The SBIB register configures the size of an internal buffer that the Spectrum ASICs use when mirroring traffic on egress. This size should be taken into account when validating that the port headroom buffers are not larger than the chip can handle. Up until now this was not done, which is incidentally not a problem, because the priority group buffers that mlxsw auto-configures are small enough that the boundary condition could not be violated. However when dcbnl_setbuffer is implemented, the user has control over sizes of PG buffers, and they might overshoot the headroom capacity. However the size of the SBIB buffer depends on port speed, and that cannot be vetoed. Therefore SBIB size should be deduced from maximum port speed. Additionally, once the buffers are configured by hand, the user could get into an uncomfortable situation where their MTU change requests get vetoed, because the SBIB does not fit anymore. Therefore derive SBIB size from maximum permissible MTU as well. Remove all the code that adjusted the SBIB size whenever speed or MTU changed. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 14:37:30 -07:00
Petr Machata	3232e8c66e	mlxsw: spectrum: Keep maximum speed around The maximum port speed depends on link modes supported by the port, and for Ethernet ports is constant. The maximum speed will be handy when setting SBIB, the internal buffer used for traffic mirroring. Therefore, keep it in struct mlxsw_sp_port for easy access. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 14:37:30 -07:00
Petr Machata	2ecf87ae6c	mlxsw: spectrum: Keep maximum MTU around The maximum port MTU depends on port type. On Spectrum, mlxsw configures all ports as Ethernet ports, and the maximum MTU therefore never changes. Besides checking MTU configuration, maximum MTU will also be handy when setting SBIB, the internal buffer used for traffic mirroring. Therefore, keep it in struct mlxsw_sp_port for easy access. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 14:37:30 -07:00
Petr Machata	60fbc52184	mlxsw: spectrum_ethtool: Introduce ptys_max_speed callback The SBIB register configures the size of an internal buffer that the Spectrum ASICs use when mirroring traffic on egress. This size should be taken into account when validating that the port headroom buffers are not larger than the chip can handle. Up until now this was not done, which is incidentally not a problem, because the priority group buffers that mlxsw auto-configures are small enough that the boundary condition could not be violated. When dcbnl_setbuffer is implemented, the user gets control over sizes of PG buffers, and they might overshoot the headroom capacity. However the size of the SBIB buffer depends on port speed, which cannot be vetoed. There is obviously no way to retroactively push back on requests for overlarge PG buffers, or reject an overlarge MTU, or cancel losslessness of a certain PG. Therefore, instead of taking into account the current speed when calculating SBIB buffer size, take into account the maximum speed that a port with given Ethernet protocol capabilities can have. To that end, add a new ethtool callback, ptys_max_speed, which determines this maximum speed. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 14:37:30 -07:00
Petr Machata	d24ca6c0a7	mlxsw: spectrum_ethtool: Extract a helper to get Ethernet attributes In order to allow reusing the logic, extract from mlxsw_sp_port_get_link_ksettings() the code to obtain Ethernet protocol attributes, mlxsw_sp_port_ptys_query(). Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 14:37:30 -07:00
David S. Miller	7952d7edf3	Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Tony Nguyen says: ==================== 40GbE Intel Wired LAN Driver Updates 2020-09-14 This series contains updates to i40e driver only. Li RongQing removes binding affinity mask to a fixed CPU and sets prefetch of Rx buffer page to occur conditionally. Björn provides AF_XDP performance improvements by not prefetching HW descriptors, using 16 byte descriptors, and moving buffer allocation out of Rx processing loop. v2: Define prefetch_page_address in a common header for patch 2. Dropped, previous, patch 5 as it is being reworked to be more generalized. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 14:07:58 -07:00
David S. Miller	e0d9ae699e	RxRPC development fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl9fipcACgkQ+7dXa6fL C2sNIhAAjnqKckjbLtzy2ZhO3nyEMlABYtGcDi8a1x3H42Ncsqca5GiKjjY54n90 rLe2iyX/5ncURjrkVVUJFTlkhQrha40dOp/DYHHbwj4ko9P625QrsPn0h5zo/Ben UUeOVqibyAOoqXWCqhRgLF1BhPmg/22TtHiqbcRul+nss9vcjuFcjOEIhNVZDUfu VPjeitxF9Tuz9FEH00UJs23LWONBCvNWDtCjAj/hf328Mk+TptSiFPTNVEuPrbje 1IbBy3PjBzeL2CFtp0OQs3uibAz+7C9IY4i53tdBPQNE5uW1FE/Wm7ixK3Oseq8X hkAP3phNG669tZzE+49g0X1AfqHJr9F0dGbdIqOYC4seyC6NXROuvnzX3HdV7gYd MwCyIcjWxw2B6dhjk2sDncFjr7Tima6KRvWHsf8cEk645gbMEltvNxJi1KCK/sj/ wpiiQrPZZ82e+RfIfQ5l5cuMEROceZ1LpUKRK5rc4Gc49xuFbanoOYh4iBChmABb ULKVRHb/HFRIY9Y8boxw+0iDzDYQugoH6IsEEBdH87UBonEfPaJpcRTljcFU4LVh ppNeOXFu0p+CQwDaLDhTILDVoFDjMfVAjOC42TMfiTLEarWz5cpRPu96tOerpSgk Ulmh6m2cGNYDOIuCdVyRJFf5F9+Mj3VIBygven4GuWUqkZ18ooc= =0qvR -----END PGP SIGNATURE----- Merge tag 'rxrpc-next-20200914' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fixes for the connection manager rewrite Here are some fixes for the connection manager rewrite: (1) Fix a goto to the wrong place in error handling. (2) Fix a missing NULL pointer check. (3) The stored allocation error needs to be stored signed. (4) Fix a leak of connection bundle when clearing connections due to net namespace exit. (5) Fix an overget of the bundle when setting up a new client conn. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 14:03:38 -07:00
Luo bin	33acd755f4	hinic: add vxlan segmentation and cs offload support Add NETIF_F_GSO_UDP_TUNNEL and NETIF_F_GSO_UDP_TUNNEL_CSUM features to support vxlan segmentation and checksum offload. Ipip and ipv6 tunnel packets are regarded as non-tunnel pkt for hw and as for other type of tunnel pkts, checksum offload is disabled. Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:59:15 -07:00
Zhang Changzhong	f3694707ad	net: qlcnic: remove unused variable 'val' in qlcnic_83xx_cam_unlock() Fixes the following W=1 kernel build warning(s): drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c:661:6: warning: variable 'val' set but not used [-Wunused-but-set-variable] 661 \| u32 val; \| ^~~ After commit `7f9664525f` ("qlcnic: 83xx memory map and HW access routines"), variable 'val' is never used in qlcnic_83xx_cam_unlock(), so removing it to avoid build warning. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:44:12 -07:00
Zhang Changzhong	f7ab0f04a0	net: pxa168_eth: remove unused variable 'retval' int pxa168_eth_change_mtu() Fixes the following W=1 kernel build warning(s): drivers/net/ethernet/marvell/pxa168_eth.c:1190:6: warning: variable 'retval' set but not used [-Wunused-but-set-variable] 1190 \| int retval; \| ^~~~~~ Function pxa168_eth_change_mtu() always return zero, so variable 'retval' is redundant, just remove it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:43:38 -07:00
Zhang Changzhong	992bae7e42	net: fec: ptp: remove unused variable 'ns' in fec_time_keep() Fixes the following W=1 kernel build warning(s): drivers/net/ethernet/freescale/fec_ptp.c:523:6: warning: variable 'ns' set but not used [-Wunused-but-set-variable] 523 \| u64 ns; \| ^~ After commit `6605b730c0` ("FEC: Add time stamping code and a PTP hardware clock"), variable 'ns' is never used in fec_time_keep(), so removing it to avoid build warning. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:42:50 -07:00
Zhang Changzhong	85743cead5	net: dnet: remove unused variable 'tx_status 'in dnet_start_xmit() Fixes the following W=1 kernel build warning(s): drivers/net/ethernet/dnet.c:510:6: warning: variable 'tx_status' set but not used [-Wunused-but-set-variable] u32 tx_status, irq_enable; ^~~~~~~~~ After commit `4796417417` ("dnet: Dave DNET ethernet controller driver (updated)"), variable 'tx_status' is never used in dnet_start_xmit(), so removing it to avoid build warning. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:42:09 -07:00
Eric Dumazet	0cbe6a8f08	tcp: remove SOCK_QUEUE_SHRUNK SOCK_QUEUE_SHRUNK is currently used by TCP as a temporary state that remembers if some room has been made in the rtx queue by an incoming ACK packet. This is later used from tcp_check_space() before considering to send EPOLLOUT. Problem is: If we receive SACK packets, and no packet is removed from RTX queue, we can send fresh packets, thus moving them from write queue to rtx queue and eventually empty the write queue. This stall can happen if TCP_NOTSENT_LOWAT is used. With this fix, we no longer risk stalling sends while holes are repaired, and we can fully use socket sndbuf. This also removes a cache line dirtying for typical RPC workloads. Fixes: `c9bee3b7fd` ("tcp: TCP_NOTSENT_LOWAT socket option") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:36:00 -07:00
Xie He	b4c5881446	net/packet: Fix a comment about hard_header_len and headroom allocation This comment is outdated and no longer reflects the actual implementation of af_packet.c. Reasons for the new comment: 1. In af_packet.c, the function packet_snd first reserves a headroom of length (dev->hard_header_len + dev->needed_headroom). Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header, which calls dev->header_ops->create, to create the link layer header. If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of length (dev->hard_header_len), and checks if the user has provided a header sized between (dev->min_header_len) and (dev->hard_header_len) (in dev_validate_header). This shows the developers of af_packet.c expect hard_header_len to be consistent with header_ops. 2. In af_packet.c, the function packet_sendmsg_spkt has a FIXME comment. That comment states that prepending an LL header internally in a driver is considered a bug. I believe this bug can be fixed by setting hard_header_len to 0, making the internal header completely invisible to af_packet.c (and requesting the headroom in needed_headroom instead). 3. There is a commit for a WiFi driver: commit `9454f7a895` ("mwifiex: set needed_headroom, not hard_header_len") According to the discussion about it at: https://patchwork.kernel.org/patch/11407493/ The author tried to set the WiFi driver's hard_header_len to the Ethernet header length, and request additional header space internally needed by setting needed_headroom. This means this usage is already adopted by driver developers. Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Brian Norris <briannorris@chromium.org> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Xie He <xie.he.0141@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:34:39 -07:00
David S. Miller	b91c06c5df	Merge branch 'mptcp-introduce-support-for-real-multipath-xmit' Paolo Abeni says: ==================== mptcp: introduce support for real multipath xmit This series enable MPTCP socket to transmit data on multiple subflows concurrently in a load balancing scenario. First the receive code path is refactored to better deal with out-of-order data (patches 1-7). An RB-tree is introduced to queue MPTCP-level out-of-order data, closely resembling the TCP level OoO handling. When data is sent on multiple subflows, the peer can easily see OoO - "future" data at the MPTCP level, especially if speeds, delay, or jitter are not symmetric. The other major change regards the netlink PM, which is extended to allow creating non backup subflows in patches 9-11. There are a few smaller additions, like the introduction of OoO related mibs, send buffer autotuning and better ack handling. Finally a bunch of new self-tests is introduced. The new feature is tested ensuring that the B/W used by an MPTCP socket using multiple subflows matches the link aggregated B/W - we use low B/W virtual links, to ensure the tests are not CPU bounded. v1 -> v2: - fix 32 bit build breakage - fix a bunch of checkpatch issues ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:03 -07:00
Paolo Abeni	1a418cb8e8	mptcp: simult flow self-tests Add a bunch of test-cases for multiple subflow xmit: create multiple subflows simulating different links condition via netem and verify that the msk is able to use completely the aggregated bandwidth. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:03 -07:00
Paolo Abeni	c76c695656	mptcp: call tcp_cleanup_rbuf on subflows That is needed to let the subflows announce promptly when new space is available in the receive buffer. tcp_cleanup_rbuf() is currently a static function, drop the scope modifier and add a declaration in the TCP header. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:02 -07:00
Paolo Abeni	d5f49190de	mptcp: allow picking different xmit subflows Update the scheduler to less trivial heuristic: cache the last used subflow, and try to send on it a reasonably long burst of data. When the burst or the subflow send space is exhausted, pick the subflow with the lower ratio between write space and send buffer - that is, the subflow with the greater relative amount of free space. v1 -> v2: - fix 32 bit build breakage due to 64bits div - fix checkpath issues (uint64_t -> u64) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:02 -07:00
Paolo Abeni	4596a2c1b7	mptcp: allow creating non-backup subflows Currently the 'backup' attribute of local endpoint is ignored. Let's use it for the MP_JOIN handshake Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:02 -07:00
Paolo Abeni	ef0da3b8a2	mptcp: move address attribute into mptcp_addr_info So that can be accessed easily from the subflow creation helper. No functional change intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:02 -07:00
Paolo Abeni	06242e44b9	mptcp: add OoO related mibs Add a bunch of MPTCP mibs related to MPTCP OoO data processing. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:02 -07:00
Paolo Abeni	04e4cd4f7c	mptcp: cleanup mptcp_subflow_discard_data() There is no need to use the tcp_read_sock(), we can simply drop the skb. Additionally try to look at the next buffer for in order data. This both simplifies the code and avoid unneeded indirect calls. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:02 -07:00

1 2 3 4 5 ...

951053 Commits