Add tcp tests to fcnal-test.sh. Covers the permutations of directly
connected addresses, routed destinations, VRF and non-VRF, and expected
failures for both clients and servers. Includes permutations with
net.ipv4.tcp_l3mdev_accept set to 0 and 1.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IPv6 ping tests to fcnal-test.sh. Covers the permutations of directly
connected addresses, routed destinations, VRF and non-VRF, and expected
failures.
Setup includes unreachable routes and fib rules blocking traffic.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ping tests to fcnal-test.sh. Covers the permutations of directly
connected addresses, routed destinations, VRF and non-VRF, and expected
failures.
Setup includes unreachable routes and fib rules blocking traffic.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initial commit for functional test suite for fib and socket lookups.
This commit contains the namespace setup, networking config, test options
and other basic infrastructure.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add nettest - a simple program with an implementation for various networking
APIs. nettest is used for tcp, udp and raw functional tests for both IPv4
and IPv6.
Point of this command versus existing utilities:
- controlled implementation of the APIs and the order in which they
are called,
- ability to verify ingress device, local and remote addresses,
- timeout for controlled test length,
- ability to discriminate a timeout from a system call failure, and
- simplicity with test scripts.
The command returns:
0 on success,
1 for any system call failure, and
2 on timeout.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2019-08-01
This series for fm10k, by Jake Keller, reduces the scope of local variables
where possible.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil says:
====================
enetc: Add mdio bus driver for the PCIe MDIO endpoint
First patch fixes a sparse issue and cleans up accessors to avoid
casting to __iomem. The second one cleans up the Makefile, to make
it easier to add new entries.
Third patch just registers the PCIe endpoint device containing
the MDIO registers as a standalone MDIO bus driver, to provide
an alternative way to control the MDIO bus. The same code used
by the ENETC ports (eth controllers) to manage MDIO via local
registers applies and is reused.
Bindings are provided for the new MDIO node, similarly to ENETC
port nodes bindings.
Last patch enables the ENETC port 1 and its RGMII PHY on the
LS1028A QDS board, where the MDIO muxing configuration relies
on the MDIO support provided in the first patch.
Changes since v0:
v1 - fixed mdio bus allocation
v2 - cleaned up accessors to avoid casting
v3 - fixed spelling (mostly commit message)
v4 - fixed err path check blunder
v5 - fixed loadble module build, provided separate kbuild module
for the driver
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
LS1028a has one Ethernet management interface. On the QDS board, the
MDIO signals are multiplexed to either on-board AR8035 PHY device or
to 4 PCIe slots allowing for SGMII cards.
To enable the Ethernet ENETC Port 1, which can only be connected to a
RGMII PHY, the multiplexer needs to be configured to route the MDIO to
the AR8035 PHY. The MDIO/MDC routing is controlled by bits 7:4 of FPGA
board config register 0x54, and value 0 selects the on-board RGMII PHY.
The FPGA board config registers are accessible on the i2c bus, at address
0x66.
The PF3 MDIO PCIe integrated endpoint device allows for centralized access
to the MDIO bus. Add the corresponding devicetree node and set it to be
the MDIO bus parent.
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The on-chip PCIe root complex that integrates the ENETC ethernet
controllers also integrates a PCIe endpoint for the MDIO controller
providing for centralized control of the ENETC mdio bus.
Add bindings for this "central" MDIO Integrated PCIe Endpoint.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
ENETC ports can manage the MDIO bus via local register
interface. However there's also a centralized way
to manage the MDIO bus, via the MDIO PCIe endpoint
device integrated by the same root complex that also
integrates the ENETC ports (eth controllers).
Depending on board design and use case, centralized
access to MDIO may be better than using local ENETC
port registers. For instance, on the LS1028A QDS board
where MDIO muxing is required. Also, the LS1028A on-chip
switch doesn't have a local MDIO register interface.
The current patch registers the above PCIe endpoint as a
separate MDIO bus and provides a driver for it by re-using
the code used for local MDIO access. It also allows the
ENETC port PHYs to be managed by this driver if the local
"mdio" node is missing from the ENETC port node.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clean up overcomplicated makefile to make it more maintainable.
Basically, there's a set of common objects shared between
the PF and VF driver modules. This can be implemented in a
simpler way, without conditionals, less repetition, allowing
also for easier updates in the future.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
What's needed is basically a pointer to the mdio registers.
This is one way to store it inside bus->priv allocated space,
without upsetting sparse.
Reworked accessors to avoid __iomem casting.
Used devm_* variant to further clean up the init error /
remove paths.
Fixes following sparse warning:
warning: incorrect type in assignment (different address spaces)
expected void *priv
got struct enetc_mdio_regs [noderef] <asn:2>*[assigned] regs
Fixes: ebfcb23d62 ("enetc: Add ENETC PF level external MDIO support")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hubert Feurstein says:
====================
net: dsa: mv88e6xxx: add support for MV88E6220
This patch series adds support for the MV88E6220 chip to the mv88e6xxx driver.
The MV88E6220 is almost the same as MV88E6250 except that the ports 2-4 are
not routed to pins.
Furthermore, PTP support is added to the MV88E6250 family.
v2:
- insert all 6220 entries in correct numerical order
- introduce invalid_port_mask
- move ptp_cc_mult* to ptp_ops and restored original ptp_adjfine code
- added Andrews Reviewed-By to patch 2 and 4
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds PTP support for the MV88E6250 family.
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
As it is done for all the other structs within this driver.
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MV88E6250 family doesn't support the MV88E6XXX_PORT_CTL1_MESSAGE_PORT
bit.
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this it is possible to mark certain chip ports as invalid. This is
required for example for the MV88E6220 (which is in general a MV88E6250
with 7 ports) but the ports 2-4 are not routed to pins.
If a user configures an invalid port, an error is returned.
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MV88E6220 is part of the MV88E6250 family.
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MV88E6220 is almost the same as MV88E6250 except that the ports 2-4 are
not routed to pins. So the usable ports are 0, 1, 5 and 6.
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Jeffery says:
====================
net: phy: Add AST2600 MDIO support
v2 of the ASPEED MDIO series addresses comments from Rob on the devicetree
bindings and Andrew on the driver itself.
v1 of the series can be found here:
http://patchwork.ozlabs.org/cover/1138140/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ensures we can talk to a PHY via MDIO on the AST2600, as the MDIO
controller is now separate from the MAC.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
phy-handle is necessary for the AST2600 which separates the MDIO
controllers from the MAC.
I've tried to minimise the intrusion of supporting the AST2600 to the
FTGMAC100 by leaving in place the existing MDIO support for the embedded
MDIO interface. The AST2400 and AST2500 continue to be supported this
way, as it avoids breaking/reworking existing devicetrees.
The AST2600 support by contrast requires the presence of the phy-handle
property in the MAC devicetree node to specify the appropriate PHY to
associate with the MAC. In the event that someone wants to specify the
MDIO bus topology under the MAC node on an AST2400 or AST2500, the
current auto-probe approach is done conditional on the absence of an
"mdio" child node of the MAC.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The AST2600 design separates the MDIO controllers from the MAC, which is
where they were placed in the AST2400 and AST2500. Further, the register
interface is reworked again, so now we have three possible different
interface implementations, however this driver only supports the
interface provided by the AST2600. The AST2400 and AST2500 will continue
to be supported by the MDIO support embedded in the FTGMAC100 driver.
The hardware supports both C22 and C45 mode, but for the moment only C22
support is implemented.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The AST2600 splits out the MDIO bus controller from the MAC into its own
IP block and rearranges the register layout. Add a new binding to
describe the new hardware.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 365ad353c2 ("tipc: reduce risk of user starvation during
link congestion") we allowed senders to add exactly one list of extra
buffers to the link backlog queues during link congestion (aka
"oversubscription"). However, the criteria for when to stop adding
wakeup messages to the input queue when the overload abates is
inaccurate, and may cause starvation problems during very high load.
Currently, we stop adding wakeup messages after 10 total failed attempts
where we find that there is no space left in the backlog queue for a
certain importance level. The counter for this is accumulated across all
levels, which may lead the algorithm to leave the loop prematurely,
although there may still be plenty of space available at some levels.
The result is sometimes that messages near the wakeup queue tail are not
added to the input queue as they should be.
We now introduce a more exact algorithm, where we keep adding wakeup
messages to a level as long as the backlog queue has free slots for
the corresponding level, and stop at the moment there are no more such
slots or when there are no more wakeup messages to dequeue.
Fixes: 365ad35 ("tipc: reduce risk of user starvation during link congestion")
Reported-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce the scope of the ring local variable in the fm10k_assign_l2_accel
function.
This was detected by cppcheck and resolves the following warning
produced by that tool:
[fm10k_netdev.c:1447]: (style) The scope of the variable 'ring' can be
reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Reduce the scope of the result local variable in the
fm10k_iov_msg_lport_state_pf function.
This was detected by cppcheck and resolves the following warning
produced by that tool:
[fm10k_pf.c:1435]: (style) The scope of the variable 'result' can be
reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The msg variable in the fm10k_mbx_validate_msg_size and
fm10k_sm_mbx_transmit functions is only used within the do {} loop
scope. Reduce its scope only to where it is used.
This was detected by cppcheck, and resolves the following warnings
produced by that tool:
[fm10k_mbx.c:299]: (style) The scope of the variable 'msg' can be reduced.
[fm10k_mbx.c:2004]: (style) The scope of the variable 'msg' can be reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Reduce the scope of the local loop variable in the
fm10k_check_hang_subtask function.
This was detected by cppcheck and resolves the following warning
produced by that tool:
[driver/fm10k_pci.c:852]: (style) The scope of the variable 'i' can be
reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Reduce the scope of the local variable err in the fm10k_detach_subtask
function.
This was detected by cppcheck and resolves the following warning
produced by that tool:
[fm10k_pci.c:403]: (style) The scope of the variable 'err' can be reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The tx_buffer local variable in the function fm10k_clean_tx_ring is not
used except inside a smaller block scope. Reduce the scope to its point
of use.
This was detected by cppcheck and resolves the following style warning
produced by that tool:
[fm10k_netdev.c:179]: (style) The scope of the variable 'tx_buffer' can
be reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Reduce the scope of the q_idx local variable in the fm10k_cache_ring_qos
function.
This was detected by cppcheck and resolves the following style warning
produced by that tool:
[fm10k_main.c:2016]: (style) The scope of the variable 'q_idx' can be
reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Reduce the scope of the local err variable in the fm10k_iov_alloc_data
function.
This was detected by cppcheck and resolves the following style warning
produced by that tool:
[fm10k_iov.c:426]: (style) The scope of the variable 'err' can be reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Reduce the scope of the qv vector pointer local variable in the
fm10k_set_coalesce function.
This was detected by cppcheck and resolves the following style warning
produced by that tool:
[fm10k_ethtool.c:658]: (style) The scope of the variable 'qv' can be
reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Reduce the scope of the char *p local variable to only the block where
it is used.
This was detected by cppcheck and resolves the following style warning
produced by that tool:
[fm10k_ethtool.c:229]: (style) The scope of the variable 'p' can be
reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Reduce the scope of the err local variable in the fm10k_dcbnl_ieee_setets
function.
This was detected using cppcheck, and resolves the following style
warning:
[fm10k_dcbnl.c:37]: (style) The scope of the variable 'err' can be reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Vivien Didelot says:
====================
net: dsa: mv88e6xxx: avoid some redundant VTU operations
The mv88e6xxx driver currently uses a mv88e6xxx_vtu_get wrapper to get a
single entry and uses a boolean to eventually initialize a fresh one.
However the fresh entry is only needed in one place and mv88e6xxx_vtu_getnext
is simple enough to call it directly. Doing so makes the code easier to read,
especially for the return code expected by switchdev to honor software VLANs.
In addition to not loading the VTU again when an entry is already correctly
programmed, this also allows to avoid programming the broadcast entries
again when updating a port's membership, from e.g. tagged to untagged.
This patch series removes the mv88e6xxx_vtu_get wrapper in favor of direct
calls to mv88e6xxx_vtu_getnext, and also renames the _mv88e6xxx_port_vlan_add
and _mv88e6xxx_port_vlan_del helpers using an old underscore prefix convention.
In case the port's membership is already correctly programmed in hardware,
the following debug message may be printed:
[ 745.989884] mv88e6085 2188000.ethernet-1:00: p4: already a member of VLAN 42
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Wrapping mv88e6xxx_vtu_getnext makes the code less easy to read and
_mv88e6xxx_port_vlan_add is the only function requiring the preparation
of a new VLAN entry.
To simplify things up, remove the mv88e6xxx_vtu_get wrapper and
explicit the VLAN lookup in _mv88e6xxx_port_vlan_add. This rework
also avoids programming the broadcast entries again when changing a
port's membership, e.g. from tagged to untagged.
At the same time, rename the helper using an old underscore convention.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wrapping mv88e6xxx_vtu_getnext makes the code less easy to read.
Explicit the call to mv88e6xxx_vtu_getnext in _mv88e6xxx_port_vlan_del
and the return value expected by switchdev in case of software VLANs.
At the same time, rename the helper using an old underscore convention.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mv88e6xxx_vtu_getnext is simple enough to call it directly in the
mv88e6xxx_port_db_load_purge function and explicit the return code
expected by switchdev for software VLANs when an hardware VLAN does
not exist.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mv88e6xxx_vtu_getnext interprets two members from the input
mv88e6xxx_vtu_entry structure: the (excluded) vid member to start
the iteration from, and the valid argument specifying whether the VID
must be written or not (only required once at the start of a loop).
Explicit the assignation of these two fields right before calling
mv88e6xxx_vtu_getnext, as it is done in the mv88e6xxx_vtu_get wrapper.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lock the mutex in the mv88e6xxx_port_vlan_prepare function
called by the DSA stack, instead of doing it in the internal
mv88e6xxx_port_check_hw_vlan helper.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some case, we don't want to allow specific tunnel packets
to host that can avoid to take up high CPU (e.g network attacks).
But other tunnel packets which not matched in hardware will be
sent to host too.
$ tc filter add dev vxlan_sys_4789 \
protocol ip chain 0 parent ffff: prio 1 handle 1 \
flower dst_ip 1.1.1.100 ip_proto tcp dst_port 80 \
enc_dst_ip 2.2.2.100 enc_key_id 100 enc_dst_port 4789 \
action tunnel_key unset pipe action drop
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When failing to create tx reporter, don't set the reporter's pointer.
Creating a reporter is not mandatory for driver load, avoid
garbage/error pointer.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Return error when failing to create a reporter in devlink. Since
NET_DEVLINK mandatory to MLX5_CORE in Kconfig, returned pointer
can't be NULL and can only hold an error in bad path.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Move vlan checksum fixup flow into mlx5e_skb_padding_csum(), which is
supposed to fixup SKB checksum if needed. And rename
mlx5e_skb_padding_csum() to mlx5e_skb_csum_fixup().
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
If capable, use zero inline mode in TX WQE for non-VLAN packets.
For VLAN ones, keep the enforcement of at least L2 inline mode,
unless the WQE VLAN insertion offload cap is on.
Performance:
Tested single core packet rate of 64Bytes.
NIC: ConnectX-5
CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
pktgen:
Before: 12.46 Mpps
After: 14.65 Mpps (+17.5%)
XDP_TX:
The MPWQE flow is not affected, as it already has this optimization.
So we test with priv-flag xdp_tx_mpwqe: off.
Before: 9.90 Mpps
After: 10.20 Mpps (+3%)
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Tested-by: Noam Stolero <noams@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Instead of passing an output param, let function return the
WQE pointer.
In addition, pass &pi so it gets its value in the function,
and save the redundant assignment that comes after it.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In MPWQE mode, when transmitting packets with XDP, a packet that is smaller
than a certain size (set to 256 bytes) would be sent inline within its WQE
TX descriptor (mem-copied), in case the hardware tx queue is congested
beyond a pre-defined water-mark.
If a MPWQE cannot contain an additional inline packet, we close this
MPWQE session, and send the packet inlined within the next MPWQE.
To save some MPWQE session close+open operations, we don't open MPWQE
sessions that are contiguously smaller than certain size (set to the
HW MPWQE maximum size). If there isn't enough contiguous room in the
send queue, we fill it with NOPs and wrap the send queue index around.
This way, qualified packets are always sent inline.
Perf tests:
Tested packet rate for UDP 64Byte multi-stream
over two dual port ConnectX-5 100Gbps NICs.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
XDP_TX:
With 24 channels:
| ------ | bounced packets | inlined packets | inline ratio |
| before | 113.6Mpps | 96.3Mpps | 84% |
| after | 115Mpps | 99.5Mpps | 86% |
With one channel:
| ------ | bounced packets | inlined packets | inline ratio |
| before | 6.7Mpps | 0pps | 0% |
| after | 6.8Mpps | 0pps | 0% |
As we can see, there is improvement in both inline ratio and overall
packet rate for 24 channels. Also, we see no degradation for the
one-channel case.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>