Move useful functions into a separate file in preparation for more
vsock test programs.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The vsock_diag_test program directly included ../../../include/uapi/
headers from the source tree. Tests are supposed to use the
usr/include/linux/ headers that have been prepared with make
headers_install instead.
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel says:
====================
add dsa switch support for ar9331
changes v7:
- remove ag71xx changes from this patch set. It needs more work.
- ar9331: fix register definition and add ASCII art switch documentation.
changes v6:
- rebase against net-next
changes v5:
- remove support for port5. The effort of using this port is
questionable. Currently, it is better to not use it at all than
to add buggy support.
- remove port enable call back. There is nothing that we actually need
to enable.
- rebase it against v5.5-rc1
changes v4:
- ag71xx: ag71xx_mac_validate fix always false comparison (&& -> ||)
- tag_ar9331: use skb_pull_rcsum() instead of skb_pull().
- tag_ar9331: drop skb_set_mac_header()
changes v3:
- ag71xx: ag71xx_mac_config: ignore MLO_AN_INBAND mode. It is not
supported by HW and SW.
- ag71xx: ag71xx_mac_validate: return all supported bits on
PHY_INTERFACE_MODE_NA
changes v2:
- move Atheros AR9331 TAG format to separate patch
- use netdev_warn_once in the tag driver to reduce potential message spam
- typo fixes
- reorder tag driver alphabetically
- configure switch to maximal frame size
- use mdiobus_read/write
- fail if mdio sub node is not found
- add comment for post reset state
- remove deprecated comment about device id
- remove phy-handle option for node with fixed-link
- ag71xx: set 1G support only for GMII mode
This patch series provides dsa switch support for Atheros ar9331 WiSoC.
As a side effect, ag71xx needed to be ported to phylink to make the switch
driver (which is also phylink based) work properly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide basic support for Atheros AR9331 built-in switch. So far it
works as port multiplexer without any hardware offloading support.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for tag format used in Atheros AR9331 built-in switch.
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add switch node supported by dsa ar9331 driver.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Atheros AR9331 has a built-in 5-port switch. The switch can be configured
to use all 5 ports or only 4. One of the built-in PHYs can either be used
by the first built-in ethernet controller or be used directly by the switch
over the second ethernet controller.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clang warns
../drivers/nfc/pn544/pn544.c:696:4: warning: misleading indentation;
statement is not part of the previous 'if' [-Wmisleading-indentation]
return nfc_hci_send_cmd(hdev, NFC_HCI_RF_READER_A_GATE,
^
../drivers/nfc/pn544/pn544.c:692:3: note: previous statement is here
if (target->nfcid1_len != 4 && target->nfcid1_len != 7 &&
^
1 warning generated.
This warning occurs because there is a space after the tab on this line.
Remove it so that the indentation is consistent with the Linux kernel
coding style and clang no longer warns.
Fixes: da052850b9 ("NFC: Add pn544 presence check for different targets")
Link: https://github.com/ClangBuiltLinux/linux/issues/814
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Doug Berger says:
====================
net: bcmgenet: Turn on offloads by default
This commit stack is based on Florian's commit 4e8aedfe78c7 ("net:
systemport: Turn on offloads by default") and enables the offloads for
the bcmgenet driver by default.
The first commit adds support for the HIGHDMA feature to the driver.
The second converts the Tx checksum implementation to use the generic
hardware logic rather than the deprecated IP centric methods.
The third modifies the Rx checksum implementation to use the hardware
offload to compute the complete checksum rather than filtering out bad
packets detected by the hardware's IP centric implementation. This may
increase processing load by passing bad packets to the network stack,
but it provides for more flexible handling of packets by the network
stack without requiring software computation of the checksum.
The remaining commits mirror the extensions Florian made to the sysport
driver to retain symmetry with that driver and to make the benefits of
the hardware offloads more ubiquitous.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When inserting the TSB, keep track of how many times we had to do
it and whether there was a failure in doing so. This helps profile the
driver for possibly incorrect headroom settings.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During bcmgenet_put_tx_csum() make sure we differentiate a SKB
headroom re-allocation failure from the normal swap and replace
path.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can turn on the RX/TX checksum offloads and the scatter/gather
features by default and make sure that those are properly reflected
back to, e.g., stacked devices such as VLAN.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During driver resume and open, the HW may have lost its context/state.
Use bcmgenet_set_features() to make sure we restore the correct
set of features that were previously configured.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally enabling TX and RX checksum
offloads, refactor bcmgenet_set_features() a bit such that
__netdev_update_features() during register_netdev() can make sure
that features are correctly programmed during network device
registration.
Since we can now be called during register_netdev() with clocks
gated, we need to temporarily turn them on/off in order to have a
successful register programming.
We also move the CRC forward setting read into
bcmgenet_set_features() since priv->crc_fwd_en matters while
turning on RX checksum offload, that way we are guaranteed they
are in sync in case we ever add support for NETIF_F_RXFCS at some
point in the future.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit updates the Rx checksum offload behavior of the driver
to use the more generic CHECKSUM_COMPLETE method that supports all
protocols over the CHECKSUM_UNNECESSARY method that only applies
to some protocols known by the hardware.
This behavior is perceived to be superior.
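To illustrate why a complete checksum is protocol agnostic: the value handed
to the stack is a ones' complement sum over the packet data, which any
protocol layer can validate against. The following is a plain userspace
sketch of that sum, not GENET driver code; the function names are
hypothetical and the buffer is treated as big-endian 16-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator down to a 16-bit ones' complement sum. */
static uint16_t csum_fold32(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Ones' complement sum of a byte buffer read as big-endian 16-bit words.
 * A trailing odd byte is padded with zero. The 32-bit accumulator is
 * sufficient for packet-sized buffers (fewer than 65536 words).
 */
static uint16_t ones_complement_sum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;

	return csum_fold32(sum);
}
```

Nothing in the sum depends on the protocol carried in the buffer, which is
the property the CHECKSUM_COMPLETE method relies on.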
Signed-off-by: Doug Berger <opendmb@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The GENET hardware should be capable of generating IP checksums
using the NETIF_F_HW_CSUM feature, so switch to using that feature
instead of the deprecated NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit configures the DMA masks for the GENET driver and
sets the NETIF_F_HIGHDMA flag to report support of the feature.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SYSTEMPORT is capable of addressing up to 40 bits of physical address
space; set an appropriate DMA mask to permit that.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
nfp: tls: implement the stream sync RX resync
This small series adds support for using the device
in stream scan RX resync mode which improves the RX
resync success rate. Without stream scan it's pretty
much impossible to successfully resync a continuous
stream.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The simple RX resync strategy controlled by the kernel does not
give results as good as when the device helps by detecting
the potential record boundaries and keeping track of them.
We've called this strategy stream scan in the tls-offload doc.
Implement this strategy for the NFP. The device sends a request
for record boundary confirmation, which is then recorded in
per-TLS socket state and responded to once the record is reached.
Because the device keeps track of records passing after the
request was sent, the response is not as latency sensitive as
when the kernel just tries to tell the device the information
about the next record.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is currently no way for a driver to reliably check that
the socket it has looked up is in fact RX offloaded. Add
a helper. This allows drivers to catch misbehaving firmware.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make nfp_net_parse_meta() take a packet pointer and return
a drop/no drop decision. Right now it returns the end of
metadata and caller compares it to the packet pointer.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley says:
====================
Add ipv6 tunnel support to NFP
The following patches add support for IPv6 tunnel offload to the NFP
driver.
Patches 1-2 do some code tidy up and prepare existing code for reuse in
IPv6 tunnels.
Patches 3-4 handle IPv6 tunnel decap (match) rules.
Patches 5-8 handle encap (action) rules.
Patch 9 adds IPv6 support to the merge and pre-tunnel rule functions.
v1->v2:
- fix compiler warning when building without CONFIG_IPV6 set -
Jakub Kicinski (patch 7)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Both pre-tunnel match rules and flow merge functions parse compiled
match/action fields for validation.
Update these validation functions to include IPv6 match and action fields.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FW sends an update of IPv6 tunnels that are active in a given period. Use
this information to update the kernel table so that neighbour entries do
not time out when active on the NIC.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A notifier is used to track route changes in the kernel. If a change is
made to a route that is offloaded to fw then an update is sent to the NIC.
The driver tracks all routes that are offloaded to determine if a kernel
change is of interest.
Extend the notifier to track IPv6 route changes and create a new list that
stores offloaded IPv6 routes. Modify the IPv4 route helper functions to
accept varying address lengths. This way, the same core functions can be
used to handle IPv4 and IPv6.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When fw does not know the next hop for an IPv6 tunnel, it sends a request
to the driver.
Handle this request by doing a route lookup on the IPv6 address and
offloading the next hop to the fw neighbour table.
Similar functions already exist to handle IPv4 no neighbour requests. To
avoid confusion, append these functions with the _ipv4 tag. There is no
change in functionality with this.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IPv4 set tunnel action allows the setting of tunnel metadata such as
the TTL and ToS values. The pre-tunnel action includes the destination IP
address and is used to calculate the next hop from the neighbour
table.
Much of the IPv4 tunnel actions can be reused for IPv6 tunnels. Change the
names of associated functions and structs to remove the IPv4 identifier
and make minor modifications to support IPv6 tunnel actions.
Ensure the pre-tunnel action contains the IPv6 address along with an
identifying flag when an IPv6 tunnel action is required.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fw requires a list of IPv6 addresses that are used as tunnel endpoints to
enable correct decap of tunneled packets.
Store a list of IPv6 endpoints used in rules with a ref counter to track
how many times it is in use. Offload the entire list any time a new IPv6
address is added or when an address is removed (ref count is 0).
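The reference counting described above can be sketched as follows. This is
a simplified userspace model, not the NFP driver code: the table size, the
struct and function names, and the "offloads" counter (standing in for
sending the whole list to fw) are all assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

#define MAX_ENDPOINTS 8		/* arbitrary size for the sketch */

struct endpoint {
	uint8_t addr[16];	/* IPv6 address */
	unsigned int refcnt;	/* 0 means the slot is free */
};

struct endpoint_table {
	struct endpoint slots[MAX_ENDPOINTS];
	unsigned int offloads;	/* counts "send the entire list to fw" events */
};

static struct endpoint *endpoint_find(struct endpoint_table *t,
				      const uint8_t *addr)
{
	size_t i;

	for (i = 0; i < MAX_ENDPOINTS; i++)
		if (t->slots[i].refcnt && !memcmp(t->slots[i].addr, addr, 16))
			return &t->slots[i];
	return NULL;
}

/* Take a reference; only a newly added address triggers an offload. */
static int endpoint_get(struct endpoint_table *t, const uint8_t *addr)
{
	struct endpoint *e = endpoint_find(t, addr);
	size_t i;

	if (e) {
		e->refcnt++;
		return 0;
	}
	for (i = 0; i < MAX_ENDPOINTS; i++) {
		if (!t->slots[i].refcnt) {
			memcpy(t->slots[i].addr, addr, 16);
			t->slots[i].refcnt = 1;
			t->offloads++;
			return 0;
		}
	}
	return -1;	/* table full */
}

/* Drop a reference; the list is re-offloaded once the count hits 0. */
static void endpoint_put(struct endpoint_table *t, const uint8_t *addr)
{
	struct endpoint *e = endpoint_find(t, addr);

	if (e && --e->refcnt == 0)
		t->offloads++;
}
```

Two rules using the same endpoint therefore cause a single offload when the
first rule is added and a second one only when the last rule is removed.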
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 tunnel matches are now supported by firmware. Modify the NFP driver
to compile these match rules. IPv6 matches are handled similarly to IPv4
tunnels, with the difference being the address length. The type of tunnel is
indicated by the same bitmap that is used in IPv4 with an extra bit
signifying that the IPv6 variation should be used.
Only compile IPv6 tunnel matches when the fw features symbol indicates
that they are compatible with the currently loaded fw.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv4 UDP and GRE tunnel match rule compile helpers share functions for
compiling fields such as IP addresses. However, they handle fields such
as tunnel IDs differently.
Create new helper functions for compiling GRE and UDP tunnel key data.
This is in preparation for supporting IPv6 tunnels where these new
functions can be reused.
This patch does not change functionality.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In kernel 5.1, the flow offload API was introduced along with a helper
function to extract the flow_rule from the TC offload struct. Each of the
match helper functions is passed the offload struct and extracts the flow
rule to a local variable.
Simplify the code while also removing the extra compat and local variable
calls by extracting the rule once in the main match handler, and passing
a reference to the rule direct to each helper.
This patch does not change driver functionality.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In hdlcdrv_register, failure to register the driver causes a crash.
The three callers of hdlcdrv_register all pass valid pointers and
do not fail. The patch eliminates the unnecessary BUG_ON assertion.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Presently, at boot time, the comphys are enabled. For firmware
compatibility reasons, the comphy driver does not power down the
comphys at boot. Consequently, the ethernet comphys are left active
until the network interfaces are brought through an up/down cycle.
If the port is never used, the port wastes power needlessly. Arrange
for the ethernet comphys to be cycled by the mvpp2 driver as if the
interface went through an up/down cycle during driver probe, thereby
powering them down.
This saves:
270mW per 10G SFP+ port on the Macchiatobin Single Shot (eth0/eth1)
370mW per 10G PHY port on the Macchiatobin Double Shot (eth0/eth1)
160mW on the SFP port on either Macchiatobin flavour (eth3)
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report a rate-limited error if we fail to read the SFP soft status,
and preserve the current status in that case. This avoids I2C bus
errors from triggering a link flap.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King says:
====================
phylib consolidation
Over the last few releases, there has been a push to clean up and
consolidate the phylib code. Some cases have been missed, and this
series catches those cases.
1. Remove redundant .aneg_done initialisers; calling genphy_aneg_done()
for clause 22 PHYs is the default when .aneg_done is not set.
2. Some PHY drivers manually set phydev->pause and phydev->asym_pause,
but we have a helper for this - phy_resolve_aneg_pause(), introduced
in 2d880b8709 ("net: phy: extract pause mode"). Use this in the
lxt, marvell and uPD60620 drivers.
Incidentally, this brings up the question whether marvell fiber mode
is correctly interpreting and advertising the pause parameters.
3. Add a genphy_check_and_restart_aneg() helper, which complements the
clause 45 version of this. This will be useful for PHY drivers that
open code this logic (e.g. marvell.c)
4. Add a genphy_read_status_fixed() helper to read the fixed-mode
status from a clause 22 PHY. lxt and marvell both contain copies
of this code, so convert them over.
5. Arrange marvell driver to use genphy_read_lpa() for copper mode.
This needs some rearrangement of the code in
marvell_read_status_page_an(), but preserves using the PHY specific
status register to derive the current negotiation results.
6. Simplify the marvell driver so we can use the
genphy_read_status_fixed() helper directly rather than
marvell_read_status_page_fixed().
7. Use positive logic in the marvell driver to determine the link
state, and get rid of the REGISTER_LINK_STATUS definition; we
already have a definition for this.
8. The marvell driver reads the PHY specific status register multiple
times when determining the status: once in marvell_update_link()
and again in marvell_read_status_page_an(). This is a waste;
rearrange to read the status register once, and pass its value into
marvell_read_status_page_an(). We preserve using
genphy_update_link() for the copper side.
9. The marvell driver was using private clause 37 definitions, but we
have clause 37 definitions in uapi/linux/mii.h. Use the generic
definitions.
10. Switch the marvell driver to use phy_modify_changed() to modify
the fiber advertisement.
11. Switch the marvell driver to use genphy_check_and_restart_aneg()
introduced above rather than open-coding this functionality.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the helper to check and restart autonegotiation for the marvell
fiber page negotiation setting.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use phy_modify_changed() to change the fiber advertisement register
rather than open coding this functionality.
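For readers unfamiliar with the helper, the read-modify-write pattern it
replaces can be modelled as below. This is a userspace sketch on a plain
variable, with a hypothetical function name; the real phy_modify_changed()
operates on a PHY register over MDIO and can also return a negative error
code, which this model omits.

```c
#include <stdint.h>

/*
 * Clear the bits in 'mask', set the bits in 'set', and report whether
 * the register value actually changed: 1 if it was rewritten, 0 if the
 * new value equals the old one and no write was needed.
 */
static int reg_modify_changed(uint16_t *reg, uint16_t mask, uint16_t set)
{
	uint16_t old = *reg;
	uint16_t val = (uint16_t)((old & ~mask) | set);

	if (val == old)
		return 0;

	*reg = val;
	return 1;
}
```

The return value lets the caller decide whether autonegotiation needs to be
restarted after updating the advertisement.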
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use existing clause 37 advertising/link partner definitions rather than
private ones for the advertisement registers.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
marvell_read_status_page_an() always reads the PHY status register, but
marvell_update_link() has already done this. Rather than wastefully
reading the register twice in quick succession, read it once in
marvell_read_status_page() and use the result for both.
This makes marvell_update_link() rather pointless, so move it into
marvell_read_status_page().
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than using negative logic:
	if (there is no link)
		set link = 0
	else
		set link = 1
use the more natural positive logic:
	if (there is link)
		set link = 1
	else
		set link = 0
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the initialisation of the link partner state earlier, inside
marvell_read_status_page(), so we don't have the same initialisation
scattered amongst the other files. This is in a similar place to
the genphy implementation, so would result in the same behaviour if
a PHY read error occurs.
This allows us to get rid of marvell_read_status_page_fixed(), which
became a pointless wrapper around genphy_read_status_fixed().
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rearrange the Marvell PHY driver to use genphy_read_lpa() rather than
open-coding this functionality.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two drivers and generic code which contain exactly the same
code to read the status of a PHY operating without autonegotiation
enabled. Rather than duplicate this code, provide a helper to read
this information.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper for restarting autonegotiation, similar to the clause 45
variant. Use it in __genphy_config_aneg().
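A minimal sketch of the decision such a helper makes, assuming it follows
the rule already open coded in __genphy_config_aneg(): restart when the
advertisement changed, and otherwise restart only if autonegotiation is not
enabled or the PHY is isolated. The function name is hypothetical and the
BMCR value is passed in rather than read from a PHY; the bit definitions
mirror those in uapi/linux/mii.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Basic mode control register bits, as defined in uapi/linux/mii.h. */
#define BMCR_ISOLATE	0x0400
#define BMCR_ANENABLE	0x1000

/*
 * Decide whether autonegotiation must be restarted.
 * 'changed' is true when the advertisement was modified; 'bmcr' is the
 * current value of the basic mode control register.
 */
static bool aneg_needs_restart(bool changed, uint16_t bmcr)
{
	if (changed)
		return true;

	/* Advertisement unchanged: restart only if aneg is off or isolated. */
	return !(bmcr & BMCR_ANENABLE) || (bmcr & BMCR_ISOLATE);
}
```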
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several drivers code their own version of this, working from the LPA
register, after setting the ethtool link partner advertisement bitmask.
Use the generic function instead.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove initialisers that set .aneg_done to genphy_aneg_done - this is
the default for clause 22 PHYs, so the initialiser is redundant.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For architectures that do not support native 64-bit division, we need
to use the division helpers.
Fixes: b60189e039 ("net: stmmac: Integrate EST with TAPRIO scheduler API")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata says:
====================
Add a new Qdisc, ETS
The IEEE standard 802.1Qaz (and 802.1Q-2014) specifies four principal
transmission selection algorithms: strict priority, credit-based shaper,
ETS (bandwidth sharing), and vendor-specific. All these have their
corresponding knobs in DCB. But DCB does not have interfaces to configure
RED and ECN, unlike Qdiscs.
In the Qdisc land, strict priority is implemented by PRIO. Credit-based
transmission selection algorithm can then be modeled by having e.g. TBF or
CBS Qdisc below some of the PRIO bands. ETS would then be modeled by
placing a DRR Qdisc under the last PRIO band.
The problem with this approach is that DRR on its own, as well as the
combination of PRIO and DRR, are tricky to configure and tricky to offload
to 802.1Qaz-compliant hardware. This is due to several reasons:
- As any classful Qdisc, DRR supports adding classifiers to decide in which
class to enqueue packets. Unlike PRIO, there's however no fallback in the
form of priomap. A way to achieve classification based on packet priority
is e.g. like this:
# tc filter add dev swp1 root handle 1: \
basic match 'meta(priority eq 0)' flowid 1:10
Expressing the priomap in this manner however forces drivers to deep dive
into the classifier block to parse the individual rules.
A possible solution would be to extend the classes with a "defmap" a la
split / defmap mechanism of CBQ, and introduce this as a last resort
classification. However, unlike priomap, this doesn't have the guarantee
of covering all priorities. Traffic whose priority is not covered is
dropped by DRR as unclassified. But ASICs tend to implement dropping in
the ACL block, not in scheduling pipelines. The need to treat these
configurations correctly (if only to decide to not offload at all)
complicates a driver.
It's not clear how to retrofit priomap with all its benefits to DRR
without changing it beyond recognition.
- The interplay between PRIO and DRR is also causing problems. 802.1Qaz has
all ETS TCs as a last resort. Switch ASICs that support ETS at all are
likely to handle ETS traffic this way as well. However, the Linux model
is more generic, allowing the DRR block in any band. Drivers would need
to be careful to handle this case correctly, otherwise the offloaded
model might not match the slow-path one.
In a similar vein, PRIO and DRR need to agree on the list of priorities
assigned to DRR. This is doubly problematic--the user needs to take care
to keep the two in sync, and the driver needs to watch for any holes in
DRR coverage and treat the traffic correctly, as discussed above.
Note that at the time that DRR Qdisc is added, it has no classes, and
thus any priorities assigned to that PRIO band are not covered. Thus this
case is surprisingly rather common, and needs to be handled gracefully by
the driver.
- Similarly due to DRR flexibility, when a Qdisc (such as RED) is attached
below it, it is not immediately clear which TC the class represents. This
is unlike PRIO with its straightforward classid scheme. When DRR is
combined with PRIO, the relationship between classes and TCs gets even
more murky.
This is a problem for users as well: the TC mapping is rather important
for (devlink) shared buffer configuration and (ethtool) counters.
So instead, this patch set introduces a new Qdisc, which is based on
802.1Qaz wording. It is PRIO-like in how it is configured, meaning one
needs to specify how many bands there are, how many are strict and how many
are ETS, quanta for the latter, and priomap.
The new Qdisc operates like the PRIO / DRR combo would when configured as
per the standard. The strict classes, if any, are tried for traffic first.
When there's no traffic in any of the strict queues, the ETS ones (if any)
are treated in the same way as in DRR.
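The dequeue order described above can be modelled roughly as follows. This
is a simplified userspace sketch, not the sch_ets.c implementation: the
fixed-size queues, the struct and function names, and the round-robin cursor
are assumptions, and every ETS band is assumed to have a nonzero quantum.

```c
#include <string.h>

#define MAX_BANDS 8
#define QLEN 8

struct band {
	unsigned int quantum;		/* 0 for strict bands */
	unsigned int deficit;
	unsigned int pkts[QLEN];	/* queued packet lengths */
	unsigned int head, tail;
};

struct ets_model {
	unsigned int nbands, nstrict;
	unsigned int cur;		/* ETS band whose turn it is */
	struct band bands[MAX_BANDS];
};

static void ets_init(struct ets_model *q, unsigned int nbands,
		     unsigned int nstrict, const unsigned int *quanta)
{
	unsigned int i;

	memset(q, 0, sizeof(*q));
	q->nbands = nbands;
	q->nstrict = nstrict;
	q->cur = nstrict;
	for (i = 0; i < nbands; i++)
		q->bands[i].quantum = quanta[i];
}

static int band_empty(const struct band *b)
{
	return b->head == b->tail;
}

static int ets_enqueue(struct ets_model *q, unsigned int band, unsigned int len)
{
	struct band *b = &q->bands[band];

	if (b->tail == QLEN)
		return -1;		/* sketch: queue slots are not reused */
	b->pkts[b->tail++] = len;
	return 0;
}

/* Returns the band a packet was taken from, or -1 if nothing is queued. */
static int ets_dequeue(struct ets_model *q)
{
	unsigned int i, pending = 0;

	/* Strict bands are tried first, in priority order. */
	for (i = 0; i < q->nstrict; i++) {
		if (!band_empty(&q->bands[i])) {
			q->bands[i].head++;
			return (int)i;
		}
	}

	for (i = q->nstrict; i < q->nbands; i++)
		pending += !band_empty(&q->bands[i]);
	if (!pending)
		return -1;

	/* ETS bands share the remaining bandwidth DRR-style. */
	for (;;) {
		struct band *b = &q->bands[q->cur];

		if (band_empty(b)) {
			b->deficit = 0;
		} else if (b->pkts[b->head] <= b->deficit) {
			b->deficit -= b->pkts[b->head++];
			return (int)q->cur;
		} else {
			b->deficit += b->quantum;
		}
		q->cur = q->cur + 1 < q->nbands ? q->cur + 1 : q->nstrict;
	}
}
```

In this model a band with three times the quantum of its neighbour gets to
send roughly three times as many bytes per round, while any packet in a
strict band always goes out first.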
The chosen interface makes the overall system both reasonably easy to
configure, and reasonably easy to offload. The extra code to support ETS in
mlxsw (which already supports PRIO) is about 150 lines, of which perhaps 20
lines is bona fide new business logic.
Credit-based shaping transmission selection algorithm can be configured by
adding a CBS Qdisc under one of the strict bands (e.g. TBF can be used to a
similar effect as well). As a non-work-conserving Qdisc, CBS can't be
hooked under the ETS bands. This is detected and handled identically to DRR
Qdisc at runtime. Note that offloading CBS is not the subject of this patchset.
The patchset proceeds in four stages:
- Patches #1-#3 are cleanups.
- Patches #4 and #5 contain the new Qdisc.
- Patches #6 and #7 update mlxsw to offload the new Qdisc.
- Patches #8-#10 add selftests for ETS.
Examples:
- Add a Qdisc with 6 bands, 3 strict and 3 ETS with 45%-30%-25% weights:
# tc qdisc add dev swp1 root handle 1: \
ets strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5
# tc qdisc sh dev swp1
qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5
- Tweak quantum of one of the classes of the previous Qdisc:
# tc class ch dev swp1 classid 1:4 ets quantum 1000
# tc qdisc sh dev swp1
qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 1000 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5
# tc class ch dev swp1 classid 1:3 ets quantum 1000
Error: Strict bands do not have a configurable quantum.
- Purely strict Qdisc with 1:1 mapping between priorities and TCs:
# tc qdisc add dev swp1 root handle 1: \
ets strict 8 priomap 7 6 5 4 3 2 1 0
# tc qdisc sh dev swp1
qdisc ets 1: root refcnt 2 bands 8 strict 8 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7
- Use "bands" to specify number of bands explicitly. Underspecified bands
are implicitly ETS and their quantum is taken from MTU. The following
thus gives each band the same weight:
# tc qdisc add dev swp1 root handle 1: \
ets bands 8 priomap 7 6 5 4 3 2 1 0
# tc qdisc sh dev swp1
qdisc ets 1: root refcnt 2 bands 8 quanta 1514 1514 1514 1514 1514 1514 1514 1514 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7
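The priomap expansion visible in the "tc qdisc sh" outputs above, where
priorities not listed on the command line end up in the last band, can be
sketched like this. It is a userspace illustration only; the function name
is hypothetical, and the length of 16 is taken from the number of entries
shown in the dumps.

```c
#include <stddef.h>

#define PRIOMAP_LEN 16	/* number of priomap entries shown by "tc qdisc sh" */

/*
 * Expand a partially specified priomap: the first 'ngiven' priorities use
 * the bands the user listed, the remaining ones default to the last band.
 */
static void priomap_fill(unsigned int out[PRIOMAP_LEN],
			 const unsigned int *given, size_t ngiven,
			 unsigned int nbands)
{
	size_t i;

	for (i = 0; i < PRIOMAP_LEN; i++)
		out[i] = i < ngiven ? given[i] : nbands - 1;
}
```

With 6 bands and "priomap 0 1 1 1 2 3 4 5", this yields the trailing run of
5s seen in the first example.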
v2:
- This addresses points raised by David Miller.
- Patch #4:
- sch_ets.c: Add a comment with description of the Qdisc and the
dequeuing algorithm.
- Kconfig: Add a high-level description to the help blurb.
v1:
- No changes, first upstream submission after RFC.
v3 (internal):
- This addresses review from Jiri Pirko.
- Patch #3:
- Rename to _HR_ instead of to _HIERARCHY_.
- Patch #4:
- pkt_sched.h: Keep all the TCA_ETS_ constants in one enum.
- pkt_sched.h: Rename TCA_ETS_BANDS to _NBANDS, _STRICT to _NSTRICT,
_BAND_QUANTUM to _QUANTA_BAND and _PMAP_BAND to _PRIOMAP_BAND.
- sch_ets.c: Update to reflect the above changes. Add a new policy,
ets_class_policy, which is used when parsing class changes.
Currently that policy is the same as the quanta policy, but that
might change.
- sch_ets.c: Move MTU handling from ets_quantum_parse() to the one
caller that makes use of it.
- sch_ets.c: ets_qdisc_priomap_parse(): WARN_ON_ONCE on invalid
attribute instead of returning an extack.
- Patch #6:
- __mlxsw_sp_qdisc_ets_replace(): Pass the weights argument to this
function in this patch already. Drop the weight computation.
- mlxsw_sp_qdisc_prio_replace(): Rename "quanta" to "zeroes" and
pass for the abovementioned "weights".
- mlxsw_sp_qdisc_prio_graft(): Convert to a wrapper around
__mlxsw_sp_qdisc_ets_graft(), instead of invoking the latter
directly from mlxsw_sp_setup_tc_prio().
- Update to follow the _HIERARCHY_ -> _HR_ renaming.
- Patch #7:
- __mlxsw_sp_qdisc_ets_replace(): The "weights" argument passing and
weight computation removal are now done in a previous patch.
- mlxsw_sp_setup_tc_ets(): Drop case TC_ETS_REPLACE, which is handled
earlier in the function.
- Patch #3 (iproute2):
- Add an example output to the commit message.
- tc-ets.8: Fix output of two examples.
- tc-ets.8: Describe default values of "bands", "quanta".
- q_ets.c: A number of fixes in error messages.
- q_ets.c: Comment formatting: /*padding*/ -> /* padding */
- q_ets.c: parse_nbands: Move duplicate checking to callers.
- q_ets.c: Don't accept both "quantum" and "quanta" as equivalent.
v2 (internal):
- This addresses review from Ido Schimmel and comments from Alexander
Kushnarov.
- Patch #2:
- s/coment/comment in the commit message.
- Patch #4:
- sch_ets: ets_class_is_strict(), ets_class_id(): Constify an argument
- ets_class_find(): RXTify
- Patch #3 (iproute2):
- tc-ets.8: some spelling fixes
- tc-ets.8: add another example
- tc.8: add an ETS to "CLASSFUL QDISCS" section
v1 (internal):
- This addresses RFC reviews from Ido Schimmel and Roman Mashak, bugs found
by Alexander Petrovskiy and myself, and other improvements.
- Patch #2:
- Expand the explanation with an explicit example.
- Patch #4:
- Kconfig: s/sch_drr/sch_ets/
- sch_ets: Reorder includes to be in alphabetical order
- sch_ets: ets_quantum_parse(): Rename the return-pointer argument
from pquantum to quantum, and use it directly, not going through a
local temporary.
- sch_ets: ets_qdisc_quanta_parse(): Convert syntax of function
argument "quanta" from an array to a pointer.
- sch_ets: ets_qdisc_priomap_parse(): Likewise with "priomap".
- sch_ets: ets_qdisc_quanta_parse(), ets_qdisc_priomap_parse(): Invoke
__nla_validate_nested directly instead of nl80211_validate_nested().
- sch_ets: ets_qdisc_quanta_parse(): WARN_ON_ONCE on invalid attribute
instead of returning an extack.
- sch_ets: ets_qdisc_change(): Make the last band the default one for
unmentioned priomap priorities.
- sch_ets: Fix a panic when an offloaded child in a bandwidth-sharing
band notified its ETS parent.
- sch_ets: When ungrafting, add the newly-created invisible FIFO to
the Qdisc hash
- Patch #5:
- pkt_cls.h: Note that quantum=0 signifies a strict band.
- Fix error path handling when ets_offload_dump() fails.
- Patch #6:
- __mlxsw_sp_qdisc_ets_replace(): Convert syntax of function arguments
"quanta" and "priomap" from arrays to pointers.
- Patch #7:
- __mlxsw_sp_qdisc_ets_replace(): Convert syntax of function argument
"weights" from an array to a pointer.
- Patch #9:
- mlxsw/sch_ets.sh: Add a comment explaining packet prioritization.
- Adjust the whole suite to allow testing of traffic classifiers
in addition to testing priomap.
- Patch #10:
- Add a number of new tests to test default priomap band, overlarge
number of bands, zeroes in quanta, and altogether missing quanta.
- Patch #1 (iproute2):
- State motivation for inclusion of this patch in the patchset in the
commit message.
- Patch #3 (iproute2):
- tc-ets.8: it is now December
- tc-ets.8: explain inactivity WRT using non-WC Qdiscs under ETS band
- tc-ets.8: s/flow/band in explanation of quantum
- tc-ets.8: explain what happens with priorities not covered by priomap
- tc-ets.8: default priomap band is now the last one
- q_ets.c: ets_parse_opt(): Remove unnecessary initialization of
priomap and quanta.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>