linux/net
Vladimir Oltean 2b30f8291a net: ethtool: add support for MAC Merge layer
The MAC merge sublayer (IEEE 802.3-2018 clause 99) is one of 2
specifications (the other being Frame Preemption; IEEE 802.1Q-2018
clause 6.7.2), which work together to minimize latency caused by frame
interference at TX. The overall goal of TSN is for normal traffic and
traffic with a bounded deadline to be able to cohabitate on the same L2
network and not bother each other too much.

The standards achieve this (partly) by introducing the concept of
preemptible traffic, i.e. Ethernet frames that have a custom value for
the Start-of-Frame-Delimiter (SFD), and these frames can be fragmented
and reassembled at L2 on a link-local basis. The non-preemptible frames
are called express traffic, they are transmitted using a normal SFD, and
they can preempt preemptible frames, therefore having lower latency,
which can matter at lower (100 Mbps) link speeds, or at high MTUs (jumbo
frames around 9K). Preemption is not recursive, i.e. a P frame cannot
preempt another P frame. Preemption also does not depend upon priority,
or otherwise said, an E frame with prio 0 will still preempt a P frame
with prio 7.

In terms of implementation, the standards talk about the presence of an
express MAC (eMAC) which handles express traffic, and a preemptible MAC
(pMAC) which handles preemptible traffic, and these MACs are multiplexed
on the same MII by a MAC merge layer.

To support frame preemption, the definition of the SFD was generalized
to SMD (Start-of-mPacket-Delimiter), where an mPacket is essentially an
Ethernet frame fragment, or a complete frame. Stations unaware of an SMD
value different from the standard SFD will treat P frames as error
frames. To prevent that from happening, a negotiation process is
defined.

On RX, packets are dispatched to the eMAC or pMAC after being filtered
by their SMD. On TX, the eMAC/pMAC classification decision is taken by
the 802.1Q spec, based on packet priority (each of the 8 user priority
values may have an admin-status of preemptible or express).

The MAC Merge layer and the Frame Preemption parameters have some degree
of independence in terms of how software stacks are supposed to deal
with them. The activation of the MM layer is supposed to be controlled
by an LLDP daemon (after it has been communicated that the link partner
also supports it), after which a (hardware-based or not) verification
handshake takes place, before actually enabling the feature. So the
process is intended to be relatively plug-and-play. Whereas FP settings
are supposed to be coordinated across a network using something
approximating NETCONF.

The support contained here is exclusively for the 802.3 (MAC Merge)
portions and not for the 802.1Q (Frame Preemption) parts. This API is
sufficient for an LLDP daemon to do its job. The FP adminStatus variable
from 802.1Q is outside the scope of an LLDP daemon.

I have taken a few creative licenses and augmented the Linux kernel UAPI
compared to the standard managed objects recommended by IEEE 802.3.
These are:

- ETHTOOL_A_MM_PMAC_ENABLED: According to Figure 99-6: Receive
  Processing state diagram, a MAC Merge layer is always supposed to be
  able to receive P frames. However, this implies keeping the pMAC
  powered on, which will consume needless power in applications where FP
  will never be used. If LLDP is used, the reception of an Additional
  Ethernet Capabilities TLV from the link partner is sufficient
  indication that the pMAC should be enabled. So my proposal is that in
  Linux, we keep the pMAC turned off by default and that user space
  turns it on when needed.

- ETHTOOL_A_MM_VERIFY_ENABLED: The IEEE managed object is called
  aMACMergeVerifyDisableTx. I opted for consistency (positive logic) in
  the boolean netlink attributes offered, so this is also positive here.
  Other than the meaning being reversed, they correspond to the same
  thing.

- ETHTOOL_A_MM_MAX_VERIFY_TIME: I found it most reasonable for a LLDP
  daemon to maximize the verifyTime variable (delay between SMD-V
  transmissions), to maximize its chances that the LP replies. IEEE says
  that the verifyTime can range between 1 and 128 ms, but the NXP ENETC
  stupidly keeps this variable in a 7 bit register, so the maximum
  supported value is 127 ms. I could have chosen to hardcode this in the
  LLDP daemon to a lower value, but why not let the kernel expose its
  supported range directly.

- ETHTOOL_A_MM_TX_MIN_FRAG_SIZE: the standard managed object is called
  aMACMergeAddFragSize, and expresses the "additional" fragment size
  (on top of ETH_ZLEN), whereas this expresses the absolute value of the
  fragment size.

- ETHTOOL_A_MM_RX_MIN_FRAG_SIZE: there doesn't appear to exist a managed
  object mandated by the standard, but user space clearly needs to know
  what is the minimum supported fragment size of our local receiver,
  since LLDP must advertise a value no lower than that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23 12:44:18 +00:00
..
6lowpan
9p xen: branch for v6.2-rc4 2023-01-12 17:02:20 -06:00
802 treewide: Convert del_timer*() to timer_shutdown*() 2022-12-25 13:38:09 -08:00
8021q net: Remove the obsolte u64_stats_fetch_*_irq() users (net). 2022-10-28 20:13:54 -07:00
appletalk
atm driver core: make struct class.dev_uevent() take a const * 2022-11-24 17:12:15 +01:00
ax25 ax25: af_ax25: Remove unnecessary (void*) conversions 2022-11-16 13:31:03 +00:00
batman-adv Networking changes for 6.2. 2022-12-13 15:47:48 -08:00
bluetooth net/sock: Introduce trace_sk_data_ready() 2023-01-23 11:26:50 +00:00
bpf New Feature: 2022-12-17 14:06:53 -06:00
bpfilter
bridge treewide: Convert del_timer*() to timer_shutdown*() 2022-12-25 13:38:09 -08:00
caif caif: don't assume iov_iter type 2023-01-13 20:44:20 -08:00
can Networking changes for 6.2. 2022-12-13 15:47:48 -08:00
ceph net/sock: Introduce trace_sk_data_ready() 2023-01-23 11:26:50 +00:00
core net/sock: Introduce trace_sk_data_ready() 2023-01-23 11:26:50 +00:00
dcb net: dcb: add helper functions to retrieve PCP and DSCP rewrite maps 2023-01-20 09:33:22 +00:00
dccp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-11-29 13:04:52 -08:00
devlink devlink: add instance lock assertion in devl_is_registered() 2023-01-19 19:08:38 -08:00
dns_resolver cred: Do not default to init_cred in prepare_kernel_cred() 2022-11-01 10:04:52 -07:00
dsa net: dsa: microchip: ptp: move pdelay_rsp correction field to tail tag 2023-01-13 08:40:41 +00:00
ethernet net: ethernet: use sysfs_emit() to instead of scnprintf() 2022-12-07 20:02:44 -08:00
ethtool net: ethtool: add support for MAC Merge layer 2023-01-23 12:44:18 +00:00
hsr hsr: Use a single struct for self_node. 2022-12-01 20:26:22 -08:00
ieee802154 Merge tag 'ieee802154-for-net-next-2022-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next 2022-12-07 17:33:26 -08:00
ife
ipv4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-01-20 12:28:23 -08:00
ipv6 ipv6: Remove extra counter pull before gc 2023-01-18 13:00:16 +00:00
iucv
kcm net/sock: Introduce trace_sk_data_ready() 2023-01-23 11:26:50 +00:00
key Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2022-11-29 20:50:51 -08:00
l2tp l2tp: prevent lockdep issue in l2tp_tunnel_register() 2023-01-18 14:44:54 +00:00
l3mdev
lapb
llc
mac80211 Revert "wifi: mac80211: fix memory leak in ieee80211_if_add()" 2023-01-16 17:28:52 +02:00
mac802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-12-08 18:19:59 -08:00
mctp mctp: Remove device type check at unregister 2022-12-19 17:20:22 -08:00
mpls net: Remove the obsolte u64_stats_fetch_*_irq() users (net). 2022-10-28 20:13:54 -07:00
mptcp net/sock: Introduce trace_sk_data_ready() 2023-01-23 11:26:50 +00:00
ncsi net/ncsi: Silence runtime memcpy() false positive warning 2022-12-06 17:29:14 -08:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-01-20 12:28:23 -08:00
netlabel genetlink: start to validate reserved header bytes 2022-08-29 12:47:15 +01:00
netlink Networking changes for 6.2. 2022-12-13 15:47:48 -08:00
netrom
nfc net: nfc: Fix use-after-free in local_cleanup() 2023-01-13 20:53:44 -08:00
nsh
openvswitch net: openvswitch: release vport resources on failure 2022-12-21 17:48:12 -08:00
packet Networking changes for 6.2. 2022-12-13 15:47:48 -08:00
phonet net/sock: Introduce trace_sk_data_ready() 2023-01-23 11:26:50 +00:00
psample genetlink: start to validate reserved header bytes 2022-08-29 12:47:15 +01:00
qrtr net/sock: Introduce trace_sk_data_ready() 2023-01-23 11:26:50 +00:00
rds net/sock: Introduce trace_sk_data_ready() 2023-01-23 11:26:50 +00:00
rfkill driver core: make struct class.dev_uevent() take a const * 2022-11-24 17:12:15 +01:00
rose rose: Fix NULL pointer dereference in rose_send_frame() 2022-11-02 11:57:30 +00:00
rxrpc rxrpc: Fix wrong error return in rxrpc_connect_call() 2023-01-12 21:51:55 -08:00
sched Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-01-20 12:28:23 -08:00
sctp net/sock: Introduce trace_sk_data_ready() 2023-01-23 11:26:50 +00:00
smc net/sock: Introduce trace_sk_data_ready() 2023-01-23 11:26:50 +00:00
strparser
sunrpc net/sock: Introduce trace_sk_data_ready() 2023-01-23 11:26:50 +00:00
switchdev
tipc net/sock: Introduce trace_sk_data_ready() 2023-01-23 11:26:50 +00:00
tls net/sock: Introduce trace_sk_data_ready() 2023-01-23 11:26:50 +00:00
unix unix: Improve locking scheme in unix_show_fdinfo() 2023-01-16 11:21:11 +00:00
vmw_vsock virtio/vsock: replace virtio_vsock_pkt with sk_buff 2023-01-16 13:26:33 +00:00
wireless Driver Core changes for 6.2-rc1 2022-12-16 03:54:54 -08:00
x25 net/x25: Fix skb leak in x25_lapb_receive_frame() 2022-11-15 20:22:19 -08:00
xdp bpf: Expand map key argument of bpf_redirect_map to u64 2022-11-15 09:00:27 -08:00
xfrm net/sock: Introduce trace_sk_data_ready() 2023-01-23 11:26:50 +00:00
compat.c use less confusing names for iov_iter direction initializers 2022-11-25 13:01:55 -05:00
devres.c
Kconfig Remove DECnet support from kernel 2022-08-22 14:26:30 +01:00
Kconfig.debug net: make NET_(DEV|NS)_REFCNT_TRACKER depend on NET 2022-09-20 14:23:56 -07:00
Makefile devlink: move code to a dedicated directory 2023-01-05 22:12:00 -08:00
socket.c sock: add tracepoint for send recv length 2023-01-13 10:25:10 +00:00
sysctl_net.c