linux/net
Vladimir Oltean 67c3ca2c5c net: mscc: ocelot: use ocelot_xmit_get_vlan_info() also for FDMA and register injection
Problem description
-------------------

On an NXP LS1028A (felix DSA driver) with the following configuration:

- ocelot-8021q tagging protocol
- VLAN-aware bridge (with STP) spanning at least swp0 and swp1
- 8021q VLAN upper interfaces on swp0 and swp1: swp0.700, swp1.700
- ptp4l on swp0.700 and swp1.700

we see that the ptp4l instances do not see each other's traffic,
and they all go to the grand master state due to the
ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES condition.

Jumping to the conclusion for the impatient
-------------------------------------------

There is a zero-day bug in the ocelot switchdev driver in the way it
handles VLAN-tagged packet injection. The correct logic already exists in
the source code, in function ocelot_xmit_get_vlan_info() added by commit
5ca721c54d ("net: dsa: tag_ocelot: set the classified VLAN during xmit").
But it is used only for normal NPI-based injection with the DSA "ocelot"
tagging protocol. The other injection code paths (register-based and
FDMA-based) roll their own wrong logic. This affects and was noticed on
the DSA "ocelot-8021q" protocol because it uses register-based injection.

By moving ocelot_xmit_get_vlan_info() to a place that's common for both
the DSA tagger and the ocelot switch library, it can also be called from
ocelot_port_inject_frame() in ocelot.c.

We need to touch the lines with ocelot_ifh_port_set()'s prototype
anyway, so let's rename it to something clearer regarding what it does,
and add a kernel-doc. ocelot_ifh_set_basic() should do.

Investigation notes
-------------------

Debugging reveals that PTP event (aka those carrying timestamps, like
Sync) frames injected into swp0.700 (but also swp1.700) hit the wire
with two VLAN tags:

00000000: 01 1b 19 00 00 00 00 01 02 03 04 05 81 00 02 bc
                                              ~~~~~~~~~~~
00000010: 81 00 02 bc 88 f7 00 12 00 2c 00 00 02 00 00 00
          ~~~~~~~~~~~
00000020: 00 00 00 00 00 00 00 00 00 00 00 01 02 ff fe 03
00000030: 04 05 00 01 00 04 00 00 00 00 00 00 00 00 00 00
00000040: 00 00

The second (unexpected) VLAN tag makes felix_check_xtr_pkt() ->
ptp_classify_raw() fail to see these as PTP packets at the link
partner's receiving end, and return PTP_CLASS_NONE (because the BPF
classifier is not written to expect 2 VLAN tags).

The reason why packets have 2 VLAN tags is because the transmission
code treats VLAN incorrectly.

Neither ocelot switchdev, nor felix DSA, declare the NETIF_F_HW_VLAN_CTAG_TX
feature. Therefore, at xmit time, all VLANs should be in the skb head,
and none should be in the hwaccel area. This is done by:

static struct sk_buff *validate_xmit_vlan(struct sk_buff *skb,
					  netdev_features_t features)
{
	if (skb_vlan_tag_present(skb) &&
	    !vlan_hw_offload_capable(features, skb->vlan_proto))
		skb = __vlan_hwaccel_push_inside(skb);
	return skb;
}

But ocelot_port_inject_frame() handles things incorrectly:

	ocelot_ifh_port_set(ifh, port, rew_op, skb_vlan_tag_get(skb));

void ocelot_ifh_port_set(struct sk_buff *skb, void *ifh, int port, u32 rew_op)
{
	(...)
	if (vlan_tag)
		ocelot_ifh_set_vlan_tci(ifh, vlan_tag);
	(...)
}

The way __vlan_hwaccel_push_inside() pushes the tag inside the skb head
is by calling:

static inline void __vlan_hwaccel_clear_tag(struct sk_buff *skb)
{
	skb->vlan_present = 0;
}

which does _not_ zero out skb->vlan_tci as seen by skb_vlan_tag_get().
This means that ocelot, when it calls skb_vlan_tag_get(), sees
(and uses) a residual skb->vlan_tci, while the same VLAN tag is
_already_ in the skb head.

The trivial fix for double VLAN headers is to replace the content of
ocelot_ifh_port_set() with:

	if (skb_vlan_tag_present(skb))
		ocelot_ifh_set_vlan_tci(ifh, skb_vlan_tag_get(skb));

but this would not be correct either, because, as mentioned,
vlan_hw_offload_capable() is false for us, so we'd be inserting dead
code and we'd always transmit packets with VID=0 in the injection frame
header.

I can't actually test the ocelot switchdev driver and rely exclusively
on code inspection, but I don't think traffic from 8021q uppers has ever
been injected properly, and not double-tagged. Thus I'm blaming the
introduction of VLAN fields in the injection header - early driver code.

As hinted at in the early conclusion, what we _want_ to happen for
VLAN transmission was already described once in commit 5ca721c54d
("net: dsa: tag_ocelot: set the classified VLAN during xmit").

ocelot_xmit_get_vlan_info() intends to ensure that if the port through
which we're transmitting is under a VLAN-aware bridge, the outer VLAN
tag from the skb head is stripped from there and inserted into the
injection frame header (so that the packet is processed in hardware
through that actual VLAN). And in all other cases, the packet is sent
with VID=0 in the injection frame header, since the port is VLAN-unaware
and has logic to strip this VID on egress (making it invisible to the
wire).

Fixes: 08d02364b1 ("net: mscc: fix the injection header")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-16 09:59:32 +01:00
..
6lowpan
9p Two fixes headed to stable trees: 2024-05-29 09:25:15 -07:00
802
8021q net: Add struct kernel_ethtool_ts_info 2024-07-15 08:02:26 -07:00
appletalk Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-05-09 10:01:01 -07:00
atm atm: clean up a put_user() calls 2024-06-14 19:08:50 -07:00
ax25 ax25: Replace kfree() in ax25_dev_free() with ax25_dev_put() 2024-06-01 15:49:42 -07:00
batman-adv Revert "batman-adv: prefer kfree_rcu() over call_rcu() with free-only callbacks" 2024-06-12 20:18:00 +02:00
bluetooth Bluetooth: hci_sync: avoid dup filtering when passive scanning with adv monitor 2024-08-07 16:36:01 -04:00
bpf bpf-next-for-netdev 2024-07-09 17:01:46 +02:00
bridge netfilter: nf_queue: drop packets with cloned unconfirmed conntracks 2024-08-14 23:37:23 +02:00
caif net: caif: remove unused structs 2024-06-05 10:18:06 +01:00
can Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-06-27 12:14:11 -07:00
ceph libceph: fix crush_choose_firstn() kernel-doc warnings 2024-07-11 16:33:07 +02:00
core net: Make USO depend on CSUM offload 2024-08-09 21:58:08 -07:00
dcb
dccp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-06-27 12:14:11 -07:00
devlink devlink: Constify the 'table_ops' parameter of devl_dpipe_table_register() 2024-06-05 10:24:57 +01:00
dns_resolver
dsa net: mscc: ocelot: use ocelot_xmit_get_vlan_info() also for FDMA and register injection 2024-08-16 09:59:32 +01:00
ethernet netkit: Fix pkt_type override upon netkit pass verdict 2024-05-25 10:48:57 -07:00
ethtool net: ethtool: Allow write mechanism of LPL and both LPL and EPL 2024-08-15 12:20:14 +02:00
handshake net/handshake: remove redundant assignment to variable ret 2024-04-16 17:14:55 -07:00
hsr net: hsr: cosmetic: Remove extra white space 2024-06-19 17:32:57 -07:00
ieee802154 bpf-next-for-netdev 2024-05-28 07:27:29 -07:00
ife
ipv4 tcp: Update window clamping condition 2024-08-14 10:50:49 +01:00
ipv6 netfilter: allow ipv6 fragments to arrive on different devices 2024-08-14 21:16:12 +02:00
iucv net/iucv: fix use after free in iucv_sock_close() 2024-07-30 15:01:50 +02:00
kcm
key
l2tp l2tp: fix lockdep splat 2024-08-08 08:28:24 -07:00
l3mdev
lapb
llc llc: Constify struct llc_sap_state_trans 2024-07-15 08:51:19 -07:00
mac80211 wifi: mac80211: use monitor sdata with driver only if desired 2024-07-26 12:30:49 +02:00
mac802154 net: mac802154: Fix racy device stats updates by DEV_STATS_INC() and DEV_STATS_ADD() 2024-06-03 11:20:56 +02:00
mctp
mpls sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
mptcp mptcp: correct MPTCP_SUBFLOW_ATTR_SSN_OFFSET reserved size 2024-08-13 19:13:25 -07:00
ncsi net/ncsi: Fix the multi thread manner of NCSI driver 2024-06-01 16:21:44 -07:00
netfilter netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests 2024-08-14 23:44:55 +02:00
netlabel netlabel: fix RCU annotation for IPv4 options on socket creation 2024-05-13 14:58:12 -07:00
netlink net: netlink: remove the cb_mutex "injection" from netlink core 2024-06-10 13:15:40 +01:00
netrom netrom: Fix a memory leak in nr_heartbeat_expiry() 2024-06-17 13:06:23 +01:00
nfc Quite smaller than usual. Notably it includes the fix for the unix 2024-05-23 12:49:37 -07:00
nsh nsh: Restore skb->{protocol,data,mac_header} for outer header in nsh_gso_segment(). 2024-04-26 12:20:01 +02:00
openvswitch net: openvswitch: store sampling probability in cb. 2024-07-05 17:45:47 -07:00
packet Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-07-15 13:19:17 -07:00
phonet sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
psample net: psample: fix flag being set in wrong skb 2024-07-11 18:11:31 -07:00
qrtr net: qrtr: ns: Ignore ENODEV failures in ns 2024-06-14 13:17:21 +02:00
rds sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
rfkill net: rfkill: Correct return value in invalid parameter case 2024-06-26 10:49:01 +02:00
rose net: change proto and proto_ops accept type 2024-05-13 18:19:09 -06:00
rxrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-05-09 10:01:01 -07:00
sched sched: act_ct: take care of padding in struct zones_ht_key 2024-07-26 11:22:57 +01:00
sctp sctp: Fix null-ptr-deref in reuseport_add_sock(). 2024-08-02 16:25:06 -07:00
smc net/smc: add the max value of fallback reason count 2024-08-07 19:36:23 -07:00
strparser
sunrpc nfsd-6.11 fixes: 2024-08-10 10:44:21 -07:00
switchdev net: bridge: switchdev: Improve error message for port_obj_add/del functions 2024-05-08 12:19:12 +01:00
tipc A lot of networking people were at a conference last week, busy 2024-07-25 13:32:25 -07:00
tls net: tls: Pass union tls_crypto_context pointer to memzero_explicit 2024-07-09 11:14:47 -07:00
unix af_unix: Disable MSG_OOB handling for sockets in sockmap/sockhash 2024-07-17 22:49:00 +02:00
vmw_vsock vsock: fix recursive ->recvmsg calls 2024-08-15 12:07:04 +02:00
wireless wifi: cfg80211: correct S1G beacon length calculation 2024-07-26 12:32:47 +02:00
x25 net: change proto and proto_ops accept type 2024-05-13 18:19:09 -06:00
xdp xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len 2024-07-25 11:57:27 +02:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-07-15 13:19:17 -07:00
compat.c
devres.c
Kconfig ethtool: provide customized dim profile management 2024-06-25 17:15:06 -07:00
Kconfig.debug
Makefile
socket.c net: Split a __sys_listen helper for io_uring 2024-06-19 07:57:21 -06:00
sysctl_net.c sysctl: Remove check for sentinel element in ctl_table arrays 2024-06-13 10:50:52 +02:00