linux/drivers/net/dsa/sja1105
Vladimir Oltean 1f9fc48fd3 net: dsa: sja1105: fix reception from VLAN-unaware bridges
The blamed commit introduced an unexpected regression in the sja1105
driver. Packets from VLAN-unaware bridge ports get received correctly,
but the protocol stack can't seem to decode them properly.

For ds->untag_bridge_pvid users (thus also sja1105), the blamed commit
did introduce a functional change: dsa_switch_rcv() used to call
dsa_untag_bridge_pvid(), which looked like this:

	err = br_vlan_get_proto(br, &proto);
	if (err)
		return skb;

	/* Move VLAN tag from data to hwaccel */
	if (!skb_vlan_tag_present(skb) && skb->protocol == htons(proto)) {
		skb = skb_vlan_untag(skb);
		if (!skb)
			return NULL;
	}

and now it calls dsa_software_vlan_untag() which has just this:

	/* Move VLAN tag from data to hwaccel */
	if (!skb_vlan_tag_present(skb)) {
		skb = skb_vlan_untag(skb);
		if (!skb)
			return NULL;
	}

thus lacks any skb->protocol == bridge VLAN protocol check. That check
is deferred until a later check for skb->vlan_proto (in the hwaccel area).

The new code is problematic because, for VLAN-untagged packets,
skb_vlan_untag() blindly takes the 4 bytes starting with the EtherType
and turns them into a hwaccel VLAN tag. This is what breaks the protocol
stack.

It would be tempting to "make it work as before" and only call
skb_vlan_untag() for those packets with the skb->protocol actually
representing a VLAN.

But the premise of the newly introduced dsa_software_vlan_untag() core
function is not wrong. Drivers set ds->untag_bridge_pvid or
ds->untag_vlan_aware_bridge_pvid presumably because they send all
traffic to the CPU reception path as VLAN-tagged. So why should we spend
any additional CPU cycles assuming that the packet may be VLAN-untagged?
And why does the sja1105 driver opt into ds->untag_bridge_pvid if it
doesn't always deliver packets to the CPU as VLAN-tagged?

The answer to the latter question is indeed more interesting: it doesn't
need to. This got done in commit 884be12f85 ("net: dsa: sja1105: add
support for imprecise RX"), because I thought it would be needed, but I
didn't realize that it doesn't actually make a difference.

As explained in the commit message of the blamed patch, ds->untag_bridge_pvid
only makes a difference in the VLAN-untagged receive path of a bridge port.
However, in that operating mode, tag_sja1105.c makes use of VLAN tags
with the ETH_P_SJA1105 TPID, and it decodes and consumes these VLAN tags
as if they were DSA tags (aka tag_8021q operation). Even if commit
884be12f85 ("net: dsa: sja1105: add support for imprecise RX") added
this logic in sja1105_bridge_vlan_add():

	/* Always install bridge VLANs as egress-tagged on the CPU port. */
	if (dsa_is_cpu_port(ds, port))
		flags = 0;

that was for _bridge_ VLANs, which are _not_ committed to hardware
in VLAN-unaware mode (aka the mode where ds->untag_bridge_pvid does
anything at all). Even prior to that change, the tag_8021q VLANs
were always installed as egress-tagged on the CPU port, see
dsa_switch_tag_8021q_vlan_add():

	u16 flags = 0; // egress-tagged, non-PVID

	if (dsa_port_is_user(dp))
		flags |= BRIDGE_VLAN_INFO_UNTAGGED |
			 BRIDGE_VLAN_INFO_PVID;

	err = dsa_port_do_tag_8021q_vlan_add(dp, info->vid,
					     flags);
	if (err)
		return err;

Whether the sja1105 driver needs the new flag, ds->untag_vlan_aware_bridge_pvid,
rather than ds->untag_bridge_pvid, is a separate discussion. To fix the
current bug in VLAN-unaware bridge mode, I would argue that the sja1105
driver should not request something it doesn't need, rather than
complicating the core DSA helper. Whereas before the blamed commit, this
setting was harmless, now it has caused breakage.

Fixes: 93e4649efa ("net: dsa: provide a software untagging function on RX for VLAN-aware bridges")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241001140206.50933-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04 11:19:50 -07:00
..
Kconfig ethernet: fix PTP_1588_CLOCK dependencies 2021-08-13 17:49:05 -07:00
Makefile net: dsa: sja1105: register the MDIO buses for 100base-T1 and 100base-TX 2021-06-08 14:37:16 -07:00
sja1105_clocking.c net: dsa: sja1105: make read-only const arrays static 2023-09-21 16:12:12 +02:00
sja1105_devlink.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-12-08 18:19:59 -08:00
sja1105_dynamic_config.c net: dsa: sja1105: fix multicast forwarding working only for last added mdb entry 2023-09-11 08:32:30 +01:00
sja1105_dynamic_config.h net: dsa: sja1105: add support for the SJA1110 switch family 2021-06-08 14:37:16 -07:00
sja1105_ethtool.c net: dsa: sja1105: don't use burst SPI reads for port statistics 2021-05-21 14:01:41 -07:00
sja1105_flower.c net: dsa: sja1105: flower: validate control flags 2024-04-18 17:08:37 -07:00
sja1105_main.c net: dsa: sja1105: fix reception from VLAN-unaware bridges 2024-10-04 11:19:50 -07:00
sja1105_mdio.c net: dsa: sja1105: Fix parameters order in sja1110_pcs_mdio_write_c45() 2024-04-04 12:51:45 +02:00
sja1105_ptp.c net: Add struct kernel_ethtool_ts_info 2024-07-15 08:02:26 -07:00
sja1105_ptp.h net: Add struct kernel_ethtool_ts_info 2024-07-15 08:02:26 -07:00
sja1105_spi.c net: dsa: sja1105: complete tc-cbs offload support on SJA1110 2023-09-06 06:23:05 +01:00
sja1105_static_config.c net: update NXP copyright text 2021-09-17 13:52:17 +01:00
sja1105_static_config.h net: update NXP copyright text 2021-09-17 13:52:17 +01:00
sja1105_tas.c net/sched: taprio: replace tc_taprio_qopt_offload :: enable with a "cmd" enum 2023-05-31 10:00:30 +01:00
sja1105_tas.h net: dsa: sja1105: dimension the data structures for a larger port count 2021-05-24 13:59:03 -07:00
sja1105_vl.c net: dsa: tag_8021q: rename dsa_8021q_bridge_tx_fwd_offload_vid 2022-02-27 11:06:14 +00:00
sja1105_vl.h net: update NXP copyright text 2021-09-17 13:52:17 +01:00
sja1105.h net: dsa: sja1105: serialize sja1105_port_mcast_flood() with other FDB accesses 2023-09-11 08:32:30 +01:00