ip6_mr_forward() uses standard RCU protection already.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rcu_read_lock() protection is good enough.
ip6mr_cache_unresolved() uses a dedicated spinlock (mfc_unres_lock)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rcu_read_lock() protection is more than enough.
vif_dev_read() supports either mrt_lock or rcu_read_lock().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip6mr_cache_report() first argument can be marked const, and we change
the caller convention about which lock needs to be held.
Instead of read_lock(&mrt_lock), we can use rcu_read_lock().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mr_fill_mroute() uses standard rcu_read_unlock(),
no need to grab mrt_lock anymore.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_mr_forward() uses standard RCU protection already.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rcu_read_lock() protection is good enough.
ipmr_cache_unresolved() uses a dedicated spinlock (mfc_unres_lock)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rcu_read_lock() protection is more than enough.
vif_dev_read() supports either mrt_lock or rcu_read_lock().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipmr_cache_report() first argument can be marked const, and we change
the caller convention about which lock needs to be held.
Instead of read_lock(&mrt_lock), we can use rcu_read_lock().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
igmpmsg_netlink_event() first argument can be marked const.
igmpmsg_netlink_event() reads mrt->net and mrt->id,
both being set once in mr_table_alloc().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We will soon use RCU instead of rwlock in ipmr & ip6mr
This preliminary patch adds proper rcu verbs to read/write
(struct vif_device)->dev
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pim6_rcv() is called under rcu_read_lock(), there is
no need to use dev_hold()/dev_put() pair.
IPv4 side was handled in commit 55747a0a73
("ipmr: __pim_rcv() is called under rcu_read_lock")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arun Ramadoss says:
====================
net: dsa: microchip: common spi probe for the ksz series switches - part 2
This patch series aims to refactor the ksz_switch_register routine to have the
common flow for the ksz series switch. And this is the follow up patch series.
First, it tries moves the common implementation in the setup from individual
files to ksz_setup. Then implements the common dsa_switch_ops structure instead
of independent registration. And then moves the ksz_dev_ops to ksz_common.c,
it allows the dynamic detection of which ksz_dev_ops to be used based on
the switch detection function.
Finally, the patch updates the ksz_spi probe function to be same for all the
ksz_switches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
As of now, there are two spi probes, one ksz8795_spi.c and other
ksz9477_spi.c. This patch combines two files into single ksz_spi.c. The
difference between the two are regmap config and struct ksz8. The regmap
config is assigned based on the platform data. And struct ksz8 is left
untouched, as it is used only ksz8795.c. It can be used for all
other switches also in future.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch delete the ksz8_switch_register and ksz9477_switch_register
since both are calling the ksz_switch_register function. Instead the
ksz_switch_register is called from the probe function.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch move the ksz_dev_ops from individual files to ksz_common.c.
And the dev_ops is assigned to ksz_device based on the switch detect.
This reduces the redundant function and allows to reuse the
functionality for LAN937x which has similar register set.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch replaces the two different menuconfig for ksz9477 and ksz8795
to single ksz_common. so that it can be extended for the other switch
like lan937x. And removes the export_symbols for the extern functions in
the ksz_common.h.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As per 'commit 3506b2f42d ("net: dsa: microchip: call
phy_remove_link_mode during probe")' phy_remove_link_mode is added in
the switch_register function after dsa_switch_register. In order to have
the common switch register function, moving this phylink validation to
phylink_get_caps validation hook.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At present, ksz8795.c and ksz9477.c have separate dsa_switch_ops
structure initialization. This patch modifies the files such a way that
ksz switches has common dsa_switch_ops in the ksz_common.c file.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch move the setting the start bit from the individual switch
configuration to ksz_setup
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the enabling the multicast storm protection from
individual setup function to ksz_setup function.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch move the 10% broadcast protection from the individual setup
to ksz_setup. In the ksz9477, broadcast protection is updated in reset
function.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch move the common initialization of switches to ksz_setup and
perform the switch specific initialization using the ksz_dev_ops
function pointer.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to transmit the STP BPDU packet to the CPU port, the STP
address 01-80-c2-00-00-00 has to be added to static alu table for
ksz8795 series switch. For the ksz9477 switch, there is reserved
multicast table which handles forwarding the particular set of
multicast address to cpu port. So enabling the multicast reserved table
and updated the cpu port index. The stp addr is enabled during the setup
phase using the enable_stp_addr pointer in struct ksz_dev_ops.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To have the common set of initialization in ksz_setup, introduced the
new config_cpu_port member to ksz_dev_ops. Since both the ksz8795.c and
ksz9477.c configuring the cpu port in the setup function, introduced the
member.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch renames the shutdown to reset in ksz_dev_ops in order to use
the reset dev_ops in the ksz_setup.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu says:
====================
Bonding: add per-port priority support
This patch set add per-port priority for bonding failover re-selection.
The first patch add a new filed for bond_opt_value so we can set slave
value easier. I will update the bond_option_queue_id_set() setting
in later patch.
The second patch add the per-port priority for bonding. I defined
it as s32 to compatible with team prio option, which also use a s32
value.
v3: store slave_dev in bond_opt_value directly to simplify setting
values for slave.
v2: using the extant bonding options management stuff instead setting
slave prio in bond_slave_changelink() directly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add per port priority support for bonding active slave re-selection during
failover. A higher number means higher priority in selection. The primary
slave still has the highest priority. This option also follows the
primary_reselect rules.
This option could only be configured via netlink.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, bond_opt_value are mostly used for bonding option settings. If
we want to set a value for slave, we need to re-alloc a string to store
both slave name and vlaue, like bond_option_queue_id_set() does, which
is complex and dumb.
As Jon suggested, let's add a union field slave_dev for bond_opt_value,
which will be benefit for future slave option setting. In function
__bond_opt_init(), we will always check the extra field and set it
if it's not NULL.
Suggested-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The GPY115 and GPY2xx PHYs contain an integrated temperature sensor. It
accuracy is +/- 5°C. Add support for it.
Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220622141716.3517645-1-michael@walle.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of open-coding the bad characters replacement in the hwmon name,
use the new devm_hwmon_sanitize_name().
Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of open-coding the bad characters replacement in the hwmon name,
use the new hwmon_sanitize_name().
Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonathan Lemon says:
====================
Broadcom PTP PHY support
This adds PTP support for the Broadcom PHY BCM54210E (and the
specific variant BCM54213PE that the rpi-5.15 branch uses).
This has only been tested on the RPI CM4, which has one port.
There are other Broadcom chips which may benefit from using the
same framework here, although with different register sets.
====================
Link: https://lore.kernel.org/r/20220622050454.878052-1-jonathan.lemon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The perout function is used to generate a 1PPS signal, synchronized
to the PHC. This is accomplished by a using the hardware oneshot
functionality, which is reset by a timer.
The external timestamp function is set up for a 1PPS input pulse,
and uses a timer to poll for temestamps.
Both functions use the SYNC_OUT/SYNC_IN1 pin, so cannot run
simultaneously.
Co-developed-by: Lasse Johnsen <l@ssejohnsen.me>
Signed-off-by: Lasse Johnsen <l@ssejohnsen.me>
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This adds PTP support for BCM54210E Broadcom PHYs, in particular,
the BCM54213PE, as used in the Rasperry PI CM4. It has only been
tested on that hardware.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add 'struct bcm_ptp_private' to bcm54xx_phy_priv which points to
an optional PTP structure attached to the PHY. This is allocated
on probe if PHY PTP support is configured, and if the driver supports
PTP for the specified PHY.
Add the bcm_ptp_probe(), bcm_ptp_config_init() and bcn_ptp_stop()
API functions to the bcm-phy library.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King says:
====================
net: dsa: mv88e6xxx: get rid of SPEED_MAX
This series does two things:
1. it gets rid of mv88e6065_port_set_speed_duplex() which is completely
unused (do we support this device? I couldn't find it in the tables
in chip.c) This has a max speed of 200Mbps which we don't support.
2. get rid of the SPEED_MAX constant, which is used to configure a DSA
or CPU port to their maximum speed during initialisation. We no
longer need this as we can derive the maximum port speed from the
mac_capabilities instead.
The reason for making this change is in preparation for phylink to be
used by DSA for CPU ports. This omission has come back to bite us with
the conversion of DSA drivers to phylink_pcs, since phylink_pcs won't
get used unless phylink is being used. Particularly with this driver,
it is very common for DT descriptions to omit the fixed-link details
which means "use maximum speed".
It will eventually be necessary to hoist the selection of "max speed"
into the DSA layer (trivial) and also have a way for the DSA driver
to tell the DSA layer which interface it should be using for these
ports.
====================
Link: https://lore.kernel.org/r/YrGQBssOvQBZiDS4@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, all the device specific speed setting functions convert
SPEED_MAX to the actual speed of the port. Rather than having each
of the mv88e6xxx chip specifics handling SPEED_MAX, derive it from
the mac_capabilities instead.
This is only needed for CPU and DSA ports, so move the logic up into
mv88e6xxx_setup_port() - which allows us to kill off all users of
SPEED_MAX throughout the driver.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove mv88e6065_port_set_speed_duplex() - this is never called, and
thus is completely redundant.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current release - regressions:
- netfilter: cttimeout: fix slab-out-of-bounds read in cttimeout_net_exit
Current release - new code bugs:
- bpf: ftrace: keep address offset in ftrace_lookup_symbols
- bpf: force cookies array to follow symbols sorting
Previous releases - regressions:
- ipv4: ping: fix bind address validity check
- tipc: fix use-after-free read in tipc_named_reinit
- eth: veth: add updating of trans_start
Previous releases - always broken:
- sock: redo the psock vs ULP protection check
- netfilter: nf_dup_netdev: fix skb_under_panic
- bpf: fix request_sock leak in sk lookup helpers
- eth: igb: fix a use-after-free issue in igb_clean_tx_ring
- eth: ice: prohibit improper channel config for DCB
- eth: at803x: fix null pointer dereference on AR9331 phy
- eth: virtio_net: fix xdp_rxq_info bug after suspend/resume
Misc:
- eth: hinic: replace memcpy() with direct assignment
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmK0P+0SHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkmBkP/1m5Et04wgtlEfQJtudZj0Sadra0tu6P
vaYlqtiRNMziSY/hxG1p4w7giM4gD7fD3S12Pc/ueCaUwxxILN/eZ/hNgCq9huf6
IbmVmfq6YNZwDaNzFDP8UcIqjnxbg1B3XD41dN7+FggA9ogGFkOvuAcJByzdANVX
BLOkQmGP22+pNJmniH3KYvCZlHIa+LVeRjdjdM+1/LKDs2pxpBi97obyzb5zUiE5
c5E7+BhkGI9X6V1TuHVCHIEFssYNWLiTJcw76HptWmK9Z/DlDEeVlHzKbAMNTycl
I8eTLXnqgye0KCKOqJ4fN+YN42ypdDzrUILKMHGEddG1lOot/2XChgp8+EqMY7Nx
Gjpjh28jTsKdCZMFF3lxDGxeonHciP6lZA80g3GNk4FWUVrqnKEYpdy+6psTkpDr
HahjmFWylGXfmPIKJrsiVGIyxD4ObkRF6SSH7L8j5tAVGxaB5MDFrCws136kACCk
ZyZiXTS0J3Cn1fAb2/vGKgDFhbEWykITYPaiVo7pyrO1jju5qQTtiKiABpcX0Ejs
WxvPA8HB61+kEapIzBLhhxRl25CXTleGE986au2MVh0I/HuQBxVExrRE9FgThjwk
YbSKhR2JOcD5B94HRQXVsQ05q02JzxmB0kVbqSLcIAbCOuo++LZCIdwR5XxSpF6s
AAFhqQycWowh
=JFWo
-----END PGP SIGNATURE-----
Merge tag 'net-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf and netfilter.
Current release - regressions:
- netfilter: cttimeout: fix slab-out-of-bounds read in
cttimeout_net_exit
Current release - new code bugs:
- bpf: ftrace: keep address offset in ftrace_lookup_symbols
- bpf: force cookies array to follow symbols sorting
Previous releases - regressions:
- ipv4: ping: fix bind address validity check
- tipc: fix use-after-free read in tipc_named_reinit
- eth: veth: add updating of trans_start
Previous releases - always broken:
- sock: redo the psock vs ULP protection check
- netfilter: nf_dup_netdev: fix skb_under_panic
- bpf: fix request_sock leak in sk lookup helpers
- eth: igb: fix a use-after-free issue in igb_clean_tx_ring
- eth: ice: prohibit improper channel config for DCB
- eth: at803x: fix null pointer dereference on AR9331 phy
- eth: virtio_net: fix xdp_rxq_info bug after suspend/resume
Misc:
- eth: hinic: replace memcpy() with direct assignment"
* tag 'net-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
net: openvswitch: fix parsing of nw_proto for IPv6 fragments
sock: redo the psock vs ULP protection check
Revert "net/tls: fix tls_sk_proto_close executed repeatedly"
virtio_net: fix xdp_rxq_info bug after suspend/resume
igb: Make DMA faster when CPU is active on the PCIe link
net: dsa: qca8k: reduce mgmt ethernet timeout
net: dsa: qca8k: reset cpu port on MTU change
MAINTAINERS: Add a maintainer for OCP Time Card
hinic: Replace memcpy() with direct assignment
Revert "drivers/net/ethernet/neterion/vxge: Fix a use-after-free bug in vxge-main.c"
net: phy: smsc: Disable Energy Detect Power-Down in interrupt mode
ice: ethtool: Prohibit improper channel config for DCB
ice: ethtool: advertise 1000M speeds properly
ice: Fix switchdev rules book keeping
ice: ignore protocol field in GTP offload
netfilter: nf_dup_netdev: add and use recursion counter
netfilter: nf_dup_netdev: do not push mac header a second time
selftests: netfilter: correct PKTGEN_SCRIPT_PATHS in nft_concat_range.sh
net/tls: fix tls_sk_proto_close executed repeatedly
erspan: do not assume transport header is always set
...
All small changes, mostly device-specific:
- A regression fix for PCM WC-page allocation on x86
- A regression fix for i915 audio component binding
- Fixes for (longstanding) beep handling bug
- Runtime PM fixes for Intel LPE HDMI audio
- A couple of pending FireWire fixes
- Usual HD-audio and USB-audio quirks, new Intel dspconf entries
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmKzM0YOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE/eUg//bQ2kaPY4p75mBc7cxhwiJnxU8cnLxSJ4O7/A
Roz+9Sp741TIkY+jKXxvdLSZ8aqOF2H2Tx1DE1NQdOr55yVNIWJIZ6uu2Oue0MKJ
A7xsQalUf7tUehXTEBDjeojoM3WA7AYvpz3PLXWXsc1QCH/sM9baVAXdZ5SryuuT
7tCRsDwTtzHfIsS5N4rWEl9e8xzvTtQ/MluprbUia+WLrLorov4esDJokbCeolqV
sdB24xhBttJDwuQ7FANki+FIKldDdHQKuCFk4FocdnmiUWYTWVSkeAJbHaWjm9HJ
FZgXkD4UiwBmeiKIE0uCHaQ5QwOtJzXSI+eQU37VnAkfmQ3nyhxYrxXX4E3FxYss
NHoMXfCmequXTq/aF+QemortgxI+c78CpN7tC/45pJCspDko1o/44D/BwjldMRRp
f+4EiGM8xTYge53NmlTJvSAo55WdyDCWYX7JKxNbhH9KC1lUlNNnGemD5AhjeGRs
3nC95Iva7IWrDHVaAYmhMqmlCG9CZTJR/C4xWLNdBfdvyQKx5pRrqh5wlt24A0+0
crrylt56c3eDzOvCKwLj8cUFJyf9Qp2ZcxJowdxS1S+H0qZcWkg3i9gqDh7F7gRB
s7gLwjZym6rYylae7lxOZp8y81JNrHJRD2R8dAWjrIadxRFXYm8rFt61v6mjFt/V
jVkyGHQ=
=TcQa
-----END PGP SIGNATURE-----
Merge tag 'sound-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"All small changes, mostly device-specific:
- A regression fix for PCM WC-page allocation on x86
- A regression fix for i915 audio component binding
- Fixes for (longstanding) beep handling bug
- Runtime PM fixes for Intel LPE HDMI audio
- A couple of pending FireWire fixes
- Usual HD-audio and USB-audio quirks, new Intel dspconf entries"
* tag 'sound-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Add quirk for Clevo NS50PU
ALSA: hda: Fix discovery of i915 graphics PCI device
ALSA: hda/via: Fix missing beep setup
ALSA: hda/conexant: Fix missing beep setup
ALSA: memalloc: Drop x86-specific hack for WC allocations
ALSA: hda/realtek: Add quirk for Clevo PD70PNT
ALSA: x86: intel_hdmi_audio: use pm_runtime_resume_and_get()
ALSA: x86: intel_hdmi_audio: enable pm_runtime and set autosuspend delay
ALSA: hda: intel-nhlt: remove use of __func__ in dev_dbg
ALSA: hda: intel-dspcfg: use SOF for UpExtreme and UpExtreme11 boards
firewire: convert sysfs sprintf/snprintf family to sysfs_emit
firewire: cdev: fix potential leak of kernel stack due to uninitialized value
ALSA: hda/realtek: Apply fixup for Lenovo Yoga Duet 7 properly
ALSA: hda/realtek - ALC897 headset MIC no sound
ALSA: usb-audio: US16x08: Move overflow check before array access
ALSA: hda/realtek: Add mute LED quirk for HP Omen laptop
Add support for ethtool -p|--identify
by enabling blinking of the panel LED if supported by the NIC firmware.
Signed-off-by: Sixiang Chen <sixiang.chen@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20220622083938.291548-1-simon.horman@corigine.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When a packet enters the OVS datapath and does not match any existing
flows installed in the kernel flow cache, the packet will be sent to
userspace to be parsed, and a new flow will be created. The kernel and
OVS rely on each other to parse packet fields in the same way so that
packets will be handled properly.
As per the design document linked below, OVS expects all later IPv6
fragments to have nw_proto=44 in the flow key, so they can be correctly
matched on OpenFlow rules. OpenFlow controllers create pipelines based
on this design.
This behavior was changed by the commit in the Fixes tag so that
nw_proto equals the next_header field of the last extension header.
However, there is no counterpart for this change in OVS userspace,
meaning that this field is parsed differently between OVS and the
kernel. This is a problem because OVS creates actions based on what is
parsed in userspace, but the kernel-provided flow key is used as a match
criteria, as described in Documentation/networking/openvswitch.rst. This
leads to issues such as packets incorrectly matching on a flow and thus
the wrong list of actions being applied to the packet. Such changes in
packet parsing cannot be implemented without breaking the userspace.
The offending commit is partially reverted to restore the expected
behavior.
The change technically made sense and there is a good reason that it was
implemented, but it does not comply with the original design of OVS.
If in the future someone wants to implement such a change, then it must
be user-configurable and disabled by default to preserve backwards
compatibility with existing OVS versions.
Cc: stable@vger.kernel.org
Fixes: fa642f0883 ("openvswitch: Derive IP protocol number for IPv6 later frags")
Link: https://docs.openvswitch.org/en/latest/topics/design/#fragments
Signed-off-by: Rosemarie O'Riorden <roriorden@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/20220621204845.9721-1-roriorden@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>