Wangxun SP chip supports to connect with external PHY (marvell 88x3310),
which links to 10GBASE-T/1000BASE-T/100BASE-T. Add the identification of
media types from subsystem device IDs. For sp_media_copper, register mdio
bus for the external PHY.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Disable data path before PCS VR reset while switching PCS mode, to prevent
the blocking of data path. Enable AN interrupt for CL37 auto-negotiation.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since XPCS device identifier is implemented in the firmware version
0x20010 and above, so add a warning to prompt the users to upgrade the
firmware to make sure the driver works.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wangxun NICs support the connection with SFP to RJ45 module. In this case,
PCS need to be configured in SGMII mode.
According to chapter 6.11.1 "SGMII Auto-Negitiation" of DesignWare Cores
Ethernet PCS (version 3.20a) and custom design manual, do the following
configuration when the interface mode is SGMII.
1. program VR_MII_AN_CTRL bit(3) [TX_CONFIG] = 1b (PHY side SGMII)
2. program VR_MII_AN_CTRL bit(8) [MII_CTRL] = 1b (8-bit MII)
3. program VR_MII_DIG_CTRL1 bit(0) [PHY_MODE_CTRL] = 1b
Also CL37 AN in backplane configurations need to be enabled because of the
special hardware design. Another thing to note is that PMA needs to be
reconfigured before each CL37 AN configuration for SGMII, otherwise AN will
fail, although we don't know why.
On this device, CL37_ANSGM_STS (bit[4:1] of VR_MII_AN_INTR_STS) indicates
the status received from remote link during the auto-negotiation, and
self-clear after the auto-negotiation is complete.
Meanwhile, CL37_ANCMPLT_INTR will be set to 1, to indicate CL37 AN is
complete. So add another way to get the state for CL37 SGMII.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable CL37 AN complete interrupt for DW XPCS. It requires to clear the
bit(0) [CL37_ANCMPLT_INTR] of VR_MII_AN_INTR_STS after AN completed.
And there is a quirk for Wangxun devices to enable CL37 AN in backplane
configurations because of the special hardware design.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to chapter 6 of DesignWare Cores Ethernet PCS (version 3.20a)
and custom design manual, add a configuration flow for switching interface
mode.
If the interface changes, the following setting is required:
1. wait VR_XS_PCS_DIG_STS bit(4, 2) [PSEQ_STATE] = 100b (Power-Good)
2. write SR_XS_PCS_CTRL2 to select various PCS type
3. write SR_PMA_CTRL1 and/or SR_XS_PCS_CTRL1 for link speed
4. program PMA registers
5. write VR_XS_PCS_DIG_CTRL1 bit(15) [VR_RST] = 1b (Vendor-Specific
Soft Reset)
6. wait for VR_XS_PCS_DIG_CTRL1 bit(15) [VR_RST] to get cleared
Only 10GBASE-R/SGMII/1000BASE-X modes are planned for the current Wangxun
devices. And there is a quirk for Wangxun devices to switch mode although
the interface in phylink state has not changed, since PCS will change to
default 10GBASE-R when the ethernet driver(txgbe) do LAN reset.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since Wangxun 10Gb NICs require some special configuration on the IP of
Synopsys Designware XPCS, introduce dev_flag for different vendors. Read
OUI from device identifier registers, to detect Wangxun devices.
And xpcs_soft_reset() is skipped to avoid the reset of device identifier
registers.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
tools: ynl: handful of forward looking updates
Small YNL improvements, mostly for work-in-progress families.
====================
Link: https://lore.kernel.org/r/20230824003056.1436637-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Differentiate between empty list and None for member lists.
New families may want to create request responses with no attribute.
If we treat those the same as None we end up rendering
a full parsing policy in user space, instead of an empty one.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20230824003056.1436637-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We look for attributes inside do.request, but there's another
layer of nesting in the spec, look inside do.request.attributes.
This bug had no effect as all global policies we generate (fou)
seem to be full, anyway, and we treat full and empty the same.
Next patch will change the treatment of empty policies.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20230824003056.1436637-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Recent changes made us assume that input for binary data is in hex.
When using YNL as a Python library it's possible to pass in raw bytes.
Bring the ability to do that back.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20230824003056.1436637-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Recent merge had a conflict in:
drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c
between commit:
aeb660171b ("net/mlx5e: fix double free in macsec_fs_tx_create_crypto_table_groups")
from Linus' tree and commit:
cb5ebe4896f9 ("net/mlx5e: Move MACsec flow steering operations to be used as core library")
from the mlx5-next tree. This was missed and the former commit
got lost, bring it back.
Fixes: 3c5066c6b0 ("Merge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/r/20230815123725.4ef5b7b9@canb.auug.org.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Documentation/process/license-rules.rst and checkpatch expect the SPDX
identifier syntax for multiple licenses to use capital "OR". Correct it
to keep consistent format and avoid copy-paste issues.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: FLorian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20230823085632.116725-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Leon Romanovsky says:
====================
mlx5 MACsec RoCEv2 support
From Patrisious:
This series extends previously added MACsec offload support
to cover RoCE traffic either.
In order to achieve that, we need configure MACsec with offload between
the two endpoints, like below:
REMOTE_MAC=10:70:fd:43:71:c0
* ip addr add 1.1.1.1/16 dev eth2
* ip link set dev eth2 up
* ip link add link eth2 macsec0 type macsec encrypt on
* ip macsec offload macsec0 mac
* ip macsec add macsec0 tx sa 0 pn 1 on key 00 dffafc8d7b9a43d5b9a3dfbbf6a30c16
* ip macsec add macsec0 rx port 1 address $REMOTE_MAC
* ip macsec add macsec0 rx port 1 address $REMOTE_MAC sa 0 pn 1 on key 01 ead3664f508eb06c40ac7104cdae4ce5
* ip addr add 10.1.0.1/16 dev macsec0
* ip link set dev macsec0 up
And in a similar manner on the other machine, while noting the keys order
would be reversed and the MAC address of the other machine.
RDMA traffic is separated through relevant GID entries and in case
of IP ambiguity issue - meaning we have a physical GIDs and a MACsec
GIDs with the same IP/GID, we disable our physical GID in order
to force the user to only use the MACsec GID.
v0: https://lore.kernel.org/netdev/20230813064703.574082-1-leon@kernel.org/
* 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
RDMA/mlx5: Handles RoCE MACsec steering rules addition and deletion
net/mlx5: Add RoCE MACsec steering infrastructure in core
net/mlx5: Configure MACsec steering for ingress RoCEv2 traffic
net/mlx5: Configure MACsec steering for egress RoCEv2 traffic
IB/core: Reorder GID delete code for RoCE
net/mlx5: Add MACsec priorities in RDMA namespaces
RDMA/mlx5: Implement MACsec gid addition and deletion
net/mlx5: Maintain fs_id xarray per MACsec device inside macsec steering
net/mlx5: Remove netdevice from MACsec steering
net/mlx5e: Move MACsec flow steering and statistics database from ethernet to core
net/mlx5e: Rename MACsec flow steering functions/parameters to suit core naming style
net/mlx5: Remove dependency of macsec flow steering on ethernet
net/mlx5e: Move MACsec flow steering operations to be used as core library
macsec: add functions to get macsec real netdevice and check offload
====================
Link: https://lore.kernel.org/r/20230821073833.59042-1-leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It's somewhat unfortunate but with (my?) the current tooling
if people post new versions of a set in reply to an old version
managing the review queue gets difficult. So recommend against it.
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20230823154922.1162644-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove the necessity to modify skb_ext_total_length() when new extension
types are added.
Also reduces the line count a bit.
With optimizations enabled the function is folded down to the same
constant value as before during compilation.
This has been validated on x86 with GCC 6.5.0 and 13.2.1.
Also a similar construct has been validated on godbolt.org with GCC 5.1.
In any case the compiler has to be able to evaluate the construct at
compile-time for the BUILD_BUG_ON() in skb_extensions_init().
Even if not evaluated at compile-time this function would only ever
be executed once at run-time, so the overhead would be very minuscule.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230823-skb_ext-simplify-v2-1-66e26cd66860@weissschuh.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
and netfilter
Fixes to fixes:
- nf_tables:
- GC transaction race with abort path
- defer gc run if previous batch is still pending
Previous releases - regressions:
- ipv4: fix data-races around inet->inet_id
- phy: fix deadlocking in phy_error() invocation
- mdio: fix C45 read/write protocol
- ipvlan: fix a reference count leak warning in ipvlan_ns_exit()
- ice: fix NULL pointer deref during VF reset
- i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters()
- tg3: use slab_build_skb() when needed
- mtk_eth_soc: fix NULL pointer on hw reset
Previous releases - always broken:
- core: validate veth and vxcan peer ifindexes
- sched: fix a qdisc modification with ambiguous command request
- devlink: add missing unregister linecard notification
- wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning
- batman:
- do not get eth header before batadv_check_management_packet
- fix batadv_v_ogm_aggr_send memory leak
- bonding: fix macvlan over alb bond support
- mlxsw: set time stamp fields also when its type is MIRROR_UTC
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmTnJIQSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkt7kP/jy6HOMwSOMFbtxQD2m89EImr6ZlLUPg
H09seQzC5nwRbgZrdzukmM27HDKEkYe1sPyxhpS8E4iAslFaefEvnWqOY0oiQSpH
OuF4mP/cS9QKb62NwKVrau3SCARS9arLmOF0mcJNdDOWwucE+SoFaebxSMitAU/w
k8hHVsLwc5dwZAYznOl2/qsmPBnIUsxfymNJE/RuFqj1nHccGybh9mJKpAxc0knj
QEjqno//PgAXPV/X3mH/wG0fcsXs0OlAnBS9yA95GNzuR2yWrh7bD/et99En/elS
8paUio+O3P6Y6WaewgDYFm44pf/x+hFb18Irtab82BkdRw+lgFyF23g8IH7ToJAE
mEaxwdS7AQ4XEunNyJsjwiffWUG1nFaoIhaGb0Lo1qmgLHDo+rrNhkrBWvZxSf0Q
8QlMnCXopJ1c5Qltz5QNVaWPErpCcanxV3cpNlG+lTpfamWBrUpuv/EhHCUF/fr3
hlgJEm+WoFTvexO+QC3CyJDz2JYLLMaaYaoUZ1aJS2dtTTc3tfUjEL8VcopfXI87
2FXJ3qEtCkvfdtfFjhofw97qHDvGrTXa9r2JSh1Pp8v15pKdM2P/lMYxd4B0cSEw
9udW/3bWkvHZayzBWvqDEiz3UTID1+uX0/qpBWY40QzTdIXo6sBrCCk93tjJUdcA
kXjw9HkSqW6H
=WKil
-----END PGP SIGNATURE-----
Merge tag 'net-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from wifi, can and netfilter.
Fixes to fixes:
- nf_tables:
- GC transaction race with abort path
- defer gc run if previous batch is still pending
Previous releases - regressions:
- ipv4: fix data-races around inet->inet_id
- phy: fix deadlocking in phy_error() invocation
- mdio: fix C45 read/write protocol
- ipvlan: fix a reference count leak warning in ipvlan_ns_exit()
- ice: fix NULL pointer deref during VF reset
- i40e: fix potential NULL pointer dereferencing of pf->vf in
i40e_sync_vsi_filters()
- tg3: use slab_build_skb() when needed
- mtk_eth_soc: fix NULL pointer on hw reset
Previous releases - always broken:
- core: validate veth and vxcan peer ifindexes
- sched: fix a qdisc modification with ambiguous command request
- devlink: add missing unregister linecard notification
- wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning
- batman:
- do not get eth header before batadv_check_management_packet
- fix batadv_v_ogm_aggr_send memory leak
- bonding: fix macvlan over alb bond support
- mlxsw: set time stamp fields also when its type is MIRROR_UTC"
* tag 'net-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
selftests: bonding: add macvlan over bond testing
selftest: bond: add new topo bond_topo_2d1c.sh
bonding: fix macvlan over alb bond support
rtnetlink: Reject negative ifindexes in RTM_NEWLINK
netfilter: nf_tables: defer gc run if previous batch is still pending
netfilter: nf_tables: fix out of memory error handling
netfilter: nf_tables: use correct lock to protect gc_list
netfilter: nf_tables: GC transaction race with abort path
netfilter: nf_tables: flush pending destroy work before netlink notifier
netfilter: nf_tables: validate all pending tables
ibmveth: Use dcbf rather than dcbfl
i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters()
net/sched: fix a qdisc modification with ambiguous command request
igc: Fix the typo in the PTM Control macro
batman-adv: Hold rtnl lock during MTU update via netlink
igb: Avoid starting unnecessary workqueues
can: raw: add missing refcount for memory leak fix
can: isotp: fix support for transmission of SF without flow control
bnx2x: new flag for track HW resource allocation
sfc: allocate a big enough SKB for loopback selftest packet
...
1) Patches #1..#13 From Jiri:
The goal of this patchset is to make the SF code cleaner.
Benefit from previously introduced devlink_port struct containerization
to avoid unnecessary lookups in devlink port ops.
Also, benefit from the devlink locking changes and avoid unnecessary
reference counting.
2) Patches #14,#15:
Add ability to configure proto both UDP and TCP selectors in RX and TX
directions.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmTljEoACgkQSD+KveBX
+j6PFAgAi1x6JuJDs2bdNHu9ocNUmLwGmg5k4SugO4QaKltIor1ZWupabK44Fd7d
Wit7xLPwP6qOK0b3l6J2FtpeFn8nceyudiXmGGEoE/ea9j75GXQkydqDWBw6lvTx
Y79FUks24G5eio/Lu/K3gdtcnx0W+vaYuWUNhgmF2d0NM4hRyuszdCe06cXjt524
1EX/9WFSRmo1hga9xNeK8IHpF1E6CuBsvvKML2qJsuCmUZ2qRvnHBPjMEHAsof/G
5mcpiG/l5f34fWzSgFla3HZzjuf2t8Vbku/gN++xrfFWp4q1ZIDbgp0twII6mF0v
Oddkflx9DDXh8gBQikNiCy6Sg2VL5g==
=qTlO
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2023-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-08-22
1) Patches #1..#13 From Jiri:
The goal of this patchset is to make the SF code cleaner.
Benefit from previously introduced devlink_port struct containerization
to avoid unnecessary lookups in devlink port ops.
Also, benefit from the devlink locking changes and avoid unnecessary
reference counting.
2) Patches #14,#15:
Add ability to configure proto both UDP and TCP selectors in RX and TX
directions.
* tag 'mlx5-updates-2023-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Support IPsec upper TCP protocol selector
net/mlx5e: Support IPsec upper protocol selector field offload for RX
net/mlx5: Store vport in struct mlx5_devlink_port and use it in port ops
net/mlx5: Check vhca_resource_manager capability in each op and add extack msg
net/mlx5: Relax mlx5_devlink_eswitch_get() return value checking
net/mlx5: Return -EOPNOTSUPP in mlx5_devlink_port_fn_migratable_set() directly
net/mlx5: Reduce number of vport lookups passing vport pointer instead of index
net/mlx5: Embed struct devlink_port into driver structure
net/mlx5: Don't register ops for non-PF/VF/SF port and avoid checks in ops
net/mlx5: Remove no longer used mlx5_esw_offloads_sf_vport_enable/disable()
net/mlx5: Introduce mlx5_eswitch_load/unload_sf_vport() and use it from SF code
net/mlx5: Allow mlx5_esw_offloads_devlink_port_register() to register SFs
net/mlx5: Push devlink port PF/VF init/cleanup calls out of devlink_port_register/unregister()
net/mlx5: Push out SF devlink port init and cleanup code to separate helpers
net/mlx5: Rework devlink port alloc/free into init/cleanup
====================
Link: https://lore.kernel.org/all/20230823051012.162483-1-saeed@kernel.org/
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmTmI1cNHGZ3QHN0cmxl
bi5kZQAKCRBwkajZrV/2AKBEEACACRkBNJ38IZoNhRdDWWVpoGiBL08BBZ/9Fdhh
Cc/iZ0d/XWcAS8qmPlABk82rwZ7EwW0l+9VGai4easY37S6SC0qLKZQYScZj5Fpl
hUMRiEn/Hd1fYjgGPCPG7dCFHYmh0JzXDFDDrBE9eRJmo7JdU/M9amLxYa2q1La7
vvC6f9MO7+zUeCl5KLOpCBl3/kLDadHSA0FBaPIWP3K+Pd1wR2QJpNoy8U7XzZJP
0+oS6kqqaOhAKImCzct2de1xfY4djnMzYYxAqxAUdd60/2dLiT+NJK03LA+FMKFX
7bZY/CnoqWZzXbWcMAC/fg7nbj7zSS1HIgOft3zbj1sGZrhZmINC3hTjiIeSwyZV
/n0fbV3IQaGCWx3dAGUQpuuCk3FwpIsw4NyRM8v43mnbFeaon/dBtMycXsWP+xiH
VMc0j+BJl5zWNynZVTF1PYuNwkX9uubhDVrgtkqZZD+9RzE8i6DiRf7deOBLsI3N
XlJpuc34hgGKe3s+Wn1FOY7jMO4FG6OEjB67t0tpjgAxg4mnuxGncXPV+dbTDq9k
fgwntbo5RAL9R4itb2Qfy0cg4NiFF1Nqjyzxo+bBMMByst1hlsrAX/V7LInKF9Hi
VI4X8YRdV2b8cQVFpqBigJS/k7wRUH7pdgd7YA6QSDVrBSp5mLf49+L7gaGOTJ6i
hag4pg==
=EVaB
-----END PGP SIGNATURE-----
Merge tag 'nf-23-08-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter updates for net
This PR contains nf_tables updates for your *net* tree.
First patch fixes table validation, I broke this in 6.4 when tracking
validation state per table, reported by Pablo, fixup from myself.
Second patch makes sure objects waiting for memory release have been
released, this was broken in 6.1, patch from Pablo Neira Ayuso.
Patch three is a fix-for-fix from previous PR: In case a transaction
gets aborted, gc sequence counter needs to be incremented so pending
gc requests are invalidated, from Pablo.
Same for patch 4: gc list needs to use gc list lock, not destroy lock,
also from Pablo.
Patch 5 fixes a UaF in a set backend, but this should only occur when
failslab is enabled for GFP_KERNEL allocations, broken since feature
was added in 5.6, from myself.
Patch 6 fixes a double-free bug that was also added via previous PR:
We must not schedule gc work if the previous batch is still queued.
netfilter pull request 2023-08-23
* tag 'nf-23-08-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: defer gc run if previous batch is still pending
netfilter: nf_tables: fix out of memory error handling
netfilter: nf_tables: use correct lock to protect gc_list
netfilter: nf_tables: GC transaction race with abort path
netfilter: nf_tables: flush pending destroy work before netlink notifier
netfilter: nf_tables: validate all pending tables
====================
Link: https://lore.kernel.org/r/20230823152711.15279-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hangbin Liu says:
====================
fix macvlan over alb bond support
Currently, the macvlan over alb bond is broken after commit
14af9963ba ("bonding: Support macvlans on top of tlb/rlb mode bonds").
Fix this and add relate tests.
====================
Link: https://lore.kernel.org/r/20230823071907.3027782-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add a new testing topo bond_topo_2d1c.sh which is used more commonly.
Make bond_topo_3d1c.sh just source bond_topo_2d1c.sh and add the
extra link.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The commit 14af9963ba ("bonding: Support macvlans on top of tlb/rlb mode
bonds") aims to enable the use of macvlans on top of rlb bond mode. However,
the current rlb bond mode only handles ARP packets to update remote neighbor
entries. This causes an issue when a macvlan is on top of the bond, and
remote devices send packets to the macvlan using the bond's MAC address
as the destination. After delivering the packets to the macvlan, the macvlan
will rejects them as the MAC address is incorrect. Consequently, this commit
makes macvlan over bond non-functional.
To address this problem, one potential solution is to check for the presence
of a macvlan port on the bond device using netif_is_macvlan_port(bond->dev)
and return NULL in the rlb_arp_xmit() function. However, this approach
doesn't fully resolve the situation when a VLAN exists between the bond and
macvlan.
So let's just do a partial revert for commit 14af9963ba in rlb_arp_xmit().
As the comment said, Don't modify or load balance ARPs that do not originate
locally.
Fixes: 14af9963ba ("bonding: Support macvlans on top of tlb/rlb mode bonds")
Reported-by: susan.zheng@veritas.com
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2117816
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Golle says:
====================
net: ethernet: mtk_eth_soc: improve support for MT7988
This series fixes and completes commit 445eb6448e ("net: ethernet:
mtk_eth_soc: add basic support for MT7988 SoC") and also adds support
for using the in-SoC SRAM to previous MT7986 and MT7981 SoCs.
====================
Link: https://lore.kernel.org/r/cover.1692721443.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Systems having 4 GiB of RAM and more require DMA addressing beyond the
current 32-bit limit. Starting from MT7988 the hardware now supports
36-bit DMA addressing, let's use that new capability in the driver to
avoid running into swiotlb on systems with 4 GiB of RAM or more.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/95b919c98876c9e49761e44662e7c937479eecb8.1692721443.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MT7981, MT7986 and MT7988 come with in-SoC SRAM dedicated for Ethernet
DMA rings. Support using the SRAM without breaking existing device tree
bindings, ie. only new SoC starting from MT7988 will have the SRAM
declared as additional resource in device tree. For MT7981 and MT7986
an offset on top of the main I/O base is used.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/e45e0f230c63ad58869e8fe35b95a2fb8925b625.1692721443.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As we already added the exception tracing for XDP_TX, I think it is
necessary to add the exception tracing for other XDP actions, such
as XDP_REDIRECT, XDP_ABORTED and unknown error actions.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230822065255.606739-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make an existing ACPI IRQ override quirk for PCSpecialist Elimina
Pro 16 M work as intended (Hans de Goede).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmTmXzwSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRx9QYP/jDKFLQRVD+pUzdgGLKhEhJmZrmK4xdJ
Dao6eJBivNE7O0vx6WN8mFPeccnS6sln+/6aq+XoVRkoOmk4N3/IIjnooYEPtg0u
JqA4fUUtMPk+7tiwmzSAPId9qcP1XcUR2SDDe0jDfNRy4ppIaBwatf0k3elx+yRl
Fv+fPJLeeUfhZnqtAxyclAtAp5UC6OPz1VPt3QWU0JTiv/8OrC8jTIpVur8cxyzK
CMaZ46OzVdXQAb717tIlrOZtVSVN3pDL/SynkDgbmGKv9M76qu4Cn7TcDj3w71bW
WiOVv9mL6UsS3pwD5tBOiDO1NnLCvwZSN6zrNNYiT+ajQqOwCENMfO+qnu8vH3Q9
nrpAkeKFE3rFzHcwq4XV2lkthK5ednYXmij50E3op5t63/oYwXaqYKLazK4iIjiJ
tyA4NOwa5Y9ATu375g5rxIip5MBD50BZM0/6MRA7/h1mMRca2dz2Rie+9PkffnGk
WGs25DtjrN7wHr7+eOuPcvz+CTyZ2R0X6+ppUGA4j2YWKDbiAGxKKtULDy0s3+RR
2CJUJfIaf9HesiUtNzV/nnwbP6CXPmPsMlRomSNd5Aik/IJr/dI+qrJNJjGZy3lF
YG99/NSIAPCXjD6iGDvRwlajDS8Q3e2CI9FRvyXwz9YrrcOsng4OFQXoDutP++Ch
ZS9w22hz8OlA
=WWI/
-----END PGP SIGNATURE-----
Merge tag 'acpi-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Make an existing ACPI IRQ override quirk for PCSpecialist Elimina Pro
16 M work as intended (Hans de Goede)"
* tag 'acpi-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: resource: Fix IRQ override quirk for PCSpecialist Elimina Pro 16 M
Final set of pdx86 fixes for 6.5.
The following is an automated git shortlog grouped by driver:
ideapad-laptop:
- Add support for new hotkeys found on ThinkBook 14s Yoga ITL
lenovo-ymc:
- Add Lenovo Yoga 7 14ACN6 to ec_trigger_quirk_dmi_table
platform/mellanox:
- Fix mlxbf-tmfifo not handling all virtio CONSOLE notifications
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmTmIokUHGhkZWdvZWRl
QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9zwkAf9GVfHsEMGbCFtktSOivZ98nSoMFHi
bS9XjvvbNVpv+vHGFQA1L/933+G31OXr4Dtyb06xpOXem86JzZVdZoouNjMN4B0G
1hwOobqG3C52ROaherECtAqpj7QuC1OU5ccd8dWXLYiXymlKCsAKysXtM4zCfkpE
M3edQG+Wcjwp2PwnzmzLA7OMCMCFxuDOxRw+VOVj7JxsGFA37MJQAW7/Q1lKLBgj
/O86FpJZvn/ZCJm7UJ4EqQbv9dbKTIqey2SxciF9AQ96OCreWj0pDBsCBNz7JiBc
n/oqq5sPd6w8wrBVe+DhbiVzAJ8QBFcwOnHZvDP8wiGial+Oe+gCiQeXZA==
=zumw
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Final set of three small fixes for 6.5"
* tag 'platform-drivers-x86-v6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/mellanox: Fix mlxbf-tmfifo not handling all virtio CONSOLE notifications
platform/x86: ideapad-laptop: Add support for new hotkeys found on ThinkBook 14s Yoga ITL
platform/x86: lenovo-ymc: Add Lenovo Yoga 7 14ACN6 to ec_trigger_quirk_dmi_table
rshim console does not show all entries of dmesg.
Fixed by setting MLXBF_TM_TX_LWM_IRQ for every CONSOLE notification.
Signed-off-by: Shih-Yi Chen <shihyic@nvidia.com>
Reviewed-by: Liming Sung <limings@nvidia.com>
Reviewed-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/20230821150627.26075-1-shihyic@nvidia.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Don't queue more gc work, else we may queue the same elements multiple
times.
If an element is flagged as dead, this can mean that either the previous
gc request was invalidated/discarded by a transaction or that the previous
request is still pending in the system work queue.
The latter will happen if the gc interval is set to a very low value,
e.g. 1ms, and system work queue is backlogged.
The sets refcount is 1 if no previous gc requeusts are queued, so add
a helper for this and skip gc run if old requests are pending.
Add a helper for this and skip the gc run in this case.
Fixes: f6c383b8c3 ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Several instances of pipapo_resize() don't propagate allocation failures,
this causes a crash when fault injection is enabled for gfp_kernel slabs.
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Use nf_tables_gc_list_lock spinlock, not nf_tables_destroy_list_lock to
protect the gc list.
Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Abort path is missing a synchronization point with GC transactions. Add
GC sequence number hence any GC transaction losing race will be
discarded.
Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Destroy work waits for the RCU grace period then it releases the objects
with no mutex held. All releases objects follow this path for
transactions, therefore, order is guaranteed and references to top-level
objects in the hierarchy remain valid.
However, netlink notifier might interfer with pending destroy work.
rcu_barrier() is not correct because objects are not release via RCU
callback. Flush destroy work before releasing objects from netlink
notifier path.
Fixes: d4bc8271db ("netfilter: nf_tables: netlink notifier might race to release objects")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
We have to validate all tables in the transaction that are in
VALIDATE_DO state, the blamed commit below did not move the break
statement to its right location so we only validate one table.
Moreover, we can't init table->validate to _SKIP when a table object
is allocated.
If we do, then if a transcaction creates a new table and then
fails the transaction, nfnetlink will loop and nft will hang until
user cancels the command.
Add back the pernet state as a place to stash the last state encountered.
This is either _DO (we hit an error during commit validation) or _SKIP
(transaction passed all checks).
Fixes: 00c320f9b7 ("netfilter: nf_tables: make validation state per table")
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
When building for power4, newer binutils don't recognise the "dcbfl"
extended mnemonic.
dcbfl RA, RB is equivalent to dcbf RA, RB, 1.
Switch to "dcbf" to avoid the build error.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add check for pf->vf not being NULL before dereferencing
pf->vf[vsi->vf_id] in updating VSI filter sync.
Add a similar check before dereferencing !pf->vf[vsi->vf_id].trusted
in the condition for clearing promisc mode bit.
Fixes: c87c938f62 ("i40e: Add VF VLAN pruning")
Signed-off-by: Andrii Staikov <andrii.staikov@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All callers of build_skb() (*) in bnxt are in NAPI context.
The budget checking is somewhat convoluted because in the shared
completion queue cases Rx packets are discarded by netpoll
by forcing an error (E). But that happens before skb allocation.
Only a call chain starting at __bnxt_poll_work() can lead to
an skb allocation and it checks budget (b).
* bnxt_rx_multi_page_skb
* bnxt_rx_skb
` bp->rx_skb_func
* bnxt_tpa_end
` bnxt_rx_pkt
E bnxt_force_rx_discard
E bnxt_poll_nitroa0
b __bnxt_poll_work
Use napi_build_skb() to take advantage of the skb cache.
In iperf tests with HW-GRO enabled it barely makes a difference
but in cases where HW-GRO is not as effective (or disabled) it
can give even a >10% boost (20.7Gbps -> 23.1Gbps).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When replacing an existing root qdisc, with one that is of the same kind, the
request boils down to essentially a parameterization change i.e not one that
requires allocation and grafting of a new qdisc. syzbot was able to create a
scenario which resulted in a taprio qdisc replacing an existing taprio qdisc
with a combination of NLM_F_CREATE, NLM_F_REPLACE and NLM_F_EXCL leading to
create and graft scenario.
The fix ensures that only when the qdisc kinds are different that we should
allow a create and graft, otherwise it goes into the "change" codepath.
While at it, fix the code and comments to improve readability.
While syzbot was able to create the issue, it did not zone on the root cause.
Analysis from Vladimir Oltean <vladimir.oltean@nxp.com> helped narrow it down.
v1->V2 changes:
- remove "inline" function definition (Vladmir)
- remove extrenous braces in branches (Vladmir)
- change inline function names (Pedro)
- Run tdc tests (Victor)
v2->v3 changes:
- dont break else/if (Simon)
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: syzbot+a3618a167af2021433cd@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/20230816225759.g25x76kmgzya2gei@skbuf/T/
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove debug logs in port vlan management, since there are already multiple
tracepoints defined for those operations in DSA
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BPF programs that run on connect can rewrite the connect address. For
the connect system call this isn't a problem, because a copy of the address
is made when it is moved into kernel space. However, kernel_connect
simply passes through the address it is given, so the caller may observe
its address value unexpectedly change.
A practical example where this is problematic is where NFS is combined
with a system such as Cilium which implements BPF-based load balancing.
A common pattern in software-defined storage systems is to have an NFS
mount that connects to a persistent virtual IP which in turn maps to an
ephemeral server IP. This is usually done to achieve high availability:
if your server goes down you can quickly spin up a replacement and remap
the virtual IP to that endpoint. With BPF-based load balancing, mounts
will forget the virtual IP address when the address rewrite occurs
because a pointer to the only copy of that address is passed down the
stack. Server failover then breaks, because clients have forgotten the
virtual IP address. Reconnects fail and mounts remain broken. This patch
was tested by setting up a scenario like this and ensuring that NFS
reconnects worked after applying the patch.
Signed-off-by: Jordan Rife <jrife@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The virtio_net driver currently deals with different versions and types
of virtio net headers, such as virtio_net_hdr_mrg_rxbuf,
virtio_net_hdr_v1_hash, etc. Due to these variations, the code relies
on multiple type casts to convert memory between different structures,
potentially leading to bugs when there are changes in these structures.
Introduces the "struct skb_vnet_common_hdr" as a unifying header
structure using a union. With this approach, various virtio net header
structures can be converted by accessing different members of this
structure, thus eliminating the need for type casting and reducing the
risk of potential bugs.
For example following code:
static struct sk_buff *page_to_skb(struct virtnet_info *vi,
struct receive_queue *rq,
struct page *page, unsigned int offset,
unsigned int len, unsigned int truesize,
unsigned int headroom)
{
[...]
struct virtio_net_hdr_mrg_rxbuf *hdr;
[...]
hdr_len = vi->hdr_len;
[...]
ok:
hdr = skb_vnet_hdr(skb);
memcpy(hdr, hdr_p, hdr_len);
[...]
}
When VIRTIO_NET_F_HASH_REPORT feature is enabled, hdr_len = 20
But the sizeof(*hdr) is 12,
memcpy(hdr, hdr_p, hdr_len); will copy 20 bytes to the hdr,
which make a potential risk of bug. And this risk can be avoided by
introducing struct skb_vnet_common_hdr.
Change log
v1->v2
feedback from Willem de Bruijn <willemdebruijn.kernel@gmail.com>
feedback from Simon Horman <horms@kernel.org>
1. change to use net-next tree.
2. move skb_vnet_common_hdr inside kernel file instead of the UAPI header.
v2->v3
feedback from Willem de Bruijn <willemdebruijn.kernel@gmail.com>
1. fix typo in commit message.
2. add original struct virtio_net_hdr into union
3. remove virtio_net_hdr_mrg_rxbuf variable in receive_buf;
Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>