When t4vf_update_port_info() failed in cxgb4vf_open(), resources applied
during adapter goes up are not cleared. Fix it. Only be compiled, not be
tested.
Fixes: 18d79f721e ("cxgb4vf: Update port information in cxgb4vf_open()")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20221109012100.99132-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King says:
====================
Clean up pcs-xpcs accessors
This series cleans up the pcs-xpcs code to use mdiodev accessors for
read/write just like xpcs_modify_changed() does. In order to do this,
we need to introduce the mdiodev clause 45 accessors.
====================
Link: https://lore.kernel.org/r/Y2pm13+SDg6N/IVx@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use mdiodev accessors rather than accessing the bus and address in
the mdio_device structure and using the mdiobus accessors.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The strlen() may go too far when estimating the length of
the given string. In some cases it may go over the boundary
and crash the system which is the case according to the commit
13a55372b6 ("ARM: orion5x: Revert commit 4904dbda41c8.").
Rectify this by switching to strnlen() for the expected
maximum length of the string.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20221108141108.62974-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If mctp_neigh_init() return error, the routes resources should
be released in the error handling path. Otherwise some resources
leak.
Fixes: 4d8b931928 ("mctp: Add neighbour implementation")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://lore.kernel.org/r/20221108095517.620115-1-weiyongjun@huaweicloud.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The existing genphy_loopback() is not working for TI DP83867 PHY as it
will disable autoneg support while another side is still enabling autoneg.
This is causing the link is not established and results in timeout error
in genphy_loopback() function.
Thus, based on TI PHY datasheet, introduce a TI PHY loopback function by
just configuring BMCR_LOOPBACK(Bit-9) in MII_BMCR register (0x0).
Tested working on TI DP83867 PHY for all speeds (10/100/1000Mbps).
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20221108101527.612723-1-michael.wei.hong.sit@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Raju Lakkaraju says:
====================
net: lan743x: PCI11010 / PCI11414 devices Enhancements
This patch series continues with the addition of supported features for the
Ethernet function of the PCI11010 / PCI11414 devices to the LAN743x driver.
====================
Link: https://lore.kernel.org/r/20221107085650.991470-1-Raju.Lakkaraju@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test that locked bridge port configurations that are not supported by
mlxsw are rejected.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test that packets received via a locked bridge port whose {SMAC, VID}
does not appear in the bridge's FDB or appears with a different port,
trigger the "locked_port" packet trap.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test that packets with a destination MAC of 01:80:C2:00:00:03 trigger
the "eapol" packet trap.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merely checking whether a trap counter incremented or not without
logging a test result is useful on its own. Split this functionality to
a helper which will be used by subsequent patches.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add locked bridge port support by reacting to changes in the
'BR_PORT_LOCKED' flag. When set, enable security checks on the local
port via the previously added SPFSR register.
When security checks are enabled, an incoming packet will trigger an FDB
lookup with the packet's source MAC and the FID it was classified to. If
an FDB entry was not found or was found to be pointing to a different
port, the packet will be dropped. Such packets increment the
"discard_ingress_general" ethtool counter. For added visibility, user
space can trap such packets to the CPU by enabling the "locked_port"
trap. Example:
# devlink trap set pci/0000:06:00.0 trap locked_port action trap
Unlike other configurations done via bridge port flags (e.g., learning,
flooding), security checks are enabled in the device on a per-port basis
and not on a per-{port, VLAN} basis. As such, scenarios where user space
can configure different locking settings for different VLANs configured
on a port need to be vetoed. To that end, veto the following scenarios:
1. Locking is set on a bridge port that is a VLAN upper
2. Locking is set on a bridge port that has VLAN uppers
3. VLAN upper is configured on a locked bridge port
Examples:
# bridge link set dev swp1.10 locked on
Error: mlxsw_spectrum: Locked flag cannot be set on a VLAN upper.
# ip link add link swp1 name swp1.10 type vlan id 10
# bridge link set dev swp1 locked on
Error: mlxsw_spectrum: Locked flag cannot be set on a bridge port that has VLAN uppers.
# bridge link set dev swp1 locked on
# ip link add link swp1 name swp1.10 type vlan id 10
Error: mlxsw_spectrum: VLAN uppers are not supported on a locked port.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Propagate extack to mlxsw_sp_port_attr_br_pre_flags_set() in order to
communicate error messages related to bridge port flag validation.
Example:
# bridge link set dev swp1 locked on
Error: mlxsw_spectrum: Unsupported bridge port flag.
More error messages will be added in subsequent patches.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In Spectrum, learning happens in parallel to the security checks.
Therefore, regardless of the result of the security checks, a learning
notification will be generated by the device and polled later on by the
driver.
Currently, the driver reacts to learning notifications by programming
corresponding FDB entries to the device. When a port is locked (i.e.,
has security checks enabled), this can no longer happen, as otherwise
any host will blindly gain authorization.
Instead, notify the learned entry as a locked entry to the bridge driver
that will in turn notify it to user space, in case MAB is enabled. User
space can then decide to authorize the host by clearing the "locked"
flag, which will cause the entry to be programmed to the device.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Subsequent patches will need to report locked FDB entries to the bridge
driver. Prepare for that by adding a 'locked' argument to
mlxsw_sp_fdb_call_notifiers() according to which the 'locked' bit is set
in the FDB notification info. For now, always pass 'false'.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add an API to enable or disable security checks on a local port. It will
be used by subsequent patches when the 'BR_PORT_LOCKED' flag is toggled.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add the Switch Port FDB Security Register (SPFSR) that allows enabling
and disabling security checks on a given local port. In Linux terms, it
allows locking / unlocking a port.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Register the previously added packet traps with devlink. This allows
user space to tune their policers and in the case of the locked port
trap, user space can set its action to "trap" in order to gain
visibility into packets that were discarded by the device due to the
locked port check failure.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add packet traps for 802.1X operation. The "eapol" control trap is used
to trap EAPOL packets and is required for the correct operation of the
control plane. The "locked_port" drop trap can be enabled to gain
visibility into packets that were dropped by the device due to the
locked bridge port check.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reflect the 'BR_PORT_MAB' flag to device drivers so that:
* Drivers that support MAB could act upon the flag being toggled.
* Drivers that do not support MAB will prevent MAB from being enabled.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the bridge is offloaded to hardware, FDB entries are learned and
aged-out by the hardware. Some device drivers synchronize the hardware
and software FDBs by generating switchdev events towards the bridge.
When a port is locked, the hardware must not learn autonomously, as
otherwise any host will blindly gain authorization. Instead, the
hardware should generate events regarding hosts that are trying to gain
authorization and their MAC addresses should be notified by the device
driver as locked FDB entries towards the bridge driver.
Allow device drivers to notify the bridge driver about such entries by
extending the 'switchdev_notifier_fdb_info' structure with the 'locked'
bit. The bit can only be set by device drivers and not by the bridge
driver.
Prevent a locked entry from being installed if MAB is not enabled on the
bridge port.
If an entry already exists in the bridge driver, reject the locked entry
if the current entry does not have the "locked" flag set or if it points
to a different port. The same semantics are implemented in the software
data path.
Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, FDB entries that are notified to the bridge via
'SWITCHDEV_FDB_ADD_TO_BRIDGE' are always marked as offloaded. With MAB
enabled, this will no longer be universally true. Device drivers will
report locked FDB entries to the bridge to let it know that the
corresponding hosts required authorization, but it does not mean that
these entries are necessarily programmed in the underlying hardware.
Solve this by determining the offload indication based of the
'offloaded' bit in the FDB notification.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current Intel platform has an output of ~976ms interval
when probed on 1 Pulse-per-Second(PPS) hardware pin.
The correct PTP clock frequency for PCH GbE should be 204.8MHz
instead of 200MHz. PSE GbE PTP clock rate remains at 200MHz.
Fixes: 58da0cfa6c ("net: stmmac: create dwmac-intel.c to contain all Intel platform")
Signed-off-by: Ling Pei Lee <pei.lee.ling@intel.com>
Signed-off-by: Tan, Tee Min <tee.min.tan@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Gan Yi Fang <yi.fang.gan@intel.com>
Link: https://lore.kernel.org/r/20221108020811.12919-1-yi.fang.gan@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When failed to bind qsets in cxgb_up() for opening device, napi isn't
disabled. When open cxgb3 device next time, it will trigger a BUG_ON()
in napi_enable(). Compile tested only.
Fixes: 48c4b6dbb7 ("cxgb3 - fix port up/down error path")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20221109021451.121490-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When failed to create xdp rxqs or fill rx channels in cpsw_ndo_open() for
opening device, napi isn't disabled. When open cpsw device next time, it
will report a invalid opcode issue. Compiled tested only.
Fixes: d354eb85d6 ("drivers: net: cpsw: dual_emac: simplify napi usage")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20221109011537.96975-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko says:
====================
net: devlink: move netdev notifier block to dest namespace during reload
Patch #1 is just a dependency of patch #2, which is the actual fix.
====================
Link: https://lore.kernel.org/r/20221108132208.938676-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The notifier block tracking netdev changes in devlink is registered
during devlink_alloc() per-net, it is then unregistered
in devlink_free(). When devlink moves from net namespace to another one,
the notifier block needs to move along.
Fix this by adding forgotten call to move the block.
Reported-by: Ido Schimmel <idosch@idosch.org>
Fixes: 02a68a47ea ("net: devlink: track netdev with devlink_port assigned")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, net_dev() netdev notifier variant follows the netdev with
per-net notifier from namespace to namespace. This is implemented
by move_netdevice_notifiers_dev_net() helper.
For devlink it is needed to re-register per-net notifier during
devlink reload. Introduce a new helper called
move_netdevice_notifier_net() and share the unregister/register code
with existing move_netdevice_notifiers_dev_net() helper.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
VF driver mistakenly counts VLAN 0 filters, when no PF driver
counts them.
Do not count VLAN 0 filters, when VLAN_V2 is engaged.
Counting those filters in, will affect filters size by -1, when
sending batched VLAN addition message.
Fixes: 968996c070 ("iavf: Fix VLAN_V2 addition/rejection")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Michal Jaron <michalx.jaron@intel.com>
Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Previously, during removal of trusted VF when VF is down there was
number of spurious interrupt equal to number of queues on VF.
Add check if VF already has inactive queues. If VF is disabled and
has inactive rx queues then do not disable rx queues.
Add check in ice_vsi_stop_tx_ring if it's VF's vsi and if VF is
disabled.
Fixes: efe4186000 ("ice: Fix memory corruption in VF driver")
Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEjUuTAak14xi+SF7M4CHKc/GJqRAFAmNrulwACgkQ4CHKc/GJ
qRDGWwf/bqkCffS+Eg8p3wrGEbhWb1pOWnshcPl9EttSlclIfwaby5+kHTjeKpGR
r3nt2cRAtWH3gUbU32352TJJ97oobasFHk3aE7xorHYTQ5HVAycwiHi+6BqcEcNH
MyH7rcOAnKV1GeE1NnX99CeOtCA0wOaO/kCAn9y1QvSifoxKaiixBodoov4CHuSt
PPXcJU3Rgyo8pDzFya3BAScayTTNkr1MU18iacJwndhAyjWolL4tlVqoLgVsi/TA
wHb80Moj0iPyEioxHW7OHLkoapCYr4mfB3AUUY2t91ZciFQEKfihmki2KJw2VOg5
XBU1iNezxMJhteNJc6JqXr90nsriAw==
=p9yC
-----END PGP SIGNATURE-----
Merge tag 'slab-for-6.1-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fixes from Vlastimil Babka:
"Most are small fixups as described below.
The !CONFIG_TRACING fix is a bit bigger and would normally be done in
the next merge window as part of upcoming hardening changes. But we
realized it can make the kmalloc waste tracking introduced in this
window inaccurate, so decided to go with it now.
Summary:
- Remove !CONFIG_TRACING kmalloc() wrappers intended to save a
function call, due to incompatilibity with recently introduced
wasted space tracking and planned hardening changes.
- A tracing parameter regression fix, by Kees Cook.
- Two kernel-doc warning fixups, by Lukas Bulwahn and myself
* tag 'slab-for-6.1-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm, slab: remove duplicate kernel-doc comment for ksize()
mm/slab_common: Restore passing "caller" for tracing
mm/slab: remove !CONFIG_TRACING variants of kmalloc_[node_]trace()
mm/slab_common: repair kernel-doc for __ksize()
The pkt_reformat pointer being saved under flow_act and not
dest attribute in the termination table instance.
Fix the comparison pointers.
Also fix returning success if one pkt_reformat pointer is null
and the other is not.
Fixes: 249ccc3c95 ("net/mlx5e: Add support for offloading traffic from uplink to uplink")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In the bellow commit, we added support for PPS policing without
removing the check which block offload of such cases.
Fix it by removing this check.
Fixes: a8d52b024d ("net/mlx5e: TC, Support offloading police action")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
DMA sync functions should use the same direction that was used by DMA
mapping. Use DMA_BIDIRECTIONAL for XDP_TX from regular RQ, which reuses
the same mapping that was used for RX, and DMA_TO_DEVICE for XDP_TX from
XSK RQ and XDP_REDIRECT, which establish a new mapping in this
direction. On the RX side, use the same direction that was used when
setting up the mapping (DMA_BIDIRECTIONAL for XDP, DMA_FROM_DEVICE
otherwise).
Also don't skip sync for device when establishing a DMA_FROM_DEVICE
mapping for RX, as some architectures (ARM) may require invalidating
caches before the device can use the mapping. It doesn't break the
bugfix made in
commit 0b7cfa4082 ("net/mlx5e: Fix page DMA map/unmap attributes"),
since the bug happened on unmap.
Fixes: 0b7cfa4082 ("net/mlx5e: Fix page DMA map/unmap attributes")
Fixes: b5503b994e ("net/mlx5e: XDP TX forwarding support")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The commit cited below started using the firmware capability for the
maximum TX WQE size. This commit adds an important check to verify that
the driver doesn't attempt to exceed this capability, and also restores
another check mistakenly removed in the cited commit (a WQE must not
exceed the page size).
Fixes: c27bd1718c ("net/mlx5e: Read max WQEBBs on the SQ from firmware")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In case PCI reads fail after unload, there is no use in trying to
load the device.
Fixes: 5ec697446f ("net/mlx5: Add support for devlink reload action fw activate")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
No need to rollback to the other mode because probably will fail
again. Just set to legacy mode and clear fdb table created flag.
So that fdb table will not be cleared again.
Fixes: f019679ea5 ("net/mlx5: E-switch, Remove dependency between sriov and eswitch mode")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
For a single CPU system, the kernel thread executing mlx5_cmd_flush()
never releases the CPU but calls down_trylock(&cmd→sem) in a busy loop.
On a single processor system, this leads to a deadlock as the kernel
thread which executes mlx5_cmd_invoke() never gets scheduled. Fix this,
by adding the cond_resched() call to the loop, allow the command
completion kernel thread to execute.
Fixes: 8e715cd613 ("net/mlx5: Set command entry semaphore up once got index free")
Signed-off-by: Alexander Schmidt <alexschm@de.ibm.com>
Signed-off-by: Roy Novich <royno@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Mlx5 LAG is initialized asynchronously on a workqueue which means that for
a brief moment after setting mlx5 UL representors as lower devices of a
bond netdevice the LAG itself is not fully initialized in the driver. When
adding such bond device to a bridge mlx5 bridge code will not consider it
as offload-capable, skip creating necessary bookkeeping and fail any
further bridge offload-related commands with it (setting VLANs, offloading
FDBs, etc.). In order to make the error explicit during bridge
initialization stage implement the code that detects such condition during
NETDEV_PRECHANGEUPPER event and returns an error.
Fixes: ff9b752146 ("net/mlx5: Bridge, support LAG")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The return value from genl_op_iter_init() only tells us if
there are any policies but to begin the iteration (and therefore
load the first entry) we need to call genl_op_iter_next().
Note that it's safe to call genl_op_iter_next() on a family
with no ops, it will just return false.
This may lead to various crashes, a warning in
netlink_policy_dump_get_policy_idx() when policy is not found
or.. no problem at all if the kmalloc'ed memory happens to be
zeroed.
Fixes: b502b3185c ("genetlink: use iterator in the op to policy map dumping")
Link: https://lore.kernel.org/r/20221108204128.330287-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter fixes for net:
1) Fix deadlock in nfnetlink due to missing mutex release in error path,
from Ziyang Xuan.
2) Clean up pending autoload module list from nf_tables_exit_net() path,
from Shigeru Yoshida.
3) Fixes for the netfilter's reverse path selftest, from Phil Sutter.
All of these bugs have been around for several releases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmNq0Q8ACgkQ+7dXa6fL
C2toRxAAmmvce10i3hcS6ke0PvB4gPu6ZSuaQWO3KxpP9jz6lV7M+cfFOh9N3neG
uEe6ms4Kzt/BJIBm+aMdXW84648sV5vOqdrNGBOb2cJikaiTkj9x730klSdwOVr2
epEELoj/IEWZZz/d9U05uq26VUtnxsc/Enzkq/GIaENSVauYWaZXrHdKzrzUZYjk
gEbspFSpQEJqu5slRl2XGos4tMHHvTIkehoLH9KM4YmC5WGf1kKYz/6v38PIhc/9
mEBsUqQlTVsUPNcOXWBY24HJKY91CBgowhbTQIxyJNydHPJYPVJ8U5nNp1g1CYmu
URdvvX8IyIR0zX2RcVlc9vnWQ+p5NoTjxjwc1iKjnBsofCmqDucie6Iz2vis7Zl6
6s6N1FZSYQTX0fbBbf00efWaG/3I/ynRhcW+zM9NcozHzpRxyuptDlKSOVORXRG7
gy7+sID2y5dLqCg9ukTIx1y9Njt+uryosBOajCMaaAy0VgXEsETFO8UxbodUAu6N
ubmPwGO42bY//c+fJWRAjT9tjhzp2fWK4rgrgd3VG4cYrjq2W21EMwyjzilVp2dM
ZlvWoWJptIqEhPtWU8nf3i759XE+FOWKt9ns1FupKB+0msht1p2HBj88bue8TrKk
CcV1dY9cohNzgRFXvXcgSLvSCioT31Q//mGmXWLif7teOXIUN4A=
=q04p
-----END PGP SIGNATURE-----
Merge tag 'rxrpc-next-20221108' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
rxrpc changes
David Howells says:
====================
rxrpc: Increasing SACK size and moving away from softirq, part 1
AF_RXRPC has some issues that need addressing:
(1) The SACK table has a maximum capacity of 255, but for modern networks
that isn't sufficient. This is hard to increase in the upstream code
because of the way the application thread is coupled to the softirq
and retransmission side through a ring buffer. Adjustments to the rx
protocol allows a capacity of up to 8192, and having a ring
sufficiently large to accommodate that would use an excessive amount
of memory as this is per-call.
(2) Processing ACKs in softirq mode causes the ACKs get conflated, with
only the most recent being considered. Whilst this has the upside
that the retransmission algorithm only needs to deal with the most
recent ACK, it causes DATA transmission for a call to be very bursty
because DATA packets cannot be transmitted in softirq mode. Rather
transmission must be delegated to either the application thread or a
workqueue, so there tend to be sudden bursts of traffic for any
particular call due to scheduling delays.
(3) All crypto in a single call is done in series; however, each DATA
packet is individually encrypted so encryption and decryption of large
calls could be parallelised if spare CPU resources are available.
This is the first of a number of sets of patches that try and address them.
The overall aims of these changes include:
(1) To get rid of the TxRx ring and instead pass the packets round in
queues (eg. sk_buff_head). On the Tx side, each ACK packet comes with
a SACK table that can be parsed as-is, so there's no particular need
to maintain our own; we just have to refer to the ACK.
On the Rx side, we do need to maintain a SACK table with one bit per
entry - but only if packets go missing - and we don't want to have to
perform a complex transformation to get the information into an ACK
packet.
(2) To try and move almost all processing of received packets out of the
softirq handler and into a high-priority kernel I/O thread. Only the
transferral of packets would be left there. I would still use the
encap_rcv hook to receive packets as there's a noticeable performance
drop from letting the UDP socket put the packets into its own queue
and then getting them out of there.
(3) To make the I/O thread also do all the transmission. The app thread
would be responsible for packaging the data into packets and then
buffering them for the I/O thread to transmit. This would make it
easier for the app thread to run ahead of the I/O thread, and would
mean the I/O thread is less likely to have to wait around for a new
packet to come available for transmission.
(4) To logically partition the socket/UAPI/KAPI side of things from the
I/O side of things. The local endpoint, connection, peer and call
objects would belong to the I/O side. The socket side would not then
touch the private internals of calls and suchlike and would not change
their states. It would only look at the send queue, receive queue and
a way to pass a message to cause an abort.
(5) To remove as much locking, synchronisation, barriering and atomic ops
as possible from the I/O side. Exclusion would be achieved by
limiting modification of state to the I/O thread only. Locks would
still need to be used in communication with the UDP socket and the
AF_RXRPC socket API.
(6) To provide crypto offload kernel threads that, when there's slack in
the system, can see packets that need crypting and provide
parallelisation in dealing with them.
(7) To remove the use of system timers. Since each timer would then send
a poke to the I/O thread, which would then deal with it when it had
the opportunity, there seems no point in using system timers if,
instead, a list of timeouts can be sensibly consulted. An I/O thread
only then needs to schedule with a timeout when it is idle.
(8) To use zero-copy sendmsg to send packets. This would make use of the
I/O thread being the sole transmitter on the socket to manage the
dead-reckoning sequencing of the completion notifications. There is a
problem with zero-copy, though: the UDP socket doesn't handle running
out of option memory very gracefully.
With regard to this first patchset, the changes made include:
(1) Some fixes, including a fallback for proc_create_net_single_write(),
setting ack.bufferSize to 0 in ACK packets and a fix for rxrpc
congestion management, which shouldn't be saving the cwnd value
between calls.
(2) Improvements in rxrpc tracepoints, including splitting the timer
tracepoint into a set-timer and a timer-expired trace.
(3) Addition of a new proc file to display some stats.
(4) Some code cleanups, including removing some unused bits and
unnecessary header inclusions.
(5) A change to the recently added UDP encap_err_rcv hook so that it has
the same signature as {ip,ipv6}_icmp_error(), and then just have rxrpc
point its UDP socket's hook directly at those.
(6) Definition of a new struct, rxrpc_txbuf, that is used to hold
transmissible packets of DATA and ACK type in a single 2KiB block
rather than using an sk_buff. This allows the buffer to be on a
number of queues simultaneously more easily, and also guarantees that
the entire block is in a single unit for zerocopy purposes and that
the data payload is aligned for in-place crypto purposes.
(7) ACK txbufs are allocated at proposal and queued for later transmission
rather than being stored in a single place in the rxrpc_call struct,
which means only a single ACK can be pending transmission at a time.
The queue is then drained at various points. This allows the ACK
generation code to be simplified.
(8) The Rx ring buffer is removed. When a jumbo packet is received (which
comprises a number of ordinary DATA packets glued together), it used
to be pointed to by the ring multiple times, with an annotation in a
side ring indicating which subpacket was in that slot - but this is no
longer possible. Instead, the packet is cloned once for each
subpacket, barring the last, and the range of data is set in the skb
private area. This makes it easier for the subpackets in a jumbo
packet to be decrypted in parallel.
(9) The Tx ring buffer is removed. The side annotation ring that held the
SACK information is also removed. Instead, in the event of packet
loss, the SACK data attached an ACK packet is parsed.
(10) Allocate an skcipher request when needed in the rxkad security class
rather than caching one in the rxrpc_call struct. This deals with a
race between externally-driven call disconnection getting rid of the
skcipher request and sendmsg/recvmsg trying to use it because they
haven't seen the completion yet. This is also needed to support
parallelisation as the skcipher request cannot be used by two or more
threads simultaneously.
(11) Call udp_sendmsg() and udpv6_sendmsg() directly rather than going
through kernel_sendmsg() so that we can provide our own iterator
(zerocopy explicitly doesn't work with a KVEC iterator). This also
lets us avoid the overhead of the security hook.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Include linux/vmalloc.h in iosm_ipc_coredump.c &
iosm_ipc_devlink.c to resolve kernel test robot errors.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>