Remove a redundant null check, as vsi could not be null at this point.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The PHY provides only 39b timestamp. With current timing
implementation, we discard lower 7b, leaving 32b timestamp.
The driver reconstructs the full 64b timestamp by correlating the
32b timestamp with cached_time for performance. The reconstruction
algorithm does both forward & backward interpolation.
The 32b timeval has overflow duration of 2^32 counts ~= 4.23 second.
Due to interpolation in both direction, its now ~= 2.125 second
IIRC, going with at least half a duration, the cached_time is updated
with periodic thread of 1 second (worst-case) periodicity.
But the 1 second periodicity is based on System-timer.
With PPB adjustments, if the 1588 timers increments at say
double the rate, (2s in-place of 1s), the Nyquist rate/half duration
sampling/update of cached_time with 1 second periodic thread will
lead to incorrect interpolations.
Hence we should restrict the PPB adjustments to at least half duration
of cached_time update which translates to 500,000,000 PPB.
Since the periodicity of the cached-time system thread can vary,
it is good to have some buffer time and considering practicality of
PPB adjustments, limiting the max_adj to 100,000,000.
Signed-off-by: Siddaraju DH <siddaraju.dh@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently the drop action is supported only in switchdev mode.
Add support for offloading receive filters with action drop
in ADQ/non-ADQ modes. This is in addition to other actions
such as forwarding to a VSI (ADQ) or a queue (ADQ/non-ADQ).
Also renamed 'ch_vsi' to 'dest_vsi' as it is valid for multiple
actions such as forward to vsi/queue which may/may not create a
channel vsi.
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
If the number of Traffic Classes (TC) is decreased, the FW will no
longer remove TC nodes, but will send a pending change notification. This
will allow RDMA to destroy corresponding Control QP markers. After RDMA
finishes outstanding operations, the ice driver will send an execute MIB
Pending change admin queue command to FW to finish DCB configuration
change.
The FW will buffer all incoming Pending changes, so there can be only
one active Pending change.
RDMA driver guarantees to remove Control QP markers within 5000 ms.
Hence, LLDP response timeout txTTL (default 30 sec) will be met.
In the case of a Pending change, LLDP MIB Change Event (opcode 0x0A01) will
contain the whole new MIB. But Get LLDP MIB (opcode 0x0A00) AQ call would
still return an old MIB, as the Pending change hasn't been applied yet.
Add ice_get_dcb_cfg_from_mib_change() function to retrieve DCBX config
from LLDP MIB Change Event's buffer for Pending changes.
Co-developed-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In DCB Willing Mode (FW managed LLDP), when the link partner changes
configuration which requires fewer TCs, the TCs that are no longer
needed are suspended by EMP FW, removed, and never resumed. This occurs
before a MIB change event is indicated to SW. The permanent suspension and
removal of these TC nodes in the scheduler prevents RDMA from being able
to destroy QPs associated with this TC, requiring a CORE reset to recover.
A new DCBX configuration change flow is defined to allow SW driver and
other SW components (RDMA) to properly adjust to the configuration
changes before they are taking effect in HW. This flow includes a
two-way handshake between EMP FW<->LAN SW<->RDMA SW.
List of changes:
- Add 'Execute Pending LLDP MIB' AQC.
- Add 'Pending Event Enable' bit.
- Add additional logic to ignore Pending Event Enable' request
while 'LLDP MIB Chnage' event is disabled.
- Add 'Execute Pending LLDP MIB' AQC sending function to FW,
which is needed to take place MIB Event change.
Signed-off-by: Tsotne Chakhvadze <tsotne.chakhvadze@intel.com>
Co-developed-by: Karen Sornek <karen.sornek@intel.com>
Signed-off-by: Karen Sornek <karen.sornek@intel.com>
Co-developed-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Co-developed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Michael Walle says:
====================
net: phy: Remove probe_capabilities
With all the drivers which used .probe_capabilities converted to the
new c45 MDIO access methods, we can now decide based upon these whether
a bus driver supports c45 and we can get rid of the not widely used
probe_capabilites.
Unfortunately, due to a now broader support of c45 scans, this will
trigger a bug on some boards with a (c22-only) Micrel PHY. These PHYs
don't ignore c45 accesses correctly, thinking they are addressed
themselves and distrupt the MDIO access. To avoid this, a blacklist
for c45 scans is introduced.
====================
Link: https://lore.kernel.org/r/20230116-net-next-remove-probe-capabilities-v2-0-15513b05e1f4@walle.cc
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Deciding if to probe of PHYs using C45 is now determine by if the bus
provides the C45 read method. This makes probe_capabilities redundant
so remove it.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Some PHYs provide invalid IDs in C22 space. If C45 is supported on the
bus an attempt can be made to get the IDs from the C45 space. Decide
on this based on the presence of the C45 read method in the bus
structure. This will allow the unreliable probe_capabilities to be
removed.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Now that all MDIO bus drivers which set probe_capabilities to
MDIOBUS_C22_C45 have been converted to use the name API for C45
transactions, perform the scanning of the bus based on which methods
the bus provides.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
After scanning the bus for C22 devices, check if any Micrel PHYs have
been found. They are known to do bad things if there are C45
transactions on the bus. Prevent the scanning of the bus using C45 if
such a PHY has been detected.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Some C22 PHYs do bad things when there are C45 transactions on the
bus. In order to handle this, the bus needs to be scanned first for
C22 at all addresses, and then C45 scanned for all addresses.
The Marvell pxa168 driver scans a specific address on the bus to find
its PHY. This is a C22 only device, so update it to use the c22
helper.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
No functional change, just place it earlier in preparation for some
refactoring.
While at it, correct the comment format and one typo.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmPHp2QACgkQSD+KveBX
+j6M/QgAoyMrhYAKC8xiD/yYMvzpQUbl1iMZpK+YhTMMbYCnnsNZSNHLpq2CRu/B
hqNCBvMOgZD37413UcyskpLuPSNUTQPQthKxui1R3BQyCMjvw3Qoso3ALGkxGh+M
OIak6FPx6Mh198FXGFFjK5kvmlrN4Z+A9vdIVP725M27JqhX4czOJDBVXBukV4+L
yxmRxVk926iMNUs4j4+iVlFbAssUEqQVbjg+eEZUkb60mgKLXK9lTE3+5ngybeZm
qjluUa7mukUMcR1fmR6jAiuHQSkn3xWHbS1ztTm14PTkpomn5JQ/mSqY6jxVfl1u
1AGNmx2k+Cca2z8/5MndhVafLOSe6g==
=tzWU
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2023-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
This series provides bug fixes to mlx5 driver.
* tag 'mlx5-fixes-2023-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net: mlx5: eliminate anonymous module_init & module_exit
net/mlx5: E-switch, Fix switchdev mode after devlink reload
net/mlx5e: Protect global IPsec ASO
net/mlx5e: Remove optimization which prevented update of ESN state
net/mlx5e: Set decap action based on attr for sample
net/mlx5e: QoS, Fix wrongfully setting parent_element_id on MODIFY_SCHEDULING_ELEMENT
net/mlx5: E-switch, Fix setting of reserved fields on MODIFY_SCHEDULING_ELEMENT
net/mlx5e: Remove redundant xsk pointer check in mlx5e_mpwrq_validate_xsk
net/mlx5e: Avoid false lock dependency warning on tc_ht even more
net/mlx5: fix missing mutex_unlock in mlx5_fw_fatal_reporter_err_work()
====================
Link: https://lore.kernel.org/r/
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The commit 4af1b64f80 ("octeontx2-pf: Fix lmtst ID used in aura
free") uses the get/put_cpu() to protect the usage of percpu pointer
in ->aura_freeptr() callback, but it also unnecessarily disable the
preemption for the blockable memory allocation. The commit 87b93b678e
("octeontx2-pf: Avoid use of GFP_KERNEL in atomic context") tried to
fix these sleep inside atomic warnings. But it only fix the one for
the non-rt kernel. For the rt kernel, we still get the similar warnings
like below.
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
3 locks held by swapper/0/1:
#0: ffff800009fc5fe8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x24/0x30
#1: ffff000100c276c0 (&mbox->lock){+.+.}-{3:3}, at: otx2_init_hw_resources+0x8c/0x3a4
#2: ffffffbfef6537e0 (&cpu_rcache->lock){+.+.}-{2:2}, at: alloc_iova_fast+0x1ac/0x2ac
Preemption disabled at:
[<ffff800008b1908c>] otx2_rq_aura_pool_init+0x14c/0x284
CPU: 20 PID: 1 Comm: swapper/0 Tainted: G W 6.2.0-rc3-rt1-yocto-preempt-rt #1
Hardware name: Marvell OcteonTX CN96XX board (DT)
Call trace:
dump_backtrace.part.0+0xe8/0xf4
show_stack+0x20/0x30
dump_stack_lvl+0x9c/0xd8
dump_stack+0x18/0x34
__might_resched+0x188/0x224
rt_spin_lock+0x64/0x110
alloc_iova_fast+0x1ac/0x2ac
iommu_dma_alloc_iova+0xd4/0x110
__iommu_dma_map+0x80/0x144
iommu_dma_map_page+0xe8/0x260
dma_map_page_attrs+0xb4/0xc0
__otx2_alloc_rbuf+0x90/0x150
otx2_rq_aura_pool_init+0x1c8/0x284
otx2_init_hw_resources+0xe4/0x3a4
otx2_open+0xf0/0x610
__dev_open+0x104/0x224
__dev_change_flags+0x1e4/0x274
dev_change_flags+0x2c/0x7c
ic_open_devs+0x124/0x2f8
ip_auto_config+0x180/0x42c
do_one_initcall+0x90/0x4dc
do_basic_setup+0x10c/0x14c
kernel_init_freeable+0x10c/0x13c
kernel_init+0x2c/0x140
ret_from_fork+0x10/0x20
Of course, we can shuffle the get/put_cpu() to only wrap the invocation
of ->aura_freeptr() as what commit 87b93b678e does. But there are only
two ->aura_freeptr() callbacks, otx2_aura_freeptr() and
cn10k_aura_freeptr(). There is no usage of perpcu variable in the
otx2_aura_freeptr() at all, so the get/put_cpu() seems redundant to it.
We can move the get/put_cpu() into the corresponding callback which
really has the percpu variable usage and avoid the sprinkling of
get/put_cpu() in several places.
Fixes: 4af1b64f80 ("octeontx2-pf: Fix lmtst ID used in aura free")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Link: https://lore.kernel.org/r/20230118071300.3271125-1-haokexin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pavithra Sathyanarayanan says:
====================
generic implementation of phy interface and fixed_phy support for the LAN743x device
This patch series includes the following changes:
- Remove the unwanted interface settings in the LAN743x driver as
it is preset in EEPROM configurations.
- Handle generic implementation for the phy interfaces for different
devices LAN7430/31 and pci11x1x.
- Add new feature for fixed_phy support at 1Gbps full duplex for the
LAN7431 device if a phy not found over MDIO. Includes support for
communication between a MAC in a LAN7431 device and custom phys
without an MDIO interface.
====================
Link: https://lore.kernel.org/r/20230117141614.4411-1-Pavithra.Sathyanarayanan@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add fixed_phy support at 1Gbps full duplex for the lan7431 device
if a phy not found over MDIO. Tested with a MAC to MAC connection
from LAN7431 to a KSZ9893 switch. This avoids the Driver open error
in LAN743x. TX delay and internal CLK125 generation is already
enabled in EEPROM.
Signed-off-by: Pavithra Sathyanarayanan <Pavithra.Sathyanarayanan@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add logic to read the Phy interface from MAC_CR register for LAN743x
driver.
Checks for the LAN7430/31 or pci11x1x devices and the adapter
interface is updated accordingly. For LAN7431, adapter interface is set
based on Bit 19 of MAC_CR register as MII or RGMII which removes the
forced RGMII/GMII configurations in lan743x_phy_open().
Signed-off-by: Pavithra Sathyanarayanan <Pavithra.Sathyanarayanan@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Remove the MII/RGMII Selection settings in driver as it is preset
by the EEPROM and has the required configurations before the driver
loads for LAN743x.
Signed-off-by: Pavithra Sathyanarayanan <Pavithra.Sathyanarayanan@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
There are some issues with the bpf/nat6to4.c building.
1. It use TEST_CUSTOM_PROGS, which will add the nat6to4.o to
kselftest-list file and run by common run_tests.
2. When building the test via `make -C tools/testing/selftests/
TARGETS="net"`, the nat6to4.o will be build in selftests/net/bpf/
folder. But in test udpgro_frglist.sh it refers to ../bpf/nat6to4.o.
The correct path should be ./bpf/nat6to4.o.
3. If building the test via `make -C tools/testing/selftests/ TARGETS="net"
install`. The nat6to4.o will be installed to kselftest_install/net/
folder. Then the udpgro_frglist.sh should refer to ./nat6to4.o.
To fix the confusing test path, let's just move the nat6to4.c to net folder
and build it as TEST_GEN_FILES.
Fixes: edae34a3ed ("selftests net: add UDP GRO fraglist + bpf self-tests")
Tested-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20230118020927.3971864-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
While one cpu is working on looking up the right socket from ehash
table, another cpu is done deleting the request socket and is about
to add (or is adding) the big socket from the table. It means that
we could miss both of them, even though it has little chance.
Let me draw a call trace map of the server side.
CPU 0 CPU 1
----- -----
tcp_v4_rcv() syn_recv_sock()
inet_ehash_insert()
-> sk_nulls_del_node_init_rcu(osk)
__inet_lookup_established()
-> __sk_nulls_add_node_rcu(sk, list)
Notice that the CPU 0 is receiving the data after the final ack
during 3-way shakehands and CPU 1 is still handling the final ack.
Why could this be a real problem?
This case is happening only when the final ack and the first data
receiving by different CPUs. Then the server receiving data with
ACK flag tries to search one proper established socket from ehash
table, but apparently it fails as my map shows above. After that,
the server fetches a listener socket and then sends a RST because
it finds a ACK flag in the skb (data), which obeys RST definition
in RFC 793.
Besides, Eric pointed out there's one more race condition where it
handles tw socket hashdance. Only by adding to the tail of the list
before deleting the old one can we avoid the race if the reader has
already begun the bucket traversal and it would possibly miss the head.
Many thanks to Eric for great help from beginning to end.
Fixes: 5e0724d027 ("tcp/dccp: fix hashdance race for passive sessions")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/lkml/20230112065336.41034-1-kerneljasonxing@gmail.com/
Link: https://lore.kernel.org/r/20230118015941.1313-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean says:
====================
ENETC BD ring cleanup
The highlights of this patch set are:
- Installing a BPF program and changing PTP RX timestamping settings are
currently implemented through a port reconfiguration procedure which
triggers an AN restart on the PHY, and these procedures are not
generally guaranteed to leave the port in a sane state. Patches 9/12
and 11/12 address that.
- Attempting to put the port down (or trying to reconfigure it) has the
driver oppose some resistance if it's bombarded with RX traffic
(it won't go down). Patch 12/12 addresses that.
The other 9 patches are just cleanup in the BD ring setup/teardown code,
which gradually led to bringing the driver in a position where resolving
those 2 issues was possible.
====================
Link: https://lore.kernel.org/r/20230117230234.2950873-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
napi_synchronize() from enetc_stop() waits until the softirq has
finished execution and no longer wants to be rescheduled. However under
high traffic load, this will never happen, and the interface can never
be closed.
The problem is the fact that the NAPI poll routine is written to update
the consumer index which makes the device want to put more buffers in
the RX ring, which restarts the madness again.
Browsing around, it seems that some drivers like i40e keep a bit
(__I40E_VSI_DOWN) which they use as communication between the control
path and the data path. But that isn't my first choice, because
complications ensue - since the enetc hardirq may trigger while we are
in a theoretical ENETC_DOWN state, it may happen that enetc_msix() masks
it, but enetc_poll() never unmasks it. To prevent a stall in that case,
one would need to schedule all NAPI instances when ENETC_DOWN gets
cleared, to process what's pending.
I find it more desirable for the control path - enetc_stop() - to just
quiesce the RX ring and let the softirq finish what remains there,
without any explicit communication, just by making hardware not provide
any more packets.
This seems possible with the Enable bit of the RX BD ring (RBaMR[EN]).
I can't seem to find an exact definition of what this bit does, but when
the RX ring is disabled, the port seems to no longer update the producer
index, and not react to software updates of the consumer index.
In fact, the RBaMR[EN] bit is already toggled by the driver, but too
late for what we want:
enetc_close()
-> enetc_stop()
-> napi_synchronize()
-> enetc_clear_bdrs()
-> enetc_clear_rxbdr()
The enetc_clear_bdrs() function contains not only logic to disable the
RX and TX rings, but also logic to wait for the TX ring stop being busy.
We split enetc_clear_bdrs() into enetc_disable_bdrs() and
enetc_wait_bdrs(). One needs to run before napi_synchronize() and the
other after (NAPI also processes TX completions, so we maximize our
chances of not waiting for the ENETC_TBSR_BUSY bit - unless a packet is
stuck for some reason, ofc).
We also split off enetc_enable_bdrs() from enetc_setup_bdrs(), and call
this from the mirror position in enetc_start() compared to enetc_stop(),
i.e. right before netif_tx_start_all_queues().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Offloading a BPF program to the RX path of the driver suffers from the
same problems as the PTP reconfiguration - improper error checking can
leave the driver in an invalid state, and the link on the PHY is lost.
Reuse the enetc_reconfigure() procedure, but here, we need to run some
code in the middle of the ring reconfiguration procedure - while the
interface is still down. Introduce a callback which makes that possible.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Follow the convention from this driver, which is to name "struct
net_device *" as "ndev", and the convention from other drivers, to name
"struct netdev_bpf *" as "bpf".
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The crude enetc_stop() -> enetc_open() mechanism suffers from 2
problems:
1. improper error checking
2. it involves phylink_stop() -> phylink_start() which loses the link
Right now, the driver is prepared to offer a better alternative: a ring
reconfiguration procedure which takes the RX BD size (normal or
extended) as argument. It allocates new resources (failing if that
fails), stops the traffic, and assigns the new resources to the rings.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We want to introduce a fast interface reconfiguration procedure, which
involves temporarily stopping the rings.
But we want enetc_start() and enetc_stop() to not restart PHY autoneg,
because that can take a few seconds until it completes again.
So we need part of enetc_start() and enetc_stop(), but not all of them.
Move phylink_start() right next to phylink_of_phy_connect(), and
phylink_stop() right next to phylink_disconnect_phy(), both still in
ndo_open() and ndo_stop().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We have a few instances in the enetc driver where the ring resources
(BD ring iomem, software BD ring, software TSO headers, basically
everything except RX buffers) need to be reallocated. For example, when
RX timestamping is enabled, the RX BD format changes to an extended one
(twice as large).
Currently, this is done using a simplistic enetc_close() -> enetc_open()
procedure. But this is quite crude, since it also invokes phylink_stop()
-> phylink_start(), the link is lost, and a few seconds need to pass for
autoneg to complete again.
In fact it's bad also due to the improper (yolo) error checking. In case
we fail to allocate new resources, we've already freed the old ones, so
the interface is more or less stuck.
To avoid that, we need a system where reconfiguration is possible in a
way in which resources are allocated upfront. This means that there will
be a higher memory usage temporarily, but the assignment of resources to
rings can be done when both the old and new resources are still available.
Introduce a struct enetc_bdr_resource which holds the resources for a
ring, be it RX or TX. This structure duplicates a lot of fields from
struct enetc_bdr (and access to the same fields in the ring structure
was left duplicated, to not change cache characteristics in the fast
path).
When enetc_alloc_tx_resources() runs, it returns an array of resource
elements (one per TX ring), in addition to the existing priv->tx_res.
To populate priv->tx_res with that array, one must call
enetc_assign_tx_resources(), and this also frees the old resources.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Extended RX buffer descriptors are necessary if they carry RX
timestamps, which will be true when PTP timestamping is enabled.
Right now, the rx_ring->ext_en is set from the function that allocates
ring resources (enetc_alloc_rx_resources() -> enetc_alloc_rxbdr()), and
also used later, in enetc_setup_rxbdr(). It is also used in the
enetc_rxbd() and enetc_rxbd_next() fast path helpers.
We want to decouple resource allocation from BD ring setup, but both
procedures depend on BD size (extended or not). Move the "extended"
boolean to enetc_open() and pass it both to the RX allocation procedure
as well as to the RX ring setup procedure. The latter will set
rx_ring->ext_en from now on.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The call path in enetc_close() is:
enetc_close()
-> enetc_free_rxtx_rings()
-> enetc_free_tx_ring()
-> enetc_free_tx_frame()
-> enetc_free_tx_resources()
-> enetc_free_txbdr()
-> enetc_free_tx_frame()
The enetc_free_tx_frame() function is written such that the second call
exits without doing anything, but nonetheless, it is completely
redundant. Delete it. This makes the TX teardown path more similar to
the RX one, where rx_swbd freeing is done in enetc_free_rx_ring(), not
in enetc_free_rxbdr().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The call path in enetc_close() is:
enetc_close()
-> enetc_free_rxtx_rings()
-> enetc_free_rx_ring()
-> tests whether rx_ring->rx_swbd is NULL
-> enetc_free_tx_ring()
-> tests whether tx_ring->tx_swbd is NULL
-> enetc_free_rx_resources()
-> enetc_free_rxbdr()
-> sets rxr->rx_swbd to NULL
-> enetc_free_tx_resources()
-> enetc_free_txbdr()
-> setx txr->tx_swbd to NULL
From the above, it is clear that due to the function ordering, the
checks for NULL are redundant, since the software buffer descriptor
arrays have not yet been set to NULL. Drop these checks.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is a refactoring change which introduces the opposite function of
enetc_dma_alloc_bdr().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is only one place which needs to set up indices in the RX ring.
Be consistent with what was done in the TX path and do this in
enetc_setup_rxbdr().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
enetc_alloc_txbdr() deals with allocating resources necessary for a TX
ring to work (the array of software BDs and the array of TSO headers).
The next_to_clean and next_to_use pointers are overwritten with proper
values which are read from hardware here:
enetc_open
-> enetc_alloc_tx_resources
-> enetc_alloc_txbdr
-> set to zero
-> enetc_setup_bdrs
-> enetc_setup_txbdr
-> read from hardware
So their initialization with zeroes is pointless and confusing.
Delete it.
Consequently, since enetc_setup_txbdr() has no opposite cleanup
function, also delete the resetting of these indices from
enetc_free_tx_ring().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 0aa64df30b.
Currently IFF_NO_ADDRCONF is used to prevent all ipv6 addrconf for the
slave ports of team, bonding and failover devices and it means no ipv6
packets can be sent out through these slave ports. However, for team
device, "nsna_ping" link_watch requires ipv6 addrconf. Otherwise, the
link will be marked failure. This patch removes the IFF_NO_ADDRCONF
flag set for team port, and we will fix the original issue in another
patch, as Jakub suggested.
Fixes: 0aa64df30b ("net: team: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/63e09531fc47963d2e4eff376653d3db21b97058.1673980932.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We often have to ping Willem asking for reviews of patches
because he doesn't get included in the CC list. Add MAINTAINERS
entries for some of the areas he covers so that ./scripts/ will
know to add him.
Acked-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Link: https://lore.kernel.org/r/20230117190141.60795-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Third set of fixes for v6.2. This time most of them are for drivers,
only one revert for mac80211. For an important mt76 fix we had to
cherry pick two commits from wireless-next.
-----BEGIN PGP SIGNATURE-----
iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmPHoUcRHGt2YWxvQGtl
cm5lbC5vcmcACgkQbhckVSbrbZtaCgf9G16rPowxI6AD+EliFbArdiwrV+mJHDyN
6eVniHDKgiFnvbyqvh+sGpbuYMwqqfERdxU3qi4+YhVGWyNNQYdJlntggKsTVRKK
gtE6h4zAo2DC6F+/zYt/FkQ6mCK6UQsaHDktGEqRP0vxH8Kdk85+yXEuwklI+L1L
w5ZTZ3HRxdtMhF9AmjCVOUrEEFXosanYTwSZ+1nlMEZ8vc5Wg5TH9wgue3Eg+9vx
vUjfRrrjOAlvGCcb9lVvPseH7n0m/U2JbugQkebuEUvzo4Fxcl2mR9pFXLGGbtAM
gseuNUfJftKVyYlTLc8brI7XpSSx6pV75h1EmvrHPkjiw1oSGeK8ig==
=13z+
-----END PGP SIGNATURE-----
Merge tag 'wireless-2023-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v6.2
Third set of fixes for v6.2. This time most of them are for drivers,
only one revert for mac80211. For an important mt76 fix we had to
cherry pick two commits from wireless-next.
* tag 'wireless-2023-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
Revert "wifi: mac80211: fix memory leak in ieee80211_if_add()"
wifi: mt76: dma: fix a regression in adding rx buffers
wifi: mt76: handle possible mt76_rx_token_consume failures
wifi: mt76: dma: do not increment queue head if mt76_dma_add_buf fails
wifi: rndis_wlan: Prevent buffer overflow in rndis_query_oid
wifi: brcmfmac: fix regression for Broadcom PCIe wifi devices
wifi: brcmfmac: avoid NULL-deref in survey dump for 2G only device
wifi: brcmfmac: avoid handling disabled channels for survey dump
====================
Link: https://lore.kernel.org/r/20230118073749.AF061C433EF@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIVAwUAY8hR6qZi849r7WBJAQLGAQ//fG5t9Iuwxh1OYdSneEaA0nf36+c2VhR/
+znwR0O3EVwMPtkf4L50kwkCi0lpP7i+AeXPPTzAsqUveFUmgYeb/Lt5SFnzI/oi
+ffZcVhrfNkeinfRlTcBtN0GRdnYMwF/eLVEfv0SP5fWrYDwbPTIRhfx2Buc3Irt
CEDbpX1XaehZhwWWWIhCl1kSbIys1f09egLwqoiS4tQsfcD0i3ok5eUdyBdaypDp
sRi5CTorB8S22Vpa/mPGHyckZkv5FUUHC5qPWi3lyF7il54w0fN9WkahOWozmYV4
qmjoAN9fTHrGOV/regBc6GtXv5e4at2BI1b17XajaAGDqt+tindHBO+Atd/Vnsx2
ojSehQRG89RDoXphahAl5dFAZRrdk+NJCvl8lPBGwTArHUgk4Th+lHk2+xgxI46n
gvOxgfhNwfV+/AT+/XFdSMpIB9ZGCTNnNE+oR2K6weCHdDLmOw9LQWWMDhV4DQEG
nKiPE0svSkQiqsgsa7jg6ZI157XdxaKbL3ecYjiQPZlAf8S4kRYcKgMj64/8zWqm
edJMTm+P4IzF9HR/LdtZo9HfkrluV5MOMImDZ/cbQs1bJIxvJrRbUEIUbvv/7/cz
kkUmoJCoxo3syWO9F2fQsgho91g7yDBDmFzYk19Gl5c8dVb/kzcWIeQdU/wpzGJf
O6mpwDyf55U=
=QW+x
-----END PGP SIGNATURE-----
Merge tag 'for-linus-2023011801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- fixes for potential empty list handling in HID core (Pietro Borrello)
- fix for NULL pointer dereference in betop driver that could be
triggered by malicious device (Pietro Borrello)
- fixes for handling calibration data preventing division by zero in
Playstation driver (Roderick Colenbrander)
- fix for memory leak on error path in amd-sfh driver (Basavaraj
Natikar)
- other few assorted small fixes and device ID-specific handling
* tag 'for-linus-2023011801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: betop: check shape of output reports
HID: playstation: sanity check DualSense calibration data.
HID: playstation: sanity check DualShock4 calibration data.
HID: uclogic: Add support for XP-PEN Deco 01 V2
HID: revert CHERRY_MOUSE_000C quirk
HID: check empty report_list in bigben_probe()
HID: check empty report_list in hid_validate_values()
HID: amd_sfh: Fix warning unwind goto
HID: intel_ish-hid: Add check for ishtp_dma_tx_map
In commit 367dfa1212 ("net/mlx5: Remove devl_unlock from
mlx5_eswtich_mode_callback_enter") all functions were converted
to use write lock without relation to their actual purpose.
Change the devlink eswitch getters to use read and not write locks.
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5_add_flow_rules supports creating rules without any matches by passing NULL
pointer instead of spec, if NULL is passed it will use a static empty spec.
This make allocation of spec in mlx5_create_indir_fwd_group unnecessary.
Remove the redundant allocation.
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Today VF tunnel offload (tunnel endpoint is on VF) is implemented
by indirect table which use rules that match on VXLAN VNI to
recirculated to root table, this limit the support for only
VXLAN tunnels.
This patch change indirect table to use one single match all rule
to recirculated to root table which is added when any tunnel decap
rule is added with tunnel endpoint is VF. This allow support of
Geneve and GRE with this configuration.
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
To avoid memory leaks add a warn when destroying mod hdr hash table
but the hash table is not empty.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Use mlx5e_tc_attach_mod_hdr() when allocating encap mod hdr and
remove mlx5e_tc_add_flow_mod_hdr() which is not being used now.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently there are confusing names for attach/detach functions.
mlx5e_attach_mod_hdr() vs mlx5e_mod_hdr_attach()
mlx5e_detach_mod_hdr() vs mlx5e_mod_hdr_detach()
Add tc prefix to the functions that are in en_tc.c to separate
from the functions in mod_hdr.c which has the mod_hdr prefix.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In preparation to remove duplicate functions handling mod hdr allocation
and the fact that modify hdr should be per flow attr and not flow
pass flow attr to the attach and detach mod hdr funcs.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add warning macro in the function mlx5e_mpwqe_get_log_num_strides()
when log WQE size is smaller than log stride size. Theoretically this
should not happen.
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Current XSK prerequisites validation implementation
(setup.c/mlx5e_validate_xsk_param()) fails silently when xsk
prerequisites are not fulfilled.
Add error messages to the kernel log to help the user understand what
went wrong when params are not valid for XSK.
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Meta rules are created/destroyed per vport and not in eswitch
init/destroy.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>