Commit Graph

1281870 Commits

Author SHA1 Message Date
Thorsten Blum
47c130130d l2tp: Remove duplicate included header file trace.h
Remove duplicate included header file trace.h and the following warning
reported by make includecheck:

  trace.h is included more than once

Compile-tested only.

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/20240703061147.691973-2-thorsten.blum@toblux.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-04 12:37:58 +02:00
Jakub Kicinski
e19f67df9c Merge branch 'selftests-openvswitch-address-some-flakes-in-the-ci-environment'
Aaron Conole says:

====================
selftests: openvswitch: Address some flakes in the CI environment

These patches aim to make using the openvswitch testsuite more reliable.
These should address the major sources of flakiness in the openvswitch
test suite allowing the CI infrastructure to exercise the openvswitch
module for patch series.  There should be no change for users who simply
run the tests (except that patch 3/3 does make some of the debugging a bit
easier by making some output more verbose).
====================

Link: https://patch.msgid.link/20240702132830.213384-1-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03 19:29:18 -07:00
Aaron Conole
7abfd8ecb7 selftests: openvswitch: Be more verbose with selftest debugging.
The openvswitch selftest is difficult to debug for anyone that isn't
directly familiar with the openvswitch module and the specifics of the
test cases.  Many times when something fails, the debug log will be
sparsely populated and it takes some time to understand where a failure
occured.

Increase the amount of details logged to the debug log by trapping all
'info' logs, and all 'ovs_sbx' commands.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240702132830.213384-4-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03 19:29:15 -07:00
Aaron Conole
818481db3d selftests: openvswitch: Attempt to autoload module.
Previously, the openvswitch.sh test suites would not attempt to autoload
the openvswitch module.  The idea was that a user who is manually running
tests might not even have the OVS module loaded or configured for their
own development.  However, if the kernel module is configured, and the
module can be autoloaded then we should just attempt to load it and run
the tests.  This is especially true in the CI environments, where the CI
tests should be able to rely on auto loading to get the test suite running.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240702132830.213384-3-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03 19:29:15 -07:00
Aaron Conole
ff015706fc selftests: openvswitch: Bump timeout to 15 minutes.
We found that since some tests rely on the TCP SYN timeouts to cause flow
misses, the default test suite timeout of 45 seconds is quick to be
exceeded.  Bump the timeout to 15 minutes.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240702132830.213384-2-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03 19:29:15 -07:00
Jakub Kicinski
1a16cdf77e net: ethtool: fix compat with old RSS context API
Device driver gets access to rxfh_dev, while rxfh is just a local
copy of user space params. We need to check what RSS context ID
driver assigned in rxfh_dev, not rxfh.

Using rxfh leads to trying to store all contexts at index 0xffffffff.
From the user perspective it leads to "driver chose duplicate ID"
warnings when second context is added and inability to access any
contexts even tho they were successfully created - xa_load() for
the actual context ID will return NULL, and syscall will return -ENOENT.

Looks like a rebasing mistake, since rxfh_dev was added relatively
recently by commit fb6e30a725 ("net: ethtool: pass a pointer to
parameters to get/set_rxfh ethtool ops").

Fixes: eac9122f0c ("net: ethtool: record custom RSS contexts in the XArray")
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20240702164157.4018425-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03 19:22:17 -07:00
Jakub Kicinski
0b8774586b selftests: drv-net: rss_ctx: allow more noise on default context
As predicted by David running the test on a machine with a single
interface is a bit unreliable. We try to send 20k packets with
iperf and expect fewer than 10k packets on the default context.
The test isn't very quick, iperf will usually send 100k packets
by the time we stop it. So we're off by 5x on the number of iperf
packets but still expect default context to only get the hardcoded
10k. The intent is to make sure we get noticeably less traffic
on the default context. Use half of the resulting iperf traffic
instead of the hard coded 10k.

Link: https://patch.msgid.link/20240702233728.4183387-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03 19:13:59 -07:00
Paolo Abeni
8c5a9f290e tools: ynl: use ident name for Family, too.
This allow consistent naming convention between Family and others
element's name.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/9bbcab3094970b371bd47aa18481ae6ca5a93687.1719930479.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-03 19:13:20 -07:00
Xin Long
cda91d5b91 sctp: cancel a blocking accept when shutdown a listen socket
As David Laight noticed,

"In a multithreaded program it is reasonable to have a thread blocked in
 accept(). With TCP a subsequent shutdown(listen_fd, SHUT_RDWR) causes
 the accept to fail. But nothing happens for SCTP."

sctp_disconnect() is eventually called when shutdown a listen socket,
but nothing is done in this function. This patch sets RCV_SHUTDOWN
flag in sk->sk_shutdown there, and adds the check (sk->sk_shutdown &
RCV_SHUTDOWN) to break and return in sctp_accept().

Note that shutdown() is only supported on TCP-style SCTP socket.

Reported-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-03 09:45:39 +01:00
Lucas Stach
2e3ed20c17 net: dsa: microchip: lan937x: disable VPHY support
As described by the microchip article "LAN937X - The required
configuration for the external MAC port to operate at RGMII-to-RGMII
1Gbps link speed." [1]:

"When VPHY is enabled, the auto-negotiation process following IEEE 802.3
standard will be triggered and will result in RGMII-to-RGMII signal
failure on the interface because VPHY will try to poll the PHY status
that is not available in the scenario of RGMII-to-RGMII connection
(normally the link partner is usually an external processor).

Note that when VPHY fails on accessing PHY registers, it will fall back
to 100Mbps speed, it indicates disabling VPHY is optional if you only
need the port to link at 100Mbps speed.

Again, VPHY must and can only be disabled by writing VPHY_DISABLE bit in
the register below as there is no strapping pin for the control."

This patch was tested on LAN9372, so far it seems to not to affect VPHY
based clock crossing optimization for the ports with integrated PHYs.

[1]: https://microchip.my.site.com/s/article/LAN937X-The-required-configuration-for-the-external-MAC-port-to-operate-at-RGMII-to-RGMII-1Gbps-link-speed

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-03 09:13:39 +01:00
Lucas Stach
c3db39468a net: dsa: microchip: lan937x: disable in-band status support for RGMII interfaces
This driver do not support in-band mode and in case of CPU<->Switch
link, this mode is not working any way. So, disable it otherwise ingress
path of the switch MAC will stay disabled.

Note: lan9372 manual do not document 0xN301 BIT(2) for the RGMII mode
and recommend[1] to disable in-band link status update for the RGMII RX
path by clearing 0xN302 BIT(0). But, 0xN301 BIT(2) seems to work too, so
keep it unified with other KSZ switches.

[1] https://microchip.my.site.com/s/article/LAN937X-The-required-configuration-for-the-external-MAC-port-to-operate-at-RGMII-to-RGMII-1Gbps-link-speed

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-03 09:13:39 +01:00
Lucas Stach
8d7330b3a9 net: dsa: microchip: lan9371/2: add 100BaseTX PHY support
On the LAN9371 and LAN9372, the 4th internal PHY is a 100BaseTX PHY
instead of a 100BaseT1 PHY. The 100BaseTX PHYs have a different base
register offset.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-03 09:13:38 +01:00
Jakub Kicinski
df18948d33 Merge branch 'device-memory-tcp'
Prep patches for Device Memory TCP

Pick up a couple of prep patches for Device Memory TCP which
stand on their own.

Link: https://patch.msgid.link/20240628003253.1694510-1-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-02 18:59:33 -07:00
Jakub Kicinski
07c3cc51a0 tools: net: package libynl for use in selftests
Support building the C YNL userspace library into one big static file.
We can then link selftests against it for easy to use C netlink
interface.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20240628003253.1694510-14-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-02 18:59:33 -07:00
Mina Almasry
4dec64c52e page_pool: convert to use netmem
Abstract the memory type from the page_pool so we can later add support
for new memory types. Convert the page_pool to use the new netmem type
abstraction, rather than use struct page directly.

As of this patch the netmem type is a no-op abstraction: it's always a
struct page underneath. All the page pool internals are converted to
use struct netmem instead of struct page, and the page pool now exports
2 APIs:

1. The existing struct page API.
2. The new struct netmem API.

Keeping the existing API is transitional; we do not want to refactor all
the current drivers using the page pool at once.

The netmem abstraction is currently a no-op. The page_pool uses
page_to_netmem() to convert allocated pages to netmem, and uses
netmem_to_page() to convert the netmem back to pages to pass to mm APIs,

Follow up patches to this series add non-paged netmem support to the
page_pool. This change is factored out on its own to limit the code
churn to this 1 patch, for ease of code review.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/20240628003253.1694510-6-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-02 18:59:33 -07:00
Paolo Abeni
ac26327635 Merge branch 'fixes-for-stm32-dwmac-driver-fails-to-probe'
Christophe Roullier says:

====================
Fixes for stm32-dwmac driver fails to probe

Mark Brown found issue during stm32-dwmac probe:

For the past few days networking has been broken on the Avenger 96, a
stm32mp157a based platform.  The stm32-dwmac driver fails to probe:

<6>[    1.894271] stm32-dwmac 5800a000.ethernet: IRQ eth_wake_irq not found
<6>[    1.899694] stm32-dwmac 5800a000.ethernet: IRQ eth_lpi not found
<6>[    1.905849] stm32-dwmac 5800a000.ethernet: IRQ sfty not found
<3>[    1.912304] stm32-dwmac 5800a000.ethernet: Unable to parse OF data
<3>[    1.918393] stm32-dwmac 5800a000.ethernet: probe with driver stm32-dwmac failed with error -75

which looks a bit odd given the commit contents but I didn't look at the
driver code at all.

Full boot log here:

   https://lava.sirena.org.uk/scheduler/job/467150

A working equivalent is here:

   https://lava.sirena.org.uk/scheduler/job/466518

I delivered 2 fixes to solve issue.
====================

Link: https://patch.msgid.link/20240701064838.425521-1-christophe.roullier@foss.st.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 15:47:17 +02:00
Christophe Roullier
f8dbe58e2f net: stmmac: dwmac-stm32: update err status in case different of stm32mp13
The mask parameter of syscfg property is mandatory for MP13 but
optional for all other cases.
The function should not return error code because for non-MP13
the missing syscfg phandle in DT is not considered an error.
So reset err to 0 in that case to support existing DTs without
syscfg phandle.

Fixes: 50bbc03931 ("net: stmmac: dwmac-stm32: add management of stm32mp13 for stm32")
Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 15:47:13 +02:00
Christophe Roullier
3cd1d4651c net: stmmac: dwmac-stm32: Add test to verify if ETHCK is used before checking clk rate
When we want to use clock from RCC to clock Ethernet PHY (with ETHCK)
we need to check if value of clock rate is authorized.
If ETHCK is unused, the ETHCK frequency is 0Hz and validation fails.
It makes no sense to validate unused ETHCK, so skip the validation.

Fixes: 582ac13496 ("net: stmmac: dwmac-stm32: Separate out external clock rate validation")
Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 15:47:09 +02:00
Serge Semin
d01e0e98de dt-bindings: net: dwmac: Validate PBL for all IP-cores
Indeed the maximum DMA burst length can be programmed not only for DW
xGMACs, Allwinner EMACs and Spear SoC GMAC, but in accordance with
[1, 2, 3] for Generic DW *MAC IP-cores. Moreover the STMMAC driver parses
the property and then apply the configuration for all supported DW MAC
devices. All of that makes the property being available for all IP-cores
the bindings supports. Let's make sure the PBL-related properties are
validated for all of them by the common DW *MAC DT schema.

[1] DesignWare Cores Ethernet MAC Universal Databook, Revision 3.73a,
    October 2013, p.378.

[2] DesignWare Cores Ethernet Quality-of-Service Databook, Revision 5.10a,
    December 2017, p.1223.

[3] DesignWare Cores XGMAC - 10G Ethernet MAC Databook, Revision 2.11a,
    September 2015, p.469-473.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20240628154515.8783-1-fancer.lancer@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 15:34:50 +02:00
Paolo Abeni
2a01a88950 Merge branch 'net-bpf_net_context-cleanups'
Sebastian Andrzej Siewior says:

====================
net: bpf_net_context cleanups.

a small series with bpf_net_context cleanups/ improvements.
Jakub asked for #1 and #2 and while looking around I made #3.
====================

Link: https://patch.msgid.link/20240628103020.1766241-1-bigeasy@linutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 15:27:00 +02:00
Sebastian Andrzej Siewior
e3d69f585d net: Move flush list retrieval to where it is used.
The bpf_net_ctx_get_.*_flush_list() are used at the top of the function.
This means the variable is always assigned even if unused. By moving the
function to where it is used, it is possible to delay the initialisation
until it is unavoidable.
Not sure how much this gains in reality but by looking at bq_enqueue()
(in devmap.c) gcc pushes one register less to the stack. \o/.

 Move flush list retrieval to where it is used.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 15:26:57 +02:00
Sebastian Andrzej Siewior
d839a73179 net: Optimize xdp_do_flush() with bpf_net_context infos.
Every NIC driver utilizing XDP should invoke xdp_do_flush() after
processing all packages. With the introduction of the bpf_net_context
logic the flush lists (for dev, CPU-map and xsk) are lazy initialized
only if used. However xdp_do_flush() tries to flush all three of them so
all three lists are always initialized and the likely empty lists are
"iterated".
Without the usage of XDP but with CONFIG_DEBUG_NET the lists are also
initialized due to xdp_do_check_flushed().

Jakub suggest to utilize the hints in bpf_net_context and avoid invoking
the flush function. This will also avoiding initializing the lists which
are otherwise unused.

Introduce bpf_net_ctx_get_all_used_flush_lists() to return the
individual list if not-empty. Use the logic in xdp_do_flush() and
xdp_do_check_flushed(). Remove the not needed .*_check_flush().

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 15:26:57 +02:00
Sebastian Andrzej Siewior
2896624be3 net: Remove task_struct::bpf_net_context init on fork.
There is no clone() invocation within a bpf_net_ctx_…() block. Therefore
the task_struct::bpf_net_context has always to be NULL and an explicit
initialisation is not required.

Remove the NULL assignment in the clone() path.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 15:26:57 +02:00
Paolo Abeni
e27d7168f0 Merge branch 'page_pool-bnxt_en-unlink-old-page-pool-in-queue-api-using-helper'
David Wei says:

====================
page_pool: bnxt_en: unlink old page pool in queue api using helper

56ef27e3 unexported page_pool_unlink_napi() and renamed it to
page_pool_disable_direct_recycling(). This is because there was no
in-tree user of page_pool_unlink_napi().

Since then Rx queue API and an implementation in bnxt got merged. In the
bnxt implementation, it broadly follows the following steps: allocate
new queue memory + page pool, stop old rx queue, swap, then destroy old
queue memory + page pool.

The existing NAPI instance is re-used so when the old page pool that is
no longer used but still linked to this shared NAPI instance is
destroyed, it will trigger warnings.

In my initial patches I unlinked a page pool from a NAPI instance
directly. Instead, export page_pool_disable_direct_recycling() and call
that instead to avoid having a driver touch a core struct.
====================

Link: https://patch.msgid.link/20240627030200.3647145-1-dw@davidwei.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 15:00:14 +02:00
David Wei
40eca00ae6 bnxt_en: unlink page pool when stopping Rx queue
Have bnxt call page_pool_disable_direct_recycling() to unlink the old
page pool when resetting a queue prior to destroying it, instead of
touching a netdev core struct directly.

Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 15:00:11 +02:00
David Wei
d7f39aee79 page_pool: export page_pool_disable_direct_recycling()
56ef27e3 unexported page_pool_unlink_napi() and renamed it to
page_pool_disable_direct_recycling(). This is because there was no
in-tree user of page_pool_unlink_napi().

Since then Rx queue API and an implementation in bnxt got merged. In the
bnxt implementation, it broadly follows the following steps: allocate
new queue memory + page pool, stop old rx queue, swap, then destroy old
queue memory + page pool.

The existing NAPI instance is re-used so when the old page pool that is
no longer used but still linked to this shared NAPI instance is
destroyed, it will trigger warnings.

In my initial patches I unlinked a page pool from a NAPI instance
directly. Instead, export page_pool_disable_direct_recycling() and call
that instead to avoid having a driver touch a core struct.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 15:00:11 +02:00
Paolo Abeni
e2dd0d0593 Merge branch 'zerocopy-tx-cleanups'
Pavel Begunkov says:

====================
zerocopy tx cleanups

Assorted zerocopy send path cleanups, the main part of which is
moving some net stack specific accounting out of io_uring back
to net/ in Patch 4.
====================

Link: https://patch.msgid.link/cover.1719190216.git.asml.silence@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 12:06:52 +02:00
Pavel Begunkov
2ca58ed21c net: limit scope of a skb_zerocopy_iter_stream var
skb_zerocopy_iter_stream() only uses @orig_uarg in the !link_skb path,
and we can move the local variable in the appropriate block.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 12:06:50 +02:00
Pavel Begunkov
060f4ba6e4 io_uring/net: move charging socket out of zc io_uring
Currently, io_uring's io_sg_from_iter() duplicates the part of
__zerocopy_sg_from_iter() charging pages to the socket. It'd be too easy
to miss while changing it in net/, the chunk is not the most
straightforward for outside users and full of internal implementation
details. io_uring is not a good place to keep it, deduplicate it by
moving out of the callback into __zerocopy_sg_from_iter().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 12:06:50 +02:00
Pavel Begunkov
aeb320fc05 net: batch zerocopy_fill_skb_from_iter accounting
Instead of accounting every page range against the socket separately, do
it in batch based on the change in skb->truesize. It's also moved into
__zerocopy_sg_from_iter(), so that zerocopy_fill_skb_from_iter() is
simpler and responsible for setting frags but not the accounting.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 12:06:50 +02:00
Pavel Begunkov
7fb05423fe net: split __zerocopy_sg_from_iter()
Split a function out of __zerocopy_sg_from_iter() that only cares about
the traditional path with refcounted pages and doesn't need to know
about ->sg_from_iter. A preparation patch, we'll improve on the function
later.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 12:06:50 +02:00
Pavel Begunkov
9e2db9d399 net: always try to set ubuf in skb_zerocopy_iter_stream
skb_zcopy_set() does nothing if there is already a ubuf_info associated
with an skb, and since ->link_skb should have set it several lines above
the check here essentially does nothing and can be removed. It's also
safer this way, because even if the callback is faulty we'll
have it set.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 12:06:50 +02:00
Russell King (Oracle)
19e6ad2c75 net: phy: fix potential use of NULL pointer in phy_suspend()
phy_suspend() checks the WoL status, and then dereferences
phydrv->flags if (and only if) we decided that WoL has been enabled
on either the PHY or the netdev.

We then check whether phydrv was NULL, but we've potentially already
dereferenced the pointer.

If phydrv is NULL, then phy_ethtool_get_wol() will return an error
and leave wol.wolopts set to zero. However, if netdev->wol_enabled
is true, then we would dereference a NULL pointer.

Checking the PHY drivers, the only place that phydev->wol_enabled is
checked by them is in their suspend/resume callbacks and nowhere else
(which is correct, because phylib only updates this in phy_suspend()).

So, move the NULL pointer check earlier to avoid a NULL pointer
dereference. Leave the check for phydrv->suspend in place as a driver
may populate the .resume method but not the .suspend method.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1sN8tn-00GDCZ-Jj@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-01 20:14:13 -07:00
Jakub Kicinski
1eea11e937 linux-can-next-for-6.11-20240629
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEUEC6huC2BN0pvD5fKDiiPnotvG8FAmZ/8VUTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRAoOKI+ei28b35qB/44WrOFDefCGNuERFq5/8Kq0FGoDvoa
 ie1PJXjfGbhn5A53pNpB7jOyDnr6DwWVTdF6AMZObHGpqjRT03LuflWqeC9+JIC1
 X6pIAeZVToBMTi8SKTGvK0Fw13dZuxc/QqlHp3S6cd8s+9ETTDnIBN3iQx4vqPTc
 +G0iJUNC49zf9D4F6MsJTYGP8bmzeexTcjOwj+i2oiZvNN/vSrWs+cjC9l8K3Y0a
 b1FL7rv4l+TcUWrSIxbqNUUZBrzA5uV0JFUOokhC3MvLKw+61r3L9DMUUBZBIB+Q
 aFgJyCwX+ju/j1c76lJT2U3EnzNLxego7fLzha5BcftzCk3974Hvr/3y
 =Ykzx
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-next-for-6.11-20240629' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2024-06-29

Geert Uytterhoeven contributes 3 patches with small improvements and
cleanups for the rcar_canfd driver.

A patch by Christophe JAILLET constifies the struct m_can_ops in the
m_can driver to reduce the code size.

The last 9 patches are by me an work around erratum DS80000789E 6 of
mcp2518fd.

* tag 'linux-can-next-for-6.11-20240629' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
  can: mcp251xfd: tef: update workaround for erratum DS80000789E 6 of mcp2518fd
  can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum
  can: mcp251xfd: rx: add workaround for erratum DS80000789E 6 of mcp2518fd
  can: mcp251xfd: rx: prepare to workaround broken RX FIFO head index erratum
  can: mcp251xfd: mcp251xfd_handle_rxif_ring_uinc(): factor out in separate function
  can: mcp251xfd: clarify the meaning of timestamp
  can: mcp251xfd: move mcp251xfd_timestamp_start()/stop() into mcp251xfd_chip_start/stop()
  can: mcp251xfd: update errata references
  can: mcp251xfd: properly indent labels
  can: gs_usb: add VID/PID for Xylanta SAINT3 product family
  can: m_can: Constify struct m_can_ops
  can: rcar_canfd: Remove superfluous parentheses in address calculations
  can: rcar_canfd: Improve printing of global operational state
  can: rcar_canfd: Simplify clock handling
====================

Link: https://patch.msgid.link/20240629114017.1080160-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-01 20:04:59 -07:00
Heng Qi
74d6529b78 net: ethtool: Fix the panic caused by dev being null when dumping coalesce
syzbot reported a general protection fault caused by a null pointer
dereference in coalesce_fill_reply(). The issue occurs when req_base->dev
is null, leading to an invalid memory access.

This panic occurs if dumping coalesce when no device name is specified.

Fixes: f750dfe825 ("ethtool: provide customized dim profile management")
Reported-by: syzbot+e77327e34cdc8c36b7d3@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e77327e34cdc8c36b7d3
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01 13:43:50 +01:00
David S. Miller
f61c72be2d Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue into main
Tony nguyen says:

====================
Intel Wired LAN Driver Updates 2024-06-28 (MAINTAINERS, ice)

This series contains updates to MAINTAINERS file and ice driver.

Jesse replaces himself with Przemek in the maintainers file.

Karthik Sundaravel adds support for VF get/set MAC address via devlink.

Eric checks for errors from ice_vsi_rebuild() during queue
reconfiguration.

Paul adjusts FW API version check for E830 devices.

Piotr adds differentiation of unload type when shutting down AdminQ.

Przemek changes ice_adapter initialization to occur once per physical
card.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01 13:11:57 +01:00
David S. Miller
2e7b471121 Merge branch 'bnxt_en-ptp' into main
Michael Chan says:

====================
bnxt_en: PTP updates for net-next

The first 5 patches implement the PTP feature on the new BCM5760X
chips.  The main new hardware feature is the new TX timestamp
completion which enables the driver to retrieve the TX timestamp
in NAPI without deferring to the PTP worker.

The last 5 patches increase the number of TX PTP packets in-flight
from 1 to 4 on the older BCM5750X chips.  On these older chips, we
need to call firmware in the PTP worker to retrieve the timestamp.
We use an arry to keep track of the in-flight TX PTP packets.

v2: Patch #2: Fix the unwind of txr->is_ts_pkt when bnxt_start_xmit() aborts.
    Patch #4: Set the SKBTX_IN_PROGRESS flag for timestamp packets.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01 11:23:22 +01:00
Pavan Chebbi
0603383907 bnxt_en: Remove atomic operations on ptp->tx_avail
Now that we require the spinlock to protect ptp->txts_prod, change
ptp->tx_avail to non-atomic and protect it under the same spinlock.
Add a new helper function bnxt_ptp_get_txts_prod() to decrement
ptp->tx_avail under spinlock and return the producer.

Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01 11:23:22 +01:00
Pavan Chebbi
8aa2a79e9b bnxt_en: Increase the max total outstanding PTP TX packets to 4
Start accepting up to 4 TX TS requests on BCM5750X (P5) chips.
These PTP TX packets will be queued in the ptp->txts_req[] array
waiting for the TX timestamp to complete.  The entries in the
array will be managed by a producer and consumer index.  The
producer index is updated under spinlock since multiple TX rings
can try to send PTP packets at the same time.

Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01 11:23:21 +01:00
Pavan Chebbi
9bf688d40d bnxt_en: Let bnxt_stamp_tx_skb() return error code
Change the function bnxt_stamp_tx_skb() to return 0 for suceess
or -EAGAIN if the timestamp is still pending in firmware.  The
calling PTP aux worker will reschedule based on the return code.

Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01 11:23:21 +01:00
Pavan Chebbi
573f2a4bfc bnxt_en: Remove an impossible condition check for PTP TX pending SKB
In the current 5750X PTP code paths, there is always at most one TX
SKB requested for timestamp and we won't accept another one until we
have retrieved the timestamp or it has timed out.  Remove the
unnecessary check in bnxt_get_tx_ts_p5() for a pending SKB and change
the function to void.

Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01 11:23:21 +01:00
Pavan Chebbi
92595a0c02 bnxt_en: Refactor all PTP TX timestamp fields into a struct
On the older 5750X (P5) chips, we currently support only 1 TX PTP
packet in-flight waiting for the timestamp.  Refactor the
datastructures to prepare to support up to 4 TX PTP packets.

Combine all fields required for PTP TX timestamp query into one
structure.  An array of this structure will be added in follow-on
patches to support multiple outstanding TX timestamps.

Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01 11:23:21 +01:00
Pavan Chebbi
4d588d32b0 bnxt_en: Add BCM5760X specific PHC registers mapping
BCM5760X firmware will advertise direct 64-bit PHC registers access
for the driver from BAR0.

Make the necessary changes in handling HWRM_PORT_MAC_PTP_QCFG's
response and PHC register mapping for 5760X chips.

Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01 11:23:20 +01:00
Michael Chan
1d294b4f90 bnxt_en: Add TX timestamp completion logic
The new BCM5760X chips will return the timestamp of TX packets in a
new completion.  Add logic in __bnxt_poll_work() to handle this
completion type to retrieve the timestamp.  This feature eliminates
the limit on the number of in-flight PTP TX packets.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01 11:23:20 +01:00
Michael Chan
ba0155f1e9 bnxt_en: Allow some TX packets to be unprocessed in NAPI
The driver's current logic will always free all the TX SKBs up to
txr->tx_hw_cons within NAPI.  In the next patches, we'll be adding
logic to handle TX timestamp completion and we may need to hold
some remaining TX SKBs if we don't have the timestamp completions
yet.

Modify __bnxt_poll_work_done() to clear each event bit separately to
allow bnapi->tx_int() to decide whether to clear BNXT_TX_CMP_EVENT or
not.  bnapi->tx_int() will not clear BNXT_TX_CMP_EVENT if some TX
SKBs are held waiting for TX timestamps.  Note that legacy chips will
never hold any SKBs this way.  The SKB is always deferred to the PTP
worker slow path to retrieve the timestamp from firmware.  On the new
P7 chips, the timestamp is returned by the hardware directly and we
can retrieve it directly from NAPI.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01 11:23:20 +01:00
Michael Chan
449da97512 bnxt_en: Add is_ts_pkt field to struct bnxt_sw_tx_bd
Remove the unused is_gso field and add the is_ts_pkt field to struct
bnxt_sw_tx_bd.  This field will mark the TX BD that has requested
HW TX timestamp.  The field needs to be cleared if the timestamp packet
is later aborted.  This field will be useful when processing the
new TX timestamp completion from the hardware in the next patches.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01 11:23:20 +01:00
Michael Chan
be6b7ca3c2 bnxt_en: Add new TX timestamp completion definitions
The new BCM5760X chips will generate this new TX timestamp completion
when a TX packet's timestamp has been taken right before transmission.
The driver logic to retrieve the timestamp will be added in the next
few patches.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01 11:23:20 +01:00
Nithin Dabilpuram
42c45ac141 octeontx2-af: Sync NIX and NPA contexts from NDC to LLC/DRAM
Octeontx2 hardware uses Near Data Cache(NDC) block to cache
contexts in it so that access to LLC/DRAM can be avoided.
It is recommended in HRM to sync the NDC contents before
releasing/resetting LF resources. Hence implement NDC_SYNC
mailbox and sync contexts during driver teardown.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01 11:02:11 +01:00
FUJITA Tomonori
7433d034ac net: tn40xx: add initial ethtool_ops support
Call phylink_ethtool_ksettings_get() for get_link_ksettings method and
ethtool_op_get_link() for get_link method.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01 11:00:39 +01:00
David S. Miller
1c5fc27bc4 netfilter pull request 24-06-28
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmZ+3ogACgkQ1V2XiooU
 IOR7HRAAsVkJnKLPqV4lcY2Yx/QHi+o1s0pBCTZIqzs2rRXfaYrdu9xV0225DuPn
 xuNNV2GChtWftQxvwcxVLgTGHGG/p8bNYiNJoYEE6acftHZMV4ZZ7NG1yCv2TI3x
 8Udu3vFnvnQhV9Q4LNR3SMtCtz5Z5QP1KNM74uaksN+9opCNniuG23Eft6YXh7Kf
 BYLvJX4pn+St2YTvvnNbA6U/ALxy5OZ/YwXP6FjmERp3AGoFPF2w+MEBmBlyGE3X
 LDKZ05hnKG4Sd/qp7XnZi9kEZoI9iBKg+GPm5ey1BVjZNMCc5hSpCIdYKb8FiwRa
 cN+UCc82H9/N2mJXSrcBDA6n8+lp0dLpfomliERyieY3m38Rp7BKTh6pUOmQCw+H
 bmTJ7rz5WBCC5yjts0N7+2SaVOo+RQpSLXV/SQCIKmk+Xl5sJinvP/gnKWAaPWIm
 3gC4Bv7JUuB6x62EcRzoWGFDw8dXlQ64gvkwyMpeelFIexR3dFCfoA3zAaqJnlxJ
 uZXEF9xuQsZht8IYD37Z6C99tVJzVj/4gCKWKZwi3Kcn/G/MRkQ3lNAPyLewIcMV
 nC1pwU31z1PXNrbSXrXlUEdl1yUzg04wkc4RrVMJgU983kdQdMTp8Q4BbckdhWCV
 4agMNuP4brp6iCvDPamcrWQ+4AbXw/zSdqQr8ONExrOgDUd1ePw=
 =0BtN
 -----END PGP SIGNATURE-----

Merge tag 'nf-next-24-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next into main

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next:

Patch #1 to #11 to shrink memory consumption for transaction objects:

  struct nft_trans_chain { /* size: 120 (-32), cachelines: 2, members: 10 */
  struct nft_trans_elem { /* size: 72 (-40), cachelines: 2, members: 4 */
  struct nft_trans_flowtable { /* size: 80 (-48), cachelines: 2, members: 5 */
  struct nft_trans_obj { /* size: 72 (-40), cachelines: 2, members: 4 */
  struct nft_trans_rule { /* size: 80 (-32), cachelines: 2, members: 6 */
  struct nft_trans_set { /* size: 96 (-24), cachelines: 2, members: 8 */
  struct nft_trans_table { /* size: 56 (-40), cachelines: 1, members: 2 */

  struct nft_trans_elem can now be allocated from kmalloc-96 instead of
  kmalloc-128 slab.

  Series from Florian Westphal. For the record, I have mangled patch #1
  to add nft_trans_container_*() and use if for every transaction object.
   I have also added BUILD_BUG_ON to ensure struct nft_trans always comes
  at the beginning of the container transaction object. And few minor
  cleanups, any new bugs are of my own.

Patch #12 simplify check for SCTP GSO in IPVS, from Ismael Luceno.

Patch #13 nf_conncount key length remains in the u32 bound, from Yunjian Wang.

Patch #14 removes unnecessary check for CTA_TIMEOUT_L3PROTO when setting
          default conntrack timeouts via nfnetlink_cttimeout API, from
          Lin Ma.

Patch #15 updates NFT_SECMARK_CTX_MAXLEN to 4096, SELinux could use
          larger secctx names than the existing 256 bytes length.

Patch #16 adds a selftest to exercise nfnetlink_queue listeners leaving
          nfnetlink_queue, from Florian Westphal.

Patch #17 increases hitcount from 255 to 65535 in xt_recent, from Phil Sutter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01 09:52:35 +01:00