Commit Graph

1153807 Commits

Author SHA1 Message Date
David S. Miller
6bd4755c7c Merge branch 'devlink-unregister'
Jakub Kicinski says:

====================
devlink: remove the wait-for-references on unregister

Move the registration and unregistration of the devlink instances
under their instance locks. Don't perform the netdev-style wait
for all references when unregistering the instance.

Instead the devlink instance refcount will only ensure that
the memory of the instance is not freed. All places which acquire
access to devlink instances via a reference must check that the
instance is still registered under the instance lock.

This fixes the problem of the netdev code accessing devlink
instances before they are registered.

RFC: https://lore.kernel.org/all/20221217011953.152487-1-kuba@kernel.org/
 - rewrite the cover letter
 - rewrite the commit message for patch 1
 - un-export and rename devl_is_alive
 - squash the netdevsim patches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06 12:56:20 +00:00
Jakub Kicinski
82a3aef2e6 netdevsim: move devlink registration under the instance lock
To prevent races with netdev code accessing free devlink instances
move the registration under the devlink instance lock.
Core now waits for the instance to be registered before accessing it.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06 12:56:19 +00:00
Jakub Kicinski
5c5ea1d09f netdevsim: rename a label
err_dl_unregister should unregister the devlink instance.
Looks like renaming it was missed in one of the reshufflings.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06 12:56:19 +00:00
Jakub Kicinski
1d18bb1a4d devlink: allow registering parameters after the instance
It's most natural to register the instance first and then its
subobjects. Now that we can use the instance lock to protect
the atomicity of all init - it should also be safe.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06 12:56:19 +00:00
Jakub Kicinski
6ef8f7da92 devlink: don't require setting features before registration
Requiring devlink_set_features() to be run before devlink is
registered is overzealous. devlink_set_features() itself is
a leftover from old workarounds which were trying to prevent
initiating reload before probe was complete.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06 12:56:19 +00:00
Jakub Kicinski
9053637e0d devlink: remove the registration guarantee of references
The objective of exposing the devlink instance locks to
drivers was to let them use these locks to prevent user space
from accessing the device before it's fully initialized.
This is difficult because devlink_unregister() waits for all
references to be released, meaning that devlink_unregister()
can't itself be called under the instance lock.

To avoid this issue devlink_register() was moved after subobject
registration a while ago. Unfortunately the netdev paths get
a hold of the devlink instances _before_ they are registered.
Ideally netdev should wait for devlink init to finish (synchronizing
on the instance lock). This can't work because we don't know if the
instance will _ever_ be registered (in case of failures it may not).
The other option of returning an error until devlink_register()
is called is unappealing (user space would get a notification
netdev exist but would have to wait arbitrary amount of time
before accessing some of its attributes).

Weaken the guarantees of the devlink references.

Holding a reference will now only guarantee that the memory
of the object is around. Another way of looking at it is that
the reference now protects the object not its "registered" status.
Use devlink instance lock to synchronize unregistration.

This implies that releasing of the "main" reference of the devlink
instance moves from devlink_unregister() to devlink_free().

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06 12:56:19 +00:00
Jakub Kicinski
ed539ba614 devlink: always check if the devlink instance is registered
Always check under the instance lock whether the devlink instance
is still / already registered.

This is a no-op for the most part, as the unregistration path currently
waits for all references. On the init path, however, we may temporarily
open up a race with netdev code, if netdevs are registered before the
devlink instance. This is temporary, the next change fixes it, and this
commit has been split out for the ease of review.

Note that in case of iterating over sub-objects which have their
own lock (regions and line cards) we assume an implicit dependency
between those objects existing and devlink unregistration.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06 12:56:19 +00:00
Jakub Kicinski
870c7ad4a5 devlink: protect devlink->dev by the instance lock
devlink->dev is assumed to be always valid as long as any
outstanding reference to the devlink instance exists.

In prep for weakening of the references take the instance lock.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06 12:56:18 +00:00
Jakub Kicinski
7a54a5195b devlink: update the code in netns move to latest helpers
devlink_pernet_pre_exit() is the only obvious place which takes
the instance lock without using the devl_ helpers. Update the code
and move the error print after releasing the reference
(having unlock and put together feels slightly idiomatic).

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06 12:56:18 +00:00
Jakub Kicinski
d772781964 devlink: bump the instance index directly when iterating
xa_find_after() is designed to handle multi-index entries correctly.
If a xarray has two entries one which spans indexes 0-3 and one at
index 4 xa_find_after(0) will return the entry at index 4.

Having to juggle the two callbacks, however, is unnecessary in case
of the devlink xarray, as there is 1:1 relationship with indexes.

Always use xa_find() and increment the index manually.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06 12:56:18 +00:00
Mahesh Bandewar
6b754d7bd0 sysctl: expose all net/core sysctls inside netns
All were not visible to the non-priv users inside netns. However,
with 4ecb90090c ("sysctl: allow override of /proc/sys/net with
CAP_NET_ADMIN"), these vars are protected from getting modified.
A proc with capable(CAP_NET_ADMIN) can change the values so
not having them visible inside netns is just causing nuisance to
process that check certain values (e.g. net.core.somaxconn) and
see different behavior in root-netns vs. other-netns

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06 12:41:52 +00:00
Jakub Kicinski
3d759e9e24 Merge branch 'devlink-code-split-and-structured-instance-walk'
Jakub Kicinski says:

====================
devlink: code split and structured instance walk

Split devlink.c into a handful of files, trying to keep the "core"
code away from all the command-specific implementations.
The core code has been quite scattered until now. Going forward we can
consider using a source file per-subobject, I think that it's quite
beneficial to newcomers (based on relative ease with which folks
contribute to ethtool vs devlink). But this series doesn't split
everything out, yet - partially due to backporting concerns,
but mostly due to lack of time. Bulk of the netlink command
handling is left in a leftover.c file.

Introduce a context structure for dumps, and use it to store
the devlink instance ID of the last dumped devlink instance.
This means we don't have to restart the walk from 0 each time.

Finally - introduce a "structured walk". A centralized dump handler
in devlink/netlink.c which walks the devlink instances, deals with
refcounting/locking, simplifying the per-object implementations quite
a bit. Inspired by the ethtool code.

v1: https://lore.kernel.org/all/20230104041636.226398-1-kuba@kernel.org/
RFC: https://lore.kernel.org/all/20221215020155.1619839-1-kuba@kernel.org/
====================

Link: https://lore.kernel.org/r/20230105040531.353563-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:13:40 -08:00
Jakub Kicinski
5ce76d78b9 devlink: convert remaining dumps to the by-instance scheme
Soon we'll have to check if a devlink instance is alive after
locking it. Convert to the by-instance dumping scheme to make
refactoring easier.

Most of the subobject code no longer has to worry about any devlink
locking / lifetime rules (the only ones that still do are the two subject
types which stubbornly use their own locking). Both dump and do callbacks
are given a devlink instance which is already locked and good-to-access
(do from the .pre_doit handler, dump from the new dump indirection).

Note that we'll now check presence of an op (e.g. for sb_pool_get)
under the devlink instance lock, that will soon be necessary anyway,
because we don't hold refs on the driver modules so the memory
in which ops live may be gone for a dead instance, after upcoming
locking changes.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:13:40 -08:00
Jakub Kicinski
07f3af6608 devlink: add by-instance dump infra
Most dumpit implementations walk the devlink instances.
This requires careful lock taking and reference dropping.
Factor the loop out and provide just a callback to handle
a single instance dump.

Convert one user as an example, other users converted
in the next change.

Slightly inspired by ethtool netlink code.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:13:39 -08:00
Jakub Kicinski
e4d5015bc1 devlink: uniformly take the devlink instance lock in the dump loop
Move the lock taking out of devlink_nl_cmd_region_get_devlink_dumpit().
This way all dumps will take the instance lock in the main iteration
loop directly, making refactoring and reading the code easier.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:13:39 -08:00
Jakub Kicinski
c9666bac53 devlink: restart dump based on devlink instance ids (function)
Use xarray id for cases of sub-objects which are iterated in
a function.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:13:39 -08:00
Jakub Kicinski
a8f947073f devlink: restart dump based on devlink instance ids (nested)
Use xarray id for cases of simple sub-object iteration.
We'll now use the state->instance for the devlink instances
and state->idx for subobject index.

Moving the definition of idx into the inner loop makes sense,
so while at it also move other sub-object local variables into
the loop.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:13:39 -08:00
Jakub Kicinski
731d69a6bd devlink: restart dump based on devlink instance ids (simple)
xarray gives each devlink instance an id and allows us to restart
walk based on that id quite neatly. This is nice both from the
perspective of code brevity and from the stability of the dump
(devlink instances disappearing from before the resumption point
will not cause inconsistent dumps).

This patch takes care of simple cases where state->idx counts
devlink instances only.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:13:39 -08:00
Jakub Kicinski
a0e13dfdc3 devlink: health: combine loops in dump
Walk devlink instances only once. Dump the instance reporters
and port reporters before moving to the next instance.
User space should not depend on ordering of messages.

This will make improving stability of the walk easier.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:13:39 -08:00
Jakub Kicinski
8861c0933c devlink: drop the filter argument from devlinks_xa_find_get
Looks like devlinks_xa_find_get() was intended to get the mark
from the @filter argument. It doesn't actually use @filter, passing
DEVLINK_REGISTERED to xa_find_fn() directly. Walking marks other
than registered is unlikely so drop @filter argument completely.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:13:39 -08:00
Jakub Kicinski
20615659b5 devlink: remove start variables from dumps
The start variables made the code clearer when we had to access
cb->args[0] directly, as the name args doesn't explain much.
Now that we use a structure to hold state this seems no longer
needed.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:13:39 -08:00
Jakub Kicinski
3015f82249 devlink: use an explicit structure for dump context
Create a dump context structure instead of using cb->args
as an unsigned long array. This is a pure conversion which
is intended to be as much of a noop as possible.
Subsequent changes will use this to simplify the code.

The two non-trivial parts are:
 - devlink_nl_cmd_health_reporter_dump_get_dumpit() checks args[0]
   to see if devlink_fmsg_dumpit() has already been called (whether
   this is the first msg), but doesn't use the exact value, so we
   can drop the local variable there already
 - devlink_nl_cmd_region_read_dumpit() uses args[0] for address
   but we'll use args[1] now, shouldn't matter

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:13:39 -08:00
Jakub Kicinski
2c7bc10d0f netlink: add macro for checking dump ctx size
We encourage casting struct netlink_callback::ctx to a local
struct (in a comment above the field). Provide a convenience
macro for checking if the local struct fits into the ctx.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:13:39 -08:00
Jakub Kicinski
623cd13b16 devlink: split out netlink code
Move out the netlink glue into a separate file.
Leave the ops in the old file because we'd have to export a ton
of functions. Going forward we should switch to split ops which
will let us to put the new ops in the netlink.c file.

Pure code move, no functional changes.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:13:39 -08:00
Jakub Kicinski
687125b579 devlink: split out core code
Move core code into a separate file. It's spread around the main
file which makes refactoring and figuring out how devlink works
harder.

Move the xarray, all the most core devlink instance code out like
locking, ref counting, alloc, register, etc. Leave port stuff in
leftover.c, if we want to move port code it'd probably be to its
own file.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:13:31 -08:00
Jakub Kicinski
e50ef40f9a devlink: rename devlink_netdevice_event -> devlink_port_netdevice_event
To make the upcoming change a pure(er?) code move rename
devlink_netdevice_event -> devlink_port_netdevice_event.
This makes it clear that it only touches ports and doesn't
belong cleanly in the core.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:12:01 -08:00
Jakub Kicinski
f05bd8ebeb devlink: move code to a dedicated directory
The devlink code is hard to navigate with 13kLoC in one file.
I really like the way Michal split the ethtool into per-command
files and core. It'd probably be too much to split it all up,
but we can at least separate the core parts out of the per-cmd
implementations and put it in a directory so that new commands
can be separate files.

Move the code, subsequent commit will do a partial split.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:12:00 -08:00
Jakub Kicinski
0da6855378 Merge branch 'net-ipa-simplify-ipa-interrupt-handling'
Alex Elder says:

====================
net: ipa: simplify IPA interrupt handling

One of the IPA's two IRQs fires when data on a suspended channel is
available (to request that the channel--or system--be resumed to
recieve the pending data).  This interrupt also handles a few
conditions signaled by the embedded microcontroller.

For this "IPA interrupt", the current code requires a handler to be
dynamically registered for each interrupt condition.  Any condition
that has no registered handler is quietly ignored.  This design is
derived from the downstream IPA driver implementation.

There isn't any need for this complexity.  Even in the downstream
code, only four of the available 30 or so IPA interrupt conditions
are ever handled.  So these handlers can pretty easily just be
called directly in the main IRQ handler function.

This series simplifies the interrupt handling code by having the
small number of IPA interrupt handlers be called directly, rather
than having them be registered dynamically.

Version 2 just adds a missing forward-reference, as suggested by
Caleb.
====================

Link: https://lore.kernel.org/r/20230104175233.2862874-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:03:16 -08:00
Alex Elder
bfb798545d net: ipa: don't maintain IPA interrupt handler array
We can call the two IPA interrupt handler functions directly;
there's no need to maintain the array of handler function pointers
any more.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:03:14 -08:00
Alex Elder
8d8d3f1a3e net: ipa: kill ipa_interrupt_add()
The dynamic assignment of IPA interrupt handlers isn't needed; we
only handle three IPA interrupt types, and their handler functions
are now assigned directly.  We can get rid of ipa_interrupt_add()
and ipa_interrupt_remove() now, because they serve no purpose.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:03:14 -08:00
Alex Elder
482ae3a993 net: ipa: register IPA interrupt handlers directly
Declare the microcontroller IPA interrupt handler publicly, and
assign it directly in ipa_interrupt_config().  Make the SUSPEND IPA
interrupt handler public, and rename it ipa_power_suspend_handler().
Assign it directly in ipa_interrupt_config() as well.

This makes it unnecessary to do this in ipa_interrupt_add().  Make
similar changes for removing IPA interrupt handlers.

The next two patches will finish the cleanup, removing the
add/remove functions and the handler array entirely.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:03:14 -08:00
Alex Elder
d50ed35587 net: ipa: enable IPA interrupt handlers separate from registration
Expose ipa_interrupt_enable() and have functions that register
IPA interrupt handlers enable them directly, rather than having the
registration process do that.  Do the same for disabling IPA
interrupt handlers.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:03:14 -08:00
Alex Elder
8e461e1f09 net: ipa: introduce ipa_interrupt_enable()
Create new function ipa_interrupt_enable() to encapsulate enabling
one of the IPA interrupt types.  Introduce ipa_interrupt_disable()
to reverse that operation.  Add a helper function to factor out the
common register update used by both.

Use these in ipa_interrupt_add() and ipa_interrupt_remove().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:03:13 -08:00
Alex Elder
e5709b7c1e net: ipa: introduce a common microcontroller interrupt handler
The prototype for an IPA interrupt handler supplies the IPA
interrupt ID, so it's possible to use a single function to handle
any type of microcontroller interrupt.

Introduce ipa_uc_interrupt_handler(), which calls the event or the
response handler depending on the IRQ ID provided.  Register the new
function as the handler for both microcontroller IPA interrupt types.

The called functions don't use their "irq_id" arguments, so remove
them.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 22:03:13 -08:00
Jakub Kicinski
afd50da912 Merge branch 'enetc-unlock-xdp_redirect-for-xdp-non-linear-buffers'
Lorenzo Bianconi says:

====================
enetc: unlock XDP_REDIRECT for XDP non-linear buffers

Unlock XDP_REDIRECT for S/G XDP buffer and rely on XDP stack to properly
take care of the frames.
Rely on XDP_FLAGS_HAS_FRAGS flag to check if it really necessary to access
non-linear part of the xdp_buff/xdp_frame.
====================

Link: https://lore.kernel.org/r/cover.1672840490.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 21:38:33 -08:00
Lorenzo Bianconi
c7030d14c7 net: ethernet: enetc: do not always access skb_shared_info in the XDP path
Move XDP skb_shared_info structure initialization in from
enetc_map_rx_buff_to_xdp() to enetc_add_rx_buff_to_xdp() and do not always
access skb_shared_info in the xdp_buff/xdp_frame since it is located in a
different cacheline with respect to hard_start and data xdp pointers.
Rely on XDP_FLAGS_HAS_FRAGS flag to check if it really necessary to access
non-linear part of the xdp_buff/xdp_frame.

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 21:38:29 -08:00
Lorenzo Bianconi
59cc773a35 net: ethernet: enetc: get rid of xdp_redirect_sg counter
Remove xdp_redirect_sg counter and the related ethtool entry since it is
no longer used.

Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 21:38:29 -08:00
Lorenzo Bianconi
8feb020f92 net: ethernet: enetc: unlock XDP_REDIRECT for XDP non-linear buffers
Even if full XDP_REDIRECT is not supported yet for non-linear XDP buffers
since we allow redirecting just into CPUMAPs, unlock XDP_REDIRECT for
S/G XDP buffer and rely on XDP stack to properly take care of the
frames.

Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 21:38:29 -08:00
Jakub Kicinski
4aea86b403 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05 15:34:11 -08:00
Linus Torvalds
50011c32f4 Including fixes from bpf, wifi, and netfilter.
Current release - regressions:
 
  - bpf: fix nullness propagation for reg to reg comparisons,
    avoid null-deref
 
  - inet: control sockets should not use current thread task_frag
 
  - bpf: always use maximal size for copy_array()
 
  - eth: bnxt_en: don't link netdev to a devlink port for VFs
 
 Current release - new code bugs:
 
  - rxrpc: fix a couple of potential use-after-frees
 
  - netfilter: conntrack: fix IPv6 exthdr error check
 
  - wifi: iwlwifi: fw: skip PPAG for JF, avoid FW crashes
 
  - eth: dsa: qca8k: various fixes for the in-band register access
 
  - eth: nfp: fix schedule in atomic context when sync mc address
 
  - eth: renesas: rswitch: fix getting mac address from device tree
 
  - mobile: ipa: use proper endpoint mask for suspend
 
 Previous releases - regressions:
 
  - tcp: add TIME_WAIT sockets in bhash2, fix regression caught
    by Jiri / python tests
 
  - net: tc: don't intepret cls results when asked to drop, fix
    oob-access
 
  - vrf: determine the dst using the original ifindex for multicast
 
  - eth: bnxt_en:
    - fix XDP RX path if BPF adjusted packet length
    - fix HDS (header placement) and jumbo thresholds for RX packets
 
  - eth: ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf,
    avoid memory corruptions
 
 Previous releases - always broken:
 
  - ulp: prevent ULP without clone op from entering the LISTEN status
 
  - veth: fix race with AF_XDP exposing old or uninitialized descriptors
 
  - bpf:
    - pull before calling skb_postpull_rcsum() (fix checksum support
      and avoid a WARN())
    - fix panic due to wrong pageattr of im->image (when livepatch
      and kretfunc coexist)
    - keep a reference to the mm, in case the task is dead
 
  - mptcp: fix deadlock in fastopen error path
 
  - netfilter:
    - nf_tables: perform type checking for existing sets
    - nf_tables: honor set timeout and garbage collection updates
    - ipset: fix hash:net,port,net hang with /0 subnet
    - ipset: avoid hung task warning when adding/deleting entries
 
  - selftests: net:
    - fix cmsg_so_mark.sh test hang on non-x86 systems
    - fix the arp_ndisc_evict_nocarrier test for IPv6
 
  - usb: rndis_host: secure rndis_query check against int overflow
 
  - eth: r8169: fix dmar pte write access during suspend/resume with WOL
 
  - eth: lan966x: fix configuration of the PCS
 
  - eth: sparx5: fix reading of the MAC address
 
  - eth: qed: allow sleep in qed_mcp_trace_dump()
 
  - eth: hns3:
    - fix interrupts re-initialization after VF FLR
    - fix handling of promisc when MAC addr table gets full
    - refine the handling for VF heartbeat
 
  - eth: mlx5:
    - properly handle ingress QinQ-tagged packets on VST
    - fix io_eq_size and event_eq_size params validation on big endian
    - fix RoCE setting at HCA level if not supported at all
    - don't turn CQE compression on by default for IPoIB
 
  - eth: ena:
    - fix toeplitz initial hash key value
    - account for the number of XDP-processed bytes in interface stats
    - fix rx_copybreak value update
 
 Misc:
 
  - ethtool: harden phy stat handling against buggy drivers
 
  - docs: netdev: convert maintainer's doc from FAQ to a normal document
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmO3MLcACgkQMUZtbf5S
 IrsEQBAAijPrpxsGMfX+VMqZ8RPKA3Qg8XF3ji2fSp4c0kiKv6lYI7PzPTR3u/fj
 CAlhQMHv7z53uM6Zd7FdUVl23paaEycu8YnlwSubg9z+wSeh/RQ6iq94mSk1PV+K
 LLVR/yop2N35Yp/oc5KZMb9fMLkxRG9Ci73QUVVYgvIrSd4Zdm13FjfVjL2C1MZH
 Yp003wigMs9IkIHOpHjNqwn/5s//0yXsb1PgKxCsaMdMQsG0yC+7eyDmxshCqsji
 xQm15mkGMjvWEYJaa4Tj4L3JW6lWbQzCu9nqPUX16KpmrnScr8S8Is+aifFZIBeW
 GZeDYgvjSxNWodeOrJnD3X+fnbrR9+qfx7T9y7XighfytAz5DNm1LwVOvZKDgPFA
 s+LlxOhzkDNEqbIsusK/LW+04EFc5gJyTI2iR6s4SSqmH3c3coJZQJeyRFWDZy/x
 1oqzcCcq8SwGUTJ9g6HAmDQoVkhDWDT/ZcRKhpWG0nJub972lB2iwM7LrAu+HoHI
 r8hyCkHpOi5S3WZKI9gPiGD+yOlpVAuG2wHg2IpjhKQvtd9DFUChGDhFeoB2rqJf
 9uI3RJBBYTDkeNu3kpfy5uMh2XhvbIZntK5kwpJ4VettZWFMaOAzn7KNqk8iT4gJ
 ASMrUrX59X0TAN0MgpJJm7uGtKbKZOu4lHNm74TUxH7V7bYn7dk=
 =TlcN
 -----END PGP SIGNATURE-----

Merge tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf, wifi, and netfilter.

  Current release - regressions:

   - bpf: fix nullness propagation for reg to reg comparisons, avoid
     null-deref

   - inet: control sockets should not use current thread task_frag

   - bpf: always use maximal size for copy_array()

   - eth: bnxt_en: don't link netdev to a devlink port for VFs

  Current release - new code bugs:

   - rxrpc: fix a couple of potential use-after-frees

   - netfilter: conntrack: fix IPv6 exthdr error check

   - wifi: iwlwifi: fw: skip PPAG for JF, avoid FW crashes

   - eth: dsa: qca8k: various fixes for the in-band register access

   - eth: nfp: fix schedule in atomic context when sync mc address

   - eth: renesas: rswitch: fix getting mac address from device tree

   - mobile: ipa: use proper endpoint mask for suspend

  Previous releases - regressions:

   - tcp: add TIME_WAIT sockets in bhash2, fix regression caught by
     Jiri / python tests

   - net: tc: don't intepret cls results when asked to drop, fix
     oob-access

   - vrf: determine the dst using the original ifindex for multicast

   - eth: bnxt_en:
      - fix XDP RX path if BPF adjusted packet length
      - fix HDS (header placement) and jumbo thresholds for RX packets

   - eth: ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf,
     avoid memory corruptions

  Previous releases - always broken:

   - ulp: prevent ULP without clone op from entering the LISTEN status

   - veth: fix race with AF_XDP exposing old or uninitialized
     descriptors

   - bpf:
      - pull before calling skb_postpull_rcsum() (fix checksum support
        and avoid a WARN())
      - fix panic due to wrong pageattr of im->image (when livepatch and
        kretfunc coexist)
      - keep a reference to the mm, in case the task is dead

   - mptcp: fix deadlock in fastopen error path

   - netfilter:
      - nf_tables: perform type checking for existing sets
      - nf_tables: honor set timeout and garbage collection updates
      - ipset: fix hash:net,port,net hang with /0 subnet
      - ipset: avoid hung task warning when adding/deleting entries

   - selftests: net:
      - fix cmsg_so_mark.sh test hang on non-x86 systems
      - fix the arp_ndisc_evict_nocarrier test for IPv6

   - usb: rndis_host: secure rndis_query check against int overflow

   - eth: r8169: fix dmar pte write access during suspend/resume with
     WOL

   - eth: lan966x: fix configuration of the PCS

   - eth: sparx5: fix reading of the MAC address

   - eth: qed: allow sleep in qed_mcp_trace_dump()

   - eth: hns3:
      - fix interrupts re-initialization after VF FLR
      - fix handling of promisc when MAC addr table gets full
      - refine the handling for VF heartbeat

   - eth: mlx5:
      - properly handle ingress QinQ-tagged packets on VST
      - fix io_eq_size and event_eq_size params validation on big endian
      - fix RoCE setting at HCA level if not supported at all
      - don't turn CQE compression on by default for IPoIB

   - eth: ena:
      - fix toeplitz initial hash key value
      - account for the number of XDP-processed bytes in interface stats
      - fix rx_copybreak value update

  Misc:

   - ethtool: harden phy stat handling against buggy drivers

   - docs: netdev: convert maintainer's doc from FAQ to a normal
     document"

* tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
  caif: fix memory leak in cfctrl_linkup_request()
  inet: control sockets should not use current thread task_frag
  net/ulp: prevent ULP without clone op from entering the LISTEN status
  qed: allow sleep in qed_mcp_trace_dump()
  MAINTAINERS: Update maintainers for ptp_vmw driver
  usb: rndis_host: Secure rndis_query check against int overflow
  net: dpaa: Fix dtsec check for PCS availability
  octeontx2-pf: Fix lmtst ID used in aura free
  drivers/net/bonding/bond_3ad: return when there's no aggregator
  netfilter: ipset: Rework long task execution when adding/deleting entries
  netfilter: ipset: fix hash:net,port,net hang with /0 subnet
  net: sparx5: Fix reading of the MAC address
  vxlan: Fix memory leaks in error path
  net: sched: htb: fix htb_classify() kernel-doc
  net: sched: cbq: dont intepret cls results when asked to drop
  net: sched: atm: dont intepret cls results when asked to drop
  dt-bindings: net: marvell,orion-mdio: Fix examples
  dt-bindings: net: sun8i-emac: Add phy-supply property
  net: ipa: use proper endpoint mask for suspend
  selftests: net: return non-zero for failures reported in arp_ndisc_evict_nocarrier
  ...
2023-01-05 12:40:50 -08:00
Linus Torvalds
aa01a18392 gpio fixes for v6.2-rc3
- fix a reference leak in gpio-sifive
 - fix a potential use of an uninitialized variable in core gpiolib
 - fix a potential use of an uninitialized variable in gpio-pca953x
 - make GPIO irqchips immutable in gpio-pmic-eic-sprd, gpio-eic-sprd and
   gpio-sprd
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmO3EEQACgkQEacuoBRx
 13KPmg//Xu6/ng05VYSMtKr/3di1+8RD603+uLu5fCsJTzNHOezKX+Zumkf4eiZ5
 9ojvf/TGMWnCfKN5XAJ5YLqkCfAtSxBXMo0WLyZqDHtSGraBuOHdPpidSSkIcIyI
 BYql+lNtpEvUAG2JT0UjBnTiGgooNIYnmi18gecVzJ+xImP6pUrO41anfvLycOCH
 EqhpFZhegHm45XrQso6UH8T50QxL9avTnfwygVPR8s3M0pWmtJqogQYNjZasx0pf
 oRCSJ3eUZXWR69Z7IzRPJzBY88uL3MjcJfD4e8+rotR0/mTR55CdxbhS2i79Vc33
 zn80+5aWq1BtyAB+ZJrEd+211q5IVg+pnRQcdKDl86NSvQRVU57lCVpiTgc0DL4e
 kcktFWCoKEJ56p+ZtJDFnpSddetyEg+1mGPIBC9+MPANzkqggm0eq2o7ymqEEEGI
 gRe9kzBfN9oOKxtdfIrvttqrTKyifSJsbe6Sb6B+RG9nhJdE6eK6SdpS5m5WHXZW
 ATFj9c8yBCZzlXuJbviZUGedhCyLKI4z5EzgJj5TayQBuSZTYme/Iz2+++GjHbdD
 pylT/lSWbyGl1rrZ5YgQPeKN9U9/aknWEu1KHgx7iL9v8ulIRd3lVAqfFfwyf6Lt
 WPPHhlMznTQ/Wf6EhYb1U/yaCyAyWTxTh/KR6BMqp4FyKlQ9n/k=
 =vy5/
 -----END PGP SIGNATURE-----

Merge tag 'gpio-fixes-for-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:
 "A reference leak fix, two fixes for using uninitialized variables and
  more drivers converted to using immutable irqchips:

   - fix a reference leak in gpio-sifive

   - fix a potential use of an uninitialized variable in core gpiolib

   - fix a potential use of an uninitialized variable in gpio-pca953x

   - make GPIO irqchips immutable in gpio-pmic-eic-sprd, gpio-eic-sprd
     and gpio-sprd"

* tag 'gpio-fixes-for-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: sifive: Fix refcount leak in sifive_gpio_probe
  gpio: sprd: Make the irqchip immutable
  gpio: pmic-eic-sprd: Make the irqchip immutable
  gpio: eic-sprd: Make the irqchip immutable
  gpio: pca953x: avoid to use uninitialized value pinctrl
  gpiolib: Fix using uninitialized lookup-flags on ACPI platforms
2023-01-05 12:06:40 -08:00
Linus Torvalds
5e9af4b426 fbdev updates for kernel 6.2-rc3:
- Fix Matrox G200eW initialization failure
 - Fix build failure of offb driver when built as module
 - Optimize stack usage in omapfb
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCY7asbAAKCRD3ErUQojoP
 X3bKAQDARxChoZ3eBfkKd3MJXMrYorE/T4zX2stBIdU6p/WgPQEA4W1xx3B6yzIK
 XR+lRlhwUIUkmgzMG9JEyZQEd9nzSA8=
 =s+E8
 -----END PGP SIGNATURE-----

Merge tag 'fbdev-for-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev

Pull fbdev fixes from Helge Deller:

 - Fix Matrox G200eW initialization failure

 - Fix build failure of offb driver when built as module

 - Optimize stack usage in omapfb

* tag 'fbdev-for-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: omapfb: avoid stack overflow warning
  fbdev: matroxfb: G200eW: Increase max memory from 1 MB to 16 MB
  fbdev: atyfb: use strscpy() to instead of strncpy()
  fbdev: omapfb: use strscpy() to instead of strncpy()
  fbdev: make offb driver tristate
2023-01-05 11:24:33 -08:00
Paolo Abeni
0471005efa Merge branch 'add-support-for-qsgmii-mode-for-j721e-cpsw9g-to-am65-cpsw-driver'
Siddharth Vadapalli says:

====================
Add support for QSGMII mode for J721e CPSW9G to am65-cpsw driver

Add compatible to am65-cpsw driver for J721e CPSW9G, which contains 8
external ports and 1 internal host port.

Add support to power on and power off the SERDES PHY which is used by the
CPSW MAC.

=========
Changelog
=========
v5:
https://lore.kernel.org/r/20221109042203.375042-1-s-vadapalli@ti.com/
v4:
https://lore.kernel.org/r/20221108080606.124596-1-s-vadapalli@ti.com/
v3:
https://lore.kernel.org/r/20221026090957.180592-1-s-vadapalli@ti.com/
v2:
https://lore.kernel.org/r/20221018085810.151327-1-s-vadapalli@ti.com/
v1:
https://lore.kernel.org/r/20220914095053.189851-1-s-vadapalli@ti.com/
====================

Link: https://lore.kernel.org/r/20230104103432.1126403-1-s-vadapalli@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-05 12:12:21 +01:00
Siddharth Vadapalli
dab2b265dd net: ethernet: ti: am65-cpsw: Add support for SERDES configuration
Use PHY framework APIs to initialize the SERDES PHY connected to CPSW MAC.

Define the functions am65_cpsw_disable_phy(), am65_cpsw_enable_phy(),
am65_cpsw_disable_serdes_phy() and am65_cpsw_enable_serdes_phy().

Add new member "serdes_phy" to struct "am65_cpsw_slave_data" to store the
SERDES PHY for each port, if it exists. Use it later while disabling the
SERDES PHY for each port.

Power on and initialize the SerDes PHY in am65_cpsw_nuss_init_slave_ports()
by invoking am65_cpsw_enable_serdes_phy().

Power off the SerDes PHY in am65_cpsw_nuss_remove() by invoking
am65_cpsw_disable_serdes_phy().

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-05 12:12:19 +01:00
Siddharth Vadapalli
944131fa65 net: ethernet: ti: am65-cpsw: Enable QSGMII mode for J721e CPSW9G
CPSW9G in J721e supports additional modes like QSGMII.
Add new compatible for J721e in am65-cpsw driver.

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-05 12:12:19 +01:00
Siddharth Vadapalli
c85b53e32c dt-bindings: net: ti: k3-am654-cpsw-nuss: Add J721e CPSW9G support
Update bindings for TI K3 J721e SoC which contains 9 ports (8 external
ports) CPSW9G module and add compatible for it.

Changes made:
    - Add new compatible ti,j721e-cpswxg-nuss for CPSW9G.
    - Extend pattern properties for new compatible.
    - Change maximum number of CPSW ports to 8 for new compatible.

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-05 12:12:19 +01:00
Arnd Bergmann
634cf6ead9 fbdev: omapfb: avoid stack overflow warning
The dsi_irq_stats structure is a little too big to fit on the
stack of a 32-bit task, depending on the specific gcc options:

fbdev/omap2/omapfb/dss/dsi.c: In function 'dsi_dump_dsidev_irqs':
fbdev/omap2/omapfb/dss/dsi.c:1621:1: error: the frame size of 1064 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

Since this is only a debugfs file, performance is not critical,
so just dynamically allocate it, and print an error message
in there in place of a failure code when the allocation fails.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Helge Deller <deller@gmx.de>
2023-01-05 11:43:27 +01:00
Zhengchao Shao
fe69230f05 caif: fix memory leak in cfctrl_linkup_request()
When linktype is unknown or kzalloc failed in cfctrl_linkup_request(),
pkt is not released. Add release process to error path.

Fixes: b482cd2053 ("net-caif: add CAIF core protocol stack")
Fixes: 8d545c8f95 ("caif: Disconnect without waiting for response")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230104065146.1153009-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-05 10:19:36 +01:00
Eric Dumazet
1ac8855744 inet: control sockets should not use current thread task_frag
Because ICMP handlers run from softirq contexts,
they must not use current thread task_frag.

Previously, all sockets allocated by inet_ctl_sock_create()
would use the per-socket page fragment, with no chance of
recursion.

Fixes: 98123866fc ("Treewide: Stop corrupting socket's task_frag")
Reported-by: syzbot+bebc6f1acdf4cbb79b03@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Link: https://lore.kernel.org/r/20230103192736.454149-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-04 20:38:25 -08:00
Paolo Abeni
2c02d41d71 net/ulp: prevent ULP without clone op from entering the LISTEN status
When an ULP-enabled socket enters the LISTEN status, the listener ULP data
pointer is copied inside the child/accepted sockets by sk_clone_lock().

The relevant ULP can take care of de-duplicating the context pointer via
the clone() operation, but only MPTCP and SMC implement such op.

Other ULPs may end-up with a double-free at socket disposal time.

We can't simply clear the ULP data at clone time, as TLS replaces the
socket ops with custom ones assuming a valid TLS ULP context is
available.

Instead completely prevent clone-less ULP sockets from entering the
LISTEN status.

Fixes: 734942cc4e ("tcp: ULP infrastructure")
Reported-by: slipper <slipper.alive@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/4b80c3d1dbe3d0ab072f80450c202d9bc88b4b03.1672740602.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-04 20:37:41 -08:00