Commit f4f943c958 ("RDS: IB: ack more receive completions to improve performance")
removed rds_ib_recv_tasklet_fn() implementation but not the declaration.
And commit ec16227e14 ("RDS/IB: Infiniband transport") declared but never implemented
other functions.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20240731063630.3592046-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
draft-ietf-6man-pio-pflag is adding a new flag to the Prefix Information
Option to signal that the network can allocate a unique IPv6 prefix per
client via DHCPv6-PD (see draft-ietf-v6ops-dhcp-pd-per-device).
When ra_honor_pio_pflag is enabled, the presence of a P-flag causes
SLAAC autoconfiguration to be disabled for that particular PIO.
An automated test has been added in Android (r.android.com/3195335) to
go along with this change.
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: David Lamparter <equinox@opensourcerouting.org>
Cc: Simon Horman <horms@kernel.org>
Signed-off-by: Patrick Rohr <prohr@google.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The input parameter "is_rmb" of the smcr_new_buf_create function
has never been used, remove it.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the SMC client perform CLC handshake, it will check whether
the clc header type is correct in receiving SMC_CLC_ACCEPT packet.
The specific invoking path is as follows:
__smc_connect
smc_connect_clc
smc_clc_wait_msg
smc_clc_msg_hdr_valid
smc_clc_msg_acc_conf_valid
Therefore, the smc_connect_check_aclc interface invoked by
__smc_connect does not need to check type again.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the SMC client begins to connect to server, smcd_version is set
to SMC_V1 + SMC_V2. If fail to get VLAN ID, only SMC_V2 information
is left in smcd_version. And smcd_version will not be changed to 0.
Therefore, remove the fallback caused by the failure to get VLAN ID.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because linux/err.h is unreferenced in smc_loopback.h file, so
remove it.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the work of closing all tunnels from the pernet exit hook to
pre_exit since the core does rcu synchronisation between these steps
and we can therefore remove rcu_barrier from l2tp code.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp eth/ppp pseudowire setup/cleanup uses kfree() in some error
paths. Drop the refcount instead such that the session object is
always freed when the refcount reaches 0.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp_session_register uses an idr_alloc then idr_replace pattern to
insert sessions into the session IDR. To catch invalid locking, add a
WARN_ON_ONCE if the IDR entry is modified by another thread between
alloc and replace steps.
Also add comments to make expectations clear.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp_v3_session_htable and tunnel->session_list are read by lockless
getters using RCU. Use rcu list variants when adding or removing list
items.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a session is created, it sets a backpointer to its tunnel. When
the session refcount drops to 0, l2tp_session_free drops the tunnel
refcount if session->tunnel is non-NULL. However, session->tunnel is
set in l2tp_session_create, before the tunnel refcount is incremented
by l2tp_session_register, which leaves a small window where
session->tunnel is non-NULL when the tunnel refcount hasn't been
bumped.
Moving the assignment to l2tp_session_register is trivial but
l2tp_session_create calls l2tp_session_set_header_len which uses
session->tunnel to get the tunnel's encap. Add an encap arg to
l2tp_session_set_header_len to avoid using session->tunnel.
If l2tpv3 sessions have colliding IDs, it is possible for
l2tp_v3_session_get to race with l2tp_session_register and fetch a
session which doesn't yet have session->tunnel set. Add a check for
this case.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each l2tp ppp session has an associated pppox socket. l2tp_ppp uses
the session's pppox socket refcount to manage session lifetimes; the
pppox socket holds a ref on the session which is dropped by the socket
destructor. This complicates session cleanup.
Given l2tp sessions are refcounted, it makes more sense to reverse
this relationship such that the session keeps the socket alive, not
the other way around. So refactor l2tp_ppp to have the session hold a
ref on its socket while it references it. When the session is closed,
it drops its socket ref when it detaches from its socket. If the
socket is closed first, it initiates the closing of its session, if
one is attached. The socket/session can then be freed asynchronously
when their refcounts drop to 0.
Use the session's session_close callback to detach the pppox socket
since this will be done on the work queue together with the rest of
the session cleanup via l2tp_session_delete.
Also, since l2tp_ppp uses the pppox socket's sk_user_data, use the rcu
sk_user_data access helpers when accessing it and set the socket's
SOCK_RCU_FREE flag to have pppox sockets freed by rcu.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp sessions may be accessed under an rcu read lock. Have them freed
via rcu and remove the now unneeded synchronize_rcu when a session is
removed.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a tunnel is closed, l2tp_tunnel_closeall closes all sessions in
the tunnel. Move the work of deleting each session to the work queue
so that sessions are deleted using the same codepath whether they are
closed by user API request or their parent tunnel is closing. This
also avoids the locking dance in l2tp_tunnel_closeall where the
tunnel's session list lock was unlocked and relocked in the loop.
In l2tp_exit_net, use drain_workqueue instead of flush_workqueue
because the processing of tunnel_delete work may queue session_delete
work items which must also be processed.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the l2tp tunnel socket used sk_user_data to point to its
associated l2tp tunnel, socket and tunnel cleanup had to make use of
the socket's destructor to free the tunnel only when the socket could
no longer be accessed.
Now that sk_user_data is no longer used, we can simplify socket and
tunnel cleanup:
* If the tunnel closes first, it cleans up and drops its socket ref
when the tunnel refcount drops to zero. If its socket was provided
by userspace, the socket is closed and freed asynchronously, when
userspace closes it. If its socket is a kernel socket, the tunnel
closes the socket itself during cleanup and drops its socket ref
when the tunnel's refcount drops to zero.
* If the socket closes first, we initiate the closing of its
associated tunnel. For UDP sockets, this is via the socket's
encap_destroy hook. For L2TPIP sockets, this is via the socket's
destroy callback. The tunnel holds a socket ref while it
references the sock. When the tunnel is freed, it drops its socket
ref and the socket will be cleaned up when its own refcount drops
to zero, asynchronous to the tunnel free.
* The tunnel socket destructor is no longer needed since the tunnel
is no longer freed through the socket destructor.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since l2tp no longer derives tunnel pointers directly via
sk_user_data, it is no longer useful for l2tp to check tunnel pointers
using a magic feather. Drop the tunnel's magic field.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp no longer uses the tunnel socket's sk_user_data so drop the code
which sets it.
In l2tp_validate_socket use l2tp_sk_to_tunnel to check whether a given
socket is already attached to an l2tp tunnel since we can no longer
use non-null sk_user_data to indicate this.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp's ppp procfs output can be used to show internal state of
pppol2tp. It includes a 'user-data-ok' field, which is derived from
the tunnel socket's sk_user_data being non-NULL. Use tunnel->sock
being non-NULL to indicate this instead.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the recently exported ip_flush_pending_frames instead of a
free-coded version and lock the socket while we call it.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To avoid protocol modules implementing their own, export
ip_flush_pending_frames.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp_sk_to_tunnel derives the tunnel from sk_user_data. Instead,
lookup the tunnel by walking the tunnel IDR for a tunnel using the
indicated sock. This is slow but l2tp_sk_to_tunnel is not used in
the datapath so performance isn't critical.
l2tp_tunnel_destruct needs a variant of l2tp_sk_to_tunnel which does
not bump the tunnel refcount since the tunnel refcount is already 0.
Change l2tp_sk_to_tunnel sk arg to const since it does not modify sk.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The goo.gl URL shortener is deprecated and is due to stop
expanding existing links in 2025.
Expand the link in Kconfig.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240729205337.48058-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
catching COVID, so relatively short PR. Including fixes from bpf
and netfilter.
Current release - regressions:
- tcp: process the 3rd ACK with sk_socket for TFO and MPTCP
Current release - new code bugs:
- l2tp: protect session IDR and tunnel session list with one lock,
make sure the state is coherent to avoid a warning
- eth: bnxt_en: update xdp_rxq_info in queue restart logic
- eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field
Previous releases - regressions:
- xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len,
the field reuses previously un-validated pad
Previous releases - always broken:
- tap/tun: drop short frames to prevent crashes later in the stack
- eth: ice: add a per-VF limit on number of FDIR filters
- af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmaibxAACgkQMUZtbf5S
IruuIRAAu96TiN/urPwmKznyb/Sk8x7p8iUzn6OvPS/TUlFUkURQtOh6M9uvbpN4
x/L//EWkMR0hY4SkBegoiXfb1GS0PjBdWTWUiROm5X9nVHqp5KRZAxWXhjFiS1BO
BIYOT+JfCl7mQiPs90Mys/cEtYOggMBsCZQVIGw/iYoJLFREqxFSONwa0dG+tGMX
jn9WNu4yCVDhJ/jtl2MaTsCNtYUaBUgYrKHJBfNGfJ2Lz/7rH9yFui2WSMlmOd/U
QGeCb1DWURlShlCqY37wNinbFsxWkI5JN00ukTtwFAXLIaqc+zgHcIjrDjTJwK43
F4tKbJT3+bmehMU/h3Uo3c7DhXl7n9zDGiDtbCxnkykp0sFGJpjhDrWydo51c+YB
qW5HaNrII2LiDicOVN8L29ylvKp7AEkClxgivEhZVGGk2f/szJRXfp9u3WBn5kAx
3paH55YN0DEsKbYbb1ZENEI1Vnc/4ff4PxZJCUNKwzcS8wCn1awqwcriK9TjS/cp
fjilNFT4J3/uFrodHWTkx0jJT6UJFT0aF03qPLUH/J5kG+EVukOf1jBPInNdf1si
1j47SpblHUe86HiHphFMt32KZ210lJzWxh8uGma57Y2sB9makdLiK4etrFjkiMJJ
Z8A3kGp3KpFjbuK4tHY25rp+5oxLNNOBNpay29lQrWtCL/NDcaQ=
=9OsH
-----END PGP SIGNATURE-----
Merge tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf and netfilter.
A lot of networking people were at a conference last week, busy
catching COVID, so relatively short PR.
Current release - regressions:
- tcp: process the 3rd ACK with sk_socket for TFO and MPTCP
Current release - new code bugs:
- l2tp: protect session IDR and tunnel session list with one lock,
make sure the state is coherent to avoid a warning
- eth: bnxt_en: update xdp_rxq_info in queue restart logic
- eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field
Previous releases - regressions:
- xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len,
the field reuses previously un-validated pad
Previous releases - always broken:
- tap/tun: drop short frames to prevent crashes later in the stack
- eth: ice: add a per-VF limit on number of FDIR filters
- af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash"
* tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
tun: add missing verification for short frame
tap: add missing verification for short frame
mISDN: Fix a use after free in hfcmulti_tx()
gve: Fix an edge case for TSO skb validity check
bnxt_en: update xdp_rxq_info in queue restart logic
tcp: process the 3rd ACK with sk_socket for TFO/MPTCP
selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
bpf: Fix a segment issue when downgrading gso_size
net: mediatek: Fix potential NULL pointer dereference in dummy net_device handling
MAINTAINERS: make Breno the netconsole maintainer
MAINTAINERS: Update bonding entry
net: nexthop: Initialize all fields in dumped nexthops
net: stmmac: Correct byte order of perfect_match
selftests: forwarding: skip if kernel not support setting bridge fdb learning limit
tipc: Return non-zero value from tipc_udp_addr2str() on error
netfilter: nft_set_pipapo_avx2: disable softinterrupts
ice: Fix recipe read procedure
ice: Add a per-VF limit on number of FDIR filters
net: bonding: correctly annotate RCU in bond_should_notify_peers()
...
Summary
- const qualify struct ctl_table args in proc_handlers:
This is a prerequisite to moving the static ctl_table structs into .rodata
data which will ensure that proc_handler function pointers cannot be
modified.
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmahVOcACgkQupfNUreW
QU8QdgwAiDZMEQ6psvDfLGQlOvHyp21exZFC/i1OCrCq4BmTu83iYcojIKSaZ0q6
pfR9eYTVhNfAGsV2hQJvvUQPSVNESB/A7Z2zGLVQmZcOerAN17L2C7uA1SkbRBel
P3yoMX6wruVCnBgQTcjktoBgpDqaOiqLTSh8Ko512qYOClLf22sCbbk+doiOwMF6
zbzh9bRo+qTuTgW30gwzRNml/lA1/XylSsM/cEFCxsc8k22lCneP+iWKDAfWtsr6
0GL+SvbBTyIEMFFvSw+0MUW30QctxaCYA1pBmn8nmc6eJ2yDly75/r8EWRr7zqDW
SrRiXYDAzNcW3tiOdC2FW/J95VX62DiSQporVZ/OvahBKpQKGaLLNjx1EE3qGKIk
ggnYiDmIO88ZBHoRbDDK9EdGGqrIZfRx9JeFD3bMOrwT3Ffxw30rwcYpzlOyjFz3
TIUeNthGXoQNRxZfTE+iybotKWcXnxCNe4U1Mvc2liY6+Z0jazDWfs+jV7CiuX1G
Xk8jpTCR
=JMg4
-----END PGP SIGNATURE-----
Merge tag 'constfy-sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl constification from Joel Granados:
"Treewide constification of the ctl_table argument of proc_handlers
using a coccinelle script and some manual code formatting fixups.
This is a prerequisite to moving the static ctl_table structs into
read-only data section which will ensure that proc_handler function
pointers cannot be modified"
* tag 'constfy-sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
sysctl: treewide: constify the ctl_table argument of proc_handlers
Here is the big set of driver core changes for 6.11-rc1.
Lots of stuff in here, with not a huge diffstat, but apis are evolving
which required lots of files to be touched. Highlights of the changes
in here are:
- platform remove callback api final fixups (Uwe took many releases to
get here, finally!)
- Rust bindings for basic firmware apis and initial driver-core
interactions. It's not all that useful for a "write a whole driver
in rust" type of thing, but the firmware bindings do help out the
phy rust drivers, and the driver core bindings give a solid base on
which others can start their work. There is still a long way to go
here before we have a multitude of rust drivers being added, but
it's a great first step.
- driver core const api changes. This reached across all bus types,
and there are some fix-ups for some not-common bus types that
linux-next and 0-day testing shook out. This work is being done to
help make the rust bindings more safe, as well as the C code, moving
toward the end-goal of allowing us to put driver structures into
read-only memory. We aren't there yet, but are getting closer.
- minor devres cleanups and fixes found by code inspection
- arch_topology minor changes
- other minor driver core cleanups
All of these have been in linux-next for a very long time with no
reported problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZqH+aQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymoOQCfVBdLcBjEDAGh3L8qHRGMPy4rV2EAoL/r+zKm
cJEYtJpGtWX6aAtugm9E
=ZyJV
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big set of driver core changes for 6.11-rc1.
Lots of stuff in here, with not a huge diffstat, but apis are evolving
which required lots of files to be touched. Highlights of the changes
in here are:
- platform remove callback api final fixups (Uwe took many releases
to get here, finally!)
- Rust bindings for basic firmware apis and initial driver-core
interactions.
It's not all that useful for a "write a whole driver in rust" type
of thing, but the firmware bindings do help out the phy rust
drivers, and the driver core bindings give a solid base on which
others can start their work.
There is still a long way to go here before we have a multitude of
rust drivers being added, but it's a great first step.
- driver core const api changes.
This reached across all bus types, and there are some fix-ups for
some not-common bus types that linux-next and 0-day testing shook
out.
This work is being done to help make the rust bindings more safe,
as well as the C code, moving toward the end-goal of allowing us to
put driver structures into read-only memory. We aren't there yet,
but are getting closer.
- minor devres cleanups and fixes found by code inspection
- arch_topology minor changes
- other minor driver core cleanups
All of these have been in linux-next for a very long time with no
reported problems"
* tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits)
ARM: sa1100: make match function take a const pointer
sysfs/cpu: Make crash_hotplug attribute world-readable
dio: Have dio_bus_match() callback take a const *
zorro: make match function take a const pointer
driver core: module: make module_[add|remove]_driver take a const *
driver core: make driver_find_device() take a const *
driver core: make driver_[create|remove]_file take a const *
firmware_loader: fix soundness issue in `request_internal`
firmware_loader: annotate doctests as `no_run`
devres: Correct code style for functions that return a pointer type
devres: Initialize an uninitialized struct member
devres: Fix memory leakage caused by driver API devm_free_percpu()
devres: Fix devm_krealloc() wasting memory
driver core: platform: Switch to use kmemdup_array()
driver core: have match() callback in struct bus_type take a const *
MAINTAINERS: add Rust device abstractions to DRIVER CORE
device: rust: improve safety comments
MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer
MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER
firmware: rust: improve safety comments
...
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZqIl1AAKCRDbK58LschI
g/MdAP9oyZV9/IZ6Y6Z1fWfio0SB+yJGugcwbFjWcEtNrzsqJQEAwipQnemAI4NC
HBMfK2a/w7vhAFMXrP/SbkB/gUJJ7QE=
=vovf
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-07-25
We've added 14 non-merge commits during the last 8 day(s) which contain
a total of 19 files changed, 177 insertions(+), 70 deletions(-).
The main changes are:
1) Fix af_unix to disable MSG_OOB handling for sockets in BPF sockmap and
BPF sockhash. Also add test coverage for this case, from Michal Luczaj.
2) Fix a segmentation issue when downgrading gso_size in the BPF helper
bpf_skb_adjust_room(), from Fred Li.
3) Fix a compiler warning in resolve_btfids due to a missing type cast,
from Liwei Song.
4) Fix stack allocation for arm64 to align the stack pointer at a 16 byte
boundary in the fexit_sleep BPF selftest, from Puranjay Mohan.
5) Fix a xsk regression to require a flag when actuating tx_metadata_len,
from Stanislav Fomichev.
6) Fix function prototype BTF dumping in libbpf for prototypes that have
no input arguments, from Andrii Nakryiko.
7) Fix stacktrace symbol resolution in perf script for BPF programs
containing subprograms, from Hou Tao.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
bpf: Fix a segment issue when downgrading gso_size
tools/resolve_btfids: Fix comparison of distinct pointer types warning in resolve_btfids
bpf, events: Use prog to emit ksymbol event for main program
selftests/bpf: Test sockmap redirect for AF_UNIX MSG_OOB
selftests/bpf: Parametrize AF_UNIX redir functions to accept send() flags
selftests/bpf: Support SOCK_STREAM in unix_inet_redir_to_connected()
af_unix: Disable MSG_OOB handling for sockets in sockmap/sockhash
bpftool: Fix typo in usage help
libbpf: Fix no-args func prototype BTF dumping syntax
MAINTAINERS: Update powerpc BPF JIT maintainers
MAINTAINERS: Update email address of Naveen
selftests/bpf: fexit_sleep: Fix stack allocation for arm64
====================
Link: https://patch.msgid.link/20240725114312.32197-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The 'Fixes' commit recently changed the behaviour of TCP by skipping the
processing of the 3rd ACK when a sk->sk_socket is set. The goal was to
skip tcp_ack_snd_check() in tcp_rcv_state_process() not to send an
unnecessary ACK in case of simultaneous connect(). Unfortunately, that
had an impact on TFO and MPTCP.
I started to look at the impact on MPTCP, because the MPTCP CI found
some issues with the MPTCP Packetdrill tests [1]. Then Paolo Abeni
suggested me to look at the impact on TFO with "plain" TCP.
For MPTCP, when receiving the 3rd ACK of a request adding a new path
(MP_JOIN), sk->sk_socket will be set, and point to the MPTCP sock that
has been created when the MPTCP connection got established before with
the first path. The newly added 'goto' will then skip the processing of
the segment text (step 7) and not go through tcp_data_queue() where the
MPTCP options are validated, and some actions are triggered, e.g.
sending the MPJ 4th ACK [2] as demonstrated by the new errors when
running a packetdrill test [3] establishing a second subflow.
This doesn't fully break MPTCP, mainly the 4th MPJ ACK that will be
delayed. Still, we don't want to have this behaviour as it delays the
switch to the fully established mode, and invalid MPTCP options in this
3rd ACK will not be caught any more. This modification also affects the
MPTCP + TFO feature as well, and being the reason why the selftests
started to be unstable the last few days [4].
For TFO, the existing 'basic-cookie-not-reqd' test [5] was no longer
passing: if the 3rd ACK contains data, and the connection is accept()ed
before receiving them, these data would no longer be processed, and thus
not ACKed.
One last thing about MPTCP, in case of simultaneous connect(), a
fallback to TCP will be done, which seems fine:
`../common/defaults.sh`
0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_MPTCP) = 3
+0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0 > S 0:0(0) <mss 1460, sackOK, TS val 100 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
+0 < S 0:0(0) win 1000 <mss 1460, sackOK, TS val 407 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
+0 > S. 0:0(0) ack 1 <mss 1460, sackOK, TS val 330 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
+0 < S. 0:0(0) ack 1 win 65535 <mss 1460, sackOK, TS val 700 ecr 100, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey=2]>
+0 > . 1:1(0) ack 1 <nop, nop, TS val 845707014 ecr 700, nop, nop, sack 0:1>
Simultaneous SYN-data crossing is also not supported by TFO, see [6].
Kuniyuki Iwashima suggested to restrict the processing to SYN+ACK only:
that's a more generic solution than the one initially proposed, and
also enough to fix the issues described above.
Later on, Eric Dumazet mentioned that an ACK should still be sent in
reaction to the second SYN+ACK that is received: not sending a DUPACK
here seems wrong and could hurt:
0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3
+0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0 > S 0:0(0) <mss 1460, sackOK, TS val 1000 ecr 0,nop,wscale 8>
+0 < S 0:0(0) win 1000 <mss 1000, sackOK, nop, nop>
+0 > S. 0:0(0) ack 1 <mss 1460, sackOK, TS val 3308134035 ecr 0,nop,wscale 8>
+0 < S. 0:0(0) ack 1 win 1000 <mss 1000, sackOK, nop, nop>
+0 > . 1:1(0) ack 1 <nop, nop, sack 0:1> // <== Here
So in this version, the 'goto consume' is dropped, to always send an ACK
when switching from TCP_SYN_RECV to TCP_ESTABLISHED. This ACK will be
seen as a DUPACK -- with DSACK if SACK has been negotiated -- in case of
simultaneous SYN crossing: that's what is expected here.
Link: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/9936227696 [1]
Link: https://datatracker.ietf.org/doc/html/rfc8684#fig_tokens [2]
Link: https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/mptcp/syscalls/accept.pkt#L28 [3]
Link: https://netdev.bots.linux.dev/contest.html?executor=vmksft-mptcp-dbg&test=mptcp-connect-sh [4]
Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/server/basic-cookie-not-reqd.pkt#L21 [5]
Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/client/simultaneous-fast-open.pkt [6]
Fixes: 23e89e8ee7 ("tcp: Don't drop SYN+ACK for simultaneous connect().")
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240724-upstream-net-next-20240716-tcp-3rd-ack-consume-sk_socket-v3-1-d48339764ce9@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Julian reports that commit 341ac980ea ("xsk: Support tx_metadata_len")
can break existing use cases which don't zero-initialize xdp_umem_reg
padding. Introduce new XDP_UMEM_TX_METADATA_LEN to make sure we
interpret the padding as tx_metadata_len only when being explicitly
asked.
Fixes: 341ac980ea ("xsk: Support tx_metadata_len")
Reported-by: Julian Schindel <mail@arctic-alpaca.de>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20240713015253.121248-2-sdf@fomichev.me
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmagtw4ACgkQ1V2XiooU
IOTlQA//V22NlrHpu6iZ7DfDarRAKYcLgAxnQSoE9xu5P2+OEye1KQmSXngmr5xm
4YK2rDpjEU393Myrf2Rqr1qAENeRgE99c5U0R+SU2zQ+9z/LBmZIpLC/FaOPr4rJ
tFJMY9JyxQ/bp3KK+HPz23f/4FoiUQFeUmDwLSxV5uwhLCZ70tBs1492ATt9ESQe
kR69NiNjPgk1hswKFrMCBmilbUMN2dYPJDJmTn2qEdlMnIeRr6o/wcFVZtYnFOyX
L5aR6VwZ1WsNjzRnQodUMABkV3j6XNYUC5yjTX4mSvgVlEEoefbfiismZsC3j/dL
1IuYgHylnC+E6vHGQU6g5t/CLWcDEkJxfjvHfZrJlD1mycdT3PxSRWy0M2qNk6tq
rYeznfwJMwX2WVWVAx3Ba2RmBe3AQkhnypfsPMHFkCf4TZ+peDxPeZzVqlyUsJwI
jkLdnwZXXymd67iye/mtLpfG71dG4zQAoVGaX/6uS9BYlRepXO8uW0LF2f62u0kZ
DkzavZtcdXYvwk+AW+g0/FsKjLcjnETqrOC2K06pP4aHcjnNB8olLg8Ut26/Y506
16OxvUUEbV5Xsd0gquAWJmEI4TFY2ADXJOO4x2nP7GqHkRUeKo73zxTgY/68NCF5
l+Ur4eH5mbcAfLc+2nYyZEc6IaCKdyGkVq8kBoRiK7V09s/UOV4=
=49yO
-----END PGP SIGNATURE-----
Merge tag 'nf-24-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains a Netfilter fix for net:
Patch #1 if FPU is busy, then pipapo set backend falls back to standard
set element lookup. Moreover, disable bh while at this.
From Florian Westphal.
netfilter pull request 24-07-24
* tag 'nf-24-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_set_pipapo_avx2: disable softinterrupts
====================
Link: https://patch.msgid.link/20240724081305.3152-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
const qualify the struct ctl_table argument in the proc_handler function
signatures. This is a prerequisite to moving the static ctl_table
structs into .rodata data which will ensure that proc_handler function
pointers cannot be modified.
This patch has been generated by the following coccinelle script:
```
virtual patch
@r1@
identifier ctl, write, buffer, lenp, ppos;
identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos);
@r2@
identifier func, ctl, write, buffer, lenp, ppos;
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos)
{ ... }
@r3@
identifier func;
@@
int func(
- struct ctl_table *
+ const struct ctl_table *
,int , void *, size_t *, loff_t *);
@r4@
identifier func, ctl;
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int , void *, size_t *, loff_t *);
@r5@
identifier func, write, buffer, lenp, ppos;
@@
int func(
- struct ctl_table *
+ const struct ctl_table *
,int write, void *buffer, size_t *lenp, loff_t *ppos);
```
* Code formatting was adjusted in xfs_sysctl.c to comply with code
conventions. The xfs_stats_clear_proc_handler,
xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where
adjusted.
* The ctl_table argument in proc_watchdog_common was const qualified.
This is called from a proc_handler itself and is calling back into
another proc_handler, making it necessary to change it as part of the
proc_handler migration.
Co-developed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Co-developed-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Joel Granados <j.granados@samsung.com>
struct nexthop_grp contains two reserved fields that are not initialized by
nla_put_nh_group(), and carry garbage. This can be observed e.g. with
strace (edited for clarity):
# ip nexthop add id 1 dev lo
# ip nexthop add id 101 group 1
# strace -e recvmsg ip nexthop get id 101
...
recvmsg(... [{nla_len=12, nla_type=NHA_GROUP},
[{id=1, weight=0, resvd1=0x69, resvd2=0x67}]] ...) = 52
The fields are reserved and therefore not currently used. But as they are, they
leak kernel memory, and the fact they are not just zero complicates repurposing
of the fields for new ends. Initialize the full structure.
Fixes: 430a049190 ("nexthop: Add support for nexthop groups")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tipc_udp_addr2str() should return non-zero value if the UDP media
address is invalid. Otherwise, a buffer overflow access can occur in
tipc_media_addr_printf(). Fix this by returning 1 on an invalid UDP
media address.
Fixes: d0f91938be ("tipc: add ip/udp media type")
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Tung Nguyen <tung.q.nguyen@endava.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to disable softinterrupts, else we get following problem:
1. pipapo_avx2 called from process context; fpu usable
2. preempt_disable() called, pcpu scratchmap in use
3. softirq handles rx or tx, we re-enter pipapo_avx2
4. fpu busy, fallback to generic non-avx version
5. fallback reuses scratch map and index, which are in use
by the preempted process
Handle this same way as generic version by first disabling
softinterrupts while the scratchmap is in use.
Fixes: f0b3d33806 ("netfilter: nft_set_pipapo_avx2: Add irq_fpu_usable() check, fallback to non-AVX2 version")
Cc: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The Record Route IP option records the addresses of the routers that
routed the packet. In the case of forwarded packets, the kernel performs
a route lookup via fib_lookup() and fills in the preferred source
address of the matched route.
The lookup is performed with the DS field of the forwarded packet, but
using the RT_TOS() macro which only masks one of the two ECN bits. If
the packet is ECT(0) or CE, the matched route might be different than
the route via which the packet was forwarded as the input path masks
both of the ECN bits, resulting in the wrong address being filled in the
Record Route option.
Fix by masking both of the ECN bits.
Fixes: 8e36360ae8 ("ipv4: Remove route key identity dependencies in ip_rt_get_source().")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20240718123407.434778-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Kuan-Wei Chiu has significantly reworked the min_heap library code and
has taught bcachefs to use the new more generic implementation.
- Yury Norov's series "Cleanup cpumask.h inclusion in core headers"
reworks the cpumask and nodemask headers to make things generally more
rational.
- Kuan-Wei Chiu has sent along some maintenance work against our sorting
library code in the series "lib/sort: Optimizations and cleanups".
- More library maintainance work from Christophe Jaillet in the series
"Remove usage of the deprecated ida_simple_xx() API".
- Ryusuke Konishi continues with the nilfs2 fixes and clanups in the
series "nilfs2: eliminate the call to inode_attach_wb()".
- Kuan-Ying Lee has some fixes to the gdb scripts in the series "Fix GDB
command error".
- Plus the usual shower of singleton patches all over the place. Please
see the relevant changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZp2GvwAKCRDdBJ7gKXxA
jlf/AP48xP5ilIHbtpAKm2z+MvGuTxJQ5VSC0UXFacuCbc93lAEA+Yo+vOVRmh6j
fQF2nVKyKLYfSz7yqmCyAaHWohIYLgg=
=Stxz
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- In the series "treewide: Refactor heap related implementation",
Kuan-Wei Chiu has significantly reworked the min_heap library code
and has taught bcachefs to use the new more generic implementation.
- Yury Norov's series "Cleanup cpumask.h inclusion in core headers"
reworks the cpumask and nodemask headers to make things generally
more rational.
- Kuan-Wei Chiu has sent along some maintenance work against our
sorting library code in the series "lib/sort: Optimizations and
cleanups".
- More library maintainance work from Christophe Jaillet in the series
"Remove usage of the deprecated ida_simple_xx() API".
- Ryusuke Konishi continues with the nilfs2 fixes and clanups in the
series "nilfs2: eliminate the call to inode_attach_wb()".
- Kuan-Ying Lee has some fixes to the gdb scripts in the series "Fix
GDB command error".
- Plus the usual shower of singleton patches all over the place. Please
see the relevant changelogs for details.
* tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (98 commits)
ia64: scrub ia64 from poison.h
watchdog/perf: properly initialize the turbo mode timestamp and rearm counter
tsacct: replace strncpy() with strscpy()
lib/bch.c: use swap() to improve code
test_bpf: convert comma to semicolon
init/modpost: conditionally check section mismatch to __meminit*
init: remove unused __MEMINIT* macros
nilfs2: Constify struct kobj_type
nilfs2: avoid undefined behavior in nilfs_cnt32_ge macro
math: rational: add missing MODULE_DESCRIPTION() macro
lib/zlib: add missing MODULE_DESCRIPTION() macro
fs: ufs: add MODULE_DESCRIPTION()
lib/rbtree.c: fix the example typo
ocfs2: add bounds checking to ocfs2_check_dir_entry()
fs: add kernel-doc comments to ocfs2_prepare_orphan_dir()
coredump: simplify zap_process()
selftests/fpu: add missing MODULE_DESCRIPTION() macro
compiler.h: simplify data_race() macro
build-id: require program headers to be right after ELF header
resource: add missing MODULE_DESCRIPTION()
...
Including fixes from netfilter.
Current release - new code bugs:
- eth: fbnic: fix s390 build.
- eth: airoha: fix NULL pointer dereference in airoha_qdma_cleanup_rx_queue()
Previous releases - regressions:
- flow_dissector: use DEBUG_NET_WARN_ON_ONCE
- ipv4: fix incorrect TOS in route get reply
- dsa: fix chip-wide frame size config in some drivers
Previous releases - always broken:
- netfilter: nf_set_pipapo: fix initial map fill
- eth: gve: fix XDP TX completion handling when counters overflow
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmaafj4SHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkSsIP/jLODokNb/RcIuOYZVlgn2C5icMeZtqv
6BIliZeE1EcnMWqOnheHaJVklq17ChAYbj/GNN8zQeOQhPA2eFCzPPLl/UpF9/ik
wuvk+QQ4EM3T/SwWLKmht9UJioVi66szl3vZ+ByJ4BgGiJWORW1dK/AYzyYrsplk
BMpQTs/Q/ekzWJzBtX+1Cz8izX1gl+dXhMezSdg9cW4KaBu1Hqwny954HF6keUQn
h7bbAWiOAb5bhqUzgCdgF4gXp+uaWEzG1a1kY1G86NjXC5H0G03+vl37RbY/1kYh
lUa/3nfH2V7Sy+MYTvjQe66obVeQeOh/PhRMlbkEVphpKs8XtHfsP433iOMZZn2D
Z2Yb4yICnpNwbPmK1fyFvOrY7zGp2yZ2dSDEhowGjcyoQPO6RPnBfvArl66phIOm
DWfEq79dYa0eEnCM174BLZbsjcHeeYQX2b8lagUslC0Oel+ijUG26XeMqs5W3at9
QVcMV+aPE7pibpAq4M+qqkldv49QmXRsQai0BMa0+qEPyDKLe2nnf6L+TrDfN7Zf
tpiiZZZGpcctZ1cOfyuebB37z/jtHJQ94IfLCD9RzpHlpOtCFycO18adrohhI58Q
TaZf7U1CqC/UREckwE5F9ypQhdiQHEH84rgvKT3zFFRYftWTQmBCAUs3JzBPbYwO
RO2GDFM/htOu
=Q1xF
-----END PGP SIGNATURE-----
Merge tag 'net-6.11-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter.
Notably this includes fixes for a s390 build breakage.
Current release - new code bugs:
- eth: fbnic: fix s390 build
- eth: airoha: fix NULL pointer dereference in
airoha_qdma_cleanup_rx_queue()
Previous releases - regressions:
- flow_dissector: use DEBUG_NET_WARN_ON_ONCE
- ipv4: fix incorrect TOS in route get reply
- dsa: fix chip-wide frame size config in some drivers
Previous releases - always broken:
- netfilter: nf_set_pipapo: fix initial map fill
- eth: gve: fix XDP TX completion handling when counters overflow"
* tag 'net-6.11-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net:
eth: fbnic: don't build the driver when skb has more than 21 frags
net: dsa: b53: Limit chip-wide jumbo frame config to CPU ports
net: dsa: mv88e6xxx: Limit chip-wide frame size config to CPU ports
net: airoha: Fix NULL pointer dereference in airoha_qdma_cleanup_rx_queue()
net: wwan: t7xx: add support for Dell DW5933e
ipv4: Fix incorrect TOS in fibmatch route get reply
ipv4: Fix incorrect TOS in route get reply
net: flow_dissector: use DEBUG_NET_WARN_ON_ONCE
driver core: auxiliary bus: Fix documentation of auxiliary_device
net: airoha: fix error branch in airoha_dev_xmit and airoha_set_gdm_ports
gve: Fix XDP TX completion handling when counters overflow
ipvs: properly dereference pe in ip_vs_add_service
selftests: netfilter: add test case for recent mismatch bug
netfilter: nf_set_pipapo: fix initial map fill
netfilter: ctnetlink: use helper function to calculate expect ID
eth: fbnic: fix s390 build.
Several new features here:
- Virtio find vqs API has been reworked
(required to fix the scalability issue we have with
adminq, which I hope to merge later in the cycle)
- vDPA driver for Marvell OCTEON
- virtio fs performance improvement
- mlx5 migration speedups
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmaXjQQPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpnIsH/jVNqAQbe/vaBQdNMdnsA+P9A9unLbYRxYCQ
tN73mQRIXKtnZHBRAEbMGq52HPYg8HlN2HJSgyNo6I6t8VD+PiOco7m+3GpmqEcW
aXPOPl0BAbVoDgyutxRuuodP8Z61lBx0mG6iOxpzTXOPGlpQqtPCFHO8YnodqnPf
tMix/5uAqgZKV2siCbw5DtzwEc0gDHU8qsD0/nyoS5nBDF9yh/ardr5P/qiyFDQH
atCNYTOhIFU83pLAaw0fpCGbkt7gxf+5RpWVx3wkYww+/MwvYhsveRvQyaGbBz3n
WDtET3SOtVTta98OAGIKCq/2z8f6mYXBP7vXapBgnJG3vwS/poQ=
=LYua
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"Several new features here:
- Virtio find vqs API has been reworked (required to fix the
scalability issue we have with adminq, which I hope to merge later
in the cycle)
- vDPA driver for Marvell OCTEON
- virtio fs performance improvement
- mlx5 migration speedups
Fixes, cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (56 commits)
virtio: rename virtio_find_vqs_info() to virtio_find_vqs()
virtio: remove unused virtio_find_vqs() and virtio_find_vqs_ctx() helpers
virtio: convert the rest virtio_find_vqs() users to virtio_find_vqs_info()
virtio_balloon: convert to use virtio_find_vqs_info()
virtiofs: convert to use virtio_find_vqs_info()
scsi: virtio_scsi: convert to use virtio_find_vqs_info()
virtio_net: convert to use virtio_find_vqs_info()
virtio_crypto: convert to use virtio_find_vqs_info()
virtio_console: convert to use virtio_find_vqs_info()
virtio_blk: convert to use virtio_find_vqs_info()
virtio: rename find_vqs_info() op to find_vqs()
virtio: remove the original find_vqs() op
virtio: call virtio_find_vqs_info() from virtio_find_single_vq() directly
virtio: convert find_vqs() op implementations to find_vqs_info()
virtio_pci: convert vp_*find_vqs() ops to find_vqs_info()
virtio: introduce virtio_queue_info struct and find_vqs_info() config op
virtio: make virtio_find_single_vq() call virtio_find_vqs()
virtio: make virtio_find_vqs() call virtio_find_vqs_ctx()
caif_virtio: use virtio_find_single_vq() for single virtqueue finding
vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready()
...
New Features:
* Add support for large folios
* Implement rpcrdma generic device removal notification
* Add client support for attribute delegations
* Use a LAYOUTRETURN during reboot recovery to report layoutstats and errors
* Improve throughput for random buffered writes
* Add NVMe support to pnfs/blocklayout
Bugfixes:
* Fix rpcrdma_reqs_reset()
* Avoid soft lockups when using UDP
* Fix an nfs/blocklayout premature PR key unregestration
* Another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server
* Do not extend writes to the entire folio
* Pass explicit offset and count values to tracepoints
* Fix a race to wake up sleeping SUNRPC sync tasks
* Fix gss_status tracepoint output
Cleanups:
* Add missing MODULE_DESCRIPTION() macros
* Add blocklayout / SCSI layout tracepoints
* Remove asm-generic headers from xprtrdma verbs.c
* Remove unused 'struct mnt_fhstatus'
* Other delegation related cleanups
* Other folio related cleanups
* Other pNFS related cleanups
* Other xprtrdma cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmaZgr0ACgkQ18tUv7Cl
QOv8FxAAnUyYG7Kdbv+5Ko/SFv0imxCb5DQh2XC/hSHNrlKBlDnqe2PANXR9XocL
mS0Wry5tZf/T+o+QoKv0HQUdWFlnqKzwclggrekf/lkioU1feWsLe2RzDl1iUh0V
6fwcCyWXW1mYX2CtCaDe+/ZFcoZOMD+bItNHt/RdDScSnS9Jd8GSyocsVKsqaBx6
3wub0FJ4UBgYNoX2T3YyK2JwvO9GLaKIQRJV74rjgPJKjcjhptbcb5MKBmOZrF95
UCcpl4CwvD9RTsSEp0B98UbAFFpk8Nw1tmHF3GmyG/nsrJomDuLKFvbsiq23eHUf
XeULZIbjMEzU56vjoTglZA4s7JYx17D0vzdPGUqU4mLN3LPm5LtGLBg2uQoPw/xW
50euLU+ol36mfnQlBsuM/tAXgtoAcT63aNeNRNp8aOL47xA+PC6kWTBK9OaR5+x6
w+d22Dpy+riMk1TRaAVt0ANcENKELsWRFvxkuWCpQhVoQ1h8LigQJzeggEEK7Sa6
5u9H6wCTee2wz746uwA43koj1utuyrLq/5S+qEtCY1pbP3U0A+Gh0Xh00OXiYuzL
TgRdksmiAL8cA51WjSrq6HhGLOUJAYLfbdKaVhW+fULxUVwzWhFFaFbbdiq/e4OR
0pfqls8UZWICE51GeTfalEidpKZgV/LxU3QOuVoalWBULyj/TeI=
=avTW
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"New Features:
- Add support for large folios
- Implement rpcrdma generic device removal notification
- Add client support for attribute delegations
- Use a LAYOUTRETURN during reboot recovery to report layoutstats
and errors
- Improve throughput for random buffered writes
- Add NVMe support to pnfs/blocklayout
Bugfixes:
- Fix rpcrdma_reqs_reset()
- Avoid soft lockups when using UDP
- Fix an nfs/blocklayout premature PR key unregestration
- Another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server
- Do not extend writes to the entire folio
- Pass explicit offset and count values to tracepoints
- Fix a race to wake up sleeping SUNRPC sync tasks
- Fix gss_status tracepoint output
Cleanups:
- Add missing MODULE_DESCRIPTION() macros
- Add blocklayout / SCSI layout tracepoints
- Remove asm-generic headers from xprtrdma verbs.c
- Remove unused 'struct mnt_fhstatus'
- Other delegation related cleanups
- Other folio related cleanups
- Other pNFS related cleanups
- Other xprtrdma cleanups"
* tag 'nfs-for-6.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (63 commits)
SUNRPC: Fixup gss_status tracepoint error output
SUNRPC: Fix a race to wake a sync task
nfs: split nfs_read_folio
nfs: pass explicit offset/count to trace events
nfs: do not extend writes to the entire folio
nfs/blocklayout: add support for NVMe
nfs: remove nfs_page_length
nfs: remove the unused max_deviceinfo_size field from struct pnfs_layoutdriver_type
nfs: don't reuse partially completed requests in nfs_lock_and_join_requests
nfs: move nfs_wait_on_request to write.c
nfs: fold nfs_page_group_lock_subrequests into nfs_lock_and_join_requests
nfs: fold nfs_folio_find_and_lock_request into nfs_lock_and_join_requests
nfs: simplify nfs_folio_find_and_lock_request
nfs: remove nfs_folio_private_request
nfs: remove dead code for the old swap over NFS implementation
NFSv4.1 another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server
nfs: Block on write congestion
nfs: Properly initialize server->writeback
nfs: Drop pointless check from nfs_commit_release_pages()
nfs/blocklayout: SCSI layout trace points for reservation key reg/unreg
...
We've observed NFS clients with sync tasks sleeping in __rpc_execute
waiting on RPC_TASK_QUEUED that have not responded to a wake-up from
rpc_make_runnable(). I suspect this problem usually goes unnoticed,
because on a busy client the task will eventually be re-awoken by another
task completion or xprt event. However, if the state manager is draining
the slot table, a sync task missing a wake-up can result in a hung client.
We've been able to prove that the waker in rpc_make_runnable() successfully
calls wake_up_bit() (ie- there's no race to tk_runstate), but the
wake_up_bit() call fails to wake the waiter. I suspect the waker is
missing the load of the bit's wait_queue_head, so waitqueue_active() is
false. There are some very helpful comments about this problem above
wake_up_bit(), prepare_to_wait(), and waitqueue_active().
Fix this by inserting smp_mb__after_atomic() before the wake_up_bit(),
which pairs with prepare_to_wait() calling set_current_state().
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmaYOsUACgkQ1V2XiooU
IOSZfhAAiylt0bXfNLh2dGFpigNqejO0y7uJG/lNALA6kbhqkfdaXyiEGI+ZyIUM
qtkpk0BUDEhCtpspOVLNtDYcZkKuH7zzt6MsA9BcNm/5AS4XPB56buPsZ94ic3Uk
LZB8X/+q+gT+HC3RF4hf92rfJP6WKvt3mO8GGLstg0r6orGX6DywfwD+jCAIS/HS
CgDot0SXS0MQZpTUoMugRpGF8wWBANLjUvLYNjIQhV5vLpRvX/Fal+St1+AsFnzq
hdtMDOmqJHy0GNwrOdWX5muK+3X5Fg1rHUIgA1SqALd52lpsv62U9JHN6cvV8nqG
h24u3v17rtHg82zbKz4pBVHwurDyIT3TM3Pmp8v+b8ZnYmWsyXLr3lqv3GU5Rx14
HWsEi+FL/ND2mfWF+GOn4JqkWy/97eZ78VkekuuWXfzuNAXedp7OUw7RgfhCl5Pr
DZPus0rrnl3nCOeQyXGpo0RacFDOhKESa0EBjHS0+S8nv4vpzS+5XbiZctM0FlSu
QhwZM2Q+M6Klr0LWkLYyS99s5Cra3z5z/Oud7xtnlBOp81r0O9YVBYopoV6MM0kV
b6IHvOvs+5W+dj8RmH6Chq9D+EmFuYGwPoa5uv6zdSn/2ZeXqKeGq/HnlXuqct37
RCbuV8S1IzJi5EgtwbU0uxBclKkiyCtkhqSNrdjSHxLwf7wgI2s=
=I/g9
-----END PGP SIGNATURE-----
Merge tag 'nf-24-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for net:
1) Call nf_expect_get_id() to delete expectation by ID. By trial and
error it is possible to leak the LSB of the expectation address on
x86_64. This bug is a leftover when converting the existing code
to use nf_expect_get_id().
2) Incorrect initialization in pipapo set backend leads to packet
mismatches. From Florian Westphal.
3) Extend netfilter's selftests to cover for the pipapo set backend,
also from Florian.
4) Fix sparse warning in IPVS when adding service, from Chen Hanxiao.
netfilter pull request 24-07-17
* tag 'nf-24-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
ipvs: properly dereference pe in ip_vs_add_service
selftests: netfilter: add test case for recent mismatch bug
netfilter: nf_set_pipapo: fix initial map fill
netfilter: ctnetlink: use helper function to calculate expect ID
====================
Link: https://patch.msgid.link/20240717215214.225394-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The TOS value that is returned to user space in the route get reply is
the one with which the lookup was performed ('fl4->flowi4_tos'). This is
fine when the matched route is configured with a TOS as it would not
match if its TOS value did not match the one with which the lookup was
performed.
However, matching on TOS is only performed when the route's TOS is not
zero. It is therefore possible to have the kernel incorrectly return a
non-zero TOS:
# ip link add name dummy1 up type dummy
# ip address add 192.0.2.1/24 dev dummy1
# ip route get fibmatch 192.0.2.2 tos 0xfc
192.0.2.0/24 tos 0x1c dev dummy1 proto kernel scope link src 192.0.2.1
Fix by instead returning the DSCP field from the FIB result structure
which was populated during the route lookup.
Output after the patch:
# ip link add name dummy1 up type dummy
# ip address add 192.0.2.1/24 dev dummy1
# ip route get fibmatch 192.0.2.2 tos 0xfc
192.0.2.0/24 dev dummy1 proto kernel scope link src 192.0.2.1
Extend the existing selftests to not only verify that the correct route
is returned, but that it is also returned with correct "tos" value (or
without it).
Fixes: b61798130f ("net: ipv4: RTM_GETROUTE: return matched fib result when requested")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The TOS value that is returned to user space in the route get reply is
the one with which the lookup was performed ('fl4->flowi4_tos'). This is
fine when the matched route is configured with a TOS as it would not
match if its TOS value did not match the one with which the lookup was
performed.
However, matching on TOS is only performed when the route's TOS is not
zero. It is therefore possible to have the kernel incorrectly return a
non-zero TOS:
# ip link add name dummy1 up type dummy
# ip address add 192.0.2.1/24 dev dummy1
# ip route get 192.0.2.2 tos 0xfc
192.0.2.2 tos 0x1c dev dummy1 src 192.0.2.1 uid 0
cache
Fix by adding a DSCP field to the FIB result structure (inside an
existing 4 bytes hole), populating it in the route lookup and using it
when filling the route get reply.
Output after the patch:
# ip link add name dummy1 up type dummy
# ip address add 192.0.2.1/24 dev dummy1
# ip route get 192.0.2.2 tos 0xfc
192.0.2.2 dev dummy1 src 192.0.2.1 uid 0
cache
Fixes: 1a00fee4ff ("ipv4: Remove rt_key_{src,dst,tos} from struct rtable.")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Use pe directly to resolve sparse warning:
net/netfilter/ipvs/ip_vs_ctl.c:1471:27: warning: dereference of noderef expression
Fixes: 39b9722315 ("ipvs: handle connections started by real-servers")
Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
AF_UNIX socket tracks the most recent OOB packet (in its receive queue)
with an `oob_skb` pointer. BPF redirecting does not account for that: when
an OOB packet is moved between sockets, `oob_skb` is left outdated. This
results in a single skb that may be accessed from two different sockets.
Take the easy way out: silently drop MSG_OOB data targeting any socket that
is in a sockmap or a sockhash. Note that such silent drop is akin to the
fate of redirected skb's scm_fp_list (SCM_RIGHTS, SCM_CREDENTIALS).
For symmetry, forbid MSG_OOB in unix_bpf_recvmsg().
Fixes: 314001f0bf ("af_unix: Add OOB support")
Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20240713200218.2140950-2-mhal@rbox.co
This is a light release containing optimizations, code clean-ups,
and minor bug fixes. This development cycle focused on work outside
of upstream kernel development:
1. Continuing to build upstream CI for NFSD based on kdevops
2. Continuing to focus on the quality of NFSD in LTS kernels
3. Participation in IETF nfsv4 WG discussions about NFSv4 ACLs,
directory delegation, and NFSv4.2 COPY offload
Notable features in v6.11 that were not pulled through the NFSD tree
include NFS server-side support for the new pNFS NVMe layout type
[RFC9561]. Functional testing for pNFS block layouts like this one
has been introduced to our kdevops CI harness. Work on improving
the resolution of file attribute time stamps in local filesystems
is also ongoing tree-wide.
As always I am grateful to NFSD contributors, reviewers, testers,
and bug reporters who participated during this cycle.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmaVM0cACgkQM2qzM29m
f5fzOQ//c5CXIF3zCLIUofm5eZSP2zIszmHR75rVTEnf0Ehm2BJRF6VZiTvWXRpz
bOuswxfV1Bds+TofbPIP8jqDcMp8NIXemdb6+QMwh4FDY4M8t1v6TRYt35L6Ulrq
bSV81aRS622ofQ35sRzwxpGX6rB6YbB+5L4EKuxdEqRKSB8rCxQcjPy2qypcWlRC
hEAGDe3IiVxTz4VQBpASRqbH9Udw/XEqIhv5c8aLtPvl8i+yWyV5m2G5FMRdBj49
u8rCLoPi/mON8TDs2U4pbhcdgfBWWvGS6woFp6qrqM0wzXIPLalWsPGK3DUtuFUg
onxsClJXMWUvW4k4hbjiqosduLGY/kMeX62Lx1dCj/gktrJpU0GDNR/XbBhHU+hj
UT2CL8AfedC4FQekdyJri/rDgPiTMsf8UE0lgtchHMUXH0ztrjaRxMGiIFMm5vCl
dJBMGJfCkKR/+U1YrGRQI0tPL8CJKYI8klOEhLoOsCr/WC9p4nvvAzSg4W9mNK5P
ni4+KU4f/bj8U0Ap2bUacTpXj6W8VcwJWeuHahVA1Slo+eqXO401hj4W88dQmm9O
ZDR5h+6PI6KoL/KL6I4EyOv+sIEtW3s18cEWbSSu3N/CPuhSGTx8d2J201shJXRN
uDdMkvbwv48x20pgD2oTkPrZbJHOL3BK5/WPBg7pwpfkoRrBAhY=
=Xd5e
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"This is a light release containing optimizations, code clean-ups, and
minor bug fixes.
This development cycle focused on work outside of upstream kernel
development:
- Continuing to build upstream CI for NFSD based on kdevops
- Continuing to focus on the quality of NFSD in LTS kernels
- Participation in IETF nfsv4 WG discussions about NFSv4 ACLs,
directory delegation, and NFSv4.2 COPY offload
Notable features for v6.11 that do not come through the NFSD tree
include NFS server-side support for the new pNFS NVMe layout type
[RFC9561]. Functional testing for pNFS block layouts like this one has
been introduced to our kdevops CI harness. Work on improving the
resolution of file attribute time stamps in local filesystems is also
ongoing tree-wide.
As always I am grateful to NFSD contributors, reviewers, testers, and
bug reporters who participated during this cycle"
* tag 'nfsd-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: nfsd_file_lease_notifier_call gets a file_lease as an argument
gss_krb5: Fix the error handling path for crypto_sync_skcipher_setkey
MAINTAINERS: Add a bugzilla link for NFSD
nfsd: new netlink ops to get/set server pool_mode
sunrpc: refactor pool_mode setting code
nfsd: allow passing in array of thread counts via netlink
nfsd: make nfsd_svc take an array of thread counts
sunrpc: fix up the special handling of sv_nrpools == 1
SUNRPC: Add a trace point in svc_xprt_deferred_close
NFSD: Support write delegations in LAYOUTGET
lockd: Use *-y instead of *-objs in Makefile
NFSD: Fix nfsdcld warning
svcrdma: Handle ADDR_CHANGE CM event properly
svcrdma: Refactor the creation of listener CMA ID
NFSD: remove unused structs 'nfsd3_voidargs'
NFSD: harden svcxdr_dupstr() and svcxdr_tmpalloc() against integer overflows
The initial buffer has to be inited to all-ones, but it must restrict
it to the size of the first field, not the total field size.
After each round in the map search step, the result and the fill map
are swapped, so if we have a set where f->bsize of the first element
is smaller than m->bsize_max, those one-bits are leaked into future
rounds result map.
This makes pipapo find an incorrect matching results for sets where
first field size is not the largest.
Followup patch adds a test case to nft_concat_range.sh selftest script.
Thanks to Stefano Brivio for pointing out that we need to zero out
the remainder explicitly, only correcting memset() argument isn't enough.
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Reported-by: Yi Chen <yiche@redhat.com>
Cc: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>