As of commit 5801f064e3 ("net: ipv6: unexport __init-annotated seg6_hmac_init()"),
EXPORT_SYMBOL and __init is a bad combination because the .init.text
section is freed up after the initialization. Hence, modules cannot
use symbols annotated __init. The access to a freed symbol may end up
with kernel panic.
This remove the EXPORT_SYMBOL to fix modpost warning:
WARNING: modpost: vmlinux.o(___ksymtab+seg6_hmac_net_init+0x0): Section mismatch in reference from the variable __ksymtab_seg6_hmac_net_init to the function .init.text:seg6_hmac_net_init()
The symbol seg6_hmac_net_init is exported and annotated __init
Fix this by removing the __init annotation of seg6_hmac_net_init or drop the export.
Fixes: bf355b8d2c ("ipv6: sr: add core files for SR HMAC support")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20220628033134.21088-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When kcalloc fails, ipip6_tunnel_get_prl() should return -ENOMEM.
Move the position of label "out" to return correctly.
Addresses-Coverity: ("Unused value")
Fixes: 300aaeeaab ("[IPV6] SIT: Add SIOCGETPRL ioctl to get/dump PRL.")
Signed-off-by: katrinzhou <katrinzhou@tencent.com>
Reviewed-by: Eric Dumazet<edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220628035030.1039171-1-zys.zljxml@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mat Martineau says:
====================
mptcp: Fixes for 5.19
Several categories of fixes from the mptcp tree:
Patches 1-3 are fixes related to MP_FAIL and FASTCLOSE, to make sure
MIBs are accurate, and to handle MP_FAIL transmission and responses at
the correct times. sk_timer conflicts are also resolved.
Patches 4 and 6 handle two separate race conditions, one at socket
shutdown and one with unaccepted subflows.
Patch 5 makes sure read operations are not blocked during fallback to
TCP.
Patch 7 improves the diag selftest, which were incorrectly failing on
slow machines (like the VMs used for CI testing).
Patch 8 avoids possible symbol redefinition errors in the userspace
mptcp.h file.
Patch 9 fixes a selftest build issue with gcc 12.
====================
Link: https://lore.kernel.org/r/20220628010243.166605-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In a few MPTCP selftest tools, gcc 12 complains that the 'sock' variable
might be used uninitialized. This is a false positive because the only
code path that could lead to uninitialized access is where getaddrinfo()
fails, but the local xgetaddrinfo() wrapper exits if such a failure
occurs.
Initialize the 'sock' variable anyway to allow the tools to build with
gcc 12.
Fixes: 048d19d444 ("mptcp: add basic kselftest for mptcp")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Including <linux/mptcp.h> before the C library <netinet/in.h> header
causes symbol redefinition errors at compile-time due to duplicate
declarations and definitions in the <linux/in.h> header included by
<linux/mptcp.h>.
Explicitly include <netinet/in.h> before <linux/in.h> in
<linux/mptcp.h> when __KERNEL__ is not defined so that the C library
compatibility logic in <linux/libc-compat.h> is enabled when including
<linux/mptcp.h> in user space code.
Fixes: c11c5906bc ("mptcp: add MPTCP_SUBFLOW_ADDRS getsockopt support")
Signed-off-by: Ossama Othman <ossama.othman@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The mentioned test-case still use an hard-coded-len sleep to
wait for a relative large number of connection to be established.
On very slow VM and with debug build such timeout could be exceeded,
causing failures in our CI.
Address the issue polling for the expected condition several times,
up to an unreasonable high amount of time. On reasonably fast system
the self-tests will be faster then before, on very slow one we will
still catch the correct condition.
Fixes: df62f2ec3d ("selftests/mptcp: add diag interface tests")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the listener socket owning the relevant request is closed,
it frees the unaccepted subflows and that causes later deletion
of the paired MPTCP sockets.
The mptcp socket's worker can run in the time interval between such delete
operations. When that happens, any access to msk->first will cause an UaF
access, as the subflow cleanup did not cleared such field in the mptcp
socket.
Address the issue explicitly traversing the listener socket accept
queue at close time and performing the needed cleanup on the pending
msk.
Note that the locking is a bit tricky, as we need to acquire the msk
socket lock, while still owning the subflow socket one.
Fixes: 86e39e0448 ("mptcp: keep track of local endpoint still available for each msk")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the MPTCP receive path reach a non fatal fall-back condition, e.g.
when the MPC sockets must fall-back to TCP, the existing code is a little
self-inconsistent: it reports that new data is available - return true -
but sets the MPC flag to the opposite value.
As the consequence read operations in some exceptional scenario may block
unexpectedly.
Address the issue setting the correct MPC read status. Additionally avoid
some code duplication in the fatal fall-back scenario.
Fixes: 9c81be0dbc ("mptcp: add MP_FAIL response support")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the MPTCP socket shutdown happens before a fallback
to TCP, and all the pending data have been already spooled,
we never close the TCP connection.
Address the issue explicitly checking for critical condition
at fallback time.
Fixes: 1e39e5a32a ("mptcp: infinite mapping sending")
Fixes: 0348c690ed ("mptcp: add the fallback check")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mptcp_mp_fail_no_response shouldn't be invoked on each worker run, it
should be invoked only when MP_FAIL response timeout occurs.
This patch refactors the MP_FAIL response logic.
It leverages the fact that only the MPC/first subflow can gracefully
fail to avoid unneeded subflows traversal: the failing subflow can
be only msk->first.
A new 'fail_tout' field is added to the subflow context to record the
MP_FAIL response timeout and use such field to reliably share the
timeout timer between the MP_FAIL event and the MPTCP socket close
timeout.
Finally, a new ack is generated to send out MP_FAIL notification as soon
as we hit the relevant condition, instead of waiting a possibly unbound
time for the next data packet.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/281
Fixes: d9fb797046 ("mptcp: Do not traverse the subflow connection list without lock")
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This allow moving a couple of conditional out of the fast path,
making the code more easy to follow and will simplify the next
patch.
Fixes: ae66fb2ba6 ("mptcp: Do TCP fallback on early DSS checksum failure")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current accounting for MP_FAIL and FASTCLOSE is not very
accurate: both can be increased even when the related option is
not really sent. Move the accounting into the correct place.
Fixes: eb7f33654d ("mptcp: add the mibs for MP_FAIL")
Fixes: 1e75629cb9 ("mptcp: add the mibs for MP_FASTCLOSE")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When routes corresponding to addresses are restored by
fixup_permanent_addr(), the dst_nopolicy parameter was not set.
The typical use case is a user that configures an address on a down
interface and then put this interface up.
Let's take care of this flag in addrconf_f6i_alloc(), so that every callers
benefit ont it.
CC: stable@kernel.org
CC: David Forster <dforster@brocade.com>
Fixes: df789fe752 ("ipv6: Provide ipv6 version of "disable_policy" sysctl")
Reported-by: Siwar Zitouni <siwar.zitouni@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220623120015.32640-1-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We should respect link partner capabilities and not force flow control
support on every link. Even more, in current state the MAC driver do not
advertises pause support so we should not keep flow control enabled at
all.
Fixes: e532a096be ("net: usb: asix: ax88772: add phylib support")
Reported-by: Anton Lundin <glance@acc.umu.se>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Tested-by: Anton Lundin <glance@acc.umu.se>
Link: https://lore.kernel.org/r/20220624075139.3139300-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If cable is attached after probe sequence, the usbnet framework would
not automatically start processing RX packets except at least one
packet was transmitted.
On systems with any kind of address auto configuration this issue was
not detected, because some packets are send immediately after link state
is changed to "running".
With this patch we will notify usbnet about link status change provided by the
PHYlib.
Fixes: e532a096be ("net: usb: asix: ax88772: add phylib support")
Reported-by: Anton Lundin <glance@acc.umu.se>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Tested-by: Anton Lundin <glance@acc.umu.se>
Link: https://lore.kernel.org/r/20220624075139.3139300-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mails to Charles get an auto reply, that he is no longer working at
Eff'Innov technologies. Drop the entry and mark the driver as orphaned.
Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220626200039.4062784-1-michael@walle.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Victor Nogueira says:
====================
Notify user space if any actions were flushed before error
This patch series fixes the behaviour of actions flush so that the
kernel always notifies user space whenever it deletes actions during a
flush operation, even if it didn't flush all the actions. This series
also introduces tdc tests to verify this new behaviour.
====================
Link: https://lore.kernel.org/r/20220623140742.684043-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add tdc test cases to verify new flush behaviour is correct, which do
the following:
- Try to flush only one action which is being referenced by a filter
- Try to flush three actions where the last one (index 3) is being
referenced by a filter
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If during an action flush operation one of the actions is still being
referenced, the flush operation is aborted and the kernel returns to
user space with an error. However, if the kernel was able to flush, for
example, 3 actions and failed on the fourth, the kernel will not notify
user space that it deleted 3 actions before failing.
This patch fixes that behaviour by notifying user space of how many
actions were deleted before flush failed and by setting extack with a
message describing what happened.
Fixes: 55334a5db5 ("net_sched: act: refuse to remove bound action outside")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
epic_close() calls epic_rx() and uses dma buffer, but in epic_remove_one()
we already freed the dma buffer. To fix this issue, reorder function calls
like in the .probe function.
BUG: KASAN: use-after-free in epic_rx+0xa6/0x7e0 [epic100]
Call Trace:
epic_rx+0xa6/0x7e0 [epic100]
epic_close+0xec/0x2f0 [epic100]
unregister_netdev+0x18/0x20
epic_remove_one+0xaa/0xf0 [epic100]
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Yilun Wu <yiluwu@cs.stonybrook.edu>
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Reviewed-by: Francois Romieu <romieu@fr.zoreil.com>
Link: https://lore.kernel.org/r/20220627043351.25615-1-ztong0001@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
While looking at a syzbot report I noticed the NAPI only gets
disabled before it's deleted. I think that user can detach
the queue before destroying the device and the NAPI will never
be stopped.
Fixes: 943170998b ("tun: enable NAPI for TUN/TAP driver")
Acked-by: Petar Penkov <ppenkov@aviatrix.com>
Link: https://lore.kernel.org/r/20220623042105.2274812-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shuang Li reported a NULL pointer dereference crash:
[] BUG: kernel NULL pointer dereference, address: 0000000000000068
[] RIP: 0010:tipc_link_is_up+0x5/0x10 [tipc]
[] Call Trace:
[] <IRQ>
[] tipc_bcast_rcv+0xa2/0x190 [tipc]
[] tipc_node_bc_rcv+0x8b/0x200 [tipc]
[] tipc_rcv+0x3af/0x5b0 [tipc]
[] tipc_udp_recv+0xc7/0x1e0 [tipc]
It was caused by the 'l' passed into tipc_bcast_rcv() is NULL. When it
creates a node in tipc_node_check_dest(), after inserting the new node
into hashtable in tipc_node_create(), it creates the bc link. However,
there is a gap between this insert and bc link creation, a bc packet
may come in and get the node from the hashtable then try to dereference
its bc link, which is NULL.
This patch is to fix it by moving the bc link creation before inserting
into the hashtable.
Note that for a preliminary node becoming "real", the bc link creation
should also be called before it's rehashed, as we don't create it for
preliminary nodes.
Fixes: 4cbf8ac2fe ("tipc: enable creating a "preliminary" node")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The "1<<31" shift has a sign extension bug so IFF_TX_SKB_NO_LINEAR is
0xffffffff80000000 instead of 0x0000000080000000.
Fixes: c2ff53d804 ("net: Add priv_flags for allow tx skb without linear")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/YrRrcGttfEVnf85Q@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Enguerrand de Ribaucourt says:
====================
net: dp83822: fix interrupt floods
The false carrier and RX error counters, once half full, produce interrupt
floods. Since we do not use these counters, these interrupts should be disabled.
====================
Link: https://lore.kernel.org/r/20220623134645.1858361-1-enguerrand.de-ribaucourt@savoirfairelinux.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some RX errors, notably when disconnecting the cable, increase the RCSR
register. Once half full (0x7fff), an interrupt flood is generated. I
measured ~3k/s interrupts even after the RX errors transfer was
stopped.
Since we don't read and clear the RCSR register, we should disable this
interrupt.
Fixes: 87461f7a58 ("net: phy: DP83822 initial driver submission")
Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When unplugging an Ethernet cable, false carrier events were produced by
the PHY at a very high rate. Once the false carrier counter full, an
interrupt was triggered every few clock cycles until the cable was
replugged. This resulted in approximately 10k/s interrupts.
Since the false carrier counter (FCSCR) is never used, we can safely
disable this interrupt.
In addition to improving performance, this also solved MDIO read
timeouts I was randomly encountering with an i.MX8 fec MAC because of
the interrupt flood. The interrupt count and MDIO timeout fix were
tested on a v5.4.110 kernel.
Fixes: 87461f7a58 ("net: phy: DP83822 initial driver submission")
Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the third packet of 3WHS connection establishment
contains payload, it is added into socket receive queue
without the XFRM check and the drop of connection tracking
context.
This means that if the data is left unread in the socket
receive queue, conntrack module can not be unloaded.
As most applications usually reads the incoming data
immediately after accept(), bug has been hiding for
quite a long time.
Commit 68822bdf76 ("net: generalize skb freeing
deferral to per-cpu lists") exposed this bug because
even if the application reads this data, the skb
with nfct state could stay in a per-cpu cache for
an arbitrary time, if said cpu no longer process RX softirqs.
Many thanks to Ilya Maximets for reporting this issue,
and for testing various patches:
https://lore.kernel.org/netdev/20220619003919.394622-1-i.maximets@ovn.org/
Note that I also added a missing xfrm4_policy_check() call,
although this is probably not a big issue, as the SYN
packet should have been dropped earlier.
Fixes: b59c270104 ("[NETFILTER]: Keep conntrack reference until IPsec policy checks are done")
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Tested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
Link: https://lore.kernel.org/r/20220623050436.1290307-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cover the scenario when we cannot insert a socket into the sockmap, because
it has it is using ULP. Failed insert should not have any effect on the ULP
state. This is a regression test.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20220623091231.417138-1-jakub@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
udpgso_bench.sh has been running its IPv6 TCP test with IPv4 arguments
since its initial conmit. Looks like a typo.
Fixes: 3a687bef14 ("selftests: udp gso benchmark")
Cc: willemb@google.com
Signed-off-by: Dimitris Michailidis <dmichail@fungible.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20220623000234.61774-1-dmichail@fungible.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzbot reported uninit-value in tcp_recvmsg() [1]
Issue here is that msg->msg_get_inq should have been cleared,
otherwise tcp_recvmsg() might read garbage and perform
more work than needed, or have undefined behavior.
Given CONFIG_INIT_STACK_ALL_ZERO=y is probably going to be
the default soon, I chose to change __sys_recvfrom() to clear
all fields but msghdr.addr which might be not NULL.
For __copy_msghdr_from_user(), I added an explicit clear
of kmsg->msg_get_inq.
[1]
BUG: KMSAN: uninit-value in tcp_recvmsg+0x6cf/0xb60 net/ipv4/tcp.c:2557
tcp_recvmsg+0x6cf/0xb60 net/ipv4/tcp.c:2557
inet_recvmsg+0x13a/0x5a0 net/ipv4/af_inet.c:850
sock_recvmsg_nosec net/socket.c:995 [inline]
sock_recvmsg net/socket.c:1013 [inline]
__sys_recvfrom+0x696/0x900 net/socket.c:2176
__do_sys_recvfrom net/socket.c:2194 [inline]
__se_sys_recvfrom net/socket.c:2190 [inline]
__x64_sys_recvfrom+0x122/0x1c0 net/socket.c:2190
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Local variable msg created at:
__sys_recvfrom+0x81/0x900 net/socket.c:2154
__do_sys_recvfrom net/socket.c:2194 [inline]
__se_sys_recvfrom net/socket.c:2190 [inline]
__x64_sys_recvfrom+0x122/0x1c0 net/socket.c:2190
CPU: 0 PID: 3493 Comm: syz-executor170 Not tainted 5.19.0-rc3-syzkaller-30868-g4b28366af7d9 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: f94fd25cb0 ("tcp: pass back data left in socket after receive")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Alexander Potapenko<glider@google.com>
Link: https://lore.kernel.org/r/20220622150220.1091182-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The pause settings reported by the PHY should also be applied to the GMII port
status override otherwise the switch will not generate pause frames towards the
link partner despite the advertisement saying otherwise.
Fixes: 246d7f773c ("net: dsa: add Broadcom SF2 switch driver")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220623030204.1966851-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
of_find_node_by_name() will decrease the refcount of its first arg and
we need a of_node_get() to keep refcount balance.
Fixes: 7d9ee2e8ff ("net: dsa: hellcreek: Add PTP status LEDs")
Signed-off-by: Liang He <windhl@126.com>
Link: https://lore.kernel.org/r/20220622040621.4094304-1-windhl@126.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current release - regressions:
- netfilter: cttimeout: fix slab-out-of-bounds read in cttimeout_net_exit
Current release - new code bugs:
- bpf: ftrace: keep address offset in ftrace_lookup_symbols
- bpf: force cookies array to follow symbols sorting
Previous releases - regressions:
- ipv4: ping: fix bind address validity check
- tipc: fix use-after-free read in tipc_named_reinit
- eth: veth: add updating of trans_start
Previous releases - always broken:
- sock: redo the psock vs ULP protection check
- netfilter: nf_dup_netdev: fix skb_under_panic
- bpf: fix request_sock leak in sk lookup helpers
- eth: igb: fix a use-after-free issue in igb_clean_tx_ring
- eth: ice: prohibit improper channel config for DCB
- eth: at803x: fix null pointer dereference on AR9331 phy
- eth: virtio_net: fix xdp_rxq_info bug after suspend/resume
Misc:
- eth: hinic: replace memcpy() with direct assignment
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmK0P+0SHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkmBkP/1m5Et04wgtlEfQJtudZj0Sadra0tu6P
vaYlqtiRNMziSY/hxG1p4w7giM4gD7fD3S12Pc/ueCaUwxxILN/eZ/hNgCq9huf6
IbmVmfq6YNZwDaNzFDP8UcIqjnxbg1B3XD41dN7+FggA9ogGFkOvuAcJByzdANVX
BLOkQmGP22+pNJmniH3KYvCZlHIa+LVeRjdjdM+1/LKDs2pxpBi97obyzb5zUiE5
c5E7+BhkGI9X6V1TuHVCHIEFssYNWLiTJcw76HptWmK9Z/DlDEeVlHzKbAMNTycl
I8eTLXnqgye0KCKOqJ4fN+YN42ypdDzrUILKMHGEddG1lOot/2XChgp8+EqMY7Nx
Gjpjh28jTsKdCZMFF3lxDGxeonHciP6lZA80g3GNk4FWUVrqnKEYpdy+6psTkpDr
HahjmFWylGXfmPIKJrsiVGIyxD4ObkRF6SSH7L8j5tAVGxaB5MDFrCws136kACCk
ZyZiXTS0J3Cn1fAb2/vGKgDFhbEWykITYPaiVo7pyrO1jju5qQTtiKiABpcX0Ejs
WxvPA8HB61+kEapIzBLhhxRl25CXTleGE986au2MVh0I/HuQBxVExrRE9FgThjwk
YbSKhR2JOcD5B94HRQXVsQ05q02JzxmB0kVbqSLcIAbCOuo++LZCIdwR5XxSpF6s
AAFhqQycWowh
=JFWo
-----END PGP SIGNATURE-----
Merge tag 'net-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf and netfilter.
Current release - regressions:
- netfilter: cttimeout: fix slab-out-of-bounds read in
cttimeout_net_exit
Current release - new code bugs:
- bpf: ftrace: keep address offset in ftrace_lookup_symbols
- bpf: force cookies array to follow symbols sorting
Previous releases - regressions:
- ipv4: ping: fix bind address validity check
- tipc: fix use-after-free read in tipc_named_reinit
- eth: veth: add updating of trans_start
Previous releases - always broken:
- sock: redo the psock vs ULP protection check
- netfilter: nf_dup_netdev: fix skb_under_panic
- bpf: fix request_sock leak in sk lookup helpers
- eth: igb: fix a use-after-free issue in igb_clean_tx_ring
- eth: ice: prohibit improper channel config for DCB
- eth: at803x: fix null pointer dereference on AR9331 phy
- eth: virtio_net: fix xdp_rxq_info bug after suspend/resume
Misc:
- eth: hinic: replace memcpy() with direct assignment"
* tag 'net-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
net: openvswitch: fix parsing of nw_proto for IPv6 fragments
sock: redo the psock vs ULP protection check
Revert "net/tls: fix tls_sk_proto_close executed repeatedly"
virtio_net: fix xdp_rxq_info bug after suspend/resume
igb: Make DMA faster when CPU is active on the PCIe link
net: dsa: qca8k: reduce mgmt ethernet timeout
net: dsa: qca8k: reset cpu port on MTU change
MAINTAINERS: Add a maintainer for OCP Time Card
hinic: Replace memcpy() with direct assignment
Revert "drivers/net/ethernet/neterion/vxge: Fix a use-after-free bug in vxge-main.c"
net: phy: smsc: Disable Energy Detect Power-Down in interrupt mode
ice: ethtool: Prohibit improper channel config for DCB
ice: ethtool: advertise 1000M speeds properly
ice: Fix switchdev rules book keeping
ice: ignore protocol field in GTP offload
netfilter: nf_dup_netdev: add and use recursion counter
netfilter: nf_dup_netdev: do not push mac header a second time
selftests: netfilter: correct PKTGEN_SCRIPT_PATHS in nft_concat_range.sh
net/tls: fix tls_sk_proto_close executed repeatedly
erspan: do not assume transport header is always set
...
All small changes, mostly device-specific:
- A regression fix for PCM WC-page allocation on x86
- A regression fix for i915 audio component binding
- Fixes for (longstanding) beep handling bug
- Runtime PM fixes for Intel LPE HDMI audio
- A couple of pending FireWire fixes
- Usual HD-audio and USB-audio quirks, new Intel dspconf entries
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmKzM0YOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE/eUg//bQ2kaPY4p75mBc7cxhwiJnxU8cnLxSJ4O7/A
Roz+9Sp741TIkY+jKXxvdLSZ8aqOF2H2Tx1DE1NQdOr55yVNIWJIZ6uu2Oue0MKJ
A7xsQalUf7tUehXTEBDjeojoM3WA7AYvpz3PLXWXsc1QCH/sM9baVAXdZ5SryuuT
7tCRsDwTtzHfIsS5N4rWEl9e8xzvTtQ/MluprbUia+WLrLorov4esDJokbCeolqV
sdB24xhBttJDwuQ7FANki+FIKldDdHQKuCFk4FocdnmiUWYTWVSkeAJbHaWjm9HJ
FZgXkD4UiwBmeiKIE0uCHaQ5QwOtJzXSI+eQU37VnAkfmQ3nyhxYrxXX4E3FxYss
NHoMXfCmequXTq/aF+QemortgxI+c78CpN7tC/45pJCspDko1o/44D/BwjldMRRp
f+4EiGM8xTYge53NmlTJvSAo55WdyDCWYX7JKxNbhH9KC1lUlNNnGemD5AhjeGRs
3nC95Iva7IWrDHVaAYmhMqmlCG9CZTJR/C4xWLNdBfdvyQKx5pRrqh5wlt24A0+0
crrylt56c3eDzOvCKwLj8cUFJyf9Qp2ZcxJowdxS1S+H0qZcWkg3i9gqDh7F7gRB
s7gLwjZym6rYylae7lxOZp8y81JNrHJRD2R8dAWjrIadxRFXYm8rFt61v6mjFt/V
jVkyGHQ=
=TcQa
-----END PGP SIGNATURE-----
Merge tag 'sound-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"All small changes, mostly device-specific:
- A regression fix for PCM WC-page allocation on x86
- A regression fix for i915 audio component binding
- Fixes for (longstanding) beep handling bug
- Runtime PM fixes for Intel LPE HDMI audio
- A couple of pending FireWire fixes
- Usual HD-audio and USB-audio quirks, new Intel dspconf entries"
* tag 'sound-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Add quirk for Clevo NS50PU
ALSA: hda: Fix discovery of i915 graphics PCI device
ALSA: hda/via: Fix missing beep setup
ALSA: hda/conexant: Fix missing beep setup
ALSA: memalloc: Drop x86-specific hack for WC allocations
ALSA: hda/realtek: Add quirk for Clevo PD70PNT
ALSA: x86: intel_hdmi_audio: use pm_runtime_resume_and_get()
ALSA: x86: intel_hdmi_audio: enable pm_runtime and set autosuspend delay
ALSA: hda: intel-nhlt: remove use of __func__ in dev_dbg
ALSA: hda: intel-dspcfg: use SOF for UpExtreme and UpExtreme11 boards
firewire: convert sysfs sprintf/snprintf family to sysfs_emit
firewire: cdev: fix potential leak of kernel stack due to uninitialized value
ALSA: hda/realtek: Apply fixup for Lenovo Yoga Duet 7 properly
ALSA: hda/realtek - ALC897 headset MIC no sound
ALSA: usb-audio: US16x08: Move overflow check before array access
ALSA: hda/realtek: Add mute LED quirk for HP Omen laptop
When a packet enters the OVS datapath and does not match any existing
flows installed in the kernel flow cache, the packet will be sent to
userspace to be parsed, and a new flow will be created. The kernel and
OVS rely on each other to parse packet fields in the same way so that
packets will be handled properly.
As per the design document linked below, OVS expects all later IPv6
fragments to have nw_proto=44 in the flow key, so they can be correctly
matched on OpenFlow rules. OpenFlow controllers create pipelines based
on this design.
This behavior was changed by the commit in the Fixes tag so that
nw_proto equals the next_header field of the last extension header.
However, there is no counterpart for this change in OVS userspace,
meaning that this field is parsed differently between OVS and the
kernel. This is a problem because OVS creates actions based on what is
parsed in userspace, but the kernel-provided flow key is used as a match
criteria, as described in Documentation/networking/openvswitch.rst. This
leads to issues such as packets incorrectly matching on a flow and thus
the wrong list of actions being applied to the packet. Such changes in
packet parsing cannot be implemented without breaking the userspace.
The offending commit is partially reverted to restore the expected
behavior.
The change technically made sense and there is a good reason that it was
implemented, but it does not comply with the original design of OVS.
If in the future someone wants to implement such a change, then it must
be user-configurable and disabled by default to preserve backwards
compatibility with existing OVS versions.
Cc: stable@vger.kernel.org
Fixes: fa642f0883 ("openvswitch: Derive IP protocol number for IPv6 later frags")
Link: https://docs.openvswitch.org/en/latest/topics/design/#fragments
Signed-off-by: Rosemarie O'Riorden <roriorden@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/20220621204845.9721-1-roriorden@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Commit 8a59f9d1e3 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()")
has moved the inet_csk_has_ulp(sk) check from sk_psock_init() to
the new tcp_bpf_update_proto() function. I'm guessing that this
was done to allow creating psocks for non-inet sockets.
Unfortunately the destruction path for psock includes the ULP
unwind, so we need to fail the sk_psock_init() itself.
Otherwise if ULP is already present we'll notice that later,
and call tcp_update_ulp() with the sk_proto of the ULP
itself, which will most likely result in the ULP looping
its callbacks.
Fixes: 8a59f9d1e3 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20220620191353.1184629-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This reverts commit 69135c572d.
This commit was just papering over the issue, ULP should not
get ->update() called with its own sk_prot. Each ULP would
need to add this check.
Fixes: 69135c572d ("net/tls: fix tls_sk_proto_close executed repeatedly")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20220620191353.1184629-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The following sequence currently causes a driver bug warning
when using virtio_net:
# ip link set eth0 up
# echo mem > /sys/power/state (or e.g. # rtcwake -s 10 -m mem)
<resume>
# ip link set eth0 down
Missing register, driver bug
WARNING: CPU: 0 PID: 375 at net/core/xdp.c:138 xdp_rxq_info_unreg+0x58/0x60
Call trace:
xdp_rxq_info_unreg+0x58/0x60
virtnet_close+0x58/0xac
__dev_close_many+0xac/0x140
__dev_change_flags+0xd8/0x210
dev_change_flags+0x24/0x64
do_setlink+0x230/0xdd0
...
This happens because virtnet_freeze() frees the receive_queue
completely (including struct xdp_rxq_info) but does not call
xdp_rxq_info_unreg(). Similarly, virtnet_restore() sets up the
receive_queue again but does not call xdp_rxq_info_reg().
Actually, parts of virtnet_freeze_down() and virtnet_restore_up()
are almost identical to virtnet_close() and virtnet_open(): only
the calls to xdp_rxq_info_(un)reg() are missing. This means that
we can fix this easily and avoid such problems in the future by
just calling virtnet_close()/open() from the freeze/restore handlers.
Aside from adding the missing xdp_rxq_info calls the only difference
is that the refill work is only cancelled if netif_running(). However,
this should not make any functional difference since the refill work
should only be active if the network interface is actually up.
Fixes: 754b8a21a9 ("virtio_net: setup xdp_rxq_info")
Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20220621114845.3650258-1-stephan.gerhold@kernkonzept.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-06-21
This series contains updates to ice driver only.
Marcin fixes GTP filters by allowing ignoring of the inner ethertype field.
Wojciech adds VSI handle tracking in order to properly distinguish similar
filters for removal.
Anatolii removes ability to set 1000baseT and 1000baseX fields
concurrently which caused link issues. He also disallows setting
channels to less than the number of Traffic Classes which would cause
NULL pointer dereference.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: ethtool: Prohibit improper channel config for DCB
ice: ethtool: advertise 1000M speeds properly
ice: Fix switchdev rules book keeping
ice: ignore protocol field in GTP offload
====================
Link: https://lore.kernel.org/r/20220621224756.631765-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Intel I210 on some Intel Alder Lake platforms can only achieve ~750Mbps
Tx speed via iperf. The RR2DCDELAY shows around 0x2xxx DMA delay, which
will be significantly lower when 1) ASPM is disabled or 2) SoC package
c-state stays above PC3. When the RR2DCDELAY is around 0x1xxx the Tx
speed can reach to ~950Mbps.
According to the I210 datasheet "8.26.1 PCIe Misc. Register - PCIEMISC",
"DMA Idle Indication" doesn't seem to tie to DMA coalesce anymore, so
set it to 1b for "DMA is considered idle when there is no Rx or Tx AND
when there are no TLPs indicating that CPU is active detected on the
PCIe link (such as the host executes CSR or Configuration register read
or write operation)" and performing Tx should also fall under "active
CPU on PCIe link" case.
In addition to that, commit b6e0c419f0 ("igb: Move DMA Coalescing init
code to separate function.") seems to wrongly changed from enabling
E1000_PCIEMISC_LX_DECISION to disabling it, also fix that.
Fixes: b6e0c419f0 ("igb: Move DMA Coalescing init code to separate function.")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220621221056.604304-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current mgmt ethernet timeout is set to 100ms. This value is too
big and would slow down any mdio command in case the mgmt ethernet
packet have some problems on the receiving part.
Reduce it to just 5ms to handle case when some operation are done on the
master port that would cause the mgmt ethernet to not work temporarily.
Fixes: 5950c7c0a6 ("net: dsa: qca8k: add support for mgmt read/write in Ethernet packet")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://lore.kernel.org/r/20220621151633.11741-1-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It was discovered that the Documentation lacks of a fundamental detail
on how to correctly change the MAX_FRAME_SIZE of the switch.
In fact if the MAX_FRAME_SIZE is changed while the cpu port is on, the
switch panics and cease to send any packet. This cause the mgmt ethernet
system to not receive any packet (the slow fallback still works) and
makes the device not reachable. To recover from this a switch reset is
required.
To correctly handle this, turn off the cpu ports before changing the
MAX_FRAME_SIZE and turn on again after the value is applied.
Fixes: f58d2598cf ("net: dsa: qca8k: implement the port MTU callbacks")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://lore.kernel.org/r/20220621151122.10220-1-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I've been contributing and reviewing patches for ptp_ocp driver for
some time and I'm taking care of it's github mirror. On Jakub's
suggestion, I would like to step forward and become a maintainer for
this driver. This patch adds a dedicated entry to MAINTAINERS.
Signed-off-by: Vadim Fedorenko <vadfed@fb.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/r/20220621233131.21240-1-vfedorenko@novek.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This Kselftest fixes update for Linux 5.19-rc4 consists of compile
time fixes and run-time resources leaks.
-- Fix clang cross compilation
-- Fix resource leak when return error
-- fix compile error for dma_map_benchmark
-- Fix regression - make use of GUP_TEST_FILE macro
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmKzUYsACgkQCwJExA0N
Qxx3txAA6u4Ot7xF4L5ZC2VSpiAfnVwyiusfUFgOIcqTTb/tVhtHR2YOlNp49ETi
LRA1vBM+OjmS5BDGEq10M1aiI4xyUB0jtEdts/pzouzk+rluIyjQy94taYsxVSq3
ZUBpe1h09MZAX4s5tZfpJDyOOkFU8kuFKFASUGzdu/lr/NHC97rehwLW5KYG0F2J
Pe4H1l5YM9ZODl5rgaMML2azCDqnp6/WpRU4Ov2W6q8T5X/cLwNxv7AnacMb4ajx
IP9n4rOR05EBqsk9JZ0qJ237PsaffWN/6heczqHPuwPiyL/VKpKDSZTrAu2yBl1v
oXS+kdGj1alvxDlfYPd5fMbvZyYyLQyinNwN/2X5MNYB9qnUc+DJIbqHbLtpjg+h
hpeN55mAHOW/yhQDcnFIgE9RuGPPkC0WKu1RtOu6ovKCGb8Ar1cJQA/5qjnfFOhW
HeiqD6IMuVo07oi5KxH5nSXBNvm3n5avwAO3gUOiotZXMqFa2x9lE3GWv4Hf51W8
7OXbOF6GL1m9S5WF9Rhol6iviLRxMEESV9oAsKpi+AeyNEjkwrrJpvEIeRpPfm62
DmspylVBEUlOPiMKwf2A6/MqhVMN8vt4ssAobWbcYHdlee4521FiGe6durLs5z/Y
YECCoBCIiiFQaojhSdt1Vuz5oq9Ht/hITwMcf7AEIzA6b+A2YYQ=
=orZI
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-fixes-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
"Compile time fixes and run-time resources leaks:
- Fix clang cross compilation
- Fix resource leak when return error
- fix compile error for dma_map_benchmark
- Fix regression - make use of GUP_TEST_FILE macro"
* tag 'linux-kselftest-fixes-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: make use of GUP_TEST_FILE macro
selftests: vm: Fix resource leak when return error
selftests dma: fix compile error for dma_map_benchmark
selftests: Fix clang cross compilation