The MTU update code is supposed to be invoked in response to real
networking events that update the PMTU. In IPv6 PMTU update function
__ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor
confirmed time.
But for tunnel code, it will call pmtu before xmit, like:
- tnl_update_pmtu()
- skb_dst_update_pmtu()
- ip6_rt_update_pmtu()
- __ip6_rt_update_pmtu()
- dst_confirm_neigh()
If the tunnel remote dst mac address changed and we still do the neigh
confirm, we will not be able to update neigh cache and ping6 remote
will failed.
So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
should not be invoking dst_confirm_neigh() as we have no evidence
of successful two-way communication at this point.
On the other hand it is also important to keep the neigh reachability fresh
for TCP flows, so we cannot remove this dst_confirm_neigh() call.
To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu
to choose whether we should do neigh update or not. I will add the parameter
in this patch and set all the callers to true to comply with the previous
way, and fix the tunnel code one by one on later patches.
v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
dst_ops.update_pmtu to control whether we should do neighbor confirm.
Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
Suggested-by: David Miller <davem@davemloft.net>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl39UCAACgkQ+7dXa6fL
C2sZhw//S7+XNXpA9HfwUO/Vic9jUFi4FRPkS+zqHUCJQpbC32+Gj1L13J++EEJm
aIaf8yo5DSNIKRaja+d4UTmZFgXOOg64Jq/p9ZOltdomS+ZihGWoigKHxYQtLfin
fsBVbPqMO+PSZbSqcwy7inMKfJ1EkFXiu5lhI5o4Swc3gw9w9HTui7u5c02Wol8L
2ooI2hiuali9FSJ592UfvHoxK3auNUJ6oB2aA+EFuLhMLz+8hhnVCNELsaWoLDOh
qZkI0D+X5w61MsYiND3jOB85pl+d8ADKPlMw/WM1hIwn/ODPvkqnGHVhOslB7/fh
dBsNUcGhAsdGz9jBDRXcUv5ak4G/wL1IiV5CRJ0xSX6I5lTm+byIRE2rqtO5mkQg
ESXCGGodbxFh7EIjjWskdTtusQN+O+v78GLue7h9KVLrsTvzQRT99BIAJLvT6rGM
F6I/Zirj1ZbRhn7uwkr+hwoD4CJvtF/mvImg0P2iPQmUrhEUJkE5VRVIDf/OnBd0
vGQJG7IKr5tUK+x3ztk34H3pqIakfQ03FX0mqdTFuoVYmaBlooSk9Do7xNTUHi2o
2rC3R5S1alRBqFBOKIz5dL6SK8L1wcLCgSzC1pI+q9bcusX6yjW0b3vMn6tzCUHA
eb1NM0jp7VXx5yN88pfi7U7JFxafcCMoKKBodvdLgnL06mfHnJc=
=vwQ0
-----END PGP SIGNATURE-----
Merge tag 'rxrpc-fixes-20191220' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Fixes
Here are a couple of bugfixes plus a patch that makes one of the bugfixes
easier:
(1) Move the ping and mutex unlock on a new call from rxrpc_input_packet()
into rxrpc_new_incoming_call(), which it calls. This means the
lock-unlock section is entirely within the latter function. This
simplifies patch (2).
(2) Don't take the call->user_mutex at all in the softirq path. Mutexes
aren't allowed to be taken or released there and a patch was merged
that caused a warning to be emitted every time this happened. Looking
at the code again, it looks like that taking the mutex isn't actually
necessary, as the value of call->state will block access to the call.
(3) Fix the incoming call path to check incoming calls earlier to reject
calls to RPC services for which we don't have a security key of the
appropriate class. This avoids an assertion failure if YFS tries
making a secure call to the kafs cache manager RPC service.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The fix on 951c6db954 fixed the issued reported there but introduced
another. When the allocation fails within sctp_stream_init() it is
okay/necessary to free the genradix. But it is also called when adding
new streams, from sctp_send_add_streams() and
sctp_process_strreset_addstrm_in() and in those situations it cannot
just free the genradix because by then it is a fully operational
association.
The fix here then is to only free the genradix in sctp_stream_init()
and on those other call sites move on with what it already had and let
the subsequent error handling to handle it.
Tested with the reproducers from this report and the previous one,
with lksctp-tools and sctp-tests.
Reported-by: syzbot+9a1bc632e78a1a98488b@syzkaller.appspotmail.com
Fixes: 951c6db954 ("sctp: fix memleak on err handling of stream initialization")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the size of the receive buffer for a socket is close to 2^31 when
computing if we have enough space in the buffer to copy a packet from
the queue to the buffer we might hit an integer overflow.
When an user set net.core.rmem_default to a value close to 2^31 UDP
packets are dropped because of this overflow. This can be visible, for
instance, with failure to resolve hostnames.
This can be fixed by casting sk_rcvbuf (which is an int) to unsigned
int, similarly to how it is done in TCP.
Signed-off-by: Antonio Messina <amessina@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso,
including adding a missing ipv6 match description.
2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi
Bhat.
3) Fix uninit value in bond_neigh_init(), from Eric Dumazet.
4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold.
5) Fix use after free in tipc_disc_rcv(), from Tuong Lien.
6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul
Chaignon.
7) Multicast MAC limit test is off by one in qede, from Manish Chopra.
8) Fix established socket lookup race when socket goes from
TCP_ESTABLISHED to TCP_LISTEN, because there lacks an intervening
RCU grace period. From Eric Dumazet.
9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet.
10) Fix active backup transition after link failure in bonding, from
Mahesh Bandewar.
11) Avoid zero sized hash table in gtp driver, from Taehee Yoo.
12) Fix wrong interface passed to ->mac_link_up(), from Russell King.
13) Fix DSA egress flooding settings in b53, from Florian Fainelli.
14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost.
15) Fix double free in dpaa2-ptp code, from Ioana Ciornei.
16) Reject invalid MTU values in stmmac, from Jose Abreu.
17) Fix refcount leak in error path of u32 classifier, from Davide
Caratti.
18) Fix regression causing iwlwifi firmware crashes on boot, from Anders
Kaseorg.
19) Fix inverted return value logic in llc2 code, from Chan Shu Tak.
20) Disable hardware GRO when XDP is attached to qede, frm Manish
Chopra.
21) Since we encode state in the low pointer bits, dst metrics must be
at least 4 byte aligned, which is not necessarily true on m68k. Add
annotations to fix this, from Geert Uytterhoeven.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits)
sfc: Include XDP packet headroom in buffer step size.
sfc: fix channel allocation with brute force
net: dst: Force 4-byte alignment of dst_metrics
selftests: pmtu: fix init mtu value in description
hv_netvsc: Fix unwanted rx_table reset
net: phy: ensure that phy IDs are correctly typed
mod_devicetable: fix PHY module format
qede: Disable hardware gro when xdp prog is installed
net: ena: fix issues in setting interrupt moderation params in ethtool
net: ena: fix default tx interrupt moderation interval
net/smc: unregister ib devices in reboot_event
net: stmmac: platform: Fix MDIO init for platforms without PHY
llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
net: hisilicon: Fix a BUG trigered by wrong bytes_compl
net: dsa: ksz: use common define for tag len
s390/qeth: don't return -ENOTSUPP to userspace
s390/qeth: fix promiscuous mode after reset
s390/qeth: handle error due to unsupported transport mode
cxgb4: fix refcount init for TC-MQPRIO offload
tc-testing: initial tdc selftests for cls_u32
...
In the reboot_event handler, unregister the ib devices and enable
the IB layer to release the devices before the reboot.
Fixes: a33a803cfe ("net/smc: guarantee removal of link groups in reboot")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a frame with NULL DSAP is received, llc_station_rcv is called.
In turn, llc_stat_ev_rx_null_dsap_xid_c is called to check if it is a NULL
XID frame. The return statement of llc_stat_ev_rx_null_dsap_xid_c returns 1
when the incoming frame is not a NULL XID frame and 0 otherwise. Hence, a
NULL XID response is returned unexpectedly, e.g. when the incoming frame is
a NULL TEST command.
To fix the error, simply remove the conditional operator.
A similar error in llc_stat_ev_rx_null_dsap_test_c is also fixed.
Signed-off-by: Chan Shu Tak, Alex <alexchan@task.com.hk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove special taglen define KSZ8795_INGRESS_TAG_LEN
and use generic KSZ_INGRESS_TAG_LEN instead.
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix rxrpc_new_incoming_call() to check that we have a suitable service key
available for the combination of service ID and security class of a new
incoming call - and to reject calls for which we don't.
This causes an assertion like the following to appear:
rxrpc: Assertion failed - 6(0x6) == 12(0xc) is false
kernel BUG at net/rxrpc/call_object.c:456!
Where call->state is RXRPC_CALL_SERVER_SECURING (6) rather than
RXRPC_CALL_COMPLETE (12).
Fixes: 248f219cb8 ("rxrpc: Rewrite the data and ack handling code")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Standard kernel mutexes cannot be used in any way from interrupt or softirq
context, so the user_mutex which manages access to a call cannot be a mutex
since on a new call the mutex must start off locked and be unlocked within
the softirq handler to prevent userspace interfering with a call we're
setting up.
Commit a0855d24fc ("locking/mutex: Complain
upon mutex API misuse in IRQ contexts") causes big warnings to be splashed
in dmesg for each a new call that comes in from the server. Whilst it
*seems* like it should be okay, since the accept path uses trylock, there
are issues with PI boosting and marking the wrong task as the owner.
Fix this by not taking the mutex in the softirq path at all. It's not
obvious that there should be any need for it as the state is set before the
first notification is generated for the new call.
There's also no particular reason why the link-assessing ping should be
triggered inside the mutex. It's not actually transmitted there anyway,
but rather it has to be deferred to a workqueue.
Further, I don't think that there's any particular reason that the socket
notification needs to be done from within rx->incoming_lock, so the amount
of time that lock is held can be shortened too and the ping prepared before
the new call notification is sent.
Fixes: 540b1c48c3 ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Peter Zijlstra (Intel) <peterz@infradead.org>
cc: Ingo Molnar <mingo@redhat.com>
cc: Will Deacon <will@kernel.org>
cc: Davidlohr Bueso <dave@stgolabs.net>
Move the unlock and the ping transmission for a new incoming call into
rxrpc_new_incoming_call() rather than doing it in the caller. This makes
it clearer to see what's going on.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
cc: Ingo Molnar <mingo@redhat.com>
cc: Will Deacon <will@kernel.org>
cc: Davidlohr Bueso <dave@stgolabs.net>
when users replace cls_u32 filters with new ones having wrong parameters,
so that u32_change() fails to validate them, the kernel doesn't roll-back
correctly, and leaves semi-configured rules.
Fix this in u32_walk(), avoiding a call to the walker function on filters
that don't have a match rule connected. The side effect is, these "empty"
filters are not even dumped when present; but that shouldn't be a problem
as long as we are restoring the original behaviour, where semi-configured
filters were not even added in the error path of u32_change().
Fixes: 6676d5e416 ("net: sched: set dedicated tcf_walker flag when tp is empty")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2019-12-19
The following pull-request contains BPF updates for your *net* tree.
We've added 10 non-merge commits during the last 8 day(s) which contain
a total of 21 files changed, 269 insertions(+), 108 deletions(-).
The main changes are:
1) Fix lack of synchronization between xsk wakeup and destroying resources
used by xsk wakeup, from Maxim Mikityanskiy.
2) Fix pruning with tail call patching, untrack programs in case of verifier
error and fix a cgroup local storage tracking bug, from Daniel Borkmann.
3) Fix clearing skb->tstamp in bpf_redirect() when going from ingress to
egress which otherwise cause issues e.g. on fq qdisc, from Lorenz Bauer.
4) Fix compile warning of unused proc_dointvec_minmax_bpf_restricted() when
only cBPF is present, from Alexander Lobakin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
proc_dointvec_minmax_bpf_restricted() has been firstly introduced
in commit 2e4a30983b ("bpf: restrict access to core bpf sysctls")
under CONFIG_HAVE_EBPF_JIT. Then, this ifdef has been removed in
ede95a63b5 ("bpf: add bpf_jit_limit knob to restrict unpriv
allocations"), because a new sysctl, bpf_jit_limit, made use of it.
Finally, this parameter has become long instead of integer with
fdadd04931 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
and thus, a new proc_dolongvec_minmax_bpf_restricted() has been
added.
With this last change, we got back to that
proc_dointvec_minmax_bpf_restricted() is used only under
CONFIG_HAVE_EBPF_JIT, but the corresponding ifdef has not been
brought back.
So, in configurations like CONFIG_BPF_JIT=y && CONFIG_HAVE_EBPF_JIT=n
since v4.20 we have:
CC net/core/sysctl_net_core.o
net/core/sysctl_net_core.c:292:1: warning: ‘proc_dointvec_minmax_bpf_restricted’ defined but not used [-Wunused-function]
292 | proc_dointvec_minmax_bpf_restricted(struct ctl_table *table, int write,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Suppress this by guarding it with CONFIG_HAVE_EBPF_JIT again.
Fixes: fdadd04931 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191218091821.7080-1-alobakin@dlink.ru
The XSK wakeup callback in drivers makes some sanity checks before
triggering NAPI. However, some configuration changes may occur during
this function that affect the result of those checks. For example, the
interface can go down, and all the resources will be destroyed after the
checks in the wakeup function, but before it attempts to use these
resources. Wrap this callback in rcu_read_lock to allow driver to
synchronize_rcu before actually destroying the resources.
xsk_wakeup is a new function that encapsulates calling ndo_xsk_wakeup
wrapped into the RCU lock. After this commit, xsk_poll starts using
xsk_wakeup and checks xs->zc instead of ndo_xsk_wakeup != NULL to decide
ndo_xsk_wakeup should be called. It also fixes a bug introduced with the
need_wakeup feature: a non-zero-copy socket may be used with a driver
supporting zero-copy, and in this case ndo_xsk_wakeup should not be
called, so the xs->zc check is the correct one.
Fixes: 77cd0d7b3f ("xsk: add support for need_wakeup flag in AF_XDP rings")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191217162023.16011-2-maximmi@mellanox.com
The kernel may sleep while holding a spinlock.
The function call path (from bottom to top) in Linux 4.19 is:
net/nfc/nci/uart.c, 349:
nci_skb_alloc in nci_uart_default_recv_buf
net/nfc/nci/uart.c, 255:
(FUNC_PTR)nci_uart_default_recv_buf in nci_uart_tty_receive
net/nfc/nci/uart.c, 254:
spin_lock in nci_uart_tty_receive
nci_skb_alloc(GFP_KERNEL) can sleep at runtime.
(FUNC_PTR) means a function pointer is called.
To fix this bug, GFP_KERNEL is replaced with GFP_ATOMIC for
nci_skb_alloc().
This bug is found by a static analysis tool STCheck written by myself.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dev_hold has to be called always in rx_queue_add_kobject.
Otherwise usage count drops below 0 in case of failure in
kobject_init_and_add.
Fixes: b8eb718348 ("net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject")
Reported-by: syzbot <syzbot+30209ea299c09d8785c9@syzkaller.appspotmail.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Miller <davem@davemloft.net>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Jouni Hogander <jouni.hogander@unikie.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dsa_link_touch() is not exported, or defined outside of the
file it is in so make it static to avoid the following warning:
net/dsa/dsa2.c:127:17: warning: symbol 'dsa_link_touch' was not declared. Should it be static?
Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk->sk_pacing_shift can be read and written without lock
synchronization. This patch adds annotations to
document this fact and avoid future syzbot complains.
This might also avoid unexpected false sharing
in sk_pacing_shift_update(), as the compiler
could remove the conditional check and always
write over sk->sk_pacing_shift :
if (sk->sk_pacing_shift != val)
sk->sk_pacing_shift = val;
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot reported a memory leak when an allocation fails within
genradix_prealloc() for output streams. That's because
genradix_prealloc() leaves initialized members initialized when the
issue happens and SCTP stack will abort the current initialization but
without cleaning up such members.
The fix here is to always call genradix_free() when genradix_prealloc()
fails, for output and also input streams, as it suffers from the same
issue.
Reported-by: syzbot+772d9e36c490b18d51d1@syzkaller.appspotmail.com
Fixes: 2075e50caf ("sctp: convert to genradix")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* disable AQL on most drivers, addressing the iwlwifi issues
* fix double-free on network namespace changes
* fix TID field in frames injected through monitor interfaces
* fix ieee80211_calc_rx_airtime()
* fix NULL pointer dereference in rfkill (and remove BUG_ON)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl33THYACgkQB8qZga/f
l8QSsw/+MW4NiXgjFNIaTfyPocFCLi45Efdizef3D+T2MsAD279DFILLFc4Q08Tt
kL2CxOaD2lrrrPXTa++vkcaSBBQtgRRFvPQPycwoju9QkTuEQ31wFtXSzeCdSFmx
vLVaY+gMvPjw6HejWqouPlm1hBaA0jqZOCjwq3IWj0spDR/FwJ0HwXSzzEhUs7FV
1097Q9i7kLDZjdMvUUVnKi8SyWPL8TMPfXxyGPOsSbMPG5QAYj3odfb7FtsZLYgD
SwWafp6nroUfEDi3jk+QNEuJB4on6iAVEJxbltDQWqsBXO76CWVAezh9KqiDtzIt
Ay2YtTyOUTZUTPR8lZnoiTvR0GLzeNybwT5BQ9COO1tCD4yB8y1/cpa8oJxv/YRB
xekCcNMPDFqIwtrY4UKbTEuyCbf78uVO8cUYdlb4ZUUvLKFP2LiD63InyWAtvCdu
N21mzausAUzy65j5AJ7IIut7iFrcNEQ2qQtQuECGEqmu9uqHD4S3e9MQYd6Qx429
uXdWbtyoqnYXMaEhF4Zy+DNz5vELNUkP0Lv9sJ6ihALQSXWXwkiz+p7+eYsdBdZy
mJslJs+sYCgt0y46xzUfAcwvEdAOGkBeF91gCHSJ7q9g8UCrnZ4SN+OCx93YGnvW
w1AQgZ7VW2aL4i8lnfu6bd86K04M/AbwdRn2OL6Ug4LmmFvHUT8=
=tvTq
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-net-2019-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A handful of fixes:
* disable AQL on most drivers, addressing the iwlwifi issues
* fix double-free on network namespace changes
* fix TID field in frames injected through monitor interfaces
* fix ieee80211_calc_rx_airtime()
* fix NULL pointer dereference in rfkill (and remove BUG_ON)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
virtio_transport_get_ops() and virtio_transport_send_pkt_info()
can only be used on connecting/connected sockets, since a socket
assigned to a transport is required.
This patch adds a WARN_ON() on virtio_transport_get_ops() to check
this requirement, a comment and a returned error on
virtio_transport_send_pkt_info(),
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In rfkill_register, the struct rfkill pointer is first derefernced
and then checked for NULL. This patch removes the BUG_ON and returns
an error to the caller in case rfkill is NULL.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Link: https://lore.kernel.org/r/20191215153409.21696-1-pakki001@umn.edu
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
FASTOPEN setsockopt() or sendmsg() may switch the SMC socket to fallback
mode. Once fallback mode is active, the native TCP socket functions are
called. Nevertheless there is a small race window, when FASTOPEN
setsockopt/sendmsg runs in parallel to a connect(), and switch the
socket into fallback mode before connect() takes the sock lock.
Make sure the SMC-specific connect setup is omitted in this case.
This way a syzbot-reported refcount problem is fixed, triggered by
different threads running non-blocking connect() and FASTOPEN_KEY
setsockopt.
Reported-by: syzbot+96d3f9ff6a86d37e44c8@syzkaller.appspotmail.com
Fixes: 6d6dd528d5 ("net/smc: fix refcount non-blocking connect() -part 2")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
At the time commit ce5ec44099 ("tcp: ensure epoll edge trigger
wakeup when write queue is empty") was added to the kernel,
we still had a single write queue, combining rtx and write queues.
Once we moved the rtx queue into a separate rb-tree, testing
if sk_write_queue is empty has been suboptimal.
Indeed, if we have packets in the rtx queue, we probably want
to delay the EPOLLOUT generation at the time incoming packets
will free them, making room, but more importantly avoiding
flooding application with EPOLLOUT events.
Solution is to use tcp_rtx_and_write_queues_empty() helper.
Fixes: 75c119afe1 ("tcp: implement rb-tree based retransmit queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Due to how tcp_sendmsg() is implemented, we can have an empty
skb at the tail of the write queue.
Most [1] tcp_write_queue_empty() callers want to know if there is
anything to send (payload and/or FIN)
Instead of checking if the sk_write_queue is empty, we need
to test if tp->write_seq == tp->snd_nxt
[1] tcp_send_fin() was the only caller that expected to
see if an skb was in the write queue, I have changed the code
to reuse the tcp_write_queue_tail() result.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Backport of commit fdfc5c8594 ("tcp: remove empty skb from
write queue in error cases") in linux-4.14 stable triggered
various bugs. One of them has been fixed in commit ba2ddb43f2
("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but
we still have crashes in some occasions.
Root-cause is that when tcp_sendmsg() has allocated a fresh
skb and could not append a fragment before being blocked
in sk_stream_wait_memory(), tcp_write_xmit() might be called
and decide to send this fresh and empty skb.
Sending an empty packet is not only silly, it might have caused
many issues we had in the past with tp->packets_out being
out of sync.
Fixes: c65f7f00c5 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Christoph Paasch <cpaasch@apple.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Michal Kubecek and Firo Yang did a very nice analysis of crashes
happening in __inet_lookup_established().
Since a TCP socket can go from TCP_ESTABLISH to TCP_LISTEN
(via a close()/socket()/listen() cycle) without a RCU grace period,
I should not have changed listeners linkage in their hash table.
They must use the nulls protocol (Documentation/RCU/rculist_nulls.txt),
so that a lookup can detect a socket in a hash list was moved in
another one.
Since we added code in commit d296ba60d8 ("soreuseport: Resolve
merge conflict for v4/v6 ordering fix"), we have to add
hlist_nulls_add_tail_rcu() helper.
Fixes: 3b24d854cb ("tcp/dccp: do not touch listener sk_refcnt under synflood")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Reported-by: Firo Yang <firo.yang@suse.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Link: https://lore.kernel.org/netdev/20191120083919.GH27852@unicorn.suse.cz/
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
In commit 4b1373de73 ("net: ipv6: addr: perform strict checks also for
doit handlers") we add strict check for inet6_rtm_getaddr(). But we did
the invalid header values check before checking if NETLINK_F_STRICT_CHK
is set. This may break backwards compatibility if user already set the
ifm->ifa_prefixlen, ifm->ifa_flags, ifm->ifa_scope in their netlink code.
I didn't move the nlmsg_len check because I thought it's a valid check.
Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: 4b1373de73 ("net: ipv6: addr: perform strict checks also for doit handlers")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Redirecting a packet from ingress to egress by using bpf_redirect
breaks if the egress interface has an fq qdisc installed. This is the same
problem as fixed in 'commit 8203e2d844 ("net: clear skb->tstamp in forwarding paths")
Clear skb->tstamp when redirecting into the egress path.
Fixes: 80b14dee2b ("net: Add a new socket option for a future transmit time.")
Fixes: fb420d5d91 ("tcp/fq: move back to CLOCK_MONOTONIC")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/bpf/20191213180817.2510-1-lmb@cloudflare.com
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3y5kIQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpv0MEADY9LOC2JkG2aNP53gCXGu74rFQNstJfusr
xPpkMs5I7hxoeVp/bkVAvR6BbpfPDsyGMhSMURELgMFzkuw+03z2IbxVuexdGOEu
X1+6EkunAq/r331fXL+cXUKJM3ThlkUBbNTuV8z5oWWSM1EfoBAWYfh2u4L4oBcc
yIHRH8by9qipx+fhBWPU/ZWeCcMVt5Ju2b6PHAE/GWwteZh00PsTwq0oqQcVcm/5
4Ennr0OELb7LaZWblAIIZe96KeJuCPzGfbRV/y12Ne/t6STH/sWA1tMUzgT22Eju
27uXtwAJtoLsep4V66a4VYObs/hXmEP281wIsXEnIkX0YCvbAiM+J6qVHGjYSMGf
mRrzfF6nDC1vaMduE4waoO3VFiDFQU/qfQRa21ZP50dfQOAXTpEz8kJNG/PjN1VB
gmz9xsujDU1QPD7IRiTreiPPHE5AocUzlUYuYOIMt7VSbFZIMHW5S/pML93avkJt
nx6g3gOP0JKkjaCEvIKY4VLljDT8eDZ/WScDsSedSbPZikMzEo8DSx4U6nbGPqzy
qSljgQniDcrH8GdRaJFDXgLkaB8pu83NH7zUH+xioUZjAHq/XEzKQFYqysf1DbtU
SEuSnUnLXBvbwfb3Z2VpQ/oz8G3a0nD9M7oudt1sN19oTCJKxYSbOUgNUrj9JsyQ
QlAVrPKRkQ==
=fbsf
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.5-20191212' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- A tweak to IOSQE_IO_LINK (also marked for stable) to allow links that
don't sever if the result is < 0.
This is mostly for linked timeouts, where if we ask for a pure
timeout we always get -ETIME. This makes links useless for that case,
hence allow a case where it works.
- Five minor optimizations to fix and improve cases that regressed
since v5.4.
- An SQTHREAD locking fix.
- A sendmsg/recvmsg iov assignment fix.
- Net fix where read_iter/write_iter don't honor IOCB_NOWAIT, and
subsequently ensuring that works for io_uring.
- Fix a case where for an invalid opcode we might return -EBADF instead
of -EINVAL, if the ->fd of that sqe was set to an invalid fd value.
* tag 'io_uring-5.5-20191212' of git://git.kernel.dk/linux-block:
io_uring: ensure we return -EINVAL on unknown opcode
io_uring: add sockets to list of files that support non-blocking issue
net: make socket read/write_iter() honor IOCB_NOWAIT
io_uring: only hash regular files for async work execution
io_uring: run next sqe inline if possible
io_uring: don't dynamically allocate poll data
io_uring: deferred send/recvmsg should assign iov
io_uring: sqthread should grab ctx->uring_lock for submissions
io-wq: briefly spin for new work after finishing work
io-wq: remove worker->wait waitqueue
io_uring: allow unbreakable links
Instead of just having an airtime flag in debugfs, turn AQL into a proper
NL80211_EXT_FEATURE, so drivers can turn it on when they are ready, and so
we also expose the presence of the feature to userspace.
This also has the effect of flipping the default, so drivers have to opt in
to using AQL instead of getting it by default with TXQs. To keep
functionality the same as pre-patch, we set this feature for ath10k (which
is where it is needed the most).
While we're at it, split out the debugfs interface so AQL gets its own
per-station debugfs file instead of using the 'airtime' file.
[Johannes:]
This effectively disables AQL for iwlwifi, where it fixes a number of
issues:
* TSO in iwlwifi is causing underflows and associated warnings in AQL
* HE (802.11ax) rates aren't reported properly so at HE rates, AQL could
never have a valid estimate (it'd use 6 Mbps instead of up to 2400!)
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20191212111437.224294-1-toke@redhat.com
Fixes: 3ace10f5b5 ("mac80211: Implement Airtime-based Queue Limit (AQL)")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This code was copied from mt76 and inherited an off by one bug from
there. The > should be >= so that we don't read one element beyond
the end of the array.
Fixes: db3e1c40cf ("mac80211: Import airtime calculation code from mt76")
Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20191126120910.ftr4t7me3by32aiz@kili.mountain
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If wdev->wext.keys was initialized it didn't get reset to NULL on
unregister (and it doesn't get set in cfg80211_init_wdev either), but
wdev is reused if unregister was triggered through
cfg80211_switch_netns.
The next unregister (for whatever reason) will try to free
wdev->wext.keys again.
Signed-off-by: Stefan Bühler <source@stbuehler.de>
Link: https://lore.kernel.org/r/20191126100543.782023-1-stefan.buehler@tik.uni-stuttgart.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fix overwriting of the qos_ctrl.tid field for encrypted frames injected on
a monitor interface. While qos_ctrl.tid is not encrypted, it's used as an
input into the encryption algorithm so it's protected, and thus cannot be
modified after encryption. For injected frames, the encryption may already
have been done in userspace, so we cannot change any fields.
Before passing the frame to the driver, the qos_ctrl.tid field is updated
from skb->priority. Prior to dbd50a851c skb->priority was updated in
ieee80211_select_queue_80211(), but this function is no longer always
called.
Update skb->priority in ieee80211_monitor_start_xmit() so that the value
is stored, and when later code 'modifies' the TID it really sets it to
the same value as before, preserving the encryption.
Fixes: dbd50a851c ("mac80211: only allocate one queue when using iTXQs")
Signed-off-by: Fredrik Olofsson <fredrik.olofsson@anyfinetworks.com>
Link: https://lore.kernel.org/r/20191119133451.14711-1-fredrik.olofsson@anyfinetworks.com
[rewrite commit message based on our discussion]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In the function 'tipc_disc_rcv()', the 'msg_peer_net_hash()' is called
to read the header data field but after the message skb has been freed,
that might result in a garbage value...
This commit fixes it by defining a new local variable to store the data
first, just like the other header fields' handling.
Fixes: f73b12812a ("tipc: improve throughput between nodes in netns")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a user message is sent, TIPC will check if the socket has faced a
congestion at link layer. If that happens, it will make a sleep to wait
for the congestion to disappear. This leaves a gap for other users to
take over the socket (e.g. multi threads) since the socket is released
as well. Also, in case of connectionless (e.g. SOCK_RDM), user is free
to send messages to various destinations (e.g. via 'sendto()'), then
the socket's preformatted header has to be updated correspondingly
prior to the actual payload message building.
Unfortunately, the latter action is done before the first action which
causes a condition issue that the destination of a certain message can
be modified incorrectly in the middle, leading to wrong destination
when that message is built. Consequently, when the message is sent to
the link layer, it gets stuck there forever because the peer node will
simply reject it. After a number of retransmission attempts, the link
is eventually taken down and the retransmission failure is reported.
This commit fixes the problem by rearranging the order of actions to
prevent the race condition from occurring, so the message building is
'atomic' and its header will not be modified by anyone.
Fixes: 365ad353c2 ("tipc: reduce risk of user starvation during link congestion")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit c55c8edafa ("tipc: smooth change between replicast and
broadcast"), we allow instant switching between replicast and broadcast
by sending a dummy 'SYN' packet on the last used link to synchronize
packets on the links. The 'SYN' message is an object of link congestion
also, so if that happens, a 'SOCK_WAKEUP' will be scheduled to be sent
back to the socket...
However, in that commit, we simply use the same socket 'cong_link_cnt'
counter for both the 'SYN' & normal payload message sending. Therefore,
if both the replicast & broadcast links are congested, the counter will
be not updated correctly but overwritten by the latter congestion.
Later on, when the 'SOCK_WAKEUP' messages are processed, the counter is
reduced one by one and eventually overflowed. Consequently, further
activities on the socket will only wait for the false congestion signal
to disappear but never been met.
Because sending the 'SYN' message is vital for the mechanism, it should
be done anyway. This commit fixes the issue by marking the message with
an error code e.g. 'TIPC_ERR_NO_PORT', so its sending should not face a
link congestion, there is no need to touch the socket 'cong_link_cnt'
either. In addition, in the event of any error (e.g. -ENOBUFS), we will
purge the entire payload message queue and make a return immediately.
Fixes: c55c8edafa ("tipc: smooth change between replicast and broadcast")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current rbtree for service ranges in the name table is built based
on the 'lower' & 'upper' range values resulting in a flaw in the rbtree
searching. Some issues have been observed in case of range overlapping:
Case #1: unable to withdraw a name entry:
After some name services are bound, all of them are withdrawn by user
but one remains in the name table forever. This corrupts the table and
that service becomes dummy i.e. no real port.
E.g.
/
{22, 22}
/
/
---> {10, 50}
/ \
/ \
{10, 30} {20, 60}
The node {10, 30} cannot be removed since the rbtree searching stops at
the node's ancestor i.e. {10, 50}, so starting from it will never reach
the finding node.
Case #2: failed to send data in some cases:
E.g. Two service ranges: {20, 60}, {10, 50} are bound. The rbtree for
this service will be one of the two cases below depending on the order
of the bindings:
{20, 60} {10, 50} <--
/ \ / \
/ \ / \
{10, 50} NIL <-- NIL {20, 60}
(a) (b)
Now, try to send some data to service {30}, there will be two results:
(a): Failed, no route to host.
(b): Ok.
The reason is that the rbtree searching will stop at the pointing node
as shown above.
Case #3: Same as case #2b above but if the data sending's scope is
local and the {10, 50} is published by a peer node, then it will result
in 'no route to host' even though the other {20, 60} is for example on
the local node which should be able to get the data.
The issues are actually due to the way we built the rbtree. This commit
fixes it by introducing an additional field to each node - named 'max',
which is the largest 'upper' of that node subtree. The 'max' value for
each subtrees will be propagated correctly whenever a node is inserted/
removed or the tree is rebalanced by the augmented rbtree callbacks.
By this way, we can change the rbtree searching appoarch to solve the
issues above. Another benefit from this is that we can now improve the
searching for a next range matching e.g. in case of multicast, so get
rid of the unneeded looping over all nodes in the tree.
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The socket read/write helpers only look at the file O_NONBLOCK. not
the iocb IOCB_NOWAIT flag. This breaks users like preadv2/pwritev2
and io_uring that rely on not having the file itself marked nonblocking,
but rather the iocb itself.
Cc: netdev@vger.kernel.org
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is softlockup when using TPACKET_V3:
...
NMI watchdog: BUG: soft lockup - CPU#2 stuck for 60010ms!
(__irq_svc) from [<c0558a0c>] (_raw_spin_unlock_irqrestore+0x44/0x54)
(_raw_spin_unlock_irqrestore) from [<c027b7e8>] (mod_timer+0x210/0x25c)
(mod_timer) from [<c0549c30>]
(prb_retire_rx_blk_timer_expired+0x68/0x11c)
(prb_retire_rx_blk_timer_expired) from [<c027a7ac>]
(call_timer_fn+0x90/0x17c)
(call_timer_fn) from [<c027ab6c>] (run_timer_softirq+0x2d4/0x2fc)
(run_timer_softirq) from [<c021eaf4>] (__do_softirq+0x218/0x318)
(__do_softirq) from [<c021eea0>] (irq_exit+0x88/0xac)
(irq_exit) from [<c0240130>] (msa_irq_exit+0x11c/0x1d4)
(msa_irq_exit) from [<c0209cf0>] (handle_IPI+0x650/0x7f4)
(handle_IPI) from [<c02015bc>] (gic_handle_irq+0x108/0x118)
(gic_handle_irq) from [<c0558ee4>] (__irq_usr+0x44/0x5c)
...
If __ethtool_get_link_ksettings() is failed in
prb_calc_retire_blk_tmo(), msec and tmo will be zero, so tov_in_jiffies
is zero and the timer expire for retire_blk_timer is turn to
mod_timer(&pkc->retire_blk_timer, jiffies + 0),
which will trigger cpu usage of softirq is 100%.
Fixes: f6fb8f100b ("af-packet: TPACKET_V3 flexible buffer implementation.")
Tested-by: Xiao Jiangfeng <xiaojiangfeng@huawei.com>
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Wait for rcu grace period after releasing netns in ctnetlink,
from Florian Westphal.
2) Incorrect command type in flowtable offload ndo invocation,
from wenxu.
3) Incorrect callback type in flowtable offload flow tuple
updates, also from wenxu.
4) Fix compile warning on flowtable offload infrastructure due to
possible reference to uninitialized variable, from Nathan Chancellor.
5) Do not inline nf_ct_resolve_clash(), this is called from slow
path / stress situations. From Florian Westphal.
6) Missing IPv6 flow selector description in flowtable offload.
7) Missing check for NETDEV_UNREGISTER in nf_tables offload
infrastructure, from wenxu.
8) Update NAT selftest to use randomized netns names, from
Florian Westphal.
9) Restore nfqueue bridge support, from Marco Oliverio.
10) Compilation warning in SCTP_CHUNKMAP_*() on xt_sctp header.
From Phil Sutter.
11) Fix bogus lookup/get match for non-anonymous rbtree sets.
12) Missing netlink validation for NFT_SET_ELEM_INTERVAL_END
elements.
13) Missing netlink validation for NFT_DATA_VALUE after
nft_data_init().
14) If rule specifies no actions, offload infrastructure returns
EOPNOTSUPP.
15) Module refcount leak in object updates.
16) Missing sanitization for ARP traffic from br_netfilter, from
Eric Dumazet.
17) Compilation breakage on big-endian due to incorrect memcpy()
size in the flowtable offload infrastructure.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In function 'memcpy',
inlined from 'flow_offload_mangle' at net/netfilter/nf_flow_table_offload.c:112:2,
inlined from 'flow_offload_port_dnat' at net/netfilter/nf_flow_table_offload.c:373:2,
inlined from 'nf_flow_rule_route_ipv4' at net/netfilter/nf_flow_table_offload.c:424:3:
./include/linux/string.h:376:4: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
376 | __read_overflow2();
| ^~~~~~~~~~~~~~~~~~
The original u8* was done in the hope to make this more adaptable but
consensus is to keep this like it is in tc pedit.
Fixes: c29f74e0df ("netfilter: nf_flow_table: hardware offload support")
Reported-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().
This patch is generated using following script:
EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"
git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do
if [[ "$file" =~ $EXCLUDE_FILES ]]; then
continue
fi
sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done
Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net
This is needed, because if the flag X25_ACCPT_APPRV_FLAG is not set on a
socket (manual call confirmation) and the channel is cleared by remote
before the manual call confirmation was sent, this situation needs to
be handled.
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Syzbot found a crash:
BUG: KMSAN: uninit-value in crc32_body lib/crc32.c:112 [inline]
BUG: KMSAN: uninit-value in crc32_le_generic lib/crc32.c:179 [inline]
BUG: KMSAN: uninit-value in __crc32c_le_base+0x4fa/0xd30 lib/crc32.c:202
Call Trace:
crc32_body lib/crc32.c:112 [inline]
crc32_le_generic lib/crc32.c:179 [inline]
__crc32c_le_base+0x4fa/0xd30 lib/crc32.c:202
chksum_update+0xb2/0x110 crypto/crc32c_generic.c:90
crypto_shash_update+0x4c5/0x530 crypto/shash.c:107
crc32c+0x150/0x220 lib/libcrc32c.c:47
sctp_csum_update+0x89/0xa0 include/net/sctp/checksum.h:36
__skb_checksum+0x1297/0x12a0 net/core/skbuff.c:2640
sctp_compute_cksum include/net/sctp/checksum.h:59 [inline]
sctp_packet_pack net/sctp/output.c:528 [inline]
sctp_packet_transmit+0x40fb/0x4250 net/sctp/output.c:597
sctp_outq_flush_transports net/sctp/outqueue.c:1146 [inline]
sctp_outq_flush+0x1823/0x5d80 net/sctp/outqueue.c:1194
sctp_outq_uncork+0xd0/0xf0 net/sctp/outqueue.c:757
sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1781 [inline]
sctp_side_effects net/sctp/sm_sideeffect.c:1184 [inline]
sctp_do_sm+0x8fe1/0x9720 net/sctp/sm_sideeffect.c:1155
sctp_primitive_REQUESTHEARTBEAT+0x175/0x1a0 net/sctp/primitive.c:185
sctp_apply_peer_addr_params+0x212/0x1d40 net/sctp/socket.c:2433
sctp_setsockopt_peer_addr_params net/sctp/socket.c:2686 [inline]
sctp_setsockopt+0x189bb/0x19090 net/sctp/socket.c:4672
The issue was caused by transport->ipaddr set with uninit addr param, which
was passed by:
sctp_transport_init net/sctp/transport.c:47 [inline]
sctp_transport_new+0x248/0xa00 net/sctp/transport.c:100
sctp_assoc_add_peer+0x5ba/0x2030 net/sctp/associola.c:611
sctp_process_param net/sctp/sm_make_chunk.c:2524 [inline]
where 'addr' is set by sctp_v4_from_addr_param(), and it doesn't initialize
the padding of addr->v4.
Later when calling sctp_make_heartbeat(), hbinfo.daddr(=transport->ipaddr)
will become the part of skb, and the issue occurs.
This patch is to fix it by initializing the padding of addr->v4 in
sctp_v4_from_addr_param(), as well as other functions that do the similar
thing, and these functions shouldn't trust that the caller initializes the
memory, as Marcelo suggested.
Reported-by: syzbot+6dcbfea81cd3d4dd0b02@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
neigh_cleanup() has not been used for seven years, and was a wrong design.
Messing with shared pointer in bond_neigh_init() without proper
memory barriers would at least trigger syzbot complains eventually.
It is time to remove this stuff.
Fixes: b63b70d877 ("IPoIB: Use a private hash table for path lookup in xmit path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEmvEkXzgOfc881GuFWsYho5HknSAFAl3s3dwTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRBaxiGjkeSdIKhhB/sE+kGrGECa6DSehxKsSTCvh/5AGw2U
AGh+AHgpfCRKOPNBeN0yKCdBVSE40fURsIjNvn5SDZ1cY8tUOVCJAyUU31MfgzkW
b+Xiro5iOu6G/UpfZgMCeGV617FZg/65rz7ylwNzzA7zuUcoml6o4A55/QUFNIXx
juDzhdKmpmLRQusekzbP9qyCgDGXv8/cPBloI+frzqaDU5bgS6Q3ldHe26lkQzNH
bwF6bCGpJYykuNFIrLoFpSuJ+7ttZhLexXCwsl90UfpwmHVvUgQKJuF8JVR1AF3x
5mjHWi3GFyi4aUL4yY4kU3cWWPse3MQLPt5QqMu56dW4UADp1k3FHByo
=YMzV
-----END PGP SIGNATURE-----
Merge tag 'linux-can-fixes-for-5.5-20191208' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2019-12-08
this is a pull request of 13 patches for net/master.
The first two patches are by Dan Murphy. He adds himself as a maintainer to the
m-can MMIO and tcan SPI driver.
The next two patches the j1939 stack. The first one is by Oleksij Rempel and
fixes a locking problem found by the syzbot, the second one is by me an fixes a
mistake in the documentation.
Srinivas Neeli fixes missing RX CAN packets on CANFD2.0 in the xilinx driver.
Sean Nyekjaer fixes a possible deadlock in the the flexcan driver after
suspend/resume. Joakim Zhang contributes two patches for the flexcan driver
that fix problems with the low power enter/exit.
The next 4 patches all target the tcan part of the m_can driver. Sean Nyekjaer
adds the required delay after reset and fixes the device tree binding example.
Dan Murphy's patches make the wake-gpio optional.
In the last patch Xiaolong Huang fixes several kernel memory info leaks to the
USB device in the kvaser_usb_leaf driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>