tls_sw_sendmsg() allocates plaintext and encrypted SG entries using
function sk_alloc_sg(). In case the number of SG entries hit
MAX_SKB_FRAGS, sk_alloc_sg() returns -ENOSPC and sets the variable for
current SG index to '0'. This leads to calling of function
tls_push_record() with 'sg_encrypted_num_elem = 0' and later causes
kernel crash. To fix this, set the number of SG elements to the number
of elements in plaintext/encrypted SG arrays in case sk_alloc_sg()
returns -ENOSPC.
Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the documentation in msg_zerocopy.rst, the SO_ZEROCOPY
flag was introduced because send(2) ignores unknown message flags and
any legacy application which was accidentally passing the equivalent of
MSG_ZEROCOPY earlier should not see any new behaviour.
Before commit f214f915e7 ("tcp: enable MSG_ZEROCOPY"), a send(2) call
which passed the equivalent of MSG_ZEROCOPY without setting SO_ZEROCOPY
would succeed. However, after that commit, it fails with -ENOBUFS. So
it appears that the SO_ZEROCOPY flag fails to fulfill its intended
purpose. Fix it.
Fixes: f214f915e7 ("tcp: enable MSG_ZEROCOPY")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When nla_put*() fails after nla_nest_start(), we need
to call nla_nest_cancel() to cancel the message, otherwise
we end up calling nla_nest_end() like a success.
Fixes: 0ed5269f9e ("net/sched: add tunnel option support to act_tunnel_key")
Cc: Davide Caratti <dcaratti@redhat.com>
Cc: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__tipc_nl_compat_dumpit() uses a netlink_callback on stack,
so the only way to align it with other ->dumpit() call path
is calling tipc_dump_start() and tipc_dump_done() directly
inside it. Otherwise ->dumpit() would always get NULL from
cb->args[].
But tipc_dump_start() uses sock_net(cb->skb->sk) to retrieve
net pointer, the cb->skb here doesn't set skb->sk, the net pointer
is saved in msg->net instead, so introduce a helper function
__tipc_dump_start() to pass in msg->net.
Ying pointed out cb->args[0...3] are already used by other
callbacks on this call path, so we can't use cb->args[0] any
more, use cb->args[4] instead.
Fixes: 9a07efa9ae ("tipc: switch to rhashtable iterator")
Reported-and-tested-by: syzbot+e93a2c41f91b8e2c7d9b@syzkaller.appspotmail.com
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sending an skb, afiucv_hs_send() bails out on various error
conditions. But currently the caller has no way of telling whether the
skb was freed or not - resulting in potentially either
a) leaked skbs from iucv_send_ctrl(), or
b) double-free's from iucv_sock_sendmsg().
As dev_queue_xmit() will always consume the skb (even on error), be
consistent and also free the skb from all other error paths. This way
callers no longer need to care about managing the skb.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Inbound packets may have any combination of flag bits set in their iucv
header. If we don't know how to handle a specific combination, drop the
skb instead of leaking it.
To clarify what error is returned in this case, replace the hard-coded
0 with the corresponding macro.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If users try to install act_tunnel_key 'set' rules with duplicate values
of 'index', the tunnel metadata are allocated, but never released. Then,
kmemleak complains as follows:
# tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
# echo clear > /sys/kernel/debug/kmemleak
# tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
Error: TC IDR already exists.
We have an error talking to the kernel
# echo scan > /sys/kernel/debug/kmemleak
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8800574e6c80 (size 256):
comm "tc", pid 5617, jiffies 4298118009 (age 57.990s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 1c e8 b0 ff ff ff ff ................
81 24 c2 ad ff ff ff ff 00 00 00 00 00 00 00 00 .$..............
backtrace:
[<00000000b7afbf4e>] tunnel_key_init+0x8a5/0x1800 [act_tunnel_key]
[<000000007d98fccd>] tcf_action_init_1+0x698/0xac0
[<0000000099b8f7cc>] tcf_action_init+0x15c/0x590
[<00000000dc60eebe>] tc_ctl_action+0x336/0x5c2
[<000000002f5a2f7d>] rtnetlink_rcv_msg+0x357/0x8e0
[<000000000bfe7575>] netlink_rcv_skb+0x124/0x350
[<00000000edab656f>] netlink_unicast+0x40f/0x5d0
[<00000000b322cdcb>] netlink_sendmsg+0x6e8/0xba0
[<0000000063d9d490>] sock_sendmsg+0xb3/0xf0
[<00000000f0d3315a>] ___sys_sendmsg+0x654/0x960
[<00000000c06cbd42>] __sys_sendmsg+0xd3/0x170
[<00000000ce72e4b0>] do_syscall_64+0xa5/0x470
[<000000005caa2d97>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<00000000fac1b476>] 0xffffffffffffffff
This problem theoretically happens also in case users attempt to setup a
geneve rule having wrong configuration data, or when the kernel fails to
allocate 'params_new'. Ensure that tunnel_key_init() releases the tunnel
metadata also in the above conditions.
Addresses-Coverity-ID: 1373974 ("Resource leak")
Fixes: d0f6dd8a91 ("net/sched: Introduce act_tunnel_key")
Fixes: 0ed5269f9e ("net/sched: add tunnel option support to act_tunnel_key")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before we unlock the sock in tipc_release(), we have to
detach sk->sk_socket from sk, otherwise a parallel
tipc_sk_fill_sock_diag() could stil read it after we
free this socket.
Fixes: c30b70deb5 ("tipc: implement socket diagnostics for AF_TIPC")
Reported-and-tested-by: syzbot+48804b87c16588ad491d@syzkaller.appspotmail.com
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Must perform TXQ teardown before unregistering interfaces in
mac80211, from Toke Høiland-Jørgensen.
2) Don't allow creating mac80211_hwsim with less than one channel, from
Johannes Berg.
3) Division by zero in cfg80211, fix from Johannes Berg.
4) Fix endian issue in tipc, from Haiqing Bai.
5) BPF sockmap use-after-free fixes from Daniel Borkmann.
6) Spectre-v1 in mac80211_hwsim, from Jinbum Park.
7) Missing rhashtable_walk_exit() in tipc, from Cong Wang.
8) Revert kvzalloc() conversion of AF_PACKET, it breaks mmap() when
kvzalloc() tries to use kmalloc() pages. From Eric Dumazet.
9) Fix deadlock in hv_netvsc, from Dexuan Cui.
10) Do not restart timewait timer on RST, from Florian Westphal.
11) Fix double lwstate refcount grab in ipv6, from Alexey Kodanev.
12) Unsolicit report count handling is off-by-one, fix from Hangbin Liu.
13) Sleep-in-atomic in cadence driver, from Jia-Ju Bai.
14) Respect ttl-inherit in ip6 tunnel driver, from Hangbin Liu.
15) Use-after-free in act_ife, fix from Cong Wang.
16) Missing hold to meta module in act_ife, from Vlad Buslov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (91 commits)
net: phy: sfp: Handle unimplemented hwmon limits and alarms
net: sched: action_ife: take reference to meta module
act_ife: fix a potential use-after-free
net/mlx5: Fix SQ offset in QPs with small RQ
tipc: correct spelling errors for tipc_topsrv_queue_evt() comments
tipc: correct spelling errors for struct tipc_bc_base's comment
bnxt_en: Do not adjust max_cp_rings by the ones used by RDMA.
bnxt_en: Clean up unused functions.
bnxt_en: Fix firmware signaled resource change logic in open.
sctp: not traverse asoc trans list if non-ipv6 trans exists for ipv6_flowlabel
sctp: fix invalid reference to the index variable of the iterator
net/ibm/emac: wrong emac_calc_base call was used by typo
net: sched: null actions array pointer before releasing action
vhost: fix VHOST_GET_BACKEND_FEATURES ioctl request definition
r8169: add support for NCube 8168 network card
ip6_tunnel: respect ttl inherit for ip6tnl
mac80211: shorten the IBSS debug messages
mac80211: don't Tx a deauth frame if the AP forbade Tx
mac80211: Fix station bandwidth setting after channel switch
mac80211: fix a race between restart and CSA flows
...
Immediately after module_put(), user could delete this
module, so e->ops could be already freed before we call
e->ops->release().
Fix this by moving module_put() after ops->release().
Fixes: ef6980b6be ("introduce IFE action")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* various A-MSDU building fixes (currently only affects mt76)
* syzkaller & spectre fixes in hwsim
* TXQ vs. teardown fix that was causing crashes
* embed WMM info in reg rule, bad code here had been causing crashes
* one compilation issue with fix from Arnd (rfkill-gpio includes)
* fixes for a race and bad data during/after channel switch
* nl80211: a validation fix, attribute type & unit fixes
along with other small fixes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAluNJXcACgkQB8qZga/f
l8Qvfw//dBwlhMII862Evk4M8OzhdHfkJ4Kp/d2C476whbEySU/jRIIeetmVpXYV
5cfStTxBpGkwMj5PXy3DaA2PO++L5qaApDJfHc8DNWNmvt9rRRJul1zP05HjZRxW
G7aFCFRWVK0dlmVP9GC/b20KyUvz4OpklBnxylkIrx0FCkw5bAHs1SsjGZCg/6Tm
008DAhFz3Ds6hNLxwricvrk5oQ6eC1cDfDd4Rtk3jCYQ4t7KFn5gFoKzKldfLdWe
TFTpVQ26XAGzn9QVXzAiXN4ZNpUpZrFXosC7cn5Ugiyic4YtnHxS2wVDuL3vs1cL
J2hoW6wjEBg+U6vmHMcijo1lnQwW7ueYUDWLJPNIXHA6A7sGyA6z6D7vbbvHfoG6
L681BrYmTmKkXXquu5+r85/9WgP2cmzbRpoIxTQl3sU2Liw2k5IJ9ryLLyul+8z7
spnDPOY7h4c0JrAvhjHkrKIbbW4FKYunxZJ8dn9eyAzOd/58iKoXzu4yAggwm+0V
DtZiu0gSr52sKrh1vqEyfhrPFCN1Mc19DRsJBtabUfVEveQTwToCkbZ5s1sLqSId
m30XUjjYOiRk7MZnncar0lE4//eJ6bnL3Wie3UTmO3xsMwlgKQPqjI4TprNogUCk
R2dVeGmhm3HSriRHKJL3/D8uzw5mMBI3Kicw9tFSSyVjtJgxvpg=
=lLBA
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-davem-2018-09-03' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Here are quite a large number of fixes, notably:
* various A-MSDU building fixes (currently only affects mt76)
* syzkaller & spectre fixes in hwsim
* TXQ vs. teardown fix that was causing crashes
* embed WMM info in reg rule, bad code here had been causing crashes
* one compilation issue with fix from Arnd (rfkill-gpio includes)
* fixes for a race and bad data during/after channel switch
* nl80211: a validation fix, attribute type & unit fixes
along with other small fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial fix for two spelling mistakes.
Signed-off-by: Zhenbo Gao <zhenbo.gao@windriver.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When users set params.spp_address and get a trans, ipv6_flowlabel flag
should be applied into this trans. But even if this one is not an ipv6
trans, it should not go to apply it into all other transes of the asoc
but simply ignore it.
Fixes: 0b0dce7a36 ("sctp: add spp_ipv6_flowlabel and spp_dscp for sctp_paddrparams")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now in sctp_apply_peer_addr_params(), if SPP_IPV6_FLOWLABEL flag is set
and trans is NULL, it would use trans as the index variable to traverse
transport_addr_list, then trans is set as the last transport of it.
Later, if SPP_DSCP flag is set, it would enter into the wrong branch as
trans is actually an invalid reference.
So fix it by using a new index variable to traverse transport_addr_list
for both SPP_DSCP and SPP_IPV6_FLOWLABEL flags process.
Fixes: 0b0dce7a36 ("sctp: add spp_ipv6_flowlabel and spp_dscp for sctp_paddrparams")
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
man ip-tunnel ttl section says:
0 is a special value meaning that packets inherit the TTL value.
IPv4 tunnel respect this in ip_tunnel_xmit(), but IPv6 tunnel has not
implement it yet. To make IPv6 behave consistently with IP tunnel,
add ipv6 tunnel inherit support.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When tracing is enabled, all the debug messages are recorded and must
not exceed MAX_MSG_LEN (100) columns. Longer debug messages grant the
user with:
WARNING: CPU: 3 PID: 32642 at /tmp/wifi-core-20180806094828/src/iwlwifi-stack-dev/net/mac80211/./trace_msg.h:32 trace_event_raw_event_mac80211_msg_event+0xab/0xc0 [mac80211]
Workqueue: phy1 ieee80211_iface_work [mac80211]
RIP: 0010:trace_event_raw_event_mac80211_msg_event+0xab/0xc0 [mac80211]
Call Trace:
__sdata_dbg+0xbd/0x120 [mac80211]
ieee80211_ibss_rx_queued_mgmt+0x15f/0x510 [mac80211]
ieee80211_iface_work+0x21d/0x320 [mac80211]
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If the driver fails to properly prepare for the channel
switch, mac80211 will disconnect. If the CSA IE had mode
set to 1, it means that the clients are not allowed to send
any Tx on the current channel, and that includes the
deauthentication frame.
Make sure that we don't send the deauthentication frame in
this case.
In iwlwifi, this caused a failure to flush queues since the
firmware already closed the queues after having parsed the
CSA IE. Then mac80211 would wait until the deauthentication
frame would go out (drv_flush(drop=false)) and that would
never happen.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When performing a channel switch flow for a managed interface, the
flow did not update the bandwidth of the AP station and the rate
scale algorithm. In case of a channel width downgrade, this would
result with the rate scale algorithm using a bandwidth that does not
match the interface channel configuration.
Fix this by updating the AP station bandwidth and rate scaling algorithm
before the actual channel change in case of a bandwidth downgrade, or
after the actual channel change in case of a bandwidth upgrade.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We hit a problem with iwlwifi that was caused by a bug in
mac80211. A bug in iwlwifi caused the firwmare to crash in
certain cases in channel switch. Because of that bug,
drv_pre_channel_switch would fail and trigger the restart
flow.
Now we had the hw restart worker which runs on the system's
workqueue and the csa_connection_drop_work worker that runs
on mac80211's workqueue that can run together. This is
obviously problematic since the restart work wants to
reconfigure the connection, while the csa_connection_drop_work
worker does the exact opposite: it tries to disconnect.
Fix this by cancelling the csa_connection_drop_work worker
in the restart worker.
Note that this can sound racy: we could have:
driver iface_work CSA_work restart_work
+++++++++++++++++++++++++++++++++++++++++++++
|
<--drv_cs ---|
<FW CRASH!>
-CS FAILED-->
| |
| cancel_work(CSA)
schedule |
CSA work |
| |
Race between those 2
But this is not possible because we flush the workqueue
in the restart worker before we cancel the CSA worker.
That would be bullet proof if we could guarantee that
we schedule the CSA worker only from the iface_work
which runs on the workqueue (and not on the system's
workqueue), but unfortunately we do have an instance
in which we schedule the CSA work outside the context
of the workqueue (ieee80211_chswitch_done).
Note also that we should probably cancel other workers
like beacon_connection_loss_work and possibly others
for different types of interfaces, at the very least,
IBSS should suffer from the exact same problem, but for
now, do the minimum to fix the actual bug that was actually
experienced and reproduced.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In commit 9236c4523e5b ("mac80211: limit wmm params to comply
with ETSI requirements"), we have limited the WMM parameters to
comply with 802.11 and ETSI standard. Mistakenly the TXOP value
was caluclated wrong. Fix it by taking the minimum between
802.11 to ETSI to make sure we are not violating both.
Fixes: e552af0581 ("mac80211: limit wmm params to comply with ETSI requirements")
Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The "chandef->center_freq1" variable is a u32 but "freq" is a u16 so we
are truncating away the high bits. I noticed this bug because in commit
9cf0a0b4b6 ("cfg80211: Add support for 60GHz band channels 5 and 6")
we made "freq <= 56160 + 2160 * 6" a valid requency when before it was
only "freq <= 56160 + 2160 * 4" that was valid. It introduces a static
checker warning:
net/wireless/util.c:1571 ieee80211_chandef_to_operating_class()
warn: always true condition '(freq <= 56160 + 2160 * 6) => (0-u16max <= 69120)'
But really we probably shouldn't have been truncating the high bits
away to begin with.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Initialize 'n' to 2 in order to take into account also the first
packet in the estimation of max_subframe limit for a given A-MSDU
since frag_tail pointer is NULL when ieee80211_amsdu_aggregate
routine analyzes the second frame.
Fixes: 6e0456b545 ("mac80211: add A-MSDU tx support")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Daniel Borkmann says:
====================
pull-request: bpf 2018-09-02
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix one remaining buggy offset override in sockmap's bpf_msg_pull_data()
when linearizing multiple scatterlist elements, from Tushar.
2) Fix BPF sockmap's misuse of ULP when a collision with another ULP is
found on map update where it would release existing ULP. syzbot found and
triggered this couple of times now, fix from John.
3) Add missing xskmap type to bpftool so it will properly show the type
on map dump, from Prashant.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan reported a regression after an update to 4.18.5. In this case ipv6
default route is setup by systemd-networkd based on data from an RA. The
RA contains an MTU of 1492 which is used when the route is first inserted
but then systemd-networkd pushes down updates to the default route
without the mtu set.
Prior to the change to fib6_info, metrics such as MTU were held in the
dst_entry and rt6i_pmtu in rt6_info contained an update to the mtu if
any. ip6_mtu would look at rt6i_pmtu first and use it if set. If not,
the value from the metrics is used if it is set and finally falling
back to the idev value.
After the fib6_info change metrics are contained in the fib6_info struct
and there is no equivalent to rt6i_pmtu. To maintain consistency with
the old behavior the new code should only reset the MTU in the metrics
if the route update has it set.
Fixes: d4ead6b34b ("net/ipv6: move metrics from dst to rt6_info")
Reported-by: Jan Janssen <medhefgo@web.de>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After link down and up, i.e. when call ip_mc_up(), we doesn't init
im->unsolicit_count. So after igmp_timer_expire(), we will not start
timer again and only send one unsolicit report at last.
Fix it by initializing im->unsolicit_count in igmp_group_added(), so
we can respect igmp robustness value.
Fixes: 24803f38a5 ("igmp: do not remove igmp souce list info when set link down")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should not start timer if im->unsolicit_count equal to 0 after decrease.
Or we will send one more unsolicit report message. i.e. 3 instead of 2 by
default.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Helper bpf_msg_pull_data() mistakenly reuses variable 'offset' while
linearizing multiple scatterlist elements. Variable 'offset' is used
to find first starting scatterlist element
i.e. msg->data = sg_virt(&sg[first_sg]) + start - offset"
Use different variable name while linearizing multiple scatterlist
elements so that value contained in variable 'offset' won't get
overwritten.
Fixes: 015632bb30 ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pull core fixes from Thomas Gleixner:
"A small set of updates for core code:
- Prevent tracing in functions which are called from trace patching
via stop_machine() to prevent executing half patched function trace
entries.
- Remove old GCC workarounds
- Remove pointless includes of notifier.h"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Remove workaround for unreachable warnings from old GCC
notifier: Remove notifier header file wherever not used
watchdog: Mark watchdog touch functions as notrace
Commit 80f1a0f4e0 ("net/ipv6: Put lwtstate when destroying fib6_info")
partially fixed the kmemleak [1], lwtstate can be copied from fib6_info,
with ip6_rt_copy_init(), and it should be done only once there.
rt->dst.lwtstate is set by ip6_rt_init_dst(), at the start of the function
ip6_rt_copy_init(), so there is no need to get it again at the end.
With this patch, lwtstate also isn't copied from RTF_REJECT routes.
[1]:
unreferenced object 0xffff880b6aaa14e0 (size 64):
comm "ip", pid 10577, jiffies 4295149341 (age 1273.903s)
hex dump (first 32 bytes):
01 00 04 00 04 00 00 00 10 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<0000000018664623>] lwtunnel_build_state+0x1bc/0x420
[<00000000b73aa29a>] ip6_route_info_create+0x9f7/0x1fd0
[<00000000ee2c5d1f>] ip6_route_add+0x14/0x70
[<000000008537b55c>] inet6_rtm_newroute+0xd9/0xe0
[<000000002acc50f5>] rtnetlink_rcv_msg+0x66f/0x8e0
[<000000008d9cd381>] netlink_rcv_skb+0x268/0x3b0
[<000000004c893c76>] netlink_unicast+0x417/0x5a0
[<00000000f2ab1afb>] netlink_sendmsg+0x70b/0xc30
[<00000000890ff0aa>] sock_sendmsg+0xb1/0xf0
[<00000000a2e7b66f>] ___sys_sendmsg+0x659/0x950
[<000000001e7426c8>] __sys_sendmsg+0xde/0x170
[<00000000fe411443>] do_syscall_64+0x9f/0x4a0
[<000000001be7b28b>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<000000006d21f353>] 0xffffffffffffffff
Fixes: 6edb3c96a5 ("net/ipv6: Defer initialization of dst to data path")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 1337 says:
''Ignore RST segments in TIME-WAIT state.
If the 2 minute MSL is enforced, this fix avoids all three hazards.''
So with net.ipv4.tcp_rfc1337=1, expected behaviour is to have TIME-WAIT sk
expire rather than removing it instantly when a reset is received.
However, Linux will also re-start the TIME-WAIT timer.
This causes connect to fail when tying to re-use ports or very long
delays (until syn retry interval exceeds MSL).
packetdrill test case:
// Demonstrate bogus rearming of TIME-WAIT timer in rfc1337 mode.
`sysctl net.ipv4.tcp_rfc1337=1`
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0
0.100 < S 0:0(0) win 29200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
// Receive first segment
0.310 < P. 1:1001(1000) ack 1 win 46
// Send one ACK
0.310 > . 1:1(0) ack 1001
// read 1000 byte
0.310 read(4, ..., 1000) = 1000
// Application writes 100 bytes
0.350 write(4, ..., 100) = 100
0.350 > P. 1:101(100) ack 1001
// ACK
0.500 < . 1001:1001(0) ack 101 win 257
// close the connection
0.600 close(4) = 0
0.600 > F. 101:101(0) ack 1001 win 244
// Our side is in FIN_WAIT_1 & waits for ack to fin
0.7 < . 1001:1001(0) ack 102 win 244
// Our side is in FIN_WAIT_2 with no outstanding data.
0.8 < F. 1001:1001(0) ack 102 win 244
0.8 > . 102:102(0) ack 1002 win 244
// Our side is now in TIME_WAIT state, send ack for fin.
0.9 < F. 1002:1002(0) ack 102 win 244
0.9 > . 102:102(0) ack 1002 win 244
// Peer reopens with in-window SYN:
1.000 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
// Therefore, reply with ACK.
1.000 > . 102:102(0) ack 1002 win 244
// Peer sends RST for this ACK. Normally this RST results
// in tw socket removal, but rfc1337=1 setting prevents this.
1.100 < R 1002:1002(0) win 244
// second syn. Due to rfc1337=1 expect another pure ACK.
31.0 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
31.0 > . 102:102(0) ack 1002 win 244
// .. and another RST from peer.
31.1 < R 1002:1002(0) win 244
31.2 `echo no timer restart;ss -m -e -a -i -n -t -o state TIME-WAIT`
// third syn after one minute. Time-Wait socket should have expired by now.
63.0 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
// so we expect a syn-ack & 3whs to proceed from here on.
63.0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
Without this patch, 'ss' shows restarts of tw timer and last packet is
thus just another pure ack, more than one minute later.
This restores the original code from commit 283fd6cf0be690a83
("Merge in ANK networking jumbo patch") in netdev-vger-cvs.git .
For some reason the else branch was removed/lost in 1f28b683339f7
("Merge in TCP/UDP optimizations and [..]") and timer restart became
unconditional.
Reported-by: Michal Tesar <mtesar@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Getting prompt "The RDS Protocol" (RDS) is not too helpful, and it is
easily confused with Radio Data System (which we may want to support
in kernel, too).
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 71e4128620.
mmap()/munmap() can not be backed by kmalloced pages :
We fault in :
VM_BUG_ON_PAGE(PageSlab(page), page);
unmap_single_vma+0x8a/0x110
unmap_vmas+0x4b/0x90
unmap_region+0xc9/0x140
do_munmap+0x274/0x360
vm_munmap+0x81/0xc0
SyS_munmap+0x2b/0x40
do_syscall_64+0x13e/0x1c0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
Fixes: 71e4128620 ("packet: switch kvzalloc to allocate memory")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: John Sperbeck <jsperbeck@google.com>
Bisected-by: John Sperbeck <jsperbeck@google.com>
Cc: Zhang Yu <zhangyu31@baidu.com>
Cc: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the error path of changing the SKB headroom of the second
A-MSDU subframe, we would not account for the already-changed
length of the first frame that just got converted to be in
A-MSDU format and thus is a bit longer now.
Fix this by doing the necessary accounting.
It would be possible to reorder the operations, but that would
make the code more complex (to calculate the necessary pad),
and the headroom expansion should not fail frequently enough
to make that worthwhile.
Fixes: 6e0456b545 ("mac80211: add A-MSDU tx support")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Do not start to aggregate packets in a A-MSDU frame (converting the
first subframe to A-MSDU, adding the header) if max_tx_fragments or
max_amsdu_subframes limits are already exceeded by it. In particular,
this happens when drivers set the limit to 1 to avoid A-MSDUs at all.
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
[reword commit message to be more precise]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
nl80211_update_ft_ies() tried to validate NL80211_ATTR_IE with
is_valid_ie_attr() before dereferencing it, but that helper function
returns true in case of NULL pointer (i.e., attribute not included).
This can result to dereferencing a NULL pointer. Fix that by explicitly
checking that NL80211_ATTR_IE is included.
Fixes: 355199e02b ("cfg80211: Extend support for IEEE 802.11r Fast BSS Transition")
Signed-off-by: Arunk Khandavalli <akhandav@codeaurora.org>
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
After the commit 802bfb1915 ("net/sched: user-space can't set
unknown tcfa_action values"), unknown tcfa_action values are
converted to TC_ACT_UNSPEC, but the common agreement is instead
rejecting such configurations.
This change also introduces a helper to simplify the destruction
of a single action, avoiding code duplication.
v1 -> v2:
- helper is now static and renamed according to act_* convention
- updated extack message, according to the new behavior
Fixes: 802bfb1915 ("net/sched: user-space can't set unknown tcfa_action values")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtnl_unregister_all(PF_INET6) gets called from inet6_init in cases when
no handler has been registered for PF_INET6 yet, for example if
ip6_mr_init() fails. Abort and avoid a NULL pointer deref in that case.
Example of panic (triggered by faking a failure of
register_pernet_subsys):
general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
[...]
RIP: 0010:rtnl_unregister_all+0x17e/0x2a0
[...]
Call Trace:
? rtnetlink_net_init+0x250/0x250
? sock_unregister+0x103/0x160
? kernel_getsockopt+0x200/0x200
inet6_init+0x197/0x20d
Fixes: e2fddf5e96 ("[IPV6]: Make af_inet6 to check ip6_route_init return value.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 6d0bfe2261 ("net: ipv6: Add IPv6 support to the ping socket.")
contains an error in the cleanup path of inet6_init(): when
proto_register(&pingv6_prot, 1) fails, we try to unregister
&pingv6_prot. When rawv6_init() fails, we skip unregistering
&pingv6_prot.
Example of panic (triggered by faking a failure of
proto_register(&pingv6_prot, 1)):
general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
[...]
RIP: 0010:__list_del_entry_valid+0x79/0x160
[...]
Call Trace:
proto_unregister+0xbb/0x550
? trace_preempt_on+0x6f0/0x6f0
? sock_no_shutdown+0x10/0x10
inet6_init+0x153/0x1b8
Fixes: 6d0bfe2261 ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 15e668070a ("ipv6: reorder icmpv6_init() and ip6_mr_init()")
moved the cleanup label for ipmr_fail, but should have changed the
contents of the cleanup labels as well. Now we can end up cleaning up
icmpv6 even though it hasn't been initialized (jump to icmp_fail or
ipmr_fail).
Simply undo things in the reverse order of their initialization.
Example of panic (triggered by faking a failure of icmpv6_init):
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
[...]
RIP: 0010:__list_del_entry_valid+0x79/0x160
[...]
Call Trace:
? lock_release+0x8a0/0x8a0
unregister_pernet_operations+0xd4/0x560
? ops_free_list+0x480/0x480
? down_write+0x91/0x130
? unregister_pernet_subsys+0x15/0x30
? down_read+0x1b0/0x1b0
? up_read+0x110/0x110
? kmem_cache_create_usercopy+0x1b4/0x240
unregister_pernet_subsys+0x1d/0x30
icmpv6_cleanup+0x1d/0x30
inet6_init+0x1b5/0x23f
Fixes: 15e668070a ("ipv6: reorder icmpv6_init() and ip6_mr_init()")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot reported a use-after-free in tipc_group_fill_sock_diag(),
where tipc_group_fill_sock_diag() still reads tsk->group meanwhile
tipc_group_delete() just deletes it in tipc_release().
tipc_nl_sk_walk() aims to lock this sock when walking each sock
in the hash table to close race conditions with sock changes like
this one, by acquiring tsk->sk.sk_lock.slock spinlock, unfortunately
this doesn't work at all. All non-BH call path should take
lock_sock() instead to make it work.
tipc_nl_sk_walk() brutally iterates with raw rht_for_each_entry_rcu()
where RCU read lock is required, this is the reason why lock_sock()
can't be taken on this path. This could be resolved by switching to
rhashtable iterator API's, where taking a sleepable lock is possible.
Also, the iterator API's are friendly for restartable calls like
diag dump, the last position is remembered behind the scence,
all we need to do here is saving the iterator into cb->args[].
I tested this with parallel tipc diag dump and thousands of tipc
socket creation and release, no crash or memory leak.
Reported-by: syzbot+b9c8f3ab2994b7cd1625@syzkaller.appspotmail.com
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rhashtable_walk_exit() must be paired with rhashtable_walk_enter().
Fixes: 40f9f43970 ("tipc: Fix tipc_sk_reinit race conditions")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before the commit d6990976af ("vti6: fix PMTU caching and reporting
on xmit") '!skb->ignore_df' check was always true because the function
skb_scrub_packet() was called before it, resetting ignore_df to zero.
In the commit, skb_scrub_packet() was moved below, and now this check
can be false for the packet, e.g. when sending it in the two fragments,
this prevents successful PMTU updates in such case. The next attempts
to send the packet lead to the same tx error. Moreover, vti6 initial
MTU value relies on PMTU adjustments.
This issue can be reproduced with the following LTP test script:
udp_ipsec_vti.sh -6 -p ah -m tunnel -s 2000
Fixes: ccd740cbc6 ("vti6: Add pmtu handling to vti6_xmit.")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we perform the sg shift repair for the scatterlist ring, we
currently start out at i = first_sg + 1. However, this is not
correct since the first_sg could point to the sge sitting at slot
MAX_SKB_FRAGS - 1, and a subsequent i = MAX_SKB_FRAGS will access
the scatterlist ring (sg) out of bounds. Add the sk_msg_iter_var()
helper for iterating through the ring, and apply the same rule
for advancing to the next ring element as we do elsewhere. Later
work will use this helper also in other places.
Fixes: 015632bb30 ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>