Pull networking fixes from David Miller:
1) Fix list tests in netfilter ingress support, from Florian Westphal.
2) Fix reversal of input and output interfaces in ingress hook
invocation, from Pablo Neira Ayuso.
3) We have a use after free in r8169, caught by Dave Jones, fixed by
Francois Romieu.
4) Splice use-after-free fix in AF_UNIX frmo Hannes Frederic Sowa.
5) Three ipv6 route handling bug fixes from Martin KaFai Lau:
a) Don't create clone routes not managed by the fib6 tree
b) Don't forget to check expiration of DST_NOCACHE routes.
c) Handle rt->dst.from == NULL properly.
6) Several AF_PACKET fixes wrt transport header setting and SKB
protocol setting, from Daniel Borkmann.
7) Fix thunder driver crash on shutdown, from Pavel Fedin.
8) Several Mellanox driver fixes (max MTU calculations, use of correct
DMA unmap in TX path, etc.) from Saeed Mahameed, Tariq Toukan, Doron
Tsur, Achiad Shochat, Eran Ben Elisha, and Noa Osherovich.
9) Several mv88e6060 DSA driver fixes (wrong bit definitions for
certain registers, etc.) from Neil Armstrong.
10) Make sure to disable preemption while updating per-cpu stats of ip
tunnels, from Jason A. Donenfeld.
11) Various ARM64 bpf JIT fixes, from Yang Shi.
12) Flush icache properly in ARM JITs, from Daniel Borkmann.
13) Fix masking of RX and TX interrupts in ravb driver, from Masaru
Nagai.
14) Fix netdev feature propagation for devices not implementing
->ndo_set_features(). From Nikolay Aleksandrov.
15) Big endian fix in vmxnet3 driver, from Shrikrishna Khare.
16) RAW socket code increments incorrect SNMP counters, fix from Ben
Cartwright-Cox.
17) IPv6 multicast SNMP counters are bumped twice, fix from Neil Horman.
18) Fix handling of VLAN headers on stacked devices when REORDER is
disabled. From Vlad Yasevich.
19) Fix SKB leaks and use-after-free in ipvlan and macvlan drivers, from
Sabrina Dubroca.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
MAINTAINERS: Update Mellanox's Eth NIC driver entries
net/core: revert "net: fix __netdev_update_features return.." and add comment
af_unix: take receive queue lock while appending new skb
rtnetlink: fix frame size warning in rtnl_fill_ifinfo
net: use skb_clone to avoid alloc_pages failure.
packet: Use PAGE_ALIGNED macro
packet: Don't check frames_per_block against negative values
net: phy: Use interrupts when available in NOLINK state
phy: marvell: Add support for 88E1540 PHY
arm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS
macvlan: fix leak in macvlan_handle_frame
ipvlan: fix use after free of skb
ipvlan: fix leak in ipvlan_rcv_frame
vlan: Do not put vlan headers back on bridge and macvlan ports
vlan: Fix untag operations of stacked vlans with REORDER_HEADER off
via-velocity: unconditionally drop frames with bad l2 length
ipg: Remove ipg driver
dl2k: Add support for IP1000A-based cards
snmp: Remove duplicate OUTMCAST stat increment
net: thunder: Check for driver data in nicvf_remove()
...
This reverts commit 00ee592717 ("net: fix __netdev_update_features return
on ndo_set_features failure")
and adds a comment explaining why it's okay to return a value other than
0 upon error. Some drivers might actually change flags and return an
error so it's better to fire a spurious notification rather than miss
these.
CC: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While possibly in future we don't necessarily need to use
sk_buff_head.lock this is a rather larger change, as it affects the
af_unix fd garbage collector, diag and socket cleanups. This is too much
for a stable patch.
For the time being grab sk_buff_head.lock without disabling bh and irqs,
so don't use locked skb_queue_tail.
Fixes: 869e7c6248 ("net: af_unix: implement stream sendpage support")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following warning:
CC net/core/rtnetlink.o
net/core/rtnetlink.c: In function ‘rtnl_fill_ifinfo’:
net/core/rtnetlink.c:1308:1: warning: the frame size of 2864 bytes is larger than 2048 bytes [-Wframe-larger-than=]
}
^
by splitting up the huge rtnl_fill_ifinfo into some smaller ones, so we
don't have the huge frame allocations at the same time.
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. new skb only need dst and ip address(v4 or v6).
2. skb_copy may need high order pages, which is very rare on long running server.
Signed-off-by: Junwei Zhang <linggao.zjw@alibaba-inc.com>
Signed-off-by: Martin Zhang <martinbj2008@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use PAGE_ALIGNED(...) instead of open-coding it.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
rb->frames_per_block is an unsigned int, thus can never be negative.
Also fix spacing in the calculation of frames_per_block.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a vlan is configured with REORDER_HEADER set to 0, the vlan
header is put back into the packet and makes it appear that
the vlan header is still there even after it's been processed.
This posses a problem for bridge and macvlan ports. The packets
passed to those device may be forwarded and at the time of the
forward, vlan headers end up being unexpectedly present.
With the patch, we make sure that we do not put the vlan header
back (when REORDER_HEADER is 0) if a bridge or macvlan has
been configured on top of the vlan device.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we have multiple stacked vlan devices all of which have
turned off REORDER_HEADER flag, the untag operation does not
locate the ethernet addresses correctly for nested vlans.
The reason is that in case of REORDER_HEADER flag being off,
the outer vlan headers are put back and the mac_len is adjusted
to account for the presense of the header. Then, the subsequent
untag operation, for the next level vlan, always use VLAN_ETH_HLEN
to locate the begining of the ethernet header and that ends up
being a multiple of 4 bytes short of the actuall beginning
of the mac header (the multiple depending on the how many vlan
encapsulations ethere are).
As a reslult, if there are multiple levles of vlan devices
with REODER_HEADER being off, the recevied packets end up
being dropped.
To solve this, we use skb->mac_len as the offset. The value
is always set on receive path and starts out as a ETH_HLEN.
The value is also updated when the vlan header manupations occur
so we know it will be correct.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using call_rcu(), the called function may be delayed quite
significantly, and without a matching rcu_barrier() there's no
way to be sure it has finished.
Therefore, global state that could be gone/freed/reused should
never be touched in the callback.
Fix this in mesh by moving the atomic_dec() into the caller;
that's not really a problem since we already unlinked the path
and it will be destroyed anyway.
This fixes a crash Jouni observed when running certain tests in
a certain order, in which the mesh interface was torn down, the
memory reused for a function pointer (work struct) and running
that then crashed since the pointer had been decremented by 1,
resulting in an invalid instruction byte stream.
Cc: stable@vger.kernel.org
Fixes: eb2b9311fd ("mac80211: mesh path table implementation")
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For now, this feature doesn't actually work. To avoid shipping a
kernel that has it enabled but where it can't be used disable it
for now - we can re-enable it when it's fixed.
This partially reverts 44674d9c22 ("mac80211: advertise support
for full station state in AP mode").
Cc: Ayala Beker <ayala.beker@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
the OUTMCAST stat is double incremented, getting bumped once in the mcast code
itself, and again in the common ip output path. Remove the mcast bump, as its
not needed
Validated by the reporter, with good results
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Claus Jensen <claus.jensen@microsemi.com>
CC: Claus Jensen <claus.jensen@microsemi.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
A recent flaw in the netdev feature setting resulted in warnings
like this one from VLAN interfaces:
WARNING: CPU: 1 PID: 4975 at net/core/dev.c:2419 skb_warn_bad_offload+0xbc/0xcb()
: caps=(0x00000000001b5820, 0x00000000001b5829) len=2782 data_len=0 gso_size=1348 gso_type=16 ip_summed=3
The ":" is supposed to be preceded by a driver name, but in this
case it is an empty string since the device has no parent.
There are many types of network devices without a parent. The
anonymous warnings for these devices can be hard to debug. Log
the network device name instead in these cases to assist further
debugging.
This is mostly similar to how __netdev_printk() handles orphan
devices.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case multiple writes to a unix stream socket race we could end up in a
situation where we pre-allocate a new skb for use in unix_stream_sendpage
but have to free it again in the locked section because another skb
has been appended meanwhile, which we must use. Accidentally we didn't
clear the pointer after consuming it and so we touched freed memory
while appending it to the sk_receive_queue. So, clear the pointer after
consuming the skb.
This bug has been found with syzkaller
(http://github.com/google/syzkaller) by Dmitry Vyukov.
Fixes: 869e7c6248 ("net: af_unix: implement stream sendpage support")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sending ICMP packets with raw sockets ends up in the SNMP counters
logging the type as the first byte of the IPv4 header rather than
the ICMP header. This is fixed by adding the IP Header Length to
the casting into a icmphdr struct.
Signed-off-by: Ben Cartwright-Cox <ben@benjojo.co.uk>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If ndo_set_features fails __netdev_update_features() will return -1 but
this is wrong because it is expected to return 0 if no features were
changed (see netdev_update_features()), which will cause a netdev
notifier to be called without any actual changes. Fix this by returning
0 if ndo_set_features fails.
Fixes: 6cb6a27c45 ("net: Call netdev_features_change() from netdev_update_features()")
CC: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When __netdev_update_features() was updated to ensure some features are
disabled on new lower devices, an error was introduced for devices which
don't have the ndo_set_features() method set. Before we'll just set the
new features, but now we return an error and don't set them. Fix this by
returning the old behaviour and setting err to 0 when ndo_set_features
is not present.
Fixes: e7868a85e1 ("net/core: ensure features get disabled on new lower devs")
CC: Jarod Wilson <jarod@redhat.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Ido Schimmel <idosch@mellanox.com>
CC: Sander Eikelenboom <linux@eikelenboom.it>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Reviewed-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Dave Young <dyoung@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When NET_SWITCHDEV=n, switchdev_port_attr_set simply returns EOPNOTSUPP.
In this case we should not emit errors and warnings to the kernel log.
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 0bc05d585d ("switchdev: allow caller to explicitly request
attr_set as deferred")
Fixes: 6ac311ae8b ("Adding switchdev ageing notification on port
bridged")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SYNACK packets might be attached to request sockets.
Use skb_to_full_sk() helper to avoid illegal accesses to
inet_sk(skb->sk)
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some functions access TCP sockets without holding a lock and
might output non consistent data, depending on compiler and or
architecture.
tcp_diag_get_info(), tcp_get_info(), tcp_poll(), get_tcp4_sock() ...
Introduce sk_state_load() and sk_state_store() to fix the issues,
and more clearly document where this lack of locking is happening.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
now sctp auth cannot work well when setting a hmacid manually, which
is caused by that we didn't use the network order for hmacid, so fix
it by adding the transformation in sctp_auth_ep_set_hmacs.
even we set hmacid with the network order in userspace, it still
can't work, because of this condition in sctp_auth_ep_set_hmacs():
if (id > SCTP_AUTH_HMAC_ID_MAX)
return -EOPNOTSUPP;
so this wasn't working before and thus it won't break compatibility.
Fixes: 65b07e5d0d ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since it's introduction in commit 69e3c75f4d ("net: TX_RING and
packet mmap"), TX_RING could be used from SOCK_DGRAM and SOCK_RAW
side. When used with SOCK_DGRAM only, the size_max > dev->mtu +
reserve check should have reserve as 0, but currently, this is
unconditionally set (in it's original form as dev->hard_header_len).
I think this is not correct since tpacket_fill_skb() would then
take dev->mtu and dev->hard_header_len into account for SOCK_DGRAM,
the extra VLAN_HLEN could be possible in both cases. Presumably, the
reserve code was copied from packet_snd(), but later on missed the
check. Make it similar as we have it in packet_snd().
Fixes: 69e3c75f4d ("net: TX_RING and packet mmap")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case no struct sockaddr_ll has been passed to packet
socket's sendmsg() when doing a TX_RING flush run, then
skb->protocol is set to po->num instead, which is the protocol
passed via socket(2)/bind(2).
Applications only xmitting can go the path of allocating the
socket as socket(PF_PACKET, <mode>, 0) and do a bind(2) on the
TX_RING with sll_protocol of 0. That way, register_prot_hook()
is neither called on creation nor on bind time, which saves
cycles when there's no interest in capturing anyway.
That leaves us however with po->num 0 instead and therefore
the TX_RING flush run sets skb->protocol to 0 as well. Eric
reported that this leads to problems when using tools like
trafgen over bonding device. I.e. the bonding's hash function
could invoke the kernel's flow dissector, which depends on
skb->protocol being properly set. In the current situation, all
the traffic is then directed to a single slave.
Fix it up by inferring skb->protocol from the Ethernet header
when not set and we have ARPHRD_ETHER device type. This is only
done in case of SOCK_RAW and where we have a dev->hard_header_len
length. In case of ARPHRD_ETHER devices, this is guaranteed to
cover ETH_HLEN, and therefore being accessed on the skb after
the skb_store_bits().
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packet sockets can be used by various net devices and are not
really restricted to ARPHRD_ETHER device types. However, when
currently checking for the extra 4 bytes that can be transmitted
in VLAN case, our assumption is that we generally probe on
ARPHRD_ETHER devices. Therefore, before looking into Ethernet
header, check the device type first.
This also fixes the issue where non-ARPHRD_ETHER devices could
have no dev->hard_header_len in TX_RING SOCK_RAW case, and thus
the check would test unfilled linear part of the skb (instead
of non-linear).
Fixes: 57f89bfa21 ("network: Allow af_packet to transmit +4 bytes for VLAN packets.")
Fixes: 52f1454f62 ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We concluded that the skb_probe_transport_header() should better be
called unconditionally. Avoiding the call into the flow dissector has
also not really much to do with the direct xmit mode.
While it seems that only virtio_net code makes use of GSO from non
RX/TX ring packet socket paths, we should probe for a transport header
nevertheless before they hit devices.
Reference: http://thread.gmane.org/gmane.linux.network/386173/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In tpacket_fill_skb() commit c1aad275b0 ("packet: set transport
header before doing xmit") and later on 40893fd0fd ("net: switch
to use skb_probe_transport_header()") was probing for a transport
header on the skb from a ring buffer slot, but at a time, where
the skb has _not even_ been filled with data yet. So that call into
the flow dissector is pretty useless. Lets do it after we've set
up the skb frags.
Fixes: c1aad275b0 ("packet: set transport header before doing xmit")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All DST_NOCACHE rt6_info used to have rt->dst.from set to
its parent.
After commit 8e3d5be736 ("ipv6: Avoid double dst_free"),
DST_NOCACHE is also set to rt6_info which does not have
a parent (i.e. rt->dst.from is NULL).
This patch catches the rt->dst.from == NULL case.
Fixes: 8e3d5be736 ("ipv6: Avoid double dst_free")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the expires of the DST_NOCACHE rt can be set during
the ip6_rt_update_pmtu(), we also need to consider the expires
value when doing ip6_dst_check().
This patches creates __rt6_check_expired() to only
check the expire value (if one exists) of the current rt.
In rt6_dst_from_check(), it adds __rt6_check_expired() as
one of the condition check.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The original bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1272571
The setup has a IPv4 GRE tunnel running in a IPSec. The bug
happens when ndisc starts sending router solicitation at the gre
interface. The simplified oops stack is like:
__lock_acquire+0x1b2/0x1c30
lock_acquire+0xb9/0x140
_raw_write_lock_bh+0x3f/0x50
__ip6_ins_rt+0x2e/0x60
ip6_ins_rt+0x49/0x50
~~~~~~~~
__ip6_rt_update_pmtu.part.54+0x145/0x250
ip6_rt_update_pmtu+0x2e/0x40
~~~~~~~~
ip_tunnel_xmit+0x1f1/0xf40
__gre_xmit+0x7a/0x90
ipgre_xmit+0x15a/0x220
dev_hard_start_xmit+0x2bd/0x480
__dev_queue_xmit+0x696/0x730
dev_queue_xmit+0x10/0x20
neigh_direct_output+0x11/0x20
ip6_finish_output2+0x21f/0x770
ip6_finish_output+0xa7/0x1d0
ip6_output+0x56/0x190
~~~~~~~~
ndisc_send_skb+0x1d9/0x400
ndisc_send_rs+0x88/0xc0
~~~~~~~~
The rt passed to ip6_rt_update_pmtu() is created by
icmp6_dst_alloc() and it is not managed by the fib6 tree,
so its rt6i_table == NULL. When __ip6_rt_update_pmtu() creates
a RTF_CACHE clone, the newly created clone also has rt6i_table == NULL
and it causes the ip6_ins_rt() oops.
During pmtu update, we only want to create a RTF_CACHE clone
from a rt which is currently managed (or owned) by the
fib6 tree. It means either rt->rt6i_node != NULL or
rt is a RTF_PCPU clone.
It is worth to note that rt6i_table may not be NULL even it is
not (yet) managed by the fib6 tree (e.g. addrconf_dst_alloc()).
Hence, rt6i_node is a better check instead of rt6i_table.
Fixes: 45e4fd2668 ("ipv6: Only create RTF_CACHE routes after encountering pmtu")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reported-by: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu>
Cc: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
During splicing an af-unix socket to a pipe we have to drop all
af-unix socket locks. While doing so we allow another reader to enter
unix_stream_read_generic which can read, copy and finally free another
skb. If exactly this skb is just in process of being spliced we get a
use-after-free report by kasan.
First, we must make sure to not have a free while the skb is used during
the splice operation. We simply increment its use counter before unlocking
the reader lock.
Stream sockets have the nice characteristic that we don't care about
zero length writes and they never reach the peer socket's queue. That
said, we can take the UNIXCB.consumed field as the indicator if the
skb was already freed from the socket's receive queue. If the skb was
fully consumed after we locked the reader side again we know it has been
dropped by a second reader. We indicate a short read to user space and
abort the current splice operation.
This bug has been found with syzkaller
(http://github.com/google/syzkaller) by Dmitry Vyukov.
Fixes: 2b514574f7 ("net: af_unix: implement splice for stream af_unix sockets")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull Ceph updates from Sage Weil:
"There are several patches from Ilya fixing RBD allocation lifecycle
issues, a series adding a nocephx_sign_messages option (and associated
bug fixes/cleanups), several patches from Zheng improving the
(directory) fsync behavior, a big improvement in IO for direct-io
requests when striping is enabled from Caifeng, and several other
small fixes and cleanups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
libceph: clear msg->con in ceph_msg_release() only
libceph: add nocephx_sign_messages option
libceph: stop duplicating client fields in messenger
libceph: drop authorizer check from cephx msg signing routines
libceph: msg signing callouts don't need con argument
libceph: evaluate osd_req_op_data() arguments only once
ceph: make fsync() wait unsafe requests that created/modified inode
ceph: add request to i_unsafe_dirops when getting unsafe reply
libceph: introduce ceph_x_authorizer_cleanup()
ceph: don't invalidate page cache when inode is no longer used
rbd: remove duplicate calls to rbd_dev_mapping_clear()
rbd: set device_type::release instead of device::release
rbd: don't free rbd_dev outside of the release callback
rbd: return -ENOMEM instead of pool id if rbd_dev_create() fails
libceph: use local variable cursor instead of &msg->cursor
libceph: remove con argument in handle_reply()
ceph: combine as many iovec as possile into one OSD request
ceph: fix message length computation
ceph: fix a comment typo
rbd: drop null test before destroy functions
Pablo Neira Ayuso:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree. This
large batch that includes fixes for ipset, netfilter ingress, nf_tables
dynamic set instantiation and a longstanding Kconfig dependency problem.
More specifically, they are:
1) Add missing check for empty hook list at the ingress hook, from
Florian Westphal.
2) Input and output interface are swapped at the ingress hook,
reported by Patrick McHardy.
3) Resolve ipset extension alignment issues on ARM, patch from Jozsef
Kadlecsik.
4) Fix bit check on bitmap in ipset hash type, also from Jozsef.
5) Release buckets when all entries have expired in ipset hash type,
again from Jozsef.
6) Oneliner to initialize conntrack tuple object in the PPTP helper,
otherwise the conntrack lookup may fail due to random bits in the
structure holes, patch from Anthony Lineham.
7) Silence a bogus gcc warning in nfnetlink_log, from Arnd Bergmann.
8) Fix Kconfig dependency problems with TPROXY, socket and dup, also
from Arnd.
9) Add __netdev_alloc_pcpu_stats() to allow creating percpu counters
from atomic context, this is required by the follow up fix for
nf_tables.
10) Fix crash from the dynamic set expression, we have to add new clone
operation that should be defined when a simple memcpy is not enough.
This resolves a crash when using per-cpu counters with new Patrick
McHardy's flow table nft support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
the breakup of the big NFSv4 state lock in 3.17--thanks especially to
Andrew Elble and Jeff Layton for tracking down some of the remaining
races.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWQ51oAAoJECebzXlCjuG+b1AP+wWamMNw8eS0N98+KslfTMNd
BFcOFp6L5Hv1VRuwl67V9UUNS+9y5rsgWh9gMnQe5yPZ+dVABbO6mKfh3f7HJ2zg
aGzmE9ZdTMejjRDpSHHMEqxYePlxGVgxVhr8CgnzgkXf6KBEy3emfDlFIocgFWdR
JGWEhfOoOa+H7b3Awq3KxlxhAajq1ic14un2CLxTYdvjshwhlIjnscY1F7vNiNRg
O/ELQRKCCSNZbwGSV/OhNUXx3VQPQUh1eMvIfSD3Fs6AtMybIWGW5Fc36jxZJKt2
kllcEfukRnGa+Ezl/hwBWd1pEVMwmkYhNRt+9LtH8uWG4+uT5i3Mxn4taDYg8618
plp2GtRoC3VwOvUKEcKl3HhlRBu5H4zJtx9x60NDzAgNUtoKG/Dl7rhm7o7QwUk8
W3k0jYAryoyx+12fYO0dssdM4pj1Gi7nRGR687lKzXXttktbQF88ZS9EHhI3oFiC
Ak8ilEeap8ND1KIJY6Z2xr925BPpw+P2GXbd/Mr5H0aX3a3WM3wLPXlToZbve5EP
haYnTqbHw9QzqbLDcki8s0hNgv+xwlQWopoInGijfr3IAHQMpKStI0WxhyTYsb8g
0xyRWA1COnj5WvgznbkMky7Q45T27q26EFgaS8+LEJ1rtEpmNDZOaycbwym6XQHk
1oyydIRSWM3c7eWnDvFG
=wRg0
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.4' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Apologies for coming a little late in the merge window. Fortunately
this is another fairly quiet one:
Mainly smaller bugfixes and cleanup. We're still finding some bugs
from the breakup of the big NFSv4 state lock in 3.17 -- thanks
especially to Andrew Elble and Jeff Layton for tracking down some of
the remaining races"
* tag 'nfsd-4.4' of git://linux-nfs.org/~bfields/linux:
svcrpc: document lack of some memory barriers
nfsd: fix race with open / open upgrade stateids
nfsd: eliminate sending duplicate and repeated delegations
nfsd: remove recurring workqueue job to clean DRC
SUNRPC: drop stale comment in svc_setup_socket()
nfsd: ensure that seqid morphing operations are atomic wrt to copies
nfsd: serialize layout stateid morphing operations
nfsd: improve client_has_state to check for unused openowners
nfsd: fix clid_inuse on mount with security change
sunrpc/cache: make cache flushing more reliable.
nfsd: move include of state.h from trace.c to trace.h
sunrpc: avoid warning in gss_key_timeout
lockd: get rid of reference-counted NSM RPC clients
SUNRPC: Use MSG_SENDPAGE_NOTLAST when calling sendpage()
lockd: create NSM handles per net namespace
nfsd: switch unsigned char flags in svc_fh to bools
nfsd: move svc_fh->fh_maxsize to just after fh_handle
nfsd: drop null test before destroy functions
nfsd: serialize state seqid morphing operations
The L2CAP core expects channel implementations to manage the reference
returned by the new_connection callback. With sockets this is already
handled with each channel being tied to the corresponding socket. With
SMP however there's no context to tie the pointer to in the
smp_new_conn_cb function. The function can also not just drop the
reference since it's the only one at that point.
For fixed channels (like SMP) the code path inside the L2CAP core from
new_connection() to ready() is short and straight-forwards. The
crucial difference is that in ready() the implementation has access to
the l2cap_conn that SMP needs associate its l2cap_chan. Instead of
taking a new reference in smp_ready_cb() we can simply assume to
already own the reference created in smp_new_conn_cb(), i.e. there is
no need to call l2cap_chan_hold().
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 3.19+
Pull networking fixes from David Miller:
1) Fix null deref in xt_TEE netfilter module, from Eric Dumazet.
2) Several spots need to get to the original listner for SYN-ACK
packets, most spots got this ok but some were not. Whilst covering
the remaining cases, create a helper to do this. From Eric Dumazet.
3) Missiing check of return value from alloc_netdev() in CAIF SPI code,
from Rasmus Villemoes.
4) Don't sleep while != TASK_RUNNING in macvtap, from Vlad Yasevich.
5) Use after free in mvneta driver, from Justin Maggard.
6) Fix race on dst->flags access in dst_release(), from Eric Dumazet.
7) Add missing ZLIB_INFLATE dependency for new qed driver. From Arnd
Bergmann.
8) Fix multicast getsockopt deadlock, from WANG Cong.
9) Fix deadlock in btusb, from Kuba Pawlak.
10) Some ipv6_add_dev() failure paths were not cleaning up the SNMP6
counter state. From Sabrina Dubroca.
11) Fix packet_bind() race, which can cause lost notifications, from
Francesco Ruggeri.
12) Fix MAC restoration in qlcnic driver during bonding mode changes,
from Jarod Wilson.
13) Revert bridging forward delay change which broke libvirt and other
userspace things, from Vlad Yasevich.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
Revert "bridge: Allow forward delay to be cfgd when STP enabled"
bpf_trace: Make dependent on PERF_EVENTS
qed: select ZLIB_INFLATE
net: fix a race in dst_release()
net: mvneta: Fix memory use after free.
net: Documentation: Fix default value tcp_limit_output_bytes
macvtap: Resolve possible __might_sleep warning in macvtap_do_read()
mvneta: add FIXED_PHY dependency
net: caif: check return value of alloc_netdev
net: hisilicon: NET_VENDOR_HISILICON should depend on HAS_DMA
drivers: net: xgene: fix RGMII 10/100Mb mode
netfilter: nft_meta: use skb_to_full_sk() helper
net_sched: em_meta: use skb_to_full_sk() helper
sched: cls_flow: use skb_to_full_sk() helper
netfilter: xt_owner: use skb_to_full_sk() helper
smack: use skb_to_full_sk() helper
net: add skb_to_full_sk() helper and use it in selinux_netlbl_skbuff_setsid()
bpf: doc: correct arch list for supported eBPF JIT
dwc_eth_qos: Delete an unnecessary check before the function call "of_node_put"
bonding: fix panic on non-ARPHRD_ETHER enslave failure
...
With the conversion of the counter expressions to make it percpu, we
need to clone the percpu memory area, otherwise we crash when using
counters from flow tables.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Kconfig is too smart for its own good: a Kconfig line that states
select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES
means that if IP6_NF_IPTABLES is set to 'm', then NF_DEFRAG_IPV6 will
also be set to 'm', regardless of the state of the symbol from which
it is selected. When the xt_TEE driver is built-in and nothing else
forces NF_DEFRAG_IPV6 to be built-in, this causes a link-time error:
net/built-in.o: In function `tee_tg6':
net/netfilter/xt_TEE.c:46: undefined reference to `nf_dup_ipv6'
This works around that behavior by changing the dependency to
'if IP6_NF_IPTABLES != n', which is interpreted as boolean expression
rather than a tristate and causes the NF_DEFRAG_IPV6 symbol to
be built-in as well.
The bug only occurs once in thousands of 'randconfig' builds and
does not really impact real users. From inspecting the other
surrounding Kconfig symbols, I am guessing that NETFILTER_XT_TARGET_TPROXY
and NETFILTER_XT_MATCH_SOCKET have the same issue. If not, this
change should still be harmless.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
After a recent (correct) change, gcc started warning about the use
of the 'flags' variable in nfulnl_recv_config()
net/netfilter/nfnetlink_log.c: In function 'nfulnl_recv_config':
net/netfilter/nfnetlink_log.c:320:14: warning: 'flags' may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/nfnetlink_log.c:828:6: note: 'flags' was declared here
The warning first shows up in ARM s3c2410_defconfig with gcc-4.3 or
higher (including 5.2.1, which is the latest version I checked) I
tried working around it by rearranging the code but had no success
with that.
As a last resort, this initializes the variable to zero, which shuts
up the warning, but means that we don't get a warning if the code
is ever changed in a way that actually causes the variable to be
used without first being written.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 8cbc870829 ("netfilter: nfnetlink_log: validate dependencies to avoid breaking atomicity")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We're missing memory barriers in net/sunrpc/svcsock.c in some spots we'd
expect them. But it doesn't appear they're necessary in our case, and
this is likely a hot path--for now just document the odd behavior.
Kosuke Tatsukawa found this issue while looking through the linux source
code for places calling waitqueue_active() before wake_up*(), but
without preceding memory barriers, after sending a patch to fix a
similar issue in drivers/tty/n_tty.c (Details about the original issue
can be found here: https://lkml.org/lkml/2015/9/28/849).
Reported-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This reverts commit 34c2d9fb04.
There are 2 reasons for this revert:
1) The commit in question doesn't do what it says it does. The
description reads: "Allow bridge forward delay to be configured
when Spanning Tree is enabled." This was already the case before
the commit was made. What the commit actually do was disallow
invalid values or 'forward_delay' when STP was turned off.
2) The above change was actually a change in the user observed
behavior and broke things like libvirt and other network configs
that set 'forward_delay' to 0 without enabling STP. The value
of 0 is actually used when STP is turned off to immediately mark
the bridge as forwarding.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The svc_setup_socket() function does set the send and receive buffer
sizes, so the comment is out-of-date:
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Only cpu seeing dst refcount going to 0 can safely
dereference dst->flags.
Otherwise an other cpu might already have freed the dst.
Fixes: 27b75c95f1 ("net: avoid RCU for NOCACHE dst")
Reported-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Highlights include:
Features:
- RDMA client backchannel from Chuck
- Support for NFSv4.2 file CLONE using the btrfs ioctl
Bugfixes + cleanups
- Move socket data receive out of the bottom halves and into a workqueue
- Refactor NFSv4 error handling so synchronous and asynchronous RPC handles
errors identically.
- Fix a panic when blocks or object layouts reads return a bad data length
- Fix nfsroot so it can handle a 1024 byte long path.
- Fix bad usage of page offset in bl_read_pagelist
- Various NFSv4 callback cleanups+fixes
- Fix GETATTR bitmap verification
- Support hexadecimal number for sunrpc debug sysctl files
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWQPMXAAoJEGcL54qWCgDy6ZUQAL32vpgyMXe7R4jcxoQxm52+
tn8FrY8aBZAqucvQsIGCrYfE01W/s8goDTQdZODn0MCcoor12BTPVYNIR42/J/no
MNnRTDF0dJ4WG+inX9G87XGG6sFN3wDaQcCaexknkQZlFNF4KthxojzR2BgjmRVI
p3WKkLSNTt6DYQQ8eDetvKoDT0AjR/KCYm89tiE8GMhKYcaZl6dTazJxwOcp2CX9
YDW6+fQbsv8qp5v2ay03e88O/DSmcNRFoxy/KUGT9OwJgdN08IN8fTt6GG38yycT
D9tb9uObBRcll4PnucouadBcykGr6jAP0z8HklE266LH1dwYLOHQoDFdgAs0QGtq
nlySiKvToj6CYXonXoPOjZF3P/lxlkj5ViZ2enBxgxrPmyWl172cUSa6NTXOMO46
kPpxw50xa1gP5kkBVwIZ6XZuzl/5YRhB3BRP3g6yuJCbAwVBJvawYU7riC+6DEB9
zygVfm21vi9juUQXJ37zXVRBTtoFhFjuSxcAYxc63o181lWYShKQ3IiRYg+zTxnq
7DOhXa0ZNGvMgJJi0tH9Es3/S6TrGhyKh5gKY/o2XUjY0hCSsCSdP6jw6Mb9Ax1s
0LzByHAikxBKPt2OFeoUgwycI2xqow4iAfuFk071iP7n0nwC804cUHSkGxW67dBZ
Ve5Skkg1CV+oWQYxGmGZ
=py1V
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
New features:
- RDMA client backchannel from Chuck
- Support for NFSv4.2 file CLONE using the btrfs ioctl
Bugfixes + cleanups:
- Move socket data receive out of the bottom halves and into a
workqueue
- Refactor NFSv4 error handling so synchronous and asynchronous RPC
handles errors identically.
- Fix a panic when blocks or object layouts reads return a bad data
length
- Fix nfsroot so it can handle a 1024 byte long path.
- Fix bad usage of page offset in bl_read_pagelist
- Various NFSv4 callback cleanups+fixes
- Fix GETATTR bitmap verification
- Support hexadecimal number for sunrpc debug sysctl files"
* tag 'nfs-for-4.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (53 commits)
Sunrpc: Supports hexadecimal number for sysctl files of sunrpc debug
nfs: Fix GETATTR bitmap verification
nfs: Remove unused xdr page offsets in getacl/setacl arguments
fs/nfs: remove unnecessary new_valid_dev check
SUNRPC: fix variable type
NFS: Enable client side NFSv4.1 backchannel to use other transports
pNFS/flexfiles: Add support for FF_FLAGS_NO_IO_THRU_MDS
pNFS/flexfiles: When mirrored, retry failed reads by switching mirrors
SUNRPC: Remove the TCP-only restriction in bc_svc_process()
svcrdma: Add backward direction service for RPC/RDMA transport
xprtrdma: Handle incoming backward direction RPC calls
xprtrdma: Add support for sending backward direction RPC replies
xprtrdma: Pre-allocate Work Requests for backchannel
xprtrdma: Pre-allocate backward rpc_rqst and send/receive buffers
SUNRPC: Abstract backchannel operations
xprtrdma: Saving IRQs no longer needed for rb_lock
xprtrdma: Remove reply tasklet
xprtrdma: Use workqueue to process RPC/RDMA replies
xprtrdma: Replace send and receive arrays
xprtrdma: Refactor reply handler error handling
...
Switch everything to the new and more capable implementation of abs().
Mainly to give the new abs() a bit of a workout.
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The uninitialized tuple structure caused incorrect hash calculation
and the lookup failed.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=106441
Signed-off-by: Anthony Lineham <anthony.lineham@alliedtelesis.co.nz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
SYNACK packets might be attached to request sockets.
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SYNACK packets might be attached to request sockets.
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SYNACK packets might be attached to request sockets.
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SYNACK packets might be attached to a request socket,
xt_owner wants to gte the listener in this case.
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge second patch-bomb from Andrew Morton:
- most of the rest of MM
- procfs
- lib/ updates
- printk updates
- bitops infrastructure tweaks
- checkpatch updates
- nilfs2 update
- signals
- various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
dma-debug, dma-mapping, ...
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
ipc,msg: drop dst nil validation in copy_msg
include/linux/zutil.h: fix usage example of zlib_adler32()
panic: release stale console lock to always get the logbuf printed out
dma-debug: check nents in dma_sync_sg*
dma-mapping: tidy up dma_parms default handling
pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
kexec: use file name as the output message prefix
fs, seqfile: always allow oom killer
seq_file: reuse string_escape_str()
fs/seq_file: use seq_* helpers in seq_hex_dump()
coredump: change zap_threads() and zap_process() to use for_each_thread()
coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
signals: kill block_all_signals() and unblock_all_signals()
nilfs2: fix gcc uninitialized-variable warnings in powerpc build
nilfs2: fix gcc unused-but-set-variable warnings
MAINTAINERS: nilfs2: add header file for tracing
nilfs2: add tracepoints for analyzing reading and writing metadata files
...
- "Checksum offload support in user space" enablement
- Misc cxgb4 fixes, add T6 support
- Misc usnic fixes
- 32 bit build warning fixes
- Misc ocrdma fixes
- Multicast loopback prevention extension
- Extend the GID cache to store and return attributes of GIDs
- Misc iSER updates
- iSER clustering update
- Network NameSpace support for rdma CM
- Work Request cleanup series
- New Memory Registration API
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWPO5UAAoJELgmozMOVy/dSCQP/iX2ImMZOS3VkOYKhLR3dSv8
4vTEiYIoAT1JEXiPpiabuuACwotcZcMRk9kZ0dcWmBoFusTzKJmoDOkgAYd95XqY
EsAyjqtzUGNNMjH5u5W+kdbaFdH9Ktq7IJvspRlJuvzC47Srax+qBxX01jrAkDgh
4PoA3hEa2KkvkDjY2Mhvk9EWd/uflO9Ky6o0D8jUQkWtEvKBRyDjQLk30oW6wHX9
pTWqww3dD0EXTrR+PDA88v2saKH1kZFU1Nt2eU8Bw+zlJM8hcX6U7PfRX0g3HT/J
o+7ejTdLPWFDH35gJOU+KE519f1JbwfRjPJCqbOC9IttBB7iHSbhcpQLpWv4JV1x
agdBeDA3TGQj3dHb2SkYMlWXCBp7q8UCbVGvvirTFzGSGU73sc6hhP+vCKvPQIlE
Ah5tUqD7Y3mOBjvuDeIzKMLXILd5d3cH+m7Laytrf5e7fJPmBRZyOkcMh0QVElyl
mKo+PFjghgeTFb405J7SDDw/vThVyN9HyIt7AGEzObaajzOOk9R1hkQr46XVy9TK
yi58fl85yQ2n6TWV6NRnvkQoMy/N2HAEuXk/7HtO0PabV5w3Lo0zvXB9SnVrrVEm
58FWRBYCWorVSdSacuDnPm0iz45WSRIb9G9sBlhEC93eXRq2rSBoy4RvyLeliHFH
hllyhNNolI6FJ64j07Xm
=bBIY
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma updates from Doug Ledford:
"This is my initial round of 4.4 merge window patches. There are a few
other things I wish to get in for 4.4 that aren't in this pull, as
this represents what has gone through merge/build/run testing and not
what is the last few items for which testing is not yet complete.
- "Checksum offload support in user space" enablement
- Misc cxgb4 fixes, add T6 support
- Misc usnic fixes
- 32 bit build warning fixes
- Misc ocrdma fixes
- Multicast loopback prevention extension
- Extend the GID cache to store and return attributes of GIDs
- Misc iSER updates
- iSER clustering update
- Network NameSpace support for rdma CM
- Work Request cleanup series
- New Memory Registration API"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (76 commits)
IB/core, cma: Make __attribute_const__ declarations sparse-friendly
IB/core: Remove old fast registration API
IB/ipath: Remove fast registration from the code
IB/hfi1: Remove fast registration from the code
RDMA/nes: Remove old FRWR API
IB/qib: Remove old FRWR API
iw_cxgb4: Remove old FRWR API
RDMA/cxgb3: Remove old FRWR API
RDMA/ocrdma: Remove old FRWR API
IB/mlx4: Remove old FRWR API support
IB/mlx5: Remove old FRWR API support
IB/srp: Dont allocate a page vector when using fast_reg
IB/srp: Remove srp_finish_mapping
IB/srp: Convert to new registration API
IB/srp: Split srp_map_sg
RDS/IW: Convert to new memory registration API
svcrdma: Port to new memory registration API
xprtrdma: Port to new memory registration API
iser-target: Port to new memory registration API
IB/iser: Port to new fast registration API
...
Pull trivial updates from Jiri Kosina:
"Trivial stuff from trivial tree that can be trivially summed up as:
- treewide drop of spurious unlikely() before IS_ERR() from Viresh
Kumar
- cosmetic fixes (that don't really affect basic functionality of the
driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek
- various comment / printk fixes and updates all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
bcache: Really show state of work pending bit
hwmon: applesmc: fix comment typos
Kconfig: remove comment about scsi_wait_scan module
class_find_device: fix reference to argument "match"
debugfs: document that debugfs_remove*() accepts NULL and error values
net: Drop unlikely before IS_ERR(_OR_NULL)
mm: Drop unlikely before IS_ERR(_OR_NULL)
fs: Drop unlikely before IS_ERR(_OR_NULL)
drivers: net: Drop unlikely before IS_ERR(_OR_NULL)
drivers: misc: Drop unlikely before IS_ERR(_OR_NULL)
UBI: Update comments to reflect UBI_METAONLY flag
pktcdvd: drop null test before destroy functions
Incorrect index was used when the data blob was shrinked at expiration,
which could lead to falsely expired entries and memory leak when
the comment extension was used too.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
The data extensions in ipset lacked the proper memory alignment and
thus could lead to kernel crash on several architectures. Therefore
the structures have been reorganized and alignment attributes added
where needed. The patch was tested on armv7h by Gerhard Wiesinger and
on x86_64, sparc64 by Jozsef Kadlecsik.
Reported-by: Gerhard Wiesinger <lists@wiesinger.com>
Tested-by: Gerhard Wiesinger <lists@wiesinger.com>
Tested-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".
Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.
This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.
This patch then converts a number of sites
o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.
o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.
o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.
o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.
The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.
The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull security subsystem update from James Morris:
"This is mostly maintenance updates across the subsystem, with a
notable update for TPM 2.0, and addition of Jarkko Sakkinen as a
maintainer of that"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (40 commits)
apparmor: clarify CRYPTO dependency
selinux: Use a kmem_cache for allocation struct file_security_struct
selinux: ioctl_has_perm should be static
selinux: use sprintf return value
selinux: use kstrdup() in security_get_bools()
selinux: use kmemdup in security_sid_to_context_core()
selinux: remove pointless cast in selinux_inode_setsecurity()
selinux: introduce security_context_str_to_sid
selinux: do not check open perm on ftruncate call
selinux: change CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default
KEYS: Merge the type-specific data with the payload data
KEYS: Provide a script to extract a module signature
KEYS: Provide a script to extract the sys cert list from a vmlinux file
keys: Be more consistent in selection of union members used
certs: add .gitignore to stop git nagging about x509_certificate_list
KEYS: use kvfree() in add_key
Smack: limited capability for changing process label
TPM: remove unnecessary little endian conversion
vTPM: support little endian guests
char: Drop owner assignment from i2c_driver
...
I mistakenly took wrong request sock pointer when calling tcp_move_syn()
@req_unhash is either a copy of @req, or a NULL value for
FastOpen connexions (as we do not expect to unhash the temporary
request sock from ehash table)
Fixes: 805c4bc057 ("tcp: fix req->saved_syn race")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ying Cai <ycai@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a race conditions between packet_notifier and packet_bind{_spkt}.
It happens if packet_notifier(NETDEV_UNREGISTER) executes between the
time packet_bind{_spkt} takes a reference on the new netdevice and the
time packet_do_bind sets po->ifindex.
In this case the notification can be missed.
If this happens during a dev_change_net_namespace this can result in the
netdevice to be moved to the new namespace while the packet_sock in the
old namespace still holds a reference on it. When the netdevice is later
deleted in the new namespace the deletion hangs since the packet_sock
is not found in the new namespace' &net->packet.sklist.
It can be reproduced with the script below.
This patch makes packet_do_bind check again for the presence of the
netdevice in the packet_sock's namespace after the synchronize_net
in unregister_prot_hook.
More in general it also uses the rcu lock for the duration of the bind
to stop dev_change_net_namespace/rollback_registered_many from
going past the synchronize_net following unlist_netdevice, so that
no NETDEV_UNREGISTER notifications can happen on the new netdevice
while the bind is executing. In order to do this some code from
packet_bind{_spkt} is consolidated into packet_do_dev.
import socket, os, time, sys
proto=7
realDev='em1'
vlanId=400
if len(sys.argv) > 1:
vlanId=int(sys.argv[1])
dev='vlan%d' % vlanId
os.system('taskset -p 0x10 %d' % os.getpid())
s = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, proto)
os.system('ip link add link %s name %s type vlan id %d' %
(realDev, dev, vlanId))
os.system('ip netns add dummy')
pid=os.fork()
if pid == 0:
# dev should be moved while packet_do_bind is in synchronize net
os.system('taskset -p 0x20000 %d' % os.getpid())
os.system('ip link set %s netns dummy' % dev)
os.system('ip netns exec dummy ip link del %s' % dev)
s.close()
sys.exit(0)
time.sleep(.004)
try:
s.bind(('%s' % dev, proto+1))
except:
print 'Could not bind socket'
s.close()
os.system('ip netns del dummy')
sys.exit(0)
os.waitpid(pid, 0)
s.close()
os.system('ip netns del dummy')
sys.exit(0)
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before converting a 'socket pointer' into inet socket,
use sk_fullsock() to detect timewait or request sockets.
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the reasons explained in commit ce1050089c ("tcp/dccp: fix
ireq->pktopts race"), we need to make sure we do not access
req->saved_syn unless we own the request sock.
This fixes races for listeners using TCP_SAVE_SYN option.
Fixes: e994b2f0fb ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103 ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Ying Cai <ycai@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth 2015-11-05
The following set of Bluetooth patches would be good to get into 4.4-rc1
if possible:
- Fix for missing LE CoC parameter validity checks
- Fix for potential deadlock in btusb
- Fix for issuing unsupported commands during HCI init
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is the big tty and serial driver update for 4.4-rc1.
Lots of serial driver updates and a few small tty core changes. Full
details in the shortlog.
All of these have been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlY6f64ACgkQMUfUDdst+ykf8gCfYPjtHy5hD/TsharaeXROnVgi
W8cAn16xk1Nmnde220MNNpO6zDu65G/1
=kslf
-----END PGP SIGNATURE-----
Merge tag 'tty-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
"Here is the big tty and serial driver update for 4.4-rc1.
Lots of serial driver updates and a few small tty core changes. Full
details in the shortlog.
All of these have been in linux-next for a while"
* tag 'tty-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (148 commits)
tty: Use unbound workqueue for all input workers
tty: Abstract tty buffer work
tty: Prevent tty teardown during tty_write_message()
tty: core: Use correct spinlock flavor in tiocspgrp()
tty: Combine SIGTTOU/SIGTTIN handling
serial: amba-pl011: fix incorrect integer size in pl011_fifo_to_tty()
ttyFDC: Fix build problems due to use of module_{init,exit}
tty: remove unneeded return statement
serial: 8250_mid: add support for DMA engine handling from UART MMIO
dmaengine: hsu: remove platform data
dmaengine: hsu: introduce stubs for the exported functions
dmaengine: hsu: make the UART driver in control of selecting this driver
serial: fix mctrl helper functions
serial: 8250_pci: Intel MID UART support to its own driver
serial: fsl_lpuart: add earlycon support
tty: disable unbind for old 74xx based serial/mpsc console port
serial: pl011: Spelling s/clocks-names/clock-names/
n_tty: Remove reader wakeups for TTY_BREAK/TTY_PARITY chars
tty: synclink, fix indentation
serial: at91, fix rs485 properties
...
In ipv6_add_dev, when addrconf_sysctl_register fails, we do not clean up
the dev_snmp6 entry that we have already registered for this device.
Call snmp6_unregister_dev in this case.
Fixes: a317a2f19d ("ipv6: fail early when creating netdev named all or default")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When receiving a connect response we should make sure that the DCID is
within the valid range and that we don't already have another channel
allocated for the same DCID.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The 'dyn_end' value is also a valid CID so it should be included in
the range of values checked.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The core spec defines specific response codes for situations when the
received CID is incorrect. Add the defines for these and return them
as appropriate from the LE Connect Request handler function.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The white list commands might not be implemented if the controller does
not actually support the white list. So check the supported commands
first before issuing these commands. Not supporting the white list is
the same as supporting a white list with zero size.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When a listen socket enqueues a connection for userspace to accept(),
the sk->sk_data_ready() callback should be invoked. In-kernel socket
users rely on this callback to detect when incoming connections are
available.
Currently the sk->sk_state_change() callback is invoked by
vmci_transport.c. This happens to work for userspace applications since
sk->sk_state_change = sock_def_wakeup() and sk->sk_data_ready =
sock_def_readable() both wake up the accept() waiter. In-kernel socket
users, on the other hand, fail to detect incoming connections.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With moving netdev_sync_lower_features() after the .ndo_set_features
calls, I neglected to verify that devices added *after* a flag had been
disabled on an upper device were properly added with that flag disabled as
well. This currently happens, because we exit __netdev_update_features()
when we see dev->features == features for the upper dev. We can retain the
optimization of leaving without calling .ndo_set_features with a bit of
tweaking and a goto here.
Fixes: fd867d51f8 ("net/core: generic support for disabling netdev features down stack")
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Nikolay Aleksandrov <razor@blackwall.org>
CC: Michal Kubecek <mkubecek@suse.cz>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: netdev@vger.kernel.org
Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A bug report (https://bugzilla.kernel.org/show_bug.cgi?id=107071) noted
that the follwoing ip command is failing with v4.3:
$ ip route add 10.248.5.0/24 dev bond0.250 table vlan_250 src 10.248.5.154
RTNETLINK answers: Invalid argument
021dd3b8a1 changed the lookup of the given preferred source address to
use the table id passed in, but this assumes the local entries are in the
given table which is not necessarily true for non-VRF use cases. When
validating the preferred source fallback to the local table on failure.
Fixes: 021dd3b8a1 ("net: Add routes to the table associated with the device")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sasha reported the following lockdep warning:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sk_lock-AF_INET);
lock(rtnl_mutex);
lock(sk_lock-AF_INET);
lock(rtnl_mutex);
This is due to that for IP_MSFILTER and MCAST_MSFILTER, we take
rtnl lock before the socket lock in setsockopt() path, but take
the socket lock before rtnl lock in getsockopt() path. All the
rest optnames are setsockopt()-only.
Fix this by aligning the getsockopt() path with the setsockopt()
path, so that all mcast socket path would be locked in the same
order.
Note, IPv6 part is different where rtnl lock is not held.
Fixes: 54ff9ef36b ("ipv4, ipv6: kill ip_mc_{join, leave}_group and ipv6_sock_mc_{join, drop}")
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/netfilter/xt_TEE.c
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Fix crash when TEE target is used with no --oif, from Eric Dumazet.
2) Oneliner to fix a crash on the redirect traffic to localhost
infrastructure when interface has not yet an address, from
Munehisa Kamata.
3) Oneliner not to request module all the time from nfnetlink due to
wrong type value, from Florian Westphal.
I'll make sure these patches 1 and 2 hit -stable.
====================
The conflict in net/netfilter/xt_TEE.c was minor, a change
to the 'oif' selection overlapping a function signature
change for the nf_dup_ipv{4,6}() routines.
Signed-off-by: David S. Miller <davem@davemloft.net>
The sunrpc debug sysctl files only accept decimal number right now.
But all the XXXDBUG_XXX macros are defined as hexadecimal.
It is not easy to set or check an separate flag.
This patch let those files support accepting hexadecimal number,
(decimal number is also supported). Also, display it as hexadecimal.
v2,
Remove duplicate parsing of '0x...', just using simple_strtol(tmpbuf, &s, 0)
Fix a bug of isspace() checking after parsing
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Minor overlapping changes in net/ipv4/ipmr.c, in 'net' we were
fixing the "BH-ness" of the counter bumps whilst in 'net-next'
the functions were modified to take an explicit 'net' parameter.
Signed-off-by: David S. Miller <davem@davemloft.net>
Caller passing down the SKIP_EOPNOTSUPP switchdev flag expects that
-EOPNOTSUPP cannot be returned. But in case of direct op call without
recurtion, this may happen. So fix this by checking it always on the
end of __switchdev_port_attr_set function.
Fixes: 464314ea6c ("switchdev: skip over ports returning -EOPNOTSUPP when recursing ports")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The irlmp_unregister_service() function tests whether its argument is NULL
and then returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to incorrect len type bc_send_request returned always zero.
The problem has been detected using proposed semantic patch
scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].
[1]: http://permalink.gmane.org/gmane.linux.kernel/2046107
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* remove a warning on a check that can trigger without any
errors having happened (Andrei)
* correctly handle deauth request while in the process of
associating (Andrei)
* fix TDLS HT operation (Arik)
* allow changing AID/listen interval during client setup (Ayala)
* be more forgiving with WMM parameters to get HT/VHT in case of
broken APs with bad WMM settings (Emmanuel, myself)
* a number of other fixes (some in documentation)
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJWOKsAAAoJEDBSmw7B7bqrNKMQAKFH81CscgJGOQb/zgGdmuF3
kNrWrnH+3XqoqM2rHpIekQLxVkeUhM+hHCyaPCK7rVnCuu53pJ0u7P0rq922XAW4
olFBGVdE1yG/69ndR9MYDjLWP+ikMmAiMLbM5qzPuDJ5XyBVACC1D82+qSRvByCK
Z8PYJ+OsLk05eKa4ER7i8BVExRGM4vrce4Uh3K07yNKbfU81ztsltwflleRFGn3f
OCydpuSId+C4TuSmkgBJF1718B9GazvAbZDw5t4jorIrbiZzQMZAtoi+YxwXVhev
lvPCO8p1+lhWYUOK5LnO8mbdUFfe+kc3rrZjKWuXuDLp6mvPyP9FOaIFjFfnlJxT
8QadG0QDzTlLHUj29gvrnww8aob9c7iHueXP9OlcBMp9uTyklgBJ+fMyvPfXpWXB
Diy9n0VJfWzg8d74wWLLQy/N1qY6gwhXXwgW8TM/49O5BpbyvVsI6jFAR+8ZT9b9
GLGEkN68RBuY03mejkf4PmhqgMVErA2JtabRI0Efm2Do85t9ZxgObF6INsrZ+o2M
ffl7jhyHsFB+d38Ilwlb4cyWhxpIGrhTtt2h5zIsgNx3wmrXrarwMM3P4NGOOEbP
Euqkk/LoMZdjjB/78JSi6hdQSYoQFaW85tHBzXhMXk0nYXHLWdVEJsLuAtATl8gM
vzNkny8pcaLnRg/kXqgl
=/d5+
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-davem-2015-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Another set of fixes:
* remove a warning on a check that can trigger without any
errors having happened (Andrei)
* correctly handle deauth request while in the process of
associating (Andrei)
* fix TDLS HT operation (Arik)
* allow changing AID/listen interval during client setup (Ayala)
* be more forgiving with WMM parameters to get HT/VHT in case of
broken APs with bad WMM settings (Emmanuel, myself)
* a number of other fixes (some in documentation)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
As pointed out by Nikolay and further explained by Geert, the initial
for_each_netdev_feature macro was broken, as feature would get set outside
of the block of code it was intended to run in, thus only ever working for
the first feature bit in the mask. While less pretty this way, this is
tested and confirmed functional with multiple feature bits set in
NETIF_F_UPPER_DISABLES.
[root@dell-per730-01 ~]# ethtool -K bond0 lro off
...
[ 242.761394] bond0: Disabling feature 0x0000000000008000 on lower dev p5p2.
[ 243.552178] bnx2x 0000:06:00.1 p5p2: using MSI-X IRQs: sp 74 fp[0] 76 ... fp[7] 83
[ 244.353978] bond0: Disabling feature 0x0000000000008000 on lower dev p5p1.
[ 245.147420] bnx2x 0000:06:00.0 p5p1: using MSI-X IRQs: sp 62 fp[0] 64 ... fp[7] 71
[root@dell-per730-01 ~]# ethtool -K bond0 gro off
...
[ 251.925645] bond0: Disabling feature 0x0000000000004000 on lower dev p5p2.
[ 252.713693] bnx2x 0000:06:00.1 p5p2: using MSI-X IRQs: sp 74 fp[0] 76 ... fp[7] 83
[ 253.499085] bond0: Disabling feature 0x0000000000004000 on lower dev p5p1.
[ 254.290922] bnx2x 0000:06:00.0 p5p1: using MSI-X IRQs: sp 62 fp[0] 64 ... fp[7] 71
Fixes: fd867d51f ("net/core: generic support for disabling netdev features down stack")
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Nikolay Aleksandrov <razor@blackwall.org>
CC: Michal Kubecek <mkubecek@suse.cz>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Geert Uytterhoeven <geert@linux-m68k.org>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NIC drivers mark device as detached during error recovery.
It expects no manangement hooks to be invoked in this state.
Invoke driver vlan hooks only if device is present.
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the definition of PTP_CLASS_L2 to not have any bits overlapping with
the other defined protocol values, allowing the PTP_CLASS_* definitions to
be for simple filtering on packet type.
Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both tunnel6_protocol and tunnel46_protocol share the same error
handler, tunnel6_err(), which traverses through tunnel6_handlers list.
For ipip6 tunnels, we need to traverse tunnel46_handlers as we do e.g.
in tunnel46_rcv(). Current code can generate an ICMPv6 error message
with an IPv4 packet embedded in it.
Fixes: 73d605d1ab ("[IPSEC]: changing API of xfrm6_tunnel_register")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, cfg80211 rejects updates of AID and listen interval parameters
for existing entries. This information is known only at association stage
and as a result it's impossible to update entries that were added
unassociated.
Fix this by allowing updates of these properies for stations that the
driver (or mac80211) assigned unassociated state.
This then fixes mac80211's use of NL80211_FEATURE_FULL_AP_CLIENT_STATE.
Signed-off-by: Ayala Beker <ayala.beker@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Channel context driver operations can sleep, so add might_sleep()
and document this.
Signed-off-by: Chaitanya T K <chaitanya.mgit@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Allow distinguishing the non-station case from the case of a
station without rates, by using -1 for the non-station case.
This value cannot be reached with a station since that many
legacy rates don't exist.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
As WMM is required for HT/VHT operation, treat bad WMM parameters
more gracefully by falling back to default parameters instead of
not using WMM assocation. This makes it possible to still use HT
or VHT, although potentially with reduced quality of service due
to unintended WMM parameters.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Disabling WMM has a huge impact these days. It implies that
HT and VHT will be disabled which means that the throughput
will be drammatically reduced.
Since the AIFSN is a transmission parameter, we can play a
bit and fix it up to make it compliant with the 802.11
specification which requires it to be at least 2.
Increasing it from 1 to 2 will slightly reduce the
likelyhood to get a transmission opportunity compared to
other clients that would accept to set AIFSN=1, but at
least it will allow HT and VHT which is a huge gain.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The function currently determines this value, for use in bss_info.qos,
based on the interface type itself. Make it a parameter instead and
set it with the same logic for now.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
llid_in_use needs to be limited to stations of the same VIF, otherwise it
will cause a NULL deref as the sta_info of non-mesh-VIFs don't have
sta->mesh set.
Steps to reproduce:
modprobe mac80211_hwsim channels=2
iw phy phy0 interface add ibss0 type ibss
iw phy phy0 interface add mesh0 type mp
iw phy phy1 interface add ibss1 type ibss
iw phy phy1 interface add mesh1 type mp
ip link set ibss0 up
ip link set mesh0 up
ip link set ibss1 up
ip link set mesh1 up
iw dev ibss0 ibss join foo 2412
iw dev ibss1 ibss join foo 2412
# Ensure that ibss0 and ibss1 are actually associated; I often need to
# leave and join the cell on ibss1 a second time.
iw dev mesh0 mesh join bar
iw dev mesh1 mesh join bar # crash
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When 11n peers performs a TDLS connection on a legacy BSS, the HT
operation IE must be specified according to IEEE802.11-2012 section
9.23.3.2. Otherwise HT-protection is compromised and the medium becomes
noisy for both the TDLS and the BSS links.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Scheduled scan has to be reconfigured only if wowlan wasn't
configured, since otherwise it should continue to run (with
the 'any' trigger) or be aborted.
The current code will end up asking the driver to start a new
scheduled scan without stopping the previous one, and leaking
some memory (from the previous request.)
Fix this by doing the abort/restart under the proper conditions.
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If drv_start() fails during hw_restart, all the running
interfaces are being closed/stopped, which results in
drv_stop() being called, although the driver was never
started successfully.
This might cause drivers to perform operations on uninitialized
memory (as they assume it was initialized on drv_start)
Consider the local->started flag, and call the driver's stop()
op only if drv_start() succeeded before.
Move drv_start() and drv_stop() to driver-ops.c, as they are no
longer simple wrappers.
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The recalc_smps work can run after the station disassociates.
At this stage we already released the channel, but the work
will be cancelled only when the interface stops.
In this scenario we can hit the warning in ieee80211_recalc_smps, so
just remove it.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Requesting hw restart during suspend might result
in the restart work being executed after mac80211
and the hw are suspended.
Solve the race by simply scheduling the restart
work on a freezable workqueue.
Note that there can be some cases of reconfiguration
on resume (besides the hardware restart):
* wowlan is not configured -
All the interfaces removed were removed on suspend,
and drv_stop() was called. At this point the driver
shouldn't expect for hw_restart anyway, so we can
simply cancel it (on resume).
* wowlan is configured, drv_resume() == 1
There is no definitive expected behavior in this case,
as each driver might have different expectations (e.g.
setting some flags on suspend/restart vs. not handling
spurious recovery).
For now, simply let the hw_restart work run again after
resume, and hope the driver will handle it well (or at
least initiate another hw restart).
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Local request to deauthenticate wasn't handled while associating, thus
the association could continue even when the user space required to
disconnect.
Cc: stable@vger.kernel.org
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>