Rename the ipv4_tos and ipv4_ttl fields to just 'tos' and 'ttl', as they'll
be used with IPv6 tunnels, too.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the IPv6 addresses as an union with IPv4 ones. When using IPv4, the
newly introduced padding after the IPv4 addresses needs to be zeroed out.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andreas reported breakage adding routes with local nexthops:
$ ip route show table main
...
172.28.0.0/24 dev vnf-xe1p0 proto kernel scope link src 172.28.0.16
$ ip route add 10.0.0.0/8 via 172.28.0.32 table 100 dev vnf-xe1p0
RTNETLINK answers: Resource temporarily unavailable
3bfd847203 changed the lookup to use the passed in table but for cases like
this the nexthop is in the local table rather than the passed in table.
Fixes: 3bfd847203 ("net: Use passed in table for nexthop lookups")
Reported-by: Andreas Schultz <aschultz@tpip.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
.maxtype should match .policy. Probably just been getting lucky here
because IFLA_BRPORT_MAX > IFLA_BR_MAX.
Fixes: 13323516 ("bridge: implement rtnl_link_ops->changelink")
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make fib_encap_match() static as it isn't used outside the file.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A proprietary vendor command may send back useful data to the user
application.
For example, the field level applied on the NFC router antenna.
Still based on net/wireless/nl80211.c implementation,
add nfc_vendor_cmd_alloc_reply_skb and nfc_vendor_cmd_reply in
order to send back over netlink data generated by a proprietary
command.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
skb can be NULL and may lead to a NULL pointer error.
Add a check condition before setting HCI rx buffer.
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Some drivers needs to have ability to reinit NCI core, for example
after updating firmware in setup() of post_setup() callback. This
patch makes nci_core_reset() and nci_core_init() functions public,
to make it possible.
Signed-off-by: Robert Baldyga <r.baldyga@samsung.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Some drivers require non-standard configuration after NCI_CORE_INIT
request, because they need to know ndev->manufact_specific_info or
ndev->manufact_id. This patch adds post_setup handler allowing to do
such custom configuration.
Signed-off-by: Robert Baldyga <r.baldyga@samsung.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
make payload expression aware of the fact that VLAN offload may have
removed a vlan header.
When we encounter tagged skb, transparently insert the tag into the
register so that vlan header matching can work without userspace being
aware of offload features.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently, two routes going through the same tunnel interface are considered
the same even when they are routed to a different host after encapsulation.
This causes all routes added after the first one to have incorrect
encapsulation parameters.
This is nicely visible by doing:
# ip r a 192.168.1.2/32 dev vxlan0 tunnel dst 10.0.0.2
# ip r a 192.168.1.3/32 dev vxlan0 tunnel dst 10.0.0.3
# ip r
[...]
192.168.1.2/32 tunnel id 0 src 0.0.0.0 dst 10.0.0.2 [...]
192.168.1.3/32 tunnel id 0 src 0.0.0.0 dst 10.0.0.2 [...]
Implement the missing comparison function.
Fixes: 3093fbe7ff ("route: Per route IP tunnel metadata via lightweight tunnel")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The built lwtunnel_state struct has to be freed after comparison.
Fixes: 571e722676 ("ipv4: support for fib route lwtunnel encap attributes")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The object tt_local is allocated with kmalloc and not initialized when the
function batadv_tt_local_add checks for the vlan. But this function can
only cleanup the object when the (not yet initialized) reference counter of
the object is 1. This is unlikely and thus the object would leak when the
vlan could not be found.
Instead the uninitialized object tt_local has to be freed manually and the
pointer has to set to NULL to avoid calling the function which would try to
decrement the reference counter of the not existing object.
CID: 1316518
Fixes: 354136bcc3 ("batman-adv: fix kernel crash due to missing NULL checks")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
With more than two switches in a hierarchy, it becomes necessary to
describe multi-hop routes between switches. The current binding does
not allow this, although the older platform_data did. Extend the link
property to be a list rather than a single phandle to a remote switch.
It is then possible to express that a port should be used to reach
more than one switch and the switch maybe more than one hop away.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Those were all workarounds for the formerly double meaning of
tx_queue_len, which broke scheduling algorithms if untreated.
Now that all in-tree drivers have been converted away from setting
tx_queue_len = 0, it should be safe to drop these workarounds for
categorically broken setups.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to the introduction of IFF_NO_QUEUE, there is a better way for
drivers to indicate that no qdisc should be attached by default. Though,
the old convention can't be dropped since ignoring that setting would
break drivers still using it. Instead, add a warning so out-of-tree
driver maintainers get a chance to adjust their code before we finally
get rid of any special handling of tx_queue_len == 0.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding new module name ila. This implements ILA translation. Light
weight tunnel redirection is used to perform the translation in
the data path. This is configured by the "ip -6 route" command
using the "encap ila <locator>" option, where <locator> is the
value to set in destination locator of the packet. e.g.
ip -6 route add 3333:0:0:1:5555:0:1:0/128 \
encap ila 2001:0:0:1 via 2401:db00:20:911a:face:0:25:0
Sets a route where 3333:0:0:1 will be overwritten by
2001:0:0:1 on output.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function updates a checksum field value and skb->csum based on
a value which is the difference between the old and new checksum.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
inet_proto_csum_replace4,2,16 take a pseudohdr argument which indicates
the checksum field carries a pseudo header. This argument should be a
boolean instead of an int.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the capability to redirect dst input in the same way
that dst output is redirected by LWT.
Also, save the original dst.input and and dst.out when setting up
lwtunnel redirection. These can be called by the client as a pass-
through.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This work adds the possibility of deriving the zone id from the skb->mark
field in a scalable manner. This allows for having only a single template
serving hundreds/thousands of different zones, for example, instead of the
need to have one match for each zone as an extra CT jump target.
Note that we'd need to have this information attached to the template as at
the time when we're trying to lookup a possible ct object, we already need
to know zone information for a possible match when going into
__nf_conntrack_find_get(). This work provides a minimal implementation for
a possible mapping.
In order to not add/expose an extra ct->status bit, the zone structure has
been extended to carry a flag for deriving the mark.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This work adds a direction parameter to netfilter zones, so identity
separation can be performed only in original/reply or both directions
(default). This basically opens up the possibility of doing NAT with
conflicting IP address/port tuples from multiple, isolated tenants
on a host (e.g. from a netns) without requiring each tenant to NAT
twice resp. to use its own dedicated IP address to SNAT to, meaning
overlapping tuples can be made unique with the zone identifier in
original direction, where the NAT engine will then allocate a unique
tuple in the commonly shared default zone for the reply direction.
In some restricted, local DNAT cases, also port redirection could be
used for making the reply traffic unique w/o requiring SNAT.
The consensus we've reached and discussed at NFWS and since the initial
implementation [1] was to directly integrate the direction meta data
into the existing zones infrastructure, as opposed to the ct->mark
approach we proposed initially.
As we pass the nf_conntrack_zone object directly around, we don't have
to touch all call-sites, but only those, that contain equality checks
of zones. Thus, based on the current direction (original or reply),
we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID.
CT expectations are direction-agnostic entities when expectations are
being compared among themselves, so we can only use the identifier
in this case.
Note that zone identifiers can not be included into the hash mix
anymore as they don't contain a "stable" value that would be equal
for both directions at all times, f.e. if only zone->id would
unconditionally be xor'ed into the table slot hash, then replies won't
find the corresponding conntracking entry anymore.
If no particular direction is specified when configuring zones, the
behaviour is exactly as we expect currently (both directions).
Support has been added for the CT netlink interface as well as the
x_tables raw CT target, which both already offer existing interfaces
to user space for the configuration of zones.
Below a minimal, simplified collision example (script in [2]) with
netperf sessions:
+--- tenant-1 ---+ mark := 1
| netperf |--+
+----------------+ | CT zone := mark [ORIGINAL]
[ip,sport] := X +--------------+ +--- gateway ---+
| mark routing |--| SNAT |-- ... +
+--------------+ +---------------+ |
+--- tenant-2 ---+ | ~~~|~~~
| netperf |--+ +-----------+ |
+----------------+ mark := 2 | netserver |------ ... +
[ip,sport] := X +-----------+
[ip,port] := Y
On the gateway netns, example:
iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully
iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark
conntrack dump from gateway netns:
netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns
tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1
src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024
[ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2
src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555
[ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1
src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438
[ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2
src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889
[ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2
Taking this further, test script in [2] creates 200 tenants and runs
original-tuple colliding netperf sessions each. A conntrack -L dump in
the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED
state as expected.
I also did run various other tests with some permutations of the script,
to mention some: SNAT in random/random-fully/persistent mode, no zones (no
overlaps), static zones (original, reply, both directions), etc.
[1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
[2] https://paste.fedoraproject.org/242835/65657871/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Table lookup compiles out when VRF is not enabled.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth-next 2015-08-16
Here's what's likely the last bluetooth-next pull request for 4.3:
- 6lowpan/802.15.4 refactoring, cleanups & fixes
- Document 6lowpan netdev usage in Documentation/networking/6lowpan.txt
- Support for UART based QCA Bluetooth controllers
- Power management support for Broeadcom Bluetooth controllers
- Change LE connection initiation to always use passive scanning first
- Support for new Silicon Wave USB ID
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Change brace placement to be in line with coding standards
Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
- avoid integer overflow in GW selection routine
- prevent race condition by making capability bit changes atomic (use
clear/set/test_bit)
- fix synchronization issue in mcast tvlv handler
- fix crash on double list removal of TT Request objects
- fix leak by puring packets enqueued for sending upon iface removal
- ensure network header pointer is set in skb
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJVzlUGAAoJEOb/4TMchkvfy9EQAJ0DBKdTqiuWwNQzZ9ghCeLi
Njgwyq4WdvZXp8vIj9cq0AR1JsFvH9fGp4BlLE8pGsQyxURiWQmR/Jurvb0UGMhM
SPVvl/b0nnCrRu937UcVEiq6wRrFwEr1fYfL45fY0c1eTkXSu/54iEURDvWlkpyX
COB4cuydoPY6O2LGiPRsGlOGTYbQ/u89G1HyxFr9Ch2IVRMPVXB6zuI7puuEg2ee
EwgI2IGO/GHHaScbXQpVy+bL6XL/OcQ4nCrYeaNpwbOKAOfGabNzfVncSCLRKMSW
jZh5D3cG3AKGUPeooEtd8yGgNG3SSVNTXt7CL+WOM7/2EKvYeNdJFaQzbdZQEtT0
begOcUJ2K0yas/iExqYXtEPFCWXhjuzrb4u6wu5psGgu3zt1nltpCkplDEMcSEia
C+dYE5r/JM2S1lsFclnjz+rBXu4ZfCUvV0q+50+lGiWPhfemM2M2XqJNcay0ioSm
radSwMe549sh6r3ajwdxF5z66tU94PERMMxVxJ9Z3gmHUZ56phxEe6X0xBO91bhW
aSCcFr6PQMxmfwanLj5Ydxm5MPHdDXF84YPMTFcGb4ya9ZWWURGNq7Sr85Ek7N6W
5DIA8F8/vsgaXYUn011aBu4q1mm5ZsTs9nR3MAxchAuuu3sTBOkd/BecS4T7CEOE
IHnbxsJxXReJ/aCQFoks
=ZEuN
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Antonio Quartulli says:
====================
Included changes:
- avoid integer overflow in GW selection routine
- prevent race condition by making capability bit changes atomic (use
clear/set/test_bit)
- fix synchronization issue in mcast tvlv handler
- fix crash on double list removal of TT Request objects
- fix leak by puring packets enqueued for sending upon iface removal
- ensure network header pointer is set in skb
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
It is a prep work for fixing a potential deadlock when creating
a pcpu rt.
The current rt6_get_pcpu_route() will also create a pcpu rt if one does not
exist. This patch moves the pcpu rt creation logic into another function,
rt6_make_pcpu_route().
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
After 4b32b5ad31 ("ipv6: Stop rt6_info from using inet_peer's metrics"),
ip6_dst_alloc() does not need the 'table' argument. This patch
cleans it up.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
a bit of content:
* mesh fixes/improvements from Alexis, Bob, Chun-Yeow and Jesse
* TDLS higher bandwidth support (Arik)
* OCB fixes from Bertold Van den Bergh
* suspend/resume fixes from Eliad
* dynamic SMPS support for minstrel-HT (Krishna Chaitanya)
* VHT bitrate mask support (Lorenzo Bianconi)
* better regulatory support for 5/10 MHz channels (Matthias May)
* basic support for MU-MIMO to avoid the multi-vif issue (Sara Sharon)
along with a number of other cleanups.
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJVzg5bAAoJEDBSmw7B7bqr3PAP/1r8wyZXxtySzz6P5Z9k0+2I
52NiSUISgmtnaQUyahf4n90eMU+gGJWQwPwIZFvMKg6bD4RW2XI4MdKmviKx8skU
4sDlDxMFrVMfV/ySwiPDAONWPtwwgKllIt0IDDnKs6kPdDlUcbKOTEFYhzZ1HhTZ
7Og4rJm7M90QpdMU7hmxmE5KRkp1hW0Yce1KPTW5U0j9yl9zbi4eLVWT+ac1WnZs
GpItajd0BFtBy7DRHzX8RiRJ4pi+aWxhuYNqiSxUm0BqPWCzT7PP15M1kCGwrXtm
/TTSVJl7WkLbOYI0PE0Y0XcJfZUg1c9aecCR3ubmRrQrGfOBFpN01jUANIRwqvZ3
3QRq1RZNLac0+zlBPjoFdOHmoaVX6UcJQKSgOhcfuM1BcNFnXZEcHFN4/SaEUfvJ
1ltybEeOEAckCMqqfHb1g/nVfJnlBjy811GzIrsHXqKqb7rRfGkfxmBxLrRzVknS
PC970pbuhxICeeryKdVgK5BClWeT3TB1srt6OZ0QR1zlcfZbLZ8jqJlHJcy3szFi
P43X9w8I6ZNTzkBU+lsCt9gbveYS+rSaJ+zm/SaF21ro33+FEdZ+p1ujjzp729Tz
PnKobaOrku38Be7CSwJ760WvngC7gbZqGybGknBsws4dqDXJste0UjxulZeyaOkN
nVmHDL45jc5rd8qjoPQV
=kV1a
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2015-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Another pull request for the next cycle, this time with quite
a bit of content:
* mesh fixes/improvements from Alexis, Bob, Chun-Yeow and Jesse
* TDLS higher bandwidth support (Arik)
* OCB fixes from Bertold Van den Bergh
* suspend/resume fixes from Eliad
* dynamic SMPS support for minstrel-HT (Krishna Chaitanya)
* VHT bitrate mask support (Lorenzo Bianconi)
* better regulatory support for 5/10 MHz channels (Matthias May)
* basic support for MU-MIMO to avoid the multi-vif issue (Sara Sharon)
along with a number of other cleanups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add fanout mode PACKET_FANOUT_EBPF that accepts an en extended BPF
program to select a socket.
Update the internal eBPF program by passing to socket option
SOL_PACKET/PACKET_FANOUT_DATA a file descriptor returned by bpf().
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add fanout mode PACKET_FANOUT_CBPF that accepts a classic BPF program
to select a socket.
This avoids having to keep adding special case fanout modes. One
example use case is application layer load balancing. The QUIC
protocol, for instance, encodes a connection ID in UDP payload.
Also add socket option SOL_PACKET/PACKET_FANOUT_DATA that updates data
associated with the socket group. Fanout mode PACKET_FANOUT_CBPF is the
only user so far.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We already have IFLA_IPTUN_ netlink attributes. The IP_TUN_ attributes look
very similar, yet they serve very different purpose. This is confusing for
anyone trying to implement a user space tool supporting lwt.
As the IP_TUN_ attributes are used only for the lightweight tunnels, prefix
them with LWTUNNEL_IP_ instead to make their purpose clear. Also, it's more
logical to have them in lwtunnel.h together with the encap enum.
Fixes: 3093fbe7ff ("route: Per route IP tunnel metadata via lightweight tunnel")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2015-08-17
1) Fix IPv6 ECN decapsulation for IPsec interfamily tunnels.
From Thomas Egerer.
2) Use kmemdup instead of duplicating it in xfrm_dump_sa().
From Andrzej Hajda.
3) Pass oif to the xfrm lookups so that it gets set on the flow
and the resolver routines can match based on oif.
From David Ahern.
4) Add documentation for the new xfrm garbage collector threshold.
From Alexander Duyck.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJVzc58AAoJEDBSmw7B7bqrhkQP/1bgnJHOtAthUygVCUDN3p6J
rbmyfmszcWqPgYWMh4sfBRJQPWBpq4Q4ZPlVNCU8VkFoJRXemqWsOjnpMro0Vi8Y
omTMPf1ph5WAUhCTjR6d+J7M5uHVzp1ougeaDEHY6QPTQ0rNTQ4W9gR+TFRSSTG2
gVIuledUXQgkBW6nm3mT3bxSwMkK6PnPy6Dpkzm4VFt3JtO0ZpSV+RzMLifmNf1r
25lYiUUyimWbJGvZCMV9O2qKWwb+9AIYrItW74yUV2dRqqBZmP+IqJQ2p2mTWcjE
YBToZA5Yrd6iSfuKC/wd2kt3NkmsGdNGwFSy4tI6WjBkKKLygOKBvcpBzcNDgv06
ZB4mcyfac+KPiDA1QFvwV+OrU1u63wWDyfjkGCVkoq6WJkUxUZ4JVOq/7LolrxK8
d0mJojOMjTlr2Q+7KMINl9HDw0MhkYqY/fJv+AHOlcIGc2bVrJF0n9fWsc9gKdPu
c+2lIJLxzuojapXJXj+XpF6iPjh92F83W6iPbXZfu6AzvYQwGwve87CckurwElYY
bCDV0nXzmn5DYwEOuKtiflbmLW0qlXbjhX27B4yHmXsS3/Q7M/pMUy3mj4pchmxn
Iwr8pVoCgn8Z1hDJvyYPb4l0ERjY/MZghBJg+Yp4Q7CEWGOmyIPdi0visA9FN/xk
ekOYrjW9Jz3AlgMDPVCC
=uehl
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-davem-2015-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
We have a single bugfix for an invalid memory read.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 8133534c76 ("net: limit tcp/udp rmem/wmem to
SOCK_{RCV,SND}BUF_MIN") modified four sysctls to enforce that the values
written to them are not less than SOCK_MIN_{RCV,SND}BUF.
That change causes 4096 to no longer be accepted as a valid value for
'min' in tcp_wmem and udp_wmem_min. 4096 has been the default for both
of those sysctls for a long time, and unfortunately seems to be an
extremely popular setting. This change breaks a large number of sysctl
configurations at Facebook.
That commit referred to b1cb59cf2e ("net: sysctl_net_core: check
SNDBUF and RCVBUF for min length"), which choose to use the SOCK_MIN
constants as the lower limits to avoid nasty bugs. But AFAICS, a limit
of SOCK_MIN_SNDBUF isn't necessary to do that: the BUG_ON cited in the
commit message seems to have happened because unix_stream_sendmsg()
expects a minimum of a full page (ie SK_MEM_QUANTUM) and the math broke,
not because it had less than SOCK_MIN_SNDBUF allocated.
This particular issue doesn't seem to affect TCP however: using a
setting of "1 1 1" for tcp_{r,w}mem works, although it's obviously
suboptimal. SK_MEM_QUANTUM would be a nice minimum, but it's 64K on
some archs, so there would still be breakage.
Since a value of one doesn't seem to cause any problems, we can drop the
minimum 8133534c added to fix this.
This reverts commit 8133534c76.
Fixes: 8133534c76 ("net: limit tcp/udp rmem/wmem to SOCK_MIN...")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sorin Dumitru <sorin@returnze.ro>
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handle IFF_NO_QUEUE as alternative to tx_queue_len being zero.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A zero length payload means that no TLV (Type Length Value) data has
been passed. Prior to this patch a non-existing TLV could be sanity
checked with TLV_OK() resulting in random behavior where a user
sending an empty message occasionally got a incorrect "operation not
supported" message back.
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When NFC_ATTR_VENDOR_DATA is not set, data_len is 0 and data is NULL.
Fixes the following warning:
net/nfc/netlink.c:1536:3: warning: 'data' may be used uninitialized
+in this function [-Wmaybe-uninitialized]
return cmd->doit(dev, data, data_len);
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
fib_lookup() forces FIB_LOOKUP_NOREF flag, while fib_table_lookup()
does not.
This patch solves the typical message at reboot time or device
dismantle :
unregister_netdevice: waiting for eth0 to become free. Usage count = 4
Fixes: 3bfd847203 ("net: Use passed in table for nexthop lookups")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFC_ATTR_VENDOR_DATA is an optional vendor_cmd argument.
The current code was potentially using a non existing argument
leading to potential catastrophic results.
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We receive all 802.15.4 frames on the packet handler "lowpan_rcv" this
patch checks if the wpan device belongs to a lowpan interface.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch fixes 802.15.4 packet layer registration when mutliple
lowpan interfaces will be added. We need to register the packet layer at
the first lowpan interface and deregister it at the last interface. This
done by open_count variable which is protected by rtnl.
Additional do a quiet fix by adding dev_put(real_dev) when netdev
registration fails, which fix the refcount for the wpan dev.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The two commits noted below added calls to ip_hdr() and ipv6_hdr(). They
need a correctly set skb network header.
Unfortunately we cannot rely on the device drivers to set it for us.
Therefore setting it in the beginning of the according ndo_start_xmit
handler.
Fixes: 1d8ab8d3c1 ("batman-adv: Modified forwarding behaviour for multicast packets")
Fixes: ab49886e3d ("batman-adv: Add IPv4 link-local/IPv6-ll-all-nodes multicast support")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
When an interface is purged, the broadcast packets scheduled for this
interface should get purged as well.
Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
The list_del() calls were changed to list_del_init() to prevent
an accidental double deletion in batadv_tt_req_node_new().
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
So far the mcast tvlv handler did not anticipate the processing of
multiple incoming OGMs from the same originator at the same time. This
can lead to various issues:
* Broken refcounting: For instance two mcast handlers might both assume
that an originator just got multicast capabilities and will together
wrongly decrease mcast.num_disabled by two, potentially leading to
an integer underflow.
* Potential kernel panic on hlist_del_rcu(): Two mcast handlers might
one after another try to do an
hlist_del_rcu(&orig->mcast_want_all_*_node). The second one will
cause memory corruption / crashes.
(Reported by: Sven Eckelmann <sven@narfation.org>)
Right in the beginning the code path makes assumptions about the current
multicast related state of an originator and bases all updates on that. The
easiest and least error prune way to fix the issues in this case is to
serialize multiple mcast handler invocations with a spinlock.
Fixes: 60432d756c ("batman-adv: Announce new capability via multicast TVLV")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.
Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.
Fixes: 60432d756c ("batman-adv: Announce new capability via multicast TVLV")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.
Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.
Fixes: e17931d1a6 ("batman-adv: introduce capability initialization bitfield")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.
Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.
Fixes: 3f4841ffb3 ("batman-adv: tvlv - add network coding container")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.
Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.
Fixes: 17cf0ea455 ("batman-adv: tvlv - add distributed arp table container")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Instead of using the out-of-line average calculation, use the new
DECLARE_EWMA() macro to declare a signal EWMA, and use that.
This actually *reduces* the code size slightly (on x86-64) while
also reducing the station info size by 80 bytes.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Define rc_rateidx_vht_mcs_mask array and rate_idx_match_vht_mcs_mask()
method in order to apply mcs mask for vht rates
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Define rate_control_apply_mask_ratetbl() in order to apply ratemask in
rate_control_set_rates() for station rate table
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Remove ieee80211_tx_rate dependency in rate_idx_match_legacy_mask(),
rate_idx_match_mcs_mask() and rate_idx_match_mask() in order to use the
previous logic to define a ratemask in rate_control_set_rates() for
station rate table. Moreover move rate mask definition logic in
rate_control_cap_mask()
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Remove unnecessary ieee80211_tx_info pointer from rate_control_apply_mask
signature. rate_control_apply_mask() will be used to define a ratemask in
rate_control_set_rates() for station rate table
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Perform the BSS_CHANGED_BSSID action when joining an OCB network.
This is required to set the broadcast BSSID in some network drivers.
Signed-off-by: Bertold Van den Bergh <bertold.vandenbergh@esat.kuleuven.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Currently OCB mode accepts frames with bssid==broadcast and type!=beacon.
Some non-data frames are sent matching this, for example probe responses.
This results in unnecessary creation of STA entries.
Signed-off-by: Bertold Van den Bergh <bertold.vandenbergh@esat.kuleuven.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
To make mac80211 accept the multicast rate requested by the user the
rate control should be told that it is operating in BSS mode.
Without this, the default rate is selected in rate_control_send_low
(!pubsta and !txrc->bss)
Signed-off-by: Bertold Van den Bergh <bertold.vandenbergh@esat.kuleuven.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Allow setting multicast rate on OCB interfaces.
Current behaviour results in EOPNOTSUPP when attempting this.
Signed-off-by: Bertold Van den Bergh <bertold.vandenbergh@esat.kuleuven.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If driver failed to setup wiphy params (e.g. rts
threshold, fragmentation treshold) userspace
wasn't properly notified about this. This could
lead to user confusion who would think the command
succeeded even if that wasn't the case.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The original assumption of 20MHz wide channels hasn't been true since
the addition of support for 5 and 10 MHz channels.
Change the code to no longer disable all channels that don't fit into
the 20MHz grid, but instead set the appropriate flags to disable
operation on specific bandwidths.
Signed-off-by: Matthias May <matthias.may@neratec.com>
[reword commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When replacing del_timer() with del_timer_sync(), I introduced
a deadlock condition :
reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()
inet_csk_reqsk_queue_drop() can be called from many contexts,
one being the timer handler itself (reqsk_timer_handler()).
In this case, del_timer_sync() loops forever.
Simple fix is to test if timer is pending.
Fixes: 2235f2ac75 ("inet: fix races with reqsk timers")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fragmentation cache uses information from the IP header to reassemble
packets. That information can be duplicated across VRFs -- same source
and destination addresses, protocol and id. Handle fragmentation with
VRFs by adding the VRF device index to entries in the cache and the
lookup arg.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If output device is not specified use VRF device if input device is
enslaved. This is needed to ensure tcp acks and resets go out VRF device.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a user passes in a table for new routes use that table for nexthop
lookups. Specifically, this solves the case where a connected route does
not exist in the main table, but only another table and then a subsequent
route is added with a next hop using the connected route. ie.,
$ ip route ls
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
169.254.0.0/16 dev eth0 scope link metric 1003
192.168.56.0/24 dev eth1 proto kernel scope link src 192.168.56.51
$ ip route ls table 10
1.1.1.0/24 dev eth2 scope link
Without this patch adding a nexthop route fails:
$ ip route add table 10 2.2.2.0/24 via 1.1.1.10
RTNETLINK answers: Network is unreachable
With this patch the route is added successfully.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a device associated with a VRF is brought up or down routes
should be added to/removed from the table associated with the VRF.
fib_magic defaults to using the main or local tables. Have it use
the table with the device if there is one.
A part of this is directing prefsrc validations to the correct
table as well.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently inet_addr_type and inet_dev_addr_type expect local addresses
to be in the local table. With the VRF device local routes for devices
associated with a VRF will be in the table associated with the VRF.
Provide an alternate inet_addr lookup to use a specific table rather
than defaulting to the local table.
inet_addr_type_dev_table keeps the same semantics as inet_addr_type but
if the passed in device is enslaved to a VRF then the table for that VRF
is used for the lookup.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently inet_addr_type and inet_dev_addr_type expect local addresses
to be in the local table. With the VRF device local routes for devices
associated with a VRF will be in the table associated with the VRF.
Provide an alternate inet_addr lookup to use a specific table rather
than defaulting to the local table.
Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For unconnected UDP sockets using a VRF device lookup source address
based on VRF table. This allows the UDP header to be properly setup
before showing up at the VRF device via the dst.
Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As with ingress use the index of VRF master device for route lookups on
egress. However, the oif should only be used to direct the lookups to a
specific table. Routes in the table are not based on the VRF device but
rather interfaces that are part of the VRF so do not consider the oif for
lookups within the table. The FLOWI_FLAG_VRFSRC is used to control this
latter part.
Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On ingress use index of VRF master device for route lookups if real device
is enslaved. Rules are expected to be installed for the VRF device to
direct lookups to a specific table.
Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is useful information to include in ipv6 netlink messages that
report interface information. IFLA_OPERSTATE is already included in
ipv4 messages, but missing for ipv6. This closes that gap.
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 10e4ea751 ("net: Fix race condition in store_rps_map") has moved the
manipulation of the rps_needed jump label under a spinlock. Since changing
the state of a jump label may sleep this is incorrect and causes warnings
during runtime.
Make rps_map_lock a mutex to allow sleeping under it.
Fixes: 10e4ea751 ("net: Fix race condition in store_rps_map")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new functions in DSA drivers to access hardware VLAN entries through
SWITCHDEV_OBJ_PORT_VLAN objects:
- port_pvid_get() and vlan_getnext() to dump a VLAN
- port_vlan_del() to exclude a port from a VLAN
- port_pvid_set() and port_vlan_add() to join a port to a VLAN
The DSA infrastructure will ensure that each VLAN of the given range
does not already belong to another bridge. If it does, it will fallback
to software VLAN and won't program the hardware.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like the ipv4 patch with a similar title, this adds a sysctl to allow
the user to change routing behavior based on whether or not the
interface associated with the nexthop was an up or down link. The
default setting preserves the current behavior, but anyone that enables
it will notice that nexthops on down interfaces will no longer be
selected:
net.ipv6.conf.all.ignore_routes_with_linkdown = 0
net.ipv6.conf.default.ignore_routes_with_linkdown = 0
net.ipv6.conf.lo.ignore_routes_with_linkdown = 0
...
When the above sysctls are set, not only will link status be reported to
userspace, but an indication that a nexthop is dead and will not be used
is also reported.
1000::/8 via 7000::2 dev p7p1 metric 1024 dead linkdown pref medium
1000::/8 via 8000::2 dev p8p1 metric 1024 pref medium
7000::/8 dev p7p1 proto kernel metric 256 dead linkdown pref medium
8000::/8 dev p8p1 proto kernel metric 256 pref medium
9000::/8 via 8000::2 dev p8p1 metric 2048 pref medium
9000::/8 via 7000::2 dev p7p1 metric 1024 dead linkdown pref medium
fe80::/64 dev p7p1 proto kernel metric 256 dead linkdown pref medium
fe80::/64 dev p8p1 proto kernel metric 256 pref medium
This also adds devconf support and notification when sysctl values
change.
v2: drop use of rt6i_nhflags since it is not needed right now
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to track current link status of ipv6 nexthops to match
recent changes that added support for ipv4 nexthops. This takes a
simple approach to track linkdown status for next-hops and simply
checks the dev for the dst entry and sets proper flags that to be used
in the netlink message.
v2: drop use of rt6i_nhflags since it is not needed right now
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When generating /proc/net/route we emit a header followed by a line for
each route. When a short read is performed we will restart this process
based on the open file descriptor. When calculating the start point we
fail to take into account that the 0th entry is the header. This leads
us to skip the first entry when doing a continuation read.
This can be easily seen with the comparison below:
while read l; do echo "$l"; done </proc/net/route >A
cat /proc/net/route >B
diff -bu A B | grep '^[+-]'
On my example machine I have approximatly 10KB of route output. There we
see the very first non-title element is lost in the while read case,
and an entry around the 8K mark in the cat case:
+wlan0 00000000 02021EAC 0003 0 0 400 00000000 0 0 0
-tun1 00C0AC0A 00000000 0001 0 0 950 00C0FFFF 0 0 0
Fix up the off-by-one when reaquiring position on continuation.
Fixes: 8be33e955c ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")
BugLink: http://bugs.launchpad.net/bugs/1483440
Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent refactoring of the IGMP and MLD parsing code into
ipv6_mc_check_mld() / ip_mc_check_igmp() introduced a potential crash /
BUG() invocation for bridges:
I wrongly assumed that skb_get() could be used as a simple reference
counter for an skb which is not the case. skb_get() bears additional
semantics, a user count. This leads to a BUG() invocation in
pskb_expand_head() / kernel panic if pskb_may_pull() is called on an skb
with a user count greater than one - unfortunately the refactoring did
just that.
Fixing this by removing the skb_get() call and changing the API: The
caller of ipv6_mc_check_mld() / ip_mc_check_igmp() now needs to
additionally check whether the returned skb_trimmed is a clone.
Fixes: 9afd85c9e4 ("net: Export IGMP/MLD message validation code")
Reported-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When TLP fails to send new packet because of receive window
limit, it should fall back to retransmit the last packet instead.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If TLP was unable to send a probe, it extended the RTO to
now + icsk_rto. But extending the RTO makes little sense
if no TLP probe went out. With this commit, instead of
extending the RTO we re-arm it relative to the transmit time
of the write queue head.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/cavium/Kconfig
The cavium conflict was overlapping dependency
changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
At the last iteration of the loop, j may equal zero and thus
tp_list[j - 1] causes an invalid read.
Change the logic of the loop so that j - 1 is always >= 0.
Cc: stable@vger.kernel.org
Signed-off-by: Adrien Schildknecht <adrien+dev@schischi.me>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The outside if statement checks that IEEE80211_TX_INTFL_MLME_CONN_TX is
set so this condition is always true. Checking twice upsets the static
checkers.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The iwlwifi driver was the only driver that used this, but as
it turns out it never needed it, so we can remove it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The GPIO subsystem provides dummy GPIO consumer functions if GPIOLIB is
not enabled. Hence drivers that depend on GPIOLIB, but use GPIO consumer
functionality only, can still be compiled if GPIOLIB is not enabled.
Relax the dependency on GPIOLIB if COMPILE_TEST is enabled, where
appropriate.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When a system has multiple ethernet devices and during DHCP
request (for using NFS), the system waits only for HZ/2 which is
500mS before switching to another interface for DHCP.
There are some routers (Ex: Trendnet routers) which responds to
DHCP request at about 560mS. When the system has only one
ethernet interface there is no issue as the timeout is 2S and the
dev xid doesn't changes and only retries.
But when the system has multiple Ethernet like DRA74x with CPSW
in dual EMAC mode, the DHCP response is dropped as the dev xid
changes while shifting to the next device. So changing inter
device timeout to HZ (which is 1S).
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case we need to divert reads/writes using the slave MII bus, we may have
already fetched a valid PHY interface property from Device Tree, and that
mode is used by the PHY driver to make configuration decisions.
If we could not fetch the "phy-mode" property, we will assign p->phy_interface
to PHY_INTERFACE_MODE_NA, such that we can actually check for that condition as
to whether or not we should override the interface value.
Fixes: 19334920ea ("net: dsa: Set valid phy interface type")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth 2015-08-11
Here's an important regression fix for the 4.2-rc series that ensures
user space isn't given invalid LTK values. The bug essentially prevents
the encryption of subsequent LE connections, i.e. makes it impossible to
pair devices over LE.
Let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves module_init of 6lowpan module into core functionality
of 6lowpan module. To load the ipv6 module at probing of the 6lowpan
module should be core functionality. Loading next header compression
modules is iphc specific. Nevertheless we only support IPHC for the
generic 6LoWPAN branch right now so we can put it into the core
functionality. If possible new compression formats are introduced nhc
should load only when iphc is build.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch introduced the 6lowpan netdev private data struct. We name it
lowpan_priv and it's placed at the beginning of netdev private data. All
lowpan interfaces should allocate this room at first of netdev private
data. 6LoWPAN LL private data can be allocate by additional netdev private
data, e.g. dev->priv_size should be "sizeof(struct lowpan_priv) +
sizeof(LL_LOWPAN_PRIVATE_DATA)".
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The usually way to get the btle lowpan private data is to use the
introduced lowpan_dev inline function. This patch will cleanup by using
lowpan_dev consequently.
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Power management support for BCM2E39 is now performed in Bluetooth
BCM UART driver.
Signed-off-by: Frederic Danis <frederic.danis@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Implement the switchdev_port_obj_{add,del,dump} functions in DSA to
support the SWITCHDEV_OBJ_PORT_FDB objects.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds an ndm_state member to the switchdev_obj_fdb structure,
in order to support static FDB addresses.
Set Rocker ndm_state to NUD_REACHABLE.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the prototype of port_getnext to include a vid parameter.
This is necessary to introduce the support for VLAN.
Also rename the fdb_{add,del,getnext} function pointers to
port_fdb_{add,del,getnext} since they are specific to a given port.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The gateway selection based on fast connections is using a single value
calculated from the average tq (0-255) and the download bandwidth (in
100Kibit). The formula for the first step (tq ** 2 * 10000 * bandwidth)
tends to overflow a u32 with low bandwidth settings like 50 [100KiBit]
and a tq value of over 92.
Changing this to a 64 bit unsigned integer allows to support a
bandwidth_down with up to ~2.8e10 [100KiBit] and a perfect tq of 255. This
is ~6.6 times higher than the maximum possible value of the gateway
announcement TVLV.
This problem only affects the non-default gw_sel_class 1.
Signed-off-by: Ruben Wisniewsi <ruben@vfn-nrw.de>
[sven@narfation.org: rewritten commit message]
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
The gw_factor is divided by BATADV_TQ_LOCAL_WINDOW_SIZE ** 2 * 64. But the
rest of the calculation has nothing to do with the tq window size and
therefore the calculation is just (tmp_gw_factor / (64 ** 3)).
Replace it with a simple shift to avoid a costly 64-bit divide when the
max_gw_factor is changed from u32 to u64. This type change is necessary
to avoid an overflow bug.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Rules can be installed that direct route lookups to specific tables based
on oif. Plumb the oif through the xfrm lookups so it gets set in the flow
struct and passed to the resolver routines.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Using ipv6_get_dsfield on the outer IP header implies that inner and
outer header are of the the same address family. For interfamily
tunnels, particularly 646, the code reading the DSCP field obtains the
wrong values (IHL and the upper four bits of the DSCP field).
This can cause the code to detect a congestion encoutered state in the
outer header and enable the corresponding bits in the inner header, too.
Since the DSCP field is stored in the xfrm mode common buffer
independently from the IP version of the outer header, it's safe (and
correct) to take this value from there.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This patch replaces the zone id which is pushed down into functions
with the actual zone object. It's a bigger one-time change, but
needed for later on extending zones with a direction parameter, and
thus decoupling this additional information from all call-sites.
No functional changes in this patch.
The default zone becomes a global const object, namely nf_ct_zone_dflt
and will be returned directly in various cases, one being, when there's
f.e. no zoning support.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In commit b357a364c5 ("inet: fix possible panic in
reqsk_queue_unlink()"), I missed fact that tcp_check_req()
can return the listener socket in one case, and that we must
release the request socket refcount or we leak it.
Tested:
Following packetdrill test template shows the issue
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 2920 <mss 1460,sackOK,nop,nop>
+0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
+.002 < . 1:1(0) ack 21 win 2920
+0 > R 21:21(0)
Fixes: b357a364c5 ("inet: fix possible panic in reqsk_queue_unlink()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
reqsk_queue_destroy() and reqsk_queue_unlink() should use
del_timer_sync() instead of del_timer() before calling reqsk_put(),
otherwise we could free a req still used by another cpu.
But before doing so, reqsk_queue_destroy() must release syn_wait_lock
spinlock or risk a dead lock, as reqsk_timer_handler() might
need to take this same spinlock from reqsk_queue_unlink() (called from
inet_csk_reqsk_queue_drop())
Fixes: fa76ce7328 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains five Netfilter fixes for your net tree,
they are:
1) Silence a warning on falling back to vmalloc(). Since 88eab472ec, we can
easily hit this warning message, that gets users confused. So let's get rid
of it.
2) Recently when porting the template object allocation on top of kmalloc to
fix the netns dependencies between x_tables and conntrack, the error
checks where left unchanged. Remove IS_ERR() and check for NULL instead.
Patch from Dan Carpenter.
3) Don't ignore gfp_flags in the new nf_ct_tmpl_alloc() function, from
Joe Stringer.
4) Fix a crash due to NULL pointer dereference in ip6t_SYNPROXY, patch from
Phil Sutter.
5) The sequence number of the Syn+ack that is sent from SYNPROXY to clients is
not adjusted through our NAT infrastructure, as a result the client may
ignore this TCP packet and TCP flow hangs until the client probes us. Also
from Phil Sutter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When using a cluster of switches, some topologies will have an MDIO
bus per switch, not one for the whole cluster. Allow this to be
represented in the device tree, by adding an optional mii-bus property
at the switch level. The old platform_device method of instantiation
supports this already, so only the device tree binding needs extending
with an additional optional phandle.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support for sharing GREPROTO_CISCO port was added so that
OVS gre port and kernel GRE devices can co-exist. After
flow-based tunneling patches OVS GRE protocol processing
is completely moved to ip_gre module. so there is no need
for GRE protocol hook. Following patch consolidates
GRE protocol related functions into ip_gre module.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using GRE tunnel meta data collection feature, we can implement
OVS GRE vport. This patch removes all of the OVS
specific GRE code and make OVS use a ip_gre net_device.
Minimal GRE vport is kept to handle compatibility with
current userspace application.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Following patch create new tunnel flag which enable
tunnel metadata collection on given device.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function will be used in gre and geneve vport implementations.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an explicit neighbour table overflow message (ratelimited) and
statistic to make diagnosing neighbour table overflows tractable in
the wild.
Diagnosing a neighbour table overflow can be quite difficult in the wild
because there is no explicit dmesg logged. Callers to neighbour code
seem to use net_dbg_ratelimit when the neighbour call fails which means
the "base message" is not emitted and the callback suppressed messages
from the ratelimiting can end-up juxtaposed with unrelated messages.
Further, a forced garbage collection will increment a stat on each call
whether it was successful in freeing-up a table entry or not, so that
statistic is only a hint. So, add a net_info_ratelimited message and
explicit statistic to the neighbour code.
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the ability to toggle the vlan filtering support via
netlink. Since we're already running with rtnl in .changelink() we don't
need to take any additional locks.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
48ed7b26fa ("ipv6: reject locally assigned nexthop addresses") is too
strict; it rejects following corner-case:
ip -6 route add default via fe80::1:2:3 dev eth1
[ where fe80::1:2:3 is assigned to a local interface, but not eth1 ]
Fix this by restricting search to given device if nh is linklocal.
Joint work with Hannes Frederic Sowa.
Fixes: 48ed7b26fa ("ipv6: reject locally assigned nexthop addresses")
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, when trying to connect to already paired device that just
rotated its RPA MAC address, old address would be used and connection
would fail. In order to fix that, kernel must scan and receive
advertisement with fresh RPA before connecting.
This patch enables new connection establishment procedure. Instead of just
sending HCI_OP_LE_CREATE_CONN to controller, "connect" will add device to
kernel whitelist and start scan. If advertisement is received, it'll be
compared against whitelist and then trigger connection if it matches.
That fixes mentioned reconnect issue for already paired devices. It also
make whole connection procedure more robust. We can try to connect to
multiple devices at same time now, even though controller allow only one.
Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Currently, when trying to connect to already paired device that just
rotated its RPA MAC address, old address would be used and connection
would fail. In order to fix that, kernel must scan and receive
advertisement with fresh RPA before connecting.
This patch makes sure that when new procedure is in use, and we're stuck
in scan phase because no advertisement was received and timeout happened,
or app decided to close socket, scan whitelist gets properly cleaned up.
Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Currently, when trying to connect to already paired device that just
rotated its RPA MAC address, old address would be used and connection
would fail. In order to fix that, kernel must scan and receive
advertisement with fresh RPA before connecting.
This path makes sure that after advertisement is received from device that
we try to connect to, it is properly handled in check_pending_le_conn and
trigger connect attempt.
It also modifies hci_le_connect to make sure that connect attempt will be
properly continued.
Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Currently, when trying to connect to already paired device that just
rotated its RPA MAC address, old address would be used and connection
would fail. In order to fix that, kernel must scan and receive
advertisement with fresh RPA before connecting.
This patch adds hci_connect_le_scan with dependencies, new method that
will be used to connect to remote LE devices. Instead of just sending
connect request, it adds a device to whitelist. Later patches will make
use of this whitelist to send conenct request when advertisement is
received, and properly handle timeouts.
Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds hci_lookup_le_connect method, that will be used to check
wether outgoing le connection attempt is in progress.
Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch fixes the error handling for lowpan_xmit_fragment by replace
"-PTR_ERR" to "PTR_ERR". PTR_ERR returns already a negative errno code.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch introduce a new mib entry which isn't part of 802.15.4 but
useful as default behaviour to set the ack request bit or not if we
don't know if the ack request bit should set. This is currently used for
stacks like IEEE 802.15.4 6LoWPAN.
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch changes the default minimum value of frame_retries to 0 and
changes the frame_retries default value to 3 which is also 802.15.4
default.
We don't use the frame_retries "-1" value as indicator for no-aret mode
anymore, instead we checking on the ack request bit inside the 802.15.4
frame control field. This allows a acknowledge handling per frame. This
checking is done by transceiver or inside xmit callback of driver layer.
If a transceiver doesn't support ARET handling the transmit
functionality ignores ack frames then, which isn't well but should not
effect anything of current functionality.
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes several checks if a value is really changed. This
makes only sense if we have another layer call e.g. calling the
driver_ops which is done by callbacks like "set_channel".
For MAC settings which need to be set by phy registers (if the phy
supports that handling) this is set by doing an interface up currently
and are not direct driver_ops calls, so we remove the checks from these
configuration callbacks.
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Suggested-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
If we currently change the mac address inside the wpan interface while
we have a lowpan interface on top of the wpan interface, the mac address
setting doesn't reach the lowpan interface. The effect would be that the
IPv6 lowpan interface has the old SLAAC address and isn't working
anymore because the lowpan interface use in internal mechanism sometimes
dev->addr which is the old mac address of the wpan interface.
This patch checks if a wpan interface belongs to lowpan interface, if
yes then we need to check if the lowpan interface is down and change the
mac address also at the lowpan interface. When the lowpan interface will
be set up afterwards, it will use the correct SLAAC address which based
on the updated mac address setting.
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Tested-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We currently supports multiple lowpan interfaces per wpan interface. I
never saw any use case into such functionality. We drop this feature now
because it's much easier do deal with address changes inside the under
laying wpan interface.
This patch removes the multiple lowpan interface and adds a lowpan_dev
netdev pointer into the wpan_dev, if this pointer isn't null the wpan
interface belongs to the assigned lowpan interface.
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Tested-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The lowpan_fetch_skb function is used to fetch the first byte,
which also increments the data pointer in skb structure,
making subsequent array lookup of byte 0 actually being byte 1.
To decompress the first byte of the Flow Label when the TF flag is
set to 0x01, the second half of the first byte is needed.
The patch fixes the extraction of the Flow Label field.
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Lukasz Duda <lukasz.duda@nordicsemi.no>
Signed-off-by: Glenn Ruben Bakke <glenn.ruben.bakke@nordicsemi.no>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We should be passing the pointer itself instead of the address of the
pointer.
This was a copy and paste bug when we replaced the calls to
hci_send_cmd(). Originally, the arguments were "len, cp" but we
overwrote them with "sizeof(cp), &cp" by mistake.
Fixes: b3d3914006 ('Bluetooth: Move amp assoc read/write completed callback to amp.c')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Linus reports the following deadlock on rtnl_mutex; triggered only
once so far (extract):
[12236.694209] NetworkManager D 0000000000013b80 0 1047 1 0x00000000
[12236.694218] ffff88003f902640 0000000000000000 ffffffff815d15a9 0000000000000018
[12236.694224] ffff880119538000 ffff88003f902640 ffffffff81a8ff84 00000000ffffffff
[12236.694230] ffffffff81a8ff88 ffff880119c47f00 ffffffff815d133a ffffffff81a8ff80
[12236.694235] Call Trace:
[12236.694250] [<ffffffff815d15a9>] ? schedule_preempt_disabled+0x9/0x10
[12236.694257] [<ffffffff815d133a>] ? schedule+0x2a/0x70
[12236.694263] [<ffffffff815d15a9>] ? schedule_preempt_disabled+0x9/0x10
[12236.694271] [<ffffffff815d2c3f>] ? __mutex_lock_slowpath+0x7f/0xf0
[12236.694280] [<ffffffff815d2cc6>] ? mutex_lock+0x16/0x30
[12236.694291] [<ffffffff814f1f90>] ? rtnetlink_rcv+0x10/0x30
[12236.694299] [<ffffffff8150ce3b>] ? netlink_unicast+0xfb/0x180
[12236.694309] [<ffffffff814f5ad3>] ? rtnl_getlink+0x113/0x190
[12236.694319] [<ffffffff814f202a>] ? rtnetlink_rcv_msg+0x7a/0x210
[12236.694331] [<ffffffff8124565c>] ? sock_has_perm+0x5c/0x70
[12236.694339] [<ffffffff814f1fb0>] ? rtnetlink_rcv+0x30/0x30
[12236.694346] [<ffffffff8150d62c>] ? netlink_rcv_skb+0x9c/0xc0
[12236.694354] [<ffffffff814f1f9f>] ? rtnetlink_rcv+0x1f/0x30
[12236.694360] [<ffffffff8150ce3b>] ? netlink_unicast+0xfb/0x180
[12236.694367] [<ffffffff8150d344>] ? netlink_sendmsg+0x484/0x5d0
[12236.694376] [<ffffffff810a236f>] ? __wake_up+0x2f/0x50
[12236.694387] [<ffffffff814cad23>] ? sock_sendmsg+0x33/0x40
[12236.694396] [<ffffffff814cb05e>] ? ___sys_sendmsg+0x22e/0x240
[12236.694405] [<ffffffff814cab75>] ? ___sys_recvmsg+0x135/0x1a0
[12236.694415] [<ffffffff811a9d12>] ? eventfd_write+0x82/0x210
[12236.694423] [<ffffffff811a0f9e>] ? fsnotify+0x32e/0x4c0
[12236.694429] [<ffffffff8108cb70>] ? wake_up_q+0x60/0x60
[12236.694434] [<ffffffff814cba09>] ? __sys_sendmsg+0x39/0x70
[12236.694440] [<ffffffff815d4797>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
It seems so far plausible that the recursive call into rtnetlink_rcv()
looks suspicious. One way, where this could trigger is that the senders
NETLINK_CB(skb).portid was wrongly 0 (which is rtnetlink socket), so
the rtnl_getlink() request's answer would be sent to the kernel instead
to the actual user process, thus grabbing rtnl_mutex() twice.
One theory would be that netlink_autobind() triggered via netlink_sendmsg()
internally overwrites the -EBUSY error to 0, but where it is wrongly
originating from __netlink_insert() instead. That would reset the
socket's portid to 0, which is then filled into NETLINK_CB(skb).portid
later on. As commit d470e3b483 ("[NETLINK]: Fix two socket hashing bugs.")
also puts it, -EBUSY should not be propagated from netlink_insert().
It looks like it's very unlikely to reproduce. We need to trigger the
rhashtable_insert_rehash() handler under a situation where rehashing
currently occurs (one /rare/ way would be to hit ht->elasticity limits
while not filled enough to expand the hashtable, but that would rather
require a specifically crafted bind() sequence with knowledge about
destination slots, seems unlikely). It probably makes sense to guard
__netlink_insert() in any case and remap that error. It was suggested
that EOVERFLOW might be better than an already overloaded ENOMEM.
Reference: http://thread.gmane.org/gmane.linux.network/372676
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upon receipt of SYNACK from the server, ipt_SYNPROXY first sends back an ACK to
finish the server handshake, then calls nf_ct_seqadj_init() to initiate
sequence number adjustment of forwarded packets to the client and finally sends
a window update to the client to unblock it's TX queue.
Since synproxy_send_client_ack() does not set synproxy_send_tcp()'s nfct
parameter, no sequence number adjustment happens and the client receives the
window update with incorrect sequence number. Depending on client TCP
implementation, this leads to a significant delay (until a window probe is
being sent).
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This happens when networking namespaces are enabled.
Suggested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch fix double word "the the" in
Documentation/DocBook/networking/API-eth-get-headlen.html
Documentation/DocBook/networking/netdev.html
Documentation/DocBook/networking.xml
These files are generated from comment in source,
so I have to fix comment in net/ethernet/eth.c.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 4182 s2 states that if an IPv4 Explicit NULL label is the only
label on the stack, then after popping the resulting packet must be
treated as a IPv4 packet and forwarded based on the IPv4 header. The
same is true for IPv6 Explicit NULL with an IPv6 packet following.
Therefore, when installing the IPv4/IPv6 Explicit NULL label routes,
add an attribute that specifies the expected payload type for use at
forwarding time for determining the type of the encapsulated packet
instead of inspecting the first nibble of the packet.
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the fdb_{add,del,getnext} function pointer in favor of new
port_fdb_{add,del,getnext}.
Implement the switchdev_port_obj_{add,del,dump} functions in DSA to
support the SWITCHDEV_OBJ_PORT_FDB objects.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a is_static boolean to the switchdev_obj_fdb structure,
in order to set the ndm_state to either NUD_NOARP or NUD_REACHABLE.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The address in the switchdev_obj_fdb structure is currently represented
as a pointer. Replacing it for a 6-byte array allows switchdev to carry
addresses directly read from hardware registers, not stored by the
switch chip driver (as in Rocker).
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fix a double word "the the"
in Documentation/DocBook/networking.xml and
Documentation/DocBook/networking/API-Wimax-report-rfkill-sw.html.
These files are generated from comment in source, so I had to
fix the typo in net/wimax/io-rfkill.c
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a race condition in store_rps_map that allows jump label
count in rps_needed to go below zero. This can happen when
concurrently attempting to set and a clear map.
Scenario:
1. rps_needed count is zero
2. New map is assigned by setting thread, but rps_needed count _not_ yet
incremented (rps_needed count still zero)
2. Map is cleared by second thread, old_map set to that just assigned
3. Second thread performs static_key_slow_dec, rps_needed count now goes
negative
Fix is to increment or decrement rps_needed under the spinlock.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- prevent DAT from replying on behalf of local clients and confuse L2
bridges
- fix crash on double list removal of TT objects (tt_local_entry)
- fix crash due to missing NULL checks
- initialize bw values for new GWs objects to prevent memory leak
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJVwdK+AAoJEOb/4TMchkvfO9UP/i8vIBZ0XabuYlG9r5GnYh8s
1WvO7jbe4zYhe3WrfAibcfcxMT6v0aI6hc/VRCb36zTNT3VJeh76+fKGwLC4ZU08
UdefhzCavV/gg2fX0b67aA/SsC6I9FrZAG8VOt5GXhMWqjZYSwbxMKLrNe/QRJks
hUzfcQbmFjIMSTK2Zf4/MV0AHzzPKbHixE8t3to2DmlHhZ6lCWV5l42nv/H/s8js
uxNI06rnQBY7nY06rZEualznEZpN7AFtFeIw9zXDZ0PgkIPrsjShwu/ZyzVK9wg1
4ld1JQjf1uKbrOqvdY4bAx8EUUQQ3n68513IXuKLgAdPcBGYVFo8atbk9K24QPue
KzO5tqg7ELf6EO1/haH/VLOTCrMxiC1C62CuNxQn1rDIUAIUrtX9Rqt2f2bS75KG
tGNJTUfQfpNCY4IlqaUE8cdzABiY6cLIf68gSmXl/FfBzVNWDc1Totsi/B6C2WEM
PYjXnmPc97tz0/iWRpvOMd1DxoMXnM0odH4gifUt3Osl0hKMbelhnGKACCvHM8jK
3EOOz1yZIEoX9OkwtFnCji+web884ASUJjnyrNJgbyOY3juj2B0hSrWgK++trUMS
+5oZMkFtHaar0l+JRc2Xhf/GqtkY50lGlwcPlOukDyKJ+RvPTRsShIYhns75Z2us
dWqAlJEdf3kRZuDri8O3
=JfaB
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Antonio Quartulli says:
====================
Included changes:
- prevent DAT from replying on behalf of local clients and confuse L2
bridges
- fix crash on double list removal of TT objects (tt_local_entry)
- fix crash due to missing NULL checks
- initialize bw values for new GWs objects to prevent memory leak
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When sampling rate is 1, the sampling probability is UINT32_MAX. The packet
should be sampled even the prandom32() generate the number of UINT32_MAX.
And none packet need be sampled when the probability is 0.
Signed-off-by: Wenyu Zhang <wenyuz@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IFLA_VXLAN_FLOWBASED is useless without IFLA_VXLAN_COLLECT_METADATA,
so combine them into single IFLA_VXLAN_COLLECT_METADATA flag.
'flowbased' doesn't convey real meaning of the vxlan tunnel mode.
This mode can be used by routing, tc+bpf and ovs.
Only ovs is strictly flow based, so 'collect metadata' is a better
name for this tunnel mode.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register pernet subsys init/stop functions that will set up
and tear down per-net RDS-TCP listen endpoints. Unregister
pernet subusys functions on 'modprobe -r' to clean up these
end points.
Enable keepalive on both accept and connect socket endpoints.
The keepalive timer expiration will ensure that client socket
endpoints will be removed as appropriate from the netns when
an interface is removed from a namespace.
Register a device notifier callback that will clean up all
sockets (and thus avoid the need to wait for keepalive timeout)
when the loopback device is unregistered from the netns indicating
that the netns is getting deleted.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>