Commit Graph

38866 Commits

Author SHA1 Message Date
Nikolay Aleksandrov
6ea3c9d5b0 mpls: fix mpls_net_init memory leak
Fix a memory leak in the mpls netns init function in case of failure. If
register_net_sysctl fails then we need to free the ctl_table.

Fixes: 7720c01f3f ("mpls: Add a sysctl to control the size of the mpls label table")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 12:45:09 -07:00
Daniel Borkmann
c3a8d94746 tcp: use dctcp if enabled on the route to the initiator
Currently, the following case doesn't use DCTCP, even if it should:
A responder has f.e. Cubic as system wide default, but for a specific
route to the initiating host, DCTCP is being set in RTAX_CC_ALGO. The
initiating host then uses DCTCP as congestion control, but since the
initiator sets ECT(0), tcp_ecn_create_request() doesn't set ecn_ok,
and we have to fall back to Reno after 3WHS completes.

We were thinking on how to solve this in a minimal, non-intrusive
way without bloating tcp_ecn_create_request() needlessly: lets cache
the CA ecn option flag in RTAX_FEATURES. In other words, when ECT(0)
is set on the SYN packet, set ecn_ok=1 iff route RTAX_FEATURES
contains the unexposed (internal-only) DST_FEATURE_ECN_CA. This allows
to only do a single metric feature lookup inside tcp_ecn_create_request().

Joint work with Florian Westphal.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 12:34:00 -07:00
Daniel Borkmann
b8d3e4163a fib, fib6: reject invalid feature bits
Feature bits that are invalid should not be accepted by the kernel,
only the lower 4 bits may be configured, but not the remaining ones.
Even from these 4, 2 of them are unused.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 12:34:00 -07:00
Daniel Borkmann
1bb14807bc net: fib6: reduce identation in ip6_convert_metrics
Reduce the identation a bit, there's no need to artificically have
it increased.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 12:34:00 -07:00
Florian Westphal
6cf9dfd3bd net: fib: move metrics parsing to a helper
fib_create_info() is already quite large, so before adding more
code to the metrics section move that to a helper, similar to
ip6_convert_metrics.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 12:34:00 -07:00
Pravin B Shelar
4c22279848 ip-tunnel: Use API to access tunnel metadata options.
Currently tun-info options pointer is used in few cases to
pass options around. But tunnel options can be accessed using
ip_tunnel_info_opts() API without using the pointer. Following
patch removes the redundant pointer and consistently make use
of API.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 12:28:56 -07:00
Madalin Bucur
d1bfc62591 ipv4: fix 32b build
Address remaining issue after 80ec192.

Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 11:32:41 -07:00
David S. Miller
80ec1927b1 ipv4: Fix 32-bit build.
net/ipv4/af_inet.c: In function 'snmp_get_cpu_field64':
>> net/ipv4/af_inet.c:1486:26: error: 'offt' undeclared (first use in this function)
      v = *(((u64 *)bhptr) + offt);
                             ^
   net/ipv4/af_inet.c:1486:26: note: each undeclared identifier is reported only once for each function it appears in
   net/ipv4/af_inet.c: In function 'snmp_fold_field64':
>> net/ipv4/af_inet.c:1499:39: error: 'offct' undeclared (first use in this function)
      res += snmp_get_cpu_field(mib, cpu, offct, syncp_offset);
                                          ^
>> net/ipv4/af_inet.c:1499:10: error: too many arguments to function 'snmp_get_cpu_field'
      res += snmp_get_cpu_field(mib, cpu, offct, syncp_offset);
             ^
   net/ipv4/af_inet.c:1455:5: note: declared here
    u64 snmp_get_cpu_field(void __percpu *mib, int cpu, int offt)
        ^

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-30 22:40:44 -07:00
Ken-ichirou MATSUZAWA
0ef707700f netlink: rx mmap: fix POLLIN condition
Poll() returns immediately after setting the kernel current frame
(ring->head) to SKIP from user space even though there is no new
frame. And in a case of all frames is VALID, user space program
unintensionally sets (only) kernel current frame to UNUSED, then
calls poll(), it will not return immediately even though there are
VALID frames.

To avoid situations like above, I think we need to scan all frames
to find VALID frames at poll() like netlink_alloc_skb(),
netlink_forward_ring() finding an UNUSED frame at skb allocation.

Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-30 21:55:51 -07:00
Raghavendra K T
a3a773726c net: Optimize snmp stat aggregation by walking all the percpu data at once
Docker container creation linearly increased from around 1.6 sec to 7.5 sec
(at 1000 containers) and perf data showed 50% ovehead in snmp_fold_field.

reason: currently __snmp6_fill_stats64 calls snmp_fold_field that walks
through per cpu data of an item (iteratively for around 36 items).

idea: This patch tries to aggregate the statistics by going through
all the items of each cpu sequentially which is reducing cache
misses.

Docker creation got faster by more than 2x after the patch.

Result:
                       Before           After
Docker creation time   6.836s           3.25s
cache miss             2.7%             1.41%

perf before:
    50.73%  docker           [kernel.kallsyms]       [k] snmp_fold_field
     9.07%  swapper          [kernel.kallsyms]       [k] snooze_loop
     3.49%  docker           [kernel.kallsyms]       [k] veth_stats_one
     2.85%  swapper          [kernel.kallsyms]       [k] _raw_spin_lock

perf after:
    10.57%  docker           docker                [.] scanblock
     8.37%  swapper          [kernel.kallsyms]     [k] snooze_loop
     6.91%  docker           [kernel.kallsyms]     [k] snmp_get_cpu_field
     6.67%  docker           [kernel.kallsyms]     [k] veth_stats_one

changes/ideas suggested:
Using buffer in stack (Eric), Usage of memset (David), Using memcpy in
place of unaligned_put (Joe).

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-30 21:48:58 -07:00
Raghavendra K T
c4c6bc3146 net: Introduce helper functions to get the per cpu data
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-30 21:48:58 -07:00
David S. Miller
06fb4e701b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-08-30 21:45:01 -07:00
Pravin B Shelar
a581b96dbf openvswitch: Remove vport-net
This structure is not used anymore.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29 19:07:15 -07:00
Pravin B Shelar
8c876639c9 openvswitch: Remove vport stats.
Since all vport types are now backed by netdev, we can directly
use netdev stats. Following patch removes redundant stat
from vport.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29 19:07:15 -07:00
Pravin B Shelar
3eedb41fb4 openvswitch: Remove egress_tun_info.
tun info is passed using skb-dst pointer. Now we have
converted all vports to netdev based implementation so
Now we can remove redundant pointer to tun-info from OVS_CB.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29 19:07:15 -07:00
Pravin B Shelar
24d43f32d8 openvswitch: Remove vport get_name()
Remove unused get_name() function pointer from vport ops.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29 19:07:15 -07:00
Simon Horman
c30da49789 openvswitch: retain parsed IPv6 header fields in flow on error skipping extension headers
When an error occurs skipping IPv6 extension headers retain the already
parsed IP protocol and IPv6 addresses in the flow. Also assume that the
packet is not a fragment in the absence of information to the contrary;
that is always use the frag_off value set by ipv6_skip_exthdr().

This allows matching on the IP protocol and IPv6 addresses of packets
with malformed extension headers.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29 13:39:59 -07:00
David S. Miller
f5004a14fa Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-08-28

One more bunch of Bluetooth patches for 4.3:

 - Crash fix for hci_bcm driver
 - Enhancements to hci_intel driver (e.g. baudrate configuration)
 - Fix for SCO link type after multiple connect attempts
 - Cleanups & minor fixes in a few other places

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29 13:15:03 -07:00
Jiri Benc
a43a9ef6a2 vxlan: do not receive IPv4 packets on IPv6 socket
By default (subject to the sysctl settings), IPv6 sockets listen also for
IPv4 traffic. Vxlan is not prepared for that and expects IPv6 header in
packets received through an IPv6 socket.

In addition, it's currently not possible to have both IPv4 and IPv6 vxlan
tunnel on the same port (unless bindv6only sysctl is enabled), as it's not
possible to create and bind both IPv4 and IPv6 vxlan interfaces and there's
no way to specify both IPv4 and IPv6 remote/group IP addresses.

Set IPV6_V6ONLY on vxlan sockets to fix both of these issues. This is not
done globally in udp_tunnel, as l2tp and tipc seems to work okay when
receiving IPv4 packets on IPv6 socket and people may rely on this behavior.
The other tunnels (geneve and fou) do not support IPv6.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29 13:07:54 -07:00
Jiri Benc
b9b6695cf0 fou: reject IPv6 config
fou does not really support IPv6 encapsulation. After an UDP socket is
created in fou_create, the encap_rcv callback is set either to fou_udp_recv
or to gue_udp_recv. Both of those unconditionally assume that the received
packet has an IPv4 header and access the data at network_header as it was an
IPv4 header. This leads to IPv6 flow label being interpreted as IP packet
length, etc.

Disallow fou tunnel to be configured as IPv6 until real IPv6 support is
added to fou.

CC: Tom Herbert <tom@herbertland.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29 13:07:54 -07:00
Jiri Benc
7f9562a1f4 ip_tunnels: record IP version in tunnel info
There's currently nothing preventing directing packets with IPv6
encapsulation data to IPv4 tunnels (and vice versa). If this happens,
IPv6 addresses are incorrectly interpreted as IPv4 ones.

Track whether the given ip_tunnel_key contains IPv4 or IPv6 data. Store this
in ip_tunnel_info. Reject packets at appropriate places if they are supposed
to be encapsulated into an incompatible protocol.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29 13:07:54 -07:00
Jiri Benc
46fa062ad6 ip_tunnels: convert the mode field of ip_tunnel_info to flags
The mode field holds a single bit of information only (whether the
ip_tunnel_info struct is for rx or tx). Change the mode field to bit flags.
This allows more mode flags to be added.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29 13:07:54 -07:00
David Ahern
f6d3c19274 net: FIB tracepoints
A few useful tracepoints developing VRF driver.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29 13:05:16 -07:00
Vlad Yasevich
73e6742027 sctp: Do not try to search for the transport twice
When removing an non-primary transport during ASCONF
processing, we end up traversing the transport list
twice: once in sctp_cmd_del_non_primary, and once in
sctp_assoc_del_peer.  We can avoid the second
search and call sctp_assoc_rm_peer() instead.
Found by code inspection during code reviews.

Signed-off-by: Vladislav Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 22:25:43 -07:00
lucien
7c5a946181 sctp: ASCONF-ACK with Unresolvable Address should be sent
RFC 5061:
    This is an opaque integer assigned by the sender to identify each
    request parameter.  The receiver of the ASCONF Chunk will copy this
    32-bit value into the ASCONF Response Correlation ID field of the
    ASCONF-ACK response parameter.  The sender of the ASCONF can use this
    same value in the ASCONF-ACK to find which request the response is
    for.  Note that the receiver MUST NOT change this 32-bit value.

    Address Parameter: TLV

    This field contains an IPv4 or IPv6 address parameter, as described
    in Section 3.3.2.1 of [RFC4960].

ASCONF chunk with Error Cause Indication Parameter (Unresolvable Address)
should be sent if the Delete IP Address is not part of the association.

  Endpoint A                           Endpoint B
  (ESTABLISHED)                        (ESTABLISHED)

  ASCONF        ----------------->
  (Delete IP Address)
                <-----------------      ASCONF-ACK
                                        (Unresolvable Address)

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 22:25:42 -07:00
Ken-ichirou MATSUZAWA
7084a31589 netlink: mmap: fix lookup frame position
__netlink_lookup_frame() was always called with the same "pos"
value in netlink_forward_ring(). It will look at the same ring entry
header over and over again, every time through this loop. Then cycle
through the whole ring, advancing ring->head, not "pos" until it
equals the "ring->head != head" loop test fails.

Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 22:25:42 -07:00
Christophe Ricard
0a6a3a23ea netlink: add NETLINK_CAP_ACK socket option
Since commit c05cdb1b86 ("netlink: allow large data transfers from
user-space"), the kernel may fail to allocate the necessary room for the
acknowledgment message back to userspace. This patch introduces a new
socket option that trims off the payload of the original netlink message.

The netlink message header is still included, so the user can guess from
the sequence number what is the message that has triggered the
acknowledgment.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 22:25:42 -07:00
Joe Stringer
0d5cdef8d5 openvswitch: Fix conntrack compilation without mark.
Fix build with !CONFIG_NF_CONNTRACK_MARK && CONFIG_OPENVSWITCH_CONNTRACK

Fixes: 182e304 ("openvswitch: Allow matching on conntrack mark")
Reported-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Tested-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 22:23:59 -07:00
David S. Miller
581a5f2a61 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next tree.
In sum, patches to address fallout from the previous round plus updates from
the IPVS folks via Simon Horman, they are:

1) Add a new scheduler to IPVS: The weighted overflow scheduling algorithm
   directs network connections to the server with the highest weight that is
   currently available and overflows to the next when active connections exceed
   the node's weight. From Raducu Deaconu.

2) Fix locking ordering in IPVS, always take rtnl_lock in first place. Patch
   from Julian Anastasov.

3) Allow to indicate the MTU to the IPVS in-kernel state sync daemon. From
   Julian Anastasov.

4) Enhance multicast configuration for the IPVS state sync daemon. Also from
   Julian.

5) Resolve sparse warnings in the nf_dup modules.

6) Fix a linking problem when CONFIG_NF_DUP_IPV6 is not set.

7) Add ICMP codes 5 and 6 to IPv6 REJECT target, they are more informative
   subsets of code 1. From Andreas Herz.

8) Revert the jumpstack size calculation from mark_source_chains due to chain
   depth miscalculations, from Florian Westphal.

9) Calm down more sparse warning around the Netfilter tree, again from Florian
   Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 16:29:59 -07:00
Nikolay Aleksandrov
c9fd56b34e netpoll: warn on netpoll_send_udp users who haven't disabled irqs
Make sure we catch future netpoll_send_udp users who use it without
disabling irqs and also as a hint for poll_controller users.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 16:24:54 -07:00
Daniel Borkmann
c1b3b19923 net: sched: don't break line in tc_classify loop notification
Just some minor noise follow-up to address some stylistic issues of
commit 3b3ae88026 ("net: sched: consolidate tc_classify{,_compat}").
Accidentally v1 instead of v2 of that commit got applied, so this
patch adds the relative diff.

Suggested-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 14:01:15 -07:00
David S. Miller
8b72ca67fe Included changes:
- code beautification
 - remove obsolete 'deleted' attribute for bat-gw node
 - increase internal version number
 - prevent potential access to netdev object after deregistration
 - set needed_head/tail_room for batman virtual interface
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJV4CMzAAoJEOb/4TMchkvfsjYP/1IckH6GA4vMROXZ6+Y89ZVl
 Ii6uQck+0FUCkzld+sn8RYPDClqmNTB9PcYAVrrKDMFxOZVCdIvK+iLh1s2PQK4o
 eiab+V93VNK3DTlouvszsW0mdWCaBXiKR01PpEcr528d4LxFnGbiCKDGkcr/A0HM
 yMW9ZwmpohFk0/CRZPZlDJjTlTAUsvp+YXqA2vBUdBT1zN8Lb3BB7jK/rXXreLJ9
 nnNNn+AEw6Fy/FXTDc3QKMrCdyFlN+bHnyXds6345p5asy0JCVh0Sk2zuySi2MDh
 LT4cujIP5LtAhp8i52R5ZjOvNOQfGAyRxvBd20gZM/lKfMcIK3KEaMe7Se87NySR
 IhQ1nwE5S4AWnlmU+xfmX4vReojb9sCLhBN8oMC5bm3uqa4kLR9e4CBaPdRSpZex
 P3Z9DPaV1GAe3yyGFz9pRUTXzAuVMtYKjkObDkOVq+oKqBpG5UMf5gwEkycILvzb
 wwqHbjE7QY1izE8TgDe1y+GKPR8Qivwm7X8gwuHqmkKCNCmrqNkGjPkp1W4RGkzg
 BGJmS9FENxxILvt68rXZYqQ2JUg3Z+k/Jiw5gmQGPjwH62JnEMkKD4NMphDf6hxm
 85bT9J8mF2AJJKDvEYwDu+KFEhX0LyhcN38SGq5QYRS+9nf0JoZzWujHJ7oXHJiw
 o7mwRayOYPLzeLLWpjWy
 =KKXM
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Antonio Quartulli says:

====================
Included changes:
- code beautification
- remove obsolete 'deleted' attribute for bat-gw node
- increase internal version number
- prevent potential access to netdev object after deregistration
- set needed_head/tail_room for batman virtual interface
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 13:43:33 -07:00
Valentin Rothberg
9723e6abc7 openswitch: fix typo CONFIG_NF_CONNTRACK_LABEL
Fix typo in conntrack.c
s/CONFIG_NF_CONNTRACK_LABEL/CONFIG_NF_CONNTRACK_LABELS/

Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 13:39:53 -07:00
David Ahern
192132b9a0 net: Add support for VRFs to inetpeer cache
inetpeer caches based on address only, so duplicate IP addresses within
a namespace return the same cached entry. Enhance the ipv4 address key
to contain both the IPv4 address and VRF device index.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 13:32:36 -07:00
David Ahern
d39d14ffa2 net: Add helper function to compare inetpeer addresses
tcp_metrics and inetpeer both have functions to compare inetpeer
addresses. Consolidate into 1 version.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 13:32:36 -07:00
David Ahern
3abef286cf net: Add set,get helpers for inetpeer addresses
Use inetpeer set,get helpers in tcp_metrics rather than peeking into
the inetpeer_addr struct.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 13:32:36 -07:00
David Ahern
72afa352d6 net: Introduce ipv4_addr_hash and use it for tcp metrics
Refactors a common line into helper function.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 13:32:35 -07:00
Philip Downey
df2cf4a78e IGMP: Inhibit reports for local multicast groups
The range of addresses between 224.0.0.0 and 224.0.0.255 inclusive, is
reserved for the use of routing protocols and other low-level topology
discovery or maintenance protocols, such as gateway discovery and
group membership reporting.  Multicast routers should not forward any
multicast datagram with destination addresses in this range,
regardless of its TTL.

Currently, IGMP reports are generated for this reserved range of
addresses even though a router will ignore this information since it
has no purpose.  However, the presence of reserved group addresses in
an IGMP membership report uses up network bandwidth and can also
obscure addresses of interest when inspecting membership reports using
packet inspection or debug messages.

Although the RFCs for the various version of IGMP (e.g.RFC 3376 for
v3) do not specify that the reserved addresses be excluded from
membership reports, it should do no harm in doing so.  In particular
there should be no adverse effect in any IGMP snooping functionality
since 224.0.0.x is specifically excluded as per RFC 4541 (IGMP and MLD
Snooping Switches Considerations) section 2.1.2. Data Forwarding
Rules:

    2) Packets with a destination IP (DIP) address in the 224.0.0.X
       range which are not IGMP must be forwarded on all ports.

IGMP reports for local multicast groups can now be optionally
inhibited by means of a system control variable (by setting the value
to zero) e.g.:
    echo 0 > /proc/sys/net/ipv4/igmp_link_local_mcast_reports

To retain backwards compatibility the previous behaviour is retained
by default on system boot or reverted by setting the value back to
non-zero e.g.:
    echo 1 >  /proc/sys/net/ipv4/igmp_link_local_mcast_reports

Signed-off-by: Philip Downey <pdowney@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 13:28:47 -07:00
Florian Westphal
851345c5bb netfilter: reduce sparse warnings
bridge/netfilter/ebtables.c:290:26: warning: incorrect type in assignment (different modifiers)
-> remove __pure annotation.

ipv6/netfilter/ip6t_SYNPROXY.c:240:27: warning: cast from restricted __be16
-> switch ntohs to htons and vice versa.

netfilter/core.c:391:30: warning: symbol 'nfq_ct_nat_hook' was not declared. Should it be static?
-> delete it, got removed

net/netfilter/nf_synproxy_core.c:221:48: warning: cast to restricted __be32
-> Use __be32 instead of u32.

Tested with objdiff that these changes do not affect generated code.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-28 21:04:12 +02:00
Florian Westphal
98dbbfc3f1 Revert "netfilter: xtables: compute exact size needed for jumpstack"
This reverts commit 98d1bd802c.

mark_source_chains will not re-visit chains, so

*filter
:INPUT ACCEPT [365:25776]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [217:45832]
:t1 - [0:0]
:t2 - [0:0]
:t3 - [0:0]
:t4 - [0:0]
-A t1 -i lo -j t2
-A t2 -i lo -j t3
-A t3 -i lo -j t4
# -A INPUT -j t4
# -A INPUT -j t3
# -A INPUT -j t2
-A INPUT -j t1
COMMIT

Will compute a chain depth of 2 if the comments are removed.
Revert back to counting the number of chains for the time being.

Reported-by: Cong Wang <cwang@twopensource.com>
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-28 21:04:11 +02:00
Kuba Pawlak
618353b1f3 Bluetooth: Fix SCO link type handling on connection complete
Synchronous connections are initially created with type eSCO.
Link manager may reject proposed link parameters, which triggers
connection setup retry with a different set. Link type embedded
in responses should be disregarded until Synchronous Connect Complete
returns Success (0x00). Current code updates link type every time
which creates an issue when link type changes to SCO and back to eSCO
on further attepts.

Issue happens with BlackBerry 9100 and 9700 with Intel WilkinsPeak
on third connection setup attept

2015-05-18 01:27:57.332242 < HCI Command: Setup Synchronous Connection (0x01|0x0028) plen 17
    handle 256 voice setting 0x0060 ptype 0x0380
2015-05-18 01:27:57.333604 > HCI Event: Command Status (0x0f) plen 4
    Setup Synchronous Connection (0x01|0x0028) status 0x00 ncmd 1
2015-05-18 01:27:57.334614 > HCI Event: Synchronous Connect Complete (0x2c) plen 17
    status 0x1a handle 0 bdaddr 30:7C:30:B3:A8:86 type SCO
    Error: Unsupported Remote Feature / Unsupported LMP Feature
2015-05-18 01:27:57.334895 < HCI Command: Setup Synchronous Connection (0x01|0x0028) plen 17
    handle 256 voice setting 0x0060 ptype 0x0380
2015-05-18 01:27:57.335601 > HCI Event: Command Status (0x0f) plen 4
    Setup Synchronous Connection (0x01|0x0028) status 0x00 ncmd 1
2015-05-18 01:27:57.336610 > HCI Event: Synchronous Connect Complete (0x2c) plen 17
    status 0x1a handle 0 bdaddr 30:7C:30:B3:A8:86 type SCO
    Error: Unsupported Remote Feature / Unsupported LMP Feature
2015-05-18 01:27:57.336685 < HCI Command: Setup Synchronous Connection (0x01|0x0028) plen 17
    handle 256 voice setting 0x0060 ptype 0x03c8
2015-05-18 01:27:57.337603 > HCI Event: Command Status (0x0f) plen 4
    Setup Synchronous Connection (0x01|0x0028) status 0x00 ncmd 1
2015-05-18 01:27:57.342608 > HCI Event: Max Slots Change (0x1b) plen 3
    handle 256 slots 1
2015-05-18 01:27:57.377631 > HCI Event: Synchronous Connect Complete (0x2c) plen 17
    status 0x00 handle 257 bdaddr 30:7C:30:B3:A8:86 type eSCO
    Air mode: CVSD

Signed-off-by: Kuba Pawlak <kubax.t.pawlak@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-08-28 21:03:00 +02:00
Stefan Schmidt
4e1795de10 nl802154: stricter input checking for boolean inputs
So far we handled boolean input by forcing them with !! and assigning
them into a bool. This allowed userspace to send values > 1 which were
used as 1. We should be stricter here and return -EINVAL for all but
0 or 1.

Signed-off-by: Stefan Schmidt <stefan@osg.samsung.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-08-28 21:00:37 +02:00
Nicholas Krause
df945360ce Bluetooth: Make the function sco_conn_del have a return type of void
This makes the function sco_conn_del have a return type of void now
due to this function always running successfully and thus never
needing to signal its caller when a non recoverable internal failure
occurs by returning a error code to its respective caller.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-08-28 21:00:37 +02:00
David S. Miller
0d36938bb8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-08-27 21:45:31 -07:00
Linus Torvalds
e001d7084a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Some straggler bug fixes here:

   1) Netlink_sendmsg() doesn't check iterator type properly in mmap
      case, from Ken-ichirou MATSUZAWA.

   2) Don't sleep in atomic context in bcmgenet driver, from Florian
      Fainelli.

   3) The pfkey_broadcast() code patch can't actually ever use anything
      other than GFP_ATOMIC.  And the cases that right now pass
      GFP_KERNEL or similar will currently trigger an RCU splat.  Just
      use GFP_ATOMIC unconditionally.  From David Ahern.

   4) Fix FD bit timings handling in pcan_usb driver, from Marc
      Kleine-Budde.

   5) Cache dst leaked in ip6_gre tunnel removal, fix from Huaibin Wang.

   6) Traversal into drivers/net/ethernet/renesas should be triggered by
      CONFIG_NET_VENDOR_RENESAS, not a particular driver's config
      option.  From Kazuya Mizuguchi.

   7) Fix regression in handling of igmp_join errors in vxlan, from
      Marcelo Ricardo Leitner.

   8) Make phy_{read,write}_mmd_indirect() properly take the mdio_lock
      mutex when programming the registers.  From Russell King.

   9) Fix non-forced handling in u32_destroy(), from WANG Cong.

  10) Test the EVENT_NO_RUNTIME_PM flag before it is cleared in
      usbnet_stop(), from Eugene Shatokhin.

  11) In sfc driver, don't fetch statistics firmware isn't capable of,
      from Bert Kenward.

  12) Verify ASCONF address parameter location in SCTP, from Xin Long"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  sctp: donot reset the overall_error_count in SHUTDOWN_RECEIVE state
  sctp: asconf's process should verify address parameter is in the beginning
  sfc: only use vadaptor stats if firmware is capable
  net: phy: fixed: propagate fixed link values to struct
  usbnet: Get EVENT_NO_RUNTIME_PM bit before it is cleared
  drivers: net: xgene: fix: Oops in linkwatch_fire_event
  cls_u32: complete the check for non-forced case in u32_destroy()
  net: fec: use reinit_completion() in mdio accessor functions
  net: phy: add locking to phy_read_mmd_indirect()/phy_write_mmd_indirect()
  vxlan: re-ignore EADDRINUSE from igmp_join
  net: compile renesas directory if NET_VENDOR_RENESAS is configured
  ip6_gre: release cached dst on tunnel removal
  phylib: Make PHYs children of their MDIO bus, not the bus' parent.
  can: pcan_usb: don't provide CAN FD bittimings by non-FD adapters
  net: Fix RCU splat in af_key
  net: bcmgenet: fix uncleaned dma flags
  net: bcmgenet: Avoid sleeping in bcmgenet_timeout
  netlink: mmap: fix tx type check
2015-08-27 17:52:38 -07:00
Phil Sutter
3e692f2153 net: sched: simplify attach_one_default_qdisc()
Now that noqueue qdisc can be attached just like any other qdisc, no
special treatment is necessary anymore when attaching it as default
qdisc.

This change has the added benefit that 'tc qdisc show' prints noqueue
instead of nothing for devices defaulting to noqueue.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 17:14:30 -07:00
Phil Sutter
d66d6c3152 net: sched: register noqueue qdisc
This way users can attach noqueue just like any other qdisc using tc
without having to mess with tx_queue_len first.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 17:14:30 -07:00
Phil Sutter
db4094bca7 net: sched: ignore tx_queue_len when assigning default qdisc
Since alloc_netdev_mqs() sets IFF_NO_QUEUE for drivers not initializing
tx_queue_len, it is safe to assume that if tx_queue_len is zero,
dev->priv flags always contains IFF_NO_QUEUE.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 17:14:29 -07:00
Phil Sutter
f84bb1eac0 net: fix IFF_NO_QUEUE for drivers using alloc_netdev
Printing a warning in alloc_netdev_mqs() if tx_queue_len is zero and
IFF_NO_QUEUE not set is not appropriate since drivers may use one of the
alloc_netdev* macros instead of alloc_etherdev*, thereby not
intentionally leaving tx_queue_len uninitialized. Instead check here if
tx_queue_len is zero and set IFF_NO_QUEUE, so the value of tx_queue_len
can be ignored in net/sched_generic.c.

Fixes: 906470c ("net: warn if drivers set tx_queue_len = 0")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 17:14:29 -07:00
Jean Sacren
69dba9bbc5 sock: fix kernel doc error
The symbol '__sk_reclaim' is not present in the current tree. Apparently
'__sk_reclaim' was meant to be '__sk_mem_reclaim', so fix it with the
right symbol name for the kernel doc.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Cc: Hideo Aoki <haoki@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 17:12:41 -07:00