Commit Graph

42594 Commits

Author SHA1 Message Date
Simon Horman
49dbe7ae21 sit: support MPLS over IPv4
Extend the SIT driver to support MPLS over IPv4. This implementation
extends existing support for IPv6 over IPv4 and IPv4 over IPv4.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-09 17:45:56 -04:00
Simon Horman
8afe97e5d4 tunnels: support MPLS over IPv4 tunnels
Extend tunnel support to MPLS over IPv4.  The implementation extends the
existing differentiation between IPIP and IPv6 over IPv4 to also cover MPLS
over IPv4.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-09 17:45:56 -04:00
Nikolay Aleksandrov
a65056ecf4 net: bridge: extend MLD/IGMP query stats
As was suggested this patch adds support for the different versions of MLD
and IGMP query types. Since the user visible structure is still in net-next
we can augment it instead of adding netlink attributes.
The distinction between the different IGMP/MLD query types is done as
suggested in Section 7.1, RFC 3376 [1] and Section 8.1, RFC 3810 [2] based
on query payload size and code for IGMP. Since all IGMP packets go through
multicast_rcv() and it uses ip_mc_check_igmp/ipv6_mc_check_mld we can be
sure that at least the ip/ipv6 header can be directly used.

[1] https://tools.ietf.org/html/rfc3376#section-7
[2] https://tools.ietf.org/html/rfc3810#section-8.1

Suggested-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-09 17:40:09 -04:00
Marcelo Ricardo Leitner
f1533cce60 sctp: fix panic when sending auth chunks
When we introduced GSO support, if using auth the auth chunk was being
left queued on the packet even after the final segment was generated.
Later on sctp_transmit_packet it calls sctp_packet_reset, which zeroed
the packet len while not accounting for this left-over. This caused more
space to be used the next packet due to the chunk still being queued,
but space which wasn't allocated as its size wasn't accounted.

The fix is to only queue it back when we know that we are going to
generate another segment.

Fixes: 90017accff ("sctp: Add GSO support")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-09 00:08:21 -04:00
Vivien Didelot
d390238c4f net: dsa: initialize the routing table
The routing table of every switch in a tree is currently initialized to
all zeros. This is an issue since 0 is a valid port number.

Add a DSA_RTABLE_NONE=-1 constant to initialize the signed values of the
routing table pointing to other switches.

This fixes the device mapping of the mv88e6xxx driver where the port
pointing to the switch itself and to non-existent switches was wrongly
configured to be 0. It is now set to the expected 0xf value.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-08 23:59:49 -04:00
David S. Miller
cc3baecb21 RxRPC rewrite
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIVAwUAV3zeiPSw1s6N8H32AQI9Iw//RxwhAzb+ovNIizPqcFRt4BIS3Or6unov
 Bx9jtq+U1POWZWtlIHYZKDH8ndxMnC56NkEHmpIK3uEiqJ6BoQPpDdfN+hhREysV
 Hoa+JOkMgufPBanU/JyKY/4vsmlvLuoOdppN1OA/kx1KECux9xJWIrvFsCUQGeat
 nDsdWChHkZAm/GDPZiFvxEBVaxDe2dmnDMBFTst1RsrH2uICSqM4k5srmjc3NPAY
 bsTqZeQGTIK1V9MggwBHxHFMvvERlGDpcrpoMRjeTzMmCpCg5endJoSu3hdNjHUO
 o5Fi50dhLI5jo84DQiXL0wM4SLND0QQygl+QeU3zlJYtsQsF6WxPnIEGqlGr3+WV
 I4wjDc5lxECyQIjCsrCo5ZwJ47Kqmm/ZQ4uGd9JooAVhqhP7/2dhFH0zXywJZzKs
 zo+dWTF5Xvde+mlknm1RCTgkdx3msPH9EVkEoO4FOPOAg6lhIMQFFXLXZfGr9oX6
 V99t+8t8YhDPTbL4AQnzh/aMHtbpM6be4TYiRRjT6iZLuvWOPW/zpp/1hmyeEkbU
 KZNDunC2tH030Fx5toGi3b2i8M5SJdyex9Udg/YsNexpWmyHMS49PoGk9ZnRRPA+
 xn9+xIVsqTh+xbiyCOPJqlQMMK9ayF7isT2N8T19qoVJxurdE/tMBdtKrJ5uTJFT
 W0n8KV46a+4=
 =1ZSd
 -----END PGP SIGNATURE-----

Merge tag 'rxrpc-rewrite-20160706' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Improve conn/call lookup and fix call number generation [ver #3]

I've fixed a couple of patch descriptions and excised the patch that
duplicated the connections list for reconsideration at a later date.

For reference, the excised patch is sitting on the rxrpc-experimental
branch of my git tree, based on top of the rxrpc-rewrite branch.  Diffing
it against yesterday's tag shows no differences.

Would you prefer the patch set to be emailed afresh instead of a git-pull
request?

David
---
Here's the next part of the AF_RXRPC rewrite.  The two main purposes of
this set are to fix the call number handling and to make use of RCU when
looking up the connection or call to pass a received packet to.

Important changes in this set include:

 (1) Avoidance of placing stack data into SG lists in rxkad so that kernel
     stacks can become vmalloc'd (Herbert Xu).

 (2) Calls cease pinning the connection they used as soon as possible,
     which allows the connection to be discarded sooner and allows the call
     channel on that connection to be reused earlier.

 (3) Make each call channel on a connection have a separate and independent
     call number space rather than having a shared number space for the
     connection.  Call numbers should increment monotonically per channel
     on the client, and the server should ignore a call with a lower call
     number for that channel than the latest it has seen.  The RESPONSE
     packet sets the minimum values of each call ID counter on a
     connection.

 (4) Look up calls by indexing the channel array on a connection rather
     than by keeping calls in an rbtree on that connection.  Also look up
     calls using the channel array rather than using a hashtable.

     The call hashtable can then be removed.

 (5) Call terminal statuses are cached in the channel array for the last
     call.  It is assumed that if we the server have seen call N, then the
     client no longer cares about call N-1 on the same channel.

     This will allow retransmission of the terminal status in future
     without the need to keep the rxrpc_call struct around.

 (6) Peer lookups are moved out of common connection handling code and into
     service connection handling code as client connections (a) must point
     to a peer before they can be used and (b) are looked up by a
     machine-unique connection ID directly, so we only need to look up the
     peer first if we're going to deal with a service call.

 (7) The reference count on a connection is held elevated by 1 whilst it is
     alive (ie. idle unused connections have a refcount of 1).  The reaper
     will attempt to change the refcount from 1->0 and skip if this cannot
     be done, whilst look ups only increment the refcount if it's non-zero.

     This makes the implementation of RCU lookups easier as we don't have
     to get a ref on the connection or a lock on the connection list to
     prevent a connection being reaped whilst we're contemplating queueing
     a packet that initiates a new service call upon it.

     If we need to get a connection, but there's a dead connection in the
     tree, we use rb_replace_node() to replace the dead one with a new one.

 (8) Use a seqlock to validate the walk over the service connection rbtree
     attached to a peer when it's being walked in RCU mode.

 (9) Make the incoming call/connection packet handling code use RCU mode
     and locks and make it only take a reference if the call/connection
     gets queued on a workqueue.

The intention is that the next set will introduce the connection lifetime
management and capacity limits to prevent clients from overloading the
server.

There are some fixes too:

 (1) Verifying that a packet coming in to a client connection came from the
     expected source.

 (2) Fix handling of connection failure in client call creation where we
     don't reinitialise the list linkage block and a second attempt to
     unlink the failed connection oopses and also we don't set the state
     correctly, which causes an assertion failure.

 (3) New service calls were being added to the socket's accept queue under
     the wrong lock.

Changes:

 (V2) In rxrpc_find_service_conn_rcu() initialised the sequence number to 0.

      Fixed the RCU handling in conn_service.c by introducing and using
      rb_replace_node_rcu() as an RCU-safe alternative in
      rxrpc_publish_service_conn().

      Modified and used rcu_dereference_raw() to avoid RCU sparse warnings
      in rxrpc_find_service_conn_rcu().

      Added in some missing RCU dereference wrappers.  It seems to be
      necessary to turn on CONFIG_PROVE_RCU_REPEATEDLY as well as
      CONFIG_SPARSE_RCU_POINTER to get the static __rcu annotation checking
      to happen.

      Fixed some other sparse warnings, including a missing ntohs() in
      jumbo packet processing.

 (V3) Fixed some commit descriptions.

      Excised the patch that duplicated the connection list to separate out
      the procfs list for reconsideration at a later date.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-08 23:52:12 -04:00
Florian Westphal
bba7eb5d9b hfsc: reduce hfsc_sched to 14 cachelines
hfsc_sched is huge (size: 920, cachelines: 15), but we can get it to 14
cachelines by placing level after filter_cnt (covering 4 byte hole) and
reducing period/nactive/flags to u32 (period is just a counter,
incremented when class becomes active -- 2**32 is plenty for this
purpose, also, long is only 32bit wide on 32bit platforms anyway).

cl_vtperiod is exported to userspace via tc_hfsc_stats, but its period
member is already u32, so no precision is lost there either.

Cc: Michal Soltys <soltys@ziu.info>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-08 23:08:39 -04:00
David S. Miller
a90a6e55f3 One more set of new features:
* beacon report (for radio measurement) support in cfg80211/mac80211
  * hwsim: allow wmediumd in namespaces
  * mac80211: extend 160MHz workaround to CSA IEs
  * mesh: properly encrypt group-addressed privacy action frames
  * mesh: allow setting peer AID
  * first steps for MU-MIMO monitor mode
  * along with various other cleanups and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJXfSCuAAoJEGt7eEactAAdaugP/ilrcELQRsIN5ZXCAKYZuwXV
 T01JPwgOaWL9ILu7h1SfG/+j9kzMnyk4WmRpeoj2FGNcyfG2AvULWSLpQJ2abwgQ
 8o/emuLinQwRENevaMUTRSOE0HkXoFPCbbq37+a2i6bAv1QSYY3A0xvWpcU5fZ4D
 7CKYDYPBAdMXYwEwy1g4nYWfDAYqS4rthr3l3rS1Cy7Q2T1ZlMlD90GjD7oeQAEw
 orKulhkkDSzvxfvZCYTzXmUoBQE8sNXGDD+OFsJyowyt+ugM/xan+2tmhCaHSnda
 HpdCS2aRj779UBn9cOfELjffTNpS++PM6KFd8ZDaPcJSMginn/BAHTOeNfNUfL0Q
 +Enu59I82qMDzbG2z1Qezzjv7OTzyEvyvYzNbLOqljTBSBklDa3rHwhyk+g1oVBH
 +4xX1Vk5QBLde+Q0NS0gTkcqOQK8KT5+HEqiUfgLSNDETN0lSGsKbtvMfU/ikE1Y
 aLRkTp7nzUd03qjIFLS6RMf7JjucWWzH1ZXTHvbpDFAG7riOhYRD3Sw+0e7madTd
 +HXjH9dnOGnGPDL+FyDwtW6iclYwNjcIPQiNdOjwWfMA2Wmr7iq+aFUptRCwQTHB
 WtgJ3f8OHax2JXcm4grYfxELZip5vbWJJHUC84Drvmzw3X7FRITf+OEWjdNOzsRD
 Fc7w5ceThh9Id1BvcH2+
 =R1qP
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2016-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
One more set of new features:
 * beacon report (for radio measurement) support in cfg80211/mac80211
 * hwsim: allow wmediumd in namespaces
 * mac80211: extend 160MHz workaround to CSA IEs
 * mesh: properly encrypt group-addressed privacy action frames
 * mesh: allow setting peer AID
 * first steps for MU-MIMO monitor mode
 * along with various other cleanups and improvements
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-06 22:32:15 -07:00
David S. Miller
30d0844bdc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/mellanox/mlx5/core/en.h
	drivers/net/ethernet/mellanox/mlx5/core/en_main.c
	drivers/net/usb/r8152.c

All three conflicts were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-06 10:35:22 -07:00
David S. Miller
ae3e4562e2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next,
they are:

1) Don't use userspace datatypes in bridge netfilter code, from
   Tobin Harding.

2) Iterate only once over the expectation table when removing the
   helper module, instead of once per-netns, from Florian Westphal.

3) Extra sanitization in xt_hook_ops_alloc() to return error in case
   we ever pass zero hooks, xt_hook_ops_alloc():

4) Handle NFPROTO_INET from the logging core infrastructure, from
   Liping Zhang.

5) Autoload loggers when TRACE target is used from rules, this doesn't
   change the behaviour in case the user already selected nfnetlink_log
   as preferred way to print tracing logs, also from Liping Zhang.

6) Conntrack slabs with SLAB_HWCACHE_ALIGN to allow rearranging fields
   by cache lines, increases the size of entries in 11% per entry.
   From Florian Westphal.

7) Skip zone comparison if CONFIG_NF_CONNTRACK_ZONES=n, from Florian.

8) Remove useless defensive check in nf_logger_find_get() from Shivani
   Bhardwaj.

9) Remove zone extension as place it in the conntrack object, this is
   always include in the hashing and we expect more intensive use of
   zones since containers are in place. Also from Florian Westphal.

10) Owner match now works from any namespace, from Eric Bierdeman.

11) Make sure we only reply with TCP reset to TCP traffic from
    nf_reject_ipv4, patch from Liping Zhang.

12) Introduce --nflog-size to indicate amount of network packet bytes
    that are copied to userspace via log message, from Vishwanath Pai.
    This obsoletes --nflog-range that has never worked, it was designed
    to achieve this but it has never worked.

13) Introduce generic macros for nf_tables object generation masks.

14) Use generation mask in table, chain and set objects in nf_tables.
    This allows fixes interferences with ongoing preparation phase of
    the commit protocol and object listings going on at the same time.
    This update is introduced in three patches, one per object.

15) Check if the object is active in the next generation for element
    deactivation in the rbtree implementation, given that deactivation
    happens from the commit phase path we have to observe the future
    status of the object.

16) Support for deletion of just added elements in the hash set type.

17) Allow to resize hashtable from /proc entry, not only from the
    obscure /sys entry that maps to the module parameter, from Florian
    Westphal.

18) Get rid of NFT_BASECHAIN_DISABLED, this code is not exercised
    anymore since we tear down the ruleset whenever the netdevice
    goes away.

19) Support for matching inverted set lookups, from Arturo Borrero.

20) Simplify the iptables_mangle_hook() by removing a superfluous
    extra branch.

21) Introduce ether_addr_equal_masked() and use it from the netfilter
    codebase, from Joe Perches.

22) Remove references to "Use netfilter MARK value as routing key"
    from the Netfilter Kconfig description given that this toggle
    doesn't exists already for 10 years, from Moritz Sichert.

23) Introduce generic NF_INVF() and use it from the xtables codebase,
    from Joe Perches.

24) Setting logger to NONE via /proc was not working unless explicit
    nul-termination was included in the string. This fixes seems to
    leave the former behaviour there, so we don't break backward.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-06 09:15:15 -07:00
Masashi Honma
7d27a0ba7a cfg80211: Add mesh peer AID setting API
Previously, mesh power management functionality works only with kernel
MPM. Because user space MPM did not report mesh peer AID to kernel,
the kernel could not identify the bit in TIM element. So this patch
adds mesh peer AID setting API.

Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-07-06 15:04:52 +02:00
Johannes Berg
92b3a28a2b mac80211: parse wide bandwidth channel switch IE with workaround
Continuing the workaround implemented in commit 23665aaf91
("mac80211: Interoperability workaround for 80+80 and 160 MHz channels")
use the same code to parse the Wide Bandwidth Channel Switch element
by converting to VHT Operation element since the spec also just refers
to that for parsing semantics, particularly with the workaround.

While at it, remove some dead code - the IEEE80211_STA_DISABLE_40MHZ
flag can never be set at this point since it's checked earlier and the
wide_bw_chansw_ie pointer is set to NULL if it's set.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-07-06 14:55:04 +02:00
Johannes Berg
7d10f6b179 mac80211: report failure to start (partial) scan as scan abort
Rather than reporting the scan as having completed, report it as
being aborted.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-07-06 14:54:38 +02:00
Avraham Stern
7947d3e075 mac80211: Add support for beacon report radio measurement
Add the following to support beacon report radio measurement
with the measurement mode field set to passive or active:
1. Propagate the required scan duration to the device
2. Report the scan start time (in terms of TSF)
3. Report each BSS's detection time (also in terms of TSF)

TSF times refer to the BSS that the interface that requested the
scan is connected to.

Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
[changed ath9k/10k, at76c59x-usb, iwlegacy, wl1251 and wlcore to match
the new API]
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-07-06 14:53:19 +02:00
Avraham Stern
1d76250bd3 nl80211: support beacon report scanning
Beacon report radio measurement requires reporting observed BSSs
on the channels specified in the beacon request. If the measurement
mode is set to passive or active, it requires actually performing a
scan (passive or active, accordingly), and reporting the time that
the scan was started and the time each beacon/probe was received
(both in terms of TSF of the BSS of the requesting AP). If the
request mode is table, this information is optional.
In addition, the radio measurement request specifies the channel
dwell time for the measurement.

In order to use scan for beacon report when the mode is active or
passive, add a parameter to scan request that specifies the
channel dwell time, and add scan start time and beacon received time
to scan results information.

Supporting beacon report is required for Multi Band Operation (MBO).

Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-07-06 14:51:31 +02:00
Aviya Erenfeld
c6e6a0c8be nl80211: Add API to support VHT MU-MIMO air sniffer
add API to support VHT MU-MIMO air sniffer.
in MU-MIMO there are parallel frames on the air while the HW
has only one RX.
add the capability to sniff one of the MU-MIMO parallel frames by
giving the sniffer additional information so it'll know which
of the parallel frames it shall follow.

Add attribute - NL80211_ATTR_MU_MIMO_GROUP_DATA - for getting
a MU-MIMO groupID in order to monitor packets from that group
using VHT MU-MIMO.
And add attribute -NL80211_ATTR_MU_MIMO_FOLLOW_ADDR - for passing
MAC address to monitor mode.
that option will be used by VHT MU-MIMO air sniffer to follow a
station according to it's MAC address using VHT MU-MIMO.

Signed-off-by: Aviya Erenfeld <aviya.erenfeld@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-07-06 14:46:04 +02:00
Johannes Berg
f89e07d4cf mac80211: agg-rx: refuse ADDBA Request with timeout update
The current implementation of handling ADDBA Request while a session
is already active with the peer is wrong - in case the peer is using
the existing session's dialog token this should be treated as update
to the session, which can update the timeout value.

We don't really have a good way of supporting that, so reject, but
implement the required behaviour in the spec of "Even if the updated
ADDBA Request frame is not accepted, the original Block ACK setup
remains active." (802.11-2012 10.5.4)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-07-06 14:44:14 +02:00
David Howells
d440a1ce5d rxrpc: Kill off the call hash table
The call hash table is now no longer used as calls are looked up directly
by channel slot on the connection, so kill it off.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 11:23:54 +01:00
David Howells
8496af50eb rxrpc: Use RCU to access a peer's service connection tree
Move to using RCU access to a peer's service connection tree when routing
an incoming packet.  This is done using a seqlock to trigger retrying of
the tree walk if a change happened.

Further, we no longer get a ref on the connection looked up in the
data_ready handler unless we queue the connection's work item - and then
only if the refcount > 0.


Note that I'm avoiding the use of a hash table for service connections
because each service connection is addressed by a 62-bit number
(constructed from epoch and connection ID >> 2) that would allow the client
to engage in bucket stuffing, given knowledge of the hash algorithm.
Peers, however, are hashed as the network address is less controllable by
the client.  The total number of peers will also be limited in a future
commit.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 10:51:14 +01:00
David Howells
1291e9d108 rxrpc: Move data_ready peer lookup into rxrpc_find_connection()
Move the peer lookup done in input.c by data_ready into
rxrpc_find_connection().

Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 10:51:14 +01:00
David Howells
e8d70ce177 rxrpc: Prune the contents of the rxrpc_conn_proto struct
Prune the contents of the rxrpc_conn_proto struct.  Most of the fields aren't
used anymore.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 10:51:14 +01:00
David Howells
001c112249 rxrpc: Maintain an extra ref on a conn for the cache list
Overhaul the usage count accounting for the rxrpc_connection struct to make
it easier to implement RCU access from the data_ready handler.

The problem is that currently we're using a lock to prevent the garbage
collector from trying to clean up a connection that we're contemplating
unidling.  We could just stick incoming packets on the connection we find,
but we've then got a problem that we may race when dispatching a work item
to process it as we need to give that a ref to prevent the rxrpc_connection
struct from disappearing in the meantime.

Further, incoming packets may get discarded if attached to an
rxrpc_connection struct that is going away.  Whilst this is not a total
disaster - the client will presumably resend - it would delay processing of
the call.  This would affect the AFS client filesystem's service manager
operation.

To this end:

 (1) We now maintain an extra count on the connection usage count whilst it
     is on the connection list.  This mean it is not in use when its
     refcount is 1.

 (2) When trying to reuse an old connection, we only increment the refcount
     if it is greater than 0.  If it is 0, we replace it in the tree with a
     new candidate connection.

 (3) Two connection flags are added to indicate whether or not a connection
     is in the local's client connection tree (used by sendmsg) or the
     peer's service connection tree (used by data_ready).  This makes sure
     that we don't try and remove a connection if it got replaced.

     The flags are tested under lock with the removal operation to prevent
     the reaper from killing the rxrpc_connection struct whilst someone
     else is trying to effect a replacement.

     This could probably be alleviated by using memory barriers between the
     flag set/test and the rb_tree ops.  The rb_tree op would still need to
     be under the lock, however.

 (4) When trying to reap an old connection, we try to flip the usage count
     from 1 to 0.  If it's not 1 at that point, then it must've come back
     to life temporarily and we ignore it.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 10:50:04 +01:00
David Howells
d991b4a32f rxrpc: Move peer lookup from call-accept to new-incoming-conn
Move the lookup of a peer from a call that's being accepted into the
function that creates a new incoming connection.  This will allow us to
avoid incrementing the peer's usage count in some cases in future.

Note that I haven't bother to integrate rxrpc_get_addr_from_skb() with
rxrpc_extract_addr_from_skb() as I'm going to delete the former in the very
near future.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 10:49:57 +01:00
David Howells
7877a4a4bd rxrpc: Split service connection code out into its own file
Split the service-specific connection code out into into its own file.  The
client-specific code has already been split out.  This will leave just the
common code in the original file.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 10:49:35 +01:00
David Howells
c6d2b8d764 rxrpc: Split client connection code out into its own file
Split the client-specific connection code out into its own file.  It will
behave somewhat differently from the service-specific connection code, so
it makes sense to separate them.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 10:43:52 +01:00
David Howells
a1399f8bb0 rxrpc: Call channels should have separate call number spaces
Each channel on a connection has a separate, independent number space from
which to allocate callNumber values.  It is entirely possible, for example,
to have a connection with four active calls, each with call number 1.

Note that the callNumber values for any particular channel don't have to
start at 1, but they are supposed to increment monotonically for that
channel from a client's perspective and may not be reused once the call
number is transmitted (until the epoch cycles all the way back round).

Currently, however, call numbers are allocated on a per-connection basis
and, further, are held in an rb-tree.  The rb-tree is redundant as the four
channel pointers in the rxrpc_connection struct are entirely capable of
pointing to all the calls currently in progress on a connection.

To this end, make the following changes:

 (1) Handle call number allocation independently per channel.

 (2) Get rid of the conn->calls rb-tree.  This is overkill as a connection
     may have a maximum of four calls in progress at any one time.  Use the
     pointers in the channels[] array instead, indexed by the channel
     number from the packet.

 (3) For each channel, save the result of the last call that was in
     progress on that channel in conn->channels[] so that the final ACK or
     ABORT packet can be replayed if necessary.  Any call earlier than that
     is just ignored.  If we've seen the next call number in a packet, the
     last one is most definitely defunct.

 (4) When generating a RESPONSE packet for a connection, the call number
     counter for each channel must be included in it.

 (5) When parsing a RESPONSE packet for a connection, the call number
     counters contained therein should be used to set the minimum expected
     call numbers on each channel.

To do in future commits:

 (1) Replay terminal packets based on the last call stored in
     conn->channels[].

 (2) Connections should be retired before the callNumber space on any
     channel runs out.

 (3) A server is expected to disregard or reject any new incoming call that
     has a call number less than the current call number counter.  The call
     number counter for that channel must be advanced to the new call
     number.

     Note that the server cannot just require that the next call that it
     sees on a channel be exactly the call number counter + 1 because then
     there's a scenario that could cause a problem: The client transmits a
     packet to initiate a connection, the network goes out, the server
     sends an ACK (which gets lost), the client sends an ABORT (which also
     gets lost); the network then reconnects, the client then reuses the
     call number for the next call (it doesn't know the server already saw
     the call number), but the server thinks it already has the first
     packet of this call (it doesn't know that the client doesn't know that
     it saw the call number the first time).

Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 10:43:52 +01:00
David Howells
30b515f4d1 rxrpc: Access socket accept queue under right lock
The socket's accept queue (socket->acceptq) should be accessed under
socket->call_lock, not under the connection lock.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 10:43:51 +01:00
David Howells
dee46364ce rxrpc: Add RCU destruction for connections and calls
Add RCU destruction for connections and calls as the RCU lookup from the
transport socket data_ready handler is going to come along shortly.

Whilst we're at it, move the cleanup workqueue flushing and RCU barrierage
into the destruction code for the objects that need it (locals and
connections) and add the extra RCU barrier required for connection cleanup.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 10:43:51 +01:00
David Howells
e653cfe49c rxrpc: Release a call's connection ref on call disconnection
When a call is disconnected, clear the call's pointer to the connection and
release the associated ref on that connection.  This means that the call no
longer pins the connection and the connection can be discarded even before
the call is.

As the code currently stands, the call struct is effectively pinned by
userspace until userspace has enacted a recvmsg() to retrieve the final
call state as sk_buffs on the receive queue pin the call to which they're
related because:

 (1) The rxrpc_call struct contains the userspace ID that recvmsg() has to
     include in the control message buffer to indicate which call is being
     referred to.  This ID must remain valid until the terminal packet is
     completely read and must be invalidated immediately at that point as
     userspace is entitled to immediately reuse it.

 (2) The final ACK to the reply to a client call isn't sent until the last
     data packet is entirely read (it's probably worth altering this in
     future to be send the ACK as soon as all the data has been received).


This change requires a bit of rearrangement to make sure that the call
isn't going to try and access the connection again after protocol
completion:

 (1) Delete the error link earlier when we're releasing the call.  Possibly
     network errors should be distributed via connections at the cost of
     adding in an access to the rxrpc_connection struct.

 (2) Remove the call from the connection's call tree before disconnecting
     the call.  The call tree needs to be removed anyway and incoming
     packets delivered by channel pointer instead.

 (3) The release call event should be considered last after all other
     events have been processed so that we don't need access to the
     connection again.

 (4) Move the channel_lock taking from rxrpc_release_call() to
     rxrpc_disconnect_call() where it will be required in future.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 10:43:51 +01:00
David Howells
d1e858c5a3 rxrpc: Fix handling of connection failure in client call creation
If rxrpc_connect_call() fails during the creation of a client connection,
there are two bugs that we can hit that need fixing:

 (1) The call state should be moved to RXRPC_CALL_DEAD before the call
     cleanup phase is invoked.  If not, this can cause an assertion failure
     later.

 (2) call->link should be reinitialised after being deleted in
     rxrpc_new_client_call() - which otherwise leads to a failure later
     when the call cleanup attempts to delete the link again.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 10:43:51 +01:00
David Howells
2c4579e4b1 rxrpc: Move usage count getting into rxrpc_queue_conn()
Rather than calling rxrpc_get_connection() manually before calling
rxrpc_queue_conn(), do it inside the queue wrapper.

This allows us to do some important fixes:

 (1) If the usage count is 0, do nothing.  This prevents connections from
     being reanimated once they're dead.

 (2) If rxrpc_queue_work() fails because the work item is already queued,
     retract the usage count increment which would otherwise be lost.

 (3) Don't take a ref on the connection in the work function.  By passing
     the ref through the work item, this is unnecessary.  Doing it in the
     work function is too late anyway.  Previously, connection-directed
     packets held a ref on the connection, but that's not really the best
     idea.

And another useful changes:

 (*) Don't need to take a refcount on the connection in the data_ready
     handler unless we invoke the connection's work item.  We're using RCU
     there so that's otherwise redundant.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 10:43:51 +01:00
David Howells
eb9b9d2275 rxrpc: Check that the client conns cache is empty before module removal
Check that the client conns cache is empty before module removal and bug if
not, listing any offending connections that are still present.  Unfortunately,
if there are connections still around, then the transport socket is still
unexpectedly open and active, so we can't just unallocate the connections.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 10:43:51 +01:00
David Howells
bba304db34 rxrpc: Turn connection #defines into enums and put outside struct def
Turn the connection event and state #define lists into enums and move
outside of the struct definition.

Whilst we're at it, change _SERVER to _SERVICE in those identifiers and add
EV_ into the event name to distinguish them from flags and states.

Also add a symbol indicating the number of states and use that in the state
text array.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 10:43:51 +01:00
David Howells
5acbee4648 rxrpc: Provide queuing helper functions
Provide queueing helper functions so that the queueing of local and
connection objects can be fixed later.

The issue is that a ref on the object needs to be passed to the work queue,
but the act of queueing the object may fail because the object is already
queued.  Testing the queuedness of an object before hand doesn't work
because there can be a race with someone else trying to queue it.  What
will have to be done is to adjust the refcount depending on the result of
the queue operation.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 10:43:05 +01:00
Herbert Xu
a263629da5 rxrpc: Avoid using stack memory in SG lists in rxkad
rxkad uses stack memory in SG lists which would not work if stacks were
allocated from vmalloc memory.  In fact, in most cases this isn't even
necessary as the stack memory ends up getting copied over to kmalloc
memory.

This patch eliminates all the unnecessary stack memory uses by supplying
the final destination directly to the crypto API.  In two instances where a
temporary buffer is actually needed we also switch use a scratch area in
the rxrpc_call struct (only one DATA packet will be being secured or
verified at a time).

Finally there is no need to split a split-page buffer into two SG entries
so code dealing with that has been removed.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 10:43:05 +01:00
David Howells
689f4c646d rxrpc: Check the source of a packet to a client conn
When looking up a client connection to which to route a packet, we need to
check that the packet came from the correct source so that a peer can't try
to muck around with another peer's connection.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 10:43:05 +01:00
David Howells
88b99d0b7a rxrpc: Fix some sparse errors
Fix the following sparse errors:

../net/rxrpc/conn_object.c:77:17: warning: incorrect type in assignment (different base types)
../net/rxrpc/conn_object.c:77:17:    expected restricted __be32 [usertype] call_id
../net/rxrpc/conn_object.c:77:17:    got unsigned int [unsigned] [usertype] call_id
../net/rxrpc/conn_object.c:84:21: warning: restricted __be32 degrades to integer
../net/rxrpc/conn_object.c:86:26: warning: restricted __be32 degrades to integer
../net/rxrpc/conn_object.c:357:15: warning: incorrect type in assignment (different base types)
../net/rxrpc/conn_object.c:357:15:    expected restricted __be32 [usertype] epoch
../net/rxrpc/conn_object.c:357:15:    got unsigned int [unsigned] [usertype] epoch
../net/rxrpc/conn_object.c:369:21: warning: restricted __be32 degrades to integer
../net/rxrpc/conn_object.c:371:26: warning: restricted __be32 degrades to integer
../net/rxrpc/conn_object.c:411:21: warning: restricted __be32 degrades to integer
../net/rxrpc/conn_object.c:413:26: warning: restricted __be32 degrades to integer

Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 10:43:05 +01:00
Martin KaFai Lau
903ce4abdf ipv6: Fix mem leak in rt6i_pcpu
It was first reported and reproduced by Petr (thanks!) in
https://bugzilla.kernel.org/show_bug.cgi?id=119581

free_percpu(rt->rt6i_pcpu) used to always happen in ip6_dst_destroy().

However, after fixing a deadlock bug in
commit 9c7370a166 ("ipv6: Fix a potential deadlock when creating pcpu rt"),
free_percpu() is not called before setting non_pcpu_rt->rt6i_pcpu to NULL.

It is worth to note that rt6i_pcpu is protected by table->tb6_lock.

kmemleak somehow did not report it.  We nailed it down by
observing the pcpu entries in /proc/vmallocinfo (first suggested
by Hannes, thanks!).

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Fixes: 9c7370a166 ("ipv6: Fix a potential deadlock when creating pcpu rt")
Reported-by: Petr Novopashenniy <pety@rusnet.ru>
Tested-by: Petr Novopashenniy <pety@rusnet.ru>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Petr Novopashenniy <pety@rusnet.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 14:09:23 -07:00
Vegard Nossum
ab58298cf4 net: fix decnet rtnexthop parsing
dn_fib_count_nhs() could enter an infinite loop if nhp->rtnh_len == 0
(i.e. if userspace passes a malformed netlink message).

Let's use the helpers from net/nexthop.h which take care of all this
stuff. We can do exactly the same as e.g. fib_count_nexthops() and
fib_get_nhs() from net/ipv4/fib_semantics.c.

This fixes the softlockup for me.

Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 14:08:47 -07:00
Ido Schimmel
2a4501ae18 neigh: Send a notification when DELAY_PROBE_TIME changes
When the data plane is offloaded the traffic doesn't go through the
networking stack. Therefore, after first resolving a neighbour the NUD
state machine will transition it from REACHABLE to STALE until it's
finally deleted by the garbage collector.

To prevent such situations the offloading driver should notify the NUD
state machine on any neighbours that were recently used. The driver's
polling interval should be set so that the NUD state machine can
function as if the traffic wasn't offloaded.

Currently, there are no in-tree drivers that can report confirmation for
a neighbour, but only 'used' indication. Therefore, the polling interval
should be set according to DELAY_FIRST_PROBE_TIME, as a neighbour will
transition from REACHABLE state to DELAY (instead of STALE) if "a packet
was sent within the last DELAY_FIRST_PROBE_TIME seconds" (RFC 4861).

Send a netevent whenever the DELAY_FIRST_PROBE_TIME changes - either via
netlink or sysctl - so that offloading drivers can correctly set their
polling interval.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:29 -07:00
Jiri Pirko
18bfb924f0 net: introduce default neigh_construct/destroy ndo calls for L2 upper devices
L2 upper device needs to propagate neigh_construct/destroy calls down to
lower devices. Do this by defining default ndo functions and use them in
team, bond, bridge and vlan.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:28 -07:00
Jiri Pirko
503eebc265 net: add dev arg to ndo_neigh_construct/destroy
As the following patch will allow upper devices to follow the call down
lower devices, we need to add dev here and not rely on n->dev.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:28 -07:00
Pavel Tikhomirov
c6ac37d8d8 netfilter: nf_log: fix error on write NONE to logger choice sysctl
It is hard to unbind nf-logger:

  echo NONE > /proc/sys/net/netfilter/nf_log/0
  bash: echo: write error: No such file or directory

  sysctl -w net.netfilter.nf_log.0=NONE
  sysctl: setting key "net.netfilter.nf_log.0": No such file or directory
  net.netfilter.nf_log.0 = NONE

You need explicitly send '\0', for instance like:

  echo -e "NONE\0" > /proc/sys/net/netfilter/nf_log/0

That seem to be strange, so fix it using proc_dostring.

Now it works fine:
   modprobe nfnetlink_log
   echo nfnetlink_log > /proc/sys/net/netfilter/nf_log/0
   cat /proc/sys/net/netfilter/nf_log/0
   nfnetlink_log
   echo NONE > /proc/sys/net/netfilter/nf_log/0
   cat /proc/sys/net/netfilter/nf_log/0
   NONE

v2: add missed error check for proc_dostring

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-05 14:57:57 +02:00
David S. Miller
b77af26a79 This feature patchset includes the following changes:
- Cleanup work by Markus Pargmann and Sven Eckelmann (six patches)
 
  - Initial Netlink support by Matthias Schiffer (two patches)
 
  - Throughput Meter implementation by Antonio Quartulli, a kernel-space
    traffic generator to estimate link speeds. This feature is useful on
    low-end WiFi APs where running iperf or netperf from userspace
    gives wrong results due to heavy userspace/kernelspace overhead.
    (two patches)
 
  - API clean-up work by Antonio Quartulli (one patch)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCgAGBQJXel3cAAoJEKEr45hCkp6hxs0P/jQZBJ37Bd4EHRGdhvCJWwsO
 j79zr7mIECub8a6PMkO1GI87ksJNtRdRw7XAIbLKTwsKEsUE0Gpv/MLLKgv/nD7X
 zatcoI4DujkgSojZIcOV/061+M9FAnCtAYv13jIS8nbXdqfGPxPfLua6Zbvx1GS2
 z/Rqg/WbB2qDtDlUrp0W/8oXQ+k6062G7GigroPLmjdWd5lF0H6ly4loWsxFyr0U
 GVl44HM4nOj7DwkVlrGoOXnAbjpz9TNC/aA5TIS/tLFZkm5dvJjjKLDbxo5NM9aq
 hRhFy8Gbe0TmOxV3mKZUT1oHuaHgFDY2tADLiLF2g/ijgaTetXCBJ6DXQ7BkiZnh
 +t1fnutOB1D05+cZGDmlfb2bFXO6vdDwNzKYuPdeW0tUOVwzNIaMK+US1HffUD3F
 ciK/cALsLbfJ3QkUHJclE57baMuB2c7YWJUxGdp2r4lKHak6tc8+BsornI6lB6qY
 kcxip6EEaT7edjT66Qjq8GtFK7WIri5nHI9n5Js+Wwl1QENvkLmZRQ6qZexwSplS
 RTZmmO+i+Y4rGDa3xoVSlC+CEUO7D4VwhET2Jir7KJrVS+pFNRAmfpUNWxeauAls
 D1xWgBrWjjOYu5i3LjwC6cHl4eTWxBwWmBUaxLUUeyoR22ulIs6bXCQMWOLMbupd
 q8k2B5BW9waTAgb4Tam9
 =PFHu
 -----END PGP SIGNATURE-----

Merge tag 'batadv-next-for-davem-20160704' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature patchset includes the following changes:

 - Cleanup work by Markus Pargmann and Sven Eckelmann (six patches)

 - Initial Netlink support by Matthias Schiffer (two patches)

 - Throughput Meter implementation by Antonio Quartulli, a kernel-space
   traffic generator to estimate link speeds. This feature is useful on
   low-end WiFi APs where running iperf or netperf from userspace
   gives wrong results due to heavy userspace/kernelspace overhead.
   (two patches)

 - API clean-up work by Antonio Quartulli (one patch)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 23:33:59 -07:00
Jiri Pirko
7ce856aaaf mlxsw: spectrum: Add couple of lower device helper functions
Add functions that iterate over lower devices and find port device.
As a dependency add netdev_for_each_all_lower_dev and
netdev_for_each_all_lower_dev_rcu macro with
netdev_all_lower_get_next and netdev_all_lower_get_next_rcu shelpers.

Also, add functions to return mlxsw struct according to lower device
found and mlxsw_port struct with a reference to lower device.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 18:25:15 -07:00
Vegard Nossum
3dad5424ad RDS: fix rds_tcp_init() error path
If register_pernet_subsys() fails, we shouldn't try to call
unregister_pernet_subsys().

Fixes: 467fa15356 ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
Cc: stable@vger.kernel.org
Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 16:09:49 -07:00
Daniel Borkmann
13c5c240f7 bpf: add bpf_get_hash_recalc helper
If skb_clear_hash() was invoked due to mangling of relevant headers and
BPF program needs skb->hash later on, we can add a helper to trigger hash
recalculation via bpf_get_hash_recalc().

The helper will return the newly retrieved hash directly, but later access
can also be done via skb context again through skb->hash directly (inline)
without needing to call the helper once more.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 16:08:40 -07:00
John Fastabend
0967f24459 net: pktgen: support injecting packets for qdisc testing
Add another xmit_mode to pktgen to allow testing xmit functionality
of qdiscs. The new mode "queue_xmit" injects packets at
__dev_queue_xmit() so that qdisc is called.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 16:07:34 -07:00
Jamal Hadi Salim
61cc535de3 net sched actions: skbedit convert to use more modern nla_put_xxx
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 15:11:14 -07:00
Jamal Hadi Salim
ff202ee1ed net sched actions: skbedit add support for mod-ing skb pkt_type
Extremely useful for setting packet type to host so i dont
have to modify the dst mac address using pedit (which requires
that i know the mac address)

Example usage:
tc filter add dev eth0 parent ffff: protocol ip pref 9 u32 \
match ip src 5.5.5.5/32 \
flowid 1:5 action skbedit ptype host

This will tag all packets incoming from 5.5.5.5 with type
PACKET_HOST

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 15:11:14 -07:00