Commit Graph

39550 Commits

Author SHA1 Message Date
Eric W. Biederman
3f5312ae62 xfrm: Only compute net once in xfrm_policy_queue_process
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 04:26:53 -07:00
Eric W. Biederman
850dcc4d4d ipv4: Fix ip_queue_xmit to pass sk into ip_local_out_sk
After a packet has been encapsulated by a tunnel we should use the
tunnel sockets local multicast loopback flag to control if the
encapsulated packet should be locally loopback back.

Pass sk into ip_local_out_sk so that in the rare case we are dealing
with a tunneled packet whose tunnel destination address is a multicast
address the kernel properly decides to loopback this packet.

In practice I don't think this matters as ip_queue_xmit is used by
tcp, l2tp and sctp none of which I am aware of uses ip level
multicasting as they are all point to point communications protocols.
Let's fix this before someone uses ip_queue_xmit for a tunnel protocol
that does use multicast.

Fixes: aad88724c9 ("ipv4: add a sock pointer to dst->output() path.")
Fixes: b0270e9101 ("ipv4: add a sock pointer to ip_queue_xmit()")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 04:26:52 -07:00
Eric W. Biederman
fd2874b3bb ipv4: Fix ip_local_out_sk by passing the sk into __ip_local_out_sk
In the rare case where sk != skb->sk ip_local_out_sk arranges
to call dst->output differently if the skb is queued or not.
This is a bug.

Fix this bug by passing the sk parameter of ip_local_out_sk through
from ip_local_out_sk to __ip_local_out_sk (skipping __ip_local_out).

Fixes: 7026b1ddb6 ("netfilter: Pass socket pointer down through okfn().")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08 04:26:52 -07:00
Felix Fietkau
4d57c67827 mac80211: add missing struct ieee80211_txq tid field initialization
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-10-08 11:10:50 +02:00
Johan Hedberg
26d46dffbe Bluetooth: 6lowpan: Remove unnecessary chan_get() function
The chan_get() function just adds unnecessary indirection to calling
the chan_create() call. The only added value it gives is the chan->ops
assignment, but that can equally well be done in the calling code.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-08 10:43:52 +02:00
Johan Hedberg
0cd088fc97 Bluetooth: 6lowpan: Rename confusing 'pchan' variables
The typical convention when having both a child and a parent channel
variable is to call the former 'chan' and the latter 'pchan'. When
there's only one variable it's called chan. Rename the 'pchan'
variables in the 6lowpan code to follow this convention.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-08 10:43:52 +02:00
Johan Hedberg
630ef791ea Bluetooth: 6lowpan: Remove unnecessary chan_open() function
All the chan_open() function now does is to call chan_create() so it
doesn't really add any value.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-08 10:43:52 +02:00
Johan Hedberg
b0c09f94ff Bluetooth: 6lowpan: Remove redundant BT_CONNECTED assignment
The L2CAP core code makes sure of setting the channel state to
BT_CONNECTED, so there's no need for the implementation code (6lowpan
in this case) to do it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-08 10:43:52 +02:00
Johan Hedberg
5d0fd77a04 Bluetooth: 6lowpan: Remove redundant (and incorrect) MPS assignments
The L2CAP core code already sets the local MPS to a sane value. The
remote MPS value otoh comes from the remote side so there's no point
in trying to hard-code it to any value.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-08 10:43:52 +02:00
Johan Hedberg
301de2cb6a Bluetooth: 6lowpan: Fix imtu & omtu values
The omtu value is determined by the remote peer so there's no point in
trying to hard-code it to any value. The IPSP specification otoh gives
a more reasonable value for the imtu, i.e. 1280.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-08 10:43:52 +02:00
Marcel Holtmann
fe806dcede Bluetooth: Enforce packet types in hci_recv_frame driver function
When calling the hci_recv_frame driver function check for valid packet
types that the core should process. This should catch issues with
drivers trying to feed vendor packet types through this interface.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-08 10:05:41 +03:00
Marcel Holtmann
acc649c654 Bluetooth: Fix interaction of HCI_QUIRK_RESET_ON_CLOSE and HCI_AUTO_OFF
When the controller requires the HCI Reset command to be send when
closing the transport, the HCI_AUTO_OFF needs to be accounted for. The
current code tries to actually do that, but the flag gets cleared to
early. So store its value and use it that stored value instead of
checking for a flag that is always cleared.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-08 10:00:05 +03:00
Marcel Holtmann
4b4113d6db Bluetooth: Add debugfs entry for setting vendor diagnostic mode
This adds a new debugfs entry for enabling and disabling the vendor
diagnostic mode. It is only exposed for drivers that provide the
set_diag driver callback and actually have an option for vendor
specific diagnostic information.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-08 09:57:07 +03:00
Marcel Holtmann
e875ff8407 Bluetooth: Add support for vendor specific diagnostic channel
Introduce hci_recv_diag function for HCI drivers to allow sending vendor
specific diagnostic messages into the Bluetooth core stack. The messages
are not processed, but they are forwarded to the monitor channel and can
be retrieved by user space diagnostic tools.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-08 09:51:13 +03:00
Marcel Holtmann
6c566dd5a1 Bluetooth: Send index information updates to monitor channel
The Bluetooth public device address might change during controller setup
and it makes it a lot simpler for monitoring tools if they just get told
what the new address is. In addition include the manufacturer / company
information of the controller. That allows for easy vendor specific HCI
command and event handling.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-08 09:48:34 +03:00
Eric Dumazet
acb4a6bfc8 tcp: ensure prior synack rtx behavior with small backlogs
Some applications use a listen() backlog of 1.

Prior kernels were silently enforcing a qlen_log of 4, so that we were
sending up to /proc/sys/net/ipv4/tcp_synack_retries SYNACK messages.

Fixes: ef547f2ac1 ("tcp: remove max_qlen_log")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 05:08:58 -07:00
Yuvaraja Mariappan
686a562449 net: ipv4: tcp.c Fixed an assignment coding style issue
Fixed an assignment coding style issue

Signed-off-by: Yuvaraja Mariappan <ymariappan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 05:01:04 -07:00
Nikolay Aleksandrov
5d6ae479ab bridge: netlink: add support for port's multicast_router attribute
Add IFLA_BRPORT_MULTICAST_ROUTER to allow setting/getting port's
multicast_router via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:49:34 -07:00
Nikolay Aleksandrov
9b0c6e4deb bridge: netlink: allow to flush port's fdb
Add IFLA_BRPORT_FLUSH to allow flushing port's fdb similar to sysfs's
flush.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:49:32 -07:00
Nikolay Aleksandrov
61c0a9a83e bridge: netlink: export port's timer values
Add the following attributes in order to export port's timer values:
IFLA_BRPORT_MESSAGE_AGE_TIMER, IFLA_BRPORT_FORWARD_DELAY_TIMER and
IFLA_BRPORT_HOLD_TIMER.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:49:31 -07:00
Nikolay Aleksandrov
e08e838ac5 bridge: netlink: export port's topology_change_ack and config_pending
Add IFLA_BRPORT_TOPOLOGY_CHANGE_ACK and IFLA_BRPORT_CONFIG_PENDING to
allow getting port's topology_change_ack and config_pending respectively
via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:49:30 -07:00
Nikolay Aleksandrov
42d452c4b5 bridge: netlink: export port's id and number
Add IFLA_BRPORT_(ID|NO) to allow getting port's port_id and port_no
respectively via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:49:29 -07:00
Nikolay Aleksandrov
96f94e7f4a bridge: netlink: export port's designated cost and port
Add IFLA_BRPORT_DESIGNATED_(COST|PORT) to allow getting the port's
designated cost and port respectively via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:49:29 -07:00
Nikolay Aleksandrov
80df9a2692 bridge: netlink: export port's bridge id
Add IFLA_BRPORT_BRIDGE_ID to allow getting the designated bridge id via
netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:49:28 -07:00
Nikolay Aleksandrov
4ebc7660ab bridge: netlink: export port's root id
Add IFLA_BRPORT_ROOT_ID to allow getting the designated root id via
netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:49:27 -07:00
David Ahern
deaa0a6a93 net: Lookup actual route when oif is VRF device
If the user specifies a VRF device in a get route query the custom route
pointing to the VRF device is returned:

    $ ip route ls table vrf-red
    unreachable default
    broadcast 10.2.1.0 dev eth1  proto kernel  scope link  src 10.2.1.2
    10.2.1.0/24 dev eth1  proto kernel  scope link  src 10.2.1.2
    local 10.2.1.2 dev eth1  proto kernel  scope host  src 10.2.1.2
    broadcast 10.2.1.255 dev eth1  proto kernel  scope link  src 10.2.1.2

    $ ip route get oif vrf-red 10.2.1.40
    10.2.1.40 dev vrf-red
        cache

Add the flags to skip the custom route and go directly to the FIB. With
this patch the actual route is returned:

    $ ip route get oif vrf-red 10.2.1.40
    10.2.1.40 dev eth1  src 10.2.1.2
        cache

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:31:16 -07:00
David S. Miller
2579c98f0d For the current cycle, we have the following right now:
* many internal fixes, API improvements, cleanups, etc.
  * full AP client state tracking in cfg80211/mac80211 from Ayala
  * VHT support (in mac80211) for mesh
  * some A-MSDU in A-MPDU support from Emmanuel
  * show current TX power to userspace (from Rafał)
  * support for netlink dump in vendor commands (myself)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJWEp5XAAoJEDBSmw7B7bqr8DsQAICgQL7gSkHUlc6rbMJ9MzX+
 9W0SNpZHSmfE0ZsL3cCoeHbk5dGhX82GumIz4GeqtvIKUNHkC8qlnXJIKTEva+sp
 PjcF1wS0qQFdt6sg/Zxq+4Q8lZrZf1xP9W0x0ORYi9d9qej07JAZku8zYt4agpNV
 R4nCl/gKVF375aV8y+qi+WSZXx4j80dJkokoVk4hzotWjd0bGVL1T9YwDRzxg4FI
 S0DnkxlsD3MRHJXq+9+DbF5cuTjCG2LZNcDIBy455eWN27j9CWgEPVXoySQjDgQc
 ayf2siw7BccqnV84et0vi+0WYXdZCHm3zCen44s4vaCflhdGxdx48V+Lib6mluR3
 OEM1V1l9uV97UyORPljRKvDURq2IUdLQw00of26CTX8qEnmQIfxC7qaRg0rYEiGW
 SbTClbEiEkBLV+sCStnkv8GJHNpvtI/2VQXH1ydrHsrWC3Sl9bpPOWYlNBPwdzM9
 U4zgpxf6gLqlsukQKmMDmoKW7T04Fs0qgE99ThU2x6uTGsux8bfbxgzPCfUdeY8M
 HmCB5oBCZKJ5pzv6z6lUGc0cO42IL50aBrrlatrEekjevUXW3MMOZCwGrUXxpMw1
 gd+2PnLCCUeDyKNvkpXEgr4uS9Egc0sWH1RlpDPaAA5gRdRHiDn7MK7Z+s5OpNIC
 wnFCQKB+KrNNrQFuXz9k
 =BF9F
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2015-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
For the current cycle, we have the following right now:
 * many internal fixes, API improvements, cleanups, etc.
 * full AP client state tracking in cfg80211/mac80211 from Ayala
 * VHT support (in mac80211) for mesh
 * some A-MSDU in A-MPDU support from Emmanuel
 * show current TX power to userspace (from Rafał)
 * support for netlink dump in vendor commands (myself)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:29:18 -07:00
David Ahern
bb191c3e87 net: Add l3mdev saddr lookup to raw_sendmsg
ping originated on box through a VRF device is showing up in tcpdump
without a source address:
    $ tcpdump -n -i vrf-blue
    08:58:33.311303 IP 0.0.0.0 > 10.2.2.254: ICMP echo request, id 2834, seq 1, length 64
    08:58:33.311562 IP 10.2.2.254 > 10.2.2.2: ICMP echo reply, id 2834, seq 1, length 64

Add the call to l3mdev_get_saddr to raw_sendmsg.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:27:46 -07:00
David Ahern
8cbb512c92 net: Add source address lookup op for VRF
Add operation to l3mdev to lookup source address for a given flow.
Add support for the operation to VRF driver and convert existing
IPv4 hooks to use the new lookup.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:27:44 -07:00
David Ahern
3ce58d8435 net: Refactor path selection in __ip_route_output_key_hash
VRF device needs the same path selection following lookup to set source
address. Rather than duplicating code, move existing code into a
function that is exported to modules.

Code move only; no functional change.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:27:44 -07:00
David Ahern
fee6d4c777 net: Add netif_is_l3_slave
IPv6 addrconf keys off of IFF_SLAVE so can not use it for L3 slave.
Add a new private flag and add netif_is_l3_slave function for checking
it.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:27:43 -07:00
David Ahern
6e2895a8e3 net: Rename FLOWI_FLAG_VRFSRC to FLOWI_FLAG_L3MDEV_SRC
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:27:42 -07:00
David Ahern
4148987a51 net: Fix vti use case with oif in dst lookups for IPv6
It occurred to me yesterday that 741a11d9e4 ("net: ipv6: Add
RT6_LOOKUP_F_IFACE flag if oif is set") means that xfrm6_dst_lookup
needs the FLOWI_FLAG_SKIP_NH_OIF flag set. This latest commit causes
the oif to be considered in lookups which is known to break vti. This
explains why 58189ca7b2 did not the IPv6 change at the time it was
submitted.

Fixes: 42a7b32b73 ("xfrm: Add oif to dst lookups")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:24:00 -07:00
Jiri Benc
6b26ba3a7d openvswitch: netlink attributes for IPv6 tunneling
Add netlink attributes for IPv6 tunnel addresses. This enables IPv6 support
for tunnels.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:18:00 -07:00
Jiri Benc
00a93babd0 openvswitch: add tunnel protocol to sw_flow_key
Store tunnel protocol (AF_INET or AF_INET6) in sw_flow_key. This field now
also acts as an indicator whether the flow contains tunnel data (this was
previously indicated by tun_key.u.ipv4.dst being set but with IPv6 addresses
in an union with IPv4 ones this won't work anymore).

The new field was added to a hole in sw_flow_key.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:17:59 -07:00
Nikolay Aleksandrov
4917a1548f bridge: netlink: make br_fill_info's frame size smaller
When KASAN is enabled the frame size grows > 2048 bytes and we get a
warning, so make it smaller.
net/bridge/br_netlink.c: In function 'br_fill_info':
>> net/bridge/br_netlink.c:1110:1: warning: the frame size of 2160 bytes
>> is larger than 2048 bytes [-Wframe-larger-than=]

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:15:57 -07:00
David Ahern
16660f0bd9 net: Add support for filtering neigh dump by device index
Add support for filtering neighbor dumps by device by adding the
NDA_IFINDEX attribute to the dump request.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:12:02 -07:00
Russell King
d25b8e7429 net: dsa: better error reporting
Add additional error reporting to the generic DSA code, so it's easier
to debug when things go wrong.  This was useful when initially bringing
up 88e6176 on a new board.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 02:58:49 -07:00
Peter Nørlund
0a837fe472 ipv4: Fix compilation errors in fib_rebalance
This fixes

net/built-in.o: In function `fib_rebalance':
fib_semantics.c:(.text+0x9df14): undefined reference to `__divdi3'

and

net/built-in.o: In function `fib_rebalance':
net/ipv4/fib_semantics.c:572: undefined reference to `__aeabi_ldivmod'

Fixes: 0e884c78ee ("ipv4: L3 hash-based multipath")

Signed-off-by: Peter Nørlund <pch@ordbogen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 23:48:09 -07:00
Santosh Shilimkar
0676651323 RDS: IB: split mr pool to improve 8K messages performance
8K message sizes are pretty important usecase for RDS current
workloads so we make provison to have 8K mrs available from the pool.
Based on number of SG's in the RDS message, we pick a pool to use.

Also to make sure that we don't under utlise mrs when say 8k messages
are dominating which could lead to 8k pull being exhausted, we fall-back
to 1m pool till 8k pool recovers for use.

This helps to at least push ~55 kB/s bidirectional data which
is a nice improvement.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05 11:19:02 -07:00
Santosh Shilimkar
41a4e96462 RDS: IB: use max_mr from HCA caps than max_fmr
All HCA drivers seems to popullate max_mr caps and few of
them do both max_mr and max_fmr.

Hence update RDS code to make use of max_mr.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05 11:19:02 -07:00
Santosh Shilimkar
67161e250a RDS: IB: mark rds_ib_fmr_wq static
Fix below warning by marking rds_ib_fmr_wq static

net/rds/ib_rdma.c:87:25: warning: symbol 'rds_ib_fmr_wq' was not declared. Should it be static?

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05 11:19:02 -07:00
Santosh Shilimkar
26139dc1db RDS: IB: use already available pool handle from ibmr
rds_ib_mr already keeps the pool handle which it associates
with. Lets use that instead of round about way of fetching
it from rds_ib_device.

No functional change.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05 11:19:02 -07:00
Santosh Shilimkar
2e1d6b813a RDS: IB: fix the rds_ib_fmr_wq kick call
RDS IB mr pool has its own workqueue 'rds_ib_fmr_wq', so we need
to use queue_delayed_work() to kick the work. This was hurting
the performance since pool maintenance was less often triggered
from other path.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05 11:19:01 -07:00
Santosh Shilimkar
9441c973e1 RDS: IB: handle rds_ibdev release case instead of crashing the kernel
Just in case we are still handling the QP receive completion while the
rds_ibdev is released, drop the connection instead of crashing the kernel.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05 11:19:01 -07:00
Santosh Shilimkar
0c28c04500 RDS: IB: split send completion handling and do batch ack
Similar to what we did with receive CQ completion handling, we split
the transmit completion handler so that it lets us implement batched
work completion handling.

We re-use the cq_poll routine and makes use of RDS_IB_SEND_OP to
identify the send vs receive completion event handler invocation.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05 11:19:01 -07:00
Santosh Shilimkar
f4f943c958 RDS: IB: ack more receive completions to improve performance
For better performance, we split the receive completion IRQ handler. That
lets us acknowledge several WCE events in one call. We also limit the WC
to max 32 to avoid latency. Acknowledging several completions in one call
instead of several calls each time will provide better performance since
less mutual exclusion locks are being performed.

In next patch, send completion is also split which re-uses the poll_cq()
and hence the code is moved to ib_cm.c

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05 11:19:01 -07:00
Santosh Shilimkar
db6526dcb5 RDS: use rds_send_xmit() state instead of RDS_LL_SEND_FULL
In Transport indepedent rds_sendmsg(), we shouldn't make decisions based
on RDS_LL_SEND_FULL which is used to manage the ring for RDMA based
transports. We can safely issue rds_send_xmit() and the using its
return value take decision on deferred work. This will also fix
the scenario where at times we are seeing connections stuck with
the LL_SEND_FULL bit getting set and never cleared.

We kick krdsd after any time we see -ENOMEM or -EAGAIN from the
ring allocation code.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05 11:19:01 -07:00
Santosh Shilimkar
4bebdd7a4d RDS: defer the over_batch work to send worker
Current process gives up if its send work over the batch limit.
The work queue will get  kicked to finish off any other requests.
This fixes remainder condition from commit 443be0e5af ("RDS: make
sure not to loop forever inside rds_send_xmit").

The restart condition is only for the case where we reached to
over_batch code for some other reason so just retrying again
before giving up.

While at it, make sure we use already available 'send_batch_count'
parameter instead of magic value. The batch count threshold value
of 1024 came via commit 443be0e5af ("RDS: make sure not to loop
forever inside rds_send_xmit"). The idea is to process as big a
batch as we can but at the same time we don't hold other waiting
processes for send. Hence back-off after the send_batch_count
limit (1024) to avoid soft-lock ups.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05 11:18:45 -07:00
Andrzej Hajda
5edfcee5ed mac80211: make ieee80211_new_mesh_header return unsigned
The function returns always non-negative values.

The problem has been detected using proposed semantic patch
scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel/2046107

Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-10-05 17:54:16 +02:00
Daniel Borkmann
bab1899187 bpf, seccomp: prepare for upcoming criu support
The current ongoing effort to dump existing cBPF seccomp filters back
to user space requires to hold the pre-transformed instructions like
we do in case of socket filters from sk_attach_filter() side, so they
can be reloaded in original form at a later point in time by utilities
such as criu.

To prepare for this, simply extend the bpf_prog_create_from_user()
API to hold a flag that tells whether we should store the original
or not. Also, fanout filters could make use of that in future for
things like diag. While fanout filters already use bpf_prog_destroy(),
move seccomp over to them as well to handle original programs when
present.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 06:47:05 -07:00
David S. Miller
40e106801e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/net-next
Eric W. Biederman says:

====================
net: Pass net through ip fragmention

This is the next installment of my work to pass struct net through the
output path so the code does not need to guess how to figure out which
network namespace it is in, and ultimately routes can have output
devices in another network namespace.

This round focuses on passing net through ip fragmentation which we seem
to call from about everywhere.  That is the main ip output paths, the
bridge netfilter code, and openvswitch.  This has to happend at once
accross the tree as function pointers are involved.

First some prep work is done, then ipv4 and ipv6 are converted and then
temporary helper functions are removed.
====================

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:39:31 -07:00
Sowmini Varadhan
76b29ef120 RDS-TCP: Set up MSG_MORE and MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_xmit
For the same reasons as commit 2f53384424 ("tcp: allow splice() to
build full TSO packets") and commit 35f9c09fe9 ("tcp: tcp_sendpages()
should call tcp_push() once"), rds_tcp_xmit may have multiple pages to
send, so use the MSG_MORE and MSG_SENDPAGE_NOTLAST as hints to
tcp_sendpage()

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:34:53 -07:00
Sowmini Varadhan
1edd6a14d2 RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune
Using the value of RDS_TCP_DEFAULT_BUFSIZE (128K)
clobbers efficient use of TSO because it inflates the size_goal
that is computed in tcp_sendmsg/tcp_sendpage and skews packet
latency, and the default values for these parameters actually
results in significantly better performance.

In request-response tests using rds-stress with a packet size of
100K with 16 threads (test parameters -q 100000 -a 256 -t16 -d16)
between a single pair of IP addresses achieves a throughput of
6-8 Gbps. Without this patch, throughput maxes at 2-3 Gbps under
equivalent conditions on these platforms.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:34:53 -07:00
Sowmini Varadhan
3b20fc3897 RDS: Use a single TCP socket for both send and receive.
Commit f711a6ae06 ("net/rds: RDS-TCP: Always create a new rds_sock
for an incoming connection.") modified rds-tcp so that an incoming SYN
would ignore an existing "client" TCP connection which had the local
port set to the transient port.  The motivation for ignoring the existing
"client" connection in f711a6ae was to avoid race conditions and an
endless duel of reconnect attempts triggered by a restart/abort of one
of the nodes in the TCP connection.

However, having separate sockets for active and passive sides
is avoidable, and the simpler model of a single TCP socket for
both send and receives of all RDS connections associated with
that tcp socket makes for easier observability. We avoid the race
conditions from f711a6ae by attempting reconnects in rds_conn_shutdown
if, and only if, the (new) c_outgoing bit is set for RDS_TRANS_TCP.
The c_outgoing bit is initialized in __rds_conn_create().

A side-effect of re-using the client rds_connection for an incoming
SYN is the potential of encountering duelling SYNs, i.e., we
have an outgoing RDS_CONN_CONNECTING socket when we get the incoming
SYN. The logic to arbitrate this criss-crossing SYN exchange in
rds_tcp_accept_one() has been modified to emulate the BGP state
machine: the smaller IP address should back off from the connection attempt.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:34:51 -07:00
Eric Dumazet
ac8cfc7bb8 tcp: restore fastopen operations
I accidentally cleared fastopenq.max_qlen in reqsk_queue_alloc()
while max_qlen can be set before listen() is called,
using TCP_FASTOPEN socket option for example.

Fixes: 0536fcc039 ("tcp: prepare fastopen code for upcoming listener changes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:19:06 -07:00
Arnd Bergmann
3ef0a25bf9 net: sctp: avoid incorrect time_t use
We want to avoid using time_t in the kernel because of the y2038
overflow problem. The use in sctp is not for storing seconds at
all, but instead uses microseconds and is passed as 32-bit
on all machines.

This patch changes the type to u32, which better fits the use.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: linux-sctp@vger.kernel.org
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:16:48 -07:00
Arnd Bergmann
3dd7669f1f ipv6: use ktime_t for internal timestamps
The ipv6 mip6 implementation is one of only a few users of the
skb_get_timestamp() function in the kernel, which is both unsafe
on 32-bit architectures because of the 2038 overflow, and slightly
less efficient than the skb_get_ktime() based approach.

This converts the function call and the mip6_report_rate_limiter
structure that stores the time stamp, eliminating all uses of
timeval in the ipv6 code.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:16:47 -07:00
Arnd Bergmann
f6389ecbc5 nfnetlink: use y2038 safe timestamp
The __build_packet_message function fills a nfulnl_msg_packet_timestamp
structure that uses 64-bit seconds and is therefore y2038 safe, but
it uses an intermediate 'struct timespec' which is not.

This trivially changes the code to use 'struct timespec64' instead,
to correct the result on 32-bit architectures.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:16:47 -07:00
Arnd Bergmann
84b00607ae mac80211: use ktime_get_seconds
The mac80211 code uses ktime_get_ts to measure the connected time.
As this uses monotonic time, it is y2038 safe on 32-bit systems,
but we still want to deprecate the use of 'timespec' because most
other users are broken.

This changes the code to use ktime_get_seconds() instead, which
avoids the timespec structure and is slightly more efficient.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: linux-wireless@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:16:45 -07:00
Peter Nørlund
79a131592d ipv4: ICMP packet inspection for multipath
ICMP packets are inspected to let them route together with the flow they
belong to, minimizing the chance that a problematic path will affect flows
on other paths, and so that anycast environments can work with ECMP.

Signed-off-by: Peter Nørlund <pch@ordbogen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 03:00:04 -07:00
Peter Nørlund
0e884c78ee ipv4: L3 hash-based multipath
Replaces the per-packet multipath with a hash-based multipath using
source and destination address.

Signed-off-by: Peter Nørlund <pch@ordbogen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 02:59:21 -07:00
Eric Dumazet
a1a5344ddb tcp: avoid two atomic ops for syncookies
inet_reqsk_alloc() is used to allocate a temporary request
in order to generate a SYNACK with a cookie. Then later,
syncookie validation also uses a temporary request.

These paths already took a reference on listener refcount,
we can avoid a couple of atomic operations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 02:45:27 -07:00
Eric Dumazet
004a5d0140 net: use sk_fullsock() in __netdev_pick_tx()
SYN_RECV & TIMEWAIT sockets are not full blown, they do not have a
sk_dst_cache pointer.

Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 02:45:25 -07:00
Eric Dumazet
7656d842de tcp: fix fastopen races vs lockless listener
There are multiple races that need fixes :

1) skb_get() + queue skb + kfree_skb() is racy

An accept() can be done on another cpu, data consumed immediately.
tcp_recvmsg() uses __kfree_skb() as it is assumed all skb found in
socket receive queue are private.

Then the kfree_skb() in tcp_rcv_state_process() uses an already freed skb

2) tcp_reqsk_record_syn() needs to be done before tcp_try_fastopen()
for the same reasons.

3) We want to send the SYNACK before queueing child into accept queue,
otherwise we might reintroduce the ooo issue fixed in
commit 7c85af8810 ("tcp: avoid reorders for TFO passive connections")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 02:45:24 -07:00
Marcel Holtmann
22db3cbcf9 Bluetooth: Send transport open and close monitor events
When the core starts or shuts down the actual HCI transport, send a new
monitor event that indicates that this is happening. These new events
correspond to HCI_DEV_OPEN and HCI_DEV_CLOSE events.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-05 10:30:49 +03:00
Marcel Holtmann
e9ca8bf157 Bluetooth: Move handling of HCI_RUNNING flag into core
Setting and clearing of HCI_RUNNING flag in each and every driver is
just duplicating the same code all over the place. So instead of having
the driver do it in their hdev->open and hdev->close callbacks, set it
globally in the core transport handling.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-05 10:30:25 +03:00
Marcel Holtmann
73d0d3c867 Bluetooth: Move HCI_RUNNING check into hci_send_frame
In all callbacks for hdev->send the status of HCI_RUNNING is checked. So
instead of repeating that code in every driver, move the check into the
hci_send_frame function before calling hdev->send.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-05 10:30:10 +03:00
Marcel Holtmann
4a3f95b7b6 Bluetooth: Introduce HCI_DEV_OPEN and HCI_DEV_CLOSE events
When opening the HCI transport via hdev->open send HCI_DEV_OPEN event
and when closing the HCI transport via hdev->close send HCI_DEV_CLOSE.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-05 10:29:36 +03:00
Marcel Holtmann
ed1b28a48b Bluetooth: Limit userspace exposure of stack internal events
The stack internal events that are exposed to userspace should be
limited to HCI_DEV_REG, HCI_DEV_UNREG, HCI_DEV_UP and HCI_DEV_DOWN.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-05 10:29:23 +03:00
Nikolay Aleksandrov
0f963b7592 bridge: netlink: add support for default_pvid
Add IFLA_BR_VLAN_DEFAULT_PVID to allow setting/getting bridge's
default_pvid via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:46:07 -07:00
Nikolay Aleksandrov
93870cc02a bridge: netlink: add support for netfilter tables config
Add support to allow getting/setting netfilter tables settings.
Currently these are IFLA_BR_NF_CALL_IPTABLES, IFLA_BR_NF_CALL_IP6TABLES
and IFLA_BR_NF_CALL_ARPTABLES.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:46:07 -07:00
Nikolay Aleksandrov
7e4df51eb3 bridge: netlink: add support for igmp's intervals
Add support to set/get all of the igmp's configurable intervals via
netlink. These currently are:
IFLA_BR_MCAST_LAST_MEMBER_INTVL
IFLA_BR_MCAST_MEMBERSHIP_INTVL
IFLA_BR_MCAST_QUERIER_INTVL
IFLA_BR_MCAST_QUERY_INTVL
IFLA_BR_MCAST_QUERY_RESPONSE_INTVL
IFLA_BR_MCAST_STARTUP_QUERY_INTVL

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:46:06 -07:00
Nikolay Aleksandrov
b89e6babad bridge: netlink: add support for multicast_startup_query_count
Add IFLA_BR_MCAST_STARTUP_QUERY_CNT to allow setting/getting
br->multicast_startup_query_count via netlink. Also align the ifla
comments.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:46:06 -07:00
Nikolay Aleksandrov
79b859f573 bridge: netlink: add support for multicast_last_member_count
Add IFLA_BR_MCAST_LAST_MEMBER_CNT to allow setting/getting
br->multicast_last_member_count via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:46:06 -07:00
Nikolay Aleksandrov
858079fdae bridge: netlink: add support for igmp's hash_max
Add IFLA_BR_MCAST_HASH_MAX to allow setting/getting br->hash_max via
netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:46:06 -07:00
Nikolay Aleksandrov
431db3c050 bridge: netlink: add support for igmp's hash_elasticity
Add IFLA_BR_MCAST_HASH_ELASTICITY to allow setting/getting
br->hash_elasticity via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:46:05 -07:00
Nikolay Aleksandrov
ba062d7cc6 bridge: netlink: add support for multicast_querier
Add IFLA_BR_MCAST_QUERIER to allow setting/getting br->multicast_querier
via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:46:04 -07:00
Nikolay Aleksandrov
295141d904 bridge: netlink: add support for multicast_query_use_ifaddr
Add IFLA_BR_MCAST_QUERY_USE_IFADDR to allow setting/getting
br->multicast_query_use_ifaddr via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:46:03 -07:00
Nikolay Aleksandrov
89126327f9 bridge: netlink: add support for multicast_snooping
Add IFLA_BR_MCAST_SNOOPING to allow enabling/disabling multicast
snooping via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:46:02 -07:00
Nikolay Aleksandrov
a9a6bc70f5 bridge: netlink: add support for multicast_router
Add IFLA_BR_MCAST_ROUTER to allow setting and retrieving
br->multicast_router when igmp snooping is enabled.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:46:01 -07:00
Nikolay Aleksandrov
150217c688 bridge: netlink: add fdb flush
Simple attribute that flushes the bridge's fdb.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:46:01 -07:00
Nikolay Aleksandrov
111189abc5 bridge: netlink: add group_addr support
Add IFLA_BR_GROUP_ADDR attribute to allow setting and retrieving the
group_addr via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:45:59 -07:00
Nikolay Aleksandrov
d76bd14e0f bridge: netlink: export all timers
Export the following bridge timers (also exported via sysfs):
IFLA_BR_HELLO_TIMER, IFLA_BR_TCN_TIMER, IFLA_BR_TOPOLOGY_CHANGE_TIMER,
IFLA_BR_GC_TIMER via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:45:59 -07:00
Nikolay Aleksandrov
ed4163098e bridge: netlink: export topology_change and topology_change_detected
Add IFLA_BR_TOPOLOGY_CHANGE and IFLA_BR_TOPOLOGY_CHANGE_DETECTED and
export them via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:45:58 -07:00
Nikolay Aleksandrov
684dd248be bridge: netlink: export root path cost
Add IFLA_BR_ROOT_PATH_COST and export it via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:45:57 -07:00
Nikolay Aleksandrov
8762ba680f bridge: netlink: export root port
Add IFLA_BR_ROOT_PORT and export it via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:45:56 -07:00
Nikolay Aleksandrov
7599a2201f bridge: netlink: export bridge id
Add IFLA_BR_BRIDGE_ID and export br->bridge_id via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:45:55 -07:00
Nikolay Aleksandrov
5127c81f84 bridge: netlink: export root id
Add IFLA_BR_ROOT_ID and export br->designated_root via netlink. For this
purpose add struct ifla_bridge_id that would represent struct bridge_id.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:45:54 -07:00
Nikolay Aleksandrov
7910228b6b bridge: netlink: add group_fwd_mask support
Add IFLA_BR_GROUP_FWD_MASK attribute to allow setting and retrieving the
group_fwd_mask via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:45:53 -07:00
Nikolay Aleksandrov
6be144f62f bridge: vlan: use br_vlan_should_use to simplify __vlan_add/del
The checks that lead to num_vlans change are always what
br_vlan_should_use checks for, namely if the vlan is only a context or
not and depending on that it's either not counted or counted
as a real/used vlan respectively.
Also give better explanation in br_vlan_should_use's comment.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:43:50 -07:00
Nikolay Aleksandrov
2ffdf508d2 bridge: vlan: drop master_flags from __vlan_add
There's only one user now and we can include the flag directly.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:43:49 -07:00
Nikolay Aleksandrov
f8ed289fab bridge: vlan: use br_vlan_(get|put)_master to deal with refcounts
Introduce br_vlan_(get|put)_master which take a reference (or create the
master vlan first if it didn't exist) and drop a reference respectively.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:43:48 -07:00
Nikolay Aleksandrov
586c2b573e bridge: vlan: use rcu list for the ordered vlan list
When I did the conversion to rhashtable I missed the required locking of
one important user of the vlan list - br_get_link_af_size_filtered()
which is called:
br_ifinfo_notify() -> br_nlmsg_size() -> br_get_link_af_size_filtered()
and the notifications can be sent without holding rtnl. Before this
conversion the function relied on using rcu and since we already use rcu to
destroy the vlans, we can simply migrate the list to use the rcu helpers.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04 16:43:47 -07:00
Eric Dumazet
e96f78ab27 tcp/dccp: add SLAB_DESTROY_BY_RCU flag for request sockets
Before letting request sockets being put in TCP/DCCP regular
ehash table, we need to add either :

- SLAB_DESTROY_BY_RCU flag to their kmem_cache
- add RCU grace period before freeing them.

Since we carefully respected the SLAB_DESTROY_BY_RCU protocol
like ESTABLISH and TIMEWAIT sockets, use it here.

req_prot_init() being only used by TCP and DCCP, I did not add
a new slab_flags into their rsk_prot, but reuse prot->slab_flags

Since all reqsk_alloc() users are correctly dealing with a failure,
add the __GFP_NOWARN flag to avoid traces under pressure.

Fixes: 079096f103 ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03 13:25:20 -07:00
Daniel Borkmann
754f1e6a36 sched, bpf: make skb->priority writable
{cls,act}_bpf can now set the skb->priority from an eBPF program based
on various critera, so that for example classful qdiscs like multiq can
update the skb's priority during enqueue time and further push it down
into subsequent qdiscs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03 05:02:41 -07:00
Daniel Borkmann
c46646d048 sched, bpf: add helper for retrieving routing realms
Using routing realms as part of the classifier is quite useful, it
can be viewed as a tag for one or multiple routing entries (think of
an analogy to net_cls cgroup for processes), set by user space routing
daemons or via iproute2 as an indicator for traffic classifiers and
later on processed in the eBPF program.

Unlike actions, the classifier can inspect device flags and enable
netif_keep_dst() if necessary. tc actions don't have that possibility,
but in case people know what they are doing, it can be used from there
as well (e.g. via devs that must keep dsts by design anyway).

If a realm is set, the handler returns the non-zero realm. User space
can set the full 32bit realm for the dst.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03 05:02:41 -07:00
Daniel Borkmann
a91263d520 ebpf: migrate bpf_prog's flags to bitfield
As we need to add further flags to the bpf_prog structure, lets migrate
both bools to a bitfield representation. The size of the base structure
(excluding insns) remains unchanged at 40 bytes.

Add also tags for the kmemchecker, so that it doesn't throw false
positives. Even in case gcc would generate suboptimal code, it's not
being accessed in performance critical paths.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03 05:02:39 -07:00
Jiri Pirko
9e8f4a548a switchdev: push object ID back to object structure
Suggested-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03 04:49:40 -07:00
Jiri Pirko
648b4a995a switchdev: bring back switchdev_obj and use it as a generic object param
Replace "void *obj" with a generic structure. Introduce couple of
helpers along that.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03 04:49:39 -07:00