linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-18 09:44:18 +08:00

Author	SHA1	Message	Date
Elad Raz	6ac311ae8b	Adding switchdev ageing notification on port bridged Configure ageing time to the HW for newly bridged device CC: Scott Feldman <sfeldma@gmail.com> CC: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Elad Raz <eladr@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 07:50:57 -07:00
Dan Carpenter	50010c2059	irda: precedence bug in irlmp_seq_hb_idx() This is decrementing the pointer, instead of the value stored in the pointer. KASan detects it as an out of bounds reference. Reported-by: "Berry Cheng 程君(成淼)" <chengmiao.cj@alibaba-inc.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 07:48:26 -07:00
Gao feng	f6a835bb04	vsock: fix missing cleanup when misc_register failed reset transport and unlock if misc_register failed. Signed-off-by: Gao feng <omarapazanadi@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 07:41:06 -07:00
David Howells	146aa8b145	KEYS: Merge the type-specific data with the payload data Merge the type-specific data with the payload data into one four-word chunk as it seems pointless to keep them separate. Use user_key_payload() for accessing the payloads of overloaded user-defined keys. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-cifs@vger.kernel.org cc: ecryptfs@vger.kernel.org cc: linux-ext4@vger.kernel.org cc: linux-f2fs-devel@lists.sourceforge.net cc: linux-nfs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: linux-ima-devel@lists.sourceforge.net	2015-10-21 15:18:36 +01:00
Yuchung Cheng	4f41b1c58a	tcp: use RACK to detect losses This patch implements the second half of RACK that uses the the most recent transmit time among all delivered packets to detect losses. tcp_rack_mark_lost() is called upon receiving a dubious ACK. It then checks if an not-yet-sacked packet was sent at least "reo_wnd" prior to the sent time of the most recently delivered. If so the packet is deemed lost. The "reo_wnd" reordering window starts with 1msec for fast loss detection and changes to min-RTT/4 when reordering is observed. We found 1msec accommodates well on tiny degree of reordering (<3 pkts) on faster links. We use min-RTT instead of SRTT because reordering is more of a path property but SRTT can be inflated by self-inflicated congestion. The factor of 4 is borrowed from the delayed early retransmit and seems to work reasonably well. Since RACK is still experimental, it is now used as a supplemental loss detection on top of existing algorithms. It is only effective after the fast recovery starts or after the timeout occurs. The fast recovery is still triggered by FACK and/or dupack threshold instead of RACK. We introduce a new sysctl net.ipv4.tcp_recovery for future experiments of loss recoveries. For now RACK can be disabled by setting it to 0. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 07:00:53 -07:00
Yuchung Cheng	659a8ad56f	tcp: track the packet timings in RACK This patch is the first half of the RACK loss recovery. RACK loss recovery uses the notion of time instead of packet sequence (FACK) or counts (dupthresh). It's inspired by the previous FACK heuristic in tcp_mark_lost_retrans(): when a limited transmit (new data packet) is sacked, then current retransmitted sequence below the newly sacked sequence must been lost, since at least one round trip time has elapsed. But it has several limitations: 1) can't detect tail drops since it depends on limited transmit 2) is disabled upon reordering (assumes no reordering) 3) only enabled in fast recovery ut not timeout recovery RACK (Recently ACK) addresses these limitations with the notion of time instead: a packet P1 is lost if a later packet P2 is s/acked, as at least one round trip has passed. Since RACK cares about the time sequence instead of the data sequence of packets, it can detect tail drops when later retransmission is s/acked while FACK or dupthresh can't. For reordering RACK uses a dynamically adjusted reordering window ("reo_wnd") to reduce false positives on ever (small) degree of reordering. This patch implements tcp_advanced_rack() which tracks the most recent transmission time among the packets that have been delivered (ACKed or SACKed) in tp->rack.mstamp. This timestamp is the key to determine which packet has been lost. Consider an example that the sender sends six packets: T1: P1 (lost) T2: P2 T3: P3 T4: P4 T100: sack of P2. rack.mstamp = T2 T101: retransmit P1 T102: sack of P2,P3,P4. rack.mstamp = T4 T205: ACK of P4 since the hole is repaired. rack.mstamp = T101 We need to be careful about spurious retransmission because it may falsely advance tp->rack.mstamp by an RTT or an RTO, causing RACK to falsely mark all packets lost, just like a spurious timeout. We identify spurious retransmission by the ACK's TS echo value. If TS option is not applicable but the retransmission is acknowledged less than min-RTT ago, it is likely to be spurious. We refrain from using the transmission time of these spurious retransmissions. The second half is implemented in the next patch that marks packet lost using RACK timestamp. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 07:00:48 -07:00
Yuchung Cheng	77c631273d	tcp: add tcp_tsopt_ecr_before helper a helper to prepare the main RACK patch Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 07:00:45 -07:00
Yuchung Cheng	af82f4e848	tcp: remove tcp_mark_lost_retrans() Remove the existing lost retransmit detection because RACK subsumes it completely. This also stops the overloading the ack_seq field of the skb control block. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 07:00:44 -07:00
Yuchung Cheng	f672258391	tcp: track min RTT using windowed min-filter Kathleen Nichols' algorithm for tracking the minimum RTT of a data stream over some measurement window. It uses constant space and constant time per update. Yet it almost always delivers the same minimum as an implementation that has to keep all the data in the window. The measurement window is tunable via sysctl.net.ipv4.tcp_min_rtt_wlen with a default value of 5 minutes. The algorithm keeps track of the best, 2nd best & 3rd best min values, maintaining an invariant that the measurement time of the n'th best >= n-1'th best. It also makes sure that the three values are widely separated in the time window since that bounds the worse case error when that data is monotonically increasing over the window. Upon getting a new min, we can forget everything earlier because it has no value - the new min is less than everything else in the window by definition and it's the most recent. So we restart fresh on every new min and overwrites the 2nd & 3rd choices. The same property holds for the 2nd & 3rd best. Therefore we have to maintain two invariants to maximize the information in the samples, one on values (1st.v <= 2nd.v <= 3rd.v) and the other on times (now-win <=1st.t <= 2nd.t <= 3rd.t <= now). These invariants determine the structure of the code The RTT input to the windowed filter is the minimum RTT measured from ACK or SACK, or as the last resort from TCP timestamps. The accessor tcp_min_rtt() returns the minimum RTT seen in the window. ~0U indicates it is not available. The minimum is 1usec even if the true RTT is below that. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 07:00:43 -07:00
Yuchung Cheng	9e45a3e36b	tcp: apply Kern's check on RTTs used for congestion control Currently ca_seq_rtt_us does not use Kern's check. Fix that by checking if any packet acked is a retransmit, for both RTT used for RTT estimation and congestion control. Fixes: `5b08e47ca` ("tcp: prefer packet timing to TS-ECR for RTT") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 07:00:41 -07:00
Johan Hedberg	8ce783dc5e	Bluetooth: Fix missing hdev locking for LE scan cleanup The hci_conn objects don't have a dedicated lock themselves but rely on the caller to hold the hci_dev lock for most types of access. The hci_conn_timeout() function has so far sent certain HCI commands based on the hci_conn state which has been possible without holding the hci_dev lock. The recent changes to do LE scanning before connect attempts added even more operations to hci_conn and hci_dev from hci_conn_timeout, thereby exposing potential race conditions with the hci_dev and hci_conn states. As an example of such a race, here there's a timeout but an l2cap_sock_connect() call manages to race with the cleanup routine: [Oct21 08:14] l2cap_chan_timeout: chan ee4b12c0 state BT_CONNECT [ +0.000004] l2cap_chan_close: chan ee4b12c0 state BT_CONNECT [ +0.000002] l2cap_chan_del: chan ee4b12c0, conn f3141580, err 111, state BT_CONNECT [ +0.000002] l2cap_sock_teardown_cb: chan ee4b12c0 state BT_CONNECT [ +0.000005] l2cap_chan_put: chan ee4b12c0 orig refcnt 4 [ +0.000010] hci_conn_drop: hcon f53d56e0 orig refcnt 1 [ +0.000013] l2cap_chan_put: chan ee4b12c0 orig refcnt 3 [ +0.000063] hci_conn_timeout: hcon f53d56e0 state BT_CONNECT [ +0.000049] hci_conn_params_del: addr ee:0d:30:09:53:1f (type 1) [ +0.000002] hci_chan_list_flush: hcon f53d56e0 [ +0.000001] hci_chan_del: hci0 hcon f53d56e0 chan f4e7ccc0 [ +0.004528] l2cap_sock_create: sock e708fc00 [ +0.000023] l2cap_chan_create: chan ee4b1770 [ +0.000001] l2cap_chan_hold: chan ee4b1770 orig refcnt 1 [ +0.000002] l2cap_sock_init: sk ee4b3390 [ +0.000029] l2cap_sock_bind: sk ee4b3390 [ +0.000010] l2cap_sock_setsockopt: sk ee4b3390 [ +0.000037] l2cap_sock_connect: sk ee4b3390 [ +0.000002] l2cap_chan_connect: 00:02:72:d9:e5:8b -> ee:0d:30:09:53:1f (type 2) psm 0x00 [ +0.000002] hci_get_route: 00:02:72:d9:e5:8b -> ee:0d:30:09:53:1f [ +0.000001] hci_dev_hold: hci0 orig refcnt 8 [ +0.000003] hci_conn_hold: hcon f53d56e0 orig refcnt 0 Above the l2cap_chan_connect() shouldn't have been able to reach the hci_conn f53d56e0 anymore but since hci_conn_timeout didn't do proper locking that's not the case. The end result is a reference to hci_conn that's not in the conn_hash list, resulting in list corruption when trying to remove it later: [Oct21 08:15] l2cap_chan_timeout: chan ee4b1770 state BT_CONNECT [ +0.000004] l2cap_chan_close: chan ee4b1770 state BT_CONNECT [ +0.000003] l2cap_chan_del: chan ee4b1770, conn f3141580, err 111, state BT_CONNECT [ +0.000001] l2cap_sock_teardown_cb: chan ee4b1770 state BT_CONNECT [ +0.000005] l2cap_chan_put: chan ee4b1770 orig refcnt 4 [ +0.000002] hci_conn_drop: hcon f53d56e0 orig refcnt 1 [ +0.000015] l2cap_chan_put: chan ee4b1770 orig refcnt 3 [ +0.000038] hci_conn_timeout: hcon f53d56e0 state BT_CONNECT [ +0.000003] hci_chan_list_flush: hcon f53d56e0 [ +0.000002] hci_conn_hash_del: hci0 hcon f53d56e0 [ +0.000001] ------------[ cut here ]------------ [ +0.000461] WARNING: CPU: 0 PID: 1782 at lib/list_debug.c:56 __list_del_entry+0x3f/0x71() [ +0.000839] list_del corruption, f53d56e0->prev is LIST_POISON2 (00000200) The necessary fix is unfortunately more complicated than just adding hci_dev_lock/unlock calls to the hci_conn_timeout() call path. Particularly, the hci_conn_del() API, which expects the hci_dev lock to be held, performs a cancel_delayed_work_sync(&hcon->disc_work) which would lead to a deadlock if the hci_conn_timeout() call path tries to acquire the same lock. This patch solves the problem by deferring the cleanup work to a separate work callback. To protect against the hci_dev or hci_conn going away meanwhile temporary references are taken with the help of hci_dev_hold() and hci_conn_get(). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org # 4.3	2015-10-21 14:25:34 +02:00
Johannes Berg	e5a9f8d046	mac80211: move station statistics into sub-structs Group station statistics by where they're (mostly) updated (TX, RX and TX-status) and group them into sub-structs of the struct sta_info. Also rename the variables since the grouping now makes it obvious where they belong. This makes it easier to identify where the statistics are updated in the code, and thus easier to think about them. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-21 10:08:22 +02:00
Johannes Berg	976bd9efda	mac80211: move beacon_loss_count into ifmgd There's little point in keeping (and even sending to userspace) the beacon_loss_count value per station, since it can only apply to the AP on a managed-mode connection. Move the value to ifmgd, advertise it only in managed mode, and remove it from ethtool as it's available through better interfaces. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-21 10:08:21 +02:00
Johannes Berg	763aa27a29	mac80211: remove sta->last_ack_signal This file only feeds a debugfs file that isn't very useful, so remove it. If necessary, we can add other ways to get this information, for example in the NL80211_CMD_PROBE_CLIENT response. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-21 10:08:21 +02:00
Marcel Holtmann	98a63aaf24	Bluetooth: Introduce driver specific post init callback Some drivers might have to restore certain settings after the init procedure has been completed. This driver callback allows them to hook into that stage. This callback is run just before the controller is declared as powered up. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-21 07:30:53 +03:00
Dean Jenkins	9f7378a9d6	Bluetooth: l2cap_disconnection_req priority over shutdown There is a L2CAP protocol race between the local peer and the remote peer demanding disconnection of the L2CAP link. When L2CAP ERTM is used, l2cap_sock_shutdown() can be called from userland to disconnect L2CAP. However, there can be a delay introduced by waiting for ACKs. During this waiting period, the remote peer may have sent a Disconnection Request. Therefore, recheck the shutdown status of the socket after waiting for ACKs because there is no need to do further processing if the connection has gone. Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com> Signed-off-by: Harish Jenny K N <harish_kandiga@mentor.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:26 +02:00
Dean Jenkins	04ba72e6b2	Bluetooth: Reorganize mutex lock in l2cap_sock_shutdown() This commit reorganizes the mutex lock and is now only protecting l2cap_chan_close(). This is now consistent with other places where l2cap_chan_close() is called. If a conn connection exists, call mutex_lock(&conn->chan_lock) before calling l2cap_chan_close() to ensure other L2CAP protocol operations do not interfere. Note that the conn structure has to be protected from being freed as it is possible for the connection to be disconnected whilst the locks are not held. This solution allows the mutex lock to be used even when the connection has just been disconnected. This commit also reduces the scope of chan locking. The only place where chan locking is needed is the call to l2cap_chan_close(chan, 0) which if necessary closes the channel. Therefore, move the l2cap_chan_lock(chan) and l2cap_chan_lock(chan) locking calls to around l2cap_chan_close(chan, 0). This allows __l2cap_wait_ack(sk, chan) to be called with no chan locks being held so L2CAP messaging over the ACL link can be done unimpaired. Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com> Signed-off-by: Harish Jenny K N <harish_kandiga@mentor.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:26 +02:00
Dean Jenkins	e7456437c1	Bluetooth: Unwind l2cap_sock_shutdown() l2cap_sock_shutdown() is designed to only action shutdown of the channel when shutdown is not already in progress. Therefore, reorganise the code flow by adding a goto to jump to the end of function handling when shutdown is already being actioned. This removes one level of code indentation and make the code more readable. Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com> Signed-off-by: Harish Jenny K N <harish_kandiga@mentor.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:26 +02:00
Alexander Aring	09bf420f10	6lowpan: put mcast compression in an own function This patch moves the mcast compression algorithmn to an own function like all other compression/decompression methods in iphc. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:25 +02:00
Alexander Aring	b5af9bdbfe	6lowpan: rework tc and flow label handling This patch reworks the handling of compression/decompression of traffic class and flow label handling. The current method is hard to understand, also doesn't checks if we can read the buffer from skb length. I tried to put the shifting operations into static inline functions and comment each steps which I did there to make it hopefully somewhat more readable. The big mess to deal with that is the that the ipv6 header bring the order "DSCP + ECN" but iphc uses "ECN + DSCP". Additional the DCSP + ECN bits are splitted in ipv6_hdr inside the priority and flow_lbl[0] fields. I tested these compressions by using fakelb 802.15.4 driver and manipulate the tc and flow label fields manually in function "__ip6_local_out" before the skb will be send to lower layers. Then I looked up the tc and flow label fields in wireshark on a wpan and lowpan interface. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:25 +02:00
Alexander Aring	c8a3e7eb98	6lowpan: iphc: change define values This patch has the main goal to delete shift operations. Instead we doing masks and equals afterwards. E.g. for the SAM evaluation we masking only the SAM value which fits in iphc1 byte, then comparing with all possible SAM values over a switch case statement. We will not shifting the SAM value to somewhat readable anymore. Additional this patch slighty change the naming style like RFC 6282, e.g. TTL to HLIM and we will drop an errno now if CID flag is set, because we don't support it. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:25 +02:00
Alexander Aring	028b2a8c16	6lowpan: remove lowpan_is_addr_broadcast This macro is used at 802.15.4 6LoWPAN only and can be replaced by memcmp with the interface broadcast address. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:25 +02:00
Alexander Aring	6350047eb8	6lowpan: move IPHC functionality defines This patch removes the IPHC related defines for doing bit manipulation from global 6lowpan header to the iphc file which should the only one implementation which use these defines. Also move next header compression defines to their nhc implementation. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:25 +02:00
Alexander Aring	607b0bd3f2	6lowpan: nhc: move iphc manipulation out of nhc This patch moves the iphc setting of next header commpression bit inside iphc functionality. Setting of IPHC bits should be happen at iphc.c file only. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:25 +02:00
Alexander Aring	478208e3b9	6lowpan: remove lowpan_fetch_skb_u8 This patch removes the lowpan_fetch_skb_u8 function for getting the iphc bytes. Instead we using the generic which has a len parameter to tell the amount of bytes to fetch. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:25 +02:00
Alexander Aring	8911d7748c	6lowpan: cleanup lowpan_header_decompress This patch changes the lowpan_header_decompress function by removing inklayer related information from parameters. This is currently for supporting short and extended address for iphc handling in 802154. We don't support short address handling anyway right now, but there exists already code for handling short addresses in lowpan_header_decompress. The address parameters are also changed to a void pointer, so 6LoWPAN linklayer specific code can put complex structures as these parameters and cast it again inside the generic code by evaluating linklayer type before. The order is also changed by destination address at first and then source address, which is the same like all others functions where destination is always the first, memcpy, dev_hard_header, lowpan_header_compress, etc. This patch also moves the fetching of iphc values from 6LoWPAN linklayer specific code into the generic branch. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:24 +02:00
Alexander Aring	a6f773891a	6lowpan: cleanup lowpan_header_compress This patch changes the lowpan_header_compress function by removing unused parameters like "len" and drop static value parameters of protocol type. Instead we really check the protocol type inside inside the skb structure. Also we drop the use of IEEE802154_ADDR_LEN which is link-layer specific. Instead we using EUI64_ADDR_LEN which should always the default case for now. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:24 +02:00
Alexander Aring	bf513fd6fc	6lowpan: introduce LOWPAN_IPHC_MAX_HC_BUF_LEN This patch introduces the LOWPAN_IPHC_MAX_HC_BUF_LEN define which represent the worst-case supported IPHC buffer length. It's used to allocate the stack buffer space for creating the IPHC header. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:24 +02:00
Alexander Aring	cefdb801c8	bluetooth: 6lowpan: use lowpan dispatch helpers This patch adds a check if the dataroom of skb contains a dispatch value by checking if skb->len != 0. This patch also change the dispatch evaluation by the recently introduced helpers for checking the common 6LoWPAN dispatch values for IPv6 and IPHC header. There was also a forgotten else branch which should drop the packet if no matching dispatch is available. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:24 +02:00
Alexander Aring	71cd2aa53d	mac802154: llsec: use kzfree This patch will use kzfree instead kfree for security related information which can be offered by acccident. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:24 +02:00
Johan Hedberg	a6ad2a6b9c	Bluetooth: Fix removing connection parameters when unpairing The commit `89cbb0638e` introduced support for deferred connection parameter removal when unpairing by removing them only once an existing connection gets disconnected. However, it failed to address the scenario when we're not connected and do an unpair operation. What makes things worse is that most user space BlueZ versions will first issue a disconnect request and only then unpair, meaning the buggy code will be triggered every time. This effectively causes the kernel to resume scanning and reconnect to a device for which we've removed all keys and GATT database information. This patch fixes the issue by adding the missing call to the hci_conn_params_del() function to a branch which handles the case of no existing connection. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org # 3.19+	2015-10-21 00:49:24 +02:00
Marcel Holtmann	e131d74a3a	Bluetooth: Add support setup stage internal notification event Before the vendor specific setup stage is triggered call back into the core to trigger an internal notification event. That event is used to send an index update to the monitor interface. With that specific event it is possible to update userspace with manufacturer information before any HCI command has been executed. This is useful for early stage debugging of vendor specific initialization sequences. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-21 00:49:23 +02:00
David Herrmann	660f0fc07d	Bluetooth: hidp: fix device disconnect on idle timeout The HIDP specs define an idle-timeout which automatically disconnects a device. This has always been implemented in the HIDP layer and forced a synchronous shutdown of the hidp-scheduler. This works just fine, but lacks a forced disconnect on the underlying l2cap channels. This has been broken since: commit `5205185d46` Author: David Herrmann <dh.herrmann@gmail.com> Date: Sat Apr 6 20:28:47 2013 +0200 Bluetooth: hidp: remove old session-management The old session-management always forced an l2cap error on the ctrl/intr channels when shutting down. The new session-management skips this, as we don't want to enforce channel policy on the caller. In other words, if user-space removes an HIDP device, the underlying channels (which are owned and referenced by user-space) are still left active. User-space needs to call shutdown(2) or close(2) to release them. Unfortunately, this does not work with idle-timeouts. There is no way to signal user-space that the HIDP layer has been stopped. The API simply does not support any event-passing except for poll(2). Hence, we restore old behavior and force EUNATCH on the sockets if the HIDP layer is disconnected due to idle-timeouts (behavior of explicit disconnects remains unmodified). User-space can still call getsockopt(..., SO_ERROR, ...) ..to retrieve the EUNATCH error and clear sk_err. Hence, the channels can still be re-used (which nobody does so far, though). Therefore, the API still supports the new behavior, but with this patch it's also compatible to the old implicit channel shutdown. Cc: <stable@vger.kernel.org> # 3.10+ Reported-by: Mark Haun <haunma@keteu.org> Reported-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:23 +02:00
Marcel Holtmann	7e995b9ead	Bluetooth: Add new quirk for non-persistent diagnostic settings If the diagnostic settings are not persistent over HCI Reset, then this quirk can be used to tell the Bluetoth core about it. This will ensure that the settings are programmed correctly when the controller is powered up. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-21 00:49:22 +02:00
Johan Hedberg	cad20c2780	Bluetooth: Don't use remote address type to decide IRK persistency There are LE devices on the market that start off by announcing their public address and then once paired switch to using private address. To be interoperable with such devices we should simply trust the fact that we're receiving an IRK from them to indicate that they may use private addresses in the future. Instead, simply tie the persistency to the bonding/no-bonding information the same way as for LTKs and CSRKs. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:21 +02:00
Marcel Holtmann	581d6fd60f	Bluetooth: Queue diagnostic messages together with HCI packets Sending diagnostic messages directly to the monitor socket might cause issues for devices processing their messages in interrupt context. So instead of trying to directly forward them, queue them up with the other HCI packets and lets them be processed by the sockets at the same time. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-21 00:49:21 +02:00
Marcel Holtmann	bb77543ebd	Bluetooth: Restrict valid packet types via HCI_CHANNEL_RAW When using the HCI_CHANNEL_RAW, restrict the packet types to valid ones from the Bluetooth specification. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-21 00:49:21 +02:00
Marcel Holtmann	8cd4f58142	Bluetooth: Remove quirk for HCI_VENDOR_PKT filter handling The HCI_VENDOR_PKT quirk was needed for BPA-100/105 devices that send these messages. Now that there is support for proper diagnostic channel this quirk is no longer needed. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-21 00:49:21 +02:00
David S. Miller	26440c835f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/asix_common.c net/ipv4/inet_connection_sock.c net/switchdev/switchdev.c In the inet_connection_sock.c case the request socket hashing scheme is completely different in net-next. The other two conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-20 06:08:27 -07:00
Julia Lawall	9abebb8a10	NFC: delete null dereference The exit label performs device_unlock(&dev->dev);, which will fail when dev is NULL, and nfc_put_device(dev);, which is not useful when dev is NULL, so just exit the function immediately. Problem found using scripts/coccinelle/null/deref_null.cocci Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2015-10-19 20:10:37 +02:00
Linus Torvalds	1099f86044	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Account for extra headroom in ath9k driver, from Felix Fietkau. 2) Fix OOPS in pppoe driver due to incorrect socket state transition, from Guillaume Nault. 3) Kill memory leak in amd-xgbe debugfx, from Geliang Tang. 4) Power management fixes for iwlwifi, from Johannes Berg. 5) Fix races in reqsk_queue_unlink(), from Eric Dumazet. 6) Fix dst_entry usage in ARP replies, from Jiri Benc. 7) Cure OOPSes with SO_GET_FILTER, from Daniel Borkmann. 8) Missing allocation failure check in amd-xgbe, from Tom Lendacky. 9) Various resource allocation/freeing cures in DSA< from Neil Armstrong. 10) A series of bug fixes in the openvswitch conntrack support, from Joe Stringer. 11) Fix two cases (BPF and act_mirred) where we have to clean the sender cpu stored in the SKB before transmitting. From WANG Cong and Alexei Starovoitov. 12) Disable VLAN filtering in promiscuous mode in mlx5 driver, from Achiad Shochat. 13) Older bnx2x chips cannot do 4-tuple UDP hashing, so prevent this configuration via ethtool. From Yuval Mintz. 14) Don't call rt6_uncached_list_flush_dev() from rt6_ifdown() when 'dev' is NULL, from Eric Biederman. 15) Prevent stalled link synchronization in tipc, from Jon Paul Maloy. 16) kcalloc() gstrings ethtool buffer before having driver fill it in, in order to prevent kernel memory leaking. From Joe Perches. 17) Fix mixxing rt6_info initialization for blackhole routes, from Martin KaFai Lau. 18) Kill VLAN regression in via-rhine, from Andrej Ota. 19) Missing pfmemalloc check in sk_add_backlog(), from Eric Dumazet. 20) Fix spurious MSG_TRUNC signalling in netlink dumps, from Ronen Arad. 21) Scrube SKBs when pushing them between namespaces in openvswitch, from Joe Stringer. 22) bcmgenet enables link interrupts too early, fix from Florian Fainelli. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (92 commits) net: bcmgenet: Fix early link interrupt enabling tunnels: Don't require remote endpoint or ID during creation. openvswitch: Scrub skb between namespaces xen-netback: correctly check failed allocation net: asix: add support for the Billionton GUSB2AM-1G-B USB adapter netlink: Trim skb to alloc size to avoid MSG_TRUNC net: add pfmemalloc check in sk_add_backlog() via-rhine: fix VLAN receive handling regression. ipv6: Initialize rt6_info properly in ip6_blackhole_route() ipv6: Move common init code for rt6_info to a new function rt6_info_init() Bluetooth: Fix initializing conn_params in scan phase Bluetooth: Fix conn_params list update in hci_connect_le_scan_cleanup Bluetooth: Fix remove_device behavior for explicit connects Bluetooth: Fix LE reconnection logic Bluetooth: Fix reference counting for LE-scan based connections Bluetooth: Fix double scan updates mlxsw: core: Fix race condition in __mlxsw_emad_transmit tipc: move fragment importance field to new header position ethtool: Use kcalloc instead of kmalloc for ethtool_get_strings tipc: eliminate risk of stalled link synchronization ...	2015-10-19 09:55:40 -07:00
Steffen Klassert	ca064bd893	xfrm: Fix pmtu discovery for local generated packets. Commit `044a832a77` ("xfrm: Fix local error reporting crash with interfamily tunnels") moved the setting of skb->protocol behind the last access of the inner mode family to fix an interfamily crash. Unfortunately now skb->protocol might not be set at all, so we fail dispatch to the inner address family. As a reault, the local error handler is not called and the mtu value is not reported back to userspace. We fix this by setting skb->protocol on message size errors before we call xfrm_local_error. Fixes: `044a832a77` ("xfrm: Fix local error reporting crash with interfamily tunnels") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-10-19 10:30:05 +02:00
David S. Miller	371f1c7e0d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree. Most relevantly, updates for the nfnetlink_log to integrate with conntrack, fixes for cttimeout and improvements for nf_queue core, they are: 1) Remove useless ifdef around static inline function in IPVS, from Eric W. Biederman. 2) Simplify the conntrack support for nfnetlink_queue: Merge nfnetlink_queue_ct.c file into nfnetlink_queue_core.c, then rename it back to nfnetlink_queue.c 3) Use y2038 safe timestamp from nfnetlink_queue. 4) Get rid of dead function definition in nf_conntrack, from Flavio Leitner. 5) Attach conntrack support for nfnetlink_log.c, from Ken-ichirou MATSUZAWA. This adds a new NETFILTER_NETLINK_GLUE_CT Kconfig switch that controls enabling both nfqueue and nflog integration with conntrack. The userspace application can request this via NFULNL_CFG_F_CONNTRACK configuration flag. 6) Remove unused netns variables in IPVS, from Eric W. Biederman and Simon Horman. 7) Don't put back the refcount on the cttimeout object from xt_CT on success. 8) Fix crash on cttimeout policy object removal. We have to flush out the cttimeout extension area of the conntrack not to refer to an unexisting object that was just removed. 9) Make sure rcu_callback completion before removing nfnetlink_cttimeout module removal. 10) Fix compilation warning in br_netfilter when no nf_defrag_ipv4 and nf_defrag_ipv6 are enabled. Patch from Arnd Bergmann. 11) Autoload ctnetlink dependencies when NFULNL_CFG_F_CONNTRACK is requested. Again from Ken-ichirou MATSUZAWA. 12) Don't use pointer to previous hook when reinjecting traffic via nf_queue with NF_REPEAT verdict since it may be already gone. This also avoids a deadloop if the userspace application keeps returning NF_REPEAT. 13) A bunch of cleanups for netfilter IPv4 and IPv6 code from Ian Morris. 14) Consolidate logger instance existence check in nfulnl_recv_config(). 15) Fix broken atomicity when applying configuration updates to logger instances in nfnetlink_log. 16) Get rid of the .owner attribute in our hook object. We don't need this anymore since we're dropping pending packets that have escaped from the kernel when unremoving the hook. Patch from Florian Westphal. 17) Remove unnecessary rcu_read_lock() from nf_reinject code, we always assume RCU read side lock from .call_rcu in nfnetlink. Also from Florian. 18) Use static inline function instead of macros to define NF_HOOK() and NF_HOOK_COND() when no netfilter support in on, from Arnd Bergmann. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-18 22:48:34 -07:00
santosh.shilimkar@oracle.com	7b4b000951	RDS: fix rds-ping deadlock over TCP transport Sowmini found hang with rds-ping while testing RDS over TCP. Its a corner case and doesn't happen always. The issue is not reproducible with IB transport. Its clear from below dump why we see it with RDS TCP. [<ffffffff8153b7e5>] do_tcp_setsockopt+0xb5/0x740 [<ffffffff8153bec4>] tcp_setsockopt+0x24/0x30 [<ffffffff814d57d4>] sock_common_setsockopt+0x14/0x20 [<ffffffffa096071d>] rds_tcp_xmit_prepare+0x5d/0x70 [rds_tcp] [<ffffffffa093b5f7>] rds_send_xmit+0xd7/0x740 [rds] [<ffffffffa093bda2>] rds_send_pong+0x142/0x180 [rds] [<ffffffffa0939d34>] rds_recv_incoming+0x274/0x330 [rds] [<ffffffff810815ae>] ? ttwu_queue+0x11e/0x130 [<ffffffff814dcacd>] ? skb_copy_bits+0x6d/0x2c0 [<ffffffffa0960350>] rds_tcp_data_recv+0x2f0/0x3d0 [rds_tcp] [<ffffffff8153d836>] tcp_read_sock+0x96/0x1c0 [<ffffffffa0960060>] ? rds_tcp_recv_init+0x40/0x40 [rds_tcp] [<ffffffff814d6a90>] ? sock_def_write_space+0xa0/0xa0 [<ffffffffa09604d1>] rds_tcp_data_ready+0xa1/0xf0 [rds_tcp] [<ffffffff81545249>] tcp_data_queue+0x379/0x5b0 [<ffffffffa0960cdb>] ? rds_tcp_write_space+0xbb/0x110 [rds_tcp] [<ffffffff81547fd2>] tcp_rcv_established+0x2e2/0x6e0 [<ffffffff81552602>] tcp_v4_do_rcv+0x122/0x220 [<ffffffff81553627>] tcp_v4_rcv+0x867/0x880 [<ffffffff8152e0b3>] ip_local_deliver_finish+0xa3/0x220 This happens because rds_send_xmit() chain wants to take sock_lock which is already taken by tcp_v4_rcv() on its way to rds_tcp_data_ready(). Commit `db6526dcb5` ("RDS: use rds_send_xmit() state instead of RDS_LL_SEND_FULL") which was trying to opportunistically finish the send request in same thread context. But because of above recursive lock hang with RDS TCP, the send work from rds_send_pong() needs to deferred to worker to avoid lock up. Given RDS ping is more of connectivity test than performance critical path, its should be ok even for transport like IB. Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-18 22:45:55 -07:00
Eric Dumazet	dc6ef6be52	tcp: do not set queue_mapping on SYNACK At the time of commit `fff3269907` ("tcp: reflect SYN queue_mapping into SYNACK packets") we had little ways to cope with SYN floods. We no longer need to reflect incoming skb queue mappings, and instead can pick a TX queue based on cpu cooking the SYNACK, with normal XPS affinities. Note that all SYNACK retransmits were picking TX queue 0, this no longer is a win given that SYNACK rtx are now distributed on all cpus. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-18 22:26:02 -07:00
Joe Stringer	740dbc2891	openvswitch: Scrub skb between namespaces If OVS receives a packet from another namespace, then the packet should be scrubbed. However, people have already begun to rely on the behaviour that skb->mark is preserved across namespaces, so retain this one field. This is mainly to address information leakage between namespaces when using OVS internal ports, but by placing it in ovs_vport_receive() it is more generally applicable, meaning it should not be overlooked if other port types are allowed to be moved into namespaces in future. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-18 22:24:50 -07:00
David S. Miller	a5d6f7dd30	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Johan Hedberg says: ==================== pull request: bluetooth 2015-10-16 First of all, sorry for the late set of patches for the 4.3 cycle. We just finished an intensive week of testing at the Bluetooth UnPlugFest and discovered (and fixed) issues there. Unfortunately a few issues affect 4.3-rc5 in a way that they break existing Bluetooth LE mouse and keyboard support. The regressions result from supporting LE privacy in conjunction with scanning for Resolvable Private Addresses before connecting. A feature that has been tested heavily (including automated unit tests), but sadly some regressions slipped in. The UnPlugFest with its multitude of test platforms is a good battle testing ground for uncovering every corner case. The patches in this pull request focus only on fixing the regressions in 4.3-rc5. The patches look a bit larger since we also added comments in the critical sections of the fixes to improve clarity. I would appreciate if we can get these regression fixes to Linus quickly. Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-18 22:23:33 -07:00
Arad, Ronen	db65a3aaf2	netlink: Trim skb to alloc size to avoid MSG_TRUNC netlink_dump() allocates skb based on the calculated min_dump_alloc or a per socket max_recvmsg_len. min_alloc_size is maximum space required for any single netdev attributes as calculated by rtnl_calcit(). max_recvmsg_len tracks the user provided buffer to netlink_recvmsg. It is capped at 16KiB. The intention is to avoid small allocations and to minimize the number of calls required to obtain dump information for all net devices. netlink_dump packs as many small messages as could fit within an skb that was sized for the largest single netdev information. The actual space available within an skb is larger than what is requested. It could be much larger and up to near 2x with align to next power of 2 approach. Allowing netlink_dump to use all the space available within the allocated skb increases the buffer size a user has to provide to avoid truncaion (i.e. MSG_TRUNG flag set). It was observed that with many VLANs configured on at least one netdev, a larger buffer of near 64KiB was necessary to avoid "Message truncated" error in "ip link" or "bridge [-c[ompressvlans]] vlan show" when min_alloc_size was only little over 32KiB. This patch trims skb to allocated size in order to allow the user to avoid truncation with more reasonable buffer size. Signed-off-by: Ronen Arad <ronen.arad@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-18 19:34:12 -07:00
Li RongQing	26fb342c73	ipconfig: send Client-identifier in DHCP requests A dhcp server may provide parameters to a client from a pool of IP addresses and using a shared rootfs, or provide a specific set of parameters for a specific client, usually using the MAC address to identify each client individually. The dhcp protocol also specifies a client-id field which can be used to determine the correct parameters to supply when no MAC address is available. There is currently no way to tell the kernel to supply a specific client-id, only the userspace dhcp clients support this feature, but this can not be used when the network is needed before userspace is available such as when the root filesystem is on NFS. This patch is to be able to do something like "ip=dhcp,client_id_type, client_id_value", as a kernel parameter to enable the kernel to identify itself to the server. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-18 19:23:52 -07:00
Peter Hurley	fef062cbf2	tty: Remove ASYNC_CLOSING checks in open()/hangup() methods Since at least before 2.6.30, tty drivers that do not drop the tty lock while closing cannot observe ASYNC_CLOSING set while holding the tty lock; this includes the tty driver's open() and hangup() methods, since the tty core calls these methods holding the tty lock. For these drivers, waiting for ASYNC_CLOSING to clear while opening is not required, since this condition cannot occur. Similarly, even when the open() method drops and reacquires the tty lock after blocking, ASYNC_CLOSING cannot be set (again, for drivers that do not drop the tty lock while closing). Now that tty port drivers no longer drop the tty lock while closing (since 'tty: Remove tty_wait_until_sent_from_close()'), the same conditions apply: waiting for ASYNC_CLOSING to clear while opening is not required, nor is re-checking ASYNC_CLOSING after dropping and reacquiring the tty lock while blocking (eg., in *_block_til_ready()). Note: The ASYNC_CLOSING flag state is still maintained since several bitrotting drivers use it for (dubious) other purposes. Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-10-17 21:11:29 -07:00
Pablo Neira Ayuso	f0a0a978b6	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next This merge resolves conflicts with `75aec9df3a` ("bridge: Remove br_nf_push_frag_xmit_sk") as part of Eric Biederman's effort to improve netns support in the network stack that reached upstream via David's net-next tree. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Conflicts: net/bridge/br_netfilter_hooks.c	2015-10-17 14:28:03 +02:00
Nikolay Borisov	00db674bed	netfilter: ipset: Fix sleeping memory allocation in atomic context Commit `00590fdd5b` introduced RCU locking in list type and in doing so introduced a memory allocation in list_set_add, which is done in an atomic context, due to the fact that ipset rcu list modifications are serialised with a spin lock. The reason why we can't use a mutex is that in addition to modifying the list with ipset commands, it's also being modified when a particular ipset rule timeout expires aka garbage collection. This gc is triggered from set_cleanup_entries, which in turn is invoked from a timer thus requiring the lock to be bh-safe. Concretely the following call chain can lead to "sleeping function called in atomic context" splat: call_ad -> list_set_uadt -> list_set_uadd -> kzalloc(, GFP_KERNEL). And since GFP_KERNEL allows initiating direct reclaim thus potentially sleeping in the allocation path. To fix the issue change the allocation type to GFP_ATOMIC, to correctly reflect that it is occuring in an atomic context. Fixes: `00590fdd5b` ("netfilter: ipset: Introduce RCU locking in list type") Signed-off-by: Nikolay Borisov <kernel@kyup.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-17 13:01:24 +02:00
Linus Torvalds	59bcce1216	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "Just two small items from Ilya: The first patch fixes the RBD readahead to grab full objects. The second fixes the write ops to prevent undue promotion when a cache tier is configured on the server side" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: use writefull op for object size writes rbd: set max_sectors explicitly	2015-10-16 12:47:02 -07:00
Ian Morris	c8d71d08aa	netfilter: ipv4: whitespace around operators This patch cleanses whitespace around arithmetical operators. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-16 19:19:23 +02:00
Ian Morris	24cebe3f29	netfilter: ipv4: code indentation Use tabs instead of spaces to indent code. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-16 19:19:15 +02:00
Ian Morris	6c28255b46	netfilter: ipv4: function definition layout Use tabs instead of spaces to indent second line of parameters in function definitions. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-16 19:19:10 +02:00
Ian Morris	27951a0168	netfilter: ipv4: ternary operator layout Correct whitespace layout of ternary operators in the netfilter-ipv4 code. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-16 19:19:04 +02:00
Ian Morris	19f0a60201	netfilter: ipv4: label placement Whitespace cleansing: Labels should not be indented. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-16 19:17:41 +02:00
Arnd Bergmann	008027c31d	netfilter: turn NF_HOOK into an inline function A recent change to the dst_output handling caused a new warning when the call to NF_HOOK() is the only used of a local variable passed as 'dev', and CONFIG_NETFILTER is disabled: net/ipv6/ip6_output.c: In function 'ip6_output': net/ipv6/ip6_output.c:135:21: warning: unused variable 'dev' [-Wunused-variable] The reason for this is that the NF_HOOK macro in this case does not reference the variable at all, and the call to dev_net(dev) got removed from the ip6_output function. To avoid that warning now and in the future, this changes the macro into an equivalent inline function, which tells the compiler that the variable is passed correctly but still unused. The dn_forward function apparently had the same problem in the past and added a local workaround that no longer works with the inline function. In order to avoid a regression, we have to also remove the #ifdef from decnet in the same patch. Fixes: `ede2059dba` ("dst: Pass net into dst->output") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-16 18:45:36 +02:00
Florian Westphal	81b4325eba	netfilter: nf_queue: remove rcu_read_lock calls All verdict handlers make use of the nfnetlink .call_rcu callback so rcu readlock is already held. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-16 18:22:41 +02:00
Florian Westphal	ed78d09d59	netfilter: make nf_queue_entry_get_refs return void We don't care if module is being unloaded anymore since hook unregister handling will destroy queue entries using that hook. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-16 18:22:23 +02:00
Florian Westphal	2ffbceb2b0	netfilter: remove hook owner refcounting since commit `8405a8fff3` ("netfilter: nf_qeueue: Drop queue entries on nf_unregister_hook") all pending queued entries are discarded. So we can simply remove all of the owner handling -- when module is removed it also needs to unregister all its hooks. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-16 18:21:39 +02:00
Ilya Dryomov	e30b7577bf	rbd: use writefull op for object size writes This covers only the simplest case - an object size sized write, but it's still useful in tiering setups when EC is used for the base tier as writefull op can be proxied, saving an object promotion. Even though updating ceph_osdc_new_request() to allow writefull should just be a matter of fixing an assert, I didn't do it because its only user is cephfs. All other sites were updated. Reflects ceph.git commit 7bfb7f9025a8ee0d2305f49bf0336d2424da5b5b. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>	2015-10-16 16:49:01 +02:00
Jiri Pirko	573c7ba006	net: introduce pre-change upper device notifier This newly introduced netdevice notifier is called before actual change upper happens. That provides a possibility for notifier handlers to know upper change will happen and react to it, including possibility to forbid the change. That is valuable for drivers which can check if the upper device linkage is supported and forbid that in case it is not. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-16 07:15:05 -07:00
David Ahern	51161aa98d	net: Fix suspicious RCU usage in fib_rebalance This command: ip route add 192.168.1.0/24 nexthop via 10.2.1.5 dev eth1 nexthop via 10.2.2.5 dev eth2 generated this suspicious RCU usage message: [ 63.249262] [ 63.249939] =============================== [ 63.251571] [ INFO: suspicious RCU usage. ] [ 63.253250] 4.3.0-rc3+ #298 Not tainted [ 63.254724] ------------------------------- [ 63.256401] ../include/linux/inetdevice.h:205 suspicious rcu_dereference_check() usage! [ 63.259450] [ 63.259450] other info that might help us debug this: [ 63.259450] [ 63.262297] [ 63.262297] rcu_scheduler_active = 1, debug_locks = 1 [ 63.264647] 1 lock held by ip/2870: [ 63.265896] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff813ebfb7>] rtnl_lock+0x12/0x14 [ 63.268858] [ 63.268858] stack backtrace: [ 63.270409] CPU: 4 PID: 2870 Comm: ip Not tainted 4.3.0-rc3+ #298 [ 63.272478] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 63.275745] 0000000000000001 ffff8800b8c9f8b8 ffffffff8125f73c ffff88013afcf301 [ 63.278185] ffff8800bab7a380 ffff8800b8c9f8e8 ffffffff8107bf30 ffff8800bb728000 [ 63.280634] ffff880139fe9a60 0000000000000000 ffff880139fe9a00 ffff8800b8c9f908 [ 63.283177] Call Trace: [ 63.283959] [<ffffffff8125f73c>] dump_stack+0x4c/0x68 [ 63.285593] [<ffffffff8107bf30>] lockdep_rcu_suspicious+0xfa/0x103 [ 63.287500] [<ffffffff8144d752>] __in_dev_get_rcu+0x48/0x4f [ 63.289169] [<ffffffff8144d797>] fib_rebalance+0x3e/0x127 [ 63.290753] [<ffffffff8144d986>] ? rcu_read_unlock+0x3e/0x5f [ 63.292442] [<ffffffff8144ea45>] fib_create_info+0xaf9/0xdcc [ 63.294093] [<ffffffff8106c12f>] ? sched_clock_local+0x12/0x75 [ 63.295791] [<ffffffff8145236a>] fib_table_insert+0x8c/0x451 [ 63.297493] [<ffffffff8144bf9c>] ? fib_get_table+0x36/0x43 [ 63.299109] [<ffffffff8144c3ca>] inet_rtm_newroute+0x43/0x51 [ 63.300709] [<ffffffff813ef684>] rtnetlink_rcv_msg+0x182/0x195 [ 63.302334] [<ffffffff8107d04c>] ? trace_hardirqs_on+0xd/0xf [ 63.303888] [<ffffffff813ebfb7>] ? rtnl_lock+0x12/0x14 [ 63.305346] [<ffffffff813ef502>] ? __rtnl_unlock+0x12/0x12 [ 63.306878] [<ffffffff81407c4c>] netlink_rcv_skb+0x3d/0x90 [ 63.308437] [<ffffffff813ec00e>] rtnetlink_rcv+0x21/0x28 [ 63.309916] [<ffffffff81407742>] netlink_unicast+0xfa/0x17f [ 63.311447] [<ffffffff81407a5e>] netlink_sendmsg+0x297/0x2dc [ 63.313029] [<ffffffff813c6cd4>] sock_sendmsg_nosec+0x12/0x1d [ 63.314597] [<ffffffff813c835b>] ___sys_sendmsg+0x196/0x21b [ 63.316125] [<ffffffff8100bf9f>] ? native_sched_clock+0x1f/0x3c [ 63.317671] [<ffffffff8106c12f>] ? sched_clock_local+0x12/0x75 [ 63.319185] [<ffffffff8106c397>] ? sched_clock_cpu+0x9d/0xb6 [ 63.320693] [<ffffffff8107e2d7>] ? __lock_is_held+0x32/0x54 [ 63.322145] [<ffffffff81159fcb>] ? __fget_light+0x4b/0x77 [ 63.323541] [<ffffffff813c8726>] __sys_sendmsg+0x3d/0x5b [ 63.324947] [<ffffffff813c8751>] SyS_sendmsg+0xd/0x19 [ 63.326274] [<ffffffff814c8f57>] entry_SYSCALL_64_fastpath+0x12/0x6f It looks like all of the code paths to fib_rebalance are under rtnl. Fixes: `0e884c78ee` ("ipv4: L3 hash-based multipath") Cc: Peter Nørlund <pch@ordbogen.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-16 00:57:55 -07:00
Eric Dumazet	ebb516af60	tcp/dccp: fix race at listener dismantle phase Under stress, a close() on a listener can trigger the WARN_ON(sk->sk_ack_backlog) in inet_csk_listen_stop() We need to test if listener is still active before queueing a child in inet_csk_reqsk_queue_add() Create a common inet_child_forget() helper, and use it from inet_csk_reqsk_queue_add() and inet_csk_listen_stop() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-16 00:52:19 -07:00
Eric Dumazet	f03f2e154f	tcp/dccp: add inet_csk_reqsk_queue_drop_and_put() helper Let's reduce the confusion about inet_csk_reqsk_queue_drop() : In many cases we also need to release reference on request socket, so add a helper to do this, reducing code size and complexity. Fixes: `4bdc3d6614` ("tcp/dccp: fix behavior of stale SYN_RECV request sockets") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-16 00:52:18 -07:00
Eric Dumazet	ef84d8ce5a	Revert "inet: fix double request socket freeing" This reverts commit `c69736696c`. At the time of above commit, tcp_req_err() and dccp_req_err() were dead code, as SYN_RECV request sockets were not yet in ehash table. Real bug was fixed later in a different commit. We need to revert to not leak a refcount on request socket. inet_csk_reqsk_queue_drop_and_put() will be added in following commit to make clean inet_csk_reqsk_queue_drop() does not release the reference owned by caller. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-16 00:52:17 -07:00
Martin KaFai Lau	0a1f596200	ipv6: Initialize rt6_info properly in ip6_blackhole_route() ip6_blackhole_route() does not initialize the newly allocated rt6_info properly. This patch: 1. Call rt6_info_init() to initialize rt6i_siblings and rt6i_uncached 2. The current rt->dst._metrics init code is incorrect: - 'rt->dst._metrics = ort->dst._metris' is not always safe - Not sure what dst_copy_metrics() is trying to do here considering ip6_rt_blackhole_cow_metrics() always returns NULL Fix: - Always do dst_copy_metrics() - Replace ip6_rt_blackhole_cow_metrics() with dst_cow_metrics_generic() 3. Mask out the RTF_PCPU bit from the newly allocated blackhole route. This bug triggers an oops (reported by Phil Sutter) in rt6_get_cookie(). It is because RTF_PCPU is set while rt->dst.from is NULL. Fixes: `d52d3997f8` ("ipv6: Create percpu rt6_info") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reported-by: Phil Sutter <phil@nwl.cc> Tested-by: Phil Sutter <phil@nwl.cc> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Julian Anastasov <ja@ssi.bg> Cc: Phil Sutter <phil@nwl.cc> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-16 00:39:16 -07:00
Martin KaFai Lau	ebfa45f0d9	ipv6: Move common init code for rt6_info to a new function rt6_info_init() Introduce rt6_info_init() to do the common init work for 'struct rt6_info' (after calling dst_alloc). It is a prep work to fix the rt6_info init logic in the ip6_blackhole_route(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Julian Anastasov <ja@ssi.bg> Cc: Phil Sutter <phil@nwl.cc> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-16 00:39:14 -07:00
Jakub Pawlowski	5157b8a503	Bluetooth: Fix initializing conn_params in scan phase This patch makes sure that conn_params that were created just for explicit_connect, will get properly deleted during cleanup. Signed-off-by: Jakub Pawlowski <jpawlowski@google.com> Acked-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-16 09:24:41 +02:00
Johan Hedberg	9ad3e6ffe1	Bluetooth: Fix conn_params list update in hci_connect_le_scan_cleanup After clearing the params->explicit_connect variable the parameters may need to be either added back to the right list or potentially left absent from both the le_reports and the le_conns lists. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-16 09:24:41 +02:00
Johan Hedberg	679d2b6f9d	Bluetooth: Fix remove_device behavior for explicit connects Devices undergoing an explicit connect should not have their conn_params struct removed by the mgmt Remove Device command. This patch fixes the necessary checks in the command handler to correct the behavior. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-16 09:24:41 +02:00
Johan Hedberg	49c509220d	Bluetooth: Fix LE reconnection logic We can't use hci_explicit_connect_lookup() since that would only cover explicit connections, leaving normal reconnections completely untouched. Not using it in turn means leaving out entries in pend_le_reports. To fix this and simplify the logic move conn params from the reports list to the pend_le_conns list for the duration of an explicit connect. Once the connect is complete move the params back to the pend_le_reports list. This also means that the explicit connect lookup function only needs to look into the pend_le_conns list. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-16 09:24:41 +02:00
Johan Hedberg	b958f9a3e8	Bluetooth: Fix reference counting for LE-scan based connections The code should never directly call hci_conn_hash_del since many cleanup & reference counting updates would be lost. Normally hci_conn_del is the right thing to do, but in the case of a connection doing LE scanning this could cause a deadlock due to doing a cancel_delayed_work_sync() on the same work callback that we were called from. Connections in the LE scanning state actually need very little cleanup - just a small subset of hci_conn_del. To solve the issue, refactor out these essential pieces into a new hci_conn_cleanup() function and call that from the two necessary places. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-16 09:24:41 +02:00
Jakub Pawlowski	168b8a25c0	Bluetooth: Fix double scan updates When disable/enable scan command is issued twice, some controllers will return an error for the second request, i.e. requests with this command will fail on some controllers, and succeed on others. This patch makes sure that unnecessary scan disable/enable commands are not issued. When adding device to the auto connect whitelist when there is pending connect attempt, there is no need to update scan. hci_connect_le_scan_cleanup is conditionally executing hci_conn_params_del, that is calling hci_update_background_scan. Make the other case also update scan, and remove reduntand call from hci_connect_le_scan_remove. When stopping interleaved discovery the state should be set to stopped only when both LE scanning and discovery has stopped. Signed-off-by: Jakub Pawlowski <jpawlowski@google.com> Acked-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-16 09:24:41 +02:00
Johannes Berg	a515de6607	cfg80211: reg: fix reg_ignore_cell_hint return type The return type should be enum reg_request_treatment for both branches of the #ifdef. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-16 09:15:45 +02:00
Johannes Berg	81e925747e	cfg80211: reg: reduce chan_reg_rule_print_dbg() ifdef The function is void and static, so just ifdef its contents instead of duplicating the declaration. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-16 09:15:45 +02:00
Johannes Berg	9f50680292	cfg80211: reg: fix antenna gain in chan_reg_rule_print_dbg() Printing "N/A mBi" is strange - print just "N/A" instead. Also add a missing opening parenthesis. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-16 09:15:44 +02:00
Johannes Berg	d34265a3ee	cfg80211: reg: centralize freeing ignored requests Instead of having a lot of places that free ignored requests and then return REG_REQ_OK, make reg_process_hint() process REG_REQ_IGNORE by freeing the request, and let functions it calls return that instead of freeing. This also fixes a leak when a second (different) country IE hint was ignored. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-16 09:15:44 +02:00
Johannes Berg	480908a7ec	cfg80211: reg: clarify 'treatment' handling in reg_process_hint() This function can only deal with treatment values OK and ALREADY_SET so make the callees not return anything else and warn if they do. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-16 09:15:44 +02:00
Johannes Berg	fd453d3c53	cfg80211: reg: rename reg_regdb_query() to reg_query_builtin() The new name better reflects the functionality. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-16 09:15:43 +02:00
Johannes Berg	b686303691	cfg80211: reg: make CRDA support optional If there's a built-in regulatory database, there may be little point in also calling out to CRDA and failing if the system is configured that way. Allow removing CRDA support to save ~1K kernel size. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-16 09:15:39 +02:00
Jon Paul Maloy	c819930090	tipc: update node FSM when peer RESET message is received The change made in the previous commit revealed a small flaw in the way the node FSM is updated. When the function tipc_node_link_down() is called for the last link to a node, we should check whether this was caused by a local reset or by a received RESET message from the peer. In the latter case, we can directly issue a PEER_LOST_CONTACT_EVT to the node FSM, so that it is ready to re-establish contact. If this is not done, the peer node will sometimes have to go through a second establish cycle before the link becomes stable. We fix this in this commit by conditionally issuing the mentioned event in the function tipc_node_link_down(). We also move LINK_RESET FSM even away from the link_reset() function and into the caller function, partially because it is easier to follow the code when state changes are gathered at a limited number of locations, partially because there will be cases in future commits where we don't want the link to go RESET mode when link_reset() is called. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 23:55:23 -07:00
Jon Paul Maloy	282b3a0562	tipc: send out RESET immediately when link goes down When a link is taken down because of a node local event, such as disabling of a bearer or an interface, we currently leave it to the peer node to discover the broken communication. The default time for such failure discovery is 1.5-2 seconds. If we instead allow the terminating link endpoint to send out a RESET message at the moment it is reset, we can achieve the impression that both endpoints are going down instantly. Since this is a very common scenario, we find it worthwhile to make this small modification. Apart from letting the link produce the said message, we also have to ensure that the interface is able to transmit it before TIPC is detached. We do this by performing the disabling of a bearer in three steps: 1) Disable reception of TIPC packets from the interface in question. 2) Take down the links, while allowing them so send out a RESET message. 3) Disable transmission of TIPC packets on the interface. Apart from this, we now have to react on the NETDEV_GOING_DOWN event, instead of as currently the NEDEV_DOWN event, to ensure that such transmission is possible during the teardown phase. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 23:55:22 -07:00
Jon Paul Maloy	73f646cec3	tipc: delay ESTABLISH state event when link is established Link establishing, just like link teardown, is a non-atomic action, in the sense that discovering that conditions are right to establish a link, and the actual adding of the link to one of the node's send slots is done in two different lock contexts. The link FSM is designed to help bridging the gap between the two contexts in a safe manner. We have now discovered a weakness in the implementaton of this FSM. Because we directly let the link go from state LINK_ESTABLISHING to state LINK_ESTABLISHED already in the first lock context, we are unable to distinguish between a fully established link, i.e., a link that has been added to its slot, and a link that has not yet reached the second lock context. It may hence happen that a manual intervention, e.g., when disabling an interface, causes the function tipc_node_link_down() to try removing the link from the node slots, decrementing its active link counter etc, although the link was never added there in the first place. We solve this by delaying the actual state change until we reach the second lock context, inside the function tipc_node_link_up(). This makes it possible for potentail callers of __tipc_node_link_down() to know if they should proceed or not, and the problem is solved. Unforunately, the situation described above also has a second problem. Since there by necessity is a tipc_node_link_up() call pending once the node lock has been released, we must defuse that call by setting the link back from LINK_ESTABLISHING to LINK_RESET state. This forces us to make a slight modification to the link FSM, which will now look as follows. +------------------------------------+ \|RESET_EVT \| \| \| \| +--------------+ \| +-----------------\| SYNCHING \|-----------------+ \| \|FAILURE_EVT +--------------+ PEER_RESET_EVT\| \| \| A \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|SYNCH_ \|SYNCH_ \| \| \| \|BEGIN_EVT \|END_EVT \| \| \| \| \| \| \| V \| V V \| +-------------+ +--------------+ +------------+ \| \| RESETTING \|<---------\| ESTABLISHED \|--------->\| PEER_RESET \| \| +-------------+ FAILURE_ +--------------+ PEER_ +------------+ \| \| EVT \| A RESET_EVT \| \| \| \| \| \| \| \| +----------------+ \| \| \| RESET_EVT\| \|RESET_EVT \| \| \| \| \| \| \| \| \| \| \|ESTABLISH_EVT \| \| \| \| +-------------+ \| \| \| \| \| \| RESET_EVT \| \| \| \| \| \| \| \| \| \| \| V V V \| \| \| \| +-------------+ +--------------+ RESET_EVT\| +--->\| RESET \|--------->\| ESTABLISHING \|<----------------+ +-------------+ PEER_ +--------------+ \| A RESET_EVT \| \| \| \| \| \| \| \|FAILOVER_ \|FAILOVER_ \|FAILOVER_ \|BEGIN_EVT \|END_EVT \|BEGIN_EVT \| \| \| V \| \| +-------------+ \| \| FAILINGOVER \|<----------------+ +-------------+ Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 23:55:21 -07:00
Jon Paul Maloy	8306f99a51	tipc: disallow packet duplicates in link deferred queue After the previous commits, we are guaranteed that no packets of type LINK_PROTOCOL or with illegal sequence numbers will be attempted added to the link deferred queue. This makes it possible to make some simplifications to the sorting algorithm in the function tipc_skb_queue_sorted(). We also alter the function so that it will drop packets if one with the same seqeunce number is already present in the queue. This is necessary because we have identified weird packet sequences, involving duplicate packets, where a legitimate in-sequence packet may advance to the head of the queue without being detected and de-queued. Finally, we make this function outline, since it will now be called only in exceptional cases. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 23:55:21 -07:00
Jon Paul Maloy	81204c492b	tipc: improve sequence number checking The sequence number of an incoming packet is currently only checked for less than, equality to, or bigger than the next expected number, meaning that the receive window in practice becomes one half sequence number cycle, or U16_MAX/2. This does not make sense, and may not even be safe if there are extreme delays in the network. Any packet sent by the peer during the ongoing cycle must belong inside his current send window, or should otherwise be dropped if possible. Since a link endpoint cannot know its peer's current send window, it has to base this sanity check on a worst-case assumption, i.e., that the peer is using a maximum sized window of 8191 packets. Using this assumption, we now add a check that the sequence number is not bigger than next_expected + TIPC_MAX_LINK_WIN. We also re-order the checks done, so that the receive window test is performed before the gap test. This way, we are guaranteed that no packet with illegal sequence numbers are ever added to the deferred queue. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 23:55:20 -07:00
Jon Paul Maloy	f9aa358a81	tipc: simplify tipc_link_rcv() reception loop Currently, all packets received in tipc_link_rcv() are unconditionally added to the packet deferred queue, whereafter that queue is walked and all its buffers evaluated for delivery. This is both non-optimal and and makes the queue sorting function unnecessary complex. This commit changes the loop so that an arrived packet is evaluated first, and added to the deferred queue only when a sequence number gap is discovered. A non-empty deferred queue is walked until it is empty or until its head's sequence number doesn't fit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 23:55:19 -07:00
Jon Paul Maloy	9945e8043e	tipc: limit usage of temporary skb list during packet reception During packet reception, the function tipc_link_rcv() adds its accepted packets to a temporary buffer queue, before finally splicing this queue into the lock protected input queue that will be delivered up to the socket layer. The purpose is to reduce potential contention on the input queue lock. However, since the vast majority of packets arrive in sequence, they will anyway be added one by one to the input queue, and the use of the temporary queue becomes a sub-optimization. The only case where this queue makes sense is when unpacking buffers from a bundle packet; here we want to avoid dozens of small buffers to be added individually to the lock-protected input queue in a tight loop. In this commit, we remove the general usage of the temporary queue, and keep it only for the packet unbundling case. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 23:55:18 -07:00
Linus Torvalds	58bd6e0602	Changes for 4.3-rc5 - Work around connection namespace lookup bug related to RoCE - Change usnic license to Dual GPL/BSD (was intended to be that way all along, but wasn't clear, permission from contributors was chased down) - Fix an issue between NFSoRDMA and mlx5 that could cause an oops - Fix leak of sendonly multicast groups -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWHoT/AAoJELgmozMOVy/deK4QALETCToLcR5RRDR+QleFUvby FnP91Pu9zGOoiuP25FT5Ny0YAmTHd1KiDQBQHRe/NrYDCH2M/q8jFJSWZLwGrG6q 8GYc1ieozGQMZvId3ZJnqUJUTEyJu9QtpiFFZJYJHriP6OShP1GiHJ/XTN9dvJ/u xcmViAYYIjjScjaY1MuYpxKITFwfZE0HtdvK7zzq+F9cpfmC//Zc0Po4V4o4Y9V3 14WgbWZyhehmECKwN95hIY1pLySadgcCxoeUDHclQ3efKLar4tEC3SOM2QZsnNRc qlCHEZYeB5TRo0dF/2CYUMLfUHkMjnUpW2BiVDbQfmPio7lGUjh2SBAQjI5i1dEQ Wg69JH1TV7BYfRiwe7n49P/BJ2vIhCR2UjQrHjilZ/h6DPSfKy29hVSvOzb5xLeJ mwl/KSKxlfT2Z1SZy0yMlJfCm8tjPwf6WhOVwkFRAhYHD3Yf31EMVzD7gTtW2MXO n5S80k5ccJlXniPWjaqerhjOZHmwHViBmHNlN4zlDCRZeT9IuKDj5mi31f7HC4gx WqJtSjRxydpbGPKROHI4vrmfARPAKNrKhj8BiqxO5Cja+TiS2QeXXr+fbRwETrLS TjXWNfS3Boy564AJ8Gfug2wfBcHwY+31Uv2a6nrMmKi+wwVexF/ENOb/x9WHZrVo VqQVI2lUBH2LsmzadD9c =usb1 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "We have four batched up patches for the current rc kernel. Two of them are small fixes that are obvious. One of them is larger than I would like for a late stage rc pull, but we found an issue in the namespace lookup code related to RoCE and this works around the issue for now (we allow a lookup with a namespace to succeed on RoCE since RoCE namespaces aren't implemented yet). This will go away in 4.4 when we put in support for namespaces in RoCE devices. The last one is large in terms of lines, but is all legal and no functional changes. Cisco needed to update their files to be more specific about their license. They had intended the files to be dual licensed as GPL/BSD all along, and specified that in their module license tag, but their file headers were not up to par. They contacted all of the contributors to get agreement and then submitted a patch to update the license headers in the files. Summary: - Work around connection namespace lookup bug related to RoCE - Change usnic license to Dual GPL/BSD (was intended to be that way all along, but wasn't clear, permission from contributors was chased down) - Fix an issue between NFSoRDMA and mlx5 that could cause an oops - Fix leak of sendonly multicast groups" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/ipoib: For sendonly join free the multicast group on leave IB/cma: Accept connection without a valid netdev on RoCE xprtrdma: Don't require LOCAL_DMA_LKEY support for fastreg usnic: add missing clauses to BSD license	2015-10-15 13:44:35 -07:00
Johannes Berg	922ec58c70	cfg80211: reg: remove useless reg_timeout scheduling When the functions reg_set_rd_driver() and reg_set_rd_country_ie() return with an error, the calling function already restores data by calling restore_regulatory_settings(), so there's no need to also schedule a timeout (which would lead to other side effects such as indicating CRDA failed, which clearly isn't true.) Remove the scheduling. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-15 16:17:09 +02:00
Johannes Berg	c7d319e542	cfg80211: reg: search built-in database directly Instead of searching the built-in database only in the worker, search it directly and return an error if the entry cannot be found (or memory cannot be allocated.) This means that builtin database queries no longer rely on the timeout. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-15 16:17:09 +02:00
Johannes Berg	cecbb069cc	cfg80211: reg: rename reg_call_crda to reg_query_database The new name is more appropriate since in the case of a built-in database it may not really rely on CRDA. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-15 16:17:08 +02:00
Johannes Berg	25b20dbdc4	cfg80211: reg: fix reg_call_crda() return value bug The function reg_call_crda() can't actually validly return REG_REQ_IGNORE as it does now when calling CRDA fails since that return value isn't handled properly. Fix that. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-15 16:17:08 +02:00
Johannes Berg	7cf3741823	cfg80211: reg: remove useless non-NULL check There's no way that the alpha2 pointer can be NULL, so no point in checking that it isn't. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-15 16:17:07 +02:00
Johannes Berg	8047d2616d	cfg80211: fix gHz to GHz There's no "g" prefix, only "G" (1e9) that was clearly intended here. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-15 16:17:07 +02:00
Jiri Pirko	771acac2ff	switchdev: assert rtnl mutex when going over lower netdevs netdev_for_each_lower_dev has to be called with rtnl mutex held. So better enforce it in switchdev functions. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 06:09:53 -07:00
Jiri Pirko	56607386e8	bridge: defer switchdev fdb del call in fdb_del_external_learn Since spinlock is held here, defer the switchdev operation. Also, ensure that defered switchdev ops are processed before port master device is unlinked. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 06:09:50 -07:00
Jiri Pirko	4d429c5ddc	switchdev: introduce possibility to defer obj_add/del Similar to the attr usecase, the caller knows if he is holding RTNL and is in atomic section. So let the called to decide the correct call variant. This allows drivers to sleep inside their ops and wait for hw to get the operation status. Then the status is propagated into switchdev core. This avoids silent errors in drivers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 06:09:49 -07:00
Jiri Pirko	850d0cbc91	switchdev: remove pointers from switchdev objects When object is used in deferred work, we cannot use pointers in switchdev object structures because the memory they point at may be already used by someone else. So rather do local copy of the value. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 06:09:49 -07:00
Jiri Pirko	0bc05d585d	switchdev: allow caller to explicitly request attr_set as deferred Caller should know if he can call attr_set directly (when holding RTNL) or if he has to defer the att_set processing for later. This also allows drivers to sleep inside attr_set and report operation status back to switchdev core. Switchdev core then warns if status is not ok, instead of silent errors happening in drivers. Benefit from newly introduced switchdev deferred ops infrastructure. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 06:09:48 -07:00
Jiri Pirko	f7fadf3047	switchdev: make struct switchdev_attr parameter const for attr_set calls Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 06:09:46 -07:00
Jiri Pirko	793f40147e	switchdev: introduce switchdev deferred ops infrastructure Introduce infrastructure which will be used internally to defer ops. Note that the deferred ops are queued up and either are processed by scheduled work or explicitly by user calling deferred_process function. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 06:09:46 -07:00
Pablo Neira	8cbc870829	netfilter: nfnetlink_log: validate dependencies to avoid breaking atomicity Check that dependencies are fulfilled before updating the logger instance, otherwise we can leave things in intermediate state on errors in nfulnl_recv_config(). [ Ken-ichirou reports that this is also fixing missing instance refcnt drop on error introduced in his patch `914eebf2f4` ("netfilter: nfnetlink_log: autoload nf_conntrack_netlink module NFQA_CFG_F_CONNTRACK config flag"). ] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Ken-ichirou MATSUZAWA <chamaken@gmail.com>	2015-10-15 06:45:03 +02:00
Pablo Neira Ayuso	336a3b3ee9	netfilter: nfnetlink_log: consolidate check for instance in nfulnl_recv_config() This patch consolidates the check for valid logger instance once we have passed the command handling: The config message that we receive may contain the following info: 1) Command only: We always get a valid instance pointer if we just created it. In case that the instance is being destroyed or the command is unknown, we jump to exit path of nfulnl_recv_config(). This patch doesn't modify this handling. 2) Config only: In this case, the instance must always exist since the user is asking for configuration updates. If the instance doesn't exist this returns -ENODEV. 3) No command and no configs are specified: This case is rare. The user is sending us a config message with neither commands nor config options. In this case, we have to check if the instance exists and bail out otherwise. Before this patch, it was possible to send a config message with no command and no config updates for an unexisting instance without triggering an error. So this is the only case that changes. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Ken-ichirou MATSUZAWA <chamaken@gmail.com>	2015-10-15 06:44:31 +02:00
Jon Paul Maloy	dde4b5ae65	tipc: move fragment importance field to new header position In commit `e3eea1eb47` ("tipc: clean up handling of message priorities") we introduced a field in the packet header for keeping track of the priority of fragments, since this value is not present in the specified protocol header. Since the value so far only is used at the transmitting end of the link, we have not yet officially defined it as part of the protocol. Unfortunately, the field we use for keeping this value, bits 13-15 in in word 5, has turned out to be a poor choice; it is already used by the broadcast protocol for carrying the 'network id' field of the sending node. Since packet fragments also need to be transported across the broadcast protocol, the risk of conflict is obvious, and we see this happen when we use network identities larger than 2^13-1. This has escaped our testing because we have so far only been using small network id values. We now move this field to bits 0-2 in word 9, a field that is guaranteed to be unused by all involved protocols. Fixes: `e3eea1eb47` ("tipc: clean up handling of message priorities") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-14 19:10:08 -07:00
Eric Dumazet	f985c65c90	tcp: avoid spurious SYN flood detection at listen() time At listen() time, there is a small window where listener is visible with a zero backlog, triggering a spurious "Possible SYN flooding on port" message. Nothing prevents us from setting the correct backlog. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-14 19:06:32 -07:00
Eric Dumazet	c2f34a65a6	tcp/dccp: fix potential NULL deref in __inet_inherit_port() As we no longer hold listener lock in fast path, it is possible that a child is created right after listener freed its bound port, if a close() is done while incoming packets are processed. __inet_inherit_port() must detect this and return an error, so that caller can free the child earlier. Fixes: `e994b2f0fb` ("tcp: do not lock listener to process SYN packets") Fixes: `079096f103` ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-14 19:06:31 -07:00
Joe Perches	077cb37fcf	ethtool: Use kcalloc instead of kmalloc for ethtool_get_strings It seems that kernel memory can leak into userspace by a kmalloc, ethtool_get_strings, then copy_to_user sequence. Avoid this by using kcalloc to zero fill the copied buffer. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-14 19:00:20 -07:00
David S. Miller	c3503357fb	linux-can-next-for-4.4-20151013 -----BEGIN PGP SIGNATURE----- iQEcBAABCgAGBQJWHSq8AAoJEP5prqPJtc/HCyMH/R4PeAJoSwlQQVTKxDsdC/x1 Jxue9dMpEQhINoU1EpSQxaYIwg8zzFpQqcKVzoX9bgw5YdfAVgv/wrSW98Hwg/h5 OX+QYBAvhK/1Gk0+b7fwPF323osdD/8hn4lbQorB3gEYmE4+3kKh6ivlxGNa1LfW VDfX23MhRF+iXFM64pnl7LR6BnflPQlGEKlWQgevR+cZDfEk+lDTRHjdAu/Hjokc Nwo1agptCOsS5mgE/hyLhqBc6UXSN8ytoi5acP+KtnfnLtmgw/YEt7/2QQgOOTkf T2zwCxFRQcePwoip7OXFwzkPsZkj3gn4XZCbTSErbqnQ28sFDbTQUTC1mB87f70= =0GcF -----END PGP SIGNATURE----- Merge tag 'linux-can-next-for-4.4-20151013' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2015-09-17 this is a pull request of 4 patches for net-next/master. Two patches are by Gerhard Bertelsmann, fixing some problems in the sun4i driver. The patch by Arnd Bergmann stops using timeval for the CAN broadcast manager. The last patch by Alexandre Belloni removes the otherwise unused struct at91_can_data from the driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-14 18:36:58 -07:00
David S. Miller	ef41a2cedb	Like last time, we have two small fixes: * fast-xmit was not doing powersave filter clearing correctly, disable fast-xmit while any such operations are still pending * a debugfs file was broken due to some infrastructure changes -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJWHMdyAAoJEDBSmw7B7bqr+MwQAIG16Oo01vLDRXtjS+XkxVzq HEXy+PfL3xDEPOq+P5Rm7Bwg1hK6EqRNh6UBab6YvKP0vyrsEgqDe29ftf16R3yC K9gcslJgm/B8OhwOUQJa9UAyiL28AY8ZTQpKS8b9z7qu7lsXRMFI/S/nVvosdrdT DGGayyABFuWWbQ0YlLOOoq17/p/BELoaOhj811dlJszkwl7zZmmjsTF4rjB7tsgJ d0+Gh+Xvx8d5Kl9cvKvgGLeh7Ms7jxnJi96xcNdxUXWylbGeo/05jpRtwnTrQlsj wYWmkwXXykppbAFO+YQE+hBpEK1KQx8aQVPxNuxv0bPgggt2dkRDJRJFS9g7nSUn kuJjNJYrVUDYRDszgzjRWi6HFln9PCZJv35BGYTVptt3qM7IcZ16vrNRlDxzTtN+ iX20Fv+IyVW3ZKC7PUIugYYpXvOibKKOpPpkiEz7DiSZXy9YKTdZuhNv3JwuTTca 0BnGIUX+M2zlBeaRUugX3pK88W1LajgKx/FnnFZ6pCivC2bQr3Uf7IsNzSIO9eEZ +q9zdumyonKi2RJXerPJFN+yXB0afv2rQRqZQqoAt3MURMI73BawXL0SUOgNPrDr 5ivCFy/6deXDnQ3mRLaT+w9alMThBSLPGXKZZKq3RJNJmUYr8Oe+6LMvtFEqPlCt s703Q3UWgZ6iyx77kd1o =Ziyp -----END PGP SIGNATURE----- Merge tag 'mac80211-for-davem-2015-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Like last time, we have two small fixes: * fast-xmit was not doing powersave filter clearing correctly, disable fast-xmit while any such operations are still pending * a debugfs file was broken due to some infrastructure changes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-14 18:34:25 -07:00
Johannes Berg	6f1f5d5f4d	mac80211: remove event.c That file contains just a single function, which itself is just a single statement to call a different function. Remove it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-14 18:40:26 +02:00
Johannes Berg	3beea3513f	mac80211: remove cfg.h The file contains just a single declaration that can easily move to another file - remove it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-14 18:36:43 +02:00
Johannes Berg	fbd6ff5cea	mac80211: move sta_set_rate_info_rx() and make it static There's only a single caller of this function, so it can be moved to the same file and made static. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-14 18:31:30 +02:00
Johannes Berg	a732fa7001	mac80211: clean up ieee80211_rx_h_check_dup code Reduce indentation a bit to make the condition more readable. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-14 18:27:07 +02:00
Johannes Berg	4a733ef1be	mac80211: remove PM-QoS listener As this API has never really seen any use and most drivers don't ever use the value derived from it, remove it. Change the only driver using it (rt2x00) to simply use the DTIM period instead of the "max sleep" time. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-14 18:04:08 +02:00
Jon Paul Maloy	0f8b8e28fb	tipc: eliminate risk of stalled link synchronization In commit `6e498158a8` ("tipc: move link synch and failover to link aggregation level") we introduced a new mechanism for performing link failover and synchronization. We have now detected a bug in this mechanism. During link synchronization we use the arrival of any packet on the tunnel link to trig a check for whether it has reached the synchronization point or not. This has turned out to be too permissive, since it may cause an arriving non-last SYNCH packet to end the synch state, just to see the next SYNCH packet initiate a new synch state with a new, higher synch point. This is not fatal, but should be avoided, because it may significantly extend the synchronization period, while at the same time we are not allowed to send NACKs if packets are lost. In the worst case, a low-traffic user may see its traffic stall until a LINK_PROTOCOL state message trigs the link to leave synchronization state. At the same time, LINK_PROTOCOL packets which happen to have a (non- valid) sequence number lower than the tunnel link's rcv_nxt value will be consistently dropped, and will never be able to resolve the situation described above. We fix this by exempting LINK_PROTOCOL packets from the sequence number check, as they should be. We also reduce (but don't completely eliminate) the risk of entering multiple synchronization states by only allowing the (logically) first SYNCH packet to initiate a synchronization state. This works independently of actual packet arrival order. Fixes: commit `6e498158a8` ("tipc: move link synch and failover to link aggregation level") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-14 06:06:40 -07:00
Paolo Abeni	02a6d6136f	Revert "ipv4/icmp: redirect messages can use the ingress daddr as source" Revert the commit `e2ca690b65` ("ipv4/icmp: redirect messages can use the ingress daddr as source"), which tried to introduce a more suitable behaviour for ICMP redirect messages generated by VRRP routers. However RFC 5798 section 8.1.1 states: The IPv4 source address of an ICMP redirect should be the address that the end-host used when making its next-hop routing decision. while said commit used the generating packet destination address, which do not match the above and in most cases leads to no redirect packets to be generated. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-14 06:01:07 -07:00
Ian Morris	dbb526ebfe	netfilter: ipv6: pointer cast layout Correct whitespace layout of a pointer casting. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-14 12:30:08 +02:00
Ian Morris	4305ae44a9	netfilter: ip6_tables: improve if statements Correct whitespace layout of if statements. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-14 12:29:51 +02:00
Eric Dumazet	4bdc3d6614	tcp/dccp: fix behavior of stale SYN_RECV request sockets When a TCP/DCCP listener is closed, its pending SYN_RECV request sockets become stale, meaning 3WHS can not complete. But current behavior is wrong : incoming packets finding such stale sockets are dropped. We need instead to cleanup the request socket and perform another lookup : - Incoming ACK will give a RST answer, - SYN rtx might find another listener if available. - We expedite cleanup of request sockets and old listener socket. Fixes: `079096f103` ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 18:26:34 -07:00
Linus Torvalds	5b5f145527	Two nfsd fixes, one for an RDMA crash, one for a pnfs/block protocol bug. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWHUj5AAoJECebzXlCjuG+KIoP/RW5zigAEKqUiD7ycKR91BxD 9Nt0fqTTrbkGJhKM1/DN4YEjogAHeFW5OnGiLQRUNI/qdy+I1Gyr1kgwGmCCVDt9 d8AhnxcnXR5SmsQHk7eeUd/rnODetf0bW5YJ8PfFbnC6cmM013nR9ujEccUuCl9M hHTp+690Doab00PtWtsjmZv5d+eT1bktY/R2PuQhyQM2CKWh1u4FeNTd1lWE551D b1wSvhAGMYVEsQv8+HICDrIQ8loGfH2gpBILERLM2yJlhN1IPU3RmNSAcQpZSaql veJYVmHdpMACCLp0Dd3hwWKDYvcQ2lCqKk+Cpd0vLpvZ8J5OjCLC+a2dh0PRIYuf pwFCvbWz6dn27/9eXEKbyT2JIeBIl4qwrFjfiRKlNX0c4HGKXaE2gJrY7bxnDxe1 BatAbEFZ+rxHyPmycaj3JdyOxafmw94XzbT8q2g7tmUCj+pvAI+Pbv6PlwN6W2r7 aGBZzgd8Y9pT6ZbCB0e413d/t5ulxwkt6vVz9Jze4gfcUrWcqHaqt7AadMl7obUx AYPLAVGeHybdKlLvqv42IF2QM8ZhizM0+EnxkjfWLrsa7WbstWX5KLPpm3K80dM7 98p1ToNQDFcNU8WBZw8AkBpFz4j32RVOkvzWFWbhCo+T3is4BmP16uEEjH90aCCY skQKMrq8J1ox33gz5gT7 =Pkuy -----END PGP SIGNATURE----- Merge tag 'nfsd-4.3-2' of git://linux-nfs.org/~bfields/linux Pull nfsd fixes from Bruce Fields: "Two nfsd fixes, one for an RDMA crash, one for a pnfs/block protocol bug" * tag 'nfsd-4.3-2' of git://linux-nfs.org/~bfields/linux: svcrdma: Fix NFS server crash triggered by 1MB NFS WRITE nfsd/blocklayout: accept any minlength	2015-10-13 11:31:03 -07:00
Arnd Bergmann	ba61a8d9d7	can: avoid using timeval for uapi The can subsystem communicates with user space using a bcm_msg_head header, which contains two timestamps. This is problematic for multiple reasons: a) The structure layout is currently incompatible between 64-bit user space and 32-bit user space, and cannot work in compat mode (other than x32). b) The timeval structure layout will change in 32-bit user space when we fix the y2038 overflow problem by redefining time_t to 64-bit, making new 32-bit user space incompatible with the current kernel interface. Cars last a long time and often use old kernels, so the actual users of this code are the most likely ones to migrate to y2038 safe user space. This tries to work around part of the problem by changing the publicly visible user interface in the header, but not the binary interface. Fortunately, the values passed around in the structure are relative times and do not actually suffer from the y2038 overflow, so 32-bit is enough here. We replace the use of 'struct timeval' with a newly defined 'struct bcm_timeval' that uses the exact same binary layout as before and that still suffers from problem a) but not problem b). The downside of this approach is that any user space program that currently assigns a timeval structure to these members rather than writing the tv_sec/tv_usec portions individually will suffer a compile-time error when built with an updated kernel header. Fixing this error makes it work fine with old and new headers though. We could address problem a) by using '__u32' or 'int' members rather than 'long', but that would have a more significant downside in also breaking support for all existing 64-bit user binaries that might be using this interface, which is likely not acceptable. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Cc: linux-can@vger.kernel.org Cc: linux-api@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2015-10-13 17:42:34 +02:00
Ian Morris	544d9b17f9	netfilter: ip6_tables: ternary operator layout Correct whitespace layout of ternary operators in the netfilter-ipv6 code. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-13 14:12:38 +02:00
Ian Morris	f9527ea9b6	netfilter: ipv6: whitespace around operators This patch cleanses whitespace around arithmetical operators. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-13 14:12:38 +02:00
Ian Morris	7695495d5a	netfilter: ipv6: code indentation Use tabs instead of spaces to indent code. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-13 14:12:38 +02:00
Ian Morris	cda219c6ad	netfilter: ip6_tables: function definition layout Use tabs instead of spaces to indent second line of parameters in function definitions. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-13 14:12:37 +02:00
Ian Morris	6ac94619b6	netfilter: ip6_tables: label placement Whitespace cleansing: Labels should not be indented. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-13 14:12:37 +02:00
Florian Westphal	514ed62ed3	netfilter: sync with packet rx also after removing queue entries We need to sync packet rx again after flushing the queue entries. Otherwise, the following race could happen: cpu1: nf_unregister_hook(H) called, H unliked from lists, calls synchronize_net() to wait for packet rx completion. Problem is that while no new nf_queue_entry structs that use H can be allocated, another CPU might receive a verdict from userspace just before cpu1 calls nf_queue_nf_hook_drop to remove this entry: cpu2: receive verdict from userspace, lock queue cpu2: unlink nf_queue_entry struct E, which references H, from queue list cpu1: calls nf_queue_nf_hook_drop, blocks on queue spinlock cpu2: unlock queue cpu1: nf_queue_nf_hook_drop drops affected queue entries cpu2: call nf_reinject for E cpu1: kfree(H) cpu2: potential use-after-free for H Cc: Eric W. Biederman <ebiederm@xmission.com> Fixes: `085db2c045` ("netfilter: Per network namespace netfilter hooks.") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-13 13:59:56 +02:00
Nikolay Aleksandrov	f409d0ed87	bridge: vlan: move back vlan_flush Ido Schimmel reported a problem with switchdev devices because of the order change of del_nbp operations, more specifically the move of nbp_vlan_flush() which deletes all vlans and frees vlgrp after the rx_handler has been unregistered. So in order to fix this move vlan_flush back where it was and make it destroy the rhtable after NULLing vlgrp and waiting a grace period to make sure noone can see it. Reported-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:57:58 -07:00
Nikolay Aleksandrov	b8d02c3cac	bridge: vlan: drop unnecessary flush code As Ido Schimmel pointed out the vlan_vid_del() code in nbp_vlan_flush is unnecessary (and is actually a remnant of the old vlan code) so we can remove it. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:57:56 -07:00
Nikolay Aleksandrov	e9c953eff7	bridge: vlan: use rcu for vlan_list traversal in br_fill_ifinfo br_fill_ifinfo is called by br_ifinfo_notify which can be called from many contexts with different locks held, sometimes it relies upon bridge's spinlock only which is a problem for the vlan code, so use explicitly rcu for that to avoid problems. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:57:54 -07:00
Nikolay Aleksandrov	907b1e6e83	bridge: vlan: use proper rcu for the vlgrp member The bridge and port's vlgrp member is already used in RCU way, currently we rely on the fact that it cannot disappear while the port exists but that is error-prone and we might miss places with improper locking (either RCU or RTNL must be held to walk the vlan_list). So make it official and use RCU for vlgrp to catch offenders. Introduce proper vlgrp accessors and use them consistently throughout the code. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:57:52 -07:00
David Ahern	ca254490c8	net: Add VRF support to IPv6 stack As with IPv4 support for VRFs added to IPv6 stack by replacing hardcoded table ids with possibly device specific ones and manipulating the oif in the flowi6. The flow flags are used to skip oif compare in nexthop lookups if the device is enslaved to a VRF via the L3 master device. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:55:08 -07:00
David Ahern	c485068778	net: Export fib6_get_table and nd_tbl Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:55:05 -07:00
Eric W. Biederman	e332bc67cf	ipv6: Don't call with rt6_uncached_list_flush_dev As originally written rt6_uncached_list_flush_dev makes no sense when called with dev == NULL as it attempts to flush all uncached routes regardless of network namespace when dev == NULL. Which is simply incorrect behavior. Furthermore at the point rt6_ifdown is called with dev == NULL no more network devices exist in the network namespace so even if the code in rt6_uncached_list_flush_dev were to attempt something sensible it would be meaningless. Therefore remove support in rt6_uncached_list_flush_dev for handling network devices where dev == NULL, and only call rt6_uncached_list_flush_dev when rt6_ifdown is called with a network device. Fixes: `8d0b94afdc` ("ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregister") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Reviewed-by: Martin KaFai Lau <kafai@fb.com> Tested-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:52:40 -07:00
Nikolay Aleksandrov	af3793921d	bridge: fix gc_timer mod/del race condition commit `c62987bbd8` ("bridge: push bridge setting ageing_time down to switchdev") introduced a timer race condition because the gc_timer can get rearmed after it's supposedly stopped and flushed in br_dev_delete() leading to a use of freed memory. So take rtnl to sync with bridge destruction when setting ageing_timer. Here's the trace reproduced with these two commands running in parallel: while :; do echo 10000 > /sys/class/net/br0/bridge/ageing_timer; done; while :; do brctl addbr br0; ip l set br0 up; ip l set br0 down; brctl delbr br0; done; [ 300.000029] BUG: unable to handle kernel paging request at ffffffff811c59d3 [ 300.000263] IP: [<ffffffff810f168e>] __internal_add_timer+0x2e/0xd0 [ 300.000422] PGD 1a0f067 PUD 1a10063 PMD 10001e1 [ 300.000639] Oops: 0003 [#1] SMP [ 300.000793] Modules linked in: bridge stp llc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd snd_hda_codec_generic qxl drm_kms_helper psmouse pcspkr ttm snd_hda_intel 9pnet_virtio evdev serio_raw joydev snd_hda_codec 9pnet virtio_balloon drm snd_hwdep virtio_console snd_hda_core pvpanic snd_pcm i2c_piix4 snd_timer acpi_cpufreq parport_pc snd parport soundcore button processor i2c_core ipv6 autofs4 hid_generic usbhid hid ext4 crc16 mbcache jbd2 sg sr_mod cdrom ata_generic virtio_blk virtio_net e1000 ehci_pci uhci_hcd ehci_hcd usbcore usb_common floppy ata_piix libata virtio_pci virtio_ring virtio scsi_mod [ 300.004008] CPU: 1 PID: 1169 Comm: bash Not tainted 4.3.0-rc3+ #46 [ 300.004008] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 300.004008] task: ffff880035be2200 ti: ffff88003795c000 task.ti: ffff88003795c000 [ 300.004008] RIP: 0010:[<ffffffff810f168e>] [<ffffffff810f168e>] __internal_add_timer+0x2e/0xd0 [ 300.004008] RSP: 0018:ffff88003fd03e78 EFLAGS: 00010046 [ 300.004008] RAX: ffff88003fd0ef60 RBX: 840fc78949c08548 RCX: 00000001ffffffff [ 300.004008] RDX: 0000000000000000 RSI: ffffffff811c59d3 RDI: ffff88003fd0df00 [ 300.004008] RBP: ffff88003fd03e78 R08: 00000000ffffffff R09: 0000000000000000 [ 300.004008] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003fd0df00 [ 300.004008] R13: 0000000000000000 R14: 0000000000000001 R15: ffffffff816032e0 [ 300.004008] FS: 00007fcbdd609700(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000 [ 300.004008] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 300.004008] CR2: ffffffff811c59d3 CR3: 0000000037879000 CR4: 00000000000406e0 [ 300.004008] Stack: [ 300.004008] ffff88003fd03ea8 ffffffff810f1775 ffff88003c8cb958 ffff88003fd0df00 [ 300.004008] 0000000000000000 0000000000000001 ffff88003fd03f18 ffffffff810f28c4 [ 300.004008] ffff88003fd0eb68 ffff88003fd0e968 ffff88003fd0e768 ffff88003fd0df68 [ 300.004008] Call Trace: [ 300.004008] <IRQ> [ 300.004008] [<ffffffff810f1775>] cascade+0x45/0x70 [ 300.004008] [<ffffffff810f28c4>] run_timer_softirq+0x2f4/0x340 [ 300.004008] [<ffffffff8107e380>] __do_softirq+0xd0/0x440 [ 300.004008] [<ffffffff8107e8a3>] irq_exit+0xb3/0xc0 [ 300.004008] [<ffffffff815c2032>] smp_apic_timer_interrupt+0x42/0x50 [ 300.004008] [<ffffffff815bfe37>] apic_timer_interrupt+0x87/0x90 [ 300.004008] <EOI> [ 300.004008] [<ffffffff811fb80c>] ? create_object+0x13c/0x2e0 [ 300.004008] [<ffffffff8109b23e>] ? __kernel_text_address+0x4e/0x70 [ 300.004008] [<ffffffff8109b23e>] ? __kernel_text_address+0x4e/0x70 [ 300.004008] [<ffffffff8101e17f>] print_context_stack+0x7f/0xf0 [ 300.004008] [<ffffffff8101d55b>] dump_trace+0x11b/0x300 [ 300.004008] [<ffffffff8102970b>] save_stack_trace+0x2b/0x50 [ 300.004008] [<ffffffff811fb80c>] create_object+0x13c/0x2e0 [ 300.004008] [<ffffffff815b2e8e>] kmemleak_alloc+0x4e/0xb0 [ 300.004008] [<ffffffff811e475d>] kmem_cache_alloc_trace+0x18d/0x2f0 [ 300.004008] [<ffffffff8128b139>] kernfs_fop_open+0xc9/0x380 [ 300.004008] [<ffffffff8120214f>] do_dentry_open+0x1ff/0x2f0 [ 300.004008] [<ffffffff8128b070>] ? kernfs_fop_release+0x70/0x70 [ 300.004008] [<ffffffff812034f9>] vfs_open+0x59/0x60 [ 300.004008] [<ffffffff812130de>] path_openat+0x1ce/0x1260 [ 300.004008] [<ffffffff812154ae>] do_filp_open+0x7e/0xe0 [ 300.004008] [<ffffffff812251ff>] ? __alloc_fd+0xaf/0x180 [ 300.004008] [<ffffffff8120387b>] do_sys_open+0x12b/0x210 [ 300.004008] [<ffffffff8120397e>] SyS_open+0x1e/0x20 [ 300.004008] [<ffffffff815bf0b6>] entry_SYSCALL_64_fastpath+0x16/0x7a [ 300.004008] Code: 66 90 48 8b 46 10 48 8b 4f 40 55 48 89 c2 48 89 e5 48 29 ca 48 81 fa ff 00 00 00 77 20 0f b6 c0 48 8d 44 c7 68 48 8b 10 48 85 d2 <48> 89 16 74 04 48 89 72 08 48 89 30 48 89 46 08 5d c3 48 81 fa [ 300.004008] RIP [<ffffffff810f168e>] __internal_add_timer+0x2e/0xd0 [ 300.004008] RSP <ffff88003fd03e78> [ 300.004008] CR2: ffffffff811c59d3 Fixes: `c62987bbd8` ("bridge: push bridge setting ageing_time down to switchdev") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:50:17 -07:00
Nikolay Aleksandrov	87aaf2caed	switchdev: check if the vlan id is in the proper vlan range VLANs 0 and 4095 are reserved and shouldn't be used, add checks to switchdev similar to the bridge. Also make sure ids above 4095 cannot be passed either. Fixes: `47f8328bb1` ("switchdev: add new switchdev bridge setlink") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:43:24 -07:00
Nikolay Aleksandrov	cc02aa8e41	switchdev: enforce no pvid flag in vlan ranges We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Elad Raz <eladr@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:41:40 -07:00
Vivien Didelot	efd29b3d82	net: dsa: do not warn unsupported bridge ops A DSA driver may not provide the port_join_bridge and port_leave_bridge functions, so don't warn in such case. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:26:30 -07:00
Sowmini Varadhan	241b271952	RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one() Consider the following "duelling syn" sequence between two peers A and B: A B SYN1 --> <-- SYN2 SYN2ACK --> Note that the SYN/ACK has already been sent out by TCP before rds_tcp_accept_one() gets invoked as part of callbacks. If the inet_addr(A) is numerically less than inet_addr(B), the arbitration scheme in rds_tcp_accept_one() will prefer the TCP connection triggered by SYN1, and will send a CLOSE for the SYN2 (just after the SYN2ACK was sent). Since B also follows the same arbitration scheme, it will send the SYN-ACK for SYN1 that will set up a healthy ESTABLISHED connection on both sides. B will also get a CLOSE for SYN2, which should result in the cleanup of the TCP state machine for SYN2, but it should not trigger any stale RDS-TCP callbacks (such as ->writespace, ->state_change etc), that would disrupt the progress of the SYN2 based RDS-TCP connection. Thus the arbitration scheme in rds_tcp_accept_one() should restore rds_tcp callbacks for the winner before setting them up for the new accept socket, and also make sure that conn->c_outgoing is set to 0 so that we do not trigger any reconnect attempts on the passive side of the tcp socket in the future, in conformance with commit `c82ac7e69e` ("net/rds: RDS-TCP: only initiate reconnect attempt on outgoing TCP socket.") Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:22:41 -07:00
Sowmini Varadhan	486798001b	RDS: Invoke ->laddr_check() in rds_bind() for explicitly bound transports. The IP address passed to rds_bind() should be vetted by the transport's ->laddr_check() for a previously bound transport. This needs to be done to avoid cases where, for example, the application has asked for an IB transport, but the IP address passed to bind is only usable on ethernet interfaces. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:22:40 -07:00
Florian Westphal	7ceebfe46e	netfilter: nfqueue: don't use prev pointer Usage of -prev seems buggy. While packet was out our hook cannot be removed but we have no way to know if the previous one is still valid. So better not use ->prev at all. Since NF_REPEAT just asks to invoke same hook function again, just do so, and continue with nf_interate if we get an ACCEPT verdict. A side effect of this change is that if nf_reinject(NF_REPEAT) causes another REPEAT we will now drop the skb instead of a kernel loop. However, NF_REPEAT loops would be a bug so this should not happen anyway. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-13 12:03:24 +02:00
Johannes Berg	61f6bba006	mac80211: use new cfg80211_inform_bss_frame_data() API The new API is more easily extensible with a metadata struct passed to it, use it in mac80211. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-13 11:36:21 +02:00
Avraham Stern	e2845c458e	mac80211: Do not restart scheduled scan if multiple scan plans are set If multiple scan plans were set for scheduled scan, do not restart scheduled scan on reconfig because it is possible that some scan plans were already completed and there is no need to run them all over again. Instead, notify userspace that scheduled scan stopped so it can configure new scan plans for scheduled scan. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-13 10:35:51 +02:00
Avraham Stern	3b06d27795	cfg80211: Add multiple scan plans for scheduled scan Add the option to configure multiple 'scan plans' for scheduled scan. Each 'scan plan' defines the number of scan cycles and the interval between scans. The scan plans are executed in the order they were configured. The last scan plan will always run infinitely and thus defines only the interval between scans. The maximum number of scan plans supported by the device and the maximum number of iterations in a single scan plan are advertised to userspace so it can configure the scan plans appropriately. When scheduled scan results are received there is no way to know which scan plan is being currently executed, so there is no way to know when the next scan iteration will start. This is not a problem, however. The scan start timestamp is only used for flushing old scan results, and there is no difference between flushing all results received until the end of the previous iteration or the start of the current one, since no results will be received in between. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-13 10:35:26 +02:00
Dmitry Shmidt	6e19bc4b70	nl80211: allow BSS data to include CLOCK_BOOTTIME timestamp For location and connectivity services, userspace would often like to know the time when the BSS was last seen. The current "last seen" value is calculated in a way that makes it less useful, especially if the system suspended in the meantime. Add the ability for the driver to report a real CLOCK_BOOTTIME stamp that can then be reported to userspace (if present). Drivers wishing to use this must be converted to the new API to call cfg80211_inform_bss_data() or cfg80211_inform_bss_frame_data(). They need to ensure the reported value is accurate enough even when the frame might have been buffered in the device (e.g. firmware.) Signed-off-by: Dmitry Shmidt <dimitrysh@google.com> [modified to use struct, inlines] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-13 10:32:17 +02:00
Mohammed Shafi Shajakhan	4633dfc32c	mac80211: Fix hwflags debugfs file format Commit `30686bf7f5` ("mac80211: convert HW flags to unsigned long bitmap") accidentally removed the newline delimiter from the hwflags debugfs file. Fix this by adding back the newline between the HW flags. Cc: stable@vger.kernel.org [4.2] Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com> [fix commit log] Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-13 10:30:56 +02:00
Tamizh chelvam	93f0490e5d	Revert "mac80211: remove exposing 'mfp' to drivers" This reverts commit `5c48f12017`. Some device drivers (ath10k) offload part of aggregation including AddBA/DelBA negotiations to firmware. In such scenario, the PMF configuration of the station needs to be provided to driver to enable encryption of AddBA/DelBA action frames. Signed-off-by: Tamizh chelvam <c_traja@qti.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-13 10:29:11 +02:00
Johannes Berg	985f2c87a7	Merge remote-tracking branch 'net-next/master' into mac80211-next Merge net-next to get some driver changes that patches depend on (in order to avoid conflicts). Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-13 10:28:43 +02:00
Nikolay Aleksandrov	6623c60dc2	bridge: vlan: enforce no pvid flag in vlan ranges Currently it's possible for someone to send a vlan range to the kernel with the pvid flag set which will result in the pvid bouncing from a vlan to vlan and isn't correct, it also introduces problems for hardware where it doesn't make sense having more than 1 pvid. iproute2 already enforces this, so let's enforce it on kernel-side as well. Reported-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:59:15 -07:00
Roopa Prabhu	8c5b83f0f2	ipv6 route: use err pointers instead of returning pointer by reference This patch makes ip6_route_info_create return err pointer instead of returning the rt pointer by reference as suggested by Dave Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:47:34 -07:00
Eric W. Biederman	b72775977c	ipv6: Pass struct net into nf_ct_frag6_gather The function nf_ct_frag6_gather is called on both the input and the output paths of the networking stack. In particular ipv6_defrag which calls nf_ct_frag6_gather is called from both the the PRE_ROUTING chain on input and the LOCAL_OUT chain on output. The addition of a net parameter makes it explicit which network namespace the packets are being reassembled in, and removes the need for nf_ct_frag6_gather to guess. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:44:17 -07:00
Eric W. Biederman	19bcf9f203	ipv4: Pass struct net into ip_defrag and ip_check_defrag The function ip_defrag is called on both the input and the output paths of the networking stack. In particular conntrack when it is tracking outbound packets from the local machine calls ip_defrag. So add a struct net parameter and stop making ip_defrag guess which network namespace it needs to defragment packets in. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:44:16 -07:00
Eric W. Biederman	37fcbab61b	ipv4: Only compute net once in ip_call_ra_chain ip_call_ra_chain is called early in the forwarding chain from ip_forward and ip_mr_input, which makes skb->dev the correct expression to get the input network device and dev_net(skb->dev) a correct expression for the network namespace the packet is being processed in. Compute the network namespace and store it in a variable to make the code clearer. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:44:14 -07:00
Eric Dumazet	161642e24f	packet: fix match_fanout_group() Recent TCP listener patches exposed a prior af_packet bug : match_fanout_group() blindly assumes it is always safe to cast sk to a packet socket to compare fanout with af_packet_priv But SYNACK packets can be sent while attached to request_sock, which are smaller than a "struct sock". We can read non existent memory and crash. Fixes: `c0de08d042` ("af_packet: don't emit packet on orig fanout group") Fixes: `ca6fb06518` ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Eric Leblond <eric@regit.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:42:38 -07:00
Paolo Abeni	e2ca690b65	ipv4/icmp: redirect messages can use the ingress daddr as source This patch allows configuring how the source address of ICMP redirect messages is selected; by default the old behaviour is retained, while setting icmp_redirects_use_orig_daddr force the usage of the destination address of the packet that caused the redirect. The new behaviour fits closely the RFC 5798 section 8.1.1, and fix the following scenario: Two machines are set up with VRRP to act as routers out of a subnet, they have IPs x.x.x.1/24 and x.x.x.2/24, with VRRP holding on to x.x.x.254/24. If a host in said subnet needs to get an ICMP redirect from the VRRP router, i.e. to reach a destination behind a different gateway, the source IP in the ICMP redirect is chosen as the primary IP on the interface that the packet arrived at, i.e. x.x.x.1 or x.x.x.2. The host will then ignore said redirect, due to RFC 1122 section 3.2.2.2, and will continue to use the wrong next-op. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:38:02 -07:00
Jiri Pirko	0944d6b5a2	bridge: try switchdev op first in __vlan_vid_add/del Some drivers need to implement both switchdev vlan ops and vid_add/kill ndos. For that to work in bridge code, we need to try switchdev op first when adding/deleting vlan id. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:35:20 -07:00
Eric Dumazet	ed53d0ab76	net: shrink struct sock and request_sock by 8 bytes One 32bit hole is following skc_refcnt, use it. skc_incoming_cpu can also be an union for request_sock rcv_wnd. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:28:22 -07:00
Eric Dumazet	70da268b56	net: SO_INCOMING_CPU setsockopt() support SO_INCOMING_CPU as added in commit `2c8c56e15d` was a getsockopt() command to fetch incoming cpu handling a particular TCP flow after accept() This commits adds setsockopt() support and extends SO_REUSEPORT selection logic : If a TCP listener or UDP socket has this option set, a packet is delivered to this socket only if CPU handling the packet matches the specified one. This allows to build very efficient TCP servers, using one listener per RX queue, as the associated TCP listener should only accept flows handled in softirq by the same cpu. This provides optimal NUMA behavior and keep cpu caches hot. Note that __inet_lookup_listener() still has to iterate over the list of all listeners. Following patch puts sk_refcnt in a different cache line to let this iteration hit only shared and read mostly cache lines. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:28:20 -07:00
Edward Jee	c7d39e3263	packet: support per-packet fwmark for af_packet sendmsg Signed-off-by: Edward Hyunkoo Jee <edjee@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:25:22 -07:00
Edward Jee	f28ea365cd	sock: support per-packet fwmark It's useful to allow users to set fwmark for an individual packet, without changing the socket state. The function this patch adds in sock layer can be used by the protocols that need such a feature. Signed-off-by: Edward Hyunkoo Jee <edjee@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:25:21 -07:00
Alexei Starovoitov	1be7f75d16	bpf: enable non-root eBPF programs In order to let unprivileged users load and execute eBPF programs teach verifier to prevent pointer leaks. Verifier will prevent - any arithmetic on pointers (except R10+Imm which is used to compute stack addresses) - comparison of pointers (except if (map_value_ptr == 0) ... ) - passing pointers to helper functions - indirectly passing pointers in stack to helper functions - returning pointer from bpf program - storing pointers into ctx or maps Spill/fill of pointers into stack is allowed, but mangling of pointers stored in the stack or reading them byte by byte is not. Within bpf programs the pointers do exist, since programs need to be able to access maps, pass skb pointer to LD_ABS insns, etc but programs cannot pass such pointer values to the outside or obfuscate them. Only allow BPF_PROG_TYPE_SOCKET_FILTER unprivileged programs, so that socket filters (tcpdump), af_packet (quic acceleration) and future kcm can use it. tracing and tc cls/act program types still require root permissions, since tracing actually needs to be able to see all kernel pointers and tc is for root only. For example, the following unprivileged socket filter program is allowed: int bpf_prog1(struct __sk_buff skb) { u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol)); u64 value = bpf_map_lookup_elem(&my_map, &index); if (value) value += skb->len; return 0; } but the following program is not: int bpf_prog1(struct __sk_buff skb) { u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol)); u64 value = bpf_map_lookup_elem(&my_map, &index); if (value) value += (u64) skb; return 0; } since it would leak the kernel address into the map. Unprivileged socket filter bpf programs have access to the following helper functions: - map lookup/update/delete (but they cannot store kernel pointers into them) - get_random (it's already exposed to unprivileged user space) - get_smp_processor_id - tail_call into another socket filter program - ktime_get_ns The feature is controlled by sysctl kernel.unprivileged_bpf_disabled. This toggle defaults to off (0), but can be set true (1). Once true, bpf programs and maps cannot be accessed from unprivileged process, and the toggle cannot be set back to false. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:13:35 -07:00
Ken-ichirou MATSUZAWA	914eebf2f4	netfilter: nfnetlink_log: autoload nf_conntrack_netlink module NFQA_CFG_F_CONNTRACK config flag This patch enables to load nf_conntrack_netlink module if NFULNL_CFG_F_CONNTRACK config flag is specified. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-12 21:44:12 +02:00
Chuck Lever	3be7f32878	svcrdma: Fix NFS server crash triggered by 1MB NFS WRITE Now that the NFS server advertises a maximum payload size of 1MB for RPC/RDMA again, it crashes in svc_process_common() when NFS client sends a 1MB NFS WRITE on an NFS/RDMA mount. The server has set up a 259 element array of struct page pointers in rq_pages[] for each incoming request. The last element of the array is NULL. When an incoming request has been completely received, rdma_read_complete() attempts to set the starting page of the incoming page vector: rqstp->rq_arg.pages = &rqstp->rq_pages[head->hdr_count]; and the page to use for the reply: rqstp->rq_respages = &rqstp->rq_arg.pages[page_no]; But the value of page_no has already accounted for head->hdr_count. Thus rq_respages now points past the end of the incoming pages. For NFS WRITE operations smaller than the maximum, this is harmless. But when the NFS WRITE operation is as large as the server's max payload size, rq_respages now points at the last entry in rq_pages, which is NULL. Fixes: `cc9a903d91` ('svcrdma: Change maximum server payload . . .') BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=270 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@dev.mellanox.co.il> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-10-12 11:55:43 -04:00
Arnd Bergmann	c932245811	netfilter: bridge: avoid unused label warning With the ARM mini2440_defconfig, the bridge netfilter code gets built with both CONFIG_NF_DEFRAG_IPV4 and CONFIG_NF_DEFRAG_IPV6 disabled, which leads to a harmless gcc warning: net/bridge/br_netfilter_hooks.c: In function 'br_nf_dev_queue_xmit': net/bridge/br_netfilter_hooks.c:792:2: warning: label 'drop' defined but not used [-Wunused-label] This gets rid of the warning by cleaning up the code to avoid the respective #ifdefs causing this problem, and replacing them with if(IS_ENABLED()) checks. I have verified that the resulting object code is unchanged, and an additional advantage is that we now get compile coverage of the unused functions in more configurations. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `dd302b59bd` ("netfilter: bridge: don't leak skb in error paths") Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-12 17:48:36 +02:00
Pablo Neira Ayuso	d53195c259	Merge tag 'ipvs4-for-v4.4' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next Simon Horman says: ==================== Fourth Round of IPVS Updates for v4.4 please consider these build warning cleanups from David Ahern and myself. They resolve some minor side effects of Eric Biederman' heroic work to cleanup IPVS which you recently pulled: its queued up for v4.4 so no need to worry about earlier kernel versions. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-12 17:38:54 +02:00
lucien	cc4998febd	netfilter: ipt_rpfilter: remove the nh_scope test in rpfilter_lookup_reverse --accept-local option works for res.type == RTN_LOCAL, which should be from the local table, but there, the fib_info's nh->nh_scope = RT_SCOPE_NOWHERE ( > RT_SCOPE_HOST). in fib_create_info(). if (cfg->fc_scope == RT_SCOPE_HOST) { struct fib_nh nh = fi->fib_nh; / Local address is added. */ if (nhs != 1 \|\| nh->nh_gw) goto err_inval; nh->nh_scope = RT_SCOPE_NOWHERE; <=== nh->nh_dev = dev_get_by_index(net, fi->fib_nh->nh_oif); err = -ENODEV; if (!nh->nh_dev) goto failure; but in our rpfilter_lookup_reverse(): if (dev_match \|\| flags & XT_RPFILTER_LOOSE) return FIB_RES_NH(res).nh_scope <= RT_SCOPE_HOST; if nh->nh_scope > RT_SCOPE_HOST, it will fail. --accept-local option will never be passed. it seems the test is bogus and can be removed to fix this issue. if (dev_match \|\| flags & XT_RPFILTER_LOOSE) return FIB_RES_NH(res).nh_scope <= RT_SCOPE_HOST; ipv6 does not have this issue. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-12 17:27:48 +02:00
Pablo Neira Ayuso	4302f5eeb9	nfnetlink_cttimeout: add rcu_barrier() on module removal Make sure kfree_rcu() released objects before leaving the module removal exit path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-12 17:04:41 +02:00
Pablo Neira Ayuso	ae2d708ed8	netfilter: conntrack: fix crash on timeout object removal The object and module refcounts are updated for each conntrack template, however, if we delete the iptables rules and we flush the timeout database, we may end up with invalid references to timeout object that are just gone. Resolve this problem by setting the timeout reference to NULL when the custom timeout entry is removed from our base. This patch requires some RCU trickery to ensure safe pointer handling. This handling is similar to what we already do with conntrack helpers, the idea is to avoid bumping the timeout object reference counter from the packet path to avoid the cost of atomic ops. Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-12 17:04:34 +02:00
Pablo Neira Ayuso	403d89ad9c	netfilter: xt_CT: don't put back reference to timeout policy object On success, this shouldn't put back the timeout policy object, otherwise we may have module refcount overflow and we allow deletion of timeout that are still in use. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-12 16:54:45 +02:00
Scott Feldman	c62987bbd8	bridge: push bridge setting ageing_time down to switchdev Use SWITCHDEV_F_SKIP_EOPNOTSUPP to skip over ports in bridge that don't support setting ageing_time (or setting bridge attrs in general). If push fails, don't update ageing_time in bridge and return err to user. If push succeeds, update ageing_time in bridge and run gc_timer now to recalabrate when to run gc_timer next, based on new ageing_time. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 05:20:20 -07:00
Scott Feldman	464314ea6c	switchdev: skip over ports returning -EOPNOTSUPP when recursing ports This allows us to recurse over all the ports, skipping over unsupporting ports. Without the change, the recursion would stop at first unsupported port. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 05:20:20 -07:00
Richard Sailer	7533ce3055	tcp: change type of alive from int to bool The alive parameter of tcp_orphan_retries, indicates whether the connection is assumed alive or not. In the function and all places calling it is used as a boolean value. Therefore this changes the type of alive to bool in the function definition and all calling locations. Since tcp_orphan_tries is a tcp_timer.c local function no change in any other file or header is necessary. Signed-off-by: Richard Sailer <richard@weltraumpflege.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 05:15:03 -07:00
Roopa Prabhu	3741873b4f	bridge: allow adding of fdb entries pointing to the bridge device This patch enables adding of fdb entries pointing to the bridge device. This can be used to propagate mac address of vlan interfaces configured on top of the vlan filtering bridge. Before: $bridge fdb add 44:38:39:00:27:9f dev bridge RTNETLINK answers: Invalid argument After: $bridge fdb add 44:38:39:00:27:9f dev bridge Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 05:11:58 -07:00
Eric Dumazet	6bcfd7f8c2	tcp: fix RFS vs lockless listeners Before recent TCP listener patches, we were updating listener sk->sk_rxhash before the cloning of master socket. children sk_rxhash was therefore correct after the normal 3WHS. But with lockless listener, we no longer dirty/change listener sk_rxhash as it would be racy. We need to correctly update the child sk_rxhash, otherwise first data packet wont hit correct cpu if RFS is used. Fixes: `079096f103` ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Willem de Bruijn <willemb@google.com> Cc: Tom Herbert <tom@herbertland.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-11 05:33:15 -07:00
Hannes Frederic Sowa	9ef2e965e5	ipv6: drop frames with attached skb->sk in forwarding This is a clone of commit `2ab957492d` ("ip_forward: Drop frames with attached skb->sk") for ipv6. This commit has exactly the same reasons as the above mentioned commit, namely to prevent panics during netfilter reload or a misconfigured stack. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-11 05:30:44 -07:00
Hannes Frederic Sowa	d9e4ce65b2	ipv6: gre: setup default multicast routes over PtP links GRE point-to-point interfaces should also support ipv6 multicast. Setting up default multicast routes on interface creation was forgotten. Add it. Bugzilla: <https://bugzilla.kernel.org/show_bug.cgi?id=103231> Cc: Julien Muchembled <jm@jmuchemb.eu> Cc: Eric Dumazet <edumazet@google.com> Cc: Nicolas Dumazet <ndumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-11 05:30:43 -07:00
Vivien Didelot	8057b3e7a1	net: dsa: use switchdev obj in port_fdb_del For consistency with the FDB add operation, propagate the switchdev_obj_port_fdb structure in the DSA drivers. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-11 05:28:52 -07:00
Vivien Didelot	1f36faf269	net: dsa: push prepare phase in port_fdb_add Now that the prepare phase is pushed down to the DSA drivers, propagate it to the port_fdb_add function. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-11 05:28:50 -07:00
Vivien Didelot	146a32067b	net: dsa: add port_fdb_prepare Push the prepare phase for FDB operations down to the DSA drivers, with a new port_fdb_prepare function. Currently only mv88e6xxx is affected. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-11 05:28:49 -07:00
David S. Miller	7bcfeead48	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-10-08 Here's another set of Bluetooth & 802.15.4 patches for the 4.4 kernel. 802.15.4: - Many improvements & fixes to the mrf24j40 driver - Fixes and cleanups to nl802154, mac802154 & ieee802154 code Bluetooth: - New chipset support in btmrvl driver - Fixes & cleanups to btbcm, btmrvl, bpa10x & btintel drivers - Support for vendor specific diagnostic data through common API - Cleanups to the 6lowpan code - New events & message types for monitor channel Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-11 05:15:30 -07:00
Eric Dumazet	e446f9dfe1	net: synack packets can be attached to request sockets selinux needs few changes to accommodate fact that SYNACK messages can be attached to a request socket, lacking sk_security pointer (Only syncookies are still attached to a TCP_LISTEN socket) Adds a new sk_listener() helper, and use it in selinux and sch_fq Fixes: `ca6fb06518` ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported by: kernel test robot <ying.huang@linux.intel.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Eric Paris <eparis@parisplace.org> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-11 05:05:06 -07:00
WANG Cong	6ac644a8ae	sch_hhf: fix return value of hhf_drop() Similar to commit `c0afd9ce4d` ("fq_codel: fix return value of fq_codel_drop()") ->drop() is supposed to return the number of bytes it dropped, but hhf_drop () returns the id of the bucket where it drops a packet from. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Terry Lam <vtlam@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-11 04:49:33 -07:00
Alexei Starovoitov	ff936a04e5	bpf: fix cb access in socket filter programs eBPF socket filter programs may see junk in 'u32 cb[5]' area, since it could have been used by protocol layers earlier. For socket filter programs used in af_packet we need to clean 20 bytes of skb->cb area if it could be used by the program. For programs attached to TCP/UDP sockets we need to save/restore these 20 bytes, since it's used by protocol layers. Remove SK_RUN_FILTER macro, since it's no longer used. Long term we may move this bpf cb area to per-cpu scratch, but that requires addition of new 'per-cpu load/store' instructions, so not suitable as a short term fix. Fixes: `d691f9e8d4` ("bpf: allow programs to write to certain skb fields") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-11 04:40:05 -07:00
Linus Torvalds	38aa0a59a6	Just one RDMA bugfix. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWFW4mAAoJECebzXlCjuG+YQ8P/2cfPRV2QZHK0BxlHooM6WII ZyIOMYU9KHxtoolC7UWfTy6y+ohDzisByYS59Tpd9k0d2NWqtMgUTLHS1UbjcekF RBMkhqv8VLDMupiBVElaO4/FvSqhP4YTpB/YvFHn8K4i2+NnfwL4c707SlxAk2tA SKhvgZVIS/N+VYpQo5hFZ1RofTQ7zWsvzPEsAOJR0pbBhEFE0WemZ12nQwkdkmRI 2/R5XbT0ngSpCBRo2OcUoCHTozJG90gVfsu8IGzs/QeqlYZ9dVxWOUh8WDP2gmDF iB/KrUnv+gsMg4pLKrN9pbBMi8o6zvrbe7IMNjZEhA7qqcEwgf94hViYgrGdIDlS pqWWf/YMYWZzT0K1U8DuqjzQyeuTjRNv7RkALBFi54kQC6T49PIDbJruerhVVdzZ sgmDB/4kaSJF8yutetuRogskC+E7BaqhnAqu+VDin0UCFMl2GUb+3yof7GawbQcD uhPNMhn94LI6zXEzd86dKCc2ZwwNRfJYpfy5gYUmRHSHllZUSQdCqT4s3oIa4eFB RNqd0/AulHNgRJuXX/wMPZh5IWr9AnLp1WfJXRbY6hu5Q8+btsFG1wEBuQr3USTZ D5yJexpVQRNSmPWllLwfXkGFY4tiJA/TNDZxwrgocamnvxdrRw82HoFNvpRKVFEn AZFB4UR4JbqCe4LmBV/r =Jent -----END PGP SIGNATURE----- Merge tag 'nfsd-4.3-1' of git://linux-nfs.org/~bfields/linux Pull nfsd bugfix from Bruce Fields: "Just one RDMA bugfix" * tag 'nfsd-4.3-1' of git://linux-nfs.org/~bfields/linux: svcrdma: handle rdma read with a non-zero initial page offset	2015-10-09 16:34:45 -07:00
Paul Gortmaker	075640e364	net/sched: make sch_blackhole.c explicitly non-modular The Kconfig currently controlling compilation of this code is: net/sched/Kconfig:menuconfig NET_SCHED net/sched/Kconfig: bool "QoS and/or fair queueing" ...meaning that it currently is not being built as a module by anyone. Lets remove the modular code that is essentially orphaned, so that when reading the driver there is no doubt it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering remains unchanged with this commit. We can change to one of the other priority initcalls (subsys?) at any later date, if desired. We also delete the MODULE_LICENSE tag since all that information is already contained at the top of the file in the comments. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-09 07:52:28 -07:00
Paul Gortmaker	36b9ad8084	net/dcb: make dcbnl.c explicitly non-modular The Kconfig currently controlling compilation of this code is: net/dcb/Kconfig:config DCB net/dcb/Kconfig: bool "Data Center Bridging support" ...meaning that it currently is not being built as a module by anyone. Lets remove the modular code that is essentially orphaned, so that when reading the driver there is no doubt it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering remains unchanged with this commit. We can change to one of the other priority initcalls (subsys?) at any later date, if desired. We also delete the MODULE_LICENSE tag etc. since all that information is (or is now) already contained at the top of the file in the comments. Cc: "David S. Miller" <davem@davemloft.net> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Anish Bhatt <anish@chelsio.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Shani Michaeli <shanim@mellanox.com> Cc: netdev@vger.kernel.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-09 07:52:27 -07:00
Paul Gortmaker	b6191aeeec	net/core: make sock_diag.c explicitly non-modular The Makefile currently controlling compilation of this code lists it under "obj-y" ...meaning that it currently is not being built as a module by anyone. Lets remove the modular code that is essentially orphaned, so that when reading the driver there is no doubt it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering remains unchanged with this commit. We can change to one of the other priority initcalls (subsys?) at any later date, if desired. We can't remove module.h since the file uses other module related stuff even though it is not modular itself. We move the information from the MODULE_LICENSE tag to the top of the file, since that information is not captured anywhere else. The MODULE_ALIAS_NET_PF_PROTO becomes a no-op in the non modular case, so it is removed. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Craig Gallek <kraig@google.com> Cc: netdev@vger.kernel.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-09 07:52:27 -07:00
Yaowei Bai	0cbf334376	net/core: lockdep_rtnl_is_held can be boolean This patch makes lockdep_rtnl_is_held return bool due to this particular function only using either one or zero as its return value. In another patch lockdep_is_held is also made return bool. No functional change. Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-09 07:49:06 -07:00
Yaowei Bai	45ae74f561	net/dccp: dccp_bad_service_code can be boolean This patch makes dccp_bad_service_code return bool due to these particular functions only using either one or zero as their return value. dccp_list_has_service is also been made return bool in this patchset. No functional change. Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-09 07:49:03 -07:00
Yaowei Bai	875e082949	net/nfnetlink: lockdep_nfnl_is_held can be boolean This patch makes lockdep_nfnl_is_held return bool to improve readability due to this particular function only using either one or zero as its return value. No functional change. Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-09 07:49:00 -07:00
Yaowei Bai	61d03535e4	net/netlink: lockdep_genl_is_held can be boolean This patch makes lockdep_genl_is_held return bool to improve readability due to this particular function only using either one or zero as its return value. No functional change. Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-09 07:48:59 -07:00
Trond Myklebust	31303d6cbb	SUNRPC: Use MSG_SENDPAGE_NOTLAST in xs_send_pagedata() If we're sending more than one page via kernel_sendpage(), then set MSG_SENDPAGE_NOTLAST between the pages so that we don't send suboptimal frames (see commit `2f53384424` and commit `35f9c09fe9`). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-10-08 09:12:35 -04:00
Marcel Holtmann	f640ee98bb	Bluetooth: Fix basic debugfs entries for unconfigured controllers When the controller is unconfigured (for example it does not have a valid Bluetooth address), then the basic debugfs entries for dut_mode and vendor_diag are not creates. Ensure they are created in __hci_init and also __hci_unconf_init functions. One of them is called during setup stage of a new controller. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-08 15:33:18 +03:00
Trond Myklebust	a26480942c	SUNRPC: Move AF_LOCAL receive data path into a workqueue context Now that we've done it for TCP and UDP, let's convert AF_LOCAL as well. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-10-08 08:27:05 -04:00
Trond Myklebust	f9b2ee714c	SUNRPC: Move UDP receive data path into a workqueue context Now that we've done it for TCP, let's convert UDP as well. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-10-08 08:27:04 -04:00
Trond Myklebust	edc1b01cd3	SUNRPC: Move TCP receive data path into a workqueue context Stream protocols such as TCP can often build up a backlog of data to be read due to ordering. Combine this with the fact that some workloads such as NFS read()-intensive workloads need to receive a lot of data per RPC call, and it turns out that receiving the data from inside a softirq context can cause starvation. The following patch moves the TCP data receive into a workqueue context. We still end up calling tcp_read_sock(), but we do so from a process context, meaning that softirqs are enabled for most of the time. With this patch, I see a doubling of read bandwidth when running a multi-threaded iozone workload between a virtual client and server setup. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-10-08 08:27:04 -04:00
Trond Myklebust	66d7a56a62	SUNRPC: Refactor TCP receive Move the TCP data receive loop out of xs_tcp_data_ready(). Doing so will allow us to move the data receive out of the softirq context in a set of followup patches. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-10-08 08:27:04 -04:00
Daniel Borkmann	3ad0040573	bpf: split state from prandom_u32() and consolidate {c, e}BPF prngs While recently arguing on a seccomp discussion that raw prandom_u32() access shouldn't be exposed to unpriviledged user space, I forgot the fact that SKF_AD_RANDOM extension actually already does it for some time in cBPF via commit `4cd3675ebf` ("filter: added BPF random opcode"). Since prandom_u32() is being used in a lot of critical networking code, lets be more conservative and split their states. Furthermore, consolidate eBPF and cBPF prandom handlers to use the new internal PRNG. For eBPF, bpf_get_prandom_u32() was only accessible for priviledged users, but should that change one day, we also don't want to leak raw sequences through things like eBPF maps. One thought was also to have own per bpf_prog states, but due to ABI reasons this is not easily possible, i.e. the program code currently cannot access bpf_prog itself, and copying the rnd_state to/from the stack scratch space whenever a program uses the prng seems not really worth the trouble and seems too hacky. If needed, taus113 could in such cases be implemented within eBPF using a map entry to keep the state space, or get_random_bytes() could become a second helper in cases where performance would not be critical. Both sides can trigger a one-time late init via prandom_init_once() on the shared state. Performance-wise, there should even be a tiny gain as bpf_user_rnd_u32() saves one function call. The PRNG needs to live inside the BPF core since kernels could have a NET-less config as well. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Chema Gonzalez <chema@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 05:26:39 -07:00
Hannes Frederic Sowa	46234253b9	net: move net_get_random_once to lib There's no good reason why users outside of networking should not be using this facility, f.e. for initializing their seeds. Therefore, make it accessible from there as get_random_once(). Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 05:26:35 -07:00
Alexander Aring	4d6a6aed22	6lowpan: move shared settings to lowpan_netdev_setup This patch moves values for all lowpan interface to the shared implementation of 6lowpan. This patch also quietly fixes the forgotten IFF_NO_QUEUE flag for the bluetooth 6LoWPAN interface. An identically commit is `4afbc0d` ("net: 6lowpan: convert to using IFF_NO_QUEUE") which wasn't changed for bluetooth 6lowpan. All 6lowpan interfaces should be virtual with IFF_NO_QUEUE, using EUI64 address length, the mtu size is 1280 (IPV6_MIN_MTU) and the netdev type is ARPHRD_6LOWPAN. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-08 14:25:34 +02:00
David Ahern	28335a7445	net: Do not drop to make_route if oif is l3mdev Commit `deaa0a6a93` ("net: Lookup actual route when oif is VRF device") exposed a bug in __ip_route_output_key_hash for VRF devices: on FIB lookup failure if the oif is specified the current logic drops to make_route on the assumption that the route tables are wrong. For VRF/L3 master devices this leads to wrong dst entries and route lookups. For example: $ ip route ls table vrf-red unreachable default broadcast 10.2.1.0 dev eth1 proto kernel scope link src 10.2.1.2 10.2.1.0/24 dev eth1 proto kernel scope link src 10.2.1.2 local 10.2.1.2 dev eth1 proto kernel scope host src 10.2.1.2 broadcast 10.2.1.255 dev eth1 proto kernel scope link src 10.2.1.2 $ ip route get oif vrf-red 1.1.1.1 1.1.1.1 dev vrf-red src 10.0.0.2 cache With this patch: $ ip route get oif vrf-red 1.1.1.1 RTNETLINK answers: No route to host which is the correct response based on the default route Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 05:18:47 -07:00
Daniel Borkmann	cfc81b5038	bpf, skb_do_redirect: clear sender_cpu before xmit Similar to commit `c29390c6df` ("xps: must clear sender_cpu before forwarding"), we also need to clear the skb->sender_cpu when moving from RX to TX via skb_do_redirect() due to the shared location of napi_id (used on RX) and sender_cpu (used on TX). Fixes: `27b29f6305` ("bpf: add bpf_redirect() helper") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 05:03:08 -07:00
Alexei Starovoitov	6bf0577374	bpf: clear sender_cpu before xmit Similar to commit `c29390c6df` ("xps: must clear sender_cpu before forwarding") the skb->sender_cpu needs to be cleared before xmit. Fixes: `3896d655f4` ("bpf: introduce bpf_clone_redirect() helper") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 05:02:08 -07:00
WANG Cong	d40496a564	act_mirred: clear sender cpu before sending to tx Similar to commit `c29390c6df` ("xps: must clear sender_cpu before forwarding") the skb->sender_cpu needs to be cleared when moving from Rx Tx, otherwise kernel could crash. Fixes: `2bd82484bb` ("xps: fix xps for stacked devices") Cc: Eric Dumazet <edumazet@google.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:59:04 -07:00
David S. Miller	91d2f14bc3	Merge branch 'net/rds/4.3-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux Santosh Shilimkar says: ==================== RDS: connection scalability and performance improvements [v4] Re-sending the same patches from v3 again since my repost of patch 05/14 from v3 was whitespace damaged. [v3] Updated patch "[PATCH v2 05/14] RDS: defer the over_batch work to send worker" as per David Miller's comment [4] to avoid the magic value usage. Patch now makes use of already available but unused send_batch_count module parameter. Rest of the patches are same as earlier version v2 [3] [v2]: Dropped "[PATCH 05/15] RDS: increase size of hash-table to 8K" from earlier version [1]. I plan to address the hash table scalability using re-sizable hash tables as suggested by David Laight and David Miller [2] This series addresses RDS connection bottlenecks on massive workloads and improve the RDMA performance almost by 3X. RDS TCP also gets a small gain of about 12%. RDS is being used in massive systems with high scalability where several hundred thousand end points and tens of thousands of local processes are operating in tens of thousand sockets. Being RC(reliable connection), socket bind and release happens very often and any inefficiencies in bind hash look ups hurts the overall system performance. RDS bin hash-table uses global spin-lock which is the biggest bottleneck. To make matter worst, it uses rcu inside global lock for hash buckets. This is being addressed by simply using per bucket rw lock which makes the locking simple and very efficient. The hash table size is still an issue and I plan to address it by using re-sizable hash tables as suggested on the list. For RDS RDMA improvement, the completion handling is revamped so that we can do batch completions. Both send and receive completion handlers are split logically to achieve the same. RDS 8K messages being one of the key usecase, mr pool is adapted to have the 8K mrs along with default 1M mrs. And while doing this, few fixes and couple of bottlenecks seen with rds_sendmsg() are addressed. Series applies against 4.3-rc1 as well net-next. Its tested on Oracle hardware with IB fabric for both bcopy as well as RDMA mode. RDS TCP is tested with iXGB NIC. Like last time, iWARP transport is untested with these changes. The patchset is also available at below git repo: git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux.git net/rds/4.3-v3 As a side note, the IB HCA driver I used for testing misses at least 3 important patches in upstream to see the full blown IB performance and am hoping to get that in mainline with help of them. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:38:37 -07:00
Eric W. Biederman	ede2059dba	dst: Pass net into dst->output The network namespace is already passed into dst_output pass it into dst->output lwt->output and friends. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:27:03 -07:00
Eric W. Biederman	33224b16ff	ipv4, ipv6: Pass net into ip_local_out and ip6_local_out Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:27:02 -07:00
Eric W. Biederman	cf91a99daa	ipv4, ipv6: Pass net into __ip_local_out and __ip6_local_out Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:27:02 -07:00
Eric W. Biederman	77589ce0f8	ipv4: Cache net in ip_build_and_send_pkt and ip_queue_xmit Compute net and store it in a variable in the functions ip_build_and_send_pkt and ip_queue_xmit so that it does not need to be recomputed next time it is needed. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:59 -07:00
Eric W. Biederman	f859b0f662	ipv4: Cache net in iptunnel_xmit Store net in a variable in ip_tunnel_xmit so it does not need to be recomputed when it is used again. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:59 -07:00
Eric W. Biederman	792883303c	ipv6: Merge ip6_local_out and ip6_local_out_sk Stop hidding the sk parameter with an inline helper function and make all of the callers pass it, so that it is clear what the function is doing. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:58 -07:00
Eric W. Biederman	9f8955cc46	ipv6: Merge __ip6_local_out and __ip6_local_out_sk Only __ip6_local_out_sk has callers so rename __ip6_local_out_sk __ip6_local_out and remove the previous __ip6_local_out. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:58 -07:00
Eric W. Biederman	e2cb77db08	ipv4: Merge ip_local_out and ip_local_out_sk It is confusing and silly hiding a parameter so modify all of the callers to pass in the appropriate socket or skb->sk if no socket is known. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:57 -07:00
Eric W. Biederman	b92dacd456	ipv4: Merge __ip_local_out and __ip_local_out_sk Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:57 -07:00
Eric W. Biederman	4ebdfba73c	dst: Pass a sk into .local_out For consistency with the other similar methods in the kernel pass a struct sock into the dst_ops .local_out method. Simplifying the socket passing case is needed a prequel to passing a struct net reference into .local_out. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:55 -07:00
Eric W. Biederman	13206b6bff	net: Pass net into dst_output and remove dst_output_okfn Replace dst_output_okfn with dst_output Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:54 -07:00
Eric W. Biederman	3f5312ae62	xfrm: Only compute net once in xfrm_policy_queue_process Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:53 -07:00
Eric W. Biederman	850dcc4d4d	ipv4: Fix ip_queue_xmit to pass sk into ip_local_out_sk After a packet has been encapsulated by a tunnel we should use the tunnel sockets local multicast loopback flag to control if the encapsulated packet should be locally loopback back. Pass sk into ip_local_out_sk so that in the rare case we are dealing with a tunneled packet whose tunnel destination address is a multicast address the kernel properly decides to loopback this packet. In practice I don't think this matters as ip_queue_xmit is used by tcp, l2tp and sctp none of which I am aware of uses ip level multicasting as they are all point to point communications protocols. Let's fix this before someone uses ip_queue_xmit for a tunnel protocol that does use multicast. Fixes: `aad88724c9` ("ipv4: add a sock pointer to dst->output() path.") Fixes: `b0270e9101` ("ipv4: add a sock pointer to ip_queue_xmit()") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:52 -07:00
Eric W. Biederman	fd2874b3bb	ipv4: Fix ip_local_out_sk by passing the sk into __ip_local_out_sk In the rare case where sk != skb->sk ip_local_out_sk arranges to call dst->output differently if the skb is queued or not. This is a bug. Fix this bug by passing the sk parameter of ip_local_out_sk through from ip_local_out_sk to __ip_local_out_sk (skipping __ip_local_out). Fixes: `7026b1ddb6` ("netfilter: Pass socket pointer down through okfn().") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:52 -07:00
Christoph Hellwig	e622f2f4ad	IB: split struct ib_send_wr This patch split up struct ib_send_wr so that all non-trivial verbs use their own structure which embedds struct ib_send_wr. This dramaticly shrinks the size of a WR for most common operations: sizeof(struct ib_send_wr) (old): 96 sizeof(struct ib_send_wr): 48 sizeof(struct ib_rdma_wr): 64 sizeof(struct ib_atomic_wr): 96 sizeof(struct ib_ud_wr): 88 sizeof(struct ib_fast_reg_wr): 88 sizeof(struct ib_bind_mw_wr): 96 sizeof(struct ib_sig_handover_wr): 80 And with Sagi's pending MR rework the fast registration WR will also be down to a reasonable size: sizeof(struct ib_fastreg_wr): 64 Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt] Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc] Tested-by: Haggai Eran <haggaie@mellanox.com> Tested-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Steve Wise <swise@opengridcomputing.com>	2015-10-08 11:09:10 +01:00
Felix Fietkau	4d57c67827	mac80211: add missing struct ieee80211_txq tid field initialization Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-08 11:10:50 +02:00
Johan Hedberg	26d46dffbe	Bluetooth: 6lowpan: Remove unnecessary chan_get() function The chan_get() function just adds unnecessary indirection to calling the chan_create() call. The only added value it gives is the chan->ops assignment, but that can equally well be done in the calling code. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-08 10:43:52 +02:00
Johan Hedberg	0cd088fc97	Bluetooth: 6lowpan: Rename confusing 'pchan' variables The typical convention when having both a child and a parent channel variable is to call the former 'chan' and the latter 'pchan'. When there's only one variable it's called chan. Rename the 'pchan' variables in the 6lowpan code to follow this convention. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-08 10:43:52 +02:00
Johan Hedberg	630ef791ea	Bluetooth: 6lowpan: Remove unnecessary chan_open() function All the chan_open() function now does is to call chan_create() so it doesn't really add any value. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-08 10:43:52 +02:00
Johan Hedberg	b0c09f94ff	Bluetooth: 6lowpan: Remove redundant BT_CONNECTED assignment The L2CAP core code makes sure of setting the channel state to BT_CONNECTED, so there's no need for the implementation code (6lowpan in this case) to do it. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-08 10:43:52 +02:00
Johan Hedberg	5d0fd77a04	Bluetooth: 6lowpan: Remove redundant (and incorrect) MPS assignments The L2CAP core code already sets the local MPS to a sane value. The remote MPS value otoh comes from the remote side so there's no point in trying to hard-code it to any value. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-08 10:43:52 +02:00
Johan Hedberg	301de2cb6a	Bluetooth: 6lowpan: Fix imtu & omtu values The omtu value is determined by the remote peer so there's no point in trying to hard-code it to any value. The IPSP specification otoh gives a more reasonable value for the imtu, i.e. 1280. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-08 10:43:52 +02:00
Marcel Holtmann	fe806dcede	Bluetooth: Enforce packet types in hci_recv_frame driver function When calling the hci_recv_frame driver function check for valid packet types that the core should process. This should catch issues with drivers trying to feed vendor packet types through this interface. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-08 10:05:41 +03:00
Marcel Holtmann	acc649c654	Bluetooth: Fix interaction of HCI_QUIRK_RESET_ON_CLOSE and HCI_AUTO_OFF When the controller requires the HCI Reset command to be send when closing the transport, the HCI_AUTO_OFF needs to be accounted for. The current code tries to actually do that, but the flag gets cleared to early. So store its value and use it that stored value instead of checking for a flag that is always cleared. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-08 10:00:05 +03:00
Marcel Holtmann	4b4113d6db	Bluetooth: Add debugfs entry for setting vendor diagnostic mode This adds a new debugfs entry for enabling and disabling the vendor diagnostic mode. It is only exposed for drivers that provide the set_diag driver callback and actually have an option for vendor specific diagnostic information. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-08 09:57:07 +03:00
Marcel Holtmann	e875ff8407	Bluetooth: Add support for vendor specific diagnostic channel Introduce hci_recv_diag function for HCI drivers to allow sending vendor specific diagnostic messages into the Bluetooth core stack. The messages are not processed, but they are forwarded to the monitor channel and can be retrieved by user space diagnostic tools. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-08 09:51:13 +03:00
Marcel Holtmann	6c566dd5a1	Bluetooth: Send index information updates to monitor channel The Bluetooth public device address might change during controller setup and it makes it a lot simpler for monitoring tools if they just get told what the new address is. In addition include the manufacturer / company information of the controller. That allows for easy vendor specific HCI command and event handling. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-08 09:48:34 +03:00
Eric Dumazet	acb4a6bfc8	tcp: ensure prior synack rtx behavior with small backlogs Some applications use a listen() backlog of 1. Prior kernels were silently enforcing a qlen_log of 4, so that we were sending up to /proc/sys/net/ipv4/tcp_synack_retries SYNACK messages. Fixes: `ef547f2ac1` ("tcp: remove max_qlen_log") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 05:08:58 -07:00
Joe Stringer	ab38a7b5a4	openvswitch: Change CT_ATTR_FLAGS to CT_ATTR_COMMIT Previously, the CT_ATTR_FLAGS attribute, when nested under the OVS_ACTION_ATTR_CT, encoded a 32-bit bitmask of flags that modify the semantics of the ct action. It's more extensible to just represent each flag as a nested attribute, and this requires no additional error checking to reject flags that aren't currently supported. Suggested-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 05:03:06 -07:00
Joe Stringer	fbccce5965	openvswitch: Extend ct_state match field to 32 bits The ct_state field was initially added as an 8-bit field, however six of the bits are already being used and use cases are already starting to appear that may push the limits of this field. This patch extends the field to 32 bits while retaining the internal representation of 8 bits. This should cover forward compatibility of the ABI for the foreseeable future. This patch also reorders the OVS_CS_F_* bits to be sequential. Suggested-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 05:03:06 -07:00
Joe Stringer	6f22595246	openvswitch: Reject ct_state unsupported bits Previously, if userspace specified ct_state bits in the flow key which are currently undefined (and therefore unsupported), then they would be ignored. This could cause unexpected behaviour in future if userspace is extended to support additional bits but attempts to communicate with the current version of the kernel. This patch rectifies the situation by rejecting such ct_state bits. Fixes: `7f8a436eaa` "openvswitch: Add conntrack action" Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 05:03:05 -07:00
Joe Stringer	ec0d043d05	openvswitch: Ensure flow is valid before executing ct The ct action uses parts of the flow key, so we need to ensure that it is valid before executing that action. Fixes: `7f8a436eaa` "openvswitch: Add conntrack action" Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 05:03:05 -07:00
Joe Stringer	b8f2257069	openvswitch: Fix skb leak in ovs_fragment() If ovs_fragment() was unable to fragment the skb due to an L2 header that exceeds the supported length, skbs would be leaked. Fix the bug. Fixes: `7f8a436eaa` "openvswitch: Add conntrack action" Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 05:03:03 -07:00
Yuvaraja Mariappan	686a562449	net: ipv4: tcp.c Fixed an assignment coding style issue Fixed an assignment coding style issue Signed-off-by: Yuvaraja Mariappan <ymariappan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 05:01:04 -07:00
Neil Armstrong	4d7f3e757c	net: dsa: exit probe if no switch were found If no switch were found in dsa_setup_dst, return -ENODEV and exit the dsa_probe cleanly. Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:56:11 -07:00
Neil Armstrong	d4ac35d6ed	net: dsa: switch to devm_ calls and remove kfree calls Now the kfree calls exists in the the remove functions, remove them in all places except the of_probe functions and replace allocation calls with their devm_ counterparts. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:56:09 -07:00
Neil Armstrong	cbc5d90b37	net: dsa: complete dsa_switch_destroy When unbinding dsa, complete the dsa_switch_destroy to unregister the fixed link phy then cleanly unregister and destroy the net devices. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:56:08 -07:00
Neil Armstrong	e410ddb89e	net: dsa: add missing dsa_switch mdiobus remove To prevent memory leakage on unbinding, add missing mdiobus unregister and unallocation calls. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:56:07 -07:00
Neil Armstrong	1023d2ec1e	net: dsa: add missing kfree on remove To prevent memory leakage on unbinding, add missing kfree calls. Includes minor cosmetic change to make patch clean. Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:56:06 -07:00
Nikolay Aleksandrov	5d6ae479ab	bridge: netlink: add support for port's multicast_router attribute Add IFLA_BRPORT_MULTICAST_ROUTER to allow setting/getting port's multicast_router via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:49:34 -07:00
Nikolay Aleksandrov	9b0c6e4deb	bridge: netlink: allow to flush port's fdb Add IFLA_BRPORT_FLUSH to allow flushing port's fdb similar to sysfs's flush. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:49:32 -07:00
Nikolay Aleksandrov	61c0a9a83e	bridge: netlink: export port's timer values Add the following attributes in order to export port's timer values: IFLA_BRPORT_MESSAGE_AGE_TIMER, IFLA_BRPORT_FORWARD_DELAY_TIMER and IFLA_BRPORT_HOLD_TIMER. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:49:31 -07:00
Nikolay Aleksandrov	e08e838ac5	bridge: netlink: export port's topology_change_ack and config_pending Add IFLA_BRPORT_TOPOLOGY_CHANGE_ACK and IFLA_BRPORT_CONFIG_PENDING to allow getting port's topology_change_ack and config_pending respectively via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:49:30 -07:00
Nikolay Aleksandrov	42d452c4b5	bridge: netlink: export port's id and number Add IFLA_BRPORT_(ID\|NO) to allow getting port's port_id and port_no respectively via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:49:29 -07:00
Nikolay Aleksandrov	96f94e7f4a	bridge: netlink: export port's designated cost and port Add IFLA_BRPORT_DESIGNATED_(COST\|PORT) to allow getting the port's designated cost and port respectively via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:49:29 -07:00
Nikolay Aleksandrov	80df9a2692	bridge: netlink: export port's bridge id Add IFLA_BRPORT_BRIDGE_ID to allow getting the designated bridge id via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:49:28 -07:00
Nikolay Aleksandrov	4ebc7660ab	bridge: netlink: export port's root id Add IFLA_BRPORT_ROOT_ID to allow getting the designated root id via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:49:27 -07:00
David Ahern	deaa0a6a93	net: Lookup actual route when oif is VRF device If the user specifies a VRF device in a get route query the custom route pointing to the VRF device is returned: $ ip route ls table vrf-red unreachable default broadcast 10.2.1.0 dev eth1 proto kernel scope link src 10.2.1.2 10.2.1.0/24 dev eth1 proto kernel scope link src 10.2.1.2 local 10.2.1.2 dev eth1 proto kernel scope host src 10.2.1.2 broadcast 10.2.1.255 dev eth1 proto kernel scope link src 10.2.1.2 $ ip route get oif vrf-red 10.2.1.40 10.2.1.40 dev vrf-red cache Add the flags to skip the custom route and go directly to the FIB. With this patch the actual route is returned: $ ip route get oif vrf-red 10.2.1.40 10.2.1.40 dev eth1 src 10.2.1.2 cache Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:31:16 -07:00
David S. Miller	2579c98f0d	For the current cycle, we have the following right now: * many internal fixes, API improvements, cleanups, etc. * full AP client state tracking in cfg80211/mac80211 from Ayala * VHT support (in mac80211) for mesh * some A-MSDU in A-MPDU support from Emmanuel * show current TX power to userspace (from Rafał) * support for netlink dump in vendor commands (myself) -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJWEp5XAAoJEDBSmw7B7bqr8DsQAICgQL7gSkHUlc6rbMJ9MzX+ 9W0SNpZHSmfE0ZsL3cCoeHbk5dGhX82GumIz4GeqtvIKUNHkC8qlnXJIKTEva+sp PjcF1wS0qQFdt6sg/Zxq+4Q8lZrZf1xP9W0x0ORYi9d9qej07JAZku8zYt4agpNV R4nCl/gKVF375aV8y+qi+WSZXx4j80dJkokoVk4hzotWjd0bGVL1T9YwDRzxg4FI S0DnkxlsD3MRHJXq+9+DbF5cuTjCG2LZNcDIBy455eWN27j9CWgEPVXoySQjDgQc ayf2siw7BccqnV84et0vi+0WYXdZCHm3zCen44s4vaCflhdGxdx48V+Lib6mluR3 OEM1V1l9uV97UyORPljRKvDURq2IUdLQw00of26CTX8qEnmQIfxC7qaRg0rYEiGW SbTClbEiEkBLV+sCStnkv8GJHNpvtI/2VQXH1ydrHsrWC3Sl9bpPOWYlNBPwdzM9 U4zgpxf6gLqlsukQKmMDmoKW7T04Fs0qgE99ThU2x6uTGsux8bfbxgzPCfUdeY8M HmCB5oBCZKJ5pzv6z6lUGc0cO42IL50aBrrlatrEekjevUXW3MMOZCwGrUXxpMw1 gd+2PnLCCUeDyKNvkpXEgr4uS9Egc0sWH1RlpDPaAA5gRdRHiDn7MK7Z+s5OpNIC wnFCQKB+KrNNrQFuXz9k =BF9F -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2015-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== For the current cycle, we have the following right now: * many internal fixes, API improvements, cleanups, etc. * full AP client state tracking in cfg80211/mac80211 from Ayala * VHT support (in mac80211) for mesh * some A-MSDU in A-MPDU support from Emmanuel * show current TX power to userspace (from Rafał) * support for netlink dump in vendor commands (myself) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:29:18 -07:00
David Ahern	bb191c3e87	net: Add l3mdev saddr lookup to raw_sendmsg ping originated on box through a VRF device is showing up in tcpdump without a source address: $ tcpdump -n -i vrf-blue 08:58:33.311303 IP 0.0.0.0 > 10.2.2.254: ICMP echo request, id 2834, seq 1, length 64 08:58:33.311562 IP 10.2.2.254 > 10.2.2.2: ICMP echo reply, id 2834, seq 1, length 64 Add the call to l3mdev_get_saddr to raw_sendmsg. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:27:46 -07:00
David Ahern	8cbb512c92	net: Add source address lookup op for VRF Add operation to l3mdev to lookup source address for a given flow. Add support for the operation to VRF driver and convert existing IPv4 hooks to use the new lookup. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:27:44 -07:00
David Ahern	3ce58d8435	net: Refactor path selection in __ip_route_output_key_hash VRF device needs the same path selection following lookup to set source address. Rather than duplicating code, move existing code into a function that is exported to modules. Code move only; no functional change. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:27:44 -07:00
David Ahern	fee6d4c777	net: Add netif_is_l3_slave IPv6 addrconf keys off of IFF_SLAVE so can not use it for L3 slave. Add a new private flag and add netif_is_l3_slave function for checking it. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:27:43 -07:00
David Ahern	6e2895a8e3	net: Rename FLOWI_FLAG_VRFSRC to FLOWI_FLAG_L3MDEV_SRC Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:27:42 -07:00
David Ahern	6e28b00082	net: Fix vti use case with oif in dst lookups for IPv6 It occurred to me yesterday that `741a11d9e4` ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set") means that xfrm6_dst_lookup needs the FLOWI_FLAG_SKIP_NH_OIF flag set. This latest commit causes the oif to be considered in lookups which is known to break vti. This explains why `58189ca7b2` did not the IPv6 change at the time it was submitted. Fixes: `42a7b32b73` ("xfrm: Add oif to dst lookups") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:25:03 -07:00
David Ahern	4148987a51	net: Fix vti use case with oif in dst lookups for IPv6 It occurred to me yesterday that `741a11d9e4` ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set") means that xfrm6_dst_lookup needs the FLOWI_FLAG_SKIP_NH_OIF flag set. This latest commit causes the oif to be considered in lookups which is known to break vti. This explains why `58189ca7b2` did not the IPv6 change at the time it was submitted. Fixes: `42a7b32b73` ("xfrm: Add oif to dst lookups") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:24:00 -07:00
Jiri Benc	6b26ba3a7d	openvswitch: netlink attributes for IPv6 tunneling Add netlink attributes for IPv6 tunnel addresses. This enables IPv6 support for tunnels. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:18:00 -07:00
Jiri Benc	00a93babd0	openvswitch: add tunnel protocol to sw_flow_key Store tunnel protocol (AF_INET or AF_INET6) in sw_flow_key. This field now also acts as an indicator whether the flow contains tunnel data (this was previously indicated by tun_key.u.ipv4.dst being set but with IPv6 addresses in an union with IPv4 ones this won't work anymore). The new field was added to a hole in sw_flow_key. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:17:59 -07:00
Nikolay Aleksandrov	4917a1548f	bridge: netlink: make br_fill_info's frame size smaller When KASAN is enabled the frame size grows > 2048 bytes and we get a warning, so make it smaller. net/bridge/br_netlink.c: In function 'br_fill_info': >> net/bridge/br_netlink.c:1110:1: warning: the frame size of 2160 bytes >> is larger than 2048 bytes [-Wframe-larger-than=] Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:15:57 -07:00
David Ahern	16660f0bd9	net: Add support for filtering neigh dump by device index Add support for filtering neighbor dumps by device by adding the NDA_IFINDEX attribute to the dump request. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:12:02 -07:00
Russell King	d25b8e7429	net: dsa: better error reporting Add additional error reporting to the generic DSA code, so it's easier to debug when things go wrong. This was useful when initially bringing up 88e6176 on a new board. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 02:58:49 -07:00
Simon Horman	92240e8dc0	ipvs: Remove possibly unused variables from ip_vs_conn_net_{init,cleanup} If CONFIG_PROC_FS is undefined then the arguments of proc_create() and remove_proc_entry() are unused. As a result the net variables of ip_vs_conn_net_{init,cleanup} are unused. net/netfilter/ipvs//ip_vs_conn.c: In function ‘ip_vs_conn_net_init’: net/netfilter/ipvs//ip_vs_conn.c:1350:14: warning: unused variable ‘net’ [-Wunused-variable] net/netfilter/ipvs//ip_vs_conn.c: In function ‘ip_vs_conn_net_cleanup’: net/netfilter/ipvs//ip_vs_conn.c:1361:14: warning: unused variable ‘net’ [-Wunused-variable] ... Resolve this by dereferencing net as needed rather than storing it in a variable. Fixes: `3d99376689` ("ipvs: Pass ipvs not net into ip_vs_control_net_(init\|cleanup)") Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>	2015-10-07 10:12:00 +09:00
David Ahern	ed1c9f0e78	ipvs: Remove possibly unused variable from ip_vs_out Eric's net namespace changes in `1b75097dd7` leaves net unreferenced if CONFIG_IP_VS_IPV6 is not enabled: ../net/netfilter/ipvs/ip_vs_core.c: In function ‘ip_vs_out’: ../net/netfilter/ipvs/ip_vs_core.c:1177:14: warning: unused variable ‘net’ [-Wunused-variable] After the net refactoring there is only 1 user; push the reference to the 1 user. While the line length slightly exceeds 80 it seems to be the best change. Fixes: 1b75097dd7a26("ipvs: Pass ipvs into ip_vs_out") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Julian Anastasov <ja@ssi.bg> [horms: updated subject] Signed-off-by: Simon Horman <horms@verge.net.au>	2015-10-07 10:11:59 +09:00
Sagi Grimberg	f022fa88ce	xprtrdma: Don't require LOCAL_DMA_LKEY support for fastreg There is no need to require LOCAL_DMA_LKEY support as the PD allocation makes sure that there is a local_dma_lkey. Also correctly set a return value in error path. This caused a NULL pointer dereference in mlx5 which removed the support for LOCAL_DMA_LKEY. Fixes: `bb6c96d728` ("xprtrdma: Replace global lkey with lkey local to PD") Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-10-06 14:23:00 -04:00
Peter Nørlund	0a837fe472	ipv4: Fix compilation errors in fib_rebalance This fixes net/built-in.o: In function `fib_rebalance': fib_semantics.c:(.text+0x9df14): undefined reference to `__divdi3' and net/built-in.o: In function `fib_rebalance': net/ipv4/fib_semantics.c:572: undefined reference to `__aeabi_ldivmod' Fixes: `0e884c78ee` ("ipv4: L3 hash-based multipath") Signed-off-by: Peter Nørlund <pch@ordbogen.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 23:48:09 -07:00
Santosh Shilimkar	0676651323	RDS: IB: split mr pool to improve 8K messages performance 8K message sizes are pretty important usecase for RDS current workloads so we make provison to have 8K mrs available from the pool. Based on number of SG's in the RDS message, we pick a pool to use. Also to make sure that we don't under utlise mrs when say 8k messages are dominating which could lead to 8k pull being exhausted, we fall-back to 1m pool till 8k pool recovers for use. This helps to at least push ~55 kB/s bidirectional data which is a nice improvement. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>	2015-10-05 11:19:02 -07:00
Santosh Shilimkar	41a4e96462	RDS: IB: use max_mr from HCA caps than max_fmr All HCA drivers seems to popullate max_mr caps and few of them do both max_mr and max_fmr. Hence update RDS code to make use of max_mr. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>	2015-10-05 11:19:02 -07:00
Santosh Shilimkar	67161e250a	RDS: IB: mark rds_ib_fmr_wq static Fix below warning by marking rds_ib_fmr_wq static net/rds/ib_rdma.c:87:25: warning: symbol 'rds_ib_fmr_wq' was not declared. Should it be static? Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>	2015-10-05 11:19:02 -07:00
Santosh Shilimkar	26139dc1db	RDS: IB: use already available pool handle from ibmr rds_ib_mr already keeps the pool handle which it associates with. Lets use that instead of round about way of fetching it from rds_ib_device. No functional change. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>	2015-10-05 11:19:02 -07:00
Santosh Shilimkar	2e1d6b813a	RDS: IB: fix the rds_ib_fmr_wq kick call RDS IB mr pool has its own workqueue 'rds_ib_fmr_wq', so we need to use queue_delayed_work() to kick the work. This was hurting the performance since pool maintenance was less often triggered from other path. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>	2015-10-05 11:19:01 -07:00
Santosh Shilimkar	9441c973e1	RDS: IB: handle rds_ibdev release case instead of crashing the kernel Just in case we are still handling the QP receive completion while the rds_ibdev is released, drop the connection instead of crashing the kernel. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>	2015-10-05 11:19:01 -07:00
Santosh Shilimkar	0c28c04500	RDS: IB: split send completion handling and do batch ack Similar to what we did with receive CQ completion handling, we split the transmit completion handler so that it lets us implement batched work completion handling. We re-use the cq_poll routine and makes use of RDS_IB_SEND_OP to identify the send vs receive completion event handler invocation. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>	2015-10-05 11:19:01 -07:00
Santosh Shilimkar	f4f943c958	RDS: IB: ack more receive completions to improve performance For better performance, we split the receive completion IRQ handler. That lets us acknowledge several WCE events in one call. We also limit the WC to max 32 to avoid latency. Acknowledging several completions in one call instead of several calls each time will provide better performance since less mutual exclusion locks are being performed. In next patch, send completion is also split which re-uses the poll_cq() and hence the code is moved to ib_cm.c Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>	2015-10-05 11:19:01 -07:00
Santosh Shilimkar	db6526dcb5	RDS: use rds_send_xmit() state instead of RDS_LL_SEND_FULL In Transport indepedent rds_sendmsg(), we shouldn't make decisions based on RDS_LL_SEND_FULL which is used to manage the ring for RDMA based transports. We can safely issue rds_send_xmit() and the using its return value take decision on deferred work. This will also fix the scenario where at times we are seeing connections stuck with the LL_SEND_FULL bit getting set and never cleared. We kick krdsd after any time we see -ENOMEM or -EAGAIN from the ring allocation code. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>	2015-10-05 11:19:01 -07:00
Santosh Shilimkar	4bebdd7a4d	RDS: defer the over_batch work to send worker Current process gives up if its send work over the batch limit. The work queue will get kicked to finish off any other requests. This fixes remainder condition from commit `443be0e5af` ("RDS: make sure not to loop forever inside rds_send_xmit"). The restart condition is only for the case where we reached to over_batch code for some other reason so just retrying again before giving up. While at it, make sure we use already available 'send_batch_count' parameter instead of magic value. The batch count threshold value of 1024 came via commit `443be0e5af` ("RDS: make sure not to loop forever inside rds_send_xmit"). The idea is to process as big a batch as we can but at the same time we don't hold other waiting processes for send. Hence back-off after the send_batch_count limit (1024) to avoid soft-lock ups. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>	2015-10-05 11:18:45 -07:00
Andrzej Hajda	5edfcee5ed	mac80211: make ieee80211_new_mesh_header return unsigned The function returns always non-negative values. The problem has been detected using proposed semantic patch scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1]. [1]: http://permalink.gmane.org/gmane.linux.kernel/2046107 Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-05 17:54:16 +02:00
Ken-ichirou MATSUZAWA	a29a9a585b	netfilter: nfnetlink_log: allow to attach conntrack This patch enables to include the conntrack information together with the packet that is sent to user-space via NFLOG, then a user-space program can acquire NATed information by this NFULA_CT attribute. Including the conntrack information is optional, you can set it via NFULNL_CFG_F_CONNTRACK flag with the NFULA_CFG_FLAGS attribute like NFQUEUE. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-05 17:32:14 +02:00
Ken-ichirou MATSUZAWA	224a05975e	netfilter: ctnetlink: add const qualifier to nfnl_hook.get_ct get_ct as is and will not update its skb argument, and users of nfnl_ct_hook is currently only nfqueue, we can add const qualifier. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>	2015-10-05 17:32:13 +02:00
Ken-ichirou MATSUZAWA	83f3e94d34	netfilter: Kconfig rename QUEUE_CT to GLUE_CT Conntrack information attaching infrastructure is now generic and update it's name to use `glue' in previous patch. This patch updates Kconfig symbol name and adding NF_CT_NETLINK dependency. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-05 17:32:12 +02:00
Ken-ichirou MATSUZAWA	a4b4766c3c	netfilter: nfnetlink_queue: rename related to nfqueue attaching conntrack info The idea of this series of patch is to attach conntrack information to nflog like nfqueue has already done. nfqueue conntrack info attaching basis is generic, rename those names to generic one, glue. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-05 17:32:11 +02:00
Pablo Neira Ayuso	b28b1e826f	netfilter: nfnetlink_queue: use y2038 safe timestamp The __build_packet_message function fills a nfulnl_msg_packet_timestamp structure that uses 64-bit seconds and is therefore y2038 safe, but it uses an intermediate 'struct timespec' which is not. This trivially changes the code to use 'struct timespec64' instead, to correct the result on 32-bit architectures. This is a copy and paste of Arnd's original patch for nfnetlink_log. Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-05 17:27:25 +02:00
Pablo Neira Ayuso	2b5b1a01a7	Merge tag 'ipvs3-for-v4.4' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next Simon Horman says: ==================== Third Round of IPVS Updates for v4.4 please consider this build fix from Eric Biederman which resolves a build problem introduced in is excellent work to cleanup IPVS which you recently pulled: its queued up for v4.4 so no need to worry about earlier kernel versions. I have another minor cleanup, to fix a build warning, pending. However, I wanted to send this one to you now as its hit nf-next, net-next and in turn next, and a slow trickle of bug reports are appearing. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-05 17:27:06 +02:00
Pravin B Shelar	83ffe99f52	openvswitch: Fix ovs_vport_get_stats() Not every device has dev->tstats set. So when OVS tries to calculate vport stats it causes kernel panic. Following patch fixes it by using standard API to get net-device stats. ---8<--- Unable to handle kernel paging request at virtual address 766b4008 Internal error: Oops: 96000005 [#1] PREEMPT SMP Modules linked in: vport_vxlan vxlan ip6_udp_tunnel udp_tunnel tun bridge stp llc openvswitch ipv6 CPU: 7 PID: 1108 Comm: ovs-vswitchd Not tainted 4.3.0-rc3+ #82 PC is at ovs_vport_get_stats+0x150/0x1f8 [openvswitch] <snip> Call trace: [<ffffffbffc0859f8>] ovs_vport_get_stats+0x150/0x1f8 [openvswitch] [<ffffffbffc07cdb0>] ovs_vport_cmd_fill_info+0x140/0x1e0 [openvswitch] [<ffffffbffc07cf0c>] ovs_vport_cmd_dump+0xbc/0x138 [openvswitch] [<ffffffc00045a5ac>] netlink_dump+0xb8/0x258 [<ffffffc00045ace0>] __netlink_dump_start+0x120/0x178 [<ffffffc00045dd9c>] genl_family_rcv_msg+0x2d4/0x308 [<ffffffc00045de58>] genl_rcv_msg+0x88/0xc4 [<ffffffc00045cf24>] netlink_rcv_skb+0xd4/0x100 [<ffffffc00045dab0>] genl_rcv+0x30/0x48 [<ffffffc00045c830>] netlink_unicast+0x154/0x200 [<ffffffc00045cc9c>] netlink_sendmsg+0x308/0x364 [<ffffffc00041e10c>] sock_sendmsg+0x14/0x2c [<ffffffc000420d58>] SyS_sendto+0xbc/0xf0 Code: aa1603e1 f94037a4 aa1303e2 aa1703e0 (f9400465) Reported-by: Tomasz Sawicki <tomasz.sawicki@objectiveintegration.uk> Fixes: `8c876639c9` ("openvswitch: Remove vport stats.") Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 07:06:35 -07:00
Daniel Borkmann	bab1899187	bpf, seccomp: prepare for upcoming criu support The current ongoing effort to dump existing cBPF seccomp filters back to user space requires to hold the pre-transformed instructions like we do in case of socket filters from sk_attach_filter() side, so they can be reloaded in original form at a later point in time by utilities such as criu. To prepare for this, simply extend the bpf_prog_create_from_user() API to hold a flag that tells whether we should store the original or not. Also, fanout filters could make use of that in future for things like diag. While fanout filters already use bpf_prog_destroy(), move seccomp over to them as well to handle original programs when present. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Tycho Andersen <tycho.andersen@canonical.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Tested-by: Tycho Andersen <tycho.andersen@canonical.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 06:47:05 -07:00
Konstantin Khlebnikov	598c12d0ba	ovs: do not allocate memory from offline numa node When openvswitch tries allocate memory from offline numa node 0: stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL \| __GFP_ZERO, 0) It catches VM_BUG_ON(nid < 0 \|\| nid >= MAX_NUMNODES \|\| !node_online(nid)) [ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h This patch disables numa affinity in this case. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 06:42:03 -07:00
Daniel Borkmann	93d08b6966	bpf: fix panic in SO_GET_FILTER with native ebpf programs When sockets have a native eBPF program attached through setsockopt(sk, SOL_SOCKET, SO_ATTACH_BPF, ...), and then try to dump these over getsockopt(sk, SOL_SOCKET, SO_GET_FILTER, ...), the following panic appears: [49904.178642] BUG: unable to handle kernel NULL pointer dereference at (null) [49904.178762] IP: [<ffffffff81610fd9>] sk_get_filter+0x39/0x90 [49904.182000] PGD 86fc9067 PUD 531a1067 PMD 0 [49904.185196] Oops: 0000 [#1] SMP [...] [49904.224677] Call Trace: [49904.226090] [<ffffffff815e3d49>] sock_getsockopt+0x319/0x740 [49904.227535] [<ffffffff812f59e3>] ? sock_has_perm+0x63/0x70 [49904.228953] [<ffffffff815e2fc8>] ? release_sock+0x108/0x150 [49904.230380] [<ffffffff812f5a43>] ? selinux_socket_getsockopt+0x23/0x30 [49904.231788] [<ffffffff815dff36>] SyS_getsockopt+0xa6/0xc0 [49904.233267] [<ffffffff8171b9ae>] entry_SYSCALL_64_fastpath+0x12/0x71 The underlying issue is the very same as in commit `b382c08656` ("sock, diag: fix panic in sock_diag_put_filterinfo"), that is, native eBPF programs don't store an original program since this is only needed in cBPF ones. However, sk_get_filter() wasn't updated to test for this at the time when eBPF could be attached. Just throw an error to the user to indicate that eBPF cannot be dumped over this interface. That way, it can also be known that a program _is_ attached (as opposed to just return 0), and a different (future) method needs to be consulted for a dump. Fixes: `89aa075832` ("net: sock: allow eBPF programs to be attached to sockets") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 06:40:16 -07:00
Joe Stringer	33db4125ec	openvswitch: Rename LABEL->LABELS Conntrack LABELS (plural) are exposed by conntrack; rename the OVS name for these to be consistent with conntrack. Fixes: `c2ac667` "openvswitch: Allow matching on conntrack label" Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 06:34:28 -07:00
Andrey Vagin	e9193d60d3	net/unix: fix logic about sk_peek_offset Now send with MSG_PEEK can return data from multiple SKBs. Unfortunately we take into account the peek offset for each skb, that is wrong. We need to apply the peek offset only once. In addition, the peek offset should be used only if MSG_PEEK is set. Cc: "David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING Cc: Eric Dumazet <edumazet@google.com> (commit_signer:1/14=7%) Cc: Aaron Conole <aconole@bytheb.org> Fixes: `9f389e3567` ("af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag") Signed-off-by: Andrey Vagin <avagin@openvz.org> Tested-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 06:33:09 -07:00
WANG Cong	215c90afb9	act_mirred: always release tcf hash Align with other tc actions. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 06:30:34 -07:00
WANG Cong	6bd00b8506	act_mirred: fix a race condition on mirred_list After commit `1ce87720d4` ("net: sched: make cls_u32 lockless") we began to release tc actions in a RCU callback. However, mirred action relies on RTNL lock to protect the global mirred_list, therefore we could have a race condition between RCU callback and netdevice event, which caused a list corruption as reported by Vinson. Instead of relying on RTNL lock, introduce a spinlock to protect this list. Note, in non-bind case, it is still called with RTNL lock, therefore should disable BH too. Reported-by: Vinson Lee <vlee@twopensource.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 06:30:33 -07:00
Jiri Benc	181a4224ac	ipv4: fix reply_dst leakage on arp reply There are cases when the created metadata reply is not used. Ensure the allocated memory is freed also in such cases. Fixes: `63d008a4e9` ("ipv4: send arp replies to the correct tunnel") Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 04:05:15 -07:00
Eric Dumazet	2306c704ce	inet: fix race in reqsk_queue_unlink() reqsk_timer_handler() tests if icsk_accept_queue.listen_opt is NULL at its beginning. By the time it calls inet_csk_reqsk_queue_drop() and reqsk_queue_unlink(), listener might have been closed and inet_csk_listen_stop() had called reqsk_queue_yank_acceptq() which sets icsk_accept_queue.listen_opt to NULL We therefore need to correctly check listen_opt being NULL after holding syn_wait_lock for proper synchronization. Fixes: `fa76ce7328` ("inet: get rid of central tcp/dccp listener timer") Fixes: `b357a364c5` ("inet: fix possible panic in reqsk_queue_unlink()") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 04:04:09 -07:00

... 4 5 6 7 8 ...

40013 Commits