linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-16 01:04:08 +08:00

Author	SHA1	Message	Date
Cong Wang	a49eb42a34	sched, act: allow to clear all actions as well When we change the list of action on a given filter, currently we don't change it to empty. This is a bug, we should allow to change to whatever users given. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-27 23:42:39 -04:00
Cong Wang	2f7ef2f879	sched, cls: check if we could overwrite actions when changing a filter When actions are attached to a filter, they are a part of the filter itself, so when changing a filter we should allow to overwrite the actions inside as well. In my specific case, when I tried to _append_ a new action to an existing filter which already has an action, I got EEXIST since kernel refused to overwrite the existing one in kernel. This patch checks if we are changing the filter checking NLM_F_CREATE flag (Sigh, filters don't use NLM_F_REPLACE...) and then passes the boolean down to actions. This fixes the problem above. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-27 23:42:39 -04:00
Karl Heiss	8c2eab9097	net: sctp: Don't transition to PF state when transport has exhausted 'Path.Max.Retrans'. Don't transition to the PF state on every strike after 'Path.Max.Retrans'. Per draft-ietf-tsvwg-sctp-failover-03 Section 5.1.6: Additional (PMR - PFMR) consecutive timeouts on a PF destination confirm the path failure, upon which the destination transitions to the Inactive state. As described in [RFC4960], the sender (i) SHOULD notify ULP about this state transition, and (ii) transmit heartbeats to the Inactive destination at a lower frequency as described in Section 8.3 of [RFC4960]. This also prevents sending SCTP_ADDR_UNREACHABLE to the user as the state bounces between SCTP_INACTIVE and SCTP_PF for each subsequent strike. Signed-off-by: Karl Heiss <kheiss@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-27 23:41:14 -04:00
Xufeng Zhang	8535087131	sctp: reset flowi4_oif parameter on route lookup commit `813b3b5db8` (ipv4: Use caller's on-stack flowi as-is in output route lookups.) introduces another regression which is very similar to the problem of commit `e6b45241c` (ipv4: reset flowi parameters on route connect) wants to fix: Before we call ip_route_output_key() in sctp_v4_get_dst() to get a dst that matches a bind address as the source address, we have already called this function previously and the flowi parameters have been initialized including flowi4_oif, so when we call this function again, the process in __ip_route_output_key() will be different because of the setting of flowi4_oif, and we'll get a networking device which corresponds to the inputted flowi4_oif as the output device, this is wrong because we'll never hit this place if the previously returned source address of dst match one of the bound addresses. To reproduce this problem, a vlan setting is enough: # ifconfig eth0 up # route del default # vconfig add eth0 2 # vconfig add eth0 3 # ifconfig eth0.2 10.0.1.14 netmask 255.255.255.0 # route add default gw 10.0.1.254 dev eth0.2 # ifconfig eth0.3 10.0.0.14 netmask 255.255.255.0 # ip rule add from 10.0.0.14 table 4 # ip route add table 4 default via 10.0.0.254 src 10.0.0.14 dev eth0.3 # sctp_darn -H 10.0.0.14 -P 36422 -h 10.1.4.134 -p 36422 -s -I You'll detect that all the flow are routed to eth0.2(10.0.1.254). Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-27 19:46:17 -04:00
Toshiaki Makita	30313a3d57	bridge: Handle IFLA_ADDRESS correctly when creating bridge device When bridge device is created with IFLA_ADDRESS, we are not calling br_stp_change_bridge_id(), which leads to incorrect local fdb management and bridge id calculation, and prevents us from receiving frames on the bridge device. Reported-by: Tom Gundersen <teg@jklm.no> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-27 19:46:17 -04:00
Ying Xue	22e7987ae7	tipc: fix a possible memory leak The commit `a8b9b96e95` ("tipc: fix race in disc create/delete") leads to the following static checker warning: net/tipc/discover.c:352 tipc_disc_create() warn: possible memory leak of 'req' The risk of memory leak really exists in practice. Especially when it's failed to allocate memory for "req->buf", tipc_disc_create() doesn't free its allocated memory, instead just directly returns with ENOMEM error code. In this situation, memory leak, of course, happens. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-27 19:08:06 -04:00
xiao jin	851bdd11ca	inetpeer_gc_worker: trivial cleanup Do not initialize list twice. list_replace_init() already takes care of initializing list. We don't need to initialize it with LIST_HEAD() beforehand. Signed-off-by: xiao jin <jin.xiao@intel.com> Reviewed-by: David Cohen <david.a.cohen@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-26 12:52:28 -04:00
xiao jin	1818ce4dc5	net_namespace: trivial cleanup Do not initialize net_kill_list twice. list_replace_init() already takes care of initializing net_kill_list. We don't need to initialize it with LIST_HEAD() beforehand. Signed-off-by: xiao jin <jin.xiao@intel.com> Reviewed-by: David Cohen <david.a.cohen@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-26 12:50:23 -04:00
Erik Hugne	78acb1f9b8	tipc: add ioctl to fetch link names We add a new ioctl for AF_TIPC that can be used to fetch the logical name for a link to a remote node on a given bearer. This should be used in combination with link state subscriptions. The logical name size limit definitions are moved to tipc.h, as they are now also needed by the new ioctl. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-26 12:13:24 -04:00
Erik Hugne	a89778d8ba	tipc: add support for link state subscriptions When links are established over a bearer plane, we create a node local publication containing information about the peer node and bearer plane. This allows TIPC applications to use the standard TIPC topology server subscription mechanism to get notifications when a link goes up or down. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-26 12:13:24 -04:00
Rostislav Lisovy	8eca1fb692	cfg80211: Use 5MHz bandwidth by default when checking usable channels Current code checks if the 20MHz bandwidth is allowed for particular channel -- if it is not, the channel is disabled. Since we need to use 5/10 MHz channels, this code is modified in the way that the default bandwidth to check is 5MHz. If the maximum bandwidth allowed by the channel is smaller than 5MHz, the channel is disabled. Otherwise the channel is used and the flags are set according to the bandwidth allowed by the channel. Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:39:32 +02:00
Rostislav Lisovy	ea077c1cea	cfg80211: Add attributes describing prohibited channel bandwidth Since there are frequency bands (e.g. 5.9GHz) allowing channels with only 10 or 5 MHz bandwidth, this patch adds attributes that allow keeping track about this information. When channel attributes are reported to user-space, make sure to not break old tools, i.e. if the 'split wiphy dump' is enabled, report the extra attributes (if present) describing the bandwidth restrictions. If the 'split wiphy dump' is not enabled, completely omit those channels that have flags set to either IEEE80211_CHAN_NO_10MHZ or IEEE80211_CHAN_NO_20MHZ. Add the check for new bandwidth restriction flags in cfg80211_chandef_usable() to comply with the restrictions. Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:38:23 +02:00
Zhao, Gang	8bd811aa6c	mac80211: change return value of notifier function Return NOTIFY_DONE if we don't care this time's notification, return NOTIFY_OK if we successfully handled this time's notification. That's the formal way to do it. Signed-off-by: Zhao, Gang <gamerh2o@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:34:25 +02:00
Zhao, Gang	6784c7db8d	cfg80211: change return value of notifier function Return NOTIFY_DONE if we don't care this time's notification, return NOTIFY_OK if we successfully handled this time's notification. That's the formal way to do it. Signed-off-by: Zhao, Gang <gamerh2o@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:33:44 +02:00
Zhao, Gang	f26cbf401b	cfg80211: change wiphy_to_dev function name Name wiphy_to_rdev is more accurate to describe what the function does, i.e., return a pointer pointing to struct cfg80211_registered_device. Signed-off-by: Zhao, Gang <gamerh2o@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:33:04 +02:00
Zhao, Gang	1b8ec87aa0	cfg80211: change registered device pointer name Name "dev" is too common and ambiguous, let all the pointer name pointing to struct cfg80211_registered_device be "rdev". This can improve code readability and consistency(since other places have already called it rdev). Signed-off-by: Zhao, Gang <gamerh2o@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:32:56 +02:00
Zhao, Gang	308f7fcfdb	mac80211: remove unnecessary BUG_ON() The BUG_ON(!err) can't be triggered in the code path, so remove it. Signed-off-by: Zhao, Gang <gamerh2o@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:31:00 +02:00
Zhao, Gang	6b59db7d4c	mac80211: return bool instead of numbers in yes/no function And some code style changes in the function, and correct a typo in comment. Signed-off-by: Zhao, Gang <gamerh2o@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:30:28 +02:00
Marek Kwaczynski	17d38fa8c2	mac80211: add option to generate CCMP IVs only for mgmt frames Some chips can encrypt managment frames in HW, but require generated IV in the frame. Add a key flag that allows us to achieve this. Signed-off-by: Marek Kwaczynski <marek.kwaczynski@tieto.com> [use BIT(0) to fill that spot, fix indentation] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:26:15 +02:00
Michal Kazior	c0166da9fe	mac80211: compute chanctx refcount on-the-fly It doesn't make much sense to store refcount in the chanctx structure. One still needs to hold chanctx_mtx to get the value safely. Besides, refcount isn't on performance critical paths. This will make implementing chanctx reservation refcounting a little easier. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:30 +02:00
Michal Kazior	2b32713d72	mac80211: fix racy usage of chanctx->refcount Channel context refcount is protected by chanctx_mtx. Accessing the value without holding the mutex is racy. RCU section didn't guarantee anything here. Theoretically ieee80211_channel_switch() could fail to see refcount change and read "1" instead of, e.g. "2". This means mac80211 could accept CSA even though it shouldn't have. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:30 +02:00
Michal Kazior	1f0d54cdcf	mac80211: split ieee80211_free_chanctx() The function did a little too much. Split it up so the code can be easily reused in the future. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:30 +02:00
Michal Kazior	ed68ebcaf9	mac80211: split ieee80211_new_chanctx() The function did a little too much. Split it up so the code can be easily reused in the future. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:30 +02:00
Michal Kazior	13f348a814	mac80211: improve chanctx reservation lookup Use a separate function to look for reservation chanctx. For multi-interface/channel reservation search sematics differ slightly. The new routine allows reservations to be merged with chanctx that are already reserved by other interface(s). Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:30 +02:00
Michal Kazior	0288157b2a	mac80211: improve find_chanctx() for reservations This allows new vifs to be assigned to a chanctx as long as chanctx's reservation chandefs (if any) and chanctx's current chandef (implied by assigned vifs at the time, if any) and the new vif chandef are all compatible. This implies it is impossible to assign a new vif to an in-place reservation chanctx. This gives no advantages for single-channel hardware. It makes sense for multi-channel hardware only. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:30 +02:00
Michal Kazior	e3afb92022	mac80211: track reserved vifs in chanctx This can be useful. Provides a more straghtforward way to iterate over interfaces taking part in chanctx reservation and allows tracking chanctx usage explicitly. The structure is protected by local->chanctx_mtx. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:29 +02:00
Michal Kazior	484298ad1a	mac80211: track assigned vifs in chanctx This can be useful. Provides a more straghtforward way to iterate over interfaces bound to a given chanctx and allows tracking chanctx usage explicitly. The structure is protected by local->chanctx_mtx. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:29 +02:00
Michal Kazior	093324816b	mac80211: add support for radar detection for reservations Initial chanctx reservation code wasn't aware of radar detection requirements. This is necessary for chanctx reservations to be used for channel switching in the future. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:15 +02:00
Michal Kazior	c2b90ad880	mac80211: prevent chanctx overcommit Do not allocate more channel contexts than a driver is capable for currently matching interface combination. This allows the ieee80211_vif_reserve_chanctx() to act as a guard against breaking interface combinations. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:15 +02:00
Michal Kazior	6fa001bc7e	mac80211: add max channel calculation utility function The utility function has no uses yet. It is aimed at future chanctx reservation management and channel switching. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:15 +02:00
Michal Kazior	65a124dd71	cfg80211: allow drivers to iterate over matching combinations The patch splits cfg80211_check_combinations() into an iterator function and a simple iteration user. This makes it possible for drivers to asses how many channels can use given iftype setup. This in turn can be used for future multi-interface/multi-channel channel switching. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 17:08:14 +02:00
Ilan Peer	46d537245d	cfg80211: Fix GO Concurrent relaxation on UNII-3 At some locations, channels 149-165 are considered a single bundle, while at some other locations, e.g., Indonesia, channels 149-161 are considered a single bundle, while channel 165 belongs to a different bundle. This means that: 1. A station interface connection to an AP on channel 165 allows the instantiation of a P2P GO on channels 149-165. 2. A station interface connection to an AP on channels 149-161 does NOT allow the instantiation of a P2P GO on channel 165. Fix this. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-25 16:47:33 +02:00
Johan Hedberg	09da1f3463	Bluetooth: Fix redundant encryption request for reauthentication When we're performing reauthentication (in order to elevate the security level from an unauthenticated key to an authenticated one) we do not need to issue any encryption command once authentication completes. Since the trigger for the encryption HCI command is the ENCRYPT_PEND flag this flag should not be set in this scenario. Instead, the REAUTH_PEND flag takes care of all necessary steps for reauthentication. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-04-25 09:47:15 +03:00
Johan Hedberg	9eb1fbfa0a	Bluetooth: Fix triggering BR/EDR L2CAP Connect too early Commit `1c2e004183` introduced an event handler for the encryption key refresh complete event with the intent of fixing some LE/SMP cases. However, this event is shared with BR/EDR and there we actually want to act only on the auth_complete event (which comes after the key refresh). If we do not do this we may trigger an L2CAP Connect Request too early and cause the remote side to return a security block error. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2014-04-25 09:47:15 +03:00
Kumar Sundararajan	1c26585458	ipv6: fib: fix fib dump restart When the ipv6 fib changes during a table dump, the walk is restarted and the number of nodes dumped are skipped. But the existing code doesn't advance to the next node after a node is skipped. This can cause the dump to loop or produce lots of duplicates when the fib is modified during the dump. This change advances the walk to the next node if the current node is skipped after a restart. Signed-off-by: Kumar Sundararajan <kumar@fb.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 17:19:25 -04:00
Nicolas Dichtel	f01ec1c017	vxlan: add x-netns support This patch allows to switch the netns when packet is encapsulated or decapsulated. The vxlan socket is openned into the i/o netns, ie into the netns where encapsulated packets are received. The socket lookup is done into this netns to find the corresponding vxlan tunnel. After decapsulation, the packet is injecting into the corresponding interface which may stand to another netns. When one of the two netns is removed, the tunnel is destroyed. Configuration example: ip netns add netns1 ip netns exec netns1 ip link set lo up ip link add vxlan10 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0 ip link set vxlan10 netns netns1 ip netns exec netns1 ip addr add 192.168.0.249/24 broadcast 192.168.0.255 dev vxlan10 ip netns exec netns1 ip link set vxlan10 up Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 16:18:26 -04:00
David Gibson	c53864fd60	rtnetlink: Only supply IFLA_VF_PORTS information when RTEXT_FILTER_VF is set Since `115c9b8192` (rtnetlink: Fix problem with buffer allocation), RTM_NEWLINK messages only contain the IFLA_VFINFO_LIST attribute if they were solicited by a GETLINK message containing an IFLA_EXT_MASK attribute with the RTEXT_FILTER_VF flag. That was done because some user programs broke when they received more data than expected - because IFLA_VFINFO_LIST contains information for each VF it can become large if there are many VFs. However, the IFLA_VF_PORTS attribute, supplied for devices which implement ndo_get_vf_port (currently the 'enic' driver only), has the same problem. It supplies per-VF information and can therefore become large, but it is not currently conditional on the IFLA_EXT_MASK value. Worse, it interacts badly with the existing EXT_MASK handling. When IFLA_EXT_MASK is not supplied, the buffer for netlink replies is fixed at NLMSG_GOODSIZE. If the information for IFLA_VF_PORTS exceeds this, then rtnl_fill_ifinfo() returns -EMSGSIZE on the first message in a packet. netlink_dump() will misinterpret this as having finished the listing and omit data for this interface and all subsequent ones. That can cause getifaddrs(3) to enter an infinite loop. This patch addresses the problem by only supplying IFLA_VF_PORTS when IFLA_EXT_MASK is supplied with the RTEXT_FILTER_VF flag set. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:52:54 -04:00
David Gibson	973462bbde	rtnetlink: Warn when interface's information won't fit in our packet Without IFLA_EXT_MASK specified, the information reported for a single interface in response to RTM_GETLINK is expected to fit within a netlink packet of NLMSG_GOODSIZE. If it doesn't, however, things will go badly wrong, When listing all interfaces, netlink_dump() will incorrectly treat -EMSGSIZE on the first message in a packet as the end of the listing and omit information for that interface and all subsequent ones. This can cause getifaddrs(3) to enter an infinite loop. This patch won't fix the problem, but it will WARN_ON() making it easier to track down what's going wrong. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:52:54 -04:00
David S. Miller	a64d90fd96	netfilter: Fix warning in nfnetlink_receive(). net/netfilter/nfnetlink.c: In function ‘nfnetlink_rcv’: net/netfilter/nfnetlink.c:371:14: warning: unused variable ‘net’ [-Wunused-variable] Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:51:29 -04:00
Eric W. Biederman	90f62cf30a	net: Use netlink_ns_capable to verify the permisions of netlink messages It is possible by passing a netlink socket to a more privileged executable and then to fool that executable into writing to the socket data that happens to be valid netlink message to do something that privileged executable did not intend to do. To keep this from happening replace bare capable and ns_capable calls with netlink_capable, netlink_net_calls and netlink_ns_capable calls. Which act the same as the previous calls except they verify that the opener of the socket had the desired permissions as well. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:44:54 -04:00
Eric W. Biederman	aa4cf9452f	net: Add variants of capable for use on netlink messages netlink_net_capable - The common case use, for operations that are safe on a network namespace netlink_capable - For operations that are only known to be safe for the global root netlink_ns_capable - The general case of capable used to handle special cases __netlink_ns_capable - Same as netlink_ns_capable except taking a netlink_skb_parms instead of the skbuff of a netlink message. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:44:54 -04:00
Eric W. Biederman	a3b299da86	net: Add variants of capable for use on on sockets sk_net_capable - The common case, operations that are safe in a network namespace. sk_capable - Operations that are not known to be safe in a network namespace sk_ns_capable - The general case for special cases. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:44:53 -04:00
Eric W. Biederman	a53b72c83a	net: Move the permission check in sock_diag_put_filterinfo to packet_diag_dump The permission check in sock_diag_put_filterinfo is wrong, and it is so removed from it's sources it is not clear why it is wrong. Move the computation into packet_diag_dump and pass a bool of the result into sock_diag_filterinfo. This does not yet correct the capability check but instead simply moves it to make it clear what is going on. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:44:53 -04:00
Eric W. Biederman	5187cd055b	netlink: Rename netlink_capable netlink_allowed netlink_capable is a static internal function in af_netlink.c and we have better uses for the name netlink_capable. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:44:53 -04:00
David S. Miller	4366004d77	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/intel/igb/e1000_mac.c net/core/filter.c Both conflicts were simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:19:00 -04:00
Marcel Holtmann	db5966816c	Bluetooth: Return EOPNOTSUPP for HCISETRAW ioctl command The HCISETRAW ioctl command is not really useful. To utilize raw and direct access to the HCI controller, the HCI User Channel feature has been introduced. Return EOPNOTSUPP to indicate missing support for this command. For legacy reasons hcidump used to use HCISETRAW for permission check to return proper error codes to users. To keep backwards compability return EPERM in case the caller does not have CAP_NET_ADMIN capability. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-04-24 11:55:25 -03:00
Tomasz Bursztyka	f5efc696cc	netfilter: nf_tables: Add meta expression key for bridge interface name NFT_META_BRI_IIFNAME to get packet input bridge interface name NFT_META_BRI_OIFNAME to get packet output bridge interface name Such meta key are accessible only through NFPROTO_BRIDGE family, on a dedicated nft meta module: nft_meta_bridge. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-24 10:37:28 +02:00
Alexei Starovoitov	83d5b7ef99	net: filter: initialize A and X registers exisiting BPF verifier allows uninitialized access to registers, 'ret A' is considered to be a valid filter. So initialize A and X to zero to prevent leaking kernel memory In the future BPF verifier will be rejecting such filters Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Daniel Borkmann <dborkman@redhat.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-23 15:34:41 -04:00
Nicolas Dichtel	22f08069e8	ip6gre: add x-netns support This patch allows to switch the netns when packet is encapsulated or decapsulated. In other word, the encapsulated packet is received in a netns, where the lookup is done to find the tunnel. Once the tunnel is found, the packet is decapsulated and injecting into the corresponding interface which stands to another netns. When one of the two netns is removed, the tunnel is destroyed. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-23 14:53:36 -04:00
Nicolas Dichtel	b57708add3	gre: add x-netns support This patch allows to switch the netns when packet is encapsulated or decapsulated. In other word, the encapsulated packet is received in a netns, where the lookup is done to find the tunnel. Once the tunnel is found, the packet is decapsulated and injecting into the corresponding interface which stands to another netns. When one of the two netns is removed, the tunnel is destroyed. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-23 14:53:36 -04:00
Tomasz Bursztyka	aa45660c6b	netfilter: nf_tables: Make meta expression core functions public This will be useful to create network family dedicated META expression as for NFPROTO_BRIDGE for instance. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-23 13:55:30 +02:00
Tomasz Bursztyka	758dbcecf1	netfilter: nf_tables: Stack expression type depending on their family To ensure family tight expression gets selected in priority to family agnostic ones. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-23 13:51:05 +02:00
Tetsuo Handa	2e71029e2c	xfrm: Remove useless xfrm_audit struct. Commit `f1370cc4` "xfrm: Remove useless secid field from xfrm_audit." changed "struct xfrm_audit" to have either { audit_get_loginuid(current) / audit_get_sessionid(current) } or { INVALID_UID / -1 } pair. This means that we can represent "struct xfrm_audit" as "bool". This patch replaces "struct xfrm_audit" argument with "bool". Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-04-23 08:21:04 +02:00
Richard Guy Briggs	7774d5e03f	netlink: implement unbind to netlink_setsockopt NETLINK_DROP_MEMBERSHIP Call the per-protocol unbind function rather than bind function on NETLINK_DROP_MEMBERSHIP in netlink_setsockopt(). Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:42:26 -04:00
Richard Guy Briggs	4f52090052	netlink: have netlink per-protocol bind function return an error code. Have the netlink per-protocol optional bind function return an int error code rather than void to signal a failure. This will enable netlink protocols to perform extra checks including capabilities and permissions verifications when updating memberships in multicast groups. In netlink_bind() and netlink_setsockopt() the call to the per-protocol bind function was moved above the multicast group update to prevent any access to the multicast socket groups before checking with the per-protocol bind function. This will enable the per-protocol bind function to be used to check permissions which could be denied before making them available, and to avoid the messy job of undoing the addition should the per-protocol bind function fail. The netfilter subsystem seems to be the only one currently using the per-protocol bind function. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:42:26 -04:00
Richard Guy Briggs	bfe4bc71c6	netlink: simplify nfnetlink_bind Remove duplicity and simplify code flow by moving the rcu_read_unlock() above the condition and let the flow control exit naturally at the end of the function. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:42:26 -04:00
Chema Gonzalez	4cd3675ebf	filter: added BPF random opcode Added a new ancillary load (bpf call in eBPF parlance) that produces a 32-bit random number. We are implementing it as an ancillary load (instead of an ISA opcode) because (a) it is simpler, (b) allows easy JITing, and (c) seems more in line with generic ISAs that do not have "get a random number" as a instruction, but as an OS call. The main use for this ancillary load is to perform random packet sampling. Signed-off-by: Chema Gonzalez <chema@google.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:27:57 -04:00
Li RongQing	5a4ae5f6e7	vlan: unnecessary to check if vlan_pcpu_stats is NULL if allocating memory for vlan_pcpu_stats failed, the device can not be operated Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:27:57 -04:00
Venkata Duvvuru	3de0b59239	ethtool: Support for configurable RSS hash key This ethtool patch primarily copies the ioctl command data structures from/to the User space and invokes the driver hook. Signed-off-by: Venkat Duvvuru <VenkatKumar.Duvvuru@Emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:27:57 -04:00
Eric Dumazet	1f3279ae0c	tcp: avoid retransmits of TCP packets hanging in host queues In commit `0e280af026` ("tcp: introduce TCPSpuriousRtxHostQueues SNMP counter") we added a logic to detect when a packet was retransmitted while the prior clone was still in a qdisc or driver queue. We are now confident we can do better, and catch the problem before we fragment a TSO packet before retransmit, or in TLP path. This patch fully exploits the logic by simply canceling the spurious retransmit. Original packet is in a queue and will eventually leave the host. This helps to avoid network collapses when some events make the RTO estimations very wrong, particularly when dealing with huge number of sockets with synchronized blast. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:27:57 -04:00
Heiner Kallweit	6046d5b4e4	ipv6: support IFA_F_MANAGETEMPADDR for address deletion too Userspace applications can use IFA_F_MANAGETEMPADDR with RTM_NEWADDR already to indicate that the kernel should take care of temporary address management. This patch adds related functionality to RTM_DELADDR. By setting IFA_F_MANAGETEMPADDR a userspace application can indicate that the kernel should delete all related temporary addresses as well. A corresponding patch for the "ip addr del" command has been applied to iproute2 already. Signed-off-by: Heiner Kallweit <heiner.kallweit@web.de> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:27:57 -04:00
Ying Xue	a8b9b96e95	tipc: fix race in disc create/delete Commit `a21a584d67` (tipc: fix neighbor detection problem after hw address change) introduces a race condition involving tipc_disc_delete() and tipc_disc_add/remove_dest that can cause TIPC to dereference the pointer to the bearer discovery request structure after it has been freed since a stray pointer is left in the bearer structure. In order to fix the issue, the process of resetting the discovery request handler is optimized: the discovery request handler and request buffer are just reset instead of being freed, allocated and initialized. As the request point is always valid and the request's lock is taken while the request handler is reset, the race doesn't happen any more. Reported-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	28dd94187a	tipc: use bc_lock to protect node map in bearer structure The node map variable - 'nodes' in bearer structure is only used by bclink. When bclink accesses it, bc_lock is held. But when change it, for instance, in tipc_bearer_add_dest() or tipc_bearer_remove_dest() the bc_lock is not taken at all. To avoid any inconsistent data, we should always grab bc_lock while accessing node map variable. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	4ae88c94d3	tipc: use bearer_disable to disable bearer in tipc_l2_device_event As bearer pointer is known in tipc_l2_device_event(), it's unnecessary to search it again in tipc_disable_bearer(). If tipc_disable_bearer() is replaced with bearer_disable() in tipc_l2_device_event(), this will help us save a bit time when bearer is disabled. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	f1c8d8cb82	tipc: make media_ptr pointed netdevice valid The 'media_ptr' pointer in bearer structure which points to network device, is protected by RCU. So, before netdevice is released, synchronize_net() should be involved to prevent no any user of the netdevice on read side from accessing it after it is freed. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	7216cd949c	tipc: purge tipc_net_lock lock Now tipc routing hierarchy comprises the structures 'node', 'link'and 'bearer'. The whole hierarchy is protected by a big read/write lock, tipc_net_lock, to ensure that nothing is added or removed while code is accessing any of these structures. Obviously the locking policy makes node, link and bearer components closely bound together so that their relationship becomes unnecessarily complex. In the worst case, such locking policy not only has a negative influence on performance, but also it's prone to lead to deadlock occasionally. In order o decouple the complex relationship between bearer and node as well as link, the locking policy is adjusted as follows: - Bearer level RTNL lock is used on update side, and RCU is used on read side. Meanwhile, all bearer instances including broadcast bearer are saved into bearer_list array. - Node and link level All node instances are saved into two tipc_node_list and node_htable lists. The two lists are protected by node_list_lock on write side, and they are guarded with RCU lock on read side. All members in node structure including link instances are protected by node spin lock. - The relationship between bearer and node When link accesses bearer, it first needs to find the bearer with its bearer identity from the bearer_list array. When bearer accesses node, it can iterate the node_htable hash list with the node address to find the corresponding node. In the new locking policy, every component has its private locking solution and the relationship between bearer and node is very simple, that is, they can find each other with node address or bearer identity from node_htable hash list or bearer_list array. Until now above all changes have been done, so tipc_net_lock can be removed safely. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	2231c5af45	tipc: use RCU to protect media_ptr pointer Now the media_ptr pointer is protected with tipc_net_lock write lock on write side; tipc_net_lock read lock is used to read side. As the part of effort of eliminating tipc_net_lock, we decide to adjust the locking policy of media_ptr pointer protection: on write side, RTNL lock is use while on read side RCU read lock is applied. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	7a2f7d18e7	tipc: decouple the relationship between bearer and link Currently on both paths of message transmission and reception, the read lock of tipc_net_lock must be held before bearer is accessed, while the write lock of tipc_net_lock has to be taken before bearer is configured. Although it can ensure that bearer is always valid on the two data paths, link and bearer is closely bound together. So as the part of effort of removing tipc_net_lock, the locking policy of bearer protection will be adjusted as below: on the two data paths, RCU is used, and on the configuration path of bearer, RTNL lock is applied. Now RCU just covers the path of message reception. To make it possible to protect the path of message transmission with RCU, link should not use its stored bearer pointer to access bearer, but it should use the bearer identity of its attached bearer as index to get bearer instance from bearer_list array, which can help us decouple the relationship between bearer and link. As a result, bearer on the path of message transmission can be safely protected by RCU when we access bearer_list array within RCU lock protection. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	f8322dfce5	tipc: convert bearer_list to RCU list Convert bearer_list to RCU list. It's protected by RTNL lock on update side, and RCU read lock is applied to read side. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:52 -04:00
Ying Xue	f97e455abf	tipc: use RTNL lock to protect tipc_net_stop routine As the tipc network initialization(ie, tipc_net_start routine) is under RTNL protection, its corresponding deinitialization part(ie, tipc_net_stop routine) should be protected by RTNL too. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:52 -04:00
Ying Xue	ca07fb07c9	tipc: adjust locking policy of protecting tipc_ptr pointer of net_device Currently the 'tipc_ptr' pointer is protected by tipc_net_lock write lock on write side, and RCU read lock is applied to read side. In addition, there have two paths on write side where we may change variables pointed by the 'tipc_ptr' pointer: one is to configure bearer by tipc-config tool and another one is that bearer status is changed by notification events of its attached interface. But on the latter path, we improperly deem that accessing 'tipc_ptr' pointer happens on read side with rcu_read_lock() although some variables pointed by the 'tipc_ptr' pointer are changed possibly. Moreover, as now the both paths are guarded by RTNL lock, it's better to adjust the locking policy of 'tipc_ptr' pointer protection, allowing RTNL instead of tipc_net_lock write lock to protect it on write side, which will help us purge tipc_net_lock in the future. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:52 -04:00
Ying Xue	ef13a262c3	tipc: replace config_mutex lock with RTNL lock There have two paths where we can configure or change bearer status: one is that bearer is configured from user space with tipc-config tool; another one is that bearer is changed by notification events from its attached interface. On the first path, one dedicated config_mutex lock is guarded; on the latter path, RTNL lock has been placed to serialize the process of dealing with interface events. So, if RTNL lock is also used to protect the first path, this will not only extremely help us simplify current locking policy, but also config_mutex lock can be deleted as well. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:52 -04:00
Andrew Lutomirski	78541c1dc6	net: Fix ns_capable check in sock_diag_put_filterinfo The caller needs capabilities on the namespace being queried, not on their own namespace. This is a security bug, although it likely has only a minor impact. Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 12:49:39 -04:00
Bob Copeland	bc3ce0b0be	mac80211: mesh: always use the latest target_sn When a path target responds to a path request, its response always contains the most up-to-date information; accordingly, it should use the latest target_sn, regardless of net_traversal_jiffies(). Otherwise, only the first path response is considered when constructing a path, as it will have the highest target_sn of all replies during that period. Signed-off-by: Bob Copeland <bob@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-22 17:24:50 +02:00
Bob Copeland	a40a8c17b2	mac80211: fix mesh_add_rsn_ie IE finding loop Previously, the code to copy the RSN IE from the mesh config would increment its pointer by one in the loop instead of by the element length, so there was the potential for mistaking another IE's data fields as the RSN IE. cfg80211_find_ie() exists, so just use that. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-22 17:24:49 +02:00
Bob Copeland	aee6499c8c	mac80211: mesh: use u16 return type for u16 getter u16_field_get() is a simple wrapper around get_unaligned_le16(), and it is being assigned to a u16, so there's no need to promote to u32 in the middle. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-22 17:24:49 +02:00
Jouni Malinen	9a07bf507d	mac80211: Allow HT capa override to add 40 MHz intolerant This can be useful for testing purposes to confirm valid AP behavior on HT 20/40 co-existence functionality. Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-22 17:23:19 +02:00
Luis R. Rodriguez	96cce12ff6	cfg80211: fix processing world regdomain when non modular This allows processing of the last regulatory request when we determine its still pending. Without this if a regulatory request failed to get processed by userspace we wouldn't be able to re-process it later. An example situation that can lead to an unprocessed last_request is enabling cfg80211 to be built-in to the kernel, not enabling CFG80211_INTERNAL_REGDB and the CRDA binary not being available at the time the udev rule that kicks of CRDA triggers. In such a situation we want to let some cfg80211 triggers eventually kick CRDA for us again. Without this if the first cycle attempt to kick off CRDA failed we'd be stuck without the ability to change process any further regulatory domains. cfg80211 will trigger re-processing of the regulatory queue whenever schedule_work(&reg_work) is called, currently this happens when: * suspend / resume * disconnect * a beacon hint gets triggered (non DFS 5 GHz AP found) * a regulatory request gets added to the queue We don't have any specific opportunistic late boot triggers to address a late mount of where CRDA resides though, adding that should be done separately through another patch. Without an opportunistic fix then this fix relies at least one of the triggeres above to happen. Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-22 17:17:56 +02:00
Arik Nemtsov	c888393b74	cfg80211: avoid freeing last_request while in flight Avoid freeing the last request while it is being processed. This can happen in some cases if reg_work is kicked for some reason while the currently pending request is in flight. Cc: Sander Eikelenboom <linux@eikelenboom.it> Tested-by: Eliad Peller <eliad@wizery.com> Tested-by: Colleen Twitty <colleen@cozybit.com> Signed-off-by: Arik Nemtsov <arik@wizery.com> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-22 17:12:55 +02:00
Bob Copeland	311ab8e1c2	mac80211: fixup radiotap tx flags for RTS/CTS When using RTS/CTS, the CTS-to-Self bit in radiotap TX flags is getting set instead of the RTS bit. Set the correct one. Reported-by: Larry Maxwell <larrymaxwell@agilemesh.com> Signed-off-by: Bob Copeland <bob@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-22 17:08:30 +02:00
Chun-Yeow Yeoh	062f1d6de0	mac80211: avoid handling of SMPS for mesh The patch "mac80211: implement SMPS for AP" has caused kernel oops at mesh STA if the peer mesh STA operates in sleep mode and then becomes active mode. It can be easily reproduced by setting the following commands at peer mesh STA: iw mesh0 station set aa:bb:cc:dd:ee:ff mesh_power_mode deep iw mesh0 station set aa:bb:cc:dd:ee:ff mesh_power_mode active Kernel oops will happen at mesh STA aa:bb:cc:dd:ee:ff. Fix this by avoiding SMPS for mesh mode. Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-22 16:38:42 +02:00
Tetsuo Handa	f1370cc4a0	xfrm: Remove useless secid field from xfrm_audit. It seems to me that commit `ab5f5e8b` "[XFRM]: xfrm audit calls" is doing something strange at xfrm_audit_helper_usrinfo(). If secid != 0 && security_secid_to_secctx(secid) != 0, the caller calls audit_log_task_context() which basically does secid != 0 && security_secid_to_secctx(secid) == 0 case except that secid is obtained from current thread's context. Oh, what happens if secid passed to xfrm_audit_helper_usrinfo() was obtained from other thread's context? It might audit current thread's context rather than other thread's context if security_secid_to_secctx() in xfrm_audit_helper_usrinfo() failed for some reason. Then, are all the caller of xfrm_audit_helper_usrinfo() passing either secid obtained from current thread's context or secid == 0? It seems to me that they are. If I didn't miss something, we don't need to pass secid to xfrm_audit_helper_usrinfo() because audit_log_task_context() will obtain secid from current thread's context. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-04-22 10:47:53 +02:00
Weiping Pan	86fd14ad1e	tcp: make tcp_cwnd_application_limited() static Make tcp_cwnd_application_limited() static and move it from tcp_input.c to tcp_output.c Signed-off-by: Weiping Pan <wpan@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-20 18:18:56 -04:00
Luis R. Rodriguez	94716d2fb1	6lowpan: make lowpan_cb static CC: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: linux-zigbee-devel@lists.sourceforge.net Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-20 18:18:55 -04:00
Luis R. Rodriguez	599018a710	6lowpan: add helper to get 6lowpan namespace This will simplify the new reassembly backport with no code changes being required. CC: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: linux-zigbee-devel@lists.sourceforge.net Cc: David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-20 18:18:55 -04:00
Neil Horman	8465a5fcd1	sctp: add support for busy polling to sctp protocol The busy polling socket option adds support for sockets to busy wait on data arriving on the napi queue from which they have most recently received a frame. Currently only tcp and udp support this feature, but theres no reason sctp can't do so as well. Add it in so appliations can take advantage of it Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: Daniel Borkmann <dborkman@redhat.com> CC: netdev@vger.kernel.org Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-20 18:18:55 -04:00
Herbert Xu	a0265d28b3	net: Add __dev_forward_skb This patch adds the helper __dev_forward_skb which is identical to dev_forward_skb except that it doesn't actually inject the skb into the stack. This is useful where we wish to have finer control over how the packet is injected, e.g., via netif_rx_ni or netif_receive_skb. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-20 18:18:55 -04:00
Kenjiro Nakayama	1536e2857b	tcp: Add a TCP_FASTOPEN socket option to get a max backlog on its listner This patch adds a TCP_FASTOPEN socket option to get a max backlog on its listener to getsockopt(). Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-20 18:18:54 -04:00
Vlad Yasevich	b14878ccb7	net: sctp: cache auth_enable per endpoint Currently, it is possible to create an SCTP socket, then switch auth_enable via sysctl setting to 1 and crash the system on connect: Oops[#1]: CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1 task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000 [...] Call Trace: [<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80 [<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4 [<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c [<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8 [<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214 [<ffffffff8043af68>] sctp_rcv+0x588/0x630 [<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24 [<ffffffff803acb50>] ip6_input+0x2c0/0x440 [<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564 [<ffffffff80310650>] process_backlog+0xb4/0x18c [<ffffffff80313cbc>] net_rx_action+0x12c/0x210 [<ffffffff80034254>] __do_softirq+0x17c/0x2ac [<ffffffff800345e0>] irq_exit+0x54/0xb0 [<ffffffff800075a4>] ret_from_irq+0x0/0x4 [<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48 [<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148 [<ffffffff805a88b0>] start_kernel+0x37c/0x398 Code: dd0900b8 000330f8 0126302d <dcc60000> 50c0fff1 0047182a a48306a0 03e00008 00000000 ---[ end trace b530b0551467f2fd ]--- Kernel panic - not syncing: Fatal exception in interrupt What happens while auth_enable=0 in that case is, that ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs() when endpoint is being created. After that point, if an admin switches over to auth_enable=1, the machine can crash due to NULL pointer dereference during reception of an INIT chunk. When we enter sctp_process_init() via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk, the INIT verification succeeds and while we walk and process all INIT params via sctp_process_param() we find that net->sctp.auth_enable is set, therefore do not fall through, but invoke sctp_auth_asoc_set_default_hmac() instead, and thus, dereference what we have set to NULL during endpoint initialization phase. The fix is to make auth_enable immutable by caching its value during endpoint initialization, so that its original value is being carried along until destruction. The bug seems to originate from the very first days. Fix in joint work with Daniel Borkmann. Reported-by: Joshua Kinard <kumba@gentoo.org> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Tested-by: Joshua Kinard <kumba@gentoo.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-18 18:32:00 -04:00
David S. Miller	5a292f7bc6	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-04-17 Please pull this batch of fixes intended for the 3.15 stream... For the mac80211 bits, Johannes says: "We have a fix from Chun-Yeow to not look at management frame bitrates that are typically really low, two fixes from Felix for AP_VLAN interfaces, a fix from Ido to disable SMPS settings when a monitor interface is enabled, a radar detection fix from Michał and a fix from myself for a very old remain-on-channel bug." For the iwlwifi bits, Emmanuel says: "I have new device IDs and a new firmware API. These are the trivial ones. The less trivial ones are Johannes's fix that delays the enablement of an interrupt coalescing hardware until after association - this fixes a few connection problems seen in the field. Eyal has a bunch of rate control fixes. I decided to add these for 3.15 because they fix some disconnection and packet loss scenarios which were reported by the field. I also have a fix for a memory leak that happens only with a very new NIC." Along with those... Amitkumar Karwar fixes a couple of problems relating to driver/firmware interactions in mwifiex. Christian Engelmayer avoids a couple of potential memory leaks in the new rsi driver. Eliad Peller provides a wl18xx mailbox alignment fix for problems when using new firmware. Frederic Danis adds a couple of missing debugging strings to the cw1200 driver. Geert Uytterhoeven adds a variable initialization inside of the rsi driver. Luciano Coelho patches the wlcore code to ignore dummy packet events in PLT mode in order to work around a firmware bug. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-18 18:29:44 -04:00
dingtianhong	dc8eaaa006	vlan: Fix lockdep warning when vlan dev handle notification When I open the LOCKDEP config and run these steps: modprobe 8021q vconfig add eth2 20 vconfig add eth2.20 30 ifconfig eth2 xx.xx.xx.xx then the Call Trace happened: [32524.386288] ============================================= [32524.386293] [ INFO: possible recursive locking detected ] [32524.386298] 3.14.0-rc2-0.7-default+ #35 Tainted: G O [32524.386302] --------------------------------------------- [32524.386306] ifconfig/3103 is trying to acquire lock: [32524.386310] (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0 [32524.386326] [32524.386326] but task is already holding lock: [32524.386330] (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40 [32524.386341] [32524.386341] other info that might help us debug this: [32524.386345] Possible unsafe locking scenario: [32524.386345] [32524.386350] CPU0 [32524.386352] ---- [32524.386354] lock(&vlan_netdev_addr_lock_key/1); [32524.386359] lock(&vlan_netdev_addr_lock_key/1); [32524.386364] [32524.386364] * DEADLOCK * [32524.386364] [32524.386368] May be due to missing lock nesting notation [32524.386368] [32524.386373] 2 locks held by ifconfig/3103: [32524.386376] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff81431d42>] rtnl_lock+0x12/0x20 [32524.386387] #1: (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40 [32524.386398] [32524.386398] stack backtrace: [32524.386403] CPU: 1 PID: 3103 Comm: ifconfig Tainted: G O 3.14.0-rc2-0.7-default+ #35 [32524.386409] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [32524.386414] ffffffff81ffae40 ffff8800d9625ae8 ffffffff814f68a2 ffff8800d9625bc8 [32524.386421] ffffffff810a35fb ffff8800d8a8d9d0 00000000d9625b28 ffff8800d8a8e5d0 [32524.386428] 000003cc00000000 0000000000000002 ffff8800d8a8e5f8 0000000000000000 [32524.386435] Call Trace: [32524.386441] [<ffffffff814f68a2>] dump_stack+0x6a/0x78 [32524.386448] [<ffffffff810a35fb>] __lock_acquire+0x7ab/0x1940 [32524.386454] [<ffffffff810a323a>] ? __lock_acquire+0x3ea/0x1940 [32524.386459] [<ffffffff810a4874>] lock_acquire+0xe4/0x110 [32524.386464] [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0 [32524.386471] [<ffffffff814fc07a>] _raw_spin_lock_nested+0x2a/0x40 [32524.386476] [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0 [32524.386481] [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0 [32524.386489] [<ffffffffa0500cab>] vlan_dev_set_rx_mode+0x2b/0x50 [8021q] [32524.386495] [<ffffffff8141addf>] __dev_set_rx_mode+0x5f/0xb0 [32524.386500] [<ffffffff8141af8b>] dev_set_rx_mode+0x2b/0x40 [32524.386506] [<ffffffff8141b3cf>] __dev_open+0xef/0x150 [32524.386511] [<ffffffff8141b177>] __dev_change_flags+0xa7/0x190 [32524.386516] [<ffffffff8141b292>] dev_change_flags+0x32/0x80 [32524.386524] [<ffffffff8149ca56>] devinet_ioctl+0x7d6/0x830 [32524.386532] [<ffffffff81437b0b>] ? dev_ioctl+0x34b/0x660 [32524.386540] [<ffffffff814a05b0>] inet_ioctl+0x80/0xa0 [32524.386550] [<ffffffff8140199d>] sock_do_ioctl+0x2d/0x60 [32524.386558] [<ffffffff81401a52>] sock_ioctl+0x82/0x2a0 [32524.386568] [<ffffffff811a7123>] do_vfs_ioctl+0x93/0x590 [32524.386578] [<ffffffff811b2705>] ? rcu_read_lock_held+0x45/0x50 [32524.386586] [<ffffffff811b39e5>] ? __fget_light+0x105/0x110 [32524.386594] [<ffffffff811a76b1>] SyS_ioctl+0x91/0xb0 [32524.386604] [<ffffffff815057e2>] system_call_fastpath+0x16/0x1b ======================================================================== The reason is that all of the addr_lock_key for vlan dev have the same class, so if we change the status for vlan dev, the vlan dev and its real dev will hold the same class of addr_lock_key together, so the warning happened. we should distinguish the lock depth for vlan dev and its real dev. v1->v2: Convert the vlan_netdev_addr_lock_key to an array of eight elements, which could support to add 8 vlan id on a same vlan dev, I think it is enough for current scene, because a netdev's name is limited to IFNAMSIZ which could not hold 8 vlan id, and the vlan dev would not meet the same class key with its real dev. The new function vlan_dev_get_lockdep_subkey() will return the subkey and make the vlan dev could get a suitable class key. v2->v3: According David's suggestion, I use the subclass to distinguish the lock key for vlan dev and its real dev, but it make no sense, because the difference for subclass in the lock_class_key doesn't mean that the difference class for lock_key, so I use lock_depth to distinguish the different depth for every vlan dev, the same depth of the vlan dev could have the same lock_class_key, I import the MAX_LOCK_DEPTH from the include/linux/sched.h, I think it is enough here, the lockdep should never exceed that value. v3->v4: Add a huge array of locking keys will waste static kernel memory and is not a appropriate method, we could use _nested() variants to fix the problem, calculate the depth for every vlan dev, and use the depth as the subclass for addr_lock_key. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-18 17:48:30 -04:00
John W. Linville	4a0c3d9fd1	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2014-04-17 10:34:22 -04:00
Nicolas Dichtel	74462f0d4a	ip6_tunnel: use the right netns in ioctl handler Because the netdevice may be in another netns than the i/o netns, we should use the i/o netns instead of dev_net(dev). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-16 15:16:02 -04:00
Nicolas Dichtel	9aad77c3b5	sit: use the right netns in ioctl handler Because the netdevice may be in another netns than the i/o netns, we should use the i/o netns instead of dev_net(dev). Note that netdev_priv(dev) cannot bu NULL, hence we can remove these useless checks. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-16 15:16:02 -04:00
Nicolas Dichtel	8c923ce219	ip_tunnel: use the right netns in ioctl handler Because the netdevice may be in another netns than the i/o netns, we should use the i/o netns instead of dev_net(dev). The variable 'tunnel' was used only to get 'itn', hence to simplify code I remove it and use 't' instead. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-16 15:16:02 -04:00
Jan Glauber	b7c0ddf5f2	net: use SYSCALL_DEFINEx for sys_recv Make sys_recv a first class citizen by using the SYSCALL_DEFINEx macro. Besides being cleaner this will also generate meta data for the system call so tracing tools like ftrace or LTTng can resolve this system call. Signed-off-by: Jan Glauber <jan.glauber@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-16 15:15:05 -04:00
Cong Wang	0d5edc6873	ipv4, route: pass 0 instead of LOOPBACK_IFINDEX to fib_validate_source() In my special case, when a packet is redirected from veth0 to lo, its skb->dev->ifindex would be LOOPBACK_IFINDEX. Meanwhile we pass the hard-coded LOOPBACK_IFINDEX to fib_validate_source() in ip_route_input_slow(). This would cause the following check in fib_validate_source() fail: (dev->ifindex != oif \|\| !IN_DEV_TX_REDIRECTS(idev)) when rp_filter is disabeld on loopback. As suggested by Julian, the caller should pass 0 here so that we will not end up by calling __fib_validate_source(). Cc: Eric Biederman <ebiederm@xmission.com> Cc: Julian Anastasov <ja@ssi.bg> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-16 15:05:12 -04:00
Cong Wang	6a662719c9	ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif As suggested by Julian: Simply, flowi4_iif must not contain 0, it does not look logical to ignore all ip rules with specified iif. because in fib_rule_match() we do: if (rule->iifindex && (rule->iifindex != fl->flowi_iif)) goto out; flowi4_iif should be LOOPBACK_IFINDEX by default. We need to move LOOPBACK_IFINDEX to include/net/flow.h: 1) It is mostly used by flowi_iif 2) Fix the following compile error if we use it in flow.h by the patches latter: In file included from include/linux/netfilter.h:277:0, from include/net/netns/netfilter.h:5, from include/net/net_namespace.h:21, from include/linux/netdevice.h:43, from include/linux/icmpv6.h:12, from include/linux/ipv6.h:61, from include/net/ipv6.h:16, from include/linux/sunrpc/clnt.h:27, from include/linux/nfs_fs.h:30, from init/do_mounts.c:32: include/net/flow.h: In function ‘flowi4_init_output’: include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function) Cc: Eric Biederman <ebiederm@xmission.com> Cc: Julian Anastasov <ja@ssi.bg> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-16 15:05:11 -04:00
Nicolas Dichtel	54d63f787b	ip6_gre: don't allow to remove the fb_tunnel_dev It's possible to remove the FB tunnel with the command 'ip link del ip6gre0' but this is unsafe, the module always supposes that this device exists. For example, ip6gre_tunnel_lookup() may use it unconditionally. Let's add a rtnl handler for dellink, which will never remove the FB tunnel (we let ip6gre_destroy_tunnels() do the job). Introduced by commit `c12b395a46` ("gre: Support GRE over IPv6"). CC: Dmitry Kozlov <xeb@mail.ru> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-15 14:56:19 -04:00
Eric Dumazet	aad88724c9	ipv4: add a sock pointer to dst->output() path. In the dst->output() path for ipv4, the code assumes the skb it has to transmit is attached to an inet socket, specifically via ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the provider of the packet is an AF_PACKET socket. The dst->output() method gets an additional 'struct sock *sk' parameter. This needs a cascade of changes so that this parameter can be propagated from vxlan to final consumer. Fixes: `8f646c922d` ("vxlan: keep original skb ownership") Reported-by: lucien xin <lucien.xin@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-15 13:47:15 -04:00
Eric Dumazet	b0270e9101	ipv4: add a sock pointer to ip_queue_xmit() ip_queue_xmit() assumes the skb it has to transmit is attached to an inet socket. Commit `31c70d5956` ("l2tp: keep original skb ownership") changed l2tp to not change skb ownership and thus broke this assumption. One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(), so that we do not assume skb->sk points to the socket used by l2tp tunnel. Fixes: `31c70d5956` ("l2tp: keep original skb ownership") Reported-by: Zhan Jianyu <nasa4836@gmail.com> Tested-by: Zhan Jianyu <nasa4836@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-15 12:58:34 -04:00
David S. Miller	00cbc3dcd1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains three Netfilter fixes for your net tree, they are: * Fix missing generation sequence initialization which results in a splat if lockdep is enabled, it was introduced in the recent works to improve nf_conntrack scalability, from Andrey Vagin. * Don't flush the GRE keymap list in nf_conntrack when the pptp helper is disabled otherwise this crashes due to a double release, from Andrey Vagin. * Fix nf_tables cmp fast in big endian, from Patrick McHardy. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-14 19:00:10 -04:00
Vlad Yasevich	1e785f48d2	net: Start with correct mac_len in skb_network_protocol Sometimes, when the packet arrives at skb_mac_gso_segment() its skb->mac_len already accounts for some of the mac lenght headers in the packet. This seems to happen when forwarding through and OpenSSL tunnel. When we start looking for any vlan headers in skb_network_protocol() we seem to ignore any of the already known mac headers and start with an ETH_HLEN. This results in an incorrect offset, dropped TSO frames and general slowness of the connection. We can start counting from the known skb->mac_len and return at least that much if all mac level headers are known and accounted for. Fixes: `53d6471cef` (net: Account for all vlan headers in skb_mac_gso_segment) CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Daniel Borkman <dborkman@redhat.com> Tested-by: Martin Filip <nexus+kernel@smoula.net> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-14 18:58:58 -04:00
Daniel Borkmann	362d52040c	Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer" This reverts commit `ef2820a735` ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer") as it introduced a serious performance regression on SCTP over IPv4 and IPv6, though a not as dramatic on the latter. Measurements are on 10Gbit/s with ixgbe NICs. Current state: [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64 Time: Fri, 11 Apr 2014 17:56:21 GMT Connecting to host 192.168.241.3, port 5201 Cookie: Lab200slot2.1397238981.812898.548918 [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec (etc) [root@Lab200slot2 ~]# iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64 Time: Fri, 11 Apr 2014 19:08:41 GMT Connecting to host 2001:db8:0:f101::1, port 5201 Cookie: Lab200slot2.1397243321.714295.2b3f7c [ 4] local 2001:db8:0:f101::2 port 55804 connected to 2001:db8:0:f101::1 port 5201 Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.00 sec 169 MBytes 1.42 Gbits/sec [ 4] 1.00-2.00 sec 201 MBytes 1.69 Gbits/sec [ 4] 2.00-3.00 sec 188 MBytes 1.58 Gbits/sec [ 4] 3.00-4.00 sec 174 MBytes 1.46 Gbits/sec [ 4] 4.00-5.00 sec 165 MBytes 1.39 Gbits/sec [ 4] 5.00-6.00 sec 199 MBytes 1.67 Gbits/sec [ 4] 6.00-7.00 sec 163 MBytes 1.36 Gbits/sec [ 4] 7.00-8.00 sec 174 MBytes 1.46 Gbits/sec [ 4] 8.00-9.00 sec 193 MBytes 1.62 Gbits/sec [ 4] 9.00-10.00 sec 196 MBytes 1.65 Gbits/sec [ 4] 10.00-11.00 sec 157 MBytes 1.31 Gbits/sec [ 4] 11.00-12.00 sec 175 MBytes 1.47 Gbits/sec [ 4] 12.00-13.00 sec 192 MBytes 1.61 Gbits/sec [ 4] 13.00-14.00 sec 199 MBytes 1.67 Gbits/sec (etc) After patch: [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64 Time: Mon, 14 Apr 2014 16:40:48 GMT Connecting to host 192.168.240.3, port 5201 Cookie: Lab200slot2.1397493648.413274.65e131 [ 4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.00 sec 240 MBytes 2.02 Gbits/sec [ 4] 1.00-2.00 sec 239 MBytes 2.01 Gbits/sec [ 4] 2.00-3.00 sec 240 MBytes 2.01 Gbits/sec [ 4] 3.00-4.00 sec 239 MBytes 2.00 Gbits/sec [ 4] 4.00-5.00 sec 245 MBytes 2.05 Gbits/sec [ 4] 5.00-6.00 sec 240 MBytes 2.01 Gbits/sec [ 4] 6.00-7.00 sec 240 MBytes 2.02 Gbits/sec [ 4] 7.00-8.00 sec 239 MBytes 2.01 Gbits/sec With the reverted patch applied, the SCTP/IPv4 performance is back to normal on latest upstream for IPv4 and IPv6 and has same throughput as 3.4.2 test kernel, steady and interval reports are smooth again. Fixes: `ef2820a735` ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer") Reported-by: Peter Butler <pbutler@sonusnet.com> Reported-by: Dongsheng Song <dongsheng.song@gmail.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Peter Butler <pbutler@sonusnet.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com> Cc: Alexander Sverdlin <alexander.sverdlin@nsn.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-14 16:26:48 -04:00
Daniel Borkmann	8c482cdc35	net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W While reviewing seccomp code, we found that BPF_S_ANC_SECCOMP_LD_W has been wrongly decoded by commit `a8fc927780` ("sk-filter: Add ability to get socket filter program (v2)") into the opcode BPF_LD\|BPF_B\|BPF_ABS although it should have been decoded as BPF_LD\|BPF_W\|BPF_ABS. In practice, this should not have much side-effect though, as such conversion is/was being done through prctl(2) PR_SET_SECCOMP. Reverse operation PR_GET_SECCOMP will only return the current seccomp mode, but not the filter itself. Since the transition to the new BPF infrastructure, it's also not used anymore, so we can simply remove this as it's unreachable. Fixes: `a8fc927780` ("sk-filter: Add ability to get socket filter program (v2)") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-14 16:26:47 -04:00
John W. Linville	08e22e193a	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211	2014-04-14 13:47:01 -04:00
Eric Dumazet	30f78d8ebf	ipv6: Limit mtu to 65575 bytes Francois reported that setting big mtu on loopback device could prevent tcp sessions making progress. We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets. We must limit the IPv6 MTU to (65535 + 40) bytes in theory. Tested: ifconfig lo mtu 70000 netperf -H ::1 Before patch : Throughput : 0.05 Mbits After patch : Throughput : 35484 Mbits Reported-by: Francois WELLENREITER <f.wellenreiter@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-14 12:39:59 -04:00
Patrick McHardy	b855d416dc	netfilter: nf_tables: fix nft_cmp_fast failure on big endian for size < 4 nft_cmp_fast is used for equality comparisions of size <= 4. For comparisions of size < 4 byte a mask is calculated that is applied to both the data from userspace (during initialization) and the register value (during runtime). Both values are stored using (in effect) memcpy to a memory area that is then interpreted as u32 by nft_cmp_fast. This works fine on little endian since smaller types have the same base address, however on big endian this is not true and the smaller types are interpreted as a big number with trailing zero bytes. The mask therefore must not include the lower bytes, but the higher bytes on big endian. Add a helper function that does a cpu_to_le32 to switch the bytes on big endian. Since we're dealing with a mask of just consequitive bits, this works out fine. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-14 10:38:02 +02:00
Andrey Vagin	ee214d54bf	netfilter: nf_conntrack: initialize net.ct.generation [ 251.920788] INFO: trying to register non-static key. [ 251.921386] the code is fine but needs lockdep annotation. [ 251.921386] turning off the locking correctness validator. [ 251.921386] CPU: 2 PID: 15715 Comm: socket_listen Not tainted 3.14.0+ #294 [ 251.921386] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 251.921386] 0000000000000000 000000009d18c210 ffff880075f039b8 ffffffff816b7ecd [ 251.921386] ffffffff822c3b10 ffff880075f039c8 ffffffff816b36f4 ffff880075f03aa0 [ 251.921386] ffffffff810c65ff ffffffff810c4a85 00000000fffffe01 ffffffffa0075172 [ 251.921386] Call Trace: [ 251.921386] [<ffffffff816b7ecd>] dump_stack+0x45/0x56 [ 251.921386] [<ffffffff816b36f4>] register_lock_class.part.24+0x38/0x3c [ 251.921386] [<ffffffff810c65ff>] __lock_acquire+0x168f/0x1b40 [ 251.921386] [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0 [ 251.921386] [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat] [ 251.921386] [<ffffffff816c1215>] ? _raw_spin_unlock_bh+0x35/0x40 [ 251.921386] [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat] [ 251.921386] [<ffffffff810c7272>] lock_acquire+0xa2/0x120 [ 251.921386] [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffffa0055989>] __nf_conntrack_confirm+0x129/0x410 [nf_conntrack] [ 251.921386] [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffffa008ab90>] ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815d8c5a>] nf_iterate+0xaa/0xc0 [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815d8d14>] nf_hook_slow+0xa4/0x190 [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815e98f2>] ip_output+0x92/0x100 [ 251.921386] [<ffffffff815e8df9>] ip_local_out+0x29/0x90 [ 251.921386] [<ffffffff815e9240>] ip_queue_xmit+0x170/0x4c0 [ 251.921386] [<ffffffff815e90d5>] ? ip_queue_xmit+0x5/0x4c0 [ 251.921386] [<ffffffff81601208>] tcp_transmit_skb+0x498/0x960 [ 251.921386] [<ffffffff81602d82>] tcp_connect+0x812/0x960 [ 251.921386] [<ffffffff810e3dc5>] ? ktime_get_real+0x25/0x70 [ 251.921386] [<ffffffff8159ea2a>] ? secure_tcp_sequence_number+0x6a/0xc0 [ 251.921386] [<ffffffff81606f57>] tcp_v4_connect+0x317/0x470 [ 251.921386] [<ffffffff8161f645>] __inet_stream_connect+0xb5/0x330 [ 251.921386] [<ffffffff8158dfc3>] ? lock_sock_nested+0x33/0xa0 [ 251.921386] [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10 [ 251.921386] [<ffffffff81078885>] ? __local_bh_enable_ip+0x75/0xe0 [ 251.921386] [<ffffffff8161f8f8>] inet_stream_connect+0x38/0x50 [ 251.921386] [<ffffffff8158b157>] SYSC_connect+0xe7/0x120 [ 251.921386] [<ffffffff810e3789>] ? current_kernel_time+0x69/0xd0 [ 251.921386] [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0 [ 251.921386] [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10 [ 251.921386] [<ffffffff8158c36e>] SyS_connect+0xe/0x10 [ 251.921386] [<ffffffff816caf69>] system_call_fastpath+0x16/0x1b [ 312.014104] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=60003 jiffies, g=42359, c=42358, q=333) [ 312.015097] INFO: Stall ended before state dump start Fixes: `93bb0ceb75` ("netfilter: conntrack: remove central spinlock nf_conntrack_lock") Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-14 10:35:28 +02:00
Patrick McHardy	60eb18943b	netfilter: nf_tables: handle more than 8 * PAGE_SIZE set name allocations We currently have a limit of 8 * PAGE_SIZE anonymous sets. Lift that limit by continuing the scan if the entire page is exhausted. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-14 10:31:20 +02:00
Mathias Krause	05ab8f2647	filter: prevent nla extensions to peek beyond the end of the message The BPF_S_ANC_NLATTR and BPF_S_ANC_NLATTR_NEST extensions fail to check for a minimal message length before testing the supplied offset to be within the bounds of the message. This allows the subtraction of the nla header to underflow and therefore -- as the data type is unsigned -- allowing far to big offset and length values for the search of the netlink attribute. The remainder calculation for the BPF_S_ANC_NLATTR_NEST extension is also wrong. It has the minuend and subtrahend mixed up, therefore calculates a huge length value, allowing to overrun the end of the message while looking for the netlink attribute. The following three BPF snippets will trigger the bugs when attached to a UNIX datagram socket and parsing a message with length 1, 2 or 3. ,-[ PoC for missing size check in BPF_S_ANC_NLATTR ]-- \| ld #0x87654321 \| ldx #42 \| ld #nla \| ret a `--- ,-[ PoC for the same bug in BPF_S_ANC_NLATTR_NEST ]-- \| ld #0x87654321 \| ldx #42 \| ld #nlan \| ret a `--- ,-[ PoC for wrong remainder calculation in BPF_S_ANC_NLATTR_NEST ]-- \| ; (needs a fake netlink header at offset 0) \| ld #0 \| ldx #42 \| ld #nlan \| ret a `--- Fix the first issue by ensuring the message length fulfills the minimal size constrains of a nla header. Fix the second bug by getting the math for the remainder calculation right. Fixes: `4738c1db15` ("[SKFILTER]: Add SKF_ADF_NLATTR instruction") Fixes: `d214c7537b` ("filter: add SKF_AD_NLATTR_NEST to look for nested..") Cc: Patrick McHardy <kaber@trash.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-13 23:31:55 -04:00
Julian Anastasov	91146153da	ipv4: return valid RTA_IIF on ip route get Extend commit `13378cad02` ("ipv4: Change rt->rt_iif encoding.") from 3.6 to return valid RTA_IIF on 'ip route get ... iif DEVICE' instead of rt_iif 0 which is displayed as 'iif *'. inet_iif is not appropriate to use because skb_iif is not set. Use the skb->dev->ifindex instead. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-13 23:23:41 -04:00
Wang, Xiaoming	b04c461902	net: ipv4: current group_info should be put after using. Plug a group_info refcount leak in ping_init. group_info is only needed during initialization and the code failed to release the reference on exit. While here move grabbing the reference to a place where it is actually needed. Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com> Signed-off-by: Zhang Dongxing <dongxing.zhang@intel.com> Signed-off-by: xiaoming wang <xiaoming.wang@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-13 22:52:58 -04:00
Linus Torvalds	454fd351f2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull yet more networking updates from David Miller: 1) Various fixes to the new Redpine Signals wireless driver, from Fariya Fatima. 2) L2TP PPP connect code takes PMTU from the wrong socket, fix from Dmitry Petukhov. 3) UFO and TSO packets differ in whether they include the protocol header in gso_size, account for that in skb_gso_transport_seglen(). From Florian Westphal. 4) If VLAN untagging fails, we double free the SKB in the bridging output path. From Toshiaki Makita. 5) Several call sites of sk->sk_data_ready() were referencing an SKB just added to the socket receive queue in order to calculate the second argument via skb->len. This is dangerous because the moment the skb is added to the receive queue it can be consumed in another context and freed up. It turns out also that none of the sk->sk_data_ready() implementations even care about this second argument. So just kill it off and thus fix all these use-after-free bugs as a side effect. 6) Fix inverted test in tcp_v6_send_response(), from Lorenzo Colitti. 7) pktgen needs to do locking properly for LLTX devices, from Daniel Borkmann. 8) xen-netfront driver initializes TX array entries in RX loop :-) From Vincenzo Maffione. 9) After refactoring, some tunnel drivers allow a tunnel to be configured on top itself. Fix from Nicolas Dichtel. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits) vti: don't allow to add the same tunnel twice gre: don't allow to add the same tunnel twice drivers: net: xen-netfront: fix array initialization bug pktgen: be friendly to LLTX devices r8152: check RTL8152_UNPLUG net: sun4i-emac: add promiscuous support net/apne: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO net: ipv6: Fix oif in TCP SYN+ACK route lookup. drivers: net: cpsw: enable interrupts after napi enable and clearing previous interrupts drivers: net: cpsw: discard all packets received when interface is down net: Fix use after free by removing length arg from sk_data_ready callbacks. Drivers: net: hyperv: Address UDP checksum issues Drivers: net: hyperv: Negotiate suitable ndis version for offload support Drivers: net: hyperv: Allocate memory for all possible per-pecket information bridge: Fix double free and memory leak around br_allowed_ingress bonding: Remove debug_fs files when module init fails i40evf: program RSS LUT correctly i40evf: remove open-coded skb_cow_head ixgb: remove open-coded skb_cow_head igbvf: remove open-coded skb_cow_head ...	2014-04-12 17:31:22 -07:00
Nicolas Dichtel	8d89dcdf80	vti: don't allow to add the same tunnel twice Before the patch, it was possible to add two times the same tunnel: ip l a vti1 type vti remote 10.16.0.121 local 10.16.0.249 key 41 ip l a vti2 type vti remote 10.16.0.121 local 10.16.0.249 key 41 It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the argument dev->type, which was set only later (when calling ndo_init handler in register_netdevice()). Let's set this type in the setup handler, which is called before newlink handler. Introduced by commit `b9959fd3b0` ("vti: switch to new ip tunnel code"). CC: Cong Wang <amwang@redhat.com> CC: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-12 17:03:11 -04:00
Nicolas Dichtel	5a4552752d	gre: don't allow to add the same tunnel twice Before the patch, it was possible to add two times the same tunnel: ip l a gre1 type gre remote 10.16.0.121 local 10.16.0.249 ip l a gre2 type gre remote 10.16.0.121 local 10.16.0.249 It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the argument dev->type, which was set only later (when calling ndo_init handler in register_netdevice()). Let's set this type in the setup handler, which is called before newlink handler. Introduced by commit `c544193214` ("GRE: Refactor GRE tunneling code."). CC: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-12 17:03:11 -04:00
Daniel Borkmann	0f2eea4b7e	pktgen: be friendly to LLTX devices Similarly to commit `43279500de` ("packet: respect devices with LLTX flag in direct xmit"), we can basically apply the very same to pktgen. This will help testing against LLTX devices such as dummy driver (or others), which only have a single netdevice txq and would otherwise require locking their txq from pktgen side while e.g. in dummy case, we would not need any locking. Fix this by making use of HARD_TX_{UN,}LOCK API, so that NETIF_F_LLTX will be respected. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-12 01:59:38 -04:00
Linus Torvalds	582076ab16	A bunch of updates and cleanup within the transport layer, particularly with a focus on RDMA. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) Comment: GPGTools - http://gpgtools.org iQIcBAABAgAGBQJTRXV1AAoJEDZk62b0Tg6xYWAP/0Kfp9/8Z05SQ1fnJveySw1O nKK//1/GBmquntqFAVHI6yJRLzPJ+Z/Y4u4a0qriwfgpPQvUJQrL77tY/VfEBUPB VoJG1tj7lpdLrO3p4YyPkPjymyC7YOoFNjGEstWFg7HetwnnqqZL2LB+5yJzyqjx y9nv1HzsrbAE7j8C4hQ1Nmds5muUb5VhnTtPhjrx4tP1sWWh8XTVJbsVDiEqx6cu uJXFFTbkONr9jKfv+Ki3H2pZej2yD7w4tU4lkdcGNyij/Q4Xn1iERqroW2/GT6Cl AXxlIKN24ASjWo4VqW0Wf8gO8vbUtHRChoiZ69DvzNTkbWmAIWSFHyQJ4cinwuyr UbOQZuccO59QtpNVpBvG/vjnbI54rg+VGLy+xE0vcrBDlyoptc56IAFSg8zJY5UN ysbyHCGME//9VZ1zeeZvkMjm8z5Enp6x4zmtnUHmufO7DVMTFUePED6U1u9WIyP5 FFy5EboXMSh97yB8REvbIlY2MgBJWYdnyzLKFMeRpzC8fOXJqBoCX8i3Z5SkMYWJ 1FS/pGr7ec/VX1iHXSYi9hhzTJ6o9mEmOIhaO4UcqAuK8Rk2jbCp1Lx0iDhmJtdT zjofGDe57ro7nOZf8A/TI5z+6BZq8KYVfZrXtYaPqC4rXOzJAox/yHlz9FmAJYgv ssOIKXX0ujLwqrMBVatj =yzy/ -----END PGP SIGNATURE----- Merge tag 'for-linus-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs Pull 9p changes from Eric Van Hensbergen: "A bunch of updates and cleanup within the transport layer, particularly with a focus on RDMA" * tag 'for-linus-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9pnet_rdma: check token type before int conversion 9pnet: trans_fd : allocate struct p9_trans_fd and struct p9_conn together. 9pnet: p9_client->conn field is unused. Remove it. 9P: Get rid of REQ_STATUS_FLSH 9pnet_rdma: add cancelled() 9pnet_rdma: update request status during send 9P: Add cancelled() to the transport functions. net: Mark function as static in 9p/client.c 9P: Add memory barriers to protect request fields over cb/rpc threads handoff	2014-04-11 14:14:57 -07:00
Lorenzo Colitti	a36dbdb28e	net: ipv6: Fix oif in TCP SYN+ACK route lookup. net-next commit `9c76a11`, ipv6: tcp_ipv6 policy route issue, had a boolean logic error that caused incorrect behaviour for TCP SYN+ACK when oif-based rules are in use. Specifically: 1. If a SYN comes in from a global address, and sk_bound_dev_if is not set, the routing lookup has oif set to the interface the SYN came in on. Instead, it should have oif unset, because for global addresses, the incoming interface doesn't necessarily have any bearing on the interface the SYN+ACK is sent out on. 2. If a SYN comes in from a link-local address, and sk_bound_dev_if is set, the routing lookup has oif set to the interface the SYN came in on. Instead, it should have oif set to sk_bound_dev_if, because that's what the application requested. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-11 16:43:47 -04:00
David S. Miller	676d23690f	net: Fix use after free by removing length arg from sk_data_ready callbacks. Several spots in the kernel perform a sequence like: skb_queue_tail(&sk->s_receive_queue, skb); sk->sk_data_ready(sk, skb->len); But at the moment we place the SKB onto the socket receive queue it can be consumed and freed up. So this skb->len access is potentially to freed up memory. Furthermore, the skb->len can be modified by the consumer so it is possible that the value isn't accurate. And finally, no actual implementation of this callback actually uses the length argument. And since nobody actually cared about it's value, lots of call sites pass arbitrary values in such as '0' and even '1'. So just remove the length argument from the callback, that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect. Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-11 16:15:36 -04:00
Toshiaki Makita	eb7076182d	bridge: Fix double free and memory leak around br_allowed_ingress br_allowed_ingress() has two problems. 1. If br_allowed_ingress() is called by br_handle_frame_finish() and vlan_untag() in br_allowed_ingress() fails, skb will be freed by both vlan_untag() and br_handle_frame_finish(). 2. If br_allowed_ingress() is called by br_dev_xmit() and br_allowed_ingress() fails, the skb will not be freed. Fix these two problems by freeing the skb in br_allowed_ingress() if it fails. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-11 15:12:47 -04:00
Mikel Astiz	b16c660488	Bluetooth: Request MITM Protection when initiator The GAP Specification gives the flexibility to decide whether MITM Protection is requested or not (Bluetooth Core Specification v4.0 Volume 3, part C, section 6.5.3) when replying to an HCI_EV_IO_CAPA_REQUEST event. The recommendation is not to set this flag "unless the security policy of an available local service requires MITM Protection" (regardless of the bonding type). However, the kernel doesn't necessarily have this information and therefore the safest choice is to always use MITM Protection, also for General Bonding. This patch changes the behavior for the General Bonding initiator role, always requesting MITM Protection even if no high security level is used. Depending on the remote capabilities, the protection might not be actually used, and we will accept this locally unless of course a high security level was originally required. Note that this was already done for Dedicated Bonding. No-Bonding is left unmodified because MITM Protection is normally not desired in these cases. Signed-off-by: Mikel Astiz <mikel.astiz@bmw-carit.de> Signed-off-by: Timo Mueller <timo.mueller@bmw-carit.de> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-04-11 10:33:08 -07:00
Timo Mueller	7e74170af1	Bluetooth: Use MITM Protection when IO caps allow it When responding to a remotely-initiated pairing procedure, a MITM protected SSP associaton model can be used for pairing if both local and remote IO capabilities are set to something other than NoInputNoOutput, regardless of the bonding type (Dedicated or General). This was already done for Dedicated Bonding but this patch proposes to use the same policy for General Bonding as well. The GAP Specification gives the flexibility to decide whether MITM Protection is used ot not (Bluetooth Core Specification v4.0 Volume 3, part C, section 6.5.3). Note however that the recommendation is not to set this flag "unless the security policy of an available local service requires MITM Protection" (for both Dedicated and General Bonding). However, as we are already requiring MITM for Dedicated Bonding, we will follow this behaviour also for General Bonding. Signed-off-by: Timo Mueller <timo.mueller@bmw-carit.de> Signed-off-by: Mikel Astiz <mikel.astiz@bmw-carit.de> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-04-11 10:33:08 -07:00
Mikel Astiz	6fd6b915bd	Bluetooth: Refactor code for outgoing dedicated bonding Do not always set the MITM protection requirement by default in the field conn->auth_type, since this will be added later in hci_io_capa_request_evt(), as part of the requirements specified in HCI_OP_IO_CAPABILITY_REPLY. This avoids a hackish exception for the auto-reject case, but doesn't change the behavior of the code at all. Signed-off-by: Mikel Astiz <mikel.astiz@bmw-carit.de> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-04-11 10:33:08 -07:00
Mikel Astiz	b7f94c8808	Bluetooth: Refactor hci_get_auth_req() Refactor the code without changing its behavior by handling the no-bonding cases first followed by General Bonding. Signed-off-by: Mikel Astiz <mikel.astiz@bmw-carit.de> Signed-off-by: Timo Mueller <timo.mueller@bmw-carit.de> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-04-11 10:33:08 -07:00
Janusz Dziedzic	4f267c1198	cfg80211: reg: set DFS CAC time in case of custom regd Set DFS CAC time also in case of using custom and strict regulatory from drivers. In other case we could have unset DFS CAC time directly after driver loaded and before issue regulatory set from user mode. Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-11 10:00:10 +02:00
Felix Fietkau	764152ff66	mac80211: exclude AP_VLAN interfaces from tx power calculation Their power value is initialized to zero. This patch fixes an issue where the configured power drops to the minimum value when AP_VLAN interfaces are created/removed. Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-11 09:37:41 +02:00
Felix Fietkau	990baec022	mac80211: suppress BSS info change notifications for AP_VLAN Fixes warnings on tx power changes Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-11 09:37:01 +02:00
Florian Westphal	6d39d589bb	net: core: don't account for udp header size when computing seglen In case of tcp, gso_size contains the tcpmss. For UFO (udp fragmentation offloading) skbs, gso_size is the fragment payload size, i.e. we must not account for udp header size. Otherwise, when using virtio drivers, a to-be-forwarded UFO GSO packet will be needlessly fragmented in the forward path, because we think its individual segments are too large for the outgoing link. Fixes: `fe6cc55f3a` ("net: ip, ipv6: handle gso skbs in forwarding path") Cc: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-10 21:41:25 -04:00
Johannes Berg	c14a74007f	cfg80211: ignore invalid BSSIDs when looking for BSSes When looking for a BSS matching given parameters, ignore invalid BSSIDs. This avoids, for example, trying to join an IBSS that has a multicast BSSID, which isn't supported by all drivers nor is it a valid configuration of the IBSS so better create a new one with a correctly chosen random BSSID. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-10 10:09:18 +02:00
Johannes Berg	74f8274103	cfg80211: reject invalid IBSS BSSIDs in wext compat code Don't allow using a multicast address as the BSSID, that isn't a valid configuration. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-10 10:09:16 +02:00
Zhao, Gang	96998e3a2f	cfg80211: remove unused wiphy argument from cfg80211_wext_freq() cfg80211_wext_freq() is declared in wext-compat.h, but its parameter struct wiphy's declaration is not included there. As the parameter isn't used, just remove it. Signed-off-by: Zhao, Gang <gamerh2o@gmail.com> [remove parameter instead of changing to netdev] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-10 10:06:19 +02:00
Dmitry Petukhov	f34c4a35d8	l2tp: take PMTU from tunnel UDP socket When l2tp driver tries to get PMTU for the tunnel destination, it uses the pointer to struct sock that represents PPPoX socket, while it should use the pointer that represents UDP socket of the tunnel. Signed-off-by: Dmitry Petukhov <dmgenp@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-09 15:24:38 -04:00
Daniel Borkmann	1e1cdf8ac7	net: sctp: test if association is dead in sctp_wake_up_waiters In function sctp_wake_up_waiters(), we need to involve a test if the association is declared dead. If so, we don't have any reference to a possible sibling association anymore and need to invoke sctp_write_space() instead, and normally walk the socket's associations and notify them of new wmem space. The reason for special casing is that otherwise, we could run into the following issue when a sctp_primitive_SEND() call from sctp_sendmsg() fails, and tries to flush an association's outq, i.e. in the following way: sctp_association_free() `-> list_del(&asoc->asocs) <-- poisons list pointer asoc->base.dead = true sctp_outq_free(&asoc->outqueue) `-> __sctp_outq_teardown() `-> sctp_chunk_free() `-> consume_skb() `-> sctp_wfree() `-> sctp_wake_up_waiters() <-- dereferences poisoned pointers if asoc->ep->sndbuf_policy=0 Therefore, only walk the list in an 'optimized' way if we find that the current association is still active. We could also use list_del_init() in addition when we call sctp_association_free(), but as Vlad suggests, we want to trap such bugs and thus leave it poisoned as is. Why is it safe to resolve the issue by testing for asoc->base.dead? Parallel calls to sctp_sendmsg() are protected under socket lock, that is lock_sock()/release_sock(). Only within that path under lock held, we're setting skb/chunk owner via sctp_set_owner_w(). Eventually, chunks are freed directly by an association still under that lock. So when traversing association list on destruction time from sctp_wake_up_waiters() via sctp_wfree(), a different CPU can't be running sctp_wfree() while another one calls sctp_association_free() as both happens under the same lock. Therefore, this can also not race with setting/testing against asoc->base.dead as we are guaranteed for this to happen in order, under lock. Further, Vlad says: the times we check asoc->base.dead is when we've cached an association pointer for later processing. In between cache and processing, the association may have been freed and is simply still around due to reference counts. We check asoc->base.dead under a lock, so it should always be safe to check and not race against sctp_association_free(). Stress-testing seems fine now, too. Fixes: cd253f9f357d ("net: sctp: wake up all assocs if sndbuf policy is per socket") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-09 12:37:49 -04:00
Zhao, Gang	aa51142fc5	mac80211: fix some missing includes Some header files in mac80211 don't include all the header files they require, fix that. Signed-off-by: Zhao, Gang <gamerh2o@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 14:49:43 +02:00
Pawel Kulakowski	a2e7495d63	mac80211: Allow disabling LDPC This allows user-space (wpa_supplicant) to disable LDPC coding. Signed-off-by: Pawel Kulakowski <pawel.kulakowski@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:56:00 +02:00
Michal Kazior	65d26f29ec	cfg80211: fix radar_detect combination checking All bits from radar_detect must match combination radar bitmask. Otherwise it is theoretically possible to lead into an invalid combination provided a driver reports strange combinations. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:58 +02:00
Luciano Coelho	5d52ee8110	mac80211: allow reservation of a running chanctx With single-channel drivers, we need to be able to change a running chanctx if we want to use chanctx reservation. Not all drivers may be able to do this, so add a flag that indicates support for it. Changing a running chanctx can also be used as an optimization in multi-channel drivers when the context needs to be reserved for future usage. Introduce IEEE80211_CHANCTX_RESERVED chanctx mode to mark a channel as reserved so nobody else can use it (since we know it's going to change). In the future, we may allow several vifs to use the same reservation as long as they plan to use the chanctx on the same future channel. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:56 +02:00
Luciano Coelho	11335a550b	mac80211: implement chanctx reservation In order to support channel switch with multiple vifs and multiple contexts, we implement a concept of channel context reservation. This allows us to reserve a channel context to be used later. The reservation functionality is not tied directly to channel switch and may be used in other situations (eg. reserving a channel context during IBSS join). We first check if an existing compatible context exists and if it does, we reserve it. If there is no compatible context we create a new one and reserve it. Additionally, split ieee80211_vif_copy_chanctx_to_vlans() so we can call it while already holding the chanctx mutex. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:54 +02:00
Luciano Coelho	77eeba974f	mac80211: refactor ieee80211_assign/unassign_vif_chanctx into one Combine the functions into one, so that we can switch from one context to the other without having to unassign and assign separately. This is needed by the channel reservation functionality because otherwise we have a small period of time when the chanctx is set to NULL, which can cause problems if someone else is trying to dereference it. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:52 +02:00
Luciano Coelho	33ffd952c2	mac80211: split ieee80211_vif_change_channel in two ieee80211_vif_change_channel() locks chanctx_mtx. When implementing channel reservation for CS, we will need to call the function to change channel when the lock is already held, so split the part that requires the lock out and leave the locking in the original function. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:50 +02:00
Michal Kazior	4e141dad26	mac80211: protect AP VLAN list with local->mtx It was impossible to change chanctx of master AP for AP VLANs because the copy function requires RTNL which can't be simply taken in mac80211 code due to possible deadlocks. This is required for future chanctx reservation that re-bind vifs to new chanctx. This requires safe AP VLAN iteration without RTNL. Now VLANs can be iterated while holding either RTNL or local->mtx because the list is modified while holding both of these locks. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:49 +02:00
Luciano Coelho	b6a550156b	cfg80211/mac80211: move more combination checks to mac80211 Get rid of the cfg80211_can_add_interface() and cfg80211_can_change_interface() functions by moving that functionality to mac80211. With this patch all interface combination checks are now out of cfg80211 (except for the channel switch case which will be addressed in a future commit). Additionally, modify the ieee80211_check_combinations() function so that an undefined chandef can be passed, in order to use it before a channel is defined. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:47 +02:00
Luciano Coelho	71965c1d04	cfg80211/mac80211: move combination check to mac80211 for ibss Now that mac80211 can check the interface combinations itself, move the combinations check from cfg80211 to mac80211 when joining an IBSS. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:45 +02:00
Luciano Coelho	73de86a389	cfg80211/mac80211: move interface counting for combination check to mac80211 Move the counting part of the interface combination check from cfg80211 to mac80211. This is needed to simplify locking when the driver has to perform a combination check by itself (eg. with channel-switch). Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:43 +02:00
Luciano Coelho	2beb6dab2d	cfg80211/mac80211: refactor cfg80211_chandef_dfs_required() Some interface types don't require DFS (such as STATION, P2P_CLIENT etc). In order to centralize these decisions, make cfg80211_chandef_dfs_required() take the iftype into consideration. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:41 +02:00
Luciano Coelho	cb2d956dd3	cfg80211: refactor cfg80211_can_use_iftype_chan() Separate the code that counts the interface types and channels from the code that check the interface combinations. The new function that checks for combinations is exported so it can be called by the drivers. This is done in preparation for moving the interface combinations checks out of cfg80211. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:39 +02:00
Ilan Peer	c8866e55a9	cfg80211: Enable GO operation on indoor channels Allow GO operation on a channel marked with IEEE80211_CHAN_INDOOR_ONLY iff there is a user hint indicating that the platform is operating in an indoor environment, i.e., the platform is a printer or media center device. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:37 +02:00
Ilan Peer	52616f2b44	cfg80211: Add an option to hint indoor operation Add the option to hint the wireless core that it is operating in an indoor environment. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:35 +02:00
Ilan Peer	174e0cd28a	cfg80211: Enable GO operation on additional channels Allow GO operation on a channel marked with IEEE80211_CHAN_GO_CONCURRENT iff there is an active station interface that is associated to an AP operating on the same channel in the 2 GHz band or the same UNII band (in the 5 GHz band). This relaxation is not allowed if the channel is marked with IEEE80211_CHAN_RADAR. Note that this is a permissive approach to the FCC definitions, that require a clear assessment that the device operating the AP is an authorized master, i.e., with radar detection and DFS capabilities. It is assumed that such restrictions are enforced by user space. Furthermore, it is assumed, that if the conditions that allowed for the operation of the GO on such a channel change, i.e., the station interface disconnected from the AP, it is the responsibility of user space to evacuate the GO from the channel. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:34 +02:00
Ilan Peer	94fc661f68	cfg80211: Add Kconfig option for cellular BS hints Move the regulatory cellular base station hints support under a specific configuration option and make the option depend on CFG80211_CERTIFICATION_ONUS. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:33 +02:00
David Spinadel	570dbde137	cfg80211: Add indoor only and GO concurrent channel attributes The FCC are clarifying some soft configuration requirements, which among other include the following: 1. Indoor operation, where a device can use channels requiring indoor operation, subject to that it can guarantee indoor operation, i.e., the device is connected to AC Power or the device is under the control of a local master that is acting as an AP and is connected to AC Power. 2. Concurrent GO operation, where devices may instantiate a P2P GO while they are under the guidance of an authorized master. For example, on a channel on which a BSS is connected to an authorized master, i.e., with DFS and radar detection capability in the UNII band. See https://apps.fcc.gov/eas/comments/GetPublishedDocument.html?id=327&tn=528122 Add support for advertising Indoor-only and GO-Concurrent channel properties. Signed-off-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:32 +02:00
Zhao, Gang	babd3a2721	cfg80211: slightly clean up of cfg80211_sme_connect() Wdev->ssid_len has already been set in cfg80211_connect() and is equal to connect->ssid_len. Use wdev->ssid_len instead of connect->ssid_len so it will be consistent with previous ssid assignment statement. If bss is found in cfg80211_get_conn_bss(), wdev->conn->state is set to CFG80211_CONN_AUTHENTICATE_NEXT in there. So it's not needed to set it manually to CFG80211_CONN_AUTHENTICATE_NEXT if bss is found in that function. Signed-off-by: Zhao, Gang <gamerh2o@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:31 +02:00
Monam Agarwal	0c2bef4621	mac80211: use RCU_INIT_POINTER rcu_assign_pointer() ensures that the initialization of a structure is carried out before storing a pointer to that structure. However, in the case that NULL is assigned there's no structure to initialize so using RCU_INIT_POINTER instead is safe and more efficient. Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com> [squash eight tiny patches, rewrite commit log] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:30 +02:00
Monam Agarwal	34dd886c19	cfg80211: regulatory: use RCU_INIT_POINTER rcu_assign_pointer() ensures that the initialization of a structure is carried out before storing a pointer to that structure. However, in the case that NULL is assigned there's no structure to initialize so using RCU_INIT_POINTER instead is safe and more efficient. Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com> [rewrite commit log] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:30 +02:00
Emmanuel Grumbach	77be2c54c5	mac80211: add vif to flush call This will allow the low level driver to make decision based on the vif such as queues etc... Since the vif might be NULL, we can't add it to the tracing functions. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> [fix staging rtl8821ae driver] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:29 +02:00
Johannes Berg	78f22b6a3a	cfg80211: allow userspace to take ownership of interfaces When dynamically creating interfaces from userspace, e.g. for P2P usage, such interfaces are usually owned by the process that created them, i.e. wpa_supplicant. Should wpa_supplicant crash, such interfaces will often cease operating properly and cause problems on restarting the process. To avoid this problem, introduce an ownership concept for interfaces. If an interface is owned by a netlink socket, then it will be destroyed if the netlink socket is closed for any reason, including if the process it belongs to crashed. This gives us a race-free way to get rid of any such interfaces. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:28 +02:00
Jan-Simon Möller	6e1ee5d2e9	mac80211: remove VLAIS usage from mac80211 Replaced the use of a Variable Length Array In Struct (VLAIS) with a C99 compliant equivalent. This is the original VLAIS struct. struct { struct aead_request req; u8 priv[crypto_aead_reqsize(tfm)]; } aead_req; This patch instead allocates the appropriate amount of memory using a char array. The new code can be compiled with both gcc and clang. Signed-off-by: Jan-Simon Möller <dl9pf@gmx.de> Signed-off-by: Behan Webster <behanw@converseincode.com> [small style cleanups] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:55:27 +02:00
Chun-Yeow Yeoh	00a9a6d1e2	mac80211: update last_tx_rate only for data frame Rate controller in firmware may also return the Tx Rate used for management frame that is usually sent as lowest Tx Rate (1Mbps in 2.4GHz). So update the last_tx_rate only if it is data frame. This patch is tested with ath9k_htc. Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:50:45 +02:00
Michal Kazior	9b4816f543	mac80211: fix radar_enabled propagation If chandef had non-HT width it was possible for radar_enabled update to not be propagated properly through drv_config(). This happened because ieee80211_hw_conf_chan() would never see different local->hw.conf.chandef and local->_oper_chandef. This wasn't a problem with HT chandefs because _oper_chandef width is reset to non-HT in ieee80211_free_chanctx() making ieee80211_hw_conf_chan() to kick in. This problem led (at least) ath10k to not start CAC if prior CAC was cancelled and both CACs were requested for identical non-HT chandefs. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:48:36 +02:00
Ido Yariv	7b8a9cdd1f	mac80211: Disable SMPS for the monitor interface All antennas should be operational when monitoring to maximize reception. Signed-off-by: Ido Yariv <idox.yariv@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:47:58 +02:00
Johannes Berg	115b943a6e	mac80211: fix software remain-on-channel implementation Jouni reported that when doing off-channel transmissions mixed with on-channel transmissions, the on-channel ones ended up on the off-channel in some cases. The reason for that is that during the refactoring of the off- channel code, I lost the part that stopped all activity and as a consequence the on-channel frames (including data frames) were no longer queued but would be transmitted on the temporary channel. Fix this by simply restoring the lost activity stop call. Cc: stable@vger.kernel.org Fixes: `2eb278e083` ("mac80211: unify SW/offload remain-on-channel") Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-04-09 10:47:47 +02:00
Linus Torvalds	75ff24fa52	Merge branch 'for-3.15' of git://linux-nfs.org/~bfields/linux Pull nfsd updates from Bruce Fields: "Highlights: - server-side nfs/rdma fixes from Jeff Layton and Tom Tucker - xdr fixes (a larger xdr rewrite has been posted but I decided it would be better to queue it up for 3.16). - miscellaneous fixes and cleanup from all over (thanks especially to Kinglong Mee)" * 'for-3.15' of git://linux-nfs.org/~bfields/linux: (36 commits) nfsd4: don't create unnecessary mask acl nfsd: revert v2 half of "nfsd: don't return high mode bits" nfsd4: fix memory leak in nfsd4_encode_fattr() nfsd: check passed socket's net matches NFSd superblock's one SUNRPC: Clear xpt_bc_xprt if xs_setup_bc_tcp failed NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcp SUNRPC: New helper for creating client with rpc_xprt NFSD: Free backchannel xprt in bc_destroy NFSD: Clear wcc data between compound ops nfsd: Don't return NFS4ERR_STALE_STATEID for NFSv4.1+ nfsd4: fix nfs4err_resource in 4.1 case nfsd4: fix setclientid encode size nfsd4: remove redundant check from nfsd4_check_resp_size nfsd4: use more generous NFS4_ACL_MAX nfsd4: minor nfsd4_replay_cache_entry cleanup nfsd4: nfsd4_replay_cache_entry should be static nfsd4: update comments with obsolete function name rpc: Allow xdr_buf_subsegment to operate in-place NFSD: Using free_conn free connection SUNRPC: fix memory leak of peer addresses in XPRT ...	2014-04-08 18:28:14 -07:00
Linus Torvalds	ce7613db2d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull more networking updates from David Miller: 1) If a VXLAN interface is created with no groups, we can crash on reception of packets. Fix from Mike Rapoport. 2) Missing includes in CPTS driver, from Alexei Starovoitov. 3) Fix string validations in isdnloop driver, from YOSHIFUJI Hideaki and Dan Carpenter. 4) Missing irq.h include in bnxw2x, enic, and qlcnic drivers. From Josh Boyer. 5) AF_PACKET transmit doesn't statistically count TX drops, from Daniel Borkmann. 6) Byte-Queue-Limit enabled drivers aren't handled properly in AF_PACKET transmit path, also from Daniel Borkmann. Same problem exists in pktgen, and Daniel fixed it there too. 7) Fix resource leaks in driver probe error paths of new sxgbe driver, from Francois Romieu. 8) Truesize of SKBs can gradually get more and more corrupted in NAPI packet recycling path, fix from Eric Dumazet. 9) Fix uniprocessor netfilter build, from Florian Westphal. In the longer term we should perhaps try to find a way for ARRAY_SIZE() to work even with zero sized array elements. 10) Fix crash in netfilter conntrack extensions due to mis-estimation of required extension space. From Andrey Vagin. 11) Since we commit table rule updates before trying to copy the counters back to userspace (it's the last action we perform), we really can't signal the user copy with an error as we are beyond the point from which we can unwind everything. This causes all kinds of use after free crashes and other mysterious behavior. From Thomas Graf. 12) Restore previous behvaior of div/mod by zero in BPF filter processing. From Daniel Borkmann. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits) net: sctp: wake up all assocs if sndbuf policy is per socket isdnloop: several buffer overflows netdev: remove potentially harmful checks pktgen: fix xmit test for BQL enabled devices net/at91_ether: avoid NULL pointer dereference tipc: Let tipc_release() return 0 at86rf230: fix MAX_CSMA_RETRIES parameter mac802154: fix duplicate #include headers sxgbe: fix duplicate #include headers net: filter: be more defensive on div/mod by X==0 netfilter: Can't fail and free after table replacement xen-netback: Trivial format string fix net: bcmgenet: Remove unnecessary version.h inclusion net: smc911x: Remove unused local variable bonding: Inactive slaves should keep inactive flag's value netfilter: nf_tables: fix wrong format in request_module() netfilter: nf_tables: set names cannot be larger than 15 bytes netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len netfilter: Add {ipt,ip6t}_osf aliases for xt_osf netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks ...	2014-04-08 12:41:23 -07:00
Linus Torvalds	d586c86d50	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull second set of s390 patches from Martin Schwidefsky: "The second part of Heikos uaccess rework, the page table walker for uaccess is now a thing of the past (yay!) The code change to fix the theoretical TLB flush problem allows us to add a TLB flush optimization for zEC12, this machine has new instructions that allow to do CPU local TLB flushes for single pages and for all pages of a specific address space. Plus the usual bug fixing and some more cleanup" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/uaccess: rework uaccess code - fix locking issues s390/mm,tlb: optimize TLB flushing for zEC12 s390/mm,tlb: safeguard against speculative TLB creation s390/irq: Use defines for external interruption codes s390/irq: Add defines for external interruption codes s390/sclp: add timeout for queued requests kvm/s390: also set guest pages back to stable on kexec/kdump lcs: Add missing destroy_timer_on_stack() s390/tape: Add missing destroy_timer_on_stack() s390/tape: Use del_timer_sync() s390/3270: fix crash with multiple reset device requests s390/bitops,atomic: add missing memory barriers s390/zcrypt: add length check for aligned data to avoid overflow in msg-type 6	2014-04-08 12:02:28 -07:00
Daniel Borkmann	52c35befb6	net: sctp: wake up all assocs if sndbuf policy is per socket SCTP charges chunks for wmem accounting via skb->truesize in sctp_set_owner_w(), and sctp_wfree() respectively as the reverse operation. If a sender runs out of wmem, it needs to wait via sctp_wait_for_sndbuf(), and gets woken up by a call to __sctp_write_space() mostly via sctp_wfree(). __sctp_write_space() is being called per association. Although we assign sk->sk_write_space() to sctp_write_space(), which is then being done per socket, it is only used if send space is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE is set and therefore not invoked in sock_wfree(). Commit `4c3a5bdae2` ("sctp: Don't charge for data in sndbuf again when transmitting packet") fixed an issue where in case sctp_packet_transmit() manages to queue up more than sndbuf bytes, sctp_wait_for_sndbuf() will never be woken up again unless it is interrupted by a signal. However, a still remaining issue is that if net.sctp.sndbuf_policy=0, that is accounting per socket, and one-to-many sockets are in use, the reclaimed write space from sctp_wfree() is 'unfairly' handed back on the server to the association that is the lucky one to be woken up again via __sctp_write_space(), while the remaining associations are never be woken up again (unless by a signal). The effect disappears with net.sctp.sndbuf_policy=1, that is wmem accounting per association, as it guarantees a fair share of wmem among associations. Therefore, if we have reclaimed memory in case of per socket accounting, wake all related associations to a socket in a fair manner, that is, traverse the socket association list starting from the current neighbour of the association and issue a __sctp_write_space() to everyone until we end up waking ourselves. This guarantees that no association is preferred over another and even if more associations are taken into the one-to-many session, all receivers will get messages from the server and are not stalled forever on high load. This setting still leaves the advantage of per socket accounting in touch as an association can still use up global limits if unused by others. Fixes: `4eb701dfc6` ("[SCTP] Fix SCTP sendbuffer accouting.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Thomas Graf <tgraf@suug.ch> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-08 13:06:07 -04:00
Andrey Vagin	8142b227ef	netfilter: nf_conntrack: flush net_gre->keymap_list only from gre helper nf_ct_gre_keymap_flush() removes a nf_ct_gre_keymap object from net_gre->keymap_list and frees the object. But it doesn't clean a reference on this object from ct_pptp_info->keymap[dir]. Then nf_ct_gre_keymap_destroy() may release the same object again. So nf_ct_gre_keymap_flush() can be called only when we are sure that when nf_ct_gre_keymap_destroy will not be called. nf_ct_gre_keymap is created by nf_ct_gre_keymap_add() and the right way to destroy it is to call nf_ct_gre_keymap_destroy(). This patch marks nf_ct_gre_keymap_flush() as static, so this patch can break compilation of third party modules, which use nf_ct_gre_keymap_flush. I'm not sure this is the right way to deprecate this function. [ 226.540793] general protection fault: 0000 [#1] SMP [ 226.541750] Modules linked in: nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre ip_gre ip_tunnel gre ppp_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack veth tun bridge stp llc ppdev microcode joydev pcspkr serio_raw virtio_console virtio_balloon floppy parport_pc parport pvpanic i2c_piix4 virtio_net drm_kms_helper ttm ata_generic virtio_pci virtio_ring virtio drm i2c_core pata_acpi [last unloaded: ip_tunnel] [ 226.541776] CPU: 0 PID: 49 Comm: kworker/u4:2 Not tainted 3.14.0-rc8+ #101 [ 226.541776] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 226.541776] Workqueue: netns cleanup_net [ 226.541776] task: ffff8800371e0000 ti: ffff88003730c000 task.ti: ffff88003730c000 [ 226.541776] RIP: 0010:[<ffffffff81389ba9>] [<ffffffff81389ba9>] __list_del_entry+0x29/0xd0 [ 226.541776] RSP: 0018:ffff88003730dbd0 EFLAGS: 00010a83 [ 226.541776] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8800374e6c40 RCX: dead000000200200 [ 226.541776] RDX: 6b6b6b6b6b6b6b6b RSI: ffff8800371e07d0 RDI: ffff8800374e6c40 [ 226.541776] RBP: ffff88003730dbd0 R08: 0000000000000000 R09: 0000000000000000 [ 226.541776] R10: 0000000000000001 R11: ffff88003730d92e R12: 0000000000000002 [ 226.541776] R13: ffff88007a4c42d0 R14: ffff88007aef0000 R15: ffff880036cf0018 [ 226.541776] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 [ 226.541776] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 226.541776] CR2: 00007f07f643f7d0 CR3: 0000000036fd2000 CR4: 00000000000006f0 [ 226.541776] Stack: [ 226.541776] ffff88003730dbe8 ffffffff81389c5d ffff8800374ffbe4 ffff88003730dc28 [ 226.541776] ffffffffa0162a43 ffffffffa01627c5 ffff88007a4c42d0 ffff88007aef0000 [ 226.541776] ffffffffa01651c0 ffff88007a4c45e0 ffff88007aef0000 ffff88003730dc40 [ 226.541776] Call Trace: [ 226.541776] [<ffffffff81389c5d>] list_del+0xd/0x30 [ 226.541776] [<ffffffffa0162a43>] nf_ct_gre_keymap_destroy+0x283/0x2d0 [nf_conntrack_proto_gre] [ 226.541776] [<ffffffffa01627c5>] ? nf_ct_gre_keymap_destroy+0x5/0x2d0 [nf_conntrack_proto_gre] [ 226.541776] [<ffffffffa0162ab7>] gre_destroy+0x27/0x70 [nf_conntrack_proto_gre] [ 226.541776] [<ffffffffa0117de3>] destroy_conntrack+0x83/0x200 [nf_conntrack] [ 226.541776] [<ffffffffa0117d87>] ? destroy_conntrack+0x27/0x200 [nf_conntrack] [ 226.541776] [<ffffffffa0117d60>] ? nf_conntrack_hash_check_insert+0x2e0/0x2e0 [nf_conntrack] [ 226.541776] [<ffffffff81630142>] nf_conntrack_destroy+0x72/0x180 [ 226.541776] [<ffffffff816300d5>] ? nf_conntrack_destroy+0x5/0x180 [ 226.541776] [<ffffffffa011ef80>] ? kill_l3proto+0x20/0x20 [nf_conntrack] [ 226.541776] [<ffffffffa011847e>] nf_ct_iterate_cleanup+0x14e/0x170 [nf_conntrack] [ 226.541776] [<ffffffffa011f74b>] nf_ct_l4proto_pernet_unregister+0x5b/0x90 [nf_conntrack] [ 226.541776] [<ffffffffa0162409>] proto_gre_net_exit+0x19/0x30 [nf_conntrack_proto_gre] [ 226.541776] [<ffffffff815edf89>] ops_exit_list.isra.1+0x39/0x60 [ 226.541776] [<ffffffff815eecc0>] cleanup_net+0x100/0x1d0 [ 226.541776] [<ffffffff810a608a>] process_one_work+0x1ea/0x4f0 [ 226.541776] [<ffffffff810a6028>] ? process_one_work+0x188/0x4f0 [ 226.541776] [<ffffffff810a64ab>] worker_thread+0x11b/0x3a0 [ 226.541776] [<ffffffff810a6390>] ? process_one_work+0x4f0/0x4f0 [ 226.541776] [<ffffffff810af42d>] kthread+0xed/0x110 [ 226.541776] [<ffffffff8173d4dc>] ? _raw_spin_unlock_irq+0x2c/0x40 [ 226.541776] [<ffffffff810af340>] ? kthread_create_on_node+0x200/0x200 [ 226.541776] [<ffffffff8174747c>] ret_from_fork+0x7c/0xb0 [ 226.541776] [<ffffffff810af340>] ? kthread_create_on_node+0x200/0x200 [ 226.541776] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08 [ 226.541776] RIP [<ffffffff81389ba9>] __list_del_entry+0x29/0xd0 [ 226.541776] RSP <ffff88003730dbd0> [ 226.612193] ---[ end trace 985ae23ddfcc357c ]--- Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-08 10:56:12 +02:00
Linus Torvalds	26c12d9334	Merge branch 'akpm' (incoming from Andrew) Merge second patch-bomb from Andrew Morton: - the rest of MM - zram updates - zswap updates - exit - procfs - exec - wait - crash dump - lib/idr - rapidio - adfs, affs, bfs, ufs - cris - Kconfig things - initramfs - small amount of IPC material - percpu enhancements - early ioremap support - various other misc things * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (156 commits) MAINTAINERS: update Intel C600 SAS driver maintainers fs/ufs: remove unused ufs_super_block_third pointer fs/ufs: remove unused ufs_super_block_second pointer fs/ufs: remove unused ufs_super_block_first pointer fs/ufs/super.c: add __init to init_inodecache() doc/kernel-parameters.txt: add early_ioremap_debug arm64: add early_ioremap support arm64: initialize pgprot info earlier in boot x86: use generic early_ioremap mm: create generic early_ioremap() support x86/mm: sparse warning fix for early_memremap lglock: map to spinlock when !CONFIG_SMP percpu: add preemption checks to __this_cpu ops vmstat: use raw_cpu_ops to avoid false positives on preemption checks slub: use raw_cpu_inc for incrementing statistics net: replace __this_cpu_inc in route.c with raw_cpu_inc modules: use raw_cpu_write for initialization of per cpu refcount. mm: use raw_cpu ops for determining current NUMA node percpu: add raw_cpu_ops slub: fix leak of 'name' in sysfs_slab_add ...	2014-04-07 16:38:06 -07:00
Christoph Lameter	3ed66e910c	net: replace __this_cpu_inc in route.c with raw_cpu_inc The RT_CACHE_STAT_INC macro triggers the new preemption checks for __this_cpu ops. I do not see any other synchronization that would allow the use of a __this_cpu operation here however in commit `dbd2915ce8` ("[IPV4]: RT_CACHE_STAT_INC() warning fix") Andrew justifies the use of raw_smp_processor_id() here because "we do not care" about races. In the past we agreed that the price of disabling interrupts here to get consistent counters would be too high. These counters may be inaccurate due to race conditions. The use of __this_cpu op improves the situation already from what commit `dbd2915ce8` did since the single instruction emitted on x86 does not allow the race to occur anymore. However, non x86 platforms could still experience a race here. Trace: __this_cpu_add operation in preemptible [00000000] code: avahi-daemon/1193 caller is __this_cpu_preempt_check+0x38/0x60 CPU: 1 PID: 1193 Comm: avahi-daemon Tainted: GF 3.12.0-rc4+ #187 Call Trace: check_preemption_disabled+0xec/0x110 __this_cpu_preempt_check+0x38/0x60 __ip_route_output_key+0x575/0x8c0 ip_route_output_flow+0x27/0x70 udp_sendmsg+0x825/0xa20 inet_sendmsg+0x85/0xc0 sock_sendmsg+0x9c/0xd0 ___sys_sendmsg+0x37c/0x390 __sys_sendmsg+0x49/0x90 SyS_sendmsg+0x12/0x20 tracesys+0xe1/0xe6 Signed-off-by: Christoph Lameter <cl@linux.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-04-07 16:36:14 -07:00
Linus Torvalds	467a9e1633	CPU hotplug notifiers registration fixes for 3.15-rc1 The purpose of this single series of commits from Srivatsa S Bhat (with a small piece from Gautham R Shenoy) touching multiple subsystems that use CPU hotplug notifiers is to provide a way to register them that will not lead to deadlocks with CPU online/offline operations as described in the changelog of commit `93ae4f978c` (CPU hotplug: Provide lockless versions of callback registration functions). The first three commits in the series introduce the API and document it and the rest simply goes through the users of CPU hotplug notifiers and converts them to using the new method. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJTQow2AAoJEILEb/54YlRxW4QQAJlYRDUzwFJzJzYhltQYuVR+ 4D74XMtvXgoJfg3cwdSWvMKKpJZnA9BVN0f7Hcx9wYmgdexYUuHeZJmMNyc3S2+g KjKBIsugvgmZhHbbLd6TJ6GBbhGT5JLt9VmSfL9zIkveInU1YHFUUqL/mxdHm4J0 BSGKjk2rN3waRJgmY+xfliFLtQjDKFwJpMuvrgtoUyfas3f4sIV43UNbqdvA/weJ rzedxXOlKH/id4b56lj/4iIzcoL3mwvJJ7r6n0CEMsKv87z09kqR0O+69Tsq/cgs j17CsvoJOmZGk3QTeKVMQWBsvk6aPoDu3zK83gLbQMt+qjOpSTbJLz/3HZw4/TrW ss4nuZne1DLMGS+6hoxYbTP+6Ni//Kn+l/LrHc5jb7m1X3lMO4W2aV3IROtIE1rv lEP1IG01NU4u9YwkVj1dyhrkSp8tLPul4SrUK8W+oNweOC5crjJV7vJbIPJgmYiM IZN55wln0yVRtR4TX+rmvN0PixsInE8MeaVCmReApyF9pdzul/StxlBze5BKLSJD cqo1kNPpsmdxoDucqUpQ/gSvy+IOl2qnlisB5PpV93sk7De6TFDYrGHxjYIW7jMf StXwdCDDQhzd2Q8Kfpp895A1dbIl8rKtwA6bTU2eX+BfMVFzuMdT44cvosx1+UdQ sWl//rg76nb13dFjvF+q =SW7Q -----END PGP SIGNATURE----- Merge tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull CPU hotplug notifiers registration fixes from Rafael Wysocki: "The purpose of this single series of commits from Srivatsa S Bhat (with a small piece from Gautham R Shenoy) touching multiple subsystems that use CPU hotplug notifiers is to provide a way to register them that will not lead to deadlocks with CPU online/offline operations as described in the changelog of commit `93ae4f978c` ("CPU hotplug: Provide lockless versions of callback registration functions"). The first three commits in the series introduce the API and document it and the rest simply goes through the users of CPU hotplug notifiers and converts them to using the new method" * tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (52 commits) net/iucv/iucv.c: Fix CPU hotplug callback registration net/core/flow.c: Fix CPU hotplug callback registration mm, zswap: Fix CPU hotplug callback registration mm, vmstat: Fix CPU hotplug callback registration profile: Fix CPU hotplug callback registration trace, ring-buffer: Fix CPU hotplug callback registration xen, balloon: Fix CPU hotplug callback registration hwmon, via-cputemp: Fix CPU hotplug callback registration hwmon, coretemp: Fix CPU hotplug callback registration thermal, x86-pkg-temp: Fix CPU hotplug callback registration octeon, watchdog: Fix CPU hotplug callback registration oprofile, nmi-timer: Fix CPU hotplug callback registration intel-idle: Fix CPU hotplug callback registration clocksource, dummy-timer: Fix CPU hotplug callback registration drivers/base/topology.c: Fix CPU hotplug callback registration acpi-cpufreq: Fix CPU hotplug callback registration zsmalloc: Fix CPU hotplug callback registration scsi, fcoe: Fix CPU hotplug callback registration scsi, bnx2fc: Fix CPU hotplug callback registration scsi, bnx2i: Fix CPU hotplug callback registration ...	2014-04-07 14:55:46 -07:00
Veaceslav Falico	6859e7df6d	netdev: remove potentially harmful checks Currently we're checking a variable for != NULL after actually dereferencing it, in netdev_lower_get_next_private*(). It's counter-intuitive at best, and can lead to faulty usage (as it implies that the variable can be NULL), so fix it by removing the useless checks. Reported-by: Daniel Borkmann <dborkman@redhat.com> CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Nicolas Dichtel <nicolas.dichtel@6wind.com> CC: Jiri Pirko <jiri@resnulli.us> CC: stephen hemminger <stephen@networkplumber.org> CC: Jerry Chu <hkchu@google.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-07 15:52:07 -04:00
Daniel Borkmann	6f25cd47dc	pktgen: fix xmit test for BQL enabled devices Similarly as in commit `8e2f1a63f2` ("packet: fix packet_direct_xmit for BQL enabled drivers"), we test for __QUEUE_STATE_STACK_XOFF bit in pktgen's xmit, which would not fully fill the device's TX ring for BQL drivers that use netdev_tx_sent_queue(). Fix is to use, similarly as we do in packet sockets, netif_xmit_frozen_or_drv_stopped() test. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-07 15:20:44 -04:00
Geert Uytterhoeven	065d7e3956	tipc: Let tipc_release() return 0 net/tipc/socket.c: In function ‘tipc_release’: net/tipc/socket.c:352: warning: ‘res’ is used uninitialized in this function Introduced by commit `24be34b5a0` ("tipc: eliminate upcall function pointers between port and socket"), which removed the sole initializer of "res". Just return 0 to fix it. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-07 15:08:17 -04:00
Linus Torvalds	240cd6a817	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "The biggest chunk is a series of patches from Ilya that add support for new Ceph osd and crush map features, including some new tunables, primary affinity, and the new encoding that is needed for erasure coding support. This brings things into parity with the server side and the looming firefly release. There is also support for allocation hints in RBD that help limit fragmentation on the server side. There is also a series of patches from Zheng fixing NFS reexport, directory fragmentation support, flock vs fnctl behavior, and some issues with clustered MDS. Finally, there are some miscellaneous fixes from Yunchuan Wen for fscache, Fabian Frederick for ACLs, and from me for fsync(dirfd) behavior" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (79 commits) ceph: skip invalid dentry during dcache readdir libceph: dump pool {read,write}_tier to debugfs libceph: output primary affinity values on osdmap updates ceph: flush cap release queue when trimming session caps ceph: don't grabs open file reference for aborted request ceph: drop extra open file reference in ceph_atomic_open() ceph: preallocate buffer for readdir reply libceph: enable PRIMARY_AFFINITY feature bit libceph: redo ceph_calc_pg_primary() in terms of ceph_calc_pg_acting() libceph: add support for osd primary affinity libceph: add support for primary_temp mappings libceph: return primary from ceph_calc_pg_acting() libceph: switch ceph_calc_pg_acting() to new helpers libceph: introduce apply_temps() helper libceph: introduce pg_to_raw_osds() and raw_to_up_osds() helpers libceph: ceph_can_shift_osds(pool) and pool type defines libceph: ceph_osd_{exists,is_up,is_down}(osd) definitions libceph: enable OSDMAP_ENC feature bit libceph: primary_affinity decode bits libceph: primary_affinity infrastructure ...	2014-04-07 11:09:13 -07:00
Jean Sacren	6c6a985556	mac802154: fix duplicate #include headers The commit `e6278d9200` ("mac802154: use header operations to create/parse headers") included the header net/ieee802154_netdev.h which had been included by the commit `b70ab2e87f` ("ieee802154: enforce consistent endianness in the 802.15.4 stack"). Fix this duplicate #include by deleting the latter one as the required header has already been in place. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Cc: linux-zigbee-devel@lists.sourceforge.net Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-07 13:18:44 -04:00
Daniel Borkmann	5f9fde5f79	net: filter: be more defensive on div/mod by X==0 The old interpreter behaviour was that we returned with 0 whenever we found a division by 0 would take place. In the new interpreter we would currently just skip that instead and continue execution. It's true that a value of 0 as return might not be appropriate in all cases, but current users (socket filters -> drop packet, seccomp -> SECCOMP_RET_KILL, cls_bpf -> unclassified, etc) seem fine with that behaviour. Better this than undefined BPF program behaviour as it's expected that A contains the result of the division. In future, as more use cases open up, we could further adapt this return value to our needs, if necessary. So reintroduce return of 0 for division by 0 as in the old interpreter. Also in case of K which is guaranteed to be 32bit wide, sk_chk_filter() already takes care of preventing division by 0 invoked through K, so we can generally spare us these tests. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Reviewed-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-07 12:54:39 -04:00
Linus Torvalds	2b3a8fd735	NFS client updates for Linux 3.15 Highlights include: - Stable fix for a use after free issue in the NFSv4.1 open code - Fix the SUNRPC bi-directional RPC code to account for TCP segmentation - Optimise usage of readdirplus when confronted with 'ls -l' situations - Soft mount bugfixes - NFS over RDMA bugfixes - NFSv4 close locking fixes - Various NFSv4.x client state management optimisations - Rename/unlink code cleanups -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTQBayAAoJEGcL54qWCgDyUzgQAKzSlbcksMQT55M/KZJXabNW KSctJeDrkTkRxOXTNxuF9NbIgeqenLijCokXty6BIUgup0zkOPMzFfRfgdQvplnp YEj4sOEXEZ8CX+PoUTYOEayzt0ssEAOyidumiM+Gx2LD/E1d2xyCL7YaAOjIhVQS OnXcX1cZw+dZSUxC9vu5fVDjrphJTnp4CXdbvR5PiJiXeKqzZd9e5M3hXgpAQ/AS mWjYeUvM9mwyz7UmbLKkWEmzB3tFlGdTzDPxLRrkfcOSKI2Ham0lL3/Uv50/nRTu 99ts6KH8KLGcUuL9vD9KRebht2f71usBrWAdvpy1cUcf1Fh6lmEg4ktGfkqldaUu 9kNu9d5DCxJoGc6R2UTw5FeyPwYuDWoBwEGy1DcguJ5CeQn2R2nH4ps/P3J3DX4d DZsJqCY9idKZCQhtyR0iF9j3x2bNFoENaL6WHI6b0J+xjMedIbHgeUQzIQP0RLBJ h0IcjK0D+e7WdyC7jk4Nm3krtms5SNUG5/N9OUO36a7v8735PJBcbcgm9hZJt8Fh t/4vqUmKIBXHioHsMhaFslqTWlYIR9a3MYmN7QtHFYbqUfNxH69v9y3d6jb4Igck kqoEiui5aJOCR76s7oVdHCcm+klBwEPiACT+H9CUMzSoKzHSWsBSNZbJR3BEia4M 7dwScS1OfI2KuutshGQA =weNx -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: - Stable fix for a use after free issue in the NFSv4.1 open code - Fix the SUNRPC bi-directional RPC code to account for TCP segmentation - Optimise usage of readdirplus when confronted with 'ls -l' situations - Soft mount bugfixes - NFS over RDMA bugfixes - NFSv4 close locking fixes - Various NFSv4.x client state management optimisations - Rename/unlink code cleanups" * tag 'nfs-for-3.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits) nfs: pass string length to pr_notice message about readdir loops NFSv4: Fix a use-after-free problem in open() SUNRPC: rpc_restart_call/rpc_restart_call_prepare should clear task->tk_status SUNRPC: Don't let rpc_delay() clobber non-timeout errors SUNRPC: Ensure call_connect_status() deals correctly with SOFTCONN tasks SUNRPC: Ensure call_status() deals correctly with SOFTCONN tasks NFSv4: Ensure we respect soft mount timeouts during trunking discovery NFSv4: Schedule recovery if nfs40_walk_client_list() is interrupted NFS: advertise only supported callback netids SUNRPC: remove KERN_INFO from dprintk() call sites SUNRPC: Fix large reads on NFS/RDMA NFS: Clean up: revert increase in READDIR RPC buffer max size SUNRPC: Ensure that call_bind times out correctly SUNRPC: Ensure that call_connect times out correctly nfs: emit a fsnotify_nameremove call in sillyrename codepath nfs: remove synchronous rename code nfs: convert nfs_rename to use async_rename infrastructure nfs: make nfs_async_rename non-static nfs: abstract out code needed to complete a sillyrename NFSv4: Clear the open state flags if the new stateid does not match ...	2014-04-06 10:09:38 -07:00
Thomas Graf	c58dd2dd44	netfilter: Can't fail and free after table replacement All xtables variants suffer from the defect that the copy_to_user() to copy the counters to user memory may fail after the table has already been exchanged and thus exposed. Return an error at this point will result in freeing the already exposed table. Any subsequent packet processing will result in a kernel panic. We can't copy the counters before exposing the new tables as we want provide the counter state after the old table has been unhooked. Therefore convert this into a silent error. Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-05 17:46:22 +02:00
Ilya Dryomov	8a53f23fcd	libceph: dump pool {read,write}_tier to debugfs Dump pool {read,write}_tier to debugfs. While at it, fixup printk type specifiers and remove the unnecessary cast to unsigned long long. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2014-04-04 21:08:29 -07:00
Ilya Dryomov	f31da0f3e1	libceph: output primary affinity values on osdmap updates Similar to osd weights, output primary affinity values on incremental osdmap updates. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2014-04-04 21:08:28 -07:00
Ilya Dryomov	c4c1228525	libceph: redo ceph_calc_pg_primary() in terms of ceph_calc_pg_acting() Reimplement ceph_calc_pg_primary() in terms of ceph_calc_pg_acting() and get rid of the now unused calc_pg_raw(). Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:08:19 -07:00
Ilya Dryomov	47ec1f3cc4	libceph: add support for osd primary affinity Respond to non-default primary_affinity values accordingly. (Primary affinity allows the admin to shift 'primary responsibility' away from specific osds, effectively shifting around the read side of the workload and whatever overhead is incurred by peering and writes by virtue of being the primary). Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:08:17 -07:00
Ilya Dryomov	5e8d4d36bf	libceph: add support for primary_temp mappings Change apply_temp() to override primary in the same way pg_temp overrides osd set. primary_temp overrides pg_temp primary too. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:08:16 -07:00
Ilya Dryomov	8008ab1080	libceph: return primary from ceph_calc_pg_acting() In preparation for adding support for primary_temp, stop assuming primaryness: add a primary out parameter to ceph_calc_pg_acting() and change call sites accordingly. Primary is now specified separately from the order of osds in the set. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:08:14 -07:00
Ilya Dryomov	ac972230e2	libceph: switch ceph_calc_pg_acting() to new helpers Switch ceph_calc_pg_acting() to new helpers: pg_to_raw_osds(), raw_to_up_osds() and apply_temps(). Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:08:13 -07:00
Ilya Dryomov	45966c3467	libceph: introduce apply_temps() helper apply_temp() helper for applying various temporary mappings (at this point only pg_temp mappings) to the up set, therefore transforming it into an acting set. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:08:11 -07:00
Ilya Dryomov	2bd93d4d7e	libceph: introduce pg_to_raw_osds() and raw_to_up_osds() helpers pg_to_raw_osds() helper for computing a raw (crush) set, which can contain non-existant and down osds. raw_to_up_osds() helper for pruning non-existant and down osds from the raw set, therefore transforming it into an up set, and determining up primary. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:08:10 -07:00
Ilya Dryomov	63a6993f52	libceph: primary_affinity decode bits Add two helpers to decode primary_affinity (full map, vector<u32>) and new_primary_affinity (inc map, map<u32, u32>) and switch to them. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:08:04 -07:00
Ilya Dryomov	2cfa34f2d6	libceph: primary_affinity infrastructure Add primary_affinity infrastructure. primary_affinity values are stored in an max_osd-sized array, hanging off ceph_osdmap, similar to a osd_weight array. Introduce {get,set}_primary_affinity() helpers, primarily to return CEPH_OSD_DEFAULT_PRIMARY_AFFINITY when no affinity has been set and to abstract out osd_primary_affinity array allocation and initialization. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:08:02 -07:00
Ilya Dryomov	d286de796a	libceph: primary_temp decode bits Add a common helper to decode both primary_temp (full map, map<pg_t, u32>) and new_primary_temp (inc map, same) and switch to it. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:08:00 -07:00
Ilya Dryomov	9686f94c8c	libceph: primary_temp infrastructure Add primary_temp mappings infrastructure. struct ceph_pg_mapping is overloaded, primary_temp mappings are stored in an rb-tree, rooted at ceph_osdmap, in a manner similar to pg_temp mappings. Dump primary_temp mappings to /sys/kernel/debug/ceph/<client>/osdmap, one 'primary_temp <pgid> <osd>' per line, e.g: primary_temp 2.6 4 Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:07:58 -07:00
Ilya Dryomov	35a935d75d	libceph: generalize ceph_pg_mapping In preparation for adding support for primary_temp mappings, generalize struct ceph_pg_mapping so it can hold mappings other than pg_temp. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:07:57 -07:00
Ilya Dryomov	ec7af97258	libceph: introduce get_osdmap_client_data_v() Full and incremental osdmaps are structured identically and have identical headers. Add a helper to decode both "old" (16-bit version, v6) and "new" (8-bit struct_v+struct_compat+struct_len, v7) osdmap enconding headers and switch to it. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:07:55 -07:00
Ilya Dryomov	10db634e20	libceph: introduce decode{,_new}_pg_temp() and switch to them Consolidate pg_temp (full map, map<pg_t, vector<u32>>) and new_pg_temp (inc map, same) decoding logic into a common helper and switch to it. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:07:53 -07:00
Ilya Dryomov	4d60351f90	libceph: switch osdmap_set_max_osd() to krealloc() Use krealloc() instead of rolling our own. (krealloc() with a NULL first argument acts as a kmalloc()). Properly initalize the new array elements. This is needed to make future additions to osdmap easier. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:07:52 -07:00
Ilya Dryomov	433fbdd31d	libceph: introduce decode{,_new}_pools() and switch to them Consolidate pools (full map, map<u64, pg_pool_t>) and new_pools (inc map, same) decoding logic into a common helper and switch to it. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:07:50 -07:00
Ilya Dryomov	0f70c7eedb	libceph: rename __decode_pool{,_names}() to decode_pool{,_names}() To be in line with all the other osdmap decode helpers. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:07:49 -07:00
Ilya Dryomov	53bbaba9d8	libceph: fix and clarify ceph_decode_need() sizes Sum up sizeof(...) results instead of (incorrectly) hard-coding the number of bytes, expressed in ints and longs. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:07:47 -07:00
Ilya Dryomov	9464d00862	libceph: nuke bogus encoding version check in osdmap_apply_incremental() Only version 6 of osdmap encoding is supported, anything other than version 6 results in an error and halts the decoding process. Checking if version is >= 5 is therefore bogus. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:07:46 -07:00
Ilya Dryomov	86f1742b94	libceph: fixup error handling in osdmap_apply_incremental() The existing error handling scheme requires resetting err to -EINVAL prior to calling any ceph_decode_* macro. This is ugly and fragile, and there already are a few places where we would return 0 on error, due to a missing reset. Follow osdmap_decode() and fix this by adding a special e_inval label to be used by all ceph_decode_* macros. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:07:44 -07:00
Ilya Dryomov	9902e682c7	libceph: fix crush_decode() call site in osdmap_decode() The size of the memory area feeded to crush_decode() should be limited not only by osdmap end, but also by the crush map length. Also, drop unnecessary dout() (dout() in crush_decode() conveys the same info) and step past crush map only if it is decoded successfully. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:07:43 -07:00
Ilya Dryomov	2d88b2e081	libceph: check length of osdmap osd arrays Check length of osd_state, osd_weight and osd_addr arrays. They should all have exactly max_osd elements after the call to osdmap_set_max_osd(). Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:07:41 -07:00
Ilya Dryomov	3977058c46	libceph: safely decode max_osd value in osdmap_decode() max_osd value is not covered by any ceph_decode_need(). Use a safe version of ceph_decode_* macro to decode it. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:07:40 -07:00
Ilya Dryomov	597b52f6ca	libceph: fixup error handling in osdmap_decode() The existing error handling scheme requires resetting err to -EINVAL prior to calling any ceph_decode_* macro. This is ugly and fragile, and there already are a few places where we would return 0 on error, due to a missing reset. Fix this by adding a special e_inval label to be used by all ceph_decode_* macros. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:07:38 -07:00
Ilya Dryomov	a2505d63ee	libceph: split osdmap allocation and decode steps Split osdmap allocation and initialization into a separate function, ceph_osdmap_decode(). Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:07:37 -07:00
Ilya Dryomov	38a8d56023	libceph: dump osdmap and enhance output on decode errors Dump osdmap in hex on both full and incremental decode errors, to make it easier to match the contents with error offset. dout() map epoch and max_osd value on success. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:07:35 -07:00
Ilya Dryomov	1c00240e00	libceph: dump pg_temp mappings to debugfs Dump pg_temp mappings to /sys/kernel/debug/ceph/<client>/osdmap, one 'pg_temp <pgid> [<osd>, ..., <osd>]' per line, e.g: pg_temp 2.6 [2,3,4] Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:07:34 -07:00
Ilya Dryomov	0a2800d728	libceph: do not prefix osd lines with \t in debugfs output To save screen space in anticipation of more fields (e.g. primary affinity). Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:07:32 -07:00
Ilya Dryomov	35fea3a18a	libceph: refer to osdmap directly in osdmap_show() To make it more readable and save screen space. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:07:31 -07:00
Ilya Dryomov	d83ed858f1	crush: add SET_CHOOSELEAF_VARY_R step This lets you adjust the vary_r tunable on a per-rule basis. Reflects ceph.git commit f944ccc20aee60a7d8da7e405ec75ad1cd449fac. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2014-04-04 21:07:28 -07:00
Ilya Dryomov	e2b149cc4b	crush: add chooseleaf_vary_r tunable The current crush_choose_firstn code will re-use the same 'r' value for the recursive call. That means that if we are hitting a collision or rejection for some reason (say, an OSD that is marked out) and need to retry, we will keep making the same (bad) choice in that recursive selection. Introduce a tunable that fixes that behavior by incorporating the parent 'r' value into the recursive starting point, so that a different path will be taken in subsequent placement attempts. Note that this was done from the get-go for the new crush_choose_indep algorithm. This was exposed by a user who was seeing PGs stuck in active+remapped after reweight-by-utilization because the up set mapped to a single OSD. Reflects ceph.git commit a8e6c9fbf88bad056dd05d3eb790e98a5e43451a. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2014-04-04 21:07:26 -07:00
Ilya Dryomov	6ed1002f36	crush: allow crush rules to set (re)tries counts to 0 These two fields are misnomers; they are retry counts. Reflects ceph.git commit f17caba8ae0cad7b6f8f35e53e5f73b444696835. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2014-04-04 21:07:25 -07:00
Ilya Dryomov	48a163dbb5	crush: fix off-by-one errors in total_tries refactor Back in 27f4d1f6bc32c2ed7b2c5080cbd58b14df622607 we refactored the CRUSH code to allow adjustment of the retry counts on a per-pool basis. That commit had an off-by-one bug: the previous "tries" counter was a retry count, not a try count, but the new code was passing in 1 meaning there should be no retries. Fix the ftotal vs tries comparison to use < instead of <= to fix the problem. Note that the original code used <= here, which means the global "choose_total_tries" tunable is actually counting retries. Compensate for that by adding 1 in crush_do_rule when we pull the tunable into the local variable. This was noticed looking at output from a user provided osdmap. Unfortunately the map doesn't illustrate the change in mapping behavior and I haven't managed to construct one yet that does. Inspection of the crush debug output now aligns with prior versions, though. Reflects ceph.git commit 795704fd615f0b008dcc81aa088a859b2d075138. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2014-04-04 21:07:23 -07:00
Yan, Zheng	d90deda69c	libceph: fix oops in ceph_msg_data_{pages,pagelist}_advance() When there is no more data, ceph_msg_data_{pages,pagelist}_advance() should not move on to the next page. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-04-04 21:07:15 -07:00
Pablo Neira Ayuso	2fec6bb6f4	netfilter: nf_tables: fix wrong format in request_module() The intended format in request_module is %.s instead of %.s. Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-03 23:52:50 +02:00
Pablo Neira Ayuso	a9bdd83656	netfilter: nf_tables: set names cannot be larger than 15 bytes Currently, nf_tables trims off the set name if it exceeeds 15 bytes, so explicitly reject set names that are too large. Reported-by: Giuseppe Longo <giuseppelng@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-03 23:52:44 +02:00
Kirill Tkhai	b8ddd9eac8	netfilter: Add {ipt,ip6t}_osf aliases for xt_osf There are no these aliases, so kernel can not request appropriate match table: $ iptables -I INPUT -p tcp -m osf --genre Windows --ttl 2 -j DROP iptables: No chain/target/match by that name. setsockopt() requests ipt_osf module, which is not present. Add the aliases. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-03 23:52:22 +02:00
Alexey Perevalov	a00e76349f	netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks This simple modification allows iptables to work with INPUT chain in combination with cgroup module. It could be useful for counting ingress traffic per cgroup with nfacct netfilter module. There were no problems to count the egress traffic that way formerly. It's possible to get classified sk_buff after PREROUTING, due to socket lookup being done in early_demux (tcp_v4_early_demux). Also it works for udp as well. Trivial usage example, assuming we're in the same shell every step and we have enough permissions: 1) Classic net_cls cgroup initialization: mkdir /sys/fs/cgroup/net_cls mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls 2) Set up cgroup for interesting application: mkdir /sys/fs/cgroup/net_cls/wget echo 1 > /sys/fs/cgroup/net_cls/wget/net_cls.classid echo $BASHPID > /sys/fs/cgroup/net_cls/wget/cgroup.procs 3) Create kernel counters: nfacct add wget-cgroup-in iptables -A INPUT -m cgroup ! --cgroup 1 -m nfacct --nfacct-name wget-cgroup-in nfacct add wget-cgroup-out iptables -A OUTPUT -m cgroup ! --cgroup 1 -m nfacct --nfacct-name wget-cgroup-out 4) Network usage: wget https://www.kernel.org/pub/linux/kernel/v3.x/testing/linux-3.14-rc6.tar.xz 5) Check results: nfacct list Cgroup approach is being used for the DataUsage (counting & blocking traffic) feature for Samsung's modification of the Tizen OS. Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-03 23:52:17 +02:00
Florian Westphal	e00b437b3d	netfilter: connlimit: move lock array out of struct connlimit_data Eric points out that the locks can be global. Moreover, both Jesper and Eric note that using only 32 locks increases false sharing as only two cache lines are used. This increases locks to 256 (16 cache lines assuming 64byte cacheline and 4 bytes per spinlock). Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-03 23:52:13 +02:00
Florian Westphal	e5ac6eafba	netfilter: connlimit: fix UP build cannot use ARRAY_SIZE() if spinlock_t is empty struct. Fixes: `1442e7507d` ("netfilter: connlimit: use keyed locks") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-03 23:52:07 +02:00
Eric Dumazet	e33d0ba804	net-gro: reset skb->truesize in napi_reuse_skb() Recycling skb always had been very tough... This time it appears GRO layer can accumulate skb->truesize adjustments made by drivers when they attach a fragment to skb. skb_gro_receive() can only subtract from skb->truesize the used part of a fragment. I spotted this problem seeing TcpExtPruneCalled and TcpExtTCPRcvCollapsed that were unexpected with a recent kernel, where TCP receive window should be sized properly to accept traffic coming from a driver not overshooting skb->truesize. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-03 16:17:52 -04:00
Linus Torvalds	32d01dc7be	Merge branch 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "A lot updates for cgroup: - The biggest one is cgroup's conversion to kernfs. cgroup took after the long abandoned vfs-entangled sysfs implementation and made it even more convoluted over time. cgroup's internal objects were fused with vfs objects which also brought in vfs locking and object lifetime rules. Naturally, there are places where vfs rules don't fit and nasty hacks, such as credential switching or lock dance interleaving inode mutex and cgroup_mutex with object serial number comparison thrown in to decide whether the operation is actually necessary, needed to be employed. After conversion to kernfs, internal object lifetime and locking rules are mostly isolated from vfs interactions allowing shedding of several nasty hacks and overall simplification. This will also allow implmentation of operations which may affect multiple cgroups which weren't possible before as it would have required nesting i_mutexes. - Various simplifications including dropping of module support, easier cgroup name/path handling, simplified cgroup file type handling and task_cg_lists optimization. - Prepatory changes for the planned unified hierarchy, which is still a patchset away from being actually operational. The dummy hierarchy is updated to serve as the default unified hierarchy. Controllers which aren't claimed by other hierarchies are associated with it, which BTW was what the dummy hierarchy was for anyway. - Various fixes from Li and others. This pull request includes some patches to add missing slab.h to various subsystems. This was triggered xattr.h include removal from cgroup.h. cgroup.h indirectly got included a lot of files which brought in xattr.h which brought in slab.h. There are several merge commits - one to pull in kernfs updates necessary for converting cgroup (already in upstream through driver-core), others for interfering changes in the fixes branch" * 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (74 commits) cgroup: remove useless argument from cgroup_exit() cgroup: fix spurious lockdep warning in cgroup_exit() cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c cgroup: break kernfs active_ref protection in cgroup directory operations cgroup: fix cgroup_taskset walking order cgroup: implement CFTYPE_ONLY_ON_DFL cgroup: make cgrp_dfl_root mountable cgroup: drop const from @buffer of cftype->write_string() cgroup: rename cgroup_dummy_root and related names cgroup: move ->subsys_mask from cgroupfs_root to cgroup cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding cgroup: remove NULL checks from [pr_cont_]cgroup_{name\|path}() cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root cgroup: reorganize cgroup bootstrapping cgroup: relocate setting of CGRP_DEAD cpuset: use rcu_read_lock() to protect task_cs() cgroup_freezer: document freezer_fork() subtleties cgroup: update cgroup_transfer_tasks() to either succeed or fail cgroup: drop task_lock() protection around task->cgroups cgroup: update how a newly forked task gets associated with css_set ...	2014-04-03 13:05:42 -07:00
Erik Hugne	a5e7ac5ce1	tipc: fix regression bug where node events are not being generated Commit `5902385a24` ("tipc: obsolete the remote management feature") introduces a regression where node topology events are not being generated because the publication that triggers this: {0, <z.c.n>, <z.c.n>} is no longer available. This will break applications that rely on node events to discover when nodes join/leave a cluster. We fix this by advertising the node publication when TIPC enters networking mode, and withdraws it upon shutdown. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-03 16:03:57 -04:00
Jiri Pirko	d0290214de	net: add busy_poll device feature Currently there is no way how to find out if a device supports busy polling. So add a feature and make it dependent on ndo_busy_poll existence. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-03 14:31:34 -04:00
Daniel Borkmann	8e2f1a63f2	packet: fix packet_direct_xmit for BQL enabled drivers Currently, in packet_direct_xmit() we test the assigned netdevice queue for netif_xmit_frozen_or_stopped() before doing an ndo_start_xmit(). This can have the side-effect that BQL enabled drivers which make use of netdev_tx_sent_queue() internally, set __QUEUE_STATE_STACK_XOFF from within the stack and would not fully fill the device's TX ring from packet sockets with PACKET_QDISC_BYPASS enabled. Instead, use a test without BQL bit so that bursts can be absorbed into the NICs TX ring. Fix and code suggested by Eric Dumazet, thanks! Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-03 14:29:12 -04:00
Daniel Borkmann	0f97ede45e	packet: report tx_dropped in packet_direct_xmit Since commit `015f0688f5` ("net: net: add a core netdev->tx_dropped counter"), we can now account for TX drops from within the core stack instead of drivers. Therefore, fix packet_direct_xmit() and increase drop count when we encounter a problem before driver's xmit function was called (we do not want to doubly account for it). Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-03 14:29:11 -04:00
Thomas Huth	1dad093b66	s390/irq: Use defines for external interruption codes Use the new defines for external interruption codes to get rid of "magic" numbers in the s390 source code. And while we're at it, also rename the (un-)register_external_interrupt function to something shorter so that this patch does not exceed the 80 columns all over the place. Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2014-04-03 14:30:52 +02:00
Arturo Borrero	d60ce62fb5	netfilter: nf_tables: add set_elem notifications This patch adds set_elems notifications. When a set_elem is added/deleted, all listening peers in userspace will receive the corresponding notification. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@gnumonks.org>	2014-04-03 12:22:25 +02:00
Linus Torvalds	cd6362befe	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: "Here is my initial pull request for the networking subsystem during this merge window: 1) Support for ESN in AH (RFC 4302) from Fan Du. 2) Add full kernel doc for ethtool command structures, from Ben Hutchings. 3) Add BCM7xxx PHY driver, from Florian Fainelli. 4) Export computed TCP rate information in netlink socket dumps, from Eric Dumazet. 5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas Dichtel. 6) Convert many drivers to pci_enable_msix_range(), from Alexander Gordeev. 7) Record SKB timestamps more efficiently, from Eric Dumazet. 8) Switch to microsecond resolution for TCP round trip times, also from Eric Dumazet. 9) Clean up and fix 6lowpan fragmentation handling by making use of the existing inet_frag api for it's implementation. 10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss. 11) Auto size SKB lengths when composing netlink messages based upon past message sizes used, from Eric Dumazet. 12) qdisc dumps can take a long time, add a cond_resched(), From Eric Dumazet. 13) Sanitize netpoll core and drivers wrt. SKB handling semantics. Get rid of never-used-in-tree netpoll RX handling. From Eric W Biederman. 14) Support inter-address-family and namespace changing in VTI tunnel driver(s). From Steffen Klassert. 15) Add Altera TSE driver, from Vince Bridgers. 16) Optimizing csum_replace2() so that it doesn't adjust the checksum by checksumming the entire header, from Eric Dumazet. 17) Expand BPF internal implementation for faster interpreting, more direct translations into JIT'd code, and much cleaner uses of BPF filtering in non-socket ocntexts. From Daniel Borkmann and Alexei Starovoitov" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits) netpoll: Use skb_irq_freeable to make zap_completion_queue safe. net: Add a test to see if a skb is freeable in irq context qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port' net: ptp: move PTP classifier in its own file net: sxgbe: make "core_ops" static net: sxgbe: fix logical vs bitwise operation net: sxgbe: sxgbe_mdio_register() frees the bus Call efx_set_channels() before efx->type->dimension_resources() xen-netback: disable rogue vif in kthread context net/mlx4: Set proper build dependancy with vxlan be2net: fix build dependency on VxLAN mac802154: make csma/cca parameters per-wpan mac802154: allow only one WPAN to be up at any given time net: filter: minor: fix kdoc in __sk_run_filter netlink: don't compare the nul-termination in nla_strcmp can: c_can: Avoid led toggling for every packet. can: c_can: Simplify TX interrupt cleanup can: c_can: Store dlc private can: c_can: Reduce register access can: c_can: Make the code readable ...	2014-04-02 20:53:45 -07:00
Ilya Dryomov	c647b8a8c6	libceph: add support for CEPH_OSD_OP_SETALLOCHINT osd op This is primarily for rbd's benefit and is supposed to combat fragmentation: "... knowing that rbd images have a 4m size, librbd can pass a hint that will let the osd do the xfs allocation size ioctl on new files so that they are allocated in 1m or 4m chunks. We've seen cases where users with rbd workloads have very high levels of fragmentation in xfs and this would mitigate that and probably have a pretty nice performance benefit." SETALLOCHINT is considered advisory, so our backwards compatibility mechanism here is to set FAILOK flag for all SETALLOCHINT ops. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-03 10:33:51 +08:00
Ilya Dryomov	7b25bf5f02	libceph: encode CEPH_OSD_OP_FLAG_* op flags Encode ceph_osd_op::flags field so that it gets sent over the wire. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-03 10:33:51 +08:00
Ilya Dryomov	9d521470a4	libceph: a per-osdc crush scratch buffer With the addition of erasure coding support in the future, scratch variable-length array in crush_do_rule_ary() is going to grow to at least 200 bytes on average, on top of another 128 bytes consumed by rawosd/osd arrays in the call chain. Replace it with a buffer inside struct osdmap and a mutex. This shouldn't result in any contention, because all osd requests were already serialized by request_mutex at that point; the only unlocked caller was ceph_ioctl_get_dataloc(). Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2014-04-03 10:33:50 +08:00
Linus Torvalds	0f1b1e6d73	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID updates from Jiri Kosina: - substantial cleanup of the generic and transport layers, in the direction of an ultimate goal of making struct hid_device completely transport independent, by Benjamin Tissoires - cp2112 driver from David Barksdale - a lot of fixes and new hardware support (Dualshock 4) to hid-sony driver, by Frank Praznik - support for Win 8.1 multitouch protocol by Andrew Duggan - other smaller fixes / device ID additions * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (75 commits) HID: sony: fix force feedback mismerge HID: sony: Set the quriks flag for Bluetooth controllers HID: sony: Fix Sixaxis cable state detection HID: uhid: Add UHID_CREATE2 + UHID_INPUT2 HID: hyperv: fix _raw_request() prototype HID: hyperv: Implement a stub raw_request() entry point HID: hid-sensor-hub: fix sleeping function called from invalid context HID: multitouch: add support for Win 8.1 multitouch touchpads HID: remove hid_output_raw_report transport implementations HID: sony: do not rely on hid_output_raw_report HID: cp2112: remove the last hid_output_raw_report() call HID: cp2112: remove various hid_out_raw_report calls HID: multitouch: add support of other generic collections in hid-mt HID: multitouch: remove pen special handling HID: multitouch: remove registered devices with default behavior HID: hidp: Add a comment that some devices depend on the current behavior of uniq HID: sony: Prevent duplicate controller connections. HID: sony: Perform a boundry check on the sixaxis battery level index. HID: sony: Fix work queue issues HID: sony: Fix multi-line comment styling ...	2014-04-02 16:24:28 -07:00
Linus Torvalds	159d8133d0	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual rocket science -- mostly documentation and comment updates" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: sparse: fix comment doc: fix double words isdn: capi: fix "CAPI_VERSION" comment doc: DocBook: Fix typos in xml and template file Bluetooth: add module name for btwilink driver core: unexport static function create_syslog_header mmc: core: typo fix in printk specifier ARM: spear: clean up editing mistake net-sysfs: fix comment typo 'CONFIG_SYFS' doc: Insert MODULE_ in module-signing macros Documentation: update URL to hfsplus Technote 1150 gpio: update path to documentation ixgbe: Fix format string in ixgbe_fcoe. Kconfig: Remove useless "default N" lines user_namespace.c: Remove duplicated word in comment CREDITS: fix formatting treewide: Fix typo in Documentation/DocBook mm: Fix warning on make htmldocs caused by slab.c ata: ata-samsung_cf: cleanup in header file idr: remove unused prototype of idr_free()	2014-04-02 16:23:38 -07:00
Patrick McHardy	2c96c25d11	netfilter: nft_hash: use set global element counter instead of private one Now that nf_tables performs global accounting of set elements, it is not needed in the hash type anymore. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-02 21:33:55 +02:00
Patrick McHardy	c50b960ccc	netfilter: nf_tables: implement proper set selection The current set selection simply choses the first set type that provides the requested features, which always results in the rbtree being chosen by virtue of being the first set in the list. What we actually want to do is choose the implementation that can provide the requested features and is optimal from either a performance or memory perspective depending on the characteristics of the elements and the preferences specified by the user. The elements are not known when creating a set. Even if we would provide them for anonymous (literal) sets, we'd still have standalone sets where the elements are not known in advance. We therefore need an abstract description of the data charcteristics. The kernel already knows the size of the key, this patch starts by introducing a nested set description which so far contains only the maximum amount of elements. Based on this the set implementations are changed to provide an estimate of the required amount of memory and the lookup complexity class. The set ops have a new callback ->estimate() that is invoked during set selection. It receives a structure containing the attributes known to the kernel and is supposed to populate a struct nft_set_estimate with the complexity class and, in case the size is known, the complete amount of memory required, or the amount of memory required per element otherwise. Based on the policy specified by the user (performance/memory, defaulting to performance) the kernel will then select the best suited implementation. Even if the set implementation would allow to add more than the specified maximum amount of elements, they are enforced since new implementations might not be able to add more than maximum based on which they were selected. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-02 21:32:57 +02:00
Patrick McHardy	fe92ca45a1	netfilter: nft_ct: split nft_ct_init() into two functions for get/set For value spanning multiple registers, we need to validate the length of data loads. In order to add this to nft_ct, we need the length from key validation. Split the nft_ct_init() function into two functions for the get and set operations as preparation for that. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-02 21:29:45 +02:00
Patrick McHardy	d2caa696ad	netfilter: nft_meta: split nft_meta_init() into two functions for get/set For value spanning multiple registers, we need to validate the length of data loads. In order to add this to nft_meta, we need the length from key validation. Split the nft_meta_init() function into two functions for the get and set operations as preparation for that. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@gnumonks.org>	2014-04-02 21:29:25 +02:00
Patrick McHardy	e88e514e1f	netfilter: nft_ct: add missing ifdef for NFT_MARK setting The set operation for ct mark is only valid if CONFIG_NF_CONNTRACK_MARK is enabled. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-02 21:29:07 +02:00
Linus Torvalds	7a48837732	Merge branch 'for-3.15/core' of git://git.kernel.dk/linux-block Pull core block layer updates from Jens Axboe: "This is the pull request for the core block IO bits for the 3.15 kernel. It's a smaller round this time, it contains: - Various little blk-mq fixes and additions from Christoph and myself. - Cleanup of the IPI usage from the block layer, and associated helper code. From Frederic Weisbecker and Jan Kara. - Duplicate code cleanup in bio-integrity from Gu Zheng. This will give you a merge conflict, but that should be easy to resolve. - blk-mq notify spinlock fix for RT from Mike Galbraith. - A blktrace partial accounting bug fix from Roman Pen. - Missing REQ_SYNC detection fix for blk-mq from Shaohua Li" * 'for-3.15/core' of git://git.kernel.dk/linux-block: (25 commits) blk-mq: add REQ_SYNC early rt,blk,mq: Make blk_mq_cpu_notify_lock a raw spinlock blk-mq: support partial I/O completions blk-mq: merge blk_mq_insert_request and blk_mq_run_request blk-mq: remove blk_mq_alloc_rq blk-mq: don't dump CPU -> hw queue map on driver load blk-mq: fix wrong usage of hctx->state vs hctx->flags blk-mq: allow blk_mq_init_commands() to return failure block: remove old blk_iopoll_enabled variable blktrace: fix accounting of partially completed requests smp: Rename __smp_call_function_single() to smp_call_function_single_async() smp: Remove wait argument from __smp_call_function_single() watchdog: Simplify a little the IPI call smp: Move __smp_call_function_single() below its safe version smp: Consolidate the various smp_call_function_single() declensions smp: Teach __smp_call_function_single() to check for offline cpus smp: Remove unused list_head from csd smp: Iterate functions through llist_for_each_entry_safe() block: Stop abusing rq->csd.list in blk-softirq block: Remove useless IPI struct initialization ...	2014-04-01 19:19:15 -07:00
Eric W. Biederman	b1586f099b	netpoll: Use skb_irq_freeable to make zap_completion_queue safe. Replace the test in zap_completion_queue to test when it is safe to free skbs in hard irq context with skb_irq_freeable ensuring we only free skbs when it is safe, and removing the possibility of subtle problems. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-01 17:53:36 -04:00
Daniel Borkmann	408eccce32	net: ptp: move PTP classifier in its own file This commit fixes a build error reported by Fengguang, that is triggered when CONFIG_NETWORK_PHY_TIMESTAMPING is not set: ERROR: "ptp_classify_raw" [drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.ko] undefined! The fix is to introduce its own file for the PTP BPF classifier, so that PTP_1588_CLOCK and/or NETWORK_PHY_TIMESTAMPING can select it independently from each other. IXP4xx driver on ARM needs to select it as well since it does not seem to select PTP_1588_CLOCK or similar that would pull it in automatically. This also allows for hiding all of the internals of the BPF PTP program inside that file, and only exporting relevant API bits to drivers. This patch also adds a kdoc documentation of ptp_classify_raw() API to make it clear that it can return PTP_CLASS_* defines. Also, the BPF program has been translated into bpf_asm code, so that it can be more easily read and altered (extensively documented in [1]). In the kernel tree under tools/net/ we have bpf_asm and bpf_dbg tools, so the commented program can simply be translated via `./bpf_asm -c prog` where prog is a file that contains the commented code. This makes it easily readable/verifiable and when there's a need to change something, jump offsets etc do not need to be replaced manually which can be very error prone. Instead, a newly translated version via bpf_asm can simply replace the old code. I have checked opcode diffs before/after and it's the very same filter. [1] Documentation/networking/filter.txt Fixes: `164d8c6665` ("net: ptp: do not reimplement PTP/BPF classifier") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Jiri Benc <jbenc@redhat.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-01 16:43:18 -04:00
Phoebe Buckheister	e462ded699	mac802154: make csma/cca parameters per-wpan Commit `9b2777d608` (ieee802154: add TX power control to wpan_phy) and following erroneously added CSMA and CCA parameters for 802.15.4 devices as PHY parameters, while they are actually MAC parameters and can differ for any two WPAN instances. Since it is now sensible to have multiple WPAN devices with differing CSMA/CCA parameters, make these parameters MAC parameters instead. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-01 16:25:51 -04:00
Phoebe Buckheister	336908f6d7	mac802154: allow only one WPAN to be up at any given time All 802.15.4 PHY devices with drivers in tree can support only one WPAN at any given time, yet the stack allows arbitrarily many WPAN devices to be created and up at the same time. This cannot work with what the hardware provides, and in the current implementation, provides an easy DoS vector to any process on the system that may call socket() and sendmsg(). Thus, allow only one WPAN per PHY to be up at once, just like mac80211 does for managed devices. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-01 16:25:51 -04:00
Daniel Borkmann	01d32f6e5a	net: filter: minor: fix kdoc in __sk_run_filter This minor patch fixes the following warning when doing a `make htmldocs`: DOCPROC Documentation/DocBook/networking.xml Warning(.../net/core/filter.c:135): No description found for parameter 'insn' Warning(.../net/core/filter.c:135): Excess function parameter 'fentry' description in '__sk_run_filter' HTML Documentation/DocBook/networking.html Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-01 15:25:33 -04:00
Jiri Kosina	ad295b6d57	Merge branch 'for-3.15/hid-core-ll-transport-cleanup' into for-linus Conflicts: drivers/hid/hid-ids.h drivers/hid/hid-sony.c drivers/hid/i2c-hid/i2c-hid.c	2014-04-01 19:05:09 +02:00
Jiri Kosina	ee5f68e6c2	Merge branch 'for-3.15/ll-driver-new-callbacks' into for-linus	2014-04-01 18:56:24 +02:00
Linus Torvalds	9d919e8d5b	Merge branch 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue changes from Tejun Heo: "PREPARE_[DELAYED_]WORK() were used to change the work function of work items without fully reinitializing it; however, this makes workqueue consider the work item as a different one from before and allows the work item to start executing before the previous instance is finished which can lead to extremely subtle issues which are painful to debug. The interface has never been popular. This pull request contains patches to remove existing usages and kill the interface. As one of the changes was routed during the last devel cycle and another depended on a pending change in nvme, for-3.15 contains a couple merge commits. In addition, interfaces which were deprecated quite a while ago - __cancel_delayed_work() and WQ_NON_REENTRANT - are removed too" * 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: remove deprecated WQ_NON_REENTRANT workqueue: Spelling s/instensive/intensive/ workqueue: remove PREPARE_[DELAYED_]WORK() staging/fwserial: don't use PREPARE_WORK afs: don't use PREPARE_WORK nvme: don't use PREPARE_WORK usb: don't use PREPARE_DELAYED_WORK floppy: don't use PREPARE_[DELAYED_]WORK ps3-vuart: don't use PREPARE_WORK wireless/rt2x00: don't use PREPARE_WORK in rt2800usb.c workqueue: Remove deprecated __cancel_delayed_work()	2014-03-31 15:08:51 -07:00
Linus Torvalds	190f918660	Merge branch 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 compat wrapper rework from Heiko Carstens: "S390 compat system call wrapper simplification work. The intention of this work is to get rid of all hand written assembly compat system call wrappers on s390, which perform proper sign or zero extension, or pointer conversion of compat system call parameters. Instead all of this should be done with C code eg by using Al's COMPAT_SYSCALL_DEFINEx() macro. Therefore all common code and s390 specific compat system calls have been converted to the COMPAT_SYSCALL_DEFINEx() macro. In order to generate correct code all compat system calls may only have eg compat_ulong_t parameters, but no unsigned long parameters. Those patches which change parameter types from unsigned long to compat_ulong_t parameters are separate in this series, but shouldn't cause any harm. The only compat system calls which intentionally have 64 bit parameters (preadv64 and pwritev64) in support of the x86/32 ABI haven't been changed, but are now only available if an architecture defines __ARCH_WANT_COMPAT_SYS_PREADV64/PWRITEV64. System calls which do not have a compat variant but still need proper zero extension on s390, like eg "long sys_brk(unsigned long brk)" will get a proper wrapper function with the new s390 specific COMPAT_SYSCALL_WRAPx() macro: COMPAT_SYSCALL_WRAP1(brk, unsigned long, brk); which generates the following code (simplified): asmlinkage long sys_brk(unsigned long brk); asmlinkage long compat_sys_brk(long brk) { return sys_brk((u32)brk); } Given that the C file which contains all the COMPAT_SYSCALL_WRAP lines includes both linux/syscall.h and linux/compat.h, it will generate build errors, if the declaration of sys_brk() doesn't match, or if there exists a non-matching compat_sys_brk() declaration. In addition this will intentionally result in a link error if somewhere else a compat_sys_brk() function exists, which probably should have been used instead. Two more BUILD_BUG_ONs make sure the size and type of each compat syscall parameter can be handled correctly with the s390 specific macros. I converted the compat system calls step by step to verify the generated code is correct and matches the previous code. In fact it did not always match, however that was always a bug in the hand written asm code. In result we get less code, less bugs, and much more sanity checking" * 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (44 commits) s390/compat: add copyright statement compat: include linux/unistd.h within linux/compat.h s390/compat: get rid of compat wrapper assembly code s390/compat: build error for large compat syscall args mm/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types kexec/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types net/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types ipc/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types fs/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types ipc/compat: convert to COMPAT_SYSCALL_DEFINE fs/compat: convert to COMPAT_SYSCALL_DEFINE security/compat: convert to COMPAT_SYSCALL_DEFINE mm/compat: convert to COMPAT_SYSCALL_DEFINE net/compat: convert to COMPAT_SYSCALL_DEFINE kernel/compat: convert to COMPAT_SYSCALL_DEFINE fs/compat: optional preadv64/pwrite64 compat system calls ipc/compat_sys_msgrcv: change msgtyp type from long to compat_long_t s390/compat: partial parameter conversion within syscall wrappers s390/compat: automatic zero, sign and pointer conversion of syscalls s390/compat: add sync_file_range and fallocate compat syscalls ...	2014-03-31 14:32:17 -07:00
Stanislav Kinsbursky	3064639423	nfsd: check passed socket's net matches NFSd superblock's one There could be a case, when NFSd file system is mounted in network, different to socket's one, like below: "ip netns exec" creates new network and mount namespace, which duplicates NFSd mount point, created in init_net context. And thus NFS server stop in nested network context leads to RPCBIND client destruction in init_net. Then, on NFSd start in nested network context, rpc.nfsd process creates socket in nested net and passes it into "write_ports", which leads to RPCBIND sockets creation in init_net context because of the same reason (NFSd monut point was created in init_net context). An attempt to register passed socket in nested net leads to panic, because no RPCBIND client present in nexted network namespace. This patch add check that passed socket's net matches NFSd superblock's one. And returns -EINVAL error to user psace otherwise. v2: Put socket on exit. Reported-by: Weng Meiling <wengmeiling.weng@huawei.com> Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-03-31 16:58:16 -04:00

... 3 4 5 6 7 ...

32759 Commits