linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-16 23:45:31 +08:00

Author	SHA1	Message	Date
Felix Fietkau	7a36b930e6	mac80211: minstrel_ht: set default tx aggregation timeout to 0 The value 5000 was put here with the addition of the timeout field to ieee80211_start_tx_ba_session. It was originally added in mac80211 to save resources for drivers like iwlwifi, which only supports a limited number of concurrent aggregation sessions. Since iwlwifi does not use minstrel_ht and other drivers don't need this, 0 is a better default - especially since there have been recent reports of aggregation setup related issues reproduced with ath9k. This should improve stability without causing any adverse effects. Cc: stable@vger.kernel.org Acked-by: Avery Pennarun <apenwarr@gmail.com> Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-02-23 10:39:24 +01:00
Sven Eckelmann	212c5a5e6b	mac80211: minstrel: Change expected throughput unit back to Kbps The change from cur_tp to the function minstrel_get_tp_avg/minstrel_ht_get_tp_avg changed the unit used for the current throughput. For example in minstrel_ht the correct conversion between them would be: mrs->cur_tp / 10 == minstrel_ht_get_tp_avg(..). This factor 10 must also be included in the calculation of minstrel_get_expected_throughput and minstrel_ht_get_expected_throughput to return values with the unit [Kbps] instead of [10Kbps]. Otherwise routing algorithms like B.A.T.M.A.N. V will make incorrect decision based on these values. Its kernel based implementation expects expected_throughput always to have the unit [Kbps] and not sometimes [10Kbps] and sometimes [Kbps]. The same requirement has iw or olsrdv2's nl80211 based statistics module which retrieve the same data via NL80211_STA_INFO_TX_BITRATE. Cc: stable@vger.kernel.org Fixes: `6a27b2c40b` ("mac80211: restructure per-rate throughput calculation into function") Signed-off-by: Sven Eckelmann <sven@open-mesh.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-02-02 15:57:02 +01:00
Konstantin Khlebnikov	f84a93726e	mac80211: minstrel_ht: fix out-of-bound in minstrel_ht_set_best_prob_rate Patch fixes this splat BUG: KASAN: slab-out-of-bounds in minstrel_ht_update_stats.isra.7+0x6e1/0x9e0 [mac80211] at addr ffff8800cee640f4 Read of size 4 by task swapper/3/0 Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Link: http://lkml.kernel.org/r/CALYGNiNyJhSaVnE35qS6UCGaSb2Dx1_i5HcRavuOX14oTz2P+w@mail.gmail.com Acked-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-02-02 15:54:49 +01:00
Chris Bainbridge	f39ea2690b	mac80211: fix use of uninitialised values in RX aggregation Use kzalloc instead of kmalloc for struct tid_ampdu_rx to initialize the "removed" field (all others are initialized manually). That fixes: UBSAN: Undefined behaviour in net/mac80211/rx.c:932:29 load of value 2 is not a valid value for type '_Bool' CPU: 3 PID: 1134 Comm: kworker/u16:7 Not tainted 4.5.0-rc1+ #265 Workqueue: phy0 rt2x00usb_work_rxdone 0000000000000004 ffff880254a7ba50 ffffffff8181d866 0000000000000007 ffff880254a7ba78 ffff880254a7ba68 ffffffff8188422d ffffffff8379b500 ffff880254a7bab8 ffffffff81884747 0000000000000202 0000000348620032 Call Trace: [<ffffffff8181d866>] dump_stack+0x45/0x5f [<ffffffff8188422d>] ubsan_epilogue+0xd/0x40 [<ffffffff81884747>] __ubsan_handle_load_invalid_value+0x67/0x70 [<ffffffff82227b4d>] ieee80211_sta_reorder_release.isra.16+0x5ed/0x730 [<ffffffff8222ca14>] ieee80211_prepare_and_rx_handle+0xd04/0x1c00 [<ffffffff8222db03>] __ieee80211_rx_handle_packet+0x1f3/0x750 [<ffffffff8222e4a7>] ieee80211_rx_napi+0x447/0x990 While at it, convert to use sizeof(tid_agg_rx) instead. Fixes: `788211d81b` ("mac80211: fix RX A-MPDU session reorder timer deletion") Cc: stable@vger.kernel.org Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com> [reword commit message, use sizeof(tid_agg_rx)] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-01-29 17:13:44 +01:00
Johannes Berg	cb150b9d23	cfg80211/wext: fix message ordering Since cfg80211 frequently takes actions from its netdev notifier call, wireless extensions messages could still be ordered badly since the wext netdev notifier, since wext is built into the kernel, runs before the cfg80211 netdev notifier. For example, the following can happen: 5: wlan1: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default link/ether 02:00:00:00:01:00 brd ff:ff:ff:ff:ff:ff 5: wlan1: <BROADCAST,MULTICAST,UP> link/ether when setting the interface down causes the wext message. To also fix this, export the wireless_nlevent_flush() function and also call it from the cfg80211 notifier. Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-01-29 17:13:43 +01:00
Johannes Berg	8bf862739a	wext: fix message delay/ordering Beniamino reported that he was getting an RTM_NEWLINK message for a given interface, after the RTM_DELLINK for it. It turns out that the message is a wireless extensions message, which was sent because the interface had been connected and disconnection while it was deleted caused a wext message. For its netlink messages, wext uses RTM_NEWLINK, but the message is without all the regular rtnetlink attributes, so "ip monitor link" prints just rudimentary information: 5: wlan1: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default link/ether 02:00:00:00:01:00 brd ff:ff:ff:ff:ff:ff Deleted 5: wlan1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default link/ether 02:00:00:00:01:00 brd ff:ff:ff:ff:ff:ff 5: wlan1: <BROADCAST,MULTICAST,UP> link/ether (from my hwsim reproduction) This can cause userspace to get confused since it doesn't expect an RTM_NEWLINK message after RTM_DELLINK. The reason for this is that wext schedules a worker to send out the messages, and the scheduling delay can cause the messages to get out to userspace in different order. To fix this, have wext register a netdevice notifier and flush out any pending messages when netdevice state changes. This fixes any ordering whenever the original message wasn't sent by a notifier itself. Cc: stable@vger.kernel.org Reported-by: Beniamino Galvani <bgalvani@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-01-29 17:13:39 +01:00
Johannes Berg	6736fde967	rfkill: fix rfkill_fop_read wait_event usage The code within wait_event_interruptible() is called with !TASK_RUNNING, so mustn't call any functions that can sleep, like mutex_lock(). Since we re-check the list_empty() in a loop after the wait, it's safe to simply use list_empty() without locking. This bug has existed forever, but was only discovered now because all userspace implementations, including the default 'rfkill' tool, use poll() or select() to get a readable fd before attempting to read. Cc: stable@vger.kernel.org Fixes: `c64fb01627` ("rfkill: create useful userspace interface") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-01-26 11:32:05 +01:00
Sachin Kulkarni	4fa11ec726	mac80211: Requeue work after scan complete for all VIF types. During a sw scan ieee80211_iface_work ignores work items for all vifs. However after the scan complete work is requeued only for STA, ADHOC and MESH iftypes. This occasionally results in event processing getting delayed/not processed for iftype AP when it coexists with a STA. This can result in data halt and eventually disconnection on the AP interface. Cc: stable@vger.kernel.org Signed-off-by: Sachin Kulkarni <Sachin.Kulkarni@imgtec.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-01-26 11:27:39 +01:00
Helmut Schaa	da629cf111	mac80211: Don't buffer non-bufferable MMPDUs Non-bufferable MMPDUs are sent out to STAs even while in PS mode (for example probe responses). Applying filtered frame handling for these doesn't seem to make much sense and will only create more air utilization when the STA wakes up. Hence, apply filtered frame handling only for bufferable MMPDUs. Discovered while testing an old VOIP phone that started probing for APs while in PS mode. The mac80211/ath9k AP where the STA is associated would reply with a probe response but the phone sometimes moved to a new channel already and couldn't ack the probe response anymore. In that case mac80211 applied filtered frame handling for the un-acked probe response. Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-01-14 11:10:21 +01:00
Eliad Peller	2bc533bd9d	mac80211: handle sched_scan_stopped vs. hw restart race On hw restart, mac80211 might try to reconfigure already stopped sched scan, if ieee80211_sched_scan_stopped_work() wasn't scheduled yet. This in turn will keep the device driver with scheduled scan configured, while both mac80211 and cfg80211 will clear their sched scan state once the work is scheduled. Fix it by ignoring ieee80211_sched_scan_stopped() calls while in hw restart, and flush the work before starting the reconfiguration. Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-01-14 11:10:20 +01:00
Emmanuel Grumbach	1a57081add	mac80211: fix PS-Poll handling My commit below broken PS-Poll handling. In case the driver has no frames buffered, driver_release_tids will be 0, but calling find_highest_prio_tid() with 0 as a parameter is not a good idea: fls(0) - 1 = -1. This bug caused mac80211 to think that frames were buffered in the driver which in turn was confused because mac80211 was asking to release frames that were not reported to exist. On iwlwifi, this led to the WARNING below: WARNING: CPU: 0 PID: 11230 at drivers/net/wireless/intel/iwlwifi/mvm/sta.c:1733 iwl_mvm_sta_modify_sleep_tx_count+0x2af/0x320 [iwlmvm]() ffffffffc0627c60 ffff8800069b7648 ffffffff81888913 0000000000000000 0000000000000000 ffff8800069b7688 ffffffff81089d6a ffff8800069b7678 0000000000000001 ffff88003b35abf0 ffff88000698b128 ffff8800069b76d4 Call Trace: [<ffffffff81888913>] dump_stack+0x4c/0x65 [<ffffffff81089d6a>] warn_slowpath_common+0x8a/0xc0 [<ffffffff81089e5a>] warn_slowpath_null+0x1a/0x20 [<ffffffffc05f36bf>] iwl_mvm_sta_modify_sleep_tx_count+0x2af/0x320 [iwlmvm] [<ffffffffc05dae41>] iwl_mvm_mac_release_buffered_frames+0x31/0x40 [iwlmvm] [<ffffffffc045d8b6>] ieee80211_sta_ps_deliver_response+0x6e6/0xd80 [mac80211] [<ffffffffc0461296>] ieee80211_sta_ps_deliver_poll_response+0x26/0x30 [mac80211] [<ffffffffc048f743>] ieee80211_rx_handlers+0xa83/0x2900 [mac80211] [<ffffffffc04917ad>] ieee80211_prepare_and_rx_handle+0x1ed/0xa70 [mac80211] [<ffffffffc045e3d5>] ? sta_info_get_bss+0x5/0x4a0 [mac80211] [<ffffffffc04925b6>] ieee80211_rx_napi+0x586/0xcd0 [mac80211] [<ffffffffc05eaa3e>] iwl_mvm_rx_rx_mpdu+0x59e/0xc60 [iwlmvm] Fixes: `0ead2510f8` ("mac80211: allow the driver to send EOSP when needed") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-01-14 11:10:18 +01:00
Eliad Peller	b9f628fcc6	mac80211: clear local->sched_scan_req properly on reconfig On reconfig, in case of sched_scan_req->n_scan_plans > 1, local->sched_scan_req was never cleared, although cfg80211_sched_scan_stopped_rtnl() was called, resulting in local->sched_scan_req holding a stale and preventing further scheduled scan requests. Clear it explicitly in this case. Fixes: 42a7e82c6792 ("mac80211: Do not restart scheduled scan if multiple scan plans are set") Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-01-14 11:10:16 +01:00
Eliad Peller	470f4d613b	mac80211: avoid ROC during hw restart Defer ROC requests during hw restart, as the driver might not be fully configured in this stage (e.g. channel contexts were not added yet) Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-01-14 11:10:14 +01:00
Johannes Berg	c3826807bb	regulatory: fix world regulatory domain data The rule definitions here aren't really valid, they would be rejected if it came from userspace due to the bandwidth specified being bigger than the rule's width. This is fairly much inconsequential since the other rules around them do enable the bandwidth, but express that better using the NL80211_RRF_AUTO_BW flag. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-01-14 11:10:13 +01:00
Dave Young	94c4fd641e	wireless: change cfg80211 regulatory domain info as debug messages cfg80211 module prints a lot of messages like below. Actually printing once is acceptable but sometimes it will print again and again, it looks very annoying. It is better to change these detail messages to debugging only. cfg80211: World regulatory domain updated: cfg80211: DFS Master region: unset cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time) cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A) cfg80211: (2457000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A) cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm), (N/A) cfg80211: (5170000 KHz - 5250000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (N/A) cfg80211: (5250000 KHz - 5330000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (0 s) cfg80211: (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2000 mBm), (0 s) cfg80211: (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A) cfg80211: (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm), (N/A) The changes in this patch is to replace pr_info with pr_debug in function print_rd_rules and print_regdomain_info Signed-off-by: Dave Young <dyoung@redhat.com> [change some pr_err() statements to at least keep the alpha2] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-01-14 11:10:11 +01:00
Johannes Berg	e6a8a3aaaa	mac80211: fix remain-on-channel cancellation Ilan's previous commit `1b894521e6` ("mac80211: handle HW ROC expired properly") neglected to take into account that hw_begun was now always set in the software implementation as well as the offloaded case. Fix hw_begun to only apply to the offloaded case to make the check in Ilan's commit safe and correct. Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-01-14 11:10:10 +01:00
Johannes Berg	e9db455787	mac80211: recalculate SW ROC only when needed The current (new) code recalculates the new work timeout for software remain-on-channel whenever any item started. In two of the callers of ieee80211_handle_roc_started(), this is completely pointless since they're for hardware and will skip the recalculation entirely; it's necessary only in the case of having just added a new item to the list, as in the last remaining case the recalculation had just been done. This last case, however, is also problematic - if one of the items on the list actually expires during the recalc the list iteration outside becomes corrupted and crashes. Fix this by moving the recalculation to the only place where it's required. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-01-14 11:10:09 +01:00
David S. Miller	1adbfc435f	Included bugfixes: - avoid freeing batadv_hardif_neigh_node when still in use in other contexts - prevent lockdep splat in mcast_free during shutdown -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJWlovIAAoJENpFlCjNi1MRxccP/1+pDAPE9GpSUwOaDUYjWVYQ 4k1jRdJZAiMTrLyr1pW3Gesq3wTwtcIjjevaZGOa3QyhKRPmVj7yHqeYv62+WX9x vEu6Mq6btBUqF+7+hCA/7n+Wyca+fsg9ge87hi6blmTSexlrcog80vTf2/N0CAQ4 w9jVUWcWpK8fZ6ia5IiK/RNw9gL4onh98V/7WqcvUyOmBmlyQJTpdqvvj5DZGFbL u6B5hr0hzLirW2LJsk4fdYgKzyMYXBPtLZoiRiuk9GyjMDK+t0rUPdpuoPYhiVhH ZFidpICvNpBDQyDjITGX6XvAUqYI4bjDfxT3vdznkNhcmcPjrcITSw8x8u2rB3Xr TTVPtxi1OvGc0UCW+2Y47IJuHowZ7dKI0B2PiHdvLAlgM4dMYupZlruRMhR/XF0b x7wSsbETxO3g+SH3qqJVMyebrrX//uC19vThbq+gVzMEa1hkZSSvDZbp/c2RDlCA h5DtOjYPzCQeFo/8eorprUaO0zpbF0k/fTQu8f9csyozKVMdcCTO2X0vSRoXJG+m 9EP4WiFhqRXZWlgrTAXVxTgPlP1KEDHTmjynC2iwkg1Fxje1xBWX+zdWAt9Vgrby HhYlb2wWg6tTt8zHjqwtOUqeCMDl8/7I+nk4GjJMzlNv3Iv2+3Eyj+DALOiQgt1S xqvE8KkN7EEAzlzkqDPF =6uBh -----END PGP SIGNATURE----- Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge Antonio Quartulli says: ==================== net: batman-adv 20160114 Included bugfixes: - avoid freeing batadv_hardif_neigh_node when still in use in other contexts - prevent lockdep splat in mcast_free during shutdown ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-13 14:57:42 -05:00
David S. Miller	b8e429a2fe	genetlink: Fix off-by-one in genl_allocate_reserve_groups() The bug fix for adding n_groups to the computation forgot to adjust ">=" to ">" to keep the condition correct. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-13 10:28:06 -05:00
Sven Eckelmann	bab7c6c3de	batman-adv: Fix list removal of batadv_hardif_neigh_node The neigh_list with batadv_hardif_neigh_node objects is accessed with only rcu_read_lock in batadv_hardif_neigh_get and batadv_iv_neigh_print. Thus it is not allowed to kfree the object before the rcu grace period ends (which may still protects context accessing this object). Therefore the object has first to be removed from the neigh_list and then it has either wait with synchronize_rcu or call_rcu till the grace period ends before it can be freed. Fixes: `cef63419f7` ("batman-adv: add list of unique single hop neighbors per hard-interface") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>	2016-01-13 19:28:27 +08:00
Simon Wunderlich	af63cf51b7	batman-adv: fix lockdep splat when doing mcast_free While testing, we got something like this: WARNING: CPU: 0 PID: 238 at net/batman-adv/multicast.c:142 batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv]() [...] Call Trace: [<ffffffff815fc597>] dump_stack+0x4b/0x64 [<ffffffff810b34dc>] warn_slowpath_common+0xbc/0x120 [<ffffffffa0024ec5>] ? batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] [<ffffffff810b3705>] warn_slowpath_null+0x15/0x20 [<ffffffffa0024ec5>] batadv_mcast_mla_tt_retract+0x94/0x205 [batman_adv] [<ffffffffa00273fe>] batadv_mcast_free+0x36/0x39 [batman_adv] [<ffffffffa0020c77>] batadv_mesh_free+0x7d/0x13f [batman_adv] [<ffffffffa0036a6b>] batadv_softif_free+0x15/0x25 [batman_adv] [...] Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>	2016-01-13 19:26:25 +08:00
David S. Miller	ddb5388ffd	Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux	2016-01-13 00:21:27 -05:00
Linus Torvalds	aee3bfa330	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from Davic Miller: 1) Support busy polling generically, for all NAPI drivers. From Eric Dumazet. 2) Add byte/packet counter support to nft_ct, from Floriani Westphal. 3) Add RSS/XPS support to mvneta driver, from Gregory Clement. 4) Implement IPV6_HDRINCL socket option for raw sockets, from Hannes Frederic Sowa. 5) Add support for T6 adapter to cxgb4 driver, from Hariprasad Shenai. 6) Add support for VLAN device bridging to mlxsw switch driver, from Ido Schimmel. 7) Add driver for Netronome NFP4000/NFP6000, from Jakub Kicinski. 8) Provide hwmon interface to mlxsw switch driver, from Jiri Pirko. 9) Reorganize wireless drivers into per-vendor directories just like we do for ethernet drivers. From Kalle Valo. 10) Provide a way for administrators "destroy" connected sockets via the SOCK_DESTROY socket netlink diag operation. From Lorenzo Colitti. 11) Add support to add/remove multicast routes via netlink, from Nikolay Aleksandrov. 12) Make TCP keepalive settings per-namespace, from Nikolay Borisov. 13) Add forwarding and packet duplication facilities to nf_tables, from Pablo Neira Ayuso. 14) Dead route support in MPLS, from Roopa Prabhu. 15) TSO support for thunderx chips, from Sunil Goutham. 16) Add driver for IBM's System i/p VNIC protocol, from Thomas Falcon. 17) Rationalize, consolidate, and more completely document the checksum offloading facilities in the networking stack. From Tom Herbert. 18) Support aborting an ongoing scan in mac80211/cfg80211, from Vidyullatha Kanchanapally. 19) Use per-bucket spinlock for bpf hash facility, from Tom Leiming. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1375 commits) net: bnxt: always return values from _bnxt_get_max_rings net: bpf: reject invalid shifts phonet: properly unshare skbs in phonet_rcv() dwc_eth_qos: Fix dma address for multi-fragment skbs phy: remove an unneeded condition mdio: remove an unneed condition mdio_bus: NULL dereference on allocation error net: Fix typo in netdev_intersect_features net: freescale: mac-fec: Fix build error from phy_device API change net: freescale: ucc_geth: Fix build error from phy_device API change bonding: Prevent IPv6 link local address on enslaved devices IB/mlx5: Add flow steering support net/mlx5_core: Export flow steering API net/mlx5_core: Make ipv4/ipv6 location more clear net/mlx5_core: Enable flow steering support for the IB driver net/mlx5_core: Initialize namespaces only when supported by device net/mlx5_core: Set priority attributes net/mlx5_core: Connect flow tables net/mlx5_core: Introduce modify flow table command net/mlx5_core: Managing root flow table ...	2016-01-12 18:57:02 -08:00
Linus Torvalds	33caf82acf	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "All kinds of stuff. That probably should've been 5 or 6 separate branches, but by the time I'd realized how large and mixed that bag had become it had been too close to -final to play with rebasing. Some fs/namei.c cleanups there, memdup_user_nul() introduction and switching open-coded instances, burying long-dead code, whack-a-mole of various kinds, several new helpers for ->llseek(), assorted cleanups and fixes from various people, etc. One piece probably deserves special mention - Neil's lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets called without ->i_mutex and tries to avoid ever taking it. That, of course, means that it's not useful for any directory modifications, but things like getting inode attributes in nfds readdirplus are fine with that. I really should've asked for moratorium on lookup-related changes this cycle, but since I hadn't done that early enough... I am asking for that for the coming cycle, though - I'm going to try and get conversion of i_mutex to rwsem with ->lookup() done under lock taken shared. There will be a patch closer to the end of the window, along the lines of the one Linus had posted last May - mechanical conversion of ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/ inode_is_locked()/inode_lock_nested(). To quote Linus back then: ----- \| This is an automated patch using \| \| sed 's/mutex_lock(&\(.\)->i_mutex)/inode_lock(\1)/' \| sed 's/mutex_unlock(&\(.\)->i_mutex)/inode_unlock(\1)/' \| sed 's/mutex_lock_nested(&\(.\)->i_mutex,[ ]I_MUTEX_\([A-Z0-9_]\))/inode_lock_nested(\1, I_MUTEX_\2)/' \| sed 's/mutex_is_locked(&\(.\)->i_mutex)/inode_is_locked(\1)/' \| sed 's/mutex_trylock(&\(.\)->i_mutex)/inode_trylock(\1)/' \| \| with a very few manual fixups ----- I'm going to send that once the ->i_mutex-affecting stuff in -next gets mostly merged (or when Linus says he's about to stop taking merges)" 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) nfsd: don't hold i_mutex over userspace upcalls fs:affs:Replace time_t with time64_t fs/9p: use fscache mutex rather than spinlock proc: add a reschedule point in proc_readfd_common() logfs: constify logfs_block_ops structures fcntl: allow to set O_DIRECT flag on pipe fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE fs: xattr: Use kvfree() [s390] page_to_phys() always returns a multiple of PAGE_SIZE nbd: use ->compat_ioctl() fs: use block_device name vsprintf helper lib/vsprintf: add %*pg format specifier fs: use gendisk->disk_name where possible poll: plug an unused argument to do_poll amdkfd: don't open-code memdup_user() cdrom: don't open-code memdup_user() rsxx: don't open-code memdup_user() mtip32xx: don't open-code memdup_user() [um] mconsole: don't open-code memdup_user_nul() [um] hostaudio: don't open-code memdup_user() ...	2016-01-12 17:11:47 -08:00
Rabin Vincent	229394e8e6	net: bpf: reject invalid shifts On ARM64, a BUG() is triggered in the eBPF JIT if a filter with a constant shift that can't be encoded in the immediate field of the UBFM/SBFM instructions is passed to the JIT. Since these shifts amounts, which are negative or >= regsize, are invalid, reject them in the eBPF verifier and the classic BPF filter checker, for all architectures. Signed-off-by: Rabin Vincent <rabin@rab.in> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-12 17:06:53 -05:00
Matti Vaittinen	ccdf6ce6a8	net: netlink: Fix multicast group storage allocation for families with more than one groups Multicast groups are stored in global buffer. Check for needed buffer size incorrectly compares buffer size to first id for family. This means that for families with more than one mcast id one may allocate too small buffer and end up writing rest of the groups to some unallocated memory. Fix the buffer size check to compare allocated space to last mcast id for the family. Tested on ARM using kernel 3.14 Signed-off-by: Matti Vaittinen <matti.vaittinen@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-12 16:40:15 -05:00
Rabin Vincent	06928b3870	net: bpf: reject invalid shifts On ARM64, a BUG() is triggered in the eBPF JIT if a filter with a constant shift that can't be encoded in the immediate field of the UBFM/SBFM instructions is passed to the JIT. Since these shifts amounts, which are negative or >= regsize, are invalid, reject them in the eBPF verifier and the classic BPF filter checker, for all architectures. Signed-off-by: Rabin Vincent <rabin@rab.in> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-12 15:55:39 -05:00
Eric Dumazet	7aaed57c5c	phonet: properly unshare skbs in phonet_rcv() Ivaylo Dimitrov reported a regression caused by commit `7866a62104` ("dev: add per net_device packet type chains"). skb->dev becomes NULL and we crash in __netif_receive_skb_core(). Before above commit, different kind of bugs or corruptions could happen without major crash. But the root cause is that phonet_rcv() can queue skb without checking if skb is shared or not. Many thanks to Ivaylo Dimitrov for his help, diagnosis and tests. Reported-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> Tested-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Remi Denis-Courmont <courmisch@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-12 12:05:38 -05:00
David S. Miller	9d367eddf3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/bonding/bond_main.c drivers/net/ethernet/mellanox/mlxsw/spectrum.h drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c The bond_main.c and mellanox switch conflicts were cases of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 23:55:43 -05:00
Michal Kubeček	40ba330227	udp: disallow UFO for sockets with SO_NO_CHECK option Commit `acf8dd0a9d` ("udp: only allow UFO for packets from SOCK_DGRAM sockets") disallows UFO for packets sent from raw sockets. We need to do the same also for SOCK_DGRAM sockets with SO_NO_CHECK options, even if for a bit different reason: while such socket would override the CHECKSUM_PARTIAL set by ip_ufo_append_data(), gso_size is still set and bad offloading flags warning is triggered in __skb_gso_segment(). In the IPv6 case, SO_NO_CHECK option is ignored but we need to disallow UFO for packets sent by sockets with UDP_NO_CHECK6_TX option. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Tested-by: Shannon Nelson <shannon.nelson@intel.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 17:40:57 -05:00
John Fastabend	3de03596df	net: pktgen: fix null ptr deref in skb allocation Fix possible null pointer dereference that may occur when calling skb_reserve() on a null skb. Fixes: `879c7220e8` ("net: pktgen: Observe needed_headroom of the device") Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 17:39:44 -05:00
Daniel Borkmann	c6c3345407	bpf: support ipv6 for bpf_skb_{set,get}_tunnel_key After IPv6 support has recently been added to metadata dst and related encaps, add support for populating/reading it from an eBPF program. Commit `d3aa45ce6b` ("bpf: add helpers to access tunnel metadata") started with initial IPv4-only support back then (due to IPv6 metadata support not being available yet). To stay compatible with older programs, we need to test for the passed structure size. Also TOS and TTL support from the ip_tunnel_info key has been added. Tested with vxlan devs in collect meta data mode with IPv4, IPv6 and in compat mode over different network namespaces. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 17:32:55 -05:00
Daniel Borkmann	781c53bc5d	bpf: export helper function flags and reject invalid ones Export flags used by eBPF helper functions through UAPI, so they can be used by programs (instead of them redefining all flags each time or just using the hard-coded values). It also gives a better overview what flags are used where and we can further get rid of the extra macros defined in filter.c. Moreover, reject invalid flags. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 17:32:55 -05:00
Jamal Hadi Salim	66530bdf85	sched,cls_flower: set key address type when present only when user space passes the addresses should we consider their presence Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 17:27:30 -05:00
Neal Cardwell	83d15e70c4	tcp_yeah: don't set ssthresh below 2 For tcp_yeah, use an ssthresh floor of 2, the same floor used by Reno and CUBIC, per RFC 5681 (equation 4). tcp_yeah_ssthresh() was sometimes returning a 0 or negative ssthresh value if the intended reduction is as big or bigger than the current cwnd. Congestion control modules should never return a zero or negative ssthresh. A zero ssthresh generally results in a zero cwnd, causing the connection to stall. A negative ssthresh value will be interpreted as a u32 and will set a target cwnd for PRR near 4 billion. Oleksandr Natalenko reported that a system using tcp_yeah with ECN could see a warning about a prior_cwnd of 0 in tcp_cwnd_reduction(). Testing verified that this was due to tcp_yeah_ssthresh() misbehaving in this way. Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 17:25:16 -05:00
Marcelo Ricardo Leitner	649621e3d5	sctp: fix use-after-free in pr_debug statement Dmitry Vyukov reported a use-after-free in the code expanded by the macro debug_post_sfx, which is caused by the use of the asoc pointer after it was freed within sctp_side_effect() scope. This patch fixes it by allowing sctp_side_effect to clear that asoc pointer when the TCB is freed. As Vlad explained, we also have to cover the SCTP_DISPOSITION_ABORT case because it will trigger DELETE_TCB too on that same loop. Also, there were places issuing SCTP_CMD_INIT_FAILED and ASSOC_FAILED but returning SCTP_DISPOSITION_CONSUME, which would fool the scheme above. Fix it by returning SCTP_DISPOSITION_ABORT instead. The macro is already prepared to handle such NULL pointer. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 17:13:01 -05:00
Alexander Kuleshov	617cfc7530	net/rtnetlink: remove unused sz_idx variable The sz_idx variable is defined in the rtnetlink_rcv_msg(), but not used anywhere. Let's remove it. Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 00:22:20 -05:00
willy tarreau	712f4aad40	unix: properly account for FDs passed over unix sockets It is possible for a process to allocate and accumulate far more FDs than the process' limit by sending them over a unix socket then closing them to keep the process' fd count low. This change addresses this problem by keeping track of the number of FDs in flight per user and preventing non-privileged processes from having more FDs in flight than their configured FD limit. Reported-by: socketpair@gmail.com Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Mitigates: CVE-2013-4312 (Linux 2.0+) Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 00:05:30 -05:00
Jean Sacren	c5420eb12f	openvswitch: update kernel doc for struct vport commit `be4ace6e6b` ("openvswitch: Move dev pointer into vport itself") The commit above added @dev and moved @rcu to the bottom of struct vport, but the change was not reflected in the kernel doc. So let's update the kernel doc as well. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Cc: Thomas Graf <tgraf@suug.ch> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-10 23:49:21 -05:00
Jean Sacren	2f7066ada1	openvswitch: fix struct geneve_port member name commit `6b001e682e` ("openvswitch: Use Geneve device.") The commit above introduced 'port_no' as the name for the member of struct geneve_port. The correct name should be 'dst_port' as described in the kernel doc. Let's fix that member name and all the pertinent instances so that both doc and code would be consistent. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-10 23:49:21 -05:00
Jean Sacren	5ea030429f	openvswitch: clean up unused function commit `6b001e682e` ("openvswitch: Use Geneve device.") The commit above deleted the only call site of ovs_tunnel_route_lookup() and now that function is not used any more. So let's delete the function definition as well. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-10 23:49:21 -05:00
Eric Dumazet	3e4006f0b8	ipv6: tcp: add rcu locking in tcp_v6_send_synack() When first SYNACK is sent, we already hold rcu_read_lock(), but this is not true if a SYNACK is retransmitted, as a timer (soft) interrupt does not hold rcu_read_lock() Fixes: `45f6fad84c` ("ipv6: add complete rcu protection around np->opt") Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-10 22:58:03 -05:00
Eric Dumazet	a78cb84c62	net: add scheduling point in recvmmsg/sendmmsg Applications often have to reduce number of datagrams they receive or send per system call to avoid starvation problems. Really the kernel should take care of this by using cond_resched(), so that applications can experiment bigger batch sizes. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-10 22:56:29 -05:00
Lubomir Rintel	3d171f3907	ipv6: always add flag an address that failed DAD with DADFAILED The userspace needs to know why is the address being removed so that it can perhaps obtain a new address. Without the DADFAILED flag it's impossible to distinguish removal of a temporary and tentative address due to DAD failure from other reasons (device removed, manual address removal). Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-10 22:54:27 -05:00
Daniel Borkmann	1f211a1b92	net, sched: add clsact qdisc This work adds a generalization of the ingress qdisc as a qdisc holding only classifiers. The clsact qdisc works on ingress, but also on egress. In both cases, it's execution happens without taking the qdisc lock, and the main difference for the egress part compared to prior version of [1] is that this can be applied with _any_ underlying real egress qdisc (also classless ones). Besides solving the use-case of [1], that is, allowing for more programmability on assigning skb->priority for the mqprio case that is supported by most popular 10G+ NICs, it also opens up a lot more flexibility for other tc applications. The main work on classification can already be done at clsact egress time if the use-case allows and state stored for later retrieval f.e. again in skb->priority with major/minors (which is checked by most classful qdiscs before consulting tc_classify()) and/or in other skb fields like skb->tc_index for some light-weight post-processing to get to the eventual classid in case of a classful qdisc. Another use case is that the clsact egress part allows to have a central egress counterpart to the ingress classifiers, so that classifiers can easily share state (e.g. in cls_bpf via eBPF maps) for ingress and egress. Currently, default setups like mq + pfifo_fast would require for this to use, for example, prio qdisc instead (to get a tc_classify() run) and to duplicate the egress classifier for each queue. With clsact, it allows for leaving the setup as is, it can additionally assign skb->priority to put the skb in one of pfifo_fast's bands and it can share state with maps. Moreover, we can access the skb's dst entry (f.e. to retrieve tclassid) w/o the need to perform a skb_dst_force() to hold on to it any longer. In lwt case, we can also use this facility to setup dst metadata via cls_bpf (bpf_skb_set_tunnel_key()) without needing a real egress qdisc just for that (case of IFF_NO_QUEUE devices, for example). The realization can be done without any changes to the scheduler core framework. All it takes is that we have two a-priori defined minors/child classes, where we can mux between ingress and egress classifier list (dev->ingress_cl_list and dev->egress_cl_list, latter stored close to dev->_tx to avoid extra cacheline miss for moderate loads). The egress part is a bit similar modelled to handle_ing() and patched to a noop in case the functionality is not used. Both handlers are now called sch_handle_ingress() and sch_handle_egress(), code sharing among the two doesn't seem practical as there are various minor differences in both paths, so that making them conditional in a single handler would rather slow things down. Full compatibility to ingress qdisc is provided as well. Since both piggyback on TC_H_CLSACT, only one of them (ingress/clsact) can exist per netdevice, and thus ingress qdisc specific behaviour can be retained for user space. This means, either a user does 'tc qdisc add dev foo ingress' and configures ingress qdisc as usual, or the 'tc qdisc add dev foo clsact' alternative, where both, ingress and egress classifier can be configured as in the below example. ingress qdisc supports attaching classifier to any minor number whereas clsact has two fixed minors for muxing between the lists, therefore to not break user space setups, they are better done as two separate qdiscs. I decided to extend the sch_ingress module with clsact functionality so that commonly used code can be reused, the module is being aliased with sch_clsact so that it can be auto-loaded properly. Alternative would have been to add a flag when initializing ingress to alter its behaviour plus aliasing to a different name (as it's more than just ingress). However, the first would end up, based on the flag, choosing the new/old behaviour by calling different function implementations to handle each anyway, the latter would require to register ingress qdisc once again under different alias. So, this really begs to provide a minimal, cleaner approach to have Qdisc_ops and Qdisc_class_ops by its own that share callbacks used by both. Example, adding qdisc: # tc qdisc add dev foo clsact # tc qdisc show dev foo qdisc mq 0: root qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc clsact ffff: parent ffff:fff1 Adding filters (deleting, etc works analogous by specifying ingress/egress): # tc filter add dev foo ingress bpf da obj bar.o sec ingress # tc filter add dev foo egress bpf da obj bar.o sec egress # tc filter show dev foo ingress filter protocol all pref 49152 bpf filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action # tc filter show dev foo egress filter protocol all pref 49152 bpf filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action A 'tc filter show dev foo' or 'tc filter show dev foo parent ffff:' will show an empty list for clsact. Either using the parent names (ingress/egress) or specifying the full major/minor will then show the related filter lists. Prior work on a mqprio prequeue() facility [1] was done mainly by John Fastabend. [1] http://patchwork.ozlabs.org/patch/512949/ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-10 22:13:15 -05:00
Sasha Levin	320f1a4a17	net: sctp: prevent writes to cookie_hmac_alg from accessing invalid memory proc_dostring() needs an initialized destination string, while the one provided in proc_sctp_do_hmac_alg() contains stack garbage. Thus, writing to cookie_hmac_alg would strlen() that garbage and end up accessing invalid memory. Fixes: `3c68198e7` ("sctp: Make hmac algorithm selection for cookie generation dynamic") Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-10 18:01:01 -05:00
Daniel Borkmann	f8ffad69c9	bpf: add skb_postpush_rcsum and fix dev_forward_skb occasions Add a small helper skb_postpush_rcsum() and fix up redirect locations that need CHECKSUM_COMPLETE fixups on ingress. dev_forward_skb() expects a proper csum that covers also Ethernet header, f.e. since `2c26d34bbc` ("net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding"), we also do skb_postpull_rcsum() after pulling Ethernet header off via eth_type_trans(). When using eBPF in a netns setup f.e. with vxlan in collect metadata mode, I can trigger the following csum issue with an IPv6 setup: [ 505.144065] dummy1: hw csum failure [...] [ 505.144108] Call Trace: [ 505.144112] <IRQ> [<ffffffff81372f08>] dump_stack+0x44/0x5c [ 505.144134] [<ffffffff81607cea>] netdev_rx_csum_fault+0x3a/0x40 [ 505.144142] [<ffffffff815fee3f>] __skb_checksum_complete+0xcf/0xe0 [ 505.144149] [<ffffffff816f0902>] nf_ip6_checksum+0xb2/0x120 [ 505.144161] [<ffffffffa08c0e0e>] icmpv6_error+0x17e/0x328 [nf_conntrack_ipv6] [ 505.144170] [<ffffffffa0898eca>] ? ip6t_do_table+0x2fa/0x645 [ip6_tables] [ 505.144177] [<ffffffffa08c0725>] ? ipv6_get_l4proto+0x65/0xd0 [nf_conntrack_ipv6] [ 505.144189] [<ffffffffa06c9a12>] nf_conntrack_in+0xc2/0x5a0 [nf_conntrack] [ 505.144196] [<ffffffffa08c039c>] ipv6_conntrack_in+0x1c/0x20 [nf_conntrack_ipv6] [ 505.144204] [<ffffffff8164385d>] nf_iterate+0x5d/0x70 [ 505.144210] [<ffffffff816438d6>] nf_hook_slow+0x66/0xc0 [ 505.144218] [<ffffffff816bd302>] ipv6_rcv+0x3f2/0x4f0 [ 505.144225] [<ffffffff816bca40>] ? ip6_make_skb+0x1b0/0x1b0 [ 505.144232] [<ffffffff8160b77b>] __netif_receive_skb_core+0x36b/0x9a0 [ 505.144239] [<ffffffff8160bdc8>] ? __netif_receive_skb+0x18/0x60 [ 505.144245] [<ffffffff8160bdc8>] __netif_receive_skb+0x18/0x60 [ 505.144252] [<ffffffff8160ccff>] process_backlog+0x9f/0x140 [ 505.144259] [<ffffffff8160c4a5>] net_rx_action+0x145/0x320 [...] What happens is that on ingress, we push Ethernet header back in, either from cls_bpf or right before skb_do_redirect(), but without updating csum. The "hw csum failure" can be fixed by using the new skb_postpush_rcsum() helper for the dev_forward_skb() case to correct the csum diff again. Thanks to Hannes Frederic Sowa for the csum_partial() idea! Fixes: `3896d655f4` ("bpf: introduce bpf_clone_redirect() helper") Fixes: `27b29f6305` ("bpf: add bpf_redirect() helper") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-10 17:54:28 -05:00
Daniel Borkmann	fdc5432a7b	net, sched: add skb_at_tc_ingress helper Add a skb_at_tc_ingress() as this will be needed elsewhere as well and can hide the ugly ifdef. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-10 17:54:28 -05:00
Nikolay Borisov	b840d15d39	ipv4: Namespecify the tcp_keepalive_intvl sysctl knob This is the final part required to namespaceify the tcp keep alive mechanism. Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-10 17:32:09 -05:00
Nikolay Borisov	9bd6861bd4	ipv4: Namespecify tcp_keepalive_probes sysctl knob This is required to have full tcp keepalive mechanism namespace support. Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-10 17:32:09 -05:00

1 2 3 4 5 ...

40561 Commits