linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-30 16:13:54 +08:00

Author	SHA1	Message	Date
Pablo Neira Ayuso	876665eafc	netfilter: nft_chain_nat_ipv6: use generic IPv6 NAT code from core Use the exported IPv6 NAT functions that are provided by the core. This removes duplicated code so iptables and nft use the same NAT codebase. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:09 +02:00
Pablo Neira Ayuso	2a5538e9aa	netfilter: nat: move specific NAT IPv6 to core Move the specific NAT IPv6 core functions that are called from the hooks from ip6table_nat.c to nf_nat_l3proto_ipv6.c. This prepares the ground to allow iptables and nft to use the same NAT engine code that comes in a follow up patch. This also renames nf_nat_ipv6_fn to nft_nat_ipv6_fn in net/ipv6/netfilter/nft_chain_nat_ipv6.c to avoid a compilation breakage. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:30:00 +02:00
David S. Miller	5b4c314575	Merge tag 'master-2014-09-08' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== pull request: wireless-next 2014-09-08 Please pull this batch of updates intended for the 3.18 stream... For the mac80211 bits, Johannes says: "Not that much content this time. Some RCU cleanups, crypto performance improvements, and various patches all over, rather than listing them one might as well look into the git log instead." For the Bluetooth bits, Gustavo says: "The changes consists of: - Coding style fixes to HCI drivers - Corrupted ack value fix for the H5 HCI driver - A couple of Enhanced L2CAP fixes - Conversion of SMP code to use common L2CAP channel API - Page scan optimizations when using the kernel-side whitelist - Various mac802154 and and ieee802154 6lowpan cleanups - One new Atheros USB ID" For the iwlwifi bits, Emmanuel says: "We have a new big thing coming up which is called Dynamic Queue Allocation (or DQA). This is a completely new way to work with the Tx queues and it requires major refactoring. This is being done by Johannes and Avri. Besides this, Johannes disables U-APSD by default because of APs that would disable A-MPDU if the association supports U-ASPD. Luca contributed to the power area which he was cleaning up on the way while working on CSA. A few more random things here and there." For the Atheros bits, Kalle says: "For ath6kl we had two small fixes and a new SDIO device id. For ath10k the bigger changes are: * support for new firmware version 10.2 (Michal) * spectral scan support (Simon, Sven & Mathias) * export a firmware crash dump file (Ben & me) * cleaning up of pci.c (Michal) * print pci id in all messages, which causes most of the churn (Michal)" Beyond that, we have the usual collection of various updates to ath9k, b43, mwifiex, and wil6210, as well as a few other bits here and there. Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-08 16:43:58 -07:00
Willem de Bruijn	a7f26b7e1e	inet: remove dead inetpeer sequence code inetpeer sequence numbers are no longer incremented, so no need to check and flush the tree. The function that increments the sequence number was already dead code and removed in in "ipv4: remove unused function" (`068a6e18`). Remove the code that checks for a change, too. Verifying that v4_seq and v6_seq are never incremented and thus that flush_check compares bp->flush_seq to 0 is trivial. The second part of the change removes flush_check completely even though bp->flush_seq is exactly !0 once, at initialization. This change is correct because the time this branch is true is when bp->root == peer_avl_empty_rcu, in which the branch and inetpeer_invalidate_tree are a NOOP. Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-08 16:42:42 -07:00
Tom Herbert	1e701f1698	net: Fix GRE RX to use skb_transport_header for GRE header offset GRE assumes that the GRE header is at skb_network_header + ip_hrdlen(skb). It is more general to use skb_transport_header and this allows the possbility of inserting additional header between IP and GRE (which is what we will done in Generic UDP Encapsulation for GRE). Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-08 15:23:05 -07:00
John W. Linville	61a3d4f9d5	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless	2014-09-08 11:14:56 -04:00
David S. Miller	eb84d6b604	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2014-09-07 21:41:53 -07:00
David S. Miller	45ce829dd0	Merge tag 'master-2014-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-09-05 Please pull this batch of fixes intended for the 3.17 stream... For the mac80211 bits, Johannes says: "Here are a few fixes for mac80211. One has been discussed for a while and adds a terminating NUL-byte to the alpha2 sent to userspace, which shouldn't be necessary but since many places treat it as a string we couldn't move to just sending two bytes. In addition to that, we have two VLAN fixes from Felix, a mesh fix, a fix for the recently introduced RX aggregation offload, a revert for a broken patch (that luckily didn't really cause any harm) and a small fix for alignment in debugfs." For the iwlwifi bits, Emmanuel says: "I revert a patch that disabled CTS to self in dvm because users reported issues. The revert is CCed to stable since the offending patch was sent to stable too. I also bump the firmware API versions since a new firmware is coming up. On top of that, Marcel fixes a bug I introduced while fixing a bug in our Kconfig file." Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-07 16:11:10 -07:00
WANG Cong	de185ab46c	ipv6: restore the behavior of ipv6_sock_ac_drop() It is possible that the interface is already gone after joining the list of anycast on this interface as we don't hold a refcount for the device, in this case we are safe to ignore the error. What's more important, for API compatibility we should not change this behavior for applications even if it were correct. Fixes: commit `a9ed4a2986` ("ipv6: fix rtnl locking in setsockopt for anycast and multicast") Cc: Sabrina Dubroca <sd@queasysnail.net> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-07 16:10:07 -07:00
Andy Shevchenko	13aa3463e5	rose: use %ph specifier Instead of dereference each byte let's use %ph specifier in the printk() calls. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-07 16:07:25 -07:00
Neal Cardwell	87d943085b	tcp: remove obsolete comment about TCP_SKB_CB(skb)->when in tcp_fragment() The TCP_SKB_CB(skb)->when field no longer exists as of recent change `7faee5c0d5` ("tcp: remove TCP_SKB_CB(skb)->when"). And in any case, tcp_fragment() is called on already-transmitted packets from the __tcp_retransmit_skb() call site, so copying timestamps of any kind in this spot is quite sensible. Signed-off-by: Neal Cardwell <ncardwell@google.com> Reported-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-06 12:29:10 -07:00
Eric Dumazet	7faee5c0d5	tcp: remove TCP_SKB_CB(skb)->when After commit `740b0f1841` ("tcp: switch rtt estimations to usec resolution"), we no longer need to maintain timestamps in two different fields. TCP_SKB_CB(skb)->when can be removed, as same information sits in skb_mstamp.stamp_jiffies Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:49:33 -07:00
Eric Dumazet	04317dafd1	tcp: introduce TCP_SKB_CB(skb)->tcp_tw_isn TCP_SKB_CB(skb)->when has different meaning in output and input paths. In output path, it contains a timestamp. In input path, it contains an ISN, chosen by tcp_timewait_state_process() Lets add a different name to ease code comprehension. Note that 'when' field will disappear in following patch, as skb_mstamp already contains timestamp, the anonymous union will promptly disappear as well. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:49:33 -07:00
Alexander Duyck	56193d1bce	net: Add function for parsing the header length out of linear ethernet frames This patch updates some of the flow_dissector api so that it can be used to parse the length of ethernet buffers stored in fragments. Most of the changes needed were to __skb_get_poff as it needed to be updated to support sending a linear buffer instead of a skb. I have split __skb_get_poff into two functions, the first is skb_get_poff and it retains the functionality of the original __skb_get_poff. The other function is __skb_get_poff which now works much like __skb_flow_dissect in relation to skb_flow_dissect in that it provides the same functionality but works with just a data buffer and hlen instead of needing an skb. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:47:02 -07:00
Alexander Duyck	82eabd9eb2	net: merge cases where sock_efree and sock_edemux are the same function Since sock_efree and sock_demux are essentially the same code for non-TCP sockets and the case where CONFIG_INET is not defined we can combine the code or replace the call to sock_edemux in several spots. As a result we can avoid a bit of unnecessary code or code duplication. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:43:45 -07:00
Alexander Duyck	62bccb8cdb	net-timestamp: Make the clone operation stand-alone from phy timestamping The phy timestamping takes a different path than the regular timestamping does in that it will create a clone first so that the packets needing to be timestamped can be placed in a queue, or the context block could be used. In order to support these use cases I am pulling the core of the code out so it can be used in other drivers beyond just phy devices. In addition I have added a destructor named sock_efree which is meant to provide a simple way for dropping the reference to skb exceptions that aren't part of either the receive or send windows for the socket, and I have removed some duplication in spots where this destructor could be used in place of sock_edemux. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:43:45 -07:00
Alexander Duyck	37846ef018	net-timestamp: Merge shared code between phy and regular timestamping This change merges the shared bits that exist between skb_tx_tstamp and skb_complete_tx_timestamp. By doing this we can avoid the two diverging as there were already changes pushed into skb_tx_tstamp that hadn't made it into the other function. In addition this resolves issues with the fact that skb_complete_tx_timestamp was included in linux/skbuff.h even though it was only compiled in if phy timestamping was enabled. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:43:45 -07:00
Eric Dumazet	d546c62154	ipv4: harden fnhe_hashfun() Lets make this hash function a bit secure, as ICMP attacks are still in the wild. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:40:33 -07:00
Masanari Iida	e793c0f70e	net: treewide: Fix typo found in DocBook/networking.xml This patch fix spelling typo found in DocBook/networking.xml. It is because the neworking.xml is generated from comments in the source, I have to fix typo in comments within the source. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:35:28 -07:00
Pablo Neira Ayuso	84a59ca55f	netfilter: add explicit Kconfig for NETFILTER_XT_NAT Paul Bolle reports that 'select NETFILTER_XT_NAT' from the IPV4 and IPV6 NAT tables becomes noop since there is no Kconfig switch for it. Add the Kconfig switch to resolve this problem. Fixes: `8993cf8` netfilter: move NAT Kconfig switches out of the iptables scope Reported-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:23:31 -07:00
Eric Dumazet	caa415270c	ipv4: fix a race in update_or_create_fnhe() nh_exceptions is effectively used under rcu, but lacks proper barriers. Between kzalloc() and setting of nh->nh_exceptions(), we need a proper memory barrier. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `4895c771c7` ("ipv4: Add FIB nexthop exceptions.") Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:15:50 -07:00
Nicolas Dichtel	e7478dfc46	ipv6: use addrconf_get_prefix_route() to remove peer addr addrconf_get_prefix_route() ensures to get the right route in the right table. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:13:24 -07:00
Nicolas Dichtel	f24062b07d	ipv6: fix a refcnt leak with peer addr There is no reason to take a refcnt before deleting the peer address route. It's done some lines below for the local prefix route because inet6_ifa_finish_destroy() will release it at the end. For the peer address route, we want to free it right now. This bug has been introduced by commit `caeaba7900` ("ipv6: add support of peer address"). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:13:24 -07:00
Andy Zhou	29abe2fda5	l2tp: fix missing line continuation This syntax error was covered by L2TP_REFCNT_DEBUG not being set by default. Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 15:19:53 -07:00
Willem de Bruijn	c199105d15	net-timestamp: only report sw timestamp if reporting bit is set The timestamping API has separate bits for generating and reporting timestamps. A software timestamp should only be reported for a packet when the packet has the relevant generation flag (SKBTX_..) set and the socket has reporting bit SOF_TIMESTAMPING_SOFTWARE set. The second check was accidentally removed. Reinstitute the original behavior. Tested: Without this patch, Documentation/networking/txtimestamp reports timestamps regardless of whether SOF_TIMESTAMPING_SOFTWARE is set. After the patch, it only reports them when the flag is set. Fixes: `f24b9be595` ("net-timestamp: extend SCM_TIMESTAMPING ancillary data struct") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 15:02:43 -07:00
Guillaume Nault	eed4d839b0	l2tp: fix race while getting PMTU on PPP pseudo-wire Use dst_entry held by sk_dst_get() to retrieve tunnel's PMTU. The dst_mtu(__sk_dst_get(tunnel->sock)) call was racy. __sk_dst_get() could return NULL if tunnel->sock->sk_dst_cache was reset just before the call, thus making dst_mtu() dereference a NULL pointer: [ 1937.661598] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [ 1937.664005] IP: [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp] [ 1937.664005] PGD daf0c067 PUD d9f93067 PMD 0 [ 1937.664005] Oops: 0000 [#1] SMP [ 1937.664005] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables udp_tunnel pppoe pppox ppp_generic slhc deflate ctr twofish_generic twofish_x86_64_3way xts lrw gf128mul glue_helper twofish_x86_64 twofish_common blowfish_generic blowfish_x86_64 blowfish_common des_generic cbc xcbc rmd160 sha512_generic hmac crypto_null af_key xfrm_algo 8021q garp bridge stp llc tun atmtcp clip atm ext3 mbcache jbd iTCO_wdt coretemp kvm_intel iTCO_vendor_support kvm pcspkr evdev ehci_pci lpc_ich mfd_core i5400_edac edac_core i5k_amb shpchp button processor thermal_sys xfs crc32c_generic libcrc32c dm_mod usbhid sg hid sr_mod sd_mod cdrom crc_t10dif crct10dif_common ata_generic ahci ata_piix tg3 libahci libata uhci_hcd ptp ehci_hcd pps_core usbcore scsi_mod libphy usb_common [last unloaded: l2tp_core] [ 1937.664005] CPU: 0 PID: 10022 Comm: l2tpstress Tainted: G O 3.17.0-rc1 #1 [ 1937.664005] Hardware name: HP ProLiant DL160 G5, BIOS O12 08/22/2008 [ 1937.664005] task: ffff8800d8fda790 ti: ffff8800c43c4000 task.ti: ffff8800c43c4000 [ 1937.664005] RIP: 0010:[<ffffffffa049db88>] [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp] [ 1937.664005] RSP: 0018:ffff8800c43c7de8 EFLAGS: 00010282 [ 1937.664005] RAX: ffff8800da8a7240 RBX: ffff8800d8c64600 RCX: 000001c325a137b5 [ 1937.664005] RDX: 8c6318c6318c6320 RSI: 000000000000010c RDI: 0000000000000000 [ 1937.664005] RBP: ffff8800c43c7ea8 R08: 0000000000000000 R09: 0000000000000000 [ 1937.664005] R10: ffffffffa048e2c0 R11: ffff8800d8c64600 R12: ffff8800ca7a5000 [ 1937.664005] R13: ffff8800c439bf40 R14: 000000000000000c R15: 0000000000000009 [ 1937.664005] FS: 00007fd7f610f700(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000 [ 1937.664005] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1937.664005] CR2: 0000000000000020 CR3: 00000000d9d75000 CR4: 00000000000027e0 [ 1937.664005] Stack: [ 1937.664005] ffffffffa049da80 ffff8800d8fda790 000000000000005b ffff880000000009 [ 1937.664005] ffff8800daf3f200 0000000000000003 ffff8800c43c7e48 ffffffff81109b57 [ 1937.664005] ffffffff81109b0e ffffffff8114c566 0000000000000000 0000000000000000 [ 1937.664005] Call Trace: [ 1937.664005] [<ffffffffa049da80>] ? pppol2tp_connect+0x235/0x41e [l2tp_ppp] [ 1937.664005] [<ffffffff81109b57>] ? might_fault+0x9e/0xa5 [ 1937.664005] [<ffffffff81109b0e>] ? might_fault+0x55/0xa5 [ 1937.664005] [<ffffffff8114c566>] ? rcu_read_unlock+0x1c/0x26 [ 1937.664005] [<ffffffff81309196>] SYSC_connect+0x87/0xb1 [ 1937.664005] [<ffffffff813e56f7>] ? sysret_check+0x1b/0x56 [ 1937.664005] [<ffffffff8107590d>] ? trace_hardirqs_on_caller+0x145/0x1a1 [ 1937.664005] [<ffffffff81213dee>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 1937.664005] [<ffffffff8114c262>] ? spin_lock+0x9/0xb [ 1937.664005] [<ffffffff813092b4>] SyS_connect+0x9/0xb [ 1937.664005] [<ffffffff813e56d2>] system_call_fastpath+0x16/0x1b [ 1937.664005] Code: 10 2a 84 81 e8 65 76 bd e0 65 ff 0c 25 10 bb 00 00 4d 85 ed 74 37 48 8b 85 60 ff ff ff 48 8b 80 88 01 00 00 48 8b b8 10 02 00 00 <48> 8b 47 20 ff 50 20 85 c0 74 0f 83 e8 28 89 83 10 01 00 00 89 [ 1937.664005] RIP [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp] [ 1937.664005] RSP <ffff8800c43c7de8> [ 1937.664005] CR2: 0000000000000020 [ 1939.559375] ---[ end trace 82d44500f28f8708 ]--- Fixes: `f34c4a35d8` ("l2tp: take PMTU from tunnel UDP socket") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 14:40:18 -07:00
Govindarajulu Varadarajan	f0db9b0734	ethtool: Add generic options for tunables This patch adds new ethtool cmd, ETHTOOL_GTUNABLE & ETHTOOL_STUNABLE for getting tunable values from driver. Add get_tunable and set_tunable to ethtool_ops. Driver implements these functions for getting/setting tunable value. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 12:12:20 -07:00
Daniel Borkmann	e020836d95	dev_ioctl: remove dev_load() CAP_SYS_MODULE message Marcel reported to see the following message when autoloading is being triggered when adding nlmon device: Loading kernel module for a network device with CAP_SYS_MODULE (deprecated). Use CAP_NET_ADMIN and alias netdev-nlmon instead. This false-positive happens despite with having correct capabilities set, e.g. through issuing `ip link del dev nlmon` more than once on a valid device with name nlmon, but Marcel has also seen it on creation time when no nlmon module is previously compiled-in or loaded as module and the device name equals a link type name (e.g. nlmon, vxlan, team). Stephen says: The netdev module alias is a hold over from the past. For normal devices, people used to create a alias eth0 to and point it to the type of network device used, that was back in the bad old ISA days before real discovery. Also, the tunnels create module alias for the control device and ip used to use this to autoload the tunnel device. The message is bogus and should just be removed, I also see it in a couple of other cases where tap devices are renamed for other usese. As mentioned in `8909c9ad8f` ("net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules"), we nevertheless still might want to leave the old autoloading behaviour in place as it could break old scripts, so for now, lets just remove the log message as Stephen suggests. Reference: http://thread.gmane.org/gmane.linux.kernel/1105168 Reported-by: Marcel Holtmann <marcel@holtmann.org> Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 12:04:40 -07:00
Daniel Borkmann	60a3b2253c	net: bpf: make eBPF interpreter images read-only With eBPF getting more extended and exposure to user space is on it's way, hardening the memory range the interpreter uses to steer its command flow seems appropriate. This patch moves the to be interpreted bytecode to read-only pages. In case we execute a corrupted BPF interpreter image for some reason e.g. caused by an attacker which got past a verifier stage, it would not only provide arbitrary read/write memory access but arbitrary function calls as well. After setting up the BPF interpreter image, its contents do not change until destruction time, thus we can setup the image on immutable made pages in order to mitigate modifications to that code. The idea is derived from commit `314beb9bca` ("x86: bpf_jit_comp: secure bpf jit against spraying attacks"). This is possible because bpf_prog is not part of sk_filter anymore. After setup bpf_prog cannot be altered during its life-time. This prevents any modifications to the entire bpf_prog structure (incl. function/JIT image pointer). Every eBPF program (including classic BPF that are migrated) have to call bpf_prog_select_runtime() to select either interpreter or a JIT image as a last setup step, and they all are being freed via bpf_prog_free(), including non-JIT. Therefore, we can easily integrate this into the eBPF life-time, plus since we directly allocate a bpf_prog, we have no performance penalty. Tested with seccomp and test_bpf testsuite in JIT/non-JIT mode and manual inspection of kernel_page_tables. Brad Spengler proposed the same idea via Twitter during development of this patch. Joint work with Hannes Frederic Sowa. Suggested-by: Brad Spengler <spender@grsecurity.net> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 12:02:48 -07:00
Sabrina Dubroca	a9ed4a2986	ipv6: fix rtnl locking in setsockopt for anycast and multicast Calling setsockopt with IPV6_JOIN_ANYCAST or IPV6_LEAVE_ANYCAST triggers the assertion in addrconf_join_solict()/addrconf_leave_solict() ipv6_sock_ac_join(), ipv6_sock_ac_drop(), ipv6_sock_ac_close() need to take RTNL before calling ipv6_dev_ac_inc/dec. Same thing with ipv6_sock_mc_join(), ipv6_sock_mc_drop(), ipv6_sock_mc_close() before calling ipv6_dev_mc_inc/dec. This patch moves ASSERT_RTNL() up a level in the call stack. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reported-by: Tommi Rantala <tt.rantala@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 11:52:28 -07:00
Hannes Frederic Sowa	a9fe8e2994	ipv4: implement igmp_qrv sysctl to tune igmp robustness variable As in IPv6 people might increase the igmp query robustness variable to make sure unsolicited state change reports aren't lost on the network. Add and document this new knob to igmp code. RFCs allow tuning this parameter back to first IGMP RFC, so we also use this setting for all counters, including source specific multicast. Also take over sysctl value when upping the interface and don't reuse the last one seen on the interface. Cc: Flavio Leitner <fbl@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-04 22:26:14 -07:00
Hannes Frederic Sowa	2f711939d2	ipv6: add sysctl_mld_qrv to configure query robustness variable This patch adds a new sysctl_mld_qrv knob to configure the mldv1/v2 query robustness variable. It specifies how many retransmit of unsolicited mld retransmit should happen. Admins might want to tune this on lossy links. Also reset mld state on interface down/up, so we pick up new sysctl settings during interface up event. IPv6 certification requests this knob to be available. I didn't make this knob netns specific, as it is mostly a setting in a physical environment and should be per host. Cc: Flavio Leitner <fbl@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-04 22:26:14 -07:00
John W. Linville	ef4ead3f29	Not that much content this time. Some RCU cleanups, crypto performance improvements, and various patches all over, rather than listing them one might as well look into the git log instead. -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJUAIx4AAoJEDBSmw7B7bqrUYcP/3t4qdFxm0bd4j2AEkl3mPwB Qu7obTicOTfBRoJNEgS+8AU2u3PfztU6+ErZs4ETLUuqaZwXisqmwBiMo86+Wtdf gx9KonwEW051g7YmB0+6EMwuy04MGzTEk8VavQwqM4g9LIPJ4Buo/kj7MNJ51m11 XyRmJqZJnKKeiiQ4eC0gPf8e44qiQqaDuYZ0r1UDnNRg2KrbAHlGTBKYI3VRl2u4 xRpPGVnHwT0qkWb1Zw9fk0VfPr9m1ETthzcZvnhk6uMnJ28D+1B1FjZR1GJU6BW7 Zx2FbevbZTjDoNT1GQpLGMXBuW0lsZFetXVFiJCr/StaPBtHmtdu28fuNVm8yJYz euDlEgrE8F4npdec2F5R2zh7Ue2U7eMEL2uxxjciNSJOipHgx5EXH12Y/5QtrChy 4OHPbNHgpmqFB7TmkvHDgP/0A7XdyqKVc+NtIV+eECIwE4tHcJ6A+bQ+ZCoRV2Vw zmsNuNeNeDW7NEAw9veRXissLZMy/EjUnsOrnW29BpO/yG+2YjqpyQ6JQpcXeCPD WQgl2FHpk6ap3jpVjxminxw2HkDnQ0oTKusGLcezalhUlWMo7VYNN59aLzcphxX5 Fotp/8v1sbDTF46uc/QJ38N5TqflwWeFpxvGkdNGuAT4llP03NaXV0ORBecFmMW2 esb+PLwlByCDeVFu53q+ =Qth6 -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-john-2014-08-29' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg <johannes@sipsolutions.net> says: "Not that much content this time. Some RCU cleanups, crypto performance improvements, and various patches all over, rather than listing them one might as well look into the git log instead." Signed-off-by: John W. Linville <linville@tuxdriver.com> Conflicts: drivers/net/wireless/ath/wil6210/wmi.c	2014-09-04 13:41:33 -04:00
John W. Linville	190355cc06	Here are a few fixes for mac80211. One has been discussed for a while and adds a terminating NUL-byte to the alpha2 sent to userspace, which shouldn't be necessary but since many places treat it as a string we couldn't move to just sending two bytes. In addition to that, we have two VLAN fixes from Felix, a mesh fix, a fix for the recently introduced RX aggregation offload, a revert for a broken patch (that luckily didn't really cause any harm) and a small fix for alignment in debugfs. -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJUAFrmAAoJEDBSmw7B7bqr79AP/1yGi9lkv/wWUs5y0AhUSen9 850MU26BBlyAAFSz11xqgaEeRmeBeqhR3K7w/M02TX0CHxBzMqMZfyE//tq0UJaI ZwZmtyQmdMiOSNKignTIIx7OHTioq0wrGKb6O2UvKoJfTlB9t01jCC4jmCTF5Vos 6ReF7NaZEbxW6XDOsClNTAtIa1c6n1RQ5VbDIEL5Vfvqv8LbcobduF8WcYl80eIQ +EvIHtUm/Luxg6DblibgEVtwYOtNpvRz4pofdw3xoSHAnF+zhXbUr0dUjpkBNA7o vWboCBl14Qn1M7pOJZ0+TBzFmquAr6CDbDvArVCH01Swh27EUDQUcHQAggGpT71w DFgWHOYP0UCB6Y4U0GjBehy8PeuytqJLBSceKVud7DDqd8fY+Lq3MMyicIk0aw3o IIDLWrujkCBXsdfuxQETmYxHU05WHSuYOCTgGSqbq3QPTWm8pBGWTdbk+1t/0FyH cGLJOWs/jCrtHdzDj6TH+kL8NmvwB7sC9MT45qG0ilevmPW25yrnTJPEMEvFBqvZ lnaqiX6D1kGNZd09CxgSIhxrQi+N0Yg+UlLa4IUtOIqnQussOC3xH2U5qTufdpa1 Gi9aCkBGVKQiObPWucf2QB4t1sZ18rxBhrAelZhQPLTKrnsuLhpcVBlU+L6ScCAk FVni4HZH2IGtDQ577k10 =G/pB -----END PGP SIGNATURE----- Merge tag 'mac80211-for-john-2014-08-29' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg <johannes@sipsolutions.net> says: "Here are a few fixes for mac80211. One has been discussed for a while and adds a terminating NUL-byte to the alpha2 sent to userspace, which shouldn't be necessary but since many places treat it as a string we couldn't move to just sending two bytes. In addition to that, we have two VLAN fixes from Felix, a mesh fix, a fix for the recently introduced RX aggregation offload, a revert for a broken patch (that luckily didn't really cause any harm) and a small fix for alignment in debugfs." Signed-off-by: John W. Linville <linville@redhat.com>	2014-09-04 13:08:24 -04:00
Li RongQing	c5eba0b6f8	openvswitch: distinguish between the dropped and consumed skb distinguish between the dropped and consumed skb, not assume the skb is consumed always Cc: Thomas Graf <tgraf@noironetworks.com> Cc: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-03 20:50:51 -07:00
Jesper Dangaard Brouer	1f59533f9c	qdisc: validate frames going through the direct_xmit path In commit `50cbe9ab5f` ("net: Validate xmit SKBs right when we pull them out of the qdisc") the validation code was moved out of dev_hard_start_xmit and into dequeue_skb. However this overlooked the fact that we do not always enqueue the skb onto a qdisc. First situation is if qdisc have flag TCQ_F_CAN_BYPASS and qdisc is empty. Second situation is if there is no qdisc on the device, which is a common case for software devices. Originally spotted and inital patch by Alexander Duyck. As a result Alex was seeing issues trying to connect to a vhost_net interface after commit `50cbe9ab5f` was applied. Added a call to validate_xmit_skb() in __dev_xmit_skb(), in the code path for qdiscs with TCQ_F_CAN_BYPASS flag, and in __dev_queue_xmit() when no qdisc. Also handle the error situation where dev_hard_start_xmit() could return a skb list, and does not return dev_xmit_complete(rc) and falls through to the kfree_skb(), in that situation it should call kfree_skb_list(). Fixes: `50cbe9ab5f` ("net: Validate xmit SKBs right when we pull them out of the qdisc") Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-03 20:41:42 -07:00
Jesper Dangaard Brouer	3f3c7eec60	qdisc: exit case fixes for skb list handling in qdisc layer More minor fixes to merge commit `53fda7f7f9` (Merge branch 'xmit_list') that allows us to work with a list of SKBs. Fixing exit cases in qdisc_reset() and qdisc_destroy(), where a leftover requeued SKB (qdisc->gso_skb) can have the potential of being a skb list, thus use kfree_skb_list(). This is a followup to commit `10770bc2d1` ("qdisc: adjustments for API allowing skb list xmits"). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-03 20:41:42 -07:00
Jesper Dangaard Brouer	10770bc2d1	qdisc: adjustments for API allowing skb list xmits Minor adjustments for merge commit `53fda7f7f9` (Merge branch 'xmit_list') that allows us to work with a list of SKBs. Update code doc to function sch_direct_xmit(). In handle_dev_cpu_collision() use kfree_skb_list() in error handling. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-02 14:06:17 -07:00
Li RongQing	4ee45ea05c	openvswitch: fix a memory leak The user_skb maybe be leaked if the operation on it failed and codes skipped into the label "out:" without calling genlmsg_unicast. Cc: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-02 14:01:21 -07:00
Pablo Neira	41ad82f7f8	netfilter: fix missing dependencies in NETFILTER_XT_TARGET_LOG make defconfig reports: warning: (NETFILTER_XT_TARGET_LOG) selects NF_LOG_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NETFILTER_ADVANCED) Fixes: `d79a61d` netfilter: NETFILTER_XT_TARGET_LOG selects NF_LOG_* Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-02 13:59:54 -07:00
David S. Miller	abccc5878a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== pull request: Netfilter/IPVS fixes for net The following patchset contains seven Netfilter fixes for your net tree, they are: 1) Make the NAT infrastructure independent of x_tables, some users are already starting to test nf_tables with NAT without enabling x_tables. Without this patch for Kconfig, there's a superfluous dependency between NAT and x_tables. 2) Allow to use 0 in the cgroup match, the kernel rejects with -EINVAL with no good reason. From Daniel Borkmann. 3) Select CONFIG_NF_NAT from the nf_tables NAT expression, this also resolves another NAT dependency with x_tables. 4) Use HAVE_JUMP_LABEL instead of CONFIG_JUMP_LABEL in the Netfilter hook code as elsewhere in the kernel to resolve toolchain problems, from Zhouyi Zhou. 5) Use iptunnel_handle_offloads() to set up tunnel encapsulation depending on the offload capabilities, reported by Alex Gartrell patch from Julian Anastasov. 6) Fix wrong family when registering the ip_vs_local_reply6() hook, also from Julian. 7) Select the NF_LOG_* symbols from NETFILTER_XT_TARGET_LOG. Rafał Miłecki reported that when jumping from 3.16 to 3.17-rc, his log target is not selected anymore due to changes in the previous development cycle to accomodate the full logging support for nf_tables. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-02 13:56:30 -07:00
Nicolas Dichtel	ba9989069f	rtnl/do_setlink(): notify when a netdev is modified Depending on which parameters were updated, the changes were not propagated via the notifier chain and netlink. The new flag has been set only when the change did not cause a call to the notifier chain and/or to the netlink notification functions. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-02 12:57:04 -07:00
Nicolas Dichtel	90c325e3bf	rtnl/do_setlink(): last arg is now a set of flags There is no functional changes with this commit, it only prepares the next one. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-02 12:57:04 -07:00
Nicolas Dichtel	1889b0e7ef	rtnl/do_setlink(): set modified when IFLA_LINKMODE is updated The only effect of this patch is to print a warning if IFLA_LINKMODE is updated and a following change fails. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-02 12:57:04 -07:00
Nicolas Dichtel	5d1180fcac	rtnl/do_setlink(): set modified when IFLA_TXQLEN is updated The only effect of this patch is to print a warning if IFLA_TXQLEN is updated and a following change fails. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-02 12:57:04 -07:00
Pablo Neira Ayuso	65cd90ac76	netfilter: nft_chain_nat_ipv4: use generic IPv4 NAT code from core Use the exported IPv4 NAT functions that are provided by the core. This removes duplicated code so iptables and nft use the same NAT codebase. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-02 17:14:11 +02:00
Pablo Neira Ayuso	30766f4c2d	netfilter: nat: move specific NAT IPv4 to core Move the specific NAT IPv4 core functions that are called from the hooks from iptable_nat.c to nf_nat_l3proto_ipv4.c. This prepares the ground to allow iptables and nft to use the same NAT engine code that comes in a follow up patch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-02 17:14:10 +02:00
Willem de Bruijn	364a9e9324	sock: deduplicate errqueue dequeue sk->sk_error_queue is dequeued in four locations. All share the exact same logic. Deduplicate. Also collapse the two critical sections for dequeue (at the top of the recv handler) and signal (at the bottom). This moves signal generation for the next packet forward, which should be harmless. It also changes the behavior if the recv handler exits early with an error. Previously, a signal for follow-up packets on the errqueue would then not be scheduled. The new behavior, to always signal, is arguably a bug fix. For rxrpc, the change causes the same function to be called repeatedly for each queued packet (because the recv handler == sk_error_report). It is likely that all packets will fail for the same reason (e.g., memory exhaustion). This code runs without sk_lock held, so it is not safe to trust that sk->sk_err is immutable inbetween releasing q->lock and the subsequent test. Introduce int err just to avoid this potential race. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 21:49:08 -07:00
Tom Herbert	72297c59f7	l2tp: Enable checksum unnecessary conversions for l2tp/UDP sockets Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 21:36:28 -07:00
Tom Herbert	884d338c04	gre: Add support for checksum unnecessary conversions Call skb_checksum_try_convert and skb_gro_checksum_try_convert after checksum is found present and validated in the GRE header for normal and GRO paths respectively. In GRO path, call skb_gro_checksum_try_convert Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 21:36:28 -07:00
Tom Herbert	2abb7cdc0d	udp: Add support for doing checksum unnecessary conversion Add support for doing CHECKSUM_UNNECESSARY to CHECKSUM_COMPLETE conversion in UDP tunneling path. In the normal UDP path, we call skb_checksum_try_convert after locating the UDP socket. The check is that checksum conversion is enabled for the socket (new flag in UDP socket) and that checksum field is non-zero. In the UDP GRO path, we call skb_gro_checksum_try_convert after checksum is validated and checksum field is non-zero. Since this is already in GRO we assume that checksum conversion is always wanted. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 21:36:28 -07:00
Tom Herbert	5a21232983	net: Support for csum_bad in skbuff This flag indicates that an invalid checksum was detected in the packet. __skb_mark_checksum_bad helper function was added to set this. Checksums can be marked bad from a driver or the GRO path (the latter is implemented in this patch). csum_bad is checked in __skb_checksum_validate_complete (i.e. calling that when ip_summed == CHECKSUM_NONE). csum_bad works in conjunction with ip_summed value. In the case that ip_summed is CHECKSUM_NONE and csum_bad is set, this implies that the first (or next) checksum encountered in the packet is bad. When ip_summed is CHECKSUM_UNNECESSARY, the first checksum after the last one validated is bad. For example, if ip_summed == CHECKSUM_UNNECESSARY, csum_level == 1, and csum_bad is set-- then the third checksum in the packet is bad. In the normal path, the packet will be dropped when processing the protocol layer of the bad checksum: __skb_decr_checksum_unnecessary called twice for the good checksums changing ip_summed to CHECKSUM_NONE so that __skb_checksum_validate_complete is called to validate the third checksum and that will fail since csum_bad is set. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 21:36:27 -07:00
Florian Fainelli	61b7363ffa	net: dsa: make dsa_pack_type static net/dsa/dsa.c:624:20: sparse: symbol 'dsa_pack_type' was not declared. Should it be static? Fixes: `3e8a72d1da` ("net: dsa: reduce number of protocol hooks") Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 20:41:45 -07:00
stephen hemminger	688d1945bc	tcp: whitespace fixes Fix places where there is space before tab, long lines, and awkward if(){, double spacing etc. Add blank line after declaration/initialization. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 18:12:45 -07:00
David S. Miller	377655fe6b	Merge tag 'master-2014-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-08-28 Please pull this batch of fixes intended for the 3.17 stream. For the Bluetooth/6LowPAN/802.15.4 bits, Johan says: 'It contains a connection reference counting fix for LE where a connection might stay up even though it should get disconnected. The other 802.15.4 6LoWPAN related patches were sent to the bluetooth tree by Alexander Aring and described as follows by him: " these patches contains patches for the bluetooth branch. This series includes memory leak fixes and an errno value fix. Also there are two patches for sending and receiving 1280 6LoWPAN packets, which makes the IEEE 802.15.4 6LoWPAN stack more RFC compliant. "' Along with that... Alexey Khoroshilov fixes a use-after-free bug on at76c50x-usb. Hauke Mehrtens adds a PCI ID to bcma. Himangi Saraogi fixes a silly "A \|\| A" test in rtlwifi. Larry Finger adds a device ID to rtl8192cu. Maks Naumov fixes a strncmp argument in ath9k. Álvaro Fernández Rojas adds a PCI ID to ssb. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 18:08:11 -07:00
Jesper Dangaard Brouer	afb84b6261	pktgen: add flag NO_TIMESTAMP to disable timestamping Then testing the TX limits of the stack, then it is useful to be-able to disable the do_gettimeofday() timetamping on every packet. This implements a pktgen flag NO_TIMESTAMP which will disable this call to do_gettimeofday(). The performance change on (my system E5-2695) with skb_clone=0, goes from TX 2,423,751 pps to 2,567,165 pps with flag NO_TIMESTAMP. Thus, the cost of do_gettimeofday() or saving is approx 23 nanosec. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 18:06:59 -07:00
Erik Hugne	a5325ae5b8	tipc: add name distributor resiliency queue TIPC name table updates are distributed asynchronously in a cluster, entailing a risk of certain race conditions. E.g., if two nodes simultaneously issue conflicting (overlapping) publications, this may not be detected until both publications have reached a third node, in which case one of the publications will be silently dropped on that node. Hence, we end up with an inconsistent name table. In most cases this conflict is just a temporary race, e.g., one node is issuing a publication under the assumption that a previous, conflicting, publication has already been withdrawn by the other node. However, because of the (rtt related) distributed update delay, this may not yet hold true on all nodes. The symptom of this failure is a syslog message: "tipc: Cannot publish {%u,%u,%u}, overlap error". In this commit we add a resiliency queue at the receiving end of the name table distributor. When insertion of an arriving publication fails, we retain it in this queue for a short amount of time, assuming that another update will arrive very soon and clear the conflict. If so happens, we insert the publication, otherwise we drop it. The (configurable) retention value defaults to 2000 ms. Knowing from experience that the situation described above is extremely rare, there is no risk that the queue will accumulate any large number of items. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:51:48 -07:00
Erik Hugne	f4ad8a4b8b	tipc: refactor name table updates out of named packet receive routine We need to perform the same actions when processing deferred name table updates, so this functionality is moved to a separate function. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:51:48 -07:00
David S. Miller	8dcda22a5d	net: xmit_list() becomes dev_hard_start_xmit(). Now fundamentally we can process lists of SKBs as cheaply as single packets. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:39:56 -07:00
David S. Miller	ce93718fb7	net: Don't keep around original SKB when we software segment GSO frames. Just maintain the list properly by returning the head of the remaining SKB list from dev_hard_start_xmit(). Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:39:56 -07:00
David S. Miller	50cbe9ab5f	net: Validate xmit SKBs right when we pull them out of the qdisc. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:39:56 -07:00
David S. Miller	eae3f88ee4	net: Separate out SKB validation logic from transmit path. dev_hard_start_xmit() does two things, it first validates and canonicalizes the SKB, then it actually sends it. Make a set of helper functions for doing the first part. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:39:55 -07:00
David S. Miller	95f6b3dda2	net: Have xmit_list() signal more==true when appropriate. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:39:55 -07:00
David S. Miller	fa2dbdc253	net: Pass a "more" indication down into netdev_start_xmit() code paths. For now it will always be false. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:39:55 -07:00
David S. Miller	7f2e870f2a	net: Move main gso loop out of dev_hard_start_xmit() into helper. There is a slight policy change happening here as well. The previous code would drop the entire rest of the GSO skb if any of them got, for example, a congestion notification. That makes no sense, anything NET_XMIT_MASK and below is something like congestion or policing. And in the congestion case it doesn't even mean the packet was actually dropped. Just continue until dev_xmit_complete() evaluates to false. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:39:55 -07:00
David S. Miller	2ea2551375	net: Create xmit_one() helper for dev_hard_start_xmit() Hopefully making the code a bit easier to read and digest. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:39:55 -07:00
David S. Miller	10b3ad8c21	net: Do txq_trans_update() in netdev_start_xmit() That way we don't have to audit every call site to make sure it is doing this properly. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:39:55 -07:00
Pablo Neira Ayuso	d79a61d646	netfilter: NETFILTER_XT_TARGET_LOG selects NF_LOG_* CONFIG_NETFILTER_XT_TARGET_LOG is not selected anymore when jumping from 3.16 to 3.17-rc1 if you don't set on the new NF_LOG_IPV4 and NF_LOG_IPV6 switches. Change this to select the three new symbols NF_LOG_COMMON, NF_LOG_IPV4 and NF_LOG_IPV6 instead, so NETFILTER_XT_TARGET_LOG remains enabled when moving from old to new kernels. Reported-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-01 13:46:31 +02:00
Tom Herbert	202863fe4c	sctp: Change sctp to implement csum_levels CHECKSUM_UNNECESSARY may be applied to the SCTP CRC so we need to appropriate account for this by decrementing csum_level. This is done by calling __skb_dec_checksum_unnecessary. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-29 20:41:11 -07:00
Tom Herbert	662880f442	net: Allow GRO to use and set levels of checksum unnecessary Allow GRO path to "consume" checksums provided in CHECKSUM_UNNECESSARY and to report new checksums verfied for use in fallback to normal path. Change GRO checksum path to track csum_level using a csum_cnt field in NAPI_GRO_CB. On GRO initialization, if ip_summed is CHECKSUM_UNNECESSARY set NAPI_GRO_CB(skb)->csum_cnt to skb->csum_level + 1. For each checksum verified, decrement NAPI_GRO_CB(skb)->csum_cnt while its greater than zero. If a checksum is verfied and NAPI_GRO_CB(skb)->csum_cnt == 0, we have verified a deeper checksum than originally indicated in skbuf so increment csum_level (or initialize to CHECKSUM_UNNECESSARY if ip_summed is CHECKSUM_NONE or CHECKSUM_COMPLETE). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-29 20:41:11 -07:00
Tom Herbert	77cffe23c1	net: Clarification of CHECKSUM_UNNECESSARY This patch: - Clarifies the specific requirements of devices returning CHECKSUM_UNNECESSARY (comments in skbuff.h). - Adds csum_level field to skbuff. This is used to express how many checksums are covered by CHECKSUM_UNNECESSARY (stores n - 1). This replaces the overloading of skb->encapsulation, that field is is now only used to indicate inner headers are valid. - Change __skb_checksum_validate_needed to "consume" each checksum as indicated by csum_level as layers of the the packet are parsed. - Remove skb_pop_rcv_encapsulation, no longer needed in the new csum_level model. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-29 20:41:11 -07:00
Daniel Borkmann	38ab1fa981	net: sctp: fix ABI mismatch through sctp_assoc_to_state helper Since SCTP day 1, that is, 19b55a2af145 ("Initial commit") from lksctp tree, the official <netinet/sctp.h> header carries a copy of enum sctp_sstat_state that looks like (compared to the current in-kernel enumeration): User definition: Kernel definition: enum sctp_sstat_state { typedef enum { SCTP_EMPTY = 0, <removed> SCTP_CLOSED = 1, SCTP_STATE_CLOSED = 0, SCTP_COOKIE_WAIT = 2, SCTP_STATE_COOKIE_WAIT = 1, SCTP_COOKIE_ECHOED = 3, SCTP_STATE_COOKIE_ECHOED = 2, SCTP_ESTABLISHED = 4, SCTP_STATE_ESTABLISHED = 3, SCTP_SHUTDOWN_PENDING = 5, SCTP_STATE_SHUTDOWN_PENDING = 4, SCTP_SHUTDOWN_SENT = 6, SCTP_STATE_SHUTDOWN_SENT = 5, SCTP_SHUTDOWN_RECEIVED = 7, SCTP_STATE_SHUTDOWN_RECEIVED = 6, SCTP_SHUTDOWN_ACK_SENT = 8, SCTP_STATE_SHUTDOWN_ACK_SENT = 7, }; } sctp_state_t; This header was later on also placed into the uapi, so that user space programs can compile without having <netinet/sctp.h>, but the shipped with <linux/sctp.h> instead. While RFC6458 under 8.2.1.Association Status (SCTP_STATUS) says that sstat_state can range from SCTP_CLOSED to SCTP_SHUTDOWN_ACK_SENT, we nevertheless have a what it appears to be dummy SCTP_EMPTY state from the very early days. While it seems to do just nothing, commit `0b8f9e25b0` ("sctp: remove completely unsed EMPTY state") did the right thing and removed this dead code. That however, causes an off-by-one when the user asks the SCTP stack via SCTP_STATUS API and checks for the current socket state thus yielding possibly undefined behaviour in applications as they expect the kernel to tell the right thing. The enumeration had to be changed however as based on the current socket state, we access a function pointer lookup-table through this. Therefore, I think the best way to deal with this is just to add a helper function sctp_assoc_to_state() to encapsulate the off-by-one quirk. Reported-by: Tristan Su <sooqing@gmail.com> Fixes: `0b8f9e25b0` ("sctp: remove completely unsed EMPTY state") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-29 20:31:08 -07:00
Eric Dumazet	d9b2938aab	net: attempt a single high order allocation In commit `ed98df3361` ("net: use __GFP_NORETRY for high order allocations") we tried to address one issue caused by order-3 allocations. We still observe high latencies and system overhead in situations where compaction is not successful. Instead of trying order-3, order-2, and order-1, do a single order-3 best effort and immediately fallback to plain order-0. This mimics slub strategy to fallback to slab min order if the high order allocation used for performance failed. Order-3 allocations give a performance boost only if they can be done without recurring and expensive memory scan. Quoting David : The page allocator relies on synchronous (sync light) memory compaction after direct reclaim for allocations that don't retry and deferred compaction doesn't work with this strategy because the allocation order is always decreasing from the previous failed attempt. This means sync light compaction will always be encountered if memory cannot be defragmented or reclaimed several times during the skb_page_frag_refill() iteration. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-29 20:28:23 -07:00
Ying Xue	cc086fcf92	tipc: fix a potential oops Commit `6c9808ce09` ("tipc: remove port_lock") accidentally involves a potential bug: when tipc socket instance(tsk) is not got with given reference number in tipc_sk_get(), tsk is set to NULL. Subsequently we jump to exit label where to decrease socket reference counter pointed by tsk pointer in tipc_sk_put(). However, As now tsk is NULL, oops may happen because of touching a NULL pointer. Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Erik Hugne <erik.hugne@ericsson.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-29 20:22:43 -07:00
Daniel Borkmann	10c51b5623	net: add skb_get_tx_queue() helper Replace occurences of skb_get_queue_mapping() and follow-up netdev_get_tx_queue() with an actual helper function. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-29 20:02:07 -07:00
Mika Westerberg	d0616613d9	net: rfkill: gpio: Add more Broadcom bluetooth ACPI IDs This adds one more ACPI ID of a Broadcom bluetooth chip. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-29 13:10:44 +02:00
Michal Kazior	a00f4f6e04	mac80211: fix chantype recalc warning When a device driver is unloaded local->interfaces list is cleared. If there was more than 1 interface running and connected (bound to a chanctx) then chantype recalc was called and it ended up with compat being NULL causing a call trace warning. Warn if compat becomes NULL as a result of incompatible bss_conf.chandef of interfaces bound to a given channel context only. The call trace looked like this: WARNING: CPU: 2 PID: 2594 at /devel/src/linux/net/mac80211/chan.c:557 ieee80211_recalc_chanctx_chantype+0x2cd/0x2e0() Modules linked in: ath10k_pci(-) ath10k_core ath CPU: 2 PID: 2594 Comm: rmmod Tainted: G W 3.16.0-rc1+ #150 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 0000000000000009 ffff88001ea279c0 ffffffff818dfa93 0000000000000000 ffff88001ea279f8 ffffffff810514a8 ffff88001ce09cd0 ffff88001e03cc58 0000000000000000 ffff88001ce08840 ffff88001ce09cd0 ffff88001ea27a08 Call Trace: [<ffffffff818dfa93>] dump_stack+0x4d/0x66 [<ffffffff810514a8>] warn_slowpath_common+0x78/0xa0 [<ffffffff81051585>] warn_slowpath_null+0x15/0x20 [<ffffffff818a407d>] ieee80211_recalc_chanctx_chantype+0x2cd/0x2e0 [<ffffffff818a3dda>] ? ieee80211_recalc_chanctx_chantype+0x2a/0x2e0 [<ffffffff818a4919>] ieee80211_assign_vif_chanctx+0x1a9/0x770 [<ffffffff818a6220>] __ieee80211_vif_release_channel+0x70/0x130 [<ffffffff818a6dd3>] ieee80211_vif_release_channel+0x43/0xb0 [<ffffffff81885f4e>] ieee80211_stop_ap+0x21e/0x5a0 [<ffffffff8184b9b5>] __cfg80211_stop_ap+0x85/0x520 [<ffffffff8181c188>] __cfg80211_leave+0x68/0x120 [<ffffffff8181c268>] cfg80211_leave+0x28/0x40 [<ffffffff8181c5f3>] cfg80211_netdev_notifier_call+0x373/0x6b0 [<ffffffff8107f965>] notifier_call_chain+0x55/0x110 [<ffffffff8107fa41>] raw_notifier_call_chain+0x11/0x20 [<ffffffff816a8dc0>] call_netdevice_notifiers_info+0x30/0x60 [<ffffffff816a8eb9>] __dev_close_many+0x59/0xf0 [<ffffffff816a9021>] dev_close_many+0x81/0x120 [<ffffffff816aa1c5>] rollback_registered_many+0x115/0x2a0 [<ffffffff816aa3a6>] unregister_netdevice_many+0x16/0xa0 [<ffffffff8187d841>] ieee80211_remove_interfaces+0x121/0x1b0 [<ffffffff8185e0e6>] ieee80211_unregister_hw+0x56/0x110 [<ffffffffa0011ac4>] ath10k_mac_unregister+0x14/0x60 [ath10k_core] [<ffffffffa0014fe7>] ath10k_core_unregister+0x27/0x40 [ath10k_core] [<ffffffffa003b1f4>] ath10k_pci_remove+0x44/0xa0 [ath10k_pci] [<ffffffff81373138>] pci_device_remove+0x28/0x60 [<ffffffff814cb534>] __device_release_driver+0x64/0xd0 [<ffffffff814cbcc8>] driver_detach+0xb8/0xc0 [<ffffffff814cb23a>] bus_remove_driver+0x4a/0xb0 [<ffffffff814cc697>] driver_unregister+0x27/0x50 Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-29 13:06:01 +02:00
Florian Fainelli	246d7f773c	net: dsa: add Broadcom SF2 switch driver Add support for the Broadcom Starfigther 2 switch chip using a DSA driver. This switch driver supports the following features: - configuration of the external switch port interface: MII, RevMII, RGMII and RGMII_NO_ID are supported - support for the per-port MIB counters - support for link interrupts for special ports (e.g: MoCA) - powering up/down of switch memories to conserve power when ports are unused Finally, update the compatible property for the DSA core code to match our switch top-level compatible node. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-27 22:59:40 -07:00
Florian Fainelli	5037d532b8	net: dsa: add Broadcom tag RX/TX handler Add support for the 4-bytes Broadcom tag that built-in switches such as the Starfighter 2 might insert when receiving packets, or that we need to insert while targetting specific switch ports. We use a fake local EtherType value for this 4-bytes switch tag: ETH_P_BRCMTAG to make sure we can assign DSA-specific network operations within the DSA drivers. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-27 22:59:40 -07:00
Florian Fainelli	ce31b31c68	net: dsa: allow updating fixed PHY link information Allow switch drivers to hook a PHY link update callback to perform port-specific link work. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-27 22:59:40 -07:00
Florian Fainelli	ec9436baed	net: dsa: allow drivers to do link adjustment Whenever libphy determines that the link status of a given PHY/port has changed, allow to call into the switch driver link adjustment callback so proper actions can be taken care of by the switch driver upon link notification. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-27 22:59:40 -07:00
Florian Fainelli	5aed85cec2	net: dsa: allow switches to work without tagging In case switch port tagging is disabled (voluntarily, or the switch just does not support it), allow us to continue using the defined set of dsa_device_ops in net/dsa/slave.c. We introduce dsa_protocol_is_tagged() to check whether we need to override skb->protocol and go through the DSA-specifif packet_type function, or if we just go on and receive the SKB through the normal path. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-27 22:59:40 -07:00
Florian Fainelli	0d8bcdd383	net: dsa: allow for more complex PHY setups Modify the DSA slave interface to be bound to an arbitray PHY, not just the ones that are available as child PHY devices of the switch MDIO bus. This allows us for instance to have external PHYs connected to a separate MDIO bus, but yet also connected to a given switch port. Under certain configurations, the physical port mask might not be a 1:1 mapping to the MII PHYs mask. This is the case, if e.g: Port 1 of the switch is used and connects to a PHY at a MDIO address different than 1. Introduce a phys_mii_mask variable which allows driver to implement and divert their own MDIO read/writes operations for a subset of the MDIO PHY addresses. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-27 22:59:40 -07:00
Florian Fainelli	bd47497a01	net: dsa: retain a per-port device_node pointer We will later use the per-port device_node pointer to fetch a bunch of port-specific properties. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-27 22:59:40 -07:00
Florian Fainelli	fa981d9af8	net: dsa: provide a switch device device tree node pointer We might need to fetch additional resources from the device tree node pointer, such as register ranges or other properties. Keep a device_node pointer around for this. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-27 22:59:39 -07:00
Florian Fainelli	3e8a72d1da	net: dsa: reduce number of protocol hooks DSA is currently registering one packet_type function per EtherType it needs to intercept in the receive path of a DSA-enabled Ethernet device. Right now we have three of them: trailer, DSA and eDSA, and there might be more in the future, this will not scale to the addition of new protocols. This patch proceeds with adding a new layer of abstraction and two new functions: dsa_switch_rcv() which will dispatch into the tag-protocol specific receive function implemented by net/dsa/tag_.c dsa_slave_xmit() which will dispatch into the tag-protocol specific transmit function implemented by net/dsa/tag_.c When we do create the per-port slave network devices, we iterate over the switch protocol to assign the DSA-specific receive and transmit operations. A new fake ethertype value is used: ETH_P_XDSA to illustrate the fact that this is no longer going to look like ETH_P_DSA or ETH_P_TRAILER like it used to be. This allows us to greatly simplify the check in eth_type_trans() and always override the skb->protocol with ETH_P_XDSA for Ethernet switches tagged protocol, while also reducing the number repetitive slave netdevice_ops assignments. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-27 22:59:39 -07:00
Julian Anastasov	eb90b0c734	ipvs: fix ipv6 hook registration for local replies commit `fc60476761` ("ipvs: changes for local real server") from 2.6.37 introduced DNAT support to local real server but the IPv6 LOCAL_OUT handler ip_vs_local_reply6() is registered incorrectly as IPv4 hook causing any outgoing IPv4 traffic to be dropped depending on the IP header values. Chris tracked down the problem to CONFIG_IP_VS_IPV6=y Bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349768 Reported-by: Chris J Arges <chris.j.arges@canonical.com> Tested-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-08-28 10:52:37 +09:00
Florian Westphal	253ff51635	tcp: syncookies: mark cookie_secret read_mostly only written once. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-27 16:30:49 -07:00
Andreea-Cristina Bernat	2688eba9d5	mac80211: Replace rcu_dereference() with rcu_access_pointer() The "rcu_dereference()" calls are used directly in conditions. Since their return values are never dereferenced it is recommended to use "rcu_access_pointer()" instead of "rcu_dereference()". Therefore, this patch makes the replacements. The following Coccinelle semantic patch was used: @@ @@ ( if( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} \| while( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} ) Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-27 12:14:10 +02:00
Julian Anastasov	ea1d5d7755	ipvs: properly declare tunnel encapsulation The tunneling method should properly use tunnel encapsulation. Fixes problem with CHECKSUM_PARTIAL packets when TCP/UDP csum offload is supported. Thanks to Alex Gartrell for reporting the problem, providing solution and for all suggestions. Reported-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-08-27 14:31:56 +09:00
Alexey Perevalov	f111f780ae	netfilter: nfnetlink_acct: add filter support to nfacct counter list/reset You can use this to skip accounting objects when listing/resetting via NFNL_MSG_ACCT_GET/NFNL_MSG_ACCT_GET_CTRZERO messages with the NLM_F_DUMP netlink flag. The filtering covers the following cases: 1. No filter specified. In this case, the client will get old behaviour, 2. List/reset counter object only: In this case, you have to use NFACCT_F_QUOTA as mask and value 0. 3. List/reset quota objects only: You have to use NFACCT_F_QUOTA_PKTS as mask and value - the same, for byte based quota mask should be NFACCT_F_QUOTA_BYTES and value - the same. If you want to obtain the object with any quota type (ie. NFACCT_F_QUOTA_PKTS\|NFACCT_F_QUOTA_BYTES), you need to perform two dump requests, one to obtain NFACCT_F_QUOTA_PKTS objects and another for NFACCT_F_QUOTA_BYTES. Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-26 21:36:19 +02:00
Andreea-Cristina Bernat	ad053a962f	mac80211: scan: Replace rcu_assign_pointer() with RCU_INIT_POINTER() The use of "rcu_assign_pointer()" is NULLing out the pointer. According to RCU_INIT_POINTER()'s block comment: "1. This use of RCU_INIT_POINTER() is NULLing out the pointer" it is better to use it instead of rcu_assign_pointer() because it has a smaller overhead. The following Coccinelle semantic patch was used: @@ @@ - rcu_assign_pointer + RCU_INIT_POINTER (..., NULL) Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:16:31 +02:00
Johannes Berg	5bc8c1f2b0	cfg80211: allow passing frame type to cfg80211_inform_bss() When using the cfg80211_inform_bss[_width]() functions drivers cannot currently indicate whether the data was received in a beacon or probe response. Fix that by passing a new enum that indicates such (or unknown). For good measure, use it in ath6kl. Acked-by: Kalle Valo <kvalo@qca.qualcomm.com> [ath6kl] Acked-by: Arend van Spriel <arend@broadcom.com> [brcmfmac] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:16:02 +02:00
Johannes Berg	0e227084ae	cfg80211: clarify BSS probe response vs. beacon data There are a few possible cases of where BSS data came from: 1) only a beacon has been received 2) only a probe response has been received 3) the driver didn't report what it received (this happens when using cfg80211_inform_bss[_width]()) 4) both probe response and beacon data has been received Unfortunately, in the userspace API, a few things weren't there: a) there was no way to differentiate cases 1) and 4) above without comparing the data of the IEs b) the TSF was always from the last frame, instead of being exposed for beacon/probe response separately like IEs Fix this by i) exporting a new flag attribute that indicates whether or not probe response data has been received - this addresses (a) ii) exporting a BEACON_TSF attribute that holds the beacon's TSF if a beacon has been received iii) not exporting the beacon attributes in case (3) above as that would just lead userspace into thinking the data actually came from a beacon when that isn't clear To implement this, track inside the IEs struct whether or not it (definitely) came from a beacon. Reported-by: William Seto Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:16:01 +02:00
Michal Kazior	f41ef64853	cfg80211: re-enable CSA for drivers that support it This reverts commit `dda444d524`. Channel switching code has been reworked and improved significantly since the time original locking issues were found. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:16:01 +02:00
Ido Yariv	c70f59a2a0	mac80211: don't resize skbs needlessly Header-less cloned skbs with sufficient headroom need not be cloned unless the tailroom is going to be modified. Fix ieee80211_skb_resize so it would only resize cloned skbs if either the header isn't released or the tailroom is going to be modified. Some drivers might have assumed that skbs are never cloned, so add a HW flag that explicitly permits cloned TX skbs. Drivers which do not modify TX skbs should set this flag to avoid copying skbs. Signed-off-by: Ido Yariv <idox.yariv@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:16:00 +02:00
Ido Yariv	ca34e3b5c8	mac80211: Fix accounting of the tailroom-needed counter When hw acceleration is enabled, the GENERATE_IV or PUT_IV_SPACE flags will only require headroom space. Consequently, the tailroom-needed counter can safely be decremented. Signed-off-by: Ido Yariv <idox.yariv@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:15:59 +02:00
Vladimir Kondratiev	970fdfa89b	cfg80211: remove @gfp parameter from cfg80211_rx_mgmt() In the cfg80211_rx_mgmt(), parameter @gfp was used for the memory allocation. But, memory get allocated under spin_lock_bh(), this implies atomic context. So, one can't use GFP_KERNEL, only variants with no __GFP_WAIT. Actually, in all occurrences GFP_ATOMIC is used (wil6210 use GFP_KERNEL by mistake), and it should be this way or warning triggered in the memory allocation code. Remove @gfp parameter as no actual choice exist, and use hard coded GFP_ATOMIC for memory allocation. Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:15:58 +02:00
Johannes Berg	649b2a4da5	mac80211: make ieee80211_vif_use_reserved_switch static Reorder some code to make ieee80211_vif_use_reserved_switch() static, no other changes. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:15:35 +02:00
Bob Copeland	f8134fed83	mac80211: mesh_plink: use get_unaligned_le16 instead of memcpy Use get_unaligned_le16 to access llid/plid. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:15:34 +02:00
Johannes Berg	14b058bbce	mac80211: fix agg_status debugfs file alignment The "RX active" string is too long, so the columns get shifted. Change it to just "RX" to avoid this. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:13:37 +02:00
Denton Gentry	c7dcb45fac	mac80211: fix start_seq_num in Rx reorder offload sta->last_seq_ctrl is the seq_ctrl field from the last header seen, need to shift it 4 bits to extract the sequence number. Otherwise the ieee80211_sn_less() check at the top of ieee80211_sta_manage_reorder_buf drops frames until the sequence number catches up. Cc: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Denton Gentry <denton.gentry@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:13:32 +02:00
Bob Copeland	6c6fa49649	mac80211: mesh_plink: handle confirm frames with new plid The 802.11 standard says when processing a plink confirm frame: "If the peerLinkID in the mesh peering instance has not been set, the Local Link ID field of the Mesh Peering Confirm request shall be copied into the peerLinkID in the mesh peering instance." We were only doing this when receiving an open peering frame, but it could happen that the open frame gets lost and so we should handle this case rather than rejecting the confirm and failing the whole peering process. Reported-by: Yu Niiro <yu.niiro@gmail.com> Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:12:55 +02:00
Felix Fietkau	3918edb0e6	mac80211: fix smps mode check for AP_VLAN In ieee80211_sta_ps_deliver_wakeup, sdata->smps_mode is checked. This is initialized only for the base AP interface, not the individual VLANs. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:12:44 +02:00
Felix Fietkau	0e67c13667	mac80211: ignore AP_VLAN in ieee80211_recalc_chanctx_chantype When bringing down the AP, a WARN_ON is hit because the bss config chandef is empty here. Since AP_VLAN channel settings do not matter for anything chanctx related (always inherits the settings from the AP interface), let's just ignore it here. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 11:12:37 +02:00
Johannes Berg	bb512ad073	Revert "mac80211: disable uAPSD if all ACs are under ACM" This reverts commit `24aa11ab8a`. That commit was wrong since it uses data that hasn't even been set up yet, but might be a hold-over from a previous connection. Additionally, it seems like a driver-specific workaround that shouldn't have been in mac80211 to start with. Cc: stable@vger.kernel.org Fixes: `24aa11ab8a` ("mac80211: disable uAPSD if all ACs are under ACM") Reviewed-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-08-26 09:45:35 +02:00
Michal Kubeček	db115037bb	net: fix checksum features handling in netif_skb_features() This is follow-up to `da08143b85` ("vlan: more careful checksum features handling") which introduced more careful feature intersection in vlan code, taking into account that HW_CSUM should be considered superset of IP_CSUM/IPV6_CSUM. The same is needed in netif_skb_features() in order to avoid offloading mismatch warning when vlan is created on top of a bond consisting of slaves supporting IP/IPv6 checksumming but not vlan Tx offloading. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-25 17:23:03 -07:00
WANG Cong	453a940ea7	net: make skb an optional parameter for__skb_flow_dissect() Fixes: commit `690e36e726` (net: Allow raw buffers to be passed into the flow dissector) Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-25 17:21:26 -07:00
WANG Cong	6451b3f59a	net: fix comments for __skb_flow_get_ports() Fixes: commit `690e36e726` (net: Allow raw buffers to be passed into the flow dissector) Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-25 17:21:26 -07:00
Alexander Y. Fomichev	4c75431ac3	net: prevent of emerging cross-namespace symlinks Code manipulating sysfs symlinks on adjacent net_devices(s) currently doesn't take into account that devices potentially belong to different namespaces. This patch trying to fix an issue as follows: - check for net_ns before creating / deleting symlink. for now only netdev_adjacent_rename_links and __netdev_adjacent_dev_remove are affected, afaics __netdev_adjacent_dev_insert implies both net_devs belong to the same namespace. - Drop all existing symlinks to / from all adj_devs before switching namespace and recreate them just after. Signed-off-by: Alexander Y. Fomichev <git.user@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-25 15:17:43 -07:00
Tomasz Bursztyka	a796dac9a6	wireless: core: Reorder wiphy_register() notifications relevantly Currently it can send regulatory domain change notification before any NEW_WIPHY notification. Moreover, if rfill_register() fails, calling wiphy_unregister() will send a DEL_WIPHY though no NEW_WIPHY had been sent previously. Thus reordering so it properly notifies NEW_WIPHY before any other. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2014-08-25 16:17:41 -04:00
John W. Linville	07bc788424	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next	2014-08-25 15:58:02 -04:00
Mika Westerberg	fb70118c0e	net: rfkill: gpio: Add more Broadcom bluetooth ACPI IDs This adds one more ACPI ID of a Broadcom bluetooth chip. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2014-08-25 15:39:23 -04:00
John W. Linville	0fdcaa5948	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth	2014-08-25 15:35:20 -04:00
Zhouyi Zhou	d1c85c2ebe	netfilter: HAVE_JUMP_LABEL instead of CONFIG_JUMP_LABEL Use HAVE_JUMP_LABEL as elsewhere in the kernel to ensure that the toolchain has the required support in addition to CONFIG_JUMP_LABEL being set. Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-25 10:45:28 +02:00
David S. Miller	4798248e4e	net: Add ops->ndo_xmit_flush() Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 23:02:45 -07:00
Ian Morris	4c83acbc56	ipv6: White-space cleansing : gaps between function and symbol export This patch makes no changes to the logic of the code but simply addresses coding style issues as detected by checkpatch. Both objdump and diff -w show no differences. This patch removes some blank lines between the end of a function definition and the EXPORT_SYMBOL_GPL macro in order to prevent checkpatch warning that EXPORT_SYMBOL must immediately follow a function. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 22:37:52 -07:00
Ian Morris	cc24becae3	ipv6: White-space cleansing : Structure layouts This patch makes no changes to the logic of the code but simply addresses coding style issues as detected by checkpatch. Both objdump and diff -w show no differences. This patch addresses structure definitions, specifically it cleanses the brace placement and replaces spaces with tabs in a few places. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 22:37:52 -07:00
Ian Morris	67ba4152e8	ipv6: White-space cleansing : Line Layouts This patch makes no changes to the logic of the code but simply addresses coding style issues as detected by checkpatch. Both objdump and diff -w show no differences. A number of items are addressed in this patch: * Multiple spaces converted to tabs * Spaces before tabs removed. * Spaces in pointer typing cleansed (char )foo etc. Remove space after sizeof * Ensure spacing around comparators such as if statements. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 22:37:52 -07:00
Tom Herbert	48a5fc7731	gre: When GRE csum is present count as encap layer wrt csum In GRE demux if the GRE checksum pop rcv encapsulation so that any encapsulated checksums are treated as tunnel checksums. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 18:09:24 -07:00
Tom Herbert	57c67ff4bd	udp: additional GRO support Implement GRO for UDPv6. Add UDP checksum verification in gro_receive for both UDP4 and UDP6 calling skb_gro_checksum_validate_zero_check. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 18:09:24 -07:00
Tom Herbert	149d0774a7	tcp: Call skb_gro_checksum_validate In tcp[64]_gro_receive call skb_gro_checksum_validate to validate TCP checksum in the gro context. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 18:09:24 -07:00
Tom Herbert	758f75d1ff	gre: call skb_gro_checksum_simple_validate Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 18:09:23 -07:00
Tom Herbert	573e8fca25	net: skb_gro_checksum_* functions Add skb_gro_checksum_validate, skb_gro_checksum_validate_zero_check, and skb_gro_checksum_simple_validate, and __skb_gro_checksum_complete. These are the cognates of the normal checksum functions but are used in the gro_receive path and operate on GRO related fields in sk_buffs. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-24 18:09:23 -07:00
Jozsef Kadlecsik	1b05756c48	netfilter: ipset: Fix warn: integer overflows 'sizeof(map) + size set->dsize' Dan Carpenter reported that the static checker emits the warning net/netfilter/ipset/ip_set_list_set.c:600 init_list_set() warn: integer overflows 'sizeof(map) + size set->dsize' Limit the maximal number of elements in list type of sets. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-08-24 19:33:10 +02:00
Mark Rustad	94729f8a1e	netfilter: ipset: Resolve missing-field-initializer warnings Resolve missing-field-initializer warnings by providing a directed initializer. Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-08-24 19:32:34 +02:00
Sergey Popovich	6e41ee684e	netfilter: ipset: netnet,netportnet: Fix value range support for IPv4 Ranges of values are broken with hash:net,net and hash:net,port,net. hash:net,net ============ # ipset create test-nn hash:net,net # ipset add test-nn 10.0.10.1-10.0.10.127,10.0.0.0/8 # ipset list test-nn Name: test-nn Type: hash:net,net Revision: 0 Header: family inet hashsize 1024 maxelem 65536 Size in memory: 16960 References: 0 Members: 10.0.10.1,10.0.0.0/8 # ipset test test-nn 10.0.10.65,10.0.0.1 10.0.10.65,10.0.0.1 is NOT in set test-nn. # ipset test test-nn 10.0.10.1,10.0.0.1 10.0.10.1,10.0.0.1 is in set test-nn. hash:net,port,net ================= # ipset create test-npn hash:net,port,net # ipset add test-npn 10.0.10.1-10.0.10.127,tcp:80,10.0.0.0/8 # ipset list test-npn Name: test-npn Type: hash:net,port,net Revision: 0 Header: family inet hashsize 1024 maxelem 65536 Size in memory: 17344 References: 0 Members: 10.0.10.8/29,tcp:80,10.0.0.0 10.0.10.16/28,tcp:80,10.0.0.0 10.0.10.2/31,tcp:80,10.0.0.0 10.0.10.64/26,tcp:80,10.0.0.0 10.0.10.32/27,tcp:80,10.0.0.0 10.0.10.4/30,tcp:80,10.0.0.0 10.0.10.1,tcp:80,10.0.0.0 # ipset list test-npn # ipset test test-npn 10.0.10.126,tcp:80,10.0.0.2 10.0.10.126,tcp:80,10.0.0.2 is NOT in set test-npn. # ipset test test-npn 10.0.10.126,tcp:80,10.0.0.0 10.0.10.126,tcp:80,10.0.0.0 is in set test-npn. # ipset create test-npn hash:net,port,net # ipset add test-npn 10.0.10.0/24,tcp:80-81,10.0.0.0/8 # ipset list test-npn Name: test-npn Type: hash:net,port,net Revision: 0 Header: family inet hashsize 1024 maxelem 65536 Size in memory: 17024 References: 0 Members: 10.0.10.0,tcp:80,10.0.0.0 10.0.10.0,tcp:81,10.0.0.0 # ipset test test-npn 10.0.10.126,tcp:80,10.0.0.0 10.0.10.126,tcp:80,10.0.0.0 is NOT in set test-npn. # ipset test test-npn 10.0.10.0,tcp:80,10.0.0.0 10.0.10.0,tcp:80,10.0.0.0 is in set test-npn. Correctly setup from..to variables where no IPSET_ATTR_IP_TO{,2} attribute is given, so in range processing loop we construct proper cidr value. Check whenever we have no ranges and can short cut in hash:net,net properly. Use unlikely() where appropriate, to comply with other modules. Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-08-24 19:32:05 +02:00
Vytas Dauksa	ecc245c2bd	netfilter: ipset: Removed invalid IPSET_ATTR_MARKMASK validation Markmask is an u32, hence it can't be greater then 4294967295 ( i.e. 0xffffffff ). This was causing smatch warning: net/netfilter/ipset/ip_set_hash_gen.h:1084 hash_ipmark_create() warn: impossible condition '(markmask > 4294967295) => (0-u32max > u32max)' Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-08-24 19:31:34 +02:00
Ana Rey	afc5be3079	netfilter: nft_meta: Add cpu attribute support Add cpu support to meta expresion. This allows you to match packets with cpu number. Signed-off-by: Ana Rey <anarey@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-24 14:08:46 +02:00
Ana Rey	e2a093ff0d	netfilter: nft_meta: add pkttype support Add pkttype support for ip, ipv6 and inet families of tables. This allows you to fetch the meta packet type based on the link layer information. The loopback traffic is a special case, the packet type is guessed from the network layer header. No special handling for bridge and arp since we're not going to see such traffic in the loopback interface. Joint work with Alvaro Neira Ayuso <alvaroneay@gmail.com> Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com> Signed-off-by: Ana Rey <anarey@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-24 14:06:39 +02:00
Daniel Borkmann	8fc54f6891	net: use reciprocal_scale() helper Replace open codings of (((u64) <x> * <y>) >> 32) with reciprocal_scale(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 12:21:21 -07:00
David S. Miller	690e36e726	net: Allow raw buffers to be passed into the flow dissector. Drivers, and perhaps other entities we have not yet considered, sometimes want to know how deep the protocol headers go before deciding how large of an SKB to allocate and how much of the packet to place into the linear SKB area. For example, consider a driver which has a device which DMAs into pools of pages and then tells the driver where the data went in the DMA descriptor(s). The driver can then build an SKB and reference most of the data via SKB fragments (which are page/offset/length triplets). However at least some of the front of the packet should be placed into the linear SKB area, which comes before the fragments, so that packet processing can get at the headers efficiently. The first thing each protocol layer is going to do is a "pskb_may_pull()" so we might as well aggregate as much of this as possible while we're building the SKB in the driver. Part of supporting this is that we don't have an SKB yet, so we want to be able to let the flow dissector operate on a raw buffer in order to compute the offset of the end of the headers. So now we have a __skb_flow_dissect() which takes an explicit data pointer and length. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 12:13:41 -07:00
Jon Paul Maloy	301bae56f2	tipc: merge struct tipc_port into struct tipc_sock We complete the merging of the port and socket layer by aggregating the fields of struct tipc_port directly into struct tipc_sock, and moving the combined structure into socket.c. We also move all functions and macros that are not any longer exposed to the rest of the stack into socket.c, and rename them accordingly. Despite the size of this commit, there are no functional changes. We have only made such changes that are necessary due of the removal of struct tipc_port. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:35 -07:00
Jon Paul Maloy	808d90f9c5	tipc: remove files ref.h and ref.c The reference table is now 'socket aware' instead of being generic, and has in reality become a socket internal table. In order to be able to minimize the API exposed by the socket layer towards the rest of the stack, we now move the reference table definitions and functions into the file socket.c, and rename the functions accordingly. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:35 -07:00
Jon Paul Maloy	2e84c60b77	tipc: remove include file port.h We move the inline functions in the file port.h to socket.c, and modify their names accordingly. We move struct tipc_port and some macros to socket.h. Finally, we remove the file port.h. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:35 -07:00
Jon Paul Maloy	0fc87aaebd	tipc: remove source file port.c In this commit, we move the remaining functions in port.c to socket.c, and give them new names that correspond to their new location. We then remove the file port.c. There are only cosmetic changes to the moved functions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:35 -07:00
Jon Paul Maloy	6c9808ce09	tipc: remove port_lock In previous commits we have reduced usage of port_lock to a minimum, and complemented it with usage of bh_lock_sock() at the remaining locations. The purpose has been to remove this lock altogether, since it largely duplicates the role of bh_lock_sock. We are now ready to do this. However, we still need to protect the BH callers from inadvertent release of the socket while they hold a reference to it. We do this by replacing port_lock by a combination of a rw-lock protecting the reference table as such, and updating the socket reference counter while the socket is referenced from BH. This technique is more standard and comprehensible than the previous approach, and turns out to have a positive effect on overall performance. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	9b50fd087a	tipc: replace port pointer with socket pointer in registry In order to make tipc_sock the only entity referencable from other parts of the stack, we add a tipc_sock pointer instead of a tipc_port pointer to the registry. As a consequence, we also let the function tipc_port_lock() return a pointer to a tipc_sock instead of a tipc_port. We keep the function's name for now, since the lock still is owned by the port. This is another step in the direction of eliminating port_lock, replacing its usage with lock_sock() and bh_lock_sock(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	5a9ee0be33	tipc: use registry when scanning sockets The functions tipc_port_get_ports() and tipc_port_reinit() scan over all sockets/ports to access each of them. This is done by using a dedicated linked list, 'tipc_socks' where all sockets are members. The list is in turn protected by a spinlock, 'port_list_lock', while each socket is locked by using port_lock at the moment of access. In order to reduce complexity and risk of deadlock, we want to get rid of the linked list and the accompanying spinlock. This is what we do in this commit. Instead of the linked list, we use the port registry to scan across the sockets. We also add usage of bh_lock_sock() inside the scope of port_lock in both functions, as a preparation for the complete removal of port_lock. Finally, we move the functions from port.c to socket.c, and rename them to tipc_sk_sock_show() and tipc_sk_reinit() repectively. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	5b8fa7ce82	tipc: eliminate functions tipc_port_init and tipc_port_destroy After the latest changes to the socket/port layer the existence of the functions tipc_port_init() and tipc_port_destroy() cannot be justified. They are both called only once, from tipc_sk_create() and tipc_sk_delete() respectively, and their functionality can better be merged into the latter two functions. This also entails that all remaining references to port_lock now are made from inside socket.c, something that will make it easier to remove this lock. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	739f5e4efc	tipc: redefine message acknowledge function The function tipc_acknowledge() is a remnant from the obsolete native API. Currently, it grabs port_lock, before building an acknowledge message and sending it to the peer. Since all access to socket members now is protected by the socket lock, it has become unnecessary to grab port_lock here. In this commit, we remove the usage of port_lock, simplify the function, and move it to socket.c, renaming it to tipc_sk_send_ack(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	dadebc0029	tipc: eliminate port_connect()/port_disconnect() functions tipc_port_connect()/tipc_port_disconnect() are remnants of the obsolete native API. Their only task is to grab port_lock and call the functions __tipc_port_connect()/__tipc_port_disconnect() respectively, which will perform the actual state change. Since socket/port exection now is single-threaded the use of port_lock is not needed any more, so we can safely replace the two functions with their lock-free counterparts. In this commit, we remove the two functions. Furthermore, the contents of __tipc_port_disconnect() is so trivial that we choose to eliminate that function too, expanding its functionality into tipc_shutdown(). __tipc_port_connect() is simplified, moved to socket.c, and given the more correct name tipc_sk_finish_conn(). Finally, we eliminate the function auto_connect(), and expand its contents into filter_connect(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	80e44c2225	tipc: eliminate function tipc_port_shutdown() tipc_port_shutdown() is a remnant from the now obsolete native interface. As such it grabs port_lock in order to protect itself from concurrent BH processing. However, after the recent changes to the port/socket upcalls, sockets are now basically single-threaded, and all execution, except the read-only tipc_sk_timer(), is executing within the protection of lock_sock(). So the use of port_lock is not needed here. In this commit we eliminate the whole function, and merge it into its only caller, tipc_shutdown(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	5728901581	tipc: clean up socket timer function The last remaining BH upcall to the socket, apart for the message reception function tipc_sk_rcv(), is the timer function. We prefer to let this function continue executing in BH, since it only does read-acces to semi-permanent data, but we make three changes to it: 1) We introduce a bh_lock_sock()/bh_unlock_sock() inside the scope of port_lock. This is a preparation for replacing port_lock with bh_lock_sock() at the locations where it is still used. 2) We move the function from port.c to socket.c, as a further step of eliminating the port code level altogether. 3) We let it make use of the newly introduced tipc_msg_create() function. This enables us to get rid of three context specific functions (port_create_self_abort_msg() etc.) in port.c Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:33 -07:00
Jon Paul Maloy	02be61a981	tipc: use message to abort connections when losing contact to node In the current implementation, each 'struct tipc_node' instance keeps a linked list of those ports/sockets that are connected to the node represented by that struct. The purpose of this is to let the node object know which sockets to alert when it loses contact with its peer node, i.e., which sockets need to have their connections aborted. This entails an unwanted direct reference from the node structure back to the port/socket structure, and a need to grab port_lock when we have to make an upcall to the port. We want to get rid of this unecessary BH entry point into the socket, and also eliminate its use of port_lock. In this commit, we instead let the node struct keep list of "connected socket" structs, which each represents a connected socket, but is allocated independently by the node at the moment of connection. If the node loses contact with its peer node, the list is traversed, and a "connection abort" message is created for each entry in the list. The message is sent to it respective connected socket using the ordinary data path, and the receiving socket aborts its connections upon reception of the message. This enables us to get rid of the direct reference from 'struct node' to ´struct port', and another unwanted BH access point to the latter. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:33 -07:00
Jon Paul Maloy	50100a5e39	tipc: use pseudo message to wake up sockets after link congestion The current link implementation keeps a linked list of blocked ports/ sockets that is populated when there is link congestion. The purpose of this is to let the link know which users to wake up when the congestion abates. This adds unnecessary complexity to the data structure and the code, since it forces us to involve the link each time we want to delete a socket. It also forces us to grab the spinlock port_lock within the scope of node_lock. We want to get rid of this direct dependence, as well as the deadlock hazard resulting from the usage of port_lock. In this commit, we instead let the link keep list of a "wakeup" pseudo messages for use in such situations. Those messages are sent to the pending sockets via the ordinary message reception path, and wake up the socket's owner when they are received. This enables us to get rid of the 'waiting_ports' linked lists in struct tipc_port that manifest this direct reference. As a consequence, we can eliminate another BH entry into the socket, and hence the need to grab port_lock. This is a further step in our effort to remove port_lock altogether. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:33 -07:00
Jon Paul Maloy	1dd0bd2b14	tipc: introduce new function tipc_msg_create() The function tipc_msg_init() has turned out to be of limited value in many cases. It take too few parameters to be usable for creating a complete message, it makes too many assumptions about what the message should be used for, and it does not allocate any buffer to be returned to the caller. Therefore, we now introduce the new function tipc_msg_create(), which takes all the parameters needed to create a full message, and returns a buffer of the requested size. The new function will be very useful for the changes we will be doing in later commits in this series. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:33 -07:00
David S. Miller	f9474ddfaa	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pulling to get some TIPC fixes that a net-next series depends upon. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:12:08 -07:00
Yuchung Cheng	989e04c5bc	tcp: improve undo on timeout Upon timeout, undo (via both timestamps/Eifel and DSACKs) was disabled if any retransmits were still in flight. The concern was perhaps that spurious retransmission sent in a previous recovery episode may trigger DSACKs to falsely undo the current recovery. However, this inadvertently misses undo opportunities (using either TCP timestamps or DSACKs) when timeout occurs during a loss episode, i.e. recurring timeouts or timeout during fast recovery. In these cases some retransmissions will be in flight but we should allow undo. Furthermore, we should only reset undo_marker and undo_retrans upon timeout if we are starting a new recovery episode. Finally, when we do reset our undo state, we now do so in a manner similar to tcp_enter_recovery(), so that we require a DSACK for each of the outstsanding retransmissions. This will achieve the original goal by requiring that we receive the same number of DSACKs as retransmissions. This patch increases the undo events by 50% on Google servers. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-22 21:28:02 -07:00
Eric Dumazet	884cf705c7	net: remove dead code after sk_data_ready change As a followup to commit `676d23690f` ("net: Fix use after free by removing length arg from sk_data_ready callbacks"), we can remove some useless code in sock_queue_rcv_skb() and rxrpc_queue_rcv_skb() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-22 21:08:50 -07:00

1 2 3 4 5 ...

34242 Commits