linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-16 07:24:39 +08:00

Author	SHA1	Message	Date
David Howells	e33b3d97bc	rxrpc: The protocol family should be set to PF_RXRPC not PF_UNIX Fix the protocol family set in the proto_ops for rxrpc to be PF_RXRPC not PF_UNIX. Signed-off-by: David Howells <dhowells@redhat.com>	2016-03-04 15:54:27 +00:00
David Howells	0d12f8a402	rxrpc: Keep the skb private record of the Rx header in host byte order Currently, a copy of the Rx packet header is copied into the the sk_buff private data so that we can advance the pointer into the buffer, potentially discarding the original. At the moment, this copy is held in network byte order, but this means we're doing a lot of unnecessary translations. The reasons it was done this way are that we need the values in network byte order occasionally and we can use the copy, slightly modified, as part of an iov array when sending an ack or an abort packet. However, it seems more reasonable on review that it would be better kept in host byte order and that we make up a new header when we want to send another packet. To this end, rename the original header struct to rxrpc_wire_header (with BE fields) and institute a variant called rxrpc_host_header that has host order fields. Change the struct in the sk_buff private data into an rxrpc_host_header and translate the values when filling it in. This further allows us to keep values kept in various structures in host byte order rather than network byte order and allows removal of some fields that are byteswapped duplicates. Signed-off-by: David Howells <dhowells@redhat.com>	2016-03-04 15:53:46 +00:00
David Howells	4c198ad17a	rxrpc: Rename call events to begin RXRPC_CALL_EV_ Rename call event names to begin RXRPC_CALL_EV_ to distinguish them from the flags. Signed-off-by: David Howells <dhowells@redhat.com>	2016-03-04 15:53:46 +00:00
David Howells	5b8848d149	rxrpc: Convert call flag and event numbers into enums Convert call flag and event numbers into enums and move their definitions outside of the struct. Also move the call state enum outside of the struct and add an extra element to count the number of states. Signed-off-by: David Howells <dhowells@redhat.com>	2016-03-04 15:53:46 +00:00
David Howells	e721498a63	rxrpc: Fix a case where a call event bit is being used as a flag bit Fix a case where RXRPC_CALL_RELEASE (an event) is being used to specify a flag bit. RXRPC_CALL_RELEASED should be used instead. Signed-off-by: David Howells <dhowells@redhat.com>	2016-03-04 15:53:46 +00:00
Eric Dumazet	1f27cde313	net: sched: use pfifo_fast for non real queues Some devices declare a high number of TX queues, then set a much lower real_num_tx_queues This cause setups using fq_codel, sfq or fq as the default qdisc to consume more memory than really needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-03 17:38:46 -05:00
David Ahern	799977d9aa	net: ipv6: Fix refcnt on host routes Andrew and Ying Huang's test robot both reported usage count problems that trace back to the 'keep address on ifdown' patch. >From Andrew: We execute CRIU test on linux-next. On the current linux-next kernel they hangs on creating a network namespace. The kernel log contains many massages like this: [ 1036.122108] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 1046.165156] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 1056.210287] unregister_netdevice: waiting for lo to become free. Usage count = 2 I tried to revert this patch and the bug disappeared. Here is a set of commands to reproduce this bug: [root@linux-next-test linux-next]# uname -a Linux linux-next-test 4.5.0-rc6-next-20160301+ #3 SMP Wed Mar 2 17:32:18 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux [root@linux-next-test ~]# unshare -n [root@linux-next-test ~]# ip link set up dev lo [root@linux-next-test ~]# ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever [root@linux-next-test ~]# logout [root@linux-next-test ~]# unshare -n ----- The problem is a change made to RTM_DELADDR case in __ipv6_ifa_notify that was added in an early version of the offending patch and is no longer needed. Fixes: `f1705ec197` ("net: ipv6: Make address flushing on ifdown optional") Cc: Andrey Wagin <avagin@gmail.com> Cc: Ying Huang <ying.huang@linux.intel.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: Jeremiah Mahler <jmmahler@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-03 17:26:18 -05:00
Arnd Bergmann	3d1cbe839a	net: mellanox: add DEVLINK dependencies The new NET_DEVLINK infrastructure can be a loadable module, but the drivers using it might be built-in, which causes link errors like: drivers/net/built-in.o: In function `mlx4_load_one': :(.text+0x2fbfda): undefined reference to `devlink_port_register' :(.text+0x2fc084): undefined reference to `devlink_port_unregister' drivers/net/built-in.o: In function `mlxsw_sx_port_remove': :(.text+0x33a03a): undefined reference to `devlink_port_type_clear' :(.text+0x33a04e): undefined reference to `devlink_port_unregister' There are multiple ways to avoid this: a) add 'depends on NET_DEVLINK \|\| !NET_DEVLINK' dependencies for each user b) use 'select NET_DEVLINK' from each driver that uses it and hide the symbol in Kconfig. c) make NET_DEVLINK a 'bool' option so we don't have to list it as a dependency, and rely on the APIs to be stubbed out when it is disabled d) use IS_REACHABLE() rather than IS_ENABLED() to check for NET_DEVLINK in include/net/devlink.h This implements a variation of approach a) by adding an intermediate symbol that drivers can depend on, and changes the three drivers using it. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `09d4d087cd` ("mlx4: Implement devlink interface") Fixes: `c4745500e9` ("mlxsw: Implement devlink interface") Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-03 17:08:59 -05:00
Florian Westphal	5d150a9855	ipv6: re-enable fragment header matching in ipv6_find_hdr When ipv6_find_hdr is used to find a fragment header (caller specifies target NEXTHDR_FRAGMENT) we erronously return -ENOENT for all fragments with nonzero offset. Before commit `9195bb8e38`, when target was specified, we did not enter the exthdr walk loop as nexthdr == target so this used to work. Now we do (so we can skip empty route headers). When we then stumble upon a frag with nonzero frag_off we must return -ENOENT ("header not found") only if the caller did not specifically request NEXTHDR_FRAGMENT. This allows nfables exthdr expression to match ipv6 fragments, e.g. via nft add rule ip6 filter input frag frag-off gt 0 Fixes: `9195bb8e38` ("ipv6: improve ipv6_find_hdr() to skip empty routing headers") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-03 16:35:20 -05:00
Parthasarathy Bhuvaragan	f214fc4029	tipc: Revert "tipc: use existing sk_write_queue for outgoing packet chain" reverts commit `94153e36e7` ("tipc: use existing sk_write_queue for outgoing packet chain") In Commit `94153e36e7`, we assume that we fill & empty the socket's sk_write_queue within the same lock_sock() session. This is not true if the link is congested. During congestion, the socket lock is released while we wait for the congestion to cease. This implementation causes a nullptr exception, if the user space program has several threads accessing the same socket descriptor. Consider two threads of the same program performing the following: Thread1 Thread2 -------------------- ---------------------- Enter tipc_sendmsg() Enter tipc_sendmsg() lock_sock() lock_sock() Enter tipc_link_xmit(), ret=ELINKCONG spin on socket lock.. sk_wait_event() : release_sock() grab socket lock : Enter tipc_link_xmit(), ret=0 : release_sock() Wakeup after congestion lock_sock() skb = skb_peek(pktchain); !! TIPC_SKB_CB(skb)->wakeup_pending = tsk->link_cong; In this case, the second thread transmits the buffers belonging to both thread1 and thread2 successfully. When the first thread wakeup after the congestion it assumes that the pktchain is intact and operates on the skb's in it, which leads to the following exception: [2102.439969] BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0 [2102.440074] IP: [<ffffffffa005f330>] __tipc_link_xmit+0x2b0/0x4d0 [tipc] [2102.440074] PGD 3fa3f067 PUD 3fa6b067 PMD 0 [2102.440074] Oops: 0000 [#1] SMP [2102.440074] CPU: 2 PID: 244 Comm: sender Not tainted 3.12.28 #1 [2102.440074] RIP: 0010:[<ffffffffa005f330>] [<ffffffffa005f330>] __tipc_link_xmit+0x2b0/0x4d0 [tipc] [...] [2102.440074] Call Trace: [2102.440074] [<ffffffff8163f0b9>] ? schedule+0x29/0x70 [2102.440074] [<ffffffffa006a756>] ? tipc_node_unlock+0x46/0x170 [tipc] [2102.440074] [<ffffffffa005f761>] tipc_link_xmit+0x51/0xf0 [tipc] [2102.440074] [<ffffffffa006d8ae>] tipc_send_stream+0x11e/0x4f0 [tipc] [2102.440074] [<ffffffff8106b150>] ? __wake_up_sync+0x20/0x20 [2102.440074] [<ffffffffa006dc9c>] tipc_send_packet+0x1c/0x20 [tipc] [2102.440074] [<ffffffff81502478>] sock_sendmsg+0xa8/0xd0 [2102.440074] [<ffffffff81507895>] ? release_sock+0x145/0x170 [2102.440074] [<ffffffff815030d8>] ___sys_sendmsg+0x3d8/0x3e0 [2102.440074] [<ffffffff816426ae>] ? _raw_spin_unlock+0xe/0x10 [2102.440074] [<ffffffff81115c2a>] ? handle_mm_fault+0x6ca/0x9d0 [2102.440074] [<ffffffff8107dd65>] ? set_next_entity+0x85/0xa0 [2102.440074] [<ffffffff816426de>] ? _raw_spin_unlock_irq+0xe/0x20 [2102.440074] [<ffffffff8107463c>] ? finish_task_switch+0x5c/0xc0 [2102.440074] [<ffffffff8163ea8c>] ? __schedule+0x34c/0x950 [2102.440074] [<ffffffff81504e12>] __sys_sendmsg+0x42/0x80 [2102.440074] [<ffffffff81504e62>] SyS_sendmsg+0x12/0x20 [2102.440074] [<ffffffff8164aed2>] system_call_fastpath+0x16/0x1b In this commit, we maintain the skb list always in the stack. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-03 16:30:29 -05:00
David S. Miller	aefd3fb26e	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2016-03-01 Here's our main set of Bluetooth & 802.15.4 patches for the 4.6 kernel. - New Bluetooth HCI driver for Intel/AG6xx controllers - New Broadcom ACPI IDs - LED trigger support for indicating Bluetooth powered state - Various fixes in mac802154, 6lowpan and related drivers - New USB IDs for AR3012 Bluetooth controllers Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-03 16:27:23 -05:00
Benjamin Poirier	1837b2e2bc	mld, igmp: Fix reserved tailroom calculation The current reserved_tailroom calculation fails to take hlen and tlen into account. skb: [__hlen__\|__data____________\|__tlen___\|__extra__] ^ ^ head skb_end_offset In this representation, hlen + data + tlen is the size passed to alloc_skb. "extra" is the extra space made available in __alloc_skb because of rounding up by kmalloc. We can reorder the representation like so: [__hlen__\|__data____________\|__extra__\|__tlen___] ^ ^ head skb_end_offset The maximum space available for ip headers and payload without fragmentation is min(mtu, data + extra). Therefore, reserved_tailroom = data + extra + tlen - min(mtu, data + extra) = skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen) = skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen) Compare the second line to the current expression: reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset) and we can see that hlen and tlen are not taken into account. The min() in the third line can be expanded into: if mtu < skb_tailroom - tlen: reserved_tailroom = skb_tailroom - mtu else: reserved_tailroom = tlen Depending on hlen, tlen, mtu and the number of multicast address records, the current code may output skbs that have less tailroom than dev->needed_tailroom or it may output more skbs than needed because not all space available is used. Fixes: `4c672e4b` ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs") Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-03 15:41:07 -05:00
Eric Engestrom	a9d562358b	net/ipv4: remove left over dead code `8cc785f6f4` ("net: ipv4: make the ping /proc code AF-independent") removed the code using it, but renamed this variable instead of removing it. Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 15:00:55 -05:00
Eric Engestrom	6353e1875d	net/rtnetlink: remove dead code `3b766cd832` ("net/core: Add reading VF statistics through the PF netdevice") added that variable but it's never been used. Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 15:00:55 -05:00
Avinash Repaka	1659185fb4	RDS: IB: Support Fastreg MR (FRMR) memory registration mode Fastreg MR(FRMR) is another method with which one can register memory to HCA. Some of the newer HCAs supports only fastreg mr mode, so we need to add support for it to have RDS functional on them. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:13:19 -05:00
santosh.shilimkar@oracle.com	ad6832f950	RDS: IB: allocate extra space on queues for FRMR support Fastreg MR(FRMR) memory registration and invalidation makes use of work request and completion queues for its operation. Patch allocates extra queue space towards these operation(s). Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:13:19 -05:00
santosh.shilimkar@oracle.com	2cb2912d65	RDS: IB: add Fastreg MR (FRMR) detection support Discovere Fast Memmory Registration support using IB device IB_DEVICE_MEM_MGT_EXTENSIONS. Certain HCA might support just FRMR or FMR or both FMR and FRWR. In case both mr type are supported, default FMR is used. Default MR is still kept as FMR against what everyone else is following. Default will be changed to FRMR once the RDS performance with FRMR is comparable with FMR. The work is in progress for the same. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:13:19 -05:00
santosh.shilimkar@oracle.com	db42753adb	RDS: IB: add mr reused stats Add MR reuse statistics to RDS IB transport. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:13:19 -05:00
santosh.shilimkar@oracle.com	37ea401e9c	RDS: IB: handle the RDMA CM time wait event Drop the RDS connection on RDMA_CM_EVENT_TIMEWAIT_EXIT so that it can reconnect and resume. While testing fastreg, this error happened in couple of tests but was getting un-noticed. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:13:19 -05:00
santosh.shilimkar@oracle.com	d4de76da5c	RDS: IB: add connection info to ibmr Preperatory patch for FRMR support. From connection info, we can retrieve cm_id which contains qp handled needed for work request posting. We also need to drop the RDS connection on QP error states where connection handle becomes useful. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:13:18 -05:00
santosh.shilimkar@oracle.com	490ea5967b	RDS: IB: move FMR code to its own file No functional change. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:13:18 -05:00
santosh.shilimkar@oracle.com	a69365a39c	RDS: IB: create struct rds_ib_fmr Keep fmr related filed in its own struct. Fastreg MR structure will be added to the union. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:13:18 -05:00
santosh.shilimkar@oracle.com	f6df683f32	RDS: IB: Re-organise ibmr code No functional changes. This is in preperation towards adding fastreg memory resgitration support. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:13:18 -05:00
santosh.shilimkar@oracle.com	dcfd041c87	RDS: IB: Remove the RDS_IB_SEND_OP dependency This helps to combine asynchronous fastreg MR completion handler with send completion handler. No functional change. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:13:17 -05:00
santosh.shilimkar@oracle.com	5711f8b353	RDS: Add support for SO_TIMESTAMP for incoming messages The SO_TIMESTAMP generates time stamp for each incoming RDS messages User app can enable it by using SO_TIMESTAMP setsocketopt() at SOL_SOCKET level. CMSG data of cmsg type SO_TIMESTAMP contains the time stamp in struct timeval format. Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:13:17 -05:00
santosh.shilimkar@oracle.com	dcdede0406	RDS: Drop stale iWARP RDMA transport RDS iWarp support code has become stale and non testable. As indicated earlier, am dropping the support for it. If new iWarp user(s) shows up in future, we can adapat the RDS IB transprt for the special RDMA READ sink case. iWarp needs an MR for the RDMA READ sink. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:13:17 -05:00
Pablo Neira Ayuso	8a6bf5da1a	netfilter: nft_masq: support port range Complete masquerading support by allowing port range selection. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-02 20:05:27 +01:00
Florian Westphal	5f6c253ebe	netfilter: bridge: register hooks only when bridge interface is added This moves bridge hooks to a register-when-needed scheme. We use a device notifier to register the 'call-iptables' netfilter hooks only once a bridge gets added. This means that if the initial namespace uses a bridge, newly created network namespaces no longer get the PRE_ROUTING ipt_sabotage hook. It will registered in that network namespace once a bridge is created within that namespace. A few modules still use global hooks: - conntrack - bridge PF_BRIDGE hooks - IPVS - CLUSTER match (deprecated) - SYNPROXY As long as these modules are not loaded/used, a new network namespace has empty hook list and NF_HOOK() will boil down to single list_empty test even if initial namespace does stateless packet filtering. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-02 20:05:25 +01:00
Florian Westphal	b9e69e1273	netfilter: xtables: don't hook tables by default delay hook registration until the table is being requested inside a namespace. Historically, a particular table (iptables mangle, ip6tables filter, etc) was registered on module load. When netns support was added to iptables only the ip/ip6tables ruleset was made namespace aware, not the actual hook points. This means f.e. that when ipt_filter table/module is loaded on a system, then each namespace on that system has an (empty) iptables filter ruleset. In other words, if a namespace sends a packet, such skb is 'caught' by netfilter machinery and fed to hooking points for that table (i.e. INPUT, FORWARD, etc). Thanks to Eric Biederman, hooks are no longer global, but per namespace. This means that we can avoid allocation of empty ruleset in a namespace and defer hook registration until we need the functionality. We register a tables hook entry points ONLY in the initial namespace. When an iptables get/setockopt is issued inside a given namespace, we check if the table is found in the per-namespace list. If not, we attempt to find it in the initial namespace, and, if found, create an empty default table in the requesting namespace and register the needed hooks. Hook points are destroyed only once namespace is deleted, there is no 'usage count' (it makes no sense since there is no 'remove table' operation in xtables api). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-02 20:05:24 +01:00
Florian Westphal	a67dd266ad	netfilter: xtables: prepare for on-demand hook register This change prepares for upcoming on-demand xtables hook registration. We change the protoypes of the register/unregister functions. A followup patch will then add nf_hook_register/unregister calls to the iptables one. Once a hook is registered packets will be picked up, so all assignments of the form net->ipv4.iptable_$table = new_table have to be moved to ip(6)t_register_table, else we can see NULL net->ipv4.iptable_$table later. This patch doesn't change functionality; without this the actual change simply gets too big. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-02 20:05:23 +01:00
Joe Stringer	5f547391f5	netfilter: nf_defrag_ipv4: Drop redundant ip_send_check() Since commit `0848f6428b` ("inet: frags: fix defragmented packet's IP header for af_packet"), ip_send_check() would be called twice for defragmentation that occurs from netfilter ipv4 defrag hooks. Remove the extra call. Signed-off-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-02 20:05:22 +01:00
Pablo Neira Ayuso	a7ed31cf65	Merge tag 'ipvs-for-v4.6' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next into HEAD Simon Horman says: ==================== please consider these cleanups for IPVS for v4.6. * Arnd Bergmann has resolved a bunch of unused variable warnings and; * Yannick Brosseau has removed a noisy debug message ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-02 20:05:01 +01:00
Arnd Bergmann	fb653ebddc	batman-adv: clarify CFG80211 dependency The driver calls cfg80211_get_station, which may be part of a module, so we must not enable BATMAN_ADV_BATMAN_V if BATMAN_ADV=y and CFG80211=m: net/built-in.o: In function `batadv_v_elp_get_throughput': (text+0x5c62c): undefined reference to `cfg80211_get_station' This clarifies the dependency to cover all combinations. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `c833484e5f` ("batman-adv: ELP - compute the metric based on the estimated throughput") Acked-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 13:45:47 -05:00
David S. Miller	68df7247f4	Here are a few more fixes for the current cycle: * check GCMP encryption vs. fragmentation properly; we'd found this problem quite a while ago but waited for the 802.11 spec to be updated * fix RTS/CTS logic in minstrel_ht * fix RX of certain public action frames in AP mode * add mac80211_hwsim to MAC80211 in MAINTAINERS, this helps the kbuild robot pick up the right tree for it -----BEGIN PGP SIGNATURE----- iQIcBAABCgAGBQJW1u7EAAoJEGt7eEactAAdZXsP/il3dCxYtqYtBOzmR8OkDztn yJiapt9aZ/u1a1mfDKfsJ2/8OLsE4pY/fp3G8TttURKNB/26sHWUEYSHWV2Ecfmi 4QFvAiUeO9Ue50ggASno239hp51KIKGcVF5flqKrCjTaSSyUm8YR78S1NlH2N+lT i+i/AidHEdLnTgvYGOUhWu3QyL7Bm5t5wVZPivZUs0Zoos+tTokEj40eYmetQX27 lQMGa/xbvLHYeBcuLSgdQbzUR1jzOi70b9+QkjqhrYrlsc+MZE8ghwOgeAeQx56B 0sg0fZZVI8DIaCD5S3Pj8EP0WnP990DtM+4nSvtfFUYJxEM7LthEV+bs4Gr1MR2q dZESMisKoRWBf2SyUhoVeUO6bV5kY+eTz/wyQCTdFsp9u6jzOgiY7y2aliqbJFgD 0Ydbyqq1dAeRqFaTg/46OAXkUxHW7/8hQC78Cmintevd1nHivZC3eevG6xj22G1N eUvqy+OIv0d2Hg8+CL6oBsexQrKLg4bQFPR6s/doPnXtgsh5JccFYWujClsrNL26 kRb1kYrIqoVWXwnPp1dLF02s6Ny0+AW4xHLQIAwz5HACsJp+SjnpRP3WF9FTPs3V T71dXQ48MLrQ/EVidCpMwJYoUnZ+BWeUf5hHpkxILIP1f0elx3i50eshlMNA/kwb YzufoqiQhICciwEBAgBM =2rZC -----END PGP SIGNATURE----- Merge tag 'mac80211-for-davem-2016-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Here are a few more fixes for the current cycle: * check GCMP encryption vs. fragmentation properly; we'd found this problem quite a while ago but waited for the 802.11 spec to be updated * fix RTS/CTS logic in minstrel_ht * fix RX of certain public action frames in AP mode * add mac80211_hwsim to MAC80211 in MAINTAINERS, this helps the kbuild robot pick up the right tree for it ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 13:35:31 -05:00
David S. Miller	7da5ee09f1	With this patchset we finally introduce our new routing protocol: B.A.T.M.A.N. V. Its implementation started quite some years ago, but due to the big changes being introduced it took a while to be discussed, designed, worked, re-worked, tested and debugged (well, we're never done with the latest). The entire operation has basically been a team work involving all the core contributors together with other people interested in the project. The new protocol is divided into two main subcomponents, called respectively ELP and OGMv2. The former is in charge of dealing with the neighbour discovery and link quality estimation, while the latter implements the algorithm that spreads the metrics around the network and computes optimal paths. The biggest change introduced with B.A.T.M.A.N. V is the new metric: the protocol won't rely on packet loss anymore, but it will use the estimated throughput extracted directly from the wifi driver (when available) by querying cfg80211. Batman-adv will also send some unicast probing packets when an interface is not used for payload traffic to make sure that such values are current. The new protocol can be compiled-in or not like other features we have and when selected will pull in CFG80211 as dependency for the reason described above. Thanks to the big work brought up in the past by Marek Lindner, batman-adv can easily deal several protocol implementations, therefore compiling in this new version does not exclude the older. This means that the user is offered the option to choose the protocol when creating the mesh interface (default is the old one to keep backward compatibility). Along with the protocol there are some sysfs knobs that are introduced to fine tune some of its behaviours, but users are recommended to keep the default values unless they know what they are doing. The last patch is about advertising our own patchwork platform (thanks to Sven Eckelmann for having set that up!) in the MAINTAINERS file. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJW1AvOAAoJENpFlCjNi1MRKxsP/1BR8Wh3MrtNhX7MMe7PnZEQ p00kArkjtnZCbBVFbG3JduqYaOPnGyzCLgBgF3DnOxx8gAGUEu6fu4AQNNr9jiUM 4lPnPbe5NuvjzQW27fJDZMuy4Y1FLaJ5DeoYjDH48/YDNS38nzlPPtNwg0pJA2TG C22dM6CX9kzxiy5RETmjgIjK2fgIybudWtCv7BGZxNyiqMaq+bYD+ZJJ6/wbj/FV uKbWB28puJZ3U3qTJ6ygbhAq8moeubp/nbZeh2RGkhpmnS7LKgdLWyPxJLAEza7Z jbdhyAE2xDYaU2xxVdOH9GecSYeUlQ+mrBdEblo2cfKMgShwZBrNjQxCaJIHN3G9 ta44yUpjGFl6/HIQnzfNZD4bTRNnb/DeRTG5qWxdrGAbxZESsMPf1Ph/hZ+KvSZ5 7eSZPmVGRYl8hJaLc25bG8pCOLQRsx/54hXNqYRqmImiPQi4IRvKUQkEDXBpHrT+ EwChfIQvCJSzZQAljrhYP/ytg/CiZxNKgW1u8cJRlHRs3jbBxJUJ67gvf2yWdQdj Luc76uLA1++8QLqhny6AtLeRrr7mYDFnJuNjkwbqWFSyBK6r/lKS1L3ereTfeVIM 5VWMuEDlbWuPscufDJ6WDgWwTqypybjvYbnDlQV7xHap7OegyS/ukQVz2W5lXvS1 gGERfC8WeW1JnFb5TSAY =qd6p -----END PGP SIGNATURE----- Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge Antonio Quartulli says: ==================== batman-adv 20160229 this is our (hopefully) latest batch of patches intended for net-next. With this patchset we finally introduce B.A.T.M.A.N. V: the latest version of our routing protocol. Technical documentation describing the protocol in more detail can be found in our wiki[1][2][3][4]. For what concerns this pull request, you can find the high level description right below. [1] https://www.open-mesh.org/projects/batman-adv/wiki/BATMAN_V [2] https://www.open-mesh.org/projects/batman-adv/wiki/OGMv2 [3] https://www.open-mesh.org/projects/batman-adv/wiki/ELP [4] https://www.open-mesh.org/projects/batman-adv/wiki/BATMAN_V_Tests ... With this patchset we finally introduce our new routing protocol: B.A.T.M.A.N. V. Its implementation started quite some years ago, but due to the big changes being introduced it took a while to be discussed, designed, worked, re-worked, tested and debugged (well, we're never done with the latest). The entire operation has basically been a team work involving all the core contributors together with other people interested in the project. The new protocol is divided into two main subcomponents, called respectively ELP and OGMv2. The former is in charge of dealing with the neighbour discovery and link quality estimation, while the latter implements the algorithm that spreads the metrics around the network and computes optimal paths. The biggest change introduced with B.A.T.M.A.N. V is the new metric: the protocol won't rely on packet loss anymore, but it will use the estimated throughput extracted directly from the wifi driver (when available) by querying cfg80211. Batman-adv will also send some unicast probing packets when an interface is not used for payload traffic to make sure that such values are current. The new protocol can be compiled-in or not like other features we have and when selected will pull in CFG80211 as dependency for the reason described above. Thanks to the big work brought up in the past by Marek Lindner, batman-adv can easily deal several protocol implementations, therefore compiling in this new version does not exclude the older. This means that the user is offered the option to choose the protocol when creating the mesh interface (default is the old one to keep backward compatibility). Along with the protocol there are some sysfs knobs that are introduced to fine tune some of its behaviours, but users are recommended to keep the default values unless they know what they are doing. The last patch is about advertising our own patchwork platform (thanks to Sven Eckelmann for having set that up!) in the MAINTAINERS file. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:48:00 -05:00
Zhang Shengju	c145aeb3ff	net: pktgen: use reset to set mac header Since offset is zero, it's not necessary to use set function. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:46:54 -05:00
David S. Miller	241deec944	sch_mqprio: Fix build with older gcc. CC [M] net/sched/sch_mqprio.o net/sched/sch_mqprio.c: In function ?mqprio_init?: net/sched/sch_mqprio.c:145: error: unknown field ?tc? specified in initializer net/sched/sch_mqprio.c:145: warning: missing braces around initializer net/sched/sch_mqprio.c:145: warning: (near initialization for ?tc.<anonymous>?) make[2]: * [net/sched/sch_mqprio.o] Error 1 make[1]: * [net/sched] Error 2 make: *** [net] Error 2 Several people reported this, surround the unnamed union member initialization with braces to fix. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:44:59 -05:00
WANG Cong	64d4e3431e	net: remove skb_sender_cpu_clear() After commit `52bd2d62ce` ("net: better skb->sender_cpu and skb->napi_id cohabitation") skb_sender_cpu_clear() becomes empty and can be removed. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:36:47 -05:00
Xin Long	3d73e8fac8	sctp: sctp_remaddr_seq_show use the wrong variable to dump transport info Now in sctp_remaddr_seq_show(), we use variable tsp to get the param v. but tsp is also used to traversal transport_addr_list, which will cover the previous value, and make sctp_transport_put work on the wrong transport. So fix it by adding a new variable to get the param v. Fixes: `fba4c330c5` ("sctp: hold transport before we access t->asoc in sctp proc") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:35:44 -05:00
Xin Long	40b4f0fd74	sctp: lack the check for ports in sctp_v6_cmp_addr As the member .cmp_addr of sctp_af_inet6, sctp_v6_cmp_addr should also check the port of addresses, just like sctp_v4_cmp_addr, cause it's invoked by sctp_cmp_addr_exact(). Now sctp_v6_cmp_addr just check the port when two addresses have different family, and lack the port check for two ipv6 addresses. that will make sctp_hash_cmp() cannot work well. so fix it by adding ports comparison in sctp_v6_cmp_addr(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:25:12 -05:00
David Ahern	4f25a1110c	net: ipv6/l3mdev: Move host route on saved address if necessary Commit `f1705ec197` allows IPv6 addresses to be retained on a link down. The address can have a cached host route which can point to the wrong FIB table if the L3 enslavement is changed (e.g., route can point to local table instead of VRF table if device is added to an L3 domain). On link up check the table of the cached host route against the FIB table associated with the device and correct if needed. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:22:52 -05:00
Deepa Dinamani	6497c7e640	net: sctp: Convert log timestamps to be y2038 safe SCTP probe log timestamps use struct timespec which is not y2038 safe. Use struct timespec64 which is 2038 safe instead. Use monotonic time instead of real time as only time differences are logged. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-sctp@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:18:44 -05:00
Deepa Dinamani	b1b270d863	net: ipv4: tcp_probe: Replace timespec with timespec64 TCP probe log timestamps use struct timespec which is not y2038 safe. Even though timespec might be good enough here as it is used to represent delta time, the plan is to get rid of all uses of timespec in the kernel. Replace with struct timespec64 which is y2038 safe. Prints still use unsigned long format and type. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:18:44 -05:00
Deepa Dinamani	822c868532	net: ipv4: Convert IP network timestamps to be y2038 safe ICMP timestamp messages and IP source route options require timestamps to be in milliseconds modulo 24 hours from midnight UT format. Add inet_current_timestamp() function to support this. The function returns the required timestamp in network byte order. Timestamp calculation is also changed to call ktime_get_real_ts64() which uses struct timespec64. struct timespec64 is y2038 safe. Previously it called getnstimeofday() which uses struct timespec. struct timespec is not y2038 safe. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:18:44 -05:00
Jamal Hadi Salim	200e10f469	Support to encoding decoding skb prio on IFE action Example usage: Set the skb priority using skbedit then allow it to be encoded sudo tc qdisc add dev $ETH root handle 1: prio sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \ u32 match ip protocol 1 0xff flowid 1:2 \ action skbedit prio 17 \ action ife encode \ allow prio \ dst 02:15:15:15:15:15 Note: You dont need the skbedit action if you are already encoding the skb priority earlier. A zero skb priority will not be sent Alternative hard code static priority of decimal 33 (unlike skbedit) then mark of 0x12 every time the filter matches sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 \ u32 match ip protocol 1 0xff flowid 1:2 \ action ife encode \ type 0xDEAD \ use prio 33 \ use mark 0x12 \ dst 02:15:15:15:15:15 Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:15:23 -05:00
Jamal Hadi Salim	084e2f6566	Support to encoding decoding skb mark on IFE action Example usage: Set the skb using skbedit then allow it to be encoded sudo tc qdisc add dev $ETH root handle 1: prio sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \ u32 match ip protocol 1 0xff flowid 1:2 \ action skbedit mark 17 \ action ife encode \ allow mark \ dst 02:15:15:15:15:15 Note: You dont need the skbedit action if you are already encoding the skb mark earlier. A zero skb mark, when seen, will not be encoded. Alternative hard code static mark of 0x12 every time the filter matches sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 \ u32 match ip protocol 1 0xff flowid 1:2 \ action ife encode \ type 0xDEAD \ use mark 0x12 \ dst 02:15:15:15:15:15 Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:15:23 -05:00
Jamal Hadi Salim	ef6980b6be	introduce IFE action This action allows for a sending side to encapsulate arbitrary metadata which is decapsulated by the receiving end. The sender runs in encoding mode and the receiver in decode mode. Both sender and receiver must specify the same ethertype. At some point we hope to have a registered ethertype and we'll then provide a default so the user doesnt have to specify it. For now we enforce the user specify it. Lets show example usage where we encode icmp from a sender towards a receiver with an skbmark of 17; both sender and receiver use ethertype of 0xdead to interop. YYYY: Lets start with Receiver-side policy config: xxx: add an ingress qdisc sudo tc qdisc add dev $ETH ingress xxx: any packets with ethertype 0xdead will be subjected to ife decoding xxx: we then restart the classification so we can match on icmp at prio 3 sudo $TC filter add dev $ETH parent ffff: prio 2 protocol 0xdead \ u32 match u32 0 0 flowid 1:1 \ action ife decode reclassify xxx: on restarting the classification from above if it was an icmp xxx: packet, then match it here and continue to the next rule at prio 4 xxx: which will match based on skb mark of 17 sudo tc filter add dev $ETH parent ffff: prio 3 protocol ip \ u32 match ip protocol 1 0xff flowid 1:1 \ action continue xxx: match on skbmark of 0x11 (decimal 17) and accept sudo tc filter add dev $ETH parent ffff: prio 4 protocol ip \ handle 0x11 fw flowid 1:1 \ action ok xxx: Lets show the decoding policy sudo tc -s filter ls dev $ETH parent ffff: protocol 0xdead xxx: filter pref 2 u32 filter pref 2 u32 fh 800: ht divisor 1 filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1 (rule hit 0 success 0) match 00000000/00000000 at 0 (success 0 ) action order 1: ife decode action reclassify index 1 ref 1 bind 1 installed 14 sec used 14 sec type: 0x0 Metadata: allow mark allow hash allow prio allow qmap Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 xxx: Observe that above lists all metadatum it can decode. Typically these submodules will already be compiled into a monolithic kernel or loaded as modules YYYY: Lets show the sender side now .. xxx: Add an egress qdisc on the sender netdev sudo tc qdisc add dev $ETH root handle 1: prio xxx: xxx: Match all icmp packets to 192.168.122.237/24, then xxx: tag the packet with skb mark of decimal 17, then xxx: Encode it with: xxx: ethertype 0xdead xxx: add skb->mark to whitelist of metadatum to send xxx: rewrite target dst MAC address to 02:15:15:15:15:15 xxx: sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 u32 \ match ip dst 192.168.122.237/24 \ match ip protocol 1 0xff \ flowid 1:2 \ action skbedit mark 17 \ action ife encode \ type 0xDEAD \ allow mark \ dst 02:15:15:15:15:15 xxx: Lets show the encoding policy sudo tc -s filter ls dev $ETH parent 1: protocol ip xxx: filter pref 10 u32 filter pref 10 u32 fh 800: ht divisor 1 filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:2 (rule hit 0 success 0) match c0a87aed/ffffffff at 16 (success 0 ) match 00010000/00ff0000 at 8 (success 0 ) action order 1: skbedit mark 17 index 6 ref 1 bind 1 Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 action order 2: ife encode action pipe index 3 ref 1 bind 1 dst MAC: 02:15:15:15:15:15 type: 0xDEAD Metadata: allow mark Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 xxx: test by sending ping from sender to destination Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:15:22 -05:00
David S. Miller	d67703fced	Here's another round of updates for -next: * big A-MSDU RX performance improvement (avoid linearize of paged RX) * rfkill changes: cleanups, documentation, platform properties * basic PBSS support in cfg80211 * MU-MIMO action frame processing support * BlockAck reordering & duplicate detection offload support * various cleanups & little fixes -----BEGIN PGP SIGNATURE----- iQIcBAABCgAGBQJW0LZ0AAoJEGt7eEactAAdde0P/2meIOehuHBuAtL7REVoNhri bz9eSHMTg+ozCspL7F6vW1ifDI9AaJEaqJccmriueE/UQVC3VXRPGRJ4SFCwZGo9 Zrtys2v9wOq0+XhxyN65Ucf41O9F/5FFabR5OFbf/pZhW5b2cubEjD1P4BB76Iya 8O6wf9oDDjt3zJgYK+sygm3k9wtDVrH3qEbj8IDnCy22P7010qCsfok9swfaq8OB DBgb6BVfDOFTNXvJGH5fRuUKZdtovzzxorXnoG+zjmKmFdMVdgIYj9+2QfnMjW03 B4/W85svcLLH8V3lHZc4G8oKM4J4XtjH1PskKIMF7ThJsKGMf8tL2vpt9rr8iscd Y9SwTEGc9JmhL7n2FaQFlY6ScLcp4ML+2rXxDOMpBmgF3Ne3yfBsJhLKZEl8vSfI mKhzGXpUKjJxJWIxkR0ylJy4/zHeIXkgRlUEhb8t+jgAqvOBTwiVY+vljHCDUERa sH40r1OqnGJtOHkSRqXSpxwXW+eKgyDd7fnnRX/tyttp2Fuew27/fN63SjpsfN6O 3lfSM5bl3FcCKx7vqTLuqzsoqGvDDYkSq6GDfKDqeZIk0vaXA3SJNEOKgymFWQfR rzsaXvTbBT34GYRg3xS2NCxlmcBPemei/q0x6ZOffxhF41Qpqjs1dPB1Yq3AW4jD HGF+NdRbWEqEFVIjQa8w =JHOe -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2016-02-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Here's another round of updates for -next: * big A-MSDU RX performance improvement (avoid linearize of paged RX) * rfkill changes: cleanups, documentation, platform properties * basic PBSS support in cfg80211 * MU-MIMO action frame processing support * BlockAck reordering & duplicate detection offload support * various cleanups & little fixes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:03:27 -05:00
Nikolay Aleksandrov	59f78f9f6c	bridge: mcast: add support for more router port information dumping Allow for more multicast router port information to be dumped such as timer and type attributes. For that that purpose we need to extend the MDBA_ROUTER_PORT attribute similar to how it was done for the mdb entries recently. The new format is thus: [MDBA_ROUTER_PORT] = { <- nested attribute u32 ifindex <- router port ifindex for user-space compatibility [MDBA_ROUTER_PATTR attributes] } This way it remains compatible with older users (they'll simply retrieve the u32 in the beginning) and new users can parse the remaining attributes. It would also allow to add future extensions to the router port without breaking compatibility. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 16:55:07 -05:00
Nikolay Aleksandrov	a55d8246ab	bridge: mcast: add support for temporary port router Add support for a temporary router port which doesn't depend only on the incoming query. It can be refreshed if set to the same value, which is a no-op for the rest. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 16:55:07 -05:00
Nikolay Aleksandrov	4950cfd1e6	bridge: mcast: do nothing if port's multicast_router is set to the same val This is needed for the upcoming temporary port router. There's no point to go through the logic if the value is the same. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 16:55:07 -05:00
Nikolay Aleksandrov	7f0aec7a66	bridge: mcast: use names for the different multicast_router types Using raw values makes it difficult to extend and also understand the code, give them names and do explicit per-option manipulation in br_multicast_set_port_router. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 16:55:07 -05:00
Vivien Didelot	fb2dabad69	net: dsa: support VLAN filtering switchdev attr When a user explicitly requests VLAN filtering with something like: # echo 1 > /sys/class/net/<bridge>/bridge/vlan_filtering Switchdev propagates a SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING port attribute. Add support for it in the DSA layer with a new port_vlan_filtering function to let drivers toggle 802.1Q filtering on user demand. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 16:24:51 -05:00
Jiri Pirko	bfcd3a4661	Introduce devlink infrastructure Introduce devlink infrastructure for drivers to register and expose to userspace via generic Netlink interface. There are two basic objects defined: devlink - one instance for every "parent device", for example switch ASIC devlink port - one instance for every physical port of the device. This initial portion implements basic get/dump of objects to userspace. Also, port splitter and port type setting is implemented. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 16:07:29 -05:00
John Fastabend	9e8ce79cd7	net: sched: cls_u32 add bit to specify software only rules In the initial implementation the only way to stop a rule from being inserted into the hardware table was via the device feature flag. However this doesn't work well when working on an end host system where packets are expect to hit both the hardware and software datapaths. For example we can imagine a rule that will match an IP address and increment a field. If we install this rule in both hardware and software we may increment the field twice. To date we have only added support for the drop action so we have been able to ignore these cases. But as we extend the action support we will hit this example plus more such cases. Arguably these are not even corner cases in many working systems these cases will be common. To avoid forcing the driver to always abort (i.e. the above example) this patch adds a flag to add a rule in software only. A careful user can use this flag to build software and hardware datapaths that work together. One example we have found particularly useful is to use hardware resources to set the skb->mark on the skb when the match may be expensive to run in software but a mark lookup in a hash table is cheap. The idea here is hardware can do in one lookup what the u32 classifier may need to traverse multiple lists and hash tables to compute. The flag is only passed down on inserts. On deletion to avoid stale references in hardware we always try to remove a rule if it exists. The flags field is part of the classifier specific options. Although it is tempting to lift this into the generic structure doing this proves difficult do to how the tc netlink attributes are implemented along with how the dump/change routines are called. There is also precedence for putting seemingly generic pieces in the specific classifier options such as TCA_U32_POLICE, TCA_U32_ACT, etc. So although not ideal I've left FLAGS in the u32 options as well as it simplifies the code greatly and user space has already learned how to manage these bits ala 'tc' tool. Another thing if trying to update a rule we require the flags to be unchanged. This is to force user space, software u32 and the hardware u32 to keep in sync. Thanks to Simon Horman for catching this case. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 16:05:39 -05:00
John Fastabend	6843e7a2ab	net: sched: consolidate offload decision in cls_u32 The offload decision was originally very basic and tied to if the dev implemented the appropriate ndo op hook. The next step is to allow the user to more flexibly define if any paticular rule should be offloaded or not. In order to have this logic in one function lift the current check into a helper routine tc_should_offload(). Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 16:05:39 -05:00
Paolo Abeni	3a927bc7cf	ovs: propagate per dp max headroom to all vports This patch implements bookkeeping support to compute the maximum headroom for all the devices in each datapath. When said value changes, the underlying devs are notified via the ndo_set_rx_headroom method. This also increases the internal vports xmit performance. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 15:54:30 -05:00
Paolo Abeni	45493d47c3	bridge: notify enslaved devices of headroom changes On bridge needed_headroom changes, the enslaved devices are notified via the ndo_set_rx_headroom method Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 15:54:30 -05:00
Felix Fietkau	c36dd3eaf1	mac80211: minstrel_ht: fix a logic error in RTS/CTS handling RTS/CTS needs to be enabled if the rate is a fallback rate or if it's a dual-stream rate and the sta is in dynamic SMPS mode. Cc: stable@vger.kernel.org Fixes: `a3ebb4e1b7` ("mac80211: minstrel_ht: handle peers in dynamic SMPS") Reported-by: Matías Richart <mrichart@fing.edu.uy> Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-03-01 15:49:51 +01:00
Jouni Malinen	1ec7bae8be	mac80211: Fix Public Action frame RX in AP mode Public Action frames use special rules for how the BSSID field (Address 3) is set. A wildcard BSSID is used in cases where the transmitter and recipient are not members of the same BSS. As such, we need to accept Public Action frames with wildcard BSSID. Commit `db8e173245` ("mac80211: ignore frames between TDLS peers when operating as AP") added a rule that drops Action frames to TDLS-peers based on an Action frame having different DA (Address 1) and BSSID (Address 3) values. This is not correct since it misses the possibility of BSSID being a wildcard BSSID in which case the Address 1 would not necessarily match. Fix this by allowing mac80211 to accept wildcard BSSID in an Action frame when in AP mode. Fixes: `db8e173245` ("mac80211: ignore frames between TDLS peers when operating as AP") Cc: stable@vger.kernel.org Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-03-01 15:45:04 +01:00
Johannes Berg	9acc54beb4	mac80211: check PN correctly for GCMP-encrypted fragmented MPDUs Just like for CCMP we need to check that for GCMP the fragments have PNs that increment by one; the spec was updated to fix this security issue and now has the following text: The receiver shall discard MSDUs and MMPDUs whose constituent MPDU PN values are not incrementing in steps of 1. Adapt the code for CCMP to work for GCMP as well, luckily the relevant fields already alias each other so no code duplication is needed (just check the aliasing with BUILD_BUG_ON.) Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-03-01 15:42:01 +01:00
WANG Cong	bdf17661f6	sch_dsmark: update backlog as well Similarly, we need to update backlog too when we update qlen. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-29 17:02:33 -05:00
WANG Cong	431e3a8e36	sch_htb: update backlog as well We saw qlen!=0 but backlog==0 on our production machine: qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 1 direct_packets_stat 0 ver 3.17 Sent 172680457356 bytes 222469449 pkt (dropped 0, overlimits 123575834 requeues 0) backlog 0b 72p requeues 0 The problem is we only count qlen for HTB qdisc but not backlog. We need to update backlog too when we update qlen, so that we can at least know the average packet length. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-29 17:02:33 -05:00
WANG Cong	2ccccf5fb4	net_sched: update hierarchical backlog too When the bottom qdisc decides to, for example, drop some packet, it calls qdisc_tree_decrease_qlen() to update the queue length for all its ancestors, we need to update the backlog too to keep the stats on root qdisc accurate. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-29 17:02:33 -05:00
WANG Cong	86a7996cc8	net_sched: introduce qdisc_replace() helper Remove nearly duplicated code and prepare for the following patch. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-29 17:02:33 -05:00
Sudip Mukherjee	8ee225e785	netfilter: xt_osf: remove unused variable While building with W=1 we got the warning: net/netfilter/xt_osf.c:265:9: warning: variable 'loop_cont' set but not used The local variable loop_cont was only initialized and then assigned a value but was never used or checked after that. While removing the variable, the case of OSFOPT_TS was not removed so that it will serve as a reminder to us that we can do something in that particular case. Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-02-29 13:59:43 +01:00
Florian Westphal	b07edbe1cf	netfilter: meta: add PRANDOM support Can be used to randomly match packets e.g. for statistic traffic sampling. See commit `3ad0040573` ("bpf: split state from prandom_u32() and consolidate {c, e}BPF prngs") for more info why this doesn't use prandom_u32 directly. Unlike bpf nft_meta can be built as a module, so add an EXPORT_SYMBOL for prandom_seed_full_state too. Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-02-29 13:55:59 +01:00
Simon Wunderlich	97575407e9	batman-adv: Start new development cycle Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Antonio Quartulli <a@unstable.cc>	2016-02-29 16:25:08 +08:00
Linus Luessing	626d23e83c	batman-adv: B.A.T.M.A.N. V - implement bat_neigh_print API Lists all neighbours detected by the Echo Locating Protocol (ELP) and their throughput metric. Initially Developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <linus.luessing@web.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>	2016-02-29 16:25:08 +08:00
Antonio Quartulli	261e264db6	batman-adv: B.A.T.M.A.N. V - implement bat_orig_print API Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2016-02-29 16:25:07 +08:00
Antonio Quartulli	9786906022	batman-adv: B.A.T.M.A.N. V - implement neighbor comparison API calls Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2016-02-29 16:25:07 +08:00
Antonio Quartulli	8d2d499e08	batman-adv: ELP - send unicast ELP packets for throughput sampling In case of an unused wireless link, the mac80211 throughput estimation won't get updated further. Consequently, the reported throughput metric will become obsolete. With this patch unicast sampling is introduced by periodically sending unicast ELP packets to each neighbor on idle WiFi links. These sampling packets will fill an entire frame, so that the measurement is as reliable as possible Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2016-02-29 16:25:07 +08:00
Antonio Quartulli	c833484e5f	batman-adv: ELP - compute the metric based on the estimated throughput In case of wireless interface retrieve the throughput by querying cfg80211. To perform this call a separate work must be scheduled because the function may sleep and this is not allowed within an RCU protected context (RCU in this case is used to iterate over all the neighbours). Use ethtool to retrieve information about an Ethernet link like HALF/FULL_DUPLEX and advertised bandwidth (e.g. 100/10Mbps). The metric is updated each time a new ELP packet is sent, this way it is possible to timely react to a metric variation which can imply (for example) a neighbour disconnection. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2016-02-29 16:25:06 +08:00
Antonio Quartulli	95d392784d	batman-adv: keep track of when unicast packets are sent To enable ELP to send probing packets over wireless links only if needed, batman-adv must keep track of the last time it sent a unicast packet towards every neighbour. For this purpose a 2 main changes are introduced: 1) a new member of the elp_neigh_node structure stores the last time a unicast packet was sent towards this neighbour; 2) a wrapper function for sending unicast packets is implemented. This function will simply update the member describe din point 1) and then forward the packet to the real sending routine. Point 2) implies that any code-path leading to a unicast sending now has to use the new wrapper. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2016-02-29 16:05:32 +08:00
Antonio Quartulli	0b5ecc6811	batman-adv: add throughput override attribute to hard_ifaces This attribute is exported to user space to disable the link throughput auto-detection by setting a fixed value. The throughput override value is used when batman-adv is computing the link throughput towards a neighbour. If the value is set to 0 then batman-adv will try to detect the throughput by itself. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2016-02-29 16:05:32 +08:00
Antonio Quartulli	9323158ef9	batman-adv: OGMv2 - implement originators logic Add the support for recognising new originators in the network and rebroadcast their OGMs. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2016-02-29 16:05:31 +08:00
Antonio Quartulli	0da0035942	batman-adv: OGMv2 - add basic infrastructure This is the initial implementation of the new OGM protocol (version 2). It has been designed to work on top of the newly added ELP. In the previous version the OGM protocol was used to both measure link qualities and flood the network with the metric information. In this version the protocol is in charge of the latter task only, leaving the former to ELP. This means being able to decouple the interval used by the neighbor discovery from the OGM broadcasting, which revealed to be costly in dense networks and needed to be relaxed so leading to a less responsive routing protocol. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2016-02-29 16:05:31 +08:00
Linus Luessing	7f136cd491	batman-adv: ELP - adding sysfs parameter for elp interval This parameter can be set individually on each interface and allows the configuration of the elp interval for the link quality measurements during runtime. Usually it is desirable to set it to a higher (= slower) value on interfaces which have a more static characteristic (e.g. wired interfaces) or very dense neighbourhoods to reduce overhead. Developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <linus.luessing@web.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> [antonio@open-mesh.com: respin on top of the latest master] Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>	2016-02-29 16:05:30 +08:00
Linus Luessing	162bd64c24	batman-adv: ELP - creating neighbor structures Initially developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <linus.luessing@web.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>	2016-02-29 16:05:30 +08:00
Linus Luessing	d6f94d91f7	batman-adv: ELP - adding basic infrastructure The B.A.T.M.A.N. protocol originally only used a single message type (called OGM) to determine the link qualities to the direct neighbors and spreading these link quality information through the whole mesh. This procedure is summarized on the BATMAN concept page and explained in details in the RFC draft published in 2008. This approach was chosen for its simplicity during the protocol design phase and the implementation. However, it also bears some drawbacks: * Wireless interfaces usually come with some packet loss, therefore a higher broadcast rate is desirable to allow a fast reaction on flaky connections. Other interfaces of the same host might be connected to Ethernet LANs / VPNs / etc which rarely exhibit packet loss would benefit from a lower broadcast rate to reduce overhead. * It generally is more desirable to detect local link quality changes at a faster rate than propagating all these changes through the entire mesh (the far end of the mesh does not need to care about local link quality changes that much). Other optimizations strategies, like reducing overhead, might be possible if OGMs weren't used for all tasks in the mesh at the same time. As a result detecting local link qualities shall be handled by an independent message type, ELP, whereas the OGM message type remains responsible for flooding the mesh with these link quality information and determining the overall path transmit qualities. Developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <linus.luessing@web.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>	2016-02-29 16:05:29 +08:00
Linus Luessing	0744ff8fa8	batman-adv: Add hard_iface specific sysfs wrapper macros for UINT This allows us to easily add a sysfs parameter for an unsigned int later, which is not for a batman mesh interface (e.g. bat0), but for a common interface instead. It allows reading and writing an atomic_t in hard_iface (instead of bat_priv compared to the mesh variant). Developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <linus.luessing@web.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> [antonio@open-mesh.com: rename functions and move macros] Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>	2016-02-29 16:05:29 +08:00
MINOURA Makoto / 箕浦真	472681d57a	net: ndo_fdb_dump should report -EMSGSIZE to rtnl_fdb_dump. When the send skbuff reaches the end, nlmsg_put and friends returns -EMSGSIZE but it is silently thrown away in ndo_fdb_dump. It is called within a for_each_netdev loop and the first fdb entry of a following netdev could fit in the remaining skbuff. This breaks the mechanism of cb->args[0] and idx to keep track of the entries that are already dumped, which results missing entries in bridge fdb show command. Signed-off-by: Minoura Makoto <minoura@valinux.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-26 15:04:02 -05:00
Alexander Duyck	2246387662	GSO: Provide software checksum of tunneled UDP fragmentation offload On reviewing the code I realized that GRE and UDP tunnels could cause a kernel panic if we used GSO to segment a large UDP frame that was sent through the tunnel with an outer checksum and hardware offloads were not available. In order to correct this we need to update the feature flags that are passed to the skb_segment function so that in the event of UDP fragmentation being requested for the inner header the segmentation function will correctly generate the checksum for the payload if we cannot segment the outer header. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-26 14:23:35 -05:00
David Lamparter	17b693cdd8	net: l3mdev: prefer VRF master for source address selection When selecting an address in context of a VRF, the vrf master should be preferred for address selection. If it isn't, the user has a hard time getting the system to select to their preference - the code will pick the address off the first in-VRF interface it can find, which on a router could well be a non-routable address. Signed-off-by: David Lamparter <equinox@diac24.net> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> [dsa: Fixed comment style and removed extra blank link ] Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-26 14:22:26 -05:00
David Ahern	3f2fb9a834	net: l3mdev: address selection should only consider devices in L3 domain David Lamparter noted a use case where the source address selection fails to pick an address from a VRF interface - unnumbered interfaces. Relevant commands from his script: ip addr add 9.9.9.9/32 dev lo ip link set lo up ip link add name vrf0 type vrf table 101 ip rule add oif vrf0 table 101 ip rule add iif vrf0 table 101 ip link set vrf0 up ip addr add 10.0.0.3/32 dev vrf0 ip link add name dummy2 type dummy ip link set dummy2 master vrf0 up --> note dummy2 has no address - unnumbered device ip route add 10.2.2.2/32 dev dummy2 table 101 ip neigh add 10.2.2.2 dev dummy2 lladdr 02:00:00:00:00:02 tcpdump -ni dummy2 & And using ping instead of his socat example: $ ping -I vrf0 -c1 10.2.2.2 ping: Warning: source address might be selected on device other than vrf0. PING 10.2.2.2 (10.2.2.2) from 9.9.9.9 vrf0: 56(84) bytes of data. >From tcpdump: 12:57:29.449128 IP 9.9.9.9 > 10.2.2.2: ICMP echo request, id 2491, seq 1, length 64 Note the source address is from lo and is not a VRF local address. With this patch: $ ping -I vrf0 -c1 10.2.2.2 PING 10.2.2.2 (10.2.2.2) from 10.0.0.3 vrf0: 56(84) bytes of data. >From tcpdump: 12:59:25.096426 IP 10.0.0.3 > 10.2.2.2: ICMP echo request, id 2113, seq 1, length 64 Now the source address comes from vrf0. The ipv4 function for selecting source address takes a const argument. Removing the const requires touching a lot of places, so instead l3mdev_master_ifindex_rcu is changed to take a const argument and then do the typecast to non-const as required by netdev_master_upper_dev_get_rcu. This is similar to what l3mdev_fib_table_rcu does. IPv6 for unnumbered interfaces appears to be selecting the addresses properly. Cc: David Lamparter <david@opensourcerouting.org> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-26 14:22:26 -05:00
Linus Torvalds	29a9faa641	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "There are two small messenger bug fixes and a log spam regression fix" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: don't spam dmesg with stray reply warnings libceph: use the right footer size when skipping a message libceph: don't bail early from try_read() when skipping a message	2016-02-26 09:35:03 -08:00
Alexander Aring	2306f65637	6lowpan: iphc: fix invalid case handling This patch fixes the return value in a case which should never occur. Instead returning "-EINVAL" we return LOWPAN_IPHC_DAM_00 which is invalid on context based addresses. Also change the WARN_ON_ONCE to WARN_ONCE which was suggested by Dan Carpenter. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2016-02-26 09:08:15 +01:00
Linus Torvalds	81904dbbb4	One fix for a bug that could cause a NULL write past the end of a buffer in case of unusually long writes to some system interfaces used by mountd and other nfs support utilities. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWzlVJAAoJECebzXlCjuG+fVYQAMHQX2fx33juQF03r7R4A+cI 8QITgkAEDyQLcxYu/NdXqOXYc1Nv83uJRXljS2d2Y/D25spRB8ZaIZtpVFN/RlBc 8ss3nnI/iHFd6fqPECZfgHcORwalY+eYMICc2GdXFADYjyMQDD2/xIxF0GqW2RGC xLKPPxnrcaIcgUGhFgbptpykKeYXHTK4vb/xwjkgI6m+IISFULAG4Z1jfsbIX+4N NcNtTumgVQ1lFmMrl36K1JXj/0PY9vPPyDG38nSbD0s6zsS1mm6ALCt9N+mPJdX8 yJAq1FL70HtsBD9/oqm/qg9qpJ1dSB3r867HTXlECp6+mYwNi56u3I0jTJKbqwmJ 4eYlgQr1TPVUoa2vzbt/PE4yuKRsbo52TP1cX8VF+/bEtg0R5sz+uSTQZE9hqJE/ Cfk1eXYJxwVY8BnBY5tgmiGwApN13l/1fnhdZltWWsMEZiYvU46JIhVWv/8zMaA7 XOKP1r7JkWXcbPPvI7/KOxi5O4kzs9Mr4kT6xULrAHEbElPak6MIoODwJOLOMh1X z82HHvfEd3mwRhrDDg2v0MrdikRx+mtaB3zfNFFbXcHAChRCqCQ0OHxOCE9EJhCg GVsRpRjzafcSW+NwiQrn8OpjKJ7cSaa4gFXFaikQMFHYaYgavpJVWanH3fX6iKWV RRPufAXYsyRNFmTi4AOb =ezck -----END PGP SIGNATURE----- Merge tag 'nfsd-4.5-1' of git://linux-nfs.org/~bfields/linux Pull nfsd bugfix from Bruce Fields: "One fix for a bug that could cause a NULL write past the end of a buffer in case of unusually long writes to some system interfaces used by mountd and other nfs support utilities" * tag 'nfsd-4.5-1' of git://linux-nfs.org/~bfields/linux: sunrpc/cache: fix off-by-one in qword_get()	2016-02-25 19:31:01 -08:00
David Decotigny	3237fc63a3	net: ethtool: remove unused __ethtool_get_settings replaced by __ethtool_get_link_ksettings. Signed-off-by: David Decotigny <decot@googlers.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-25 22:06:47 -05:00
David Decotigny	7cad1bac96	net: core: use __ethtool_get_ksettings Signed-off-by: David Decotigny <decot@googlers.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-25 22:06:47 -05:00
David Decotigny	702b26a24d	net: bridge: use __ethtool_get_ksettings Signed-off-by: David Decotigny <decot@googlers.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-25 22:06:46 -05:00
David Decotigny	577097981d	net: 8021q: use __ethtool_get_ksettings Signed-off-by: David Decotigny <decot@googlers.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-25 22:06:46 -05:00
David Decotigny	3f1ac7a700	net: ethtool: add new ETHTOOL_xLINKSETTINGS API This patch defines a new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API, handled by the new get_link_ksettings/set_link_ksettings callbacks. This API provides support for most legacy ethtool_cmd fields, adds support for larger link mode masks (up to 4064 bits, variable length), and removes ethtool_cmd deprecated fields (transceiver/maxrxpkt/maxtxpkt). This API is deprecating the legacy ETHTOOL_GSET/SSET API and provides the following backward compatibility properties: - legacy ethtool with legacy drivers: no change, still using the get_settings/set_settings callbacks. - legacy ethtool with new get/set_link_ksettings drivers: the new driver callbacks are used, data internally converted to legacy ethtool_cmd. ETHTOOL_GSET will return only the 1st 32b of each link mode mask. ETHTOOL_SSET will fail if user tries to set the ethtool_cmd deprecated fields to non-0 (transceiver/maxrxpkt/maxtxpkt). A kernel warning is logged if driver sets higher bits. - future ethtool with legacy drivers: no change, still using the get_settings/set_settings callbacks, internally converted to new data structure. Deprecated fields (transceiver/maxrxpkt/maxtxpkt) will be ignored and seen as 0 from user space. Note that that "future" ethtool tool will not allow changes to these deprecated fields. - future ethtool with new drivers: direct call to the new callbacks. By "future" ethtool, what is meant is: - query: first try ETHTOOL_GLINKSETTINGS, and revert to ETHTOOL_GSET if fails - set: query first and remember which of ETHTOOL_GLINKSETTINGS or ETHTOOL_GSET was successful + if ETHTOOL_GLINKSETTINGS was successful, then change config with ETHTOOL_SLINKSETTINGS. A failure there is final (do not try ETHTOOL_SSET). + otherwise ETHTOOL_GSET was successful, change config with ETHTOOL_SSET. A failure there is final (do not try ETHTOOL_SLINKSETTINGS). The interaction user/kernel via the new API requires a small ETHTOOL_GLINKSETTINGS handshake first to agree on the length of the link mode bitmaps. If kernel doesn't agree with user, it returns the bitmap length it is expecting from user as a negative length (and cmd field is 0). When kernel and user agree, kernel returns valid info in all fields (ie. link mode length > 0 and cmd is ETHTOOL_GLINKSETTINGS). Data structure crossing user/kernel boundary is 32/64-bit agnostic. Converted internally to a legal kernel bitmap. The internal __ethtool_get_settings kernel helper will gradually be replaced by __ethtool_get_link_ksettings by the time the first "link_settings" drivers start to appear. So this patch doesn't change it, it will be removed before it needs to be changed. Signed-off-by: David Decotigny <decot@googlers.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-25 22:06:45 -05:00
Tom Herbert	a87cb3e48e	net: Facility to report route quality of connected sockets This patch add the SO_CNX_ADVICE socket option (setsockopt only). The purpose is to allow an application to give feedback to the kernel about the quality of the network path for a connected socket. The value argument indicates the type of quality report. For this initial patch the only supported advice is a value of 1 which indicates "bad path, please reroute"-- the action taken by the kernel is to call dst_negative_advice which will attempt to choose a different ECMP route, reset the TX hash for flow label and UDP source port in encapsulation, etc. This facility should be useful for connected UDP sockets where only the application can provide any feedback about path quality. It could also be useful for TCP applications that have additional knowledge about the path outside of the normal TCP control loop. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-25 22:01:22 -05:00
David Ahern	f1705ec197	net: ipv6: Make address flushing on ifdown optional Currently, all ipv6 addresses are flushed when the interface is configured down, including global, static addresses: $ ip -6 addr show dev eth1 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000 inet6 2100:1::2/120 scope global valid_lft forever preferred_lft forever inet6 fe80::e0:f9ff:fe79:34bd/64 scope link valid_lft forever preferred_lft forever $ ip link set dev eth1 down $ ip -6 addr show dev eth1 << nothing; all addresses have been flushed>> Add a new sysctl to make this behavior optional. The new setting defaults to flush all addresses to maintain backwards compatibility. When the set global addresses with no expire times are not flushed on an admin down. The sysctl is per-interface or system-wide for all interfaces $ sysctl -w net.ipv6.conf.eth1.keep_addr_on_down=1 or $ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1 Will keep addresses on eth1 on an admin down. $ ip -6 addr show dev eth1 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000 inet6 2100:1::2/120 scope global valid_lft forever preferred_lft forever inet6 fe80::e0:f9ff:fe79:34bd/64 scope link valid_lft forever preferred_lft forever $ ip link set dev eth1 down $ ip -6 addr show dev eth1 3: eth1: <BROADCAST,MULTICAST> mtu 1500 state DOWN qlen 1000 inet6 2100:1::2/120 scope global tentative valid_lft forever preferred_lft forever inet6 fe80::e0:f9ff:fe79:34bd/64 scope link tentative valid_lft forever preferred_lft forever Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-25 21:45:15 -05:00
Florian Westphal	619b17452a	tipc: fix null deref crash in compat config path msg.dst_sk needs to be set up with a valid socket because some callbacks later derive the netns from it. Fixes: 263ea09084d172d ("Revert "genl: Add genlmsg_new_unicast() for unicast message allocation") Reported-by: Jon Maloy <maloy@donjonn.com> Bisected-by: Jon Maloy <maloy@donjonn.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-25 17:04:48 -05:00
Jon Paul Maloy	d25a01257e	tipc: fix crash during node removal When the TIPC module is unloaded, we have identified a race condition that allows a node reference counter to go to zero and the node instance being freed before the node timer is finished with accessing it. This leads to occasional crashes, especially in multi-namespace environments. The scenario goes as follows: CPU0:(node_stop) CPU1:(node_timeout) // ref == 2 1: if(!mod_timer()) 2: if (del_timer()) 3: tipc_node_put() // ref -> 1 4: tipc_node_put() // ref -> 0 5: kfree_rcu(node); 6: tipc_node_get(node) 7: // BOOM! We now clean up this functionality as follows: 1) We remove the node pointer from the node lookup table before we attempt deactivating the timer. This way, we reduce the risk that tipc_node_find() may obtain a valid pointer to an instance marked for deletion; a harmless but undesirable situation. 2) We use del_timer_sync() instead of del_timer() to safely deactivate the node timer without any risk that it might be reactivated by the timeout handler. There is no risk of deadlock here, since the two functions never touch the same spinlocks. 3: We remove a pointless tipc_node_get() + tipc_node_put() from the timeout handler. Reported-by: Zhijiang Hu <huzhijiang@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-25 17:04:48 -05:00
Jon Paul Maloy	b170997ace	tipc: eliminate risk of finding to-be-deleted node instance Although we have never seen it happen, we have identified the following problematic scenario when nodes are stopped and deleted: CPU0: CPU1: tipc_node_xxx() //ref == 1 tipc_node_put() //ref -> 0 tipc_node_find() // node still in table tipc_node_delete() list_del_rcu(n. list) tipc_node_get() //ref -> 1, bad kfree_rcu() tipc_node_put() //ref to 0 again. kfree_rcu() // BOOM! We fix this by introducing use of the conditional kref_get_if_not_zero() instead of kref_get() in the function tipc_node_find(). This eliminates any risk of post-mortem access. Reported-by: Zhijiang Hu <huzhijiang@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-25 17:04:48 -05:00
Linus Lüssing	9b368814b3	net: fix bridge multicast packet checksum validation We need to update the skb->csum after pulling the skb, otherwise an unnecessary checksum (re)computation can ocure for IGMP/MLD packets in the bridge code. Additionally this fixes the following splats for network devices / bridge ports with support for and enabled RX checksum offloading: [...] [ 43.986968] eth0: hw csum failure [ 43.990344] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0 #2 [ 43.996193] Hardware name: BCM2709 [ 43.999647] [<800204e0>] (unwind_backtrace) from [<8001cf14>] (show_stack+0x10/0x14) [ 44.007432] [<8001cf14>] (show_stack) from [<801ab614>] (dump_stack+0x80/0x90) [ 44.014695] [<801ab614>] (dump_stack) from [<802e4548>] (__skb_checksum_complete+0x6c/0xac) [ 44.023090] [<802e4548>] (__skb_checksum_complete) from [<803a055c>] (ipv6_mc_validate_checksum+0x104/0x178) [ 44.032959] [<803a055c>] (ipv6_mc_validate_checksum) from [<802e111c>] (skb_checksum_trimmed+0x130/0x188) [ 44.042565] [<802e111c>] (skb_checksum_trimmed) from [<803a06e8>] (ipv6_mc_check_mld+0x118/0x338) [ 44.051501] [<803a06e8>] (ipv6_mc_check_mld) from [<803b2c98>] (br_multicast_rcv+0x5dc/0xd00) [ 44.060077] [<803b2c98>] (br_multicast_rcv) from [<803aa510>] (br_handle_frame_finish+0xac/0x51c) [...] Fixes: `9afd85c9e4` ("net: Export IGMP/MLD message validation code") Reported-by: Álvaro Fernández Rojas <noltari@gmail.com> Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-25 16:16:38 -05:00
Vivien Didelot	477b184526	net: dsa: drop vlan_getnext The VLAN GetNext operation is specific to some switches, and thus can be complicated to implement for some drivers. Remove the support for the vlan_getnext/port_pvid_get approach in favor of the generic and simpler port_vlan_dump function. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-25 15:20:21 -05:00

1 2 3 4 5 ...

41170 Commits