linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-16 00:34:20 +08:00

Author	SHA1	Message	Date
David S. Miller	3ae55826ae	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree, they are: 1) Validate hooks for nf_tables NAT expressions, otherwise users can crash the kernel when using them from the wrong hook. We already got one user trapped on this when configuring masquerading. 2) Fix a BUG splat in nf_tables with CONFIG_DEBUG_PREEMPT=y. Reported by Andreas Schultz. 3) Avoid unnecessary reroute of traffic in the local input path in IPVS that triggers a crash in in xfrm. Reported by Florian Wiessner and fixes by Julian Anastasov. 4) Fix memory and module refcount leak from the error path of nf_tables_newchain(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-02 19:30:53 -08:00
Richard Weinberger	e6b02be81b	Documentation: Update netlink_mmap.txt Update netlink_mmap.txt wrt. commit `4682a03586` ("netlink: Always copy on mmap TX."). Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-02 18:50:00 -08:00
David L Stevens	44ba582bea	sunvnet: set queue mapping when doing packet copies This patch fixes a bug where vnet_skb_shape() didn't set the already-selected queue mapping when a packet copy was required. This results in using the wrong queue index for stops/starts, hung tx queues and watchdog timeouts under heavy load. Signed-off-by: David L Stevens <david.stevens@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-02 18:20:35 -08:00
Marcelo Leitner	61132bf7fb	qlge: Fix qlge_update_hw_vlan_features to handle if interface is down Currently qlge_update_hw_vlan_features() will always first put the interface down, then update features and then bring it up again. But it is possible to hit this code while the adapter is down and this causes a non-paired call to napi_disable(), which will get stuck. This patch fixes it by skipping these down/up actions if the interface is already down. Fixes: `a45adbe8d3` ("qlge: Enhance nested VLAN (Q-in-Q) handling.") Cc: Harish Patil <harish.patil@qlogic.com> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-02 17:51:14 -08:00
Eric Dumazet	bdbbb8527b	ipv4: tcp: get rid of ugly unicast_sock In commit `be9f4a44e7` ("ipv4: tcp: remove per net tcp_sock") I tried to address contention on a socket lock, but the solution I chose was horrible : commit `3a7c384ffd` ("ipv4: tcp: unicast_sock should not land outside of TCP stack") addressed a selinux regression. commit `0980e56e50` ("ipv4: tcp: set unicast_sock uc_ttl to -1") took care of another regression. commit `b5ec8eeac4` ("ipv4: fix ip_send_skb()") fixed another regression. commit `811230cd85` ("tcp: ipv4: initialize unicast_sock sk_pacing_rate") was another shot in the dark. Really, just use a proper socket per cpu, and remove the skb_orphan() call, to re-enable flow control. This solves a serious problem with FQ packet scheduler when used in hostile environments, as we do not want to allocate a flow structure for every RST packet sent in response to a spoofed packet. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-01 23:06:19 -08:00
Eric Dumazet	0d32ef8cef	net: sched: fix panic in rate estimators Doing the following commands on a non idle network device panics the box instantly, because cpu_bstats gets overwritten by stats. tc qdisc add dev eth0 root <your_favorite_qdisc> ... some traffic (one packet is enough) ... tc qdisc replace dev eth0 root est 1sec 4sec <your_favorite_qdisc> [ 325.355596] BUG: unable to handle kernel paging request at ffff8841dc5a074c [ 325.362609] IP: [<ffffffff81541c9e>] __gnet_stats_copy_basic+0x3e/0x90 [ 325.369158] PGD `1fa7067` PUD 0 [ 325.372254] Oops: 0000 [#1] SMP [ 325.375514] Modules linked in: ... [ 325.398346] CPU: 13 PID: 14313 Comm: tc Not tainted 3.19.0-smp-DEV #1163 [ 325.412042] task: ffff8800793ab5d0 ti: ffff881ff2fa4000 task.ti: ffff881ff2fa4000 [ 325.419518] RIP: 0010:[<ffffffff81541c9e>] [<ffffffff81541c9e>] __gnet_stats_copy_basic+0x3e/0x90 [ 325.428506] RSP: 0018:ffff881ff2fa7928 EFLAGS: 00010286 [ 325.433824] RAX: 000000000000000c RBX: ffff881ff2fa796c RCX: 000000000000000c [ 325.440988] RDX: ffff8841dc5a0744 RSI: 0000000000000060 RDI: 0000000000000060 [ 325.448120] RBP: ffff881ff2fa7948 R08: ffffffff81cd4f80 R09: 0000000000000000 [ 325.455268] R10: ffff883ff223e400 R11: 0000000000000000 R12: 000000015cba0744 [ 325.462405] R13: ffffffff81cd4f80 R14: ffff883ff223e460 R15: ffff883feea0722c [ 325.469536] FS: 00007f2ee30fa700(0000) GS:ffff88407fa20000(0000) knlGS:0000000000000000 [ 325.477630] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 325.483380] CR2: ffff8841dc5a074c CR3: 0000003feeae9000 CR4: 00000000001407e0 [ 325.490510] Stack: [ 325.492524] ffff883feea0722c ffff883fef719dc0 ffff883feea0722c ffff883ff223e4a0 [ 325.499990] ffff881ff2fa79a8 ffffffff815424ee ffff883ff223e49c 000000015cba0744 [ 325.507460] 00000000f2fa7978 0000000000000000 ffff881ff2fa79a8 ffff883ff223e4a0 [ 325.514956] Call Trace: [ 325.517412] [<ffffffff815424ee>] gen_new_estimator+0x8e/0x230 [ 325.523250] [<ffffffff815427aa>] gen_replace_estimator+0x4a/0x60 [ 325.529349] [<ffffffff815718ab>] tc_modify_qdisc+0x52b/0x590 [ 325.535117] [<ffffffff8155edd0>] rtnetlink_rcv_msg+0xa0/0x240 [ 325.540963] [<ffffffff8155ed30>] ? __rtnl_unlock+0x20/0x20 [ 325.546532] [<ffffffff8157f811>] netlink_rcv_skb+0xb1/0xc0 [ 325.552145] [<ffffffff8155b355>] rtnetlink_rcv+0x25/0x40 [ 325.557558] [<ffffffff8157f0d8>] netlink_unicast+0x168/0x220 [ 325.563317] [<ffffffff8157f47c>] netlink_sendmsg+0x2ec/0x3e0 Lets play safe and not use an union : percpu 'pointers' are mostly read anyway, and we have typically few qdiscs per host. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Fixes: `22e0f8b932` ("net: sched: make bstats per cpu and estimator RCU safe") Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-31 17:49:37 -08:00
Haiyang Zhang	d953ca4ddf	hyperv: Fix the error processing in netvsc_send() The existing code frees the skb in EAGAIN case, in which the skb will be retried from upper layer and used again. Also, the existing code doesn't free send buffer slot in error case, because there is no completion message for unsent packets. This patch fixes these problems. (Please also include this patch for stable trees. Thanks!) Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-31 17:31:49 -08:00
Iyappan Subramanian	ecf6ba83d7	drivers: net: xgene: fix: Out of order descriptor bytes read This patch fixes the following kernel crash, WARNING: CPU: 2 PID: 0 at net/ipv4/tcp_input.c:3079 tcp_clean_rtx_queue+0x658/0x80c() Call trace: [<fffffe0000096b7c>] dump_backtrace+0x0/0x184 [<fffffe0000096d10>] show_stack+0x10/0x1c [<fffffe0000685ea0>] dump_stack+0x74/0x98 [<fffffe00000b44e0>] warn_slowpath_common+0x88/0xb0 [<fffffe00000b461c>] warn_slowpath_null+0x14/0x20 [<fffffe00005b5c1c>] tcp_clean_rtx_queue+0x654/0x80c [<fffffe00005b6228>] tcp_ack+0x454/0x688 [<fffffe00005b6ca8>] tcp_rcv_established+0x4a4/0x62c [<fffffe00005bf4b4>] tcp_v4_do_rcv+0x16c/0x350 [<fffffe00005c225c>] tcp_v4_rcv+0x8e8/0x904 [<fffffe000059d470>] ip_local_deliver_finish+0x100/0x26c [<fffffe000059dad8>] ip_local_deliver+0xac/0xc4 [<fffffe000059d6c4>] ip_rcv_finish+0xe8/0x328 [<fffffe000059dd3c>] ip_rcv+0x24c/0x38c [<fffffe0000563950>] __netif_receive_skb_core+0x29c/0x7c8 [<fffffe0000563ea4>] __netif_receive_skb+0x28/0x7c [<fffffe0000563f54>] netif_receive_skb_internal+0x5c/0xe0 [<fffffe0000564810>] napi_gro_receive+0xb4/0x110 [<fffffe0000482a2c>] xgene_enet_process_ring+0x144/0x338 [<fffffe0000482d18>] xgene_enet_napi+0x1c/0x50 [<fffffe0000565454>] net_rx_action+0x154/0x228 [<fffffe00000b804c>] __do_softirq+0x110/0x28c [<fffffe00000b8424>] irq_exit+0x8c/0xc0 [<fffffe0000093898>] handle_IRQ+0x44/0xa8 [<fffffe000009032c>] gic_handle_irq+0x38/0x7c [...] Software writes poison data into the descriptor bytes[15:8] and upon receiving the interrupt, if those bytes are overwritten by the hardware with the valid data, software also reads bytes[7:0] and executes receive/tx completion logic. If the CPU executes the above two reads in out of order fashion, then the bytes[7:0] will have older data and causing the kernel panic. We have to force the order of the reads and thus this patch introduces read memory barrier between these reads. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: Keyur Chudgar <kchudgar@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-30 18:17:31 -08:00
David S. Miller	08178e5ac4	Merge branch 'vlan_get_protocol' Toshiaki Makita says: ==================== Fix checksum error when using stacked vlan When I was testing 802.1ad, I found several drivers don't take into account 802.1ad or multiple vlans when retrieving L3 (IP/IPv6) or L4 (TCP/UDP) protocol for checksum offload. It is mainly due to vlan_get_protocol(), which extracts ether type only when it is tagged with single 802.1Q. When 802.1ad is used or there are multiple vlans, it extracts vlan protocol and drivers cannot determine which L3/L4 protocol is used. Those drivers, most of which have IP_CSUM/IPV6_CSUM features, get L3/L4 header-offset by software, so it seems that their checksum offload works with multiple vlans if we can parse protocols correctly. (They know mac header length, and probably don't care about what is in it.) And another thing, some of Intel's drivers seem to use skb->protocol where vlan_get_protocol() is more suitable. I tested that at least igb/igbvf on I350 works with this patch set. Note: We can hand a double tagged packet with CHECKSUM_PARTIAL to a HW driver by creating a vlan device on a bridge device and enabling vlan_filtering of the bridge with 802.1ad protocol. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-30 18:03:58 -08:00
Toshiaki Makita	10e4fb333c	ixgbevf: Fix checksum error when using stacked vlan When a skb has multiple vlans and it is CHECKSUM_PARTIAL, ixgbevf_tx_csum() fails to get the network protocol and checksum related descriptor fields are not configured correctly because skb->protocol doesn't show the L3 protocol in this case. Use first->protocol instead of skb->protocol to get the proper network protocol. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-30 18:03:47 -08:00
Toshiaki Makita	0213668f06	ixgbe: Fix checksum error when using stacked vlan When a skb has multiple vlans and it is CHECKSUM_PARTIAL, ixgbe_tx_csum() fails to get the network protocol and checksum related descriptor fields are not configured correctly because skb->protocol doesn't show the L3 protocol in this case. Use vlan_get_protocol() to get the proper network protocol. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-30 18:03:47 -08:00
Toshiaki Makita	72b1405964	igbvf: Fix checksum error when using stacked vlan When a skb has multiple vlans and it is CHECKSUM_PARTIAL, igbvf_tx_csum() fails to get the network protocol and checksum related descriptor fields are not configured correctly because skb->protocol doesn't show the L3 protocol in this case. Use vlan_get_protocol() to get the proper network protocol. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-30 18:03:47 -08:00
Toshiaki Makita	d4bcef3fbe	net: Fix vlan_get_protocol for stacked vlan vlan_get_protocol() could not get network protocol if a skb has a 802.1ad vlan tag or multiple vlans, which caused incorrect checksum calculation in several drivers. Fix vlan_get_protocol() to retrieve network protocol instead of incorrect vlan protocol. As the logic is the same as skb_network_protocol(), create a common helper function __vlan_get_protocol() and call it from existing functions. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-30 18:03:47 -08:00
Saran Maruti Ramanara	cfbf654efc	net: sctp: fix passing wrong parameter header to param_type2af in sctp_process_param When making use of RFC5061, section 4.2.4. for setting the primary IP address, we're passing a wrong parameter header to param_type2af(), resulting always in NULL being returned. At this point, param.p points to a sctp_addip_param struct, containing a sctp_paramhdr (type = 0xc004, length = var), and crr_id as a correlation id. Followed by that, as also presented in RFC5061 section 4.2.4., comes the actual sctp_addr_param, which also contains a sctp_paramhdr, but this time with the correct type SCTP_PARAM_IPV{4,6}_ADDRESS that param_type2af() can make use of. Since we already hold a pointer to addr_param from previous line, just reuse it for param_type2af(). Fixes: `d6de309759` ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT") Signed-off-by: Saran Maruti Ramanara <saran.neti@telus.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-30 17:45:23 -08:00
Pablo Neira	8b7c36d810	netlink: fix wrong subscription bitmask to group mapping in The subscription bitmask passed via struct sockaddr_nl is converted to the group number when calling the netlink_bind() and netlink_unbind() callbacks. The conversion is however incorrect since bitmask (1 << 0) needs to be mapped to group number 1. Note that you cannot specify the group number 0 (usually known as _NONE) from setsockopt() using NETLINK_ADD_MEMBERSHIP since this is rejected through -EINVAL. This problem became noticeable since `97840cb` ("netfilter: nfnetlink: fix insufficient validation in nfnetlink_bind") when binding to bitmask (1 << 0) in ctnetlink. Reported-by: Andre Tomt <andre@tomt.net> Reported-by: Ivan Delalande <colona@arista.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-30 17:43:47 -08:00
Pablo Neira Ayuso	f5553c19ff	netfilter: nf_tables: fix leaks in error path of nf_tables_newchain() Release statistics and module refcount on memory allocation problems. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-01-30 18:42:08 +01:00
Julian Anastasov	579eb62ac3	ipvs: rerouting to local clients is not needed anymore commit `f5a41847ac` ("ipvs: move ip_route_me_harder for ICMP") from 2.6.37 introduced ip_route_me_harder() call for responses to local clients, so that we can provide valid rt_src after SNAT. It was used by TCP to provide valid daddr for ip_send_reply(). After commit `0a5ebb8000` ("ipv4: Pass explicit daddr arg to ip_send_reply()." from 3.0 this rerouting is not needed anymore and should be avoided, especially in LOCAL_IN. Fixes 3.12.33 crash in xfrm reported by Florian Wiessner: "3.12.33 - BUG xfrm_selector_match+0x25/0x2f6" Reported-by: Smart Weblications GmbH - Florian Wiessner <f.wiessner@smart-weblications.de> Tested-by: Smart Weblications GmbH - Florian Wiessner <f.wiessner@smart-weblications.de> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-01-30 10:05:55 +09:00
Li Wei	3cdaa5be9e	ipv4: Don't increase PMTU with Datagram Too Big message. RFC 1191 said, "a host MUST not increase its estimate of the Path MTU in response to the contents of a Datagram Too Big message." Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-29 15:28:59 -08:00
David S. Miller	a1a0b558ce	Merge branch 'arm-build-fixes' Arnd Bergmann says: ==================== net: driver fixes from arm randconfig builds These four patches are fallout from test builds on ARM. I have a few more of them in my backlog but have not yet confirmed them to still be valid. The first three patches are about incomplete dependencies on old drivers. One could backport them to the beginning of time in theory, but there is little value since nobody would run into these problems. The final patch is one I had submitted before together with the respective pcmcia patch but forgot to follow up on that. It's still a valid but relatively theoretical bug, because the previous behavior of the driver was just as broken as what we have in mainline. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-29 15:08:27 -08:00
Arnd Bergmann	96a30175f9	net: am2150: fix nmclan_cs.c shared interrupt handling A recent patch tried to work around a valid warning for the use of a deprecated interface by blindly changing from the old pcmcia_request_exclusive_irq() interface to pcmcia_request_irq(). This driver has an interrupt handler that is not currently aware of shared interrupts, but can be easily converted to be. At the moment, the driver reads the interrupt status register repeatedly until it contains only zeroes in the interesting bits, and handles each bit individually. This patch adds the missing part of returning IRQ_NONE in case none of the bits are set to start with, so we can move on to the next interrupt source. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `5f5316fcd0` ("am2150: Update nmclan_cs.c to use update PCMCIA API") Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-29 15:08:21 -08:00
Arnd Bergmann	e9b106b8fb	net: lance,ni64: don't build for ARM The ni65 and lance ethernet drivers manually program the ISA DMA controller that is only available on x86 PCs and a few compatible systems. Trying to build it on ARM results in this error: ni65.c: In function 'ni65_probe1': ni65.c:496:62: error: 'DMA1_STAT_REG' undeclared (first use in this function) ((inb(DMA1_STAT_REG) >> 4) & 0x0f) ^ ni65.c:496:62: note: each undeclared identifier is reported only once for each function it appears in ni65.c:497:63: error: 'DMA2_STAT_REG' undeclared (first use in this function) \| (inb(DMA2_STAT_REG) & 0xf0); The DMA1_STAT_REG and DMA2_STAT_REG registers are only defined for alpha, mips, parisc, powerpc and x86, although it is not clear which subarchitectures actually have them at the correct location. This patch for now just disables it for ARM, to avoid randconfig build errors. We could also decide to limit it to the set of architectures on which it does compile, but that might look more deliberate than guessing based on where the drivers build. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-29 15:08:21 -08:00
Arnd Bergmann	303c28d859	net: wan: add missing virt_to_bus dependencies The cosa driver is rather outdated and does not get built on most platforms because it requires the ISA_DMA_API symbol. However there are some ARM platforms that have ISA_DMA_API but no virt_to_bus, and they get this build error when enabling the ltpc driver. drivers/net/wan/cosa.c: In function 'tx_interrupt': drivers/net/wan/cosa.c:1768:3: error: implicit declaration of function 'virt_to_bus' unsigned long addr = virt_to_bus(cosa->txbuf); ^ The same problem exists for the Hostess SV-11 and Sealevel Systems 4021 drivers. This adds another dependency in Kconfig to avoid that configuration. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-29 15:08:21 -08:00
Arnd Bergmann	fc9a570783	net: cs89x0: always build platform code if !HAS_IOPORT_MAP The cs89x0 driver can either be built as an ISA driver or a platform driver, the choice is controlled by the CS89x0_PLATFORM Kconfig symbol. Building the ISA driver on a system that does not have a way to map I/O ports fails with this error: drivers/built-in.o: In function `cs89x0_ioport_probe.constprop.1': :(.init.text+0x4794): undefined reference to `ioport_map' :(.init.text+0x4830): undefined reference to `ioport_unmap' This changes the Kconfig logic to take that option away and always force building the platform variant of this driver if CONFIG_HAS_IOPORT_MAP is not set. This is the only correct choice in this case, and it avoids the build error. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-29 15:08:20 -08:00
Florian Westphal	e2a4800e75	ppp: deflate: never return len larger than output buffer When we've run out of space in the output buffer to store more data, we will call zlib_deflate with a NULL output buffer until we've consumed remaining input. When this happens, olen contains the size the output buffer would have consumed iff we'd have had enough room. This can later cause skb_over_panic when ppp_generic skb_put()s the returned length. Reported-by: Iain Douglas <centos@1n6.org.uk> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-29 14:50:01 -08:00
David S. Miller	d445d63b77	Merge branch 'netns' Nicolas Dichtel says: ==================== netns: audit netdevice creation with IFLA_NET_NS_[PID\|FD] When one of these attributes is set, the netdevice is created into the netns pointed by IFLA_NET_NS_[PID\|FD] (see the call to rtnl_create_link() in rtnl_newlink()). Let's call this netns the dest_net. After this creation, if the newlink handler exists, it is called with a netns argument that points to the netns where the netlink message has been received (called src_net in the code) which is the link netns. Hence, with one of these attributes, it's possible to create a x-netns netdevice. Here is the result of my code review: - all ip tunnels (sit, ipip, ip6_tunnels, gre[tap][v6], ip_vti[6]) does not really allows to use this feature: the netdevice is created in the dest_net and the src_net is completely ignored in the newlink handler. - VLAN properly handles this x-netns creation. - bridge ignores src_net, which seems fine (NETIF_F_NETNS_LOCAL is set). - CAIF subsystem is not clear for me (I don't know how it works), but it seems to wrongly use src_net. Patch #1 tries to fix this, but it was done only by code review (and only compile-tested), so please carefully review it. I may miss something. - HSR subsystem uses src_net to parse IFLA_HSR_SLAVE[1\|2], but the netdevice has the flag NETIF_F_NETNS_LOCAL, so the question is: does this netdevice really supports x-netns? If not, the newlink handler should use the dest_net instead of src_net, I can provide the patch. - ieee802154 uses also src_net and does not have NETIF_F_NETNS_LOCAL. Same question: does this netdevice really supports x-netns? - bonding ignores src_net and flag NETIF_F_NETNS_LOCAL is set, ie x-netns is not supported. Fine. - CAN does not support rtnl/newlink, ok. - ipvlan uses src_net and does not have NETIF_F_NETNS_LOCAL. After looking at the code, it seems that this drivers support x-netns. Am I right? - macvlan/macvtap uses src_net and seems to have x-netns support. - team ignores src_net and has the flag NETIF_F_NETNS_LOCAL, ie x-netns is not supported. Ok. - veth uses src_net and have x-netns support ;-) Ok. - VXLAN didn't properly handle this. The link netns (vxlan->net) is the src_net and not dest_net (see patch #2). Note that it was already possible to create a x-netns vxlan before the commit `f01ec1c017` ("vxlan: add x-netns support") but the nedevice remains broken. To summarize: - CAIF patch must be carefully reviewed - for HSR, ieee802154, ipvlan: is x-netns supported? ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-29 14:20:16 -08:00
Nicolas Dichtel	33564bbb2c	vxlan: setup the right link netns in newlink hdlr Rename the netns to src_net to avoid confusion with the netns where the interface stands. The user may specify IFLA_NET_NS_[PID\|FD] to create a x-netns netndevice: IFLA_NET_NS_[PID\|FD] points to the netns where the netdevice stands and src_net to the link netns. Note that before commit `f01ec1c017` ("vxlan: add x-netns support"), it was possible to create a x-netns vxlan netdevice, but the netdevice was not operational. Fixes: `f01ec1c017` ("vxlan: add x-netns support") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-29 14:20:02 -08:00
Nicolas Dichtel	8997c27ec4	caif: remove wrong dev_net_set() call src_net points to the netns where the netlink message has been received. This netns may be different from the netns where the interface is created (because the user may add IFLA_NET_NS_[PID\|FD]). In this case, src_net is the link netns. It seems wrong to override the netns in the newlink() handler because if it was not already src_net, it means that the user explicitly asks to create the netdevice in another netns. CC: Sjur Brændeland <sjur.brandeland@stericsson.com> CC: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no> Fixes: `8391c4aab1` ("caif: Bugfixes in CAIF netdevice for close and flow control") Fixes: `c412540063` ("caif-hsi: Add rtnl support") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-29 14:20:02 -08:00
karl beldan	9ce357795e	lib/checksum.c: fix build for generic csum_tcpudp_nofold Fixed commit added from64to32 under _#ifndef do_csum_ but used it under _#ifndef csum_tcpudp_nofold_, breaking some builds (Fengguang's robot reported TILEGX's). Move from64to32 under the latter. Fixes: `150ae0e946` ("lib/checksum.c: fix carry in csum_tcpudp_nofold") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Cc: Eric Dumazet <edumazet@google.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-29 11:57:38 -08:00
Eric Dumazet	811230cd85	tcp: ipv4: initialize unicast_sock sk_pacing_rate When I added sk_pacing_rate field, I forgot to initialize its value in the per cpu unicast_sock used in ip_send_unicast_reply() This means that for sch_fq users, RST packets, or ACK packets sent on behalf of TIME_WAIT sockets might be sent to slowly or even dropped once we reach the per flow limit. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `95bd09eb27` ("tcp: TSO packets automatic sizing") Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-28 23:24:47 -08:00
karl beldan	150ae0e946	lib/checksum.c: fix carry in csum_tcpudp_nofold The carry from the 64->32bits folding was dropped, e.g with: saddr=0xFFFFFFFF daddr=0xFF0000FF len=0xFFFF proto=0 sum=1, csum_tcpudp_nofold returned 0 instead of 1. Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Mike Frysinger <vapier@gentoo.org> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-28 22:32:33 -08:00
Roopa Prabhu	59ccaaaa49	bridge: dont send notification when skb->len == 0 in rtnl_bridge_notify Reported in: https://bugzilla.kernel.org/show_bug.cgi?id=92081 This patch avoids calling rtnl_notify if the device ndo_bridge_getlink handler does not return any bytes in the skb. Alternately, the skb->len check can be moved inside rtnl_notify. For the bridge vlan case described in 92081, there is also a fix needed in bridge driver to generate a proper notification. Will fix that in subsequent patch. v2: rebase patch on net tree Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-28 22:21:31 -08:00
David S. Miller	95224ac180	Merge branch 'tcp_stretch_acks' Neal Cardwell says: ==================== fix stretch ACK bugs in TCP CUBIC and Reno This patch series fixes the TCP CUBIC and Reno congestion control modules to properly handle stretch ACKs in their respective additive increase modes, and in the transitions from slow start to additive increase. This finishes the project started by commit `9f9843a751` ("tcp: properly handle stretch acks in slow start"), which fixed behavior for TCP congestion control when handling stretch ACKs in slow start mode. Motivation: In the Jan 2015 netdev thread 'BW regression after "tcp: refine TSO autosizing"', Eyal Perry documented a regression that Eric Dumazet determined was caused by improper handling of TCP stretch ACKs. Background: LRO, GRO, delayed ACKs, and middleboxes can cause "stretch ACKs" that cover more than the RFC-specified maximum of 2 packets. These stretch ACKs can cause serious performance shortfalls in common congestion control algorithms, like Reno and CUBIC, which were designed and tuned years ago with receiver hosts that were not using LRO or GRO, and were instead ACKing every other packet. Testing: at Google we have been using this approach for handling stretch ACKs for CUBIC datacenter and Internet traffic for several years, with good results. v2: * fixed return type of tcp_slow_start() to be u32 instead of int ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-28 22:19:09 -08:00
Neal Cardwell	d6b1a8a92a	tcp: fix timing issue in CUBIC slope calculation This patch fixes a bug in CUBIC that causes cwnd to increase slightly too slowly when multiple ACKs arrive in the same jiffy. If cwnd is supposed to increase at a rate of more than once per jiffy, then CUBIC was sometimes too slow. Because the bic_target is calculated for a future point in time, calculated with time in jiffies, the cwnd can increase over the course of the jiffy while the bic_target calculated as the proper CUBIC cwnd at time t=tcp_time_stamp+rtt does not increase, because tcp_time_stamp only increases on jiffy tick boundaries. So since the cnt is set to: ca->cnt = cwnd / (bic_target - cwnd); as cwnd increases but bic_target does not increase due to jiffy granularity, the cnt becomes too large, causing cwnd to increase too slowly. For example: - suppose at the beginning of a jiffy, cwnd=40, bic_target=44 - so CUBIC sets: ca->cnt = cwnd / (bic_target - cwnd) = 40 / (44 - 40) = 40/4 = 10 - suppose we get 10 acks, each for 1 segment, so tcp_cong_avoid_ai() increases cwnd to 41 - so CUBIC sets: ca->cnt = cwnd / (bic_target - cwnd) = 41 / (44 - 41) = 41 / 3 = 13 So now CUBIC will wait for 13 packets to be ACKed before increasing cwnd to 42, insted of 10 as it should. The fix is to avoid adjusting the slope (determined by ca->cnt) multiple times within a jiffy, and instead skip to compute the Reno cwnd, the "TCP friendliness" code path. Reported-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-28 22:18:38 -08:00
Neal Cardwell	9cd981dcf1	tcp: fix stretch ACK bugs in CUBIC Change CUBIC to properly handle stretch ACKs in additive increase mode by passing in the count of ACKed packets to tcp_cong_avoid_ai(). In addition, because we are now precisely accounting for stretch ACKs, including delayed ACKs, we can now remove the delayed ACK tracking and estimation code that tracked recent delayed ACK behavior in ca->delayed_ack. Reported-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-28 22:18:38 -08:00
Neal Cardwell	c22bdca947	tcp: fix stretch ACK bugs in Reno Change Reno to properly handle stretch ACKs in additive increase mode by passing in the count of ACKed packets to tcp_cong_avoid_ai(). In addition, if snd_cwnd crosses snd_ssthresh during slow start processing, and we then exit slow start mode, we need to carry over any remaining "credit" for packets ACKed and apply that to additive increase by passing this remaining "acked" count to tcp_cong_avoid_ai(). Reported-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-28 22:18:38 -08:00
Neal Cardwell	814d488c61	tcp: fix the timid additive increase on stretch ACKs tcp_cong_avoid_ai() was too timid (snd_cwnd increased too slowly) on "stretch ACKs" -- cases where the receiver ACKed more than 1 packet in a single ACK. For example, suppose w is 10 and we get a stretch ACK for 20 packets, so acked is 20. We ought to increase snd_cwnd by 2 (since acked/w = 20/10 = 2), but instead we were only increasing cwnd by 1. This patch fixes that behavior. Reported-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-28 22:18:37 -08:00
Neal Cardwell	e73ebb0881	tcp: stretch ACK fixes prep LRO, GRO, delayed ACKs, and middleboxes can cause "stretch ACKs" that cover more than the RFC-specified maximum of 2 packets. These stretch ACKs can cause serious performance shortfalls in common congestion control algorithms that were designed and tuned years ago with receiver hosts that were not using LRO or GRO, and were instead politely ACKing every other packet. This patch series fixes Reno and CUBIC to handle stretch ACKs. This patch prepares for the upcoming stretch ACK bug fix patches. It adds an "acked" parameter to tcp_cong_avoid_ai() to allow for future fixes to tcp_cong_avoid_ai() to correctly handle stretch ACKs, and changes all congestion control algorithms to pass in 1 for the ACKed count. It also changes tcp_slow_start() to return the number of packet ACK "credits" that were not processed in slow start mode, and can be processed by the congestion control module in additive increase mode. In future patches we will fix tcp_cong_avoid_ai() to handle stretch ACKs, and fix Reno and CUBIC handling of stretch ACKs in slow start and additive increase mode. Reported-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-28 22:18:37 -08:00
Linus Torvalds	59343cd7c4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Don't OOPS on socket AIO, from Christoph Hellwig. 2) Scheduled scans should be aborted upon RFKILL, from Emmanuel Grumbach. 3) Fix sleep in atomic context in kvaser_usb, from Ahmed S Darwish. 4) Fix RCU locking across copy_to_user() in bpf code, from Alexei Starovoitov. 5) Lots of crash, memory leak, short TX packet et al bug fixes in sh_eth from Ben Hutchings. 6) Fix memory corruption in SCTP wrt. INIT collitions, from Daniel Borkmann. 7) Fix return value logic for poll handlers in netxen, enic, and bnx2x. From Eric Dumazet and Govindarajulu Varadarajan. 8) Header length calculation fix in mac80211 from Fred Chou. 9) mv643xx_eth doesn't handle highmem correctly in non-TSO code paths. From Ezequiel Garcia. 10) udp_diag has bogus logic in it's hash chain skipping, copy same fix tcp diag used. From Herbert Xu. 11) amd-xgbe programs wrong rx flow control register, from Thomas Lendacky. 12) Fix race leading to use after free in ping receive path, from Subash Abhinov Kasiviswanathan. 13) Cache redirect routes otherwise we can get a heavy backlog of rcu jobs liberating DST_NOCACHE entries. From Hannes Frederic Sowa. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits) net: don't OOPS on socket aio stmmac: prevent probe drivers to crash kernel bnx2x: fix napi poll return value for repoll ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too sh_eth: Fix DMA-API usage for RX buffers sh_eth: Check for DMA mapping errors on transmit sh_eth: Ensure DMA engines are stopped before freeing buffers sh_eth: Remove RX overflow log messages ping: Fix race in free in receive path udp_diag: Fix socket skipping within chain can: kvaser_usb: Fix state handling upon BUS_ERROR events can: kvaser_usb: Retry the first bulk transfer on -ETIMEDOUT can: kvaser_usb: Send correct context to URB completion can: kvaser_usb: Do not sleep in atomic context ipv4: try to cache dst_entries which would cause a redirect samples: bpf: relax test_maps check bpf: rcu lock must not be held when calling copy_to_user() net: sctp: fix slab corruption from use after free on INIT collisions net: mv643xx_eth: Fix highmem support in non-TSO egress path sh_eth: Fix serialisation of interrupt disable with interrupt & NAPI handlers ...	2015-01-27 13:55:36 -08:00
Christoph Hellwig	06539d3071	net: don't OOPS on socket aio Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-27 12:25:33 -08:00
Andy Shevchenko	9afec6efc6	stmmac: prevent probe drivers to crash kernel In the case when alloc_netdev fails we return NULL to a caller. But there is no check for NULL in the probe drivers. This patch changes NULL to an error pointer. The function description is amended to reflect what we may get returned. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-27 12:24:30 -08:00
Linus Torvalds	7da323bb45	Two powerpc fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUxx7MAAoJEFHr6jzI4aWAqe0QAIGGb2mtU1gON4XtWIlYPzSu TZDGAK6pCeCr92tH5H3HXy7hOEbZOPQGJCqvOHMK/kcJtlkXSrW1ERDPryqEv9Xq V+PJMY5Tre1ga1xttyk3ny4zKmAy4UcLm6Wepi5054kB0Svo7/s2IKECwsInY/D3 yYxSW4XUZXvCYqdZUqdQbCSgocWOfFJbPqt4kf2IiqjCPlWU1GGjpMpmqXFhTh4Y wuU0FtcC7db+wCvjaXs/dbCHMZqWqsQ+WCo9kz2+fLtRbthXSk0A9V96jLzah32/ yQHqyH2+/6ZklxApjgvuQ4ZhZogU2U2PlFLfpJqiBhy+5WnkBpZtJ1lIj9hGlyC4 pvKjrHTp0Q8+gZ8ZxuEuiKV1L+01kWbtQwg4Z5HIuulz7DfiSGatSez6CuwIeV49 A5dQmse0bK6L4XWlB+RKyLdnj+HSPVFIcLlTPMSV/UlDp7IySphR+zp28ykyWv0C yGa9JUeD7hvM3Oezo88QoS4P5rk/KmQjPMhYz+ab8/eX3RCtBA1jb/KrdLm2b4tM sVJyuiyalL35IoOZlAh8rrKHfu1ZgXyYkJgMdccAy2yQZghmK4H8o8gdf9YWGNNq vEZHk+2gKI1T1o5uqvHMmkw0IWV4pE9jm3Jyo2W/pZozxR3oswOtUebCXbK+mLqy 81GwzBWwALxbRM5HGtIr =cDwY -----END PGP SIGNATURE----- Merge tag 'powerpc-3.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc fixes from Michael Ellerman: "Two powerpc fixes" * tag 'powerpc-3.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: powerpc/powernv: Restore LPCR with LPCR_PECE1 cleared powerpc/xmon: Fix another endiannes issue in RTAS call from xmon	2015-01-27 10:04:38 -08:00
Linus Torvalds	41592e2fb5	SCSI was using module_refcount() to figure out when the module was unloading: this broke with new atomic refcounting. The code is still suspicious, but this solves the WARN_ON(). Thanks, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUxuOxAAoJENkgDmzRrbjxtGIP/im0s3L+wM28NdQ5zFOiIpwX 0N4y4U6Aa+rrLs27jDZtXW4ss+fr/V7pUSD1uwTrTHl9X4a3M+DLpEME5Rq40TXG MYHihlQo8PM/yCuTMFR7esF20/q9CIksuVA3GkjYqwBW+ks097+t60Yn7YSfX22U HzSiR3Pp6yPjMoZWROP576rc9VtOM6tScKU1rzwe8Bo7ayzNADmTFJR4UpQuiQhU NwHn7Lih6EGsVTtaX3KS91dA7v5Un/S4jskqAkkwOW3ipgf38LLfjETueHvDWpda szVFoMMMkqDwse2MyluiLk/80Cpl528GO24D0MMbQoPfsUi7OgxQN6t1QviRxeut pH1VyK4VrRDqhpGB+5l7PQUOk58dL4SjmIVO/G9wxMgKzw2giONJdt/zHBXMr5/l 6RYINwHEQalMoG3X+kgYKa5E1WPXeJsE4xQTBIQRsaHO+9lk95BbNUVfUgvygUPV TTaymhHhEYOsntiMHPkTI05FkK79khs/qX7lU1MVtZ9x7UaYXs3ziY7z+D5fRKf1 WgoYemWBeFZznsceacNSZGjFsaMJr6gsZ/rVuPSbytXDXSOrQsJxiENph7CnHbRk 4UJffEomjdQ93CNGBI2hCYlr4Gf5DflbwHjJbT+YXcqumkXJq4+ktSFp5aCH+9q8 901XYQ7QIJ4roIOzODUG =JI7o -----END PGP SIGNATURE----- Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull one more module fix from Rusty Russell: "SCSI was using module_refcount() to figure out when the module was unloading: this broke with new atomic refcounting. The code is still suspicious, but this solves the WARN_ON()" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: scsi: always increment reference count	2015-01-27 09:02:09 -08:00
Govindarajulu Varadarajan	24e579c889	bnx2x: fix napi poll return value for repoll With the commit `d75b1ade56` ("net: less interrupt masking in NAPI") napi repoll is done only when work_done == budget. When in busy_poll is we return 0 in napi_poll. We should return budget. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-27 00:29:29 -08:00
David S. Miller	bf693f7beb	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== ipsec 2015-01-26 Just two small fixes for _decode_session6() where we might decode to wrong header information in some rare situations. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-27 00:28:38 -08:00
Hannes Frederic Sowa	6e9e16e614	ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too Lubomir Rintel reported that during replacing a route the interface reference counter isn't correctly decremented. To quote bug <https://bugzilla.kernel.org/show_bug.cgi?id=91941>: \| [root@rhel7-5 lkundrak]# sh -x lal \| + ip link add dev0 type dummy \| + ip link set dev0 up \| + ip link add dev1 type dummy \| + ip link set dev1 up \| + ip addr add 2001:db8:8086::2/64 dev dev0 \| + ip route add 2001:db8:8086::/48 dev dev0 proto static metric 20 \| + ip route add 2001:db8:8088::/48 dev dev1 proto static metric 10 \| + ip route replace 2001:db8:8086::/48 dev dev1 proto static metric 20 \| + ip link del dev0 type dummy \| Message from syslogd@rhel7-5 at Jan 23 10:54:41 ... \| kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2 \| \| Message from syslogd@rhel7-5 at Jan 23 10:54:51 ... \| kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2 During replacement of a rt6_info we must walk all parent nodes and check if the to be replaced rt6_info got propagated. If so, replace it with an alive one. Fixes: `4a287eba2d` ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag") Reported-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Tested-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-27 00:22:14 -08:00
David S. Miller	225776098b	Merge branch 'sh_eth' Ben Hutchings says: ==================== Fixes for sh_eth #3 I'm continuing review and testing of Ethernet support on the R-Car H2 chip. This series fixes the last of the more serious issues I've found. These are not tested on any of the other supported chips. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-27 00:18:57 -08:00
Ben Hutchings	52b9fa3696	sh_eth: Fix DMA-API usage for RX buffers - Use the return value of dma_map_single(), rather than calling virt_to_page() separately - Check for mapping failue - Call dma_unmap_single() rather than dma_sync_single_for_cpu() Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-27 00:18:54 -08:00
Ben Hutchings	aa3933b873	sh_eth: Check for DMA mapping errors on transmit dma_map_single() may fail if an IOMMU or swiotlb is in use, so we need to check for this. Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-27 00:18:53 -08:00
Ben Hutchings	740c7f31c0	sh_eth: Ensure DMA engines are stopped before freeing buffers Currently we try to clear EDRRR and EDTRR and immediately continue to free buffers. This is unsafe because: - In general, register writes are not serialised with DMA, so we still have to wait for DMA to complete somehow - The R8A7790 (R-Car H2) manual states that the TX running flag cannot be cleared by writing to EDTRR - The same manual states that clearing the RX running flag only stops RX DMA at the next packet boundary I applied this patch to the driver to detect DMA writes to freed buffers: > --- a/drivers/net/ethernet/renesas/sh_eth.c > +++ b/drivers/net/ethernet/renesas/sh_eth.c > @@ -1098,7 +1098,14 @@ static void sh_eth_ring_free(struct net_device ndev) > / Free Rx skb ringbuffer */ > if (mdp->rx_skbuff) { > for (i = 0; i < mdp->num_rx_ring; i++) > + memcpy(mdp->rx_skbuff[i]->data, > + "Hello, world", 12); > + msleep(100); > + for (i = 0; i < mdp->num_rx_ring; i++) { > + WARN_ON(memcmp(mdp->rx_skbuff[i]->data, > + "Hello, world", 12)); > dev_kfree_skb(mdp->rx_skbuff[i]); > + } > } > kfree(mdp->rx_skbuff); > mdp->rx_skbuff = NULL; then ran the loop: while ethtool -G eth0 rx 128 ; ethtool -G eth0 rx 64; do echo -n .; done and 'ping -f' toward the sh_eth port from another machine. The warning fired several times a minute. To fix these issues: - Deactivate all TX descriptors rather than writing to EDTRR - As there seems to be no way of telling when RX DMA is stopped, perform a soft reset to ensure that both DMA enginess are stopped - To reduce the possibility of the reset truncating a transmitted frame, disable egress and wait a reasonable time to reach a packet boundary before resetting - Update statistics before resetting (The 'reasonable time' does not allow for CS/CD in half-duplex mode, but half-duplex no longer seems reasonable!) Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-27 00:18:53 -08:00
Ben Hutchings	dc1d0e6d55	sh_eth: Remove RX overflow log messages If RX traffic is overflowing the FIFO or DMA ring, logging every time this happens just makes things worse. These errors are visible in the statistics anyway. Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-27 00:18:53 -08:00

1 2 3 4 5 ...

495696 Commits