linux/net
Neal Cardwell 5ae344c949 tcp: reduce spurious retransmits due to transient SACK reneging
This commit reduces spurious retransmits due to apparent SACK reneging
by only reacting to SACK reneging that persists for a short delay.

When a sequence space hole at snd_una is filled, some TCP receivers
send a series of ACKs as they apparently scan their out-of-order queue
and cumulatively ACK all the packets that have now been consecutiveyly
received. This is essentially misbehavior B in "Misbehaviors in TCP
SACK generation" ACM SIGCOMM Computer Communication Review, April
2011, so we suspect that this is from several common OSes (Windows
2000, Windows Server 2003, Windows XP). However, this issue has also
been seen in other cases, e.g. the netdev thread "TCP being hoodwinked
into spurious retransmissions by lack of timestamps?" from March 2014,
where the receiver was thought to be a BSD box.

Since snd_una would temporarily be adjacent to a previously SACKed
range in these scenarios, this receiver behavior triggered the Linux
SACK reneging code path in the sender. This led the sender to clear
the SACK scoreboard, enter CA_Loss, and spuriously retransmit
(potentially) every packet from the entire write queue at line rate
just a few milliseconds before the ACK for each packet arrives at the
sender.

To avoid such situations, now when a sender sees apparent reneging it
does not yet retransmit, but rather adjusts the RTO timer to give the
receiver a little time (max(RTT/2, 10ms)) to send us some more ACKs
that will restore sanity to the SACK scoreboard. If the reneging
persists until this RTO then, as before, we clear the SACK scoreboard
and enter CA_Loss.

A 10ms delay tolerates a receiver sending such a stream of ACKs at
56Kbit/sec. And to allow for receivers with slower or more congested
paths, we wait for at least RTT/2.

We validated the resulting max(RTT/2, 10ms) delay formula with a mix
of North American and South American Google web server traffic, and
found that for ACKs displaying transient reneging:

 (1) 90% of inter-ACK delays were less than 10ms
 (2) 99% of inter-ACK delays were less than RTT/2

In tests on Google web servers this commit reduced reneging events by
75%-90% (as measured by the TcpExtTCPSACKReneging counter), without
any measurable impact on latency for user HTTP and SPDY requests.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05 16:29:33 -07:00
..
6lowpan Merge tag 'master-2014-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next 2014-08-05 13:18:20 -07:00
9p 9P: remove unnecessary break after return 2014-07-15 16:27:00 -07:00
802 net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
8021q vlan: fail early when creating netdev named config 2014-07-29 11:43:50 -07:00
appletalk Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-07-16 14:09:34 -07:00
atm net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
ax25 net: Fix use after free by removing length arg from sk_data_ready callbacks. 2014-04-11 16:15:36 -04:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-07-22 00:44:59 -07:00
bluetooth Merge tag 'master-2014-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next 2014-08-05 13:18:20 -07:00
bridge bridge: remove a useless comment 2014-08-04 12:46:51 -07:00
caif caif: remove unnecessary break after goto 2014-07-15 16:27:01 -07:00
can can: add hash based access to single EFF frame filters 2014-05-19 09:38:24 +02:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2014-06-12 23:06:23 -07:00
core net: filter: split 'struct sk_filter' into socket and bpf parts 2014-08-02 15:03:58 -07:00
dcb dcbnl : Fix misleading dcb_app->priority explanation 2014-07-30 17:21:05 -07:00
dccp inet: move ipv6only in sock_common 2014-07-01 23:46:21 -07:00
decnet net: Split sk_no_check into sk_no_check_{rx,tx} 2014-05-23 16:28:53 -04:00
dns_resolver dns_resolver: Null-terminate the right string 2014-07-20 22:33:32 -07:00
dsa net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
ethernet net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
hsr net/hsr: Remove left-over never-true conditional code. 2014-07-11 15:04:40 -07:00
ieee802154 inet: frags: use kmem_cache for inet_frag_queue 2014-08-02 15:31:31 -07:00
ipv4 tcp: reduce spurious retransmits due to transient SACK reneging 2014-08-05 16:29:33 -07:00
ipv6 inet: frags: use kmem_cache for inet_frag_queue 2014-08-02 15:31:31 -07:00
ipx net: Split sk_no_check into sk_no_check_{rx,tx} 2014-05-23 16:28:53 -04:00
irda irda: remove unnecessary break after return 2014-07-15 16:27:01 -07:00
iucv af_iucv: avoid path quiesce of severed path in shutdown() 2014-07-21 20:21:40 -07:00
key af_key: remove unnecessary break after return 2014-07-15 16:27:00 -07:00
l2tp net: use inet6_iif instead of IP6CB()->iif 2014-07-31 22:37:06 -07:00
lapb
llc llc: remove noisy WARN from llc_mac_hdr_init 2014-01-28 18:01:32 -08:00
mac80211 Merge tag 'master-2014-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next 2014-07-28 17:36:25 -07:00
mac802154 net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
mpls gre: Call gso_make_checksum 2014-06-04 22:46:38 -07:00
netfilter nftables: Convert nft_hash to use generic rhashtable 2014-08-02 19:49:38 -07:00
netlabel netlabel: remove unnecessary break after goto 2014-07-15 16:27:00 -07:00
netlink netlink: fix lockdep splats 2014-08-04 22:58:06 -07:00
netrom net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
nfc Merge tag 'master-2014-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next 2014-08-05 13:18:20 -07:00
openvswitch net: Remove unlikely() for WARN_ON() conditions 2014-07-30 17:41:47 -07:00
packet packet: remove deprecated syststamp timestamp 2014-07-29 11:39:50 -07:00
phonet net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-06-12 14:27:40 -07:00
rfkill Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2014-05-22 13:58:36 -04:00
rose net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
rxrpc net/rxrpc/ar-key.c: drop negativity check on unsigned value 2014-07-20 21:25:56 -07:00
sched net: filter: split 'struct sk_filter' into socket and bpf parts 2014-08-02 15:03:58 -07:00
sctp net: fix the counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS 2014-07-31 22:04:18 -07:00
sunrpc NFSv4: test SECINFO RPC_AUTH_GSS pseudoflavors for support 2014-06-24 18:46:58 -04:00
tipc tipc: remove duplicated include from socket.c 2014-07-29 15:51:14 -07:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-06-12 14:27:40 -07:00
vmw_vsock vsock: Make transport the proto owner 2014-05-05 13:13:50 -04:00
wimax wimax: remove dead code 2013-11-21 13:09:42 -05:00
wireless Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2014-07-25 10:22:36 -04:00
x25 net: Fix use after free by removing length arg from sk_data_ready callbacks. 2014-04-11 16:15:36 -04:00
xfrm xfrm: Fix installation of AH IPsec SAs 2014-06-30 07:42:12 +02:00
compat.c net: sendmsg: fix NULL pointer dereference 2014-07-29 12:20:22 -07:00
Kconfig 6lowpan: introduce new net/6lowpan directory 2014-07-12 01:53:30 +02:00
Makefile 6lowpan: introduce new net/6lowpan directory 2014-07-12 01:53:30 +02:00
nonet.c
socket.c net: remove deprecated syststamp timestamp 2014-07-29 11:39:50 -07:00
sysctl_net.c