linux/net/ipv4
Eric Dumazet c9eeec26e3 tcp: TSQ can use a dynamic limit
When TCP Small Queues was added, we used a sysctl to limit amount of
packets queues on Qdisc/device queues for a given TCP flow.

Problem is this limit is either too big for low rates, or too small
for high rates.

Now TCP stack has rate estimation in sk->sk_pacing_rate, and TSO
auto sizing, it can better control number of packets in Qdisc/device
queues.

New limit is two packets or at least 1 to 2 ms worth of packets.

Low rates flows benefit from this patch by having even smaller
number of packets in queues, allowing for faster recovery,
better RTT estimations.

High rates flows benefit from this patch by allowing more than 2 packets
in flight as we had reports this was a limiting factor to reach line
rate. [ In particular if TX completion is delayed because of coalescing
parameters ]

Example for a single flow on 10Gbp link controlled by FQ/pacing

14 packets in flight instead of 2

$ tc -s -d qd
qdisc fq 8001: dev eth0 root refcnt 32 limit 10000p flow_limit 100p
buckets 1024 quantum 3028 initial_quantum 15140
 Sent 1168459366606 bytes 771822841 pkt (dropped 0, overlimits 0
requeues 6822476)
 rate 9346Mbit 771713pps backlog 953820b 14p requeues 6822476
  2047 flow, 2046 inactive, 1 throttled, delay 15673 ns
  2372 gc, 0 highprio, 0 retrans, 9739249 throttled, 0 flows_plimit

Note that sk_pacing_rate is currently set to twice the actual rate, but
this might be refined in the future when a flow is in congestion
avoidance.

Additional change : skb->destructor should be set to tcp_wfree().

A future patch (for linux 3.13+) might remove tcp_limit_output_bytes

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 20:41:57 -07:00
..
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2013-09-05 14:54:29 -07:00
af_inet.c net: net_secret should not depend on TCP 2013-09-28 15:19:40 -07:00
ah4.c ipv4: properly refresh rtable entries on pmtu/redirect events 2013-06-03 00:07:42 -07:00
arp.c net: neighbour: Remove CONFIG_ARPD 2013-09-03 21:41:43 -04:00
cipso_ipv4.c cipso: don't follow a NULL pointer when setsockopt() is called 2012-07-18 09:01:12 -07:00
datagram.c ipv4: Add a socket release callback for datagram sockets 2013-01-21 14:17:05 -05:00
devinet.c net: igmp: Allow user-space configuration of igmp unsolicited report interval 2013-08-09 11:27:46 -07:00
esp4.c net: esp{4,6}: fix potential MTU calculation overflows 2013-08-05 12:26:50 -07:00
fib_frontend.c netlink: fix splat in skb_clone with large messages 2013-06-27 22:44:16 -07:00
fib_lookup.h
fib_rules.c fib_rules: fix suppressor names and default values 2013-08-03 10:40:23 -07:00
fib_semantics.c ipv4: use next hop exceptions also for input routes 2013-06-28 21:27:47 -07:00
fib_trie.c fib_trie: remove potential out of bound access 2013-08-05 15:26:11 -07:00
gre_demux.c net: gre: move GSO functions to gre_offload 2013-07-03 14:37:39 -07:00
gre_offload.c gso: Update tunnel segmentation to support Tx checksum offload 2013-07-11 12:18:49 -07:00
icmp.c icmp: avoid allocating large struct on stack 2013-06-03 00:28:44 -07:00
igmp.c ip: generate unique IP identificator if local fragmentation is allowed 2013-09-19 14:11:15 -04:00
inet_connection_sock.c tcp: Remove TCPCT 2013-03-17 14:35:13 -04:00
inet_diag.c netlink: rename ssk to sk in struct netlink_skb_params 2013-04-19 14:57:56 -04:00
inet_fragment.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2013-07-09 18:24:39 -07:00
inet_hashtables.c inet: fix spacing in assignment 2013-07-11 12:02:39 -07:00
inet_lro.c ipv4: replace ip_fast_csum with csum_replace2 2013-03-15 09:12:25 -04:00
inet_timewait_sock.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
inetpeer.c ip: generate unique IP identificator if local fragmentation is allowed 2013-09-19 14:11:15 -04:00
ip_forward.c ipv4: introduce rt_uses_gateway 2012-10-08 17:42:36 -04:00
ip_fragment.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-04-22 20:32:51 -04:00
ip_gre.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-08-16 15:37:26 -07:00
ip_input.c net: add SNMP counters tracking incoming ECN bits 2013-08-08 22:24:59 -07:00
ip_options.c net/ipv4: Ensure that location of timestamp option is stored 2013-03-12 05:35:39 -04:00
ip_output.c ip: generate unique IP identificator if local fragmentation is allowed 2013-09-19 14:11:15 -04:00
ip_sockglue.c net: prevent setting ttl=0 via IP_TTL 2013-01-08 17:57:10 -08:00
ip_tunnel_core.c tunnels: harmonize cleanup done on skb on xmit path 2013-09-04 00:27:25 -04:00
ip_tunnel.c ip_tunnel: Do not use stale inner_iph pointer. 2013-09-30 15:05:07 -04:00
ip_vti.c ipip: add x-netns support 2013-08-15 01:00:20 -07:00
ipcomp.c ipv4: properly refresh rtable entries on pmtu/redirect events 2013-06-03 00:07:42 -07:00
ipconfig.c ipconfig: add informative timeout messages while waiting for carrier 2013-04-02 14:35:33 -04:00
ipip.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-09-05 14:58:52 -04:00
ipmr.c ip: generate unique IP identificator if local fragmentation is allowed 2013-09-19 14:11:15 -04:00
Kconfig net: neighbour: Remove CONFIG_ARPD 2013-09-03 21:41:43 -04:00
Makefile net: gre: move GSO functions to gre_offload 2013-07-03 14:37:39 -07:00
netfilter.c netfilter: add my copyright statements 2013-04-18 20:27:55 +02:00
ping.c net: proc_fs: trivial: print UIDs as unsigned int 2013-08-15 14:37:46 -07:00
proc.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-08-16 15:37:26 -07:00
protocol.c ipv4: Disallow non-namespace aware protocols to register. 2013-02-05 14:42:23 -05:00
raw.c net: raw: do not report ICMP redirects to user space 2013-09-24 10:15:49 -04:00
route.c ipv4: raise IP_MAX_MTU to theoretical limit 2013-08-20 15:05:04 -07:00
syncookies.c net: syncookies: export cookie_v4_init_sequence/cookie_v4_check 2013-08-28 00:27:44 +02:00
sysctl_net_ipv4.c tcp: TSO packets automatic sizing 2013-08-29 15:50:06 -04:00
tcp_bic.c tcp: fix undo after RTO for BIC 2012-01-20 14:17:26 -05:00
tcp_cong.c tcp: remove Appropriate Byte Count support 2013-02-05 14:51:16 -05:00
tcp_cubic.c tcp: cubic: fix bug in bictcp_acked() 2013-08-07 10:35:08 -07:00
tcp_diag.c inet_diag: Rename inet_diag_req into inet_diag_req_v2 2012-01-11 12:56:06 -08:00
tcp_fastopen.c tcp: add server ip to encrypt cookie in fast open 2013-08-10 00:35:33 -07:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c tcp: bool conversions 2012-05-17 14:59:59 -04:00
tcp_illinois.c net: fix divide by zero in tcp algorithm illinois 2012-11-01 11:55:59 -04:00
tcp_input.c tcp: properly increase rcv_ssthresh for ofo packets 2013-09-06 14:43:49 -04:00
tcp_ipv4.c tcp: Change return value of tcp_rcv_established() 2013-09-04 00:27:28 -04:00
tcp_lp.c
tcp_memcontrol.c memcg: rename RESOURCE_MAX to RES_COUNTER_MAX 2013-09-12 15:38:02 -07:00
tcp_metrics.c tcp: fix RTO calculated from cached RTT 2013-09-17 19:08:08 -04:00
tcp_minisocks.c tcp: consolidate SYNACK RTT sampling 2013-07-22 17:53:42 -07:00
tcp_offload.c net: tcp: move GRO/GSO functions to tcp_offload 2013-06-07 14:39:05 -07:00
tcp_output.c tcp: TSQ can use a dynamic limit 2013-09-30 20:41:57 -07:00
tcp_probe.c tcp: Change return value of tcp_rcv_established() 2013-09-04 00:27:28 -04:00
tcp_scalable.c
tcp_timer.c tcp: refactor F-RTO 2013-03-21 11:47:50 -04:00
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c tcp: refactor F-RTO 2013-03-21 11:47:50 -04:00
tcp_yeah.c
tcp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-09-05 14:58:52 -04:00
tunnel4.c net: Convert printks to pr_<level> 2012-03-11 23:42:51 -07:00
udp_diag.c netlink: rename ssk to sk in struct netlink_skb_params 2013-04-19 14:57:56 -04:00
udp_impl.h ipv4: fix checkpatch errors 2012-04-15 12:37:19 -04:00
udp_offload.c net: udp4: move GSO functions to udp_offload 2013-06-12 00:47:25 -07:00
udp.c net: udp: do not report ICMP redirects to user space 2013-09-24 10:15:49 -04:00
udplite.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
xfrm4_input.c net: Add skb_unclone() helper function. 2013-02-15 15:10:37 -05:00
xfrm4_mode_beet.c ipsec: be careful of non existing mac headers 2012-02-23 16:50:45 -05:00
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c ip: generate unique IP identificator if local fragmentation is allowed 2013-09-19 14:11:15 -04:00
xfrm4_output.c xfrm: revert ipv4 mtu determination to dst_mtu 2013-08-26 12:40:53 +02:00
xfrm4_policy.c xfrm: make gc_thresh configurable in all namespaces 2013-02-06 11:36:29 +01:00
xfrm4_state.c xfrm: make local error reporting more robust 2013-08-14 13:07:12 +02:00
xfrm4_tunnel.c sit: add IPv4 over IPv4 support 2013-05-31 17:19:05 -07:00