linux/Documentation/networking
Eric Dumazet 65466904b0 tcp: adjust TSO packet sizes based on min_rtt
Back when tcp_tso_autosize() and TCP pacing were introduced,
our focus was really to reduce burst sizes for long distance
flows.

The simple heuristic of using sk_pacing_rate/1024 has worked
well, but it can produce packets that are too small for hosts in the
same rack/cluster, when thousands of flows compete for the bottleneck.

Neal Cardwell had the idea of making the TSO burst size
a function of both sk_pacing_rate and tcp_min_rtt().

Indeed, for local flows, sending bigger bursts reduces
CPU costs, and occasional losses can be repaired
quite fast.

This patch is based on Neal Cardwell's implementation,
done more than two years ago.
bbr adjusts max_pacing_rate based on measured bandwidth,
while cubic would overestimate max_pacing_rate.

/proc/sys/net/ipv4/tcp_tso_rtt_log can be used to tune or disable
this new feature, in logarithmic steps.
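The sizing rule described above can be sketched in a few lines of Python. This is a hedged model, not the kernel code: the function name `tso_autosize_segs`, the 64 KB cap `GSO_MAX_SIZE`, the floor `min_tso_segs=2`, and the exact shape of the RTT bonus (a term that halves for every 2^tcp_tso_rtt_log microseconds of minimum RTT) are assumptions made for illustration. Only the sk_pacing_rate/1024 budget and the dependence on tcp_min_rtt() and tcp_tso_rtt_log come from the text.

```python
GSO_MAX_SIZE = 65536  # assumed upper bound for one TSO packet, in bytes


def tso_autosize_segs(pacing_rate, min_rtt_us, mss, tso_rtt_log=9, min_tso_segs=2):
    """Return an illustrative number of MSS-sized segments per TSO packet.

    pacing_rate is in bytes per second, min_rtt_us in microseconds.
    """
    # Old heuristic: roughly 1 ms worth of data at the pacing rate.
    nbytes = pacing_rate >> 10

    # Assumed new term: a bonus that shrinks exponentially with min RTT,
    # measured in steps of 2**tso_rtt_log microseconds.
    r = min_rtt_us >> tso_rtt_log
    if r < 32:  # assumed guard: larger shifts contribute nothing anyway
        nbytes += GSO_MAX_SIZE >> r

    nbytes = min(nbytes, GSO_MAX_SIZE)
    return max(nbytes // mss, min_tso_segs)


if __name__ == "__main__":
    # Same-rack flow (assumed 50 usec min RTT), 20 MB/s pacing, 4K segments.
    print(tso_autosize_segs(20_000_000, 50, 4096, tso_rtt_log=0))
    print(tso_autosize_segs(20_000_000, 50, 4096, tso_rtt_log=9))
    # Long-distance flow (assumed 50 ms min RTT): the bonus vanishes.
    print(tso_autosize_segs(20_000_000, 50_000, 4096, tso_rtt_log=9))
```

In this model a value of 0 makes the bonus negligible for any RTT of 32 usec or more, which is one plausible reading of "disable"; larger values extend the bonus to flows with larger RTTs.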

Tested:

100Gbit NIC, two hosts in the same rack, 4K MTU.
600 flows rate-limited to 20000000 bytes per second.

Before patch: (TSO sizes would be limited to 20000000/1024/4096 -> 4 segments per TSO)
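
As a worked check of the limit quoted above, the arithmetic can be reproduced as follows. The variable names are illustrative only; the figures (20000000 bytes per second, the /1024 divisor, 4096-byte segments) are taken from the test description.

```python
rate = 20_000_000      # per-flow rate limit, bytes per second
segment_size = 4096    # segment size used in the quoted calculation

budget_bytes = rate // 1024                        # sk_pacing_rate/1024 heuristic
segs_per_tso = budget_bytes // segment_size        # whole segments that fit
bytes_per_tso = segs_per_tso * segment_size
gap_ms = 1000 * bytes_per_tso / rate               # spacing between TSO packets

print(budget_bytes, segs_per_tso, bytes_per_tso, round(gap_ms, 2))
```

Each flow therefore sends a small packet a bit more often than once per millisecond, which is where the per-packet CPU cost accumulates across 600 flows.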

~# echo 0 >/proc/sys/net/ipv4/tcp_tso_rtt_log
~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
  96005

 Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':

         65,945.29 msec task-clock                #    2.845 CPUs utilized
         1,314,632      context-switches          # 19935.279 M/sec
             5,292      cpu-migrations            #   80.249 M/sec
           940,641      page-faults               # 14264.023 M/sec
   201,117,030,926      cycles                    # 3049769.216 GHz                   (83.45%)
    17,699,435,405      stalled-cycles-frontend   #    8.80% frontend cycles idle     (83.48%)
   136,584,015,071      stalled-cycles-backend    #   67.91% backend cycles idle      (83.44%)
    53,809,530,436      instructions              #    0.27  insn per cycle
                                                  #    2.54  stalled cycles per insn  (83.36%)
     9,062,315,523      branches                  # 137422329.563 M/sec               (83.22%)
       153,008,621      branch-misses             #    1.69% of all branches          (83.32%)

      23.182970846 seconds time elapsed

TcpInSegs                       15648792           0.0
TcpOutSegs                      58659110           0.0  # Average of 3.7 4K segments per TSO packet
TcpExtTCPDelivered              58654791           0.0
TcpExtTCPDeliveredCE            19                 0.0

After patch:

~# echo 9 >/proc/sys/net/ipv4/tcp_tso_rtt_log
~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
  96046

 Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':

         48,982.58 msec task-clock                #    2.104 CPUs utilized
           186,014      context-switches          # 3797.599 M/sec
             3,109      cpu-migrations            #   63.472 M/sec
           941,180      page-faults               # 19214.814 M/sec
   153,459,763,868      cycles                    # 3132982.807 GHz                   (83.56%)
    12,069,861,356      stalled-cycles-frontend   #    7.87% frontend cycles idle     (83.32%)
   120,485,917,953      stalled-cycles-backend    #   78.51% backend cycles idle      (83.24%)
    36,803,672,106      instructions              #    0.24  insn per cycle
                                                  #    3.27  stalled cycles per insn  (83.18%)
     5,947,266,275      branches                  # 121417383.427 M/sec               (83.64%)
        87,984,616      branch-misses             #    1.48% of all branches          (83.43%)

      23.281200256 seconds time elapsed

TcpInSegs                       1434706            0.0
TcpOutSegs                      58883378           0.0  # Average of 41 4K segments per TSO packet
TcpExtTCPDelivered              58878971           0.0
TcpExtTCPDeliveredCE            9664               0.0
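
The two runs can be compared directly from the counters above. The sketch below copies the values from the perf and nstat output; the dictionary layout and helper names are illustrative. The "segments per TSO packet" annotations are numerically consistent with TcpOutSegs divided by TcpInSegs, which is an inference from the figures rather than something the output states.

```python
before = {
    "task_clock_ms": 65945.29,
    "cycles": 201_117_030_926,
    "context_switches": 1_314_632,
    "TcpInSegs": 15_648_792,
    "TcpOutSegs": 58_659_110,
}
after = {
    "task_clock_ms": 48982.58,
    "cycles": 153_459_763_868,
    "context_switches": 186_014,
    "TcpInSegs": 1_434_706,
    "TcpOutSegs": 58_883_378,
}


def out_per_in(run):
    """Segments sent per segment received (mostly ACKs on the sender)."""
    return run["TcpOutSegs"] / run["TcpInSegs"]


def reduction_pct(key):
    """Percentage drop of a counter from the 'before' run to the 'after' run."""
    return 100.0 * (1.0 - after[key] / before[key])


if __name__ == "__main__":
    print(f"out/in before: {out_per_in(before):.1f}, after: {out_per_in(after):.1f}")
    for key in ("task_clock_ms", "cycles", "context_switches"):
        print(f"{key}: {reduction_pct(key):.1f}% lower")
```

Throughput is essentially unchanged between the runs (TcpOutSegs differs by well under 1%), so the drop in task-clock and cycles reflects less CPU work for the same amount of data.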

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20220309015757.2532973-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 20:05:44 -08:00
caif tty: cumulate and document tty_struct::flow* members 2021-05-13 16:57:16 +02:00
device_drivers docs: networking: device drivers: can: add flexcan 2022-01-08 21:22:58 +01:00
devlink Documentation: devlink: mlx5.rst: Fix htmldoc build warning 2022-01-06 16:22:55 -08:00
dsa docs: net: dsa: sja1105: document limitations of tc-flower rule VLAN awareness 2022-02-27 11:06:13 +00:00
mac80211_hwsim docs: net: convert two README files to ReST format 2019-07-31 13:31:56 -06:00
6lowpan.rst docs: networking: convert 6lowpan.txt to ReST 2020-02-28 14:52:36 +01:00
6pack.rst docs: networking: convert 6pack.txt to ReST 2020-04-28 14:38:38 -07:00
af_xdp.rst doc, af_xdp: Fix bind flags option typo 2021-07-12 16:55:01 +02:00
alias.rst docs: networking: Convert alias.txt to rst 2018-07-18 15:28:27 -07:00
arcnet-hardware.rst docs: networking: arcnet-hardware.rst: don't duplicate chapter names 2020-05-01 12:24:43 -07:00
arcnet.rst Documentation: networking: arcnet: drop doubled word 2020-07-04 17:46:21 -07:00
atm.rst docs: networking: convert atm.txt to ReST 2020-04-28 14:38:38 -07:00
ax25.rst Documentation: networking: ax25: drop doubled word 2020-07-04 17:46:21 -07:00
bareudp.rst Documentation: bareudp: Corrected description of bareudp module. 2020-07-28 17:53:03 -07:00
batman-adv.rst batman-adv: Move IRC channel to hackint.org 2021-08-08 20:05:46 +02:00
bonding.rst bonding: add new option ns_ip6_target 2022-02-21 12:13:45 +00:00
bridge.rst docs: networking: Convert bridge.txt to rst 2018-07-18 15:28:27 -07:00
can_ucan_protocol.rst Documentation: networking: can_ucan_protocol: drop doubled words 2020-07-04 17:46:21 -07:00
can.rst can: add a note that RECV_OWN_MSGS frames are subject to filtering 2021-04-24 14:36:51 +02:00
cdc_mbim.rst docs: networking: convert cdc_mbim.txt to ReST 2020-04-28 14:38:39 -07:00
checksum-offloads.rst docs: networking: convert netdev-features.txt to ReST 2020-04-30 12:56:36 -07:00
dccp.rst net: dccp: Add SIOCOUTQ IOCTL support (send buffer fill) 2020-07-22 17:00:37 -07:00
dctcp.rst docs: networking: convert dctcp.txt to ReST 2020-04-28 14:38:39 -07:00
decnet.rst docs: networking: convert decnet.txt to ReST 2020-04-28 14:39:45 -07:00
dns_resolver.rst docs: networking: convert dns_resolver.txt to ReST 2020-04-28 14:39:46 -07:00
driver.rst docs: networking: convert driver.txt to ReST 2020-04-28 14:39:46 -07:00
eql.rst docs: networking: convert eql.txt to ReST 2020-04-28 14:39:46 -07:00
ethtool-netlink.rst ethtool: add support to set/get completion queue event size 2022-02-23 20:33:05 -08:00
failover.rst net: Introduce generic failover module 2018-05-28 22:59:54 -04:00
fib_trie.rst docs: networking: convert fib_trie.txt to ReST 2020-04-28 14:39:46 -07:00
filter.rst bpf, docs: Split general purpose eBPF documentation out of filter.rst 2021-11-30 10:52:11 -08:00
gen_stats.rst docs: networking: convert gen_stats.txt to ReST 2020-04-28 14:39:46 -07:00
generic_netlink.rst docs: networking: convert generic_netlink.txt to ReST 2020-04-28 14:39:46 -07:00
generic-hdlc.rst docs: networking: convert generic-hdlc.txt to ReST 2020-04-28 14:39:46 -07:00
gtp.rst docs: networking: convert gtp.txt to ReST 2020-04-28 14:39:46 -07:00
ieee802154.rst docs: net: ieee802154.rst: fix C expressions 2020-10-15 07:49:41 +02:00
ila.rst docs: networking: convert ila.txt to ReST 2020-04-28 14:39:47 -07:00
index.rst net/smc: fix document build WARNING from smc-sysctl.rst 2022-03-03 21:24:34 -08:00
ioam6-sysctl.rst ipv6: ioam: Documentation for new IOAM sysctls 2021-07-21 08:14:33 -07:00
ip_dynaddr.rst docs: networking: convert ip_dynaddr.txt to ReST 2020-04-28 14:39:47 -07:00
ip-sysctl.rst tcp: adjust TSO packet sizes based on min_rtt 2022-03-09 20:05:44 -08:00
ipddp.rst docs: networking: convert ipddp.txt to ReST 2020-04-28 14:39:47 -07:00
ipsec.rst docs: networking: convert ipsec.txt to ReST 2020-04-28 14:39:47 -07:00
ipv6.rst docs: networking: convert ipv6.txt to ReST 2020-04-28 14:40:18 -07:00
ipvlan.rst docs: networking: convert ipvlan.txt to ReST 2020-04-28 14:40:18 -07:00
ipvs-sysctl.rst netfilter: ipvs: Fix reuse connection if RS weight is 0 2021-11-08 11:42:47 +01:00
j1939.rst can: j1939: add tables for the CAN identifier and its fields 2020-11-20 09:43:29 +01:00
kapi.rst wimax: move out to staging 2020-10-29 19:27:45 +01:00
kcm.rst docs: networking: convert kcm.txt to ReST 2020-04-28 14:40:19 -07:00
l2tp.rst docs: networking: add tracepoint info to l2tp.rst 2020-08-22 12:44:37 -07:00
lapb-module.rst docs: networking: convert lapb-module.txt to ReST 2020-04-30 12:56:35 -07:00
mac80211-auth-assoc-deauth.txt
mac80211-injection.rst doc: networking: wireless: fix wiki website url 2020-06-08 10:05:53 +02:00
mctp.rst mctp: Add SIOCMCTP{ALLOC,DROP}TAG ioctls for tag control 2022-02-09 12:00:11 +00:00
mpls-sysctl.rst docs: networking: convert mpls-sysctl.txt to ReST 2020-04-30 12:56:36 -07:00
mptcp-sysctl.rst mptcp: faster active backup recovery 2021-08-14 11:37:25 +01:00
msg_zerocopy.rst docs: use the lore redirector everywhere 2021-10-12 13:58:19 -06:00
multiqueue.rst docs: networking: convert multiqueue.txt to ReST 2020-04-30 12:56:36 -07:00
net_dim.rst docs: networking: add full DIM API 2020-04-10 18:11:04 -07:00
net_failover.rst Documentation: networking: net_failover: Fix documentation 2021-11-17 13:59:49 +00:00
netconsole.rst docs: networking: convert netconsole.txt to ReST 2020-04-30 12:56:36 -07:00
netdev-FAQ.rst docs: networking: netdevsim rules 2021-08-04 12:43:27 +01:00
netdev-features.rst net: hsr: add offloading support 2021-02-11 13:24:44 -08:00
netdevices.rst net: bonding: move ioctl handling to private ndo operation 2021-07-27 20:11:45 +01:00
netfilter-sysctl.rst docs: networking: convert netfilter-sysctl.txt to ReST 2020-04-30 12:56:36 -07:00
netif-msg.rst docs: networking: convert netif-msg.txt to ReST 2020-04-30 12:56:36 -07:00
nexthop-group-resilient.rst Documentation: net: Document resilient next-hop groups 2021-03-29 13:51:38 -07:00
nf_conntrack-sysctl.rst Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2021-09-03 16:20:37 -07:00
nf_flowtable.rst docs: nf_flowtable: fix compilation and warnings 2021-03-25 17:42:02 -07:00
nfc.rst docs: networking: nfc: change to rst format 2019-11-23 11:00:19 -08:00
openvswitch.rst docs: networking: convert openvswitch.txt to ReST 2020-04-30 12:56:36 -07:00
operstates.rst docs: operstates: document IF_OPER_TESTING 2021-08-02 15:16:04 +01:00
packet_mmap.rst docs: networking: Replace strncpy() with strscpy() 2021-06-04 11:21:43 -06:00
page_pool.rst Documentation: update networking/page_pool.rst 2022-03-03 09:55:28 +00:00
phonet.rst docs: networking: convert phonet.txt to ReST 2020-04-30 12:56:37 -07:00
phy.rst net: document SMII and correct phylink's new validation mechanism 2021-11-16 19:22:30 -08:00
pktgen.rst pktgen: document the latest pktgen usage options 2021-08-25 13:44:30 +01:00
plip.rst docs: networking: convert PLIP.txt to ReST 2020-04-30 12:56:37 -07:00
ppp_generic.rst docs: update ppp_generic.rst to document new ioctls 2020-12-10 13:57:36 -08:00
proc_net_tcp.rst docs: networking: convert proc_net_tcp.txt to ReST 2020-04-30 12:56:37 -07:00
radiotap-headers.rst docs: networking: convert radiotap-headers.txt to ReST 2020-04-30 12:56:37 -07:00
rds.rst Doc: networking: Fix the title's Sphinx overline in rds.rst 2021-11-29 15:18:21 -07:00
regulatory.rst doc: networking: wireless: fix wiki website url 2020-06-08 10:05:53 +02:00
rxrpc.rst Documentation: networking: rxrpc: drop doubled word 2020-07-04 17:46:21 -07:00
scaling.rst docs: networking: update XPS to account for netif_set_xps_queue 2020-10-13 16:21:54 -07:00
sctp.rst docs: networking: convert sctp.txt to ReST 2020-04-30 12:56:38 -07:00
secid.rst docs: networking: convert secid.txt to ReST 2020-04-30 12:56:38 -07:00
seg6-sysctl.rst doc: move seg6_flowlabel to seg6-sysctl.rst 2021-04-14 13:13:15 -07:00
segmentation-offloads.rst networking: : fix typos in code comments 2019-05-20 20:24:34 -04:00
sfp-phylink.rst net: mdio: Remove of_phy_attach() 2021-02-17 13:17:49 -08:00
smc-sysctl.rst net/smc: fix document build WARNING from smc-sysctl.rst 2022-03-03 21:24:34 -08:00
snmp_counter.rst net-next: docs: Fix typos in snmp_counter.rst 2021-01-05 17:07:38 -08:00
statistics.rst docs: networking: extend the statistics documentation 2021-04-16 16:59:20 -07:00
strparser.rst docs: networking: convert strparser.txt to ReST 2020-04-30 12:56:38 -07:00
switchdev.rst Documentation: networking: switchdev: add missing "and" word 2021-03-17 12:34:34 -07:00
sysfs-tagging.rst Documentation: better locations for sysfs-pci, sysfs-tagging 2020-10-09 09:33:23 -06:00
tc-actions-env-rules.rst docs: networking: convert tc-actions-env-rules.txt to ReST 2020-04-30 12:56:38 -07:00
tcp-thin.rst docs: networking: convert tcp-thin.txt to ReST 2020-04-30 12:56:38 -07:00
team.rst docs: networking: convert team.txt to ReST 2020-04-30 12:56:38 -07:00
timestamping.rst docs: networking: Use netif_rx(). 2022-03-04 12:02:19 +00:00
tipc.rst Documentation: add more details in tipc.rst 2021-07-01 13:18:18 -07:00
tls-offload-layers.svg Documentation: add TLS offload documentation 2019-05-22 12:18:20 -07:00
tls-offload-reorder-bad.svg Documentation: add TLS offload documentation 2019-05-22 12:18:20 -07:00
tls-offload-reorder-good.svg Documentation: add TLS offload documentation 2019-05-22 12:18:20 -07:00
tls-offload.rst net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled 2021-01-19 15:58:05 -08:00
tls.rst net/tls: add TlsDeviceRxResync statistic 2019-10-05 16:29:00 -07:00
tproxy.rst docs: networking: convert tproxy.txt to ReST 2020-04-30 12:56:38 -07:00
tuntap.rst docs: networking: Replace strncpy() with strscpy() 2021-06-04 11:21:43 -06:00
udplite.rst docs: networking: convert udplite.txt to ReST 2020-05-01 12:24:40 -07:00
vrf.rst doc: Document unexpected tcp_l3mdev_accept=1 behavior 2021-08-23 11:53:24 +01:00
vxlan.rst docs: vxlan: add info about device features 2020-09-28 12:50:12 -07:00
x25-iface.rst net: x25: Queue received packets in the drivers instead of per-CPU queues 2021-04-05 11:42:12 -07:00
x25.rst net: x25: Remove unimplemented X.25-over-LLC code stubs 2020-12-12 17:15:33 -08:00
xfrm_device.rst docs: networking: Fix a typo 2021-03-20 19:02:42 -07:00
xfrm_proc.rst docs: networking: convert xfrm_proc.txt to ReST 2020-05-01 12:24:40 -07:00
xfrm_sync.rst docs: networking: convert xfrm_sync.txt to ReST 2020-05-01 12:24:41 -07:00
xfrm_sysctl.rst docs: networking: convert xfrm_sysctl.txt to ReST 2020-05-01 12:24:41 -07:00