linux/net/ipv4
Eric Dumazet 4bfe744ff1 tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT
I had this bug sitting for too long in my pile, it is time to fix it.

Thanks to Doug Porter for reminding me of it!

We had various attempts in the past, including commit
0cbe6a8f08 ("tcp: remove SOCK_QUEUE_SHRUNK"),
but the issue is that TCP stack currently only generates
EPOLLOUT from input path, when tp->snd_una has advanced
and skb(s) cleaned from rtx queue.

If a flow has a big RTT, and/or receives SACKs, it is possible
that the notsent part (tp->write_seq - tp->snd_nxt) reaches 0
and no more data can be sent until tp->snd_una finally advances.

What is needed is to also check if POLLOUT needs to be generated
whenever tp->snd_nxt is advanced, from output path.

This bug triggers more often after an idle period, as
we do not receive ACK for at least one RTT. tcp_notsent_lowat
could be a fraction of what CWND and pacing rate would allow to
send during this RTT.

In a followup patch, I will remove the bogus call
to tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED)
from tcp_check_space(). Fact that we have decided to generate
an EPOLLOUT does not mean the application has immediately
refilled the transmit queue. This optimistic call
might have been the reason the bug seemed not too serious.

Tested:

200 ms rtt, 1% packet loss, 32 MB tcp_rmem[2] and tcp_wmem[2]

$ echo 500000 >/proc/sys/net/ipv4/tcp_notsent_lowat
$ cat bench_rr.sh
SUM=0
for i in {1..10}
do
 V=`netperf -H remote_host -l30 -t TCP_RR -- -r 10000000,10000 -o LOCAL_BYTES_SENT | egrep -v "MIGRATED|Bytes"`
 echo $V
 SUM=$(($SUM + $V))
done
echo SUM=$SUM

Before patch:
$ bench_rr.sh
130000000
80000000
140000000
140000000
140000000
140000000
130000000
40000000
90000000
110000000
SUM=1140000000

After patch:
$ bench_rr.sh
430000000
590000000
530000000
450000000
450000000
350000000
450000000
490000000
480000000
460000000
SUM=4680000000  # This is 410 % of the value before patch.

Fixes: c9bee3b7fd ("tcp: TCP_NOTSENT_LOWAT socket option")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Doug Porter <dsp@fb.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-25 12:07:45 +01:00
..
bpfilter net: Revert "net: optimize the sockptr_t for unified kernel/user address spaces" 2020-08-10 12:06:44 -07:00
netfilter netfilter: flowtable: Remove the empty file 2022-04-25 10:37:33 +02:00
af_inet.c gso: do not skip outer ip header in case of ipip and net_failover 2022-02-21 11:41:30 +00:00
ah4.c Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
arp.c net: neigh: add skb drop reasons to arp_error_report() 2022-02-26 12:53:59 +00:00
bpf_tcp_ca.c bpf: reject program if a __user tagged memory accessed in kernel way 2022-01-27 12:03:46 -08:00
cipso_ipv4.c NET: IPV4: fix error "do not initialise globals to 0" 2021-09-19 12:43:56 +01:00
datagram.c net/ipv4/datagram.c: remove superfluous header files from datagram.c 2021-09-29 11:39:33 +01:00
devinet.c net: Add new protocol attribute to IP addresses 2022-02-18 21:20:06 -08:00
esp4_offload.c net: Fix esp GSO on inter address family tunnels. 2022-03-07 13:14:04 +01:00
esp4.c esp: limit skb_page_frag_refill use to a single page 2022-04-13 10:16:11 +02:00
fib_frontend.c net: Add l3mdev index to flow struct and avoid oif reset for port devices 2022-03-15 20:20:02 -07:00
fib_lookup.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-02-17 11:44:20 -08:00
fib_notifier.c net: ipv4: remove superfluous header files from fib_notifier.c 2021-09-28 17:32:56 -07:00
fib_rules.c ipv4: Reject again rules with high DSCP values 2022-02-10 15:33:33 +00:00
fib_semantics.c net: ipv4: fix route with nexthop object delete warning 2022-04-01 12:09:17 +01:00
fib_trie.c net: Add l3mdev index to flow struct and avoid oif reset for port devices 2022-03-15 20:20:02 -07:00
fou.c gro: remove rcu_read_lock/rcu_read_unlock from gro_complete handlers 2021-11-24 17:21:42 -08:00
gre_demux.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
gre_offload.c gro: remove rcu_read_lock/rcu_read_unlock from gro_complete handlers 2021-11-24 17:21:42 -08:00
icmp.c ipv4: do not use per netns icmp sockets 2022-01-25 11:25:21 +00:00
igmp.c ipv4: drop unused assignment 2021-11-14 12:20:44 +00:00
inet_connection_sock.c tcp: Use BPF timeout setting for SYN ACK RTO 2022-02-02 14:45:18 +00:00
inet_diag.c inet_diag: fix kernel-infoleak for UDP sockets 2021-12-10 21:14:49 -08:00
inet_fragment.c net: ip: Handle delivery_time in ip defrag 2022-03-03 14:38:48 +00:00
inet_hashtables.c tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH. 2022-02-09 21:28:36 -08:00
inet_timewait_sock.c tcp: allocate tcp_death_row outside of struct netns_ipv4 2022-01-26 19:00:31 -08:00
inetpeer.c inetpeer: use div64_ul() and clamp_val() calculate inet_peer_threshold 2021-03-01 13:32:12 -08:00
ip_forward.c net: Add skb_clear_tstamp() to keep the mono delivery_time 2022-03-03 14:38:48 +00:00
ip_fragment.c net: ip: Handle delivery_time in ip defrag 2022-03-03 14:38:48 +00:00
ip_gre.c ip_gre, ip6_gre: Fix race condition on o_seqno in collect_md mode 2022-04-25 11:40:45 +01:00
ip_input.c net: Postpone skb_clear_delivery_time() until knowing the skb is delivered locally 2022-03-03 14:38:48 +00:00
ip_options.c ipv4: drop fragmentation code from ip_options_build() 2022-01-29 17:53:07 +00:00
ip_output.c net: Set skb->mono_delivery_time and clear it after sch_handle_ingress() 2022-03-03 14:38:48 +00:00
ip_sockglue.c ipv4: Exposing __ip_sock_set_tos() in ip.h 2021-11-20 14:11:00 +00:00
ip_tunnel_core.c net: ip_tunnel: clean up endianness conversions 2021-01-08 19:25:35 -08:00
ip_tunnel.c net: Handle l3mdev in ip_tunnel_init_flow 2022-04-15 14:27:30 -07:00
ip_vti.c ip: use dev_addr_set() in tunnels 2021-10-13 09:41:37 -07:00
ipcomp.c Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
ipconfig.c net: ipconfig: Release the rtnl_lock while waiting for carrier 2021-10-28 14:36:41 +01:00
ipip.c ip: use dev_addr_set() in tunnels 2021-10-13 09:41:37 -07:00
ipmr_base.c
ipmr.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-02-10 17:29:56 -08:00
Kconfig net: ipv4: remove duplicate "the the" phrase in Kconfig text 2020-08-18 16:02:16 -07:00
Makefile bpf: Clean up sockmap related Kconfigs 2021-02-26 12:28:03 -08:00
metrics.c treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
netfilter.c netfilter: Dissect flow after packet mangling 2021-04-18 22:04:16 +02:00
netlink.c
nexthop.c nexthop: change nexthop_net_exit() to nexthop_net_exit_batch() 2022-02-08 20:41:33 -08:00
ping.c ping: remove pr_err from ping_lookup 2022-02-24 09:18:29 -08:00
proc.c tcp: allocate tcp_death_row outside of struct netns_ipv4 2022-01-26 19:00:31 -08:00
protocol.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
raw_diag.c net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
raw.c Networking fixes for 5.17-rc2, including fixes from netfilter and can. 2022-01-27 20:58:39 +02:00
route.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-03-23 10:53:49 -07:00
syncookies.c net: align static siphash keys 2021-11-16 19:07:54 -08:00
sysctl_net_ipv4.c tcp: adjust TSO packet sizes based on min_rtt 2022-03-09 20:05:44 -08:00
tcp_bbr.c bpf: Remove check_kfunc_call callback and old kfunc BTF ID API 2022-01-18 14:26:41 -08:00
tcp_bic.c
tcp_bpf.c bpf, sockmap: Fix double uncharge the mem of sk_msg 2022-03-15 16:43:31 +01:00
tcp_cdg.c
tcp_cong.c tcp: unexport tcp_ca_get_key_by_name and tcp_ca_get_name_by_key 2022-03-11 22:51:40 -08:00
tcp_cubic.c bpf: Remove check_kfunc_call callback and old kfunc BTF ID API 2022-01-18 14:26:41 -08:00
tcp_dctcp.c bpf: Remove check_kfunc_call callback and old kfunc BTF ID API 2022-01-18 14:26:41 -08:00
tcp_dctcp.h
tcp_diag.c
tcp_fastopen.c net/ipv4/tcp_fastopen.c: remove superfluous header files from tcp_fastopen.c 2021-09-20 13:09:06 +01:00
tcp_highspeed.c Replace HTTP links with HTTPS ones: IPv* 2020-07-06 13:23:03 -07:00
tcp_htcp.c Replace HTTP links with HTTPS ones: IPv* 2020-07-06 13:23:03 -07:00
tcp_hybla.c
tcp_illinois.c
tcp_input.c tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT 2022-04-25 12:07:45 +01:00
tcp_ipv4.c tcp: adjust TSO packet sizes based on min_rtt 2022-03-09 20:05:44 -08:00
tcp_lp.c ipv4: tcp_lp.c: Couple of typo fixes 2021-03-28 17:31:13 -07:00
tcp_metrics.c fixes-v5.11 2020-12-14 16:40:27 -08:00
tcp_minisocks.c tcp: md5: incorrect tcp_header_len for incoming connections 2022-04-22 15:05:59 -07:00
tcp_nv.c net/ipv4/tcp_nv.c: remove superfluous header files from tcp_nv.c 2021-09-27 12:47:39 +01:00
tcp_offload.c net: move gro definitions to include/net/gro.h 2021-11-16 13:16:54 +00:00
tcp_output.c tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT 2022-04-25 12:07:45 +01:00
tcp_rate.c tcp: ensure to use the most recently sent skb when filling the rate sample 2022-04-22 15:20:47 -07:00
tcp_recovery.c tcp: more accurately check DSACKs to grow RACK reordering window 2021-07-27 20:07:21 +01:00
tcp_scalable.c net: ipv4: delete repeated words 2020-08-24 17:31:20 -07:00
tcp_timer.c net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
tcp_ulp.c
tcp_vegas.c tcp: use semicolons rather than commas to separate statements 2020-10-13 17:11:52 -07:00
tcp_vegas.h
tcp_veno.c Replace HTTP links with HTTPS ones: IPv* 2020-07-06 13:23:03 -07:00
tcp_westwood.c
tcp_yeah.c tcp_yeah: check struct yeah size at compile time 2021-06-29 11:54:36 -07:00
tcp.c tcp: autocork: take MSG_EOR hint into consideration 2022-03-09 20:05:20 -08:00
tunnel4.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
udp_bpf.c net: Implement ->sock_is_readable() for UDP and AF_UNIX 2021-10-26 12:29:33 -07:00
udp_diag.c net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
udp_impl.h net: pass a sockptr_t into ->setsockopt 2020-07-24 15:41:54 -07:00
udp_offload.c gro: remove rcu_read_lock/rcu_read_unlock from gro_complete handlers 2021-11-24 17:21:42 -08:00
udp_tunnel_core.c net/ipv4/udp_tunnel_core.c: remove superfluous header files from udp_tunnel_core.c 2021-09-21 10:17:20 +01:00
udp_tunnel_nic.c udp_tunnel: Fix end of loop test in udp_tunnel_nic_unregister() 2022-02-23 12:35:00 +00:00
udp_tunnel_stub.c udp_tunnel: add central NIC RX port offload infrastructure 2020-07-10 13:54:00 -07:00
udp.c net: udp: use kfree_skb_reason() in __udp_queue_rcv_skb() 2022-02-07 11:18:49 +00:00
udplite.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
xfrm4_input.c
xfrm4_output.c
xfrm4_policy.c net: Add l3mdev index to flow struct and avoid oif reset for port devices 2022-03-15 20:20:02 -07:00
xfrm4_protocol.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
xfrm4_state.c
xfrm4_tunnel.c net/ipv4/xfrm4_tunnel.c: remove superfluous header files from xfrm4_tunnel.c 2021-09-23 10:10:00 +02:00