linux/net/ipv4
Neal Cardwell a6458ab7fd UPSTREAM: tcp: fix DSACK undo in fast recovery to call tcp_try_to_open()
In some production workloads we noticed that connections could
sometimes close extremely prematurely with ETIMEDOUT after
transmitting only one TLP and one RTO retransmission (when we would normally
expect roughly tcp_retries2 = TCP_RETR2 = 15 RTOs before a connection
closes with ETIMEDOUT).

From tracing we determined that these workloads can suffer from a
scenario where in fast recovery, after some retransmits, a DSACK undo
can happen at a point where the scoreboard is totally clear (we have
retrans_out == sacked_out == lost_out == 0). In such cases, calling
tcp_try_keep_open() means that we do not execute any code path that
clears tp->retrans_stamp to 0. That means that tp->retrans_stamp can
remain erroneously set to the start time of the undone fast recovery,
even after the fast recovery is undone. If minutes or hours elapse,
and then a TLP/RTO/RTO sequence occurs, then the start_ts value in
retransmits_timed_out() (which is from tp->retrans_stamp) will be
erroneously ancient (left over from the fast recovery undone via
DSACKs). Thus this ancient tp->retrans_stamp value can cause the
connection to die very prematurely with ETIMEDOUT via
tcp_write_err().

The fix: we change DSACK undo in fast recovery (TCP_CA_Recovery) to
call tcp_try_to_open() instead of tcp_try_keep_open(). This ensures
that if no retransmits are in flight at the time of DSACK undo in fast
recovery then we properly zero retrans_stamp. Note that calling
tcp_try_to_open() is more consistent with other loss recovery
behavior, since normal fast recovery (CA_Recovery) and RTO recovery
(CA_Loss) both normally end when tp->snd_una meets or exceeds
tp->high_seq and then in tcp_fastretrans_alert() the "default" switch
case executes tcp_try_to_open(). Also note that by inspection this
change to call tcp_try_to_open() implies at least one other nice bug
fix, where now an ECE-marked DSACK that causes an undo will properly
invoke tcp_enter_cwr() rather than ignoring the ECE mark.

Fixes: c7d9d6a185 ("tcp: undo on DSACK during recovery")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28 10:28:14 +01:00
netfilter netfilter: tproxy: bail out if IP has been disabled on the device 2024-05-29 00:37:51 +02:00
af_inet.c net: gro: initialize network_offset in network layer 2024-05-27 16:46:59 -07:00
ah4.c net: fill in MODULE_DESCRIPTION()s for ipv4 modules 2024-02-09 14:12:02 -08:00
arp.c arp: Convert ioctl(SIOCGARP) to RCU. 2024-05-01 18:37:07 -07:00
bpf_tcp_ca.c bpf: tcp: Allow to write tp->snd_cwnd_stamp in bpf_tcp_ca 2024-05-02 16:26:56 -07:00
cipso_ipv4.c cipso: make cipso_v4_skbuff_delattr() fully remove the CIPSO options 2024-06-14 08:19:54 +01:00
datagram.c ipv4: Set the routing scope properly in ip_route_output_ports(). 2024-02-12 17:33:05 -08:00
devinet.c rtnetlink: make the "split" NLM_DONE handling generic 2024-06-05 12:34:54 +01:00
esp4_offload.c xfrm: Support GRO for IPv4 ESP in UDP encapsulation 2023-10-06 07:30:40 +02:00
esp4.c ipsec-next-2024-05-03 2024-05-06 19:14:56 -07:00
fib_frontend.c rtnetlink: make the "split" NLM_DONE handling generic 2024-06-05 12:34:54 +01:00
fib_lookup.h
fib_notifier.c
fib_rules.c fib: remove unnecessary input parameters in fib_default_rule_add 2024-01-03 16:42:48 -08:00
fib_semantics.c net: add two more call_rcu_hurry() 2024-04-25 15:24:23 -07:00
fib_trie.c inet: switch inet_dump_fib() to RCU protection 2024-02-26 11:46:13 +00:00
fou_bpf.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
fou_core.c net: gro: rename skb_gro_header_hard() 2024-03-05 13:30:11 +01:00
fou_nl.c net: ynl: prefix uAPI header include with uapi/ 2023-05-26 10:30:14 +01:00
fou_nl.h net: ynl: prefix uAPI header include with uapi/ 2023-05-26 10:30:14 +01:00
gre_demux.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
gre_offload.c net: gro: rename skb_gro_header_hard() 2024-03-05 13:30:11 +01:00
icmp.c net/ipv4: add tracepoint for icmp_send 2024-05-08 10:39:26 +01:00
igmp.c ipv4: Set scope explicitly in ip_route_output(). 2024-04-08 13:20:51 +01:00
inet_connection_sock.c Fix race for duplicate reqsk on identical SYN 2024-06-25 11:37:45 +02:00
inet_diag.c inet_diag: skip over empty buckets 2024-01-23 15:13:55 +01:00
inet_fragment.c inet: frags: delay fqdir_free_fn() 2024-04-08 10:59:56 +01:00
inet_hashtables.c tcp: get rid of twsk_unique() 2024-05-09 20:25:55 -07:00
inet_timewait_sock.c tcp/dccp: do not care about families in inet_twsk_purge() 2024-04-01 21:27:58 -07:00
inetpeer.c net: ipv4: Simplify the allocation of slab caches in inet_initpeers 2024-01-31 16:39:42 -08:00
ip_forward.c net: fix IPSTATS_MIB_OUTFORWDATAGRAMS increment after fragment check 2023-10-13 09:58:45 -07:00
ip_fragment.c net: ipv{6,4}: Remove the now superfluous sentinel elements from ctl_table array 2024-05-03 13:29:42 +01:00
ip_gre.c net: annotate writes on dev->mtu from ndo_change_mtu() 2024-05-07 16:19:14 -07:00
ip_input.c inet: introduce dst_rtable() helper 2024-04-30 18:32:38 -07:00
ip_options.c
ip_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-05-02 12:06:25 -07:00
ip_sockglue.c inet: Add getsockopt support for IP_ROUTER_ALERT and IPV6_ROUTER_ALERT 2024-03-06 12:37:06 +00:00
ip_tunnel_core.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
ip_tunnel.c net: annotate writes on dev->mtu from ndo_change_mtu() 2024-05-07 16:19:14 -07:00
ip_vti.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
ipcomp.c xfrm: ipcomp: add extack to ipcomp{4,6}_init_state 2022-09-29 07:18:00 +02:00
ipconfig.c net: ipconfig: move ic_nameservers_fallback into #ifdef block 2023-05-22 11:17:55 +01:00
ipip.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
ipmr_base.c
ipmr.c ip_tunnel: use a separate struct to store tunnel params in the kernel 2024-04-01 10:49:28 +01:00
Kconfig net/tcp: Add TCP-AO config and structures 2023-10-27 10:35:44 +01:00
Makefile bpfilter: remove bpfilter 2024-01-04 10:23:10 -08:00
metrics.c ipv4: prevent potential spectre v1 gadget in ip_metrics_convert() 2023-01-23 21:37:25 -08:00
netfilter.c xfrm: pass struct net to xfrm_decode_session wrappers 2023-10-06 08:31:53 +02:00
netlink.c
nexthop.c nexthop: fix uninitialized variable in nla_put_nh_group_stats() 2024-03-22 18:03:29 -07:00
ping.c bpf-next-for-netdev 2023-10-16 21:05:33 -07:00
proc.c net: add <net/proto_memory.h> 2024-04-30 18:46:52 -07:00
protocol.c
raw_diag.c inet_diag: add module pointer to "struct inet_diag_handler" 2024-01-23 15:13:54 +01:00
raw.c ipv4: Fix uninit-value access in __ip_make_skb() 2024-05-02 10:16:35 +02:00
route.c net: fix __dst_negative_advice() race 2024-05-29 17:34:49 -07:00
syncookies.c tcp: annotate data-races around tp->window_clamp 2024-04-05 22:32:37 -07:00
sysctl_net_ipv4.c net: ipv{6,4}: Remove the now superfluous sentinel elements from ctl_table array 2024-05-03 13:29:42 +01:00
tcp_ao.c net/tcp_ao: Don't leak ao_info on error-path 2024-06-19 17:30:19 -07:00
tcp_bbr.c tcp: Add new args for cong_control in tcp_congestion_ops 2024-05-02 16:26:56 -07:00
tcp_bic.c
tcp_bpf.c tcp_bpf: properly release resources on error paths 2023-10-18 18:09:31 -07:00
tcp_cdg.c Random number generator fixes for Linux 6.1-rc1. 2022-10-16 15:27:07 -07:00
tcp_cong.c bpf, net: validate struct_ops when updating value. 2024-03-04 10:03:57 -08:00
tcp_cubic.c bpf: Remove CONFIG_X86 and CONFIG_DYNAMIC_FTRACE guard from the tcp-cc kfuncs 2024-03-28 18:31:40 -07:00
tcp_dctcp.c tcp: Fix shift-out-of-bounds in dctcp_update_alpha(). 2024-05-21 13:34:50 +02:00
tcp_dctcp.h
tcp_diag.c inet_diag: add module pointer to "struct inet_diag_handler" 2024-01-23 15:13:54 +01:00
tcp_fastopen.c inet: move inet->defer_connect to inet->inet_flags 2023-08-16 11:09:18 +01:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c
tcp_input.c UPSTREAM: tcp: fix DSACK undo in fast recovery to call tcp_try_to_open() 2024-06-28 10:28:14 +01:00
tcp_ipv4.c tcp: reduce accepted window in NEW_SYN_RECV state 2024-05-27 16:47:23 -07:00
tcp_lp.c tcp: rename tcp_time_stamp() to tcp_time_stamp_ts() 2023-10-23 09:35:01 +01:00
tcp_metrics.c tcp_metrics: use parallel_ops for tcp_metrics_nl_family 2024-04-17 18:31:53 -07:00
tcp_minisocks.c tcp: reduce accepted window in NEW_SYN_RECV state 2024-05-27 16:47:23 -07:00
tcp_nv.c
tcp_offload.c net: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment 2024-05-13 14:44:06 -07:00
tcp_output.c tcp: remove 64 KByte limit for initial tp->rcv_wnd value 2024-05-23 12:21:17 +02:00
tcp_plb.c prandom: remove prandom_u32_max() 2022-12-20 03:13:45 +01:00
tcp_rate.c
tcp_recovery.c tcp: fix excessive TLP and RACK timeouts from HZ rounding 2023-10-17 17:25:42 -07:00
tcp_scalable.c
tcp_sigpool.c net/tcp_sigpool: Use kref_get_unless_zero() 2024-01-01 14:42:05 +00:00
tcp_timer.c tcp: use signed arithmetic in tcp_rtx_probe0_timed_out() 2024-06-10 19:50:10 -07:00
tcp_ulp.c net/ulp: use consistent error code when blocking ULP 2023-01-19 09:26:16 -08:00
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tcp.c tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTAB 2024-06-05 12:32:46 +01:00
tunnel4.c net: fill in MODULE_DESCRIPTION()s for ipv4 modules 2024-02-09 14:12:02 -08:00
udp_bpf.c bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser() 2023-03-03 17:25:15 +01:00
udp_diag.c inet_diag: add module pointer to "struct inet_diag_handler" 2024-01-23 15:13:54 +01:00
udp_impl.h sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
udp_offload.c net: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment 2024-05-13 14:44:06 -07:00
udp_tunnel_core.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
udp_tunnel_nic.c udp_tunnel: Use flex array to simplify code 2023-10-03 11:39:34 +02:00
udp_tunnel_stub.c
udp.c ipsec-next-2024-05-03 2024-05-06 19:14:56 -07:00
udplite.c udplite: remove UDPLITE_BIT 2023-09-14 16:16:36 +02:00
xfrm4_input.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-05-09 10:01:01 -07:00
xfrm4_output.c
xfrm4_policy.c net: ipv{6,4}: Remove the now superfluous sentinel elements from ctl_table array 2024-05-03 13:29:42 +01:00
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c net: fill in MODULE_DESCRIPTION()s for ipv4 modules 2024-02-09 14:12:02 -08:00