linux/net/ipv4
Matthieu Baerts (NGI0) c166829268 tcp: process the 3rd ACK with sk_socket for TFO/MPTCP
The 'Fixes' commit recently changed the behaviour of TCP by skipping the
processing of the 3rd ACK when a sk->sk_socket is set. The goal was to
skip tcp_ack_snd_check() in tcp_rcv_state_process() not to send an
unnecessary ACK in case of simultaneous connect(). Unfortunately, that
had an impact on TFO and MPTCP.

I started to look at the impact on MPTCP, because the MPTCP CI found
some issues with the MPTCP Packetdrill tests [1]. Then Paolo Abeni
suggested me to look at the impact on TFO with "plain" TCP.

For MPTCP, when receiving the 3rd ACK of a request adding a new path
(MP_JOIN), sk->sk_socket will be set, and point to the MPTCP sock that
has been created when the MPTCP connection got established before with
the first path. The newly added 'goto' will then skip the processing of
the segment text (step 7) and not go through tcp_data_queue() where the
MPTCP options are validated, and some actions are triggered, e.g.
sending the MPJ 4th ACK [2] as demonstrated by the new errors when
running a packetdrill test [3] establishing a second subflow.

This doesn't fully break MPTCP, mainly the 4th MPJ ACK that will be
delayed. Still, we don't want to have this behaviour as it delays the
switch to the fully established mode, and invalid MPTCP options in this
3rd ACK will not be caught any more. This modification also affects the
MPTCP + TFO feature as well, and being the reason why the selftests
started to be unstable the last few days [4].

For TFO, the existing 'basic-cookie-not-reqd' test [5] was no longer
passing: if the 3rd ACK contains data, and the connection is accept()ed
before receiving them, these data would no longer be processed, and thus
not ACKed.

One last thing about MPTCP, in case of simultaneous connect(), a
fallback to TCP will be done, which seems fine:

  `../common/defaults.sh`

   0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_MPTCP) = 3
  +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)

  +0 > S  0:0(0)                 <mss 1460, sackOK, TS val 100 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
  +0 < S  0:0(0) win 1000        <mss 1460, sackOK, TS val 407 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
  +0 > S. 0:0(0) ack 1           <mss 1460, sackOK, TS val 330 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
  +0 < S. 0:0(0) ack 1 win 65535 <mss 1460, sackOK, TS val 700 ecr 100, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey=2]>
  +0 >  . 1:1(0) ack 1           <nop, nop, TS val 845707014 ecr 700, nop, nop, sack 0:1>

Simultaneous SYN-data crossing is also not supported by TFO, see [6].

Kuniyuki Iwashima suggested to restrict the processing to SYN+ACK only:
that's a more generic solution than the one initially proposed, and
also enough to fix the issues described above.

Later on, Eric Dumazet mentioned that an ACK should still be sent in
reaction to the second SYN+ACK that is received: not sending a DUPACK
here seems wrong and could hurt:

   0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3
  +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)

  +0 > S  0:0(0)                <mss 1460, sackOK, TS val 1000 ecr 0,nop,wscale 8>
  +0 < S  0:0(0)       win 1000 <mss 1000, sackOK, nop, nop>
  +0 > S. 0:0(0) ack 1          <mss 1460, sackOK, TS val 3308134035 ecr 0,nop,wscale 8>
  +0 < S. 0:0(0) ack 1 win 1000 <mss 1000, sackOK, nop, nop>
  +0 >  . 1:1(0) ack 1          <nop, nop, sack 0:1>  // <== Here

So in this version, the 'goto consume' is dropped, to always send an ACK
when switching from TCP_SYN_RECV to TCP_ESTABLISHED. This ACK will be
seen as a DUPACK -- with DSACK if SACK has been negotiated -- in case of
simultaneous SYN crossing: that's what is expected here.

Link: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/9936227696 [1]
Link: https://datatracker.ietf.org/doc/html/rfc8684#fig_tokens [2]
Link: https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/mptcp/syscalls/accept.pkt#L28 [3]
Link: https://netdev.bots.linux.dev/contest.html?executor=vmksft-mptcp-dbg&test=mptcp-connect-sh [4]
Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/server/basic-cookie-not-reqd.pkt#L21 [5]
Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/client/simultaneous-fast-open.pkt [6]
Fixes: 23e89e8ee7 ("tcp: Don't drop SYN+ACK for simultaneous connect().")
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240724-upstream-net-next-20240716-tcp-3rd-ack-consume-sk_socket-v3-1-d48339764ce9@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-25 12:58:19 +02:00
..
netfilter netfilter: tproxy: bail out if IP has been disabled on the device 2024-05-29 00:37:51 +02:00
af_inet.c net: gro: initialize network_offset in network layer 2024-05-27 16:46:59 -07:00
ah4.c net: fill in MODULE_DESCRIPTION()s for ipv4 modules 2024-02-09 14:12:02 -08:00
arp.c arp: Convert ioctl(SIOCGARP) to RCU. 2024-05-01 18:37:07 -07:00
bpf_tcp_ca.c bpf: pass bpf_struct_ops_link to callbacks in bpf_struct_ops. 2024-05-30 15:34:13 -07:00
cipso_ipv4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-06-20 13:49:59 -07:00
datagram.c ipv4: Set the routing scope properly in ip_route_output_ports(). 2024-02-12 17:33:05 -08:00
devinet.c rtnetlink: make the "split" NLM_DONE handling generic 2024-06-05 12:34:54 +01:00
esp4_offload.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-07-15 13:19:17 -07:00
esp4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-07-15 13:19:17 -07:00
fib_frontend.c rtnetlink: make the "split" NLM_DONE handling generic 2024-06-05 12:34:54 +01:00
fib_lookup.h
fib_notifier.c
fib_rules.c fib: remove unnecessary input parameters in fib_default_rule_add 2024-01-03 16:42:48 -08:00
fib_semantics.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-07-15 13:19:17 -07:00
fib_trie.c ipv4: Fix incorrect TOS in route get reply 2024-07-18 11:11:02 +02:00
fou_bpf.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
fou_core.c fou: remove warn in gue_gro_receive on unsupported protocol 2024-06-17 17:49:50 -07:00
fou_nl.c net: ynl: prefix uAPI header include with uapi/ 2023-05-26 10:30:14 +01:00
fou_nl.h net: ynl: prefix uAPI header include with uapi/ 2023-05-26 10:30:14 +01:00
gre_demux.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
gre_offload.c net: gro: rename skb_gro_header_hard() 2024-03-05 13:30:11 +01:00
icmp.c net/ipv4: add tracepoint for icmp_send 2024-05-08 10:39:26 +01:00
igmp.c ipv4: Set scope explicitly in ip_route_output(). 2024-04-08 13:20:51 +01:00
inet_connection_sock.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-06-27 12:14:11 -07:00
inet_diag.c inet_diag: Initialize pad field in struct inet_diag_req_v2 2024-07-04 15:25:09 +02:00
inet_fragment.c net: Rename mono_delivery_time to tstamp_type for scalabilty 2024-05-23 14:14:23 -07:00
inet_hashtables.c tcp: get rid of twsk_unique() 2024-05-09 20:25:55 -07:00
inet_timewait_sock.c tcp: move inet_twsk_schedule helper out of header 2024-06-10 11:54:18 +01:00
inetpeer.c net: ipv4: Simplify the allocation of slab caches in inet_initpeers 2024-01-31 16:39:42 -08:00
ip_forward.c net: fix IPSTATS_MIB_OUTFORWDATAGRAMS increment after fragment check 2023-10-13 09:58:45 -07:00
ip_fragment.c net: Rename mono_delivery_time to tstamp_type for scalabilty 2024-05-23 14:14:23 -07:00
ip_gre.c net: annotate writes on dev->mtu from ndo_change_mtu() 2024-05-07 16:19:14 -07:00
ip_input.c inet: introduce dst_rtable() helper 2024-04-30 18:32:38 -07:00
ip_options.c
ip_output.c net: Add additional bit to support clockid_t timestamp type 2024-05-23 14:14:36 -07:00
ip_sockglue.c inet: Add getsockopt support for IP_ROUTER_ALERT and IPV6_ROUTER_ALERT 2024-03-06 12:37:06 +00:00
ip_tunnel_core.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
ip_tunnel.c ip_tunnel: Move stats allocation to core 2024-06-11 19:24:37 -07:00
ip_vti.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
ipcomp.c xfrm: ipcomp: add extack to ipcomp{4,6}_init_state 2022-09-29 07:18:00 +02:00
ipconfig.c net: ipconfig: move ic_nameservers_fallback into #ifdef block 2023-05-22 11:17:55 +01:00
ipip.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
ipmr_base.c
ipmr.c ip_tunnel: use a separate struct to store tunnel params in the kernel 2024-04-01 10:49:28 +01:00
Kconfig net/tcp: Add TCP-AO config and structures 2023-10-27 10:35:44 +01:00
Makefile bpfilter: remove bpfilter 2024-01-04 10:23:10 -08:00
metrics.c net: remove NULL-pointer net parameter in ip_metrics_convert 2024-06-05 10:06:00 +01:00
netfilter.c xfrm: pass struct net to xfrm_decode_session wrappers 2023-10-06 08:31:53 +02:00
netlink.c
nexthop.c net: nexthop: Initialize all fields in dumped nexthops 2024-07-24 15:13:43 +01:00
ping.c ping: use sk_skb_reason_drop to free rx packets 2024-06-19 12:44:22 +01:00
proc.c net: add <net/proto_memory.h> 2024-04-30 18:46:52 -07:00
protocol.c
raw_diag.c inet_diag: add module pointer to "struct inet_diag_handler" 2024-01-23 15:13:54 +01:00
raw.c net: raw: use sk_skb_reason_drop to free rx packets 2024-06-19 12:44:22 +01:00
route.c ipv4: Fix incorrect source address in Record Route option 2024-07-23 10:28:16 +02:00
syncookies.c tcp: use sk_skb_reason_drop to free rx packets 2024-06-19 12:44:22 +01:00
sysctl_net_ipv4.c net: ipv4: Add a sysctl to set multipath hash seed 2024-06-12 16:42:11 -07:00
tcp_ao.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-06-20 13:49:59 -07:00
tcp_bbr.c tcp: Add new args for cong_control in tcp_congestion_ops 2024-05-02 16:26:56 -07:00
tcp_bic.c
tcp_bpf.c tcp_bpf: properly release resources on error paths 2023-10-18 18:09:31 -07:00
tcp_cdg.c Random number generator fixes for Linux 6.1-rc1. 2022-10-16 15:27:07 -07:00
tcp_cong.c tcp: Replace strncpy() with strscpy() 2024-07-16 07:52:15 -07:00
tcp_cubic.c bpf: Remove CONFIG_X86 and CONFIG_DYNAMIC_FTRACE guard from the tcp-cc kfuncs 2024-03-28 18:31:40 -07:00
tcp_dctcp.c tcp: Fix shift-out-of-bounds in dctcp_update_alpha(). 2024-05-21 13:34:50 +02:00
tcp_dctcp.h
tcp_diag.c inet_diag: add module pointer to "struct inet_diag_handler" 2024-01-23 15:13:54 +01:00
tcp_fastopen.c net: use unrcu_pointer() helper 2024-06-06 11:52:52 +02:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c
tcp_input.c tcp: process the 3rd ACK with sk_socket for TFO/MPTCP 2024-07-25 12:58:19 +02:00
tcp_ipv4.c net/ipv4: Use nested-BH locking for ipv4_tcp_sk. 2024-06-24 16:41:22 -07:00
tcp_lp.c tcp: rename tcp_time_stamp() to tcp_time_stamp_ts() 2023-10-23 09:35:01 +01:00
tcp_metrics.c tcp_metrics: validate source addr length 2024-07-01 09:40:36 +01:00
tcp_minisocks.c tcp: Don't access uninit tcp_rsk(req)->ao_keyid in tcp_create_openreq_child(). 2024-07-16 11:56:14 +02:00
tcp_nv.c
tcp_offload.c net: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment 2024-05-13 14:44:06 -07:00
tcp_output.c net/tcp: Add tcp-md5 and tcp-ao tracepoints 2024-06-12 06:39:04 +01:00
tcp_plb.c prandom: remove prandom_u32_max() 2022-12-20 03:13:45 +01:00
tcp_rate.c
tcp_recovery.c tcp: fix excessive TLP and RACK timeouts from HZ rounding 2023-10-17 17:25:42 -07:00
tcp_scalable.c
tcp_sigpool.c net/tcp_sigpool: Use nested-BH locking for sigpool_scratch. 2024-06-24 16:41:22 -07:00
tcp_timer.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-07-11 12:58:13 -07:00
tcp_ulp.c net/ulp: use consistent error code when blocking ULP 2023-01-19 09:26:16 -08:00
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tcp.c net/tcp: Remove tcp_hash_fail() 2024-06-12 06:39:04 +01:00
tunnel4.c net: fill in MODULE_DESCRIPTION()s for ipv4 modules 2024-02-09 14:12:02 -08:00
udp_bpf.c bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser() 2023-03-03 17:25:15 +01:00
udp_diag.c inet_diag: add module pointer to "struct inet_diag_handler" 2024-01-23 15:13:54 +01:00
udp_impl.h sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
udp_offload.c udp: Allow GSO transmit from devices with no checksum offload 2024-06-28 18:12:59 -07:00
udp_tunnel_core.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
udp_tunnel_nic.c udp_tunnel: Use flex array to simplify code 2023-10-03 11:39:34 +02:00
udp_tunnel_stub.c
udp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-07-11 12:58:13 -07:00
udplite.c udplite: remove UDPLITE_BIT 2023-09-14 16:16:36 +02:00
xfrm4_input.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-05-09 10:01:01 -07:00
xfrm4_output.c
xfrm4_policy.c net: ipv{6,4}: Remove the now superfluous sentinel elements from ctl_table array 2024-05-03 13:29:42 +01:00
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c net: fill in MODULE_DESCRIPTION()s for ipv4 modules 2024-02-09 14:12:02 -08:00