linux/include/net
mfreemon@cloudflare.com b650d953cd tcp: enforce receive buffer memory limits by allowing the tcp window to shrink
Under certain circumstances, the tcp receive buffer memory limit
set by autotuning (sk_rcvbuf) is increased due to incoming data
packets as a result of the window not closing when it should be.
This can result in the receive buffer growing all the way up to
tcp_rmem[2], even for tcp sessions with a low BDP.

To reproduce:  Connect a TCP session with the receiver doing
nothing and the sender sending small packets (an infinite loop
of socket send() with 4 bytes of payload with a sleep of 1 ms
in between each send()).  This will cause the tcp receive buffer
to grow all the way up to tcp_rmem[2].

As a result, a host can have individual tcp sessions with receive
buffers of size tcp_rmem[2], and the host itself can reach tcp_mem
limits, causing the host to go into tcp memory pressure mode.

The fundamental issue is the relationship between the granularity
of the window scaling factor and the number of byte ACKed back
to the sender.  This problem has previously been identified in
RFC 7323, appendix F [1].

The Linux kernel currently adheres to never shrinking the window.

In addition to the overallocation of memory mentioned above, the
current behavior is functionally incorrect, because once tcp_rmem[2]
is reached when no remediations remain (i.e. tcp collapse fails to
free up any more memory and there are no packets to prune from the
out-of-order queue), the receiver will drop in-window packets
resulting in retransmissions and an eventual timeout of the tcp
session.  A receive buffer full condition should instead result
in a zero window and an indefinite wait.

In practice, this problem is largely hidden for most flows.  It
is not applicable to mice flows.  Elephant flows can send data
fast enough to "overrun" the sk_rcvbuf limit (in a single ACK),
triggering a zero window.

But this problem does show up for other types of flows.  Examples
are websockets and other type of flows that send small amounts of
data spaced apart slightly in time.  In these cases, we directly
encounter the problem described in [1].

RFC 7323, section 2.4 [2], says there are instances when a retracted
window can be offered, and that TCP implementations MUST ensure
that they handle a shrinking window, as specified in RFC 1122,
section 4.2.2.16 [3].  All prior RFCs on the topic of tcp window
management have made clear that sender must accept a shrunk window
from the receiver, including RFC 793 [4] and RFC 1323 [5].

This patch implements the functionality to shrink the tcp window
when necessary to keep the right edge within the memory limit by
autotuning (sk_rcvbuf).  This new functionality is enabled with
the new sysctl: net.ipv4.tcp_shrink_window

Additional information can be found at:
https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it/

[1] https://www.rfc-editor.org/rfc/rfc7323#appendix-F
[2] https://www.rfc-editor.org/rfc/rfc7323#section-2.4
[3] https://www.rfc-editor.org/rfc/rfc1122#page-91
[4] https://www.rfc-editor.org/rfc/rfc793
[5] https://www.rfc-editor.org/rfc/rfc1323

Signed-off-by: Mike Freemon <mfreemon@cloudflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-17 09:53:53 +01:00
..
9p 9p: Add additional debug flags and open modes 2023-03-27 02:33:48 +00:00
bluetooth Bluetooth: ISO: use correct CIS order in Set CIG Parameters event 2023-06-05 17:14:07 -07:00
caif net: remove the caif_hsi driver 2021-07-01 13:19:48 -07:00
iucv net/af_iucv: Use struct_group() to zero struct iucv_sock region 2021-11-19 11:52:25 +00:00
mana net: mana: Fix perf regression: remove rx_cqes, tx_cqes counters 2023-05-30 12:05:22 +02:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-06-15 22:19:41 -07:00
netns tcp: enforce receive buffer memory limits by allowing the tcp window to shrink 2023-06-17 09:53:53 +01:00
nfc NFC: add NCI_UNREG flag to eliminate the race 2021-11-17 20:17:05 -08:00
phonet net: ioctl: Use kernel memory on protocol ioctl callbacks 2023-06-15 22:33:26 -07:00
sctp sctp: delete the nested flexible array peer_init 2023-04-21 08:19:30 +01:00
tc_act net/sched: act_connmark: transition to percpu stats and rcu 2023-02-16 10:39:28 +01:00
6lowpan.h
act_api.h net/sched: Rename user cookie and act cookie 2023-02-20 16:46:10 -08:00
addrconf.h ipv6: constify inet6_mc_check() 2023-03-17 08:56:37 +00:00
af_ieee802154.h
af_rxrpc.h rxrpc: Fix timeout of a call that hasn't yet been granted a channel 2023-05-01 07:43:19 +01:00
af_unix.h af_unix: preserve const qualifier in unix_sk() 2023-03-18 12:23:33 +00:00
af_vsock.h vsock: support sockmap 2023-03-29 08:19:38 +01:00
ah.h
amt.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
arp.h neighbour: switch to standard rcu, instead of rcu_bh 2023-03-21 21:32:18 -07:00
atmclip.h
ax25.h x25: preserve const qualifier in [a]x25_sk() 2023-03-18 12:23:34 +00:00
ax88796.h ax88796: Fix some typo in a comment 2022-08-09 22:14:02 -07:00
bareudp.h bareudp: Move definition of struct bareudp_conf to bareudp.c 2021-12-13 12:34:09 +00:00
bond_3ad.h net: bonding: Share lacpdu_mcast_addr definition 2022-09-16 14:34:01 +01:00
bond_alb.h bonding (gcc13): synchronize bond_{a,t}lb_xmit() types 2022-11-02 20:38:13 -07:00
bond_options.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
bonding.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-05-25 19:57:39 -07:00
bpf_sk_storage.h bpf: struct sock is declared twice in bpf_sk_storage header 2021-03-26 17:43:55 +01:00
busy_poll.h net: Fix a data-race around sysctl_net_busy_poll. 2022-08-24 13:46:58 +01:00
calipso.h
cfg80211-wext.h wifi: cfg80211: Avoid clashing function prototypes 2022-11-16 11:31:47 +02:00
cfg80211.h wifi: cfg80211: add a work abstraction with special semantics 2023-06-07 19:53:15 +02:00
cfg802154.h ieee802154: Add support for user beaconing requests 2023-01-28 13:51:22 +01:00
checksum.h net: checksum: drop the linux/uaccess.h include 2023-01-27 11:19:46 +00:00
cipso_ipv4.h
cls_cgroup.h
codel_impl.h codel: remove unnecessary sock.h include 2021-12-22 15:03:47 -08:00
codel_qdisc.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
codel.h codel: remove unnecessary pkt_sched.h include 2021-12-22 15:03:51 -08:00
compat.h net: copy from user before calling __get_compat_msghdr 2022-07-24 18:39:17 -06:00
datalink.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
dcbevent.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
dcbnl.h net: dcb: add helper functions to retrieve PCP and DSCP rewrite maps 2023-01-20 09:33:22 +00:00
devlink.h devlink: bring port new reply back 2023-06-01 21:37:32 -07:00
dropreason-core.h net: extend drop reasons for multiple subsystems 2023-04-20 20:20:49 -07:00
dropreason.h mac80211: use the new drop reasons infrastructure 2023-04-20 20:20:49 -07:00
dsa_stubs.h net: dsa: replace NETDEV_PRE_CHANGE_HWTSTAMP notifier with a stub 2023-04-09 15:35:49 +01:00
dsa.h net: dsa: add support for mac_prepare() and mac_finish() calls 2023-05-26 10:39:40 +01:00
dsfield.h
dst_cache.h wireguard: device: reset peer src endpoint when netns exits 2021-11-29 19:50:45 -08:00
dst_metadata.h xfrm: interface: Add unstable helpers for setting/getting XFRM metadata from TC-BPF 2022-12-05 21:58:27 -08:00
dst_ops.h ipv6: remove max_size check inline with ipv4 2023-01-13 20:59:14 -08:00
dst.h net: dst: Switch to rcuref_t reference counting 2023-03-28 18:52:28 -07:00
erspan.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
esp.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
espintcp.h
ethoc.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
failover.h net: failover: add net device refcount tracker 2021-12-06 16:06:02 -08:00
fib_notifier.h
fib_rules.h fib: expand fib_rule_policy 2021-12-16 07:18:35 -08:00
firewire.h firewire: net: Make use of get_unaligned_be48(), put_unaligned_be48() 2022-07-28 22:21:54 -07:00
flow_dissector.h net: flow_dissector: add support for cfm packets 2023-06-12 17:01:45 -07:00
flow_offload.h net/sched: cls_api: Support hardware miss to tc action 2023-02-20 16:46:10 -08:00
flow.h ipv4: Drop tos parameter from flowi4_update_output() 2023-06-02 10:52:38 +01:00
fou.h bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs 2023-04-12 16:40:39 -07:00
fq_impl.h wifi: mac80211: add support for restricting netdev features per vif 2022-12-01 15:09:10 +01:00
fq.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
garp.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
gen_stats.h net: sched: Remove Qdisc::running sequence counter 2021-10-18 12:54:41 +01:00
genetlink.h mptcp: more detailed error reporting on endpoint creation 2022-11-21 13:09:07 +00:00
geneve.h net: geneve: fix array of flexible structures warnings 2022-10-31 10:43:04 +00:00
gre.h ip_gre: add csum offload support for gre header 2021-01-29 20:39:14 -08:00
gro_cells.h
gro.h net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
gso.h net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
gtp.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
gue.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
handshake.h net/handshake: Enable the SNI extension to work properly 2023-05-24 22:05:24 -07:00
hwbm.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
icmp.h ipv6: ICMPV6: add response to ICMPV6 RFC 8335 PROBE messages 2021-06-28 14:29:45 -07:00
ieee80211_radiotap.h wifi: radiotap: separate vendor TLV into header/content 2023-03-07 20:11:36 +01:00
ieee802154_netdev.h mac802154: Handle basic beaconing 2023-01-28 13:55:10 +01:00
if_inet6.h ipv6: fix locking issues with loops over idev->addr_list 2022-04-06 22:09:39 -07:00
ife.h
ila.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
inet6_connection_sock.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
inet6_hashtables.h net: allow unbound socket for packets in VRF when tcp_l3mdev_accept set 2022-07-29 11:58:54 +01:00
inet_common.h ipv4, ipv6: Use splice_eof() to flush 2023-06-08 19:40:30 -07:00
inet_connection_sock.h net: Add a bhash2 table hashed by port and address 2022-08-24 19:30:07 -07:00
inet_dscp.h ipv6: Define dscp_t and stop taking ECN bits into account in fib6-rules 2022-02-07 20:12:45 -08:00
inet_ecn.h net: add skb_get_dsfield() helper 2021-10-15 11:33:08 +01:00
inet_frag.h Revert "net: Remove low_thresh in ip defrag" 2023-05-16 20:46:30 -07:00
inet_hashtables.h tcp: Add TIME_WAIT sockets in bhash2. 2022-12-30 07:25:52 +00:00
inet_sock.h inet: preserve const qualifier in inet_sk() 2023-03-17 08:56:37 +00:00
inet_timewait_sock.h tcp: Add TIME_WAIT sockets in bhash2. 2022-12-30 07:25:52 +00:00
inetpeer.h
ioam6.h treewide: Replace zero-length arrays with flexible-array members 2022-02-17 07:00:39 -06:00
ip6_checksum.h net: move gro definitions to include/net/gro.h 2021-11-16 13:16:54 +00:00
ip6_fib.h ipv6: Remove in6addr_any alternatives. 2023-03-29 08:22:52 +01:00
ip6_route.h net: dst: Prevent false sharing vs. dst_entry:: __refcnt 2023-03-28 18:52:22 -07:00
ip6_tunnel.h ip_gre, ip6_gre: Fix race condition on o_seqno in collect_md mode 2022-04-25 11:40:45 +01:00
ip_fib.h ipv4: Use dscp_t in struct fib_entry_notifier_info 2022-04-11 17:37:50 -07:00
ip_tunnels.h bpf-next-for-netdev 2023-04-13 16:43:38 -07:00
ip_vs.h ipvs: Correct spelling in comments 2023-04-22 01:39:41 +02:00
ip.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-05-25 19:57:39 -07:00
ipcomp.h xfrm: ipcomp: add extack to ipcomp{4,6}_init_state 2022-09-29 07:18:00 +02:00
ipconfig.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
ipv6_frag.h net: dropreason: add SKB_DROP_REASON_FRAG_REASM_TIMEOUT 2022-10-31 20:14:27 -07:00
ipv6_stubs.h bpf: Change bpf_getsockopt(SOL_IPV6) to reuse do_ipv6_getsockopt() 2022-09-02 20:34:32 -07:00
ipv6.h ipv6: icmp6: add drop reason support to icmpv6_notify() 2023-02-13 19:55:32 -08:00
iw_handler.h
kcm.h kcm: Send multiple frags in one sendmsg() 2023-06-12 21:13:23 -07:00
l3mdev.h
lag.h
lapb.h net: lapb: Make "lapb_t1timer_running" able to detect an already running timer 2021-03-23 14:14:50 -07:00
lib80211.h
llc_c_ac.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
llc_c_ev.h
llc_c_st.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
llc_conn.h llc: add net device refcount tracker 2021-12-07 20:44:59 -08:00
llc_if.h llc/snap: constify dev_addr passing 2021-10-13 09:40:46 -07:00
llc_pdu.h net: llc: fix skb_over_panic 2021-07-27 13:05:56 +01:00
llc_s_ac.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
llc_s_ev.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
llc_s_st.h add missing includes and forward declarations to networking includes under linux/ 2022-07-28 11:29:36 +02:00
llc_sap.h
llc.h llc: fix out-of-bound array index in llc_sk_dev_hash() 2021-11-07 19:25:29 +00:00
lwtunnel.h netfilter: add netfilter hooks to SRv6 data plane 2021-08-30 01:51:36 +02:00
mac80211.h wifi: mac80211: provide a helper to fetch the medium synchronization delay 2023-06-06 14:15:16 +02:00
mac802154.h mac802154: Drop IEEE802154_HW_RX_DROP_BAD_CKSUM 2022-10-12 12:57:19 +02:00
macsec.h macsec: Use helper macsec_netdev_priv for offload drivers 2023-05-10 11:32:09 +01:00
mctp.h mctp: Use output netdev to allocate skb headroom 2022-04-01 12:04:15 +01:00
mctpdevice.h mctp: Pass flow data & flow release events to drivers 2021-10-29 13:23:51 +01:00
mip6.h
mld.h mld: add new workqueues for process mld events 2021-03-26 15:14:56 -07:00
mpls_iptunnel.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
mpls.h
mptcp.h mptcp: remove MPTCP 'ifdef' in TCP SYN cookies 2022-12-12 13:11:24 -08:00
mrp.h mrp: introduce active flags to prevent UAF when applicant uninit 2022-11-18 12:14:55 +00:00
ncsi.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
ndisc.h neighbour: switch to standard rcu, instead of rcu_bh 2023-03-21 21:32:18 -07:00
neighbour.h neighbour: fix unaligned access to pneigh_entry 2023-06-01 21:36:37 -07:00
net_debug.h net: add CONFIG_DEBUG_NET 2022-05-11 12:43:10 +01:00
net_failover.h
net_namespace.h net: add a refcount tracker for kernel sockets 2022-10-24 11:04:43 +01:00
net_ratelimit.h
net_trackers.h net: add networking namespace refcount tracker 2021-12-10 06:38:26 -08:00
netdev_queues.h net: add macro netif_subqueue_completed_wake 2023-04-18 12:59:01 +02:00
netevent.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
netlabel.h
netlink.h net: netlink: recommend policy range validation 2023-01-28 00:33:51 -08:00
netprio_cgroup.h
netrom.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
nexthop.h ipv6: remove nexthop_fib6_nh_bh() 2023-05-11 18:07:05 -07:00
nl802154.h ieee802154: Add support for user beaconing requests 2023-01-28 13:51:22 +01:00
nsh.h
p8022.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
page_pool.h page_pool: fix inconsistency for page_pool_ring_[un]lock() 2023-05-23 20:25:13 -07:00
pie.h
ping.h net/ipv4: ping_group_range: allow GID from 2147483648 to 4294967294 2023-06-02 09:55:22 +01:00
pkt_cls.h sch_htb: Allow HTB priority parameter in offload mode 2023-05-15 09:31:07 +01:00
pkt_sched.h net/sched: taprio: report class offload stats per TXQ, not per TC 2023-06-12 09:43:30 +01:00
pptp.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
protocol.h tcp/udp: Make early_demux back namespacified. 2022-07-15 18:50:35 -07:00
psample.h psample: Add a fwd declaration for skbuff 2021-08-09 15:34:21 -07:00
psnap.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
raw.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-04-06 12:01:20 -07:00
rawv6.h ipv6: raw: constify raw_v6_match() socket argument 2023-03-17 08:56:37 +00:00
red.h treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
regulatory.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
request_sock.h tcp: Use BPF timeout setting for SYN ACK RTO 2022-02-02 14:45:18 +00:00
rose.h net: rose: add netdev ref tracker to 'struct rose_sock' 2022-08-01 11:59:23 -07:00
route.h ipv4: Drop tos parameter from flowi4_update_output() 2023-06-02 10:52:38 +01:00
rpl.h ipv6: rpl: Fix Route of Death. 2023-06-06 20:59:08 -07:00
rsi_91x.h
rtnetlink.h rtnetlink: Honour NLM_F_ECHO flag in rtnl_delete_link 2022-10-31 18:10:21 -07:00
rtnh.h
sch_generic.h net: sched: Remove unused qdisc_l2t() 2023-06-17 00:17:42 -07:00
scm.h scm: add SO_PASSPIDFD and SCM_PIDFD 2023-06-12 10:45:49 +01:00
secure_seq.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
seg6_hmac.h
seg6_local.h
seg6.h udp6: Use Segment Routing Header for dest address if present 2022-01-04 12:17:35 +00:00
selftests.h net: selftest: fix build issue if INET is disabled 2021-04-28 14:06:45 -07:00
slhc_vj.h
smc.h net/smc: Introduce explicit check for v2 support 2023-03-15 08:18:35 +00:00
snmp.h
sock_reuseport.h soreuseport: Fix socket selection for SO_INCOMING_CPU. 2022-10-25 11:35:16 +02:00
sock.h net: ioctl: Use kernel memory on protocol ioctl callbacks 2023-06-15 22:33:26 -07:00
Space.h wan: remove sbni/granch driver 2021-08-03 13:05:26 +01:00
stp.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
strparser.h tls: rx: remove the message decrypted tracking 2022-07-18 11:24:10 +01:00
switchdev.h bridge: switchdev: Allow device drivers to install locked FDB entries 2022-11-09 19:06:13 -08:00
tc_wrapper.h net/sched: Retire rsvp classifier 2023-02-16 09:27:07 +01:00
tcp_states.h
tcp.h net: ioctl: Use kernel memory on protocol ioctl callbacks 2023-06-15 22:33:26 -07:00
timewait_sock.h
tipc.h
tls_toe.h
tls.h net: tls: make the offload check helper take skb not socket 2023-06-15 09:01:05 +01:00
transp_v6.h inet6: Remove inet6_destroy_sock(). 2022-10-24 09:40:39 +01:00
tso.h net: tso: inline tso_count_descs() 2022-12-12 15:04:39 -08:00
tun_proto.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
udp_tunnel.h net: Change the udp encap_err_rcv to allow use of {ip,ipv6}_icmp_error() 2022-11-08 16:42:28 +00:00
udp.h net: ioctl: Use kernel memory on protocol ioctl callbacks 2023-06-15 22:33:26 -07:00
udplite.h tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct(). 2022-10-12 17:50:37 -07:00
vsock_addr.h
vxlan.h net: vxlan: Add nolocalbypass option to vxlan. 2023-05-13 17:02:33 +01:00
wext.h
x25.h x25: preserve const qualifier in [a]x25_sk() 2023-03-18 12:23:34 +00:00
x25device.h
xdp_priv.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
xdp_sock_drv.h xsk: Remove unused xsk_buff_discard 2022-09-30 07:55:46 -07:00
xdp_sock.h bpf, net: xskmap memory usage 2023-03-07 09:33:43 -08:00
xdp.h bpf-next-for-netdev 2023-04-13 16:43:38 -07:00
xfrm.h xfrm: add new device offload acquire flag 2023-03-20 11:29:33 +02:00
xsk_buff_pool.h xsk: Use pool->dma_pages to check for DMA 2023-04-27 22:24:51 +02:00