linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-17 01:04:19 +08:00

mfreemon@cloudflare.com b650d953cd tcp: enforce receive buffer memory limits by allowing the tcp window to shrink

Under certain circumstances, the tcp receive buffer memory limit set by autotuning (sk_rcvbuf) is increased due to incoming data packets as a result of the window not closing when it should be. This can result in the receive buffer growing all the way up to tcp_rmem[2], even for tcp sessions with a low BDP.

To reproduce: Connect a TCP session with the receiver doing nothing and the sender sending small packets (an infinite loop of socket send() with 4 bytes of payload with a sleep of 1 ms in between each send()). This will cause the tcp receive buffer to grow all the way up to tcp_rmem[2].

As a result, a host can have individual tcp sessions with receive buffers of size tcp_rmem[2], and the host itself can reach tcp_mem limits, causing the host to go into tcp memory pressure mode.

The fundamental issue is the relationship between the granularity of the window scaling factor and the number of bytes ACKed back to the sender. This problem has previously been identified in RFC 7323, appendix F [1].

The Linux kernel currently adheres to never shrinking the window.

In addition to the overallocation of memory mentioned above, the current behavior is functionally incorrect, because once tcp_rmem[2] is reached when no remediations remain (i.e. tcp collapse fails to free up any more memory and there are no packets to prune from the out-of-order queue), the receiver will drop in-window packets resulting in retransmissions and an eventual timeout of the tcp session. A receive buffer full condition should instead result in a zero window and an indefinite wait.

In practice, this problem is largely hidden for most flows. It is not applicable to mice flows. Elephant flows can send data fast enough to "overrun" the sk_rcvbuf limit (in a single ACK), triggering a zero window.

But this problem does show up for other types of flows. Examples are websockets and other types of flows that send small amounts of data spaced apart slightly in time. In these cases, we directly encounter the problem described in [1].

RFC 7323, section 2.4 [2], says there are instances when a retracted window can be offered, and that TCP implementations MUST ensure that they handle a shrinking window, as specified in RFC 1122, section 4.2.2.16 [3]. All prior RFCs on the topic of tcp window management have made clear that the sender must accept a shrunk window from the receiver, including RFC 793 [4] and RFC 1323 [5].

This patch implements the functionality to shrink the tcp window when necessary to keep the right edge within the memory limit set by autotuning (sk_rcvbuf). This new functionality is enabled with the new sysctl: net.ipv4.tcp_shrink_window

Additional information can be found at:
https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it/

[1] https://www.rfc-editor.org/rfc/rfc7323#appendix-F
[2] https://www.rfc-editor.org/rfc/rfc7323#section-2.4
[3] https://www.rfc-editor.org/rfc/rfc1122#page-91
[4] https://www.rfc-editor.org/rfc/rfc793
[5] https://www.rfc-editor.org/rfc/rfc1323

Signed-off-by: Mike Freemon <mfreemon@cloudflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2023-06-17 09:53:53 +01:00
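The reproduction steps in the commit message can be approximated with a short user-space script. The following is a minimal sketch, not the author's test program. The function name `small_send_repro`, the use of loopback, the bounded loop (the commit describes an infinite one), and the `TCP_NODELAY` setting are all assumptions made for illustration. It also assumes that reading `SO_RCVBUF` on the receiving socket reflects the current sk_rcvbuf value on Linux.

```python
import socket
import time


def small_send_repro(n_sends=50, payload=b"abcd", interval_s=0.001):
    """Send small payloads to a loopback receiver that never reads.

    Returns (bytes_sent, rcvbuf_before, rcvbuf_after), where the rcvbuf
    values are SO_RCVBUF as reported for the receiving socket.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    sender = socket.create_connection(listener.getsockname())
    # Assumption: disable Nagle so each send() is not held back for coalescing.
    sender.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    receiver, _ = listener.accept()  # accepted, but recv() is never called

    rcvbuf_before = receiver.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    sent = 0
    try:
        for _ in range(n_sends):
            sent += sender.send(payload)  # 4 bytes per send by default
            time.sleep(interval_s)        # 1 ms pause between sends
        rcvbuf_after = receiver.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sender.close()
        receiver.close()
        listener.close()
    return sent, rcvbuf_before, rcvbuf_after


if __name__ == "__main__":
    sent, before, after = small_send_repro()
    print(f"sent={sent} bytes, SO_RCVBUF before={before}, after={after}")
```

A run this short is unlikely to show the growth described in the commit message. Observing it would presumably require a much larger `n_sends`, a kernel without net.ipv4.tcp_shrink_window enabled, and a comparison of the two reported buffer sizes against tcp_rmem[2].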
bpf.h	bpf: Invert the dependency between bpf-netns.h and netns/bpf.h	2021-12-29 20:03:05 -08:00
can.h	net: add missing includes and forward declarations under net/	2022-07-22 12:53:22 +01:00
conntrack.h	netfilter: ctnetlink: make event listener tracking global	2023-02-22 00:28:47 +01:00
core.h	net: make default_rps_mask a per netns attribute	2023-02-20 11:22:54 +00:00
flow_table.h	netfilter: nf_flow_table: count pending offload workqueue tasks	2022-07-11 16:25:14 +02:00
generic.h	netns: Replace zero-length array with DECLARE_FLEX_ARRAY() helper	2022-09-28 18:51:47 -07:00
hash.h	netns: provide pure entropy for net_hash_mix()	2019-03-28 17:00:45 -07:00
ieee802154_6lowpan.h	net: dynamically allocate fqdir structures	2019-05-26 14:08:05 -07:00
ipv4.h	tcp: enforce receive buffer memory limits by allowing the tcp window to shrink	2023-06-17 09:53:53 +01:00
ipv6.h	net/ipv6: convert skip_notify_on_dev_down sysctl to u8	2023-06-02 22:55:43 -07:00
mctp.h	net: add missing includes and forward declarations under net/	2022-07-22 12:53:22 +01:00
mib.h	net: reorganize fields in netns_mib	2021-04-02 14:31:44 -07:00
mpls.h	net: add missing includes and forward declarations under net/	2022-07-22 12:53:22 +01:00
netfilter.h	Remove DECnet support from kernel	2022-08-22 14:26:30 +01:00
nexthop.h	net: add missing includes and forward declarations under net/	2022-07-22 12:53:22 +01:00
nftables.h	net: remove obsolete members from struct net	2021-04-06 00:34:53 +02:00
packet.h
sctp.h	sctp: add dif and sdif check in asoc and ep lookup	2022-11-18 11:42:54 +00:00
smc.h	net/smc: Unbind r/w buffer size from clcsock and make them tunable	2022-09-22 12:58:21 +02:00
unix.h	net: add missing includes and forward declarations under net/	2022-07-22 12:53:22 +01:00
xdp.h	net: xsk: Don't include <linux/rculist.h>	2022-12-06 20:04:34 -08:00
xfrm.h	xfrm: rework default policy structure	2022-03-18 07:23:12 +01:00