linux/net
Talal Ahmad f1a456f8f3 net: avoid double accounting for pure zerocopy skbs
Track skbs with only zerocopy data and avoid charging them to kernel
memory to correctly account the memory utilization for msg_zerocopy.
All of the data in such skbs is held in user pages which are already
accounted to user. Before this change, they are charged again in
kernel in __zerocopy_sg_from_iter. The charging in kernel is
excessive because data is not being copied into skb frags. This
excessive charging can lead to kernel going into memory pressure
state which impacts all sockets in the system adversely. Mark pure
zerocopy skbs with a SKBFL_PURE_ZEROCOPY flag and remove
charge/uncharge for data in such skbs.

Initially, an skb is marked pure zerocopy when it is empty and in
zerocopy path. skb can then change from a pure zerocopy skb to mixed
data skb (zerocopy and copy data) if it is at tail of write queue and
there is room available in it and non-zerocopy data is being sent in
the next sendmsg call. At this time sk_mem_charge is done for the pure
zerocopied data and the pure zerocopy flag is unmarked. We found that
this happens very rarely on workloads that pass MSG_ZEROCOPY.

A pure zerocopy skb can later be coalesced into normal skb if they are
next to each other in queue but this patch prevents coalescing from
happening. This avoids complexity of charging when skb downgrades from
pure zerocopy to mixed. This is also rare.

In sk_wmem_free_skb, if it is a pure zerocopy skb, an sk_mem_uncharge
for SKB_TRUESIZE(MAX_TCP_HEADER) is done for sk_mem_charge in
tcp_skb_entail for an skb without data.

Testing with the msg_zerocopy.c benchmark between two hosts(100G nics)
with zerocopy showed that before this patch the 'sock' variable in
memory.stat for cgroup2 that tracks sum of sk_forward_alloc,
sk_rmem_alloc and sk_wmem_queued is around 1822720 and with this
change it is 0. This is due to no charge to sk_forward_alloc for
zerocopy data and shows memory utilization for kernel is lowered.

Signed-off-by: Talal Ahmad <talalahmad@google.com>
Acked-by: Arjun Roy <arjunroy@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-01 16:33:27 -07:00
..
6lowpan 6lowpan: iphc: Fix an off-by-one check of array index 2021-07-22 16:19:03 +02:00
9p net/9p: increase default msize to 128k 2021-09-05 08:36:44 +09:00
802 llc/snap: constify dev_addr passing 2021-10-13 09:40:46 -07:00
8021q net: use eth_hw_addr_set() instead of ether_addr_copy() 2021-10-02 14:18:25 +01:00
appletalk net: socket: rework compat_ifreq_ioctl() 2021-07-23 14:20:25 +01:00
atm net: atm: use address setting helpers 2021-10-24 13:59:45 +01:00
ax25 ax25: constify dev_addr passing 2021-10-13 09:40:45 -07:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-28 10:43:58 -07:00
bluetooth bluetooth: use dev_addr_set() 2021-10-25 11:01:29 -07:00
bpf Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-10-01 19:58:02 -07:00
bpfilter bpfilter: Specify the log level for the kmsg message 2021-06-25 13:13:50 +02:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2021-11-01 12:59:58 +00:00
caif net: caif: get ready for const netdev->dev_addr 2021-10-24 13:59:45 +01:00
can can: bcm: Use hrtimer_forward_now() 2021-10-24 16:24:28 +02:00
ceph Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
core net: avoid double accounting for pure zerocopy skbs 2021-11-01 16:33:27 -07:00
dcb net: dcb: Return the correct errno code 2021-06-01 17:01:33 -07:00
dccp tcp: switch orphan_count to bare per-cpu counters 2021-10-15 11:28:34 +01:00
decnet net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
dns_resolver
dsa net: dsa: populate supported_interfaces member 2021-11-01 13:06:32 +00:00
ethernet eth: platform: add a helper for loading netdev->dev_addr 2021-10-08 14:54:33 +01:00
ethtool ethtool: don't drop the rtnl_lock half way thru the ioctl 2021-11-01 13:26:07 +00:00
hsr net: hsr: Add support for redbox supervision frames 2021-10-26 14:52:17 +01:00
ieee802154 mac802154: use dev_addr_set() - manual 2021-10-20 14:27:40 +01:00
ife
ipv4 net: avoid double accounting for pure zerocopy skbs 2021-11-01 16:33:27 -07:00
ipv6 ipv6: enable net.ipv6.route.max_size sysctl in network namespace 2021-10-28 12:53:39 +01:00
iucv net/iucv: Replace deprecated CPU-hotplug functions. 2021-08-09 10:13:32 +01:00
kcm net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
key net: Remove unnecessary variables 2021-05-26 07:03:39 +02:00
l2tp net/l2tp: Fix reference count leak in l2tp_udp_recv_core 2021-09-09 11:00:20 +01:00
l3mdev
lapb net: lapb: Use list_for_each_entry() to simplify code in lapb_iface.c 2021-06-08 16:31:25 -07:00
llc llc/snap: constify dev_addr passing 2021-10-13 09:40:46 -07:00
mac80211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-28 10:43:58 -07:00
mac802154 mac802154: use dev_addr_set() - manual 2021-10-20 14:27:40 +01:00
mctp mctp: Pass flow data & flow release events to drivers 2021-10-29 13:23:51 +01:00
mpls mpls: defer ttl decrement in mpls_forward() 2021-07-23 17:17:56 +01:00
mptcp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-28 10:43:58 -07:00
ncsi net/ncsi: add get MAC address command to get Intel i210 MAC address 2021-09-01 17:18:56 -07:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2021-11-01 12:59:58 +00:00
netlabel net: fix NULL pointer reference in cipso_v4_doi_free 2021-08-30 12:23:18 +01:00
netlink Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-07 15:24:06 -07:00
netrom ax25: constify dev_addr passing 2021-10-13 09:40:45 -07:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-14 16:50:14 -07:00
nsh
openvswitch Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-08-19 18:09:18 -07:00
packet af_packet: Introduce egress hook 2021-10-14 23:06:44 +02:00
phonet net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
psample
qrtr net: qrtr: combine nameservice into main module 2021-09-28 17:36:43 -07:00
rds net/rds: dma_map_sg is entitled to merge entries 2021-08-18 15:35:50 -07:00
rfkill
rose rose: constify dev_addr passing 2021-10-13 09:40:45 -07:00
rxrpc rxrpc: Fix _usecs_to_jiffies() by using usecs_to_jiffies() 2021-09-24 14:18:34 +01:00
sched sch_htb: Add extack messages for EOPNOTSUPP errors 2021-10-28 14:34:03 +01:00
sctp sctp: add vtag check in sctp_sf_ootb 2021-10-22 12:36:45 -07:00
smc net/smc: Introduce tracepoint for smcr link down 2021-11-01 13:39:14 +00:00
strparser net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
sunrpc Bug fixes for NFSD error handling paths 2021-10-07 14:11:40 -07:00
switchdev net: switchdev: merge switchdev_handle_fdb_{add,del}_to_device 2021-10-27 14:54:02 +01:00
tipc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-28 10:43:58 -07:00
tls Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-28 10:43:58 -07:00
unix net: Implement ->sock_is_readable() for UDP and AF_UNIX 2021-10-26 12:29:33 -07:00
vmw_vsock vsock: Enable y2038 safe timeval for timeout 2021-10-08 16:21:53 +01:00
wireless Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-28 10:43:58 -07:00
x25 net: x25: Use list_for_each_entry() to simplify code in x25_route.c 2021-06-10 14:08:09 -07:00
xdp xsk: Fix clang build error in __xp_alloc 2021-09-29 13:59:13 +02:00
xfrm Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2021-11-01 13:01:44 +00:00
compat.c net: Return the correct errno code 2021-06-03 15:13:56 -07:00
devres.c net: devres: Correct a grammatical error 2021-06-11 12:55:28 -07:00
Kconfig net/core: disable NET_RX_BUSY_POLL on PREEMPT_RT 2021-10-01 15:45:10 -07:00
Makefile mctp: Add MCTP base 2021-07-29 15:06:49 +01:00
socket.c Core: 2021-08-31 16:43:06 -07:00
sysctl_net.c