linux/net/ipv6
Wangyang Guo d288a162dd net: dst: Prevent false sharing vs. dst_entry:: __refcnt
dst_entry::__refcnt is highly contended in scenarios where many connections
happen from and to the same IP. The reference count is an atomic_t, so the
reference count operations have to take the cache-line exclusive.

Aside from the unavoidable reference count contention, there is another
significant problem caused by it: false sharing.

perf top identified two affected read accesses: dst_entry::lwtstate and
rtable::rt_genid.

dst_entry::__refcnt is located at offset 64 of dst_entry, which puts it into
a separate cache line vs. the read-mostly members located at the beginning
of the struct.

That prevents false sharing vs. the struct members in the first 64
bytes of the structure, but there is also

  dst_entry::lwtstate

which is located after the reference count and in the same cache line. This
member is read after a reference count has been acquired.

struct rtable embeds a struct dst_entry at offset 0. struct dst_entry has a
size of 112 bytes, which means that the struct members of rtable which
follow the dst member share the same cache line as dst_entry::__refcnt.
In particular,

  rtable::rt_genid

is also read by the contexts which have a reference count acquired
already.
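
The offset arithmetic above can be checked with a small user-space sketch.
The structs below are hypothetical stand-ins, not the kernel definitions:
only the numbers stated in this message (reference count at offset 64,
struct size of 112 bytes, 64-byte cache lines) are taken from the text, and
the position of the lwtstate stub inside the line is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64 /* assumption: 64-byte cache lines */

/* Hypothetical stand-in for struct dst_entry; member names are illustrative. */
struct mock_dst {
	char read_mostly[64];  /* offsets 0..63: first cache line */
	int  refcnt;           /* offset 64: starts the second cache line */
	char lwtstate_stub[8]; /* somewhere after refcnt, same cache line */
	char rest[36];         /* fills the struct up to 112 bytes */
};

/* Hypothetical stand-in for struct rtable, embedding the dst at offset 0. */
struct mock_rtable {
	struct mock_dst dst;
	int rt_genid;          /* first member after the embedded dst */
};

/* The layout assumptions from the text, verified at compile time. */
static_assert(offsetof(struct mock_dst, refcnt) == 64, "refcnt at offset 64");
static_assert(sizeof(struct mock_dst) == 112, "dst is 112 bytes");

/* Index of the cache line that contains the given byte offset. */
static size_t cache_line_of(size_t offset)
{
	return offset / CACHE_LINE_SIZE;
}

/* Non-zero if both offsets fall into the same cache line. */
static int same_cache_line(size_t a, size_t b)
{
	return cache_line_of(a) == cache_line_of(b);
}
```

With this mock layout, same_cache_line() reports that both the lwtstate stub
and rt_genid (offset 112) sit in the cache line holding the reference count,
while the members in the first 64 bytes do not.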

When dst_entry::__refcnt is incremented or decremented via an atomic
operation, these read accesses stall. This was found when analysing the
memtier benchmark in 1:100 mode, which amplifies the problem extremely.

Move the rt[6i]_uncached[_list] members out of struct rtable and struct
rt6_info into struct dst_entry to provide padding, and move the lwtstate
member after them so that it ends up in the next cache line, away from
__refcnt.
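
The effect of the padding can be sketched the same way. Again the structs
are hypothetical stand-ins with illustrative member names and sizes, not the
kernel layout; the only assumptions carried over from the text are the
reference count at offset 64 and 64-byte cache lines.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64 /* assumption: 64-byte cache lines */

/* Hypothetical stand-in for the rearranged struct dst_entry. */
struct mock_dst_padded {
	char read_mostly[64];  /* offsets 0..63: first cache line */
	int  refcnt;           /* offset 64: the contended cache line */
	char cold[36];         /* offsets 68..103: rarely read members */
	char uncached_pad[24]; /* offsets 104..127: stands in for the moved
	                        * rt[6i]_uncached[_list] members */
	char lwtstate_stub[8]; /* offset 128: now in the next cache line */
};

/* Hypothetical stand-in for struct rtable after the change. */
struct mock_rtable_padded {
	struct mock_dst_padded dst;
	int rt_genid;          /* follows the dst, past the refcnt line */
};

static_assert(offsetof(struct mock_dst_padded, refcnt) == 64,
	      "refcnt still at offset 64");

/* Non-zero if both offsets fall into the same cache line. */
static int same_cache_line(size_t a, size_t b)
{
	return a / CACHE_LINE_SIZE == b / CACHE_LINE_SIZE;
}
```

In this mock layout neither the lwtstate stub nor rt_genid shares a cache
line with the reference count, so readers of those members are no longer
disturbed by atomic updates of the counter.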

The resulting improvement depends on the micro-architecture and the number
of CPUs. It ranges from +20% to +120% with a localhost memtier/memcached
benchmark.

[ tglx: Rearrange struct ]

Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230323102800.042297517@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-28 18:52:22 -07:00
..
ila ila: do not generate empty messages in ila_xlat_nl_cmd_get_mapping() 2023-03-01 08:48:46 +00:00
netfilter xtables: move icmp/icmpv6 logic to xt_tcpudp 2023-03-22 21:48:59 +01:00
addrconf_core.c net: rename reference+tracking helpers 2022-06-09 21:52:55 -07:00
addrconf.c ipv6: prevent router_solicitations for team port 2023-03-24 08:57:06 +00:00
addrlabel.c ipv6: addrlabel: fix infoleak when sending struct ifaddrlblmsg to network 2022-11-07 12:26:15 +00:00
af_inet6.c net: annotate lockless accesses to sk->sk_err_soft 2023-03-17 08:25:05 +00:00
ah6.c net: ipv6: Remove completion function scaffolding 2023-02-13 18:35:15 +08:00
anycast.c
calipso.c cipso,calipso: resolve a number of problems with the DOI refcounts 2021-03-04 15:26:57 -08:00
datagram.c ipv6: Fix datagram socket connection with DSCP. 2023-02-09 22:49:04 -08:00
esp6_offload.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2022-11-29 20:50:51 -08:00
esp6.c net: ipv6: Remove completion function scaffolding 2023-02-13 18:35:15 +08:00
exthdrs_core.c
exthdrs_offload.c
exthdrs.c net: ipv6: add skb drop reasons to TLV parse 2022-04-13 13:09:57 +01:00
fib6_notifier.c
fib6_rules.c ipv6: change fib6_rules_net_exit() to batch mode 2022-02-08 20:41:34 -08:00
fou6.c
icmp.c ipv6: icmp6: add drop reason support to icmpv6_echo_reply() 2023-02-20 08:54:23 +00:00
inet6_connection_sock.c net: annotate lockless accesses to sk->sk_err_soft 2023-03-17 08:25:05 +00:00
inet6_hashtables.c tcp: Access &tcp_hashinfo via net. 2022-09-20 10:21:49 -07:00
ioam6_iptunnel.c ipv6: ioam: Insertion frequency in lwtunnel output 2022-02-04 20:24:45 -08:00
ioam6.c genetlink: start to validate reserved header bytes 2022-08-29 12:47:15 +01:00
ip6_checksum.c
ip6_fib.c ipv6: fib6_new_sernum() optimization 2022-11-16 12:42:00 +00:00
ip6_flowlabel.c ipv6: flowlabel: do not disable BH where not needed 2023-03-21 21:32:18 -07:00
ip6_gre.c erspan: do not use skb_mac_header() in ndo_start_xmit() 2023-03-21 21:16:26 -07:00
ip6_icmp.c net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending 2021-02-23 11:29:52 -08:00
ip6_input.c netfilter: keep conntrack reference until IPsecv6 policy checks are done 2023-03-22 21:50:23 +01:00
ip6_offload.c IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver 2022-12-12 15:41:44 -08:00
ip6_offload.h
ip6_output.c neighbour: switch to standard rcu, instead of rcu_bh 2023-03-21 21:32:18 -07:00
ip6_tunnel.c net: tunnels: annotate lockless accesses to dev->needed_headroom 2023-03-15 00:04:04 -07:00
ip6_udp_tunnel.c
ip6_vti.c ipv6: tunnels: use DEV_STATS_INC() 2022-11-16 12:48:44 +00:00
ip6mr.c treewide: Convert del_timer*() to timer_shutdown*() 2022-12-25 13:38:09 -08:00
ipcomp6.c xfrm: ipcomp: add extack to ipcomp{4,6}_init_state 2022-09-29 07:18:00 +02:00
ipv6_sockglue.c net: no longer support SOCK_REFCNT_DEBUG feature 2023-02-15 10:25:21 +00:00
Kconfig crypto: lib - make the sha1 library optional 2022-07-15 16:43:59 +08:00
Makefile net: ipv6: use ipv6-y directly instead of ipv6-objs 2021-09-28 13:13:40 +01:00
mcast_snoop.c net: bridge: mcast: fix broken length + header check for MRDv6 Adv. 2021-04-27 14:02:06 -07:00
mcast.c ipv6: constify inet6_mc_check() 2023-03-17 08:56:37 +00:00
mip6.c xfrm: mip6: add extack to mip6_destopt_init_state, mip6_rthdr_init_state 2022-09-29 07:18:01 +02:00
ndisc.c neighbour: annotate lockless accesses to n->nud_state 2023-03-15 00:37:32 -07:00
netfilter.c netfilter: Use l3mdev flow key when re-routing mangled packets 2022-05-16 13:03:29 +02:00
output_core.c treewide: use get_random_u32_{above,below}() instead of manual loop 2022-11-18 02:15:22 +01:00
ping.c inet: preserve const qualifier in inet_sk() 2023-03-17 08:56:37 +00:00
proc.c icmp: Add counters for rate limits 2023-01-26 10:52:18 +01:00
protocol.c
raw.c netfilter: keep conntrack reference until IPsecv6 policy checks are done 2023-03-22 21:50:23 +01:00
reassembly.c net: dropreason: add SKB_DROP_REASON_DUP_FRAG 2022-10-31 20:14:26 -07:00
route.c net: dst: Prevent false sharing vs. dst_entry:: __refcnt 2023-03-28 18:52:22 -07:00
rpl_iptunnel.c net: ipv6: rpl_iptunnel: Replace 0-length arrays with flexible arrays 2023-01-06 19:28:01 -08:00
rpl.c net: ipv6: rpl*: Fix strange kerneldoc warnings due to bad header 2020-10-30 12:12:52 -07:00
seg6_hmac.c net: ipv6: unexport __init-annotated seg6_hmac_net_init() 2022-06-28 21:23:30 -07:00
seg6_iptunnel.c seg6: add support for SRv6 H.L2Encaps.Red behavior 2022-07-29 12:14:03 +01:00
seg6_local.c seg6: add PSP flavor support for SRv6 End behavior 2023-02-16 13:18:06 +01:00
seg6.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-09-08 18:38:30 +02:00
sit.c ipv6/sit: use DEV_STATS_INC() to avoid data-races 2022-11-16 12:48:44 +00:00
syncookies.c tcp: Fix data-races around sysctl_tcp_syncookies. 2022-07-18 12:21:54 +01:00
sysctl_net_ipv6.c net: sysctl: introduce sysctl SYSCTL_THREE 2022-05-03 10:15:06 +02:00
tcp_ipv6.c netfilter: keep conntrack reference until IPsecv6 policy checks are done 2023-03-22 21:50:23 +01:00
tcpv6_offload.c net: move gro definitions to include/net/gro.h 2021-11-16 13:16:54 +00:00
tunnel6.c
udp_impl.h tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct(). 2022-10-12 17:50:37 -07:00
udp_offload.c udp: allow header check for dodgy GSO_UDP_L4 packets. 2022-12-12 09:29:56 +00:00
udp.c netfilter: keep conntrack reference until IPsecv6 policy checks are done 2023-03-22 21:50:23 +01:00
udplite.c tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct(). 2022-10-12 17:50:37 -07:00
xfrm6_input.c
xfrm6_output.c xfrm: fix tunnel model fragmentation behavior 2022-03-01 12:08:40 +01:00
xfrm6_policy.c net: dst: Prevent false sharing vs. dst_entry:: __refcnt 2023-03-28 18:52:22 -07:00
xfrm6_protocol.c
xfrm6_state.c
xfrm6_tunnel.c xfrm: tunnel: add extack to ipip_init_state, xfrm6_tunnel_init_state 2022-09-29 07:18:00 +02:00