linux/net/dccp
Martin KaFai Lau cae3873c5b net: inet: Retire port only listening_hash
The listen sk is currently stored in two hash tables,
listening_hash (hashed by port) and lhash2 (hashed by port and address).

After commit 0ee58dad5b ("net: tcp6: prefer listeners bound to an address")
and commit d9fbc7f643 ("net: tcp: prefer listeners bound to an address"),
the TCP-SYN lookup fast path does not use listening_hash.

Commit 05c0b35709 ("tcp: seq_file: Replace listening_hash with lhash2")
also moved the seq_file (/proc/net/tcp) iteration from
listening_hash to lhash2.

There are still a few listening_hash usages left.
One of them is inet_reuseport_add_sock(), which uses listening_hash
to search for a listen sk during the listen() system call.  This turns
out to be very slow for use cases that listen on many different
VIPs at a popular port (e.g. 443), on top of the slowness in
adding to the tail in the IPv6 case.  A later patch has a
selftest to demonstrate this case.
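
A minimal sketch of why a port-only hash degrades with many VIPs on one
port. This is not the kernel code: NBUCKETS, the toy XOR hash, and all
function names here are hypothetical and only illustrate the bucketing idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64u /* hypothetical table size, must be a power of two */

/* Port-only bucketing: every listener on one port lands in one bucket. */
static uint32_t bucket_by_port(uint16_t port)
{
    return (uint32_t)port & (NBUCKETS - 1u);
}

/* Port-and-address bucketing (toy XOR hash, not the kernel's hash). */
static uint32_t bucket_by_portaddr(uint32_t addr, uint16_t port)
{
    return (addr ^ (uint32_t)port) & (NBUCKETS - 1u);
}

/*
 * Longest bucket chain when n listeners bind consecutive addresses
 * (base, base + 1, ...) on the same port.
 */
static size_t longest_chain(size_t n, uint32_t base, uint16_t port,
                            int use_addr)
{
    size_t counts[NBUCKETS] = {0};
    size_t i, max = 0;

    assert((NBUCKETS & (NBUCKETS - 1u)) == 0u);

    for (i = 0; i < n; i++) {
        uint32_t b = use_addr
            ? bucket_by_portaddr(base + (uint32_t)i, port)
            : bucket_by_port(port);

        if (++counts[b] > max)
            max = counts[b];
    }
    return max;
}
```

With 1000 consecutive addresses on port 443, the port-only variant puts
all 1000 listeners in one chain, while the port-and-address variant
leaves at most 16 per chain in this toy setup.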

This patch takes this chance to move all remaining listening_hash
usages to lhash2 and then retire listening_hash.

Since most changes need to be done together, it is hard to cut
the listening_hash to lhash2 switch into small patches.  The
changes in this patch are highlighted here for review.

1. Because of the listening_hash removal, lhash2 can use
   sk->sk_nulls_node instead of icsk->icsk_listen_portaddr_node.
   This also keeps the sk_unhashed() check working as is
   once sk is no longer added to listening_hash.

   The union is removed from inet_listen_hashbucket because
   only nulls_head is needed.

2. icsk->icsk_listen_portaddr_node and its helpers are removed.

3. The current lhash2 users need to iterate with sk_nulls_node
   instead of icsk_listen_portaddr_node.

   One case is inet[6]_lhash2_lookup().

   Another case is the seq_file iterator in tcp_ipv4.c.
   Note that sk_nulls_next() is needed there
   because the old inet_lhash2_for_each_icsk_continue()
   does a "next" first before iterating.

4. Move the remaining listening_hash usages to lhash2.

   One is inet_reuseport_add_sock(), which this series is
   trying to improve.

   inet_diag.c and mptcp_diag.c are the final two
   remaining use cases and are also moved to lhash2 now.
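
The "next first" point in item 3 can be sketched with a plain singly
linked list. This is an illustration only: struct node, both macros, and
the count helpers are hypothetical, and the list ends in NULL rather
than the nulls marker that the kernel's nulls lists use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a listen sk chained in a bucket. */
struct node {
    int id;
    struct node *next;
};

/* "continue" flavour: steps past pos before visiting anything. */
#define for_each_continue(pos) \
    for ((pos) = (pos)->next; (pos) != NULL; (pos) = (pos)->next)

/* "from" flavour: visits pos itself first. */
#define for_each_from(pos) \
    for (; (pos) != NULL; (pos) = (pos)->next)

/* Count the nodes visited after pos (pos must not be NULL). */
static size_t count_continue(struct node *pos)
{
    size_t n = 0;

    assert(pos != NULL);
    for_each_continue(pos)
        n++;
    return n;
}

/* Count the nodes visited starting at pos, including pos itself. */
static size_t count_from(struct node *pos)
{
    size_t n = 0;

    for_each_from(pos)
        n++;
    return n;
}
```

For a three-node list a -> b -> c, count_continue(&a) visits two nodes,
and count_from() matches that only when it is handed a.next. A caller
switching from the "continue" flavour to the "from" flavour therefore
has to take the explicit "next" step itself to avoid visiting the
current node twice.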

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12 16:52:18 -07:00
ccids dccp: tfrc: fix doc warnings in tfrc_equation.c 2021-06-10 14:08:49 -07:00
ackvec.c net: dccp: Fix most of the kerneldoc warnings 2020-10-30 12:08:54 -07:00
ackvec.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
ccid.c net: dccp: Add __printf() markup to fix -Wsuggest-attribute=format 2020-10-30 11:31:46 -07:00
ccid.h net: dccp: Replace zero-length array with flexible-array member 2020-02-28 12:08:37 -08:00
dccp.h net: remove noblock parameter from recvmsg() entities 2022-04-12 15:00:25 +02:00
diag.c inet_diag: Move the INET_DIAG_REQ_BYTECODE nlattr to cb->data 2020-02-27 18:50:19 -08:00
feat.c dccp: Return the correct errno code 2021-02-06 11:15:28 -08:00
feat.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
input.c net: dccp: Convert to use the preferred fallthrough macro 2020-08-22 12:38:34 -07:00
ipv4.c ipv4: Avoid using RTO_ONLINK with ip_route_connect(). 2022-04-22 13:06:03 +01:00
ipv6.c ipv6: Remove __ipv6_only_sock(). 2022-04-22 12:47:50 +01:00
ipv6.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
Kconfig dccp: Replace HTTP links with HTTPS ones 2020-07-13 11:54:07 -07:00
Makefile net: dccp: Remove dccpprobe module 2018-01-02 14:27:30 -05:00
minisocks.c tcp: allocate tcp_death_row outside of struct netns_ipv4 2022-01-26 19:00:31 -08:00
options.c net: dccp: Convert to use the preferred fallthrough macro 2020-08-22 12:38:34 -07:00
output.c net: dccp: Fix most of the kerneldoc warnings 2020-10-30 12:08:54 -07:00
proto.c net: inet: Retire port only listening_hash 2022-05-12 16:52:18 -07:00
qpolicy.c net: dccp: Fix most of the kerneldoc warnings 2020-10-30 12:08:54 -07:00
sysctl.c proc/sysctl: add shared variables for range check 2019-07-18 17:08:07 -07:00
timer.c net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
trace.h net: dccp: Use memset_startat() for TP zeroing 2021-11-19 11:22:49 +00:00