linux/include/net
Evgeniy Polyakov a9d8f9110d inet: Allowing more than 64k connections and heavily optimize bind(0) time.
With a simple extension to the binding mechanism that allows binding more
than 64k sockets (or a smaller number, depending on sysctl parameters),
we have to traverse the whole bind hash table to find an empty bucket.
While this is not a problem for, say, 32k connections, bind()
completion time grows with every bound socket (after each successful bind,
one more bucket has to be traversed to find an empty one), even if each
search starts from a random offset inside the hash table.

So, when the hash table is full and we want to add another socket, we have
to traverse the whole table no matter what; this is effectively the
worst-case performance, and it stays constant from then on.

The attached picture shows bind() time as a function of the number of
already bound sockets.

The green area corresponds to the usual bind-to-port-zero process, which
triggers kernel port selection as described above. The red area is the
bind process when the number of reuse-bound sockets is not limited to 64k
(or by sysctl parameters); it shows the same growth (hidden by the green
area) until the number of ports reaches the sysctl limit.

At that point the bind hash table has exactly one reuse-enabled socket
per bucket, though those sockets may have different addresses. The
kernel selects the first port to try at random, so initially bind()
takes roughly constant time, but over time the number of ports to check
after the random start increases. That number keeps growing, but because
of the random selection, not every port selection necessarily takes
longer than the previous one. So we have to consider the whole area
below the curve in the graph (if you zoom in, you can see that many
different times are plotted there), which is why one area can hide another.

The blue area corresponds to the port selection optimization.

The design is rather simple: the hash table now maintains an (imprecise
and racily updated) count of currently bound sockets, and when that
count exceeds a predefined value (I use the maximum port range defined
by the sysctls), we stop traversing the whole bind hash table and simply
take the first matching bucket after the random start. This limit roughly
corresponds to the case when the bind hash table is full and the
mechanism that allows binding more reuse-enabled sockets has kicked in,
so it does not change the behaviour of other sockets.

Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Tested-by: Denys Fedoryschenko <denys@visp.net.lb>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-21 14:34:31 -08:00
9p 9p: fix sparse warnings 2008-10-22 18:54:47 -05:00
bluetooth Bluetooth: Enable per-module dynamic debug messages 2008-11-30 12:17:28 +01:00
irda irda: Add irda_skb_cb qdisc related padding 2008-12-17 15:44:58 -08:00
iucv [S390] iucv: Locking free version of iucv_message_(receive|send) 2008-12-25 13:39:04 +01:00
netfilter netfilter: fix warning in net/netfilter/nf_conntrack_proto_tcp.c 2008-11-25 18:20:13 +01:00
netns netns: ip6mr: declare reg_vif_num per-namespace 2008-12-10 16:29:24 -08:00
phonet Phonet: use atomic for packet TX window 2008-12-17 15:48:31 -08:00
sctp sctp: Implement socket option SCTP_GET_ASSOC_NUMBER 2008-12-25 16:57:24 -08:00
tc_act pkt_action: add new action skbedit 2008-09-12 16:30:20 -07:00
tipc tipc: Remove unneeded parameter to tipc_createport_raw() 2008-07-14 22:42:19 -07:00
act_api.h [NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get 2008-01-28 15:11:17 -08:00
addrconf.h netns: Add network namespace argument to rt6_fill_node() and ipv6_dev_get_saddr() 2008-08-14 15:33:21 -07:00
af_rxrpc.h
af_unix.h net: Fix soft lockups/OOM issues w/ unix garbage collector 2008-11-26 15:32:27 -08:00
ah.h
arp.h [NETFILTER]: ebtables: remove casts, use consts 2008-01-31 19:27:33 -08:00
atmclip.h clip: convert to internal network_device_stats 2009-01-21 14:01:59 -08:00
ax25.h [AX25] ax25_ds_timer: use mod_timer instead of add_timer 2008-02-12 17:53:34 -08:00
ax88796.h
cfg80211.h mac80211: Fix HT channel selection 2008-12-19 15:22:54 -05:00
checksum.h include/net net/ - csum_partial - remove unnecessary casts 2008-11-19 15:44:53 -08:00
cipso_ipv4.h netlabel: Update kernel configuration API 2008-12-31 12:54:11 -05:00
compat.h net: Use standard structures for generic socket address structures. 2008-07-19 22:35:47 -07:00
datalink.h
dcbnl.h net: fix DCB setstate to return success/failure 2008-12-21 20:09:50 -08:00
dn_dev.h
dn_fib.h decnet: remove private wrappers of endian helpers 2008-11-27 00:12:47 -08:00
dn_neigh.h
dn_nsp.h
dn_route.h
dn.h decnet: compile fix for removal of byteorder wrapper 2008-11-27 23:04:13 -08:00
dsa.h dsa: add support for Trailer tagging format 2008-10-08 17:24:16 -07:00
dsfield.h [NET]: Constify include/net/dsfield.h 2008-01-28 14:55:58 -08:00
dst.h netns xfrm: lookup in netns 2008-11-25 17:35:18 -08:00
esp.h [IPSEC]: Use crypto_aead and authenc in ESP 2008-01-31 19:27:02 -08:00
fib_rules.h net: add fib_rules_ops to flush_cache method 2008-07-05 19:01:28 -07:00
flow.h netns xfrm: lookup in netns 2008-11-25 17:35:18 -08:00
garp.h vlan: Add GVRP support 2008-07-05 21:26:57 -07:00
gen_stats.h pkt_sched: gen_estimator: Optimize gen_estimator_active() 2008-11-26 15:24:32 -08:00
genetlink.h netlink: Improve returned error codes 2008-06-03 16:36:54 -07:00
icmp.h mib: put icmpmsg statistics on struct net 2008-07-18 04:04:22 -07:00
ieee80211_radiotap.h wireless: clean up radiotap a bit 2008-12-05 09:32:59 -05:00
ieee80211.h ieee80211_security: correct warning about width of auth_mode 2008-12-12 13:48:30 -05:00
if_inet6.h ipv6: make struct ipv6_devconf static 2008-07-22 14:21:58 -07:00
inet6_connection_sock.h
inet6_hashtables.h inet: Don't lookup the socket if there's a socket attached to the skb 2008-10-07 12:41:01 -07:00
inet_common.h [NETNS]: Inet control socket should not hold a namespace. 2008-04-03 14:28:30 -07:00
inet_connection_sock.h net: more #ifdef CONFIG_COMPAT 2008-08-28 02:53:51 -07:00
inet_ecn.h [IPV6]: Use appropriate sock tclass setting for routing lookup. 2008-04-13 23:40:51 -07:00
inet_frag.h [NET]: Rename inet_frag.h identifiers COMPLETE, FIRST_IN, LAST_IN to INET_FRAG_* 2008-03-28 16:35:27 -07:00
inet_hashtables.h inet: Allowing more than 64k connections and heavily optimize bind(0) time. 2009-01-21 14:34:31 -08:00
inet_sock.h tcp: Port redirection support for TCP 2008-10-01 07:46:49 -07:00
inet_timewait_sock.h net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls 2008-11-16 19:40:17 -08:00
inetpeer.h net: remove CVS keywords 2008-06-11 21:00:38 -07:00
ip6_checksum.h
ip6_fib.h [NETNS][IPV6] rt6_info - move rt6_info structure inside the namespace 2008-03-04 13:48:30 -08:00
ip6_route.h netns: Add network namespace argument to rt6_fill_node() and ipv6_dev_get_saddr() 2008-08-14 15:33:21 -07:00
ip6_tunnel.h net: remove CVS keywords 2008-06-11 21:00:38 -07:00
ip_fib.h [IPV4]: Fix compile error building without CONFIG_FS_PROC 2008-02-05 02:54:16 -08:00
ip_vs.h include/net net/ - csum_partial - remove unnecessary casts 2008-11-19 15:44:53 -08:00
ip.h netns xfrm: per-netns sysctls 2008-11-25 18:00:48 -08:00
ipcomp.h ipsec: ipcomp - Merge IPComp implementations 2008-07-25 02:54:40 -07:00
ipconfig.h net: remove CVS keywords 2008-06-11 21:00:38 -07:00
ipip.h inet: Make tunnel RX/TX byte counters more consistent 2008-10-09 12:03:17 -07:00
ipv6.h ipv6: making ip and icmp statistics per/namespace 2008-10-08 11:16:45 -07:00
ipx.h
iw_handler.h wext: Emit event stream entries correctly when compat. 2008-06-16 18:50:49 -07:00
lapb.h
lib80211.h wireless: missing include in lib80211.h 2008-11-21 11:42:55 -05:00
llc_c_ac.h
llc_c_ev.h
llc_c_st.h
llc_conn.h
llc_if.h [LLC]: Kill static inline llc_addrany 2008-02-29 11:46:17 -08:00
llc_pdu.h [LLC]: skb allocation size for responses 2008-03-31 21:02:47 -07:00
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h [LLC]: skb allocation size for responses 2008-03-31 21:02:47 -07:00
llc.h [LLC]: station source mac address 2008-03-28 16:28:36 -07:00
mac80211.h mac80211: more kernel-doc fixes 2009-01-16 17:08:23 -05:00
mip6.h [IPV6] MIP6: Use our standard definitions for paddings. 2008-04-12 13:43:22 +09:00
ndisc.h ipv6: Fix sporadic sendmsg -EINVAL when sending to multicast groups. 2009-01-04 16:04:39 -08:00
neighbour.h net: Cleanup of neighbour code 2008-11-12 00:54:54 -08:00
net_namespace.h netns xfrm: add netns boilerplate 2008-11-25 17:14:31 -08:00
netdma.h net_dma: convert to dma_find_channel 2009-01-06 11:38:15 -07:00
netevent.h [NET]: Remove unnecessary inclusion of dst.h 2008-01-28 14:53:38 -08:00
netlabel.h netlabel: Update kernel configuration API 2008-12-31 12:54:11 -05:00
netlink.h netlink: fix (theoretical) overrun in message iteration 2008-12-25 17:21:17 -08:00
netrom.h netrom: convert to internal net_device_stats 2009-01-21 14:02:01 -08:00
nexthop.h
p8022.h
pkt_cls.h ematch: simpler tcf_em_unregister() 2008-11-16 23:01:49 -08:00
pkt_sched.h pkt_sched: Remove the tx queue state check in qdisc_run() 2008-09-23 01:05:56 -07:00
protocol.h ipv6: Add GRO support 2009-01-08 10:40:57 -08:00
psnap.h
raw.h [RAW]: Add raw_hashinfo member on struct proto. 2008-03-22 16:56:51 -07:00
rawv6.h [IPv6] RAW: Compact the API for the kernel 2008-01-28 14:54:29 -08:00
red.h
request_sock.h net: Fix memory leak in the proto_register function 2008-11-21 16:45:22 -08:00
rose.h rose: improving AX25 routing frames via ROSE network 2008-06-17 17:08:32 -07:00
route.h ipv4: Conditionally enable transparent flow flag when connecting 2008-10-01 07:35:39 -07:00
rtnetlink.h [RTNL]: Introduce the rtnl_kill_links helper. 2008-04-16 00:46:52 -07:00
sch_generic.h pkt_sched: Remove qdisc->ops->requeue() etc. 2008-11-13 22:56:30 -08:00
scm.h Merge branch 'master' into next 2008-11-14 11:29:12 +11:00
slhc_vj.h
snmp.h net: remove CVS keywords 2008-06-11 21:00:38 -07:00
sock.h net: Use a percpu_counter for orphan_count 2008-11-25 21:17:14 -08:00
stp.h net: Add STP demux layer 2008-07-05 21:25:39 -07:00
tcp_states.h
tcp.h tcp: Add GRO support 2008-12-15 23:43:36 -08:00
timewait_sock.h net: Fix memory leak in the proto_register function 2008-11-21 16:45:22 -08:00
transp_v6.h net: change proto destroy method to return void 2008-06-14 17:04:49 -07:00
udp.h udp: Use hlist_nulls in UDP RCU code 2008-11-16 19:39:21 -08:00
udplite.h udp: introduce struct udp_table and multiple spinlocks 2008-10-29 01:41:45 -07:00
wext.h wext: Dispatch and handle compat ioctls entirely in net/wireless/wext.c 2008-06-16 18:32:46 -07:00
wimax.h wimax: fix typo in kernel-doc for debugfs_dentry in struct wimax_dev 2009-01-11 00:06:32 -08:00
wireless.h cfg80211: add support for custom firmware regulatory solutions 2008-11-25 16:41:27 -05:00
x25.h
x25device.h
xfrm.h netns xfrm: per-netns sysctls 2008-11-25 18:00:48 -08:00