linux/include/net
Eric Dumazet 75c119afe1 tcp: implement rb-tree based retransmit queue
Using a linear list to store all skbs in write queue has been okay
for quite a while : O(N) is not too bad when N < 500.

Things get messy when N is the order of 100,000 : Modern TCP stacks
want 10Gbit+ of throughput even with 200 ms RTT flows.

40 ns per cache line miss means a full scan can use 4 ms,
blowing away CPU caches.

SACK processing often can use various hints to avoid parsing
whole retransmit queue. But with high packet losses and/or high
reordering, hints no longer work.

Sender has to process thousands of unfriendly SACK, accumulating
a huge socket backlog, burning a cpu and massively dropping packets.

Using an rb-tree for retransmit queue has been avoided for years
because it added complexity and overhead, but now is the time
to be more resistant and say no to quadratic behavior.

1) RTX queue is no longer part of the write queue : already sent skbs
are stored in one rb-tree.

2) Since reaching the head of write queue no longer needs
sk->sk_send_head, we added an union of sk_send_head and tcp_rtx_queue

Tested:

 On receiver :
 netem on ingress : delay 150ms 200us loss 1
 GRO disabled to force stress and SACK storms.

for f in `seq 1 10`
do
 ./netperf -H lpaa6 -l30 -- -K bbr -o THROUGHPUT|tail -1
done | awk '{print $0} {sum += $0} END {printf "%7u\n",sum}'

Before patch :

323.87
351.48
339.59
338.62
306.72
204.07
304.93
291.88
202.47
176.88
   2840

After patch:

1700.83
2207.98
2070.17
1544.26
2114.76
2124.89
1693.14
1080.91
2216.82
1299.94
  18053

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-07 00:28:54 +01:00
..
9p 9p: Implement show_options 2017-07-11 06:08:58 -04:00
bluetooth Bluetooth: make baswap src const 2017-09-01 22:49:47 +02:00
caif
iucv s390/iucv: do not use arrays as argument 2015-09-21 16:03:04 -07:00
netfilter netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to rhashtable" 2017-09-08 18:55:50 +02:00
netns ipv4: Namespaceify tcp_fastopen_blackhole_timeout knob 2017-10-01 17:55:54 -07:00
nfc NFC: Add nfc_dbg() macro 2017-04-05 10:15:20 +02:00
phonet sock: struct proto hash function may error 2016-02-11 03:54:14 -05:00
sctp sctp: introduce round robin stream scheduler 2017-10-03 16:27:29 -07:00
tc_act net: sched: introduce helper to identify gact pass action 2017-09-26 20:26:45 -07:00
6lowpan.h 6lowpan: Fix IID format for Bluetooth 2017-04-12 22:02:36 +02:00
act_api.h net_sched: get rid of tcfa_rcu 2017-09-12 20:41:02 -07:00
addrconf.h net: Convert int functions to bool 2017-09-18 11:40:03 -07:00
af_ieee802154.h ieee802154: af_ieee802154: fix typo in comment. 2015-09-17 13:20:05 +02:00
af_rxrpc.h rxrpc: Allow failed client calls to be retried 2017-08-29 10:55:20 +01:00
af_unix.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-07-21 03:38:43 +01:00
af_vsock.h VSOCK: use TCP state constants for sk_state 2017-10-05 18:44:17 -07:00
ah.h
arp.h net: convert neighbour.refcnt from atomic_t to refcount_t 2017-07-01 07:39:07 -07:00
atmclip.h
ax25.h net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t 2017-07-04 22:35:19 +01:00
ax88796.h
bond_3ad.h bonding: 3ad: apply ad_actor settings changes immediately 2016-02-09 04:45:49 -05:00
bond_alb.h
bond_options.h bonding: Prevent duplicate userspace notification 2017-05-27 18:51:41 -04:00
bonding.h net: Add extack to ndo_add_slave 2017-10-04 21:39:33 -07:00
busy_poll.h net: fix compilation when busy poll is not enabled 2017-08-11 14:59:24 -07:00
calipso.h net, calipso: convert calipso_doi.refcount from atomic_t to refcount_t 2017-07-04 22:35:16 +01:00
cfg80211-wext.h
cfg80211.h nl80211: add authorized flag to ROAM event 2017-06-13 11:04:37 +02:00
cfg802154.h ieee802154: add netns support 2016-07-08 12:20:57 +02:00
checksum.h csum: eliminate sparse warning in remcsum_unadjust() 2017-01-20 12:12:13 -05:00
cipso_ipv4.h net, ipv4: convert cipso_v4_doi.refcount from atomic_t to refcount_t 2017-07-04 01:29:04 -07:00
cls_cgroup.h cls_cgroup: get sk_classid only from full sockets 2016-04-19 20:09:25 -04:00
codel_impl.h codel: split into multiple files 2016-04-25 16:44:27 -04:00
codel_qdisc.h net_sched: fq_codel: cache skb->truesize into skb->cb 2016-06-25 12:19:35 -04:00
codel.h codel: split into multiple files 2016-04-25 16:44:27 -04:00
compat.h packet: compat support for sock_fprog 2016-06-09 23:41:03 -07:00
datalink.h
dcbevent.h
dcbnl.h
devlink.h devlink: Add IPv6 header for dpipe 2017-08-31 14:42:19 -07:00
dn_dev.h
dn_fib.h net, decnet: convert dn_fib_info.fib_clntref from atomic_t to refcount_t 2017-07-04 22:35:15 +01:00
dn_neigh.h netfilter: Pass net into okfn 2015-09-17 17:18:37 -07:00
dn_nsp.h
dn_route.h
dn.h
dsa.h net: dsa: remove tag ops from the switch tree 2017-10-01 04:15:07 +01:00
dsfield.h
dst_cache.h net: add dst_cache support 2016-02-16 20:21:48 -05:00
dst_metadata.h net/dst: Make skb parameter of skb{metadata_dst, tunnel_info}() const 2017-10-02 11:06:07 -07:00
dst_ops.h net: add confirm_neigh method to dst_ops 2017-02-07 13:07:46 -05:00
dst.h net: prevent dst uses after free 2017-09-21 20:42:15 -07:00
erspan.h gre: introduce native tunnel support for ERSPAN 2017-08-22 14:29:30 -07:00
esp.h esp6: Reorganize esp_output 2017-04-14 10:06:42 +02:00
ethoc.h net/ethoc: support big-endian register layout 2015-09-23 15:33:15 -07:00
fib_notifier.h fib: notifier: Add VIF add and delete event types 2017-09-27 11:33:27 -07:00
fib_rules.h net: fib_rules: Implement notification logic in core 2017-08-03 15:35:59 -07:00
firewire.h
flow_dissector.h flow_dissector: Cleanup control flow 2017-09-05 11:40:08 -07:00
flow.h net: Extend struct flowi6 with multipath hash 2017-08-24 18:21:17 -07:00
fou.h fou: Add encap ops for IPv6 tunnels 2016-05-20 18:03:16 -04:00
fq_impl.h fq.h: Port memory limit mechanism from fq_codel 2016-09-30 13:29:21 +02:00
fq.h fq.h: Port memory limit mechanism from fq_codel 2016-09-30 13:29:21 +02:00
garp.h
gen_stats.h net_sched: gen_estimator: complete rewrite of rate estimators 2016-12-05 15:21:59 -05:00
genetlink.h genetlink: remove ops_list from genetlink header. 2017-06-05 10:54:55 -04:00
geneve.h net: Remove deprecated tunnel specific UDP offload functions 2016-06-17 20:23:32 -07:00
gre.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-08-18 01:17:32 -04:00
gro_cells.h gro_cells: move to net/core/gro_cells.c 2017-02-08 14:38:18 -05:00
gtp.h gtp: #define #define _GTP_H_ and not #define _GTP_H 2016-07-25 17:55:43 -07:00
gue.h
hwbm.h net: add a hardware buffer management helper API 2016-03-14 12:19:46 -04:00
icmp.h net: snmp: kill STATS_BH macros 2016-04-27 22:48:25 -04:00
ieee80211_radiotap.h wireless: radiotap: rewrite the radiotap header file 2017-01-25 16:00:33 +01:00
ieee802154_netdev.h mac802154: constify ieee802154_llsec_ops structure 2016-01-04 20:40:41 +01:00
if_inet6.h net, ipv6: convert ifacaddr6.aca_refcnt from atomic_t to refcount_t 2017-07-04 01:29:04 -07:00
ife.h net: Introduce ife encapsulation module 2017-02-03 15:16:45 -05:00
ila.h ila: Add generic ILA translation facility 2015-12-15 23:25:20 -05:00
inet6_connection_sock.h inet: drop ->bind_conflict 2017-01-18 13:04:28 -05:00
inet6_hashtables.h net: ipv6: add second dif to inet6 socket lookups 2017-08-07 11:39:22 -07:00
inet_common.h net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
inet_connection_sock.h tcp: ULP infrastructure 2017-06-15 12:12:40 -04:00
inet_ecn.h net-ipv6: remove unused IP6_ECN_clear() function 2017-10-01 17:55:54 -07:00
inet_frag.h Revert "net: fix percpu memory leaks" 2017-09-03 11:01:05 -07:00
inet_hashtables.h net: ipv4: add second dif to inet socket lookups 2017-08-07 11:39:21 -07:00
inet_sock.h net/tcp-fastopen: Add new API support 2017-01-25 14:04:38 -05:00
inet_timewait_sock.h ipv4: Namespaceify tcp_tw_recycle and tcp_max_tw_buckets knob 2016-12-29 11:38:31 -05:00
inetpeer.h inetpeer: remove AVL implementation in favor of RB tree 2017-07-17 08:59:01 -07:00
ip6_checksum.h ipv6: Pass proto to csum_ipv6_magic as __u8 instead of unsigned short 2016-03-13 23:55:13 -04:00
ip6_fib.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-09-01 17:42:05 -07:00
ip6_route.h ipv6: Use rt6i_idev index for echo replies to a local address 2017-08-29 15:32:25 -07:00
ip6_tunnel.h ip6_tunnel: Allow policy-based routing through tunnels 2017-04-21 13:21:30 -04:00
ip_fib.h net: ipv4: remove fib_weight 2017-09-29 06:19:32 +01:00
ip_tunnels.h ipv4: speedup ipv6 tunnels dismantle 2017-09-19 16:32:24 -07:00
ip_vs.h ipvs: remove unused function ip_vs_set_state_timeout 2017-04-28 12:00:10 +02:00
ip.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-08-21 17:06:42 -07:00
ipcomp.h
ipconfig.h
ipv6.h net/ipv6: Convert icmpv6_push_pending_frames to void 2017-10-06 09:52:31 -07:00
ipx.h net, ipx: convert ipx_route.refcnt from atomic_t to refcount_t 2017-07-04 22:35:17 +01:00
iw_handler.h wext: uninline stream addition functions 2017-01-13 09:38:42 +01:00
kcm.h kcm: Use stream parser 2016-08-17 19:36:23 -04:00
l3mdev.h net: ipv4: Do not drop to make_route if oif is l3mdev 2016-10-13 12:05:26 -04:00
lapb.h net, lapb: convert lapb_cb.refcnt from atomic_t to refcount_t 2017-07-04 22:35:16 +01:00
lib80211.h
llc_c_ac.h
llc_c_ev.h
llc_c_st.h
llc_conn.h
llc_if.h
llc_pdu.h
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
llc.h net, llc: convert llc_sap.refcnt from atomic_t to refcount_t 2017-07-04 22:35:15 +01:00
lwtunnel.h net: add extack arg to lwtunnel build state 2017-05-30 11:55:32 -04:00
mac80211.h mac80211: fix VLAN handling with TXQs 2017-09-05 11:28:43 +02:00
mac802154.h ieee802154: cleanup WARN_ON for fc fetch 2016-07-08 13:23:12 +02:00
mip6.h
mld.h
mpls_iptunnel.h net: mpls: Increase max number of labels for lwt encap 2017-04-01 20:21:44 -07:00
mpls.h openvswitch: use mpls_hdr 2016-10-03 02:00:22 -04:00
mrp.h
ncsi.h net/ncsi: fix ncsi_vlan_rx_{add,kill}_vid references 2017-09-05 09:11:45 -07:00
ndisc.h net: convert neighbour.refcnt from atomic_t to refcount_t 2017-07-01 07:39:07 -07:00
neighbour.h neigh: make strucrt neigh_table::entry_size unsigned int 2017-09-25 20:36:17 -07:00
net_namespace.h net: core: Make the FIB notification chain generic 2017-08-03 15:35:59 -07:00
net_ratelimit.h
netevent.h neigh: Send a notification when DELAY_PROBE_TIME changes 2016-07-05 09:06:29 -07:00
netlabel.h net: convert netlbl_lsm_cache.refcount from atomic_t to refcount_t 2017-07-01 07:39:09 -07:00
netlink.h netlink: fix nla_put_{u8,u16,u32} for KASAN 2017-09-25 20:18:27 -07:00
netprio_cgroup.h net: wrap sock->sk_cgrp_prioidx and ->sk_classid inside a struct 2015-12-08 22:02:33 -05:00
netrom.h net, netrom: convert nr_node.refcount from atomic_t to refcount_t 2017-07-04 22:35:17 +01:00
nexthop.h
nl802154.h ieee802154: add netns support 2016-07-08 12:20:57 +02:00
nsh.h net: add NSH header structures and helpers 2017-08-29 15:16:52 -07:00
p8022.h
ping.h net: ping: make ping_v6_sendmsg static 2016-03-23 22:09:58 -04:00
pkt_cls.h net: sched: remove cops->tcf_cl_offload 2017-08-11 13:47:01 -07:00
pkt_sched.h net: sched: Add helpers to identify classids 2017-08-11 13:47:00 -07:00
pptp.h pptp: Refactor the struct and macros of PPTP codes 2016-08-15 10:55:53 -07:00
protocol.h IPv4: early demux can return an error code 2017-10-01 03:55:47 +01:00
psample.h net: Introduce psample, a new genetlink channel for packet sampling 2017-01-24 13:44:28 -05:00
psnap.h
raw.h net: ipv4: add second dif to raw socket lookups 2017-08-07 11:39:21 -07:00
rawv6.h net: ipv6: add second dif to raw socket lookups 2017-08-07 11:39:22 -07:00
red.h ktime: Get rid of the union 2016-12-25 17:21:22 +01:00
regulatory.h
request_sock.h net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
rose.h
route.h udp: perform source validation for mcast early demux 2017-10-01 03:55:47 +01:00
rtnetlink.h rtnetlink: remove __rtnl_af_unregister 2017-10-04 10:33:59 -07:00
sch_generic.h net_sched: no need to free qdisc in RCU callback 2017-09-19 16:30:03 -07:00
scm.h sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
secure_seq.h tcp: Namespaceify sysctl_tcp_timestamps 2017-06-08 10:53:29 -04:00
seg6_hmac.h ipv6: sr: add core files for SR HMAC support 2016-11-09 20:40:06 -05:00
seg6.h ipv6: sr: add support for ip4ip6 encapsulation 2017-08-25 17:10:23 -07:00
slhc_vj.h
smc.h smc: netlink interface for SMC sockets 2017-01-09 16:07:41 -05:00
snmp.h net: snmp: fix 64bit stats on 32bit arches 2016-04-28 11:49:45 -04:00
sock_reuseport.h soreuseport: fix NULL ptr dereference SO_REUSEPORT after bind 2016-01-19 14:44:23 -05:00
sock.h tcp: implement rb-tree based retransmit queue 2017-10-07 00:28:54 +01:00
Space.h
stp.h
strparser.h strparser: initialize all callbacks 2017-08-24 21:57:50 -07:00
switchdev.h net: switchdev: Remove bridge bypass support from switchdev 2017-08-07 14:48:48 -07:00
tcp_states.h
tcp.h tcp: implement rb-tree based retransmit queue 2017-10-07 00:28:54 +01:00
timewait_sock.h inet: remove BUG_ON() in twsk_destructor() 2015-07-09 15:12:20 -07:00
tls.h tls: kernel TLS support 2017-06-15 12:12:40 -04:00
transp_v6.h ipv6: add new struct ipcm6_cookie 2016-05-03 16:08:14 -04:00
tso.h net: define the TSO header size in net/tso.h 2017-08-23 20:42:09 -07:00
tun_proto.h vxlan: factor out VXLAN-GPE next protocol 2017-08-29 15:16:52 -07:00
udp_tunnel.h net: add infrastructure to un-offload UDP tunnel port 2017-07-24 13:52:59 -07:00
udp.h IPv4: early demux can return an error code 2017-10-01 03:55:47 +01:00
udplite.h udp: use a separate rx queue for packet reception 2017-05-16 15:41:29 -04:00
vsock_addr.h
vxlan.h vxlan: factor out VXLAN-GPE next protocol 2017-08-29 15:16:52 -07:00
wext.h dev_ioctl: copy only the smaller struct iwreq for wext 2017-06-14 13:52:44 +02:00
wimax.h
x25.h net, x25: convert x25_neigh.refcnt from atomic_t to refcount_t 2017-07-04 22:35:18 +01:00
x25device.h
xfrm.h xfrm: Add support for network devices capable of removing the ESP trailer 2017-08-31 09:04:03 +02:00