linux/net/sched
Maxim Mikityanskiy d03b195b5a sch_htb: Hierarchical QoS hardware offload
HTB doesn't scale well because of contention on a single lock, and it
also consumes CPU. This patch adds support for offloading HTB to
hardware that supports hierarchical rate limiting.

In the offload mode, HTB passes control commands to the driver using
ndo_setup_tc. The driver has to replicate the whole hierarchy of classes
and their settings (rate, ceil) in the NIC. Every modification of the
HTB tree caused by the admin results in ndo_setup_tc being called.

After this setup, the HTB algorithm is done completely in the NIC. An SQ
(send queue) is created for every leaf class and attached to the
hierarchy, so that the NIC can calculate and obey aggregated rate
limits, too. In the future, it can be changed, so that multiple SQs will
back a single leaf class.

ndo_select_queue is responsible for selecting the right queue that
serves the traffic class of each packet.

The data path works as follows: a packet is classified by clsact, the
driver selects a hardware queue according to its class, and the packet
is enqueued into this queue's qdisc.

This solution addresses two main problems of scaling HTB:

1. Contention by flow classification. Currently the filters are attached
to the HTB instance as follows:

    # tc filter add dev eth0 parent 1:0 protocol ip flower dst_port 80
    classid 1:10

It's possible to move classification to clsact egress hook, which is
thread-safe and lock-free:

    # tc filter add dev eth0 egress protocol ip flower dst_port 80
    action skbedit priority 1:10

This way classification still happens in software, but the lock
contention is eliminated, and it happens before selecting the TX queue,
allowing the driver to translate the class to the corresponding hardware
queue in ndo_select_queue.

Note that this is already compatible with non-offloaded HTB and doesn't
require changes to the kernel nor iproute2.

2. Contention by handling packets. HTB is not multi-queue, it attaches
to a whole net device, and handling of all packets takes the same lock.
When HTB is offloaded, it registers itself as a multi-queue qdisc,
similarly to mq: HTB is attached to the netdev, and each queue has its
own qdisc.

Some features of HTB may be not supported by some particular hardware,
for example, the maximum number of classes may be limited, the
granularity of rate and ceil parameters may be different, etc. - so, the
offload is not enabled by default, a new parameter is used to enable it:

    # tc qdisc replace dev eth0 root handle 1: htb offload

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22 20:41:29 -08:00
..
act_api.c net_sched: fix RTNL deadlock again caused by request_module() 2021-01-18 20:13:55 -08:00
act_bpf.c net: sched: fix misspellings using misspell-fixer tool 2020-11-10 17:00:28 -08:00
act_connmark.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_csum.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_ct.c net/sched: cls_flower add CT_FLAGS_INVALID flag support 2021-01-20 21:09:44 -08:00
act_ctinfo.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-10-05 18:40:01 -07:00
act_gact.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_gate.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-10-05 18:40:01 -07:00
act_ife.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_ipt.c treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
act_meta_mark.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
act_meta_skbprio.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
act_meta_skbtcindex.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
act_mirred.c net/sched: sch_frag: add generic packet fragment support. 2020-11-27 14:36:02 -08:00
act_mpls.c net/sched: act_mpls: ensure LSE is pullable before reading it 2020-12-03 11:13:37 -08:00
act_nat.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_pedit.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_police.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_sample.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_simple.c treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
act_skbedit.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_skbmod.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_tunnel_key.c net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels 2020-10-20 21:10:41 -07:00
act_vlan.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-10-05 18:40:01 -07:00
cls_api.c net_sched: fix RTNL deadlock again caused by request_module() 2021-01-18 20:13:55 -08:00
cls_basic.c net_sched: fix ops->bind_class() implementations 2020-01-27 10:51:43 +01:00
cls_bpf.c net_sched: fix ops->bind_class() implementations 2020-01-27 10:51:43 +01:00
cls_cgroup.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
cls_flow.c Remove uninitialized_var() macro for v5.9-rc1 2020-08-04 13:49:43 -07:00
cls_flower.c net/sched: cls_flower add CT_FLAGS_INVALID flag support 2021-01-20 21:09:44 -08:00
cls_fw.c net_sched: fix ops->bind_class() implementations 2020-01-27 10:51:43 +01:00
cls_matchall.c net: qos offload add flow status with dropped count 2020-06-19 12:53:30 -07:00
cls_route.c net_sched: cls_route: remove the right filter from hashtable 2020-03-16 01:59:32 -07:00
cls_rsvp6.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
cls_rsvp.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
cls_rsvp.h net: sched: fix misspellings using misspell-fixer tool 2020-11-10 17:00:28 -08:00
cls_tcindex.c net_sched: avoid shift-out-of-bounds in tcindex_set_parms() 2021-01-15 18:11:10 -08:00
cls_u32.c net/sched: cls_u32: simplify the return expression of u32_reoffload_knode() 2020-12-08 16:22:53 -08:00
em_canid.c net: sched: kerneldoc fixes 2020-07-13 17:20:40 -07:00
em_cmp.c net: sched: fix misspellings using misspell-fixer tool 2020-11-10 17:00:28 -08:00
em_ipset.c sched: consistently handle layer3 header accesses in the presence of VLANs 2020-07-03 14:34:53 -07:00
em_ipt.c sched: consistently handle layer3 header accesses in the presence of VLANs 2020-07-03 14:34:53 -07:00
em_meta.c sched: consistently handle layer3 header accesses in the presence of VLANs 2020-07-03 14:34:53 -07:00
em_nbyte.c net: sched: Replace zero-length array with flexible-array member 2020-02-29 21:27:02 -08:00
em_text.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
em_u32.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ematch.c net: sched: kerneldoc fixes 2020-07-13 17:20:40 -07:00
Kconfig net: sched: incorrect Kconfig dependencies on Netfilter modules 2020-12-09 15:49:29 -08:00
Makefile net/sched: sch_frag: add generic packet fragment support. 2020-11-27 14:36:02 -08:00
sch_api.c net: sched: Add extack to Qdisc_class_ops.delete 2021-01-22 20:41:29 -08:00
sch_atm.c net: sched: Add extack to Qdisc_class_ops.delete 2021-01-22 20:41:29 -08:00
sch_blackhole.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_cake.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
sch_cbq.c net: sched: Add extack to Qdisc_class_ops.delete 2021-01-22 20:41:29 -08:00
sch_cbs.c net: don't include ethtool.h from netdevice.h 2020-11-23 17:27:04 -08:00
sch_choke.c net: sched: prevent invalid Scell_log shift count 2020-12-28 14:52:54 -08:00
sch_codel.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_drr.c net: sched: Add extack to Qdisc_class_ops.delete 2021-01-22 20:41:29 -08:00
sch_dsmark.c net: sched: Add extack to Qdisc_class_ops.delete 2021-01-22 20:41:29 -08:00
sch_etf.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_ets.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_fifo.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_fq_codel.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-08-05 20:13:21 -07:00
sch_fq_pie.c net/sched: fq_pie: initialize timer earlier in fq_pie_init() 2020-12-04 14:15:01 -08:00
sch_fq.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_frag.c net/sched: sch_frag: add generic packet fragment support. 2020-11-27 14:36:02 -08:00
sch_generic.c net/sched: get rid of qdisc->padded 2020-10-09 08:08:08 -07:00
sch_gred.c net: sched: prevent invalid Scell_log shift count 2020-12-28 14:52:54 -08:00
sch_hfsc.c net: sched: Add extack to Qdisc_class_ops.delete 2021-01-22 20:41:29 -08:00
sch_hhf.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_htb.c sch_htb: Hierarchical QoS hardware offload 2021-01-22 20:41:29 -08:00
sch_ingress.c net: sched: Pass ingress block to tcf_classify_ingress 2020-02-19 17:49:48 -08:00
sch_mq.c net: sched: fix dump qlen for sch_mq/sch_mqprio with NOLOCK subqueues 2019-12-03 11:53:55 -08:00
sch_mqprio.c mqprio: Fix out-of-bounds access in mqprio_dump 2019-12-06 11:58:45 -08:00
sch_multiq.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_netem.c netem: fix zero division in tabledist 2020-10-29 11:45:47 -07:00
sch_pie.c net: sched: fix misspellings using misspell-fixer tool 2020-11-10 17:00:28 -08:00
sch_plug.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_prio.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_qfq.c net: sched: Add extack to Qdisc_class_ops.delete 2021-01-22 20:41:29 -08:00
sch_red.c net: sched: prevent invalid Scell_log shift count 2020-12-28 14:52:54 -08:00
sch_sfb.c net: sched: Add extack to Qdisc_class_ops.delete 2021-01-22 20:41:29 -08:00
sch_sfq.c net: sched: prevent invalid Scell_log shift count 2020-12-28 14:52:54 -08:00
sch_skbprio.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_taprio.c taprio: boolean values to a bool variable 2021-01-19 17:43:48 -08:00
sch_tbf.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_teql.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00