Commit Graph

1015745 Commits

Author SHA1 Message Date
Phil Sutter
06e95f0a2a netfilter: nft_extdhr: Drop pointless check of tprot_set
Pablo says, tprot_set is only there to detect if tprot was set to
IPPROTO_IP as that evaluates to zero. Therefore, code asserting a
different value in tprot does not need to check tprot_set.

Fixes: 935b7f6430 ("netfilter: nft_exthdr: add TCP option matching")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-16 22:25:01 +02:00
Phil Sutter
5acc44f394 netfilter: nft_exthdr: Search chunks in SCTP packets only
Since user space does not generate a payload dependency, plain sctp
chunk matches cause searching in non-SCTP packets, too. Avoid this
potential mis-interpretation of packet data by checking pkt->tprot.

Fixes: 133dc203d7 ("netfilter: nft_exthdr: Support SCTP chunks")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-16 22:25:01 +02:00
Yang Yingliang
c765449591 net: chelsio: cxgb4: use eth_zero_addr() to assign zero address
Using eth_zero_addr() to assign zero address insetad of
inefficient copy from an array.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:53:17 -07:00
David S. Miller
1f5c3cc1dd Merge branch 'cosa-cleanups'
Peng Li says:

====================
net: cosa: clean up some code style issues

This patchset clean up some code style issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:52:34 -07:00
Peng Li
b877320527 net: cosa: remove redundant spaces
According to the chackpatch.pl,
no spaces is necessary at the start of a line,
no space is necessary after a cast.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:52:33 -07:00
Peng Li
6619e2b63b net: cosa: remove trailing whitespaces
This patch removes trailing whitespaces.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:52:33 -07:00
Peng Li
e84c3e1436 net: cosa: add some required spaces
Add space required before the open parenthesis '(' and '{'.
Add space required after that close brace '}' and ','
Add spaces required around that '=' , '&', '*', '|', '+', '/' and '-'.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:52:33 -07:00
Peng Li
573747254f net: cosa: fix the code style issue about trailing statements
Trailing statements should be on next line.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:52:33 -07:00
Peng Li
9edc7d68b0 net: cosa: fix the alignment issue
Alignment should match open parenthesis.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:52:33 -07:00
Peng Li
3fac4b941c net: cosa: use BIT macro
This patch uses the BIT macro for setting individual bits,
to fix the following checkpatch.pl issue:
CHECK: Prefer using the BIT macro.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:52:33 -07:00
Peng Li
acc3edf005 net: cosa: add necessary () to macro argument
Macro argument 'cosa' may be better as '(cosa)' to avoid
precedence issues.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:52:33 -07:00
Peng Li
70d063b9a6 net: cosa: remove redundant braces {}
This patch removes redundant braces {}, to fix the
checkpatch.pl warning:
"braces {} are not necessary for single statement blocks".

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:52:33 -07:00
Peng Li
c8f4b11727 net: cosa: add braces {} to all arms of the statement
Braces {} should be used on all arms of this statement.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:52:33 -07:00
Peng Li
c0a963e25d net: cosa: fix the comments style issue
Networking block comments don't use an empty /* line,
use /* Comment...

Block comments use * on subsequent lines.
Block comments use a trailing */ on a separate line.

This patch fixes the comments style issues.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:52:33 -07:00
Peng Li
b4d5f1e2cd net: cosa: move out assignment in if condition
Should not use assignment in if condition.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:52:33 -07:00
Peng Li
2076b3e61a net: cosa: replace comparison to NULL with "!chan->rx_skb"
According to the chackpatch.pl, comparison to NULL could
be written "!chan->rx_skb".

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:52:33 -07:00
Peng Li
77282db510 net: cosa: fix the code style issue about "foo* bar"
Fix the checkpatch error as "foo* bar" should be "foo *bar".

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:52:33 -07:00
Peng Li
0569a3d416 net: cosa: add blank line after declarations
This patch fixes the checkpatch error about missing a blank line
after declarations.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:52:33 -07:00
Peng Li
786f0dc627 net: cosa: remove redundant blank lines
This patch removes some redundant blank lines.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:52:33 -07:00
Zou Wei
95d359ed5a net: iosm: add missing MODULE_DEVICE_TABLE
This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this driver when it is built
as an external module.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:45:37 -07:00
Wang Hai
56b57b809f qlcnic: Use list_for_each_entry() to simplify code in qlcnic_main.c
Convert list_for_each() to list_for_each_entry() where
applicable. This simplifies the code.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:44:54 -07:00
Jakub Kicinski
4d1fb7cde0 ethtool: add a stricter length check
There has been a few errors in the ethtool reply size calculations,
most of those are hard to trigger during basic testing because of
skb size rounding up and netdev names being shorter than max.
Add a more precise check.

This change will affect the value of payload length displayed in
case of -EMSGSIZE but that should be okay, "payload length" isn't
a well defined term here.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:40:44 -07:00
Maciej Żenczykowski
1b3fc77176 inet_diag: add support for tw_mark
Timewait sockets have included mark since approx 4.18.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Jon Maxwell <jmaxwell37@gmail.com>
Fixes: 0048369055 ("tcp: Add mark for TIMEWAIT sockets")
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:38:59 -07:00
Jiapeng Chong
1d0bbbf22b net: mhi_net: make mhi_wwan_ops static
This symbol is not used outside of net.c, so marks it static.

Fix the following sparse warning:

drivers/net/mhi/net.c:385:23: warning: symbol 'mhi_wwan_ops' was not
declared. Should it be static?

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:38:02 -07:00
David S. Miller
66aeec855a Merge branch 'hns3-next'
Guangbin Huang says:

====================
net: hns3: updates for -next

This series includes some optimization in IO path for the HNS3 ethernet
driver.
====================

Cc: Loic Poulain <loic.poulain@linaro.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:36:20 -07:00
Yunsheng Lin
99f6b5fb5f net: hns3: use bounce buffer when rx page can not be reused
Currently rx page will be reused to receive future packet when
the stack releases the previous skb quickly. If the old page
can not be reused, a new page will be allocated and mapped,
which comsumes a lot of cpu when IOMMU is in the strict mode,
especially when the application and irq/NAPI happens to run on
the same cpu.

So allocate a new frag to memcpy the data to avoid the costly
IOMMU unmapping/mapping operation, and add "frag_alloc_err"
and "frag_alloc" stats in "ethtool -S ethX" cmd.

The throughput improves above 50% when running single thread of
iperf using TCP when IOMMU is in strict mode and iperf shares the
same cpu with irq/NAPI(rx_copybreak = 2048 and mtu = 1500).

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:36:06 -07:00
Yunsheng Lin
fa7711b888 net: hns3: optimize the rx page reuse handling process
Current rx page offset only reset to zero when all the below
conditions are satisfied:
1. rx page is only owned by driver.
2. rx page is reusable.
3. the page offset that is above to be given to the stack has
reached the end of the page.

If the page offset is over the hns3_buf_size(), it means the
buffer below the offset of the page is usable when the above
condition 1 & 2 are satisfied, so page offset can be reset to
zero instead of increasing the offset. We may be able to always
reuse the first 4K buffer of a 64K page, which means we can
limit the hot buffer size as much as possible.

The above optimization is a side effect when refacting the
rx page reuse handling in order to support the rx copybreak.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:36:06 -07:00
Yunsheng Lin
7459775e9f net: hns3: support dma_map_sg() for multi frags skb
Using the queue based tx buffer, it is also possible to allocate a
sgl buffer, and use skb_to_sgvec() to convert the skb to the sgvec
in order to support the dma_map_sg() to decreases the overhead of
IOMMU mapping and unmapping.

Firstly, it reduces the number of buffers. For example, a tcp skb
may have a 66-byte header and 3 fragments of 4328, 32768, and 28064
bytes. With this patch, dma_map_sg() will combine them into two
buffers, 66-bytes header and one 65160-bytes fragment by using IOMMU.

Secondly, it reduces the number of dma mapping and unmapping. All the
original 4 buffers are mapped only once rather than 4 times.

The throughput improves above 10% when running single thread of iperf
using TCP when IOMMU is in strict mode.

Suggested-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:36:06 -07:00
Huazhong Tan
1a00197b7d net: hns3: add support to query tx spare buffer size for pf
Add support to query tx spare buffer size from configuration
file, and use this info to do spare buffer initialization when
the module parameter 'tx_spare_buf_size' is not specified.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:36:06 -07:00
Yunsheng Lin
907676b130 net: hns3: use tx bounce buffer for small packets
when the packet or frag size is small, it causes both security and
performance issue. As dma can't map sub-page, this means some extra
kernel data is visible to devices. On the other hand, the overhead
of dma map and unmap is huge when IOMMU is on.

So add a queue based tx shared bounce buffer to memcpy the small
packet when the len of the xmitted skb is below tx_copybreak.
Add tx_spare_buf_size module param to set the size of tx spare
buffer, and add set/get_tunable to set or query the tx_copybreak.

The throughtput improves from 30 Gbps to 90+ Gbps when running 16
netperf threads with 32KB UDP message size when IOMMU is in the
strict mode(tx_copybreak = 2000 and mtu = 1500).

Suggested-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:36:06 -07:00
Yunsheng Lin
8677d78c3d net: hns3: refactor for hns3_fill_desc() function
Factor out hns3_fill_desc() so that it can be reused in the
tx bounce supporting.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:36:06 -07:00
Yunsheng Lin
26f1ccdf60 net: hns3: minor refactor related to desc_cb handling
desc_cb is used to store mapping and freeing info for the
corresponding desc, which is used in the cleaning process.
There will be more desc_cb type coming up when supporting the
tx bounce buffer, change desc_cb type to bit-wise value in order
to reduce the desc_cb type checking operation in the data path.

Also move the desc_cb type definition to hns3_enet.h because it
is only used in hns3_enet.c, and declare a local variable desc_cb
in hns3_clear_desc() to reduce lines of code.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 00:36:06 -07:00
Lorenzo Bianconi
a078d981f8 net: ti: add pp skb recycling support
As already done for mvneta and mvpp2, enable skb recycling for ti
ethernet drivers

ti driver on net-next:
----------------------
[perf top]
 47.15%  [kernel]     [k] _raw_spin_unlock_irqrestore
 11.77%  [kernel]     [k] __cpdma_chan_free
  3.16%  [kernel]     [k] ___bpf_prog_run
  2.52%  [kernel]     [k] cpsw_rx_vlan_encap
  2.34%  [kernel]     [k] __netif_receive_skb_core
  2.27%  [kernel]     [k] free_unref_page
  2.26%  [kernel]     [k] kmem_cache_free
  2.24%  [kernel]     [k] kmem_cache_alloc
  1.69%  [kernel]     [k] __softirqentry_text_start
  1.61%  [kernel]     [k] cpsw_rx_handler
  1.19%  [kernel]     [k] page_pool_release_page
  1.19%  [kernel]     [k] clear_bits_ll
  1.15%  [kernel]     [k] page_frag_free
  1.06%  [kernel]     [k] __dma_page_dev_to_cpu
  0.99%  [kernel]     [k] memset
  0.94%  [kernel]     [k] __alloc_pages_bulk
  0.92%  [kernel]     [k] kfree_skb
  0.85%  [kernel]     [k] packet_rcv
  0.78%  [kernel]     [k] page_address
  0.75%  [kernel]     [k] v7_dma_inv_range
  0.71%  [kernel]     [k] __lock_text_start

[iperf3 tcp]
[  5]   0.00-10.00  sec   873 MBytes   732 Mbits/sec    0   sender
[  5]   0.00-10.01  sec   866 MBytes   726 Mbits/sec        receiver

ti + skb recycling:
-------------------
[perf top]
 40.58%  [kernel]    [k] _raw_spin_unlock_irqrestore
 16.18%  [kernel]    [k] __softirqentry_text_start
 10.33%  [kernel]    [k] __cpdma_chan_free
  2.62%  [kernel]    [k] ___bpf_prog_run
  2.05%  [kernel]    [k] cpsw_rx_vlan_encap
  2.00%  [kernel]    [k] kmem_cache_alloc
  1.86%  [kernel]    [k] __netif_receive_skb_core
  1.80%  [kernel]    [k] kmem_cache_free
  1.63%  [kernel]    [k] cpsw_rx_handler
  1.12%  [kernel]    [k] cpsw_rx_mq_poll
  1.11%  [kernel]    [k] page_pool_put_page
  1.04%  [kernel]    [k] _raw_spin_unlock
  0.97%  [kernel]    [k] clear_bits_ll
  0.90%  [kernel]    [k] packet_rcv
  0.88%  [kernel]    [k] __dma_page_dev_to_cpu
  0.85%  [kernel]    [k] kfree_skb
  0.80%  [kernel]    [k] memset
  0.71%  [kernel]    [k] __lock_text_start
  0.66%  [kernel]    [k] v7_dma_inv_range
  0.64%  [kernel]    [k] gen_pool_free_owner

[iperf3 tcp]
[  5]   0.00-10.00  sec   884 MBytes   742 Mbits/sec    0   sender
[  5]   0.00-10.01  sec   878 MBytes   735 Mbits/sec        receiver

Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:50:43 -07:00
M Chetan Kumar
925a56b2c0 net: wwan: iosm: Fix htmldocs warnings
Fixes .rst file warnings seen on linux-next build.

Fixes: f7af616c63 ("net: iosm: infrastructure")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: M Chetan Kumar <m.chetan.kumar@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:50:10 -07:00
Colin Ian King
f25dcde974 octeontx2-pf: Fix spelling mistake "morethan" -> "more than"
There is a spelling mistake in a dev_err message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:29:58 -07:00
Colin Ian King
11b57faf95 net: dsa: b53: remove redundant null check on dev
The pointer dev can never be null, the null check is redundant
and can be removed. Cleans up a static analysis warning that
pointer priv is dereferencing dev before dev is being null
checked.

Addresses-Coverity: ("Dereference before null check")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:28:01 -07:00
Jussi Maki
848ca9182a net: bonding: Use per-cpu rr_tx_counter
The round-robin rr_tx_counter was shared across CPUs leading to
significant cache thrashing at high packet rates. This patch switches
the round-robin packet counter to use a per-cpu variable to decide
the destination slave.

On a test with 2x100Gbit ICE nic with pktgen_sample_04_many_flows.sh
(-s 64 -t 32) the tx rate was 19.6Mpps before and 22.3Mpps after
this patch.

"perf top -e cache_misses" before:
    12.31%  [bonding]       [k] bond_xmit_roundrobin_slave_get
    10.59%  [sch_fq_codel]  [k] fq_codel_dequeue
     9.34%  [kernel]        [k] skb_release_data
after:
    15.42%  [sch_fq_codel]  [k] fq_codel_dequeue
    10.06%  [kernel]        [k] __memset
     9.12%  [kernel]        [k] skb_release_data

Signed-off-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:26:15 -07:00
Liu Shixin
b8f6b0522c netlabel: Fix memory leak in netlbl_mgmt_add_common
Hulk Robot reported memory leak in netlbl_mgmt_add_common.
The problem is non-freed map in case of netlbl_domhsh_add() failed.

BUG: memory leak
unreferenced object 0xffff888100ab7080 (size 96):
  comm "syz-executor537", pid 360, jiffies 4294862456 (age 22.678s)
  hex dump (first 32 bytes):
    05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    fe 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01  ................
  backtrace:
    [<0000000008b40026>] netlbl_mgmt_add_common.isra.0+0xb2a/0x1b40
    [<000000003be10950>] netlbl_mgmt_add+0x271/0x3c0
    [<00000000c70487ed>] genl_family_rcv_msg_doit.isra.0+0x20e/0x320
    [<000000001f2ff614>] genl_rcv_msg+0x2bf/0x4f0
    [<0000000089045792>] netlink_rcv_skb+0x134/0x3d0
    [<0000000020e96fdd>] genl_rcv+0x24/0x40
    [<0000000042810c66>] netlink_unicast+0x4a0/0x6a0
    [<000000002e1659f0>] netlink_sendmsg+0x789/0xc70
    [<000000006e43415f>] sock_sendmsg+0x139/0x170
    [<00000000680a73d7>] ____sys_sendmsg+0x658/0x7d0
    [<0000000065cbb8af>] ___sys_sendmsg+0xf8/0x170
    [<0000000019932b6c>] __sys_sendmsg+0xd3/0x190
    [<00000000643ac172>] do_syscall_64+0x37/0x90
    [<000000009b79d6dc>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 63c4168874 ("netlabel: Add network address selectors to the NetLabel/LSM domain mapping")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:19:04 -07:00
David S. Miller
f0c227c7df mlx5-updates-2021-06-14
1) Trivial Lag refactroing in preparation for upcomming Single FDB lag feature
  - First 3 patches
 
 2) Scalable IRQ distriburion for Sub-functions
 
 A subfunction (SF) is a lightweight function that has a parent PCI
 function (PF) on which it is deployed.
 
 Currently, mlx5 subfunction is sharing the IRQs (MSI-X) with their
 parent PCI function.
 
 Before this series the PF allocates enough IRQs to cover
 all the cores in a system, Newly created SFs will re-use all the IRQs
 that the PF has allocated for itself.
 Hence, the more SFs are created, there are more EQs per IRQs. Therefore,
 whenever we handle an interrupt, we need to pull all SFs EQs and PF EQs
 instead of PF EQs without SFs on the system. This leads to a hard impact
 on the performance of SFs and PF.
 
 For example, on machine with:
 Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz with 56 cores.
 PCI Express 3 with BW of 126 Gb/s.
 ConnectX-5 Ex; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe4.0 x16.
 
 test case: iperf TX BW single CPU, affinity of app and IRQ are the same.
 PF only: no SFs on the system, 56 IRQs.
 SF (before), 250 SFs Sharing the same 56 IRQs .
 SF (now),    250 SFs + 255 avaiable IRQs for the NIC. (please see IRQ spread scheme below).
 
 	    application SF-IRQ  channel   BW(Gb/sec)         interrupts/sec
             iperf TX            affinity
 PF only     cpu={0}     cpu={0} cpu={0}   79                 8200
 SF (before) cpu={0}     cpu={0} cpu={0}   51.3 (-35%)        9500
 SF (now)    cpu={0}     cpu={0} cpu={0}   78 (-2%)           8200
 
 command:
 $ taskset -c 0 iperf -c 11.1.1.1 -P 3 -i 6 -t 30 | grep SUM
 
 The different between the SF examples is that before this series we
 allocate num_cpus (56) IRQs, and all of them were shared among the PF
 and the SFs. And after this series, we allocate 255 IRQs, and we spread
 the SFs among the above IRQs. This have significantly decreased the load
 on each IRQ and the number of EQs per IRQ is down by 95% (251->11).
 
 In this patchset the solution proposed is to have a dedicated IRQ pool
 for SFs to use. the pool will allocate a large number of IRQs
 for SFs to grab from in order to minimize irq sharing between the
 different SFs.
 IRQs will not be requested from the OS until they are 1st requested by
 an SF consumer, and will be eventually released when the last SF consumer
 releases them.
 
 For the detailed IRQ spread and allocation scheme  please see last patch:
 ("net/mlx5: Round-Robin EQs over IRQs")
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmDIJUkACgkQSD+KveBX
 +j7tgQf+KtxzniuEY+JgbGWWyQvglx88S6WfhTOhZZllm2QXa2wWX24mz/AdYc0x
 QCT6yUzvaeaHPNpw/KwCw1IKpB9dlT+wIBD9NCEqtHqj+bVz+ioL/OlM5VJj+wC2
 kp+EjYsQbwgZIM40JgLLu2uzLy/5w7a1v9Rj0l4mLRZqPmrqeKrIAsVkVutaxtPg
 PtECBag4XtYERMXOfKohnXanwjW6ZyYQ0Yal76jNqoXXgy5dHr/JJDZQZTDURt7S
 3ex0gwTZwHfOLFQdRzD+U0kuC2/6sHMfeVrKO6QxuG/gihYe8FXEQ4qVSJmgXANP
 VH6n1Vk5IhaMzYKfGFb2OGOWanAVIA==
 =z0x7
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2021-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-06-14

1) Trivial Lag refactroing in preparation for upcomming Single FDB lag feature
 - First 3 patches

2) Scalable IRQ distriburion for Sub-functions

A subfunction (SF) is a lightweight function that has a parent PCI
function (PF) on which it is deployed.

Currently, mlx5 subfunction is sharing the IRQs (MSI-X) with their
parent PCI function.

Before this series the PF allocates enough IRQs to cover
all the cores in a system, Newly created SFs will re-use all the IRQs
that the PF has allocated for itself.
Hence, the more SFs are created, there are more EQs per IRQs. Therefore,
whenever we handle an interrupt, we need to pull all SFs EQs and PF EQs
instead of PF EQs without SFs on the system. This leads to a hard impact
on the performance of SFs and PF.

For example, on machine with:
Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz with 56 cores.
PCI Express 3 with BW of 126 Gb/s.
ConnectX-5 Ex; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe4.0 x16.

test case: iperf TX BW single CPU, affinity of app and IRQ are the same.
PF only: no SFs on the system, 56 IRQs.
SF (before), 250 SFs Sharing the same 56 IRQs .
SF (now),    250 SFs + 255 avaiable IRQs for the NIC. (please see IRQ spread scheme below).

	    application SF-IRQ  channel   BW(Gb/sec)         interrupts/sec
            iperf TX            affinity
PF only     cpu={0}     cpu={0} cpu={0}   79                 8200
SF (before) cpu={0}     cpu={0} cpu={0}   51.3 (-35%)        9500
SF (now)    cpu={0}     cpu={0} cpu={0}   78 (-2%)           8200

command:
$ taskset -c 0 iperf -c 11.1.1.1 -P 3 -i 6 -t 30 | grep SUM

The different between the SF examples is that before this series we
allocate num_cpus (56) IRQs, and all of them were shared among the PF
and the SFs. And after this series, we allocate 255 IRQs, and we spread
the SFs among the above IRQs. This have significantly decreased the load
on each IRQ and the number of EQs per IRQ is down by 95% (251->11).

In this patchset the solution proposed is to have a dedicated IRQ pool
for SFs to use. the pool will allocate a large number of IRQs
for SFs to grab from in order to minimize irq sharing between the
different SFs.
IRQs will not be requested from the OS until they are 1st requested by
an SF consumer, and will be eventually released when the last SF consumer
releases them.

For the detailed IRQ spread and allocation scheme  please see last patch:
("net/mlx5: Round-Robin EQs over IRQs")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:14:21 -07:00
David S. Miller
08ab4d7441 Merge branch 'occteontx2-rate-limit-offload'
Subbaraya Sundeep says:

====================
octeontx2: Add ingress ratelimit offload

This patchset adds ingress rate limiting hardware
offload support for CN10K silicons. Police actions
are added for TC matchall and flower filters.
CN10K has ingress rate limiting feature where
a receive queue is mapped to bandwidth profile
and the profile is configured with rate and burst
parameters by software. CN10K hardware supports
three levels of ingress policing or ratelimiting.
Multiple leaf profiles can  point to a single mid
level profile and multiple mid level profile can
point to a single top level one. Only leaf level
profiles are used for configuring rate limiting.

Patch 1 adds the new bandwidth profile contexts
in AF driver similar to other hardware contexts
Patch 2 adds the debugfs changes to dump bandwidth
profile contexts
Patch 3 adds support for police action with TC matchall filter
Patch 4 uses NL_SET_ERR_MSG_MOD for tc code
Patch 5 adds support for police action with TC flower filter
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:11:05 -07:00
Subbaraya Sundeep
68fbff68db octeontx2-pf: Add police action for TC flower
Added police action for ingress TC flower
hardware offload. With this rate limiting can be
done per flow. Since rate limiting is tied to
RQs in hardware the number of TC flower filters
with action as police is limited to number
of receive queues of the interface. Both bps
and pps modes are supported.

Examples to rate limit a flow:
$ ethtool -K eth0 hw-tc-offload on
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 parent ffff: protocol ip \
  flower ip_proto udp dst_port 80 action \
  police rate 100Mbit burst 32Kbit

$ tc filter add dev eth0 parent ffff: \
  protocol ip flower dst_mac 5e:b2:34:ee:29:49 \
  action police pkts_rate 5000 pkts_burst 2048

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:11:05 -07:00
Subbaraya Sundeep
5d2fdd86d5 octeontx2-pf: Use NL_SET_ERR_MSG_MOD for TC
This patch modifies all netdev_err messages in
tc code to NL_SET_ERR_MSG_MOD. NL_SET_ERR_MSG_MOD
does not support format specifiers yet hence
netdev_err messages with only strings are modified.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:11:05 -07:00
Sunil Goutham
2ca89a2c37 octeontx2-pf: TC_MATCHALL ingress ratelimiting offload
Add TC_MATCHALL ingress ratelimiting offload support with POLICE
action for entire traffic coming into the interface.

Eg: To ratelimit ingress traffic to 100Mbps

$ ethtool -K eth0 hw-tc-offload on
$ tc qdisc add dev eth0 clsact
$ tc filter add dev eth0 ingress matchall skip_sw \
                action police rate 100Mbit burst 32Kbit

To support this, a leaf level bandwidth profile is allocated and all
RQs' contexts used by this interface are updated to point to it.
And the leaf level bandwidth profile is configured with user specified
rate and burst sizes.

Co-developed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:11:05 -07:00
Sunil Goutham
e7d8971763 octeontx2-af: cn10k: Debugfs support for bandwidth profiles
Added support for dumping current resource status of bandwidth
profiles and contexts of allocated profiles via debugfs.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:11:05 -07:00
Sunil Goutham
e8e095b3b3 octeontx2-af: cn10k: Bandwidth profiles config support
CN10K silicons supports hierarchial ingress packet ratelimiting.
There are 3 levels of profilers supported leaf, mid and top.
Ratelimiting is done after packet forwarding decision is taken
and a NIXLF's RQ is identified to DMA the packet. RQ's context
points to a leaf bandwidth profile which can be configured
to achieve desired ratelimit.

This patch adds logic for management of these bandwidth profiles
ie profile alloc, free, context update etc.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:11:05 -07:00
David S. Miller
ad5645d7b9 Merge branch 'pci200syn-cleanups'
Peng Li says:

====================
net: pci200syn: clean up some code style issues

This patchset clean up some code style issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:03:17 -07:00
Peng Li
6855d301e9 net: pci200syn: fix the comments style issue
Networking block comments don't use an empty /* line,
use /* Comment...

This patch fixes the comments style issues.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:03:17 -07:00
Peng Li
8e7680c102 net: pci200syn: add necessary () to macro argument
Macro argument 'card' may be better as '(card)' to
avoid precedence issues.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:03:17 -07:00
Peng Li
2b63744668 net: pci200syn: add some required spaces
Add spaces required after that close brace '}'.
Add spaces required before the open parenthesis '('.
Add spaces required after that ','.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:03:17 -07:00
Peng Li
b9282333ef net: pci200syn: replace comparison to NULL with "!card"
According to the chackpatch.pl, comparison to NULL could
be written "!card".

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-15 11:03:17 -07:00