Commit Graph

996699 Commits

Author SHA1 Message Date
Geliang Tang
6445e17af7 mptcp: add rm_list in mptcp_out_options
This patch defined a new struct mptcp_rm_list, the ids field was an
array of the removing address ids, the nr field was the valid number of
removing address ids in the array. The array size was definced as a new
macro MPTCP_RM_IDS_MAX. Changed the member rm_id of struct
mptcp_out_options to rm_list.

In mptcp_established_options_rm_addr, invoked mptcp_pm_rm_addr_signal to
get the rm_list. According the number of addresses in it, calculated
the padded RM_ADDR suboption length. And saved the ids array in struct
mptcp_out_options's rm_list member.

In mptcp_write_options, iterated each address id from struct
mptcp_out_options's rm_list member, set the invalid ones as TCPOPT_NOP,
then filled them into the RM_ADDR suboption.

Changed TCPOLEN_MPTCP_RM_ADDR_BASE from 4 to 3.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:47:45 -08:00
David S. Miller
e9e90a70cc Merge branch 'resil-nhgroups-netdevsim-selftests'
Petr Machata says:

====================
net: Resilient NH groups: netdevsim, selftests

Support for resilient next-hop groups was added in a previous patch set.
Resilient next hop groups add a layer of indirection between the SKB hash
and the next hop. Thus the hash is used to reference a hash table bucket,
which is then used to reference a particular next hop. This allows the
system more flexibility when assigning SKB hash space to next hops.
Previously, each next hop had to be assigned a continuous range of SKB hash
space. With a hash table as an intermediate layer, it is possible to
reassign next hops with a hash table bucket granularity. In turn, this
mends issues with traffic flow redirection resulting from next hop removal
or adjustments in next-hop weights.

This patch set introduces mock offloading of resilient next hop groups by
the netdevsim driver, and a suite of selftests.

- Patch #1 adds a netdevsim-specific lock to protect next-hop hashtable.
  Previously, netdevsim relied on RTNL to maintain mutual exclusion.
  Patch #2 extracts a helper to make the following patches clearer.

- Patch #3 implements the support for offloading of resilient next-hop
  groups.

- Patch #4 introduces a new debugfs interface to set activity on a selected
  next-hop bucket. This simulates how HW can periodically report bucket
  activity, and buckets thus marked are expected to be exempt from
  migration to new next hops when the group changes.

- Patches #5 and #6 clean up the fib_nexthop selftests.

- Patches #7, #8 and #9 add tests for resilient next hop groups. Patch #7
  adds resilient-hashing counterparts to fib_nexthops.sh. Patch #8 adds a
  new traffic test for resilient next-hop groups. Patch #9 adds a new
  traffic test for tunneling.

- Patch #10 actually leverages the netdevsim offload to implement a suite
  of algorithmic tests that verify how and when buckets are migrated under
  various simulated workload scenarios.

The overall plan is to contribute approximately the following patchsets:

1) Nexthop policy refactoring (already pushed)
2) Preparations for resilient next hop groups (already pushed)
3) Implementation of resilient next hop group (already pushed)
4) Netdevsim offload plus a suite of selftests (this patchset)
5) Preparations for mlxsw offload of resilient next-hop groups
6) mlxsw offload including selftests

Interested parties can look at the complete code at [2].

[1] https://tools.ietf.org/html/rfc2992
[2] https://github.com/idosch/linux/commits/submit/res_integ_v1
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:44:10 -08:00
Ido Schimmel
b8a07c4cea selftests: netdevsim: Add test for resilient nexthop groups offload API
Test various aspects of the resilient nexthop group offload API on top
of the netdevsim implementation. Both good and bad flows are tested.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Co-developed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:44:10 -08:00
Ido Schimmel
902280cacc selftests: forwarding: Add resilient multipath tunneling nexthop test
Add a resilient nexthop objects version of gre_multipath_nh.sh. Test
that both IPv4 and IPv6 overlays work with resilient nexthop groups
where the nexthops are two GRE tunnels.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:44:10 -08:00
Ido Schimmel
386e3792b5 selftests: forwarding: Add resilient hashing test
Verify that IPv4 and IPv6 multipath forwarding works correctly with
resilient nexthop groups and with different weights.

Test that when the idle timer is not zero, the resilient groups are not
rebalanced - because the nexthop buckets are considered active - and the
initial weights (1:1) are used.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:44:10 -08:00
Ido Schimmel
557205f47d selftests: fib_nexthops: Test resilient nexthop groups
Add test cases for resilient nexthop groups. Exhaustive forwarding tests
are added separately under net/forwarding/.

Examples:

 # ./fib_nexthops.sh -t basic_res

Basic resilient nexthop group functional tests
----------------------------------------------
TEST: Add a nexthop group with default parameters                   [ OK ]
TEST: Get a nexthop group with default parameters                   [ OK ]
TEST: Get a nexthop group with non-default parameters               [ OK ]
TEST: Add a nexthop group with 0 buckets                            [ OK ]
TEST: Replace nexthop group parameters                              [ OK ]
TEST: Get a nexthop group after replacing parameters                [ OK ]
TEST: Replace idle timer                                            [ OK ]
TEST: Get a nexthop group after replacing idle timer                [ OK ]
TEST: Replace unbalanced timer                                      [ OK ]
TEST: Get a nexthop group after replacing unbalanced timer          [ OK ]
TEST: Replace with no parameters                                    [ OK ]
TEST: Get a nexthop group after replacing no parameters             [ OK ]
TEST: Replace nexthop group type - implicit                         [ OK ]
TEST: Replace nexthop group type - explicit                         [ OK ]
TEST: Replace number of nexthop buckets                             [ OK ]
TEST: Get a nexthop group after replacing with invalid parameters   [ OK ]
TEST: Dump all nexthop buckets                                      [ OK ]
TEST: Dump all nexthop buckets in a group                           [ OK ]
TEST: Dump all nexthop buckets with a specific nexthop device       [ OK ]
TEST: Dump all nexthop buckets with a specific nexthop identifier   [ OK ]
TEST: Dump all nexthop buckets in a non-existent group              [ OK ]
TEST: Dump all nexthop buckets in a non-resilient group             [ OK ]
TEST: Dump all nexthop buckets using a non-existent device          [ OK ]
TEST: Dump all nexthop buckets with invalid 'groups' keyword        [ OK ]
TEST: Dump all nexthop buckets with invalid 'fdb' keyword           [ OK ]
TEST: Get a valid nexthop bucket                                    [ OK ]
TEST: Get a nexthop bucket with valid group, but invalid index      [ OK ]
TEST: Get a nexthop bucket from a non-resilient group               [ OK ]
TEST: Get a nexthop bucket from a non-existent group                [ OK ]

Tests passed:  29
Tests failed:   0

 # ./fib_nexthops.sh -t ipv4_large_res_grp

IPv4 large resilient group (128k buckets)
-----------------------------------------
TEST: Dump large (x131072) nexthop buckets                          [ OK ]

Tests passed:   1
Tests failed:   0

 # ./fib_nexthops.sh -t ipv6_large_res_grp

IPv6 large resilient group (128k buckets)
-----------------------------------------
TEST: Dump large (x131072) nexthop buckets                          [ OK ]

Tests passed:   1
Tests failed:   0

 # ./fib_nexthops.sh -t ipv4_res_torture

IPv4 runtime resilient nexthop group torture
--------------------------------------------
TEST: IPv4 resilient nexthop group torture test                     [ OK ]

Tests passed:   1
Tests failed:   0

 # ./fib_nexthops.sh -t ipv6_res_torture

IPv6 runtime resilient nexthop group torture
--------------------------------------------
TEST: IPv6 resilient nexthop group torture test                     [ OK ]

Tests passed:   1
Tests failed:   0

 # ./fib_nexthops.sh -t ipv4_res_grp_fcnal

IPv4 resilient groups functional
--------------------------------
TEST: Nexthop group updated when entry is deleted                   [ OK ]
TEST: Nexthop buckets updated when entry is deleted                 [ OK ]
TEST: Nexthop group updated after replace                           [ OK ]
TEST: Nexthop buckets updated after replace                         [ OK ]
TEST: Nexthop group updated when entry is deleted - nECMP           [ OK ]
TEST: Nexthop buckets updated when entry is deleted - nECMP         [ OK ]
TEST: Nexthop group updated after replace - nECMP                   [ OK ]
TEST: Nexthop buckets updated after replace - nECMP                 [ OK ]

Tests passed:   8
Tests failed:   0

 # ./fib_nexthops.sh -t ipv6_res_grp_fcnal

IPv6 resilient groups functional
--------------------------------
TEST: Nexthop group updated when entry is deleted                   [ OK ]
TEST: Nexthop buckets updated when entry is deleted                 [ OK ]
TEST: Nexthop group updated after replace                           [ OK ]
TEST: Nexthop buckets updated after replace                         [ OK ]
TEST: Nexthop group updated when entry is deleted - nECMP           [ OK ]
TEST: Nexthop buckets updated when entry is deleted - nECMP         [ OK ]
TEST: Nexthop group updated after replace - nECMP                   [ OK ]
TEST: Nexthop buckets updated after replace - nECMP                 [ OK ]

Tests passed:   8
Tests failed:   0

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Co-developed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:44:10 -08:00
Ido Schimmel
a8f9952d21 selftests: fib_nexthops: List each test case in a different line
The lines with the IPv4 and IPv6 test cases are already very long and
more test cases will be added in subsequent patches.

List each test case in a different line to make it easier to extend the
test with more test cases.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:44:10 -08:00
Ido Schimmel
8e815284a5 selftests: fib_nexthops: Declutter test output
Before:

 # ./fib_nexthops.sh -t ipv4_torture

IPv4 runtime torture
--------------------
TEST: IPv4 torture test                                             [ OK ]
./fib_nexthops.sh: line 213: 19376 Killed                  ipv4_del_add_loop1
./fib_nexthops.sh: line 213: 19377 Killed                  ipv4_grp_replace_loop
./fib_nexthops.sh: line 213: 19378 Killed                  ip netns exec me ping -f 172.16.101.1 > /dev/null 2>&1
./fib_nexthops.sh: line 213: 19380 Killed                  ip netns exec me ping -f 172.16.101.2 > /dev/null 2>&1
./fib_nexthops.sh: line 213: 19381 Killed                  ip netns exec me mausezahn veth1 -B 172.16.101.2 -A 172.16.1.1 -c 0 -t tcp "dp=1-1023, flags=syn" > /dev/null 2>&1

Tests passed:   1
Tests failed:   0

 # ./fib_nexthops.sh -t ipv6_torture

IPv6 runtime torture
--------------------
TEST: IPv6 torture test                                             [ OK ]
./fib_nexthops.sh: line 213: 24453 Killed                  ipv6_del_add_loop1
./fib_nexthops.sh: line 213: 24454 Killed                  ipv6_grp_replace_loop
./fib_nexthops.sh: line 213: 24456 Killed                  ip netns exec me ping -f 2001:db8:101::1 > /dev/null 2>&1
./fib_nexthops.sh: line 213: 24457 Killed                  ip netns exec me ping -f 2001:db8:101::2 > /dev/null 2>&1
./fib_nexthops.sh: line 213: 24458 Killed                  ip netns exec me mausezahn -6 veth1 -B 2001:db8:101::2 -A 2001:db8:91::1 -c 0 -t tcp "dp=1-1023, flags=syn" > /dev/null 2>&1

Tests passed:   1
Tests failed:   0

After:

 # ./fib_nexthops.sh -t ipv4_torture

IPv4 runtime torture
--------------------
TEST: IPv4 torture test                                             [ OK ]

Tests passed:   1
Tests failed:   0

 # ./fib_nexthops.sh -t ipv6_torture

IPv6 runtime torture
--------------------
TEST: IPv6 torture test                                             [ OK ]

Tests passed:   1
Tests failed:   0

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:44:10 -08:00
Ido Schimmel
c6385c0b67 netdevsim: Allow reporting activity on nexthop buckets
A key component of the resilient hashing algorithm is the hash buckets'
activity. If a bucket is active, it will not be populated with a new
nexthop in order not to break existing flows. Therefore, in order to
easily and thoroughly test the algorithm, we need to be in full control
over the reported activity.

Add a debugfs interface that allows user space to have netdevsim report
a nexthop bucket within a resilient nexthop group as active. For
example:

 # echo 10 23 > /sys/kernel/debug/netdevsim/netdevsim10/fib/nexthop_bucket_activity

Will mark bucket 23 in nexthop group 10 as active.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:44:10 -08:00
Ido Schimmel
d8eaa4faca netdevsim: Add support for resilient nexthop groups
Allow resilient nexthop groups to be programmed and account their
occupancy according to their number of buckets. The nexthop group itself
as well as its buckets are marked with hardware flags (i.e.,
'RTNH_F_TRAP').

Replacement of a single nexthop bucket can fail using the following
debugfs knob:

 # cat /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_nexthop_bucket_replace
 N
 # echo 1 > /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_nexthop_bucket_replace
 # cat /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_nexthop_bucket_replace
 Y

Replacement of a resilient nexthop group can fail using the following
debugfs knob:

 # cat /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_res_nexthop_group_replace
 N
 # echo 1 > /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_res_nexthop_group_replace
 # cat /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_res_nexthop_group_replace
 Y

This enables testing of various error paths.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:44:10 -08:00
Ido Schimmel
40ff83711f netdevsim: Create a helper for setting nexthop hardware flags
Instead of calling nexthop_set_hw_flags(), call a helper. It will be
used to also set nexthop bucket flags in a subsequent patch.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:44:10 -08:00
Petr Machata
86927c9c4d netdevsim: fib: Introduce a lock to guard nexthop hashtable
Currently netdevsim relies on RTNL to maintain exclusivity in accessing the
nexthop hash table. However, bucket notification may be called without RTNL
having been held. Instead, introduce a custom lock to guard the table.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:44:10 -08:00
David S. Miller
b202923d3a Merge branch 'ptp-warnings'
Lee Jones says:

====================
Rid W=1 warnings from PTP

This set is part of a larger effort attempting to clean-up W=1
kernel builds, which are currently overwhelmingly riddled with
niggly little warnings.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:09:34 -08:00
Lee Jones
287f93ded6 ptp: ptp_p: Demote non-conformant kernel-doc headers and supply a param description
Fixes the following W=1 kernel build warning(s):

 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'control' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'event' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'addend' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'accum' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'test' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'ts_compare' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'rsystime_lo' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'rsystime_hi' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'systime_lo' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'systime_hi' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'trgt_lo' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'trgt_hi' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'asms_lo' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'asms_hi' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'amms_lo' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'amms_hi' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'ch_control' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'ch_event' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'tx_snap_lo' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'tx_snap_hi' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'rx_snap_lo' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'rx_snap_hi' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'src_uuid_lo' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'src_uuid_hi' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'can_status' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'can_snap_lo' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'can_snap_hi' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'ts_sel' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'ts_st' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'reserve1' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'stl_max_set_en' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'stl_max_set' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'reserve2' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'srst' not described in 'pch_ts_regs'
 drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'regs' not described in 'pch_dev'
 drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'ptp_clock' not described in 'pch_dev'
 drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'caps' not described in 'pch_dev'
 drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'exts0_enabled' not described in 'pch_dev'
 drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'exts1_enabled' not described in 'pch_dev'
 drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'mem_base' not described in 'pch_dev'
 drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'mem_size' not described in 'pch_dev'
 drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'irq' not described in 'pch_dev'
 drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'pdev' not described in 'pch_dev'
 drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'register_lock' not described in 'pch_dev'
 drivers/ptp/ptp_pch.c:128: warning: Function parameter or member 'station' not described in 'pch_params'
 drivers/ptp/ptp_pch.c:291: warning: Function parameter or member 'pdev' not described in 'pch_set_station_address'

Cc: Richard Cochran <richardcochran@gmail.com>
Cc: LAPIS SEMICONDUCTOR <tshimizu818@gmail.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:09:34 -08:00
Lee Jones
9ec04c71ab ptp: ptp_clockmatrix: Demote non-kernel-doc header to standard comment
Fixes the following W=1 kernel build warning(s):

 drivers/ptp/ptp_clockmatrix.c:1408: warning: Cannot understand  * @brief Maximum absolute value for write phase offset in picoseconds
 drivers/ptp/ptp_clockmatrix.c:1408: warning: Cannot understand  * @brief Maximum absolute value for write phase offset in picoseconds
 drivers/ptp/ptp_clockmatrix.c:1408: warning: Cannot understand  * @brief Maximum absolute value for write phase offset in picoseconds
 drivers/ptp/ptp_clockmatrix.c:1408: warning: Cannot understand  * @brief Maximum absolute value for write phase offset in picoseconds
 drivers/ptp/ptp_clockmatrix.c:1408: warning: Cannot understand  * @brief Maximum absolute value for write phase offset in picoseconds

Cc: Richard Cochran <richardcochran@gmail.com>
Cc: IDT-support-1588@lm.renesas.com
Cc: netdev@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:09:34 -08:00
Lee Jones
f90fc37f28 ptp_pch: Move 'pch_*()' prototypes to shared header
Fixes the following W=1 kernel build warning(s):

 drivers/ptp/ptp_pch.c:193:6: warning: no previous prototype for ‘pch_ch_control_write’ [-Wmissing-prototypes]
 drivers/ptp/ptp_pch.c:201:5: warning: no previous prototype for ‘pch_ch_event_read’ [-Wmissing-prototypes]
 drivers/ptp/ptp_pch.c:212:6: warning: no previous prototype for ‘pch_ch_event_write’ [-Wmissing-prototypes]
 drivers/ptp/ptp_pch.c:220:5: warning: no previous prototype for ‘pch_src_uuid_lo_read’ [-Wmissing-prototypes]
 drivers/ptp/ptp_pch.c:231:5: warning: no previous prototype for ‘pch_src_uuid_hi_read’ [-Wmissing-prototypes]
 drivers/ptp/ptp_pch.c:242:5: warning: no previous prototype for ‘pch_rx_snap_read’ [-Wmissing-prototypes]
 drivers/ptp/ptp_pch.c:259:5: warning: no previous prototype for ‘pch_tx_snap_read’ [-Wmissing-prototypes]
 drivers/ptp/ptp_pch.c:300:5: warning: no previous prototype for ‘pch_set_station_address’ [-Wmissing-prototypes]

Cc: Richard Cochran <richardcochran@gmail.com> (maintainer:PTP HARDWARE CLOCK SUPPORT)
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Flavio Suligoi <f.suligoi@asem.it>
Cc: netdev@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:09:34 -08:00
Lee Jones
257382c54e ptp_pch: Remove unused function 'pch_ch_control_read()'
Fixes the following W=1 kernel build warning(s):

 drivers/ptp/ptp_pch.c:182:5: warning: no previous prototype for ‘pch_ch_control_read’ [-Wmissing-prototypes]

Cc: Richard Cochran <richardcochran@gmail.com> (maintainer:PTP HARDWARE CLOCK SUPPORT)
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Flavio Suligoi <f.suligoi@asem.it>
Cc: netdev@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:09:34 -08:00
Rafał Miłecki
a9349f08ec net: dsa: bcm_sf2: setup BCM4908 internal crossbar
On some SoCs (e.g. BCM4908, BCM631[345]8) SF2 has an integrated
crossbar. It allows connecting its selected external ports to internal
ports. It's used by vendors to handle custom Ethernet setups.

BCM4908 has following 3x2 crossbar. On Asus GT-AC5300 rgmii is used for
connecting external BCM53134S switch. GPHY4 is usually used for WAN
port. More fancy devices use SerDes for 2.5 Gbps Ethernet.

              ┌──────────┐
SerDes ─── 0 ─┤          │
              │   3x2    ├─ 0 ─── switch port 7
 GPHY4 ─── 1 ─┤          │
              │ crossbar ├─ 1 ─── runner (accelerator)
 rgmii ─── 2 ─┤          │
              └──────────┘

Use setup data based on DT info to configure BCM4908's switch port 7.
Right now only GPHY and rgmii variants are supported. Handling SerDes
can be implemented later.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:06:37 -08:00
Rafał Miłecki
01488a0ccd net: dsa: bcm_sf2: store PHY interface/mode in port structure
It's needed later for proper switch / crossbar setup.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:06:37 -08:00
Shubhankar Kuranagatti
6ad086009f net: ipv4: route.c: Fix indentation of multi line comment.
All comment lines inside the comment block have been aligned.
Every line of comment starts with a * (uniformity in code).

Signed-off-by: Shubhankar Kuranagatti <shubhankarvk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:02:30 -08:00
Rafał Miłecki
12bb508bfe net: broadcom: bcm4908_enet: support TX interrupt
It appears that each DMA channel has its own interrupt and both rings
can be configured (the same way) to handle interrupts.

1. Make ring interrupts code generic (make it operate on given ring)
2. Move napi to ring (so each has its own)
3. Make IRQ handler generic (match ring against received IRQ number)
4. Add (optional) support for TX interrupt

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 16:48:38 -08:00
Rafał Miłecki
ab4dda7a8c dt-bindings: net: bcm4908-enet: add optional TX interrupt
I discovered that hardware actually supports two interrupts, one per DMA
channel (RX and TX).

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 16:48:38 -08:00
David S. Miller
26d2e0426a Merge branch 'macb-fixed-link-fixes'
Robert Hancock says:

====================
macb SGMII fixed-link fixes

Some fixes to the macb driver for use in SGMII mode with a fixed-link (such as
for chip-to-chip connectivity).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 16:44:46 -08:00
Robert Hancock
e276e5e40e net: macb: Disable PCS auto-negotiation for SGMII fixed-link mode
When using a fixed-link configuration in SGMII mode, it's not really
sensible to have auto-negotiation enabled since the link settings are
fixed by definition. In other configurations, such as an SGMII
connection to a PHY, it should generally be enabled.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 16:44:45 -08:00
Robert Hancock
8fab174b78 net: macb: poll for fixed link state in SGMII mode
When using a fixed-link configuration with GEM in SGMII mode, such as
for a chip-to-chip interconnect, the link state was always showing as
established regardless of the actual connectivity state. We can monitor
the pcs_link_state bit in the Network Status register to determine
whether the PCS link state is actually up.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 16:44:45 -08:00
David S. Miller
c232f81b0a mlx5-updates-2021-03-12
1) TC support for ICMP parameters
 2) TC connection tracking with mirroring
 3) A round of trivial fixups and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmBL+V4ACgkQSD+KveBX
 +j68tgf+Lzq6QkpGl4dpg87pjuKIwyykGVJ0NW/Yig+bNe9V3pbFCv785xymD16N
 ALcLKNtxUovE6p+Gy6JgzgBN7V9ZsM6bn564487ycYIqewz9c2KIQneJdGEUUDgi
 S8NkuCaLQst36Wl8gdiygApZmw8TAafOQYYzKOZ7HdZ+aRH+fIwbVABYreAYwjGa
 2iz5yu1AEzG5/10bTEx69rQHY+309EPy4N4na/jo/tXRL3fWQSQz6hzEnoi3EjAU
 YsktBB72eBX/NogqLIHpSzzjPgm5IFRQbNSHyL71TH9KEr0KVpzJd9VCbaic6goL
 gQn0TqoE74X7r4ZPWP3AHcGOL9L5Rg==
 =L2PO
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2021-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-03-12

1) TC support for ICMP parameters
2) TC connection tracking with mirroring
3) A round of trivial fixups and cleanups
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 16:36:07 -08:00
Maor Dickman
a3222a2da0 net/mlx5e: Allow to match on ICMP parameters
Support matching on ICMPv4/6 type and code parameters using misc3
section of match parameters.

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:34 -08:00
Paul Blakey
69e2916ebc net/mlx5: CT: Add support for mirroring
Add support for mirroring before the CT action by spliting the pre ct rule.
Mirror outputs are done first on the tc chain,prio table rule (the fwd
rule), which will then forward to a per port fwd table.
On this fwd table, we insert the original pre ct rule that forwards to
ct/ct nat table.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:33 -08:00
Alaa Hleihel
287e0df021 net/mlx5: Display the command index in command mailbox dump
Multiple commands can be printed at the same time which can
lead to wrong order of their lines in dmesg output.
As a result, it's hard to match data dumps to the correct command
or which command was fully dumped at some point.

Fix this by displaying the corresponding command index, and also
indicate when a command was fully dumped.

Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:33 -08:00
Arnd Bergmann
2119bda642 net/mlx5e: allocate 'indirection_rqt' buffer dynamically
Increasing the size of the indirection_rqt array from 128 to 256 bytes
pushed the stack usage of the mlx5e_hairpin_fill_rqt_rqns() function
over the warning limit when building with clang and CONFIG_KASAN:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:970:1: error: stack frame size of 1180 bytes in function 'mlx5e_tc_add_nic_flow' [-Werror,-Wframe-larger-than=]

Using dynamic allocation here is safe because the caller does the
same, and it reduces the stack usage of the function to just a few
bytes.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:32 -08:00
Tariq Toukan
e16cf9d754 net/mlx5e: Dump ICOSQ WQE descriptor on CQE with error events
Dump the ICOSQ's WQE descriptor when a completion with error is received.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:32 -08:00
Maxim Mikityanskiy
991b265460 net/mlx5e: Use net_prefetchw instead of prefetchw in MPWQE TX datapath
Commit e20f0dbf20 ("net/mlx5e: RX, Add a prefetch command for small
L1_CACHE_BYTES") switched to using net_prefetchw at all places in mlx5e.
In the same time frame, commit 5af75c747e ("net/mlx5e: Enhanced TX
MPWQE for SKBs") added one more usage of prefetchw. When these two
changes were merged, this new occurrence of prefetchw wasn't replaced
with net_prefetchw.

This commit fixes this last occurrence of prefetchw in
mlx5e_tx_mpwqe_session_start, making the same change that was done in
mlx5e_xdp_mpwqe_session_start.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:31 -08:00
Roi Dayan
bca08a9145 net/mlx5e: Remove redundant newline in NL_SET_ERR_MSG_MOD
Fix the following coccicheck warnings:

drivers/net/ethernet/mellanox/mlx5/core/devlink.c:145:29-66: WARNING
avoid newline at end of message in NL_SET_ERR_MSG_MOD
drivers/net/ethernet/mellanox/mlx5/core/devlink.c:140:29-77: WARNING
avoid newline at end of message in NL_SET_ERR_MSG_MOD

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:31 -08:00
Mark Zhang
093bd76469 net/mlx5: Read congestion counters from all ports when lag is active
Read congestion counters from all ports in any lag mode rather than
only in RoCE lag mode (e.g., VF lag).

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:31 -08:00
Jiapeng Chong
7976092241 net/mlx5: remove unneeded semicolon
Fix the following coccicheck warnings:

./drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c:495:2-3: Unneeded
semicolon.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:30 -08:00
Junlin Yang
ad2c99ca75 net/mlx5: use kvfree() for memory allocated with kvzalloc()
It is allocated with kvzalloc(), the corresponding release function
should not be kfree(), use kvfree() instead.

Generated by: scripts/coccinelle/api/kfree_mismatch.cocci

Signed-off-by: Junlin Yang <yangjunlin@yulong.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:30 -08:00
Yevgeny Kliteynik
cc82a2e6c8 net/mlx5: DR, Add missing vhca_id consume from STEv1
The field source_eswitch_owner_vhca_id was not consumed
in the same way as in STEv0. Added the missing set.

Fixes: 10b6941864 ("net/mlx5: DR, Add HW STEv1 match logic")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:30 -08:00
Yevgeny Kliteynik
1412477882 net/mlx5: DR, Remove unneeded rx_decap_l3 function for STEv1
Remove the dr_ste_v1_set_rx_decap_l3 function that was
replaced by another function - fixing a rebase error.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:29 -08:00
Yevgeny Kliteynik
0142f09764 net/mlx5: DR, Fixed typo in STE v0
"reforamt" -> "reformat"

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:29 -08:00
Jonathan Neuschäfer
bfdfe7fc1b docs: networking: phy: Improve placement of parenthesis
"either" is outside the parentheses, so the matching "or" should be too.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 12:29:11 -08:00
David S. Miller
5215206d8b Merge branch 'tcp-delayed-completions'
Eric Dumazet says:

====================
tcp: better deal with delayed TX completions

Jakub and Neil reported an increase of RTO timers whenever
TX completions are delayed a bit more (by increasing
NIC TX coalescing parameters)

While problems have been there forever, second patch might
introduce some regressions so I prefer not backport
them to stable releases before things settle.

Many thanks to FB team for their help and tests.

Few packetdrill tests need to be changed to reflect
the improvements brought by this series.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 18:35:31 -08:00
Eric Dumazet
ac3959fd0d tcp: remove obsolete check in __tcp_retransmit_skb()
TSQ provides a nice way to avoid bufferbloat on individual socket,
including retransmit packets. We can get rid of the old
heuristic:

	/* Do not sent more than we queued. 1/4 is reserved for possible
	 * copying overhead: fragmentation, tunneling, mangling etc.
	 */
	if (refcount_read(&sk->sk_wmem_alloc) >
	    min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
		  sk->sk_sndbuf))
		return -EAGAIN;

This heuristic was giving false positives according to Jakub,
whenever TX completions are delayed above RTT. (Ack packets
are processed by TCP stack before clones are orphaned/freed)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 18:35:31 -08:00
Eric Dumazet
a7abf3cd76 tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack()
Jakub reported Data included in a Fastopen SYN that had to be
retransmit would have to wait for an RTO if TX completions are slow,
even with prior fix.

This is because tcp_rcv_fastopen_synack() does not use standard
rtx logic, meaning TSQ handler exits early in tcp_tsq_write()
because tp->lost_out == tp->retrans_out

Lets make tcp_rcv_fastopen_synack() use standard rtx logic,
by using tcp_mark_skb_lost() on the skb thats needs to be
sent again.

Not this raised a warning in tcp_fastretrans_alert() during my tests
since we consider the data not being aknowledged
by the receiver does not mean packet was lost on the network.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 18:35:31 -08:00
Eric Dumazet
f4dae54e48 tcp: plug skb_still_in_host_queue() to TSQ
Jakub and Neil reported an increase of RTO timers whenever
TX completions are delayed a bit more (by increasing
NIC TX coalescing parameters)

Main issue is that TCP stack has a logic preventing a packet
being retransmit if the prior clone has not yet been
orphaned or freed.

This logic came with commit 1f3279ae0c ("tcp: avoid
retransmits of TCP packets hanging in host queues")

Thankfully, in the case skb_still_in_host_queue() detects
the initial clone is still in flight, it can use TSQ logic
that will eventually retry later, at the moment the clone
is freed or orphaned.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Neil Spring <ntspring@fb.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 18:35:31 -08:00
Tong Zhang
8176f8c0f0 isdn: remove extra spaces in the header file
fix some coding style issues in the isdn header

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 18:23:55 -08:00
Hoang Huu Le
97bc84bbd4 tipc: clean up warnings detected by sparse
This patch fixes the following warning from sparse:

net/tipc/monitor.c:263:35: warning: incorrect type in assignment (different base types)
net/tipc/monitor.c:263:35:    expected unsigned int
net/tipc/monitor.c:263:35:    got restricted __be32 [usertype]
[...]
net/tipc/node.c:374:13: warning: context imbalance in 'tipc_node_read_lock' - wrong count at exit
net/tipc/node.c:379:13: warning: context imbalance in 'tipc_node_read_unlock' - unexpected unlock
net/tipc/node.c:384:13: warning: context imbalance in 'tipc_node_write_lock' - wrong count at exit
net/tipc/node.c:389:13: warning: context imbalance in 'tipc_node_write_unlock_fast' - unexpected unlock
net/tipc/node.c:404:17: warning: context imbalance in 'tipc_node_write_unlock' - unexpected unlock
[...]
net/tipc/crypto.c:1201:9: warning: incorrect type in initializer (different address spaces)
net/tipc/crypto.c:1201:9:    expected struct tipc_aead [noderef] __rcu *__tmp
net/tipc/crypto.c:1201:9:    got struct tipc_aead *
[...]

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 18:06:54 -08:00
Hoang Le
1980d37565 tipc: convert dest node's address to network order
(struct tipc_link_info)->dest is in network order (__be32), so we must
convert the value to network order before assigning. The problem detected
by sparse:

net/tipc/netlink_compat.c:699:24: warning: incorrect type in assignment (different base types)
net/tipc/netlink_compat.c:699:24:    expected restricted __be32 [usertype] dest
net/tipc/netlink_compat.c:699:24:    got int

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 18:06:54 -08:00
David S. Miller
1520929e26 Merge branch 'mlxsw-Implement-sampling-using-mirroring'
Ido Schimmel says:

====================
mlxsw: Implement sampling using mirroring

So far, sampling was implemented using a dedicated sampling mechanism
that is available on all Spectrum ASICs. Spectrum-2 and later ASICs
support sampling by mirroring packets to the CPU port with probability.
This method has a couple of advantages compared to the legacy method:

* Extra metadata per-packet: Egress port, egress traffic class, traffic
  class occupancy and end-to-end latency
* Ability to sample packets on egress / per-flow as opposed to only
  ingress

This series should not result in any user-visible changes and its aim is
to convert Spectrum-2 and later ASICs to perform sampling by mirroring
to the CPU port with probability. Future submissions will expose the
additional metadata and enable sampling using more triggers (e.g.,
egress).

Series overview:

Patches #1-#3 extend the SPAN (mirroring) module to accept new
parameters required for sampling. See individual commit messages for
detailed explanation.

Patch #4-#5 split sampling support between Spectrum-1 and later ASIC while
still using the legacy method for all ASIC generations.

Patch #6 converts Spectrum-2 and later ASICs to perform sampling by
mirroring to the CPU port with probability.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 16:22:39 -08:00
Ido Schimmel
cf31190ae0 mlxsw: spectrum_matchall: Implement sampling using mirroring
Spectrum-2 and later ASICs support sampling of packets by mirroring to
the CPU with probability. There are several advantages compared to the
legacy dedicated sampling mechanism:

* Extra metadata per-packet: Egress port, egress traffic class, traffic
  class occupancy and end-to-end latency
* Ability to sample packets on egress / per-flow

Convert Spectrum-2 and later ASICs to perform sampling by mirroring to
the CPU with probability.

Subsequent patches will add support for egress / per-flow sampling and
expose the extra metadata.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 16:22:39 -08:00
Ido Schimmel
34a277212c mlxsw: spectrum_trap: Split sampling traps between ASICs
Sampling of ingress packets is supported using a dedicated sampling
mechanism on all Spectrum ASICs. However, Spectrum-2 and later ASICs
support more sophisticated sampling by mirroring packets to the CPU.

As a preparation for more advanced sampling configurations, split the trap
configuration used for sampled packets between Spectrum-1 and later ASICs.

This is needed since packets that are mirrored to the CPU are trapped
via a different trap identifier compared to packets that are sampled
using the dedicated sampling mechanism.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 16:22:39 -08:00