2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2025-01-24 22:55:35 +08:00
Commit Graph

4972 Commits

Author SHA1 Message Date
David S. Miller
5f0d736e7f Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2019-04-28

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Introduce BPF socket local storage map so that BPF programs can store
   private data they associate with a socket (instead of e.g. separate hash
   table), from Martin.

2) Add support for bpftool to dump BTF types. This is done through a new
   `bpftool btf dump` sub-command, from Andrii.

3) Enable BPF-based flow dissector for skb-less eth_get_headlen() calls which
   was currently not supported since skb was used to lookup netns, from Stanislav.

4) Add an opt-in interface for tracepoints to expose a writable context
   for attached BPF programs, used here for NBD sockets, from Matt.

5) BPF xadd related arm64 JIT fixes and scalability improvements, from Daniel.

6) Change the skb->protocol for bpf_skb_adjust_room() helper in order to
   support tunnels such as sit. Add selftests as well, from Willem.

7) Various smaller misc fixes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28 08:42:41 -04:00
David S. Miller
8b44836583 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two easy cases of overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-25 23:52:29 -04:00
David Ahern
7973d9e767 mlxsw: spectrum_router: Prevent ipv6 gateway with v4 route via replace and append
mlxsw currently does not support v6 gateways with v4 routes. Commit
19a9d136f1 ("ipv4: Flag fib_info with a fib_nh using IPv6 gateway")
prevents a route from being added, but nothing stops the replace or
append. Add a catch for them too.
    $ ip  ro add 172.16.2.0/24 via 10.99.1.2
    $ ip  ro replace 172.16.2.0/24 via inet6 fe80::202:ff:fe00:b dev swp1s0
    Error: mlxsw_spectrum: IPv6 gateway with IPv4 route is not supported.
    $ ip  ro append 172.16.2.0/24 via inet6 fe80::202:ff:fe00:b dev swp1s0
    Error: mlxsw_spectrum: IPv6 gateway with IPv4 route is not supported.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-23 19:54:26 -07:00
David S. Miller
20eb08b2b0 mlx5-updates-2019-04-22
This series includes updates to mlx5e driver RX data path and some
 significant XDP RX/TX improvements to overcome/mitigate HW and PCIE
 bottlenecks.
 
 From Tariq:
 1) Some Enhancements in rq->flags
 2) Stabilize RX packet rate (on Striding RQ) with
 multiple outstanding UMR posts
 In this patch, we add support for multiple outstanding UMR posts,
  to allow faster gap closure between consuming MPWQEs and reposting
 them back into the WQ.
 
 Performance test:
 As expected, huge improvement in large-scale (48 cores).
 
 xdp_redirect_map, 64B UDP multi-stream.
 Redirect from ConnectX-5 100Gbps to ConnectX-6 100Gbps.
 CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz.
 
 Before: Unstable, 7 to 30 Mpps
 After:  Stable,   at 70.5 Mpps
 
 From Shay:
 3) XDP, Inline small packets into the TX MPWQE in XDP xmit flow
 
 Upon high packet rate with multiple CPUs TX workloads, much of the HCA's
 resources are spent on prefetching TX descriptors, thus affecting
 transmission rates.
 This patch comes to mitigate this problem by moving some workload to the
 CPU and reducing the HW data prefetch overhead for small packets (<= 256B).
 
 When forwarding packets with XDP, a packet that is smaller
 than a certain size (set to ~256 bytes) would be sent inline within
 its WQE TX descrptor (mem-copied), when the hardware tx queue is congested
 beyond a pre-defined water-mark.
 
 Performance:
     Tested packet rate for UDP 64Byte multi-stream
     over two dual port ConnectX-5 100Gbps NICs.
     CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
 
     * Tested with hyper-threading disabled
 
     XDP_TX:
 
     |          | before | after   |       |
     | 24 rings | 51Mpps | 116Mpps | +126% |
     | 1 ring   | 12Mpps | 12Mpps  | same  |
 
     XDP_REDIRECT:
 
     ** Below is the transmit rate, not the redirection rate
     which might be larger, and is not affected by this patch.
 
     |          | before  | after   |      |
     | 32 rings | 64Mpps  | 92Mpps  | +43% |
     | 1 ring   | 6.4Mpps | 6.4Mpps | same |
 
 As we can see, feature significantly improves scaling, without
 hurting single ring performance.
 
 From Maxim:
 4) Some trivial refactoring and code improvements prior to a larger series
 to support AF_XDP.
 
 -Saeed.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcv2LjAAoJEEg/ir3gV/o+90gIAI8+4lwkXZAVk4mxf9PMjxuB
 bQiKd80e++26sgrNHCyuWZnIzTQqYAnUJ3WRC+Kk1pFTo1O23A+fvweT8m1dqAvP
 Z/5ktfbAeF3fwOVu7aGu9vh4zJEWJj8oO+I+G+OaOe2iV7FVTTFnWHxiiCfungAW
 oUnXozq4vERSQLechqqgz6nACxOPgEOCJrp4T9lDYSbqZizHgFttmInMQguq/7KS
 LvITcNu3EF5l4y2LxwCFiKRgGc2y/belU63AK+2pQUXhH46kQPEHdncdLg5d9QYA
 xJwthn697qxS0PIP5oHPHNVN+qJXfuUHVonXqVOAJebGQnV82of6+sPweRxwh1s=
 =MfAR
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2019-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2019-04-22

This series includes updates to mlx5e driver RX data path and some
significant XDP RX/TX improvements to overcome/mitigate HW and PCIE
bottlenecks.

From Tariq:
1) Some Enhancements in rq->flags
2) Stabilize RX packet rate (on Striding RQ) with
multiple outstanding UMR posts
In this patch, we add support for multiple outstanding UMR posts,
 to allow faster gap closure between consuming MPWQEs and reposting
them back into the WQ.

Performance test:
As expected, huge improvement in large-scale (48 cores).

xdp_redirect_map, 64B UDP multi-stream.
Redirect from ConnectX-5 100Gbps to ConnectX-6 100Gbps.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz.

Before: Unstable, 7 to 30 Mpps
After:  Stable,   at 70.5 Mpps

From Shay:
3) XDP, Inline small packets into the TX MPWQE in XDP xmit flow

Upon high packet rate with multiple CPUs TX workloads, much of the HCA's
resources are spent on prefetching TX descriptors, thus affecting
transmission rates.
This patch comes to mitigate this problem by moving some workload to the
CPU and reducing the HW data prefetch overhead for small packets (<= 256B).

When forwarding packets with XDP, a packet that is smaller
than a certain size (set to ~256 bytes) would be sent inline within
its WQE TX descrptor (mem-copied), when the hardware tx queue is congested
beyond a pre-defined water-mark.

Performance:
    Tested packet rate for UDP 64Byte multi-stream
    over two dual port ConnectX-5 100Gbps NICs.
    CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

    * Tested with hyper-threading disabled

    XDP_TX:

    |          | before | after   |       |
    | 24 rings | 51Mpps | 116Mpps | +126% |
    | 1 ring   | 12Mpps | 12Mpps  | same  |

    XDP_REDIRECT:

    ** Below is the transmit rate, not the redirection rate
    which might be larger, and is not affected by this patch.

    |          | before  | after   |      |
    | 32 rings | 64Mpps  | 92Mpps  | +43% |
    | 1 ring   | 6.4Mpps | 6.4Mpps | same |

As we can see, feature significantly improves scaling, without
hurting single ring performance.

From Maxim:
4) Some trivial refactoring and code improvements prior to a larger series
to support AF_XDP.
====================

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-23 17:03:40 -07:00
Maxim Mikityanskiy
f8ebecf2e3 net/mlx5e: Use #define for the WQE wait timeout constant
Create a #define for the timeout of mlx5e_wait_for_min_rx_wqes to
clarify the meaning of a magic number.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-23 12:09:22 -07:00
Maxim Mikityanskiy
03ceda6fe1 net/mlx5e: Remove unused rx_page_reuse stat
Remove the no longer used page_reuse stat of RQs.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-23 12:09:22 -07:00
Maxim Mikityanskiy
63d26b490b net/mlx5e: Take HW interrupt trigger into a function
mlx5e_trigger_irq posts a NOP to the ICO SQ just to trigger an IRQ and
enter the NAPI poll on the right CPU according to the affinity. Use it
in mlx5e_activate_rq.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-23 12:09:22 -07:00
Maxim Mikityanskiy
10961c5606 net/mlx5e: Remove unused parameter
mdev is unused in mlx5e_rx_is_linear_skb.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-23 12:09:22 -07:00
Maxim Mikityanskiy
b1b187e102 net/mlx5e: Add an underflow warning comment
mlx5e_mpwqe_get_log_rq_size calculates the number of WQEs (N) based on
the requested number of frames in the RQ (F) and the number of packets
per WQE (P). It ensures that N is not less than the minimum number of
WQEs in an RQ (N_min). Arithmetically, it means that F / P >= N_min
should be true. This function deals with logarithms, so it should check
that log(F) - log(P) >= log(N_min). However, if F < P, this expression
will cause an unsigned underflow. Check log(F) >= log(P) + log(N_min)
instead.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-23 12:09:21 -07:00
Maxim Mikityanskiy
9a22d5d839 net/mlx5e: Move parameter calculation functions to en/params.c
This commit moves the parameter calculation functions to a separate file
for better modularity and code sharing with future features.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-23 12:09:21 -07:00
Maxim Mikityanskiy
74bbaebf3c net/mlx5e: Report mlx5e_xdp_set errors
If the channels fail to reopen after setting an XDP program, return the
error code instead of 0. A proper fix is still needed, as now any error
while reopening the channels brings the interface down. This patch only
adds error reporting.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-23 12:09:21 -07:00
Maxim Mikityanskiy
83b2fd64ba net/mlx5e: Remove unused parameter
params is unused in mlx5e_init_di_list.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-23 12:09:20 -07:00
Shay Agroskin
c2273219ba net/mlx5e: XDP, Inline small packets into the TX MPWQE in XDP xmit flow
Upon high packet rate with multiple CPUs TX workloads, much of the HCA's
resources are spent on prefetching TX descriptors, thus affecting
transmission rates.
This patch comes to mitigate this problem by moving some workload to the
CPU and reducing the HW data prefetch overhead for small packets (<= 256B).

When forwarding packets with XDP, a packet that is smaller
than a certain size (set to ~256 bytes) would be sent inline within
its WQE TX descrptor (mem-copied), when the hardware tx queue is congested
beyond a pre-defined water-mark.

This is added to better utilize the HW resources (which now makes
one less packet data prefetch) and allow better scalability, on the
account of CPU usage (which now 'memcpy's the packet into the WQE).

To load balance between HW and CPU and get max packet rate, we use
watermarks to detect how much the HW is congested and move the work
loads back and forth between HW and CPU.

Performance:
Tested packet rate for UDP 64Byte multi-stream
over two dual port ConnectX-5 100Gbps NICs.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

* Tested with hyper-threading disabled

XDP_TX:

|          | before | after   |       |
| 24 rings | 51Mpps | 116Mpps | +126% |
| 1 ring   | 12Mpps | 12Mpps  | same  |

XDP_REDIRECT:

** Below is the transmit rate, not the redirection rate
which might be larger, and is not affected by this patch.

|          | before  | after   |      |
| 32 rings | 64Mpps  | 92Mpps  | +43% |
| 1 ring   | 6.4Mpps | 6.4Mpps | same |

As we can see, feature significantly improves scaling, without
hurting single ring performance.

Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-23 12:09:20 -07:00
Shay Agroskin
73cab880e7 net/mlx5e: XDP, Add TX MPWQE session counter
This counter tracks how many TX MPWQE sessions are started in XDP SQ
in XDP TX/REDIRECT flow. It counts per-channel and global stats.

Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-23 12:09:20 -07:00
Tariq Toukan
15143bf51c net/mlx5e: XDP, Enhance RQ indication for XDP redirect flush
The XDP redirect flush indication belongs to the receive queue,
not to its XDP send queue.

For this, use a new bit on rq->flags.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-23 12:09:19 -07:00
Tariq Toukan
f03590f74c net/mlx5e: XDP, Fix shifted flag index in RQ bitmap
Values in enum mlx5e_rq_flag are used as bit indixes.
Intention was to use them with no BIT(i) wrapping.

No functional bug fix here, as the same (shifted)flag bit
is used for all set, test, and clear operations.

Fixes: 121e892754 ("net/mlx5e: Refactor RQ XDP_TX indication")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-23 12:09:19 -07:00
Tariq Toukan
fd9b4be800 net/mlx5e: RX, Support multiple outstanding UMR posts
The buffers mapping of the Multi-Packet WQEs (of Striding RQ)
is done via UMR posts, one UMR WQE per an RX MPWQE.

A single MPWQE is capable of serving many incoming packets,
usually larger than the budget of a single napi cycle.
Hence, posting a single UMR WQE per napi cycle (and handling its
completion in the next cycle) works fine in many common cases,
but not always.

When an XDP program is loaded, every MPWQE is capable of serving less
packets, to satisfy the packet-per-page requirement.
Thus, for the same number of packets more MPWQEs (and UMR posts)
are needed (twice as much for the default MTU), giving less latency
room for the UMR completions.

In this patch, we add support for multiple outstanding UMR posts,
to allow faster gap closure between consuming MPWQEs and reposting
them back into the WQ.

For better SW and HW locality, we combine the UMR posts in bulks of
(at least) two.

This is expected to improve packet rate in high CPU scale.

Performance test:
As expected, huge improvement in large-scale (48 cores).

xdp_redirect_map, 64B UDP multi-stream.
Redirect from ConnectX-5 100Gbps to ConnectX-6 100Gbps.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz.

Before: Unstable, 7 to 30 Mpps
After:  Stable,   at 70.5 Mpps

No degradation in other tested scenarios.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-23 12:09:19 -07:00
Saeed Mahameed
3839f99d21 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux 2019-04-23 11:57:33 -07:00
Stanislav Fomichev
c43f1255b8 net: pass net_device argument to the eth_get_headlen
Update all users of eth_get_headlen to pass network device, fetch
network namespace from it and pass it down to the flow dissector.
This commit is a noop until administrator inserts BPF flow dissector
program.

Cc: Maxim Krasnyansky <maxk@qti.qualcomm.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-23 18:36:34 +02:00
Ido Schimmel
7a1ff9f45b mlxsw: spectrum_buffers: Adjust CPU port shared buffer egress quotas
Switch the CPU port to use the new dedicated egress pool instead the
previously used egress pool which was shared with normal front panel
ports.

Add per-port quotas for the amount of traffic that can be buffered for
the CPU port and also adjust the per-{port, TC} quotas.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-22 22:09:33 -07:00
Ido Schimmel
6d28725c4d mlxsw: spectrum_buffers: Allow skipping ingress port quota configuration
The CPU port is used to transmit traffic that is trapped to the host
CPU. It is therefore irrelevant to define ingress quota for it.

Add a 'skip_ingress' argument to the function tasked with configuring
per-port quotas, so that ingress quotas could be skipped in case the
passed local port is the CPU port.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-22 22:09:33 -07:00
Ido Schimmel
24a7cc1ef6 mlxsw: spectrum_buffers: Split business logic from mlxsw_sp_port_sb_pms_init()
The function is used to set the per-port shared buffer quotas.
Currently, these quotas are only set for front panel ports, but a
subsequent patch will configure these quotas for the CPU port as well.

The configuration required for the CPU port is a bit different than that
of the front panel ports, so split the business logic into a separate
function which will be called with different parameters for the CPU
port.

No functional changes intended.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-22 22:09:33 -07:00
Ido Schimmel
50b5b90514 mlxsw: spectrum_buffers: Use new CPU ingress pool for control packets
Use the new ingress pool that was added in the previous patch for
control packets (e.g., STP, LACP) that are trapped to the CPU.

The previous management pool is no longer necessary and therefore its
size is set to 0.

The maximum quota for traffic towards the CPU is increased to 50% of the
free space in the new ingress pool and therefore the reserved space is
reduced by half, to 10KB - in both the shared and headroom buffer. This
allows for more efficient utilization of the shared buffer as reserved
space cannot be used for other purposes.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-22 22:09:33 -07:00
Ido Schimmel
265c49b4b9 mlxsw: spectrum_buffers: Add pools for CPU traffic
Packets that are trapped to the CPU are transmitted through the CPU port
to the attached host. The CPU port is therefore like any other port and
needs to have shared buffer configuration.

The maximum quotas configured for the CPU are provided using dynamic
threshold and cannot be changed by the user. In order to make sure that
these thresholds are always valid, the configuration of the threshold
type of these pools is forbidden.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-22 22:09:32 -07:00
Ido Schimmel
857f138f04 mlxsw: spectrum_buffers: Remove assumption about pool order
The code currently assumes that ingress pools have lower indices than
egress pools. This makes it impossible to add more ingress pools
without breaking user configuration that relies on a certain pool index
to correspond to an egress pool.

Remove such assumptions from the code, so that more ingress pools could
be added by subsequent patches.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-22 22:09:32 -07:00
Ido Schimmel
f1aaeacdae mlxsw: spectrum_buffers: Forbid changing multicast TCs' attributes
Commit e83c045e53 ("mlxsw: spectrum_buffers: Configure MC pool")
configured the threshold of the multicast TCs as infinite so that the
admission of multicast packets is only depended on per-switch priority
threshold.

Forbid the user from changing the thresholds of these multicast TCs and
their binding to a different pool.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-22 22:09:32 -07:00
Ido Schimmel
51e15a4978 mlxsw: spectrum_buffers: Forbid changing threshold type of first egress pool
Multicast packets have three egress quotas:
* Per egress port
* Per egress port and traffic class
* Per switch priority

The limits on the switch priority are not exposed to the user and
specified as dynamic threshold on the first egress pool.

Forbid changing the threshold type of the first egress pool so that
these limits are always valid.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-22 22:09:32 -07:00
Ido Schimmel
cce7acca8a mlxsw: spectrum_buffers: Forbid configuration of multicast pool
Commit e83c045e53 ("mlxsw: spectrum_buffers: Configure MC pool") added
a dedicated pool for multicast traffic. The pool is visible to the user
so that it would be possible to monitor its occupancy, but its
configuration should be forbidden in order to maintain its intended
operation.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-22 22:09:32 -07:00
Ido Schimmel
f7936d0bcf mlxsw: spectrum_buffers: Add ability to veto TC's configuration
Subsequent patches are going to need to veto changes in certain TCs'
binding and threshold configurations.

Add fields to the TC's struct that indicate if the TC can be bound to a
different pool and whether its threshold can change and enforce that.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-22 22:09:32 -07:00
Ido Schimmel
0636f4de79 mlxsw: spectrum_buffers: Add ability to veto pool's configuration
Subsequent patches are going to need to veto changes in certain pools'
size and / or threshold type (mode).

Add two fields to the pool's struct that indicate if either of these
attributes is allowed to change and enforce that.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-22 22:09:32 -07:00
Ido Schimmel
93d3668c02 mlxsw: spectrum_buffers: Use defines for pool indices
The pool indices are currently hard coded throughout the code, which
makes the code hard to follow and extend.

Overcome this by using defines for the pool indices.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-22 22:09:32 -07:00
Ido Schimmel
8f6862065d mlxsw: spectrum_buffers: Add extack messages for invalid configurations
Add extack messages to better communicate invalid configuration to the
user.

Example:

# devlink sb pool set pci/0000:01:00.0 pool 0 size 104857600 thtype dynamic
Error: mlxsw_spectrum: Exceeded shared buffer size.
devlink answers: Invalid argument

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-22 22:09:32 -07:00
Ido Schimmel
f2ad1a522e net: devlink: Add extack to shared buffer operations
Add extack to shared buffer set operations, so that meaningful error
messages could be propagated to the user.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-22 22:09:32 -07:00
Saeed Mahameed
c3bdd5e651 Linux 5.1-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlyOup0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGHKoIAIKVuBSyD+m65TaM
 pjoAFa56weEc67Mmai2A84EOm0MVy9C6L7EOcOgVsJiLxDCYyWQ7xYwV2kceKJpW
 H5xauhb3+TxpxYeaeKdPPPHmBdejRwOPYvGAfnDMCqCCWQTad52sQUPCLI+yhF1t
 wgnuMi+SwNBWP9aYCXdFPK4fVhh27AcEAOEsRVCh4tIBH/wkf4GwrDr3IX1MFeMX
 jE/R43la4hu1swcWBsjkErWUasVPCgJSSQTfKDo9PQTVnoh0PHFp4fkOInVKLymQ
 7AGo+Knc+1he+sFsB2IbZwea0xqtJtjtr1oC+at8gNx66qVG+o7UZNi5LR1uPW4Z
 4+dwGBk=
 =pyXR
 -----END PGP SIGNATURE-----

Merge tag 'v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into mlx5-next

Linux 5.1-rc1

We forgot to reset the branch last merge window thus mlx5-next is outdated
and still based on 5.0-rc2. This merge commit is needed to sync mlx5-next
branch with 5.1-rc1.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-22 15:25:39 -07:00
David Ahern
be659b8d3c ipv6: Restore RTF_ADDRCONF check in rt6_qualify_for_ecmp
The RTF_ADDRCONF flag filters out routes added by RA's in determining
which routes can be appended to an existing one to create a multipath
route. Restore the flag check and add a comment to document the RA piece.

Fixes: 4e54507ab1 ("ipv6: Simplify rt6_qualify_for_ecmp")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-21 19:44:16 -07:00
David Ahern
4e54507ab1 ipv6: Simplify rt6_qualify_for_ecmp
After commit c7a1ce397a ("ipv6: Change addrconf_f6i_alloc to use
ip6_route_info_create"), the gateway is no longer filled in for fib6_nh
structs in a prefix route. Accordingly, the RTF_ADDRCONF flag check can
be dropped from the 'rt6_qualify_for_ecmp'.

Further, RTF_DYNAMIC is only set in rt6_info instances, so it can be
removed from the check as well.

This reduces rt6_qualify_for_ecmp and the mlxsw version to just checking
if the nexthop has a gateway which is the real indication of whether
entries can be coalesced into a multipath route.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-21 10:39:52 -07:00
Ido Schimmel
05414dd116 mlxsw: spectrum_router: Relax FIB rule validation
Currently, mlxsw does not support policy-based routing (PBR) and
therefore forbids the installation of non-default FIB rules except for
the l3mdev rule which is used for VRFs.

Relax the check to allow the installation of FIB rules that would never
match packets received by the device. Specifically, if the iif is that
of the loopback netdev. This is useful for users that need to redirect
locally generated packets based on FIB rules.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-21 10:31:45 -07:00
Ido Schimmel
fa73989f26 mlxsw: spectrum: Use a stable ECMP/LAG seed
In order to get a consistent behavior of traffic flows across reboots /
module unload, we need to use the same ECMP/LAG seed.

Calculate the seed by hashing the base MAC of the device. This results
in a seed that is both unique (to avoid polarization) and consistent.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-21 10:31:45 -07:00
Jiri Pirko
0a9798c123 mlxsw: spectrum: Assume CONFIG_NET_DEVLINK is always enabled
Since commit f6b19b354d ("net: devlink: select NET_DEVLINK
from drivers") adds implicit select of NET_DEVLINK for
mlxsw, the code does not have to deal with the case
when CONFIG_NET_DEVLINK is not enabled. So remove the ifcase
and adjust Makefile.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-19 15:03:55 -07:00
Erez Alfasi
ace329f4ab net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query
Querying EEPROM high pages data for SFP module is currently
not supported by our driver and yet queried, resulting in
invalid FW queries.

Set the EEPROM ethtool data length to 256 for SFP module will
limit the reading for page 0 only and prevent invalid FW queries.

Fixes: bb64143eee ("net/mlx5e: Add ethtool support for dump module EEPROM")
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-19 13:55:37 -07:00
Maxim Mikityanskiy
d460c27189 net/mlx5e: Fix the max MTU check in case of XDP
MLX5E_XDP_MAX_MTU was calculated incorrectly. It didn't account for
NET_IP_ALIGN and MLX5E_HW2SW_MTU, and it also misused MLX5_SKB_FRAG_SZ.
This commit fixes the calculations and adds a brief explanation for the
formula used.

Fixes: a26a5bdf3e ("net/mlx5e: Restrict the combination of large MTU and XDP")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-19 13:55:36 -07:00
Maxim Mikityanskiy
12fc512f57 net/mlx5e: Fix use-after-free after xdp_return_frame
xdp_return_frame releases the frame. It leads to releasing the page, so
it's not allowed to access xdpi.xdpf->len after that, because xdpi.xdpf
is at xdp->data_hard_start after convert_to_xdp_frame. This patch moves
the memory access to precede the return of the frame.

Fixes: 58b99ee3e3 ("net/mlx5e: Add support for XDP_REDIRECT in device-out side")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-19 13:55:36 -07:00
Amit Cohen
151f0dddbb mlxsw: spectrum: Fix autoneg status in ethtool
If link is down and autoneg is set to on/off, the status in ethtool does
not change.

The reason is when the link is down the function returns with zero
before changing autoneg value.

Move the checking of link state (up/down) to be performed after setting
autoneg value, in order to be sure that autoneg will change in any case.

Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-18 10:37:30 -07:00
Ido Schimmel
1ab3030193 mlxsw: pci: Reincrease PCI reset timeout
During driver initialization the driver sends a reset to the device and
waits for the firmware to signal that it is ready to continue.

Commit d2f372ba09 ("mlxsw: pci: Increase PCI SW reset timeout")
increased the timeout to 13 seconds due to longer PHY calibration in
Spectrum-2 compared to Spectrum-1.

Recently it became apparent that this timeout is too short and therefore
this patch increases it again to a safer limit that will be reduced in
the future.

Fixes: c3ab435466 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Fixes: d2f372ba09 ("mlxsw: pci: Increase PCI SW reset timeout")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-18 10:37:30 -07:00
Petr Machata
f476b3f809 mlxsw: spectrum: Put MC TCs into DWRR mode
Both Spectrum-1 and Spectrum-2 chips are currently configured such that
pairs of TC n (which is used for UC traffic) and TC n+8 (which is used
for MC traffic) are feeding into the same subgroup. Strict
prioritization is configured between the two TCs, and by enabling
MC-aware mode on the switch, the lower-numbered (UC) TCs are favored
over the higher-numbered (MC) TCs.

On Spectrum-2 however, there is an issue in configuration of the
MC-aware mode. As a result, MC traffic is prioritized over UC traffic.
To work around the issue, configure the MC TCs with DWRR mode (while
keeping the UC TCs in strict mode).

With this patch, the multicast-unicast arbitration results in the same
behavior on both Spectrum-1 and Spectrum-2 chips.

Fixes: 7b81953066 ("mlxsw: spectrum: Configure MC-aware mode on mlxsw ports")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-18 10:37:30 -07:00
David S. Miller
6b0a7f84ea Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflict resolution of af_smc.c from Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17 11:26:25 -07:00
Ido Schimmel
caf345a18b mlxsw: spectrum_router: Add neighbour offload indication
In a similar fashion to routes and FDB entries, the neighbour table is
reflected to the device.

Set an offload indication on the neighbour in case it was programmed to
the device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-15 13:29:20 -07:00
Ido Schimmel
a85e84e030 mlxsw: spectrum_router: Propagate neighbour update errors
Next patch will add offload indication to neighbours, but the indication
should only be altered in case the neighbour was successfully added to /
deleted from the device.

Propagate neighbour update errors, so that they could be taken into
account by the next patch.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-15 13:29:20 -07:00
David S. Miller
7324880182 mlx5-fixes-2019-04-09
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcrPOfAAoJEEg/ir3gV/o+c1sIAIuVUmF95OK6BxrNxQ31HN7i
 0V/OW29V6B5musqyGXVa90nl9wJ9BE2tmtHsg2HPABXdGdiYhNRP7Tm+aq+QYBe3
 8kJVk5U+HCLeHvf9k3dpJZokMzAgEhuWAbuAE1YelYUtbOXO9Zrj2uTL1NHJTYyc
 SNOg9+gATOMsOAuiUyygN0XMoYESTsUE7UH4tuhyYr44cKR85qOQDPAlcDEHGTfO
 uHWwmOznZqFVJUVyfwtEkTojsxNiW+QA2PR5faX/+eI7746qXOAzYq2JSjtNEyTz
 4xB9a+t47xpGDw4Svwu51pDw+4Uiiy1Yv0kOKKpBqrCk892bZ8l1gWcHRgjYx/8=
 =9wkB
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2019-04-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2019-04-09

This series provides some fixes to mlx5 driver.

I've cc'ed some of the checksum fixes to Eric Dumazet and i would like to get
his feedback before you pull.

For -stable v4.19
('net/mlx5: FPGA, tls, idr remove on flow delete')
('net/mlx5: FPGA, tls, hold rcu read lock a bit longer')

For -stable v4.20
('net/mlx5e: Rx, Check ip headers sanity')
('Revert "net/mlx5e: Enable reporting checksum unnecessary also for L3 packets"')
('net/mlx5e: Rx, Fixup skb checksum for packets with tail padding')

For -stable v5.0
('net/mlx5e: Switch to Toeplitz RSS hash by default')
('net/mlx5e: Protect against non-uplink representor for encap')
('net/mlx5e: XDP, Avoid checksum complete when XDP prog is loaded')
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-14 15:07:30 -07:00
Ido Schimmel
d5949d92c2 mlxsw: spectrum_buffers: Add a multicast pool for Spectrum-2
In Spectrum-1, when a multicast packet is admitted to the shared buffer
it increases the quotas of all the ports and {port, TC} to which it is
forwarded to.

The above means that multicast packets are accounted multiple times in
the shared buffer and can therefore cause the associated shared buffer
pool to fill up very quickly.

To work around this issue, commit e83c045e53 ("mlxsw:
spectrum_buffers: Configure MC pool") added a dedicated multicast pool
in which multicast packets are accounted.

The issue is not present in Spectrum-2, but in order to be backward
compatible with Spectrum-1, its default behavior is to allow a multicast
packet to increase multiple egress quotas instead of one.

Until the new (non-backward compatible) mode is supported, configure a
dedicated multicast pool as in Spectrum-1.

Fixes: fe099bf682 ("mlxsw: spectrum_buffers: Add Spectrum-2 shared buffer configuration")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10 11:57:08 -07:00