korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2025-01-23 06:14:42 +08:00

Author	SHA1	Message	Date
Paul Blakey	69e2916ebc	net/mlx5: CT: Add support for mirroring Add support for mirroring before the CT action by splitting the pre ct rule. Mirror outputs are done first on the tc chain,prio table rule (the fwd rule), which will then forward to a per port fwd table. On this fwd table, we insert the original pre ct rule that forwards to ct/ct nat table. Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-12 15:29:33 -08:00
Alaa Hleihel	287e0df021	net/mlx5: Display the command index in command mailbox dump Multiple commands can be printed at the same time which can lead to wrong order of their lines in dmesg output. As a result, it's hard to match data dumps to the correct command or which command was fully dumped at some point. Fix this by displaying the corresponding command index, and also indicate when a command was fully dumped. Signed-off-by: Alaa Hleihel <alaa@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-12 15:29:33 -08:00
Arnd Bergmann	2119bda642	net/mlx5e: allocate 'indirection_rqt' buffer dynamically Increasing the size of the indirection_rqt array from 128 to 256 bytes pushed the stack usage of the mlx5e_hairpin_fill_rqt_rqns() function over the warning limit when building with clang and CONFIG_KASAN: drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:970:1: error: stack frame size of 1180 bytes in function 'mlx5e_tc_add_nic_flow' [-Werror,-Wframe-larger-than=] Using dynamic allocation here is safe because the caller does the same, and it reduces the stack usage of the function to just a few bytes. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-12 15:29:32 -08:00
Tariq Toukan	e16cf9d754	net/mlx5e: Dump ICOSQ WQE descriptor on CQE with error events Dump the ICOSQ's WQE descriptor when a completion with error is received. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-12 15:29:32 -08:00
Maxim Mikityanskiy	991b265460	net/mlx5e: Use net_prefetchw instead of prefetchw in MPWQE TX datapath Commit `e20f0dbf20` ("net/mlx5e: RX, Add a prefetch command for small L1_CACHE_BYTES") switched to using net_prefetchw at all places in mlx5e. In the same time frame, commit `5af75c747e` ("net/mlx5e: Enhanced TX MPWQE for SKBs") added one more usage of prefetchw. When these two changes were merged, this new occurrence of prefetchw wasn't replaced with net_prefetchw. This commit fixes this last occurrence of prefetchw in mlx5e_tx_mpwqe_session_start, making the same change that was done in mlx5e_xdp_mpwqe_session_start. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-12 15:29:31 -08:00
Roi Dayan	bca08a9145	net/mlx5e: Remove redundant newline in NL_SET_ERR_MSG_MOD Fix the following coccicheck warnings: drivers/net/ethernet/mellanox/mlx5/core/devlink.c:145:29-66: WARNING avoid newline at end of message in NL_SET_ERR_MSG_MOD drivers/net/ethernet/mellanox/mlx5/core/devlink.c:140:29-77: WARNING avoid newline at end of message in NL_SET_ERR_MSG_MOD Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-12 15:29:31 -08:00
Mark Zhang	093bd76469	net/mlx5: Read congestion counters from all ports when lag is active Read congestion counters from all ports in any lag mode rather than only in RoCE lag mode (e.g., VF lag). Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-12 15:29:31 -08:00
Jiapeng Chong	7976092241	net/mlx5: remove unneeded semicolon Fix the following coccicheck warnings: ./drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c:495:2-3: Unneeded semicolon. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-12 15:29:30 -08:00
Junlin Yang	ad2c99ca75	net/mlx5: use kvfree() for memory allocated with kvzalloc() It is allocated with kvzalloc(), the corresponding release function should not be kfree(), use kvfree() instead. Generated by: scripts/coccinelle/api/kfree_mismatch.cocci Signed-off-by: Junlin Yang <yangjunlin@yulong.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-12 15:29:30 -08:00
Yevgeny Kliteynik	cc82a2e6c8	net/mlx5: DR, Add missing vhca_id consume from STEv1 The field source_eswitch_owner_vhca_id was not consumed in the same way as in STEv0. Added the missing set. Fixes: `10b6941864` ("net/mlx5: DR, Add HW STEv1 match logic") Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-12 15:29:30 -08:00
Yevgeny Kliteynik	1412477882	net/mlx5: DR, Remove unneeded rx_decap_l3 function for STEv1 Remove the dr_ste_v1_set_rx_decap_l3 function that was replaced by another function - fixing a rebase error. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-12 15:29:29 -08:00
Yevgeny Kliteynik	0142f09764	net/mlx5: DR, Fixed typo in STE v0 "reforamt" -> "reformat" Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-12 15:29:29 -08:00
Ido Schimmel	cf31190ae0	mlxsw: spectrum_matchall: Implement sampling using mirroring Spectrum-2 and later ASICs support sampling of packets by mirroring to the CPU with probability. There are several advantages compared to the legacy dedicated sampling mechanism: * Extra metadata per-packet: Egress port, egress traffic class, traffic class occupancy and end-to-end latency * Ability to sample packets on egress / per-flow Convert Spectrum-2 and later ASICs to perform sampling by mirroring to the CPU with probability. Subsequent patches will add support for egress / per-flow sampling and expose the extra metadata. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-11 16:22:39 -08:00
Ido Schimmel	34a277212c	mlxsw: spectrum_trap: Split sampling traps between ASICs Sampling of ingress packets is supported using a dedicated sampling mechanism on all Spectrum ASICs. However, Spectrum-2 and later ASICs support more sophisticated sampling by mirroring packets to the CPU. As a preparation for more advanced sampling configurations, split the trap configuration used for sampled packets between Spectrum-1 and later ASICs. This is needed since packets that are mirrored to the CPU are trapped via a different trap identifier compared to packets that are sampled using the dedicated sampling mechanism. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-11 16:22:39 -08:00
Ido Schimmel	20afb9bc48	mlxsw: spectrum_matchall: Split sampling support between ASICs Sampling of ingress packets is supported using a dedicated sampling mechanism on all Spectrum ASICs. However, Spectrum-2 and later ASICs support more sophisticated sampling by mirroring packets to the CPU. As a preparation for more advanced sampling configurations, split the sampling operations between Spectrum-1 and later ASICs. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-11 16:22:39 -08:00
Ido Schimmel	2dcbd9207b	mlxsw: spectrum_span: Add SPAN probability rate support Currently, every packet that matches a mirroring trigger (e.g., received packets, buffer dropped packets) is mirrored. Spectrum-2 and later ASICs support mirroring with probability, where every 1 in N matched packets is mirrored. Extend the API that creates the binding between the trigger and the SPAN agent with a probability rate parameter, which is an attribute of the trigger. Set it to '1' to maintain existing behavior. Subsequent patches will use it to perform more sophisticated sampling, by mirroring packets to the CPU with probability. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-11 16:22:39 -08:00
Ido Schimmel	fa3faeb7ae	mlxsw: reg: Extend mirroring registers with probability rate field The MPAR and MPAGR registers are used to configure the binding between the mirroring trigger (e.g., received packet) and the SPAN agent. Add probability rate field, which will allow us to support sampling by mirroring to the CPU. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-11 16:22:39 -08:00
Ido Schimmel	5c7659eba8	mlxsw: spectrum_span: Add SPAN session identifier support When packets are mirrored to the CPU, the trap identifier with which the packets are trapped is determined according to the session identifier of the SPAN agent performing the mirroring. Packets that are trapped for the same logical reason (e.g., buffer drops) should use the same session identifier. Currently, a single session is implicitly supported (identifier 0) and is used for packets that are mirrored to the CPU due to buffer drops (e.g., early drop). Subsequent patches are going to mirror packets to the CPU due to sampling, which will require a different session identifier. Prepare for that by making the session identifier an attribute of the SPAN agent. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-11 16:22:39 -08:00
Roi Dayan	9f4d928338	net/mlx5e: Alloc flow spec using kvzalloc instead of kzalloc flow spec is not small and we do allocate it using kvzalloc in most places of the driver. fix rest of the places to use kvzalloc to avoid failure in allocation when memory is too fragmented. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-11 14:35:15 -08:00
Eli Cohen	61e9508f1e	net/mlx5: Avoid unnecessary operation fs_get_obj retrieves the container of fs_parent_node just to pass the same value as &fs_ns->node. Just pass fs_parent_node to init_root_tree_recursive() to get exactly the same effect. Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-11 14:35:15 -08:00
Saeed Mahameed	03e219c4cf	net/mlx5e: rep: Improve reg_cX conditions There is no point of calculating reg_c1 or overriding reg_c0 if we are going to abort the function. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com>	2021-03-11 14:35:14 -08:00
Roi Dayan	3094552bcd	net/mlx5: SF, Fix return type Fix the following coccicheck warnings: drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.h:50:8-9: WARNING: return of 0/1 in function 'mlx5_sf_dev_allocated' with return type bool Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-11 14:35:14 -08:00
Saeed Mahameed	51ada5a523	net/mlx5e: mlx5_tc_ct_init does not fail mlx5_tc_ct_init() either returns a valid pointer or a NULL, either way the caller can continue, remove IS_ERR check from callers as it has no effect. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-11 14:35:14 -08:00
Vlad Buslov	fbeab6be05	net/mlx5: Fix indir stable stubs Some of the stubs for CONFIG_MLX5_CLS_ACT==disabled are missing "static inline" in their definition which causes the following compilation warnings: In file included from drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:41: >> drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h:34:1: warning: no previous prototype for function 'mlx5_esw_indir_table_init' [-Wmissing-prototypes] mlx5_esw_indir_table_init(void) ^ drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h:33:1: note: declare 'static' if the function is not intended to be used outside of this translation unit struct mlx5_esw_indir_table * ^ static >> drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h:40:1: warning: no previous prototype for function 'mlx5_esw_indir_table_destroy' [-Wmissing-prototypes] mlx5_esw_indir_table_destroy(struct mlx5_esw_indir_table *indir) ^ drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h:39:1: note: declare 'static' if the function is not intended to be used outside of this translation unit void ^ static >> drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h:61:1: warning: no previous prototype for function 'mlx5_esw_indir_table_needed' [-Wmissing-prototypes] mlx5_esw_indir_table_needed(struct mlx5_eswitch *esw, ^ drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h:60:1: note: declare 'static' if the function is not intended to be used outside of this translation unit bool ^ static 3 warnings generated. Add "static inline" prefix to signatures of stubs that were missing it. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-11 14:35:13 -08:00
Vlad Buslov	5632817b14	net/mlx5e: Add missing include When CONFIG_IPV6 is disabled the header nexthop.h is not included by fib_notifier.h which causes tc_tun_encap.c to fail to compile: In file included from drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:5: In file included from drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.h:7: In file included from drivers/net/ethernet/mellanox/mlx5/core/en/tc_priv.h:7: In file included from drivers/net/ethernet/mellanox/mlx5/core/en_tc.h:40: drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.h:78:5: warning: no previous prototype for function 'mlx5e_tc_tun_update_header_ipv6' [-Wmissing-prototypes] int mlx5e_tc_tun_update_header_ipv6(struct mlx5e_priv *priv, ^ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.h:78:1: note: declare 'static' if the function is not intended to be used outside of this translation unit int mlx5e_tc_tun_update_header_ipv6(struct mlx5e_priv *priv, ^ static >> drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1510:12: error: implicit declaration of function 'fib_info_nh' [-Werror,-Wimplicit-function-declaration] fib_dev = fib_info_nh(fen_info->fi, 0)->fib_nh_dev; ^ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1510:12: note: did you mean 'fib_info_put'? include/net/ip_fib.h:528:20: note: 'fib_info_put' declared here static inline void fib_info_put(struct fib_info *fi) ^ >> drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1510:42: error: member reference type 'int' is not a pointer fib_dev = fib_info_nh(fen_info->fi, 0)->fib_nh_dev; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ include/net/ip_fib.h:113:21: note: expanded from macro 'fib_nh_dev' #define fib_nh_dev nh_common.nhc_dev ^ >> drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1552:13: error: incomplete definition of type 'struct fib6_entry_notifier_info' fen_info = container_of(info, struct fib6_entry_notifier_info, info); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/kernel.h:694:51: note: expanded from macro 'container_of' BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~ include/linux/compiler_types.h:256:74: note: expanded from macro '__same_type' #define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b)) ^ include/linux/build_bug.h:39:58: note: expanded from macro 'BUILD_BUG_ON_MSG' #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~ include/linux/compiler_types.h:320:22: note: expanded from macro 'compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/compiler_types.h:308:23: note: expanded from macro '_compiletime_assert' __compiletime_assert(condition, msg, prefix, suffix) ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/compiler_types.h:300:9: note: expanded from macro '__compiletime_assert' if (!(condition)) \ ^~~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1546:9: note: forward declaration of 'struct fib6_entry_notifier_info' struct fib6_entry_notifier_info *fen_info; ^ >> drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1552:13: error: offsetof of incomplete type 'struct fib6_entry_notifier_info' fen_info = container_of(info, struct fib6_entry_notifier_info, info); ^ ~~~~~~ include/linux/kernel.h:697:21: note: expanded from macro 'container_of' ((type *)(__mptr - offsetof(type, member))); }) ^ ~~~~ include/linux/stddef.h:17:32: note: expanded from macro 'offsetof' #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER) ^ ~~~~ include/linux/compiler_types.h:140:35: note: expanded from macro '__compiler_offsetof' #define __compiler_offsetof(a, b) __builtin_offsetof(a, b) ^ ~ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1546:9: note: forward declaration of 'struct fib6_entry_notifier_info' struct fib6_entry_notifier_info *fen_info; ^ >> drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1552:11: error: assigning to 'struct fib6_entry_notifier_info *' from incompatible type 'void' fen_info = container_of(info, struct fib6_entry_notifier_info, info); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1553:12: error: implicit declaration of function 'fib6_info_nh_dev' [-Werror,-Wimplicit-function-declaration] fib_dev = fib6_info_nh_dev(fen_info->rt); ^ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1553:37: error: incomplete definition of type 'struct fib6_entry_notifier_info' fib_dev = fib6_info_nh_dev(fen_info->rt); ~~~~~~~~^ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1546:9: note: forward declaration of 'struct fib6_entry_notifier_info' struct fib6_entry_notifier_info *fen_info; ^ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1555:14: error: incomplete definition of type 'struct fib6_entry_notifier_info' fen_info->rt->fib6_dst.plen != 128) ~~~~~~~~^ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1546:9: note: forward declaration of 'struct fib6_entry_notifier_info' struct fib6_entry_notifier_info *fen_info; ^ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1562:39: error: incomplete definition of type 'struct fib6_entry_notifier_info' memcpy(&key.endpoint_ip.v6, &fen_info->rt->fib6_dst.addr, ~~~~~~~~^ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1546:9: note: forward declaration of 'struct fib6_entry_notifier_info' struct fib6_entry_notifier_info *fen_info; ^ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1563:24: error: incomplete definition of type 'struct fib6_entry_notifier_info' sizeof(fen_info->rt->fib6_dst.addr)); ~~~~~~~~^ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1546:9: note: forward declaration of 'struct fib6_entry_notifier_info' struct fib6_entry_notifier_info *fen_info; ^ 1 warning and 10 errors generated. Manually include net/nexthop.h in tc_tun_encap.c. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-11 14:35:13 -08:00
Arnd Bergmann	87f77a6797	net/mlx5e: fix mlx5e_tc_tun_update_header_ipv6 dummy definition The alternative implementation of this function in a header file is declared as a global symbol, and gets added to every .c file that includes it, which leads to a link error: arm-linux-gnueabi-ld: drivers/net/ethernet/mellanox/mlx5/core/en_rx.o: in function `mlx5e_tc_tun_update_header_ipv6': en_rx.c:(.text+0x0): multiple definition of `mlx5e_tc_tun_update_header_ipv6'; drivers/net/ethernet/mellanox/mlx5/core/en_main.o:en_main.c:(.text+0x0): first defined here Mark it 'static inline' like the other functions here. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-11 14:35:12 -08:00
Roi Dayan	76e68d950a	net/mlx5e: CT, Avoid false lock dependency warning To avoid false lock dependency warning set the ct_entries_ht lock class different than the lock class of the ht being used when deleting last flow from a group and then deleting a group, we get into del_sw_flow_group() which call rhashtable_destroy on fg->ftes_hash which will take ht->mutex but it's different than the ht->mutex here. ====================================================== WARNING: possible circular locking dependency detected 5.10.0-rc2+ #8 Tainted: G O ------------------------------------------------------ revalidator23/24009 is trying to acquire lock: ffff888128d83828 (&node->lock){++++}-{3:3}, at: mlx5_del_flow_rules+0x83/0x7a0 [mlx5_core] but task is already holding lock: ffff8881081ef518 (&ht->mutex){+.+.}-{3:3}, at: rhashtable_free_and_destroy+0x37/0x720 which lock already depends on the new lock. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-11 14:35:12 -08:00
Leon Romanovsky	fe06992b04	net/mlx5: Check returned value from health recover sequence MLX5_INTERFACE_STATE_UP is far from being reliable check for success to recover, because it can be changed any time and health logic doesn't have any locks to protect from it. The locks are not needed here because health recover is good to have, but not must to success, so rely on the returned value from the mlx5_recover_device() as a marker for success/failure. Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-11 14:35:12 -08:00
Leon Romanovsky	7ad67a20f2	net/mlx5: Don't rely on interface state bit The check of MLX5_INTERFACE_STATE_UP is completely useless, because the FW tracer cleanup is called on every change of the interface and it ensures that notifier is disabled together with canceling all the pending works. Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-11 14:35:11 -08:00
Leon Romanovsky	7e615b9978	net/mlx5: Remove second FW tracer check The FW tracer check is called twice, so delete one of them. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-11 14:35:11 -08:00
Leon Romanovsky	6dea2f7eff	net/mlx5: Separate probe vs. reload flows The mix between probe/unprobe and reload flows causes to have an extra mutex lock intf_state_mutex that generates LOCKDEP warning between it and devlink_mutex. As a preparation for the future removal, separate those flows. Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-11 14:35:10 -08:00
Leon Romanovsky	d89edb3607	net/mlx5: Remove impossible checks of interface state The interface state is constant at this stage and checked before calling to the register/unregister reserved GIDs. There is no need to double check it. Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-11 14:35:10 -08:00
Saeed Mahameed	7bef147a6a	net/mlx5: Don't skip vport check Users of mlx5_eswitch_get_vport() are required to check return value prior to passing mlx5_vport further. Fix all the places to do not skip that check. Reviewed-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-11 14:35:10 -08:00
Danielle Ratson	4734a750f4	mlxsw: Adjust some MFDE fields shift and size to fw implementation MFDE.irisc_id and MFDE.event_id were adjusted according to what is actually implemented in firmware. Adjust the shift and size of these fields in mlxsw as well. Note that the displacement of the first field is not a regression. It was always incorrect and therefore reported "0". Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 13:04:57 -08:00
Danielle Ratson	315afd2068	mlxsw: core: Expose MFDE.log_ip to devlink health Add the MFDE.log_ip field to devlink health reporter in order to ease firmware debug. This field encodes the instruction pointer that triggered the CR space timeout. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 13:04:57 -08:00
Danielle Ratson	ff12ba3ad7	mlxsw: reg: Extend MFDE register with new log_ip field Extend MFDE (Monitoring FW Debug) register with new field specifying the instruction pointer that triggered the CR space timeout. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 13:04:57 -08:00
Petr Machata	2ab781c2cc	mlxsw: spectrum: Bump minimum FW version to xx.2008.2406 The indicated version fixes the following two issues: - MIRROR_SAMPLER_ACTION.mirror_probability_rate inverted. This has implication for per-flow sampling. - When adjacency is replaced-if-inactive (RATR.opcode=3), bad parameter was reported when replacing an active entry. This breaks offload of resilient next-hop groups. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 13:04:57 -08:00
Amit Cohen	675e5a1e1a	mlxsw: reg: Fix comment about slot_index field in PMAOS register The comment did not include the register name. Add `pmaos` to align the comment with other comments. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 13:04:57 -08:00
Danielle Ratson	825e888577	mlxsw: spectrum: Reword an error message for Q-in-Q veto 'Uppers' is not clear enough for all users when referring to upper devices. Reword the error message so it will be clearer. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 13:04:57 -08:00
Yevgeny Kliteynik	84076c4c80	net/mlx5: DR, Fix potential shift wrapping of 32-bit value in STEv1 getter Fix 32-bit variable shift wrapping in dr_ste_v1_get_miss_addr. Fixes: `a6098129c7` ("net/mlx5: DR, Add STEv1 setters and getters") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-10 11:01:59 -08:00
Shay Drory	dc694f11a7	net/mlx5: SF: Fix error flow of SFs allocation flow When SF id is unavailable, code jumps to wrong label that accesses sw id array outside of its range. Hence, when SF id is not allocated, avoid accessing such array. Fixes: `8f01054186` ("net/mlx5: SF, Add port add delete functionality") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-10 11:01:58 -08:00
Shay Drory	6fa37d66ef	net/mlx5: SF: Fix memory leak of work item Cited patch in the fixes tag missed to free the allocated work. Fix it by freeing the work after work execution. Fixes: `f3196bb0f1` ("net/mlx5: Introduce vhca state event notifier") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-10 11:01:58 -08:00
Parav Pandit	6a3717544c	net/mlx5: SF, Correct vhca context size Fix vhca context size as defined by device interface specification. Fixes: `f3196bb0f1` ("net/mlx5: Introduce vhca state event notifier") Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-10 11:01:57 -08:00
Parav Pandit	8b90d89782	net/mlx5e: E-switch, Fix rate calculation division do_div() returns remainder, while cited patch wanted to use quotient. Fix it by using quotient. Fixes: `0e22bfb7c0` ("net/mlx5e: E-switch, Fix rate calculation for overflow") Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-10 11:01:57 -08:00
Maor Gottlieb	4806f1e2fe	net/mlx5: Set QP timestamp mode to default QPs which don't care from timestamp mode, should set the ts_format to default, otherwise the QP creation could be failed if the timestamp mode is not supported. Fixes: `2fe8d4b878` ("RDMA/mlx5: Fail QP creation if the device can not support the CQE TS") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-10 11:01:56 -08:00
Roi Dayan	469549e477	net/mlx5e: Fix error flow in change profile Move priv memset from init to cleanup to avoid double priv cleanup that can happen on profile change if also rollback fails. Add missing cleanup flow in mlx5e_netdev_attach_profile(). Fixes: `c4d7eb5768` ("net/mxl5e: Add change profile method") Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-10 11:01:56 -08:00
Maor Dickman	f574531a0b	net/mlx5: Disable VF tunnel TX offload if ignore_flow_level isn't supported VF tunnel TX traffic offload is adding flow which forward to flow tables with lower level, which isn't support on all FW versions and may cause firmware to fail with syndrome. Fixed by enabling VF tunnel TX offload only if flow table capability ignore_flow_level is enabled. Fixes: `10742efc20` ("net/mlx5e: VF tunnel TX traffic offloading") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-10 11:01:55 -08:00
Roi Dayan	1e74152ed0	net/mlx5e: Check correct ip_version in decapsulation route resolution flow_attr->ip_version has the matching that should be done inner/outer. When working with chains, decapsulation is done on chain0 and next chain match on outer header which is the original inner which could be ipv4. So in tunnel route resolution we cannot use that to know which ip version we are at so save tun_ip_version when parsing the tunnel match and use that. Fixes: `a508728a4c` ("net/mlx5e: VF tunnel RX traffic offloading") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-10 11:01:55 -08:00
Aya Levin	55affa97d6	net/mlx5: Fix turn-off PPS command Fix a bug of uninitialized pin index when trying to turn off PPS out. Fixes: `de19cd6cc9` ("net/mlx5: Move some PPS logic into helper functions") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-10 11:01:54 -08:00
Maor Dickman	385d40b042	net/mlx5e: Don't match on Geneve options in case option masks are all zero The cited change added offload support for Geneve options without verifying the validity of the option masks. This caused offload of rules that match on Geneve options with all-zero class, type and data masks to fail. Fix by ignoring the match on Geneve options in case option masks are all zero. Fixes: `9272e3df30` ("net/mlx5e: Geneve, Add support for encap/decap flows offload") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-10 11:01:54 -08:00
Maxim Mikityanskiy	74640f0973	net/mlx5e: Revert parameters on errors when changing PTP state without reset Port timestamping for PTP can be enabled/disabled while the channels are closed. In that case mlx5e_safe_switch_channels is skipped, and the preactivate hook is called directly. However, if that hook returns an error, the channel parameters must be reverted to their old values. This commit adds the missing handling of this case. Fixes: `145e5637d9` ("net/mlx5e: Add TX PTP port object support") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-10 11:01:53 -08:00
Maxim Mikityanskiy	e5eb01344e	net/mlx5e: When changing XDP program without reset, take refs for XSK RQs Each RQ (including XSK RQs) takes a reference to the XDP program. When an XDP program is attached or detached, the channels and queues are recreated, however, there is a special flow for changing an active XDP program to another one. In that flow, channels and queues stay alive, but the refcounts of the old and new XDP programs are adjusted. This flow didn't increment refcount by the number of active XSK RQs, and this commit fixes it. Fixes: `db05815b36` ("net/mlx5e: Add XSK zero-copy support") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-10 11:01:53 -08:00
Aya Levin	1c2cdf0b60	net/mlx5e: Set PTP channel pointer explicitly to NULL When closing the PTP channel, set its pointer explicitly to NULL. The PTP channel is opened on demand, and the code verifies the pointer's validity before access. Nullify it when closing the PTP channel to avoid unexpected behavior. Fixes: `145e5637d9` ("net/mlx5e: Add TX PTP port object support") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-10 11:01:53 -08:00
Aya Levin	354521eebd	net/mlx5e: Accumulate port PTP TX stats with other channels stats In addition to .get_ethtool_stats, add port PTP TX stats to .ndo_get_stats64. Fixes: `145e5637d9` ("net/mlx5e: Add TX PTP port object support") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-10 11:01:52 -08:00
Tariq Toukan	d5dd03b26b	net/mlx5e: RX, Mind the MPWQE gaps when calculating offsets Since cited patch, MLX5E_REQUIRED_WQE_MTTS is not a power of two. Hence, usage of MLX5E_LOG_ALIGNED_MPWQE_PPW should be replaced, as it lost some accuracy. Use the designated macro to calculate the number of required MTTs. This makes sure the solution in cited patch works properly. While here, un-inline mlx5e_get_mpwqe_offset(), and remove the unused RQ parameter. Fixes: `c3c9402373` ("net/mlx5e: Add resiliency in Striding RQ mode for packets larger than MTU") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-10 11:01:52 -08:00
Tariq Toukan	5115daa675	net/mlx5e: Enforce minimum value check for ICOSQ size The ICOSQ size should not go below MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE. Enforce this where it's missing. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-03-10 11:01:51 -08:00
Kevin(Yudong) Yang	00ff801bb8	net/mlx4_en: update moderation when config reset This patch fixes a bug that the moderation config will not be applied when calling mlx4_en_reset_config. For example, when turning on rx timestamping, mlx4_en_reset_config() will be called, causing the NIC to forget previous moderation config. This fix is in phase with a previous fix: commit `79c54b6bbf` ("net/mlx4_en: Fix TX moderation info loss after set_ringparam is called") Tested: Before this patch, on a host with NIC using mlx4, run netserver and stream TCP to the host at full utilization. $ sar -I SUM 1 INTR intr/s 14:03:56 sum 48758.00 After rx hwtstamp is enabled: $ sar -I SUM 1 14:10:38 sum 317771.00 We see the moderation is not working properly and issued 7x more interrupts. After the patch, and turned on rx hwtstamp, the rate of interrupts is as expected: $ sar -I SUM 1 14:52:11 sum 49332.00 Fixes: `79c54b6bbf` ("net/mlx4_en: Fix TX moderation info loss after set_ringparam is called") Signed-off-by: Kevin(Yudong) Yang <yyd@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> CC: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-05 12:42:31 -08:00
Ido Schimmel	dc860b88ce	mlxsw: spectrum_router: Ignore routes using a deleted nexthop object Routes are currently processed from a workqueue whereas nexthop objects are processed in system call context. This can result in the driver not finding a suitable nexthop group for a route and issuing a warning [1]. Fix this by ignoring such routes earlier in the process. The subsequent deletion notification will be ignored as well. [1] WARNING: CPU: 2 PID: 7754 at drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:4853 mlxsw_sp_router_fib_event_work+0x1112/0x1e00 [mlxsw_spectrum] [...] CPU: 2 PID: 7754 Comm: kworker/u8:0 Not tainted 5.11.0-rc6-cq-20210207-1 #16 Hardware name: Mellanox Technologies Ltd. MSN2100/SA001390, BIOS 5.6.5 05/24/2018 Workqueue: mlxsw_core_ordered mlxsw_sp_router_fib_event_work [mlxsw_spectrum] RIP: 0010:mlxsw_sp_router_fib_event_work+0x1112/0x1e00 [mlxsw_spectrum] Fixes: `cdd6cfc54c` ("mlxsw: spectrum_router: Allow programming routes with nexthop objects") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reported-by: Alex Veber <alexve@nvidia.com> Tested-by: Alex Veber <alexve@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-26 15:47:53 -08:00
Danielle Ratson	ae9b24ddb6	mlxsw: spectrum_ethtool: Add an external speed to PTYS register Currently, only external bits are added to the PTYS register, whereas there is one external bit that is wrongly marked as internal, and so was recently removed from the register. Add that bit to the PTYS register again, as this bit is no longer internal. Its removal resulted in '100000baseLR4_ER4/Full' link mode no longer being supported, causing a regression on some setups. Fixes: `5bf01b571c` ("mlxsw: spectrum_ethtool: Remove internal speeds from PTYS register") Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reported-by: Eddie Shklaer <eddies@nvidia.com> Tested-by: Eddie Shklaer <eddies@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-26 15:47:53 -08:00
Linus Torvalds	5ad3dbab56	Networking fixes for 5.12-rc1. Rather small batch this time. Current release - regressions: - bcm63xx_enet: fix sporadic kernel panic due to queue length mis-accounting Current release - new code bugs: - bcm4908_enet: fix RX path possible mem leak - bcm4908_enet: fix NAPI poll returned value - stmmac: fix missing spin_lock_init in visconti_eth_dwmac_probe() - sched: cls_flower: validate ct_state for invalid and reply flags Previous releases - regressions: - net: introduce CAN specific pointer in the struct net_device to prevent mis-interpreting memory - phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ8081 - psample: fix netlink skb length with tunnel info Previous releases - always broken: - icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending - wireguard: device: do not generate ICMP for non-IP packets - mptcp: provide subflow aware release function to avoid a mem leak - hsr: add support for EntryForgetTime - r8169: fix jumbo packet handling on RTL8168e - octeontx2-af: fix an off by one in rvu_dbg_qsize_write() - i40e: fix flow for IPv6 next header (extension header) - phy: icplus: call phy_restore_page() when phy_select_page() fails - dpaa_eth: fix the access method for the dpaa_napi_portal Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'net-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Rather small batch this time. Current release - regressions: - bcm63xx_enet: fix sporadic kernel panic due to queue length mis-accounting Current release - new code bugs: - bcm4908_enet: fix RX path possible mem leak - bcm4908_enet: fix NAPI poll returned value - stmmac: fix missing spin_lock_init in visconti_eth_dwmac_probe() - sched: cls_flower: validate ct_state for invalid and reply flags Previous releases - regressions: - net: introduce CAN specific pointer in the struct net_device to prevent mis-interpreting memory - phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ8081 - psample: fix netlink skb length with tunnel info Previous releases - always broken: - icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending - wireguard: device: do not generate ICMP for non-IP packets - mptcp: provide subflow aware release function to avoid a mem leak - hsr: add support for EntryForgetTime - r8169: fix jumbo packet handling on RTL8168e - octeontx2-af: fix an off by one in rvu_dbg_qsize_write() - i40e: fix flow for IPv6 next header (extension header) - phy: icplus: call phy_restore_page() when phy_select_page() fails - dpaa_eth: fix the access method for the dpaa_napi_portal" * tag 'net-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits) r8169: fix jumbo packet handling on RTL8168e net: phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ8081 net: psample: Fix netlink skb length with tunnel info net: broadcom: bcm4908_enet: fix NAPI poll returned value net: broadcom: bcm4908_enet: fix RX path possible mem leak net: hsr: add support for EntryForgetTime net: dsa: sja1105: Remove unneeded cast in sja1105_crc32() ibmvnic: fix a race 
between open and reset net: stmmac: Fix missing spin_lock_init in visconti_eth_dwmac_probe() net: introduce CAN specific pointer in the struct net_device net: usb: qmi_wwan: support ZTE P685M modem wireguard: kconfig: use arm chacha even with no neon wireguard: queueing: get rid of per-peer ring buffers wireguard: device: do not generate ICMP for non-IP packets wireguard: peer: put frequently used members above cache lines wireguard: selftests: test multiple parallel streams wireguard: socket: remove bogus __be32 annotation wireguard: avoid double unlikely() notation when using IS_ERR() net: qrtr: Fix memory leak in qrtr_tun_open vxlan: move debug check after netdev unregister ...	2021-02-25 12:06:25 -08:00
Linus Torvalds	6fbd6cf85a	Kbuild updates for v5.12 - Fix false-positive build warnings for ARCH=ia64 builds - Optimize dictionary size for module compression with xz - Check the compiler and linker versions in Kconfig - Fix misuse of extra-y - Support DWARF v5 debug info - Clamp SUBLEVEL to 255 because stable releases 4.4.x and 4.9.x exceeded the limit - Add generic syscall{tbl,hdr}.sh for cleanups across arches - Minor cleanups of genksyms - Minor cleanups of Kconfig Merge tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Fix false-positive build warnings for ARCH=ia64 builds - Optimize dictionary size for module compression with xz - Check the compiler and linker versions in Kconfig - Fix misuse of extra-y - Support DWARF v5 debug info - Clamp SUBLEVEL to 255 because stable releases 4.4.x and 4.9.x exceeded the limit - Add generic syscall{tbl,hdr}.sh for cleanups across arches - Minor cleanups of genksyms - Minor cleanups of Kconfig * tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (38 
commits) initramfs: Remove redundant dependency of RD_ZSTD on BLK_DEV_INITRD kbuild: remove deprecated 'always' and 'hostprogs-y/m' kbuild: parse C= and M= before changing the working directory kbuild: reuse this-makefile to define abs_srctree kconfig: unify rule of config, menuconfig, nconfig, gconfig, xconfig kconfig: omit --oldaskconfig option for 'make config' kconfig: fix 'invalid option' for help option kconfig: remove dead code in conf_askvalue() kconfig: clean up nested if-conditionals in check_conf() kconfig: Remove duplicate call to sym_get_string_value() Makefile: Remove # characters from compiler string Makefile: reuse CC_VERSION_TEXT kbuild: check the minimum linker version in Kconfig kbuild: remove ld-version macro scripts: add generic syscallhdr.sh scripts: add generic syscalltbl.sh arch: syscalls: remove $(srctree)/ prefix from syscall tables arch: syscalls: add missing FORCE and fix 'targets' to make if_changed work gen_compile_commands: prune some directories kbuild: simplify access to the kernel's version ...	2021-02-25 10:17:31 -08:00
Chuhong Yuan	8eb65fda4a	net/mlx4_core: Add missed mlx4_free_cmd_mailbox() mlx4_do_mirror_rule() forgets to call mlx4_free_cmd_mailbox() to free the memory region allocated by mlx4_alloc_cmd_mailbox() before an exit. Add the missed call to fix it. Fixes: `78efed2751` ("net/mlx4_core: Support mirroring VF DMFS rules on both ports") Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20210221143559.390277-1-hslester96@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-22 19:08:33 -08:00
Linus Torvalds	3672ac8ac0	RDMA 5.12 merge window pull request - Driver updates and bug fixes: siw, hns, bnxt_re, mlx5, efa - Significant rework in rxe to get it ready to have XRC support added - Several rts bug fixes - Big series to get to 'make W=1' cleanness, primarily updating kdocs - Support for creating a RDMA MR from a DMABUF fd to allow PCI peer to peer transfers to GPU VRAM - Device disassociation now works properly with umad - Work to support more than 255 ports on a RDMA device - Further support for the new HNS HIP09 hardware - Coding style cleanups: comma to semicolon, unneeded semicolon/blank lines, remove 'h' printk format, don't check for NULL before kfree, use true/false for bool. Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "This is quite a small cycle, if not for Lee's 70 patches cleaning the kdocs it would be well below typical for patch count. Most of the interesting work here was in the HNS and rxe drivers which got fairly major internal changes. 
Summary: - Driver updates and bug fixes: siw, hns, bnxt_re, mlx5, efa - Significant rework in rxe to get it ready to have XRC support added - Several rts bug fixes - Big series to get to 'make W=1' cleanness, primarily updating kdocs - Support for creating a RDMA MR from a DMABUF fd to allow PCI peer to peer transfers to GPU VRAM - Device disassociation now works properly with umad - Work to support more than 255 ports on a RDMA device - Further support for the new HNS HIP09 hardware - Coding style cleanups: comma to semicolon, unneeded semicolon/blank lines, remove 'h' printk format, don't check for NULL before kfree, use true/false for bool" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (205 commits) RDMA/rtrs-srv: Do not pass a valid pointer to PTR_ERR() RDMA/srp: Fix support for unpopulated and unbalanced NUMA nodes RDMA/mlx5: Fail QP creation if the device can not support the CQE TS RDMA/mlx5: Allow CQ creation without attached EQs RDMA/rtrs-srv-sysfs: fix missing put_device RDMA/rtrs-srv: fix memory leak by missing kobject free RDMA/rtrs: Only allow addition of path to an already established session RDMA/rtrs-srv: Fix stack-out-of-bounds RDMA/rxe: Remove unused pkt->offset RDMA/ucma: Fix use-after-free bug in ucma_create_uevent RDMA/core: Fix kernel doc warnings for ib_port_immutable_read() RDMA/qedr: Use true and false for bool variable RDMA/hns: Adjust definition of FRMR fields RDMA/hns: Refactor process of posting CMDQ RDMA/hns: Adjust fields and variables about CMDQ tail/head RDMA/hns: Remove redundant operations on CMDQ RDMA/hns: Fixes missing error code of CMDQ RDMA/hns: Remove unused member and variable of CMDQ RDMA/ipoib: Remove racy Subnet Manager sendonly join checks RDMA/mlx5: Support 400Gbps IB rate in mlx5 driver ...	2021-02-22 10:27:48 -08:00
Jason Gunthorpe	7289e26f39	Linux 5.11 Merge tag 'v5.11' into rdma.git for-next Linux 5.11 Merged to resolve conflicts with RDMA rc commits - drivers/infiniband/sw/rxe/rxe_net.c The final logic is to call rxe_get_dev_from_net() again with the master netdev if the packet was rx'd on a vlan. To keep the elimination of the local variables requires a trivial edit to the code in -rc Link: https://lore.kernel.org/r/20210210131542.215ea67c@canb.auug.org.au Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-02-18 11:19:29 -04:00
Jakub Kicinski	b646acd5eb	net: re-solve some conflicts after net -> net-next merge Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-16 23:12:23 -08:00
David S. Miller	d489ded1a3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2021-02-16 17:51:13 -08:00
David S. Miller	44c3203975	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== pull-request: mlx5-next 2021-02-16 The patches in this pr are already submitted and reviewed through the netdev and rdma mailing lists. The series includes mlx5 HW bits and definitions for mlx5 real time clock translation and handling in the mlx5 driver clock module to enable and support such mode [1] [1] https://patchwork.kernel.org/project/netdevbpf/patch/20210212223042.449816-7-saeed@kernel.org/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-16 14:53:30 -08:00
Aya Levin	432119de33	net/mlx5: Add cyc2time HW translation mode support Device timestamp can be in real time mode (cycles to time translation is offloaded into the Hardware). With real time mode, HW provides timestamp which is already translated into nanoseconds. With this mode, driver adjusts both the HW and timecounter (to keep clock_info_page updated) using callbacks: adjfreq, adjtime and settime. HW clock modifications are done via MTUTC access reg commands. Driver is allowed to modify HW real time clock only if MCAM ptpcyc2realtime_modify capability is set. Add MTUTC set function to be used for configuring the HW real time clock. Modify existing code to support both internal timer (with conversion via timecounter_cyc2time()) and real time (no conversions). Align the signatures of the helpers converting from timestamp to nanoseconds. With that, when allocating a queue assign the corresponding callback with respect to the capability. Adjust 1PPS timestamp calculation flows based on the timestamp mode. Cyc2time offload brings two major advantages: - Improve MTAE (Max Time Absolute Error) for HW TS by up to 160 ns over a 100% loaded CPU. - Faster data-path timestamp to nanoseconds, as translation is lock-less and done in HW. On real time mode, timestamp format is 32 high bits of seconds and 32 low bits of nanoseconds. On some flows, driver shall convert this format into nanoseconds wall-clock with REAL_TIME_TO_NS macro. HW supports a single clock, and it is shared by all functions on a device. In case real time clock is used, it is recommended to use a single GM for all of the device's functions. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-16 14:04:54 -08:00
Eran Ben Elisha	de19cd6cc9	net/mlx5: Move some PPS logic into helper functions Some of the PPS logic (timestamp calculations) fits only the internal timer timestamp mode. Move this logic into helper functions. Later in the patchset cyc2time HW translation mode will expose its own PPS timestamp calculations. With this change, the main flow will only call PPS logic based on the run time mode. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-16 14:04:54 -08:00
Eran Ben Elisha	d6f3dc8f50	net/mlx5: Move all internal timer metadata into a dedicated struct Internal timer mode (SW clock) requires some PTP clock related metadata structs. Real time mode (HW clock) will not need these metadata structs. This separation emphasizes the different interfaces for HW clock and SW clock. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-16 14:04:54 -08:00
Eran Ben Elisha	1436de0b99	net/mlx5: Refactor init clock function Function mlx5_init_clock() is responsible for internal PTP related metadata initializations. Break mlx5_init_clock() into sub-functions, each taking care of its own logic. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-16 14:04:54 -08:00
Sasha Levin	88a686728b	kbuild: simplify access to the kernel's version Instead of storing the version in a single integer and having various kernel (and userspace) code know how it's constructed, export individual (major, patchlevel, sublevel) components and simplify kernel code that uses it. This should also make it easier on userspace. Signed-off-by: Sasha Levin <sashal@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2021-02-16 12:01:45 +09:00
Vladimir Oltean	e18f4c18ab	net: switchdev: pass flags and mask to both {PRE_,}BRIDGE_FLAGS attributes This switchdev attribute offers a counterproductive API for a driver writer, because although br_switchdev_set_port_flag gets passed a "flags" and a "mask", those are passed piecemeal to the driver, so while the PRE_BRIDGE_FLAGS listener knows what changed because it has the "mask", the BRIDGE_FLAGS listener doesn't, because it only has the final value. But certain drivers can offload only certain combinations of settings, like for example they cannot change unicast flooding independently of multicast flooding - they must be both on or both off. The way the information is passed to switchdev makes drivers not expressive enough, and unable to reject this request ahead of time, in the PRE_BRIDGE_FLAGS notifier, so they are forced to reject it during the deferred BRIDGE_FLAGS attribute, where the rejection is currently ignored. This patch also changes drivers to make use of the "mask" field for edge detection when possible. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-12 17:08:04 -08:00
Vladimir Oltean	4c08c586ff	net: switchdev: propagate extack to port attributes When a struct switchdev_attr is notified through switchdev, there is no way to report informational messages, unlike for struct switchdev_obj. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-12 17:08:04 -08:00
Tariq Toukan	2af3e35c5a	net/mlx5: Remove TLS dependencies on XPS No real dependency on XPS, but on RX queue mapping, which is being selected by TLS_DEVICE. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-11 19:08:06 -08:00
Moshe Shemesh	e1c3940c60	net/mlx5e: Check tunnel offload is required before setting SWP Check that tunnel offload is required before setting Software Parser offsets to get Geneve HW offload. In case of Geneve packet we check HW offload support of SWP in mlx5e_tunnel_features_check() and set features accordingly, this should be reflected in skb offload requested by the kernel and we should add the Software Parser offsets only if requested. Otherwise, in case HW doesn't support SWP for Geneve, data path will mistakenly try to offload Geneve SKBs with skb->encapsulation set, regardless of whether offload was requested or not on this specific SKB. Fixes: `e3cfc7e6b7` ("net/mlx5e: TX, Add geneve tunnel stateless offload support") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-11 18:50:16 -08:00
Oz Shlomo	a217313152	net/mlx5e: CT: manage the lifetime of the ct entry object The ct entry object is accessed by the ct add, del, stats and restore methods. In addition, it is referenced from several hash tables. The lifetime of the ct entry object was not managed which triggered race conditions as in the following kasan dump: [ 3374.973945] ================================================================== [ 3374.988552] BUG: KASAN: use-after-free in memcmp+0x4c/0x98 [ 3374.999590] Read of size 1 at addr ffff00036129ea55 by task ksoftirqd/1/15 [ 3375.016415] CPU: 1 PID: 15 Comm: ksoftirqd/1 Tainted: G O 5.4.31+ #1 [ 3375.055301] Call trace: [ 3375.060214] dump_backtrace+0x0/0x238 [ 3375.067580] show_stack+0x24/0x30 [ 3375.074244] dump_stack+0xe0/0x118 [ 3375.081085] print_address_description.isra.9+0x74/0x3d0 [ 3375.091771] __kasan_report+0x198/0x1e8 [ 3375.099486] kasan_report+0xc/0x18 [ 3375.106324] __asan_load1+0x60/0x68 [ 3375.113338] memcmp+0x4c/0x98 [ 3375.119409] mlx5e_tc_ct_restore_flow+0x3a4/0x6f8 [mlx5_core] [ 3375.131073] mlx5e_rep_tc_update_skb+0x1d4/0x2f0 [mlx5_core] [ 3375.142553] mlx5e_handle_rx_cqe_rep+0x198/0x308 [mlx5_core] [ 3375.154034] mlx5e_poll_rx_cq+0x2a0/0x1060 [mlx5_core] [ 3375.164459] mlx5e_napi_poll+0x1d4/0xa78 [mlx5_core] [ 3375.174453] net_rx_action+0x28c/0x7a8 [ 3375.182004] __do_softirq+0x1b4/0x5d0 Manage the lifetime of the ct entry object by using synchronization mechanisms for concurrent access. Fixes: `ac991b48d4` ("net/mlx5e: CT: Offload established flows") Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-11 18:50:15 -08:00
Shay Drory	edac23c2b3	net/mlx5: Disable devlink reload for lag devices Devlink reload can't be allowed on lag devices since reloading one lag device will cause traffic on the bond to get stuck. Users who wish to reload a lag device need to remove the device from the bond, and only then reload it. Fixes: `4383cfcc65` ("net/mlx5: Add devlink reload") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-11 18:50:15 -08:00
Shay Drory	7ab91f2b03	net/mlx5: Disallow RoCE on lag device In lag mode, enabling or disabling RoCE on a lag device has no effect, e.g., the RoCE status of a bond device (roce/vf_lag) remains unchanged. Therefore disable it and add an error message. Fixes: `cc9defcbb8` ("net/mlx5: Handle "enable_roce" devlink param") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-11 18:50:15 -08:00
Shay Drory	c70f8597fc	net/mlx5: Disallow RoCE on multi port slave device In dual port mode, enabling or disabling RoCE on the slave device has no effect, i.e., the slave device's RoCE status remains unchanged. Therefore disable it and add an error message. Enabling or disabling RoCE on the master device affects both master and slave devices. Fixes: `cc9defcbb8` ("net/mlx5: Handle "enable_roce" devlink param") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-11 18:50:14 -08:00
Shay Drory	d89ddaae17	net/mlx5: Disable devlink reload for multi port slave device Devlink reload can't be allowed on a multi port slave device, because reload of slave device doesn't take effect. The right flow is to disable devlink reload for multi port slave device. Hence, disabling it in mlx5_core probing. Fixes: `4383cfcc65` ("net/mlx5: Add devlink reload") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-11 18:50:14 -08:00
Maxim Mikityanskiy	b850bbff96	net/mlx5e: kTLS, Use refcounts to free kTLS RX priv context wait_for_resync is unreliable - if it times out, priv_rx will be freed anyway. However, mlx5e_ktls_handle_get_psv_completion will be called sooner or later, leading to use-after-free. For example, it can happen if a CQ error happened, and ICOSQ stopped, but later on the queues are destroyed, and ICOSQ is flushed with mlx5e_free_icosq_descs. This patch converts the lifecycle of priv_rx to fully refcount-based, so that the struct won't be freed before the refcount goes to zero. Fixes: `0419d8c9d8` ("net/mlx5e: kTLS, Add kTLS RX resync support") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-11 18:50:13 -08:00
Maxim Mikityanskiy	ebf79b6be6	net/mlx5e: Fix CQ params of ICOSQ and async ICOSQ The commit mentioned below has split the parameters of ICOSQ and async ICOSQ, but it contained a typo: the CQ parameters were swapped for ICOSQ and async ICOSQ. Async ICOSQ is longer than the normal ICOSQ, and the CQ size must be the same as the size of the corresponding SQ, but due to this bug, the CQ of async ICOSQ was much shorter than async ICOSQ itself. It led to overflows of the CQ with such messages in dmesg, in particular, when running multiple kTLS-offloaded streams: mlx5_core 0000:08:00.0: cq_err_event_notifier:529:(pid 9422): CQ error on CQN 0x406, syndrome 0x1 mlx5_core 0000:08:00.0 eth2: mlx5e_cq_error_event: cqn=0x000406 event=0x04 This commit fixes the issue by using the corresponding parameters for ICOSQ and async ICOSQ. Fixes: `c293ac927f` ("net/mlx5e: Refactor build channel params") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-11 18:50:12 -08:00
Maxim Mikityanskiy	4d6e6b0c6d	net/mlx5e: Replace synchronize_rcu with synchronize_net The commit cited below switched from using napi_synchronize to synchronize_rcu to have a guarantee that it will finish in finite time. However, on average, synchronize_rcu takes more time than napi_synchronize. Given that it's called multiple times per channel on deactivation, it accumulates to a significant amount, which causes timeouts in some applications (for example, when using bonding with NetworkManager). This commit replaces synchronize_rcu with synchronize_net, which is faster when called under rtnl_lock, allowing to speed up the described flow. Fixes: `9c25a22dfb` ("net/mlx5e: Use synchronize_rcu to sync with NAPI") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-11 18:50:12 -08:00
Shay Drory	51d138c261	net/mlx5: Fix health error state handling Currently, when we discover a fatal error, we queue work that waits for a lock in order to move the device into error state. Meanwhile, FW commands are still being processed and time out. This can block the driver for a few minutes before the work manages to get the lock and enter error state. Set the device to error state before queueing the health work, in order to avoid FW commands being processed while the work is waiting for the lock. Fixes: `c1d4d2e92a` ("net/mlx5: Avoid calling sleeping function by the health poll thread") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-11 18:50:11 -08:00
Maxim Mikityanskiy	65ba8594a2	net/mlx5e: Change interrupt moderation channel params also when channels are closed struct mlx5e_params contains fields ({rx,tx}_cq_moderation) that depend on two things: whether DIM is enabled and the state of a private flag (MLX5E_PFLAG_{RX,TX}_CQE_BASED_MODER). Whenever the DIM state changes, mlx5e_reset_{rx,tx}_moderation is called to update the fields, however, only if the channels are open. The flow where the channels are closed misses the required update of the fields. This commit moves the calls of mlx5e_reset_{rx,tx}_moderation, so that they run in both flows. Fixes: `ebeaf084ad` ("net/mlx5e: Properly set default values when disabling adaptive moderation") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-11 18:50:11 -08:00
Maxim Mikityanskiy	019f93bc4b	net/mlx5e: Don't change interrupt moderation params when DIM is enabled When mlx5e_ethtool_set_coalesce doesn't change DIM state (enabled/disabled), it calls mlx5e_set_priv_channels_coalesce unconditionally, which in turn invokes a firmware command to set interrupt moderation parameters. It shouldn't happen while DIM manages those parameters dynamically (it might even be happening at the same time). This patch fixes it by splitting mlx5e_set_priv_channels_coalesce into two functions (for RX and TX) and calling them only when DIM is disabled (for RX and TX respectively). Fixes: `cb3c7fd4f8` ("net/mlx5e: Support adaptive RX coalescing") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-11 18:50:10 -08:00
Raed Salem	e33f9f5f2d	net/mlx5e: Enable XDP for Connect-X IPsec capable devices This limitation was inherited from the previous Innova (FPGA) IPsec implementation, which uses its private set of RQ handlers that do not support XDP; for Connect-X this is no longer true. Fix by keeping this limitation only for Innova IPsec supporting devices, as otherwise this limitation wrongly blocks XDP for all future Connect-X devices for all flows even if IPsec offload is not used. Fixes: `2d64663cd5` ("net/mlx5: IPsec: Add HW crypto offload support") Signed-off-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Alaa Hleihel <alaa@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-11 18:50:10 -08:00
Raed Salem	e4484d9df5	net/mlx5e: Enable striding RQ for Connect-X IPsec capable devices This limitation was inherited from the previous Innova (FPGA) IPsec implementation, which uses its private set of RQ handlers that do not support striding RQ; for Connect-X this is no longer true. Fix by keeping this limitation only for Innova IPsec supporting devices, as otherwise this limitation wrongly blocks striding RQs for all future Connect-X devices for all flows even if IPsec offload is not used. Fixes: `2d64663cd5` ("net/mlx5: IPsec: Add HW crypto offload support") Signed-off-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-11 18:50:09 -08:00
Parav Pandit	0e22bfb7c0	net/mlx5e: E-switch, Fix rate calculation for overflow rate_bytes_ps is a 64-bit field. It is passed as a 32-bit field to apply_police_params(). Due to this, when the police rate is higher than 4Gbps, the 32-bit calculation ignores the carry. This results in incorrect rate configuration in the device. Fix it by performing a 64-bit calculation. Fixes: `fcb64c0f56` ("net/mlx5: E-Switch, add ingress rate support") Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-11 18:50:09 -08:00
Wei Yongjun	b50c4892cb	net/mlx5: SF, Fix error return code in mlx5_sf_dev_probe() Fix to return negative error code -ENOMEM from the ioremap() error handling case instead of 0, as done elsewhere in this function. Fixes: `1958fc2f07` ("net/mlx5: SF, Add auxiliary device driver") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-10 20:47:15 -08:00
Wei Yongjun	2b6c3c1e74	net/mlx5e: Fix error return code in mlx5e_tc_esw_init() Fix to return negative error code from the mlx5e_tc_tun_init() error handling case instead of 0, as done elsewhere in this function. This commit also uses 0 instead of 'ret' on success since it is always equal to 0. Fixes: `8914add2c9` ("net/mlx5e: Handle FIB events to update tunnel endpoint device") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-10 20:47:14 -08:00
Dan Carpenter	4782c5d8b9	net/mlx5: Fix a NULL vs IS_ERR() check The mlx5_chains_get_table() function doesn't return NULL, it returns error pointers so we need to fix this condition. Fixes: `34ca65352d` ("net/mlx5: E-Switch, Indirect table infrastructure") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-10 20:47:14 -08:00
Vlad Buslov	36280f0797	net/mlx5e: Fix tc_tun.h to verify MLX5_ESWITCH config Exclude contents of tc_tun.h header when CONFIG_MLX5_ESWITCH is disabled to prevent compile-time errors when compiling with such config. Fixes: `0d9f964714` ("net/mlx5e: Extract tc tunnel encap/decap code to dedicated file") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-10 20:47:13 -08:00
Jiapeng Zhong	793985432d	net/mlx5: Assign boolean values to a bool variable Fix the following coccicheck warnings: ./drivers/net/ethernet/mellanox/mlx5/core/fs_core.c:575:2-14: WARNING: Assignment of 0/1 to bool variable. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-10 20:47:13 -08:00
Colin Ian King	a3f5a45200	net/mlx5e: Fix spelling mistake "Unknouwn" -> "Unknown" There is a spelling mistake in a netdev_warn message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-10 20:47:13 -08:00
Colin Ian King	83907506f7	net/mlx5e: Fix spelling mistake "channles" -> "channels" There is a spelling mistake in a netdev_warn message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-10 20:47:12 -08:00
Zou Wei	b171fcd29c	net/mlx5_core: remove unused including <generated/utsrelease.h> Remove the include of <generated/utsrelease.h>, which is not needed. Fixes: `17a7612b99` ("net/mlx5_core: Clean driver version and name") Signed-off-by: Zou Wei <zou_wei@huawei.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-10 20:47:12 -08:00
Colin Ian King	1b7eb33750	net/mlx5: fix spelling mistake in Kconfig "accelaration" -> "acceleration" There are some spelling mistakes in the Kconfig. Fix these. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-10 20:47:11 -08:00
Amit Cohen	a4cb1c02c3	mlxsw: spectrum_router: Set offload_failed flag When FIB_EVENT_ENTRY_{REPLACE, APPEND} are triggered and route insertion fails, FIB abort is triggered. After aborting, set the appropriate hardware flag to make the kernel emit RTM_NEWROUTE notification with RTM_F_OFFLOAD_FAILED flag. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-08 16:47:03 -08:00
Amit Cohen	0c5fcf9e24	IPv6: Add "offload failed" indication to routes After installing a route to the kernel, user space receives an acknowledgment, which means the route was installed in the kernel, but not necessarily in hardware. The asynchronous nature of route installation in hardware can lead to a routing daemon advertising a route before it was actually installed in hardware. This can result in packet loss or mis-routed packets until the route is installed in hardware. To avoid such cases, previous patch set added the ability to emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags are changed, this behavior is controlled by sysctl. With the above mentioned behavior, it is possible to know from user-space if the route was offloaded, but if the offload fails there is no indication to user-space. Following a failure, a routing daemon will wait indefinitely for a notification that will never come. This patch adds an "offload_failed" indication to IPv6 routes, so that users will have better visibility into the offload process. 'struct fib6_info' is extended with new field that indicates if route offload failed. Note that the new field is added using unused bit and therefore there is no need to increase struct size. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-08 16:47:03 -08:00
Amit Cohen	36c5100e85	IPv4: Add "offload failed" indication to routes After installing a route to the kernel, user space receives an acknowledgment, which means the route was installed in the kernel, but not necessarily in hardware. The asynchronous nature of route installation in hardware can lead to a routing daemon advertising a route before it was actually installed in hardware. This can result in packet loss or mis-routed packets until the route is installed in hardware. To avoid such cases, previous patch set added the ability to emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags are changed, this behavior is controlled by sysctl. With the above mentioned behavior, it is possible to know from user-space if the route was offloaded, but if the offload fails there is no indication to user-space. Following a failure, a routing daemon will wait indefinitely for a notification that will never come. This patch adds an "offload_failed" indication to IPv4 routes, so that users will have better visibility into the offload process. 'struct fib_alias', and 'struct fib_rt_info' are extended with new field that indicates if route offload failed. Note that the new field is added using unused bit and therefore there is no need to increase structs size. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-08 16:47:03 -08:00
Yishai Hadas	db72438c93	RDMA/mlx5: Cleanup the synchronize_srcu() from the ODP flow Cleanup the synchronize_srcu() from the ODP flow as it was found to be a very heavy time consumer as part of dereg_mr. For example, de-registration of 10000 ODP MRs, each with the size of a 2M hugepage, took 19.6 sec, compared to de-registration of the same number of non ODP MRs, which took 172 ms. The new locking scheme uses the wait_event() mechanism, which follows the use count of the MR instead of using synchronize_srcu(). With that change, the above test took 95 ms, which is even better than the non ODP flow. Once the srcu usage was fully dropped, a lock had to be added to protect the XA access. As part of using the above mechanism we could also clean the num_deferred_work stuff and follow the use count instead. Link: https://lore.kernel.org/r/20210202071309.2057998-1-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-02-08 20:31:11 -04:00
Vlad Buslov	8914add2c9	net/mlx5e: Handle FIB events to update tunnel endpoint device Process FIB route update events to dynamically update the stack device rules when tunnel routing changes. Use rtnl lock to prevent FIB event handler from running concurrently with neigh update and neigh stats workqueue tasks. Use encap_tbl_lock mutex to synchronize with TC rule update path that doesn't use rtnl lock. FIB event workflow for encap flows: - Unoffload all flows attached to route encaps from slow or fast path depending on encap destination endpoint neigh state. - Update encap IP header according to new route dev. - Update flows mod_hdr action that is responsible for overwriting reg_c0 source port bits to source port of new underlying VF of new route dev. This step requires changing flow create/delete code to save flow parse attribute mod_hdr_acts structure for whole flow lifetime instead of deallocating it after flow creation. Refactor mod_hdr code to allow saving id of individual mod_hdr actions and updating them with dedicated helper. - Offload all flows to either slow or fast path depending on encap destination endpoint neigh state. FIB event workflow for decap flows: - Unoffload all route flows from hardware. When last route flow is deleted all indirect table rules for the route dev will also be deleted. - Update flow attr decap_vport and destination MAC according to underlying VF of new route dev. - Offload all route flows back to hardware creating new indirect table rules according to updated flow attribute data. Extract some neigh update code to helper functions to be used by both neigh update and route update infrastructure. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-05 20:53:39 -08:00
Vlad Buslov	021905f806	net/mlx5e: Rename some encap-specific API to generic names Some of the encap-specific functions and fields will also be used by route update infrastructure in following patches. Rename them to generic names. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-05 20:53:38 -08:00
Vlad Buslov	c7b9038d8a	net/mlx5e: TC preparation refactoring for routing update event The following patch in the series implements the routing update event, which requires the ability to modify rule match_to_reg modify header actions dynamically during rule lifetime. In order to accommodate such behavior, refactor and extend TC infrastructure in following ways: - Modify mod_hdr infrastructure to preserve its parse attribute for whole rule lifetime, instead of deallocating it after rule creation. - Extend match_to_reg infrastructure with new function mlx5e_tc_match_to_reg_set_and_get_id() that returns mod_hdr action id that can be used afterwards to update the action, and mlx5e_tc_match_to_reg_mod_hdr_change() that can modify existing actions by its id. - Extend tun API with new functions mlx5e_tc_tun_update_header_ipv{4\|6}() that are used to update existing encap entry tunnel header. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-05 20:53:38 -08:00
Vlad Buslov	2221d954d9	net/mlx5e: Refactor neigh update infrastructure Following patches in the series implement route update, which can cause encap entries to migrate between routing devices. Consequently, their parent nhe's also need to be transferable between devices instead of having neigh device as a part of their immutable key. Move neigh device from struct mlx5_neigh to struct mlx5e_neigh_hash_entry and check that nhe and neigh devices are the same in workqueue neigh update handler. Save neigh net_device that can change dynamically in dedicated nhe->dev field. With FIB event handler that is implemented in following patches changing nhe->dev, NETEVENT_DELAY_PROBE_TIME_UPDATE handler can concurrently access the nhe entry when traversing neigh list under rcu read lock. Processing stale values in that handler doesn't change the handler logic, so just wrap all accesses to the dev pointer in {WRITE\|READ}_ONCE() helpers. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-05 20:53:38 -08:00
Vlad Buslov	777bb800c6	net/mlx5e: Create route entry infrastructure Implement dedicated route entry infrastructure to be used in following patch by route update event. Both encap (indirectly through their corresponding encap entries) and decap (directly) flows are attached to routing entry. Since route update also requires updating encap (route device MAC address is a source MAC address of tunnel encapsulation), same encap_tbl_lock mutex is used for synchronization. The new infrastructure looks similar to existing infrastructures for shared encap, mod_hdr and hairpin entries: - Per-eswitch hash table is used for quick entry lookup. - Flows are attached to per-entry linked list and hold reference to entry during their lifetime. - Atomic reference counting and rcu mechanisms are used as synchronization primitives for concurrent access. The infrastructure also enables connection tracking on stacked devices topology by attaching CT chain 0 flow on tunneling dev to decap route entry. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-05 20:53:38 -08:00
Vlad Buslov	0d9f964714	net/mlx5e: Extract tc tunnel encap/decap code to dedicated file Following patches in series extend the extracted code with routing infrastructure. To improve code modularity created a dedicated tc_tun_encap.c source file and move encap/decap related code to the new file. Export code that is used by both regular TC code and encap/decap code into tc_priv.h (new header intended to be used only by TC module). Rename some exported functions by adding "mlx5e_" prefix to their names. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-05 20:53:37 -08:00
Vlad Buslov	8e404fefa5	net/mlx5e: Match recirculated packet miss in slow table using reg_c1 Previous patch in series that implements stack devices RX path implements indirect table rules that match on tunnel VNI. After such rule is created all tunnel traffic is recirculated to root table. However, recirculated packet might not match on any rules installed in the table (for example, when IP traffic follows ARP traffic). In that case packets appear on representor of tunnel endpoint VF instead being redirected to the VF itself. Extend slow table with additional flow group that matches on reg_c0 (source port value set by indirect tables implemented by previous patch in series) and reg_c1 (special 0xFFF mark). When creating offloads fdb tables, install one rule per VF vport to match on recirculated miss packets and redirect them to appropriate VF vport. Modify indirect tables code to also rewrite reg_c1 with special 0xFFF mark. Implementation reuses reg_c1 tunnel id bits. This is safe to do because recirculated packets are always matched before decapsulation. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-05 20:53:37 -08:00
Vlad Buslov	48d216e559	net/mlx5e: Refactor reg_c1 usage Following patch in series uses reg_c1 in eswitch code. To use reg_c1 helpers in both TC and eswitch code, refactor existing helpers according to similar use case of reg_c0 and move the functionality into eswitch.h. Calculate reg mappings length from new defines to ensure that they are always in sync and only need to be changed in single place. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-05 20:53:37 -08:00
Vlad Buslov	a508728a4c	net/mlx5e: VF tunnel RX traffic offloading When tunnel endpoint is on VF the encapsulated RX traffic is exposed on the representor of the VF without any further processing of rules installed on the VF. Detect such case by checking if the device returned by route lookup in decap rule handling code is a mlx5 VF and handle it with new redirection tables API. Example TC rules for VF tunnel traffic: 1. Rule that encapsulates the tunneled flow and redirects packets from source VF rep to tunnel device: $ tc -s filter show dev enp8s0f0_1 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac 0a:40:bd:30:89:99 src_mac ca:2e:a7:3f:f5:0f eth_type ipv4 ip_tos 0/0x3 ip_flags nofrag in_hw in_hw_count 1 action order 1: tunnel_key set src_ip 7.7.7.5 dst_ip 7.7.7.1 key_id 98 dst_port 4789 nocsum ttl 64 pipe index 1 ref 1 bind 1 installed 411 sec used 411 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 no_percpu used_hw_stats delayed action order 2: mirred (Egress Redirect to device vxlan_sys_4789) stolen index 1 ref 1 bind 1 installed 411 sec used 0 sec Action statistics: Sent 5615833 bytes 4028 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 5615833 bytes 4028 pkt backlog 0b 0p requeues 0 cookie bb406d45d343bf7ade9690ae80c7cba4 no_percpu used_hw_stats delayed 2. 
Rule that redirects from tunnel device to UL rep: $ tc -s filter show dev vxlan_sys_4789 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac ca:2e:a7:3f:f5:0f src_mac 0a:40:bd:30:89:99 eth_type ipv4 enc_dst_ip 7.7.7.5 enc_src_ip 7.7.7.1 enc_key_id 98 enc_dst_port 4789 enc_tos 0 ip_flags nofrag in_hw in_hw_count 1 action order 1: tunnel_key unset pipe index 2 ref 1 bind 1 installed 434 sec used 434 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen index 4 ref 1 bind 1 installed 434 sec used 0 sec Action statistics: Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 129936 bytes 1082 pkt backlog 0b 0p requeues 0 cookie ac17cf398c4c69e4a5b2f7aabd1b88ff no_percpu used_hw_stats delayed Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-05 20:53:36 -08:00
Vlad Buslov	4ad9116c84	net/mlx5e: Remove redundant match on tunnel destination mac Remove hardcoded match on tunnel destination MAC address. Such match is no longer required and would be wrong for stacked devices topology where encapsulation destination MAC address will be the address of tunnel VF that can change dynamically on route change (implemented in following patches in the series). Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-05 20:53:36 -08:00
Vlad Buslov	34ca65352d	net/mlx5: E-Switch, Indirect table infrastructure Indirect table infrastructure is used to allow fully processing VF tunnel traffic in hardware. Kernel software model uses two TC rules for such traffic: UL rep to tunnel device, then tunnel VF rep to destination VF rep. To implement such pipeline driver needs to program the hardware after matching on UL rule to overwrite source vport from UL to tunnel VF and recirculate the packet to the root table to allow matching on the rule installed on tunnel VF. For this indirect table matches all encapsulated traffic by tunnel parameters and all other IP traffic is sent to tunnel VF by the miss rule. Indirect table API overview: - mlx5_esw_indir_table_{init\|destroy}() - init and destroy opaque indirect table object. - mlx5_esw_indir_table_get() - get or create new table according to vport id and IP version. Table has following pre-created groups: recirculation group with match on ethertype and VNI (rules that match encapsulated packets are installed to this group) and forward group with default/miss rule that forwards to vport of tunnel endpoint VF (rule for regular non-encapsulated packets). - mlx5_esw_indir_table_put() - decrease reference to the indirect table and matching rule (for encapsulated traffic). - mlx5_esw_indir_table_needed() - check that in_port is an uplink port and out_port is VF on the same eswitch, verify that the rule is for IP traffic and source port rewrite functionality can be used. - mlx5_esw_indir_table_decap_vport() - function returns decap vport of flow attribute. Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-05 20:53:36 -08:00
Vlad Buslov	6717986e15	net/mlx5e: Refactor tun routing helpers Refactor tun routing helpers to use dedicated struct mlx5e_tc_tun_route_attr instead of multiple output arguments. This simplifies the callers (no need to keep track of bunch of output param pointers) and allows to unify struct release code in new mlx5e_tc_tun_route_attr_cleanup() helper instead of requiring callers to manually release some of the output parameters that require it. Simplify code by unifying error handling at the end of the function and rearranging code. Remove redundant empty line. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-05 20:53:35 -08:00
Vlad Buslov	10742efc20	net/mlx5e: VF tunnel TX traffic offloading When tunnel endpoint is on VF, driver still assumes that endpoint is on uplink and incorrectly configures encap rule offload according to that assumption. As a result, traffic is sent directly to the uplink and rules installed on representor of tunnel endpoint VF are ignored. Implement following changes to allow offloading tx traffic with tunnel endpoint on VF: - For tunneling flows perform route lookup on route and out devices pair. If out device is uplink and route device is VF of same physical port, then modify packet reg_c_0 metadata register (source port) with the value of VF vport. Use eswitch vhca_id->vport mapping introduced in one of previous patches in the series to obtain vport from route netdevice. - Recirculate encapsulated packets to VF vport in order to apply any flow rules installed on VF representor that match on encapsulated traffic. Only enable support for this functionality when all following conditions are true: - Hardware advertises capability to preserve reg_c_0 value on packet recirculation. - Vport metadata matching is enabled. - Termination tables are to be used by the flow. Example TC rules for VF tunnel traffic: 1. Rule that redirects packets from UL to VF rep that has the tunnel endpoint IP address: $ tc -s filter show dev enp8s0f0 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac 16:c9:a0:2d:69:2c src_mac 0c:42:a1:58:ab:e4 eth_type ipv4 ip_flags nofrag in_hw in_hw_count 1 action order 1: mirred (Egress Redirect to device enp8s0f0_0) stolen index 3 ref 1 bind 1 installed 377 sec used 0 sec Action statistics: Sent 114096 bytes 952 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 114096 bytes 952 pkt backlog 0b 0p requeues 0 cookie 878fa48d8c423fc08c3b6ca599b50a97 no_percpu used_hw_stats delayed 2. 
Rule that decapsulates the tunneled flow and redirects to destination VF representor: $ tc -s filter show dev vxlan_sys_4789 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac ca:2e:a7:3f:f5:0f src_mac 0a:40:bd:30:89:99 eth_type ipv4 enc_dst_ip 7.7.7.5 enc_src_ip 7.7.7.1 enc_key_id 98 enc_dst_port 4789 enc_tos 0 ip_flags nofrag in_hw in_hw_count 1 action order 1: tunnel_key unset pipe index 2 ref 1 bind 1 installed 434 sec used 434 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen index 4 ref 1 bind 1 installed 434 sec used 0 sec Action statistics: Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 129936 bytes 1082 pkt backlog 0b 0p requeues 0 cookie ac17cf398c4c69e4a5b2f7aabd1b88ff no_percpu used_hw_stats delayed Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-05 20:53:35 -08:00
Vlad Buslov	9e51c0a624	net/mlx5: E-Switch, Refactor rule offload forward action processing Following patches in the series extend forwarding functionality with VF tunnel TX and RX handling. Extract action forwarding processing code into dedicated functions to simplify further extensions: - Handle every forwarding case with a dedicated function instead of inline code. - Extract forwarding dest dispatch conditional into helper function esw_setup_dests(). - Unify forwarding cleanup code in error path of mlx5_eswitch_add_offloaded_rule() and in rule deletion code of __mlx5_eswitch_del_rule() in new helper function esw_cleanup_dests() (dual to new esw_setup_dests() helper). This patch does not change functionality. Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-05 20:53:35 -08:00
Vlad Buslov	275c21d6cb	net/mlx5e: Always set attr mdev pointer Eswitch offloads extensions in following patches in the series require attr->esw_attr->in_mdev pointer to always be set. This is already the case for all code paths except mlx5_tc_ct_entry_add_rule() function. Fix the function to assign mdev pointer with priv->mdev value. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-05 20:53:34 -08:00
Vlad Buslov	84ae9c1f29	net/mlx5e: E-Switch, Maintain vhca_id to vport_num mapping Following patches in the series need to be able to map VF netdev to vport. Since it is trivial to obtain vhca_id from netdev, maintain mapping from vhca_id to vport_num inside eswitch offloads using xarray. Provide function mlx5_eswitch_vhca_id_to_vport() to be used by TC code in following patches to obtain the mapping. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-05 20:53:34 -08:00
Mark Bloch	b055ecf582	net/mlx5: E-Switch, Refactor setting source port Setting the source port requires only the E-Switch and vport number. Refactor the function to get those parameters instead of passing the full attribute. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-05 20:53:33 -08:00
Alexander Lobakin	a79afa78e6	net: use the new dev_page_is_reusable() instead of private versions Now we can remove a bunch of identical functions from the drivers and make them use common dev_page_is_reusable(). All {,un}likely() checks are omitted since it's already present in this helper. Also update some comments near the call sites. Suggested-by: David Rientjes <rientjes@google.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-04 18:20:14 -08:00
Danielle Ratson	25a96f057a	mlxsw: ethtool: Pass link mode in use to ethtool Currently, when user space queries the link's parameters, as speed and duplex, each parameter is passed from the driver to ethtool. Instead, pass the link mode bit in use. In Spectrum-1, simply pass the bit that is set to '1' from PTYS register. In Spectrum-2, pass the first link mode bit in the mask of the used link mode. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 18:37:29 -08:00
Danielle Ratson	763ece86f0	mlxsw: ethtool: Add support for setting lanes when autoneg is off Currently, when auto negotiation is set to off, the user can force a specific speed or both speed and duplex. The user cannot influence the number of lanes that will be forced. Add support for setting speed along with lanes so one would be able to choose how many lanes will be forced. When lanes parameter is passed from user space, choose the link mode that its actual width equals to it. Otherwise, the default link mode will be the one that supports the width of the port. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 18:37:29 -08:00
Danielle Ratson	5fc4053df3	mlxsw: ethtool: Remove max lanes filtering Currently, when a speed can be supported by different number of lanes, the supported link modes bitmask contains only link modes with a single number of lanes. This was done in order to prevent auto negotiation on number of lanes after 50G-1-lane and 100G-2-lanes link modes were introduced. For example, if a port's max width is 4, only link modes with 4 lanes will be presented as supported by that port, so 100G is always achieved by 4 lanes of 25G. After the previous patches that allow selection of the number of lanes, auto negotiation on number of lanes becomes practical. Remove that filtering of the maximum number of lanes supported link modes, so indeed all the supported and advertised link modes will be shown. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 18:37:29 -08:00
Jakub Kicinski	390d9b565e	mlx5-updates-2021-02-01 mlx5 netdev updates: 1) Trivial refactoring ahead of the upcoming uplink representor series. 2) Increased RSS table size to 256, for better results 3) Misc. Cleanup and very trivial improvements -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmAY9rQACgkQSD+KveBX +j5PpggAy8h7xd4zUZpFWvoTgmzEUJV04StwdghfR+m7EtlJyU3mqGkbGoWV9d0O Vljh9sRs0V1/CnABThJ/UG5dqkJjU1ZbQDhHK/HLr9U0MggDoJqC1T6OT4+p3TRe Px91P9eYE73chhf1aDUSi9MI+xGvoGI1Dt3K2WX3cHiftl1U11G3w6hiL9/9bNVK xBlHZP6qtqIoFEs0nh7Ze/IsR0v7i+HTdujXy3g7BdJ1Q8hG/mEfmHxZV8YIjX9s huvrIlSXtLKCk8JGknJtgGVPF+5m5K6GlWPg7chZqhkK51G3vJn952LyWhqgwT5r I1S/vccVX4ROYTWaaCVfkTcmRjp1Gg== =iPSk -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2021-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-02-01 mlx5 netdev updates: 1) Trivial refactoring ahead of the upcoming uplink representor series. 2) Increased RSS table size to 256, for better results 3) Misc. 
Cleanup and very trivial improvements * tag 'mlx5-updates-2021-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: DR, Avoid unnecessary csum recalculation on supporting devices net/mlx5e: CT: remove useless conversion to PTR_ERR then ERR_PTR net/mlx5e: accel, remove redundant space net/mlx5e: kTLS, Improve TLS RX workqueue scope net/mlx5e: remove h from printk format specifier net/mlx5e: Increase indirection RQ table size to 256 net/mlx5e: Enable napi in channel's activation stage net/mlx5e: Move representor neigh init into profile enable net/mlx5e: Avoid false lock depenency warning on tc_ht net/mlx5e: Move set vxlan nic info to profile init net/mlx5e: Move netif_carrier_off() out of mlx5e_priv_init() net/mlx5e: Refactor mlx5e_netdev_init/cleanup to mlx5e_priv_init/cleanup net/mxl5e: Add change profile method net/mlx5e: Separate between netdev objects and mlx5e profiles initialization ==================== Link: https://lore.kernel.org/r/20210202065457.613312-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:38:54 -08:00
Amit Cohen	efc42879ec	net: Do not call fib6_info_hw_flags_set() when IPv6 is disabled With the next patch mlxsw and netdevsim will fail in compilation if CONFIG_IPV6 is disabled. Do not call fib6_info_hw_flags_set() when IPv6 is disabled. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 17:45:59 -08:00
Amit Cohen	fbaca8f895	net: Pass 'net' struct as first argument to fib6_info_hw_flags_set() The next patch will emit notification when hardware flags are changed, in case that fib_notify_on_flag_change sysctl is set to 1. To know sysctl values, net struct is needed. This change is consistent with the IPv4 version, which gets 'net' struct as its first argument. Currently, the only callers of this function are mlxsw and netdevsim. Patch the callers to pass net. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 17:45:59 -08:00
Jakub Kicinski	d1e1355aef	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 14:21:31 -08:00
Maor Dickman	a34ffec8af	net/mlx5e: Release skb in case of failure in tc update skb In case of failure in tc update skb the packet is dropped without freeing the skb. Fix this by freeing the skb in case of failure in tc update skb. Fixes: `d6d2778286` ("net/mlx5: E-Switch, Restore chain id on miss") Fixes: `c756909722` ("net/mlx5e: Add tc chains offload support for nic flows") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-01 23:02:02 -08:00
Maxim Mikityanskiy	5a2ba25a55	net/mlx5e: Update max_opened_tc also when channels are closed max_opened_tc is used for stats, so that potentially non-zero stats won't disappear when num_tc decreases. However, mlx5e_setup_tc_mqprio fails to update it in the flow where channels are closed. This commit fixes it. The new value of priv->channels.params.num_tc is always checked on exit. In case of errors it will just be the old value, and in case of success it will be the updated value. Fixes: `05909babce` ("net/mlx5e: Avoid reset netdev stats on configuration changes") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-01 23:02:02 -08:00
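The behaviour described in the commit above amounts to keeping a high-water mark of the number of traffic classes. A minimal sketch, assuming the mark is refreshed on every exit from the setup path; the class `TcState` and its method are hypothetical illustrations, not driver code:

```python
class TcState:
    """Toy model of num_tc and the max_opened_tc high-water mark."""

    def __init__(self, num_tc: int = 1) -> None:
        self.num_tc = num_tc
        self.max_opened_tc = num_tc

    def setup_tc(self, requested_tc: int, fail: bool = False) -> None:
        # On failure num_tc keeps its old value; on success it is updated.
        if not fail:
            self.num_tc = requested_tc
        # Always checked on exit, so the mark never decreases and the
        # stats of previously opened TCs stay visible.
        self.max_opened_tc = max(self.max_opened_tc, self.num_tc)


state = TcState(num_tc=1)
state.setup_tc(4)
state.setup_tc(2)
print(state.num_tc, state.max_opened_tc)
```

Because the update runs on both the success and the failure path, the mark reflects the largest number of TCs that was ever successfully configured.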
Maor Gottlieb	a5bfe6b467	net/mlx5: Fix leak upon failure of rule creation When creation of a new rule that requires allocation of an FTE fails, tree_put_node needs to be called on the FTE in order to release its resources. Fixes: `cefc23554f` ("net/mlx5: Fix FTE cleanup") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Alaa Hleihel <alaa@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-01 23:02:01 -08:00
Daniel Jurgens	ed5e83a3c0	net/mlx5: Fix function calculation for page trees The function calculation always results in a value of 0. This works generally, but when the release all pages feature is enabled it will result in crashes. Fixes: `0aa128475d` ("net/mlx5: Maintain separate page trees for ECPF and PF functions") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-01 23:02:01 -08:00
Yevgeny Kliteynik	a283ea1b97	net/mlx5: DR, Avoid unnecessary csum recalculation on supporting devices If as part of the actions the TTL of the packet is modified, the packet's checksum needs to be recalculated. Connect-X6DX can handle this csum recalculation natively. Older devices require this additional recalculation. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-01 22:52:36 -08:00
Saeed Mahameed	902c024589	net/mlx5e: CT: remove useless conversion to PTR_ERR then ERR_PTR Just return the ptr directly. Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-01 22:52:36 -08:00
Saeed Mahameed	8271e341ed	net/mlx5e: accel, remove redundant space Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-01 22:52:36 -08:00
Tariq Toukan	26432001b5	net/mlx5e: kTLS, Improve TLS RX workqueue scope The TLS RX workqueue is needed only when kTLS RX device offload is supported. Move its creation from the general TLS init function to the kTLS RX init. Create it once at init time if supported, and avoid creating/destroying it every time the feature bit is toggled. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-01 22:52:35 -08:00
Tom Rix	1d3a3f3bfe	net/mlx5e: remove h from printk format specifier This change fixes the checkpatch warning described in this commit commit `cbacb5ab0a` ("docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi]") Standard integer promotion is already done and %hx and %hhx is useless so do not encourage the use of %hh[xudi] or %h[xudi]. Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-01 22:52:35 -08:00
Noam Stolero	1dd55ba2fb	net/mlx5e: Increase indirection RQ table size to 256 Increasing the indirection RQ table size from 128 to 256 improves the packet distribution over the NIC HW queues for various cases. Let's take a look at the following scenario: Assume the RSS result is distributed uniformly and the indirection table is filled with queues in a cyclic manner. Let N be the number of queues on a given setup. If 256%N = 128%N = 0, then all queues have the same probability to be chosen for a given RSS result. This case is neither improved nor degraded by this change. If 256%N != 0 and 128%N != 0, there is a remainder which will favor some queues. Increasing the indirection RQ table size to 256 reduces the ratio between the favored queues' probability of being selected and that of the rest of the queues, and improves the distribution. For example, let's assume the number of queues is 56. For a table size of 128, we have 128%56=16 queues which will have a 3/128 probability to be chosen and 2/128 for the remaining 40. These 16 queues have 1.5 times the probability to be chosen compared with the other 40. For a table size of 256, we have 256%56=32 queues which will have a 5/256 probability to be chosen and 4/256 probability for the remaining 24 queues. Here 32 queues have 1.25 times the probability to be chosen compared with the other 24. This shows that the larger indirection table size is more likely to yield an even distribution. This change also aligns our mlx5 driver's indirection table size with other vendors. Signed-off-by: Noam Stolero <noams@nvidia.com> Reviewed-by: Tal Gilboa <talgi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-01 22:52:35 -08:00
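The arithmetic in the commit message above can be checked with a short sketch. It makes the same assumptions as the message (a uniformly distributed RSS result and a table filled with queue indices cyclically); the helper names `indirection_counts` and `favored_ratio` are hypothetical and not part of the driver:

```python
from collections import Counter
from fractions import Fraction


def indirection_counts(table_size: int, num_queues: int) -> Counter:
    """Count how many table slots each queue gets under cyclic filling."""
    return Counter(slot % num_queues for slot in range(table_size))


def favored_ratio(table_size: int, num_queues: int) -> Fraction:
    """Slot count of the most-favored queue divided by the least-favored."""
    counts = indirection_counts(table_size, num_queues)
    return Fraction(max(counts.values()), min(counts.values()))


for size in (128, 256):
    counts = indirection_counts(size, 56)
    top = max(counts.values())
    favored = sum(1 for c in counts.values() if c == top)
    # Prints: table size, number of favored queues, favored/unfavored ratio
    print(size, favored, float(favored_ratio(size, 56)))
```

For 56 queues this reports 16 favored queues with a ratio of 1.5 at table size 128, and 32 favored queues with a ratio of 1.25 at table size 256, matching the figures in the message.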
Tariq Toukan	7637e499e2	net/mlx5e: Enable napi in channel's activation stage The channel's napi is first needed upon activation, not creation. Minimize its enabled scope by moving it from the channel's open/close stage into the activate/deactivate stage. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-01 22:52:34 -08:00
Roi Dayan	6b424e13b0	net/mlx5e: Move representor neigh init into profile enable Also cleanup neigh in profile disable. This is for logical separation. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-01 22:52:34 -08:00
Roi Dayan	9ba33339c0	net/mlx5e: Avoid false lock depenency warning on tc_ht To avoid false lock dependency warning set the tc_ht lock class different than the lock class of the ht being used when deleting last flow from a group and then deleting a group, we get into del_sw_flow_group() which call rhashtable_destroy on fg->ftes_hash which will take ht->mutex but it's different than the ht->mutex here. ====================================================== WARNING: possible circular locking dependency detected 5.11.0-rc4_net_next_mlx5_949fdcc #1 Not tainted ------------------------------------------------------ modprobe/12950 is trying to acquire lock: ffff88816510f910 (&node->lock){++++}-{3:3}, at: mlx5_del_flow_rules+0x2a/0x210 [mlx5_core] but task is already holding lock: ffff88815834e3e8 (&ht->mutex){+.+.}-{3:3}, at: rhashtable_free_and_destroy+0x37/0x340 which lock already depends on the new lock. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-01 22:52:34 -08:00
Roi Dayan	84db661247	net/mlx5e: Move set vxlan nic info to profile init Since it is profile dependent, let's init the vxlan info as part of profile initialization. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-01 22:52:33 -08:00
Roi Dayan	1227bbc5d0	net/mlx5e: Move netif_carrier_off() out of mlx5e_priv_init() It's not part of priv initialization. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-01 22:52:33 -08:00
Roi Dayan	c9fd1e33e9	net/mlx5e: Refactor mlx5e_netdev_init/cleanup to mlx5e_priv_init/cleanup We actually initialize priv and not netdev. The only call to set netdev carrier will be moved in the following commit. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-02-01 22:52:32 -08:00
Saeed Mahameed	c4d7eb5768	net/mxl5e: Add change profile method Port nic netdevice will be used as uplink representor in downstream patches. Add change profile method to allow changing a mlx5e netdevice profile dynamically. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com>	2021-02-01 22:52:32 -08:00
Saeed Mahameed	3ef14e463f	net/mlx5e: Separate between netdev objects and mlx5e profiles initialization 1) Initialize netdevice features and structures on netdevice allocation and outside of the mlx5e profile. 2) As now mlx5e netdevice private params will be set up on profile init only after netdevice features are already set, we add a call to netdev_update_features() to resolve any conflict. This is nice since we reuse the fix_features ndo code if a profile wants different default features, instead of duplicating features conflict resolution code on profile initialization. 3) With this we achieve total separation between mlx5e profiles and netdevices, and will allow replacing mlx5e profiles on the fly to reuse the same netdevice for multiple profiles, e.g. for uplink representor profile as shown in the following patch. 4) Profile callbacks are not allowed to touch netdev->features directly anymore, since in a downstream patch we will detach/attach netdev dynamically to profile, hence we move the code dealing with netdev->features from profile->init() to fix_features ndo, and we will call netdev_update_features() on mlx5e_attach_netdev(profile, netdev); Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com>	2021-02-01 22:52:32 -08:00
Jakub Kicinski	1a2b60f6f1	mlx5-dr-2021-01-29 Add support for Connect-X6DX Software steering This series adds SW Steering support for Connect-X6DX. Since the STE and actions formats are different on this new HW, we implemented the HW specific STEv1 layer on the infrastructure implemented in previous mlx5 DR patchset to support all the functionalities as previous devices. Most of the code in this series very is low level HW specific, we implement the function pointers for the generic SW steering layer. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmAUwK8ACgkQSD+KveBX +j61CAgAnNwFuk8PQjdU6TR+nlPl0oQ0mVUIyWvP8giuyXPxTFLxo8wJVKnh9tfB JFHgaEzOXhdE6n3+/vknlN/NsFUpt6Kbg2cBXc65btEKKCdcm/D3Db45TUwu0o3d HE5cEWnJm/Qqvy7JvoVpzbNDcNh91AIdpWt95AxRYBFgWbcKvyz/Bq+DSb22grYz bSU2HMKZKpXtHbxOV0BsZ9b2si6hpIMKRXIofT3F5yVmx6t8M174NmD4u2h6VVaa v7dvZp7ItbnD61iJnKRLa3zftBptifDB/2wsei3W4wmfdAA1Uw9B2tPNJKboxMSa 8hWiBWE6U72rG6uz4fWd9V0mgkELXA== =CTbF -----END PGP SIGNATURE----- Merge tag 'mlx5-dr-2021-01-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-dr-2021-01-29 Add support for Connect-X6DX Software steering This series adds SW Steering support for Connect-X6DX. Since the STE and actions formats are different on this new HW, we implemented the HW specific STEv1 layer on the infrastructure implemented in previous mlx5 DR patchset to support all the functionalities as previous devices. Most of the code in this series very is low level HW specific, we implement the function pointers for the generic SW steering layer. 
* tag 'mlx5-dr-2021-01-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: DR, Allow SW steering for sw_owner_v2 devices net/mlx5: DR, Copy all 64B whenever replacing STE in the head of miss-list net/mlx5: DR, Use HW specific logic API when writing STE net/mlx5: DR, Use the right size when writing partial STE into HW net/mlx5: DR, Add STEv1 modify header logic net/mlx5: DR, Add STEv1 action apply logic net/mlx5: DR, Add STEv1 setters and getters net/mlx5: DR, Allow native protocol support for HW STEv1 net/mlx5: DR, Add HW STEv1 match logic net/mlx5: DR, Add match STEv1 structs to ifc net/mlx5: DR, Fix potential shift wrapping of 32-bit value ==================== Link: https://lore.kernel.org/r/20210130022618.317351-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-01 18:50:12 -08:00
Yevgeny Kliteynik	64f45c0fc4	net/mlx5: DR, Allow SW steering for sw_owner_v2 devices Allow sw_owner_v2 based on sw_format_version. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-01-29 18:13:00 -08:00
Yevgeny Kliteynik	8fdac12acf	net/mlx5: DR, Copy all 64B whenever replacing STE in the head of miss-list Until now the code assumed that only a reduced size of the STE needs to be copied, because the rest is the mask part, which shouldn't be changed. This is not true for all types of HW (like STEv1). Take all 64B from the new STE and write them in the replaced STE place. This change will make it easier to handle all STE HW types because we have all the data that is about to be written into HW. Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-01-29 18:12:58 -08:00
Yevgeny Kliteynik	4fe45e1d31	net/mlx5: DR, Use HW specific logic API when writing STE STEv0 format and STEv1 HW format are different, each has a different order: STEv0: CTRL 32B, TAG 16B, BITMASK 16B STEv1: CTRL 32B, BITMASK 16B, TAG 16B To make this transparent to upper layers we introduce a new ste_ctx function to format the STE prior to writing it. Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-01-29 18:12:55 -08:00
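The two layouts named in the commit above differ only in the order of the last two 16-byte segments. A minimal sketch of such a reordering step, assuming a 64-byte STE held in STEv0 order; the function `stev0_to_stev1_order` and the constants are hypothetical illustrations and not the driver's ste_ctx API:

```python
CTRL_SIZE = 32  # control segment, first in both formats
PART_SIZE = 16  # size of the TAG segment and of the BITMASK segment
STE_SIZE = CTRL_SIZE + 2 * PART_SIZE  # 64 bytes in total


def stev0_to_stev1_order(ste: bytes) -> bytes:
    """Reorder CTRL|TAG|BITMASK (STEv0) into CTRL|BITMASK|TAG (STEv1)."""
    if len(ste) != STE_SIZE:
        raise ValueError(f"expected {STE_SIZE} bytes, got {len(ste)}")
    ctrl = ste[:CTRL_SIZE]
    tag = ste[CTRL_SIZE:CTRL_SIZE + PART_SIZE]
    bitmask = ste[CTRL_SIZE + PART_SIZE:]
    return ctrl + bitmask + tag


# Distinct fill bytes make the segment order visible in the output.
v0 = b"\xc0" * CTRL_SIZE + b"\x7a" * PART_SIZE + b"\x3b" * PART_SIZE
v1 = stev0_to_stev1_order(v0)
print(v1[CTRL_SIZE:CTRL_SIZE + PART_SIZE].hex())
```

Since the operation only swaps two equally sized segments, applying it twice returns the original buffer.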


7415 Commits