linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-18 01:34:14 +08:00

Author	SHA1	Message	Date
Petr Machata	aa7c062184	mlxsw: spectrum: Track buffer sizes in struct mlxsw_sp_hdroom So far, port buffers were always autoconfigured. When dcbnl_setbuffer callback is implemented, it will allow the user to change the buffer size configuration by hand. The sizes therefore need to be a configuration parameter, not always deduced, and therefore belong to struct mlxsw_sp_hdroom, where the configuration routine should take them from. Update mlxsw_sp_port_headroom_set() to update these sizes. Have the function update the sizes even for the case that a given buffer is not used. Additionally, change the loop iteration end to DCBX_MAX_BUFFERS instead of IEEE_8021QAZ_MAX_TCS. The value is the same, but the semantics differ. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-16 15:19:30 -07:00
Petr Machata	ca21e84e7e	mlxsw: spectrum: Track lossiness in struct mlxsw_sp_hdroom Client-side configuration has lossiness as an attribute of a priority. Therefore add a "lossy" attribute to struct mlxsw_sp_hdroom_prio. To a Spectrum ASIC, lossiness is a feature of a port buffer. Therefore add struct mlxsw_sp_hdroom_buf, which in the following patches will get more attributes, but right now only use it to track port buffer lossiness. Instead of passing around the primary indicators of PFC and pause_en, add a function mlxsw_sp_hdroom_bufs_reset_lossiness() to compute the buffer lossiness from the priority map and priority lossiness. Change mlxsw_sp_port_headroom_set() to take the buffer lossy flag from the headroom configuration. Have the PFC and pause handlers configure priority lossiness in mlxsw_sp_hdroom, from where it will propagate. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-16 15:19:29 -07:00
Petr Machata	5df825ede4	mlxsw: spectrum: Track priorities in struct mlxsw_sp_hdroom The mapping from priorities to buffers determines which buffers should be configured. Lossiness of these priorities combined with the mapping determines whether a given buffer should be lossy. Currently this configuration is stored implicitly in DCB ETS, PFC and ethtool PAUSE configuration. Keeping it together with the rest of the headroom configuration and deriving it as needed from PFC / ETS / PAUSE will make things clearer. To that end, add a field "prios" to struct mlxsw_sp_hdroom. Previously, __mlxsw_sp_port_headroom_set() took prio_tc as an argument, and assumed that the same mapping as we use on the egress should be used on ingress as well. Instead, track this configuration at each priority, so that it can be adjusted flexibly. In the following patches, as dcbnl_setbuffer is implemented, it will need to store its own mapping, and it will also be sometimes necessary to revert back to the original ETS mapping. Therefore track two buffer indices: the one for chip configuration (buf_idx), and the source one (ets_buf_idx). Introduce a function to configure the chip-level buffer index, and for now have it simply copy the ETS mapping over to the chip mapping. Update the ETS handler to project prio_tc to the ets_buf_idx and invoke the buf_idx recomputation. Now that there is a canonical place to look for this configuration, mlxsw_sp_port_headroom_set() does not need to invent def_prio_tc to use if DCB is compiled out. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-16 15:19:29 -07:00
Petr Machata	0103a3e452	mlxsw: spectrum: Track MTU in struct mlxsw_sp_hdroom MTU influences sizes of auto-allocated buffers. Make it a part of port buffer configuration and have __mlxsw_sp_port_headroom_set() take it from there, instead of as an argument. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-16 15:19:29 -07:00
Petr Machata	b7e07bbd48	mlxsw: spectrum: Unify delay handling between PFC and pause When a priority is marked as lossless using DCB PFC, or when pause frames are enabled on a port, mlxsw adds to port buffers an extra space to cover the traffic that will arrive between the time that a pause or PFC frame is emitted, and the time traffic actually stops. This is called the delay. The concept is the same in PFC and pause, however the way the extra buffer space is calculated differs. In this patch, unify this handling. Delay is to be measured in bytes of extra space, and will not include MTU. PFC handler sets the delay directly from the parameter it gets through the DCB interface. To convert pause handler, move MLXSW_SP_PAUSE_DELAY to ethtool module, convert to bytes, and reduce it by maximum MTU, and divide by two. Then it has the same meaning as the delay_bytes set by the PFC handler. Keep the delay_bytes value in struct mlxsw_sp_hdroom introduced in the previous patch. Change PFC and pause handlers to store the new delay value there and have __mlxsw_sp_port_headroom_set() take it from there. Instead of mlxsw_sp_pfc_delay_get() and mlxsw_sp_pg_buf_delay_get(), introduce mlxsw_sp_hdroom_buf_delay_get() to calculate the delay provision. Drop the unnecessary MLXSW_SP_CELL_FACTOR, and instead add an explanatory comment describing the formula used. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-16 15:19:29 -07:00
Petr Machata	3a77f5a2d2	mlxsw: spectrum_buffers: Add struct mlxsw_sp_hdroom The port headroom handling is currently strewn across several modules and tricky to follow: MTU, DCB PFC, DCB ETS and ethtool pause all influence the settings, and then there is the completely separate initial configuration in spectrum_buffers. A following patch will implement the dcbnl_setbuffer callback, which is going to further complicate the landscape. In order to simplify work with port buffers, the following patches are going to centralize all port-buffer handling in spectrum_buffers. As a first step, introduce a (currently empty) struct mlxsw_sp_hdroom that will keep the configuration parameters, and allocate and free it in appropriate places. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-16 15:19:29 -07:00
David S. Miller	045e42f3e6	Merge tag 'mlx5-updates-2020-09-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2020-09-15 Various updates to mlx5 driver, 1) Eli adds support for TC trap action. 2) Eran, minor improvements to clock.c code structure 3) Better handling of error reporting in LAG from Jianbo 4) IPv6 traffic class (DSCP) header rewrite support from Maor 5) Ofer Levi adds support for CQE compression of multi-strides packets 6) Vu, Enables use of vport meta data by default. 7) Some minor code cleanup ==================== Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-16 15:16:51 -07:00
David S. Miller	897dccb8db	Merge branch 'nexthop-Small-changes' Ido Schimmel says: ==================== nexthop: Small changes This patch set contains a few small changes that I split out of the RFC I sent last week [1]. Main change is the conversion of the nexthop notification chain to a blocking chain so that it could be reused by device drivers for nexthop objects programming in the future. Tested with fib_nexthops.sh: Tests passed: 164 Tests failed: 0 [1] https://lore.kernel.org/netdev/20200908091037.2709823-1-idosch@idosch.org/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 16:31:44 -07:00
Ido Schimmel	7a5e9d84f9	selftests: fib_nexthops: Test cleanup of FDB entries following nexthop deletion Commit `c7cdbe2efc` ("vxlan: support for nexthop notifiers") registered a listener in the VXLAN driver to the nexthop notification chain. Its purpose is to clean up FDB entries that use a nexthop that is being deleted. Test that such FDB entries are removed when the nexthop group that they use is deleted. Test that entries are not deleted when a single nexthop in the group is deleted. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 16:31:31 -07:00
Ido Schimmel	0695564bb4	nexthop: Only emit a notification when nexthop is actually deleted Currently, the in-kernel delete notification is emitted from the error path of nexthop_add() and replace_nexthop(), which can be confusing to in-kernel listeners as they are not familiar with the nexthop. Instead, only emit the notification when the nexthop is actually deleted. The following sub-cases are covered: 1. User space deletes the nexthop 2. The nexthop is deleted by the kernel due to a netdev event (e.g., nexthop device going down) 3. A group is deleted because its last nexthop is being deleted 4. The network namespace of the nexthop device is deleted Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 16:31:25 -07:00
Ido Schimmel	80690ec6b5	nexthop: Convert to blocking notification chain Currently, the only listener of the nexthop notification chain is the VXLAN driver. Subsequent patches will add more listeners (e.g., device drivers such as netdevsim) that need to be able to block when processing notifications. Therefore, convert the notification chain to a blocking one. This is safe as notifications are always emitted from process context. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 16:31:17 -07:00
Ido Schimmel	52f7232a79	nexthop: Remove NEXTHOP_EVENT_ADD Not used anywhere. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 16:31:11 -07:00
Ido Schimmel	7d61588f69	nexthop: Remove unused function declaration from header file Not used or implemented anywhere. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 16:31:03 -07:00
Geert Uytterhoeven	e859536dac	chelsio/chtls: Re-add dependencies on CHELSIO_T4 to fix modular CHELSIO_T4 As CHELSIO_INLINE_CRYPTO is bool, and CHELSIO_T4 is tristate, the dependency of CHELSIO_INLINE_CRYPTO on CHELSIO_T4 is not sufficient to protect CRYPTO_DEV_CHELSIO_TLS and CHELSIO_IPSEC_INLINE. The latter two are also tristate, hence if CHELSIO_T4=n, they cannot be builtin, as that would lead to link failures like: drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.c:259: undefined reference to `cxgb4_port_viid' and drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c:752: undefined reference to `cxgb4_reclaim_completed_tx' Fix this by re-adding dependencies on CHELSIO_T4 to tristate symbols. The dependency of CHELSIO_INLINE_CRYPTO on CHELSIO_T4 is kept to avoid asking the user. Fixes: `6bd860ac1c` ("chelsio/chtls: CHELSIO_INLINE_CRYPTO should depend on CHELSIO_T4") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 15:58:52 -07:00
David S. Miller	b18af883dc	Merge branch 'mlxsw-Introduce-fw_fatal-health-reporter-and-test-cmd-to-trigger-test-event' Ido Schimmel says: ==================== mlxsw: Introduce fw_fatal health reporter and test cmd to trigger test event Jiri says: This patch set introduces a health reporter for mlxsw that reports FW fatal events. Alongside that, it introduces a test command that is used to trigger a dummy FW fatal event by user: $ sudo devlink health test pci/0000:03:00.0 reporter fw_fatal $ devlink health pci/0000:03:00.0: reporter fw_fatal state error error 1 recover 0 last_dump_date 2020-07-27 last_dump_time 16:33:27 auto_dump true $ sudo devlink health dump show pci/0000:03:00.0 reporter fw_fatal -j -p { "irisc_id": 0, "event": [ "id": 3 ], "method": "query", "long_process": false, "command_type": "mad", "reg_attr_id": 0 } As a dependency, the FW validation and flashing is moved to core.c. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 15:57:16 -07:00
Jiri Pirko	7d83ee1110	mlxsw: core: Introduce fw_fatal health reporter Introduce devlink health reporter to report FW fatal events. Implement the event listener using MFDE trap and enable the events to be propagated using MFGD register configuration. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 15:57:16 -07:00
Jiri Pirko	e2ce94dc1d	devlink: introduce the health reporter test command Introduce a test command for health reporters. The user can use this command to trigger a test event on a reporter if the reporter supports it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 15:57:16 -07:00
Jiri Pirko	191c0c22b5	mlxsw: reg: Add Monitoring FW General Debug Register Introduce MFGD register that is used to configure firmware debugging. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 15:57:16 -07:00
Jiri Pirko	6ddac9dcb1	mlxsw: reg: Add Monitoring FW Debug Register Introduce MFDE register that is passed through MFDE trap in case of fatal FW event. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 15:57:16 -07:00
Jiri Pirko	703db0ceb8	mlxsw: Move fw_load_policy devlink param into core.c As the fw flashing code was moved to core.c, move the param which is related to it there as well. Remove unnecessary parentheses on the way. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 15:57:16 -07:00
Jiri Pirko	1fb0a49562	mlxsw: core: Push code doing params register/unregister into separate helpers Extract the code calling params register/unregister driver ops into separate functions. Call publish/unpublish unconditionally. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 15:57:16 -07:00
Jiri Pirko	b79cb787ac	mlxsw: Move fw flashing code into core.c As the firmware flashing is not specific to Spectrum, move the code to core.c and avoid one op call and 2 exported symbols. Also, this allows flashing before the driver->init function is called, and possibly doing other core calls in between. Do some small renaming here and there on the way to be consistent with the rest of core.c code. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 15:57:16 -07:00
Jiri Pirko	eab1924a2d	mlxsw: Bump firmware version to XX.2008.1310 Among other changes, this version supports FW monitoring. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 15:57:16 -07:00
David S. Miller	ef8e692d69	Merge branch 'net-stmmac-Add-ethtool-support-for-get-set-channels' Wong Vee Khee says: ==================== net: stmmac: Add ethtool support for get|set channels This patch set adds support for the user to get or set Tx/Rx channels via ethtool. Two of the patches fix bugs introduced upstream that must be fixed for the feature to work. Tested on Intel Tigerlake Platform. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 15:40:04 -07:00
Ong Boon Leong	9f19306d16	net: stmmac: use netif_tx_start|stop_all_queues() function The current implementation of stmmac_stop_all_queues() and stmmac_start_all_queues() will not work correctly when the value of tx_queues_to_use is changed through ethtool -L DEVNAME rx N tx M command. Also, netif_tx_start|stop_all_queues() are only needed in driver open() and close(). Fixes: `c22a3f48` net: stmmac: adding multiple napi mechanism Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 15:39:57 -07:00
Aashish Verma	686cff3d70	net: stmmac: Fix incorrect location to set real_num_rx|tx_queues netif_set_real_num_tx_queues() & netif_set_real_num_rx_queues() should be used to inform network stack about the real Tx & Rx queue (active) number in both stmmac_open() and stmmac_resume(), therefore, we move the code from stmmac_dvr_probe() to stmmac_hw_setup(). Fixes: `c02b7a9145` net: stmmac: use netif_set_real_num_{rx,tx}_queues Signed-off-by: Aashish Verma <aashishx.verma@intel.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 15:39:47 -07:00
Ong Boon Leong	0366f7e06a	net: stmmac: add ethtool support for get/set channels Restructure the NAPI add and delete process so that it can be called from both open() and ethtool_set_channels(). Introduce stmmac_reinit_queues() to handle the transition needed when changing the number of Rx and Tx channels. Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 15:39:31 -07:00
David S. Miller	945c570488	Merge branch 'ethtool-add-pause-frame-stats' Jakub Kicinski says: ==================== ethtool: add pause frame stats This is the first (small) series which exposes some stats via the corresponding ethtool interface. Here (thanks to the extensibility of netlink) we expose pause frame stats via the same interfaces as ethtool -a / -A. In particular the following stats from the standard: - 30.3.4.2 aPAUSEMACCtrlFramesTransmitted - 30.3.4.3 aPAUSEMACCtrlFramesReceived 4 real drivers are converted, I believe we got confirmation from maintainers that all exposed stats match the standard. v3: - fix mlx5 build - adjust the init logic in patch 1 v2: - netdevsim: add missing static - bnxt: fix sparse warning - mlx5: address Saeed's comments ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:26:29 -07:00
Jakub Kicinski	12d342fea1	mlx4: add pause frame stats Check if the pause stats are reported by HW by checking the bitmap. Calculation is based on the order of strings in main_strings from ethtool -S. Hopefully the semantics of these stats match the standard. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:26:29 -07:00
Jakub Kicinski	098d9ed9ef	mlx5: add pause frame stats Plumb through all the indirection and copy some code from ethtool -S. The names of the group indicate that these are the stats we are after (and Saeed confirms it). v3: - fix build in mlx5_rep v2: - drop the ethtool helper and call stats directly - don't pass 0 as initialized to in buffer - use local buffer Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:26:29 -07:00
Jakub Kicinski	eabbe2bb68	ixgbe: add pause frame stats Report standard pause frame stats. They are already aggregated in struct ixgbe_hw_stats. The combination of the registers is suggested as equivalent to PAUSEMACCtrlFramesTransmitted / PAUSEMACCtrlFramesReceived by the Intel 82576EB datasheet; I could not find equivalent information for the HW actually supported by ixgbe. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:26:29 -07:00
Jakub Kicinski	423cffcf6c	bnxt: add pause frame stats These stats are already reported in ethtool -S. Michael confirms they are equivalent to standard stats. v2: - fix sparse warning about endian by using the macro - use u64 for pointer type Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:26:29 -07:00
Jakub Kicinski	242aaf03dc	selftests: add a test for ethtool pause stats Make sure the empty nest is reported even without stats. Make sure reporting only selected stats works fine. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:26:28 -07:00
Jakub Kicinski	ff1f7c17fb	netdevsim: add pause frame stats Add minimal ethtool interface for testing ethtool pause stats. v2: add missing static on nsim_ethtool_ops Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:26:28 -07:00
Jakub Kicinski	8c00bd936f	docs: net: include the new ethtool pause stats in the stats doc Tell people that there now is an interface for querying pause frames. A little bit of restructuring is needed given this is a first source of such statistics. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:26:28 -07:00
Jakub Kicinski	9a27a33027	ethtool: add standard pause stats Currently drivers have to report their pause frames statistics via ethtool -S, and there is a wide variety of names used for these statistics. Add the two statistics defined in IEEE 802.3x to the standard API. Create a new ethtool request header flag for including statistics in the response to GET commands. Always create the ETHTOOL_A_PAUSE_STATS nest in replies when flag is set. Testing if driver declares the op is not a reliable way of checking if any stats will actually be included and therefore we don't want to give the impression that presence of ETHTOOL_A_PAUSE_STATS indicates driver support. Note that this patch does not include PFC counters, which may fit better in dcbnl? But mostly I don't need them/have a setup to test them so I haven't looked deeply into exposing them :) v3: - add a helper for "uninitializing" stats, rather than a cryptic memset() (Andrew) Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:26:28 -07:00
David S. Miller	0f9ad4e759	Merge branch 's390-qeth-next' Julian Wiedmann says: ==================== s390/qeth: updates 2020-09-10 subject to positive review by the bridge maintainers on patch 5, please apply the following patch series to netdev's net-next tree. Alexandra adds BR_LEARNING_SYNC support to qeth. In addition to the main qeth changes (controlling the feature, and raising switchdev events), this also needs - Patch 1 and 2 for some s390/cio infrastructure improvements (acked by Heiko to go in via net-next), and - Patch 5 to introduce a new switchdev_notifier_type, so that a driver can clear all previously learned entries from the bridge FDB in case things go out-of-sync later on. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:21:47 -07:00
Alexandra Winter	521c65b649	s390/qeth: implement ndo_bridge_setlink for learning_sync Documentation/networking/switchdev.txt and 'man bridge' indicate that the learning_sync bridge attribute is used to control whether a given device will sync MAC addresses learned on its device port to a master bridge FDB, where they will show up as 'extern_learn offload'. So we map qeth_l2_dev2br_an_set() to the learning_sync bridge link attribute. Turning off learning_sync will flush all extern_learn entries from the bridge fdb and all pending events from the card's work queue. When the hardware interface goes offline with learning_sync on (e.g. for HW recovery), all extern_learn entries will be flushed from the bridge fdb and all pending events from the card's work queue. When the interface goes online again, it will send new notifications for all MACs that are valid at that time. The learning_sync attribute cannot be modified while the interface is offline. See 'commit `e6e771b3d8` ("s390/qeth: detach netdevice while card is offline")' An alternative implementation would be to always offload the 'learning' attribute of a software bridge to the hardware interface attached to it and thus implicitly enable fdb notification. This was not chosen for 2 reasons: 1) In our case the software bridge is NOT a representation of a hardware switch. It is just connected to a smart NIC that is able to inform about the addresses attached to it. It is not necessarily using source MAC learning for this and other bridgeports can be attached to other NICs with different properties. 2) We want a means to enable this notification explicitly. There may be cases where a bridgeport is set to 'learning', but we do not want to enable the notification. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:21:47 -07:00
Alexandra Winter	780b6e7db2	s390/qeth: implement ndo_bridge_getlink for learning_sync Documentation/networking/switchdev.txt and 'man bridge' indicate that the learning_sync bridge attribute is used to indicate whether a given device will sync MAC addresses learned on its device port to a master bridge FDB. The learning_sync attribute cannot be read while the interface is offline (down). See 'commit `e6e771b3d8` ("s390/qeth: detach netdevice while card is offline")' We return EOPNOTSUPP and not ENODEV in this case, because EOPNOTSUPP is the only rc that is tolerated by 'bridge -d link show'. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:21:47 -07:00
Alexandra Winter	817741a8ea	s390/qeth: Reset address notification in case of buffer overflow In case hardware sends more device-to-bridge-address-change notifications than the qeth-l2 driver can handle, the hardware will send an overflow event and then stop sending any events. It expects software to flush its FDB and start over again. Re-enabling address-change-notification will report all current addresses. In order to re-enable address-change-notification this patch defines the functions qeth_l2_dev2br_an_set() and qeth_l2_dev2br_an_set_cb() to enable or disable dev-to-bridge-address-notification. A following patch will use the learning_sync bridgeport flag to trigger enabling or disabling of address-change-notification, so we define priv->brport_features to store the current setting. BRIDGE_INFO and ADDR_INFO functionality are mutually exclusive, whereas ADDR_INFO and qeth_l2_vnicc* can be used together. Alternative implementations to handle buffer overflow: Just re-enabling notification and adding all newly reported addresses would cover any lost 'add' events, but not the lost 'delete' events. Then these invalid addresses would stay in the bridge FDB as long as the device exists. Setting the net device down and up would be an alternative, but is a bit drastic. If the net device has many secondary addresses this will create many delete/add events at its peers which could destabilize the network segment. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:21:47 -07:00
Alexandra Winter	d05e8e68b0	bridge: Add SWITCHDEV_FDB_FLUSH_TO_BRIDGE notifier so the switchdev can notify the bridge to flush non-permanent fdb entries for this port. This is useful whenever the hardware fdb of the switchdev is reset, but the netdev and the bridgeport are not deleted. Note that this has the same effect as the IFLA_BRPORT_FLUSH attribute. CC: Jiri Pirko <jiri@resnulli.us> CC: Ivan Vecera <ivecera@redhat.com> CC: Roopa Prabhu <roopa@nvidia.com> CC: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:21:47 -07:00
Alexandra Winter	10a6cfc0fc	s390/qeth: Translate address events into switchdev notifiers A qeth-l2 HiperSockets card can show switch-ish behaviour in the sense that it can report all MACs that are reachable via this interface. Just like a switch device, it can notify the software bridge about changes to its fdb. This patch exploits this device-to-bridge-notification and extracts the relevant information from the hardware events to generate notifications to an attached software bridge. There are 2 sources for this information: 1) The reply message of Perform-Network-Subchannel-Operations (PNSO) (operation code ADDR_INFO) reports all addresses that are currently reachable (implemented in a later patch). 2) As long as device-to-bridge-notification is enabled, hardware will generate address change notification events whenever the content of the hardware fdb changes (this patch). The bridge_hostnotify feature (PNSO operation code BRIDGE_INFO) uses the same address change notification events. We need to distinguish between the qeth_pnso_mode values QETH_PNSO_BRIDGEPORT and QETH_PNSO_ADDR_INFO and call a different handler for each, with QETH_PNSO_NONE used when notification is disabled. In both cases, deadlocks must be prevented if the workqueue is drained under lock. bridge_hostnotify generates udev events; there is no intent to do the same for dev2br. Instead this patch will generate SWITCHDEV_FDB_ADD_TO_BRIDGE and SWITCHDEV_FDB_DEL_TO_BRIDGE notifications, which will cause the software bridge to add (or delete) entries to its fdb as 'extern_learn offload'. Documentation/networking/switchdev.txt proposes to add "depends NET_SWITCHDEV" to the driver's Kconfig. This is not done here, so even in absence of the NET_SWITCHDEV module, the QETH_L2 module will still be built, but then the switchdev notifiers will have no effect. No VLAN filtering is done on the entries and VLAN information is not passed on to the bridge fdb entries. This could be added later. For now VLAN interfaces can be defined on the upper bridge interface. Multicast entries are not passed on to the bridge fdb. This could be added later. For now mcast flooding can be used in the bridge. The card reports all MACs that are in its FDB, but we must not pass on MACs that are registered for this interface. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:21:47 -07:00
Alexandra Winter	fa115adff2	s390/qeth: Detect PNSO OC3 capability This patch detects whether device-to-bridge-notification, provided by the Perform Network Subchannel Operation (PNSO) operation code ADDR_INFO (OC3), is supported by this card. A following patch will map this to the learning_sync bridgeport flag, so we store it in priv->brport_hw_features in bridgeport flag format. Only IQD cards provide PNSO. There is a feature bit to indicate whether the machine provides OC3; unfortunately, it is not set on old machines, so PNSO is called to find out. As this will disable notification and is exclusive with bridgeport_notification, this must be done during card initialisation before previous settings are restored. PNSO functionality requires some configuration values that are added to the qeth_card.info structure. Some helper functions are defined to fill them out when the card is brought online, and some other places that can also benefit from these fields are adapted. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:21:47 -07:00
Alexandra Winter	b983aa1f7d	s390/cio: Helper functions to read CSSID, IID, and CHID Add helper functions to expose Channel Subsystem ID (CSSID), MIF Image Id (IID), Channel ID (CHID) and Channel Path ID (CHPID). These values are required by the qeth driver's exploitation of network-address-change-notifications to determine which entries belong to this interface. Store the partition identifier in the system log, as this may be used to map a Linux view to a hardware view for debugging purposes. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:21:46 -07:00
Alexandra Winter	4fea49a79e	s390/cio: Add new Operation Code OC3 to PNSO Add support for operation code 3 (OC3) of the Perform-Network-Subchannel-Operations (PNSO) function of the Channel-Subsystem-Call (CHSC) instruction. PNSO provides 2 operation codes: OC0 - BRIDGE_INFO; OC3 - ADDR_INFO (new). Extend the function calls to pnso to pass the OC, and add the new response code 0108. Support for OC3 is indicated by a flag in the css_general_characteristics. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:21:46 -07:00
Ofer Levi	b7cf0806e8	net/mlx5e: Add CQE compression support for multi-strides packets Add CQE compression support for completions of packets that span multiple strides in a Striding RQ, per the HW capability. In our memory model, we use small strides (256B as of today) for the non-linear SKB mode. This feature allows CQE compression to work for multi-stride packets as well. In this case, decompressing the mini CQE array uses the stride index provided by HW as part of the mini CQE. Before this feature, compression was possible only for single-strided packets, i.e. for packets of up to 256 bytes when in non-linear mode, and the index was maintained by SW. This feature is supported for ConnectX-5 and above. Feature performance test: This was whitebox-tested: we reduced the PCI speed from 125Gb/s to 62.5Gb/s to overload PCI, and manipulated the mlx5 driver to drop incoming packets before building the SKB to achieve low CPU utilization. The outcome is low CPU utilization and a bottleneck on PCI only. Test setup: Server: Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz server, 32 cores NIC: ConnectX-6 DX. Sender side generates 300 byte packets at full PCI bandwidth. Receiver side configuration: Single channel, one CPU processing with one ring allocated. CPU utilization is ~20% while PCI bandwidth is fully utilized. For the generated traffic and an interface MTU of 4500B (to activate the non-linear SKB mode), the packet rate improvement is about 19%, from ~17.6Mpps to ~21Mpps. Without this feature, counters show no CQE compression blocks for this setup, while with the feature, counters show ~20.7Mpps compressed CQEs in ~500K compression blocks. Signed-off-by: Ofer Levi <oferle@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-15 11:59:53 -07:00
Maor Dickman	748cde9a38	net/mlx5e: Add IPv6 traffic class (DSCP) header rewrite support Add support for rewriting the DSCP part of the IPv6 traffic class field. For example, the following commands can be used to offload the rewrite action: OVS: $ ovs-ofctl add-flow ovs-sriov "tcpv6, in_port=REP, \ actions=mod_nw_tos:68, output:NIC" iproute2: $ tc filter add dev REP ingress protocol ipv6 prio 1 flower skip_sw \ ip_proto tcp \ action pedit ex munge ip6 traffic_class set 68 retain 0xfc pipe \ action mirred egress redirect dev NIC Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-15 11:59:53 -07:00
Eli Cohen	f02882102b	net/mlx5e: Add support for tc trap Support tc trap such that packets can explicitly be forwarded to the slow path if they match a specific rule. In the example below, we want packets with source IP 7.7.7.8 to be forwarded to software, in which case they will reach the appropriate representor net device. $ tc filter add dev eth1 protocol ip prio 1 root flower skip_sw \ src_ip 7.7.7.8 action trap Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-15 11:59:52 -07:00
Vu Pham	cd1ef96621	net/mlx5: E-Switch, Use vport metadata matching by default Multiple features use metadata matching, such as bond vport in live migration, multi-port RoCE mode and stacked devices; hence, enable vport metadata matching by default. Fixes: 1e62e222db ("net/mlx5: E-Switch, Use vport metadata matching only when mandatory") Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-15 11:59:52 -07:00
Vu Pham	fc99c3d637	net/mlx5: E-Switch, Setup all vports' metadata to support peer miss rule In a merged eswitch configuration, the peer miss rule is set up for all vports. If metadata is enabled, a peer miss rule with metadata matching will be configured instead of source port matching; however, some vports that have not yet been enabled don't have default_metadata set up, and their default_metadata will be zero. Hence, set up/clean up default metadata for all vports when the eswitch moves in/out of offloads mode. Fixes: 133dcfc577 ("net/mlx5: E-Switch, Alloc and free unique metadata for match") Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-15 11:59:52 -07:00

951150 Commits