linux/drivers/net/ethernet/intel/ice/ice.h

905 lines
26 KiB
C
Raw Normal View History

/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2018, Intel Corporation. */
#ifndef _ICE_H_
#define _ICE_H_
#include <linux/types.h>
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/firmware.h>
#include <linux/netdevice.h>
#include <linux/compiler.h>
#include <linux/etherdevice.h>
#include <linux/skbuff.h>
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
#include <linux/cpumask.h>
#include <linux/rtnetlink.h>
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
#include <linux/if_vlan.h>
#include <linux/dma-mapping.h>
#include <linux/pci.h>
#include <linux/workqueue.h>
ice: implement device flash update via devlink Use the newly added pldmfw library to implement device flash update for the Intel ice networking device driver. This support uses the devlink flash update interface. The main parts of the flash include the Option ROM, the netlist module, and the main NVM data. The PLDM firmware file contains modules for each of these components. Using the pldmfw library, the provided firmware file will be scanned for the three major components, "fw.undi" for the Option ROM, "fw.mgmt" for the main NVM module containing the primary device firmware, and "fw.netlist" containing the netlist module. The flash is separated into two banks, the active bank containing the running firmware, and the inactive bank which we use for update. Each module is updated in a staged process. First, the inactive bank is erased, preparing the device for update. Second, the contents of the component are copied to the inactive portion of the flash. After all components are updated, the driver signals the device to switch the active bank during the next EMP reset (which would usually occur during the next reboot). Although the firmware AdminQ interface does report an immediate status for each command, the NVM erase and NVM write commands receive status asynchronously. The driver must not continue writing until previous erase and write commands have finished. The real status of the NVM commands is returned over the receive AdminQ. Implement a simple interface that uses a wait queue so that the main update thread can sleep until the completion status is reported by firmware. For erasing the inactive banks, this can take quite a while in practice. To help visualize the process to the devlink application and other applications based on the devlink netlink interface, status is reported via the devlink_flash_update_status_notify. While we do report status after each 4k block when writing, there is no real status we can report during erasing. We simply must wait for the complete module erasure to finish. With this implementation, basic flash update for the ice hardware is supported. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24 08:22:03 +08:00
#include <linux/wait.h>
#include <linux/aer.h>
#include <linux/interrupt.h>
#include <linux/ethtool.h>
#include <linux/timer.h>
#include <linux/delay.h>
#include <linux/bitmap.h>
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
#include <linux/log2.h>
#include <linux/ip.h>
#include <linux/sctp.h>
#include <linux/ipv6.h>
#include <linux/pkt_sched.h>
#include <linux/if_bridge.h>
#include <linux/ctype.h>
#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/auxiliary_bus.h>
#include <linux/avf/virtchnl.h>
ice: Implement aRFS Enable accelerated Receive Flow Steering (aRFS). It is used to steer Rx flows to a specific queue. This functionality is triggered by the network stack through ndo_rx_flow_steer and requires Flow Director (ntuple on) to function. The fltr_info is used to add/remove/update flow rules in the HW, the fltr_state is used to determine what to do with the filter with respect to HW and/or SW, and the flow_id is used in co-ordination with the network stack. The work for aRFS is split into two paths: the ndo_rx_flow_steer operation and the ice_service_task. The former is where the kernel hands us an Rx SKB among other items to setup aRFS and the latter is where the driver adds/updates/removes filter rules from HW and updates filter state. In the Rx path the following things can happen: 1. New aRFS entries are added to the hash table and the state is set to ICE_ARFS_INACTIVE so the filter can be updated in HW by the ice_service_task path. 2. aRFS entries have their Rx Queue updated if we receive a pre-existing flow_id and the filter state is ICE_ARFS_ACTIVE. The state is set to ICE_ARFS_INACTIVE so the filter can be updated in HW by the ice_service_task path. 3. aRFS entries marked as ICE_ARFS_TODEL are deleted In the ice_service_task path the following things can happen: 1. New aRFS entries marked as ICE_ARFS_INACTIVE are added or updated in HW. and their state is updated to ICE_ARFS_ACTIVE. 2. aRFS entries are deleted from HW and their state is updated to ICE_ARFS_TODEL. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-12 09:01:46 +08:00
#include <linux/cpu_rmap.h>
ice: replace custom AIM algorithm with kernel's DIM library The ice driver has support for adaptive interrupt moderation, an algorithm for tuning the interrupt rate dynamically. This algorithm is based on various assumptions about ring size, socket buffer size, link speed, SKB overhead, ethernet frame overhead and more. The Linux kernel has support for a dynamic interrupt moderation algorithm known as "dimlib". Replace the custom driver-specific implementation of dynamic interrupt moderation with the kernel's algorithm. The Intel hardware has a different hardware implementation than the originators of the dimlib code had to work with, which requires the driver to use a slightly different set of inputs for the actual moderation values, while getting all the advice from dimlib of better/worse, shift left or right. The change made for this implementation is to use a pair of values for each of the 5 "slots" that the dimlib moderation expects, and the driver will program those pairs when dimlib recommends a slot to use. The currently implementation uses two tables, one for receive and one for transmit, and the pairs of values in each slot set the maximum delay of an interrupt and a maximum number of interrupts per second (both expressed in microseconds). There are two separate kinds of bugs fixed by using DIMLIB, one is UDP single stream send was too slow, and the other is that 8K ping-pong was going to the most aggressive moderation and has much too high latency. The overall result of using DIMLIB is that we meet or exceed our performance expectations set based on the old algorithm. Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-01 05:16:57 +08:00
#include <linux/dim.h>
#include <net/pkt_cls.h>
ice: Add tc-flower filter support for channel Add support to add/delete channel specific filter using tc-flower. For now, only supported action is "skip_sw hw_tc <tc_num>" Filter criteria is specific to channel and it can be combination of L3, L3+L4, L2+L4. Example: MATCH criteria Action --------------------------- src and/or dest IPv4[6]/mask -> Forward to "hw_tc <tc_num>" dest IPv4[6]/mask + dest L4 port -> Forward to "hw_tc <tc_num>" dest MAC + dest L4 port -> Forward to "hw_tc <tc_num>" src IPv4[6]/mask + src L4 port -> Forward to "hw_tc <tc_num>" src MAC + src L4 port -> Forward to "hw_tc <tc_num>" Adding tc-flower filter for channel using "hw_tc" ------------------------------------------------- tc qdisc add dev <ethX> clsact Above two steps are only needed the first time when adding tc-flower filter. tc filter add dev <ethX> protocol ip ingress prio 1 flower \ dst_ip 192.168.0.1/32 ip_proto tcp dst_port 5001 \ skip_sw hw_tc 1 tc filter show dev <ethX> ingress filter protocol ip pref 1 flower chain 0 filter protocol ip pref 1 flower chain 0 handle 0x1 hw_tc 1 eth_type ipv4 ip_proto tcp dst_ip 192.168.0.1 dst_port 5001 skip_sw in_hw in_hw_count 1 Delete specific filter: ------------------------- tc filter del dev <ethx> ingress pref 1 handle 0x1 flower Delete All filters: ------------------ tc filter del dev <ethX> ingress Co-developed-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-16 07:35:17 +08:00
#include <net/tc_act/tc_mirred.h>
#include <net/tc_act/tc_gact.h>
#include <net/ip.h>
#include <net/devlink.h>
#include <net/ipv6.h>
#include <net/xdp_sock.h>
#include <net/xdp_sock_drv.h>
#include <net/geneve.h>
#include <net/gre.h>
#include <net/udp_tunnel.h>
#include <net/vxlan.h>
#if IS_ENABLED(CONFIG_DCB)
#include <scsi/iscsi_proto.h>
#endif /* CONFIG_DCB */
#include "ice_devids.h"
#include "ice_type.h"
#include "ice_txrx.h"
#include "ice_dcb.h"
2018-03-20 22:58:08 +08:00
#include "ice_switch.h"
#include "ice_common.h"
ice: enable ndo_setup_tc support for mqprio_qdisc Add support in driver for TC_QDISC_SETUP_MQPRIO. This support enables instantiation of channels in HW using existing MQPRIO infrastructure which is extended to be offloadable. This provides a mechanism to configure dedicated set of queues for each TC. Configuring channels using "tc mqprio": -------------------------------------- tc qdisc add dev <ethX> root mqprio num_tc 3 map 0 1 2 \ queues 4@0 4@4 4@8 hw 1 mode channel Above command configures 3 TCs having 4 queues each. "hw 1 mode channel" implies offload of channel configuration to HW. When driver processes configuration received via "ndo_setup_tc: QDISC_SETUP_MQPRIO", each TC maps to HW VSI with specified queues. User can optionally specify bandwidth min and max rate limit per TC (see example below). If shaper params like min and/or max bandwidth rate limit are specified, driver configures VSI specific rate limiter in HW. Configuring channels and bandwidth shaper parameters using "tc mqprio": ---------------------------------------------------------------- tc qdisc add dev <ethX> root mqprio \ num_tc 4 map 0 1 2 3 queues 4@0 4@4 4@8 4@12 hw 1 mode channel \ shaper bw_rlimit min_rate 1Gbit 2Gbit 3Gbit 4Gbit \ max_rate 4Gbit 5Gbit 6Gbit 7Gbit Command to view configured TCs: ----------------------------- tc qdisc show dev <ethX> Deleting TCs: ------------ tc qdisc del dev <ethX> root mqprio Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-16 07:35:16 +08:00
#include "ice_flow.h"
2018-03-20 22:58:08 +08:00
#include "ice_sched.h"
#include "ice_idc_int.h"
#include "ice_virtchnl_pf.h"
#include "ice_sriov.h"
#include "ice_ptp.h"
#include "ice_fdir.h"
#include "ice_xsk.h"
ice: Implement aRFS Enable accelerated Receive Flow Steering (aRFS). It is used to steer Rx flows to a specific queue. This functionality is triggered by the network stack through ndo_rx_flow_steer and requires Flow Director (ntuple on) to function. The fltr_info is used to add/remove/update flow rules in the HW, the fltr_state is used to determine what to do with the filter with respect to HW and/or SW, and the flow_id is used in co-ordination with the network stack. The work for aRFS is split into two paths: the ndo_rx_flow_steer operation and the ice_service_task. The former is where the kernel hands us an Rx SKB among other items to setup aRFS and the latter is where the driver adds/updates/removes filter rules from HW and updates filter state. In the Rx path the following things can happen: 1. New aRFS entries are added to the hash table and the state is set to ICE_ARFS_INACTIVE so the filter can be updated in HW by the ice_service_task path. 2. aRFS entries have their Rx Queue updated if we receive a pre-existing flow_id and the filter state is ICE_ARFS_ACTIVE. The state is set to ICE_ARFS_INACTIVE so the filter can be updated in HW by the ice_service_task path. 3. aRFS entries marked as ICE_ARFS_TODEL are deleted In the ice_service_task path the following things can happen: 1. New aRFS entries marked as ICE_ARFS_INACTIVE are added or updated in HW. and their state is updated to ICE_ARFS_ACTIVE. 2. aRFS entries are deleted from HW and their state is updated to ICE_ARFS_TODEL. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-12 09:01:46 +08:00
#include "ice_arfs.h"
#include "ice_repr.h"
#include "ice_eswitch.h"
#include "ice_lag.h"
#define ICE_BAR0 0
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
#define ICE_REQ_DESC_MULTIPLE 32
#define ICE_MIN_NUM_DESC 64
#define ICE_MAX_NUM_DESC 8160
#define ICE_DFLT_MIN_RX_DESC 512
#define ICE_DFLT_NUM_TX_DESC 256
#define ICE_DFLT_NUM_RX_DESC 2048
#define ICE_DFLT_TRAFFIC_CLASS BIT(0)
#define ICE_INT_NAME_STR_LEN (IFNAMSIZ + 16)
#define ICE_AQ_LEN 192
#define ICE_MBXSQ_LEN 64
#define ICE_SBQ_LEN 64
ice: Fix MSI-X vector fallback logic The current MSI-X enablement logic tries to enable best-case MSI-X vectors and if that fails we only support a bare-minimum set. This includes a single MSI-X for 1 Tx and 1 Rx queue and a single MSI-X for the OICR interrupt. Unfortunately, the driver fails to load when we don't get as many MSI-X as requested for a couple reasons. First, the code to allocate MSI-X in the driver tries to allocate num_online_cpus() MSI-X for LAN traffic without caring about the number of MSI-X actually enabled/requested from the kernel for LAN traffic. So, when calling ice_get_res() for the PF VSI, it returns failure because the number of available vectors is less than requested. Fix this by not allowing the PF VSI to allocation more than pf->num_lan_msix MSI-X vectors and pf->num_lan_msix Rx/Tx queues. Limiting the number of queues is done because we don't want more than 1 Tx/Rx queue per interrupt due to performance conerns. Second, the driver assigns pf->num_lan_msix = 2, to account for LAN traffic and the OICR. However, pf->num_lan_msix is only meant for LAN MSI-X. This is causing a failure when the PF VSI tries to allocate/reserve the minimum pf->num_lan_msix because the OICR MSI-X has already been reserved, so there may not be enough MSI-X vectors left. Fix this by setting pf->num_lan_msix = 1 for the failure case. Then the ICE_MIN_MSIX accounts for the LAN MSI-X and the OICR MSI-X needed for the failure case. Update the related defines used in ice_ena_msix_range() to align with the above behavior and remove the unused RDMA defines because RDMA is currently not supported. Also, remove the now incorrect comment. Fixes: 152b978a1f90 ("ice: Rework ice_ena_msix_range") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-01-22 02:38:06 +08:00
#define ICE_MIN_LAN_TXRX_MSIX 1
#define ICE_MIN_LAN_OICR_MSIX 1
#define ICE_MIN_MSIX (ICE_MIN_LAN_TXRX_MSIX + ICE_MIN_LAN_OICR_MSIX)
#define ICE_FDIR_MSIX 2
#define ICE_RDMA_NUM_AEQ_MSIX 4
#define ICE_MIN_RDMA_MSIX 2
#define ICE_ESWITCH_MSIX 1
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
#define ICE_NO_VSI 0xffff
#define ICE_VSI_MAP_CONTIG 0
#define ICE_VSI_MAP_SCATTER 1
#define ICE_MAX_SCATTER_TXQS 16
#define ICE_MAX_SCATTER_RXQS 16
#define ICE_Q_WAIT_RETRY_LIMIT 10
#define ICE_Q_WAIT_MAX_RETRY (5 * ICE_Q_WAIT_RETRY_LIMIT)
#define ICE_MAX_LG_RSS_QS 256
#define ICE_RES_VALID_BIT 0x8000
#define ICE_RES_MISC_VEC_ID (ICE_RES_VALID_BIT - 1)
#define ICE_RES_RDMA_VEC_ID (ICE_RES_MISC_VEC_ID - 1)
/* All VF control VSIs share the same IRQ, so assign a unique ID for them */
#define ICE_RES_VF_CTRL_VEC_ID (ICE_RES_RDMA_VEC_ID - 1)
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
#define ICE_INVAL_Q_INDEX 0xffff
ice: Refactor VSI allocation, deletion and rebuild flow This patch refactors aspects of the VSI allocation, deletion and rebuild flow. Some of the more noteworthy changes are described below. 1) On reset, all switch filters applied in the hardware are lost. In the rebuild flow, only MAC and broadcast filters are being restored. Instead, use a new function ice_replay_all_fltr to restore all the filters that were previously added. To do this, remove calls to ice_remove_vsi_fltr to prevent cleaning out the internal bookkeeping structures that ice_replay_all_fltr uses to replay filters. 2) Introduce a new state bit __ICE_PREPARED_FOR_RESET to distinguish the PF that requested the reset (and consequently prepared for it) from the rest of the PFs. These other PFs will prepare for reset only when they receive an interrupt from the firmware. 3) Use new functions ice_add_vsi and ice_free_vsi to create and destroy VSIs respectively. These functions accept a handle to uniquely identify a VSI. This same handle is required to rebuild the VSI post reset. To prevent confusion, the existing ice_vsi_add was renamed to ice_vsi_init. 4) Enhance ice_vsi_setup for the upcoming SR-IOV changes and expose a new wrapper function ice_pf_vsi_setup to create PF VSIs. Rework the error handling path in ice_setup_pf_sw. 5) Introduce a new function ice_vsi_release_all to release all PF VSIs. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-08-09 21:29:50 +08:00
#define ICE_INVAL_VFID 256
#define ICE_MAX_RXQS_PER_TC 256 /* Used when setting VSI context per TC Rx queues */
#define ICE_CHNL_START_TC 1
#define ICE_MAX_RESET_WAIT 20
#define ICE_VSIQF_HKEY_ARRAY_SIZE ((VSIQF_HKEY_MAX_INDEX + 1) * 4)
#define ICE_DFLT_NETIF_M (NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_LINK)
#define ICE_MAX_MTU (ICE_AQ_SET_MAC_FRAME_SIZE_MAX - ICE_ETH_PKT_HDR_PAD)
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
#define ICE_UP_TABLE_TRANSLATE(val, i) \
(((val) << ICE_AQ_VSI_UP_TABLE_UP##i##_S) & \
ICE_AQ_VSI_UP_TABLE_UP##i##_M)
#define ICE_TX_DESC(R, i) (&(((struct ice_tx_desc *)((R)->desc))[i]))
#define ICE_RX_DESC(R, i) (&(((union ice_32b_rx_flex_desc *)((R)->desc))[i]))
#define ICE_TX_CTX_DESC(R, i) (&(((struct ice_tx_ctx_desc *)((R)->desc))[i]))
#define ICE_TX_FDIRDESC(R, i) (&(((struct ice_fltr_desc *)((R)->desc))[i]))
ice: enable ndo_setup_tc support for mqprio_qdisc Add support in driver for TC_QDISC_SETUP_MQPRIO. This support enables instantiation of channels in HW using existing MQPRIO infrastructure which is extended to be offloadable. This provides a mechanism to configure dedicated set of queues for each TC. Configuring channels using "tc mqprio": -------------------------------------- tc qdisc add dev <ethX> root mqprio num_tc 3 map 0 1 2 \ queues 4@0 4@4 4@8 hw 1 mode channel Above command configures 3 TCs having 4 queues each. "hw 1 mode channel" implies offload of channel configuration to HW. When driver processes configuration received via "ndo_setup_tc: QDISC_SETUP_MQPRIO", each TC maps to HW VSI with specified queues. User can optionally specify bandwidth min and max rate limit per TC (see example below). If shaper params like min and/or max bandwidth rate limit are specified, driver configures VSI specific rate limiter in HW. Configuring channels and bandwidth shaper parameters using "tc mqprio": ---------------------------------------------------------------- tc qdisc add dev <ethX> root mqprio \ num_tc 4 map 0 1 2 3 queues 4@0 4@4 4@8 4@12 hw 1 mode channel \ shaper bw_rlimit min_rate 1Gbit 2Gbit 3Gbit 4Gbit \ max_rate 4Gbit 5Gbit 6Gbit 7Gbit Command to view configured TCs: ----------------------------- tc qdisc show dev <ethX> Deleting TCs: ------------ tc qdisc del dev <ethX> root mqprio Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-16 07:35:16 +08:00
/* Minimum BW limit is 500 Kbps for any scheduler node */
#define ICE_MIN_BW_LIMIT 500
/* User can specify BW in either Kbit/Mbit/Gbit and OS converts it in bytes.
* use it to convert user specified BW limit into Kbps
*/
#define ICE_BW_KBPS_DIVISOR 125
ice: Support link events, reset and rebuild Link events are posted to a PF's admin receive queue (ARQ). This patch adds the ability to detect and process link events. This patch also adds the ability to process resets. The driver can process the following resets: 1) EMP Reset (EMPR) 2) Global Reset (GLOBR) 3) Core Reset (CORER) 4) Physical Function Reset (PFR) EMPR is the largest level of reset that the driver can handle. An EMPR resets the manageability block and also the data path, including PHY and link for all the PFs. The affected PFs are notified of this event through a miscellaneous interrupt. GLOBR is a subset of EMPR. It does everything EMPR does except that it doesn't reset the manageability block. CORER is a subset of GLOBR. It does everything GLOBR does but doesn't reset PHY and link. PFR is a subset of CORER and affects only the given physical function. In other words, PFR can be thought of as a CORER for a single PF. Since only the issuing PF is affected, a PFR doesn't result in the miscellaneous interrupt being triggered. All the resets have the following in common: 1) Tx/Rx is halted and all queues are stopped. 2) All the VSIs and filters programmed for the PF are lost and have to be reprogrammed. 3) Control queue interfaces are reset and have to be reprogrammed. In the rebuild flow, control queues are reinitialized, VSIs are reallocated and filters are restored. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:18 +08:00
/* Macro for each VSI in a PF */
#define ice_for_each_vsi(pf, i) \
for ((i) = 0; (i) < (pf)->num_alloc_vsi; (i)++)
/* Macros for each Tx/Xdp/Rx ring in a VSI */
#define ice_for_each_txq(vsi, i) \
for ((i) = 0; (i) < (vsi)->num_txq; (i)++)
#define ice_for_each_xdp_txq(vsi, i) \
for ((i) = 0; (i) < (vsi)->num_xdp_txq; (i)++)
#define ice_for_each_rxq(vsi, i) \
for ((i) = 0; (i) < (vsi)->num_rxq; (i)++)
/* Macros for each allocated Tx/Rx ring whether used or not in a VSI */
ice: Report stats for allocated queues via ethtool stats It is not safe to have the string table for statistics change order or size over the lifetime of a given netdevice. This is because of the nature of the 3-step process for obtaining stats. First, user space performs a request for the size of the strings table. Second it performs a separate request for the strings themselves, after allocating space for the table. Third, it requests the stats themselves, also allocating space for the table. If the size decreased, there is potential to see garbage data or stats values. In the worst case, we could potentially see stats values become mis-aligned with their strings, so that it looks like a statistic is being reported differently than it actually is. Even worse, if the size increased, there is potential that the strings table or stats table was not allocated large enough and the stats code could access and write to memory it should not, potentially resulting in undefined behavior and system crashes. It isn't even safe if the size always changes under the RTNL lock. This is because the calls take place over multiple user space commands, so it is not possible to hold the RTNL lock for the entire duration of obtaining strings and stats. Further, not all consumers of the ethtool API are the user space ethtool program, and it is possible that one assumes the strings will not change (valid under the current contract), and thus only requests the stats values when requesting stats in a loop. Finally, it's not possible in the general case to detect when the size changes, because it is quite possible that one value which could impact the stat size increased, while another decreased. This would result in the same total number of stats, but reordering them so that stats no longer line up with the strings they belong to. Since only size changes aren't enough, we would need some sort of hash or token to determine when the strings no longer match. This would require extending the ethtool stats commands, but there is no more space in the relevant structures. The real solution to resolve this would be to add a completely new API for stats, probably over netlink. In the ice driver, the only thing impacting the stats that is not constant is the number of queues. Instead of reporting stats for each used queue, report stats for each allocated queue. We do not change the number of queues allocated for a given netdevice, as we pass this into the alloc_etherdev_mq() function to set the num_tx_queues and num_rx_queues. This resolves the potential bugs at the slight cost of displaying many queue statistics which will not be activated. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-08-09 21:28:54 +08:00
#define ice_for_each_alloc_txq(vsi, i) \
for ((i) = 0; (i) < (vsi)->alloc_txq; (i)++)
#define ice_for_each_alloc_rxq(vsi, i) \
for ((i) = 0; (i) < (vsi)->alloc_rxq; (i)++)
#define ice_for_each_q_vector(vsi, i) \
for ((i) = 0; (i) < (vsi)->num_q_vectors; (i)++)
#define ice_for_each_chnl_tc(i) \
for ((i) = ICE_CHNL_START_TC; (i) < ICE_CHNL_MAX_TC; (i)++)
#define ICE_UCAST_PROMISC_BITS (ICE_PROMISC_UCAST_TX | ICE_PROMISC_UCAST_RX)
#define ICE_UCAST_VLAN_PROMISC_BITS (ICE_PROMISC_UCAST_TX | \
ICE_PROMISC_UCAST_RX | \
ICE_PROMISC_VLAN_TX | \
ICE_PROMISC_VLAN_RX)
#define ICE_MCAST_PROMISC_BITS (ICE_PROMISC_MCAST_TX | ICE_PROMISC_MCAST_RX)
#define ICE_MCAST_VLAN_PROMISC_BITS (ICE_PROMISC_MCAST_TX | \
ICE_PROMISC_MCAST_RX | \
ICE_PROMISC_VLAN_TX | \
ICE_PROMISC_VLAN_RX)
#define ice_pf_to_dev(pf) (&((pf)->pdev->dev))
enum ice_feature {
ICE_F_DSCP,
ICE_F_SMA_CTRL,
ICE_F_MAX
};
ice: introduce XDP_TX fallback path Under rare circumstances there might be a situation where a requirement of having XDP Tx queue per CPU could not be fulfilled and some of the Tx resources have to be shared between CPUs. This yields a need for placing accesses to xdp_ring inside a critical section protected by spinlock. These accesses happen to be in the hot path, so let's introduce the static branch that will be triggered from the control plane when driver could not provide Tx queue dedicated for XDP on each CPU. Currently, the design that has been picked is to allow any number of XDP Tx queues that is at least half of a count of CPUs that platform has. For lower number driver will bail out with a response to user that there were not enough Tx resources that would allow configuring XDP. The sharing of rings is signalled via static branch enablement which in turn indicates that lock for xdp_ring accesses needs to be taken in hot path. Approach based on static branch has no impact on performance of a non-fallback path. One thing that is needed to be mentioned is a fact that the static branch will act as a global driver switch, meaning that if one PF got out of Tx resources, then other PFs that ice driver is servicing will suffer. However, given the fact that HW that ice driver is handling has 1024 Tx queues per each PF, this is currently an unlikely scenario. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-19 20:00:03 +08:00
DECLARE_STATIC_KEY_FALSE(ice_xdp_locking_key);
struct ice_channel {
struct list_head list;
u8 type;
u16 sw_id;
u16 base_q;
u16 num_rxq;
u16 num_txq;
u16 vsi_num;
u8 ena_tc;
struct ice_aqc_vsi_props info;
u64 max_tx_rate;
u64 min_tx_rate;
atomic_t num_sb_fltr;
struct ice_vsi *ch_vsi;
};
struct ice_txq_meta {
u32 q_teid; /* Tx-scheduler element identifier */
u16 q_id; /* Entry in VSI's txq_map bitmap */
u16 q_handle; /* Relative index of Tx queue within TC */
u16 vsi_idx; /* VSI index that Tx queue belongs to */
u8 tc; /* TC number that Tx queue belongs to */
};
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
struct ice_tc_info {
u16 qoffset;
u16 qcount_tx;
u16 qcount_rx;
u8 netdev_tc;
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
};
struct ice_tc_cfg {
u8 numtc; /* Total number of enabled TCs */
u16 ena_tc; /* Tx map */
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
struct ice_tc_info tc_info[ICE_MAX_TRAFFIC_CLASS];
};
struct ice_res_tracker {
u16 num_entries;
ice: Refactor interrupt tracking Currently we have two MSI-x (IRQ) trackers, one for OS requested MSI-x entries (sw_irq_tracker) and one for hardware MSI-x vectors (hw_irq_tracker). Generally the sw_irq_tracker has less entries than the hw_irq_tracker because the hw_irq_tracker has entries equal to the max allowed MSI-x per PF and the sw_irq_tracker is mainly the minimum (non SR-IOV portion of the vectors, kernel granted IRQs). All of the non SR-IOV portions of the driver (i.e. LAN queues, RDMA queues, OICR, etc.) take at least one of each type of tracker resource. SR-IOV only grabs entries from the hw_irq_tracker. There are a few issues with this approach that can be seen when doing any kind of device reconfiguration (i.e. ethtool -L, SR-IOV, etc.). One of them being, any time the driver creates an ice_q_vector and associates it to a LAN queue pair it will grab and use one entry from the hw_irq_tracker and one from the sw_irq_tracker. If the indices on these does not match it will cause a Tx timeout, which will cause a reset and then the indices will match up again and traffic will resume. The mismatched indices come from the trackers not being the same size and/or the search_hint in the two trackers not being equal. Another reason for the refactor is the co-existence of features with SR-IOV. If SR-IOV is enabled and the interrupts are taken from the end of the sw_irq_tracker then other features can no longer use this space because the hardware has now given the remaining interrupts to SR-IOV. This patch reworks how we track MSI-x vectors by removing the hw_irq_tracker completely and instead MSI-x resources needed for SR-IOV are determined all at once instead of per VF. This can be done because when creating VFs we know how many are wanted and how many MSI-x vectors each VF needs. This also allows us to start using MSI-x resources from the end of the PF's allowed MSI-x vectors so we are less likely to use entries needed for other features (i.e. RDMA, L2 Offload, etc). This patch also reworks the ice_res_tracker structure by removing the search_hint and adding a new member - "end". Instead of having a search_hint we will always search from 0. The new member, "end", will be used to manipulate the end of the ice_res_tracker (specifically sw_irq_tracker) during runtime based on MSI-x vectors needed by SR-IOV. In the normal case, the end of ice_res_tracker will be equal to the ice_res_tracker's num_entries. The sriov_base_vector member was added to the PF structure. It is used to represent the starting MSI-x index of all the needed MSI-x vectors for all SR-IOV VFs. Depending on how many MSI-x are needed, SR-IOV may have to take resources from the sw_irq_tracker. This is done by setting the sw_irq_tracker->end equal to the pf->sriov_base_vector. When all SR-IOV VFs are removed then the sw_irq_tracker->end is reset back to sw_irq_tracker->num_entries. The sriov_base_vector, along with the VF's number of MSI-x (pf->num_vf_msix), vf_id, and the base MSI-x index on the PF (pf->hw.func_caps.common_cap.msix_vector_first_id), is used to calculate the first HW absolute MSI-x index for each VF, which is used to write to the VPINT_ALLOC[_PCI] and GLINT_VECT2FUNC registers to program the VFs MSI-x PCI configuration bits. Also, the sriov_base_vector is used along with VF's num_vf_msix, vf_id, and q_vector->v_idx to determine the MSI-x register index (used for writing to GLINT_DYN_CTL) within the PF's space. Interrupt changes removed any references to hw_base_vector, hw_oicr_idx, and hw_irq_tracker. Only sw_base_vector, sw_oicr_idx, and sw_irq_tracker variables remain. Change all of these by removing the "sw_" prefix to help avoid confusion with these variables and their use. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-17 01:30:44 +08:00
u16 end;
ice: Replace one-element array with flexible-array member There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. Refactor the code according to the use of a flexible-array member in struct ice_res_tracker, instead of a one-element array and use the struct_size() helper to calculate the size for the allocations. Also, notice that the code below suggests that, currently, two too many bytes are being allocated with devm_kzalloc(), as the total number of entries (pf->irq_tracker->num_entries) for pf->irq_tracker->list[] is _vectors_ and sizeof(*pf->irq_tracker) also includes the size of the one-element array _list_ in struct ice_res_tracker. drivers/net/ethernet/intel/ice/ice_main.c:3511: 3511 /* populate SW interrupts pool with number of OS granted IRQs. */ 3512 pf->num_avail_sw_msix = (u16)vectors; 3513 pf->irq_tracker->num_entries = (u16)vectors; 3514 pf->irq_tracker->end = pf->irq_tracker->num_entries; With this change, the right amount of dynamic memory is now allocated because, contrary to one-element arrays which occupy at least as much space as a single object of the type, flexible-array members don't occupy such space in the containing structure. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.9-rc1/process/deprecated.html#zero-length-and-one-element-arrays Built-tested-by: kernel test robot <lkp@intel.com> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-09-30 03:01:56 +08:00
u16 list[];
};
struct ice_qs_cfg {
struct mutex *qs_mutex; /* will be assigned to &pf->avail_q_mutex */
unsigned long *pf_map;
unsigned long pf_map_size;
unsigned int q_count;
unsigned int scatter_count;
u16 *vsi_map;
u16 vsi_map_offset;
u8 mapping_mode;
};
struct ice_sw {
struct ice_pf *pf;
u16 sw_id; /* switch ID for this switch */
u16 bridge_mode; /* VEB/VEPA/Port Virtualizer */
struct ice_vsi *dflt_vsi; /* default VSI for this switch */
u8 dflt_vsi_ena:1; /* true if above dflt_vsi is enabled */
};
enum ice_pf_state {
ICE_TESTING,
ICE_DOWN,
ICE_NEEDS_RESTART,
ICE_PREPARED_FOR_RESET, /* set by driver when prepared */
ICE_RESET_OICR_RECV, /* set by driver after rcv reset OICR */
ICE_PFR_REQ, /* set by driver */
ICE_CORER_REQ, /* set by driver */
ICE_GLOBR_REQ, /* set by driver */
ICE_CORER_RECV, /* set by OICR handler */
ICE_GLOBR_RECV, /* set by OICR handler */
ICE_EMPR_RECV, /* set by OICR handler */
ICE_SUSPENDED, /* set on module remove path */
ICE_RESET_FAILED, /* set by reset/rebuild */
/* When checking for the PF to be in a nominal operating state, the
* bits that are grouped at the beginning of the list need to be
* checked. Bits occurring before ICE_STATE_NOMINAL_CHECK_BITS will
* be checked. If you need to add a bit into consideration for nominal
* operating state, it must be added before
* ICE_STATE_NOMINAL_CHECK_BITS. Do not move this entry's position
* without appropriate consideration.
*/
ICE_STATE_NOMINAL_CHECK_BITS,
ICE_ADMINQ_EVENT_PENDING,
ICE_MAILBOXQ_EVENT_PENDING,
ICE_SIDEBANDQ_EVENT_PENDING,
ICE_MDD_EVENT_PENDING,
ICE_VFLR_EVENT_PENDING,
ICE_FLTR_OVERFLOW_PROMISC,
ICE_VF_DIS,
ICE_VF_DEINIT_IN_PROGRESS,
ICE_CFG_BUSY,
ICE_SERVICE_SCHED,
ICE_SERVICE_DIS,
ICE_FD_FLUSH_REQ,
ICE_OICR_INTR_DIS, /* Global OICR interrupt disabled */
ICE_MDD_VF_PRINT_PENDING, /* set when MDD event handle */
ICE_VF_RESETS_DISABLED, /* disable resets during ice_remove */
ICE_LINK_DEFAULT_OVERRIDE_PENDING,
ICE_PHY_INIT_COMPLETE,
ICE_FD_VF_FLUSH_CTX, /* set at FD Rx IRQ or timeout */
ICE_STATE_NBITS /* must be last */
};
enum ice_vsi_state {
ICE_VSI_DOWN,
ICE_VSI_NEEDS_RESTART,
ICE_VSI_NETDEV_ALLOCD,
ICE_VSI_NETDEV_REGISTERED,
ICE_VSI_UMAC_FLTR_CHANGED,
ICE_VSI_MMAC_FLTR_CHANGED,
ICE_VSI_VLAN_FLTR_CHANGED,
ICE_VSI_PROMISC_CHANGED,
ICE_VSI_STATE_NBITS /* must be last */
};
/* struct that defines a VSI, associated with a dev */
struct ice_vsi {
struct net_device *netdev;
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
struct ice_sw *vsw; /* switch this VSI is on */
struct ice_pf *back; /* back pointer to PF */
struct ice_port_info *port_info; /* back pointer to port_info */
struct ice_rx_ring **rx_rings; /* Rx ring array */
struct ice_tx_ring **tx_rings; /* Tx ring array */
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
struct ice_q_vector **q_vectors; /* q_vector array */
irqreturn_t (*irq_handler)(int irq, void *data);
u64 tx_linearize;
DECLARE_BITMAP(state, ICE_VSI_STATE_NBITS);
unsigned int current_netdev_flags;
u32 tx_restart;
u32 tx_busy;
u32 rx_buf_failed;
u32 rx_page_failed;
u16 num_q_vectors;
u16 base_vector; /* IRQ base for OS reserved vectors */
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
enum ice_vsi_type type;
u16 vsi_num; /* HW (absolute) index of this VSI */
u16 idx; /* software index in pf->vsi[] */
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
s16 vf_id; /* VF ID for SR-IOV VSIs */
u16 ethtype; /* Ethernet protocol for pause frame */
u16 num_gfltr;
u16 num_bfltr;
/* RSS config */
u16 rss_table_size; /* HW RSS table size */
u16 rss_size; /* Allocated RSS queues */
u8 *rss_hkey_user; /* User configured hash keys */
u8 *rss_lut_user; /* User configured lookup table entries */
u8 rss_lut_type; /* used to configure Get/Set RSS LUT AQ call */
ice: Implement aRFS Enable accelerated Receive Flow Steering (aRFS). It is used to steer Rx flows to a specific queue. This functionality is triggered by the network stack through ndo_rx_flow_steer and requires Flow Director (ntuple on) to function. The fltr_info is used to add/remove/update flow rules in the HW, the fltr_state is used to determine what to do with the filter with respect to HW and/or SW, and the flow_id is used in co-ordination with the network stack. The work for aRFS is split into two paths: the ndo_rx_flow_steer operation and the ice_service_task. The former is where the kernel hands us an Rx SKB among other items to setup aRFS and the latter is where the driver adds/updates/removes filter rules from HW and updates filter state. In the Rx path the following things can happen: 1. New aRFS entries are added to the hash table and the state is set to ICE_ARFS_INACTIVE so the filter can be updated in HW by the ice_service_task path. 2. aRFS entries have their Rx Queue updated if we receive a pre-existing flow_id and the filter state is ICE_ARFS_ACTIVE. The state is set to ICE_ARFS_INACTIVE so the filter can be updated in HW by the ice_service_task path. 3. aRFS entries marked as ICE_ARFS_TODEL are deleted In the ice_service_task path the following things can happen: 1. New aRFS entries marked as ICE_ARFS_INACTIVE are added or updated in HW. and their state is updated to ICE_ARFS_ACTIVE. 2. aRFS entries are deleted from HW and their state is updated to ICE_ARFS_TODEL. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-12 09:01:46 +08:00
/* aRFS members only allocated for the PF VSI */
#define ICE_MAX_ARFS_LIST 1024
#define ICE_ARFS_LST_MASK (ICE_MAX_ARFS_LIST - 1)
struct hlist_head *arfs_fltr_list;
struct ice_arfs_active_fltr_cntrs *arfs_fltr_cntrs;
spinlock_t arfs_lock; /* protects aRFS hash table and filter state */
atomic_t *arfs_last_fltr_id;
u16 max_frame;
u16 rx_buf_len;
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
struct ice_aqc_vsi_props info; /* VSI properties */
/* VSI stats */
struct rtnl_link_stats64 net_stats;
struct ice_eth_stats eth_stats;
struct ice_eth_stats eth_stats_prev;
struct list_head tmp_sync_list; /* MAC filters to be synced */
struct list_head tmp_unsync_list; /* MAC filters to be unsynced */
u8 irqs_ready:1;
u8 current_isup:1; /* Sync 'link up' logging */
u8 stat_offsets_loaded:1;
ice: Fix VF spoofchk There are many things wrong with the function ice_set_vf_spoofchk(). 1. The VSI being modified is the PF VSI, not the VF VSI. 2. We are enabling Rx VLAN pruning instead of Tx VLAN anti-spoof. 3. The spoofchk setting for each VF is not initialized correctly or re-initialized correctly on reset. To fix [1] we need to make sure we are modifying the VF VSI. This is done by using the vf->lan_vsi_idx to index into the PF's VSI array. To fix [2] replace setting Rx VLAN pruning in ice_set_vf_spoofchk() with setting Tx VLAN anti-spoof. To Fix [3] we need to make sure the initial VSI settings match what is done in ice_set_vf_spoofchk() for spoofchk=on. Also make sure this also works for VF reset. This was done by modifying ice_vsi_init() to account for the current spoofchk state of the VF VSI. Because of these changes, Tx VLAN anti-spoof needs to be removed from ice_cfg_vlan_pruning(). This is okay for the VF because this is now controlled from the admin enabling/disabling spoofchk. For the PF, Tx VLAN anti-spoof should not be set. This change requires us to call ice_set_vf_spoofchk() when configuring promiscuous mode for the VF which requires ice_set_vf_spoofchk() to move in order to prevent a forward declaration prototype. Also, add VLAN 0 by default when allocating a VF since the PF is unaware if the guest OS is running the 8021q module. Without this, MDD events will trigger on untagged traffic because spoofcheck is enabled by default. Due to this change, ignore add/delete messages for VLAN 0 from VIRTCHNL since this is added/deleted during VF initialization/teardown respectively and should not be modified. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-12-12 19:12:54 +08:00
u16 num_vlan;
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
/* queue information */
u8 tx_mapping_mode; /* ICE_MAP_MODE_[CONTIG|SCATTER] */
u8 rx_mapping_mode; /* ICE_MAP_MODE_[CONTIG|SCATTER] */
ice: Alloc queue management bitmaps and arrays dynamically The total number of queues available on the device is divided between multiple physical functions (PF) in the firmware and provided to the driver when it gets function capabilities from the firmware. Thus each PF knows how many Tx/Rx queues it has. These queues are then doled out to different VSIs (for LAN traffic, SR-IOV VF traffic, etc.) To track usage of these queues at the PF level, the driver uses two bitmaps avail_txqs and avail_rxqs. At the VSI level (i.e. struct ice_vsi instances) the driver uses two arrays txq_map and rxq_map, to track ownership of VSIs' queues in avail_txqs and avail_rxqs respectively. The aforementioned bitmaps and arrays should be allocated dynamically, because the number of queues supported by a PF is only available once function capabilities have been queried. The current static allocation consumes way more memory than required. This patch removes the DECLARE_BITMAP for avail_txqs and avail_rxqs and instead uses bitmap_zalloc to allocate the bitmaps during init. Similarly txq_map and rxq_map are now allocated in ice_vsi_alloc_arrays. As a result ICE_MAX_TXQS and ICE_MAX_RXQS defines are no longer needed. Also as txq_map and rxq_map are now allocated and freed, some code reordering was required in ice_vsi_rebuild for correct functioning. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-02 16:25:21 +08:00
u16 *txq_map; /* index in pf->avail_txqs */
u16 *rxq_map; /* index in pf->avail_rxqs */
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
u16 alloc_txq; /* Allocated Tx queues */
u16 num_txq; /* Used Tx queues */
u16 alloc_rxq; /* Allocated Rx queues */
u16 num_rxq; /* Used Rx queues */
u16 req_txq; /* User requested Tx queues */
u16 req_rxq; /* User requested Rx queues */
u16 num_rx_desc;
u16 num_tx_desc;
u16 qset_handle[ICE_MAX_TRAFFIC_CLASS];
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
struct ice_tc_cfg tc_cfg;
struct bpf_prog *xdp_prog;
struct ice_tx_ring **xdp_rings; /* XDP ring array */
ice: track AF_XDP ZC enabled queues in bitmap Commit c7a219048e45 ("ice: Remove xsk_buff_pool from VSI structure") silently introduced a regression and broke the Tx side of AF_XDP in copy mode. xsk_pool on ice_ring is set only based on the existence of the XDP prog on the VSI which in turn picks ice_clean_tx_irq_zc to be executed. That is not something that should happen for copy mode as it should use the regular data path ice_clean_tx_irq. This results in a following splat when xdpsock is run in txonly or l2fwd scenarios in copy mode: <snip> [ 106.050195] BUG: kernel NULL pointer dereference, address: 0000000000000030 [ 106.057269] #PF: supervisor read access in kernel mode [ 106.062493] #PF: error_code(0x0000) - not-present page [ 106.067709] PGD 0 P4D 0 [ 106.070293] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 106.074721] CPU: 61 PID: 0 Comm: swapper/61 Not tainted 5.12.0-rc2+ #45 [ 106.081436] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [ 106.092027] RIP: 0010:xp_raw_get_dma+0x36/0x50 [ 106.096551] Code: 74 14 48 b8 ff ff ff ff ff ff 00 00 48 21 f0 48 c1 ee 30 48 01 c6 48 8b 87 90 00 00 00 48 89 f2 81 e6 ff 0f 00 00 48 c1 ea 0c <48> 8b 04 d0 48 83 e0 fe 48 01 f0 c3 66 66 2e 0f 1f 84 00 00 00 00 [ 106.115588] RSP: 0018:ffffc9000d694e50 EFLAGS: 00010206 [ 106.120893] RAX: 0000000000000000 RBX: ffff88984b8c8a00 RCX: ffff889852581800 [ 106.128137] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88984cd8b800 [ 106.135383] RBP: ffff888123b50001 R08: ffff889896800000 R09: 0000000000000800 [ 106.142628] R10: 0000000000000000 R11: ffffffff826060c0 R12: 00000000000000ff [ 106.149872] R13: 0000000000000000 R14: 0000000000000040 R15: ffff888123b50018 [ 106.157117] FS: 0000000000000000(0000) GS:ffff8897e0f40000(0000) knlGS:0000000000000000 [ 106.165332] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 106.171163] CR2: 0000000000000030 CR3: 000000000560a004 CR4: 00000000007706e0 [ 106.178408] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 106.185653] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 106.192898] PKRU: 55555554 [ 106.195653] Call Trace: [ 106.198143] <IRQ> [ 106.200196] ice_clean_tx_irq_zc+0x183/0x2a0 [ice] [ 106.205087] ice_napi_poll+0x3e/0x590 [ice] [ 106.209356] __napi_poll+0x2a/0x160 [ 106.212911] net_rx_action+0xd6/0x200 [ 106.216634] __do_softirq+0xbf/0x29b [ 106.220274] irq_exit_rcu+0x88/0xc0 [ 106.223819] common_interrupt+0x7b/0xa0 [ 106.227719] </IRQ> [ 106.229857] asm_common_interrupt+0x1e/0x40 </snip> Fix this by introducing the bitmap of queues that are zero-copy enabled, where each bit, corresponding to a queue id that xsk pool is being configured on, will be set/cleared within ice_xsk_pool_{en,dis}able and checked within ice_xsk_pool(). The latter is a function used for deciding which napi poll routine is executed. Idea is being taken from our other drivers such as i40e and ixgbe. Fixes: c7a219048e45 ("ice: Remove xsk_buff_pool from VSI structure") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-28 03:52:09 +08:00
unsigned long *af_xdp_zc_qps; /* tracks AF_XDP ZC enabled qps */
u16 num_xdp_txq; /* Used XDP queues */
u8 xdp_mapping_mode; /* ICE_MAP_MODE_[CONTIG|SCATTER] */
ice: set and release switchdev environment Switchdev environment has to be set up when user create VFs and eswitch mode is switchdev. Release is done when user delete all VFs. Data path in this implementation is based on control plane VSI. This VSI is used to pass traffic from port representors to corresponding VFs and vice versa. Default TX rule has to be added to forward packet to control plane VSI. This will redirect packets from VFs which don't match other rules to control plane VSI. On RX side default rule is added on uplink VSI to receive all traffic that doesn't match other rules. When setting switchdev environment all other rules from VFs should be removed. Packet to VFs will be forwarded by control plane VSI. As VF without any mac rules can't send any packet because of antispoof mechanism, VSI antispoof should be turned off on each VFs. To send packet from representor to correct VSI, destination VSI field in TX descriptor will have to be filled. Allow that by setting destination override bit in control plane VSI security config. Packet from VFs will be received on control plane VSI. Driver should decide to which netdev forward the packet. Decision is made based on src_vsi field from descriptor. There is a target netdev list in control plane VSI struct which choose netdev based on src_vsi number. Co-developed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-20 08:08:54 +08:00
struct net_device **target_netdevs;
struct tc_mqprio_qopt_offload mqprio_qopt; /* queue parameters */
/* Channel Specific Fields */
struct ice_vsi *tc_map_vsi[ICE_CHNL_MAX_TC];
u16 cnt_q_avail;
u16 next_base_q; /* next queue to be used for channel setup */
struct list_head ch_list;
u16 num_chnl_rxq;
u16 num_chnl_txq;
u16 ch_rss_size;
ice: Add tc-flower filter support for channel Add support to add/delete channel specific filter using tc-flower. For now, only supported action is "skip_sw hw_tc <tc_num>" Filter criteria is specific to channel and it can be combination of L3, L3+L4, L2+L4. Example: MATCH criteria Action --------------------------- src and/or dest IPv4[6]/mask -> Forward to "hw_tc <tc_num>" dest IPv4[6]/mask + dest L4 port -> Forward to "hw_tc <tc_num>" dest MAC + dest L4 port -> Forward to "hw_tc <tc_num>" src IPv4[6]/mask + src L4 port -> Forward to "hw_tc <tc_num>" src MAC + src L4 port -> Forward to "hw_tc <tc_num>" Adding tc-flower filter for channel using "hw_tc" ------------------------------------------------- tc qdisc add dev <ethX> clsact Above two steps are only needed the first time when adding tc-flower filter. tc filter add dev <ethX> protocol ip ingress prio 1 flower \ dst_ip 192.168.0.1/32 ip_proto tcp dst_port 5001 \ skip_sw hw_tc 1 tc filter show dev <ethX> ingress filter protocol ip pref 1 flower chain 0 filter protocol ip pref 1 flower chain 0 handle 0x1 hw_tc 1 eth_type ipv4 ip_proto tcp dst_ip 192.168.0.1 dst_port 5001 skip_sw in_hw in_hw_count 1 Delete specific filter: ------------------------- tc filter del dev <ethx> ingress pref 1 handle 0x1 flower Delete All filters: ------------------ tc filter del dev <ethX> ingress Co-developed-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-16 07:35:17 +08:00
u16 num_chnl_fltr;
/* store away rss size info before configuring ADQ channels so that,
* it can be used after tc-qdisc delete, to get back RSS setting as
* they were before
*/
u16 orig_rss_size;
/* this keeps tracks of all enabled TC with and without DCB
* and inclusive of ADQ, vsi->mqprio_opt keeps track of queue
* information
*/
u8 all_numtc;
u16 all_enatc;
/* store away TC info, to be used for rebuild logic */
u8 old_numtc;
u16 old_ena_tc;
struct ice_channel *ch;
/* setup back reference, to which aggregator node this VSI
* corresponds to
*/
struct ice_agg_node *agg_node;
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
} ____cacheline_internodealigned_in_smp;
/* struct that defines an interrupt vector */
struct ice_q_vector {
struct ice_vsi *vsi;
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
u16 v_idx; /* index in the vsi->q_vector array. */
u16 reg_idx;
u8 num_ring_rx; /* total number of Rx rings in vector */
u8 num_ring_tx; /* total number of Tx rings in vector */
ice: replace custom AIM algorithm with kernel's DIM library The ice driver has support for adaptive interrupt moderation, an algorithm for tuning the interrupt rate dynamically. This algorithm is based on various assumptions about ring size, socket buffer size, link speed, SKB overhead, ethernet frame overhead and more. The Linux kernel has support for a dynamic interrupt moderation algorithm known as "dimlib". Replace the custom driver-specific implementation of dynamic interrupt moderation with the kernel's algorithm. The Intel hardware has a different hardware implementation than the originators of the dimlib code had to work with, which requires the driver to use a slightly different set of inputs for the actual moderation values, while getting all the advice from dimlib of better/worse, shift left or right. The change made for this implementation is to use a pair of values for each of the 5 "slots" that the dimlib moderation expects, and the driver will program those pairs when dimlib recommends a slot to use. The currently implementation uses two tables, one for receive and one for transmit, and the pairs of values in each slot set the maximum delay of an interrupt and a maximum number of interrupts per second (both expressed in microseconds). There are two separate kinds of bugs fixed by using DIMLIB, one is UDP single stream send was too slow, and the other is that 8K ping-pong was going to the most aggressive moderation and has much too high latency. The overall result of using DIMLIB is that we meet or exceed our performance expectations set based on the old algorithm. Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-01 05:16:57 +08:00
u8 wb_on_itr:1; /* if true, WB on ITR is enabled */
/* in usecs, need to use ice_intrl_to_usecs_reg() before writing this
* value to the device
*/
u8 intrl;
struct napi_struct napi;
struct ice_ring_container rx;
struct ice_ring_container tx;
cpumask_t affinity_mask;
struct irq_affinity_notify affinity_notify;
ice: enable ndo_setup_tc support for mqprio_qdisc Add support in driver for TC_QDISC_SETUP_MQPRIO. This support enables instantiation of channels in HW using existing MQPRIO infrastructure which is extended to be offloadable. This provides a mechanism to configure dedicated set of queues for each TC. Configuring channels using "tc mqprio": -------------------------------------- tc qdisc add dev <ethX> root mqprio num_tc 3 map 0 1 2 \ queues 4@0 4@4 4@8 hw 1 mode channel Above command configures 3 TCs having 4 queues each. "hw 1 mode channel" implies offload of channel configuration to HW. When driver processes configuration received via "ndo_setup_tc: QDISC_SETUP_MQPRIO", each TC maps to HW VSI with specified queues. User can optionally specify bandwidth min and max rate limit per TC (see example below). If shaper params like min and/or max bandwidth rate limit are specified, driver configures VSI specific rate limiter in HW. Configuring channels and bandwidth shaper parameters using "tc mqprio": ---------------------------------------------------------------- tc qdisc add dev <ethX> root mqprio \ num_tc 4 map 0 1 2 3 queues 4@0 4@4 4@8 4@12 hw 1 mode channel \ shaper bw_rlimit min_rate 1Gbit 2Gbit 3Gbit 4Gbit \ max_rate 4Gbit 5Gbit 6Gbit 7Gbit Command to view configured TCs: ----------------------------- tc qdisc show dev <ethX> Deleting TCs: ------------ tc qdisc del dev <ethX> root mqprio Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-16 07:35:16 +08:00
struct ice_channel *ch;
char name[ICE_INT_NAME_STR_LEN];
ice: replace custom AIM algorithm with kernel's DIM library The ice driver has support for adaptive interrupt moderation, an algorithm for tuning the interrupt rate dynamically. This algorithm is based on various assumptions about ring size, socket buffer size, link speed, SKB overhead, ethernet frame overhead and more. The Linux kernel has support for a dynamic interrupt moderation algorithm known as "dimlib". Replace the custom driver-specific implementation of dynamic interrupt moderation with the kernel's algorithm. The Intel hardware has a different hardware implementation than the originators of the dimlib code had to work with, which requires the driver to use a slightly different set of inputs for the actual moderation values, while getting all the advice from dimlib of better/worse, shift left or right. The change made for this implementation is to use a pair of values for each of the 5 "slots" that the dimlib moderation expects, and the driver will program those pairs when dimlib recommends a slot to use. The currently implementation uses two tables, one for receive and one for transmit, and the pairs of values in each slot set the maximum delay of an interrupt and a maximum number of interrupts per second (both expressed in microseconds). There are two separate kinds of bugs fixed by using DIMLIB, one is UDP single stream send was too slow, and the other is that 8K ping-pong was going to the most aggressive moderation and has much too high latency. The overall result of using DIMLIB is that we meet or exceed our performance expectations set based on the old algorithm. Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-01 05:16:57 +08:00
u16 total_events; /* net_dim(): number of interrupts processed */
} ____cacheline_internodealigned_in_smp;
enum ice_pf_flags {
ICE_FLAG_FLTR_SYNC,
ICE_FLAG_RDMA_ENA,
ICE_FLAG_RSS_ENA,
ICE_FLAG_SRIOV_ENA,
ICE_FLAG_SRIOV_CAPABLE,
ICE_FLAG_DCB_CAPABLE,
ICE_FLAG_DCB_ENA,
ICE_FLAG_FD_ENA,
ICE_FLAG_PTP_SUPPORTED, /* PTP is supported by NVM */
ICE_FLAG_PTP, /* PTP is enabled by software */
ICE_FLAG_AUX_ENA,
ICE_FLAG_ADV_FEATURES,
ICE_FLAG_TC_MQPRIO, /* support for Multi queue TC */
ICE_FLAG_CLS_FLOWER,
ICE_FLAG_LINK_DOWN_ON_CLOSE_ENA,
ICE_FLAG_TOTAL_PORT_SHUTDOWN_ENA,
ICE_FLAG_NO_MEDIA,
ICE_FLAG_FW_LLDP_AGENT,
ICE_FLAG_MOD_POWER_UNSUPPORTED,
ICE_FLAG_PHY_FW_LOAD_FAILED,
ICE_FLAG_ETHTOOL_CTXT, /* set when ethtool holds RTNL lock */
ice: introduce legacy Rx flag Add an ethtool "legacy-rx" priv flag for toggling the Rx path. This control knob will be mainly used for build_skb usage as well as buffer size/MTU manipulation. In preparation for adding build_skb support in a way that it takes care of how we set the values of max_frame and rx_buf_len fields of struct ice_vsi. Specifically, in this patch mentioned fields are set to values that will allow us to provide headroom and tailroom in-place. This can be mostly broken down onto following: - for legacy-rx "on" ethtool control knob, old behaviour is kept; - for standard 1500 MTU size configure the buffer of size 1536, as network stack is expecting the NET_SKB_PAD to be provided and NET_IP_ALIGN can have a non-zero value (these can be typically equal to 32 and 2, respectively); - for larger MTUs go with max_frame set to 9k and configure the 3k buffer in case when PAGE_SIZE of underlying arch is less than 8k; 3k buffer is implying the need for order 1 page, so that our page recycling scheme can still be applied; With that said, substitute the hardcoded ICE_RXBUF_2048 and PAGE_SIZE values in DMA API that we're making use of with rx_ring->rx_buf_len and ice_rx_pg_size(rx_ring). The latter is an introduced helper for determining the page size based on its order (which was figured out via ice_rx_pg_order). Last but not least, take care of truesize calculation. In the followup patch the headroom/tailroom computation logic will be introduced. This change aligns the buffer and frame configuration with other Intel drivers, most importantly with iavf. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-10-24 16:11:22 +08:00
ICE_FLAG_LEGACY_RX,
ICE_FLAG_VF_TRUE_PROMISC_ENA,
ICE_FLAG_MDD_AUTO_RESET_VF,
ICE_FLAG_LINK_LENIENT_MODE_ENA,
ICE_PF_FLAGS_NBITS /* must be last */
};
ice: set and release switchdev environment Switchdev environment has to be set up when user create VFs and eswitch mode is switchdev. Release is done when user delete all VFs. Data path in this implementation is based on control plane VSI. This VSI is used to pass traffic from port representors to corresponding VFs and vice versa. Default TX rule has to be added to forward packet to control plane VSI. This will redirect packets from VFs which don't match other rules to control plane VSI. On RX side default rule is added on uplink VSI to receive all traffic that doesn't match other rules. When setting switchdev environment all other rules from VFs should be removed. Packet to VFs will be forwarded by control plane VSI. As VF without any mac rules can't send any packet because of antispoof mechanism, VSI antispoof should be turned off on each VFs. To send packet from representor to correct VSI, destination VSI field in TX descriptor will have to be filled. Allow that by setting destination override bit in control plane VSI security config. Packet from VFs will be received on control plane VSI. Driver should decide to which netdev forward the packet. Decision is made based on src_vsi field from descriptor. There is a target netdev list in control plane VSI struct which choose netdev based on src_vsi number. Co-developed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-20 08:08:54 +08:00
struct ice_switchdev_info {
struct ice_vsi *control_vsi;
struct ice_vsi *uplink_vsi;
bool is_running;
};
struct ice_agg_node {
u32 agg_id;
#define ICE_MAX_VSIS_IN_AGG_NODE 64
u32 num_vsis;
u8 valid;
};
struct ice_pf {
struct pci_dev *pdev;
struct devlink_region *nvm_region;
struct devlink_region *sram_region;
struct devlink_region *devcaps_region;
/* devlink port data */
struct devlink_port devlink_port;
/* OS reserved IRQ details */
struct msix_entry *msix_entries;
ice: Refactor interrupt tracking Currently we have two MSI-x (IRQ) trackers, one for OS requested MSI-x entries (sw_irq_tracker) and one for hardware MSI-x vectors (hw_irq_tracker). Generally the sw_irq_tracker has less entries than the hw_irq_tracker because the hw_irq_tracker has entries equal to the max allowed MSI-x per PF and the sw_irq_tracker is mainly the minimum (non SR-IOV portion of the vectors, kernel granted IRQs). All of the non SR-IOV portions of the driver (i.e. LAN queues, RDMA queues, OICR, etc.) take at least one of each type of tracker resource. SR-IOV only grabs entries from the hw_irq_tracker. There are a few issues with this approach that can be seen when doing any kind of device reconfiguration (i.e. ethtool -L, SR-IOV, etc.). One of them being, any time the driver creates an ice_q_vector and associates it to a LAN queue pair it will grab and use one entry from the hw_irq_tracker and one from the sw_irq_tracker. If the indices on these does not match it will cause a Tx timeout, which will cause a reset and then the indices will match up again and traffic will resume. The mismatched indices come from the trackers not being the same size and/or the search_hint in the two trackers not being equal. Another reason for the refactor is the co-existence of features with SR-IOV. If SR-IOV is enabled and the interrupts are taken from the end of the sw_irq_tracker then other features can no longer use this space because the hardware has now given the remaining interrupts to SR-IOV. This patch reworks how we track MSI-x vectors by removing the hw_irq_tracker completely and instead MSI-x resources needed for SR-IOV are determined all at once instead of per VF. This can be done because when creating VFs we know how many are wanted and how many MSI-x vectors each VF needs. This also allows us to start using MSI-x resources from the end of the PF's allowed MSI-x vectors so we are less likely to use entries needed for other features (i.e. RDMA, L2 Offload, etc). This patch also reworks the ice_res_tracker structure by removing the search_hint and adding a new member - "end". Instead of having a search_hint we will always search from 0. The new member, "end", will be used to manipulate the end of the ice_res_tracker (specifically sw_irq_tracker) during runtime based on MSI-x vectors needed by SR-IOV. In the normal case, the end of ice_res_tracker will be equal to the ice_res_tracker's num_entries. The sriov_base_vector member was added to the PF structure. It is used to represent the starting MSI-x index of all the needed MSI-x vectors for all SR-IOV VFs. Depending on how many MSI-x are needed, SR-IOV may have to take resources from the sw_irq_tracker. This is done by setting the sw_irq_tracker->end equal to the pf->sriov_base_vector. When all SR-IOV VFs are removed then the sw_irq_tracker->end is reset back to sw_irq_tracker->num_entries. The sriov_base_vector, along with the VF's number of MSI-x (pf->num_vf_msix), vf_id, and the base MSI-x index on the PF (pf->hw.func_caps.common_cap.msix_vector_first_id), is used to calculate the first HW absolute MSI-x index for each VF, which is used to write to the VPINT_ALLOC[_PCI] and GLINT_VECT2FUNC registers to program the VFs MSI-x PCI configuration bits. Also, the sriov_base_vector is used along with VF's num_vf_msix, vf_id, and q_vector->v_idx to determine the MSI-x register index (used for writing to GLINT_DYN_CTL) within the PF's space. Interrupt changes removed any references to hw_base_vector, hw_oicr_idx, and hw_irq_tracker. Only sw_base_vector, sw_oicr_idx, and sw_irq_tracker variables remain. Change all of these by removing the "sw_" prefix to help avoid confusion with these variables and their use. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-17 01:30:44 +08:00
struct ice_res_tracker *irq_tracker;
/* First MSIX vector used by SR-IOV VFs. Calculated by subtracting the
* number of MSIX vectors needed for all SR-IOV VFs from the number of
* MSIX vectors allowed on this PF.
*/
u16 sriov_base_vector;
u16 ctrl_vsi_idx; /* control VSI index in pf->vsi array */
struct ice_vsi **vsi; /* VSIs created by the driver */
struct ice_sw *first_sw; /* first switch created by firmware */
ice: support basic E-Switch mode control Write set and get eswitch mode functions used by devlink ops. Use new pf struct member eswitch_mode to track current eswitch mode in driver. Changing eswitch mode is only allowed when there are no VFs created. Create new file for eswitch related code. Add config flag ICE_SWITCHDEV to allow user to choose if switchdev support should be enabled or disabled. Use case examples: - show current eswitch mode ('legacy' is the default one) [root@localhost]# devlink dev eswitch show pci/0000:03:00.1 pci/0000:03:00.1: mode legacy - move to 'switchdev' mode [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode switchdev [root@localhost]# devlink dev eswitch show pci/0000:03:00.1 pci/0000:03:00.1: mode switchdev - create 2 VFs [root@localhost]# echo 2 > /sys/class/net/ens4f1/device/sriov_numvfs - unsuccessful attempt to change eswitch mode while VFs are created [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode legacy devlink answers: Operation not supported - destroy VFs [root@localhost]# echo 0 > /sys/class/net/ens4f1/device/sriov_numvfs - restore 'legacy' mode [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode legacy [root@localhost]# devlink dev eswitch show pci/0000:03:00.1 pci/0000:03:00.1: mode legacy Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-20 08:08:48 +08:00
u16 eswitch_mode; /* current mode of eswitch */
/* Virtchnl/SR-IOV config info */
struct ice_vf *vf;
u16 num_alloc_vfs; /* actual number of VFs allocated */
u16 num_vfs_supported; /* num VFs supported for this PF */
u16 num_qps_per_vf;
u16 num_msix_per_vf;
/* used to ratelimit the MDD event logging */
unsigned long last_printed_mdd_jiffies;
DECLARE_BITMAP(malvfs, ICE_MAX_VF_COUNT);
DECLARE_BITMAP(features, ICE_F_MAX);
DECLARE_BITMAP(state, ICE_STATE_NBITS);
DECLARE_BITMAP(flags, ICE_PF_FLAGS_NBITS);
ice: Alloc queue management bitmaps and arrays dynamically The total number of queues available on the device is divided between multiple physical functions (PF) in the firmware and provided to the driver when it gets function capabilities from the firmware. Thus each PF knows how many Tx/Rx queues it has. These queues are then doled out to different VSIs (for LAN traffic, SR-IOV VF traffic, etc.) To track usage of these queues at the PF level, the driver uses two bitmaps avail_txqs and avail_rxqs. At the VSI level (i.e. struct ice_vsi instances) the driver uses two arrays txq_map and rxq_map, to track ownership of VSIs' queues in avail_txqs and avail_rxqs respectively. The aforementioned bitmaps and arrays should be allocated dynamically, because the number of queues supported by a PF is only available once function capabilities have been queried. The current static allocation consumes way more memory than required. This patch removes the DECLARE_BITMAP for avail_txqs and avail_rxqs and instead uses bitmap_zalloc to allocate the bitmaps during init. Similarly txq_map and rxq_map are now allocated in ice_vsi_alloc_arrays. As a result ICE_MAX_TXQS and ICE_MAX_RXQS defines are no longer needed. Also as txq_map and rxq_map are now allocated and freed, some code reordering was required in ice_vsi_rebuild for correct functioning. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-02 16:25:21 +08:00
unsigned long *avail_txqs; /* bitmap to track PF Tx queue usage */
unsigned long *avail_rxqs; /* bitmap to track PF Rx queue usage */
unsigned long serv_tmr_period;
unsigned long serv_tmr_prev;
struct timer_list serv_tmr;
struct work_struct serv_task;
struct mutex avail_q_mutex; /* protects access to avail_[rx|tx]qs */
struct mutex sw_mutex; /* lock for protecting VSI alloc flow */
struct mutex tc_mutex; /* lock to protect TC changes */
u32 msg_enable;
struct ice_ptp ptp;
u16 num_rdma_msix; /* Total MSIX vectors for RDMA driver */
u16 rdma_base_vector;
ice: implement device flash update via devlink Use the newly added pldmfw library to implement device flash update for the Intel ice networking device driver. This support uses the devlink flash update interface. The main parts of the flash include the Option ROM, the netlist module, and the main NVM data. The PLDM firmware file contains modules for each of these components. Using the pldmfw library, the provided firmware file will be scanned for the three major components, "fw.undi" for the Option ROM, "fw.mgmt" for the main NVM module containing the primary device firmware, and "fw.netlist" containing the netlist module. The flash is separated into two banks, the active bank containing the running firmware, and the inactive bank which we use for update. Each module is updated in a staged process. First, the inactive bank is erased, preparing the device for update. Second, the contents of the component are copied to the inactive portion of the flash. After all components are updated, the driver signals the device to switch the active bank during the next EMP reset (which would usually occur during the next reboot). Although the firmware AdminQ interface does report an immediate status for each command, the NVM erase and NVM write commands receive status asynchronously. The driver must not continue writing until previous erase and write commands have finished. The real status of the NVM commands is returned over the receive AdminQ. Implement a simple interface that uses a wait queue so that the main update thread can sleep until the completion status is reported by firmware. For erasing the inactive banks, this can take quite a while in practice. To help visualize the process to the devlink application and other applications based on the devlink netlink interface, status is reported via the devlink_flash_update_status_notify. While we do report status after each 4k block when writing, there is no real status we can report during erasing. We simply must wait for the complete module erasure to finish. With this implementation, basic flash update for the ice hardware is supported. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24 08:22:03 +08:00
/* spinlock to protect the AdminQ wait list */
spinlock_t aq_wait_lock;
struct hlist_head aq_wait_list;
wait_queue_head_t aq_wait_queue;
ice: support immediate firmware activation via devlink reload The ice hardware contains an embedded chip with firmware which can be updated using devlink flash. The firmware which runs on this chip is referred to as the Embedded Management Processor firmware (EMP firmware). Activating the new firmware image currently requires that the system be rebooted. This is not ideal as rebooting the system can cause unwanted downtime. In practical terms, activating the firmware does not always require a full system reboot. In many cases it is possible to activate the EMP firmware immediately. There are a couple of different scenarios to cover. * The EMP firmware itself can be reloaded by issuing a special update to the device called an Embedded Management Processor reset (EMP reset). This reset causes the device to reset and reload the EMP firmware. * PCI configuration changes are only reloaded after a cold PCIe reset. Unfortunately there is no generic way to trigger this for a PCIe device without a system reboot. When performing a flash update, firmware is capable of responding with some information about the specific update requirements. The driver updates the flash by programming a secondary inactive bank with the contents of the new image, and then issuing a command to request to switch the active bank starting from the next load. The response to the final command for updating the inactive NVM flash bank includes an indication of the minimum reset required to fully update the device. This can be one of the following: * A full power on is required * A cold PCIe reset is required * An EMP reset is required The response to the command to switch flash banks includes an indication of whether or not the firmware will allow an EMP reset request. For most updates, an EMP reset is sufficient to load the new EMP firmware without issues. In some cases, this reset is not sufficient because the PCI configuration space has changed. When this could cause incompatibility with the new EMP image, the firmware is capable of rejecting the EMP reset request. Add logic to ice_fw_update.c to handle the response data flash update AdminQ commands. For the reset level, issue a devlink status notification informing the user of how to complete the update with a simple suggestion like "Activate new firmware by rebooting the system". Cache the status of whether or not firmware will restrict the EMP reset for use in implementing devlink reload. Implement support for devlink reload with the "fw_activate" flag. This allows user space to request the firmware be activated immediately. For the .reload_down handler, we will issue a request for the EMP reset using the appropriate firmware AdminQ command. If we know that the firmware will not allow an EMP reset, simply exit with a suitable netlink extended ACK message indicating that the EMP reset is not available. For the .reload_up handler, simply wait until the driver has finished resetting. Logic to handle processing of an EMP reset already exists in the driver as part of its reset and rebuild flows. Implement support for the devlink reload interface with the "fw_activate" action. This allows userspace to request activation of firmware without a reboot. Note that support for indicating the required reset and EMP reset restriction is not supported on old versions of firmware. The driver can determine if the two features are supported by checking the device capabilities report. I confirmed support has existed since at least version 5.5.2 as reported by the 'fw.mgmt' version. Support to issue the EMP reset request has existed in all version of the EMP firmware for the ice hardware. Check the device capabilities report to determine whether or not the indications are reported by the running firmware. If the reset requirement indication is not supported, always assume a full power on is necessary. If the reset restriction capability is not supported, always assume the EMP reset is available. Users can verify if the EMP reset has activated the firmware by using the devlink info report to check that the 'running' firmware version has updated. For example a user might do the following: # Check current version $ devlink dev info # Update the device $ devlink dev flash pci/0000:af:00.0 file firmware.bin # Confirm stored version updated $ devlink dev info # Reload to activate new firmware $ devlink dev reload pci/0000:af:00.0 action fw_activate # Confirm running version updated $ devlink dev info Finally, this change does *not* implement basic driver-only reload support. I did look into trying to do this. However, it requires significant refactor of how the ice driver probes and loads everything. The ice driver probe and allocation flows were not designed with such a reload in mind. Refactoring the flow to support this is beyond the scope of this change. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-28 07:22:55 +08:00
bool fw_emp_reset_disabled;
ice: implement device flash update via devlink Use the newly added pldmfw library to implement device flash update for the Intel ice networking device driver. This support uses the devlink flash update interface. The main parts of the flash include the Option ROM, the netlist module, and the main NVM data. The PLDM firmware file contains modules for each of these components. Using the pldmfw library, the provided firmware file will be scanned for the three major components, "fw.undi" for the Option ROM, "fw.mgmt" for the main NVM module containing the primary device firmware, and "fw.netlist" containing the netlist module. The flash is separated into two banks, the active bank containing the running firmware, and the inactive bank which we use for update. Each module is updated in a staged process. First, the inactive bank is erased, preparing the device for update. Second, the contents of the component are copied to the inactive portion of the flash. After all components are updated, the driver signals the device to switch the active bank during the next EMP reset (which would usually occur during the next reboot). Although the firmware AdminQ interface does report an immediate status for each command, the NVM erase and NVM write commands receive status asynchronously. The driver must not continue writing until previous erase and write commands have finished. The real status of the NVM commands is returned over the receive AdminQ. Implement a simple interface that uses a wait queue so that the main update thread can sleep until the completion status is reported by firmware. For erasing the inactive banks, this can take quite a while in practice. To help visualize the process to the devlink application and other applications based on the devlink netlink interface, status is reported via the devlink_flash_update_status_notify. While we do report status after each 4k block when writing, there is no real status we can report during erasing. We simply must wait for the complete module erasure to finish. With this implementation, basic flash update for the ice hardware is supported. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24 08:22:03 +08:00
wait_queue_head_t reset_wait_queue;
u32 hw_csum_rx_error;
u16 oicr_idx; /* Other interrupt cause MSIX vector index */
u16 num_avail_sw_msix; /* remaining MSIX SW vectors left unclaimed */
ice: Alloc queue management bitmaps and arrays dynamically The total number of queues available on the device is divided between multiple physical functions (PF) in the firmware and provided to the driver when it gets function capabilities from the firmware. Thus each PF knows how many Tx/Rx queues it has. These queues are then doled out to different VSIs (for LAN traffic, SR-IOV VF traffic, etc.) To track usage of these queues at the PF level, the driver uses two bitmaps avail_txqs and avail_rxqs. At the VSI level (i.e. struct ice_vsi instances) the driver uses two arrays txq_map and rxq_map, to track ownership of VSIs' queues in avail_txqs and avail_rxqs respectively. The aforementioned bitmaps and arrays should be allocated dynamically, because the number of queues supported by a PF is only available once function capabilities have been queried. The current static allocation consumes way more memory than required. This patch removes the DECLARE_BITMAP for avail_txqs and avail_rxqs and instead uses bitmap_zalloc to allocate the bitmaps during init. Similarly txq_map and rxq_map are now allocated in ice_vsi_alloc_arrays. As a result ICE_MAX_TXQS and ICE_MAX_RXQS defines are no longer needed. Also as txq_map and rxq_map are now allocated and freed, some code reordering was required in ice_vsi_rebuild for correct functioning. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-02 16:25:21 +08:00
u16 max_pf_txqs; /* Total Tx queues PF wide */
u16 max_pf_rxqs; /* Total Rx queues PF wide */
u16 num_lan_msix; /* Total MSIX vectors for base driver */
u16 num_lan_tx; /* num LAN Tx queues setup */
u16 num_lan_rx; /* num LAN Rx queues setup */
u16 next_vsi; /* Next free slot in pf->vsi[] - 0-based! */
u16 num_alloc_vsi;
ice: Support link events, reset and rebuild Link events are posted to a PF's admin receive queue (ARQ). This patch adds the ability to detect and process link events. This patch also adds the ability to process resets. The driver can process the following resets: 1) EMP Reset (EMPR) 2) Global Reset (GLOBR) 3) Core Reset (CORER) 4) Physical Function Reset (PFR) EMPR is the largest level of reset that the driver can handle. An EMPR resets the manageability block and also the data path, including PHY and link for all the PFs. The affected PFs are notified of this event through a miscellaneous interrupt. GLOBR is a subset of EMPR. It does everything EMPR does except that it doesn't reset the manageability block. CORER is a subset of GLOBR. It does everything GLOBR does but doesn't reset PHY and link. PFR is a subset of CORER and affects only the given physical function. In other words, PFR can be thought of as a CORER for a single PF. Since only the issuing PF is affected, a PFR doesn't result in the miscellaneous interrupt being triggered. All the resets have the following in common: 1) Tx/Rx is halted and all queues are stopped. 2) All the VSIs and filters programmed for the PF are lost and have to be reprogrammed. 3) Control queue interfaces are reset and have to be reprogrammed. In the rebuild flow, control queues are reinitialized, VSIs are reallocated and filters are restored. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:18 +08:00
u16 corer_count; /* Core reset count */
u16 globr_count; /* Global reset count */
u16 empr_count; /* EMP reset count */
u16 pfr_count; /* PF reset count */
u8 wol_ena : 1; /* software state of WoL */
u32 wakeup_reason; /* last wakeup reason */
struct ice_hw_port_stats stats;
struct ice_hw_port_stats stats_prev;
struct ice_hw hw;
u8 stat_prev_loaded:1; /* has previous stats been loaded */
u8 rdma_mode;
u16 dcbx_cap;
u32 tx_timeout_count;
unsigned long tx_timeout_last_recovery;
u32 tx_timeout_recovery_level;
char int_name[ICE_INT_NAME_STR_LEN];
struct auxiliary_device *adev;
int aux_idx;
u32 sw_int_count;
ice: Add tc-flower filter support for channel Add support to add/delete channel specific filter using tc-flower. For now, only supported action is "skip_sw hw_tc <tc_num>" Filter criteria is specific to channel and it can be combination of L3, L3+L4, L2+L4. Example: MATCH criteria Action --------------------------- src and/or dest IPv4[6]/mask -> Forward to "hw_tc <tc_num>" dest IPv4[6]/mask + dest L4 port -> Forward to "hw_tc <tc_num>" dest MAC + dest L4 port -> Forward to "hw_tc <tc_num>" src IPv4[6]/mask + src L4 port -> Forward to "hw_tc <tc_num>" src MAC + src L4 port -> Forward to "hw_tc <tc_num>" Adding tc-flower filter for channel using "hw_tc" ------------------------------------------------- tc qdisc add dev <ethX> clsact Above two steps are only needed the first time when adding tc-flower filter. tc filter add dev <ethX> protocol ip ingress prio 1 flower \ dst_ip 192.168.0.1/32 ip_proto tcp dst_port 5001 \ skip_sw hw_tc 1 tc filter show dev <ethX> ingress filter protocol ip pref 1 flower chain 0 filter protocol ip pref 1 flower chain 0 handle 0x1 hw_tc 1 eth_type ipv4 ip_proto tcp dst_ip 192.168.0.1 dst_port 5001 skip_sw in_hw in_hw_count 1 Delete specific filter: ------------------------- tc filter del dev <ethx> ingress pref 1 handle 0x1 flower Delete All filters: ------------------ tc filter del dev <ethX> ingress Co-developed-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-16 07:35:17 +08:00
/* count of tc_flower filters specific to channel (aka where filter
* action is "hw_tc <tc_num>")
*/
u16 num_dmac_chnl_fltrs;
struct hlist_head tc_flower_fltr_list;
__le64 nvm_phy_type_lo; /* NVM PHY type low */
__le64 nvm_phy_type_hi; /* NVM PHY type high */
struct ice_link_default_override_tlv link_dflt_override;
struct ice_lag *lag; /* Link Aggregation information */
ice: set and release switchdev environment Switchdev environment has to be set up when user create VFs and eswitch mode is switchdev. Release is done when user delete all VFs. Data path in this implementation is based on control plane VSI. This VSI is used to pass traffic from port representors to corresponding VFs and vice versa. Default TX rule has to be added to forward packet to control plane VSI. This will redirect packets from VFs which don't match other rules to control plane VSI. On RX side default rule is added on uplink VSI to receive all traffic that doesn't match other rules. When setting switchdev environment all other rules from VFs should be removed. Packet to VFs will be forwarded by control plane VSI. As VF without any mac rules can't send any packet because of antispoof mechanism, VSI antispoof should be turned off on each VFs. To send packet from representor to correct VSI, destination VSI field in TX descriptor will have to be filled. Allow that by setting destination override bit in control plane VSI security config. Packet from VFs will be received on control plane VSI. Driver should decide to which netdev forward the packet. Decision is made based on src_vsi field from descriptor. There is a target netdev list in control plane VSI struct which choose netdev based on src_vsi number. Co-developed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-20 08:08:54 +08:00
struct ice_switchdev_info switchdev;
#define ICE_INVALID_AGG_NODE_ID 0
#define ICE_PF_AGG_NODE_ID_START 1
#define ICE_MAX_PF_AGG_NODES 32
struct ice_agg_node pf_agg_node[ICE_MAX_PF_AGG_NODES];
#define ICE_VF_AGG_NODE_ID_START 65
#define ICE_MAX_VF_AGG_NODES 32
struct ice_agg_node vf_agg_node[ICE_MAX_VF_AGG_NODES];
};
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
struct ice_netdev_priv {
struct ice_vsi *vsi;
struct ice_repr *repr;
/* indirect block callbacks on registered higher level devices
* (e.g. tunnel devices)
*
* tc_indr_block_cb_priv_list is used to look up indirect callback
* private data
*/
struct list_head tc_indr_block_priv_list;
ice: Add support for VSI allocation and deallocation This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-20 22:58:11 +08:00
};
ice: enable ndo_setup_tc support for mqprio_qdisc Add support in driver for TC_QDISC_SETUP_MQPRIO. This support enables instantiation of channels in HW using existing MQPRIO infrastructure which is extended to be offloadable. This provides a mechanism to configure dedicated set of queues for each TC. Configuring channels using "tc mqprio": -------------------------------------- tc qdisc add dev <ethX> root mqprio num_tc 3 map 0 1 2 \ queues 4@0 4@4 4@8 hw 1 mode channel Above command configures 3 TCs having 4 queues each. "hw 1 mode channel" implies offload of channel configuration to HW. When driver processes configuration received via "ndo_setup_tc: QDISC_SETUP_MQPRIO", each TC maps to HW VSI with specified queues. User can optionally specify bandwidth min and max rate limit per TC (see example below). If shaper params like min and/or max bandwidth rate limit are specified, driver configures VSI specific rate limiter in HW. Configuring channels and bandwidth shaper parameters using "tc mqprio": ---------------------------------------------------------------- tc qdisc add dev <ethX> root mqprio \ num_tc 4 map 0 1 2 3 queues 4@0 4@4 4@8 4@12 hw 1 mode channel \ shaper bw_rlimit min_rate 1Gbit 2Gbit 3Gbit 4Gbit \ max_rate 4Gbit 5Gbit 6Gbit 7Gbit Command to view configured TCs: ----------------------------- tc qdisc show dev <ethX> Deleting TCs: ------------ tc qdisc del dev <ethX> root mqprio Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-16 07:35:16 +08:00
/**
* ice_vector_ch_enabled
* @qv: pointer to q_vector, can be NULL
*
* This function returns true if vector is channel enabled otherwise false
*/
static inline bool ice_vector_ch_enabled(struct ice_q_vector *qv)
{
return !!qv->ch; /* Enable it to run with TC */
}
/**
* ice_irq_dynamic_ena - Enable default interrupt generation settings
* @hw: pointer to HW struct
* @vsi: pointer to VSI struct, can be NULL
* @q_vector: pointer to q_vector, can be NULL
*/
static inline void
ice_irq_dynamic_ena(struct ice_hw *hw, struct ice_vsi *vsi,
struct ice_q_vector *q_vector)
{
u32 vector = (vsi && q_vector) ? q_vector->reg_idx :
ice: Refactor interrupt tracking Currently we have two MSI-x (IRQ) trackers, one for OS requested MSI-x entries (sw_irq_tracker) and one for hardware MSI-x vectors (hw_irq_tracker). Generally the sw_irq_tracker has less entries than the hw_irq_tracker because the hw_irq_tracker has entries equal to the max allowed MSI-x per PF and the sw_irq_tracker is mainly the minimum (non SR-IOV portion of the vectors, kernel granted IRQs). All of the non SR-IOV portions of the driver (i.e. LAN queues, RDMA queues, OICR, etc.) take at least one of each type of tracker resource. SR-IOV only grabs entries from the hw_irq_tracker. There are a few issues with this approach that can be seen when doing any kind of device reconfiguration (i.e. ethtool -L, SR-IOV, etc.). One of them being, any time the driver creates an ice_q_vector and associates it to a LAN queue pair it will grab and use one entry from the hw_irq_tracker and one from the sw_irq_tracker. If the indices on these does not match it will cause a Tx timeout, which will cause a reset and then the indices will match up again and traffic will resume. The mismatched indices come from the trackers not being the same size and/or the search_hint in the two trackers not being equal. Another reason for the refactor is the co-existence of features with SR-IOV. If SR-IOV is enabled and the interrupts are taken from the end of the sw_irq_tracker then other features can no longer use this space because the hardware has now given the remaining interrupts to SR-IOV. This patch reworks how we track MSI-x vectors by removing the hw_irq_tracker completely and instead MSI-x resources needed for SR-IOV are determined all at once instead of per VF. This can be done because when creating VFs we know how many are wanted and how many MSI-x vectors each VF needs. This also allows us to start using MSI-x resources from the end of the PF's allowed MSI-x vectors so we are less likely to use entries needed for other features (i.e. RDMA, L2 Offload, etc). This patch also reworks the ice_res_tracker structure by removing the search_hint and adding a new member - "end". Instead of having a search_hint we will always search from 0. The new member, "end", will be used to manipulate the end of the ice_res_tracker (specifically sw_irq_tracker) during runtime based on MSI-x vectors needed by SR-IOV. In the normal case, the end of ice_res_tracker will be equal to the ice_res_tracker's num_entries. The sriov_base_vector member was added to the PF structure. It is used to represent the starting MSI-x index of all the needed MSI-x vectors for all SR-IOV VFs. Depending on how many MSI-x are needed, SR-IOV may have to take resources from the sw_irq_tracker. This is done by setting the sw_irq_tracker->end equal to the pf->sriov_base_vector. When all SR-IOV VFs are removed then the sw_irq_tracker->end is reset back to sw_irq_tracker->num_entries. The sriov_base_vector, along with the VF's number of MSI-x (pf->num_vf_msix), vf_id, and the base MSI-x index on the PF (pf->hw.func_caps.common_cap.msix_vector_first_id), is used to calculate the first HW absolute MSI-x index for each VF, which is used to write to the VPINT_ALLOC[_PCI] and GLINT_VECT2FUNC registers to program the VFs MSI-x PCI configuration bits. Also, the sriov_base_vector is used along with VF's num_vf_msix, vf_id, and q_vector->v_idx to determine the MSI-x register index (used for writing to GLINT_DYN_CTL) within the PF's space. Interrupt changes removed any references to hw_base_vector, hw_oicr_idx, and hw_irq_tracker. Only sw_base_vector, sw_oicr_idx, and sw_irq_tracker variables remain. Change all of these by removing the "sw_" prefix to help avoid confusion with these variables and their use. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-17 01:30:44 +08:00
((struct ice_pf *)hw->back)->oicr_idx;
int itr = ICE_ITR_NONE;
u32 val;
/* clear the PBA here, as this function is meant to clean out all
* previous interrupts and enable the interrupt
*/
val = GLINT_DYN_CTL_INTENA_M | GLINT_DYN_CTL_CLEARPBA_M |
(itr << GLINT_DYN_CTL_ITR_INDX_S);
if (vsi)
if (test_bit(ICE_VSI_DOWN, vsi->state))
return;
wr32(hw, GLINT_DYN_CTL(vector), val);
}
/**
* ice_netdev_to_pf - Retrieve the PF struct associated with a netdev
* @netdev: pointer to the netdev struct
*/
static inline struct ice_pf *ice_netdev_to_pf(struct net_device *netdev)
{
struct ice_netdev_priv *np = netdev_priv(netdev);
return np->vsi->back;
}
static inline bool ice_is_xdp_ena_vsi(struct ice_vsi *vsi)
{
return !!vsi->xdp_prog;
}
static inline void ice_set_ring_xdp(struct ice_tx_ring *ring)
{
ring->flags |= ICE_TX_FLAGS_RING_XDP;
}
/**
* ice_xsk_pool - get XSK buffer pool bound to a ring
* @ring: Rx ring to use
*
* Returns a pointer to xdp_umem structure if there is a buffer pool present,
* NULL otherwise.
*/
static inline struct xsk_buff_pool *ice_xsk_pool(struct ice_rx_ring *ring)
{
ice: track AF_XDP ZC enabled queues in bitmap Commit c7a219048e45 ("ice: Remove xsk_buff_pool from VSI structure") silently introduced a regression and broke the Tx side of AF_XDP in copy mode. xsk_pool on ice_ring is set only based on the existence of the XDP prog on the VSI which in turn picks ice_clean_tx_irq_zc to be executed. That is not something that should happen for copy mode as it should use the regular data path ice_clean_tx_irq. This results in a following splat when xdpsock is run in txonly or l2fwd scenarios in copy mode: <snip> [ 106.050195] BUG: kernel NULL pointer dereference, address: 0000000000000030 [ 106.057269] #PF: supervisor read access in kernel mode [ 106.062493] #PF: error_code(0x0000) - not-present page [ 106.067709] PGD 0 P4D 0 [ 106.070293] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 106.074721] CPU: 61 PID: 0 Comm: swapper/61 Not tainted 5.12.0-rc2+ #45 [ 106.081436] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [ 106.092027] RIP: 0010:xp_raw_get_dma+0x36/0x50 [ 106.096551] Code: 74 14 48 b8 ff ff ff ff ff ff 00 00 48 21 f0 48 c1 ee 30 48 01 c6 48 8b 87 90 00 00 00 48 89 f2 81 e6 ff 0f 00 00 48 c1 ea 0c <48> 8b 04 d0 48 83 e0 fe 48 01 f0 c3 66 66 2e 0f 1f 84 00 00 00 00 [ 106.115588] RSP: 0018:ffffc9000d694e50 EFLAGS: 00010206 [ 106.120893] RAX: 0000000000000000 RBX: ffff88984b8c8a00 RCX: ffff889852581800 [ 106.128137] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88984cd8b800 [ 106.135383] RBP: ffff888123b50001 R08: ffff889896800000 R09: 0000000000000800 [ 106.142628] R10: 0000000000000000 R11: ffffffff826060c0 R12: 00000000000000ff [ 106.149872] R13: 0000000000000000 R14: 0000000000000040 R15: ffff888123b50018 [ 106.157117] FS: 0000000000000000(0000) GS:ffff8897e0f40000(0000) knlGS:0000000000000000 [ 106.165332] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 106.171163] CR2: 0000000000000030 CR3: 000000000560a004 CR4: 00000000007706e0 [ 106.178408] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 106.185653] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 106.192898] PKRU: 55555554 [ 106.195653] Call Trace: [ 106.198143] <IRQ> [ 106.200196] ice_clean_tx_irq_zc+0x183/0x2a0 [ice] [ 106.205087] ice_napi_poll+0x3e/0x590 [ice] [ 106.209356] __napi_poll+0x2a/0x160 [ 106.212911] net_rx_action+0xd6/0x200 [ 106.216634] __do_softirq+0xbf/0x29b [ 106.220274] irq_exit_rcu+0x88/0xc0 [ 106.223819] common_interrupt+0x7b/0xa0 [ 106.227719] </IRQ> [ 106.229857] asm_common_interrupt+0x1e/0x40 </snip> Fix this by introducing the bitmap of queues that are zero-copy enabled, where each bit, corresponding to a queue id that xsk pool is being configured on, will be set/cleared within ice_xsk_pool_{en,dis}able and checked within ice_xsk_pool(). The latter is a function used for deciding which napi poll routine is executed. Idea is being taken from our other drivers such as i40e and ixgbe. Fixes: c7a219048e45 ("ice: Remove xsk_buff_pool from VSI structure") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-28 03:52:09 +08:00
struct ice_vsi *vsi = ring->vsi;
u16 qid = ring->q_index;
if (!ice_is_xdp_ena_vsi(vsi) || !test_bit(qid, vsi->af_xdp_zc_qps))
return NULL;
return xsk_get_pool_from_qid(vsi->netdev, qid);
}
/**
* ice_tx_xsk_pool - get XSK buffer pool bound to a ring
* @ring: Tx ring to use
*
* Returns a pointer to xdp_umem structure if there is a buffer pool present,
* NULL otherwise. Tx equivalent of ice_xsk_pool.
*/
static inline struct xsk_buff_pool *ice_tx_xsk_pool(struct ice_tx_ring *ring)
{
struct ice_vsi *vsi = ring->vsi;
u16 qid;
qid = ring->q_index - vsi->num_xdp_txq;
ice: track AF_XDP ZC enabled queues in bitmap Commit c7a219048e45 ("ice: Remove xsk_buff_pool from VSI structure") silently introduced a regression and broke the Tx side of AF_XDP in copy mode. xsk_pool on ice_ring is set only based on the existence of the XDP prog on the VSI which in turn picks ice_clean_tx_irq_zc to be executed. That is not something that should happen for copy mode as it should use the regular data path ice_clean_tx_irq. This results in a following splat when xdpsock is run in txonly or l2fwd scenarios in copy mode: <snip> [ 106.050195] BUG: kernel NULL pointer dereference, address: 0000000000000030 [ 106.057269] #PF: supervisor read access in kernel mode [ 106.062493] #PF: error_code(0x0000) - not-present page [ 106.067709] PGD 0 P4D 0 [ 106.070293] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 106.074721] CPU: 61 PID: 0 Comm: swapper/61 Not tainted 5.12.0-rc2+ #45 [ 106.081436] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [ 106.092027] RIP: 0010:xp_raw_get_dma+0x36/0x50 [ 106.096551] Code: 74 14 48 b8 ff ff ff ff ff ff 00 00 48 21 f0 48 c1 ee 30 48 01 c6 48 8b 87 90 00 00 00 48 89 f2 81 e6 ff 0f 00 00 48 c1 ea 0c <48> 8b 04 d0 48 83 e0 fe 48 01 f0 c3 66 66 2e 0f 1f 84 00 00 00 00 [ 106.115588] RSP: 0018:ffffc9000d694e50 EFLAGS: 00010206 [ 106.120893] RAX: 0000000000000000 RBX: ffff88984b8c8a00 RCX: ffff889852581800 [ 106.128137] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88984cd8b800 [ 106.135383] RBP: ffff888123b50001 R08: ffff889896800000 R09: 0000000000000800 [ 106.142628] R10: 0000000000000000 R11: ffffffff826060c0 R12: 00000000000000ff [ 106.149872] R13: 0000000000000000 R14: 0000000000000040 R15: ffff888123b50018 [ 106.157117] FS: 0000000000000000(0000) GS:ffff8897e0f40000(0000) knlGS:0000000000000000 [ 106.165332] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 106.171163] CR2: 0000000000000030 CR3: 000000000560a004 CR4: 00000000007706e0 [ 106.178408] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 106.185653] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 106.192898] PKRU: 55555554 [ 106.195653] Call Trace: [ 106.198143] <IRQ> [ 106.200196] ice_clean_tx_irq_zc+0x183/0x2a0 [ice] [ 106.205087] ice_napi_poll+0x3e/0x590 [ice] [ 106.209356] __napi_poll+0x2a/0x160 [ 106.212911] net_rx_action+0xd6/0x200 [ 106.216634] __do_softirq+0xbf/0x29b [ 106.220274] irq_exit_rcu+0x88/0xc0 [ 106.223819] common_interrupt+0x7b/0xa0 [ 106.227719] </IRQ> [ 106.229857] asm_common_interrupt+0x1e/0x40 </snip> Fix this by introducing the bitmap of queues that are zero-copy enabled, where each bit, corresponding to a queue id that xsk pool is being configured on, will be set/cleared within ice_xsk_pool_{en,dis}able and checked within ice_xsk_pool(). The latter is a function used for deciding which napi poll routine is executed. Idea is being taken from our other drivers such as i40e and ixgbe. Fixes: c7a219048e45 ("ice: Remove xsk_buff_pool from VSI structure") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-28 03:52:09 +08:00
if (!ice_is_xdp_ena_vsi(vsi) || !test_bit(qid, vsi->af_xdp_zc_qps))
return NULL;
ice: track AF_XDP ZC enabled queues in bitmap Commit c7a219048e45 ("ice: Remove xsk_buff_pool from VSI structure") silently introduced a regression and broke the Tx side of AF_XDP in copy mode. xsk_pool on ice_ring is set only based on the existence of the XDP prog on the VSI which in turn picks ice_clean_tx_irq_zc to be executed. That is not something that should happen for copy mode as it should use the regular data path ice_clean_tx_irq. This results in a following splat when xdpsock is run in txonly or l2fwd scenarios in copy mode: <snip> [ 106.050195] BUG: kernel NULL pointer dereference, address: 0000000000000030 [ 106.057269] #PF: supervisor read access in kernel mode [ 106.062493] #PF: error_code(0x0000) - not-present page [ 106.067709] PGD 0 P4D 0 [ 106.070293] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 106.074721] CPU: 61 PID: 0 Comm: swapper/61 Not tainted 5.12.0-rc2+ #45 [ 106.081436] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [ 106.092027] RIP: 0010:xp_raw_get_dma+0x36/0x50 [ 106.096551] Code: 74 14 48 b8 ff ff ff ff ff ff 00 00 48 21 f0 48 c1 ee 30 48 01 c6 48 8b 87 90 00 00 00 48 89 f2 81 e6 ff 0f 00 00 48 c1 ea 0c <48> 8b 04 d0 48 83 e0 fe 48 01 f0 c3 66 66 2e 0f 1f 84 00 00 00 00 [ 106.115588] RSP: 0018:ffffc9000d694e50 EFLAGS: 00010206 [ 106.120893] RAX: 0000000000000000 RBX: ffff88984b8c8a00 RCX: ffff889852581800 [ 106.128137] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88984cd8b800 [ 106.135383] RBP: ffff888123b50001 R08: ffff889896800000 R09: 0000000000000800 [ 106.142628] R10: 0000000000000000 R11: ffffffff826060c0 R12: 00000000000000ff [ 106.149872] R13: 0000000000000000 R14: 0000000000000040 R15: ffff888123b50018 [ 106.157117] FS: 0000000000000000(0000) GS:ffff8897e0f40000(0000) knlGS:0000000000000000 [ 106.165332] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 106.171163] CR2: 0000000000000030 CR3: 000000000560a004 CR4: 00000000007706e0 [ 106.178408] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 106.185653] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 106.192898] PKRU: 55555554 [ 106.195653] Call Trace: [ 106.198143] <IRQ> [ 106.200196] ice_clean_tx_irq_zc+0x183/0x2a0 [ice] [ 106.205087] ice_napi_poll+0x3e/0x590 [ice] [ 106.209356] __napi_poll+0x2a/0x160 [ 106.212911] net_rx_action+0xd6/0x200 [ 106.216634] __do_softirq+0xbf/0x29b [ 106.220274] irq_exit_rcu+0x88/0xc0 [ 106.223819] common_interrupt+0x7b/0xa0 [ 106.227719] </IRQ> [ 106.229857] asm_common_interrupt+0x1e/0x40 </snip> Fix this by introducing the bitmap of queues that are zero-copy enabled, where each bit, corresponding to a queue id that xsk pool is being configured on, will be set/cleared within ice_xsk_pool_{en,dis}able and checked within ice_xsk_pool(). The latter is a function used for deciding which napi poll routine is executed. Idea is being taken from our other drivers such as i40e and ixgbe. Fixes: c7a219048e45 ("ice: Remove xsk_buff_pool from VSI structure") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-28 03:52:09 +08:00
return xsk_get_pool_from_qid(vsi->netdev, qid);
}
/**
* ice_get_main_vsi - Get the PF VSI
* @pf: PF instance
*
* returns pf->vsi[0], which by definition is the PF VSI
*/
static inline struct ice_vsi *ice_get_main_vsi(struct ice_pf *pf)
{
if (pf->vsi)
return pf->vsi[0];
return NULL;
}
/**
* ice_get_netdev_priv_vsi - return VSI associated with netdev priv.
* @np: private netdev structure
*/
static inline struct ice_vsi *ice_get_netdev_priv_vsi(struct ice_netdev_priv *np)
{
/* In case of port representor return source port VSI. */
if (np->repr)
return np->repr->src_vsi;
else
return np->vsi;
}
/**
* ice_get_ctrl_vsi - Get the control VSI
* @pf: PF instance
*/
static inline struct ice_vsi *ice_get_ctrl_vsi(struct ice_pf *pf)
{
/* if pf->ctrl_vsi_idx is ICE_NO_VSI, control VSI was not set up */
if (!pf->vsi || pf->ctrl_vsi_idx == ICE_NO_VSI)
return NULL;
return pf->vsi[pf->ctrl_vsi_idx];
}
ice: set and release switchdev environment Switchdev environment has to be set up when user create VFs and eswitch mode is switchdev. Release is done when user delete all VFs. Data path in this implementation is based on control plane VSI. This VSI is used to pass traffic from port representors to corresponding VFs and vice versa. Default TX rule has to be added to forward packet to control plane VSI. This will redirect packets from VFs which don't match other rules to control plane VSI. On RX side default rule is added on uplink VSI to receive all traffic that doesn't match other rules. When setting switchdev environment all other rules from VFs should be removed. Packet to VFs will be forwarded by control plane VSI. As VF without any mac rules can't send any packet because of antispoof mechanism, VSI antispoof should be turned off on each VFs. To send packet from representor to correct VSI, destination VSI field in TX descriptor will have to be filled. Allow that by setting destination override bit in control plane VSI security config. Packet from VFs will be received on control plane VSI. Driver should decide to which netdev forward the packet. Decision is made based on src_vsi field from descriptor. There is a target netdev list in control plane VSI struct which choose netdev based on src_vsi number. Co-developed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-20 08:08:54 +08:00
/**
* ice_is_switchdev_running - check if switchdev is configured
* @pf: pointer to PF structure
*
* Returns true if eswitch mode is set to DEVLINK_ESWITCH_MODE_SWITCHDEV
* and switchdev is configured, false otherwise.
*/
static inline bool ice_is_switchdev_running(struct ice_pf *pf)
{
return pf->switchdev.is_running;
}
/**
* ice_set_sriov_cap - enable SRIOV in PF flags
* @pf: PF struct
*/
static inline void ice_set_sriov_cap(struct ice_pf *pf)
{
if (pf->hw.func_caps.common_cap.sr_iov_1_1)
set_bit(ICE_FLAG_SRIOV_CAPABLE, pf->flags);
}
/**
* ice_clear_sriov_cap - disable SRIOV in PF flags
* @pf: PF struct
*/
static inline void ice_clear_sriov_cap(struct ice_pf *pf)
{
clear_bit(ICE_FLAG_SRIOV_CAPABLE, pf->flags);
}
#define ICE_FD_STAT_CTR_BLOCK_COUNT 256
#define ICE_FD_STAT_PF_IDX(base_idx) \
((base_idx) * ICE_FD_STAT_CTR_BLOCK_COUNT)
#define ICE_FD_SB_STAT_IDX(base_idx) ICE_FD_STAT_PF_IDX(base_idx)
#define ICE_FD_STAT_CH 1
#define ICE_FD_CH_STAT_IDX(base_idx) \
(ICE_FD_STAT_PF_IDX(base_idx) + ICE_FD_STAT_CH)
/**
* ice_is_adq_active - any active ADQs
* @pf: pointer to PF
*
* This function returns true if there are any ADQs configured (which is
* determined by looking at VSI type (which should be VSI_PF), numtc, and
* TC_MQPRIO flag) otherwise return false
*/
static inline bool ice_is_adq_active(struct ice_pf *pf)
{
struct ice_vsi *vsi;
vsi = ice_get_main_vsi(pf);
if (!vsi)
return false;
/* is ADQ configured */
if (vsi->tc_cfg.numtc > ICE_CHNL_START_TC &&
test_bit(ICE_FLAG_TC_MQPRIO, pf->flags))
return true;
return false;
}
bool netif_is_ice(struct net_device *dev);
int ice_vsi_setup_tx_rings(struct ice_vsi *vsi);
int ice_vsi_setup_rx_rings(struct ice_vsi *vsi);
int ice_vsi_open_ctrl(struct ice_vsi *vsi);
ice: set and release switchdev environment Switchdev environment has to be set up when user create VFs and eswitch mode is switchdev. Release is done when user delete all VFs. Data path in this implementation is based on control plane VSI. This VSI is used to pass traffic from port representors to corresponding VFs and vice versa. Default TX rule has to be added to forward packet to control plane VSI. This will redirect packets from VFs which don't match other rules to control plane VSI. On RX side default rule is added on uplink VSI to receive all traffic that doesn't match other rules. When setting switchdev environment all other rules from VFs should be removed. Packet to VFs will be forwarded by control plane VSI. As VF without any mac rules can't send any packet because of antispoof mechanism, VSI antispoof should be turned off on each VFs. To send packet from representor to correct VSI, destination VSI field in TX descriptor will have to be filled. Allow that by setting destination override bit in control plane VSI security config. Packet from VFs will be received on control plane VSI. Driver should decide to which netdev forward the packet. Decision is made based on src_vsi field from descriptor. There is a target netdev list in control plane VSI struct which choose netdev based on src_vsi number. Co-developed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-20 08:08:54 +08:00
int ice_vsi_open(struct ice_vsi *vsi);
void ice_set_ethtool_ops(struct net_device *netdev);
void ice_set_ethtool_repr_ops(struct net_device *netdev);
void ice_set_ethtool_safe_mode_ops(struct net_device *netdev);
u16 ice_get_avail_txq_count(struct ice_pf *pf);
u16 ice_get_avail_rxq_count(struct ice_pf *pf);
int ice_vsi_recfg_qs(struct ice_vsi *vsi, int new_rx, int new_tx);
void ice_update_vsi_stats(struct ice_vsi *vsi);
void ice_update_pf_stats(struct ice_pf *pf);
int ice_up(struct ice_vsi *vsi);
int ice_down(struct ice_vsi *vsi);
int ice_vsi_cfg(struct ice_vsi *vsi);
struct ice_vsi *ice_lb_vsi_setup(struct ice_pf *pf, struct ice_port_info *pi);
ice: introduce XDP_TX fallback path Under rare circumstances there might be a situation where a requirement of having XDP Tx queue per CPU could not be fulfilled and some of the Tx resources have to be shared between CPUs. This yields a need for placing accesses to xdp_ring inside a critical section protected by spinlock. These accesses happen to be in the hot path, so let's introduce the static branch that will be triggered from the control plane when driver could not provide Tx queue dedicated for XDP on each CPU. Currently, the design that has been picked is to allow any number of XDP Tx queues that is at least half of a count of CPUs that platform has. For lower number driver will bail out with a response to user that there were not enough Tx resources that would allow configuring XDP. The sharing of rings is signalled via static branch enablement which in turn indicates that lock for xdp_ring accesses needs to be taken in hot path. Approach based on static branch has no impact on performance of a non-fallback path. One thing that is needed to be mentioned is a fact that the static branch will act as a global driver switch, meaning that if one PF got out of Tx resources, then other PFs that ice driver is servicing will suffer. However, given the fact that HW that ice driver is handling has 1024 Tx queues per each PF, this is currently an unlikely scenario. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-19 20:00:03 +08:00
int ice_vsi_determine_xdp_res(struct ice_vsi *vsi);
int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog);
int ice_destroy_xdp_rings(struct ice_vsi *vsi);
int
ice_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
u32 flags);
int ice_set_rss_lut(struct ice_vsi *vsi, u8 *lut, u16 lut_size);
int ice_get_rss_lut(struct ice_vsi *vsi, u8 *lut, u16 lut_size);
int ice_set_rss_key(struct ice_vsi *vsi, u8 *seed);
int ice_get_rss_key(struct ice_vsi *vsi, u8 *seed);
void ice_fill_rss_lut(u8 *lut, u16 rss_table_size, u16 rss_size);
int ice_schedule_reset(struct ice_pf *pf, enum ice_reset_req reset);
void ice_print_link_msg(struct ice_vsi *vsi, bool isup);
int ice_plug_aux_dev(struct ice_pf *pf);
void ice_unplug_aux_dev(struct ice_pf *pf);
int ice_init_rdma(struct ice_pf *pf);
const char *ice_aq_str(enum ice_aq_err aq_err);
bool ice_is_wol_supported(struct ice_hw *hw);
void ice_fdir_del_all_fltrs(struct ice_vsi *vsi);
ice: Implement aRFS Enable accelerated Receive Flow Steering (aRFS). It is used to steer Rx flows to a specific queue. This functionality is triggered by the network stack through ndo_rx_flow_steer and requires Flow Director (ntuple on) to function. The fltr_info is used to add/remove/update flow rules in the HW, the fltr_state is used to determine what to do with the filter with respect to HW and/or SW, and the flow_id is used in co-ordination with the network stack. The work for aRFS is split into two paths: the ndo_rx_flow_steer operation and the ice_service_task. The former is where the kernel hands us an Rx SKB among other items to setup aRFS and the latter is where the driver adds/updates/removes filter rules from HW and updates filter state. In the Rx path the following things can happen: 1. New aRFS entries are added to the hash table and the state is set to ICE_ARFS_INACTIVE so the filter can be updated in HW by the ice_service_task path. 2. aRFS entries have their Rx Queue updated if we receive a pre-existing flow_id and the filter state is ICE_ARFS_ACTIVE. The state is set to ICE_ARFS_INACTIVE so the filter can be updated in HW by the ice_service_task path. 3. aRFS entries marked as ICE_ARFS_TODEL are deleted In the ice_service_task path the following things can happen: 1. New aRFS entries marked as ICE_ARFS_INACTIVE are added or updated in HW. and their state is updated to ICE_ARFS_ACTIVE. 2. aRFS entries are deleted from HW and their state is updated to ICE_ARFS_TODEL. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-12 09:01:46 +08:00
int
ice_fdir_write_fltr(struct ice_pf *pf, struct ice_fdir_fltr *input, bool add,
bool is_tun);
void ice_vsi_manage_fdir(struct ice_vsi *vsi, bool ena);
int ice_add_fdir_ethtool(struct ice_vsi *vsi, struct ethtool_rxnfc *cmd);
int ice_del_fdir_ethtool(struct ice_vsi *vsi, struct ethtool_rxnfc *cmd);
int ice_get_ethtool_fdir_entry(struct ice_hw *hw, struct ethtool_rxnfc *cmd);
int
ice_get_fdir_fltr_ids(struct ice_hw *hw, struct ethtool_rxnfc *cmd,
u32 *rule_locs);
void ice_fdir_rem_adq_chnl(struct ice_hw *hw, u16 vsi_idx);
void ice_fdir_release_flows(struct ice_hw *hw);
void ice_fdir_replay_flows(struct ice_hw *hw);
void ice_fdir_replay_fltrs(struct ice_pf *pf);
int ice_fdir_create_dflt_rules(struct ice_pf *pf);
ice: implement device flash update via devlink Use the newly added pldmfw library to implement device flash update for the Intel ice networking device driver. This support uses the devlink flash update interface. The main parts of the flash include the Option ROM, the netlist module, and the main NVM data. The PLDM firmware file contains modules for each of these components. Using the pldmfw library, the provided firmware file will be scanned for the three major components, "fw.undi" for the Option ROM, "fw.mgmt" for the main NVM module containing the primary device firmware, and "fw.netlist" containing the netlist module. The flash is separated into two banks, the active bank containing the running firmware, and the inactive bank which we use for update. Each module is updated in a staged process. First, the inactive bank is erased, preparing the device for update. Second, the contents of the component are copied to the inactive portion of the flash. After all components are updated, the driver signals the device to switch the active bank during the next EMP reset (which would usually occur during the next reboot). Although the firmware AdminQ interface does report an immediate status for each command, the NVM erase and NVM write commands receive status asynchronously. The driver must not continue writing until previous erase and write commands have finished. The real status of the NVM commands is returned over the receive AdminQ. Implement a simple interface that uses a wait queue so that the main update thread can sleep until the completion status is reported by firmware. For erasing the inactive banks, this can take quite a while in practice. To help visualize the process to the devlink application and other applications based on the devlink netlink interface, status is reported via the devlink_flash_update_status_notify. While we do report status after each 4k block when writing, there is no real status we can report during erasing. We simply must wait for the complete module erasure to finish. With this implementation, basic flash update for the ice hardware is supported. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24 08:22:03 +08:00
int ice_aq_wait_for_event(struct ice_pf *pf, u16 opcode, unsigned long timeout,
struct ice_rq_event_info *event);
int ice_open(struct net_device *netdev);
int ice_open_internal(struct net_device *netdev);
int ice_stop(struct net_device *netdev);
ice: Implement aRFS Enable accelerated Receive Flow Steering (aRFS). It is used to steer Rx flows to a specific queue. This functionality is triggered by the network stack through ndo_rx_flow_steer and requires Flow Director (ntuple on) to function. The fltr_info is used to add/remove/update flow rules in the HW, the fltr_state is used to determine what to do with the filter with respect to HW and/or SW, and the flow_id is used in co-ordination with the network stack. The work for aRFS is split into two paths: the ndo_rx_flow_steer operation and the ice_service_task. The former is where the kernel hands us an Rx SKB among other items to setup aRFS and the latter is where the driver adds/updates/removes filter rules from HW and updates filter state. In the Rx path the following things can happen: 1. New aRFS entries are added to the hash table and the state is set to ICE_ARFS_INACTIVE so the filter can be updated in HW by the ice_service_task path. 2. aRFS entries have their Rx Queue updated if we receive a pre-existing flow_id and the filter state is ICE_ARFS_ACTIVE. The state is set to ICE_ARFS_INACTIVE so the filter can be updated in HW by the ice_service_task path. 3. aRFS entries marked as ICE_ARFS_TODEL are deleted In the ice_service_task path the following things can happen: 1. New aRFS entries marked as ICE_ARFS_INACTIVE are added or updated in HW. and their state is updated to ICE_ARFS_ACTIVE. 2. aRFS entries are deleted from HW and their state is updated to ICE_ARFS_TODEL. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-12 09:01:46 +08:00
void ice_service_task_schedule(struct ice_pf *pf);
/**
* ice_set_rdma_cap - enable RDMA support
* @pf: PF struct
*/
static inline void ice_set_rdma_cap(struct ice_pf *pf)
{
if (pf->hw.func_caps.common_cap.rdma && pf->num_rdma_msix) {
set_bit(ICE_FLAG_RDMA_ENA, pf->flags);
set_bit(ICE_FLAG_AUX_ENA, pf->flags);
ice_plug_aux_dev(pf);
}
}
/**
* ice_clear_rdma_cap - disable RDMA support
* @pf: PF struct
*/
static inline void ice_clear_rdma_cap(struct ice_pf *pf)
{
ice_unplug_aux_dev(pf);
clear_bit(ICE_FLAG_RDMA_ENA, pf->flags);
clear_bit(ICE_FLAG_AUX_ENA, pf->flags);
}
#endif /* _ICE_H_ */