52 Commits

Author SHA1 Message Date
Jesse Brandeburg
d59684a07e ice: refactor ITR data structures
Use a dedicated bitfield in order to both increase
the amount of checking around the length of ITR writes
as well as simplify the checks of dynamic mode.

Basically unpack the "high bit means dynamic" logic
into bitfields.

Also, remove some unused ITR defines.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-14 17:00:06 -07:00
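As a rough illustration of the refactor described in d59684a07e, the sketch below contrasts hiding a "dynamic" flag in the high bit of the ITR value with keeping the mode in its own field. The flag position (0x8000), the 13-bit limit and every name here are assumptions made for the example; they are not taken from the driver source.

```python
from dataclasses import dataclass

LEGACY_DYNAMIC_FLAG = 0x8000  # assumed: top bit of a 16-bit value means "dynamic"
ITR_MAX = 0x1FFF              # assumed width of the setting field


def legacy_is_dynamic(itr_setting: int) -> bool:
    """Old style: the mode is hidden in the high bit of the setting."""
    return bool(itr_setting & LEGACY_DYNAMIC_FLAG)


@dataclass
class RingContainer:
    """New style: the setting and the mode live in separate fields."""
    itr_setting: int
    itr_mode_dynamic: bool

    def __post_init__(self) -> None:
        # With a dedicated field, the length of the value is easy to check.
        if not 0 <= self.itr_setting <= ITR_MAX:
            raise ValueError("ITR setting does not fit in the register field")


def unpack_legacy(itr_setting: int) -> RingContainer:
    """Split a legacy packed value into explicit fields."""
    return RingContainer(
        itr_setting=itr_setting & ~LEGACY_DYNAMIC_FLAG,
        itr_mode_dynamic=legacy_is_dynamic(itr_setting),
    )
```

With explicit fields, a check of dynamic mode is a plain boolean read instead of a mask operation on the setting.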
Jacob Keller
cdf1f1f169 ice: replace custom AIM algorithm with kernel's DIM library
The ice driver has support for adaptive interrupt moderation, an
algorithm for tuning the interrupt rate dynamically. This algorithm
is based on various assumptions about ring size, socket buffer size,
link speed, SKB overhead, ethernet frame overhead and more.

The Linux kernel has support for a dynamic interrupt moderation
algorithm known as "dimlib". Replace the custom driver-specific
implementation of dynamic interrupt moderation with the kernel's
algorithm.

The Intel hardware has a different hardware implementation than the
originators of the dimlib code had to work with, which requires the
driver to use a slightly different set of inputs for the actual
moderation values, while getting all the advice from dimlib of
better/worse, shift left or right.

The change made for this implementation is to use a pair of values
for each of the 5 "slots" that the dimlib moderation expects, and
the driver will program those pairs when dimlib recommends a slot to
use. The current implementation uses two tables, one for receive
and one for transmit, and the pairs of values in each slot set the
maximum delay of an interrupt and a maximum number of interrupts per
second (both expressed in microseconds).

Using DIMLIB fixes two separate kinds of bugs: UDP single stream
send was too slow, and 8K ping-pong was going to the most aggressive
moderation and had much too high latency.

The overall result of using DIMLIB is that we meet or exceed our
performance expectations set based on the old algorithm.

Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-14 17:00:05 -07:00
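The "pair of values per slot" idea from cdf1f1f169 can be pictured as a small lookup table indexed by the slot that dimlib recommends. This is only a sketch: the table contents, the field names and the helper are hypothetical, and the real driver programs hardware registers rather than returning a tuple.

```python
from typing import NamedTuple, Sequence

NUM_SLOTS = 5  # the commit message says dimlib expects 5 slots


class ModerationPair(NamedTuple):
    itr_usecs: int    # maximum delay of an interrupt
    intrl_usecs: int  # rate limit applied to interrupts


# Hypothetical values only, one table per direction.
RX_PROFILE = (
    ModerationPair(2, 8),
    ModerationPair(8, 16),
    ModerationPair(16, 62),
    ModerationPair(62, 126),
    ModerationPair(126, 256),
)
TX_PROFILE = (
    ModerationPair(2, 40),
    ModerationPair(8, 16),
    ModerationPair(16, 31),
    ModerationPair(32, 63),
    ModerationPair(64, 128),
)


def pair_for_slot(table: Sequence[ModerationPair], slot: int) -> ModerationPair:
    """Return the pair the driver would program for the recommended slot."""
    if len(table) != NUM_SLOTS:
        raise ValueError("profile table must have exactly 5 slots")
    if not 0 <= slot < NUM_SLOTS:
        raise IndexError("slot out of range")
    return table[slot]
```

Keeping the decision (which slot) in dimlib and the values (which pair) in the driver is what lets the hardware-specific inputs differ without changing the shared algorithm.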
Anirudh Venkataramanan
51fe27e179 ice: Remove rx_gro_dropped stat
Tracking of the rx_gro_dropped statistic was removed in
commit f73fc40327 ("ice: drop dead code in ice_receive_skb()").
Remove the associated variables and its reporting to ethtool stats.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07 17:09:16 -07:00
Paul M Stillwell Jr
2ec5638559 ice: handle increasing Tx or Rx ring sizes
There is an issue when the Tx or Rx ring size increases using
'ethtool -L ...' where the new rings don't get the correct ITR
values because when we rebuild the VSI we don't know that some
of the rings may be new.

Fix this by looking at the original number of rings and
determining if the rings in ice_vsi_rebuild_set_coalesce()
were not present in the original rings received in
ice_vsi_rebuild_get_coalesce().

Also change the code to return an error if we can't allocate
memory for the coalesce data in ice_vsi_rebuild().

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-31 14:21:27 -07:00
Benita Bose
634da4c118 ice: Add Support for XPS
Enable and configure XPS. The driver sets up the Transmit Packet
Steering Map, which the kernel then uses for queue selection during Tx.

Signed-off-by: Benita Bose <benita.bose@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-31 14:21:27 -07:00
Maciej Fijalkowski
f1b1f409bf ice: store the result of ice_rx_offset() onto ice_ring
Output of ice_rx_offset() is based on ethtool's priv flag setting, which
when changed, causes PF reset (disables napi, frees irqs, loads
different Rx mem model, etc.). This means that within napi its result is
constant and there is no reason to call it per each processed frame.

Add new 'rx_offset' field to ice_ring that is meant to hold the
ice_rx_offset() result and use it within ice_clean_rx_irq().
Furthermore, use it within ice_alloc_mapped_page().

Reviewed-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-12 10:36:57 -08:00
Maciej Fijalkowski
29b82f2a09 ice: move skb pointer from rx_buf to rx_ring
A similar thing has been done in i40e, as there is no real need for having
the sk_buff pointer in each rx_buf. Non-eop frames can be simply handled
on that pointer moved upwards to rx_ring.

Reviewed-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-12 10:18:40 -08:00
Jesse Brandeburg
1d9f7ca324 ice: fix writeback enable logic
The writeback enable logic was incorrectly implemented (due to
misunderstanding what the side effects of the implementation would be
during polling).

Fix this logic issue, while implementing a new feature allowing the user
to control the writeback frequency using the knobs for controlling
interrupt throttling that we already have.  Basically, if you leave
adaptive interrupts enabled, the writeback frequency will be varied even
if busy polling or napi-poll is in use.  If the interrupt rates are
set to a fixed value by ethtool -C and adaptive is off, the driver will
allow the user-set interrupt rate to guide how frequently the hardware
will complete descriptors to the driver.

Effectively the user will get a control over the hardware efficiency,
allowing the choice between immediate interrupts or delayed up to a
maximum of the interrupt rate, even when interrupts are disabled
during polling.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Co-developed-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-08 16:27:01 -08:00
Jesse Brandeburg
b50f7bca5e intel-ethernet: clean up W=1 warnings in kdoc
This takes care of all of the trivial W=1 fixes in the Intel
Ethernet drivers, which allows developers and maintainers to
build more of the networking tree with more complete warning
checks.

There are three classes of kdoc warnings fixed:
 - cannot understand function prototype: 'x'
 - Excess function parameter 'x' description in 'y'
 - Function parameter or member 'x' not described in 'y'

All of the changes were trivial comment updates on
function headers.

Inspired by Lee Jones' series of wireless work to do the same.
Compile tested only, and passes simple test of
$ git ls-files *.[ch] | egrep drivers/net/ethernet/intel | \
  xargs scripts/kernel-doc -none

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-25 16:28:59 -07:00
Magnus Karlsson
1742b3d528 xsk: i40e: ice: ixgbe: mlx5: Pass buffer pool to driver instead of umem
Replace the explicit umem reference passed to the driver in AF_XDP
zero-copy mode with the buffer pool instead. This is in preparation for
extending the functionality of the zero-copy mode so that umems can be
shared between queues on the same netdev and also between netdevs. In
this commit, only a umem reference has been added to the buffer pool
struct. But later commits will add other entities to it. These are
going to be entities that are different between different queue ids
and netdevs even though the umem is shared between them.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1598603189-32145-2-git-send-email-magnus.karlsson@intel.com
2020-08-31 21:15:03 +02:00
Jesse Brandeburg
a8fffd7ae9 ice: add useful statistics
Display and count some useful hot-path statistics. The usefulness is as
follows:

- tx_restart: use to determine if the transmit ring size is too small or
  if the transmit interrupt rate is too low.
- rx_gro_dropped: use to count drops from GRO layer, which previously were
  completely uncounted when occurring.
- tx_busy: use to determine when the driver is miscounting number of
  descriptors needed for an skb.
- tx_timeout: as in our other drivers, count the number of times we've reset
  due to timeout because the kernel only prints a warning once per netdev.

Several of these were already counted but not displayed.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-08-01 08:44:04 -07:00
Jesse Brandeburg
a4c493fea5 ice: remove page_reuse statistic
The page reuse statistic wasn't even being displayed to the user, even
though the driver counted it. Don't waste the struct space and hot-path
cycles since the driver doesn't display it.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-08-01 08:43:59 -07:00
Jesse Brandeburg
22bef5e78f ice: fix signed vs unsigned comparisons
Fix the remaining signed vs unsigned issues, which appear
when compiling with -Werror=sign-compare.

Many of these are because there is an external interface that is passing
an int to us (which we can't change) but that we (rightfully) store
and compare against as an unsigned in our data structures.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-27 17:02:47 -07:00
David S. Miller
2b1a7f741a Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2020-05-22

This series contains updates to virtchnl and the ice driver.

Geert Uytterhoeven fixes a data structure alignment issue in the
virtchnl structures.

Henry adds Flow Director support, which allows for redirection based on
ntuple rules, over six patches.  Henry first adds the initial
infrastructure for Flow Director, and then later adds IPv4 and IPv6
support, as well as being able to display the ntuple rules.

Brett adds Accelerated Receive Flow Steering (aRFS) support which is used
to steer receive flows to a specific queue.  Fixes a transmit timeout
when the VF link transitions from up/down/up because the transmit and
receive queue interrupts are not enabled as part of VF's link up.  Fixed
an issue when the default VF LAN address is changed and after reset the
PF will attempt to add the new MAC, which fails because it already
exists. This causes the VF to be disabled completely until it is removed
and enabled via sysfs.

Anirudh (Ani) makes a fix where the ice driver needs to call set_mac_cfg
to enable jumbo frames, so ensure it gets called during initialization
and after reset.  Fix bad register reads during a register dump in
ethtool by removing the bad registers.

Paul fixes an issue where the receive Malicious Driver Detection (MDD)
auto reset message was not being logged because it occurred after the VF
reset.

Victor adds a check for compatibility between the Dynamic Device
Personalization (DDP) package and the NIC firmware to ensure that
everything aligns.

Jesse fixes an administrative queue string call with the appropriate
error reporting variable.  Also fixed the loop variables that are
comparing or assigning signed against unsigned values.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-23 16:51:26 -07:00
Henry Tieman
cac2a27cd9 ice: Support IPv4 Flow Director filters
Support the addition and deletion of IPv4 filters.

Supported fields are: src-ip, dst-ip, src-port, and dst-port
Supported flow-types are: tcp4, udp4, sctp4, ip4

Example usage:

ethtool -N eth0 flow-type tcp4 src-ip 192.168.0.55 dst-ip 172.16.0.55 \
src-port 16 dst-port 12 action 32

Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-22 21:36:27 -07:00
Henry Tieman
148beb6120 ice: Initialize Flow Director resources
Flow Director allows for redirection based on ntuple rules. Rules are
programmed using the ethtool set-ntuple interface. Supported actions are
redirect to queue and drop.

Setup the initial framework to process Flow Director filters. Create and
allocate resources to manage and program filters to the hardware. Filters
are processed via a sideband interface; a control VSI is created to manage
communication and process requests through the sideband. Upon allocation of
resources, update the hardware tables to accept perfect filters.

Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-22 21:26:37 -07:00
David S. Miller
a152b85984 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-05-23

The following pull-request contains BPF updates for your *net-next* tree.

We've added 50 non-merge commits during the last 8 day(s) which contain
a total of 109 files changed, 2776 insertions(+), 2887 deletions(-).

The main changes are:

1) Add a new AF_XDP buffer allocation API to the core in order to help
   lowering the bar for drivers adopting AF_XDP support. i40e, ice, ixgbe
   as well as mlx5 have been moved over to the new API and also gained a
   small improvement in performance, from Björn Töpel and Magnus Karlsson.

2) Add getpeername()/getsockname() attach types for BPF sock_addr programs
   in order to allow for e.g. reverse translation of load-balancer backend
   to service address/port tuple from a connected peer, from Daniel Borkmann.

3) Improve the BPF verifier is_branch_taken() logic to evaluate pointers
   being non-NULL, e.g. if after an initial test another non-NULL test on
   that pointer follows in a given path, then it can be pruned right away,
   from John Fastabend.

4) Larger rework of BPF sockmap selftests to make output easier to understand
   and to reduce overall runtime as well as adding new BPF kTLS selftests
   that run in combination with sockmap, also from John Fastabend.

5) Batch of misc updates to BPF selftests including fixing up test_align
   to match verifier output again and moving it under test_progs, allowing
   bpf_iter selftest to compile on machines with older vmlinux.h, and
   updating config options for lirc and v6 segment routing helpers, from
   Stanislav Fomichev, Andrii Nakryiko and Alan Maguire.

6) Conversion of BPF tracing samples outdated internal BPF loader to use
   libbpf API instead, from Daniel T. Lee.

7) Follow-up to BPF kernel test infrastructure in order to fix a flake in
   the XDP selftests, from Jesper Dangaard Brouer.

8) Minor improvements to libbpf's internal hashmap implementation, from
   Ian Rogers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 18:30:34 -07:00
Tony Nguyen
a4e82a81f5 ice: Add support for tunnel offloads
Create a boost TCAM entry for each tunnel port in order to get a tunnel
PTYPE. Update netdev feature flags and implement the appropriate logic to
get and set values for hardware offloads.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:03 -07:00
Björn Töpel
175fc43067 ice, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL
Remove MEM_TYPE_ZERO_COPY in favor of the new MEM_TYPE_XSK_BUFF_POOL
APIs.

v4->v5: Fixed "warning: Excess function parameter 'alloc' description
        in 'ice_alloc_rx_bufs_zc'" and "warning: Excess function
        parameter 'xdp' description in
        'ice_construct_skb_zc'". (Jakub)

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: intel-wired-lan@lists.osuosl.org
Link: https://lore.kernel.org/bpf/20200520192103.355233-10-bjorn.topel@gmail.com
2020-05-21 17:31:26 -07:00
Brett Creeley
840f8ad0aa ice: Don't reject odd values of usecs set by user
Currently if a user sets an odd [tx|rx]-usecs value through ethtool,
the request is denied because the hardware is set to have an ITR
granularity of 2us. This caused poor customer experience. Fix this by
aligning to a register allowed value, which results in rounding down.
Also, print a once per ring container type message to be clear about
our intentions.

Also, change the ITR_TO_REG define to be the bitwise and of the ITR
setting and the ICE_ITR_MASK. This makes the purpose of ITR_TO_REG more
obvious.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-19 11:50:41 -08:00
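The rounding described in 840f8ad0aa comes down to clearing the low bit of the requested value so that it lands on the 2 us granularity. The sketch below assumes a mask of 0x1FFE for ICE_ITR_MASK; the commit message names the define but not its value, so treat the constant as a guess.

```python
ITR_GRAN_US = 2    # granularity stated in the commit message
ITR_MASK = 0x1FFE  # assumed value of ICE_ITR_MASK (bit 0 cleared)


def itr_to_reg(itr_usecs: int) -> int:
    """Bitwise AND of the ITR setting and the mask; odd values round down."""
    return itr_usecs & ITR_MASK


def was_rounded(itr_usecs: int) -> bool:
    """True when the request was not aligned and a notice would be useful."""
    return itr_to_reg(itr_usecs) != itr_usecs
```

For example, a request of 7 usecs becomes 6 usecs under this mask, while 8 usecs is left unchanged.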
Tony Nguyen
4ee656bba8 ice: Trivial fixes
This is a collection of trivial fixes including fixing whitespace, typos,
function headers, reverse Christmas tree, etc.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-12 11:49:12 -08:00
Michal Swiatkowski
61dc79ced7 ice: Restore interrupt throttle settings after VSI rebuild
After each rebuild the driver deallocates q_vectors, so the interrupt
throttle rate (ITR) settings get lost.

Create a function to save and restore ITR for each queue. If a user
increases the number of queues, restore all the previous queue
settings for each existing queue, and the additional queues will
get the default setting.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-01-03 16:08:33 -08:00
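The save and restore behaviour described in 61dc79ced7 can be sketched as follows. The structure, the default value and the function names are hypothetical; the point is only that existing queues get their old settings back and added queues fall back to a default.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_ITR = 50  # hypothetical default setting


@dataclass
class Coalesce:
    itr_tx: int = DEFAULT_ITR
    itr_rx: int = DEFAULT_ITR


def save_coalesce(q_vectors: List[Coalesce]) -> List[Coalesce]:
    """Copy the per-queue settings before the q_vectors are freed."""
    return [Coalesce(q.itr_tx, q.itr_rx) for q in q_vectors]


def restore_coalesce(saved: List[Coalesce], new_count: int) -> List[Coalesce]:
    """Rebuild settings for new_count queues after the rebuild."""
    # Queues that existed before the rebuild keep their previous settings.
    restored = [Coalesce(s.itr_tx, s.itr_rx) for s in saved[:new_count]]
    # Any additional queues start from the default.
    restored += [Coalesce() for _ in range(new_count - len(restored))]
    return restored
```

Usage would be to call save_coalesce() before the rebuild and restore_coalesce() afterwards with the new queue count.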
Maciej Fijalkowski
59bb080805 ice: introduce frame padding computation logic
Take into account the underlying architecture specific settings and
based on that calculate the possible padding that can be supplied.
Typically, for x86 and standard MTU size we will end up with 192 bytes
of headroom. This is the same behavior as our other drivers have and we
can dedicate it for XDP purposes.

Furthermore, introduce the Rx ring flag for indicating whether build_skb
is used on a particular ring. Based on that, invoke the routines for
padding calculation.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04 13:09:50 -08:00
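One plausible way to arrive at the 192-byte headroom mentioned in 59bb080805 is shown below. The formula and the constants (a 4096-byte page, a 64-byte cache line, 320 bytes for the shared info area, a 1536-byte buffer) are assumptions for x86_64, not values quoted from the patch.

```python
SMP_CACHE_BYTES = 64    # assumed cache line size on x86
SHARED_INFO_SIZE = 320  # assumed size of the skb shared info area on x86_64


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def rx_headroom(rx_buf_len: int, page_size: int = 4096) -> int:
    """Padding left in a half-page slot once the buffer and the
    shared info area have been accounted for."""
    slot = align_up(rx_buf_len, page_size // 2)
    usable = slot - align_up(SHARED_INFO_SIZE, SMP_CACHE_BYTES)
    return usable - rx_buf_len
```

Under these assumptions a 1536-byte buffer sits in a 2048-byte slot, 320 bytes go to the shared info area, and the remaining 192 bytes are available as headroom.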
Maciej Fijalkowski
7237f5b0db ice: introduce legacy Rx flag
Add an ethtool "legacy-rx" priv flag for toggling the Rx path. This
control knob will be mainly used for build_skb usage as well as buffer
size/MTU manipulation.

This is in preparation for adding build_skb support, and it takes
care of how we set the values of the max_frame and rx_buf_len fields of
struct ice_vsi. Specifically, in this patch the mentioned fields are set
to values that will allow us to provide headroom and tailroom in-place.

This can mostly be broken down into the following:
- for legacy-rx "on" ethtool control knob, old behaviour is kept;
- for standard 1500 MTU size configure the buffer of size 1536, as
  network stack is expecting the NET_SKB_PAD to be provided and
  NET_IP_ALIGN can have a non-zero value (these can be typically equal
  to 32 and 2, respectively);
- for larger MTUs go with max_frame set to 9k and configure the 3k
  buffer in case when PAGE_SIZE of underlying arch is less than 8k; 3k
  buffer is implying the need for order 1 page, so that our page
  recycling scheme can still be applied;

With that said, substitute the hardcoded ICE_RXBUF_2048 and PAGE_SIZE
values in DMA API that we're making use of with rx_ring->rx_buf_len and
ice_rx_pg_size(rx_ring). The latter is an introduced helper for
determining the page size based on its order (which was figured out via
ice_rx_pg_order). Last but not least, take care of truesize calculation.

In the followup patch the headroom/tailroom computation logic will be
introduced.

This change aligns the buffer and frame configuration with other Intel
drivers, most importantly with iavf.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04 13:09:46 -08:00
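The relationship between buffer length, page order and page size described in 7237f5b0db can be sketched like this. The helper names mirror ice_rx_pg_order and ice_rx_pg_size from the commit message, but the bodies are an assumption about how such helpers could work, not a copy of the driver.

```python
RXBUF_1536 = 1536  # buffer for the standard 1500 MTU case
RXBUF_3072 = 3072  # "3k" buffer for larger MTUs


def rx_pg_order(rx_buf_len: int, page_size: int) -> int:
    """Order 1 when a small-page arch cannot fit the buffer in half a page."""
    if page_size < 8192 and rx_buf_len > page_size // 2:
        return 1
    return 0


def rx_pg_size(rx_buf_len: int, page_size: int) -> int:
    """Page size derived from the order, as the commit message describes."""
    return page_size << rx_pg_order(rx_buf_len, page_size)
```

With 4 KiB pages, a 3k buffer needs an order 1 (8 KiB) allocation so that the half-page recycling scheme still has two buffers per allocation, while a 1536-byte buffer stays on an order 0 page.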
Krzysztof Kazimierczak
2d4238f556 ice: Add support for AF_XDP
Add zero copy AF_XDP support.  This patch adds zero copy support for
Tx and Rx; code for zero copy is added to ice_xsk.h and ice_xsk.c.

For Tx, implement ndo_xsk_wakeup. As with other drivers, reuse
existing XDP Tx queues for this task, since XDP_REDIRECT guarantees
mutual exclusion between different NAPI contexts based on CPU ID. In
turn, a netdev can XDP_REDIRECT to another netdev with a different
NAPI context, since the operation is bound to a specific core and each
core has its own hardware ring.

For Rx, allocate frames as MEM_TYPE_ZERO_COPY on queues where AF_XDP is
enabled.

Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
Co-developed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04 12:01:55 -08:00
Krzysztof Kazimierczak
0891d6d4b1 ice: Move common functions to ice_txrx_lib.c
In preparation for AF_XDP, move functions that will be used both by skb and
zero-copy paths to a new file called ice_txrx_lib.c.  This allows us to
avoid using ifdefs to control the staticness of said functions.

Move other functions (ice_rx_csum, ice_rx_hash and ice_ptype_to_htype)
called only by the moved ones to the new file as well.

Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04 11:45:05 -08:00
Maciej Fijalkowski
efc2214b60 ice: Add support for XDP
Add support for XDP. Implement ndo_bpf and ndo_xdp_xmit.  Upon load of
an XDP program, allocate additional Tx rings for dedicated XDP use.
The following actions are supported: XDP_TX, XDP_DROP, XDP_REDIRECT,
XDP_PASS, and XDP_ABORTED.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04 10:23:59 -08:00
Anirudh Venkataramanan
eff380aaff ice: Introduce ice_base.c
Remove a few uses of kernel configuration flags from ice_lib.c by
introducing a new source file ice_base.c. Also move corresponding
function prototypes from ice_lib.h to ice_base.h and include ice_base.h
where required.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04 10:03:14 -08:00
Brett Creeley
2ab28bb04c ice: Set WB_ON_ITR when we don't re-enable interrupts
Currently when busy polling is enabled we aren't setting/enabling
WB_ON_ITR in the driver. This doesn't break the driver, but it does
cause issues. If we don't enable WB_ON_ITR mode we will still get
write-backs from hardware during polling when a cache line has been
filled, but if a cache line is not filled we will not get the
write-back because WB_ON_ITR is not set. Fix this by enabling
WB_ON_ITR in the driver when interrupts are disabled.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-20 14:21:21 -07:00
Jesse Brandeburg
0ab54c5f2f ice: Use bitfields when possible
We can use bit fields to store boolean values and when the
bit fields are next to each other, the compiler will combine them
(as long as the size holds enough).

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-23 10:51:54 -07:00
Jesse Brandeburg
65124bbf98 ice: Reorganize tx_buf and ring structs
Use more efficient structure ordering by using the pahole tool
and a lot of code inspection to get hot cache lines to have
packed data (no holes if possible) and adjacent warm data.

ice_ring prior to this change:
  /* size: 192, cachelines: 3, members: 23 */
  /* sum members: 158, holes: 4, sum holes: 12 */
  /* padding: 22 */

ice_ring after this change:
  /* size: 192, cachelines: 3, members: 25 */
  /* sum members: 162, holes: 1, sum holes: 1 */
  /* padding: 29 */

ice_tx_buf prior to this change:
  /* size: 48, cachelines: 1, members: 7 */
  /* sum members: 38, holes: 2, sum holes: 6 */
  /* padding: 4 */
  /* last cacheline: 48 bytes */

ice_tx_buf after this change:
  /* size: 40, cachelines: 1, members: 7 */
  /* sum members: 38, holes: 1, sum holes: 2 */
  /* last cacheline: 40 bytes */

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-23 10:51:54 -07:00
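The effect of member ordering that pahole reports in 65124bbf98 can be reproduced on a toy structure. The two layouts below are invented for the example and have nothing to do with the real ice_ring or ice_tx_buf members; ctypes is used only because it follows the platform's C layout rules.

```python
import ctypes


class Unordered(ctypes.Structure):
    # A 1-byte member before a 4-byte member forces a 3-byte hole.
    _fields_ = [
        ("flag_a", ctypes.c_uint8),
        ("count", ctypes.c_uint32),
        ("flag_b", ctypes.c_uint8),
    ]


class Reordered(ctypes.Structure):
    # Largest member first, small members packed together afterwards.
    _fields_ = [
        ("count", ctypes.c_uint32),
        ("flag_a", ctypes.c_uint8),
        ("flag_b", ctypes.c_uint8),
    ]


def wasted_bytes(struct_type) -> int:
    """Holes plus tail padding: total size minus the sum of member sizes."""
    members = sum(ctypes.sizeof(ftype) for _, ftype in struct_type._fields_)
    return ctypes.sizeof(struct_type) - members
```

Both structures hold the same 6 bytes of data, but the first layout occupies 12 bytes and the second 8, which is the kind of saving the pahole output in the commit message records.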
Brett Creeley
b9c8bb06b5 ice: Add ability to update rx-usecs-high
Currently the driver allows rx-usecs-high values to be set,
but when querying the device for rx-usecs-high the value
does not stick. This is because it was not yet implemented.
Add code to allow the user to change rx-usecs-high and
use this to set the q_vector's intrl value.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-02 01:13:39 -07:00
Anirudh Venkataramanan
5f6aa50e4e ice: Add priority information into VLAN header
This patch introduces a new function ice_tx_prepare_vlan_flags_dcb to
insert 802.1p priority information into the VLAN header.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-18 08:38:47 -07:00
Anirudh Venkataramanan
a629cf0a01 ice: Update rings based on TC information
This patch adds a new function ice_vsi_cfg_dcb_rings which updates a
VSI's rings based on DCB traffic class information.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-18 08:38:47 -07:00
Brett Creeley
92414f3292 ice: Update comment regarding the ITR_GRAN_S
Since the driver now hard codes the ITR granularity to 2 us in the
GLINT_CTL register the comment next to ITR_GRAN_S needs to be updated.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-26 15:22:44 -07:00
Brett Creeley
8244dd2d23 ice: Audit hotpath structures with pahole
Currently the ice_q_vector structure and ice_ring_container structure
are taking up more space than necessary due to cache alignment holes
and unnecessary variables respectively. This is not helping the
driver's performance. The following fixes were done to improve cache
alignment, reduce wasted space, and increase performance.

1. Remove the ice_latency_range enum as it is unused.
2. Remove the latency_range variable in the ice_ring_container structure.
3. Change the size of the itr_idx in the ice_ring_container structure
   from an int to a u16. This reduced the size of ice_ring_container
   structure to 32 Bytes so it has no holes or padding.
4. Re-arrange the ice_q_vector structure using pahole to align
   members as best as possible in regards to 64 Byte cache line size.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-26 15:03:25 -07:00
Anirudh Venkataramanan
64a59d05a4 ice: Fix for adaptive interrupt moderation
commit 63f545ed12 ("ice: Add support for adaptive interrupt moderation")
was meant to add support for adaptive interrupt moderation but there was
an error on my part while formatting the patch, and thus only part of the
patch ended up being submitted.

This patch rectifies the error by adding the rest of the code.

Fixes: 63f545ed12 ("ice: Add support for adaptive interrupt moderation")
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-26 14:03:01 -07:00
Maciej Fijalkowski
a65f71fed5 ice: map Rx buffer pages with DMA attributes
Provide DMA_ATTR_WEAK_ORDERING and DMA_ATTR_SKIP_CPU_SYNC attributes to
the DMA API during the mapping operations on Rx side. With this change
the non-x86 platforms will be able to sync only with what is being used
(2k buffer) instead of entire page. This should yield a slight
performance improvement.

Furthermore, DMA unmap may destroy the changes that were made to the
buffer by the CPU when the platform is not an x86 one. DMA_ATTR_SKIP_CPU_SYNC
attribute usage fixes this issue.

Also add a sync_single_for_device call during the Rx buffer assignment,
to make sure that the cache lines are cleared before the device attempts
to write to the buffer.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-25 10:10:39 -07:00
Maciej Fijalkowski
03c66a1376 ice: Introduce bulk update for page count
{get,put}_page are atomic operations which we use for page count
handling. The current logic for refcount handling is that we increment
it when passing a skb with the data from the first half of page up to
netstack and recycle the second half of page. This operation protects us
from losing a page since the network stack can decrement the refcount of
page from skb.

The performance can be slightly improved by doing bulk updates of the
refcount instead of updating it one by one. During the buffer
initialization, maximize the page's refcount and don't allow the refcount
to become less than two.
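
The bulk update idea can be sketched in user space as follows. This is a
simplified model and not the driver code: struct sim_page, struct
sim_rx_buf, the pagecnt_bias field name, the USHRT_MAX starting value and
the top-up threshold are all assumptions made for illustration, and plain
integer arithmetic stands in for the atomic page refcount operations.

```c
#include <limits.h>

/* Simulated page: only the reference count matters for this sketch. */
struct sim_page {
	unsigned int refcount;
};

/*
 * Simulated Rx buffer. pagecnt_bias (hypothetical name) counts the
 * references that the driver still owns and has not handed out.
 */
struct sim_rx_buf {
	struct sim_page *page;
	unsigned int pagecnt_bias;
};

/* Buffer init: take a large batch of references with one update. */
static void sim_rx_buf_init(struct sim_rx_buf *buf, struct sim_page *page)
{
	page->refcount = 1;			/* reference from allocation */
	page->refcount += USHRT_MAX - 1;	/* single bulk add */
	buf->page = page;
	buf->pagecnt_bias = USHRT_MAX;
}

/*
 * Hand one reference to the network stack. The page refcount itself is
 * not touched; only the driver-local bias shrinks. When the bias is
 * about to run out, it is restocked with another bulk add.
 */
static void sim_rx_buf_give_to_stack(struct sim_rx_buf *buf)
{
	buf->pagecnt_bias--;
	if (buf->pagecnt_bias == 1) {
		buf->page->refcount += USHRT_MAX - 1;
		buf->pagecnt_bias = USHRT_MAX;
	}
}

/* The stack drops one of the references it was given. */
static void sim_stack_put(struct sim_page *page)
{
	page->refcount--;
}

/* References currently held outside the driver. */
static unsigned int sim_refs_outside(const struct sim_rx_buf *buf)
{
	return buf->page->refcount - buf->pagecnt_bias;
}
```

In this model the page can be recycled when sim_refs_outside() returns 0,
and the per-packet cost on the driver side is a local decrement rather
than a refcount update on the page.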

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-25 09:33:13 -07:00
Brett Creeley
70457520ba ice: configure GLINT_ITR to always have an ITR gran of 2
Instead of hoping that our ITR granularity will be 2 usecs, program the
GLINT_CTL register to make sure the ITR granularity is always 2 usecs.

Now that we know what the ITR granularity will be, get rid of the check
in ice_probe() that verified our previous assumption.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-19 16:56:10 -07:00
Brett Creeley
67fe64d78c ice: Implement getting and setting ethtool coalesce
This patch includes the following ethtool operations:

1. get_coalesce
2. set_coalesce
3. get_per_q_coalesce
4. set_per_q_coalesce

Each ITR value (current_itr/target_itr) is stored on a per
ice_ring_container basis. This is because each valid ice_ring_container
can have 1 or more rings that are tied to the same q_vector ITR index.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-01-15 11:50:05 -08:00
Brett Creeley
63f545ed12 ice: Add support for adaptive interrupt moderation
Currently the driver does not support adaptive/dynamic interrupt
moderation. This patch adds support for this. Also, adaptive/dynamic
interrupt moderation is turned on by default upon driver load.

In order to support adaptive interrupt moderation, two functions were
added, ice_update_itr() and ice_itr_divisor(). These are used to
determine the current packet load and to determine a divisor based
on link speed respectively.

This patch also adds the ICE_ITR_GRAN_S define that is used in the
hot-path when setting a new ITR value. The shift is used to pet two
birds with one hand: set the ITR value while re-enabling the
interrupt. Also, ICE_ITR_GRAN_S is defined as 1 because the device
has an ITR granularity of 2 usecs.
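
The role of the shift can be illustrated with a minimal user-space sketch.
Only ICE_ITR_GRAN_S and its value of 1 come from this message. The
register layout (SIM_DYN_CTL_INTENA, SIM_DYN_CTL_INTERVAL_S) and the helper
names are hypothetical and chosen for illustration only; nothing here
touches real hardware.

```c
#include <stdint.h>

/* From the commit message: 2 usec granularity, so the shift is 1. */
#define ICE_ITR_GRAN_S		1

/* Hypothetical register layout, assumed for this sketch only. */
#define SIM_DYN_CTL_INTENA	(1u << 0)	/* interrupt enable bit */
#define SIM_DYN_CTL_INTERVAL_S	5		/* offset of interval field */

/* Convert an interval in usecs to 2-usec hardware units. */
static inline uint32_t sim_itr_usecs_to_units(uint32_t usecs)
{
	return usecs >> ICE_ITR_GRAN_S;
}

/*
 * Build one register value that both re-enables the interrupt and
 * carries the new interval. Shifting left by (field offset - granularity
 * shift) divides by 2 and positions the field in a single operation.
 * The caller is assumed to pass an even number of usecs; an odd value
 * would leak its low bit below the interval field.
 */
static inline uint32_t sim_build_dyn_ctl(uint32_t itr_usecs)
{
	return SIM_DYN_CTL_INTENA |
	       (itr_usecs << (SIM_DYN_CTL_INTERVAL_S - ICE_ITR_GRAN_S));
}
```

For an even interval such as 50 usecs, sim_build_dyn_ctl(50) yields the
same value as placing sim_itr_usecs_to_units(50) into the interval field
and setting the enable bit separately.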

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-01-15 11:29:16 -08:00
Brett Creeley
c585ea42ec ice: Fix tx_timeout in PF driver
Prior to this commit the driver was running into tx_timeouts when a
queue was stressed enough. This was happening because the HW tail
and SW tail (NTU) were incorrectly out of sync. Consequently this was
causing the HW head to collide with the HW tail, which to the hardware
means that all descriptors posted for Tx have been processed.

Due to the Tx logic used in the driver SW tail and HW tail are allowed
to be out of sync. This is done as an optimization because it allows the
driver to write HW tail as infrequently as possible, while still
updating the SW tail index to keep track. However, there are situations
where this results in the tail never getting updated, resulting in Tx
timeouts.

Tx HW tail write condition:
	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more)
		writel(sw_tail, tx_ring->tail);

An issue was found in the Tx logic that was causing the aforementioned
condition for updating HW tail to never happen, causing tx_timeouts.

In ice_xmit_frame_ring we calculate how many descriptors we need for the
Tx transaction based on the skb the kernel hands us. This is then passed
into ice_maybe_stop_tx along with some extra padding to determine if we
have enough descriptors available for this transaction. If we don't then
we return -EBUSY to the stack, otherwise we move on and eventually
prepare the Tx descriptors accordingly in ice_tx_map and set
next_to_watch. In ice_tx_map we make another call to ice_maybe_stop_tx
with a value of MAX_SKB_FRAGS + 4. The key here is that this value is
possibly less than the value we sent in the first call to
ice_maybe_stop_tx in ice_xmit_frame_ring. Now, if the number of unused
descriptors is between MAX_SKB_FRAGS + 4 and the value used in the first
call to ice_maybe_stop_tx in ice_xmit_frame_ring then we do not update
the HW tail because of the "Tx HW tail write condition" above. This is
because ice_maybe_stop_tx returns success instead of calling
__ice_maybe_stop_tx and subsequently calling netif_stop_subqueue, which
sets the __QUEUE_STATE_DEV_XOFF bit. This bit is then checked in the
"Tx HW tail write condition" by calling netif_xmit_stopped, and HW tail
is updated if the aforementioned bit is set.

In ice_clean_tx_irq, if next_to_watch is not NULL, we end up cleaning
the descriptors on which HW has set the DD bit, as long as we have the
budget. The HW head will eventually run into the HW tail as a result of
the situation described in the paragraph above.

The next time through ice_xmit_frame_ring we make the initial call to
ice_maybe_stop_tx with another skb from the stack. This time we do not
have enough descriptors available and we return NETDEV_TX_BUSY to the
stack and end up setting next_to_watch to NULL.

This is where we are stuck. In ice_clean_tx_irq we never clean anything
because next_to_watch is always NULL and in ice_xmit_frame_ring we never
update HW tail because we already return NETDEV_TX_BUSY to the stack and
eventually we hit a tx_timeout.

This issue was fixed by making sure that the second call to
ice_maybe_stop_tx in ice_tx_map is passed a value that is >= the value
that was used on the initial call to ice_maybe_stop_tx in
ice_xmit_frame_ring. This was done by adding the following defines to
make the logic more clear and to reduce the chance of mucking this up
again:

ICE_CACHE_LINE_BYTES		64
ICE_DESCS_PER_CACHE_LINE	(ICE_CACHE_LINE_BYTES / \
				 sizeof(struct ice_tx_desc))
ICE_DESCS_FOR_CTX_DESC		1
ICE_DESCS_FOR_SKB_DATA_PTR	1

The ICE_CACHE_LINE_BYTES being 64 is an assumption being made so we
don't have to figure this out on every pass through the Tx path. Instead
I added a sanity check in ice_probe that reads the cache line size from
the GLPCI_CNF2 register and prints a message if it's not 64 Bytes. This
will make it easier to file issues if they are seen when the cache line
size is not 64 Bytes.
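
The threshold mismatch described above can be modeled with a small
user-space sketch. The four ICE_* defines follow the list in this message.
Everything else is an assumption for illustration: struct sim_tx_desc is a
stand-in for struct ice_tx_desc (assumed to be 16 bytes), the value of
SIM_MAX_SKB_FRAGS is a placeholder for the kernel's MAX_SKB_FRAGS, and the
sim_* helpers are hypothetical simplifications of the driver logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for struct ice_tx_desc; 16 bytes is an assumption. */
struct sim_tx_desc {
	uint64_t buf_addr;
	uint64_t cmd_type_offset_bsz;
};

#define SIM_MAX_SKB_FRAGS		17	/* placeholder value */

#define ICE_CACHE_LINE_BYTES		64
#define ICE_DESCS_PER_CACHE_LINE	(ICE_CACHE_LINE_BYTES / \
					 sizeof(struct sim_tx_desc))
#define ICE_DESCS_FOR_CTX_DESC		1
#define ICE_DESCS_FOR_SKB_DATA_PTR	1

/* The quoted "Tx HW tail write condition" as a pure function. */
static bool sim_tail_written(bool queue_stopped, bool xmit_more)
{
	return queue_stopped || !xmit_more;
}

/* Simplified check: the queue is stopped when too few descriptors remain. */
static bool sim_maybe_stop(unsigned int unused, unsigned int needed)
{
	return unused < needed;
}

/*
 * True when the second check (in the map step) leaves the queue running
 * although the first check (in the xmit step) would reject the next
 * frame. In that window the tail write can be skipped while no further
 * frame is accepted.
 */
static bool sim_stuck_window(unsigned int unused, unsigned int first_needed,
			     unsigned int second_needed)
{
	return !sim_maybe_stop(unused, second_needed) &&
	       sim_maybe_stop(unused, first_needed);
}
```

With second_needed >= first_needed, sim_stuck_window() can never return
true, which is the property the fix relies on.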

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-06 12:46:47 -08:00
Brett Creeley
d2b464a7ff ice: Add more flexibility on how we assign an ITR index
This issue came about when looking at the VF function
ice_vc_cfg_irq_map_msg. Currently we are assigning the itr_setting value
to the itr_idx received from the AVF driver, which is not correct and is
not used for the VF flow anyway. Currently the only way we set the ITR
index for both the PF and VF driver is by hard coding ICE_TX_ITR or
ICE_RX_ITR for the ITR index on each q_vector.

To fix this, add the member itr_idx in struct ice_ring_container. This
can then be used to dynamically program the correct ITR index. This change
also affected the PF driver so make the necessary changes there as well.

Also, removed the itr_setting member in struct ice_ring because it is not
being used meaningfully and is going to be removed in a future patch that
includes dynamic ITR.

On another note, this will be useful moving forward if we decide to split
Rx/Tx rings on different q_vectors instead of sharing them as queue pairs.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-03 07:42:30 -07:00
Brett Creeley
9e4ab4c29a ice: Add support for dynamic interrupt moderation
Currently there is no support for dynamic interrupt moderation. This
patch adds some initial code to support this. The following changes
were made:

1. Currently we are using multiple members to store the interrupt
   granularity (itr_gran_25/50/100/200). This is not necessary because
   we can query the device to determine what the interrupt granularity
   should be set to, done by a new function ice_get_itr_intrl_gran.

2. Added intrl to ice_q_vector structure to support interrupt rate
   limiting.

3. Added the function ice_intrl_usecs_to_reg for converting a value in
   usecs to a value that the device understands.

4. Added call to write to the GLINT_RATE register. Disable intrl by
   default for now.

5. Changed rx/tx_itr_setting to itr_setting because having both seems
   redundant because a ring is either Tx or Rx.

6. Initialize itr_setting for both Tx/Rx rings in ice_vsi_alloc_rings()

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-02 07:19:30 -07:00
Sudheer Mogilappagari
b3969fd727 ice: Add support for Tx hang, Tx timeout and malicious driver detection
When a malicious operation is detected, the firmware triggers an
interrupt, which is then picked up by the service task (specifically by
ice_handle_mdd_event). A reset is scheduled if required.

Tx hang detection works in a similar way, except the logic here monitors
the VSI's Tx queues and tries to revive them if stalled. If the hang is
not resolved, the kernel eventually calls ndo_tx_timeout, which is
handled by ice_tx_timeout.

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-08-28 10:58:42 -07:00
Bruce Allan
43f8b22450 ice: Change struct members from bool to u8
Recent versions of checkpatch have a new warning based on a documented
preference of Linus to not use bool in structures, because it can waste
space and because the size of bool is implementation dependent. For more
information, see the email thread at https://lkml.org/lkml/2017/11/21/384.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-08-23 11:32:59 -07:00
Anirudh Venkataramanan
d76a60ba7a ice: Add support for VLANs and offloads
This patch adds support for VLANs. When a VLAN is created a switch filter
is added to direct the VLAN traffic to the corresponding VSI. When a VLAN
is deleted, the filter is deleted as well.

This patch also adds support for the following hardware offloads.
    1) VLAN tag insertion/stripping
    2) Receive Side Scaling (RSS)
    3) Tx checksum and TCP segmentation
    4) Rx checksum

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 11:54:49 -07:00
Anirudh Venkataramanan
2b245cb294 ice: Implement transmit and NAPI support
This patch implements ice_start_xmit (the handler for ndo_start_xmit) and
related functions. ice_start_xmit ultimately calls ice_tx_map, where the
Tx descriptor is built and posted to the hardware by bumping the ring tail.

This patch also implements ice_napi_poll, which is invoked when there's an
interrupt on the VSI's queues. The interrupt can be due to either a
completed Tx or an Rx event. In case of a completed Tx/Rx event, resources
are reclaimed. Additionally, in case of an Rx event, the skb is fetched
and passed up to the network stack.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 11:27:05 -07:00
Anirudh Venkataramanan
cdedef59de ice: Configure VSIs for Tx/Rx
This patch configures the VSIs to be able to send and receive
packets by doing the following:

1) Initialize flexible parser to extract and include certain
   fields in the Rx descriptor.

2) Add Tx queues by programming the Tx queue context (implemented in
   ice_vsi_cfg_txqs). Note that adding the queues also enables (starts)
   the queues.

3) Add Rx queues by programming Rx queue context (implemented in
   ice_vsi_cfg_rxqs). Note that this only adds queues but doesn't start
   them. The rings will be started by calling ice_vsi_start_rx_rings on
   interface up.

4) Configure interrupts for VSI queues.

5) Implement ice_open and ice_stop.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 11:18:36 -07:00