When SRIOV is enabled, pf and vfs communicate via shared comm channel.
The vf gets its side of the comm channel via a VF BAR.
Each VF (slave) creates its vHCR (virtual HCA Command Register),
Its DMA address is passed to the PF (master) using Communication Channel Register.
The same Register is used to notify the master of commands posted by the
slaves and for the master to pass events to the slaves, such as command completions
and asynchronous events.
The vHCR format is identical to the HCR format, except for the 'go' and 't' bits,
which are reserved in the vHCR. Posting commands to the vHCR is identical to
the way it is done with the HCR, albeit that the function/PF token fields are
used instead of the HCR go bit.
Specifically:
- When the function prepares a new command in the vHCR, it issues the Post_vHCR_cmd
communication channel command and toggles the value of the function token;
when PF token has an equal value, the command has been accepted and a new command may be posted.
- When the PF detects a Post_vHCR_cmd command, it concludes that a new command is available in the vHCR;
after processing the command, the PF toggles the PF token to match the function token.
When the 'e' bit is not set, the completion of a Post_vHCR_cmd command also indicates
the completion the vHCR command. If, however, the 'e' bit is set, the completion of a
Post_vHCR_cmd command only indicates that the vHCR command has been accepted for execution by the PF.
Function commands are processed by the PF as follows:
-DMA (using the ACCESS_MEM command) the vHCR image into a shadow buffer.
-Validate that the opcode is non-privileged, and that the opcode- and input-modifiers are legal.
-DMA the in-box (if required) into a shadow buffer.
-Validate the command:
o Resource ranges (e.g., QP ranges).
o Partition key.
o Ranges of referenced resources (e.g., CQs within QP contexts).
-If the 'e' bit is set
o complete the Post_vHCR_cmd command
-Execute the command on the HCR.
-DMA the results to the vHCR out-box (if required).
-If the 'e' bit is set
o Indicate command completion by generating a completion event using the GEN_EQE command
-Otherwise
o DMA the command status to the vHCR
o Complete the Post_vHCR_cmd command
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Yevgeny Petrillin <yevgenyp@mellanox.com>
Signed-off-by: Liran Liss <liranl@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When SRIOV is enabled on the chip (at FW burning time),
the HCA uses only 17 bits for the PD. The remaining 7 high-order bits
are ignored.
Change the allocator to return only 17 bits for the PD. The MSB 7
bits will be used to encode the slave number for consistency
checking later on in the resource tracker.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
For SRIOV, some Hypervisor commands can be executed directly (native = 1).
Others should go through the command wrapper flow (for tracking resource
usage, for example, or for changing some HCA configurations that slaves
need to be notified of).
This patch sets the groundwork for this capability -- adding the correct
value of "native" in each case.
Note that if SRIOV is not activated, this parameter has no effect.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Port mask now has additional state.
Port can be set as "none". In this case neither the mlx4_en or mlx4_ib
drivers take ownership of the port.
In multifunction mode there is an option to set the vfs as single ported devices.
(in single function mode, both physical ports belong to same function)
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
These changes will not affect module operation as yet. They
are only to get some structs and enums in place for use by
subsequent patches (making those smaller).
Added here:
* sriov state structs and inlines (mlx4_is_master/slave/mfunc)
* comm-channel and vhcr support structures
* enum values for new FW and comm-channel virtual commands
(i.e., commands, passed via the comm channel to the PF-driver).
* prototypes for many command wrapper functions (used by the
PF context for processing FW commands passed to it by the VFs).
* struct mlx4_eqe is moved from eq.c to mlx4.h (it will be used
by other mlx4_core source files).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
- use adapter->num_vfs (and not the module param) to store the actual
number of vfs created. Use the same variable to reflect SRIOV
enable/disable state. So, drop the adapter->sriov_enabled field.
- use for_all_vfs() macro in VF configuration code
- drop the "vf_" prefix for the fields of be_vf_cfg; the prefix is
redundant and removing it helps reduce line wrap
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool "-g" option is supposed to report the max queue length and
user modified queue length for RX and TX queues. be2net doesn't support
user modification of queue lengths. So, the correct values for these
would be the max numbers.
be2net incorrectly reports the queue used values for these fields.
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit e52fcb2462 newly allocated
skb for small packets are not updated properly and dropped by stack.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
disable Tx vlan offloading in certain cases.
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
update pmem_fifo_overflow_drop, rx_priority_pause_frames counters.
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code is missing initialization of NO_FCOE_FLAG and NO_ISCSI*FLAGS
when CONFIG_CNIC is not selected.
This causes panic during driver load since commit
1d187b34da where NO_FCOE tested
unconditionally (outside #ifdef BCM_CNIC structure) and
accessed fp[FCOE_IDX] which is not allocated.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Let caller know the result of adding/removing vlan id to/from vlan
filter.
In some drivers I make those functions to just return 0. But in those
where there is able to see if hw setup went correctly, return value is
set appropriately.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The .remove code is broken in several ways.
- mdiobus_unregister() is called twice for the same object in case of dual FEC
- phy_disconnect() is being called when the PHY is already disconnected
- the requested IRQ(s) are not freed
- fec_stop() is being called with the inteface already stopped
All of those lead to kernel crashes if the remove function is actually used.
Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Additionally to setting the ETHER_EN bit in FEC_ECNTRL the MII/RMII
setting in FEC_R_CNTRL needs to be preserved to keep the MII interface
functional.
Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the MAC address is supplied via platform_data it should be OK as
it is and should not be modified in case of a dual FEC setup.
Also copying the MAC from platform_data to the single 'macaddr'
variable will overwrite the MAC for the first interface in case of a
dual FEC setup.
Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
prevent calling request_irq() with a known invalid IRQ number and
preserve the return value of the platform_get_irq() function
Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upon detection of a FDX/HDX change the interface is restarted twice.
Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The con_id is actually not needed for clk_get().
Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
- remove some bogus whitespace
- remove line wraps from printk messages
Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates the tg3 version to 3.122.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes the driver to return the flow control configuration
rather than the flow control status through the ETHTOOL_GPAUSEPARAM
ioctl.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds code to track the autonegotiation advertisements of the
link partner and report them through ethtool.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch integrates tg3_adv_1000T_flowctrl_ok() into
tg3_copper_is_advertising_all() and renames the function
tg3_phy_copper_an_config_ok().
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tg3 has a place to store stats, but doesn't really use it. This patch
modifies the driver so that stats are saved across chip resets and gets
cleared across close / open calls.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the ethtool stats member from the tg3 device
structure.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't provide FCoE and iSCSI statistics to management firmware if
CONFIG_CNIC is not set. Some needed structure fields are not defined
without CONFIG_CNIC.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 2df1a70aaf "bnx2x: Support
for byte queue limits" has introduced an asymmetry in usage of
netdev_tx_completed_queue and netdev_tx_sent_queue. Missing
call to netdev_tx_sent_queue causes the crash during ethtool -t.
The patch adds the missing call.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to send driver capabilities, settings and statistics to
management firmware.
[ Redone using many local variables, removed many unnecessary inlines,
and put #defines at the left margin suggested by Joe Perches ]
Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To support copying MAC addresses to firmware query structure.
[ Fixed up style and formatting errors noted by DaveM and Joe Perches ]
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add FCoE statistics support for FCoE capable devices.
Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add Priority flow control counters for ethtool -S.
Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit ded19addf9 ('pasemic_mac*: Move
the PA Semi driver') inadvertently split pasemi_mac into two separate
modules with unresolved symbols. Change it back into a single module.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since 92fc43b415, rtl8169_tx_timeout ends up
resetting Rx and Tx indexes and thus racing with the NAPI handler via
-> rtl8169_hw_reset
-> rtl_hw_reset
-> rtl8169_init_ring_indexes
What about returning to the original state ?
rtl_hw_reset is only used by rtl8169_hw_reset and rtl8169_init_one.
The latter does not need rtl8169_init_ring_indexes because the indexes
still contain their original values from the newly allocated network
device private data area (i.e. 0).
rtl8169_hw_reset is used by:
1. rtl8169_down
Helper for rtl8169_close. rtl8169_open explicitely inits the indexes
anyway.
2. rtl8169_pcierr_interrupt
Indexes are set by rtl8169_reinit_task.
3. rtl8169_interrupt
rtl8169_hw_reset is needed when the device goes down. See 1.
4. rtl_shutdown
System shutdown handler. Indexes are irrelevant.
5. rtl8169_reset_task
Indexes must be set before rtl_hw_start is called.
6. rtl8169_tx_timeout
Indexes should not be set. This is the job of rtl8169_reset_task anyway.
The removal of rtl8169_hw_reset in rtl8169_tx_timeout and its move in
rtl8169_reset_task do not change the analysis.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: hayeswang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Realtek has specified that the post 8168c gigabit chips and the post
8105e fast ethernet chips recover automatically from a Rx FIFO overflow.
The driver does not need to clear the RxFIFOOver bit of IntrStatus and
it should rather avoid messing it.
The implementation deserves some explanation:
1. events outside of the intr_event bit mask are now ignored. It enforces
a no-processing policy for the events that either should not be there
or should be ignored.
2. RxFIFOOver was already ignored in rtl_cfg_infos[RTL_CFG_1] for the
whole 8168 line of chips with two exceptions:
- RTL_GIGA_MAC_VER_22 since b5ba6d12bd
("use RxFIFO overflow workaround for 8168c chipset.").
This one should now be correctly handled.
- RTL_GIGA_MAC_VER_11 (8168b) which requires a different Rx FIFO
overflow processing.
Though it does not conform to Realtek suggestion above, the updated
driver includes no change for RTL_GIGA_MAC_VER_12 and RTL_GIGA_MAC_VER_17.
Both are 8168b. RTL_GIGA_MAC_VER_12 is common and a bit old so I'd rather
wait for experimental evidence that the change suggested by Realtek really
helps or does not hurt in unexpected ways.
Removed case statements in rtl8169_interrupt are only 8168 relevant.
3. RxFIFOOver is masked for post 8105e 810x chips, namely the sole 8105e
(RTL_GIGA_MAC_VER_30) itself.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: hayeswang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This way we consolidate the RCU locking down into the place where it
actually matters, and also we can make the code handle
dst_get_neighbour_noref() returning NULL properly.
Signed-off-by: David S. Miller <davem@davemloft.net>
To reflect the fact that a refrence is not obtained to the
resulting neighbour entry.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Roland Dreier <roland@purestorage.com>
Transitioning through an IEEE DCBX version from a CEE DCBX
and back (CEE->IEEE->CEE) may leave IEEE attributes programmed
in the hardware. DCB uses a bit field in the set routines to
determine which attributes PG, PFC, APP need to be reprogrammed.
This is needed because user flow allows queueing a series
of changes and then reprogramming the hardware with the
entire set in one operation.
When transitioning from IEEE DCBX mode back into CEE DCBX
mode the PG and PFC bits need to be set so the possibly
different CEE attributes get programmed into the device.
This patch fixes broken logic that was evaluating to 0
and never setting any bits. Further this removes some
checks for num_tc in set routines. This logic only worked
when the number of traffic classes and user priorities
were equal. This is no longer the case for X540 devices.
Besides we can trust user input in this case if the
device is incorrectly configured the DCB bandwidths will
be incorrectly mapped but no OOPs, BUG, or hardware
failure will occur.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The order of operations is important in DCBnl set_all(). When FCoE
is configured it uses the up2tc map to learn which queues to configure
the hardware offloads on. Therefore we need to setup the map before
configuring FCoE.
This is only seen when the both up2tc mappings and APP info are
configured simultaneously.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch updates the DMA Coalescing feature parameters to account for
larger MTUs. Previously, sufficient space may not have been allocated in
the receive buffer, causing packet drop.
Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Based on a patch from Mike McElroy created against the out-of-tree e1000e
driver:
Hitting the BUG_ON in napi_enable(). Code inspection shows that this can
only be triggered by calling napi_enable() twice without an intervening
napi_disable().
I saw the following sequence of events in the stack trace:
1) We simulated a cable pull using an Extreme switch.
2) e1000_tx_timeout() was entered.
3) e1000_reset_task() was called. Saw the message from e_err() in the
console log.
4) e1000_reinit_locked was called. This function calls e1000_down() and
e1000_up(). These functions call napi_disable() and napi_enable()
respectively.
5) Then on another thread, a monitor task saw carrier was down and executed
'ip set link down' and 'ip set link up' commands.
6) Saw the '_E1000_RESETTING'warning fron the e1000_close function.
7) Either the e1000_open() executed between the e1000_down() and e1000_up()
calls in step 4 or the e1000_open() call executed after the e1000_up()
call. In either case, napi_enable() is called twice which triggers the
BUG_ON.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Cc: Mike McElroy <mike.mcelroy@stratus.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Based on the original patch submitted my Michael Wang
<wangyun@linux.vnet.ibm.com>.
Descriptors may not be write-back while checking TX hang with flag
FLAG2_DMA_BURST on.
So when we detect hang, we just flush the descriptor and detect
again for once.
-v2 change 1 to true and 0 to false and remove extra ()
CC: Michael Wang <wangyun@linux.vnet.ibm.com>
CC: Flavio Leitner <fbl@redhat.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The advantage of kcalloc is, that will prevent integer overflows which could
result from the multiplication of number of elements and size and it is also
a bit nicer to read.
The semantic patch that makes this change is available
in https://lkml.org/lkml/2011/11/25/107
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The advantage of kcalloc is, that will prevent integer overflows which could
result from the multiplication of number of elements and size and it is also
a bit nicer to read.
The semantic patch that makes this change is available
in https://lkml.org/lkml/2011/11/25/107
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: David S. Miller <davem@davemloft.net>