linux/drivers/net/ethernet
Sivakumar Krishnasamy 66aa0678ef ibmveth: Support to enable LSO/CSO for Trunk VEA.
The current largesend and checksum offload feature in the ibmveth driver
works as follows:
 - Source VM sends the TCP packets with ip_summed field set as
   CHECKSUM_PARTIAL and TCP pseudo header checksum is placed in
   checksum field
 - CHECKSUM_PARTIAL flag in SKB will enable ibmveth driver to mark
   "no checksum" and "checksum good" bits in transmit buffer descriptor
   before the packet is delivered to pseries PowerVM Hypervisor
 - If ibmveth has largesend capability enabled, transmit buffer descriptors
   are marked accordingly before the packet is delivered to the Hypervisor
   (along with mss value for packets with length > MSS)
 - Destination VM's ibmveth driver receives the packet with "checksum good"
   bit set and so, SKB's ip_summed field is set with CHECKSUM_UNNECESSARY
 - If "largesend" bit was on, mss value is copied from receive descriptor
   into SKB's gso_size and other flags are appropriately set for
   packets > MSS size
 - The packet is now successfully delivered up the stack in destination VM
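
The flow above can be modelled in user space as a rough sketch. The
descriptor bit values, the struct and the function names below are
illustrative assumptions, not the driver's actual definitions, and the
largesend condition is simplified to "length > MSS":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative descriptor bits; values are assumptions for this sketch. */
#define DESC_LRG_SND   0x04000000u  /* "largesend" */
#define DESC_NO_CSUM   0x02000000u  /* "no checksum" */
#define DESC_CSUM_GOOD 0x01000000u  /* "checksum good" */

/* Stand-ins for the SKB ip_summed states named in the text. */
enum csum_state { CSUM_NONE, CSUM_UNNECESSARY, CSUM_PARTIAL };

/* Hypothetical, minimal view of a packet. */
struct pkt {
    enum csum_state ip_summed;
    uint32_t len;
    uint32_t mss;
};

/* Transmit side: decide which bits go into the buffer descriptor. */
static uint32_t tx_desc_flags(const struct pkt *p, bool largesend_enabled)
{
    uint32_t flags = 0;

    assert(p != NULL);
    /* Checksum bits are set only for CHECKSUM_PARTIAL packets. */
    if (p->ip_summed == CSUM_PARTIAL)
        flags |= DESC_NO_CSUM | DESC_CSUM_GOOD;
    /* Largesend is marked only for packets longer than one MSS. */
    if (largesend_enabled && p->len > p->mss)
        flags |= DESC_LRG_SND;
    return flags;
}

/* Receive side: map the "checksum good" bit to an ip_summed state. */
static enum csum_state rx_ip_summed(uint32_t flags)
{
    return (flags & DESC_CSUM_GOOD) ? CSUM_UNNECESSARY : CSUM_NONE;
}
```

In this model a packet whose ip_summed is anything other than
CHECKSUM_PARTIAL leaves the transmit path with no checksum bits set.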

The offloads described above work fine for TCP communication among VMs in
the same pseries server (VM A <=> PowerVM Hypervisor <=> VM B).

We are now enabling support for OVS in the pseries PowerVM environment. One
of our requirements is to have the ibmveth driver configured in "Trunk" mode
when it is used with OVS. This is because the PowerVM Hypervisor will no
longer bridge the packets between VMs; instead, the packets are delivered to
the IO Server, which hosts OVS to bridge them between VMs or to external
networks (flow shown below):
  VM A <=> PowerVM Hypervisor <=> IO Server(OVS) <=> PowerVM Hypervisor
                                                                   <=> VM B
In the "IO server" the packet is received by the inbound Trunk ibmveth and
then delivered to OVS, which bridges it to the outbound Trunk ibmveth (shown
below):
        Inbound Trunk ibmveth <=> OVS <=> Outbound Trunk ibmveth

In this model, we hit the following issues, which impacted VM
communication performance:

 - Issue 1: ibmveth doesn't support largesend and checksum offload features
   when configured as "Trunk". Driver has explicit checks to prevent
   enabling these offloads.

 - Issue 2: SYN packet drops seen at destination VM. When the packet
   originates, it has CHECKSUM_PARTIAL flag set and as it gets delivered to
   IO server's inbound Trunk ibmveth, on validating "checksum good" bits
   in ibmveth receive routine, SKB's ip_summed field is set with
   CHECKSUM_UNNECESSARY flag. This packet is then bridged by OVS (or Linux
   Bridge) and delivered to outbound Trunk ibmveth. At this point the
   outbound ibmveth transmit routine will not set "no checksum" and
   "checksum good" bits in transmit buffer descriptor, as it does so only
   when the ip_summed field is CHECKSUM_PARTIAL. When this packet gets
   delivered to destination VM, TCP layer receives the packet with checksum
   value of 0 and with no checksum related flags in ip_summed field. This
   leads to packet drops, so TCP connections are never established.

 - Issue 3: First packet of a TCP connection will be dropped, if there is
   no OVS flow cached in datapath. OVS while trying to identify the flow,
   computes the checksum. The computed checksum will be invalid at the
   receiving end, as ibmveth transmit routine zeroes out the pseudo
   checksum value in the packet. This leads to packet drop.

 - Issue 4: ibmveth driver doesn't have support for SKB's with frag_list.
   When Physical NIC has GRO enabled and when OVS bridges these packets,
   OVS vport send code will end up calling dev_queue_xmit, which in turn
   calls validate_xmit_skb.
   In validate_xmit_skb routine, the larger packets will get segmented into
   MSS sized segments, if SKB has a frag_list and if the driver to which
   they are delivered doesn't support NETIF_F_FRAGLIST feature.
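
As a toy model of the Issue 4 behaviour, the sketch below counts how many
packets reach the driver after the check described above. The struct, the
function name and the feature bit position are assumptions made for
illustration; the kernel's validate_xmit_skb is considerably more involved:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for NETIF_F_FRAGLIST; the bit position is illustrative only. */
#define FEAT_FRAGLIST (1u << 0)

/* Hypothetical summary of the SKB properties relevant to the decision. */
struct pkt_meta {
    bool     has_frag_list;
    uint32_t payload_len;
    uint32_t mss;
};

/*
 * Number of packets handed to the driver: a large packet with a frag_list
 * is cut into MSS sized segments unless the device advertises FRAGLIST.
 */
static uint32_t segments_after_validate(const struct pkt_meta *m,
                                        uint32_t dev_features)
{
    assert(m->mss > 0);
    if (m->has_frag_list && !(dev_features & FEAT_FRAGLIST) &&
        m->payload_len > m->mss)
        return (m->payload_len + m->mss - 1) / m->mss;  /* round up */
    return 1;
}
```

With a 64000 byte payload and an MSS of 1448, this model yields 45 segments
for a device without the FRAGLIST feature and a single packet otherwise.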

This patch addresses the above four issues, thereby enabling end to end
largesend and checksum offload support for better performance.

 - Fix for Issue 1 : Remove checks which prevent enabling TCP largesend and
   checksum offloads.
 - Fix for Issue 2 : When ibmveth receives a packet with "checksum good"
   bit set and if it is configured in Trunk mode, set appropriate SKB fields
   using skb_partial_csum_set (ip_summed field is set with
   CHECKSUM_PARTIAL)
 - Fix for Issue 3: Recompute the pseudo header checksum before sending the
   SKB up the stack.
 - Fix for Issue 4: Linearize the SKBs with frag_list. Though we end up
   allocating buffers and copying data, this fix gives up to a 4X
   throughput increase.
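
For reference, the pseudo header checksum that Fix 3 recomputes can be
sketched for the TCP over IPv4 case as below. This is a stand-alone
illustration working on host-order values, not the driver code; the
function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Folded (not complemented) one's complement sum over the TCP/IPv4 pseudo
 * header: source address, destination address, protocol and TCP length.
 * This is the kind of value a CHECKSUM_PARTIAL packet carries in its TCP
 * checksum field. All arguments are host-order values.
 */
static uint16_t tcp4_pseudo_csum(uint32_t saddr, uint32_t daddr,
                                 uint16_t tcp_len)
{
    uint32_t sum = 0;

    assert(tcp_len >= 20);      /* a TCP header is at least 20 bytes */

    sum += saddr >> 16;
    sum += saddr & 0xffffu;
    sum += daddr >> 16;
    sum += daddr & 0xffffu;
    sum += 6;                   /* IPPROTO_TCP */
    sum += tcp_len;             /* TCP header plus payload length */

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);

    return (uint16_t)sum;
}
```

For example, tcp4_pseudo_csum(0xC0A80001, 0xC0A80002, 20), i.e. 192.168.0.1
to 192.168.0.2 with a bare 20 byte TCP header, evaluates to 0x816E.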

Note: All these fixes need to be applied together, as fixing just one of
them will lead to other issues immediately (especially for Issues 1, 2 & 3).

Signed-off-by: Sivakumar Krishnasamy <ksiva@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21 13:29:01 -04:00
..
3com Annotation of module parameters that specify device settings 2017-05-10 19:13:03 -07:00
8390 Annotate hardware config module parameters in drivers/net/ethernet/ 2017-04-20 12:02:32 +01:00
adaptec
adi net: bfin_mac: Remove unused stats member from struct bfin_mac_local 2017-03-27 16:01:59 -07:00
aeroflex net: greth: Utilize of_get_mac_address() 2017-03-22 12:00:39 -07:00
agere
alacritech
allwinner
alteon
altera
amazon net/ena: switch to pci_alloc_irq_vectors 2017-04-11 11:16:03 -04:00
amd Annotation of module parameters that specify device settings 2017-05-10 19:13:03 -07:00
apm xgene: Check all RGMII phy mode variants 2017-05-19 19:41:43 -04:00
apple
aquantia sk_buff: remove support for csum_bad in sk_buff 2017-05-19 19:21:29 -04:00
arc net: arc_emac: switch to phy_start()/phy_stop() 2017-04-21 15:23:52 -04:00
atheros Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-05-09 15:42:31 -07:00
aurora
broadcom Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-05-18 16:11:32 -04:00
brocade bna: ethtool: Avoid reading past end of buffer 2017-05-08 14:41:42 -04:00
cadence net: macb: fix phy interrupt parsing 2017-04-30 22:21:49 -04:00
calxeda
cavium liquidio: make the spinlock octeon_devices_lock static 2017-05-18 11:24:32 -04:00
chelsio Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-05-18 16:11:32 -04:00
cirrus Annotation of module parameters that specify device settings 2017-05-10 19:13:03 -07:00
cisco
davicom
dec Annotation of module parameters that specify device settings 2017-05-10 19:13:03 -07:00
dlink net: dl2k: Use net_device_stats from struct net_device 2017-04-07 07:03:33 -07:00
emulex benet: Use time_before_eq for time comparison 2017-05-01 11:12:46 -04:00
ezchip Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-04-06 08:24:51 -07:00
faraday net: ethernet: faraday: To support device tree usage. 2017-05-18 10:10:44 -04:00
freescale powerpc updates for 4.12 part 2 2017-05-12 10:04:09 -07:00
fujitsu
hisilicon format-security: move static strings to const 2017-05-08 17:15:14 -07:00
hp Annotate hardware config module parameters in drivers/net/ethernet/ 2017-04-20 12:02:32 +01:00
i825xx
ibm ibmveth: Support to enable LSO/CSO for Trunk VEA. 2017-05-21 13:29:01 -04:00
intel pci-v4.12-changes 2017-05-08 19:03:25 -07:00
marvell sky2: Use seq_puts() in sky2_debug_show() 2017-04-18 13:55:11 -04:00
mediatek Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-04-20 10:35:33 -04:00
mellanox Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-05-18 16:11:32 -04:00
micrel
microchip
moxa net: moxa: Use net_device_stats from struct net_device 2017-04-07 07:03:33 -07:00
myricom
natsemi format-security: move static strings to const 2017-05-08 17:15:14 -07:00
neterion
netronome nfp: eliminate an if statement in calculation of completed frames 2017-05-16 12:59:04 -04:00
nuvoton net: nuvoton: Use net_device_stats from struct net_device 2017-04-07 07:03:33 -07:00
nvidia forcedeth: remove unnecessary carrier status check 2017-05-04 10:57:41 -04:00
nxp
oki-semi
packetengines
pasemi
qlogic qede: Support 1G advertisment. 2017-05-21 12:56:53 -04:00
qualcomm net: qca_spi: Fix alignment issues in rx path 2017-05-11 12:14:12 -04:00
rdc
realtek Annotation of module parameters that specify device settings 2017-05-10 19:13:03 -07:00
renesas sh_eth: Do not print an error message for probe deferral 2017-05-18 11:21:07 -04:00
rocker Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-04-06 08:24:51 -07:00
samsung scripts/spelling.txt: add "intialise(d)" pattern and fix typo instances 2017-05-08 17:15:13 -07:00
seeq
sfc sfc: revert changes to NIC revision numbers 2017-05-12 12:22:53 -04:00
sgi
silan
sis
smsc Annotation of module parameters that specify device settings 2017-05-10 19:13:03 -07:00
stmicro net: stmmac: use correct pointer when printing normal descriptor ring 2017-05-15 10:02:19 -04:00
sun ldmvsw: stop the clean timer at beginning of remove 2017-05-15 15:36:08 -04:00
synopsys net: dwc-xlgmac: add the initial ethtool support 2017-04-13 13:46:38 -04:00
tehuti net: tehuti: use new api ethtool_{get|set}_link_ksettings 2017-03-27 16:00:07 -07:00
ti net: netcp: fix check of requested timestamping filter 2017-05-15 15:21:03 -04:00
tile
toshiba format-security: move static strings to const 2017-05-08 17:15:14 -07:00
tundra
via
wiznet net: ethernet: wiznet: avoid format string exposure 2017-04-06 13:38:11 -07:00
xilinx
xircom
xscale
dnet.c
dnet.h
ec_bhf.c
ethoc.c net: ethoc: Use ether_addr_copy() 2017-03-21 17:16:56 -07:00
fealnx.c
jme.c
jme.h
Kconfig
korina.c
lantiq_etop.c
Makefile
netx-eth.c