2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-28 07:04:00 +08:00
linux-next/include
Achiad Shochat 88a85f99e5 net/mlx5e: TX latency optimization to save DMA reads
A regular TX WQE execution involves two or more DMA reads -
one to fetch the WQE, and another one per WQE gather entry.

These DMA reads obviously increase the TX latency.
There are two mlx5 mechanisms to bypass these DMA reads:
1) Inline WQE
2) Blue Flame (BF)

An inline WQE contains a whole packet, thus saves the DMA read/s
of the regular WQE gather entry/s. Inline WQE support was already
added in the previous commit.

A BF WQE is written directly to the device I/O mapped memory, thus
enables saving the DMA read that fetches the WQE.

The BF WQE I/O write must be in cache line granularity, thus uses
the CPU write combining mechanism.
A BF WQE I/O write acts also as a TX doorbell for notifying the
device of new TX WQEs.
A BF WQE is written to the same I/O mapped address as the regular TX
doorbell, thus this address is being mapped twice - once by ioremap()
and once by io_mapping_map_wc().

While both mechanisms reduce the TX latency, they both consume more CPU
cycles than a regular WQE:
- A BF WQE must still be written to host memory, in addition to being
  written directly to the device I/O mapped memory.
- An inline WQE involves copying the SKB data into it.

To handle this tradeoff, we introduce here a heuristic algorithm that
strives to avoid using these two mechanisms in case the TX queue is
being back-pressured by the device, and limit their usage rate otherwise.

An inline WQE will always be "Blue Flamed" (written directly to the
device I/O mapped memory) while a BF WQE may not be inlined (may contain
gather entries).

Preliminary testing using netperf UDP_RR shows that the latency goes down
from 17.5us to 16.9us, while the message rate (tested with pktgen) stays
the same.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 00:29:17 -07:00
..
acpi Additional ACPICA material for v4.2-rc1 2015-07-02 17:11:28 -07:00
asm-generic mm: clean up per architecture MM hook header files 2015-07-17 16:39:53 -07:00
clocksource ARM: 8366/1: move Dual-Timer SP804 driver to drivers/clocksource 2015-06-02 09:58:18 +01:00
crypto crypto: rng - Do not free default RNG when it becomes unused 2015-06-22 15:49:18 +08:00
drm drm: use kvfree() in drm_free_large() 2015-06-30 19:44:59 -07:00
dt-bindings The changes to the common clock framework for 4.2 are dominated by new 2015-07-01 19:22:00 -07:00
keys
kvm
linux net/mlx5e: TX latency optimization to save DMA reads 2015-07-27 00:29:17 -07:00
math-emu
media Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds 2015-07-01 19:09:11 -07:00
memory
misc cxl: Add AFU virtual PHB and kernel API 2015-06-03 13:27:20 +10:00
net ip_tunnel: Call ip_tunnel_core_init() from inet_init() 2015-07-23 01:28:21 -07:00
pcmcia
ras tracing: add trace event for memory-failure 2015-06-24 17:49:43 -07:00
rdma IB: Add rdma_cap_ib_switch helper and use where appropriate 2015-07-14 13:20:08 -04:00
rxrpc
scsi IB/srp: Avoid using uninitialized variable 2015-07-14 13:20:09 -04:00
soc Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm 2015-06-26 12:20:00 -07:00
sound ASoC: Further updates for v4.2 2015-06-22 11:32:41 +02:00
target Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2015-07-04 14:13:43 -07:00
trace Merge branch 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2015-06-30 20:07:45 -07:00
uapi lwtunnel: export linux/lwtunnel.h to userspace 2015-07-26 21:45:54 -07:00
video Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux 2015-06-26 13:18:51 -07:00
xen
Kbuild