linux/drivers
Maxim Mikityanskiy db05815b36 net/mlx5e: Add XSK zero-copy support
This commit adds support for AF_XDP zero-copy RX and TX.

We create a dedicated XSK RQ inside the channel, it means that two
RQs are running simultaneously: one for non-XSK traffic and the other
for XSK traffic. The regular and XSK RQs use a single ID namespace split
into two halves: the lower half is regular RQs, and the upper half is
XSK RQs. When any zero-copy AF_XDP socket is active, changing the number
of channels is not allowed, because it would break to mapping between
XSK RQ IDs and channels.

XSK requires different page allocation and release routines. Such
functions as mlx5e_{alloc,free}_rx_mpwqe and mlx5e_{get,put}_rx_frag are
generic enough to be used for both regular and XSK RQs, and they use the
mlx5e_page_{alloc,release} wrappers around the real allocation
functions. Function pointers are not used to avoid losing the
performance with retpolines. Wherever it's certain that the regular
(non-XSK) page release function should be used, it's called directly.

Only the stats that could be meaningful for XSK are exposed to the
userspace. Those that don't take part in the XSK flow are not
considered.

Note that we don't wait for WQEs on the XSK RQ (unlike the regular RQ),
because the newer xdpsock sample doesn't provide any Fill Ring entries
at the setup stage.

We create a dedicated XSK SQ in the channel. This separation has its
advantages:

1. When the UMEM is closed, the XSK SQ can also be closed and stop
receiving completions. If an existing SQ was used for XSK, it would
continue receiving completions for the packets of the closed socket. If
a new UMEM was opened at that point, it would start getting completions
that don't belong to it.

2. Calculating statistics separately.

When the userspace kicks the TX, the driver triggers a hardware
interrupt by posting a NOP to a dedicated XSK ICO (internal control
operations) SQ, in order to trigger NAPI on the right CPU core. This XSK
ICO SQ is protected by a spinlock, as the userspace application may kick
the TX from any core.

Store the pointers to the UMEMs in the net device private context,
independently from the kernel. This way the driver can distinguish
between the zero-copy and non-zero-copy UMEMs. The kernel function
xdp_get_umem_from_qid does not care about this difference, but the
driver is only interested in zero-copy UMEMs, particularly, on the
cleanup it determines whether to close the XSK RQ and SQ or not by
looking at the presence of the UMEM. Use state_lock to protect the
access to this area of UMEM pointers.

LRO isn't compatible with XDP, but there may be active UMEMs while
XDP is off. If this is the case, don't allow LRO to ensure XDP can
be reenabled at any time.

The validation of XSK parameters typically happens when XSK queues
open. However, when the interface is down or the XDP program isn't
set, it's still possible to have active AF_XDP sockets and even to
open new, but the XSK queues will be closed. To cover these cases,
perform the validation also in these flows:

1. A new UMEM is registered, but the XSK queues aren't going to be
created due to missing XDP program or interface being down.

2. MTU changes while there are UMEMs registered.

Having this early check prevents mlx5e_open_channels from failing
at a later stage, where recovery is impossible and the application
has no chance to handle the error, because it got the successful
return value for an MTU change or XSK open operation.

The performance testing was performed on a machine with the following
configuration:

- 24 cores of Intel Xeon E5-2620 v3 @ 2.40 GHz
- Mellanox ConnectX-5 Ex with 100 Gbit/s link

The results with retpoline disabled, single stream:

txonly: 33.3 Mpps (21.5 Mpps with queue and app pinned to the same CPU)
rxdrop: 12.2 Mpps
l2fwd: 9.4 Mpps

The results with retpoline enabled, single stream:

txonly: 21.3 Mpps (14.1 Mpps with queue and app pinned to the same CPU)
rxdrop: 9.9 Mpps
l2fwd: 6.8 Mpps

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-27 22:53:28 +02:00
..
accessibility
acpi treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
amba treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
android binder: fix possible UAF when freeing buffer 2019-06-13 10:35:55 +02:00
ata treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
atm
auxdisplay
base drivers/base/devres: introduce devm_release_action() 2019-06-13 17:34:56 -10:00
bcma
block treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
bluetooth treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333 2019-06-05 17:37:06 +02:00
bus SPDX update for 5.2-rc6 2019-06-21 09:58:42 -07:00
cdrom
char treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505 2019-06-19 17:11:22 +02:00
clk treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
clocksource treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
connector
counter Second set of IIO fixes for the 5.2 cycle. 2019-06-17 22:28:29 +02:00
cpufreq treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
cpuidle treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
crypto treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
dax mm/devm_memremap_pages: fix final page put race 2019-06-13 17:34:56 -10:00
dca
devfreq treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
dio
dma treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
dma-buf treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234 2019-06-19 17:09:07 +02:00
edac treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
eisa
extcon treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
firewire
firmware SPDX update for 5.2-rc6 2019-06-21 09:58:42 -07:00
fmc
fpga SPDX update for 5.2-rc4 2019-06-08 12:52:42 -07:00
fsi treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 469 2019-06-19 17:09:11 +02:00
gnss
gpio treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
gpu drm vmwgfx, panfrost, i915, imx fixes 2019-06-21 11:03:33 -07:00
hid treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
hsi treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 336 2019-06-05 17:37:07 +02:00
hv treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 320 2019-06-05 17:37:05 +02:00
hwmon treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
hwspinlock
hwtracing
i2c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-22 08:59:24 -04:00
i3c
ide treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
idle treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 335 2019-06-05 17:37:06 +02:00
iio Staging/IIO/Counter fixes for 5.2-rc6 2019-06-21 10:20:19 -07:00
infiniband Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-22 08:59:24 -04:00
input SPDX update for 5.2-rc6 2019-06-21 09:58:42 -07:00
interconnect
iommu treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
ipack treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
irqchip SPDX update for 5.2-rc6 2019-06-21 09:58:42 -07:00
isdn Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-07 11:00:14 -07:00
leds treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
lightnvm treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 410 2019-06-05 17:37:14 +02:00
macintosh treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 247 2019-06-19 17:09:08 +02:00
mailbox treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
mcb treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
md md: fix for divide error in status_resync 2019-06-18 08:02:25 -07:00
media Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-22 08:59:24 -04:00
memory treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
memstick treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
message
mfd treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
misc Char/Misc driver fixes for 5.2-rc6 2019-06-21 10:18:16 -07:00
mmc SPDX update for 5.2-rc6 2019-06-21 09:58:42 -07:00
mtd treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
mux
net net/mlx5e: Add XSK zero-copy support 2019-06-27 22:53:28 +02:00
nfc treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 417 2019-06-05 17:37:15 +02:00
ntb treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288 2019-06-05 17:36:37 +02:00
nubus
nvdimm mm/devm_memremap_pages: fix final page put race 2019-06-13 17:34:56 -10:00
nvme Merge branch 'nvme-5.2-rc-next' of git://git.infradead.org/nvme into for-linus 2019-06-07 14:04:28 -06:00
nvmem treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
of treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
opp treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
oprofile
parisc SPDX update for 5.2-rc4 2019-06-08 12:52:42 -07:00
parport treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
pci Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-22 08:59:24 -04:00
pcmcia treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
perf treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
phy treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
pinctrl treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
platform treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
pnp
power treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
powercap treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 309 2019-06-05 17:37:04 +02:00
pps
ps3
ptp ptp: add QorIQ PTP support for DPAA2 2019-06-15 13:43:06 -07:00
pwm treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
rapidio
ras RAS/CEC: Convert the timer callback to a workqueue 2019-06-07 23:21:39 +02:00
regulator treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
remoteproc treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
reset treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
rpmsg
rtc treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
s390 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-22 08:59:24 -04:00
sbus
scsi Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-22 08:59:24 -04:00
sfi
sh
siox
slimbus
sn
soc SPDX update for 5.2-rc6 2019-06-21 09:58:42 -07:00
soundwire soundwire fixes for v5.2-rc4 2019-06-10 18:07:39 +02:00
spi treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
spmi treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 284 2019-06-05 17:36:37 +02:00
ssb
staging Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-22 08:59:24 -04:00
target Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-22 08:59:24 -04:00
tc
tee treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 282 2019-06-05 17:36:37 +02:00
thermal treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
thunderbolt thunderbolt: Implement CIO reset correctly for Titan Ridge 2019-06-14 14:25:43 +03:00
tty
uio treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
usb usb: fixes for v5.2-rc5 2019-06-20 11:56:35 +02:00
uwb treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234 2019-06-19 17:09:07 +02:00
vfio treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
vhost Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-22 08:59:24 -04:00
video treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
virt
virtio
visorbus
vlynq
vme
w1 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
watchdog treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
xen treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
zorro
Kconfig
Makefile