linux/drivers
Jussi Maki 9e2ee5c7e7 net, bonding: Add XDP support to the bonding driver
XDP is implemented in the bonding driver by transparently delegating
the XDP program loading, removal and xmit operations to the bonding
slave devices. The overall goal of this work is that XDP programs
can be attached to a bond device *without* any further changes (or
awareness) necessary to the program itself, meaning the same XDP
program can be attached to a native device but also a bonding device.

Semantics of XDP_TX when attached to a bond are equivalent in such
setting to the case when a tc/BPF program would be attached to the
bond, meaning transmitting the packet out of the bond itself using one
of the bond's configured xmit methods to select a slave device (rather
than XDP_TX on the slave itself). Handling of XDP_TX to transmit
using the configured bonding mechanism is therefore implemented by
rewriting the BPF program return value in bpf_prog_run_xdp. To avoid
performance impact this check is guarded by a static key, which is
incremented when a XDP program is loaded onto a bond device. This
approach was chosen to avoid changes to drivers implementing XDP. If
the slave device does not match the receive device, then XDP_REDIRECT
is transparently used to perform the redirection in order to have
the network driver release the packet from its RX ring. The bonding
driver hashing functions have been refactored to allow reuse with
xdp_buff's to avoid code duplication.

The motivation for this change is to enable use of bonding (and
802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
XDP and also to transparently support bond devices for projects that
use XDP given most modern NICs have dual port adapters. An alternative
to this approach would be to implement 802.3ad in user-space and
implement the bonding load-balancing in the XDP program itself, but
is rather a cumbersome endeavor in terms of slave device management
(e.g. by watching netlink) and requires separate programs for native
vs bond cases for the orchestrator. A native in-kernel implementation
overcomes these issues and provides more flexibility.

Below are benchmark results done on two machines with 100Gbit
Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and
16-core 3950X on receiving machine. 64 byte packets were sent with
pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
ice driver, so the tests were performed with iommu=off and patch [2]
applied. Additionally the bonding round robin algorithm was modified
to use per-cpu tx counters as high CPU load (50% vs 10%) and high rate
of cache misses were caused by the shared rr_tx_counter (see patch
2/3). The statistics were collected using "sar -n dev -u 1 10". On top
of that, for ice, further work is in progress on improving the XDP_TX
numbers [4].

 -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
 without patch (1 dev):
   XDP_DROP:              3.15%      48.6Mpps
   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
   XDP_DROP (RSS):        9.47%      116.5Mpps
   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
 -----------------------
 with patch, bond (1 dev):
   XDP_DROP:              3.14%      46.7Mpps
   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
   XDP_DROP (RSS):        10.33%     117.2Mpps
   XDP_TX (RSS):          10.64%     25.1Mpps     24.0Mpps
 -----------------------
 with patch, bond (2 devs):
   XDP_DROP:              6.27%      92.7Mpps
   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
   XDP_DROP (RSS):       11.38%      117.2Mpps
   XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
 --------------------------------------------------------------

RSS: Receive Side Scaling, e.g. the packets were sent to a range of
destination IPs.

  [1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
  [2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
  [3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/
  [4]: https://lore.kernel.org/bpf/20210805230046.28715-1-maciej.fijalkowski@intel.com/T/#t

Signed-off-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20210731055738.16820-4-joamaki@gmail.com
2021-08-09 23:20:14 +02:00
..
accessibility TTY / Serial patches for 5.14-rc1 2021-07-05 14:08:24 -07:00
acpi Merge branches 'acpi-resources' and 'acpi-dptf' 2021-07-30 20:26:38 +02:00
amba
android
ata libata-5.14-2021-07-30 2021-07-30 10:56:47 -07:00
atm atm: idt77252: clean up trigraph warning on ??) string 2021-07-20 07:24:13 -07:00
auxdisplay
base driver core: Prevent warning when removing a device link from unregistered consumer 2021-07-21 17:28:42 +02:00
bcma
block block-5.14-2021-07-30 2021-07-30 11:08:12 -07:00
bluetooth TTY / Serial patches for 5.14-rc1 2021-07-05 14:08:24 -07:00
bus Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-07-31 09:14:46 -07:00
cdrom block: remove REQ_OP_SCSI_{IN,OUT} 2021-06-30 15:34:19 -06:00
char net: split out ndo_siowandev ioctl 2021-07-27 20:11:45 +01:00
clk dt-bindings: clock: r9a07g044-cpg: Update clock/reset definitions 2021-07-12 10:52:03 +02:00
clocksource This round has a diffstat dominated by Qualcomm clk drivers. Honestly though 2021-07-01 13:26:16 -07:00
comedi Staging / IIO driver patches for 5.14-rc1 2021-07-05 14:01:53 -07:00
connector
counter
cpufreq cpufreq: Fix fall-through warning for Clang 2021-07-13 11:53:07 -05:00
cpuidle - Add support for the Qcom MSM8226 (Bartosz Dudziak) 2021-06-30 14:56:51 +02:00
crypto ARM: SoC changes for 5.14 2021-07-10 09:22:44 -07:00
cxl cxl/pci: Rename CXL REGLOC ID 2021-06-17 17:37:18 -07:00
dax fs: remove noop_set_page_dirty() 2021-06-29 10:53:48 -07:00
dca
devfreq PM / devfreq: passive: Fix get_target_freq when not using required-opp 2021-06-24 10:37:35 +09:00
dio
dma dmaengine: mpc512x: Fix fall-through warning for Clang 2021-07-14 11:05:55 -05:00
dma-buf Short summary of fixes pull: 2021-07-13 15:15:17 +02:00
edac EDAC/igen6: fix core dependency AGAIN 2021-07-15 11:59:59 -07:00
eisa
extcon Char / Misc driver updates for 5.14-rc1 2021-07-05 13:42:16 -07:00
firewire Char / Misc driver updates for 5.14-rc1 2021-07-05 13:42:16 -07:00
firmware A set of EFI fixes: 2021-07-25 10:04:27 -07:00
fpga fpga: machxo2-spi: Address warning about unused variable 2021-06-24 15:45:11 +02:00
fsi
gnss
gpio - Core Frameworks 2021-07-05 12:10:34 -07:00
gpu Merge tag 'amd-drm-fixes-5.14-2021-07-28' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes 2021-07-29 17:20:29 +10:00
greybus
hid HID: ft260: fix device removal due to USB disconnect 2021-07-29 12:38:32 +02:00
hsi
hv Drivers: hv: vmbus: Fix duplicate CPU assignments within a device 2021-07-19 09:26:31 +00:00
hwmon Char / Misc driver updates for 5.14-rc1 2021-07-05 13:42:16 -07:00
hwspinlock
hwtracing Char / Misc driver updates for 5.14-rc1 2021-07-05 13:42:16 -07:00
i2c i2c: mpc: Poll for MCF 2021-07-20 22:32:01 +02:00
i3c I3C for 5.14 2021-07-10 11:53:06 -07:00
idle
iio Staging / IIO driver patches for 5.14-rc1 2021-07-05 14:01:53 -07:00
infiniband Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-07-31 09:14:46 -07:00
input This pull request contains the following changes for UML: 2021-07-09 10:19:13 -07:00
interconnect interconnect changes for 5.14 2021-06-22 22:03:25 +02:00
iommu fallthrough fixes for Clang for 5.14-rc2 2021-07-15 13:57:31 -07:00
ipack TTY / Serial patches for 5.14-rc1 2021-07-05 14:08:24 -07:00
irqchip irqchip fixes for 5.14, take #1 2021-07-09 15:35:13 +02:00
isdn TTY / Serial patches for 5.14-rc1 2021-07-05 14:08:24 -07:00
leds This contains quite a lot of fixes, with more fixes in my inbox that 2021-07-03 11:57:42 -07:00
lightnvm
macintosh
mailbox mbox: add polarfire soc system controller mailbox 2021-06-26 12:06:48 -05:00
mcb mcb: Use DEFINE_RES_MEM() helper macro and fix the end address 2021-06-24 15:56:25 +02:00
md - Various DM persistent-data library improvements and fixes that 2021-06-30 18:19:39 -07:00
media ACPI fixes for 5.14-rc3 2021-07-23 11:08:06 -07:00
memory Memory controller drivers for v5.14 - Tegra SoC, part two 2021-06-16 17:36:30 -07:00
memstick for-5.14/block-2021-06-29 2021-06-30 12:12:56 -07:00
message scsi: message: mptfc: Switch from pci_ to dma_ API 2021-06-22 23:00:01 -04:00
mfd Driver core changes for 5.14-rc1 2021-07-05 13:51:41 -07:00
misc Merge tag 'at24-fixes-for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into i2c/for-current 2021-07-20 22:28:56 +02:00
mmc MMC core: 2021-07-22 09:51:38 -07:00
most
mtd mtd: cfi_util: Fix unreachable code issue 2021-07-12 11:15:28 -05:00
mux
net net, bonding: Add XDP support to the bonding driver 2021-08-09 23:20:14 +02:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-07-31 09:14:46 -07:00
ntb
nubus
nvdimm cxl for 5.14 2021-07-04 11:55:13 -07:00
nvme block-5.14-2021-07-24 2021-07-24 12:57:06 -07:00
nvmem Char / Misc driver updates for 5.14-rc1 2021-07-05 13:42:16 -07:00
of Devicetree updates for v5.14: 2021-07-03 10:54:08 -07:00
opp opp: Allow required-opps to be used for non genpd use cases 2021-06-18 09:00:55 +05:30
parisc kernel.h: split out panic and oops helpers 2021-07-01 11:06:04 -07:00
parport
pci PCI: Fix fall-through warning for Clang 2021-07-13 13:59:12 -05:00
pcmcia
perf drivers/perf: fix the missed ida_simple_remove() in ddr_perf_probe() 2021-06-17 19:45:24 +01:00
phy USB / Thunderbolt patches for 5.14-rc1 2021-07-05 14:16:22 -07:00
pinctrl This is the bulk of pin control changes for the v5.14 kernel: 2021-07-01 16:57:14 -07:00
platform platform/x86: gigabyte-wmi: add support for B550 Aorus Elite V2 2021-07-28 12:05:33 +02:00
pnp Char / Misc driver updates for 5.14-rc1 2021-07-05 13:42:16 -07:00
power power: supply: Fix fall-through warnings for Clang 2021-07-13 14:50:47 -05:00
powercap
pps
ps3
ptp ptp: Relocate lookup cookie to correct block. 2021-07-08 12:33:10 -07:00
pwm pwm: ep93xx: Ensure configuring period and duty_cycle isn't wrongly skipped 2021-07-08 16:09:30 +02:00
rapidio
ras
regulator regulator: Fixes for v5.14 2021-07-21 12:37:49 -07:00
remoteproc remoteproc updates for v5.14 2021-07-07 10:50:03 -07:00
reset ARM: Drivers for 5.14 2021-07-10 09:46:20 -07:00
rpmsg rpmsg: core: Add driver_data for rpmsg_device_id 2021-06-18 13:13:40 -07:00
rtc RTC for 5.14 2021-07-10 16:19:10 -07:00
s390 dev_ioctl: split out ndo_eth_ioctl 2021-07-27 20:11:45 +01:00
sbus
scsi scsi: fas216: Fix fall-through warning for Clang 2021-07-29 12:51:16 -05:00
sh
siox siox: Simplify error handling via dev_err_probe() 2021-06-24 15:46:34 +02:00
slimbus
soc ARM: Drivers for 5.14 2021-07-10 09:46:20 -07:00
soundwire Char / Misc driver updates for 5.14-rc1 2021-07-05 13:42:16 -07:00
spi spi: Fixes for v5.14 2021-07-21 12:41:41 -07:00
spmi spmi: hisi-spmi-controller: move driver from staging 2021-06-25 10:02:05 +02:00
ssb
staging dev_ioctl: split out ndo_eth_ioctl 2021-07-27 20:11:45 +01:00
target scsi: target: Fix NULL dereference on XCOPY completion 2021-07-20 23:18:22 -04:00
tc
tee fallthrough fixes for Clang for 5.14-rc1 2021-06-28 20:03:38 -07:00
thermal - Add rk3568 sensor support (Finley Xiao) 2021-07-10 11:43:25 -07:00
thunderbolt USB / Thunderbolt patches for 5.14-rc1 2021-07-05 14:16:22 -07:00
tty net: split out ndo_siowandev ioctl 2021-07-27 20:11:45 +01:00
uio
usb USB fixes for 5.14-rc3 2021-07-23 10:09:27 -07:00
vdpa vp_vdpa: allow set vq state to initial state after reset 2021-07-08 07:49:02 -04:00
vfio VFIO update for v5.14-rc1 2021-07-03 11:49:33 -07:00
vhost vdpa: support packed virtqueue for set/get_vq_state() 2021-07-08 07:49:01 -04:00
video drm fixes for 5.14-rc2 2021-07-16 11:14:54 -07:00
virt nitro_enclaves: Set Bus Master for the NE PCI device 2021-06-24 15:48:27 +02:00
virtio virtio,vhost,vdpa: features, fixes 2021-07-09 11:06:29 -07:00
visorbus
vlynq
vme
w1
watchdog linux-watchdog 5.14-rc1 tag 2021-07-07 12:57:46 -07:00
xen xen: branch for v5.14-rc1 2021-07-07 11:07:13 -07:00
zorro
Kconfig ide: remove the legacy ide driver 2021-06-16 08:53:58 -06:00
Makefile hyperv-next for 5.14 2021-06-29 11:21:35 -07:00