linux/drivers
Vladimir Oltean 3a5d12c9be net: enetc: keep RX ring consumer index in sync with hardware
The RX rings have a producer index owned by hardware, where newly
received frame buffers are placed, and a consumer index owned by
software, where newly allocated buffers are placed, in expectation of
hardware being able to place frame data in them.

Hardware increments the producer index when a frame is received, however
it is not allowed to increment the producer index to match the consumer
index (RBCIR) since the ring can hold at most RBLENR[LENGTH]-1 received
BDs. Whenever the producer index matches the value of the consumer
index, the ring has no unprocessed received frames and all BDs in the
ring have been initialized/prepared by software, i.e. hardware owns all
BDs in the ring.

The code uses the next_to_clean variable to keep track of the producer
index, and the next_to_use variable to keep track of the consumer index.

The RX rings are seeded from enetc_refill_rx_ring, which is called from
two places:

1. initially the ring is seeded until full with enetc_bd_unused(rx_ring),
   i.e. with 511 buffers. This will make next_to_clean=0 and next_to_use=511:

.ndo_open
-> enetc_open
   -> enetc_setup_bdrs
      -> enetc_setup_rxbdr
         -> enetc_refill_rx_ring

2. then during the data path processing, it is refilled with 16 buffers
   at a time:

enetc_msix
-> napi_schedule
   -> enetc_poll
      -> enetc_clean_rx_ring
         -> enetc_refill_rx_ring

There is just one problem: the initial seeding done during .ndo_open
updates just the producer index (ENETC_RBPIR) with 0, and the software
next_to_clean and next_to_use variables. Notably, it will not update the
consumer index to make the hardware aware of the newly added buffers.

Wait, what? So how does it work?

Well, the reset values of the producer index and of the consumer index
of a ring are both zero. As per the description in the second paragraph,
it means that the ring is full of buffers waiting for hardware to put
frames in them, which by coincidence is almost true, because we have in
fact seeded 511 buffers into the ring.

But will the hardware attempt to access the 512th entry of the ring,
which has an invalid BD in it? Well, no, because in order to do that, it
would have to first populate the first 511 entries, and the NAPI
enetc_poll will kick in by then. Eventually, after 16 processed slots
have become available in the RX ring, enetc_clean_rx_ring will call
enetc_refill_rx_ring and then will [ finally ] update the consumer index
with the new software next_to_use variable. From now on, the
next_to_clean and next_to_use variables are in sync with the producer
and consumer ring indices.

So the day is saved, right? Well, not quite. Freeing the memory
allocated for the rings is done in:

enetc_close
-> enetc_clear_bdrs
   -> enetc_clear_rxbdr
      -> this just disables the ring
-> enetc_free_rxtx_rings
   -> enetc_free_rx_ring
      -> sets next_to_clean and next_to_use to 0

but again, nothing is committed to the hardware producer and consumer
indices (yay!). The assumption is that the ring is disabled, so the
indices don't matter anyway, and it's the responsibility of the "open"
code path to set those up.

.. Except that the "open" code path does not set those up properly.

While initially, things almost work, during subsequent enetc_close ->
enetc_open sequences, we have problems. To be precise, the enetc_open
that is subsequent to enetc_close will again refill the ring with 511
entries, but it will leave the consumer index untouched. Untouched
means, of course, equal to the value it had before disabling the ring
and draining the old buffers in enetc_close.

But as mentioned, enetc_setup_rxbdr will at least update the producer
index though, through this line of code:

	enetc_rxbdr_wr(hw, idx, ENETC_RBPIR, 0);

so at this stage we'll have:

next_to_clean=0 (in hardware 0)
next_to_use=511 (in hardware we'll have the refill index prior to enetc_close)

Again, the next_to_clean and producer index are in sync and set to
correct values, so the driver manages to limp on. Eventually, 16 ring
entries will be consumed by enetc_poll, and the savior
enetc_clean_rx_ring will come and call enetc_refill_rx_ring, and then
update the hardware consumer ring based upon the new next_to_use.

So.. it works?
Well, by coincidence, it almost does, but there's a circumstance where
enetc_clean_rx_ring won't be there to save us. If the previous value of
the consumer index was 15, there's a problem, because the NAPI poll
sequence will only issue a refill when 16 or more buffers have been
consumed.

It's easiest to illustrate this with an example:

ip link set eno0 up
ip addr add 192.168.100.1/24 dev eno0
ping 192.168.100.1 -c 20 # ping this port from another board
ip link set eno0 down
ip link set eno0 up
ping 192.168.100.1 -c 20 # ping it again from the same other board

One by one:

1. ip link set eno0 up
-> calls enetc_setup_rxbdr:
   -> calls enetc_refill_rx_ring(511 buffers)
   -> next_to_clean=0 (in hw 0)
   -> next_to_use=511 (in hw 0)

2. ping 192.168.100.1 -c 20 # ping this port from another board
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=15 next_to_clean 14 (in hw 15) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: enetc_refill_rx_ring(16) increments next_to_use by 16 (mod 512) and writes it to hw
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=0 next_to_clean 15 (in hw 16) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 16 (in hw 17) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 17 (in hw 18) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 18 (in hw 19) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 19 (in hw 20) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 20 (in hw 21) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 21 (in hw 22) next_to_use 15 (in hw 15)

20 packets transmitted, 20 packets received, 0% packet loss

3. ip link set eno0 down
enetc_free_rx_ring: next_to_clean 0 (in hw 22), next_to_use 0 (in hw 15)

4. ip link set eno0 up
-> calls enetc_setup_rxbdr:
   -> calls enetc_refill_rx_ring(511 buffers)
   -> next_to_clean=0 (in hw 0)
   -> next_to_use=511 (in hw 15)

5. ping 192.168.100.1 -c 20 # ping it again from the same other board
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 15)

20 packets transmitted, 12 packets received, 40% packet loss

And there it dies. No enetc_refill_rx_ring (because cleaned_cnt must be equal
to 15 for that to happen), no nothing. The hardware enters the condition where
the producer (14) + 1 is equal to the consumer (15) index, which makes it
believe it has no more free buffers to put packets in, so it starts discarding
them:

ip netns exec ns0 ethtool -S eno0 | grep -v ': 0'
NIC statistics:
     Rx ring  0 discarded frames: 8

Summarized, if the interface receives between 16 and 32 (mod 512) frames
and then there is a link flap, then the port will eventually die with no
way to recover. If it receives less than 16 (mod 512) frames, then the
initial NAPI poll [ before the link flap ] will not update the consumer
index in hardware (it will remain zero) which will be ok when the buffers
are later reinitialized. If more than 32 (mod 512) frames are received,
the initial NAPI poll has the chance to refill the ring twice, updating
the consumer index to at least 32. So after the link flap, the consumer
index is still wrong, but the post-flap NAPI poll gets a chance to
refill the ring once (because it passes through cleaned_cnt=15) and
makes the consumer index be again back in sync with next_to_use.

The solution to this problem is actually simple, we just need to write
next_to_use into the hardware consumer index at enetc_open time, which
always brings it back in sync after an initial buffer seeding process.

The simpler thing would be to put the write to the consumer index into
enetc_refill_rx_ring directly, but there are issues with the MDIO
locking: in the NAPI poll code we have the enetc_lock_mdio() taken from
top-level and we use the unlocked enetc_wr_reg_hot, whereas in
enetc_open, the enetc_lock_mdio() is not taken at the top level, but
instead by each individual enetc_wr_reg, so we are forced to put an
additional enetc_wr_reg in enetc_setup_rxbdr. Better organization of
the code is left as a refactoring exercise.

Fixes: d4fd0404c1 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01 13:34:47 -08:00
..
accessibility Char/Misc driver patches for 5.12-rc1 2021-02-24 10:25:37 -08:00
acpi Additional ACPI updates for v5.12-rc1 2021-02-25 12:03:13 -08:00
amba
android
ata
atm atm: lanai: dont run lanai_dev_close if not open 2021-03-01 13:18:54 -08:00
auxdisplay
base Merge branch 'akpm' (patches from Andrew) 2021-02-24 16:20:38 -08:00
bcma
block virtio: features, fixes 2021-02-25 12:21:08 -08:00
bluetooth TTY/Serial driver changes for 5.12-rc1 2021-02-20 21:28:04 -08:00
bus Char/Misc driver patches for 5.12-rc1 2021-02-24 10:25:37 -08:00
cdrom
char Char/Misc driver patches for 5.12-rc1 2021-02-24 10:25:37 -08:00
clk Driver core / debugfs update for 5.12-rc1 2021-02-24 10:13:55 -08:00
clocksource A small set of clockevent fixes which fell through the cracks 2021-02-22 14:11:36 -08:00
connector
counter
cpufreq Simple Firmware Interface (SFI) support removal for v5.12-rc1 2021-02-24 10:35:29 -08:00
cpuidle
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2021-02-21 17:23:56 -08:00
cxl cxl/mem: Fix potential memory leak 2021-02-22 14:44:39 -08:00
dax Merge branch 'for-5.12/dax' into for-5.12/libnvdimm 2021-02-23 18:13:45 -08:00
dca
devfreq Merge branches 'pm-devfreq' and 'pm-tools' 2021-02-15 17:02:04 +01:00
dio
dma dmaengine updates for v5.12-rc1 2021-02-23 15:05:10 -08:00
dma-buf dma-fence: allow signaling drivers to set fence timestamp 2021-02-24 21:05:28 +05:30
edac Merge branch 'edac-misc' into edac-updates-for-v5.12 2021-02-15 10:06:58 +01:00
eisa
extcon
firewire firewire: replace tricky statement by two simple ones 2021-02-09 12:16:20 +01:00
firmware Char/Misc driver patches for 5.12-rc1 2021-02-24 10:25:37 -08:00
fpga
fsi
gnss
gpio Driver core / debugfs update for 5.12-rc1 2021-02-24 10:13:55 -08:00
gpu drm fixes for 5.12-rc1 + msm-next 2021-02-25 12:10:22 -08:00
greybus
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid 2021-02-23 14:52:22 -08:00
hsi
hv Revert "Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer" 2021-02-15 10:49:11 +00:00
hwmon Devicetree updates for v5.12: 2021-02-22 10:05:12 -08:00
hwspinlock hwspinlock: omap: Add support for K3 AM64x SoCs 2021-02-09 11:36:50 -06:00
hwtracing ARM updates for 5.12-rc1: 2021-02-22 14:27:07 -08:00
i2c ARM updates for 5.12-rc1: 2021-02-22 14:27:07 -08:00
i3c I3C for 5.12 2021-02-22 09:52:55 -08:00
ide
idle
iio - Use the newly introduced 'hot' and 'critical' ops for the acpi 2021-02-22 09:39:11 -08:00
infiniband Linux 5.11 2021-02-18 11:19:29 -04:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2021-02-23 14:56:23 -08:00
interconnect
iommu dma-mapping updates for 5.12: 2021-02-24 09:54:24 -08:00
ipack
irqchip irqchip updates for Linux 5.12: 2021-02-15 15:41:56 +01:00
isdn
leds TTY/Serial driver changes for 5.12-rc1 2021-02-20 21:28:04 -08:00
lightnvm lightnvm: pblk: Replace guid_copy() with export_guid()/import_guid() 2021-02-14 21:27:24 -07:00
macintosh
mailbox mailbox: arm_mhuv2: Skip calling kfree() with invalid pointer 2021-02-22 13:34:27 -06:00
mcb
md - Fix DM integrity's HMAC support to provide enhanced security of 2021-02-22 10:22:54 -08:00
media Fixes around VM_FPNMAP and follow_pfn 2021-02-22 17:45:02 -08:00
memory Char/Misc driver patches for 5.12-rc1 2021-02-24 10:25:37 -08:00
memstick
message
mfd Simple Firmware Interface (SFI) support removal for v5.12-rc1 2021-02-24 10:35:29 -08:00
misc pci-v5.12-changes 2021-02-25 09:56:08 -08:00
mmc ARM updates for 5.12-rc1: 2021-02-22 14:27:07 -08:00
most
mtd This pull request contains changes (actually just fixes) for UBIFS 2021-02-21 13:57:08 -08:00
mux
net net: enetc: keep RX ring consumer index in sync with hardware 2021-03-01 13:34:47 -08:00
nfc Char/Misc driver patches for 5.12-rc1 2021-02-24 10:25:37 -08:00
ntb NTB: Add support for EPF PCI Non-Transparent Bridge 2021-02-23 14:12:53 -06:00
nubus
nvdimm libnvdimm + device-dax for 5.12 2021-02-24 09:35:54 -08:00
nvme for-5.12/drivers-2021-02-17 2021-02-21 11:06:54 -08:00
nvmem
of Char/Misc driver patches for 5.12-rc1 2021-02-24 10:25:37 -08:00
opp opp: Don't skip freq update for different frequency 2021-02-18 12:31:08 +05:30
parisc
parport
pci pci-v5.12-changes 2021-02-25 09:56:08 -08:00
pcmcia
perf IOMMU Updates for Linux v5.12 2021-02-22 10:31:29 -08:00
phy phy: second round of phy fixes for v5.11 2021-02-10 10:39:23 +01:00
pinctrl This is the bulk of pin control changes for the v5.12 kernel. 2021-02-22 18:39:11 -08:00
platform Additional ACPI updates for v5.12-rc1 2021-02-25 12:03:13 -08:00
pnp
power
powercap powercap: intel_rapl: Use topology interface in rapl_init_domains() 2021-02-12 16:53:01 +01:00
pps
ps3
ptp ptp: ptp_clockmatrix: clean-up - parenthesis around a == b are unnecessary 2021-02-17 13:49:26 -08:00
pwm pwm: Changes for v5.12-rc1 2021-02-25 12:23:49 -08:00
rapidio
ras
regulator - Core Frameworks 2021-02-22 09:29:42 -08:00
remoteproc remoteproc: qcom: pas: Add SM8350 PAS remoteprocs 2021-02-11 12:52:18 -06:00
reset ARM: SoC drivers for v5.12 2021-02-20 18:42:28 -08:00
rpmsg
rtc ARM updates for 5.12-rc1: 2021-02-22 14:27:07 -08:00
s390 SCSI misc on 20210219 2021-02-22 10:24:58 -08:00
sbus
scsi SCSI misc on 20210219 2021-02-22 10:24:58 -08:00
sh
siox
slimbus
soc This is all driver updates, the majority of which is a bunch of new Qualcomm 2021-02-22 09:45:23 -08:00
soundwire soundwire: intel: fix possible crash when no device is detected 2021-02-11 10:49:52 +05:30
spi powerpc updates for 5.12 2021-02-22 14:34:00 -08:00
spmi spmi: spmi-pmic-arb: Fix hw_irq overflow 2021-02-12 12:26:46 +01:00
ssb
staging Simple Firmware Interface (SFI) support removal for v5.12-rc1 2021-02-24 10:35:29 -08:00
target SCSI misc on 20210219 2021-02-22 10:24:58 -08:00
tc
tee Simplify i2c acess in OP-TEE driver 2021-02-09 16:10:49 +01:00
thermal - Use the newly introduced 'hot' and 'critical' ops for the acpi 2021-02-22 09:39:11 -08:00
thunderbolt thunderbolt: Changes for v5.12 merge window 2021-02-09 13:00:07 +01:00
tty Char/Misc driver patches for 5.12-rc1 2021-02-24 10:25:37 -08:00
uio uio: uio_pci_generic: don't fail probe if pdev->irq equals to IRQ_NOTCONNECTED 2021-02-09 12:25:32 +01:00
usb Kbuild updates for v5.12 2021-02-25 10:17:31 -08:00
vdpa vdpa/mlx5: fix param validation in mlx5_vdpa_get_config() 2021-02-23 07:52:59 -05:00
vfio VFIO updates for v5.12-rc1 2021-02-24 10:43:40 -08:00
vhost virtio: features, fixes 2021-02-25 12:21:08 -08:00
video drm fixes for 5.12-rc1 + msm-next 2021-02-25 12:10:22 -08:00
virt
virtio virtio-input: add multi-touch support 2021-02-23 07:52:59 -05:00
visorbus
vlynq
vme vme: make remove callback return void 2021-02-09 12:15:07 +01:00
w1
watchdog Char/Misc driver patches for 5.12-rc1 2021-02-24 10:25:37 -08:00
xen xen: branch for v5.12-rc1 2021-02-21 13:06:08 -08:00
zorro
Kconfig cxl/mem: Introduce a driver for CXL-2.0-Type-3 endpoints 2021-02-16 20:36:38 -08:00
Makefile Simple Firmware Interface (SFI) support removal for v5.12-rc1 2021-02-24 10:35:29 -08:00