linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2025-01-06 22:04:22 +08:00

Author	SHA1	Message	Date
YueHaibing	ec7d6dd870	ethernet: ucc_geth: Use kmemdup() rather than kmalloc+memcpy Issue identified with Coccinelle. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-05-23 18:51:42 -07:00
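kmemdup() allocates a buffer and copies the source into it in one call, which is why it can replace a kmalloc() followed by memcpy(). A minimal userspace analogue, assuming malloc() in place of kmalloc() and dropping the gfp flags; `memdup_sketch()` is a hypothetical name:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace analogue of the kernel's kmemdup(): allocate len bytes and
 * copy src into them. Returns NULL if the allocation fails, so callers
 * need a single NULL check instead of kmalloc() + check + memcpy().
 */
static void *memdup_sketch(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}
```

The caller still owns the returned buffer and must free it, exactly as with the open-coded version.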
Vladimir Oltean	f5120f5998	dpaa2-eth: don't print error from dpaa2_mac_connect if that's EPROBE_DEFER When booting a board with DPAA2 interfaces defined statically via DPL (as opposed to creating them dynamically using restool), the driver will print an unspecific error message. This change adds the error code to the message, and avoids printing altogether if the error code is EPROBE_DEFER, because that is not a cause of alarm. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-05-21 14:49:40 -07:00
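The reporting policy described above (print the error code, but stay quiet on -EPROBE_DEFER because the probe will be retried) can be sketched as follows. `report_connect_error()` and its message text are hypothetical; EPROBE_DEFER is a kernel-internal errno value that userspace headers do not define, so it is declared here. In-kernel code can also use the dev_err_probe() helper for a similar purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Kernel-internal errno value (include/linux/errno.h). */
#define EPROBE_DEFER 517

/*
 * Print the error together with its code, except for -EPROBE_DEFER,
 * which only means "try again later". Returns true if a message was
 * printed.
 */
static bool report_connect_error(int err)
{
	if (err == 0 || err == -EPROBE_DEFER)
		return false;
	fprintf(stderr, "MAC connect failed: %d\n", err);
	return true;
}
```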
Ioana Ciornei	30f43d6f1c	dpaa2-eth: name the debugfs directory after the DPNI object Name the debugfs directory after the DPNI object instead of the netdev name since this can be changed after probe by udev rules. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-05-21 14:05:04 -07:00
Ioana Ciornei	b193f2ed53	dpaa2-eth: setup the of_node field of the device When the DPNI object is connected to a DPMAC, setup the of_node to point to the DTS device node of that specific MAC. This enables other drivers, for example the DSA subsystem, to find the net_device by its device node. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-05-21 14:05:04 -07:00
Fugang Duan	052fcc4531	net: fec: add defer probe for of_get_mac_address If the MAC address is read from the nvmem efuse by calling of_get_mac_address(), but the nvmem efuse is registered later than the driver, then it returns -EPROBE_DEFER. So modify the driver to support deferred probing when reading the MAC address from the nvmem efuse. Signed-off-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-05-12 14:01:50 -07:00
Fugang Duan	619fee9eb1	net: fec: fix the potential memory leak in fec_enet_init() If the allocation of cbd_base fails, the memory allocated for the queues should be freed; otherwise it is leaked. And if the allocation of the queues fails, the function can return an error directly. Fixes: `59d0f74656` ("net: fec: init multi queue date structure") Signed-off-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-05-12 14:01:50 -07:00
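The fix follows the usual kernel error-unwinding shape: return directly when the first allocation fails, and free earlier allocations when a later one fails. A minimal sketch with hypothetical stand-ins (`struct fake_fep`, `tracked_alloc()`) and an allocation counter so a leak would be visible; it is not the FEC driver's code:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct fake_fep {
	void *queues;
	void *cbd_base;
};

static int live_allocs; /* outstanding allocations, to detect leaks */

/* Allocate n bytes, or simulate an allocation failure when fail is true. */
static void *tracked_alloc(size_t n, bool fail)
{
	void *p = fail ? NULL : malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

static int fake_enet_init(struct fake_fep *fep, bool fail_queues, bool fail_cbd)
{
	fep->queues = tracked_alloc(64, fail_queues);
	if (!fep->queues)
		return -ENOMEM; /* nothing to undo yet */

	fep->cbd_base = tracked_alloc(256, fail_cbd);
	if (!fep->cbd_base)
		goto free_queues; /* undo the earlier allocation */

	return 0;

free_queues:
	tracked_free(fep->queues);
	fep->queues = NULL;
	return -ENOMEM;
}
```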
Oleksij Rempel	4a52dd8fef	net: selftest: fix build issue if INET is disabled In case ethernet driver is enabled and INET is disabled, selftest will fail to build. Reported-by: Randy Dunlap <rdunlap@infradead.org> Fixes: `3e1e58d64c` ("net: add generic selftest support") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20210428130947.29649-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-04-28 14:06:45 -07:00
Yangbo Lu	7ce9c3d363	enetc: fix locking for one-step timestamping packet transfer The previous patch to support PTP Sync packet one-step timestamping described one-step timestamping packet handling logic as below in commit message: - Transmit packet immediately if no other one in transfer, or queue to skb queue if there is already one in transfer. The test_and_set_bit_lock() is used here to lock and check state. - Start a work when complete transfer on hardware, to release the bit lock and to send one skb in skb queue if there is one. There was no problem with the description, but there was a mistake in the implementation. The locking/test_and_set_bit_lock() should be put in enetc_start_xmit(), which may be called by the worker, rather than in enetc_xmit(). Otherwise, the worker calling enetc_start_xmit() after the bit lock is released is not able to lock again for transfer. Fixes: `7294380c52` ("enetc: support PTP Sync packet one-step timestamping") Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-23 13:52:30 -07:00
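Why the lock belongs in the function shared by both callers can be shown with a single-threaded model. In this sketch a C11 atomic_flag stands in for test_and_set_bit_lock(), integers stand in for skbs, and all names are hypothetical. Because start_xmit() takes the lock, the worker's send re-acquires it, and a packet arriving meanwhile is queued instead of overlapping the one in flight:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define QLEN 8

struct onestep_tx {
	atomic_flag busy;   /* models the test_and_set_bit_lock() bit */
	int queue[QLEN];    /* models the skb queue of pending packets */
	int head, tail;
	int sent[QLEN];     /* packets handed to "hardware", in order */
	int nsent;
};

/* Shared by the xmit path and the completion worker, so it takes the lock. */
static bool start_xmit(struct onestep_tx *t, int pkt)
{
	if (atomic_flag_test_and_set(&t->busy))
		return false; /* a one-step packet is already in flight */
	t->sent[t->nsent++] = pkt;
	return true;
}

static void xmit(struct onestep_tx *t, int pkt)
{
	if (!start_xmit(t, pkt))
		t->queue[t->tail++ % QLEN] = pkt;
}

/* Completion worker: release the lock, then send one queued packet, if any. */
static void tx_complete_work(struct onestep_tx *t)
{
	atomic_flag_clear(&t->busy);
	if (t->head != t->tail)
		start_xmit(t, t->queue[t->head++ % QLEN]);
}
```

The model ignores the real driver's concurrency between the xmit path and the worker; it only illustrates where the lock has to be taken.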
Arnd Bergmann	74c97ea3b6	net: enetc: fix link error again A link time bug that I had fixed before has come back now that another sub-module was added to the enetc driver: ERROR: modpost: "enetc_ierb_register_pf" [drivers/net/ethernet/freescale/enetc/fsl-enetc.ko] undefined! The problem is that the enetc Makefile is not actually used for the ierb module if that is the only built-in driver in there and everything else is a loadable module. Fix it by always entering the directory this time, regardless of which symbols are configured. This should reliably fix the problem and prevent it from coming back another time. Fixes: `112463ddbe` ("net: dsa: felix: fix link error") Fixes: `e7d48e5fbf` ("net: enetc: add a mini driver for the Integrated Endpoint Register Block") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-22 13:23:07 -07:00
Michael Walle	1b8caefaf4	net: enetc: automatically select IERB module Now that enetc supports flow control we have to make sure the settings in the IERB are correct. Therefore, we actually depend on the enetc-ierb module. Previously it was possible that this module was disabled while the enetc was enabled. Fix it by automatically selecting the enetc-ierb module. Fixes: `e7d48e5fbf` ("net: enetc: add a mini driver for the Integrated Endpoint Register Block") Signed-off-by: Michael Walle <michael@walle.cc> Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-20 16:56:32 -07:00
Oleksij Rempel	6016ba345f	net: fec: make use of generic NET_SELFTESTS library With this patch, FEC on iMX will be able to run generic net selftests. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-20 16:08:02 -07:00
Vladimir Oltean	a864888788	net: enetc: add support for flow control In the ENETC receive path, a frame received by the MAC is first stored in a 256KB 'FIFO' memory, then transferred to DRAM when enqueuing it to the RX ring. The FIFO is a shared resource for all ENETC ports, but every port keeps track of its own memory utilization, on RX and on TX. There is a setting for RX rings through which they can either operate in 'lossy' mode (where the lack of a free buffer causes an immediate discard of the frame) or in 'lossless' mode (where the lack of a free buffer in the ring makes the frame stay longer in the FIFO). In turn, when the memory utilization of the FIFO exceeds a certain margin, the MAC can be configured to emit PAUSE frames. There is enough FIFO memory to buffer up to 3 MTU-sized frames per RX port while not jeopardizing the other use cases (jumbo frames), and also without consuming bytes from the port TX allocations. Also, 3 MTU-sized frames worth of memory is enough to ensure zero loss for 64 byte packets at 1G line rate. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-19 15:31:45 -07:00
Vladimir Oltean	e7d48e5fbf	net: enetc: add a mini driver for the Integrated Endpoint Register Block The NXP ENETC is a 4-port Ethernet controller which 'smells' to operating systems like 4 distinct PCIe PFs with SR-IOV, each PF having its own driver instance, but in fact there are some hardware resources which are shared between all ports, like for example the 256 KB SRAM FIFO between the MACs and the Host Transfer Agent which DMAs frames to DRAM. To hide the stuff that cannot be neatly exposed per port, the hardware designers came up with this idea of having a dedicated register block which is supposed to be populated by the bootloader, and contains everything configuration-related: MAC addresses, FIFO partitioning, etc. When a port is reset using PCIe Function Level Reset, its defaults are transferred from the IERB configuration. Most of the time, the settings made through the IERB are read-only in the port's memory space (if they are even visible), so they cannot be modified at runtime. Linux doesn't have any advanced FIFO partitioning requirements at all, but when reading through the hardware manual, it became clear that, even though there are many good 'recommendations' for default values, many of them were not actually put in practice on LS1028A. So we end up with a default configuration that: (a) does not have enough TX and RX byte credits to support the max MTU of 9600 (which the Linux driver claims already) properly (at full speed) (b) allows the FIFO to be overrun with RX traffic, potentially overwriting internal data structures. The last part sounds a bit catastrophic, but it isn't. Frames are supposed to transit the FIFO for a very short time, but they can actually accumulate there under 2 conditions: (a) there is very severe congestion on DRAM memory, or (b) the RX rings visible to the operating system were configured for lossless operation, and they just ran out of free buffers to copy the frame to. 
This is what is used to put backpressure onto the MAC with flow control. So since ENETC has not supported flow control thus far, RX FIFO overruns were never seen with Linux. But with the addition of flow control, we should configure some registers to prevent this from happening. What we are trying to protect against are bad actors which continue to send us traffic despite the fact that we have signaled a PAUSE condition. Of course we can't be lossless in that case, but it is best to configure the FIFO to do tail dropping rather than letting it overrun. So in a nutshell, this driver is a fixup for all the IERB default values that should have been but aren't. The IERB configuration needs to be done _before_ the PFs are enabled. So every PF searches for the presence of the "fsl,ls1028a-enetc-ierb" node in the device tree, and if it finds it, it "registers" with the IERB, which means that it requests the IERB to fix up its default values. This is done through -EPROBE_DEFER. The IERB driver is part of the fsl_enetc module, but is technically a platform driver, since the IERB is a good old fashioned MMIO region, as opposed to ENETC ports which pretend to be PCIe devices. The driver was already configuring ENETC_PTXMBAR (FIFO allocation for TX) because due to an omission, TXMBAR is a read/write register in the PF memory space. But the manual is quite clear that the formula for this should depend upon the TX byte credits (TXBCR). In turn, the TX byte credits are only readable/writable through the IERB. So if we want to ensure that the TXBCR register also has a value that is correct and in line with TXMBAR, there is simply no way this can be done from the PF driver, access to the IERB is needed. I could have modified U-Boot to fix up the IERB values, but that is quite undesirable, as old U-Boot versions are likely to be floating around for quite some time from now. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-19 15:31:45 -07:00
Vladimir Oltean	87614b931c	net: enetc: create a common enetc_pf_to_port helper Even though ENETC interfaces are exposed as individual PCIe PFs with their own driver instances, the ENETC is still fundamentally a multi-port Ethernet controller, and some parts of the IP take a port number (as can be seen in the PSFP implementation). Create a common helper that can be used outside of the TSN code for retrieving the ENETC port number based on the PF number. This is only correct for LS1028A, the only Linux-capable instantiation of ENETC thus far. Note that ENETC port 3 is PF 6. The TSN code did not care about this because ENETC port 3 does not support TSN, so the wrong mapping done by enetc_get_port for PF 6 could have never been hit. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-19 15:31:45 -07:00
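The PF-to-port translation can be sketched as a small switch. This assumes, as on LS1028A, that PFs 0-2 are ENETC ports 0-2, in addition to the stated fact that port 3 is PF 6. `pf_to_port_sketch()` is a hypothetical name taking a bare PF number; the in-tree helper's signature may differ.

```c
#include <assert.h>

/* Map an LS1028A PF number to an ENETC port number, or -1 if none. */
static int pf_to_port_sketch(int pf)
{
	switch (pf) {
	case 0:
	case 1:
	case 2:
		return pf;
	case 6:
		return 3; /* the case the old TSN-only mapping got wrong */
	default:
		return -1; /* not an ENETC port PF */
	}
}
```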
Vladimir Oltean	24e3930971	net: enetc: apply the MDIO workaround for XDP_REDIRECT too Described in `fd5736bf9f` ("enetc: Workaround for MDIO register access issue") is a workaround for a hardware bug that requires a register access of the MDIO controller to never happen concurrently with a register access of a port PF. To avoid that, a mutual exclusion scheme with rwlocks was implemented - the port PF accessors are the 'read' side, and the MDIO accessors are the 'write' side. When we do XDP_REDIRECT between two ENETC interfaces, all is fine because the MDIO lock is already taken from the NAPI poll loop. But when the ingress interface is not ENETC, just the egress is, the MDIO lock is not taken, so we might access the port PF registers concurrently with MDIO, which will make the link flap due to wrong values returned from the PHY. To avoid this, let's just slap an enetc_lock_mdio/enetc_unlock_mdio at the beginning and end of enetc_xdp_xmit. The fact that the MDIO lock is designed as a rwlock is important here, because the read side is reentrant (that is one of the main reasons why we chose it). Usually, the way we benefit from its reentrancy is by running the data path concurrently on both CPUs, but in this case, we benefit from the reentrancy by taking the lock even when the lock is already taken (and that's the situation where ENETC is both the ingress and the egress interface for XDP_REDIRECT, which was fine before and still is fine now). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:40 -07:00
Vladimir Oltean	92ff9a6e57	net: enetc: fix buffer leaks with XDP_TX enqueue rejections If the TX ring is congested, enetc_xdp_tx() returns false for the current XDP frame (represented as an array of software BDs). This array of software TX BDs is constructed in enetc_rx_swbd_to_xdp_tx_swbd from software BDs freshly cleaned from the RX ring. The issue is that we scrub the RX software BDs too soon, more precisely before we know that we can enqueue the TX BDs successfully into the TX ring. If we can't enqueue them (and enetc_xdp_tx returns false), we call enetc_xdp_drop which attempts to recycle the buffers held by the RX software BDs. But because we scrubbed those RX BDs already, two things happen: (a) we leak their memory (b) we populate the RX software BD ring with an all-zero rx_swbd structure, which makes the buffer refill path allocate more memory. enetc_refill_rx_ring -> if (unlikely(!rx_swbd->page)) -> enetc_new_page That is a recipe for fast OOM. Fixes: `7ed2bc8007` ("net: enetc: add support for XDP_TX") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:40 -07:00
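The ordering fix can be illustrated with simplified, hypothetical descriptor structures: copy the buffers into TX descriptors, attempt the enqueue, and clear the RX descriptors only once the TX ring has accepted the frame. On rejection the RX descriptors still own their pages, so the drop path can recycle them rather than leak them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rx_swbd { void *page; };
struct tx_swbd { void *page; };

/* Models a TX ring that rejects a frame when it is congested. */
static bool tx_ring_enqueue(const struct tx_swbd *bds, int n, int free_slots)
{
	(void)bds;
	return n <= free_slots;
}

static bool xdp_tx_frame(struct rx_swbd *rx, struct tx_swbd *tx, int n,
			 int free_slots)
{
	for (int i = 0; i < n; i++)
		tx[i].page = rx[i].page;

	if (!tx_ring_enqueue(tx, n, free_slots))
		return false; /* rx[] still owns the pages; caller recycles them */

	for (int i = 0; i < n; i++)
		rx[i].page = NULL; /* ownership moved to the TX ring */
	return true;
}
```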
Vladimir Oltean	975acc833c	net: enetc: handle the invalid XDP action the same way as XDP_DROP When the XDP program returns an invalid action, we should free the RX buffer. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:40 -07:00
Vladimir Oltean	7eab503b11	net: enetc: use dedicated TX rings for XDP It is possible for one CPU to perform TX hashing (see netdev_pick_tx) between the 8 ENETC TX rings, and the TX hashing to select TX queue 1. At the same time, it is possible for the other CPU to already use TX ring 1 for XDP (either XDP_TX or XDP_REDIRECT). Since there is no mutual exclusion between XDP and the network stack, we run into an issue because the ENETC TX procedure is not reentrant. The obvious approach would be to just make XDP take the lock of the network stack's TX queue corresponding to the ring it's about to enqueue in. For XDP_REDIRECT, this is quite straightforward, a lock at the beginning and end of enetc_xdp_xmit() should do the trick. But for XDP_TX, it's a bit more complicated. For one, we do TX batching all by ourselves for frames with the XDP_TX verdict. This is something we would like to keep the way it is, for performance reasons. But batching means that the network stack's lock should be kept from the first enqueued XDP_TX frame and until we ring the doorbell. That is mostly fine, except for cases when in the same NAPI loop we have mixed XDP_TX and XDP_REDIRECT frames. So if enetc_xdp_xmit() gets called while we are holding the lock from the RX NAPI, then bam, deadlock. The naive answer could be 'just flush the XDP_TX frames first, then release the network stack's TX queue lock, then call xdp_do_flush_map()'. But even xdp_do_redirect() is capable of flushing the batched XDP_REDIRECT frames, so unless we unlock/relock the TX queue around xdp_do_redirect(), there simply isn't any clean way to protect XDP_TX from concurrent network stack .ndo_start_xmit() on another CPU. So we need to take a different approach, and that is to reserve two rings for the sole use of XDP. We leave TX rings 0..ndev->real_num_tx_queues-1 to be handled by the network stack, and we pick them from the end of the priv->tx_ring array. 
We make an effort to keep the mapping done by enetc_alloc_msix() which decides which CPU handles the TX completions of which TX ring in its NAPI poll. So the XDP TX ring of CPU 0 is handled by TX ring 6, and the XDP TX ring of CPU 1 is handled by TX ring 7. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:40 -07:00
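The ring split can be sketched as index arithmetic, assuming one reserved XDP ring per CPU taken from the end of the TX ring array. The helper names are hypothetical. With 8 rings and 2 CPUs, the stack keeps rings 0..5 and the XDP rings are 6 and 7, matching the description above.

```c
#include <assert.h>

/* Rings left to the network stack (real_num_tx_queues). */
static int stack_tx_queues(int num_tx_rings, int num_cpus)
{
	return num_tx_rings - num_cpus;
}

/* CPU n uses the n-th ring of the reserved tail of the array. */
static int xdp_tx_ring_for_cpu(int num_tx_rings, int num_cpus, int cpu)
{
	assert(cpu >= 0 && cpu < num_cpus);
	return stack_tx_queues(num_tx_rings, num_cpus) + cpu;
}
```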
Vladimir Oltean	ee3e875f10	net: enetc: increase TX ring size Now that commit `d6a2829e82` ("net: enetc: increase RX ring default size") has increased the RX ring size, it is quite easy to congest the TX rings when the traffic is predominantly XDP_TX, as the RX ring is quite a bit larger than the TX one. Since we bit the bullet and did the expensive thing already (larger RX rings consume more memory pages), it seems quite foolish to keep the TX rings small. So make them the same size as the RX rings. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:39 -07:00
Vladimir Oltean	a6369fe6e0	net: enetc: remove unneeded xdp_do_flush_map() xdp_do_redirect already contains: -> dev_map_enqueue -> __xdp_enqueue -> bq_enqueue -> bq_xmit_all // if we have more than 16 frames So the logic from enetc will never be hit, because ENETC_DEFAULT_TX_WORK is 128. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:39 -07:00
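The reasoning rests on the bulk queue flushing itself once it holds 16 frames (DEV_MAP_BULK_SIZE). A hypothetical counter-only model of that flush-when-full behaviour shows that no batch ever exceeds 16 frames, so a driver-side flush that triggers only after more than 128 redirected frames adds nothing:

```c
#include <assert.h>

#define BULK_SIZE 16 /* DEV_MAP_BULK_SIZE in the kernel's devmap code */

struct bulk_queue {
	int count;   /* frames currently batched */
	int flushes; /* number of times a batch was transmitted */
};

static void bq_flush(struct bulk_queue *bq)
{
	if (bq->count) {
		bq->flushes++;
		bq->count = 0;
	}
}

/* Transmit the pending batch first if it is already full. */
static void bq_add(struct bulk_queue *bq)
{
	if (bq->count == BULK_SIZE)
		bq_flush(bq);
	bq->count++;
}
```

Enqueuing 128 frames triggers 7 automatic flushes and leaves a full batch of 16 for the end-of-poll flush.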
Vladimir Oltean	8f50d8bb3f	net: enetc: stop XDP NAPI processing when build_skb() fails When the code path below fails: enetc_clean_rx_ring_xdp // XDP_PASS -> enetc_build_skb -> enetc_map_rx_buff_to_skb -> build_skb enetc_clean_rx_ring_xdp will 'break', but that 'break' instruction isn't strong enough to actually break the NAPI poll loop, just the switch/case statement for XDP actions. So we increment rx_frm_cnt and go to the next frames minding our own business. Instead let's do what the skb NAPI poll function does, and break the loop now, waiting for the memory pressure to go away. Otherwise the next calls to build_skb() are likely to fail too. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:39 -07:00
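The underlying C pitfall is that `break` inside a `switch` leaves only the `switch`, not the enclosing loop. One way to stop the loop as well is a flag checked after the switch. In this sketch the verdict enum and `fail_at` (a simulated build_skb() failure) are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

enum verdict { VERDICT_PASS, VERDICT_DROP };

/* Returns the number of frames processed before stopping. */
static int clean_rx_sketch(const enum verdict *v, int n, int fail_at)
{
	int done = 0;

	for (int i = 0; i < n; i++) {
		bool stop = false;

		switch (v[i]) {
		case VERDICT_PASS:
			if (i == fail_at)
				stop = true; /* out of memory: give up for now */
			break;               /* leaves the switch only */
		case VERDICT_DROP:
			break;
		}

		if (stop)
			break; /* leaves the poll loop; retry on the next poll */
		done++;
	}
	return done;
}
```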
Vladimir Oltean	672f9a2198	net: enetc: recycle buffers for frames with RX errors When receiving a frame with errors, currently we do nothing with it (we don't construct an skb or an xdp_buff), we just exit the NAPI poll loop. Let's put the buffer back into the RX ring (similar to XDP_DROP). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:39 -07:00
Vladimir Oltean	6b04830d5e	net: enetc: rename the buffer reuse helpers enetc_put_xdp_buff has nothing to do with XDP, frankly, it is just a helper to populate the recycle end of the shadow RX BD ring (next_to_alloc) with a given buffer. On the other hand, enetc_put_rx_buff plays more tricks than its name would suggest. So let's rename enetc_put_rx_buff into enetc_flip_rx_buff to reflect the half-page buffer reuse tricks that it employs, and enetc_put_xdp_buff into enetc_put_rx_buff which suggests a more garden-variety operation. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:39 -07:00
Vladimir Oltean	e9e49ae88e	net: enetc: remove redundant clearing of skb/xdp_frame pointer in TX conf path Later in enetc_clean_tx_ring we have: /* Scrub the swbd here so we don't have to do that * when we reuse it during xmit */ memset(tx_swbd, 0, sizeof(*tx_swbd)); So these assignments are unnecessary. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:39 -07:00
Claudiu Manoil	8eda54c5e6	gianfar: Drop GFAR_MQ_POLLING support Gianfar used to enable all 8 Rx queues (DMA rings) per ethernet device, even though the controller can only support 2 interrupt lines at most. This meant that multiple Rx queues would have to be grouped per NAPI poll routine, and the CPU would have to split the budget and service them in a round robin manner. The overhead of this scheme proved to outweigh the potential benefits. The alternative was to introduce the "Single Queue" polling mode, supporting one Rx queue per NAPI, which became the default packet processing option and helped improve the performance of the driver. MQ_POLLING also relies on undocumented device tree properties to specify how to map the 8 Rx and Tx queues to a given interrupt line (aka "interrupt group"). Using module parameters to enable this mode wasn't an option either. Long story short, MQ_POLLING became obsolete, now it is just dead code, and no one asked for it so far. For the Tx queues, multi-queue support (more than 1 Tx queue per CPU) could be revisited by adding tc MQPRIO support, but again, one has to consider that there are only 2 interrupt lines. So the NAPI poll routine would have to service multiple Tx rings. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 15:46:15 -07:00
Vladimir Oltean	2c4eca3ef7	net: bridge: switchdev: include local flag in FDB notifications As explained in bugfix commit `6ab4c3117a` ("net: bridge: don't notify switchdev for local FDB addresses") as well as in this discussion: https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/ the switchdev notifiers for FDB entries managed to have a zero-day bug, which was that drivers would not know what to do with local FDB entries, because they were not told that they are local. The bug fix was to simply not notify them of those addresses. Let us now add the 'is_local' bit to bridge FDB entries, and make all drivers ignore these entries by their own choice. Co-developed-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 15:15:45 -07:00
Yangbo Lu	b6faf160d0	enetc: convert to schedule_work() Convert system_wq queue_work() to schedule_work() which is a wrapper around it, since the former is a rare construct. Fixes: `7294380c52` ("enetc: support PTP Sync packet one-step timestamping") Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-15 16:53:08 -07:00
Michael Walle	652d3be21d	net: enetc: fetch MAC address from device tree Normally, the bootloader will already initialize the MAC address registers of the ENETC and the driver will just use them or generate a random one, if it is not initialized. Add a new way to provide the MAC address: via device tree. Besides the usual 'mac-address' property, there is also the possibility to fetch it via an NVMEM provider. The sl28 board stores the MAC address in the SPI NOR flash OTP region. Having this will allow Linux to fetch the MAC address from there without being dependent on the bootloader. No in-tree boards have the device tree properties set, thus for these, this is a no-op. Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-14 14:04:36 -07:00
Ioana Ciornei	166179542e	dpaa2-switch: reuse dpaa2_switch_acl_entry_add() for STP frames trap Since we added the dpaa2_switch_acl_entry_add() function in the previous patches to hide all the details of actually adding the ACL entry by issuing a firmware command, let's use it also for adding a CPU trap for the STP frames. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-13 15:12:18 -07:00
Ioana Ciornei	4ba28c1a1a	dpaa2-switch: add tc matchall filter support Add support for TC_SETUP_CLSMATCHALL by using the same ACL table entries framework as for tc flower. Adding a matchall rule is done by installing an entry which has a mask of all zeroes, thus matching on any packet. This can be used as a catch-all type of rule if used correctly, i.e. the priority of the matchall filter should be kept as the lowest one in the entire filter block. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-13 15:12:18 -07:00
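Why an all-zero mask matches every packet can be seen in a generic masked comparison: a key matches when every bit selected by the mask agrees with the entry, and an all-zero mask selects no bits. This is a sketch of the general technique, not the DPAA2 firmware's matching logic:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool acl_match(const uint8_t *pkt, const uint8_t *key,
		      const uint8_t *mask, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if ((pkt[i] & mask[i]) != (key[i] & mask[i]))
			return false;
	return true; /* every masked bit agreed (trivially so for mask == 0) */
}
```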
Ioana Ciornei	1110318d83	dpaa2-switch: add tc flower hardware offload on ingress traffic This patch adds support for tc flower hardware offload on the ingress path. Shared filter blocks are supported by sharing a single ACL table between multiple ports. The following flow keys are supported: - Ethernet: dst_mac/src_mac - IPv4: dst_ip/src_ip/ip_proto/tos - VLAN: vlan_id/vlan_prio/vlan_tpid/vlan_dei - L4: dst_port/src_port As for flow actions, the following are supported: - drop - mirred egress redirect - trap Each ACL entry (filter) can be set up with only one of the listed actions. A sorted singly linked list is used to keep the ACL entries by their order of priority. When adding a new filter, this enables us to quickly ascertain if the new entry has the highest priority of the entire block or if we should make some space in the ACL table by increasing the priority of the filters already in the table. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-13 15:12:18 -07:00
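A priority-sorted singly linked list makes the "is this the highest priority?" check fall out of the insertion itself: the returned position is 0 exactly when the new entry goes to the head. A sketch assuming, as in tc, that a lower prio value means higher priority; `struct acl_entry` and `acl_insert_sorted()` are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

struct acl_entry {
	int prio;
	struct acl_entry *next;
};

/* Insert e keeping ascending prio order; return its index (0 = head). */
static int acl_insert_sorted(struct acl_entry **head, struct acl_entry *e)
{
	struct acl_entry **pp = head;
	int pos = 0;

	while (*pp && (*pp)->prio <= e->prio) {
		pp = &(*pp)->next;
		pos++;
	}
	e->next = *pp;
	*pp = e;
	return pos;
}
```

Walking a pointer-to-pointer avoids a special case for inserting at the head.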
Ioana Ciornei	2bf90ba510	dpaa2-switch: install default STP trap rule with the highest priority Change the default ACL trap rule for STP frames to have the highest priority. Both the default rules added by the driver for its internal use and the rules added with tc flower will reside in the same ACL table. In this case, the default rules such as the STP one that we already have should have the highest priority. Also, remove the check for a full ACL table since we already know that it's sized so that we don't hit this case. The last change is that default trap filters will not be counted in the acl_tbl's num_rules variable since their number doesn't change. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-13 15:12:18 -07:00
Ioana Ciornei	1b0f14b6c2	dpaa2-switch: create a central dpaa2_switch_acl_tbl structure Introduce a new structure - dpaa2_switch_acl_tbl - to hold all data related to an ACL table: number of rules added, ACL table id, etc. This will be used more in the next patches when adding support for sharing an ACL table between ports. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-13 15:12:18 -07:00
Michael Walle	83216e3988	of: net: pass the dst buffer to of_get_mac_address() of_get_mac_address() returns a "const void" pointer to a MAC address. Lately, support to fetch the MAC address by an NVMEM provider was added. But this will only work with platform devices. It will not work with PCI devices (e.g. of an integrated root complex) and esp. not with DSA ports. There is an of_ variant of the nvmem binding which works without devices. The returned data of a nvmem_cell_read() has to be freed after use. On the other hand the return of_get_mac_address() points to some static data without a lifetime. The trick for now, was to allocate a device resource managed buffer which is then returned. This will only work if we have an actual device. Change it, so that the caller of of_get_mac_address() has to supply a buffer where the MAC address is written to. Unfortunately, this will touch all drivers which use the of_get_mac_address(). Usually the code looks like: const char *addr; addr = of_get_mac_address(np); if (!IS_ERR(addr)) ether_addr_copy(ndev->dev_addr, addr); This can then be simply rewritten as: of_get_mac_address(np, ndev->dev_addr); Sometimes is_valid_ether_addr() is used to test the MAC address. of_get_mac_address() already makes sure, it just returns a valid MAC address. Thus we can just test its return code. But we have to be careful if there are still other sources for the MAC address before the of_get_mac_address(). In this case we have to keep the is_valid_ether_addr() call. The following coccinelle patch was used to convert common cases to the new style. Afterwards, I've manually gone over the drivers and fixed the return code variable: either used a new one or if one was already available use that. Mansour Moufid, thanks for that coccinelle patch! <spml> @a@ identifier x; expression y, z; @@ - x = of_get_mac_address(y); + x = of_get_mac_address(y, z); <... - ether_addr_copy(z, x); ...> @@ identifier a.x; @@ - if (<+... 
x ...+>) {} @@ identifier a.x; @@ if (<+... x ...+>) { ... } - else {} @@ identifier a.x; expression e; @@ - if (<+... x ...+>@e) - {} - else + if (!(e)) {...} @@ expression x, y, z; @@ - x = of_get_mac_address(y, z); + of_get_mac_address(y, z); ... when != x </spml> All drivers, except drivers/net/ethernet/aeroflex/greth.c, were compile-time tested. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-13 14:35:02 -07:00
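The new calling convention (caller-owned destination buffer, integer return code) can be modelled in userspace. `get_mac_address_sketch()` is a hypothetical stand-in, with `src` representing whatever the device tree or NVMEM lookup found; its -ENODEV/-EINVAL choices are illustrative. `valid_ether_addr()` performs the same checks as the kernel's is_valid_ether_addr() (not multicast, not all zeroes).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

static bool valid_ether_addr(const uint8_t *a)
{
	static const uint8_t zero[ETH_ALEN];

	return !(a[0] & 0x01) && memcmp(a, zero, ETH_ALEN) != 0;
}

/* Write a valid address into the caller's buffer; return 0 or -errno. */
static int get_mac_address_sketch(const uint8_t *src, uint8_t *addr)
{
	if (!src)
		return -ENODEV;
	if (!valid_ether_addr(src))
		return -EINVAL;
	memcpy(addr, src, ETH_ALEN);
	return 0;
}
```

Because only valid addresses are written, a caller can test the return code instead of calling is_valid_ether_addr() again, as the commit message notes.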
Yangbo Lu	7294380c52	enetc: support PTP Sync packet one-step timestamping This patch is to add support for PTP Sync packet one-step timestamping. Since ENETC single-step register has to be configured dynamically per packet for correctionField offset and UDP checksum update, current one-step timestamping packet has to be sent only when the last one completes transmitting on hardware. So, on the TX, this patch handles one-step timestamping packet as below: - Transmit packet immediately if no other one in transfer, or queue to skb queue if there is already one in transfer. The test_and_set_bit_lock() is used here to lock and check state. - Start a work when complete transfer on hardware, to release the bit lock and to send one skb in skb queue if there is one. And the configuration for one-step timestamping on ENETC before transmitting is as follows: - Set one-step timestamping flag in extension BD. - Write 30 bits current timestamp in tstamp field of extension BD. - Update PTP Sync packet originTimestamp field with current timestamp. - Configure single-step register for correctionField offset and UDP checksum update. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-12 13:34:21 -07:00
Yangbo Lu	f768e75130	enetc: mark TX timestamp type per skb Mark the TX timestamp type per skb in skb->cb[0], instead of using a global variable for all skbs. This is preparation for one-step timestamp support. Once one-step timestamping is enabled, there will be both one-step and two-step PTP messages to transfer, and an skb queue is needed for one-step PTP messages to make sure the current message is sent only after the previous one has completed on hardware (the ENETC single-step register has to be dynamically configured per message). So, marking the TX timestamp type per skb is required. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-12 13:34:21 -07:00
Jakub Kicinski	8859a44ea0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: MAINTAINERS - keep Chandrasekar drivers/net/ethernet/mellanox/mlx5/core/en_main.c - simple fix + trust the code re-added to param.c in -next is fine include/linux/bpf.h - trivial include/linux/ethtool.h - trivial, fix kdoc while at it include/linux/skmsg.h - move to relevant place in tcp.c, comment re-wrapped net/core/skmsg.c - add the sk = sk // sk = NULL around calls net/tipc/crypto.c - trivial Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-04-09 20:48:35 -07:00
Claudiu Manoil	6c5e6b4ccc	enetc: Use generic rule to map Tx rings to interrupt vectors Even if the current mapping is correct for the 1 CPU and 2 CPU cases (currently enetc is included in SoCs with up to 2 CPUs only), it is better to use a generic rule for the mapping to cover all possible cases. The number of CPUs is the same as the number of interrupt vectors:
Per device Tx rings -
    device_tx_ring[idx], where idx = 0..n_rings_total-1
Per interrupt vector Tx rings -
    int_vector[i].ring[j], where i = 0..n_int_vects-1
                                 j = 0..n_rings_per_v-1
Mapping rule -
    n_rings_per_v = n_rings_total / n_int_vects
    for i = 0..n_int_vects - 1:
        for j = 0..n_rings_per_v - 1:
            idx = n_int_vects * j + i
            int_vector[i].ring[j] <- device_tx_ring[idx]
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20210409071613.28912-1-claudiu.manoil@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-04-09 18:22:09 -07:00
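A minimal C sketch of the mapping rule above; the function name and the flat map[] layout (slot j of vector i stored at map[i * n_rings_per_v + j]) are illustrative, not the driver's data structures.

```c
/*
 * Fill map[i * n_rings_per_v + j] with the device Tx ring index that
 * slot j of interrupt vector i serves: idx = n_int_vects * j + i.
 */
static void map_tx_rings(int n_rings_total, int n_int_vects, int *map)
{
	int n_rings_per_v = n_rings_total / n_int_vects;

	for (int i = 0; i < n_int_vects; i++)
		for (int j = 0; j < n_rings_per_v; j++)
			map[i * n_rings_per_v + j] = n_int_vects * j + i;
}
```

With 8 Tx rings and 2 interrupt vectors this gives rings 0, 2, 4, 6 to vector 0 and rings 1, 3, 5, 7 to vector 1, i.e. the rings are interleaved across vectors; with a single vector the mapping degenerates to the identity.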
Vladimir Oltean	a93580a02d	net: enetc: fix TX ring interrupt storm The blamed commit introduced a bit in the TX software buffer descriptor structure for determining whether a BD is final or not; we rearm the TX interrupt vector for every frame (hence final BD) transmitted. But there is a problem with the patch: it replaced a condition whose expression is a bool which was evaluated at the beginning of the "while" loop with a bool expression that is evaluated on the spot: tx_swbd->is_eof. The problem with the latter expression is that the tx_swbd has already been incremented at that stage, so the tx_swbd->is_eof check is in fact done on the _next_ software BD, which is _not_ final. The effect is that the CPU is at 100% load in ksoftirqd because it does not acknowledge the TX interrupt, so the handler keeps getting called again and again. The fix is to restore the code structure, and keep the local bool is_eof variable, just to assign it the tx_swbd->is_eof value instead of !!tx_swbd->skb. Fixes: `d504498d2e` ("net: enetc: add a dedicated is_eof bit in the TX software BD") Reported-by: Alex Marginean <alexandru.marginean@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20210409192759.3895104-1-olteanv@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-04-09 18:17:12 -07:00
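To make the off-by-one concrete, here is a simplified model (not the driver code) in which an array index plays the role of the tx_swbd pointer and only the is_eof field exists. The buggy variant reads is_eof after advancing, so it inspects the next BD; the fixed variant latches the flag into a local before advancing, as the fix does.

```c
#include <stdbool.h>

/* Simplified TX software BD: only the field relevant to the bug. */
struct tx_swbd {
	bool is_eof;   /* true on the last BD of a frame */
};

/* Buggy: the cursor is advanced first, so is_eof is read from the next BD. */
static int count_frames_buggy(const struct tx_swbd *ring, int n)
{
	int frames = 0, i = 0;

	while (i < n) {
		i++;
		if (i < n && ring[i].is_eof)   /* bounds check only to stay in the array */
			frames++;
	}
	return frames;
}

/* Fixed: is_eof is latched before the cursor moves on. */
static int count_frames_fixed(const struct tx_swbd *ring, int n)
{
	int frames = 0, i = 0;

	while (i < n) {
		bool is_eof = ring[i].is_eof;

		i++;
		if (is_eof)
			frames++;
	}
	return frames;
}
```

For a batch holding a single one-BD frame, the fixed loop counts 1 completed frame while the buggy one counts 0, mirroring how the final BD goes unnoticed in the blamed commit.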
Dan Carpenter	626b598aa8	net: enetc: fix array underflow in error handling code This loop will try to unmap enetc_unmap_tx_buff[-1] and crash. Fixes: `9d2b68cc10` ("net: enetc: add support for XDP_REDIRECT") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/YHBHfCY/yv3EnM9z@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-04-09 16:48:29 -07:00
Heiner Kallweit	557d5dc83f	net: fec: use mac-managed PHY PM Use the new mac_managed_pm flag to work around an issue with KSZ8081 PHY that becomes unstable when a soft reset is triggered during aneg. Reported-by: Joakim Zhang <qiangqing.zhang@nxp.com> Tested-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-04-09 16:37:04 -07:00
Ioana Ciornei	8ed3cefc26	dpaa2-eth: export the rx copybreak value as an ethtool tunable It's useful, especially for debugging purposes, to have the Rx copybreak value changeable at runtime. Export it as an ethtool tunable. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-02 14:25:47 -07:00
Ioana Ciornei	50f826999a	dpaa2-eth: add rx copybreak support DMA unmapping, allocating a new buffer and DMA mapping it back on the refill path is really not that efficient. Proper buffer recycling (page pool, flipping the page and using the other half) cannot be done for DPAA2 since it's not a ring-based controller; it rather deals with multiple queues which all get their buffers from the same buffer pool on Rx. To circumvent these limitations, add support for Rx copybreak. For small packets, instead of creating an skb around the buffer in which the frame was received, allocate a new sk buffer altogether, copy the contents of the frame and release the initial page back into the buffer pool. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-02 14:25:47 -07:00
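A minimal sketch of the copybreak decision, assuming a byte threshold where frames of at most copybreak bytes are copied. Plain malloc()/memcpy() stand in for skb allocation, and the struct and function names are hypothetical; the point is only that small frames leave the original DMA buffer free to go back to the pool.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct rx_result {
	unsigned char *data;   /* buffer handed up the stack */
	int copied;            /* 1 if the copybreak path was taken */
};

static struct rx_result rx_copybreak(unsigned char *dma_buf, size_t len,
				     size_t copybreak)
{
	struct rx_result res = { dma_buf, 0 };

	if (len <= copybreak) {
		unsigned char *copy = malloc(len);

		if (copy) {
			memcpy(copy, dma_buf, len);
			/* Real driver: release dma_buf back to the buffer pool here. */
			res.data = copy;
			res.copied = 1;
		}
		/* On allocation failure, fall back to handing up dma_buf itself. */
	}
	return res;
}
```

Exposing the threshold as a runtime tunable, as the follow-up commit above does with ethtool, lets the copy cost be traded against buffer reuse per workload.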
Ioana Ciornei	28d137cc8c	dpaa2-eth: rename dpaa2_eth_xdp_release_buf into dpaa2_eth_recycle_buf Rename the dpaa2_eth_xdp_release_buf function to dpaa2_eth_recycle_buf since in the next patches we'll be using the same recycle mechanism for the normal stack path besides XDP_DROP. Also, rename the array which holds the buffers to be recycled so that it does not have any reference to XDP. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-02 14:25:47 -07:00
Vladimir Oltean	9d2b68cc10	net: enetc: add support for XDP_REDIRECT The driver implementation of the XDP_REDIRECT action reuses parts from XDP_TX, most notably the enetc_xdp_tx function which transmits an array of TX software BDs. Only this time, the buffers don't have DMA mappings, we need to create them. When a BPF program reaches the XDP_REDIRECT verdict for a frame, we can employ the same buffer reuse strategy as for the normal processing path and for XDP_PASS: we can flip to the other page half and seed that to the RX ring. Note that scatter/gather support is there, but disabled due to lack of multi-buffer support in XDP (which is added by this series): https://patchwork.kernel.org/project/netdevbpf/cover/cover.1616179034.git.lorenzo@kernel.org/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:57:44 -07:00
Vladimir Oltean	d6a2829e82	net: enetc: increase RX ring default size As explained in the XDP_TX patch, when receiving a burst of frames with the XDP_TX verdict, there is a momentary dip in the number of available RX buffers. The system will eventually recover as TX completions will start kicking in and refilling our RX BD ring again. But until that happens, we need to survive with as few out-of-buffer discards as possible. This increases the memory footprint of the driver in order to avoid discards at 2.5Gbps line rate with 64B packets, the maximum speed available for testing on 1 port on NXP LS1028A. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:57:44 -07:00
Vladimir Oltean	7ed2bc8007	net: enetc: add support for XDP_TX For reflecting packets back into the interface they came from, we create an array of TX software BDs derived from the RX software BDs. Therefore, we need to extend the TX software BD structure to contain most of the stuff that's already present in the RX software BD structure, for reasons that will become evident in a moment. For a frame with the XDP_TX verdict, we don't reuse any buffer right away as we do for XDP_DROP (the same page half) or XDP_PASS (the other page half, same as the skb code path). Because the buffer transfers ownership from the RX ring to the TX ring, reusing any page half right away is very dangerous. What we can do instead is recycle the same page half as soon as TX is complete. The code path is:
enetc_poll -> enetc_clean_rx_ring_xdp -> enetc_xdp_tx -> enetc_refill_rx_ring
(time passes, another MSI interrupt is raised)
enetc_poll -> enetc_clean_tx_ring -> enetc_recycle_xdp_tx_buff
But that creates a problem, because there is a potentially large time window between enetc_xdp_tx and enetc_recycle_xdp_tx_buff, a period in which we'll have fewer and fewer RX buffers. Basically, when the ship starts sinking, the knee-jerk reaction is to let enetc_refill_rx_ring do what it does for the standard skb code path (refill every 16 consumed buffers), but that turns out to be very inefficient. The problem is that we have no rx_swbd->page at our disposal from the enetc_reuse_page path, so enetc_refill_rx_ring would have to call enetc_new_page for every buffer that we refill (if we choose to refill at this early stage). This is very inefficient and only makes the problem worse, because page allocation is an expensive process, and CPU time is exactly what we're lacking. Additionally, there is an even bigger problem: if we let enetc_refill_rx_ring top up the ring's buffers again from the RX path, remember that the buffers sent to transmission haven't disappeared anywhere. 
They will eventually be sent, and processed in enetc_clean_tx_ring, and an attempt will be made to recycle them. But surprise, the RX ring is already full of new buffers, because we were premature in deciding that we should refill. So not only did we take the expensive decision of allocating new pages, but we now must throw away perfectly good and reusable buffers. Instead, we implement an elastic refill mechanism, which keeps track of the number of in-flight XDP_TX buffer descriptors. We top up the RX ring only up to the total ring capacity minus the number of BDs that are in flight (because we know that those BDs will return to us eventually). The enetc driver manages 1 RX ring per CPU, and the default TX ring management is the same. So we do XDP_TX towards the TX ring of the same index, because it is affined to the same CPU. This will probably not produce great results when we have a tc-taprio/tc-mqprio qdisc on the interface, because in that case, the number of TX rings might be greater, but I didn't add any checks for that yet (mostly because I didn't know what checks to add). It should also be noted that we need to change the DMA mapping direction for RX buffers, since they may now be reflected into the TX ring of the same device. We choose to use DMA_BIDIRECTIONAL instead of unmapping and remapping as DMA_TO_DEVICE, because performance is better this way. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:57:44 -07:00
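The elastic refill rule can be sketched as a counting model, assuming we only track how many BDs are armed and how many buffers are lent to TX. All names are hypothetical and no real buffers are involved; the invariant kept is armed + xdp_tx_in_flight <= size.

```c
struct rx_ring_model {
	int size;              /* total RX BDs in the ring */
	int armed;             /* BDs currently holding a buffer for HW */
	int xdp_tx_in_flight;  /* RX buffers lent to the TX ring by XDP_TX */
};

/* A frame got the XDP_TX verdict: its buffer leaves the RX ring for TX. */
static void on_xdp_tx(struct rx_ring_model *r)
{
	r->armed--;
	r->xdp_tx_in_flight++;
}

/* A BD consumed without lending its buffer to TX (modeled as just leaving). */
static void on_rx_consumed(struct rx_ring_model *r)
{
	r->armed--;
}

/* TX confirmation: the lent buffer is recycled straight back into RX. */
static void on_xdp_tx_complete(struct rx_ring_model *r)
{
	r->xdp_tx_in_flight--;
	r->armed++;
}

/* Top up only to capacity minus in-flight BDs; returns buffers allocated. */
static int elastic_refill(struct rx_ring_model *r)
{
	int budget = r->size - r->armed - r->xdp_tx_in_flight;

	if (budget < 0)
		budget = 0;
	r->armed += budget;
	return budget;
}
```

With an 8-BD ring and three XDP_TX frames in flight, refill allocates nothing until some other BD is consumed, and the three TX completions later bring the ring back to full without any new page allocations.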
Vladimir Oltean	d1b15102dd	net: enetc: add support for XDP_DROP and XDP_PASS For the RX ring, enetc uses an allocation scheme based on pages split into two buffers, which is already very efficient in terms of preventing reallocations / maximizing reuse, so I see no reason why I would change that.
+--------+--------+--------+--------+--------+--------+--------+
|        |        |        |        |        |        |        |
| half B | half B | half B | half B | half B | half B | half B |
|        |        |        |        |        |        |        |
+--------+--------+--------+--------+--------+--------+--------+
|        |        |        |        |        |        |        |
| half A | half A | half A | half A | half A | half A | half A | RX ring
|        |        |        |        |        |        |        |
+--------+--------+--------+--------+--------+--------+--------+
    ^                                                     ^
    |                                                     |
next_to_clean                                       next_to_alloc
                                                     next_to_use

                  +--------+--------+--------+--------+--------+
                  |        |        |        |        |        |
                  | half B | half B | half B | half B | half B |
                  |        |        |        |        |        |
+--------+--------+--------+--------+--------+--------+--------+
|        |        |        |        |        |        |        |
| half B | half B | half A | half A | half A | half A | half A | RX ring
|        |        |        |        |        |        |        |
+--------+--------+--------+--------+--------+--------+--------+
|        |        |   ^                                   ^
| half A | half A |   |                                   |
|        |        | next_to_clean                    next_to_use
+--------+--------+
                  ^
                  |
            next_to_alloc
then when enetc_refill_rx_ring is called, whose purpose is to advance next_to_use, it sees that it can take buffers up to next_to_alloc, and it says "oh, hey, rx_swbd->page isn't NULL, I don't need to allocate one!". The only problem is that for default PAGE_SIZE values of 4096, buffer sizes are 2048 bytes. While this is enough for normal skb allocations at an MTU of 1500 bytes, for XDP it isn't, because the XDP headroom is 256 bytes, and including skb_shared_info and alignment, we end up being able to make use of only 1472 bytes, which is insufficient for the default MTU. 
To solve that problem, we implement scatter/gather processing in the driver, because we would really like to keep the existing allocation scheme. A packet of 1500 bytes is received in a buffer of 1472 bytes and another one of 28 bytes. Because the headroom required by XDP is different from (and much larger than) the one required by the network stack, whenever a BPF program is added or deleted on the port, we drain the existing RX buffers and seed new ones with the required headroom. We also keep the required headroom in rx_ring->buffer_offset. The simplest way to implement XDP_PASS, where an skb must be created, is to create an xdp_buff based on the next_to_clean RX BDs, but not clear those BDs from the RX ring yet, just keep the original index at which the BDs for this frame started. Then, if the verdict is XDP_PASS, instead of converting the xdp_buff to an skb, we replay a call to enetc_build_skb (just as in the normal enetc_clean_rx_ring case), starting from the original BD index. We would also like to be minimally invasive to the regular RX data path, and not check whether there is a BPF program attached to the ring on every packet. So we create a separate RX ring processing function for XDP. Because we only install/remove the BPF program while the interface is down, we forgo the rcu_read_lock() in enetc_clean_rx_ring, since there shouldn't be any circumstance in which we are processing packets and there is a potentially freed BPF program attached to the RX ring. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:57:44 -07:00
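The 1472-byte figure can be checked with a little arithmetic. The page size and the 256-byte XDP headroom come from the text above; the 320-byte size of the aligned skb_shared_info is an assumption (plausible for a 64-bit build with 64-byte cache lines), chosen because it is what makes the quoted numbers add up. The constant names are hypothetical.

```c
enum {
	PAGE_SZ           = 4096,
	RX_BUF_SZ         = PAGE_SZ / 2,   /* one page half: 2048 bytes */
	XDP_HEADROOM      = 256,           /* from the text */
	SHINFO_ALIGNED_SZ = 320,           /* assumption, see lead-in */
	XDP_USABLE        = RX_BUF_SZ - XDP_HEADROOM - SHINFO_ALIGNED_SZ,
};

/* Bytes of a frame of length len that spill into a second RX buffer. */
static int spill_bytes(int len)
{
	return len > XDP_USABLE ? len - XDP_USABLE : 0;
}
```

Under these assumptions XDP_USABLE is 2048 - 256 - 320 = 1472, and spill_bytes(1500) gives the 28 bytes that end up in the second buffer.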
Vladimir Oltean	65d0cbb414	net: enetc: move up enetc_reuse_page and enetc_page_reusable For XDP_TX, we need to call enetc_reuse_page from enetc_clean_tx_ring, so we need to avoid a forward declaration. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:57:44 -07:00
Vladimir Oltean	1ee8d6f3be	net: enetc: clean the TX software BD on the TX confirmation path With the future introduction of some new fields into enetc_tx_swbd such as is_xdp_tx, is_xdp_redirect etc., we need not only to set these bits to true from the XDP_TX/XDP_REDIRECT code path, but also to false from the old code paths. This is because TX software buffer descriptors are kept in a ring that is a shadow of the hardware TX ring, so these structures keep getting reused, and there is always the possibility that when a software BD is reused (after we ran a full circle through the TX ring), the old user of the tx_swbd had set is_xdp_tx = true, and now we are sending a regular skb, which would need to set is_xdp_tx = false. To be minimally invasive to the old code paths, let's just scrub the software TX BD in the TX confirmation path (enetc_clean_tx_ring), once we know that nobody uses this software TX BD (tx_ring->next_to_clean hasn't yet been updated, and the TX paths check enetc_bd_unused which tells them if there's any more space in the TX ring for a new enqueue). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:57:44 -07:00
Vladimir Oltean	d504498d2e	net: enetc: add a dedicated is_eof bit in the TX software BD In the transmit path, if we have a scatter/gather frame, it is put into multiple software buffer descriptors, the last of which has the skb pointer populated (which is necessary for rearming the TX MSI vector and for collecting the two-step TX timestamp from the TX confirmation path). At the moment, this is sufficient, but with XDP_TX, we'll need to service TX software buffer descriptors that don't have an skb pointer, however they might be final nonetheless. So add a dedicated bit for final software BDs that we populate and check explicitly. Also, we keep looking just for an skb when doing TX timestamping, because we don't want/need that for XDP. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:57:44 -07:00
Vladimir Oltean	a800abd3ec	net: enetc: move skb creation into enetc_build_skb We need to build an skb from two code paths now: from the plain RX data path and from the XDP data path when the verdict is XDP_PASS. Create a new enetc_build_skb function which contains the essential steps for building an skb based on the first and last positions of buffer descriptors within the RX ring. We also squash the enetc_process_skb function into enetc_build_skb, because what that function did wasn't very meaningful on its own. The "rx_frm_cnt++" instruction has been moved around napi_gro_receive for cosmetic reasons, to be in the same spot as rx_byte_cnt++, which itself must be before napi_gro_receive, because that's when we lose ownership of the skb. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:57:44 -07:00
Vladimir Oltean	2fa423f5f0	net: enetc: consume the error RX buffer descriptors in a dedicated function We can and should check the RX BD errors before starting to build the skb. The only apparent reason why things are done in this backwards order is to spare one call to enetc_rxbd_next. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-31 14:57:43 -07:00
Ioana Ciornei	bc96781a89	dpaa2-switch: setup learning state on STP state change Depending on what STP state a port is in, the learning on that port should be enabled or disabled. When the STP state is DISABLED, BLOCKING or LISTENING no learning should be happening irrespective of what the bridge previously requested. The learning state is changed to be the one setup by the bridge when the STP state is LEARNING or FORWARDING. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 17:18:26 -07:00
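The resulting learning policy can be written as a small pure function. The BR_STATE_* values mirror the kernel's <linux/if_bridge.h>; the function name is hypothetical and this is not the driver's code.

```c
#include <stdbool.h>

/* Bridge STP state values, as defined in <linux/if_bridge.h>. */
enum {
	BR_STATE_DISABLED   = 0,
	BR_STATE_LISTENING  = 1,
	BR_STATE_LEARNING   = 2,
	BR_STATE_FORWARDING = 3,
	BR_STATE_BLOCKING   = 4,
};

/*
 * Effective HW learning for a port: the bridge's BR_LEARNING request only
 * applies in LEARNING or FORWARDING; in every other state learning is off.
 */
static bool port_learning_enabled(int stp_state, bool bridge_wants_learning)
{
	switch (stp_state) {
	case BR_STATE_LEARNING:
	case BR_STATE_FORWARDING:
		return bridge_wants_learning;
	default:  /* DISABLED, BLOCKING, LISTENING */
		return false;
	}
}
```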
Ioana Ciornei	1a64ed129c	dpaa2-switch: trap STP frames to the CPU Add an ACL entry in each port's ACL table to redirect any frame that has the destination MAC address equal to the STP dmac to the control interface. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 17:18:26 -07:00
Ioana Ciornei	62734c7405	dpaa2-switch: keep track of the current learning state per port Keep track of the current learning state per port so that we can reference it in the next patches when setting up a STP state. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 17:18:26 -07:00
Ioana Ciornei	90f0710235	dpaa2-switch: create and assign an ACL table per port In order to trap frames to the CPU, the DPAA2 switch uses the ACL table. At probe time, create an ACL table for each switch port so that in the next patches we can use this to trap STP frames and redirect them to the control interface. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 17:18:26 -07:00
Ioana Ciornei	6aa6791d1a	dpaa2-switch: fix the translation between the bridge and dpsw STP states The numerical values used for STP states are different between the bridge and the MC ABI therefore, the direct usage of the BR_STATE_* macros directly in the structures passed to the firmware is incorrect. Create a separate function that translates between the bridge STP states and the enum that holds the STP state as seen by the Management Complex. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-30 17:18:26 -07:00
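A sketch of such a translation function. The BR_STATE_* values mirror <linux/if_bridge.h>, but the firmware enum below is hypothetical (not the real MC ABI) and deliberately numbered differently, to show why passing BR_STATE_* values through unchanged goes wrong.

```c
/* Bridge STP states, values 0..4 as in <linux/if_bridge.h>. */
enum {
	BR_STATE_DISABLED, BR_STATE_LISTENING, BR_STATE_LEARNING,
	BR_STATE_FORWARDING, BR_STATE_BLOCKING,
};

/* Hypothetical firmware-side enum with a different numbering. */
enum fw_stp_state {
	FW_STP_DISABLED   = 0,
	FW_STP_BLOCKING   = 1,
	FW_STP_LISTENING  = 2,
	FW_STP_LEARNING   = 3,
	FW_STP_FORWARDING = 4,
};

/* Translate explicitly instead of reinterpreting the bridge value. */
static enum fw_stp_state br_to_fw_stp_state(unsigned char br_state)
{
	switch (br_state) {
	case BR_STATE_LISTENING:  return FW_STP_LISTENING;
	case BR_STATE_LEARNING:   return FW_STP_LEARNING;
	case BR_STATE_FORWARDING: return FW_STP_FORWARDING;
	case BR_STATE_BLOCKING:   return FW_STP_BLOCKING;
	case BR_STATE_DISABLED:
	default:                  return FW_STP_DISABLED;
	}
}
```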
Claudiu Manoil	bff5b62585	gianfar: Handle error code at MAC address change Handle the error code returned by eth_mac_addr(). Fixes: `3d23a05c75` ("gianfar: Enable changing mac addr when if up") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-29 13:45:41 -07:00
David S. Miller	241949e488	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-03-24 The following pull-request contains BPF updates for your net-next tree. We've added 37 non-merge commits during the last 15 day(s) which contain a total of 65 files changed, 3200 insertions(+), 738 deletions(-). The main changes are: 1) Static linking of multiple BPF ELF files, from Andrii. 2) Move drop error path to devmap for XDP_REDIRECT, from Lorenzo. 3) Spelling fixes from various folks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:30:46 -07:00
Vladimir Oltean	e366a39208	net: enetc: don't depend on system endianness in enetc_set_mac_ht_flt When enetc runs out of exact match entries for unicast address filtering, it switches to an approach based on hash tables, where multiple MAC addresses might end up in the same bucket. However, the enetc_set_mac_ht_flt function currently depends on the system endianness, because it interprets the 64-bit hash value as an array of two u32 elements. Modify this to use lower_32_bits and upper_32_bits. Tested by forcing enetc to go into hash table mode by creating two macvlan upper interfaces: ip link add link eno0 address 00:01:02:03:00:00 eno0.0 type macvlan && ip link set eno0.0 up ip link add link eno0 address 00:01:02:03:00:01 eno0.1 type macvlan && ip link set eno0.1 up and verified that the same bit values are written to the registers before and after: enetc_sync_mac_filters: addr 00:00:80:00:40:10 exact match 0 enetc_sync_mac_filters: addr 00:00:00:00:80:00 exact match 0 enetc_set_mac_ht_flt: hash 0x80008000000000 UMHFR0 0x0 UMHFR1 0x800080 Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 16:28:59 -07:00
Vladimir Oltean	110eccdb24	net: enetc: don't depend on system endianness in enetc_set_vlan_ht_filter ENETC has a 64-entry hash table for VLAN RX filtering per Station Interface, which is accessed through two 32-bit registers: VHFR0 holding the low portion, and VHFR1 holding the high portion. The enetc_set_vlan_ht_filter function looks at the pf->vlan_ht_filter bitmap, which is fundamentally an unsigned long variable, and casts it to a u32 array of two elements. It puts the first u32 element into VHFR0 and the second u32 element into VHFR1. It is easy to imagine that this will not work on big endian systems (although, yes, we have bigger problems, because currently enetc assumes that the CPU endianness is equal to the controller endianness, aka little endian - but let's assume that we could add a cpu_to_le32 in enetc_wd_reg and a le32_to_cpu in enetc_rd_reg). Let's use lower_32_bits and upper_32_bits which are designed to work regardless of endianness. Tested that both the old and the new method produce the same results:
$ ethtool -K eth1 rx-vlan-filter on
$ ip link add link eth1 name eth1.100 type vlan id 100
enetc_set_vlan_ht_filter: method 1: si_idx 0 VHFR0 0x0 VHFR1 0x20
enetc_set_vlan_ht_filter: method 2: si_idx 0 VHFR0 0x0 VHFR1 0x20
$ ip link add link eth1 name eth1.101 type vlan id 101
enetc_set_vlan_ht_filter: method 1: si_idx 0 VHFR0 0x0 VHFR1 0x30
enetc_set_vlan_ht_filter: method 2: si_idx 0 VHFR0 0x0 VHFR1 0x30
$ ip link add link eth1 name eth1.34 type vlan id 34
enetc_set_vlan_ht_filter: method 1: si_idx 0 VHFR0 0x0 VHFR1 0x34
enetc_set_vlan_ht_filter: method 2: si_idx 0 VHFR0 0x0 VHFR1 0x34
$ ip link add link eth1 name eth1.1024 type vlan id 1024
enetc_set_vlan_ht_filter: method 1: si_idx 0 VHFR0 0x1 VHFR1 0x34
enetc_set_vlan_ht_filter: method 2: si_idx 0 VHFR0 0x1 VHFR1 0x34
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 16:28:59 -07:00
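Both endianness fixes rely on the same idea, sketched here in portable C with local stand-ins for lower_32_bits()/upper_32_bits(): shifting and truncating operate on the value rather than on its memory layout, so the result does not depend on CPU byte order, unlike reinterpreting the 64-bit value as a u32[2] array. The function names are hypothetical; the test values are taken from the register dumps quoted in the two commits above.

```c
#include <stdint.h>

/* Same arithmetic as the kernel's lower_32_bits()/upper_32_bits(). */
static inline uint32_t lo32(uint64_t v) { return (uint32_t)v; }
static inline uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }

/* Split a 64-bit hash bitmap into the low and high 32-bit register values. */
static void split_hash(uint64_t hash, uint32_t *reg_lo, uint32_t *reg_hi)
{
	*reg_lo = lo32(hash);
	*reg_hi = hi32(hash);
}
```

For example, the MAC hash 0x80008000000000 splits into a low word of 0x0 and a high word of 0x800080, matching the UMHFR0/UMHFR1 values logged above.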
Ioana Ciornei	b175dfd7e6	dpaa2-switch: mark skbs with offload_fwd_mark If a switch port is under a bridge, the offload_fwd_mark should be setup before sending the skb towards the stack so that the bridge does not try to flood the packet on the other switch ports. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 16:37:45 -07:00
Ioana Ciornei	6253d5e39c	dpaa2-switch: add support for configuring per port unknown flooding Add support for configuring per port unknown flooding by accepting both BR_FLOOD and BR_MCAST_FLOOD as offloadable bridge port flags. The DPAA2 switch does not support at the moment configuration of unknown multicast flooding independently of unknown unicast flooding, therefore check that both BR_FLOOD and BR_MCAST_FLOOD have the same state. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 16:37:45 -07:00
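One way to express this constraint as a validation check, sketched with stand-in flag bits: HYP_BR_FLOOD and HYP_BR_MCAST_FLOOD are hypothetical and do not carry the kernel's BR_FLOOD/BR_MCAST_FLOOD values, and the function name is illustrative.

```c
#include <stdbool.h>

#define HYP_BR_FLOOD        (1u << 0)   /* stand-in for BR_FLOOD */
#define HYP_BR_MCAST_FLOOD  (1u << 1)   /* stand-in for BR_MCAST_FLOOD */

/*
 * Unknown unicast and unknown multicast flooding can only be configured
 * together, so a change touching either flag must leave both equal.
 */
static bool flood_flags_supported(unsigned int mask, unsigned int val)
{
	if (mask & (HYP_BR_FLOOD | HYP_BR_MCAST_FLOOD)) {
		bool unicast = val & HYP_BR_FLOOD;
		bool multicast = val & HYP_BR_MCAST_FLOOD;

		return unicast == multicast;
	}
	return true;  /* neither flag is being changed */
}
```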
Ioana Ciornei	b54eb093f5	dpaa2-switch: add support for configuring per port broadcast flooding The BR_BCAST_FLOOD bridge port flag is now accepted by the driver and a change in its state will determine a reconfiguration of the broadcast egress flooding list on the FDB associated with the port. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 16:37:45 -07:00
Ioana Ciornei	1e7cbabfdb	dpaa2-switch: add support for configuring learning state per port Add support for configuring the learning state of a switch port. When the user requests the HW learning to be disabled, a fast-age procedure on that specific port is run so that previously learnt addresses do not linger. At device probe as well as on a bridge leave action, the ports are configured with HW learning disabled since they are basically a standalone port. At the same time, at bridge join we inherit the bridge port BR_LEARNING flag state and configure it on the switch port. There were already some MC firmware ABI functions for changing the learning state, but those were per FDB (bridging domain) and not per port so we need to adjust those to use the new MC fw command which is per port. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 16:37:44 -07:00
Ioana Ciornei	f054e3e217	dpaa2-switch: refactor the egress flooding domain setup Extract the code that determines the list of egress flood interfaces for a specific flood type into a new function - dpaa2_switch_fdb_get_flood_cfg(). This will help us to not duplicate code when the broadcast and unknown ucast/mcast flooding domains will be individually configurable. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 16:37:44 -07:00
Ioana Ciornei	c7e856c859	dpaa2-switch: move the dpaa2_switch_fdb_set_egress_flood function In order to avoid a forward declaration in the next patches, move the dpaa2_switch_fdb_set_egress_flood() function to the top of the file. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 16:37:44 -07:00
Vladimir Oltean	c54f042dcc	net: enetc: teardown CBDR during PF/VF unbind Michael reports that after the blamed patch, unbinding a VF would cause these transactions to remain pending, and trigger some warnings with the DMA API debug: $ echo 1 > /sys/bus/pci/devices/0000\:00\:00.0/sriov_numvfs pci 0000:00:01.0: [1957:ef00] type 00 class 0x020001 fsl_enetc_vf 0000:00:01.0: Adding to iommu group 19 fsl_enetc_vf 0000:00:01.0: enabling device (0000 -> 0002) fsl_enetc_vf 0000:00:01.0 eno0vf0: renamed from eth0 $ echo 0 > /sys/bus/pci/devices/0000\:00\:00.0/sriov_numvfs DMA-API: pci 0000:00:01.0: device driver has pending DMA allocations while released from device [count=1] One of leaked entries details: [size=2048 bytes] [mapped with DMA_BIDIRECTIONAL] [mapped as coherent] WARNING: CPU: 0 PID: 2547 at kernel/dma/debug.c:853 dma_debug_device_change+0x174/0x1c8 (...) Call trace: dma_debug_device_change+0x174/0x1c8 blocking_notifier_call_chain+0x74/0xa8 device_release_driver_internal+0x18c/0x1f0 device_release_driver+0x20/0x30 pci_stop_bus_device+0x8c/0xe8 pci_stop_and_remove_bus_device+0x20/0x38 pci_iov_remove_virtfn+0xb8/0x128 sriov_disable+0x3c/0x110 pci_disable_sriov+0x24/0x30 enetc_sriov_configure+0x4c/0x108 sriov_numvfs_store+0x11c/0x198 (...) DMA-API: Mapped at: dma_entry_alloc+0xa4/0x130 debug_dma_alloc_coherent+0xbc/0x138 dma_alloc_attrs+0xa4/0x108 enetc_setup_cbdr+0x4c/0x1d0 enetc_vf_probe+0x11c/0x250 pci 0000:00:01.0: Removing from iommu group 19 This happens because stupid me moved enetc_teardown_cbdr outside of enetc_free_si_resources, but did not bother to keep calling enetc_teardown_cbdr from all the places where enetc_free_si_resources was called. In particular, now it is no longer called from the main unbind function, just from the probe error path. 
Fixes: `4b47c0b81f` ("net: enetc: don't initialize unused ports from a separate code path") Reported-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-19 12:13:11 -07:00
Lorenzo Bianconi	fdc13979f9	bpf, devmap: Move drop error path to devmap for XDP_REDIRECT We want to change the current ndo_xdp_xmit drop semantics because it will allow us to implement better queue overflow handling. This is working towards the larger goal of an XDP TX queue-hook. Move XDP_REDIRECT error path handling from each XDP ethernet driver to devmap code. According to the new APIs, the driver implementing ndo_xdp_xmit will break the TX loop whenever the hw reports a TX error and will just return to the devmap caller the number of successfully transmitted frames. It will be devmap's responsibility to free dropped frames. Move each XDP ndo_xdp_xmit capable driver to the new APIs: - veth - virtio-net - mvneta - mvpp2 - socionext - amazon ena - bnxt - freescale (dpaa2, dpaa) - xen-frontend - qede - ice - igb - ixgbe - i40e - mlx5 - ti (cpsw, cpsw-new) - tun - sfc Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Reviewed-by: Camelia Groza <camelia.groza@nxp.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Shay Agroskin <shayagr@amazon.com> Link: https://lore.kernel.org/bpf/ed670de24f951cfd77590decf0229a0ad7fd12f6.1615201152.git.lorenzo@kernel.org	2021-03-18 16:38:51 +01:00
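A minimal model of the new contract, assuming integers stand in for struct xdp_frame pointers and a full ring stands in for a hardware TX error: the driver side stops at the first failure and returns the number of frames it queued, and the caller treats the rest as its own to free. Function names are hypothetical.

```c
/* Driver side: queue frames until the first failure; return the count sent. */
static int model_xdp_xmit(const int *frames, int n, int *ring,
			  int *ring_used, int ring_cap)
{
	int sent;

	for (sent = 0; sent < n; sent++) {
		if (*ring_used == ring_cap)
			break;   /* TX error: stop here, do not free anything */
		ring[(*ring_used)++] = frames[sent];
	}
	return sent;
}

/* Caller side (devmap's role): frames[sent..n-1] are dropped by the caller. */
static int caller_flush(const int *frames, int n, int *ring, int *ring_used,
			int ring_cap, int *dropped)
{
	int sent = model_xdp_xmit(frames, n, ring, ring_used, ring_cap);

	*dropped = n - sent;
	return sent;
}
```

Keeping the free in a single place means drivers no longer need their own drop loops, which is what makes smarter overflow handling in the caller possible later.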
Ioana Ciornei	4fe72de61e	dpaa2-eth: fixup kdoc warnings Running kernel-doc over the dpaa2-eth driver generates a bunch of warnings. Fix them up by removing code comments for macros which are self-explanatory, respecting the kdoc format for macro documentation and other small changes like describing the expected return values of functions. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-16 15:29:49 -07:00
Ioana Ciornei	5ac2d25438	dpaa2-switch: fit the function declaration on the same line Multiple ABI function declarations are unnecessarily split across multiple lines. Fix this so that we have a consistent coding style. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-16 15:29:49 -07:00
Ioana Ciornei	2b7e3f7d1b	dpaa2-switch: reduce the size of the if_id bitmap to 64 bits The maximum number of DPAA2 switch interfaces, including the control interface, is 64. Even though this restriction has existed from the start, the command structures which use an interface id bitmap were poorly described: although a single uint64_t is enough, all of them used an array of 4 uint64_t's. Fix this by reducing the size of the interface id field to a single uint64_t. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-16 15:29:49 -07:00
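Since at most 64 interfaces exist, a single 64-bit word can serve as the whole interface-id bitmap. A minimal sketch; MODEL_MAX_IF, if_bitmap_set() and if_bitmap_test() are hypothetical helpers, and the byte-order conversion a firmware command field would need is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_IF 64  /* interfaces, control interface included */

/* Hypothetical helpers: one 64-bit word is a complete if_id bitmap. */
static void if_bitmap_set(uint64_t *map, unsigned int if_id)
{
	assert(if_id < MODEL_MAX_IF);
	*map |= UINT64_C(1) << if_id;
}

static bool if_bitmap_test(uint64_t map, unsigned int if_id)
{
	assert(if_id < MODEL_MAX_IF);
	return (map >> if_id) & 1;
}
```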
Ioana Ciornei	05b363608b	dpaa2-switch: fix kdoc warnings Running kernel-doc over the dpaa2-switch driver generates a bunch of warnings. Fix them up by removing code comments for macros which are self-explanatory and adding a bit more context for the dpsw_if_get_port_mac_addr() function and the fields of the dpsw_vlan_if_cfg structure. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-16 15:29:48 -07:00
Ioana Ciornei	cba0445633	dpaa2-switch: remove unused ABI functions Clean up the dpaa2-switch driver a bit by removing any unused MC firmware ABI definitions. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-16 15:29:48 -07:00
Baowen Zheng	6a56e19902	flow_offload: reject configuration of packet-per-second policing in offload drivers A follow-up patch will allow users to configure packet-per-second policing in the software datapath. In preparation for this, teach all drivers that support offload of the policer action to reject such configuration, as currently none of them support it. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Louis Peens <louis.peens@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-13 14:18:09 -08:00
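The kind of check each offload driver gains can be sketched as below. struct model_police and model_offload_police() are hypothetical; the field names only loosely follow the kernel's flow_offload police entry, and this is not that struct. The point is that a byte-rate-only driver refuses a packet-rate request with -EOPNOTSUPP instead of silently installing a policer with the wrong meaning.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down police action. */
struct model_police {
	uint64_t rate_bytes_ps;  /* byte-rate limit, 0 if unused */
	uint64_t rate_pkt_ps;    /* packet-rate limit, 0 if unused */
};

/* Offload hook of a driver whose hardware can only police by bytes:
 * refuse packet-per-second policing instead of misprogramming it. */
static int model_offload_police(const struct model_police *p)
{
	assert(p != NULL);
	if (p->rate_pkt_ps)
		return -EOPNOTSUPP;
	/* byte-rate policing would be programmed into hardware here */
	return 0;
}
```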
Ioana Ciornei	f48298d3fb	staging: dpaa2-switch: move the driver out of staging Now that the dpaa2-switch driver has basic I/O capabilities on the switch port net_devices and multiple bridging domains are supported, move the driver out of staging. The dpaa2-switch driver is placed right next to the dpaa2-eth driver since, in the near future, they will be sharing most of the data path. I didn't implement code reuse in this patch series because I wanted to keep it as small as possible. Also, the README is removed from staging with the intention to add proper rst documentation afterwards to actually match what is supported by the driver. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 13:30:36 -08:00
Vladimir Oltean	7a5222cb7a	net: enetc: make enetc_refill_rx_ring update the consumer index Since commit `fd5736bf9f` ("enetc: Workaround for MDIO register access issue"), enetc_refill_rx_ring no longer updates the RX BD ring's consumer index; that is left to be done by the caller. This has led to bugs such as the ones found in `96a5223b91` ("net: enetc: remove bogus write to SIRXIDR from enetc_setup_rxbdr") and `3a5d12c9be` ("net: enetc: keep RX ring consumer index in sync with hardware"), so it is desirable that we move the update of the consumer index back into enetc_refill_rx_ring. The trouble with that is the different MDIO locking context for the two callers of enetc_refill_rx_ring:
- enetc_clean_rx_ring runs under enetc_lock_mdio()
- enetc_setup_rxbdr runs outside enetc_lock_mdio()
Simplify the callers of enetc_refill_rx_ring by making enetc_setup_rxbdr explicitly take enetc_lock_mdio() around the call. It will be the only place that needs to ensure the hot accessors can be used. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 13:14:15 -08:00
Vladimir Oltean	0486185ee2	net: enetc: remove forward declaration for enetc_map_tx_buffs There is no reason for this forward declaration to exist other than poor ordering of the functions. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 13:14:15 -08:00
Vladimir Oltean	8580b3c3d7	net: enetc: remove forward-declarations of enetc_clean_{rx,tx}_ring This patch moves the NAPI enetc_poll after enetc_clean_rx_ring such that we can delete the forward declarations. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 13:14:15 -08:00
Vladimir Oltean	7f071a450b	net: enetc: use enum enetc_active_offloads The active_offloads variable of enetc_ndev_priv has an enum type, use it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 13:14:15 -08:00
Vladimir Oltean	c027aa9201	net: enetc: simplify callers of enetc_rxbd_next When we iterate through the BDs in the RX ring, the software producer index (which is already passed by value to enetc_rxbd_next) lags behind, and we end up with this funny-looking "++i == rx_ring->bd_count" check so that we drag it after us. Let's pass the software producer index "i" by reference, so that enetc_rxbd_next can increment it by itself (mod rx_ring->bd_count), especially since enetc_rxbd_next has to increment the index anyway. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 13:14:15 -08:00
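The by-reference index update can be sketched like this. struct model_ring and model_rxbd_next() are hypothetical; this model tracks only the software index and leaves out the BD pointer that the real helper also deals with.

```c
#include <assert.h>

/* Hypothetical stand-in for an RX BD ring; only the size matters here. */
struct model_ring {
	int bd_count;
};

/* Advance the caller's software index by one BD, wrapping at bd_count.
 * The index is updated through the pointer, so callers no longer need
 * a separate "++i == bd_count" check. */
static void model_rxbd_next(const struct model_ring *ring, int *i)
{
	assert(*i >= 0 && *i < ring->bd_count);
	(*i)++;
	if (*i == ring->bd_count)
		*i = 0;
}
```

With bd_count = 512, calling model_rxbd_next() on i = 510 gives 511, and calling it again wraps i to 0.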
Vladimir Oltean	4b47c0b81f	net: enetc: don't initialize unused ports from a separate code path Since commit `3222b5b613` ("net: enetc: initialize RFS/RSS memories for unused ports too") there is a requirement to initialize the memories of unused PFs too, which has left the probe path in a bit of a rough shape, because we basically have a minimal initialization path for unused PFs which is separate from the main initialization path. Now that initializing a control BD ring is as simple as calling enetc_setup_cbdr, let's move that outside of enetc_alloc_si_resources (unused PFs don't need classification rules, so no point in allocating them just to free them later). But enetc_alloc_si_resources is called both for PFs and for VFs, so now that enetc_setup_cbdr is no longer called from this common function, it means that the VF probe path needs to explicitly call enetc_setup_cbdr too. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 13:14:15 -08:00
Vladimir Oltean	5b4daa7f12	net: enetc: pass bd_count as an argument to enetc_setup_cbdr It makes no sense from an API perspective to first initialize some portion of struct enetc_cbdr outside enetc_setup_cbdr, then leave that function to initialize the rest. enetc_setup_cbdr should be able to perform all initialization given a zero-initialized struct enetc_cbdr. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 13:14:15 -08:00
Vladimir Oltean	0bfde022b3	net: enetc: squash clear_cbdr and free_cbdr into teardown_cbdr All call sites call enetc_clear_cbdr and enetc_free_cbdr one after another, so let's combine the two functions into a single method named enetc_teardown_cbdr which does both, and in the same order. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 13:14:15 -08:00
Vladimir Oltean	27f9025d49	net: enetc: save the mode register address inside struct enetc_cbdr enetc_clear_cbdr depends on struct enetc_hw because it must disable the ring through a register write. We'd like to remove that dependency, so let's do what's already done with the producer and consumer indices, which is to save the iomem address in a variable kept in struct enetc_cbdr. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 13:14:15 -08:00
Vladimir Oltean	24be14e326	net: enetc: squash enetc_alloc_cbdr and enetc_setup_cbdr enetc_alloc_cbdr and enetc_setup_cbdr are always called one after another, so we can simplify the callers and make enetc_setup_cbdr do everything that's needed. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 13:14:15 -08:00
Vladimir Oltean	01121ab739	net: enetc: save the DMA device for enetc_free_cbdr We shouldn't need to pass the struct device *dev to enetc CBDR APIs over and over again, so save this inside struct enetc_cbdr::dma_dev and avoid passing it to the enetc_free_cbdr functions. This breaks the dependency of the cbdr API on struct enetc_si (the station interface). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 13:14:15 -08:00
Vladimir Oltean	176769d10f	net: enetc: move the CBDR API to enetc_cbdr.c Since there is a dedicated file in this driver for interacting with control BD rings, it makes sense to move these functions there. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 13:14:14 -08:00
Vladimir Oltean	847cbfc014	net: add a helper to avoid issues with HW TX timestamping and SO_TXTIME As explained in commit `29d98f54a4` ("net: enetc: allow hardware timestamping on TX queues with tc-etf enabled"), hardware TX timestamping requires an skb with skb->tstamp = 0. When a packet is sent with SO_TXTIME, the skb->skb_mstamp_ns corrupts the value of skb->tstamp, so the drivers need to explicitly reset skb->tstamp to zero after consuming the TX time. Create a helper named skb_txtime_consumed() which does just that. All drivers which offload TC_SETUP_QDISC_ETF should call it, and it would make it easier to assess during review whether they do the right thing in order to be compatible with hardware timestamping or not. Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 12:45:16 -08:00
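A user-space model of what the helper does, assuming a cut-down struct in which the launch time and the software timestamp share storage through a union, as described for sk_buff. struct model_skb, model_set_txtime() and model_txtime_consumed() are illustrative names, not kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Cut-down, hypothetical model of the two sk_buff fields involved:
 * the SO_TXTIME launch time and the software timestamp share storage. */
struct model_skb {
	union {
		int64_t tstamp;          /* software timestamp */
		uint64_t skb_mstamp_ns;  /* launch time for SO_TXTIME */
	};
};

/* What SO_TXTIME effectively does on transmit: store the launch time. */
static void model_set_txtime(struct model_skb *skb, uint64_t launch_ns)
{
	assert(launch_ns != 0);
	skb->skb_mstamp_ns = launch_ns;
}

/* Same idea as skb_txtime_consumed(): once the driver has handed the
 * launch time to hardware, clear the shared field so that timestamp
 * delivery later sees tstamp == 0. */
static void model_txtime_consumed(struct model_skb *skb)
{
	skb->tstamp = 0;
}
```

Right after model_set_txtime(), tstamp reads back as nonzero even though no software timestamp was ever taken; after model_txtime_consumed() both views of the field are zero.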
Vladimir Oltean	29d98f54a4	net: enetc: allow hardware timestamping on TX queues with tc-etf enabled The txtime is passed to the driver in skb->skb_mstamp_ns, which is actually in a union with skb->tstamp (the place where software timestamps are kept). Since commit `b50a5c70ff` ("net: allow simultaneous SW and HW transmit timestamping"), __sock_recv_timestamp has some logic for making sure that the two calls to skb_tstamp_tx:
skb_tx_timestamp(skb) # Software timestamp in the driver
-> skb_tstamp_tx(skb, NULL)
and
skb_tstamp_tx(skb, &shhwtstamps) # Hardware timestamp in the driver
will both do the right thing and in a race-free manner, meaning that skb_tx_timestamp will deliver a cmsg with the software timestamp only, and skb_tstamp_tx with a non-NULL hwtstamps argument will deliver a cmsg with the hardware timestamp only. Why are races even possible? Well, because although the software timestamp skb->tstamp is private per skb, the hardware timestamp skb_hwtstamps(skb) lives in skb_shinfo(skb), an area which is shared between skbs and their clones. And skb_tstamp_tx works by cloning the packets when timestamping them, therefore attempting to perform hardware timestamping on an skb's clone will also change the hardware timestamp of the original skb. And the original skb might have been yet again cloned for software timestamping, at an earlier stage. So the logic in __sock_recv_timestamp can't be as simple as saying "does this skb have a hardware timestamp? if yes I'll send the hardware timestamp to the socket, otherwise I'll send the software timestamp", precisely because the hardware timestamp is shared. Instead, it's quite the other way around: __sock_recv_timestamp says "does this skb have a software timestamp? if yes, I'll send the software timestamp, otherwise the hardware one". This works because the software timestamp is not shared with clones. But that means we have a problem when we attempt hardware timestamping with skbs whose skb->tstamp is not zero.
__sock_recv_timestamp will say "oh, yeah, this must be some sort of odd clone" and will not deliver the hardware timestamp to the socket. And this is exactly what is happening when we have txtime enabled on the socket: as mentioned, that is put in a union with skb->tstamp, so it is quite easy to mistake it. Do what other drivers do (intel igb/igc) and write zero to skb->tstamp before taking the hardware timestamp. It's of no use to us now (we're already on the TX confirmation path). Fixes: `0d08c9ec7d` ("enetc: add support time specific departure base on the qos etf") Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-08 12:03:42 -08:00
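The receive-side rule described in this message can be modeled in a few lines. This is a simplified, hypothetical sketch of the rule as stated here (software timestamp wins, hardware timestamp only if skb->tstamp is zero), not the actual __sock_recv_timestamp() code; timestamps are plain integers and model_pick_timestamp() is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

enum model_ts_kind { TS_NONE, TS_SOFTWARE, TS_HARDWARE };

/* Simplified model of the rule described above: a nonzero software
 * timestamp (skb->tstamp) wins, because it is private to each skb; the
 * hardware timestamp, shared with clones, is reported only when the
 * software one is zero. */
static enum model_ts_kind model_pick_timestamp(int64_t sw_tstamp,
					       int64_t hw_tstamp)
{
	if (sw_tstamp != 0)
		return TS_SOFTWARE;
	if (hw_tstamp != 0)
		return TS_HARDWARE;
	return TS_NONE;
}
```

Passing a leftover launch time as sw_tstamp shows the bug: the function answers TS_SOFTWARE and the hardware timestamp is never reported. Zeroing it first, as the fix does, yields TS_HARDWARE.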
Alex Marginean	1b2395dfff	net: enetc: set MAC RX FIFO to recommended value On LS1028A, the MAC RX FIFO defaults to the value 2, which is too high and may lead to RX lock-up under traffic at a rate higher than 6 Gbps. Set it to 1 instead, as recommended by the hardware design team and by later versions of the ENETC block guide. Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Jason Liu <jason.hui.liu@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-08 12:03:42 -08:00
Michael Braun	d8861bab48	gianfar: fix jumbo packets+napi+rx overrun crash When using jumbo packets and overrunning the rx queue with napi enabled, the following sequence is observed in gfar_add_rx_frag:
   | lstatus                              |       | skb                   |
t  | lstatus, size, flags                 | first | len, data_len, *ptr   |
---+--------------------------------------+-------+-----------------------+
13 | 18002348, 9032, INTERRUPT LAST       | 0     | 9600, 8000, f554c12e  |
12 | 10000640, 1600, INTERRUPT            | 0     | 8000, 6400, f554c12e  |
11 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800, f554c12e  |
10 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200, f554c12e  |
09 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600, f554c12e  |
08 | 14000640, 1600, INTERRUPT FIRST      | 0     | 1600, 0, f554c12e     |
07 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0, 0, f554c12e        |
06 | 1c000080, 128, INTERRUPT LAST FIRST  | 1     | 0, 0, abf3bd6e        |
05 | 18002348, 9032, INTERRUPT LAST       | 0     | 8000, 6400, c5a57780  |
04 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800, c5a57780  |
03 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200, c5a57780  |
02 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600, c5a57780  |
01 | 10000640, 1600, INTERRUPT            | 0     | 1600, 0, c5a57780     |
00 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0, 0, c5a57780        |
So at t=7 a new packet is started but not finished, probably due to an rx overrun, but the rx overrun is not indicated in the flags. Instead, a new packet starts at t=8. This causes skb->len to exceed the size reported for the LAST fragment at t=13, and thus a negative fragment size is added to the skb. This then crashes:
kernel BUG at include/linux/skbuff.h:2277!
Oops: Exception in kernel mode, sig: 5 [#1]
...
NIP [c04689f4] skb_pull+0x2c/0x48
LR [c03f62ac] gfar_clean_rx_ring+0x2e4/0x844
Call Trace:
[ec4bfd38] [c06a84c4] _raw_spin_unlock_irqrestore+0x60/0x7c (unreliable)
[ec4bfda8] [c03f6a44] gfar_poll_rx_sq+0x48/0xe4
[ec4bfdc8] [c048d504] __napi_poll+0x54/0x26c
[ec4bfdf8] [c048d908] net_rx_action+0x138/0x2c0
[ec4bfe68] [c06a8f34] __do_softirq+0x3a4/0x4fc
[ec4bfed8] [c0040150] run_ksoftirqd+0x58/0x70
[ec4bfee8] [c0066ecc] smpboot_thread_fn+0x184/0x1cc
[ec4bff08] [c0062718] kthread+0x140/0x144
[ec4bff38] [c0012350] ret_from_kernel_thread+0x14/0x1c
This patch fixes this by checking the computed LAST fragment size, so a negative-sized fragment is never added. In order to prevent the newer rx frame from getting corrupted, the FIRST flag is checked to discard the incomplete older frame. Signed-off-by: Michael Braun <michael-dev@fami-braun.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-05 13:13:32 -08:00
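The LAST-fragment check can be sketched as follows. model_last_frag_size() is a hypothetical user-space helper, not the gianfar code. It assumes, as the table in this message shows, that the LAST descriptor reports the length of the whole frame, so the fragment length is that total minus the bytes already accumulated in the skb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the LAST-fragment length check. bd_len is the
 * length reported by the LAST descriptor (the whole frame), skb_len is
 * what has already been accumulated in the skb. */
static bool model_last_frag_size(int bd_len, int skb_len, int *frag_size)
{
	int size = bd_len - skb_len;

	assert(frag_size != NULL);
	if (size < 0)
		return false;  /* accumulated data is not this frame: drop it */
	*frag_size = size;
	return true;
}
```

With the numbers from the table, t=5 gives 9032 - 8000 = 1032 bytes, while t=13 gives 9032 - 9600 = -568, which is rejected instead of being added to the skb.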
Vladimir Oltean	3a5d12c9be	net: enetc: keep RX ring consumer index in sync with hardware The RX rings have a producer index owned by hardware, where newly received frame buffers are placed, and a consumer index owned by software, where newly allocated buffers are placed, in expectation of hardware being able to place frame data in them. Hardware increments the producer index when a frame is received, however it is not allowed to increment the producer index to match the consumer index (RBCIR) since the ring can hold at most RBLENR[LENGTH]-1 received BDs. Whenever the producer index matches the value of the consumer index, the ring has no unprocessed received frames and all BDs in the ring have been initialized/prepared by software, i.e. hardware owns all BDs in the ring. The code uses the next_to_clean variable to keep track of the producer index, and the next_to_use variable to keep track of the consumer index. The RX rings are seeded from enetc_refill_rx_ring, which is called from two places: 1. initially the ring is seeded until full with enetc_bd_unused(rx_ring), i.e. with 511 buffers. This will make next_to_clean=0 and next_to_use=511: .ndo_open -> enetc_open -> enetc_setup_bdrs -> enetc_setup_rxbdr -> enetc_refill_rx_ring 2. then during the data path processing, it is refilled with 16 buffers at a time: enetc_msix -> napi_schedule -> enetc_poll -> enetc_clean_rx_ring -> enetc_refill_rx_ring There is just one problem: the initial seeding done during .ndo_open updates just the producer index (ENETC_RBPIR) with 0, and the software next_to_clean and next_to_use variables. Notably, it will not update the consumer index to make the hardware aware of the newly added buffers. Wait, what? So how does it work? Well, the reset values of the producer index and of the consumer index of a ring are both zero. 
As per the description in the second paragraph, it means that the ring is full of buffers waiting for hardware to put frames in them, which by coincidence is almost true, because we have in fact seeded 511 buffers into the ring. But will the hardware attempt to access the 512th entry of the ring, which has an invalid BD in it? Well, no, because in order to do that, it would have to first populate the first 511 entries, and the NAPI enetc_poll will kick in by then. Eventually, after 16 processed slots have become available in the RX ring, enetc_clean_rx_ring will call enetc_refill_rx_ring and then will [ finally ] update the consumer index with the new software next_to_use variable. From now on, the next_to_clean and next_to_use variables are in sync with the producer and consumer ring indices. So the day is saved, right? Well, not quite. Freeing the memory allocated for the rings is done in: enetc_close -> enetc_clear_bdrs -> enetc_clear_rxbdr -> this just disables the ring -> enetc_free_rxtx_rings -> enetc_free_rx_ring -> sets next_to_clean and next_to_use to 0 but again, nothing is committed to the hardware producer and consumer indices (yay!). The assumption is that the ring is disabled, so the indices don't matter anyway, and it's the responsibility of the "open" code path to set those up. .. Except that the "open" code path does not set those up properly. While initially, things almost work, during subsequent enetc_close -> enetc_open sequences, we have problems. To be precise, the enetc_open that is subsequent to enetc_close will again refill the ring with 511 entries, but it will leave the consumer index untouched. Untouched means, of course, equal to the value it had before disabling the ring and draining the old buffers in enetc_close. 
But as mentioned, enetc_setup_rxbdr will at least update the producer index, through this line of code:
enetc_rxbdr_wr(hw, idx, ENETC_RBPIR, 0);
so at this stage we'll have:
next_to_clean=0 (in hardware 0)
next_to_use=511 (in hardware we'll have the refill index prior to enetc_close)
Again, next_to_clean and the producer index are in sync and set to correct values, so the driver manages to limp on. Eventually, 16 ring entries will be consumed by enetc_poll, and the savior enetc_clean_rx_ring will come and call enetc_refill_rx_ring, and then update the hardware consumer index based upon the new next_to_use. So... it works? Well, by coincidence, it almost does, but there's a circumstance where enetc_clean_rx_ring won't be there to save us. If the previous value of the consumer index was 15, there's a problem, because the NAPI poll sequence will only issue a refill when 16 or more buffers have been consumed. It's easiest to illustrate this with an example:
ip link set eno0 up
ip addr add 192.168.100.1/24 dev eno0
ping 192.168.100.1 -c 20 # ping this port from another board
ip link set eno0 down
ip link set eno0 up
ping 192.168.100.1 -c 20 # ping it again from the same other board
One by one:
1. ip link set eno0 up
-> calls enetc_setup_rxbdr:
   -> calls enetc_refill_rx_ring(511 buffers)
   -> next_to_clean=0 (in hw 0)
   -> next_to_use=511 (in hw 0)
2. ping 192.168.100.1 -c 20 # ping this port from another board
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=15 next_to_clean 14 (in hw 15) next_to_use 511 (in hw 0)
enetc_clean_rx_ring: enetc_refill_rx_ring(16) increments next_to_use by 16 (mod 512) and writes it to hw
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=0 next_to_clean 15 (in hw 16) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 16 (in hw 17) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 17 (in hw 18) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 18 (in hw 19) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 19 (in hw 20) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 20 (in hw 21) next_to_use 15 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 21 (in hw 22) next_to_use 15 (in hw 15)
20 packets transmitted, 20 packets received, 0% packet loss
3. ip link set eno0 down
enetc_free_rx_ring: next_to_clean 0 (in hw 22), next_to_use 0 (in hw 15)
4. ip link set eno0 up
-> calls enetc_setup_rxbdr:
   -> calls enetc_refill_rx_ring(511 buffers)
   -> next_to_clean=0 (in hw 0)
   -> next_to_use=511 (in hw 15)
5. ping 192.168.100.1 -c 20 # ping it again from the same other board
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 15)
enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 15)
20 packets transmitted, 12 packets received, 40% packet loss
And there it dies. No enetc_refill_rx_ring (because cleaned_cnt must be equal to 15 for that to happen), no nothing. The hardware enters the condition where the producer (14) + 1 is equal to the consumer (15) index, which makes it believe it has no more free buffers to put packets in, so it starts discarding them:
ip netns exec ns0 ethtool -S eno0 | grep -v ': 0'
NIC statistics:
     Rx ring 0 discarded frames: 8
To summarize: if the interface receives between 16 and 32 (mod 512) frames and then there is a link flap, the port will eventually die with no way to recover. If it receives fewer than 16 (mod 512) frames, then the initial NAPI poll [ before the link flap ] will not update the consumer index in hardware (it will remain zero), which will be OK when the buffers are later reinitialized. If more than 32 (mod 512) frames are received, the initial NAPI poll has the chance to refill the ring twice, updating the consumer index to at least 32. So after the link flap, the consumer index is still wrong, but the post-flap NAPI poll gets a chance to refill the ring once (because it passes through cleaned_cnt=15) and brings the consumer index back in sync with next_to_use. The solution to this problem is actually simple: we just need to write next_to_use into the hardware consumer index at enetc_open time, which always brings it back in sync after an initial buffer seeding process.
The simpler thing would be to put the write to the consumer index into enetc_refill_rx_ring directly, but there are issues with the MDIO locking: in the NAPI poll code we have the enetc_lock_mdio() taken from top-level and we use the unlocked enetc_wr_reg_hot, whereas in enetc_open, the enetc_lock_mdio() is not taken at the top level, but instead by each individual enetc_wr_reg, so we are forced to put an additional enetc_wr_reg in enetc_setup_rxbdr. Better organization of the code is left as a refactoring exercise. Fixes: `d4fd0404c1` ("enetc: Introduce basic PF and VF ENETC ethernet drivers") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-01 13:34:47 -08:00
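To make the index bookkeeping in this message concrete, here is a small user-space simulation. It is a hypothetical model, not driver code: RING_SIZE, RX_BUNDLE and every sim_* name are made up for illustration. It assumes, as described above, that hardware may not advance its producer index onto the consumer index, that the NAPI poll refills in bundles of 16 and then writes next_to_use to the consumer index, and that .ndo_open resets the producer index to 0. Setting sync_ci in sim_open() models the fix (writing next_to_use to the consumer index at open time).

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 512  /* BDs per RX ring, as in the text */
#define RX_BUNDLE 16   /* refill granularity of the NAPI poll */

/* Hypothetical model of one RX ring: two hardware indices and their
 * software shadows. Not driver code. */
struct sim_ring {
	int hw_pi;          /* producer index, advanced by "hardware" */
	int hw_ci;          /* consumer index register, written by software */
	int next_to_clean;  /* software copy of the producer index */
	int next_to_use;    /* software copy of the consumer index */
};

/* Unused-BD count: slots between the last refilled BD and the next
 * one to clean. */
static int sim_unused(const struct sim_ring *r)
{
	return (r->next_to_clean - r->next_to_use - 1 + RING_SIZE) % RING_SIZE;
}

/* .ndo_open: seed 511 buffers and reset the producer index to 0.
 * sync_ci = true models the fix: also commit next_to_use to hardware. */
static void sim_open(struct sim_ring *r, bool sync_ci)
{
	r->next_to_clean = 0;
	r->next_to_use = RING_SIZE - 1;
	r->hw_pi = 0;
	if (sync_ci)
		r->hw_ci = r->next_to_use;
	/* otherwise hw_ci keeps its value from before the link flap */
}

/* Hardware: accept a frame unless that would move PI onto CI. */
static bool sim_hw_receive(struct sim_ring *r)
{
	int next = (r->hw_pi + 1) % RING_SIZE;

	if (next == r->hw_ci)
		return false;  /* no free BD: frame is discarded */
	r->hw_pi = next;
	return true;
}

/* NAPI poll: refill in bundles (then update CI), clean up to PI. */
static void sim_poll(struct sim_ring *r)
{
	int cleaned = sim_unused(r);

	for (;;) {
		if (cleaned >= RX_BUNDLE) {
			r->next_to_use = (r->next_to_use + cleaned) % RING_SIZE;
			r->hw_ci = r->next_to_use;
			cleaned = 0;
		}
		if (r->next_to_clean == r->hw_pi)
			break;
		r->next_to_clean = (r->next_to_clean + 1) % RING_SIZE;
		cleaned++;
	}
	assert(r->hw_ci >= 0 && r->hw_ci < RING_SIZE);
}

/* Offer n frames one at a time, polling after each; count accepted. */
static int sim_traffic(struct sim_ring *r, int n)
{
	int accepted = 0;

	for (int i = 0; i < n; i++) {
		if (sim_hw_receive(r))
			accepted++;
		sim_poll(r);
	}
	return accepted;
}
```

Starting from reset values (both indices 0) with sync_ci false, 22 frames (the number of cleaned entries in the first trace above) are all accepted and leave the consumer index at 15. After a second sim_open() the next run stalls after 14 frames, the same point where the trace above stops. With sync_ci true, every frame of both runs is accepted.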
Vladimir Oltean	96a5223b91	net: enetc: remove bogus write to SIRXIDR from enetc_setup_rxbdr The Station Interface Receive Interrupt Detect Register (SIRXIDR) contains a 16-bit wide mask of 'interrupt detected' events for each ring associated with a port. Bit i is write-1-to-clear for RX ring i. I have no explanation whatsoever how this line of code came to be inserted in the blamed commit. I checked the downstream versions of that patch and none of them have it. The somewhat comical aspect of it is that we're writing a binary number to the SIRXIDR register, which is derived from enetc_bd_unused(rx_ring). Since the RX rings have 512 buffer descriptors, we end up writing 511 to this register, which is 0x1ff, so we are effectively clearing the 'interrupt detected' event for rings 0-8. This register is not what is used for interrupt handling, though: it only provides a summary for the entire SI. The hardware provides one separate Interrupt Detect Register per RX ring, which auto-clears upon read. So there doesn't seem to be any adverse effect caused by this bogus write. There is, however, one reason why this should be handled as a bugfix: next_to_clean _should_ be committed to hardware, just not to that register, and this was obscuring the fact that it wasn't. This is fixed in the next patch, and removing the bogus line now allows the fix patch to be backported beyond that point. Fixes: `fd5736bf9f` ("enetc: Workaround for MDIO register access issue") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-01 13:34:47 -08:00
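The arithmetic in this message can be checked with a few lines of C. w1c_rings_touched() is a hypothetical helper, not a driver function: it reports which per-ring bits a value would clear when written to a 16-bit, write-1-to-clear mask register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: for a value written to a 16-bit, per-ring,
 * write-1-to-clear mask register, count the ring bits it clears and
 * report the highest ring index affected (-1 if none). */
static int w1c_rings_touched(uint16_t value, int *highest_ring)
{
	int count = 0;

	assert(highest_ring != NULL);
	*highest_ring = -1;
	for (int ring = 0; ring < 16; ring++) {
		if (value & (1u << ring)) {
			count++;
			*highest_ring = ring;
		}
	}
	return count;
}
```

Writing 511 (0x1ff) touches 9 bits with the highest at index 8, i.e. rings 0-8, matching the description above.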
Vladimir Oltean	c76a97218d	net: enetc: force the RGMII speed and duplex instead of operating in inband mode The ENETC port 0 MAC supports in-band status signaling coming from a PHY when operating in RGMII mode, and this feature is enabled by default. It has been reported that RGMII is broken in fixed-link, and that is not surprising considering the fact that no PHY is attached to the MAC in that case, but a switch. This brings us to the topic of the patch: the enetc driver should not have enabled the optional in-band status signaling for RGMII unconditionally, but should have forced the speed and duplex to what was resolved by phylink. Note that phylink does not accept the RGMII modes as valid for in-band signaling, and these operate a bit differently than 1000base-x and SGMII (notably, there is no clause 37 state machine, so no ACK is required from the MAC; instead, the PHY sends extra code words on RXD[3:0] whenever it is not transmitting something else, so it should be safe to leave a PHY with this option unconditionally enabled even if we ignore it). The spec talks about this here: https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/138/RGMIIv1_5F00_3.pdf Fixes: `71b77a7a27` ("enetc: Migrate to PHYLINK and PCS_LYNX") Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-01 13:34:47 -08:00
Vladimir Oltean	a74dbce9d4	net: enetc: don't disable VLAN filtering in IFF_PROMISC mode Quoting from the blamed commit: In promiscuous mode, it is more intuitive that all traffic is received, including VLAN tagged traffic. It appears that it is necessary to set the flag in PSIPVMR for that to be the case, so VLAN promiscuous mode is also temporarily enabled. On exit from promiscuous mode, the setting made by ethtool is restored. Intuitive or not, there isn't any definition issued by a standards body which says that promiscuity has anything to do with VLAN filtering - it only has to do with accepting packets regardless of destination MAC address. In fact people are already trying to use this misunderstanding/bug of the enetc driver as a justification to transform promiscuity into something it never was about: accepting every packet (maybe that would be the "rx-all" netdev feature?): https://lore.kernel.org/netdev/20201110153958.ci5ekor3o2ekg3ky@ipetronik.com/ This is relevant because there are use cases in the kernel (such as tc-flower rules with the protocol 802.1Q and a vlan_id key) which do not (yet) use the vlan_vid_add API to be compatible with VLAN-filtering NICs such as enetc, so for those, disabling rx-vlan-filter is currently the only right solution to make these setups work: https://lore.kernel.org/netdev/CA+h21hoxwRdhq4y+w8Kwgm74d4cA0xLeiHTrmT-VpSaM7obhkg@mail.gmail.com/ The blamed patch has unintentionally introduced one more way for this to work, which is to enable IFF_PROMISC, however this is non-portable because port promiscuity is not meant to disable VLAN filtering. Therefore, it could invite people to write broken scripts for enetc, and then wonder why they are broken when migrating to other drivers that don't handle promiscuity in the same way. 
Fixes: `7070eea5e9` ("enetc: permit configuration of rx-vlan-filter with ethtool") Cc: Markus Blöchl <Markus.Bloechl@ipetronik.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-01 13:34:47 -08:00
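The distinction the commit draws can be sketched as a receive-filter decision in which promiscuity only relaxes the destination MAC check, while VLAN filtering remains a separate, independent check. The `struct rx_filter` model and `rx_accept()` are hypothetical illustrations, not enetc code; untagged frames are simply treated as always passing the VLAN check here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-port receive filter state, for illustration only. */
struct rx_filter {
	bool promisc;          /* IFF_PROMISC: accept any destination MAC */
	bool vlan_filtering;   /* rx-vlan-filter enabled */
	uint8_t own_mac[6];
	const uint16_t *vids;  /* VIDs registered, e.g. through vlan_vid_add() */
	size_t n_vids;
};

static bool vid_allowed(const struct rx_filter *f, uint16_t vid)
{
	for (size_t i = 0; i < f->n_vids; i++)
		if (f->vids[i] == vid)
			return true;
	return false;
}

/* Accept only if both independent checks pass. */
static bool rx_accept(const struct rx_filter *f, const uint8_t dmac[6],
		      bool has_tag, uint16_t vid)
{
	bool mac_ok = f->promisc || memcmp(dmac, f->own_mac, 6) == 0;
	bool vlan_ok = !has_tag || !f->vlan_filtering || vid_allowed(f, vid);

	assert(vid < 4096); /* 12-bit VLAN ID */
	return mac_ok && vlan_ok;
}
```

Under this model, a promiscuous port with rx-vlan-filter on still drops a frame tagged with an unregistered VID; only turning rx-vlan-filter off lets it through, which is the portable behavior the commit argues for.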
Vladimir Oltean	827b6fd046	net: enetc: fix incorrect TPID when receiving 802.1ad tagged packets When the enetc ports have rx-vlan-offload enabled, they report a TPID of ETH_P_8021Q regardless of what was actually in the packet. When rx-vlan-offload is disabled, packets have the proper TPID. Fix this inconsistency by finishing the TODO left in the code. Fixes: `d4fd0404c1` ("enetc: Introduce basic PF and VF ENETC ethernet drivers") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-01 13:34:47 -08:00
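A sketch of the idea behind the fix: derive the TPID handed to the stack from what the hardware actually parsed instead of hard-coding 802.1Q. The TPID values 0x8100 (802.1Q) and 0x88A8 (802.1ad) are standard; the 2-bit descriptor code and `rx_vlan_tpid()` are hypothetical, and codes other than 0 and 1 are not modeled here.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_P_8021Q  0x8100 /* C-VLAN tag (802.1Q) */
#define ETH_P_8021AD 0x88A8 /* S-VLAN tag (802.1ad) */

/* Hypothetical tag-type code reported in an RX buffer descriptor. */
enum rx_tag_type { RX_TAG_CTAG = 0, RX_TAG_STAG = 1 };

/* Map the parsed tag type to a TPID; in the kernel the result would be
 * passed to something like __vlan_hwaccel_put_tag(skb, htons(tpid), tci). */
static uint16_t rx_vlan_tpid(unsigned int tag_type)
{
	assert(tag_type <= 3); /* assumed 2-bit field */

	switch (tag_type) {
	case RX_TAG_CTAG:
		return ETH_P_8021Q;
	case RX_TAG_STAG:
		return ETH_P_8021AD;
	default:
		/* Other codes are not modeled in this sketch. */
		return ETH_P_8021Q;
	}
}
```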
Vladimir Oltean	6d36ecdbc4	net: enetc: take the MDIO lock only once per NAPI poll cycle The workaround for the ENETC MDIO erratum caused a performance degradation of 82 Kpps (seen with IP forwarding of two 1Gbps streams of 64B packets). This is due to excessive locking and unlocking in the fast path, which can be avoided. By taking the MDIO read-side lock only once per NAPI poll cycle, we are able to regain 54 Kpps (65%) of the performance hit. The rest of the performance degradation comes from the TX data path, but unfortunately it doesn't look like we can optimize that away easily: even with netdev_xmit_more(), there isn't any skb batching that would let us take the MDIO lock less often than once per packet. We need to change the register accessor type for enetc_get_tx_tstamp, because it now runs under the enetc_lock_mdio as per the new call path detailed below: enetc_msix -> napi_schedule -> enetc_poll -> enetc_lock_mdio -> enetc_clean_tx_ring -> enetc_get_tx_tstamp -> enetc_clean_rx_ring -> enetc_unlock_mdio Fixes: `fd5736bf9f` ("enetc: Workaround for MDIO register access issue") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-01 13:34:47 -08:00
Vladimir Oltean	3222b5b613	net: enetc: initialize RFS/RSS memories for unused ports too Michael reports that since linux-next-20210211, the AER messages for ECC errors have started reappearing, and this time they can be reliably reproduced with the first ping on one of his LS1028A boards. $ ping 1[ 33.258069] pcieport 0000:00:1f.0: AER: Multiple Corrected error received: 0000:00:00.0 72.16.0.1 PING [ 33.267050] pcieport 0000:00:1f.0: AER: can't find device of ID0000 172.16.0.1 (172.16.0.1): 56 data bytes 64 bytes from 172.16.0.1: seq=0 ttl=64 time=17.124 ms 64 bytes from 172.16.0.1: seq=1 ttl=64 time=0.273 ms $ devmem 0x1f8010e10 32 0xC0000006 It isn't clear why this is necessary, but it seems that for the errors to go away, we must clear the entire RFS and RSS memory, not just for the ports in use. Sadly, the code is structured in such a way that we can't have unified logic for the used and unused ports. For the minimal initialization of an unused port, we just need to enable and ioremap the PF memory space and set up a control buffer descriptor ring (CBDR). Unused ports must then free the CBDR because the driver will exit, but used ports cannot pick up where that code path left off, since the CBDR API does not reinitialize a ring when setting it up, so its producer and consumer indices would be out of sync between the software and hardware state. So a separate enetc_init_unused_port function was created, and it gets called right after the PF memory space is enabled. Fixes: `07bf34a50e` ("net: enetc: initialize the RFS and RSS memories") Reported-by: Michael Walle <michael@walle.cc> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-01 13:34:47 -08:00
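The probe ordering described above can be sketched as a traced control flow. Every step name and the `pf_probe()` return convention are hypothetical; the sketch only mirrors the commit's description: enable the PF memory space, then either do a throwaway CBDR bring-up that clears RFS/RSS memory for an unused port, or continue with a fresh CBDR setup for a used one.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Records the sequence of (hypothetical) init steps. */
static char trace[256];

static void step(const char *name)
{
	assert(strlen(trace) + strlen(name) + 2 < sizeof(trace));
	strcat(trace, name);
	strcat(trace, ";");
}

/* Minimal bring-up for an unused port: just enough to clear its RFS/RSS
 * memories, then release the CBDR because probing stops here. */
static void init_unused_port(void)
{
	step("setup_cbdr");
	step("clear_rfs_rss");
	step("free_cbdr");
}

/* Returns true if the port was fully brought up. */
static bool pf_probe(bool port_used)
{
	trace[0] = '\0';
	step("enable_pf_mem"); /* enable and ioremap the PF memory space */

	if (!port_used) {
		init_unused_port();
		return false;
	}

	/* Used ports set up their own CBDR rather than reusing the one above,
	 * since re-setup would not resync producer/consumer indices. */
	step("setup_cbdr");
	step("clear_rfs_rss");
	step("full_init");
	return true;
}
```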

1604 Commits