linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2025-01-17 19:34:12 +08:00

Author	SHA1	Message	Date
Arnd Bergmann	0e9e7598c6	octeontx2-nic: fix mixed module build Building the VF and PF side of this driver differently, with one being a loadable module and the other one built-in results in a link failure for the common PTP driver: ld.lld: error: undefined symbol: __this_module >>> referenced by otx2_ptp.c >>> net/ethernet/marvell/octeontx2/nic/otx2_ptp.o:(otx2_ptp_init) in archive drivers/built-in.a >>> referenced by otx2_ptp.c >>> net/ethernet/marvell/octeontx2/nic/otx2_ptp.o:(otx2_ptp_init) in archive drivers/built-in.a Move the otx2_ptp.c code into a separate module that gets built for both configurations, making it built-in if at least one of the other two is built-in. Fixes: `43510ef4dd` ("octeontx2-nicvf: Add PTP hardware clock support to NIX VF") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-18 13:14:00 +01:00
Kunihiko Hayashi	9fd3d5dced	net: ethernet: ave: Add compatible string and SoC-dependent data for NX1 SoC Add basic support for UniPhier NX1 SoC. This includes a compatible string and SoC-dependent data. Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-18 13:10:08 +01:00
Uwe Kleine-König	d40dfa0ceb	net: w5100: Make w5100_remove() return void Up to now w5100_remove() returns zero unconditionally. Make it return void instead which makes it easier to see in the callers that there is no error to handle. Also the return value of platform and spi remove callbacks is ignored anyway. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-18 12:59:12 +01:00
Uwe Kleine-König	2841bfd10a	net: ks8851: Make ks8851_remove_common() return void Up to now ks8851_remove_common() returns zero unconditionally. Make it return void instead which makes it easier to see in the callers that there is no error to handle. Also the return value of platform and spi remove callbacks is ignored anyway. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-18 12:59:12 +01:00
Ahmed S. Darwish	50dc9a8572	net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data types The only factor differentiating per-CPU bstats data type (struct gnet_stats_basic_cpu) from the packed non-per-CPU one (struct gnet_stats_basic_packed) was a u64_stats sync point inside the former. The two data types are now equivalent: earlier commits added a u64_stats sync point to the latter. Combine both data types into "struct gnet_stats_basic_sync". This eliminates redundancy and simplifies the bstats read/write APIs. Use u64_stats_t for bstats "packets" and "bytes" data types. On 64-bit architectures, u64_stats sync points do not use sequence counter protection. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-18 12:54:41 +01:00
Jakub Kicinski	ec356edef7	ethernet: ixgb: use eth_hw_addr_set() Commit `406f42fa0d` ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. Read the address into an array on the stack, then call eth_hw_addr_set(). ixgb_get_ee_mac_addr() is used with a non-nevdev->dev_addr pointer so we can't deal with the problem inside it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-16 08:53:46 +01:00
Jakub Kicinski	5c8b348534	ethernet: ibmveth: use ether_addr_to_u64() We'll want to make netdev->dev_addr const, remove the local helper which is missing a const qualifier on the argument and use ether_addr_to_u64(). Similar story to mlx4. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-16 08:53:46 +01:00
Jakub Kicinski	d9ca87233b	ethernet: enetc: use eth_hw_addr_set() Commit `406f42fa0d` ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. Pass a netdev into the helper instead of just the address, read the address into an array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-16 08:53:46 +01:00
Jakub Kicinski	10e6ded812	ethernet: ec_bhf: use eth_hw_addr_set() Commit `406f42fa0d` ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. Copy the address into an array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-16 08:53:46 +01:00
Jakub Kicinski	41edfff572	ethernet: enic: use eth_hw_addr_set() Commit `406f42fa0d` ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. Use a zero'ed array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-16 08:53:46 +01:00
Jakub Kicinski	0c9e0c7931	ethernet: bcmgenet: use eth_hw_addr_set() Commit `406f42fa0d` ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. Read the address into an array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-16 08:53:46 +01:00
Jakub Kicinski	a85c8f9ad2	ethernet: bnx2x: use eth_hw_addr_set() Commit `406f42fa0d` ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. Read the address into an array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-16 08:53:46 +01:00
Jakub Kicinski	698c33d8b4	ethernet: aquantia: use eth_hw_addr_set() Commit `406f42fa0d` ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. Use an array on the stack, then call eth_hw_addr_set(). eth_hw_addr_set() is after error checking, this should be fine, error propagates all the way to failing probe. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-16 08:53:46 +01:00
Jakub Kicinski	f98c50509a	ethernet: amd: use eth_hw_addr_set() Commit `406f42fa0d` ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. Read the address into an array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-16 08:53:46 +01:00
Jakub Kicinski	ffaeca68fb	ethernet: alteon: use eth_hw_addr_set() Commit `406f42fa0d` ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. Break the address apart into an array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-16 08:53:46 +01:00
Jakub Kicinski	0d4c751715	ethernet: aeroflex: use eth_hw_addr_set() Commit `406f42fa0d` ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. macaddr[] is a module param, and int, so copy the address into an array of u8 on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-16 08:53:45 +01:00
Jakub Kicinski	8ec53ed9af	ethernet: adaptec: use eth_hw_addr_set() Commit `406f42fa0d` ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. Read the address into an array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-16 08:53:45 +01:00
David S. Miller	93eb2b7721	mlx5-updates-2021-10-15 1) From Rongwei Liu: Use system_image_guid and native_port_num when bonding. Don't relay on PCIe ids anymore. With some specific NIC, the physical devices may have PCIe IDs like 0001:01:00.0/1 and 0002:02:00.0/1. All of these devices should have the same system_image_guid and device index can be queried from native_port_num. For matching sibling devices/port of the same HCA, compare the HCA GUID reported on each device rather than just assuming PCIe ids have similar attributes. 2) From Amir Tzin: Use HCA defined Timouts Replace hard coded timeouts with values stored by firmware in default timeouts register (DTOR). Timeouts are read during driver load. If DTOR is not supported by firmware then fallback to hard coded defaults instead. 3) From Shay Drory: Disable roce at HCA level Disable RoCE in Firmware when devlink roce parameter is set to off. 4) A small set of trivial cleanups -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmFqHtsACgkQSD+KveBX +j592AgAlO2CFEmYMSJmeQ7jBBIFCIq26XwWYJncuuwJsY+RxilzSRVTZGEydzw3 dfvYpUGvOox0jjRXm1kohz/BLVUW/crxhBDgr2oYYvP1yCLJtPa3lDzw5rABdt46 +cyweKQatKE1Bfbc3t9+aecHRzi84EcwjfVB1dc1/CuzmuwThMBaE0TxuGk8fYMc euLKIYQHuGzY6pFMiAHp6oxb5D0H0PPzjSZD+9Jubx7j7NiKPqfGu91B1PgNCkji zmbIOtj1xqQHzwVErUv8AbFDbjlajqUdAG6E9j5xQZkjPhn/1VlxS8VWPhGBO+w8 R+Ta3k/0vNFattQdtFFkTRaiElCpLw== =kDng -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2021-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-10-15 1) From Rongwei Liu: Use system_image_guid and native_port_num when bonding. Don't relay on PCIe ids anymore. With some specific NIC, the physical devices may have PCIe IDs like 0001:01:00.0/1 and 0002:02:00.0/1. All of these devices should have the same system_image_guid and device index can be queried from native_port_num. For matching sibling devices/port of the same HCA, compare the HCA GUID reported on each device rather than just assuming PCIe ids have similar attributes. 2) From Amir Tzin: Use HCA defined Timouts Replace hard coded timeouts with values stored by firmware in default timeouts register (DTOR). Timeouts are read during driver load. If DTOR is not supported by firmware then fallback to hard coded defaults instead. 3) From Shay Drory: Disable roce at HCA level Disable RoCE in Firmware when devlink roce parameter is set to off. 4) A small set of trivial cleanups ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-16 08:49:19 +01:00
Rongwei Liu	8a543184d7	net/mlx5: Use system_image_guid to determine bonding With specific NICs, the PFs may have different PCIe ids like 0001:01:00.0/1 and 0002:02:00:00/1. For PFs with the same system_image_guid, driver should consider them under the same physical NIC and they are legal to bond together. If firmware doesn't support system_image_guid, set it to zero and fallback to use PCIe ids. Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-15 17:37:47 -07:00
Rongwei Liu	2ec16ddde1	net/mlx5: Introduce new device index wrapper Downstream patches. Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-15 17:37:46 -07:00
Rongwei Liu	7b1b6d35f0	net/mlx5: Check return status first when querying system_image_guid When querying system_image_guid from firmware, we should check return value first. The buffer content is valid only if query succeed. Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-15 17:37:46 -07:00
Len Baker	0e6f3ef469	net/mlx5: DR, Prefer kcalloc over open coded arithmetic As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors. So, refactor the code a bit to use the purpose specific kcalloc() function instead of the argument size * count in the kzalloc() function. [1] https://www.kernel.org/doc/html/v5.14/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments Signed-off-by: Len Baker <len.baker@gmx.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-15 17:37:46 -07:00
Abhiram R N	0885ae1a9d	net/mlx5e: Add extack msgs related to TC for better debug As multiple places EOPNOTSUPP and EINVAL is returned from driver it becomes difficult to understand the reason only with error code. With the netlink extack message exact reason will be known and will aid in debugging. Signed-off-by: Abhiram R N <abhiramrn@gmail.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-15 17:37:45 -07:00
Paul Blakey	88594d8331	net/mlx5: CT: Fix missing cleanup of ct nat table on init failure If CT fails to initialize it's rhashtables, it doesn't destroy the ct nat global table. Destroy the ct nat global table on ct init failure. Fixes: `d7cade5137` ("net/mlx5e: check return value of rhashtable_init") Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-15 17:37:45 -07:00
Shay Drory	fbfa97b4d7	net/mlx5: Disable roce at HCA level Currently, when a user disables roce via the devlink param, this change isn't passed down to the device. If device allows disabling RoCE at device level, make use of it. This instructs the device to skip memory allocations related to RoCE functionality which otherwise is done by the device. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-15 17:37:44 -07:00
Moosa Baransi	9fbe1c25ec	net/mlx5i: Enable Rx steering for IPoIB via ethtool Enable steering IPoIB packets via ethtool, the same way it is done today for Ethernet packets. Signed-off-by: Moosa Baransi <moosab@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-15 17:37:44 -07:00
Vlad Buslov	17ac528d88	net/mlx5: Bridge, provide flow source hints Currently, SMFS mode doesn't support rx-loopback flows which causes bridge egress rules to be rejected because without hint rules for both rx and tx destinations are created by default. Provide explicit flow source hints for compatibility with SMFS. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-15 17:37:44 -07:00
Amir Tzin	32def4120e	net/mlx5: Read timeout values from DTOR Replace hard coded timeouts with values stored by firmware in default timeouts register (DTOR). Timeouts are read during driver load. If DTOR is not supported by firmware then fallback to hard coded defaults instead. Signed-off-by: Amir Tzin <amirtz@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-15 17:37:43 -07:00
Amir Tzin	5945e1adea	net/mlx5: Read timeout values from init segment Replace hard coded timeouts with values stored in firmware's init segment. Timeouts are read from init segment during driver load. If init segment timeouts are not supported then fallback to hard coded defaults instead. Also move pre initialization timeouts which cannot be read from firmware to the new mechanism. Signed-off-by: Amir Tzin <amirtz@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-15 17:37:43 -07:00
Maciej Fijalkowski	2faf63b650	ice: make use of ice_for_each_* macros Go through the code base and use ice_for_each_* macros. While at it, introduce ice_for_each_xdp_txq() macro that can be used for looping over xdp_rings array. Commit is not introducing any new functionality. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-10-15 07:39:03 -07:00
Maciej Fijalkowski	22bf877e52	ice: introduce XDP_TX fallback path Under rare circumstances there might be a situation where a requirement of having XDP Tx queue per CPU could not be fulfilled and some of the Tx resources have to be shared between CPUs. This yields a need for placing accesses to xdp_ring inside a critical section protected by spinlock. These accesses happen to be in the hot path, so let's introduce the static branch that will be triggered from the control plane when driver could not provide Tx queue dedicated for XDP on each CPU. Currently, the design that has been picked is to allow any number of XDP Tx queues that is at least half of a count of CPUs that platform has. For lower number driver will bail out with a response to user that there were not enough Tx resources that would allow configuring XDP. The sharing of rings is signalled via static branch enablement which in turn indicates that lock for xdp_ring accesses needs to be taken in hot path. Approach based on static branch has no impact on performance of a non-fallback path. One thing that is needed to be mentioned is a fact that the static branch will act as a global driver switch, meaning that if one PF got out of Tx resources, then other PFs that ice driver is servicing will suffer. However, given the fact that HW that ice driver is handling has 1024 Tx queues per each PF, this is currently an unlikely scenario. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-10-15 07:39:03 -07:00
Maciej Fijalkowski	9610bd988d	ice: optimize XDP_TX workloads Optimize Tx descriptor cleaning for XDP. Current approach doesn't really scale and chokes when multiple flows are handled. Introduce two ring fields, @next_dd and @next_rs that will keep track of descriptor that should be looked at when the need for cleaning arise and the descriptor that should have the RS bit set, respectively. Note that at this point the threshold is a constant (32), but it is something that we could make configurable. First thing is to get away from setting RS bit on each descriptor. Let's do this only once NTU is higher than the currently @next_rs value. In such case, grab the tx_desc[next_rs], set the RS bit in descriptor and advance the @next_rs by a 32. Second thing is to clean the Tx ring only when there are less than 32 free entries. For that case, look up the tx_desc[next_dd] for a DD bit. This bit is written back by HW to let the driver know that xmit was successful. It will happen only for those descriptors that had RS bit set. Clean only 32 descriptors and advance the DD bit. Actual cleaning routine is moved from ice_napi_poll() down to the ice_xmit_xdp_ring(). It is safe to do so as XDP ring will not get any SKBs in there that would rely on interrupts for the cleaning. Nice side effect is that for rare case of Tx fallback path (that next patch is going to introduce) we don't have to trigger the SW irq to clean the ring. With those two concepts, ring is kept at being almost full, but it is guaranteed that driver will be able to produce Tx descriptors. This approach seems to work out well even though the Tx descriptors are produced in one-by-one manner. Test was conducted with the ice HW bombarded with packets from HW generator, configured to generate 30 flows. Xdp2 sample yields the following results: <snip> proto 17: 79973066 pkt/s proto 17: 80018911 pkt/s proto 17: 80004654 pkt/s proto 17: 79992395 pkt/s proto 17: 79975162 pkt/s proto 17: 79955054 pkt/s proto 17: 79869168 pkt/s proto 17: 79823947 pkt/s proto 17: 79636971 pkt/s </snip> As that sample reports the Rx'ed frames, let's look at sar output. It says that what we Rx'ed we do actually Tx, no noticeable drops. Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil Average: ens4f1 79842324.00 79842310.40 4678261.17 4678260.38 0.00 0.00 0.00 38.32 with tx_busy staying calm. When compared to a state before: Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil Average: ens4f1 90919711.60 42233822.60 5327326.85 2474638.04 0.00 0.00 0.00 43.64 it can be observed that the amount of txpck/s is almost doubled, meaning that the performance is improved by around 90%. All of this due to the drops in the driver, previously the tx_busy stat was bumped at a 7mpps rate. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-10-15 07:39:03 -07:00
Maciej Fijalkowski	eb087cd828	ice: propagate xdp_ring onto rx_ring With rings being split, it is now convenient to introduce a pointer to XDP ring within the Rx ring. For XDP_TX workloads this means that xdp_rings array access will be skipped, which was executed per each processed frame. Also, read the XDP prog once per NAPI and if prog is present, set up the local xdp_ring pointer. Reading prog a single time was discussed in [1] with some concern raised by Toke around dispatcher handling and having the need for going through the RCU grace period in the ndo_bpf driver callback, but ice currently is torning down NAPI instances regardless of the prog presence on VSI. Although the pointer to XDP ring introduced to Rx ring makes things a lot slimmer/simpler, I still feel that single prog read per NAPI lifetime is beneficial. Further patch that will introduce the fallback path will also get a profit from that as xdp_ring pointer will be set during the XDP rings setup. [1]: https://lore.kernel.org/bpf/87k0oseo6e.fsf@toke.dk/ Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-10-15 07:39:03 -07:00
Maciej Fijalkowski	a55e16fa33	ice: do not create xdp_frame on XDP_TX xdp_frame is not needed for XDP_TX data path in ice driver case. For this data path cleaning of sent descriptor will not happen anywhere outside of the driver, which means that carrying the information about the underlying memory model via xdp_frame will not be used. Therefore, this conversion can be simply dropped, which would relieve CPU a bit. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-10-15 07:39:03 -07:00
Maciej Fijalkowski	0bb4f9ecad	ice: unify xdp_rings accesses There has been a long lasting issue of improper xdp_rings indexing for XDP_TX and XDP_REDIRECT actions. Given that currently rx_ring->q_index is mixed with smp_processor_id(), there could be a situation where Tx descriptors are produced onto XDP Tx ring, but tail is never bumped - for example pin a particular queue id to non-matching IRQ line. Address this problem by ignoring the user ring count setting and always initialize the xdp_rings array to be of num_possible_cpus() size. Then, always use the smp_processor_id() as an index to xdp_rings array. This provides serialization as at given time only a single softirq can run on a particular CPU. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-10-15 07:39:02 -07:00
Maciej Fijalkowski	e72bba2135	ice: split ice_ring onto Tx/Rx separate structs While it was convenient to have a generic ring structure that served both Tx and Rx sides, next commits are going to introduce several Tx-specific fields, so in order to avoid hurting the Rx side, let's pull out the Tx ring onto new ice_tx_ring and ice_rx_ring structs. Rx ring could be handled by the old ice_ring which would reduce the code churn within this patch, but this would make things asymmetric. Make the union out of the ring container within ice_q_vector so that it is possible to iterate over newly introduced ice_tx_ring. Remove the @size as it's only accessed from control path and it can be calculated pretty easily. Change definitions of ice_update_ring_stats and ice_fetch_u64_stats_per_ring so that they are ring agnostic and can be used for both Rx and Tx rings. Sizes of Rx and Tx ring structs are 256 and 192 bytes, respectively. In Rx ring xdp_rxq_info occupies its own cacheline, so it's the major difference now. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-10-15 07:39:02 -07:00
Maciej Fijalkowski	dc23715cf3	ice: move ice_container_type onto ice_ring_container Currently ice_container_type is scoped only for ice_ethtool.c. Next commit that will split the ice_ring struct onto Rx/Tx specific ring structs is going to also modify the type of linked list of rings that is within ice_ring_container. Therefore, the functions that are taking the ice_ring_container as an input argument will need to be aware of a ring type that will be looked up. Embed ice_container_type within ice_ring_container and initialize it properly when allocating the q_vectors. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-10-15 07:39:02 -07:00
Maciej Fijalkowski	e93d1c37a8	ice: remove ring_active from ice_ring This field is dead and driver is not making any use of it. Simply remove it. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-10-15 07:39:02 -07:00
Ioana Ciornei	fc398bec03	net: dpaa2: add adaptive interrupt coalescing Add support for adaptive interrupt coalescing to the dpaa2-eth driver. First of all, ETHTOOL_COALESCE_USE_ADAPTIVE_RX is defined as a supported coalesce parameter and the requested state is configured through the dpio APIs added in the previous patch. Besides the ethtool API interaction, we keep track of how many bytes and frames are dequeued per CDAN (Channel Data Availability Notification) and update the Net DIM instance through the dpaa2_io_update_net_dim() API. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-15 14:32:41 +01:00
Ioana Ciornei	a64b442137	net: dpaa2: add support for manual setup of IRQ coalesing Use the newly exported dpio driver API to manually configure the IRQ coalescing parameters requested by the user. The .get_coalesce() and .set_coalesce() net_device callbacks are implemented and directly export or setup the rx-usecs on all the channels configured. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-15 14:32:41 +01:00
Eric Dumazet	19757cebf0	tcp: switch orphan_count to bare per-cpu counters Use of percpu_counter structure to track count of orphaned sockets is causing problems on modern hosts with 256 cpus or more. Stefan Bach reported a serious spinlock contention in real workloads, that I was able to reproduce with a netfilter rule dropping incoming FIN packets. 53.56% server [kernel.kallsyms] [k] queued_spin_lock_slowpath \| ---queued_spin_lock_slowpath \| --53.51%--_raw_spin_lock_irqsave \| --53.51%--__percpu_counter_sum tcp_check_oom \| \|--39.03%--__tcp_close \| tcp_close \| inet_release \| inet6_release \| sock_close \| __fput \| ____fput \| task_work_run \| exit_to_usermode_loop \| do_syscall_64 \| entry_SYSCALL_64_after_hwframe \| __GI___libc_close \| --14.48%--tcp_out_of_resources tcp_write_timeout tcp_retransmit_timer tcp_write_timer_handler tcp_write_timer call_timer_fn expire_timers __run_timers run_timer_softirq __softirqentry_text_start As explained in commit `cf86a086a1` ("net/dst: use a smaller percpu_counter batch for dst entries accounting"), default batch size is too big for the default value of tcp_max_orphans (262144). But even if we reduce batch sizes, there would still be cases where the estimated count of orphans is beyond the limit, and where tcp_too_many_orphans() has to call the expensive percpu_counter_sum_positive(). One solution is to use plain per-cpu counters, and have a timer to periodically refresh this cache. Updating this cache every 100ms seems about right, tcp pressure state is not radically changing over shorter periods. percpu_counter was nice 15 years ago while hosts had less than 16 cpus, not anymore by current standards. v2: Fix the build issue for CONFIG_CRYPTO_DEV_CHELSIO_TLS=m, reported by kernel test robot <lkp@intel.com> Remove unused socket argument from tcp_too_many_orphans() Fixes: `dd24c00191` ("net: Use a percpu_counter for orphan_count") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Stefan Bach <sfb@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-15 11:28:34 +01:00
Srujana Challa	149f3b73cb	octeontx2-af: Add support to flush full CPT CTX cache Adds support to flush or invalidate CPT CTX entries as part of FLR and also provides a mailbox to flush CPT CTX entries in case of graceful exit. This patch also adds support for AF -> CPT PF uplink mailbox messages and adds a new mbox message to submit a CPT instruction from AF. Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-14 20:01:06 -07:00
Nithin Dabilpuram	7054d39ccf	octeontx2-af: Perform cpt lf teardown in non FLR path Perform CPT LF teardown in non FLR path as well via cpt_lf_free() Currently CPT LF teardown and reset sequence is only done when FLR is handled with CPT LF still attached. This patch also fixes cpt_lf_alloc() to set EXEC_LDWB in CPT_AF_LFX_CTL2 when being completely overwritten as that is the default value and is better for performance. Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-14 20:01:06 -07:00
Srujana Challa	4826090719	octeontx2-af: Enable CPT HW interrupts This patch enables and registers interrupt handler for CPT HW interrupts. Signed-off-by: Srujana Challa <schalla@marvell.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-14 20:01:06 -07:00
Randy Dunlap	a3d708925f	net: tulip: winbond-840: fix build for UML On i386, when builtin (not a loadable module), the winbond-840 driver inspects boot_cpu_data to see what CPU family it is running on, and then acts on that data. The "family" struct member (x86) does not exist when running on UML, so prevent that test and do the default action. Prevents this build error on UML + i386: ../drivers/net/ethernet/dec/tulip/winbond-840.c: In function ‘init_registers’: ../drivers/net/ethernet/dec/tulip/winbond-840.c:882:19: error: ‘struct cpuinfo_um’ has no member named ‘x86’ if (boot_cpu_data.x86 <= 4) { Fixes: `68f5d3f3b6` ("um: add PCI over virtio emulation driver") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: linux-um@lists.infradead.org Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20211014050606.7288-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-14 19:18:53 -07:00
Randy Dunlap	523994ba3a	net: intel: igc_ptp: fix build for UML On a UML build, the igc_ptp driver uses CONFIG_X86_TSC for timestamp conversion. The function that is used is not available on UML builds, so have the function use the default system_counterval_t timestamp instead for UML builds. Prevents this build error on UML: ../drivers/net/ethernet/intel/igc/igc_ptp.c: In function ‘igc_device_tstamp_to_system’: ../drivers/net/ethernet/intel/igc/igc_ptp.c:777:9: error: implicit declaration of function ‘convert_art_ns_to_tsc’ [-Werror=implicit-function-declaration] return convert_art_ns_to_tsc(tstamp); ../drivers/net/ethernet/intel/igc/igc_ptp.c:777:9: error: incompatible types when returning type ‘int’ but ‘struct system_counterval_t’ was expected return convert_art_ns_to_tsc(tstamp); Fixes: `68f5d3f3b6` ("um: add PCI over virtio emulation driver") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: linux-um@lists.infradead.org Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: intel-wired-lan@lists.osuosl.org Link: https://lore.kernel.org/r/20211014050516.6846-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-14 19:18:40 -07:00
Randy Dunlap	cd2621d07d	net: fealnx: fix build for UML On i386, when builtin (not a loadable module), the fealnx driver inspects boot_cpu_data to see what CPU family it is running on, and then acts on that data. The "family" struct member (x86) does not exist when running on UML, so prevent that test and do the default action. Prevents this build error on UML + i386: ../drivers/net/ethernet/fealnx.c: In function ‘netdev_open’: ../drivers/net/ethernet/fealnx.c:861:19: error: ‘struct cpuinfo_um’ has no member named ‘x86’ Fixes: `68f5d3f3b6` ("um: add PCI over virtio emulation driver") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: linux-um@lists.infradead.org Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20211014050500.5620-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-14 19:18:22 -07:00
Yuval Shaia	20d446f24f	net: mvneta: Delete unused variable The variable pp is not in use - delete it. Signed-off-by: Yuval Shaia <yshaia@marvell.com> Link: https://lore.kernel.org/r/20211013064921.26346-1-yshaia@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-14 19:10:53 -07:00
Yuiko Oshino	4ece1ae440	net: microchip: lan743x: add support for PTP pulse width (duty cycle) If the PTP_PEROUT_DUTY_CYCLE flag is set, then check if the request_on value in ptp_perout_request matches the pre-defined values or a toggle option. Return a failure if the value is not supported. Preserve the old behaviors if the PTP_PEROUT_DUTY_CYCLE flag is not set. Tested with an oscilloscope on EVB-LAN7430: e.g., to output PPS 1sec period 500mS on (high) to GPIO 2. ./testptp -L 2,2 ./testptp -p 1000000000 -w 500000000 Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com> Link: https://lore.kernel.org/r/1634046593-64312-1-git-send-email-yuiko.oshino@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-14 19:05:18 -07:00
Jakub Kicinski	e15f5972b8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net tools/testing/selftests/net/ioam6.sh `7b1700e009` ("selftests: net: modify IOAM tests for undef bits") `bf77b1400a` ("selftests: net: Test for the IOAM encapsulation with IPv6") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-14 16:50:14 -07:00

1 2 3 4 5 ...

39789 Commits