korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-28 05:24:47 +08:00

Author	SHA1	Message	Date
David S. Miller	18a4ded9d1	Merge tag 'mlx5-updates-2017-09-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2017-09-03 This series from Tariq includes micro data path optimization for mlx5e netdevice driver. Mainly Tariq introduces the following changes to NAPI and RX handling path of the driver: - RX ring structure reorganizing - Trivial code refactoring and optimization - NAPI busy-poll for when fast UMR is in progress - Non-atomic state operations in NAPI context - Remove unnecessary fields from fast path structures - page-cache micro optimization - Rely on NAPI to avoid missing an IRQ for RX/TX shared NAPI contexts - Stop NAPI when irq changes affinity - Distribute RSS table among all RX rings ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 21:17:07 -07:00
Petr Machata	ee954d1a91	mlxsw: spectrum_router: Support GRE tunnels This patch introduces callbacks and tunnel type to offload GRE tunnels. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:26 -07:00
Petr Machata	92107cfb41	mlxsw: spectrum_router: Add loopback accessors struct mlxsw_sp_rif is a router-private structure, and therefore everything related to it is as well: parameters, and derived RIF types including loopbacks. IPIP module needs access to some details of loopback interfaces, but exporting all the RIF shebang would create too large an interface. So instead export just the bare minimum necessary: accessors for RIF index and underlay VRF ID. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:26 -07:00
Petr Machata	86484de2c9	mlxsw: spectrum: Register for IPIP_DECAP_ERROR trap These traps are generated for packets that fail checks for source IP, encapsulation type, or GRE key. Trap these packets to CPU for follow-up handling by the kernel, which will send ICMP destination unreachable responses. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:26 -07:00
Petr Machata	1cc38fb144	mlxsw: spectrum_router: Use existing decap route The local route that points at IPIP's underlay device (decap route) can be present long before the GRE device. Thus when an encap route is added, it's necessary to look inside the underlay FIB if the decap route is already present. If so, the current trap offload needs to be withdrawn and replaced with a decap offload. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:26 -07:00
Petr Machata	4607f6d269	mlxsw: spectrum_router: Support IPv4 underlay decap Unlike encapsulation, which is represented by a next hop forwarding to an IPIP tunnel, decapsulation is a type of local route. It is created for local routes whose prefix corresponds to the local address of one of offloaded IPIP tunnels. When the tunnel is removed (i.e. all the encap next hops are removed), the decap offload is migrated back to a trap for resolution in slow path. This patch assumes that decap route is already present when encap route is added. A follow-up patch will fix this issue. Note that this patch only supports IPv4 underlay. Support for IPv6 underlay will be subject to follow-up work apart from this patchset. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:26 -07:00
Petr Machata	8f28a30976	mlxsw: spectrum_router: Support IPv6 overlay encap Add the missing bits to recognize IPv6 next hops as IPIP ones to enable offloading of IPv6 overlay encapsulation. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:26 -07:00
Petr Machata	1012b9ac28	mlxsw: spectrum_router: Support IPv4 overlay encap This introduces some common code for tracking of offloaded IP-in-IP tunnels, and support for offloading IPv4 overlay encapsulating routes in particular. A follow-up patch will introduce IPv6 overlay as well. Offloaded tunnels are kept in a linked list of mlxsw_sp_ipip_entry objects hooked up in mlxsw_sp_router. A network device that represents the tunnel is used as a key to look up the corresponding IPIP entry. Note that in the future, more general keying mechanism will be needed, because parts of the tunnel information can be provided by the route. IPIP entries are reference counted, because several next hops may end up using the same tunnel, and we only want to offload it once. Encapsulation path hooks into next hop handling. Routes that forward to a tunnel are now considered gateway routes, thus giving them the same treatment that other remote routes get. An IPIP next hop type is introduced. Details of individual tunnel types are kept in an array of mlxsw_sp_ipip_ops objects. If a tunnel type doesn't match any of the known tunnel types, the next-hop is not considered an IPIP next hop. The list of IPIP tunnel types is currently empty, follow-up patches will add support for GRE. Traffic to IPIP tunnel types that are not explicitly recognized by the driver traps and is handled in slow path. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:25 -07:00
Petr Machata	35225e4740	mlxsw: spectrum_router: Make nexthops typed In the router, some next hops may reference an encapsulating netdevice, such as GRE or IPIP. To properly offload these next hops, mlxsw needs to keep track of whether a given next hop is a regular Ethernet entry, or an IP-in-IP tunneling entry. To facilitate this book-keeping, add a type field to struct mlxsw_sp_nexthop. There is, as of this patch, only one next hop type: MLXSW_SP_NEXTHOP_TYPE_ETH. Follow-up patches will introduce the IP-in-IP variant. There are several places where next hops are initialized in the IPv4 path. Instead of replicating the logic at every one of them, factor it out to a function mlxsw_sp_nexthop4_type_init(). The corresponding fini is actually protocol-neutral, so put it to mlxsw_sp_nexthop_type_fini(), but create a corresponding protocoled _fini function that dispatches to the protocol-neutral one. The IPv6 path is simpler, but for symmetry with IPv4, create the same suite of functions with corresponding logic. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:25 -07:00
Petr Machata	f6050ee6f4	mlxsw: spectrum_router: Extract mlxsw_sp_rt6_is_gateway() IPv6 counterpart of the previous patch: introduce a function to determine whether a given route is a gateway route. The new function takes a mlxsw_sp argument which follow-up patches will use. Thus mlxsw_sp_fib6_entry_type_set() got that argument as well. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:25 -07:00
Petr Machata	9b01451ad5	mlxsw: spectrum_router: Extract mlxsw_sp_fi_is_gateway() For IPv4 IP-in-IP offload, routes that direct traffic to IP-in-IP devices need to be considered gateway routes as well. That involves a bit more logic, so extract the current test to a separate function, where the logic can be later added. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:25 -07:00
Petr Machata	6ddb7426a7	mlxsw: spectrum_router: Introduce loopback RIFs When offloading L3 tunnels, an adjacency entry is created that loops the packet back into the underlay router. Loopback interfaces then hold the corresponding information and are created for IP-in-IP netdevices. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:25 -07:00
Petr Machata	010cadf916	mlxsw: spectrum_router: Support FID-less RIFs Loopback RIFs, which will be introduced in a follow-up patch, differ from other RIFs in that they do not have a FID associated with them. To support this, demote FID allocation from mlxsw_sp_rif_create to configure op of the existing RIF types, and likewise the FID release from mlxsw_sp_rif_destroy to deconfigure op. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:25 -07:00
Petr Machata	38ebc0f454	mlxsw: spectrum_router: Add mlxsw_sp_ipip_ops Details of individual tunnel types are kept in an array of mlxsw_sp_ipip_ops objects. Follow-up patches will use the list to determine whether a constructed RIF should be a loopback, and to decide whether a next hop references a tunnel. The list is currently empty, follow-up patches will add support for GRE. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:25 -07:00
Petr Machata	ff1f06ce9d	mlxsw: spectrum_router: Publish mlxsw_sp_l3proto The spectrum_ipip module that will be introduced in the follow-up patches needs to know the data type. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:25 -07:00
Petr Machata	89e419828f	mlxsw: reg: Give mlxsw_reg_ratr_pack a type parameter To support IPIP, the driver needs to be able to construct an IPIP adjacency. Change mlxsw_reg_ratr_pack to take an adjacency type as an argument. Adjust the one existing caller. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:25 -07:00
Petr Machata	9571e828f4	mlxsw: reg: Extract mlxsw_reg_ritr_mac_pack() Unlike other interface types, loopback RIFs do not have MAC address. So drop the corresponding argument from mlxsw_reg_ritr_pack() and move it to a new function. Call that from callers of mlxsw_reg_ritr_pack. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:25 -07:00
Petr Machata	1e659ebf58	mlxsw: reg: Add Routing Tunnel Decap Properties Register The RTDP register is used for configuring the tunnel decap properties of NVE and IPinIP. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:24 -07:00
Petr Machata	a43da820c8	mlxsw: reg: Add mlxsw_reg_ralue_act_ip2me_tun_pack() To implement IP-in-IP decapsulation, Spectrum uses LPM entries of type IP2ME with tunnel validity bit and tunnel pointer set. The necessary register fields are already available, so add a function to pack the RALUE as appropriate. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:24 -07:00
Petr Machata	6c4153b1e7	mlxsw: reg: Move enum mlxsw_reg_ratr_trap_id This enum is used with reg_ratr_trap_id, so move it next to the register definition. While at it, drop the enumerator initializers. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:24 -07:00
Petr Machata	7c819de438	mlxsw: reg: Update RATR to support IP-in-IP tunnels So far, adjacencies have always been of type Ethernet (with value of 0), and thus there was no need to explicitly support RATR type. However to support IP-in-IP adjacencies, this type and a suite of IP-in-IP-specific attributes need to be added. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:24 -07:00
Petr Machata	99ae8e3e5e	mlxsw: reg: Update RITR to support loopback device Update the register so that loopback RIFs can be created and loopback properties specified. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:23:24 -07:00
Antoine Tenart	688cbaf202	net: mvpp2: fallback using h/w and random mac if the dt one isn't valid When using a mac address described in the device tree, a check is made to see if it is valid. When it's not, no fallback is defined. This patch tries to get the mac address from h/w (or use a random one if the h/w one isn't valid) when the dt mac address isn't valid. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:16:55 -07:00
Antoine Tenart	d2a6e48e52	net: mvpp2: fix use of the random mac address for PPv2.2 The MAC retrieval logic is using a variable to store an h/w stored mac address and checks this mac against invalid ones before using it. But the mac address is only read from h/w when using PPv2.1. So when using PPv2.2 it defaults to its init state. This patch fixes the logic to only check if the h/w mac is valid when actually retrieving a mac from h/w. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:16:55 -07:00
Antoine Tenart	3ba8c81e15	net: mvpp2: move the mac retrieval/copy logic into its own function The MAC retrieval has a quite complicated logic (which is broken). Move it to its own function to prepare for patches fixing its logic, so that reviews are easier. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 20:16:55 -07:00
David S. Miller	b63f6044d8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree. Basically, updates to the conntrack core, enhancements for nf_tables, conversion of netfilter hooks from linked list to array to improve memory locality and assorted improvements for the Netfilter codebase. More specifically, they are: 1) Add expectation to hashes after timer initialization to prevent access from another CPU that walks on the hashes and calls del_timer(), from Florian Westphal. 2) Don't update nf_tables chain counters from hot path, this is only used by the x_tables compatibility layer. 3) Get rid of nested rcu_read_lock() calls from netfilter hook path. Hooks are always guaranteed to run from rcu read side, so remove nested rcu_read_lock() where possible. Patch from Taehee Yoo. 4) nf_tables new ruleset generation notifications include PID and name of the process that has updated the ruleset, from Phil Sutter. 5) Use skb_header_pointer() from nft_fib, so we can reuse this code from the nf_family netdev family. Patch from Pablo M. Bermudo. 6) Add support for nft_fib in nf_tables netdev family, also from Pablo. 7) Use deferrable workqueue for conntrack garbage collection, to reduce power consumption, patch from Subash Abhinov Kasiviswanathan. 8) Add nf_ct_expect_iterate_net() helper and use it. From Florian Westphal. 9) Call nf_ct_unconfirmed_destroy only from cttimeout, from Florian. 10) Drop references on conntrack removal path when skbuffs have escaped via nfqueue, from Florian. 11) Don't queue packets to nfqueue with dying conntrack, from Florian. 12) Constify nf_hook_ops structure, from Florian. 13) Remove needless branch in nf_tables trace code, from Phil Sutter. 14) Add nla_strdup(), from Phil Sutter. 15) Raise nf_tables object name size up to 255 chars, people want to use DNS names, so increase this according to what RFC 1035 specifies. Patch series from Phil Sutter. 16) Kill nf_conntrack_default_on, it's broken. Default on conntrack hook registration on demand, suggested by Eric Dumazet, patch from Florian. 17) Remove unused variables in compat_copy_entry_from_user both in ip_tables and arp_tables code. Patch from Taehee Yoo. 18) Constify struct nf_conntrack_l4proto, from Julia Lawall. 19) Constify nf_loginfo structure, also from Julia. 20) Use a single rb root in connlimit, from Taehee Yoo. 21) Remove unused netfilter_queue_init() prototype, from Taehee Yoo. 22) Use audit_log() instead of open-coding it, from Geliang Tang. 23) Allow to mangle tcp options via nft_exthdr, from Florian. 24) Allow to fetch TCP MSS from nft_rt, from Florian. This includes a fix for a miscalculation of the minimal length. 25) Simplify branch logic in h323 helper, from Nick Desaulniers. 26) Calculate netlink attribute size for conntrack tuple at compile time, from Florian. 27) Remove protocol name field from nf_conntrack_{l3,l4}proto structure. From Florian. 28) Remove holes in nf_conntrack_l4proto structure, so it becomes smaller. From Florian. 29) Get rid of print_tuple() indirection for /proc conntrack listing. Place all the code in net/netfilter/nf_conntrack_standalone.c. Patch from Florian. 30) Do not build in print_conntrack() if CONFIG_NF_CONNTRACK_PROCFS is off. From Florian. 31) Constify most nf_conntrack_{l3,l4}proto helper functions, from Florian. 32) Fix broken indentation in ebtables extensions, from Colin Ian King. 33) Fix several harmless sparse warnings, from Florian. 34) Convert netfilter hook infrastructure to use array for better memory locality, joint work done by Florian and Aaron Conole. Moreover, add some instrumentation to debug this. 35) Batch nf_unregister_net_hooks() calls, to call synchronize_net once per batch, from Florian. 36) Get rid of noisy logging in ICMPv6 conntrack helper, from Florian. 37) Get rid of obsolete NFDEBUG() instrumentation, from Varsha Rao. 38) Remove unused code in the generic protocol tracker, from Davide Caratti. I think I will have material for a second Netfilter batch in my queue if time allows to make it fit in this merge window. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 17:08:42 -07:00
Colin Ian King	942e7e5fc1	net/mlx4_core: fix incorrect size allocation for dev->caps.spec_qps The current allocation for dev->caps.spec_qps is for the size of the pointer and not the size of the actual mlx4_spec_qps structure. Fix this by using the correct size. Also split allocation over a few lines to make it cppcheck clean on overly wide lines. Detected by CoverityScan, CID#1455222 ("Wrong sizeof argument") Fixes: `c73c8b1e47` ("net/mlx4_core: Dynamically allocate structs at mlx4_slave_cap") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 10:57:10 -07:00
Colin Ian King	542deb88b0	net/mlx4_core: fix memory leaks on error exit path The structures hca_param and func_cap are not being kfree'd on an error exit path causing two memory leaks. Fix this by jumping to the existing free memory error exit path. Detected by CoverityScan, CID#1455219, CID#1455224 ("Resource Leak") Fixes: `c73c8b1e47` ("net/mlx4_core: Dynamically allocate structs at mlx4_slave_cap") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-03 10:57:10 -07:00
Tariq Toukan	d4b6c48800	net/mlx5e: Distribute RSS table among all RX rings By default, uniformly distribute the RSS indirection table entries among all RX rings, rather than restricting this only to the rings on the close NUMA node. irqbalancer would anyway dynamically override the default affinities set to the RX rings. This gives better multi-stream performance and CPU util. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-09-03 06:34:09 +03:00
Tariq Toukan	a8c2eb1579	net/mlx5e: Stop NAPI when irq balancer changes affinity NAPI context keeps rescheduling on same CPU as long as it's busy. This doesn't give the opportunity for changes in irq affinities to take effect. Fix that by calling napi_complete_done() upon a change in affinity. This would stop the NAPI and reschedule it on the new CPU. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-09-03 06:34:09 +03:00
Tariq Toukan	7b33aaeaae	net/mlx5e: Use kernel's mechanism to avoid missing NAPIs We used a channel state bit MLX5E_CHANNEL_NAPI_SCHED to make sure no NAPI is missed when a channel's napi_schedule() is called for completion events of the different channel's resources/rings while NAPI is currently running. Now, as similar mechanism is implemented in kernel, ("39e6c8208d7b net: solve a NAPI race"), we obsolete our own implementation and rely on the return value of napi_complete_done(). This patch removes a redundant overhead of atomic bit operations. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-09-03 06:34:09 +03:00
Tariq Toukan	29c2849e0d	net/mlx5e: Slightly increase RX page-cache size In XDP_TX flow, we now get back quicker to each page in page-cache, and on some occasions refcount does not get back to 1 on time, causing some costly page allocations. Slightly increase the size of RX page-cache to significantly decrease the chances for this to happen. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-09-03 06:34:09 +03:00
Tariq Toukan	70871f1ec4	net/mlx5e: Don't recycle page if moved to far NUMA Avoid recycling an RX page if it moved to another NUMA node. Add an ethtool counter to count such events. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-09-03 06:34:09 +03:00
Tariq Toukan	3b56f7b2af	net/mlx5e: Remove unnecessary fields in ICO SQ As of current design, in each NAPI, only a single UMR WQE completion could be available in the completion queue of the internal control operations (ICO) send queue, in addition to nop operations that require no actions upon completion. This renders the consume index obsolete, as the wqe_counter field in CQE is sufficient. This helps removing a memory barrier, and obsoletes the need for tracking the num_wqebbs to update the consumer counter. In addition, remove other unused fields in icosq struct: pdev, dma_fifo_pc, and prev_cc. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-09-03 06:34:09 +03:00
Tariq Toukan	7cc6d77bb5	net/mlx5e: Type-specific optimizations for RX post WQEs function Separate the RX post WQEs function of the different RQ types. This enables RQ type-specific optimizations in data-path. Poll the ICOSQ completion queue only for Striding RQ, and only when a UMR post completion could be possibly available. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-09-03 06:34:09 +03:00
Tariq Toukan	a071cb9f25	net/mlx5e: Non-atomic RQ state indicator for UMR WQE in progress The indication for a UMR WQE in progress is needed only within the NAPI context, hence no races are possible and there is no need for atomic operations. The only place the flag is read outside of NAPI context is in the closure flow, after the RQ is disabled, when the flag is no longer accessed in NAPI. Use a boolean instead of a bit in ring state, so that its non-atomic set operations do not race with the atomic sets of the other bits. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-09-03 06:34:09 +03:00
Tariq Toukan	a1eaba4c5c	net/mlx5e: Non-atomic indicator for ring enabled state Rings enabled state change occurs in control path only, and is always followed by a napi_synchronize(), so that following NAPIs read the new value. This read does not need to be atomic. The RQ auto-moderation bit is not set/cleared in data-path. No need for atomic read, a regular read operation is sufficient. In RQ creation time as well, there are no multiple threads trying to access it yet, hence a regular read can be used. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-09-03 06:34:09 +03:00
Tariq Toukan	604acb193b	net/mlx5e: Refactor data-path lro header function Refactor function mlx5e_lro_update_hdr() to reduce number of branches. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-09-03 06:34:09 +03:00
Tariq Toukan	4b7dfc9925	net/mlx5e: Early-return on empty completion queues NAPI context handles different kinds of completion queues (RX, TX, and others). Hence, upon a poll trial, some of them might be empty. Here we early-return upon empty completion queues, as well as full rx buffer, and save unnecessary logic and memory barriers. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-09-03 06:34:08 +03:00
Tariq Toukan	4cbb755801	net/mlx5e: NAPI busy-poll when UMR post is in progress If a UMR post is in progress, it means that there's a missing WQE in RQ, and that a completion will be shortly available in ICO SQ completion queue. Prefer busy-poll to handle it as soon as possible. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-09-03 06:34:08 +03:00
Tariq Toukan	4c2af5cc2b	net/mlx5e: Small enhancements for RX MPWQE allocation and free The dma offset of a MPWQE (Multi-Packet WQE) in memory region is fixed for all rounds. Calculate it once on creation time, instead of in runtime. This also obsoletes the wqe argument in the function. In addition, optimize dma_info iterator calculation. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-09-03 06:34:08 +03:00
Tariq Toukan	9bafe2adab	net/mlx5e: Use memset to init skbs_frags array to zeros In RX data-path, use memset() instead of loop assignment to init the whole skbs_frags array. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-09-03 06:34:08 +03:00
Tariq Toukan	b681c481f1	net/mlx5e: Remove unnecessary wqe_sz field from RQ buffer Field is used only locally within the RQ create function. The use of a local variable is sufficient. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-09-03 06:34:08 +03:00
Tariq Toukan	89e89f7a9f	net/mlx5e: Replace multiplication by stride size with a shift In RX data-path, use shift operations instead of a regular multiplication by stride size, as it is a power of two. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-09-03 06:34:08 +03:00
Tariq Toukan	b45d8b50b8	net/mlx5e: Reorganize struct mlx5e_rq Bring fast-path fields together, and combine RX WQE mutual exclusive fields into a union. Page-reuse and XDP are mutually exclusive and cannot be used at the same time. Use a union to combine their footprints. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-09-03 06:34:08 +03:00
Haiyang Zhang	db3cd7af9d	hv_netvsc: Fix the channel limit in netvsc_set_rxfh() The limit of setting receive indirection table value should be the current number of channels, not the VRSS_CHANNEL_MAX. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-01 20:39:12 -07:00
Haiyang Zhang	06be580ac7	hv_netvsc: Simplify the limit check in netvsc_set_channels() Because of the following code, net->num_tx_queues is equal to VRSS_CHANNEL_MAX, and max_chn is less than or equal to VRSS_CHANNEL_MAX. netvsc_drv.c: alloc_etherdev_mq(sizeof(struct net_device_context), VRSS_CHANNEL_MAX); rndis_filter.c: net_device->max_chn = min_t(u32, VRSS_CHANNEL_MAX, num_possible_rss_qs); So this patch removes the unnecessary limit check before comparing with "max_chn". Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-01 20:39:12 -07:00
Haiyang Zhang	5c4217d05d	hv_netvsc: Simplify num_chn checking in rndis_filter_device_add() The minus one and assignment to a local variable is not necessary. This patch simplifies it. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-01 20:39:12 -07:00
Haiyang Zhang	715e2ec532	hv_netvsc: Clean up an unused parameter in rndis_filter_set_rss_param() This patch removes the parameter, num_queue in rndis_filter_set_rss_param(), which is no longer in use. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-01 20:39:12 -07:00
Stephen Hemminger	ec158f77de	netvsc: allow driver to be removed even if VF is present Even if a VF is attached, the netvsc driver module can still be removed; the driver just has to make sure to do the cleanup. Also, avoid extra rtnl round trip when calling unregister. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-01 20:31:19 -07:00
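
Commit d4b6c48800 describes spreading the RSS indirection table entries uniformly over all RX rings. As a rough illustration only, and not the mlx5e code itself, the idea can be sketched in plain C with a round-robin fill. The function name `rss_fill_uniform` and its parameters are hypothetical, introduced just for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the driver): fill an RSS
 * indirection table so that its entries are spread round-robin
 * over all RX rings, regardless of which NUMA node a ring is on.
 */
void rss_fill_uniform(uint32_t *indir, size_t indir_len, uint32_t num_rings)
{
    assert(num_rings > 0); /* a modulo by zero would be undefined */

    for (size_t i = 0; i < indir_len; i++)
        indir[i] = (uint32_t)(i % num_rings);
}
```

With a table of 8 entries and 3 rings, the entries cycle through rings 0, 1, 2, so every ring receives either two or three table slots.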
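
Commit 942e7e5fc1 fixes an allocation sized by a pointer rather than by the structure it points to. The sketch below shows that general class of bug and its usual fix in userspace C. The `struct spec_qps` layout and the `alloc_spec_qps` helper are assumptions made up for illustration and do not reproduce the mlx4 definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real structure; fields are illustrative. */
struct spec_qps {
    int qp0_tunnel;
    int qp0_proxy;
    int qp1_tunnel;
    int qp1_proxy;
};

/*
 * Allocate an array of n zeroed entries.
 *
 * The buggy form would be calloc(n, sizeof(qps)), which sizes each
 * element as a pointer. Using sizeof(*qps) ties the element size to
 * the pointed-to type, so it stays correct if the type changes.
 */
struct spec_qps *alloc_spec_qps(size_t n)
{
    struct spec_qps *qps;

    qps = calloc(n, sizeof(*qps));
    return qps; /* NULL on allocation failure; caller must free() */
}
```

Writing `sizeof(*ptr)` instead of naming the type is a common idiom for avoiding this mistake, since the expression cannot drift out of sync with the variable's declaration.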
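
Commit 89e89f7a9f replaces a multiplication by the stride size with a shift, relying on the stride size being a power of two. A minimal sketch of that transformation, assuming hypothetical helper names `stride_shift` and `stride_offset` that do not come from the driver:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute log2 of a power-of-two stride size. Intended to run once
 * at setup time so the data path only needs the shift amount.
 */
unsigned int stride_shift(uint32_t stride_sz)
{
    unsigned int shift = 0;

    /* The shift trick is only valid for non-zero powers of two. */
    assert(stride_sz != 0 && (stride_sz & (stride_sz - 1)) == 0);

    while ((UINT32_C(1) << shift) < stride_sz)
        shift++;
    return shift;
}

/* Byte offset of a stride: equivalent to stride_ix * stride_sz. */
uint32_t stride_offset(uint32_t stride_ix, unsigned int shift)
{
    return stride_ix << shift;
}
```

For a 64-byte stride the shift is 6, so stride index 5 maps to byte offset 320, the same result as `5 * 64`.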

70606 Commits