korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-29 22:14:41 +08:00

Author	SHA1	Message	Date
Heiner Kallweit	3eef868932	net: phy: simplify genphy_config_advert by using the linkmode_adv_to_xxx_t functions Using linkmode_adv_to_mii_adv_t and linkmode_adv_to_mii_ctrl1000_t allows simplifying the code. In addition, avoiding the conversion to the legacy u32 advertisement format allows removing the warning. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Suggested-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:24:32 -07:00
Jiri Pirko	150e8f8a1b	netdevsim: register couple of devlink params Register couple of devlink params, one generic, one driver-specific. Make the values available over debugfs. Example: $ echo "111" > /sys/bus/netdevsim/new_device $ devlink dev param netdevsim/netdevsim111: name max_macs type generic values: cmode driverinit value 32 name test1 type driver-specific values: cmode driverinit value true $ cat /sys/kernel/debug/netdevsim/netdevsim111/max_macs 32 $ cat /sys/kernel/debug/netdevsim/netdevsim111/test1 Y $ devlink dev param set netdevsim/netdevsim111 name max_macs cmode driverinit value 16 $ devlink dev param set netdevsim/netdevsim111 name test1 cmode driverinit value false $ devlink dev reload netdevsim/netdevsim111 $ cat /sys/kernel/debug/netdevsim/netdevsim111/max_macs 16 $ cat /sys/kernel/debug/netdevsim/netdevsim111/test1 Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:20:25 -07:00
David S. Miller	6e5ee48339	Merge branch 'drop_monitor-Capture-dropped-packets-and-metadata' Ido Schimmel says: ==================== drop_monitor: Capture dropped packets and metadata So far drop monitor supported only one mode of operation in which a summary of recent packet drops is periodically sent to user space as a netlink event. The event only includes the drop location (program counter) and number of drops in the last interval. While this mode of operation allows one to understand if the system is dropping packets, it is not sufficient if a more detailed analysis is required. Both the packet itself and related metadata are missing. This patchset extends drop monitor with another mode of operation where the packet - potentially truncated - and metadata (e.g., drop location, timestamp, netdev) are sent to user space as a netlink event. Thanks to the extensible nature of netlink, more metadata can be added in the future. To avoid performing expensive operations in the context in which kfree_skb() is called, the dropped skbs are cloned and queued on per-CPU skb drop list. The list is then processed in process context (using a workqueue), where the netlink messages are allocated, prepared and finally sent to user space. A follow-up patchset will integrate drop monitor with devlink and allow the latter to call into drop monitor to report hardware drops. In the future, XDP drops can be added as well, thereby making drop monitor the go-to netlink channel for diagnosing all packet drops. Example usage with patched dropwatch [1] can be found here [2]. Example dissection of drop monitor netlink events with patched wireshark [3] can be found here [4]. I will submit both changes upstream after the kernel changes are accepted. Another change worth making is adding a dropmon pseudo interface to libpcap, similar to the nflog interface [5]. This will allow users to specifically listen on dropmon traffic instead of capturing all netlink packets via the nlmon netdev. 
Patches #1-#5 prepare the code towards the actual changes in later patches. Patch #6 adds another mode of operation to drop monitor in which the dropped packet itself is notified to user space along with metadata. Patch #7 allows users to truncate reported packets to a specific length, in case only the headers are of interest. The original length of the packet is added as metadata to the netlink notification. Patch #8 allows users to query the current configuration of drop monitor (e.g., alert mode, truncation length). Patches #9-#10 allow users to tune the length of the per-CPU skb drop list according to their needs. Changes since v1 [6]: * Add skb protocol as metadata. This allows user space to correctly dissect the packet instead of blindly assuming it is an Ethernet packet Changes since RFC [7]: * Limit the length of the per-CPU skb drop list and make it configurable * Do not use the hysteresis timer in packet alert mode * Introduce alert mode operations in a separate patch and only then introduce the new alert mode * Use 'skb->skb_iif' instead of 'skb->dev' because the latter is inside a union with 'dev_scratch' and therefore not guaranteed to point to a valid netdev * Return '-EBUSY' instead of '-EOPNOTSUPP' when trying to configure drop monitor while it is monitoring * Did not change schedule_work() in favor of schedule_work_on() as I did not observe a change in number of tail drops [1] https://github.com/idosch/dropwatch/tree/packet-mode [2] https://gist.github.com/idosch/3d524b887e16bc11b4b19e25c23dcc23#file-gistfile1-txt [3] https://github.com/idosch/wireshark/tree/drop-monitor-v2 [4] https://gist.github.com/idosch/3d524b887e16bc11b4b19e25c23dcc23#file-gistfile2-txt [5] https://github.com/the-tcpdump-group/libpcap/blob/master/pcap-netfilter-linux.c [6] https://patchwork.ozlabs.org/cover/1143443/ [7] https://patchwork.ozlabs.org/cover/1135226/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:31 -07:00
Ido Schimmel	e9feb58020	drop_monitor: Expose tail drop counter Previous patch made the length of the per-CPU skb drop list configurable. Expose a counter that shows how many packets could not be enqueued to this list. This allows users to determine the desired queue length. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	30328d46af	drop_monitor: Make drop queue length configurable In packet alert mode, each CPU holds a list of dropped skbs that need to be processed in process context and sent to user space. To avoid exhausting the system's memory the maximum length of this queue is currently set to 1000. Allow users to tune the length of this queue according to their needs. The configured length is reported to user space when drop monitor configuration is queried. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	444be061d0	drop_monitor: Add a command to query current configuration Users should be able to query the current configuration of drop monitor before they start using it. Add a command to query the existing configuration which currently consists of alert mode and packet truncation length. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	57986617a7	drop_monitor: Allow truncation of dropped packets When sending dropped packets to user space it is not always necessary to copy the entire packet, as usually only the headers are of interest. Allow users to specify the truncation length and add the original length of the packet as additional metadata to the netlink message. By default no truncation is performed. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	ca30707dee	drop_monitor: Add packet alert mode So far drop monitor supported only one alert mode in which a summary of locations in which packets were recently dropped was sent to user space. This alert mode is sufficient in order to understand that packets were dropped, but lacks information to perform a more detailed analysis. Add a new alert mode in which the dropped packet itself is passed to user space along with metadata: The drop location (as program counter and resolved symbol), ingress netdevice and drop timestamp. More metadata can be added in the future. To avoid performing expensive operations in the context in which kfree_skb() is invoked (can be hard IRQ), the dropped skb is cloned and queued on per-CPU skb drop list. Then, in process context the netlink message is allocated, prepared and finally sent to user space. The per-CPU skb drop list is limited to 1000 skbs to prevent exhausting the system's memory. Subsequent patches will make this limit configurable and also add a counter that indicates how many skbs were tail dropped. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	28315f7999	drop_monitor: Add alert mode operations The next patch is going to add another alert mode in which the dropped packet is notified to user space, instead of only a summary of recent drops. Abstract the differences between the modes by adding alert mode operations. The operations are selected based on the currently configured mode and associated with the probes and the work item just before tracing starts. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	c5ab9b1c41	drop_monitor: Require CAP_NET_ADMIN for drop monitor configuration Currently, the configure command does not do anything but return an error. Subsequent patches will enable the command to change various configuration options such as alert mode and packet truncation. Similar to other netlink-based configuration channels, make sure only users with the CAP_NET_ADMIN capability set can execute this command. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	44075f5637	drop_monitor: Reset per-CPU data before starting to trace The function reset_per_cpu_data() allocates and prepares a new skb for the summary netlink alert message ('NET_DM_CMD_ALERT'). The new skb is stored in the per-CPU 'data' variable and the old is returned. The function is invoked during module initialization and from the workqueue, before an alert is sent. This means that it is possible to receive an alert with stale data, if we stopped tracing when the hysteresis timer ('data->send_timer') was pending. Instead of invoking the function during module initialization, invoke it just before we start tracing and ensure we get a fresh skb. This also allows us to remove the calls to initialize the timer and the work item from the module initialization path, since both could have been triggered by the error paths of reset_per_cpu_data(). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	70c69274f3	drop_monitor: Initialize timer and work item upon tracing enable The timer and work item are currently initialized once during module init, but subsequent patches will need to associate different functions with the work item, based on the configured alert mode. Allow subsequent patches to make that change by initializing and de-initializing these objects during tracing enable and disable. This also guarantees that once the request to disable tracing returns, no more netlink notifications will be generated. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	7c747838a5	drop_monitor: Split tracing enable / disable to different functions Subsequent patches will need to enable / disable tracing based on the configured alerting mode. Reduce the nesting level and prepare for the introduction of this functionality by splitting the tracing enable / disable operations into two different functions. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
David S. Miller	2cc2743d8f	Merge branch 'Networking-driver-debugfs-cleanups' Greg Kroah-Hartman says: ==================== Networking driver debugfs cleanups There is no need to test the result of any debugfs call anymore. The debugfs core warns the user if something fails, and the return value of a debugfs call can always be fed back into another debugfs call with no problems. Also, debugfs is for debugging, so if there are problems with debugfs (i.e. the system is out of memory) the rest of the kernel should not change behavior, so testing for debugfs calls is pointless and not the goal of debugfs at all. This series cleans up a lot of networking drivers and some wimax code that was calling debugfs and trying to do something with the return value that it didn't need to. Removing this logic makes the code smaller, easier to understand, and use less run-time memory in some cases, all good things. The series is against net-next, and the patches have no dependencies between them if they want to go through any random tree/order. Or, if wanted, I can take them through my driver-core tree where other debugfs cleanups are being slowly fed during major merge windows. v3: fix build warning in i2400m, I thought I had caught them all :( add acks from some reviewers v2: fix up build warnings, it's as if I never even built these. Ugh, so sorry for wasting people's time with the v1 series. I need to stop relying on 0-day as it isn't working well anymore :( ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:49 -07:00
Greg Kroah-Hartman	7e174a49bb	ieee802154: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Alexander Aring <alex.aring@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Harry Morris <h.morris@cascoda.com> Cc: linux-wpan@vger.kernel.org Cc: netdev@vger.kernel.org Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> Acked-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:48 -07:00
Greg Kroah-Hartman	35dc61ebfc	ixgbe: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:48 -07:00
Greg Kroah-Hartman	43c4eb0381	i40e: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:48 -07:00
Greg Kroah-Hartman	ecc5570751	fm10k: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:48 -07:00
Greg Kroah-Hartman	e6882aa623	mvpp2: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: "David S. Miller" <davem@davemloft.net> Cc: Maxime Chevallier <maxime.chevallier@bootlin.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nathan Huckleberry <nhuck@google.com> Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:48 -07:00
Greg Kroah-Hartman	2f62f8e6c3	skge: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Mirko Lindner <mlindner@marvell.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:48 -07:00
Greg Kroah-Hartman	687236b07a	qca: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: "David S. Miller" <davem@davemloft.net> Cc: Stefan Wahren <stefan.wahren@i2se.com> Cc: Michael Heimpold <michael.heimpold@i2se.com> Cc: Yangtao Li <tiny.windzz@gmail.com> Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:48 -07:00
Greg Kroah-Hartman	92aff5b467	dpaa2: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Because we don't care about the individual files, we can remove the stored dentry for the files, as they are not needed to be kept track of at all. Cc: Ioana Radulescu <ruxandra.radulescu@nxp.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:48 -07:00
Greg Kroah-Hartman	8d72ab119f	stmmac: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Because we don't care about the individual files, we can remove the stored dentry for the files, as they are not needed to be kept track of at all. Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Cc: Jose Abreu <joabreu@synopsys.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: netdev@vger.kernel.org Cc: linux-stm32@st-md-mailman.stormreply.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:48 -07:00
Greg Kroah-Hartman	16e9b481e9	nfp: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Edwin Peer <edwin.peer@netronome.com> Cc: Yangtao Li <tiny.windzz@gmail.com> Cc: Simon Horman <simon.horman@netronome.com> Cc: oss-drivers@netronome.com Cc: netdev@vger.kernel.org Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:48 -07:00
Greg Kroah-Hartman	11ab11e69d	hns3: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Yisen Zhuang <yisen.zhuang@huawei.com> Cc: Salil Mehta <salil.mehta@huawei.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:48 -07:00
Greg Kroah-Hartman	9dac1e8eea	cxgb4: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. If a debugfs call fails, it will properly warn in the syslog, there's no need for all individual drivers to also print a message, so that is one more reason to not care about checking the return values. Cc: Vishal Kulkarni <vishal@chelsio.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Casey Leedom <leedom@chelsio.com> Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:48 -07:00
Greg Kroah-Hartman	3a131e8504	bnxt: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. This cleans up a lot of unneeded code and logic around the debugfs files, making all of this much simpler and easier to understand. Cc: Michael Chan <michael.chan@broadcom.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:47 -07:00
Greg Kroah-Hartman	9e3926df87	xgbe: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. This cleans up a lot of unneeded code and logic around the debugfs files, making all of this much simpler and easier to understand. Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:47 -07:00
Greg Kroah-Hartman	9f818c8a73	mlx5: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. This cleans up a lot of unneeded code and logic around the debugfs files, making all of this much simpler and easier to understand as we don't need to keep the dentries saved anymore. Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:47 -07:00
Greg Kroah-Hartman	fedcc6da10	bonding: no need to print a message if debugfs_create_dir() fails The debugfs core now will print a message if this function fails, so don't duplicate that logic. Also, no need to change the code logic if the call fails either, as no debugfs calls should interrupt normal kernel code for any reason. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:47 -07:00
Greg Kroah-Hartman	a62052ba2a	wimax: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. This cleans up a lot of unneeded code and logic around the debugfs wimax files, making all of this much simpler and easier to understand. Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com> Cc: linux-wimax@intel.com Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:47 -07:00
David S. Miller	38b9e0f6d9	mlx5-updates-2019-08-09 This series includes updates to the mlx5 ethernet and core drivers: In the first 11 patches, Vlad submits part 2 of a 3-part series to allow TC flow handling for concurrent execution. 1) TC flow handling for concurrent execution (part 2) Vlad says: ========== Refactor data structures that are shared between flows in tc. Currently, all cls API hardware offloads driver callbacks require caller to hold rtnl lock when calling them. Cls API has already been updated to update software filters in parallel (on classifiers that support unlocked execution), however hardware offloads code still obtains rtnl lock before calling driver tc callbacks. This set implements support for unlocked execution of tc hairpin, mod_hdr and encap subsystem. The changes implemented in these subsystems are very similar in general. The main difference is that hairpin is accessed through mlx5e_tc_table (legacy mode), mod_hdr is accessed through both mlx5e_tc_table and mlx5_esw_offload (legacy and switchdev modes) and encap is only accessed through mlx5_esw_offload (switchdev mode). 1.1) Hairpin handling and structure mlx5e_hairpin_entry are refactored in the following way: - Hairpin structure is extended with an atomic reference counter. This approach allows looking up a hairpin entry and obtaining a reference to it under hairpin_tbl_lock protection, then continuing to use the entry unlocked (including provisioning to hardware). - To support unlocked provisioning of hairpin entry to hardware, the entry is extended with 'res_ready' completion and is inserted to hairpin_tbl before calling the firmware. With this approach any concurrent users that attempt to use the same hairpin entry wait for completion first to prevent access to entries that are not fully initialized. - Hairpin entry is extended with new flows_lock spinlock to protect the list when multiple concurrent tc instances update flows attached to the same hairpin entry. 
1.2) Modify header handling code and structure mlx5e_mod_hdr_entry are refactored in the following way: - Mod_hdr structure is extended with an atomic reference counter. This approach allows looking up a mod_hdr entry and obtaining a reference to it under mod_hdr_tbl_lock protection, then continuing to use the entry unlocked (including provisioning to hardware). - To support unlocked provisioning of mod_hdr entry to hardware, the entry is extended with 'res_ready' completion and is inserted to mod_hdr_tbl before calling the firmware. With this approach any concurrent users that attempt to use the same mod_hdr entry wait for completion first to prevent access to entries that are not fully initialized. - Mod_hdr entry is extended with new flows_lock spinlock to protect the list when multiple concurrent tc instances update flows attached to the same mod_hdr entry. 1.3) Encapsulation handling code and structure mlx5e_encap_entry are refactored in the following way: - Encap structure is extended with an atomic reference counter. This approach allows looking up an encap entry and obtaining a reference to it under encap_tbl_lock protection, then continuing to use the entry unlocked (including provisioning to hardware). - To support unlocked provisioning of encap entry to hardware, the entry is extended with 'res_ready' completion and is inserted to encap_tbl before calling the firmware. With this approach any concurrent users that attempt to use the same encap entry wait for completion first to prevent access to entries that are not fully initialized. - As a difference from approach used to refactor hairpin and mod_hdr, encap entry is not extended with any per-entry fine-grained lock. Instead, encap_tbl_lock is used to synchronize all operations on encap table and instances of mlx5e_encap_entry. This is necessary because a single flow can be attached to multiple encap entries simultaneously. 
During new flow creation or neigh update event all of encaps that flow is attached to must be accessed together as in atomic manner, which makes usage of per-entry lock infeasible. - Encap entry is extended with new flows_lock spinlock to protect the list when multiple concurrent tc instances update flows attached to the same encap entry. ========== 3) Parav improves the way port representors report their parent ID and port index. 4) Use refcount_t for refcount in vxlan data base from Chuhong Yuan -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl1N64MACgkQSD+KveBX +j4iZAf/cXbX7B6QamcslzKR0HXUWeFBxj+6xrohlB4g4jAr62FbcNWbNyho26Fy ePZB5J2P2yujR7a7aDpGwPUFw42kRzmg0uvKVGW95459hVwx7fXaOWX8b9qfF9DK KJdvxw5s/b92qFMXUp/0mUGOD7Md0Q1Dy07rL0T6mgQGp9iKfennhtgGPBjtEkec Y8BLtRB4ZX3X16sSEj0Zm3h7IojqXT/0mqqKXoXM2N+kGTmXWAcCTeFdAUh31BMf ddlgEJu9t2OtLjg0iVKiUKE4r52LjdlJTsnRM0SkkUPSzS/+vI8iUUgF8X/XoqNG PtncRsSOGiWl2EU2Tb4m5v3obIanfA== =HzrJ -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2019-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2019-08-09 This series includes update to mlx5 ethernet and core driver: In first #11 patches, Vlad submits part 2 of 3 part series to allow TC flow handling for concurrent execution. 1) TC flow handling for concurrent execution (part 2) Vald Says: ========== Refactor data structures that are shared between flows in tc. Currently, all cls API hardware offloads driver callbacks require caller to hold rtnl lock when calling them. Cls API has already been updated to update software filters in parallel (on classifiers that support unlocked execution), however hardware offloads code still obtains rtnl lock before calling driver tc callbacks. This set implements support for unlocked execution of tc hairpin, mod_hdr and encap subsystem. The changed implemented in these subsystems are very similar in general. 
The main difference is that hairpin is accessed through mlx5e_tc_table (legacy mode), mod_hdr is accessed through both mlx5e_tc_table and mlx5_esw_offload (legacy and switchdev modes) and encap is only accessed through mlx5_esw_offload (switchdev mode). 1.1) Hairpin handling and structure mlx5e_hairpin_entry refactored in following way: - Hairpin structure is extended with atomic reference counter. This approach allows to lookup of hairpin entry and obtain reference to it with hairpin_tbl_lock protection and then continue using the entry unlocked (including provisioning to hardware). - To support unlocked provisioning of hairpin entry to hardware, the entry is extended with 'res_ready' completion and is inserted to hairpin_tbl before calling the firmware. With this approach any concurrent users that attempt to use the same hairpin entry wait for completion first to prevent access to entries that are not fully initialized. - Hairpin entry is extended with new flows_lock spinlock to protect the list when multiple concurrent tc instances update flows attached to the same hairpin entry. 1.2) Modify header handling code and structure mlx5e_mod_hdr_entry are refactored in the following way: - Mod_hdr structure is extended with atomic reference counter. This approach allows to lookup of mod_hdr entry and obtain reference to it with mod_hdr_tbl_lock protection and then continue using the entry unlocked (including provisioning to hardware). - To support unlocked provisioning of mod_hdr entry to hardware, the entry is extended with 'res_ready' completion and is inserted to mod_hdr_tbl before calling the firmware. With this approach any concurrent users that attempt to use the same mod_hdr entry wait for completion first to prevent access to entries that are not fully initialized. - Mod_Hdr entry is extended with new flows_lock spinlock to protect the list when multiple concurrent tc instances update flows attached to the same mod_hdr entry. 
1.3) Encapsulation handling code and structure mlx5e_encap_entry are refactored in the following way: - Encap structure is extended with an atomic reference counter. This approach allows looking up an encap entry and obtaining a reference to it under encap_tbl_lock protection, and then continuing to use the entry unlocked (including provisioning to hardware). - To support unlocked provisioning of an encap entry to hardware, the entry is extended with a 'res_ready' completion and is inserted into encap_tbl before calling the firmware. With this approach, any concurrent users that attempt to use the same encap entry wait for completion first, which prevents access to entries that are not fully initialized. - As a difference from the approach used to refactor hairpin and mod_hdr, encap entry is not extended with any per-entry fine-grained lock. Instead, encap_tbl_lock is used to synchronize all operations on the encap table and instances of mlx5e_encap_entry. This is necessary because a single flow can be attached to multiple encap entries simultaneously. During new flow creation or a neigh update event, all of the encaps that the flow is attached to must be accessed together in an atomic manner, which makes usage of a per-entry lock infeasible. - Encap entry is extended with new flows_lock spinlock to protect the list when multiple concurrent tc instances update flows attached to the same encap entry. ========== 3) Parav improves the way port representors report their parent ID and port index. 4) Use refcount_t for refcount in vxlan data base from Chuhong Yuan ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 20:11:19 -07:00
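The pattern this merge message describes for hairpin, mod_hdr and encap entries (take a reference under the table lock, publish the entry before the slow firmware call, and make concurrent users wait on 'res_ready') can be sketched in user-space C. This is only an illustration under stated assumptions, not the mlx5 code: every sketch_* name is hypothetical, a pthread mutex and condition variable stand in for the kernel's struct completion, a plain linked list stands in for the hash table, and sketch_provision_hw() is a made-up placeholder for the firmware call.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for the kernel's 'struct completion'. */
struct sketch_completion {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

struct sketch_entry {
    int key;
    atomic_int refcnt;                  /* manages the entry's lifetime */
    int hw_status;                      /* valid only after res_ready completes */
    struct sketch_completion res_ready;
    struct sketch_entry *next;
};

static pthread_mutex_t tbl_lock = PTHREAD_MUTEX_INITIALIZER;
static struct sketch_entry *tbl_head;   /* protected by tbl_lock */

static void sketch_complete_all(struct sketch_completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static void sketch_wait_for_completion(struct sketch_completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Hypothetical placeholder for the slow firmware call: fails for negative keys. */
static int sketch_provision_hw(int key)
{
    return key < 0 ? -1 : 0;
}

/* Drop a reference; the last put unlinks the entry from the table and frees it. */
static void sketch_entry_put(struct sketch_entry *e)
{
    pthread_mutex_lock(&tbl_lock);
    if (atomic_fetch_sub(&e->refcnt, 1) != 1) {
        pthread_mutex_unlock(&tbl_lock);
        return;
    }
    for (struct sketch_entry **pp = &tbl_head; *pp; pp = &(*pp)->next) {
        if (*pp == e) {
            *pp = e->next;
            break;
        }
    }
    pthread_mutex_unlock(&tbl_lock);
    pthread_cond_destroy(&e->res_ready.cond);
    pthread_mutex_destroy(&e->res_ready.lock);
    free(e);
}

/* Look up or create an entry; returns NULL if allocation or provisioning failed. */
static struct sketch_entry *sketch_entry_get(int key)
{
    struct sketch_entry *e;

    pthread_mutex_lock(&tbl_lock);
    for (e = tbl_head; e; e = e->next)
        if (e->key == key)
            break;
    if (!e) {
        e = calloc(1, sizeof(*e));
        if (!e) {
            pthread_mutex_unlock(&tbl_lock);
            return NULL;
        }
        e->key = key;
        atomic_init(&e->refcnt, 1);
        pthread_mutex_init(&e->res_ready.lock, NULL);
        pthread_cond_init(&e->res_ready.cond, NULL);
        /* Publish the entry before the slow call so concurrent users find it. */
        e->next = tbl_head;
        tbl_head = e;
        pthread_mutex_unlock(&tbl_lock);
        e->hw_status = sketch_provision_hw(key);    /* runs outside tbl_lock */
        sketch_complete_all(&e->res_ready);
    } else {
        /* Existing entry: take a reference under the table lock, then wait unlocked. */
        atomic_fetch_add(&e->refcnt, 1);
        pthread_mutex_unlock(&tbl_lock);
        sketch_wait_for_completion(&e->res_ready);
    }
    if (e->hw_status < 0) {
        sketch_entry_put(e);    /* never hand out an entry that failed to initialize */
        return NULL;
    }
    return e;
}
```

A caller pairs each successful sketch_entry_get() with one sketch_entry_put(). Because a waiter holds its own reference, an entry whose provisioning failed is freed only after every waiter has seen the error and dropped that reference.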
Roman Mashak	62ad42ec9c	tc-testing: added tdc tests for matchall filter Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 19:59:13 -07:00
David Ahern	f887427b2c	selftests: Fix detection of nettest command in fcnal-test Most of the tests run by fcnal-test.sh rely on the nettest command. Rather than trying to cover all of the individual tests, check for the binary only at the beginning. Also removes the need for log_error which is undefined. Fixes: `6f9d5cacfe` ("selftests: Setup for functional tests for fib and socket lookups") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 19:54:14 -07:00
Chuhong Yuan	b51c225e6c	net/mlx5e: Use refcount_t for refcount refcount_t is better for reference counters since its implementation can prevent overflows. So convert atomic_t ref counters to refcount_t. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-08-09 14:54:11 -07:00
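To illustrate why an overflow-safe counter is preferred over a bare atomic integer for reference counting, here is a minimal user-space sketch of a saturating counter using C11 atomics. It is not the kernel's refcount_t implementation: the sketch_refcount_* names and the choice of UINT_MAX as the saturation value are assumptions made for this example only.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for an overflow-safe reference counter. */
struct sketch_refcount {
    atomic_uint refs;
};

/* Assumption for this sketch: UINT_MAX marks a saturated counter. */
#define SKETCH_REFCOUNT_SATURATED UINT_MAX

static void sketch_refcount_set(struct sketch_refcount *r, unsigned int n)
{
    atomic_store(&r->refs, n);
}

/* Take a reference; pin the counter at the saturation value instead of wrapping to 0. */
static void sketch_refcount_inc(struct sketch_refcount *r)
{
    unsigned int old = atomic_load(&r->refs);

    do {
        if (old == SKETCH_REFCOUNT_SATURATED)
            return;     /* saturated: leak the object rather than wrap around */
    } while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
}

/* Drop a reference; returns true only when the counter goes from 1 to 0. */
static bool sketch_refcount_dec_and_test(struct sketch_refcount *r)
{
    unsigned int old = atomic_load(&r->refs);

    do {
        if (old == SKETCH_REFCOUNT_SATURATED || old == 0)
            return false;   /* saturated or already zero: never report "last reference" */
    } while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
    return old == 1;
}
```

With a plain wrapping counter, enough increments would bring the value back to zero and a later decrement could free an object that is still in use. A saturated counter trades that use-after-free for a leak, which is the safer failure.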
Parav Pandit	c938451f6b	net/mlx5e: Use vhca_id in generating representor port_index It is desired to use unique port indices when multiple pci devices' devlink instance have the same switch-id. Make use of vhca-id to generate such unique devlink port indices. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-08-09 14:54:11 -07:00
Parav Pandit	724ee17912	net/mlx5e: Simplify querying port representor parent id System image GUID doesn't depend on eswitch switchdev mode. Hence, remove the check which simplifies the code. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-08-09 14:54:11 -07:00
Parav Pandit	ef2e4094e0	net/mlx5: E-switch, Removed unused hwid Currently mlx5_eswitch_rep stores same hw ID for all representors. However it is never used from this structure. It is always used from mlx5_vport. Hence, remove unused field. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-08-09 14:54:10 -07:00
Vlad Buslov	d589e785ba	net/mlx5e: Allow concurrent creation of encap entries Encap entries creation is fully synchronized by encap_tbl_lock. In order to allow concurrent allocation of hardware resources used to offload encapsulation, extend mlx5e_encap_entry with 'res_ready' completion. Move call to mlx5e_tc_tun_create_header_ipv{4|6}() out of encap_tbl_lock critical section. Modify code that attaches new flows to existing encap to wait for 'res_ready' completion before using the entry. Insert encap entry to table before provisioning it to hardware and modify all users of the encap table to verify that encap was fully initialized by checking completion result for non-zero value (and to wait for 'res_ready' completion, if necessary). Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-08-09 14:54:10 -07:00
Vlad Buslov	61086f3910	net/mlx5e: Protect encap hash table with mutex To remove dependency on rtnl lock, protect encap hash table from concurrent modifications with new "encap_tbl_lock" mutex. Use the mutex to protect internal encap entry state from concurrent modification. This is necessary because a flow can be attached to multiple encap entries simultaneously, which significantly complicates using finer grained per-entry lock. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-08-09 14:54:10 -07:00
Vlad Buslov	948993f2be	net/mlx5e: Extend encap entry with reference counter List of flows attached to encap entry is used as implicit reference counter (encap entry is deallocated when list becomes empty) and as a mechanism to obtain encap entry that flow is attached to (through list head). This is not safe when concurrent modification of list of flows attached to encap entry is possible. Proper atomic reference counter is required to support concurrent access. As a preparation for extending encap with reference counting, extract code that looks up and deletes encap entry into standalone put/get helpers. In order to remove this dependency on external locking, extend encap entry with reference counter to manage its lifetime and extend flow structure with direct pointer to encap entry that flow is attached to. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-08-09 14:54:10 -07:00
Vlad Buslov	a734d00717	net/mlx5e: Allow concurrent creation of mod_hdr entries Mod_hdr entries creation is fully synchronized by mod_hdr_tbl->lock. In order to allow concurrent allocation of hardware resources used to offload header rewrite, extend mlx5e_mod_hdr_entry with 'res_ready' completion. Move call to mlx5_modify_header_alloc() out of mod_hdr_tbl->lock critical section. Modify code that attaches new flows to existing mh to wait for 'res_ready' completion before using the entry. Insert mh to mod_hdr table before provisioning it to hardware and modify all users of mod_hdr table to verify that mh was fully initialized by checking completion result for negative value (and to wait for 'res_ready' completion, if necessary). Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-08-09 14:54:10 -07:00
Vlad Buslov	d2faae25c3	net/mlx5e: Protect mod_hdr hash table with mutex To remove dependency on rtnl lock, protect mod_hdr hash table from concurrent modifications with new mutex. Implement helper function to get flow namespace to prevent code duplication. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-08-09 14:54:09 -07:00
Vlad Buslov	83a52f0d52	net/mlx5e: Protect mod header entry flows list with spinlock To remove dependency on rtnl lock, extend mod header entry with spinlock and use it to protect list of flows attached to mod header entry from concurrent modifications. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-08-09 14:54:09 -07:00
Vlad Buslov	dd58edc328	net/mlx5e: Extend mod header entry with reference counter List of flows attached to mod header entry is used as implicit reference counter (mod header entry is deallocated when list becomes empty) and as a mechanism to obtain mod header entry that flow is attached to (through list head). This is not safe when concurrent modification of list of flows attached to mod header entry is possible. Proper atomic reference counter is required to support concurrent access. As a preparation for extending mod header with reference counting, extract code that looks up and deletes mod header entry into standalone put/get helpers. In order to remove this dependency on external locking, extend mod header entry with reference counter to manage its lifetime and extend flow structure with direct pointer to mod header entry that flow is attached to. To remove code duplication between legacy and switchdev mode implementations that both support mod_hdr functionality, store mod_hdr table in dedicated structure used by both fdb and kernel namespaces. New table structure is extended with table lock by one of the following patches in this series. Implement helper function to get correct mod_hdr table depending on flow namespace. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-08-09 14:54:09 -07:00
Vlad Buslov	db76ca2424	net/mlx5e: Allow concurrent creation of hairpin entries Hairpin entries creation is fully synchronized by hairpin_tbl_lock. In order to allow concurrent initialization of mlx5e_hairpin structure instances and provisioning of hairpin entries to hardware, extend mlx5e_hairpin_entry with 'res_ready' completion. Move call to mlx5e_hairpin_create() out of hairpin_tbl_lock critical section. Modify code that attaches new flows to existing hpe to wait for 'res_ready' completion before using the hpe. Insert hpe to hairpin table before provisioning it to hardware and modify all users of hairpin table to verify that hpe was fully initialized by checking hpe->hp pointer (and to wait for 'res_ready' completion, if necessary). Modify dead peer update event handling function to save hpe's to temporary list with their reference counter incremented. Wait for completion of hpe's in temporary list and update their 'peer_gone' flag outside of hairpin_tbl_lock critical section. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-08-09 14:54:09 -07:00
Vlad Buslov	b32accda8a	net/mlx5e: Protect hairpin hash table with mutex To remove dependency on rtnl lock, protect hairpin hash table from concurrent modifications with new "hairpin_tbl_lock" mutex. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-08-09 14:54:08 -07:00
Vlad Buslov	73edca736e	net/mlx5e: Protect hairpin entry flows list with spinlock To remove dependency on rtnl lock, extend hairpin entry with spinlock and use it to protect list of flows attached to hairpin entry from concurrent modifications. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-08-09 14:54:08 -07:00
Vlad Buslov	e4f9abbd38	net/mlx5e: Extend hairpin entry with reference counter List of flows attached to hairpin entry is used as implicit reference counter (hairpin entry is deallocated when list becomes empty) and as a mechanism to obtain hairpin entry that flow is attached to (through list head). This is not safe when concurrent modification of list of flows attached to hairpin entry is possible. Proper atomic reference counter is required to support concurrent access. As a preparation for extending hairpin with reference counting, extract code that deletes hairpin entry into standalone function. In order to remove this dependency on external locking, extend hairpin entry with reference counter to manage its lifetime and extend flow structure with direct pointer to hairpin entry that flow is attached to. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-08-09 14:54:08 -07:00
David S. Miller	f52ea3c55a	Merge branch 'hns3-next' Huazhong Tan says: ==================== net: hns3: add some bugfixes & optimizations & cleanups for HNS3 driver This patch-set includes code optimizations, bugfixes and cleanups for the HNS3 ethernet controller driver. [patch 01/12] fixes a GFP flag error. [patch 02/12] fixes a VF interrupt error. [patch 03/12] adds a cleanup for VLAN handling. [patch 04/12] fixes a bug in debugfs. [patch 05/12] modifies pause displaying format. [patch 06/12] adds more DFX information for ethtool -d. [patch 07/12] adds more TX statistics information. [patch 08/12] adds a check for TX BD number. [patch 09/12] adds a cleanup for dumping NCL_CONFIG. [patch 10/12] refines function for querying MAC pause statistics. [patch 11/12] adds a handshake with VF when doing PF reset. [patch 12/12] refines some macro definitions. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 13:44:33 -07:00


856809 Commits