linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-13 14:24:11 +08:00

Author	SHA1	Message	Date
Bryan Whitehead	e0e587878f	lan743x: Remove MAC Reset from initialization The MAC Reset was noticed to erase important EEPROM settings. It is also unnecessary since a chip wide reset was done earlier in initialization, and that reset preserves EEPROM settings. There for this patch removes the unnecessary MAC specific reset. Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 15:48:11 -08:00
Michael S. Tsirkin	c5c08bed84	virtio: fix test build after uio.h change Fixes: `d38499530e` ("fs: decouple READ and WRITE from the block layer ops") Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-12-19 18:23:49 -05:00
David S. Miller	d9842f388b	mlx5-fixes-2018-12-19 Some fixes for the mlx5 driver -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJcGrikAAoJEEg/ir3gV/o+tSEH/A6192OXs+B6phGdMr6GyhbK nzw6bkJfjHLMDj00F+9qLYYQbqfG1QjDYZnthPCezD9Kv0thFuP8cvSSY1S6SsJq uMwlHwYQ4ljZvl49pkwiiKMufWGcuVYf+NiEmvrYVWXugej1wENe1xiTS9V3OCAA j9gQA7gYeEwveOLWLakwzbfvLJ+fyUkLDjB1EJPOzh2fnfkGl8Z8LovAmoXOywdn PdMh9xAlqovJyY9kS4vaj/1WDCO9+rUfGHXDLjjzG5ygsfipyb8xSdzsY9WgriMu NKgg1VpEjc9egalT9qqznUE7ilfBUvW++Yc0wsrH+Cme3Rj322L3Es+j5fduKGo= =YiD1 -----END PGP SIGNATURE----- Merge tag 'mlx5-fixes-2018-12-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-fixes-2018-12-19 Some fixes for the mlx5 driver ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 13:44:12 -08:00
David S. Miller	24894bc6ea	Merge branch 'neigh-get-support' Roopa Prabhu says: ==================== neigh get support This series adds support for neigh get similar to route and recently added fdb get. v2: fix key len check. and some other fixes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 13:37:34 -08:00
Roopa Prabhu	8deecf3557	selftests: rtnetlink.sh: add testcase for neigh get Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 13:37:34 -08:00
Roopa Prabhu	82cbb5c631	neighbour: register rtnl doit handler this patch registers neigh doit handler. The doit handler returns a neigh entry given dst and dev. This is similar to route and fdb doit (get) handlers. Also moves nda_policy declaration from rtnetlink.c to neighbour.c Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 13:37:34 -08:00
Alaa Hleihel	4765420439	net/mlx5e: Remove the false indication of software timestamping support mlx5 driver falsely advertises support of software timestamping. Fix it by removing the false indication. Fixes: `ef9814deaf` ("net/mlx5e: Add HW timestamping (TS) support") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-19 13:31:16 -08:00
Yuval Avnery	f033788914	net/mlx5: Typo fix in del_sw_hw_rule Expression terminated with "," instead of ";", resulted in set_fte getting bad value for modify_enable_mask field. Fixes: `bd5251dbf1` ("net/mlx5_core: Introduce flow steering destination of type counter") Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-19 13:31:16 -08:00
Tariq Toukan	bfc698254b	net/mlx5e: RX, Fix wrong early return in receive queue poll When the completion queue of the RQ is empty, do not immediately return. If left-over decompressed CQEs (from the previous cycle) were processed, need to go to the finalization part of the poll function. Bug exists only when CQE compression is turned ON. This solves the following issue: mlx5_core 0000:82:00.1: mlx5_eq_int:544:(pid 0): CQ error on CQN 0xc08, syndrome 0x1 mlx5_core 0000:82:00.1 p4p2: mlx5e_cq_error_event: cqn=0x000c08 event=0x04 Fixes: `4b7dfc9925` ("net/mlx5e: Early-return on empty completion queues") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-19 13:31:16 -08:00
David S. Miller	4ab0edecaf	Merge branch 'mlxsw-Make-driver-more-robust' Ido Schimmel says: ==================== mlxsw: Make driver more robust In recent months we fixed several bugs in the driver that could have been avoided by re-evaluating some of the involved code paths and by introducing relevant and comprehensive test cases. This patchset tries to do that by introducing a set of small and mostly non-functional changes in addition to a new test. I have further improvements in mind, but they can be done in a different set. Patch #1 makes sure we correctly sanitize upper devices of a VLAN interface. Patch #2 removes an unexpected behavior from the driver, in which routes configured on a VLAN interface will cease being offloaded after certain operations. Patch #3 is a small cleanup. Patch #4 simplifies the driver by removing reference counting from VLAN entries configured on a port. Patches #5-#6 simplify linking/unlinking from a bridge, especially when LAG and VLAN devices are involved. They make both operations symmetric even when ports are unlinked from a bridged LAG device. Patch #7-#9 make router interface (RIF) deletion more robust by removing reliance on device chain to indicate whether a NETDEV_DOWN event in the inet{,6}addr notification chains should be processed. This is due to the fact that IP addresses can be flushed from a netdev after it was unlinked from its lower device. Patch #10 adds a new test to for valid and invalid configurations over mlxsw ports. Some of the test cases are derived from recent fixes. I expect that more test cases will be added over time. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:28:08 -08:00
Ido Schimmel	489c25f9a3	selftests: mlxsw: Add rtnetlink tests Add a new test that is focused on rtnetlink configuration. Its purpose is to test valid and invalid (as deemed by mlxsw) configurations and make sure that they succeed / fail without producing a trace. Some of the test cases are derived from recent fixes in order to make sure that the fixed bugs are not introduced again. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:28:07 -08:00
Ido Schimmel	b61cd7c6f9	mlxsw: spectrum_router: Hold a reference on RIF's netdev Previous patches tried to make RIF deletion more robust and avoid use-after-free situations. As another precaution, hold a reference on a RIF's netdev and release it when the RIF is deleted. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:28:07 -08:00
Ido Schimmel	965fa8e600	mlxsw: spectrum_router: Make RIF deletion more robust In the past we had multiple instances where RIFs were not properly deleted. One of the reasons for leaking a RIF was that at the time when IP addresses were flushed from the respective netdev (prompting the destruction of the RIF), the netdev was no longer a mlxsw upper. This caused the inet{,6}addr notification blocks to ignore the NETDEV_DOWN event and leak the RIF. Instead of checking whether the netdev is our upper when an IP address is removed, we can instead check if the netdev has a RIF configured. To look up a RIF we need to access mlxsw private data, so the patch stores the notification blocks inside a mlxsw struct. This then allows us to use container_of() and extract the required private data. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:28:07 -08:00
Ido Schimmel	21ffedb6db	mlxsw: spectrum_router: Propagate 'struct mlxsw_sp' further Next patch is going to make RIF deletion more robust by removing reliance on fragile mlxsw_sp_lower_get(). This is because a netdev is not necessarily our upper anymore when its IP addresses are flushed. The inet{,6}addr notification blocks are going to resolve 'struct mlxsw_sp' using container_of(), but the functions they call still use mlxsw_sp_lower_get(). As a preparation for the next patch, propagate 'struct mlxsw_sp' down to the functions called from the notification blocks and remove reliance on mlxsw_sp_lower_get(). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:28:07 -08:00
Ido Schimmel	be2d6f421f	mlxsw: spectrum: Properly cleanup LAG uppers when removing port from LAG When a LAG device or a VLAN device on top of it is enslaved to a bridge, the driver propagates the CHANGEUPPER event to the LAG's slaves. This causes each physical port to increase the reference count of the internal representation of the bridge port by calling mlxsw_sp_port_bridge_join(). However, when a port is removed from a LAG, the corresponding leave() function is not called and the reference count is not decremented. This leads to ugly hacks such as mlxsw_sp_bridge_port_should_destroy() that try to understand if the bridge port should be destroyed even when its reference count is not 0. Instead, make sure that when a port is unlinked from a LAG it would see the same events as if the LAG (or its uppers) were unlinked from a bridge. The above is achieved by walking the LAG's uppers when a port is unlinked and calling mlxsw_sp_port_bridge_leave() for each upper that is enslaved to a bridge. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:28:07 -08:00
Ido Schimmel	635c8c8bba	mlxsw: spectrum: Remove reference count from VLAN entries Commit `b3529af6bb` ("spectrum: Reference count VLAN entries") started reference counting port-VLAN entries in a similar fashion to the 8021q driver. However, this is not actually needed and only complicates things. Instead, the driver should forbid the creation of a VLAN on a port if this VLAN already exists. This would also solve the issue fixed by the mentioned commit. Therefore, remove the get()/put() API and use create()/destroy() instead. One place that needs special attention is VLAN addition in a VLAN-aware bridge via switchdev operations. In case the VLAN flags (e.g., 'pvid') are toggled, then the VLAN entry already exists. To prevent the driver from wrongly returning EEXIST, the driver is changed to check in the prepare phase whether the entry already exists and only returns an error in case it is not associated with the correct bridge port. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:28:07 -08:00
Ido Schimmel	e149113a74	mlxsw: spectrum: Handle VLAN device unlinking In commit `993107fea5` ("mlxsw: spectrum_switchdev: Fix VLAN device deletion via ioctl") I fixed a bug caused by the fact that the driver views differently the deletion of a VLAN device when it is deleted via an ioctl and netlink. Instead of relying on a specific order of events (device being unregistered vs. VLAN filter being updated), simply make sure that the driver performs the necessary cleanup when the VLAN device is unlinked, which always happens before the other two events. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:28:07 -08:00
Ido Schimmel	f1d7c33d6a	mlxsw: spectrum_fid: Remove unused function This function is no longer used. Remove it. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:28:07 -08:00
Ido Schimmel	32fd4b49a3	mlxsw: spectrum_router: Do not destroy RIFs based on FID's reference count Currently, when a RIF is constructed on top of a FID, the RIF increments the FID's reference count and the RIF is destroyed when the FID's reference count drops to 1. This effectively means that when no local ports are member in the FID, the FID is destroyed regardless if the router port is a member in the FID or not. The above can lead to the unexpected behavior in which routes using a VLAN interface as their nexthop device are no longer offloaded after the last local port leaves the corresponding VLAN (FID). Example: # ip -4 route show dev br0.10 192.0.2.0/24 proto kernel scope link src 192.0.2.1 offload # bridge vlan del vid 10 dev swp3 # ip -4 route show dev br0.10 192.0.2.0/24 proto kernel scope link src 192.0.2.1 After the patch, the route is offloaded before and after the VLAN is removed from local port 'swp3', as the RIF corresponding to 'br0.10' continues to exists. In order to remove RIFs' reliance on the underlying FID's reference count, we need to add a reference count to sub-port RIFs, which are RIFs that correspond to physical ports and their uppers (e.g., LAG devices). In this case, each {Port, VID} ('struct mlxsw_sp_port_vlan') needs to hold a reference on the RIF. For example: bond0.10 \| bond0 \| +-------+ \| \| swp1 swp2 Both {Port 1, VID 10} and {Port 2, VID 10} will hold a reference on the RIF corresponding to 'bond0.10'. When the last reference is dropped, the RIF will be destroyed. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:28:07 -08:00
Ido Schimmel	927d0ef10a	mlxsw: spectrum: Sanitize VLAN interface's uppers Currently, only VRF and macvlan uppers are supported on top of VLAN device configured over a bridge, so make sure the driver forbids other uppers. Note that enslavement to a VRF is handled earlier in the notification block, so there is no need to check for a VRF upper here. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:28:07 -08:00
Cong Wang	fb24274546	ipv6: explicitly initialize udp6_addr in udp_sock_create6() syzbot reported the use of uninitialized udp6_addr::sin6_scope_id. We can just set ::sin6_scope_id to zero, as tunnels are unlikely to use an IPv6 address that needs a scope id and there is no interface to bind in this context. For net-next, it looks different as we have cfg->bind_ifindex there so we can probably call ipv6_iface_scope_id(). Same for ::sin6_flowinfo, tunnels don't use it. Fixes: `8024e02879` ("udp: Add udp_sock_create for UDP tunnels to open listener socket") Reported-by: syzbot+c56449ed3652e6720f30@syzkaller.appspotmail.com Cc: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:09:00 -08:00
Hoang Le	055722716c	tipc: fix uninitialized value for broadcast retransmission When sending broadcast message on high load system, there are a lot of unnecessary packets restranmission. That issue was caused by missing in initial criteria for retransmission. To prevent this happen, just initialize this criteria for retransmission in next 10 milliseconds. Fixes: `31c4f4cc32` ("tipc: improve broadcast retransmission algorithm") Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:53:10 -08:00
David S. Miller	013dc9d55c	Merge branch 'tipc-tracepoints' Tuong Lien says: ==================== tipc: tracepoints and trace_events in TIPC The patch series is the first step of introducing a tracing framework in TIPC, which will assist in collecting complete & plentiful data for post analysis, even in the case of a single failure occurrence e.g. when the failure is unreproducible. The tracing code in TIPC utilizes the powerful kernel tracepoints, trace events features along with particular dump functions to trace the TIPC object data and events (incl. bearer, link, socket, node, etc.). The tracing code should generate zero-load to TIPC when the trace events are not enabled. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:49:25 -08:00
Tuong Lien	cf5f55f7f0	tipc: add trace_events for tipc bearer The commit adds the new trace_event for TIPC bearer, L2 device event: trace_tipc_l2_device_event() Also, it puts the trace at the tipc_l2_device_event() function, then the device/bearer events and related info can be traced out during runtime when needed. Acked-by: Ying Xue <ying.xue@windriver.com> Tested-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:49:25 -08:00
Tuong Lien	eb18a510b5	tipc: add trace_events for tipc node The commit adds the new trace_events for TIPC node object: trace_tipc_node_create() trace_tipc_node_delete() trace_tipc_node_lost_contact() trace_tipc_node_timeout() trace_tipc_node_link_up() trace_tipc_node_link_down() trace_tipc_node_reset_links() trace_tipc_node_fsm_evt() trace_tipc_node_check_state() Also, enables the traces for the following cases: - When a node is created/deleted; - When a node contact is lost; - When a node timer is timed out; - When a node link is up/down; - When all node links are reset; - When node state is changed; - When a skb comes and node state needs to be checked/updated. Acked-by: Ying Xue <ying.xue@windriver.com> Tested-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:49:24 -08:00
Tuong Lien	01e661ebfb	tipc: add trace_events for tipc socket The commit adds the new trace_events for TIPC socket object: trace_tipc_sk_create() trace_tipc_sk_poll() trace_tipc_sk_sendmsg() trace_tipc_sk_sendmcast() trace_tipc_sk_sendstream() trace_tipc_sk_filter_rcv() trace_tipc_sk_advance_rx() trace_tipc_sk_rej_msg() trace_tipc_sk_drop_msg() trace_tipc_sk_release() trace_tipc_sk_shutdown() trace_tipc_sk_overlimit1() trace_tipc_sk_overlimit2() Also, enables the traces for the following cases: - When user creates a TIPC socket; - When user calls poll() on TIPC socket; - When user sends a dgram/mcast/stream message. - When a message is put into the socket 'sk_receive_queue'; - When a message is released from the socket 'sk_receive_queue'; - When a message is rejected (e.g. due to no port, invalid, etc.); - When a message is dropped (e.g. due to wrong message type); - When socket is released; - When socket is shutdown; - When socket rcvq's allocation is overlimit (> 90%); - When socket rcvq + bklq's allocation is overlimit (> 90%); - When the 'TIPC_ERR_OVERLOAD/2' issue happens; Note: a) All the socket traces are designed to be able to trace on a specific socket by either using the 'event filtering' feature on a known socket 'portid' value or the sysctl file: /proc/sys/net/tipc/sk_filter The file determines a 'tuple' for what socket should be traced: (portid, sock type, name type, name lower, name upper) where: + 'portid' is the socket portid generated at socket creating, can be found in the trace outputs or the 'tipc socket list' command printouts; + 'sock type' is the socket type (1 = SOCK_TREAM, ...); + 'name type', 'name lower' and 'name upper' are the service name being connected to or published by the socket. Value '0' means 'ANY', the default tuple value is (0, 0, 0, 0, 0) i.e. the traces happen for every sockets with no filter. b) The 'tipc_sk_overlimit1/2' event is also a conditional trace_event which happens when the socket receive queue (and backlog queue) is about to be overloaded, when the queue allocation is > 90%. Then, when the trace is enabled, the last skbs leading to the TIPC_ERR_OVERLOAD/2 issue can be traced. The trace event is designed as an 'upper watermark' notification that the other traces (e.g. 'tipc_sk_advance_rx' vs 'tipc_sk_filter_rcv') or actions can be triggerred in the meanwhile to see what is going on with the socket queue. In addition, the 'trace_tipc_sk_dump()' is also placed at the 'TIPC_ERR_OVERLOAD/2' case, so the socket and last skb can be dumped for post-analysis. Acked-by: Ying Xue <ying.xue@windriver.com> Tested-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:49:24 -08:00
Tuong Lien	26574db0c1	tipc: add trace_events for tipc link The commit adds the new trace_events for TIPC link object: trace_tipc_link_timeout() trace_tipc_link_fsm() trace_tipc_link_reset() trace_tipc_link_too_silent() trace_tipc_link_retrans() trace_tipc_link_bc_ack() trace_tipc_link_conges() And the traces for PROTOCOL messages at building and receiving: trace_tipc_proto_build() trace_tipc_proto_rcv() Note: a) The 'tipc_link_too_silent' event will only happen when the 'silent_intv_cnt' is about to reach the 'abort_limit' value (and the event is enabled). The benefit for this kind of event is that we can get an early indication about TIPC link loss issue due to timeout, then can do some necessary actions for troubleshooting. For example: To trigger the 'tipc_proto_rcv' when the 'too_silent' event occurs: echo 'enable_event:tipc:tipc_proto_rcv' > \ events/tipc/tipc_link_too_silent/trigger And disable it when TIPC link is reset: echo 'disable_event:tipc:tipc_proto_rcv' > \ events/tipc/tipc_link_reset/trigger b) The 'tipc_link_retrans' or 'tipc_link_bc_ack' event is useful to trace TIPC retransmission issues. In addition, the commit adds the 'trace_tipc_list/link_dump()' at the 'retransmission failure' case. Then, if the issue occurs, the link 'transmq' along with the link data can be dumped for post-analysis. These dump events should be enabled by default since it will only take effect when the failure happens. The same approach is also applied for the faulty case that the validation of protocol message is failed. Acked-by: Ying Xue <ying.xue@windriver.com> Tested-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:49:24 -08:00
Tuong Lien	b4b9771bcb	tipc: enable tracepoints in tipc As for the sake of debugging/tracing, the commit enables tracepoints in TIPC along with some general trace_events as shown below. It also defines some 'tipc__dump()' functions that allow to dump TIPC object data whenever needed, that is, for general debug purposes, ie. not just for the trace_events. The following trace_events are now available: - trace_tipc_skb_dump(): allows to trace and dump TIPC msg & skb data, e.g. message type, user, droppable, skb truesize, cloned skb, etc. - trace_tipc_list_dump(): allows to trace and dump any TIPC buffers or queues, e.g. TIPC link transmq, socket receive queue, etc. - trace_tipc_sk_dump(): allows to trace and dump TIPC socket data, e.g. sk state, sk type, connection type, rmem_alloc, socket queues, etc. - trace_tipc_link_dump(): allows to trace and dump TIPC link data, e.g. link state, silent_intv_cnt, gap, bc_gap, link queues, etc. - trace_tipc_node_dump(): allows to trace and dump TIPC node data, e.g. node state, active links, capabilities, link entries, etc. How to use: Put the trace functions at any places where we want to dump TIPC data or events. Note: a) The dump functions will generate raw data only, that is, to offload the trace event's processing, it can require a tool or script to parse the data but this should be simple. b) The trace_tipc__dump() should be reserved for a failure cases only (e.g. the retransmission failure case) or where we do not expect to happen too often, then we can consider enabling these events by default since they will almost not take any effects under normal conditions, but once the rare condition or failure occurs, we get the dumped data fully for post-analysis. For other trace purposes, we can reuse these trace classes as template but different events. c) A trace_event is only effective when we enable it. To enable the TIPC trace_events, echo 1 to 'enable' files in the events/tipc/ directory in the 'debugfs' file system. Normally, they are located at: /sys/kernel/debug/tracing/events/tipc/ For example: To enable the tipc_link_dump event: echo 1 > /sys/kernel/debug/tracing/events/tipc/tipc_link_dump/enable To enable all the TIPC trace_events: echo 1 > /sys/kernel/debug/tracing/events/tipc/enable To collect the trace data: cat trace or cat trace_pipe > /trace.out & To disable all the TIPC trace_events: echo 0 > /sys/kernel/debug/tracing/events/tipc/enable To clear the trace buffer: echo > trace d) Like the other trace_events, the feature like 'filter' or 'trigger' is also usable for the tipc trace_events. For more details, have a look at: Documentation/trace/ftrace.txt MAINTAINERS \| add two new files 'trace.h' & 'trace.c' in tipc Acked-by: Ying Xue <ying.xue@windriver.com> Tested-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:49:24 -08:00
Michael Chan	84404d5fd5	bnxt_en: Fix ethtool self-test loopback. The current code has 2 problems. It assumes that the RX ring for the loopback packet is combined with the TX ring. This is not true if the ethtool channels are set to non-combined mode. The second problem is that it won't work on 57500 chips without adjusting the logic to get the proper completion ring (cpr) pointer. Fix both issues by locating the proper cpr pointer through the RX ring. Fixes: `e44758b78a` ("bnxt_en: Use bnxt_cp_ring_info struct pointer as parameter for RX path.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:31:09 -08:00
David S. Miller	4a54877ee7	Merge branch 'sk_buff-add-extension-infrastructure' Florian Westphal says: ==================== sk_buff: add extension infrastructure TL;DR: - objdiff shows no change if CONFIG_XFRM=n && BR_NETFILTER=n - small size reduction when one or both options are set - no changes in ipsec performance Changes since v1: - Allocate entire extension space from a kmem_cache. - Avoid atomic_dec_and_test operation on skb_ext_put() for refcnt == 1 case. (similar to kfree_skbmem() fclone_ref use). This adds an optional extension infrastructure, with ispec (xfrm) and bridge netfilter as first users. The third (future) user is Multipath TCP which is still out-of-tree. MPTCP needs to map logical mptcp sequence numbers to the tcp sequence numbers used by individual subflows. This DSS mapping is read/written from tcp option space on receive and written to tcp option space on transmitted tcp packets that are part of and MPTCP connection. Extending skb_shared_info or adding a private data field to skb fclones doesn't work for incoming skb, so a different DSS propagation method would be required for the receive side. mptcp has same requirements as secpath/bridge netfilter: 1. extension memory is released when the sk_buff is free'd. 2. data is shared after cloning an skb (clone inherits extension) 3. adding extension to an skb will COW the extension buffer if needed. Two new members are added to sk_buff: 1. 'active_extensions' byte (filling a hole), telling which extensions are available for this skb. This has two purposes. a) avoids the need to initialize the pointer. b) allows to "delete" an extension by clearing its bit value in ->active_extensions. While it would be possible to store the active_extensions byte in the extension struct instead of sk_buff, there is one problem with this: When an extension has to be disabled, we can always clear the bit in skb->active_extensions. But in case it would be stored in the extension buffer itself, we might have to COW it first, if we are dealing with a cloned skb. On kmalloc failure we would be unable to turn an extension off. 2. extension pointer, located at the end of the sk_buff. If the active_extensions byte is 0, the pointer is undefined, it is not initialized on skb allocation. This adds extra code to skb clone and free paths (to deal with refcount/free of extension area) but this replaces similar code that manages skb->nf_bridge and skb->sp structs in the followup patches of the series. It is possible to add support for extensions that are not preseved on clones/copies: 1. define a bitmask of all extensions that need copy/cow on clone 2. change __skb_ext_copy() to check ->active_extensions & SKB_EXT_PRESERVE_ON_CLONE 3. set clone->active_extensions to 0 if test is false. This isn't done here because all extensions that get added here need the copy/cow semantics. Last patch converts skb->sp, secpath information gets stored as new SKB_EXT_SEC_PATH, so the 'sp' pointer is removed from skbuff. Extra code added to skb clone and free paths (to deal with refcount/free of extension area) replaces the existing code that does the same for skb->nf_bridge and skb->secpath. I don't see any other in-tree users that could benefit from this infrastructure, it doesn't make sense to add an extension just for the sake of a single flag bit (like skb->nf_trace). Adding a new extension is a good fit if all of the following are true: 1. Data is related to the skb/packet aggregate 2. Data should be freed when the skb is free'd 3. Data is not going to be relevant/needed in normal case (udp, tcp, forwarding workloads, ...) 4. There are no fancy action(s) needed on clone/free, such as callbacks into kernel modules. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:21:45 -08:00
Florian Westphal	4165079ba3	net: switch secpath to use skb extension infrastructure Remove skb->sp and allocate secpath storage via extension infrastructure. This also reduces sk_buff by 8 bytes on x86_64. Total size of allyesconfig kernel is reduced slightly, as there is less inlined code (one conditional atomic op instead of two on skb_clone). No differences in throughput in following ipsec performance tests: - transport mode with aes on 10GB link - tunnel mode between two network namespaces with aes and null cipher Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:21:38 -08:00
Florian Westphal	a84e3f5333	xfrm: prefer secpath_set over secpath_dup secpath_set is a wrapper for secpath_dup that will not perform an allocation if the secpath attached to the skb has a reference count of one, i.e., it doesn't need to be COW'ed. Also, secpath_dup doesn't attach the secpath to the skb, it leaves this to the caller. Use secpath_set in places that immediately assign the return value to skb. This allows to remove skb->sp without touching these spots again. secpath_dup can eventually be removed in followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:21:38 -08:00
Florian Westphal	a053c86649	drivers: chelsio: use skb_sec_path helper reduce noise when skb->sp is removed later in the series. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:21:38 -08:00
Florian Westphal	26912e3756	xfrm: use secpath_exist where applicable Will reduce noise when skb->sp is removed later in this series. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:21:37 -08:00
Florian Westphal	56d1ac3260	drivers: net: netdevsim: use skb_sec_path helper ... so this won't have to be changed when skb->sp goes away. v2: no changes, preserve ack. Acked-by: Shannon Nelson <shannon.lee.nelson@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:21:37 -08:00
Florian Westphal	6362a6a040	drivers: net: ethernet: mellanox: use skb_sec_path helper Will avoid touching this when sp pointer is removed from sk_buff struct. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:21:37 -08:00
Florian Westphal	2fdb435bc0	drivers: net: intel: use secpath helpers in more places Use skb_sec_path and secpath_exists helpers where possible. This reduces noise in followup patch that removes skb->sp pointer. v2: no changes, preseve acks from v1. Acked-by: Shannon Nelson <shannon.lee.nelson@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:21:37 -08:00
Florian Westphal	2294be0f11	net: use skb_sec_path helper in more places skb_sec_path gains 'const' qualifier to avoid xt_policy.c: 'skb_sec_path' discards 'const' qualifier from pointer target type same reasoning as previous conversions: Won't need to touch these spots anymore when skb->sp is removed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:21:37 -08:00
Florian Westphal	7af8f4ca31	net: move secpath_exist helper to sk_buff.h Future patch will remove skb->sp pointer. To reduce noise in those patches, move existing helper to sk_buff and use it in more places to ease skb->sp replacement later. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:21:37 -08:00
Florian Westphal	0ca64da128	xfrm: change secpath_set to return secpath struct, not error value It can only return 0 (success) or -ENOMEM. Change return value to a pointer to secpath struct. This avoids direct access to skb->sp: err = secpath_set(skb); if (!err) .. skb->sp-> ... Becomes: sp = secpath_set(skb) if (!sp) .. sp-> .. This reduces noise in followup patch which is going to remove skb->sp. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:21:37 -08:00
Florian Westphal	de8bda1d22	net: convert bridge_nf to use skb extension infrastructure This converts the bridge netfilter (calling iptables hooks from bridge) facility to use the extension infrastructure. The bridge_nf specific hooks in skb clone and free paths are removed, they have been replaced by the skb_ext hooks that do the same as the bridge nf allocations hooks did. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:21:37 -08:00
Florian Westphal	df5042f4c5	sk_buff: add skb extension infrastructure This adds an optional extension infrastructure, with ispec (xfrm) and bridge netfilter as first users. objdiff shows no changes if kernel is built without xfrm and br_netfilter support. The third (planned future) user is Multipath TCP which is still out-of-tree. MPTCP needs to map logical mptcp sequence numbers to the tcp sequence numbers used by individual subflows. This DSS mapping is read/written from tcp option space on receive and written to tcp option space on transmitted tcp packets that are part of and MPTCP connection. Extending skb_shared_info or adding a private data field to skb fclones doesn't work for incoming skb, so a different DSS propagation method would be required for the receive side. mptcp has same requirements as secpath/bridge netfilter: 1. extension memory is released when the sk_buff is free'd. 2. data is shared after cloning an skb (clone inherits extension) 3. adding extension to an skb will COW the extension buffer if needed. The "MPTCP upstreaming" effort adds SKB_EXT_MPTCP extension to store the mapping for tx and rx processing. Two new members are added to sk_buff: 1. 'active_extensions' byte (filling a hole), telling which extensions are available for this skb. This has two purposes. a) avoids the need to initialize the pointer. b) allows to "delete" an extension by clearing its bit value in ->active_extensions. While it would be possible to store the active_extensions byte in the extension struct instead of sk_buff, there is one problem with this: When an extension has to be disabled, we can always clear the bit in skb->active_extensions. But in case it would be stored in the extension buffer itself, we might have to COW it first, if we are dealing with a cloned skb. On kmalloc failure we would be unable to turn an extension off. 2. extension pointer, located at the end of the sk_buff. If the active_extensions byte is 0, the pointer is undefined, it is not initialized on skb allocation. This adds extra code to skb clone and free paths (to deal with refcount/free of extension area) but this replaces similar code that manages skb->nf_bridge and skb->sp structs in the followup patches of the series. It is possible to add support for extensions that are not preseved on clones/copies. To do this, it would be needed to define a bitmask of all extensions that need copy/cow semantics, and change __skb_ext_copy() to check ->active_extensions & SKB_EXT_PRESERVE_ON_CLONE, then just set ->active_extensions to 0 on the new clone. This isn't done here because all extensions that get added here need the copy/cow semantics. v2: Allocate entire extension space using kmem_cache. Upside is that this allows better tracking of used memory, downside is that we will allocate more space than strictly needed in most cases (its unlikely that all extensions are active/needed at same time for same skb). The allocated memory (except the small extension header) is not cleared, so no additonal overhead aside from memory usage. Avoid atomic_dec_and_test operation on skb_ext_put() by using similar trick as kfree_skbmem() does with fclone_ref: If recount is 1, there is no concurrent user and we can free right away. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:21:37 -08:00
Florian Westphal	c4b0e771f9	netfilter: avoid using skb->nf_bridge directly This pointer is going to be removed soon, so use the existing helpers in more places to avoid noise when the removal happens. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:21:37 -08:00
David S. Miller	8239d57904	Merge branch 'dpaa2-eth-add-QBMAN-statistics' Ioana Ciornei says: ==================== dpaa2-eth: add QBMAN statistics This patch set adds ethtool statistics for pending frames/bytes in Rx/Tx conf FQs and number of buffers in pool. The first patch adds support for the query APIs in the DPIO driver while the latter actually exposes the statistics through ethtool. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 10:37:23 -08:00
Ioana Radulescu	610febc68a	dpaa2-eth: Add QBMAN related stats Add statistics for pending frames in Rx/Tx conf FQs and number of buffers in pool. Available through ethtool -S. Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com> Signed-off-by: Ioana ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 10:37:22 -08:00
Roy Pledge	e80081c34b	soc: fsl: dpio: Add BP and FQ query APIs Add FQ (Frame Queue) and BP (Buffer Pool) query APIs that users of QBMan can invoke to see the status of the queues and pools that they are using. Signed-off-by: Roy Pledge <roy.pledge@nxp.com> Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 10:37:22 -08:00
Raju Lakkaraju	7b98f63ea7	net: phy: mscc: Fix the VSC 8531/41 Chip Init sequence - Turn on Broadcast writes - UNH 1.8.1 clear bias for UNH 1000BT distortion - UNH 1.8.7 optimize pre-emphasis for 100BasTx UNH 100W fix - Enable Token-ring during 'Coma Mode' Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 10:33:25 -08:00
David S. Miller	912cb1d55c	Merge branch 'rds-fixes' Shamir Rabinovitch says: ==================== WARNING in rds_message_alloc_sgs This patch set fix google syzbot rds bug found in linux-next. The first patch solve the syzbot issue. The second patch fix issue mentioned by Leon Romanovsky that drivers should not call WARN_ON as result from user input. syzbot bug report can be foud here: https://lkml.org/lkml/2018/10/31/28 v1->v2: - patch 1: make rds_iov_vector fields name more descriptive (Hakon) - patch 1: fix potential mem leak in rds_rm_size if krealloc fail (Hakon) v2->v3: - patch 2: harden rds_sendmsg for invalid number of sgs (Gerd) v3->v4 - Santosh a.b. on both patches + repost to net-dev ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 10:27:58 -08:00
shamir rabinovitch	c75ab8a55a	net/rds: remove user triggered WARN_ON in rds_sendmsg per comment from Leon in rdma mailing list https://lkml.org/lkml/2018/10/31/312 : Please don't forget to remove user triggered WARN_ON. https://lwn.net/Articles/769365/ "Greg Kroah-Hartman raised the problem of core kernel API code that will use WARN_ON_ONCE() to complain about bad usage; that will not generate the desired result if WARN_ON_ONCE() is configured to crash the machine. He was told that the code should just call pr_warn() instead, and that the called function should return an error in such situations. It was generally agreed that any WARN_ON() or WARN_ON_ONCE() calls that can be triggered from user space need to be fixed." in addition harden rds_sendmsg to detect and overcome issues with invalid sg count and fail the sendmsg. Suggested-by: Leon Romanovsky <leon@kernel.org> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 10:27:58 -08:00
shamir rabinovitch	ea010070d0	net/rds: fix warn in rds_message_alloc_sgs redundant copy_from_user in rds_sendmsg system call expose rds to issue where rds_rdma_extra_size walk the rds iovec and and calculate the number pf pages (sgs) it need to add to the tail of rds message and later rds_cmsg_rdma_args copy the rds iovec again and re calculate the same number and get different result causing WARN_ON in rds_message_alloc_sgs. fix this by doing the copy_from_user only once per rds_sendmsg system call. When issue occur the below dump is seen: WARNING: CPU: 0 PID: 19789 at net/rds/message.c:316 rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 19789 Comm: syz-executor827 Not tainted 4.19.0-next-20181030+ #101 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x244/0x39d lib/dump_stack.c:113 panic+0x2ad/0x55c kernel/panic.c:188 __warn.cold.8+0x20/0x45 kernel/panic.c:540 report_bug+0x254/0x2d0 lib/bug.c:186 fixup_bug arch/x86/kernel/traps.c:178 [inline] do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271 do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969 RIP: 0010:rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316 Code: c0 74 04 3c 03 7e 6c 44 01 ab 78 01 00 00 e8 2b 9e 35 fa 4c 89 e0 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 14 9e 35 fa <0f> 0b 31 ff 44 89 ee e8 18 9f 35 fa 45 85 ed 75 1b e8 fe 9d 35 fa RSP: 0018:ffff8801c51b7460 EFLAGS: 00010293 RAX: ffff8801bc412080 RBX: ffff8801d7bf4040 RCX: ffffffff8749c9e6 RDX: 0000000000000000 RSI: ffffffff8749ca5c RDI: 0000000000000004 RBP: ffff8801c51b7490 R08: ffff8801bc412080 R09: ffffed003b5c5b67 R10: ffffed003b5c5b67 R11: ffff8801dae2db3b R12: 0000000000000000 R13: 000000000007165c R14: 000000000007165c R15: 0000000000000005 rds_cmsg_rdma_args+0x82d/0x1510 net/rds/rdma.c:623 rds_cmsg_send net/rds/send.c:971 [inline] rds_sendmsg+0x19a2/0x3180 net/rds/send.c:1273 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:632 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2117 __sys_sendmsg+0x11d/0x280 net/socket.c:2155 __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x44a859 Code: e8 dc e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 6b cb fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f1d4710ada8 EFLAGS: 00000297 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000006dcc28 RCX: 000000000044a859 RDX: 0000000000000000 RSI: 0000000020001600 RDI: 0000000000000003 RBP: 00000000006dcc20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000297 R12: 00000000006dcc2c R13: 646e732f7665642f R14: 00007f1d4710b9c0 R15: 00000000006dcd2c Kernel Offset: disabled Rebooting in 86400 seconds.. Reported-by: syzbot+26de17458aeda9d305d8@syzkaller.appspotmail.com Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 10:27:58 -08:00

1 2 3 4 5 ...

800311 Commits