linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-15 08:14:15 +08:00

Author	SHA1	Message	Date
David S. Miller	38f75922a6	Merge branch 'mptcp-C-flag-and-fixes' Mat Martineau says: ==================== mptcp: Connection-time 'C' flag and two fixes Here are six more patches from the MPTCP tree. Most of them add support for the 'C' flag in the MPTCP connection-time option headers. This flag affects how the initial address and port are treated by each peer. Normally one peer may send MP_JOIN requests to the remote address and port that were used when initiating the MPTCP connection. The 'C' bit indicates that MP_JOINs should only be sent to remote addresses that have been advertised with ADD_ADDR. The other two patches are unrelated improvements. Patches 1-4: Add the 'C' flag feature, a sysctl to optionally enable it, and a selftest. Patch 5: Adjust rp_filter settings in a selftest. Patch 6: Improve rbuf cleanup for MPTCP sockets. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 14:36:01 -07:00
Paolo Abeni	fde56eea01	mptcp: refine mptcp_cleanup_rbuf The current cleanup rbuf tries a bit too hard to avoid acquiring the subflow socket lock. We may end-up delaying the needed ack, or skip acking a blocked subflow. Address the above extending the conditions used to trigger the cleanup to reflect more closely what TCP does and invoking tcp_cleanup_rbuf() on all the active subflows. Note that we can't replicate the exact tests implemented in tcp_cleanup_rbuf(), as MPTCP lacks some of the required info - e.g. ping-pong mode. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 14:36:01 -07:00
Yonglong Li	d8e336f77e	selftests: mptcp: turn rp_filter off on each NIC To turn rp_filter off we should: echo 0 > /proc/sys/net/ipv4/conf/default/rp_filter and echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter before NIC created. Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 14:36:01 -07:00
Geliang Tang	0cddb4a6f4	selftests: mptcp: add deny_join_id0 testcases This patch added a new argument '-d' for mptcp_join.sh script, to invoke the testcases for the MP_CAPABLE 'C' flag. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 14:36:01 -07:00
Geliang Tang	df377be387	mptcp: add deny_join_id0 in mptcp_options_received This patch added a new flag named deny_join_id0 in struct mptcp_options_received. Set it when MP_CAPABLE with the flag MPTCP_CAP_DENYJOIN_ID0 is received. Also add a new flag remote_deny_join_id0 in struct mptcp_pm_data. When the flag deny_join_id0 is set, set this remote_deny_join_id0 flag. In mptcp_pm_create_subflow_or_signal_addr, if the remote_deny_join_id0 flag is set, and the remote address id is zero, stop this connection. Suggested-by: Florian Westphal <fw@strlen.de> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 14:36:01 -07:00
Geliang Tang	bab6b88e05	mptcp: add allow_join_id0 in mptcp_out_options This patch defined a new flag MPTCP_CAP_DENY_JOIN_ID0 for the third bit, labeled "C" of the MP_CAPABLE option. Add a new flag allow_join_id0 in struct mptcp_out_options. If this flag is set, send out the MP_CAPABLE option with the flag MPTCP_CAP_DENY_JOIN_ID0. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 14:36:01 -07:00
Geliang Tang	d2f77960e5	mptcp: add sysctl allow_join_initial_addr_port This patch added a new sysctl, named allow_join_initial_addr_port, to control whether allow peers to send join requests to the IP address and port number used by the initial subflow. Suggested-by: Florian Westphal <fw@strlen.de> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 14:36:01 -07:00
David S. Miller	a432c771e2	Merge branch 'sctp-packetization-path-MTU' Xin Long says: ==================== sctp: implement RFC8899: Packetization Layer Path MTU Discovery for SCTP transport Overview(From RFC8899): In contrast to PMTUD, Packetization Layer Path MTU Discovery (PLPMTUD) [RFC4821] introduces a method that does not rely upon reception and validation of PTB messages. It is therefore more robust than Classical PMTUD. This has become the recommended approach for implementing discovery of the PMTU [BCP145]. It uses a general strategy in which the PL sends probe packets to search for the largest size of unfragmented datagram that can be sent over a network path. Probe packets are sent to explore using a larger packet size. If a probe packet is successfully delivered (as determined by the PL), then the PLPMTU is raised to the size of the successful probe. If a black hole is detected (e.g., where packets of size PLPMTU are consistently not received), the method reduces the PLPMTU. SCTP Probe Packets: As the RFC suggested, the probe packets consist of an SCTP common header followed by a HEARTBEAT chunk and a PAD chunk. The PAD chunk is used to control the length of the probe packet. The HEARTBEAT chunk is used to trigger the sending of a HEARTBEAT ACK chunk to confirm this probe on the HEARTBEAT sender. The HEARTBEAT chunk also carries a Heartbeat Information parameter that includes the probe size to help an implementation associate a HEARTBEAT ACK with the size of probe that was sent. The sender use the nonce and the probe size to verify the information returned. Detailed Implementation on SCTP: +------+ +------->\| Base \|-----------------+ Connectivity \| +------+ \| or BASE_PLPMTU \| \| \| confirmation failed \| \| v \| \| Connectivity +-------+ \| \| and BASE_PLPMTU \| Error \| \| \| confirmed +-------+ \| \| \| Consistent \| v \| connectivity Black Hole \| +--------+ \| and BASE_PLPMTU detected \| \| Search \|<---------------+ confirmed \| +--------+ \| ^ \| \| \| \| \| Raise \| \| Search \| timer \| \| algorithm \| expired \| \| completed \| \| \| \| \| v \| +-----------------+ +---\| Search Complete \| +-----------------+ When PLPMTUD is enabled, it's in Base state, and starts to probe with BASE_PLPMTU (1200). If this probe succeeds, it goes to Search state; If this probe fails, it goes to Error state under which pl.pmtu goes down to MIN_PLPMTU (512) and keeps probing with BASE_PLPMTU until it succeeds and goes to Search state. During the Search state, the probe size is growing by a Big step (32) every time when the last probe succeeds at the beginning. Once a probe (such as 1420) fails after trying MAX_PROBES (3) times, the probe_size goes back to the last one (1420 - 32 = 1388), meanwhile 'probe_high' is set to 1420 and the growing step becomes a Small one (4). Then the probe is continuing with a Small step grown each round. Until it gets the optimal size (such as 1400) when probe with its next probe size (1404) fails, it sync this size to pathmtu and goes to Complete state. In Complete state, it will only does a probe check for the pathmtu just set, if it fails, which means a Black Hole is detected and it goes back to Base state. If it succeeds, it goes back to Search state again, and probe is continuing with growing a Small step (1400 + 4). If this probe fails, probe_high is set and goes back to 1388 and then Complete state, which is kind of a loop normally. However if the env's pathmtu changes to a big size somehow, this probe will succeed and then probe continues with growing a Big step (1400 + 32) each round until another probe fails. PTB Messages Process: PLPMTUD doesn't rely on these package to find the pmtu, and shouldn't trust it either. When processing them, it only changes the probe_size to PL_PTB_SIZE(info - hlen) if 'pl.pmtu < PL_PTB_SIZE < the current probe_size' druing Search state. As this could help probe_size to get to the optimal size faster, for exmaple: pl.pmtu = 1388, probe_size = 1420, while the env's pathmtu = 1400. When probe_size is 1420, a Toobig packet with 1400 comes back. If probe size changes to use 1400, it will save quite a few rounds to get there. But of course after having this value, PLPMTUD will still verify it on its own before using it. Patches: - Patch 1-6: introduce some new constants/variables from the RFC, systcl and members in transport, APIs for the following patches, chunks and a timer for the probe sending and some codes for the probe receiving. - Patch 7-9: implement the state transition on the tx path, rx path and toobig ICMP packet processing. This is the main algorithm part. - Patch 10: activate this feature - Patch 11-14: improve the process for ICMP packets for SCTP over UDP, so that it can also be covered by this feature. Tests: - do sysctl and setsockopt tests for this feature's enabling and disabling. - get these pr_debug points for this feature by # cat /sys/kernel/debug/dynamic_debug/control \| grep PLP and enable them on kernel dynamic debug, then play with the pathmtu and check if the state transition and plpmtu change match the RFC. - do the above tests for SCTP over IPv4/IPv6 and SCTP over UDP. v1->v2: - See Patch 06/14. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 11:28:52 -07:00
Xin Long	9e47df005c	sctp: process sctp over udp icmp err on sctp side Previously, sctp over udp was using udp tunnel's icmp err process, which only does sk lookup on sctp side. However for sctp's icmp error process, there are more things to do, like syncing assoc pmtu/retransmit packets for toobig type err, and starting proto_unreach_timer for unreach type err etc. Now after adding PLPMTUD, which also requires to process toobig type err on sctp side. This patch is to process icmp err on sctp side by parsing the type/code/info in .encap_err_lookup and call sctp's icmp processing functions. Note as the 'redirect' err process needs to know the outer ip(v6) header's, we have to leave it to udp(v6)_err to handle it. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 11:28:52 -07:00
Xin Long	d83060759a	sctp: extract sctp_v4_err_handle function from sctp_v4_err This patch is to extract sctp_v4_err_handle() from sctp_v4_err() to only handle the icmp err after the sock lookup, and it also makes the code clearer. sctp_v4_err_handle() will be used in sctp over udp's err handling in the following patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 11:28:52 -07:00
Xin Long	f6549bd37b	sctp: extract sctp_v6_err_handle function from sctp_v6_err This patch is to extract sctp_v6_err_handle() from sctp_v6_err() to only handle the icmp err after the sock lookup, and it also makes the code clearer. sctp_v6_err_handle() will be used in sctp over udp's err handling in the following patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 11:28:52 -07:00
Xin Long	237a6a2e31	sctp: remove the unessessary hold for idev in sctp_v6_err Same as in tcp_v6_err() and __udp6_lib_err(), there's no need to hold idev in sctp_v6_err(), so just call __in6_dev_get() instead. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 11:28:52 -07:00
Xin Long	7307e4fa4d	sctp: enable PLPMTUD when the transport is ready sctp_transport_pl_reset() is called whenever any of these 3 members in transport is changed: - probe_interval - param_flags & SPP_PMTUD_ENABLE - state == ACTIVE If all are true, start the PLPMTUD when it's not yet started. If any of these is false, stop the PLPMTUD when it's already running. sctp_transport_pl_update() is called when the transport dst has changed. It will restart the PLPMTUD probe. Again, the pathmtu won't change but use the dst's mtu until the Search phase is done. Note that after using PLPMTUD, the pathmtu is only initialized with the dst mtu when the transport dst changes. At other time it is updated by pl.pmtu. So sctp_transport_pmtu_check() will be called only when PLPMTUD is disabled in sctp_packet_config(). After this patch, the PLPMTUD feature from RFC8899 will be activated and can be used by users. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 11:28:52 -07:00
Xin Long	8369640831	sctp: do state transition when receiving an icmp TOOBIG packet PLPMTUD will short-circuit the old process for icmp TOOBIG packets. This part is described in rfc8899#section-4.6.2 (PL_PTB_SIZE = PTB_SIZE - other_headers_len). Note that from rfc8899#section-5.2 State Machine, each case below is for some specific states only: a) PL_PTB_SIZE < MIN_PLPMTU \|\| PL_PTB_SIZE >= PROBED_SIZE, discard it, for any state b) MIN_PLPMTU < PL_PTB_SIZE < BASE_PLPMTU, Base -> Error, for Base state c) BASE_PLPMTU <= PL_PTB_SIZE < PLPMTU, Search -> Base or Complete -> Base, for Search and Complete states. d) PLPMTU < PL_PTB_SIZE < PROBED_SIZE, set pl.probe_size to PL_PTB_SIZE then verify it, for Search state. The most important one is case d), which will help find the optimal fast during searching. Like when pathmtu = 1392 for SCTP over IPv4, the search will be (20 is iphdr_len): 1. probe with 1200 - 20 2. probe with 1232 - 20 3. probe with 1264 - 20 ... 7. probe with 1388 - 20 8. probe with 1420 - 20 When sending the probe with 1420 - 20, TOOBIG may come with PL_PTB_SIZE = 1392 - 20. Then it matches case d), and saves some rounds to try with the 1392 - 20 probe. But of course, PLPMTUD doesn't trust TOOBIG packets, and it will go back to the common searching once the probe with the new size can't be verified. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 11:28:52 -07:00
Xin Long	b87641aff9	sctp: do state transition when a probe succeeds on HB ACK recv path As described in rfc8899#section-5.2, when a probe succeeds, there might be the following state transitions: - Base -> Search, occurs when probe succeeds with BASE_PLPMTU, pl.pmtu is not changing, pl.probe_size increases by SCTP_PL_BIG_STEP, - Error -> Search, occurs when probe succeeds with BASE_PLPMTU, pl.pmtu is changed from SCTP_MIN_PLPMTU to SCTP_BASE_PLPMTU, pl.probe_size increases by SCTP_PL_BIG_STEP. - Search -> Search Complete, occurs when probe succeeds with the probe size SCTP_MAX_PLPMTU less than pl.probe_high, pl.pmtu is not changing, but update pathmtu with it, pl.probe_size is set back to pl.pmtu to double check it. - Search Complete -> Search, occurs when probe succeeds with the probe size equal to pl.pmtu, pl.pmtu is not changing, pl.probe_size increases by SCTP_PL_MIN_STEP. So search process can be described as: 1. When it just enters 'Search' state, pathmtu is not updated with pl.pmtu, and probe_size increases by a big step (SCTP_PL_BIG_STEP) each round. 2. Until pl.probe_high is set when a probe fails, and probe_size decreases back to pl.pmtu, as described in the last patch. 3. When the probe with the new size succeeds, probe_size changes to increase by a small step (SCTP_PL_MIN_STEP) due to pl.probe_high is set. 4. Until probe_size is next to pl.probe_high, the searching finishes and it goes to 'Complete' state and updates pathmtu with pl.pmtu, and then probe_size is set to pl.pmtu to confirm by once more probe. 5. This probe occurs after "30 * probe_inteval", a much longer time than that in Search state. Once it is done it goes to 'Search' state again with probe_size increased by SCTP_PL_MIN_STEP. As we can see above, during the searching, pl.pmtu changes while pathmtu doesn't. pathmtu is only updated when the search finishes by which it gets an optimal value for it. A big step is used at the beginning until it gets close to the optimal value, then it changes to a small step until it has this optimal value. The small step is also used in 'Complete' until it goes to 'Search' state again and the probe with 'pmtu + the small step' succeeds, which means a higher size could be used. Then probe_size changes to increase by a big step again until it gets close to the next optimal value. Note that anytime when black hole is detected, it goes directly to 'Base' state with pl.pmtu set to SCTP_BASE_PLPMTU, as described in the last patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 11:28:52 -07:00
Xin Long	1dc68c1945	sctp: do state transition when PROBE_COUNT == MAX_PROBES on HB send path The state transition is described in rfc8899#section-5.2, PROBE_COUNT == MAX_PROBES means the probe fails for MAX times, and the state transition includes: - Base -> Error, occurs when BASE_PLPMTU Confirmation Fails, pl.pmtu is set to SCTP_MIN_PLPMTU, probe_size is still SCTP_BASE_PLPMTU; - Search -> Base, occurs when Black Hole Detected, pl.pmtu is set to SCTP_BASE_PLPMTU, probe_size is set back to SCTP_BASE_PLPMTU; - Search Complete -> Base, occurs when Black Hole Detected pl.pmtu is set to SCTP_BASE_PLPMTU, probe_size is set back to SCTP_BASE_PLPMTU; Note a black hole is encountered when a sender is unaware that packets are not being delivered to the destination endpoint. So it includes the probe failures with equal probe_size to pl.pmtu, and definitely not include that with greater probe_size than pl.pmtu. The later one is the normal probe failure where probe_size should decrease back to pl.pmtu and pl.probe_high is set. pl.probe_high would be used on HB ACK recv path in the next patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 11:28:52 -07:00
Xin Long	fe59379b9a	sctp: do the basic send and recv for PLPMTUD probe This patch does exactly what rfc8899#section-6.2.1.2 says: The SCTP sender needs to be able to determine the total size of a probe packet. The HEARTBEAT chunk could carry a Heartbeat Information parameter that includes, besides the information suggested in [RFC4960], the probe size to help an implementation associate a HEARTBEAT ACK with the size of probe that was sent. The sender could also use other methods, such as sending a nonce and verifying the information returned also contains the corresponding nonce. The length of the PAD chunk is computed by reducing the probing size by the size of the SCTP common header and the HEARTBEAT chunk. Note that HB ACK chunk will carry back whatever HB chunk carried, including the probe_size we put it in; We also check hbinfo->probe_size in the HB ACK against link->pl.probe_size to validate this HB ACK chunk. v1->v2: - Remove the unused 'sp' and add static for sctp_packet_bundle_pad(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 11:28:52 -07:00
Xin Long	92548ec2f1	sctp: add the probe timer in transport for PLPMTUD There are 3 timers described in rfc8899#section-5.1.1: PROBE_TIMER, PMTU_RAISE_TIMER, CONFIRMATION_TIMER This patches adds a 'probe_timer' in transport, and it works as either PROBE_TIMER or PMTU_RAISE_TIMER. At most time, it works as PROBE_TIMER and expires every a 'probe_interval' time to send the HB probe packet. When transport pl enters COMPLETE state, it works as PMTU_RAISE_TIMER and expires in 'probe_interval * 30' time to go back to SEARCH state and do searching again. SCTP HB is an acknowledged packet, CONFIRMATION_TIMER is not needed. The timer will start when transport pl enters BASE state and stop when it enters DISABLED state. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 11:28:52 -07:00
Xin Long	d9e2e410ae	sctp: add the constants/variables and states and some APIs for transport These are 4 constants described in rfc8899#section-5.1.2: MAX_PROBES, MIN_PLPMTU, MAX_PLPMTU, BASE_PLPMTU; And 2 variables described in rfc8899#section-5.1.3: PROBED_SIZE, PROBE_COUNT; And 5 states described in rfc8899#section-5.2: DISABLED, BASE, SEARCH, SEARCH_COMPLETE, ERROR; And these 4 APIs are used to reset/update PLPMTUD, check if PLPMTUD is enabled, and calculate the additional headers length for a transport. Note the member 'probe_high' in transport will be set to the probe size when a probe fails with this probe size in the next patches. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 11:28:52 -07:00
Xin Long	3190b649b4	sctp: add SCTP_PLPMTUD_PROBE_INTERVAL sockopt for sock/asoc/transport With this socket option, users can change probe_interval for a transport, asoc or sock after it's created. Note that if the change is for an asoc, also apply the change to each transport in this asoc. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 11:28:51 -07:00
Xin Long	d1e462a7a5	sctp: add probe_interval in sysctl and sock/asoc/transport PLPMTUD can be enabled by doing 'sysctl -w net.sctp.probe_interval=n'. 'n' is the interval for PLPMTUD probe timer in milliseconds, and it can't be less than 5000 if it's not 0. All asoc/transport's PLPMTUD in a new socket will be enabled by default. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 11:28:51 -07:00
Xin Long	745a32117b	sctp: add pad chunk and its make function and event table This chunk is defined in rfc4820#section-3, and used to pad an SCTP packet. The receiver must discard this chunk and continue processing the rest of the chunks in the packet. Add it now, as it will be bundled with a heartbeat chunk to probe pmtu in the following patches. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 11:28:51 -07:00
Lorenzo Bianconi	aff0824dc4	net: marvell: return csum computation result from mvneta_rx_csum/mvpp2_rx_csum This is a preliminary patch to add hw csum hint support to mvneta/mvpp2 xdp implementation Tested-by: Matteo Croce <mcroce@linux.microsoft.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:55:05 -07:00
David S. Miller	f84974e75f	Merge branch 'tc-testing-dnat-tuple-collision' Marcelo Ricardo Leitner says: ==================== tc-testing: add test for ct DNAT tuple collision That was fixed in `13c62f5371` ("net/sched: act_ct: handle DNAT tuple collision"). For that, it requires that tdc is able to send diverse packets with scapy, which is then done on the 2nd patch of this series. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:52:40 -07:00
Marcelo Ricardo Leitner	e469056413	tc-testing: add test for ct DNAT tuple collision When this test fails, /proc/net/nf_conntrack gets only 1 entry: ipv4 2 tcp 6 119 SYN_SENT src=10.0.0.10 dst=10.0.0.10 sport=5000 dport=10 [UNREPLIED] src=20.0.0.1 dst=10.0.0.10 sport=10 dport=5000 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2 When it works, it gets 2 entries: ipv4 2 tcp 6 119 SYN_SENT src=10.0.0.10 dst=10.0.0.20 sport=5000 dport=10 [UNREPLIED] src=20.0.0.1 dst=10.0.0.10 sport=10 dport=58203 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2 ipv4 2 tcp 6 119 SYN_SENT src=10.0.0.10 dst=10.0.0.10 sport=5000 dport=10 [UNREPLIED] src=20.0.0.1 dst=10.0.0.10 sport=10 dport=5000 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2 The missing entry is because the 2nd packet hits a tuple collusion and the conntrack entry doesn't get allocated. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:52:39 -07:00
Marcelo Ricardo Leitner	11f04de902	tc-testing: add support for sending various scapy packets It can be worth sending different scapy packets on a given test, as in the last patch of this series. For that, lets listify the scapy attribute and simply iterate over it. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:52:39 -07:00
Marcelo Ricardo Leitner	b4fd096cbb	tc-testing: fix list handling python lists don't have an 'add' method, but 'append'. Fixes: `14e5175e9e` ("tc-testing: introduce scapyPlugin for basic traffic") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:52:39 -07:00
Loic Poulain	1b134d8d75	MAINTAINERS: network: add entry for WWAN This patch adds maintainer info for drivers/net/wwan subdir, including WWAN core and drivers. Adding Sergey and myself as maintainers and Johannes as reviewer. Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:48:16 -07:00
Aaron Conole	c4ab7b56be	openvswitch: add trace points This makes openvswitch module use the event tracing framework to log the upcall interface and action execution pipeline. When using openvswitch as the packet forwarding engine, some types of debugging are made possible simply by using the ovs-vswitchd's ofproto/trace command. However, such a command has some limitations: 1. When trying to trace packets that go through the CT action, the state of the packet can't be determined, and probably would be potentially wrong. 2. Deducing problem packets can sometimes be difficult as well even if many of the flows are known 3. It's possible to use the openvswitch module even without the ovs-vswitchd (although, not common use). Introduce the event tracing points here to make it possible for working through these problems in kernel space. The style is copied from the mac80211 driver-trace / trace code for consistency - this creates some checkpatch splats, but the official 'guide' for adding tracepoints, as well as the existing examples all add the same splats so it seems acceptable. Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:47:32 -07:00
Dan Carpenter	b0e03950dd	stmmac: dwmac-loongson: fix uninitialized variable in loongson_dwmac_probe() The "mdio" variable is never set to false. Also it should be a bool type instead of int. Fixes: `30bba69d7d` ("stmmac: pci: Add dwmac support for Loongson") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:44:50 -07:00
David S. Miller	a4bdf76f54	Merge branch 'ethtool-eeprom' Ido Schimmel says: ==================== ethtool: Module EEPROM API improvements This patchset contains various improvements to recently introduced module EEPROM netlink API. Noticed these while adding module EEPROM write support. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:40:55 -07:00
Ido Schimmel	88f9a87afe	ethtool: Validate module EEPROM offset as part of policy Validate the offset to read from module EEPROM as part of the netlink policy and remove the corresponding check from the code. This also makes it possible to query the offset range from user space: $ genl ctrl policy name ethtool ... ID: 0x14 policy[32]:attr[2]: type=U32 range:[0,255] ... Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:40:54 -07:00
Ido Schimmel	0dc7dd02ba	ethtool: Validate module EEPROM length as part of policy Validate the number of bytes to read from the module EEPROM as part of the netlink policy and remove the corresponding check from the code. This also makes it possible to query the length range from user space: $ genl ctrl policy name ethtool ... ID: 0x14 policy[32]:attr[3]: type=U32 range:[1,128] ... Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:40:54 -07:00
Ido Schimmel	b8c48be23c	ethtool: Use kernel data types for internal EEPROM struct The struct is not visible to user space and therefore should not use the user visible data types. Instead, use internal data types like other structures in the file. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:40:54 -07:00
Ido Schimmel	37a025e839	ethtool: Document behavior when module EEPROM bank attribute is omitted The kernel assumes bank 0 when 'ETHTOOL_MSG_MODULE_EEPROM_GET' is sent without 'ETHTOOL_A_MODULE_EEPROM_BANK'. Document it as part of the interface documentation. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:40:54 -07:00
Ido Schimmel	f5fe211d13	ethtool: Decrease size of module EEPROM get policy array The 'ETHTOOL_A_MODULE_EEPROM_DATA' attribute is not part of the get request. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:40:54 -07:00
Ido Schimmel	913d026fbf	ethtool: Document correct attribute type 'ETHTOOL_A_MODULE_EEPROM_DATA' is a binary attribute, not a nested one. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:40:54 -07:00
Ido Schimmel	78c57f22e3	ethtool: Use correct command name in title The command is called 'ETHTOOL_MSG_MODULE_EEPROM_GET', not 'ETHTOOL_MSG_MODULE_EEPROM'. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:40:54 -07:00
gushengxian	98534fce52	bridge: cfm: remove redundant return Return statements are not needed in Void function. Signed-off-by: gushengxian <gushengxian@yulong.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:35:15 -07:00
Kees Cook	f2fcffe392	hv_netvsc: Avoid field-overflowing memcpy() In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memcpy(), memmove(), and memset(), avoid intentionally writing across neighboring fields. Add flexible array to represent start of buf_info, improving readability and avoid future warning where memcpy() thinks it is writing past the end of the structure. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:20:51 -07:00
Florian Fainelli	64a81b2448	net: dsa: b53: Create default VLAN entry explicitly In case CONFIG_VLAN_8021Q is not set, there will be no call down to the b53 driver to ensure that the default PVID VLAN entry will be configured with the appropriate untagged attribute towards the CPU port. We were implicitly relying on dsa_slave_vlan_rx_add_vid() to do that for us, instead make it explicit. Reported-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:19:39 -07:00
Kees Cook	ee8e7622e0	octeontx2-af: Avoid field-overflowing memcpy() In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memcpy(), memmove(), and memset(), avoid intentionally writing across neighboring fields. To avoid having memcpy() think a u64 "prof" is being written beyond, adjust the prof member type by adding struct nix_bandprof_s to the union to match the other structs. This silences the following future warning: In file included from ./include/linux/string.h:253, from ./include/linux/bitmap.h:10, from ./include/linux/cpumask.h:12, from ./arch/x86/include/asm/cpumask.h:5, from ./arch/x86/include/asm/msr.h:11, from ./arch/x86/include/asm/processor.h:22, from ./arch/x86/include/asm/timex.h:5, from ./include/linux/timex.h:65, from ./include/linux/time32.h:13, from ./include/linux/time.h:60, from ./include/linux/stat.h:19, from ./include/linux/module.h:13, from drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:11: In function '__fortify_memcpy_chk', inlined from '__fortify_memcpy' at ./include/linux/fortify-string.h:310:2, inlined from 'rvu_nix_blk_aq_enq_inst' at drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:910:5: ./include/linux/fortify-string.h:268:4: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); please use struct_group() [-Wattribute-warning] 268 \| __write_overflow_field(); \| ^~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c: ... else if (req->ctype == NIX_AQ_CTYPE_BANDPROF) memcpy(&rsp->prof, ctx, sizeof(struct nix_bandprof_s)); ... Signed-off-by: Kees Cook <keescook@chromium.org> Tested-by: Subbaraya Sundeep<sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:19:01 -07:00
David S. Miller	78c235f9ea	Merge branch 'wwan-link-creation-improvements' Sergey Ryazanov says: ==================== net: WWAN link creation improvements This series is intended to make the WWAN network links management easier for WWAN device drivers. The series begins with adding support for network links creation to the WWAN HW simulator to facilitate code testing. Then there are a couple of changes that prepe the WWAN core code for further modifications. The following patches (4-6) simplify driver unregistering procedures by performing the created links cleanup in the WWAN core. 7th patch is to avoid the odd hold of a driver module. Next patches (8th and 9th) make it easier for drivers to create a network interface for a default data channel. Finally, 10th patch adds support for reporting of data link (aka channel aka context) id to make user aware which network interface is bound to which WWAN device data channel. All core changes have been tested with the HW simulator. The MHI and IOSM drivers were only compile tested as I have no access to this hardware. So the coresponding patches require ACK from the driver authors. Changelog: v1 -> v2: * rebased on top of latest net-next * patch that reworks the creation of mhi_net default netdev was dropped; as Loic explained, this network device has different purpose depending on a driver mode; Loic has a plan to rework the mhi_net driver, so we will defer the default netdev creation reworkings * add a new patch that creates a default network interface for IOSM modems * 7th, 8th, 10th patches have a minor updates (see the patches for details) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:01:17 -07:00
Sergey Ryazanov	6994092403	wwan: core: add WWAN common private data for netdev The WWAN core not only multiplex the netdev configuration data, but process it too, and needs some space to store its private data associated with the netdev. Add a structure to keep common WWAN core data. The structure will be stored inside the netdev private data before WWAN driver private data and have a field to make it easier to access the driver data. Also add a helper function that simplifies drivers access to their data. At the moment we use the common WWAN private data to store the WWAN data link (channel) id at the time the link is created, and report it back to user using the .fill_info() RTNL callback. This should help the user to be aware which network interface is bound to which WWAN device data channel. Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> CC: M Chetan Kumar <m.chetan.kumar@intel.com> CC: Intel Corporation <linuxwwan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:01:17 -07:00
Sergey Ryazanov	83068395bb	net: iosm: create default link via WWAN core Utilize the just introduced WWAN core feature to create a default netdev for the default data (IP MUX) channel. Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> CC: M Chetan Kumar <m.chetan.kumar@intel.com> CC: Intel Corporation <linuxwwan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:01:17 -07:00
Sergey Ryazanov	ca374290aa	wwan: core: support default netdev creation Most, if not each WWAN device driver will create a netdev for the default data channel. Therefore, add an option for the WWAN netdev ops registration function to create a default netdev for the WWAN device. A WWAN device driver should pass a default data channel link id to the ops registering function to request the creation of a default netdev, or a special value WWAN_NO_DEFAULT_LINK to inform the WWAN core that the default netdev should not be created. For now, only wwan_hwsim utilize the default link creation option. Other drivers will be reworked next. Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> CC: M Chetan Kumar <m.chetan.kumar@intel.com> CC: Intel Corporation <linuxwwan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:01:16 -07:00
Sergey Ryazanov	9f0248ea47	wwan: core: no more hold netdev ops owning module The WWAN netdev ops owner holding was used to protect from the unexpected memory disappear. This approach causes a dependency cycle (driver -> core -> driver) and effectively prevents a WWAN driver unloading. E.g. WWAN hwsim could not be unloaded until all simulated devices are removed: ~# modprobe wwan_hwsim devices=2 ~# lsmod \| grep wwan wwan_hwsim 16384 2 wwan 20480 1 wwan_hwsim ~# rmmod wwan_hwsim rmmod: ERROR: Module wwan_hwsim is in use ~# echo > /sys/kernel/debug/wwan_hwsim/hwsim0/destroy ~# echo > /sys/kernel/debug/wwan_hwsim/hwsim1/destroy ~# lsmod \| grep wwan wwan_hwsim 16384 0 wwan 20480 1 wwan_hwsim ~# rmmod wwan_hwsim For a real device driver this will cause an inability to unload module until a served device is physically detached. Since the last commit we are removing all child netdev(s) when a driver unregister the netdev ops. This allows us to permit the driver unloading, since any sane driver will call ops unregistering on a device deinitialization. So, remove the holding of an ops owner to make it easier to unload a driver module. The owner field has also beed removed from the ops structure as there are no more users of this field. Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:01:16 -07:00
Sergey Ryazanov	322a0ba99c	net: iosm: drop custom netdev(s) removing Since the last commit, the WWAN core will remove all our network interfaces for us at the time of the WWAN netdev ops unregistering. Therefore, we can safely drop the custom code that cleans the list of created netdevs. Anyway it no longer removes any netdev, since all netdevs were removed earlier in the wwan_unregister_ops() call. Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Reviewed-by: M Chetan Kumar <m.chetan.kumar@intel.com> CC: M Chetan Kumar <m.chetan.kumar@intel.com> CC: Intel Corporation <linuxwwan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:01:16 -07:00
Sergey Ryazanov	2f75238014	wwan: core: remove all netdevs on ops unregistering We use the ops owner module hold to protect against ops memory disappearing. But this approach does not protect us from a driver that unregisters ops but forgets to remove netdev(s) that were created using this ops. In such case, we are left with netdev(s), which can not be removed since ops is gone. Moreover, batch netdevs removing on deinitialization is a desireable option for WWAN drivers as it is a quite common task. Implement deletion of all created links on WWAN netdev ops unregistering in the same way that RTNL removes all links on RTNL ops unregistering. Simply remove all child netdevs of a device whose WWAN netdev ops is unregistering. This way we protecting the kernel from buggy drivers and make it easier to write a driver deinitialization code. Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:01:16 -07:00
Sergey Ryazanov	f492fccf3d	wwan: core: multiple netdevs deletion support Use unregister_netdevice_queue() instead of simple unregister_netdevice() if the WWAN netdev ops does not provide a dellink callback. This will help to accelerate deletion of multiple netdevs. Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-22 10:01:16 -07:00

... 2 3 4 5 6 ...

1017505 Commits