linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2025-01-10 07:44:23 +08:00

Author	SHA1	Message	Date
Florian Westphal	a0a4de4d89	netfilter: remove NFPROTO_DECNET Decnet has been removed. so no need to reserve space in arrays for it. Signed-off-by: Florian Westphal <fw@strlen.de>	2022-09-07 16:46:03 +02:00
Florian Westphal	628d694344	netfilter: conntrack: reduce timeout when receiving out-of-window fin or rst In case the endpoints and conntrack go out-of-sync, i.e. there is disagreement wrt. validy of sequence/ack numbers between conntracks internal state and those of the endpoints, connections can hang for a long time (until ESTABLISHED timeout). This adds a check to detect a fin/fin exchange even if those are invalid. The timeout is then lowered to UNACKED (default 300s). Signed-off-by: Florian Westphal <fw@strlen.de>	2022-09-07 16:46:03 +02:00
Florian Westphal	09a59001b0	netfilter: conntrack: remove unneeded indent level After previous patch, the conditional branch is obsolete, reformat it. gcc generates same code as before this change. Signed-off-by: Florian Westphal <fw@strlen.de>	2022-09-07 16:46:03 +02:00
Liu Shixin	53fc01a0a8	net: sysctl: remove unused variable long_max The variable long_max is replaced by bpf_jit_limit_max and no longer be used. So remove it. No functional change. Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 15:31:19 +01:00
Lorenzo Bianconi	c9daab3223	net: ethernet: mtk_eth_soc: remove mtk_foe_entry_timestamp Get rid of mtk_foe_entry_timestamp routine since it is no longer used. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 15:30:13 +01:00
Lorenzo Bianconi	f27b405ef4	net: ethernet: mtk_eth_soc: check max allowed hash in mtk_ppe_check_skb Even if max hash configured in hw in mtk_ppe_hash_entry is MTK_PPE_ENTRIES - 1, check theoretical OOB accesses in mtk_ppe_check_skb routine Fixes: `c4f033d9e0` ("net: ethernet: mtk_eth_soc: rework hardware flow table management") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 15:29:40 +01:00
Menglong Dong	9cb252c4c1	net: skb: export skb drop reaons to user by TRACE_DEFINE_ENUM As Eric reported, the 'reason' field is not presented when trace the kfree_skb event by perf: $ perf record -e skb:kfree_skb -a sleep 10 $ perf script ip_defrag 14605 [021] 221.614303: skb:kfree_skb: skbaddr=0xffff9d2851242700 protocol=34525 location=0xffffffffa39346b1 reason: The cause seems to be passing kernel address directly to TP_printk(), which is not right. As the enum 'skb_drop_reason' is not exported to user space through TRACE_DEFINE_ENUM(), perf can't get the drop reason string from the 'reason' field, which is a number. Therefore, we introduce the macro DEFINE_DROP_REASON(), which is used to define the trace enum by TRACE_DEFINE_ENUM(). With the help of DEFINE_DROP_REASON(), now we can remove the auto-generate that we introduced in the commit `ec43908dd5` ("net: skb: use auto-generation to convert skb drop reason to string"), and define the string array 'drop_reasons'. Hmmmm...now we come back to the situation that have to maintain drop reasons in both enum skb_drop_reason and DEFINE_DROP_REASON. But they are both in dropreason.h, which makes it easier. After this commit, now the format of kfree_skb is like this: $ cat /tracing/events/skb/kfree_skb/format name: kfree_skb ID: 1524 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:void * skbaddr; offset:8; size:8; signed:0; field:void * location; offset:16; size:8; signed:0; field:unsigned short protocol; offset:24; size:2; signed:0; field:enum skb_drop_reason reason; offset:28; size:4; signed:0; print fmt: "skbaddr=%p protocol=%u location=%p reason: %s", REC->skbaddr, REC->protocol, REC->location, __print_symbolic(REC->reason, { 1, "NOT_SPECIFIED" }, { 2, "NO_SOCKET" } ...... Fixes: `ec43908dd5` ("net: skb: use auto-generation to convert skb drop reason to string") Link: https://lore.kernel.org/netdev/CANn89i+bx0ybvE55iMYf5GJM48WwV1HNpdm9Q6t-HaEstqpCSA@mail.gmail.com/ Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 15:28:08 +01:00
Lorenzo Bianconi	0e80707d94	net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clear Set ib1 state to MTK_FOE_STATE_UNBIND in __mtk_foe_entry_clear routine. Fixes: `33fc42de33` ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 15:25:03 +01:00
Florian Westphal	6e250dcbff	netfilter: conntrack: ignore overly delayed tcp packets If 'nf_conntrack_tcp_loose' is off (the default), tcp packets that are outside of the current window are marked as INVALID. nf/iptables rulesets often drop such packets via 'ct state invalid' or similar checks. For overly delayed acks, this can be a nuisance if such 'invalid' packets are also logged. Since they are not invalid in a strict sense, just ignore them, i.e. conntrack won't extend timeout or change state so that they do not match invalid state rules anymore. This also avoids unwantend connection stalls in case conntrack considers retransmission (of data that did not reach the peer) as too old. The else branch of the conditional becomes obsolete. Next patch will reformant the now always-true if condition. The existing workaround for data that exceeds the calculated receive window is adjusted to use the 'ignore' state so that these packets do not refresh the timeout or change state other than updating ->td_end. Signed-off-by: Florian Westphal <fw@strlen.de>	2022-09-07 15:43:51 +02:00
Florian Westphal	d9a6f0d0df	netfilter: conntrack: prepare tcp_in_window for ternary return value tcp_in_window returns true if the packet is in window and false if it is not. If its outside of window, packet will be treated as INVALID. There are corner cases where the packet should still be tracked, because rulesets may drop or log such packets, even though they can occur during normal operation, such as overly delayed acks. In extreme cases, connection may hang forever because conntrack state differs from real state. There is no retransmission for ACKs. In case of ACK loss after conntrack processing, its possible that a connection can be stuck because the actual retransmits are considered stale ("SEQ is under the lower bound (already ACKed data retransmitted)". The problem is made worse by carrier-grade-nat which can also result in stale packets from old connections to get treated as 'recent' packets in conntrack (it doesn't support tcp timestamps at this time). Prepare tcp_in_window() to return an enum that tells the desired action (in-window/accept, bogus/drop). A third action (accept the packet as in-window, but do not change state) is added in a followup patch. Signed-off-by: Florian Westphal <fw@strlen.de>	2022-09-07 15:43:51 +02:00
David S. Miller	016eb59012	Merge branch 'macsec-offload-mlx5' Saeed Mahameed says: ==================== Introduce MACsec skb_metadata_dst and mlx5 macsec offload v1->v2: - attach mlx5 implementation patches. This patchset introduces MACsec skb_metadata_dst to lay the ground for MACsec HW offload. MACsec is an IEEE standard (IEEE 802.1AE) for MAC security. It defines a way to establish a protocol independent connection between two hosts with data confidentiality, authenticity and/or integrity, using GCM-AES. MACsec operates on the Ethernet layer and as such is a layer 2 protocol, which means it’s designed to secure traffic within a layer 2 network, including DHCP or ARP requests. Linux has a software implementation of the MACsec standard and HW offloading support. The offloading is re-using the logic, netlink API and data structures of the existing MACsec software implementation. For Tx: In the current MACsec offload implementation, MACsec interfaces shares the same MAC address by default. Therefore, HW can't distinguish from which MACsec interface the traffic originated from. MACsec stack will use skb_metadata_dst to store the SCI value, which is unique per MACsec interface, skb_metadat_dst will be used later by the offloading device driver to associate the SKB with the corresponding offloaded interface (SCI) to facilitate HW MACsec offload. For Rx: Like in the Tx changes, if there are more than one MACsec device with the same MAC address as in the packet's destination MAC, the packet will be forward only to one of the devices and not neccessarly to the desired one. Offloading device driver sets the MACsec skb_metadata_dst sci field with the appropriaate Rx SCI for each SKB so the MACsec rx handler will know to which port to divert those skbs, instead of wrongly solely relaying on dst MAC address comparison. 1) patch 1,2, Add support to skb_metadata_dst in MACsec code: net/macsec: Add MACsec skb_metadata_dst Tx Data path support net/macsec: Add MACsec skb_metadata_dst Rx Data path support 2) patch 3, Move some MACsec driver code for sharing with various drivers that implements offload: net/macsec: Move some code for sharing with various drivers that implements offload 3) The rest of the patches introduce mlx5 implementation for macsec offloads TX and RX via steering tables. a) TX, intercept skbs with macsec offlad mark in skb_metadata_dst and mark the descriptor for offload. b) RX, intercept offloaded frames and prepare the proper skb_metadata_dst to mark offloaded rx frames. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 14:02:09 +01:00
Lior Nahmanson	99d4dc66c8	net/mlx5e: Add support to configure more than one macsec offload device Add the ability to add up to 16 MACsec offload interfaces over the same physical interface Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 14:02:09 +01:00
Lior Nahmanson	807a1b765b	net/mlx5e: Add MACsec stats support for Rx/Tx flows Add the following statistics: RX successfully decrypted MACsec packets: macsec_rx_pkts : Number of packets decrypted successfully macsec_rx_bytes : Number of bytes decrypted successfully Rx dropped MACsec packets: macsec_rx_pkts_drop : Number of MACsec packets dropped macsec_rx_bytes_drop : Number of MACsec bytes dropped TX successfully encrypted MACsec packets: macsec_tx_pkts : Number of packets encrypted/authenticated successfully macsec_tx_bytes : Number of bytes encrypted/authenticated successfully Tx dropped MACsec packets: macsec_tx_pkts_drop : Number of MACsec packets dropped macsec_tx_bytes_drop : Number of MACsec bytes dropped The above can be seen using: ethtool -S <ifc> \|grep macsec Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 14:02:09 +01:00
Lior Nahmanson	5a39816a75	net/mlx5e: Add MACsec offload SecY support Add offload support for MACsec SecY callbacks - add/update/delete. add_secy is called when need to create a new MACsec interface. upd_secy is called when source MAC address or tx SC was changed. del_secy is called when need to destroy the MACsec interface. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 14:02:09 +01:00
Lior Nahmanson	b7c9400cbc	net/mlx5e: Implement MACsec Rx data path using MACsec skb_metadata_dst MACsec driver need to distinguish to which offload device the MACsec is target to, in order to handle them correctly. This can be done by attaching a metadata_dst to a SKB with a SCI, when there is a match on MACsec rule. To achieve that, there is a map between fs_id to SCI, so for each RX SC, there is a unique fs_id allocated when creating RX SC. fs_id passed to device driver as metadata for packets that passed Rx MACsec offload to aid the driver to retrieve the matching SCI. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 14:02:09 +01:00
Lior Nahmanson	3b20949cb2	net/mlx5e: Add MACsec RX steering rules Rx flow steering consists of two flow tables (FTs). The first FT (crypto table) have one default miss rule so non MACsec offloaded packets bypass the MACSec tables. All others flow table entries (FTEs) are divided to two equal groups size, both of them are for MACsec packets: The first group is for MACsec packets which contains SCI field in the SecTAG header. The second group is for MACsec packets which doesn't contain SCI, where need to match on the source MAC address (only if the SCI is built from default MACsec port). Destination MAC address, ethertype and some of SecTAG fields are also matched for both groups. In case of match, invoke decrypt action on the packet. For each MACsec Rx offloaded SA two rules are created: one with SCI and one without SCI. The second FT (check table) has two fixed rules: One rule is for verifying that the previous offload actions were finished successfully. In this case, need to decap the SecTAG header and forward the packet for further processing. Another default rule for dropping packets that failed in the previous decrypt actions. The MACsec FTs are created on demand when the first MACsec rule is added and destroyed when the last MACsec rule is deleted. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 14:02:09 +01:00
Lior Nahmanson	15d187e285	net/mlx5: Add MACsec Rx tables support to fs_core Add new namespace for MACsec RX flows. Encrypted MACsec packets should be first decrypted and stripped from MACsec header and then continues with the kernel's steering pipeline. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 14:02:08 +01:00
Lior Nahmanson	aae3454e4d	net/mlx5e: Add MACsec offload Rx command support Add a support for Connect-X MACsec offload Rx SA & SC commands: add, update and delete. SCs are created on demend and aren't limited by number and unique by SCI. Each Rx SA must be associated with Rx SC according to SCI. Follow-up patches will implement the Rx steering. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 14:02:08 +01:00
Lior Nahmanson	9515978eee	net/mlx5e: Implement MACsec Tx data path using MACsec skb_metadata_dst MACsec driver marks Tx packets for device offload using a dedicated skb_metadata_dst which holds a 64 bits SCI number. A previously set rule will match on this number so the correct SA is used for the MACsec operation. As device driver can only provide 32 bits of metadata to flow tables, need to used a mapping from 64 bit to 32 bits marker or id, which is can be achieved by provide a 32 bit unique flow id in the control path, and used a hash table to map 64 bit to the unique id in the datapath. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 14:02:08 +01:00
Lior Nahmanson	e467b283ff	net/mlx5e: Add MACsec TX steering rules Tx flow steering consists of two flow tables (FTs). The first FT (crypto table) has two fixed rules: One default miss rule so non MACsec offloaded packets bypass the MACSec tables, another rule to make sure that MACsec key exchange (MKE) traffic passes unencrypted as expected (matched of ethertype). On each new MACsec offload flow, a new MACsec rule is added. This rule is matched on metadata_reg_a (which contains the id of the flow) and invokes the MACsec offload action on match. The second FT (check table) has two fixed rules: One rule for verifying that the previous offload actions were finished successfully and packet need to be transmitted. Another default rule for dropping packets that were failed in the offload actions. The MACsec FTs should be created on demand when the first MACsec rule is added and destroyed when the last MACsec rule is deleted. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 14:02:08 +01:00
Lior Nahmanson	ee534d7f81	net/mlx5: Add MACsec Tx tables support to fs_core Changed EGRESS_KERNEL namespace to EGRESS_IPSEC and add new namespace for MACsec TX. This namespace should be the last namespace for transmitted packets. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 14:02:08 +01:00
Lior Nahmanson	8ff0ac5be1	net/mlx5: Add MACsec offload Tx command support This patch adds support for Connect-X MACsec offload Tx SA commands: add, update and delete. In Connect-X MACsec, a Security Association (SA) is added or deleted via allocating a HW context of an encryption/decryption key and a HW context of a matching SA (MACsec object). When new SA is added: - Use a separate crypto key HW context. - Create a separate MACsec context in HW to include the SA properties. Introduce a new compilation flag MLX5_EN_MACSEC for it. Follow-up patches will implement the Tx steering. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 14:02:08 +01:00
Lior Nahmanson	8385c51ff5	net/mlx5: Introduce MACsec Connect-X offload hardware bits and structures Add MACsec offload related IFC structs, layouts and enumerations. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 14:02:08 +01:00
Lior Nahmanson	e227ee990b	net/mlx5: Generalize Flow Context for new crypto fields In order to support MACsec offload (and maybe some other crypto features in the future), generalize flow action parameters / defines to be used by crypto offlaods other than IPsec. The following changes made: ipsec_obj_id field at flow action context was changed to crypto_obj_id, intreduced a new crypto_type field where IPsec is the default zero type for backward compatibility. Action ipsec_decrypt was changed to crypto_decrypt. Action ipsec_encrypt was changed to crypto_encrypt. IPsec offload code was updated accordingly for backward compatibility. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 14:02:08 +01:00
Lior Nahmanson	d1b2234b7f	net/mlx5: Removed esp_id from struct mlx5_flow_act esp_id is no longer in used Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 14:02:08 +01:00
Lior Nahmanson	b1671253c6	net/macsec: Move some code for sharing with various drivers that implements offload Move some MACsec infrastructure like defines and functions, in order to avoid code duplication for future drivers which implements MACsec offload. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ben Ben-Ishay <benishay@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 14:02:08 +01:00
Lior Nahmanson	860ead89b8	net/macsec: Add MACsec skb_metadata_dst Rx Data path support Like in the Tx changes, if there are more than one MACsec device with the same MAC address as in the packet's destination MAC, the packet will be forward only to this device and not neccessarly to the desired one. Offloading device drivers will mark offloaded MACsec SKBs with the corresponding SCI in the skb_metadata_dst so the macsec rx handler will know to which port to divert those skbs, instead of wrongly solely relaying on dst MAC address comparison. Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 14:02:08 +01:00
Lior Nahmanson	0a28bfd497	net/macsec: Add MACsec skb_metadata_dst Tx Data path support In the current MACsec offload implementation, MACsec interfaces shares the same MAC address by default. Therefore, HW can't distinguish from which MACsec interface the traffic originated from. MACsec stack will use skb_metadata_dst to store the SCI value, which is unique per Macsec interface, skb_metadat_dst will be used by the offloading device driver to associate the SKB with the corresponding offloaded interface (SCI). Signed-off-by: Lior Nahmanson <liorna@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 14:02:08 +01:00
David S. Miller	0f51fa2a3c	Merge branch 'dsa-felix-fixes' Vladimir Oltean says: ==================== Fixes for Felix DSA driver calculation of tc-taprio guard bands This series fixes some bugs which are not quite new, but date from v5.13 when static guard bands were enabled by Michael Walle to prevent tc-taprio overruns. The investigation started when Xiaoliang asked privately what is the expected max SDU for a traffic class when its minimum gate interval is 10 us. The answer, as it turns out, is not an L1 size of 1250 octets, but 1245 octets, since otherwise, the switch will not consider frames for egress scheduling, because the static guard band is exactly as large as the time interval. The switch needs a minimum of 33 ns outside of the guard band to consider a frame for scheduling, and the reduction of the max SDU by 5 provides exactly for that. The fix for that (patch 1/3) is relatively small, but during testing, it became apparent that cut-through forwarding prevents oversized frame dropping from working properly. This is solved through the larger patch 2/3. Finally, patch 3/3 fixes one more tc-taprio locking problem found through code inspection. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 13:44:04 +01:00
Vladimir Oltean	a4bb481aeb	net: dsa: felix: access QSYS_TAG_CONFIG under tas_lock in vsc9959_sched_speed_set The read-modify-write of QSYS_TAG_CONFIG from vsc9959_sched_speed_set() runs unlocked with respect to the other functions that access it, which are vsc9959_tas_guard_bands_update(), vsc9959_qos_port_tas_set() and vsc9959_tas_clock_adjust(). All the others are under ocelot->tas_lock, so move the vsc9959_sched_speed_set() access under that lock as well, to resolve the concurrency. Fixes: `55a515b1f5` ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 13:44:04 +01:00
Vladimir Oltean	843794bbde	net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio Experimentally, it looks like when QSYS_QMAXSDU_CFG_7 is set to 605, frames even way larger than 601 octets are transmitted even though these should be considered as oversized, according to the documentation, and dropped. Since oversized frame dropping depends on frame size, which is only known at the EOF stage, and therefore not at SOF when cut-through forwarding begins, it means that the switch cannot take QSYS_QMAXSDU_CFG_* into consideration for traffic classes that are cut-through. Since cut-through forwarding has no UAPI to control it, and the driver enables it based on the mantra "if we can, then why not", the strategy is to alter vsc9959_cut_through_fwd() to take into consideration which tc's have oversize frame dropping enabled, and disable cut-through for them. Then, from vsc9959_tas_guard_bands_update(), we re-trigger the cut-through determination process. There are 2 strategies for vsc9959_cut_through_fwd() to determine whether a tc has oversized dropping enabled or not. One is to keep a bit mask of traffic classes per port, and the other is to read back from the hardware registers (a non-zero value of QSYS_QMAXSDU_CFG_* means the feature is enabled). We choose reading back from registers, because struct ocelot_port is shared with drivers (ocelot, seville) that don't support either cut-through nor tc-taprio, and we don't have a felix specific extension of struct ocelot_port. Furthermore, reading registers from the Felix hardware is quite cheap, since they are memory-mapped. Fixes: `55a515b1f5` ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 13:44:04 +01:00
Vladimir Oltean	11afdc6526	net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet The blamed commit broke tc-taprio schedules such as this one: tc qdisc replace dev $swp1 root taprio \ num_tc 8 \ map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ base-time 0 \ sched-entry S 0x7f 990000 \ sched-entry S 0x80 10000 \ flags 0x2 because the gate entry for TC 7 (S 0x80 10000 ns) now has a static guard band added earlier than its 'gate close' event, such that packet overruns won't occur in the worst case of the largest packet possible. Since guard bands are statically determined based on the per-tc QSYS_QMAXSDU_CFG_* with a fallback on the port-based QSYS_PORT_MAX_SDU, we need to discuss what happens with TC 7 depending on kernel version, since the driver, prior to commit `55a515b1f5` ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port"), did not touch QSYS_QMAXSDU_CFG_, and therefore relied on QSYS_PORT_MAX_SDU. 1 (before vsc9959_tas_guard_bands_update): QSYS_PORT_MAX_SDU defaults to 1518, and at gigabit this introduces a static guard band (independent of packet sizes) of 12144 ns, plus QSYS::HSCH_MISC_CFG.FRM_ADJ (bit time of 20 octets => 160 ns). But this is larger than the time window itself, of 10000 ns. So, the queue system never considers a frame with TC 7 as eligible for transmission, since the gate practically never opens, and these frames are forever stuck in the TX queues and hang the port. 2 (after vsc9959_tas_guard_bands_update): Under the sole goal of enabling oversized frame dropping, we make an effort to set QSYS_QMAXSDU_CFG_7 to 1230 bytes. But QSYS_QMAXSDU_CFG_7 plays one more role, which we did not take into account: per-tc static guard band, expressed in L2 byte time (auto-adjusted for FCS and L1 overhead). There is a discrepancy between what the driver thinks (that there is no guard band, and 100% of min_gate_len[tc] is available for egress scheduling) and what the hardware actually does (crops the equivalent of QSYS_QMAXSDU_CFG_7 ns out of min_gate_len[tc]). In practice, this means that the hardware thinks it has exactly 0 ns for scheduling tc 7. In both cases, even minimum sized Ethernet frames are stuck on egress rather than being considered for scheduling on TC 7, even if they would fit given a proper configuration. Considering the current situation, with vsc9959_tas_guard_bands_update(), frames between 60 octets and 1230 octets in size are not eligible for oversized dropping (because they are smaller than QSYS_QMAXSDU_CFG_7), but won't be considered as eligible for scheduling either, because the min_gate_len[7] (10000 ns) minus the guard band determined by QSYS_QMAXSDU_CFG_7 (1230 octets 8 ns per octet == 9840 ns) minus the guard band auto-added for L1 overhead by QSYS::HSCH_MISC_CFG.FRM_ADJ (20 octets * 8 ns per octet == 160 octets) leaves 0 ns for scheduling in the queue system proper. Investigating the hardware behavior, it becomes apparent that the queue system needs precisely 33 ns of 'gate open' time in order to consider a frame as eligible for scheduling to a tc. So the solution to this problem is to amend vsc9959_tas_guard_bands_update(), by giving the per-tc guard bands less space by exactly 33 ns, just enough for one frame to be scheduled in that interval. This allows the queue system to make forward progress for that port-tc, and prevents it from hanging. Fixes: `297c4de6f7` ("net: dsa: felix: re-enable TAS guard band mode") Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 13:44:04 +01:00
David S. Miller	da7d8e65b3	Merge branch 'netlink-be-policy' Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 12:33:44 +01:00
Florian Westphal	e7af210e6d	netfilter: nft_payload: reject out-of-range attributes via policy Now that nla_policy allows range checks for bigendian data make use of this to reject such attributes. At this time, reject happens later from the init or select_ops callbacks, but its prone to errors. In the future, new attributes can be handled via NLA_POLICY_MAX_BE and exiting ones can be converted one by one. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 12:33:44 +01:00
Florian Westphal	08724ef699	netlink: introduce NLA_POLICY_MAX_BE netlink allows to specify allowed ranges for integer types. Unfortunately, nfnetlink passes integers in big endian, so the existing NLA_POLICY_MAX() cannot be used. At the moment, nfnetlink users, such as nf_tables, need to resort to programmatic checking via helpers such as nft_parse_u32_check(). This is both cumbersome and error prone. This adds NLA_POLICY_MAX_BE which adds range check support for BE16, BE32 and BE64 integers. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 12:33:43 +01:00
David S. Miller	98ba81081b	Merge branch 'sfc-ptp' Edward Cree says: ==================== sfc: add support for PTP over IPv6 and 802.3 Most recent cards (8000 series and newer) had enough hardware support for this, but it was not enabled in the driver. The transmission of PTP packets over these protocols was already added in commit `bd4a2697e5` ("sfc: use hardware tx timestamps for more than PTP"), but receiving them was already unsupported so synchronization didn't happen. These patches add support for timestamping received packets over IPv6/UPD and IEEE802.3. v2: fixed weird indentation in efx_ptp_init_filter v3: fixed bug caused by usage of htons in PTP_EVENT_PORT definition. It was used in more places, where htons was used too, so using it 2 times leave it again in host order. I didn't detected it in my tests because it only affected if timestamping through the MC, but the model I used do it through the MAC. Detected by kernel test robot <lkp@intel.com> v4: removed `inline` specifiers from 2 local functions v5: restored deleted comment with useful explanation about packets reordering. Deleted useless whitespaces. ==================== Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 12:23:25 +01:00
Íñigo Huguet	e4616f6472	sfc: support PTP over Ethernet The previous patch add support for PTP over IPv6/UDP (only for 8000 series and newer) and this one add support for PTP over 802.3. Tested: sync as master and as slave is correct with ptp4l. PTP over IPv4 and IPv6 still works fine. Suggested-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 12:23:25 +01:00
Íñigo Huguet	621918c45f	sfc: support PTP over IPv6/UDP commit `bd4a2697e5` ("sfc: use hardware tx timestamps for more than PTP") added support for hardware timestamping on TX for cards of the 8000 series and newer, in an effort to provide support for other transports other than IPv4/UDP. However, timestamping was still not working on RX for these other transports. This patch add support for PTP over IPv6/UDP. Tested: sync as master and as slave is correct using ptp4l from linuxptp package, both with IPv4 and IPv6. Suggested-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 12:23:25 +01:00
Íñigo Huguet	313aa13a07	sfc: allow more flexible way of adding filters for PTP In preparation for the support of PTP over IPv6/UDP and Ethernet in next patches, allow a more flexible way of adding and removing RX filters for PTP. Right now, only 2 filters are allowed, which are the ones needed for PTP over IPv4/UDP. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 12:23:25 +01:00
Jerry Ray	13248b9750	net: dsa: LAN9303: Add basic support for LAN9354 Adding support for the LAN9354 device by allowing it to use the LAN9303 DSA driver. These devices have the same underlying access and control methods and from a feature set point of view the LAN9354 is a superset of the LAN9303. The MDIO access method has been tested on a SAMA5D3-EDS board with a LAN9354 RMII daughter card. While the SPI access method should also be the same, it has not been tested and as such is not included at this time. Signed-off-by: Jerry Ray <jerry.ray@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 11:06:04 +01:00
Jerry Ray	732f374e23	net: dsa: LAN9303: Add early read to sync Add initial BYTE_ORDER read to sync the 32-bit accesses over the 16-bit mdio bus to improve driver robustness. The lan9303 expects two mdio read transactions back-to-back to read a 32-bit register. The first read transaction causes the other half of the 32-bit register to get latched. The subsequent read returns the latched second half of the 32-bit read. The BYTE_ORDER register is an exception to this rule. As it is a constant value, there is no need to latch the second half. We read this register first in case there were reads during the boot loader process that might have occurred prior to this driver taking over ownership of accessing this device. This patch has been tested on the SAMA5D3-EDS with a LAN9303 RMII daughter card. Signed-off-by: Jerry Ray <jerry.ray@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 11:06:04 +01:00
Romain Naour	6674e7fd3b	net: dsa: microchip: add regmap_range for KSZ9896 chip Add register validation for KSZ9896. Signed-off-by: Romain Naour <romain.naour@skf.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 10:39:06 +01:00
Romain Naour	3a8b8ea6c7	net: dsa: microchip: ksz9477: remove 0x033C and 0x033D addresses from regmap_access_tables According to the KSZ9477S datasheet, there is no global register at 0x033C and 0x033D addresses. Signed-off-by: Romain Naour <romain.naour@skf.com> Cc: Oleksij Rempel <o.rempel@pengutronix.de> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 10:39:06 +01:00
Romain Naour	1376752592	net: dsa: microchip: add KSZ9896 to KSZ9477 I2C driver Add support for the KSZ9896 6-port Gigabit Ethernet Switch to the ksz9477 driver. The KSZ9896 supports both SPI (already in) and I2C. Signed-off-by: Romain Naour <romain.naour@skf.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 10:39:06 +01:00
Romain Naour	2eb3ff3c09	net: dsa: microchip: add KSZ9896 switch support Add support for the KSZ9896 6-port Gigabit Ethernet Switch to the ksz9477 driver. Although the KSZ9896 is already listed in the device tree binding documentation since `a1c0ed24fe` (dt-bindings: net: dsa: document additional Microchip KSZ9477 family switches) the chip id (0x00989600) is not recognized by ksz_switch_detect() and rejected by the driver. The KSZ9896 is similar to KSZ9897 but has only one configurable MII/RMII/RGMII/GMII cpu port. Signed-off-by: Romain Naour <romain.naour@skf.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-07 10:39:06 +01:00
chen zhang	7a1ec84ffb	efi/x86: libstub: remove unused variable The variable "has_system_memory" is unused in function ‘adjust_memory_range_protection’, remove it. Signed-off-by: chen zhang <chenzhang@kylinos.cn> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2022-09-07 09:03:53 +02:00
David Howells	0066f1b0e2	afs: Return -EAGAIN, not -EREMOTEIO, when a file already locked When trying to get a file lock on an AFS file, the server may return UAEAGAIN to indicate that the lock is already held. This is currently translated by the default path to -EREMOTEIO. Translate it instead to -EAGAIN so that we know we can retry it. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey E Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/166075761334.3533338.2591992675160918098.stgit@warthog.procyon.org.uk/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-09-06 21:33:01 -04:00
Linus Torvalds	19f516ea34	ARM fixes for 6.0 Just one fix for now for the AMBA bus code from Isaac Manjarres. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEuNNh8scc2k/wOAE+9OeQG+StrGQFAmMXuFIACgkQ9OeQG+St rGRCSRAAmKNCEAzciVrs8Zq7c1M+44ba5kbQxEFe4YkZyAVd/EEEhtQOFFxUA/BH drrUFLE6hBJH0sDMJ9o1c/AQe0GQ4L7tFwI5TqxIrThIy/9IwzBn6OTZpOTizlY9 z84xymy33+4TsozmSNKbzEGRSCA/UYSjsIlCrEAtEzGqWmFtBaVEs6hGCdsPQp+J bnVHpnbDM5WgTnyOHKh9sQFWqA1bLiPQpNksdGcdIMtXdhfBrPySMubVZ6J2zhpo hxhsp5Zdp4j5odj5/0wpPWtDMMRDo7Duiir6ubz1nAdkF8JjjMW5bxaS9I90LuZg /73BS/8MZUkA0URI0uZoMl4qrqLytjBgCnLG3rYX6U1xo4A4dVkyx2uKuE/Q99YN WlC5sA941U+8xYdIhukd5yELQbCQiP0WOYJBEIEg69OsAIiDZboDydw46iq+K2eV f9pMxVa+evkZkfBmvfB3ow655CwybYhjTvAQqC2EyJnv8Fs/XCIA2jea5f/tlfCc e2nIkunPV7UlC/MAK87U0B4i8AjTWoTWspaAdjgdY6TfCL/VSNv5I6YBh/PWIvNp TnFYcw+Aq1V9CXW8G6J0/Nmjn8KmAFhhoC8zSqZNBbyFkfFuW0CyGCcXFEogOxaE uVk/n1geBeIHq5+u26A2c+IVmAjzpq853WsaCxWOmH9CdwN7l7E= =WHUk -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm Pull ARM fix from Russell King: "Just one fix for now for the AMBA bus code from Isaac Manjarres" * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 9229/1: amba: Fix use-after-free in amba_read_periphid()	2022-09-06 21:27:05 -04:00
Paolo Abeni	2786bcff28	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2022-09-05 The following pull-request contains BPF updates for your net-next tree. We've added 106 non-merge commits during the last 18 day(s) which contain a total of 159 files changed, 5225 insertions(+), 1358 deletions(-). There are two small merge conflicts, resolve them as follows: 1) tools/testing/selftests/bpf/DENYLIST.s390x Commit `27e23836ce` ("selftests/bpf: Add lru_bug to s390x deny list") in bpf tree was needed to get BPF CI green on s390x, but it conflicted with newly added tests on bpf-next. Resolve by adding both hunks, result: [...] lru_bug # prog 'printk': failed to auto-attach: -524 setget_sockopt # attach unexpected error: -524 (trampoline) cb_refs # expected error message unexpected error: -524 (trampoline) cgroup_hierarchical_stats # JIT does not support calling kernel function (kfunc) htab_update # failed to attach: ERROR: strerror_r(-524)=22 (trampoline) [...] 2) net/core/filter.c Commit `1227c1771d` ("net: Fix data-races around sysctl_[rw]mem_(max\|default).") from net tree conflicts with commit `29003875bd` ("bpf: Change bpf_setsockopt(SOL_SOCKET) to reuse sk_setsockopt()") from bpf-next tree. Take the code as it is from bpf-next tree, result: [...] if (getopt) { if (optname == SO_BINDTODEVICE) return -EINVAL; return sk_getsockopt(sk, SOL_SOCKET, optname, KERNEL_SOCKPTR(optval), KERNEL_SOCKPTR(optlen)); } return sk_setsockopt(sk, SOL_SOCKET, optname, KERNEL_SOCKPTR(optval), optlen); [...] The main changes are: 1) Add any-context BPF specific memory allocator which is useful in particular for BPF tracing with bonus of performance equal to full prealloc, from Alexei Starovoitov. 2) Big batch to remove duplicated code from bpf_{get,set}sockopt() helpers as an effort to reuse the existing core socket code as much as possible, from Martin KaFai Lau. 3) Extend BPF flow dissector for BPF programs to just augment the in-kernel dissector with custom logic. In other words, allow for partial replacement, from Shmulik Ladkani. 4) Add a new cgroup iterator to BPF with different traversal options, from Hao Luo. 5) Support for BPF to collect hierarchical cgroup statistics efficiently through BPF integration with the rstat framework, from Yosry Ahmed. 6) Support bpf_{g,s}et_retval() under more BPF cgroup hooks, from Stanislav Fomichev. 7) BPF hash table and local storages fixes under fully preemptible kernel, from Hou Tao. 8) Add various improvements to BPF selftests and libbpf for compilation with gcc BPF backend, from James Hilliard. 9) Fix verifier helper permissions and reference state management for synchronous callbacks, from Kumar Kartikeya Dwivedi. 10) Add support for BPF selftest's xskxceiver to also be used against real devices that support MAC loopback, from Maciej Fijalkowski. 11) Various fixes to the bpf-helpers(7) man page generation script, from Quentin Monnet. 12) Document BPF verifier's tnum_in(tnum_range(), ...) gotchas, from Shung-Hsi Yu. 13) Various minor misc improvements all over the place. https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (106 commits) bpf: Optimize rcu_barrier usage between hash map and bpf_mem_alloc. bpf: Remove usage of kmem_cache from bpf_mem_cache. bpf: Remove prealloc-only restriction for sleepable bpf programs. bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs. bpf: Remove tracing program restriction on map types bpf: Convert percpu hash map to per-cpu bpf_mem_alloc. bpf: Add percpu allocation support to bpf_mem_alloc. bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU. bpf: Adjust low/high watermarks in bpf_mem_cache bpf: Optimize call_rcu in non-preallocated hash map. bpf: Optimize element count in non-preallocated hash map. bpf: Relax the requirement to use preallocated hash maps in tracing progs. samples/bpf: Reduce syscall overhead in map_perf_test. selftests/bpf: Improve test coverage of test_maps bpf: Convert hash map to bpf_mem_alloc. bpf: Introduce any context BPF specific memory allocator. selftest/bpf: Add test for bpf_getsockopt() bpf: Change bpf_getsockopt(SOL_IPV6) to reuse do_ipv6_getsockopt() bpf: Change bpf_getsockopt(SOL_IP) to reuse do_ip_getsockopt() bpf: Change bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt() ... ==================== Link: https://lore.kernel.org/r/20220905161136.9150-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-06 23:21:18 +02:00
Michal Jaron	11c12adcbc	iavf: Fix race between iavf_close and iavf_reset_task During stress tests with adding VF to namespace and changing vf's trust there was a race between iavf_reset_task and iavf_close. Sometimes when IAVF_FLAG_AQ_DISABLE_QUEUES from iavf_close was sent to PF after reset and before IAVF_AQ_GET_CONFIG was sent then PF returns error IAVF_NOT_SUPPORTED to disable queues request and following requests. There is need to get_config before other aq_required will be send but iavf_close clears all flags, if get_config was not sent before iavf_close, then it will not be send at all. In case when IAVF_FLAG_AQ_GET_OFFLOAD_VLAN_V2_CAPS was sent before IAVF_FLAG_AQ_DISABLE_QUEUES then there was rtnl_lock deadlock between iavf_close and iavf_adminq_task until iavf_close timeouts and disable queues was sent after iavf_close ends. There was also a problem with sending delete/add filters. Sometimes when filters was not yet added to PF and in iavf_close all filters was set to remove there might be a try to remove nonexistent filters on PF. Add aq_required_tmp to save aq_required flags and send them after disable_queues will be handled. Clear flags given to iavf_down different than IAVF_FLAG_AQ_GET_CONFIG as this flag is necessary to sent other aq_required. Remove some flags that we don't want to send as we are in iavf_close and we want to disable interface. Remove filters which was not yet sent and send del filters flags only when there are filters to remove. Signed-off-by: Michal Jaron <michalx.jaron@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-09-06 14:16:17 -07:00

1 2 3 4 5 ...

1123548 Commits