linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-24 11:34:50 +08:00

Author	SHA1	Message	Date
Eric Dumazet	bf36267e3a	tcp: annotate data-race around queue->synflood_warned Annotate the lockless read of queue->synflood_warned. Following xchg() has the needed data-race resolution. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 13:32:53 +00:00
Li zeming	1d7322f28f	ax25: af_ax25: Remove unnecessary (void*) conversions The valptr pointer is of (void *) type, so other pointers need not be forced to assign values to it. Signed-off-by: Li zeming <zeming@nfschina.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 13:31:03 +00:00
David S. Miller	ca5ebbfec3	Merge branch 'net-atomic-dev-stats' Eric Dumazet says: ==================== net: add atomic dev->stats infra Long standing KCSAN issues are caused by data-race around some dev->stats changes. Most performance critical paths already use per-cpu variables, or per-queue ones. It is reasonable (and more correct) to use atomic operations for the slow paths. First patch adds the infrastructure, then three patches address the most common paths that syzbot is playing with. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 12:48:44 +00:00
Eric Dumazet	c4794d2225	ipv4: tunnels: use DEV_STATS_INC() Most of code paths in tunnels are lockless (eg NETIF_F_LLTX in tx). Adopt SMP safe DEV_STATS_INC() to update dev->stats fields. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 12:48:44 +00:00
Eric Dumazet	2fad1ba354	ipv6: tunnels: use DEV_STATS_INC() Most of code paths in tunnels are lockless (eg NETIF_F_LLTX in tx). Adopt SMP safe DEV_STATS_{INC|ADD}() to update dev->stats fields. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 12:48:44 +00:00
Eric Dumazet	cb34b7cf17	ipv6/sit: use DEV_STATS_INC() to avoid data-races syzbot/KCSAN reported that multiple cpus are updating dev->stats.tx_error concurrently. This is because sit tunnels are NETIF_F_LLTX, meaning their ndo_start_xmit() is not protected by a spinlock. While original KCSAN report was about tx path, rx path has the same issue. Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 12:48:44 +00:00
Eric Dumazet	6c1c509778	net: add atomic_long_t to net_device_stats fields Long standing KCSAN issues are caused by data-race around some dev->stats changes. Most performance critical paths already use per-cpu variables, or per-queue ones. It is reasonable (and more correct) to use atomic operations for the slow paths. This patch adds an union for each field of net_device_stats, so that we can convert paths that are not yet protected by a spinlock or a mutex. netdev_stats_to_stats64() no longer has an #if BITS_PER_LONG==64 Note that the memcpy() we were using on 64bit arches had no provision to avoid load-tearing, while atomic_long_read() is providing the needed protection at no cost. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 12:48:44 +00:00
David S. Miller	68d268d089	Merge branch 'net-try_cmpxchg-conversions' Eric Dumazet says: ==================== net: more try_cmpxchg() conversions Adopt try_cmpxchg() and friends in more places, as this is preferred nowadays. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 12:42:01 +00:00
Eric Dumazet	4ebf802cf1	net: __sock_gen_cookie() cleanup Adopt atomic64_try_cmpxchg() and remove the loop, to make the intent more obvious. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 12:42:01 +00:00
Eric Dumazet	4ffa1d1c68	net: adopt try_cmpxchg() in napi_{enable|disable}() This makes code a bit cleaner. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 12:42:01 +00:00
Eric Dumazet	1462160c74	net: adopt try_cmpxchg() in napi_schedule_prep() and napi_complete_done() This makes the code slightly more efficient. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 12:42:01 +00:00
Eric Dumazet	6af645a5b2	net: net_{enable|disable}_timestamp() optimizations Adopting atomic_try_cmpxchg() makes the code cleaner. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 12:42:00 +00:00
Eric Dumazet	30189806fb	ipv6: fib6_new_sernum() optimization Adopt atomic_try_cmpxchg() which is slightly more efficient. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 12:42:00 +00:00
Eric Dumazet	57fc05e8e8	net: mm_account_pinned_pages() optimization Adopt atomic_long_try_cmpxchg() in mm_account_pinned_pages() as it is slightly more efficient. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 12:42:00 +00:00
Vladimir Oltean	8c55facecd	net: linkwatch: only report IF_OPER_LOWERLAYERDOWN if iflink is actually down RFC 2863 says: The lowerLayerDown state is also a refinement on the down state. This new state indicates that this interface runs "on top of" one or more other interfaces (see ifStackTable) and that this interface is down specifically because one or more of these lower-layer interfaces are down. DSA interfaces are virtual network devices, stacked on top of the DSA master, but they have a physical MAC, with a PHY that reports a real link status. But since DSA (perhaps improperly) uses an iflink to describe the relationship to its master since commit `c084080151` ("dsa: set ->iflink on slave interfaces to the ifindex of the parent"), default_operstate() will misinterpret this to mean that every time the carrier of a DSA interface is not ok, it is because of the master being not ok. In fact, since commit `c0a8a9c274` ("net: dsa: automatically bring user ports down when master goes down"), DSA cannot even in theory be in the lowerLayerDown state, because it just calls dev_close_many(), thereby going down, when the master goes down. We could revert the commit that creates an iflink between a DSA user port and its master, especially since now we have an alternative IFLA_DSA_MASTER which has less side effects. But there may be tooling in use which relies on the iflink, which has existed since 2009. We could also probably do something local within DSA to overwrite what rfc2863_policy() did, in a way similar to hsr_set_operstate(), but this seems like a hack. What seems appropriate is to follow the iflink, and check the carrier status of that interface as well. If that's down too, yes, keep reporting lowerLayerDown, otherwise just down. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 09:45:00 +00:00
David S. Miller	fd258f2aba	Merge branch 'udp-pernetns-hash' Kuniyuki Iwashima says: ==================== udp: Introduce optional per-netns hash table. This series is the UDP version of the per-netns ehash series [0], which were initially in the same patch set. [1] The notable difference with TCP is the max table size is 64K and the min size is 128. This is because the possible hash range by udp_hashfn() always fits in 64K within the same netns and because we want to keep a bitmap in udp_lib_get_port() on the stack. Also, the UDP per-netns table isolates both 1-tuple and 2-tuple tables. For details, please see the last patch. patch 1 - 4: prep for per-netns hash table patch 5: add per-netns hash table [0]: https://lore.kernel.org/netdev/20220908011022.45342-1-kuniyu@amazon.com/ [1]: https://lore.kernel.org/netdev/20220826000445.46552-1-kuniyu@amazon.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 09:43:36 +00:00
Kuniyuki Iwashima	9804985bf2	udp: Introduce optional per-netns hash table. The maximum hash table size is 64K due to the nature of the protocol. [0] It's smaller than TCP, and fewer sockets can cause a performance drop. On an EC2 c5.24xlarge instance (192 GiB memory), after running iperf3 in different netns, creating 32Mi sockets without data transfer in the root netns causes regression for the iperf3's connection. With uhash_entries = 64K (columns: sockets, length, Gbps): 1, 1, 5.69; 1Mi, 16, 5.27; 2Mi, 32, 4.90; 4Mi, 64, 4.09; 8Mi, 128, 2.96; 16Mi, 256, 2.06; 32Mi, 512, 1.12. The per-netns hash table breaks the lengthy lists into shorter ones. It is useful on a multi-tenant system with thousands of netns. With smaller hash tables, we can look up sockets faster, isolate noisy neighbours, and reduce lock contention. The max size of the per-netns table is 64K as well. This is because the possible hash range by udp_hashfn() always fits in 64K within the same netns and we cannot make full use of the whole buckets larger than 64K. /* 0 < num < 64K -> X < hash < X + 64K */ (num + net_hash_mix(net)) & mask; Also, the min size is 128. We use a bitmap to search for an available port in udp_lib_get_port(). To keep the bitmap on the stack and not fire the CONFIG_FRAME_WARN error at build time, we round up the table size to 128.
The sysctl usage is the same with TCP: $ dmesg | cut -d ' ' -f 6- | grep "UDP hash" UDP hash table entries: 65536 (order: 9, 2097152 bytes, vmalloc) # sysctl net.ipv4.udp_hash_entries net.ipv4.udp_hash_entries = 65536 # can be changed by uhash_entries # sysctl net.ipv4.udp_child_hash_entries net.ipv4.udp_child_hash_entries = 0 # disabled by default # ip netns add test1 # ip netns exec test1 sysctl net.ipv4.udp_hash_entries net.ipv4.udp_hash_entries = -65536 # share the global table # sysctl -w net.ipv4.udp_child_hash_entries=100 net.ipv4.udp_child_hash_entries = 100 # ip netns add test2 # ip netns exec test2 sysctl net.ipv4.udp_hash_entries net.ipv4.udp_hash_entries = 128 # own a per-netns table with 2^n buckets We could optimise the hash table lookup/iteration further by removing the netns comparison for the per-netns one in the future. Also, we could optimise the sparse udp_hslot layout by putting it in udp_table. [0]: https://lore.kernel.org/netdev/4ACC2815.7010101@gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 09:43:35 +00:00
Kuniyuki Iwashima	ba6aac1516	udp: Access &udp_table via net. We will soon introduce an optional per-netns hash table for UDP. This means we cannot use udp_table directly in most places. Instead, access it via net->ipv4.udp_table. The access will be valid only while initialising udp_table itself and creating/destroying each netns. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 09:43:35 +00:00
Kuniyuki Iwashima	478aee5d6b	udp: Set NULL to udp_seq_afinfo.udp_table. We will soon introduce an optional per-netns hash table for UDP. This means we cannot use the global udp_seq_afinfo.udp_table to fetch a UDP hash table. Instead, set NULL to udp_seq_afinfo.udp_table for UDP and get a proper table from net->ipv4.udp_table. Note that we still need udp_seq_afinfo.udp_table for UDP LITE. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 09:43:35 +00:00
Kuniyuki Iwashima	67fb43308f	udp: Set NULL to sk->sk_prot->h.udp_table. We will soon introduce an optional per-netns hash table for UDP. This means we cannot use the global sk->sk_prot->h.udp_table to fetch a UDP hash table. Instead, set NULL to sk->sk_prot->h.udp_table for UDP and get a proper table from net->ipv4.udp_table. Note that we still need sk->sk_prot->h.udp_table for UDP LITE. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 09:43:35 +00:00
Kuniyuki Iwashima	919dfa0b20	udp: Clean up some functions. This patch adds no functional change and cleans up some functions that the following patches touch around so that we make them tidy and easy to review/revert. The change is mainly to keep reverse christmas tree order. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 09:43:35 +00:00
David S. Miller	e88225656d	Merge branch 'sfc-TC-offload-counters' Edward Cree says: ==================== sfc: TC offload counters EF100 hardware supports attaching counters to action-sets in the MAE. Use these counters to implement stats for TC flower offload. The counters are delivered to the host over a special hardware RX queue which should only ever receive counter update messages, not 'real' network packets. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 09:07:03 +00:00
Edward Cree	50f8f2f7fb	sfc: implement counters readout to TC stats On FLOW_CLS_STATS, look up the MAE counter by TC cookie, and report the change in packet and byte count since the last time FLOW_CLS_STATS read them. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 09:07:03 +00:00
Edward Cree	83a187a4eb	sfc: validate MAE action order Currently the only actions supported are COUNT and DELIVER, which can only happen in the right order; but when more actions are added, it will be necessary to check that they are only used in the same order in which the hardware performs them (since the hardware API takes an action set in which the order is implicit). For instance, a VLAN pop must not follow a VLAN push. Most practical use-cases should be unaffected by these restrictions. Add a function efx_tc_flower_action_order_ok() that checks whether it is appropriate to add a specified action to the existing action-set. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 09:07:03 +00:00
Edward Cree	2e0f1eb056	sfc: attach an MAE counter to TC actions that need it The only actions that expect stats (that sfc HW supports) are gact shot (drop), mirred redirect and mirred mirror. Since these are 'deliverish' actions that end an action-set, we only require at most one counter per action-set. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 09:07:03 +00:00
Edward Cree	c4bad432b9	sfc: accumulate MAE counter values from update packets Add the packet and byte counts to the software running total, and store the latest jiffies every time the counter is bumped. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 09:07:02 +00:00
Edward Cree	0363aa2957	sfc: add functions to allocate/free MAE counters efx_tc_flower_get_counter_index() will create an MAE counter mapped to the passed (TC filter) cookie, or increment the reference if one already exists for that cookie. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 09:07:02 +00:00
Edward Cree	19a0c98910	sfc: add hashtables for MAE counters and counter ID mappings Nothing populates them yet. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 09:07:02 +00:00
Edward Cree	25730d8be5	sfc: add extra RX channel to receive MAE counter updates on ef100 Currently there is no counter-allocating machinery to connect the resulting counter update values to; that will be added in a subsequent patch. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 09:07:02 +00:00
Edward Cree	e5731274cd	sfc: add ef100 MAE counter support functions Start and stop MAE counter streaming, and grant credits. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 09:07:02 +00:00
Edward Cree	36df6136a7	sfc: add ability for extra channels to receive raw RX buffers The TC extra channel will need its own special RX handling, which must operate before any code that expects the RX buffer to contain a network packet; buffers on this RX queue contain MAE counter packets in a special format that does not resemble an Ethernet frame, and many fields of the RX packet prefix are not populated. The USER_MARK field, however, is populated with the generation count from the counter subsystem, which needs to be passed on to the RX handler. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 09:07:02 +00:00
Edward Cree	85697f97fd	sfc: add start and stop methods to channels The TC extra channel needs to do extra work in efx_{start,stop}_channels() to start/stop MAE counter streaming from the hardware. Add callbacks for it to implement. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 09:07:02 +00:00
Edward Cree	e395153984	sfc: add ability for an RXQ to grant credits on refill EF100 hardware streams MAE counter updates to the driver over a dedicated RX queue; however, the MCPU is not able to detect when RX buffers have been posted to the ring. Thus, the driver must call MC_CMD_MAE_COUNTERS_STREAM_GIVE_CREDITS; this patch adds the infrastructure to support that to the core RXQ handling code. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 09:07:02 +00:00
Edward Cree	5ae0c22634	sfc: fix ef100 RX prefix macro Macro PREFIX_WIDTH_MASK uses unsigned long arithmetic for a shift of up to 32 bits, which breaks on 32-bit systems. This did not previously show up as we weren't using any fields of width 32, but we now need to access ESF_GZ_RX_PREFIX_USER_MARK. Change it to unsigned long long. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-16 09:07:02 +00:00
Jakub Kicinski	7d63b21d27	Merge branch 'remove-phylink_validate-from-felix-dsa-driver' Vladimir Oltean says: ==================== Remove phylink_validate() from Felix DSA driver The Felix DSA driver still uses its own phylink_validate() procedure rather than the (relatively newly introduced) phylink_generic_validate() because the latter did not cater for the case where a PHY provides rate matching between the Ethernet cable side speed and the SERDES side speed (and does not advertise other speeds except for the SERDES speed). This changed with Sean Anderson's generic support for rate matching PHYs in phylib and phylink: https://patchwork.kernel.org/project/netdevbpf/cover/20220920221235.1487501-1-sean.anderson@seco.com/ Building upon that support, this patch set makes Linux understand that the PHYs used in combination with the Felix DSA driver (SCH-30841 riser card with AQR412 PHY, used with SERDES protocol 0x7777 - 4x2500base-x, plugged into LS1028A-QDS) do support PAUSE rate matching. This requires Aquantia PHY driver support for new PHY IDs. To activate the rate matching support in phylink, config->mac_capabilities must be populated. Coincidentally, this also opts the Felix driver into the generic phylink validation. Next, code that is no longer necessary is eliminated. This includes the Felix driver validation procedures for VSC9959 and VSC9953, the workaround in the Ocelot switch library to leave RX flow control always enabled, as well as DSA plumbing necessary for a custom phylink validation procedure to be propagated to the hardware driver level. Many thanks go to Sean Anderson for providing generic support for rate matching. ==================== Link: https://lore.kernel.org/r/20221114170730.2189282-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-15 20:34:40 -08:00
Vladimir Oltean	53d04b9811	net: dsa: remove phylink_validate() method As of now, no DSA driver uses a custom link mode validation procedure anymore. So remove this DSA operation and let phylink determine what is supported based on config->mac_capabilities (if provided by the driver). Leave a comment why we left the code that we did, and that there is more work to do. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-15 20:34:27 -08:00
Vladimir Oltean	de8586ed43	net: mscc: ocelot: drop workaround for forcing RX flow control As phylink gained generic support for PHYs with rate matching via PAUSE frames, the phylink_mac_link_up() method will be called with the maximum speed and with rx_pause=true if rate matching is in use. This means that setups with 2500base-x as the SERDES protocol between the MAC/PCS and the PHY now work with no need for the driver to do anything special. Tested with fsl-ls1028a-qds-7777.dts. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-15 20:34:27 -08:00
Vladimir Oltean	3e7e783291	net: dsa: felix: use phylink_generic_validate() Drop the custom implementation of phylink_validate() in favor of the generic one, which requires config->mac_capabilities to be set. This was used up until now because of the possibility of being paired with Aquantia PHYs with support for rate matching. The phylink framework gained generic support for these, and knows to advertise all 10/100/1000 lower speed link modes when our SERDES protocol is 2500base-x (fixed speed). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-15 20:34:27 -08:00
Vladimir Oltean	973fbe68df	net: phy: aquantia: add AQR112 and AQR412 PHY IDs These are Gen3 Aquantia N-BASET PHYs which support 5GBASE-T, 2.5GBASE-T, 1000BASE-T and 100BASE-TX (not 10G); also EEE, Sync-E, PTP, PoE. The 112 is a single PHY package, the 412 is a quad PHY package. The system-side SERDES interface of these PHYs selects its protocol depending on the negotiated media side link speed. That protocol can be 1000BASE-X, 2500BASE-X, 10GBASE-R, SGMII, USXGMII. The configuration of which SERDES protocol to use for which link speed is made by firmware; even though it could be overwritten over MDIO by Linux, we assume that the firmware provisioning is ok for the board on which the driver probes. For cases when the system side runs at a fixed rate, we want phylib/phylink to detect the PAUSE rate matching ability of these PHYs, so we need to use the Aquantia rather than the generic C45 driver. This needs aqr107_read_status() -> aqr107_read_rate() to set phydev->rate_matching, as well as the aqr107_get_rate_matching() method. I am a bit unsure about the naming convention in the driver. Since AQR107 is a Gen2 PHY, I assume all functions prefixed with "aqr107_" rather than "aqr_" mean Gen2+ features. So I've reused this naming convention. I've tested PHY "SGMII" statistics as well as the .link_change_notify method, which prints: Aquantia AQR412 mdio_mux-0.4:00: Link partner is Aquantia PHY, FW 4.3, fast-retrain downshift advertised, fast reframe advertised Tested SERDES protocols are usxgmii and 2500base-x (the latter with PAUSE rate matching). Tested link modes are 100/1000/2500 Base-T (with Aquantia link partner and with other link partners). No notable events observed. The placement of these PHY IDs in the driver is right before AQR113C, a Gen4 PHY. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-15 20:34:27 -08:00
Jakub Kicinski	b87584cb8d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next 1) Fix sparse warning in the new nft_inner expression, reported by Jakub Kicinski. 2) Incorrect vlan header check in nft_inner, from Peng Wu. 3) Two patches to pass reset boolean to expression dump operation, in preparation for allowing to reset stateful expressions in rules. This adds a new NFT_MSG_GETRULE_RESET command. From Phil Sutter. 4) Inconsistent indentation in nft_fib, from Jiapeng Chong. 5) Speed up siphash calculation in conntrack, from Florian Westphal. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: conntrack: use siphash_4u64 netfilter: rpfilter/fib: clean up some inconsistent indenting netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESET netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters netfilter: nft_inner: fix return value check in nft_inner_parse_l2l3() netfilter: nft_payload: use __be16 to store gre version ==================== Link: https://lore.kernel.org/r/20221115095922.139954-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-15 20:33:20 -08:00
Walter Heymans	1ec6360ddb	Documentation: nfp: update documentation The NFP documentation is updated to include information about Corigine, and the new NFP3800 chips. The 'Acquiring Firmware' section is updated with new information about where to find firmware. Two new sections are added to expand the coverage of the documentation. The new sections include: - Devlink Info - Configure Device Signed-off-by: Walter Heymans <walter.heymans@corigine.com> Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20221115090834.738645-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-15 20:31:08 -08:00
Jakub Kicinski	0f54d36e2f	Merge branch 'mtk_eth_soc-rx-vlan-offload-improvement-dsa-hardware-untag-support' Felix Fietkau says: ==================== mtk_eth_soc rx vlan offload improvement + dsa hardware untag support This series improves rx vlan offloading on mtk_eth_soc and extends it to support hardware DSA untagging where possible. This improves performance by avoiding calls into the DSA tag driver receive function, including mangling of skb->data. This is split out of a previous series, which added other fixes and multiqueue support ==================== Link: https://lore.kernel.org/r/20221114124214.58199-1-nbd@nbd.name Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-15 20:23:19 -08:00
Felix Fietkau	2d7605a729	net: ethernet: mtk_eth_soc: enable hardware DSA untagging - pass the tag to DSA via metadata dst - disabled on 7986 for now, since it's not working yet - disabled if a MAC is enabled that does not use DSA This improves performance by bypassing the DSA tag driver and avoiding extra skb data mangling Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-15 20:22:08 -08:00
Felix Fietkau	08666cbb7d	net: ethernet: mtk_eth_soc: add support for configuring vlan rx offload Keep the vlan rx offload feature in sync across all netdevs belonging to the device, since the feature is global and can't be turned off per MAC Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-15 20:22:08 -08:00
Felix Fietkau	1904870315	net: ethernet: mtk_eth_soc: pass correct VLAN protocol ID to the network stack Use the id from the DMA descriptor instead of hardcoding 802.1q Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-15 20:22:07 -08:00
Felix Fietkau	570d0a588d	net: dsa: add support for DSA rx offloading via metadata dst If a metadata dst is present with the type METADATA_HW_PORT_MUX on a dsa cpu port netdev, assume that it carries the port number and that there is no DSA tag present in the skb data. Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-15 20:22:07 -08:00
Daniel Machon	7eba450539	net: dcb: move getapptrust to separate function This patch fixes a frame size warning, reported by kernel test robot. >> net/dcb/dcbnl.c:1230:1: warning: the frame size of 1244 bytes is >> larger than 1024 bytes [-Wframe-larger-than=] The getapptrust part of dcbnl_ieee_fill is moved to a separate function, and the selector array is now dynamically allocated, instead of stack allocated. Tested on microchip sparx5 driver. Fixes: `6182d5875c` ("net: dcb: add new apptrust attribute") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://lore.kernel.org/r/20221114092950.2490451-1-daniel.machon@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-11-15 15:27:43 +01:00
Florian Westphal	d2c806abcf	netfilter: conntrack: use siphash_4u64 This function is used for every packet, siphash_4u64 is noticeably faster than using local buffer + siphash: Before: 1.23% kpktgend_0 [kernel.vmlinux] [k] __siphash_unaligned 0.14% kpktgend_0 [nf_conntrack] [k] hash_conntrack_raw After: 0.79% kpktgend_0 [kernel.vmlinux] [k] siphash_4u64 0.15% kpktgend_0 [nf_conntrack] [k] hash_conntrack_raw In the pktgen test this gives about ~2.4% performance improvement. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-11-15 10:53:19 +01:00
Jiapeng Chong	971095c6fa	netfilter: rpfilter/fib: clean up some inconsistent indenting No functional modification involved. net/ipv4/netfilter/nft_fib_ipv4.c:141 nft_fib4_eval() warn: inconsistent indenting. Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2733 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-11-15 10:53:18 +01:00
Phil Sutter	8daa8fde3f	netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESET Analogous to NFT_MSG_GETOBJ_RESET, but for rules: Reset stateful expressions like counters or quotas. The latter two are the only consumers, adjust their 'dump' callbacks to respect the parameter introduced earlier. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-11-15 10:53:17 +01:00
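
Note on the 'net-try_cmpxchg-conversions' series (68d268d089, 4ebf802cf1 through 57fc05e8e8): the commits replace cmpxchg() retry loops with try_cmpxchg() variants. The following is a minimal userspace sketch of that loop shape, not kernel code. It assumes C11 `atomic_compare_exchange_weak()` as a stand-in for the kernel's try_cmpxchg(), and the names `demo_schedule_prep`, `DEMO_SCHED` and `DEMO_DISABLE` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration; not the kernel's NAPI state bits. */
#define DEMO_SCHED   (1UL << 0)
#define DEMO_DISABLE (1UL << 1)

/*
 * Try to set DEMO_SCHED unless DEMO_DISABLE is set.
 *
 * With a plain cmpxchg() the caller compares the returned value against
 * the expected one and reloads by hand. With a try_cmpxchg()-style call
 * (modelled here by atomic_compare_exchange_weak), a failed exchange
 * writes the current value back into `old`, so the loop needs no
 * explicit reload.
 */
static bool demo_schedule_prep(atomic_ulong *state)
{
    unsigned long old, next;

    assert(state);
    old = atomic_load(state);
    do {
        if (old & DEMO_DISABLE)
            return false;           /* disabled: leave the state untouched */
        next = old | DEMO_SCHED;
        /* On failure, `old` now holds the value currently in *state. */
    } while (!atomic_compare_exchange_weak(state, &old, next));

    return !(old & DEMO_SCHED);     /* true only if this call set the bit */
}
```

A first call on a zeroed state sets the bit and returns true; a second call finds the bit already set and returns false.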

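
Note on 5ae0c22634 ("sfc: fix ef100 RX prefix macro"): the message says the width mask used unsigned long arithmetic for a shift of up to 32 bits, which breaks where unsigned long is 32 bits wide. The sketch below shows only the corrected shape, under assumptions: `DEMO_WIDTH_MASK` and `demo_extract_field` are hypothetical names, and the driver's actual macro definition is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a field-width mask macro.
 * Written with 1UL, a width of 32 would shift by the full width of the
 * type on targets where unsigned long is 32 bits, which is undefined
 * behaviour in C. unsigned long long is at least 64 bits everywhere,
 * so widths up to 32 are safe.
 */
#define DEMO_WIDTH_MASK(width) ((1ULL << (width)) - 1)

/* Extract a field of `width` bits starting at bit `lsb` of a 64-bit word. */
static uint32_t demo_extract_field(uint64_t word, unsigned int lsb,
                                   unsigned int width)
{
    assert(width >= 1 && width <= 32 && lsb + width <= 64);
    return (uint32_t)((word >> lsb) & DEMO_WIDTH_MASK(width));
}
```

The 32-bit-wide case is the one the commit calls out, since ESF_GZ_RX_PREFIX_USER_MARK is the first field of that width the driver needed to read.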

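
Note on 9804985bf2 ("udp: Introduce optional per-netns hash table"): the message gives a minimum table size of 128, a maximum of 64K, power-of-two bucket counts, and the bucket expression `(num + net_hash_mix(net)) & mask`. A rough sketch of that arithmetic follows. It is an illustration under assumptions, not the kernel's implementation: the function names are hypothetical, and the per-netns mix is passed in as a plain integer instead of calling net_hash_mix().

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_UDP_HTABLE_SIZE_MIN 128u    /* minimum stated in the commit message */
#define DEMO_UDP_HTABLE_SIZE_MAX 65536u  /* 64K maximum stated in the commit message */

/*
 * Round a requested entry count up to a power of two, clamped to
 * [128, 64K]. A request of 0 means "share the global table" in the
 * commit message and is not modelled here.
 */
static uint32_t demo_udp_table_size(uint32_t requested)
{
    uint32_t size = DEMO_UDP_HTABLE_SIZE_MIN;

    assert(requested > 0);
    while (size < requested && size < DEMO_UDP_HTABLE_SIZE_MAX)
        size <<= 1;
    return size;
}

/* Bucket index: (port + per-netns mix) & (table size - 1). */
static uint32_t demo_udp_bucket(uint16_t port, uint32_t netns_mix,
                                uint32_t mask)
{
    return ((uint32_t)port + netns_mix) & mask;
}
```

With these assumptions, a request for 100 entries yields 128 buckets, which matches the `udp_child_hash_entries=100` example in the commit message.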
1138424 Commits