linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2025-01-06 13:55:08 +08:00

Author	SHA1	Message	Date
Maxime Chevallier	d078d48063	net: ethtool: strset: Allow querying phy stats by index The ETH_SS_PHY_STATS command gets PHY statistics. Use the phydev pointer from the ethnl request to allow query phy stats from each PHY on the link. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-01 18:38:57 +00:00
Maxime Chevallier	fcc4b105ca	net: ethtool: cable-test: Target the command to the requested PHY Cable testing is a PHY-specific command. Instead of targeting the command towards dev->phydev, use the request to pick the targeted PHY. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-01 18:38:57 +00:00
Maxime Chevallier	345237dbc1	net: ethtool: pse-pd: Target the command to the requested PHY PSE and PD configuration is a PHY-specific command. Instead of targeting the command towards dev->phydev, use the request to pick the targeted PHY device. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-01 18:38:57 +00:00
Maxime Chevallier	7db69ec9cf	net: ethtool: plca: Target the command to the requested PHY PLCA is a PHY-specific command. Instead of targeting the command towards dev->phydev, use the request to pick the targeted PHY. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-01 18:38:57 +00:00
Maxime Chevallier	63d5eaf35a	net: ethtool: Introduce a command to list PHYs on an interface As we have the ability to track the PHYs connected to a net_device through the link_topology, we can expose this list to userspace. This allows userspace to use these identifiers for phy-specific commands and take the decision of which PHY to target by knowing the link topology. Add PHY_GET and PHY_DUMP, which can be a filtered DUMP operation to list devices on only one interface. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-01 18:38:57 +00:00
Maxime Chevallier	2ab0edb505	net: ethtool: Allow passing a phy index for some commands Some netlink commands are target towards ethernet PHYs, to control some of their features. As there's several such commands, add the ability to pass a PHY index in the ethnl request, which will populate the generic ethnl_req_info with the relevant phydev when the command targets a PHY. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-01 18:38:57 +00:00
Maxime Chevallier	02018c544e	net: phy: Introduce ethernet link topology representation Link topologies containing multiple network PHYs attached to the same net_device can be found when using a PHY as a media converter for use with an SFP connector, on which an SFP transceiver containing a PHY can be used. With the current model, the transceiver's PHY can't be used for operations such as cable testing, timestamping, macsec offload, etc. The reason being that most of the logic for these configuration, coming from either ethtool netlink or ioctls tend to use netdev->phydev, which in multi-phy systems will reference the PHY closest to the MAC. Introduce a numbering scheme allowing to enumerate PHY devices that belong to any netdev, which can in turn allow userspace to take more precise decisions with regard to each PHY's configuration. The numbering is maintained per-netdev, in a phy_device_list. The numbering works similarly to a netdevice's ifindex, with identifiers that are only recycled once INT_MAX has been reached. This prevents races that could occur between PHY listing and SFP transceiver removal/insertion. The identifiers are assigned at phy_attach time, as the numbering depends on the netdevice the phy is attached to. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-01 18:38:56 +00:00
David S. Miller	109bf4cfe1	netfilter pull request 23-12-22 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmWFd9oACgkQ1V2XiooU IOTBog//auEj3LR5v2b5et6CLsuMESsCFkVko1AFdzzpu94sJ8TTkqlrW7SPSfPW CMNYz1EYFaRGK1GjfH0a49EUofeU/kPD/TmffmiYwuJlGyEIXp17ZIUQnGDrM6Mz UBHX8VrW73McJYuZJbxM4HX6Q3aFyshKy8iOPpffdFAQcCD5XtkYbW24Mzw6V4sI 8fBz6C/iT+5Jq1wGtvM2xmGkNYuMLJxbf9qOpinRZCXQV/z5IWda/bFnv3kpJ9mR x2ZYXHnzFEr2gVFIhZ7G6FvMgQAkkKGVQE+n9zVZ9V78czqBk9depdeLaq/RYGyx 8VYqJyaXHKQm7FqsBwSSxU964aQhi/zNrjF9SeJSfcRNhTmXgzdERndVWVjUPe9O rFLyQVoQ6JyKy5++1lL3+63cd6ZeGw8gUw8MCw9DX8rwamnNFunUsixKv4SLzQHA jfFgmAUq+HVWHXI4xmj+SDO8g+s7O3BpbZM09E3si6lwy4/LiQqHhV4sNgEVN/It hf76t/U6GRYZ0Hwy2dwQl8SZg1BxMSGFmzB7vyEZyGXaqxRZb1Jym4h9slOMKXWl gS5y4y0ZVTmzRCqJR+jUrMAKPCw/R964LrslDYwQpsQeP7nuUPeU5jH1VWD6kAbS Vc36uDRnfojoNiRcmAfuhvZIseM1+hI6UXCIsqYR6cekCFlBad8= =+Uvv -----END PGP SIGNATURE----- Merge tag 'nf-next-23-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== netfilter pull request 23-12-22 The following patchset contains Netfilter updates for net-next: 1) Add locking for NFT_MSG_GETSETELEM_RESET requests, to address a race scenario with two concurrent processes running a dump-and-reset which exposes negative counters to userspace, from Phil Sutter. 2) Use GFP_KERNEL in pipapo GC, from Florian Westphal. 3) Reorder nf_flowtable struct members, place the read-mostly parts accessed by the datapath first. From Florian Westphal. 4) Set on dead flag for NFT_MSG_NEWSET in abort path, from Florian Westphal. 5) Support filtering zone in ctnetlink, from Felix Huettner. 6) Bail out if user tries to redefine an existing chain with different type in nf_tables. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-01 16:15:40 +00:00
Ido Schimmel	cd4d7263d5	genetlink: Use internal flags for multicast groups As explained in commit `e03781879a` ("drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group"), the "flags" field in the multicast group structure reuses uAPI flags despite the field not being exposed to user space. This makes it impossible to extend its use without adding new uAPI flags, which is inappropriate for internal kernel checks. Solve this by adding internal flags (i.e., "GENL_MCAST_*") and convert the existing users to use them instead of the uAPI flags. Tested using the reproducers in commit `44ec98ea5e` ("psample: Require 'CAP_NET_ADMIN' when joining "packets" group") and commit `e03781879a` ("drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group"). No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-29 08:43:59 +00:00
Greg Kroah-Hartman	f732ba4ac9	iucv: make iucv_bus const Now that the driver core can properly handle constant struct bus_type, move the iucv_bus variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Wenjia Zhang <wenjia@linux.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: linux-s390@vger.kernel.org Cc: netdev@vger.kernel.org Acked-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-29 07:46:38 +00:00
Kevin Hao	3fb65f6bc7	net: pktgen: Use wait_event_freezable_timeout() for freezable kthread A freezable kernel thread can enter frozen state during freezing by either calling try_to_freeze() or using wait_event_freezable() and its variants. So for the following snippet of code in a kernel thread loop: wait_event_interruptible_timeout(); try_to_freeze(); We can change it to a simple wait_event_freezable_timeout() and then eliminate a function call. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-27 14:34:52 +00:00
Radu Pirea (NXP OSS)	90abde49ea	net: rename dsa_realloc_skb to skb_ensure_writable_head_tail Rename dsa_realloc_skb to skb_ensure_writable_head_tail and move it to skbuff.c to use it as helper. Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-27 13:08:09 +00:00
Lin Ma	c2b2ee3625	bridge: cfm: fix enum typo in br_cc_ccm_tx_parse It appears that there is a typo in the code where the nlattr array is being parsed with policy br_cfm_cc_ccm_tx_policy, but the instance is being accessed via IFLA_BRIDGE_CFM_CC_RDI_INSTANCE, which is associated with the policy br_cfm_cc_rdi_policy. This problem was introduced by commit `2be665c394` ("bridge: cfm: Netlink SET configuration Interface."). Though it seems like a harmless typo since these two enum owns the exact same value (1 here), it is quite misleading hence fix it by using the correct enum IFLA_BRIDGE_CFM_CC_CCM_TX_INSTANCE here. Signed-off-by: Lin Ma <linma@zju.edu.cn> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-26 22:38:13 +00:00
Maxim Galaganov	c85636a292	mptcp: sockopt: support IP_LOCAL_PORT_RANGE and IP_BIND_ADDRESS_NO_PORT Support for IP_BIND_ADDRESS_NO_PORT sockopt was introduced in [1]. Recently [2] allowed its value to be accessed without locking the socket. Support for (newer) IP_LOCAL_PORT_RANGE sockopt was introduced in [3]. In the same series a selftest was added in [4]. This selftest also covers the IP_BIND_ADDRESS_NO_PORT sockopt. This patch enables getsockopt()/setsockopt() on MPTCP sockets for these socket options, syncing set values to subflows in sync_socket_options(). Ephemeral port range is synced to subflows, enabling NAT usecase described in [3]. [1] commit `90c337da15` ("inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations") [2] commit `ca571e2eb7` ("inet: move inet->bind_address_no_port to inet->inet_flags") [3] commit `91d0b78c51` ("inet: Add IP_LOCAL_PORT_RANGE socket option") [4] commit `ae5439658c` ("selftests/net: Cover the IP_LOCAL_PORT_RANGE socket option") Signed-off-by: Maxim Galaganov <max@internet.ru> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-26 22:33:21 +00:00
Maxim Galaganov	57d3117ca8	mptcp: rename mptcp_setsockopt_sol_ip_set_transparent() Next patch extends this function so that it's not specific to IP_TRANSPARENT. Change function name to mptcp_setsockopt_sol_ip_set(). Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Maxim Galaganov <max@internet.ru> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-26 22:33:21 +00:00
Davide Caratti	8e2b8a9fa5	mptcp: don't overwrite sock_ops in mptcp_is_tcpsk() Eric Dumazet suggests: > The fact that mptcp_is_tcpsk() was able to write over sock->ops was a > bit strange to me. > mptcp_is_tcpsk() should answer a question, with a read-only argument. re-factor code to avoid overwriting sock_ops inside that function. Also, change the helper name to reflect the semantics and to disambiguate from its dual, sk_is_mptcp(). While at it, collapse mptcp_stream_accept() and mptcp_accept() into a single function, where fallback / non-fallback are separated into a single sk_is_mptcp() conditional. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/432 Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-26 22:33:21 +00:00
Victor Nogueira	42f39036cd	net/sched: act_mirred: Allow mirred to block So far the mirred action has dealt with syntax that handles mirror/redirection for netdev. A matching packet is redirected or mirrored to a target netdev. In this patch we enable mirred to mirror to a tc block as well. IOW, the new syntax looks as follows: ... mirred <ingress \| egress> <mirror \| redirect> [index INDEX] < <blockid BLOCKID> \| <dev <devname>> > Examples of mirroring or redirecting to a tc block: $ tc filter add block 22 protocol ip pref 25 \ flower dst_ip 192.168.0.0/16 action mirred egress mirror blockid 22 $ tc filter add block 22 protocol ip pref 25 \ flower dst_ip 10.10.10.10/32 action mirred egress redirect blockid 22 Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Co-developed-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-26 21:20:09 +00:00
Victor Nogueira	415e38bf1d	net/sched: act_mirred: Add helper function tcf_mirred_replace_dev The act of replacing a device will be repeated by the init logic for the block ID in the patch that allows mirred to a block. Therefore we encapsulate this functionality in a function (tcf_mirred_replace_dev) so that we can reuse it and avoid code repetition. Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Co-developed-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-26 21:20:08 +00:00
Victor Nogueira	16085e48cb	net/sched: act_mirred: Create function tcf_mirred_to_dev and improve readability As a preparation for adding block ID to mirred, separate the part of mirred that redirect/mirrors to a dev into a specific function so that it can be called by blockcast for each dev. Also improve readability. Eg. rename use_reinsert to dont_clone and skb2 to skb_to_send. Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Co-developed-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-26 21:20:08 +00:00
Victor Nogueira	a7042cf8f2	net/sched: cls_api: Expose tc block to the datapath The datapath can now find the block of the port in which the packet arrived at. In the next patch we show a possible usage of this patch in a new version of mirred that multicasts to all ports except for the port in which the packet arrived on. Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Co-developed-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-26 21:20:08 +00:00
Victor Nogueira	913b47d342	net/sched: Introduce tc block netdev tracking infra This commit makes tc blocks track which ports have been added to them. And, with that, we'll be able to use this new information to send packets to the block's ports. Which will be done in the patch #3 of this series. Suggested-by: Jiri Pirko <jiri@nvidia.com> Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Co-developed-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-26 21:20:08 +00:00
Denis Kirjanov	8e5443d2b8	net: remove SOCK_DEBUG leftovers SOCK_DEBUG comes from the old days. Let's move logging to standard net core ratelimited logging functions Signed-off-by: Denis Kirjanov <dkirjanov@suse.de> changes in v2: - remove SOCK_DEBUG macro altogether Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-26 20:31:01 +00:00
Wen Gu	b3bf76024f	net/smc: manage system EID in SMC stack instead of ISM driver The System EID (SEID) is an internal EID that is used by the SMCv2 software stack that has a predefined and constant value representing the s390 physical machine that the OS is executing on. So it should be managed by SMC stack instead of ISM driver and be consistent for all ISMv2 device (including virtual ISM devices) on s390 architecture. Suggested-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-and-tested-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-26 20:24:33 +00:00
Wen Gu	c6b8b8eb49	net/smc: disable SEID on non-s390 archs where virtual ISM may be used The system EID (SEID) is an internal EID used by SMC-D to represent the s390 physical machine that OS is executing on. On s390 architecture, it predefined by fixed string and part of cpuid and is enabled regardless of whether underlay device is virtual ISM or platform firmware ISM. However on non-s390 architectures where SMC-D can be used with virtual ISM devices, there is no similar information to identify physical machines, especially in virtualization scenarios. So in such cases, SEID is forcibly disabled and the user-defined UEID will be used to represent the communicable space. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-and-tested-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-26 20:24:33 +00:00
Wen Gu	01fd1617db	net/smc: support extended GID in SMC-D lgr netlink attribute Virtual ISM devices introduced in SMCv2.1 requires a 128 bit extended GID vs. the existing ISM 64bit GID. So the 2nd 64 bit of extended GID should be included in SMC-D linkgroup netlink attribute as well. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-26 20:24:33 +00:00
Wen Gu	b40584d145	net/smc: compatible with 128-bits extended GID of virtual ISM device According to virtual ISM support feature defined by SMCv2.1, GIDs of virtual ISM device are UUIDs defined by RFC4122, which are 128-bits long. So some adaptation work is required. And note that the GIDs of existing platform firmware ISM devices still remain 64-bits long. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-26 20:24:33 +00:00
Wen Gu	8dd512df3c	net/smc: define a reserved CHID range for virtual ISM devices According to virtual ISM support feature defined by SMCv2.1, CHIDs in the range 0xFF00 to 0xFFFF are reserved for use by virtual ISM devices. And two helpers are introduced to distinguish virtual ISM devices from the existing platform firmware ISM devices. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-and-tested-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-26 20:24:33 +00:00
Wen Gu	00e006a257	net/smc: introduce virtual ISM device support feature This introduces virtual ISM device support feature to SMCv2.1 as the first supplemental feature. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-26 20:24:33 +00:00
Wen Gu	ece60db3a4	net/smc: support SMCv2.x supplemental features negotiation This patch adds SMCv2.x supplemental features negotiation. Supported SMCv2.x supplemental features are represented by feature_mask in FCE field. The negotiation process is as follows. Server Client Proposal(features(c-mask bits)) <----------------------------------------- Accept(features(s-mask bits)) -----------------------------------------> Confirm(features(s&c-mask bits)) <----------------------------------------- Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-and-tested-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-26 20:24:33 +00:00
Wen Gu	9505450d55	net/smc: unify the structs of accept or confirm message for v1 and v2 The structs of CLC accept and confirm messages for SMCv1 and SMCv2 are separately defined and often casted to each other in the code, which may increase the risk of errors caused by future divergence of them. So unify them into one struct for better maintainability. Suggested-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-26 20:24:33 +00:00
Wen Gu	5205ac4483	net/smc: introduce sub-functions for smc_clc_send_confirm_accept() There is a large if-else block in smc_clc_send_confirm_accept() and it is better to split it into two sub-functions. Suggested-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-26 20:24:32 +00:00
Wen Gu	ac053a169c	net/smc: rename some 'fce' to 'fce_v2x' for clarity Rename some functions or variables with 'fce' in their name but used in SMCv2.1 as 'fce_v2x' for clarity. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-26 20:24:32 +00:00
Jonathan Corbet	45248f2902	tipc: Remove some excess struct member documentation Remove documentation for nonexistent struct members, addressing these warnings: ./net/tipc/link.c:228: warning: Excess struct member 'media_addr' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'timer' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'refcnt' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'proto_msg' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'pmsg' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'backlog_limit' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'exp_msg_count' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'reset_rcv_checkpt' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'transmitq' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'snt_nxt' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'deferred_queue' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'unacked_window' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'next_out' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'long_msg_seq_no' description in 'tipc_link' ./net/tipc/link.c:228: warning: Excess struct member 'bc_rcvr' description in 'tipc_link' Signed-off-by: Jonathan Corbet <corbet@lwn.net> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 23:14:43 +00:00
Kuniyuki Iwashima	8191792c18	tcp: Remove dead code and fields for bhash2. Now all sockets including TIME_WAIT are linked to bhash2 using sock_common.skc_bind_node. We no longer use inet_bind2_bucket.deathrow, sock.sk_bind2_node, and inet_timewait_sock.tw_bind2_node. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:35 +00:00
Kuniyuki Iwashima	770041d337	tcp: Link sk and twsk to tb2->owners using skc_bind_node. Now we can use sk_bind_node/tw_bind_node for bhash2, which means we need not link TIME_WAIT sockets separately. The dead code and sk_bind2_node will be removed in the next patch. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:35 +00:00
Kuniyuki Iwashima	b2cb9f9ef2	tcp: Unlink sk from bhash. Now we do not use tb->owners and can unlink sockets from bhash. sk_bind_node/tw_bind_node are available for bhash2 and will be used in the following patch. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:35 +00:00
Kuniyuki Iwashima	8002d44fe8	tcp: Check hlist_empty(&tb->bhash2) instead of hlist_empty(&tb->owners). We use hlist_empty(&tb->owners) to check if the bhash bucket has a socket. We can check the child bhash2 buckets instead. For this to work, the bhash2 bucket must be freed before the bhash bucket. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:35 +00:00
Kuniyuki Iwashima	b82ba728cc	tcp: Iterate tb->bhash2 in inet_csk_bind_conflict(). Sockets in bhash are also linked to bhash2, but TIME_WAIT sockets are linked separately in tb2->deathrow. Let's replace tb->owners iteration in inet_csk_bind_conflict() with two iterations over tb2->owners and tb2->deathrow. This can be done safely under bhash's lock because socket insertion/ deletion in bhash2 happens with bhash's lock held. Note that twsk_for_each_bound_bhash() will be removed later. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:35 +00:00
Kuniyuki Iwashima	58655bc0ad	tcp: Rearrange tests in inet_csk_bind_conflict(). The following patch adds code in the !inet_use_bhash2_on_bind(sk) case in inet_csk_bind_conflict(). To avoid adding nest and make the change cleaner, this patch rearranges tests in inet_csk_bind_conflict(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:35 +00:00
Kuniyuki Iwashima	822fb91fc7	tcp: Link bhash2 to bhash. bhash2 added a new member sk_bind2_node in struct sock to link sockets to bhash2 in addition to bhash. bhash is still needed to search conflicting sockets efficiently from a port for the wildcard address. However, bhash itself need not have sockets. If we link each bhash2 bucket to the corresponding bhash bucket, we can iterate the same set of the sockets from bhash2 via bhash. This patch links bhash2 to bhash only, and the actual use will be in the later patches. Finally, we will remove sk_bind2_node. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:34 +00:00
Kuniyuki Iwashima	4dd7108854	tcp: Rename tb in inet_bind2_bucket_(init\|create)(). Later, we no longer link sockets to bhash. Instead, each bhash2 bucket is linked to the corresponding bhash bucket. Then, we pass the bhash bucket to bhash2 allocation functions as tb. However, tb is already used in inet_bind2_bucket_create() and inet_bind2_bucket_init() as the bhash2 bucket. To make the following diff clear, let's use tb2 for the bhash2 bucket there. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:34 +00:00
Kuniyuki Iwashima	5a22bba13d	tcp: Save address type in inet_bind2_bucket. inet_bind2_bucket_addr_match() and inet_bind2_bucket_match_addr_any() are called for each bhash2 bucket to check conflicts. Thus, we call ipv6_addr_any() and ipv6_addr_v4mapped() over and over during bind(). Let's avoid calling them by saving the address type in inet_bind2_bucket. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:34 +00:00
Kuniyuki Iwashima	06a8c04f89	tcp: Save v4 address as v4-mapped-v6 in inet_bind2_bucket.v6_rcv_saddr. In bhash2, IPv4/IPv6 addresses are saved in two union members, which complicate address checks in inet_bind2_bucket_addr_match() and inet_bind2_bucket_match_addr_any() considering uninitialised memory and v4-mapped-v6 conflicts. Let's simplify that by saving IPv4 address as v4-mapped-v6 address and defining tb2.rcv_saddr as tb2.v6_rcv_saddr.s6_addr32[3]. Then, we can compare v6 address as is, and after checking v4-mapped-v6, we can compare v4 address easily. Also, we can remove tb2->family. Note these functions will be further refactored in the next patch. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:34 +00:00
Kuniyuki Iwashima	56f3e3f01f	tcp: Rearrange tests in inet_bind2_bucket_(addr_match\|match_addr_any)(). The protocol family tests in inet_bind2_bucket_addr_match() and inet_bind2_bucket_match_addr_any() are ordered as follows. if (sk->sk_family != tb2->family) else if (sk->sk_family == AF_INET6) else This patch rearranges them so that AF_INET6 socket is handled first to make the following patch tidy, where tb2->family will be removed. if (sk->sk_family == AF_INET6) else if (tb2->family == AF_INET6) else Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:34 +00:00
Kuniyuki Iwashima	5e07e67241	tcp: Use bhash2 for v4-mapped-v6 non-wildcard address. While checking port availability in bind() or listen(), we used only bhash for all v4-mapped-v6 addresses. But there is no good reason not to use bhash2 for v4-mapped-v6 non-wildcard addresses. Let's do it by returning true in inet_use_bhash2_on_bind(). Then, we also need to add a test in inet_bind2_bucket_match_addr_any() so that ::ffff:X.X.X.X will match with 0.0.0.0. Note that sk->sk_rcv_saddr is initialised for v4-mapped-v6 sk in __inet6_bind(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:34 +00:00
Pablo Neira Ayuso	aaba7ddc85	netfilter: nf_tables: validate chain type update if available Parse netlink attribute containing the chain type in this update, to bail out if this is different from the existing type. Otherwise, it is possible to define a chain with the same name, hook and priority but different type, which is silently ignored. Fixes: `96518518cc` ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2023-12-22 12:15:28 +01:00
Felix Huettner	eff3c558bb	netfilter: ctnetlink: support filtering by zone conntrack zones are heavily used by tools like openvswitch to run multiple virtual "routers" on a single machine. In this context each conntrack zone matches to a single router, thereby preventing overlapping IPs from becoming issues. In these systems it is common to operate on all conntrack entries of a given zone, e.g. to delete them when a router is deleted. Previously this required these tools to dump the full conntrack table and filter out the relevant entries in userspace potentially causing performance issues. To do this we reuse the existing CTA_ZONE attribute. This was previous parsed but not used during dump and flush requests. Now if CTA_ZONE is set we filter these operations based on the provided zone. However this means that users that previously passed CTA_ZONE will experience a difference in functionality. Alternatively CTA_FILTER could have been used for the same functionality. However it is not yet supported during flush requests and is only available when using AF_INET or AF_INET6. Co-developed-by: Luca Czesla <luca.czesla@mail.schwarz> Signed-off-by: Luca Czesla <luca.czesla@mail.schwarz> Co-developed-by: Max Lamprecht <max.lamprecht@mail.schwarz> Signed-off-by: Max Lamprecht <max.lamprecht@mail.schwarz> Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2023-12-22 12:15:20 +01:00
Florian Westphal	08e4c8c591	netfilter: nf_tables: mark newset as dead on transaction abort If a transaction is aborted, we should mark the to-be-released NEWSET dead, just like commit path does for DEL and DESTROYSET commands. In both cases all remaining elements will be released via set->ops->destroy(). The existing abort code does NOT post the actual release to the work queue. Also the entire __nf_tables_abort() function is wrapped in gc_seq begin/end pair. Therefore, async gc worker will never try to release the pending set elements, as gc sequence is always stale. It might be possible to speed up transaction aborts via work queue too, this would result in a race and a possible use-after-free. So fix this before it becomes an issue. Fixes: `5f68718b34` ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2023-12-22 12:08:38 +01:00
Florian Westphal	ffb40fba40	netfilter: nft_set_pipapo: prefer gfp_kernel allocation No need to use GFP_ATOMIC here. Fixes: `f6c383b8c3` ("netfilter: nf_tables: adapt set backend to use GC transaction API") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2023-12-22 12:08:38 +01:00
Phil Sutter	3d483faa66	netfilter: nf_tables: Add locking for NFT_MSG_GETSETELEM_RESET requests Set expressions' dump callbacks are not concurrency-safe per-se with reset bit set. If two CPUs reset the same element at the same time, values may underrun at least with element-attached counters and quotas. Prevent this by introducing dedicated callbacks for nfnetlink and the asynchronous dump handling to serialize access. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2023-12-22 12:08:38 +01:00

1 2 3 4 5 ...

75562 Commits