linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-09-22 12:44:11 +08:00

Author	SHA1	Message	Date
Zhengchao Shao	c19d893fbf	net: sched: delete duplicate cleanup of backlog and qlen qdisc_reset() is clearing qdisc->q.qlen and qdisc->qstats.backlog _after_ calling qdisc->ops->reset. There is no need to clear them again in the specific reset function. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20220824005231.345727-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-25 15:10:17 +02:00
Wenjuan Geng	ff763011ee	nfp: flower: support case of match on ct_state(0/0x3f) is_post_ct_flow() function will process only ct_state ESTABLISHED, then offload_pre_check() function will check FLOW_DISSECTOR_KEY_CT flag. When config tc filter match ct_state(0/0x3f), dissector->used_keys with FLOW_DISSECTOR_KEY_CT bit, function offload_pre_check() will return false, so not offload. This is a special case that can be handled safely. Therefore, modify to let initial packet which won't go through conntrack can be offloaded, as long as the cared ct fields are all zero. Signed-off-by: Wenjuan Geng <wenjuan.geng@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20220823090122.403631-1-simon.horman@corigine.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-25 11:03:25 +02:00
Richard Gobert	35ffb66547	net: gro: skb_gro_header helper function Introduce a simple helper function to replace a common pattern. When accessing the GRO header, we fetch the pointer from frag0, then test its validity and fetch it from the skb when necessary. This leads to the pattern skb_gro_header_fast -> skb_gro_header_hard -> skb_gro_header_slow recurring many times throughout GRO code. This patch replaces these patterns with a single inlined function call, improving code readability. Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220823071034.GA56142@debian Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-25 10:33:21 +02:00
Jiri Pirko	77a70f9c5b	Documentation: devlink: fix the locking section As all callbacks are converted now, fix the text reflecting that change. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20220823070213.1008956-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-24 19:32:02 -07:00
Jakub Kicinski	c21e1bf4d8	Merge branch 'add-a-second-bind-table-hashed-by-port-and-address' Joanne Koong says: ==================== Add a second bind table hashed by port and address Currently, there is one bind hashtable (bhash) that hashes by port only. This patchset adds a second bind table (bhash2) that hashes by port and address. The motivation for adding bhash2 is to expedite bind requests in situations where the port has many sockets in its bhash table entry (eg a large number of sockets bound to different addresses on the same port), which makes checking bind conflicts costly especially given that we acquire the table entry spinlock while doing so, which can cause softirq cpu lockups and can prevent new tcp connections. We ran into this problem at Meta where the traffic team binds a large number of IPs to port 443 and the bind() call took a significant amount of time which led to cpu softirq lockups, which caused packet drops and other failures on the machine. When experimentally testing this on a local server for ~24k sockets bound to the port, the results seen were: ipv4: before - 0.002317 seconds with bhash2 - 0.000020 seconds ipv6: before - 0.002431 seconds with bhash2 - 0.000021 seconds The additions to the initial bhash2 submission [0] are: * Updating bhash2 in the cases where a socket's rcv saddr changes after it has * been bound * Adding locks for bhash2 hashbuckets [0] https://lore.kernel.org/netdev/20220520001834.2247810-1-kuba@kernel.org/ v3: https://lore.kernel.org/netdev/20220722195406.1304948-2-joannelkoong@gmail.com/ v2: https://lore.kernel.org/netdev/20220712235310.1935121-1-joannelkoong@gmail.com/ v1: https://lore.kernel.org/netdev/20220623234242.2083895-2-joannelkoong@gmail.com/ ==================== Link: https://lore.kernel.org/r/20220822181023.3979645-1-joannelkoong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-24 19:30:19 -07:00
Joanne Koong	1be9ac87a7	selftests/net: Add sk_bind_sendto_listen and sk_connect_zero_addr This patch adds 2 new tests: sk_bind_sendto_listen and sk_connect_zero_addr. The sk_bind_sendto_listen test exercises the path where a socket's rcv saddr changes after it has been added to the binding tables, and then a listen() on the socket is invoked. The listen() should succeed. The sk_bind_sendto_listen test is copied over from one of syzbot's tests: https://syzkaller.appspot.com/x/repro.c?x=1673a38df00000 The sk_connect_zero_addr test exercises the path where the socket was never previously added to the binding tables and it gets assigned a saddr upon a connect() to address 0. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-24 19:30:09 -07:00
Joanne Koong	c35ecb95c4	selftests/net: Add test for timing a bind request to a port with a populated bhash entry This test populates the bhash table for a given port with MAX_THREADS * MAX_CONNECTIONS sockets, and then times how long a bind request on the port takes. When populating the bhash table, we create the sockets and then bind the sockets to the same address and port (SO_REUSEADDR and SO_REUSEPORT are set). When timing how long a bind on the port takes, we bind on a different address without SO_REUSEPORT set. We do not set SO_REUSEPORT because we are interested in the case where the bind request does not go through the tb->fastreuseport path, which is fragile (eg tb->fastreuseport path does not work if binding with a different uid). To run the script: Usage: ./bind_bhash.sh [-6 \| -4] [-p port] [-a address] 6: use ipv6 4: use ipv4 port: Port number address: ip address Without any arguments, ./bind_bhash.sh defaults to ipv6 using ip address "2001:0db8:0:f101::1" on port 443. On my local machine, I see: ipv4: before - 0.002317 seconds with bhash2 - 0.000020 seconds ipv6: before - 0.002431 seconds with bhash2 - 0.000021 seconds Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-24 19:30:09 -07:00
Joanne Koong	28044fc1d4	net: Add a bhash2 table hashed by port and address The current bind hashtable (bhash) is hashed by port only. In the socket bind path, we have to check for bind conflicts by traversing the specified port's inet_bind_bucket while holding the hashbucket's spinlock (see inet_csk_get_port() and inet_csk_bind_conflict()). In instances where there are tons of sockets hashed to the same port at different addresses, the bind conflict check is time-intensive and can cause softirq cpu lockups, as well as stops new tcp connections since __inet_inherit_port() also contests for the spinlock. This patch adds a second bind table, bhash2, that hashes by port and sk->sk_rcv_saddr (ipv4) and sk->sk_v6_rcv_saddr (ipv6). Searching the bhash2 table leads to significantly faster conflict resolution and less time holding the hashbucket spinlock. Please note a few things: * There can be the case where the a socket's address changes after it has been bound. There are two cases where this happens: 1) The case where there is a bind() call on INADDR_ANY (ipv4) or IPV6_ADDR_ANY (ipv6) and then a connect() call. The kernel will assign the socket an address when it handles the connect() 2) In inet_sk_reselect_saddr(), which is called when rebuilding the sk header and a few pre-conditions are met (eg rerouting fails). In these two cases, we need to update the bhash2 table by removing the entry for the old address, and add a new entry reflecting the updated address. * The bhash2 table must have its own lock, even though concurrent accesses on the same port are protected by the bhash lock. Bhash2 must have its own lock to protect against cases where sockets on different ports hash to different bhash hashbuckets but to the same bhash2 hashbucket. This brings up a few stipulations: 1) When acquiring both the bhash and the bhash2 lock, the bhash2 lock will always be acquired after the bhash lock and released before the bhash lock is released. 2) There are no nested bhash2 hashbucket locks. A bhash2 lock is always acquired+released before another bhash2 lock is acquired+released. * The bhash table cannot be superseded by the bhash2 table because for bind requests on INADDR_ANY (ipv4) or IPV6_ADDR_ANY (ipv6), every socket bound to that port must be checked for a potential conflict. The bhash table is the only source of port->socket associations. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-24 19:30:07 -07:00
Zhengchao Shao	0bf73255d3	netlink: fix some kernel-doc comments Modify the comment of input parameter of nlmsg_ and nla_ function. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20220824013621.365103-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-24 18:46:16 -07:00
Randy Dunlap	35bbe652c4	net: ethernet: ti: davinci_mdio: fix build for mdio bitbang uses davinci_mdio.c uses mdio bitbang APIs, so it should select MDIO_BITBANG to prevent build errors. arm-linux-gnueabi-ld: drivers/net/ethernet/ti/davinci_mdio.o: in function `davinci_mdio_remove': drivers/net/ethernet/ti/davinci_mdio.c:649: undefined reference to `free_mdio_bitbang' arm-linux-gnueabi-ld: drivers/net/ethernet/ti/davinci_mdio.o: in function `davinci_mdio_probe': drivers/net/ethernet/ti/davinci_mdio.c:545: undefined reference to `alloc_mdio_bitbang' arm-linux-gnueabi-ld: drivers/net/ethernet/ti/davinci_mdio.o: in function `davinci_mdiobb_read': drivers/net/ethernet/ti/davinci_mdio.c:236: undefined reference to `mdiobb_read' arm-linux-gnueabi-ld: drivers/net/ethernet/ti/davinci_mdio.o: in function `davinci_mdiobb_write': drivers/net/ethernet/ti/davinci_mdio.c:253: undefined reference to `mdiobb_write' Fixes: `d04807b806` ("net: ethernet: ti: davinci_mdio: Add workaround for errata i2329") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Ravi Gunasekaran <r-gunasekaran@ti.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Sudip Mukherjee (Codethink) <sudipm.mukherjee@gmail.com> Link: https://lore.kernel.org/r/20220824024216.4939-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-24 18:44:24 -07:00
Bagas Sanjaya	1faa34672f	Documentation: sysctl: align cells in second content column Stephen Rothwell reported htmldocs warning when merging net-next tree: Documentation/admin-guide/sysctl/net.rst:37: WARNING: Malformed table. Text in column margin in table line 4. ========= =================== = ========== ================== Directory Content Directory Content ========= =================== = ========== ================== 802 E802 protocol mptcp Multipath TCP appletalk Appletalk protocol netfilter Network Filter ax25 AX25 netrom NET/ROM bridge Bridging rose X.25 PLP layer core General parameter tipc TIPC ethernet Ethernet protocol unix Unix domain sockets ipv4 IP version 4 x25 X.25 protocol ipv6 IP version 6 ========= =================== = ========== ================== The warning above is caused by cells in second "Content" column of /proc/sys/net subdirectory table which are in column margin. Align these cells against the column header to fix the warning. Link: https://lore.kernel.org/linux-next/20220823134905.57ed08d5@canb.auug.org.au/ Fixes: `1202cdd665` ("Remove DECnet support from kernel") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/r/20220824035804.204322-1-bagasdotme@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-24 18:42:51 -07:00
David S. Miller	8357d67f5e	Merge branch 'r8169-next' Heiner Kallweit says: ==================== r8169: remove support for few unused chip versions There's a number of chip versions that apparently never made it to the mass market. Detection of these chip versions has been disabled for few kernel versions now and nobody complained. Therefore remove support for these chip versions. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:24:10 +01:00
Heiner Kallweit	efc37109c7	r8169: remove support for chip version 60 Detection of this chip version has been disabled for few kernel versions now. Nobody complained, so remove support for this chip version. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:24:10 +01:00
Heiner Kallweit	133706a960	r8169: remove support for chip version 50 Detection of this chip version has been disabled for few kernel versions now. Nobody complained, so remove support for this chip version. v3: - rebase patch Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:24:10 +01:00
Heiner Kallweit	8a1ab0c402	r8169: remove support for chip version 49 Detection of this chip version has been disabled for few kernel versions now. Nobody complained, so remove support for this chip version. v2: - fix a typo: RTL_GIGA_MAC_VER_40 -> RTL_GIGA_MAC_VER_50 Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:24:10 +01:00
Heiner Kallweit	ebe5989857	r8169: remove support for chip versions 45 and 47 Detection of these chip versions has been disabled for few kernel versions now. Nobody complained, so remove support for this chip version. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:24:10 +01:00
Heiner Kallweit	44307b27de	r8169: remove support for chip version 41 Detection of this chip version has been disabled for few kernel versions now. Nobody complained, so remove support for this chip version. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:24:10 +01:00
David S. Miller	1cd5ea448f	mlx5-updates-2022-08-22 Roi Dayan Says: =============== Add support for SF tunnel offload Mlx5 driver only supports VF tunnel offload. To add support for SF tunnel offload the driver needs to: 1. Add send-to-vport metadata matching rules like done for VFs. 2. Set an indirect table for SF vport, same as VF vport. info smaller sub functions for better maintainability. rules from esw init phase to representor load phase. SFs could be created after esw initialized and thus the send-to-vport meta rules would not be created for those SFs. By moving the creation of the rules to representor load phase we ensure creating the rules also for SFs created later. =============== Lama Kayal Says: ================ Make flow steering API loosely coupled from mlx5e_priv, in a manner to introduce more readable and maintainable modules. Make TC's private, let mlx5e_flow_steering struct be dynamically allocated, and introduce its API to maintain the code via setters and getters instead of publicly exposing it. Introduce flow steering debug macros to provide an elegant finish to the decoupled flow steering API, where errors related to flow steering shall be reported via them. All flow steering related files will drop any coupling to mlx5e_priv, instead they will get the relevant members as input. Among these, fs_tt_redirect, fs_tc, and arfs. ================ -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmMEaToACgkQSD+KveBX +j7H1wf8DPInwK4MsgilUZFXNd03RJprlptami+Feev2wmuzOIOwyTxOc+VvowFM G6r0Yuqahg0z5oE7zgxMNNdtKevjqmaNLWuDtbvv4UQKt7hwJz28Y0Ezuz3L9sQS JVEQpII/FhUFzHoeDTbbJojKGTO7N8hS4D7ZB9wGeQc7M8PmLiUq19PURKwGDxUq EhNwEgxOzbd6S5OMGfD0bDOOK+TpntKctkXUpYh7rr6JlLQpor03cBfYrOx9DcST M3Yk9ianx4fmeOeyZAAZOEzUpBkwMt74SxelIcwAAei9bcNRRI3FoNLc+PcKJjz1 c567AeSzx/HYnaoydtGFOroq8lAwnw== =pgqu -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2022-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux mlx5-updates-2022-08-22 Roi Dayan Says: =============== Add support for SF tunnel offload Mlx5 driver only supports VF tunnel offload. To add support for SF tunnel offload the driver needs to: 1. Add send-to-vport metadata matching rules like done for VFs. 2. Set an indirect table for SF vport, same as VF vport. info smaller sub functions for better maintainability. rules from esw init phase to representor load phase. SFs could be created after esw initialized and thus the send-to-vport meta rules would not be created for those SFs. By moving the creation of the rules to representor load phase we ensure creating the rules also for SFs created later. =============== Lama Kayal Says: ================ Make flow steering API loosely coupled from mlx5e_priv, in a manner to introduce more readable and maintainable modules. Make TC's private, let mlx5e_flow_steering struct be dynamically allocated, and introduce its API to maintain the code via setters and getters instead of publicly exposing it. Introduce flow steering debug macros to provide an elegant finish to the decoupled flow steering API, where errors related to flow steering shall be reported via them. All flow steering related files will drop any coupling to mlx5e_priv, instead they will get the relevant members as input. Among these, fs_tt_redirect, fs_tc, and arfs. ================	2022-08-24 13:19:39 +01:00
Jerry Ray	fef5de753f	micrel: ksz8851: fixes struct pointer issue Issue found during code review. This bug has no impact as long as the ks8851_net structure is the first element of the ks8851_net_spi structure. As long as the offset to the ks8851_net struct is zero, the container_of() macro is subtracting 0 and therefore no damage done. But if the ks8851_net_spi struct is ever modified such that the ks8851_net struct within it is no longer the first element of the struct, then the bug would manifest itself and cause problems. struct ks8851_net is contained within ks8851_net_spi. ks is contained within kss. kss is the priv_data of the netdev structure. Signed-off-by: Jerry Ray <jerry.ray@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:02:15 +01:00
Eric Dumazet	aacd467c0a	tcp: annotate data-race around tcp_md5sig_pool_populated tcp_md5sig_pool_populated can be read while another thread changes its value. The race has no consequence because allocations are protected with tcp_md5sig_mutex. This patch adds READ_ONCE() and WRITE_ONCE() to document the race and silence KCSAN. Reported-by: Abhishek Shah <abhishek.shah@columbia.edu> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 12:59:18 +01:00
Oleksandr Mazur	73ef239cd8	net: marvell: prestera: implement br_port_locked flag offloading Both <port> br_port_locked and <lag> interfaces's flag offloading is supported. No new ABI is being added, rather existing (port_param_set) API call gets extended. Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> V2: add missing receipents (linux-kernel, netdev) Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 12:55:47 +01:00
David S. Miller	0d0f034d06	Merge branch 'j7200-support' Siddharth Vadapalli says: ==================== J7200: CPSW5G: Add support for QSGMII mode to am65-cpsw driver Add support for QSGMII mode to am65-cpsw driver. Change log: v4-> v5: 1. Move ti,j7200-cpswxg-nuss compatible to the line above the ti,j721e-cpsw-nuss compatible. 2. Add allOf and move if-then statements within it to allow future if-then statements to be added easily. v3 -> v4: 1. Update bindings to disallow ports based on compatible, instead of adding a new if/then statement for the new compatible. 2. Add Else-If condition for RMII mode in the set of supported interfaces. Support for RMII mode is already present in the driver and I had missed out adding a condition for RMII mode in the previous patches. v2 -> v3: 1. In ti,k3-am654-cpsw-nuss.yaml, restrict if/then statement to port nodes. v1 -> v2: 1. Add new compatible for CPSW5G in ti,k3-am654-cpsw-nuss.yaml and extend properties for new compatible. 2. Add extra_modes member to struct am65_cpsw_pdata to be used for QSGMII mode by new compatible. 3. Add check for phylink supported modes to ensure that only one phy mode is advertised as supported. 4. Check if extra_modes supports QSGMII mode in am65_cpsw_nuss_mac_config() for register write. 5. Add check for assigning port->sgmii_base only when extra_modes is valid. v4: https://lore.kernel.org/r/20220816060139.111934-1-s-vadapalli@ti.com/ v3: https://lore.kernel.org/r/20220606110443.30362-1-s-vadapalli@ti.com/ v2: https://lore.kernel.org/r/20220602114558.6204-1-s-vadapalli@ti.com/ v1: https://lore.kernel.org/r/20220531113058.23708-1-s-vadapalli@ti.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 09:52:04 +01:00
Siddharth Vadapalli	763015a794	net: ethernet: ti: am65-cpsw: Move phy_set_mode_ext() to correct location In TI's J7200 SoC CPSW5G ports, each of the 4 ports can be configured as a QSGMII main or QSGMII-SUB port. This configuration is performed by phy-gmii-sel driver on invoking the phy_set_mode_ext() function. It is necessary for the QSGMII main port to be configured before any of the QSGMII-SUB interfaces are brought up. Currently, the QSGMII-SUB interfaces come up before the QSGMII main port is configured. Fix this by moving the call to phy_set_mode_ext() from am65_cpsw_nuss_ndo_slave_open() to am65_cpsw_nuss_init_slave_ports(), thereby ensuring that the QSGMII main port is configured before any of the QSGMII-SUB ports are brought up. Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 09:52:04 +01:00
Siddharth Vadapalli	37184fc112	net: ethernet: ti: am65-cpsw: Add support for J7200 CPSW5G CPSW5G in J7200 supports additional modes like QSGMII and SGMII. Add new compatible for J7200 and enable QSGMII mode in am65-cpsw driver. Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 09:52:04 +01:00
Siddharth Vadapalli	d98495169d	dt-bindings: net: ti: k3-am654-cpsw-nuss: Update bindings for J7200 CPSW5G Update bindings for TI K3 J7200 SoC which contains 5 ports (4 external ports) CPSW5G module and add compatible for it. Changes made: - Add new compatible ti,j7200-cpswxg-nuss for CPSW5G. - Extend pattern properties for new compatible. - Change maximum number of CPSW ports to 4 for new compatible. Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 09:52:03 +01:00
Menglong Dong	c205cc7534	net: skb: prevent the split of kfree_skb_reason() by gcc Sometimes, gcc will optimize the function by spliting it to two or more functions. In this case, kfree_skb_reason() is splited to kfree_skb_reason and kfree_skb_reason.part.0. However, the function/tracepoint trace_kfree_skb() in it needs the return address of kfree_skb_reason(). This split makes the call chains becomes: kfree_skb_reason() -> kfree_skb_reason.part.0 -> trace_kfree_skb() which makes the return address that passed to trace_kfree_skb() be kfree_skb(). Therefore, introduce '__fix_address', which is the combination of '__noclone' and 'noinline', and apply it to kfree_skb_reason() to prevent to from being splited or made inline. (Is it better to simply apply '__noclone oninline' to kfree_skb_reason? I'm thinking maybe other functions have the same problems) Meanwhile, wrap 'skb_unref()' with 'unlikely()', as the compiler thinks it is likely return true and splits kfree_skb_reason(). Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 09:47:51 +01:00
Jakub Kicinski	fa2bc96259	Merge branch 'add-interface-mode-select-and-rmii' Wei Fang says: ==================== add interface mode select and RMII From: Wei Fang <wei.fang@nxp.com> The patches add the below feature support for both TJA1100 and TJA1101 PHYs cards: - Add MII and RMII mode support. - Add REF_CLK input/output support for RMII mode. ==================== Link: https://lore.kernel.org/r/20220822015949.1569969-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 17:43:30 -07:00
Wei Fang	60ddc78d16	net: phy: tja11xx: add interface mode and RMII REF_CLK support Add below features support for both TJA1100 and TJA1101 cards: - Add MII and RMII mode support. - Add REF_CLK input/output support for RMII mode. Signed-off-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 17:43:28 -07:00
Wei Fang	52b2fe4535	dt-bindings: net: tja11xx: add nxp,refclk_in property TJA110x REF_CLK can be configured as interface reference clock intput or output when the RMII mode enabled. This patch add the property to make the REF_CLK can be configurable. Signed-off-by: Wei Fang <wei.fang@nxp.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 17:43:27 -07:00
Jakub Kicinski	3de1484bd3	Merge branch 'mlxsw-introduce-modular-system-support-by-minimal-driver' Petr Machata says: ==================== mlxsw: Introduce modular system support by minimal driver Vadim Pasternak writes: This patchset adds line cards support in mlxsw_minimal, which is used for monitoring purposes on BMC systems. The BMC is connected to the ASIC over I2C bus, unlike the host CPU that is connected to the ASIC via PCI bus. The BMC system needs to be notified whenever line cards become active or inactive, so that, for example, netdevs will be registered / unregistered by mlxsw_minimal. However, traps cannot be generated towards the BMC over the I2C bus. To overcome that, the I2C bus driver (i.e., mlxsw_i2c) registers an handler for an IRQ that is fired upon specific system wide changes, like line card activation and deactivation. The generated event is handled by mlxsw_core, which checks whether anything changed in the state of available line cards. If a line card becomes active or inactive, interested parties such as mlxsw_minimal are notified via their registered line card event callback. Patch set overview: Patches #1 is preparations. Patches #2-#3 extend mlxsw_core with an infrastructure to handle the previously mentioned system events. Patch #4 extends the I2C bus driver to register an handler for the IRQ fired upon specific system wide changes. Patches #5-#8 gradually add line cards support in mlxsw_minimal by dynamically registering / unregistering netdevs for ports found on line cards, whenever a line card becomes active / inactive. ==================== Link: https://lore.kernel.org/r/cover.1661093502.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 17:22:04 -07:00
Vadim Pasternak	706ddb7821	mlxsw: minimal: Extend to support line card dynamic operations Implement line card operation callbacks got_active() / got_inactive(). The purpose of these callback to create / remove line card ports after line card is getting active / inactive. Implement line ports_remove_selected() callback to support line card un-provisioning flow through 'devlink'. Add line card operation registration and de-registration APIs. Add module offset for line card. Offset for main board iz zero. For line card in slot #n offset is calculated as (#n - 1) multiplied by maximum modules number. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 17:22:00 -07:00
Vadim Pasternak	01328e23a4	mlxsw: minimal: Extend module to port mapping with slot index The interfaces for ports found on line card are created and removed dynamically after line card is getting active or inactive. Introduce per line card array with module to port mapping. For each port get 'slot_index' through PMLP register and set port mapping for the relevant [slot_index][module] entry. Split module and port allocation into separate routines. Split per line card port creation and removing into separate routines. Motivation to re-use these routines for line card operations. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 17:22:00 -07:00
Vadim Pasternak	9421c8b89d	mlxsw: minimal: Move ports allocation to separate routine Perform ports allocation in a separate routine. Motivation is to re-use this routine for ports found on line cards. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 17:22:00 -07:00
Vadim Pasternak	c7ea08badd	mlxsw: minimal: Extend APIs with slot index for modular system support Add 'slot_index' field to port structure. Replace zero slot_index argument with 'slot_index' in 'ethtool' related APIs. Add 'slot_index' argument to port initialization and de-initialization related APIs. Motivation is to prepare minimal driver for modular system support. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 17:22:00 -07:00
Vadim Pasternak	33fa6909a2	mlxsw: i2c: Add support for system interrupt handling Extend i2c bus driver with interrupt handler to support system specific hotplug events, related to line card state change. Provide system IRQ line for interrupt handler. IRQ line Id could be provided through the platform data if available, or could be set to the default value. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 17:21:59 -07:00
Vadim Pasternak	508c29bf15	mlxsw: core_linecards: Register a system event handler Add line card system event handler. Register it with core. It is triggered by system interrupts raised from chassis programmable logic devices to CPU. The purpose is to handle line card state changes over I2C bus. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 17:21:59 -07:00
Vadim Pasternak	2ab4e70966	mlxsw: core: Add registration APIs for system event handler The purpose of system event handler is to handle system interrupts. Such interrupts are raised to CPU from system programmable logic devices, upon specific system wide changes, like line card activation and deactivation. The purpose is to create an alternative to trap mechanism, which delivers these events to driver over PCI bus, but not available for the driver working over I2C bus. Mechanism is system dependent and applicable only for the systems equipped with programmable devices with custom logic. Add APIs for event handler registration and un-registration and API which should be invoked from the registered callbacks when system interrupt is raised to CPU. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 17:21:59 -07:00
Vadim Pasternak	4be4779b6c	mlxsw: core_linecards: Separate line card init and fini flow Currently, each line card is initialized using the following steps: 1. Initializing its various fields (e.g., slot index). 2. Creating the corresponding devlink object. 3. Enabling events (i.e., traps) for changes in line card status. 4. Querying and processing line card status. Unlike traps, the IRQ that notifies the CPU about line card status changes cannot be enabled / disabled on a per line card basis. If a handler is registered before the line cards are initialized, the handler risks accessing uninitialized memory. On the other hand, if the handler is registered after initialization, we risk missing events. For example, in step 4, the driver might see that a line card is in ready state and will tell the device to enable it. When enablement is done, the line card will be activated and the IRQ will be triggered. Since a handler was not registered, the event will be missed. Solve this by splitting the initialization sequence into two steps (1-2 and 3-4). In a subsequent patch, the handler will be registered between both steps. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 17:21:59 -07:00
Jakub Kicinski	510156a7f0	docs: netlink: basic introduction to Netlink Provide a bit of a brain dump of netlink related information as documentation. Hopefully this will be useful to people trying to navigate implementing YAML based parsing in languages we won't be able to help with. I started writing this doc while trying to figure out what it'd take to widen the applicability of YAML to good old rtnl, but the doc grew beyond that as it usually happens. In all honesty a lot of this information is new to me as I usually follow the "copy an existing example, drink to forget" process of writing netlink user space, so reviews will be much appreciated. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20220819200221.422801-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 16:10:23 -07:00
Jakub Kicinski	30b6055428	net: improve and fix netlink kdoc Subsequent patch will render the kdoc from include/uapi/linux/netlink.h into Documentation. We need to fix the warnings. While at it move the comments on struct nlmsghdr to a proper kdoc comment. Link: https://lore.kernel.org/r/20220819200221.422801-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 16:10:23 -07:00
Sergei Antonov	6c2c782fa0	net: ftmac100: set max_mtu to allow DSA overhead setting In case ftmac100 is used with a DSA switch, Linux wants to set MTU to 1504 to accommodate for DSA overhead. With the default max_mtu it leads to the error message: ftmac100 92000000.mac eth0: error -22 setting MTU to 1504 to include DSA overhead ftmac100 supports packet length 1518 (MAX_PKT_SIZE constant), so it is safe to report it in max_mtu. Signed-off-by: Sergei Antonov <saproj@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220821160844.474277-1-saproj@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 15:30:58 -07:00
Paolo Abeni	52412f5543	Merge branch 'dsa-changes-for-multiple-cpu-ports-part-3' Vladimir Oltean says: ==================== DSA changes for multiple CPU ports (part 3) Those who have been following part 1: https://patchwork.kernel.org/project/netdevbpf/cover/20220511095020.562461-1-vladimir.oltean@nxp.com/ and part 2: https://patchwork.kernel.org/project/netdevbpf/cover/20220521213743.2735445-1-vladimir.oltean@nxp.com/ will know that I am trying to enable the second internal port pair from the NXP LS1028A Felix switch for DSA-tagged traffic via "ocelot-8021q". This series represents part 3 of that effort. Covered here are some preparations in DSA for handling multiple DSA masters: - when changing the tagging protocol via sysfs - when the masters go down as well as preparation for monitoring the upper devices of a DSA master (to support DSA masters under a LAG). There are also 2 small preparations for the ocelot driver, for the case where multiple tag_8021q CPU ports are used in a LAG. Both those changes have to do with PGID forwarding domains. Compared to v1, the patches were trimmed down to just another preparation stage, and the UAPI changes were pushed further out to part 4. https://patchwork.kernel.org/project/netdevbpf/cover/20220523104256.3556016-1-olteanv@gmail.com/ Compared to v2, I had to export a symbol I forgot to (ocelot_port_teardown_dsa_8021q_cpu), to avoid a build breakage when the felix and seville drivers are built as modules. ==================== Link: https://lore.kernel.org/r/20220819174820.3585002-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-23 11:39:37 +02:00
Vladimir Oltean	291ac1517a	net: mscc: ocelot: adjust forwarding domain for CPU ports in a LAG Currently when we have 2 CPU ports configured for DSA tag_8021q mode and we put them in a LAG, a PGID dump looks like this: PGID_SRC[0] = ports 4, PGID_SRC[1] = ports 4, PGID_SRC[2] = ports 4, PGID_SRC[3] = ports 4, PGID_SRC[4] = ports 0, 1, 2, 3, 4, 5, PGID_SRC[5] = no ports (ports 0-3 are user ports, ports 4 and 5 are CPU ports) There are 2 problems with the configuration above: - user ports should enable forwarding towards both CPU ports, not just 4, and the aggregation PGIDs should prune one CPU port or the other from the destination port mask, based on a hash computed from packet headers. - CPU ports should not be allowed to forward towards themselves and also not towards other ports in the same LAG as themselves The first problem requires fixing up the PGID_SRC of user ports, when ocelot_port_assigned_dsa_8021q_cpu_mask() is called. We need to say that when a user port is assigned to a tag_8021q CPU port and that port is in a LAG, it should forward towards all ports in that LAG. The second problem requires fixing up the PGID_SRC of port 4, to remove ports 4 and 5 (in a LAG) from the allowed destinations. After this change, the PGID source masks look as follows: PGID_SRC[0] = ports 4, 5, PGID_SRC[1] = ports 4, 5, PGID_SRC[2] = ports 4, 5, PGID_SRC[3] = ports 4, 5, PGID_SRC[4] = ports 0, 1, 2, 3, PGID_SRC[5] = no ports Note that PGID_SRC[5] still looks weird (it should say "0, 1, 2, 3" just like PGID_SRC[4] does), but I've tested forwarding through this CPU port and it doesn't seem like anything is affected (it appears that PGID_SRC[4] is being looked up on forwarding from the CPU, since both ports 4 and 5 have logical port ID 4). The reason why it looks weird is because we've never called ocelot_port_assign_dsa_8021q_cpu() for any user port towards port 5 (all user ports are assigned to port 4 which is in a LAG with 5). Since things aren't broken, I'm willing to leave it like that for now and just document the oddity. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-23 11:39:22 +02:00
Vladimir Oltean	36a0bf4435	net: mscc: ocelot: set up tag_8021q CPU ports independent of user port affinity This is a partial revert of commit `c295f9831f` ("net: mscc: ocelot: switch from {,un}set to {,un}assign for tag_8021q CPU ports"), because as it turns out, this isn't how tag_8021q CPU ports under a LAG are supposed to work. Under that scenario, all user ports are "assigned" to the single tag_8021q CPU port represented by the logical port corresponding to the bonding interface. So one CPU port in a LAG would have is_dsa_8021q_cpu set to true (the one whose physical port ID is equal to the logical port ID), and the other one to false. In turn, this makes 2 undesirable things happen: (1) PGID_CPU contains only the first physical CPU port, rather than both (2) only the first CPU port will be added to the private VLANs used by ocelot for VLAN-unaware bridging To make the driver behave in the same way for both bonded CPU ports, we need to bring back the old concept of setting up a port as a tag_8021q CPU port, and this is what deals with VLAN membership and PGID_CPU updating. But we also need the CPU port "assignment" (the user to CPU port affinity), and this is what updates the PGID_SRC forwarding rules. All DSA CPU ports are statically configured for tag_8021q mode when the tagging protocol is changed to ocelot-8021q. User ports are "assigned" to one CPU port or the other dynamically (this will be handled by a future change). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-23 11:39:22 +02:00
Vladimir Oltean	5dc760d120	net: dsa: use dsa_tree_for_each_cpu_port in dsa_tree_{setup,teardown}_master More logic will be added to dsa_tree_setup_master() and dsa_tree_teardown_master() in upcoming changes. Reduce the indentation by one level in these functions by introducing and using a dedicated iterator for CPU ports of a tree. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-23 11:39:22 +02:00
Vladimir Oltean	f41ec1fd1c	net: dsa: all DSA masters must be down when changing the tagging protocol The fact that the tagging protocol is set and queried from the /sys/class/net/<dsa-master>/dsa/tagging file is a bit of a quirk from the single CPU port days which isn't aging very well now that DSA can have more than a single CPU port. This is because the tagging protocol is a switch property, yet in the presence of multiple CPU ports it can be queried and set from multiple sysfs files, all of which are handled by the same implementation. The current logic ensures that the net device whose sysfs file we're changing the tagging protocol through must be down. That net device is the DSA master, and this is fine for single DSA master / CPU port setups. But exactly because the tagging protocol is per switch [ tree, in fact ] and not per DSA master, this isn't fine any longer with multiple CPU ports, and we must iterate through the tree and find all DSA masters, and make sure that all of them are down. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-23 11:39:22 +02:00
Vladimir Oltean	7136097e11	net: dsa: only bring down user ports assigned to a given DSA master This is an adaptation of commit `c0a8a9c274` ("net: dsa: automatically bring user ports down when master goes down") for multiple DSA masters. When a DSA master goes down, only the user ports under its control should go down too, the others can still send/receive traffic. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-23 11:39:22 +02:00
Vladimir Oltean	4f03dcc6b9	net: dsa: existing DSA masters cannot join upper interfaces All the traffic to/from a DSA master is supposed to be distributed among its DSA switch upper interfaces, so we should not allow other upper device kinds. An exception to this is DSA_TAG_PROTO_NONE (switches with no DSA tags), and in that case it is actually expected to create e.g. VLAN interfaces on the master. But for those, netdev_uses_dsa(master) returns false, so the restriction doesn't apply. The motivation for this change is to allow LAG interfaces of DSA masters to be DSA masters themselves. We want to restrict the user's degrees of freedom by 1: the LAG should already have all DSA masters as lowers, and while lower ports of the LAG can be removed, none can be added after the fact. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-23 11:39:22 +02:00
Vladimir Oltean	920a33cd72	net: bridge: move DSA master bridging restriction to DSA When DSA gains support for multiple CPU ports in a LAG, it will become mandatory to monitor the changeupper events for the DSA master. In fact, there are already some restrictions to be imposed in that area, namely that a DSA master cannot be a bridge port except in some special circumstances. Centralize the restrictions at the level of the DSA layer as a preliminary step. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-23 11:39:22 +02:00
Vladimir Oltean	0498277ee1	net: dsa: don't stop at NOTIFY_OK when calling ds->ops->port_prechangeupper dsa_slave_prechangeupper_sanity_check() is supposed to enforce some adjacency restrictions, and calls ds->ops->port_prechangeupper if the driver implements it. We convert the error code from the port_prechangeupper() call to a notifier code, and 0 is converted to NOTIFY_OK, but the caller of dsa_slave_prechangeupper_sanity_check() stops at any notifier code different from NOTIFY_DONE. Avoid this by converting back the notifier code to an error code, so that both NOTIFY_OK and NOTIFY_DONE will be seen as 0. This allows more parallel sanity check functions to be added. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-23 11:39:22 +02:00

1 2 3 4 5 ...

1121880 Commits