Merge tag 'for-6.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Stable fixes:
- fix race between balance and cancel/pause
- various iput() fixes
- fix use-after-free of new block group that became unused
- fix warning when putting transaction with qgroups enabled after
abort
- fix crash in subpage mode when page could be released between map
and map read
- when scrubbing raid56 verify the P/Q stripes unconditionally
- fix minor memory leak in zoned mode when a block group with an
unexpected superblock is found
Regression fixes:
- fix ordered extent split error handling when submitting direct IO
- use irq-safe locking when adding delayed iputs"
* tag 'for-6.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix warning when putting transaction with qgroups enabled after abort
btrfs: fix ordered extent split error handling in btrfs_dio_submit_io
btrfs: set_page_extent_mapped after read_folio in btrfs_cont_expand
btrfs: raid56: always verify the P/Q contents for scrub
btrfs: use irq safe locking when running and adding delayed iputs
btrfs: fix iput() on error pointer after error during orphan cleanup
btrfs: fix double iput() on inode after an error during orphan cleanup
btrfs: zoned: fix memory leak after finding block group with super blocks
btrfs: fix use-after-free of new block group that became unused
btrfs: be a bit more careful when setting mirror_num_ret in btrfs_map_block
btrfs: fix race between balance and cancel/pause
One fix for an issue with parsing partially specified DTs.
Merge tag 'regulator-fix-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"One fix for an issue with parsing partially specified DTs"
* tag 'regulator-fix-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: da9063: fix null pointer deref with partial DT config
tcp_sequence() uses two conditions to decide to drop a packet,
and we currently report the generic TCP_INVALID_SEQUENCE drop reason
for both.
Duplicates are common, so we need to distinguish them from
the other case.
I chose to not reuse TCP_OLD_DATA, and instead added
TCP_OLD_SEQUENCE drop reason.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230719064754.2794106-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
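As a rough userspace sketch of the tcp_sequence() change described above: the two conditions map to two different drop reasons. The enum values, the helper names and the parameter list are illustrative assumptions, not the kernel's definitions, and the mapping of the "duplicate" case to the first condition is my reading of the commit text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for drop reasons; not the kernel's enum. */
enum drop_reason {
	NOT_DROPPED_YET,
	TCP_OLD_SEQUENCE,      /* segment ends before the window: likely a duplicate */
	TCP_INVALID_SEQUENCE,  /* segment starts beyond the right edge of the window */
};

/* Wraparound-safe 32-bit sequence comparisons. */
static bool seq_before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static bool seq_after(uint32_t a, uint32_t b)  { return seq_before(b, a); }

/*
 * left_edge:  oldest sequence number still considered in-window (assumed)
 * rcv_nxt + rcv_wnd: right edge of the receive window (assumed)
 */
static enum drop_reason check_sequence(uint32_t left_edge, uint32_t rcv_nxt,
				       uint32_t rcv_wnd,
				       uint32_t seq, uint32_t end_seq)
{
	if (seq_before(end_seq, left_edge))
		return TCP_OLD_SEQUENCE;
	if (seq_after(seq, rcv_nxt + rcv_wnd))
		return TCP_INVALID_SEQUENCE;
	return NOT_DROPPED_YET;
}
```

With separate return values, a tracer can count duplicates apart from segments that fall past the window.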
end key should be equal to start unless NFT_SET_EXT_KEY_END is present.
It's possible to add elements that only have a start key
("{ 1.0.0.0 . 2.0.0.0 }") without an interval end.
Insertion treats this via:
if (nft_set_ext_exists(ext, NFT_SET_EXT_KEY_END))
end = (const u8 *)nft_set_ext_key_end(ext)->data;
else
end = start;
but removal side always uses nft_set_ext_key_end().
This is wrong and leads to garbage remaining in the set after removal;
the next lookup/insert attempt will give:
BUG: KASAN: slab-use-after-free in pipapo_get+0x8eb/0xb90
Read of size 1 at addr ffff888100d50586 by task nft-pipapo_uaf_/1399
Call Trace:
kasan_report+0x105/0x140
pipapo_get+0x8eb/0xb90
nft_pipapo_insert+0x1dc/0x1710
nf_tables_newsetelem+0x31f5/0x4e00
..
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Reported-by: lonial con <kongln9170@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
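One way to keep insertion and removal consistent for the pipapo end key described above is to route both through a single helper. The sketch below is a userspace illustration only; struct elem and elem_end_key() are hypothetical names, and a NULL key_end stands in for "NFT_SET_EXT_KEY_END is not present".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified set element; real nftables extensions differ. */
struct elem {
	const uint8_t *key;      /* start key, always present */
	const uint8_t *key_end;  /* NULL when no end key was supplied */
};

/* Single place that decides the end key, usable by both insert and remove. */
static const uint8_t *elem_end_key(const struct elem *e)
{
	return e->key_end ? e->key_end : e->key;
}
```

If both paths call the same helper, an element added without an interval end is also removed with end == start.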
Stefan Eichenberger says:
====================
Add a driver for the Marvell 88Q2110 PHY
Add support for 1000BASE-T1 to the phy-c45 helper and add a first
1000BASE-T1 driver for the Marvell 88Q2110 PHY.
====================
Link: https://lore.kernel.org/r/20230719064258.9746-1-eichest@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add a driver for the Marvell 88Q2110. This driver can detect the
link, switch between 100BASE-T1 and 1000BASE-T1, and switch between
master and slave mode. Autonegotiation, although supported by the PHY,
does not yet work.
Signed-off-by: Stefan Eichenberger <eichest@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Read the ability to do 100BASE-T1 and 1000BASE-T1 from the extended
BASE-T1 ability register of the PHY.
Signed-off-by: Stefan Eichenberger <eichest@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add a separate function to read the BASE-T1 abilities. Some PHYs do not
indicate the availability of the extended BASE-T1 ability register, so
this function must be called separately.
Signed-off-by: Stefan Eichenberger <eichest@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add support to force 1000BASE-T1 by setting the correct control bit in
the MDIO_MMD_PMA_PMD_BT1_CTRL register.
Signed-off-by: Stefan Eichenberger <eichest@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add registers and definitions to support 1000BASE-T1. This includes the
PCS Control and Status registers (3.2304 and 3.2305) as well as some
missing bits on the PMA/PMD extended ability register (1.18) and PMA/PMD
CTRL (1.2100) register.
Signed-off-by: Stefan Eichenberger <eichest@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
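The register numbers in the 1000BASE-T1 definitions above use the Clause 45 "device.register" notation with a decimal register address. As a small illustration, the sketch below writes them out as (MMD, address) pairs; the struct and variable names are made up for this example and are not the identifiers added by the patch.

```c
#include <stdint.h>

/* Clause 45 MMD numbers used in the "device.register" notation. */
enum { MMD_PMAPMD = 1, MMD_PCS = 3 };

struct c45_reg {
	uint8_t  mmd;   /* MDIO manageable device number */
	uint16_t addr;  /* register address within that MMD */
};

/* Registers named in the text, in decimal as in "3.2304". */
static const struct c45_reg pcs_1000bt1_ctrl = { MMD_PCS,    2304 };  /* 0x0900 */
static const struct c45_reg pcs_1000bt1_stat = { MMD_PCS,    2305 };  /* 0x0901 */
static const struct c45_reg pma_ext_ability  = { MMD_PMAPMD,   18 };  /* 0x0012 */
static const struct c45_reg pma_bt1_ctrl     = { MMD_PMAPMD, 2100 };  /* 0x0834 */
```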
Can be called via nft set element list iteration, which may acquire
rcu and/or bh read lock (depends on set type).
BUG: sleeping function called from invalid context at net/netfilter/nf_tables_api.c:3353
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1232, name: nft
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
2 locks held by nft/1232:
#0: ffff8881180e3ea8 (&nft_net->commit_mutex){+.+.}-{3:3}, at: nf_tables_valid_genid
#1: ffffffff83f5f540 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire
Call Trace:
nft_chain_validate
nft_lookup_validate_setelem
nft_pipapo_walk
nft_lookup_validate
nft_chain_validate
nft_immediate_validate
nft_chain_validate
nf_tables_validate
nf_tables_abort
No choice but to move it to nf_tables_validate().
Fixes: 81ea010667 ("netfilter: nf_tables: add rescheduling points during loop detection walks")
Signed-off-by: Florian Westphal <fw@strlen.de>
On some platforms there is a padding hole in the nft_verdict
structure, between the verdict code and the chain pointer.
On element insertion, if the new element clashes with an existing one and
NLM_F_EXCL flag isn't set, we want to ignore the -EEXIST error as long as
the data associated with duplicated element is the same as the existing
one. The data equality check uses memcmp.
For normal data (NFT_DATA_VALUE) this works fine, but for NFT_DATA_VERDICT
padding area leads to spurious failure even if the verdict data is the
same.
This then makes the insertion fail with 'already exists' error, even
though the new "key : data" matches an existing entry and userspace
told the kernel that it doesn't want to receive an error indication.
Fixes: c016c7e45d ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion")
Signed-off-by: Florian Westphal <fw@strlen.de>
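To illustrate the padding problem described above for nft_verdict: comparing two structs with memcmp() also compares the hole between members, so logically equal values can look different. The sketch below uses a hypothetical struct verdict and shows two generic remedies; it is not necessarily what the patch does.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: on LP64 there is typically a 4-byte hole
 * between code and chain. */
struct verdict {
	uint32_t code;
	void *chain;
};

/* Remedy 1: compare only the meaningful fields, never the padding. */
static bool verdict_equal(const struct verdict *a, const struct verdict *b)
{
	return a->code == b->code && a->chain == b->chain;
}

/* Remedy 2: zero the whole object before filling it, so that a later
 * memcmp() sees identical padding bytes. */
static void verdict_init(struct verdict *v, uint32_t code, void *chain)
{
	memset(v, 0, sizeof(*v));
	v->code = code;
	v->chain = chain;
}
```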
A JSON pointer reference to the entire document must not have a trailing
"/" and should be just a "#". The existing jsonschema package allows
these, but changes in 4.18 make allowed "$ref" URIs stricter and throw
errors on these references.
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230718203202.1761304-1-robh@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Kuniyuki Iwashima says:
====================
net: Support STP on bridge in non-root netns.
Currently, STP does not work in non-root netns as llc_rcv() drops
packets from non-root netns.
This series fixes it by making some protocol handlers netns-aware,
which are called from llc_rcv() as follows:
llc_rcv()
|
|- sap->rcv_func : registered by llc_sap_open()
|
| * functions : registered by register_8022_client()
| -> No in-kernel user calls register_8022_client()
|
| * snap_rcv()
| |
| `- proto->rcvfunc() : registered by register_snap_client()
|
| * aarp_rcv() : drop packets from non-root netns
| * atalk_rcv() : drop packets from non-root netns
|
| * stp_pdu_rcv()
| |
| `- garp_protos[]->rcv() : registered by stp_proto_register()
|
| * garp_pdu_rcv() : netns-aware
| * br_stp_rcv() : netns-aware
|
|- llc_type_handlers[llc_pdu_type(skb) - 1]
|
| * llc_sap_handler() : NOT netns-aware (Patch 1)
| * llc_conn_handler() : NOT netns-aware (Patch 2)
|
`- llc_station_handler
* llc_station_rcv() : netns-aware
Patch 1 & 2 convert not-netns-aware functions and Patch 3 removes the
netns restriction in llc_rcv().
Note this series does not namespacify AF_LLC so that these patches
can be backported to stable without conflicts (at least to 4.14.y).
Another series that adds netns support for AF_LLC will be targeted
to net-next later.
====================
Link: https://lore.kernel.org/r/20230718174152.57408-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This reverts commit 56a16035bb.
Since the previous commit, STP works on bridge in netns.
# unshare -n
# ip link add br0 type bridge
# ip link add veth0 type veth peer name veth1
# ip link set veth0 master br0 up
[ 50.558135] br0: port 1(veth0) entered blocking state
[ 50.558366] br0: port 1(veth0) entered disabled state
[ 50.558798] veth0: entered allmulticast mode
[ 50.564401] veth0: entered promiscuous mode
# ip link set veth1 master br0 up
[ 54.215487] br0: port 2(veth1) entered blocking state
[ 54.215657] br0: port 2(veth1) entered disabled state
[ 54.215848] veth1: entered allmulticast mode
[ 54.219577] veth1: entered promiscuous mode
# ip link set br0 type bridge stp_state 1
# ip link set br0 up
[ 61.960726] br0: port 2(veth1) entered blocking state
[ 61.961097] br0: port 2(veth1) entered listening state
[ 61.961495] br0: port 1(veth0) entered blocking state
[ 61.961653] br0: port 1(veth0) entered listening state
[ 63.998835] br0: port 2(veth1) entered blocking state
[ 77.437113] br0: port 1(veth0) entered learning state
[ 86.653501] br0: received packet on veth0 with own address as source address (addr:6e:0f:e7:6f:5f:5f, vlan:0)
[ 92.797095] br0: port 1(veth0) entered forwarding state
[ 92.797398] br0: topology change detected, propagating
Let's remove the warning.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Now these upper layer protocol handlers can be called from llc_rcv()
as sap->rcv_func(), which is registered by llc_sap_open().
* function which is passed to register_8022_client()
-> no in-kernel user calls register_8022_client().
* snap_rcv()
`- proto->rcvfunc() : registered by register_snap_client()
-> aarp_rcv() and atalk_rcv() drop packets from non-root netns
* stp_pdu_rcv()
`- garp_protos[]->rcv() : registered by stp_proto_register()
-> garp_pdu_rcv() and br_stp_rcv() are netns-aware
So, we can safely remove the netns restriction in llc_rcv().
Fixes: e730c15519 ("[NET]: Make packet reception network namespace safe")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We will remove this restriction in llc_rcv() in the following patch,
which means that the protocol handler must be aware of netns.
if (!net_eq(dev_net(dev), &init_net))
goto drop;
llc_rcv() fetches llc_type_handlers[llc_pdu_type(skb) - 1] and calls it
if not NULL.
If the PDU type is LLC_DEST_CONN, llc_conn_handler() is called to pass
skb to corresponding sockets. Then, we must look up a proper socket in
the same netns with skb->dev.
llc_conn_handler() calls __llc_lookup() to look up an established or
listening socket by __llc_lookup_established() and llc_lookup_listener().
Both functions iterate on a list and call llc_estab_match() or
llc_listener_match() to check if the socket is the correct destination.
However, these functions do not check netns.
Also, bind() and connect() call llc_establish_connection(), which
finally calls __llc_lookup_established(), to check if there is a
conflicting socket.
Let's test netns in llc_estab_match() and llc_listener_match().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
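A simplified userspace sketch of the netns test added to the socket match functions, as described above. All types here are hypothetical stand-ins (the kernel compares namespaces with net_eq() on the socket's and device's struct net); only the shape of the check is the point.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified types; not the kernel's definitions. */
struct net { int id; };
struct llc_addr { unsigned char lsap; unsigned char mac[6]; };
struct llc_sock {
	const struct net *net;   /* namespace the socket lives in */
	struct llc_addr laddr;   /* local address */
	struct llc_addr daddr;   /* remote address */
};

static bool addr_equal(const struct llc_addr *a, const struct llc_addr *b)
{
	return a->lsap == b->lsap &&
	       memcmp(a->mac, b->mac, sizeof(a->mac)) == 0;
}

/* Established-socket match: the namespace must match before addresses count. */
static bool estab_match(const struct llc_sock *sk,
			const struct llc_addr *daddr,
			const struct llc_addr *laddr,
			const struct net *net)
{
	return sk->net == net &&
	       addr_equal(&sk->laddr, laddr) &&
	       addr_equal(&sk->daddr, daddr);
}
```

Without the namespace comparison, a socket with the same address pair in another netns would match an incoming skb.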
We will remove this restriction in llc_rcv() soon, which means that the
protocol handler must be aware of netns.
if (!net_eq(dev_net(dev), &init_net))
goto drop;
llc_rcv() fetches llc_type_handlers[llc_pdu_type(skb) - 1] and calls it
if not NULL.
If the PDU type is LLC_DEST_SAP, llc_sap_handler() is called to pass skb
to corresponding sockets. Then, we must look up a proper socket in the
same netns with skb->dev.
If the destination is a multicast address, llc_sap_handler() calls
llc_sap_mcast(). It calculates a hash based on DSAP and skb->dev->ifindex,
iterates on a socket list, and calls llc_mcast_match() to check if the
socket is the correct destination. Then, llc_mcast_match() checks if
skb->dev matches with llc_sk(sk)->dev. So, we need not check netns here.
OTOH, if the destination is a unicast address, llc_sap_handler() calls
llc_lookup_dgram() to look up a socket, but it does not check the netns.
Therefore, we need to add netns check in llc_lookup_dgram().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Xin Long says:
====================
net: handle the exp removal problem with ovs upcall properly
With the OVS upcall, the original ct in the skb will be dropped, and when
the skb comes back from userspace it has to create a new ct again through
nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act().
However, the new ct will not be able to have the exp as the original ct
has taken it away from the hash table in nf_ct_find_expectation(). This
will cause some flows never to be matched, like:
'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)'
'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)'
'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal'
if the 2nd flow triggers the OVS upcall, the 3rd flow will never get
matched.
OVS conntrack works around this by adding its own exp lookup function to
not remove the exp from the hash table and saving the exp and its master
info to the flow keys instead of creating a real ct. But this way doesn't
work for TC act_ct.
The patch 1/3 allows nf_ct_find_expectation() not to remove the exp from
the hash table if tmpl is set with IPS_CONFIRMED when doing lookup. This
allows both OVS conntrack and TC act_ct to have a simple and clear fix
for this problem in the patch 2/3 and 3/3.
====================
Link: https://lore.kernel.org/r/cover.1689541664.git.lucien.xin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
By not setting IPS_CONFIRMED in tmpl, the exp is not removed from the
hashtable on lookup, which lets us simplify the exp processing code a
lot in openvswitch conntrack.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
With the following flows, the packets will be dropped if OVS TC offload is
enabled.
'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)'
'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)'
'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal'
In the 1st flow, it finds the exp in the hashtable, removes it, and then
creates the ct with this exp in act_ct. However, in the 2nd flow it goes
to the OVS upcall the 1st time. When the skb comes back from userspace,
it has to create the ct again without the exp (the exp was removed last
time). With no 'rel' set in the ct, the 3rd flow can never get matched.
In OVS conntrack, it works around it by adding its own exp lookup function
ovs_ct_expect_find() where it doesn't remove the exp. Instead of creating
a real ct, it only updates its keys with the exp and its master info. So
when the skb comes back, the exp is still in the hashtable.
However, we can't do this trick in act_ct, as tc flower match is using a
real ct, and passing the exp and its master info to flower parsing via
tc_skb_cb is also not possible (tc_skb_cb size is not big enough).
The simple and clear fix is to not remove the exp at the 1st flow, namely,
not set IPS_CONFIRMED in tmpl when commit is not set in act_ct.
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently nf_conntrack_in() calling nf_ct_find_expectation() will
remove the exp from the hash table. However, in some scenarios, we
expect the exp not to be removed when the created ct will not be
confirmed, like in OVS and TC conntrack in the following patches.
This patch allows exp not to be removed by setting IPS_CONFIRMED
in the status of the tmpl.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
After commit d2ccd7bc8a ("tcp: avoid resetting ACK timer in DCTCP"),
tcp_enter_quickack_mode() is only used from net/ipv4/tcp_input.c.
Fixes: d2ccd7bc8a ("tcp: avoid resetting ACK timer in DCTCP")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20230718162049.1444938-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
entries and bind debugfs files would display wrong data on NETSYS_V2 and
later because instead of using mtk_get_ib1_pkt_type the driver would use
MTK_FOE_IB1_PACKET_TYPE which corresponds to NETSYS_V1(.x) SoCs.
Use mtk_get_ib1_pkt_type so entries and bind records display correctly.
Fixes: 03a3180e5c ("net: ethernet: mtk_eth_soc: introduce flow offloading support for mt7986")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/c0ae03d0182f4d27b874cbdf0059bc972c317f3c.1689727134.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In most cases UDP sockets use the default data ready callback.
Leverage the indirect call wrapper for such callback to avoid an
indirect call in fastpath.
The above gives a small but measurable performance gain under UDP flood.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/d47d53e6f8ee7a11228ca2f025d6243cc04b77f3.1689691004.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
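The idea behind the indirect call wrapper used for the UDP data ready callback above can be sketched in userspace as follows: compare the function pointer against the expected default and, on a match, call the default directly so the compiler emits a direct call. The macro and struct below are simplified assumptions written for this example (GCC/Clang only, because of __builtin_expect); the kernel's own wrappers live in its indirect call wrapper header and are only modeled loosely here.

```c
#include <stddef.h>

#define likely(x) __builtin_expect(!!(x), 1)

/* If f is the expected default f1, call f1 directly; else fall back to f. */
#define CALL_DIRECT_IF(f, f1, ...) \
	(likely((f) == (f1)) ? f1(__VA_ARGS__) : (f)(__VA_ARGS__))

/* Hypothetical, minimal socket with a replaceable callback. */
struct sock {
	void (*data_ready)(struct sock *sk);
	int wakeups;   /* bumped by the default callback */
	int custom;    /* bumped by a non-default callback */
};

static void default_data_ready(struct sock *sk) { sk->wakeups++; }
static void custom_data_ready(struct sock *sk)  { sk->custom++; }

/* Fast path: most sockets use the default, so the direct call is taken. */
static void deliver(struct sock *sk)
{
	CALL_DIRECT_IF(sk->data_ready, default_data_ready, sk);
}
```

Sockets that install their own callback still work; they just take the indirect branch.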
Heiner Kallweit says:
====================
r8169: revert two changes that caused regressions
This reverts two changes that caused regressions.
====================
Link: https://lore.kernel.org/r/ddadceae-19c9-81b8-47b5-a4ff85e2563a@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wei Fang says:
====================
clean up the FEC driver
While reading the FEC driver code recently, I found some redundant or
invalid code that makes the driver messy and less concise, so this
patch set cleans it up. This is only what I have found so far; I
believe there is more, and I will continue to clean up the FEC driver
in the future.
====================
Link: https://lore.kernel.org/r/20230718090928.2654347-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Three members of struct fec_enet_private have not been used since
they were first introduced into the FEC driver (commit 6605b730c0
("FEC: Add time stamping code and a PTP hardware clock")). Namely,
last_overflow_check, rx_hwtstamp_filter and base_incval. These
unused members make the struct fec_enet_private a bit messy and
might confuse the readers. There is no reason to keep them in the
FEC driver any longer.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://lore.kernel.org/r/20230718090928.2654347-4-wei.fang@nxp.com
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The fec_enet_init() is only invoked when the FEC driver probes, and
the network device of FEC has not been brought up at this moment. So
the fec_set_mac_address() does nothing and just returns zero when it
is invoked in the fec_enet_init(). Actually, the MAC address is set
into the hardware through fec_restart() which is also called in the
fec_enet_init().
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://lore.kernel.org/r/20230718090928.2654347-3-wei.fang@nxp.com
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the commit 95698ff617 ("net: fec: using page pool to manage
RX buffers") has been applied, all the rx packets, no matter small
packets or large packets are put directly into the kernel networking
buffers. That is to say, the rx copybreak function has been removed
since then, but the related code has not been completely cleaned up.
So the purpose of this patch is to clean up the remaining related
code of rx copybreak.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://lore.kernel.org/r/20230718090928.2654347-2-wei.fang@nxp.com
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The 'clock_in_out' property is optional, and it can be one of two enums.
The binding does not specify what the behavior is when the property is
missing altogether.
Hence, add a default value that the driver can use.
Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Acked-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230718090914.282293-1-eugen.hristev@collabora.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 3f4ca5fafc.
Commit 3f4ca5fafc ("tcp: avoid the lookup process failing to get sk in
ehash table") reversed the order in how a socket is inserted into ehash
to fix an issue that ehash-lookup could fail when reqsk/full sk/twsk are
swapped. However, it introduced another lookup failure.
The full socket in ehash is allocated from a slab with SLAB_TYPESAFE_BY_RCU
and does not have SOCK_RCU_FREE, so the socket could be reused even while
it is being referenced on another CPU doing RCU lookup.
Let's say a socket is reused and inserted into the same hash bucket during
lookup. After the blamed commit, a new socket is inserted at the end of
the list. If that happens, we will skip sockets placed after the previous
position of the reused socket, resulting in ehash lookup failure.
As described in Documentation/RCU/rculist_nulls.rst, we should insert a
new socket at the head of the list to avoid such an issue.
This issue, the swap-lookup-failure, and another variant reported in [0]
can all be handled properly by adding a locked ehash lookup suggested by
Eric Dumazet [1].
However, this issue could occur for every packet, thus more likely than
the other two races, so let's revert the change for now.
Link: https://lore.kernel.org/netdev/20230606064306.9192-1-duanmuquan@baidu.com/ [0]
Link: https://lore.kernel.org/netdev/CANn89iK8snOz8TYOhhwfimC7ykYA78GA3Nyv8x06SZYa1nKdyA@mail.gmail.com/ [1]
Fixes: 3f4ca5fafc ("tcp: avoid the lookup process failing to get sk in ehash table")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230717215918.15723-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
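The skipped-socket scenario from the revert above can be modeled with a single-threaded toy list. A "reader" is parked on node 2 of the chain 1 -> 2 -> 3 while node 2 is unlinked and reinserted into the same chain, as happens when a SLAB_TYPESAFE_BY_RCU object is reused. This is only a model under simplifying assumptions: the real code uses nulls lists under RCU, and readers also recheck keys. All names are made up for the example.

```c
#include <stddef.h>

struct node { int id; struct node *next; };

static void unlink_node(struct node **head, struct node *n)
{
	struct node **pp = head;

	while (*pp && *pp != n)
		pp = &(*pp)->next;
	if (*pp)
		*pp = n->next;
}

static void insert_head(struct node **head, struct node *n)
{
	n->next = *head;
	*head = n;
}

static void insert_tail(struct node **head, struct node *n)
{
	struct node **pp = head;

	while (*pp)
		pp = &(*pp)->next;
	n->next = NULL;
	*pp = n;
}

/* Returns 1 if the parked reader still reaches node 3 after node 2 is
 * reused and reinserted, 0 if node 3 is skipped. */
static int reader_sees_node3(int reinsert_at_tail)
{
	struct node n3 = { 3, NULL }, n2 = { 2, &n3 }, n1 = { 1, &n2 };
	struct node *head = &n1;
	struct node *cursor = &n2;      /* reader is parked here */

	unlink_node(&head, &n2);        /* object freed ... */
	if (reinsert_at_tail)           /* ... and reused in the same bucket */
		insert_tail(&head, &n2);
	else
		insert_head(&head, &n2);

	for (cursor = cursor->next; cursor; cursor = cursor->next)
		if (cursor->id == 3)
			return 1;
	return 0;
}
```

With tail insertion the reader's next pointer becomes the end of the list and node 3 is never visited; with head insertion the reader walks some nodes again but skips none.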
Currently, there are two major issues with stmmac driver statistics.
First of all, statistics in stmmac_extra_stats, stmmac_rxq_stats
and stmmac_txq_stats are 32 bit variables on 32 bit platforms. This
can cause some stats to overflow after several minutes of
high traffic, for example rx_pkt_n, tx_pkt_n and so on.
Secondly, if HW supports multiqueues, there are frequent cacheline
ping pongs on some driver statistic vars, for example, normal_irq_n,
tx_pkt_n and so on. What's more, the frequent cacheline ping pongs on
normal_irq_n happen in the ISR, which makes the situation worse.
To improve the driver, we convert those statistics to 64 bit, implement
ndo_get_stats64 and update .get_ethtool_stats implementation
accordingly. We also use per-queue statistics where necessary to remove
the cacheline ping pongs as much as possible to make multiqueue
operations faster. Those statistics which are not possible to overflow
and not frequently updated are kept as is.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230717160630.1892-3-jszhang@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
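A userspace sketch of the two ideas in the stmmac statistics change above: 64-bit counters kept per queue, folded into totals only when statistics are read. The struct layout, the queue count and the function names are assumptions for illustration (the aligned attribute is a GCC/Clang extension). In the kernel, 64-bit counters on 32-bit platforms additionally need protection against torn reads, for which the u64_stats_sync helpers exist; that part is omitted here.

```c
#include <stdint.h>

#define NUM_QUEUES 4  /* hypothetical queue count */

/* One counter block per queue, aligned so queues do not share a cacheline. */
struct queue_stats {
	uint64_t pkts;
	uint64_t bytes;
} __attribute__((aligned(64)));

struct dev_stats {
	struct queue_stats q[NUM_QUEUES];
};

/* Hot path: touch only the owning queue's counters. */
static void count_packet(struct dev_stats *s, unsigned int queue, uint32_t len)
{
	s->q[queue].pkts++;
	s->q[queue].bytes += len;
}

/* Slow path (a get_stats64-style reader): fold all queues into totals. */
static void sum_stats(const struct dev_stats *s, uint64_t *pkts, uint64_t *bytes)
{
	*pkts = 0;
	*bytes = 0;
	for (unsigned int i = 0; i < NUM_QUEUES; i++) {
		*pkts += s->q[i].pkts;
		*bytes += s->q[i].bytes;
	}
}

/* Whole seconds until a 32-bit counter wraps at a given event rate. */
static uint64_t secs_to_wrap32(uint64_t events_per_sec)
{
	return (UINT64_C(1) << 32) / events_per_sec;
}
```

For scale, secs_to_wrap32(1000000) is 4294, so a 32-bit packet counter at one million packets per second wraps in a little over an hour, and a byte counter wraps far sooner.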
FWICT, the common style in other network drivers is that the network
statistics are not cleared after initialization; follow that common
style for stmmac.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230717160630.1892-2-jszhang@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'ipsec-next-2023-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2023-07-19
Just a leftover from the last development cycle:
1) delete a clear to zero of encap_oa, it is not needed anymore.
From Leon Romanovsky.
* tag 'ipsec-next-2023-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: delete not-needed clear to zero of encap_oa
====================
Link: https://lore.kernel.org/r/20230719100228.373055-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:
====================
pull-request: bpf-next 2023-07-19
We've added 45 non-merge commits during the last 3 day(s) which contain
a total of 71 files changed, 7808 insertions(+), 592 deletions(-).
The main changes are:
1) multi-buffer support in AF_XDP, from Maciej Fijalkowski,
Magnus Karlsson, Tirthendu Sarkar.
2) BPF link support for tc BPF programs, from Daniel Borkmann.
3) Enable bpf_map_sum_elem_count kfunc for all program types,
from Anton Protopopov.
4) Add 'owner' field to bpf_rb_node to fix races in shared ownership,
from Dave Marchevsky.
5) Prevent potential skb_header_pointer() misuse, from Alexei Starovoitov.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits)
bpf, net: Introduce skb_pointer_if_linear().
bpf: sync tools/ uapi header with
selftests/bpf: Add mprog API tests for BPF tcx links
selftests/bpf: Add mprog API tests for BPF tcx opts
bpftool: Extend net dump with tcx progs
libbpf: Add helper macro to clear opts structs
libbpf: Add link-based API for tcx
libbpf: Add opts-based attach/detach/query API for tcx
bpf: Add fd-based tcx multi-prog infra with link support
bpf: Add generic attach/detach/query API for multi-progs
selftests/xsk: reset NIC settings to default after running test suite
selftests/xsk: add test for too many frags
selftests/xsk: add metadata copy test for multi-buff
selftests/xsk: add invalid descriptor test for multi-buffer
selftests/xsk: add unaligned mode test for multi-buffer
selftests/xsk: add basic multi-buffer test
selftests/xsk: transmit and receive multi-buffer packets
xsk: add multi-buffer documentation
i40e: xsk: add TX multi-buffer support
ice: xsk: Tx multi-buffer support
...
====================
Link: https://lore.kernel.org/r/20230719175424.75717-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:
====================
pull-request: bpf 2023-07-19
We've added 4 non-merge commits during the last 1 day(s) which contain
a total of 3 files changed, 55 insertions(+), 10 deletions(-).
The main changes are:
1) Fix stack depth check in presence of async callbacks,
from Kumar Kartikeya Dwivedi.
2) Fix BTI type used for freplace attached functions,
from Alexander Duyck.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, arm64: Fix BTI type used for freplace attached functions
selftests/bpf: Add more tests for check_max_stack_depth bug
bpf: Repeat check_max_stack_depth for async callbacks
bpf: Fix subprog idx logic in check_max_stack_depth
====================
Link: https://lore.kernel.org/r/20230719174502.74023-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEDs2BvajyNKlf9TJQvlAcSiqKBOgFAmS3jZATHG1rbEBwZW5n
dXRyb25peC5kZQAKCRC+UBxKKooE6Fo4CACVxU4+xMgAszot9Sup7psEiUjbOHKd
TCxEEm9GML/aquPbXxx3r0yztyAzd1dF9zXSFGofUyfVV/pdKbu7h1vP+/EndmiL
W9eoVhnYzAbz2TivFgqQwqqTlosEgKmZF5qGl4eloH6luOILbvrl/UuD0J70h3gK
Ixus6TyEIdI2y1ewTlWXMOriedTVeTX+DC2S02Bi42cVFdAm29nie2fCdHVGhF/o
a0KICO3HjKhTmGReSHDpCMro+r93BezmYxoM96K4a911tge7fLzYhQQ1+ZVyQ3Mw
DRHMD1ehM6cWd0GUce3LqBjb9zx9ciQK0ohqKzAIaEV+Rta84P/CHtbn
=GOLr
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-6.6-20230719' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2023-07-19
The first 2 patches are by Judith Mendez, target the m_can driver and
add hrtimer-based polling support for TI AM62x SoCs, where the
interrupt of the MCU domain's m_can cores is not routed to the Cortex
A53 core.
A patch by Rob Herring converts the grcan driver to use the correct DT
include files.
Michal Simek and Srinivas Neeli add support for optional reset control
to the xilinx_can driver.
The next 2 patches are by Jimmy Assarsson and add support for new
Kvaser pciefd devices to the kvaser_pciefd driver.
Mao Zhu's patch for the ucan driver removes a repeated word from a
comment.
* tag 'linux-can-next-for-6.6-20230719' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
can: ucan: Remove repeated word
can: kvaser_pciefd: Add support for new Kvaser pciefd devices
can: kvaser_pciefd: Move hardware specific constants and functions into a driver_data struct
can: Explicitly include correct DT includes
can: xilinx_can: Add support for controller reset
dt-bindings: can: xilinx_can: Add reset description
can: m_can: Add hrtimer to generate software interrupt
dt-bindings: net: can: Remove interrupt properties for MCAN
====================
Link: https://lore.kernel.org/r/20230719072348.525039-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZJLBrQAKCRDh3BK/laaZ
PMGXAQC+EWva3wi86A4MeRAGtVnpQyKeFKRsBjEpU2MKdhvVhQEAn5eCsQAtt/R/
+1WmLVF2uAweoG6eXBKnWx7537dbQAs=
=MdDc
-----END PGP SIGNATURE-----
Merge tag 'fuse-update-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
"Small but important fixes and a trivial cleanup"
* tag 'fuse-update-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: ioctl: translate ENOSYS in outarg
fuse: revalidate: don't invalidate if interrupted
fuse: Apply flags2 only when userspace set the FUSE_INIT_EXT
fuse: remove duplicate check for nodeid
fuse: add feature flag for expire-only
Seeing the following:
Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h'
...so sync the tools version, which is missing some list_node/rb_tree fields.
Fixes: c3c510ce43 ("bpf: Add 'owner' field to bpf_{list,rb}_node")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20230719162257.20818-1-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann says:
====================
BPF link support for tc BPF programs
This series adds BPF link support for tc BPF programs. We initially
presented the motivation, related work and design at last year's LPC
conference in the networking & BPF track [0], and gave a recent update
on the progress of the rework at this year's LSF/MM/BPF summit [1].
The main changes are in the first two patches; the last two add an
extensive batch of test cases we developed along with them. Please see
the individual patches for details. We tested this series with the
tc-testing selftest suite as well as BPF CI/selftests. Thanks!
v5 -> v6:
- Remove export symbol on tcx_inc/dec (Jakub)
- Treat fd==0 as invalid (Stan, Alexei)
v4 -> v5:
- Updated bpftool docs and usage of bpftool net (Quentin)
- Consistent dump "prog id"/"link id" -> "prog_id"/"link_id" (Quentin)
- Reworked bpftool flag output handling (Quentin)
- LIBBPF_OPTS_RESET() macro with varargs for reinit (Andrii)
- libbpf opts/link bail out on relative_fd && relative_id (Andrii)
- libbpf improvements for assigning attr.relative_{id,fd} (Andrii)
- libbpf sorting in libbpf.map (Andrii)
- libbpf move ifindex to bpf_program__attach_tcx param (Andrii)
- libbpf move BPF_F_ID flag handling to bpf_link_create (Andrii)
- bpf_program_attach_fd with tcx instead of tc (Andrii)
- Reworking kernel-internal bpf_mprog API (Alexei, Andrii)
- Change "object" notation to "id_or_fd" (Andrii)
- Remove on stack cpp[BPF_MPROG_MAX] and switch to memmove (Andrii)
- Simplify bpf_mprog_{insert,delete} and add comment on internals
- Get rid of BPF_MPROG_* return codes (Alexei, Andrii)
v3 -> v4:
- Fix bpftool output to display tcx/{ingress,egress} (Stan)
- Documentation around API, BPF_MPROG_* return codes and locking
expectations (Stan, Alexei)
- Change _after and _before to have the same semantics for return
value (Alexei)
- Rework mprog initialization and move allocation/free one layer
up into tcx to simplify the code (Stan)
- Add comment on synchronize_rcu and parent->ref (Stan)
- Add comment on bpf_mprog_pos_() helpers wrt target position (Stan)
v2 -> v3:
- Removal of BPF_F_FIRST/BPF_F_LAST from control UAPI (Toke, Stan)
- Along with that full rework of bpf_mprog internals to simplify
dependency management, looks much nicer now imho
- Just single bpf_mprog_cp instead of two (Andrii)
- atomic64_t for revision counter (Andrii)
- Evaluate target position and reject on conflicts (Andrii)
- Keep track of actual count in bpf_mprog_bundle (Andrii)
- Make combo of REPLACE and BEFORE/AFTER work (Andrii)
- Moved miniq as first struct member (Jamal)
- Rework tcx_link_attach with regards to rtnl (Jakub, Andrii)
- Moved wrappers after bpf_prog_detach_ops (Andrii)
- Removed union for relative_fd and friends for opts and link in
libbpf (Andrii)
- Add doc comments to attach/detach/query libbpf APIs (Andrii)
- Dropped SEC_ATTACHABLE_OPT (Andrii)
- Add an OPTS_ZEROED check to bpf_link_create (Andrii)
- Keep opts as the last argument in bpf_program_attach_fd (Andrii)
- Rework bpf_program_attach_fd (Andrii)
- Remove OPTS_GET before we checked OPTS_VALID in
bpf_program__attach_tcx (Andrii)
- Add `size_t :0;` to prevent compiler from leaving garbage (Andrii)
- Add helper macro to clear opts structs which I found useful
when writing tests
- Rework of both opts and link test cases to accommodate the changes
v1 -> v2:
- Rework of almost entire series to remove prio from UAPI and switch
to better control directives BPF_F_FIRST/BPF_F_LAST/BPF_F_BEFORE/
BPF_F_AFTER (Alexei, Toke, Stan, Andrii)
- Addition of big test suite to cover all corner cases
[0] https://lpc.events/event/16/contributions/1353/
[1] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf
====================
Link: https://lore.kernel.org/r/20230719140858.13224-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>