LLVM moved their issue tracker from their own Bugzilla instance to GitHub
issues. While all of the links are still valid, they may not necessarily
show the most up to date information around the issues, as all updates
will occur on GitHub, not Bugzilla.
Another complication is that the Bugzilla issue number is not always the
same as the GitHub issue number. Thankfully, LLVM maintains this mapping
through two shortlinks:
https://llvm.org/bz<num> -> https://bugs.llvm.org/show_bug.cgi?id=<num>
https://llvm.org/pr<num> -> https://github.com/llvm/llvm-project/issues/<mapped_num>
Switch all "https://bugs.llvm.org/show_bug.cgi?id=<num>" links to the
"https://llvm.org/pr<num>" shortlink so that the links show the most up to
date information. Each migrated issue links back to the Bugzilla entry,
so there should be no loss of fidelity of information here.
Link: https://lkml.kernel.org/r/20240109-update-llvm-links-v1-3-eb09b59db071@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Fangrui Song <maskray@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Current release - regressions:
- af_unix: fix task hung while purging oob_skb in GC
- pds_core: do not try to run health-thread in VF path
Current release - new code bugs:
- sched: act_mirred: don't zero blockid when net device is being deleted
Previous releases - regressions:
- netfilter:
- nat: restore default DNAT behavior
- nf_tables: fix bidirectional offload, broken when unidirectional
offload support was added
- openvswitch: limit the number of recursions from action sets
- eth: i40e: do not allow untrusted VF to remove administratively
set MAC address
Previous releases - always broken:
- tls: fix races and bugs in use of async crypto
- mptcp: prevent data races on some of the main socket fields,
fix races in fastopen handling
- dpll: fix possible deadlock during netlink dump operation
- dsa: lan966x: fix crash when adding interface under a lag
when some of the ports are disabled
- can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock
Misc:
- handful of fixes and reliability improvements for selftests
- fix sysfs documentation missing net/ in paths
- finish the work of squashing the missing MODULE_DESCRIPTION()
warnings in networking
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmXOQ6AACgkQMUZtbf5S
IrsUrBAAhFMdcrJwLO73+ODfix4okmpOVPLvnW8DxsT46F9Uex3oP2mR7W5CtSp9
yr10n5Ce2rjRUu8T5D5XGkg0dHFFF887Ngs3PLxaZTEb13UcfxANZ+jjyyVB8XPf
HEODBqzJuFBkh4/qSY2/VEDjQW57JopyVVitC9ktF7yhJbZfFfEEf68L0DYqijF4
MzsGgcHenm2UuunOppp7S5yoWRHgl0IPr6Stz0Dw/AacqJrGl0sicuobTARvcGXP
G/0nLDerbcr+JhbgQUmKX3t3hxxwG9zyJmgyuX285NTPQagbGvYM5gQHLREdAwLF
8N2r2uoD0cPv00PQee/7/kfepLOiIkKthX9YEutT4fjOqtQ/CwSForXDqe7oI3rs
+KCMDn3LN/JECu9i8zUJUxdt2LBy0TPu7XrgZZuXbOEnAIKBjFQc59dtBE1Z2ROJ
r10Q4aR0xjaQ1yErl+mu/WP7zQpJTJb0PQCuy8zSYl3b64cbyJb+UqpLcXaizY8G
cT6XlTEpRvP21ULxU71/UyBLnYNX3msDTlfZRs2gVZEC1dt4WuM55BZmCl+mMvEd
nuAkaPyp61EiUNSVx+eeZ5r91qFuwDo+pPyAta4PNNEzeVx2CZI0RzeFrrFzJevB
DigB69R85zs8lhDJEC129GDNgGZpbQOttEA5GzVYFFsoxBS1ygk=
=YRod
-----END PGP SIGNATURE-----
Merge tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from can, wireless and netfilter.
Current release - regressions:
- af_unix: fix task hung while purging oob_skb in GC
- pds_core: do not try to run health-thread in VF path
Current release - new code bugs:
- sched: act_mirred: don't zero blockid when net device is being
deleted
Previous releases - regressions:
- netfilter:
- nat: restore default DNAT behavior
- nf_tables: fix bidirectional offload, broken when unidirectional
offload support was added
- openvswitch: limit the number of recursions from action sets
- eth: i40e: do not allow untrusted VF to remove administratively set
MAC address
Previous releases - always broken:
- tls: fix races and bugs in use of async crypto
- mptcp: prevent data races on some of the main socket fields, fix
races in fastopen handling
- dpll: fix possible deadlock during netlink dump operation
- dsa: lan966x: fix crash when adding interface under a lag when some
of the ports are disabled
- can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock
Misc:
- a handful of fixes and reliability improvements for selftests
- fix sysfs documentation missing net/ in paths
- finish the work of squashing the missing MODULE_DESCRIPTION()
warnings in networking"
* tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
net: fill in MODULE_DESCRIPTION()s for missing arcnet
net: fill in MODULE_DESCRIPTION()s for mdio_devres
net: fill in MODULE_DESCRIPTION()s for ppp
net: fill in MODULE_DESCRIPTION()s for fddik/skfp
net: fill in MODULE_DESCRIPTION()s for plip
net: fill in MODULE_DESCRIPTION()s for ieee802154/fakelb
net: fill in MODULE_DESCRIPTION()s for xen-netback
net: ravb: Count packets instead of descriptors in GbEth RX path
pppoe: Fix memory leak in pppoe_sendmsg()
net: sctp: fix skb leak in sctp_inq_free()
net: bcmasp: Handle RX buffer allocation failure
net-timestamp: make sk_tskey more predictable in error path
selftests: tls: increase the wait in poll_partial_rec_async
ice: Add check for lport extraction to LAG init
netfilter: nf_tables: fix bidirectional offload regression
netfilter: nat: restore default DNAT behavior
netfilter: nft_set_pipapo: fix missing : in kdoc
igc: Remove temporary workaround
igb: Fix string truncation warnings in igb_set_fw_version
can: netlink: Fix TDCO calculation using the old data bittiming
...
In case of GSO, 'chunk->skb' pointer may point to an entry from
fraglist created in 'sctp_packet_gso_append()'. To avoid freeing
random fraglist entry (and so undefined behavior and/or memory
leak), introduce 'sctp_inq_chunk_free()' helper to ensure that
'chunk->skb' is set to 'chunk->head_skb' (i.e. fraglist head)
before calling 'sctp_chunk_free()', and use the aforementioned
helper in 'sctp_inq_pop()' as well.
Reported-by: syzbot+8bb053b5d63595ab47db@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?id=0d8351bbe54fd04a492c2daab0164138db008042
Fixes: 90017accff ("sctp: Add GSO support")
Suggested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20240214082224.10168-1-dmantipov@yandex.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmXNTiwACgkQ1V2XiooU
IORw4RAAmr6WYYKyKL9TLXtdxp2c5Aj2BClIrMS/mtLBT9RKjxvL5/m2ePFCvz7N
/i7Om+dquZ4m5bS8Dk6MO61fhaKEmNWYigvfIYs4fc4qYj5WTV6XMzhY2lCRIgns
UQXZ0zbb2+BbmsXL/izYcXwM3VMp2l8PLhb/OeGtUtLDMZXF+INXrn3krYLc3TxS
4UEeLiCwxy8hgGCyS1w73GctfkznQ5vd2Zb6sD6TJ0pG1H4LmhxGDaQPMEtR9DaV
l+gxC9+Igw6r1Gmv9c1QZ//dvw4Jb+0ZuYEifeD/xqT//M56AKh8UB1/Nil6Kazq
r/VroMxQcuTJIPcx72F14U94M6r1BVRDIpBjVcpWBCrWjkgaJZkl2tcwfmn8Cihb
GWRy0zGbYoBynlsseSQUWvfJBGn0D8aFCaoroHYkFfg67Gj8aom5/hIuP2OblN3a
d+9VQ9FbEkoddv/JAF0Dp6+VVPi6DRxUOj8zC9+Ynl/+AMtx8xZ9B4yUf3n8pEag
7+OWDEnVHV7aFyfSeBETUQOPLSi+k4wpvp02QilbKIJ8s7Pp4v9KKw3CvHD59nrI
Ci9Z7PhWICoh+cZXYgradZVbyoJ6iRv2LskG/RlRpHxilZ5os+pcOiUR7dEARX05
tPRLagMiHsMsy7lsYhe+YBKtYZ1FMxGU+5p63hpkSDUVvOoV+R4=
=G4r8
-----END PGP SIGNATURE-----
Merge tag 'nf-24-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following batch contains Netfilter fixes for net:
1) Missing : in kdoc field in nft_set_pipapo.
2) Restore default DNAT behavior When a DNAT rule is configured via
iptables with different port ranges, from Kyle Swenson.
3) Restore flowtable hardware offload for bidirectional flows
by setting NF_FLOW_HW_BIDIRECTIONAL flag, from Felix Fietkau.
netfilter pull request 24-02-15
* tag 'nf-24-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: fix bidirectional offload regression
netfilter: nat: restore default DNAT behavior
netfilter: nft_set_pipapo: fix missing : in kdoc
====================
Link: https://lore.kernel.org/r/20240214233818.7946-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEUEC6huC2BN0pvD5fKDiiPnotvG8FAmXMwlUTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRAoOKI+ei28bwDmCACBeVNV2d9mL8AwNoaIiUmOHF8LsclP
NsRSl4rz/TMDFgO2tX9oUQGLsZG0YTSqJ5dF3qI7zjskBlTBJX0y4fByvQAQ6mU9
ZhwZMBz3JSS+tuZFIMWqHW1yq2TXoTnx1IzIM5f+D83LWqtP5Jto15lw1Ratrtat
taZwGwR10cEWO0IFNUx+4c5SGa+gGbEBdr7UBlJU1MdZ9fzo+ByV/H6JrfY1qqEj
DvraQm/oNCVrSP5dVr1s+0Kqnh1X1ff+6JWs5q2CJDN7E+Ai2cOxrEd2/JP7GANG
S0UIqH744z3kJDSE+GuQjxF4vbXqX3qfKIP4Q+EYlNvs0oskIQ5ebCsW
=So6Y
-----END PGP SIGNATURE-----
Merge tag 'linux-can-fixes-for-6.8-20240214' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2024-02-14
this is a pull request of 3 patches for net/master.
the first patch is by Ziqi Zhao and targets the CAN J1939 protocol, it
fixes a potential deadlock by replacing the spinlock by an rwlock.
Oleksij Rempel's patch adds a missing spin_lock_bh() to prevent a
potential Use-After-Free in the CAN J1939's
setsockopt(SO_J1939_FILTER).
Maxime Jayat contributes a patch to fix the transceiver delay
compensation (TDCO) calculation, which is needed for higher CAN-FD bit
rates (usually 2Mbit/s).
* tag 'linux-can-fixes-for-6.8-20240214' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: netlink: Fix TDCO calculation using the old data bittiming
can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER)
can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock
====================
Link: https://lore.kernel.org/r/20240214140348.2412776-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When SOF_TIMESTAMPING_OPT_ID is used to ambiguate timestamped datagrams,
the sk_tskey can become unpredictable in case of any error happened
during sendmsg(). Move increment later in the code and make decrement of
sk_tskey in error path. This solution is still racy in case of multiple
threads doing snedmsg() over the very same socket in parallel, but still
makes error path much more predictable.
Fixes: 09c2d251b7 ("net-timestamp: add key to disambiguate concurrent datagrams")
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240213110428.1681540-1-vadfed@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
that's how we love it ;-)
iwlwifi:
- correct A3 in A-MSDUs
- fix crash when operating as AP and running out of station
slots to use
- clear link ID to correct some later checks against it
- fix error codes in SAR table loading
- fix error path in PPAG table read
mac80211:
- reload a pointer after SKB may have changed
(only in certain monitor inject mode scenarios)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmXNCSMACgkQ10qiO8sP
aAA79A//SAXDwnnfJDa+F57aqFFSQSs+y+4D01NgWsJkVSHVF9JJMowsCvWZ2lhz
NXaBtONTzwMjDVxnMaEQgqBMNH7HXWzxqi7twvDCbHYFPyJzInWcCPokpsfQ9/Tc
5n0yPcUuUwDdO1I06CxAdvBU26I9nMIvI353DuhdGRaZdH85isTDgW35T2G3TD3w
ratlmHoIYUe7cjvhJs/6p4R7quBSLT74mqIs00l3mtlyRKhdVGR+Tl3YwCfmUb+F
V/vQo13O04QnC2QOzEAz//PUj1Rm9XXCaiWQHKs8QyVM4opFQADhrKLRQjkqu/3p
KOaJPxJEr2NnTuuFWyfj78k+zV8tMSvfXcRwVPO/ZtXow6CtYtV4h09FK8xdpkJK
rkrQ06Up111sS8uDJrzWRlREBM/JTOIZHkLGF7ZkQK3ICVZPi1vGg8MbQjuM8lnd
Oc95eOn4BTC0lua3L65f/C/UQpSXr+vqKTq+xOsybxnWmLJBcFSWOIqeaLJTblsi
YiZwowlpxoFC/UCEzTsSTRKbjETb590oyJqeg0pchdUT50x9ZiBfo094sdovrKqE
eJDiiDiXWPIB1Cf+ic8iP6T6C0Qsv8zq+GZtyIMZ0ZAkywdUOTMNst8UA9LRstx4
AlvMRfOM9aJhSDmvDk/Nheff9mjIsJYjZ+U09wLRXOJO1Yse4VU=
=h/fM
-----END PGP SIGNATURE-----
Merge tag 'wireless-2024-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Valentine's day edition, with just few fixes because
that's how we love it ;-)
iwlwifi:
- correct A3 in A-MSDUs
- fix crash when operating as AP and running out of station
slots to use
- clear link ID to correct some later checks against it
- fix error codes in SAR table loading
- fix error path in PPAG table read
mac80211:
- reload a pointer after SKB may have changed
(only in certain monitor inject mode scenarios)
* tag 'wireless-2024-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: iwlwifi: mvm: fix a crash when we run out of stations
wifi: iwlwifi: uninitialized variable in iwl_acpi_get_ppag_table()
wifi: iwlwifi: Fix some error codes
wifi: iwlwifi: clear link_id in time_event
wifi: iwlwifi: mvm: use correct address 3 in A-MSDU
wifi: mac80211: reload info pointer in ieee80211_tx_dequeue()
====================
Link: https://lore.kernel.org/r/20240214184326.132813-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 8f84780b84 ("netfilter: flowtable: allow unidirectional rules")
made unidirectional flow offload possible, while completely ignoring (and
breaking) bidirectional flow offload for nftables.
Add the missing flag that was left out as an exercise for the reader :)
Cc: Vlad Buslov <vladbu@nvidia.com>
Fixes: 8f84780b84 ("netfilter: flowtable: allow unidirectional rules")
Reported-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When a DNAT rule is configured via iptables with different port ranges,
iptables -t nat -A PREROUTING -p tcp -d 10.0.0.2 -m tcp --dport 32000:32010
-j DNAT --to-destination 192.168.0.10:21000-21010
we seem to be DNATing to some random port on the LAN side. While this is
expected if --random is passed to the iptables command, it is not
expected without passing --random. The expected behavior (and the
observed behavior prior to the commit in the "Fixes" tag) is the traffic
will be DNAT'd to 192.168.0.10:21000 unless there is a tuple collision
with that destination. In that case, we expect the traffic to be
instead DNAT'd to 192.168.0.10:21001, so on so forth until the end of
the range.
This patch intends to restore the behavior observed prior to the "Fixes"
tag.
Fixes: 6ed5943f87 ("netfilter: nat: remove l4 protocol port rovers")
Signed-off-by: Kyle Swenson <kyle.swenson@est.tech>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add missing : in kdoc field names.
Fixes: 8683f4b995 ("nft_set_pipapo: Prepare for vectorised implementation: helpers")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The following 3 locks would race against each other, causing the
deadlock situation in the Syzbot bug report:
- j1939_socks_lock
- active_session_list_lock
- sk_session_queue_lock
A reasonable fix is to change j1939_socks_lock to an rwlock, since in
the rare situations where a write lock is required for the linked list
that j1939_socks_lock is protecting, the code does not attempt to
acquire any more locks. This would break the circular lock dependency,
where, for example, the current thread already locks j1939_socks_lock
and attempts to acquire sk_session_queue_lock, and at the same time,
another thread attempts to acquire j1939_socks_lock while holding
sk_session_queue_lock.
NOTE: This patch along does not fix the unregister_netdevice bug
reported by Syzbot; instead, it solves a deadlock situation to prepare
for one or more further patches to actually fix the Syzbot bug, which
appears to be a reference counting problem within the j1939 codebase.
Reported-by: <syzbot+1591462f226d9cbf0564@syzkaller.appspotmail.com>
Signed-off-by: Ziqi Zhao <astrajoan@yahoo.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20230721162226.8639-1-astrajoan@yahoo.com
[mkl: remove unrelated newline change]
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Commit 67f562e3e1 ("net/smc: transfer fasync_list in case of fallback")
leaves the socket's fasync list pointer within a container socket as well.
When the latter is destroyed, '__sock_release()' warns about its non-empty
fasync list, which is a dangling pointer to previously freed fasync list
of an underlying TCP socket. Fix this spurious warning by nullifying
fasync list of a container socket.
Fixes: 67f562e3e1 ("net/smc: transfer fasync_list in case of fallback")
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Functions rds_still_queued and rds_clear_recv_queue lock a given socket
in order to safely iterate over the incoming rds messages. However
calling rds_inc_put while under this lock creates a potential deadlock.
rds_inc_put may eventually call rds_message_purge, which will lock
m_rs_lock. This is the incorrect locking order since m_rs_lock is
meant to be locked before the socket. To fix this, we move the message
item to a local list or variable that wont need rs_recv_lock protection.
Then we can safely call rds_inc_put on any item stored locally after
rs_recv_lock is released.
Fixes: bdbe6fbc6a ("RDS: recv.c")
Reported-by: syzbot+f9db6ff27b9bfdcfeca0@syzkaller.appspotmail.com
Reported-by: syzbot+dcd73ff9291e6d34b3ab@syzkaller.appspotmail.com
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://lore.kernel.org/r/20240209022854.200292-1-allison.henderson@oracle.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
rtnl_prop_list_size() can be called while alternative names
are added or removed concurrently.
if_nlmsg_size() / rtnl_calcit() can indeed be called
without RTNL held.
Use explicit RCU protection to avoid UAF.
Fixes: 88f4fb0c74 ("net: rtnetlink: put alternative names to getlink message")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240209181248.96637-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fastopen and PM-trigger subflow shutdown can race, as reported by
syzkaller.
In my first attempt to close such race, I missed the fact that
the subflow status can change again before the subflow_state_change
callback is invoked.
Address the issue additionally copying with all the states directly
reachable from TCP_FIN_WAIT1.
Fixes: 1e777f39b4 ("mptcp: add MSG_FASTOPEN sendmsg flag support")
Fixes: 4fd19a3070 ("mptcp: fix inconsistent state on fastopen race")
Cc: stable@vger.kernel.org
Reported-by: syzbot+c53d4d3ddb327e80bc51@syzkaller.appspotmail.com
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/458
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before adding a new entry in mptcp_userspace_pm_get_local_id(), it's
better to check whether this address is already in userspace pm local
address list. If it's in the list, no need to add a new entry, just
return it's address ID and use this address.
Fixes: 8b20137012 ("mptcp: read attributes of addr entries managed by userspace PMs")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <geliang.tang@linux.dev>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most MPTCP-level related fields are under the mptcp data lock
protection, but are written one-off without such lock at MPC
complete time, both for the client and the server
Leverage the mptcp_propagate_state() infrastructure to move such
initialization under the proper lock client-wise.
The server side critical init steps are done by
mptcp_subflow_fully_established(): ensure the caller properly held the
relevant lock, and avoid acquiring the same lock in the nested scopes.
There are no real potential races, as write access to such fields
is implicitly serialized by the MPTCP state machine; the primary
goal is consistency.
Fixes: d22f4988ff ("mptcp: process MP_CAPABLE data option")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 'msk->write_seq' and 'msk->snd_nxt' are always updated under
the msk socket lock, except at MPC handshake completiont time.
Builds-up on the previous commit to move such init under the relevant
lock.
There are no known problems caused by the potential race, the
primary goal is consistency.
Fixes: 6d0060f600 ("mptcp: Write MPTCP DSS headers to outgoing data packets")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
mptcp_rcv_space_init() is supposed to happen under the msk socket
lock, but active msk socket does that without such protection.
Leverage the existing mptcp_propagate_state() helper to that extent.
We need to ensure mptcp_rcv_space_init will happen before
mptcp_rcv_space_adjust(), and the release_cb does not assure that:
explicitly check for such condition.
While at it, move the wnd_end initialization out of mptcp_rcv_space_init(),
it never belonged there.
Note that the race does not produce ill effect in practice, but
change allows cleaning-up and defying better the locking model.
Fixes: a6b118febb ("mptcp: add receive buffer auto-tuning")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Such field is there to avoid acquiring the data lock in a few spots,
but it adds complexity to the already non trivial locking schema.
All the relevant call sites (mptcp-level re-injection, set socket
options), are slow-path, drop such field in favor of 'cb_flags', adding
the relevant locking.
This patch could be seen as an improvement, instead of a fix. But it
simplifies the next patch. The 'Fixes' tag has been added to help having
this series backported to stable.
Fixes: e9d09baca6 ("mptcp: avoid atomic bit manipulation when possible")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev->lstats is notably used from loopback ndo_start_xmit()
and other virtual drivers.
Per cpu stats updates are dirtying per-cpu data,
but the pointer itself is read-only.
Fixes: 43a71cd66b ("net-device: reorganize net_device fast path variables")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Cc: Simon Horman <horms@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
tp->scaling_ratio is a read mostly field, used in rx and tx fast paths.
Fixes: d5fed5addb ("tcp: reorganize tcp_sock fast path variables")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Cc: Wei Wang <weiwan@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We double count async, non-zc rx data. The previous fix was
lucky because if we fully zc async_copy_bytes is 0 so we add 0.
Decrypted already has all the bytes we handled, in all cases.
We don't have to adjust anything, delete the erroneous line.
Fixes: 4d42cd6bc2 ("tls: rx: fix return value for async crypto")
Co-developed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
tls_decrypt_sg doesn't take a reference on the pages from clear_skb,
so the put_page() in tls_decrypt_done releases them, and we trigger
a use-after-free in process_rx_list when we try to read from the
partially-read skb.
Fixes: fd31f3996a ("tls: rx: decrypt into a fresh skb")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we're setting the CRYPTO_TFM_REQ_MAY_BACKLOG flag on our
requests to the crypto API, crypto_aead_{encrypt,decrypt} can return
-EBUSY instead of -EINPROGRESS in valid situations. For example, when
the cryptd queue for AESNI is full (easy to trigger with an
artificially low cryptd.cryptd_max_cpu_qlen), requests will be enqueued
to the backlog but still processed. In that case, the async callback
will also be called twice: first with err == -EINPROGRESS, which it
seems we can just ignore, then with err == 0.
Compared to Sabrina's original patch this version uses the new
tls_*crypt_async_wait() helpers and converts the EBUSY to
EINPROGRESS to avoid having to modify all the error handling
paths. The handling is identical.
Fixes: a54667f672 ("tls: Add support for encryption using async offload accelerator")
Fixes: 94524d8fc9 ("net/tls: Add support for async decryption of tls records")
Co-developed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/netdev/9681d1febfec295449a62300938ed2ae66983f28.1694018970.git.sd@queasysnail.net/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similarly to previous commit, the submitting thread (recvmsg/sendmsg)
may exit as soon as the async crypto handler calls complete().
Reorder scheduling the work before calling complete().
This seems more logical in the first place, as it's
the inverse order of what the submitting thread will do.
Reported-by: valis <sec@valis.email>
Fixes: a42055e8d2 ("net/tls: Add support for async encryption of records for performance")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The submitting thread (one which called recvmsg/sendmsg)
may exit as soon as the async crypto handler calls complete()
so any code past that point risks touching already freed data.
Try to avoid the locking and extra flags altogether.
Have the main thread hold an extra reference, this way
we can depend solely on the atomic ref counter for
synchronization.
Don't futz with reiniting the completion, either, we are now
tightly controlling when completion fires.
Reported-by: valis <sec@valis.email>
Fixes: 0cada33241 ("net/tls: fix race condition causing kernel panic")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Factor out waiting for async encrypt and decrypt to finish.
There are already multiple copies and a subsequent fix will
need more. No functional changes.
Note that crypto_wait_req() returns wait->err
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
files) and two cap handling fixes from Xiubo and Rishabh.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmXGRJATHGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHzi+/0B/4pEweAm2W0UUaaS59DecNySBFobwed
m7bBDBGIAQ/I3duN46a13FzsGNclho967TeB0ig1jrQxnoo3HEMiXpZz5xfG9spe
fyvrIk3R8cSqgd7YsyITnUjGGd2UBvZVrbWOCbWrKofSoflS6IjcGDQF7ZrgEsff
0KkMaWHvO6poIU2mAToV//UkWUk6RrtAUNlSdjLpizXnUrrAQ+vUA3OU9SSp6Klf
xmFaIiAiVZC6M8qFpXJtnIf8Ba7PrpW5InAXgCOkxDKciE9fLaPsIu0B3H9lUVKZ
TJwjEJ0nB+akh0tRO5bZKyM8j0D3lhgxphJwNtUoYjQsV3m7LcGQV+Il
=u953
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Some fscrypt-related fixups (sparse reads are used only for encrypted
files) and two cap handling fixes from Xiubo and Rishabh"
* tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-client:
ceph: always check dir caps asynchronously
ceph: prevent use-after-free in encode_cap_msg()
ceph: always set initial i_blkbits to CEPH_FSCRYPT_BLOCK_SHIFT
libceph: just wait for more data to be available on the socket
libceph: rename read_sparse_msg_*() to read_partial_sparse_msg_*()
libceph: fail sparse-read if the data length doesn't match
We've had issues with gcc and 'asm goto' before, and we created a
'asm_volatile_goto()' macro for that in the past: see commits
3f0116c323 ("compiler/gcc4: Add quirk for 'asm goto' miscompilation
bug") and a9f180345f ("compiler/gcc4: Make quirk for
asm_volatile_goto() unconditional").
Then, much later, we ended up removing the workaround in commit
43c249ea0b ("compiler-gcc.h: remove ancient workaround for gcc PR
58670") because we no longer supported building the kernel with the
affected gcc versions, but we left the macro uses around.
Now, Sean Christopherson reports a new version of a very similar
problem, which is fixed by re-applying that ancient workaround. But the
problem in question is limited to only the 'asm goto with outputs'
cases, so instead of re-introducing the old workaround as-is, let's
rename and limit the workaround to just that much less common case.
It looks like there are at least two separate issues that all hit in
this area:
(a) some versions of gcc don't mark the asm goto as 'volatile' when it
has outputs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420
which is easy to work around by just adding the 'volatile' by hand.
(b) Internal compiler errors:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422
which are worked around by adding the extra empty 'asm' as a
barrier, as in the original workaround.
but the problem Sean sees may be a third thing since it involves bad
code generation (not an ICE) even with the manually added 'volatile'.
but the same old workaround works for this case, even if this feels a
bit like voodoo programming and may only be hiding the issue.
Reported-and-tested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the network schedulers.
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20240208164244.3818498-8-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the IPv4 modules.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240208164244.3818498-7-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the IPv6 modules.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240208164244.3818498-6-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to IPv6 over Low power Wireless Personal Area Network.
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Alexander Aring <aahringo@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240208164244.3818498-5-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the PF_KEY socket helpers.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240208164244.3818498-4-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the Multi-Protocol Over ATM (MPOA) driver.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240208164244.3818498-3-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the XFRM interface drivers.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240208164244.3818498-2-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
While testing tdc with parallel tests for mirred to block we caught an
intermittent bug. The blockid was being zeroed out when a net device
was deleted and, thus, giving us an incorrect blockid value whenever
we tried to dump the mirred action. Since we don't increment the block
refcount in the control path (and only use the ID), we don't need to
zero the blockid field whenever a net device is going down.
Fixes: 42f39036cd ("net/sched: act_mirred: Allow mirred to block")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20240207222902.1469398-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ovs module allows for some actions to recursively contain an action
list for complex scenarios, such as sampling, checking lengths, etc.
When these actions are copied into the internal flow table, they are
evaluated to validate that such actions make sense, and these calls
happen recursively.
The ovs-vswitchd userspace won't emit more than 16 recursion levels
deep. However, the module has no such limit and will happily accept
limits larger than 16 levels nested. Prevent this by tracking the
number of recursions happening and manually limiting it to 16 levels
nested.
The initial implementation of the sample action would track this depth
and prevent more than 3 levels of recursion, but this was removed to
support the clone use case, rather than limited at the current userspace
limit.
Fixes: 798c166173 ("openvswitch: Optimize sample action for the clone use cases")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240207132416.1488485-2-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Recently, handshake_req_destroy_test1 started failing:
Expected handshake_req_destroy_test == req, but
handshake_req_destroy_test == 0000000000000000
req == 0000000060f99b40
not ok 11 req_destroy works
This is because "sock_release(sock)" was replaced with "fput(filp)"
to address a memory leak. Note that sock_release() is synchronous
but fput() usually delays the final close and clean-up.
The delay is not consequential in the other cases that were changed
but handshake_req_destroy_test1 is testing that handshake_req_cancel()
followed by closing the file actually does call the ->hp_destroy
method. Thus the PTR_EQ test at the end has to be sure that the
final close is complete before it checks the pointer.
We cannot use a completion here because if ->hp_destroy is never
called (ie, there is an API bug) then the test will hang.
Reported by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/netdev/ZcKDd1to4MPANCrn@tissot.1015granger.net/T/#mac5c6299f86799f1c71776f3a07f9c566c7c3c40
Fixes: 4a0f07d71b ("net/handshake: Fix memory leak in __sock_create() and sock_alloc_file()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/170724699027.91401.7839730697326806733.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmXEuicACgkQ1V2XiooU
IOSbvA/9F2BC9TYKAh23/0EFbD4jOl4e26YE4E+Eu8AteoQ/nD+oI+mtWgw2hVXg
zXvm1vfIc02jGuGfcPZ+EIv/dkznnDqqUpUGa4ixtgvRw2bKkb2kKMlrFsjzsihj
yabXydwhxYE9b4Ch2AmRyApTLRMocte1IJ3ci4YUXwf68wZlOe2bIG5wyzGkFpjF
QZN/Rr14UKjC57EYNdUG9UdybWSqSKD23LPZSaLvi6wxoZd8cIcIkng5K4N0WVKF
lNskuNFY+j+bJz2Yn3mWIlCoM3R1N2B04t7wRkYnKWkSuwymG3O7JC3RUQaZDBZw
8AogEbvXaIY3nxyN4lHZ/jzM/QzNB1WHlPx6RjWKHoNhnas+xuBYrjCdJZwtEu8g
xs27Tjk3QtCIuaMuhN0RFqiq93MqZD/qx++kwMwJA0Wrg76MLPpf8yEWwVGYcAEG
0EWa61UfPezbcVkW8XveW6lgDfcOIOpBevxDQ3Nf7JB0AcbVBks7oDpGwDc5Pdz5
6y7WQIilxUtu9bHODUxrshxgTBwsocVkXUTIogCihUC+SgSZF+/G796c9Iy5/kPq
BtmSNJOJyCbnivkqKTLF0Pv0BplOv7W1sx2/fo+IfRXYTHoXVjHe1BYP0Ck3WEtS
9EPsFlI5f4AOtnPF3JrTPec9PvuHyVN+8aOPi82wlKiayJcXy1I=
=Rh2n
-----END PGP SIGNATURE-----
Merge tag 'nf-24-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Narrow down target/match revision to u8 in nft_compat.
2) Bail out with unused flags in nft_compat.
3) Restrict layer 4 protocol to u16 in nft_compat.
4) Remove static in pipapo get command that slipped through when
reducing set memory footprint.
5) Follow up incremental fix for the ipset performance regression,
this includes the missing gc cancellation, from Jozsef Kadlecsik.
6) Allow to filter by zone 0 in ctnetlink, do not interpret zone 0
as no filtering, from Felix Huettner.
7) Reject direction for NFT_CT_ID.
8) Use timestamp to check for set element expiration while transaction
is handled to prevent garbage collection from removing set elements
that were just added by this transaction. Packet path and netlink
dump/get path still use current time to check for expiration.
9) Restore NF_REPEAT in nfnetlink_queue, from Florian Westphal.
10) map_index needs to be percpu and per-set, not just percpu.
At this time its possible for a pipapo set to fill the all-zero part
with ones and take the 'might have bits set' as 'start-from-zero' area.
From Florian Westphal. This includes three patches:
- Change scratchpad area to a structure that provides space for a
per-set-and-cpu toggle and uses it of the percpu one.
- Add a new free helper to prepare for the next patch.
- Remove the scratch_aligned pointer and makes AVX2 implementation
use the exact same memory addresses for read/store of the matching
state.
netfilter pull request 24-02-08
* tag 'nf-24-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_set_pipapo: remove scratch_aligned pointer
netfilter: nft_set_pipapo: add helper to release pcpu scratch area
netfilter: nft_set_pipapo: store index in scratch maps
netfilter: nft_set_rbtree: skip end interval element from gc
netfilter: nfnetlink_queue: un-break NF_REPEAT
netfilter: nf_tables: use timestamp to check for set element timeout
netfilter: nft_ct: reject direction for ct id
netfilter: ctnetlink: fix filtering for zone 0
netfilter: ipset: Missing gc cancellations fixed
netfilter: nft_set_pipapo: remove static in nft_pipapo_get()
netfilter: nft_compat: restrict match/target protocol to u16
netfilter: nft_compat: reject unused compat flag
netfilter: nft_compat: narrow down revision to unsigned 8-bits
====================
Link: https://lore.kernel.org/r/20240208112834.1433-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
use ->scratch for both avx2 and the generic implementation.
After previous change the scratch->map member is always aligned properly
for AVX2, so we can just use scratch->map in AVX2 too.
The alignoff delta is stored in the scratchpad so we can reconstruct
the correct address to free the area again.
Fixes: 7400b06396 ("nft_set_pipapo: Introduce AVX2-based lookup implementation")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
After next patch simple kfree() is not enough anymore, so add
a helper for it.
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pipapo needs a scratchpad area to keep state during matching.
This state can be large and thus cannot reside on stack.
Each set preallocates percpu areas for this.
On each match stage, one scratchpad half starts with all-zero and the other
is inited to all-ones.
At the end of each stage, the half that starts with all-ones is
always zero. Before next field is tested, pointers to the two halves
are swapped, i.e. resmap pointer turns into fill pointer and vice versa.
After the last field has been processed, pipapo stashes the
index toggle in a percpu variable, with assumption that next packet
will start with the all-zero half and sets all bits in the other to 1.
This isn't reliable.
There can be multiple sets and we can't be sure that the upper
and lower half of all set scratch map is always in sync (lookups
can be conditional), so one set might have swapped, but other might
not have been queried.
Thus we need to keep the index per-set-and-cpu, just like the
scratchpad.
Note that this bug fix is incomplete, there is a related issue.
avx2 and normal implementation might use slightly different areas of the
map array space due to the avx2 alignment requirements, so
m->scratch (generic/fallback implementation) and ->scratch_aligned
(avx) may partially overlap. scratch and scratch_aligned are not distinct
objects, the latter is just the aligned address of the former.
After this change, write to scratch_align->map_index may write to
scratch->map, so this issue becomes more prominent, we can set to 1
a bit in the supposedly-all-zero area of scratch->map[].
A followup patch will remove the scratch_aligned and makes generic and
avx code use the same (aligned) area.
Its done in a separate change to ease review.
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
rbtree lazy gc on insert might collect an end interval element that has
been just added in this transactions, skip end interval elements that
are not yet active.
Fixes: f718863aca ("netfilter: nft_set_rbtree: fix overlap expiration walk")
Cc: stable@vger.kernel.org
Reported-by: lonial con <kongln9170@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Only override userspace verdict if the ct hook returns something
other than ACCEPT.
Else, this replaces NF_REPEAT (run all hooks again) with NF_ACCEPT
(move to next hook).
Fixes: 6291b3a67a ("netfilter: conntrack: convert nf_conntrack_update to netfilter verdicts")
Reported-by: l.6diay@passmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a timestamp field at the beginning of the transaction, store it
in the nftables per-netns area.
Update set backend .insert, .deactivate and sync gc path to use the
timestamp, this avoids that an element expires while control plane
transaction is still unfinished.
.lookup and .update, which are used from packet path, still use the
current time to check if the element has expired. And .get path and dump
also since this runs lockless under rcu read size lock. Then, there is
async gc which also needs to check the current time since it runs
asynchronously from a workqueue.
Fixes: c3e1b005ed ("netfilter: nf_tables: add set element timeout support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>