-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmTUhbUACgkQ1V2XiooU
IORz1Q//a2fDuMsK5iW1BlF4y0P9aQUSVV//r3DYaoYOspJhsB2yZu4HtL+XQJvY
yncwg+ub24yQh5sUNSJnZztQVTN+NPY9Vl2TkXXMx6Wxs2XenmgzZmDdghUDzhTd
DuOjIGVEJ2M6XpPAOub89sqL+E0K7J0/q0aIcV0K0/xKo7U/z3vgLv4aZx/ZjPCV
daj3gcGpYQ1JJ9pi2se2yh89dzT321U7yYde9ek0TUeKGdCFJkfHkqMurwbcgoJ8
jkx5NOtrp+GLbhd+ME86IUtD+Edm46+bJUxvG0My99CVlak7y5gJh/aPxpAPACuW
NhWWy26kivVRWyttLQk0ScZfbO1CIwvaPzQC+QdlFdNA1eWTMhEk6AG2dVaU9CNB
V9WKWv59CPaDwPCKhXiPLQ9J+Kds7oyHPXGlV2dDOuSmJ9QbOh/HBQGEm/mI93qX
Fr+qqP3A9/juXZ5FdSLT2pJPuVlXdhQdgyHgiunyDPHoL9q7GFn5aQL/BVKE23tc
bgMez0GKzBR0waS9cycFSVls1rQN1XUIdoD6SLaRYq9FkKcCx+YGn3LH44Y1feL/
UnLMFlt9xIG4dPbGcGGy4r7mB53JpglHEqJEftvsNcBEd/r/f+4JP+/fa9FJ70uZ
GpGmv7Wo5DZT5V8LaMeWDWpJl6G7UcxrFOyDTw27l2OOVNaD2Ic=
=KNf7
-----END PGP SIGNATURE-----
Merge tag 'nf-23-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The existing attempt to resolve races between control plane and GC work
is error prone, as reported by Bien Pham <phamnnb@sea.com>, some places
forgot to call nft_set_elem_mark_busy(), leading to double-deactivation
of elements.
This series contains the following patches:
1) Do not skip expired elements during walk otherwise elements might
never decrement the reference counter on data, leading to memleak.
2) Add a GC transaction API to replace the former attempt to deal with
races between control plane and GC. GC worker sets on NFT_SET_ELEM_DEAD_BIT
on elements and it creates a GC transaction to remove the expired
elements, GC transaction could abort in case of interference with
control plane and retried later (GC async). Set backends such as
rbtree and pipapo also perform GC from control plane (GC sync), in
such case, element deactivation and removal is safe because mutex
is held then collected elements are released via call_rcu().
3) Adapt existing set backends to use the GC transaction API.
4) Update rhash set backend to set on _DEAD bit to report deleted
elements from datapath for GC.
5) Remove old GC batch API and the NFT_SET_ELEM_BUSY_BIT.
* tag 'nf-23-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: remove busy mark and gc batch API
netfilter: nft_set_hash: mark set element as dead when deleting from packet path
netfilter: nf_tables: adapt set backend to use GC transaction API
netfilter: nf_tables: GC transaction API to avoid race with control plane
netfilter: nf_tables: don't skip expired elements during walk
====================
Link: https://lore.kernel.org/r/20230810070830.24064-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The set types rhashtable and rbtree use a GC worker to reclaim memory.
From system work queue, in periodic intervals, a scan of the table is
done.
The major caveat here is that the nft transaction mutex is not held.
This causes a race between control plane and GC when they attempt to
delete the same element.
We cannot grab the netlink mutex from the work queue, because the
control plane has to wait for the GC work queue in case the set is to be
removed, so we get following deadlock:
cpu 1 cpu2
GC work transaction comes in , lock nft mutex
`acquire nft mutex // BLOCKS
transaction asks to remove the set
set destruction calls cancel_work_sync()
cancel_work_sync will now block forever, because it is waiting for the
mutex the caller already owns.
This patch adds a new API that deals with garbage collection in two
steps:
1) Lockless GC of expired elements sets on the NFT_SET_ELEM_DEAD_BIT
so they are not visible via lookup. Annotate current GC sequence in
the GC transaction. Enqueue GC transaction work as soon as it is
full. If ruleset is updated, then GC transaction is aborted and
retried later.
2) GC work grabs the mutex. If GC sequence has changed then this GC
transaction lost race with control plane, abort it as it contains
stale references to objects and let GC try again later. If the
ruleset is intact, then this GC transaction deactivates and removes
the elements and it uses call_rcu() to destroy elements.
Note that no elements are removed from GC lockless path, the _DEAD bit
is set and pointers are collected. GC catchall does not remove the
elements anymore too. There is a new set->dead flag that is set on to
abort the GC transaction to deal with set->ops->destroy() path which
removes the remaining elements in the set from commit_release, where no
mutex is held.
To deal with GC when mutex is held, which allows safe deactivate and
removal, add sync GC API which releases the set element object via
call_rcu(). This is used by rbtree and pipapo backends which also
perform garbage collection from control plane path.
Since element removal from sets can happen from control plane and
element garbage collection/timeout, it is necessary to keep the set
structure alive until all elements have been deactivated and destroyed.
We cannot do a cancel_work_sync or flush_work in nft_set_destroy because
its called with the transaction mutex held, but the aforementioned async
work queue might be blocked on the very mutex that nft_set_destroy()
callchain is sitting on.
This gives us the choice of ABBA deadlock or UaF.
To avoid both, add set->refs refcount_t member. The GC API can then
increment the set refcount and release it once the elements have been
free'd.
Set backends are adapted to use the GC transaction API in a follow up
patch entitled:
("netfilter: nf_tables: use gc transaction API in set backends")
This is joint work with Florian Westphal.
Fixes: cfed7e1b1f ("netfilter: nf_tables: add set garbage collection helpers")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-----BEGIN PGP SIGNATURE-----
iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmTSNyYNHGZ3QHN0cmxl
bi5kZQAKCRBwkajZrV/2AOpyD/4pRjBvLgU0O+33x3l/X8z80QW6VMwq6PDUDmNA
t0AFnhIx0v6yky0abLzGGV9q2N9SLdNltTsqb0pem00f1TncKR1BlHNttfU+yjS+
qmkuUTJ8HixxkbBRKB9E7kA3IPM2aj1gxd3sji/QHKMT8XtcE5ufoad8/jcM9px2
FHNHJ8Onwl8ohjV4qNeuPe8XWm47pN/FeaxK7jRrKJaCal0P96sT8AGf/Rvx5VNY
jCysb2+fIKMzHssbLcRr1UDMJJFqtcQx0alnzwxh4sEPsmgYYR7UGmcDku4pSbtB
uJBrjMnLpORpw1l2syuYiiyEy+VRAAIjWAUxb6oTOvDhj1Yj2ki/915b/Hl/jnqa
q8EUm6i+B4CuiE8LCj0WLG2gKO7vRjFnDH/Li/qiFMUHzW/HmnLRituxTVTXwopC
1CxFkekNIklxLr+n21dP6f+NJU9hIs1nw+iy0JhLffTV8u6TosZ3Ve5ZInUQ4Bna
hZUyksy1s42fi0oGHN1Gi3AWPhUIlH69lKXTOher/9PvC+rL5/l8LFVywqtYxT4t
HWJwBtpm58IrsDttgPm6fnJqIgrm5mcW+FJYqP9Td6NjUvyfafhJsdhXJ9Bs1lJV
SfJp4iEUkOx4bfUi0rDFb/8NR2fwrCpbbgYx2xdS+tHQ2BkRlAOxYJC9Q8YP2vxH
Mjgc9w==
=XfrS
-----END PGP SIGNATURE-----
Merge tag 'nf-next-2023-08-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:
====================
netfilter updates for net-next
First 4 Patches, from Yue Haibing, remove unused prototypes in
various netfilter headers.
Last patch makes nfnetlink_log to always include a packet timestamp,
up to now it was only included if the skb had assigned previously.
From Maciej Żenczykowski.
* tag 'nf-next-2023-08-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nfnetlink_log: always add a timestamp
netfilter: h323: Remove unused function declarations
netfilter: conntrack: Remove unused function declarations
netfilter: helper: Remove unused function declarations
netfilter: gre: Remove unused function declaration nf_ct_gre_keymap_flush()
====================
Link: https://lore.kernel.org/r/20230808124159.19046-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 6bc506b4fb ("bridge: switchdev: Add forward mark support for stacked devices")
removed the implementation but leave declaration.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20230808145955.2176-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In commit 82e896d992 ("docs: net: page_pool: use kdoc to avoid
duplicating the information") I shied away from using the DOC:
comments when moving to kdoc for documenting page_pool API,
because I wasn't sure how familiar people are with it.
Turns out there is already a DOC: comment for the intro, which
is the same in both places, modulo what looks like minor rewording.
Use the version from Documentation/ but keep the contents with
the code.
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230807210051.1014580-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commitbf9009bf21b5 ("net/fq_impl: drop get_default_func, move default flow to fq_tin")
remove its last user, so can remove it.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230807142111.33524-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
AP_VLAN interfaces are virtual, so doesn't really exist as a type for
capabilities. When passed in as a type, AP is the one that's really intended.
Fixes: c4cbaf7973 ("cfg80211: Add support for HE")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230622165919.46841-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
On x86_64, frag_* fields of struct page_pool are scattered across two
cachelines despite the summary size of 24 bytes. All three fields are
used in pretty much the same places, but the last field, ::frag_users,
is pushed out to the next CL, provoking unwanted false-sharing on
hotpath (frags allocation code).
There are some holes and cold members to move around. Move frag_* one
block up, placing them right after &page_pool_params perfectly at the
beginning of CL2. This doesn't do any meaningful to the second block, as
those are some destroy-path cold structures, and doesn't do anything to
::alloc_stats, which still starts at 200-byte offset, 8 bytes after CL3
(still fitting into 1 cacheline).
On my setup, this yields 1-2% of Mpps when using PP frags actively.
When it comes to 32-bit architectures with 32-byte CL: &page_pool_params
plus ::pad is 44 bytes, the block taken care of is 16 bytes within one
CL, so there should be at least no regressions from the actual change.
::pages_state_hold_cnt is not related directly to that triple, but is
paired currently with ::frags_offset and decoupling them would mean
either two 4-byte holes or more invasive layout changes.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230804180529.2483231-4-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, touching <net/page_pool/types.h> triggers a rebuild of more
than half of the kernel. That's because it's included in
<linux/skbuff.h>. And each new include to page_pool/types.h adds more
[useless] data for the toolchain to process per each source file from
that pile.
In commit 6a5bcd84e8 ("page_pool: Allow drivers to hint on SKB
recycling"), Matteo included it to be able to call a couple of functions
defined there. Then, in commit 57f05bc2ab ("page_pool: keep pp info as
long as page pool owns the page") one of the calls was removed, so only
one was left. It's the call to page_pool_return_skb_page() in
napi_frag_unref(). The function is external and doesn't have any
dependencies. Having very niche page_pool_types.h included only for that
looks like an overkill.
As %PP_SIGNATURE is not local to page_pool.c (was only in the
early submissions), nothing holds this function there. Teleport
page_pool_return_skb_page() to skbuff.c, just next to the main consumer,
skb_pp_recycle(), and rename it to napi_pp_put_page(), as it doesn't
work with skbs at all and the former name tells nothing. The #if guards
here are only to not compile and have it in the vmlinux when not needed
-- both call sites are already guarded.
Now, touching page_pool_types.h only triggers rebuilding of the drivers
using it and a couple of core networking files.
Suggested-by: Jakub Kicinski <kuba@kernel.org> # make skbuff.h less heavy
Suggested-by: Alexander Duyck <alexanderduyck@fb.com> # move to skbuff.c
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230804180529.2483231-3-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Split types and pure function declarations from page_pool.h
and add them in page_page/types.h, so that C sources can
include page_pool.h and headers should generally only include
page_pool/types.h as suggested by jakub.
Rename page_pool.h to page_pool/helpers.h to have both in
one place.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230804180529.2483231-2-aleksander.lobakin@intel.com
[Jakub: change microsoft/mana, fix kdoc paths in Documentation]
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 6ba5a3c52d ("[UDP]: Make full use of proto.h.udp_hash innovation.")
removed these implementations but leave declarations.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f8572d8f2a ("sysctl net: Remove unused binary sysctl code")
left behind this declaration.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit acb674428c ("net: sched: introduce per-block callbacks")
implemented these but never used it.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pneigh_for_each() is never implemented since the beginning of git history.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add page pool for RX buffers for faster buffer cycle and reduce CPU
usage.
The standard page pool API is used.
With iperf and 128 threads test, this patch improved the throughput
by 12-15%, and decreased the IRQ associated CPU's usage from 99-100% to
10-50%.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit d021c34405 ("VSOCK: Introduce VM Sockets") declared but never implemented
vsock_release_pending(). Also vsock_init_tap() never implemented since introduction
in commit 531b374834 ("VSOCK: Add vsockmon tap functions").
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20230803134507.22660-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit d8d9ba8dc9 ("net: 802: remove dead leftover after ipx driver removal")
remove these implementations but leave the declarations.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230803135424.41664-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
llc_conn_ac_send_i_rsp_as_ack() and llc_conn_ev_sendack_tmr_exp()
are never implemented since beginning of git history.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230803134747.41512-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I have seen tcp_hashinfo starting at a non optimal location,
forcing input handlers to pull two cache lines instead of one,
and sharing a cache line that was dirtied more than necessary:
ffffffff83680600 b tcp_orphan_timer
ffffffff83680628 b tcp_orphan_cache
ffffffff8368062c b tcp_enable_tx_delay.__tcp_tx_delay_enabled
ffffffff83680630 B tcp_hashinfo
ffffffff83680680 b tcp_cong_list_lock
After this patch, ehash, ehash_locks, ehash_mask and ehash_locks_mask
are located in a read-only cache line.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At present hwc timeout value is a fixed value. This patch sets the hwc
timeout from the hardware. It now uses a new hardware capability
GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG to query and set the value
in hwc_timeout.
Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 5aa83a4c0a (" [PATCH] remove two obsolete net drivers") remove fmv18x_probe().
And commmit 01f4685797 ("eth: amd: remove NI6510 support (ni65)") leave ni65_probe().
Commit a10079c662 ("staging: remove hp100 driver") remove hp100 driver and hp100_probe()
declaration is not used anymore.
sonic_probe() and iph5526_probe() are never implemented since the beginning of git history.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20230802130716.37308-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRdM/uy1Ege0+EN1fNar9k/UBDW4wUCZMvevwAKCRBar9k/UBDW
42Z0AP90hLZ9OmoghYAlALHLl8zqXuHCV8OeFXR5auqG+kkcCwEAx6h99vnh4zgP
Tngj6Yid60o39/IZXXblhV37HfSiyQ8=
=/kVE
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:
====================
pull-request: bpf-next 2023-08-03
We've added 54 non-merge commits during the last 10 day(s) which contain
a total of 84 files changed, 4026 insertions(+), 562 deletions(-).
The main changes are:
1) Add SO_REUSEPORT support for TC bpf_sk_assign from Lorenz Bauer,
Daniel Borkmann
2) Support new insns from cpu v4 from Yonghong Song
3) Non-atomically allocate freelist during prefill from YiFei Zhu
4) Support defragmenting IPv(4|6) packets in BPF from Daniel Xu
5) Add tracepoint to xdp attaching failure from Leon Hwang
6) struct netdev_rx_queue and xdp.h reshuffling to reduce
rebuild time from Jakub Kicinski
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits)
net: invert the netdevice.h vs xdp.h dependency
net: move struct netdev_rx_queue out of netdevice.h
eth: add missing xdp.h includes in drivers
selftests/bpf: Add testcase for xdp attaching failure tracepoint
bpf, xdp: Add tracepoint to xdp attaching failure
selftests/bpf: fix static assert compilation issue for test_cls_*.c
bpf: fix bpf_probe_read_kernel prototype mismatch
riscv, bpf: Adapt bpf trampoline to optimized riscv ftrace framework
libbpf: fix typos in Makefile
tracing: bpf: use struct trace_entry in struct syscall_tp_t
bpf, devmap: Remove unused dtab field from bpf_dtab_netdev
bpf, cpumap: Remove unused cmap field from bpf_cpu_map_entry
netfilter: bpf: Only define get_proto_defrag_hook() if necessary
bpf: Fix an array-index-out-of-bounds issue in disasm.c
net: remove duplicate INDIRECT_CALLABLE_DECLARE of udp[6]_ehashfn
docs/bpf: Fix malformed documentation
bpf: selftests: Add defrag selftests
bpf: selftests: Support custom type and proto for client sockets
bpf: selftests: Support not connecting client socket
netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
...
====================
Link: https://lore.kernel.org/r/20230803174845.825419-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All struct members of the driver-facing APIs are documented twice,
in the code and under Documentation. This is a bit tedious.
I also get the feeling that a lot of developers will read the header
when coding, rather than the doc. Bring the two a little closer
together by using kdoc for structs and functions.
Using kdoc also gives us links (mentioning a function or struct
in the text gets replaced by a link to its doc).
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230802161821.3621985-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
xdp.h is far more specific and is included in only 67 other
files vs netdevice.h's 1538 include sites.
Make xdp.h include netdevice.h, instead of the other way around.
This decreases the incremental allmodconfig builds size when
xdp.h is touched from 5947 to 662 objects.
Move bpf_prog_run_xdp() to xdp.h, seems appropriate and filter.h
is a mega-header in its own right so it's nice to avoid xdp.h
getting included there as well.
The only unfortunate part is that the typedef for xdp_features_t
has to move to netdevice.h, since its embedded in struct netdevice.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230803010230.1755386-4-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
struct netdev_rx_queue is touched in only a few places
and having it defined in netdevice.h brings in the dependency
on xdp.h, because struct xdp_rxq_info gets embedded in
struct netdev_rx_queue.
In prep for removal of xdp.h from netdevice.h move all
the netdev_rx_queue stuff to a new header.
We could technically break the new header up to avoid
the sysfs.h include but it's so rarely included it
doesn't seem to be worth it at this point.
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230803010230.1755386-3-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Handful of drivers currently expect to get xdp.h by virtue
of including netdevice.h. This will soon no longer be the case
so add explicit includes.
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230803010230.1755386-2-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
These declarations are never implemented since beginning of git history.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20230731141030.32772-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch enables offload for TC classifier
flower rules which matches against SPI field.
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support for dissecting IPSEC field SPI (which is
32bits in size) for ESP and AH packets.
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is never implemented since the beginning of git history.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230731140437.37056-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The 'filter_cnt' counter is used to control a Qdisc class lifetime.
Each filter referecing this class by its id will eventually
increment/decrement this counter in their respective
'add/update/delete' routines.
As these operations are always serialized under rtnl lock, we don't
need an atomic type like 'refcount_t'.
It also means that we lose the overflow/underflow checks already
present in refcount_t, which are valuable to hunt down bugs
where the unsigned counter wraps around as it aids automated tools
like syzkaller to scream in such situations.
Wrap the open coded increment/decrement into helper functions and
add overflow checks to the operations.
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
commit 8a59f9d1e3 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()")
left behind tcp_bpf_get_proto() declaration. And tcp_v4_tw_remember_stamp()
function is remove in ccb7c410dd ("timewait_sock: Create and use getpeer op.").
Since commit 686989700c ("tcp: simplify tcp_mark_skb_lost")
tcp_skb_mark_lost_uncond_verify() declaration is not used anymore.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230729122644.10648-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
devlink_port_region_destroy() is never implemented since
commit 544e7c33ec ("net: devlink: Add support for port regions").
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230728132113.32888-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As 32bits of dissector->used_keys are exhausted,
increase the size to 64bits.
This is base change for ESP/AH flow dissector patch.
Please find patch and discussions at
https://lore.kernel.org/netdev/ZMDNjD46BvZ5zp5I@corigine.com/T/#t
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Petr Machata <petrm@nvidia.com> # for mlxsw
Tested-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk->sk_mark is often read while another thread could change the value.
Fixes: 4a19ec5800 ("[NET]: Introducing socket mark socket option.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a misuse of IP{6}CB(skb) in GRO, while calling to
`udp6_lib_lookup2` when handling udp tunnels. `udp6_lib_lookup2` fetch the
device from CB. The fix changes it to fetch the device from `skb->dev`.
l3mdev case requires special attention since it has a master and a slave
device.
Fixes: a6024562ff ("udp: Add GRO functions to UDP socket")
Reported-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chuck Lever says:
====================
In-kernel support for the TLS Alert protocol
IMO the kernel doesn't need user space (ie, tlshd) to handle the TLS
Alert protocol. Instead, a set of small helper functions can be used
to handle sending and receiving TLS Alerts for in-kernel TLS
consumers.
====================
Merged on top of a tag in case it's needed in the NFS tree.
Link: https://lore.kernel.org/r/169047923706.5241.1181144206068116926.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>