2019-05-19 20:07:45 +08:00
# SPDX-License-Identifier: GPL-2.0-only
2005-04-17 06:20:36 +08:00
#
# Network configuration
#

2008-07-30 18:14:01 +08:00
menuconfig NET
2005-04-17 06:20:36 +08:00
	bool "Networking support"
2009-03-04 14:53:30 +08:00
	select NLATTR
2013-06-05 00:46:26 +08:00
	select GENERIC_NET_UTILS
2014-10-24 09:41:08 +08:00
	select BPF
2020-06-14 00:50:22 +08:00
	help
2005-04-17 06:20:36 +08:00
	  Unless you really know what you are doing, you should say Y here.
	  The reason is that some programs need kernel networking support even
	  when running on a stand-alone machine that isn't connected to any
2005-07-12 12:03:49 +08:00
	  other computer.
2018-07-25 03:29:18 +08:00

2005-07-12 12:03:49 +08:00
	  If you are upgrading from an older kernel, you
2005-04-17 06:20:36 +08:00
	  should consider updating your networking tools too because changes
	  in the kernel and the tools often go hand in hand. The tools are
	  contained in the package net-tools, the location and version number
	  of which are given in <file:Documentation/Changes>.

	  For a general introduction to Linux networking, it is highly
	  recommended to read the NET-HOWTO, available from
	  <http://www.tldp.org/docs.html#howto>.

2005-07-12 12:13:56 +08:00
if NET
2005-04-17 06:20:36 +08:00

net/compat/wext: send different messages to compat tasks
Wireless extensions have the unfortunate problem that events
are multicast netlink messages, and are not independent of
pointer size. Thus, currently 32-bit tasks on 64-bit platforms
cannot properly receive events and fail with all kinds of
strange problems, for instance wpa_supplicant never notices
disassociations, due to the way the 64-bit event looks (to a
32-bit process), the fact that the address is all zeroes is
lost, it thinks instead it is 00:00:00:00:01:00.
The same problem existed with the ioctls, until David Miller
fixed those some time ago in an heroic effort.
A different problem caused by this is that we cannot send the
ASSOCREQIE/ASSOCRESPIE events because sending them causes a
32-bit wpa_supplicant on a 64-bit system to overwrite its
internal information, which is worse than it not getting the
information at all -- so we currently resort to sending a
custom string event that it then parses. This, however, has a
severe size limitation we are frequently hitting with modern
access points; this limitation can be lifted after this
patch by sending the correct binary, not custom, event.
A similar problem apparently happens for some other netlink
users on x86_64 with 32-bit tasks due to the alignment for
64-bit quantities.
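The alignment problem described above can be shown in plain userspace C. This is an illustration only and not the wext code. It assumes GCC or Clang on an LP64 host such as x86_64, where uint64_t is 8-byte aligned. It emulates the i386 rule that 64-bit members are only 4-byte aligned with a typedef, in the spirit of the compat_u64 type that x86 kernels use. The struct names and print_event_layouts() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical event as laid out by a 64-bit (LP64) kernel. */
struct native_event {
	uint32_t cmd;
	uint64_t value; /* 8-byte aligned: 4 bytes of padding precede it */
};

/*
 * Emulate the i386 ABI, where 64-bit members are only 4-byte aligned.
 * With GCC and Clang, an aligned attribute on a typedef may lower alignment.
 */
typedef uint64_t compat_u64_t __attribute__((aligned(4)));

/* The same event as a 32-bit (compat) task expects to read it. */
struct compat_event {
	uint32_t cmd;
	compat_u64_t value; /* no padding: starts right after cmd */
};

/* Print both layouts so the differing offsets and sizes are visible. */
void print_event_layouts(void)
{
	printf("native: value at %zu, size %zu\n",
	       offsetof(struct native_event, value), sizeof(struct native_event));
	printf("compat: value at %zu, size %zu\n",
	       offsetof(struct compat_event, value), sizeof(struct compat_event));
}
```

A 32-bit reader that receives the native layout therefore reads value from 4 bytes too early. This is the kind of misreading the message above describes.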
In order to fix these problems, I have implemented a way to
send compat messages to tasks. When sending an event, we send
the non-compat event data together with a compat event data in
skb_shinfo(main_skb)->frag_list. Then, when the event is read
from the socket, the netlink code makes sure to pass out only
the skb that is compatible with the task. This approach was
suggested by David Miller, my original approach required
always sending two skbs but that had various small problems.
To determine whether compat is needed or not, I have used the
MSG_CMSG_COMPAT flag, and adjusted the call path for recv and
recvfrom to include it, even if those calls do not have a cmsg
parameter.
I have not solved one small part of the problem, and I don't
think it is necessary to: if a 32-bit application uses read()
rather than any form of recvmsg() it will still get the wrong
(64-bit) event. However, neither do applications actually do
this, nor would it be a regression.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-01 19:26:02 +08:00
config WANT_COMPAT_NETLINK_MESSAGES
	bool
	help
	  This option can be selected by other options that need compat
	  netlink messages.

config COMPAT_NETLINK_MESSAGES
	def_bool y
	depends on COMPAT
2010-07-27 04:13:49 +08:00
	depends on WEXT_CORE || WANT_COMPAT_NETLINK_MESSAGES
2009-07-01 19:26:02 +08:00
	help
	  This option makes it possible to send different netlink messages
	  to tasks depending on whether the task is a compat task or not. To
	  achieve this, you need to set skb_shinfo(skb)->frag_list to the
	  compat skb before sending the skb; the netlink code will sort out
	  which message to actually pass to the task.

	  Newly written code should NEVER need this option; it should use
	  compat-independent messages instead!
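The selection rule in this help text can be modelled in a few lines of userspace C. All names here are hypothetical: struct msg stands in for an sk_buff, its compat pointer plays the role of skb_shinfo(skb)->frag_list, and the is_compat_reader flag stands in for MSG_CMSG_COMPAT on the receive path. This is a sketch of the idea, not the af_netlink.c code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an sk_buff carrying one netlink event. */
struct msg {
	const char *payload;  /* event data in the native layout */
	struct msg *compat;   /* plays the role of skb_shinfo(skb)->frag_list */
};

/*
 * Return the message a reader should receive. Compat (32-bit) readers get
 * the attached compat copy when one exists. Native readers, and compat
 * readers of messages without a compat copy, get the main message.
 */
const struct msg *pick_for_reader(const struct msg *m, bool is_compat_reader)
{
	if (is_compat_reader && m->compat != NULL)
		return m->compat;
	return m;
}
```

The sender pays for building two copies, but only when a compat copy is attached. Messages without one behave exactly as before.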

2015-05-14 00:19:37 +08:00
config NET_INGRESS
	bool

net, sched: add clsact qdisc
This work adds a generalization of the ingress qdisc as a qdisc holding
only classifiers. The clsact qdisc works on ingress, but also on egress.
In both cases, its execution happens without taking the qdisc lock, and
the main difference for the egress part compared to prior version of [1]
is that this can be applied with _any_ underlying real egress qdisc (also
classless ones).
Besides solving the use-case of [1], that is, allowing for more programmability
on assigning skb->priority for the mqprio case that is supported by most
popular 10G+ NICs, it also opens up a lot more flexibility for other tc
applications. The main work on classification can already be done at clsact
egress time if the use-case allows and state stored for later retrieval
f.e. again in skb->priority with major/minors (which is checked by most
classful qdiscs before consulting tc_classify()) and/or in other skb fields
like skb->tc_index for some light-weight post-processing to get to the
eventual classid in case of a classful qdisc. Another use case is that
the clsact egress part allows to have a central egress counterpart to
the ingress classifiers, so that classifiers can easily share state (e.g.
in cls_bpf via eBPF maps) for ingress and egress.
Currently, default setups like mq + pfifo_fast would require for this to
use, for example, prio qdisc instead (to get a tc_classify() run) and to
duplicate the egress classifier for each queue. With clsact, it allows
for leaving the setup as is, it can additionally assign skb->priority to
put the skb in one of pfifo_fast's bands and it can share state with maps.
Moreover, we can access the skb's dst entry (f.e. to retrieve tclassid)
w/o the need to perform a skb_dst_force() to hold on to it any longer. In
lwt case, we can also use this facility to setup dst metadata via cls_bpf
(bpf_skb_set_tunnel_key()) without needing a real egress qdisc just for
that (case of IFF_NO_QUEUE devices, for example).
The realization can be done without any changes to the scheduler core
framework. All it takes is that we have two a-priori defined minors/child
classes, where we can mux between ingress and egress classifier list
(dev->ingress_cl_list and dev->egress_cl_list, latter stored close to
dev->_tx to avoid extra cacheline miss for moderate loads). The egress
part is modelled somewhat similarly to handle_ing() and patched to a noop in
case the functionality is not used. Both handlers are now called
sch_handle_ingress() and sch_handle_egress(), code sharing among the two
doesn't seem practical as there are various minor differences in both
paths, so that making them conditional in a single handler would rather
slow things down.
Full compatibility to ingress qdisc is provided as well. Since both
piggyback on TC_H_CLSACT, only one of them (ingress/clsact) can exist
per netdevice, and thus ingress qdisc specific behaviour can be retained
for user space. This means, either a user does 'tc qdisc add dev foo ingress'
and configures ingress qdisc as usual, or the 'tc qdisc add dev foo clsact'
alternative, where both, ingress and egress classifier can be configured
as in the below example. ingress qdisc supports attaching classifier to any
minor number whereas clsact has two fixed minors for muxing between the
lists, therefore to not break user space setups, they are better done as
two separate qdiscs.
I decided to extend the sch_ingress module with clsact functionality so
that commonly used code can be reused, the module is being aliased with
sch_clsact so that it can be auto-loaded properly. Alternative would have been
to add a flag when initializing ingress to alter its behaviour plus aliasing
to a different name (as it's more than just ingress). However, the first would
end up, based on the flag, choosing the new/old behaviour by calling different
function implementations to handle each anyway, the latter would require to
register ingress qdisc once again under different alias. So, this really begs
to provide a minimal, cleaner approach to have Qdisc_ops and Qdisc_class_ops
by its own that share callbacks used by both.
Example, adding qdisc:
# tc qdisc add dev foo clsact
# tc qdisc show dev foo
qdisc mq 0: root
qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc clsact ffff: parent ffff:fff1
Adding filters (deleting, etc. works analogously by specifying ingress/egress):
# tc filter add dev foo ingress bpf da obj bar.o sec ingress
# tc filter add dev foo egress bpf da obj bar.o sec egress
# tc filter show dev foo ingress
filter protocol all pref 49152 bpf
filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action
# tc filter show dev foo egress
filter protocol all pref 49152 bpf
filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action
A 'tc filter show dev foo' or 'tc filter show dev foo parent ffff:' will
show an empty list for clsact. Either using the parent names (ingress/egress)
or specifying the full major/minor will then show the related filter lists.
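The handles in the example output follow tc's 32-bit major:minor encoding. In the sketch below, the constants are defined locally so it compiles without kernel headers, and their values are assumed to mirror <linux/pkt_sched.h>. TC_H_CLSACT aliases TC_H_INGRESS, and there are two fixed minors for the ingress and egress lists. format_handle() and parent_from_name() are hypothetical helpers; the second is modelled on how the ingress/egress shorthand appears to resolve to a full parent.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local copies of tc handle constants, assumed to match <linux/pkt_sched.h>. */
#define TC_H_MAJ_MASK    0xFFFF0000U
#define TC_H_MIN_MASK    0x0000FFFFU
#define TC_H_MAKE(maj, min) (((maj) & TC_H_MAJ_MASK) | ((min) & TC_H_MIN_MASK))
#define TC_H_INGRESS     0xFFFFFFF1U
#define TC_H_CLSACT      TC_H_INGRESS
#define TC_H_MIN_INGRESS 0xFFF2U
#define TC_H_MIN_EGRESS  0xFFF3U

/* Print a full handle as major:minor in hex, e.g. 0xFFFFFFF3 -> "ffff:fff3". */
void format_handle(uint32_t handle, char *buf, size_t len)
{
	snprintf(buf, len, "%x:%x",
		 (unsigned int)((handle & TC_H_MAJ_MASK) >> 16),
		 (unsigned int)(handle & TC_H_MIN_MASK));
}

/* Map the "ingress"/"egress" shorthand to the full clsact parent handle. */
int parent_from_name(const char *name, uint32_t *parent)
{
	if (strcmp(name, "ingress") == 0)
		*parent = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS);
	else if (strcmp(name, "egress") == 0)
		*parent = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS);
	else
		return -1;
	return 0;
}
```

With these values, the clsact qdisc sits under parent ffff:fff1. Its two classifier lists are addressed as ffff:fff2 (ingress) and ffff:fff3 (egress), which is why the bare parent ffff: shows an empty list.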
Prior work on a mqprio prequeue() facility [1] was done mainly by John Fastabend.
[1] http://patchwork.ozlabs.org/patch/512949/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-08 05:29:47 +08:00
config NET_EGRESS
	bool

bpf: Add fd-based tcx multi-prog infra with link support
This work refactors and adds a lightweight extension ("tcx") to the tc BPF
ingress and egress data path side for allowing BPF program management based
on fds via bpf() syscall through the newly added generic multi-prog API.
The main goal behind this work which we also presented at LPC [0] last year
and a recent update at LSF/MM/BPF this year [3] is to support long-awaited
BPF link functionality for tc BPF programs, which allows for a model of safe
ownership and program detachment.
Given the rise in tc BPF users in cloud native environments, this becomes
necessary to avoid hard to debug incidents either through stale leftover
programs or 3rd party applications accidentally stepping on each other's toes.
As a recap, a BPF link represents the attachment of a BPF program to a BPF
hook point. The BPF link holds a single reference to keep BPF program alive.
Moreover, hook points do not reference a BPF link, only the application's
fd or pinning does. A BPF link holds meta-data specific to attachment and
implements operations for link creation, (atomic) BPF program update,
detachment and introspection. The motivation for BPF links for tc BPF programs
is multi-fold, for example:
- From Meta: "It's especially important for applications that are deployed
fleet-wide and that don't "control" hosts they are deployed to. If such
application crashes and no one notices and does anything about that, BPF
program will keep running draining resources or even just, say, dropping
packets. We at FB had outages due to such permanent BPF attachment
semantics. With fd-based BPF link we are getting a framework, which allows
safe, auto-detachable behavior by default, unless application explicitly
opts in by pinning the BPF link." [1]
- From Cilium-side the tc BPF programs we attach to host-facing veth devices
and phys devices build the core datapath for Kubernetes Pods, and they
implement forwarding, load-balancing, policy, EDT-management, etc, within
BPF. Currently there is no concept of 'safe' ownership, e.g. we've recently
experienced hard-to-debug issues in a user's staging environment where
another Kubernetes application using tc BPF attached to the same prio/handle
of cls_bpf, accidentally wiping all Cilium-based BPF programs from underneath
it. The goal is to establish a clear/safe ownership model via links which
cannot accidentally be overridden. [0,2]
BPF links for tc can co-exist with non-link attachments, and the semantics are
in line also with XDP links: BPF links cannot replace other BPF links, BPF
links cannot replace non-BPF links, non-BPF links cannot replace BPF links and
lastly only non-BPF links can replace non-BPF links. In case of Cilium, this
would solve mentioned issue of safe ownership model as 3rd party applications
would not be able to accidentally wipe Cilium programs, even if they are not
BPF link aware.
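The four replacement rules above reduce to a single predicate. The enum and may_replace() below are a hypothetical model written for illustration, not kernel identifiers.

```c
#include <stdbool.h>

/* Hypothetical model of how a program is attached at a tc hook. */
enum attach_kind {
	ATTACH_LEGACY, /* non-link attachment */
	ATTACH_LINK,   /* BPF link attachment, owned via an fd or a pin */
};

/*
 * Encodes the rules listed above. Only a non-link attachment may replace
 * another non-link attachment; any case involving a link is refused.
 */
bool may_replace(enum attach_kind new_kind, enum attach_kind old_kind)
{
	return new_kind == ATTACH_LEGACY && old_kind == ATTACH_LEGACY;
}
```

Because every case involving a link returns false, a link-unaware third-party loader cannot silently displace a link-owned program.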
Earlier attempts [4] have tried to integrate BPF links into core tc machinery
to solve cls_bpf, which has been intrusive to the generic tc kernel API with
extensions only specific to cls_bpf and suboptimal/complex since cls_bpf could
be wiped from the qdisc also. Locking a tc BPF program in place this way, is
getting into layering hacks given the two object models are vastly different.
We instead implemented the tcx (tc 'express') layer which is an fd-based tc BPF
attach API, so that the BPF link implementation blends in naturally similar to
other link types which are fd-based and without the need for changing core tc
internal APIs. BPF programs for tc can then be successively migrated from classic
cls_bpf to the new tc BPF link without needing to change the program's source
code, just the BPF loader mechanics for attaching is sufficient.
For the current tc framework, there is no change in behavior with this change
and neither does this change touch on tc core kernel APIs. The gist of this
patch is that the ingress and egress hook have a lightweight, qdisc-less
extension for BPF to attach its tc BPF programs, in other words, a minimal
entry point for tc BPF. The name tcx has been suggested from discussion of
earlier revisions of this work as a good fit, and to more easily differ between
the classic cls_bpf attachment and the fd-based one.
For the ingress and egress tcx points, the device holds a cache-friendly array
with program pointers which is separated from control plane (slow-path) data.
Earlier versions of this work used priority to determine ordering and expression
of dependencies similar as with classic tc, but it was challenged that for
something more future-proof a better user experience is required. Hence this
resulted in the design and development of the generic attach/detach/query API
for multi-progs. See prior patch with its discussion on the API design. tcx is
the first user and later we plan to integrate also others, for example, one
candidate is multi-prog support for XDP which would benefit and have the same
'look and feel' from API perspective.
The goal with tcx is to have maximum compatibility to existing tc BPF programs,
so they don't need to be rewritten specifically. Compatibility to call into
classic tcf_classify() is also provided in order to allow successive migration
or both to cleanly co-exist where needed given it's all one logical tc layer and
the tcx plus classic tc cls/act build one logical overall processing pipeline.
tcx supports the simplified return codes TCX_NEXT which is non-terminating (go
to next program) and terminating ones with TCX_PASS, TCX_DROP, TCX_REDIRECT.
The fd-based API is behind a static key, so that when unused the code is also
not entered. The struct tcx_entry's program array is currently static, but
could be made dynamic if necessary at a point in future. The a/b pair swap
design has been chosen so that for detachment there are no allocations which
otherwise could fail.
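The return-code semantics and the a/b pair swap can be sketched together in userspace C. The TCX_* values are assumed to mirror the uapi <linux/bpf.h> enum (TCX_NEXT = -1, TCX_PASS = 0, TCX_DROP = 2, TCX_REDIRECT = 7); treat them as an assumption. All other names and the fixed capacity are hypothetical. A real implementation must also wait for in-flight readers (RCU) before reusing the old half, and that step is omitted here.

```c
#include <stddef.h>

/* Assumed to mirror the TCX_* values in <linux/bpf.h>. */
enum { TCX_NEXT = -1, TCX_PASS = 0, TCX_DROP = 2, TCX_REDIRECT = 7 };

#define MAX_PROGS 8 /* hypothetical fixed capacity (the array is static) */

typedef int (*prog_fn)(const void *pkt);

struct prog_array {
	prog_fn progs[MAX_PROGS];
	size_t count;
};

/* An a/b pair: one half is live, the other is preallocated scratch space. */
struct entry_pair {
	struct prog_array a, b;
	struct prog_array *active;
};

/* Run programs in order; the first terminating verdict ends the walk. */
int run_progs(const struct prog_array *arr, const void *pkt)
{
	int ret = TCX_NEXT;

	for (size_t i = 0; i < arr->count; i++) {
		ret = arr->progs[i](pkt);
		if (ret != TCX_NEXT)
			break;
	}
	return ret; /* still TCX_NEXT if every program deferred */
}

/* Detach without allocating: rebuild the list in the spare half, then swap. */
void detach_prog(struct entry_pair *e, prog_fn victim)
{
	struct prog_array *next = (e->active == &e->a) ? &e->b : &e->a;

	next->count = 0;
	for (size_t i = 0; i < e->active->count; i++)
		if (e->active->progs[i] != victim)
			next->progs[next->count++] = e->active->progs[i];
	e->active = next; /* the kernel would publish this pointer atomically */
}

/* Two toy programs used in the usage example. */
int prog_defer(const void *pkt) { (void)pkt; return TCX_NEXT; }
int prog_drop(const void *pkt)  { (void)pkt; return TCX_DROP; }
```

Since both halves exist up front, detach_prog() has no failure path. This matches the stated motivation for the a/b design.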
The work has been tested with tc-testing selftest suite which all passes, as
well as the tc BPF tests from the BPF CI, and also with Cilium's L4LB.
Thanks also to Nikolay Aleksandrov and Martin Lau for in-depth early reviews
of this work.
[0] https://lpc.events/event/16/contributions/1353/
[1] https://lore.kernel.org/bpf/CAEf4BzbokCJN33Nw_kg82sO=xppXnKWEncGTWCTB9vGCmLB6pw@mail.gmail.com
[2] https://colocatedeventseu2023.sched.com/event/1Jo6O/tales-from-an-ebpf-programs-murder-mystery-hemanth-malla-guillaume-fournier-datadog
[3] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf
[4] https://lore.kernel.org/bpf/20210604063116.234316-1-memxor@gmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230719140858.13224-3-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-19 22:08:52 +08:00
config NET_XGRESS
	select NET_INGRESS
	select NET_EGRESS
	bool

2020-03-25 20:47:18 +08:00
config NET_REDIRECT
	bool

2024-04-04 04:21:39 +08:00
config SKB_DECRYPTED
	bool

sk_buff: add skb extension infrastructure
This adds an optional extension infrastructure, with ipsec (xfrm) and
bridge netfilter as first users.
objdiff shows no changes if kernel is built without xfrm and br_netfilter
support.
The third (planned future) user is Multipath TCP which is still
out-of-tree.
MPTCP needs to map logical mptcp sequence numbers to the tcp sequence
numbers used by individual subflows.
This DSS mapping is read/written from tcp option space on receive and
written to tcp option space on transmitted tcp packets that are part of
an MPTCP connection.
Extending skb_shared_info or adding a private data field to skb fclones
doesn't work for incoming skb, so a different DSS propagation method would
be required for the receive side.
mptcp has same requirements as secpath/bridge netfilter:
1. extension memory is released when the sk_buff is free'd.
2. data is shared after cloning an skb (clone inherits extension)
3. adding extension to an skb will COW the extension buffer if needed.
The "MPTCP upstreaming" effort adds SKB_EXT_MPTCP extension to store the
mapping for tx and rx processing.
Two new members are added to sk_buff:
1. 'active_extensions' byte (filling a hole), telling which extensions
are available for this skb.
This has two purposes.
a) avoids the need to initialize the pointer.
b) allows to "delete" an extension by clearing its bit
value in ->active_extensions.
While it would be possible to store the active_extensions byte
in the extension struct instead of sk_buff, there is one problem
with this:
When an extension has to be disabled, we can always clear the
bit in skb->active_extensions. But in case it would be stored in the
extension buffer itself, we might have to COW it first, if
we are dealing with a cloned skb. On kmalloc failure we would
be unable to turn an extension off.
2. extension pointer, located at the end of the sk_buff.
If the active_extensions byte is 0, the pointer is undefined,
it is not initialized on skb allocation.
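The two new members can be modelled in userspace to show two things. Clearing a bit "deletes" an extension without touching the shared buffer, while adding one to a buffer that a clone still references forces a copy first. Everything here is a hypothetical simplification, including the struct layouts, ext_add(), ext_del(), ext_put(), the two extension ids and the fixed-size slots. The sketch is single-threaded and uses a plain int reference count.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum ext_id { EXT_SEC_PATH, EXT_MPTCP, EXT_NUM }; /* hypothetical ids */

struct ext {
	int refcnt;             /* shared between clones */
	char data[EXT_NUM][16]; /* toy fixed-size slot per extension */
};

struct pkt {
	uint8_t active_extensions; /* bit n set => extension n is valid */
	struct ext *extensions;    /* meaningless while active_extensions == 0 */
};

bool ext_exists(const struct pkt *p, enum ext_id id)
{
	return p->active_extensions & (1u << id);
}

void ext_put(struct ext *e)
{
	if (e != NULL && --e->refcnt == 0)
		free(e);
}

/* Deleting never copies the shared buffer: the bit lives in the packet. */
void ext_del(struct pkt *p, enum ext_id id)
{
	p->active_extensions &= (uint8_t)~(1u << id);
	if (p->active_extensions == 0) { /* last one gone: drop our reference */
		ext_put(p->extensions);
		p->extensions = NULL;
	}
}

/* Add an extension, copying the buffer first if a clone still shares it. */
void *ext_add(struct pkt *p, enum ext_id id)
{
	struct ext *e = p->active_extensions ? p->extensions : NULL;

	if (e == NULL) {
		e = calloc(1, sizeof(*e));
		if (e == NULL)
			return NULL;
		e->refcnt = 1;
	} else if (e->refcnt > 1) {
		struct ext *copy = malloc(sizeof(*copy));

		if (copy == NULL)
			return NULL; /* allocation can fail only on this add path */
		memcpy(copy, e, sizeof(*copy));
		copy->refcnt = 1;
		e->refcnt--; /* release our share of the old buffer */
		e = copy;
	}
	p->extensions = e;
	p->active_extensions |= (uint8_t)(1u << id);
	return e->data[id];
}
```

Keeping the bitmask in the packet is what lets ext_del() succeed even when memory is exhausted. This is the argument the message makes against storing the bitmask in the extension buffer.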
This adds extra code to skb clone and free paths (to deal with
refcount/free of extension area) but this replaces similar code that
manages skb->nf_bridge and skb->sp structs in the followup patches of
the series.
It is possible to add support for extensions that are not preserved on
clones/copies.
To do this, it would be needed to define a bitmask of all extensions that
need copy/cow semantics, and change __skb_ext_copy() to check
->active_extensions & SKB_EXT_PRESERVE_ON_CLONE, then just set
->active_extensions to 0 on the new clone.
This isn't done here because all extensions that get added here
need the copy/cow semantics.
v2:
Allocate entire extension space using kmem_cache.
Upside is that this allows better tracking of used memory,
downside is that we will allocate more space than strictly needed in
most cases (it's unlikely that all extensions are active/needed at same
time for same skb).
The allocated memory (except the small extension header) is not cleared,
so no additional overhead aside from memory usage.
Avoid atomic_dec_and_test operation on skb_ext_put()
by using similar trick as kfree_skbmem() does with fclone_ref:
If refcount is 1, there is no concurrent user and we can free right away.
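The put shortcut described in these v2 notes can be sketched with C11 <stdatomic.h>. Here struct ext_buf and ext_buf_put() are hypothetical names. If the count already reads 1, the caller holds the only reference, so no other holder can race with it and the atomic read-modify-write can be skipped. Otherwise the usual decrement decides who frees.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct ext_buf {
	atomic_int refcnt;
	/* ... extension payload would follow ... */
};

/* Drop one reference; returns true if the buffer was freed. */
bool ext_buf_put(struct ext_buf *b)
{
	if (b == NULL)
		return false;
	/* Sole owner: nobody else can take or drop a reference concurrently. */
	if (atomic_load_explicit(&b->refcnt, memory_order_acquire) == 1)
		goto free_now;
	/* Shared: only the holder that drops the last reference frees. */
	if (atomic_fetch_sub_explicit(&b->refcnt, 1, memory_order_acq_rel) != 1)
		return false;
free_now:
	free(b);
	return true;
}
```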
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 00:15:16 +08:00
config SKB_EXTENSIONS
	bool

2005-07-12 12:13:56 +08:00
menu "Networking options"
2005-04-17 06:20:36 +08:00

2005-07-12 12:13:56 +08:00
source "net/packet/Kconfig"
source "net/unix/Kconfig"
2017-06-15 02:37:39 +08:00
source "net/tls/Kconfig"
2005-07-12 12:13:56 +08:00
source "net/xfrm/Kconfig"
2007-02-09 05:37:42 +08:00
source "net/iucv/Kconfig"
2017-01-09 23:55:13 +08:00
source "net/smc/Kconfig"
2018-05-02 19:01:22 +08:00
source "net/xdp/Kconfig"
2005-04-17 06:20:36 +08:00

net/handshake: Create a NETLINK service for handling handshake requests
When a kernel consumer needs a transport layer security session, it
first needs a handshake to negotiate and establish a session. This
negotiation can be done in user space via one of the several
existing library implementations, or it can be done in the kernel.
No in-kernel handshake implementations yet exist. In their absence,
we add a netlink service that can:
a. Notify a user space daemon that a handshake is needed.
b. Once notified, the daemon calls the kernel back via this
netlink service to get the handshake parameters, including an
open socket on which to establish the session.
c. Once the handshake is complete, the daemon reports the
session status and other information via a second netlink
operation. This operation marks that it is safe for the
kernel to use the open socket and the security session
established there.
The notification service uses a multicast group. Each handshake
mechanism (eg, tlshd) adopts its own group number so that the
handshake services are completely independent of one another. The
kernel can then tell via netlink_has_listeners() whether a handshake
service is active and prepared to handle a handshake request.
A new netlink operation, ACCEPT, acts like accept(2) in that it
instantiates a file descriptor in the user space daemon's fd table.
If this operation is successful, the reply carries the fd number,
which can be treated as an open and ready file descriptor.
While user space is performing the handshake, the kernel keeps its
muddy paws off the open socket. A second new netlink operation,
DONE, indicates that the user space daemon is finished with the
socket and it is safe for the kernel to use again. The operation
also indicates whether a session was established successfully.
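Steps a, b and c, together with the ACCEPT and DONE operations, amount to a small per-request state machine. The sketch below models only that ordering. The state and event names are hypothetical, and it uses neither the real netlink handshake family nor its attributes.

```c
#include <stdbool.h>

enum hs_state {
	HS_IDLE,     /* no handshake requested yet */
	HS_PENDING,  /* multicast notification sent to the daemon's group */
	HS_ACCEPTED, /* daemon holds the socket fd; kernel keeps its hands off */
	HS_DONE,     /* daemon reported status; kernel may use the socket again */
};

enum hs_event { EV_REQUEST, EV_ACCEPT, EV_DONE };

/* Advance a request; returns false (state unchanged) for out-of-order events. */
bool hs_step(enum hs_state *s, enum hs_event ev, bool has_listener)
{
	switch (*s) {
	case HS_IDLE:
		/* Only notify if a daemon listens (cf. netlink_has_listeners()). */
		if (ev == EV_REQUEST && has_listener) { *s = HS_PENDING; return true; }
		break;
	case HS_PENDING:
		if (ev == EV_ACCEPT) { *s = HS_ACCEPTED; return true; }
		break;
	case HS_ACCEPTED:
		if (ev == EV_DONE) { *s = HS_DONE; return true; }
		break;
	case HS_DONE:
		break;
	}
	return false;
}
```

For example, a DONE that arrives before ACCEPT is rejected. This mirrors the rule that the kernel reclaims the socket only after the daemon has been handed it and has reported back.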
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17 22:32:26 +08:00
config NET_HANDSHAKE
	bool
	depends on SUNRPC || NVME_TARGET_TCP || NVME_TCP
	default y

2023-04-17 22:32:39 +08:00
config NET_HANDSHAKE_KUNIT_TEST
	tristate "KUnit tests for the handshake upcall mechanism" if !KUNIT_ALL_TESTS
	default KUNIT_ALL_TESTS
	depends on KUNIT
	help
	  This builds the KUnit tests for the handshake upcall mechanism.

	  KUnit tests run during boot and output the results to the debug
	  log in TAP format (https://testanything.org/). They are only useful
	  for kernel developers running the KUnit test harness and are not
	  meant for inclusion in a production build.

	  For more information on KUnit and unit tests in general, refer
	  to the KUnit documentation in Documentation/dev-tools/kunit/.

2005-04-17 06:20:36 +08:00
config INET
	bool "TCP/IP networking"
2020-06-14 00:50:22 +08:00
	help
2005-04-17 06:20:36 +08:00
	  These are the protocols used on the Internet and on most local
	  Ethernets. It is highly recommended to say Y here (this will enlarge
2008-02-12 16:35:16 +08:00
	  your kernel by about 400 KB), since some programs (e.g. the X window
2005-04-17 06:20:36 +08:00
	  system) use TCP/IP even if your machine is not connected to any
	  other computer. You will get the so-called loopback device which
	  allows you to ping yourself (great fun, that!).

	  For an excellent introduction to Linux networking, please read the
	  Linux Networking HOWTO, available from
	  <http://www.tldp.org/docs.html#howto>.

	  If you say Y here and also to "/proc file system support" and
	  "Sysctl support" below, you can change various aspects of the
	  behavior of the TCP/IP code by writing to the (virtual) files in
	  /proc/sys/net/ipv4/*; the options are explained in the file
2020-04-28 06:01:49 +08:00
	  <file:Documentation/networking/ip-sysctl.rst>.
2005-04-17 06:20:36 +08:00

	  Short answer: say Y.
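Reading one of the /proc/sys/net/ipv4/* settings from a program is an ordinary file read. read_sysctl_long() is a hypothetical helper. It assumes the option holds a single integer, as /proc/sys/net/ipv4/ip_forward does. Options holding several values or strings need different parsing.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Read one integer from a sysctl file; returns 0 or a negative errno value. */
int read_sysctl_long(const char *path, long *out)
{
	char buf[64];
	char *end;
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return -errno;
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -EIO;
	}
	fclose(f);

	errno = 0;
	*out = strtol(buf, &end, 10);
	if (errno != 0 || end == buf)
		return -EINVAL;
	return 0;
}
```

For example, read_sysctl_long("/proc/sys/net/ipv4/ip_forward", &v) stores 1 in v when IPv4 forwarding is enabled. Writing a value back takes root privileges.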

2005-07-12 12:13:56 +08:00
if INET
2005-04-17 06:20:36 +08:00
source "net/ipv4/Kconfig"
source "net/ipv6/Kconfig"
2006-11-06 08:44:06 +08:00
source "net/netlabel/Kconfig"
2020-01-22 08:56:15 +08:00
source "net/mptcp/Kconfig"
2005-04-17 06:20:36 +08:00

2005-07-12 12:13:56 +08:00
endif # if INET

2006-06-09 15:29:17 +08:00
config NETWORK_SECMARK
	bool "Security Marking"
	help
	  This enables security marking of network packets, similar
	  to nfmark, but designated for security purposes.
	  If you are unsure how to answer this question, answer N.

2014-04-01 22:20:23 +08:00
config NET_PTP_CLASSIFY
	def_bool n

2010-07-17 16:49:36 +08:00
config NETWORK_PHY_TIMESTAMPING
	bool "Timestamping in PHY devices"
2014-04-01 22:20:23 +08:00
	select NET_PTP_CLASSIFY
2010-07-17 16:49:36 +08:00
	help
2019-12-26 10:16:16 +08:00
	  This allows timestamping of network packets by PHYs (or
	  other MII bus snooping devices) with hardware timestamping
	  capabilities. This option adds some overhead in the transmit
	  and receive paths.
2010-07-17 16:49:36 +08:00

	  If you are unsure how to answer this question, answer N.

2005-04-17 06:20:36 +08:00
menuconfig NETFILTER
2006-11-29 09:35:43 +08:00
	bool "Network packet filtering framework (Netfilter)"
2020-06-14 00:50:22 +08:00
	help
2005-04-17 06:20:36 +08:00
	  Netfilter is a framework for filtering and mangling network packets
	  that pass through your Linux box.

	  The most common use of packet filtering is to run your Linux box as
	  a firewall protecting a local network from the Internet. The type of
	  firewall provided by this kernel support is called a "packet
	  filter", which means that it can reject individual network packets
	  based on type, source, destination etc. The other kind of firewall,
	  a "proxy-based" one, is more secure but more intrusive and more
	  bothersome to set up; it inspects the network traffic much more
	  closely, modifies it and has knowledge about the higher level
	  protocols, which a packet filter lacks. Moreover, proxy-based
	  firewalls often require changes to the programs running on the local
	  clients. Proxy-based firewalls don't need support by the kernel, but
	  they are often combined with a packet filter, which only works if
	  you say Y here.

	  You should also say Y here if you intend to use your Linux box as
	  the gateway to the Internet for a local network of machines without
	  globally valid IP addresses. This is called "masquerading": if one
	  of the computers on your local network wants to send something to
	  the outside, your box can "masquerade" as that computer, i.e. it
	  forwards the traffic to the intended outside destination, but
	  modifies the packets to make it look like they came from the
	  firewall box itself. It works both ways: if the outside host
	  replies, the Linux box will silently forward the traffic to the
	  correct local computer. This way, the computers on your local net
	  are completely invisible to the outside world, even though they can
	  reach the outside and can receive replies. It is even possible to
	  run globally visible servers from within a masqueraded local network
	  using a mechanism called portforwarding. Masquerading is also often
	  called NAT (Network Address Translation).
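The masquerading behaviour described above comes down to a translation table keyed by connection. The model below is deliberately simplified and hypothetical. It handles IPv4 address/port pairs only and uses a linear table, a naive port allocator, no timeouts and no protocol field. Real Netfilter NAT is built on connection tracking and is far more involved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NAT_MAX 64

struct endpoint { uint32_t addr; uint16_t port; };

struct nat_entry {
	struct endpoint inside; /* private host:port on the local network */
	uint16_t public_port;   /* port used on the gateway's public address */
};

struct nat_table {
	uint32_t public_addr;
	uint16_t next_port;     /* naive allocator for public ports */
	size_t count;
	struct nat_entry entries[NAT_MAX];
};

/* Outbound: make the packet look like it came from the gateway itself. */
bool nat_outbound(struct nat_table *t, struct endpoint *src)
{
	for (size_t i = 0; i < t->count; i++) {
		if (t->entries[i].inside.addr == src->addr &&
		    t->entries[i].inside.port == src->port) {
			src->addr = t->public_addr;
			src->port = t->entries[i].public_port;
			return true;
		}
	}
	if (t->count == NAT_MAX)
		return false;
	struct nat_entry *e = &t->entries[t->count++];
	e->inside = *src;
	e->public_port = t->next_port++;
	src->addr = t->public_addr;
	src->port = e->public_port;
	return true;
}

/* Inbound reply: forward to the correct local computer, or drop. */
bool nat_inbound(const struct nat_table *t, struct endpoint *dst)
{
	if (dst->addr != t->public_addr)
		return false;
	for (size_t i = 0; i < t->count; i++) {
		if (t->entries[i].public_port == dst->port) {
			*dst = t->entries[i].inside;
			return true;
		}
	}
	return false; /* unsolicited traffic finds no mapping */
}
```

An unsolicited inbound packet finds no mapping and is dropped. This is why the local hosts stay invisible from outside unless a port-forwarding rule adds an entry for them.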
|
|
|
|
|
|
|
|
Another use of Netfilter is in transparent proxying: if a machine on
the local network tries to connect to an outside host, your Linux
box can transparently forward the traffic to a local server,
typically a caching proxy server.

Yet another use of Netfilter is building a bridging firewall. Using
a bridge with network packet filtering enabled makes iptables "see"
the bridged traffic. For filtering on the lower network and Ethernet
protocols over the bridge, use ebtables (under bridge netfilter
configuration).

Various modules exist for netfilter which replace the previous
masquerading (ipmasqadm), packet filtering (ipchains), transparent
proxying, and port forwarding mechanisms. Please see
<file:Documentation/Changes> under "iptables" for the location of
these packages.

if NETFILTER

2007-12-18 14:47:05 +08:00
config NETFILTER_ADVANCED
bool "Advanced netfilter configuration"
depends on NETFILTER
default y
help
If you say Y here, you can select from all the netfilter modules.
2009-01-26 18:12:25 +08:00
If you say N, the more unusual ones will not be shown and the
2007-12-18 14:47:05 +08:00
basic ones needed by most people will default to 'M'.

If unsure, say Y.

2005-04-17 06:20:36 +08:00
config BRIDGE_NETFILTER
2014-09-18 17:29:03 +08:00
tristate "Bridged IP/ARP packets filtering"
2014-09-30 16:59:18 +08:00
depends on BRIDGE
2014-09-18 17:29:03 +08:00
depends on NETFILTER && INET
2007-12-18 14:47:05 +08:00
depends on NETFILTER_ADVANCED
2017-12-07 23:28:26 +08:00
select NETFILTER_FAMILY_BRIDGE
2018-12-19 00:15:17 +08:00
select SKB_EXTENSIONS
2020-06-14 00:50:22 +08:00
help
2005-04-17 06:20:36 +08:00
Enabling this option lets arptables and iptables see bridged
ARP and IP traffic, respectively. If you want a bridging firewall,
you probably want this option enabled.
Enabling or disabling this option doesn't enable or disable
ebtables.

If unsure, say N.

2005-09-17 15:41:21 +08:00
source "net/netfilter/Kconfig"
2005-04-17 06:20:36 +08:00
source "net/ipv4/netfilter/Kconfig"
source "net/ipv6/netfilter/Kconfig"
source "net/bridge/netfilter/Kconfig"

endif

2005-08-10 11:14:34 +08:00
source "net/dccp/Kconfig"
2005-04-17 06:20:36 +08:00
source "net/sctp/Kconfig"
2009-02-24 23:30:39 +08:00
source "net/rds/Kconfig"
2006-01-16 23:39:13 +08:00
source "net/tipc/Kconfig"
2005-07-12 12:13:56 +08:00
source "net/atm/Kconfig"
2010-04-02 14:18:33 +08:00
source "net/l2tp/Kconfig"
2008-07-06 12:25:39 +08:00
source "net/802/Kconfig"
2005-07-12 12:13:56 +08:00
source "net/bridge/Kconfig"
net: Distributed Switch Architecture protocol support
Distributed Switch Architecture is a protocol for managing hardware
switch chips. It consists of a set of MII management registers and
commands to configure the switch, and an ethernet header format to
signal which of the ports of the switch a packet was received from
or is intended to be sent to.
The switches that this driver supports are typically embedded in
access points and routers, and a typical setup with a DSA switch
looks something like this:
+-----------+ +-----------+
| | RGMII | |
| +-------+ +------ 1000baseT MDI ("WAN")
| | | 6-port +------ 1000baseT MDI ("LAN1")
| CPU | | ethernet +------ 1000baseT MDI ("LAN2")
| |MIImgmt| switch +------ 1000baseT MDI ("LAN3")
| +-------+ w/5 PHYs +------ 1000baseT MDI ("LAN4")
| | | |
+-----------+ +-----------+
The switch driver presents each port on the switch as a separate
network interface to Linux, polls the switch to maintain software
link state of those ports, forwards MII management interface
accesses to those network interfaces (e.g. as done by ethtool) to
the switch, and exposes the switch's hardware statistics counters
via the appropriate Linux kernel interfaces.
This initial patch supports the MII management interface register
layout of the Marvell 88E6123, 88E6161 and 88E6165 switch chips, and
supports the "Ethertype DSA" packet tagging format.
(There is no officially registered ethertype for the Ethertype DSA
packet format, so we just grab a random one. The ethertype to use
is programmed into the switch, and the switch driver uses the value
of ETH_P_EDSA for this, so this define can be changed at any time in
the future if the one we chose is allocated to another protocol or
if Ethertype DSA gets its own officially registered ethertype, and
everything will continue to work.)
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Tested-by: Nicolas Pitre <nico@marvell.com>
Tested-by: Byron Bradley <byron.bbradley@gmail.com>
Tested-by: Tim Ellis <tim.ellis@mac.com>
Tested-by: Peter van Valderen <linux@ddcrew.com>
Tested-by: Dirk Teurlings <dirk@upexia.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 21:44:02 +08:00
source "net/dsa/Kconfig"
2005-07-12 12:13:56 +08:00
source "net/8021q/Kconfig"
2005-04-17 06:20:36 +08:00
source "net/llc/Kconfig"
2023-10-09 22:10:28 +08:00
source "net/appletalk/Kconfig"
2005-07-12 12:13:56 +08:00
source "net/x25/Kconfig"
source "net/lapb/Kconfig"
2009-01-23 11:00:25 +08:00
source "net/phonet/Kconfig"
2014-07-11 16:24:18 +08:00
source "net/6lowpan/Kconfig"
2009-06-08 20:18:48 +08:00
source "net/ieee802154/Kconfig"
2012-05-16 04:50:20 +08:00
source "net/mac802154/Kconfig"
2005-04-17 06:20:36 +08:00
source "net/sched/Kconfig"
2008-11-21 12:52:10 +08:00
source "net/dcb/Kconfig"
2010-08-04 22:16:33 +08:00
source "net/dns_resolver/Kconfig"
2010-12-13 19:19:28 +08:00
source "net/batman-adv/Kconfig"
2011-10-26 10:26:31 +08:00
source "net/openvswitch/Kconfig"
VSOCK: Introduce VM Sockets
VM Sockets allows communication between virtual machines and the hypervisor.
User level applications both in a virtual machine and on the host can use the
VM Sockets API, which facilitates fast and efficient communication between
guest virtual machines and their host. A socket address family, designed to be
compatible with UDP and TCP at the interface level, is provided.
Today, VM Sockets is used by various VMware Tools components inside the guest
for zero-config, network-less access to VMware host services. In addition to
this, VMware's users are using VM Sockets for various applications, where
network access of the virtual machine is restricted or non-existent. Examples
of this are VMs communicating with device proxies for proprietary hardware
running as host applications and automated testing of applications running
within virtual machines.
The VMware VM Sockets are similar to other socket types, like Berkeley UNIX
socket interface. The VM Sockets module supports both connection-oriented
stream sockets like TCP, and connectionless datagram sockets like UDP. The VM
Sockets protocol family is defined as "AF_VSOCK" and the socket operations
split for SOCK_DGRAM and SOCK_STREAM.
For additional information about the use of VM Sockets, please refer to the
VM Sockets Programming Guide available at:
https://www.vmware.com/support/developer/vmci-sdk/
Signed-off-by: George Zhang <georgezhang@vmware.com>
Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Andy king <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-06 22:23:56 +08:00
source "net/vmw_vsock/Kconfig"
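The VM Sockets commit message above describes the AF_VSOCK family, which applications use like any other socket family. The sketch below is minimal and assumes a Linux system whose userspace headers provide `<linux/vm_sockets.h>`. The helper names `vsock_addr()` and `vsock_connect_stream()` are hypothetical, and any port number passed to them is arbitrary. An endpoint is addressed by a context ID (CID) and a port, not by an IP address and port.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

/* Build an AF_VSOCK address: a (context ID, port) pair, not IP/port. */
static struct sockaddr_vm vsock_addr(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm addr;

	memset(&addr, 0, sizeof(addr));   /* clear reserved and padding fields */
	addr.svm_family = AF_VSOCK;
	addr.svm_cid = cid;
	addr.svm_port = port;
	return addr;
}

/* Connect a stream (TCP-like) VM socket to cid:port.
 * Returns a connected fd, or -1 (e.g. no vsock transport available). */
static int vsock_connect_stream(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm addr;
	int fd;

	assert(cid != VMADDR_CID_ANY);    /* connecting needs a concrete peer */
	addr = vsock_addr(cid, port);
	fd = socket(AF_VSOCK, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Inside a guest, `vsock_connect_stream(VMADDR_CID_HOST, port)` would reach a service listening on the host, provided a vsock transport is loaded. Passing SOCK_DGRAM instead of SOCK_STREAM selects the connectionless variant where the transport supports it.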
2013-03-22 00:33:48 +08:00
source "net/netlink/Kconfig"
2013-05-24 05:02:52 +08:00
source "net/mpls/Kconfig"
2017-08-29 03:43:24 +08:00
source "net/nsh/Kconfig"
2013-10-31 04:10:47 +08:00
source "net/hsr/Kconfig"
2014-11-28 21:34:17 +08:00
source "net/switchdev/Kconfig"
2015-09-30 11:07:11 +08:00
source "net/l3mdev/Kconfig"
2016-05-06 22:09:08 +08:00
source "net/qrtr/Kconfig"
net/ncsi: Resource management
NCSI spec (DSP0222) defines several objects: package, channel, mode,
filter, version and statistics etc. This introduces the data structs
to represent those objects and implements functions to manage them.
Also, this introduces CONFIG_NET_NCSI for the newly implemented NCSI
stack.
* The user (e.g. netdev driver) dereferences the NCSI device by
"struct ncsi_dev", which is embedded in "struct ncsi_dev_priv".
The latter is used by the NCSI stack internally.
* Every NCSI device can have multiple packages simultaneously, up
to 8 packages. It's represented by "struct ncsi_package" and
identified by a 3-bit ID.
* Every NCSI package can have multiple channels, up to 32. It's
represented by "struct ncsi_channel" and identified by a 5-bit ID.
* Every NCSI channel has version, statistics, various modes and
filters. They are represented by "struct ncsi_channel_version",
"struct ncsi_channel_stats", "struct ncsi_channel_mode" and
"struct ncsi_channel_filter" respectively.
* Apart from AEN (Asynchronous Event Notification), the NCSI stack
works in terms of command and response. This introduces "struct
ncsi_req" to represent a complete NCSI transaction made of NCSI
request and response.
link: https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.1.0.pdf
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 09:54:16 +08:00
source "net/ncsi/Kconfig"
2005-04-17 06:20:36 +08:00

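The NCSI commit message above gives package and channel IDs as 3-bit and 5-bit quantities, so a package/channel pair fits in a single byte. The sketch below assumes the package ID occupies the upper three bits and the channel ID the lower five. That layout and the helper names (`ncsi_id_pack()`, `ncsi_id_pkg()`, `ncsi_id_chan()`) are assumptions for illustration, not the NCSI stack's own API.

```c
#include <assert.h>
#include <stdint.h>

/* Widths from the commit message; the packing order is an assumption. */
#define NCSI_PKG_BITS   3u
#define NCSI_CHAN_BITS  5u
#define NCSI_MAX_PKGS   (1u << NCSI_PKG_BITS)   /* 8 packages */
#define NCSI_MAX_CHANS  (1u << NCSI_CHAN_BITS)  /* 32 channels per package */

/* Combine a package ID and channel ID into one byte: ppp ccccc. */
static inline uint8_t ncsi_id_pack(unsigned int pkg, unsigned int chan)
{
	assert(pkg < NCSI_MAX_PKGS && chan < NCSI_MAX_CHANS);
	return (uint8_t)((pkg << NCSI_CHAN_BITS) | chan);
}

/* Recover the package ID from the upper three bits. */
static inline unsigned int ncsi_id_pkg(uint8_t id)
{
	return id >> NCSI_CHAN_BITS;
}

/* Recover the channel ID from the lower five bits. */
static inline unsigned int ncsi_id_chan(uint8_t id)
{
	return id & (NCSI_MAX_CHANS - 1u);
}
```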
2021-03-20 01:39:33 +08:00
config PCPU_DEV_REFCNT
bool "Use percpu variables to maintain network device refcount"
depends on SMP
default y
help
Network device reference counts use per-CPU variables if this option is set.
This can be forced to N to detect underflows (with a performance drop).

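The help text above says that disabling per-CPU refcounts makes underflows detectable, at a cost in performance. The simplified model below shows why. It is hypothetical and is not the kernel's netdev refcount implementation: `struct pcpu_ref`, `struct shared_ref` and `NR_CPUS_MODEL` are invented. With per-CPU slots, each CPU updates only its own counter. A reference taken on one CPU and dropped on another therefore leaves one slot legitimately negative, and only the slow sum over all slots is meaningful. A single shared counter is contended by every CPU, but it can reject the exact put() that would take it below zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; not the kernel's netdev refcount code. */
#define NR_CPUS_MODEL 4

/* Per-CPU style: each CPU only ever touches its own slot. */
struct pcpu_ref {
	long slot[NR_CPUS_MODEL];
};

static void pcpu_ref_get(struct pcpu_ref *r, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	r->slot[cpu]++;
}

static void pcpu_ref_put(struct pcpu_ref *r, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	r->slot[cpu]--;   /* may go negative: that alone is not an error */
}

/* Slow path: the real count only exists as the sum over all CPUs. */
static long pcpu_ref_sum(const struct pcpu_ref *r)
{
	long sum = 0;

	for (int i = 0; i < NR_CPUS_MODEL; i++)
		sum += r->slot[i];
	return sum;
}

/* Shared-counter style: every CPU contends on one value, but an
 * underflow is caught at the exact put() that causes it. */
struct shared_ref {
	long count;
};

static bool shared_ref_put(struct shared_ref *r)
{
	if (r->count <= 0)
		return false;
	r->count--;
	return true;
}
```

In the per-CPU model, an extra put() goes unnoticed until someone sums the slots. shared_ref_put() refuses it immediately.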
net: introduce a config option to tweak MAX_SKB_FRAGS
Currently, the MAX_SKB_FRAGS value is 17.
For standard tcp sendmsg() traffic, no big deal because tcp_sendmsg()
attempts order-3 allocations, stuffing 32768 bytes per frag.
But with zero copy, we use order-0 pages.
For BIG TCP to show its full potential, we add a config option
to be able to fit up to 45 segments per skb.
This is also needed for BIG TCP rx zerocopy, as zerocopy currently
does not support skbs with frag list.
We have used MAX_SKB_FRAGS=45 value for years at Google before
we deployed 4K MTU, with no adverse effect, other than
a recent issue in mlx4, fixed in commit 26782aad00cc
("net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS").
Back then, the goal was to be able to receive full size (64KB) GRO
packets without the frag_list overhead.
Note that /proc/sys/net/core/max_skb_frags can also be used to limit
the number of fragments TCP can use in tx packets.
By default we keep the old/legacy value of 17 until we get
more coverage for the updated values.
Sizes of struct skb_shared_info on 64bit arches
MAX_SKB_FRAGS | sizeof(struct skb_shared_info):
==============================================
   17           320
   21           320+64  = 384
   25           320+128 = 448
   29           320+192 = 512
   33           320+256 = 576
   37           320+320 = 640
   41           320+384 = 704
   45           320+448 = 768
This inflation might cause problems for drivers assuming they could pack
both the incoming packet (for MTU=1500) and skb_shared_info in half a page,
using build_skb().
v3: fix build error when CONFIG_NET=n
v2: fix two build errors assuming MAX_SKB_FRAGS was "unsigned long"
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20230323162842.1935061-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-24 00:28:42 +08:00
config MAX_SKB_FRAGS
int "Maximum number of fragments per skb_shared_info"
range 17 45
default 17
help
Having more fragments per skb_shared_info can help GRO efficiency.
This helps BIG TCP workloads, but might expose bugs in some
legacy drivers.
This also increases the memory overhead of small packets,
and of drivers using build_skb().
If unsure, say 17.

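The size table in the MAX_SKB_FRAGS commit message grows by 64 bytes for every 4 extra fragments, which is 16 bytes per fragment descriptor on 64-bit. The sketch below reproduces that arithmetic. It also estimates how much of a half-page build_skb() buffer would remain for packet data, assuming 4 KiB pages (so a 2048-byte half page) and ignoring alignment padding. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Values read off the table in the commit message above (64-bit). */
#define SHINFO_BASE_FRAGS 17    /* default MAX_SKB_FRAGS */
#define SHINFO_BASE_BYTES 320   /* sizeof(struct skb_shared_info) at 17 */
#define FRAG_DESC_BYTES   16    /* 64 bytes per 4 extra fragments */

/* sizeof(struct skb_shared_info) as a function of MAX_SKB_FRAGS. */
static size_t shinfo_size(unsigned int max_skb_frags)
{
	assert(max_skb_frags >= 17 && max_skb_frags <= 45);  /* Kconfig range */
	return SHINFO_BASE_BYTES +
	       (size_t)(max_skb_frags - SHINFO_BASE_FRAGS) * FRAG_DESC_BYTES;
}

/* Bytes left for headroom plus packet data when a buffer of buf_size bytes
 * must also hold skb_shared_info at its end, as with build_skb().
 * Alignment padding is ignored in this sketch. */
static size_t room_for_data(size_t buf_size, unsigned int max_skb_frags)
{
	size_t shinfo = shinfo_size(max_skb_frags);

	return buf_size > shinfo ? buf_size - shinfo : 0;
}
```

At the default of 17 fragments, 1728 bytes remain in a 2048-byte half page. That is enough for a 1500-byte MTU frame plus some headroom. At 45 fragments only 1280 bytes remain, which is exactly the kind of driver assumption the commit message warns about.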
2010-03-25 03:13:54 +08:00
config RPS
2024-06-05 10:29:32 +08:00
bool "Receive packet steering"
2013-11-22 06:32:01 +08:00
depends on SMP && SYSFS
2010-03-25 03:13:54 +08:00
default y
2024-06-05 10:29:32 +08:00
help
Software receive side packet steering (RPS) distributes the
load of received packet processing across multiple CPUs.
2010-03-25 03:13:54 +08:00

2011-01-19 19:03:53 +08:00
config RFS_ACCEL
2024-06-05 10:29:32 +08:00
bool "Hardware acceleration of RFS"
2013-08-30 15:39:53 +08:00
depends on RPS
2011-01-19 19:03:53 +08:00
select CPU_RMAP
default y
2024-06-05 10:29:32 +08:00
help
Allows drivers for multiqueue hardware with flow filter tables to
accelerate RFS.
2011-01-19 19:03:53 +08:00

2021-02-11 19:35:51 +08:00
config SOCK_RX_QUEUE_MAPPING
bool

2010-11-26 16:36:09 +08:00
config XPS
2014-12-21 04:41:11 +08:00
bool
2013-11-22 06:32:01 +08:00
depends on SMP
2021-02-11 19:35:51 +08:00
select SOCK_RX_QUEUE_MAPPING
2010-11-26 16:36:09 +08:00
default y

2016-03-14 16:39:04 +08:00
config HWBM
2019-11-21 21:28:35 +08:00
bool
2016-03-14 16:39:04 +08:00

2013-12-30 00:27:11 +08:00
config CGROUP_NET_PRIO
2014-02-08 23:36:58 +08:00
bool "Network priority cgroup"
2011-11-22 13:10:51 +08:00
depends on CGROUPS
2015-12-08 06:38:52 +08:00
select SOCK_CGROUP_DATA
2020-06-14 00:50:22 +08:00
help
2011-11-22 13:10:51 +08:00
Cgroup subsystem for use in assigning processes to network priorities on
2013-12-30 00:27:11 +08:00
a per-interface basis.
2011-11-22 13:10:51 +08:00

2013-12-30 01:27:10 +08:00
config CGROUP_NET_CLASSID
2014-12-21 04:41:11 +08:00
bool "Network classid cgroup"
2013-12-30 01:27:10 +08:00
depends on CGROUPS
2015-12-08 06:38:52 +08:00
select SOCK_CGROUP_DATA
2020-06-14 00:50:22 +08:00
help
2013-12-30 01:27:10 +08:00
Cgroup subsystem for use as a general-purpose socket classid marker that is
used by cls_cgroup and for netfilter matching.

2013-08-01 11:10:25 +08:00
config NET_RX_BUSY_POLL
2014-12-21 04:41:11 +08:00
bool
2023-05-23 19:15:18 +08:00
default y if !PREEMPT_RT || (PREEMPT_RT && !NETCONSOLE)
2013-06-10 16:39:50 +08:00

2011-11-29 00:33:09 +08:00
config BQL
2014-12-21 04:41:11 +08:00
bool
2024-02-16 01:05:07 +08:00
prompt "Enable Byte Queue Limits"
2011-11-29 00:33:09 +08:00
depends on SYSFS
select DQL
default y

2017-08-28 22:12:21 +08:00
config BPF_STREAM_PARSER
bool "enable BPF STREAM_PARSER"
bpf, sockmap: convert to generic sk_msg interface
Add a generic sk_msg layer, and convert current sockmap and later
kTLS over to make use of it. While sk_buff handles network packet
representation from netdevice up to socket, sk_msg handles data
representation from application to socket layer.
This means that sk_msg framework spans across ULP users in the
kernel, and enables features such as introspection or filtering
of data with the help of BPF programs that operate on this data
structure.
Latter becomes in particular useful for kTLS where data encryption
is deferred into the kernel, and as such enabling the kernel to
perform L7 introspection and policy based on BPF for TLS connections
where the record is being encrypted after BPF has run and came to
a verdict. In order to get there, first step is to transform open
coding of scatter-gather list handling into a common core framework
that subsystems can use.
The code itself has been split and refactored into three bigger
pieces: i) the generic sk_msg API which deals with managing the
scatter gather ring, providing helpers for walking and mangling,
transferring application data from user space into it, and preparing
it for BPF pre/post-processing, ii) the plain sock map itself
where sockets can be attached to or detached from; these bits
are independent of i) which can now be used also without sock
map, and iii) the integration with plain TCP as one protocol
to be used for processing L7 application data (later this could
e.g. also be extended to other protocols like UDP). The semantics
are the same with the old sock map code and therefore no change
of user facing behavior or APIs. While pursuing this work it
also helped finding a number of bugs in the old sockmap code
that we've fixed already in earlier commits. The test_sockmap
kselftest suite passes through fine as well.
Joint work with John.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-13 08:45:58 +08:00
depends on INET
2017-08-28 22:12:21 +08:00
depends on BPF_SYSCALL
2018-10-13 08:45:58 +08:00
depends on CGROUP_BPF
2017-08-28 22:12:21 +08:00
select STREAM_PARSER
2018-10-13 08:45:58 +08:00
select NET_SOCK_MSG
2020-06-14 00:50:22 +08:00
help
2021-02-24 02:49:26 +08:00
Enabling this allows a TCP stream parser to be used with
2019-11-21 21:28:35 +08:00
BPF_MAP_TYPE_SOCKMAP.
2017-08-28 22:12:21 +08:00

2013-05-20 12:02:32 +08:00
config NET_FLOW_LIMIT
2024-06-05 10:29:32 +08:00
bool "Net flow limit"
2013-05-20 12:02:32 +08:00
depends on RPS
default y
2020-06-14 00:50:22 +08:00
help
2013-05-20 12:02:32 +08:00
The network stack has to drop packets when a receive processing CPU's
backlog reaches netdev_max_backlog. If a few out of many active flows
generate the vast majority of the load, drop their traffic earlier to
maintain capacity for the other flows. This feature provides servers
with many clients some protection against DoS by a single (spoofed)
flow that greatly exceeds the average workload.

2005-04-17 06:20:36 +08:00
menu "Network testing"

config NET_PKTGEN
tristate "Packet Generator (USE WITH CAUTION)"
2013-07-29 19:44:15 +08:00
depends on INET && PROC_FS
2020-06-14 00:50:22 +08:00
help
2005-04-17 06:20:36 +08:00
This module will inject preconfigured packets, at a configurable
rate, out of a given interface. It is used for network interface
stress testing and performance analysis. If you don't understand
what was just said, you don't need it: say N.

Documentation on how to use the packet generator can be found
2020-05-01 00:04:13 +08:00
at <file:Documentation/networking/pktgen.rst>.
2005-04-17 06:20:36 +08:00

To compile this code as a module, choose M here: the
module will be called pktgen.

2009-03-11 17:53:16 +08:00
config NET_DROP_MONITOR
2012-05-17 18:04:00 +08:00
tristate "Network packet drop alerting service"
2012-10-03 02:19:40 +08:00
depends on INET && TRACEPOINTS
2020-06-14 00:50:22 +08:00
help
2019-11-21 21:28:35 +08:00
This feature provides an alerting service to userspace in the
event that packets are discarded in the network stack. Alerts
are broadcast via netlink socket to any listening user space
process. If you don't need network drop alerts, or if you are OK
with just checking the various proc files and other utilities for
drop statistics, say N here.
2009-03-11 17:53:16 +08:00

2005-04-17 06:20:36 +08:00
endmenu

endmenu

source "net/ax25/Kconfig"
2007-11-17 07:52:17 +08:00
source "net/can/Kconfig"
2005-04-17 06:20:36 +08:00
source "net/bluetooth/Kconfig"
2007-04-27 06:48:28 +08:00
source "net/rxrpc/Kconfig"
2016-03-08 06:11:06 +08:00
source "net/kcm/Kconfig"
strparser: Stream parser for messages
This patch introduces a utility for parsing application layer protocol
messages in a TCP stream. This is a generalization of the mechanism
implemented in the Kernel Connection Multiplexor.
The API includes a context structure, a set of callbacks, utility
functions, and a data ready function.
A stream parser instance is defined by a strparser structure that
is bound to a TCP socket. The function to initialize the structure
is:
int strp_init(struct strparser *strp, struct sock *csk,
struct strp_callbacks *cb);
csk is the TCP socket being bound to and cb are the parser callbacks.
The upper layer calls strp_tcp_data_ready when data is ready on the lower
socket for strparser to process. This should be called from a data_ready
callback that is set on the socket:
void strp_tcp_data_ready(struct strparser *strp);
A parser is bound to a TCP socket by setting the data_ready function to
strp_tcp_data_ready so that all receive indications on the socket
go through the parser. This assumes that sk_user_data is set to
the strparser structure.
There are four callbacks.
- parse_msg is called to parse the message (returns length or error).
- rcv_msg is called when a complete message has been received
- read_sock_done is called when data_ready function exits
- abort_parser is called to abort the parser
The input to parse_msg is an skbuff which contains the next message under
construction. The backend processing of parse_msg will parse the
application layer protocol headers to determine the length of
the message in the stream. The possible return values are:
>0 : indicates length of successfully parsed message
0 : indicates more data must be received to parse the message
-ESTRPIPE : current message should not be processed by the
kernel, return control of the socket to userspace which
can proceed to read the messages itself
other < 0 : Error in parsing, give control back to userspace
assuming that synchronization is lost and the stream
is unrecoverable (application expected to close TCP socket)
In the case of error return (< 0) strparser will stop the parser
and report an error to userspace. The application must deal
with the error. To handle the error the strparser is unbound
from the TCP socket. If the error indicates that the stream
TCP socket is at a recoverable point (ESTRPIPE) then the application
can read the TCP socket to process the stream. Once the application
has dealt with the exceptions in the stream, it may again bind the
socket to a strparser to continue data operations.
Note that ENODATA may be returned to the application. In this case
parse_msg returned -ESTRPIPE, however strparser was unable to maintain
synchronization of the stream (i.e. some of the message in question
was already read by the parser).
strp_pause and strp_unpause are used to provide flow control. For
instance, if rcv_msg is called but the upper layer can't immediately
consume the message it can hold the message and pause strparser.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-16 05:51:01 +08:00
source "net/strparser/Kconfig"
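The strparser commit message above defines a return convention for parse_msg. The userspace model below makes it concrete for a hypothetical protocol whose messages begin with a 4-byte big-endian length that includes the header. The real callback receives the parser and an skb. Here a plain byte buffer stands in for the skb, and `model_parse_msg()`, `HDR_LEN` and `MAX_MSG_LEN` are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define HDR_LEN     4u
#define MAX_MSG_LEN (1u << 20)   /* assumed sanity limit for this protocol */

/* Returns > 0: total message length (header included);
 *           0: not enough bytes yet to read the length;
 *         < 0: error, stream considered out of sync. */
static long model_parse_msg(const uint8_t *buf, size_t len)
{
	uint32_t msg_len;

	assert(buf != NULL || len == 0);
	if (len < HDR_LEN)
		return 0;

	msg_len = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
		  ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];

	if (msg_len < HDR_LEN || msg_len > MAX_MSG_LEN)
		return -EBADMSG;

	return (long)msg_len;
}
```

The length can be returned as soon as the header is visible, before the rest of the message has arrived. The parser then knows how much data to collect before handing a complete message to rcv_msg. A protocol that wanted userspace to handle certain messages itself could return -ESTRPIPE from a branch of its own.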
2021-07-29 10:20:39 +08:00
source "net/mctp/Kconfig"
2006-01-21 07:46:55 +08:00

2006-08-04 18:38:38 +08:00
config FIB_RULES
bool

2008-07-25 00:20:09 +08:00
menuconfig WIRELESS
bool "Wireless"
2007-05-10 21:46:01 +08:00
depends on !S390
2008-07-25 00:20:09 +08:00
default y

if WIRELESS
2007-04-24 03:19:12 +08:00

source "net/wireless/Kconfig"
2007-05-06 02:45:53 +08:00
source "net/mac80211/Kconfig"
2007-04-24 03:19:12 +08:00

2008-07-25 00:20:09 +08:00
endif # WIRELESS
2007-04-24 03:19:12 +08:00

2007-05-07 15:34:20 +08:00
source "net/rfkill/Kconfig"
2007-07-11 06:57:28 +08:00
source "net/9p/Kconfig"
2010-03-30 21:56:28 +08:00
source "net/caif/Kconfig"
2010-04-07 06:14:15 +08:00
source "net/ceph/Kconfig"
2011-07-02 06:31:33 +08:00
source "net/nfc/Kconfig"
net: Introduce psample, a new genetlink channel for packet sampling
Add a general way for kernel modules to sample packets, without being tied
to any specific subsystem. This netlink channel can be used by tc,
iptables, etc. and allows standardizing packet sampling in the kernel.
For every sampled packet, the psample module adds the following metadata
fields:
PSAMPLE_ATTR_IIFINDEX - the packet's input ifindex, if applicable
PSAMPLE_ATTR_OIFINDEX - the packet's output ifindex, if applicable
PSAMPLE_ATTR_ORIGSIZE - the packet's original size, in case it has been
truncated during sampling
PSAMPLE_ATTR_SAMPLE_GROUP - the packet's sample group, which is set by the
user who initiated the sampling. This field allows the user to
differentiate between several samplers working simultaneously and
filter packets relevant to them
PSAMPLE_ATTR_GROUP_SEQ - sequence counter of last sent packet. The
sequence is kept for each group
PSAMPLE_ATTR_SAMPLE_RATE - the sampling rate used for sampling the packets
PSAMPLE_ATTR_DATA - the actual packet bits
The sampled packets are sent to the PSAMPLE_NL_MCGRP_SAMPLE multicast
group. In addition, add the GET_GROUPS netlink command which allows the
user to see the current sample groups, their refcount and sequence number.
This command currently supports only netlink dump mode.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-23 18:07:08 +08:00
source "net/psample/Kconfig"
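The psample metadata listed above separates what was sampled from how it was sampled. The original packet size survives truncation, and the sequence counter is kept per group so a listener can detect gaps. The code below is a hypothetical model of that bookkeeping. The struct and function names are invented. Deriving a 1-in-N decision from a caller-supplied random value is an assumption about how a sampler might use the sampling rate, not psample's own code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of per-sample metadata; field names are invented and
 * only loosely mirror the PSAMPLE_ATTR_* attributes listed above. */
struct sample_group {
	uint32_t group_num;  /* chosen by whoever configured the sampler */
	uint32_t seq;        /* next sequence number for this group */
};

struct sample_meta {
	uint32_t group_num;  /* ~ PSAMPLE_ATTR_SAMPLE_GROUP */
	uint32_t seq;        /* ~ PSAMPLE_ATTR_GROUP_SEQ */
	uint32_t rate;       /* ~ PSAMPLE_ATTR_SAMPLE_RATE (1 in rate) */
	uint32_t orig_size;  /* ~ PSAMPLE_ATTR_ORIGSIZE */
	uint32_t data_len;   /* bytes copied into the PSAMPLE_ATTR_DATA payload */
};

/* Sample roughly 1 out of `rate` packets, given a uniformly random
 * 32-bit value `rnd` supplied by the caller. */
static bool should_sample(uint32_t rnd, uint32_t rate)
{
	return rate != 0 && rnd % rate == 0;
}

/* Copy at most `trunc` bytes of the packet into `out` and describe it. */
static struct sample_meta sample_packet(struct sample_group *g, uint32_t rate,
					const uint8_t *pkt, uint32_t pkt_len,
					uint8_t *out, uint32_t trunc)
{
	struct sample_meta m;

	assert(g != NULL && out != NULL);
	m.group_num = g->group_num;
	m.seq = g->seq++;            /* per-group counter lets listeners spot gaps */
	m.rate = rate;
	m.orig_size = pkt_len;       /* survives truncation */
	m.data_len = pkt_len < trunc ? pkt_len : trunc;
	memcpy(out, pkt, m.data_len);
	return m;
}
```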
2017-02-01 21:30:02 +08:00
source "net/ife/Kconfig"
2010-03-30 21:56:28 +08:00

2015-07-21 16:43:46 +08:00
config LWTUNNEL
bool "Network light weight tunnels"
2020-06-14 00:50:22 +08:00
help
2015-07-21 16:43:46 +08:00
This feature provides an infrastructure to support light weight
tunnels like MPLS. There is no netdevice associated with a light
weight tunnel endpoint. Tunnel encapsulation parameters are stored
with light weight tunnel state associated with fib routes.
2007-05-07 15:34:20 +08:00

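According to the LWTUNNEL help text above, encapsulation parameters live with the route rather than in a tunnel netdevice. The hypothetical sketch below illustrates that idea with an MPLS label stack as the encapsulation. The route carries an optional pointer to tunnel state. Building the label stack entries follows the MPLS label stack entry layout: a 20-bit label, a 3-bit traffic class, a bottom-of-stack bit and an 8-bit TTL. The struct and function names are invented, and this is not the kernel's lwtunnel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LABELS 4

/* Hypothetical lightweight tunnel state: an MPLS label stack to push. */
struct lwt_state {
	unsigned int num_labels;
	uint32_t labels[MAX_LABELS];   /* 20-bit labels, outermost first */
};

/* A route entry; no tunnel netdevice is involved, only optional state. */
struct route {
	uint32_t prefix, mask;         /* IPv4 destination prefix */
	int oif;                       /* output interface index */
	const struct lwt_state *encap; /* NULL: plain route, no encapsulation */
};

/* Build the 32-bit label stack entries for a packet using this route:
 * label (20 bits) | traffic class (3, left as 0) | bottom of stack (1) | TTL (8).
 * Returns the number of entries written to `out`. */
static unsigned int lwt_build_mpls(const struct route *rt, uint8_t ttl,
				   uint32_t *out)
{
	const struct lwt_state *st = rt->encap;

	if (st == NULL)
		return 0;
	assert(st->num_labels <= MAX_LABELS);
	for (unsigned int i = 0; i < st->num_labels; i++) {
		uint32_t bos = (i + 1 == st->num_labels) ? 1u : 0u;

		out[i] = ((st->labels[i] & 0xFFFFFu) << 12) | (bos << 8) | ttl;
	}
	return st->num_labels;
}
```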
2016-12-01 00:10:10 +08:00
config LWTUNNEL_BPF
bool "Execute BPF program as route nexthop action"
2019-02-16 01:51:35 +08:00
depends on LWTUNNEL && INET
2016-12-01 00:10:10 +08:00
default y if LWTUNNEL=y
2020-06-14 00:50:22 +08:00
help
2016-12-01 00:10:10 +08:00
Allows running BPF programs as a nexthop action following a route
lookup for incoming and outgoing packets.

2016-02-12 22:43:53 +08:00
config DST_CACHE
2016-03-22 06:37:22 +08:00
bool
2016-02-12 22:43:53 +08:00
default n

2017-02-08 07:37:15 +08:00
config GRO_CELLS
bool
default n

2018-04-30 15:16:12 +08:00
config SOCK_VALIDATE_XMIT
bool
2024-05-03 21:13:42 +08:00

config NET_IEEE8021Q_HELPERS
bool
2018-04-30 15:16:12 +08:00

2021-04-19 21:01:03 +08:00
config NET_SELFTESTS
def_tristate PHYLIB
2021-04-28 21:09:46 +08:00
depends on PHYLIB && INET
2021-04-19 21:01:03 +08:00

2018-10-13 08:45:58 +08:00
config NET_SOCK_MSG
bool
default n
help
NET_SOCK_MSG provides a framework for plain sockets (e.g. TCP) or
ULPs (upper layer protocols, e.g. TLS) to process L7 application data
with the help of BPF programs.
2016-02-27 00:32:23 +08:00
config NET_DEVLINK
2019-03-24 18:14:38 +08:00
bool
default n
2016-02-27 00:32:23 +08:00
page_pool: refurbish version of page_pool code
Need a fast page recycle mechanism for ndo_xdp_xmit API for returning
pages on DMA-TX completion time, which have good cross CPU
performance, given DMA-TX completion time can happen on a remote CPU.
Refurbish my page_pool code, that was presented[1] at MM-summit 2016.
Adapted page_pool code to not depend the page allocator and
integration into struct page. The DMA mapping feature is kept,
even-though it will not be activated/used in this patchset.
[1] http://people.netfilter.org/hawk/presentations/MM-summit2016/generic_page_pool_mm_summit2016.pdf
V2: Adjustments requested by Tariq
- Changed page_pool_create return codes, don't return NULL, only
ERR_PTR, as this simplifies err handling in drivers.
V4: many small improvements and cleanups
- Add DOC comment section, that can be used by kernel-doc
- Improve fallback mode, to work better with refcnt based recycling
e.g. remove a WARN as pointed out by Tariq
e.g. quicker fallback if ptr_ring is empty.
V5: Fixed SPDX license as pointed out by Alexei
V6: Adjustments requested by Eric Dumazet
- Adjust ____cacheline_aligned_in_smp usage/placement
- Move rcu_head in struct page_pool
- Free pages quicker on destroy, minimize resources delayed an RCU period
- Remove code for forward/backward compat ABI interface
V8: Issues found by kbuild test robot
- Address sparse should be static warnings
- Only compile+link when a driver use/select page_pool,
mlx5 selects CONFIG_PAGE_POOL, although its first used in two patches
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-17 22:46:17 +08:00
config PAGE_POOL
2019-11-21 21:28:35 +08:00
bool
2022-03-02 15:55:47 +08:00
config PAGE_POOL_STATS
default n
bool "Page pool stats"
depends on PAGE_POOL
help
Enable page pool statistics to track page allocation and recycling
in page pools. This option incurs additional CPU cost in allocation
and recycle paths and additional memory cost to store the statistics.
These statistics are only available if this option is enabled and if
the driver using the page pool supports exporting this data.

If unsure, say N.
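#
# Example (a sketch; EXAMPLE_ETH is a hypothetical symbol, not upstream):
# a driver selects PAGE_POOL, which makes the PAGE_POOL_STATS prompt
# visible because that option "depends on PAGE_POOL". The statistics
# themselves stay opt-in for the user:
#
#   config EXAMPLE_ETH
#       tristate "Example Ethernet driver (hypothetical)"
#       depends on NET
#       select PAGE_POOL
#
# and then, in .config:
#
#   CONFIG_PAGE_POOL=y
#   CONFIG_PAGE_POOL_STATS=y
#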
2018-05-25 00:55:13 +08:00
config FAILOVER
tristate "Generic failover module"
help
The failover module provides a generic interface for paravirtual
drivers to register a netdev and a set of ops with a failover
instance. The ops are used as event handlers that get called to
handle netdev register/unregister/link change/name change events
on slave PCI Ethernet devices with the same MAC address as the
failover netdev. This enables paravirtual drivers to use a
VF as an accelerated, low-latency datapath. It also allows live
migration of VMs with directly attached VFs by failing over to the
paravirtual datapath when the VF is unplugged.
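#
# Example (a sketch; EXAMPLE_PV_NET is a hypothetical symbol, not upstream):
# because FAILOVER is a tristate, a paravirtual driver that selects it
# raises it to at least the driver's own value, so EXAMPLE_PV_NET=m gives
# at least FAILOVER=m, and EXAMPLE_PV_NET=y forces FAILOVER=y:
#
#   config EXAMPLE_PV_NET
#       tristate "Example paravirtual NIC driver (hypothetical)"
#       select FAILOVER
#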
2019-12-27 22:55:18 +08:00
config ETHTOOL_NETLINK
bool "Netlink interface for ethtool"
ethtool: provide customized dim profile management
The NetDIM library, currently leveraged by an array of NICs, delivers
excellent acceleration benefits. Nevertheless, NICs vary significantly
in their dim profile list prerequisites.
Specifically, virtio-net backends may present diverse sw or hw device
implementation, making a one-size-fits-all parameter list impractical.
On Alibaba Cloud, the virtio DPU's performance under the default DIM
profile falls short of expectations, partly due to a mismatch in
parameter configuration.
I also noticed that ice/idpf/ena and other NICs have customized
profilelist or placed some restrictions on dim capabilities.
Motivated by this, I tried adding new params for "ethtool -C" that provides
a per-device control to modify and access a device's interrupt parameters.
Usage
========
The target NIC is named ethx.
Assume that ethx only declares support for rx profile setting
(with DIM_PROFILE_RX flag set in profile_flags) and supports modification
of usec and pkt fields.
1. Query the currently customized list of the device
$ ethtool -c ethx
...
rx-profile:
{.usec = 1, .pkts = 256, .comps = n/a,},
{.usec = 8, .pkts = 256, .comps = n/a,},
{.usec = 64, .pkts = 256, .comps = n/a,},
{.usec = 128, .pkts = 256, .comps = n/a,},
{.usec = 256, .pkts = 256, .comps = n/a,}
tx-profile: n/a
2. Tune
$ ethtool -C ethx rx-profile 1,1,n_2,n,n_3,3,n_4,4,n_n,5,n
"n" means do not modify this field.
$ ethtool -c ethx
...
rx-profile:
{.usec = 1, .pkts = 1, .comps = n/a,},
{.usec = 2, .pkts = 256, .comps = n/a,},
{.usec = 3, .pkts = 3, .comps = n/a,},
{.usec = 4, .pkts = 4, .comps = n/a,},
{.usec = 256, .pkts = 5, .comps = n/a,}
tx-profile: n/a
3. Hint
If the device does not support some type of customized dim profiles,
the corresponding "n/a" will display.
If the "n/a" field is being modified, -EOPNOTSUPP will be reported.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240621101353.107425-4-hengqi@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-21 18:13:51 +08:00
select DIMLIB
2019-12-27 22:55:18 +08:00
default y
help
An alternative userspace interface for ethtool based on generic
netlink. It provides better extensibility and some new features,
e.g. notification messages.
2021-11-19 22:21:55 +08:00
config NETDEV_ADDR_LIST_TEST
tristate "Unit tests for device address list"
default KUNIT_ALL_TESTS
depends on KUNIT
2023-10-09 22:41:51 +08:00
config NET_TEST
tristate "KUnit tests for networking" if !KUNIT_ALL_TESTS
depends on KUNIT
default KUNIT_ALL_TESTS
help
KUnit tests covering core networking infrastructure, such as sk_buff.

If unsure, say N.
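#
# Example (a sketch, not upstream text): with KUNIT_ALL_TESTS unset, both
# test options above can be enabled from a .kunitconfig fragment such as
#
#   CONFIG_KUNIT=y
#   CONFIG_NET=y
#   CONFIG_NETDEV_ADDR_LIST_TEST=y
#   CONFIG_NET_TEST=y
#
# and run with ./tools/testing/kunit/kunit.py run --kunitconfig=<path>,
# where <path> is wherever that fragment was saved (an assumption here).
#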
2005-07-12 12:13:56 +08:00
endif # if NET