Lawrence Brakmo says:
====================
bpf: Add support for sock_ops
Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding
struct that allows BPF programs of this type to access some of the
socket's fields (such as IP addresses, ports, etc.) and setting
connection parameters such as buffer sizes, initial window, SYN/SYN-ACK
RTOs, etc.
Unlike current BPF program types that expect to be called at a particular
place in the network stack code, SOCK_OPS program can be called at
different places and use an "op" field to indicate the context. There
are currently two types of operations, those whose effect is through
their return value and those whose effect is through the new
bpf_setsocketop BPF helper function.
Example operands of the first type are:
BPF_SOCK_OPS_TIMEOUT_INIT
BPF_SOCK_OPS_RWND_INIT
BPF_SOCK_OPS_NEEDS_ECN
Example operands of the secont type are:
BPF_SOCK_OPS_TCP_CONNECT_CB
BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB
BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
Current operands are only called during connection establishment so
there should not be any BPF overheads after connection establishment. The
main idea is to use connection information form both hosts, such as IP
addresses and ports to allow setting of per connection parameters to
optimize the connection's peformance.
Alghough there are already 3 mechanisms to set parameters (sysctls,
route metrics and setsockopts), this new mechanism provides some
disticnt advantages. Unlike sysctls, it can set parameters per
connection. In contrast to route metrics, it can also use port numbers
and information provided by a user level program. In addition, it could
set parameters probabilistically for evaluation purposes (i.e. do
something different on 10% of the flows and compare results with the
other 90% of the flows). Also, in cases where IPv6 addresses contain
geographic information, the rules to make changes based on the distance
(or RTT) between the hosts are much easier than route metric rules and
can be global. Finally, unlike setsockopt, it does not require
application changes and it can be updated easily at any time.
It uses the existing bpf cgroups infrastructure so the programs can be
attached per cgroup with full inheritance support. Although the bpf cgroup
framework already contains a sock related program type (BPF_PROG_TYPE_CGROUP_SOCK),
I created the new type (BPF_PROG_TYPE_SOCK_OPS) beccause the existing type
expects to be called only once during the connections's lifetime. In contrast,
the new program type will be called multiple times from different places in the
network stack code. For example, before sending SYN and SYN-ACKs to set
an appropriate timeout, when the connection is established to set congestion
control, etc. As a result it has "op" field to specify the type of operation
requested.
This patch set also includes sample BPF programs to demostrate the differnet
features.
v2: Formatting changes, rebased to latest net-next
v3: Fixed build issues, changed socket_ops to sock_ops throught,
fixed formatting issues, removed the syscall to load sock_ops
program and added functionality to use existing bpf attach and
bpf detach system calls, removed reader/writer locks in
sock_bpfops.c (used when saving sock_ops global program)
and fixed missing module refcount increment.
v4: Removed global sock_ops program and instead used existing cgroup bpf
infrastructure to support a new BPF_CGROUP_ATTCH type.
v5: fixed kbuild warning happening in bpf-cgroup.h
removed automatic converstion to host byte order from some sock_ops
fields (ipv4 and ipv6 addresses, remote port)
Added conversion to host byte order in some of the sample programs
Added to sample BPF program comments about using load_sock_ops to load
Removed is_req_sock field from bpf_sock_ops_kern and related places,
using sk_fullsock() instead.
v6: fixes to BPF helper function setsockopt (possible NULL deferencing, etc.)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Update tools/include/uapi/linux/bpf.h to include changes related to new
bpf sock_ops program type.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sample BPF program, tcp_clamp_kern.c, to demostrate the use
of setting the sndcwnd clamp. This program assumes that if the
first 5.5 bytes of the host's IPv6 addresses are the same, then
the hosts are in the same datacenter and sets sndcwnd clamp to
100 packets, SYN and SYN-ACK RTOs to 10ms and send/receive buffer
sizes to 150KB.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_SNDCWND_CLAMP, which
sets the initial congestion window. It is useful to limit the sndcwnd
when the host are close to each other (small RTT).
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sample BPF program that assumes hosts are far away (i.e. large RTTs)
and sets initial cwnd and initial receive window to 40 packets,
send and receive buffers to 1.5MB.
In practice there would be a test to insure the hosts are actually
far enough away.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_IW, which sets the
initial congestion window. This can be used when the hosts are far
apart (large RTTs) and it is safe to start with a large inital cwnd.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sample BPF program that sets congestion control to dctcp when both hosts
are within the same datacenter. In this example that is assumed to be
when they have the first 5.5 bytes of their IPv6 address are the same.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added support for changing congestion control for SOCK_OPS bpf
programs through the setsockopt bpf helper function. It also adds
a new SOCK_OPS op, BPF_SOCK_OPS_NEEDS_ECN, that is needed for
congestion controls, like dctcp, that need to enable ECN in the
SYN packets.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch contains a BPF program to set initial receive window to
40 packets and send and receive buffers to 1.5MB. This would usually
be done after doing appropriate checks that indicate the hosts are
far enough away (i.e. large RTT).
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added callbacks to BPF SOCK_OPS type program before an active
connection is intialized and after a passive or active connection is
established.
The following patch demostrates how they can be used to set send and
receive buffer sizes.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added support for calling a subset of socket setsockopts from
BPF_PROG_TYPE_SOCK_OPS programs. The code was duplicated rather
than making the changes to call the socket setsockopt function because
the changes required would have been larger.
The ops supported are:
SO_RCVBUF
SO_SNDBUF
SO_MAX_PACING_RATE
SO_PRIORITY
SO_RCVLOWAT
SO_MARK
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sample bpf program, tcp_rwnd_kern.c, sets the initial
advertized window to 40 packets in an environment where
distinct IPv6 prefixes indicate that both hosts are not
in the same data center.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds suppport for setting the initial advertized window from
within a BPF_SOCK_OPS program. This can be used to support larger
initial cwnd values in environments where it is known to be safe.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sample BPF program, tcp_synrto_kern.c, sets the SYN and SYN-ACK
RTOs to 10ms when both hosts are within the same datacenter (i.e.
small RTTs) in an environment where common IPv6 prefixes indicate
both hosts are in the same data center.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for setting a per connection SYN and
SYN_ACK RTOs from within a BPF_SOCK_OPS program. For example,
to set small RTOs when it is known both hosts are within a
datacenter.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The program load_sock_ops can be used to load sock_ops bpf programs and
to attach it to an existing (v2) cgroup. It can also be used to detach
sock_ops programs.
Examples:
load_sock_ops [-l] <cg-path> <prog filename>
Load and attaches a sock_ops program at the specified cgroup.
If "-l" is used, the program will continue to run to output the
BPF log buffer.
If the specified filename does not end in ".o", it appends
"_kern.o" to the name.
load_sock_ops -r <cg-path>
Detaches the currently attached sock_ops program from the
specified cgroup.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding
struct that allows BPF programs of this type to access some of the
socket's fields (such as IP addresses, ports, etc.). It uses the
existing bpf cgroups infrastructure so the programs can be attached per
cgroup with full inheritance support. The program will be called at
appropriate times to set relevant connections parameters such as buffer
sizes, SYN and SYN-ACK RTOs, etc., based on connection information such
as IP addresses, port numbers, etc.
Alghough there are already 3 mechanisms to set parameters (sysctls,
route metrics and setsockopts), this new mechanism provides some
distinct advantages. Unlike sysctls, it can set parameters per
connection. In contrast to route metrics, it can also use port numbers
and information provided by a user level program. In addition, it could
set parameters probabilistically for evaluation purposes (i.e. do
something different on 10% of the flows and compare results with the
other 90% of the flows). Also, in cases where IPv6 addresses contain
geographic information, the rules to make changes based on the distance
(or RTT) between the hosts are much easier than route metric rules and
can be global. Finally, unlike setsockopt, it oes not require
application changes and it can be updated easily at any time.
Although the bpf cgroup framework already contains a sock related
program type (BPF_PROG_TYPE_CGROUP_SOCK), I created the new type
(BPF_PROG_TYPE_SOCK_OPS) beccause the existing type expects to be called
only once during the connections's lifetime. In contrast, the new
program type will be called multiple times from different places in the
network stack code. For example, before sending SYN and SYN-ACKs to set
an appropriate timeout, when the connection is established to set
congestion control, etc. As a result it has "op" field to specify the
type of operation requested.
The purpose of this new program type is to simplify setting connection
parameters, such as buffer sizes, TCP's SYN RTO, etc. For example, it is
easy to use facebook's internal IPv6 addresses to determine if both hosts
of a connection are in the same datacenter. Therefore, it is easy to
write a BPF program to choose a small SYN RTO value when both hosts are
in the same datacenter.
This patch only contains the framework to support the new BPF program
type, following patches add the functionality to set various connection
parameters.
This patch defines a new BPF program type: BPF_PROG_TYPE_SOCKET_OPS
and a new bpf syscall command to load a new program of this type:
BPF_PROG_LOAD_SOCKET_OPS.
Two new corresponding structs (one for the kernel one for the user/BPF
program):
/* kernel version */
struct bpf_sock_ops_kern {
struct sock *sk;
__u32 op;
union {
__u32 reply;
__u32 replylong[4];
};
};
/* user version
* Some fields are in network byte order reflecting the sock struct
* Use the bpf_ntohl helper macro in samples/bpf/bpf_endian.h to
* convert them to host byte order.
*/
struct bpf_sock_ops {
__u32 op;
union {
__u32 reply;
__u32 replylong[4];
};
__u32 family;
__u32 remote_ip4; /* In network byte order */
__u32 local_ip4; /* In network byte order */
__u32 remote_ip6[4]; /* In network byte order */
__u32 local_ip6[4]; /* In network byte order */
__u32 remote_port; /* In network byte order */
__u32 local_port; /* In host byte horder */
};
Currently there are two types of ops. The first type expects the BPF
program to return a value which is then used by the caller (or a
negative value to indicate the operation is not supported). The second
type expects state changes to be done by the BPF program, for example
through a setsockopt BPF helper function, and they ignore the return
value.
The reply fields of the bpf_sockt_ops struct are there in case a bpf
program needs to return a value larger than an integer.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth-next 2017-07-01
Here are some more Bluetooth patches for the 4.13 kernel:
- Added support for Broadcom BCM43430 controllers
- Added sockaddr length checks before accessing sa_family
- Fixed possible "might sleep" errors in bnep, cmtp and hidp modules
- A few other minor fixes
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on a request raised on the sctp devel list, there is a need to
augment the sctp_peeloff operation while specifying the O_CLOEXEC and
O_NONBLOCK flags (simmilar to the socket syscall). Since modifying the
SCTP_SOCKOPT_PEELOFF socket option would break user space ABI for existing
programs, this patch creates a new socket option
SCTP_SOCKOPT_PEELOFF_FLAGS, which accepts a third flags parameter to
allow atomic assignment of the socket descriptor flags.
Tested successfully by myself and the requestor
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Andreas Steinmetz <ast@domdv.de>
CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree says:
====================
sfc: small MCDI cleanups
Giving the full MCDI event rather than just the code can aid in
debugging. While fixing this I noticed an outdated comment.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial fix to spelling mistake in mlx5_core_dbg debug message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the NFC pull requesy for 4.13. We have:
- A conversion to unified device and GPIO APIs for the
fdp, pn544, and st{21,-nci} drivers.
- A fix for NFC device IDs allocation.
- A fix for the nfcmrvl driver firmware download mechanism.
- A trf7970a DT and GPIO cleanup and clock setting fix.
- A few fixes for potential overflows in the digital and LLCP code.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZVYXGAAoJEIqAPN1PVmxKuA0P/RhaNOWIFLSpVSPOO2/bhLj0
oxlTW9y+Asy9MPfu5t4vllM/GqtpCYYBpfl35SDGViHlfSIPgrSDqQA2fNRT+hk1
AwQR8HcCcptfzJ5vkXhJl/9XOUmAalHk1e7KqvNioXP1dAZbCw4/2YK8skzBnoeR
sRYYSmsJNF1iZR6FSbtlPHWTZqD/SLRSvebDzWiYnltXMqBZeiwXmuEfRPrL/uxO
C2iAG9Ol9bkxpxpm4jgVTFrW6equymYX0iW4rYKmmd4Ej+29iS8NWAsgK/hCUtNO
jGrc9+1O/7MM2PoxGt70vojYMQWZpUmPWF2dvRx32+XWTcj96Nk3qOHEM62cQHMs
OhvgUh9BurXzjtlp+ZipwJhpylczxWZ3TEFxyDTV91cuk+hdQ8xtOWseGgx7XKBy
3kaYwlYKtiwACYBZ3b/4nzr+vNQ3GzL8VZoeGWzEyimofeWtLUWGuxtxFL4OaSeG
X3NOZM+g4ILLKH9IKG4DkqWjAn5RD70Jf01z6GfHsfjyz5Awax9Hpp+B8dih/pmb
4C4g5hX4X7L840Zc+l4WXMRLpH6AgL64orJmLBDxGeYAiPRlWSBHcMTrWSJuN/I4
pJ++rMpbL2dMXaX8LBgEchtPmzQzVCsKxTXB29NyIbdZE6ebNAnZ9Gur/PwiBQqD
wnb8HwFOyXgjKBFiA57P
=Ve8v
-----END PGP SIGNATURE-----
Merge tag 'nfc-next-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next
Samuel Ortiz says:
====================
NFC 4.13 pull request
This is the NFC pull requesy for 4.13. We have:
- A conversion to unified device and GPIO APIs for the
fdp, pn544, and st{21,-nci} drivers.
- A fix for NFC device IDs allocation.
- A fix for the nfcmrvl driver firmware download mechanism.
- A trf7970a DT and GPIO cleanup and clock setting fix.
- A few fixes for potential overflows in the digital and LLCP code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commits 2c0cba482e ("arm: sun8i: sunxi-h3-h5: Add dt node
for the syscon control module") to 2428fd0fe5 ("arm64: defconfig: Enable
dwmac-sun8i driver on defconfig") and 3432a86e64 ("arm: sun8i:
orangepipc: use internal phy-mode") to 5a79b4f2a5 ("arm: sun8i:
orangepi-2: use internal phy-mode") that should be merged
through the arm-soc tree, and end up in merge conflicts and build failures.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mostly fixes and cleanups, but iwlwifi and rtlwifi had also some new
features.
Major changes:
iwlwifi
* some changes in suspend/resume handling to support new FWs
* Continued work towards the A000 family
* support for a new version of the TX flush FW API
* remove some noise from the kernel logs
rtlwifi
* more bluetooth coexistance improvements
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJZVenmAAoJEG4XJFUm622bA5IH+wVYmo7BOWXIihYbSnqMcdcq
oRxBygKjti2OUUSwpFfgLN2mR0Drs2tAsziyVtib6dSlR/zPfoyANcoAEEdU/cp3
sUpavj1615+2Un07w8Tj7WSFY1Oqr1HGzzvbAJNdj1MOmQWOcUxNWQaDo2u7uepo
FnJazQ81A/MbyFz9LqWjtpcW4C2duuIQ+c40LepCMN1mjVJvwFDL8gBgK14ZVmn8
/yrbpa+ytJk7Likvn9kk5ubeK0mJvoSvkmXj5AwQIMR0xa6PSjg+nj1hbA1RGcmQ
5D4kGvfrKTTKg9nW+QzFY9L7Fz6DxoPaSVdfJrMlzWNH5SjZvagZZLkx6PLWc98=
=qryd
-----END PGP SIGNATURE-----
Merge tag 'wireless-drivers-next-for-davem-2017-06-30' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for 4.13
Mostly fixes and cleanups, but iwlwifi and rtlwifi had also some new
features.
Major changes:
iwlwifi
* some changes in suspend/resume handling to support new FWs
* Continued work towards the A000 family
* support for a new version of the TX flush FW API
* remove some noise from the kernel logs
rtlwifi
* more bluetooth coexistance improvements
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When compiling OvS-master on 4.4.0-81 kernel,
there is a warning:
CC [M] /root/ovs/datapath/linux/datapath.o
/root/ovs/datapath/linux/datapath.c: In function
'ovs_flow_cmd_set':
/root/ovs/datapath/linux/datapath.c:1221:1: warning:
the frame size of 1040 bytes is larger than 1024 bytes
[-Wframe-larger-than=]
This patch factors out match-init and action-copy to avoid
"Wframe-larger-than=1024" warning. Because mask is only
used to get actions, we new a function to save some
stack space.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long says:
====================
sctp: remove typedefs from structures part 1
As we know, typedef is suggested not to use in kernel, even checkpatch.pl
also gives warnings about it. Now sctp is using it for many structures.
All this kind of typedef's using should be removed. As the 1st part, this
patchset is to remove it for 11 basic structures in linux/sctp.h. It is
also to fix some indents.
No any code's logic is changed in these patches, only cleaning up.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_init_chunk_t, and replace
with struct sctp_init_chunk in the places where it's using this
typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_inithdr_t, and replace
with struct sctp_inithdr in the places where it's using this
typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_data_chunk_t, and replace
with struct sctp_data_chunk in the places where it's using this
typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_datahdr_t, and replace with
struct sctp_datahdr in the places where it's using this typedef.
It is also to use izeof(variable) instead of sizeof(type).
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove this typedef, there is even no places using it.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_param_t, and replace with
struct sctp_paramhdr in the places where it's using this typedef.
It is also to remove the useless declaration sctp_addip_addr_config
and fix the lack of params for some other functions' declaration.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_paramhdr_t, and replace
with struct sctp_paramhdr in the places where it's using this
typedef.
It is also to fix some indents and use sizeof(variable) instead
of sizeof(type).
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove this typedef, there is even no places using it.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_cid_t, and replace
with struct sctp_cid in the places where it's using this
typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_chunkhdr_t, and replace
with struct sctp_chunkhdr in the places where it's using this
typedef.
It is also to fix some indents and use sizeof(variable) instead
of sizeof(type)., especially in sctp_new.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_sctphdr_t, and replace
with struct sctphdr in the places where it's using this typedef.
It is also to fix some indents and use sizeof(variable) instead
of sizeof(type).
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman says:
====================
nfp: introduce flower offload capabilities
this series adds flower offload to the NFP driver. It builds on recent
work to add representor and a skeleton flower app - now the app does what
its name says.
In general the approach taken is to allow some flows within
the universe of possible flower matches and tc actions to be offloaded.
It is planned that this support will grow over time but the support
offered by this patch-set seems to be a reasonable starting point.
Key Changes since v2:
* Revised flow structures to simplify setup/teardown and locking of stats
* Addressed other code-change review of v2
Other review questions regarding v2 have been answered on netdev.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously the flower offloads never sends messages to the hardware,
and never registers a handler for receiving messages from hardware.
This patch enables the flower offloads to send control messages to
hardware when adding and removing flow rules. Additionally it
registers a control message rx handler for receiving stats updates
from hardware for each offloaded flow.
Additionally this patch adds 4 control message types; Add, modify and
delete flow, as well as flow stats. It also allows
nfp_flower_cmsg_get_data() to be used outside of cmsg.c.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously there was no way of updating flow rule stats after they
have been offloaded to hardware. This is solved by keeping track of
stats received from hardware and providing this to the TC handler
on request.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds metadata describing the mask id of each flow and keeps track of
flows installed in hardware. Previously a flow could not be removed
from hardware as there was no way of knowing if that a specific flow
was installed. This is solved by storing the offloaded flows in a
hash table.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds push vlan, pop vlan, output and drop action capabilities
to flower offloads.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extends matching capabilities for flower offloads to include vlan,
layer 2, layer 3 and layer 4 type matches. This includes both exact
and wildcard matching.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extends the flower flow add function by calculating which match
fields are present in the flower offload structure and allocating
the appropriate space to describe these.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds a flower based TC offload handler for representor devices, this
is in addition to the bpf based offload handler. The changes in this
patch will be used in a follow-up patch to add tc flower offload to
the NFP.
The flower app enables tc offloads on representors by default.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add phys_switch_id support by allowing lookup of
SWITCHDEV_ATTR_ID_PORT_PARENT_ID via the nfp_repr_port_attr_get
switchdev operation.
This is visible to user-space in the phys_switch_id attribute
of a netdev.
e.g.
cd /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0
find . -name phys_switch_id | xargs grep .
./net/eth3/phys_switch_id:00154d1300bd
./net/eth4/phys_switch_id:00154d1300bd
./net/eth2/phys_switch_id:00154d1300bd
grep: ./net/eth5/phys_switch_id: Operation not supported
In the above eth2 and eth3 and representor netdevs for the first and second
physical port. eth4 is the representor for the PF. And eth5 is the PF netdev.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to allow switchdev ops to be set if NET_SWITCHDEV is configured
and do nothing otherwise. This allows for slightly cleaner code which
uses switchdev but does not select NET_SWITCHDEV.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ep->base.sk gets it's value since sctp_endpoint_new, nowhere
will change it. So there's no need to check if it's null, as
it can never be null.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>